Conversations with Tyler - Brendan Foody on Teaching AI and the Future of Knowledge Work

Starting point is 00:00:04 Conversations with Tyler is produced by the Mercatus Center at George Mason University, bridging the gap between academic ideas and real-world problems. Learn more at mercatus.org. For a full transcript of every conversation, enhanced with helpful links, visit conversationswithtyler.com. Hello, everyone, and welcome back to Conversations with Tyler. Today I'm sitting here chatting with Brendan Foody at the offices of Mercor. Mercor is an AI company, we'll get into more detail soon enough, which dates from early

Starting point is 00:00:39 2023. Brendan is the CEO and co-founder. I believe he's the youngest unicorn founder ever. Mercor, by some estimates, is the fastest growing company ever, for instance, the quickest speed to $400 million. Brendan also, at age 22, is the youngest Conversations with Tyler guest ever. My proudest achievement. There's more we'll get to soon enough, but Brendan, welcome. Thank you so much for having me, Tyler. I'm excited to be here. Now, I saw an ad online not too long ago from Mercor, and it said $150 an hour for a poet. Why would you pay a poet $150 an hour? That's a phenomenal place to start. I think it's because,

Starting point is 00:01:22 so for background on what the company does, we hire all of the experts that teach the leading AI models. And so when one of the AI labs wants to teach their models how to be better at poetry, we'll find some of the best poets in the world that can help to measure success via creating evals and examples of how the model should behave. And one of the reasons that we're able to pay so well to attract the best talent is that when we have these phenomenal poets that teach the models how to do things once, they're then able to apply those skills and that knowledge across billions of users, hence allowing us to pay $150 an hour for some of the best poets in the world.

Starting point is 00:02:02 So the poets grade the poetry of the models, or they grade the writing, or what is it they're grading? It could be some combination, depending on the project. But an example might be similar to how a professor in English class would create a rubric to grade an essay or a poem that they might have for the students. We could have a poet that creates a rubric to grade, you know, how well is the model creating whatever poetry you would like and a response that would be desirable to a given user. How do you know when you have a good poet or a great poet? That's so much of the challenge of it, especially with these very subjective domains in the liberal arts, right? Is that so much of it is this question of taste where you want some degree of consensus of, you know, different exceptional people believing that they're each doing a good job, but you probably don't want too much consensus because you also want to get all of these edge case scenarios of what are the models doing that might deviate a little bit from what the norm is. So you want your poet creators to disagree with each other some amount.

Starting point is 00:03:05 Some amount, exactly. But still a response that is conducive with what most users would want to see in their model responses. Are you ever tempted to ask the AI models, how good are the poet creators? We often are. We do a lot of this. It's where we'll have the humans create, call it, a rubric or some sort of eval to measure success and then have the models say their perspective,

Starting point is 00:03:31 because you actually can get a little bit of signal from that, especially if you have an expert, I mean, you know, we have tens of thousands of people that are working on our platform at any given time. And so oftentimes there will be someone that is tired or not putting a lot of effort into their work and the models are able to help us with catching that. So you had a recent project lately. You hired Larry Summers, I believe, for finance and economics. That was a little bit of a unique deal. He's been a guest on this podcast. Cass Sunstein for law, he's been a guest twice on this podcast. Eric Topol for medicine, I've been a guest on his podcast. How do you pick those people? Obviously, they're highly accomplished, but what makes them good at doing this other than just being smart, productive people? Absolutely. Well, so I'll step back and provide a little bit of context on Apex or the AI Productivity Index and why we chose them to help with it.

Starting point is 00:04:22 The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals like GPQA for PhD-level reasoning or IMO for Olympiad math, which were wholly disconnected from the outcomes that customers actually care about of how do we get the model to automate a medical diagnosis or a legal draft or preparing a certain financial analysis of a company. And so we chose legal experts, medical experts, finance experts, people that have a broad economic perspective to see what is the right methodology to think about measuring success across each of these domains, working with them on segmenting, what are all the different industries within law?

Starting point is 00:05:05 What are all the different types of law? And how do we leverage our marketplace of all of these experts to best capture and measure how well models have automated all of those domains? So it's because they've had real world experience and they're not only academics? Is that the way to think about it? I think that's part of it. I think a lot of them obviously have meaningful real world experience, but also this broad vantage point of the entire industry, right, of not just,

Starting point is 00:05:31 someone that specializes in a particular type of law or a particular industry in big law, but rather having this very large perspective and how we should structure the project, how we should think about the rigorous processes associated with curating the datasets, setting up the reviews, etc. And the paper you did with that group of people as your researchers and many others, I should add, what's the main thing you all learned from that exercise and from the paper? I think the largest takeaway is the rate of model improvement at economically valuable tasks is incredible. Like, if you look at the level that

Starting point is 00:06:09 GPT-4o scored on this, right, a frontier model a year ago, and that against GPT-5 today, the delta is profound. And so it often gets... Can you put a number on that or somehow? Yeah, call it a 25, 30% improvement. Per year. Exactly. Well, now GPT-5 is at, uh, 64%. So maintaining that would definitely be challenging. But I mean, it gets my mind wondering, like what will this technology be able to do in another year or two? And how will that have this profound impact on the economy that so many of us have been wondering about for a while? But when you give these numbers, to what extent are you measuring how well they do on the test versus how much economic value are they creating? Well, so I'll walk through the

Starting point is 00:07:01 methodology and how we derive that. Essentially, within each industry, we start out with surveys of hundreds of experts. So think within consulting, we get experts that were previously at McKinsey, Bain, BCG, and other top consulting firms. And then we survey how do they spend their time? What percentage of their time is in customer meetings, is in online research, is in analysis, preparing deliverables for customers. And then within each of those buckets, we ask them to write the corresponding prompts and rubrics associated with how they spend their time. So, you know, using their time as the best proxy we have for the economic value associated with their salary or what customers are willing to pay for. And it's incredible to see, right? The model scoring 64% on that is pretty profound. Obviously, there is some complexity in mapping that to economic impact because in certain industries like medicine, you can't have a 30% failure rate. You need to have near-perfection, similar to driverless cars in some ways,

Starting point is 00:07:58 but in other industries like an initial legal draft or a consulting analysis, this technology is already starting to have a profound impact, and it's only accelerating. But isn't there something about switching from task to task, which the models can't do at all? So the model would beat me on a test. The model might even run better podcast questions than I do, but somehow combining those all in a single entity, I can do, and even the best model, it's still basically at zero as far as I can tell. So the economic value is in a way still at zero?

Starting point is 00:08:31 Well, so it's interesting. I think what you're getting at is there's sort of two key things the models struggle at that humans tend to be very good at. The first is these longer horizon tasks of not just something that we could do in a few hours, but something that might take us 50 or 100 hours to do. And then the second thing is integrating multiple tools with our response and going about doing these things, maybe interacting with people as one of those elements. And I think that that's coming very soon. And the next version of Apex...

Starting point is 00:09:01 And what does very soon mean? Your best guess. Well, I'll talk about it in terms of Apex, and then I'll talk about it in terms of model advancement, because there's a large correlation between the two. We're doing a lot to measure all of those capabilities and how models interact with the entire workspace and how models do these very long horizon tasks in an eval that we're launching in the next couple of months. And very quickly, once researchers are able to measure those capabilities, they'll be able to hill climb them. And so I would be shocked if we don't have

Starting point is 00:09:36 enormously capable models across those dimensions of lots of tool use with very long horizon tasks in the next six to 12 months. And let's just take the body of knowledge alone. Forget about the long horizon, just an on-the-spot test. Let's say I'm Cass Sunstein. I know Cass. He has an incredibly impressive body of knowledge in many areas. When are we at the point where basically Cass cannot ask a question that the best models cannot answer? Wow, that's an interesting question. Well, I think it depends domain by domain, but in law, I think it's going to be a long time. And the reason is that there's so much taste involved in legal responses that effectively getting all of the taste that Cass has into the model is going to be difficult.

Starting point is 00:10:26 I do think we'll very quickly get to the point where Cass has a really hard time finding a mistake the model makes, right? Where he has to spend maybe a week just like trying to probe it. How far away is that? That might be about two or three years. It wouldn't surprise me for a question and response. I would think it's six months away would be my guess.

Starting point is 00:10:48 But if he asked it a thousand questions. I think he could induce an error. But 50 questions, I think, in less than a year, we might be there. It depends also a little bit in how tightly you define an error. Like, he might have all sorts of knowledge of niche areas of the law that the model isn't strong at. And so there's some question of how you measure this. But I hold Cass in very high regard with respect to his niche knowledge of the law and

Starting point is 00:11:15 ability to stump the models. And what would be an area where the human expert is relatively strong and an area where the human expert compared to the model is relatively weak. There's interestingly a lot of areas in law where the right way of approaching something is not written down or codified. It exists more in the heads of experts, at least not explicitly. And I think it's those domains where there's a lot of taste that isn't well documented, that the models will struggle immensely with because they either need those tokens in the pre-training data of doing these web-scale training runs, or they need it in the post-training data of having

Starting point is 00:11:55 a legal expert from us to create those datasets. And if they don't have those, then the model will inevitably struggle with that particular problem. Now, I've argued in economics that the leading economics journals should take their referee reports and the submissions and send them somewhere, arguably here. Would that be useful to you? It certainly would. We've talked about it a bunch in the past, but I think that the largest way that these deep domain experts can help to contribute to the advancement of AI is defining the evals. When we have these phenomenal tests for model capabilities, whether in economics, law, or other domains, it's amazing how fast the researchers can hill climb them and optimize around them. And so more help in building these tests

Starting point is 00:12:43 and sending them to us and other labs is extremely impactful. So those are nonprofits, those institutions. Why don't they just send it to you now for free? Do you have a theory of this? I'm not sure exactly. It would improve science, right? It would improve science. I think maybe two things. One is awareness of this. I think that while evals are the thing that everyone's talking about in Silicon Valley in the AI labs, it feels like most people in the rest of the country couldn't quite describe exactly why you need an eval. And I think the second is a little bit of fear, right, where everyone worries about how is AI going to impact their jobs, their work, their ability to contribute to the economy and be meaningful. And I think that that's always top of mind, even for

Starting point is 00:13:29 nonprofit organizations that want to contribute and preach this world of abundance. So let's say we took the live economics or legal, whatever seminars, at, say, the top 10, top 20 schools, recorded them all, somehow anonymized the data, but you had the comments in transcript and sent that to you. Would that be useful? It would be very useful. One thing I will say, though, is that there are sort of two kinds of data is a good way of thinking about it. The first kind of data is just the output. You have some curriculum that the model is reading and learning from. The second kind of data is some way of measuring success, where you have the rubric for the response, you have the test question answer, you have the unit test in code. And that second kind

Starting point is 00:14:16 of data is the most valuable, where we're able to have the models attempt the problem many, many times, score those responses and learn from them. But both are incredibly impactful and things we would love to get support with. So on your wish list, just to make this more concrete, you can have some kind of data, forget about realism, you just get it for free. What is it you most want? Oh, interesting. For say social sciences. Forget about realism. I think that we tend to focus a lot on what's economically valuable. And so if people have tests that the models are bad at, that map to a meaningful amount of economic value, you know, and it could be an academic domain that can be applied to create a lot of value in other areas, that's super exciting for us. Maybe a good

Starting point is 00:15:05 heuristic is if we could build a model that without seeing this test and reading through it could max out the test, how much economic impact would that add? Whatever test is able to measure that the best is most helpful, right? And so maybe in medicine, it's, you know, a test around how well the model is doing a certain diagnosis in a particularly difficult domain where we think the models can add a ton of impact. Maybe in economics, it's, you know, areas of analysis and modeling of businesses that aren't well codified but could meaningfully impact the way that we underwrite businesses. Those types of things are what's going through my head. And let's say it's poetry.

Starting point is 00:15:43 Let's say you can get it for free, grab what you want from the known universe. What's the data that's going to make the models working through your company better at poetry? Well, I think that it's people that have phenomenal taste of what would users of the end products, users of these frontier models, want to see. Like someone that understands that when a given prompt is given to the model, what is the type of response that people are going to be amazed with? How do we define the characteristics of those responses is imperative? And so probably more than just poets that have spent a lot of time in school, we would want people that know how to write work that gets a lot of traction from readers that gains broad popularity and interest, drives the impact, so to speak, in whatever

Starting point is 00:16:35 dimension that we define it within poetry. But what's the data you want concretely? Is it a tape of them sitting around a table? Students come bring their poems. The person says, I like this one. Here's why. Here's why not. Is it that tape or is it written reports? Or what's like the thing that would come in the mail when you get your wish? The best analog is a rubric. If you have some... A rubric for how to grade. A rubric for how to grade. So if you have, here, like, if the poem has, you know, evokes this idea that is inevitably going to come up in this prompt or is a characteristic of a really good response, we'll, you know, reward the model a certain amount. If it says this thing, we'll penalize the model, if it styles the response in this way,

Starting point is 00:17:19 we'll reward it. Those are the types of things. In many ways, very similar to the way that a professor might create a rubric to grade an essay or a poem. Poetry is definitely a more difficult one because I feel like it's very unbounded. With a lot of essays that you might grade from your students, it's a relatively well-scoped prompt where you can probably create a rubric that's easy to apply to all of them, versus I can only imagine in poetry classes how difficult it is to both create an accurate rubric as well as apply it. And so the people that are able to do that the best are certainly extremely valuable and exciting. But to get all nerdy here, you know, Immanuel Kant in his third critique, Critique of Judgment, he said, in essence, taste is that which cannot be captured in a rubric.
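
[Editor's sketch, not part of the conversation.] The rubric idea described above (reward a response for having certain characteristics, penalize it for others) can be illustrated with a small weighted checklist. Everything here is a hypothetical illustration: the `Criterion` class, the `score` function, the example criteria, and the weights are assumptions made for the sketch, not anything Mercor or an AI lab is stated to use, and real rubric criteria would be judged by experts or models rather than by simple string checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric line: a description, a check, and a signed weight."""
    description: str
    check: Callable[[str], bool]  # returns True if the response meets the criterion
    weight: float                 # positive rewards the response, negative penalizes it


def score(response: str, rubric: List[Criterion]) -> float:
    """Sum the weights of every criterion the response satisfies."""
    return sum(c.weight for c in rubric if c.check(response))


# Hypothetical rubric for a prompt asking for a short poem about the sea.
rubric = [
    Criterion("evokes the sea", lambda r: "sea" in r.lower(), 2.0),
    Criterion("has at least four lines", lambda r: len(r.strip().splitlines()) >= 4, 1.0),
    Criterion("falls back on a cliche opener", lambda r: "roses are red" in r.lower(), -1.0),
]

if __name__ == "__main__":
    poem = "The sea\nis wide\nand grey\ntoday"
    print(score(poem, rubric))
```

A signed weight per criterion keeps rewards and penalties in one list, which mirrors how the rubric is described in the conversation. It does not address the harder question raised next, namely whether taste can be written down as criteria at all.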

Starting point is 00:18:04 And if the data you want is a rubric and taste is really important, maybe Kant was wrong, but how do I square that whole picture? Isn't it by invoking taste you're being circular and wishing for a free lunch that comes from outside the model in a sense? Well, there are other kinds of data that could do it if it can't be captured in a rubric. Like another kind is RLHF, where you could have the model generate two responses, similar to what you might see in chat, and then have these people with a lot of taste choose which response they prefer and do that many times until the model is able to understand our preferences. And so that could be one way of going about it as well.
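
[Editor's sketch, not part of the conversation.] The pairwise approach described above (show two responses, record which one a person with taste prefers, repeat many times) can be illustrated by fitting one score per response from the recorded choices. The sketch below uses a Bradley-Terry model fitted by plain gradient ascent, which is one common way to turn pairwise preferences into scores; it is an assumption chosen for illustration, not a description of how any lab actually trains on preference data. The function name `fit_preferences` and the toy data are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


def fit_preferences(
    pairs: List[Tuple[str, str]],
    items: Iterable[str],
    steps: int = 500,
    lr: float = 0.1,
) -> Dict[str, float]:
    """Fit Bradley-Terry scores from (preferred, rejected) pairs.

    Under the model, P(a preferred over b) = sigmoid(score[a] - score[b]).
    The scores are found by gradient ascent on the log-likelihood.
    """
    scores = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in scores}
        for winner, loser in pairs:
            # Model's current probability that the recorded winner is preferred.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for item in scores:
            scores[item] += lr * grad[item]
    return scores


if __name__ == "__main__":
    # Toy data: raters preferred A over B three times out of four,
    # and B over C three times out of four.
    pairs = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
    print(fit_preferences(pairs, ["A", "B", "C"]))
```

Note that A and C are never compared directly in the toy data, yet the fitted scores still rank A above C, because the model infers an ordering from the chain of comparisons. Disagreement among raters (the one dissenting vote in each matchup) softens the scores rather than breaking the fit.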

Starting point is 00:18:41 I'm sure you know these studies, where there's some AI-generated poems and some human-generated poems. And often the humans prefer the AI-generated poems, even though to people with quote-unquote taste, they're worse. Yeah. I mean, whose side do you take there? Well, it depends what you're optimizing for.

Starting point is 00:18:58 I mean, I think that generally we're in the mindset of, for the power users of these AI products, what are the types of responses that they would want to see and be happy with. But it's challenging because that sometimes deviates from the types of responses that the top 1% of experts in poetry might see as a broadly good poem. And so striking that balance is really up to a lot of the researchers and product leaders at the labs of what do they think good looks like and how do we act as their partner in defining that. If you could model a much older poet, William Wordsworth, Blake, John Milton, Rilke,

Starting point is 00:19:40 some of my friends say, there are no truly great poets left anymore. The best poets were way back when. Is it a goal to model the older poets and figure out what they would think? And rather than having Larry Summers and Cass Sunstein come in, that you have some AI-generated model of John Milton? Maybe. Well, I will say it ties back to the goal of Apex, which is that we saw people were too focused on a lot of these purely academic

Starting point is 00:20:07 domains and not focusing enough on how will people actually use the models in the economy. But I certainly do think that especially as we start to automate more industries and there's more liberal arts and these kinds of domains where people want to spend time on poetry, certainly building the tools to help them create phenomenal poems and make them happy and their readers happy is definitely the way we'd go about it. I'm not sure if it would be using the archetypes of these former poets. How would you go about it, Tyler? I don't know. I don't trust contemporary poets, frankly. There aren't many of them I like to read. Maybe, you know, Geoffrey Hill would be one. Some are too postmodern. Some maybe are too woke.

Starting point is 00:20:54 Some are too identity-driven. I love older poetry, so it's not that I don't like poetry, but I worry about putting them... they're not quite in charge, I get that, but giving them so much leeway. Yeah, it does evoke this really interesting idea of how we want to teach models and measure success of these models. Is it via consensus? Is it via a handful of the top experts in that given domain? And there's really no correct answer.

Starting point is 00:21:20 And I think that different AI labs, different researchers will go down different routes and that will frame the ways that these products feel and the things that they ultimately achieve. Like maybe we should only enshrine the current age when the current age is at a peak. Like Scott Sumner says, the best movies were maybe made in the 1960s and 70s, whether or not you agree, but you could have movie evaluators be only from that time. There's some still alive.

Starting point is 00:21:48 If you think the best heavy metal, say, comes from the 1980s, well, you wouldn't have like the current evaluators, you would pick evaluators from the 80s. The best poetry does seem by most people's standards to be really quite old, and we can't resurrect those individuals. But the notion that you enshrine current taste when taste changes so much, it's a very interesting decision. It certainly is.

Starting point is 00:22:11 My guess is that in a long enough time horizon, we'll enshrine taste from every different decade and every different era, and then the model will be able to learn what taste do you have, and how does it pull on each of those knowledge bases to best personalize it to your preferences. How much of society, ideally, should become a big reinforcement learning machine? We sort of tape everyone, everything, every debate people have over the coffee table. I think it will become an immense amount very quickly. There's obviously still going to be the personal conversations over the coffee table that people don't want recorded. But my firm

Starting point is 00:22:53 belief is, especially for economically valuable tasks, we'll move towards a world where people do things once. Instead of the investment banker redundantly analyzing a data room to prepare an analysis of a company, you know, every couple of weeks for a new project and a new customer, they'll teach the model how to do that once in the particular domains that they operate in. And similar to building software once, they'll be able to use that many times as they use their agent. Instead of the customer support rep monotonously responding to tickets every day, they'll find the mistake that the agent makes, they'll turn that into an RL environment,

Starting point is 00:23:31 and then all of a sudden the agent will be able to solve that problem many times. And so I think in many ways, the economic incentives and how knowledge work will change has a lot of similarities to software, in that we'll move towards these fixed-cost investments of teaching an agent how to do something, building an RL environment for something, and then being able to use agents as many times as we want

Starting point is 00:23:56 to perform that activity. And that's why I believe that a huge portion of the economy will become an RL environment machine. And do you think pendants or Meta-like glasses will be more important for that? Oh, I don't know. I'm going to do both. I think a lot of both.

Starting point is 00:24:13 If I take myself, like, I don't do that much small talk. Say you attached a little pendant to me and you got the tape of all my conversations, you could feed it in. What's the social value of that? Is it like $5, $50, a bit more? How valuable is that? Well, it certainly depends a lot by person. I would imagine yours are quite valuable. But quite. Like what's... How much would you... I'm not asking for an offer,

Starting point is 00:24:40 but how much actually would you pay? Well, I would pay a lot just out of pure curiosity. But if I were trying to think about how valuable it would be to our customers and our business, I imagine it would be something in the order of, it's hard because it changes over time, certainly tens of thousands, if not many hundreds of thousands of dollars a year and how that evolves over time. But my guess is that for the vast majority of people, they'll still care a lot about privacy. And so maybe that data will be collected to personalize their individual agent, but they're not going to be comfortable with that getting added to the broader like model weights to customize the base model that billions of users are using.

Starting point is 00:25:25 But that's easy. So you can tape me with my pendant. I run it through my AI and I say, take out anything I don't want Mercor to hear. And it will do that quite well, maybe not perfectly. And then you get what's left over, all the debates about elasticities and tax incidence. Maybe. I suspect you're probably more comfortable with it than most people. Most people would probably say, well, you're asking the AI to, you know, be the layer of trust to remove the sensitive information, but it's going to have bias in doing so. And so I think there's always going to be some level of sensitivity around these topics. And I actually believe that some of the companies that have done a very good job around their brand of privacy are going to have an

Starting point is 00:26:07 advantage in it. Like I think Apple, while maybe not totally at the, you know, frontier of AI yet, has done such a good job in their brand around privacy. And that's going to allow them to have a lot of trust from users in a way that they're able to collect all of this personalized information. Let's say three to five years out, when the top models will be both clearly better than virtually all human experts or maybe all human experts and recognized as such. The latter we certainly don't have. What do you think in that world the reputation of expertise is like? Now, one view is no one respects the experts because the machines are better, but I think an alternative possibility is the machine by not being tied to a personality is less disliked. And people actually

Starting point is 00:26:53 respect the experts more because they get this impersonal distillation of the expert. It's like, oh, the experts did that. They're so amazing. And they're not annoying me like on the late night TV show. Like, what will happen to the status of human experts? I think so. I think that I definitely am already at the point where there are certain domains where I trust ChatGPT or whatever model I'm using more than I trust a particular expert in that industry, you know, for a very quick, like, medical perspective, even in some cases or whatever it is. And so I think that there's some element of it being highly competent. There's some element of it not having a face to it that causes us to place this high trust. But I do think that the point you made at the beginning

Starting point is 00:27:40 is around evoking the question of what is the point at which these models will be able to do everything that experts are able to do. And my read on the market is that models are advancing very, very quickly in being able to automate, call it 50% or 75% of what humans and experts are able to do, but will really struggle with that last 25%. And I think that for a very long time, human expertise will be imperative to help accomplish that last 25% as the ultimate bottleneck to more economic prosperity and productivity. How long until the best models can write a poem as good as the median Pablo Neruda poem? Oh, I think that's probably not too far off.

Starting point is 00:28:27 I would say less than a year. Yeah, yeah, I think less than a year. How about the very best Pablo Neruda poems? I'm not too calibrated on poetry, so I'd have a hard time saying. But I think it's much further out. And is that your intuition? I agree. I think that's consistent with my intuition as well.

Starting point is 00:28:44 But I think that this longer tail of advancement is generally the most difficult. The other heuristic I have for it is that going back to this dimension of the time horizon of the task, models are in some way superhuman with what you can do in a chat window, right, with your chat bot. But they still can't draft an email for us. They still can't schedule a meeting. And those things will come. But I think that there's a long way before we're able to tell a model,

Starting point is 00:29:13 go off and build a startup for 90 days. And there's going to be an immense amount of human expertise associated with how do we get to that across every knowledge work vertical that we want the models to operate in. Insofar as we turn society into this big engine for reinforcement learning, what new jobs get created by doing that? Well, I think the most interesting part of our business is that everyone else in Silicon Valley is talking about how we automate away jobs, versus we're very focused on how do we build this new job category of people training agents, building RL environments to help teach models. And that's what I believe it'll converge to. Instead of the investment bankers doing the analysis, they'll build RL environments and train

Starting point is 00:30:00 agents. And it'll be the same across consulting and software engineers and customer support and pretty much every knowledge work vertical. And so it's hard to say the exact pace at which that'll happen, but I would not be surprised if within five years a majority of high-end knowledge workers are training models, whether in their full-time jobs or through our marketplace, to help improve agents at whatever workflows they want to automate. And to hold those jobs, how much technical AI will a person need to have? Or do they just have to know about the thing? They just need to know about the thing. The only element of technical AI that they'll need is to find where the model makes a mistake. So long as they can find where the model

Starting point is 00:30:45 makes a mistake and sort of understand in some ways the frontier of the model and its capabilities, how you can push it to its limit, then it's relatively easy to create some criteria or way of measuring that mistake so that the model can learn from it. And I think we'll have that across every different vertical with every different tool with these very long horizons, whether it's 100 hours or 100 days that we want the model to work on something, and that's going to very quickly become the primary bottleneck to model improvement. Is the demand for software price elastic? I think it's extremely price elastic. In fact, I think that the elasticity is the exact right thing to hone in on with respect to how job displacement will evolve in these domains. Like, I think

Starting point is 00:31:33 if we make software engineers 10 times more efficient, we'll have even more software engineers. Maybe we'll have 10 times as many software engineers and build 100 times as much software, right, versus other domains, maybe that's not the case, right? Maybe we only need so much accounting in the world or we only need so much customer support. But I think software engineers certainly will be able to do so much more. What else do you think of as price elastic? I think that building businesses is also, so a lot of the product and distribution associated with software is certainly going to be something we see a lot more of. I think there's a lot of domains. Even if you think about investing, obviously, it's not as price elastic as software, but I do think that there's still enormous inefficiency with

Starting point is 00:32:21 respect to how we allocate capital in the economy. Like if I think back to the early days of Mercor, you know, we were having a hard time getting our $10,000 of working capital for our initial, you know, seed investments. And then very quickly, once you get to a reasonable scale, the markets are very, very capitalized. And so I think a lot of this early capital allocation, as well as even just better understanding how companies will develop over time is going to be really interesting. And also how that information and analysis manifests itself within companies, right? For an operator, they sort of have this investing problem of what are all the different bets that they have within their companies? How do they allocate capital and resources associated with that? And so I think

Starting point is 00:33:05 that there's so much elasticity with respect to how we build more products, how we distribute those products, and how we allocate resources within companies more effectively. What will education look like five to ten years out? I think education is one of the things I'm most excited about, where a good heuristic is if everyone has Sal Khan as their personal tutor available 24-7 to teach them whatever topic they want to learn, it'll be that it's much easier to motivate themselves. It's much better access to information, much better ways of explaining that information.

Starting point is 00:33:44 And that'll be profoundly impactful. But that seems less price elastic, right? Like only so many hours of Sal Khan a day, no slight intended to him. Yeah. But it's not going to be 27 hours a day, right? Yeah, that's true. That's true.

Starting point is 00:33:58 So employment for teachers, researchers might shrink? I think in some ways, areas of that may shrink. But I also think that there's a large element to teaching that exists in personal relationships, of which the model will be able to do part but not all of it, of how does, you know, the teacher act as guiding the student through their journey and helping them to improve both in their curriculum as well as their emotional development. And so I think teachers will still play an important role in the economy and ideally be able to just provide higher-touch points of contact with all the students and smaller class settings. So this is October 2025. How many people

Starting point is 00:34:41 work at Mercor? Right now, we have just over 300 people across the world as our full-time employees. How did you hire so many good people so quickly? Well, we used our technology and our platform to help with it a bunch. I mean, the origin story of the company was automating all of the ways that we would review resumes, conduct interviews, and decide who to hire. And so the ways that we assess talent, the ways that we optimize funnels to build out teams is really ingrained in the DNA of the company and a top priority of me and my co-founders. And so I'm extremely grateful for everyone that we have on the team and they make it look easy. How do other people do interviews wrong? I think that one of the, well, this is something we've

Starting point is 00:35:28 talked about a bunch because you obviously wrote a phenomenal book on talent. I think one of the largest problems that people make is that they don't measure the actual skills and capabilities that they want someone to exhibit on the job. Instead of focusing on how do we measure how well this person does their investment analysis of the data room, they have this vibes-based conversation of, you know, where did the person grow up, how similar are they, do they think they would enjoy hanging out together? And obviously, like, that's still important if you're having a working relationship, but I think that they often over-index on that relative to the skills that people actually exhibit.

Starting point is 00:36:08 So just give them a project. Give them a project. And grade them, in essence. I think that's the cleanest way to do it. Let's say it's not programming. As the company gets bigger, the major AI companies, a lot of them now are quite large, and most of the people who work there don't do AI at all. They do jobs that are not so dissimilar from what they might do at Coca-Cola, which is fine.

Starting point is 00:36:29 That's just part of growth. They're legal, they're communications, they do events, whatever. When you're trying to hire people like that, say, like, what's the test? What's the project? Or what is it you look for? I think that that's definitely more difficult. I think you probably want to look for cases in their life where they've worked in similar roles because you can't curate a project that's as similar to exactly what they've done.

Starting point is 00:36:56 And so you would seek the best proxy for that, really drill in to understand the details of that working environment, how similar it is, how well they performed in that, talking to people that previously worked with them in that environment to get a gauge for it. But it definitely is more difficult to measure someone's slope and how they'll develop on the job over a six-month time horizon than it is to measure their y-intercept. And so I think that's one trend that we've found in talent assessment. Do you think body language in an interview is predictive? I think it can be, but I also think it can be a false signal because I've definitely had cases where I over-index on, oh, this person feels a little bit awkward or whatever it is,

Starting point is 00:37:40 but they do a phenomenal job at the actual work. And so I think it's important to be very cautious around which of these signals are actually correlated with performance and which ones aren't. Articulateness, overrated or underrated. Depends a lot on the job. Depends a lot on the job. Let's say 10 years from now, when we can really measure pretty well the performance of people we're interviewing today, less than 10 years, but say 10 years, let's say you have a company such as Amazon, does a very large number of interviews, and let's say they're all taped, and you run them through the best AI models. How good a predictor do you think that will be, in your opinion?

Starting point is 00:38:20 I think that it will be certainly superhuman, because humans aren't very good at it. Right. But it's still such a difficult problem that there's going to be variance. And I think that for roles like the one you described, what's going through my head is there's a lot of confounding variables. Did the person have an issue in their family that caused them to be off their game or not show up to work? Did they get sick during the interview process and maybe weren't full of energy? There's all these things that just add noise to that problem. But I do believe that as we're able to get all of that data in context,

Starting point is 00:39:03 to have all the notes from the manager around what was happening in this person's life, both during the interview process as well as on the job, that will allow it to over time become phenomenal. And so maybe we have that on a 10-year time horizon. How can we make labor markets more efficient? I think that one of the largest inefficiencies in labor markets is that everything is disaggregated. And that when one of our friends is applying to a job, they would apply to a couple dozen jobs. And when companies are considering who to hire, they'll consider

Starting point is 00:39:36 a fraction of a percent of people in the economy. And it feels like there needs to be a structural change there where there's an aggregator that everyone applies to and every company hires from, facilitating this perfect flow of information. But we need a very good AI for that to work? I think a very good AI will help with that working. And the reason I think it doesn't happen today is that there's a very difficult matching problem. And let me give LinkedIn as an analog. LinkedIn has all the distribution to pretty much every company and every candidate. But at the very same time, it's incredibly difficult to understand, based on someone's LinkedIn profile, whether they'll actually perform well at a given job. And so I believe that in that case, it's very much a matching problem, plus also a distribution and aggregation problem to facilitate this effective flow of information and aggregation within knowledge markets. But I think it's also in line with the fact that the nature of jobs is changing dramatically, right?

Starting point is 00:40:36 Previously, everyone would think about this problem in the context of full-time roles, but as we trend towards this world of everyone building RL environments and being able to do work remotely and train models in this fractional way, that also will shift the dynamics of enabling more aggregation, enabling more globalized matching, and how that will impact the economy. Some of my friends think that mentors and nepotism will make a good comeback, and they say everyone will submit a perfect cover letter,

Starting point is 00:41:11 have an optimized LinkedIn profile, they'll even have practiced with an AI doing the interview. They won't all get up to speed, but a lot of them will, and there'll be this large mass of apparently pretty qualified candidates, And what you'll actually do is resort to the old tried and true. Well, do you know this guy's uncle or something else who can recommend them? Agree? Disagree? I think in some companies and industries that will happen, I agree with it.

Starting point is 00:41:37 My hope is that we have models that are helping to run companies in a very thoughtful, efficient way, that are data-driven about it, where the models have an eval set of all the performance reviews of people in that given company, and they're able to make an accurate prediction over whether this reference or that piece of nepotism should actually be considered or maybe as a counter signal, right? And so that's my hope, but it'll probably play out with some combination of both over time. In the AI sort of run labor market, let's say it's more efficient, but do you think there are fewer second chances and late bloomers in that world? You get scored too early, so to speak, and then you're tracked. It's a bit more like how European schooling systems can differ from American. I think there will be a lot of second chances.

Starting point is 00:42:25 And the reason that there will be is that oftentimes they're effective. And so the models will identify that and realize that maybe someone wasn't the right fit for that first role. There's another role that they could be a really good fit for. Because I do think that there are jobs in the economy that almost everyone would excel at. And it's really just this matching problem of finding the intersection of something that they're excited about where they'll also add an immense amount of economic value. As you know, there are AI services now. You're doing an interview, and across the top or the bottom of your screen, the AI can give you advice, answers. Does that work at all? What do you

Starting point is 00:43:03 think of those? We run up against a lot of those. One thing I found in talent assessment is that initially people tried to work against AI, similar to what we do in the academic settings, where people would try to say, we're going to have you write the essay on paper so that you're not able to use ChatGPT to help you with the essay. When really the right way of approaching it is seeing what people can do when using all of those tools. If we tell them, hey, use all of these phenomenal code gen tools and record your screen in building a product to see what you're able to do over the course of an hour, that's a far better predictor of this person's ability to actually deliver impact than it is to say don't use the tools at all. And so I think that's one shift that we're going to see and will likely frame the relevancy of a lot of these AI cheating tools over the coming few years. Can someone fool you by using an AI cheating tool or do you feel you more or less always know?

Starting point is 00:44:00 I think that there were cases where people could fool us, but now we're quite good at figuring it out. We're quite good at figuring it out and also moving towards assessments where we almost encourage it, right, and are comfortable with the fact that they're using these tools, because we want to see what they're able to do with them. So you were a Thiel Fellow, right? And you dropped out. How could they improve their methods? Well, this is something we've talked with them a lot about because the Thiel Fellowship is constrained by that exact matching problem

Starting point is 00:44:30 that we were talking about earlier, where they can only consider and interview a fraction of a percent of the people in the world that they think would be a good fit for the fellowship. And so we've worked with them on building out AI interviews that are able to better assess Thiel Fellows and using models to analyze the transcripts of those recordings to see what are the signals to better select Thiel Fellows

Starting point is 00:44:55 and all of that, which I find very interesting. But isn't it part of their strength? Say, Peter, he's quite controversial, politically and otherwise. Being a Thiel Fellow has a certain brand that's distinct from anything political, but it's a very particular thing. Not everyone wants to do it. Doesn't it work well because it's an extremely local market and you get people with a certain

Starting point is 00:45:17 kind of orneriness and selecting from that pool just goes pretty well and you don't want to be in the bigger pool of people? Maybe. I think that you're right in the element that referrals are very important, right? Oftentimes great people know great people and so they'll always need to leverage referrals. But at the same time, I think they rightfully care a lot about people that think unconventionally and come from unconventional backgrounds, people from every part of the world that might otherwise not get a meeting with a venture capitalist or some of these more traditional institutions. And so ensuring that they're able to consider those

Starting point is 00:45:53 candidates and to give them this opportunity and incorporate them into the fellowship is incredibly important and part of the mission. Could it be scaled 10x? Absolutely. 100x? I think so. 100,000x? Well, I think it sort of ties to what we were talking about earlier of the elasticity of demand for better investors, right? Because in some ways, hiring has so much overlap with investing of, imagine if we could have Peter interview everyone in the world when they're 18 or 20 or whatever the age is and make a decision around whether he wants to give them 100K check.

Starting point is 00:46:28 That would probably be very powerful with respect to economic mobility and how many companies we're able to create. And so I think that will happen. And it's just a matter of time of building the right technology and the right focus to enable it. But is the following possible? Let's say Peter is just a tremendous interviewer. That's easy to believe.

Starting point is 00:46:49 But he's really a great interviewer for the subset of people attracted to him. And if you just put him out in the broader pool, who's going to be a lifeguard at the swimming pool or something? Maybe he's just not that good an interviewer for that. I agree with that. I think that's certainly the case. And so imagine if you had a panel of domain experts

Starting point is 00:47:09 across every industry that were able to perform these interviews. Because certainly the best models will be better than any single best individual, but I would expect that the aggregate sum of all experts in each domain will likely remain better than the models for a long time. Now you dropped out of school, now you're doing the company. Obviously, you're very busy. But imagine as an act of magic you could have a free year, just inserted between today and tomorrow,

Starting point is 00:47:37 and you come back and nothing has changed, to go off and do anything you want with literature, with art, with travel, with music, with climbing the Alps, I don't know, what would you do with the year? That's a fascinating question. Can it be AI related? No. Cannot be company related. Let's see. I would love to travel. I think that sounds like it would be a lot of fun, because as you can imagine, in running the company, I've worked 100 hours a week for the last three years. And I love doing it and I'll continue doing that. But I do think that seeing the world

Starting point is 00:48:12 and getting more of this understanding of how do perspectives vary by country and geography, how are people thinking about AI differently elsewhere is really interesting. I really like that. Remember, after ChatGPT came out, Sam did this world tour of going to all the different places, seeing what they thought about AI, how they viewed it impacting their world. And I think that global perspective is incredibly valuable and formative. Where do you want to go the most? I want to go to Japan a lot. I've never been to Japan, so I'll have to make it out there. That's probably my top pick. It's a great visit. One thing I found, since I have traveled a lot, obviously I'm older and in some ways less busy than you are, that it helps me interview quite a bit

Starting point is 00:48:57 because people more and more come from all over. And it's like if your model has the poetic taste of different eras, of John Milton, Wordsworth, Shakespeare, whatever, traveling is an individual's version to get some version of that. Yeah. So say you hire a lot of people from India, I suspect you do. It's a populous country, a lot in the Bay Area. Going to India then becomes very important because you get a better sense of just where they're coming from. Yeah, I completely agree. I think also being able to connect with those individuals very quickly around, hey, I've been to this place, and I'm very familiar with India and all these different things is really helpful in building relationships and setting up

Starting point is 00:49:39 trust across all the different people that we work with and interact with. How did your eighth grade donut company go? One of my favorite topics. Well, so I could tell the story, which is I initially realized that Safeway donuts were selling for $5 a dozen. And my eighth grade mind was thinking, that is such a deal. I would pay $2 a donut, and I bet my friends would as well. And so I would bike down to Safeway and I would buy Safeway donuts for $5 a dozen,

Starting point is 00:50:09 go to my middle school, sell them for $2 each. Eventually, my middle school called me into the principal's office to shut me down because I was scaling up my operations. And then I moved my donut stand about 50 feet over off of school campus so they couldn't police me. I paid my mom $20 a week to drive me in her minivan to be able to bring more donuts to and from Safeway. She charged you $20.

Starting point is 00:50:31 She charged me $20, exactly. Was that underpriced or overpriced? I think it was about right. I anchored it on the cost of an Uber. And I was like, I'm not going to pay more than an Uber, but I need the car to wait long enough that I'm able to load up, you know, 10 or 20 dozen donuts. And so I did this.

Starting point is 00:50:47 I'd pay my friends in donuts because I perceived the cost of the donuts as, you know, my cost basis versus they perceived it as $2 each. And so I had a little bit of arbitrage in the salaries. I had competition pop up where they would sell Chuck's Donuts, which are higher end donuts, but they had a $1 cost basis. And so I dropped my prices to $1 for two weeks to drive them out of business before I had learned anything about anti-competitive laws. And so those were just a few of the stories from my eighth grade. Donut Dynasty is what we called it.

Starting point is 00:51:19 Other than just intelligence, what makes a person good at extemporaneous speaking? And you won awards for this, right? I did. Well, actually, I won awards for it, but I wasn't nearly as good as my co-founders. So in high school, we all did speech and debate together. So you knew each other from high school? Like age 14, right? Age 14, exactly.

Starting point is 00:51:37 And we were on the policy debate team together. We also did national extemporaneous speaking. And they were the winningest speech and debate team of all time in policy debate, the most competitive event, where they won the Tournament of Champions, the National Speech and Debate Association, and NDCA, the three largest national tournaments, which no other team has ever done. And I did okay, but I'm dyslexic. And so I would always, you know, stumble over words or mix things up and wasn't quite the same level as them.

Starting point is 00:52:06 But I think there's a few things that go into the answer of what makes someone phenomenal at it. I think that high clarity of thought often correlates very strongly with, you know, people that speak very well. And so, as you mentioned, intelligence plays another role. I think a second thing is confidence, someone that's willing to speak and improve and iterate on it, because oftentimes it's just doing more of that activity that allows you to improve on it. And then maybe a third one is more than just intelligence. It's also the speed of thought. And I think about those as different dimensions.

Starting point is 00:52:43 There are certain people I think of as having very high aptitude, but thinking very deeply and slowly about a given thing. And other people that I think of as having, you know, reasonably high aptitude or medium aptitude, but being able to like be quick on their feet. And so I definitely think there's some innate element of that. And which are you? I tend to think I'm more of the slower, deeper thinking bucket,

Starting point is 00:53:06 but it depends a little bit on how much coffee I've had. So you started the company, so you were 19. Yeah. Why is it, there's a positive statistical correlation between being dyslexic and entrepreneurship. And there is one in some published papers. What's the mechanism? It's shockingly strong, actually. I'm not sure exactly, but I find that one unique thing is that,

Starting point is 00:53:29 It feels like my brain works a little bit differently and that there are certain things that people are so much better at than I am where, you know, they're reading through evidence in a debate round very quickly and I could never do that. But there are certain ideas or ways of approaching a problem that are just different that enable more creativity, potentially being unconventional in doing so. And I think that that is one advantage I've had. And one of our early investors actually, Scott Sandell is dyslexic and backed a lot of dyslexic entrepreneurs. And so we've talked about this a little bit. One of my hypotheses is that quite early on, you have to learn how to delegate. And that's a skill that when people are not forced to learn, often very competent people,

Starting point is 00:54:13 don't become good at it until much later, but the dyslexic person is good at it right away. Totally. Yeah. Asking people to help read something for them. That's right. Could you please do this for me? That certainly could be the case. And focusing on the bigger picture in some useful ways, at least for being a founder, not good for every job, of course. Totally. But I think one thing I really came to appreciate, especially during high school, is that there are certain things that some people are phenomenal at and others are horrible at.

Starting point is 00:54:40 Like, I felt areas of debate and reading through evidence quickly where I felt extremely unintelligent and it was super humbling. And so much of finding success in your career is just understanding, like, what are your strengths and how do you leverage those? And much less about what are your weaknesses. And so that's something that I've sort of taken with how I approach Mercor, but also how I encourage our employees to think about their roles within the business, of what are the things where they have these comparative advantages and phenomenal strengths,

Starting point is 00:55:14 and how do they leverage those most effectively? How much do you feel you're in touch with the general culture of intelligent 22-year-old men in the United States? Or are you just so in the company? You have no idea what's going on. I'm so in the company. I don't know. I think that I obviously was in college for a couple of years before I dropped out. And so I had some people around me that so much of our company is 22 plus or minus a couple of years. So I guess I have that heuristic. But I certainly don't think that I have spent as much time with people my age as if I had stayed in school as another comparison. This is not a question about you because we don't ask personal questions. But a good tech friend of mine, you've probably heard of him. He says to men in that age bracket 22, 23, that there truly is a dating crisis, that something has gone wrong. Not about you, but just America in general. They're very smart,

Starting point is 00:56:08 possibly nerdy person in that age group. Is there a dating crisis? I think certainly in San Francisco, not in New York, but certainly in San Francisco. And you think it's just gender imbalance or the country screwed up more generally? I haven't thought too much about this. I think it's probably a gender imbalance in San Francisco, especially in certain industries. But I think that dating apps are probably generally in society, I don't use dating apps, but are generally in society helping to drive a lot more efficiency in solving this matching problem. So you're pro dating app. Most of the people I know are against them. No, I think they're good. I'm very much a proponent of better technology to solve these matching problems and enable people to be happy in their lives. Your last name is Foody.

Starting point is 00:56:51 Should I believe in nominative determinism? Are you a foodie? I certainly love fine dining and good food. It's funny. My dad always loved cooking growing up and was certainly much more of a foodie than I am, but a little bit of it rubbed off on me. And so while I'm not as much into cooking, I love eating good food. Where in San Francisco should people eat?

Starting point is 00:57:13 Or nearby? Lots of good restaurants. I think there are sort of the everyday restaurants that I think are very good, and then higher end. For everyday, I love Mexican food. So El Metate is a great Mexican restaurant. I also, for higher end food, I like Cotogna and Californios, Quince, lots of good restaurants like that.

Starting point is 00:57:34 At the meta level, what's the thing people should know about eating out here? Like where I live, I would just say, you need to know to go to the suburbs. May or may not be true here, but here, what do they need to know other than particular names of restaurants? I find that Beli, the app for food ratings, is really accurate in San Francisco because there's a high density of users. And so if you use Beli as your guide, you'll generally find good spots. Why is the company called Mercor? Mercor means marketplace in Latin. And we want to build the largest marketplace in the world. So we named it Mercor. We're from Mercatus. Do you know what Mercatus means in Latin? It's a variant.

Starting point is 00:58:10 Yeah. It means market. Okay. There you go. So yeah, we're from the same named institution in that sense. Well, it's funny. In high school, my co-founders and I went to a Jesuit school, and my co-founder, Surya, studied Latin, and so we've always certainly thought a lot about Latin roots and Latin words. Your family wasn't Catholic, I believe, right? That's correct. Did going to a Jesuit school help you think, or what did that add to the mix? Well, none of the three of us were Catholic, despite going to Catholic school, which was a little bit funny.

Starting point is 00:58:40 But one interesting story is that my mom was concerned about whether I would start selling drugs when I was doing my donut stand in eighth grade because, you know, it's an easy step. And so I like to think that, you know, Catholic school helped instill good values in what I should care about and being very focused on school at the time, on speech and debate, on building companies. And so very grateful for that education. Last two questions. First, what's the next goal you have for the company? The next goal for the company is really in scaling up a lot of these super realistic evals that I've talked about, of how do we measure the ways that models use all sorts of different

Starting point is 00:59:21 tools on trajectories that would take someone days or weeks to do is a big focus for us. And especially how that impacts enterprise, right? Where I think that so far for the last two years, people have been very focused on the idea of intelligence rather than the idea of models being useful, and bridging the gap between what do enterprises actually want to use, how do we measure that, and how do we get those capabilities into models is, to me, the most exciting thing that I could work on. And what do you want to learn next? Work related or otherwise?

Starting point is 00:59:54 That's an interesting question. I feel like Mercor is at the intersection of labor markets and AI research. And we grew up with the DNA in labor markets of thinking all about how do we aggregate all these people on our platform, how do we match them? We hired people that are deep domain experts in labor markets like Sundeep Jain, who was the chief product officer, chief technology officer at Uber. But I am most fascinated by all of the advancements in AI research of how do we apply human talent and human labor to all of these problems at the frontier in more efficient ways to train models and what are the specific rubrics or data types that are driving the most model improvement. And so I've been most interested in how to learn that. Brendan Foody, thank you very much. Thank you so much for having me, Tyler. Thanks for listening to Conversations with Tyler.

Starting point is 01:00:52 You can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. If you like this podcast, please consider giving us a rating and leaving a review. This helps other listeners find the show. On Twitter, I'm @tylercowen, and the show is @cowenconvos. Until next time, please keep listening and learning.
