The Peterman Pod - Ex-Citadel Quant and AI Researcher On Breaking In, Tech vs Finance Careers

Starting point is 00:00:00 90% of the battle in research is actually finding the right problems. This is Nimit Sohoni, Stanford PhD, AI researcher, and previously a quant at Citadel, and I asked him about both AI research and quant careers. Do you have any advice for someone who wants to move into AI research? If you want to sort of switch from a SWE track to AI track, there's got to be something behind it, right? We compared and contrasted these roles, which yielded some surprising insights. Yeah, I think, you know, quant actually probably has a better work-life balance than AI. Unlike in tech, you know, one thing I was surprised by, you know,

Starting point is 00:00:34 culturally is how tight-lipped people are in finance, even within a firm. We also went deep into what he's actually working on now. The main challenge of Transformers is that... Here's the full episode. When you think about the opportunities that are not available to you without a PhD, what comes to mind? Yeah. So I think, you know, there aren't really too many opportunities that are actually unavailable to people without a PhD, but some of them just get a lot easier with a PhD. I mean, academia is an obvious one that does require a PhD. That was never something I was super interested in for a lot of reasons.

Starting point is 00:01:20 But I think some roles that a PhD definitely opens a lot of doors to are kind of the two that I've had experience with. One is, you know, doing industry research in AI like I'm doing now. You know, back in the day, industry research in computer science or mathematics was a little bit more diverse, but more and more people are converging towards AI. So I'll just say AI research is one of them where having a PhD helps. Not that you have to, you know, a lot of people do AI research without a PhD, but the type and shape of the role can look kind of different. And another one

Starting point is 00:02:02 is quantitative finance. So again, a lot of people go into quant, you know, out of undergrad, but certainly having a PhD opens you up to some different opportunities, and it can be a lot easier to get your foot in the door there. So if we think concretely, let's say I was going for an AI researcher role or something like that, are you saying the PhD helps you in that first step of filtering, or does it help somewhere else in the process in getting one of those roles? Yeah. So I think it's both. So certainly it's a lot easier to get an interview

Starting point is 00:02:37 if you've differentiated yourself from the pack in some way. You know, just applying for an AI research role at a top firm can be difficult if you don't have, you know, whatever the right schools, quote-unquote, on your resume, the right internships, the right connections, whatever. But it's certainly doable. And then I guess, less transactionally, doing the PhD can develop a critical skill set that can help you

Starting point is 00:03:10 along your path towards becoming a great AI researcher. But of course, you know, there is an argument for being thrown into the fire as well and just kind of learning on the job. And that's certainly an option that works for many people. I think, you know, there are some things that are harder to do in industry than in academia, like the more exploratory, first-principles, fundamental research without necessarily a direct application. You know, industry definitely skews a lot heavier towards the applied side of things. But I think having that fundamental background can be very valuable depending on what kind of research you're targeting. You mentioned the type and the shape

Starting point is 00:03:54 of the role could be different if you had a PhD versus not. Could you give an example of what you mean? If you're working on more engineering-heavy stuff in AI, so, you know, building training or evaluation infrastructure, working on data processing, things like that, those are not things a PhD is really necessary for at all. I'd say if you want to do more pie-in-the-sky type stuff, like architecture design, things like that, a PhD can be helpful there because you have more time to kind of explore directions that may not pay off in the short term. But, you know, again, there are examples of people being successful with or without a PhD in both domains. So I think if your only goal is to be an AI researcher and you're not super tied to the particular type of work you do, you just really want to get into the field,

Starting point is 00:04:56 a PhD is definitely not necessary. But I think if you're still in the sort of exploration phase of your career and you want to find a problem that really draws your interest, then a PhD can be a good way to do that. You mentioned the PhD skill set, or something that you kind of develop when you get a PhD.

Starting point is 00:05:22 What is that skill set? I would say like 90% of the battle in research is actually finding the right problems. So, you know, you have to find a problem that is interesting, that's meaningful, that people are actually going to care about if you solve it. You have to sometimes convince people it's interesting, because they might not have thought about it the same way. And then you have to execute.

Starting point is 00:05:44 And, you know, you need to make sure it's a problem that's appropriately scoped, that is actually tractable for you to make progress on. So, you know, I think all of those things were not skills that I had developed. You know, execution was more my strength. And so definitely that was a big learning process for me during the PhD, that sort of research taste and problem selection. And this is something that just being immersed in the field really helps with. You know, once you've read enough papers, talked to enough people, you kind of get a sense of the patterns and trends that are going on in the field. You mentioned the research taste and finding the right problems. If you could kind of condense what you learned in your PhD, are there maybe some top tips that kind of led to you finding the right problems? I would say the main things that I find useful are just, you know,

Starting point is 00:06:45 keeping abreast of the current literature, just reading as many papers as you can. And it doesn't have to be reading them end to end, just skimming abstracts, you know, seeing what's going on. What are people thinking about? And, yeah, I think the other thing is just working your way up. So, you know, earlier in your career, you want to attack very small sub-problems that you're reasonably likely to make progress on, right? So one example of this can be, you take a method and you try to

Starting point is 00:07:18 extend it to some, you know, special case or something slightly different from the original application. And then as you go on and as you mature as a researcher, you can start tackling bigger and bigger problems. So, you know, not just kind of extending previous work, but maybe coming up with totally new ideas, things like that. So I think there is, you know, a gradual stage of maturation as a researcher. And I think some people do try to skip those steps. And I think that's generally inadvisable. You mentioned keeping abreast of the literature.

Starting point is 00:07:52 What's the go-to spot for you to kind of get your feed of the hot research papers to read? Honestly, Twitter is probably the main way that I keep up with papers. I think if you follow enough good people on X, I guess, your feed becomes pretty curated to that. So that's usually the first way I find out about stuff. Also, obviously, you know, just talking to people, co-workers and so on. But yeah, I think X is my go-to. So I try to curate my feed in such a way that it's mostly, yeah, machine learning papers and pictures of cute animals. Do you have a good starting point for someone who just wants to plug in?

Starting point is 00:08:34 I pretty much, you know, started with following people I knew from Stanford and elsewhere, you know, professors whose papers I'd read, prominent people at big labs and so on. And then anytime they tweet a paper, they like a paper or something like that, if it's interesting to me, I just click on that and I follow all the people tagged in or associated with that work. And so I sort of grew my follow list organically via that. I understand after your PhD, you became a quantitative researcher at Citadel. Where were you in your career and why did you decide to become a quant? Yeah, I joined Citadel Securities after graduating from my PhD. And so, yeah, basically, I had actually interned there right before, the summer right

Starting point is 00:09:24 before graduating. And I liked it a lot. The reason I decided to intern was just that I wanted to see what else was out there. I had a few friends who had interned at Citadel Securities, or at other quant firms, and enjoyed it. And I was just kind of curious. You know, by that point I'd been working in AI research for like four to five years. And yeah, like I said, I was generally, you know, when I entered my PhD, I was interested in careers in which I could apply my

Starting point is 00:09:57 interests in mathematics and computation. And so I think the three major careers of that form at the time were, you know, machine learning research, which I already had experience with, you know, quantitative finance. And then the last one, maybe quantum computing, but it was a much smaller sort of domain and one I had no experience with, although that one's also kind of blowing up these days. So yeah, quant finance, I was just kind of curious and I'd heard good things. And so I decided to intern and I ended up liking it a lot. I think it was, you know, refreshing in some ways. Like I said, the PhD is a grind. You know, you can burn out at various points and it was kind of a fresh, fresh set of problems, totally different environment.

Starting point is 00:10:46 Well, it's funny that you say that about the grind of the PhD, that you kind of took a break to become a quant at Citadel, because I've heard that the work culture is pretty intense at these finance companies. Is that the case? Yeah, so that is the reputation, but I think that definitely varies a lot based on the team you're on, the firm you're at. I personally had a pretty great work-life balance, as funny as that might sound, as a quant. You know, I think one reason is that traders typically will work, you know, the trading hours of whatever locale they're in. Of course there are markets all over the world, you know, APAC, Europe and so on. But if you're in the U.S., U.S. traders are typically working around U.S. trading hours and, of course, a little bit before and after just to prepare

Starting point is 00:11:39 and stuff. And so I think that generally has a sort of trickle-down effect on the culture where, you know, most people are just kind of clustered working around trading hours and then don't take their work home all too much. So, you know, even though the work can be done at any time, I think that is sort of how the office culture operates. Yeah, when it comes to quantitative finance or quant work, how would you describe the work? It really depends a lot on the team you're on, the sector you focus on, whether you're at a hedge fund or a market maker, whether you're like front office or back office,

Starting point is 00:12:19 quant, and, you know, of course, the company. And so, you know, some quants will spend all their time just like on alpha generation. So, you know, generating new, you know, trading strategies and, you know, back testing them and so on and putting them into practice and monetizing them. You know, I mean, well, some people will focus just on the alpha, some people will focus on the monetization. Or you can be, you know, like a risk quant. So you're

Starting point is 00:12:44 basically not necessarily generating strategies at all, but just, you know, trying to come up with metrics to capture risk and reduce that risk without cutting into profits. You might be doing data analysis. So you might have a ton of historical trade data and stuff, and you're analyzing it in various ways and so on. So, yeah, like I said, hedge fund versus market making, they're actually very different problems. So I think the thing that unifies all of them is really, you know, having a strong math background.

Starting point is 00:13:25 Like in the day to day, let's say, you know, your project that you're currently working on. Like, what would that look like? What would the shape of that problem look like? And how does math concretely play a role? You know, depending on what sector you're in, you know, there's a lot of different math that'll come to play. I mean, you know, the sort of the backbone of finance is stochastic calculus. So I think that comes up almost everywhere. But then there are other things like, you know, numerics, numerical optimization, numerical interpolation, things like that. You know,

Starting point is 00:13:56 machine learning, of course. You know, now more and more firms are getting really deep into the deep learning space, even establishing their own research arms that do LLM-type research and stuff like that. Yeah, numerical linear algebra. Yeah, so there's a lot of different math. And it's actually very diverse in terms of what, you know,

Starting point is 00:14:23 people are always trying to come up with ways to apply, like, different fields of math to quant. I think some of it is just for, for fun, kind of, you know, because, you know, quants are such a mathy, intellectual bunch.

Starting point is 00:14:37 But there is actually a lot that underpins the entire field. So, yeah, I mean, I think stochastic calculus is probably the most unifying part. Like, that's kind of the, you know, finance 101 type math. So you mentioned coding a lot as a quant, and I have had some friends who were SWEs at Citadel and these various companies. I understand the roles are quite different. How do quants and SWEs typically collaborate at these companies? It really depends a lot on the company.

Starting point is 00:15:10 You know, at some companies like Jane Street, for instance, the number of people who are called quants is actually very small. And, you know, traders themselves are quite technical and implement a lot of stuff. And then, of course, they have software engineers as well. Whereas Citadel, I think, is a more quant-forward firm. So I think, you know, quants might be, if not the largest percentage of employees, about equal in terms of the technical staff. And so, yeah, I think there can be a lot of overlap in what a quantitative researcher and what a software engineer does. And also, you know, between a quant and a trader. So, yeah, it kind of just depends. At some firms, I think it's more divorced, where quants are really doing the strategy work, and then it's kind of handed off to software engineers to implement. But at other firms, I think you might do a bit of

Starting point is 00:16:05 both because, you know, of course, if they have the implementation skills, the person best positioned to actually implement something is the person who understands all the, you know, reasons and edge cases and things like that. And so, yeah, like I said, I did a ton of coding, mostly in C++, also some Python. If you were to compare and contrast finance and tech generally across these roles, what comes to mind? So I think a lot of the skill set, first of all, is actually quite similar. Yeah, like I said, you know, math and computer science were my main interests,

Starting point is 00:16:46 and I wanted a job that would leverage both of them. And I think that's been the case in quant, and that's also been the case in the AI research that I've done. And so, you know, I knew nothing about finance before I joined Citadel Securities, but I read a few textbooks that were recommended by people, and that was really all I needed. And yeah, from there, I just drew upon my sort of technical skills. And I think AI research is a lot the same way.

Starting point is 00:17:17 I think if you have really strong fundamentals, you can pick up, you know, pick up the rest. So, yeah, in terms of technical skills, I don't think it was, you know, really a rough transition, either going, you know, going either way. I think, you know, obviously the, you know, culture is different, you know, SF versus New York, those kind of things. Yeah, work hours, I would say, yeah, I think, you know, Quant actually probably has a better work-life balance than AI. You know, particularly because of the level of competition in AI right now, like, you know, it's just a very competitive space. And so one of the ways you can gain a comparative advantage is just, like,

Starting point is 00:17:58 by outworking your competition, and that kind of is, you know, what happens in practice at a lot of places. I know a lot of people who are just working around the clock. I've heard insane stories about the comp structures in quantitative finance firms. Is that all true? Like, is it heavily bonus-weighted? And I've also heard stuff about garden leave. So, yeah, in terms of comp, I think one thing is, you know, there aren't really standardized levels like there are in tech. You know, it's not like someone is just an IC5 and you kind of know what pay band they're in, what they're making. I think comp is really driven by a few things: how the company does that year, how your team does that year, and, if you are really on the alpha side of things, how your particular strategies did that year. And then, of course, there are other things that play into it, like seniority, both in terms of hierarchy, if you're at one of the firms that does have a kind of explicit hierarchy,

Starting point is 00:19:03 or in terms of, you know, just tenure at the firm or years of experience, things like that. And so, yeah, I think quant firms are more secretive, and, you know, partly because of the relative lack of standardization, it is kind of opaque in terms of how those factors actually combine for your final comp. But, yeah, I think it can be very bonus-driven if you're really on the alpha side of things. And that attracts some people, where they really just want to be as exposed as possible to, I guess, the fruits of their labor. But, you know, the downside is it can be a much riskier business as well. So yeah, it's just more variable. But like if you're,

Starting point is 00:19:48 you know, more of a back office type thing, I think the comp is probably a little bit more deterministic if you're not directly tied to alpha generation. Yeah, it's interesting. I mean, because we were talking about AI research versus quants. And obviously being a quant is famous for earning a lot. If you have generated a lot of alpha, I hear compensation is easily in the millions for a lot of these people. But at the same time, AI research has also popped off too. You know, if you're the top 1% of either of these firms,

Starting point is 00:20:21 you're going to do very well. Yeah, I mean, they're kind of crazy. These things really do exist where people are making like NBA player salaries and stuff. I think, you know, for the median case, it's still very good, but it's not exactly that outlandish. Yeah, sorry, you also mentioned stuff about NDAs and garden leave. Or sorry, non-competes, I guess. So, yeah, I think, you know, finance firms are very serious about this

Starting point is 00:20:55 sort of thing. You know, unlike in tech, one thing I was surprised by culturally is how tight-lipped people are in finance. Even within a firm, you know, there are things that you can and cannot share across teams, or people might just want to be more secretive because they're protective of their alphas. If you know kind of what they're doing, you can sort of re-implement a similar thing and take over some of their alpha, right? Because what makes it alpha is that it's secret. The more people who know about it, the less profitable it's going to be for any individual. And so, yeah, I think it's quite secretive. You know, even the firms that

Starting point is 00:21:39 have a reputation for being more open are actually quite secretive, versus in tech, you know, people talk about things all the time. And so it was a bit jarring for me, returning to tech and hearing people talk about what they're doing in a very open way. I was like, wow, you're just going to tell me that for free? So yeah, the non-compete is probably the most notorious part of this. Yes, a lot of firms will have a clause in the contract you sign at the beginning stating that you cannot work for a competitor for a period of time after you leave the firm. And this period of time is typically decided by the company when you leave, but it can be anywhere from, well, it can be

Starting point is 00:22:20 zero, up to like two years. I think I've heard even up to three years for some places, but I would say the norm is, you know, six months to two years. And so, yeah, during this period you're basically just paid to not work. It's called garden leave because, I guess, you sit at home and garden or whatever. And it's actually quite an interesting thing. You know, it creates interesting incentives for some people, because you are typically compensated quite well during this garden leave period. So it's not necessarily a downside for some people. And yeah, basically the idea is, you know, you won't leak ideas to your competitors, and by the time your garden

Starting point is 00:23:05 leave is over, you know, if you have some special alphas or trading strategies, two years down the line they're probably not even relevant anymore. So it doesn't even matter. You mentioned the secrecy within quantitative finance. And I see a natural incentive here to kind of be hostile, or I guess competitive, within the firm because, yeah, my alpha is my alpha. I'm not going to help you. Did you ever feel that or see stories of that? Yeah, no, it's definitely a thing. You know, I think a lot of people are reluctant or even forbidden to talk about any details, basically, of what they do. You know, some people won't even say what

Starting point is 00:23:53 sector, you know, they work on, you know, at least across companies and stuff like that. So, yeah, I think that's definitely a thing, you know, some firms are set up where it's like basically pods. So, you know, one pod is just responsible for basically all of their P&L and then the firm takes a cut. And so, you know, different pods might be working on, you know, very similar things unknowingly, right, but they're not sharing any of the information. And there is some logic behind this because the idea is you want to have uncorrelated, you know, uncorrelated returns. So if all the pods are like, you know, talking to each other sharing ideas, you know, chances are they're going to start doing very similar things. And then, you know, that exposes you to risk where, you know,

Starting point is 00:24:38 what if the thing you're doing is actually wrong, and, you know, you can wipe out not just one pod, but an entire team of them. Whereas if people are working independently, then, you know, that's less of a risk. Earlier, you mentioned the top 1% of AI researchers and quants are going to do extraordinarily well. And I'm curious, what sets the top 1% of AI researchers and quants apart from the rest? There are a lot of things. I think there are different ways to get to that point as well. It can be raw technical skill.

Starting point is 00:25:10 Some people are just really, really good at what they do, able to, you know, the prototypical like 10x engineer, that kind of thing. And they just have, you know, a better, a higher level of intuition and or execution speed, stuff like that. You know, of course there's politics involved. Like, you know, people who are better at playing the political game can, you know, rise up in the ranks.

Starting point is 00:25:35 I think in quant, you know, one thing is that it's harder to game the system, because there are kind of hard metrics, so it's easier to evaluate how someone does. Especially, you know, if you're an alpha quant, it's quite clear, right? Like, if you implement a strategy and you make the firm a ton of money, that's obviously going to be recognized. I think, you know, in AI, it can be a little bit harder, but of course, I mean, the analog might be like you publish a seminal paper, like you make a true breakthrough in the field,

Starting point is 00:26:06 you make the models much better, yeah, that sort of thing. So, yeah, I guess it's probably similar to, you know, other domains. It's just a combination of skill and, you know, playing the game. And I think being in the right place at the right time has a lot to do with it. You know, both in quant, in terms of seeing something before other people do and then capitalizing on market trends and turning that into a profit,

Starting point is 00:26:35 or in AI, you know, like having the right idea at the right time. When it comes to quant firms, I'm kind of curious, there's all these tier lists out there. What are the top firms and why? RenTech, like we talked about, is one of the sort of mythical firms in the space.

Starting point is 00:26:51 You know, you can't really argue with their returns, the historical returns over like a, you know, 20-year, 30-year period. It's pretty insane. So, you know, RenTech is maybe the gold standard, depending on who you ask. Then there are other firms, some of the slightly bigger ones like Jane Street, Citadel, Jump Trading, Hudson River. I think those are generally very well-regarded firms. And, you know, having that kind of thing on your resume can definitely be an asset for future quant roles and things like that.

Starting point is 00:27:24 So, yeah, very good firms, I think, you know, great technical talent, great returns, obviously. And then there are some elite smaller ones, you know, similar to RenTech, like smaller, more secretive, less well known, but still very excellent returns. So, yeah, TGS is one. It's in Southern California. XTX is another one, one of the newer firms. I think Radix is another newer firm that's in that boat. So, yeah. Are there any stories from you working in the space that you think might be interesting? You know, finance firms do not mess around. So you hear stories about, you know, people just doing dumb things.

Starting point is 00:28:08 Like, you know, traders or quants having an internal WhatsApp group where they talk about strategies. So as a quant you have trading restrictions. You have to get all trades pre-approved. And so, of course, if you're someone who works in equities or something, you probably are not going to be able to trade those tickers at all. But, you know, people try to get around it with their little WhatsApp groups or whatever, like telling their friends to buy these stocks or something, split the profits or whatever. If that happens and you get found out, you know, they're going to go after you. You'll get fired. Obviously, there'll be lawsuits. You can even go to jail. Yeah, there's like a few stories about this, because it is against the law. And so, yeah, I've heard more stories about this. That's one of the things they tell you about in training, actually,

Starting point is 00:28:59 is like, yeah, do not do this. Similar with non-competes, you know, people going to competitors or starting their own thing or something and getting accused of taking strategies and stuff like that. All these firms have elite legal teams, and yeah, it's just not something you want to mess with. I've heard that in quantitative finance

Starting point is 00:29:20 that it's kind of intense sometimes or rather people may get fired very often. Did you ever have, like, you just were working with someone that kind of disappeared? Yeah. Yeah. I mean, yes, that does happen.

Starting point is 00:29:37 Yeah, I think it's interesting in quant because, like I said, yeah, comp is a function of many things among which is seniority. And so I think your job security can actually be kind of U-shaped because senior quants, like, you know, even if they're very good, they just get very expensive after a while because that's sort of what the market rate is for senior quants.

Starting point is 00:29:58 And so even a good quant, you know, can stop being worth it after a while. Whereas early career quants, you know, they might be very good and also not command as high of a salary. So, yeah, the job security shape is kind of like an inverse parabola almost. And yeah, I mean, people certainly get fired. Quant in general, I think, has a culture of, you know, well, one is like up or out, and two is like, you know, just trimming the low performers. And again, I think this can become especially easy if you're more on the alpha side.

Starting point is 00:30:35 Like if you're just not making money, it can be pretty clear. But in general, even for, you know, engineers, I think there is this kind of culture. For traders, of course, since they're making trades and stuff, again, it's very easy to monitor. So I think that can be even more brutal. Why did you leave Citadel to join Cartesia? When I joined Citadel, it was partly because I was just interested in learning about a new problem domain and, you know, learning some new stuff. You know, learning about finance in general, I think, was also kind of interesting to me.

Starting point is 00:31:09 And yeah, I think, you know, I became more financially literate as a result of things and stuff like that. Like it was, it was a great learning experience for me. And I was kind of optimizing for like growth potential partially as well. But yeah, I mean, by that point, you know, I'd been at Citadel for, you know, a couple years and was, yeah, I think, you know, like, it's sort of like your growth, you know, at most places will kind of like accelerate for a little and then like sort of taper off. But I think there was still a lot more to be learned had I decided to continue on that path. But I, you know, I saw what was going on in the field of AI. You know, when I graduated actually, it was right before ChatGPT came out. And so I think a lot had changed even since I joined Citadel. And I heard that, you know, the founders of Cartesia were starting this company. And for context, I knew all of them from my PhD at Stanford. They were actually all in Chris Ré's lab with me. I knew Albert pretty well. And so, yeah, I had, you know, tons of respect for them. They're, you know, great researchers. I worked, you know, pretty closely with some of them. You know,

Starting point is 00:32:13 Albert was a good friend of mine, knew the other guys. And so it just seemed like a great opportunity and a great time to get back into the field of AI when things were sort of taking off. And I thought it would be great in terms of, you know, personal and technical growth. Also, the opportunity to join a small startup was definitely something I was interested in, to kind of like shape the company and the culture, you know, as one of the earlier employees. And so, yeah, it was really, yeah, I think, yeah, it was all about growth, getting back into AI. And I think, you know, there is definitely a different risk profile. I think when I graduated from my PhD, I was kind of more like risk averse. You know, quant was like a stable, you know, lucrative opportunity

Starting point is 00:33:02 that, you know, was the right choice for me at that time. Now that I had sort of established myself a little bit, gotten some of that stability, I thought it was, you know, an opportune time to take a risk. On Cartesia, I guess, could you give us some context on the primary problem the company is solving, and just like what the company's about? Yeah, we are a voice AI company. You know, our current mission is to build sort of the next generation of voice AI and a platform for that. So what that means is we do, you know, our flagship product is text-to-speech. We, you know, we also have products around speech-to-text, you know, voice agents and, yeah, stuff like that. And yeah, I think, you know, we believe voice AI is the future.

Starting point is 00:33:46 You know, it's already, it's actually one of the fastest growing areas of AI. You know, people are using voice AI in, you know, many applications, you know, call centers being one of the predominant ones, but also, you know, a bunch of applications in entertainment, you know, a bunch of like, you know, companions, like a bunch of different things. And so that's, that's kind of the product set we're building. And yeah, in terms of, you know, why did we choose voice? So I think, you know, voice is actually a very interesting test bed for a lot of research ideas that we're exploring. So we're, you know, we also have like a sort of research arm of the company that focuses on kind of longer term research around, you know, around long context, around multimodality, things like continual learning and memory, you know, test time compute. And in general, like, you know, our sort of overall, even higher level goal is to build real-time, you know, systems that are truly intelligent and that you can, like,

Starting point is 00:34:48 interact with and that can learn from experience. And so I think, you know, building these, you know, voice agents, you know, speech-to-speech models and so on is, you know, it requires you to kind of, you know, solve some of these problems for the, you know, sort of eventual idea of like a kind of always-on personal assistant. When it comes to this voice AI space, who are the top competitors? So, um, our main competitor is a company called ElevenLabs. Um, they're, you know, another voice AI company basically. Um, and, uh, yeah, so they, I think, had about an 18-month head start on us. Uh, yeah, I actually used to play with ElevenLabs, you know, long before Cartesia was a thing, just to like kind of make, you know, fun videos and

Starting point is 00:35:33 and whatnot. And so, yeah, it's, you know, a very, very similar company. I think, you know, where Cartesia stands out, I think is, you know, we have sort of a focus on, you know, things like latency. So, you know, low latency is really important for a lot of voice AI applications, you know, for naturalness, you know, like the conversation we're having now, you know, you can't afford to have, you know, a second pause in between, like, each, uh, you know, each turn of the conversation, you know, that just really breaks the sort of illusion and, um, immersion. And so, you know, latency is really important for, uh, you know, a lot of our customers. Um, yeah, you know, we're continuing to try to push the boundary of

Starting point is 00:36:20 sequence modeling and stuff to get, you know, better and better quality, um, without compromising on latency. Um, and then, um, you know, going into like more end-to-end systems as well. So right now the way voice agents are typically implemented is you have a speech-to-text system that transcribes some text. Then you feed this into a language modeling backbone. And then you have a text-to-speech system that will take the text that is output by the language model and, you know, speak the result. But this has a lot of problems in terms of latency, again, and in terms of naturalness, because it's kind of not an end-to-end system. So there's a lot of, you know, loss in between each of these components and so on. And so, yeah, that's, you know, one thing that we're trying to build

Starting point is 00:37:02 towards. But even right now, I would say, you know, even if you just look at our text-to-speech products, I think, you know, we're definitely right up there as, you know, one of the leaders in the space. I think, yeah, you know, ElevenLabs wins on some languages. We win on some. You know, I would say we have like better voice cloning, things like that. So, yeah, we're trying to become number one in everything. But, yeah, I think, you know, like I said, voice AI is a very fast-growing space, and so a lot of people are jumping into the space, but I think the pie is very large. What does it look like if Cartesia completely destroys ElevenLabs? I think we already win in terms of things like latency and in terms of cost. I think

Starting point is 00:37:44 if we can conclusively win in terms of quality, not just for, you know, a subset of tasks, you know, and not just for a subset of things, like, but, you know, there are many things that people care about for text-to-speech quality. There is, you know, just adhering to the transcript. So actually reading what is put, you know, in front of the model, which, you know, can be surprisingly hard, especially if you have different languages, especially if you have, you know, special characters, you know, repetitions, whatever, you know, all models struggle with this. But there's also naturalness. Like, does it really sound like a person saying this or does it sound robotic? You know, of course, a lot of applications actually care about naturalness

Starting point is 00:38:20 even more than just transcript fidelity. And then, you know, of course, there's also speed and things like that. And then there are features like, you know, voice cloning, accent localization. So, you know, taking my voice and making it have a different accent, things like that. You know, controllability, you know, speed, emotion, things like that. And so, yeah, like I said, I think, you know, we have better quality in some areas, maybe worse in some others. You know, we'd like to get to number one in, you know, as many categories as possible, right?

Starting point is 00:38:54 And so I think that's sort of the thing, right? Like, you know, switching costs exist. Even in AI, I think, you know, depending on the size of the customer, like some customers are reluctant to switch over from, you know, one thing to the other. You know, obviously startups can be more nimble, but, you know, when you're talking about enterprise scale, you know, this matters. But like if you can, you know, conclusively show that you're better in every way, then I guess like at some point it becomes hard to argue for not switching. I imagine you could have worked at a big lab, OpenAI, Anthropic, etc. What's the main difference in working in an AI startup versus one of these big AI labs? Big labs have obviously amazing resources.

Starting point is 00:39:35 You know, they have all the compute in the world, you know, tons of researchers and so on. I think one thing is that like the flip side of that is that I think big labs can sometimes be more averse to sort of out-of-the-box ideas and a little bit more susceptible to groupthink or like sort of overarching trends in the field and like less willing to take a risk on something different. And then, you know, that makes sense, right? Because, you know, with sort of these, you know,

Starting point is 00:40:05 great resources, you know, there's a lot of cost to investigating new ideas that don't turn out well. Whereas at a startup, I think you're a bit more nimble. You're able to, you know, you're able to be a little bit more exploratory if you do it strategically and sort of challenge the orthodoxy in that way. And so that was one of the things, like I mentioned, that drew me to Cartesia. You know, Albert has, you know, a lot of interesting ideas that I think don't necessarily go with the accepted grain.

Starting point is 00:40:37 Like, you know, around the time Mamba came out, you know, a lot of people were of the opinion that like, you know, sequence modeling was kind of a solved problem and all you need is scale. Like you just take the transformer recipe and you just scale it further and further. Yeah, I mean, Albert showed that, you know, that's not necessarily the case, right? With Mamba, that you can actually get real advantages in terms of things like, you know, efficiency, computational efficiency, but also like even in terms of just like raw quality, you know, state space models can be advantageous for a lot of classes of problems, or like things like hybrid models where you take some state space model layers, some transformer layers, things like that. You know, another more recent work that we put out at Cartesia was this idea of H-Nets where, so, yeah, for context, the way that text modeling is usually done is you take, you know, raw text, you know, a sequence of characters or, you know, UTF-8 bytes or whatever. And then you compress it or, you know, you represent it as these things called tokens, which are basically like little pieces of words or subwords. And then you run modeling over that. So it's like a two-stage pipeline.
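
[Editor's sketch] To make the two-stage pipeline described above concrete, here is a minimal Python illustration of going from raw text to UTF-8 bytes versus going to subword tokens. The vocabulary and the greedy longest-match rule are assumptions made up for this example; production tokenizers typically use learned schemes such as byte-pair encoding, and this is not the tokenizer of any particular model.

```python
def to_bytes(text: str) -> list[int]:
    """Raw representation: the UTF-8 byte values of the text."""
    return list(text.encode("utf-8"))


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Stage one of the usual pipeline: split text into subword pieces.

    Greedy longest-match against a hypothetical vocabulary, falling back
    to single characters when nothing in the vocabulary matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, shrinking toward one character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


if __name__ == "__main__":
    vocab = {"token", "ization"}  # hypothetical toy vocabulary
    print(to_bytes("tokenization"))             # 12 byte values, one per character
    print(toy_tokenize("tokenization", vocab))  # two subword pieces
```

Stage two, the sequence model, would then run over the shorter token sequence rather than over the bytes.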

Starting point is 00:41:47 You know, we showed that if you actually just go from the raw, you know, raw characters and you kind of learn this tokenization, you learn how to like draw these boundaries in between groups of letters instead, you can actually get better performance. And so, yeah, that's the kind of thing. You know, I think challenging accepted ideas. Like, that's the kind of thing that appealed to me. For context, you mentioned state space models versus transformers. Could you just give a quick primer, I guess? Without going into too much, I guess, technical detail,

Starting point is 00:42:18 you know, basically the main challenge of transformers is that the memory that they use at inference time grows linearly with the sequence length. Because what they do is like, you know, they will take each token and store, you know, a representation of it in what's called the KV cache, you know, the key value cache. And so as you as your sequence grows longer and longer,

Starting point is 00:42:41 you're still storing all of this information in context in your memory. And so for very long sequences, you know, this can get prohibitive, both in terms of, you know, computational cost in terms of memory. SSMs are different because instead of storing everything into, you know, everything like in this uncompressed way, they take that information and they compress it. So the size of the state is fixed. And so as a result, like, you're, you know, the cost of doing a certain step doesn't change with the length of the sequence

Starting point is 00:43:12 and the amount of information you have to keep in memory does not grow with the sequence length. And so kind of an intuition, our co-founder, Albert Gu, has a great blog post on this, is that SSMs are kind of like a brain. You know, the human brain also does not

Starting point is 00:43:27 store an unbounded amount of context. You know, it takes in information and it processes it and it like, you know, keeps it in this fixed-sized state, which is, you know, which is our brain. Of course, you can simulate it having a, you know, unbounded state, via use of external tools, like, you know, writing stuff down and so on.

Starting point is 00:43:44 But the core, you know, the core primitive remains fixed. Whereas transformers are more like a database where you can kind of recall anything in the context. And so I think both of these approaches are complementary, right? And yeah, so, you know, we're currently exploring kind of, you know, extensions of that analogy. But yeah, I would say that's kind of the, you know, high-level thing. Is the sequence just the input? So the longer the prompt, the longer the sequence, and therefore more memory consumption at inference.

Starting point is 00:44:16 That's right. So the sequence is the prompt plus the response. So, you know, as the model is generating the response, you know, the context, you know, the context includes what has been generated so far, right? So you can refer back to what you yourself have said and, you'll figure out what is, you know, what is the next appropriate token to say. And so this can get obviously especially large for multi-turn conversations where now, the context includes like everything that has been said in the entire conversation up to that point.
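
[Editor's sketch] The memory argument above can be written as back-of-the-envelope arithmetic. The Python below assumes a simplified transformer that stores one key vector and one value vector per token, per layer, per key-value head, and a simplified state space model that keeps one fixed-size state per layer. All function names, parameter names, and configuration numbers are hypothetical and chosen only for illustration; real models differ in the details.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: keys and values for every token seen so far."""
    keys_and_values = 2  # one key vector and one value vector per token
    return keys_and_values * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers: int, d_inner: int, d_state: int,
                    bytes_per_elem: int = 2) -> int:
    """Approximate recurrent state size: fixed, independent of sequence length."""
    return n_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model configuration, 16-bit elements.
    for seq_len in (1_000, 10_000, 100_000):
        kv = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128)
        ssm = ssm_state_bytes(n_layers=32, d_inner=4096, d_state=16)
        print(f"{seq_len:>7} tokens: KV cache {kv / 1e6:.1f} MB, SSM state {ssm / 1e6:.1f} MB")
```

Because the sequence includes both the prompt and everything generated so far, the first quantity keeps growing over a multi-turn conversation while the second stays constant.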

Starting point is 00:44:44 And so, you know, beyond a point, you know, as I'm sure we've all had experience with, you know, if you're chatting with these language models, you know, sort of ceases to be, you know, that useful maybe after, you know, 100, you know, tens or something of turns. And, you know, it can be best to start a new conversation. But the challenge with that. And, you know, of course, companies are doing things to try and sort of address or bandaid this, you know, for instance, like chat GPT now. like, say, of some, like, global context in between conversations and stuff like that. But it doesn't really truly learn from, you know, from your personal proclivities and preferences

Starting point is 00:45:21 and, like, the things you've asked in the past. Like, there is some semblance of this, of course, but I wouldn't say that it's, like, you know, truly personal yet in terms of, like, an actual agent that is, like, kind of learning and growing every day. Yeah, you know, I've been using Claude Code a bunch. And I noticed occasionally it does this thing, it says it's compacting or something like that. I imagine it's taking the multi-turn conversation, I don't know what it's doing, just maybe summarizing it and restoring it. Yep. Yeah, there are all sorts of different ways to kind of compress the KV cache, either sort of mechanistically or kind of doing like, you know, textual summaries or things like

Starting point is 00:46:01 that. Yeah, this is a pretty active area of research as well. You mentioned that the state space models, they have a compressed representation of the KV cache, and so I'm curious, does that have a tradeoff in terms of the quality of inference? Is it lossy? Yeah, so there are certainly tradeoffs, you know, the, yeah, I think depending on the task, like, so for very like recall-heavy, you know, or fact-based tasks, you know, pure SSM models can lag transformers because like the ability of transformers to do this kind of exact in-context recall, it turns out to be very helpful for this kind of task. Whereas for other tasks that don't require this type of thing, you know, SSMs can

Starting point is 00:46:49 scale as well as or better than transformers, even for like a fixed parameter budget, you know, let alone the inference budget. You can kind of get the best of both worlds. You know, a lot of people have shown this by doing a hybrid model. So you just basically interleave state space model and transformer layers with some ratio. And so, yeah, Nvidia's put out stuff like this, even the Qwen, you know, the latest Qwen models

Starting point is 00:47:15 follow the strategy as well. So yeah, I think, you know, the cutting edge, I would say for text is probably in these hybrid models, at least in terms of like what's out there for open source. But the interesting thing is that, you know, for other modalities like audio, it actually makes a lot of sense

Starting point is 00:47:32 to have this compression as an explicit inductive bias. So using, you know, SSMs for audio has proven, you know, very useful for us. You know, we found that it actually improves performance. It's kind of almost a free lunch, you know, you get improved quality

Starting point is 00:47:49 and improved performance at inference time. And the reason is that sort of like, if you think about what these models are doing, you know, audio is, you know, depending on how you represent it, is a very like, you know, there's very little information contained in any one, like, you know, time step or token, if you will, of audio. It's, you know, like a frame of, you know, depending on what you're doing, like, you know,

Starting point is 00:48:16 10 milliseconds to 100 milliseconds. And so, you know, one frame to the next doesn't really vary that much. And so, you know, compressing these into, you know, a sort of fixed-size state can actually, like, make a lot of sense. As opposed to text, which is a much, like, sort of denser informational modality, you know, one word to the next, there is actually a ton of information contained in each of those tokens. And so, you know, compression is less, you know, it's kind of already like pre-compressed if you're using a token-level representation. But, but yeah,

Starting point is 00:48:47 even so, I think, you know, hybrid models, I would say hybrid models, I think, are the future in that regard. I see. Okay. So it's because the modality itself has, I guess, redundancy in the data that means that this lossiness is actually an asset rather than a problem. Exactly. Yeah. So, yeah, I think, you know, there's a lot of interplay between modality and architecture. It's definitely not something, you cannot design your architecture independently of your data. And so, yeah, kind of this like, you know, co-design and, like, thinking about, you know, modality from a fundamental level.
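
[Editor's sketch] The hybrid models discussed above interleave state space model layers and attention layers at some ratio. The Python below shows one simple way to lay out such a stack, assuming a fixed "one attention layer every N layers" pattern. The function name and the ratio are illustrative assumptions; the transcript does not say what pattern any particular model uses.

```python
def hybrid_layer_pattern(n_layers: int, attn_every: int) -> list[str]:
    """Return a layer-type plan where every `attn_every`-th layer is attention.

    All other layers are SSM layers. This is one possible interleaving,
    chosen for illustration only.
    """
    if attn_every < 1:
        raise ValueError("attn_every must be at least 1")
    return [
        "attention" if (i + 1) % attn_every == 0 else "ssm"
        for i in range(n_layers)
    ]


if __name__ == "__main__":
    # Hypothetical 8-layer stack with one attention layer per four layers.
    print(hybrid_layer_pattern(n_layers=8, attn_every=4))
```

Only the attention layers in such a plan keep a cache that grows with sequence length, while the SSM layers keep a fixed-size state.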

Starting point is 00:49:26 You know, this is one of, you know, the research problems that I mentioned that, like, kind of drives a lot of the work we do here. When you think about companies that focus on product versus research, what pattern do you think is most effective? So I think, you know, personally, and this is one of, you know, also one of the reasons I decided to join Cartesia, I think it is very important to have both. I think like, so there are, you know, several startups popping up recently that are really, you know, focused on core research and don't even necessarily have like an idea of how to productionize it or turn that into, you know, a product or revenue stream.

Starting point is 00:50:09 I think like, you know, I personally am fairly skeptical of this approach. I think, you know, for a few reasons. I think, first of all, you know, you know, big labs, you know, have tons of resources and also have, you know, large teams focused on this sort of thing. I think, yeah, I think like, you know, ultimately the goal of a company is to is to make money, right? And so I think, you know, eventually, you know, if you are, if you are a company of this form, like, you need to eventually deliver, like, you know, massively outsized returns, you know, at some point. And so I think you're, you're taking a big risk where it can kind of be an all or

Starting point is 00:50:45 nothing type thing. I think the flip side, like a sort of product-only company that's built on AI models that are built by other people, I think that is like risky in the sense that you don't have as much of a moat. So, you know, like we saw this with, you know, the initial ChatGPT or, you know, going from GPT-3 to GPT-4, right, a lot of these wrapper companies kind of just got made obsolete by the fact that the base models improved so much that they could often just do what the wrapper was trying to do by themselves, without very much scaffolding. And so it became kind of a thing you can just build in-house rather than needing another company to, you know, post-process the output of these models. I think, like, being in the intersection is

Starting point is 00:51:33 actually quite valuable for, you know, for many reasons. I think having a product, a real product that customers use is something that can drive the research. So you see firsthand the issues, and you can use that to drive, you know, your next iteration of modeling, you know, try and fix these issues, not as a Band-Aid, but, like, you know, from the ground up. right, like from at the model level itself. And so I think having control over the models is like very important when you're, when you're building an AI product, which is not to say that like, you know, there's no room for any non-research company.

Starting point is 00:52:13 I think it just like, it has to be in the, you know, right kind of space. And so, yeah, I think Cartesia has a great blend of research and product. You know, I would say we're first and foremost a product company, but we want to build the best products we can, and we believe that that requires us to actually solve some of these fundamental research problems in order to do that. I think there's a lot of people who want to get into AI research.

Starting point is 00:52:42 I mean, I was just talking to a friend today who's a SWE and he's saying, I don't think software engineering is going to be around in N years or something like that. So he's been investigating. I'm curious, do you have any advice for someone who is technical and wants to move into AI research? My philosophy has always been

Starting point is 00:53:02 to try and build up my technical skills as much as possible. I think if your fundamentals are good enough, like, you know, at some point, you know, the opportunities will just come to you rather than the other way around. And so I would say just focus on getting as good as you can at, you know, at coding,

Starting point is 00:53:20 at, you know, AI, read tons of papers. Yeah, I think math skills and math intuition are really important. And so that's what I've kind of been optimizing for, you know, ever since undergrad when I realized what I wanted to do was at least, you know, some combination of math and computer science. And so I've always focused more on like kind of building up those fundamentals. And I think that is the way to get your foot in the door. I think at bigger companies, it can be a little harder to pivot, you know, teams or what you work on. And so for something like that, I think just switching like teams or

Starting point is 00:53:56 companies can be like, you know, sort of the only path forward. I think you can kind of get siloed a little bit if you're at a bigger company sometimes. Although I do think, you know, some companies are better about it. And, you know, I have seen people transition from SWEs to research and stuff like that. So I think, you know, this is one of the areas where getting a sort of qualification on your resume can be useful, like getting a, you know, master's in AI at least or something like that can help when you're, you know, looking to make a sort of lateral career change like that. You're saying there's kind of two common paths. One would be get more education and use that qualification to kind of pivot directly into AI research, or go to a startup where you can kind of like mold yourself into an AI research role. That's kind of right. But I think even if you want to go to a startup, right, but you want to sort of switch from

Starting point is 00:54:52 a SWE track to AI track, like there's got to be something behind it, right? You have to have some evidence of a skill set, whether it's like sort of, you know, organically grown or from, you know, from schooling. But I think it can probably be a lot easier to get your foot in the door if you have some evidence of it on your resume. So like, let's say you hired someone at Cartesia and then that person comes to you and he's like, hey, I want to do more AI research. In that case, is that something where it's like just flip the switch and the next project is this AI research project? This has actually happened, you know, at Cartesia itself. You know, we have

Starting point is 00:55:35 had people transition roles like that. So I think it is definitely easier at a startup, which, you know, can be a bit more flexible just because, you know, everyone kind of knows everyone. And so you can get a sense of, you know, whether this might be an appropriate career change by like kind of knowing the person for a while. And so, yeah, I mean, people have done this at Cartesia with, you know, a lot of success. Do you have a biggest regret when you look back on your whole career? I mean, I think I often overthink things and I think I have spent a lot of time regretting, you know, past decisions that turned out not to matter in the end. And I kind of regret the amount of time I spent regretting other things. So, you know, I try to learn from that now. You know,

Starting point is 00:56:20 I think like, you know, don't sweat the small stuff. Like, you know, minor setbacks happen and they happen. But I think, you know, there's a risk of, you know, putting too much stress on yourself and you're, like, beating yourself up and stuff like that. And those are just like not productive ways to spend your time and they don't make anyone feel good. And so I think, yeah, I try and, you know, I try not to regret stuff because, yeah, I think it's just not a super good use of time. If you had to go back in time and you could give yourself some advice when you're just entering the industry, what would you say? Focus on building the deep technical skills. Yeah, don't waste time with like sort of trifling stuff or spreading yourself too thin.

Starting point is 00:57:05 Yeah, just like focus on what you want to focus on, I guess. Like basically like the skills that you want to leverage in your day job, just like do those and get good at those. and you know, that's where you should spend all your time at work. And, yeah. You make it sound so simple. Maybe it is. You know, it is a simple, it's kind of a simple recipe that's very hard to follow, right? Like, it's very hard to maintain that discipline.

Starting point is 00:57:33 It's kind of like, you know, what's a secret to being healthier? It's, you know, exercising, eating right. And those are things that are just very much easier said than done. but yeah, I think that it is that simple. Awesome. Cool. Well, yeah, thanks so much for your time. Thank you. Thanks for listening to the podcast. I don't sell anything or do sponsorships, but if you want to help out with the podcast, you can support by engaging with the content on YouTube or on Spotify if you want to drop a review. That'll be super helpful.

Starting point is 00:58:07 And if there are any guests that you want me to bring on, please let me know. I feel like with sourcing very senior ICs, there's no well-studied list out there on Google that I can just search up. So if there's someone in your org or at your company who you really look up to and you want to hear their career story, let me know and I'll reach out to them.
