Orchestrate all the Things - The State of AI Report, 2021. Featuring AI Investors Nathan Benaich and Ian Hogarth

Episode Date: October 14, 2021

It's this time of year again: reports on the state of AI for 2021 are out. In what is becoming a valued yearly tradition, we caught up with AI investors and authors of the State of AI Report, Nathan Benaich and Ian Hogarth, to discuss the 2021 release. Some of the topics we covered are lessons learned from operationalizing AI and MLOps, new concepts and datasets, language models, AI ethics, and AI-powered biotech and pharma. Article published on ZDNet

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. It's this time of year again: reports on the state of AI for 2021 are out. In what is becoming a valued yearly tradition, we caught up with AI investors and authors of the State of AI Report, Nathan Benaich and Ian Hogarth, to discuss their 2021 release. Some of the topics we covered are lessons learned from operationalizing AI and MLOps, new concepts and datasets, language models, AI ethics and AI-powered biotech.
Starting point is 00:00:38 I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook. So, you did it again. What, fourth time? I was trying to remember what version it is. It's the fourth already? Okay. Yeah, yeah, v4. And I think it's growing every year, at least that's what it looks like to me. Yeah, it's growing with inflation. I think we're at 180 or so now. So just a couple more than last year. Okay.
Starting point is 00:01:12 Yeah, I don't know. Initially, I thought it was over 200 and, you know, I was a bit scared, as I should be. But I just managed to skim through it in time, just in time. So I figured, okay, since it's this time of year that AI reports are coming out, and there's many people who are familiar with the one that's done by Matt Turck. And actually, one of my ZDNet colleagues covered that last week. I thought, okay, let's begin by positioning, let's say, what you do in relation to that one, which many people are familiar with. So I have my own views as to how that differs, but I'm sure you're familiar with both and you can probably better frame it yourself.
Starting point is 00:02:09 Yeah, for sure. So when Ian and I started this report project four years ago, our view was really that we both, by virtue of being investors at different, or mostly early, stages, but within machine learning, had access to major AI labs, major academic groups, various up-and-coming startups, and then bigger companies too, and increasingly people who worked in government. And so I had, by virtue of that, a window into the different vantage points of these stakeholders within the ecosystem. And we thought there was a big opportunity to create a public good product, one that was open source, that tried to synthesize the most interesting and important things that happened across industry, research, politics, and talent, in order to inform each one of these stakeholders who might not, in their day-to-day job,
Starting point is 00:03:02 consider the view of the other stakeholders that they don't interact with. And that way we could level the playing field of discourse and conversation in AI. Very much from an editorial point of view, it's a reflection of what we think is most interesting, and of course cross-checked with reviewers who are active in this field every day. And yeah, other reports, in particular Matt Turck's, I think offer a very in-depth view on, in particular, the industry and the technology trends that are driving industry, and I think are very valuable to, especially, software buyers and big companies and, you know, startups trying to make their way through this landscape, which is incredibly dense. Ours is a bit more broad in that sense, and considers different angles, which are probably outside of the scope of more vertical reports like MAD.
Starting point is 00:04:11 Yeah, I think that's, you know, as a reader, let's say, precisely my impression as well. And obviously that's by design, so I guess, you know, well done, goal achieved. I would confirm that yours is actually more broad. One part that I found interesting in Matt's report, well, actually not exactly, not technically part of the report, is the fact that they now have an index. I mean, you know, indexes are popping up right, left and center these days. And I find it interesting that, you
Starting point is 00:04:45 know, someone realized that, as you also pointed out earlier, this domain is quite dense these days. So it makes sense to create an index. And I was wondering, well, what you think of it, and whether you actually either plan to do something similar, or maybe already have something similar internally? Yeah. I mean, I'd also seen the indices Lux Capital had produced on, you know, themes like bio and manufacturing. I think indices are a great idea. So, you know, they beat me to it. I think it's cool, because it's, I guess, the most pure reflection of how you think a category is moving. And by virtue of being built by somebody who's investing
Starting point is 00:05:35 or building companies in this field, as different from somebody who's commenting on the field from an investment banking point of view, I think you're just getting a much more realistic representation of a sector. So I think an index is a great idea. Yeah, cool. So, even though I really, really appreciate the fact that,
Starting point is 00:05:58 well, first of all, you have a kind of standard structure in the report by now. And I think you actually start by going into the research side of things. And actually, it's a little bit of a trap for me personally, because I find it very interesting. So I tend to dive right in, and then by the time I have digested it, I'm kind of out of steam. So I'm going
Starting point is 00:06:25 to do things a little bit differently and start with the industry insights. And I tried to pick the ones that I found personally more interesting, and that have not already been picked up by others, more specifically by Tony, who did the analysis of Matt's report already last year. So there were many valid insights, I would say, that you both share, such as the fact that, obviously, there's lots of funding flowing in, and many IPOs and great valuations and all of that stuff.
Starting point is 00:07:00 I think that's been covered already. So I thought I'd focus more on the technical side of things, still within the industry. So one thing that caught my eye was what you call insights from machine learning in production, nudging researchers to go from being model-centric to being data-centric. And I think that kind of solidifies, let's say,
Starting point is 00:07:28 sort of hands-on insights, non-documented insights, let's say, that I've heard from many people in the last few years. It goes something like: well, you know, you can have the best algorithm, but in the end you're probably better off paying more attention to your data, to data quality, than trying to come up with sometimes marginal improvements in the algorithm. So how do you see that playing out in the industry? Yeah, we thought it was really important to highlight the basically renewed attention, in more industry-minded academic work, on data quality and the various issues that can reside within data, which ultimately propagate towards the model and determine
Starting point is 00:08:25 whether the model predicts well or predicts poorly. And, you know, I think the vantage point is that a lot of academia was focused on competing on static benchmarks, and showing model performance offline on those benchmarks, and then moving into industry. So generation one was a lot about, let's just get a model that quote-unquote works for a specific problem, and then kind of deal with any breaks or any changes whenever they happen.
Starting point is 00:08:56 And we tried to dig into this a bit last year, but there's been a huge amount of money and interest and engineering time that's been thrown into DevOps for machine learning, otherwise known as MLOps. And this is motivated by the idea that machine learning is not a static software product that you can write once and forget about. You have to constantly update it, and it's not just updating the model weights. It's looking at how your classes might drift over time, looking at whether you're still using the right benchmarks to determine whether a new model that you've trained is going to work in production or not, issues with choosing different random seeds for your model and then seeing completely different behavior on real-world data. It's being aware of, you know, these potential issues, and then thinking about more systematic ways of solving them
Starting point is 00:10:07 and building software that is ultimately more robust and high performance.
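To make the random-seed point concrete, here is a minimal sketch that trains the same model under five different seeds and compares held-out accuracy. The dataset and model are scikit-learn stand-ins, not anything from the report; the run-to-run variance is the phenomenon being described:

```python
# Minimal sketch: the same model trained with different random seeds can show
# noticeably different held-out performance. Dataset and model are stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for seed in range(5):
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"seed={seed}: accuracy={acc:.4f}")  # accuracy varies with the seed alone
```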
Starting point is 00:10:36 So, I mean, you're right. And actually, MLOps, as you pointed out, is growing, is getting more mindshare, let's say, and rightly so, I would say, because, well, there are many moving parts in building software, and DevOps was an answer to that, but there are actually even more moving parts in building and deploying machine learning models, so it's even more complicated. And I think that's a hard realization that people are hit with, either when entering the domain or as they progress through it. And the flip side to that, or actually another side effect of that, let's say, is what you called distribution shift when you do machine learning in production. And in a nutshell, I would frame it as, well, the fact that because you have this, you know, MLOps cycle that you use to manage your datasets and your models and all of that stuff.
Starting point is 00:11:15 So it kind of happens sometimes that by the time you hit production, your dataset may have changed, and therefore there may be some discrepancies. So have you seen that playing out and actually causing issues? It's hard to pin down specific examples, because most of these issues would probably end up within companies, and they sort of wouldn't want people to know about it. But I think one of the obvious areas is around pricing, you know, on various retail websites. What we found, at least, was that there are two major new datasets that were released. And it's not just by an academic group or an industry group.
Starting point is 00:12:12 It's really a collaboration between the two. And it looks at different distribution shifts across different modalities of data and different particular topics. So this is some work from Stanford and Yandex, and they had data from medical imaging, from wildlife monitoring, satellite imaging, etc. So very industry-minded domains. And I think the value of having more industry-minded datasets being used in academia means that ultimately academic projects are more likely to succeed in the production environment, because there's less distribution shift, or less generalization error, that will occur when you move from industry to academia and vice versa. So we were very encouraged by this, and so we felt the need to showcase it.
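To make the distribution shift idea concrete, here is a minimal sketch of one common detection approach: comparing a feature's training distribution against what the model sees in production with a two-sample Kolmogorov-Smirnov test from scipy. The arrays are synthetic stand-ins for real logged features:

```python
# Minimal sketch: flag distribution shift between training and production data
# using a two-sample Kolmogorov-Smirnov test. The data here is synthetic; in
# practice you'd compare logged production features against the training set.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # what the model was trained on
prod_feature = rng.normal(loc=0.4, scale=1.2, size=10_000)   # what production now looks like

statistic, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"Possible distribution shift (KS={statistic:.3f}, p={p_value:.1e})")
```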
Starting point is 00:13:08 Yeah, I mean, it does sound encouraging. And actually, to be honest with you, part of the reason why that caught my attention was simply, well, just establishing the terminology. I mean, it's something that obviously, you know, happens. I just hadn't heard the term before. So I said, well, okay, it's good to call that out. Yeah. And I would say the same thing probably for data cascades,
Starting point is 00:13:35 which is, again, in a nutshell, the fact that, well, sometimes when you get, you know, lower quality data at a certain point in your pipeline, that kind of trickles down through the pipeline, and you may have unexpected and, typically, well, bad outcomes from this. And again, the notion is kind of intuitive, at least to me, and I guess to most people that have at least some exposure to data pipelines, but it's a relatively new term.
Starting point is 00:14:05 So I thought it's worth commenting on. Yeah, yeah. I mean, as you said, it's a fairly intuitive idea, just the domino effect: if you have a problem at the start, it's likely going to come down by the time you get to the last domino. And what's notable here in this report was that the overwhelming majority of data scientists reported having one of these issues, or experiencing one of these issues.
Starting point is 00:14:48 And the second thing was, when they were looking at trying to attribute why these issues actually happened, it's mostly due to either a lack of recognition of the importance of data within the context of their work in AI, or the lack of adequate training in this domain, or just the problem of not getting access to enough specialized data for the particular problem that they were solving. So I think this is yet another example of, in some ways, adding new requirements to research papers, such that in order to get published they must at least look at, you know, having open code accessibility and a GitHub repo attached, so that the project is reproducible, that the data is legible and of high quality, and that data cascades don't occur, etc., etc., because these cascades are dangerous
Starting point is 00:15:35 if you're not aware of them. Yeah. And that's, I guess, another instance where being exposed to the notion, and the discipline required to deal with it, early on, in research environments, should be beneficial when these practices are moved on to production environments, let's say. Yeah.
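As a concrete illustration of stopping a cascade early, here is a minimal sketch of fail-fast validation at one pipeline stage, so that low-quality data raises an error instead of silently trickling downstream; the column names and thresholds are hypothetical:

```python
# Minimal sketch: fail-fast data validation at one pipeline stage, so bad
# records surface immediately instead of cascading into training. The column
# names and thresholds below are hypothetical.
import pandas as pd

def validate_batch(df: pd.DataFrame) -> pd.DataFrame:
    # Schema check: downstream stages assume these columns exist.
    required = {"user_id", "price", "label"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Sanity checks that would otherwise propagate silently.
    if df["price"].lt(0).any():
        raise ValueError("Negative prices found upstream")
    if df["label"].isna().mean() > 0.05:
        raise ValueError("More than 5% of labels are missing")
    return df

clean = validate_batch(pd.DataFrame({
    "user_id": [1, 2], "price": [9.99, 4.50], "label": [0, 1],
}))
```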
Starting point is 00:16:05 And that's not even mentioning the kind of cascades of problems that occur when you implement one paper, and then another paper, and a third paper, and then there might be some meta-cascades happening there as well. Yeah. Yeah. That's even
Starting point is 00:16:18 harder, even more complicated. Okay. So let's shift gears a little bit from the MLOps side of things to something that I know is near and dear to you personally, and, well, to the founders that you back as well: the biotech space, in which I saw that you recently had your first portfolio company IPO, Exscientia. And in last year's report, this was one of your predictions as well.
Starting point is 00:16:53 And you made good on that. And luckily for you, even with a company that you personally had invested in. So that's even more of a vindication, I guess. Congratulations on that. And there was also another one, Recursion Pharmaceuticals, and they also IPO'd. And I was wondering if you'd like to share a few words on those two IPOs, and especially on the Exscientia one, which you're obviously in a very good position to comment on. Yeah, yeah. So last year we made the case that biology was quote-unquote
Starting point is 00:17:29 experiencing its AI moment, which is a reflection of a huge inflection in the number of papers getting published that essentially tore out an old-school method of doing some kind of statistical analysis of some biological experiment, and replaced it with, in most cases, something deep learning based, and it automatically became better. And it sort of felt like there were a lot of low-hanging fruits within the biology domain that could fit into this paradigm, and last year was the time when this sort of problem-solving approach of using machine learning for various things went into overdrive.
Starting point is 00:18:09 And one of the outputs of this idea of using machine learning in biology is in the pharmaceutical industry. And for decades, we've all known and all suffered the fact that drugs take way too long to be discovered and to be tested. You have one category of companies saying, you know, I think this gene is responsible for this disease, let's go prosecute it, and another category that uses machine learning to give chemists the map of what they should focus on. And that's been really unlocked, especially with new approaches in deep learning. And the former category has largely said, well, the latter approach has been tried before, it sort of doesn't work, that's computational chemistry and physics, blah, blah, blah. And the only way to validate whether the latter approach works is if they can generate drug candidates that are actually in the clinic, and ultimately, can they get those drugs approved? And so why we think that these two IPOs are a pretty big deal is that they both generated and designed molecules using their machine learning
Starting point is 00:20:06 system, which will, for example, take a variety of different characteristics of that molecule, and then set the task to the software to generate ideas of what a molecule could look like that fits those characteristics and kind of meets the trade-off requirements. So Exscientia is the first company that had three of those drugs enter clinical trials in the last 12 months. And their IPO documentation makes for an interesting read, because they show that the time to market, well, not the time to market, more like the number of chemical ideas
Starting point is 00:20:41 that the company needs to prosecute before it finds one that works is an order of magnitude lower than what you see for traditional pharmaceutical companies. And so we think it's a major new wave in the pharmaceutical industry. And even though it seems kind of big to technology folks like us, it's still very, very small in the overall context of the industry, where, you know, these behemoth pharma companies are worth hundreds of billions of dollars, and together Recursion and Exscientia are worth, you know, at best 10 billion. So it's still incredibly early days, and yeah, many, many more innovations to come.
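To give a cartoon of the generate-and-filter loop described above, here is a minimal sketch that screens candidate molecules, written as SMILES strings, against a simple property profile using RDKit. The candidates, thresholds, and scoring are purely illustrative, not Exscientia's or Recursion's actual systems:

```python
# Toy sketch of "generate candidates, keep the ones that meet the trade-offs".
# The SMILES strings and property thresholds are illustrative only; a real
# system would use a generative model and far richer multi-objective scoring.
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

candidates = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin

def meets_profile(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False  # unparseable candidate
    # Hypothetical target profile: drug-like weight and a reasonable QED score.
    return Descriptors.MolWt(mol) < 500 and QED.qed(mol) > 0.4

shortlist = [s for s in candidates if meets_profile(s)]
print(shortlist)
```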
Starting point is 00:21:14 So what do you think of this kind of approach being adopted by the behemoths? And I'm asking because, well, kind of recently, a few months back, I had the chance to have a conversation with some people who actually specialize in, let's say, machine learning and AI applications geared specifically towards this domain. So they do, for example, specialized NLP for biotech and this kind of thing. And their take was that, well, this is seeing more and more adoption even within the behemoths,
Starting point is 00:21:59 so expect this whole process to really ramp up soon. Even locally in London, AstraZeneca and GSK are beefing up their machine learning teams quite a bit too. I think it's one of those examples of a mentality shift in how business is done. As younger generations, who grew up with computers and writing code to solve their problems in their spare time, as opposed to running more manual experiments, end
Starting point is 00:22:38 up in higher levels of those organizations, they just bring different problem-solving toolkits to the table. And so I think it's inevitable. The question will ultimately be, can you actually shift the cost curve, and spend less money on fewer experiments and have a higher hit rate? That'll still take time. It's also not the only frontier on which machine learning is impacting pharma companies. You have other examples, for example, following the research literature, something where software,
Starting point is 00:23:14 machine learning enabled software, is now kind of being used. So there are these other quite interesting examples where, as the CIO gets used to buying software in one area, they get more comfortable about, you know, paying for it in others. Yeah, you're right. Actually, that particular area, following the research literature, was the one that the people I had this conversation with pointed out, because, well, it's their area of expertise. So they actually train specific language models to be able to do that in their particular domain. Was it Causaly? No, it's not Causaly, it's the people who do the Spark NLP libraries,
Starting point is 00:24:01 the name escapes me now, but, well, you probably know them. But yeah, Causaly is a good example of that as well. Speaking of which, language models that is, that's a good segue to talk a little bit about that, because obviously, you know, it's another very interesting topic, not just in terms of research. You also point out some breakthroughs in the research section. But for this conversation, I think I'm more interested in the commercialization of those language models, at least to begin with. So you also highlighted in the report how, for example, OpenAI went from publishing their models in their entirety
Starting point is 00:24:47 to actually making them available through an API, and how this gave birth to a kind of ecosystem, let's say, of startups that are trying to leverage this model to produce their specialized products on top of that. And I have to say that I also was exposed a little bit to a couple of those startups. So you see, for example, companies that specialize in writing marketing copy or advertising copy or that kind of thing. And in most cases, you know, in my personal experience, I wasn't particularly impressed, let's say, by what I saw. But I don't know, I'd just like to ask what you think of the strategy in general, and whether you've had the chance to evaluate any of those startups at all,
Starting point is 00:25:37 and, you know, what your take is. Yeah, I've played around with probably similar versions to the ones you have. So generating copy, like creating emails, or templated emails to some degree, creating automated LinkedIn messages, and things like that. I'd say good, not great. But I think the nice benefit that OpenAI has generated with this is just massive awareness of what language machines could do if they get increasingly good. And I think they're going to get increasingly good very, very quickly, especially as OpenAI starts to build kind of like daughter models of GPT-3, which is, I think, what they refer to Codex as. And Codex itself is a pretty epic product, which has been crying out for somebody to build it, and I've always been afraid of a startup building it in case GitHub at some point does it.
Starting point is 00:26:39 And so finally, I feel vindicated to that degree. So I think it's, yeah, OpenAI's creation, and then the vertical, focused models that are based on GPT-3, I think, will probably be excellent. I think the other way to think about it is, there is a certain quality of fashion in what developers coalesce around. And I think we've profiled natural language breakthroughs over the last few iterations of the report. And I remember really being floored by the work that Primer did, I think it was a couple of years back now, where they had a machine that read Wikipedia,
Starting point is 00:27:23 figured out notable computer scientists, or scientists, who should have been included, with a focus on women, and then used their system to actually write the Wikipedia bios and add them for the missing people. And there have been these kinds of examples where, I think, you know, really cool stuff is going to be possible in industry. But I don't think anyone had applied that kind of world-class Silicon Valley marketing engine to the problem. And so I think a lot of what, as Nathan says, OpenAI did there was sort of leverage their research credibility with GPT-3, and then really market Codex in a way that
Starting point is 00:28:09 caught the attention of a much, much larger set of people. It felt like an Apple developer event more than a typical AI company's developer event. That's also another interesting aspect of the commercialization of these models, the fact that, and you also kind of touch upon it in the research section, many of these models are now being used in ways that were not originally anticipated, basically. So they're either applied to different domains or used in a different way, for example, this whole prompt approach. And I think that's also an interesting aspect of these models.
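For readers unfamiliar with the prompt approach mentioned here, below is a minimal sketch of few-shot prompting against a hosted completion model, written in the style of the 2021-era OpenAI Python client; the engine name, examples, and placeholder API key are illustrative assumptions:

```python
# Minimal sketch of few-shot prompting: the task is specified entirely in the
# prompt text, with no fine-tuning. Written against the 2021-era OpenAI Python
# client; the engine name and the examples are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

prompt = (
    "Classify the sentiment of each review.\n"
    "Review: The report was insightful and well organized. Sentiment: positive\n"
    "Review: The model kept breaking in production. Sentiment: negative\n"
    "Review: The new benchmark finally tests real-world data. Sentiment:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=3,
    temperature=0.0,
    stop="\n",
)
print(response["choices"][0]["text"].strip())  # expected: "positive"
```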
Starting point is 00:28:41 Yeah. And I mean, even beyond that, you see similar language models getting used in biology. So we saw this pretty amazing paper in Science that re-implemented a language model to basically learn the viral spike protein, and which versions of the spike protein on COVID-19 confer more or less virulence, and then used that to forecast what kind of evolutionary path the virus would have to take in order to birth more or less virulent versions of it. Which you could think about from the perspective of proactive stockpiling of various
Starting point is 00:29:26 vaccines, is pretty cool. So I just think it's amazing that these models can internalize various basic forms of language, no matter if that's biology, chemistry, or, you know, human language. That's the part that I was most interested in. It's also, in some funny way, unsurprising, in that language is so malleable and extensible, right? You know, we've used language to describe so many different aspects of human endeavor and human knowledge. So it is fundamentally pretty general purpose, and I think we're only going to see the kind of unusual applications of language models grow. Okay, so there are many aspects of that that we could comment on,
Starting point is 00:30:16 and I find it very interesting, so let's try and touch upon as many of those as possible. So, speaking of language in general, you also mentioned the benchmarks. There used to be a benchmark called GLUE, which was used to measure how close to human performance those language models could get. And now there's another version, actually around since last year, I think, which is called SuperGLUE, and it's more demanding, but that has also been solved. And you make some interesting observations on that in the report, and people can go through them. What I'd like to contrast with that is that, well, I don't know if you're aware,
Starting point is 00:30:54 recently the Gradient awards for this year came out, and one of the runners-up was an article written by Walid Saba. And in that, he basically reiterated what has been his chronic, let's say, stance on this, which is, well, okay, you can train supermodels all you want, you can memorize the entire internet, if you will,
Starting point is 00:31:18 but that's not going to solve natural language understanding. So what's your take? And again, keeping in mind the SuperGLUE benchmark. I feel like sometimes these benchmarks and models are almost like chicken and egg. You know, you see the state of the art getting pushed in a certain direction, and then you realize the benchmark is not good enough to appropriately test the limits of that technology.
Starting point is 00:31:50 So I don't know. I feel like, I guess, there's more and more effort towards trying to figure out what a future-proof benchmark is, and some of the active benchmarking works that we showcase are a hint in that direction. But the other part of me says it's going to be this constant chase: something new comes out, a new benchmark is needed, or a new benchmark is generated and something new comes out. On this stuff, I've always liked the framework that some maybe more agnostic researchers have taken here. So I think you can say there's almost, you
Starting point is 00:32:36 know, an atheist view here, which is: these models aren't going to get us that much further, they kind of fundamentally don't really understand anything. There's the sort of true-believer view, which is: all we need to do is scale these up and they'll be completely sentient. And I think there's a view in the middle, the maybe slightly more agnostic view, that says we've got a few more big things to discover, but these will be part of it. And I've always liked that view, because it's always felt like it has the right amount of deference for how much these models are able to do, but does capture the fact that, for example, causality, you know, you've got these kind of major blocks that are missing at the moment, that kind of hold back real confidence that this can scale.
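As a concrete anchor for the benchmark discussion, here is a minimal sketch that loads one SuperGLUE task and scores a trivial majority-class baseline, assuming the Hugging Face datasets library is available; the baseline is only there to show the shape of the evaluation, not a real model:

```python
# Minimal sketch: load the BoolQ task from SuperGLUE and score a trivial
# majority-class baseline, just to show what "solving the benchmark" measures.
# Assumes the Hugging Face `datasets` library is installed.
from datasets import load_dataset

boolq = load_dataset("super_glue", "boolq", split="validation")

# Trivial baseline: always predict the most common label (1 = "yes").
predictions = [1] * len(boolq)
accuracy = sum(p == ex["label"] for p, ex in zip(predictions, boolq)) / len(boolq)
print(f"Majority-class accuracy on BoolQ: {accuracy:.3f}")
```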
Starting point is 00:33:19 Okay, well, talking about ever-growing language models, basically, more parameters and more data, that also means more resources to train. And this kind of touches upon another aspect, the AI ethics aspect, and the whole Timnit Gebru story, which was a big deal last year, and I think for good reason. And actually, I think that most people that got caught up, let's say, in that story tend to focus on the racial aspect. I think that what's equally, and perhaps even more, important in Gebru's findings, and what she pointed out, was, well, the fact that, you know, these language models, these humongous language models, also have very real sustainability implications, because, well, you need humongous
Starting point is 00:34:17 resources to train them as well. And as we see ever-growing language models, I think also one of the things that caught my eye in the report was some recommendations, let's say, by Google, if my memory serves me right, on how you can potentially lower the impact of training these humongous language models. I think it was Google and Berkeley, actually, that did work to try and figure out
Starting point is 00:34:42 how we can reduce the carbon emissions, the carbon footprint, of these large language models. And they considered a couple of different things. They considered, number one, do we use a dense network or a sparse one? Number two, where do we train the network geographically, what data center is it in? And then three, what processor do we use? And then they basically played around with these different variables
Starting point is 00:35:04 to see what would be most effective. And in particular, what's interesting is they say that GPT-3 has the highest energy consumption compared to other models like the Switch Transformer or GShard, and it also produces the most net CO2, and generally uses the most compute in terms of accelerator-years, which is the term that they use. Perhaps because it's Google, their other infrastructure is generally more efficient, and less energy taxing and CO2 emitting.
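To make those three levers tangible, here is a minimal back-of-envelope sketch of how hardware, data center efficiency, and grid location combine into a CO2 estimate; every number below is a hypothetical placeholder, not a figure from the Google and Berkeley work:

```python
# Back-of-envelope CO2 estimate for a training run. Every number here is a
# hypothetical placeholder to show how the levers interact, not real data.
def training_co2_tonnes(
    num_chips: int,
    chip_power_kw: float,       # average draw per accelerator
    hours: float,               # wall-clock training time
    pue: float,                 # data center power usage effectiveness (>= 1.0)
    grid_kg_co2_per_kwh: float, # carbon intensity of the local grid
) -> float:
    energy_kwh = num_chips * chip_power_kw * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh / 1000.0

# Same job, two data centers: an efficient one on a clean grid vs. the opposite.
clean = training_co2_tonnes(1024, 0.3, 336, pue=1.1, grid_kg_co2_per_kwh=0.05)
dirty = training_co2_tonnes(1024, 0.3, 336, pue=1.6, grid_kg_co2_per_kwh=0.7)
print(f"clean grid: {clean:.1f} t CO2, dirty grid: {dirty:.1f} t CO2")
```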
Starting point is 00:35:56 I think it's generally a move towards more awareness of what the variables are that we can actually control. And I wonder whether there will be a dynamic routing layer, to use the best data center and the best processor to optimize for this, but it's probably a lot of TBD. I think, to me, the heart of what happened with Timnit and Google is that the topics that Timnit was investigating, and, you know, I admire her greatly for the contributions she's made to the field, started to come into more explicit tension with the sort of for-profit nature of the
Starting point is 00:36:39 business model of Google. And, you know, there are multiple dimensions to that, but ultimately, if you're going to start to put these kinds of large language models into production, through, you know, large search engines, there is more tension that arises when you start to question the bias within those systems, or kind of environmental concerns. You're ultimately creating a challenge for your corporate parent to navigate as it puts this research into production. And it feels like the most interesting response to that has been the rise of alternative governance structures. So, you know, when EleutherAI launched, they explicitly said that they were trying to provide access to large pre-trained models, which would enable large swathes of research that would not be possible while such technologies are locked away behind corporate walls, because for-profit entities have explicit incentives to downplay risks and discourage security probing. So there's this kind of interesting wider response, I think, which is not just the
Starting point is 00:37:53 dismay within the AI community over how Timnit Gebru was treated, but also this sort of, you know, system-level response to start to really build open-source alternatives. And in the same way, it's kind of notable that Timnit's collaborator, Margaret Mitchell, is now at an open-source project, Hugging Face. Actually, I was meaning to ask you about EleutherAI. It's something I wasn't aware of personally, and I just found out
Starting point is 00:38:24 by reading your report, and it's a treasure trove in that respect. But yeah, what impressed me about them was basically, yeah, their stated intention, which you just reiterated. And it kind of made me wonder: okay, sounds great, but is it viable? Is it sustainable? And we just talked earlier about how OpenAI was kind of forced,
Starting point is 00:38:47 let's say, to commercialize their own language model. So how is EleutherAI going to be sustainable? I think that maybe the first big difference between the early era of OpenAI and Eleuther is that OpenAI was never explicitly a kind of open-source project. So, you know, it's not quite an apples-to-apples comparison, in that Eleuther has achieved all sorts of things without kind of having to form a
Starting point is 00:39:27 for-profit structure, and ultimately has governance that is more community-based than the early governance of OpenAI, which was much more tied to a handful of patrons at the start, you know, Elon and Sam and others. So I wouldn't make too strong a comparison between what Eleuther is doing and the early days of OpenAI. I think they're quite different in the sort of underlying organisational
Starting point is 00:40:01 structure. Okay. And, yeah, well, time flies, by the way. I think we're almost out of time, but I may be able to squeeze in one last question, about Anthropic, because I think that was also kind of a big deal. You described them as a kind of third pole, basically, in the going-after-AGI game, let's say. So do you think they have a chance?
Starting point is 00:40:29 On paper, they look quite impressive, but I don't know. What's your take? Well, maybe I should answer this one, but I should also be transparent that I'm an angel investor in Anthropic, so I'm not an unbiased observer. My take is that the people who left OpenAI
Starting point is 00:40:52 to create Anthropic have tried to pivot the governance structure by creating this kind of public benefit corporation, and to try to set up a long-term benefit committee that basically hands control over the company to people who are not the company or its investors. And I don't know how much progress
Starting point is 00:41:22 has been made towards that so far, but it's quite a fundamental governance shift, and I think that allows for a new class of actors to come together and work on something. But the main reason I think one could be positive and bullish about Anthropic as a third pole is just the caliber of the early team, in particular people like Dario Amodei, who, you know, was the most senior researcher at OpenAI and pioneered a lot of the large language model work that OpenAI is now, kind of understandably, famous for. Okay. Yeah. Thanks. And yeah, I think we're officially out of time. As usual, you know, we only get to cover a fraction of what we could potentially talk
Starting point is 00:42:05 about. But, well, it's a humongous report that you put out there, so I'm well aware of that. Okay, so that's it, I guess. Thanks a lot for your time, as always. I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
