Orchestrate all the Things - The state of AI in 2020: democratization, industrialization, and the way to artificial general intelligence. Featuring AI investors Nathan Benaich and Ian Hogarth
Episode Date: October 1, 2020. From fit-for-purpose development to pie-in-the-sky research, this is what AI looks like in 2020. A discussion on all things AI with the authors of the State of AI 2020 Report, Nathan Benaich and Ian Hogarth. Benaich and Hogarth work at the intersection of industry, research, investment and policy, with extensive backgrounds and various currently held positions such as venture capital investor and researcher. This gives them a unique vantage point on all things AI. Their report, now published for the third year in a row, is their way of sharing their insights with the AI ecosystem at large. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Today's episode features a discussion on all things AI
with the authors of the State of AI 2020 report, Nathan Benaich and Ian Hogarth.
Benaich and Hogarth work at the intersection of industry, research, investment and policy
with extensive
background and various currently held positions such as venture capital
investor and researcher. This gives them a unique vantage point on all things AI.
Their report, which is published for the third year in a row, is their way of
sharing their insights with the AI ecosystem at large. I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
Well, it would be nice to say a few words about you and why you do this.
And I would also ask you to talk a little bit about the meta of the report.
The reason I'm saying that is because, honestly, it was both interesting
and a little bit of a pain to go through,
because it grew in size considerably from last year.
So I'm wondering whether you expanded the team,
which I think you did,
and how long it could possibly have taken you to compile this.
Yeah, well, I mean, there were definitely points
when we were making slides where we were wondering
why we're doing it.
But I think that, you know, it goes back to, you know,
conversations that Nathan and I have had.
And, you know, obviously Nathan and I are both investors,
Nathan through Air Street Capital, his venture capital firm,
and myself as an angel investor.
And one thing I've always loved about interacting with Nathan
is that he takes a very rigorous, almost academic approach
to following what is happening.
And I found that we were talking as much about research papers
and sort of interesting policy documents as we were about startups.
And I think both of us felt that we were in quite a privileged position
because we were interacting with some of the world's leading AI researchers,
investing in some of the sort of the most interesting early stage technology companies.
And, you know, in my case, doing quite a lot of work on the policy side
and interacting with some people in sort of government around AI.
And so I think we kind of felt that we had this unusual vantage point that spans lots of different ways of looking at the field. When we would talk to researchers, they would not be that
familiar with what was happening in, say, startup land, and when you talk to government
people, they're not necessarily that familiar with what's happening in, say, the research world.
And we believe that the more we can join those dots up, the better
things will go for the field, and the more likely we are to make responsible
choices that span research, industrialization, and policy.
So it's really a sort of, I guess,
a piece of open source work that we put out there.
We're not researchers ourselves.
And so it's sort of something that we can contribute back to the wider community of machine learning
to sort of move things forward
and hopefully sort of steer discussion and collaboration
across these different areas.
Thank you.
Yeah, I think that sums it up quite nicely.
And then maybe to just answer your question on who makes it
and whether we have a team.
Yeah, we have discussed like growing the team,
but it's essentially the handcrafted work of Ian and myself.
We take probably, I guess, two months or so to put it together.
But it's the result of quite detailed and systematic exposure to all these different areas over the last 12 months, by virtue of our
investing work, but also by virtue of the newsletter that I write, the Guide to AI. And if anybody
listening to your podcast is looking for a paid internship, then perhaps we can talk.
Well, I'm sure there may be people interested in that, and I can appreciate the amount of work that goes into it.
Well, you know, there's nearly 180 slides,
so just, you know, casually browsing through them
would take you something like three hours, I guess.
That's how long it took me, at least.
Thanks for taking the time.
Yeah.
So it's an interesting coincidence, you know, that your report is coming out now, because as you probably
know, Gartner's latest Hype Cycle for AI was just released yesterday as well.
It's not nearly as extensive as your work, but there are two interesting main points in it which I would
like to get your opinions on, because I find them interesting because I basically disagree
with both.
So the main points that they put forward is what they call the democratization of AI and
the industrialization of AI.
So my disagreement in terms of democratization,
and I think it's a point that you also touched on in your report,
is, well, define democratization, basically,
because it seems to me that it's a very expensive and elite sport
to be doing AI at the top level, at least.
And this is something that I see clearly outlined in your report as well.
And as far as industrialization goes, well, yes, things are improving, there's more streamlining in terms of tools and availability, but in my opinion getting a model
from the lab to production is still more craft than science, let's say.
So what do you think on those?
Yeah, I mean, I think those are also two important topics that we pull
out in the report. So with the notion of democratization, there are a couple of
angles we can take on it.
The first one that we think is super interesting is this notion of AI research being open or closed.
And what does that actually mean?
On slide 11, we look at data that describes arXiv publications. It essentially asks, of the publications that are on arXiv, how many or what proportion of them include the code base that's been developed and used to produce the results that
are published in those papers. And what you find is that, you know, for the last two, three years
or so, that number has been very, very low. So it's only around 15% of papers that actually publish their code.
So to the question of democratization, you know, one of the crucial components of democratization is reproducibility and openness and the fact that, you know, tools are modular and can be
built on one another and exchanged freely. So we think that this topic of
research being less open than you'd think
deserves a bit of discussion and consideration for how it will impact the
field moving forward.
Just building on that, I think it's worth saying that
historically machine learning has been incredibly open: a huge amount of open data sets,
open use of arXiv, et cetera. And I think that any
company or large research organization coming into the field has had to
respect that to some degree, right? So there's only so closed it can get without a major
backlash. But at the same time, I think this data would clearly
indicate that the field is certainly finding ways to be closed when it's convenient.
Yeah. But having said that, elsewhere in the report
we look at perhaps
a subject in open source that relates to
democratization but also industrialization
and that's the topic
of what are the popular GitHub repos
these days
and what we find is that
in Q2 this year some of the more popular
and fastest growing GitHub repos
are machine learning based.
But within that category,
they generally relate to MLOps,
like machine learning operations.
And your audience will probably know this well,
but essentially DevOps applied to machine learning,
which is, when you do have your model in production,
how do you make sure that over time it's still creating value?
And if that value suddenly gets destroyed, for example through changes
in how your users are engaging with your product
and what kind of data they're creating,
what can you do as a developer to fix that
and interpret when issues are actually being flagged?
So, at least by virtue of the communities in the engineering and open source world
that we live in and interact with
and the startups that we invest in,
it's certainly looking like investment
and interest in MLOps, as it relates to the progress
of machine learning outside of R&D or experiments
and into real-world production for real-time systems,
is definitely growing.
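To make the monitoring idea above concrete: a core MLOps task is checking that the data a deployed model sees still looks like the data it was trained on. Below is a minimal sketch of such a drift check; the feature values, sample sizes and alert threshold are hypothetical, and real MLOps tooling tracks many features and metrics and wires alerts into dashboards and retraining pipelines.

```python
# Minimal sketch of a production drift check of the kind MLOps tooling automates.
# Hypothetical feature data and threshold; real systems track many features,
# rolling windows, and model-quality metrics.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # what the model saw at training time
live_feature = rng.normal(loc=0.4, scale=1.2, size=2_000)       # what users are generating now

statistic, p_value = ks_2samp(training_feature, live_feature)   # two-sample Kolmogorov-Smirnov test

ALERT_P_VALUE = 0.01
if p_value < ALERT_P_VALUE:
    print(f"Drift suspected (KS={statistic:.3f}, p={p_value:.2e}): investigate or retrain.")
else:
    print("Live data still looks like the training distribution.")
```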
I think it's probably worth saying that in general,
I would say that we see brilliant startup founders
probably finding it easier to get started
than they would have done a few years ago
in terms of the tooling available to them and the maturity of the infrastructure.
But if you wanted to start a sort of AGI research company today,
the bar is probably higher in terms of the compute requirements,
particularly if you sort of believe in the scaling hypothesis and the kind of the idea that, you know, taking approaches like GPT-3 and continuing to scale them up
is going to be more and more expensive and less and less accessible to sort of newer
entrants without large amounts of capital.
Yeah, that was a point in your report which I found particularly interesting
and one that I think many people, at least people who are not in the inner circle,
let's say, of how AI works, don't fully grasp: the amount of resources,
compute and data, and also people's energy, that goes into training those models, and that makes them in reality not
really accessible to pretty much anyone but a very few select organizations. And I think
one of the other things that piqued my interest in your report was, well, a kind of advice, let's say, or path to success, on how those models can be interesting and useful to others
beyond the organizations that produce them.
And you suggested, if I'm not mistaken, that, well, perhaps an idea would be
to take those pre-trained models and actually fine-tune them to specific domains.
Have you seen cases where people have done that with success?
How easy do you think that is to do?
Yeah.
So I think one of the interesting examples of this idea of, you know,
taking a large or pre-trained model in one field and moving it to
another field to bootstrap performance to a higher level than if you were not to do that,
and one that also plays into one of the dominant themes in the report, is the slide
where we talk about confocal microscopy and basically like using imaging to understand
biology and treating it as a similar kind of task to ImageNet,
where, you know, so much of the improvements in network architecture and network performance
and computer vision was, as we all know, driven by carefully curated data sets from which
models can learn something useful.
And, you know, as biology and healthcare has become an increasingly digital domain with lots of imaging, whether that relates to healthcare conditions or what cells look like when they're diseased or normal, compiling data sets that describe that kind of biology and then using transfer learning from ImageNet into those domains has yielded much better performance than starting from scratch.
So that's the example I would probably highlight.
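The ImageNet-to-microscopy idea Nathan describes is, mechanically, standard transfer learning: start from a network pre-trained on ImageNet, replace its classification head, and fine-tune on the new imaging domain. The sketch below shows the general pattern in PyTorch; the dataset folder, class count and training details are hypothetical and not taken from the report.

```python
# Sketch of ImageNet -> microscopy transfer learning; dataset path and class count
# are hypothetical, and real pipelines add augmentation, validation, scheduling, etc.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 2  # e.g. diseased vs normal cells

# Start from weights learned on ImageNet and swap in a new classification head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Optionally freeze the pretrained backbone and only train the new head at first.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("microscopy_images/", transform=preprocess)  # hypothetical folder
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```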
And then kind of related to that outside of computer vision,
but more in language models,
we're seeing several examples of, for example, language models being useful
in protein engineering or in understanding DNA
and essentially treating the sequence of amino acids that encode proteins, or DNA, as just another form of
language,
a form of strings that language models can interpret just in the same way
they can interpret, you know, characters that spell out words.
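A toy illustration of that "amino acids as characters" idea: tokenize a protein sequence residue by residue and train an ordinary next-token model on it, exactly as you would with text. The tiny model below is a stand-in for whatever sequence architecture you prefer and does not correspond to any specific published protein model; the example sequence is arbitrary.

```python
# Toy illustration of treating a protein as "just another language":
# each amino acid is a token, exactly like a character in text.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                  # 20 standard residues
vocab = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(sequence: str) -> torch.Tensor:
    return torch.tensor([vocab[aa] for aa in sequence])

class TinyProteinLM(nn.Module):
    def __init__(self, vocab_size=len(AMINO_ACIDS), dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)         # predict the next residue

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)

model = TinyProteinLM()
seq = encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").unsqueeze(0)  # arbitrary example sequence
logits = model(seq[:, :-1])                            # predict residue t+1 from residues <= t
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)), seq[:, 1:].reshape(-1))
print(loss.item())
```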
One thing we don't really flag in the report, George, but I think is kind of an additional
subtlety, is that obviously you can highlight the potential costs of training a
model, but the other thing that organizations with very large amounts of capital can do
is run lots of experiments, and iterate on these kinds of large experiments, without
having to worry too much about the cost of each training run.
So there's a degree to which you can be more experimental with these
large models if you have more capital.
Obviously, it also does slightly bias you towards these almost brute-force
approaches of just applying scale and more capital and more
data to the problem. But I think that if you buy the scaling hypothesis, then that's
a fertile area for
progress that shouldn't be dismissed just because it doesn't have deep intellectual insights at the heart of it.
There's an interesting example that you mentioned in the report of a model that theoretically shouldn't be able to compete against bigger models
with more parameters and more training.
And I'm referring to the model used by a company called PolyAI, and you showed how
this actually performed better compared to models with more
parameters and more resources behind them, such as BERT, for example. So I was wondering if you have
any insights as to why that is. What is it that they did so well
that enabled them to compete with, and actually beat, those models?
Yeah, I think the main point here is research engineering in big technology companies has
been increasingly about publishing more general purpose models.
This idea that one model can rule them all, or conduct many different tasks.
And that's what research teams are fundamentally excited and interested in.
And that's kind of contrasted with comparatively smaller companies that are very domain focused,
largely kind of tackling use cases that are at the periphery of large technology companies.
And in those cases, like, yes, research is important, but actually in those cases to get models to work in production,
you actually probably have to do more engineering than you have to do research.
And almost by definition, like engineering is not interesting to the majority of researchers.
And so in this sort of slide, this work that we described from PolyAI,
they're a dialogue company dealing
with conversations in customer contact centers.
And we're essentially showing that the task that they have
of detecting intent and understanding
what somebody on the phone is trying to accomplish
by calling,
for example, a restaurant is solved in a much better way by treating this problem as a,
what they call a contextual re-ranking problem, which is given a kind of menu of potential
options that a caller is trying to possibly accomplish based on our understanding of that domain, we can design a more appropriate model
that can better learn customer intent from data
than just trying to take this kind of general purpose model,
in this case BERT,
that can do okay on various conversational applications,
but just doesn't have kind of like the engineering guardrails
or the engineering nuances that
can make it robust in the real-world domain.
And the kind of interesting TL;DR from this is that the model they published is
significantly smaller by parameter count than BERT.
And while that's less headline-catching in the research field, it is actually more relevant for
production applications, because it can learn from, and be effective with, less data,
and it can also be trained with a lower computational footprint, which means that it's
more accessible to companies and less costly to scale. So I think this is an interesting
example of the contrasting priorities between technology companies that focus on cutting-edge general-purpose research
and startups that care more about, when machine learning hits the road,
how do you make it robust and useful?
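A rough sketch of the re-ranking framing Nathan describes: instead of asking an open-ended model what the caller wants, you score the caller's utterance against a hand-designed menu of domain intents and pick the best match. Here a trivial TF-IDF encoder stands in for the learned sentence encoder a company like PolyAI would actually use, and the intent menu and utterance are made up for illustration.

```python
# Rough sketch of intent detection as contextual re-ranking: score a caller's
# utterance against a fixed menu of domain intents and pick the best.
# TF-IDF is a crude stand-in for a learned sentence encoder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

intent_menu = {
    "book_table":     "I would like to book a table for dinner",
    "opening_hours":  "What time do you open and close",
    "cancel_booking": "I need to cancel my reservation",
    "dietary_info":   "Do you have vegetarian or gluten free options",
}

caller_utterance = "hi, could I reserve a table for four on Friday evening?"

vectorizer = TfidfVectorizer().fit(list(intent_menu.values()) + [caller_utterance])
intent_vectors = vectorizer.transform(intent_menu.values())
utterance_vector = vectorizer.transform([caller_utterance])

scores = cosine_similarity(utterance_vector, intent_vectors)[0]
ranked = sorted(zip(intent_menu.keys(), scores), key=lambda kv: kv[1], reverse=True)
print(ranked)  # the top-ranked intent then drives the dialogue
```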
Yeah, that's interesting. In a way, this kind of brings us to the topic of bias, let's say,
in whether you should use it and in what way, and to the broader topic of, well, language
models and how they're built and how they work. And to the best of my knowledge, the most prominent critic of the approach taken by
models such as GPT-3 and its predecessors is Gary Marcus, who has made a point of
highlighting their deficiencies. I'm sure you're aware of this critique, and it basically
comes down to what you consider bias, whether
it's good, how you should insert it in the model, and all of those things. So I wonder
what your take is on that.
Yeah, so my interpretation of his critique is really
that GPT-3 is an amazing language model that can take a prompt and output a sequence of text
that is legible and comprehensible
and in many cases relevant to what the prompt was.
But there are numerous examples where it veers off course,
either in a way that expresses bias
or that is just gibberish and not relevant.
And elsewhere in the report, we show a new kind of benchmark published by Berkeley, which
exposes some of these issues across various kinds of academic tasks.
I think the interesting kind of extension towards what GPT-3 could do, and this kind of
relates to our discussion around PolyAI,
is this aspect of injecting control,
like some kind of toggles on the model
that allow it to have some guardrails at least,
or at least kind of tune what kind of outputs
it can create from a given input.
And there are different ways
that you might be able to do this.
And I think in the past we talked about knowledge bases
and knowledge graphs,
or perhaps even some kind of learned intent variable
that can be used to inject this kind of control
over this more general purpose sequence generator.
So, viewed through that angle,
I think his concern is certainly valid,
to some degree, and I think it points to the kind of next generation of what generative models
like GPT-3 could move towards, if the goal is to have them be useful in production
environments.
Yeah, I think Gary Marcus is almost a professional critic of
organizations like DeepMind and OpenAI, and I think it's very healthy to have those critical
perspectives when there is such a reckless hype cycle around some of this stuff.
But at the same time, I do think that
OpenAI has one of the more thoughtful approaches
to policy around this stuff.
They seem to take
their responsibilities seriously;
for example, the approach they took
to releasing various models in the past, and the work they've done on malicious uses of AI.
And I think in general, Nathan and I hold their policy team, and Jack Clark in particular, in very high regard.
So I think OpenAI is trying. What's maybe concerning about their
approach and the scaling hypothesis is that it feels like they're saying, if you throw
more data and more compute and build larger and larger models, at some point you move
beyond this kind of sequence prediction into some kind of emergent intelligence. And
obviously some people don't really agree with that theory of how we achieve AGI.
But let's say they're right and the critics are wrong:
then we might have a very smart but not very well-adjusted AGI
on our hands, as evidenced by some of these early instances
of bias as you scale these models.
So I think it's incumbent on organizations like OpenAI,
if they are going to pursue this approach,
to tell us all how they're going to do it safely,
because it's not obvious yet from their research agenda, or obvious to me, how you marry
AI safety with this kind of 'throw more data and compute at the problem and AGI will emerge'
approach.
This points again towards another interesting part of your report, where you mention that a number of practitioners feel that progress in mature areas of machine learning is stagnant. The question is whether there's a brute-force approach whereby you can just
throw more compute and more data
at the problem and at some point
you'll just solve it this way,
or whether you need to add something else
to the mix, which is, I think, what people
like Gary Marcus are advocating
for. So
what's your stance
on this dichotomy?
I'll start there.
I mean, I think that causality is
arguably at the heart of much of human progress, right?
From an epistemological perspective,
causal reasoning has given us the
scientific method; it's at the heart, I think, of our best world models.
And so I'm personally incredibly excited about the work that people like Judea Pearl have pioneered
here, and I'm excited about us figuring out how to bring more causality into machine learning.
My sense is that it feels like the biggest potential disruption to the general trend
of larger and larger correlation-driven models, because I
think if you can crack causality, you can start to build a pretty powerful
scaffolding of knowledge upon knowledge, and have machines start to really contribute to our own
knowledge bases and scientific processes. So I think it's very exciting, and
I think there's a reason that some of the smartest people in machine learning are spending
their weekends and evenings working on it. But I think it's still in its
infancy as an area of attention for the
commercial community in machine learning.
I think we really only found one, you know, one or two examples of it being used kind of in the wild,
one by Faculty, a London-based machine learning company, and one by Benevolent AI in our report this year.
But I think, you know, we're excited to see more of this.
And I do think it could be a kind of a pretty powerful dislocation
of kind of business as usual in machine learning
and sort of correlation-based curve-fitting approaches.
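To illustrate the correlation-versus-causation distinction Ian is pointing at, here is a toy structural causal model in plain NumPy, in the spirit of Judea Pearl's do-operator: a confounder inflates the observed correlation between a treatment and an outcome, and only intervening recovers the true effect. The numbers are purely illustrative and not from the report.

```python
# Toy structural causal model: a confounder Z drives both treatment X and outcome Y,
# so the observed X-Y correlation overstates X's causal effect. Intervening (do(X=x))
# breaks the Z -> X link and recovers the true effect. Purely illustrative numbers.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def simulate(do_x=None):
    z = rng.normal(size=N)                        # confounder
    x = z + rng.normal(size=N) if do_x is None else np.full(N, do_x)
    y = 2.0 * x + 3.0 * z + rng.normal(size=N)    # true causal effect of X on Y is 2.0
    return x, y

# Observational: naive regression slope of Y on X picks up the confounding via Z.
x_obs, y_obs = simulate()
naive_slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs)
print(f"observational slope ~ {naive_slope:.2f}")    # ~3.5, not 2.0

# Interventional: set X by fiat and compare outcomes, which recovers the causal effect.
_, y_do0 = simulate(do_x=0.0)
_, y_do1 = simulate(do_x=1.0)
print(f"effect of do(X=1) vs do(X=0) ~ {y_do1.mean() - y_do0.mean():.2f}")  # ~2.0
```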
Okay. Thank you.
Nathan, do you want to add something to that?
Yeah, I'll just briefly opine on the point that practitioners feel that progress is stagnant.
I think
it generally relates to
the feeling of researchers that
there's not much that lies beyond deep
learning and
these generative models,
and, as we've been discussing,
this kind of hyperscale mentality.
People just, I think,
anecdotally feel a bit sick
of the 'throw more compute at the problem
and it's going to fix everything under the sun' approach.
So I think they, especially in research,
they're looking for something
that's a bit more like intellectually inspiring
or intellectually novel.
But I think outside of research,
I would say that the application area
is far from stagnant.
And actually even in domains
that haven't seen much impact of A, computer science
and B, machine learning,
such as healthcare and biology, we still highlight a couple of examples, actually, of startups
that are at the cutting edge of R&D put into production for problems in biology.
I think a couple that we would highlight are, for example, one problem of drug screening:
figuring out, okay, if I have a software product that can generate
lots of potential drugs that could work against the disease protein that I'm interested
in targeting, how do I know, out of thousands or probably hundreds of thousands of possible
drugs, which one will work? And assuming I can figure out which one might work, how do I know if
I can actually make it? And so there's a couple of startups working on this, and here we
profile some work from PostEra, based in London and the US. They use machine learning
and some of these more modern transformer architectures
to teach chemists, or suggest to chemists, what route, what kind of recipe mix, one would use to
make a molecule of interest. And this kind of work is (a) state-of-the-art and (b)
really interesting and an important piece of the overall problem of drug discovery.
And then some other work that we profile is from InVivo AI in Canada.
Here, this startup is using graph neural networks
to learn representations of chemical molecules
and say, okay, can we predict
whether this molecule is soluble,
whether it binds to a specific target, whether it's toxic,
and do all of that from just the chemical Lewis diagram
that we learn in Chemistry 101 or organic chemistry.
So I think this is like state-of-the-art as well
and is actually adding a lot of potential value to industry.
And there's some examples of progress that's far from stagnant
in kind of more emergent industry domains.
That's great.
You touched on two topics that I actually wanted to follow up with.
So let's start with the first one, graph neural networks.
I've seen very big interest in those and I can understand the
reasons. You also touch upon them in the report. It's a kind of change of paradigm in
what data you can process. With typical neural networks you can process two-dimensional data,
while with graph neural networks you can process three-dimensional data, or, well, anything beyond two dimensions. So that's quite a breakthrough, I would say, and it
lets people take advantage of connections, basically, and extra information in their data.
I've seen lots of interesting work lately on that, and lots of interest as well. I'm sure you keep track of that.
So what would you pick as the highlights in this subdomain?
Yeah, this relates to some discussions I have
with some friends who are from the pure machine learning domain
and see a lot of excitement in biology
and just want to understand how to think about problems in biology from a machine learning standpoint.
And I think it kind of comes down to one topic, which is what is the right representation
of biological data that actually expresses all the complexity and the physics and chemistry
and sort of like living nuances of a biological system into like a compact,
easy to describe mathematical representation that a machine learning model can do something with.
And, you know, as you described, a lot of the existing models will sort of treat problems as vectors or 2D representations. And it's sometimes hard to conceptualize biological systems
as just like a matrix array or a vector or something like that.
And so it could very well be that we're just not, you know,
exploiting all of the implicit information that kind of resides
in a biological system in the form of a vector.
So I think that's why the graphical representations are at least an interesting kind of next step,
because they just feel so intuitive as a tool to represent something that is intuitively connected,
as a chemical molecule would be, because it's made of connected atoms and bonds and things like that. So we've certainly seen examples in molecule property prediction and chemical synthesis planning,
but also in trying to identify novel small molecules by essentially treating small molecules as a combination of small Lego-like building blocks. And so you
can use advances in DNA sequencing, where you can attach a little tag to a small
Lego building block of a chemical, then you can mix all of these chemicals into a tube
with your target, and then you can essentially see what building blocks have assembled together
and bind to your target of interest.
And then that's your candidate small molecule that seems to work.
And then you can use these graph neural networks to try and learn what
commonalities these building blocks have that make them really good binders of
your target of interest.
And this is some work that's been published with some startups in Boston and Google Research
that's essentially adding this machine learning layer to a very kind of standard
and well-understood chemical screening approach
and generating several-fold improvement on the baseline.
So I think that's also super exciting.
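For readers unfamiliar with graph neural networks, here is a bare-bones sketch of the idea Nathan describes: atoms are nodes, bonds are edges, message-passing layers let atoms exchange information with their neighbours, and a graph-level readout predicts a property such as solubility or binding. The featurization and the tiny molecule below are invented; real work would use a chemistry toolkit such as RDKit and a dedicated GNN library.

```python
# Bare-bones message passing over a molecular graph: atoms are nodes, bonds are edges,
# and a graph-level readout predicts one property per molecule. Illustrative only.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_features, adjacency):
        # Each node sums its neighbours' features (the "messages") ...
        messages = adjacency @ node_features
        # ... and updates its own representation from (self, messages).
        return torch.relu(self.update(torch.cat([node_features, messages], dim=-1)))

class MoleculePropertyModel(nn.Module):
    def __init__(self, atom_feature_dim=8, dim=32, layers=3):
        super().__init__()
        self.embed = nn.Linear(atom_feature_dim, dim)
        self.mp_layers = nn.ModuleList([MessagePassingLayer(dim) for _ in range(layers)])
        self.readout = nn.Linear(dim, 1)  # e.g. predicted solubility

    def forward(self, atom_features, adjacency):
        h = self.embed(atom_features)
        for layer in self.mp_layers:
            h = layer(h, adjacency)
        return self.readout(h.mean(dim=0))  # average over atoms -> one value per molecule

# A made-up 4-atom molecule: random atom features and a symmetric bond matrix.
atoms = torch.randn(4, 8)
bonds = torch.tensor([[0, 1, 0, 0],
                      [1, 0, 1, 1],
                      [0, 1, 0, 0],
                      [0, 1, 0, 0]], dtype=torch.float32)
print(MoleculePropertyModel()(atoms, bonds))
```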
I saw a very interesting analysis by Chaitanya Joshi, who was essentially arguing that
graph neural networks and the transformer architecture and attention-based methods
have the same underlying logic, where you can
think of sentences as essentially fully connected word graphs. And I think that
one thing we noticed a lot during the report this year is the way that the transformer architecture
is creeping into lots of unusual use cases we wouldn't have predicted it to be used for.
And then secondly, that scaling it up is obviously having more impact
in terms of the performance of these larger language models.
So I think that maybe the meta point around both graph neural networks
and these attention-based
methods in general is that they seem to represent a general enough approach that
there's going to be progress just by continuing to hammer very hard on that
nail for the next few years. And one of the ways I'm challenging myself is just to take a minute
and assume that actually we might just see a lot more progress
just by doing the same thing with more aggression for a bit.
And so I would assume that some of the gains that are being found in these GNNs sort of cross-pollinate with the work that's
happening with language models and transformers.
And that approach continues to be a very fertile area for sort of super general kind of AGI-esque
research.
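Joshi's point that a sentence is a fully connected word graph can be made literal in a few lines: one self-attention step computes a dense, learned "adjacency matrix" over all tokens and then aggregates over it, which is message passing on a complete graph. The snippet below is the generic attention formula with toy dimensions and random weights, not any particular model.

```python
# One self-attention step, read as message passing over a complete word graph.
import torch
import torch.nn.functional as F

num_tokens, dim = 5, 16                      # a 5-word "sentence"
x = torch.randn(num_tokens, dim)             # token embeddings = node features

W_q, W_k, W_v = (torch.randn(dim, dim) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

# Every token scores every other token: a dense, learned "adjacency matrix".
attention = F.softmax(q @ k.T / dim ** 0.5, dim=-1)   # shape (5, 5), rows sum to 1
print(attention)

# Aggregation: each token's new representation is a weighted sum over all tokens,
# i.e. message passing where the graph is fully connected.
updated = attention @ v
print(updated.shape)                         # (5, 16)
```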
Okay, thanks. And the other topic was AI in biology and healthcare. It looks kind of obvious, but you may confirm it or
not, that probably the advent of COVID had something to do with the activity there, even though it was
possibly going on already. But yeah, if you would like to say a few words about what you have seen in that domain.
Yeah.
So I think the main kind of applications in AI for biology are one in like drug discovery and then the other one in kind of clinical medicine.
And so in clinical medicine,
I think there's a couple of really exciting developments
that I think are very powerful signals for where the field's going and the
state of its maturity. The main one is actually that, for the first time ever,
the US Medicare and Medicaid system, which pays for procedures in the US, has actually approved
a medical imaging product for stroke that's created by Viz.ai. And so despite a lot of
FDA approvals for deep learning-based medical imaging, whether that's for a stroke or mammographies or broken bones,
this is the only one that so far has actually gotten reimbursement.
And many in the field feel that reimbursement is the critical moment because that's the
economic incentive for doctors to prescribe because they get paid back.
And so we think that's a major event.
Still a lot of work to be done, of course, to scale this
and to make sure that more patients are eligible for that reimbursement,
but still major nonetheless.
The second area is in drug discovery.
And there, what's worth highlighting is that a business that was originally founded
in Scotland, and largely under the radar of most technology press hype, has been the first
to develop a drug through machine learning methods that has now entered phase one clinical studies
in Japan for the treatment of OCD. And that business is Exscientia.
And that same company has also managed to confirm a licensing deal for assets that were
discovered through a quarter-of-a-billion-dollar project with Sanofi that started just
two years ago and has now essentially been executed, which kind of proves out that large
pharma companies are actually getting value from working with, you know, AI-first drug discovery companies.
So we kind of think these are the two major industrial moments, but, you know,
beneath the surface, there's tons of activity across those segments overall.
Yeah, I always go back to Carlota Perez's
framework for thinking about how financial capital interacts with
technological progress, and I think we've certainly started to have the
speculation phase, right, where lots and lots of capital is flowing into this
intersection of machine learning and biology.
There are going to be some really amazing companies that come out of it, and I think
we will start to see a real deployment phase kick in. And then also,
I'm sure there's a Theranos hidden in there somewhere as well, something that's
going to be revealed to be a total fraud. But we feel pretty excited
about the potential here. And I think that in particular,
we think that companies like LabGenius, in which
we're both an investor, and XINC are examples
of companies that are likely to be doing some really quite profound
work over the coming years.
Okay, I think we're almost out of time.
So maybe one last question and we can wrap up.
And I guess it has to be AI ethics.
I mean, we touched upon the use of AI in biology,
and that's actually not the only domain
you reference in the report;
you also speak about COVID,
for example, and how AI is used in image recognition systems and all those things.
So you put forth a number of interesting observations and suggestions in the report, but the main
question for me would be, well, how can we make them real? I
mean, how can these things even be enforced? Sorry, what I mean is: you
make some interesting observations in terms of AI ethics,
a set of guidelines or rules that should be observed. The question is,
what
is a real way to
enforce whatever guidelines
we decide are appropriate?
I think we are
at the start of some quite interesting approaches
to regulation.
Nathan,
do you want to jump to the slide about the UK use of facial recognition?
Yeah. So this is an interesting example where a UK citizen basically claimed
his human rights were breached whilst Christmas shopping, and the ruling was kind of interesting in that he was
ruled against, but there was also this duty placed on the
police to make sure that discrimination was proactively eliminated from the technology being used.
So legally, the police are now on the hook for getting
rid of bias before they can use this software. And so it creates
a much higher bar to deploying this software, and it creates almost
a legal opportunity for
anyone who experiences bias at the hands of an algorithm to have a
foundation for suing the government or a private actor deploying this technology.
So I think this is one interesting approach, where you essentially say
that effectively the software has to be demonstrated
to make
extreme efforts to remove bias and be ethical in that regard.
Obviously there are many, many aspects of ethics, bias being one,
and I think that places a much greater burden on the entity deploying the software. The other
approach that I think is interesting is some degree of API-driven auditability.
I think it was in Washington State where they have made it so that
any facial recognition system has to have an API that would
allow an independent third party to assess it for performance and bias
across different categories of identity. So I think there's a couple of
interesting approaches emerging where law enforcement
are figuring out how to police the use of this,
and how to, sorry, not law enforcement,
how regulators are figuring out how to incentivise ethical behaviour,
either by introducing third parties in a novel way,
like this API-driven approach,
or just by saying that the use of this is held to standards that then open
the users up to lawsuits if they don't meet those standards.
So I think we're starting to see
some sort of regulatory innovation in this area.
And I think that now that this is so prime time,
you're going to see even more emerge on the regulatory side, to try to restrain the use of these algorithms
to ethical approaches.
Okay. Well, thanks.
I guess we're
a bit over time now, so
if you don't have anything
to add to that, Nathan,
we're probably going to have to
wrap up here. Yeah, sounds good.
Thanks a lot for
some really good questions and some cool
discussion. And if anyone's listening
and wants an internship, just get in touch.
I'll be sure to highlight that, at least in the podcast.
I'm not promising anything for the write-up,
but for the podcast, you get that.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn, and Facebook.