No Priors: Artificial Intelligence | Technology | Startups - The Intersection of AI and Blockchain, with Transformers author and NEAR founder Illia Polosukhin
Episode Date: September 15, 2023. More than 25 million users are using NEAR-powered applications. Co-founder of NEAR protocol and Transformers author Illia Polosukhin joins hosts Sarah Guo and Elad Gil to discuss the intersections of crypto and AI technology, what we should expect from AI agents, decentralized data labeling, why AI’s alignment problem is really a human problem, and more. Show Links: Illia Polosukhin - Co-founder of NEAR | LinkedIn NEAR Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ilblackdragon Show Notes: (0:00:00) - Blockchain, AI, and Web3 Intersection (0:06:39) - How We Might Combine Blockchain and AI for Cancer Research (0:23:35) - Inference and Decentralized Data Labeling (0:30:13) - AI SaaS Strategic Challenges (0:38:18) - The Future of Hardware Accelerators
Transcript
A blockchain operating system might just be the key to a democratized Web3.
In fact, more than 25 million users are already getting a taste of this, thanks to NEAR.
This week, Elad and I are joined by Illia Polosukhin, the co-founder of NEAR and a co-author of the landmark Transformers paper,
to discuss the intersection of blockchain and AI technologies, what we should expect from AI agents,
how to handle the content authenticity problem
and why the alignment problem in AI is really a human problem.
Illia, welcome to No Priors. Thanks for doing this.
Thanks for inviting me.
You are one of the authors of the original Transformers paper.
We've also had Noam and Jakob on.
How did you get involved with that seminal work in AI?
I worked on a team doing natural language understanding,
focused on question answering,
and the state of the art at the time was LSTMs, recurrent networks,
which you could not launch in production at all because they were too slow and took a fair bit of time to process at document scale.
So Jakob at the time was using attention for query similarity, and he had this idea of using attention for an encoder-decoder setup.
I kind of jumped into it, and with Ashish we were playing around with whether we could actually get it to train, understand the order of words,
and do translation just based on, you know, attention.
So, yeah, it was pretty cool to explore that
and obviously grew into something very interesting and awesome.
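The attention mechanism described here can be sketched in a few lines of Python. This is a minimal, single-head, plain-list version for illustration only, not the paper's actual (batched, multi-head, learned-projection) implementation:

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """Minimal single-head attention over lists of float vectors.

    For each query: score every key with a scaled dot product,
    softmax the scores into weights, and return the weighted
    sum of the value vectors.
    """
    d = len(keys[0])  # key dimension, used for the 1/sqrt(d) scaling
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

When a query closely matches one key, the softmax weight concentrates on that key and the output approaches the corresponding value vector, which is the core "soft lookup" behavior the Transformer builds on.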
You originally co-founded NEAR in, I think, 2018,
intending for it to be an AI-focused company.
What was that initial mission,
and how did it become a blockchain company?
Yeah, so we started with this idea that we wanted to teach machines to code.
You know, we had Transformers coming out.
There was a lot of really interesting
push in '16, '17 around AI.
And so our expectation was we would kind of ride the exponential growth of AI, which has happened
this year; we thought it would happen in '17, '18.
And so with that, we got a really interesting data set around language to code.
But more interestingly, we had a whole community of developers, mostly students, who were
doing crowdsourcing for us.
So we would give them code, they would write descriptions; we would give them descriptions, they would write code, write tests, all kinds of tasks. And we actually faced a challenge paying them, because a lot of them were in China, in Eastern Europe, and other countries where there are monetary-control problems and people don't have bank accounts. And so we started looking into blockchain just to solve our own problem.
The AI kind of expansion, explosion, didn't happen at the time,
and so we saw an opportunity: we could actually build a blockchain
that we would use to solve this first and focus on that,
while kind of waiting out the AI thing to really happen.
And as you go down the blockchain rabbit hole, you realize
there's a lot more than meets the eye. Yeah, yeah, it ended up being a pretty big mission.
Exactly. So you call NEAR a blockchain operating system,
for any of our listeners who haven't used it.
Like, what does that mean?
So the idea is that we want to kind of go upstack, right?
We want kind of an environment where you can discover and use Web3 experiences,
you know, benefit from them, and not need to think about the low-level, you know,
implementations and quote-unquote hardware that runs under it, right?
So similar to how the operating system on your phone, you know, kind of abstracts out all the complexity
of networking and payments and everything, you just use it, and you have apps that developers can
build. And so that's really what we're trying to achieve: kind of build this framework
and platform for everybody to build their applications in Web3 and really deliver them to the
user and the consumer. Where do you see a lot of the overlap coming in terms of Web3 and
AI? You've thought very deeply about both. I remember when I first met you, you were just
switching from sort of NEAR's original mission to the blockchain-based mission. And,
you know, you were known as a team that could literally build anything, right? Like, you had
yourself and Alex and Pai Guy and all these amazing people. And you went down the direction
of building a blockchain, in part, I think, originally around this data-labeling kind of mission and
the ability to do payments and things like that. And now I know you've been thinking a lot
again about how these two worlds interact or intersect. Where do you think are going to be the
biggest places of overlap between AI and blockchain or Web3? There's a few levels of interesting
intersections. I think the most obvious one that everybody talks about is various marketplaces for
resources, be that compute, model, or data, right? So data crowdsourcing. So those are pretty
obvious, right? Web3 is really good at creating marketplaces, creating traceability, and
providing an equitable place for everyone to participate. Now, the more interesting one is
where AI agents come in, right, which we've seen, like, initial versions of, but obviously
they're going to continue evolving.
If you equip them with a blockchain account, right,
they now become economic agents
that are able to pay other people
and pay other AIs to do work, right?
And they can communicate, right?
And I think one of the things that a lot of people,
who are like, oh, these language models
are just the same advancement
as everything before,
are missing is that this is the first time
that a machine is able to communicate with people
in the same way, right? There's no more need for an intermediate human who interprets data
and then tells it to other people. Now the machine can communicate directly with people. And so
it can task them with work; it can provide them context. And so I really think one of the most
interesting cases is organizations that are run completely by AI, right, where the quote-unquote
CEO role is taken by an AI agent, who is tasked by, you know, the community or board of
directors or whatever the oversight governance is,
to hit specific KPIs and follow a specific mission.
They can even give specific feedback with training data
when they don't think it's doing the right job.
But what it does is, like, create this kind of new layer of management
that potentially removes a lot of middle management,
which right now is, like, transforming information and context for each individual person,
giving them a specific area of work, and then gathering their creativity
and putting it back together.
I think that's a very interesting use case that kind of really
melds blockchain and AI together.
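The idea of an AI agent holding its own blockchain account and paying workers on delivery can be sketched with a toy in-memory ledger. Everything here (the `Ledger` class, the account names like `ceo-agent.near`) is invented for illustration; it is not NEAR's actual API:

```python
class Ledger:
    """Toy in-memory ledger standing in for a blockchain's balances."""
    def __init__(self):
        self.balances = {}

    def open_account(self, name, deposit=0):
        self.balances[name] = deposit

    def transfer(self, src, dst, amount):
        if self.balances.get(src, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


class AgentAccount:
    """An AI agent as an economic actor: it holds a funded account
    and can task workers (human or AI) and pay them on delivery."""
    def __init__(self, ledger, name, budget):
        self.ledger, self.name = ledger, name
        ledger.open_account(name, budget)

    def pay_for_task(self, worker, price, result_ok):
        if result_ok:  # pay only when the delivered work is accepted
            self.ledger.transfer(self.name, worker, price)
        return result_ok
```

The point of the sketch is just the shape of the loop: once the agent has an account, "hire, accept, pay" needs no human intermediary.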
Why, like, if you have a traditional
biotech cancer research commercial entity,
why blockchain and why AI for that?
I use this example, right?
We want to, you know, continue making progress
on solving cancer, right?
And it's a very complex problem, right?
There's a lot of like specific sub-cancers
that, you know, need research.
And so all of this and like coordinating people
doing experiments, propagating information,
and recruiting, you know, people recruiting the candidates, right?
All of this requires, like, somebody to do this work and kind of organize the process
and really set up a lot of pipeline and, you know, funding and all those things.
And right now there's so much overhead around everything from, you know,
how grant funding is allocated from the nonprofits that collect money for research,
how, like, experiments are set up, the information sharing.
Like, all of those pieces are really kind of broken.
And so you can actually have, you know, a coordinated effort
that is designed just to do that,
and it can consume all this information
and kind of specifically task
who the best person is to do
an experiment, or which lab is the best
at doing a specific set of experiments,
fund them for this
amount of money, you know, oversee
their delivery, and then
kind of iterate. And, you know, if it thinks
a lab is not doing a good job, fire them,
without having, like, the extra
personal affiliations
that, you know, people do have.
I'm actually excited that some
folks are already building some examples of this in, like, simpler
forms, but I think we'll see, you know, the first organizations like this
probably even this year, potentially with simpler missions and kind of
more straightforward KPI metrics, but where kind of this information
propagation and onboarding of people happens already through a kind of
language-model AI agent. A simpler version of this that I've heard people talk about,
and it may be the first step towards it, is actually providing
on-the-job feedback via an AI versus, like, a human manager, with the idea that it depersonalizes
the feedback, right? So if you have an agent or an AI providing feedback, some surveys at least
have suggested that the average employee may be more comfortable with that, because it feels
more objective, it feels depersonalized, it feels like it can be provided in a directive way.
And it seems like that's one aspect of sort of this AI-as-CEO concept that you're describing.
Do you think the first place that it'll show up is DAOs, or do you think it'll show up in a different
part of the community?
Yeah, I think DAOs are, and especially with what happened with DAOs, there were a lot of people
who were really excited about DAOs kind of as a concept, and so they put a lot of time into
running them.
But it's actually a very, like, not interesting job, right?
It's like, you onboard new members, you explain to them all the same things, you know,
you answer their questions.
And so that's the part which, like, you can already automate, right?
You can, like, have a Discord bot that has all the context about the DAO's,
you know, interactions, and kind of onboards new people and gives
them, like, new, you know, tasks to start with and kind of coordinates them.
So I think that will be the first place where this kind of starts showing up, also
because you have, like, payments right there and you don't have the social constraints
that you usually have in regular organizations. Like, you know, a lot of people will revolt
if you, like, tomorrow say, hey, by the way, your new boss is this AI model.
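The Discord-bot style of DAO onboarding described here can be sketched as a simple stub. A real version would sit on top of a language model with the DAO's docs as context; the FAQ entries and task names below are made up for illustration:

```python
class OnboardingBot:
    """Toy DAO onboarding helper: answers common questions from a
    fixed knowledge base and hands each new member a starter task.
    A real bot would wrap a language model over the DAO's documents."""

    def __init__(self, faq, starter_tasks):
        self.faq = faq                  # question -> canned answer
        self.tasks = list(starter_tasks)
        self.members = []

    def answer(self, question):
        key = question.lower().strip("? ")
        return self.faq.get(key, "I'll escalate that to a human contributor.")

    def onboard(self, member):
        self.members.append(member)
        # round-robin assignment of starter tasks to new members
        task = self.tasks[(len(self.members) - 1) % len(self.tasks)]
        return f"Welcome {member}! Your first task: {task}"
```

Even this crude version automates the repetitive part of running a DAO: greeting, answering the same questions, and handing out first tasks.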
Yeah, yeah. How do you think about AI in the context, or I should say blockchain and AI in the
context, of things like alignment? Yeah. So I think this is a very interesting topic. So I have this
view that we need human alignment instead of AI alignment. So right now, kind of when we talk about,
you know, hey, we need to align AI with, like, human values. But the reality is that, you know,
all the problems that exist, they all exist because of humans doing things, and they've existed
before. I actually like to use the Byzantine fault tolerance problem, right, which is the basis for
blockchain, but its roots are in history, where there were people propagating misinformation
and you were trying to figure out how to prevent misinformation in the army.
So this is a really old problem of misinformation and kind of how to work around
that.
And so I think what we need to start doing is figuring out how do we build a society that is
actually able to deal with kind of effective misinformation at scale.
So, like, a lot of our society has started building up tolerance to misinformation around, you know, TV and mass media, but we don't have a system and framework for dealing with it at scale. And that's what AI brings: just scale, to the same problem. And so this is where reputation, identity, and kind of the systems around our social, like, code operating system that powers our communities are really important, right? How do all these
pieces work together, and how do they actually operate when there are malicious actors who
potentially are able to, you know, en masse create very personalized misinformation, or
create, you know, a fake political actor that is convincing every individual of exactly
what they think, you know, the government should do, in order to get elected? And this is where
Web 3 comes in as like a set of primitives, right? We have cryptography to authenticate content and
create a provenance trail, everything from, you know, when you take a picture with your camera. Some of them already
have a secure enclave that can sign the image that's taken. And so as that image gets processed,
we can actually propagate that information and have a proof that it came from, you know,
a specific time and place and was then processed by a specific set of filters, right. So that can
give you like an anchor. Then you still need to know kind of who is publishing what, right? Like we're
recording this podcast, you know, people listening to it, it could have been completely generated
at this point, right? But if, for example, we all sign the, you know, the final podcast and
say, hey, yes, we've recorded it and this is that content, now when somebody's listening to it,
they can check that indeed, hey, this content is signed by us. Now, the question of us comes in,
right? So this is where kind of identity and reputation is important. And so this is where
kind of on-chain identity becomes your kind of coalescence of all of the content and all the
interactions that you do. And then that links to kind of, you know, reputation in different communities
and provides context for people who are watching this content, to be able to understand,
you know, who is this person talking, where they're coming from, and what
information values they have. So I think it needs to be a kind of systematic approach. And
it will start with pieces, right?
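The sign-then-verify flow Illia describes for the podcast recording can be sketched with Python's standard library. Real content-provenance systems use public-key signatures (for example, Ed25519 keys held in a device's secure enclave); HMAC is used here only to keep the sketch stdlib-only and runnable:

```python
import hashlib
import hmac

def sign_content(secret_key: bytes, content: bytes) -> str:
    """Return a tag binding content to the holder of secret_key.
    Stands in for a real public-key signature from the publisher."""
    return hmac.new(secret_key, content, hashlib.sha256).hexdigest()

def verify_content(secret_key: bytes, content: bytes, tag: str) -> bool:
    """Check that the content matches the tag, i.e. it has not been
    altered since the key holder signed it."""
    return hmac.compare_digest(sign_content(secret_key, content), tag)
```

The listener-side check is the point: anyone with the published tag can detect a tampered or fully generated substitute, because any byte change breaks verification.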
I think one of the important pieces will be
kind of a green lock,
similar to how SSL transitioned,
but on the content, right? Like, as you go to
YouTube, as you go to the
New York Times, you actually will see
that, like, hey, this content has been signed by
this party, and this party is
in some trust root or trust
graph of
communities that you are following.
So that's probably, like, one important
piece. And again, blockchain and cryptography
are just, like, tools to enable that
product experience. And then from there, you know, we need similar things on the government
level, right, when you file paperwork, when you file, you know, your identity. The fact that your
SSN is a, you know, number that you give to everyone, which is, like, supposed to be secret, is,
for example, ridiculous. So things like that, all of this needs to improve and kind
of upgrade to this new level, where these kinds of things happening at massive scale
are now possible. What do you think is the most likely form of
blockchain-based identity because, you know, the blockchain really has been the earliest place where
you've had programmatic actors interacting around economic and other utility functions, right? It really is
money as code. And effectively, smart contracts are ways to programmatically interact with that, right? So
you had almost like the execution layer without the intelligence, and now we're adding the
intelligence. You have the cryptography, but you're missing a real sense of identity, which is needed
if you have an agent or bot representing you interacting with another agent, which is probably where
a lot of things will work in the future online. What do you think is the most likely form of
identity on the blockchain and why hasn't it happened yet? It has happened to some extent, right?
We have, you know, like millions of people actually using blockchain right now and they're using
it more for financial use cases, and kind of that's their financial identity. The wallet-is-identity
kind of thing. Yeah, the wallet has become an identity, right? And the reality is, like, your quote-unquote
private keys are your identity, but that's just too hard of a concept for people to actually work
with, right? And so on NEAR, we actually changed that. You know, you have a properly named
account, so, like, mine is root.near, which can have lots of different private keys accessing
it with different permissions, right? I can give a key, and in a way permissions, to an agent to, for
example, interact on its behalf, or I can revoke it, right? I can give it to a specific application,
etc. So, like, a more extensible model is needed. That's one. We need to have more social interactions
kind of being spawned from this.
And so this is, again, the blockchain operating system
actually powering social interactions
and kind of communication.
We actually have a project working on chat
and other ways of using now this identity
in more places.
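The named-account model Illia describes, one account with many keys, each scoped to different permissions, can be sketched as follows. This is a toy model loosely inspired by NEAR's full-access vs. function-call keys; the class, method names, and key names are invented, not the real protocol API:

```python
class Account:
    """Toy model of a named account (e.g. "root.near") holding many
    access keys, each allowed a certain set of actions."""

    FULL = "*"  # marker for a full-access key

    def __init__(self, name):
        self.name = name
        self.keys = {}  # public key -> set of allowed actions, or FULL

    def add_key(self, pubkey, allowed=FULL):
        self.keys[pubkey] = allowed

    def revoke_key(self, pubkey):
        self.keys.pop(pubkey, None)

    def authorize(self, pubkey, action):
        allowed = self.keys.get(pubkey)
        if allowed is None:
            return False            # unknown or revoked key
        return allowed == self.FULL or action in allowed
```

The extensibility point is that delegation becomes key management: an agent gets a narrowly scoped key instead of the account's full authority, and revoking that key withdraws the delegation without touching anything else.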
It's mostly because we didn't have a critical mass
of these applications that are using this identity
for it to really become kind of the core.
And if it's not the core, it's not as useful,
because everybody is, you know, like, hey, you don't have it,
so, like, we're not going to use it
as a default thing everywhere.
So, like, we really need to kind of get over that hump.
Like, again, I think SSL is a really good example of something that delivers
value.
It's clearly valuable.
But it was such an uphill battle to get it there, right?
And so I think, like, until you had this critical mass of websites switched over
and browser support, it didn't become a default, right?
So we kind of need the same thing to happen here.
Like, we'll need to have a critical mass of applications using this
identity, and then
we kind of
seed it, like, in browsers or
wallets or whatever applications
hold it, and then we'll see a transition
function happen, where it's like, hey, oh, you don't have it?
Like, you should get it, because it's actually easier
and better to use, and it gives you,
like, more financial freedom as well, and more
upside. Where do you think
the most likely failures,
like, system-wide, are
going to be, like, with
growing capabilities in AI? Like,
where are these mitigations,
in terms of reputation systems with blockchain
or content provenance, likely to,
how is it going to manifest in ways that affect us?
Yeah, I think there will be, probably next year,
it will be very interesting in the US,
because I think this will be a place
where everybody will just take whatever toys
they have in the toolbox and use them, even just for kicks, right?
Even if it's not malicious, although some players will be malicious.
And I think we'll see everything from, like, completely fake
narrative candidates to, like, I would be very interested to see, like, a web page where
you land and, you know, you log in, and it literally generates, specifically for this user,
based on their interests, an agenda for this candidate, right? So, like, hyper-focused, you know,
marketing for candidates based on, like, who this voter is, right? So things like that.
Like, we'll have all those possible things, where the media will kind of proliferate.
It's like, you know, you can spin up new media right now and just generate content about the
candidate that you want, and then market that. So, like, you can have all kinds of things
now just exploding, without any way of, like, framing it on the user side: like, does this have
history, is this coming from the right sources, has this been validated, right? And so I think that's
going to be really important. I think the other side, actually, is law enforcement, and this is,
sadly, already happening.
People are using these tools in very malicious ways right now, and law enforcement
doesn't have, like, really good ways to deal with this.
And so I think everything from this, like, on-camera signing, we need this now.
Like, they really have no way to identify if an image was generated or not.
And similarly for, you know, audio recordings and things like that; there need to be
kind of additional levels of verification.
And this goes into, actually, video calls and voice calls,
because right now somebody can call you on the phone
and play a generated audio
of somebody they recorded 30 seconds of.
And this can be used for very nefarious means, right?
It's a huge consumer fraud problem already.
Well, it's huge on the consumer side, but beyond that
it's becoming, like, a real criminal problem.
Like, criminals are able to use these tools
now, and the barrier to entry there is very low. And so this is where
you really need, like, you know, for phone calls and kind of all of this, more
identification and, like, cryptography embedded into the system. Otherwise,
it's completely going sideways really quickly. Yeah, this is where people would be using APIs like
ElevenLabs to create a voice snippet, right, where they'll upload, to your point,
30 seconds of voice, train a model, and then the output
sounds close enough to the person that you could fool a financial advisor or a bank or somebody
else into doing transactions on your behalf, or things like that. Yeah. And if you, like, swipe their phone,
and now you're able to impersonate them completely, right? So yeah. So this is, like, a real
problem, and having kind of an authenticated path is required there to really stop it. And, like,
the phones actually have so much already: we have Face ID and fingerprints; we have,
there are secure enclaves that sign things,
which, like, haven't been hacked,
as far as I know. So there's, like, a
lot of the pieces
that are there. Now we just need, like, a product stack
that actually pushes it
to the user and, like, to the products.
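The chain-of-custody idea behind this, a device's enclave signs the capture, and every later processing step is recorded on top, can be sketched as a simple hash chain. This is an illustrative construction only; a production system (for example, one following the C2PA approach) would use actual public-key signatures per step:

```python
import hashlib

def link(prev_hash: str, step: str, data: bytes) -> str:
    """Hash-chain one processing step onto a media item's provenance."""
    h = hashlib.sha256()
    h.update(prev_hash.encode())
    h.update(step.encode())
    h.update(data)
    return h.hexdigest()

def provenance_chain(steps):
    """steps: list of (step_name, data_bytes), starting with the capture
    step as a device enclave might attest it. Returns the chained hashes;
    the final hash commits to the entire editing history."""
    chain, prev = [], ""
    for name, data in steps:
        prev = link(prev, name, data)
        chain.append(prev)
    return chain
```

Because each hash commits to everything before it, altering the original capture or any intermediate edit changes the final hash, which is what lets a verifier detect a forged or spliced-in history.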
Yeah, that makes sense. I guess one
other area where some people have talked about
overlap between the
blockchain world and the AI world
is around training
and there's almost like two or three different forms
of that. One form of that is
there's a lot of GPU capacity that was purchased for mining on the crypto side. And given how
valuable GPUs are now on the training side, there are all sorts of models to aggregate GPUs
specifically for training in different ways, you know, aggregating excess capacity. And then, separate from that,
there's ideas around, well, can you train a model in a distributed way across a blockchain more
generally? Do you think either of those things are concepts that will work, or how do you think
about them relative to the future? Yeah, I mean, it's interesting,
because it sounds like such a no-brainer that, hey, let's grab those GPUs: for
example, Ethereum just moved from proof of work to proof of stake, let's grab those and
start using them. The challenge is, the GPUs there are, like, not the ones that AI folks want
to use, right? Like, kind of all of AI has really zeroed in on, how do we get A100s
or H100s? And the GPUs that folks used for Ethereum mining and the like are all
ones that are not as focused on, like, floating-point arithmetic, for example. And so
the challenge was more around, like, the people who did that. CoreWeave is probably a good example, right?
They were a mining company, but it's more that they had the know-how to build data centers, and
they could, like, talk to NVIDIA and get massive access to GPUs,
versus, like, repurposing the same GPUs. Although, I mean, obviously, for smaller models, for some
specific,
maybe, inference things,
there's maybe a transition.
Then there's the question of
decentralized training, right,
in general, right?
Like, hey, we have, like,
lots of GPUs everywhere,
can we train on them?
And the reality right now is
the requirements on bandwidth,
right: people
training these models right now,
they have, like, a,
you know, 800-gigabit
interconnect between the GPUs,
right? And maybe you have
100 megabits
between these, usually not even that,
and you need to, like,
replay and work around problems for the decentralized case.
So I think decentralized training right now is, like, still not as realistic, although there's
some research people are trying.
I think inference is really interesting, because we do need so much more compute for
inference than we need for training, right?
Like, it's a very interesting economy of scale:
you train once, like Llama was trained once, and then everybody runs it everywhere.
And so inference is where I think there are a lot of interesting cases.
One is, you want it to be private, right?
Right now, if you're doing inference, you need to send it to some service,
and that service may or may not record both the input and the output.
The second one is, you want large capacity that can scale with more usage, right?
Tomorrow I have 10x more users;
I want to be able to scale with that.
And so this is where I think using some of this hardware that exists, as well as
kind of leveraging maybe new methods of privacy and coordination, which, again,
crypto has, like MPC, multi-party computation, zero-knowledge proofs, etc., they can
be leveraged to achieve that and have kind of secure, decentralized inference. So I think
that's way more realistic than training, and also way more needed. And then, I guess, one of the
really early applications that NEAR was thinking about was data labeling and, to your point,
the ability to pay people who are doing data labeling for AI purposes, right? And since that time,
I think a number of companies have really grown up in the data labeling world in a
centralized way. There's Scale AI, there's Surge, there's a few others. Do you think the best
solution in the long run is still a decentralized model, where you're using tokens to pay effectively
for labeling? Do you think things will stay in the centralized world? Like, how do you view all that
evolving over time? Yeah, I think a decentralized, kind of Web3, marketplace is a more effective way
to do this, and it kind of provides a few interesting benefits. One of them is that it opens up
the market, right, where you don't need to set up, like, a local office and kind of hire
people and train them, et cetera. Like, you can just open up a global market; anybody can join.
And you have very specific rules, right, that, if they follow, they get paid, right? I've used
Mechanical Turk before, for example, and as a client, you can actually just decline to
pay the workers, right? So on Mechanical Turk, the workers have very little ability
to push back if I decline; at the same time, they don't have any, like, quality and knowledge
assessment on the platform, right? So I think having quality and knowledge assessment and this kind of escrow
model all embedded into one marketplace that opens up for everyone, where, you know, anybody
anywhere can get paid at any time, offers both what the people doing this work
want, because they're actually more protected and it's a fair game, and what the people
who want to give tasks want: they can actually get access to a way larger workforce, they can,
like, specify specific parameters, they can, you know, price it at whatever level they want. That's
going to be the kind of future of it. Can you talk a little bit about what makes the quality control
problem for annotation hard here? Right. Because one thing that I've seen
with significant research labs is, like, still continued insourcing of annotators for both pre-training
sets and RLHF, because some of the external services and marketplaces can't get to the level of
quality that they're looking for in particular domains. So can you just describe the dynamics there?
Yeah, so I think there are two parts. One is, like, domain knowledge, right? That's generally hard;
like, it's hard to tap into with a specific centralized service, because, for them to do payments, to do all the things, they need to set up a subsidiary in whatever country they have the workers in; they need to train them; they need to hire them, or maybe it's contracts; like, they need a lot of overhead to do that. For example, developers: let's imagine, you know, you're building a new, really cool developer platform which uses, you know, language models, and you want to fine-tune on code, right? Well,
for the existing platforms, that means, like, them hiring a bunch of developers to actually do this, right?
And, you know, if they're doing this full-time, it's, like, super complicated.
Then there's kind of building out the validation tooling for how to, like, cross-validate that the work has been done.
Now, on the Web3 marketplace, you know, any student can join and, like, do this, right?
They don't need to, like, you know, get a contract with a specific company.
They don't need to have the company in their local region to work
with them. And, like, students, you know, for coding, for example, are really interested in doing
this, because they usually don't have much money, and this is a way for them to practice their
work anyway. And then, as a task giver, you can actually specify the specific way you want
the cross-validation to happen. And one of the things we've done is, like, honeypots, right,
where you actually specify specific types of incorrect answers that people need to mark as incorrect,
and otherwise they actually lose their buy-in. And so there's actually, like,
very clear, like, economic game theory, where people have buy-ins, and they lose them if they,
like, do poor-quality work, and so they have, like, way more incentive to do this, versus, like, let's
say, if you're working on a contract, there's, like, way more leeway, usually, if you're not doing
your work right. So there's just way higher kind of self-evaluation as well that happens. And so,
I mean, there are a lot of pieces that need to come together for this to be
high quality. But again, it just opens up this marketplace and makes it effective. And it, in a way,
removes a lot of the human part as well. One thing that I think is really neat about how NEAR
approaches innovation is that you do both internal, sort of, NEAR roadmapping and product development,
and then you also have a series of things that you either spin out or spin up or are sort of
involved with, as sort of these ancillary companies or projects or efforts. What areas are you most
excited about over the coming year, in terms of either NEAR or some of these other efforts
that you're involved with?
So we do actually have a project in this Web3 AI data marketplace space that we are spinning
out to focus on it.
They've built the product.
They have all the pieces.
Now it's ready to actually go to market and bring in customers.
I think the really interesting area is kind of partnering with existing teams, either
already Web3-enabled or interested in Web3, who want to give their users access to
more functionality, right?
We have, for example, Sweatcoin, which is a really good example: it was a Web2
project that had 120 million installs, that had a ton of people using it every day, kind of
for a very specific use case, right, kind of tracking their steps and, you know, maybe getting
a discount on their next shoes.
But now, as they're transforming to Web3, they're kind of opening up, right? You can now
participate in economic activity, you can, you know, learn about new kinds of innovations that are
happening in the ecosystem, and, like, as they integrate more into the
blockchain operating system, you can potentially interact, like, on the social side, do the
tasks and gigs. And so you kind of really open up what before was, like, a very
limited kind of economy to really this, like, you know, composable open web. I think that's really
exciting, and, like, we will probably see more and more examples of that. And finally, I'm really
interested in kind of, as I mentioned, like, because we have now open web and social wear,
the kind of what I call future of SaaS.
So I think a lot of, between Web3 and AI,
a lot of SaaS will actually start being replaced.
Because right now, what SaaS is, is one database with a specific UI for a specific problem. The database is basically the same between the CRM, the hiring tool, the marketing tool, even some of the project management tools; the underlying database is not that different. It's really just the front end that changes. And interconnecting all of these databases is a ton of work, and it always breaks. But now you can have a database that you own, using Web3 tech, and then build all of these front ends on top, either through blockchain operating system shared components or even by describing, in natural language, some of the interfaces and business processes you want to have. So the way people interact with their business operations and
all the tooling they need will start to change.
And so I'm really excited about this space.
And we have one company that is starting to build out some of the things in this space, and over the next year we'll see that evolving.
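The "same database, different front ends" point can be sketched in a few lines of plain Python. The record schema and tool names below are made up purely for illustration; the idea is just that a CRM, a hiring tool, and a project tracker are mostly different projections over one shared store.

```python
# Illustrative sketch: several "SaaS tools" as filtered views
# over one shared record store that the owner controls.
# The schema and field names here are hypothetical.

records = [
    {"type": "contact",   "name": "Ada",         "stage": "lead"},
    {"type": "candidate", "name": "Grace",       "stage": "interview"},
    {"type": "task",      "name": "Launch page", "stage": "in_progress"},
]

def view(kind):
    """A 'front end' is just a projection of the shared store."""
    return [r for r in records if r["type"] == kind]

crm_view    = view("contact")    # what a CRM-style UI would show
hiring_view = view("candidate")  # what a hiring-tool UI would show
tasks_view  = view("task")       # what a project-management UI would show
```

Each view could then get its own generated interface, while the underlying data stays in one place the user owns.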
Do you think that moves to an agent-driven world?
In other words, when you imagine the interfaces on top of this that are sort of driving these business processes for future SaaS applications,
do you view them as sort of traditional UIs or do you view them as agents that are interacting programmatically or some hybrid?
It will be a hybrid.
So, in my imagination right now, at least, I expect you'll be able to describe a business process, like: hey, when we have a new creative from the marketing department, spin up a Twitter campaign and create me a dashboard that tracks conversions on our product, right? And so what it does is create the pipeline of those things, and then it also creates a page where I can see a normal analytics user interface.
So it might be more of a generated, dynamic UI.
Exactly, yeah. And it's adjusted for the specific use case you need, and probably there's a bunch of templates that are fine-tuned for your specific problem. And this is possible right now.
Yeah, I guess it moves you down the path of what you were talking about earlier, AI as CEO or AI as project manager, where you're morphing into a world where you delegate to an AI to drive a bunch of activities and then come back to you with the results, like you would with an employee or a coworker. Which is very different from the world of UI today,
where you just go to the same spot to see analytics,
you go to the same spot for communication,
which is your email,
you go to the same spot for,
you know,
interacting with the workflow.
And you're saying this should be more of a dynamic world
where things get brought back to you
based on a series of tasks that you send out.
Yeah.
And it's like probably a shared environment as well
where, you know,
we probably will co-work on a business process
and, you know,
we'll share one display,
but then we'll maybe fork it
because I'm more interested in conversion
and you're more interested in retention,
for example.
And so that's the kind of dynamism that also doesn't exist right now, where we all look at the same Jira task management board and I don't really care about half of this stuff, right? But it's not a filter problem; it's that I want different information shown in a different way.
As an author of the paper that changed the world, here we are in 2023: is it bigger Transformers all the way, or are there other architectural directions worth thinking about that you're paying attention to?
I think there's definitely something around how we get these models to have the capacity to let themselves think before outputting, or to process more.
And I think it's still within the Transformer structure, and it can be advanced. But I haven't seen anything that really matches my intuition around this yet. I think the simplicity of this architecture, and indeed the amount of optimization going into it right now, will just be really hard to match. And given enough expressivity, you can express any function.
So the problem at this point is not, hey, we don't have enough expressivity. It's more around: how do we compose a dataset that's cleaner and better, or add some self-critique and understanding of whether this content is correct, or whether I need more time to think, versus, hey, I'm forcing you to output the next token even if you don't have an answer yet. So I think we really need those parts, and I think they fit in the architecture, but they just require more engineering and more different types of tasks for training as well.
I think the fact that we're just using a big language model is kind of interesting, because predicting the next token is not the task you'd expect everything to fall out of. So, RLHF is obviously already helpful, but also starting to ask: hey, can you critique this answer, what would be a better answer, et cetera.
Do you view that as a training or fine-tuning thing, or do you view it as an inference thing?
I mean, it's going to be a combination, right? I think we just need an architecture where you're able to do this at training time. The simplest thing is, instead of outputting the next token right away, you can actually give it an empty token, for example, for some period of time, and then when it says, okay, I'm ready, let it output the next token. This way you can train it to think more before outputting. And then at inference time, you can vary it: hey, I'll give you more time to think, or, no, you have no time to think. But you can also train it to dynamically decide when to output. So again, this is a very simple thing, but you can keep expanding on it: output something, feed it back, ask whether this is the right answer, et cetera. So there are a few different models.
But I think, to Jakob's point, the fact that this model is doing a really effective search in this knowledge space means that pushing more into that direction is probably more useful than doing more search at inference time, because doing search at inference time means you've already lost all the semantics.
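As a toy illustration of the "empty token before answering" idea described above: at inference time the model could emit a special thinking token that adds internal computation without visible output, until it signals it is ready to answer. The token names, the model interface, and the budget parameter below are all hypothetical, not from any shipped system or from the Transformers paper.

```python
# Toy decoding loop for the "think before outputting" scheme:
# the model may emit <think> tokens (internal steps, no visible
# output) until it emits <ready>, after which it answers normally.
# The inference-time budget max_think can be varied, as discussed.

THINK, READY = "<think>", "<ready>"

def generate(model, prompt, max_think=8, max_tokens=20):
    """model(tokens) -> next token; here a callable stands in for a real LM."""
    tokens = list(prompt)
    thinking = 0
    while thinking < max_think:
        nxt = model(tokens)
        tokens.append(nxt)
        if nxt == READY:           # model decided it has thought enough
            break
        thinking += 1
    output = []
    while len(output) < max_tokens:
        nxt = model(tokens)
        if nxt is None:            # stand-in for end-of-sequence
            break
        tokens.append(nxt)
        output.append(nxt)
    return output

# A stub "model" that thinks twice, signals ready, then answers:
script = iter([THINK, THINK, READY, "42", None])
answer = generate(lambda toks: next(script), ["what", "is", "6*7"])
```

The visible answer contains only the post-`<ready>` tokens; the thinking steps consumed compute but produced no output, which is the point of the scheme.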
I think you made a really interesting point, which is that it's possible the Transformer architecture is increasingly getting locked in. And there are two components to that: one is that it just seems to run really well on the main silicon we're using right now for AI, which is GPUs. And secondly, there's so much optimization work going into it, and so much being built around it, that it effectively creates optimization that just won't happen for other models anytime soon. So you end up with this interesting feedback loop, or lock-in effect, for this set of models. Do you think we're in a spot now where this is just the future for the next five or ten years? Or what do you think is the likelihood that other approaches or architectures emerge anytime soon?
I mean, there might be another architecture that reasonably fits the same silicon. There's an interesting example of a company that built an alternative silicon that allows you to process things in pipelines. The chips are actually smaller compute tiles, but they're all arranged in a grid, and the data flows from one side to the other. On one hand, it's a really interesting architecture and you can build really cool things with it, but it doesn't fit Transformers very well. You can run Transformers on it, but it doesn't fit well, and the cost-to-output ratio you get is not that interesting.
That's in comparison to just optimizing on GPUs or using some of the new hardware accelerators. And this is an example where, I mean, I don't want to speculate on a specific company, but I wouldn't expect a ton of people lining up, because there isn't a ton of alternatives to Transformers coming, and somebody would need to go in and develop a lot of new architectures that fit that hardware better. So it will be really hard for them to be a viable business and have the economies of scale that NVIDIA has right now to continue optimizing and building the best state-of-the-art chips. So unless somebody really invests in this, I think it will be more about what else we can do with current silicon, and combinations of that. And then maybe something new will come out.
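As a rough software analogy for the dataflow design described a moment ago (purely illustrative; this says nothing about any specific vendor's chip), each input streams through a chain of small compute stages rather than being handled by one large processor:

```python
# Toy analogy for a dataflow/pipeline accelerator: each input flows
# through a sequence of small compute stages, like data moving across
# a grid of tiles from one side to the other. Purely illustrative.

def pipeline(stages, stream):
    """Push each input through every stage in order."""
    out = []
    for x in stream:
        for stage in stages:
            x = stage(x)
        out.append(x)
    return out

# Two tiny "tiles": double the value, then add one.
doubled_plus_one = pipeline([lambda x: 2 * x, lambda x: x + 1], [1, 2, 3])
```

The mismatch Illia describes is that a workload like a Transformer, dominated by large dense matrix multiplies, does not decompose naturally into this kind of stage-by-stage stream.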
Yeah, but when things lock in technologically, they tend to lock in pretty strongly until there's a really big sea change, or until the optimization of those things hits an asymptote. And it's interesting, because I think a prior example of this kind of chip-plus-software reinforcement loop was the Windows and Intel monopolies of the '90s. They used to call it Wintel, for Windows and Intel, because there was such a strong mutual lock-in effect: the chips were optimized for Windows, Windows was optimized for the chipset, and it just kept going from there. And this feels like a stronger version of that in some sense, where you have the underlying compute architecture and the most important model reinforcing each other in a way that locks both of them in.
Yeah.
And what changed that was pretty much the coming of mobile, right, and the creation of ARM devices, ARM chips that were optimized for mobile and then came back into PCs. So yeah, unless there's a completely new form factor, which is hard to predict. But also, that's a lot of investment, going into not just software, not just hardware, but full-stack innovation.
Yeah, I think it's unclear whether this is a strong enough market force, but there's the short-term supply-demand imbalance around GPUs, with all the growth of applications; especially if any of these applications work, inference needs grow, right? The ability for NVIDIA, really, to build enough GPUs to service the demand is blocking a lot of companies. And I think the question is: there is more incentive to make heterogeneous hardware work than there has ever been. Can that catch up with the full-stack optimization you describe, the CUDA investment that NVIDIA has made? It's super unclear. But there's been no reason to chase that until the past 18 months, and I think now there is.
Yeah.
But at the same time, we have every single large company doing its own hardware accelerator, as well as a bunch of folks who have spun out of those. So we're going to have a market full of hardware accelerators, still optimized for Transformers or at least similarly structured architectures, hitting the market this year and next year.
Yeah.
Illia, this is great. After Elad and I work through all of the Transformers authors, Pokémon style, gotta catch them all, I hope you'll come back for a reunion episode. But thank you for doing this.
Yeah, thanks for having me.
For sure.
Thank you.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.