Waveform: The MKBHD Podcast - How Does AI Actually Work?
Episode Date: October 24, 2023
This week, we have a special episode for you! There has been so much talk about AI over the last year or two, but not a lot of explanations. What is AI? What is the difference between AI and Machine Learning? How do they work? David sat down with Danu Mbanga, Director of Generative AI Solutions at Google, to get to the bottom of it all. This talk switches between a general overview of AI and an in-depth discussion about the meaning of intelligence. Danu has years of experience in this field, so we hope you learn as much as we did! Enjoy.
Links:
Attention Is All You Need Paper: https://bit.ly/attentionisallyouneed
IBM k-nearest neighbors: https://ibm.co/3S6hdtm
Follow Danu Mbanga:
Threads: https://www.threads.net/@devchiral
X: https://twitter.com/dmbanga
Shop the merch: https://shop.mkbhd.com
Instagram/Threads/X:
Waveform: https://twitter.com/WVFRM
Waveform: https://www.threads.net/@waveformpodcast
Marques: https://www.threads.net/@mkbhd
Andrew: https://www.threads.net/@andrew_manganelli
David Imel: https://www.threads.net/@davidimel
Adam: https://www.threads.net/@parmesanpapi17
Ellis: https://twitter.com/EllisRovin
TikTok: https://www.tiktok.com/@waveformpodcast
Join the Discord: https://discord.gg/mkbhd
Music by 20syl: https://bit.ly/2S53xlC
Waveform is part of the Vox Media Podcast Network.
Learn more about your ad choices. Visit podcastchoices.com/adchoices
Transcript
Autograph Collection Hotels offer over 300 independent hotels around the world,
each exactly like nothing else. Hand-selected for their inherent craft, each hotel tells its
own unique story through distinctive design and immersive experiences,
from medieval falconry to volcanic wine tasting. Autograph Collection is part of the Marriott Bonvoy portfolio of over
30 hotel brands around the world. Find the unforgettable at autographcollection.com.
Support for this podcast comes from Huntress. Keeping your data safe is important. However,
if you're a small business owner, then protecting the information of yourself, your company, and your workers is vital.
In comes Huntress.
Huntress is where fully managed cybersecurity meets human expertise.
They offer a revolutionary approach to managed security that isn't all about tech.
It's about real people providing real defense.
When threats arise or issues occur, their team of seasoned cyber experts is ready
24 hours a day, 365 days a year for support.
Visit huntress.com slash Vox
to start a free trial or learn more.
What's up, people of the internet?
People of the internet, yes, it's David.
Today we got a little bonus episode for you.
Don't worry, we've got a regular episode on Friday still,
so stay tuned for that.
But I wanted to dig a little bit deep
into what exactly AI is, right?
I think we all have been hearing about that for months now,
years possibly, but nobody actually has really explained
what it is or how it works, right? People just say that things are AI, but what does that even mean? So I want to give an answer for
that. So I called up my friend from Google who definitely knows what that means. And we have a
nice little conversation about how this all works. So hope you enjoy. Danu was gracious enough to
come on the podcast. And of course, we met at my cafe. Classic, so yeah, we're going to debrief after this, but, um, enjoy.
We've been talking a lot about, uh, AI and generative AI and all of this stuff that's
happening in the world right now. And it's very confusing. So we thought it would be
actually pretty useful if we got someone who knew what they were talking about
to come on the podcast.
We weren't just speculating constantly.
So today we have Danu Mbanga with us.
He is Google's head of generative AI
or director of generative AI.
So we're gonna have a long conversation
about what that means.
All right, so Danu,
if you were to explain to someone, including me, what you do at
Google or as your job, what would that be?
We try to incubate generative solutions into production-grade applications for
companies, startups, and/or enterprises.
So would that include like a company comes to you,
they say, we want to use generative AI,
and then you work with them to actually integrate it
into their product?
That's correct.
But we do have many other teams that really focus
on that long tail work of integration, so to say.
Our interest is to figure out what
are the patterns that are not necessarily
common at this point, or the new ones,
and then really turn
that into 10x scalable packages. As you know, many of these technology items, especially within
the AI space, are fairly new. So they demand new technologies, they demand new approaches to
technologies. And then what we do is try to figure out what the patterns are within this
somewhat open ecosystem at this point, and package these patterns
into applications that we can either give to the teams that are more consistently working
with customers on putting that into their product, or open-source these capabilities so that
folks can use them. So instead of just throwing in a chatbot that is based on a large language
model, you're actually integrating a
specific solution that makes more sense to that company. Exactly. Think about the early days of
programming when people were writing code, right? So you have a bunch of folks writing programs,
and then sometime I think about the 80s and the 90s, there's this common pattern around
design patterns that emerged where some folks would say, hey, put these things together and then it's going to be called
this specific pattern, so to say.
And then based on the design pattern,
you somewhat create a new language
and a new mechanism for people to use technology
in a bit more consistent manner.
So it's what we do.
So we try to understand
what are the design patterns of AI and Gen AI
and then put that into either technology
and or educational artifacts for people to use.
Yeah, so I want to get into a little bit
about what AI actually is,
because we talk a lot about AI and generative AI
and all this stuff on the podcast,
because it's like the only conversation happening right now
and for the last year.
But I think that something that confuses a lot of people
is the fact that you see all these companies that are saying, we have AI now, we have AI now, and nobody really
knows what that means. Sometimes it means they added a large language model chatbot. Sometimes
it means they added some stuff under the hood that is doing a lot of work. Sometimes it means
they're just rebranding something that wasn't really AI into AI. So in your words, what is AI in relation to
what we're seeing in the industry right now? So to me, right, AI is a system, so to say.
It's a collection of tools, techniques, science, and engineering capabilities.
And when we get to talk about generative AI, I'm also going to talk about it in terms of a system, because I don't think it's necessarily one single thing. And it has
evolved over time. But if you look at that system, that AI system overall, it's, again, like I
say, the collection of tools and technologies that are really geared towards providing human
cognitive capabilities to computers, and making it so that these computers
can accelerate the processes through which we produce different things in technology.
So you can think of AI as being a collection of planning and scheduling and sensing the world
around us and understanding that world into a set of cognitive containers, so to say,
and then being able to do other things out of that level of understanding.
So AI is somewhat bringing intelligence, human intelligence, analogs to the computers overall.
And so I know that's a bit of a complex definition, but that's where we are now in
understanding that as a system.
And when you try to break that down into what that really means in terms of technology, then it comes in three major forms.
The first major form, which is the umbrella term, AI, encompasses things like, like I said, planning, sensing, scheduling, and then processing the data that you sensed with a certain set of tools.
And then these tools are usually borrowed from the mathematical worlds of statistics and probability.
And then combining the collection of these tools is what we traditionally call machine learning.
So AI is bigger than machine learning.
And then within machine learning, you have a set of tools that are mathematical, statistical, and whatnot. And then a subset of these tools, which
is also the essence of generative AI, is a family of techniques called deep
learning. And so deep learning involves using neural networks, so
to say, which are almost an artificial representation, or analog, or modeling of
what the brain could possibly look like, to the
full extent of our understanding of it, and trying to represent essentially a data structure that
would be used to process, and a set of techniques that would be used to process, the data that is
sensed.
Just to reel it back for a sec, so that people understand the difference between those three: can you, in a couple of sentences, define each one?
Like, individually, what is machine learning, what is deep learning, and then what is, what was the third one you said?
AI?
I guess AI, yeah. Between those three, can you define them in like two sentences each?
Got it. So with AI, you want the machine to do things that seem human, sort of the
same, right? Imagine being here, and someone asks you, hey David, what is the color of the car in the
garage? You would have to do a few things. You would have to plan the way you would get out and get to
the garage. You would have to look at this artifact in the garage and understand it as a car. And then
you would have to understand
colors, and then look at that and say, okay, the color is red, for example. So there is this set
of steps, so to say, that you have to carry out as a human, intelligent person: okay, I'm
going to plan my way out, I'm going to plan my way into the garage, I'm going to look at this object,
detect that object as a car, and then eventually detect the colors, right? So there are a few things that you do.
Now, if you were to break that,
so that's AI, so to say,
imagining that a system could do that,
imagining asking a robot to do the same set of tasks,
then overall, I would consider that to be AI.
Now, if you break that into some levels of deeper details,
and taking out the planning and scheduling,
what are the techniques that you use possibly
for navigating this ecosystem all the way up
until you got to the garage?
What are the different techniques that you use
in order to analyze that object and understand that as a car?
And so that set of techniques is what is machine learning.
Okay, so it's like machine vision and object recognition.
That kind of stuff would be the machine learning techniques that get
applied on top of AI to create the machine learning mechanism.
Exactly. So machine learning could be considered just the mathematical
underpinning set of artifacts that you would use as a subset of AI.
Okay.
Right. So and then deep learning is just one of these techniques.
So within the context of machine learning,
there are different techniques.
One of them is called nearest neighbors.
It's usually preceded with a K.
So K is for a number.
We can say, what are the four nearest neighbors
to David and Danu?
And then, as a matter of fact,
we would look at
all the people within this building and find the people
whose distances to us are the four closest. So those are the four nearest neighbors.
That's just one technique out of many techniques. There's another technique called support vector
machines. And there are many other techniques like regression, classification, and so on and so forth.
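The four-nearest-neighbors example above can be sketched in a few lines of Python. This is a toy illustration, not production code; the names and 2-d positions are invented for the sake of the example.

```python
from math import dist

# Toy k-nearest neighbors: who are the k people closest to a query point?
# Names and 2-d positions are made up for illustration.
people = {
    "Alice": (0, 1), "Bob": (2, 2), "Carol": (5, 5),
    "Dan": (1, 0), "Eve": (6, 1), "Frank": (3, 4),
}

def k_nearest(query, points, k):
    """Return the names of the k points closest to `query`."""
    return sorted(points, key=lambda name: dist(points[name], query))[:k]

# The four people nearest to someone standing at (0, 0).
four_nearest = k_nearest((0, 0), people, 4)
```

A real k-NN classifier then uses those neighbors to predict something, for example by majority vote over the neighbors' labels.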
Now, deep learning is a subset of all of these techniques that uses neural networks as a representation of the data that you would use to process in order to identify objects, classify
objects, and so on and so forth. So you get AI as a bigger bucket that has other things, including
planning and scheduling and sensing. You get
machine learning that is more focused on the mathematical, statistical, and probability
techniques. Yeah, okay. And then you get deep learning, which is just one application
of machine learning techniques that focuses more on artificial neural networks.
And then deep learning, did that become something that became very popular in the vernacular because it was discovered that it was a very good way to do machine learning?
Did people try a bunch of different machine learning techniques, and deep learning just became the most useful one?
That is correct.
So bringing it back to your initial question, which was how do these techniques or how do these definitions relate to the current state of
affairs of machine learning?
It's very related because machine learning has been applied for a while, deep learning
also, for the last 10, 20 years, so to say.
So the techniques have been around, but then the techniques were boosted based on the advent
of a couple of capabilities.
And so we started observing that deep learning was really doing two things. One, it was able to process a large
amount of data. And so the traditional machine learning techniques, supervised learning and so
on, they would tend to plateau when you give it too much data. So it would give you some performance
and at some point it wouldn't really give you more. It doesn't scale.
It doesn't scale. So you start having diminishing returns. So you spend a lot of
compute capabilities, but you're not really getting good results. But with deep learning,
it was seen that you can one, parallelize that aggressively if you have a lot of
compute capabilities, GPUs and or TPUs. And two, it wouldn't necessarily plateau.
That means that you can give it a lot of data
and the performance will keep going up.
And so what we saw was that
the techniques have been applied for the last many years,
but their techniques are increasingly getting better and better
given the advent of additional capabilities
that are supporting that increase in performance.
What is that additional capability that has really tipped the scale, especially in the last year?
Let's go back to the last six years, so to say, with the invention of the transformer architecture.
Do you want to explain what that is?
So the transformer architecture was created in 2017. And before that, there were
many other architectures within the deep learning ecosystem that were used to process data at scale.
The ability for these systems to process text, for example, or sequences of data,
things like music, things like video, things that have to do with frames,
had been studied for many years.
So we had sequence-to-sequence models,
we had things that we called LSTM, long short-term memory models,
that essentially made it possible for someone to process sequence data
and even possibly generate sequence data.
But the problem with those architectures was that if you have a text, if you have an entire
page, and then you want to either summarize that or analyze that, then you have to put
an entire thing into the model.
And so we started having limitations with the capabilities that the machines themselves
would have to host that amount of text in order for you to ask a specific question of
that text, for example, what is this text talking about?
Or generate a summary of this text or whatnot.
So there were some scaling issues.
Because if you were to synthesize the entire page of text, it's hard.
It's more computationally expensive to the n plus one degree to generate or to
synthesize more and more text as you add words.
Exactly.
And it scales, it scales quadratically, essentially, right. The other thing is that to
improve the quality, specifically when you have to analyze things like text, you
want to maintain a certain, say, grammatical structure. If you're being
asked a question about a sentence, sometimes the answer is really towards
the end of the sentence, but you have to maintain the context with the beginning of the sentence.
So there was this idea of essentially keeping
or maintaining a structure of the content
that you're analyzing by applying different mechanisms.
And one of the mechanisms that was invented
with the Transformer architecture in 2017
is what we call the attention mechanism.
So the attention mechanism is a mechanism through which,
within the neural network,
it's possible for you to maintain a structure or keep information about how, let's say,
specific words are related within the text that you're analyzing.
So essentially, you're coming up with a mechanism through which you can analyze a large amount of text
while still maintaining the information
about how these specific tokens and or words, words is just one representation or tokens
are one representation of words, are related within that context.
Now it gets very expensive computationally and on memory and storage to get that done.
And that was the challenge pre-2017.
What the transformer architecture brought about
was the ability to process these large amounts of data,
maintain the structure that they have,
and it not being extremely expensive on the hardware,
the storage, and the compute.
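The attention mechanism described here can be sketched numerically. This is a minimal scaled dot-product attention over tiny made-up vectors, without the learned projections or multiple heads of a real Transformer; it only shows the core idea of weighting each token's value by how well its key matches a query.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating, for numerical stability.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: score the query against every key,
    turn the scores into weights, and return the weighted average of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three "tokens", each with a 2-d key and a 2-d value (all invented).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attention([1.0, 0.0], keys, values)
```

Because the query lines up with the first and third keys, the output leans toward their values; this is the sense in which each token "maintains information" about which other tokens it relates to. Note also that computing scores for every query against every key is what makes the cost grow quadratically with sequence length.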
So then it was possible, by basically parallelizing
some of these architectures, to process a very large amount of data and build extremely scalable, very, very large models. And what that meant was that if you were able to basically break
through the plateauing of diminishing returns, you would start having more and
more and more performance. So we started seeing things going this way,
where you get more and more performance, and eventually you get new
abilities, you get emergent abilities out of the same models. When you say
emergent abilities, do you mean things that we didn't expect? Exactly. Traditionally, we would train what we call supervised models based on tasks.
And so essentially what that would be would be that you would go to the model and say,
what's the color of this object? And then we say red. So that's a model that is trained
towards understanding, given an object, what is the color. And so the way you do that is that you give it a
lot of examples that are labeled and you say, this is a mug, the mug is red and black. This is another
mug, this mug is white and so on and so forth. You tag it manually. Exactly. And the next time
you show it some data, it will give you a mug. But it's expensive to basically train one model that can recognize mugs, recognize people, also
answer a question, and so on and so forth. So being able to give multiple tasks, so to say, to a single
model that you train once was a challenge.
But to make it multimodal?
Yeah, you can make it multimodal. Multimodal,
essentially, to a level of simplicity, really means that you're able to get the model to analyze
images and text and audio and video at the same time. And then it could be multimodal input,
single output, i.e. you train the model to see images, text, audio, video, but you're only asking
it questions about text or the text-to-text format.
That paradigm is called a picture is worth
more than a thousand words.
So you can essentially get multiple pictures
within the model, get it to learn from it,
but the way you interact with it is still in the text modality.
So we started seeing benefits where very, very large models
that had seen a lot of data coming out of the entire,
not entire, but a huge part of crawled websites, for example, a huge part of data that is available
out there, started behaving in such a way that they had this almost general-purpose intelligence.
They could do reasoning up to a certain extent. And that is tested by giving it some mathematical problems and then it would do derivation, so to say, assuming that it had seen some of these derivations in some mathematical books, for example, or writings.
So it would learn that structure, leveraging that attention mechanism and being able to derive the answer step by step and give you a specific answer.
And is that still considered an emergent property if it was being fed different levels of derivations through different text input?
That's a very good question.
So the thing that makes that an emergent property is the fact that it's doing that in a
multitask fashion.
So remember, initially we would train one model to do one thing. So if it was
one model that was trained only on doing a derivation over a specific mathematical problem
set, that would be very simple. It wouldn't be considered emergent. But if you train one model
that can do that on a mathematical corpus, at the same time take an SAT exam, at the same time
give you a summary of a specific piece of text that you give it,
and at the same time write code, at the same time optimize code and review code,
at the same time. So those are the different kinds of emergent properties that a multi-task,
a large model is able to do. In your opinion, are those emergent properties kind of subsets of the attention mechanism?
Like, is that the thing that really allows it
to do these kind of things?
One analog that I would give you is,
you know, in physics, for example,
when you have particles that are moving
at a very, very fast pace,
so to say, in a contained environment,
then you start getting temperature, right? Heat and whatnot. And if they move faster,
then you get higher and higher temperature. Temperature itself or heat itself is not
necessarily something that is a physical artifact. It's an emergence of that fast movement. But that movement itself is very simple.
So similarly, the attention mechanism makes it so that
the specific elements that you feed the model
get to learn about each other.
And so they get this interaction mode
through which they basically function.
They have this simplistic function mechanism
at a very, very low level.
And there's almost this transformation, this phase transition that happens where the higher
level thing, which is the model, starts giving you some of these specific behaviors in a
multitask fashion.
It has skill sets you didn't anticipate it to be able to have that are based on things
you did give it, but you didn't realize were connected.
Exactly. A couple of other emerging properties. One of my favorite is called in-context learning,
where basically a large model now would learn from what we call demonstrations.
So again, traditionally, you would want to give an input to the model, and then the model would
give you an answer. That is a straight input-output relationship. But some of these models today,
you could say, hey, give me an answer that looks like this,
or here are four demonstrations of the kinds of questions
that I will be asking you.
Therefore, going forward from now,
I need you to be answering these questions in this manner.
And for some reason, it's able to remember that context,
learn from these demonstrations that you gave it,
and then start giving you answers going forward that sound like that.
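In-context learning, as described here, is purely a matter of prompt construction: no weights change, the demonstrations just sit inside the prompt. A minimal sketch; `call_model` is a hypothetical stand-in for whatever LLM API (BARD, ChatGPT, etc.) you would actually call.

```python
# Few-shot prompting: the model infers the answer format from the
# demonstrations embedded in the prompt itself, with no retraining.
demonstrations = [
    ("What is the capital of France?", "Capital: Paris"),
    ("What is the capital of Japan?", "Capital: Tokyo"),
]

def build_prompt(demos, question):
    """Assemble a prompt: instructions, worked examples, then the new question."""
    lines = ["Answer in the exact format shown below.", ""]
    for q, a in demos:
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {question}", "A:"]
    return "\n".join(lines)

prompt = build_prompt(demonstrations, "What is the capital of Kenya?")
# response = call_model(prompt)  # hypothetical API call
```

The emergent part is that large models actually follow the demonstrated format for new questions, which smaller, task-specific models were never able to do.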
And that's why...
Is that something we didn't expect it to be able to do?
Exactly.
That's why systems like ChatGPT or BARD are very interesting in that sense
because you can even basically tell the system,
hey, you are a knowledgeable scientist about this field.
Given that background, start answering my questions
and then it will be giving you some very interesting answers. And then there are many ways you
can get creative about that space, right? You can say, you are a very funny and creative artist, start
giving me answers within these specific steps. And the last emerging property I'm going to talk
about is what we call chain of thought, or
reasoning. I think I spoke about it a bit earlier, where the model or the AI system is able to give
you a step-by-step breakdown on how it came up with the answer. Right. Right. So that's very
interesting too. And that's definitely not something we expected it to be able to do.
Exactly. Okay. So that was a lot.
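Chain-of-thought, the last emergent property mentioned, is likewise just a prompting pattern: you ask the model to show its intermediate steps before the final answer. A minimal sketch, with `call_model` again a hypothetical stand-in for a real LLM API:

```python
# Chain-of-thought prompt: request step-by-step reasoning before the answer.
question = "A cafe sells 12 coffees an hour. How many in an 8-hour day?"
cot_prompt = (
    f"Q: {question}\n"
    "Think through the problem step by step, "
    "then give the final answer on its own line.\n"
    "A:"
)
# response = call_model(cot_prompt)  # hypothetical
```

That it works at all is the emergent part: nothing in training explicitly taught the model to produce a step-by-step derivation on request.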
I think there's a lot of answers to this question.
But effectively, it seems like AI is sort of the outer layer where you try to teach human analogs to a machine.
And then you've got machine learning, which is a subset of AI, and deep learning, which is a subset of machine learning. And then when you feed these models, just these enormous amounts of data,
you end up with these emergent properties that you're not really expecting.
We're going to get a little bit deeper into those emergent properties and very
philosophical next. So stick around. I think I'm going to go get a coffee first. Support for this show comes from Klaviyo.
You're building a business.
Klaviyo helps you grow it.
Klaviyo's AI-powered marketing platform puts all your customer data,
plus email, SMS, and analytics in one place. With Klaviyo,
tinned fish phenom Fishwife delivers real-time, personalized experiences that keep their
customers hooked. They've grown 70 times revenue in just four years with Klaviyo. Now that's scale.
Visit klaviyo.com to learn how brands like Fishwife build smarter digital relationships with Klaviyo.
Support for this show is brought to you by Nissan Kicks.
It's never too late to try new things.
And it's never too late to reinvent yourself.
The all-new reimagined Nissan Kicks is the city-sized crossover vehicle that's been completely revamped
for urban adventure. From the design and styling to the performance, all the way to features like
the Bose Personal Plus sound system, you can get closer to everything you love about city life in
the all new reimagined Nissan Kicks. Learn more at www.nissanusa.com slash 2025 dash kicks.
Available feature.
Bose is a registered trademark of the Bose Corporation.
So I think that large language models and chatbots and things like DALL-E are sort of the only things through which a lot of normal people, in their everyday life, have seen AI affecting things.
What else is the transformer transforming?
What industries are being pulled up by AI and what's actually driving that?
Because I think that most people just see like, oh, we've got chat GPT.
Oh, now this random app that I never talked to has a chatbot for some reason, right?
But we hear all across every industry that every industry is being uplifted by AI.
So is that also transformer-based?
And how is that working since it's not using a language model?
Right.
So the transformer started the revolution, so to say, right?
So the ability to have these emerging properties. And since then,
so that was in 2017, it's been, what, six years now? Since then, there's been a lot of evolution
of that specific architecture. There's been a lot of creativity around building some of these AI
systems, generative AI systems that can generate images or text, or given some text, give you some image or given some image,
give you some text that's captioning and or applying this paradigm shift,
so to say, into many industries and many applications.
There are two ways I would say we can look at this.
One is, old-school AI is not gone, right?
So we're still using that.
We're still applying some of these techniques,
like recommender systems.
When you go on the website,
you're still being recommended some artifacts,
some things to buy
and or suggestions of books to read and whatnot.
So many of these initial applications of AI
are really, really useful for very large companies that have the abilities.
And this is one thing that I really like talking about.
Very big companies that have the ability to hire hundreds of engineers, so to say, or dozens of engineers.
Highly trained, highly paid, that can build some of these highly tuned systems that would scale to, say, hundreds
of millions of users. For the businesses that are not the multi-billion dollar businesses,
we're seeing new opportunities open up because these industries can now use some of these
generative AI systems. In the past, you needed about seven months to 18 months to build an
application with programmers, designers,
product managers, and so on and so forth. But now if you have a vision, well, you can go on board
and say, hey, this is my vision. Help me iterate on that. Give me five ideas that are related to
this. And then after that, you can say, hey, now write a product requirement specifications for a
system that may look like that.
And then you can say, hey, based on all of these interactions, write a project plan.
And you can iterate on that context with the chatbot, so to say.
And then after that, you can say, hey, considering these artifacts
or considering everything that we've talked about,
help me write a design document that I could use to implement this app,
this solution, and it would do that.
And then you can say,
now I need you to help me implement this in Python.
Design the APIs for me,
write the implementation of the APIs for me,
write the system design for me.
It could even help you draw some of these things.
And so what you're seeing is that you're moving
from a life cycle where you had to
use about 18 months with a team of 10 to even get an idea into good shape, to probably a matter of
hours to weeks, working with prompts and being very creative in the way you, as a smaller group,
interact with that bot, to come up with a solution that is pretty, pretty good. And so what I see is that many industries,
many startups and enterprises, are really, really taking advantage of that. I've seen good examples in media, I've seen
good examples in healthcare and life sciences, I've seen good examples in financial services.
But in all things, essentially, I'm seeing a lot of movement.
Do you use these kind of systems in your own work
to build your own APIs and stuff?
You use BARD for your own work?
Yeah, I use BARD.
I use BARD every day.
Every time I have an idea,
every time I want to process something,
I use BARD to iterate on the idea.
Wow.
I use BARD for outlines.
If I need to give a talk, for example at a conference,
usually for me the process of creating content, it depends on the
topic, but based on the work that I do and based on some research, I try to come up with a specific
outline that really touches on the points that I would like to talk about. And so I use BARD
to help me create that outline. And then I may fill in the outline myself and give it back to BARD and say, hey, help me summarize this.
And or help me extract specific talking points out of this. And then I can say, hey, make this
a bit more creative, you know, in different types of tones. So that's one mode of interaction. The other mode of interaction is the one that I spoke about earlier, which is when I have a rough idea,
say I want to create a system that helps you determine
what coffee you're going to drink in the morning
based on prior, whatever, like just a toy example like that.
And so I can formulate specific questions
and interact with BARD in that way.
And I could have a prototype before the end of the day
that works, implemented in Python, full stack.
That's crazy.
And backend.
Yeah. Yeah. That's like a productivity explosion.
Exactly.
I want to reel it back a little bit because we talked about AI, we talked about machine learning,
we talked about deep learning, but the big thing that's being, that's on everyone's mind in the
last year is generative AI, which you've talked about multiple times so far.
But we didn't really define what generative AI is
and what makes it different from those other forms of AI.
So can you give a quick explanation of what generative AI actually is?
Remember, we talked about AI overall being a system, not just one thing.
So in machine learning, being a set of techniques
that are more mathematical in nature,
deep learning being one of these techniques
that focuses a bit more on neural networks.
So by virtue of getting something
that is a lot more fundamental,
generative AI is a deep learning technique.
So it's still using the deep learning technologies,
but generative AI is really focused on generating or creating a specific artifact. And so that artifact could be an image, it could be a piece of text, it could be a piece of audio, or it could be something else. That's a very simplistic definition of what generative AI is.
Yeah. And what is the foundation of generative AI? Like, what allows that to work?
Because we see things like Generative Fill in Photoshop, we see generated music now. Like, every single creative industry, and non-creative industry, is being sort of upended by this generated content. What is allowing systems to actually generate content instead of just classifying content?
Yeah, so that's a beautiful question, in the sense that there's a very, very strong common denominator
among all of these things.
And that's the transformer architecture I spoke about earlier, right?
So what we've seen is that applying the same technique and then changing the question a little bit gives you exactly the generated content that you're interested in. For example, we can say, using this transformer architecture, help me generate an image. You can pose the generative AI problem as: given images of different artifacts, like animals, like cats and dogs and whatnot,
create something that looks like some of these
things using, I don't know, interpolation or extrapolation, different techniques,
and make it look like the family of things that I've shown you in the past. And it will give you
something that doesn't exist in real life. Maybe the image, very high fidelity image of a dog or
cat that doesn't exist in real life, but really, really looks like the samples or the things that you've shown it in the past.
So the ability for these models
to essentially create content in different modalities
is the generative ability.
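The interpolation and extrapolation idea Danu mentions can be sketched in a toy way: if an encoder maps images into latent vectors, blending two latents gives a point that a trained decoder would turn into something new that resembles both. Everything here is an illustrative stand-in, not Google's implementation; the vectors are invented and there is no real decoder.

```python
# Toy stand-ins: in a real system these latents would come from an
# encoder trained on many images; here they are just made-up vectors.
latent_dog = [0.9, 0.1, 0.4]
latent_cat = [0.2, 0.8, 0.5]

def interpolate(a, b, t):
    """Linear interpolation between two latent vectors (0 <= t <= 1)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# A latent halfway between "dog" and "cat": a trained decoder would turn
# this into a plausible animal image that matches neither training
# sample exactly, which is the generative step.
blend = interpolate(latent_dog, latent_cat, 0.5)
```

Real image generators use far richer mechanisms (diffusion, learned decoders), but the "create something that looks like the family of things I've shown you" idea rests on this kind of navigation of a learned latent space.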
Yeah, so we think about like large language models
being fed into a transformer, right?
And that's just like,
give me all of the text
that has ever been written on the internet
and we can develop relationships between words.
But when you're generating an image, or you're generating audio, what is being fed into the transformer in that way, right? Because we see, you know, there's a lot of genetics work that's being done with transformers too. What kind of data do you feed into transformers to actually make that work in a variety of different fields?
Right. So in general, today, text was easier to acquire.
That's why you hear of large language models today a lot more, right? And the results also
from generating text were a lot more impressive and exciting to look at. That's why, in my opinion,
that field somewhat took over. But you're right.
So you could consider the input to be pretty much anything
that could be put into a sequence.
A video, for example, is a sequence of frames.
So you could give multiple videos broken down into frames
to a transformer-based architecture,
and it gets a bit more complex in the way those sequences are processed and structure is maintained. There are many techniques around the attention mechanism, and so on and so forth.
Okay.
Let's consider that to be a black box
and then it knows how to do that.
Then what you give it is a set of frames,
which are videos, so to say,
and then you say,
give me something that looks like that.
So in that sense,
you've given it videos or a set
of frames. You could also have a mechanism through which you give it videos and text, which we do
today. There is this encoding model that is called CLIP, essentially putting together images, and videos, I mean, and text.
Which is the foundation of DALL-E and a lot of AI image generation.
At least that foundational technique of these kinds of abilities, where you teach the model to recognize images and text together as a joint entity, so to say. And the process through which you do that is by getting the images processed with what we call a tokenizer and/or encoder specific to an image, and that turns that into a vector. We call that an embedding. And then you go through the same kind of process with the text, where you turn the text into a vector. And once you have these two vectors, you can then combine them with basically algebra. And then at a higher level, you have the task and all the questions that you want the model to answer. In one scenario, you would want the model to say, for example: given an image, explain the content of this image for me. Or you may have the reverse problem, which is: given a text, generate an image that contains the information, so to say, that I've provided in this text. Which is the business of Midjourney.
Yeah.
So kind of to break that down,
depending on the field that you're trying to use Transformers on, you are turning data into numbers,
and you're comparing those numbers to each other
and then getting an output.
So because you're able to take video or images or text
and vectorize them and turn them into tokens,
you can compare them to each other, even though they're different types of media.
Correct. Correct. That is excellent. And one of the things that makes it really work beautifully
is because once you take the images or video or audio, you encode that into an initial vector.
That process is called tokenization.
Then once you get the token, and by the way, the tokens can be a bit more complex.
For example, the tokenizers could learn to not just use a word per token mapping, but
it could also split words into two or three if that word has a bit more complexity, multiple
meanings and complexities, or if it finds it effective.
So, sub-tokenization?
Sub-tokenization.
So you may have a situation where a five-word sentence
gives you 12 or 15 tokens, or maybe less.
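The word-splitting Danu describes can be illustrated with a toy greedy longest-match subword tokenizer, the general idea behind WordPiece-style vocabularies. Real tokenizers learn their vocabulary from data; the vocabulary below is invented purely for the example.

```python
# Hypothetical subword vocabulary; real ones are learned (e.g. by BPE).
VOCAB = {"trans", "form", "ers", "token", "ize", "d", "un", "expect", "ed", "the"}

def tokenize_word(word):
    """Greedily match the longest known subword prefix, splitting the
    word as needed. Unknown single characters fall back to themselves."""
    pieces = []
    while word:
        for end in range(len(word), 0, -1):
            if word[:end] in VOCAB or end == 1:
                pieces.append(word[:end])
                word = word[end:]
                break
    return pieces

sentence = "the transformers tokenized unexpected words"
tokens = [p for w in sentence.split() for p in tokenize_word(w)]
```

With this toy vocabulary, the five-word sentence comes out as 15 tokens ("transformers" becomes trans + form + ers, and the out-of-vocabulary "words" falls apart into characters), which matches the kind of expansion described above.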
So the concept is more about information preservation within a substructure that is a vector, rather than a one-to-one mapping between the words and the vector.
Right.
Same thing with images. An image is a two-dimensional structure which has a third dimension of red, green, blue.
Right, right.
So if you flatten that entire thing into a pixel intensity over that entire two-times-three, so to say, dimension, then you get a larger vector. But that's just a
simplistic tokenization where you say, hey, I'm going to flatten an image, flatten that more by
red, green, blue. And then after that, I'm going to have a vector representing the pixel. From there,
you can have a deeper tokenization that may consider the structure, for example, the adjacency
of objects or the distance between objects, or even some deeper level of understanding of the objects within that image.
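The simplistic flatten-the-image tokenization just described, height by width by three color channels unrolled into one long vector, can be sketched directly. The 2x2 image is made up for the example; real pipelines use tensor libraries and far richer encoders.

```python
# A toy 2x2 image: each pixel is an (R, G, B) intensity triple.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

def flatten_image(img):
    """Naive tokenization: unroll rows, then pixels, then RGB channels,
    producing one long vector of pixel intensities."""
    return [channel for row in img for pixel in row for channel in pixel]

vector = flatten_image(image)
# 2 * 2 * 3 = 12 numbers: the "two times three" dimension mentioned above.
```

Deeper tokenizations, as Danu notes, would encode structure like object adjacency rather than raw pixels, but the end state is the same: an asset becomes a vector.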
At the end of the day, you go from a piece of artifact like audio.
And in an audio, you use a spectrogram and you turn that into a specific artifact.
So you go from an asset to a vector.
You go from an asset to a vector.
Now, there's this other step called embedding, which is basically doing a projection of that vector onto a vector space that is shared by every other piece of artifacts, I mean,
other piece of data in that space.
It's like a normalization.
Like a normalization.
But then by that projection, what you essentially do, especially if you have a multimodal model,
like if you work with an image and if you work with text, for example, then you tokenize them each, which is a one-to-one relationship between the image and the text and the tokenizer that works for them.
And once you have these two vectors (you do that through training), the beautiful thing about that is that once you land these things within the same space, they become of the same nature, right? So you can start comparing them.
Yeah.
So you can start assigning relations and making statements like: a car written in text form compares to the image of a car. Those will be close in location.
It's almost like a Rosetta stone.
You're taking one language, another language, and you're sharing them in a certain way.
And then once you have this shared, say you translate them all to Latin, then you can
do whatever you want from there.
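That "close in location" comparison is usually done with cosine similarity between vectors in the shared space. A minimal sketch, with embedding numbers invented for illustration; a real multimodal model learns these projections during training, as Danu says.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'close'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings already projected into the shared space.
text_car  = [0.9, 0.1, 0.2]   # the word "car"
image_car = [0.8, 0.2, 0.3]   # a photo of a car
image_dog = [0.1, 0.9, 0.4]   # a photo of a dog

# The text "car" should land nearer the car image than the dog image.
assert cosine_similarity(text_car, image_car) > cosine_similarity(text_car, image_dog)
```

This is the Rosetta stone idea in miniature: once both modalities live in one space, one similarity function compares any pair.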
And the common substrate of all of these different things, the assumption at least, is that there's information that is preserved in these different types of artifacts.
So you're almost doing
an information extraction exercise, right?
Describe that, what do you mean by information?
So it may be a longer conversation,
but at the end of the day,
so information, and I know you had a whole video
about the nature of information.
It could be contextualized to the piece of
artifacts that you're working with. But in a very, very simple manner, information is this
entity or this thing that could give you... It's hard to define information without using
information. It's this thing that can give you a bit of a pattern, right? And we usually base that pattern on the notion of order, disorder, symmetry, and so on and so forth.
Yeah.
But if you have something that can give you a pattern about difference and/or disorder in a specific subsystem, then you start having information.
For example, if I do this,
nothing has changed very much.
So if you were on the receptive end of that pattern,
you won't really get much information.
But if I do,
there's a difference in what I did before
and what I'm doing now.
Now, you may not understand why I'm doing that,
but you would understand that
there's a difference between the frequency at which I tapped my hands before and the frequency
at which I tapped it after. Then you've gained information. So it's the same way that you may
understand some differences within an image, for example, looking at a contour and then something
changed between this and this, then you may realize that these may be two different objects
and so on and so forth.
And within text as well, you may have difference
maybe between words or between paragraphs
and between different structures.
So you have some form of information.
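The hand-tapping example, information as detectable difference, lines up with Shannon's measure: a perfectly regular signal carries no surprise, a varied one does. A small sketch using standard Shannon entropy; the "signals" below are made up to mirror the tapping demonstration.

```python
import math
from collections import Counter

def entropy(signal):
    """Shannon entropy in bits: 0 for a constant signal, higher as the
    signal's symbols become more varied and surprising."""
    counts = Counter(signal)
    total = len(signal)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

steady = "tap tap tap tap".split()        # same gesture repeated: no change
varied = "tap TAP tap pause TAP".split()  # the frequency/pattern changes

# A constant pattern yields zero bits; the observer gains nothing.
# The varied pattern yields positive entropy; the differences carry information.
```

This is the simple version of the point being made: the receiver gains information only when something differs from what came before.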
And a beautiful thing about information
is that it could be combined.
So it's the extraction of information, and/or the differences in patterns within different modalities of data artifacts, that you're maintaining. And it could be combined at a certain level.
Right.
Or compared.
Yeah.
And that's what makes it possible for you to essentially extract information out of an image by understanding how different it is, or how many different patterns exist within that image.
Yeah.
And extracting information out of a piece of text by understanding how many different patterns exist within that text.
And then putting that together in a normalized space through which you can start comparing them.
And then reversing that, you can now combine text to images and basically have that relation maintained.
With all of that combined, would you say that would be the fundamentals of a general AI that could do everything?
We're getting into the realm of AGI.
Yeah, AGI.
I would love your opinion on that if you feel comfortable talking about it.
Of course, of course.
So what is intelligence according to you?
According to me?
This is such a big question.
I've thought about this a lot.
My personal opinion on this at this point is, well, for the listeners, we're going to define AGI really quickly.
AGI is artificial general intelligence.
Effectively meaning you can ask an AI to do anything that a human could be able to do or possibly even more, right?
And it could be able to help you with that.
Would you agree that that's the definition or do you have an expanded definition?
That's somewhat why I'm asking the question of what is intelligence.
Because agreeing on AGI being artificial general intelligence assumes that we agree on what intelligence is.
Sure, okay.
My definition of intelligence would be...
Wow, thanks.
The ability to synthesize information and create new actions based on information that you weren't explicitly told to do.
That'd be probably my definition of intelligence.
That's a decent definition.
Would you disagree that the context in which you have to do that specific workflow that you define has to be defined,
i.e. you have to do it within the context of, I don't know, literature or robotics automation in the subfield.
For example, having a robot that can control a specific arm, say for surgery;
it would be a different thing if that robot controlled an arm, say, in a restaurant, and so on and so forth.
I think that when we talk about the generalization of intelligence or even information, we're making a bold claim that goes beyond what we understand so far about the nature of these things.
Uh-huh.
Right.
Sure.
So, if I want to break down the problem of AGI, again, I might have already expressed that I'm not a very big fan of that definition because I don't really think we know exactly what we mean when we say that. But if we want to get into a practical realm,
I think that it may be possible to essentially, and which is the state in which we are now,
by getting these models to progress in their ability to impact the world as well. So we discussed the software version of the AI so far,
which is you give it data, it could recognize it,
or at this point it can also generate data.
But what is the software real-world interaction mode at this point?
So we have many systems, for example, in healthcare and life sciences
that have to deal with the real world in the way that, say, hospital equipment
functions, or in the way that a robotic arm that controls cameras functions.
So you get many other things about a real world that may have to do with intelligence.
So I think a lot of the work that we're doing on improving the quality of these AI systems
has to bring things all the way up to these definitions of
AI that I mentioned earlier, which involve and include planning, scheduling, and acting,
and sensing as well. So when you start augmenting these systems with these additional capabilities,
then you start training agents that are able to plan and schedule and act in the real world.
Then you get that sense of AGI that's closer to the definition
that you gave it, right? Now, the ability to do that at that level, at that scale,
gets challenged by where are you sensing what kind of information and also where are you acting
in which kind of world environments, right? And if you want to look at the real world in which we operate,
and you want to look at all the types of interactions and actions that can happen,
the number of possibilities is larger than the number of atoms in the universe, right?
And so, how would you have a generally intelligent system
that knows how to act in this entire world?
I find that quite a challenging thing to believe.
But if you constrain the problem, if you make the problem as simple as,
I want to have a generally intelligent system that would learn how to use all the hospital equipment within the hospital system,
then maybe you have the opportunity to have an AGI system that can essentially take in a task and execute that effectively.
So that is my techno-optimist view of the possibilities of AGI by training agents that
have world representations, but these are simpler world representations that are constrained by the
problem space in which you want these systems to operate. Right. And then being able to plan, schedule, sense, and act,
including the other type of capabilities that it can do.
Okay, interesting.
So Danu doesn't think that we're going to have this one omniscient AGI,
artificial general intelligence, that's going to be handling everything.
But he rather thinks that we're going to have these smaller,
more specialized AIs that
kind of handle different tasks and help us do stuff a lot faster. This is actually not that
different from that whole conversation around the Tesla bot, right? Like where you could have
a robot that's like a human that does human tasks, or you can have a bunch of really small robots
that handle the tasks that we already do on a daily basis. Kind of the same thing. Pretty
interesting.
In the next segment, we're going to get into
the problem of AI hallucinating,
which is where it just makes up a ton of random stuff.
And that's clearly a problem.
I was very curious about that.
So that'll be a fun conversation.
Plus, we need to see how fast Danu can type.
So get ready for that.
Ellis wanted to hop in and ask a question real quick.
Yeah, sorry. I really liked
what you said about defining intelligence as it pertains to AGI. And I thought David brought up a really important kind of intelligence,
like intuition and deduction and the ability to extract not just pieces of information,
but threads and systems of information from multiple kinds of contexts. But there's lots of
other kinds of intelligence that people like cognitive scientists like to define and classify,
things like spatial reasoning, things like engaging in dialectical thinking.
And these are all intelligences that we've observed in ourselves. And so when we think
about sort of a general purpose Swiss army knife AI, do you think that we should be limiting that
to the kinds of tasks that our brains do on a daily basis?
Or do you see that there's going to be almost like
new methods of thinking and new cognitive strengths
that emerge as these neural networks get stronger?
That's a super interesting question.
In a practical sense,
I'm actually with you, David, on that definition, right?
Because I think that that's the form of intelligence
that could be mechanistically
or mechanically implemented in a piece of software,
as in program, right?
So by our own intuition, we can think about doing things like that by breaking them down into steps. The kind of intelligence that you're talking about, to me, is a bit more like that emergence, right? Like that emergent ability. And I don't think we've gotten to the point where we can perceive what those are.
Yeah, or intuitively.
It's like a new color.
Yeah, intuitively know what exactly we need to do in order for the models to have that spatial awareness or that other kind of ability.
Now, we can program that by having segmentation models and having distance calculations and coming up with some mathematical heuristics through which we can claim that we've achieved that capability.
But I would argue that the way we learn, us, is not exactly the way we teach the machines how to
do that, right? So there's definitely a lot more research, and we may stumble upon, you know, we may basically strike luck and find out other kinds of scaling mechanisms. The way it works in physics today is that you have smaller systems, you have a simple interaction mode, like magnetization, or just collision analysis, or the different forces that we're working with, about four of them. And based on these simple interaction modes, you get the entire universe the way we know it.
At least it's the current theory.
Yeah, the fundamentals of physics. It's the way we know, but it may be another way, right? We may have just sensed it in our own apparatus of sensing, in that kind of form, and we're able to explain it the way we explained it. But it's still a projection on a screen that we're looking at and doing our analysis on. So I'm super excited about the possibility of us finding out more cognitive routes,
so to say, in the way these systems learn.
And right now, the best tools that we have essentially in our laboratory of AI
are these deep learning tools, the transformers,
and many other architectures that are being built around that. Things like memory-aware neural network architectures, or things like the ability to pull from a vector store and augment the knowledge with the retrieval-augmented generation capability. So I feel like the more we add interaction modes and information retrieval and utilization capabilities within these models,
the more possibilities we have to have this additional emerging capability that is a lot
more cognitive than the mechanic way we've been doing things. So I think that's an open question.
I think it's a beautiful question. And I hope we get lucky in our lifetime to find a way to get
that done. Yeah, me too.
We've stumbled upon a lot of random stuff in science, so there's definitely a possibility
that we have that happen, which would be big.
Yeah, I think it was Richard Feynman who said that science is the belief in the ignorance of experts.
So I think that if we really take it as a basic principle that we could stumble upon some things, and we believe that whatever we know so far may or may not be the way, then we have an opportunity to really incorporate new information and new knowledge that can get us faster and further.
Yeah. I want to pull this back a little bit, back to some practical stuff again. A big philosophical...
Yeah, no, I love the philosophical conversation.
I think that one thing that people think about
when they think about generative AI
is the problem of hallucinations.
And for people that don't know,
hallucinating is basically when you generate something
that just isn't right or isn't true.
In large language models,
you can ask a question and it will confidently lie to you sometimes.
And how do you look at how we're going to solve that problem? Because it seems like part of
generative AI and part of large language models in general is that it's just
parroting information based on probabilities. And those probabilities are not always going to be correct.
So you're, I'm assuming, working on ways to make these AIs more accurate.
Accuracy is obviously going to be a major problem
and something that we need to solve over the next couple of years.
How do you look at solving the hallucination problem?
The problem of hallucination. So the way these models work now is that you give it a lot of data, and the question you're really asking of it is (let's simplify a token to a word, and work within the text domain): give me the next word based on these words that I gave you, right? So if you qualify the problem as write a novel for me, or write a paragraph, or write a summary of something, then traditionally what would happen is that you would give it the beginning of a sentence, and you would say, complete this sentence for me. And so it's that sentence completion, so to say, that is based on probability. And even basing that on probability, within the context of this conversation, is a simplification. There's a lot more going on. But the basic principle of how it works is this: let's assume that it works off of the most probable word to follow the words that existed, and then, taking that longer sentence as an input, figuring out what is the most probable word that could follow, and so on and so forth.
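That next-most-probable-word loop can be sketched with a toy bigram model. The counts from a tiny invented corpus stand in for the learned probabilities of a real model, and the greedy pick-the-top-word loop is the simplification described above; real models sample from a distribution over subword tokens.

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for "a lot of data".
corpus = ("the doctor works at the hospital . "
          "the doctor sees the patient . "
          "the patient visits the hospital .").split()

# Count bigrams: for each word, how often each next word follows it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def complete(prompt, steps):
    """Greedy sentence completion: repeatedly append the single most
    probable next word given the last word."""
    words = prompt.split()
    for _ in range(steps):
        candidates = following.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)
```

Note what this model cannot do: it will always emit *some* plausible continuation, with no notion of whether the completed sentence is true, which is exactly the failure mode discussed next.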
What that means, if you simplify the problem just at that level: if I say, give me a complete sentence, Doctor Something works at Johns Hopkins, or something like that, then it would just put a name there.
Right.
Right. The question you haven't asked is: make sure that that name is an existing human being who is really a doctor at Johns Hopkins, and whatnot, right? So fundamentally, it's a different question to ask of that system. And then we're back to the reason I call these things
a system in the beginning is because yes,
you may have a model that gives you the next word prediction
or the next token prediction,
but then you still need to do a lot more work
on top of that input and that output,
and even that processing sometimes to make sure
that the output and the response that you get out of it
is a truthful one or a real one, right? Or a less
toxic one if the answer is toxic and you don't want to serve toxicity to your users. So there
are many pre-processing and post-processing activities that need to happen. One, to
make sure that the context, I mean, the answer of the model is grounded. We call that concept grounding, grounded in reality.
And the second is to make sure that the output of that model goes by a certain set of responsible AI principles.
So those are two things.
But fundamentally, the way the science works is that it will give you something, whether that thing is true or not. Sure.
It's your job to make sure that that thing becomes true.
And so the way that happens then now
is that you need to associate that response
to basically a source of truth, right?
What is truth?
What is truth?
What is reality?
And that's another thing, another reason why you probably want to contain and contextualize that use case, so to say, down to a source of truth, right? Give me Dr. Blah that works at Johns Hopkins: then you probably need to have a database of all the doctors that work in that hospital, and make sure that after you get the name of a doctor, because the model will give you that, you check it against that database. And if that person doesn't exist, or, you can say: fill this specific spot off of the list of names in the database, and constrain, and constrain.
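That generate-then-verify grounding step can be sketched like this. The model call, the directory, and the names are all hypothetical stand-ins; a real system would call an actual LLM and check against a real source of truth.

```python
# Hypothetical source of truth: a directory of real staff.
DOCTOR_DIRECTORY = {"Dr. Garcia", "Dr. Chen", "Dr. Okafor"}

def fake_model(prompt):
    """Stand-in for an LLM: confidently fills in a plausible-sounding
    name whether or not that person exists (the hallucination mode)."""
    return "Dr. Smith"

def grounded_answer(prompt):
    """Post-process the model's output against the source of truth,
    falling back rather than serving an unverified claim."""
    candidate = fake_model(prompt)
    if candidate in DOCTOR_DIRECTORY:
        return candidate
    return "No verified match found"
```

The model still generates freely; the grounding lives in the post-processing layer around it, which is why Danu keeps calling these things systems rather than just models.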
So that's why Bard now has that Google button, where you can ask a question and then you can double-check it.
That's another context. That's another mechanism for that, but that's not exactly why it has that button. Just to land on the concept
of hallucination. So it was named hallucination because it could give you some answers that seem
real, but they're not necessarily real. But this is a normal functioning mode of these technologies.
The reason it took us a while to release Bard, for example, was not
because, well, we invented the transformers. So we've known how to do this thing for a long time.
But it's all of the additional set of technologies that we had to build and principles that we had
to really build around the behaviors of a model that really get us to, one, the requirements of
building additional technologies. And then, two,
the challenge around making these technologies deterministic in the sense that you always want
a specific answer. So you have to do a lot more evaluations. You have to do a lot more checks and
balances. You have to add a number of metrics, like is this model answering a question when it
doesn't know the answer? You probably want to codify that into something that gets checked.
Yeah.
And so on and so forth. So there's been a lot of work that we've done on, one, really having clear and concise responsible AI principles, and then, two, turning those into technologies and/or checking mechanisms that could work in conjunction with the creation and the operation of a model.
And then, three, making sure that these scores and outputs of checks are available
so that that technology could be used
on the cloud ecosystem, for example, as part of the platform.
So we work with research to understand
what are these responsibility principles
that could be turned into metrics and guardrails and so on.
Those get turned into product capabilities
that work alongside our models.
And then these models are exposed or commercialized,
so to say, on our cloud platform,
this product called Vertex AI.
And you can go find that out on Google Cloud.
So that's how we're essentially fighting
the problem of hallucination.
There's a lot more work going on in that space.
Okay. Well, I think I'm going to close it out here soon, but I want to end with asking if you think that there's anything that we missed, anything that people would gain a lot from hearing about that they're just not hearing in popular media, that's very important to the whole AI story?
Two things, maybe. One is the consumer applications of AI, so Bard, ChatGPT, are very popular now, which is something that I'm happy about because I think that
it's really bringing the conversation closer and closer to everyone.
And you and I have been working in tech for a while so we may have been aware of that coming
up and coming together but I think it's a massive opportunity that today people that are
news editors or writers or artists or folks that work in different domains can use some of these
things to help them write better to help them generate images that they can use
as part of the content that they produce and create, to write better letters, to write better,
to do homework, and so on and so forth. So I really love the consumer application. But one
of the things that I don't think is talked about a lot is the developer experience, and also the way the barrier to entry, from a creativity and
product-creation standpoint, is getting really, really low with this set of technologies. And so
I really think that we are at the cusp of a new form of economy, where the creation of valuable items of different kinds and forms
would not just be a matter of a few being able to do that because they have high training and
they've spent years doing, I don't know, an undergrad in computer science and so on and so forth.
But if you bring that level of assistive creative ability to the masses, so to say, I've found that people have ideas.
People are creative. If you sit down and you tell someone, let me take away the problem of knowing how to implement these ideas,
let's talk about your ideas,
you get many ideas starting to emerge.
So I think that we're really at the border of a transformation
where the economy may take a different form, if different people, without the need to really
understand in detail how to implement some of these ideas, are able to, one, iterate on the ideas with the assistance of generative AI; two, validate some of these ideas, with the ability to prototype those in a matter of
hours rather than years. And then three, test these ideas in the ecosystem and maybe find
value for different people that they could commercialize these ideas for. So I'm very
optimistic about the possibilities of this in the future.
All right.
Well, the last thing we're going to do,
we have a little game here that we play when we bring guests on
where we figure out how quickly they can type the alphabet.
It's a running scoreboard.
You can use either the MacBook keyboard.
It's a keyboard thing.
Yeah, it's a keyboard test.
Now I get it.
Yes.
Okay, so you'll take that.
So you get...
I'm going to ask the AI to type this thing for me.
So you get three chances.
Wait, what is the most optimized way of typing the whole alphabet?
As soon as you start typing, it starts.
So as soon as you type the letter A, it'll start.
And does he have to hit enter at the end?
No, you don't have to hit enter. As soon as you hit Z, it'll finish.
Got it.
And no typos are allowed. So if you miss a letter, let's say
you miss B and go on to C, it will not let you continue. You have to hit B. You have to hit every single letter.
And you'll see at the top where it says type A.
Okay.
That will tell you the letter you're supposed to use.
Do we give people tests at all, or is it just three chances?
It's three chances.
There's three total chances.
Okay.
Ready?
You've got to hit G.
Oh.
It's harder than it looks.
Definitely a lot harder than it looks.
That's okay.
You gotta hit J.
This is why you get three chances.
Don't worry about it.
I was extremely slow.
Okay. So first run, 26 seconds.
Now I'm going to just...
Hit reset.
Can I change the keyboard?
Yeah, you can change the keyboard.
All right, so now I understand why there are options.
Yeah.
So we have...
We have mechanical keyboard.
We also have the butterfly keyboard that Apple sells.
Let me know.
All right, so I'll go mechanical.
You're mechanical. Let's do it. Set us up however you want. Round two.
Go for it.
Nice. Okay, 26 to 9.8.
Yeah, 26 to 9.8.
Much better.
That's a big come up.
Much better.
Last try.
Last try.
But you guys aren't impressed.
That means that I'm not.
9 is pretty good.
I'm not that high.
9 is not bad.
I was not far in front of that.
9 is actually really good, especially for a second try.
We've seen some things in here that you would not believe.
I'll show you the scoreboard after this and you'll be surprised.
Okay, ready?
Ready?
Go.
Nice.
8.
8.73.
Not bad.
Honestly, not bad.
Okay.
Where's that on the leaderboard, David?
So here's the leaderboard. Fastest,
Tom Scott, 3.5 seconds.
It was insane.
That was crazy to watch. It was just
pfft.
So, let's see.
8.73 is right above
Brandon. Wow.
Actually, no. Faster than David Blaine, too.
You beat David Blaine
He might be a magician
but you're a magician on the keyboard
Wow
8.73
You also beat Hasan Minhaj
Hey Hasan
So you beat Hasan Minhaj, David Blaine,
and Brandon
nice
nice
cool
alright
well thank you again
thanks for having me
seriously thank you for coming
where can people find you
on the internet
Well, I'm
dmbanga
on X
nowadays,
and I'm on LinkedIn
as well,
as
Dan Mbanga,
essentially.
Awesome.
We'll link that in the description.
And do you want to shout out any projects that you're finishing up right now or working on right now?
That people can see at Google?
So the Vertex AI platform is really the platform that I'm working on.
Right.
So that's what we put our solutions on.
And I would say, look forward to many more
industry- and domain-adapted capabilities around LLMs, because I think that large models are a big thing, and I think it requires a lot of additional technologies to actually make them work in
applications. And I think that this is about the time
where we need to come up with things like design patterns.
So if you think about the Gang of Four, for example,
it's a book that was needed
when programming needed some kind of structure.
So I think we are at a place in time now
where we need some kind of structure
on how we build and deploy large models
in enterprise environments.
And that's something that I'm working on. Awesome. Sweet.
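The Gang of Four analogy above suggests what such a structure might look like: reusable, named patterns wrapped around model calls. One minimal sketch, entirely hypothetical and not any vendor's API, is a validate-and-retry wrapper, where a model's output is checked against a contract (here, "must be valid JSON") and regenerated on failure. `FakeModel` below is a stand-in that fails once and then succeeds, purely to make the sketch self-contained.

```python
# Hypothetical "design pattern" sketch for LLM applications:
# a wrapper that validates model output and retries on failure,
# in the spirit of Gang of Four patterns giving structure to OO code.

import json
from typing import Callable

class FakeModel:
    """Stand-in model client: fails once, then returns valid JSON."""
    def __init__(self):
        self.calls = 0
    def generate(self, prompt: str) -> str:
        self.calls += 1
        return "not json" if self.calls == 1 else '{"answer": 42}'

def with_validation(generate: Callable[[str], str],
                    is_valid: Callable[[str], bool],
                    max_attempts: int = 3) -> Callable[[str], str]:
    """Wrap a generate function so invalid outputs trigger a retry."""
    def wrapped(prompt: str) -> str:
        last = ""
        for _ in range(max_attempts):
            last = generate(prompt)
            if is_valid(last):
                return last
        raise ValueError(f"no valid output after {max_attempts} attempts: {last!r}")
    return wrapped

def is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

model = FakeModel()
generate = with_validation(model.generate, is_json)
result = generate("Give me JSON")
print(result)  # retries once, then returns '{"answer": 42}'
```

The design choice is the same one the conversation points at: the structure lives outside the model, so the same pattern can wrap any model in an enterprise deployment.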
Well, everyone watching and listening at home,
if you were surprised that we had an episode today,
don't worry. We have a normal episode coming on
Friday. This was just a little extra story
for you. So, hope you enjoyed it.
And we'll see you on Friday.
Cheers. Peace.
Thank you.
AWS Generative AI gives you the tools to power your business forward with the security and speed of the world's most experienced cloud.