The Changelog: Software Development, Open Source - Bringing Whisper and LLaMA to the masses (Interview)

Episode Date: March 22, 2023

This week we're talking with Georgi Gerganov about his work on Whisper.cpp and llama.cpp. Georgi first crossed our radar with whisper.cpp, his port of OpenAI’s Whisper model in C and C++. Whisper is... a speech recognition model enabling audio transcription and translation. Something we're paying close attention to here at Changelog, for obvious reasons. Between the invite and the show's recording, he had a new hit project on his hands: llama.cpp. This is a port of Facebook’s LLaMA model in C and C++. Whisper.cpp made a splash, but llama.cpp is growing in GitHub stars faster than Stable Diffusion did, which was a rocket ship itself.

Transcript
Starting point is 00:00:00 What's up, friends? This week on the Changelog we're talking with Georgi Gerganov about his work on whisper.cpp and llama.cpp. You're gonna love it. He first crossed our radar with whisper.cpp, which is his port of OpenAI's Whisper model in C and C++. Whisper is a speech recognition model enabling audio transcription and translation, something we're paying very close attention to here at Changelog. Between the invite and the show's recording, Georgi had another hit project on his hands,
Starting point is 00:00:36 Llama.cpp. This is a port of Facebook's Llama model in C and C++. Whisper CPP made a splash, but Llama.cpp is growing in GitHub stars faster than StableDiffusion did, which was a rocket ship in and of itself. A massive thank you to our friends and our partners at Fastly and Fly. Our pods are fast to download globally because Fastly, they are fast globally. Check them out at Fastly.com.
Starting point is 00:01:05 And our friends at Fly let you put your app and your database close to your users all over the world with no ops. Learn more at fly.io. This episode is brought to you by our friends at Postman. Postman helps more than 25 million developers to build, test, debug, document, monitor, and publish their APIs. And I'm here with Kin Lane, Chief Evangelist at Postman. So Kin, I know you're aware of this, but companies are becoming more and more aware of this idea of API-first development. But what does it mean to be API-first to you? API-first is simply prioritizing application programming interfaces or APIs over the applications they're used in.
Starting point is 00:01:58 APIs are everywhere. They're behind every website we use, every mobile application, every device we have connected to the Internet, our cars. And what you're doing by being API first is you are prioritizing those APIs before whatever the application is that is putting them to work. And you have a strategy for how you are delivering those APIs. So you have a consistent experience across all of your applications. Take us beyond theory, break it down for me. What changes for a team when they shift their strategy, when they shift their thinking to API first development? So when your one team goes, hey, we're building a website, It's got a address verification and part of an e-commerce solution.
Starting point is 00:02:46 That web application is using the APIs to do what it does. And then when another team comes along and goes, hey, we're building a mobile application that's going to do a similar e-commerce experience and may use some of the similar API patterns behind their application, they need address verification, that that's a consistent experience. Those two teams are talking rather than building in isolation, doing their own thing and reinventing the wheel. And then when a third team comes along,
Starting point is 00:03:17 needs to build a marketing landing page that has address verification, they don't have to do that work because those teams have already collaborated, coordinated, and address verification is a first class part of the experience. And it's consistent no matter where you encounter that solution across the enterprise. And that can apply to any experience, not just address verification. Very cool. Thank you, Ken. Okay, the next step is to go to postman.com slash changelog pod. Again, postman.com slash changelog pod. Sign up and start using Postman for free today.
Starting point is 00:03:53 Or for our listeners already using Postman, check out the entire API platform that Postman has to offer. Again, postman.com slash changelogpod. Well, exciting times to say the least. Welcome to the show, Georgi. Nice to have you here. Thank you for the invite. You bet. And happy to have you on your first podcast. So we're having a first here.
Starting point is 00:04:42 Yeah, I'm a bit excited. We're excited too. I wasn't sure you were going to say yes. You're a very busy guy. You have, well, at the time that I contacted you, you had one project that was blowing up. Now, since then, you have a second project that is blowing up even faster, it seems. The first one, whisper.cpp, which we took an interest in for a couple of reasons. And now, llama.cpp, which is like brand new this week, hacked together in an evening, and is currently growing
Starting point is 00:05:12 on GitHub stars, according to the thing you posted on Twitter, at a faster rate than stable diffusion itself. So, man, what's with all the excitement? Yeah, that's a good question. I still don't have a good answer for it. But yeah, I guess this is all the hype these days. People find this field to be very interesting, very useful. And somehow with these projects and with this approach that I'm having,
Starting point is 00:05:45 like coding this stuff in C and C++, and running it on the CPU, it's kind of generating additional interest in this area. Yeah. Yeah, so far it feels great. I'm excited how it evolves.
Starting point is 00:06:02 Yeah, it's pretty cool. I think that these large language models and AI has very much been in the hands of big tech and funded organizations, large corporations, and some has been open source and we started to see it kind of trickle down
Starting point is 00:06:18 into the hands of regular developers and open AI, of course, leading the way in many ways. They have their Whisper speech recognition model, which allows for transcription, allows for translation. And your project, Whisper.cpp, which is a port of that in C and C++, was really kind of an opportunity for a bunch of people to take it and get our own hands on it, you know, and run it on their own machines and say, okay, all of a sudden, because this model itself has been released, I don't need to like use an API. I can run it on my Mac book. I can run it on my iPhone. I can run
Starting point is 00:06:55 it on, well, the new ones are getting run on Pixels. It's getting run on Raspberry Pis, et cetera. And that's, that's exciting. So I was just curious, when you started whisper.cpp, why did you decide to code that up? What was your motivation for starting that project? Yeah, I'll be happy to tell a little bit of the story about how it came together. But as you mentioned, yes, like the big corporations, they're producing and holding most of the interesting models currently. Right. And being able to run them on consumer hardware is something that sounds definitely interesting. Okay, so Whisper CPP, it was kind of a little bit of luck and good timing. Actually, most of the stuff has been this way. And how it started.
Starting point is 00:07:51 So Whisper was released at the end of September last year. And by that time, I was basically a non-believer, like a non-AI believer. I didn't really believe much in the neural network stuff. And like, I don't know, I had a more conservative point of view, like I was usually wondering, why are these people wasting so much effort on this stuff? But that was a totally ignorant point of view.
Starting point is 00:08:19 I wasn't really familiar with the details and stuff like this. But when Whisper came out, I happened to be working on a small library. It was like a kind of a hobby project. Basically, this is the ggml library, which is at the core of Whisper CPP. And it's, like, a very tight project implementing some tensor algebra. I was doing this for some machine learning tasks, like work-related stuff also, but I
Starting point is 00:08:51 usually hack quite a lot of projects in my free time, like side projects, like trying to find some cool and interesting ideas and stuff like this. Usually I do this in C++, but I was looking to change it a little bit. And so ggml was an attempt to write something in C,
Starting point is 00:09:12 like real men do. For sure. Yeah. So I was working on this library. I wanted to have like some basic functionality, like make it kind of efficient, very strict with memory management, avoid unnecessary memory allocation,
Starting point is 00:09:33 have multi-threading support. This is some kind of a tool that you can basically use in other projects to solve different machine learning tasks. I wasn't thinking about neural networks at all, as I mentioned. It was kind of not interesting to me at that point. Okay, so I had some initial version of ggml, and there was some hype about GPT by that time, I guess.
Starting point is 00:10:01 And also, I was definitely inspired by Fabrice Bellard. Like he had a similar tensor library, LibNC, I think it's called. And there was an interesting idea to try to implement a transformer model. Like GPT-2 is such a model. And I already had the tools, like had the necessary functionality. So I gave it a try. I actually found some interesting kind of a blog post or tutorial
Starting point is 00:10:33 like GPT Illustrated or something like this, GPT-2. I went through the steps. I, like, implemented this with ggml. I was happy. It was running. We were generating some junk, like random
Starting point is 00:10:47 I posted, I think I posted on Reddit, maybe also Hacker News, I forgot, but basically no interest. And I said, okay, I guess that's not very interesting to people, let's move on with
Starting point is 00:11:04 other stuff. Like the next day or the day after that, I don't know, Whisper came out and I opened the repo, I opened it, I looked at the code, and I figured, like, basically 90% of this I have the code already written for the GPT-2, because like the transformer model in Whisper, it's
Starting point is 00:11:28 kind of very similar to GPT-2. I mean, there are obviously quite some differences as well, but the core stuff is quite similar. So I figured, okay, I can easily port this. It might be interesting to have it running on a CPU. I know
Starting point is 00:11:44 that everybody's running on GPUs, so probably it will not be efficient. It will be not very useful, but let's give it a try. And that's
Starting point is 00:11:54 basically how it came. And yeah, it slowly started getting some traction. So Whisper
Starting point is 00:12:03 was interesting to me immediately for a couple of reasons. First of all, we obviously have audio that needs transcribed and we always are trying to improve the way that we do everything. And so automated transcriptions are very much becoming a thing. More and more people are doing them. So that, first of all, I was like, okay, a Whisper implementation that's pretty straightforward to use on our own.
Starting point is 00:12:26 Obviously, you call it a hobby project. Do not use it for your production things. Do not trust it, but it's proven to be pretty trustworthy. And then the second thing that was really cool about it was just how simple it was, insofar as the entire implementation of the model is contained in two source files, right? So you have it broken up into the tensor operations and the transformer inference.
Starting point is 00:12:49 One's in C, the other's in C++. And that's just, as a person who doesn't write C and C++ and doesn't understand a lot of this stuff, it still makes it approachable to me where it's like, okay, this isn't quite as scary. And for people who do know C and C++, but maybe not know all of the modeling side and everything else involved there, very approachable. So A, you can run it on your own stuff, CPU-based.
Starting point is 00:13:15 B, you can actually understand what's going on here if you give these two files a read, or at least high level. And so I think that was two things about Whisper that were attractive to me. Do you think that's what got people the most interested in it? The other thing is it was very much pro Apple Silicon, pro M1.
Starting point is 00:13:33 And a lot of the tooling for these things tend to be not Mac first, I guess. And so having one that's like, actually, it's going to run great on a Mac because of all the new Apple Silicon stuff, I guess that was also somewhat attractive. Yeah. So yeah, what made it attractive, I guess, as you said, okay, the simplicity, I think
Starting point is 00:13:53 definitely it's like about 10,000 lines of code. It's not that much, but overall, like, these neural networks at the core there, they're actually pretty simple, like it's simple matrix operations, additions, multiplications, and stuff like this. So it's not that surprising. Yeah, another thing that generated interest was this Python. It's a bit of a tricky topic, but yes, so I was mostly focused on running this on Apple. And so I'm like, I don't use Python a lot and pretty much I don't use it at all. And I don't really know the ecosystem very well.
Starting point is 00:14:36 But what I figured is, basically, if you try to run the Python code base on M1, it's not really utilizing the available resources of these powerful machines yet, because if I understand correctly, it's kind of in the process of being implemented to be able to run these operations on the GPU or the Neural Engine or whatever. And again, maybe it's a good point to clarify here. Maybe there's some incorrect stuff that I will say in general about Python and transformers and stuff like this. So don't trust me on everything because I'm just kind of new to this.
Starting point is 00:15:22 Fair. Okay, so you run it on M1. The Python is not really fast, and it was surprising when I ran it with my port. It was quite efficient because for the very big matrix multiplications, which are like the heavy operations during the computation,
Starting point is 00:15:42 which are in the encoder, like in the encoder part of the transformer, I was running those operations with the Apple Accelerate framework, which is like an interface that somehow gives you extra processing power compared to just running it on the CPU. So it was, yeah, it was efficient
Starting point is 00:16:03 with Whisper CPP. I think people appreciated that. There was, I said it was a bit tricky because there was this thing with the text decoding mode. So, yeah, I'll try not to get into super much details, but
Starting point is 00:16:19 there are, like, basically two modes of decoding the text, like generating the transcription. They call it the greedy approach and beam search. And beam search is much heavier to process in terms of computational power compared to the greedy approach. I just had the greedy approach implemented, and it was running by default. While on the Python repo, the beam search is running by default. And I tried to clarify this in the instructions. I don't think a lot of people really noticed the difference.
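For a rough picture of the difference being described here: greedy decoding just takes the single most probable token at each step, while beam search keeps several candidate transcriptions alive at once, which multiplies the work. A minimal sketch of the greedy step in C (illustrative only, not the actual whisper.cpp code; the names are made up for the example):

    #include <stddef.h>

    /* Pick the most probable next token from the decoder's output logits.
       Greedy decoding: one argmax per step, no alternative hypotheses kept. */
    static int greedy_next_token(const float *logits, size_t n_vocab) {
        int best = 0;
        for (size_t i = 1; i < n_vocab; i++) {
            if (logits[i] > logits[best]) {
                best = (int)i;
            }
        }
        return best;
    }

Beam search would instead keep, say, the top few partial transcriptions and re-score all of their continuations at every step, which is why it costs noticeably more compute for usually modest accuracy gains.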
Starting point is 00:16:57 Yeah. So they're comparing a little bit apples to oranges. Oh man, good pun. I'm curious what it takes to make a port. What exactly is a port? Can you describe that? So obviously Whisper was out from OpenAI, that was released.
Starting point is 00:17:16 Yeah, I think port is not super correct word, but I don't know. Usually you port some software, you can port it from some certain architecture, like, I don't know, running on a PC
Starting point is 00:17:36 and then you port it there, implement it and it starts running on a PlayStation or whatever. Like this kind of makes more sense to call it port. Here is just maybe a reimplementation. It's more correct to say, but basically the idea is to implement these computational steps.
Starting point is 00:17:58 And the input data, the model, the weights that were released by OpenAI, they're absolutely the same. You just load it, and instead of computing all the operations in Python, I'm computing them with C. Right. That's it. Gotcha.
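To make that idea concrete: the released weights are just large arrays of numbers in a file, and "porting" here means reading them in and redoing the same arithmetic without the Python stack. A toy sketch of that loading-plus-compute idea in C (hypothetical file layout and function names, purely illustrative; the real Whisper and LLaMA checkpoint formats carry headers, shapes, and many tensors):

    #include <stdio.h>
    #include <stdlib.h>

    /* Load n floats of pretrained weights from a raw binary file.
       Only shows the principle; real checkpoints are more structured. */
    float *load_weights(const char *path, size_t n) {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;
        float *w = malloc(n * sizeof(float));
        if (w && fread(w, sizeof(float), n, f) != n) { free(w); w = NULL; }
        fclose(f);
        return w;
    }

    /* One linear layer y = W*x + b, the kind of operation that gets
       re-implemented in C instead of being run through Python. */
    void linear(const float *W, const float *b, const float *x,
                float *y, int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            float acc = b ? b[r] : 0.0f;
            for (int c = 0; c < cols; c++) acc += W[r * cols + c] * x[c];
            y[r] = acc;
        }
    }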
Starting point is 00:18:18 You probably recall when Whisper first dropped, I did download it and run one of our episodes through it. And this was back, remember on Get With Your Friends with Matt, I was talking about my pip install hesitancy. Yes. Like some of that is with regards to Whisper, because like Georgi, I'm not a Python developer, and so I'm very much coming to Whisper
Starting point is 00:18:38 as a guy who wants to use it to his advantage, but doesn't understand any of the tooling, really. And some kind of prodding out of Blackbox and following instructions. And I got it all installed and I ran everything and I took one of our episodes and I just kind of did their basic command line flags that they say to run with the medium model
Starting point is 00:19:00 or whatever it was, kind of like the happiest path. And I ran our episode through it and it did a great job. It transcribed our episode in like something like 20 hours on my Mac. And so remember at that time, Adam, we're talking about, well, we could like split it up by chapter and send it off to a bunch of different machines and put them back together again because we're like 20 hours is well faster than our current human based transcriptions. But still, it's pretty slow. And I did the same thing with Georgie's whisper.cpp when he dropped it in
Starting point is 00:19:29 September or October or whenever that happened to come out. And again, just the approachability of like, okay, clone the repo, make, run the make command. Right.
Starting point is 00:19:39 And then run this very simple dot slash main, whatever, pass it your thing. Yeah. And the exact same episode, it was like four or five minutes versus 20 hours. Now, I could have been doing it wrong. I'm sure there's ways of optimizing it. But just that difference was, okay, I installed it much faster.
Starting point is 00:19:56 I didn't have to have any of the Python stuff, which I'm scared of. And at least in the most basic way of using it, each tool, it was just super fast in comparison. And so that was just exciting. I'm like, oh, wow, this is actually approachable. I could understand if I needed to. And it seems like, at least on an M1 Mac, it performs a whole lot better with pretty much the same results.
Starting point is 00:20:20 Because like Georgie said, it's the same models. You're using their same models. You're just not using all the tooling that they use that they wrote around those models in order to run the inference and stuff. You're speaking to the main directory in the examples folder for whisper.cpp. There's a readme in there that sort of describes how you use the main file and pass a few options, pass a few WAV files, for example,
Starting point is 00:20:45 and out comes a transcript wherever using different flags you can pass to the main.cpp C++ file, essentially, to do that. Yeah, so, yeah, regarding the repo and how it is structured, I kind of have an experience with, I know what people appreciate about such type of open source project
Starting point is 00:21:09 and it should be very simple. Like every extra step that you add, it will push people away. So I wanted to make something like, you clone the repo, you type make, and you get going. And yeah, that's how it currently works, and the README has instructions on how to use it.
Starting point is 00:21:26 I guess to preface that, or to suffix that, the quick start guide, or at least the quick start section of your README says you build a main example with make, and then you transcribe an audio file with dot slash main, you pass a flag of dash F, and then wherever your WAV file is,
Starting point is 00:21:42 there you go. It's as simple as that once you've gotten this built on your machine. Yeah, exactly. There are extra options. You can fine-tune some parameters of the transcription and processing. By the way, it's not just, okay, the main is like the main demonstration. With the main functionality for transcribing WAV files. But there are also additionally a lot of examples.
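For people who want to call it from their own C or C++ code rather than the main example, the library also exposes a small C API in whisper.h. Roughly, usage looks like the sketch below; the function names are from memory of that header and may differ a little between versions, so treat this as a sketch rather than gospel:

    #include <stdio.h>
    #include "whisper.h"

    /* pcm: mono float samples at 16 kHz, n_samples of them */
    int transcribe(const char *model_path, const float *pcm, int n_samples) {
        struct whisper_context *ctx = whisper_init_from_file(model_path);
        if (!ctx) return 1;

        /* greedy decoding parameters, the default in whisper.cpp */
        struct whisper_full_params params =
            whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

        if (whisper_full(ctx, params, pcm, n_samples) != 0) {
            whisper_free(ctx);
            return 1;
        }

        const int n = whisper_full_n_segments(ctx);
        for (int i = 0; i < n; i++) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }

        whisper_free(ctx);
        return 0;
    }

That is essentially what the main example and the other examples in the repo wrap with command-line flags.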
Starting point is 00:22:08 That's one of the interesting things also about WhisperCPP. I tried to provide very different ways to use this model. And they're mostly just basic hacks and some ideas like from people like wanting some particular functionality, like doing some voice commands like Siri, Alexa, stuff like this. So there are a lot of examples there and people can like take a look and get ideas for projects. This episode is brought to you by Sentry. They just launched Session Replay. It's a video-like reproduction of exactly what the user sees when using your application.
Starting point is 00:23:05 And I'm here with Ryan Albrecht, Senior Software Engineer at Sentry and one of the leads behind their emerging technologies team. So Ryan, what can you tell me about this feature? Well, Sentry has always had a great error and issue debugging experience. We've got features like being able to see stack traces and breadcrumbs. So you've got a lot of context about what the issue is, but a picture is worth a thousand words and a movie is probably worth a thousand pictures. And so session replay, it's going to give you that video-like experience where you can actually click through from your issue and see how did the user get into the situation? What was the error? And then what happened afterwards? That's pretty cool. Okay. So this point is kind of intended, but can you
Starting point is 00:23:41 paint a more visual picture for me? So when you open a replay for an error, what you see on the screen is on the left side, you've got a video player with the play pause button on the bottom. You can adjust the speed. And on the left side, you've got all your developer tools, consoles there, the network is there. You can dig in to see like what requests were failing. What were the messages that your application was generating? And you can scrub through the video backwards and forwards to understand what happened before and after this issue, what was leading up to it, and what do you have to go and fix? Very cool.
Starting point is 00:24:10 Thank you, Ryan. So if you've been playing detective, trying to track down support tickets, read through breadcrumbs, stack traces, and the like, trying to recreate the situation of a bug or an issue that your application has, now you have a game-changing feature called Session Replay. Head to Sentry.io and log into your dashboard. It's right there in the sidebar to set up in your front end. And if you're not using Sentry, hey, what's going on?
Starting point is 00:24:34 Again, Sentry.io and use the code party time. So going one layer deeper, maybe not even necessary for everyone else, but for you and I, Jared, maybe this is more pertinent: limited to 16-bit WAV files. Why is the limit 16? We often, at least I, record in 32-bit. So when I'm recording this, I'm tracking this here in Audition, my WAV files are in 32-bit, because that gives a lot more information. You can really do a lot in post-production with effects and stuff like that, or decreasing or increasing sibilance, and just different stuff in audio to
Starting point is 00:25:36 kind of give you more data. And I guess in this case, you're constrained by 16 bit wave files. Why is that? Yeah, the constraint is actually coming from the model itself. Basically, OpenAI, when they trained it, I think they basically use this type of data format, I guess. So you have to give to the model the input audio that you give. It has to be 16, wait, not 16-bit, 16 kilohertz, right? 16-bit WAV files is in your README, so I'm going based on that.
Starting point is 00:26:09 Is it a problem, a mistake? No. Oh, okay, it's, yeah, yeah, yeah, it's 16-bit PCM. Okay, it's just integers, not floats. Yeah, okay, so it's 16-bit and it's also 16 kilohertz. But yeah, technically, you can resample and convert any kind of audio, whatever sample rate, to 16. And yeah, I mean, would you get better results if the model was able to process a higher sample rate or higher bit depth?
Starting point is 00:26:44 It's just one less step really because you've got the FFmpeg in here now so you have one more dependency really in the chain of if we were leveraging say this on a daily basis for production flows to get transcripts or most of the way for transcripts so that's just one more step really. It's not
Starting point is 00:26:59 really an issue necessarily, it's just one more thing in the tool chain. Yeah, that's the drawbacks of C and this environment. You don't have, like Python, you just pip install or whatever and you can grab third-party libraries. Here it's more difficult and you have to stick to, like, the basics.
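The conversion being discussed is mechanical: 32-bit float samples get clamped and scaled into 16-bit integers, and the audio gets resampled down to the 16 kHz the model was trained on. A rough sketch of the bit-depth half in C (illustrative; in practice a tool like ffmpeg, as mentioned above, does this plus proper resampling in one pass):

    #include <stddef.h>
    #include <stdint.h>

    /* Convert normalized 32-bit float samples (-1.0 .. 1.0) to 16-bit PCM.
       Resampling to 16 kHz is a separate step and needs a real filter,
       not just dropping samples. */
    void float_to_pcm16(const float *in, int16_t *out, size_t n) {
        for (size_t i = 0; i < n; i++) {
            float s = in[i];
            if (s > 1.0f)  s = 1.0f;
            if (s < -1.0f) s = -1.0f;
            out[i] = (int16_t)(s * 32767.0f);
        }
    }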
Starting point is 00:27:18 So your examples have a lot of cool stuff. Karaoke style movie generation, which is experimental. You can tweak the time stamping and the output formats kind of to the hilt to get exactly what you're looking for. And then also you have a cool real time audio input example where it's basically streaming the audio right off the device into the thing and, you know, saying what you're saying while you're saying it or rightly right after you say it. I hear the next version,
Starting point is 00:27:45 it's going to actually do it before you say it, which will be groundbreaking. But what are some other cool things that people have been building? Because I mean, the community has really kind of bubbled around this program. Do you have any examples of people using whisper.cpp in the wild or experimentally that are cool?
Starting point is 00:28:02 Yes, this is definitely one of the cool parts of the project. I really like the contributions and people using it and giving feedback and all this stuff. Yeah, I definitely, there are quite a few projects already running. Like, there are people making iOS applications, macOS applications, they're like companies with bigger products
Starting point is 00:28:29 integrating into them. I'm not sure we should say names, but it's definitely being applied in different places. And yeah, I guess another interesting application is, at some point we got it even running in a web page, and one of the examples does exactly that. Basically, with WebAssembly, you can load the model in a web page in your browser,
Starting point is 00:28:54 and basically you don't even have to install, like, the repo or compile it. You just open a browser and you start transcribing. You just, yeah, you still have to load the model, which is not very small. But it's amazing it can run even in a web page. And I think there are a few services, like web services, that popped up using this idea to offer you a free subscription,
Starting point is 00:29:21 oh, transcription. Right. Could you, kind of obvious, but could you deploy that or distribute that through Docker, a Docker container, for example? That way you could just essentially, you know, Docker compose up and boom, you've got maybe a web service on your local area network
Starting point is 00:29:38 if you wanted to, just to use or play with. Yeah, I guess so. Yeah, I'm not familiar with the Docker environment, but I think you should be able to do it. I see people are already using it for LLaMA, and I guess there's no reason to not be able to. I don't know the details.
Starting point is 00:29:56 Of course you can do it as a web service, but sometimes you want no dependence on anybody's cloud, whether it's literally a virtual private server that you've spun up as your service or simply, hey, I want to, you know, use this locally in Docker or something like that. And just essentially you've built the server in there. You've got whatever flavor of Linux you want. You've got, you know, whisper.cpp already in there, and you've got a browser or a web server running it, just to ping for a local network. You can name the service whisper.lan, for example. Yeah, you could totally get that done, I think.
Starting point is 00:30:33 So you brought up the fact that people are running this in the browser, in WebAssembly. Opportunistically, I'd like to get on the air my corollary to Atwood's Law that I posted last week on the socials. You guys know Atwood's Law: any application that can be written in JavaScript eventually will be written in JavaScript. Well, my corollary, which I'm not going to call Santo's corollary, 'cause that would be presumptuous. I'm not going to call it that, but I don't have a name for it yet, but it is any application that
Starting point is 00:30:59 can be compiled in WebAssembly and run on the browser eventually will be compiled to WebAssembly and run on the browser because it's just too much fun, right? The most recent example would be this one. But prior to that, you know, they're running WordPress in the browser now. Not like, you know, the rendered HTML of a WordPress site in your browser, like the backend in your front end, in your browser, because WebAssembly, we just love it. We're going to run everything in it. Why would you do that?
Starting point is 00:31:28 To show everybody that you can do it. Okay. I'm sure there's other reasons, but that was pretty much what their blog post was. The folks who did it, I think it's the wasmlabs.dev folks put WordPress into the browser with WebAssembly because we can do it now.
Starting point is 00:31:47 And so we're going to. So that was just me being opportunistic. Back to you, Georgi. If we talk about Whisper and the roadmap. So it's 1.2. It's been out there for a while. My guess is it's probably become less important to you now that llama.cpp is out.
Starting point is 00:32:03 But we'll get to llama in a moment. You have a roadmap. On your roadmap is a feature that you know I'm interested in because I told you this when I contacted you. And this goes back to the meme that we created years ago, Adam. Remember how we said that the changelog is basically a Trojan horse where we invite people on our show and then we lob our feature requests at them when they least expect it.
Starting point is 00:32:27 You know, before, as I was preparing for this conversation, I was thinking, Jared's going to say this in this show for sure. I invited you here to give you my feature request. Right. And to make it a more pressure-filled feature request. But I'm just mostly joking because I realize this seems like it's super hard
Starting point is 00:32:44 and you can talk to that. But diarization, I don't know if that's how you say it, speaker identification is the way that I think about it, is not a thing. In Whisper, it doesn't seem. It's certainly not a thing in Whisper.cpp. I've heard that it's, Whisper models aren't even necessarily going to be good at that. There's some people who are hacking it together with some other tools where they're like, they use whisper, then they use this other Python tool. Then they use whisper again in a pipeline to get it done. This is something that we very much desire for our transcripts because we have it already with our human transcribed transcripts.
Starting point is 00:33:20 It's nice to know that I was the one talking and then Georgie answered and then Adam talked. And we have those now, but we wouldn't have them using whisper and it's on your roadmap. So I know it's down there. There's other things that seem more important like GPU and stuff, but can you speak to maybe the difficulties of that, how you'd go about it and when we can, when we can have it? This feature is super interesting from what I get from the responses. Basically being able to separate the speakers. You're right. So basically it's not out of the box supported by the model and there are third party tools
Starting point is 00:33:57 and they're like themselves, like those tools are other networks, like doing some additional processing. And again, I basically have almost absolutely no idea or expertise with this kind of stuff and what works and what doesn't work and basically zero. And there were like a few ideas popping around using Whisper like in not traditional way to achieve some sort of diarization and like
Starting point is 00:34:29 it boils down to trying to extract some of the internal results of the computation and try to classify based on some, let's say, features or I don't know, I'm not sure really how to properly explain it.
Starting point is 00:34:48 So I tried a few things. I know people are also trying to do this. I think it's, I guess it's not working out. So, I don't know, this slow, unlikely, at least from my point of view, maybe if someone figures it out and it really works, we could probably have it someday. But for now, it seems unlikely. It's a pipe dream.
Starting point is 00:35:12 I don't understand why it's a pipe dream. It seems because there's other transcripts and services out there that have it, that are not LLM-based or AI-based. They're just I don't know how they work, honestly. But, for example, I had Connor Sears from Rewatch on Founders Talk a while back.
Starting point is 00:35:30 And one of the killer features, I thought, for Rewatch, so just a quick summary, Rewatch is a place where teams can essentially upload their videos to Rewatch later. So you might do an all-hands demo, you might do a Friday demo of your sprint or whatever, and new hires can come on and re-watch those things or things around the team and whatnot to sort of catch up. It's a way that teams are using
Starting point is 00:35:55 these videos and also the searchable transcripts to provide an on-ramp for new hires and or training or just whomever, whatever. That's how they're using them. He actually came from GitHub, and they had this thing called GitHub TV when he worked there. And Connor's a designer, and long story short, they've had this thing. And so he really wanted the transcription feature, and they have transcripts that are pretty amazing, and they have this diarization, I don't know if that's what they call it,
Starting point is 00:36:26 but they have Jared, Adam, whomever else labeled. Why is it possible there, and why is it such a hard thing here? Yeah, I think the explanation is basically Whisper wasn't designed for this task, and I guess most likely they're using something that was designed for this task. Some other model, it was trained to do the diarization. And yeah, you can always, like, plug in some third-party project or another network to do this extra step.
Starting point is 00:37:01 It would be cool to be able to do it with a single model. Right. But for now, it's not possible. Is it kind of like converting your WAV files to 16-bit first before using the model? It's like that's one more step in the mix, basically.
Starting point is 00:37:18 Yeah, but it's even worse than that, because it's a much harder step. It's basically like running it through Whisper and then running it through a separate thing whose entire purpose is segmentation or diarization, and then it's like two passes. Whereas what we're talking about, it's like, well, ffmpeg dash whatever, and the tooling around that. See, for me, there are solutions that seem like they're kind of hacky and people are getting them to work, but it's like back in the Python world again. And it's very slow because of that, from what I can tell.
Starting point is 00:37:49 So against Python, Jared. I don't hate Python. This pip install has got you really upset. We've got to solve this. No, it's just I like the simplicity and the straightforward stuff that Georgie does. I just want it in whisper.cpp. I know.
Starting point is 00:38:01 I think maybe Whisper 2 will just support this feature and then we'll all be happy. Right? Like you'll just upgrade your models and everything. You'll just check it off your roadmap. But if not for something like that, I think it is probably a difficult thing to accomplish just because the models aren't set up to do that particular task.
Starting point is 00:38:20 Like they're just set up for speech recognition, not for, I don't know, speaker classification or whatever you call it. Yeah, with the way things are going lately, I suppose by the end of the month, OpenAI will probably release a new model
Starting point is 00:38:35 which supports everything. The day we ship this episode, it'll support that. This stuff's moving at the speed of light right now. So it probably will be. By the time this ships, it'll probably be a feature of Whisper 2. Yeah, I think so. Hopefully. So the project, I should give it a shout out.
Starting point is 00:38:51 I do not dislike Python. PyAnnote, P-Y-A-N-N-O-T-E, is what people are combining with Whisper in order to get both features through a pipeline. So if you're interested in that, people are doing it. It seems a little bit buggy. They aren't quite happy with the results, but they have some results. You got to be careful because Brett can't listen to the show, Jared.
Starting point is 00:39:14 You got to be careful because Brett Cannon listens to the show, Jared. And sometimes he even reads transcripts. He might just scan for his name or Python, essentially. He's got two searches on our transcripts. Well, now he just brought his name into it. He wouldn't have been able to find it until just now. I've been thinking too behind the scenes that the fact that this runs on Apple Silicon, when you got the ARM thing that's kind of baked in there,
Starting point is 00:39:33 I believe it is called Neon, which I think is pretty interesting, this Neon technology kind of, in that separate sort of super processor or additional processor speed, like what did you learn or have to learn about Apple's Silicon to make this project work? What did you even,
Starting point is 00:39:48 not so much learn to make it work well, but what did you learn about the processor that was like, wow, they're doing that in this consumer grade, pretty much ubiquitous, or available to mostly anybody who can afford it, obviously. What did you learn about their processor?
Starting point is 00:40:01 Yeah, so ARM NEON, this is the name of the instruction set that runs on the Apple Silicon CPUs. When I started ggml, I recently had my shiny new M1. I have
Starting point is 00:40:18 been using it for my workstation, like transitioning from Linux. Oh wow, you're a Linux convert, okay. Yeah, but yeah, this machine is so good, I decided to switch and... You won't go back? I think I'm not going back at any time soon. Elaborate. I'm
Starting point is 00:40:34 listening. Go ahead. So, yeah, I was interested in understanding how to utilize it. And basically, so this is called, like, single instruction, multiple data (SIMD) programming, where you utilize this instruction set to process things faster. And I wanted to get some experience with that, so I had this implemented in ggml, like support for the heavy operations to use ARM NEON, and what it requires
Starting point is 00:41:08 to be able to use it: just read the documentation and figure out how to properly load and store the data in an effective manner. It's tricky stuff in general. I'm no expert by far. As I'll probably mention at some point, people are looking at the code lately and they're helping me optimize these parts. They're kind of difficult to code in general. So yeah, ARM NEON is helping for the CPU processing, and then there is this extra Apple framework, which I'm not really sure which part of the hardware it utilizes. Basically, this
Starting point is 00:41:57 is the Apple Accelerate framework. It has a linear algebra API, so you can say, okay, multiply these matrices, and it's really fast. And I think it's really using something that is called the AMX coprocessor, but it's not super clear to me. I don't really care. It's just fast.
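To give a flavor of the two things being described: ARM NEON lets you process several floats per instruction if you load and store them in the right layout, and Accelerate gives you a ready-made BLAS-style matrix multiply. A condensed sketch of both in C, Apple-specific and much simplified compared to real kernels (the Accelerate call assumes row-major matrices):

    #include <arm_neon.h>
    #include <Accelerate/Accelerate.h>

    /* Dot product with NEON intrinsics: 4 floats per lane; assumes n is a multiple of 4. */
    float dot_neon(const float *a, const float *b, int n) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (int i = 0; i < n; i += 4) {
            acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
        }
        return vaddvq_f32(acc);
    }

    /* The big encoder matrix multiplications can instead go through
       Accelerate's BLAS interface: C = A(MxK) * B(KxN). */
    void matmul_accelerate(const float *A, const float *B, float *C,
                           int M, int N, int K) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);
    }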
Starting point is 00:42:21 So why not use it? This is one of the optimizations, yeah. What I found interesting when I was researching a little further to prepare for this call was that this is a quote-unquote secret coprocessor. It's called the Apple Matrix Coprocessor, AMX is what they call it. And it's not very well known,
Starting point is 00:42:40 and so as this Apple Silicon is only a couple years old, it's not that old, so even examining or, even examining or building new software technology on top of it. But like this is, I think we have to look at, you know, one of the many reasons that Apple chose to abandon Intel and go their own route. And obviously a lot of the work they did in their mobile devices from an iPhone to an iPad and all the things happening in their processors led them to this direction. But even this, the accelerated coprocessor that is there secretively, essentially just waiting to be tapped into, is kind of interesting just because it does what it does. Yeah, I guess when you make your own hardware and software,
Starting point is 00:43:21 you definitely get some advantages compared to not doing it. So I think it's a good approach. I like this way. We're even speculating too on like Apple and artificial intelligence. And maybe this is the glimpse into their genius that is not yet revealed. Because if you can do what you've done with this coprocessor and this neon arm technology and this AMX, this Apple matrix coprocessor, we have to wonder what are the reasons why they went this route?
Starting point is 00:43:54 One, it couldn't be just simply to put it into our hands, but to put into our hands for them to use in some way, shape or form. So it's got to make you wonder what the future for them might be in AI because they are really black box and secretive in terms of new features and new products and things like that. But this might give us a glimpse into that future.
Starting point is 00:44:13 Yeah, true. I don't know. I'm not really competent. As far as I know, the optimal way on Apple is to use Core ML, like some other framework which utilizes, like, everything: Neural Engine, GPU, CPU, whatever. Yeah, and I think they, for example, recently demonstrated how to run Stable Diffusion with Core ML, quite efficient.
Starting point is 00:44:39 So yeah, I guess like using Accelerate, it's not really something new. It's probably not even the right way to go in the long run. But for now, it works. It's okay. It works for now. It works for, it's good enough for us regular people. So on the Whisper front, I know we should get to Lama here soon because it's the most exciting new thing.
Starting point is 00:45:00 And here we are burying the lead deep into the show like fools. But Whisper is interesting to me. The GPU support, so one of the things about it is it's simple, it's great hardware support, very generic, runs on the CPU. You do have GPU support also on the roadmap. Is that something that
Starting point is 00:45:18 you put it on there because people ask for it or are you actually interested in this? Because it seems like it could definitely complicate things. Yeah, GPU support I avoid because you usually have to learn some framework like CUDA, OpenCL, stuff like this. It's complicated. It takes time to understand everything.
Starting point is 00:45:35 There are some workarounds like using what was it called? MVBus where it kind of automatically does it for you. But I don't know. There will be some probably in the future some basic support I think more interesting is the for Apple hardware
Starting point is 00:45:51 I think more interesting, for Apple hardware, is the transition of the encoder part, like one of the heavy parts, to the Apple Neural Engine, which we already have a prototype of, and this will kind of speed up the processing even further. So that's
Starting point is 00:46:06 Have you been able to run any benchmarks against your prototype, or have you gotten to that phase where you're actually seeing how much gains you're getting? Yeah, actually this was a super cool contribution. Basically, I read about Core ML, I decided I'm probably not going to invest time learning all these complex
Starting point is 00:46:22 tools, but suddenly one day a contributor, like, posted, you can see the linked issues in the repo, how to do it, which was super great. And he demonstrated it's possible. We initially observed like three times speed up, I think. Nice. But then other people joined, they showed how to make it even better. And I like this because people are contributing,
Starting point is 00:46:50 sharing ideas, making it faster. So I guess at least three times, but this is just the encoder; the decoder remains not optimal, so it's not super great overall.
Starting point is 00:47:05 You gotta love that moment with an open source project where you start to get significant contributions, right? Not drive-by, readme fixes, or docs, which are helpful, but not like this is a significant contribution of a new way of doing something or proof of concept.
Starting point is 00:47:21 That's pretty exciting. It seems like your two projects now, especially Whisper, because it's been around a lot longer, has had a lot of very smart coder types not afraid of hopping in and really helping out. Did you do anything to cultivate that,
Starting point is 00:47:38 or is it just the nature of the project that it brings a certain kind of contributor? Yeah, I'm also wondering about this, and I really enjoy it. So my previous projects, they didn't have a lot of contributions involved. And now with Whisper and Llama, I really like that it's getting attention. Did I do anything specific? Not really. I guess just people find it.
Starting point is 00:48:01 Maybe first of all, they find it useful and they start suggesting ideas for making it even more useful. And then people eventually start joining in to make code improvements and stuff like this. And there is, I think, I don't know, from my perspective, a relatively big momentum currently. People are very interested in supporting this. Yeah, I try to, like, make it so they're kind of able to get into it, like create some entry-level tasks and things that people can get involved with, because eventually, like currently, there are so many requests and issues and all this stuff that's kind of very difficult to handle on my own. So it would be nice to have
Starting point is 00:48:47 more people involved. Switching gears now, I think we put the cart before the llama, Adam. I don't know if that rings true to you. I was actually wondering if we should have our good friend Luda bring us in. Llama, llama. Red pajama. You know what I'm saying?
Starting point is 00:49:25 Luda Chris. Llama, llama, red pajama you know what i'm saying i've been dying to do red pajama llama, Llama Llama Drama, just all these rhymes. And I haven't been able to work those in quite yet. I knew you were. Getting to it now, the most exciting thing on the interwebs until I guess GPT-4 stole some steam yesterday. But February 24th, Facebook releases, Facebook Research or Meta AI, who knows what they call themselves these days, released Lama, a foundational 65 billion parameter large language model. And then according to some commentary, a European alpha coder went on a bender one night and ported it to C++ so we can all run it on our Pixel phones.
Starting point is 00:50:02 So that's the story, Georgi. How do you feel about being called a European alpha bender? European alpha coder. I thought that was a funny way of casting it by somebody on Twitter. Yeah, really. I really like this meme. It originated on Twitter,
Starting point is 00:50:15 like someone calling me an alpha male European or something. I don't know. It's kind of funny. Well, so you did hack this together in an evening. Is that, is that lore or is that true? Yeah, it's,
Starting point is 00:50:29 it's basically kind of true. But again, it's a combination of like factors and good timing and some luck.
Starting point is 00:50:37 Basically, we had the four-bit quantization stuff for the Whisper, just an idea working like where you, basically,
Starting point is 00:50:44 you take the model, you compress it down to four bits. You lose and stuff for the Whisper, just an idea working, like where you basically take the model, you compress it down to four bits, you lose some accuracy, but it's smaller and it processes faster. So we had that like in DGML and it was available. So a few days later comes out the LAMA. I do some calculations, I figure out like, okay, 65 billion parameters, you probably need about 40 gigs of RAM with 4-bit quantization. So this can run on a MacBook.
Starting point is 00:51:16 Why not do it? And yeah, it just was a matter of time to find some free time to try it. And yeah, last Friday, came after work home, had the words downloaded. But yeah, why I was able to do it so quickly. Basically, from what I saw, it's pretty much GPT-J architecture with some modification, like some extra normalization layers. And some minor changes. Basically, again, the existing call for the GPT-J, I just simply modified it there.
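The back-of-the-envelope math he describes above works out roughly like this: 65 billion parameters at 16-bit floats is about 130 GB, at 4 bits it's about 32.5 GB, plus a scale factor stored per small block of weights, which lands in the roughly 40 GB ballpark that can fit on a high-memory MacBook. A simplified sketch of that block-quantization idea in C, modeled loosely on the 4-bit scheme described here rather than the exact ggml format:

    #include <math.h>
    #include <stdint.h>

    #define QBLOCK 32  /* weights per block sharing one scale */

    /* Quantize one block of 32 float weights to 4-bit ints plus a scale.
       Dequantized value is roughly scale * q, with q in [-7, 7]. */
    void quantize_block_q4(const float *w, uint8_t *q /* 16 bytes */, float *scale) {
        float amax = 0.0f;
        for (int i = 0; i < QBLOCK; i++) {
            float a = fabsf(w[i]);
            if (a > amax) amax = a;
        }
        *scale = amax / 7.0f;
        const float inv = (*scale != 0.0f) ? 1.0f / *scale : 0.0f;

        for (int i = 0; i < QBLOCK; i += 2) {
            int q0 = (int)lroundf(w[i]     * inv);
            int q1 = (int)lroundf(w[i + 1] * inv);
            q[i / 2] = (uint8_t)((q0 + 8) | ((q1 + 8) << 4));  /* two 4-bit values per byte */
        }
    }

With a 4-byte scale for every 32 weights, that is about 5 bits per weight, which is how 65 billion parameters end up around 40 GB instead of the roughly 130 GB that 16-bit weights would need.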
Starting point is 00:51:51 It happened pretty quickly. You had a leg up. Prior art helped you that you created. Yes, yeah. So that quote, success is when, what is it, Adam? Preparation meets opportunity. Right, so like Georgi was perfectly prepared Quote, success is when, what is it, Adam? Preparation meets opportunity. Right? So like Georgie was perfectly prepared between this GGML library that he'd previously developed
Starting point is 00:52:10 and this knowledge he has. He was like primed for this position. For sure. Yeah. Which is great. I love that when that happens in my life. And so I applaud that moment for you because I mean, when you're in the trenches and you feel like you're like in the wilderness and you put some code out there, and in the case of Whisper.cpp, you get a glimpse of your hacker direction, your hacker sense.
Starting point is 00:52:35 You feel like you want to use a spidey sense kind of play on words. And you've done it again. Why not port another popular direction for artificial intelligence in everyday life. Boom. Done. That's my hype way, Jared. Boom.
Starting point is 00:52:51 Done. I like that. Boom. Done. Boom. Done. Right off into the sunset. Yeah.
Starting point is 00:52:56 So why do you think people are so excited about this one in particular? So I guess, you know, whisper is very much for audio. It's, it's more scoped to a smaller domain, whereas Llama is like your typical text autocomplete thing. Like it's going to do, like create your own chat GPT
Starting point is 00:53:15 is basically sort of not the pitch, but it's more akin to that. And chat GPT is so interesting and sticky for people that this is like, okay we can now we can go build our own little text ais is that what you think is why it's like like if you if you check the github stars on this thing like the chart it's pretty much straight vertical like it just goes straight up the y-axis it doesn't there is no x-axis i'm exaggerating a little bit for dramatic effect but you know what i mean. People are really, really
Starting point is 00:53:46 running this thing. Yeah, I'm also wondering, I don't have a good answer. I guess it's the chat GPT hype. Yeah. Doing inference locally, having your chat assistant on your device and stuff like this.
Starting point is 00:54:02 I don't know, I personally just try to kind of keep it real. As I told you, I was a non-believer a few months ago. Now it's hard to ignore. It seems to be working. It does work. You actually seem less excited about this than anybody else who's been posting
Starting point is 00:54:23 onto Mastodon and Twitter. I'm running it on my Pixel phone, one token per second, obviously slow. I've got it running on my MacBook. It's over here on this Raspberry Pi 4 now. People have kind of been invigorated by it, but what I'm getting from you, Georgi, is it's cool, but maybe Whisper is even cooler. cool, but it's not like, maybe
Starting point is 00:54:45 Whisper is even cooler. Yeah, I find, actually, I find Whisper much more useful. Like, it solves a very well-defined problem, and it solves it really good. Yeah. So, with the third generation, I mean, okay, it's developing quite fast. I don't have, like, I personally haven't seen anywhere, let's not go in this direction,
Starting point is 00:55:08 but yeah, I think people are just basically excited to be able to run this locally. I'm mostly doing it for fun, I would say. And did you have to agree to those strict terms to get access to the model from Facebook?
Starting point is 00:55:29 I submitted the form. Okay. Did you read the terms? Did you get the memo? Yeah, yeah. Of course I read them. Okay. That's for sure.
Starting point is 00:55:39 Why? Did you read them, Adam? I haven't read them. I'm paraphrasing from Simon Willison's article on the subject when he says you have to agree to some strict terms to access the model. So I just assumed that you were cool with the strict terms. I'm in quotes here. You can't see me on video.
Starting point is 00:55:56 The strict terms. Yeah, I'm not distributing it. Right. So I'm not distributing the word. So I think that's totally fine. Is that kind of how you agree to an end-user license agreement? You scroll to the bottom and hit the checkbox? Not you, but the royal you, like everybody.
Starting point is 00:56:13 Yeah, of course, you just hit agree. Yeah, exactly. I actually had a friend who had a great idea for that back in the day where you could provide EULA acceptance as a service, and you just go and you live somewhere where no EULAs apply or something like out there in the middle of the ocean, you know, and you then outsource the check, the checking of the checkbox, you know, people could just have you check it for them. Yeah. And so they both get the checkbox checked, but then they have plausible deniability
Starting point is 00:56:43 because they didn't actually check it. And then one person just checks it for all of us. But that person's outside of any jurisdiction. And so we win. What do you think, Adam? I love it. I'm going to subscribe to that. Please, Jared, put that link in the show notes so I can follow it and utilize that link. How cool would that be?
Starting point is 00:56:59 That would be cool. So now you have these two projects. One is kind of taking off, at least at the moment, more than the other one. Maybe it's merely on a hype wave. Maybe there's more to it than that. Obviously, there'll be more models released soon that also need to be ported over for us. Where do you go from here? Where do you take it?
Starting point is 00:57:18 Are you dedicated to doing more for Whisper? Do you think Lama is where you're going to put your time? Do you not care about any of these things? You're just having fun? Because I know this is just like fun for you, right? This is not your job. Yeah, I'm doing this basically in the free time. And I don't know, for the moment, I just plan to try to make it a bit more accessible, maybe attract some people to start contributing and help out because there are quite a lot of requests already popping up. And my personal interests are just try to do some other fun, cool demos and
Starting point is 00:57:57 tools and examples and stuff like this. I don't know. I kind of, from one point of view, I don't really want to spend super much time into these projects. I prefer to get them into hopefully into a state where other people are
Starting point is 00:58:17 helping out so I can do other stuff. So in terms of extensibility, you said by way of allowing others to come into the project, contribute code, help you move it along. I assume part of that is desires for other integrations with like popular C++ libraries or frameworks. Our good friend, ChatGPT4, as a matter of fact,
Starting point is 00:58:41 that's the model I'm using to get this request. Something like OpenCV or I believe it's called EGEN and potential other advantages for integrations. Are you thinking about stuff like that where other C++ applications or libraries can leverage this work to sort of take it to the next level or do other things with it? To give an example, OpenCV is a real-time optimized computer vision library. It offers different tools.
Starting point is 00:59:07 And Eigen, I believe, is something similar, where it's more around linear algebra, matrices, vectors, numerical solvers, and related algorithms. Have you thought about that kind of other angle, where it's not so much just you, but leveraging this in C++ land? Okay, so my point of view for these projects: I prefer things to be super minimal and without any third-party dependencies. And I just prefer it like this. I keep things simple and don't rely on other stuff. If you ask the other
Starting point is 00:59:47 way around, could other projects use GDML? That's my angle. Can they use you? I'm thinking about it and I guess GDML is kind of like, I would say it's a beginner level framework. They're more
Starting point is 01:00:02 advanced and mature frameworks for this type of processing, for sure, and probably even more efficient. I guess there is hype
Starting point is 01:00:13 around ggml because it's kind of simple and you can tweak stuff easily and these things. But if you want to make something like a quality product,
Starting point is 01:00:22 let's say, or something more production-grade, you should probably use some existing and well-established framework. But still, I think
Starting point is 01:00:35 I'm surprised. I'm super surprised by the interest in ggml. Can it become something more? I don't know. I guess we'll give it a try in some way and see if we can evolve it. I don't know; I don't have a good vision, because I'm doing it to be useful to me. The good thing is I see people are kind of understanding it already, which I kind of did not really expect, because I see stuff and there's like some weird things.
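Georgi's "minimal, no third-party dependencies" description is easier to picture with a short sketch. The snippet below is illustrative only, not something from the show: the calls follow the early-2023 ggml.h, and names or signatures may have changed since, so treat it as a rough outline of the style rather than the definitive API.

```c
// Rough sketch of a ggml-style computation (early-2023 API; details may differ today).
// Everything runs out of one pre-allocated memory pool, with no third-party dependencies.
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  // single 16 MB scratch buffer for all tensors
        .mem_buffer = NULL,              // let ggml allocate it
    };
    struct ggml_context * ctx = ggml_init(params);

    // two 4x4 float tensors, filled with constants
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    ggml_set_f32(a, 2.0f);
    ggml_set_f32(b, 3.0f);

    // define c = a * b, then build and run the compute graph
    struct ggml_tensor * c  = ggml_mul_mat(ctx, a, b);
    struct ggml_cgraph   gf = ggml_build_forward(c);
    gf.n_threads = 1;                    // field present in the early API
    ggml_graph_compute(ctx, &gf);

    // each element is 2*3 summed over the shared dimension of 4 => 24
    printf("c[0] = %.1f\n", ggml_get_f32_1d(c, 0));

    ggml_free(ctx);
    return 0;
}
```

That single-allocation, nothing-to-link-against style is roughly what "simple and easy to tweak" means here in practice.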
Starting point is 01:01:11 But maybe, who knows, with time it can become something bigger. I'll be happy to see that happening. I'm curious about your path and how maybe it could be emulated. So, you know, what if other people would love to be a European alpha male coder
Starting point is 01:01:31 like yourself, Georgi? How did you learn this stuff? I know obviously you've been doing this in your day job; C and C++ are programming languages you've been using. But can you share some of your path, either to programming in general, but specifically, like, getting into this world of being able to build these tools that work with these models? How'd you learn this stuff? Yeah, okay. So I've been basically
Starting point is 01:01:54 programming since pretty much high school and i have a lot of interest in coding i do it as a hobby in my free time you can see like my see like my GitHub is full of random geeky projects and stuff. So I basically pretty much enjoy it. My education background is physics. I studied physics in university, have a master's in medical physics. But yeah, after university I started working in the software industry and I don't know what is the path. And I feel a bit weird like already hearing to answer these type of questions, but I just enjoyed. I find it fun. Sure. And yeah, I guess that's it. I was hoping part of your path might be the potential desire
Starting point is 01:02:48 to continue to play and provide potential future ports, as Jared kind of alluded to earlier. This kind of reminds me, Jared, of, like, whenever APIs were early and thriving and you had the whole mashup phase, where you could, like, take one thing and do another. I think even Wynn, with his work early on to get into GitHub, it was work on, I believe at the time it was Octopress. No, it wasn't Octopress, it was something... Octokit. Octokit. I think it was renamed to Octokit, though.
Starting point is 01:03:16 okay it had a different name for a while there i think potentially written in ruby it was a you know essentially you know api sdk essentially you know i think of it like this like this is kind of written in Ruby was essentially API SDK essentially. I think of it like this. This is kind of like that era where you have these models coming out and you need ports and you need this and this is like a potential new fertile ground for one, not so much newcomers because you've been programming for quite a while, but new into this scene where you're providing high quality ports
Starting point is 01:03:43 that people are using that have a lot of stars on GitHub and a lot of popularity, preparation meets opportunity, obviously, and great timing. So I just think that's kind of like maybe an interesting space we're in right now with this newfound stuff happening. Yeah, I mean, I think that's totally your call, Georgie, because you're doing this because it interests you and because you get, I don't know, intellectual stimulation from it. And if it gets boring, like just porting the next big model that gets released because people expect you to or something, I can see where that would no longer be worth it for you.
Starting point is 01:04:18 Right. Do you have bigger ambitions with this? Do you have an end goal in mind or are you just kind of opportunistically following your interests and your hobby and coding up cool stuff and a couple things happen to be smash hits? Like bigger opportunities, as you can imagine, my inbox is full of people asking me to do stuff.
Starting point is 01:04:44 I wasn't really planning on doing anything. There is one idea which we'll probably get to try. We'll see. And it's in the same path as I mentioned, like trying to get people involved and contribute and try to grow this approach. And I don't know, I personally, I don't have any big expectations from this. For example, I'm not going
Starting point is 01:05:11 to promise anything. I just have a lot of ideas for random cool hacks. This is what's interesting, and I'll probably eventually implement those and share them, and I hope people like them. Yeah, one thing that Simon said, and I'm going to paraphrase one thing he said in his recent coverage of LLaMA, and he also mentions Georgi, so it's good to mention this: he says that that furious typing sound you can hear is thousands of hackers around the world starting to dig in and figure out what life is like when you can run a GPT-3-class model on your own hardware.
Starting point is 01:05:51 And I think that this conversation and what you've produced is a glimpse into, you know, that phrase that he had, that sentence or two he shared. Because I think that's kind of what happens. Like, you can now run this on your own hardware, an M1 Mac or an M2 Mac if you've got Apple Silicon, and get results pretty quickly. Better than the 20 hours you had, Jared, initially with the non-C++ version of it, which I think is pretty interesting. I just love this.
Starting point is 01:06:15 It's kind of like this new invigoration in the world where it's like, wow, I can run these high-class models on my own machine and get results and play, which I think is the most truly fun part about software development, hacking, programming, whatever you want to call it, is this ability to play to some degree with your own rules in your own time on your own machine and not have to leverage an API or buffering or anything like that whatsoever with an API where you have no rate limits.
Starting point is 01:06:44 You've just got your own thing to do, and you can play with it. You can integrate FFmpeg to do different things, to preprocess the audio to a 16-bit WAV. Maybe, before Whisper 2 comes out, if you want to do diarization while transcribing, you can do that too. You don't have to wait for the thing to happen. And obviously, if Whisper 2 supports that feature,
Starting point is 01:07:03 you can roll back your code and not use it, because it's baked into the model now. But that's the cool thing I think that's happening right now. Would you guys agree? Yeah, I guess you don't even need heavy hardware, which is expensive to acquire and run and maintain and all this stuff. So it opens up interesting opportunities, for sure. Well, even the GPU aspect: you can build your own machine, you can buy a phenomenal NVIDIA or AMD GPU, you can build your own PC up from the motherboard to the compute,
Starting point is 01:07:39 to the RAM, to the GPU. But a system on a chip is readily available to pretty much every human being, given the money in your own pocketbook to pay for it, of course. But system on a chip, this Apple Silicon, is pretty interesting, how it just bakes all that into one thing
Starting point is 01:07:58 and it's integrated. You don't have to build your own machine to get there, is the point. Yeah, Apple Silicon for me is quite exciting. I expect it to become even more approachable and, I don't know, usable or what was the word? So, yeah, I mean,
Starting point is 01:08:13 still, I think it's a bit... it's still not a great idea to run this. I mean, the efficiency is not quite there yet. But with the way things are progressing, like exponential growth of computational power and exponential shrinkage of the models,
Starting point is 01:08:37 maybe in one year, maybe one year you'll be able to do on your CPU what you're currently able to do with, I don't know, modern GPUs. I guess, I don't know. Well, Georgi, thanks so much for coming on the show, man. This has been fascinating. I love that you're just kind of the true hacker spirit
Starting point is 01:08:55 of just like coding up this stuff in your free time because it's something you love to do and your path to get here is just like, I just code on this stuff all the time because it's what I like to do. Your work is helping a lot of people. It's definitely also riding the AI hype cycle that we're currently on. So hopefully it continues to go that way. I think that we'll lose people as we go, but as things get better as well, we'll put this stuff in the hands of more and more people on their own hardware, on their own, with their own software, easily integrating. And especially for, I mean,
Starting point is 01:09:30 from us, we're not quite yet using Whisper, because we're still, you know, trying to figure out that speaker identification bit. Thank you so much for guaranteeing it in the next six months. I'm just joking. But we're excited about it. And we can see a future where this, you know, directly benefits us, which is super cool. And in the meantime, it's benefiting a bunch of people. So yeah, I just really appreciate you taking the time. I know you don't do podcasts,
Starting point is 01:09:52 so this is your first one and prying you away from your keyboard think about what you could have done with this time you could have changed the world already but instead you just decided to talk to us so we appreciate that thanks Thanks for having me. I enjoyed it. Well, something you may not know because we almost never tell you, but we have a YouTube channel. Mainly we use it for clips and features from shows like this, of course, the changelog,
Starting point is 01:10:21 JS Party, Go Time, Founders Talk, Brain Science, the entire Changelog podcast universe clips. They're all there at youtube.com slash changelog. You should subscribe. And if we're giving you that much value to go to YouTube and subscribe, you might as well check out Changelog Plus Plus. That is our membership. Yes, we love our members. We drop the ads. We bring you a little closer to the metal.
Starting point is 01:10:45 We give you bonus content. We give you discounts at the merch store. So cool. And on the note of bonus content, we have a bonus segment today for our Plus Plus subscribers. So if you're a Plus Plus member, check it out. It's right after this. If not, changelog.com slash plus plus. It's too easy.
Starting point is 01:11:10 Once again, a big thank you to our friends and our partners at Fastly and Fly. And also, our good friends over at TypeSense. Blazing fast in memory search. So cool. TypeSense.org. Check it out. And those beats. Those beats are banging. Breakmaster Cylinder brings it every single week for us, and we appreciate it.
Starting point is 01:11:27 Of course, thank you also to you, our listeners. Thank you so much for choosing to listen to this show all the way to the very end like this. We appreciate you. But that's it. This show's done. We will see you on Monday.
