The Changelog: Software Development, Open Source - Bringing Whisper and LLaMA to the masses (Interview)
Episode Date: March 22, 2023
This week we're talking with Georgi Gerganov about his work on Whisper.cpp and llama.cpp. Georgi first crossed our radar with whisper.cpp, his port of OpenAI’s Whisper model in C and C++. Whisper is... a speech recognition model enabling audio transcription and translation. Something we're paying close attention to here at Changelog, for obvious reasons. Between the invite and the show's recording, he had a new hit project on his hands: llama.cpp. This is a port of Facebook’s LLaMA model in C and C++. Whisper.cpp made a splash, but llama.cpp is growing in GitHub stars faster than Stable Diffusion did, which was a rocket ship itself.
Transcript
What's up, friends! This week on the Changelog we're talking with Georgi Gerganov about his
work on whisper.cpp and llama.cpp. You're gonna love it. He first crossed our radar with whisper.cpp,
which is his port of OpenAI's Whisper model in C and C++. Whisper is a speech
recognition model
enabling audio transcription and translation,
something we're paying very close attention to here at Changelog.
Between the invite and the show's recording,
Georgi had another hit project on his hands,
Llama.cpp.
This is a port of Facebook's Llama model
in C and C++.
Whisper.cpp made a splash, but llama.cpp is growing in GitHub stars faster than Stable Diffusion did,
which was a rocket ship in and of itself.
A massive thank you to our friends and our partners at Fastly and Fly.
Our pods are fast to download globally because Fastly, they are fast globally.
Check them out at Fastly.com.
And our friends at Fly let you put your app and your database close to your users all over the world with no ops.
Learn more at fly.io.
This episode is brought to you by our friends at Postman. Postman helps more than 25 million developers to build, test, debug, document, monitor, and publish their APIs.
And I'm here with Ken Lane, Chief Evangelist at Postman.
So Ken, I know you're aware of this, but companies are becoming more and more aware of this idea of API-first development.
But what does it mean to be API-first to you?
API-first is simply prioritizing application programming interfaces
or APIs over the applications they're used in.
APIs are everywhere.
They're behind every website we use, every mobile application,
every device we have connected to the Internet, our cars.
And what you're doing by being API first is you are prioritizing those APIs before whatever the application is that is putting them to work.
And you have a strategy for how you are delivering those APIs. So you have a consistent experience
across all of your applications. Take us beyond theory, break it down for me. What changes
for a team when they shift their strategy, when they shift their thinking to API first development?
So when your one team goes, hey, we're building a website — it's got address verification and it's part of an e-commerce solution —
That web application is using the APIs to do what it does. And then when another team comes along
and goes, hey, we're building a mobile application that's going to do a similar e-commerce experience
and may use some of the similar API patterns behind their application, they need address verification,
that's a consistent experience.
Those two teams are talking
rather than building in isolation,
doing their own thing and reinventing the wheel.
And then when a third team comes along,
needs to build a marketing landing page
that has address verification,
they don't have to do that work
because those teams have already collaborated,
coordinated, and address verification is a first class part of the experience. And it's consistent
no matter where you encounter that solution across the enterprise. And that can apply to
any experience, not just address verification. Very cool. Thank you, Ken. Okay, the next step is to go to postman.com slash changelog
pod. Again, postman.com slash changelog pod. Sign up and start using Postman for free today.
Or for our listeners already using Postman, check out the entire API platform that Postman
has to offer. Again, postman.com slash changelogpod. Well, exciting times to say the least.
Welcome to the show, Georgi.
Nice to have you here.
Thank you for the invite.
You bet.
And happy to have you on your first podcast.
So we're having a first here.
Yeah, I'm a bit excited.
We're excited too.
I wasn't sure you were going to say yes.
You're a very busy guy.
You have, well, at the time that I contacted you, you had one project that was blowing up.
Now, since then, you have a second project that is blowing up even faster, it seems.
The first one, whisper.cpp, which we took an interest in for a couple of reasons. And now, llama.cpp,
which is like brand new this week, hacked together in an evening, and is currently growing
on GitHub stars, according to the thing you posted on Twitter, at a faster rate than stable
diffusion itself. So, man, what's with all the excitement?
Yeah, that's a good question. I still don't have a good answer for it.
But yeah, I guess this is all the hype these days.
People find this field to be very interesting,
very useful.
And somehow with these projects
and with this approach that I'm having,
like coding this stuff
in C and C++,
and running it on the CPU,
it's kind of generating
additional interest in this area.
Yeah.
Yeah, so far it feels great.
I'm excited how it evolves.
Yeah, it's pretty cool.
I think that these large language models
and AI has very much been
in the hands of big tech
and funded organizations,
large corporations,
and some has been open source
and we started to see it kind of trickle down
into the hands of regular developers
and open AI, of course,
leading the way in many ways.
They have their Whisper
speech recognition model, which allows for transcription, allows for translation.
And your project, Whisper.cpp, which is a port of that in C and C++, was really kind of an
opportunity for a bunch of people to take it and get our own hands on it, you know, and run it on their own machines and say, okay, all of a sudden, because this model itself has been released,
I don't need to, like, use an API. I can run it on my MacBook. I can run it on my iPhone. I can run
it on — well, the new ones are getting run on Pixels. It's getting run on Raspberry Pis,
et cetera. And that's exciting. So I was just curious: when you started whisper.cpp, why did you decide to code that up? What was your motivation for starting that project?
Yeah, I'll be happy to tell a little bit of the story about how it came together. But as you mentioned,
yes, the big corporations, they're producing and holding most of the interesting models currently.
Right.
And being able to run them on consumer hardware is something that sounds definitely interesting.
Okay, so Whisper CPP, it was kind of a little bit of luck and good timing. Actually, most of the stuff has been this way.
And how it started.
So Whisper was released in the end of September last year.
And by that time, I was basically a non-believer, like non-AI believer.
I didn't really believe in this.
I didn't really believe much in the neural network stuff.
And, I don't know, I had a more conservative point of view.
I was usually wondering,
why are these people wasting so much effort on this stuff?
But that was a totally ignorant point of view.
I wasn't really familiar with the details and stuff like this.
But when Whisper came out, I happened to be working on a small library.
It was like a kind of a hobby project.
Basically, this is the ggml library, which is at the core of Whisper CPP.
And it's a, like a very tight project implementing some tensor algebra. I was doing this
for some machine learning
tasks like work-related stuff
also, but I
usually hack quite a lot of
projects in my free time
like side projects, like
try to find some cool and interesting
ideas and stuff like this.
Usually I do this in C++,
but I was looking to change it a little bit.
And so ggml was an attempt to write something in C,
like real men do.
For sure.
Yeah.
So I was working on this library.
I wanted to have like some basic functionality,
like make it kind of efficient,
very strict with memory management,
avoid unnecessary memory allocation,
have multi-threading support.
This is some kind of a tool that you can basically use
in other projects to solve different machine learning tasks.
I wasn't thinking about neural networks at all,
as I mentioned.
It was kind of not interesting to me at that point.
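(To illustrate the design goals just described — strict memory management, no unnecessary allocations, something you can drop into other projects — here is a rough sketch of the usual approach, not ggml's actual API: reserve one big buffer up front and hand out tensor memory from it, so the hot compute path never calls malloc.)

#include <stddef.h>
#include <stdint.h>

struct arena {
    uint8_t *base;   // one big buffer, allocated once up front
    size_t   size;   // total capacity in bytes
    size_t   used;   // bump pointer into the buffer
};

static void *arena_alloc(struct arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;              // keep 16-byte alignment for SIMD loads
    if (a->used + n > a->size) return NULL;  // refuse rather than malloc mid-compute
    void *p = a->base + a->used;
    a->used += n;
    return p;
}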
Okay, so I had some initial version of ggml,
and there was some hype about GPT by that time, I guess.
And also, I was definitely inspired by Fabrice Bellard.
He had a similar tensor library, LibNC, I think it's called.
And there was an interesting idea to try to implement a transformer model.
Like GPT-2 is such a model.
And I already had the tools, like had the necessary functionality.
So I gave it a try.
I actually found some interesting
kind of a blog post or tutorial
like GPT Illustrated or something like this,
for GPT-2.
I went through the steps.
I implemented this with ggml.
I was happy.
It was running.
We were generating some junk
like random stuff. I posted it — I think I posted on Reddit, maybe also Hacker News, I forget — but basically no interest. And I said, okay, I guess that's not very interesting to people; let's move on with other stuff.
Like the next day or the day after that, I don't know,
Whisper came out and I opened the repo,
I looked at the code
and I figured, basically, for 90% of this
I have the code already written, from GPT-2,
because like the transformer model
in Whisper it's
kind of very similar to GPT-2
I mean there are obviously
quite some differences as well but
the core stuff is
quite similar. So I figured
okay I can easily port this
it might be interesting to
have it running on a CPU. I know
that everybody's running
on GPUs,
so probably
it will not be efficient.
It will be
not very useful,
but let's give it a try.
And that's
basically
how it came.
And
yeah,
it slowly
started getting
some traction.
So Whisper
was interesting to me
immediately for a couple of reasons.
First of all, we obviously have audio that needs transcribed
and we always are trying to improve the way that we do everything.
And so automated transcriptions are very much becoming a thing.
More and more people are doing them.
So that, first of all, I was like, okay, a Whisper implementation
that's pretty straightforward to use on our own.
Obviously, you call it a hobby project.
Do not use it for your production things.
Do not trust it, but it's proven to be pretty trustworthy.
And then the second thing that was really cool about it
was just how simple it was,
insofar as the entire implementation of the model
is contained in two source files, right?
So you have it broken up into the tensor operations and the transformer inference.
One's in C, the other's in C++.
And that's just, as a person who doesn't write C and C++ and doesn't understand a lot of
this stuff, it still makes it approachable to me where it's like, okay, this isn't quite
as scary.
And for people who do know C and C++,
but maybe not know all of the modeling side
and everything else involved there, very approachable.
So A, you can run it on your own stuff, CPU-based.
B, you can actually understand what's going on here
if you give these two files a read, or at least high level.
And so I think that was two things about Whisper
that were attractive to me.
Do you think that's what got people
the most interested in it?
The other thing is it was very much
pro Apple Silicon, pro M1.
And a lot of the tooling for these things
tend to be not Mac first, I guess.
And so having one that's like,
actually, it's going to run great on a Mac
because of all the new Apple Silicon stuff,
I guess that was also somewhat attractive.
Yeah.
So yeah, what made it attractive, I guess, as you said, okay, the simplicity, I think
definitely it's like about 10,000 lines of code.
It's not that much, but overall, these neural networks at the core there, they're actually pretty simple — it's simple matrix operations,
additions, multiplications, and stuff like this.
So it's not that surprising.
Yeah, another thing that generated interest was this Python comparison.
It's a bit of a tricky topic, but yes — I was mostly focused on running this on Apple Silicon.
And, like, I don't use Python a lot — pretty much I don't use it at all.
And I don't really know the ecosystem very well.
But what I figured is, basically, if you try to run the Python code base on M1, it's not really utilizing the available resources of
these powerful machines yet, because if I understand correctly, it's kind of in the
process of being implemented to be able to run these operations on the GPU or the
neural engine or whatever. And again, maybe it's a good point to clarify here.
Maybe there's some incorrect stuff that I will say in general
about Python and transformers and stuff like this.
So don't trust me on everything
because I'm just kind of new to this.
Fair.
Okay, so you run it on M1.
The Python is not really fast,
and it was surprising when I ran it with my port.
It was quite efficient because for the very big matrix multiplications,
which are like the heavy operations
during the computation,
I was in the encoder,
like in the encoder part of the transformer,
and running those operations
with the Apple Accelerate framework,
which is like an interface
that somehow gives you extra processing power
compared to just running it on the CPU.
So it was, yeah, it was efficient
running Whisper.cpp. I think people
appreciated that.
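(For context, a minimal sketch of the kind of call being described — assuming macOS with the Accelerate framework, compiled with "clang matmul.c -framework Accelerate"; illustrative only, not whisper.cpp's actual code.)

#include <Accelerate/Accelerate.h>

// C (M x N) = A (M x K) * B (K x N), all row-major float matrices
void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A, K,
                B, N,
                0.0f, C, N);
}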
There was, I said it was
a bit tricky because there was
this thing with the
text decoding mode.
So, yeah, I'll try
not to get into super much details, but
there are like basically
two modes of decoding.
The text — like, generating the transcription — they call it the
greedy approach and beam search, and beam search is much heavier to process in terms of computational
power compared to the greedy approach. I just had the greedy approach implemented, and it was running by default.
While on the Python repo, beam search is running by default.
And I tried to clarify this in the instructions.
I don't think a lot of people really notice the difference.
Yeah.
So they're comparing a little bit apples to oranges.
Oh man, good pun.
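(A minimal sketch of the difference being described, not whisper.cpp's internals: greedy decoding just takes the single most likely token at each step, while beam search keeps several candidate transcriptions alive, which multiplies the work per step. The model_eval function below is a hypothetical stand-in for one forward pass of the decoder.)

#include <stddef.h>

// hypothetical: runs the decoder on the tokens so far and fills `logits` (n_vocab entries)
extern void model_eval(const int *tokens, size_t n_tokens, float *logits, size_t n_vocab);

size_t greedy_decode(int *tokens, size_t n_tokens, size_t max_tokens,
                     float *logits, size_t n_vocab, int eot_token) {
    while (n_tokens < max_tokens) {
        model_eval(tokens, n_tokens, logits, n_vocab);
        size_t best = 0;                       // pick the single most likely token
        for (size_t i = 1; i < n_vocab; i++) {
            if (logits[i] > logits[best]) best = i;
        }
        tokens[n_tokens++] = (int)best;
        if ((int)best == eot_token) break;     // stop at end-of-transcript
    }
    return n_tokens;
}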
I'm curious what it takes to make a port.
What exactly is a port?
Can you describe that?
So obviously Whisper was out from OpenAI —
it had been released.
What exactly is a port?
How did you sort of assemble the pieces
to create a port?
Yeah, I think port is not super correct word,
but I don't know. Usually you port. Yeah, I think port is not super correct word,
but I don't know.
Usually you port some software,
you port it from some certain architecture — like, I don't know, something running on a PC —
and then you port it there,
implement it, and it starts running on a PlayStation
or whatever.
That kind of makes more sense to call a port.
Here it's maybe just a reimplementation —
that's more correct to say —
but basically the idea is to implement
these computational steps.
And the input data, the model,
the weights that were released by OpenAI,
they're absolutely the same.
You just load it, and instead of computing all the operations in Python,
I'm computing them with C.
Right.
That's it.
Gotcha.
You probably recall when Whisper first dropped,
I did download it and run one of our episodes through it.
And this was back, remember on Get With Your Friends with Matt,
I was talking about my pip install hesitancy.
Yes.
Like some of that is with regards to Whisper,
because, like Georgi, I'm not a Python developer,
and so I'm very much coming to Whisper
as a guy who wants to use it to his advantage,
but doesn't understand any of the tooling, really.
And just kind of prodding at a black box
and following instructions.
And I got it all installed and I ran everything
and I took one of our episodes
and I just kind of did their basic command line flags
that they say to run with the medium model
or whatever it was, kind of like the happiest path.
And I ran our episode through it and it did a great job.
It transcribed our episode in something like 20 hours on my Mac.
And so remember at that time, Adam, we're talking about, well, we could like split it
up by chapter and send it off to a bunch of different machines and put them back together
again, because, we're like, 20 hours is well faster than our current human-based transcriptions.
But still, it's pretty slow.
And I did the same thing with Georgie's whisper.cpp when he dropped it in
September or October or whenever that happened to come out.
And again,
just the approachability of like,
okay,
clone the repo,
make,
run the make command.
Right.
And then run this very simple dot slash main,
whatever,
pass it your thing.
Yeah.
And the exact same episode, it was like four or five minutes versus 20 hours.
Now, I could have been doing it wrong.
I'm sure there's ways of optimizing it.
But just that difference was, okay, I installed it much faster.
I didn't have to have any of the Python stuff, which I'm scared of.
And at least in the most basic way of using it, each tool,
it was just super fast in comparison.
And so that was just exciting.
I'm like, oh, wow, this is actually approachable.
I could understand if I needed to.
And it seems like, at least on an M1 Mac,
it performs a whole lot better with pretty much the same results.
Because like Georgie said, it's the same models.
You're using their same models.
You're just not using all the tooling that they use
that they wrote around those models in order to run the inference and stuff.
You're speaking to the main directory in the examples
folder for whisper.cpp. There's a readme in there that sort of describes
how you use the main file and pass a few options, pass
a few WAV files, for example,
and out comes a transcript wherever using different flags
you can pass to the main.cpp C++ file,
essentially, to do that.
Yeah, so, yeah, regarding the repo
and how it is structured,
I kind of have experience with —
I know what people appreciate
about this type of open source project,
and it should be very simple.
Every extra step that you add
will push people away.
So I wanted to make something
where you clone the repo,
you type make, and you get going.
And yeah, that's how it currently works,
and the README has the instructions for how to use it.
I guess to preface that,
or to suffix that, the quick start guide,
or at least the quick start section of your README
says you build a main example
with make, and then you transcribe an audio file
with dot slash main,
you pass a flag of dash F,
and then wherever your WAV file is,
there you go. It's as
simple as that once you've gotten this built on your machine.
Yeah, exactly.
There are extra options.
You can fine-tune some parameters of the transcription and processing.
By the way, it's not just that. Okay, main is like the main demonstration,
with the main functionality for transcribing WAV files.
But there are also additionally a lot of examples.
That's one of the interesting things also about WhisperCPP.
I tried to provide very different ways to use this model.
And they're mostly just basic hacks and some ideas from people wanting some particular functionality, like doing voice commands — Siri, Alexa, stuff like this.
So there are a lot of examples there and people can like take a look and get ideas for projects.
This episode is brought to you by Sentry.
They just launched Session Replay.
It's a video-like reproduction of exactly what the user sees
when using your application.
And I'm here with Ryan Albrecht, Senior Software Engineer at Sentry and one of the leads behind
their emerging technologies team. So Ryan, what can you tell me about this feature?
Well, Sentry has always had a great error and issue debugging experience. We've got features
like being able to see stack traces and breadcrumbs. So you've got a lot of context about what the
issue is, but a picture is worth a thousand words and a movie is probably worth a thousand pictures. And so session replay,
it's going to give you that video-like experience where you can actually click through from your
issue and see how did the user get into the situation? What was the error? And then what
happened afterwards? That's pretty cool. Okay. So this point is kind of intended, but can you
paint a more visual picture for me? So when you open a replay
for an error, what you see on the screen is on the left side, you've got a video player with the play
pause button on the bottom. You can adjust the speed. And on the right side, you've got all your
developer tools; the console's there, the network is there. You can dig in to see what requests
were failing. What were the messages that your application was generating? And you can scrub
through the video backwards and forwards to understand what happened before
and after this issue, what was leading up to it, and what do you have to go and fix?
Very cool.
Thank you, Ryan.
So if you've been playing detective, trying to track down support tickets, read through
breadcrumbs, stack traces, and the like, trying to recreate the situation of a bug or an issue
that your application has, now you have a game-changing feature called
Session Replay.
Head to Sentry.io and log into your dashboard.
It's right there in the sidebar to set up in your front end.
And if you're not using Sentry, hey, what's going on?
Head to Sentry.io and use the code PARTYTIME.
That gets you three months for free on the team plan.
Again, Sentry.io and use the code PARTYTIME.
So going one layer deeper — maybe not even necessary for everyone else, but for you and I, Jared, maybe
this is more pertinent: limited to 16-bit WAV files. Why is the limit 16-bit? We often —
at least I — record in 32-bit. So when I'm recording this, I'm tracking this here in Audition, my WAV
files are in 32-bit, because that gives a lot more information. You can
really do a lot in post-production with effects and stuff like that, or decreasing
or increasing sibilance, and just different stuff in audio to
kind of give you more data. And I guess in this case, you're constrained by
16-bit WAV files. Why is that? Yeah, the constraint is actually
coming from the model itself.
Basically, OpenAI, when they trained it,
I think they basically use this type of data format, I guess.
So the input audio that you give to the model —
It has to be 16, wait, not 16-bit, 16 kilohertz, right?
16-bit WAV files is in your README, so I'm going based on that.
It's a problem, a mistake.
No.
Oh, okay, it's, yeah, yeah, yeah, it's 16-bit PCM.
Okay, it's just integers, not floats.
Yeah, okay, so it's 16-bit and it's also 16 kilohertz.
But yeah, technically, you can resample and convert any kind of audio,
whatever sample rate to 16.
And yeah, I mean, would you get better results if the model was able to process a higher sample rate or higher bit depth?
It's just one less step, really, because
you've got FFmpeg in here now,
so you have one more dependency, really, in the chain —
if we were leveraging, say, this
on a daily basis for production flows
to get transcripts, or
most of the way to transcripts. So that's just
one more step, really. It's not
really an issue necessarily, it's just one more thing
in the tool chain. Yeah, that's the
drawback of C and this environment.
You don't have, like in Python,
where you just pip install or whatever
and you can use third-party libraries.
Here it's more difficult
and you have to stick to the basics.
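(Sticking to the basics is workable here, though: the conversion being discussed is conceptually simple. A naive sketch — not whisper.cpp's code, which per the README leans on ffmpeg instead — of linearly resampling mono float samples down to the 16 kHz the model expects.)

#include <stdlib.h>

// returns the number of output samples written to *out (caller frees it)
size_t resample_to_16k(const float *in, size_t n_in, int rate_in, float **out) {
    const int rate_out = 16000;
    size_t n_out = (size_t)((double)n_in * rate_out / rate_in);
    *out = malloc(n_out * sizeof(float));
    if (*out == NULL) return 0;
    for (size_t i = 0; i < n_out; i++) {
        double pos  = (double)i * rate_in / rate_out;   // position in the input stream
        size_t j    = (size_t)pos;
        double frac = pos - (double)j;
        float  a    = in[j];
        float  b    = (j + 1 < n_in) ? in[j + 1] : in[j];
        (*out)[i] = (float)(a + frac * (b - a));        // linear interpolation
    }
    return n_out;
}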
So your examples have a lot of cool stuff.
Karaoke style movie generation,
which is experimental.
You can tweak the time stamping
and the output formats kind of to the hilt to get exactly what you're looking for. And then also you
have a cool real time audio input example where it's basically streaming the audio right off the
device into the thing and, you know, saying what you're saying while you're saying it, or right
after you say it. I hear the next version,
it's going to actually do it before you say it,
which will be groundbreaking.
But what are some other cool things
that people have been building?
Because I mean, the community
has really kind of bubbled around this program.
Do you have any examples of people using
whisper.cpp in the wild or experimentally that are cool?
Yes, this is definitely one of the cool parts of the project.
I really like the contributions and people using it
and giving feedback and all this stuff.
Yeah, I definitely, there are quite a few projects already running.
Like, there are people making iOS applications,
macOS applications,
they're like companies
with bigger products
integrating into them.
I'm not sure we should say names,
but it's definitely being applied
in different places.
And yeah, I guess another interesting
application is at some point
we got it even running in a web page, and one of the examples
does exactly that. Basically, with WebAssembly you can load the model in a web page, in your browser,
and basically you don't even have to install the repo or compile it. You just open a browser and
you start transcribing. You just — yeah, you still have to load the model,
which is not very small.
But it's amazing it can run even in a web page.
And I think there are a few services,
like web services,
that popped up using this idea
to offer you a free subscription,
oh, transcription.
Right.
Could you, kind of obvious,
but could you deploy that or distribute that through Docker,
a Docker container, for example?
That way you could just essentially, you know,
Docker compose up and boom,
you've got maybe a web service on your local area network
if you wanted to, just to use or play with.
Yeah, I guess so.
Yeah, I'm not familiar with the Docker environment,
but I think you should be able to do it.
I see people are already doing it for LLaMA,
and I guess there's no reason to not be able to.
I don't know the details.
Of course you can do it as a web service, but sometimes you want no dependence on anybody's cloud,
whether it's literally a virtual private server that you've spun
up as your service, or simply, hey, I want to, you know, use this locally in Docker or something
like that. And just essentially you've built the server in there. You've got whatever
flavor of Linux you want, you've got, you know, whisper.cpp already in there, and you've got a
browser or a web server running it, just to ping from a local network.
You can name the service, whisper.lan, for example.
Yeah, you could totally get that done, I think.
So you brought up the fact that people are running this
in the browser, in WebAssembly.
Opportunistically, I'd like to get on the air
my corollary to Atwood's Law
that I posted last week on the socials. You guys know Atwood's
law, any application that can be written in JavaScript eventually will be written in JavaScript.
Well, my corollary, which I'm not going to call Santo's corollary, cause that would be presumptuous.
I'm not going to call it that, but I don't have a name for it yet, but it is any application that
can be compiled in WebAssembly and run on the browser eventually will be compiled to WebAssembly
and run on the browser,
because it's just too much fun, right? The most recent example would be this one.
But prior to that, you know, they're running WordPress in the browser now. Not like, you know,
the rendered HTML of a WordPress site in your browser, like the backend in your front end,
in your browser, because WebAssembly, we just love it.
We're going to run everything in it.
Why would you do that?
To show everybody that you can do it.
Okay.
I'm sure there's other reasons, but that was
pretty much what their blog post was.
The folks who did it, I think it's the
wasmlabs.dev folks
put WordPress into the browser with WebAssembly
because we can do it now.
And so we're going to.
So that was just me being opportunistic.
Back to you, Georgi.
If we talk about Whisper and the roadmap.
So it's 1.2.
It's been out there for a while.
My guess is it's probably becoming less important to you
now that llama.cpp is out.
But we'll get to llama in a moment.
You have a roadmap.
On your roadmap is a feature that you know I'm interested in
because I told you this when I contacted you.
And this goes back to the meme that we created years ago, Adam.
Remember how we said that the changelog is basically a Trojan horse
where we invite people on our show and then we lob our feature requests at them
when they least expect it.
You know, before, as I was preparing for this conversation,
I was thinking, Jared's going to say this
on this show for sure.
I invited you here to give you my feature request.
Right.
And to make it a more pressure-filled feature request.
But I'm just mostly joking because I realize
this seems like it's super hard
and you can talk to that. But diarization, I don't know if that's how you say it,
speaker identification is the way that I think about it — is not a thing in Whisper, it doesn't
seem. It's certainly not a thing in whisper.cpp. I've heard that Whisper models aren't even
necessarily going to be good at that. There's some people who are hacking it together with some other tools where they're like,
they use whisper, then they use this other Python tool.
Then they use whisper again in a pipeline to get it done.
This is something that we very much desire for our transcripts because we have it already
with our human transcribed transcripts.
It's nice to know that I was the one talking and then Georgie answered and then Adam talked.
And we have those now, but we wouldn't have them using whisper and it's on your roadmap.
So I know it's down there. There's other things that seem more important like GPU and stuff, but
can you speak to maybe the difficulties of that, how you'd go about it and when we can,
when we can have it? This feature is super interesting from what I get from the responses.
Basically being able to separate the speakers.
You're right.
So basically it's not out of the box supported by the model and there are third party tools
and they're like themselves, like those tools are other networks, like doing some additional
processing. And again, I basically have almost absolutely no idea
or expertise with this kind of stuff
and what works and what doesn't work and basically zero.
And there were like a few ideas popping around
using Whisper like in not traditional way to achieve
some sort of diarization
and like
it boils down to
trying to extract some of the
internal
results of the computation
and try to
classify based on
some, let's say, features
or I don't know, I'm not sure really how to properly explain it.
So I tried a few things.
I know people are also trying to do this.
I think it's, I guess it's not working out.
So, I don't know, it seems unlikely, at least from my point of view —
maybe if someone figures it out and it really works,
we could probably have it someday.
But for now, it seems
unlikely. It's a pipe dream.
I don't understand why it's a pipe dream. It seems
because there's other transcripts and services out there
that have it, that are
not LLM-based or
AI-based. They're just
I don't know how they work, honestly.
But, for example, I had Connor Sears from Rewatch
on Founders Talk a while back.
And one of the killer features, I thought, for Rewatch,
so just a quick summary,
Rewatch is a place where teams can essentially
upload their videos to Rewatch later.
So you might do an all-hands demo,
you might do a Friday demo of your sprint or whatever, and new hires can come on
and re-watch those things or things around the team and
whatnot to sort of catch up. It's a way that teams are using
these videos and also the searchable transcripts
to provide an on-ramp for new hires and
or training or just whomever, whatever.
That's how they're using them.
He actually came from GitHub, and they had this thing called GitHub TV when he worked there.
And Connor's a designer, and long story short, they've had this thing.
And so he really wanted the transcription feature, and they have transcripts that are pretty amazing,
and they have this diarization, I don't know if that's what they call it,
but they have Jared, Adam, whomever else labeled.
Why is it possible there, and why is it such a hard thing here?
Yeah, I think the explanation is basically Whisper wasn't designed for this task,
and I guess most likely they're using something
that was designed for this task.
Some other model — it was trained to do the diarization.
And yeah, you can always plug in some third-party project
or network to do this extra step.
It would be cool to be able to do it
with a single model.
Right.
For now, it's not possible.
Is it kind of like converting your WAV files to 16-bit
first before using the model?
It's like that's one more step in the mix, basically.
Yeah, but it's even worse than that
because it's a much harder step.
It's basically like running it through
Whisper and then running
it through a separate thing whose entire purpose is segmentation or diarization, and then it's
like two passes. Whereas what we're talking about, it's like, well, ffmpeg dash-whatever — it's like,
this is just the tooling around that. See, for me, there are solutions that seem
like they're kind of hacky and people are getting them to work, but it's like back in the Python world again.
And it's very slow because of that, from what I can tell.
So against Python, Jared.
I don't hate Python.
This pip install has got you really upset.
We've got to solve this.
No, it's just I like the simplicity
and the straightforward stuff that Georgie does.
I just want it in whisper.cpp.
I know.
I think maybe Whisper 2 will just support this feature and then we'll all be happy.
Right?
Like you'll just upgrade your models and everything.
You'll,
you'll just check it off your roadmap.
But if not for something like that,
I think it is probably a difficult thing to accomplish just because the
models aren't set up to do that particular task.
Like they're just set up for speech recognition,
not for,
I don't know, speaker
classification or whatever you call it.
Yeah, with the way things are going lately, I
suppose by the end of
the month, OpenAI
will probably release a new model
which supports everything.
The day we ship this episode, it'll support that.
This stuff's moving at the speed of light right now.
So it probably will be.
By the time this ships, it'll probably be a feature of Whisper 2.
Yeah, I think so.
Hopefully.
So the project, I should give it a shout out.
I do not dislike Python.
PyAnnote, P-Y-A-N-N-O-T-E,
is what people are combining with Whisper
in order to get both features through a pipeline.
So if you're interested in that, people are doing it.
It seems a little bit buggy.
They aren't quite happy with the results, but they have some results.
You got to be careful because Brett can't listen to the show, Jared.
And sometimes he even reads transcripts.
He might just scan for his name or Python, essentially.
He's got two searches on our transcripts.
Well, now he just brought his name into it.
He wouldn't have been able to find it until just now.
I've been thinking too behind the scenes
that the fact that this runs on Apple Silicon,
when you got the ARM thing that's kind of baked in there,
I believe it is called Neon,
which I think is pretty interesting,
this Neon technology kind of,
in that separate sort of super processor
or additional processor speed,
like what did you learn or have to learn
about Apple's Silicon to make this project work?
What did you even,
not so much learn to make it work well,
but what did you learn about the processor
that was like, wow, they're doing that
in this consumer grade,
pretty much ubiquitous,
or available to mostly anybody
who can afford it, obviously.
What did you learn about their processor?
Yeah, so ARM Neon,
this is the name of the instruction
set that runs on
the Apple Silicon CPUs.
When I started
ggml, I had recently gotten
my shiny new
M1. I have
been using it as my
workstation, transitioning from Linux.
Oh wow, you're a Linux convert, okay.
Yeah, but yeah, this
machine is so good, I decided
to switch and... You won't go back?
I think I'm not going back at any time soon.
Elaborate. I'm
listening. Go ahead.
So, yeah, I was interested
in understanding how
to utilize and
basically — so this is called
single instruction, multiple data (SIMD)
programming, where you utilize this instruction set to process things faster — and I wanted to
get some experience with that. So I had this implemented in ggml, support for the heavy operations to use Arm Neon. And what it requires
to be able to use it: just read the documentation and figure out how to properly load and
store the data in an effective manner. It's tricky stuff in general. I'm no expert by far.
So I'll probably mention this at some point,
but people are looking at the code lately
and they're helping me optimize these parts.
They're kind of difficult to code in general.
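(A small sketch of the kind of NEON code being talked about — assuming an ARMv8 / Apple Silicon target; illustrative only, not ggml's actual kernels: a dot product that processes four floats per instruction instead of one.)

#include <arm_neon.h>

float dot_neon(const float *x, const float *y, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        // load 4 floats from each input and fused-multiply-add into the accumulator
        acc = vfmaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i));
    }
    float sum = vaddvq_f32(acc);                // horizontal add of the 4 lanes
    for (; i < n; i++) sum += x[i] * y[i];      // scalar tail for leftover elements
    return sum;
}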
So yeah, Arm Neon is helping for the CPU processing, and then there is this extra
Apple framework which I'm not really sure which part of the hardware it utilizes. Basically this
is the Apple Accelerate framework. It has a linear algebra API, so you can say, okay, multiply these matrices
and so it's really
fast. And I
think it's actually something that is called
the AMX coprocessor,
but it's not super
clear to me. I don't really care.
It's just fast.
So
why not use it?
This is one of the optimizations, yeah.
What I found interesting when I was researching a little further
to prepare for this call was that this is a
quote-unquote secret coprocessor.
It's called the Apple Matrix Coprocessor, AMX is what they call it.
And it's not very well known,
and so as this Apple Silicon is only a couple years old,
it's not that old, so even examining or, even examining or building new software technology on top of it.
But like this is, I think we have to look at, you know, one of the many reasons that Apple chose to abandon Intel and go their own route.
And obviously a lot of the work they did in their mobile devices from an iPhone to an iPad and all the things happening in their processors led them to this direction.
But even this, the accelerated coprocessor that is there secretively,
essentially just waiting to be tapped into, is kind of interesting
just because it does what it does.
Yeah, I guess when you make your own hardware and software,
you definitely get some advantages compared to not doing it.
So I think it's a good approach.
I like this way.
We're even speculating too on like Apple and artificial intelligence.
And maybe this is the glimpse into their genius that is not yet revealed.
Because if you can do what you've done with this coprocessor and this neon arm technology and this AMX,
this Apple matrix coprocessor,
we have to wonder what are the reasons why they went this route?
One,
it couldn't be just simply to put it into our hands,
but to put into our hands for them to use in some way,
shape or form.
So it's got to make you wonder what the future for them might be in AI
because they are really black box and secretive in terms of new features
and new products and things like that.
But this might give us a glimpse into that future.
Yeah, true.
I don't know.
I'm not really competent.
As far as I know, the optimal way on Apple is to use Core ML,
like some other framework which utilizes like everything,
like neural engine, GPU, CPU, whatever.
Yeah, and I think they, for example, recently demonstrated how to run stable diffusion with
Core ML, quite efficiently.
So yeah, I guess like using Accelerate, it's not really something new.
It's probably not even the right way to go in the long run.
But for now, it works.
It's okay.
It works for now.
It works for, it's good enough for us regular people.
So on the Whisper front, I know we should get to Lama here soon
because it's the most exciting new thing.
And here we are burying the lead deep into the show like fools.
But Whisper is interesting to me.
The GPU support, so one of the things about it
is it's simple, it's
great hardware support,
very generic, runs on the CPU.
You do have GPU support also
on the roadmap. Is that something that
you put it on there
because people ask for it or are you actually interested in this?
Because it seems like it could definitely complicate things.
Yeah, GPU support I avoid
because you usually have to learn some framework
like CUDA, OpenCL, stuff like this.
It's complicated.
It takes time to understand everything.
There are some workarounds
like using what was it called?
NVBLAS, where it kind of automatically does it for you.
But I don't know.
There will probably be some basic support in the future.
I think more interesting,
for Apple hardware,
is the transition of the
encoder part — like one
of the heavy parts — to the Apple
Neural Engine, which we already have a prototype for,
and this will kind of
speed up the processing
even further.
So that's —
Have you been able to run any benchmarks against your prototype
or have you gotten to that phase where you're actually
seeing how much gains you're getting?
Yeah, actually this was a super
cool contribution, basically I
read about Core ML, I decided
I'm probably not going to invest time
learning all these complex
tools, but then one day
a contributor
posted — you can see the linked issues in the repo — how to do it, which was super great.
And he demonstrated it's possible. We initially observed like three times speed up, I think.
Nice.
But then other people joined, they showed us even how to make it even better.
And I like this
because people are contributing,
sharing ideas,
making it faster.
So I guess at
least three times, but this is just
the encoder — the decoder remains
not optimal,
so it's not super great
overall.
You gotta love that moment with an open source project
where you start to get significant
contributions, right?
Not drive-by README fixes or
docs, which are helpful, but not like
this is a significant
contribution of a new way of doing something
or proof of concept.
That's pretty exciting. It seems like
your two projects now,
especially Whisper,
because it's been around a lot longer,
has had a lot of very smart coder types
not afraid of hopping in
and really helping out.
Did you do anything to cultivate that,
or is it just the nature of the project
that it brings a certain kind of contributor?
Yeah, I'm also wondering about this and really enjoy it.
So my previous projects, they didn't have a lot of contributions involved.
And now with Whisper and LLaMA, I really like that it's getting attention.
Did I do anything specific?
Not really.
I guess just people find it.
Maybe first of all, they find it useful and they start suggesting ideas for making it even more useful.
And then people eventually start joining in to make code improvements and stuff like this.
And there is, I think, I don't know, from my perspective, it's a relatively big momentum currently.
People are very interested in supporting this. Yeah, I try to, like, make it so
they're kind of able to get into it — like, create some entry-level tasks and things that people can
get involved in — because eventually, like, currently there are so many requests and issues and all
this stuff that it's kind of very difficult to handle on my own.
So it would be nice to have
more people involved.
Switching gears now, I think we put the
cart before the llama, Adam.
I don't know if that rings true
to you. I was actually wondering if we should have
our good friend Luda
bring us in. Llama, llama.
Red pajama. You know what I'm saying?
Ludacris. I've been dying to do red pajama llama, llama llama drama, just all these rhymes. And I haven't been able to work those in quite yet.
I knew you were.
Getting to it now, the most exciting thing on the interwebs until I guess GPT-4 stole some steam yesterday.
But February 24th, Facebook releases — Facebook Research or Meta AI, who knows what they call themselves these days — released LLaMA,
a foundational 65-billion-parameter large language model.
And then according to some commentary,
a European alpha coder went on a bender one night and ported it to C++
so we can all run it on our Pixel phones.
So that's the story, Yorgi.
How do you feel about being called a European alpha bender?
European alpha coder.
I thought that was a funny way of casting it
by somebody on Twitter.
Yeah, really.
I really like this meme.
It originated on Twitter,
like someone calling me an alpha male European or something.
I don't know.
It's kind of funny.
Well, so you did hack this together in an evening.
Is that,
is that lore
or is that true?
Yeah, it's,
it's basically
kind of true.
But again,
it's a combination
of like
factors
and good timing
and some luck.
Basically,
we had the
four-bit quantization
stuff for the
Whisper,
just an idea working,
like where you,
basically,
take the model, you compress it down to four bits, you lose some accuracy, but it's
smaller and it processes faster.
So we had that in ggml and it was available.
So a few days later comes out LLaMA.
I do some calculations, I figure out like, okay, 65 billion parameters, you
probably need about 40 gigs of RAM with 4-bit quantization.
So this can run on a MacBook.
Why not do it?
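(To make the back-of-the-envelope math concrete: 65 billion parameters at 4 bits is about half a byte per weight, so roughly 32.5 GB of weights, which lands around the ~40 GB figure once you add scaling factors, activations and the context cache. Below is a rough sketch of block-wise 4-bit quantization — illustrative only, not ggml's exact Q4 format.)

#include <math.h>
#include <stdint.h>

#define QBLOCK 32                    // weights per block sharing one scale

struct q4_block {
    float   scale;                   // per-block scaling factor
    uint8_t q[QBLOCK / 2];           // two 4-bit values packed per byte
};

void quantize_block(const float *w, struct q4_block *out) {
    float amax = 0.0f;
    for (int i = 0; i < QBLOCK; i++) {
        if (fabsf(w[i]) > amax) amax = fabsf(w[i]);
    }
    out->scale = (amax > 0.0f) ? amax / 7.0f : 1.0f;    // map [-amax, amax] onto [-7, 7]
    for (int i = 0; i < QBLOCK; i += 2) {
        int a = (int)roundf(w[i]     / out->scale) + 8; // shift into the 0..15 range
        int b = (int)roundf(w[i + 1] / out->scale) + 8;
        out->q[i / 2] = (uint8_t)(a | (b << 4));
    }
}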
And yeah, it just was a matter of time to find some free time to try it.
And yeah, last Friday, came home after work, had the weights downloaded.
But yeah, why I was able to do it so quickly.
Basically, from what I saw, it's pretty much GPT-J architecture with some
modification, like some extra normalization layers. And some minor changes.
Basically, again, the existing code for GPT-J,
I just simply modified it there.
It happened pretty quickly.
You had a leg up.
Prior art helped you that you created.
Yes, yeah.
So that quote, success is when, what is it, Adam?
Preparation meets opportunity.
Right?
So Georgi was perfectly prepared, between this ggml library that he'd previously developed
and this knowledge he has.
He was like primed for this position.
For sure.
Yeah.
Which is great.
I love that when that happens in my life.
And so I applaud that moment for you because I mean, when you're in the trenches and you
feel like you're like in the wilderness and you put some code out there, and in the case of Whisper.cpp, you get a glimpse of your hacker direction, your hacker sense.
You feel like you want to use a spidey sense kind of play on words.
And you've done it again.
Why not port another popular direction for artificial intelligence in everyday life.
Boom.
Done.
That's my hype way,
Jared.
Boom.
Done.
I like that.
Boom.
Done.
Boom.
Done.
Right off into the sunset.
Yeah.
So why do you think people are so excited about this one in particular?
So I guess,
you know,
whisper is very much for audio.
It's,
it's more scoped to a smaller domain,
whereas Llama is like your typical text autocomplete thing.
Like it's going to do — like, create your own ChatGPT
is basically sort of not the pitch,
but it's more akin to that.
And ChatGPT is so interesting and sticky for people
that this is like, okay, now we can go
build our own little text AIs. Is that what you think is why? Like, if you check
the GitHub stars on this thing, like, the chart, it's pretty much straight vertical. Like, it just goes
straight up the y-axis. There is no x-axis. I'm exaggerating a little bit for dramatic
effect, but you know what I mean. People are really, really
running this thing.
Yeah, I'm also wondering,
I don't have a good answer.
I guess it's the ChatGPT hype. Yeah.
Doing inference locally,
having your chat assistant
on your device and stuff like this.
I don't know,
I personally just try to kind of keep it real.
As I told you, I was a non-believer a few months ago.
Now it's hard to ignore.
It seems to be working.
It does work.
You actually seem less excited about this
than anybody else who's been posting
onto Mastodon and Twitter.
I'm running it on my Pixel phone, one token per second,
obviously slow.
I've got it running on my MacBook.
It's over here on this Raspberry Pi 4 now.
People have kind of been invigorated by it,
but what I'm getting from you, Georgi, is it's cool,
but it's not like — maybe
Whisper is even cooler.
Yeah, I find, actually, I find Whisper much more useful.
Like, it solves a very well-defined problem, and it solves it really good.
Yeah.
So, with the text generation — I mean, okay, it's developing quite fast.
I don't have, like, I personally haven't seen
anywhere,
let's not go in this direction,
but yeah,
I think people are
just basically excited
to be able to run this locally.
I'm mostly doing it for fun,
I would say.
And did you have to agree
to those strict terms to get access to the model from Facebook?
I submitted the form.
Okay.
Did you read the terms?
Did you get the memo?
Yeah, yeah.
Of course I read them.
Okay.
That's for sure.
Why?
Did you read them, Adam?
I haven't read them.
I'm paraphrasing from Simon Willison's article on the subject
when he says you have to agree to some strict terms to access the model.
So I just assumed that you were cool with the strict terms.
I'm in quotes here.
You can't see me on video.
The strict terms.
Yeah, I'm not distributing it.
Right.
So I'm not distributing the weights.
So I think that's totally fine.
Is that kind of how you agree to an end-user license agreement?
You scroll to the bottom and hit the checkbox?
Not you, but the royal you, like everybody.
Yeah, of course, you just hit agree.
Yeah, exactly.
I actually had a friend who had a great idea for that back in the day
where you could provide EULA acceptance as a service,
and you just go and you live somewhere where no EULAs
apply or something like out there in the middle of the ocean, you know, and you then outsource
the check, the checking of the checkbox, you know, people could just have you check it for them.
Yeah. And so they both get the checkbox checked, but then they have plausible deniability
because they didn't actually check it.
And then one person just checks it for all of us.
But that person's outside of any jurisdiction.
And so we win.
What do you think, Adam?
I love it.
I'm going to subscribe to that.
Please, Jared, put that link in the show notes so I can follow it and utilize that link. How cool would that be?
That would be cool.
So now you have these two projects.
One is kind of taking off, at least at the moment, more than the other one.
Maybe it's merely on a hype wave.
Maybe there's more to it than that.
Obviously, there'll be more models released soon that also need to be ported over for us.
Where do you go from here?
Where do you take it?
Are you dedicated to doing more for Whisper?
Do you think Lama is where you're going to put your time?
Do you not care about any of these things? You're just having fun? Because I know this is just like fun for you,
right? This is not your job. Yeah, I'm doing this basically in the free time. And I don't know,
for the moment, I just plan to try to make it a bit more accessible, maybe attract some people
to start contributing and help out because there are quite a lot of
requests already popping up.
And my personal interests are just try to do some other fun, cool demos and
tools and examples and stuff like this.
I don't know. I kind of, from one point of view, I don't really want
to spend
super much time
into these projects.
I prefer to get them into
hopefully into a state
where other people are
helping out so I can do
other stuff.
So in terms of extensibility,
you said by way of allowing others to come into the project,
contribute code, help you move it along.
I assume part of that is desires for other integrations
with like popular C++ libraries or frameworks.
Our good friend, ChatGPT4, as a matter of fact,
that's the model I'm using to get this request.
Something like OpenCV, or I believe it's called Eigen, and potential
other advantages for integrations. Are you thinking about stuff like that where other C++
applications or libraries can leverage this work
to sort of take it to the next level or do other things with it?
To give an example, OpenCV is a real-time
optimized computer vision library.
It offers different tools.
And Eigen, I believe, is something similar, where it's more around linear algebra — matrices, vectors, numerical solvers, etc. — related to algorithms.
Have you thought about that kind of other angle where it's not so much just you, but leverage of this in C++ land.
Okay, so my point of view for these projects,
I prefer things to be super minimal
and without any third-party dependencies.
And I just prefer it like this.
I keep things simple and don't rely on
other stuff. If you ask the other
way around, could
other projects use ggml?
That's my angle. Can they use you?
I'm thinking about it and I guess
ggml is kind of
like, I would say it's
a beginner level framework.
There are more
advanced and mature frameworks
for this type of processing
for sure.
Like,
and even probably
more efficient.
I guess
there is hype
around ggml
because it's kind of simple
and you can tweak stuff
easily
and these things.
But if you want to make
something like
a quality product,
let's say,
or something
like more production, you
probably should use
some existing and
well-established
framework, let's say. But
still, I think
I'm surprised. I'm super surprised
by the interest in ggml. Can it
become something more? I don't know.
Maybe we'll, I guess we'll
give it a try in some way and
see if we can evolve it. It will be — I don't know — I don't have a good vision, because
I'm doing it to be useful to me. The good thing is I see people are kind of understanding it already, which I kind of did not really expect,
because I see stuff and there are like some weird things.
But maybe, who knows,
with time it can become something bigger.
I'll be happy to see that happening.
I'm curious about your path
and how maybe it could be emulated.
So, you know, what if other people
would love to be a European
alpha male coder
like yourself, Georgi? How did you learn
this stuff? I know obviously you've been
doing this in your day job,
C and C++ are
programming languages you've been using, but
can you share some of your path either
to maybe programming in general, but specifically like getting into this world of being able to build
these tools that work with these models? How'd you learn this stuff? Yeah, okay, so I've been basically
programming since pretty much high school, and I have a lot of interest in coding. I do it as a
hobby in my free time. You can see, like, my GitHub is full of random geeky projects
and stuff. So I basically pretty much enjoy it. My education background is physics. I
studied physics in university, have a master's in medical physics. But yeah, after university I started working in the software industry, and I don't know what the path is.
And I feel a bit weird already having to answer these types of questions, but I just enjoy it.
I find it fun.
Sure.
And yeah, I guess that's it. I was hoping part of your path might be the potential desire
to continue to play and provide potential future ports
as Jared kind of alluded to earlier,
which was, this kind of reminds me, Jared,
of like whenever APIs were early and thriving
and you had the whole mashup phase
where you can like take one thing and do another
or you can — I think even Wynn,
with his work early on to get into GitHub, it was work on, I believe at the time it was Octopress... no, it wasn't Octopress, it was something... Octokit. I think it was renamed to Octokit, though.
Okay, it had a different name for a while there, I think. Potentially written in Ruby, it was, you
know, essentially an API SDK, essentially.
I think of it like this. This is kind of like that era, where you have these models coming out and you need ports, and this is a potential new fertile ground, not so much for newcomers, because you've been programming for quite a while, but for people new to this scene, where you're providing high-quality ports that people are using, that have a lot of stars on GitHub and a lot of popularity. Preparation meets opportunity, obviously, and great timing. So I just think that's maybe an interesting space we're in right now with this newfound stuff happening.
Yeah, I mean, I think that's totally your call, Georgi, because you're doing this because
it interests you and because you get, I don't know, intellectual stimulation from it.
And if it gets boring, like just porting the next big model that gets released because people expect you to or something, I can see where that would no longer be worth it for you.
Right.
Do you have bigger ambitions with this? Do you have an end goal in mind or are you just kind of opportunistically
following your interests and your hobby
and coding up cool stuff
and a couple things happen to be smash hits?
Like bigger opportunities,
as you can imagine,
my inbox is full of people asking me to do stuff.
I wasn't really planning on doing anything.
There is one idea which we'll probably get to try.
We'll see.
And it's in the same path as I mentioned,
like trying to get people involved
and contribute and try to grow this approach.
And I don't know, I personally, I don't have
any big expectations from this. For example, I'm not going
to promise anything. I just have
a lot of ideas for random cool hacks.
This is what's interesting and I'll probably eventually
implement those and share them and I hope people
like them.
Yeah. One thing that Simon said... I'm going to paraphrase one thing he said in his recent coverage of LLaMA, and he also mentions Georgi, so it's good to mention this. He says that furious sound, that furious typing sound you can hear, is thousands of hackers around the world starting to dig in and figure out what life is like when you can run a GPT-3-class model on your own hardware. And I think that this conversation and what you've produced is a glimpse into, you know, that phrase that he had, that sentence or two he shared. 'Cause I think that's kind of what happens. Like, you can now run this on your own hardware, an M1 Mac or an M2 Mac if you've got Apple Silicon, and get results pretty quickly. Better than the 20 hours you had, Jared, initially with the non-C++ version of it, which I think is pretty interesting.
I just love this.
It's kind of like this new invigoration in the world where it's like, wow, I can run these high-class models on my own machine and get results and play,
which I think is the most truly fun part about software development,
hacking, programming, whatever you want to call it,
is this ability to play to some degree with your own rules
in your own time on your own machine
and not have to leverage an API, or deal with buffering, or anything like that whatsoever; you have no rate limits. You've just got your own thing to do, and you can play with it. You can integrate FFmpeg to do different things, like preprocessing your audio to a 16-bit WAV (there's a rough sketch of that after this exchange). Maybe before Whisper 2 comes out, if you want to do diarization or transcription, you can do that too. You don't have to wait for the thing to happen. And obviously, if Whisper 2 does support that feature, you can roll back your code and not use it, because it's baked into the model then.
But that's the cool thing I think that's happening right now.
Would you guys agree?
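For reference, here is a minimal sketch of the kind of FFmpeg preprocessing mentioned above: converting an audio file to the 16 kHz, mono, 16-bit PCM WAV input that whisper.cpp's example program expects. The file names are placeholders, and this simply shells out to ffmpeg rather than linking against any library:

```cpp
// Minimal sketch: shell out to ffmpeg to convert an arbitrary audio file into
// 16 kHz, mono, signed 16-bit PCM WAV for whisper.cpp. File names are hypothetical.
#include <cstdlib>
#include <string>

int main() {
    const std::string cmd =
        "ffmpeg -i episode.mp3"  // hypothetical input file
        " -ar 16000"             // resample to 16 kHz
        " -ac 1"                 // downmix to mono
        " -c:a pcm_s16le"        // signed 16-bit little-endian PCM
        " episode.wav";          // output WAV to feed to whisper.cpp
    return std::system(cmd.c_str());
}
```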
Yeah, I guess you don't even need heavy hardware, which is expensive, of course, to run and maintain and all this stuff.
So it opens up interesting opportunities for sure.
Well, even the GPU aspect... like, you can build your own machine, you can buy a phenomenal NVIDIA or AMD GPU, you can build your own PC up from the motherboard to the compute, to the RAM, to the GPU. But a system on a chip is readily available to pretty much every human being, given the money in your own pocketbook to pay for it, of course. But system on a chip, this Apple Silicon, is pretty interesting, how it just bakes all that into one thing and it's integrated. You don't have to build your own machine to get there, is the point.
Yeah, Apple Silicon for me is quite exciting. I expect it to become even more approachable and, I don't know, usable, or whatever the word is. So, yeah, I mean, I still think it's a bit... it's still not a great way to run this. I mean, the efficiency is not quite there yet.
But with the way things are progressing,
like exponential growth of computational power
and exponential shrinkage of the models,
maybe in one year you'll be able to do on your CPU
what you're currently able to do with,
I don't know, modern GPUs.
I guess, I don't know.
Well, Georgi, thanks so much for coming on the show, man.
This has been fascinating.
I love that you're just kind of the true hacker spirit
of just like coding up this stuff in your free time
because it's something you love to do
and your path to get here is just like,
I just code on this stuff all the time
because it's what I like to do. Your work is helping a lot of people. It's definitely also riding the AI hype
cycle that we're currently on. So hopefully it continues to go that way. I think that we'll
lose people as we go, but as things get better as well, we'll put this stuff in the hands of
more and more people on their own hardware, on their own, with their own software, easily integrating. And especially for, I mean,
from us, we're not quite yet using Whisper because we're still, you know, trying to figure out that
speaker identification bit. Thank you so much for guaranteeing it in the next six months. I'm just,
I'm just joking. But we're excited about it. And we can see a future where this, you know,
directly benefits us, which is super cool
and in the meantime it's benefiting
a bunch of people
so yeah I just really appreciate you taking the time
I know you don't do podcasts
so this is your first one
and prying you away from your keyboard
think about what you could have done with this time
you could have changed the world already
but instead you just decided to talk to us
so we appreciate that. Thanks.
Thanks for having me. I enjoyed it.
Well, something you may not know, because we almost never tell you, but we have a YouTube channel. Mainly we use it for clips and features from shows like this, of course: the Changelog, JS Party, Go Time, Founders Talk, Brain Science, the entire Changelog podcast universe. Clips, they're all there at youtube.com slash changelog.
You should subscribe.
And if we're giving you that much value to go to YouTube and subscribe, you might as well check out changelog plus plus.
That is our membership.
Yes, we love our members.
We drop the ads.
We bring you a little closer to the metal.
We give you bonus content.
We give you discounts at the merch store.
So cool.
And on the note of bonus content, we have a bonus segment today for our Plus Plus subscribers.
So if you're a Plus Plus member, check it out.
It's right after this.
If not, changelog.com slash plus plus.
It's too easy.
Once again, a big thank you to our friends and our partners at Fastly and Fly. And also, our good friends over at TypeSense.
Blazing-fast in-memory search.
So cool.
TypeSense.org.
Check it out.
And those beats.
Those beats are banging.
Breakmaster Cylinder brings it every single week for us, and we appreciate it.
Of course, thank you also to you, our listeners.
Thank you so much for choosing to listen to this show all the way to the very end like this.
We appreciate you.
But that's it.
This show's done.
We will see you on Monday.