Software Huddle - Lessons from Transcribing and Indexing 3.5 Million Podcasts with Arvid Kahl

Episode Date: July 8, 2025

Big time guest today as Arvid Kahl joins us. Arvid is my favorite type of guest -- a deeply technical founder that can talk about both the technical and business challenges of a startup. Lots to enjoy... from this episode. Arvid is known as the Bootstrapped Founder and has documented his path to selling Feedback Panda back in 2019. He's now building Podscan and sharing his journey as he goes. Podscan is a fascinating project. It's making the content of *every* podcast episode around the world fully searchable. He currently has 3.5 million episodes transcribed and adds another 30,000 - 50,000 episodes every day. This involves a ton of technical challenges, including how to get the best transcription results from the latest LLMs, whether you should use APIs from public providers or run your own LLMs, and how to efficiently provide full-text search across terabytes of transcription data. Arvid shares the lessons he's learned and the various strategies he's tried over the years. But there are also unique business challenges. For most technical businesses, your infrastructure costs grow in line with your customers. More customers == more data == more servers. With Podscan, Arvid has to index the entire podcast ecosystem regardless of his customers. This means a lot of upfront investment as he looks to grow his customer base. Arvid tells us how he's optimized his infrastructure to account for this unique challenge.

Transcript
Starting point is 00:00:00 and now we are just scanning the whole podcasting ecosystem every single day, 50,000 shows, transcribing all of them, analyzing them. It's pretty wild. The two areas I want to dive in a lot are the transcription work. So once you download that, like the actual parse out the transcription and summarize and all that sort of stuff, and then the search infrastructure. For the longest time, all my transcription actually ran locally on this very machine
Starting point is 00:00:25 that I'm talking to you about. That's cool. For the business. I had this thing, and when there was a power outage, I had a problem. Where are you at right now with search? Are you using Elasticsearch now? Kind of, I'm using OpenSearch now.
Starting point is 00:00:37 Okay, that counts, yeah. Right, it's kind of that, yeah. Yep, that counts. Well, because I thought, were you Meilisearch at one point? Yes. Why did you choose PHP Laravel in this one? Or did someone else choose Elixir for you
Starting point is 00:00:50 with Feedback Panda? I guess like why the switch there after having a successful exit. Sup everybody, super excited to have Arvid Kahl on the show today. You might know him as the Bootstrapped Founder. He sold Feedback Panda, he's running Podscan, and it's just like a really interesting combination
Starting point is 00:01:06 of technical and business stuff, like the stuff he's doing with Podscan where he's downloading and transcribing and making searchable the entire podcast universe every single day. It's just really fascinating. We talk about some tips he's learned from doing all the AI transcription stuff,
Starting point is 00:01:22 running his own, running the search infrastructure for that and just managing all that stuff and just all these lessons he's learned along the way. I've followed Arvid for a long time and I'm really excited to get the chance to talk with him. It did not disappoint, he's a fun guy. So check it out, you know, and if you have any questions for us,
Starting point is 00:01:38 if you want guests to have on anything like that, feel free to reach out to me or to Sean. And with that, let's get to the show. Arvid, welcome to the show. Thanks so much for having me on. I'm excited. Yeah, I am super excited to have you here. We were talking before we started,
Starting point is 00:01:52 and just like you're one of the most interesting people to follow on Twitter, just like documenting your journey, not only selling Feedback Panda, but now the cool stuff you're doing with Podscan and just like keeping us updated with like all the challenges, the business challenges, the technical challenges, all the stuff you're learning and stuff is very cool.
Starting point is 00:02:08 You're an awesome Twitter follow and all that stuff. But I guess for people that don't know as much about you or about Podscan, can you just give us an intro to yourself? Sure, so thanks for the kind words, really appreciate it. And I think building in public has been a big part of my journey over the last half decade or so that I've been more publicly available. I've been a developer since I had a computer, it's like a 14 or something.
Starting point is 00:02:32 That's when it started. And then I just always wanted to build my own things, which I did. Then I realized, yeah, I can actually make money off of this. So I built my own businesses as well. Feedback Panda was probably the biggest thing I've done around 2019. When I had a two-year-old business, we sold it and that was a big move. Put myself more in the public eye as well with all my founder peers and stuff. And ever since then, I've been building software businesses and a media business, a blog, YouTube and podcasts and all of that in public in front of people, just trying to share the journey. Because I feel that is where I learned most from other people is when they shared
Starting point is 00:03:07 their challenges and how they overcame them or how they struggled with all of that. So I've been trying to do the very same thing for others. Podscan is a result of this. I was just building my own media business, my little podcast empire, and I was like, oh, man, I wish I could take in notes from my listeners, kind of like voicemail. So I built a little SaaS tool for that. And I was like, cool, but nobody is buying this. I want to make a business out of this. How do I find
Starting point is 00:03:32 people who want voicemail on their podcasts? I was like, hmm, there are all these social monitoring tools for Twitter, for Facebook, where you can, kind of like Google Alerts, track a name, track a brand or whatever. This doesn't seem to exist for podcasts, and I needed it for my own tool. And then very quickly realized that I shouldn't just build this as a little marketing stunt. This is a business all in itself, like figuring out who talks about your names, your brands, your products, or anything you're interested in really, on podcasts all over the world. So that's where Podscan started. And that's how a year and a half ago
Starting point is 00:04:06 that I had this idea and implemented it. And now we are just scanning the whole podcasting ecosystem every single day, 50,000 shows, transcribing all of them, analyzing them. It's pretty wild. And I'm super, super happy to be able to share that journey because it's been up and downs, lots of downs, like many entrepreneurial things, couple of ups.
Starting point is 00:04:27 And it's been really cool to just be, have that be part of, you know, a community of people that I admire. Got a lot of good help there too. Yeah, for sure. Yeah, and it's, again, it's really been so fun to watch the journey. Like I remember buying and reading Zero to Sold, your book,
Starting point is 00:04:42 and that was fun and just like all the stuff you've done. But then like with Podscan, it's been just another level of like, just seeing you talk through all the technical stuff. And I love it when I can get someone on here who's like very technical, like you can talk about that, but I can also talk about the business challenges and like, you know, finding product market fit and finding customers and marketing and all that, that sort of stuff, which is like very fun. So that's great. Yeah, you went through how you got the idea for Podscan. I guess at a high level, just to tell people,
Starting point is 00:05:09 you are ingesting the entire podcast universe every single day. I looked on your website and it says 35,000 new episodes a day. Basically, every single episode that comes out, you pull it down, you automatically transcribe it, make it searchable for people and alerts and all these sorts of things. Yeah, it's one of the weirder things about this particular software as a service business is that the scale that I operate at has nothing to do with how many customers I have.
Starting point is 00:05:38 Most software businesses, they scale with the people using the product, but it really doesn't matter to me if I have 10 customers who want to track all podcasts everywhere, or if I have 100,000 customers who want to track all of these things. Obviously there is a scalability there somewhere, but it by far is just outsized by how many shows are released every day,
Starting point is 00:05:57 which is kind of static if you think about it, right? You said 35, sometimes it's 50, depending on the day. Mondays are more, weekends are less, right? There's always a very clear number of shows in any given day that's being released because people may start a lot of podcasts, but not a lot of people keep doing their podcast. So it's a fairly static number, 3.8 million shows all over the world, which is lots, but not
Starting point is 00:06:20 all of them are active. And not all of them are always active or active on a weekly cadence, right? There's shows that are daily and so on. So it's a lot of data, but once you figure out how to deal with this, I had to build an infrastructure for parsing all these RSS feeds because the podcast ecosystem fortunately is built on top of open standards. So the RSS feed, every podcast has to have one, even if it's also hosted on Spotify or Apple podcasts, those kind of more walled garden systems, there's always some kind of podcast hosting company out there that hosts the original data source. And we fetched that
Starting point is 00:06:54 download the mp3 and I have a fleet of GPU based servers that transcribe, we can dive into the details. I hope you do. Yes, please. How this stuff works, but they effectively 24 seven, all these servers, dozens of them, are just churning through. They get a URL, they download the file if they can, because that's also bot blocking and all of that shenanigans happening. They try to transcribe it in whatever language it's in. They try to get timestamped lines of SRT format kind of subtitle files come out of it, or
Starting point is 00:07:24 even word level timestamps in those podcasts as well. There's a lot of data being collected that gets sent back into my database, and then I have a whole other system of AI-based data extraction that figures out who's a host, who's a guest, who's the sponsor, what are the main topics and all of that, because that's what my customers really want to know. For Podscan, that would be PR companies, agencies, marketing departments, like people who really have a vested interest in knowing who is talking about their product, their boss, their brand,
Starting point is 00:07:55 right? Who is shit-talking that airline that works with them or that they work for. It's all of that kind of stuff with different levels of urgency that people need to know something about. So we just ingest everything to make sure that we catch everything. Yep. Yep. And even on that level, how do you find everything?
Starting point is 00:08:13 I'm aware of the RSS, a little bit of how it works on Open Standards, but are there four or five hosting providers that have just a list of all the podcasts they host? Or how do you find those 3.8, 3.9 million podcasts out there? So I'm just going to give away all the secrets that I have that are very easily Googleable. And probably if you were to ask ChatGPT or Claude something about this, they would guide you in that very direction as well. In the beginning, that tech did not exist, like the whole AI world kind of happened in parallel
Starting point is 00:08:45 with the development of Podscan. I started in April '24, and the tech was kind of there, but not really. At least not Cursor, that stuff didn't exist. So I had to do a lot of things manually. And when I started, I also had this big question of, so where are all these shows? I didn't know. But as anything for any developer building a business today, I just looked into the open source world. And I found what is called the Podcast Index, which is a kind of a community-operated database of all the podcasts everywhere that also has ties to the people who work on Podping, which is, kind of, think a
Starting point is 00:09:22 blockchain-based pub/sub system where podcasts can say, I have a new episode that is just out, and anybody subscribing to that feed then gets kind of a notification. So they, you know, a player or another server that wants to do stuff with it can pull that episode and do whatever with it. So that's called Podping and that's integrated
Starting point is 00:09:41 into the big world of the podcast index. And they have an API, which is great, because you can just pull the most recently released episodes, the most trending episodes, but they also, and that's the most fortunate part of all, have a four gigabyte, I think, refreshed every week SQLite export of all the RSS feeds of all podcasts everywhere. So if you built the infrastructure to handle this, which is a whole other topic, you can just take the SQLite file, parse it into your system, and all of a sudden,
Starting point is 00:10:15 you have 4 million feeds to check, or 3.8. And you just check those feeds every nth minute, all 4 million of them? You should see the wondrous architecture behind that. But yeah, I figured out that Podping is pretty well established among most podcast hosting providers. Think of Transistor or Acast or those kinds of providers that exist. So they integrate this automatically for most of their shows.
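For a sense of what working with that weekly SQLite export looks like, here is a minimal Python sketch. The file name and the podcasts table and column names below are assumptions for illustration, not the actual schema of the Podcast Index dump, so check the real file before relying on them.

```python
import sqlite3

# Hypothetical: open the weekly Podcast Index SQLite export.
# File name, table name, and column names are assumptions for illustration.
conn = sqlite3.connect("podcastindex_feeds.db")
conn.row_factory = sqlite3.Row

# Pull every feed URL so a scheduler can spread checks across the day.
feeds = conn.execute("SELECT id, url, title FROM podcasts").fetchall()

for feed in feeds:
    # In a real system this would go into a queue with a priority and an
    # offset per feed, not be fetched inline here.
    print(feed["id"], feed["url"])

conn.close()
```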
Starting point is 00:10:41 So the bigger providers having this integrated means that most podcasts will tell me when they have a new episode, but then some don't. And for those, I just have a couple of machines on AWS and on Hetzner, like being a German, I obviously needed to take the very German hosting alternative too. And that's also where I host all my GPU-based servers because that's the only way for me to make this profitable in any way.
Starting point is 00:11:06 I'll get to that. But those couple servers, three or four exist, and each of them, with a certain offset, checks every single feed at least once daily, prioritizing the bigger feeds of the bigger shows, where I know there's a couple of thousands, a couple of hundred thousand listeners, they get checked every six hours, every four hours, depending on that. But yes, there's a lot of fetching RSS feeds, which taught me a lot about eTag headers and about caching headers and
Starting point is 00:11:38 these kinds of things, just telling me, okay, we don't have anything. Don't pull the full 10 megabyte RSS feed like every couple hours, please. Which is fine if you do it once in a while, but if you do this for every single podcast hosted on that provider, it becomes a problem. So there was a technical challenge in not overwhelming them with my requests, which I didn't want to do because I love the podcasting ecosystem and I didn't want to be a burden on it. But you also want to be up to date and you want to be able to fetch the right information
Starting point is 00:12:05 at the right time. So that was an interesting balance to strike. And did that come out of discussion? Did you talk with Justin at Transistor or someone like that? Or did you just figure that out as you're working through this yourself? Were there actual conversations with them on how to do that? Or how did that go through?
Starting point is 00:12:19 Yeah. Justin is an example. Talked to him a lot. Talked to Craig over at Castos as well. A lot of the podcast hosting providers are also kind of in the indie hacker bootstrapper community. I very easily approached them with this and had a good conversation. But again, it's all open source, so there's a lot of documentation around it. Podcasting 2.0 is kind of a standard format
Starting point is 00:12:41 that also explains how certain things should be done and how certain API should be accessed. And, you know, like there's a lot of good docs out there. And when I started actually downloading those files, one of these larger companies reached out to me because they saw I had put the name of Podscan in the user agent, which was, you know, important to just tell them, hey, I'm a bot. And they were like, hey, cool, you already put your name in there. Here's a couple more resources on how this should work so we can kind of discard your downloads against the reporting that they have to do for marketing reasons. My podcast downloads should not count to an ad being played
Starting point is 00:13:20 or something. So there's a whole organization there to make sure that podcast ad plays count as actual ad plays. Yeah, there was a lot of outreach towards me, all very benign. There never was stop what you're doing. There always was, hey, this is how we do it right. So it's a pretty good community. Again, building open standards. There's a lot of like willingness to give
Starting point is 00:13:42 in this community for sure. Yeah, yeah. Very cool. Okay, man. I was excited to talk to you because I'm like, there are some interesting technical challenges, and I didn't even think of this one. I'm just finding all these podcasts and downloading them efficiently and all that sort of stuff. We're already getting started here.
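As a rough illustration of the ETag and caching-header pattern he described a moment ago (letting the host answer with a cheap 304 Not Modified instead of the full feed), here is a hypothetical sketch in Python with the requests library. This is not Podscan's actual Laravel code, and the in-memory cache is just for show.

```python
import requests

# Toy in-memory cache of validators per feed URL; a real system would persist these.
validators = {}  # url -> {"etag": ..., "last_modified": ...}

def fetch_feed(url: str) -> bytes | None:
    """Fetch an RSS feed, but let the host answer '304 Not Modified' cheaply."""
    headers = {"User-Agent": "ExampleBot/1.0 (+https://example.com)"}  # identify yourself, like Podscan does
    cached = validators.get(url, {})
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]

    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:
        return None  # nothing new, and we didn't pull the full 10 MB feed

    # Remember the validators the server gave us for next time.
    validators[url] = {
        "etag": resp.headers.get("ETag"),
        "last_modified": resp.headers.get("Last-Modified"),
    }
    return resp.content
```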
Starting point is 00:13:57 But the two areas I want to dive in a lot are the transcription work. So once you download that, the actual parsing out of the transcription, then summarizing and all that stuff, and then the search infrastructure. You already put it really well, the scale is independent of your customers, which is an interesting problem that not a lot of people have.
Starting point is 00:14:18 It's like all this creation is happening apart from your customers that you have to ingest regardless of that, which really blew up your probably cost and infrastructure right away. So I want to talk about that stuff. I guess let's start with transcription. I guess walk me through, like OpenAI has whisper models and things like that. I guess the one thing I hear people, like all these AI companies allow you to do transcriptions via API, but a lot of people say,
Starting point is 00:14:45 hey, once you get to big scale, like those are expensive. You said you run your own GPU fleet. I guess like maybe walk me through that infrastructure of like, when, how are you transcribing these episodes? Let me even walk a little bit further back because the business that I built originally, the whole voicemail thing, right? That's kind of what started
Starting point is 00:15:04 the whole journey into what eventually turned into Podscan. That business needed transcription too, obviously, right? Voicemail, you want to take in the audio, you want to send over the audio so people can put that into their podcast episode and then like answer a question or play it to show
Starting point is 00:15:20 that people talk about their show. But I also wanted to send an email over, hey, you got a voicemail, this is what they say. So I needed to figure out how to take a 20, 30 second fragment of an MP3 or something and transcribe it effectively. So that's when I started, and that was, must've been 2018, no, that's not right,
Starting point is 00:15:38 2020, one, I don't know when it happened. It was a couple of years ago that I just started playing with this. But pre-ChatGPT? It's pre-ChatGPT, but it was not pre-Whisper, because that's when I kind of looked into Whisper for the very first time. So Whisper, for anybody who is not aware of this, is a freely available transcription model, like a speech-to-text model,
Starting point is 00:16:01 that OpenAI has been offering for quite a while. It's, I think, one of their earlier machine learning or, I guess, would almost be machine translation, although it's not, models that runs both depending on how you use it on a GPU, obviously, right? You can can use it in any kind of GPU context, but it also runs on the CPU. If you use a tooling like, what is it called, Whisper.cpp, which is just like Llama.cpp, kind of the attempt to bring the not so large models, I guess smaller language models or medium-sized ones, quantized models
Starting point is 00:16:40 most of the time, because if you try to get the full model on a normal GPU, it wouldn't work because it's just so sizable. It's the same with Lama and all of these things. But even just for the speech-to-text ones, they have really good versions of these models that you can run on a CPU. And I tried that out on my little MacBook back in the day, and I was amazed. It took a couple seconds for like a minute's worth of audio, which nowadays is almost a joke. But I mean, if that runs on a regular server that doesn't even need to have a GPU, and you
Starting point is 00:17:12 don't have like 100 customers trying to do this at the same time, then that's perfectly fine. Right? That's a background process. That's some kind of queue me up and send me the result later. And then I dispatched the email in a couple minutes kind of work. So that's what the original contact for me was with transcription. I was like, this is amazing. I mean, these models, they have different models of different kind of quality. There's a tiny model, a small, a medium, a large, and they all obviously are bigger and better or smaller and worse and smaller and faster and bigger and slower.
Starting point is 00:17:46 So there's a trade-off there. And when I started with Podscan, where I wanted to actually get the stuff transcribed on the GPU because of the scale that we mentioned, the large model was extremely slow compared to the tiny one. Tiny one was, I think, 24 times as fast as the large model. So transcribe an hour could take you a couple seconds or it could take you 15 minutes or even more, right? Depending on the quality that you want. So I had to build almost an infrastructure because when I was beginning, I didn't have any funding, didn't have any money. I just wanted to see how this works.
Starting point is 00:18:20 I had a Mac Studio with, I think it's an M1 Mac Studio. It's not the latest and greatest now, but for the longest time, all my transcription actually ran locally on this very machine that I'm talking to you about. I had this thing and when there was a power outage, I had a problem. It was wild. But yeah, most of the transcription happened on this machine, then I bought a Mac Mini because I needed something for my studio computer where I record and all of that stuff. And I figured out, hey, if this thing is sitting idle, this could do maybe 20% of what my Mac Studio can do,
Starting point is 00:18:55 but that's 20 additional percent on top of the transcription there. So that one was running 24 seven as well. And then I tried to find alternative ways because I knew that obviously, can't just transcribe this stuff on my local computer anymore. So I looked into what does AWS offer
Starting point is 00:19:09 in terms of like GPU compute. And that was quite expensive because, you know, AWS has a certain kind of premium for a certain kind of extra service that other providers don't offer, or don't offer as reliably. So I looked into others. Google didn't have anything back in the day
Starting point is 00:19:24 that was accessible to me. So that didn't happen. And there were things like, what is it called? Ori was one and Lambda Labs was one that was really cool. Like they had kind of hosted instances too. And I think on Lambda Labs, on A10 graphics cards, was the first time that I started using like transcription on other servers, someone else's. Yeah, just externalized and all, like a VPC. So I'm a Laravel developer,
Starting point is 00:19:49 at least that's what I've been over the last couple of years. So I'm just running a Laravel applications that call like FFmpeg in the background or that call like a binary for transcription. I'll get into that in a second. So I just set them up via SSH, and they were just running on these things and they were talking back and forth to my API.
Starting point is 00:20:04 It's still what it is right now. The level of orchestration in my business is, let's call it horrible, but it works because it's just such a mishmash of all these different things that came from the initial locally running thing. But yeah, transcription right now runs through a tool that is called whisper-ctranslate2, for anybody who's interested in that. But there are many different kinds of binaries that you can install on an Ubuntu server or on a Mac, for that matter, that run Whisper in some capacity. There's WhisperX, there's just the original Whisper, I think, that tries to run it on a CUDA infrastructure.
Starting point is 00:20:40 But I found Whisper- Yeah, why did you choose the CTranslate2 one- Because one of the things that I wanted for Podscan was diarization, like figuring out who is speaking. That's kind of what this is, right? Like speaker one, speaker two, that kind of stuff doesn't come for free. It actually is a quite computationally intensive act to do.
Starting point is 00:21:00 Like you, just from me trying to understand it, right? You take the waveform of the conversation and then what a diarizer does is look at different parts of the waveform and see how they structurally differ. And if the waveform looks kind of like this, probably speaker one, if it's a little bit louder or like certainly shifted in a certain way, that's speaker two, this is simplified. But that's kind of how diarization works. So it has to go through the whole audio file and check for the just slight differences in when people speak and then there's like a dead sound detection in there. And
Starting point is 00:21:32 all of that, it's computationally intensive. In fact, it is so computationally intensive that it takes almost as long, if not longer, to diarize a podcast or an audio file, for me all of them are podcasts, so that's why I'm defaulting to this, as it would take to transcribe it. With Whisper's latest and greatest, what is it, the large-v3 turbo model,
Starting point is 00:21:54 which is like a thing that they invented halfway through me building on the large normal model. They released a turbo model, which was like 12 times as fast, which immediately 12x'd my throughput in terms of transcription, which was wonderful, but I digress. So to be able to-
Starting point is 00:22:09 And similar quality for that? Almost the exact same. Like they figured out, I think the word error rate went up by 1% or something, which can be significant, but not for podcast transcripts that then go through an AI step, that's fine, right? So to be able to diarize, I needed a tool that had diarization built in,
Starting point is 00:22:27 or at least had access to diarization, because some of those Whisper tools did not. People just want transcriptions. They don't necessarily need who's talking and when. It was important for me and my customers, for some shows, but not for everybody using the tools. And CTranslate2, which in itself is a completely different tool,
Starting point is 00:22:44 it's meant to be a translation tool that with Whisper turned into a translation and transcription tool. So for them, it also mattered who was speaking. So they had a diarization thing built in, which is called pyannote, 'piannoté' or 'piannot' for the people who don't speak French, like the people who built this tool. I chatted with those folks too,
Starting point is 00:23:03 because once they noticed that I'm using this to diarize, and again, building in public, right? I shared it on Twitter and all kinds of places. That gets people out of the woodwork that built these kinds of projects. So they reached out, we had a chat, it was a lot of fun just to see where they are going. They wanted to build an API for this
Starting point is 00:23:19 and everything was an interesting, almost business-focused conversation, not even technical conversation at that point. But yeah, that pyannote tool is exactly this. It's a diarization tool, and then that interacts with the transcription tool, and that allows every single timestamp to have the speaker label in it as well.
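As a toy illustration of that last step, attaching a speaker label to each timestamped segment, here is one way the overlap matching could be done in Python. This is a sketch of the idea only, not pyannote's or whisper-ctranslate2's actual implementation.

```python
def label_segments(segments, turns):
    """Attach a speaker label to each transcript segment.

    segments: [{"start": float, "end": float, "text": str}, ...]   (from the transcriber)
    turns:    [{"start": float, "end": float, "speaker": str}, ...] (from the diarizer)
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            # Overlap between this segment and this speaker turn, in seconds.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled

# e.g. label_segments(whisper_segments, diarizer_turns) -> segments with a "speaker" key
```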
Starting point is 00:23:37 And is that in a single pass, or does it do like one pass and spit out a transcript, and then you run through again, do the diarization, and match those up? Or is it just a single pass? No, it's the other way around. It diarizes first because it needs to know where the different fragments are.
Starting point is 00:23:52 And then that gets fed into it, not even the transcription process, but the, what is it called? The timestamp assembly process that happens throughout those steps. It transcribes, here are the words, and it's like, okay, this is when the timestamp is. Who was talking then?
Starting point is 00:24:10 Then it pulls in the speaker label. It's really just a string formatting situation, but it pulls that data in later. That's still one pass with the tool, but obviously it's an individual step in a multi-step chain of many different things. Yeah, yeah, interesting. You mentioned starting with these short voicemail
Starting point is 00:24:30 recordings, and now, of course, you have hour-long, two-hour-long, who knows how long podcasts. Is there a difference in quality between those two? Like, is it able to manage a much larger file just as well as it is a shorter one? It takes longer to do? Or is there like a degradation in quality? Funny enough, that depends on how many of these things you run in parallel
Starting point is 00:24:52 on the same GPU. That's been my experience. So one of the things that I very early was focusing on was like maximizing efficiency, because if I'm going to pay like $300, which is I think roughly what I pay for a GPU based server on Hetzner, just saying, right? Like that's pricing you will find nowhere else. I mean, those things sometimes just explode apparently, but that's fine, right? Like, obviously that's fine for you. It's, it's stateless. Yeah. Yeah. Right. And there, there are migration paths. And if you build it well, you could just redeploy. So if I pay that much money,
Starting point is 00:25:29 then I really need to use this GPU 24/7. So I tried from the beginning to figure out what's the optimal, just, parallelizability of this, like how many of these things can I run in parallel, and I found that anything more than four tends to degrade performance at the later points in conversations, like 30, 40 minutes in. And that might well just have been that particular version of the model on this particular kind of GPU, but I've seen these things and I've kind of reduced it ever since. I used to run like 12 or 15 in parallel because I could. I had an H100 once. That was great.
Starting point is 00:26:09 Like running it on that kind of GPU, it's a total waste. Honestly, that's another learning. And I'm just going to throw everything I learned out here, because those things. This is great. This is awesome. Yeah. I'm going to talk about them as randomly as they appear to me, like in the process of building the business. But if you're doing transcription on high-end GPUs, you're wasting the full potential of the GPU. Like, even if you're running them in parallel, for some reason, running like a Whisper large model, even the large, large
Starting point is 00:26:40 one, not just the Turbo, if you run one or two or 20 on an H100, the throughput of all of these together is probably going to be the same. It is pretty wild. It's like you might as well use a really old, weird GPU that is a bit slower, but much, much cheaper for you to run these things on. But they're not using the whole, I don't know, VRAM of the thing,
Starting point is 00:27:02 that they're not using the full memory space. I'm grasping for straws. I don't know what they're not using the whole, I don't know, VRAM of the thing, that they're not using the full memory space. I'm grasping for straws. I don't know what they're not using, but they're just not using it. So for transcription, they're not getting any faster. They're kind of bound by some internal bus at that point. That doesn't change. So it's not very parallelizable. So I stopped these $1,200 a month rentals very quickly after I noticed that I could just buy four full servers that would do way more than I could ever run on this thing. And yeah, that's my learning here too. But there's a lot of that. Yep. And that's surprising to me that you found the quality degraded running in parallel.
Starting point is 00:27:45 I would have just assumed they would have gone slower if you're overloading it. But it's actually the quality, which is going to be a harder thing to sort of pick up and notice for a little while. Yeah. It was mostly when I was starting to hit the 80, 90 percent of VRAM on that particular GPU, I think it must have something to do with memory management or something in there, right? Like the parallelizable things, they are only that parallelizable
Starting point is 00:28:14 because there's always context switching in that. And who knows exactly what magic happens behind the scenes, but it did impede the productivity of those systems. So I found a working set. Do you, you're transcribing so many of these per day. Do you have any prioritization based on like, popularity of the podcast and things like that? You do, okay.
Starting point is 00:28:36 So. Yeah, I had to build this. Like I have several queues that just internally even where my, it should be PubSub, but it's not because it isn't. It just happens to be an API-based thing where my clients, like the transcriber API servers, they just go to an API, fetch a thing, and then they come back when they need a new one.
Starting point is 00:28:59 All of that queuing is handled internally there. So it's just a big old redis with multiple queues with IDs, right? That's pretty much what it is. So I have three queues. There's a high priority, a medium priority, and a low priority queue. And then there's a fourth one, which is a do this immediately queue
Starting point is 00:29:14 that skips all the other queues. Because sometimes one of my customers needs this transcript right now, and I don't want to queue it with the other ones. You know, maybe Joe Rogan just released something and he would get the first spot because he has one of the biggest shows, so I need to even get, like, in front of Joe Rogan in the queue. So I have this kind of immediacy queue as well. How does something get triggered? Like is that a customer reaching out to you and saying they need it or are they allowed to like click a button in the thing saying, hey, I want this right now?
Starting point is 00:29:41 Yes and yes. So, both of these. So I'm using a lot of internal signals in Podscan because it's kind of hard to figure out which podcasts actually are popular and useful. It's easy to figure out who's popular. You just need to scrape their Apple Podcasts page and count the number of reviews, or you need to check their podcast ranking somewhere. You know, popularity is something that I figured out quickly, lots of scraping, let me tell you that. But usefulness is a different thing because again,
Starting point is 00:30:11 useful to my customers is very different than useful to the full humanity who wants to listen to shows. I needed to build internal signals into Podscan that signal to me that somebody finds this podcast useful now and maybe also in the future. So when there's a lot of social mention tracking on, on, on Podscan, that's one of the big features, right?
Starting point is 00:30:32 It's kind of Google alerts for podcasts. Somebody mentions my name, your name, you get a notification. So whenever this happens to somebody who's a paid customer, for example, that is a signal to me that this podcast from which this mention originates probably should get at least a medium priority in the future. If then somebody clicks on that mention, goes to the part of Podscan where they can look into the transcript and they actually stay there for 30 seconds, okay, this is a high priority candidate now because that actually is a reasonable assumption that this might be useful in the future.
Starting point is 00:31:07 And there's like 20 of these spread out all over the system. Somebody searches for something and clicks on it, right? I'm just collecting a lot of signals to have my own internal priorities. That's a good signal. Yeah. That's very cool.
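A stripped-down sketch of the multi-queue setup he describes: a few Redis lists checked strictly in priority order, with an immediate queue that always wins. The queue names and the redis-py usage here are illustrative assumptions, not Podscan's actual code.

```python
import redis

r = redis.Redis()

# Hypothetical queue names, checked strictly in this order.
QUEUES = ["transcribe:immediate", "transcribe:high", "transcribe:medium", "transcribe:low"]

def enqueue(episode_id: str, priority: str = "medium") -> None:
    # Internal signals (mentions clicked, searches, paid customers) would bump the priority.
    r.rpush(f"transcribe:{priority}", episode_id)

def next_episode() -> str | None:
    """What a GPU worker asks the API for: the next episode ID, highest priority first."""
    for queue in QUEUES:
        episode_id = r.lpop(queue)
        if episode_id is not None:
            return episode_id.decode()
    return None  # nothing to do right now
```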
Starting point is 00:31:24 So I guess, just from building with AI systems, you know the prompt is so important, and getting the right context in there. Is that important when doing transcription? Are you building out some sort of prompt and context to go with the transcription? Is the translation, or not translation, transcription context specific?
Starting point is 00:31:43 Like, hey, this is a tech podcast and you indicate that in some way, or is translation, transcription context specific? Like, hey, this is a tech podcast and you sort of indicate that in some way, or is that less so in transcription where you're not doing as much context engineering or whatever you wanna say there? I wish it was. From the beginning, that would have been something that I really, really have liked to exist,
Starting point is 00:32:00 mostly because brand names are hard. Just imagine Spectrum or Charter, like an ISP company or like a cable company, those kinds of things. Companies with names that are also actual words that other people use in some other context, that is the easy part. But now let's look at like Feedly or Feedlyly or whatever, right? Things that don't really exist in human language. How are they going to be transcribed? First off,
Starting point is 00:32:30 they're going to be mistranscribed, like to begin with, because the thing is going to associate a different word with it. And then, well, do I now have to put an alert on the word that is not the word that people are looking for? It's horrible. Context is really, really hard. So the problem is that Whisper, as just the model that it is, it has a context string you can attach to it, where you can give it almost a dictionary of words that you know that are going to be in there.
Starting point is 00:33:00 But you have to know that they're in there. So I can't just give it a dictionary of all the interesting words that I want to find because I did that in the beginning and then it found them in all the wrong places, right? Because that's the thing with context. The moment you give an LLM of any sort and I would assume that these things are kind of LLMs, really small and different, but they are. The moment you give them that, they find it because there are these kind of people pleasing mechanisms built into them, right?
Starting point is 00:33:25 So that does not exist. What I do at this point is I feed the title of the episode and the name of the podcast as context into the episode itself. Sometimes if I have it, the names of the people that have been on the show previously, I give a little context with actual verbs and nouns and names that exist.
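For a concrete picture, this is roughly what that looks like with the faster-whisper Python bindings, one of the CTranslate2-based Whisper wrappers. The model choice, prompt text, and file name are placeholders, and this is not necessarily Podscan's exact tooling.

```python
from faster_whisper import WhisperModel

# Assumption: a CUDA GPU is available; "large-v3" could be swapped for the turbo model.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Feed the episode title and show name in as context, the way he describes:
# effectively a hint list of likely words, not real semantic understanding.
prompt = "Podcast: Software Huddle. Episode: Lessons from Transcribing 3.5 Million Podcasts."

segments, info = model.transcribe(
    "episode.mp3",
    initial_prompt=prompt,   # nudges spelling of names and brands it might otherwise mangle
    word_timestamps=True,    # per-word timestamps for SRT-style output
)

for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```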
Starting point is 00:33:44 But I wish I could give it semantic context. This is a tech show. So look for that. Doesn't exist yet. I wonder if OpenAI is ever going to work more on Whisper as it is, or even if they are, if they're going to open source it, which I don't think they are, but we have what we have.
Starting point is 00:34:04 But that context layer needs to happen afterwards in an actual text based LLM step if needed. Yep. Yep. And so do you take that original transcript and then feed that through an LLM with more context to like clean up that transcript? Or like what happens between that initial transcription and diarization and putting it into your search system? Not much.
Starting point is 00:34:26 So I thought about it. Obviously, there are experiments that I run with fitting that thing right into GPT or right into Claude or whatever. But it is so expensive. It is so expensive to run. There's 70,000 or so podcast episodes released
Starting point is 00:34:46 over the planet every day, and half of them are just people reading from the Bible or the Quran or the Torah. So I kinda dismiss them as relevant for my marketing or PR people, so I don't transcribe all of these, selectively don't. But the other 50,000 or so that come into the system, if I were to take every single transcript of these
Starting point is 00:35:07 and send that like full context into a chat GPT, it would be okay if I just have to answer a question in there, right? Like I have this, this one of the features of Podscan is whenever you have an alert set up, you can look for keywords but you can also say, well, if a keyword is found, let me ask this question off the AI, like, does this podcast episode talk about flowers, as well as the thing
Starting point is 00:35:30 that was mentioned? Or is there more than two or three people in this show? Because I only want alerts for like, lots of people on podcasts or whatever, right? Whatever question you might have, you can put into an alert, and it then takes the full transcript, puts it into a GPT and asks this question, you get a yes or no back. And judging from that, I send the alert or I don't. That's like one of the most popular features of Podscan, this kind of context-aware alert, because you can ask any question, right?
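A minimal sketch of that kind of check, assuming the OpenAI Python SDK and a cheap chat model; the model name and prompt wording are placeholders, not Podscan's actual implementation. The point is the shape of it: the whole transcript goes in, and only a yes or no comes back.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def passes_alert_filter(transcript: str, question: str) -> bool:
    """Context-aware alert check: full transcript in, a single yes/no token out."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any cheap chat model works for a yes/no answer
        messages=[
            {"role": "system", "content": "Answer strictly with 'yes' or 'no'."},
            {"role": "user", "content": f"{question}\n\nTranscript:\n{transcript}"},
        ],
        max_tokens=1,  # output stays tiny, so the cost is almost entirely the input tokens
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

# e.g. passes_alert_filter(transcript, "Does this episode talk about flowers?")
```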
Starting point is 00:35:56 Is this a good candidate for this kind of tool? And if GPT thinks sure, then you get that as a notification. You can build a lot of really cool stuff on top of this. But that's only feasible because it's not happening to every single episode, because it's only when a keyword is hit. And because the output of the LLM is just one or two tokens,
Starting point is 00:36:14 it's a yes or a no. If I were to put, I don't know, 500,000 tokens in there, like a Joe Rogan can easily get there with his four and a half hours of just brain dumps, and expect 500,000 tokens back out, that's the expensive part. Man, I did the math. I think I would pay $10,000 a day
Starting point is 00:36:33 if I were to do this at scale. This is what I'm currently paying for the whole setup a month. Like, the whole infrastructure costs me that. Have you looked into running LLMs locally, like you've done with Whisper, or is that just like too much of a hassle to do that? I've done this, and they still are. These are kind of my backup LLMs if there ever is, like so often happens, a moment where OpenAI just has to, you know, turn on
Starting point is 00:36:57 the respond-with-500s mode for a couple of minutes or a couple hours. So for that, I have Llama 3.1, the 8B ones, running locally on these servers, because I use my transcription servers also as my dispatch-to-AI servers, right? They just go through a couple of queues and they fetch new stuff, and then they send it, either they send it out to OpenAI or they run it locally as a fallback,
Starting point is 00:37:24 which then hampers my transcription efforts, which is why I don't like it. But yes, I have llama 3.1s. They're fine for the question answering thing that I just described, that the context-aware filtering. They're not that great at data extraction, because they are just not that great. You can do it if you reduce the context window
Starting point is 00:37:44 and you chunk over a conversation. You extract part by do it if you reduce the context window and you kind of chunk over a conversation, you extract part by part, and then you kind of synthesize it in another call, but that takes like five minutes apiece. It's kind of, you know, for one show, and that's just a lot. So I've tried this initially, results were okay-ish, customers were like, okay, I guess that's mostly right.
Starting point is 00:38:02 But ever since I switched to OpenAI, to their platform, particularly with, what is it, the 4o mini model and the nano model that they recently released, they're fine for all of this extraction stuff and all of the summarization and these things. It is still quite manageable in cost. It's like 1,500 or so a month when there's a lot of work to be done.
Starting point is 00:38:25 I think I hover around like spending 50 bucks every two days or so. That's still math. Okay. Yeah, interesting. Yep. And then one last question I have around the transcription stuff is like, how do you evolve this? Like if you think of either a new feature or maybe a new
Starting point is 00:38:45 mechanism, do you go back and redo old episodes or do you just say, hey, we only do that for stuff going forward now or how do you like, because you know, going back and doing 35 million episodes would be quite a cost. I guess, how do you think about the evolution of the system? So as a frugal entrepreneur who also is serving a group of people who are interested in what's happening right now, the answer should be clear. I'm mostly focused on delivering this value for current and recent things. There are always features and people have used them to retranscribe older episodes or reanalyze older episodes. I have an API
Starting point is 00:39:27 endpoint where people can tell me, hey, this podcast needs to be retranscribed, with all the stuff in there, all episodes, and they can call that endpoint a couple times a day, there's a rate limit there, right? So it's like, you can tell me that there's stuff wrong, and I'm going to fit it in with the other things in my queues. But realistically, the most important stuff happens today because it matters. If there's another oil spill, people don't care about episodes from like 20 years ago, or like 10 years ago, right? They need today prioritized. So the strategy is, once Podscan
Starting point is 00:39:59 becomes so hyper profitable that I can do this, I'm totally going to do it. You know, it's really what it is, because it's really just a question of resources. All these queues that I have running, if I were to, I think at this point, it probably would be 2x or 3x my system, my infrastructure of transcription servers, I would walk like three days into the past for every day that I walk into the future, just in terms of like how much effort I can spend on catching up with older episodes. So the 35 million that I've so far ingested
Starting point is 00:40:27 are obviously in the past, and they still go further back into the past whenever there's room in the queue. Yep, okay. Okay, so you do redo some older stuff when there's capacity, interesting. Okay, yep, very cool.
Starting point is 00:40:43 Okay, I wanna switch to search now because search is such a hard problem. It also just differs based on scope. You hear a lot of people say, hey, use Postgres or MySQL full text search. And that totally works if it's first and last name in a CRM within an organization, right? But 34 million podcasts, transcripts that are,
Starting point is 00:41:04 I don't know, 50 kilobytes, I don't know how big they are, but like many kilobytes each, like you're doing serious search stuff. Tell us about your, where are you at right now with search? Are you using Elasticsearch now? Kind of, I'm using OpenSearch now. Okay, that counts, yeah. Right, it's kind of, yeah.
Starting point is 00:41:20 Yeah, that counts. Well, because I thought, were you Meilisearch at one point? Yes. Okay. So, and I still am. Okay, I can tell you, I can tell you the evolution of this and what choices I made along the way. So, when I started out, again, it's a Laravel project. So, it's a PHP application built on Laravel, which is a wonderful system. Like, it's really meant for people who try to build business-able solutions, like monetizable solutions on top of it.
Starting point is 00:41:48 So I had a very easy time just integrating all the default components to get Paddle on there as my payment processor and to get like login with this and login with that. Like just get all of these things into the application real quick. So I could then build the business logic that only I could build. That's kind of the idea. So one of the things that Laravel offers is a thing called Laravel Scout, which is a library that plugs into search engines.
Starting point is 00:42:11 And they support, I think, three. One of them is Meilisearch, the other one is Typesense, and then Algolia, I think, those three. And all of these are kind of real-time search databases. They're not really these kind of big old full-text-everything search. It's more like if you have, like you said, titles and whatnot, you can put them in
Starting point is 00:42:31 the index and do super, super fast, like sub-five-millisecond searches and all that. And obviously, that's great, right? Why would you not start with this? But then came reality, and then came, okay, this is now 500 gigabytes of text data that this thing needs to search. Meilisearch still does, it's really cool. It ingests all this data. It has, just like Elasticsearch,
Starting point is 00:42:52 just kind of a reverse, whatever it's called, lookup thing, a TF-IDF type thing or inverted index or something like that. That's the one, yeah, the inverted index, right? Like the frequency of the term is inversely proportional to how relevant it is. So that is also in Meilisearch and all these other tools. So they can find things rather quickly. It just turns out that ingestion is rather slow
Starting point is 00:43:19 after a certain scale. I was talking to the founders of Meilisearch as well, because again, building in public, once you talk about, I'm using your tool, they were like, hey, let's talk. And I've had these many, many very interesting troubleshooting, bug hunting, and feature request conversations
Starting point is 00:43:37 with the CTO of Meilisearch, and the CEO of Meilisearch was really great. And they helped me out setting it up. I sent them a full snapshot of my big old database, and they told me this is the biggest one we've ever worked with in this tool, because so far they had only done, imagine, IMDb-style things. You have a name of a person, name of a movie,
Starting point is 00:43:56 just all really short text, easily indexed, easily prioritized. But like you said, a transcript of a show very quickly goes into the hundreds of kilobytes and just to even store that reliably and access it reliably and then put into an index somewhere, super hard. So I got a lot of help from them. They built new versions of the software that had features
Starting point is 00:44:17 that I needed in this just so I could keep running it. And I think I'm still running it for some parts of the website where I need that quick lookup, but for the actual gigantic big old like Boolean search and the wildcard search in transcripts, I very recently migrated everything over to open search on AWS in like a hosted instance there. Nice. And how is that originating? I guess like how recent is that and what are your feelings on it right now? Well, I think I only felt burnout 4,000 times
Starting point is 00:44:47 throughout this process. It was incredibly hard. When it comes to data that is millions of items, that's already hard. When it's millions of rows in a database, millions of anything. But when it's also then almost almost four terabyte in size, that needs to be just even sent over the network somewhere.
Starting point is 00:45:08 And it needs to be scheduled, it needs to be loaded into RAM and sent over like, because when I put something in my search database, it's slightly different than what it is in my MySQL database, right? It's enriched with other things. What's the name of the podcast, right? That's just one foreign key lookup away. But, you know, you kind of have to collect it and then put it over. So man, was that something.
Starting point is 00:45:29 The migration process, I built it so that it was running in parallel for the thing to still be used in production. But in the background, I was shifting this information over to open search, which is really, really good at ingesting this data. And still is, it's extremely fast, super reliable. You know, knock on wood, I guess. But this has been very, very cool to see how reliable and fast the search has been. But man, the migration was probably one of the most stressful
Starting point is 00:45:56 things that I've done over the last couple of months. Yeah. How long did that take to move four terabytes over to open search? Because of the fact that I needed to transload a lot of data from other tables in my database and often had to look up very different things like chart rankings and all of this too, because we track that as well, which is a whole other thing. This is probably my biggest table
Starting point is 00:46:16 is not even just the podcasts with the transcripts, it's the chart history of all chart locations of all podcasts all over the world. This is a solopreneur project. It's amazing. Yeah. I don't know how this hasn't killed me yet, but when that happened, I think it took 14 days as a background process to migrate this over.
Starting point is 00:46:38 I had to build a migration, a restartable, a skippable migration command and that kind of stuff and just keep running it in some shell that was running on my third monitor and looking at it, hoping that it wouldn't break, that kind of stuff. Yeah, it took a while, but then I built a kind of a hybrid system where I would then launch it as a production feature
Starting point is 00:46:56 that people could switch over or could be switched over to the new search on the API. So I could see, does this break when people search for the old thing? What results would they get there? Are they the same as the new ones? Then I slowly migrated over time, flipped the switch one day and nothing exploded.
Starting point is 00:47:12 It was one of the happiest days in my developer life. Ever since then, this has been powering discovery and all kinds of other things in the background. It's been hard, but it's been rewarding. It's really cool. Very cool. Yeah. How does the cost of your OpenSearch infrastructure compare to the cost of the GPUs from Hetzner?
Starting point is 00:47:36 So OpenSearch, I think for the production one, I'm paying $700 a month right now. And I think it's 500 gigabytes of data, provisioned for 500. I think we are at 350 in data right now. So as this grows, obviously that will grow too. But I think that makes it, let me say, twice as expensive as the Meilisearch cluster, or rather Meilisearch server, which is a,
Starting point is 00:48:01 and I'm kidding you not, a 64-core Hetzner machine with 350 gigabytes that I pay $300 for a month. Oh my God. Oh, so you were running your own, I forgot that Meilisearch is open source, so you were running your own. Yeah, so that's all it did. So yeah, that board broke a couple of times, that was fun.
Starting point is 00:48:18 But also, the only problem there was the speed of ingestion and their ingestion queue, which could grow to one million objects and then the whole thing froze at some point. So I had to build, what was it? I have this kind of back-off logic that queries that server's queue, counts the items in the queue
Starting point is 00:48:38 and then depending on that sends over new items or doesn't. I was like, I don't wanna deal with this. Could AWS please solve this for me? And obviously the OpenSearch there is capable of dealing with my inflow. So that worked pretty well. But yeah, that still exists. That server is still running for certain kinds of queries. The smart, yep, the different things. Yep. Yeah. Yeah. I always say like Elasticsearch is my least favorite thing to run, but I think it's just partly because of the nature of the data. You're often just throwing large amounts of data at it,
Starting point is 00:49:08 doing full-text search, or maybe some aggregations too. And it's just, it's a hard problem. Honestly, I can tell you I've been dealing with search not just in this project. I've done it before. I was a salaried software developer for companies. Search is always something, right? And it's always hard.
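For what it's worth, the back-off logic described a moment ago could look roughly like the sketch below: before pushing another batch of documents, ask Meilisearch how many tasks are still enqueued and skip the push if the backlog is too deep. The host, API key, index name, and threshold are invented, and the /tasks endpoint's query parameters and response fields vary by Meilisearch version, so treat those as assumptions to verify.

```php
<?php
// Sketch of a queue-depth back-off for Meilisearch ingestion.
// Host, key, index, threshold, and the /tasks response shape are assumptions.

const MEILI_HOST   = 'http://127.0.0.1:7700';
const MEILI_KEY    = 'masterKey';
const MAX_ENQUEUED = 100000; // back off above this many pending tasks

function enqueuedTaskCount(): int
{
    $ch = curl_init(MEILI_HOST . '/tasks?statuses=enqueued&limit=1');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Authorization: Bearer ' . MEILI_KEY],
    ]);
    $body = json_decode((string) curl_exec($ch), true);
    curl_close($ch);

    // Recent Meilisearch versions report a "total" for the filtered task list.
    return (int) ($body['total'] ?? 0);
}

function pushBatchIfQueueIsShallow(array $documents): bool
{
    if (enqueuedTaskCount() > MAX_ENQUEUED) {
        // Too much backlog: skip this round and let the indexer catch up.
        return false;
    }

    $ch = curl_init(MEILI_HOST . '/indexes/episodes/documents');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($documents),
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . MEILI_KEY,
        ],
    ]);
    curl_exec($ch);
    curl_close($ch);

    return true;
}
```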
Starting point is 00:49:26 And Elasticsearch in particular, and OpenSearch is kind of almost a dialect of it. I don't want to insult anybody working on either project, right? Because there's this kind of competition between them. But effectively, to me, they are the same. They are. I mean, it's a fork of it, you know?
Starting point is 00:49:40 So yeah. Right? Conceptually, they just don't differ enough for me to see them as different entities, so I kind of treat them the same. To me, what I always hated, and I mean this as a developer: I did not enjoy the DSL, the description of queries in there. That is just rough. It's rough to understand how exactly "should" works and all of these weird nested bool queries. It was hard.
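To give a sense of the DSL being complained about here, this is roughly the shape of a bool query of the kind described in the next answer, written as a PHP array the way the PHP clients usually take it. The field names (transcript, title), the search phrase, and the boost values are placeholders, and the client call in the trailing comment is an assumption about a generic low-level opensearch-php client, not the Laravel package mentioned below.

```php
<?php
// Illustrative OpenSearch/Elasticsearch bool query: require all terms, and boost
// exact-phrase and title hits. Field names and boosts are placeholders.

$query = [
    'query' => [
        'bool' => [
            // The "ands": every term has to appear somewhere in the transcript.
            'must' => [
                ['match' => ['transcript' => ['query' => 'open source', 'operator' => 'and']]],
            ],
            // The "ors": optional clauses that only raise the relevance score.
            'should' => [
                ['match_phrase' => ['transcript' => ['query' => 'open source', 'boost' => 5]]],
                ['match'        => ['title'      => ['query' => 'open source', 'boost' => 2]]],
            ],
        ],
    ],
];

// With a low-level client this array becomes the request body, along the lines of
// $client->search(['index' => 'episodes', 'body' => $query]); (call shape assumed).
echo json_encode($query, JSON_PRETTY_PRINT), PHP_EOL;
```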
Starting point is 00:50:09 You know what saved me there? AI. AI. Yeah, it really is that. Like, I don't think I have written a single part of any of these queries. And these are really complicated, composed queries. I'm using Laravel OpenSearch, which is kind of sitting on top of Eloquent, Laravel's ORM, the thing that turns my query builder calls into SQL queries; that exists for OpenSearch too and for Elasticsearch as well. And I'm just trying to compose this like I always would as a Laravel developer. But sometimes I have to do a raw query. And I have not written a single line of this.
Starting point is 00:50:47 This has all been Junie, which is the coding agent that is inside of PhpStorm, you know, the JetBrains one. And also Claude Code and whatever these fancy tools are, they've helped me a lot too. But none of these DSLs were written by me, because I just couldn't. And I didn't have to. I could just tell the thing,
Starting point is 00:51:06 hey, I want to boost things where the full word is in there, and then also look for ands and ors and do a Boolean query, make that. Literally, that's how I code nowadays. Maybe that's interesting too. I did, yes. I'm talking into this microphone for as long as it lets me talk.
Starting point is 00:51:22 I'm using a tool called Whisper, what is it called, Whisper Flow? Whisper Flow, okay. Which, after you press a key command, captures everything you say, then transcribes it, sends it through a quick AI pass to clean it up, and then pastes it into whatever text input you have currently active, which is really cool,
Starting point is 00:51:38 because you can do this on Twitter, you can do this inside your IDE, you can literally code with the thing. You shouldn't, but you can. So effectively, I'm writing a mental draft and I just speak it out loud, throw that into Junie or Claude Code or whatever, or Cursor if you fancy. And then I just let the thing do whatever I told it
Starting point is 00:51:56 for 10 minutes. And that's exactly how I built most of the functionality of Podscan after a year or so of building it, because that's when I started using AI. Yep, that's so cool. Yeah, I've heard people talk about Super Whisper, Whisper Flow, doing that stuff. Like John Lundquist is another big one on this sort of stuff.
Starting point is 00:52:13 And I haven't, you know, I tried Super Whisper a year ago and just haven't gotten back into it, but that's an interesting one. And then also, are you using any of Claude Code or Codex or any of the truly terminal-based, set-it-and-forget-it type things? So I've, um, I've used Claude Code initially, but then I found Junie, and Junie is exactly this. They just built effectively a Claude Code copy
Starting point is 00:52:37 or clone, before it came out, I think, but don't quote me on that, into the IDE, into PhpStorm, which kind of sits at the side of it. It is effectively a terminal with a little bit of a nicer UI. So that's how that works. So yeah, I'll use it all day long. I code so little now that I feel... Code, in terms of actually typing anything? Actually typing. Obviously, sometimes there's a line and I delete it or remove it, put a log statement there, or I notice something that the AI didn't do the way I would do it because it has a different code smell, and I refactor it or whatever. But sometimes that refactoring is just me writing a comment, refactor this to look more like that, and then I let the thing go at it and it does it. So even coding to me is more a managerial task
Starting point is 00:53:26 than it is an implementation task right now. And a lot of founders really hate this, especially if they are very technical, because coding to them was the moment of creation, the joyful part of, here is what only I could do and this is the thing that I've built. I'm starting to kind of miss that too, but the speed at which I'm now creating features,
Starting point is 00:53:46 like this migration to a search engine that actually does what my customers pay me money for, instead of just being fast. And this is no diss at Meilisearch. Meilisearch is wonderful in all kinds of scenarios. It is even wonderful for mine, but I found that I needed more. I needed queries that people can, you know,
Starting point is 00:54:03 almost like reports, build very complex things in. It just wasn't possible with the old one. So I had to migrate, and AI-assisted coding allowed me to do this. I would have spent months just slogging through this and probably not gotten it right. Yep, exactly. And yeah, it does feel like something's lost, but it's also fun, especially if you're doing UI or front-end stuff, just the pace of creation there, and seeing that change, that's a different kind of high too. And then you let it do it, and it takes a couple of times of you telling it what it got wrong to get it right. But still,
Starting point is 00:54:49 within 15 minutes, you have a completely refactored front end. And then if you like it, you commit. And if you don't like it, you roll back to the last commit. It is so simple. It costs you no brain activity other than looking at it and going, nope, or yes. And to me, coding now is really prompting, in the sense of what we always have done: we would have a scope document. We would kind of describe what the product is gonna be.
Starting point is 00:55:14 We have some kind of information source for us, what we want it to be, and then we would implement it from there. Now we just need to create that document and allow the machine to implement it as best it can, which is often better than I would probably do it. I consider myself a 0.8 developer, like a 0.8x developer, not a 10x, not a 1x. I'm under there, but at least I'm good at tool use. So, you know, that helps. Yeah, yeah. I don't know that that's true, but yeah, that's a good segue. So you say Whisper Flow and Junie; are there any other AI uses, like v0 for visual design stuff or anything else?
Starting point is 00:55:55 Okay, so recently I have found another really cool use for tools like v0 or Lovable, those things. They help so much in showing potential customers or potential collaborators or clients or whatever what an integration of your tool in their tool could look like. So I was recently talking to a client who is running an analytics company, somewhere in the artist space. And they track a lot of things for a lot of different people. And they wondered if Podscan can help them with podcasts here, right? They probably pulled this data in and
Starting point is 00:56:29 kind of use it for things. They didn't really know what to use it for just yet. But they knew that they were missing out on podcasts. So they needed a new way to get that information. So I talked to them, we had a little chat for an hour or so. And I took the transcript of that conversation, the full transcript of the chat, the kind of demo, I guess, the product demo. I threw the transcript into Claude,
Starting point is 00:56:51 into Anthropic's Claude, 4.0 or whatever, 3.7, nobody knows. And I told it, from this transcript of a conversation, write a prompt for Lovable so that Lovable can then build a tool that shows this person exactly how our integration could work in their product. Then out came a prompt, I took that into Lovable, had it generate this thing, worked on it for I think an hour or so, just moving little things around, making them clickable, coming up with more scenarios. And then I took that link and sent it
Starting point is 00:57:26 to the guy the next morning. And I had a fully working kind of working, right? It's still kind of a click dummy thing, but it was a fully integrated data centric version of what the product could look like without data. For that stuff, for like 20 bucks a month, yes, right? Sales enablement, that's what that is. Oh my goodness, yeah, yeah, yeah.
Starting point is 00:57:51 That is very cool. Yeah, yeah, that's super cool. Oh man, I love that. Yeah, I'd say I'm very bad at design generally, so even using v0, or, I haven't tried out Lovable yet, but I should. But just to get my juices flowing on what something could look like is very helpful for me.
Starting point is 00:58:09 So I love those reasons. Yeah. Yeah, I use this all the time just for trying to figure out how to make things more consistent. The moment I have something in my code base, like with Junie here, Junie has full access to all of the Vue front-end files that I have. So if I tell it, hey, look for files
Starting point is 00:58:27 that don't look like the others, it actually does. And then it kind of streamlines them. It's so cool. Like these tools have insight into, into even concepts like, you know, white space and order and hierarchy that I don't have. I don't have the eye for that, but they do because they're trained on this data.
Starting point is 00:58:43 It's really useful. Coding with AI is not just having AI write code for you that you already know it's going to write. It's making it figure out the things that you would never think about and then implement that. Yeah. Oh, that is so cool. Okay.
Starting point is 00:58:58 I want to ask you, you mentioned you use PHP and Laravel for this one. With Feedback Panda, you used Elixir. I guess, why did you choose PHP and Laravel for this one, or did someone else choose Elixir for you with Feedback Panda? I guess, why the switch there after having a successful exit? Yeah, it kind of was chosen for me, because at the time I was working as an Elixir developer. Like when I was building Feedback Panda, I was still kind of moonlighting. That was a moonlighting project.
Starting point is 00:59:25 And my full-time job was writing an Internet of Things platform that was built on top of Elixir as a language, which also was my first job with Elixir. That was just a tech choice by the CTO of the company, who thought, yeah, we're building a very parallel thing here with IoT. Might as well use the language that is built on top of the Erlang VM, right, which was built for phone switching networks, a deep internal core system, highly parallel. So that's what I had at the time. So I just used the tech stack that I had at the time to build Feedback Panda.
Starting point is 01:00:04 And later after that, even my first SaaS post-exit, which was called Permanent Link, kind of a link forwarding tool for authors, for people who have links in books and don't want them to die when the original website goes away. It's kind of that idea. That was also built in Elixir. I wish I would have built it in PHP, because it's just so much more maintainable by AI at this point. Like, all AI systems are built on so many Stack Overflow PHP questions. They get it right. Elixir is a more rarely used language, and that makes
Starting point is 01:00:31 hiring hard. It makes maintenance a bit harder. It's also different to deploy all of that. But I think I started Podline, the voice chat thing, as a Laravel project, because in my community, in the indie hacker, bootstrapped, solo software founder community, Laravel just came up more and more. Like, everybody was talking about how PHP has a comeback now and Taylor Otwell had built this amazing thing. And not only had he built an amazing framework, and Laravel is really, really good as a framework for PHP, particularly compared to the other frameworks.
Starting point is 01:01:06 It's really good. But the whole ecosystem around it, the business ecosystem that Taylor and the team built, plus the actual user ecosystem around it, is just a very kind, friendly, and super community-centric and founder-centric community that you don't necessarily have with other languages as much. Elixir is a very technical language. It's functional programming that attracts a lot of nerds, which we all are, but a lot
Starting point is 01:01:35 of people who are purists, that's maybe the term. You don't necessarily have that with PHP. You couldn't have that with PHP, let's be real. A language like this can't have purism, because it does not exist. But I chose it because I wanted to see what the hubbub was all about. And then I built this product so easily and so quickly, I was like, okay, yeah, I'm going to keep using this. So that's how that happened. You're staying here. Yeah. Yeah. Do you think we're going to see just less, I don't want
Starting point is 01:02:01 to say like innovation, but just like new languages, new frameworks and all that stuff, given the way we're sort of building now so quickly with AI, like, do you think that's like, hey, we're going to see less innovation there because the AIs have already internalized so much PHP, Laravel, so much Node, all that sort of stuff, so much React. I'll answer this with two answers.
Starting point is 01:02:24 The real answer, like the actual answer to your question, would be yes, I think so, because it's just kind of a Lindy effect, right? Things that have been around for a long time will likely stick around for an equally long time. And new things, they have a harder barrier to entry to get even into the LLMs. So just from the way that technology has been developing
Starting point is 01:02:44 and the lag in which these things ingest new information means that the older information is likely going to be more in the outputs. And since these outputs are now generating blog posts, that's going to feed the ecosystem that then gets ingested back in. So likely. But my bigger meta theory here is that we're going to look at specific programming languages at some point over the next 10, 20 years, the same way we look at different implementation of binary code at this point. Like, we're going to...
Starting point is 01:03:11 It doesn't matter, like, what this stuff is going to be compiled down into. The language that we're going to communicate with is one that an AI coding assistant understands. Like, right now, we're coding into a compiler or an interpreter, depending on what we use, right, with PHP, or with JavaScript or C or whatever, it's always this kind of
Starting point is 01:03:28 executable that then turns that into machine code, we're going to develop a new meta language that sits on top of AI coding that then I don't know, compiles all front-end related stuff into JavaScript, because that's still in the browser, and then compiles all back-end stuff into, I don't know, Rust, like anything that the AI thinks is the best implementation for this, and then also it can run reliably. I think at some point, and it's going to freak a lot of people out, the actual programming language that something is implemented in, it's not going to matter anymore as much.
Starting point is 01:04:01 Do you think there will be a sort of meta-language like you're saying, or will that just be English? And that's... I would be surprised if it wasn't almost dialect of the English language, but I think there will be, like there has to be a transitionary path. And I don't know what it is. It might be, Markdown comes to mind, like something like Markdown, but for logic, right?
Starting point is 01:04:24 Not for documents, not for presentation, but for representation, like almost like an idea, topology of some sorts. I don't really know. Maybe it's just people rambling into a voice prompt for half an hour. Maybe that is the interface. Seems to work pretty well already, right? It's just like we're doing. Yeah. Yeah. it's true.
Starting point is 01:04:46 Yeah, yeah. Okay, one last tech question I have and then a few business questions. So real-time alerts, you know, people can put in these keywords and say, hey, alert me whenever one of these shows up. How's that running? Is that just a cron job every X minutes,
Starting point is 01:05:00 you're scanning all the new podcasts and firing off alerts, or what does the sort of infrastructure look like for that? Yeah, so like I said an hour ago, there is this popping thing where we get notified of new episodes, then we add them to the internal queue, our transcription system then fetches that from the queue, transcribes it, and sends it back to the main server. At that point, we have a full transcript, then comes another step. That's the analysis step. So we send it back to the same server, which then calls OpenAI or whatever,
Starting point is 01:05:28 gets back the structured data, adds that to the data in the database. And then when we have either received a transcript and we're like, okay, this is enough, we don't need to analyze it, let's just go. Or we have received a transcript and done the analysis step, then we complete. That's when we start scanning all the
Starting point is 01:05:46 alerts that we have in the system, which is like 3,000 or so at this point. We just go through every single alert, that's the keyword match. And then if it matches, and there's also this kind of context-aware thing, it sends things back to the API, goes to Claude or whatever, and that says yes or no. And that's when we dispatch the notification. So it's, like, real time. Every time there's a new transcript, you run it? Every time there's a new transcript, we run every single alert on it, just a scan. Interesting.
Starting point is 01:06:13 Yeah, and is that load pretty light? That's just not that much load to deal with there? That is just text scanning. It's like trying to find, it's not even regex, it's just, you know, text retrieval. Yep. I was thinking of doing the queries against Elastic, doing that with the most recent ones, but I guess if you're just doing it against the doc directly, yeah, you're just doing it in-process. Yeah. I just load the full text and I go line by line, is this in it? And if yes, then that's great.
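A stripped-down sketch of that scan step might look like the following: lowercase the fresh transcript once, check every stored keyword for plain substring presence, optionally run the context-aware AI check, and hand back the alerts that should fire. The array keys and the aiSaysRelevant() helper are hypothetical stand-ins for whatever the real models and LLM call look like.

```php
<?php
// Sketch of the per-transcript alert scan: plain substring matching, no regex,
// with an optional LLM relevance check. Keys and helpers are hypothetical.

function scanTranscriptForAlerts(string $transcript, iterable $alerts): array
{
    $haystack = mb_strtolower($transcript);
    $matched  = [];

    foreach ($alerts as $alert) {
        // Case-insensitive presence check, exactly the "is this in it?" test.
        if (mb_strpos($haystack, mb_strtolower($alert['keyword'])) === false) {
            continue;
        }

        // Context-aware alerts get a yes/no from an LLM before firing.
        if (!empty($alert['context_prompt']) && !aiSaysRelevant($alert, $transcript)) {
            continue;
        }

        $matched[] = $alert; // caller dispatches one notification per hit
    }

    return $matched;
}

// Placeholder for the Claude/OpenAI relevance call mentioned in the conversation.
function aiSaysRelevant(array $alert, string $transcript): bool
{
    return true;
}
```

At roughly 3,000 alerts per transcript, a loop like this stays cheap, which matches the point made here that the scan itself is not where the load is.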
Starting point is 01:06:41 I mean, we could also do a more semantic approach probably. And that might be a thing for future developments that we would check for, like almost, like turn it, vectorize it, do embeddings and check against the embeddings. And if there's a certain kind of similarity, but we're not there yet. It's still that we trigger by individual keywords.
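If that future embedding-based variant ever lands, the core comparison could be as simple as a cosine-similarity check like the one below. Where the vectors come from (OpenAI, a local model, per-chunk embeddings of the transcript) is left open, and the 0.8 threshold is made up; this is only a sketch of the idea, not anything Podscan ships.

```php
<?php
// Hypothetical semantic variant of the alert check: compare an embedding of the
// transcript (or a chunk of it) against each alert's keyword embedding.

function cosineSimilarity(array $a, array $b): float
{
    $dot   = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    $denominator = sqrt($normA) * sqrt($normB);

    return $denominator > 0.0 ? $dot / $denominator : 0.0;
}

/**
 * @param float[]             $transcriptEmbedding embedding of the new transcript (or chunk)
 * @param array<int, float[]> $alertEmbeddings     alert id => embedding of its keyword phrase
 * @return int[]              ids of alerts that clear the similarity threshold
 */
function semanticallyMatchedAlerts(array $transcriptEmbedding, array $alertEmbeddings, float $threshold = 0.8): array
{
    $hits = [];

    foreach ($alertEmbeddings as $alertId => $embedding) {
        if (cosineSimilarity($transcriptEmbedding, $embedding) >= $threshold) {
            $hits[] = $alertId;
        }
    }

    return $hits;
}
```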
Starting point is 01:07:01 And if the keyword is found, then that may trigger another AI step or not, but it's still very much just text matching and, like, checking for presence, pretty much. Yeah, yeah, yeah, cool. Okay, some quick business stuff before I let you go here. You know, you sold Feedback Panda, this was 2019, I believe. So I guess how has maybe indie hacking
Starting point is 01:07:24 or just the environment changed from 2019 to 2024, 2025, now that you're working on Podscan? Yeah, I've seen a shift in fatigue. There's subscription fatigue, that's a big thing. A lot of things that people would have easily paid some money for every month a couple years ago, now they're like, oh, not another thing. So it's harder to sell, particularly even low
Starting point is 01:07:46 ticket things to people. I'm operating in a space now, and I had to claw my way kind of up here, where my lowest tier price for Podscan is $200 a month. And then it's 200, 500, 2500. Those are the tiers that I have. I started with 40. Right. And it might even have been 29 at some point at the lowest tier. And you know how it is, you get customers that are extremely price sensitive, but also have extremely high expectations of how available you should be as a founder, as a customer service rep or whatever.
Starting point is 01:08:15 So I moved my way up and it's much, much nicer up here where we are right now. But it's just getting anybody to subscribe to yet another tool has become much harder, which that's like the number one problem for indie hackers is getting this initial traction. And it's harder. It's just harder to build something that people find valuable enough to pay money for. And then, of course, the AI hype train that has kind of come into our community has caused an interesting shift. Because it's not that AI can actually build these businesses easily, right?
Starting point is 01:08:49 Lovable is great, but as I said, it's not gonna build the full business. It's a click dummy. It's a prototype of something. And then you can take that maybe and turn it into something bigger and extend it and build your own business around it. But the expectation that people now have
Starting point is 01:09:05 is that we could just easily build this with AI, right? Even though they have no idea if that's actually feasible or not. So that is another barrier to entry because people are like, yeah, but I'm not gonna pay for this. Even though they don't have the capacity or experience to build these tools.
Starting point is 01:09:19 That's this weird cognitive dissonance now, that AI could solve this for me, I don't need to give you any money, which makes people dismiss solutions that they would have had no problem paying money for a couple of years ago. So that's kind of a problem that I see there. Yeah, yeah. It's like, well, AI has solved this for you, but only through all your work connecting all the AI, you know? Yeah, that's exactly right. AI is not a replacement, and it's certainly not very good at understanding the edge cases of a real customer's lived experience. When you see people building
Starting point is 01:09:54 prototypes and then building little businesses, and I don't want to belittle it, which is why I should probably choose a different word, building fledgling businesses on top of it and say, I've just built a clone for Sentry or whatever, and this is so much cheaper, you should totally use this. Sure, great, but obviously encoded within the code base of these bigger incumbents are learnings of decades
Starting point is 01:10:19 of customers coming to them with very specific problems that needed a very specific solution that is not obvious or even visible to anybody outside the company. So AI cannot necessarily encode it unless it has access to the internal Jira or Trello board of those companies, and/or the code base, and even then it might not understand where those things came from. So that's the moat that you have right now as a founder: your intricate knowledge and capacity to understand the work that the people that pay you money to solve it for them
Starting point is 01:10:51 actually have to do and how hard it is and how complicated the human aspect of this is that it's not just easily solved with bits and bytes. It also is a relational, a conversational thing more than anything else. That's the saving grace of indie hacking is that you can still be present as a founder in your early customers' lives. They might choose you for that, not for the tech necessarily, because other people can build the same tech, but only you can care. Yeah, that's a good point. That's a good point. The year, maybe like two years after selling Feedback Panda,
Starting point is 01:11:27 like did you feel at peace? Did you feel a hole? I know you've stayed busy since then. I think you've done a good job of staying busy and doing stuff. I guess, what was that experience like for you? Yeah, the exit was, it was interesting
Starting point is 01:11:42 because everybody tells you that you're gonna fall into this hole of not knowing what to do. And you don't believe them, and then you do. That's kind of my experience. I was so deeply entrenched in building a software product that when I had to give it up, hand it to somebody else, and at the same time found financial security, which was great, I still felt like half of me was gone. And I needed to kind of fill a purpose, right? It's like, where do you find your purpose? A lot of people ask this, and I didn't really know that that was where I got it from, but building things for
Starting point is 01:12:16 people that I care about, so they have a little bit of an easier day, that had become encoded in my identity at this point. And I couldn't anymore. So I had a pretty hard time for a couple of weeks, but then I found, ah, I'm just gonna talk about it, just gonna write about it. I always wanted to write, so why not start a blog? And from that came the Bootstrapped Founder blog and the Bootstrapped Founder podcast
Starting point is 01:12:36 and the Bootstrapped Founder newsletter and the YouTube channel. And it just easily cascaded into this multi-distribution-system media company. And then I wrote a book, like you very kindly mentioned Zero to Sold earlier, it's my first book that kind of shows the story of Feedback Panda and how I approach building businesses, which I still very much use as my playbook, obviously,
Starting point is 01:12:57 cause that's my experience. And in that I was like, hmm, I have so many links in the book, I wish there was a tool that, right? And that's kind of where Permanent came from, where I needed a tool to keep the links in my book active. And then Permanent Link allowed me to also have links to my podcasts and all of that. And then my podcasts got bigger.
Starting point is 01:13:16 And then I was like, oh, I wish people could send me voicemails. And I needed to keep building these tools that I needed for myself along the way. And I think that reestablished my purpose continuously, which is also why I'm still knee-deep in Podscan right now, even though I wouldn't have to. Like, I could just do whatever I want,
Starting point is 01:13:34 but it's kind of what I do. I do whatever I want, and that is building a SaaS business that is trying to do on its own what other companies have whole departments for. So it's just an enjoyable thing that also makes money and makes people happy along the way. Yeah. Yep. And you're just tackling these fun challenges along the way. Yeah. So then, what's your ideal outcome with Podscan? Like, given that you had the exit earlier and you have that success, are you just wanting to run this until it's boring to you?
Starting point is 01:14:06 Yeah, it's a fun question, because there is the reasonable answer, and that is, yeah, let's just see where it goes. But then there's always the aspirational answer, which is, yeah, Spotify acquires a lot of companies. I'm still looking at it from the founder perspective. I know that right now we have a team of three-ish, four-ish people that, other than me, are all part-time, just, you know, working on specific things in the business outside of the technical domain, sales and marketing and that stuff.
Starting point is 01:14:37 So it's still pretty much a solopreneur driven business, even though it's not a team of one. And I'm like, that does hamper the capacity of this company. And I know that with a proper team and with proper funding to build infrastructure even further, this could do much more and much faster. We are now between 5 and 15 minutes between the release of a podcast and its alert handling,
Starting point is 01:15:05 it could probably be like with just the speed of transcription tools as they are right now, under 60 seconds, like somebody could release the episode and you could start listening and they are not even done with the intro and you would already know what the sponsor in minute 65 is saying, right? Like that is what this tool could do and can already sometimes do. But with a bigger team and maybe in another context, that could also work. So I'm going to see if I can get there myself. But I'm certainly not going to say no to acquisition offers along the way, just realistic as a founder. Like if somebody thinks this is a good thing for them in
Starting point is 01:15:39 their businesses, well, just email me. It's very, very simple. Yeah, that's where I'm at. So I'm just going to see what happens. And if it doesn't work at all, if it implodes at some point because it gets so cost prohibitive that I can't afford running it, then I'm going to turn it off.
Starting point is 01:15:58 I'm going to see what the next thing is. Or I'm going to try something else. I'm trying to decouple my identity as a person from my identity as a founder, from my identity as a founder of that business. These are different layers. I can always go back to being a founder, or just being myself and not be a founder for a while. It's also fine. Yeah. Yeah, for sure. Well, Arvid, this has been awesome. I'm so glad I got to talk to
Starting point is 01:16:19 you. And yeah, this conversation did not disappoint at all. So keep doing what you're doing. Keep sharing, you know, just the process, because I just love following along every week and hearing what's going on. I guess, yeah, for people that want to find out more about you, where's the best place to go? Well, thanks so much, Alex. I appreciate you giving me this chance to just nerd out for an hour or so. That's great. If you want to find me, I'm on Twitter or X, as the cool people call it now, at Arvid Kahl, A-R-V-I-D-K-A-H-L. That's my handle there. And podscan.fm is where Podscan lives. That definitely is a place to check out. My podcast is at TBF.fm. That's The Bootstrapped Founder.
Starting point is 01:17:00 That is where I've talked to Taylor Otwell. We were talking about Laravel a couple of weeks ago and he talked to me about like Laravel Cloud and their AWS deployment and all that. It was very interesting too. So I just, that's what I get to do as well, right? Like since I don't have a job, really, I just get to hang out with all my nerd friends and all the other tech people
Starting point is 01:17:18 and I just get to chat with them and then build my own software on their platforms. It's bizarre. It's what a life. I'm so fortunate to be part of this. It's true. My wife always jokes about all my internet friends, right? Cause it's like, I'm in the middle of Nebraska.
Starting point is 01:17:32 I don't have like a lot of tech people here, but then like I talked to all these just different people on the internet and get to see them at conferences once in a while. So that's, that's right. Yeah. Same for me. Like outside of this window, there's probably some cattle running around
Starting point is 01:17:42 cause we're just in the country. Yet I get to talk to you and to this community of wonderful people. I feel very blessed. I'm very grateful for the life that I have. It really is. It's a great time to be alive. Yep, for sure. All right. Thanks for coming on. Really appreciate it. All right. Well, thanks so much.
