Software Huddle - Lessons from Transcribing and Indexing 3.5 Million Podcasts with Arvid Kahl
Episode Date: July 8, 2025

Big-time guest today as Arvid Kahl joins us. Arvid is my favorite type of guest -- a deeply technical founder who can talk about both the technical and business challenges of a startup. Lots to enjoy in this episode. Arvid is known as the Bootstrapped Founder and has documented his path to selling Feedback Panda back in 2019. He's now building Podscan and sharing his journey as he goes. Podscan is a fascinating project: it's making the content of *every* podcast episode around the world fully searchable. He currently has 3.5 million episodes transcribed and adds another 30,000 - 50,000 episodes every day. This involves a ton of technical challenges, including how to get the best transcription results from the latest models, whether you should use APIs from public providers or run your own LLMs, and how to efficiently provide full-text search across terabytes of transcription data. Arvid shares the lessons he's learned and the various strategies he's tried over the years. But there are also unique business challenges. For most technical businesses, your infrastructure costs grow in line with your customers: more customers == more data == more servers. With Podscan, Arvid has to index the entire podcast ecosystem regardless of his customers. This means a lot of upfront investment as he looks to grow his customer base. Arvid tells us how he's optimized his infrastructure to account for this unique challenge.
Transcript
and now we are just scanning the whole podcasting ecosystem every single day,
50,000 shows, transcribing all of them, analyzing them.
It's pretty wild.
The two areas I want to dive in a lot are the transcription work.
So once you download that, like actually parsing out the transcription
and summarizing and all that sort of stuff, and then the search infrastructure.
For the longest time, all my transcription actually ran locally
on this very machine
that I'm talking to you about.
That's cool.
For the business.
I had this thing, and when there was a power outage,
I had a problem.
Where are you at right now with search?
Are you using Elasticsearch now?
Kind of, I'm using OpenSearch now.
Okay, that counts, yeah.
Right, it's kind of that, yeah.
Yep, that counts.
Well, because I thought,
were you Meilisearch at one point?
Yes.
Why did you choose PHP Laravel in this one?
Or did someone else choose Alexa for you
with Feedback Panda?
I guess like why the switch there
after having a successful exit.
Sup everybody, super excited to have Arvid Kahl
on the show today.
You might know him as the Bootstrapped Founder.
He sold Feedback Panda, he's running Podscan,
and it's just like a really interesting combination
of technical and business stuff,
like the stuff he's doing with Podscan
where he's downloading and transcribing
and making searchable the entire podcast universe
every single day.
It's just really fascinating.
We talk about some tips he's learned
from doing all the AI transcription stuff,
running his own, running the search infrastructure for that
and just managing all that stuff
and just all these lessons he's learned along the way.
I followed Arvid for a long time
and really excited to get the chance to talk with him.
It did not disappoint, he's a fun guy.
So check it out, you know,
and if you have any questions for us,
if you have guests you want us to have on, anything like that,
feel free to reach out to me or to Sean.
And with that, let's get to the show.
Arvid, welcome to the show.
Thanks so much for having me on.
I'm excited.
Yeah, I am super excited to have you here.
We were talking before we started,
and just like you're one of the most interesting people
to follow on Twitter, just like documenting your journey,
not only selling Feedback Panda,
but now the cool stuff you're doing with Podscan
and just like keeping us updated
with like all the challenges, the business challenges,
the technical challenges,
all the stuff you're learning and stuff is very cool.
You're an awesome Twitter follow and all that stuff.
But I guess for people that don't know as much about you
or about Podscan, can you just give us an intro to yourself?
Sure, so thanks for the kind words, really appreciate it.
And I think building in public has been a big part
of my journey over the last half decade or so
that I've been more publicly available.
I've been a developer since I had a computer, at like 14 or something.
That's when it started.
And then I just always wanted to build my own things, which I did.
Then I realized, yeah, I can actually make money off of this.
So I built my own businesses as well.
Feedback Panda was probably the biggest thing I've done around 2019.
When it was a two-year-old business, we sold it, and that was a big move.
Put myself more in the public eye as well with all my founder peers and stuff.
And ever since then, I've been building software businesses and a media business, a blog, YouTube and podcasts and all of that, in public, in front of people, just trying to share the journey. Because I feel that where I learned the most from other people is when they shared
their challenges and how they overcame them, or how they struggled, all of that.
So I've been trying to do the very same thing for others.
Podscan is a result of this.
I was just building my own media
business, my little podcast empire, and I was like, oh man, I wish I could take in
voice notes from my listeners, kind of like voicemail. So I built a little
SaaS tool for that. And I was like, cool, but nobody is buying
this. I want to make a business out of this. How do I find
people who want voicemail on their podcasts? I was like, hmm,
there are all these social monitoring tools for Twitter for
Facebook, where you can kind of, like Google Alerts, really
track a name, track a brand or whatever. This didn't seem to
exist for podcasts, and I needed it for my own tool. And then I very quickly realized that I shouldn't just build
this as a little marketing stunt. This is a business all in itself, like figuring out who talks
about your names, your brands, your products, or anything you're interested in really, on podcasts
all over the world. So that's where Podscan started. And that's how a year and a half ago
that I had this idea and implemented it.
And now we are just scanning the whole podcasting ecosystem
every single day, 50,000 shows,
transcribing all of them, analyzing them.
It's pretty wild.
And I'm super, super happy to be able to share that journey
because it's been up and downs, lots of downs,
like many entrepreneurial things, couple of ups.
And it's been really cool to just be,
have that be part of, you know,
a community of people that I admire.
Got a lot of good help there too.
Yeah, for sure.
Yeah, and it's, again,
it's really been so fun to watch the journey.
Like I remember buying and reading Zero to Sold, your book,
and that was fun and just like all the stuff you've done.
But then like with Podscan, it's been just another level of like, just seeing you talk through
all the technical stuff. And I love it when I can get someone on here who's like very technical,
like you can talk about that, but I can also talk about the business challenges and like,
you know, finding product market fit and finding customers and marketing and all that,
that sort of stuff, which is like very fun. So that's great.
Yeah, you went through how you got the idea for Podscan.
I guess at a high level, just to tell people,
you are ingesting the entire podcast universe
every single day.
I looked on your website and it says 35,000 new episodes a day.
Basically, every single episode that comes out,
you pull it down, you automatically transcribe it,
make it searchable for people and alerts and all these sorts of things.
Yeah, it's one of the weirder things about this particular software as a service business is that
the scale that I operate at has nothing to do with how many customers I have.
Most software businesses, they scale with the people using the product,
but it really doesn't matter to me if I have 10 customers who want to track
all podcasts everywhere,
or if I have 100,000 customers
who want to track all of these things.
Obviously there is a scalability there somewhere,
but it by far is just outsized
by how many shows are released every day,
which is kind of static if you think about it, right?
You said 35, sometimes it's 50, depending on the day.
Mondays are more, weekends are less, right?
There's always a very clear
number of shows in any given day that's being released because
people may start a lot of podcasts, but not a lot of
people keep doing their podcast. So it's a fairly static number,
3.8 million shows all over the world, which is lots, but not
all of them are active. And not all of them are always active or
active on a weekly cadence, right?
There's shows that are daily and so on. So it's a lot of data, but once you figure out how to deal with this,
I had to build an infrastructure for parsing all these RSS feeds because the podcast ecosystem
fortunately is built on top of open standards.
So the RSS feed, every podcast has to have one, even if it's also hosted on Spotify or
Apple podcasts, those kind of more walled garden systems, there's always some kind of
podcast hosting company out there that hosts the original data source. And we fetch that,
download the MP3, and I have a fleet of GPU-based servers that transcribe. We can dive
into the details.
I hope you do. Yes, please.
How this stuff works is that, effectively,
24/7, all these servers, dozens of them, are
just churning through.
They get a URL, they download the file if they can, because that's also bot blocking
and all of that shenanigans happening.
They try to transcribe it in whatever language it's in.
They try to get timestamped lines; SRT-format kind of subtitle files come out of it, or
even word level timestamps in those podcasts as well.
There's a lot of data being collected that gets sent back into my database,
and then I have a whole other system of AI-based data extraction that
figures out who's a host, who's a guest, who's the sponsor,
what are the main topics and all of that,
because that's what my customers really want to know.
For Podscan, that would be PR companies, agencies, marketing departments, like people who really
have a vested interest in knowing who is talking about their product, their boss, their brand,
right?
Who is shit-talking that airline that works with them, or that they work for.
It's all of that kind of stuff with different levels of urgency that people need to know
something about.
So we just ingest everything to make sure that we catch everything.
Yep.
Yep.
And even on that level, how do you find everything?
I'm aware of the RSS, a little bit of how it works on Open Standards, but are there
four or five hosting providers that have just a list of all the podcasts they host?
Or how do you find those 3.8, 3.9 million podcasts out there?
So I'm just going to give away all the secrets that I have that are very easily Googleable.
And probably if you were to ask a chat GPT or a Claude something about this,
they would guide you into that very direction as well.
In the beginning, that tech did not exist.
The whole AI world kind of happened in parallel
with the development of Podscan. I started in April '24, and the tech was kind of there, but
not really. At least not Cursor; that stuff didn't exist. So I had to do a lot of things
manually. And when I started, I also had this big question of, so where are all these shows? I didn't
know. But as anything for any developer building a business today, I
just looked into the open source world. And I found what is called
the Podcast Index, which is kind of a community-operated
database of all the podcasts everywhere, that also has ties to
the people who work on Podping, which is, I think, kind of a
blockchain-based pub/sub system
where podcasts can say, I have a new episode
that is just out and anybody subscribing to that feed
then gets kind of a notification.
So they, you know, a player or another server
that wants to do stuff with it can pull that episode
and do whatever with it.
So that's called Podping and that's integrated
into the big world of the podcast index.
And they have an API, which is great, because you can just pull the most recently released episodes,
the most trending episodes, but they also, and that's the most fortunate part of all,
have a four-gigabyte, I think, SQLite export, refreshed every week, of all the RSS feeds of all podcasts everywhere.
So if you built the infrastructure
to handle this, which is a whole other topic,
you can just take the SQLite file,
parse it into your system, and all of a sudden,
you have 4 million feeds to check, or 3.8.
And you just check those feeds every end minute,
all 4 million of them?
You should see the wondrous architecture behind that.
But yeah, I figured out that Podping is pretty well established
among most podcast hosting providers.
Think of Transistor or Acast or those kind of providers that exist.
So they integrate this automatically for most of their shows.
So the bigger providers having this integrated means that most podcasts will tell me when they have a new episode,
but then some don't.
And for those, I just have a couple of machines on AWS
and on Hetzner, like being a German,
I obviously needed to take the very German hosting
alternative too.
And that's also where I host all my GPU-based servers,
because that's the only way for me to make this profitable in any way.
I'll get to that.
But those couple servers, three or four exist,
and each of them, with a certain offset,
checks every single feed at least once daily,
prioritizing the bigger feeds of the bigger shows,
where I know there's a couple of thousands, a couple of hundred thousand listeners,
they get checked every six hours, every four hours, depending on that. But yes, there's a lot of
fetching RSS feeds, which taught me a lot about ETag headers and about caching headers and
these kinds of things, just telling me, okay, we don't have anything. Don't pull the full
10 megabyte RSS feed like every couple hours, please.
Which is fine if you do it once in a while, but if you do this for every single podcast
hosted on that provider, it becomes a problem.
So there was a technical challenge in not overwhelming them with my requests, which
I didn't want to do because I love the podcasting ecosystem and I didn't want to be a burden
on it.
But you also want to be up to date and you want to be able to fetch the right information
at the right time.
So that was an interesting balance to strike.
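For anyone curious what that looks like in practice, here is a minimal sketch of a conditional feed fetch in a Laravel-style app. The HTTP headers are standard; the $feed object and its columns are illustrative names, not Podscan's actual code.

    <?php
    use Illuminate\Support\Facades\Http;

    // Poll one RSS feed, but only download the body if it actually changed.
    // $feed is assumed to be a model with url, etag, and last_modified columns.
    function pollFeed($feed): ?string
    {
        $response = Http::withHeaders(array_filter([
            'User-Agent'        => 'Podscan feed fetcher (bot)', // identify yourself to the host
            'If-None-Match'     => $feed->etag,                  // ETag from the previous fetch
            'If-Modified-Since' => $feed->last_modified,         // Last-Modified from the previous fetch
        ]))->get($feed->url);

        if ($response->status() === 304) {
            return null; // unchanged: the host never had to send the 10 MB feed again
        }

        // Remember the validators for the next poll.
        $feed->etag          = $response->header('ETag');
        $feed->last_modified = $response->header('Last-Modified');
        $feed->save();

        return $response->body(); // full RSS XML, ready to be parsed for new episodes
    }

A 304 response costs the host almost nothing, which is the whole point of sending those validator headers on every poll.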
And did that come out of discussion?
Did you talk with Justin at Transistor or someone like that?
Or did you just figure that out as you're
working through this yourself?
Were there actual conversations with them on how to do that?
Or how did that go through?
Yeah.
Justin is an example.
Talked to him a lot.
Talked to Greg over at CastOS as well.
A lot of the podcast
hosting providers are also kind of in the indie hacker, bootstrapper community. I very
easily approached them with this and had good conversations. But again, it's all open source,
so there's a lot of documentation around it. Podcasting 2.0 is kind of a standard format
that also explains how certain things should be done and how certain
API should be accessed. And, you know, like there's a lot of good docs out there.
And when I started actually downloading those files, one of these larger companies reached out
to me because they saw I had put the name of Podscan in the user agent, which was, you know,
important to just tell them, hey, I'm a bot.
And they were like, hey, cool, you already put your name in there. Here's a couple more resources
on how this should work so we can kind of discard your downloads from the reporting that they
have to do for marketing reasons. My podcast downloads should not count toward an ad being played
or something. So there's a whole organization there to make sure that podcast ad plays count as actual ad plays.
Yeah, there was a lot of outreach towards me,
all very benign.
There never was stop what you're doing.
There always was, hey, this is how we do it right.
So it's a pretty good community.
Again, building open standards.
There's a lot of like willingness to give
in this community for sure.
Yeah, yeah. Very cool.
Okay, man. I was excited to talk to you because I'm like,
there are some interesting technical challenges,
and I didn't even think of this one.
I'm just finding all these podcasts
and downloading them efficiently and all that sort of stuff.
We're already getting started here.
But the two areas I want to dive in a lot are the transcription work.
So once you download that,
actually parsing out the transcription,
then summarizing and all that stuff,
and then the search infrastructure.
You already put it really well,
the scale is independent of your customers,
which is an interesting problem that not a lot of people have.
It's like all this creation is happening apart from
your customers that you have to ingest regardless of that,
which really blew up your probably cost and infrastructure right away.
So I want to talk about that stuff.
I guess let's start with transcription.
I guess walk me through, like OpenAI has whisper models and things like that.
I guess the one thing I hear people, like all these AI companies allow you to do transcriptions
via API, but a lot of people say,
hey, once you get to big scale, like those are expensive.
You said you run your own GPU fleet.
I guess like maybe walk me through that infrastructure
of like, when, how are you transcribing these episodes?
Let me even walk a little bit further back
because the business that I built originally,
the whole voicemail thing, right?
That's kind of what started the whole journey
into what eventually turned into Podscan.
That business needed transcription too,
obviously, right?
Voicemail, you want to take in the audio,
you want to send over the audio
so people can put that into their podcast episode
and then like answer a question or play it to show
that people talk about their show.
But I also wanted to send an email over,
hey, you got a voicemail, this is what they say.
So I needed to figure out how to take a 20,
30 second fragment of an MP3 or something
and transcribe it effectively.
So that's when I started, and that was,
must've been 2018, no, that's not right,
2020 or '21, I don't know when it happened.
It was a couple of years ago
that I just started playing with this.
But pre-ChatGPT?
It's pre-ChatGPT, but it was not pre-Whisper,
because that's when I kind of looked into Whisper for the very first time.
So Whisper, for anybody who is not aware of this,
is a freely available transcription model, like a speech-to-text model,
that OpenAI has been offering for quite a while.
It's, I think, one of their earlier machine learning or, I guess,
it would almost be machine translation, although it's not, models. It runs,
depending on how you use it, on a GPU, obviously, right?
You can use it in any kind of GPU context, but it also runs on the CPU.
If you use tooling like,
what is it called, Whisper.cpp, which is just like Llama.cpp, kind of the attempt to bring
the not so large models, I guess smaller language models or medium-sized ones, quantized models
most of the time, because if you try to get the full model on a normal GPU, it wouldn't
work because it's just so sizable.
It's the same with Llama and all of these things.
But even just for the speech-to-text ones, they have really good versions of these models
that you can run on a CPU.
And I tried that out on my little MacBook back in the day, and I was amazed.
It took a couple seconds for like a minute's worth of audio, which nowadays is almost a joke. But I mean, if that runs
on a regular server that doesn't even need to have a GPU, and you
don't have like 100 customers trying to do this at the same
time, then that's perfectly fine. Right? That's a background
process. That's some kind of queue me up and send me the
result later. And then I dispatched the email in a couple
minutes kind of work. So that's what the original contact for me was with transcription. I was like, this is amazing. I mean,
these models, they have different models of different kind of quality. There's a tiny model,
a small, a medium, a large, and they all obviously are bigger and better or smaller and worse and
smaller and faster and bigger and slower.
So there's a trade-off there.
And when I started with Podscan, where I wanted to actually get the stuff transcribed on the
GPU because of the scale that we mentioned, the large model was extremely slow compared
to the tiny one.
The tiny one was, I think, 24 times as fast as the large model. So transcribing an hour
could take you a couple seconds or it could take you 15 minutes or even more, right? Depending on
the quality that you want. So I had to build almost an infrastructure because when I was
beginning, I didn't have any funding, didn't have any money. I just wanted to see how this works.
I had a Mac Studio with, I think it's an M1 Mac Studio. It's not the latest and greatest now,
but for the longest time, all my transcription actually ran locally on this very machine that
I'm talking to you about. I had this thing, and when there was a power outage, I had a problem.
It was wild. But yeah, most of the transcription happened on this machine, then I bought a Mac Mini
because I needed something for my studio computer
where I record and all of that stuff.
And I figured out, hey, if this thing is sitting idle,
this could do maybe 20% of what my Mac Studio can do,
but that's 20 additional percent
on top of the transcription there.
So that one was running 24 seven as well.
And then I tried to find alternative ways
because I knew that obviously,
can't just transcribe this stuff
on my local computer anymore.
So I looked into what does AWS offer
in terms of like GPU compute.
And that was quite expensive because, you know,
AWS has a certain kind of premium
for a certain kind of extra service
that other providers don't offer, or don't
offer as reliably.
So I looked into others.
Google didn't have anything back in the day
that was accessible to me, so that didn't happen. And there were things like, what is it
it called? Ori was one and Lambda Labs was one that was
really cool. Like they had kind of hosted instances too. And I
think on Lambda Labs, on A10 graphics cards, was the first
time that I started using transcription on other servers,
someone else's. Yeah, just externalized it all, like a VPS.
So I'm a Laravel developer,
at least that's what I've been over the last couple of years.
So I'm just running Laravel applications that
call FFmpeg in the background, or
that call like a binary for transcription.
I'll get into that in a second.
So I just set them up via SSH,
and they were just running on these things and they
were talking back and forth to my API.
It's still what it is right now. The level of orchestration in my business is,
let's call it horrible, but it works, because it's just such a mishmash of all these different
things that came from the initial locally running thing. But yeah, transcription
right now runs through a tool that is called whisper-ctranslate2, for anybody who's interested in that. But there are many different kinds of binaries
that you can install on an Ubuntu server or on a Mac,
for that matter, that run Whisper in some capacity.
There's WhisperX, there's just the original Whisper,
I think, that tries to run it on a CUDA infrastructure.
But I found Whisper-
Yeah, why did you choose the CTranslate2 one-
Because one of the things that I wanted for Podscan
was diarization, like figuring out who is speaking.
That's kind of what this is, right?
Like speaker one, speaker two,
that kind of stuff doesn't come for free.
It actually is a quite computationally intensive act to do.
Like you, just from me trying to understand it, right?
You take the waveform of the conversation
and then what a diarizer does is look at different parts of the waveform and see how they
structurally differ. And if the waveform looks kind of like this, probably speaker one, if it's a
little bit louder or like certainly shifted in a certain way, that's speaker two, this is simplified.
But that's kind of how diarization works. So it has to go through the whole audio file
and check for just slight differences in when people speak,
and then there's like a dead-sound detection in there. And
all of that is computationally intensive. In
fact, it is so computationally intensive that it takes almost
as long, if not longer, to diarize a podcast or an audio file
(all of them are podcasts for me,
so that's why I'm defaulting to this)
as it would take to transcribe it.
With Whisper's latest and greatest,
what is it, the large-v3 turbo model,
which is like a thing that they put out halfway through
me building on the normal large model.
They released a turbo model,
which was like 12 times as fast,
which immediately 12x'd my throughput
in terms of transcription, which was wonderful,
but I digress.
So to be able to-
And similar quality for that?
Almost the exact same.
Like they figured out, I think, the word error rate
went up by 1% or something, which can be significant,
but not for podcast transcripts
that then go through an AI step. That's fine, right?
So to be able to diarize,
I needed a tool that had diarization built in,
or at least had access to diarization,
because some of those Whisper tools did not.
People just want transcriptions.
They don't necessarily need who's talking and when.
It was important for me and my customers, for some shows,
but not for everybody using the tools.
And CTranslate2,
which in itself is a completely different tool,
is meant to be a translation engine,
with Whisper turned into a translation
and transcription tool on top of it.
So for them, it also mattered who was speaking.
So they had a diarization thing built in,
which is called pyannote, or 'pyannoté' if you speak French
like the people who built this tool.
I chatted with those folks too,
because once they noticed that I'm using this to diarize,
and again, build it in public, right?
I shared it on Twitter and all kinds of places.
That gets to people out of the woodwork
that built these kinds of projects.
So they reached out, we had a chat,
it was a lot of fun just to see where they are going.
They wanted to build an API for this
and everything was an interesting,
almost business-focused conversation,
not even technical conversation at that point.
But yeah, that pyannote tool is exactly this.
It's a diarization tool,
and then that interacts with the transcription tool,
and that allows every single timestamp
to have the speaker label in it as well.
And is that in a single pass,
or does it do like one pass and spit out a transcript,
and then run through again,
and do the diarization and match those up?
Or is it just a single pass?
No, it's the other way around.
It diarizes first because it needs to know
where the different fragments are.
And then that gets fed into it,
not even the transcription process,
but the, what is it called?
The timestamp assembly process
that happens throughout those steps.
It transcribes, here are the words,
and it's like, okay, this is when the timestamp is.
Who was talking then?
Then it pulls in the speaker label.
It's really just a string formatting situation,
but it pulls that data in later.
That's still one pass with the tool,
but obviously it's an individual step in
a multi-step chain of many different things.
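To make that "string formatting situation" concrete, here is a rough sketch of matching transcript segments to diarization turns by time overlap. The array shapes are assumptions for illustration; the real output of whisper-ctranslate2 and pyannote is richer than this.

    <?php
    // $turns: diarization output, e.g. [['start' => 0.0, 'end' => 12.4, 'speaker' => 'SPEAKER_00'], ...]
    // $segments: transcription output, e.g. [['start' => 0.5, 'end' => 6.1, 'text' => 'Welcome to the show.'], ...]
    function labelSpeakers(array $segments, array $turns): array
    {
        foreach ($segments as &$segment) {
            $bestSpeaker = 'UNKNOWN';
            $bestOverlap = 0.0;

            foreach ($turns as $turn) {
                // How much of this transcript segment falls inside this speaker turn?
                $overlap = min($segment['end'], $turn['end']) - max($segment['start'], $turn['start']);
                if ($overlap > $bestOverlap) {
                    $bestOverlap = $overlap;
                    $bestSpeaker = $turn['speaker'];
                }
            }

            $segment['speaker'] = $bestSpeaker; // each SRT line now carries a speaker label
        }

        return $segments;
    }

Word-level timestamps let you go finer than this and split a line when two speakers share it.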
Yeah, yeah, interesting.
You mentioned starting with these short voicemail
recordings, and now, of course, you have hour-long,
two-hour-long, who knows how long podcasts.
Is there a difference in quality between those two?
Like, is it able to manage a much larger file just as well
as it is a shorter one?
It takes longer to do?
Or is there like a degradation in quality?
Funny enough, that depends on how many of these things you run in parallel
on the same GPU.
That's been my experience.
So one of the things that I was focusing on very early was
maximizing efficiency, because if I'm going to pay like $300, which is I think roughly
what I pay for a GPU-based server on Hetzner, just saying, right? Like that's pricing you will find
nowhere else. I mean, those things sometimes just explode apparently, but that's fine, right? Like,
obviously that's fine for you. It's stateless. Yeah. Yeah. Right. And there are
migration paths. And if you build it well, you could just redeploy. So if I pay that much money,
then I really need to use this GPU 24/7. So I tried from the
beginning to figure out the optimal
parallelizability of this, like how many of these things can I run in
parallel, and I found that anything more than four tends to degrade performance at
the later points in conversations, like 30, 40 minutes in. And that might well just have been
that particular version of the model on this particular kind of GPU, but I've seen these
things and I've kind of reduced it ever since. I used to run like 12 or 15 in parallel because I could.
I had an H100 once. That was great.
Like running it on that kind of thing, it's a total waste.
Honestly, that's another learning.
And I'm just going to throw everything I learned out here because those things.
This is great. This is awesome. Yeah.
I'm going to talk about them as randomly as they appear to me,
like in the process of building the business.
But if you're doing transcription on high-end GPUs, you're wasting the full potential of the GPU. Like, even if you're
running them in parallel, for some reason, running like a Whisper large model, even the large, large
one, not just the Turbo, if you run one or two or 20 on an H100,
the throughput of all of these together
is probably going to be the same.
It is pretty wild.
It's like you might as well use a really old, weird GPU
that is a bit slower, but much, much cheaper for you
to run these things on.
But they're not using the whole, I don't know, VRAM of the thing,
they're not using the full
memory space. I'm grasping for straws. I don't know what they're not using, but they're just
not using it. So for transcription, they're not getting any faster. They're kind of bound by some
internal bus at that point. That doesn't change. So it's not very parallelizable. So I stopped
these $1,200 a month rentals very quickly
after I noticed that I could just buy four full servers that would do way more than I could ever
run on this thing. And yeah, that's my learning here too. But there's a lot of that.
Yep. And that's surprising to me that you found the quality degraded running in parallel.
I would have just assumed they would have gone slower if you're overloading it.
But it's actually the quality, which is going to be a harder thing to sort of pick up and notice for a little while.
Yeah.
It was mostly when I was starting to hit the 80, 90 percent of VRAM on that particular GPU,
I think it must have something to do
with memory management or something in there, right?
Like the parallelizable things,
they are only that parallelizable
because there's always context switching in that.
And who knows exactly what magic happens behind the scenes,
but it did impede the productivity of those systems.
So I found a working set.
Do you, you're transcribing so many of these per day.
Do you have any prioritization based on like,
popularity of the podcast and things like that?
You do, okay.
So.
Yeah, I had to build this.
Like I have several queues internally,
where, well, it should be PubSub, but it's not, because it isn't.
It just happens to be an API-based thing
where my clients, like the transcriber API servers,
they just go to an API, fetch a thing,
and then they come back when they need a new one.
All of that queuing is handled internally there.
So it's just a big old redis with multiple queues with IDs, right?
That's pretty much what it is.
So I have three queues.
There's a high priority, a medium priority,
and a low priority queue.
And then there's a fourth one,
which is a do this immediately queue
that skips all the other queues.
Because sometimes one of my customers
needs this transcript right now,
and I don't want to queue it with the other ones
that, you know, maybe Joe Rogan just released something and he would get the first spot because he has one of the biggest shows. I need to even get
in front of Joe Rogan in the queue. So I have this kind of immediacy queue as well.
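A minimal sketch of that pull model, assuming a Laravel app and the Redis facade; the queue names are illustrative, and the real dispatch logic does more than this.

    <?php
    use Illuminate\Support\Facades\Redis;

    // Transcriber servers call an API endpoint that hands out the next episode ID,
    // always draining the more urgent queues first.
    function nextEpisodeId(): ?string
    {
        foreach (['queue:immediate', 'queue:high', 'queue:medium', 'queue:low'] as $queue) {
            $id = Redis::lpop($queue);
            if ($id !== null && $id !== false) {
                return $id; // the worker downloads and transcribes this episode next
            }
        }

        return null; // nothing to do; the worker comes back and asks again later
    }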
How does something get triggered? Like is that a customer reaching out to you and saying they need
it or are they allowed to like click a button in the thing saying, hey, I want this right now?
Yes and yes. So both of these. So I'm using a lot of internal signals in Podscan, because it's kind of hard to figure
out which podcasts actually are popular and useful.
It's easy to figure out who's popular.
You just need to scrape their Apple podcast page and count the number of reviews, or you
need to check their podcast ranking somewhere.
You know, popularity is something that I figured out quickly,
lots of scraping, let me tell you that.
But usefulness is a different thing because again,
useful to my customers is very different than useful to
the full humanity who wants to listen to shows.
I needed to build internal signals into
Podscan that signal to me that somebody finds
this podcast useful
now and maybe also in the future.
So, there's a lot of social mention tracking on Podscan, that's one
of the big features, right?
It's kind of Google alerts for podcasts.
Somebody mentions my name, your name, you get a notification.
So whenever this happens to somebody who's a paid customer, for example, that
is a signal to me that this podcast from which this mention originates
probably should get at least a medium priority in the future.
If then somebody clicks on that mention, goes to the part of Podscan where they can look into
the transcript and they actually stay there for 30 seconds, okay, this is a high priority candidate
now because that actually is a reasonable assumption that this might be useful in the future.
And there's like 20 of these spread out
all over the system.
Somebody searches for something and clicks on it, right?
I'm just collecting a lot of signals
to have my own internal priorities.
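In code, a signal like that can be as simple as bumping a podcast's queue priority when a meaningful event comes in. The event names and mapping below are made up to illustrate the idea, not Podscan's actual signals.

    <?php
    // Map an internal signal to a queue priority, and only ever promote, never demote.
    function promotePriority(string $current, string $event): string
    {
        $rank = ['low' => 0, 'medium' => 1, 'high' => 2];

        $target = match ($event) {
            'mention_hit_for_paying_customer' => 'medium', // an alert fired for a paid customer
            'transcript_viewed_30_seconds'    => 'high',   // someone actually read the transcript
            'search_result_clicked'           => 'medium',
            default                           => 'low',
        };

        return $rank[$target] > $rank[$current] ? $target : $current;
    }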
That's a good signal.
Yeah.
That's very cool.
So I guess, just building with AI systems,
you do a lot with the prompt, it's so
important, and getting the right context in there.
Is that important when doing transcription?
Are you building out some sort of prompt and
context to go with the transcription?
Is the transcription context specific?
Like, hey, this is a tech podcast and you
sort of indicate that in some way,
or is that less so in transcription
where you're not doing as much context engineering
or whatever you wanna say there?
I wish it was.
From the beginning, that would have been something
that I really, really have liked to exist,
mostly because brand names are hard.
Just imagine Spectrum or Charter, like an ISP company or like a cable company,
those kinds of things.
Companies with names that are also actual words
that other people use in some other context,
that is the easy part.
But now let's look at like Feedly or Feedlyly or whatever, right? Things that don't really exist in human
language. How are they going to be transcribed? First off,
they're going to be mistranscribed, like to begin
with, because the thing is going to associate a different word
with it. And then, well, do I now have to put an alert on the
word that is not the word that people are looking for? It's
horrible. Context is really, really hard. So the problem is that Whisper, just as the model that it is,
it has a context string you can attach to it,
where you can give it almost a dictionary of words
that you know that are going to be in there.
But you have to know that they're in there.
So I can't just give it a dictionary
of all the interesting words that I want to find because I did that
in the beginning and then it found them in all the wrong places, right? Because
that's the thing with context. The moment you give an LLM of any sort and I would
assume that these things are kind of LLMs, really small and different, but
they are. The moment you give them that, they find it because there are these
kind of people pleasing mechanisms built into them, right?
So that does not exist.
What I do at this point is I feed the title of the episode
and the name of the podcast as context
into the episode itself.
Sometimes if I have it, the names of the people
that have been on the show previously,
I give a little context with actual verbs and nouns
and names that exist.
But I wish I could give it semantic context.
This is a tech show.
So look for that. Doesn't exist yet.
I wonder if OpenAI is ever going to work more on Whisper as it is,
or even if they are,
if they're going to open source it,
which I don't think they are,
but we have what we have.
But that context layer needs to happen afterwards in an actual text based LLM step if needed.
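Concretely, that per-episode context can be handed to the transcriber as an initial prompt. Here is a sketch of shelling out to a Whisper-style CLI from PHP; the flag names follow the common Whisper CLI conventions but vary between front ends, so check the binary you actually run, and all the metadata variables are illustrative.

    <?php
    // Illustrative metadata; in practice this comes from the RSS feed entry.
    $episodeTitle = 'Lessons from Transcribing 3.5 Million Podcasts';
    $podcastName  = 'Software Huddle';
    $knownPeople  = ['Arvid Kahl'];
    $audioPath    = '/tmp/episode.mp3';

    // Short context string built only from words we already trust to be correct.
    $context = sprintf(
        '%s. From the podcast %s. With %s.',
        $episodeTitle,
        $podcastName,
        implode(', ', $knownPeople)
    );

    // --model, --output_format and --initial_prompt mirror the upstream Whisper CLI;
    // verify the exact options of whisper-ctranslate2 (or whichever tool) before relying on them.
    $command = sprintf(
        'whisper-ctranslate2 %s --model large-v3 --output_format srt --initial_prompt %s',
        escapeshellarg($audioPath),
        escapeshellarg($context)
    );

    shell_exec($command); // the tool writes its transcript files (e.g. .srt) to its output directory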
Yep.
Yep.
And so do you take that original transcript and then feed that through an LLM with more
context to like clean up that transcript?
Or like what happens between that initial transcription and diarization and putting
it into your search system?
Not much.
So I thought about it.
Obviously, there are experiments that I
run with fitting that thing right into GPT
or right into Claude or whatever.
But it is so expensive.
It is so expensive to run.
Like, there's 70,000 or so podcast episodes released
over the planet every day,
and half of them are just people reading from the Bible
or the Quran or the Torah.
So I kinda dismiss them as relevant
for my marketing or PR people,
so I don't transcribe all of these; I selectively don't.
But the other 50,000 or so that come into the system,
if I were to take every single transcript of these
and send that like full context into a chat GPT,
it would be okay if I just have to answer a question
in there, right?
Like, one of the features of Podscan is,
whenever you have an alert set up, you can look for keywords,
but you can also say, well, if a keyword is found,
let me ask this
question of the AI, like, does this podcast episode talk about flowers, as well as the thing
that was mentioned? Or is there more than two or three people in this show? Because I only want
alerts for like, lots of people on podcasts or whatever, right? Whatever question you might have,
you can put into an alert, and it then takes the full transcript, puts it into a GPT and asks this question,
you get a yes or no back.
And judging from that, I sent the alert or I don't.
That's like one of the most popular features of Podscan,
is this kind of context-aware alert,
because you can ask any question, right?
Is this a good candidate for this kind of tool?
And if GPT thinks sure,
then you get that as a notification.
You can build a lot of really cool stuff on top of this.
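Here is a rough sketch of that check, calling the OpenAI chat completions endpoint directly from PHP. The model name and prompt wording are placeholders, and the production feature certainly wraps this in retries and error handling.

    <?php
    // Ask one yes/no question about a full transcript. This only runs after a keyword
    // alert already matched, and the answer is a single token, which keeps it cheap.
    function alertQuestionIsTrue(string $transcript, string $question): bool
    {
        $payload = [
            'model'      => 'gpt-4o-mini', // placeholder; use whatever model fits
            'messages'   => [
                ['role' => 'system', 'content' => 'Answer strictly with "yes" or "no".'],
                ['role' => 'user',   'content' => $question . "\n\nTranscript:\n" . $transcript],
            ],
            'max_tokens' => 1,
        ];

        $ch = curl_init('https://api.openai.com/v1/chat/completions');
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST           => true,
            CURLOPT_HTTPHEADER     => [
                'Content-Type: application/json',
                'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
            ],
            CURLOPT_POSTFIELDS     => json_encode($payload),
        ]);

        $response = json_decode(curl_exec($ch), true);
        curl_close($ch);

        $answer = strtolower(trim($response['choices'][0]['message']['content'] ?? 'no'));
        return str_starts_with($answer, 'yes'); // send the notification only on a yes
    }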
But that's only feasible because it's not happening
to every single episode, because it's only
when a keyword is hit.
And because the output of the LLM is just one or two tokens,
it's a yes or a no.
If I were to put, I don't know, 500,000 tokens in there,
like a Joe Rogan can easily get there
with his four and a half hours of just brain dumps.
And I expect 500,000 tokens out back,
and that's the expensive part.
Man, I did the math.
I think I would pay $10,000 a day
if I were to do this at scale.
This is what I'm currently paying
for the whole setup a month.
Like, the whole infrastructure costs me that.
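For a sense of where a number like that comes from, here is a back-of-the-envelope calculation. Every figure in it, token counts, hit rates, and per-token prices, is an illustrative assumption, not his actual volume or pricing; the point is only the gap between "rewrite every transcript" and "answer one yes/no question when an alert fires".

    <?php
    // All numbers below are illustrative assumptions for a rough order-of-magnitude check.
    $episodesPerDay   = 50_000;    // transcribable episodes per day (from the conversation)
    $tokensPerEpisode = 100_000;   // assumed average transcript length in tokens
    $inPricePerM      = 0.15;      // assumed $ per 1M input tokens (small-model class)
    $outPricePerM     = 0.60;      // assumed $ per 1M output tokens

    $inputTokensPerDay = $episodesPerDay * $tokensPerEpisode; // 5 billion tokens per day

    // Option A: feed every transcript through an LLM and get a rewritten transcript back.
    $rewriteCost = ($inputTokensPerDay / 1e6) * $inPricePerM
                 + ($inputTokensPerDay / 1e6) * $outPricePerM;

    // Option B: only the ~2% of episodes that trip a keyword alert, with one-token answers.
    $alertCost = ($inputTokensPerDay * 0.02 / 1e6) * $inPricePerM;

    printf("Rewrite everything: ~\$%s per day\n", number_format($rewriteCost)); // thousands per day
    printf("Yes/no checks only: ~\$%s per day\n", number_format($alertCost));   // tens per day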
Have you looked into running LLMs locally,
like you've done with Whisper, or is
that just too much of a hassle to do? I've done this, and they still run. These are kind
of my backup LLMs if there is ever, as so often happens, a moment where OpenAI just has to turn on
the respond-with-500s mode for a couple of minutes or a couple hours. So for that, I have Llama 3.1, the 7B ones,
running locally on these servers,
because I use my transcription servers
also as my dispatch to AI servers, right?
They just go through a couple of queues
and they fetch new stuff, and then they send it,
either they send it out to OpenAI
or they run it locally as a fallback,
which then hampers
my transcription efforts, which is why I don't like it.
But yes, I have Llama 3.1s.
They're fine for the question answering thing
that I just described, that the context-aware filtering.
They're not that great at data extraction,
because they are just not that great.
You can do it if you reduce the context window
and you kind of chunk over a conversation,
you extract part by part,
and then you kind of synthesize it in another call,
but that takes like five minutes apiece.
It's kind of, you know, for one show,
and that's just a lot.
So I've tried this initially, results were okay-ish,
customers were like, okay, I guess that's mostly right.
But ever since I switched over to OpenAI, to their platform,
particularly with, what is it, the 4o mini model
and the nano model that they recently released,
they're fine for all of this extraction stuff
and all of the summarization and these things.
It is still quite manageable in cost.
It's like 1,500 or so a month
when there's a lot of work to be done.
I think I hover around like spending 50 bucks every two days or so.
That's still math.
Okay.
Yeah, interesting.
Yep.
And then one last question I have around the transcription stuff is like, how do you evolve
this?
Like if you think of either a new feature or maybe a new
mechanism, do you go back and redo old episodes or do you just say, hey, we only
do that for stuff going forward now or how do you like, because you know, going
back and doing 35 million episodes would be quite a cost. I guess, how
do you think about the evolution of the system? So as a frugal entrepreneur who also is serving a group of people who are
interested in what's happening right now, the answer should be clear.
I'm mostly focused on delivering this value for current and recent things.
There are always features, and people have used them, to retranscribe older episodes
or reanalyze older episodes. I have an API
endpoint where people can tell me, hey, this podcast needs to
be retranscribed, there's stuff wrong in there, all episodes, and
they can call that endpoint a couple times a day; there's a
rate limit there, right. So it's like, you can tell me that
there's stuff wrong, and I'm going to fit it in with the other things
in my queues. But realistically, the most important stuff happens today because
it matters. If there's another oil spill, people don't care about episodes from like 20 years ago,
or like 10 years ago, right? They need today prioritized. So the strategy is, once Podscan
becomes so hyper-profitable that I can do this, I'm totally going to do it. You know, it's really
what it is, because it's really just a question of
resources. All these queues that I have running, if I were to, I
think at this point, it probably would be 2x or 3x my system, my
infrastructure of transcription servers, I would walk like three
days into the past for every day that I walk into the future,
just in terms of how much effort I can spend
on catching up with older episodes.
So the 35 million that I've so far ingested
are obviously in the past,
and they still go further back into the past
whenever there's room in the queue.
Yep, okay.
Okay, so you do redo some older stuff
when there's capacity, interesting.
Okay, yep, very cool.
Okay, I wanna switch to search now
because search is such a hard problem.
It also just differs based on scope.
You hear a lot of people say,
hey, use Postgres or MySQL full text search.
And that totally works if it's first and last name
in a CRM within an organization, right?
But 34 million podcasts, transcripts that are,
I don't know, 50 kilobytes, I don't know how big they are,
but like many kilobytes each,
like you're doing serious search stuff.
Tell us about your, where are you at right now with search?
Are you using Elasticsearch now?
Kind of, I'm using OpenSearch now.
Okay, that counts, yeah.
Right, it's kind of, yeah.
Yeah, that counts.
Well, because I thought,
were you Meilisearch at one point?
Yes. Okay. So, and I still am. Okay, I can tell you the evolution of this and
what choices I made along the way. So, when I started out, again, it's a Laravel project. So,
it's a PHP application built on Laravel, which is a wonderful system. Like, it's really meant for
people who try to build business-able solutions,
like monetizable solutions on top of it.
So I had a very easy time just integrating all the default components to get
Paddle on there as my payment processor and to get like login with this and login with that.
Like just get all of these things into the application real quick.
So I could then build the business logic that only I could build.
That's kind of the idea.
So one of the things that Laravel offers
is a thing called Laravel Scout,
which is a library that plucks into search engines.
And they support, I think, three.
One of them is Meilisearch,
the other one is Typesense,
and then Algolia, I think, those three.
And all of these are kind of real-time search databases.
They're not really these kind of big old
full-text-everything search. It's more like if you have, like
you said, titles and whatnot, you can put them in
the index and do super, super fast, like sub-five-millisecond
searches and all that. And obviously, that's great, right?
Why would you not start with this? But then came reality, and
then came, okay, this is now 500 gigabytes of text data
that this thing needs to search.
Meilisearch still does it, it's really cool.
It ingests all this data.
It has, just like Elasticsearch,
that kind of reverse, whatever it's called, lookup thing,
a TF-IDF type thing or inverted index or something like that.
That's the one, yeah, the inverted index, right?
Like how frequent a term is across documents is
inversely proportional to how relevant it is.
So that is also in Meilisearch and all these other tools.
So they can find things rather quickly.
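For readers who have not run into it, the inverted index idea he is circling is small enough to sketch in a few lines: map each term to the documents that contain it, and weight matches so that rare terms count for more than common ones. This is a toy, not how Meilisearch or OpenSearch actually implement it.

    <?php
    // Toy inverted index: term => [docId => how often the term appears in that doc].
    function buildIndex(array $docs): array
    {
        $index = [];
        foreach ($docs as $docId => $text) {
            foreach (str_word_count(strtolower($text), 1) as $term) {
                $index[$term][$docId] = ($index[$term][$docId] ?? 0) + 1;
            }
        }
        return $index;
    }

    // TF-IDF: a term matters more in a document the more often it appears there (tf),
    // and matters less overall the more documents it appears in (idf).
    function tfidf(array $index, string $term, string|int $docId, int $totalDocs): float
    {
        $postings = $index[$term] ?? [];
        $tf = $postings[$docId] ?? 0;
        $df = count($postings);

        return $df === 0 ? 0.0 : $tf * log($totalDocs / $df);
    }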
It just turns out that ingestion is rather slow
after a certain scale.
I was talking to the founders of Meilisearch as well,
because again, building in public,
once you talk about I'm using your tool,
they were like, hey, let's talk.
And I've had this many, many very interesting
troubleshooting, bug hunting,
and feature request conversations
with the CTO of Meilisearch,
and the CEO of Meilisearch, which was really great.
And they helped me out setting it up.
I sent them a full snapshot of my big old database,
and they told me this is the biggest one we've ever worked
with in this tool because so far they only had done
imagine IMDB style things.
You have a name of a person, name of a movie,
just all really short text, easily indexed,
easily prioritized.
But like you said, a transcript of a show
very quickly goes into the hundreds of kilobytes
and just to even store that reliably and access it reliably
and then put into an index somewhere, super hard.
So I got a lot of help from them.
They built new versions of the software that had features
that I needed in this just so I could keep running it.
And I think I'm still running it for some parts
of the website where I need that quick lookup,
but for the actual gigantic big old like Boolean search and the wildcard search in transcripts,
I very recently migrated everything over to OpenSearch on AWS, in like a hosted instance there.
Nice. And how is that going?
I guess, like, how recent is that, and what are your feelings on it right now?
Well, I think I only felt burned out 4,000 times
throughout this process.
It was incredibly hard.
When it comes to data that is millions of items,
that's already hard.
When it's millions of rows in a database,
millions of anything.
But when it's also then almost four terabytes in size,
that needs to be just even sent over the network somewhere.
And it needs to be scheduled, it needs to be loaded into RAM and sent over like,
because when I put something in my search database,
it's slightly different than what it is in my MySQL database, right?
It's enriched with other things. What's the name of the podcast, right?
That's just one foreign key lookup away.
But, you know, you kind of have to collect it
and then put it over.
So man, was that something.
The migration process, I built it
so that it was running in parallel
for the thing to still be used in production.
But in the background,
I was shifting this information over to OpenSearch,
which is really, really good at ingesting this data.
And still is, it's extremely fast, super reliable. You know, knock on wood, I guess. But this has been very, very cool to see how
reliable and fast the search has been. But man, the migration was probably one of the most stressful
things that I've done over the last couple of months. Yeah. How long did that take to move
four terabytes over to OpenSearch? Because of the fact that I needed to transload
a lot of data from other tables in my database
and often had to look up very different things
like chart rankings and all of this too,
because we track that as well,
which is a whole other thing.
This is probably my biggest table
is not even just a podcast with the transcripts
is the chart history of all chart locations
of all podcasts all over the world.
This is a solopreneur project.
It's amazing.
Yeah.
I don't know how this hasn't killed me yet, but when that happened, I think it took 14
days as a background process to migrate this over.
I had to build a migration, a restartable, a skippable migration command and that kind
of stuff and just keep running it
in some shell that was running on my third monitor
and looking at it, hoping that it wouldn't break,
that kind of stuff.
Yeah, it took a while,
but then I built a kind of a hybrid system
where I would then launch it as a production feature
that people could switch over
or could be switched over to the new search on the API.
So I could see, does this break
when people search for the old thing?
What results would they get there?
Are they the same as the new ones?
Then I slowly migrated over time,
flipped the switch one day and nothing exploded.
It was one of the happiest days in my developer life.
Ever since then, this has been powering
discovery and all kinds of other things in the background.
It's been hard, but it's been rewarding. It's really cool.
Very cool.
Yeah.
How does the cost of your open search infrastructure
compare to the cost of the GPUs from Hetzner?
So OpenSearch, I think for the production one,
I'm paying $700 a month right now.
And I think it's 500 gigabytes of data, provisioned for 500.
I think we are at 350 in data right now.
So as this grows, obviously that will grow too.
But I think that makes it, let me say,
twice as expensive as the Meilisearch cluster,
well, the Meilisearch server, which is,
and I'm kidding you not, a 64-core Hetzner machine with 350 gigabytes
that I pay $300 a month for.
Oh my God.
Oh, so you were running your own,
I forgot that Meilisearch is open source,
so you were running your own.
Yeah, so that's all it did.
So yeah, that board broke a couple of times, that was fun.
But also the only problem there was the speed of ingestion
and their ingestion queue,
that could grow to one million objects
and then the whole thing froze at some point.
So I had to build, what was it,
this kind of back-off logic
that queries that server's queue,
counts the items in the queue,
and depending on that sends over new items or doesn't.
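Something like the sketch below, which asks Meilisearch how many tasks are still enqueued before pushing another batch. The tasks endpoint is real, but its parameters and response fields differ between Meilisearch versions, so treat the details as assumptions; add an Authorization header if your instance has a master key.

    <?php
    // Only push a new batch of documents when the server's task queue has drained
    // below a threshold; otherwise wait and try again.
    function queueHasRoom(string $host, int $threshold = 10000): bool
    {
        $url  = rtrim($host, '/') . '/tasks?statuses=enqueued&limit=' . ($threshold + 1);
        $body = json_decode(file_get_contents($url), true);

        $enqueued = count($body['results'] ?? []); // tasks still waiting to be processed

        return $enqueued <= $threshold;
    }

    // The indexing loop then becomes: check, push, or back off.
    // pushNextBatch() below stands in for whatever sends the next chunk of transcripts.
    // if (queueHasRoom('http://127.0.0.1:7700')) { pushNextBatch(); } else { sleep(30); }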
I was like, I don't wanna deal with this.
Could AWS please solve this for me?
And obviously the OpenSearch there is capable of dealing with my inflow. So that worked pretty
well. But yeah, that still exists. That server is still running for certain kind of queries.
The smart, yep, the different things. Yep. Yeah. Yeah. I always say Elasticsearch is my
least favorite thing to run, but I think it's just partly because of the nature of the data.
You're often just throwing large amounts of data at it,
doing full-text search, or maybe some aggregations too.
And it's just like, it's a hard problem.
Honestly, I can tell you I've been obviously
dealing with Search not just in this project.
I've done it before.
I was a salary software developer for companies.
Search is always something, right?
And it's always hard.
And Elasticsearch in particular, and OpenSearch
is kind of almost a dialect of it.
I don't want to insult anybody working on either project,
right?
Because there's this kind of competition between the two of them.
But effectively, to me, they are the same.
So they are.
I mean, it's a fork of it, you know?
So yeah.
Right?
It's just conceptually, they don't differ enough for me to see them as different entities,
so I kind of treat them the same.
To me, what I always hated, and I mean this as a developer, I did not enjoy the DSL, the
description of queries in there.
That is just rough.
It's rough to understand how exactly should and all of these weird bool inner queries work. And no, it was hard.
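For a flavor of the DSL he means, here is a small boolean-plus-wildcard query the way it can be composed with the opensearch-php client; the index and field names are invented for the example, and the query itself is deliberately simple.

    <?php
    require 'vendor/autoload.php';

    $client = \OpenSearch\ClientBuilder::create()
        ->setHosts(['https://localhost:9200'])
        ->build();

    // Roughly: openai AND (whisper OR transcri*), with exact phrase hits boosted.
    $results = $client->search([
        'index' => 'episodes',       // illustrative index name
        'body'  => [
            'query' => [
                'bool' => [
                    'must' => [
                        ['match' => ['transcript' => 'openai']],
                        ['bool' => [
                            'should' => [
                                ['match'    => ['transcript' => 'whisper']],
                                ['wildcard' => ['transcript' => 'transcri*']],
                            ],
                            'minimum_should_match' => 1,
                        ]],
                    ],
                    'should' => [
                        ['match_phrase' => ['transcript' => ['query' => 'openai whisper', 'boost' => 2.0]]],
                    ],
                ],
            ],
        ],
    ]);

    // $results['hits']['hits'] holds the matching episodes.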
You know what saved me there? AI. AI. Yeah, this really is really that. Like, I don't think I have
written a single part of any of these queries. And these are really complicated, composited
queries. Like, I'm using Laravel OpenSearch, which is kind of sitting on top of Eloquent,
which is Laravel's ORM, kind of the thing that turns my whatever things are into SQL queries
or whatever, that exists for OpenSearch too and for Elasticsearch as well.
And I'm just trying to kind of composite this like I always would as a Laravel developer.
But sometimes I have to do a raw query.
And I have not written a single line of this.
This has all been Junie, which is kind of the coding agent
that is inside of PhpStorm, you know, the JetBrains one.
And also Claude Code and whatever these fancy tools are,
they've helped me a lot too.
But none of these DSLs were written by me
because I just couldn't.
And I didn't have to.
I could just tell the thing,
hey, I want to boost things where the full word is in there,
and then also look for ANDs and ORs
and do a Boolean query, make that.
Like, literally, that's how I code nowadays.
Maybe that's interesting too.
I did, yes.
I'm talking into this microphone
for as long as it lets me talk.
I'm using a tool called Whisper,
what is it called, Whisper Flow?
Whisper Flow, okay.
Which captures after you press a key command,
captures everything you say, then transcribes it,
sends it through a quick AI to kind of clean it up
and then paste it into whatever text input
you have currently active, which is really cool
because you can do this on Twitter,
you can do this inside your IDE,
you can literally code with the thing you shouldn't,
but you can.
So I just effectively, I'm writing a mental draft
and I just speak it out loud, throw that into Junie
or Claude Code or whatever, or Cursor if you fancy.
And then I just let the thing do whatever I told it
for 10 minutes.
And that's exactly how I built most of the functionality
of Podscan after a year or so of building it,
because that's when I started using AI.
Yep, that's so cool.
Yeah, I've heard people talk about Super Whisper,
Whisper Flow, doing that stuff.
Like John Lundquist is another big one on this sort of stuff.
And I haven't, you know, I tried Super Whisper a year ago
and just haven't gotten back into it,
but that's an interesting one.
I would say that, and then also like,
are you using any of like Claude Code or Codex
or any of like the truly
terminal-based, set-it-and-forget-it type things?
So I've used Claude Code initially,
but then I found Junie, and Junie is exactly this. Like, they just built effectively a Claude Code copy
or clone before it came out, I think, but don't quote me on that, into the IDE, into PhpStorm, which kind of sits at the side
of it, and it is effectively a terminal
with a little bit of nicer UI.
So that's how that works.
So yeah, I'll use it all day long.
Like that's, I code so little now that I feel-
Code in terms of actually typing anything?
Actually typing. Obviously, like, sometimes there's a line and I delete it or remove it, put a log statement there, or I notice something that the AI didn't do the way I would do it because it has a different code smell, and I refactor it, whatever. But sometimes that refactoring is just me writing a comment, refactor this to look more like that, and then I let the thing go at it, and then it does it. So even coding to me is more a managerial task
than it is an implementation task right now.
And a lot of founders really hate this,
like if they are very technical,
because coding to them was the moment of creation
that the joyful part of here is what only I could do
and this is the thing that I've built.
I'm starting to kind of miss that too,
but the speed at which I'm now creating features,
like this migration to a search engine
that actually does what my customers pay me money for
instead of just being fast.
And this is no diss at Meilisearch.
Meilisearch is wonderful in all kinds of scenarios.
It is even wonderful for mine, but I found I needed more.
I needed queries that people can, you know,
almost like reports, build very complex things in.
It just wasn't possible with the old one.
So I had to migrate and AI assisted coding allowed me to do this.
I would have spent months like just slogging through this and probably not get it right.
Yep, exactly.
And yeah, it does feel like something's lost, but it's also fun, especially if you're doing UI or front-end stuff, like just the pace of creation there and seeing that change, like that's a different kind of high too. And then you let it do it. And then it takes a couple times of you
telling it what it got wrong to get it right. But still,
within 15 minutes, you have a completely refactored front end.
And then if you like it, you commit. And if you don't like
it, you roll back to the last commit. It is so simple. It
costs you no brain activity other than looking at it
and going, nope, or yes, right. And to me, coding now is really prompting.
And in the sense of what we always have done,
we would have a scope document.
We would kind of describe what the product is gonna be.
We have some kind of information source for us,
what we want it to be,
and then we would implement it from there.
Now we just need to create that document
and allow the machine to implement it as best it can, which is often better than I would probably do it. I consider myself a
0.8 developer, like a 0.8x developer, not a 10x, not a 1x. I'm under there, but at least I'm good at
tool use. So, you know, that helps. Yeah, yeah. I don't know that that's true, but yeah. So you say
Whisper Flow and Junie. Are there any other AI uses, like v0 for visual design stuff or anything else?
Okay, so recently I have found another really cool use for tools like v0 or Lovable, like those things.
They help so much in showing potential customers or
potential collaborators or clients or whatever what an integration of your tool in their tool
could look like. So I was recently talking to a client who is running an analytics company,
like somewhere in the artist space. And they track a lot of things for a lot of different people.
And they wondered if Podscan could help them
with podcasts here, right? They'd probably pull this data in and
kind of use it for things. They didn't really know what to use
it for just yet. But they knew that they were missing out on
podcasts. So they needed a new way to get that information. So
I talked to them, we had a little chat for an hour or so.
And I took the transcript of that conversation, the full
transcript of the chat,
the kind of demo, I guess, the product demo.
I threw the transcript into Claude,
into Anthropic's Claude, 4 or whatever, 3.7, nobody knows.
And I told it from this transcript of a conversation,
write a prompt for Lovable so that I can build, so that Lovable can then build
a tool that shows this person exactly how our integration could work in their product.
So then came out a prompt, I took that into Lovable, had it generate this thing, worked on
it for I think an hour or so just like moving little things around, making them clickable,
coming up with more scenarios.
And then I took that link and sent it
to the guy the next morning.
And I had a fully working, well, kind of working, right?
It's still kind of a click dummy thing,
but it was a fully integrated data centric version
of what the product could look like without data.
For that stuff, for like 20 bucks a month, yes, right?
Sales enablement, that's what that is.
Oh my goodness, yeah, yeah, yeah.
That is very cool.
Yeah, yeah, that's super cool.
Oh man, I love that.
Yeah, I love, I'd say like I'm very bad
at just design generally, so like even using v0,
or, I haven't tried out Lovable yet, but I should.
But just to like get my juices flowing
on what something could look like is very helpful for me.
So I love those reasons.
Yeah.
Yeah, I use this all the time just
for trying to figure out how to make things more consistent.
With the moment I have something in my code base,
like with Juni here, Juni has full access
to all of the view front end files that I have.
So if I tell it, hey, look for files
that don't look like the others, it actually does.
And then it kind of streamlines them.
It's so cool.
Like these tools have insight
into even concepts like, you know, white space
and order and hierarchy that I don't have.
I don't have the eye for that, but they do
because they're trained on this data.
It's really useful.
Coding with AI is not just having AI write code for you that you already know it's going
to write.
It's making it figure out the things that you would never think about and then implement
that.
Yeah.
Oh, that is so cool.
Okay.
I want to ask you, you mentioned you use PHP and Laravel for this one.
With Feedback Panda, you used Elixir.
I guess, why did
you choose PHP and Laravel for this one, or did someone else choose Elixir for you with Feedback
Panda? I guess, why the switch there after having a successful exit?
Yeah, it kind of was chosen for me because at the time I was working as an Elixir developer.
Like when I was building Feedback Panda, I was still kind of moonlighting. That was a
moonlighting project.
And my full-time job was writing an Internet of Things platform that was built on top of
Elixir as a language, which also was my first job with Elixir.
That was just a tech choice by the CTO of the company who thought, yeah, we're building
a very parallel thing here with IoT.
Might as well use the language that is built on top of the
Erlang VM, right, which was built for phone switching networks, like deep internal core systems,
highly parallel. So that's what I had at the time. So I just used the tech stack that I had
at the time to build Feedback Panda. And later, after that, even my first SaaS post-exit was
called Permanent Link,
which is kind of a link forwarding tool for authors, for people who have links in books
and don't want them to die when the original website goes away.
It's kind of that idea.
That was also built in Elixir.
I wish I would have built it in PHP because it's just so much more maintainable by AI at this point.
Like all AI systems are built on so many Stack Overflow PHP questions.
They get it right. For Elixir, it's a more rarely used language, and that makes
hiring hard. It makes maintenance a bit harder. It's also different to deploy all of that.
But I think I started Podline, the voicemail thing, as a Laravel project, because
in my community, in the indie hacker bootstrapped solo software founder community,
Laravel just came up more and more. Like everybody was talking
about how PHP was having a comeback now and Taylor Otwell had built
this amazing thing. And not only had he built an amazing
framework, and Laravel is really, really good as a framework for
PHP, particularly compared to the other frameworks
out there, but it's really good. But the whole ecosystem
around it, like the business ecosystem around it, that Taylor
and the team built, plus the actual user ecosystem around it,
just a very kind, friendly and super community centric and
founder centric community that you don't necessarily have
with other languages as much.
Elixir is a very technical language.
It's functional programming that attracts a lot of nerds, which we all are, but a lot
of people who are purists, that's maybe the term.
So you don't necessarily have that with PHP.
You couldn't have a purist PHP.
Let's be real.
A language like this can't have purism, because it just does not exist there. But I chose it because I wanted to see what
the hubbub was all about. And then I built this product so easily and so quickly, I was
like, okay, yeah, I'm going to keep using this. So that's how that happened.
You're staying here. Yeah. Yeah. Do you think we're going to see just less, I don't want
to say like innovation, but just like new languages, new frameworks
and all that stuff, given the way we're sort of building now
so quickly with AI, like, do you think that's like,
hey, we're going to see less innovation there
because the AIs have already internalized
so much PHP, Laravel, so much Node, all that sort of stuff,
so much React.
I'll answer this with two answers.
The real answer, like the actual answer to your question,
would be yes, I think so, because it's just
kind of a Lindy effect, right?
Things that have been around for a long time
will likely stick around for an equally long time.
And new things, they have a harder barrier to entry
to get even into the LLMs.
So just from the way that technology has been developing
and the
lag with which these things ingest new information means that the older information is likely going
to be more in the outputs. And since these outputs are now generating blog posts, that's going to
feed the ecosystem that then gets ingested back in. So likely. But my bigger meta theory here
is that we're going to look at specific programming languages at some point over the next 10, 20 years,
the same way we look at different implementations
of binary code at this point.
Like, we're going to...
It doesn't matter, like, what this stuff
is going to be compiled down into.
The language that we're going to communicate with
is one that an AI coding assistant understands.
Like, right now, we're coding into a compiler
or an interpreter, depending on what we use, right,
with PHP, or with
JavaScript or C or whatever, it's always this kind of
executable that then turns that into machine code, we're going
to develop a new meta language that sits on top of AI coding
that then I don't know, compiles all front-end related stuff into
JavaScript, because that's still in the browser, and then
compiles all back-end stuff into, I don't know, Rust, like
anything that the AI thinks is the best implementation for this, and then also it can run reliably.
I think at some point, and it's going to freak a lot of people out, the actual programming
language that something is implemented in, it's not going to matter anymore as much.
Do you think there will be a sort of meta-language like you're saying, or will that just be English?
And that's...
I would be surprised if it wasn't almost a dialect
of the English language, but I think there will be,
like there has to be a transitionary path.
And I don't know what it is.
It might be, Markdown comes to mind,
like something like Markdown, but for logic, right?
Not for documents, not for presentation, but for representation, like almost an idea
topology of some sort.
I don't really know.
Maybe it's just people rambling into a voice prompt for half an hour.
Maybe that is the interface.
Seems to work pretty well already, right?
It's just like we're doing.
Yeah. Yeah. It's true.
Yeah, yeah.
Okay, one last tech question I have
and then a few business questions.
So real-time alerts, you know,
people can put in these keywords and say,
hey, alert me whenever one of these shows up.
How's that running?
Is that just a cron job every X minutes,
you're scanning all the new podcasts
and firing off alerts,
or what's the sort of infrastructure look like for that?
Yeah, so like I said an hour ago, there is this Podping thing where we get notified of new episodes,
then we add them to the internal queue, our transcription system then fetches that from
the queue, transcribes it, and sends it back to the main server. At that point, we have a full transcript,
then comes another step. That's the analysis step. So we send it back to the same server
that then calls OpenAI or whatever,
gets back the structured data,
adds that to the data in the database.
And then when we have either received a transcript
and we're like, okay, this is enough,
we don't need to analyze it, let's just go.
Or we have received a transcript
and done the analysis step, then we complete.
That's when we start scanning all the
alerts that we have in the system, which is like 3,000 or so at this point. We just go through every
single alert, that's the keyword match. And then if it matches, there is also this kind of context-aware
thing that sends things back to the API, goes to Claude or whatever, and comes back and
says yes or no. And that's when we dispatch the notification. So it's every, like, real time.
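The analysis step he mentions, calling OpenAI and getting structured data back to store with the episode, might look roughly like this. The prompt, the output fields, the model name, and the config key are illustrative assumptions, not Podscan's actual code.

```php
<?php
// Hypothetical sketch of the analysis step: transcript in, structured JSON out.
// Prompt, fields, and config key are made up for the example.

use Illuminate\Support\Facades\Http;

function analyzeTranscript(string $transcript): array
{
    $response = Http::withToken(config('services.openai.key')) // assumed config entry
        ->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4o-mini',
            'response_format' => ['type' => 'json_object'],
            'messages' => [
                ['role' => 'system', 'content' => 'Return JSON with keys: summary, topics, guests, sponsors.'],
                ['role' => 'user', 'content' => $transcript],
            ],
        ]);

    // Decode the model's JSON answer into an array to store alongside the episode.
    return json_decode($response->json('choices.0.message.content'), true);
}
```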
Every time there's a new transcript, you run it.
Every time there's a new transcript,
we run every single alert on it, just to scan.
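A hypothetical sketch of that scan as a Laravel queued job, run once per finished transcript. The Alert and Episode models, their fields, and the job name are assumptions made up for the example, not Podscan's actual code.

```php
<?php
// Sketch: when a transcript completes, walk every alert in the system against
// it with a plain substring check. Model and field names are hypothetical.

namespace App\Jobs;

use App\Models\Alert;
use App\Models\Episode;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ScanAlertsForEpisode implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public Episode $episode)
    {
    }

    public function handle(): void
    {
        $text = mb_strtolower($this->episode->transcript);

        // A few thousand alerts, one transcript: no regex, no index,
        // just "is this keyword somewhere in the text?"
        Alert::query()->each(function (Alert $alert) use ($text) {
            if (str_contains($text, mb_strtolower($alert->keyword))) {
                // Optionally run the context-aware LLM check here, then
                // dispatch the actual notification to the alert's owner.
            }
        });
    }
}
```

Something like `ScanAlertsForEpisode::dispatch($episode)` would then be fired after the transcript (and optional analysis) completes.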
Interesting.
Yeah, and is that load pretty,
that's just not that much load to deal with there?
That is just text scanning.
It's like trying to find, it's not even regex,
it's just, you know, text retrieval.
Yep. I was thinking of doing the queries against Elastic, doing that with the most recent ones,
but I guess if you're just doing it against the doc directly, yeah, you're just doing it in a process.
Yeah. I just load the full text and I go line by line, is this in it? And if yes, then that's great.
I mean, we could also do a more semantic approach probably.
And that might be a thing for future developments
that we would check for, like almost,
like turn it, vectorize it, do embeddings
and check against the embeddings.
And if there's a certain kind of similarity,
but we're not there yet.
It's still that we trigger by individual keywords.
And if the keyword is found,
then that may trigger another AI step or not,
but it's still very much just text substring matching
and, like, indexOf presence checks, pretty much.
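The embedding-based alternative he floats, vectorize the alert topic and the transcript text and compare similarity, is explicitly not something Podscan does today. A rough sketch inside a Laravel app, assuming the OpenAI embeddings endpoint, with example strings and an arbitrary threshold:

```php
<?php
// Sketch of the possible future semantic matching described above.
// Model name, threshold, example strings, and config key are illustrative.

use Illuminate\Support\Facades\Http;

function embed(string $text): array
{
    $response = Http::withToken(config('services.openai.key')) // assumed config entry
        ->post('https://api.openai.com/v1/embeddings', [
            'model' => 'text-embedding-3-small',
            'input' => $text,
        ]);

    return $response->json('data.0.embedding');
}

function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0; $normA = 0.0; $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

$alertTopic      = 'sponsorship pricing for indie podcasts';    // example alert topic
$transcriptChunk = 'we charge sponsors about 500 dollars per episode'; // example snippet

// "If there's a certain kind of similarity", treat it as a hit (0.8 is arbitrary).
$isMatch = cosineSimilarity(embed($alertTopic), embed($transcriptChunk)) >= 0.8;
```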
Yeah, yeah, yeah, cool.
Okay, some quick business stuff before I let you go here.
You know, you sold Feedback Panda, this was 2019, I believe.
So I guess how has maybe Indie hacking
or just the environment changed from 2019 to 2024,
2025, now that you're working on Podscan?
Yeah, I've seen a shift in fatigue.
There's like subscription fatigue.
That's a big thing.
Like a lot of things that people would have easily paid some money every month a couple
years ago, now it's like, oh, not another thing.
So it's harder to sell, particularly even low-ticket
things, to people. I'm operating in a space now and I had to claw my way kind of up here,
where my lowest tier price for Podscan is $200 a month. And then it's 200, 500, 2500. Those are
the tiers that I have. I started with 40, right? And I might even have had 29 at some point as the
lowest tier. And you know how it is, like you get customers
that are extremely price sensitive,
but also have extremely high expectations
of how you should be available as a founder,
as a customer service rep or whatever.
So I moved my way up and it's much, much nicer up here
where we are right now.
But it's just getting anybody to subscribe
to yet another tool has become much harder. And that's like the number one problem for indie hackers, is getting this initial traction.
And it's harder. It's just harder to build something that people find valuable enough
to pay money for. And then, of course, the AI hype train that has kind of come into our community
has caused an interesting shift. Because it's not that AI can actually build
these businesses easily, right?
Lovable is great, but as I said,
it's not gonna build the full business.
It's a click dummy.
It's a prototype of something.
And then you can take that maybe
and turn it into something bigger
and extend it and build your own business around it.
But the expectation that people now have
is that we could just easily build this with AI, right?
Even though they have no idea
if that's actually feasible or not.
So that is another barrier to entry
because people are like,
yeah, but I'm not gonna pay for this.
Even though they don't have the capacity
or experience to build these tools.
That's this weird cognitive dissonance now
that AI could solve this for me.
I don't need to give you any money, which makes people dismiss solutions that they a couple years ago would have had no problem
paying money for at all. So that's kind of a problem that I see there. Yeah, yeah. It's like,
well, AI has solved this for you, but, like, only through all your work connecting all the AI,
you know? Yeah, yeah. That's exactly right. Yeah. Yeah. AI is not a replacement and it's certainly not very good at
understanding the edge cases of a real customer's lived experience.
When you see people building
prototypes and then building little businesses,
and I don't want to belittle it,
which is why I should probably choose a different word,
building fledgling businesses on top of it and say,
I've just built a clone for Sentry or whatever,
and this is so much cheaper, you should totally use this.
Sure, great, but obviously encoded within the code base
of these bigger incumbents are learnings of decades
of customers coming to them with very specific problems
that needed a very specific solution
that is not obvious or even visible to anybody outside the company.
So AI cannot necessarily encode it unless it has access to the internal Jira or Trello
board of those companies, and/or the code base, and even then it might not understand where those
things came from.
So that's the moat that you have right now as a founder: your intricate knowledge
and capacity to understand the work that the people that pay you money to solve it for them
actually have to do and how hard it is and how complicated the human aspect of this is that it's
not just easily solved with bits and bytes. It also is a relational, a conversational thing more
than anything else. That's the saving grace of indie hacking is that you can still be present as a founder
in your early customers' lives. They might choose you for that, not for the tech necessarily,
because other people can build the same tech, but only you can care.
Yeah, that's a good point. That's a good point.
The year, maybe like two years
after selling Feedback Panda,
like did you feel at peace?
Did you feel a hole?
I know you've like stayed busy since then.
I think you've done a good job of like staying busy
and doing stuff.
I guess like, what was that experience like for you?
Like, what it was like having the exit, yeah.
Yeah, the exit was, it was interesting
because everybody tells you
that you're gonna fall into this hole of not knowing what to do. And you don't believe them and then you do.
That's kind of my experience. So I was so deeply entrenched in building a software product
that when I had to give it up, give it to somebody else, and at the same time found financial
security, which was great, I still felt like half of me was gone. And I needed to
kind of fill a purpose, right? It's like, where do you find
your purpose? A lot of people ask this, and I didn't really
know that that was where I got it from, but building things for
people that I care about, so they have a little bit of an
easier day, that had become encoded in my identity at this
point. And I couldn't do that anymore. So I had a pretty hard time for a couple of weeks,
but then I found, ah, just gonna talk about it,
just gonna write about it.
I always wanted to write, so why not start a blog?
And from that came the Bootstrapped Founder blog
and the Bootstrapped Founder podcast
and the Bootstrapped Founder newsletter and the YouTube channel.
And it just easily cascaded
into this multi-distribution system media company.
And then I wrote a book, like you very kindly mentioned
Zero to Sold earlier, it's my first book that kind of
shows the story of Feedback Panda
and how I approach building businesses,
which I still very much use as my playbook, obviously,
cause that's my experience.
And in that I was like, hmm, I have so many links
in the book, I wish there was a tool that, right?
And that's kind of where Permanent Link came from,
where I needed a tool to keep the links in my book active.
And then Permanent Link allowed me to also have links
to my podcasts and all of that.
And then my podcasts got bigger.
And then I was like,
oh, I wish people could send me voicemails.
And I needed to keep building these tools
that I needed for myself along the way.
And I think that reestablished my purpose continuously,
which is also why I'm still knee-deep in Podscan right now,
even though I wouldn't have to.
Like I could just do whatever I want,
but it's kind of what I do.
I do whatever I want and that is building a SaaS business
that is trying to do on its own
what other companies have whole departments for.
So it's just an enjoyable thing that also makes money and makes people happy along the way.
Yeah. Yep. And you're just like tackling these fun challenges and working with the yeah, just like that.
Yeah. So then like, what's your ideal outcome with Podscan? Like, given that you have the exit earlier
and you have that success, like, are you just wanting to run this until it's boring to you?
Yeah, it's a fun question because there is the reasonable answer,
and that is, yeah, let's just see where it goes.
But that's always the aspirational answer is,
yeah, Spotify acquires a lot of companies.
I'm still looking at it from the founder perspective.
I know that right now we have a team of three-ish, four-ish people that,
other than me, are all part-time, just, you know, working on specific things
in the business outside of the technical domain, sales and marketing and that stuff.
So it's still pretty much a solopreneur driven business,
even though it's not a team of one.
And I'm like, that does hamper the capacity of this company.
And I know that with a proper team
and with proper funding to build infrastructure even further,
this could do much more and much faster.
We are now between 5 and 15 minutes
between the release of a podcast and its alert handling.
It could probably be, with just the speed of transcription tools as they are right now,
under 60 seconds, like somebody could release the episode and you could start listening and
they are not even done with the intro and you would already know what the sponsor in minute 65
is saying, right? Like that is what this tool could do and can already sometimes do. But with a bigger team and maybe in another
context, that could also work. So I'm going to see if I can get
there myself. But I'm certainly not going to say no to
acquisition offers along the way, just realistic as a
founder. Like if somebody thinks this is a good thing for them in
their businesses, well, just email me. It's very, very
simple. Yeah, that's where I'm at.
So I'm just going to see what happens.
And if it doesn't work at all,
if it implodes at some point
because it gets so cost prohibitive
that I can't afford running it,
then I'm going to turn it off.
I'm going to see what the next thing is.
Or I'm going to try something else.
I'm trying to decouple my identity as a person
from my identity as a founder, and from my identity as the
founder of that business. These are different layers.
I can always go back to being a founder or just being myself and not be a founder for
a while. It's also fine.
Yeah. Yeah, for sure. Well, Arvid, this has been awesome. I'm so glad I got to talk to
you. And yeah, this conversation did not disappoint at all. So keep doing what you're doing,
keep sharing just the, you know, just the
process because I just love following along every week and hearing what's going on. I guess, yeah,
for people that want to find out more about you, where's the best place to go?
Well, thanks so much, Alex. I appreciate you giving me this chance to just nerd out for an hour or so.
That's great. If you want to find me, I'm on Twitter or X, as the cool people call it now,
at Arvid Kahl, A-R-V-I-D-K-A-H-L. That's my handle there. And podscan.fm is where Podscan lives.
That definitely is a place to check out. My podcast is at TBF.fm. That's The Bootstrapped Founder.
That is where I've talked to Taylor Otwell. We were talking about Laravel a couple of weeks ago
and he talked to me about like Laravel Cloud
and their AWS deployment and all that.
It was very interesting too.
So I just, that's what I get to do as well, right?
Like since I don't have a job, really,
I just get to hang out with all my nerd friends
and all the other tech people
and I just get to chat with them
and then build my own software on their platforms.
It's bizarre.
It's what a life.
I'm so fortunate to be part of this.
It's true.
My wife always jokes about all my internet friends, right?
Cause it's like, I'm in the middle of Nebraska.
I don't have like a lot of tech people here,
but then like I talked to all these
just different people on the internet
and get to see them at conferences once in a while.
So that's, that's right.
Yeah. Same for me.
Like outside of this window,
there's probably some cattle running around
cause we're just in the country.
Yet I get to talk to you and to this community of wonderful people. I feel
very blessed. I'm very grateful for the life that I have.
It really is. It's a great time to be alive. Yep, for sure. All right. Thanks for coming
on. Really appreciate it.
All right. Well, thanks so much.