Software Huddle - Blocking Bots & Moving from Redis to SQLite with Mike Buckbee
Episode Date: October 1, 2024

Today, we have Mike Buckbee on the show. Mike is the co-founder of Wafris, and he wrote a really insightful article last week about moving from Redis to SQLite for an aspect of their architecture. The article was nuanced in describing why it worked for their specific needs, and it has some surprising takeaways, including that SQLite was 3x faster than a local Redis instance for their workload. Mike has built a few different WAF (Web Application Firewall) products, so we covered that area as well. He's seen a lot here, so we walked through all the nefarious traffic patterns and the speed at which these bots adapt to new vulnerabilities. Finally, Mike has a wide-ranging skillset that includes marketing. Developers are notoriously tricky to market to, so we talked about his experience in effective marketing to developers without being disingenuous.

Links
Blog Post: https://wafris.org/blog/rearchitecting-for-sqlite
For A Good Strftime: www.foragoodstrftime.com
IP Lookup: wafris.org/ip-lookup

Timestamps
01:11 Start
03:41 Wafris
07:22 Redis and SQLite
19:09 Flatfile
21:50 Knowatoa
28:22 Web Application Firewalls
46:21 Jumpstart Pro
48:11 Marketing to Developers
Transcript
I think it's a misnomer that developers don't care about security.
Like, I hear that a lot from security teams, which is the security team being frustrated that, like, hey, I'm trying to get these developers to fix this thing.
And they're coming back that, like, is that really an exploit?
I tried to do this thing.
It's not fun stuff to work on.
And I think their takeaway is that developers don't care about this.
I think developers care a lot about the security
because I think they're conscientious. And part of making good software is making software that
doesn't harm your users. How sophisticated are, I guess, the bad actors? Is it pretty
sophisticated stuff or what's that look like? For sure, there's stupid bots out there that do
like /wp-admin to every Rails site and just, you know... I've seen those for sure. Yeah, yeah, for
sure. Well, and that's because there's a lot of that out there. But those bots, there's still enough
intelligence to them, because they're trying to be efficient. Like, imagine you were writing a
malicious bot: well, I'll check for the most common thing, but whatever the headers come back
with, then I can do a follow-up too. I want to talk a little bit about web application
firewalls generally,
because you have some deep expertise in this area. And I guess, how did you get interested
in this idea? Hey, folks, this is Alex. And today we have Mike Buckbee on the show. He's the
co-founder of a couple of companies, including Wafris. And he wrote this really cool article
this week that I saw about how they re-architected and moved from Redis to SQLite in how they do
Wafris for client installations.
And I just thought it was really interesting
how he laid it out.
Like, hey, some of their unique requirements,
some of the testing they did, why they did it.
And I just love posts like that.
We got to talking and decided,
hey, let's make an episode about that.
So we talked about that re-architecting.
We also just talked about WAFs in general
because he's got some pretty deep expertise in this stuff.
So just like what that looks like,
what some of these problems are
and how you should be protecting your stuff. We also just talked about Rails and marketing to
developers, a lot of good stuff in this episode. I really liked talking with Mike. If you want to
reach out with suggestions or comments or guests you want to have on the show,
anything like that, feel free to reach out to me or Sean. With that, let's get to the show.
Mike, welcome to the show.
Hey, thanks so much, Alex. It's great to be here.
Yeah, sure. Well, I found a post you wrote earlier this week, and it was like one of my
favorite posts that I read this year. I thought it was really great. And we just started getting
to chatting and thought it'd be good to have you on the show. I want to talk about that post and
a lot of other things you're working on. You're the co-founder of Wafris, but maybe for people
that don't know, you can give us a little bit of background on you and Wafris. Oh, sure. Yeah. So I'm a software developer
and founder and have really straddled the line between a lot of marketing
activities and software development and worked for lots of different companies. I've worked for
some YC startups. I've worked for like some really big companies. I've worked for the U.S. Navy.
Um, so pretty much the whole gamut of things.
Uh, and you know, through that, you know, hopefully have learned, you know, to do better
and better at this.
And, uh, something I'm very big about is actually trying to put things out there instead of
just like trying to do a side project and it kind of languishes, but to actually like hit publish on it, put a domain on it and let it live out there.
And so I've tried to do that for a number of years and to try to, the phrase I like is like
stacking bricks. So I've got a whole lot of different things, but you know, that stacking
all these bricks to try to build something bigger. And yeah, that's where I'm at today.
Yep. Very cool. Yeah, it makes sense
that you have a bit of the marketing background along with the development, just given
how well done this post was, I think, and a few other pieces of content that you have on the
Wafris site. And then you shared some side projects as marketing that you've done. So yeah,
a lot of cool stuff with that that I want to talk about, but I want to start with
this post, because this sort of kicked it all off. So this post is re-architecting Redis to SQLite. And like it says, you're talking about
migrating a portion of your architecture from Redis to SQLite. Some interesting stuff I want
to get into, but I guess my favorite parts of it is you're just laying out your requirements,
which were somewhat idiosyncratic compared to like other application requirements, describing those, like why they were important, how they affected the final solution,
and like why that resulted where you went. I guess maybe tell us about Wafris and how it works.
And then just like, we'll go into like your general constraints or needs around this part
of your application. Sure. So Wafris really comes about, you know, I think like a lot of stuff, from
my own personal frustrations.
And there's really two angles on that one.
I run a very enterprise web application firewall that lives in the Heroku ecosystem called Expedited WAF.
And it is a classic sort of firewall where it's for enterprise customers.
It starts at quite a high price point.
It's a little tricky to manage, though we've done
as much as we can with that to make it easier. But it's certainly not the default. And that's
what we see is everybody gets hit. It's indiscriminate. And literally, it's to the
point where bots that are scanning for vulnerabilities, they don't really go out and
pluck out like your site or my site. What they do is they hit every IP address in an address space.
They hit every single website on this giant list.
So there's a real need for default security that's not present.
And the other piece is what comes up from the open source world is lots and
lots of great things as far as like individual libraries,
but it's not really a system.
And so we're trying to have a different take on this. So Wafris is in the case of like a web framework, like Laravel,
Rails or Express, it's a middleware, you put it into your application, and then you have a whole
range of tools that you can use to stop attacks, you can block by user agents. And this is one of the things that if we were
starting today as like a profession, we'd say, hey, you know what? We can all look in our logs
and find all of these crazy IP addresses doing horrible, stupid things. So obviously it's got
to be easy to just like one click block this IP address from doing that. Oh no, no, that's a whole
other thing.
And so in most cases to do that,
you have to actually like hard code something
into the application, push it out, do a deploy.
And as soon as you do that,
there's 10 more IP addresses that have come in
that are doing something else.
So most people give up and they have poor security.
So that's the other angle.
So we're trying to make it more of a default,
have an open source solution so it can be widely deployed, and make something that works in lots of different environments.
I mentioned web frameworks, but also ingress controllers like Traefik, and HTTP servers like Nginx and Caddy.
So it has this very broad sort of set of requirements and a very deployed,
it's not an internal SaaS in that respect.
Does that make sense?
Yeah, I think so.
And so as I understand it,
you have hook-ins to these different web frameworks and things like that.
And with that, they can sort of call out to something
to figure out, hey, should this request be blocked or not?
Yeah.
The easiest thing to think about is an IP address that, hey, this request is coming from a certain IP address.
Is it something we should block?
And just there's a lot of other things in there, but that's the easiest.
You know, is this on a block list?
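As a rough illustration of the middleware idea described here (not Wafris's actual code), the sketch below wraps a WSGI application and rejects requests whose client IP falls in a blocked network. The class name `BlockListMiddleware`, the `hello_app` demo app, and the use of `REMOTE_ADDR` as the client address are all assumptions made for the example.

```python
import ipaddress


class BlockListMiddleware:
    """Hypothetical WSGI middleware that rejects requests from blocked IPs."""

    def __init__(self, app, blocked_networks):
        self.app = app
        # Parse each CIDR string once, at startup, so the hot path only compares.
        self.blocked = [ipaddress.ip_network(n) for n in blocked_networks]

    def is_blocked(self, ip_text):
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            return False  # unparseable address: pass it through to the app
        return any(ip in net for net in self.blocked)

    def __call__(self, environ, start_response):
        # Assumption: REMOTE_ADDR holds the real client IP (no proxy in front).
        if self.is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real application being protected."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


protected_app = BlockListMiddleware(hello_app, ["203.0.113.0/24"])
```

The linear scan over networks is only reasonable for a short list; the range lookup discussed later in the conversation is what makes millions of entries workable.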
Yep. Yep. Gotcha.
Okay.
So diving in a little bit just in terms of how it works.
If I'm a Rails developer, I sign up for Wafris and do this.
I integrate this middleware into my system.
A request comes into my Rails application.
Where is that thing that it's going out to on the hot path to figure out, hey, should this request be allowed?
Just get a sense of how that's working.
Yeah, so this article was about moving from
Redis to SQLite for the clients. So in Rails, and again, this works the same in all the frameworks,
in v1 of this, which was a mistake, we had chosen to base it on Redis. And our assumption was
like, oh, well, you're setting up your web stack, you have, you know, a relational database, you probably
have Redis sitting there, so just use Redis and it's right there and handy. That was an assumption
that was wrong, you know, just flatly. And just to be clear on this, so even though I'm
using Wafris in some sense as a service from you all, it's actually on my own Redis
infrastructure somewhere, like within my application.
Yes.
Yeah.
A model.
I don't know if you're familiar with Sidekiq, which does the async jobs.
That was really our model. We're like, oh, what if we made like a Sidekiq, but for web application firewalls?
And the performance characteristics of Sidekiq, and again, this is not in any way a knock on Sidekiq.
It's just it's a different
thing. It can be less because it's not in the hot path and deliberately so. So if you have a
Redis that actually takes like a hundred milliseconds to respond because it's across
the network, that doesn't matter so much because it's just in queuing jobs that get taken care of
and come back. When that's like, Hey, should we show the homepage
to this bot or not? That falls apart. Um, and so we had done a tremendous amount to really make
Redis fast that we, um, I don't know if you're aware of this, but there's a whole like Lua
scripting language you can run inside of Redis. And so we had this whole system that did this in
it from a technical standpoint, it was really cool.
And from throughput standpoint, it was really great.
But altogether, it failed.
And it was very difficult to set up Redis for our customers.
And you really needed to go in and tweak some internals and do some stuff.
And it was just a pain.
It was just a huge stumbling block.
So that was version one.
So version two is basically if you just add the gem and you're set, it's a whole other experience.
And what enabled us to do that was we had switched from Redis over to SQLite.
And I don't know if there's other parts of, I don't know if there were specific things you
want to talk about in there, Alex, but yeah.
Yeah, for sure. I mean, I think, I think like, just thinking about some of the very unique requirements there of like, hey, this is, you know, deployed to a customer and given directly to them.
So it's not like they're calling out to you and it's like, hey, what do we use in our stack behind our API of like Redis or SQLite?
It's like, no, actually, this is getting pushed out to a customer. And so we need like, one of the major factors there, as I understand it,
is just like ease of setup for them and just operational ease, right on on some of that stuff.
You know, I mentioned earlier, you know, I try to like, put things actually out there for people to
use and get feedback on and do stuff. And when and it's so hard to do that. Like you always want to work on stuff more
before you show it to people.
But one of the things we really found
when we put this out there was that
there's more and more distributed systems out there.
Like even in somewhat monolithic applications.
And so we're a partner with fly.io.
Like we're working on web application firewalls for them.
They were the first ones to get the V2 of the rails.
So they have from their DNA, a very distributed system.
So when you deploy to fly, it's awesome because you can spread out and be on all these different regions and do all this different stuff. What you can't have is a single monolithic Redis that works across
the Argentina region and the Tokyo region, as well as U.S. East, you know, U.S. East in quotes,
all at the same time and have it be performant. And so in a lot of ways, this was specifically
designed to work in those kinds of environments where instead of there being like one Redis
server that all these different applications all come up to,
it pulls down the individual SQLite databases to all of them.
And so we built a sync architecture to make that happen.
And it works so much faster.
Yep, yep.
And I think that's another interesting point.
You're like, hey, Redis wasn't great for me
because I'm distributed.
And it's like, a lot of people would be like,
wait, so you went to SQLite,
which is like basically a single file somewhere. And that would be kind of weird, but again, unique requirements
there: writes are pretty infrequent, like your updates around rules and things like
that are pretty infrequent and not super time sensitive, right? So when they do happen, you're
actually just taking that whole SQLite file and distributing it to all these different locations.
And every, you know, app server essentially has that file locally.
And it's not calling out to some remote service.
That's exactly it.
Yeah.
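The conversation doesn't spell out how the sync works, so the following is only a sketch of one common way to swap in a freshly fetched SQLite file on an app server: copy it next to the live file, check that it opens cleanly, then rename it into place. The function name `install_rules_db` and the idea that the new file is already on local disk are assumptions for the example.

```python
import os
import shutil
import sqlite3
import tempfile


def install_rules_db(new_db_path, live_db_path):
    """Replace the live rules database with a new copy in one atomic rename."""
    live_dir = os.path.dirname(os.path.abspath(live_db_path))
    # Stage the copy in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=live_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(new_db_path, tmp_path)
        # Refuse to install a truncated or corrupt download.
        conn = sqlite3.connect(tmp_path)
        try:
            result = conn.execute("PRAGMA integrity_check").fetchone()[0]
        finally:
            conn.close()
        if result != "ok":
            raise ValueError("new database failed integrity check")
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, live_db_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because writes are infrequent and not time sensitive, a whole-file swap like this avoids any need for replication between servers; each one just reads its local copy.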
Well, and so what's interesting to me, and this is a lot of what I try to get across in the article, is I think what's changed in the last, you know, five, ten years of web development is what is hard and what is easy. Like, it's actually easy to distribute 100 megabyte SQLite database files. And it seems
weird in the context of web applications, but every mobile app you use silently does this all the time.
You know, my son plays Cookie Run: Kingdom, which is this incredibly complicated game of like
candy real-time strategy, social thing. Anyway, you know, the whole thing is a SQLite
database that it downloads of like, here's the king of the cookies, you know, and he has 20 hit
points, like all of that stuff. So it's just a standard in the mobile application world that,
you know, we've just not picked up on the web side.
Yep, yep. So I think that's super interesting. And one thing, when I started to read
it and saw, hey, you know, SQLite was three times faster, I'm like, okay, that kind of makes sense
because it's not making a network call out to Redis and all that stuff. But then as I
reread it, that test was even with a local Redis, so it's
not even, you know, some network hop to a different database, which kind of surprised
me a little bit. I guess I have a few hypotheses. Do you have any hypotheses,
like what are your thoughts on why local SQLite beat local Redis without even
the network hop aspects there? Well, I think first off, let's not discount: maybe I did something
really bad in the benchmarking. Like, honestly. No, no, it's true. Like you
mentioned in the thing, benchmarking is really hard because you have to be expert
in sort of everything you're benchmarking, or else you're not... it's hard. But yeah. Anyway, go ahead.
Yeah. Well, and that's sort of the starting point of it is that like, if, if I, as the person who's worked extensively with Redis is writing these incredibly complicated Lua scripts in it, we had an official partnership with Redis as well, like signed, you know, partnership with Redis.
If, if we can't get this Redis to be correct in a reasonable way, what hope do our users have?
So that's part of it.
I think from a technical standpoint, and there is a really big Hacker News discussion about this.
And in there, a lot of people with, I think, even more knowledge than I really came to the conclusion that what's happening is that the Redis connections are still over sockets.
So it's like serializing, deserializing this into Redis's protocol to do the queries.
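To make the serialization point concrete, here is a small sketch of how a command is framed in RESP, the Redis wire protocol: an array header followed by length-prefixed bulk strings. The helper name `encode_resp_command` is made up for this example, and this is not how any particular client library is implemented; it only shows the kind of encoding work that happens on every query, even against a local Redis.

```python
def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument is sent as: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# A lexical range query like the ones discussed later in the conversation.
# The key name and range bound are placeholders.
wire_bytes = encode_resp_command("ZRANGEBYLEX", "ip-ranges", "[0167772165", "+", "LIMIT", 0, 1)
```

An embedded SQLite call skips this step entirely, since the query runs in the same process as the application.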
And what I was testing, like, so a little bit of background. So the big thing that we have to deal with, like the worst, most,
most pathological thing we have to deal with is really trying to figure out if an IP address is
in a range. And so I don't know if you've worked with IP addresses much. You know,
not enough. So you'll have to educate me here for sure. So here's, here's the two challenges. There's two sets of IP addresses. There's IPv4 and IPv6.
Fair enough.
And the naive way to handle this and the way we started was, okay, well, this is great.
We'll take the IP addresses and we'll take them and we'll take the range and we'll split it into integers.
And then we'll see if it's between these two integers.
There's a bunch of inherent issues with that with IPv4, but they can be dealt with.
And but let's look at IPv6, which is much bigger.
And then the integer you get back is bigger than a bigint.
And so that's a that's a problem.
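The naive integer approach and the IPv6 size problem can be sketched with Python's standard `ipaddress` module. The function names here are made up for illustration; the 64-bit limit shown is the signed range of a typical database bigint column.

```python
import ipaddress

SIGNED_64_MAX = 2**63 - 1  # largest value a signed 64-bit bigint can hold


def ip_in_range(ip_text, first_text, last_text):
    """Naive check: convert all three addresses to integers and compare."""
    ip = ipaddress.ip_address(ip_text)
    first = ipaddress.ip_address(first_text)
    last = ipaddress.ip_address(last_text)
    if not (ip.version == first.version == last.version):
        return False  # comparing IPv4 against IPv6 integers would be meaningless
    return int(first) <= int(ip) <= int(last)


def fits_in_bigint(ip_text):
    """True if the address, as an integer, fits in a signed 64-bit column."""
    return int(ipaddress.ip_address(ip_text)) <= SIGNED_64_MAX
```

Every IPv4 address fits comfortably in 32 bits, but an IPv6 address is 128 bits wide, so most of them overflow a bigint. Python's own integers are arbitrary precision, which is why the comparison works here but not in a 64-bit integer column.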
And what you find out when you really dig into this, if you look at like the IP lookup,
the IP handling libraries and all this, it's really like a kludged together set of like optimized regexes and like heuristics for pulling apart the IPs into this different stuff.
So originally in Redis, what we did was we used a sorted set, which is usually used for leaderboards.
So you have like a key and then you have a value
and the value has a number.
And as that number changes in increments,
it automatically reorders it.
So you can have millions and millions of people in this
and it's very easy to take out a slice.
And it's like, oh, well, Alex, you're 500th.
And so here's the next 50, that kind of thing.
If you take that sorted set
and make everything the same score, it then reverts to doing what's
called a lexical index which is basically just like alphabetizing it so we take the ip addresses
make them into integers zero pad them and then add uh like a dash and like a number to them that indicates like, oh, this is the country. So
Argentina is like a one. And so we'd have a range that's two entries in that sorted set and then do
a range and a reverse range on the same number. So it's really like two queries
for that IP address. And then if those rule numbers, that's the suffix on it are the same, we know it's in
that range. Otherwise it's not, it's not even real simple to explain. And a lot of people were
speculating on better ways to do it with like bit fields and stuff. I would love if there was a
better way. I tried so many things. I've had so many people like nerd snipe me over this.
So anyway, so we implement that in Redis and we implemented the same thing in SQLite because we couldn't still figure out a better way to do it.
And so SQLite is just faster for this weird query.
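Here is one way the lookup described above might look in SQLite, as a sketch rather than Wafris's actual schema. Each range is stored as two text keys (zero-padded integer, a dash, then a suffix), and a lookup runs two ordered queries around the probe address. The table name `ip_ranges`, the helper names, and the choice to give every range its own unique suffix are assumptions; the sketch also assumes ranges do not overlap.

```python
import ipaddress
import sqlite3

WIDTH = 39  # decimal digits needed for the largest 128-bit IPv6 address


def key_for(ip_text):
    """Zero-padded decimal form, so text ordering matches numeric ordering."""
    return str(int(ipaddress.ip_address(ip_text))).zfill(WIDTH)


def build_db(ranges):
    """ranges: iterable of (range_id, first_ip, last_ip), assumed non-overlapping."""
    conn = sqlite3.connect(":memory:")
    # The primary key on a WITHOUT ROWID table is the B-tree the lookups walk.
    conn.execute("CREATE TABLE ip_ranges (entry TEXT PRIMARY KEY) WITHOUT ROWID")
    for range_id, first_ip, last_ip in ranges:
        for ip in (first_ip, last_ip):
            conn.execute(
                "INSERT OR IGNORE INTO ip_ranges VALUES (?)",
                (f"{key_for(ip)}-{range_id}",),
            )
    conn.commit()
    return conn


def lookup(conn, ip_text):
    """Return the range id containing ip_text, or None."""
    probe = key_for(ip_text)
    # Nearest entry at or below the probe. "~" sorts after "-", so an entry
    # with exactly this address is included.
    below = conn.execute(
        "SELECT entry FROM ip_ranges WHERE entry <= ? ORDER BY entry DESC LIMIT 1",
        (probe + "~",),
    ).fetchone()
    # Nearest entry at or above the probe.
    above = conn.execute(
        "SELECT entry FROM ip_ranges WHERE entry >= ? ORDER BY entry ASC LIMIT 1",
        (probe,),
    ).fetchone()
    if below is None or above is None:
        return None
    below_id = below[0].split("-", 1)[1]
    above_id = above[0].split("-", 1)[1]
    # Matching suffixes mean the neighbors are the two ends of one range.
    return below_id if below_id == above_id else None


conn = build_db([("1", "10.0.0.0", "10.0.0.255"), ("2", "192.168.0.0", "192.168.255.255")])
```

Storing the padded integer as text sidesteps the bigint overflow for IPv6, at the cost of longer keys. A real implementation would also need to keep IPv4 and IPv6 keys apart, which this sketch does not do.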
So yeah, yeah.
Those are going to be my two guesses.
Like, hey, you still have some network serialization, deserialization, even for local Redis.
And then, like, Redis has some great data structures, but not the perfect data structure for this.
And just reading some of your responses in that Hacker News post about how you, yeah, used a sorted set with zero values.
Like, I was like, oh, I wonder how a sorted set works there with just a giant blob of all the same scores and things like that.
Whereas this is like a perfect B-tree application in SQLite for that sort of thing. So yeah, that was super interesting. But I think,
again, just going back to thinking about your specific needs: hey, you had kind
of this weird access pattern that's a little tricky for Redis. Also, you know, needing to make it
easy for your clients, being able to distribute it sort of asynchronously, and updates and
things like that just make it a nice fit for SQLite.
So I thought that was super fun.
Some people I know like ask like, hey, why didn't you just use a flat file?
Do you think about that at all?
Oh, yeah.
So I mentioned like, oh, here's all the stuff we're doing.
So we have to maintain different lists of that.
And the big ones are we have to maintain both a GeoIP database that, you know, you imagine that's huge,
that's millions of records. And then we also have through a whole mix of sources,
because we run this other enterprise WAF, an IP reputation database, that's also millions of
records. So, so if you imagine a JSON file,
you know, or whatever kind of flat file,
well, with millions of records in it,
well, now you have to build an index on it
and then sort of now you've built SQLite.
So we gave up on that pretty quick.
Yeah.
It's like people are like,
why not a flat file?
And it's like, well, SQLite almost is a flat file.
Plus they've got a bunch of nice stuff on top of like,
why would I recreate compression and indexes and whatever they're all doing
there?
This isn't,
this isn't me.
I know there's been a couple of attempts,
like for people that are doing stuff where they want to have like,
um,
like for Figma or design applications.
And they'll use SQLite as the data format for like the export and stuff.
Like it's just such a cool little database for stuff like
that. Yeah, for sure. Just a side note on Redis: there's a note you mentioned in
there about being at Rails World 2023 and feeling like some blood in the water about Redis.
Like what's going on? Is that about the license change? Is that about just, I think, more
vibes towards running simpler architecture? Or like, what was that about?
So I think it's a mix of things and it's interesting.
So I was just at Laracon, the first Laravel conference I'd been to, and was at Rails World,
and Rails World is happening now for 2024.
And there's a real divergence where, you know, DHH and the Rails world is like, you should
run it yourself.
And I think, you know, if that's your goal, what you want to do is strip out all the complexity,
the same way we did with, like, yeah, you can get rid of Redis and use SQLite, it's a lot simpler. And
that's sort of where they were going as well, I think, which is trying for a
simpler architecture. They just got rid of Traefik as an ingress controller and have Kamal Proxy now.
So it's a simplification of it, and SQLite, I think, really fits with that story well. Yep, yep.
Interesting. We were talking a little bit before we started. One thing I love about
the post is it felt like it was written by a human. It didn't feel like something that AI
could create. It had some life and just a little bit of soul to it.
I feel like I can feel your personality coming across, which is great.
You mentioned another project you're working on, Knowatoa.
Tell me about that.
Yeah.
So, you know, in general, I'm a man of many side projects.
And the side projects spawn side projects and things.
So this actually came out of a discussion I was
having with some other SaaS owners, which is everywhere you look, you know, like, especially
developers, I don't know about you personally, but the rise of AI tooling for development has
absolutely murdered the number of Google searches I do in any given week, like just massively. And what does that mean
for people finding my SaaS, you know, in these? So if you go into ChatGPT and you look for like,
what's Wafris, like, does it have the right information? Does it do all this?
And so I had started with literally a, it was a Ruby script that started going through the APIs and would
like look for these sorts of things and then spit out a CSV. Then I had like a half dozen other like
SaaS owners and SEO people. And I would like every week I would send them this CSV, like, Oh, hey,
I reran the numbers on this new model and just got a little bit of traction and interest, um,
from that. And then I started reaching out to a few like SEO agency
people and they're like, oh, this is great. Not because there's a huge amount of volume of this.
And it's really tricky to see because it doesn't have attribution, but they really wanted to,
it's something that's real. Like, I think that's the bottom line is there is definitely more people
looking in these systems than we know. And so Knowatoa is now a SaaS that lets you go to,
you can go to the site and put in your domain and it will figure out your competitors and it will
figure out like, oh, here's looking at all the actual keywords for your site. Like it pulls like
a thousand keywords for your site.
Then it says like, well, a lot of these are informational.
Like, hey, what's a proxy server?
What's an XYZ?
AI will just answer that.
So we're not even going to worry about those.
And we're just going to focus on these like high intent questions.
Like what's the best XYZ in Dallas?
You know, that kind of stuff.
And then it gives them, and then it does all the math and gives you a nice report of all that information. I'm like, oh, well, this is doing better here. Your
competitors showing up better here. And we're still working on this, but internally I'm tracking
a lot of like data source stuff and a lot of tying back to the security. A lot of the things for
optimization mirror, a lot of the things you would do with LLM security, like trying to figure out data sources and trying to sort out which bots are hitting my site.
Like, can all the AI bots scrape my site?
Like, is that good or bad?
You know, all those sort of different things.
Yep.
Interesting.
Have you seen, you know, tracking this over time, have you seen big differences in how like Wafris, for example, is showing up in in these different like as the models upgrade and update their information and things like that?
Oh, yeah, absolutely.
I mean, I think a misconception and this is also this is how I learn about stuff like, you know, Wafris.
There's a lot of things that are maybe AI tangential, but not really AI straightforward.
And it seems like such a big thing.
How do I learn about it?
I do side projects, you know.
So something I came to realize is we talk about these models like GPT-4, and I think we think of that as like a block, but it's not.
And they're constantly tweaking and changing things behind the scenes. And if you
want, if you're using the API, they actually have a model that's just
called like GPT-4 latest. You just call that and you get whatever you want. And that kind of makes
sense. Like occasionally, I don't know if you remember, there was like a couple days where
ChatGPT was like a crazy uncle, like no matter what
you put in, it just came back with like insanity. And then they're like, Oh no, no, no, we reverted
that. We fixed that. So there's a lot of that happening. And yeah, I think it's, it's really
hard to know right now. Like it's just hard to know, like, is the information in there correct about my company?
Is it doing this? And like I mentioned, I had talked to a bunch of SaaS founders,
pretty much at this point, all of them have said, yeah, I talked to someone who said they found me
in ChatGPT. And the pattern is they do research in the LLMs and then they do a navigational query in Google.
So they look up, hey, what's the best transcription software for this?
And it comes back with a bunch of names, like, great.
And they put that name in and they go to the website.
But yeah.
Whether through Wafris or through other ones that you're tracking, I guess, like, do you see some marketing strategies or things like that in non
LLM world,
whether it's SEO or new posts and things like that and see like,
Oh,
now there's a lift in them getting mentioned in some of these,
these queries in LLM.
Like what is the best web application thing for rails or something like
that?
Is there stuff you can do in the non-AI world that's having an impact there, or is that too hard to track? Oh yeah. Well, so a way to track that is,
and it's, you know, I think it makes sense, which is you can look at the difference between what's in
the search results page and what's in the results that come back from the LLM. So you can see, oh,
there's some new and this is an opportunity. It's an opportunity. Like
if you're in a really competitive space, it's hard to break in on the SEO side. Maybe this is
a chance to break in on the, I call them AI search services because they're kind of blurry
on the AI search service side. And definitely like some of the folks who have been much more
active on Reddit, much more active on like forums and things like that.
And I think another piece of this is, if there's a piece of advice, LLMs are very straightforward,
I think, compared to the traditional search. So if you don't have something that says like,
yes, Wafris is the best web application firewall, the LLM won't know that you're the best application firewall.
So, because a lot of that, especially in the business world, like, you're almost trying to imply all these things.
Yeah.
They aren't real subtle.
So, yeah.
Yep, that's interesting. I want to talk a little bit about web application firewalls generally, because you have some deep expertise in this area. And I guess, how did you get interested in this idea?
[...] development, cybersecurity work, and marketing.
Like those are the three sort of things. And so I've done a lot of different permutations of all
of those things. So. And was one of those first? Like, did you start as one and start to like add
more skills on, or were you just like, hey, just a born indie hacker from childhood?
Yeah, certainly I started as a software developer,
you know, learned web development, and then just by the nature of the clients I had, which was a lot of
education, and my background, like when I was originally doing corporate stuff, really came in
working for the Navy, working for hospitals, working for like 3M Health Information Services, like very regulated
environments, very, you know, kind of different things. So yeah. So like at the Navy, I had
developed a Rails app that helped with Navy personnel when they're leaving the service,
they have to get like a comprehensive list of all of the injuries and health issues that they've had.
And this is a huge pain for the doctors to generate this because it's just paperwork,
but it has a huge impact on the lives of those people because if they don't have everything
listed, then they don't get benefits and they don't get supported post Navy. And so I wrote
the system that like extracted and put all this together in a real like early form.
And it was great for that.
You know, it was like a real positive.
But, you know, it has to exist in this very secure environment.
It has to meet all these things.
So a lot of this came out of that.
I had, so early in Heroku's lifespan, they came out with the Platform API. And while it was still in beta, I had noticed that
they had a method of programmatically applying SSL certificates. And so this is way before
Let's Encrypt stuff. So that was really my first big win, was I developed a system for automatically
producing those and installing them and getting it to work. And so I still run half a dozen add-ons in the Heroku marketplace.
And then the biggest of those is Expedited Web Application Firewall,
which really came out because Let's Encrypt came out and was a much better
solution, both, you know,
from a cost standpoint as well as like a user experience standpoint.
And I think maybe I took those lessons to heart with the web application
firewall pieces.
So yeah.
Interesting.
Okay.
So I guess like maybe even just educate me on WAF generally,
I guess like how bad is this problem?
Why is it needed so bad?
Are there just like bot armies out there just scanning all sorts of,
you know, like you're saying, IP addresses and web addresses and everything?
Yeah.
I think there's, there's an external and an internal reason for it.
Most of the people that I talked to really fall into three buckets.
One, they're trying to get like SOC compliant or some other compliance thing, and they need
a web application firewall sort of for a checklist.
And those people are very reluctant customers.
But I've found that oftentimes they become a lot more enthusiastic after it's installed and they're actually using it.
Because really all a WAF is is a toolbox.
And I think as developers, we love our tools.
We love like adding in new libraries and doing all this stuff.
And so that's the one group is people trying to get compliant.
The other group is, I'm a small SaaS,
and I'm trying to interact with this bank. I'm trying to interact with this bigger company
where I just got acquired and they looked at what we're doing and then freaked out and said,
like, you need to get a WAF immediately because you don't even know if you're under attack.
And the third group is just, oh boy, we looked in our logs and saw a lot of weird traffic.
And again, it's mostly bots.
So I had a company that was an HR, like if you have an incident inside the company, you can anonymously report it to this external SaaS.
And they sort of handled the complaints and things.
And they found out a third of their traffic of the number of requests to their website was from China.
They only do business in the U.S. And it's not even to say like, this is a malicious
Chinese cyber attack that was targeting them. No, it's just these bots that are launched from China.
Some of them are like even search engine bots and all these things, but all it takes is like
a little misconfiguration for like all that data to be out there.
Yeah.
Yep.
Interesting. And so I guess when people sign up for a WAF, is that almost always about blocking traffic that's not true valid traffic? Or is it like, you know, maybe they're making some sort of developer tool and they need to have like some actual rate limits on people that are hitting their API in like a programmatic way that could
shut down their application? Like, is there much use of that? Or is it mostly like, hey, you know,
just like truly invalid traffic that we don't want to have happening?
It's truly invalid traffic. I mean, it's not a good solution for doing API-based rate limiting. It's a much better solution for doing things like,
well, again, I mentioned it's like a toolbox.
And so the toolbox is we,
and especially with Wafris,
we've really tried to collapse the loop of all this stuff
where you can certainly look in your logs.
And so people do this.
They take their logs and they ship off all their logs
to like some massive Elasticsearch cluster.
Then they write like these complex queries to dig through it. And they're like, okay,
we found out last week that there's these many IPs. And like, why is one IP address,
like 90% of our traffic, like that kind of stuff happens. So we have a visualizer that tries to
make sense of all that stuff. And then you can one click block it. You can also like,
you know, try to do some programmatic things, but that's a lot of it. And the other things are,
there's a lot of configuration stuff. So you can block countries, you know, which is very useful
just to reduce your surface area. We had someone who had to block Canada, and there's no like real
thing with Canada. It was just like there were proxy servers in Canada that people were using to attack their site. We often find that's the case. Um, yeah, so things like that, like
very, very practical sort of stuff. So.
Yep. How sophisticated are, I guess, the bad actors? Is it pretty sophisticated stuff, or what's that look like?
It really depends. I mean, for sure there's stupid bots out there that do like /wp-admin
to every Rails site and every, just, you know.
I've seen those for sure.
Yeah, for sure. Well, and that's because there's a lot of that out there. But those bots, there's still enough intelligence
to them, because they're trying to be efficient. Like imagine you were writing a malicious bot, like,
well, I'll check for the most common thing, but whatever the headers come back with,
then I can do a follow-up too.
Oh, interesting. Yeah. Okay.
So I see a lot of that, and we see
a lot of those IP addresses that are making those things act as reconnaissance, where there's,
you know, this sounds like I'm making it up like a Cold War thing, but really Eastern
Europe has a lot of data centers
that they're like, yep, sign up for us.
We're cool with spam and malware.
And they change IP addresses.
But if you get hit by a bot with that wp-admin thing,
you know like, oh, well, this IP address is making this request.
That's bad.
So we're going to block it
And so then the next thing that it comes out with, you can block that as well. So, yeah.
Yeah, okay, that's pretty interesting. Um, just to kind of close the loop, I guess, like Wafris, you have some
sort of centralized thing somewhere where people actually tweak their rules and all sorts of stuff.
Yeah.
Um, I guess like tell me a little bit about your architecture there. I imagine you're not using SQLite as your database
for your centralized thing.
What are you using there?
Well, that looks much more like a normal Rails app or whatever.
And hub.wafris.org, you can sign up for a free account and go in.
And if you're on Rails, you automatically get the V2.
And so if you have the gem installed, it really just looks for an API key.
It functions very similarly
to like exception monitoring services.
So like Sentry or any of those,
like you set up a key
and it sort of handles things.
Yeah, and it would just work.
You can go in, see like,
oh, most of our requests are from,
you know, Buenos Aires.
We don't do any business there.
That's sure weird.
So we're going to block that, you know.
Yep.
Yep.
For that visualizer aspect, which we're talking about a fairly high amount of data, sort of analytical queries, very ad hoc and interactive for users.
I guess, like, what are you using for that?
Has that been a hard problem technologically to solve?
We're using Redis.
Oh, wow.
Okay.
We're using Redis on the server.
So, okay.
So is that like pre-aggregated
and like you're just sort of
aggregating in different ways.
So there's like some flexibility
on how you slice and dice,
but not like sort of unlimited
SQL querying capabilities.
Well, at the end of the day,
these are log files, you know,
so there's only so many ways
to chop them up.
And there's only so much data
we're actually taking in. So, you know, grouping on, cause these are, this is weird.
Most people, I don't think have ever seen a list of like, here's your top 10 IP addresses that have
made requests to your site in the last week. Like, yeah, that's not a crazy thing to ask for.
That would be very useful, but that is a hard thing to get out of most of these systems.
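As a rough illustration of the "top 10 IP addresses" report described here, the following is a minimal sketch, not how Wafris builds it. It assumes access-log lines where the client IP is the first whitespace-separated field; the function name `top_ips` and the sample lines are hypothetical.

```python
from collections import Counter


def top_ips(log_lines, n=10):
    """Return the n most frequent client IPs as (ip, count) pairs.

    Assumption: the client IP is the first whitespace-separated field
    of each line, as in common access-log formats.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [01/Oct/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Oct/2024:10:00:01 +0000] "GET /wp-admin HTTP/1.1" 404 0',
    '198.51.100.4 - - [01/Oct/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 900',
]
print(top_ips(sample))
```

A single IP dominating this list is the "why is one IP address 90% of our traffic" signal mentioned earlier in the conversation.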
And this is kind of what I was going back to.
Part of our larger goal is just to raise the default, like the default level of security
we as application developers have with our applications.
More control, more security, because there's this horrible asymmetry to this where it's so easy to write
a bot. And like, we do this in my talks, like, hey, you want to write a bot? And it's basically just
a curl script. You just write curl and then you wrap it in a little bit of Bash and you just give
it a list of domains and it just goes out and hits all those domains looking for like, oh, did
you leave your .env file in the root? Like you
might not, but one of the 5,000 people we're looking at today did. So, you know, yeah, all that kind of
stuff. So.
Yep, for sure. Um, what about, you know, there's a new vulnerability that came out, I think
yesterday, CUPS or whatever, that had like this 9.9 out of 10 score, or like Heartbleed. How quickly
after those vulnerabilities are released
do you start to see bots trying to exploit that?
Is that like immediate?
Like,
is that within a day?
Oh yeah.
Yeah.
Um,
So Heartbleed was an SSL vulnerability.
So you had to update your TLS libraries and things.
Um,
I think a better one is Log4j.
I don't know if you're familiar with that one.
Yeah,
sure.
Yeah.
So Log4j was really bad. For people that aren't aware of it,
the Log4j exploit targeted a Java logging library.
And basically, you could very easily write in commands that would then be executed by that service.
So you could write in like, hey, post out all the data from the log files to this external server.
And you could actually go in.
So like, so I'm running a Rails server.
I'm serving up my images out of it.
I could go in and like go to logo.png and then put in a question mark parameter and then put my Log4j command in there.
My Rails app doesn't actually do anything, but I'm using a service that ships
this off to, you know, some other thing that does use Log4j. That service then pulls out all the
logs and sends it off. So yeah. So that stuff happened almost immediately and it was a huge,
it was a huge scramble, but you know, for Expedited, which is the enterprise service, we, that day, you know, started patching for everyone.
And everyone was just covered.
And that's a very managed service.
You know, with Wafris, you can do something very similar.
We help with that.
You can go in, like, the immediate response is, like, the protocol for Log4j was, like, JNDI.
So if you went in and just block, that's an unusual string.
Like you block any path that has JNDI in it.
That's not everything, but that's like, again,
you're cutting down the surface area of attack.
You're cutting down like the probes, figuring out that.
Oh yeah.
Like, yeah, it's easy to get to this, you know, site.
So, yeah.
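The "block any path that has JNDI in it" idea can be sketched as a simple substring rule. This is an illustrative sketch only, not Wafris's rule engine; `BLOCKED_SUBSTRINGS` and `should_block` are hypothetical names, and the sample URLs use a placeholder host.

```python
from urllib.parse import unquote

# Hypothetical rule list: substrings that should never appear in a
# legitimate request path or query string for this application.
BLOCKED_SUBSTRINGS = ("jndi",)


def should_block(raw_path_and_query: str) -> bool:
    """Return True if the percent-decoded, lowercased request target
    contains any blocked substring."""
    decoded = unquote(raw_path_and_query).lower()
    return any(s in decoded for s in BLOCKED_SUBSTRINGS)


print(should_block("/logo.png?x=${jndi:ldap://example.invalid/a}"))
print(should_block("/logo.png"))
```

As noted in the conversation, this is not everything: attackers used obfuscated forms of the lookup string that a plain substring match will miss, so a rule like this cuts down probes rather than replacing the actual patch.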
Yeah.
One thing you were talking about before we started here is just like how
security and developers, how they think about it. Hey, they learn this stuff within their application framework, but sort of once it goes over that boundary, it's just like, they don't have a sense of it or something like that. Could you dive into that a little bit and what you see, or like where devs could...
I think it's a misnomer that developers don't care about security. Like I hear that a lot from security teams, which is the security team being frustrated that like, hey, I'm trying to get these developers to fix this thing.
And they're coming back that like, is that really an exploit?
I tried to do this thing.
It's not fun stuff to work on.
And I think their takeaway is that developers don't care about this.
I think developers care a lot about the security because I think they're conscientious.
And part of making good software is making software that doesn't harm your users actively.
And I talked to a lot of developers.
I think it's mostly about tools, that we haven't given developers good tooling for this kind of stuff. And what I was really trying to get to was, you know, within the framework, like Rails or Laravel,
if you use the proper form stuff, you're very resistant to like cross-site scripting attacks
because there's a lot of stuff in there.
But you do have issues of, well, what do you do if like you're running an e-commerce site
and you're just being scraped?
Your whole site is just being scraped every item.
And then your competitor is just going through and marking all their items 10% lower, cheaper than yours.
And outbidding you. Like, that's not exactly, that's not a SQL injection that you're trying to deal with.
Like that's an operational issue or the big attack we see, like the number one,
like serious attack we see against SaaS apps by far is credential stuffing, which is that you just
put in a username and password and you try like over and over again with different ones. And
typically we see those attacks so large that they accidentally DDoS the site.
And so a very common response to that is, okay,
well, let's put in rate limiting. And so you put in rate limiting and you say like, well,
no more than like five attempts an hour. And you're like, that sounds very reasonable,
except that the attackers for a couple bucks are buying thousands of these proxy servers.
And so they have all these different IP addresses. So they
have a thousand IP addresses. Well, that's 5,000 attempts an hour. That doesn't really solve the
problem. That's still quite a bit. So, you know, you need these other techniques to then say like,
well, we can block these proxy servers because we know these IP addresses have been used in bad ways.
That's good. Block these from another country, block them on user agent, all sorts of things.
So, and then that's how you deal with those attacks along with rate limiting.
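The arithmetic here (five attempts an hour per IP, times a thousand proxy IPs, is still 5,000 attempts an hour) can be checked with a small simulation. This is a sketch under stated assumptions: `PerIpRateLimiter` and `simulate` are hypothetical names, the limiter is a plain sliding window, and the `known_bad` set stands in for whatever IP reputation data a real deployment would use.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most max_attempts per IP per window."""

    def __init__(self, max_attempts=5, window_seconds=3600):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps of allowed attempts

    def allow(self, ip, now):
        q = self._attempts[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_attempts:
            return False
        q.append(now)
        return True


def simulate(num_ips, tries_per_ip, known_bad=frozenset()):
    """Count login attempts that get through within one hour when an
    attacker spreads tries across num_ips proxy addresses."""
    limiter = PerIpRateLimiter()
    allowed = 0
    for i in range(num_ips):
        ip = f"10.0.{i // 256}.{i % 256}"  # made-up proxy addresses
        if ip in known_bad:
            continue  # blocked outright by reputation, before rate limiting
        for t in range(tries_per_ip):
            if limiter.allow(ip, now=float(t)):
                allowed += 1
    return allowed


print(simulate(1000, 20))  # 5000
```

Passing a `known_bad` set covering most of the proxies shrinks that number sharply, which is the point being made: rate limiting helps, but only alongside blocking by IP reputation, country, or user agent.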
Yep.
Yep.
Gotcha.
Is there a lot of, just like working with Wafris and Expedited WAF, just like education you've
had to do for customers and things like that?
Do you feel like that's like a big part of your job?
Yes and no. I mean, certainly the customers that are like mandated to get a web application
firewall less so because they're just like desperate for stuff. But I think the education,
the best analogy is a toolbox. That's all this is. It's not magic. It doesn't do some super
secure thing. It just gives you the ability to like, when there is an issue, actually take care of it in a reasonable way. In a way, like I, I, I have had multiple calls
with people like two in the morning, their time, they've been up for 24 hours. Their site has been
down that whole time. And they're just desperate. They're like, and that's not a good time to be
like, well, which of these different
enterprise framework options should we choose? Like, it's just hard. So again, and this is
Wafris is our reaction to all these problems. It's like, it's open source. It's, you know,
in Rails, it's just a gem. It's a library. You install it. It's there. If you need it,
you can check your traffic. You can, you know, see where this stuff happens.
So, yeah.
Yeah.
Yeah.
I want to talk about Rails a little bit because I have never written any Rails, but I like to do just sort of like vibe checks of where we're at.
And it's hard to tell on Twitter because there's so much stuff.
But it seems like there's a bit of a shift back towards like full stack frameworks and some of that stuff.
And like, hey, some of the craziness of full-stack JavaScript was not worth it.
And actually it's nice to have some of this stuff.
I guess, like, what have you seen in Rails or, you know, about the different full-stack frameworks that you provide for? Are you seeing, I guess, vibe shifts there, or what's that feel like the last couple of years?
It's hard to tell.
I mean,
honestly,
well,
and I think there is just so much application development that's happening.
It's hard to tell.
I do see a lot of very split applications
and we have a lot, probably 35, 40%
of the applications we protect with Expedited
are essentially like api.domain.com.
And they have a front end app.
Maybe it's on Vercel or Netlify or whatever, and
then the back end is on Heroku, and that's, you know, doing all the heavy lifting back-end work.
And still choosing, you know, it's the right tool for the right job, you know, kind of stuff.
Yep, for sure. Y'all had a good post on adding in Jumpstart Pro.
And I'm not familiar. I've heard of this a little bit.
And I just want to understand a little bit.
So as I understand, you had Wafris as a Rails application.
It was sort of like MVP or just like at least that early version of it.
And then Jumpstart Pro is like this sort of, I don't want to say template,
but like a toolkit that has like a lot of sort of opinionated Rails...
A starter kit.
There you go. And so you like bolted that on after the fact. I guess like,
tell me a little bit about like why, why you decided to do that, how that, how that went and
what that was like. Sure. So there's really two sides of it. One small team trying to be very
efficient and starter kits are a good way to do that.
It is the case that, you know, there's a lot of things out of the box.
You don't want to have to write from scratch, like for every single project,
interacting with Stripe, you know, doing all that stuff, managing plans.
And so, so why do it?
The other thing, and this may be unique to us, but I think, you know,
speaks a little bit to like the overlap of the marketing and development stuff, is that I think SaaS starter kits are a distribution channel for Wafris.
And that's part of the reason we're strategically open source, is that, you know, Jumpstart is a very popular starter kit.
They include Wafris in it.
So, you know, so people just get it out of the box.
Um, another one is Bullet Train.
Yeah, I've heard of Bullet Train. Yeah, okay.
Uh, they were acquired by
ClickFunnels. Um, so Andrew Culver's starter kit, and, you know, we're included in there. We're included
in some other ones, and that's just a distribution channel for us. Uh, and so, you
know, it makes sense to be familiar with and use the things that, you know, also promote us. So, yeah.
Yeah.
Tell me a little bit about, you know, with your background both as a developer and in marketing,
just like how do you think about marketing to developers? What's effective for that?
Well, I think it is a challenge. Uh, you know, a real challenge we have with Wafris is
that there's not, it's not often very legible, like who is really in need of this. Like I laid
out the scenarios and those are real scenarios. And something I feel very happy about is that,
you know, especially doing the security work is this is very positive work. It's very positive for developers, very positive for these sites.
It's not something that, you know, feels scummy in any way. And yeah, so I think I lost the thread
of the question.
Marketing to developers.
And so I think part of this is, so it's hard to target those people other than like
very broad things. So this article was an attempt, um, to reach more developers by just sharing like
what we're going through and our expertise and like our particular bit of this. Um, it was on
Hacker News. It was on a lot of the major subreddits for programming and SQLite and
stuff on Reddit, uh, and lots of shares, you know, things. So do a lot of that. And then
I try to do a lot of, uh, software engineering as marketing. I have a different site, For a Good
Strftime, which just helps you format date-time strings. Um, so that's been around for a long time
and it actually drives quite a bit of
traffic to the Wafris site. We have a link at the bottom of it. And I had originally made that for
a meetup for a talk, just like, Oh, here's how easy and quick it is to make something. And again,
one of those things that it's just been very useful to people. So trying to be useful, trying
to be helpful, make things I think are all directionally positive ways of doing
marketing. So.
Yep, yep. I love that, like, sharing knowledge stuff, you know, like just nerd sniping
people like you did with this thing where it's like, oh wow, going from Redis to SQLite and
getting faster. I think that like at least piques people's interest in getting a sense
of that. And then yeah, For a Good Strftime, we'll include that. But it reminds me of, you know, Auth0 had JWT.io.
And I think a lot of people just use that for different things.
And I think just like, yeah, like you're saying, having just helpful tools and then sharing your tool as part of it.
So a tool we have for Wafris is IP lookup, that you can go to the site, no sign up or anything, and put an IP address in.
And it will do a lookup of it of all the sort of different reputational information. And having used a lot of these
services, I think ours is a lot better. Like we do some other things to try to give more context to
it. Like we look up actually not just the IP address that's put in, but all the IP addresses
around it. Cause oftentimes that indicates like, Oh,
this is actually in one of those bot farms.
Cause everything in it is well known to all these block lists.
You can actually launch like a probe of that IP address from the website to
see if it has active like VPN and proxy stuff happening on it.
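The idea of judging an IP by the addresses around it can be sketched with Python's standard `ipaddress` module. This is only an illustration and not how the Wafris IP lookup is implemented: the /24 neighborhood size, the scoring formula, the function name `neighborhood_score`, and the sample blocklist are all assumptions.

```python
import ipaddress


def neighborhood_score(ip, blocklist, prefix_len=24):
    """Fraction of addresses in the surrounding network (a /24 by
    assumption) that appear on a blocklist of IPv4 address strings."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    listed = sum(
        1 for addr in set(blocklist)
        if ipaddress.ip_address(addr) in network
    )
    return listed / network.num_addresses


# Hypothetical blocklist: half of 203.0.113.0/24 plus one unrelated address.
blocklist = [f"203.0.113.{i}" for i in range(128)] + ["198.51.100.9"]
print(neighborhood_score("203.0.113.200", blocklist))
```

A high score for an address that is not itself listed would suggest it sits inside one of the "bot farm" ranges described here, which is extra context a single-IP lookup would miss.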
So, and that's, again, it's a free tool.
We get a lot of signups from that just because again, it's useful just out there. So.
Yep. Where do you get that IP reputation stuff? Is that something that you like buy from some
other service provider? Are you calculating, like, do you see enough traffic that you calculate it
yourself? Or like, what's that sort of look like? All the above. I mean, and that's really,
because I don't want to get you to give away your secret sauce or anything here if that's.
Yeah, but well, it's it's a Google away.
So it's not it's not that secret.
Certainly, you can go out and license this data.
I think the real part where expertise comes in is that some of these lists are more reliable than others.
And what you don't want to do is block people unnecessarily. So we've learned through hard experience
sort of how to filter out this giant list of noise and things.
So, yeah.
Yeah, yeah, very cool.
Mike, thanks for coming on the show.
This has been great.
Like, again, I love your post
and it's been great learning about Wafris
and all these different things.
If people want to find out more about you, about Wafris,
where should they go?
I'm still on Twitter.
Still reluctant to say that, but, you know, I'm still on Twitter mostly because of a lot of muted words.
I'm still on Twitter at mbuckbee.
And I am on LinkedIn, but who uses that?
And, yeah, you can go to Wafris.org.
Check it out.
Yeah.
Cool.
Mike Buckbee, author of Rearchitecting:
Redis to SQLite.
One of my favorite posts of the year.
Thanks for coming on.
Awesome.
Well, thanks so much, Alex.
This has been a delight.
Yeah, great.