Software Huddle - AI-driven Database Cache with Ben Hagan from PolyScale

Episode Date: November 14, 2023

PolyScale is a database cache, specifically designed to cache just your database. It is completely plug and play and it allows you to scale a database without a huge amount of effort, cost, and complexity. PolyScale currently supports Postgres, MySQL, MariaDB, MS SQL and MongoDB. In this episode, we spoke with Ben Hagan, Founder & CEO at PolyScale. We discuss AI-driven caching, edge network advantages, use cases, and PolyScale's future direction and growth. Follow Ben: https://twitter.com/ben_hagan Follow Alex: https://twitter.com/alexbdebrie PolyScale Website: https://www.polyscale.ai/ YouTube: https://www.youtube.com/@polyscale Software Huddle ⤵︎ X: https://twitter.com/SoftwareHuddle

Transcript
Starting point is 00:00:00 So yeah, you can really go from zero to high hit rates very, very quickly. So, you know, even if you're sort of purging your entire data set, it doesn't really matter. You know, you can get up very quickly. Caching is obviously notoriously difficult, but it did tick all of the boxes around being able to build a platform that could scale across different types of databases. I mean, the actual learning never switches off. Every query that comes through the platform feeds into the algorithm. But as I say, it's actually very efficient from a cold start. You get a third query, you'll get a
hit. Hey folks, this is Alex. In this episode, I talk with Ben Hagan from PolyScale, which is an interesting company, right? When you go to the website, it bills itself as this AI-driven cache. But the more I looked into it, the more I talked to Ben and figured out what's going on there, it's like there's kind of layers and there's multiple things going on there. It's got this smart, basically drop-in read-through cache that it does automatically, but it also has a much better operational model for caching. It's got this global network, so it gives you sort of edge caching at all these different locations around the world just sort of automatically.
Starting point is 00:01:02 It also just gives you insights into your query patterns and optimizations and where you're spending a lot of time in your database and how your cache is helping you. So I thought it was super interesting to talk with Ben and figure all that stuff out. Again, if you have any questions, comments, things you like, things you don't like, feel free to reach out.
If there are other guests you want to see on the show, let me know about them. You can find me on Twitter, Alex DeBrie. You can also find my co-host, Sean Falconer, on Twitter as well. So yeah, feel free to reach out. And other than that, enjoy the show. Ben, welcome to the show. Thanks, Alex. Great to be here. Yeah, absolutely. So I'm excited to talk to you. You are the founder and CEO at PolyScale, which is an AI-driven cache. I think it's super interesting because it's just an area I don't know and don't understand that much around what's going on there. So I'm excited to dig into some of the technical aspects around this.
But maybe tell us a little bit more about your background and PolyScale. Sure. So I guess PolyScale came from kind of a classic startup tale of living the problem. So, my background: I've traditionally been in kind of sales engineering and solutions architecture, going back a few companies, for sort of large data-driven companies. I did some time at Elastic, so Elasticsearch, and before that a startup called DataSift, and that was focused on mining the Twitter firehose and LinkedIn and Facebook data in sort of privacy-safe ways. And, you know, what I sort of observed at these companies was that getting data into the right locations, and being able to support the right access patterns, you know, it was complex and difficult unless you had the right teams of people and
methods for moving that data. So really, the pains of scaling your database and your data tiers in general were what drove PolyScale, or the inception of PolyScale. And when I set out, it was really a case of how can we make it really easy to scale a database without a huge amount of effort, cost and complexity from teams of people? And you've got the sort of traditional, you know, vertical scaling challenges and different access patterns, as I mentioned. But really, I arrived on caching as being that sort of layer for scaling those systems. So, you know, I did a lot of research on things like materialized views and, you know, would PolyScale be a read replica company? Is that the way to solve this? And caching is obviously notoriously difficult, but it did tick all of the boxes around being
able to build a platform that could scale across different types of databases. So, you know, that was a big win. But as I say, the focus that was, I think, quite unique, and is to this day, is that PolyScale is completely plug and play. So the idea was how easy could we make it? You know, how can we make it really trivial to plug this thing in and start scaling databases? And that's where PolyScale came from, as I say:
the whole plug-and-play ethos. Yep, absolutely. And some of those companies, like I love Rockset, Elastic. I'm sure being in those sales engineering leadership roles, you saw just a ton of really great use cases, data-heavy use cases, but still struggling, even with those great tools, like how to make these work with some of those volumes and things like that. So PolyScale, you mentioned plug and play. I guess, what is PolyScale? What gave you the idea for it?
Or how are you seeing people use that? Yeah, so taking a step back. So PolyScale is a database cache, specifically designed to cache just your database. Comparing it to a key-value store, something like Redis or whatever, where you can obviously cache pretty much any type of data, PolyScale is very much focused on databases specifically.
Starting point is 00:04:54 And there's really sort of three pillars that underpin it. The first one is that it's wire protocol compatible with various databases. So again, going back to the sort of plug and play, it's wire protocol compatible with Postgres and MySQL, MS SQL Server. We just released MongoDB, which is interesting for us, the first sort of NoSQL environment. But the idea that you could just update your connection string and start routing your data through Polyscale was really the goal. And taking a step back, architecture-wise, Polyscale is effectively a proxy cache
Starting point is 00:05:29 that sits between anything that's connecting to your database, like your web application, serverless function, whatever it may be. And yeah, it inspects that traffic that passes through. So being plug and play was kind of the first one. And then secondly, you know, caching gets really hard when you think about what to cache and how long to cache it for. So if you're sort of implementing caching at the application tier, you may select a specific query, you know, maybe break that off to be a microservice in its own right. Maybe it's a leaderboard query or something specific. And you may have a good idea around how long you want to cache that for. And maybe that's
something you can invalidate when you know that data's changing. So if the updates are coming through the application tier, you can easily write that logic. So the approach we're taking, however, is that we want to cache all queries that are good candidates for caching. So the fact that we are a sidecar proxy that sits alongside your application means that we can inspect all of the traffic. Every single interaction between the app and the database, we get a view of. What's cool about that is you can plug in complex applications that may run 50,000 unique queries a day through the platform and PolyScale can inspect those and work out what to cache. So the second sort of core principle here is that the AI caching engine, it inspects all of that traffic, as I mentioned, that goes back and forth, and it builds statistical models on every individual unique query. And that's incredibly
powerful because, as I say, you have the full breadth: any query that can be cached will get picked up and it will get added into the algorithm. So as I say, the goal of this is really to allow a developer to plug in large applications or full applications without writing any code, without doing any configuration, and automatically start seeing their hit rates go up, with it dealing with all the complexities of invalidation and all that good stuff. Then finally, the third pillar of the platform is that we have our own edge network. The whole idea, again, of the plug and play is you can connect this thing in and it will route the data through our infrastructure, and whatever
the closest location is to your application tier, that's where the data will be cached. So you get this sort of nice reduction in latency, or you can self-host PolyScale if you want to run that inside your own VPC. Very cool. I love that. I love that three-pillar approach. That first point, that plug-and-play is interesting. And it makes me think of like DAX. I use a lot of DynamoDB, and they have DAX, which is basically a pass-through cache on that one. We're starting to see a few more of these. But it's so interesting because it just lowers the barrier so much in how much work you have to do to integrate a cache. You're not manually changing all your data storage logic to check the cache first and things like that.
Starting point is 00:08:28 It's just totally passing that through. So that's pretty interesting. You mentioned you integrate with a bunch of databases. Are there particular databases that you're seeing more interest for? Did you see a lot of people asking for Mongo or what did that process look like? Yeah, so we kind of started with, we picked MySQL. This is going back a couple of years. We sort of said, look, at the time, it was really the biggest, most popular database. And it was a good place to start.
We knew the protocol, understood that pretty well. And then from there, I guess the Postgres interest keeps going up and up. And there's new awesome vendors coming out around Postgres. So that was our next database. And I guess taking a step back, we really focused on, or started to focus on, those traditional transactional databases to start with. Just where was that adoption? Where was the widespread adoption? And that's where we started. So as I said, we did MySQL, then Postgres. And for us, actually implementing a new database is a reasonable amount of work because of that protocol compatibility reason.
So I think the whole concept of asking people to install different ORMs or different drivers or different client libraries to interact with your tool is a burden and is an overhead. And you're always sort of competing with, you're always going to be competing with other libraries. And, you know, is there an ORM that someone's picked because of a specific feature? And you don't want to be having to replace that or sort of, you know, be in that fight. So being wire protocol compatible was really nice in that you just get plug and play, you know, and it runs across every language. It runs across TypeScript and Rust, doesn't matter.
Starting point is 00:10:11 But yeah, to answer your question, I think we really focused on just the biggest popular, most popular databases at the time and then moved from there. We did MySQL, MariaDB is obviously very similar protocol. So we did that. And yeah, as I say, more recently, we added support for Microsoft SQL Server. And that's really interesting because you see lots of cases where typically, excuse me, if you're in sort of the more edge-facing use cases, we don't see a huge amount of MS SQL Server in those types of environments.
Starting point is 00:10:48 And it's nice to actually be able to be the plumbing for those types of tools where people can now plug those in and use that data anywhere. And then, let's say more recently, Mongo's our first step towards... There's actually... We've got a pretty significant roadmap around where we want to take the whole paradigm of using caching to distribute your data, support those different access patterns. And Mongo is our first sort of NoSQL database. And then we're also looking to move into things like Elasticsearch, search infrastructure, and also data warehousing, things like ClickHouse and potentially Google BigQuery. So it's really following where demand is, really. And I think MongoDB and Atlas, there's a huge demand for Mongo,
Starting point is 00:11:33 high-performance distribution, and that caching layer is very useful. Yep, absolutely. So you mentioned the protocol compatibility and some of the work there. Are you, I mean, do you start from scratch every time or do you use like the, you know, the query parsing front end from like Postgres or MySQL or things like that? Or are you just like, hey, we're going to look at the wire protocol and parse that all in our own engine and handle it from there? Yes, definitely the latter. So, I mean, it is the latter. So our actual engine is written in C++.
Starting point is 00:12:02 So we have our own proxy that's written in C++. If you think about what we do, we're a middleman between your app and your database. If we added any latency to SQL queries, if we added any latency in that process, we're going to fail as a business. There's just no way that people would voluntarily have latency added to their queries. So we've worked really hard to make sure that that sort of inspection that we do of all the packets that come through is very low latency, zero copy buffering. And so we've written everything from scratch in that perspective. And that unfortunately means going right down to the wire protocol. But the nice thing is about what we do is we're not
sort of implementing all particular types of queries for every database. So what we do is we do implement the authentication handshake. So we manage that, and then that sort of flows. There's a handshake with the upstream database; that obviously allows somebody to connect into PolyScale, and that then creates an upstream connection to the database. But from that point onwards we're sort of just letting packets flow backwards and forwards. And what we do is we sort of inspect those to understand: are those read queries, are they selects and shows, obviously are they cacheable, versus are
they sort of manipulation queries or mutations, you know, inserts, updates, and deletes? And those mutation queries just flow through and hit the origin database. And the nice thing about that is you're not altering people's architectures. You're not sort of saying, you know, all your writes are going to end up in the same location. You're not having to distribute your database or shard your data in any way. So that resonates well with customers in that your writes are still going to where they always did. But if a query comes in that we see and it's cacheable,
Starting point is 00:13:55 we have that in the cache, then we'll serve that from there. So you're effectively getting a SQL-compatible key value store at the simplest level. Yeah, very cool. Okay, I want to dive a little bit deeper just on like some of the under the hood stuff, that protocol stuff is great, but keep going. So one thing you all talk about is sort of AI driven caching, right? And I know like AI is the buzz right now. That's right. That's like mostly LLM. That's LLM stuff. And I'm guessing
it's not an LLM under the hood that's doing your AI, you know, so. Correct. Correct. I'm sorry to say, but no LLMs here at the moment. Yeah, exactly. So tell me more about why is this a good... Cache invalidation is famously one of the hardest problems in computer science. Why is AI a good approach for this? How does that work? Yeah, so I think when we started looking at this, it was a case of every cache implementation that's sort of manually done, it's implemented at the application tier. Everyone's literally starting from scratch. And this was staggering to me.
Even a couple of years ago, it was like, well, why do we make it really hard for developers to implement caching? Because you're going down this process of deciding, okay, pick your key value store, Redis or Memcached or whatever it may be. That's the easiest part of the job, right? It's pick your tool. There's some amazing vendors out there of high availability cloud services. But then it's a case of, okay, I've got a blank canvas, and I start with modeling your data, you know, then you start building your application logic, and then you've got to work out, as you say, when do I invalidate? And in so many implementations I've seen, people just say, well, it's good enough, let's put a TTL of 10 minutes in there, and I'm either serving some stale data, and that may be fine, or I may just be getting misses and my hit rates aren't what they could be, but that's okay. I think lots of people settle in that world.
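To make the status quo Ben is describing concrete, here is a minimal sketch of that hand-rolled, application-tier read-aside pattern with a fixed 10-minute TTL. The query, key name, and connection details are invented for illustration; it assumes the redis-py and psycopg2 clients.

```python
import json
import psycopg2  # assumed Postgres driver
import redis     # assumed redis-py client

r = redis.Redis(host="localhost", port=6379)
pg = psycopg2.connect("dbname=shop user=app password=secret host=localhost")

LEADERBOARD_SQL = "SELECT player_id, score FROM leaderboard ORDER BY score DESC LIMIT 10"
TTL_SECONDS = 600  # the "good enough" 10-minute TTL Ben mentions

def get_leaderboard():
    # 1. Check the cache first.
    cached = r.get("leaderboard:top10")
    if cached is not None:
        return json.loads(cached)

    # 2. On a miss, hit the database.
    with pg.cursor() as cur:
        cur.execute(LEADERBOARD_SQL)
        rows = cur.fetchall()

    # 3. Write the result back with a fixed TTL. Invalidation is now this
    #    code's problem: readers may see stale data until the TTL expires,
    #    and every new cached query needs its own copy of this logic.
    r.setex("leaderboard:top10", TTL_SECONDS, json.dumps(rows))
    return rows
```

Every additional query a team wants cached repeats this pattern, which is the per-feature overhead described next.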
From the inception here, it was a case of, look, with those approaches, typically a developer will pick a handful of queries that are causing pain. They've either got really slow performance because of various reasons on the database, or whatever it may be, lack of resources or whatever. Or it could be the case of you've started writing your cache and you've got halfway down that process and you realize, well, actually, there's a lot of queries in here we need to solve for. And you get this ongoing burden as well where a new feature comes into the application. It then has to go into that logic as well. So you get this sort of ongoing... there's an observability task, there's a testing task, there's an ongoing overhead of, is my
caching working as expected with the platform? So having this sort of sidecar approach where you completely separate that logic from the app, it's completely independent. It doesn't touch the database. It doesn't touch the app tier. It's really nice in that it's a one-stop shop. You plug it in. Every feature then that you add to this benefits from that.
Starting point is 00:16:59 So the way we do this is every query that passes through, there's a whole bunch of inputs going to those models that get built on them. So the bigger and sort of more important ones are what we call, obviously, the arrival rates. So how frequently are we seeing the queries coming in? And then, obviously, how frequently is the data changing on the database? So we can look at the payload size that comes back and sort of compare that to say, well, is it changing? Then based on the frequency of that change, we can build up a
confidence score of how likely it is that this data is actually changing on the database. So if you think about it, you know, that's sort of the basics of the statistical-based invalidation, and the platform will set a TTL that it thinks is optimum based on those inputs. And that gives you some level of comfort and you'll get, you know, the data will invalidate based on that statistical model. Now, it turns out that humans want really good, fast invalidation that sort of, you know, is correct and accurate. And so what we do is there's a couple of additional things that we bring in. So there's a whole feature set that
we call smart invalidation. And what that does is it looks at when we see manipulation queries. So if it sees, coming across the wire, inserts, updates, and deletes, it will actually determine what data in the cache has been affected by those changes. And it will automatically remove that data from the cache. So that's really nice in that as a developer, you can say, there's an update statement coming in, it's updated, and it's affected a whole bunch of read queries that are already in the cache. Let's go and invalidate those.
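As a rough illustration of the bookkeeping involved (not PolyScale's actual parser), a mutation can be reduced to the tables it touches and any cached reads over those tables dropped. Real SQL parsing is far more involved; the regex here is deliberately naive.

```python
import re
from collections import defaultdict

cache: dict[str, list] = {}                                 # read query -> cached result
reads_by_table: defaultdict[str, set] = defaultdict(set)    # table -> read queries

def record_read(sql: str, result: list, tables: list[str]) -> None:
    """Store a read result and index it by the tables it was read from."""
    cache[sql] = result
    for t in tables:
        reads_by_table[t].add(sql)

def table_touched_by_write(sql: str) -> str | None:
    """Very naive extraction of the target table from a mutation."""
    m = re.match(r"\s*(?:insert\s+into|update|delete\s+from)\s+([\w\.]+)", sql, re.I)
    return m.group(1).lower() if m else None

def on_statement(sql: str) -> None:
    """Called for every statement seen on the wire."""
    if re.match(r"\s*(?:select|show)\b", sql, re.I):
        return  # reads are caching candidates, handled elsewhere
    table = table_touched_by_write(sql)
    if table:
        # Any cached read that depends on this table is now suspect.
        for key in reads_by_table.pop(table, set()):
            cache.pop(key, None)

# record_read("select * from products", [("widget", 9)], ["products"])
# on_statement("UPDATE products SET price = 10 WHERE id = 1")  # evicts that read
```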
The next request will be a miss, and that will then come back and refresh the cache with the new data. So we err on the side of consistency rather than performance in those use cases. And we also do that... PolyScale's completely distributed as well. So you can run one or more of those instances and those invalidations happen globally. So if you get an update that happens in, let's say, Paris in France, and you've got another node in New York, that's going to invalidate those globally. So that smart invalidation is really nice for the majority of use cases, actually, that we see, whereby people are just plugging in and they've got a monolith, maybe with some microservices as well, but
you know, all of that traffic's coming across the wire, so we can inspect everything. So you mentioned sort of preferring correctness and consistency over, like, top-level performance. Is that something that's tweakable? Like, maybe if I'm like Twitter and the like count is sort of always going up, erratic, you know, points or something like that, can I say, hey, only refresh this every once in a while? Or is that like, hey, you know, this is something where we believe in consistency, and for right now, that's sort of what's available? Yes. So as a default behavior, if you just connect up PolyScale, it's running in what we call auto mode.
So we use the AI to drive everything as default. You can, however, come in and override any part of that. So you can override, you know, you can set manual TTLs. So if you know, for example, look, I've got my products table and that just doesn't change, I'm going to set that to be cached for 20 minutes and that's perfect. You can come and do that. And you can go right down to the individual SQL query level. So within the product, you can kind of get this nice observability view of what are my slow queries, where are we being most efficient, and you can literally override any one of those. And you can do that down to the table level as well
if you've got sort of more fine-grained stuff than that. So, yeah, you can come in and set those up, whatever you want them to be, or you can just go with the full sort of auto approach. And what we've found, yeah, the majority of people actually just run in the auto mode without sort of any manual interventions, which is nice. Yep. Absolutely.
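A hedged, back-of-envelope sketch of what the statistical side could look like: key each query by a fingerprint with the literals stripped out (the "similar queries" idea that comes up later), watch how often its payload actually changes, and derive a TTL from that unless a human has set an override. This is purely illustrative and is not PolyScale's actual model or configuration API.

```python
import hashlib
import re
import time

manual_ttl_overrides: dict[str, float] = {}  # fingerprint -> TTL set by a human
observations: dict[str, list] = {}           # fingerprint -> [(timestamp, result_hash)]

def fingerprint(sql: str) -> str:
    """Normalize away literals so similar queries share one model."""
    normalized = re.sub(r"('[^']*'|\b\d+\b)", "?", sql.lower())
    return hashlib.sha1(normalized.encode()).hexdigest()

def observe(sql: str, result: bytes) -> None:
    """Record when a query ran and a hash of what came back."""
    observations.setdefault(fingerprint(sql), []).append(
        (time.time(), hashlib.sha1(result).hexdigest())
    )

def suggest_ttl(sql: str, default: float = 30.0, cap: float = 3600.0) -> float:
    key = fingerprint(sql)
    if key in manual_ttl_overrides:       # e.g. "my products table never changes"
        return manual_ttl_overrides[key]
    history = observations.get(key, [])
    # Timestamps at which the payload differed from the previous observation.
    changes = [t for (t, h), (_, prev) in zip(history[1:], history) if h != prev]
    if not changes:
        return default                    # never seen it change yet
    # Heuristic: cache for a fraction of the time since the last observed change.
    return min(cap, max(1.0, (time.time() - changes[-1]) / 4))
```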
All right. So I interrupted you. You were talking about invalidation. You talked about that first level of basically, like, row invalidation, individual updates, things like that. Yeah. So we've got kind of the statistical-based invalidation, which happens out of the box. We've then got what we call the smart invalidation that looks for inserts, updates, and deletes. And we keep an index of what do we have stored, what are the read queries we have in the cache,
Starting point is 00:21:19 and how do those get affected by those updates. And that really means parsing the query. We look at the rows, columns, fields, tables that actually get affected by those queries. Then finally, that works really well for most use cases with the exception of if Polyscale can't see those updates. If, for example, you've got some direct updates going into the database, maybe they're just cron jobs, imports, whatever they may be, and in those cases you can just connect a CDC stream into Polyscale.
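The shape of that CDC hookup might look like the following: consume change events (here a hypothetical Debezium-style topic read with kafka-python) and reuse the same table-based eviction as the earlier sketch. The topic name and event layout are assumptions, not PolyScale's actual integration.

```python
import json
from kafka import KafkaConsumer  # assumed kafka-python client

cache: dict[str, list] = {}            # same bookkeeping as the earlier sketch
reads_by_table: dict[str, set] = {}

def invalidate_table(table: str) -> None:
    for key in reads_by_table.pop(table, set()):
        cache.pop(key, None)

consumer = KafkaConsumer(
    "dbserver1.public.products",       # hypothetical Debezium change topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b) if b else None,
)

for message in consumer:
    event = message.value
    if not event:
        continue                       # tombstone record, nothing to do
    table = event.get("payload", {}).get("source", {}).get("table")
    if table:
        # A write happened outside the proxy's view; evict dependent reads.
        invalidate_table(table)
```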
And that's really cool, that you've got a feed going in of your real-time updates and that keeps the cache up to date globally. We're just working with a really interesting client actually who's doing that sort of... They're actually bringing in their GraphQL data into PolyScale using that method, a CDC stream for invalidations, to invalidate the cache globally. So those are the three methods that we use for keeping the cache real time and fresh. Yeah, very cool. And so, just to compare you to other things I've seen recently in the space,
like I would say one other one that came to mind initially was ReadySet, which is another like drop-in protocol compatible one. But they are more, as I understand it, like Noria data flow based, right? Where you sort of, like, define your queries you want cached, and they'll maybe just hook into CDC and sort of cache these expensive ones. Where you are sort of all queries and basically like an automated traditional read-aside cache that you don't have to implement yourself. That's right. So my understanding, you know, ReadySet is effectively... they're sort of rendering those result sets up front. And there's a few similar, I guess, platforms and tools, I guess, materialized view style implementations.
Starting point is 00:23:13 And there's definitely pros and cons of these approaches. And I think you're right. Sort of on that approach, you have to define what are those queries that you want to cache, which isn't uncommon by any means. But it's a case of the developer will select what those queries are and pre-render those up front. When I was very much in the R&D stage of the early stages of Polyscale, it was a case of the types of access patterns may not be known up front. And that was a real blocker for me around the sort of materialized views world. And this really does go back to now that we can plug in an entire e-commerce app and then just watch that start caching without doing anything, literally without
doing any configuration. So if you send a brand new query at PolyScale that it's never seen before, you'll typically get a hit on the third request. And then what it does from there is it compares queries that are similar to each other. So it removes the query parameters and it will say, look, if we've seen something similar to this, you'll get a hit on the second request. So the actual speed that you can go from nothing to kind of high hit rates is pretty impressive. Yeah. Is it like, you know, Java and a JVM, where it just, you know, takes a little while to get up to, like, bam, really hitting peak performance? And it's just sort of learning for a while as it learns your query patterns and access patterns, and then it really starts humming. Yeah. I mean, the actual
learning never switches off. Every query that comes through the platform feeds into the algorithm. But as I say, it's actually very efficient from a cold start. You get a third query, you'll get a hit. So, you know, the queries you care about, you're going to see quite regularly in a caching type environment. So yeah, you can really go from zero to high hit rates very, very quickly. So, you know, even if you're sort of purging your entire data set, it doesn't really matter. You know, you can get up very quickly. Yeah, very cool. What about just under the hood, actually handling the cache? Is that something like Memcached or Redis? Or do you have, like,
your own custom sort of caching, like a key value store, on those nodes? Or what's that look like? Yeah, it's our own custom one. We've been down a bit of a... there's history here. We started out with Redis, and we hit performance issues around just the speed that we could read and write with concurrency and things of that nature. So we ended up going with our own solution. And it's nice. We sort of share memory and disk, so predominantly everything is kept in memory, as much as can be kept in memory; we keep the optimum stuff in memory, the stuff that's hot. And then we fall out to disk if we need to, if stuff's overflowing.
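A toy version of that memory-plus-disk idea, only to show the trade-off being described (it is not how PolyScale's store is actually built): hot entries stay in an in-memory LRU, and overflow spills to disk instead of being dropped.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpillingCache:
    def __init__(self, max_in_memory: int = 1000):
        self.max_in_memory = max_in_memory
        self.hot = OrderedDict()                       # key -> value, in LRU order
        self.spill_dir = tempfile.mkdtemp(prefix="cache-spill-")

    def _disk_path(self, key: str) -> str:
        return os.path.join(self.spill_dir, key.replace(os.sep, "_"))

    def put(self, key: str, value) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Overflow: push the coldest entry out to disk instead of dropping it.
        while len(self.hot) > self.max_in_memory:
            cold_key, cold_val = self.hot.popitem(last=False)
            with open(self._disk_path(cold_key), "wb") as f:
                pickle.dump(cold_val, f)

    def get(self, key: str):
        if key in self.hot:
            self.hot.move_to_end(key)                  # keep hot things hot
            return self.hot[key]
        path = self._disk_path(key)
        if os.path.exists(path):                       # cold hit: promote back to memory
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        return None                                    # miss
```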
And what's nice about that is we don't have to worry about the size of the data set. So people can be running very large data sets and that works well. It scales well, and obviously you can have that data in different regions. And if you think about what we actually do, we cache the result of a SQL query or a NoSQL query, which is typically relatively small. I mean, obviously there can be larger payload sizes, but typically relatively small. And what's nice about that is we're not sort of a database read replica where we're taking an entire copy of your dataset. So we're just storing the result set. So we can actually store
large amounts of query results quite efficiently. So yeah, we store in memory and then fall back to disk if we need to. And that gives us a nice way to scale to very large data sets. Yep. Very cool. OK, now I want to talk about the global edge network, right? You have these sort of points all around the globe that people can hit and it'll serve it if it's cached there, it'll route it back to the database if needed. We're seeing a little bit
more of this with CDN-type things, with Vercel, Netlify, or hosting providers like Fly. This is pretty interesting to see as a caching provider. How hard was that to build? Walk me through that. What's that look like to build that sort of infrastructure? Yeah, so the actual... as I say, really, we have a proxy component. We have three tiers of our architecture. So if you think about it, the sort of bottom tier is what we call our private cache endpoint, or the proxy component. And that's the actual component that manages the TCP connections that pass through, or HTTP connections.
Starting point is 00:27:49 We support both. And that actually stores and persists the data. That's sort of the main proxy component. Now, we run that specific proxy component in multiple locations across most of the major providers, AWS, GCP, Azure, Fly, DigitalOcean. And that then connects back to the AI control plane. So if you think about that sort of proxy component that's just responsible for checking, have I got something in the cache? If I haven't, let's pass it on and
go get it. And if I have, I'll serve it from the cache. And it's kind of dumb in that perspective, because that's how you go fast, right? It's make it simple. And what it does is it then offloads whatever query came through onto the AI control plane, which actually takes the SQL and parses that and does the more expensive stuff, which we can scale independently of kind of that fast track. So the nice thing about that is we can spin up these different locations wherever they need to be, and they'll connect back to this AI control plane that parses and processes the queries. And then all that does is it sends back TTL messages to the proxies
to tell them, you know, that specific query now has a TTL of this number of seconds. And there's, you know, real-time traffic obviously passing through that process, but it makes it really easy for somebody to, or for us to, actually deploy these proxy locations, because they're just containers. We can run them pretty much anywhere, which is Kubernetes for us. So the actual work, more of the work, is in the high availability,
the uptime, the monitoring, the DNS around it. We have a single DNS network where it will resolve to the closest point of presence. And, yeah, as I say, I think what happens is if a PoP goes down for whatever reason, say there's a hardware failure or whatever it may be, we just fall back to the next closest point of presence. And that won't be the fastest one typically, but it's not going to yield downtime, which is the important thing. So that's as far as running it. And then just to be clear, so are those PoPs, the proxy layer, are they storing any cache data there or are they just routing it to the nearest cache layer, then?
Starting point is 00:30:12 So the proxy layer is actually what does the storage. And that's a single component that runs in all the different locations. And that does the TCP connectivity and it does the actual storage of the data itself. So they're relatively easy for us to deploy. We can roll those out in a couple of hours to a new location. We're doing that sort of based on where customers are asking us to be. And as I say, they connect back to the control plane to actually do the slower processing of the data.
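A very rough sketch of that split, assuming only what is described here rather than any real PolyScale interface: the PoP proxy stays on the fast path (serve from its local store or forward to the origin), and it reports what it saw to a control plane that asynchronously hands TTL decisions back.

```python
import queue
import threading
import time

query_reports = queue.Queue()   # proxy -> control plane: queries it has seen
ttl_updates = queue.Queue()     # control plane -> proxy: (query, ttl_seconds)

class EdgeProxy:
    """Fast path: answer from the local store or forward to the origin."""
    def __init__(self, origin):
        self.origin = origin     # callable that runs a query on the real database
        self.store = {}          # query -> (result, expires_at)
        self.ttls = {}           # query -> TTL chosen by the control plane

    def handle(self, query: str):
        hit = self.store.get(query)
        if hit and hit[1] > time.time():
            return hit[0]                        # cache hit, no origin round trip
        result = self.origin(query)              # miss: go upstream
        query_reports.put(query)                 # report asynchronously, off the hot path
        ttl = self.ttls.get(query, 0)
        if ttl > 0:
            self.store[query] = (result, time.time() + ttl)
        return result

    def apply_ttl_updates(self):
        while not ttl_updates.empty():
            q, ttl = ttl_updates.get()
            self.ttls[q] = ttl

def control_plane_worker():
    """Slow path: parse and score queries, then push TTL messages back."""
    while True:
        q = query_reports.get()
        ttl = 60 if q.lstrip().lower().startswith(("select", "show")) else 0
        ttl_updates.put((q, ttl))

threading.Thread(target=control_plane_worker, daemon=True).start()
```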
Starting point is 00:30:38 Gotcha. Is that control plane centrally located or is that distributed across a few locations as well? Yeah, that's actually distributed as well. So we've kind of doubled down on AWS specifically for that. And that control plane lives in AWS, but it's distributed across multiple locations. And the constraint there is that the control plane must be relatively low latency back to the proxy. So we don't want that hop to be too
high. We don't want the round trip to be too high. So we always make sure that there's low latency between those two. And then there's actually a third layer on top of everything, which is responsible for the global invalidations. So if we get an invalidation from one specific location, that gets a fan-out effect that goes out to all the others, and that's what gets managed there. Yeah, okay. And then, so can you give me a rough number of, like, how many sort of PoPs you have for the proxy, and roughly how many control planes you have? What do those numbers look like? We have 18 PoPs at the moment across sort of the major hyperscalers, and then we've got a bunch
running in Fly as well. But yeah, so around 20 sort of edge PoPs, and then a percentage of those are full control plane PoPs as well. I guess we've got six, seven that are sort of full-blown, that do the AI control plane as well. Gotcha. And is there any, like, proactive filling of caches? Or does it mostly just make sense that, you know, whenever a request hits a PoP, it reaches out to the control plane and fills it there, and it's unlikely to be hitting one of those other PoPs anyway, so you don't want to do that much proactively?
Starting point is 00:32:20 It's a really good question because one of the sort of challenges I saw prior to Polyscale was the sharding, the classic sharding problem. It's, you know, I need to have read replicas and what data do I put where? And, you know, you get to the point where the data sets go so big that you're spending more time sort of replicating out to its regions rather than actually serving the request. So it actually works really well. And at the moment, Polyscale will only store the data that's requested. So you kind of get this natural, lazy sharding that goes on.
Starting point is 00:32:51 It's the, you know, if I've got a lot of traffic being requested in New York, well, that's where it's going to get stored. And it's not going to get put in the other pops. So, you know, one of the things that, and that works really well, as I say, with very large data sets, you can service your audience really quite specifically and only optimize your compute that you're spending, you know, running those queries only at the locations
where they're actually being needed. So, you know, in the early days we did sort of look at, like, well, can we warm the caches in other regions, or should we? And I think there's diminishing returns there. I think you spend most of the time sort of shuffling data around the planet rather than it actually being used. Now, what's great is we do have visibility of all of that, so we can see what queries are being used where, and we can guess the likelihood that they're likely to be used in other regions. So I think we may sort of go into that area in a bit more detail. But yeah, at the moment we do nothing. It's very lazy and it's very sort of per region. Yep, absolutely. All right, I want to shift gears a little bit and get into, like,
pricing and operations, because I think that's another, like, interesting place where you all are innovating. So first off, just pricing and sort of operational model. It looks like a serverless model. I'm not paying for instances. I'm not paying for a certain amount of CPU and RAM and anything like that. Pricing is totally based on egress. Is that right?
That's right. That's right. And so at the moment, everything's based on egress. And as you said, it's nice in that you can scale to zero. So if you think about your classic e-commerce environment where it's kind of follow-the-sun, you know, busyness, different regions can scale right down and others can come up. So the serverless works well. And more recently, we've just launched our self-hosted version. So, you know, if, for compliance and security reasons,
you can't put your database data into a public cloud, you can run PolyScale inside of your VPC. And that's based on an event-based model. So you'll pay per million queries that get processed by the platform, because, of course, there's no egress that's actually happening there. So, you know, our cost is actually processing the SQL queries that come through the platform. And we do that at a price per million queries. But on the serverless offering, it's just egress. Yeah. And OK, I want to come back to the pricing and operations. But talking about
that self-hosted, we're seeing that. What does that self-hosted look like? If I want to self-host, am I actually running the command, setting up a Kubernetes cluster and doing that? Or are you putting it in my account and managing it for me, but it's in my account? Yeah, so there's two options here, depending on your security and compliance requirements. So the first one is you can just take that proxy component, and that's just a Docker image, and you can spin that up inside of your, you know, ECS environment and you're up and running. Now, what's great about that is that's still offloading anonymously all the queries back to the AI layer that we host, the control plane. So, as a company, if you're comfortable with having a sort of anonymous connection
going out and back to the AI control plane, all you have to do is literally spin up that single proxy component, or as many of those as you want. Like, you can deploy those into 10 different locations and have your own sort of mini edge network. And that's really nice because people can literally be up and running in a few minutes. You can just pull it down, start it up and route your traffic through it. If you're sort of a much larger organization or a bank or whatever it may be, then you want to actually host the control plane as well. So you want to take that control plane and run that internal to your organization as well. Obviously, there's more involved in that. But yeah, for most enterprises, it's really nice in that you can just pull down that single component
and you're good to go. Gotcha. And so then, I guess in both cases, they are actually running it themselves. They are deploying it. Like, you make it available and I'm sure easy for them, but it's not like... I've seen some models where maybe they created a separate AWS account, and now I, as a provider, run stuff in that account, but you sort of own the account and have visibility into it. But in this case, they're actually running it themselves somewhere. Correct. And sort of what works well there is
you know, enterprises have their own requirements around uptime of their database. You know, specifically, if you think about what we are: if we go down, your database goes down, and that's an incredibly important brick in the, you know, piece of the puzzle. So giving somebody the ability to own that actually allows them to put in whatever restrictions they need, or what structure they need around uptime. So whatever health checks they're happy with that are currently happening on their database, they can happen through the PolyScale proxy as well. And likewise, any sort of HA requirements, they might have a hot standby or whatever it may be,
Starting point is 00:37:48 it puts the onus on the customer. So equally, we can build out private SaaS environments, but if you want to run it inside of your own VPC, that's typically the model that we follow. Gotcha, gotcha. Okay, going back to pricing, because you're pricing on egress, and I've been thinking a lot about egress lately and talking with people. I guess,
why was that the right factor for you to be... just the right thing to be pricing on? Is it because it reflects your cost structure, or it just aligns with value for the customer? How did you settle on egress? Yeah. I mean, egress was really... we think about all of the data coming out of the platform. That, really, from a SaaS environment, is our cost. So that marries up nicely with our internal costs. So the two bits of the cost, really: obviously that proxy component is processing bytes on the wire that's coming out, and then we're actually processing the SQL queries that come through. So it aligns nicely with both of those.
Starting point is 00:38:47 And that's really the reason we picked it. I think there's always going to be customers who have a very small number of queries, very high amount of egress and vice versa. But the majority of customers sort of fall into a nice bucket somewhere in the middle. But yeah, it really sort of aligns nicely with our costs internally, to be candid. Yep, yep. I mean, I love having just sort of one factor to price on.
And I think, like you're saying, it aligns with your costs, but then also it's going to align with how much data they're actually messing with, and is a pretty good proxy for a lot of folks. Is that a hard mindset shift for people? I mean, first of all, just the serverless pricing generally, but then also specifically on egress; they may not have thought about egress before on their database. Like, is that a hard mindset shift?
Starting point is 00:39:34 There is a mindset shift there, definitely. And, you know, we're always thinking about, you know, different pricing models. And, you know, one option that we might look into is to, on the self-hosted model where we're pricing by the number of queries, that's an option to
roll out into the SaaS model as well. Because I think, for the reason you mentioned, finding out how much egress you're actually running out of your database is not a number that springs to mind easily. So typically people just connect in the platform and they use it for a couple of weeks
and they work it out. But, you know, if you said to a DBA, well, how many queries are you running through your database every month, they'd probably have a rough number in mind. So there's definitely pros there. So yeah, typically there is a period of, okay, I need to see what these numbers are. So people do a test, put it through the staging environment, do a bit of a pilot. But usually they want to do that anyway; it's not just to find out how much the egress is. But we can give people good ballpark figures on what we see with other customers. Like, you're doing this number of queries, roughly you're going to see this sort of cost implication.
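For anyone doing that same back-of-envelope estimate, the arithmetic is simple. The workload numbers below are invented purely to show the shape of it, and say nothing about PolyScale's actual pricing.

```python
# Hypothetical workload, just to illustrate the estimate.
cacheable_queries_per_second = 500
avg_result_size_kb = 2                  # average payload returned per query
seconds_per_month = 60 * 60 * 24 * 30

egress_gb_per_month = (
    cacheable_queries_per_second * avg_result_size_kb * seconds_per_month
) / (1024 * 1024)

print(f"~{egress_gb_per_month:,.0f} GB of egress per month")
# 500 q/s at 2 KB each is about 1 MB/s, or roughly 2,470 GB over a 30-day month.
```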
Starting point is 00:40:43 Roughly, you're going to see this sort of cost implications. Yeah. On that same serverless operational aspect, I know sometimes I see people that have trouble letting go of the operational visibility of what's happening. If they're used to running caches, they're like, hey, I monitor my CPU or my memory used and available. Have you noticed that with people? I assume you don't make that visible to them. How do people react to that?
What metrics should they be monitoring as they're using PolyScale? Yeah, so if you're using the serverless model, then you're right. We don't expose any of that, and any scaling issues we deal with. So if CPU is high, or whatever, on memory, then that's something we deal with. On the self-hosted, where you're actually managing the proxy yourself, then yeah, exactly, that's in your wheelhouse. So you're just running a container, then plug in all your Prometheus metrics and business as usual, CPU and memory. And again, that goes into whatever you're doing at the moment to run those containers; you continue doing it with PolyScale. And we've got recommendations around sort of the minimum amount of RAM and CPU that's required. But yeah, we do very much sort of black-box that on the
serverless environment. And again, it does go back to the plug and play stuff, that from a developer perspective, I just don't care. I just want to see the cache run. It needs to be fast. We are consistently sub-millisecond response times on every cache hit. And as long as that happens, then customers are happy. The other piece is kind of that observability side of things, where... it's amazing how many people plug in the tool and say, well, actually, I didn't realize I was running these types of queries. Or the extreme cases we've had: one customer was running 500 queries per second they weren't aware of, right? Just by accident, there was a bug in the code.
Starting point is 00:42:39 But pretty much everyone that looks at it goes, oh, okay, I've learned something here. And so you get these sort of, and there's great observability database tools out there, but it does give you that holistic view of, okay, what's the expensive queries? And what are the ones I really care about? And some people like, oh, well, my ORM is doing something crazy, or I'm missing an index, all the classic stuff.
Starting point is 00:42:59 But that's definitely been something that really resonates with customers and people like. And, you know, just showing you what it's doing. Like, yeah, it's fascinating. That's what I've been thinking throughout this. I was like, I can't imagine how many people, aside from the caching and that stuff, just like visibility into what their database is doing and where those expensive calls are. Huge. You know, so many people, I think, don't have visibility into that. And getting a sense of that and seeing where those expensive calls are, just amazing.
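That "I didn't know I was running that" view falls out of watching the same traffic the cache sees. A minimal sketch of the aggregation, with a made-up record() hook you would call per observed statement:

```python
from collections import defaultdict

# query fingerprint -> [call count, total seconds spent]
query_stats = defaultdict(lambda: [0, 0.0])

def record(sql_fingerprint: str, duration_s: float) -> None:
    stats = query_stats[sql_fingerprint]
    stats[0] += 1
    stats[1] += duration_s

def report(top_n: int = 10) -> None:
    """Rank queries by total time to surface the expensive or surprising ones."""
    ranked = sorted(query_stats.items(), key=lambda kv: kv[1][1], reverse=True)
    for sql, (count, total) in ranked[:top_n]:
        avg_ms = total / count * 1000
        print(f"{count:>8} calls  {total:8.2f}s total  {avg_ms:6.1f}ms avg  {sql}")
```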
Yeah. And that sort of goes back to that: if you're building this manually, you've got to first work out what are those queries that are the expensive ones. And you may know because you've had, in the worst case, you've got support tickets telling you, right, this is not working as expected. But yeah, usually it comes down to somebody looking at the slow logs or whatever it may be to start, you know, defining what are the ones we need to look at. So that, as I say, you can plug it in pretty quickly and get that view really quickly. Yeah. Yep.
I did an interview with Jean Yang from Akita Software once, where she basically used eBPF to just, like, intercept packets going through. And, just, like, so many people, I think, had no visibility into their APIs, and they're just like, hey, this is a non-intrusive way to do that and gather it. And it sort of reminds me of a similar thing here. You don't have to make a ton of code changes,
do a bunch of instrumentation to figure out, like, what's slow in your database. You can drop this in and get visibility, and, you know, the slower ones will start getting cached and things like that. So lots of customers do that. And they'll sort of... actually, you can plug this in and turn off caching, so it's literally just a pass-through, and then that gives you all the metrics, it gives you all the observability, and all the potential wins, like, if you want to switch that button on. So lots of people start there. They do. Yep, yep. You know, you mentioned earlier about
adding Mongo, and I'm just going to make the plug for Dynamo. Like, I love Dynamo. I don't know how many people would... like, DAX is sort of there on some things. But a few things I would just say with DAX is, number one, it's going to be instance based. So it's not a serverless operational model. So now you're pulling in this thing that you have to manage, which is unfortunate. Number two, like, PolyScale is going to be distributed across the globe for you. So if you have customers all over the place, you'll get caching that way. But then also your caching story, I think, could be just the invalidation stuff. And especially the CDC, there's Dynamo Streams.
DAX does item-level caching. So if you're getting an individual thing, it'll sort of cache, or like invalidate, that for you. But then it does query caching if you fetch a result set, but it doesn't invalidate that very well. So if you query a set of 10 items and then you go update one, it's not going to invalidate that for you. You sort of have to wait for it to expire.
Whereas it sounds like you all have done the work to sort of figure that out, and you could do it there. So I'm going to make the pitch for Dynamo. Yeah, it's definitely a good one, because we could effectively plug the CDC stream in, or SNS or whatever it may be, and pull all that data in. So, yeah, I think there's a long list, actually, of people that sort of pitch their next data platform of choice. And I think it's a nice...
Starting point is 00:45:57 of people that sort of pitch their next data platform of choice. And I think it's a nice... We talk about this a lot, but the fact that most enterprises are using multiple persistence layers, the right tool for the right job. I think if you'd asked me five years ago, I would have said to you, well, everyone's going to consolidate on one or two databases, and that just isn't the case.
It's gone absolutely the opposite way. Vector databases... and I think Postgres is definitely having its day. For greenfield projects it's a great starting point, because, you know, you can scale well across a broad set of use cases. But I think the whole concept of supporting multiple persistence layers in the same method, being able to drop that data anywhere, get low latency hits, I think is valuable. So yeah, we're pretty excited about moving into different spaces. Yeah, very cool. I want to close out with some business stuff and just hear about where you're at as a company,
as a team and things like that. So yeah, just start with that. Where are you as a company? Have you raised funding? How big is your team? Yeah, we've raised funding. So we're a small team, we're less than 10 people. We're fully distributed, we're kind of all over North America, Spain, Germany, London. And yeah, we've been around for about two and a half years. And really, sort of, that first year was, you know... I guess we got our first product to market, which was the MySQL product, after about sort of a year, maybe 15
months. And then really from there, it's sort of scaling challenges that we've been focused on. So it's one thing to actually build this, but actually to make it fast and to scale is a whole other level. So there's been a lot of work there before we started then adding additional databases. Yeah, we've raised some seed money. We raised $3.5 million to date. And, you know, that really allowed us to... as I say, we have a small team, and that's kind of by design. We can do a lot with a small team.
Starting point is 00:48:06 And as I say, we've built a fast and efficient platform now. So yeah, really now we're focused on, I guess from just a high level roadmap perspective, today it's all been about getting the data out of the cache at the right time. So let's make sure we're evicting at the right time and making sure people are getting, you don't want to serve people bad data. That's stale data. That's not a good use of cache. And where we're going in the future now is being clever about preloading and
Starting point is 00:48:34 pre-warming data. So you can think about use cases around personalization, for example. So you're logging into your cell phone provider account or your bank account or whatever, and you're likely to press one of these buttons across the top here, or what do you do previously? We can go preload that data. We can go pull that data in. And that's really exciting because then you're sort of using it as a cache, but obviously then it becomes a bit more than the cache. You're sort of saying, well, I've got a persistence layer here that can handle any types of queries, and it's always going to be running fast. It's always going to be bringing in the data that you need. So that intelligence layer, we can sort of crowdsource that across all users. And that's pretty exciting. That's sort of what we're focused
on, as well as sort of moving, as I already mentioned, into those different layers. But yeah, so we're a small team, distributed, and we've got lots of hard problems to deal with. So yeah, very cool. It's so much fun to see what you can do with a small team that's, like, you know, intensely focused on a hard problem. It makes awesome progress. It is, and it gives us that agility, it really does. Like, we can pivot really quickly onto projects that come at us. And as I say, you definitely... there are huge benefits from having a small team. There really are. And I think getting the right people is challenging
and getting the skill sets you need at the right time is challenging. But yeah, it definitely gives you advantages over... people that raise a lot more money have gone out and scaled up much larger team sizes, and there's definitely, you know, downsides to that. Yeah, and a lot of them are, you know, zombie companies, if they raise at too high a valuation and have to grow into it. And, you know, over the last couple of years, where we are, it's been pretty crazy from a raise perspective. So yeah, we've been pretty tight on where we've invested. We're kind of not out at every event or whatever that we'd like to be, but we're, you know, building a good product. That's the focus. Yeah, great. So, one thing... I've talked to a few sort of cloud
native database companies, and one thing I always ask them is just, like, how do you get people to trust you, given that, you know, you're a new company and you're dealing with their data, their storage of data? You know, it's a little less of a concern since you're a cache, right? You're not, like, the primary permanent persistent store. But one thing you mentioned is, like, being a read-through means, like, sort of, your uptime is my uptime now, right? Like, your availability is my availability. How did you... yeah, and so, like, I imagine you spend a bunch of time thinking about just how to make that better and be, you know, highly available from your end. But how do you convince customers, or just sort of deal with their feelings on some of that stuff? How do you approach that?
Yeah, it's a really good one. It is front and center, and it should be, right, for anyone who's plugging in a tool like PolyScale. And it's interesting: I was having this conversation actually a couple of weeks ago with a prospect, and they were saying, yeah, how do we get comfortable with what you're doing here? And when I actually dug into their specific scenario, it was quite interesting. They were already proxying all of their SQL data through a security company that was doing PII analysis and a whole bunch of other stuff. And I said, well, okay, how did you get comfortable with that? Because you're doing exactly the same there. And anyway, long story short, I think the focus is you do it piecemeal. You take, here's a specific function, or whatever it may be.
And you say, well, let's start routing that through PolyScale. So, you know, from an integration perspective, PolyScale is just a connection string. So rather than going direct to your database, you're going to PolyScale, and then that gets routed onto your origin. So what's cool about that is, within your... it could be just a serverless function, it could even be within your monolith, you can just have that sort of dual connectivity. You can route some traffic through PolyScale and other traffic not through PolyScale. But obviously you start with your development and staging environment. So
nine times out of 10, people are going to plug it into a dev or staging environment. If they want to use the cloud environment, the serverless environment, it's a good starting point because it's just easy to do. You just connect it and have a play. And that will allow you to start to get confidence with the platform. And people, first, you want to test the sort of smart invalidation. You want to see that working. And that's a really easy thing to test.
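The "it's just a connection string" point, sketched with psycopg2 and entirely made-up hostnames: the original DSN keeps working, and a second DSN pointing at a caching proxy endpoint can be adopted one read path at a time.

```python
import os
import psycopg2  # assumed Postgres driver

# Direct connection: exactly what the app used before.
DIRECT_DSN = "postgresql://app:secret@db.internal.example.com:5432/shop"

# Same database, but routed through a caching proxy endpoint.
# The hostname is hypothetical; the only change is the connection string.
PROXY_DSN = "postgresql://app:secret@cache-proxy.example.com:5432/shop"

def run_query(sql, params=(), use_proxy=False):
    dsn = PROXY_DSN if use_proxy else DIRECT_DSN
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()

# Pilot it on one read path first and leave everything else untouched.
products = run_query(
    "SELECT id, name FROM products WHERE active = %s",
    (True,),
    use_proxy=os.environ.get("USE_CACHE_PROXY") == "1",
)
```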
Starting point is 00:53:01 But yeah, definitely people start piece by piece. They're like looking at certain features or functions or, you know, great use cases. Well, I'm just going to break out this specific query to run on Cloudflare workers. And I want to run that through Polyscale because I can run that fast everywhere. And that's a great use case and it's easy to do. So, but yeah, you've got to build that confidence. You've got to build that trust. And that then goes down to obviously the infrastructure that we provide, that having sort of the high availability and failover built in. And we're also at the mercy of all of the providers
that we work with across AWS, GCP, Azure. And, you know, so we definitely... But what's nice is that if you think about sort of a classic TCP connection, they are designed... sorry, a classic sort of database TCP connection, it's typical to lose that connection and reconnect. You know, you're going
potentially across the public internet. You've got no control over routing or packet loss, and ORMs and whatever reconnect logic is default within a database environment. So, for example, if you lose a connection, another one's going to get initiated by the client software. They're good at those, right? That's what they're good at. So if you are in a situation where you do experience downtime and you switch over to another environment a couple of seconds later, that architecture has to be robust, right? That sort of DNS behavior has to be robust. So yeah, I think start small is the answer. And it's definitely a challenge, because, you know, it's different to putting pieces of your queries into this. This is the whole environment. So, yeah, it's definitely start small and grow. Yeah.
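That failover story leans on reconnect behavior database clients already have. Simplified to its core, the client-side loop looks roughly like this (real drivers, pools, and ORMs bake this in):

```python
import time
import psycopg2
from psycopg2 import OperationalError

def query_with_reconnect(dsn: str, sql: str, retries: int = 3, backoff_s: float = 0.5):
    """Re-establish the connection if it drops, then retry the statement."""
    for attempt in range(retries + 1):
        try:
            with psycopg2.connect(dsn, connect_timeout=5) as conn:
                with conn.cursor() as cur:
                    cur.execute(sql)
                    return cur.fetchall()
        except OperationalError:
            if attempt == retries:
                raise
            # Give DNS a moment to fail over to the next closest endpoint.
            time.sleep(backoff_s * (attempt + 1))
```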
Starting point is 00:54:51 This is the whole environment. So, yeah, it's definitely start small and grow. Yeah. On that same note, are most, you know, new customers for Polyscale, are they people that were previously using a cache and were just like, hey, I don't want the operational burden or manually doing all this work? Or are they people that are just new to cache generally and like, hey, this is a much easier way to do it than having to go back and instrument my code? Yeah, it's interesting because when we started this, it was like, okay, people want to sort of solve that latency issue. That's the number one thing that they're after. It's, I'm going to have a distributed app and multi-regions coming and everyone's running at the edge in the next couple of years. And
really what we found is there's three use cases, and you can't predict which ones people are going to be using. But there's the classic, okay, my queries are slow, which is very traditional. I've got, you know, my database is on fire for whatever reason. It could be concurrency or indexes or whatever. Then there's the latency one. So I'm in one or more regions and I need to reduce that network latency. And then the third one is cost savings. And I guess, you know, you could look at this and say that's pretty traditional for a cache, but the fact that you can plug PolyScale into your entire application really does yield quite large cost savings, because if I'm serving 75% of my reads not from my database,
I can either go do more stuff with that resource, serve my writes faster, or I can reduce that infrastructure spend. So those three things are really across the board, which isn't a great answer, but we definitely see all three from different types of customers. Yeah, absolutely. Well, Ben, I appreciate this conversation. It's been a lot of fun, just learning about it and just seeing all the interesting things that you're doing. I think the operational model, the sort of visibility into what's happening in my application, the global distribution, in addition to just, like, the smart
Starting point is 00:56:44 caching work that's happening there. I think there's a lot of interesting stuff there. If people want to find out more about Polyscale or about you, what's the best place to find you? Yeah, so just obviously website polyscale.ai. You can email myself, ben at polyscale.ai. And we're obviously on Twitter and all the usual channels. So yeah, definitely reach out. And we have a Discord channel. So yeah, it'd be great to connect with people. All right. Sounds great, Ben.
Starting point is 00:57:06 Thanks for coming on today. Great. Thanks for your time, Alex. Great to meet you. Yeah. Bye.
