Orchestrate all the Things - Aerospike Graph: A new entry in the graph database market, aiming to tackle complex problems at scale. Featuring Aerospike CPO Lenley Hensarling

Episode Date: November 6, 2023

“Graph database growth is going strong through the Trough of Disillusionment.” And “Graph Analytics go big and real-time.” These were two of the headlines of the Spring 2023 update of the Year of the Graph newsletter. In combination, they seem like an appropriate summary of the reasoning behind a new entry in the graph database market: Aerospike Graph, which Aerospike officially unveiled in June 2023. We caught up with the company’s Chief Product Officer Lenley Hensarling to discuss this long journey that started about three years ago, as well as Aerospike's differentiation in a very densely populated market. Article published on Orchestrate all the Things.

Transcript
Starting point is 00:00:00 Welcome to Orchestrate All the Things. I'm George Anadiotis and we'll be connecting the dots together. Stories about technology, data, AI and media, and how they flow into each other, shaping our lives. Graph database growth is going strong through the Trough of Disillusionment, and graph analytics go big and real-time. Those were two of the headlines of the Spring 2023 update of the Year of the Graph newsletter. In combination, they seemed like an appropriate summary of the reasoning behind a new entry in the graph database market, Aerospike Graph, which Aerospike officially unveiled in June 2023. We caught up with the company's Chief Product Officer, Lenley Hensarling, to discuss this long journey that started about three years ago,
Starting point is 00:00:45 as well as Aerospike's differentiation in a very densely populated market. I hope you will enjoy this. If you like my work on Orchestrate All The Things, you can subscribe to my podcast, available on all major platforms, my self-published newsletter, also syndicated on Substack, Hacker Noon, Medium and DZone, or follow Orchestrate All the Things on your social media of choice. Yeah, well, thanks. Glad to be here, George. And thank you for your efforts on the white paper. It turned out really well. Yeah, I've been at Aerospike now for, you know, almost five years. And I came to it with a background in enterprise software, sort of across both infrastructure, databases, networking, and also large enterprise applications.
Starting point is 00:01:38 So enterprise resource planning, manufacturing, logistics, things like that. So I was a big user of databases. And so I've been on both sides of it and different types of databases too. And did work on directory services, which were some of the first users of in-memory databases back in the day. Here at Aerospike, I came in as a consultant on strategy and then became chief strategy officer and then put together the product management group here. And now we have a strong product management group. I know you've worked with Ishan Biswas, who's the product manager for Graph, which we're
Starting point is 00:02:19 going to talk about today. And my whole focus has been trying to figure out how we apply more data to decisioning, you know, throughout my career. How do we do that in a way that's cost-effective, right? You know, we know that there's a net yield, because there's a cost to run the computers, there's a cost to manage things and to program applications on top of databases. So all those things factor in. But that's been something I've been focused on, and also focused on what I jokingly call catching up to the present.
Starting point is 00:02:55 With databases and software in general, we haven't been able to reflect what's going on in the moment. You know, we started out with batch solutions that could tell you what happened last month, and then it's gotten progressively better. And now we're still trying to catch up to the exact moment, and the world just moves faster. So, you know, we say milliseconds matter. Okay, well, great.
Starting point is 00:03:23 Thanks for the intro. And I guess the next reasonable question to ask, at least for someone who's not necessarily familiar with Aerospike, is, well, what are the key premises behind Aerospike? And therefore, I guess, by extension, what was it that attracted you to join the team? Yeah, well, two guys started Aerospike, one of whom I knew from my past in networking, Brian Bulkowski, and one of whom I've worked with a lot in the last, you know, almost five years, Srini Srinivasan. And Brian came from a networking background and a storage background, you know, working with SSDs and
Starting point is 00:04:07 all the way back to, you know, file servers and such. And Srini came to it from a strong theoretical database background. He's one of those PhDs from the University of Wisconsin, which, you know, historically has been kind of one of the database research institutions. The whole hypothesis of Aerospike was that companies needed to apply more data in a short time window, with a predictable service level agreement or SLA, and to do that in a cost-effective manner, as I was saying earlier. And, you know, this notion of being able to do that, and do that cross-cloud and on-premise, was something that was key to the founding of the company. The other thing I'd say about
Starting point is 00:04:59 Aerospike is that we are a true infrastructure system software company. And we have a different type of engineer than a lot of companies. You know, we pay attention to, you know, how we handle concurrency. We exploit the hardware that's available now. And, you know, I'll say that people say, does that matter given that everything's in the cloud? There is no cloud. It's just somebody else's computer. And, you know, it's a networking infrastructure
Starting point is 00:05:34 and you have to optimize for that. And so the engineers we have think deeply about these things. We have massive throughput in the product, and that's a result of this concurrency and multi-threading in the code specifically, so it scales like that. And I think that attention to detail is what really differentiates Aerospike from some of the other NoSQL databases, and databases in general, really. Okay, so you just said the magic word, NoSQL. I was going to say that, well, everything you've just said sort of makes sense, but you didn't actually mention what I would probably start with.
Starting point is 00:06:20 I mean, the fact that Aerospike is, well, at least initially and primarily, a key-value store. I know that over time, you have expanded that to also touch upon different data models as well. And actually, that's the occasion for having this conversation today. I mean, the fact that, well, the latest addition, let's say, to Aerospike's arsenal is the expansion to graph. But before we get there, I think it's worthwhile just covering a little bit the initial data model, so the key-value aspect. So this is where Aerospike started from, and then also the additional data models that you have expanded to over the years
Starting point is 00:07:05 and a little bit of the thinking behind that. Yeah, sure, George, it's great that you brought that up, really. You know, what we focused on initially was, as you said, a key-value store, and we had some differentiation there. You know, the primary index is in a model we call hybrid memory. The primary index is held in memory. The data is stored on SSDs, but we treat the SSDs as a memory space, if you will. We don't write to it as a file system, we don't write to it as block storage, but we treat it much like memory, and this has made it possible for us to get access to a given piece of data in sub-millisecond time, going through the primary index. We've expanded that to support secondary indexes that are held in memory, but also can be held on SSDs as well. And that's allowed us to expand the things we can do with it.
Starting point is 00:08:14 The other thing that's worth mentioning is that it is a distributed database. And so that always raises the question of how do you partition the data? How much work is it to, you know, figure out the partitioning model? How do you manage that? And we've done this in a way that we can partition the data for the customer. We can ensure that there are no hotspots. And then as data access patterns change, we can move the data around and change the partitions. And this goes on in the background as required.
Starting point is 00:08:51 So there's not much involvement required to exploit a very distributed architecture, so that we can scale up to data sets in the size of multiple petabytes. But with those large data sets, we can still gain access to a given piece of data off the primary key in less than a millisecond, so we're talking microseconds there, and we can do queries on the secondary indexes that are single-digit millisecond. And that's because of the way we handle the balance between indexing in memory and data on the SSD, and being able to handle the indexes on SSD as well
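As a rough illustration of what that key-value access pattern looks like from application code, here is a minimal sketch using the Aerospike Python client. The host address, namespace and set names are placeholders, not anything mentioned in the conversation.

```python
# Minimal sketch of an Aerospike point write/read, assuming a local server on
# the default port 3000 and a namespace called "test" (both are assumptions).
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# A key is (namespace, set, user key); the primary index resolves it in memory.
key = ("test", "users", "user-123")

# Bins can hold simple values or nested maps and lists, which is what the
# document/object support discussed next builds on.
client.put(key, {"name": "Alice", "profile": {"country": "GR", "devices": ["d1", "d2"]}})

_, meta, record = client.get(key)  # single-record read off the primary key
print(record["profile"]["devices"])

client.close()
```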
Starting point is 00:09:39 Okay, I see. And so, if I recall correctly, starting off with the initial key-value model, I think you have expanded in recent years to also touch upon documents, so I think Aerospike also supports JSON now, and I think you also have like a SQL interface as well. Yes, so one of the things that we looked at was, we essentially have figured out storage and access to data on a key, on a model, both, you know, primary index and secondary index. And we started out with what I will call a simple value type, you know, one type of data, and then moved to what's essentially an object,
Starting point is 00:10:54 with a map/list structure or a JSON structure, depending on how you want to cast it, if you will. And that allowed us to support documents, or to support object persistence, if you will, because documents, as we were talking about here, are really just a way to persist objects in an object model, a programming model. So we got to that point. Then we started looking at what some of our customers were doing with the product. And that was around graph solutions. So a number of our customers are in ad tech and in fraud.
Starting point is 00:11:28 And those are two places where identity management, or identity resolution, has become ever more complex as we try and throw more data at the problem. It's not just your username and password anymore. It's your username, your password, your history, trying to triangulate, you know, the devices you use, where you are, who you are, who you're with, even, you know, and being able to handle things like that. And as we saw them doing that more and more, and doing it on top of Aerospike, you know, as a key-value store, we looked at an evolving pattern where some of our customers were using an open source graph technology that I know you're very familiar with, George: TinkerPop, an Apache project,
Starting point is 00:12:21 and they had been doing this themselves, basically building a solution on that and then putting it on top of Aerospike as the storage model. And we saw that that scaled far beyond what other solutions in the graph space did. You know, we talked about being able to scale to petabytes of information and still have the access times remain very low. We talked about being able to balance the data between partitions. And so they were exploiting that, and doing that with graph solutions based on TinkerPop. What we did was then start talking about, could we do that? Talking to some of these customers, right? Talking to other customers who are looking at the landscape of other graph solutions. You know, Neo4j is a great product.
Starting point is 00:13:16 You know, TigerGraph's a good product and, you know, really tried to tackle the scale issue. And we thought that there was some white space at the level we play at, which is very large data sets where there's a demand for near real-time performance. And so then we started investigating TinkerPop ourselves, started experimenting with a layer that would tie TinkerPop back into Aerospike as the storage mechanism, defining the data models for it, and the first thing that we've
Starting point is 00:13:54 done is come out with an identity management solution based on TinkerPop and the Gremlin query language, if you will, right? It's more than a query language. I would say it's a graph language, more than just a query language. And we put together a solution that, you know, together with Aerospike, we think meets a need that exists for this high scale, high throughput, and low latency capability. And we've been in several proof of concepts now with customers, and we started that before we released the product, actually, working directly with some of these customers, and have sort of proven out the solution and
Starting point is 00:14:46 we'll move it forward, you know, going into next year, and support an OLAP, you know, an analytical graph solution, as well as the identity graph solution that we have in the marketplace today, and we'll just expand from there, building more solutions and providing it. I should mention, too, that we use a tool called G.V. I think you're pretty aware of it. It's a nice tool. It's an IDE, essentially, for Gremlin. And it allows you to visualize your graph.
Starting point is 00:15:20 It helps you to step through queries and things like that. All those things that developers really need to do. And that's kind of the path that we've gone down, and in some sense what we're doing is putting different personalities, if you will, on top of a very robust storage and access mechanism. Okay, so it sounds like quite a journey. So I'm wondering, how long did it all take you? I mean, from the moment you became aware of clients actually using, building their own sort of ad hoc solutions for graph on top of Aerospike, to the moment you decided that this is something that you would like to pursue as an official, let's say, implementation, up to the moment where you were actually able to release it? So there are two answers. One's about three years, right, from the first time I saw this. You know, we have a very large customer who's in the payment systems business, and they were using a graph solution like we're talking about here, built on TinkerPop at scale, you know, with billions of vertices and thousands of edges, you know, connecting them all. And the first time I saw that, I kind of went, wow, why don't we do something in that space? It was just a thought, and I started educating myself more and more on graph, right? I hired a product manager, I think I called him out earlier, Ishan. This was
Starting point is 00:17:01 and Ishan started really digging into it, working with these customers directly. That was probably another six months. And then it took us about another year to hire the team and implement the solution. And essentially, it's taking the TinkerPop graph engine, if you will, and making Aerospike the storage mechanism for that, and really creating a graph database. And as I said, that took about a year. It took probably six months to get something up and running, because we were leveraging TinkerPop, you know, it's an Apache 2 solution. But then, as with everything we do, we wanted to make sure that it did scale, that it had high throughput and such. And so we worked, you know, and refined it, and we'll continue to do that. I think software is always a journey. But, you know, in the last year we've been able to do this and
Starting point is 00:18:00 released, I guess it was, you know, probably four or five months ago, our first solution. And as I said, you know, we'll follow that up with an OLAP solution, so that people can do more analytic exploration with the graph. You know, we see that tying back into... there's this interplay in the ad tech business. And that's a large vertical for us. It's not the largest anymore, I think financial services is, but they share characteristics, you know, fraud, identity management, and graph in support of those.
Starting point is 00:18:39 But really trying to get to solutions that are more out of the box for customers. Graph's not simple, as you know, you know, it's a different mindset. And we're trying to package data models for specific solutions with our graph capability as well. Okay, yeah, that's certainly interesting. And I think if you get to that point, it's going to be very helpful for people to be able to at least have a starting point out of the box. So they don't have to model everything from scratch. Modeling in graph is, well, a fine art, let's say, and it takes a while to master it.
Starting point is 00:19:23 So if you can give people a head start, I'm sure that will be appreciated. There's also something else I think worth noting. You did mention the fact that, well, Aerospike Graph is largely based on TinkerPop, which is an open source framework for working with graphs. It offers its own query language, which is called Gremlin. But as you also briefly mentioned earlier, it's actually more than that. So part of the appeal, let's say, that TinkerPop has is the fact that it can basically plug into any kind of backend. So it provides a service provider interface
Starting point is 00:20:07 through which, theoretically, any provider can implement that interface and so make their database, or whatever other data management system, work as a backend for TinkerPop. And I'm sure that this is the process that you also went through. And in order to achieve that, and also be able to have that sort of performance that you were after, I'm sure that
Starting point is 00:20:34 it took lots of knowledge that you definitely have in-house about the specifics, the specific APIs and implementation and everything that has to do with Aerospike, but you also needed to get a good understanding of the TinkerPop fundamentals. So I know that in order to do that, you actually went and hired some help, and probably the best help you could possibly get as far as TinkerPop goes. So I wonder if you'd like to share a few words on the collaboration with none other than TinkerPop's founder, Marko Rodriguez. I know that you worked rather closely during that implementation stage.
Starting point is 00:21:21 Yeah, we actually hired Marko as a consultant, and then we also hired a number of other people who've been very active in the TinkerPop graph open source community, to work on building that out. And as you said, there's a service provider interface, and then the work we did was how best to implement that back to Aerospike, specifically in service to identity graph models. And, you know, the work we'll do on OLAP, right, or the analytical component, is to make sure that we have optimized use of Aerospike in the way it handles storage for that. And, you know, the project code name was Firefly, to build that interface. It's non-trivial, is what I'll say, right?
Starting point is 00:22:15 We went through some, what I would call, naive implementations that were not as scalable as we would like, that had certain hotspots, if you will, where you have nodes that get hammered too much. And we had to figure out how best to handle that. And over time, over this last year, we've really come up with a very sophisticated interface layer to our storage. The other thing I should mention is that the graph engine itself, and this layer that talks to Aerospike, have been built in a way that they scale out horizontally in a shared-nothing sort of model. So the graph engine, which essentially just requires compute and networking attachment to Aerospike, can be sized on instance types in the cloud, or, you know, hardware that you might purchase, that are specific to that part of
Starting point is 00:23:25 the application, and they scale independently of the data set. So you can have your Aerospike cluster running, and it's distributed, but the implementation of TinkerPop and the connectivity to the database scale independently of that. So if you have high throughput, and since it's shared nothing, you can spin up nodes and spin down nodes as you need to, in terms of how much throughput, how many connections you're going to have. And that ability to, you know, elastically scale that, while you have the persistence handled elsewhere, is a big cost savings to people where they have variable workloads. So there's a great deal of elasticity built into the solution as well. Okay, I see. So how would people actually get started with Aerospike Graph? And I think
Starting point is 00:24:23 there are probably two different categories to address here. So, existing Aerospike users, who presumably already know the basics of how to use vanilla Aerospike, let's say: how would they get started using Graph? And how would someone who's not an existing Aerospike user get started with the new product? So, you know, we packaged it in a way that you can deploy Aerospike and then deploy the graph engine independently, to some extent. You can actually get a free trial of both components, both the storage engine in Aerospike
Starting point is 00:25:04 and the Aerospike Graph service, as a free trial, and download that. There are Docker images, you know, for both, and that can be deployed relatively easily. When you go into production, it can become a little, you know, more complex, and, you know, require a little bit more sophistication. But you can download that free trial. The other thing that we're going to have, and I think this will probably be Q1, maybe into Q2 of next year, we'll have a DBaaS solution.
Starting point is 00:25:40 We've recently announced free trials on a database as a service of Aerospike, just the Aerospike engine. And we'll have a similar thing up and running, like I said, you know, in the first half of next year. And when we have that, it'll become very easy, because you'll just, you know, say, I want an endpoint, and be able to play with that. And we'll handle the scaling of the graph engine and of the database behind it for you. And so that'll greatly simplify things. But right now, the free trial can be downloaded and installed on your laptop. It can be, you know, installed on your favorite cloud vendor. I see. Okay.
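For orientation, talking to Aerospike Graph from code looks like talking to any other TinkerPop-enabled system: you open a remote connection to its Gremlin endpoint and send traversals. The sketch below uses the gremlinpython driver; the websocket address and port are assumptions based on the usual Gremlin Server defaults rather than anything stated in the conversation, so check the product documentation for your deployment.

```python
# Hedged sketch: connect to a locally running Aerospike Graph service over a
# Gremlin endpoint (ws://localhost:8182/gremlin is the common Gremlin Server
# default and is assumed here).
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Write a couple of vertices and an edge, then read something back.
alice = g.addV("person").property("name", "Alice").next()
bob = g.addV("person").property("name", "Bob").next()
g.V(alice).addE("KNOWS").to(__.V(bob)).iterate()

print(g.V().count().next())

conn.close()
```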
Starting point is 00:26:32 And so that covers the actual installation part, I guess. So what about support and documentation and education material? So where should people look for that? I know you already mentioned that there is a graphical interface, specifically for the graph part, that people can use. So it's called G.V. And as far as I know, you have a partnership with the vendor behind G.V. We're actually including links to download the community version of G.V in our free trial. And, you know, when you download it, for a pretty nominal fee you can buy the Enterprise versions of G.V through gdotv.com, right? They're a great partner, and, you know, it's work that was
Starting point is 00:27:24 done by somebody who's used TinkerPop a lot themselves and saw the need for this tool and built it out. But yeah, that's easily done. I think that the documentation on the solution is all available on our website. You can go to our dev hub, you know, devhub.aerospike.com, and there are forums where people, you know, can get help on doing this. You can get access to the documentation; you don't have to buy anything to go read the documentation and understand that. You can get the free trial, and get help through the forums on our developer hub. Okay, I see. And you also mentioned something else previously, about the primary use cases that you are targeting with this release, at least initially.
Starting point is 00:28:21 So you mentioned ad tech and finance, and the thing that these two domains have in common, so building identity graphs, and those are typically used for fraud detection purposes. And I think it would help to add a little bit of background to that. So the reason that graph is a good match for that is because, well, lots of graph algorithms can be beneficial for this particular use case. So you have algorithms such as PageRank or breadth-first search and so on that are often utilized in that context. Because, you also mentioned the fact that, well, it's not just about who you are. It's also that people these days, in the context of anti-fraud,
Starting point is 00:29:14 are also checking things like, well, what is your network like? Or where are you based? Or how often do you try to execute certain actions and so on. So graph algorithms can actually help with that type of anti-fraud action. So I'm wondering, in order for people to actually utilize those algorithms that are to a large extent, at least, supported inherently, let's say, by TinkerPop. So where should they be looking?
Starting point is 00:29:52 Do you have maybe some kind of pointers for them? Like, if I want to implement an anti-fraud solution, for example, do you have existing use cases that you can direct them to? We have some use cases written up. One thing I would say is that, in general, and George, you and I have talked about this before, the graph solution world is one that's still evolving, and new ideas are popping up all the time. But what's driving it is really the need to have alternate ways to validate beyond, you know, just cookies. And, you know, one of the things
Starting point is 00:30:36 that's happening, you know, is that people became concerned that Google and Apple might take cookies away, you know, from Chrome and Safari. And then how would they be able to get the same level of validation? The other thing that I think is, in some ways, a bigger driver, is that in order to validate things, you know, there are bad guys in the world, bad actors, and they're always learning how to spoof or fake more and more of what we all use to try and, you know, validate identities and make sure it's you that's moving the money and not somebody else. But applying more data and more information about you and who you might be related to... you know, I'll give you an example: if an IP address is outside of the realm, or connected with bad actors or something like that, or it's in a part of the world that, you know, there's just no way you've visited. And being able to see things like that tied with your device, that might not be your device.
Starting point is 00:31:50 Somebody may have faked that. And then they're checking all this level of complexity of interconnections to validate identity now. And as they gain access to more and more data sources, and want to figure out how best to resolve that and to make use of that information, that's kind of what we're seeing. And there's a lot of literature out there, just all over the place, I would say, you know, in the TinkerPop community, on their chat boards. There are chat boards out there just on graph. And as I said, we have our developer hub, where we take questions and can answer questions as well.
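To make the kind of interconnection check described here a bit more concrete, here is a hedged Gremlin sketch, written with gremlinpython and reusing the g traversal source from the earlier connection example. It fans out from one account through shared devices and IP addresses to find other accounts linked to them. The vertex labels, edge labels and property names are invented for illustration; a real identity graph would define its own schema.

```python
# Hypothetical schema: 'account' vertices linked to 'device' and 'ip' vertices
# via 'USES_DEVICE' and 'USES_IP' edges. All labels and properties here are
# illustrative assumptions, not part of any shipped data model.
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

linked_accounts = (
    g.V().has("account", "accountId", "acct-42")  # the account being checked
     .out("USES_DEVICE", "USES_IP")               # hop to the devices / IPs it has used
     .in_("USES_DEVICE", "USES_IP")               # hop back to other accounts seen on them
     .has("accountId", P.neq("acct-42"))          # exclude the starting account
     .dedup()
     .values("accountId")
     .limit(20)
     .toList()
)
# A non-empty result means this account shares infrastructure with other
# accounts, which a fraud rule or a scoring model can then weigh.
```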
Starting point is 00:32:38 You know, the developers that work on our solution, you know, monitor those things periodically, the product manager monitors them, and we respond with, you know, information on it all there as well. Okay. All right. So, well, then I guess now it's time for me to wear my analyst hat. And as an analyst, graph and graph databases is a space I've been covering for a while. So one of the characteristics, let's say, of this space is that, well, there's lots of divergence. There are many solutions out there. Well, some people say there are even too many of them. So last time I checked on DB-Engines, there were about 60 different graph databases in total, which is a lot, obviously. So whenever there's a newcomer, let's say, in this market, the first question that naturally comes to mind is like, well, okay, first of
Starting point is 00:33:45 all, welcome and nice to meet you. But then, all right, so after the initial welcome, what makes you special? So what is it about vendor XYZ, in this case, what is it about Aerospike, that makes it stand out from the already numerous solutions out there? So, George, I think that's key. Typically, markets will have many, many participants. We've got 60, 70 graph solutions out there, as you mentioned. We're in a space, the database space, that's evolving pretty fast right now. And there are literally hundreds and hundreds, you know, 400, 500 different databases that are all competing for some space. One of the things that we've done is differentiate along a couple of specific vectors, right? I mentioned,
Starting point is 00:34:48 you know, the notion of real time. And it's not just being able to respond to queries with low latency, it's the ability to do that consistently, regardless of what the load is, okay? Meaning, if you have, you know, a few hundred people coming to your database and trying to get a graph answer, you know, that's pretty doable for most, you know, solutions. If you have hundreds of thousands per second, if you have millions of queries per second, then it's a different question. And the ability to keep up with that is something that in our database we focus on, and something that we've focused on in how we built out our implementation of TinkerPop. You know, I mentioned having it be able to scale horizontally, very fluidly, and then exploit the ability of Aerospike to handle the multiple connections back from our, you know, interface to TinkerPop, the layer we built there, and have that scale out
Starting point is 00:36:00 as well. And to do that in a consistent manner, you know, with low latency, even when you have, you know, high workloads, and even when the data set has grown to significant size. You know, we have a t-shirt that we hand out at meetups for Aerospike that says, write once, scale forever. And one of the things we really believe in is, you know, the application of more data... you know, we've come out with a term we use a lot now called aspirational scale. You know, even if your solution in the beginning has, you know, thousands of users, not hundreds of thousands, not millions
Starting point is 00:36:48 of users, right? You need to plan for that number of users, that much throughput. You also need to plan for data sets growing. One of the things we've seen over the last five years for sure, and this has been going on with Aerospike for, I don't know, let's say we're going into our 15th year now of being a company, is that people are adding more and more data sources all the time. An anecdote I'll tell you is that we have a customer in the ad tech space, and I naively asked, you know, how many data sources do you add a year? And he said, well, we don't look at it that way. We add tens of data sources a month. And, you know, all of this to refine exactly who's there,
Starting point is 00:37:46 what kind of knowledge you can have about the person, you know, presenting themselves, if you will, and to tailor and really understand where and what types of ads might go to that person. And by the same token, when it's financial services, you know, what does it really mean? Is that the person? Is this activity normal for them? Does that activity match up with the size of the transaction, you know, where they are, and things like that? If you or I, you know, show up in
Starting point is 00:38:22 a part of the world we never go to, and ask for sums of money that we don't normally ask for, to sell stocks and transfer the funds, we'll probably be denied. By the same token, when we go down and, you know, exercise some things out of our, you know, portfolios, because it's time to buy a house or time to buy a new car and we want a down payment, we don't want a lot of friction in that, right? And the ability to apply all that data to mitigate risk, if you will, but do it in a way that's relatively frictionless, is what's driving all this. The other thing I'll say is that there's a new thing happening. This is behind a lot of the demand
Starting point is 00:39:11 we're seeing for graph solutions, which is that in streaming media, you know, TV is no longer TV, it's an internet app, if you will, right? And so people are wanting to understand who's in the room watching TV. Might be the neighbors. They might be the more interesting person to advertise to, right? And so they know they're trying to buy a car, so they'll get car commercials, even though it's your house, right? Because they know, you know, what devices are in the room, and where those devices have been, and things like that. And so there are so many different, you know, vectors of information that are coming in right now, and that people want to apply. One of the few technologies that can do that well is graph technology, because it's about associations, if you will.
Starting point is 00:40:07 Okay, so I guess the takeaway would be that it sounds like you consider performance and scalability to be Aerospike Graph's differentiating factors, right? Yeah, absolutely. And I just always want to reiterate that performance is not just low latency, it's also the throughput, it's the number of connections you can support, the number of, you know, people that can come. A person on our board of directors has a great statement that he makes; he mentioned the non-linearity of the internet. And what that means is, you know, when you open your service, whether it's, you know, selling shoes or, you know, delivering laundry or delivering food, you don't know how many people are going to show up.
Starting point is 00:41:00 It's not like a store that's physical. And nobody wants to wait in line on the internet, right? People expect the performance of your applications to be consistent. And they don't care whether there were 100,000 people there when they showed up, or a million. And being able to handle that throughput as well is a key factor, as well as the size of data and the low latency. So then I guess the next question that, as an analyst, I ask people when they tell me that, well, you know, we're focused on performance and so on, is like, fine. Okay.
Starting point is 00:41:41 So I guess then you probably have done some benchmarking to be able to support that statement, and I'm guessing that also applies to your case. So have you done benchmarking, and are people able to check on those benchmark results? And I should also add a disclaimer, before actually hearing you out, that, well, even if you have, I always encourage people to check the benchmarks and then actually also do their own local benchmarking, let's say, using data that's representative of their own use cases and a setting that's also representative of what they are able to support. Because, well, the thing about benchmarks is that, well, they're only indicative. What really counts is the actual performance in your actual setting. Yeah, I would agree with that strongly, George. I think that, you know, what's given me the most comfort that we do scale and we do perform is not just the benchmarking work we did. You know, we've done some, what I would call, nominal benchmarking, up to, you know, several terabytes of data, and, you know, tens of thousands or hundreds of thousands of transactions a second. But we've done some proof of concept work, POCs, with actual customers, using their actual data that they wanted to.
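In the spirit of that disclaimer, a local sanity check is easy to script. The sketch below simply times a multi-hop traversal with Python's clock, reusing the g traversal source and the hypothetical account schema from the earlier sketches; it shows the shape of such a check, and the helper name and labels are assumptions, not anything quoted here.

```python
# Rough local latency check for an n-hop traversal over a hypothetical
# identity graph; reports the median of several runs, in milliseconds.
import time
from gremlin_python.process.graph_traversal import __

def median_hop_latency_ms(g, account_id, hops=3, runs=50):
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        (g.V().has("account", "accountId", account_id)
          .repeat(__.both()).times(hops)
          .dedup().count().next())
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return latencies[len(latencies) // 2]

print(median_hop_latency_ms(g, "acct-42"))
```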
Starting point is 00:43:14 And it bore out what we've seen in our benchmarking, which is really that, you know, for, let's say, two to five hop queries, you know, on a graph, it's between three and seven milliseconds to get your answer, and that that is invariant over large workloads. And so we've done that both with some anonymized data that we got from customers... but, like, as I said, what's given me the most comfort is that they have taken their data, we've worked with it, so that we can, you know, hydrate the graph, if you will. And those results have borne out as we have seen. We're currently, you know, working on models that will be able to scale out, and then we'll test this out in the cloud with more hardware applied. But some of these customers, both on-prem and in the cloud, have put together very large data sets and run POCs for pretty high levels of concurrency and throughput. Okay, I see. So to come back to the previous topic of, well,
Starting point is 00:44:54 this market being very densely populated, let's say, by a large number of vendors sort of vying for market share. One of the things that caught my attention recently concerns one of those vendors, one that relatively recently entered the market, so about five years ago, namely Redis. Redis is an interesting case, because they seem to have walked a similar path to the one you are walking down now, having a sort of similar motivation, let's say. So initially Redis was obviously, as people know, not a graph database per se, but because of the fact that they saw that, well, some of their clients are
Starting point is 00:45:42 using graph, they decided to add a graph extension to their offering and make it an official part of their product, let's say. Now, five years later, and about four or five months ago, unfortunately, Redis made an announcement that they're winding down that offering of theirs. And they cited a number of reasons for doing that. Basically, they said that, well, even though they were also very confident and very happy with their implementation and its performance and scalability and so on, what they found out, actually being in that market for as long as they have, was that it was a hard sell, basically because of the fact that, compared to other
Starting point is 00:46:33 data models and implementations based on those models, graph was harder for people to wrap their heads around. Therefore, their projects took longer, they needed more help to get started and to walk them through. And in the end, it ended up not being viable for them, basically. That was more or less the reasoning that they gave for exiting, for sunsetting their product the way that they did. So because of the fact that I do see some parallels between what drove them to enter the market, that market, and what has driven you to enter that market,
Starting point is 00:47:15 do you think that there's anything you can learn from their experience, and anything you can do differently, so that you don't end up the way they did? Yeah, I think that we noted that when they pulled out, and when I was doing my research for this, they hadn't gotten that much traction, and that's what they're saying, right? They just didn't get traction. I think that some of this is a question of sort of being fairly laser-targeted, you know, on what we're going after, and our product's very differentiated, focused on scale and performance, and the scale part. So Redis sort of grew out of being a cache, if you will, sitting in front of other databases.
Starting point is 00:48:15 And that's still their primary use. And it's a great solution for that, right? And used very widely. We are more of a database and are used to selling more at the enterprise level and into more complex solution spaces, if you will. For us, we've recently entered the market with a database as a service. It'll go commercial.
Starting point is 00:48:48 You'll be able to just purchase with your credit card or your account on the cloud vendor and do things. But really, when you're designing enterprise-scale applications, it's not a simple thing. And getting the data models correct and such is not that simple. So we have the infrastructure, in terms of both pre-sales and post-sales staff, that helps people develop complex solutions and solve complex problems. And so I read their announcement of, you know, sort of withdrawal from the market, and they said they like to focus on simple things, do simple things well for, you know, developers, and there's a space for that. The white space we saw in the marketplace was not the easy space, is what I would say. It was the space where people like this payment system that I
Starting point is 00:49:56 mentioned, and some of our ad tech customers that are pursuing ad bidding for streaming video, right, are trying to solve complex solutions, or problem sets, with complex, varied amounts of data. And they don't expect it to be easy, and we don't expect it to be that easy to sell into either. You know, we're always wanting to reduce the friction, hence the free trial, hence the database as a service. But we're focused at this space, and they're focused at easy. And I think that, you know, that's why there's room for both Redis and Aerospike. We have a lot of customers that have Aerospike,
Starting point is 00:50:43 you know, running, supporting petabyte data sets and things like that. And then they may still use Redis, not on top of us, because we don't require a cache, we're so fast. But they may have relational databases or mainframes or whatever that they're just a cache in front of. And we see that quite a lot. Okay, so I guess then the takeaway is that, well, the market segment that you are targeting is different, and in a way, they accept, they embrace complexity, and therefore, you're not afraid that that complexity, which is to some extent inherent, let's say, in graph, may sort of deter them. Yeah, I think there is a space in the market, and it just so happens to some extent that it's our customer,
Starting point is 00:51:36 our historical vertical footprint has been strong in financial services around fraud, and strong in ad tech, and that means streaming, you know, ads as well. And those spaces have a demand, they have a need, you know, there's an unmet need to solve this problem that graph is a good solution for. It's not every company, but it does fit well with our focus. Okay, so speaking of complexity, actually, there's also something worth adding in that respect. So we did mention previously the fact that, well, Aerospike Graph is built leveraging Apache TinkerPop. So Apache TinkerPop, for all its strengths, also has something which is, well, different,
Starting point is 00:52:34 let's say, than the other graph databases around, especially when it comes to the query language. So Apache TinkerPop has its own query language, which is called Gremlin. And what's special about Gremlin is that, as opposed to pretty much all other graph languages I'm aware of, I think, it's a procedural query language, as opposed to being declarative. What that means in layman's terms is that, well, if you write a query in any other query language, be it SPARQL or Cypher or whatever, you don't have to actually specify how you want your operation to be done. You just specify what it is you want done. In Gremlin, it's a bit different.
Starting point is 00:53:17 You actually have to specify, step by step, how you want your query to run. And, well, some people love that, because they like the fine-grained control. That way, we should also mention, it's possible to exercise control over how your query is going to run. And if you are very familiar with your data and the way it's distributed, and you know things such as frequency indicators and all of those things, you do have fine-grained control over how your query runs, and therefore you can make it run in a more efficient way. The flip side of that is obviously that, well, it's more complex to write a query in Gremlin than it is to express the same query in other query languages. So, having said all that, I'm wondering what's your take on graph query languages, and whether supporting different graph query languages on top of Gremlin is something you may consider in the not so distant future.
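To illustrate the difference George is describing with a toy example (the schema and names are invented for the purpose), here is the same "friends of friends" question written as step-by-step Gremlin via gremlinpython, with a roughly equivalent declarative Cypher statement shown only as a comment for comparison; Aerospike Graph itself speaks Gremlin.

```python
# Procedural Gremlin: you spell out each step of the walk yourself.
friends_of_friends = (
    g.V().has("person", "name", "Alice")  # 1. locate the starting vertex
     .out("KNOWS")                         # 2. walk to direct contacts
     .out("KNOWS")                         # 3. walk one more hop
     .dedup()                              # 4. you decide where duplicates get dropped
     .values("name")
     .toList()
)

# Roughly equivalent declarative Cypher, for comparison only: you state the
# pattern you want, and the engine decides how to execute it.
#   MATCH (a:person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
#   RETURN DISTINCT fof.name
```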
Starting point is 00:54:31 Yeah, so I think that this ties back to a theme in our discussion here, which is, we tend to solve data management problems for the most complex and the most highly scaled, you know, demanding solutions, right? And so, as you mentioned, having that control, being able to optimize it, spending that extra time maybe in developing that application matters in those situations because it has to run efficiently and, you know, with high performance. That's the type of customers we service. That said, you know, we've been trying to make, you know, use of computers easier and easier and easier all the time to make everybody
Starting point is 00:55:26 be a programmer. Maybe generative AI solves all of this, right? But I think, with respect to what we're talking about here, we also are looking at supporting the coming standards. You know, there's now finally sort of a standards effort around graph languages, and they derive from Cypher, you know, that language that you mentioned. And there is the openCypher project on top of TinkerPop. And, you know, one of the things we found, we started looking at it, and we recently decided to do OLAP primarily from customer demand, but it turns out that us providing the analytical
Starting point is 00:56:14 capability would be easier than doing the work to support Cypher, openCypher, and replace Gremlin, or to have a pluggable model like that. And we decided that that effort is going to be substantial, but we also decided that we'll wait for the standard to solidify. And instead of implementing Cypher, we'll implement that standard, which is going to derive from that openCypher work, I think, right? It's going on in the Apache community. Yeah, yeah, it is. It is going on indeed. And yeah, like you said, it's work that we expect to actually solidify sometime soon.
Starting point is 00:57:00 So I can see the point in waiting for that to come out, rather than, well, sort of, not exactly rushing it out. We always have to think about the cost and the gain as well, right? Indeed. So I guess based on what you said, I can already infer the next things that you're working on. You already mentioned adding analytical capabilities, so OLAP, and that's probably more immediate than the mid- to long-term goal of adding another layer of graph query language. Is there anything else that's in your roadmap for the coming period? I think that the other piece that I did mention was that we will be putting this up on our database as a service solution. And what I will say is, everybody wants to move to the cloud; it's not trivial to get that right. And so we've already started initial work on that, you know. There are a lot
Starting point is 00:58:13 of learnings for us, as we made, you know, multiple starts at putting up a database as a service just around Aerospike and its use as a database. And doing this, with Gremlin and TinkerPop scaling in one manner and the database scaling in another, and really making that be opaque to the end user, who just wants to spin up a cluster of Aerospike Graph, is something we've already started work on. And you'll see that come out, as I said, sometime next year, hopefully before the end of the first half of the year. Yeah. I do realize, as you also pointed out, that it may seem trivial if you are only interested in using the end product, the service, but it's actually not at all if you're the one who has to implement it.
Starting point is 00:59:13 So I understand it's going to take you a while. Exactly. Okay, great. Thanks. I think we're about at the top of the hour, so it's probably about time we wrap up as well. So I don't know if there's anything else that we didn't touch upon and you feel like we should, but if not, I would like to thank you for joining today and for sharing what you have with me and the audience. Well, thank you, George, for having us, you know, for having Aerospike participate in your podcast and, you know, always love to talk to you and, you know,
Starting point is 00:59:55 have learned a lot from interactions with you, and, you know, thanks for your support for the graph community in general, right? I think that it's still one of those technologies that needs evangelists broadly, and you're one of the leaders there. So thanks for that. Thanks for sticking around. For more stories like this, check the link in bio and follow Linked Data Orchestration.
