Orchestrate all the Things - Aerospike Graph: A new entry in the graph database market, aiming to tackle complex problems at scale. Featuring Aerospike CPO Lenley Hensarling
Episode Date: November 6, 2023. “Graph database growth is going strong through the Trough of Disillusionment.” And “Graph Analytics go big and real-time.” These were two of the headlines of the Spring 2023 update of the Year of the Graph newsletter. In combination, they seem like an appropriate summary of the reasoning behind a new entry in the graph database market: Aerospike Graph, which Aerospike officially unveiled in June 2023. We caught up with the company’s Chief Product Officer Lenley Hensarling to discuss this long journey that started about three years ago, as well as Aerospike's differentiation in a very densely populated market. Article published on Orchestrate all the Things.
Transcript
Welcome to Orchestrate All the Things. I'm George Anadiotis and we'll be connecting the dots together.
Stories about technology, data, AI and media, and how they flow into each other, shaping our lives.
Graph database growth is going strong through the Trough of Disillusionment, and graph analytics go big and real-time.
Those were two of the headlines of the Spring 2023 update of the Year of the Graph newsletter. In combination,
they seemed like an appropriate summary of the reasoning behind a new entry in the Graph
database market, Aerospike Graph, which Aerospike officially unveiled in June 2023. We caught up
with the company's Chief Product Officer, Lenley Hensarling, to discuss this long journey
that started about three years ago,
as well as Aerospike's differentiation in a very densely populated market.
I hope you will enjoy this.
If you like my work on Orchestrate All The Things, you can subscribe to my podcast,
available on all major platforms, my self-published newsletter, also syndicated on Substack,
Hackernoon, Medium and DZone, or follow Orchestrate All the Things on your social media of choice.
Yeah, well, thanks. Glad to be here, George. And thank you for your efforts on the white paper. It turned out really well.
Yeah, I've been at Aerospike now for, you know, almost five years. And I came to it with a background in enterprise software,
sort of across both infrastructure, databases, networking, and also large enterprise applications.
So enterprise resource planning, manufacturing, logistics, things like that. So I was a big user of databases.
And so I've been on both sides of it and different types of databases too.
And did work on directory services, which were some of the first users of in-memory
databases back in the day.
Here at Aerospike, I came in as a consultant on strategy and then became chief strategy
officer and then put together the product management group here.
And now we have a strong product management group.
I know you've worked with Ishan Biswas, who's the product manager for Graph, which we're
going to talk about today.
And my whole focus throughout my career has been trying to figure out how we apply more data to decisioning, you know, and how we do that in a way that's cost effective. We know that there's a net yield, because there's a cost to run the computers, there's a cost to manage things and to program applications on top of databases. So all those things factor in.
But that's been something I've been focused on and also focused on what I
jokingly call catching up to the present.
With databases and software in general, we haven't been able to reflect what's going on in the moment. You know, we started out with batch solutions
that we could tell you what happened last month,
and then it's gotten progressively better.
And now we're still trying to catch up to the exact moment,
and the world just moves faster.
So, you know, we say milliseconds matter.
Okay, well, great.
Thanks for the intro.
And I guess the next reasonable question to ask, at least for someone who's not necessarily familiar with Aerospike, is: well, what are the key premises behind Aerospike?
And therefore, I guess, by extension, what was it that attracted you to join the team?
Yeah, well, two guys started Aerospike, one of whom I knew from my past in networking, Brian Bulkowski, and one of whom I've worked with a lot in the last, you know, almost five years,
Srini Srinivasan. And Brian came from a networking background and a storage background, you know, working with SSDs and
all the way back to, you know, file servers and such. And Srini came to it from a strong theoretical
database background. He's one of those PhDs from University of Wisconsin, which, you know,
historically has been kind of one of the database research institutions.
The whole hypothesis of Aerospike was that companies needed to apply more data in a short time window, with a predictable service level agreement or SLA, and to do that in a cost-effective manner, as I was saying earlier. And, you know, this notion of being able to do that, and do it across cloud and on premise, was something that was key to the founding of the company. The other thing I'd say about Aerospike is that we are a true infrastructure system software company.
And we have a different type of engineer than a lot of companies.
You know, we pay attention to, you know, how we handle concurrency.
We exploit the hardware that's available now.
And, you know, I'll say that people say, does that matter given that everything's in the cloud?
There is no cloud.
It's just somebody else's computer.
And, you know, it's a networking infrastructure
and you have to optimize for that.
And so the engineers we have think deeply about these things.
We have massive throughput in the product, and that's a result of this concurrency and multi-threading in the code specifically, so it scales like that. And I think that attention to detail is what really differentiates Aerospike from some of the other NoSQL and databases in general, really.
Okay, so you just said the magic word, NoSQL.
I was going to say that, well, everything you've just said sort of makes sense,
but you didn't actually mention what I would probably start with.
I mean, the fact that Aerospike is, well, at least initially and primarily, a key-value store.
I know that over time, you have expanded that to also touch upon different data models as well.
And actually, that's the occasion for having this conversation today.
I mean, the fact that, well, the latest addition, let's say, to Aerospike's arsenal is the expansion to graph.
But before we get there, I think it's worthwhile just covering a little bit the key, the initial
data model, so the key value aspect.
So this is where Aerospike started from, and then also the additional data models that
you have expanded to over the years
and a little bit of the thinking behind that?
Yeah, sure, George, it's great that you brought that up. You know, what we focused on initially was, as you said, a key-value store, and we had some differentiation there. You know, the primary index is in a model we call hybrid memory: the primary index is held in memory, and the data is stored on SSDs, but we treat the SSDs as a memory space, if you will. We don't write to it as a file system, we don't write to it as block storage, but we treat it much like memory, and this has made it possible for us to get access to a given piece of data, going through the primary index, in sub-millisecond time. We've expanded that to support secondary indexes that are held in memory, but can also be held on SSDs as well. And that's allowed us to expand the things we can do with it.
The other thing that's worth mentioning is that it is a distributed database. And so that always
raises the question of how do you partition the data?
How much work is it to, you know, figure out the partitioning model?
How do you manage that? And we've done this in a way that we can partition the data for the customer.
We can ensure that there are no hotspots.
And then as data access patterns change,
we can move the data around and change the partitions.
And this goes on in the background as required.
So there's not much involvement required
to exploit a very distributed architecture
so that we can scale up to data sets on the order of multiple petabytes. But with those large data sets, we can still gain access to a given piece of data off the primary key in less than a millisecond, so we're talking microseconds there. We can do queries on the secondary indexes that are single-digit millisecond. And that's because of the way we handle the balance between indexes in memory and data on the SSD,
and being able to handle the indexes on SSD as well
for very, very large data sets,
but still approaching it as if it was a memory space,
if you will.
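As an illustration of that primary-key access pattern, here is a minimal sketch using the Aerospike Python client; the host, namespace, set, and bin names are assumptions made for the example, not details from the conversation.

```python
import aerospike

# Minimal sketch: 'test'/'users' and the bin contents are placeholder names.
config = {'hosts': [('127.0.0.1', 3000)]}  # 3000 is Aerospike's default service port
client = aerospike.client(config).connect()

# A record key is (namespace, set, user key); the lookup goes through the
# in-memory primary index, while the record itself lives on SSD.
key = ('test', 'users', 'user123')
client.put(key, {'name': 'Alice', 'score': 42})

_, meta, bins = client.get(key)  # returns (key, metadata, bins)
print(bins)                      # {'name': 'Alice', 'score': 42}

client.close()
```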
Okay, I see. And so, if I recall correctly, starting off with the initial key-value model, I think you have expanded in recent years to also touch upon document, so I think Aerospike also supports JSON now, and I think you also have a SQL interface as well.
Yes. So one of the things that we looked at was, we essentially have figured out storage and access to data on a key, both, you know, primary index and secondary index. And we started out with what I will call a simple value type, you know, one type of data, and then what's essentially an object, with a map-list structure or JSON structure, depending on how you want to cast it, if you will.
And that allowed us to support documents or to support object persistence, if you will,
because documents as we were talking about here are really just a way to persist objects
in an object model, a programming model.
So we got to that point.
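As a rough sketch of that map/list, or document, value type in practice, the snippet below stores a nested structure as a bin and reads part of it back; the namespace, set, and field names are again illustrative assumptions.

```python
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

key = ('test', 'profiles', 'user123')

# A nested map/list structure stored as a bin -- effectively a persisted JSON document.
client.put(key, {
    'profile': {
        'name': 'Alice',
        'devices': ['phone-1', 'laptop-7'],
        'last_login': {'ip': '203.0.113.5', 'ts': 1699300000},
    }
})

_, _, bins = client.get(key)
print(bins['profile']['devices'])  # ['phone-1', 'laptop-7']

client.close()
```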
Then we started looking at what some of our customers
were doing with the product.
And that was around graph solutions.
So a number of our customers are in ad tech and in fraud.
And those are two places where identity management
or identity resolution has become ever more complex
as we try and throw more data at the problem.
It's not just your username and password anymore.
It's your username, your password, your history, trying to triangulate, you know, the devices you use, where you are, who you are, who you're with, even, you know, and be able to handle things like
that. And as we saw them doing that more and more, and doing it on top of Aerospike, you know, as a key-value store, we looked at an evolving pattern where some of our customers were using an open source graph technology that I know you're very familiar with, George: TinkerPop, an Apache project. And they had been doing this themselves, basically building a solution on that and then putting it on top of Aerospike as the storage model.
And we saw that that scaled far beyond what other solutions in the graph space did.
You know, we talked about being able to scale to petabytes of information and still have the
access times remain very low. We talked about being able to balance the data between partitions.
And so they were exploiting that and doing that with graph solutions based on TinkerPop.
What we did was then started talking about, could we do that? Talking to some of these customers, right?
Talking to other customers who are looking at the landscape of other graph solutions.
You know, Neo4j is a great product.
You know, TigerGraph's a good product and, you know, really tried to tackle the scale issue.
And we thought that there was some white space
at the level we play at, which is very large data
sets where there's a demand for near real-time performance.
And so then we started investigating TinkerPop
ourselves, started experimenting with a layer that
would tie TinkerPop back into
Aerospike as the storage mechanism, defining the data models for it, and the first thing that we've
done is come out with an identity management solution based on TinkerPop and the Gremlin
query language, if you will, right? It's more than a query language.
I would say it's a graph language, more than just a query language. And we put together a solution
that, you know, together with Aerospike, we think meets a need that exists for this high scale, high throughput, and low latency capability.
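For a sense of what querying such an identity graph looks like through TinkerPop, here is a minimal gremlin-python sketch; the endpoint, labels, edge names, and property keys are assumptions for illustration rather than Aerospike Graph's actual schema.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# 8182 is the conventional Gremlin Server port; adjust for your deployment.
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Identity-resolution style query: starting from one known identifier,
# find other profiles reachable through a shared device.
linked = (g.V().has('profile', 'email', 'alice@example.com')
           .out('SEEN_ON')    # profile -> device
           .in_('SEEN_ON')    # device -> other profiles
           .dedup()
           .values('email')
           .toList())
print(linked)

conn.close()
```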
And we've been in several proof of concepts now with customers,
and we started that before we released the product, actually,
working directly with some of these customers,
and have sort of proven out the solution. And we'll move it forward, you know, going into the next year, and support an OLAP, you know, an analytical graph solution, as well as the identity graph solution that we have in the marketplace today. And we'll just expand from there, building more solutions and providing it.
I should mention too, that we use a tool called G.V.
I think you're pretty aware of it.
It's a nice tool, an IDE essentially for Gremlin.
And it allows you to visualize your graph.
It helps you to step through queries and things like that.
All those things that developers really need to do. And that's kind of the path that we've gone down. And in some sense, what we're doing is putting different personalities, if you will, on top of a very robust storage and access mechanism.
Okay, so it sounds like quite a journey. So I'm wondering, how long did it all take you? I mean, from the moment you became aware of clients actually building their own sort of ad hoc solutions for graph on top of Aerospike, to the moment you decided that this is something that you would like to pursue as an official, let's say, implementation, up to the moment where you were actually able to release it?
So there are two answers. One's about three years, right, from the first time I saw this. You know, we have a very large customer who's in the payment systems business, and they were using a graph solution like we're talking about here, built on TinkerPop at scale, you know, with billions of vertices and thousands of edges, you know, connecting them all. And the first time I saw that, I kind of went, wow, why don't we do something in that space? It was just a thought, and I started educating myself more and more on graph. I hired a product manager, I think I called him out earlier, Ishan. And Ishan started really digging into it, working with these customers directly.
That was probably another six months. And then it took us about another year to hire the team
and implement the solution. And essentially, it's taking the TinkerPop graph engine, if you will, and making Aerospike the storage mechanism for that, and really creating
a graph database. And as I said, that took about a year, it took probably six months to get
something up and running, because we were leveraging TinkerPop, you know, it's an Apache 2.0-licensed solution. But then, as with everything we do, we wanted to make sure that it did scale, that it
had high throughput and such. And so we worked, you know, and refined it, and we'll continue to do that. I think software is always a journey. But, you know, in the last year we've been able to do this, and released, I guess it was, you know, probably four or five months ago,
our first solution. And as I said, you know, we'll follow that up with,
with an OLAP solution so that people can do more analytic exploration with the graph
and do that. You know, we see that tying back into, there's this interplay in the ad tech
business. And that's a large vertical for us.
It's not the largest anymore, I think financial services is, but they share characteristics,
you know, fraud, identity management, and identity management or graph in support of
those.
But really trying to get to solutions that are more out of the box for customers.
Graph's not simple, as you know; it's a different mindset.
And we're trying to package data models for specific solutions with our graph capability
as well.
Okay, yeah, that's certainly interesting.
And I think if you get to that point, it's going to be very helpful for people to be able to at least have a starting point out of the box.
So they don't have to model everything from scratch.
Modeling in graph is, well, a fine art, let's say, and it takes a while to master it.
So if you can give people a head start, I'm sure that will be appreciated.
There's also something else I think worth noting.
You did mention the fact that, well, Aerospike Graph is largely based on TinkerPop, which is an open source framework for managing graphs.
It offers its own query language, which is called Gremlin.
But as you also briefly mentioned earlier, it's actually more than that.
So part of the appeal, let's say, that TinkerPop has is the fact that it can basically plug
into any kind of backend. So it provides a service provider interface
through which theoretically any provider
can implement that interface
and so make their database
or whatever other data management system
work as a backend for TinkerPop.
And I'm sure that this is the process
that you also went through.
And in order to achieve that and also be able to have that sort of performance that you were after, I'm sure that
it took lots of knowledge that you definitely have in-house about the specifics, the specific
APIs and implementation and everything that has to do with Aerospike,
but you also needed to get a good understanding of the TinkerPop fundamentals.
So I know that in order to do that, you actually went and hired some help and probably the best
one you could possibly get as far as TinkerPop goes. So I wonder if you'd like to share a few words on the collaboration with none other than
TinkerPop's founder.
So Marko Rodriguez, I know that you worked rather closely during that implementation
stage.
Yeah, we actually hired Marko as a consultant, and then we also hired a number of other people
who've been very active in the TinkerPop Graph open source community in working that to build
out.
And as you said, there's a service provider interface, and the work we did was how best to implement that back to Aerospike, specifically in service to identity graph models. And, you know, the work we'll do on OLAP, or the analytical component, is to make sure that we have optimized use of Aerospike in the way it handles storage for that.
And, you know, we had the project code name was Firefly to build that interface.
It's non-trivial is what I'll say, right?
We went through some what I would call naive implementations that were not as scalable as we would like, that had certain hotspots, if you will,
where you have nodes that get hammered too much.
And we had to figure out how best to handle that.
And over time, over this last year, we've really come up with a very sophisticated interface layer to our storage.
The other thing I should mention is that the graph engine itself and this layer that talks to Aerospike have been built in a way that they scale out horizontally in a shared nothing sort of model.
So the graph engine, which essentially just requires compute and networking attachment to Aerospike, can be sized on instance types in the cloud, or, you know, hardware that you might purchase, that are specific to that part of the application, and they scale independently of the data set. So you can have your Aerospike cluster running, and it's distributed, but the implementation of TinkerPop and the connectivity to the database scale independently of that. So if you have high throughput, and since it's shared-nothing, you can spin up nodes and spin down nodes as you need to in terms of how much throughput, how many connections you're going to have. And that ability to, you know, elastically scale that, while you have the persistence handled elsewhere, is a big cost savings to people where they have variable workloads. So there's a great deal of elasticity built into the solution as well.
Okay, I see. So how would people actually get started with Aerospike Graph? And I think there are probably two different categories to address here. So existing Aerospike users, who presumably already know the basics of how to use vanilla Aerospike, let's say: how would they get started using Graph? And how would someone who's not an existing Aerospike user get started with the new product?
So, you know, we packaged it in a way that you can deploy Aerospike and then deploy the
graph engine independently to some extent.
You can actually get a free trial of both components, both the storage engine in Aerospike
and the Aerospike Graph
service as a free trial and download that.
There are Docker images, you know, for both, and those can be deployed relatively easily. When you go into production, it can become a little, you know, more complex, and require a little bit more sophistication.
But you can download that free trial.
The other thing that we're going to have,
and I think this will probably be Q1, maybe into Q2 of next year, we'll have a DBaaS solution.
We've recently announced our trial,
free trials on a database as a service of Aerospike, just the Aerospike engine.
And we'll have a similar thing up and running, like I said, you know, in the first half of next year.
And when we have that, it'll become very easy because you'll just, you know, say I want an endpoint and be able to play with that.
And we'll handle the scaling of the graph engine and of the database behind it for you. And so that'll greatly
simplify things. But right now, the free trial can be downloaded and installed on your laptop. It can be, you know, installed on your favorite cloud
vendor.
I see. Okay.
And so that covers the actual installation part, I guess.
So what about support and documentation and education material?
So where should people look for that?
I know you already mentioned that there is a graphical interface specifically for the graph part that people can use. So it's called G.V. And as far as I know,
you have a partnership with the vendor behind G.V.
We're actually including links to download
the community version of G.V. with our free trial. And, you know, when you download it, for a pretty nominal fee you can buy the Enterprise versions of G.V. through g.v.com. They're a great partner, and, you know, it's work that was done by somebody who's used TinkerPop a lot themselves and saw the need for this tool and built it out. But yeah, that's easily done. I think that the documentation on the solution is all available on our website. You can go to our dev hub, you know, devhub.aerospike.com, and there are forums where people can get help on doing this. You can get access to the documentation, you don't have to buy anything to go read the documentation and understand that. You can get the free trial, and get help through the forums on our developer hub.
Okay, I see.
And you also mentioned something else previously about the primary use cases that you are targeting
with this release, at least initially.
So you mentioned ad tech and finance and the thing that these two domains have in
common, so building identity graphs and those are typically used for fraud detection purposes.
And I think it would help to add a little bit of background to that. So the reason that graph is a good match for that is that
well, lots of graph algorithms can be beneficial for this particular use case. So you have algorithms such as PageRank or breadth-first search and so on that are often utilized in that context. You also mentioned the fact that,
well, it's not just about, well, who you are.
People these days, in the context of anti-fraud, are also checking things like, well, what is your network like?
Or where are you based?
Or how often do you try to execute certain actions and so on.
So graph algorithms can actually help with that type of anti-fraud action.
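As a hedged sketch of the kind of network check described here, the traversal below does a breadth-first-style expansion two hops out from a suspect account over shared devices and IP addresses; the schema (labels, edge names, property keys) and endpoint are hypothetical, for illustration only.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Accounts within two hops of a flagged account via shared devices or IPs:
# the "what does your network look like" style of anti-fraud check.
related = (g.V().has('account', 'accountId', 'acct-987')
            .repeat(__.both('USES_DEVICE', 'USES_IP').dedup())
            .times(2)
            .hasLabel('account')
            .dedup()
            .values('accountId')
            .toList())
print(related)

conn.close()
```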
So I'm wondering, in order for people to actually utilize those algorithms that are to a large extent,
at least, supported
inherently, let's say, by TinkerPop.
So where should they be looking?
Do you have maybe some kind of pointers for them?
Like if I want to implement an anti-fraud solution, for example, do you have existing
use cases that you can direct them to?
We have some use cases written up.
One thing I would say is that in general, and George, you and I have talked about this
before, the graph solution world is one that's still evolving and new ideas are popping up
all the time but what's driving it is really
the the need to have alternate ways to you know just cookies and you know is is one of the things
that's happening you know people became a concern that Google and Apple might take cookies away, you know, from Chrome and Safari. And then how would
they be able to get the same level of validation? The other thing that I think is, in some ways,
a bigger driver is that in order to validate things, you know, there are bad guys in the world,
bad actors, and they're always learning how to spoof or fake more and more of what we all use to try and, you know, validate identities and make sure it's you that's moving the money and not somebody else.
But applying more data and more information about you and who you might be related to. And, you know, I'll give you an example: if an IP address is outside of the realm, or connected with bad actors or something like that, or it's in a part of the world that, you know, there's just no way you've visited. And being able to see things like that tied with your device
that might not be your device.
Somebody may have faked that.
And then they're checking all this level of complexity
of interconnections to validate identity now.
And as they gain access to more and more data sources
and want to figure out how best to resolve
that and to make use of that information, that's kind of what we're seeing. And there's a lot of literature out there, just all over the place, I would say, you know, in the TinkerPop community, on their chat boards. There are chat boards out there just on
graph. And as I said, we have our developer hub, where we take questions and can answer questions
as well. You know, the developers that work on our solution, you know, monitor periodically those things, the product
manager monitors them, and we respond with, you know, information on it all there as well.
Okay. All right. So, well, then I guess now it's time for me to wear my analyst hat. And as an analyst, graph and graph databases are a space
I've been covering for a while. So one of the characteristics, let's say, of this space is that,
well, there's lots of divergence. There are many solutions out there. Well, some people say there are even too many of them. So last time I checked on DB-Engines, there were about 60 different graph databases in total, which is a lot, obviously.
So whenever there's a newcomer, let's say, in this market, the first question that naturally
comes to mind is like, well, okay, first of
all, welcome and nice to meet you.
But then, all right, so after the initial welcome, what makes you special?
So what is it about vendor XYZ?
In this case, what is it about Aerospike that makes it stand out from the already numerous solutions out there?
So, George, I think that's key. Typically, markets will have many, many participants.
We've got 60, 70 graph solutions out there, as you mentioned. We're in a space, the database space, that's evolving pretty fast right now.
And there are literally hundreds and hundreds, you know, 400, 500 different databases that are all competing for some space.
One of the things that we've done is differentiate along a couple of specific vectors. Right, I mentioned, you know, the notion of real time, and it's not just being able to respond to queries with low latency, it's the ability to do that consistently regardless of what the load is, okay? Meaning if you have, you know, a few hundred
people coming to your database and trying to get a graph answer, you know, that's pretty doable
for most, you know, solutions. If you have hundreds of thousands per second, if you have millions of queries per second, then it's a different question.
And the ability to keep up with that is something that in our database we focus on and something
that we've focused on in how we built out our implementation of TinkerPop.
You know, I mentioned that having it be able to scale horizontally,
very fluidly, and then exploit the ability of Aerospike to handle the multiple connections back from our, you know, interface to TinkerPop, the layer we built there, and have that scale out
as well. And to do that in a consistent manner, you know, with low latency, even when you have,
you know, high workloads, and even when the data set has grown to significant size, you know,
we have a t-shirt that we hand out at meetups for Aerospike that says, write once, scale forever.
And one of the things we really believe in is, you know, the application of more data. We've come out with a term we use a lot now called aspirational scale.
You know, even if your solution in the beginning has, you know, thousands of users, not hundreds of thousands, not millions of users, right, you need to plan for that number of users, that much throughput. You also need to plan for data sets growing. One of the things we've seen over the last five years for sure, and this has been going on with Aerospike for, I don't know, let's say we're going into our 15th year now of being a company, is that people are adding more and more data sources all the time. An anecdote I'll tell you is that we have a customer in the ad tech space, and I naively asked, you know, how many data sources do you add a year?
And he said, well, we don't look at it that way.
We add tens of data sources a month.
And you know, all of this to refine exactly who's there,
what kind of knowledge you can have about the person,
you know, presenting themselves, if you will,
and to tailor and really understand where
and what types of ads might go to that person.
And by the same token, when it's financial services,
you know, what does it really mean? Is that the person? Is this activity normal for them? Does that activity match up with the size of the transaction, you know, where they are, and things like that? If you or I, you know, show up in a part of the world we never go to and ask for sums of money that we don't normally ask for, to sell stocks and transfer the funds, we'll probably be denied. By the same token, when we go and, you know, exercise some things out of our portfolios, because it's time to buy a house or time to buy a new car and we want a down payment, we don't want a lot of friction in that, right? And the ability to
apply all that data to mitigate risk, if you will, but do it in a way that's relatively frictionless
is what's driving all this. The other thing I'll say is that there's a new thing happening, and this is behind a lot of the demand we're seeing for graph solutions: in streaming media, you know, TV is no longer TV. It's an internet app, if you will, right? And so people are wanting to understand who's in the room
watching TV. Might be the neighbors. They might be the more interesting person to advertise to,
right? And so they know they're trying to buy a car, so they'll get car commercials,
even though it's your house, right? Because they know, you know, what devices are in the room, and where those
devices have been and things like that. And so there's so many different, you know, vectors of
information that are coming in right now, and that people want to apply. One of the few technologies that can do that well is graph technology, because it's about associations, if you will.
Okay, so I guess the takeaway would be that it sounds like you consider performance and scalability to be Aerospike Graph's differentiating factors, right?
Yeah, absolutely. And I just always want to reiterate that performance is not just low latency; it's also the throughput, it's the number of connections you can support, the number of, you know, people that can come. A person on our board of directors has a great statement that he makes: he mentioned the non-linearity of the internet. And what that means is, you know, when you open your service, whether it's, you know,
selling shoes or, you know, delivering laundry or delivering food, you don't know how many
people are going to show up.
It's not like a store that's physical.
And nobody wants to wait in line on the internet, right?
People expect the performance of your applications to be consistent.
And they don't care that there are 100,000 people there when they showed up or a million.
And being able to handle that throughput as well is a key factor, as well as the size of data and the low latency.
So then I guess the next question that as an analyst, I ask people when they tell me that, well, you know, we're focused on performance and so on.
It's like, fine.
Okay.
So I guess then you probably have done some benchmarking to be able to support that statement, and I'm guessing that also applies to your case. So have you done benchmarking, and are people able to check on those benchmark results? And I should also add a disclaimer before actually hearing you out that, well, even if you have, I always encourage people to check the benchmarks and then actually also do their own local benchmarking, let's say, using data that's representative of their own use cases and a setting that's also representative of what they are able to support. Because, well,
the thing about benchmarks is that, well, they're only indicative. What really counts is the actual
performance in your actual setting.
Yeah, I would agree with that strongly, George. I think that,
you know, what's given me the most comfort that we do scale and we perform.
It's not just the benchmarking work we did. You know, we've done some what I would call nominal benchmarking, up to, you know, several terabytes of data, and, you know, tens of thousands or hundreds of thousands of transactions a second. But we've also done some proof of concept work, POCs, with actual customers using their actual data.
And it bore out what we've seen in our benchmarking, which is really that, you know, for, let's say, two to five hop queries on a graph, it's between three and seven milliseconds to get your answer, and that is invariant over large workloads. And so we've done that both with some anonymized data that we got from customers, but, like I said, what's given me the most comfort is that they have taken their data that we've worked with, so that we can, you know, hydrate the graph, if you will. And those results have borne out as well.
We're currently, you know, working on models that we'll be able to scale out, and then we'll test this out in the cloud with more hardware applied.
But some of these customers, both on-prem and in the cloud, have put together very large data sets and run POCs for pretty high levels of concurrency and throughput.
Okay, I see.
So to come back to the previous topic of, well,
this market being very densely populated,
let's say by a large number of vendors sort of vying for market share.
One of the things that caught my attention recently concerns one of those vendors that relatively recently entered the market, about five years ago, namely Redis.
Redis is an interesting case, because they seem to have walked a similar path to the one you are walking down now, having a sort of similar motivation, let's say. So initially Redis was obviously, as people know, not a graph database per se, but because they saw that, well, some of their clients were using graph, they decided to add a graph extension to their offering
and make it an official part of their product, let's say.
Now, five years later and about four or five months ago,
unfortunately, Redis made an announcement
that they're winding down that offering of theirs.
And they cited a number of reasons for doing that.
Basically, they said that, well, even though they were also very confident and very happy with their implementation and its performance and scalability and so on, what they found out actually being in that market for as long
as they have was that it was a hard sell basically because of the fact that compared to other
data models and implementations based on those models, graph was harder for people to wrap
their heads around.
Therefore, their projects took longer, they needed more help to
get started and to walk them through. And in the end, it ended up not being viable for them,
basically. That was more or less the reasoning that they gave for exiting, for sunsetting their
product the way that they did. So because of the fact that I do see some parallels
between what drove them to enter the market,
that market, and what has driven you to enter that market,
do you think that there's anything you can learn from their experience, and anything you can do differently, so that you don't end up the way they did?
Yeah, I think that we noted that when they pulled out, and when I was doing my research for this, they hadn't gotten that much traction. And that's what they're saying, right? They just didn't get traction. I think that some of this is a question of sort of being fairly laser-targeted, you know, on what we're going after. And our product's very differentiated, focused on scale and performance, and the scale part.
So Redis sort of grew out of being a cache, if you will, sitting in front of other databases.
And that's still their primary use.
And it's a great solution for that, right?
And used very widely. We are more of a database
and are used to selling more at the enterprise level
and into more complex solution spaces, if you will.
For us, we've recently entered the market
with a database as a service.
It'll go commercial.
You'll be able to just purchase with your credit card or your account on the cloud vendor and do things.
But really, when you're designing enterprise-scale applications, it's not a simple thing. And getting the data models correct and
such is not that simple. So we have the infrastructure in terms of both pre-sales
and post-sales staff that help people develop complex solutions and solve complex problems. And so I read their announcement of, you know, their sort of withdrawal from the market, and they said they like to focus on simple things, do simple things well for, you know, developers. And there's a space for that. The white space we saw in the marketplace was not the easy space, is what I would say. It was the space where people like this payment system that I mentioned, and some of our ad tech customers that are pursuing ad bidding for streaming video, are trying to solve complex solutions or problem sets with complex, varied amounts of data. And they don't expect it to be easy, and we don't expect it to be that easy to sell into either. You know, we're always wanting to reduce the friction, hence the free trial,
hence the database as a service.
But we're focused on this space, and they're focused on easy.
And I think that, you know, that's why there's room for both Redis and Aerospike.
We have a lot of customers that have Aerospike,
you know, running, supporting petabyte data sets and things like that.
And then they may still use Redis, not on top of us because we don't require cache, we're so fast.
But they may have relational databases or mainframes or whatever that they're just a cache in front of.
And we see that quite a lot.
Okay, so I guess then the takeaway is that, well, the market segment that you are targeting is different, and in a way, they accept, they embrace complexity. And therefore, you're not afraid that that complexity, which is to some extent inherent, let's say, in graph, may sort of deter them.
Yeah, I think there is a space in the market,
and it just so happens to some extent that it's our customer,
our historical vertical footprint
has been strong in financial services around fraud
and strong in ad tech, and that means both streaming, you know, ads as well. And those spaces have a demand, they have a need, you know, there's an unmet need to solve this problem that graph is a good solution for. It's not every company, but it does fit well with our focus.
Okay, so speaking of complexity actually, there's also something worth adding in that respect. So
we did mention previously the fact that, well, Aerospike Graph is built leveraging Apache TinkerPop. So Apache TinkerPop, for all its strengths, also has something which is, well, different, let's say, from the other graph databases around, especially when it comes to the query language. So Apache TinkerPop has its own query language, which is called Gremlin. And what's
special about Gremlin is that, as opposed to pretty much all other graph languages I'm aware
of, I think, it's a procedural query language as opposed to being declarative. What that means
in layman's terms is that, well, if you write a query in any other query language, be it SPARQL or Cypher or whatever,
you don't have to actually specify how you want your operation to be done.
You just specify what it is you want done.
In Gremlin, it's a bit different.
You actually have to specify step by step how you want your query to run. And for some people, well, some people love it, because they like that fine-grained control. In that way, we should also mention, it's possible to exercise control over how your query is going to run. And if you are very familiar with your data and the way it's distributed, and you know things such as frequency indicators and all of those things, you do have fine-grained control over how your query runs, and therefore you can make it run in a more efficient way. The flip side of that is obviously that, well, it's more complex to write a query in Gremlin than it is to express the same query in other query languages.
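To make the contrast concrete, here is a small sketch: the declarative version is shown as a Cypher comment, and the Gremlin version spells out each traversal step; the schema and endpoint are assumptions for illustration, not Aerospike Graph specifics.

```python
# Declarative (Cypher, shown only for comparison; hypothetical schema):
#   MATCH (u:user {userId: 'u-12345'})-[:USES_DEVICE]->(d:device) RETURN d
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Procedural Gremlin: the author spells out each step and its order explicitly.
devices = (g.V().has('user', 'userId', 'u-12345')  # 1. locate the start vertex
            .out('USES_DEVICE')                    # 2. walk exactly this edge label
            .limit(20)                             # 3. cap the work explicitly
            .valueMap()
            .toList())
print(devices)

conn.close()
```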
So having said all that, I'm wondering what's your take on graph query languages. Is supporting different graph query languages on top of Gremlin something you may consider in the not-so-distant future?
Yeah, so I think that this ties back to a theme
in our discussion here, which is,
we tend to solve data management problems for the most complex and the most highly scaled,
you know, demanding solutions, right? And so, as you mentioned, having that control,
being able to optimize it, spending that extra time maybe in developing that application matters in those situations
because it has to run efficiently and, you know, with high performance. That's the type of
customers we service. That said, you know, we've been trying to make use of computers easier and easier all the time, to make everybody be a programmer. Maybe generative AI solves all of this, right? But I think, with respect to what we're talking about here, we also are looking at supporting the coming standards. You know, there's now finally sort of a standards effort around
graph languages, and they derive from Cypher, you know, that language that you mentioned. And there is the openCypher project on top of TinkerPop. And, you know, one of the things we found, we started looking at it, and we recently decided to do OLAP primarily from customer demand, but it turns out that us providing the analytical capability would be easier than doing the work to support Cypher, openCypher, and replace Gremlin, or to have a pluggable model like that.
And we decided that that effort is going to be substantial, but we also decided that we'll wait
for the standard to solidify. And instead of implementing Cypher, we'll implement that
standard, which is going to derive from that openCypher work, I think, right?
It's going on in the Apache community.
Yeah, yeah, it is.
It is going on indeed.
And yeah, like you said, it's work that we expect to actually solidify sometime soon.
So I can see the point in waiting for that to come out rather than, well, sort of rushing things out. We always have to think about the cost involved as well, right?
Indeed. So I guess based on what you said, I can already infer the next things that you're working on. You already mentioned
adding analytical capabilities, so OLAP, and that's probably more immediate than the mid to
long term goal of adding another layer of graph query language. Is there anything else that's in your roadmap for the coming period?
I think that the other piece that I did mention was that we will be putting this up on our
database as a service solution. And what I will say is, everybody wants to move to the cloud; it's not trivial to get that right. And so we've already started initial work on that. You know, there are a lot of learnings for us, as we made, you know, multiple starts at putting up a database as a service just around Aerospike and its use as a database. And doing this on top of that, with Gremlin and TinkerPop scaling in one manner and the database scaling in another, and really making that be opaque to the end user, who just wants to spin up a cluster of Aerospike Graph, is something we've already started work on. And you'll see that come out, as I said, sometime next year,
hopefully before the end of the first half of the year.
Yeah, I do realize, as you also pointed out,
that it may seem trivial if you are only interested
in using the end product, the service, but it's actually not at all if you're the one
who has to implement it.
So I understand it's going to take you a while.
Exactly.
Okay, great.
Thanks.
I think we're about at the top of the hour, so it's probably about time we wrap up as well. So I don't know if there's anything else that we didn't touch upon and you feel like we should, but if not, I would like to thank you for joining today and for sharing what you have with me and the audience.
Well, thank you, George, for having us, you know,
for having Aerospike participate in your podcast and, you know,
always love to talk to you and, you know,
have learned a lot and from interactions with you and, you know,
thanks for your support for the graph community in general, right?
I think that it's still one of those technologies that needs evangelists broadly, and you're one of the leaders there.
So thanks for that.
Thanks for sticking around.
For more stories like this, check the link in bio and follow Linked Data Orchestration.