Orchestrate all the Things - Graphs as a foundational technology stack: analytics, AI, and hardware. Featuring Neo4j CEO and Founder Emil Eifrem, Graph Data Science Director Alicia Frame
Episode Date: May 28, 2021. Graphs are everywhere. That has been the motto of graph aficionados for years, and now it seems that the world is waking up to this. How would you feel if you saw demand for your favorite topic..., which also happens to be your line of business, grow 1000% in two years' time? Vindicated, overjoyed, and a bit overstretched in trying to keep up with demand, probably. Although Emil Eifrem never used those exact words when we discussed the past, present and future of graphs, that's a reasonable projection to make. Eifrem is the CEO and co-founder of Neo4j, a graph database company which lays claim to having popularized the term "graph database", and to leading the graph database category. Eifrem and Neo4j's story and insights are interesting because through them we can trace what is shaping up as a foundational technology stack for the 2020s and beyond: graphs. "Graph Relates Everything" is how Gartner put it, when including graphs in its top 10 data and analytics technology trends for 2021. Interest is expanding as graph data takes on a role in master data management, tracking laundered money, connecting Facebook friends and powering Google, in search and beyond. Think Panama Papers researchers, NASA engineers, and Fortune 500 leaders: they all use graphs. Here's why, and how. Article published on VentureBeat. Image: Getty Images
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Graphs are everywhere.
That has been the motto of graph aficionados for years
and now it seems that the world is waking up to this.
How would you feel if you saw demand for your favorite topic,
which also happens to be your line of business,
grow 1000% in two years' time?
Vindicated, overjoyed, and a bit overstretched
in trying to keep up with demand.
Although Emil Eifrem never used those exact words
when we discussed the past, present, and future of graphs,
that's a reasonable projection to make.
Eifrem is the CEO and co-founder of Neo4j,
a graph database company which lays claim
to having popularized the term graph database and to leading the graph database category.
Whether you subscribe to that view or not, Eifrem and Neo4j's story and insights are
interesting because through them we can trace what is shaping up as a foundational technology
stack for the 2020s and beyond.
Graphs. Graph relates everything is how
Gartner put it when including graphs in its top 10 data and analytics technology trends for 2021.
Interest is expanding as graph data takes on a role in master data management, tracking laundered
money, connecting Facebook friends and powering Google in search and beyond.
Think Panama Papers researchers, NASA engineers, and Fortune 500 leaders.
They all use graphs.
Here's why and how.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
Sure. Yeah, so my name is Emil Eifrem and I'm the
CEO and one of the co-founders of a company called Neo4j. And we are at the highest level,
we're a database company and specifically our point of view is that the world is becoming
increasingly connected and that spills over into data. So data then is becoming increasingly connected.
It becomes increasingly valuable to be able to figure out how things fit together. And we believe
that the best architecture to do that successfully is what's called a graph database. And so we're
a graph database company. We were founded back in Sweden in the early days where we ran into the problems ourselves, the co-founders, where we were building enterprise content management systems that were highly connected.
You can imagine files that belong to folders that belong to other folders, which forms a hierarchy.
And then you add symbolic links or shortcuts to that, which turns the hierarchy or the tree into a network or a graph,
you know, in mathematics, graph is a synonym for network, right? And then we added
access rights and permissions and stuff like that. It all became this big connected hairy mess that
we tried to shove into just a traditional relational database, which worked, but it was painful.
And we're recording this in 2021.
And today, as an industry, we have a fairly nuanced understanding of it, right?
But back in those days, it was harder, right?
But basically, the fundamental problem was that there was a mismatch between the shape of the data and the abstractions exposed to us by the underlying infrastructure, right? So we had a square peg in a round hole type of thing going on.
And that's what ultimately led to us taking a step back and saying, you know what, you know,
what if we had a different building block, right? Different building blocks exposed by the underlying
database, rather than tables with rows and columns. What if we had nodes and relationships between those nodes and key value properties on both of them? Then we can
easily model out these files and the folders and the users belonging to groups belonging to other
groups and the permissions between them. That was super easy, right? And so that was the original
spark, as it were. And we ended up building that, with lots of trial and error along the way.
We initially tried to layer it on top of Postgres, for example, just as a layer. That worked from a programmatic API perspective, but it didn't work from a performance and scalability perspective, and so on and so forth.
But ultimately, we ended up with what we today
know as Neo4j and the property graph model. So that's a little bit kind of the origin. And as
for the company today, we are probably around 500 people today. We started the year around 400.
We want to grow to 600 by the end of the year. So we're in hyper
growth phase. We're headquartered in Silicon Valley, but have our R&D in Europe, in Malmo,
Sweden, in London, but also elsewhere in Europe. And we sell into the global 2000. So odds are that anyone who's listening to this podcast will have used Neo4j directly or indirectly at least this week through a variety of different applications.
The 20 biggest banks in North America are all using Neo4j as an example.
And I could go on the list.
But that gives a little bit of a flavor of what the company is and the origin story.
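To make the property graph model described above concrete, here is a minimal Cypher sketch of the files, folders, groups, and permissions domain from the origin story. The labels, relationship types, and properties are illustrative assumptions, not a schema taken from the conversation:

```cypher
// Nodes carry labels and key-value properties; relationships connect them directly.
CREATE (root:Folder {name: 'docs'}),
       (contracts:Folder {name: 'contracts'}),
       (nda:File {name: 'nda.pdf', sizeKb: 120}),
       (legal:Group {name: 'legal-team'}),
       (emil:User {name: 'emil'}),
       (contracts)-[:CHILD_OF]->(root),
       (nda)-[:STORED_IN]->(contracts),
       (emil)-[:MEMBER_OF]->(legal),
       (legal)-[:CAN_READ]->(root);

// "Which files can this user read?" follows the connections,
// however deeply the folders and groups happen to be nested.
MATCH (u:User {name: 'emil'})-[:MEMBER_OF*1..3]->(:Group)-[:CAN_READ]->(f:Folder)
MATCH (file:File)-[:STORED_IN|CHILD_OF*1..10]->(f)
RETURN DISTINCT file.name;
```

The same traversal keeps working as the hierarchy deepens, which is the point of trading tables, rows, and columns for nodes, relationships, and properties.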
Great. Great. Thank you. And just to give a little bit of more contextual background, if you will. So
I have to say that, well, when you and I first met, me doing a piece on graph databases for a mainstream publication was kind of, well, it wasn't something that was done that often.
A lot of water has flowed under the bridge since.
You can say it, George. It was crazy.
Well, I wouldn't say it was exactly crazy, but well, not something many people would do, let's say.
But yeah, obviously lots of water has flowed under the bridge since
and graph is, I would say, pretty much mainstream today.
So it's something that the Gartners and Forresters of the world
often write about.
And so as a colleague said,
this is the vindication of George Anadiotis.
Well, I wouldn't personalize it that much, but well, it's good to see.
And I'm sure you feel in a similar way, having been in this space longer than most.
So one thing I'd like to emphasize here is that, well, you started by saying that Neo4j is a graph database, which is obviously true.
But I think that you have actually kind of grown as a company since then.
Well, you obviously have.
I mean, you mentioned some numbers, but this is just one aspect of how you have grown.
I think you have also grown in terms of scope.
So you're not really just a graph database anymore.
And that has basically to do, I think, with the fact that, well, as you say yourselves, graphs are everywhere. So you have expanded from being just a graph database to
covering other areas as well. So we have graph analytics, we have knowledge graphs, and we also
have graph machine learning, data science, AI, however you want to call it, which is why Alicia Frame is also with us today.
She's going to talk to us about that a bit later.
So I wonder if you'd like to say a few words about each of those areas.
How do you think they relate to each other and how you as a company are approaching those?
Yeah, so maybe let me start
and kind of go through kind of my point of view
and maybe I end on graph data science,
which is a good handoff to Alicia,
who's the expert in that field
and then she can add a lot of color.
Yeah, I think you're spot on, right?
Like if you think back on what's our original point of view,
our original point of view is that the world is becoming more connected
and therefore data is becoming more connected, you know,
and that's not something that is purely used for,
let's call it operational data stores.
So developers building applications,
those applications use a database, right?
It's definitely used there and that's where we
grew up, right? And whether it's a recommendation engine or fraud detection or a customer 360,
you'll go down the list of the kind of the standard use cases for Neo4j and that's really
valuable. But then there's also what I tend to zoom all the way out and think of as analytical data stores. That's where kind of the Databricks and the Snowflakes live, right?
And so these are technologies used by data scientists to build machine learning and AI
to do predictions, right?
And it's used by data analysts basically to build reports, right?
And the question is, does connected data have value in that universe as well? And one of the clear things that we've seen over the past, you know, several years has
been that, man, this is really valuable on that side of the house as well.
And then I think you mentioned knowledge graphs and, you know, I'll circle back to the graph
data science and hand off to Alicia, but just a few words on kind of the knowledge graph
piece, which I think is a
really interesting area of development over the past several years. And, you know, I think you
wrote a while ago about kind of knowledge graphs and, you know, it was a great rebranding, right?
And there's always that debate if an idea is truly new. And as we all know, almost no ideas are truly novel and new, and for any kind of big
successful idea, be that wrapped in a company or elsewhere, you will always find people on the
sidelines saying that they came up with it 20, 30, 40 years ago, right? And I think
the term knowledge graph probably got popularized, at least, by Google, you know, through the Metaweb acquisition.
And they started talking about when you search for something on Google, if you search for Malmo, Sweden, where I happen to be recording this podcast, right?
Then, you know, at that point, it would show you a little side panel, right?
The side pane where they bring up Malmo.
It would link to the Wikipedia article and it would be able, you could click through
that and it will extract some like semantically useful data about it, right?
And that was kind of their initial use for that knowledge graph.
You would search for things, not strings was kind of the summary, right?
And I thought that is very useful.
We see that. Like, the version we have of that
is that people put a lot of data and knowledge
into Neo4j and then you have human beings
as the direct end users who look at that data, right?
And that's the equivalent of an end user searching Google
and then looking at that
data. So we definitely see that. And then of course, what Google did over time was that they
started powering their search through signals from that knowledge graph, right? And so that was very
interesting. And then a few years ago, and I guess this will be my hand off to Alicia, right? They
published a few blog posts talking about graph-based machine
learning, how they started to shift their machine learning internally to not just use discrete data
points, but also use relationships as a signal into their machine learning models. And this is
one of several clues that we got here at Neo4j saying, you know what? That data science community, they are underserved, right?
The developers, we've served the developer
reasonably well, I think, in the previous decade,
like building the graph database focused on them.
That's my background.
And I'd love to talk more about
what we have done there recently
and what we're doing more,
if you find that interesting, George.
But then we've always had some data scientists
use us, and maybe Alicia will talk a little about that. Alicia was one of them, actually, right? But
I don't know if you would agree with this statement, Alicia, but I feel like on some level,
if you think back five years ago, we would have data scientists use Neo4j kind of almost despite
us, right? It was not optimized as a form factor for data scientists.
And then what Alicia has done together with the team
is to build that out and adapt Neo4j
to work better in normal kind of data science workflows.
And we're really excited about both of those areas.
And I think if you think about, you know, the foundation is I've
got connected data and I want to store it in a graph, kind of graph data science and graph
analytics is the natural next step from there. So once you've got your data in the database and you
can kind of start looking for what you know is there, so that's your knowledge graph use case,
right? I want to find Emil, who's the CEO of Neo4j,
which is a tech company. And I can start to write queries to find what I know is in there to find
the patterns that I'm looking for. And that's kind of where data scientists got started with Neo is
I've got connected data. I want to store it in the right shape. But then the natural progression
from there is I can't possibly write every query under the sun.
I don't know what I don't know.
And I don't necessarily know what I'm looking for.
And I can't manually sift through billions of nodes.
So you want to start applying machine learning to find patterns, anomalies, and trends.
And it's how do you make sense of large volumes of connected data?
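As a rough illustration of what Frame is describing, here is a small Graph Data Science sketch: project a graph into memory, then run community detection and a centrality algorithm to surface clusters and unusually influential nodes. The node label, relationship type, and graph name are invented for the example, and the procedure names follow the GDS 1.x syntax current around the time of recording, so they may differ in other versions:

```cypher
// Project the stored graph into an in-memory graph for analytics
// (hypothetical Customer/TRANSACTED_WITH schema).
CALL gds.graph.create('interactions', 'Customer', 'TRANSACTED_WITH');

// Community detection: which customers cluster together?
CALL gds.louvain.stream('interactions')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS customer, communityId
ORDER BY communityId;

// Centrality: which nodes are disproportionately important?
CALL gds.pageRank.stream('interactions')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS customer, score
ORDER BY score DESC LIMIT 10;
```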
And we were really lucky to kind of see this convergence of the tech for graph maturing.
So Neo4j is a mature graph data platform.
You can store your data efficiently.
You know, it doesn't get corrupted.
You can query it.
It's highly available.
It's a really mature and robust database.
And we've got the tech to store the connected data.
And then at the same time, you have the academic research, kind of the state of the science,
developing really novel and powerful ways of using graph to make better predictions,
to uncover patterns, and to kind of represent data in a faithful way so that you can start
really machine learning with it. And when you take those two things happening at the same time, that's what brings us to our newest, you know,
our newest endeavor in graph data science, which is a library that works directly with the database.
So you've got your graph data and your graph shape, and now you can really quickly get up
and running with these state of the science techniques to learn from connections, to find
patterns and anomalies that you don't know what you're looking for, but you can say, wait a second,
this is an outlier. This part of the graph is more densely connected, or I can learn from the
structure of the graph to predict how it's going to change in the future. And that's really hand
in hand with the database of what's the next step. Now I want to
find what I don't know is there. I want to enrich my data and make predictions. And I think we're
really excited to see just how quickly the community and our users have gotten up and
running and are showing value with this. Yeah, I have to say that this is an area that I'm seeing huge traction in.
I mean, not necessarily, not exclusively in terms of Neo4j's graph data science platform,
which obviously helps, but as you also pointed out, as a discipline in general, let's say.
So graph machine learning is, you know, people are just putting out new papers almost daily
and there's like huge progress and huge traction.
So it's definitely an area of growth and new research
and new applications that are coming up.
So to bring all those areas together,
basically it looks like graph databases, and Neo4j in particular, are growing
from just being a graph database, which is no small feat,
to actually being a platform for developing an array of applications, including data science applications.
So I watched Emil's keynote from last year's Nodes, Nodes 2020,
which you were in as well, Alicia, and which I found really interesting,
mostly because it summarized all these disparate areas really well and
it brought them together.
So some areas that Emil mentioned in that keynote that he emphasized as key to growing
to a platform were the shift to the cloud, the emphasis on developers, graph data science
which you just referred to, and the evolution of
the graph model. So I would like to ask Emil to just briefly recap those for those among
people who may be listening who didn't have a chance to see that keynote, and I think it would help to establish that context.
Yeah, so just a little bit of background.
So, you know, Nodes 2020, you said,
you referred to Nodes 2020, George,
which is our online conference, which we actually,
we were early adopters.
We did an online only developers conference
back in 2019, pre-pandemic. So we felt that we were ahead of the
game on that one. And then obviously in 2020, it was very much an online one. Very popular,
over 13,000 people attended: developers, data scientists. It's really practitioner-centric.
And in my keynote, I took the liberty, I guess I got excited about the fact that we just walked into a new decade, right?
And so I took the liberty of zooming all the way out and say, hey, all right, as a community, what did we achieve in the previous decade in the 2010s, right?
And how do I look at the 2020s, right?
And then my, what I think you're referring to is that
I believe there's like, there's four pillars
that are gonna shape the world of graphs in the 2020s.
And they're the ones that you just talked about,
which is the first one is the shift to the cloud.
And this is hardly a controversial statement, right?
This is a big secular shift in all of technology. But what's been interesting to observe is that,
in the broader technology stack, the shift to the cloud has happened at a different
pace for different layers. And a little bit counterintuitively, it first hit applications
at the absolute top of the stack, and then virtual machines, VMs, almost at the bottom
of the stack, not quite at the bottom, right? But almost at the bottom, right? If you
think about things like Hotmail, whatever, Gmail, Google Maps, right? On the consumer side, and
maybe things like Salesforce and CRM, right?
That's very much in the application layer. And then you think about EC2, right, the first AWS
service, right? That's at the absolute bottom of the stack almost, like with virtual machines,
right? And then data in there, it took a while for that to happen, right? Because of data gravity, right? And because of regulatory concerns, right?
And so it took a while, but a few years ago,
towards the end of the decade, 17, 18,
it was very clear that it was happening.
We started hearing that the fastest growing part
of AWS, and I'm sure GCP and Azure as well,
was the database services, right?
We saw the success of MongoDB Atlas. It
was very clear that, you know, individual developers, small startups, and very importantly,
enterprises alike were comfortable with consuming databases as a service in the public cloud,
right? And so, of course, the graph database space is not different there at all. Right. And this is a shift that is going to be ongoing for the majority of this decade.
And it's huge.
I'm actually one of the people who believe that while it's only going in one direction,
right, it's going to be more and more public cloud consumption of database products.
I don't think it's going to pan out at 100%.
I still think that
the local form factor is important. I think developers want to run their database on their
laptops. I think CIOs want to be able to deploy in their hybrid cloud, in their private data center.
And we can all quote Fortran and COBOL being around many, many decades later.
And we all have those kinds of references.
And so even if I were to start a graph database company
from scratch today, God forbid, but even if I were to do that,
I would be cloud first, but not cloud only.
And so that's an important thing, right?
But so that's kind of the first pillar.
And the second pillar is practitioner centricity, right? And this is also part of a broader pattern
in just how technology is becoming just consumed,
even inside the enterprise today,
which is the buyer has a lot more choice, right?
And there was talk before about kind of the consumerization of the enterprise,
which people thought of as bringing your own device,
the fact that people would take their iPhones rather than their BlackBerrys to work
and their iPads and things like that, right?
But that's a broader theme.
And you see that developers and data scientists increasingly are able to pick their own,
choose their own tools, right?
And if you then marry that up with the first trend around kind of cloud consumption of software, and then you add a third trend, which is consumption-based pricing, which is: I can swipe my credit card and expense up to some amount, right?
And get started without asking IT, without getting approval, like anything like that.
And all of a sudden, like, you can get a foothold without selling in top-down to the CIO with
beautiful slides.
And that's just a point of view we've always had as a company.
That's why we're open source in the beginning.
We want to win by being the best product.
We want to win by making sure that we solve problems for the practitioners, the people
who use our product every day.
We want to be awesome for them.
And if we can then have pretty slides for the CIO, hallelujah, that's amazing.
But we don't sell in by the pretty slides and have the CIO force that down into our
organization,
right? And so that's kind of how we, you know, it's both almost a philosophical point of view
on how we're building our company, as well as just an observation of a broader trend: that
is how graph databases and graph technologies will be adopted in the 2020s.
So that's kind of the second one.
And maybe I'll ask Alicia to flesh out, you know,
the graph data science piece a little bit more,
if you're okay with that, George,
and then I'll happily speak to the fourth trend,
which is I find really important as well.
I mean, in terms of graph data science,
it's really giving people, giving data scientists
the power to do what they know they want to do.
So like my academic background is I've spent 10 years using Neo4j and writing kind of graph data science code from scratch in Python.
And I joke that like I have now put myself out of any future job by implementing a library.
So you don't have to hire your PhD data scientist.
But as data science has exploded as a field, there's this tremendous appetite to do, you know,
I need the competitive advantage. I want to do the best that's out there. I want to get those
extra three percentage points of accuracy. And I think graph really fills that niche of,
and it's not a niche, it's like a fundamental requirement of, I need to represent my data in a faithful way so that what I'm doing machine learning on represents the real world.
And the real world isn't rows and columns.
The real world is connected concepts and it's really complex and there's this extended network topology that data scientists
want to reason about. And when we talk about that as, you know, a pillar for the next decade to come
with graphs, it's about getting people up and running with making it easy to get your data
into a graph shape and start asking questions. What's unusual? What's clustered together?
What's important? What's connected to what.
And then you can kind of accelerate from there to say, okay, I want to predict how this graph
is going to change in the future. I want to transform this graph in a way that I can integrate
it with my deployment machine learning pipeline that I already have built and make it super easy
for me to get that into production. And going back to what Emil was saying about,
you know, we want developers and data scientists,
we want to make it easy for them.
They shouldn't have to go to their fifth level manager
and write up, you know, this is why I need a graph.
They should be able to download it off our website,
go to Aura, get it in their hands quickly and show value.
So for us, really, the last year
has been about decreasing the friction
so that data scientists have the runway
to show value quickly and evolve with graph.
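One hedged sketch of that "graph features into an existing pipeline" idea: node embeddings computed from graph structure can be exported as plain numeric columns for whatever model-training setup a team already runs. This assumes the in-memory graph projection from the earlier example and GDS 1.x procedure names:

```cypher
// Produce a fixed-length embedding per node from the graph's structure;
// downstream ML pipelines can consume these as ordinary feature vectors.
CALL gds.fastRP.stream('interactions', {embeddingDimension: 128})
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS customer, embedding
LIMIT 5;
```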
Anything just to kind of add on to pillar three
and then I'll cover pillar four.
I actually think that if we pick a far enough out timeline, we don't start to
argue about specific years, right? So let's say, with my keynote I had a 10-year, like a
decade point of view, so let's just take by the end of the decade, by 2030, right? I believe
every single machine learning model on the planet will use relationships as a signal, right? Google already
moved over there a few years ago, right? And it's been proven out that relationships are a really
strong predictor of behavior. So we're going to end up in a situation where data scientists,
they build their models and they're in a specific industry. They might be in banking,
capturing fraud, or they might be doing recommendations in retail
or whatever it might be.
And if their competitors are using relationships
as a signal, right?
Then they're gonna have better predictions, right?
And then, you know, back to Adam Smith, right?
The invisible hand will then, you know,
through competition, force them to start using that as well.
So I think that's
going to happen with or without Neo4j, right? That's just how it will unfold, right? And then
our opportunity then is to make sure that we are the best platform for them to do this. And I was
talking to an analyst from Gartner, you know, half a year, almost nine months ago now.
And he's the lead AI and machine learning analyst for Gartner.
So obviously Gartner being the premier industry analyst firm, he fields all the inquiries about AI and machine learning from the entire enterprise, right?
That's a really interesting proprietary data set just to look at what are they thinking about?
What are they asking about, right?
And what he told us was that back in 2018,
5% of his inquiries were about graphs.
In 2019, 20% of his inquiries were about graphs.
And then he said that by 2020, year to date,
so this is maybe September timeframe
or something like that in 2020,
he said that of all the inquiries that he got,
so everything related to AI and machine learning
for the enterprise, 50% of them were related to graphs.
So just think about that.
That's a pretty mind blowing thing, right?
And they've actually been, now they're on record,
like in their trends report from February of this year,
they actually put that in print.
It wasn't just a casual conversation anymore: 50% of the AI and
ML inquiries were about graphs, right?
So I think that's the big opportunity here.
And so we're really excited about it.
So that's then kind of the third pillar.
The fourth pillar is, you said the evolution of the property graph model.
I think of it a little bit differently.
I think of it maybe in the following way.
I think this is the decade where the property graph model is going to live up to its full potential.
And what I mean by that is that if you take a few steps back and you look at the previous decade,
like on some level, what have we achieved collectively?
And then we, I mean, us here at Neo4j,
but our community, our customers, the competitors,
other graph database vendors, industry analysts
and writers like you, George,
like collectively what have we achieved, right?
We've gotten graph databases established
as a category, right? It was just a term that we made up 10 plus years ago, right? And then,
you know, we filled that with meaning and we've evangelized the concept and undeniably it's a
category today, with every single big enterprise software vendor launching a graph database
offering, and there's a dedicated Forrester Wave for graph
databases and so on and so forth. Undeniably it's a category in data today. If you zoom in a little
bit and you look at where the depth of adoption is today: here at Neo4j we have a really good view.
I talked about the proprietary data set of Gartner, hearing what the enterprise is thinking.
We have a really good kind of proprietary view ourselves, because we probably have visibility into every single commercial graph database deployment in the world.
And we're not always chosen.
Maybe we're chosen 8% of the time or something like that, right?
But we're always in the loop as the leader of the space, right?
And so we see all of these things happening, right?
And really the depth of commercial adoption, and I'll choose that lens because we obviously have, we have
hundreds of thousands of developers that build applications on top of Neo4j, right? But that's
open source. We don't really have the same level of visibility, right? So if you look at the depth
of the commercial adoption, it really is where we solve a very deep pain point,
typically related to performance and scalability, right?
And so this is, you know, you are a big retailer and you want to do personalization based on
your clickstream data, right?
So every single click on your website, on your Fortune 50 website, right?
You store in a massive Neo4j cluster. And then as
they click around based on George put this product into the shopping cart, I'm going to say that,
hey, George put a hammer in his shopping cart. I'm going to generate a promotion for nails,
right? So if you buy hammers and nails, you apply a discount. That's done in real time as you click
around and then you view this page,
we're going to generate the recommendation based on that. So that is impossible to do in a
relational database. The joins would kill you. You can't physically do it. And so that's a great
example of where in the previous decade, that's the depth of adoption of graph databases today.
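A minimal sketch of the kind of real-time query Eifrem is describing, with an invented retail schema (Customer, Product, ADDED_TO_CART); the actual deployments he mentions are of course far larger and more elaborate:

```cypher
// George just added a hammer; recommend what co-occurs with hammers
// in other shoppers' carts, computed at click time.
MATCH (:Customer {name: 'George'})-[:ADDED_TO_CART]->(p:Product {name: 'Hammer'})
MATCH (p)<-[:ADDED_TO_CART]-(:Customer)-[:ADDED_TO_CART]->(rec:Product)
WHERE rec <> p
RETURN rec.name AS recommendation, count(*) AS strength
ORDER BY strength DESC
LIMIT 3;
```

In a relational schema the same question becomes a chain of self-joins over order tables, which is the "joins would kill you" point.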
I think that's going to continue because of that trend that we talked about before. Data is becoming increasingly
connected. Use cases that weren't use cases when we went to the market are now sweet spot use cases
for graphs, right? And so, for example, when you first went to the market, supply chain was not a
use case for us, right? Because you grab a random manufacturing company and they went to the market, supply chain was not a use case for us, right? Because you
grab a random manufacturing company and they would have a supply chain that is two, three levels deep
or something like that, which is if you want to be forward leaning, right? You know, 20 years ago,
something like that, right? And you would digitalize that. You can store that in a relational database,
right? It's two, three hops. That's doable with a few joins. Fast forward to today, and any company that ships stuff, they tap into this global fine-grained
mesh spanning continent to continent. And all of a sudden, like, a ship blocks the Suez Canal, which,
you know, I guess since this is a podcast, people might realize that just happened like a month ago in 2021, where the Suez Canal
was blocked for a week, right?
And then you have to figure out how does that affect my business, right?
And how does that affect, more importantly, my customers, right?
And the only way you can do that is by digitalizing it.
And then you can reason about it and do cascading effects, root cause effects,
and so on and so forth.
And now in 2021,
you're no longer talking about two, three hops.
You're talking about a supply chain
that is 20, 30 levels deep, right?
That all of a sudden requires you
to use a graph database, right?
And so that's an example of this wind
behind our back in the graph industry, right?
Where use cases that didn't
used to be kind of targeted to us, now you have to use a graph database because the level of
connected data is just exploding in the world, right? So that's going to continue, right? That's
part of trend number four. And really, if you think about what I said there, why is it impossible? It's down to performance. So it's the same core driver as what was the driver in the previous decade, performance and scalability.
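To make the 20-to-30-hop point concrete, here is a hedged Cypher sketch of a cascading-impact question over a supply chain; the Route/Supplier/Customer schema and relationship names are assumptions for illustration:

```cypher
// Which customers sit downstream of a disrupted shipping route,
// no matter how many supplier tiers are in between?
MATCH (r:Route {name: 'Suez Canal'})<-[:SHIPS_VIA]-(s:Supplier)
MATCH path = (s)-[:SUPPLIES*1..30]->(c:Customer)
RETURN c.name AS customer, min(length(path)) AS tiersAway
ORDER BY tiersAway;
```

The variable-length pattern stays one query however deep the chain goes, whereas the relational equivalent needs a join (or a recursive query step) per tier, which is the performance cliff being described.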
But what pillar number four was also about was that, you know what?
We can go in this decade, we as an industry have an opportunity to go much, much further.
And this is because if you think about the core
value proposition of a graph database, performance and scalability is one of them. But intuitiveness
and flexibility and agility, those are the other two. Intuitiveness, flexibility, agility. Those
are the other two core benefits of a graph database. And if you think about what I just said,
the commercial depth of adoption is not related to those two. Why? It's because, I think Neo4j today has an amazing developer experience, it's the best that I know of in the graph space, but we're just very self-critical about what we're doing there, right?
This is a startup.
It's fast growth.
We're learning and we're improving.
You know, we have, when you and I last spoke,
George, we had about a hundred engineers, right?
I don't know exactly how many we have now,
but by the end of the year,
we're going to have 200 engineers, right?
Which is, you know, probably five to 10 times bigger than anyone else investing in graph
technology.
And the key focus there is reducing practitioner friction, right?
So in other words, ease of use.
And as we do that, I believe that the property graph model is going to be used for graphs
that are 10 records big, right?
10 nodes and
relationships and 100 nodes and relationships, just because the model is a better fit with most
applications. And that is something that the world at large does not understand. People think that
a document database or a relational database is a better fit for most domain models. And that's
actually not true. If you look at it, once we get developer experience on par, as good as I feel like it should be,
I think we have this opportunity
to become the first database of choice
for most new applications.
And that, if you then take that decade long view
of the industry in the 2020s, that's the fourth pillar.
Yeah, yeah, thanks.
And actually I couldn't help but be reminded of something I read somewhere by an analyst. I don't even remember his name. It was on the occasion of a recent funding round in this space. It wasn't yours, but it was something along the lines of: well, I don't get it. Okay, so fine, maybe this database is the fastest thing in the world, but so what? What's with this crazy valuation that they get? And to me
it was like, well, it's not about speed. Well, in some ways it is, as you also mentioned, because, you
know, just the ability to run some things in a time that
matters for your application is important. But it's not really about speed.
It's about leveraging connections.
And this is the part that, well, I guess some people still don't get.
But that's our opportunity in this decade to educate people about that.
Yeah.
Yeah.
But I want to now take the opportunity to get a little bit more in the
weeds in those four key pillars that you mentioned.
And well, let's start with cloud.
And I know that Neo4j, you started with your cloud offering called Aura in 2019.
And I think it was quite recently, actually, that you went GA.
And so I want you, if you were able to just recap the journey so far,
and one particular aspect that I'm really interested in,
and it also has to do with the fact
that you're an open source company,
is that you were among the first cohort
that Google Cloud struck a deal with to partner,
in terms of how you work with them. So, yeah,
I wonder if we can shed some light on that, and how the adoption of
Neo4j Aura is going, and what your plans for the future are. Yeah, so a
couple of thoughts there. We have, broadly, two tiers of our cloud offering. One is called Aura Professional,
and that is what's called a self-serve tier, right? So what that means is that you sign up,
you swipe your credit card, right? And as a, you know, low starting, you know, 50, 60 bucks per
month, you know, type of starting point, and it's billed, like, hourly, right? So you
can really use it at a super small scale and it grows elastically with your usage,
credit card based monthly billing. You don't have to talk to a human being or anything like that.
You can just sign up. And that's the one that we announced and launched by the end of 2019. And so we've had
a full year and some of that one. And that one is going really well. We have over 800 customers
right now, which is a pretty stark amount if you think through level of adoption inside of the
graph database space, generally speaking. And there's a lot of exciting things that we want to do there that I'm not gonna preview quite yet,
but watch this space over the next several months.
There's a lot of exciting things going on there.
And it taps very much into our open source roots, right?
Where from day one,
there's been a Neo4j Community Edition
that you can get up and running with completely for free,
right, and build
production applications and all that kind of stuff, right? So that's the first tier. And then the
second one is Aura Enterprise. And this is really targeting, you know, as you can hear in the name,
like the deep mission-critical enterprise deployments. And that one went GA just a few months ago in Q1 of 2021.
It's been an amazing success, right?
And so we've had companies like Levi Strauss, right?
They're public customers of Aura Enterprise,
Worldline for all the ticketing and the underground
and the subway, I guess is the term, in London, Adio, the Orchid.
We have a bunch of big global 2000 customers already up and running on Aura Enterprise.
And we then have this, and we're in GA on GCP, the Google Cloud Platform, and we are
in early access program as we're recording this in May of 2021 on AWS, right?
And so that one should go GA very soon,
probably by the time this podcast is released.
And so really excited about that one.
And then we have this really close partnership
with Google, as you mentioned,
where when Thomas Kurian took over
as the new CEO of GCP, he said, okay, we want to be an enterprise focused, but the most partner
and open source friendly cloud out there. And what he did is said, look, I want to build the
best of breed platform. And rather than trying to build my own graph database,
which some of the other cloud providers have attempted,
I want to partner with the best graph database I can find.
And so they chose six, I think, or maybe seven
top tier open source partners
amongst a variety of different product categories
like Elastic, for example, Mongo,
and Neo4j for graphs. And we're really
excited about that one. It's a very unique partnership where there are, and it includes
this what's called left-nav console integration. So that's the equivalent of when you log on to
GCP onto the GCP console is what it's called, where you choose which services you want to use.
Then Neo4j sits side by side with, for example,
Google BigQuery or Google Spanner,
like the in-house built Google products. And Neo4j Aura sits right next to them,
which is a very unique thing.
And then there's a unified billing.
So you just, you get the same bill
and consolidated support. And the Google field,
so the entire customer-facing organization, is deeply incentivized to sell Neo4j.
So they get as much money from selling Neo4j as they make from selling a Google product.
It's extremely unusual.
It's a very, very deep partnership that we're super excited about.
Cool. Yeah. Indeed, that sounds quite a privilege to have Google as your partner and at this level
of partnership that you just mentioned. And I'm glad to hear that it's working out for you.
So as far as the second pillar goes, so developers and how
you serve them basically, you briefly mentioned GraphQL previously and that was something I was
going to ask you about as well. I know that you actually quite recently I think also released a new version of your GraphQL integration.
And this is another area which is,
which I'm seeing huge traction in GraphQL adoption
and GraphQL for databases, even more specifically, actually.
So even though it's kind of a misnomer,
so when most people hear GraphQL,
they think it's a graph query language, which it's not really,
but well, it does,
it does leverage kind of a graph structure underneath.
So it's basically a query language, or a meta query language
if you will, for APIs.
And so I wonder if you're also seeing this traction,
if you're also seeing developers wanting to use that to access Neo4j, even though you do have your own query language.
I wonder if you also have a segment that prefers to go through GraphQL and then what you do
for them, basically.
So what does your GraphQL integration entail?
Yeah, it's a super fascinating topic, right?
And you're right to call out, there's a lot of confusion in here.
You know, GraphQL, and some would argue, myself included, isn't a graph query language, like
you just said.
But it's called GraphQL, which doesn't help.
And then we have GQL, which maybe we'll talk about a little bit later if we have time,
right?
And which is the soon-to-be-standardized
graph query language.
And so there's just a lot of terminology and confusion.
We got very excited about GraphQL
the moment that it launched.
At the time, it was deeply coupled
with an emerging front-end developer-centric stack
from Facebook, right?
It was part of React and Relay.
There's a bunch of technologies that were coupled, cobbled, maybe even together when it got released by Facebook.
But that seemed super fascinating and really high potential.
And so we were very early in launching an integration. This was run out of our developer relations team,
our DevRel Labs team,
which wrote this integration completely on top of Neo4j
and formed the grand stack, right?
Which is GraphQL, Apollo, and the Neo4j React
and the Neo4j database,
which forms like an easy to get started with a stack for using those technologies in a kind of a best practices integrated pre integrated way.
We saw a huge amount of traction and interest in this, both just broadly speaking in the community but also in the customer base. And so what we did more recently is that last year we took a step back and said, all right, we did this kind of as a first attempt while the GraphQL universe was rapidly evolving, right?
We tried to evolve with it.
Now we've taken the lessons learned from that, taken a step back and built what we released a couple of months ago. So I think maybe late Q1, early Q2,
2021, something like that. So just a few months ago as we're recording this, the Neo4j GraphQL
library. And really kind of our point of view of this is very much like you said, that it isn't a
graph query language. Technically speaking, it isn't, right? So trying to embed that too deeply in the database
is a mistake.
That's not gonna work.
Like you're gonna have to extend it somehow
to start doing real database query language things,
at which point it will no longer be GraphQL.
It will be something else.
So then it just becomes,
just adds to the confusion of calling that GraphQL
then, right? So we felt that that was not how we thought about the universe, right?
And so then instead, what we believe is that it's a really powerful way, if you're a front-end
developer, to tap into the backend in, I guess I just used the word, in a standardized way, or in a consistent way, right?
Which is rather than having to invoke a bunch of specialized custom rest
endpoints,
you have this way of interacting through GraphQL with the backend,
which front-end developers love.
And it gets them to think about graphs, both maybe the data structure,
but even just the name, like those two things in combination, right?
It gets them to think about graphs.
And so all of a sudden, we have a push for graphs from the front end layer of the stack down, which for us has been in the graph space for a long time, we've never seen before.
So that's kind of one area that we really like about GraphQL. And then the other one is maybe meeting it from the other way up, which is it's just a really powerful way for the backend developer who sits there saying, I've written some business logic that I want to expose to the front end, right?
And frequently these are separate teams.
Sometimes it's a full stack kind of developer who straddles both worlds, but frequently
it's different teams, right? And then just an easy way for them to take, all right, I've written all
this business logic, which interacts with the database, right? I want to expose that to the
front end, right? And for those people, for them to be able to do that in a really easy way, graphs
all the way down, right? From that business layer, from the API, kind of the northbound facing API
after the front end, all the way down into the database,
graphs top to bottom in the entire stack,
we think is a really powerful way.
So that's why we're excited about it.
It's also just reduces friction for developers
to adopt new technologies,
which is just generally something
that we are very interested and excited about.
Good. And to pick up on that topic, and since we're talking about developers and query
languages and such, you also briefly mentioned GQL previously. So just for people who may not know, it's an acronym that stands for Graph
Query Language, and it's an industry-wide effort to standardize on a graph query language,
actually. And you've been among the people and the companies who started this and are
actively participating. And obviously it's going to be good for the industry at large,
in the same way that SQL has been good for the relational database industry.
And actually, there's a parallel there, as some people have pointed out,
that it's the first query language standardization effort
that the ISO has adopted since SQL, actually.
So that actually, I guess, speaks to your point,
to your earlier point about the wind behind graph. So I have a question in two parts.
Part one, which, you know, let's briefly cover, since I guess you're probably not directly in touch with
the people participating in the standardization
committees: well, how is it coming along, basically, and whether you have a kind of estimate or a
time frame on when it's going to come to fruition. And the second part is your feeling about
the impact it's going to have on the industry at large because since I guess we're also kind
of running out of time I'd like to wrap up with a kind of overview on the market.
So that's a good bridge to do that.
There's a lot to unpack in that one.
From the micro around kind of where the standard is
all the way to the macro on the market.
Maybe let me address it in the opposite order to that.
So I'll talk about the
broader market and then I'll zoom in to specifically where I think we are with GQL today.
So the database market, it's the biggest singular market in all of enterprise software.
Every single thing that we do in our digital daily life, you know, the three of us on this call,
every one of the listeners, right? Everyone who will be reading the article, right?
Everything we do in our daily digital life ultimately ends up in a database somewhere,
right? And it's the biggest market in all of enterprise software, which has been static. It's been stagnant for decades from an innovation perspective. Yes, companies have been built and
they've grown and they've made a lot of money,
but from an innovation perspective,
it's been static up until about 10 years ago
when big data happened, when NoSQL happened, right? And there was this generation of non-relational companies
And there was this generation of non-relational companies
that said, you know what?
The era of the one size fits all database is over, right?
And it was hit by all these trends around the shift to the cloud,
big data, sensor proliferation,
AI and machine learning on the value side of using data.
You add that all up together along with evolution of architectures
with microservices where it was easier to swap out
when you had this big monolithic application
with the singular relational database,
it was much harder to swap it out.
Like all of this came together into this perfect storm,
which gave birth this Cambrian explosion
on new types of databases.
What we're seeing right now is that
there are a few distinct categories that are forming.
And they're much fewer than what we saw 10 years ago, right?
10 years ago, we would think about key value stores and column family with Cassandra and
stuff like that, and then documents with Mongo and things like that. And today, what we see is
there's a broad category forming around what I call document plus plus, right? Here is where Mongo sits.
Here's actually where Redis sits as well.
In reality, they fight for the same slot,
architectural slot in the same projects, right?
And even though the nuances of the slightly different data models
and slightly different pros and cons,
in reality, it's a zero-sum game
in any single project
between, let's say, a Mongo or a
Cassandra and a Redis, right? And so that's the broader document plus plus category, right?
Clearly there's a graph database category emerging and we'll get back to that one.
And then I also think there's this new SQL category, right? And it's smaller and it's
earlier than graphs are, for example, right? But here's where companies like maybe a CockroachDB
or a Yugabyte and folks like that exist.
And then there's definitely something happening
around time series.
Let's see how big it ends up being.
But I think even from a five-year,
10-year out perspective,
time series focused databases are here to last.
So you can really start seeing the landscape,
like, you know, becoming
set in place now. And so what's going to happen over the next three to five years is that
a few generational database companies are going to be crowned. And they're all going to sit in
and be leaders in one of these categories. And maybe there's room for two or three or something
like that, right? But it's not room for 20 database
companies, probably room for five new or seven new or something like that. And then to your point,
we've all watched and tried to learn from the rise of the relational database in the 80s,
right? And of course, a key moment of inflection there was when the vendors came together and said,
you know what,
like, we shouldn't use our own custom query language. Let's choose a standard query language.
It's also called Structured Query Language, SQL, right? And they formed SQL. And that was this
lightning strike that all of a sudden made that space take off, right? And so what's happened
then in the database universe
for several decades is that people then have seen that
and tried to replicate that.
So that happened actually with,
and I'll date myself here,
with object databases in the mid 90s, right?
They tried to say, okay,
let's standardize an object query language.
And they went to the, I guess,
ANSI SQL committee at the time.
I don't think, I think this was pre-ISO. And, you know, hey, let's
standardize that. And they just looked at it. It's like, you know what? We can integrate
this into SQL. And so they did that. And Oracle made it a tiny little feature of Oracle. And
then that entire industry died. And then the early 2000s with XML databases, they tried
the same thing. Ultimately got rejected by the SQL committee. And they said, you know
what? We can add a little bit of XML features into SQL. And then they tried to do their own thing with like XQuery and
XPath together with W3C instead. Okay, so they tried that. And then actually with the document
databases, they had several attempts of doing that as well with Mongo and with N1QL from Couchbase.
And every single time this has happened,
the SQL community said, you know what?
This is just a feature of SQL
with exactly one exception in 40 years,
which is they looked at graph databases
and they said, you know, this is different.
This is distinct.
This is a real category emerging
that deserves to stand on its own
and it deserves its own query language.
And that's where we came together.
Together we're co-leading it with Oracle
and many vendors are participating.
And we're really excited about it.
And we think that, even though the huge signal and input into it was Cypher,
our own proprietary query language,
which we opened up a few years ago to competitors,
we've never felt that we should compete on product surface.
Like, let's all just get together and figure out the best way to query graphs, right?
And then let's compete on who has the best implementation:
fastest, most scalable, easiest to use, you know, that kind of stuff,
looking exactly at what happened to the relational database in the
eighties. Because the real big game here is making the world aware of graph databases.
That's the big game, rather than infighting between, you know, graph database vendors, right?
And so we always had that point of view. And we're really excited about GQL as an opportunity for the
overall category. Now, in terms of, I tried to back into,
so kind of the micro part of your question then,
the standards universe, like kind of by design and necessity,
this isn't overnight type stuff
because this is multiple people trying to come together,
figuring out what's the common denominator,
what are areas
that are still rapidly evolving or where we have stark differences between the various vendors'
implementations, right? Which areas do we have in common, right? And that takes a while,
right? And so you actually want it to be slow moving on some level, right? And so we've been
at this for a long time. You know, we open sourced, opened up Cypher in 2016, I think, maybe in 2015.
So we've been at this for a long time, which then culminated in the GQL project several years ago, right?
My hope is that by 2022, we're going to see a final approved first version of the spec, not a draft, a non-draft version of it, probably Q4,
but more realistically, it might slip into 2023.
But those are kind of the sort of the timelines that,
at least from my vantage point that I'm seeing, you know,
on the GQL front.
It's not a bad timeline, actually, I would say.
I mean, considering the novelty
and the complexity of the whole enterprise,
it's not bad.
So I think we're almost out of time.
I don't know if you have time
for one last forward-looking question,
which has to do with hardware, actually.
Because, well, this is another kind of pet interest of mine, keeping an eye
on, you know, new AI chip architectures and the like.
And one thing that struck me about the majority of the most innovative designs that I see
coming out in this space is that they all kind of emphasize graph structure as well.
And how, you know, they're built to leverage that in a kind of compiler-ish way, I would
almost say.
And so the way I want to tie that to you specifically is, well, I also see a kind of emerging trend,
let's say, in some vendors that actually leverage this custom hardware.
And so I wanted to ask if that's something
you're planning to do as well,
and what you think of this space in general.
Yeah, it's a super interesting topic.
And as a, I now have to say,
post-technical person, as an ex-engineer
who used to write Verilog code,
that was very long ago, right?
And things like that, you know, for custom
chips back in, you know, back in college. I find it technically very, very fascinating,
right? But I think the, I guess the perspective that I have on it for the graph space, I think
is two things, right? One observation is the one that you just made,
which is, man, all these custom specialized,
super optimized hardware architectures that are emerging,
maybe not all of them,
but the vast majority are highlighting graphs
as an important workload for them.
And I think this comes back to the whole observation
we did before that,
man, if you're not using relationships as a way of doing your predictions, then your competitors
will, and that's going to leave you on the sidelines, right? So there's a huge just,
you know, area of focus and interest, you know, around this, around graphs in general,
which is driving this. So that's obviously a great observation,
right? So that's kind of one on the one hand. On the other hand then, Neo4j is a mass market
database, right? Like we have anywhere between 10 to 100 times deeper and more adoption than
other graph databases that we look at, right? It's used by hundreds of thousands,
if not millions of developers and data scientists out there.
And in order to truly be useful to that scale, right?
It has to be available in one of the cloud platforms.
And so while my inner geek loves that these come to market,
and I look at them and they're super
fascinating and do some really cool things, right, the way that I think about it from a company
perspective is that it's really only worth doing for us, like kind of for real, once it becomes
available kind of off the shelf, as it were, as part of the cloud platforms. Before then, it's not really
a thing. And that's exactly how it happened with SSDs for us, for example, which is, you know,
whatever, 10 years ago, just such an amazing technology for graph database storage engines.
For those of us who have a native graph architecture, right? It was amazing,
but it really wasn't available if it wasn't available on AWS.
And it's the same thing going on here.
And so those are kind of the two perspectives
that we try to balance internally
in terms of just time investment
on some of these new exciting types of hardwares.
Okay, it makes sense.
And just to tie that back into what we spoke about earlier about your
cloud deployment Aura, and how you started with Google and now you're about to go GA on Amazon
as well. I guess Azure is the next logical step for you to take. And maybe that's where you're
going to meet with Graphcore, one of those vendors who actually
has a deal with Azure.
And so maybe that's the confluence
of how you meet Graph hardware as well.
Yeah, maybe that's what happens.
Yeah, that's definitely the next one in the pipeline for us.
And Graphcore, I think in particular,
there are several things that I love about Graphcore,
you know, the architecture, the way they've approached it.
And let's not forget the name.
The name is also just a good...
So between kind of GraphQL on the front end in the middle layer, like GQL, Cypher, and
a graph database as the back end and running on top of something like Graphcore,
then we truly have graphs in the entire stack,
which is an interesting thought experiment.
Let's see how it unfolds.
I hope you enjoyed the podcast.
If you like my work,
you can follow Linked Data Orchestration
on Twitter, LinkedIn, and Facebook.