Orchestrate all the Things - Neo4j's roadmap in 2023: Cloud, Graph Data Science, Large Language Models and Knowledge Graphs. Featuring Neo4j CPO Sudhir Hasbe
Episode Date: July 6, 2023
Neo4j recently announced new product features in collaboration with Google, as well as a new Chief Product Officer coming from Google: Sudhir Hasbe. We caught up to discuss what the future holds... for Neo4j as well as the broader graph database space. Article published on Orchestrate all the Things.
Transcript
Welcome to Orchestrate All the Things.
I'm George Anadiotis and we'll be connecting the dots together.
Stories about technology, data, AI and media
and how they flow into each other, shaping our lives.
Neo4j recently announced new product features in collaboration with Google
as well as a new Chief Product Officer coming from Google, Sudhir Hasbe.
We caught up to discuss what the future holds for the graph database
and the greater
trends in this space. I hope you will enjoy the podcast. If you like my work, you can follow
Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
So hello everyone, and hi George.
I'm Sudhir Hasbe. I'm the Chief Product Officer at Neo4j. I live in Seattle and I joined Neo4j in April. So I've been here for
almost three months now. I've spent a lot of time meeting with customers, learning from our
customer-facing teams, as well as learning a lot about the product and experiences that our
customers have through our own team in product
management and engineering.
Prior to joining Neo4j, I was Senior Director of Product Management for all of data analytics
at Google Cloud.
So I ran all the analytics services.
There are eight to 10 of those that we offer, including BigQuery, which is one of the
largest data analytics services in the industry. I did that for five and a half years. And prior
to that, I was VP of Engineering at Zulily. And then before that, at Microsoft, I was part of
product and program management in Xbox as well as Azure. So that's briefly about me. I'm based in Seattle and I love clouds and
everything around cloud technology as well as the cloudy weather.
Yeah, I was going to say, well, I think Seattle is a good place for
someone to be based at if you love clouds.
Yes, that's exactly right.
Well, another thing I noticed actually just quickly going through your background, basically,
is that, well, you have some interesting hops, let's say, in your career.
I think you started actually more as an engineer, but then you quickly switched to product development,
product manager, whatever you want to call it.
And, well, you have a stint at Microsoft and then Google.
And actually, it's the last one that I find quite interesting
because I'm kind of guessing that this is,
it's through that last stint of yours that you came to know Neo4j.
And this is what led you to transition to actually working for them.
So to give people who may be listening a little bit of background, I know that Neo4j was along
the batch of companies that, open source companies actually, that made a deal with Google back
in 2019.
And that was kind of big news at the time, and for good reason, because, you know, at the time
the whole open source and commercial open source thing was a big deal, and whether, you
know, cloud providers were profiteering, let's say, off of open source projects and companies, and
Google kind of made the difference by making a move. So obviously, the rest of the world, we don't know the exact details of the deal, but the
general, the spirit of the agreement was basically, well, we're going to give them a fair share
of the profits since we're using their products anyway.
And besides that, the financial part, let's say of the arrangement, it also entailed sort
of close integrations.
So knowing all that, I'm guessing that this is what brought you in touch with the wonderful
world of Graph and by extension Neo4j.
And I'm guessing that that led you to eventually joining the company.
Yeah, and I can give you a little more background.
First of all, on the Google Cloud side, we of course had this strategy since Thomas came in to lead the team in, I think it was end of 2018, early 2019. It was about partnering with open source companies rather than competing heavily. This included Neo4j, but also Confluent and MongoDB
and various other partners in this realm, like Elastic. And so we always went ahead and partnered really
well with these organizations. So I got an opportunity. I was actually the exec sponsor
for Confluent throughout my time there. And I always tried to go ahead and make sure that Confluent got the right
positioning and messaging among all the messaging services. We had our own service, but we always
made sure that customers had the choice to pick the right service, especially if they preferred
open source. So that is absolutely, I agree with you. That was a strategy.
Giving you a little bit background on myself and graphs, like I first came in touch with graph technology when I was at Xbox and we were trying to build some of these features around social discovery and what that would look like and all of that.
Like, hey, if we knew your social graph, could we go ahead and do better recommendations for games and stuff like that? So it was nascent in those days.
We didn't have Neo4j-like technology, but we were basically trying to
build our own graph from big data solutions. And it was challenging, because it was not
that easy to go ahead and traverse graphs in,
like, a big data technology. Think about Hadoop and think about running something like that. It was
not Hadoop at Microsoft; of course, Microsoft had its own internal technology that we used to use.
So then I transitioned from there to Zulily. And Zulily is a consumer technology company. It's a marketplace.
We had more than 18,000 vendors trying to publish new products every day. They lasted for only three
days. And so it's a rotating catalog problem. And we experimented at that time with Neo4j. This was
early days. We used the open source tech in a bunch of hackathons. And so we never put anything in production at that time.
We were like just experimenting.
And so I had been in, I had looked at the technology.
Then I traversed and landed at Google.
And we were at that time, like, you know,
working with one of the open source partners in the space
that built on top of Bigtable
and tried to go ahead and build a graph technology on top of it.
That was JanusGraph, I think, that ran on top of Bigtable,
and we were, like, partnering with them.
I didn't run the database side of the house.
I ran the analytic side, but I used to talk to my counterparts
on the database side and understand the use cases and all. And so in 2018, I wrote
this paper in Google on what is my long-term vision for data analytics in Google Cloud.
And I had this view that in the long-term, organizations will need different kinds of
compute engines running on the data that they have collected to get value from it.
And of course, it included things like, how do you do search analytics? How do you do
large scale big data analytics? How do you do artificial intelligence through the same thing?
And as part of that was also graph analytics and how graphs will come together. And internally,
we had this term called big graph, which was BigQuery plus graphs and what that would look like.
And we partnered with Neo4j as well as a couple of others in the industry to go ahead and make that vision happen.
And in the last year or so of my being there, six months to a year, we started focusing a lot on that partnership and working with folks in that space.
Because we saw our customers like
Wayfair and all trying to use some of these technologies there. And so I think that was
pretty profound. And so I knew there was value in like, you know, big data systems have this
ability for you to collect massive amounts of data. And we have large systems like BigQuery,
they're really good columnar databases,
which means you can do aggregates really well. That's the technology, like from relational
database, which is row-based, you go to columnar, which is like column level storage, and they're
really good at storing and aggregating the whole column data and sub-aggregates and all.
What they don't have is the relationships between entities,
which means, if I know Sudhir likes products and electronic products, like I buy a lot of
laptops and such, then from there, what do I like? So that level of relationship connections is
harder, because you're storing everything as columns and not as, like,
entities. And so when I saw those use cases become really interesting, and how we could add
more value, that actually was eye-opening. And then, when I started looking, after five and a half
years, at what I wanted to do next, with the advent of AI and Gen AI, the value that graphs could add in the space in general to customers, with data that we had already collected and consolidated,
I think it made a lot of sense for me to go to a graph-based company.
And of course, then the question was: Neo4j is the leader in that. I started talking to Emil and it just worked out.
So that's my little brief
history with graphs over time. I won't say I'm an expert in graphs yet, but I'm learning really
fast and understanding how customers are using it and where they can use it. And I met with,
I don't know, 25, 30 plus customers over the last couple of months and understood their use cases.
Well, you've been most productive, I would say,
taking into account you've only been in your new role for three months.
That's like 10 customers per month.
So not bad.
Interesting, interesting backstory.
And well, as you may know, I've also been into Graph for, well,
longer than I'd care to admit, actually.
But throughout that time, I've always been wondering, and I'm not the only one, let me tell you that.
So how come Google never really officially, let's say, got into that space?
Well, I know there have been some isolated, more or less, endeavors, let's say, here and there. So I remember at some point,
like a year ago or something, when there was this initiative about knowledge graphs and how
they were used internally to power some features. But I think that's, to the best of my knowledge,
that's as far as it went. So it was interesting for me to hear this big graph thing that you
said is going on. So do you have any idea if this is still going on, or where this may be going?
I think, yeah, so just to be clear, Google does have a massive knowledge graph that powers
search and all, and there are public documents on that. On the Google Cloud side, our strategy was to focus on the core big services like BigQuery,
and the big graph was partnership with Neo4j and others in the industry to provide the graph capability natively integrated with it.
We announced this just before I left.
My last day was, I think, March 30th or so.
And I think the day before or something, we announced the actual solution between both Neo4j and BigQuery and that integration.
So that's how we thought about it.
There are various business reasons we have evaluated graphs for, and should we go into the graph business or not,
but I think it won't be fair for me to talk about that in this forum. But yeah, I think internally we
do have knowledge graphs, and we do use those for some of the use cases, like KYC and all, which
the Vertex team is building and working on. But providing graph as a database technology, I don't think
there is, like, we didn't plan for that yet.
So maybe things will change in future.
I don't know now, but yeah, there's always interest in graph technology in various works.
Yeah, well, obviously, I don't know as much as you do about how Google thinks about it. But, you know, as an outsider, my take would be,
well, probably if Google wanted to do that,
they would have done it by now.
So it doesn't seem very likely that they would come out
and release their own graph database solution
or anything of the kind.
Probably not.
Okay, but well, since you talked about,
it sounds like almost the last thing
you signed off on, let's say, before you left Google.
But interestingly, one of the first things to come out of Neo4j, since you also joined,
also has like a Google flavor to it.
And it actually has to do with the latest announcement that I know of, at least from Neo4j.
It has to do precisely with features that are about integrating Google's large language model capability, and Vertex AI in general,
let's say, with graph-specific features.
So would you like to just quickly take us through the features that you just announced?
And after you do that, let's get a little bit into the specifics of each one.
Yeah.
So first of all, yeah, like a couple of weeks back, we announced all the integrations and
use cases that we are focusing on with Vertex AI.
Just to be clear, we have those integrations with Vertex AI
and also with OpenAI and the Microsoft version of OpenAI services.
So we are integrating with all the LLM platforms,
and we are also working with AWS team to integrate those with Bedrock
and other technologies that they are going to come out with.
So first of all, we are like, you know, cross cloud
and we run on all the three clouds
and just wanted to highlight that.
And so specifically about LLMs
and like talking about Vertex AI, right?
I think the whole, like, you know,
in the last five to six months,
the world has really significantly shifted
with the advent of ChatGPT, and then announcements from Vertex AI, and large language model innovations that
are happening.
And one of the main things, and I was involved in some of that technology, especially from
an analytics perspective, how we would use it in Google.
So I knew coming in what was happening and how fast this industry was
moving. And one of the things that we believe is a challenge in large language models is,
so if you think about large language models, they were actually built for understanding language and
grammar and how you build sentences. And it's like, if you read the Transformers paper, it's about attention. Like, the whole theory
is how likely is the next word after this word, what's the probability of the next word being
the word that we are generating, and all that. And so it's great for creative generation of
content; it is a challenge when you're looking at factual data, and it doesn't have the facts with it in many
cases, especially the enterprise domain knowledge that exists in enterprises, which is not available
in that. So our partnership with Vertex AI and Google Cloud, as well as others in the industry
is how can we bring this enterprise knowledge back into the large language models and how
can we enable both of them to work together to benefit from each other.
So that's the main thing.
There are five use cases that we have identified that we believe are going to be really valuable
for customers.
And we have various different integrations that we have done in the product in that. First of all, the most interesting use case that most of our customers are talking about,
I was talking to a large pharmaceutical company. They have a knowledge graph that they have built.
It's being used by 300-odd people in the organization through Bloom, which is our
visual graph analytics tool, but they would love to make it accessible to many more users. But these users
are business users, they are not technical users. So their thing is, a natural language experience for
working with the knowledge graph is actually going to help the whole organization to go ahead and
benefit from it, rather than just the
limited people who understand graphs and all. So I think that's one: natural language
integration. In this case, we can use the large language model to generate Cypher. We
can go ahead and train the model, fine-tune it, as well as use prompt engineering, to get to that. So
that's one. The second one is: we can build the knowledge graph
using large language models. So there are organizations, there's a large oil and gas
deal that we did just recently. They basically have a lot of technical manuals that go on the
oil and gas rigs and all; all of these are unstructured data. They have structured data
there too. So building a knowledge graph
that can embed both structured data as well as contents from the unstructured data within it, and building
the knowledge graph from your unstructured data, is the second one. And large language models are
great at extracting and generating entities out of it. The third one that we have is basically enriching the knowledge graph based on large language
models.
So let's say you already have some content, let's say some text that was created
or generated with the customer service reps.
You can use summarization functions.
You can do various of these, like sentiment analysis and all of that.
And we can, in real time,
enrich all the data that you have with large language models, using real-time
enrichment function calls that you can do.
The last one, which is more interesting, is... actually, there are two of them,
but I will touch on one first. One of them is grounding. So, you know,
we always hear about the hallucination impact of large language models. Where we can help with knowledge graphs is, once you have actually run your analysis, or if you ask the question to Vertex AI or ChatGPT, once a response comes, we can validate that response against the knowledge graph. So we have
integrated Neo4j with LangChain. So you can go ahead and integrate that in the pipelines that
you have. And then finally, embeddings. I think it's super expensive to run searches in large
language models. There is a really good paper from Anyscale
on the cost difference, primarily on the OpenAI side,
between GPT-4 and GPT-3.5.
And it just tells you that it's much better
to store vector embeddings
in your database and then do searches
with cosine-similarity-like functions
rather than doing it in large language models.
So we are enabling that use case too.
So those are five use cases that we are looking at,
and we have done different integrations, from some APOC functions that we have
built, to the LangChain integration, to integration with the embeddings API that we
have, and all that.
So that's the list of five use cases that we have enabled with the Google team.
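The first of those use cases, natural language to Cypher, is at heart a prompting problem: show the model the graph schema, show it the question, and ask for a query. The sketch below is only illustrative; the schema, the prompt wording, and the `call_llm` stub are assumptions, not Neo4j's actual integration (which, per the discussion, goes through LangChain and fine-tuning as well):

```python
def build_cypher_prompt(schema: str, question: str) -> str:
    """Assemble a prompt asking an LLM to translate a question into Cypher.

    `schema` is a plain-text description of node labels, relationship types,
    and properties. The wording here is illustrative only.
    """
    return (
        "You are a Cypher expert. Given this graph schema:\n"
        f"{schema}\n"
        "Write a single Cypher query that answers the question below.\n"
        "Return only the query, no explanation.\n"
        f"Question: {question}\n"
    )


# Hypothetical schema for a retail knowledge graph.
SCHEMA = (
    "(:Customer {name})-[:BOUGHT]->(:Product {name, category})\n"
    "(:Product)-[:SIMILAR_TO]->(:Product)"
)

prompt = build_cypher_prompt(SCHEMA, "Which products did Sudhir buy?")

# In a real pipeline, this prompt would be sent to Vertex AI or OpenAI, and
# the returned Cypher run against Neo4j, along the lines of:
#   cypher = call_llm(prompt)        # hypothetical LLM call
#   records = driver.execute_query(cypher)
print(prompt)
```

In practice, few-shot question/Cypher pairs are usually appended to such a prompt, which is the "train the model, fine-tune it, as well as prompt engineering" combination described above.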
Yeah.
Thanks.
Thanks for listing those.
And well, I have to say, so one of the things I do is, well, I have this newsletter that
I run from time to time.
And well, it was in hibernation for a long time, actually. Well, due to a number of reasons.
So recently, I just put out the last issue.
And obviously, one of the overarching themes in what's going on in the graph world is what
you just talked about, basically, this whole idea of how do we use large language models
and graphs together.
And just looking at it, there's like a myriad of ways that people are thinking of doing
that and different use cases they're exploring, but it looks like all of them fall under one
of three categories. So there's the use cases that are about, well, how can we use large language
models to build new knowledge graphs? There's the ones about how can we use a large language model
to add to an already existing knowledge graph.
And then there's the whole using large language model
as an interface to just accessing knowledge graphs.
And I think everything that you talked about
falls under one of these three categories, basically.
Yeah, that's correct.
Like the first one, I would just expand a bit,
by saying you can generate knowledge graphs
and also enrich knowledge graphs
with additional context and information
that large language models can give you.
But you're right.
You're either building a knowledge graph,
training the LLMs using the knowledge graph information so that that becomes better, or using a natural language to engage with knowledge graphs.
So those are roughly three use cases. You're right.
Yeah. So you also mentioned that these integrations that you announced with Google Vertex AI, you're going to be working on similar integrations
with other cloud providers as well.
So I guess that means we should expect announcements
to be coming out soon as well.
Yeah, you will see that.
And the integration with OpenAI stack is already done.
We are working with like AWS team
to go out and integrate with them too.
And so, yeah, you will see quite a few more announcements coming in next few months.
The one additional thing I do want to say is, it is an evolving space.
So one of the things we did was, actually, me and Emil, our CEO, when we started talking about
the whole generative AI space, what we did was, instead of just going and building just some
product integrations and all, we also formed this internal team. We call it NaLLM, but
it's an LLM team, a focused team, which is, like, we pulled a bunch of engineering
folks and product folks, a cross-functional team, into a small working group. We dedicated more than 75% of their time to doing research
and actually building components and publishing blogs
and technical write-ups on it.
Because, as we all know, this is such a fast-evolving space
that we wanted to make sure we are, you know, ahead of what is coming and what's
happening, and we can educate our customer base as well as everybody else in the industry.
We have got a really tremendous response to some of the initial ones; I think we have posted three blog
posts right now on the technical details of how to think about it, how to engage and how to integrate with these
technologies, and we'll continue doing that. So I think the more important thing here is,
we are in the first, like, you know, innings with Gen AI still, right? Like,
there's a lot of innovation to come, a lot of evolutions will come. And so I think just working
as a community, especially on the graph side, would be super helpful.
And I think it's not just Neo4j,
but hopefully others in the industry also partner along with us as well as our customers to go ahead and evolve
where we can leverage graphs with LLMs in a more efficient way.
So yeah, we are learning and we are hoping to continue learning
how to evolve it.
Yeah, actually, this is something
I wanted to ask you about
because, well, it's great
that you have these
tightly knit integrations
with Google and OpenAI.
But I was thinking, well,
if I was a product manager at Neo4j
or, well, whatever other vendor
for that matter,
well, definitely I would like to have
these types of integrations,
but I would also definitely like
to have that kind of feature
for my standalone product
or if people want to run on-prem.
So these integrations,
I'm guessing that they're only valid for Aura.
So if I want to run like an on-prem version,
then well, what do I do?
And I guess this research
and development team that you just referred to is the answer. So you're sort of trying to
build that into the core product, right?
Yeah. So just one thing to be clear:
all the integrations we are doing are available both for Aura and self-managed instances. So they run on both.
For example, our embedding callbacks,
like for enrichment, work through direct calls,
so you can actually use them in either place.
Some of the stuff that we are doing with NeoDash,
the dashboarding tool that's available to anybody,
wherever you're running, you do need access to Vertex AI,
but it doesn't have to be Aura only.
It's not just fully managed instance,
but also self-managed instances and all.
And many of our large customers deploy it in the cloud,
but they self-manage it.
They don't want to use the fully managed instances here
because they want it in their own VPC control.
They want to control the environment and all.
So yeah, it will work with both our self-managed product,
which can be in the cloud or on-prem,
and the fully managed version.
So yeah, I think that's our thing.
We may do a few things in coming months
which will first go into the Aura product,
because it's much easier to experiment and launch and learn in the cloud with a fully managed offering,
because we control the environment before we actually make it available in self-managed deployment for customers.
So we may lead with Aura in many cases now, but we will make sure the capabilities are available
in our self-managed option.
Well, another thing that I would have in my roadmap if I was a product manager would be
actually integrating with open source large language models.
And the reason for that is, well, you know, there's a number of reasons, actually.
But thinking from the enterprise side of things, I know there's lots of concern about, well,
having your data exposed or shared in whatever way with OpenAI or Microsoft or Google or
whatever other vendor is behind running those large language models.
So I'm pretty sure that that's a concern you're also familiar with. So to address that
concern, probably the answer is either enabling clients that already have figured out how to run
their own large language models, or probably offering just one out of the box for the ones
that haven't figured that out. I think that is a good point. And what we did was, as we started working on the Gen AI stuff, we actually partnered with 12 or so customers. And we started understanding their requirements. And so they explicitly were very clear to us
that we will not use anything
that's in the public forum, anything.
They can't even,
some of them can't even get to the OpenAI website,
forget about using the models.
And so we are partnering with them
to figure out how they would like to build an LLM model
based on all of their technical details that they have.
Like that will be like more of a custom model that they will like, you know, train as well as run.
We still have to figure out that portion about how we will integrate with something that is running on premises in a completely cordoned off environment.
What do the APIs look like? What does the engagement and integration model
look like, and all that?
So that one, I think it's easier to integrate
with like a standardized API from OpenAI,
Azure's OpenAI version,
or like Google Cloud or AWS
because they're public APIs.
On the self-managed one,
we are still trying to figure out
what the infrastructure would look like,
what the integration model will look like.
So still more learnings to do.
We are, of course, committed to helping our customers
that want to deploy the knowledge graph
as well as the large language model
in their environment and all that.
We just have to figure out
how that engagement and integration would look like.
But we're partnering with some of the largest companies
in the world to learn from them
and identify what that integration would look like.
Yeah.
So there's another part of this whole landscape, let's say,
which you also mentioned previously
that has to do with embeddings.
And I know this is something that Neo4j has actually supported for a while.
I'm not sure exactly how long, but well, at least since the whole data science product came out,
which is like a couple of years already.
And I'm seeing more and more, not necessarily graph database,
but database vendors in general who are adding these capabilities precisely because
of, well, the reasoning you referred to earlier. So it makes lots of sense if you want to integrate
with a large language model in some way, or if you want to have a sort of exchange, let's say,
with that model to be able to offer that capability. So I was wondering, like I said,
I know that Neo4j has offered that
capability for a while. And I think actually it has like both node embeddings and graph embeddings.
So I wonder what's the status there. I know that there's a few algorithms that are supported out
of the box. So is it under active development? Do people use it? And what's the roadmap for those features?
So you're absolutely right.
Let me touch upon a couple of things, and then I will come on the general industry too.
So first of all, we have been supporting embeddings in the database and running algorithms on top of them for the last couple of years. And we have tons of customers; almost every customer that I have spoken to in the last two to three months
are using some shape or form of our data science capability.
So whether that's PageRank, to running similarities based on the node
embeddings, and all of that.
So cosine similarity functions and all that stuff.
So almost every customer is using that advanced
data science library to go ahead and run algorithms. And it's not just data science
teams. It's also the developers that build applications. Now they want to go ahead and
become smarter in a few use cases. They use the library, whether centrality, whether it is
different functions, but also the embedding capabilities and all. So that's one thing.
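To make the algorithm side concrete, here is the core of PageRank, one of the algorithms mentioned, run on a toy directed graph in plain Python. This is a sketch of the underlying power iteration only; in practice you would call the Graph Data Science library rather than hand-roll it, and the toy graph is invented for illustration:

```python
def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over a list of (src, dst) directed edges."""
    nodes = {n for e in edges for n in e}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node keeps the teleport share, plus what its in-neighbors send.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            for d in out[n]:
                new[d] += damping * rank[n] / len(out[n])
        rank = new
    return rank


# A tiny citation-style graph: A, B, C form a cycle, and D also points at A,
# so A collects the most incoming rank.
ranks = pagerank([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])
top = max(ranks, key=ranks.get)  # -> "A"
```

The point of running this inside the database, rather than exporting the graph, is exactly the developer-facing convenience described above: the algorithm is just another call over data that is already stored as relationships.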
Especially in the large language model, there is currently this massive interest in vector data type and vector databases,
and what does that look like?
And the fundamental thing there is, you know, large language models are also great at vectorizing information.
And the problem with large language models,
at least that's what is coming out right now, is if you use them for searches on top of every time,
on top of the data that you have, it is going to be more expensive. So the theory is
using vectorized information from large language model, storing it in your store,
and then running cosine similarity like functions
is a much more cost-effective way for running algorithms
than trying to put all your enterprise data
back into a large language model, fine-tune it.
There are limitations on fine-tuning,
like, you know, how many tokens you can use and all.
And on top of that, running searches.
And most likely, it's anyway going to hallucinate and give you some wrong information.
So, again, you have to validate that answer.
So, therefore, I think there's massive interest in storing vector data and vector embeddings primarily and running cosine similarity functions.
So, we provide that capability today.
But we realize we can do much better on
that. So I think one of the things we are working on is vector indexes. So today, if you store the
vector information internally, you run cosine similarity, that works really well. We can do
that in memory with the data science libraries and all of that. But we want to actually make it
more native. So we are looking at enabling vector indexes directly on top of our storage tier.
So that will make it much, much better. So that is coming. Like a
good product manager, I won't give a date at this event, but sometime in the next few months, you should see
support for vector indexes, which makes the
whole system way more efficient for vector searches. Just one more thing I want to add is,
like, you know, I've talked to a bunch of customers and this conversation has come up about,
do I need a vector database to go ahead and store vectors, or do I need a knowledge graph, and does one replace the
other, and all that. I think a lot of times things depend on what use case you're enabling. And so
vector data types and vector databases are great for one use case, which is search, right?
Like, if you wanted to do natural-language-like searches, I think it's a great use case and solution for that.
And if you wanted to just replace your current search technology with that kind of a model,
I think that may be possible. But in most cases, what happens, talking to customers, is
they have some structured information along with the unstructured searches on natural language,
right? For example, if you wanted to find, tell me what products are similar to, let's say,
white shirts that have pockets. Let's say somebody searches that I'm looking for white shirts with
pockets. Normally what you want to do is, oh, you want to filter it down to shirts
and then look for all the descriptions that are there, which are vectorized and say,
hey, in descriptions, do you have white shirts with pockets? Because that will more
likely be the right answer. And so you want to do some level of structured filtering
and apply searches on top of only those entities, because otherwise that will not be right in many
cases. So I think the advantage of knowledge graphs and the way you can leverage them is
you have these structured relationships, which are explicit relationships, and then you can have the
implicit searches that are running on top of it. And the combination is what I think is much more powerful
than just purely implicit search that you can do.
Of course, if you just need that, you can use that.
But I'm saying like a combination is way more powerful
for most customer use cases that I'm seeing,
at least right now.
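The white-shirts-with-pockets example boils down to a filter-then-search pattern: use the explicit, structured attributes to narrow the candidates, then run similarity only over the survivors. A minimal sketch in plain Python, with a hypothetical catalog and made-up two-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical catalog: explicit structured attributes plus an embedding
# of each free-text description (2-d vectors just for illustration).
products = [
    {"name": "white oxford shirt", "category": "shirt", "embedding": [0.9, 0.3]},
    {"name": "white shirt with chest pockets", "category": "shirt", "embedding": [0.95, 0.9]},
    {"name": "white canvas tote with pockets", "category": "bag", "embedding": [0.9, 0.95]},
]

def filtered_search(query_vec, category, items):
    """Structured filter first, then similarity ranking only on the survivors."""
    candidates = [p for p in items if p["category"] == category]  # explicit attribute
    candidates.sort(key=lambda p: cosine_similarity(query_vec, p["embedding"]), reverse=True)
    return [p["name"] for p in candidates]

# "white shirts with pockets": filter to shirts, then rank by description similarity.
query = [0.9, 0.85]
print(filtered_search(query, "shirt", products))
```

The tote bag scores higher on raw vector similarity than the plain oxford shirt, but the structured category filter correctly keeps it out of the results, which is the combination being argued for.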
Okay, well, that was basically an answer
to a question I had lined up,
which was going to be, well, in a way,
that situation with many database vendors
adding vector search capabilities
resembles a situation that has happened
and is happening now as well
with many non-graph database vendors
adding graph capabilities on top.
So there's always this debate, let's say, well, should I go for a specialized vendor
that does, I don't know, graph or vector or whatever, or should I stick with my general
purpose database that also has graph or vector on top?
And well, the answer is it depends, basically, as you also said.
I think there's also one more thing, right? Like, I would say,
can you put a graph algorithm on any storage system? You can.
The problem is graphs are all about like, you know,
relationships and traversing relationships.
Can you run a traversal of relationship in a relational database? Yes,
you can. You're doing multi-dimensional recursive SQL queries, right?
And that means it's going to slow you down.
That's why sometimes
it looks nice, but it doesn't actually perform well
in a production environment.
Same thing is going to happen with vectors, right?
If you basically try to take a graph database
and try to just run vectorized searches
and stuff like that,
and if you didn't change the underlying technology
of how you handle vectors,
I think that will be challenged over a period of time,
especially when you do large scale use cases and all.
And this is the exact reason
we are building vector indexes
that will allow us to go into the storage system and store all the vectors in an index form, so that we can then run searches directly from the index rather than traversing through every node hop and trying to figure out the vector search; that's not going to be the right thing.
And this is where the data science library that we have also puts all the data in memory and then allows us to run really fast.
And with the vector indexes, we will do the same. We will have a vectorized index for all the vectors and we can run it really fast.
So I think it's all about two things. One is, you're right: it depends on what your use case is.
And the second is also
how our technology is being built.
Are you natively building something
that optimizes for that from the storage up?
Or are you just putting like an API on top
and just trying to make sure it just works?
So I think that's what it is.
And so we are taking
a bottom-up approach
to how we can really do vectorization, vector storage, and vector searches efficiently across
the whole database. Okay. So let's wrap up with something a bit more high-level or strategic,
let's say. I noticed, again, in the last press release that I saw, which was about this new integration with Google Vertex AI, it was, you know, the messaging was all about knowledge graphs, you know, how to access knowledge graphs, how to build knowledge graphs and so on and so forth.
And this is something that, to the best of my knowledge or memory, I don't think I've seen before
so front and center from Neo4j. At the same time, you probably know that there is
this specific, well, I don't know, module or extension, however you want to call it,
which is called NeoSemantics. So basically, it's a way for people to use, well, semantics and RDF
technology and all of that in conjunction with Neo4j.
And I saw recently that it has been going really well; it's reached like a million downloads
recently. And so I wonder, bringing these two pieces of evidence, let's say together,
does that mean that Neo4j is kind of shifting its messaging or maybe that you get feedback from clients that they're really interested in knowledge graphs?
And so you're following the signal that you get from them?
I think, let me give you a little philosophy that I have and then we can get to the product things, right?
I always love to listen to my customers and figure out what they are
looking for and what helps them. And I think, from what I have noticed since I've been here, and also
the way the industry is moving, there is this genuine new interest in
knowledge graphs. Coming from Google, I've seen the power of knowledge graphs and how they help in a large search kind of environment like the one Google runs. And so I see,
as natural language experiences become more and more usable for customers,
that how organizations can build knowledge graphs and leverage them with these technologies is becoming more interesting.
Now, I would say we are more following what our customers are telling us. And I think if there is more interest in knowledge graphs, in building knowledge graphs and having more semantic
graph capability in Neo4j, we will follow that and we'll absolutely go build more
capabilities to support organizations in that. So I think we will learn as we go. I'm refreshing our whole
product strategy for the next two to three years. And this will be one piece where we will continuously
get our feedback from. We are building our whole customer advisory board where we are adding
some of the largest organizations in the world to get continuous feedback and we will learn from
them and identify what capabilities will make more sense in the product in that realm.
Okay, so let's then wrap up with a little peek, maybe, if possible, into that product
roadmap that you just talked about. Just to
reiterate for people who may not be familiar: I think the main product lines you have up to
this point are the core database, or the fully managed version, then there's
Bloom, the visual exploration module, and also the data science library, which, again, you talked
about quite a bit; it's the one that offers
all these graph algorithms and all that. So do you see new features in the existing
product set, or do you maybe also envision new products? So I actually think
it's a combination of both. So let me first tell you the big themes I have for Neo4j and
where we are going to go invest in and all, right? I think, first of all, one of the biggest things
I've seen is most of our customers, 90 plus percent, are deploying Neo4j in cloud. What I
mean by that is it's either Aura or it is self-managed, but it's all in cloud.
And so I think being cloud first is going to be the big theme for us. And what I mean by being
cloud first is, of course, making sure Aura has all the product capabilities that organizations need,
how do we deliver Bloom so it works, you know, in a browser, and all these tools that you talked
about, in cloud,
how do you enable doing that?
It's going to be a big thing for us.
And so I have this whole comprehensive
Aura-based solution
where users don't have to deploy anything.
They can just run it,
and that should run with your self-managed.
Some of the tools will run with self-managed
as well as in the fully managed database side of it.
So I think that's one.
The second thing in that realm is also how we integrate with the cloud ecosystems, right?
You saw a first preview of what we can do with Vertex AI, OpenAI that we talked about
today, but there are a lot more integrations.
We also did some integration with BigQuery, but I want to focus on Snowflake and Databricks and Azure Fabric and all these
ecosystems that are evolving in cloud and make it very easy for our customers.
Many of our customers are like, hey, for backups and recovery, we just want to use cloud
object storage, instead of doing what we did traditionally with disks and all that
on-premises. So I think being that cloud-first company,
integrating more natively with the cloud ecosystems
is going to be one big theme for us.
The second thing I want to focus on is ease of use.
I think one of the things that I have seen is,
you know, I'm pretty sure more organizations would use graph technology
if we made it easier for them to adopt it. And this is the reason I want to make sure that
anybody who wants to start with a graph technology can do that. I have this theme internally that I
am talking about. It's not from me; one of the leaders at Microsoft I worked with has this thing.
So I'm just picking it from him.
It's like five seconds to sign up and five minutes to wow.
So basically, in five minutes, you should be able to pick the data that you want,
model it into a graph model,
and actually start using Bloom or something like that
to visualize it and start finding value in your data. So just
simplification, from import and modeling to visualization to analytics, which is basically dashboarding and
all of that stuff. We have some experimental tools that we have given our customers, like
NeoDash, which is the dashboarding, and
NeoConverse, which is a chat experience. We are looking at some of those end to end and seeing what more tools will be helpful for organizations to go end to end, and how we make it easy.
So that's the second piece of it. And in that whole context, it's also about how we think about data science. Because I think data
science has two use cases. One is a developer who has built an application. They want to build an
intelligent application. They want to put predictions within their application, like
a supply chain graph wants to run some risk algorithm, and they just want to call a function
to go figure it out. And that's one example. And then there is this whole other example where a data scientist or an analyst
wants to run some algorithm,
figure out what analysis they can do.
But that is more of an ephemeral workload,
which is like a temporary workload.
It's not like running 24-7 as an application.
So enabling those two different types of use cases
for our customers in a very cost-effective way
in the cloud is now possible.
So I think that's the third area, which is like, how do I satisfy those use cases? So those are
the three main things. And of course, I come from the large big data world, and making sure our database
can also support cloud scale matters. It's not just that we are one of the most scalable databases and can do
large read replicas and all of that. I've been talking to large customers from my past,
from Snap to Spotify to Walmart, all the large customers, and the size of data is only
growing in organizations. And so how do we support the cloud scale architecture and cloud scale
capabilities is going to be another
big area of focus for me. So that's it at a very high level. Maybe we do a session sometime in the future,
when I have the real roadmap finalized for the next couple of years, and we can go deeper into that.
Yeah, yeah. I mean, in all fairness, you're new, so I wouldn't expect you to have it all figured
out by now, and it would actually be even a bit concerning if you had everything figured out
already. What I can tell you is that, well,
there's definitely an overlap with the way that Emil thinks:
the emphasis on ease of use, basically.
He's always been saying that, well, it's already easy to use,
but we want to make it even easier.
So I can see why, you know, you do get along, basically.
Yeah, I agree.
I think it's been great working with Emil in the last three months as part of
this team, but I've also been talking to him for more than six months.
He's an inspirational leader.
And I think we have a
lot more opportunities to make our technologies more pervasive across the
globe. So yeah, looking forward to working with him and the broader team. We have an amazing team
here, and not just in product and engineering, but our leadership in the revenue organization,
marketing, finance and all that. It's a great team.
So I'm looking forward to partnering with everybody
to take Neo4j to new heights now.
Great. Well, it's been a pleasure.
And best of luck with your plans.
And well, I'm sure we'll be in touch
as they come to fruition eventually
and you have product announcements
and all that to talk about.
Thank you, George.
Thanks a lot.
For more stories like this,
check the link in bio
and follow Link Data Orchestration.