Orchestrate all the Things - Neo4j's roadmap in 2023: Cloud, Graph Data Science, Large Language Models and Knowledge Graphs. Featuring Neo4j CPO Sudhir Hasbe

Episode Date: July 6, 2023

Neo4j recently announced new product features in collaboration with Google, as well as a new Chief Product Officer coming from Google: Sudhir Hasbe. We caught up to discuss what the future holds... for Neo4j as well as the broader graph database space. Article published on Orchestrate all the Things.

Transcript
Starting point is 00:00:00 Welcome to Orchestrate All the Things. I'm George Anadiotis and we'll be connecting the dots together. Stories about technology, data, AI and media, and how they flow into each other, shaping our lives. Neo4j recently announced new product features in collaboration with Google, as well as a new Chief Product Officer coming from Google: Sudhir Hasbe. We caught up to discuss what the future holds for the graph database space and the greater
Starting point is 00:00:25 trends in this space. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook. So hello everyone, and hi George. I'm Sudhir Hasbe. I'm the Chief Product Officer at Neo4j. I live in Seattle, and I joined Neo4j in April, so I've been here for almost three months now. I've spent a lot of time meeting with customers, learning from our customer-facing teams, as well as learning a lot about the product and the experiences that our customers have through our own teams in product management and engineering. Prior to joining Neo4j, I was Senior Director of Product Management for all of data analytics
Starting point is 00:01:16 at Google Cloud. So I ran all the analytics services. There are eight to 10 of those that we offer, including BigQuery, which is one of the largest data analytics services in the industry. I did that for five and a half years. And prior to that, I was at VP of Engineering at Zulily. And then before that at Microsoft, I was part of product and program management and Xbox as well as Azure. So that's briefly about me. I'm based in Seattle and I love clouds and everything around cloud technology as well as the cloudy weather. Yeah, I was going to say, well, I think Seattle is a good place for
Starting point is 00:01:56 someone to be based at if you love clouds. Yes, that's exactly right. Well, another thing I noticed actually just quickly going through your background, basically, is that, well, you have some interesting hopes, let's say, in your career. I think you started actually more as an engineer, but then you quickly switched to product development, product manager, whatever you want to call it. And, well, you have a stint at Microsoft and then Google. And actually, it's the last one that I find quite interesting
Starting point is 00:02:31 because I'm kind of guessing that this is, it's through that last stint of yours that you came to know Neo4j. And this is what led you to transition to actually working for them. So to give people who may be listening a little bit of background, I know that Neo4j was along the batch of companies that, open source companies actually, that made a deal with Google back in 2019. And that was kind of big news at the time and for good reason because Google you know at the time it was the whole open source and commercial open source thing was was a big deal and whether you
Starting point is 00:03:13 know cloud providers were profiteering let's say off of open source projects and companies and Google kind of made the difference by by making move. So obviously the rest of the world, we don't know the exact details of the deal, but the general, the spirit of the agreement was basically, well, we're going to give them a fair share of the profits since we're using their products anyway. And besides that, the financial part, let's say of the arrangement, it also entailed sort of close integrations. So knowing all that, I'm guessing that this is what brought you in touch with the wonderful world of Graph and by extension Neo4j.
Starting point is 00:03:55 And I'm guessing that that led you to eventually joining the company. Yeah, and I can give you a little more background. First of all, on the Google Cloud side, we of course had this strategy since Thomas came in to lead the team in, I think it was end of 2018, early 2019. source companies rather than competing heavily. This included Neo4j, but also Confluent and MongoDB and various other partners in this realm, Elastic. And so we always went ahead and partnered really well with these organizations. So I got an opportunity. I was actually the exec sponsor for Confluent throughout my time there. And I always tried to go ahead and make sure that Confluent got the right positioning messaging for all the messaging services. We had our own service, but we always made sure that customers had the choice to pick the right service, especially if they preferred
Starting point is 00:04:58 open source. So that is absolutely, I agree with you. That was a strategy. Giving you a little bit background on myself and graphs, like I first came in touch with graph technology when I was at Xbox and we were trying to build some of these features around social discovery and what that would look like and all of that. Like, hey, if we knew your social graph, can we go ahead and do better recommendations for games and stuff like that? So it was nascent to those days. It was not that, I don't think we didn't have Neo4j-like technology, but we were basically trying to build our own graph from big data solutions. And it was challenging because it was not that easy to go ahead and traverse graphs in like a big data technology. Think about Hadoop and think about running something like that. It was not Hadoop in Microsoft. Of course, it has had its own internal technology that we used to use.
Starting point is 00:05:56 So then I transitioned from there to Zulily. And Zulily is a consumer technology company. It's a marketplace. We had more than 18,000 vendors trying to publish new products every day. They lasted for only three days. And so it's a rotating catalog problem. And we experimented that time with Neo4j. This was early days. We used the open source tech and a bunch of hackathons. And so we never put anything in production that time. We were like just experimenting. And so I had been in, I had looked at the technology. Then I traverse and land into Google. And we were that time, like, you know,
Starting point is 00:06:38 working with one of the open source or one of the partners in the space that built on top of Bigtable and tried to go ahead and build a graph technology on top of, that was Janus Graph, I think, that ran on top of Bigtable and we were like partnering with them. I didn't run the database side of the house. I ran the analytic side, but I used to talk to my counterparts
Starting point is 00:07:00 on the database side and understand the use cases and all. And so in 2018, I wrote this paper in Google on what is my long-term vision for data analytics in Google Cloud. And I had this view that in the long-term, organizations will need different kinds of compute engines running on the data that they have collected to get value from it. And of course, it included things like, how do you do search analytics? How do you do large scale big data analytics? How do you do artificial intelligence through the same thing? And as part of that was also graph analytics and how graphs will come together. And internally, we had this term called big graph, which was BigQuery plus graphs and what that would look like.
Starting point is 00:07:45 And we partnered with Neo4j as well as a couple of others in the industry to go ahead and make that vision happen. And as in the last year or so of my being there, six months to a year, we started focusing a lot on that partnership and working with folks in that space. Because we saw our customers like Wayfair and all trying to use some of these technologies there. And so I think that was pretty profound. And so I knew there was value in like, you know, big data systems have this ability for you to collect massive amounts of data. And we have large systems like BigQuery, they're really good columnar databases, which means you can do aggregates really well. That's the technology, like from relational
Starting point is 00:08:32 database, which is row-based, you go to columnar, which is like column level storage, and they're really good at storing and aggregating the whole column data and sub-aggregates and all. What they don't have is the relationships between entities, which means if I know Sudhir likes products and electronic products, like I buy a lot of laptops and this, and from there, what do I like? So that level of relationship connections are harder because you're storing everything as columns and not as like entities and so when I saw those use cases come become really interesting and how we could add more value that actually was eye-opening and then when I started looking at after five and a half
Starting point is 00:09:17 years what I wanted to do next with the advent of AI Gen AI, the value that graphs could add in the space in general to customers with data that we had already collected and consolidated, I think it made a lot of sense for me to go to a graph-based company. And of course, then the question was Neo4j is the leader in that. I started talking to ML and it just worked out. So that's my little brief history with graphs over time. I won't say I'm an expert in graphs yet, but I'm learning really fast and understanding how customers are using it and where they can use it. And I met with, I don't know, 25, 30 plus customers over the last couple of months and understood their use cases. Well, you've been most productive, I would say,
Starting point is 00:10:07 taking into account you've only been in your new role for three months. That's like 10 customers per month. So not bad. Interesting, interesting backstory. And well, as you may know, I've also been into Graph for, well, longer than I'd care to admit, actually. But throughout that time, I've always been wondering, and I'm not the only one, let me tell you that. So how come Google never really officially, let's say, got into that space?
Starting point is 00:10:38 Well, I know there have been some isolated, more or less, endeavors, let's say here and there. So I remember at some point, like a year ago or something, when there was this initiative about knowledge graphs and how they can, they were used internally to power some features. But I think that's, to the best of my knowledge, that's as far as it went. So it was interesting for me to hear this this big graph thing that you said uh is going on so did you have any idea if this is uh still going on or where this may i think yeah so just to be clear so google does you have a massive knowledge graph that powers search and all and there are public documents on that. On the Google Cloud side, our strategy was to focus on the core big services like BigQuery, and the big graph was partnership with Neo4j and others in the industry to provide the graph capability natively integrated with it.
Starting point is 00:11:39 We announced this just before I left. My last day was, I think, March 30th or so. And I think the day before or something, we announced the actual solution between both Neo4j and BigQuery and that integration. So that's how we thought about it. There are various business reasons we have evaluated graphs and should we go into graph business or not but i think that won't be fair for me to talk about in this this forum but yeah i think internally we do have knowledge graphs we do use those uh for some of the use cases like kyc and all uh which which the vertex team is building and working on but providing graph as a database technology, I don't think
Starting point is 00:12:26 there is, like, we didn't plan for that yet. So maybe things will change in future. I don't know now, but yeah, there's always interest in graph technology in various works. Yeah, well, obviously, I don't know as much as you do about how Google thinks about it. But, you know, as an outsider, my take would be, well, probably if Google wanted to do that, they would have done it by now. So it doesn't seem very likely that they would come out and release their own graph database solution
Starting point is 00:12:57 or anything of the kind. Probably not. Okay, but well, since you talked about, it sounds like almost like the last thing you signed off, let's say, before you leave Google. But interestingly, one of the first things to come out of Neo4j, since you also joined, also has like a Google flavor to it. And it actually has to do with the latest announcement that I know of, at least from Neo4j.
Starting point is 00:13:27 It has to do precisely with features that are about integrating Google's large language model capability and vertex AI in general. Let's say with graph-specific features. So would you like to just quickly take us through the features that you just announced? And after you do that, let's get a little bit into the specifics of each one. Yeah. So first of all, yeah, like a couple of weeks back, we announced all the integrations and use cases that we are focusing on with Vertex AI. Just to be clear, we have those integrations with Vertex AI
Starting point is 00:14:05 and also with OpenAI and Microsoft version of OpenAI services. So we are integrating with all the LLM platforms, and we are also working with AWS team to integrate those with Bedrock and other technologies that they are going to come out with. So first of all, we are like, you know, cross cloud and we run on all the three clouds and just wanted to highlight that. And so specifically about LLMs
Starting point is 00:14:32 and like talking about Vertex AI, right? I think the whole, like, you know, in last five to six months, the world has really significantly shifted with advent of chat GPT and then announcements from Vertex AI and large language model innovations that are happening. And one of the main things, and I was involved in some of that technology, especially from
Starting point is 00:14:58 an analytics perspective, how we would use it in Google. So I knew coming in what was happening and how fast this industry was moving. And one of the things that we believe is a challenge in large language models is, so if you think about large language models, they were actually built for understanding language and grammar and how you build sentences. And it's like, if you read the Transformers paper, it's like if you read the transformers paper it's about the attention uh like the whole theory is like how likely is the next word after this word what's the probability of the next word being the word that we are generating and all that and so in in it's great for creative generation of content it is a challenge when you're looking at factual data and it doesn't have the facts with it in many
Starting point is 00:15:46 cases, especially the enterprise domain knowledge that exists in enterprises that is not available in that. So our partnership with Vertex AI and Google Cloud, as well as others in the industry is how can we bring this enterprise knowledge back into the large language models and how can we enable both of them to work together to benefit from each other. So that's the main thing. There are five use cases that we have identified that we believe are going to be really valuable for customers. And we have various different integrations that we have done in the product in that. First of all, the most interesting use case that most of our customers are talking about,
Starting point is 00:16:33 I was talking to a large pharmaceutical company. They have a knowledge graph that they have built. It's being used by 300 odd people in the organization through Bloom, which is our visual analytics analytics like graph analytics tool but they would love to make it accessible to many more users but these users are business users they are not technical users so their thing is a natural language experience working with the knowledge graph is actually going to help the whole organization to go ahead and benefit from it rather than just limited people who understand graphs and all. So I think that's one is natural language
Starting point is 00:17:10 integration. In this case, we can use the large language model to convert, generate cipher. We can go ahead and train the model, fine tune it as well as prompt engineering to go to that. So that's one. The second one is we can enrich the knowledge graph using large language models. So if you have, there are organizations, there's a large oil and gas deal that we did just recently. They basically have a lot of technical manuals that go on the oil and gas rigs and all, all of these are unstructured data. They have a structured data that's there. So building a knowledge graph that can embed both structured as well as contents from the unstructured data within it and building
Starting point is 00:17:51 the knowledge graph from your unstructured data is the second one. And large language models are great at taking and generating entities out of it. The other one that we have is basically enriching the knowledge graph based on large language models. So let's say you already have some of the content, let's say some text that was created or generated with the customer service reps. You can use summarization functions. You can do various of these, like sentiment analysis and all of that. And we can, in real time,
Starting point is 00:18:25 enrich all the data that you have with like large language models using real time enrichment function calls that you can do. The last one, which is more interesting is actually there are two of them, but I will touch on one. One of them is the grounding. So, you know, we always hear about hallucination impact of large language models, where we can help with knowledge graphs is once you have actually run your analysis, or if you ask the question to Vertex AI or ChatGPT, once a response comes, we can validate that response against the knowledge graph. So we have integrated Neo4j with LanChain. So you can go ahead and integrate that in the pipelines that you have. And then finally embeddings, I think it's super expensive to run searches in large language models. There is a really good paper from any scale
Starting point is 00:19:26 on the cost difference between, it's primarily on OpenAI side, OpenAI 4.0, GPT 4.0 to 3.5. And it just tells you that it's much better to store vector embeddings and embeddings in your database and then do searches with cosine similarity like functions rather than doing it in large language models.
Starting point is 00:19:47 So we are enabling that use case too. So those are five use cases that we are looking at, and we have done different integrations from some APOC functions that we have built to land chain integration to integration with the embeddings API that we have and all that. So that's the list of five use cases that we have enabled with the Google team. Yeah. Thanks.
Starting point is 00:20:11 Thanks for listing those. And well, I have to say, so one of the things I do is, well, I have this newsletter that I run from time to time. And well, it was in hibernation for a long time, actually. Well, due to a number of reasons. So recently, I just put out the last issue. And obviously, one of the overarching themes in what's going in the graph world is what you just talked about, basically, this whole idea of how do we use large language models and graphs together.
Starting point is 00:20:40 And just looking at, there's like a myriad of ways that people are thinking of doing that and different uses case they're exploring, but it looks like all of them fall in under one of three categories. So there's the use cases that are about, well, how can we use large language models to build a new knowledge graphs? They're the ones about how can we use a large language model to add to an already existing knowledge graph. And then there's the whole using large language model as an interface to just accessing knowledge graphs. And I think everything that you talked about
Starting point is 00:21:23 falls under one of these three categories, basically. Yeah, that's correct. Like the first one, I would just expand it one for saying like you can generate knowledge graphs and also enrich knowledge graphs with additional context and information that large language models can give you. But you're right.
Starting point is 00:21:40 You're either building a knowledge graph, training the LLMs using the knowledge graph information so that that becomes better, or using a natural language to engage with knowledge graphs. So those are roughly three use cases. You're right. Yeah. So you also mentioned that these integrations that you announced with Google Vertex AI, you're going to be working on similar integrations with other cloud providers as well. So I guess that means we should expect announcements to be coming out soon as well. Yeah, you will see that.
Starting point is 00:22:16 And the integration with OpenAI stack is already done. We are working with like AWS team to go out and integrate with them too. And so, yeah, you will see quite a few more announcements coming in next few months. The additional one thing I do want to say is it is an evolving space. So one of the things we did was we actually, me and MLR CEO, when we started talking about the whole generative AI space, what we did was instead of just going and building just some product integrations and all, we also formed this internal team. We call it NALLM, but
Starting point is 00:22:52 it's an LLM team, focus team, which is like we are spending, we pulled a bunch of engineering folks and product folks and cross-functional team into a small working group. We dedicated more than 75% of their time is going into doing research and actually building components and publishing blogs and technical write-ups on it. Because as we all know, this is such a fast-evolving space that we wanted to make sure we are ahead you know ahead of what is coming and what's happening and we can educate our customer base as well as everybody else in the industry we have got really tremendous response to some of the initial i think we have posted three blog
Starting point is 00:23:38 posts right now on technical details of how to think about it uh how to engage and how to integrate with these technologies, and we'll continue doing that. So I think it's a, the more important thing here is, I don't think Gen AI, we are in the first, like, you know, innings with Gen AI still, right? Like, and there's a lot of innovation to come, a lot of evolutions will come. And so I think just working as a community, especially on the graph side would be super helpful. And so I think just working as a community, especially on the graph side, would be super helpful. And I think it's not just Neo4j, but hopefully others in the industry also partner along with us as well as our customers to go ahead and evolve
Starting point is 00:24:17 where we can leverage graphs with LLMs in a more efficient way. So yeah, we are learning and we are hoping to continue learning how to evolve it. Yeah, actually, this is something I wanted to ask you about because, well, it's great that you have these tightly knit integrations
Starting point is 00:24:33 with Google and OpenAI. But I was thinking, well, if I was a product manager at Neo4j or, well, whatever other vendor for that matter, well, definitely I would like to have these types of integrations, but I would also definitely like
Starting point is 00:24:49 to have that kind of feature for my standalone product or if people want to run on-prem. So these integrations, I'm guessing that they're only valid for Aura. So if I want to run like an on-prem version, then well, what do I do? And I guess this research
Starting point is 00:25:06 and development team that you just referred to is the answer. So you're sort of trying to build that into the core product, right? Yeah. So just one thing to be clear, all the integrations we are doing are available both for Aura and self-managed instances. So they run on both. Like for example, our embedding callbacks, like for enrichment work through like direct calls. So you can actually use them either in either place. Some of the stuff that we are doing with NeoDash, the dashboarding tool that's available to anybody,
Starting point is 00:25:40 wherever you're running, you do need access to Vertex AI, but it doesn't have to be Aura only. It's not just fully managed instance, but also self-managed instances and all. And many of our large customers deploy it in the cloud, but they self-manage it. They don't want to use the fully managed instances here because they want it in their own VPC control.
Starting point is 00:26:03 They want to control the environment and all. So yeah, it will work with both our on-prem product as well as on-prem, self-managed, can be in cloud or on-prem and in the fully managed version. So yeah, I think that's our thing. We may do a few things in coming months which will first go into the Aura product,
Starting point is 00:26:25 because it's much easier to experiment and launch and learn in the cloud with a fully managed offering, because we control the environment before we actually make it available in self-managed deployment for customers. So we may lead with Aura in many cases now, but we will make sure the capabilities are available in our self-managed option. Well, another thing that I would have in my roadmap if I was a product manager would be actually integrating with open source large language models. And the reason for that is, well, you know, there's a number of reasons, actually. But thinking from the enterprise side of things, I know there's lots of concern about, well,
Starting point is 00:27:08 having your data exposed or shared in whatever way with OpenAI or Microsoft or Google or whatever other vendor is behind running those large language models. So I'm pretty sure that that's a concern you're also familiar with. So to address that concern, probably the answer is either enabling clients that already have figured out how to run their own large language models, or probably offering just one out of the box for the ones that haven't figured that out. I think that is a good point. And what we did was, as we started working on the Gen AI stuff, we actually partnered with 12 or so customers. And we started understanding their requirements. And so they explicitly were very clear to us that we will not use anything that's in the public forum, anything.
Starting point is 00:28:10 They can't even, some of them can't even get to the OpenAI website, forget about using the models. And so we are partnering with them to figure out how they would like to build a LLM model based on all of their technical details that they have. Like that will be like more of a custom model that they will like, you know, train as well as run. We still have to figure out that portion about how we will integrate with something that is running on premises in a completely cordoned off environment.
Starting point is 00:28:42 What do the APIs look like? What does the engagement and integration model looks like and all that? So that one, I think it's easier to integrate with like a standardized API from OpenAI, Azure's OpenAI version, or like Google Cloud or AWS because they're public APIs. On the self-managed one,
Starting point is 00:29:04 we are still trying to figure out what the infrastructure would look like, what the integration model will look like. So still more learnings to do. We are, of course, committed to helping our customers that want to deploy the knowledge graph as well as the large language model in their environment and all that.
Starting point is 00:29:22 We just have to figure out how that engagement and integration would look like. But we're partnering with some of the largest companies in the world to learn from them and identify what that integration would look like. Yeah. So there's another part of this whole landscape, let's say, which you also mentioned previously
Starting point is 00:29:41 that has to do with embeddings. And I know this is something that Neo4j has actually supported for a while. I'm not sure exactly how long, but well, at least since the whole data science product came out, which is like a couple of years already. And I'm seeing more and more, not necessarily graph database, but database vendors in general who are adding these capabilities precisely because of, well, the reasoning you referred to earlier. So it makes lots of sense if you want to integrate with a large Lagrange model in some way, or if you want to have a sort of exchange, let's say,
Starting point is 00:30:18 with that model to be able to offer that capability. So I was wondering, like I said, I know that Neo4j has offered that capability for a while. And I think actually it has like both node embeddings and graph embeddings. So I wonder what's the status there. I know that there's a few algorithms that are supported out of the box. So is it under active development? Do people use it? And what's the roadmap for those features? So you're absolutely right. Let me touch upon a couple of things, and then I will come on the general industry too. So first of all, we have been supporting embeddings in the database and running algorithms on top of it for last couple of years. And we have tons of customers, like almost every customer that I have spoken to in last two to three months
Starting point is 00:31:07 are using some shape or form of our data science capability. So whether that's page rank to running similarities based on the node embeddings and all of that. So cosine similarity functions and all that stuff. So almost every customer is using that advanced data science library to go ahead and run algorithms. And it's not just data science teams. It's also the developers that build applications. Now they want to go ahead and become smarter in a few use cases. They use the library, whether centrality, whether it is
Starting point is 00:31:40 different functions, but also the embedding capabilities and all. So that's one thing. Especially in the large language model, there is currently this massive interest in vector data type and vector databases, and what does that look like? And the fundamental thing there is, you know, large language models are also great at vectorizing information. And the problem with large language models, at least that's what is coming out right now, is if you use them for searches on top of every time, on top of the data that you have, it is going to be more expensive. So the theory is using vectorized information from large language model, storing it in your store,
Starting point is 00:32:26 and then running cosine similarity like functions is a much more cost-effective way for running algorithms than trying to put all your enterprise data back into a large language model, fine-tune it. There's limitations on fine-tuning number of, like, you know, how many tokens you can use and all. And on top of that, running searches. And most likely, it's anyway going to hallucinate and give you some wrong information.
Starting point is 00:32:50 So, again, you have to validate that answer. So, therefore, I think there's massive interest in storing vector data and vector embeddings primarily and running cosine similarity functions. So, we provide that capability today. But we realize we can do much better on that. So I think one of the things we are working on is vector indexes. So today, if you store the vector information internally, you run cosine similarity, that works really well. We can do that in memory with the data science libraries and all of that. But we want to actually make it pretty more native. So we are looking at enabling vector indexes directly on top of our storage tier.
Starting point is 00:33:32 So that will make it much, much better. So that is coming. My hope is I won't give a date like a good product manager in this event, but sometime in next few months, you should see support for vector indexes that makes the whole system way more efficient for vector searches. Just one more thing I want to add is, like, you know, I've talked to a bunch of customers and this conversation has come up about, do I need a vector database to go ahead and store vectors? or do I need a knowledge graph and does one replace the other and all that I think a lot of times things depend on what use case you're enabling and so vector types and vector data types databases are are great for one use case which is search right
Starting point is 00:34:19 like if you wanted to do like natural language like searches I think it's a great use case and solution for that. And if you wanted to just replace your current search technology with that kind of a model, I think that may be possible. But in most cases, what happens talking to customers is they have some structured information along with the unstructured searches on natural language, right? For example, if you wanted to find, tell me what products are similar to, let's say, white shirts that have pockets. Let's say somebody searches that I'm looking for white shirts with pockets. Normally what you want to do is, oh, you want to filter it down to shirts and then look for all the descriptions that are there, which are vectorized and say,
Starting point is 00:35:10 hey, in descriptions, do you have the white shirts with pockets? Because that will be more likely the right answer. And so you want to do some level of structured filtering and apply searches on top of only those entities, because otherwise that will not be right in many cases. So I think the advantage of knowledge graphs and the way you can leverage them is you have these structured relationships, which are explicit relationships, and then you can have the implicit searches that are running on top of it. And the combination is what I think is much more powerful than just purely implicit search that you can do. Of course, if you just need that, you can use that.
Starting point is 00:35:51 But I'm saying like a combination is way more powerful for most customer use cases that I'm seeing, at least right now. Okay, well, that was basically an answer to a question I had lined up, which was going to be, well, in a way, that situation with many database vendors adding vector search capabilities
Starting point is 00:36:14 resembles a situation that has happened and is happening now as well with many non-graph database vendors adding graph capabilities on top. So there's always this debate, let's say, well, should I go for a specialized vendor that does, I don't know, graph or vector or whatever, or should I stick with my general purpose database that also has graph or vector on top? And well, the answer is it depends, basically, as you also said.
Starting point is 00:36:44 I think there's also one more thing, right? Like I, I would say, can you put a graph algorithm on any storage system? You can. The problem is graphs are all about like, you know, relationships and traversing relationships. Can you run a traversal of relationship in a relational database? Yes, you can. You're doing a multi-dimensional recursive SQL queries, right? You basically, and that means it's going to slow you down. So that's perfectly, that's why it is sometimes,
Starting point is 00:37:14 it looks nice, but it doesn't perform well, actually, in the production environment. Same thing is going to happen with vectors, right? If you basically try to take a graph database and try to just run vectorized searches and stuff like that, and if you didn't change the underlying technology of how you handle vectors,
Starting point is 00:37:34 I think that will be challenged over a period of time, especially when you do large scale use cases and all. And this is the exact reason we are building vector indexes that will allow us to go in the storage system, store all the vectors in an index form that we can then run searches directly from an index rather than traversing through every hop of the node and then trying to figure the vector search, that's not going to be the right thing. And this is where the data science library that we have also puts all the data in memory and then allows us to run really fast. And with the vector indexes, we will do the same. We will have a vectorized index for all the vectors and we can run it really fast. So I think it's all about, one is about your right. It depends on what your use case is.
Starting point is 00:38:27 And the second is also how our technology is being built. Are you natively building something that optimizes for that from a storage up? Or are you just putting like an API on top and just trying to make sure it just works? So I think that's what it is. And so we are trying to take
Starting point is 00:38:43 from a bottom-up approach of how we can like really do vectorization, vector storage, and vector searches efficiently across the whole database. Okay. So let's wrap up with something a bit more high-level or strategic, let's say. I noticed, again, in the last press release that I saw, which was about this new integration with Google Vertex AI, it was, you know, the messaging was all about knowledge graphs, you know, how to access knowledge graphs, how to build knowledge graphs and so on and so forth. And this is something that will, to the best of my knowledge or memory, I don't think I've seen before, you know, so much, you know, like upfront and center from Neo4j. At the same time, you probably know that there is this specific, well, I don't know, module or extension, however you want to call it, which is called NeoSemantics. So basically, it's a way for people to use, well, semantics and RDF
Starting point is 00:39:43 technology and all of that in conjunction with Neo4j. And that I saw recently that it has been going really well. It's reached like a million downloads recently. And so I wonder, bringing these two pieces of evidence, let's say together, does that mean that Neo4j is kind of shifting its messaging or maybe that you get feedback from clients that they're really interested in knowledge graphs? And so you're following the signal that you get from them? I think, let me give you a little philosophy that I have and then we can get to the product things, right? I always love to listen to my customers and figure out where what they are looking for and what helps them and i think what i have noticed and since i've been here and also
Starting point is 00:40:32 the industry the way it is moving i think there is this like you know genuine new interest in knowledge graphs and coming from google i've seen the power of knowledge graph and how it helps in, in large search kind of an environment where, where Google runs. And, and, and so I see like, as the natural language experiences become more and more, more usable for customers, this like, you know, how can organizations build knowledge graphs and leverage them with these technologies is becoming more interesting. Now, I would say we are more following what our customers are telling us. And I think if there is more interest in knowledge graphs, in building knowledge graphs and having more semantic graph capability in Neo4j, we will follow that and we'll absolutely go build more capabilities to support organizations in that. So I think we will follow that and we will absolutely go build more capabilities to support organizations in that. So I think we will learn as we go. I'm refreshing our whole
Starting point is 00:41:31 product strategy for next two to three years. And this will be one piece where we will continuously get our feedback from. We are building our whole customer advisory board where we are adding some of the largest organizations in the world to get continuous feedback and we will learn from them and identify what capabilities will make more sense in the product in that realm okay so uh let's let's then wrap up with a little peek then maybe if if possible into that product roadmap that you just talked about so just to reiterate for people who may not be familiar i think like the main product line you have up to this point is like well the core database and or like the uh the fully managed version then there's
Starting point is 00:42:16 bloom the visual exploration module and also data the data science which is like a again you talked about it quite a bit with, it's the one that offers all these algorithms, graph algorithms and all that. So do you see new features into the existing product set or do you maybe envision also new products? So I actually think there are, it's a combination of both. So let me first tell you the big themes I have for Neo4j and where we are going to go invest in and all, right? I think, first of all, one of the biggest things I've seen is most of our customers, 90 plus percent, are deploying Neo4j in cloud. What I mean by that is it's either Aura or it is self-managed, but it's all in cloud.
Starting point is 00:43:07 And so I think being cloud first is going to be the big theme for us. And what do I mean by being cloud first is of course, making sure Aura has all the product capabilities that organizations need, how do we deliver Bloom and works like, you know, a browser and all these tools that you talked about in cloud, how do you enable doing that? It's going to be a big thing for us. And so I have this whole comprehensive Auras-based solution
Starting point is 00:43:32 where users don't have to deploy anything. They can just run it and that should run with your self-managed. Some of the tools will run with self-managed as well as in the fully managed database side of it. So I think that's one. The second thing in that realm is also how we integrate with the cloud ecosystems, right? You saw a first preview of what we can do with Vertex AI, OpenAI that we talked about
Starting point is 00:43:56 today, but there are a lot more integrations. We also did some integration with BigQuery, but I want to focus on Snowflake and Databricks and Azure Fabric and all these ecosystems that are evolving in cloud and make it very easy for our customers. Many of our customers are like, hey, for backups and recovery, we just want to use the cloud object storage and instead of doing like what we did traditionally in disks and all that and on-premises. So I think helping being that cloud-first company, integrating more natively with the cloud ecosystems is going to be one big theme for us.
Starting point is 00:44:34 The second thing I want to focus on is ease of use. I think one of the things that I have seen is graphs are like, you know, I'm pretty sure more organizations will use graph technology if we made it easier for them to adopt it. And this is the reason I want to make sure that anybody who wants to start with a graph technology can do that. I have this theme internally that I am talking about. It's not from me. One of the leaders in Microsoft I worked with has this thing. So I'm just picking it from him. It's like five seconds to sign up and five minutes to wow.
Starting point is 00:45:14 So basically, in five minutes, you should be able to pick the data that you want, move it into a, like model it into a graph model and move and actually start using bloom or something like that to actually visualize it and start finding value in your data and stuff like that so just simplification from import modeling to visualization to analytics which is basically dashboarding and all of that stuff so we have some experimental tools that we have given our customers like neo dash which is the dashboarding we have neo converse that we have given our customers like NeoDash, which is the dashboarding. We have NeoConverse, which is chat experience, looking at some of those profile end to end and seeing what more tools will be helpful for organizations to go from one to end and how do we make it easy.
Starting point is 00:45:58 So that's the second piece of it. And in that whole context, it's also about how we think about data science. Because I think data science has two use cases. One is a developer who has built an application. They want to build an intelligent application. They want to put predictions within their application, like a supply chain graph wants to run some risk algorithm, and they just want to call a function to go figure it out. And that's one example. And then there is this whole other example where a data scientist or an analyst wants to run some algorithm, figure out what analysis they can do. But that is more of an ephemeral workload,
Starting point is 00:46:33 which is like a temporary workload. It's not like running 24-7 as an application. So enabling those two different types of use cases for our customers in a very cost-effective way in the cloud is now possible. So I think that's the third area, which is like, how do I satisfy those use cases? So those are the three main things. And of course, I come from large big data world and making sure our database can also support cloud scale, not just we are one of the most scalable databases, we can do
Starting point is 00:47:05 large read replicas and all of that. But I've been talking to large customers from my past, from Snap to Spotify to like Walmart, like all the large customers and the size of data is only growing in organizations. And so how do we support the cloud scale architecture and cloud scale capabilities is going to be another big area of focus for me so that's at a very high level maybe we do a one session sometime in future when i have the real roadmap final for next couple of years and we can go deeper into that yeah yeah i mean in all fairness you you're new so i wouldn't expect you to have it all figured out by now and i would it would actually be even a bit concerning if you had like everything else
Starting point is 00:47:49 already. What I can tell you is that, well, there's definitely a overlap in the way that Emil thinks. So the emphasis on ease of use basically, and he's always been saying that, well, it's, it's already easy to use, but we want to make it even easier. So I can see why, you know, you do get along basically. Yeah, I agree. I think it's been great working with Emil in last or three months as part of
Starting point is 00:48:18 this team, but also more than six months I've been talking to him. He's an inspirational leader. And I think we have a lot more opportunities to like you know make it make our technologies more pervasive across the globe so yeah looking forward to working with with him and the broader team we have an amazing team here and not just in product and engineering but our leadership in the revenue organization marketing like it's a great team, finance and all that. So I'm looking forward to partnering with everybody
Starting point is 00:48:47 to take Neo4j to new heights now. Great. Well, it's been a pleasure. And best of luck with your plans. And well, I'm sure we'll be in touch as they come to fruition eventually and you have product announcements and all that to talk about. Thank you, George.
Starting point is 00:49:07 Thanks a lot. For more stories like this, check the link in bio and follow Link Data Orchestration.
