Orchestrate all the Things - Amazon Neptune introduces a new Analytics engine and the One Graph vision. Featuring Brad Bebee & Denise Gosnell, Amazon Neptune General Manager & Principal Product Manager

Episode Date: November 29, 2023

Amazon Neptune, the managed graph database service by AWS, makes analytics faster and more agile while introducing a vision aiming to simplify graph databases. It's not every day that you hear product leads questioning the utility of their own products. Brad Bebee, the general manager of Amazon Neptune, was all serious when he said that most customers don't actually want a graph database. However, that statement needs contextualization. If Bebee had meant that in the literal sense, the team he and Amazon Neptune Principal Product Manager Denise Gosnell lead would not have bothered developing and releasing a brand new analytics engine for their customers.

We caught up with Bebee and Gosnell to discuss Amazon Neptune's new features and the broader vision. We cover where Amazon Neptune fits in the AWS vision of data management, and how the new analytics engine provides a single service for graph workloads, high performance for graph analytic queries and graph algorithms, and vector store and search capabilities for generative AI applications. We also share insights on the One Graph vision, the road from serverless to One Graph via HPC, as well as vectors and graph AI.

Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2023/11/29/amazon-neptune-introduces-a-new-analytics-engine-and-the-one-graph-vision/

00:00:00 Introduction
00:01:44 Amazon Neptune & AWS vision of data management
00:05:35 The Importance of Graph Databases
00:08:55 Amazon Neptune Use Cases
00:13:13 Introduction to Amazon Neptune Analytics
00:15:20 Key Features of Neptune Analytics
00:17:40 Use Cases for Neptune Analytics
00:21:10 Preparing Data for Generative AI Applications
00:23:37 Neptune Analytics Use Cases and Deployment
00:26:43 Pricing and Roadmap Q&A
00:48:46 Conclusion

Transcript
Starting point is 00:00:00 Welcome to Orchestrate all the Things. I'm George Anadiotis, and we'll be connecting the dots together. Stories about technology, data, AI and media, and how they flow into each other, shaping our lives. Amazon Neptune, the managed graph database service by AWS, makes analytics faster and more agile, while introducing a vision that aims to simplify graph databases. It's not every day that you hear product leads questioning the utility of their own products. Brad Bebee, the general manager
Starting point is 00:00:28 of Amazon Neptune, was all serious when he said that most customers don't actually want a graph database. That statement, however, needs contextualization. If he had meant it in the literal sense, the team that he and Amazon Neptune Principal Product Manager
Starting point is 00:00:44 Denise Gosnell lead would not have bothered developing and releasing a brand new analytics engine for their customers. We caught up with Bebee and Gosnell to discuss Amazon Neptune's new features and the broader vision. We cover where Amazon Neptune fits in the AWS vision of data management, and how the new analytics engine provides a single service for graph workloads, high performance for graph analytic queries and graph algorithms, and vector store and search capabilities for generative AI applications. We also share insights on the One Graph vision,
Starting point is 00:01:17 the road from serverless to One Graph via HPC, as well as vectors and graph AI. I hope you will enjoy this. If you like my work and Orchestrate all the Things, you can subscribe to my podcast, available on all major platforms, my self-published newsletter, also syndicated on Substack, Hackernoon, Medium, and Dzone, or follow Orchestrate all the Things on your social media of choice.
Starting point is 00:01:43 Hi, I'm Brad Beebe. I'm the general manager of Amazon Neptune and Amazon Timestream. Neptune is AWS's managed graph database service, and Timestream is AWS's managed time series database service. A little over seven years ago, I joined AWS from a small open source graph database company to launch a managed graph database service. And today, I'm really excited to talk to you about Amazon Neptune Analytics
Starting point is 00:02:08 and give you a preview of what's going to be coming soon. But the first thing I wanted to do was to give you a little bit of an overview about how we at AWS are thinking about our vision of data management. And at AWS, our vision is an end-to-end architecture where customers don't have to worry about how their data is stored or managed. To do that, we really have three different pillars.
Starting point is 00:02:41 The first is having the most comprehensive set of services to store, analyze, and share your data. The second is having solutions that make it easy to connect all of your data between the services that you want to use. And the third, which is very important, is having the right governance and policy solutions in place so that you know that your teams can use the data effectively and quickly, but within policy and regulatory guidelines. At AWS, from a database perspective, we have the most complete set of both relational and purpose-built databases. And of course, today we're going to focus on one of my favorites, which are graph databases.
Starting point is 00:03:31 Graphs are awesome. And the reason that graphs are awesome is because they allow you to innovate based on the relationships in your data. In a graph data model, relationships are a first-class entity, which means you can ask questions and build applications that explore these relationships and the connections in your data. The challenge is that when you access a graph, the way that you need to touch the data is often random. And so if you think about
Starting point is 00:04:07 how one person is connected to another, is connected to another, the way that you lay out and store that data internally in a system makes it very difficult to predict how you're going to access it. So it's hard due to the random data access. And when you want to ask generalized graph questions or high-performance graph processing, you often get the best result by using a purpose-built graph solution. Amazon Neptune is AWS's fully managed graph database solution. It's purpose-built for processing graphs, and it's designed for interactive graph applications where you need to store billions of relationships,
Starting point is 00:04:51 in fact, up to 128 terabytes of graph data, and support interactive navigation with parameterized searches from one to three hops. Neptune offers customers the most choice of open-source and open and open standard query languages supporting both the labeled property graph and the resource description framework graph models and the three query languages of OpenCypher, Apache TinkerPop, Gremlin, and RDF and Sparkle. And did I mention that Neptune also provides a serverless deployment option and a global AWS regional deployments? One of the things that most excites me is that every day, thousands of customers create tens of thousands of different Neptune instances. And you can see from the customer logos here
Starting point is 00:05:45 that the kinds of use cases that customers do with graphs are very broad. When we look across our business, we really see four different areas where we're seeing traction with customers. The first are knowledge graphs, which is how is information related. And we see customers using this for information retrieval, as a precursor for machine learning types of applications,
Starting point is 00:06:11 and increasingly with various different kinds of Gen AI use cases. A brief example is Siemens is using a knowledge graph to power a digital twins use case, where they provide a query service for their digital twins that's connected by a knowledge graph to power a digital twins use case where they provide a query service for their digital twins that's connected by a knowledge graph. The second major use case that we see from customers are identity graphs. And these are using many different observations of customers or users or devices, and then using relationships between those observations, often in conjunction with different kinds of analytics, to be able to try and understand and create a 360-degree view of the customer.
Starting point is 00:06:58 So of all the interactions that I have across all this data, who are the actual customers behind the scenes? How can I help understand the customer journeys? How can I use that as a precursor to various different kinds of fraud applications? Fraud, of course, is a classic graph use case. It's really only limited by the ingenuity and the creativity of those who commit fraud. But the kinds of fraud that we see customers using to detect with Neptune are dealing with the relationships in the data. So they're looking at transactions or groups of individuals and trying to understand how those individuals are related to be able to do fraud detection. A very fun example of this is Games 24-7, which is an online gaming
Starting point is 00:07:48 company in India, and they play rummy for money. And one of the behaviors that they saw was that multiple groups of people were playing multiple tables of rummy at the same time, and they were colluding. And so by using a graph to look at the relationships across players at the same time, and they were colluding. And so by using a graph to look at the relationships across players at the same time, they're able to detect this particular pattern of collusion-based fraud. So it's a fun and interesting example. And the last major use case that we see from customers
Starting point is 00:08:21 are security graphs, also a classic graph space. We see customers using the connections between their devices and networks to help them understand their cloud security posture, to do detection about data exfiltration and data flows, to manage policies for identity and access control. This has been one of our fastest growing segments over the Είναι ένας από τους πιο γρήγορους σεγμένους σεγμένους των τελευταίων χρόνων. Έχω δείξει ότι έχετε κάποιες επίδρασεις σε κάθε από αυτά τα σχέδια για αυτά τα χρησιμοποιητικά συμφέροντα. Και πιστεύω ότι για την ιδιαιτήτηση, την εμφανίστηση και τα σχεδιαστικά γραφή, αυτά είναι πιθανότατα, τουλάχιστον, διαχειρισμένα με την εγγραφική εγγραφή των ποιότητες σας. Η γραφική εμφανίστηση των γνώσεων είναι πιθανότατα διαχειρισμένη με την εγγραφή των ρΔΦ. graph case is probably typically handled by the RDF engine. I think I also looked around a little bit and I saw you have pre-configured notebooks
Starting point is 00:09:11 for those three use cases for identity and all of them are property graphs, right? Yeah. So let me answer them in reverse order. So we do have the Amazon Neptune notebooks are Jupyter notebooks that we provide in open source. And they give you examples for both how to use graph graphs, how to use graph databases, and in particular for the different use cases. And you're correct in that the fraud and identity graph and security graph use cases. There are the examples that we provide are with Apache TinkerPop and Gremlin. And we do have a knowledge graph use case, which I believe we have both an RDF and a property graph one.
Starting point is 00:09:55 We do, you know, I think that I see a mix of different uses of graph models. I think the first assumption that I might make would be that most customers are using RDF to build knowledge graphs. And while we do see that many customers are using RDF to build knowledge graphs, particularly those who are really thinking deliberately about their information architecture and their information models, we also see a large number of customers choosing to build knowledge graphs with property graph.
Starting point is 00:10:28 And I think that that's speaking a lot to the value that customers see by relating the data and that it sort of transcends the choice of the graph model for those kinds of use cases. Okay, cool. Thanks. I'll let you pick up from there. Yeah, no, sure, no problem. So one of my favorite examples from the security graph space is Wiz. Wiz is a very fast growing security ISV.
Starting point is 00:10:59 They have cloud security posture management software. They have lots of research services. But the thing that is really interesting about WIS is the way that they're using the graph is to really help you understand why findings are important. So, you know, in the security space, it's very easy to get overwhelmed by alerts and things that seem scary that your detection systems are finding. And it's challenging to really understand, of all the things that we've found, which ones are the most important for you to prioritize
Starting point is 00:11:34 or for you to ask your IT teams to prioritize fixing. And what you see here is an example of Wizz's application. And you can see that they're using the graph to help understand why a particular vulnerability or detection is important to fix. And so in this case, what you're seeing is that the cause, the reason something is important is because this particular detection means that one of your business applications is connected to the Internet versus a developer system or something that was standalone and maybe had other kinds of defense in-depth pieces. So I think it's really interesting that, you know, their use of the graph is for explainability, and that's, you know, just really helps them be differentiating in their offering. So with that, I'm very excited to turn it over to Denise to talk about our new offering, Neptune Analytics. Thank you so much, Brad and George. My name is Denise Gosnell. I joined the Neptune team a little
Starting point is 00:12:40 over a year ago, and it's been a privilege to get to be a part of this team and join here to talk to you all and to share where we are going with our new analytics engine, Neptune Analytics. Amazon Neptune Analytics is a new analytics engine for Amazon Neptune so that our customers can make better data discoveries by analyzing large amounts of graph data with billions of connections incredibly quickly. So far, there are three main features or main ways to think about Amazon Neptune analytics that our customers in a beta program have been loving the most. The first of which is that it's a single service for working with your graphs. You can invoke popular graph algorithms, you can run low latency queries, and perform vector similarity search all from a single API. This API supports OpenCypher, which is a really popular open source
Starting point is 00:13:40 graph query language. The second thing our customers have been most excited about is that it's incredibly fast. So far, we've seen that our high-performance graph computing techniques have proven to be about 100 times faster for loading data in, and we've got 20 times faster scans and about 200 times faster columnar scans when you are running graph analytic queries and performing graph algorithms. The third thing our customers have loved the most is how much easier it is to start to build generative AI applications quickly. You can store and search vectors within Neptune Analytics by storing embeddings on nodes. And we also can use the Langchain library to perform, to translate natural language questions into open cipher queries, really lowering that bar to entry to working with graphs and working with graph
Starting point is 00:14:31 algorithms. So far, our customers have been using Neptune Analytics in three unique ways, first of which is that they're using them to perform ephemeral analytics. So imagine you have a workflow where you just need to spin up a graph really quickly, run some analysis, and turn it off. That's one of the main ways our customers have been using it, and it's giving you an overall lower total cost of optimization for your graph analytics workflows. The second way our customers have been loving using Neptune Analytics is for performing low latency analytical queries.
Starting point is 00:15:05 The best way to think about that is that there are many established ML pipelines with feature tables so that you can perform real time predictions off those features. Now our customers are able to run incredibly high concurrent query workloads to augment their existing feature tables with new analytics about their graph structure. That gives their ML models much higher prediction rates and overall higher end user engagement. The third way our customers are loving using Neptune Analytics is for doing vector search and then building Gen AI applications. Like we mentioned, you can perform a vector similarity search when you store your embeddings in Neptune Analytics, and then we also have a much easier way to translate those English questions into graph queries because the way we think about data just so happens to fit really well with how a graph structures it. I'd like to
Starting point is 00:15:56 go a little bit deeper and just show you all some stories about how our customers have been using each of these three types of use cases in a beta program that we've been running. So for ephemeral analytics, there was or there is a financial service company that has been able to increase at the point of sale, increase their intervention of successfully identifying fraud from about 17 to 58 percent. And they've been doing that by quickly spinning up a graph, loading their data, studying specific structural properties about it, and then turning it off. And that type of investigation for their analysts has helped them identify those new patterns of fraud much faster. Because as we say, and as our experience is showing, fraudsters are only as creative as whelp their minds. And so you've
Starting point is 00:16:45 got to be able to quickly find those patterns and be able to deploy new ways to fight against them. There's also a large media and technology company we've been working with who has loved the massive simplification that the ephemeral analytics workflows have brought for their data science teams. So what they've been able to do is to replace their data science pipeline with spinning up a graph, extracting those insights from algorithms and combinations of queries, and then turning it off. So that new process for them has offered an overall lower total cost for their data science team, and it's been a with our customers in our beta program for doing low latency analytical queries, we've been working with a social media company that 14 hours to load over 10 billion edges into a graph, understand specific properties about their recommendations and their friend engagements to then augment their ML pipelines. Now they can do that in about two hours,
Starting point is 00:17:57 and they're able to run much higher concurrent queries to augment those feature tables and to get those graph stats in their ML pipelines. Also, Amazon.com has been able to reduce time to resolution by about 25% for investigating fraud cases. Again, very similarly, as you heard, by being able to extract a graph feature by using Neptune Analytics and then augment their understanding of the ML predictability or those features for their ML pipelines so that they can have a much faster resolution on finding that fraud when it's emerging very quickly. Now, generative AI. Everyone wants to talk about how people are building generative AI, and got some more
Starting point is 00:18:41 stories about how our customers are working with us to build them with Neptune Analytics. So first off, one of the biggest themes that we hear about is how generative AI is helping customers make data discoveries. And that is exactly how we have been working with a large healthcare products company to do so. Specifically, they want to create scientifically aware search or translating proteins into vectors, doing similarity search, and then having the ability to you're able to find out what's similar and explain why. It gives you that much better way to discover new connections in your data. And it's really exciting and it's very interesting, especially right now on the Neptune team, to see how our customers are innovating in that fashion. We're also working with a very large online retail store who needs to make sure that they can quickly identify and flag pirated material that's being listed and sold. So you can imagine that you might have a piece of content that you know is
Starting point is 00:19:56 pirated, and you can combine vector similarity search and a knowledge graph to say, well, for this piece of pirated material, find other items that are also, that are very similar to it. And then you can traverse your knowledge graph to determine other sellers, listers, and buyers or patterns of how that pirated material is being listed and sold on the website. It's all about giving a service to our customers that is incredibly fast at detecting emerging patterns in as near real time as possible. When we've been working with our customers for doing generative AI applications, we have been working very closely to determine how they're going to be deploying Neptune Analytics within their workflows so that they can most quickly build generative AI apps. So there's two ways that we've been working with them. There's the perspective of how our customers and users are going to be using Neptune Analytics and then using generative AI. But then there's also the
Starting point is 00:21:02 perspective of how you're going to prepare your data and get it ready in a generative AI app. Let's talk about those in reverse order. So for getting your data ready, there's a need to have processes where I can imagine you have a large amount of training data that sits in a data lake. And you're going to need to use one of the well-established tools in Amazon or in AWS's tool suite like AWS Glue or Amazon EMR to process that data. And then typically our customers for Neptune Analytics are storing it in one of two places. You might need to store it in Amazon Neptune Analytics itself or Amazon Neptune, or you might just be storing it in S3. So once you process your data out, you can put it in Neptune or you can put it in S3. And from each of those locations,
Starting point is 00:21:50 it's incredibly fast to get it into Neptune Analytics to use in your application. Once you're also pre-processing your data, you might want to learn embeddings off of that. And that's when, on a second note, you might want to use maybe the open source laying chain library, or you might want to use SageMaker or Amazon Bedrock to extract embeddings about your data to then also persist in Neptune Analytics to use for vector similarity search. So those are two ways to look at how customers are extracting data from their data lakes, storing it in the Neptune service, and then using Amazon tools like Amazon Bedrock to get embeddings to set it up. Now let's look at the other side,
Starting point is 00:22:30 how our end users or how our customers' end users are using generative AI applications. They are going to probably start with invoking a query, and they're absolutely loving hitting that Langchain OS, Langchain library from open source to translate a human question into a graph query because that's our favorite part about working with graphs. The way we think and speak about data naturally maps into that connected and natural way to work with data.
Starting point is 00:22:58 Once they have their query, those queries are being run against Neptune Analytics at incredibly high speeds with high concurrency so that you're able to get answers back, rewrap them, and use another large language model to make them much easier to understand and then return it to the end user. We like to talk about that because it's really important to see as the generative AI space is moving so quickly, it's important to see and start to understand the patterns in which people are deploying Neptune Analytics and deploying graph technology to be used within a generative AI app. And you got to consider both sides. You got to understand how you're going to prepare the data. And then you also need to work backwards from how
Starting point is 00:23:38 your end customer is going to use it so that you can architect it to be as fast as possible. So let's talk about pricing here for a second. When you start to look at Neptune Analytics and its pricing, its pricing is going to be based on memory optimized units. So we're going to be pricing this based on how much compute that you use for Neptune Analytics per hour. You're going to have essentially a capacity, a provision of memory, and it's going to be associated to different compute and network resources. And there's a price per hour for how much compute that you're going to be using. Our customers have been loving this because it drastically simplifies how you create your graphs. You're not having to think about the instances and making all of those subsequent choices. You can just specify the maximum capacity of a new graph in terms of
Starting point is 00:24:31 gigabytes of memory. And then the last thing that our customers have been loving is that the capacity can be automatically determined when you're importing your data from S3 or you're importing your data from Neptune with an overall max capacity so they can control their budget. So to kind of recap of Neptune Analytics and where we're going, Neptune Analytics is a new analytics engine for Amazon Neptune that's incredibly fast. It's about 100 times faster than our existing solutions for doing graph analytics today. You can receive incredibly fast responses to analytics. It's tuned for those memory intensive graph computations, and it's built for use cases that are ephemeral.
Starting point is 00:25:12 Spin up a graph, run analytics, turn it off. They require a lot of highly concurrent low latency queries, like augmenting established machine learning pipelines with new graph analytic features. Or for those building in the greenfield to build more generative AI applications, Neptune Analytics is built to support vector similarity search and other integrations like with large language models stored in Amazon Bedrock. Our customers are using this to make data discoveries and to use both the explicit
Starting point is 00:25:43 modeling of a knowledge graph and the implicit search of similarity search from vectors to really do some fascinating, to build some fascinating new use cases when you combine those two together. The overall simplicity of where we're going with Neptune Analytics to have that single API is one of the most loved features so far.
Starting point is 00:26:02 You can load, query, and analyze graphs all from a single API. And the simple pricing model is making it a lot easier for our customers to make choices and get started. That is where we're going with Neptune Analytics. It is an incredibly exciting time here. And thank you so much for having us to get to talk about it. Great. Thanks for the introduction. And I do have a number of questions, actually. And to be honest, that all sounds pretty interesting and impactful.
Starting point is 00:26:33 So based on what you said, it sounds like your users are already making good use of it. And what I'm trying to figure out here, though, is where does that all stand, let's say, in relation to what you already had? να αναφέρετε εδώ, όμως, πού όλα αυτά στήνουν, ας πούμε, σε σχέση με το τι ήρθατε ήδη. Γιατί, όσο ξέρω, η Neptune ήδη υποστηρίζε έναν τρόπο αναλυτικών πιθανότητας και, πιο δημοσιογραφικά, έναν τρόπο αλγόθυρων που μπορούσατε να περάσετε από την κομμάτια. Και ξέρω ότι ήδη είχατε δυο εργαλείς που περάσαν κάτω από το κομμάτι, είχατε την εργαλ have the RDF engine and the property graph engine. So is this a new engine on its own, or is it some kind of add-on or enhancement
Starting point is 00:27:12 or new features to the existing engines? Yeah, great question, George. So to answer one of your questions, Neptune Analytics complements Neptune by offering in-database algorithms. So when you have your data in Neptune, you can connect, you can essentially spin up a graph with Neptune Analytics, connect the endpoint to the ARN of your Neptune cluster, and it'll automatically ETL your data from Neptune into Neptune Analytics. Neptune Analytics is an in-memory processing engine that offers in-database algorithms. So that's a difference and an improvement for Neptune's customers today.
Starting point is 00:27:51 I think it's also a question about the kinds of use cases. I think that when I was talking earlier about graphs, still excited about them. You know, we talked about random data access. And, you know, for Neptune databases use a pretty traditional database type of architecture, you know, where we separate compute and storage. You can store very large graphs, are used to answer graph queries. And so that works really well for interactive OLTP-like graph applications, where over time you have kind of your hot working set of data that's in the instances. And you're answering questions over specific parts of the graph because those queries are parameterized with, you know, I'm looking for friends of Brad or contacts of George or those kinds of things.
Starting point is 00:28:58 Often for the use cases like the ephemeral analytics, some of the low latency analytic queries, you need to ask questions over the entire graph. And so from that perspective, what we found is that we needed to build an in-memory type or in-memory optimized architecture to be able to store and partition the data in a way that was optimized for questions where you might have to look over all of the graph at the same time to answer them. So if you think about your graph algorithms, you know, ranking kinds of operations, clustering kinds of operations, things where you really want to find trends across your whole graph or find insights across your whole graph
Starting point is 00:29:46 versus just answering parameterized pieces. And so that's kind of why, from that perspective, Neptune Analytics has a little bit of a different architecture because it's built for to solve a slightly different use case on the graph problems. Okay, well, that makes sense. Actually, it also sounds a lot like the reasoning I heard from στις προβλήματα του γραφείου. Ωραία, αυτό έχει σκέψη. Επίσης, αυτό ακριβώς σκέφτεται πολύ σαν το σύμφωνο που έκανε από τους ανθρώπους του Neo4j που όπως πιστεύετε πιστεύετε πριν έφερε
Starting point is 00:30:14 ένα αναλυτικό εγγυμό που βρίσκεται σε παραλληλή διαδικασία. Και έδωσαν ακριβώς το ίδιο σύμφωνο που μου έδωσε για γιατί το έκαναν και πραγματικά για το πώς το έκαναν. Έτσι, έλεγαν precisely the same reasoning that you just gave me for why they did it and actually how they did it as well, I think. So they said they basically introduced some parallelism in order to be able to achieve the speedup for cases that have to do with global graphs
Starting point is 00:30:35 and so on. So that makes me wonder, how did you implement your own solution? And perhaps if maybe you also did something similar. So, I mean, I think that, you know, we're both, you know, we both see a broad subset of graph customers and graph use cases. So, you know, I'm always excited to see what Neo4j launches. I think they've got a great product team and a great engineering team. I think, you know, from our perspective, we have, you may or may not be aware, but we have several different, there's a role at Amazon called an Amazon Scholar. And Amazon Scholars are people in research or academia who often will spend time working with Amazon and with the service team. And we have several different Amazon Scholars as part of the Neptune team. And so one of the things that we leveraged was techniques that are coming from high performance computing processing of large scale graphs.
Starting point is 00:31:33 And so that's really where we've taken the inspiration for the kind of memory-optimized graph partitioning we use, and how to write algorithms over those kinds of in-memory optimized graph partitions. I'm not as familiar with the specifics of Neo4j's implementation, but in terms of the parallel processing and memory-optimized pieces, those are things that are pretty well understood in the high-performance computing community. Really, the difference is that HPC researchers are often solving a very specific graph problem on a very specific graph. And as we mentioned, graphs have random data access.
Starting point is 00:32:20 And so as a service that has to solve graph problems for many customers, one of our challenges was we had to generalize the techniques that can work well for high performance computing for general graph processing. We don't know what the shape of a customer's data is going to look like, and we don't know what questions that they're going to ask. So what we really did with Neptune Analytics was we took what we saw was sort of the best of high-performance computing for graphs and tried to build it into a service for general graph processing that can give good performance for graph analytics
Starting point is 00:32:56 for cases where you need to look over the whole graph. Yeah. Another thing that sort of stood out for me was that, well, those three areas, let's say, that you highlighted in the presentation, I tend to think of them as somewhat orthogonal. So, you know, for the loading workload, what you basically need to do is speed up the loading process by a lot. And in my mind, at least, that doesn't necessarily have to do with the algorithms or parallelism or whatever it is that you do in the actual query execution
Starting point is 00:33:35 after you have loaded that data. So I think that you probably did something which is like, I don't know, Amazon storage specific there in order to enable that speed-up in loading the data set. So yeah, I think there were a couple of things that enabled it. One was very much moving to more parallelism in loading and changing the way that we partition the data. So that was definitely a key part of it. You know, with Neptune databases, one of the things that really makes it unique for customers is how easy it is to provide high availability
Starting point is 00:34:12 and read replicas. And that was created by leveraging some storage technology that was originally built for other databases in AWS. And for Neptune Analytics, we're leveraging some other AWS technology, you know, that uses a log-based storage mechanism. And that's also how we're able to both load data very quickly, but also provide durability and strong consistency guarantees for many of those low latency type of applications. So we are leveraging, we've built a lot of things for graphs.
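One generic way to get the parallel loading and partitioning described here — a sketch of the general technique, not AWS's actual loader — is to shard the edge list by a hash of the source vertex, so that independent workers can build disjoint slices of the graph concurrently:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4

def partition_of(vertex):
    # Hash-partition by source vertex: every edge lands in exactly one shard
    return hash(vertex) % NUM_PARTITIONS

def load_partition(pid, shard_edges):
    """Build adjacency lists for one shard (each call can run on its own worker)."""
    adj = defaultdict(list)
    for src, dst in shard_edges:
        adj[src].append(dst)
    return pid, adj

edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u4", "u1"), ("u1", "u3")]

# Shard the edge list, then load all shards in parallel
shards = defaultdict(list)
for src, dst in edges:
    shards[partition_of(src)].append((src, dst))

with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partitions = dict(pool.map(lambda item: load_partition(*item), shards.items()))

loaded = sum(len(dsts) for adj in partitions.values() for dsts in adj.values())
print(loaded)  # 5 -- every edge loaded exactly once, across all partitions
```

Because no two workers ever touch the same shard, the loaders need no coordination — which is why this style of partitioned ingest parallelizes so well.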
Starting point is 00:34:47 Like we did with the Neptune database, we are leveraging some unique innovations that are within AWS as well. The other thing that I saw mentioned at some point was that the goal is to have one single API to access all of these features, and openCypher was specifically mentioned there. I'm wondering if using openCypher is actually the only way to use these new analytics features, or whether it's also available using Gremlin and SPARQL. We will be supporting Gremlin and SPARQL after we become generally available. The notion about using a single API is more about being able to manage your end-to-end workflow of doing graph analytics from one endpoint.
Starting point is 00:35:28 Being able to do that from one endpoint is a massive simplification from what our customers have been telling us. I've been working within the Gremlin community for about a decade, so I'm very much looking forward to bringing Gremlin to Neptune Analytics very shortly. One other comment on that, George: you know, I think you had asked a question earlier about kind of what the data vision for graph customers is, or something along those lines. And I think one of the things that I've learned is that customers, I would say,
Starting point is 00:36:03 and this sounds a little heretical, but I think that most customers don't actually want a graph database. And what I mean here is that they want graphs and they want to store and query their graphs, but they don't want to create instances and clusters and have another database management system in their IT infrastructure. And so part of what we're really excited about with Neptune Analytics is this idea that, you know, the fundamental unit of using this new engine, this new service is a graph.
Starting point is 00:36:40 Like that's what you operate on. You create graphs, you query graphs, you store graphs, you select the multi-AZ availability constraints and policies that you want to associate with them. And you don't have to manage clusters. You don't have to set those things up. And so, you know, we think that that is going to enable people to use graphs for more problems, because it just reduces the kinds of overhead, you know, that they have to deal with before they can get started with it. So I think that, you know, on one hand, the analytics and the performance and the algorithms and the vectors are super cool and really exciting.
Starting point is 00:37:16 But when I think about really impacting how customers are going to think about using graphs, and think about using them in many different places, we may look back and find that this graph API abstraction here was really probably the most impactful thing. We'll see. Time will tell. Okay. Well, yeah, the way you talk about it actually does sound important. And in fact, I think I may have overlooked it initially. When you said that you want to give people this simplicity so that they can work with graphs, I didn't realize that this is exactly what you meant.
Starting point is 00:37:56 I thought it meant that there would be a common API entry point, which makes total sense. But what you're saying is slightly different. It sounds like what you're saying is, well, basically we're doing away with the need to spin up nodes and provision instances and everything. So here's an API that you can use to create and manipulate your graph, and that's all you need to do. Yeah, so I mean, with Neptune Analytics, you create a graph, you specify some characteristics of it in terms of capacity, the minimum and maximums that you want to consume. You specify characteristics related to policies for access control and those kinds of things.
Starting point is 00:38:41 And also characteristics in terms of availability, particularly whether you want to have single availability zone or multi availability zone. And you don't have to do other things. You don't have to select instances and build other clusters. And so I think we're going to learn a lot. You know, we may not have it exactly right, but we really think that moving away from the database management system abstraction
Starting point is 00:39:06 of consuming graphs, and moving more towards a simpler API that gets customers storing and querying their graph data faster, it feels like the right thing based on what we've learned from customers. Absolutely. Absolutely. Because at the end of the day, it's really about helping our customers deliver end value as fast as possible, and making sure that we've abstracted that in the right way to help them discover data insights, or be able to build up workflows that are going to help their applications, and to do that work as fast as they can with as few choices as possible along the way. Absolutely, Brad. Okay, so the way you're describing it, it sounds like something like the equivalent of Lambda functions for graphs. You don't have to specify much. You don't have to spin up an instance, you don't have to define your API, let's say, speaking about the equivalent in terms of programming, in the same way that you don't have to provision nodes or anything. You can just spin up a Lambda function and not care about anything else. Here, you can just spin up your graph and you're done, pretty much.
Starting point is 00:40:17 That's definitely the vision. And I'll be honest with you, I don't think we're 100% there at launch, but we're far closer to being there than we are with the Neptune databases. And so, like I said, we're really excited about it. And maybe we should mark that George called it. We'll have something as popular as Lambda functions, but for doing graph workloads. So thank you for that, George. Well, you know, actually, like I said, initially, I sort of missed that point. And so you may as well want to highlight it a bit more.
Starting point is 00:40:51 Yeah, we'll think about that. It's a good call out. I think it's something that we'll go back and sort of think about. It's also it's kind of it's a nice theoretical lead in to say, like from a database vendor, that they don't think graph databases are the right thing. So we'll see. That's probably pretty memorable. It is. Indeed it is. So going back to your new features, as I said, I wasn't too surprised to see that you now support embeddings and vectors and all that, because it's a theme these days. και όλα αυτά, γιατί είναι ένα θέμα αυτές τις ημέρες. Πιστεύω πως πιο πολύ κάθε βιβλιοθέτης, όχι μόνο για βιβλιοθέτης γραφικών, αλλά κάθε βιβλιοθέτης βιβλιοθέτης κάνει αυτό ή είναι στο προσπάθειά του να κάνει αυτό, γιατί, φυσικά, υπάρχει πολύ προσπάθεια για αυτό. Πιστεύω ότι αυτό είναι πιθανότατα αυτό που δείτε επίσης στους περίπτωσές σας και
Starting point is 00:41:40 τι σας έκανε να εμπλεκτικάτε αυτή την εφαρμογή. Αυτή η στρατηγική είναι σε ένα τυπικό συναντήμα. Αντιμετωπίζουμε ότι υπάρχουν πολλά βεκτορικές διεθνείς και έχετε τα φασικά τύπη ερωτήματα. Ποιος είναι ο καλύτερος? Είστε να επιλέξετε έναν ειδικό βεκτορικό διεθνείς, που προσπαθεί να είναι πιο γρήγορο και θα σας δώσει περισσότερο επιλογή σε θέματα αλγόρθων και εμπνευσμένης υποστηρίας ή μπορείτε να πάτε με την εξωτερική σας δίσκο, είτε γραφικό είτε άλλο, η οποία θα προσθέσει μετά κάποιες δυνατότητες και θα είναι περίπου καλό αρκετό για αυτό που θα κάνετε, για αυτό που πρέπει να κάνετε, αλλά δεν θα σας δώσει, for what you're going to do, for what you need to do. But it's not going to give you, it's not going to be like best of breed, let's say.
Starting point is 00:42:27 Did you also get that type of question from your customers? Yeah, I mean, we absolutely do get the question about purpose-built vector database versus vector capabilities in other databases. And I think, you know, the way that we're thinking about it is sort of a yes and yes kind of answer, which is that absolutely customers will need purpose-built vector databases for certain things. But, you know, one of the benefits of putting vectors into your existing databases is that it makes it a lot easier and faster for customers to use and they don't have to move data around. So I think that what you'll see from our database offerings
Starting point is 00:43:10 is kind of both of those thoughts, where we want to meet customers where they are by giving them vector search within the databases that they're using. And there are some other use cases where purpose-built specialized vector performance makes a lot of sense. For Neptune Analytics, I think that the thing that we're really most excited about is how you combine vectors with graph searches and graph algorithms. And we haven't quite figured out the right way to talk about it
Starting point is 00:43:46 or the moniker for it, but internally we sort of talk about kind of vector-guided navigation. And what this means is that you're using the explicit relationships and properties in your graph, and you're combining them at certain points with the statistical capabilities that you get from vector similarity search. And by doing both of those things together, you're able to get a better outcome because you can both leverage the power of the statistical techniques and the explainability of the explicit side. That's not to mention that the other use case for vectors in the graph is vectors that are not necessarily coming from LLMs,
Starting point is 00:44:39 but vectors that are coming out of GNNs. So the other thing that you can do, and I think this is a more advanced use case, but for those customers who need it, it's really important, is you can store the embeddings that are coming out of your GNNs back into Neptune Analytics. And then you can do cosine similarity type searches, if you want to do link prediction on your graph, which is super important for many of these marketing and kind of recommendation, targeted content types of use cases, and fraud. So that's another capability that's also enabled by having vectors in the graph. And so, you know, we're really thinking about the vector capabilities of Neptune Analytics as this vector-guided navigation,
Starting point is 00:45:25 graphs and vectors being better together, versus, you know, trying to be like a Pinecone or a Milvus, if you will, on the vector side. Yeah, I think that makes sense, and I was also going to ask you about that, because I recall from the last time we spoke that back then you had just released Neptune ML. So graph neural networks, basically. And it seems like a natural fit, since you're now also adding vector capabilities, to somehow intermingle those. Yeah, and you may or may not be aware, but the Deep Graph Library that was part of Neptune ML has expanded into something called GraphStorm, also an open source project. And GraphStorm is really more about the APIs around building and deploying GNNs. And so I think that if you look forward into the future, you'll see things from both us and the AWS ML teams that make it a lot easier to store and load embeddings and compute embeddings over graphs between Neptune Analytics and GraphStorm.
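To make the two vector use cases above concrete — vector-guided navigation and embedding-based link prediction — here is a toy, self-contained sketch in plain Python. Nothing here reflects Neptune's actual APIs or data: the graph, the embeddings, and the function names are all invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Invented toy data: explicit edges plus one embedding per node
edges = {"paper1": ["paper2", "paper3"], "paper2": ["paper4"], "paper3": []}
embeddings = {
    "paper1": [1.0, 0.0],
    "paper2": [0.9, 0.1],
    "paper3": [0.0, 1.0],
    "paper4": [0.8, 0.2],
}

def vector_guided_neighbors(node, query_vec, k=2):
    """'Vector-guided navigation': follow the graph's explicit edges to get
    candidates, then rank those candidates by similarity to a query vector."""
    candidates = edges.get(node, [])
    return sorted(candidates,
                  key=lambda n: cosine(embeddings[n], query_vec),
                  reverse=True)[:k]

def predict_links(threshold=0.9):
    """Link prediction from embeddings: propose edges between unconnected
    nodes whose embeddings are highly similar (cf. recommendations, fraud)."""
    nodes = sorted(embeddings)
    existing = {(u, v) for u, vs in edges.items() for v in vs}
    proposals = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if (u, v) in existing or (v, u) in existing:
                continue
            if cosine(embeddings[u], embeddings[v]) >= threshold:
                proposals.append((u, v))
    return proposals

print(vector_guided_neighbors("paper1", [1.0, 0.0]))  # ['paper2', 'paper3']
print(predict_links())  # [('paper1', 'paper4')]
```

The point made in the conversation is that the statistical half (similarity) becomes more useful when anchored to the explicit, explainable half (the graph's actual edges) — the traversal constrains the candidates, and the vectors rank them.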
Starting point is 00:46:41 No, I wasn't aware of that, so thanks for the pointer. Check it out. For sure. So you already sort of outlined one of the things that you will be working on in the future. What else do you have on your roadmap? I mean, I think that, you know, you talked about the query language pieces. You know, I think that one of the things that we want to do is make sure that we can provide customers who want to do TinkerPop and SPARQL queries with their analytics and vectors that capability as well. And then I think, as you'll recall, one of our visions about graphs is that the distinctions between the property graph and the RDF model are, you know, really more distracting to customers than they are helpful. And so I think that, you know, one of the things that you'll see in the future is some of our one graph vision start to be realized within the Neptune Analytics platform.
Starting point is 00:47:45 So, for example, you know, if you can imagine leveraging the relatively large amount of publicly available RDF data with, you know, graph algorithms written over property graphs, those kinds of use cases I think that, you know, we're really interested in learning from customers how they want to use them. And that's part of the reason that we feel like now is the right time for us to release Neptune Analytics. We feel like there's a good core capability of use cases that people can do.
Starting point is 00:48:18 But there's also pieces where we're really looking for customer feedback and to learn how customers apply these use cases. And just to also echo absolutely to what Brad just mentioned about increasing the number of query languages and starting to deliver on our one graph vision. Brad also mentioned a few times so far about our vision for really simplifying the experience for working with graphs. And just to echo it, when we ship, we've got one experience, but we're really looking forward to working backwards from
Starting point is 00:48:52 customer requests right afterwards. I think one of the main themes that you're also going to see from us is really making it as easy as possible to work with graphs end-to-end in a workflow. When we've been talking with our customers, particularly those who are building out new Gen AI applications, they've made it very clear to us that being able to easily integrate these new features when they build Gen AI applications is one of their most important criteria. So we're really looking forward to continuing to build out on our vision for abstracting how you use a graph and build a graph workflow in your application, because making that as easy as possible is clearly one of the biggest priorities from our customers.
Starting point is 00:49:34 Thanks for sticking around. For more stories like this, check the link in bio and follow linked data orchestration.
