Screaming in the Cloud - Everything Is a Graph (Even Your Dad Jokes) with Roi Lipman

Starting point is 00:00:00 I wouldn't want to program in any other language but C. I just like the simplicity of the language. I don't like the fancy interfaces and I don't like, you know, things being hidden from me. It's a pirate's favorite language. Some people think it might be R, but a pirate's first love is the sea. The dad jokes do not get better from here. Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Roi Lipman, who is the CTO at FalkorDB. This episode is sponsored in part by my day job, Duckbill. Do you have a horrifying AWS bill?

Starting point is 00:00:44 That can mean a lot of things: predicting what it's going to be, determining what it should be, negotiating your next long-term contract with AWS, or just figuring out why it increasingly resembles a phone number, but nobody seems to quite know why that is. To learn more, visit duckbillhq.com. Remember, you can't duck the Duckbill bill, which my CEO reliably informs me is absolutely not our slogan. Roi, thank you for joining me. Thank you, Corey. Happy to be here. So let's start at the very beginning. It seems like you can't swing a dead cat these days without hitting a different kind of database in all shapes, sizes, levels of management,

Starting point is 00:01:30 etc. What type of database is FalkorDB? Where does it start? Where does it stop? I think over the years, there was an explosion of different databases coming to the market.

Starting point is 00:01:44 FalkorDB is a player in the graph database field. One of many. Usually, there are the relational databases, which everybody probably knows, and then maybe a decade or two ago we started to see NoSQL databases, such as Redis, a key-value store, and others of that type. Within this NoSQL category you can find document databases such as MongoDB, you see time series databases, today you would also find vector databases, and there's this niche, which is the graph database field,

Starting point is 00:02:39 which we've been in for the past 10 years now. I'm reminded of Forrest Brazeal's comic years ago, before he joined Google, sort of disambiguating all of the different AWS managed database services, because it really seemed for a little while there like the DBA job of the future was going to be figuring out which of their 40 databases was the right option for any given workload. And for the Neptune option that they offered, the flow chart decision was, do you need a graph database? And the only

Starting point is 00:03:11 option was, no, you don't. And it led right back to the previous question, which I thought was pretty amusing. It seems these days that graph databases have found a niche, but it is a niche. I've not built anything on top of one yet. And talking to other folks, I don't think I'm alone in that. Am I? Am I the freak? No, I don't think so. I think that it's very natural for people to think in tables. It comes very naturally

Starting point is 00:03:41 to say, oh, my data fits into a table, so I'm going to go with a relational database. Those who are aware of this entity called graph, these are either people who

Starting point is 00:03:59 have encountered this, probably during their time at university. Or working at Facebook. Or, yeah. I mean, if they think about how the data in Facebook, and in any other social network, is modeled, you would see this shape of a network, or more generally what we like to call a graph, form. So, as it turns out, there is this very famous saying: everything is a graph.

Starting point is 00:04:36 So whether you're aware of it or not, it's very likely that a lot of domains are, behind the scenes, graphs, or can be described relatively easily as a graph. Just to throw out a few examples: a social network, obviously, is a graph. Roads and intersections, maps, these are graphs. Any type of network where data flows is a graph. So graphs do pop up in a lot of use cases.

Starting point is 00:05:21 It's just the way in which we view the problem and realize that, oh, I'm dealing with a graph here. So maybe instead of trying to force a graph structure into a different data model, such as a document store or a relational database with tables, maybe I should use the right data model for my domain or for my problem. And just to illustrate this, think about a navigation system, the one that you might have in your car. Whenever somebody changes the map, let's say a new road was added, or some road was removed because there's some construction and the road is closed, every update to that map

Starting point is 00:06:22 needs to go through a validation that says, for example, that in a given state or in a given country, you should be able to drive from every location to any other location on that map, or within that state. That is the definition of how roads should work.
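The map-update check described here amounts to asking whether the road graph is strongly connected. A minimal Python sketch, assuming intersections are nodes and each one-way road segment is a directed edge (a two-way road is two edges); the function names and sample data are hypothetical, and this is not FalkorDB's API:

```python
from collections import defaultdict, deque


def reachable(adj, start):
    """Return every node reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def strongly_connected(intersections, roads):
    """True if you can drive from every intersection to every other one.

    roads is a list of directed (from, to) segments. A directed graph is
    strongly connected iff one node reaches all nodes and all nodes reach
    it, i.e. it also reaches all nodes when every edge is reversed.
    """
    nodes = set(intersections)
    if not nodes:
        return True
    forward, backward = defaultdict(list), defaultdict(list)
    for a, b in roads:
        forward[a].append(b)
        backward[b].append(a)
    start = next(iter(nodes))
    return nodes <= reachable(forward, start) and nodes <= reachable(backward, start)


intersections = ["A", "B", "C", "D"]
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # a one-way loop
print(strongly_connected(intersections, roads))  # True

# Close C -> D for construction: nothing leads into D any more.
closed = [r for r in roads if r != ("C", "D")]
print(strongly_connected(intersections, closed))  # False
```

Two breadth-first searches are enough: if one intersection reaches every other and every other reaches it, any two intersections are connected through it, so each map edit can be validated in time proportional to the number of roads.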

Starting point is 00:06:44 Yeah, it's pretty obvious. I mean, you should get from point A to point B. Try to solve this problem, or make this verification, when modeling this data as a table, or as a set of tables, or a set of documents. It's practically impossible. You need a graph. It's a graph problem to begin with. And so, yes, I think that graphs are still a niche, but with the explosion of AI, we do see them coming back really strong with this idea of knowledge graphs. For a while, my default go-to was DynamoDB. And that was great for a variety of reasons for small throwaway projects, but something you learn pretty quickly when you start reaching for

Starting point is 00:07:36 the... Well, when every tool you have is a hammer, every problem looks like your thumb. When you're using Dynamo, it works really well if and only if you know exactly what your access patterns are going to look like in advance. As soon as you start changing that due to evolving requirements, suddenly you're in a world of pain, because you get to reimagine your entire table model, redo a bunch of stuff, possibly do a migration. So I have found that reaching by default for something like PostgreSQL tends to be the right answer for a lot of these things. But also, increasingly, I find that I don't care what database it uses, because I'm not the one

Starting point is 00:08:12 building the code. I am hurling it over the wall for an AI tool to build out for me. How are you seeing the rise of graph in that universe? In the universe of people building applications with AI? Throwing it over the wall to AI so that it just decides whatever it's going to build.

Starting point is 00:08:31 It has preferences. And in some cases, I've switched providers just because I got tired of fighting against it. Like, no, use Amplify. It wants to use Vercel? Fine, use Vercel. Just stop bothering me with it. I personally try to avoid handing off my work to AI, so I use it mostly for doing code review. I don't have enough experience to answer that question. I'm not in a position where I'm constantly handing off work to an AI. Right. In fairness, that is where most developers are. I have the advantage of being a different

Starting point is 00:09:15 kind of developer, where the only two languages I know are brute force and enthusiasm. When I want a line-of-business app to glue two things together, well, that would have been a multi-day undertaking previously. And now it's, eh, just build me a clicky thing so I can stop copying and pasting between two things like I have for four years. It's basically the glue applications between things. And historically, I found that I cared a lot more about things like data structures, the underlying code, how this stuff all ties together. Increasingly, I find that as long as I have a solid test harness, I don't have to care as much. Now, will this have problems if I start scaling any of these things?

Starting point is 00:09:53 Oh, absolutely. But that's not a risk for the way that I am building a lot of these things to begin with. It's more or less building everything with the idea from the outset that it's probably not going to work and I'll throw it away and replace it in two months with something else, which is, frankly, a luxury. But I do find that it gets me experimenting with different technologies that I otherwise would not have had the ramp-up time or focus to really dive into.

Starting point is 00:10:20 Have you seen folks coming to FalkorDB because, well, AI suggested it, so now they're curious? Or is that not really the path that leads folks to a graph database? For our prospects and customers, there are usually two main paths to ending up using FalkorDB. Either they realize that they need a graph, and they do a quick search, and they come up with a number of contenders, and hopefully they pick FalkorDB. Sometimes they are coming fresh to this type of database. They did not have experience before, but they understand that the problem they're trying to solve is best suited for a graph database.

Starting point is 00:11:09 That's one path. The other one is prospects who had some experience with a graph database, maybe for a short while or even for a few years. There might be multiple reasons that they're not happy with their current provider, and so they want to try something else. Usually, the people who are turning to FalkorDB are doing so because they have concerns with regard to either scaling or performance. So this is where FalkorDB shines when you compare it to the other vendors. We have a unique approach to how we go about representing graphs and traversing them, but

Starting point is 00:12:07 we also make it extremely easy for you to scale out in case you have multiple tenants or multiple graphs that you want to host on your FalkorDB database. Something that I noticed about FalkorDB ties into, I guess, a pre-existing bias I've had, because for a number of years, it seems that folks have been pushing really hard to make vector search its own database type. But it's become pretty clear across the board, across the database types that I spend more time with, that vector search is a feature. It is not an engine itself. And it appears that FalkorDB has taken this in a similar direction, where

Starting point is 00:12:49 you obviously have done a fair bit of enrichment with your own GraphRAG stuff, but vector search is already a feature that FalkorDB has. Where do you land on that? Vector search, or semantic search, is a really important type of search that you can perform. At the very beginning, or at least this is how I remember things a few years back, with the ability to compute embeddings and then search given some data point,

Starting point is 00:13:28 which you also turn into a vector, and then search through the vector space, the first products that came out were actually vector databases: companies such as Pinecone, Weaviate, I think Qdrant, and there are plenty. And eventually the other players, relational databases, graph databases, any type of database, looked at it and said, I want this feature. I want the ability to perform semantic search over my own data. I don't need a new database to add to the stack and manage.

Starting point is 00:14:15 I look at it as a search capability, and we already have indices doing full text, doing exact match. Why not also geo? Why not extend the index capability with semantic search? And this is what the entire industry has done. You would see these days, throughout the spectrum, that almost any database now supports a vector capability. Now, I'm not saying that there's no room for vector databases out there. I mean, if you're hosting hundreds of millions of vectors or even more, and that's the only type of information that you're storing for your

Starting point is 00:15:12 application, then yeah, a vector database might make sense. But if your numbers are relatively reasonable, and you need those vectors just to do semantic search over a broader data set that you have, for example, let's say you have a database containing users and every user has a profile image, and then you're getting a new image and you want to see which are the top five users who look like the given query image, then, okay, use a vector index capability. So that's what we've done. This is what the entire industry has done. That, I think, puts a question mark on whether or not we really need vector databases.
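The "top five users who look like the query image" lookup can be sketched as a brute-force cosine-similarity ranking. This assumes profile images have already been turned into embedding vectors by some model; the toy three-dimensional vectors and the function names are made up for illustration, and a real vector index (in FalkorDB or elsewhere) often uses an approximate nearest-neighbor structure rather than scanning every vector:

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, embeddings, k=5):
    """Return up to k (user, score) pairs most similar to query, best first.

    Exhaustive scan: the query is compared against every stored vector.
    """
    scored = ((user, cosine_similarity(query, vec)) for user, vec in embeddings.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical profile-image embeddings (real ones typically have hundreds of dimensions).
profiles = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.9, 0.0],
    "carol": [0.8, 0.2, 0.1],
    "dave": [0.0, 0.1, 0.9],
}
query_image = [1.0, 0.0, 0.0]
print([user for user, _ in top_k_similar(query_image, profiles, k=2)])  # ['alice', 'carol']
```

The exhaustive scan is exact and fine for small collections; an index trades a little recall for much faster lookups once the vector count grows, which is the scale argument Roi makes for when a dedicated vector database might still make sense.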

Starting point is 00:16:12 I think that for the extreme cases, yes. But for the majority, you would see people using Postgres and doing vector search within Postgres. Others are doing it within FalkorDB because they have a graph and they want the semantic capability. Which tracks. And I'm not here to crap on other people's use cases. It's one of those areas where people get very sensitive about it.

Starting point is 00:16:36 It's like their children. It's like, oh, so which is your favorite child? I can't stand any of them, if I'm being direct. Now, it becomes almost a religious debate at some point. Where did FalkorDB come from? What is the origin story of it? Because frankly, one day someone decided to sit down and say, I'm going to write a graph database.

Starting point is 00:16:57 Sounds like a supervillain origin story more than anything else. Well, that's what happened 10 years ago with me. I took a course at university that dealt with graphs, and I just fell in love with this idea of graphs. And back then, I thought that it would be interesting to extend Redis's capabilities with a graph data type. Back then, Redis had just come out with its ability to extend the database with modules, Redis modules.

Starting point is 00:17:39 And I thought it would be a nice idea to have a graph data type and some graph capabilities within Redis. So I worked on that project for a while. I think it was about six months. And then I met with the Redis folks and I showed them what I'd been up to, and they said, oh, that's really interesting. You should come and join Redis. So I did. And within Redis, I built RedisGraph. So we turned those six months' worth of work at home into an actual product. I did that at Redis for about six to seven years, building and developing RedisGraph, until eventually Redis decided to abandon the project and shut it down.

Starting point is 00:18:35 Two colleagues of mine and I thought that it would be a shame to see this project go down the drain. And so we decided to spin out of Redis and turn RedisGraph into what is now known as FalkorDB. So we've basically just continued from where we left off at Redis. And we've been doing that for over two years now. So it's around 10 years in total that I've been involved in building and, you know, getting a graph database out there that is competitive. So, yeah, that's the background story behind FalkorDB.

Starting point is 00:19:22 This episode is sponsored by my own company, Duckbill. Having trouble with your AWS bill? Perhaps it's time to renegotiate a contract with them. Maybe you're just wondering how to predict what's going on in the wide world of AWS. Well, that's where Duckbill comes in to help. Remember, you can't duck the Duckbill bill, which I am reliably informed by my business partner is absolutely not our motto. To learn more, visit duckbillhq.com. How do you find it has been to run what is effectively a... I have to be careful using terms

Starting point is 00:20:00 like open source; people get salty about this. But you are source available. Do you find that it is tricky to run a viable business where there's also the story of, or you can just use the code and all the stuff that we build for free, which we make available to everyone? I understand that it's somewhat of a leading question, and a naive one, but I am curious to get your take on it. Yeah, I mean, it's tricky. It's a question that pops up from time to time. I think that the benefits of having your product as open source are greater than having it as closed source. What happens is that your adoption rate is much greater when you have your product available for free. On the other hand, yes, it's a bit frustrating to see all of those people who are using the product

Starting point is 00:20:51 and are not chipping in. But, I mean, at the end of the day, for me, it's just a pleasure to see the work that I've done being used all over the place. And it comes as a surprise; you know, we're not aware of all of the use cases and all of the people who are using the product. And when somebody reaches out and we have a conversation with them, and you see the product being used for a variety of workloads and use cases, you know, you feel great about that. In addition, because it is source available, people can submit issues. People can have a conversation about it. There is benefit to it.

Starting point is 00:21:45 So if you look at our GitHub page, you'll see plenty of issues. And I think it's great to have GitHub issues, because it shows that people are using the product and they care enough to go and open an issue. And so we learn from that. I mean, it's kind of like free QA for the product. We might be able to generate more revenue if we were closed source. But I think that the benefits of having something open and making some contribution to the open source community do pay off in the long run.

Starting point is 00:22:29 Apparently, back in 2024, you folks made an announcement about rewriting FalkorDB in Rust. My snide comment has always been that the primary IDE of a Rust developer is PowerPoint, because it's impossible to do anything in Rust without doing a whole dog and pony show around it. Has that been completed? How did it go? No, it's still a work in progress. My other co-founder, Avi,

Starting point is 00:22:57 is in charge of doing the Rust rewrite. At first, we tried to do it in a gradual form. So we have this relatively large C code base. I'm a big C fan. I wouldn't want to program in any other language

Starting point is 00:23:18 but C. That'll get you hate mail just for that statement alone. Maybe, I mean, I'm 42 years old. So maybe that's, you know, the age. Yeah, we're gentlemen of a certain age. And yeah, at some point learning the latest new trendy thing

Starting point is 00:23:36 just feels like it's more trouble than it's worth, if I'm being direct. I just like the simplicity of the language. I don't like the fancy interfaces and I don't like, you know, things being hidden from me. For the type of programming that is needed to develop a high-performance graph database,

Starting point is 00:23:55 you need to be aware of and in control of practically everything that happens in your system. With any language that hides stuff from you, a language that would not give you absolute control over the instructions and the memory layout, you simply would not be able to fully utilize the hardware. Just circling back to the Rust question: at first we tried to do it gradually, taking apart components that are written in C, rewriting them in Rust, and then integrating them back. That didn't work too well for us.

Starting point is 00:24:36 And so after a year of trying this out, we decided that this approach would not work, and we're going to rewrite everything in one go, the entire thing from scratch. So that's going to take a while, in my mind. They have made good progress, but it's not there just yet. I'm more focused on the C. So, a pirate's favorite language. Some people think it might be R, but a pirate's first love is the sea.

Starting point is 00:25:13 Dad jokes do not get better from here. I store them in a database. There's a separate... There's a whole spiel we can go into, and we'll avoid it for brevity's sake. What I guess is the challenge I've always had, and I suspect I'm not alone in this, is I keep seeing graph databases,

Starting point is 00:25:30 and I see things that are interesting, but I'm missing a toy problem that feels like a good fit for one. Take a relational database: great, I'm going to go and build something dumb like a to-do app, which is still overkill for some of that. Or I'm going to build something that will track books in a library, and you can do multi-table stories with authors and books and the rest. What is, I guess, the minimum viable toy problem that isn't insane for a graph database? So I have maybe two examples.
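Both toy problems Roi describes in his answer reduce to path searches, which he frames as optimizing some function along a path. A minimal Python sketch under stated assumptions: the farm-to-table route is treated as a shortest-path problem (Dijkstra's algorithm) with a hypothetical non-negative cost per leg standing in for lost freshness or transport cost, and the access check is plain reachability (breadth-first search) from a user through group memberships to a resource. All node names are hypothetical, deny rules are ignored, and this is not the UDF code from the webinar or FalkorDB's API:

```python
import heapq
import math
from collections import deque


def cheapest_route(edges, source, target):
    """Dijkstra's algorithm over non-negative edge costs.

    edges maps node -> list of (neighbor, cost). Returns (total_cost, path),
    or (math.inf, []) when target cannot be reached from source.
    """
    best = {source: 0}
    previous = {}
    frontier = [(0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == target:
            path = [node]
            while node in previous:  # walk back to the source
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, math.inf):
            continue  # stale entry: a cheaper route to node was already found
        for nxt, step in edges.get(node, ()):
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    return math.inf, []


def has_path(graph, start, goal):
    """Breadth-first search: is goal reachable from start at all?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Farm to table: hypothetical hours in transit per leg.
routes = {
    "farm": [("hub_north", 3), ("hub_south", 1)],
    "hub_south": [("hub_north", 1), ("store", 6)],
    "hub_north": [("store", 2)],
}
print(cheapest_route(routes, "farm", "store"))
# (4, ['farm', 'hub_south', 'hub_north', 'store'])

# Access management: user -> group -> parent group -> resource edges.
acl = {
    "user:dana": ["group:sre"],
    "group:sre": ["group:engineering", "server:db-01"],
    "group:engineering": ["folder:design-docs"],
    "user:eli": ["group:marketing"],
    "group:marketing": ["folder:brand-assets"],
}
print(has_path(acl, "user:dana", "folder:design-docs"))  # True
print(has_path(acl, "user:eli", "server:db-01"))  # False
```

Dijkstra's algorithm requires non-negative weights, which fits costs like hours in transit; a scoring function that could improve along a leg would need a different algorithm. The access check only needs to know whether any chain of memberships exists, so an unweighted search is enough.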

Starting point is 00:26:03 We recently did a webinar about our UDF capabilities, user-defined functions within the database. And the project that showcased this was farm to table. The idea was that in this graph you have nodes, entities, representing farms. Those are connected to delivery hubs, which might connect to other delivery hubs, but eventually they reach a store like Trader Joe's. And basically what you want to do is find a path that gets the produce, fruits and vegetables, to the store while keeping them as fresh as possible. So in

Starting point is 00:26:49 other words, it's an implementation of the traveling salesman problem. Yes, I mean, you can think about it that way. It's a path-finding problem where you want to optimize some function. In our case it was the freshness of the fruit, but you can also look at it and say, okay, I want to find a path which has the minimum cost of transportation. So that's maybe one example where graph databases are a great fit. Another one, which people can relate to, is access management. So think about a big organization with lots of users, and those users are grouped into groups, and there might be a group hierarchy, and then there are

Starting point is 00:27:38 resources such as servers and files and folders, and every time a user tries to access one of those resources, you need to make sure that they have the right access. And so this once again becomes a path-finding problem, if you think about it. You need to see if there is a path connecting that user through a set of different groups to the end resource. So we actually have a number of customers who are using FalkorDB exactly for this. My last question before we wind up calling this an episode

Starting point is 00:28:14 is about open source, source available, etc. Great. Awesome. Your code is on GitHub and accepts contributions from the larger community. I have heard from a lot of open source maintainers. I have some minor stuff up, but, you know, we're talking about actual people who are serious about this stuff. They are inundated with a barrage of AI-generated pull requests, the quality of which, I will be polite and say, varies widely.

Starting point is 00:28:42 What has your experience been with that? FalkorDB is not an easy product to contribute code to. So if we're getting some contributions, it's very rare, and we're very happy to receive them. That said, a few years back, before all of those new AI tools, whenever a PR showed up, it was obvious that there was a person behind that PR who took the time and wrote the piece of code that he wanted. Right. Someone cared. One of my early exposures to the open source world in that sense was with SaltStack. Tom Hatch was a phenomenal founder of that project and company, because he would accept my shitty pull request, and then he would thank

Starting point is 00:29:34 me for it, and then immediately rewrite it so it was good and submit that as a second pull request within minutes, which does not scale for crap. But it was so welcoming, and so "you belong here," that it really led to an amazing community for a while. That doesn't scale, and I'm not suggesting anyone should necessarily follow that pattern. But man, was I fortunate to encounter that. I mean, that was a great gesture on his behalf. I feel like, you know, if I think that a PR addresses a real issue and there's a good justification to add it,

Starting point is 00:30:18 it's very likely that I would do some rewriting of it or ask the... You can also... This was also a different era. You could assume good faith back in 2012 when I was doing these things. Now, for all you know, someone just threw something into a slop cannon and basically said, I need to have another one for my resume. Go submit a PR. And they don't know what the project is. They don't care about it.

Starting point is 00:30:41 They don't care about it. At least historically, it used to be people going through and fixing the comma placement in your docks. Fine, whatever. Now it's a lot harder to tell. We actually open, if you, we have another. of issues that we're opening to get help from the public and we're willing to pay for those PRs. Oh, no. I'm sure this will not cause an in-influx of AI-generated PRs. Oh, no. Yeah, that's exactly what is happening these days. People try to, you know, to address those PRs and collect that money by simply

Starting point is 00:31:16 throwing an AI at it, and then we're, you know, bombarded. We've seen crap like this. I get it in my security email inbox all the time for Last Week in AWS, which is, again, a podcast and an email newsletter. It is not a SaaS app or anything like it. And they're always finding the same thing: you're not enforcing hard fail for SPF on your emails, which, if you do that, will in fact cause some problems with email forwarding in some newsletter edge cases. So there, you should fix that. Pay me, please. Like, what the hell is this? It's the lowest effort. It's the beg bounty. It's awful. Awful. We're getting those as well. I mean, at least twice a week I'm getting an email of that sort.

Starting point is 00:32:00 I don't know. It's a new era, I think. It is. And I would not want to be starting my career in this era. I don't have a better path for this. The roads you and I walked have long since closed. The world's a different place. I'm always scared to give boomer advice of, well, back in my day, you print out your

Starting point is 00:32:17 resume on nice paper, and you walk into the office, and you demand to speak to the owner, and you have a firm handshake, and hit the bricks; you'll have a job by sundown. Yeah, sure, that'll work. Now they'll call the police. I see people, colleagues of mine or, you know, family who are working in the tech industry, and the picture that they paint is pretty awful. I mean, from my perspective, people who, you know, call themselves programmers are not typing anything on the keyboard except for handing their

Starting point is 00:32:52 work over to AI and then just merging stuff in without even looking at it. And then you ask them, you know, okay, how does it work? Or maybe you can walk me through this. And the answer is, I have no idea. AI generated it. And so I think that's a very dangerous route to take. I mean, there is the counterpoint, too. I look at this thing, like, how does this work?

Starting point is 00:33:17 I don't know. I wrote it six months ago. That was always my answer, too. It's like, honestly, I don't know what the hell the me of six months ago was doing, but he was terrible. I can relate to that as well. How many times in your career have you been like, what idiot wrote this, and you fire up git blame?

Starting point is 00:33:34 And it turns out it was you, so you quietly fix it. Don't say a word. And then just go ahead and patch it. Yeah. Yeah. It's fun to look back and see, you know, how you as an individual developed over the years, especially if you're involved... For certain values of fun, yes.

Starting point is 00:33:50 I mean, if you're involved in the project for a few years, then hopefully you do get to evolve. And it's nice, you know, to have this retrospective and see what the code you wrote five years ago looked like and where you're at right now. Hopefully, hopefully you have improved. But these days, you know, people are using AI to the point where they don't even get the chance to practice their craft. Yeah, I think that's problematic. People are saying, you know, in my area, that new developers, juniors, are having a really hard time getting work. And so I wonder where we would find the

Starting point is 00:34:50 seniors, the seniors 15 years ahead of us. Where are they going to come from? That's the thing that I've been pointing out to people. When I first found that AI could write some code and showed it to some friends, their response was, well, this is great to replace junior engineers, but it won't replace senior engineers. Where do you think senior engineers come from, buddy? They don't simply spring from the forehead of some god somewhere. Exactly. And then maybe some would argue that in the near future, seniors can also be replaced. I don't know.

Starting point is 00:35:24 I think that at the moment, LLMs have difficulty working with large code bases. I do see them as a great utility to start out something new. Whenever there is a clean slate and you ask it, OK, build something for me from scratch, it might work well. It might work well for applications that are not at the system level, maybe front-end type of work,

Starting point is 00:35:57 maybe some back-end type of work. But to build, I don't know, 20,000 lines' worth of some high-performance, high-quality application, a database... Yeah, my version of that is around ergonomics of command line tools. I am not a front-end person. This has been an unlock for a lot of different use cases there. And it's powerful.

Starting point is 00:36:22 But I'm also not sitting here thinking I can build a reasonable front end for anything of scale. The counterpoint is, I think that's a dangerous point; well, that's not going to scale in any meaningful way. It's a back-of-house application for one email newsletter to get from one API to another. Maybe it doesn't have to scale. Maybe in the event that there's an outage in something, we can send the newsletter a half hour later.

Starting point is 00:36:47 Maybe it's not a concern. It's fit to the problem, to the business case. Because by the time you go from a concept that you're puttering around with one evening on your laptop to a hyperscale service, you're going to have rewritten that thing a dozen times over already anyway. I agree. My favorite part, as a startup founder myself, is every time you hire someone, they're better at the thing that you're hiring them for than you are.

Starting point is 00:37:08 So it's a constant state of being humbled by these things. And watching people in the interview try to put a diplomatic corporate spin on, what the hell is wrong with you? Why would you do it this way? Well, it got us far enough to hire you, so please come help fix it. If people want to learn more about FalkorDB and graph databases at large, where's the best place for them to go? Well, they can check out our website. They can check out our documentation.

Starting point is 00:37:34 They can check out our documentation. I don't know if we're going to include the links to those. We absolutely can't put those in the show notes for you. be fine. Then our documentation, our website, and our Discord channel, this is where you can reach out and learn about our product or our service and on graph generally. And if you're, you know, interested in the actual source code and see seeing what I'm working on, then you can check out our GitHub, which is FalcordDB at Falcode. Or Blaclis. And all of that will wind up in this show notes.

Starting point is 00:38:17 Roi, thank you for taking the time to speak with me. I appreciate it. Thank you, Corey. It was fun. Likewise. Roi Lipman, CTO and co-founder of FalkorDB. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you didn't enjoy this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, confused comment that makes it sound like

Starting point is 00:38:44 you're a zookeeper understanding why giraffes need their own database.
