Orchestrate all the Things - Fluree, the graph database with blockchain inside, goes open source. Featuring Brian Platz, Fluree co-founder and co-CEO

Episode Date: October 20, 2020

Fluree is a hitherto under-the-radar graph database that uses blockchain to support data lineage and verification. Learn all about Fluree, its origins and milestones, use cases, data integration... and integrity, open source, and more. Featuring Brian Platz, Fluree co-founder and co-CEO. Article published on ZDNet.

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All The Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Today's episode features Brian Platz, Fluree co-founder and co-CEO. Fluree is a hitherto under-the-radar graph database that uses blockchain to support data lineage and verification. Fluree, the graph database with blockchain inside, goes open source. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
Starting point is 00:00:33 Well, I guess, you know, a good way to start a conversation, and what I usually do, is I just ask people to say a few words about themselves and their role in the company, and a few milestones in your path leading to where we are today. And just for the record, and for the people who may be listening, I would just like to note that the occasion for connecting and having this conversation is twofold, actually. So on the one hand, it seems you're about to raise some money for a funding round, and you also have plans to open source your product. So there's a lot to go over, so let's start from the beginning. Yeah, sure, there is quite a bit. So, the beginning. You know, Fluree was founded by myself and Flip Filipowski. And Flip and I have been working together for about 20 years now through, I guess, three or four companies that we've built. Flip's background goes back further than that, including, you know, founding
Starting point is 00:01:45 and taking two tech companies public on the NASDAQ. But, you know, at the core, we have dealt with the issues of managing data more effectively, not only for our own products, but when working with our customers and seeing the struggles that they had. And so, you know, Flurry, about four years ago, we, you know, looked at the landscape. We saw some exciting momentum around blockchain technologies, which we thought could add value around data in securing the integrity of information, something, you know, that surprisingly we don't have a lot of today. And then we're also have been very excited about semantic web, semantic graph technologies and its ability to connect data like sort of the internet connected information,
Starting point is 00:02:40 which is really the promise of it. And that those two combined with some other things that we could look at really had an opportunity to take data management to a whole new level. And so that was really the genesis of Flurry. We spent several years really just building, you know, and in some cases rebuilding. You know, we'd build, we'd realize it wasn't quite what we wanted or it wasn't doing what we wanted or it wasn't doing what we wanted and we'd start over again. And we were fortunate to have the luxury to do that and spend the time to really get it right. And about two years ago now, we released the beta version
Starting point is 00:03:15 and shortly after what we call our community edition, our free edition, which now has over 15,000 users. So we're very happy, speaking of milestones, that that is continuing to really ramp up in adoption as people learn about this. And we've also started to sell about a year ago now a commercial version, which is really our community edition, couple extra features, but 24-7 support and some other services. And we've managed to attract some really fantastic customers with that. And probably one of the most notable is now our largest customer is the Department of Defense, U.S. Air Force. So that's a quick recap of our journey to this point.
Starting point is 00:03:58 Okay. Okay. Thanks. Actually, there's a couple of additional questions I have regarding your background, let's say. So first, you can correct me if I'm wrong, but to the best of my knowledge, I don't think you've raised any capital before, have you? So the question there would be, well, what were the funds that you used to get you to the point where you are now? Was it self-funded or something else? Yeah, the first two or so years were actually self-funded. And that was more of our experimentation time, more of our R&D time. About a year ago, we did do a seed round of funding.
Starting point is 00:04:44 So it ended up being around $5 million in total. And one of the things that we're announcing here very shortly, I think within a few days, is that we just opened up and extended that seed round for a little bit of additional capital, bringing the seed round in total up to $6.5 million. And that was required not because we needed more money, we were actually quite fine, but part of our contract with the Department of Defense mandated that we raise an additional $1.5 million. And that's part of their checks, just with us being a small company. To them, that proves that there's commercial interest in our product beyond our contract directly with the Department of Defense. So that is kind of where we're at from a funding standpoint.
Starting point is 00:05:37 Okay, okay, thanks. So yeah, the U.S. Air Force use case is a very interesting one, I would say, one at least that I have seen people getting interested in, so we'll get to that in more detail. But before we do, the other question that I wanted to ask regarding, well, company background or team, if you will, more or less, was at some point, at an earlier point, maybe it's already been a couple of years, I was going through Flurry's team, basically.
Starting point is 00:06:09 And at that point, something that attracted my attention was the fact that it seemed like, you know, for a company that has a vision such as the one you outlined before, which is quite ambitious and, you know, involves bits of complex technology, let's say. It seemed to me back then that the balance, let's say, of your team was not as much leaning to the engineering side as I would normally expect or as I have seen in other companies, especially in their early years. And, you know, it seemed that you were instead, like, giving more, paying more attention to, you know,
Starting point is 00:06:52 marketing or outreach or sales, which is a different line of thought. I would just like to hear from you, you know, if my observation is actually to the point. And if yes, you know, what then if yes, what is your thinking? Even though I see that today this has somehow changed, that you have more engineers in the team, I would just like to hear what your thoughts are on that. Yeah, so I don't know that that's ever been the case. So I'm not sure where that impression came from.
Starting point is 00:07:24 But today, we're probably about 60% of our company is R&D. And as you know, most software companies are closer to typically about a third. So we're very heavily tilted towards R&D. And in our early years, we're even more so. But I will take it as a compliment. And certainly, Kevin Doubleday, who's I's on the phone here too, is our communications lead, that it's a compliment that it seemed like we were doing a really good job getting the word out. But yeah, even to this day, technically, we have one person in sales in our entire company. So yeah, we are definitely more r&d focused i anticipate that will change over time um but uh yeah that's where we are today okay okay thanks all right so then i guess yeah as i said as you also mentioned yourself actually it seems like you are basically with
Starting point is 00:08:22 your with your product trying to combine the benefits of two different technologies. So blockchain technology and well graph database or semantic web or whatever it is that you want to call it. So let's start with the blockchain part. So what are the main benefits that you are seeking to get out of using blockchain? And to make that a little bit more concrete, I went through some of your publicly available material and it seems that you highlight the ability to do what people colloquially say is time travelling. So, you know, being able to query and get snapshots of, you know, the image of your data in different points in time and so on, which has certain appeal to that in specific use cases.
Starting point is 00:09:23 But, you know, this is not is not you know using blockchain is not the only way you can achieve that and there are other products in the market that give the the same benefit without actually involving blockchain on the other hand involving blockchain technology comes also with certain drawbacks so how did you evaluate you evaluate the benefits? What was your cost-benefit function, let's say, and what made you take that decision that blockchain is the right technology for you? Yeah, really good question, George. So our main goal wasn't trying to combine blockchain in what we do. Of course, it was really focused around solving problems that we think are becoming, you know, increasing issues for organizations and have been issues for organizations.
Starting point is 00:10:15 You know, blockchain plays a role more in the integrity of data and how data can be proven to external parties. And it's a side effect of our need to increasingly share data with other parties. You know, something that, of course, we've been doing for a while. We do it by building custom APIs, typically today, but also with emerging technologies like artificial intelligence, and while we as humans can make judgment calls about the information we're receiving when we're deciding or making decisions, machines, AI in particular, has no good ability of making similar judgment calls about the data it is operating on. And being able to prove the integrity of the data, that the data hasn't been tampered with and how it originated, we think is going to be a very, very important issue as AI and machine learning makes more decisions automatically, especially when those decisions have implications. So, and at the same time, we spend trillions of dollars on auditing in the world. It is a massive, massive industry.
Starting point is 00:11:30 And while we still need auditors, there's a big part of what auditors are doing, which is verifying data has integrity, verifying that it hasn't been manipulated with by people, and then certifying that to third parties. Well, mathematically, we have the ability to actually record and prove that data hasn't been manipulated and how data originated. And so blockchain provides a really nice mechanism to be able to do this. So we think the implications are pretty profound, not only in helping machines make better decisions, but by decreasing potentially costs around auditing, improving information, and importantly, as we're sharing information and participating in more data networks where the information we're sharing might not have originated from us, it might be us passing along data that originated from our partners, our customers, our suppliers, and being able to transfer the proof of that data through that data supply chain or that data network are important. analysis, but it's something that we haven't really had the ability to do and we think is
Starting point is 00:12:46 becoming increasingly critical in data management today. So that's where blockchain ends up fitting in. Okay, I see. So I should also note that at least as far as I've been able to tell, it looks like you are using what people call permissioned blockchain which basically means that it's not you know an open network that anyone can join and therefore the implication of that is that you need to have extra security in place basically so that you don't get collusion scenarios and so on but having a permissioned permissioned network basically means that you know who's part of your network. So does that mean that one of the drawbacks of using blockchain technology is precisely
Starting point is 00:13:37 because you have all the added complexity of cryptographic proofs and being tamper resistant, it means that there's a hit in performance. So what led you to choose the permissioned scenario? Was it the fact that you wouldn't have to pay that penalty or was it that that was more suited to your use cases? Yeah, I'd say it's definitely more suited to the use cases. So our goal hasn't been to create like a cryptocurrency or a public blockchain network. It's been around, at least as it relates to the blockchain side of this conversation,
Starting point is 00:14:19 it relates around people being able to be assured of the integrity of information. And in order to solve that problem, you do not require a public blockchain network to do so. So that would be the main driver. The other thing is that when you think about data management and a lot of the requirements for data management, we need relatively fast transaction speed. We need, you know, really fast query speed for most types of applications. You know, the query speed can be solved in a few different ways, but the transaction speed is very difficult to solve with a full public blockchain network. There's just no good way of getting a lot of computers to agree to information in a very, very short amount of time. Of course, there are some people out there that purport to be able to do this, and they do to a degree, except they always come with a consequence, which is you can record transactions fast, but you can't guarantee the integrity or sort of the ordering of the data for an extended period of time.
Starting point is 00:15:25 So when you're dealing with data and integrity of information, you have to be able to guarantee both. So to be able to do that in a very fast way, really a permission network is the only good way of doing that. But again, you know, our focus is really more around proving the integrity of information. And so we do not need the public blockchain sort of route to be able to do that. Thank you. So we've already, we're already touching upon the issue of querying and transactions and so on. So which kind of leads to the second
Starting point is 00:16:06 let's say pillar of your technology approach which is the graph database so I have my own well let's say theory in that as far as you know the combination of blockchain and graph which you, it's kind of natural in my view, but I'd like to hear from you. So you basically got into this space in graph databases, well, I would say just before it started getting really, really popular. So what was it that led you to that decision at the time that, you know, it wasn't, the domain was not as high, let's say, or as, you know, the prospects didn't look as good as they're looking right now? Well, to us, graph database technology is an obvious choice for managing data with today's data requirements. You know, the risk, I would say the only risk about going the graph database route today is that a lot of people in the development community don't have a lot of experience with it yet.
Starting point is 00:17:21 So it's always challenging to get people to try new technologies. Why does graph make sense? I mean, graph is, you know, this beautiful format that allows the types of more complicated sort of relationship joins that we're doing more and more today to happen incredibly efficiently, way more efficiently than you would ever be able to do with a relational database. And at the same time, you end up with basically all the analytic type of capability that you would with a relational database. The data is represented more like a lot of developers would expect it to look like in a document database, something that makes that very powerful. So it really combines the best of a sort of a document or NoSQL approach with the best of the
Starting point is 00:18:12 relational approach. And in fact, a graph database can even make itself look like rectangles. It can make itself look like a relational database. It can make itself look like a document database. So it's just really, you know, to us, I mean, it's never seemed like anything but an obvious choice. Again, with the hurdle that it's a newer technology that not as many people are familiar with. At the same time, we're really excited by things like GraphQL, which itself has nothing to do with really specifically a query language or even a graph database. But the way you use GraphQL is the same way you would, you know, crawl relationships with a graph database. So I think some of this kind of shift towards GraphQL as an API interface is getting people familiar with, whether they know it or not, sort of how graph kind of works and operates. And so I expect for all the reasons that it's just a fantastic data format for today and that people are getting more and more used to it.
Starting point is 00:19:13 There's no reason I can see that this will not continue to be by far the fastest growing database category over the coming probably decade. I would have to agree with you on that. But then, you know, I'm obviously biased having been involved in this space for a long, long time. You mentioned GraphQL, by the way, which, yeah, it's an interesting, I don't know, on-ramp, let's say, to the space.
Starting point is 00:19:39 But, you know, in my humble opinion, it's also a bit of a misnomer and gets people confused sometimes because it's not actually a GraphQL language, but that's a different topic. What I wanted to ask is that I see also lots of adoption for GraphQL and actually lots of adoption in the database space. So I see more and more databases adding a GraphQL layer through which people can access them. Is that something that you are planning to do or possibly already offering?
Starting point is 00:20:11 Yeah, we already offer it. So, yeah, when you set up basically a schema within Flurry, we automatically expose a GraphQL interface for that. But as you mentioned, yeah, GraphQL is not a query language. So, you know, it's really good for finding a specific piece of information and then maybe crawling relationships and getting data out in a tree, which oftentimes is exactly what people need to do, especially if they're powering like a front-end app. So it works really, really well there. But if you want to do like a more complicated analytical query with GraphQL, of course, you could sort of hard code those interfaces into GraphQL, but it's not like an ad hoc sort of query language. So that would be the downside of it. And for that, what Flurry leverages is the World Wide Web Consortium standard Sparkle gives all of that flexibility and power to be able to query.
Starting point is 00:21:08 So Flurry not only opens up a Sparkle interface to be able to query, it opens up a GraphQL interface, which you can natively query. It also opens up a JSON-based representation, which we call FlurryQL, and we're currently in progress of opening up a SQL interface as well. Because like I said before, you can actually make graph data look like rectangles. And so SQL will work. And of course, all of those have trade-offs.
Starting point is 00:21:35 But one of the things we want to do is have this data repository be really accessible to different tools and technologies. And so having all these interfaces helps a lot there. It does. Yes, it does. And again, I would concur that this is something I see a lot these days, like databases regardless, you know, whether they're graph databases
Starting point is 00:21:55 or whatever other type of database, just offering more ways to access, you know, be it more APIs or more query languages. So, you know, we have the examples such as Cosmos2P and more like leading going down that road. And that actually helps understand the way you do things a little bit better, because to be honest with you, I was a bit confused at some point. I mean, I knew that Flurry has support for Sparkle,
Starting point is 00:22:27 but when I started looking around for examples, I came across something that didn't really look like Sparkle to me. So I was a bit confused at the moment. So I guess that maybe what I saw was an example of your JSON-based query language? Yeah, probably. In fact, I often describe what we call FlurryQL, the JSON query interface, is it is really Sparkle in JSON. And then we add a few features.
Starting point is 00:23:03 So one thing that Sparkle doesn't do really well, that, for example, GraphQL does really well, is it doesn't give the ability to crawl the results of the queries. So you can do just these unbelievable analytical queries within Sparkle. But when you're pulling out pieces of data, sort of crawling the graph from those pieces of data isn't something that it supports natively really well. You can sort of just output tuples of data. You can't necessarily output like trees of data from Sparkle. Well, GraphQL outputs trees of data. So we add some of that GraphQL graph crawling into it as well.
Starting point is 00:23:39 And then there's also some features that, you know, Flurry supports that aren't native to Sparkle, like time travel. You brought that up. You know, Flurry gives the ability to query, instantly query, every version of the database that ever existed. And you can query it by time, you know, issue this query as of January 3rd at this moment in time, or you could issue it by what we call block, which is every update, you know, which has a new identifier to it or an incrementing number to it. So that's not something, for example, that Sparkle natively supports or even GraphQL. So we add a couple of these additional features in there, which incorporate time as well.
Starting point is 00:24:15 But really, yeah, FlurryQL is Sparkle in JSON plus some other things. Okay, I see. You also mentioned analytics a couple of times actually and the fact that you're working on adding SQL support so that makes me wonder if you have something in mind regarding connection or connector to business intelligence tools for example because, because again, this is something I have been seeing emerging in the graph database world. So vendors adding SQL support or drivers or whatnot, basically to enable this additional
Starting point is 00:24:58 access layer for people who are familiar with end user tools tools, just a blow on the line. Yeah, that's very observant. It's probably one of the primary drivers is that there is a large ecosystem of tools out there that have native ODBC drivers or can work directly with SQL that people could take advantage of. But the other reason is that just a lot of developers know SQL.
Starting point is 00:25:25 And by comparison, I expect it to change, could take advantage of. But the other reason is that just a lot of developers know SQL. And, you know, by comparison, I expect it to change, but very few developers know Sparkle. And one of the questions we get, I think everyone's really excited to learn Sparkle and excited to learn Graph. And at least for me, and I've worked in it for a while now, it's actually far easier to do more complicated queries in Sparkle than SQL. But there's kind of this initial learning period. So one of the questions we sometimes get is like, here's this query in SQL.
Starting point is 00:25:54 Tell me what this would look like in Sparkle. And one of the interesting things even with Flurry is that no matter how you query Flurry, whether you're querying it in GraphQL or in Sparkle or eventually SQL, ultimately what we translate that query into is this JSON, this FlurryQL. So we're just really parsing
Starting point is 00:26:14 these different query interfaces into this common sort of superset. And one of the things we want to be able to do is allow people to write SQL queries, but then at the same time, they write the SQL query and say, oh, by the way, here's your SQL query, but here's exactly what it would have been in Sparkle
Starting point is 00:26:31 to help people sort of make that transition over. And if they just want to keep working in SQL, that's fine as well. But yeah. Okay. So I know this is a bit you know too too technical possibly like you have to to indulge me because well it's first is something i have a personal interest in and then second as i said it's also something that i see as an important trend in this space so it's it's good to have a chance to discuss with you so So, and I guess with my final question in that department,
Starting point is 00:27:05 and then we can move on to more, to different topics. So, the issue of Sparkle, I also noticed that, well, as opposed to traditional, let's say, triple stores or semantic web-powered graph databases, you don't actually store triples, but you store, well, six tuples instead. So, and, you know, there's a certain logic behind that, which I saw, but it would be good to, if you wanted to, to briefly recap that logic as to basically what's the extra elements that you store in the publish. And then my question as far as Sparkle goes is how transparent is that to people used with, who are familiar with just regular straight Sparkle?
Starting point is 00:27:53 Do they have to do something different to be able to query Flurry? Yeah, well, that's a good question. Actually, it's a topic I love talking about. So I'm glad you asked it. It can get a little technical, but I'll try to summarize it because the reasons are pretty simple. Actually, when you query Flurry, like with a Sparkle or GraphQL or however you're getting data out of it, you're technically querying Flurry at a moment in time. Remember, Flurry natively understands time. If you don't specify a moment in time, it just assumes the only thing a typical database can do, which is you just want the most current information, and that's how it will respond. But every single query actually has time attached to it. We'll
Starting point is 00:28:41 just default the current time if you don't specify one. So when you're querying a moment in time, every moment in time is itself in a mutable database that will never, ever change. I mean, there's some sort of ways we allow you to in a dire situation or a legal situation or in a GDPR situation, if you actually have to truly extract data out of it, you could do so. But every moment will never change by default. That moment is really a triple store. So there are just triples in that moment in time. We add additional information to the triples because what we're actually creating is an append-only log that can be replayed to any moment in time. And so it's the append only log that has triples. But if you manifest that log in a moment, or that has six tuples, if you manifest that log at a moment in time in a database, you're really just left with triples at that point.
Starting point is 00:29:40 So what do we add to the triples? We add two primary things. One is a reference to the transaction that happened so that every piece of data can be traced back to its origins. So this gives the provenance of information. So you can imagine every individual nugget, every datum within Flurry has a reference to the transaction. That transaction you can crawl to and see all the metadata about who put it in there. You can look at the cryptographic hashes, you can recalculate the hashes to prove the data, et cetera. The second piece of information we add
Starting point is 00:30:16 is whether or not the data is being added at that moment in time or retracted. Because again, we have time now. So as the transaction is happening, that transaction might be deleting data. So we need to sort of mark that triple as something that used to be true but is no longer true at this moment in time, or whether or not we're asserting new information. And then the sixth piece that you mentioned is really just a general bucket for holding additional metadata. And what this allows for is things like property graph style features. You talked about GraphQL being a good maybe on-ramp to kind of understanding graph.
Starting point is 00:30:57 Another good on-ramp is a technology like Neo4j. I mean, that's where sometimes people get their first exposure to graph databases. One of our colleagues calls it sort of graph databases on training wheels. And that is a property-style graph database where you can actually attach metadata to what, you know, we call the edges or sort of the relationships between items. And so this last piece of information is there to also help store additional metadata about those relationships. A semantic graph database, which is what Flurry is, does not natively have the ability to store properties on edges. You can still accomplish
Starting point is 00:31:37 all the same things. There's a lot of other benefits, but it's worth noting that there's kind of two flavors of graph databases in the world. There are additions coming to semantic graph, one called RDF star, which adds property-like features if people feel they really need it. And that sixth place in the tuple is where we can store that additional information. Yeah, that's true. And since you're obviously knowledgeable in the topic, I would also add that previously to your sixth table, people have also experimented, not just experimented, but actually implemented four tables with the fourth element being for storing what is called in RDA terminology named graphs, which basically enable people to divide their entire data sets into smaller graphs and to use that for various reasons.
Starting point is 00:32:36 And I guess this is something you obviously must support as well. Yes. Yes. And very good point. Okay. well yes yes and very good point okay so it sounds then that well taking you know just just putting all of what we've discussed so far together so your permissioned blockchain approach and the use of sparkle and semantic web technology it sounds like what you've built is really made for scenarios where you have like different nodes in a network. Well, you
Starting point is 00:33:10 want to have those benefits that you mentioned like immutability and being able to prove with certainty the origin of certain pieces of data and do things like time travel and so on. And so it's really built, if you take the capabilities of Spark into account, it's really built for like a federated data management scenario. Is that assessment actually what you had in mind? Yes, absolutely it is. There's, you know, Flurry is built to try and overcome some really big issues or struggles we have around data management. And one of them is this, probably the biggest one that is just killing a lot of enterprises. It is that we have an immense number of data silos. And our needs to combine that data, of course, are increasing all the time as we have new technologies. We're trying to stay competitive.
Starting point is 00:34:11 Data becomes much more important to our businesses. But I mean, we're literally at like a standstill. It's estimated over 50% of IT budgets are being spent on integrations. And part of the promise of some of these semantic web technologies is it actually offers a way for machines to integrate data dynamically, automatically, without us having to hard code integrations. So we're not only getting sort of slaughtered in our IT budgets, but we have all this data, we have this immensely difficult time being able to strategically leverage to our advantage. And to overcome this, we add another, what we think is a huge band-aid onto the fundamental problem, which is like data warehouses and data lakes. Because the only way we have to query data across multiple data sources is to combine all that data into one huge data source. And it's a, you know, it's a treatment to this underlying condition.
Starting point is 00:35:15 These semantic technologies give a path to solving the underlying condition, which is Sparkle and the technologies that they sit on top of, enable you to issue queries across multiple data sources, to actually do joins across multiple data sources, to have multiple data sources that might even physically be storing data in slightly different ways, but can share a common vocabulary so that anyone who also shares that common vocabulary has the ability to query and get knowledge out of this information. And it paves the way to exactly what you said, which is federated data or virtualized data stores that might not only federate data within your own organization or your own tools, but can even
Starting point is 00:35:57 federate data across an entire data ecosystem. Your partners, your suppliers, Wikidata, which already supports these standards, DBpedata, which already supports these standards, DBpedia, which already supports these standards. These are databases that even though you don't physically house in your own data center, you can sort of treat as though you're physically housing because you can issue queries with your own data combined with data from them. So that is absolutely the future of managing data. We have the standards in place to do it. We just need to sort of start digging ourselves out of this data silo hole that we've created and start our movement towards this, we think, much better world.
Starting point is 00:36:39 Yeah, I totally see that. And actually, you know, whenever we're having related discussions, let's say, with people about, I don't know, the benefits, let's say, of this versus that approach, and you want to get more specific about the nitty-gritty, you know, what's the best approach, like the property graphs or semantics or whatnot, what I always say to people is like, look, okay, it all depends on your on your use case and you know there's benefits uh to both approaches but to me what seems to be like the biggest benefit of uh this kind of technology that you're also using is precisely that integration data integration so in my humble
Starting point is 00:37:23 opinion it's the best possible technology we have at this point for this specific use case. But just to slightly, for a minute, return back to the technical part of the discussion, because, well, you kind of triggered me with something you mentioned about schema. So quick question, and then we'll head right uh to the use case so um i suppose that obviously you must uh flurry must also uh support importing schemas which are of the self this is again another uh big benefit of those technologies by now we have a quite rich library of vocabularies and schemas that people can use so uh the question is, A, if you're able to use that, any of those, which I'm guessing you probably are, and B,
Starting point is 00:38:11 because we also briefly mentioned NoSQL in the beginning, well, in most semantic graph databases, there's also the option of actually not having a schema at all and whether this is something you support. Yeah, so absolutely. The premise of Flurry is that not only can you create your own schema or vocabulary, and I think in practical use, a lot of developers approach it that way. And we don't want to fight that trend, right? I am building an application. That application needs certain data. I need to start out by just sort of, even in the development or kind of an agile process, I'm building out a schema or
Starting point is 00:38:50 thinking about that. If you already know a schema that exists that fits your use case well, that would be optimal. But that's just not how a lot of people are working yet at this point. So what our focus is, is not only giving you the flexibility to have your own schema like you would in a typical database, but then at any moment in time, from the beginning through later stages, being able to map elements of your schema to external schemas. And it doesn't have to be even one of them. So like schema.org is a popular one that a lot of people know about only because Google, you know, uses it and searching and you can embed it. This is all, and in fact, it's probably something a lot of people don't know is that,
Starting point is 00:39:35 you know, these semantic technologies are being used probably by everybody listening or reading about this, maybe even hundreds of times a day. It's embedded in all your web pages. When you share articles on Twitter and the nice little image and summary of the article pops up, it's all using these same technologies that Flurry is built on top of, which enable effectively what you're seeing at a very small scale, data integration to happen automatically with no hard-coded integration. So these vocabularies exist, but different people may not agree on the same vocabulary for the same domain. It might be medical records, right? And so there might be several kind of competing vocabularies out there. There should be no reason that you shouldn't be able to have your data and say, by the way, this element in my schema, maybe it's called patient name or a medication,
Starting point is 00:40:30 it maps to these three different schemas, public schemas. And now anyone can use any of those three public schemas to query my data. So we think you need to be able to map to multiple public schemas, but really this provides the translation layer for external consumers that gets you away from having to ever do sort of hard-coded integrations. Okay, thanks. So then with that, we can return to what we started discussing, which was use cases. So yes, federated data management.
Starting point is 00:41:07 And we also mentioned earlier one of your use cases with U.S. Air Force, which is actually how your upcoming funding round came to be as well. Would you like to say a few words about what this use case is precisely. I mean, as far as you can say, because I imagine it seems, because of the nature of your client, in this case, there may be some sensitive aspects around it, but if you can tell what you can share, basically, what this is about, how you got this client,
Starting point is 00:41:42 and where this is going, and what you're working on with that. Yeah, and it's a use case that I think every organization, probably no matter how small or large, has, which is around having a central view of their information. And we typically refer to this as master data management. We like to call it master data services because it's not just about having sort of the data sitting there that somebody can export or put it master data services because it's not just about having sort of the data sitting there that somebody can export or put in a report, but it's actually about operationalizing that data, allowing you to leverage it in applications, et cetera.
Starting point is 00:42:15 And one of the things that, you know, sometimes we get a little ahead of ourselves, we get really excited about these kind of public data ecosystems. I mean, we really see a world where your databases sit in the public domain, which seems like really far away from where we are now, because right now your databases probably sit behind at minimum two layers of firewalls. And, you know, we're talking about this idea that your databases can actually sit completely exposed to everybody. And that is because we do have a certain capability in Flurry that we enforce through a feature we call smart functions, but it allows you to actually embed the security rules around data into the data layer
Starting point is 00:42:59 itself. And right now we typically do that at the perimeter. And oftentimes, we're doing it many, many times over and over. Oftentimes, these fall out of sync over time. And oftentimes, these are where some of our data marts that your partners, suppliers, customers can transact directly into if they have the permission to manipulate the data and also directly query from. Most organizations are dealing with a far more fundamental issue right within their four walls, which is I have multiple departments. I have 500, 1,000, maybe it's 50 different applications with their own data silos. And I struggle to have a complete picture of all the data I have around a customer or around a client or around a product. And Flurry offers a really advantageous platform to be able to bring this information together. For one, it can enforce, again, who can write and update that data. So that especially when you have multiple departments within an organization, people don't have to worry about like, now we've got this more open data source, for example,
Starting point is 00:44:16 about customer data. How do we guarantee that people aren't like writing over each other? So our mutability leads into that because technically you can't overwrite anyone's data. You can only mutate it at a moment in time, but also enforcing security rules around it. And then likewise, who can read it? So I may put customer data that I think is a little sensitive, but maybe two other groups need, but these two other groups can't see that. Again, data defending itself not only gives every group the ability to just issue direct like GraphQL queries without having to build custom APIs, but every group will have a completely filtered view of the data that only includes the data that
Starting point is 00:44:58 they have permission to see. So it's a great way to start combining data, being able to query and leverage data centrally. And this just leads not only internally, but this is the same activity that we believe everyone will be doing externally before long. But we have a big first step. So this master data management is top of mind for a lot of organizations and really is a wonderful use case for this type of technology and one thing that may seem may raise some eyebrows let's say for for some people is well I was fear being a relatively young company you know landing a client like that is not exactly, you know, it doesn't really happen every day, let's say. So would you like to say how that came along? Yeah. So, you know, I think the key with Flurry is that we have some very unique capabilities
Starting point is 00:46:03 that solve some pressing pain points. We think those pain points are pain points that most organizations ultimately have, but they're more acute in certain scenarios where information is deemed highly secure and or deemed very important that you can trace exactly how information came about. And as you can imagine, a group like the Department of Defense has those sort of characteristics. They may be towards the top of the list, but when you start working your way down, financial services firms have a lot of risk around data. Data is being used in an automated way where provenance becomes important. And there can be very, very serious, you know, financial consequences if mistakes happen with data or if someone hacked into data
Starting point is 00:46:58 and manipulated it a little bit, because so many machines are doing so much automatically now. But even a small organization ultimately has these issues. They may not feel these issues quite acutely today. We anticipate they will in the future. But yeah, certainly for our feature set, I would say that any type of group that has highly sensitive data where tracking the provenance of that data is very critical are going to be the ones that are going to be most attracted early on to a solution like Flurry. Okay. You also mentioned earlier in the conversation that the fact that you're about to have another funding round is actually dictated, if I interpreted that correctly, by your contract with the Department of Defense. So would you like to say a few words about what your funding has been up to date. I think you mentioned that you have the seed round of 5 million,
Starting point is 00:48:05 which I would say is quite a lot of money for a seed round. And so if you can actually disclose who have been the venture capitals, I guess, behind your seed round, or it could have also been angel investors, I'm not sure, who's going to participate in your upcoming funding rounds? Yeah, so we just closed an extension to our seed round of another million and a half dollars. So our seed round effectively becomes a six and a half million from the five that we had
Starting point is 00:48:41 originally disclosed. And that information, I believe, will be released here in the latter part of September, maybe in a few days from when we're talking right now. It is the same investors that have added in the additional capital. And as you brought up, and I guess I brought up, it is a requirement with our contract with the Department of Defense. One of their rationales is they have described it to us as when they're working with a smaller company, which we certainly are, that they want to have some assurances that a contract with them doesn't end up meaning that the company becomes
Starting point is 00:49:26 solely dependent on them as a customer. And so they look for validation points that a company that is small like us has a product that is commercially viable outside of their use case. And as you can imagine, that's a very difficult thing to try to figure out. So the mechanism that they have devised to figure it out is that if we can raise a certain amount of capital from private investors, that demonstrates that the product in the company has commercial viability outside of their specific use case. So that's why that requirement is there. And in many ways, I think it's a pretty good and effective mechanism. Certainly doesn't prove anything, but probably gives them a good indication. Our investors include, our lead investor is
Starting point is 00:50:17 4490 Ventures. And so they are one of the larger Midwestern venture capital firms. Another major investor in Flurry is Rise of the Rest, which is part of Revolution. Revolution is the firm started by Steve Case, you know, of AOL fame. And Good Growth Capital, Engage, Engage is unique because it's actually a fund formed by 11 large corporates, most of them in the kind of southeast area. But they include Delta Airlines and Home Depot, Goldman Sachs. Obviously, they're not in the southeast. That would be an exception. Invesco and these corporates have all invested kind of as an R&D channel into this fund. And that fund has invested in Flurry. So we feel really fortunate. We have,
Starting point is 00:51:17 I almost couldn't ask for a better set of initial backers within our company. And so we're excited with our partnership with all of them. Okay. Yeah, I guess that's an interesting backstory. I mean, to me personally, I don't think I've heard anything similar in the past, but then again, you know, I haven't been really in touch with companies that have had military contracts in the past. So, yeah, i learned something new the other part that has to do with you know your your business development basically is that well as you mentioned you're
Starting point is 00:51:53 planning to open source your your product so again what's what's the rationale there and you know if you can share some more details on when it's happening and how exactly you're going to do that actually. Yeah. So the rationale is that we want to reduce any obstacles for an organization that is thinking about adopting Flurry. And I think we've probably talked about it a few times here, but picking a place in the technology to help manage your critical data is a really, really big decision. And it carries with it, you know, a good amount of risk, I think, for an organization.
Starting point is 00:52:36 So they look at this in a critical eye. And we've, as a smaller company, we've tried to address this with now two main strategies. One is that we are entirely based on standards. So even the data that we write to disk is based on standards. We serialize all of our data in Avro. You can use open source technologies to read, to even calculate the cryptography around sort of the integrity of the data. All of that can be done without Flurry even being there. So the standards-based approach, I think, is a good way of helping to address any concerns. And open source is another incredibly
Starting point is 00:53:20 good way of addressing any concerns with that. Now, at the same time, we're really optimistic. I mean, we have a great community on Slack. I would incur anyone trying out our product to join our Slack channel, who's building some of their own libraries and code and publishing them. But there's also, we think, a great desire of people to contribute to some of the things we're doing with Flurry to address some of their specific needs. And, of course, open source gives us, you know, the ability to have some of that happen as well. So those are the main motivators. Have you already decided on what specific license you're going to use? Yeah, we're going to use the AGPL license. And we have a large number of libraries that we have already open sourced that
Starting point is 00:54:14 underpin Flurry, including our consensus library, which is Raft. We've got a couple of cryptography libraries. And we have open sourced all of those to date under the MIT license. We're probably going to change that to the Apache 2 license. So we have a very liberal license for a lot of our underpinning libraries and technology. And then the Flurry Core, which wraps a lot of these libraries and, of course, has some additional features in it, is going to be AGPL. Okay. I guess maybe part of your reasoning for choosing AGPL has to do with potential use or misuse by cloud vendors, as we have seen happen with other database vendors? Yes, that's exactly correct. And companies like Mongo have gone the route of kind of creating their own license because
Starting point is 00:55:11 AGPL is still a little bit liberal in some regards, but we think AGPL gives most of the protections. And of course, it's a very common license that a lot of people are familiar with. So we didn't want to go the route kind of that Mongo went on this. Yeah, I think it makes sense. Do you have an exact timeline? So when are you going to announce your open source,
Starting point is 00:55:40 your open sourcing the product? Yeah, so we have the September 30th, I believe is the date. We've said at the very end of September. So it is possible that our communications team will, you know, move that a day before or after or something like that to optimize for the formal announcement, but yeah, end of September. Okay. So yeah, I mean, what, what you mentioned before makes sense.
Starting point is 00:56:13 It is probably open source in your, your product will work that way in terms of mitigating potential concerns that users may have. But to me, I think the main one, or a benefit that's at least as big as the one you mentioned, is the fact that this may give you the chance to grow your community, basically. Not just in terms of code, but also mindshare and everything that comes with that.
Starting point is 00:56:41 So I wonder what your plan is to make that happen, basically. So becoming a successful open source project will actually require lots of work. It's not a very, very easy thing to do. You have to make it easy for people to onboard themselves. You have to provide training and documentation and community support and all of those things have you planned for that yeah and at the same time i'll be very humble in that we have
Starting point is 00:57:12 not uh open source before a large software product so um you know as i mentioned we've been in enterprise software our whole careers but this is the first experience for us. So we are trying to talk to people who have gone through this process before to gather whatever advice or insights that we have that we can get. One of the reasons we have not open sourced it to date yet is partially because of some of the feedback that you just mentioned, which is, you know, people need to be able to easily just pull the code down and run a build. And, you know, a lot of the things, because Flurry does have a lot of technology, you know, a lot of our own libraries and et cetera, that we didn't feel was easy enough. So we actually, in the past couple of weeks, have just gone through and rewritten the entire build pipeline so that it could be easier for someone without any environment
Starting point is 00:58:12 already pre-established on their machine to be able to just pull down the code and start building and running it. So that's a lot of the feedback that we've gotten. I am confident we have a lot to learn. There'll be a lot of lessons as we go through, but we're really trying to do our best to make it as easy for the community to use and understand as we can. Okay, yeah. Yeah, that makes sense. And yeah, I totally see how, you know,
Starting point is 00:58:41 it's not the easiest thing in the world. And if it's your first time doing that, it's not going to be perfect from the onset. But I hope you eventually get there. So yeah, I think we're about out of time and we've actually covered lots and lots of ground. Because, well, honestly, you have quite complex in some ways, but also quite ambitious products going on.
Starting point is 00:59:09 So it kind of warranted the time we gave it, I would say. I think that's pretty much it from my side. So maybe you have some closing comments. Feel free. Otherwise, I'm good. Yeah, I have a couple comments. So one is that, yeah, we do do quite a bit in data, which I think data in its importance today warrants it.
Starting point is 00:59:35 We have made great efforts to make it easy for someone to start using Flurry. You do not need to know anything about blockchain or cryptography. Those are all features that you can tap into as you start to care about them. But you can quite literally just download Flurry on your laptop, run it as a single node. It'll start up and form a consensus of one machine. Automatically, you don't need to think about consensus or any of this stuff. You can put some data in and you can start building a React app or something that's using GraphQL. And you can literally do that in like 20 minutes. So we've tried to make it really approachable and easy
Starting point is 01:00:15 to start using. And then you can dive into some of the more advanced capabilities it has as you need them. The other thing that Flurry is a natural platform for, which I think is going to be a very disruptive technology in data as a whole, is the emerging standards around what's called verifiable credentials, which is built on semantic standards. It's built on cryptography. It's built on decentralized identifiers. And that is maybe a podcast or an article for another day, but this is going to really revolutionize, I think, how we think about data,
Starting point is 01:00:53 introduce people to ideas of like data containers or portable data, portable secure data, and really exciting things on the horizon. And Flurry is really well positioned for that market to be a foundational technology in that. And so, yeah, I'd love to talk about that maybe another time if you invite me back. I hope you enjoyed the podcast.
Starting point is 01:01:17 If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
