The Changelog: Software Development, Open Source - ZeroDB (Interview)

Starting point is 00:00:00 I'm McLean Wilkerson. I'm Michael Egorov. And you're listening to The Changelog. Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stachowiak. This is episode 190. And on today's show, Jared and I are joined by McLean Wilkerson and Michael Egorov, the guys behind ZeroDB,

Starting point is 00:00:28 an end-to-end encrypted database and also a protocol. We talked about why it's open source, how it's different from other encryption techniques, and whether there's a performance hit for running encrypted queries. We also talked about the business side of this thing and also an interesting topic all in itself, proxy re-encryption, and so much more. The three sponsors we have on the show today are Codeship, Toptal, and DigitalOcean. Our first sponsor is Codeship. If it works with Docker, it works with Codeship.

Starting point is 00:00:53 For those out there with established Docker workflows or those looking to leverage native Docker support while automating your testing and deployment, check out Codeship's new Docker platform at codeship.com slash changelog. And for those looking for great resources to automate your development workflows with Docker, you should download Codeship's free ebook

Starting point is 00:01:15 covering why consistent environments are so important, how a company lost $400 million in 45 minutes due to inconsistent environments, and also how to build an app to run inside an isolated Docker container. Head to codeship.com slash changelog to check out Codeship's new Docker platform, and head to resources.codeship.com slash ebooks to download that free ebook I mentioned on automating your development workflows with Docker.

Starting point is 00:01:42 And now, on to the show. All right, we're back today talking about ZeroDB with McLean Wilkerson and Michael Egorov, talking about an end-to-end encrypted database, also a protocol. We'll dive deep into that. Got Jared here as well. McLean, Michael, welcome to the show. Thanks, guys. Excited to be on here. I guess maybe the best way, since we have two guests on the show today, Jared, let's kick off real quick with some quick introductions, just to kind of pinpoint voices. I know when listeners listen to any podcast where there's more than a couple of voices, it's hard to put a name to who's speaking. So McLean, we'll go with you first.

Starting point is 00:02:29 So give an intro to who you are and what you do with ZeroDB. Sure. So again, McLean Wilkerson, one of the co-founders of ZeroDB. Software engineer, also do business stuff now that we're an official company at ZeroDB. Have a background both in software engineering as well as more traditional finance. I worked in investment banking for a few years and I had a positive conversion from that. But during that time I was covering tech, media, and telecom

Starting point is 00:02:58 companies before getting into startups. Awesome. And Mike, what about you? Yeah, so I'm a software engineer and basically I'm the guy behind the tech here, even though McLean is also working on the tech in our company. And I also have a physics background. And how this got on our radar. So I think, Jared, you know, chime in here on this, because I thought this was kind of interesting. Before the call, Jared and I have some sort of sync-up, right, to think about what this show is going to be about, who's on the show, what their backgrounds are, that kind of thing. And Jared said to me, Adam, how did this hit your radar? Jared, what did I say? I kind of stuttered a little bit. I had no idea. Yeah, you didn't know. You were like, well... We had to backtrack a bit.

Starting point is 00:03:41 It was December, I know that. Yeah. I'm going to guess Hacker News. No, no, you would think Hacker News. Now, this is how cool we are. This is a slight pat on the back to us because we've been trying. Right. So we have this email called Changelog Nightly. And that email goes out every single night. It's essentially top new repositories, top starred repositories every single day from GitHub.

Starting point is 00:04:04 And your repo hit Changelog Nightly when it was open sourced on December 7th. And so that hit the top new repos. And then the very next day, December 8th, jumped to top starred repositories. And then days after that kept trending for several days. And that's how we discovered you. And almost immediately, I was like, this is going to be pretty interesting because it's obviously, you know, encryption, database related. Sent you a tweet the very next day on December 7th and said, congrats on the launch. Let's get you on the show sometime soon.

Starting point is 00:04:36 So it took about a month. We're after the new year. We're here in the new year finally. And so now we have you on the show. And that's kind of how this hit our radar. Gotcha. Yeah, when we open sourced it in, I guess, the beginning of December, we had a little bit of a positive feedback loop or viral effect there for a little while, because I guess it caught some people's attention, got a bunch of stars, and then it hit the GitHub

Starting point is 00:04:59 trending page, and once you get there, it sort of feeds on itself for a while. So we had a nice little ride there for a few weeks. Jared, do you want to mention anything about Changelog Nightly? I love it. You love it? I mean, the fact that it's our own radar? Yeah, I mean, there's no secret to how we find awesome stuff. Stuff that we put in Changelog Weekly or ends up on the show is 10 p.m. U.S. Central.

Starting point is 00:05:27 What's interesting on GitHub for the day? And like you said, certain things pop into that new repositories list and they pop in and they pop out and you never see them again. And then other things will pop into that list and then you'll find them in the bigger global list of top starred altogether.

Starting point is 00:05:44 And yeah, you can just start to see what is trending, what's interesting, and what's worth clicking through and checking out. Well, that's a good example of dogfooding, I guess. Well, I wanted to mention that at the top of the show, mostly because of the discussion that Jared and I had earlier. Like normally we don't come on the show and make it about us, but I thought just for a second,

Starting point is 00:06:03 we can talk about that simply because it was literally from our own dog food, you know, using that metaphor, that we found out about ZeroDB. And immediately I was like, we should get in touch with these guys, have them on the show. But let's dive deeper into some more interesting things. By the way, if you want to subscribe, it's a good transition: changelog.com slash nightly. Go ahead and subscribe to that every single night. But let's dive in, I guess, to the topic of how you guys got started. Obviously, we're going to dive deep into ZeroDB, databases, encryption later in the show, but let's find out where you came from. How did, I guess, you two meet? Your backgrounds seemed to cross a bit in terms of, you know,

Starting point is 00:06:46 tech and financial and things you've mentioned, but how did you guys meet? Yes. We're going to go back a little ways. And actually even before ZeroDB, we were doing a lot of blockchain and cryptocurrency stuff, sort of the intersection of our respective backgrounds. We actually first met at the San Francisco Bitcoin users group. And so we met there, sort of hacked on a bunch of different Bitcoin, cryptocurrency, blockchain related stuff for a while. And that was sort of the initial meeting and overlap of interest. And out of that process, we had the inspiration for ZeroDB, after many different experiments. You guys still attend that meetup, that Bitcoin meetup? We do.

Starting point is 00:07:32 Yeah, actually, we are not in Silicon Valley at the moment. For the next several months, we're in London. There's a there, but yeah, we attended it out there. For some reason, McLean, you got real soft there. Did you back up from the mic? I'm soft? Yeah. Did you step away? Did you move away or something like that? No, I'm actually pretty close.

Starting point is 00:07:54 Is it still bad? No, you sound okay now. I don't know if anybody else, if you found it. I noticed it. I haven't noticed yet. Yeah, you kind of got soft there for a second. But you're back, we can hear you. You're just softer.

Starting point is 00:08:06 Like your volume went down. Like you backed up from the mic. Yeah, I'm not sure what happened there then because I've been pretty close. All right. Sorry about the glitch there, everybody. We'll see if we can edit some of that out. But we might just leave it because that's radio. All right.

Starting point is 00:08:20 So, Mike, what about you? I mean, what's your story in terms of that meetup? Like how did you get attracted to that Bitcoin meetup? Yeah, so I've been attracted by Bitcoin even before I came to the US. At the time, I was in Australia, and I wanted to go to Silicon Valley to do some startup. I didn't know which one. So I went there just to work for a tech company, for LinkedIn,

Starting point is 00:08:47 for a while, and I started attending these meetups, especially Bitcoin meetups, and that's where I met McLean. And yeah, so I've been fascinated by Bitcoin, decentralized applications, things like that, and that's what sparked this ZeroDB idea. At what point did you know? Was that the first meetup, or that's when you first met, but at what point did this discussion happen? What was the problem you guys were solving? What was the situation there? Yeah, we didn't really start working on ZeroDB at all, really, till I guess after a year of working on other stuff, sort of in that blockchain area. We were looking a lot at sort of these newer frameworks like Ethereum or Storj. And actually, how ZeroDB came about, we were thinking, okay, if you have a decentralized application, presumably, like every other application,

Starting point is 00:09:46 it's going to need data, and some of that data is going to be sensitive or need to be private. But if you don't know where that data is sitting, and you're not trusting the server necessarily, like how can you have that data be secure and usable in this decentralized world? And that was sort of the original problem that ZeroDB was solving.

Starting point is 00:10:08 Although, you know, we've ended up building it for certainly a more traditional client-server architecture. But that's the original origin story. So from there, what was the conversation like? Was it like, hey, we should create this technology, or hey, there's these problems and we should open source it, we should create a business together? Like, how did that happen? Well, I think throughout all these different experiments, we had been planning on eventually having something that would become a business. I think that was in the back of our minds the entire time, while also, you know, having the

Starting point is 00:10:42 flexibility to be able to work on cool stuff. And we, so originally when we first had the idea, we didn't really think that it would be, that it would work. We just thought it would be too slow. And eventually we got around to sort of hacking together a really quick and dirty prototype and realized, hey, actually, you know, you can, you can tweak some things and play around with it and actually make it, make it performant enough that you can use it for, you know, real world stuff and production applications. And at that point, this was back in March of last year, so 2015.

Starting point is 00:11:15 At that point, we just thought it was kind of an interesting little technical trick. But we posted about it on Hacker News and it blew up there and hit the front page twice in one week, which for me, as a, I would say, very frequent reader of Hacker News, was a cool thing. And that was when we sort of realized, okay, hey, you know, this could be, you know, something that could grow into an actual business. Well, I was just going to ask, you know, where it goes from there. I mean, we have more and more of these open source projects that have a GitHub page and they also have an AngelList page.

Starting point is 00:11:50 And so it's a bit of a newer phenomenon and we're talking to more co-founders slash open sourcers. It almost sounds like I said, oh, sourcers. Open sourcers. And so, yeah, I would just wonder, is like, okay, so you have this cool technology. You obviously both want to do startups or be in a startup or start a startup.

Starting point is 00:12:12 And you've had something that's interesting on Hacker News, which shows it has at least some technical merit. Where do you go from there, in March, to here we are in January of the following year, and now it's, you know, a business? So what's the process, the steps? What's that look like? Yeah, so what happened was, we had obviously the really quick and dirty prototype, and once we realized, hey, there's a lot more interest in this beyond just the two of us

Starting point is 00:12:43 from this Hacker News post, we sort of built it out a little bit more, started giving access to some of our friends that were interested in potentially building stuff on top of it and kept sort of on product development for a few months. And then I guess the summertime, we started reaching out to some banks because we realized beyond decentralized applications, which are really cool in general, but I think in reality, it's not clear.

Starting point is 00:13:11 There's not a whole lot of regulatory and compliance constraints and see, you know, if there's a use case for an end-to-end encrypted database in that area. And obviously that's where you could potentially build an actual big business, as opposed to in this decentralized world, which is, I think, very interesting but still maybe a few years away from actually really taking off. So I started having those conversations, realized, you know, this could be a tool that could help banks move a lot of their on-premise infrastructure to the cloud. And so in general, banks have been very late adopters of cloud infrastructure, and there are not a whole lot of banks that are running infrastructure in AWS or Azure. The big reason for that is because they don't want to give up control over their data to a cloud provider, which even if they're encrypting their data at rest, if they're actually querying it,

Starting point is 00:14:18 they're going to have to send the key to the database server at some point. But with ZeroDB, you can have the keys on-premise, still have the data be queryable, and so potentially offload a lot of that on-premise infrastructure to the cloud. And obviously, there's a lot of economic benefits to that, as well as scalability and performance. So went through a lot of those conversations. It actually got announced on Monday of this week, so I can say it publicly. We're in FinTech Innovation Lab out here, which is a really good chance for us to talk with the 15 or actually 20 banks that are involved with this program here and potentially help them with that long-term cloud strategy. Tell us about FinTech Innovation Lab, exactly what that is and what it entails for you guys. Sure. So FinTech Innovation Lab is sort of an accelerator, but not really the way most people think of accelerators out in the States. There's one in New York; we're in the London one, of course. And it's really sort of a mentoring slash sourcing mechanism for them to find new interesting technology that they can potentially use internally or help go to market. So we're working and have the opportunity to work closely with a lot of senior IT executives and application developers inside of a lot of the leading global banks and figure out where ZeroDB would potentially fit within the organization. What the other use cases are beyond the ones that we've already identified.

Starting point is 00:16:04 And there's 15 other companies in this program with us. It's divided into three different tracks, five companies in each track. We're in the tech track, which may be obvious. There's a retail banking track and then a commercial and investment banking track as well. So you said you're in London. Where are you normally at? I assume London's foreign to you, right? Yep.

Starting point is 00:16:27 This is the first time I've really spent any time in London. We're usually out in the valley, so we're in Mountain View typically. Gotcha. I was going to say, you met at this Bitcoin meetup. You're interested in blockchain and decentralized things and obviously security as well. And then decide to get together and build some technology, start a startup, and it ends up not specifically using the technologies that Bitcoin relies on. But nonetheless, here you are. And I guess I'm just curious, going back to the Bitcoin, if you guys, maybe Michael, you

Starting point is 00:16:59 can address this one first. Are you guys as bullish on Bitcoin as you were back when you first started meeting up in San Francisco? Well, probably not as bullish as we were initially, but we still strongly believe in Bitcoin and in decentralized applications and things like that. That said, I would say now Bitcoin and Bitcoin infrastructure growth is at a much more sustainable rate, rather than this explosive craziness. Have you guys seen any interest from the fintech

Starting point is 00:17:42 and the banks about these types of technologies, like the Bitcoin, the blockchain-based technologies, for infrastructure or anything like that? Yes, certainly there is some interest. Interestingly enough, they recently called blockchain the B word. As what? The B word. Oh, B-word. Is it a bad thing or a good thing? I don't know.

Starting point is 00:18:13 It's pretty much, this word is used too often, I guess. And they jokingly call it B word just to emphasize that. That's funny. People are talking about it so much, it's become the B word. Oh, I see. Gotcha.

Starting point is 00:18:26 Well, I think this might be a good chance to step back, take a break. Of course, we're here to talk about ZeroDB, which is your guys' startup and your open source project. We'll probably start with the conversation around the open sourcing of it and why it's open source and all those things. And then we'll dive deep into not just what it is, but how does it work? Why is it cool? And all those things that developers love to hear about. So let's step back for a moment, hear from one of our sponsors, and we will talk about all those things on the other side of the break.

Starting point is 00:18:58 We know you listen to other podcasts. Don't worry, we're not upset. We know the Changelog isn't the only podcast in your list. And you know what? You may have heard advertisements on other shows about other hiring platforms and other places like our friends at Toptal. But let me tell you, there is no match to how invested Toptal is in both sides of the equation. You have companies out there who need great engineers, great designers.

Starting point is 00:19:25 And you have great designers and great engineers out there needing great opportunities. And Toptal plays both sides of that fence very well, helping make sure the right kind of opportunities are there and the right kind of developers and designers are there. And if you're a CTO listening to this or someone on a team who knows you need to expand and add more people to help reach your goals, Toptal is the best place to go and find the best designers and the best engineers out there. And if you're one of those best engineers or best designers looking for great places to do great work and freelance and travel the world and have a lot of freedom, but also have the same kind of support you would want from working somewhere, Toptal is that place. You can blog with them.

Starting point is 00:20:08 You can travel the world with them. They have meetups all around the world. They absolutely love to encourage and to support and help the developers and the designers that are in their network all around the world grow and be better. That's t-o-p-t-a-l.com. Once again, t-o-p-t-a-l.com. But if you want a personal introduction on either side of the fence, whether it's an introduction to find the best designers and developers, or if you're a designer or developer looking for great opportunities, get in touch with me. I'd be glad to give you a personal introduction. Email me at adam at changelog.com. And now back to the show. All right, we are back speaking with McLean Wilkerson and Michael Egorov of ZeroDB, about ZeroDB. And as we teed up before the break, our first question, as is often our first question on a show about open source software, is: why is ZeroDB open source?

Starting point is 00:21:09 Well, I think it's, I mean, well, there are a lot of reasons. I mean, fundamentally, obviously, we're developers. We believe in open source and we want the things we work on to be open source. But even beyond that, just from a business perspective, I think in 2016, now that it's the new year, I think it's just super hard to have a closed source proprietary software business, particularly when you're dealing with infrastructure tools or dev tools. And even more so when it's, you know, something that's so security focused, like ZeroDB, you of course don't want to, you know, have security through obscurity. So obviously, the more eyeballs we can get on ZeroDB, the more we can be sure and confident in its security

Starting point is 00:21:48 and in our implementation of it. And then beyond that, just looking at where the, you know, where the software industry has gone, you know, if you're a developer and you're selling to developers, developers' first choice, they're going to look at an open source tool. And if you're closed source, you're sort of starting behind the eight ball. So I think it's very difficult to build a closed source tools company today. Looks like you've selected the AGPL license.

Starting point is 00:22:16 Would one of you speak to that decision and what all that implies as an open source license with ZeroDB? Sure. So we've looked very closely at 10gen and MongoDB as kind of a model for what our business could potentially look like. So with AGPL, obviously it's an open source license. People are free to use ZeroDB for their projects. The requirement there is if you're building on top of it

Starting point is 00:22:41 or making extensions to it, you need to open source what you do on top of it as well. And for a lot of people, that's fine. For some people, particularly maybe enterprises that aren't as into the open source movement quite yet, they prefer to have what they build still be closed and proprietary. So in that case, we can sell them a commercial license.

Starting point is 00:23:02 And then we're also considering if we include extra features and potentially like an enterprise edition that we could license as well, although we don't, we haven't done that quite yet. We're still thinking around that. And then obviously, you know, on the business side, you can, in addition to selling licenses, you can sell support and things like that. I'm curious on that licensing front, like, you know, Jared, I don't know about you, but I still feel like licenses to me, like they run together.

Starting point is 00:23:31 I have a good memory. I try my best, but it's like documentation. I have to go back and look it up. And I'm just wondering McLean for you, if, if you had any advice from somebody, like did you have a lawyer step in and sort of walk through that with you? So as we have people listen to the show, I'm just wondering if there are resources out there, aside from say the ones that are known, like choose a license from GitHub, which is really helpful. What resources did you use to, to, to make the decisions you just made?

Starting point is 00:23:58 Yeah, we did. We did talk to a lawyer just to sort of sanity check everything we're doing. I wouldn't say we went to a lawyer to start. I mean, there's obviously tons of resources online in terms of advice around building an open source business and what the different models might look like. Of course, even though it's not super fun reading through the actual licenses, particularly the one that you choose, you should probably do that at least once or twice

Starting point is 00:24:22 so that you actually understand. Waste of time. Waste of time. Nobody reads them. And then obviously, you know, talking to people that have done an open source business and have used a particular license. One caution I would say is it's generally better, at least in my opinion, to choose sort of a standard license. So something that's known and people are relatively familiar with. Obviously, the MIT licenses, the Apache licenses, the GPL ones. As long as something's standard. I think where you could potentially get into trouble is

Starting point is 00:24:55 you have some sort of bespoke license, and that makes the hurdle for someone actually using your software that much higher, because then, particularly if it's a company, they're going to have to have their legal department look at it and understand it. Right. Nothing against lawyers, but you involve those guys, things get slow. You have to do things that you wouldn't normally do, and you can't be quite as agile around decisions. Yeah, and one of the benefits of being open source in the first place is lowering the adoption hurdle, where, you know, if you're forcing people to go through the legal department, that sort of defeats that entire purpose.

Starting point is 00:25:30 I like that process of looking at a company that you, you know, that you like their model, you've seen it. It seems to work, at least with regard to the way they're going about building a business around open source, and then just saying, well, I like, for 10gen, like you said, that's the kind of model that we want to follow with our business. And so let's check out what they're doing and use that as a starting point, at least,

Starting point is 00:25:53 to having conversation with lawyers. Or if you're able to get somebody's ear at 10gen who's in those decisions and ask, how do they feel about the fact that they have this license and how is it working out for them? I think it's a great way of going about it as opposed to just starting with a blank slate. Yeah, and I think the interesting thing

Starting point is 00:26:10 also in the open source businesses, it's in reality, the entire life cycle of the entire sort of this open source movement is very early, particularly thinking around business models with it. So I don't think anyone really has the answer quite yet. So everyone's still sort of experimenting and seeing what might work, what doesn't work and trying to figure out how you can build, you know, a sustainable business on, on top of open source software.

Starting point is 00:26:33 It's further down our notes. So Jared and I put together notes for each show and it's further down the list than the first question, which was the obvious one, why open source? But it makes sense to bring up here, which is the patent pending algorithm piece. How does that play into, and having a patent on software, I guess, or I'm not sure what the exact patent is for, but how does that play into the open source nature of this project?

Starting point is 00:26:57 Yeah, so we have a provisional patent filed, and I think the reason you file a provisional patent is to sort of get a date on it, and then you have up to a year after you file the provisional to apply for the actual patent. I would say that was done out of an abundance of caution. We don't know. Obviously, it was open source. The code is available and free for people to use.

Starting point is 00:27:21 I think also in practice, it's unclear that software patents really work. You know, and so it's sort of an abundance of caution, a little bit of a check the box from an investment perspective. You know, obviously a big question you get when you're going to raise money is, you know, what does your IP look like? And is that defensible? I wouldn't say the patent is really a core piece of our strategy going forward. But it's something that we did just in case. You know, obviously when we were thinking through this, we didn't know exactly what things would look like, you know, a year from now. So we just wanted to have that

Starting point is 00:27:59 option if it was something that we thought would be valuable for the business going forward. That makes sense. Interesting. Well, let's dig down into what ZeroDB really is and what it's here to offer us as open source developers, what it does and how it works. You call it an end-to-end encrypted database. Michael, could you unpack that for us and tell us what ZeroDB does and why it's different than stuff that's already out there? Sure.

Starting point is 00:28:28 What it allows you to do is pretty much to use this database to do queries without the database server knowing anything about your data or knowing your keys. So all the keys always stay with the client. And still from this client, you're able to query your data, obviously without pulling the data down to the client. So for those out there who may be thinking, well, couldn't you just use full disk encryption? There are offerings out there.

Starting point is 00:28:58 You have record level or column level encryption. You have full disk encryption. And so I just want to put a stressing point on how this is different than those things. So could you compare and contrast it to like, well, I'll just encrypt the entire disk on my server and I'm good to go? Yeah, so basically when you encrypt your entire disk

Starting point is 00:29:17 on the server or use column-level encryption, every time you query, at least, you expose your encryption keys to the server. So these keys are at least in memory. And any attacker who is sitting on the server can pretty much take a snapshot of memory and figure out where the keys are and decrypt the data. And it wasn't so much of a problem when the threat was somebody physically stealing your hard drives, but now in a very networked world, most attacks are remote,

Starting point is 00:29:58 so if somebody infiltrates the server, this is actually really a problem. There's a nice little analogy, which isn't necessarily exactly technically accurate, but if you compare it to like a physical vault, so you have your valuables inside of the physical vault, you're going to lock the door. You're not going to hang the key up on a nail next to that door and go home. But in a way, not exactly, but in a way, that's what you're doing when you're encrypting data at rest. So you're still having to send the key over to this database server where your data is,

Starting point is 00:30:28 and that's the window that we're trying to close with ZeroDB. I think that's definitely a helpful analogy. Like you said, probably not exactly 100% accurate, but I think that does help paint the picture a little bit. One thing that's always advised

Starting point is 00:30:44 with security is the defense in depth principle, which is to say, don't just lock the door, but also don't just have a key, also have a padlock and then also do this and just add layers of depth to that security. Would you guys consider ZeroDB in that case a layer of security? Or would you say, if you're using ZeroDB,

Starting point is 00:31:02 we don't need the full disk encryption. You obviously can't really even do column encryption, I would guess, but are these instead-ofs or are these alsos? Well, I would say with ZeroDB, we call it encryption in use. And so by default with encryption in use, it's still encrypted at rest. It's still encrypted in flight, you know, as the data is moving between client and server.

Starting point is 00:31:22 And of course you could also layer TLS or SSL on top of that if you liked. But I would certainly not suggest that someone just use ZeroDB and totally forget about the rest of their security posture. That's definitely not what we would advise. You don't want to go on record and say that, is that what you're saying? Yeah. What we tried to do with ZeroDB was, so if you assume that the database server is compromised,

Starting point is 00:31:49 how can you still mitigate the potential for your data to be, the actual underlying unencrypted data to be stolen on the server? So that's the way that we came at it. And the answer is, your server should never know anything, basically. Yeah, I mean, if you never send the encryption keys to the server and assuming you're using strong encryption like AES-256,

Starting point is 00:32:16 then all an attacker is going to see is just a bunch of encrypted gibberish. So how does it work? You got the server that's dumb in the sense of the encryption technology. I'm guessing it's just storing ciphertext. How does it work? Tell us how it works, guys. Yeah, so basically we,

Starting point is 00:32:40 apart from encrypted records, we have an encrypted index, and that's what makes the thing work. And the way it works is really easy. So the encrypted index is a tree-like structure, pretty much a B-tree. And pieces of this B-tree, the buckets, are encrypted before they go to the server.

Starting point is 00:33:09 So the server observes only already encrypted buckets. And the way we do queries, the key piece in it is that the client traverses this B-tree remotely. So pretty much it pulls down the root of the tree, decrypts it, figures out what piece of the B-tree to request next, requests it, and in several steps it finds what it was looking for. Of course it's not all that easy, because it becomes a little bit more involved when you need to do some operations like set intersections or things like that. But the key piece is this traversal of the tree from the client. So let's talk a little bit about the ergonomics of the database or the kind of database it is,

Starting point is 00:34:02 because I think you mentioned somewhere that, you know, ZeroDB is an end-to-end encrypted database, and you also have protocol mentioned. But at the end of the day, it has to be a database, right? So what kind of database is it? Is it like a SQL-compliant relational database? Is it a key-value store or, you know, a document store? What kind of database is it?
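
The client-side traversal of the encrypted index described a moment ago can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ZeroDB's actual code: the `Server` class, the bucket ids, the JSON node layout, and the `lookup` function are all hypothetical, and the hash-based XOR cipher is only a standard-library stand-in for a real cipher such as AES-256. The point is that the server only ever stores and returns opaque buckets, while the client decrypts each one and decides which bucket to request next.

```python
import bisect
import hashlib
import json
import secrets


def _keystream(key, nonce, n):
    """Toy keystream from SHA-256 (a stand-in for a real cipher, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(key, obj):
    """Serialize a tree node to JSON and encrypt it on the client."""
    plain = json.dumps(obj).encode()
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plain))
    return nonce + bytes(a ^ b for a, b in zip(plain, stream))


def decrypt(key, blob):
    """Decrypt a bucket fetched from the server back into a tree node."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)))


class Server:
    """Hypothetical server: stores opaque buckets and never sees the key."""

    def __init__(self):
        self.buckets = {}
        self.requests = 0

    def put(self, bucket_id, blob):
        self.buckets[bucket_id] = blob

    def get(self, bucket_id):
        self.requests += 1  # each fetch is one network round trip
        return self.buckets[bucket_id]


def lookup(server, key, search_key):
    """Walk the tree from the client: fetch, decrypt, pick the next bucket."""
    node = decrypt(key, server.get("root"))
    while "children" in node:  # internal node
        i = bisect.bisect_right(node["keys"], search_key)
        node = decrypt(key, server.get(node["children"][i]))
    i = bisect.bisect_left(node["keys"], search_key)  # leaf node
    if i < len(node["keys"]) and node["keys"][i] == search_key:
        return node["values"][i]
    return None


# Build a tiny two-level index; every bucket is encrypted before upload.
key = secrets.token_bytes(32)
server = Server()
server.put("root", encrypt(key, {"keys": [20], "children": ["left", "right"]}))
server.put("left", encrypt(key, {"keys": [5, 12], "values": ["alice", "bob"]}))
server.put("right", encrypt(key, {"keys": [20, 31], "values": ["carol", "dave"]}))

print(lookup(server, key, 12))  # bob
```

In this sketch each lookup costs one round trip per tree level (two here), which is why a design like this trades extra client-server chatter for the server never holding the key.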

Starting point is 00:34:23 How do you work with it? It's more like an indexed document store, somewhat similar to MongoDB. Interestingly enough, we base it on something called ZODB, which is a part of the Zope framework, if you remember this thing. And ZODB wasn't super popular at the time when it was a thing. The reason, I think, is because it's more like a kit for building databases rather than a database in itself. So we used that to build ZeroDB. The name is similar, as you can recognize. And from that, we inherit such features as ACID compliance. We can do replication and things like that. Again, inherited from ZODB. And yeah, that pretty much explains how we have all these features. We just add the

Starting point is 00:35:29 end-to-end encryption piece on top of that. That makes a lot of sense, because what I was starting to think when I was reading through what you guys are up to with ZeroDB is, at the end of the day, like, wow, there's a lot of stuff you have to build if you're building a database, right? Like, your key differentiator is the encrypted part of it, but you're also offering a database, right? And I was like, did they just decide to redo all these things? And now I'm finding out Zope object database,

Starting point is 00:35:56 it has transactions, history undo, transparently pluggable storage, all these things that you guys could just build on top of and leave that to somebody else, so to speak, or at least the foundation is there for you, which looks like it's a Python-based thing. That's the beauty of open source. Yeah. There you go. So good idea.

Starting point is 00:36:18 So is that a fork, though? Are you using actually ZODB behind the scenes and you're following and tracking that project? Or is this something that you've forked and you're using your own version of it? Yeah, basically, the way ZODB is built, it's super modular. So you can actually use its pieces as a library and you will still be okay. So that's how we are using it. Of course, we could fork it, but it seems like it wasn't necessary so far. I guess maybe another question on that is,

Starting point is 00:36:52 since, Jared, you mentioned it's Python-based, what made you choose this database? Why this one and not some of the other options out there that are available to everyone? It's actually being built in Python. It allows us to move pretty quickly, to develop things quickly, faster than if we would do that in C.

Starting point is 00:37:15 And if you think about performance, well, it seems like performance is good enough. I mean, all the limitations come not from Python by itself. You have encryption overhead, things like that, and that pretty much outweighs the fact that the database is written in Python. And actually, ZODB, if you use that properly, is pretty fast. Yeah, I think a big piece of it was just, as you said before, we don't want to have to rebuild a new database from scratch. We lean on something that's existing, and ZODB is, as Michael said,

Starting point is 00:38:03 a good kit for doing that. But, you know, as you pointed out, I think, before, we had the word protocol there. So in principle, you know, you could build a database from scratch that does ZeroDB. You could potentially build it on top of another database, on a SQL-based database, for example. That's what I was thinking about when you said that. Obviously you're at a certain point, but if someone else preferred a different database, could they take the same principles or the same things you've done with the idea of it being a protocol? Could they apply that to, you know, XYZ database or a different preference at least? You certainly

Starting point is 00:38:39 could. And I would say it's not a trivial thing to do. And one thing we've actually thought about is, is there a way for us to potentially sit alongside of existing databases, particularly when we're talking to a lot of the banks, like they're running Oracle, MySQL, Postgres, a lot are still running even DB2 and Sybase stuff. So is there a way potentially for us to sit alongside of that? Of course, a large bank isn't going to rip out all of their existing database instances and replace them with ZeroDB as much as we would like that. So in that case, I mean, we don't have this ready today,

Starting point is 00:39:15 but our thinking around there is potentially they can keep using Oracle, for example, for the storage, throw encrypted records in there and have a ZeroDB index that would sit next to that. And actually when they want to go query those encrypted records, they'll go through our index. So I would say we're still forming our thoughts around that piece, but that's a possibility for the future. Yeah. Well, let's be honest. I mean, this is fairly new, like in this last year. So in 2015, this whole entire thing for you guys was born, right? Like, you guys connected in March, it sounded like, if I'm painting back the history, and then by December we had an open source project that was spawned off, and now a business. So you're still

Starting point is 00:39:54 in the forming-your-ideas, innovation stage, to say the least, I'm sure, right? Yeah, certainly still super early days and working through everything, both from the technical engineering product development stuff to thinking through the business model stuff that we talked about earlier. Gotcha. Well, let's take another break. When we come back from this break, we're going to cover the deeper sides of the client side of this database. So stay tuned. We'll cover that when we come back. Our friends at DigitalOcean,

Starting point is 00:40:28 simple cloud hosting built for developers, launched a new feature recently we absolutely love. It's called Droplet Multi-Create for easily launching up to 10 servers at once. You can deploy multiple droplets when you create your droplets with the exact same configuration. And this is a huge feature. We absolutely love it.

Starting point is 00:40:44 Head to digitalocean.com, and when you sign up, use our code changelog to get a $10 hosting credit. All right, we're here with McLean and Michael, talking deeply about ZeroDB, the protocol, the database, and all the odds and ends of this. And, you know, the encryption is handled by the client, and right now there's only a Python client. So it seems like you're bullish on Python, which is a good thing. And at one point, Michael, you mentioned you could do it faster in Python than you could in C.

Starting point is 00:41:18 But I guess the question we have here is roughly around the clients. Will this always be the case? Are you planning other clients? And how much work does it take to create a new client, let's say for Ruby or JavaScript, for example? All right, yeah, that's a very interesting question. So obviously now we support a JSON API so that you can interface with it from anywhere. But that means that you have to run this JSON API server on the client machine, which is not always very convenient. And the first thing people are requesting is actually a JavaScript client. Right?

Starting point is 00:42:04 So, of course, nothing prevents us from writing that. The easiest way is to port Python code using something like PyPyJS, and apparently it can work, but probably the more proper way would be to actually write a JavaScript client. And that's something we are planning. Another thing is that in this financial world, when banks want to outsource their databases to the cloud and do things like that, they often use Java.

Starting point is 00:42:47 And they obviously need some way for their Java code to interact with ZeroDB. And for that, we would need some JDBC connector. And that is possible to do. And I think with Java, we can go with Jython actually. So that begs the question of why at this point, maybe on the client side, at least, maybe a C library that's portable and can be interfaced and wrapped from these other higher level languages might in the long term become faster-moving. Because it seems like there'd be a lot of smarts in the clients,

Starting point is 00:43:31 and so you may end up with six implementations of the same encryption stuff. Just your thoughts on that? Yeah, we'll see how it goes. I guess we will start from porting from Python code, which is certainly possible to do. You can do that with JavaScript, you can do it with Java, you can do mobile clients from, like, Python ported to there. But long term, you may quite well be right. I guess it kind of depends too, Jared. I mean, it seems like this is for everyone, but obviously financial tech has the most

Starting point is 00:44:14 or financial people have the most initial benefit from this. But, you know, obviously other developers are thinking, how could I use this for encryption on my own account? It kind of depends maybe on their motivation, on who they're focusing on. So it might be open source, but they might be focusing on financial tech, and they're kind of hacked with Python, for example. That's right. Yeah, I mean, I'm just thinking about it in terms of other databases. Let's take, again, MongoDB as an example. Like, as a business, it's in MongoDB's, or 10gen's, best interest to have as many client libraries

Starting point is 00:44:47 as possible and keeping those up to date and awesome and everything like that. And as the technology matures, they have to maintain

Starting point is 00:44:58 all of those, even though they're open source and of course, the community can do all this stuff. It's in their best interest to do that. And similarly,

Starting point is 00:45:04 I would think that it's in ZeroDB's best interest to have as many clients as possible. Is that the case, guys? Yeah, that's definitely the case. Yeah. And so then my question was like,

Starting point is 00:45:17 your guys' clients have to be really smart, right? Lots of surface area of code because of all that you're doing client-side. And so I just would see that that might be a bottleneck for you, like, you know, software wise potentially. So is it an issue? I mean,

Starting point is 00:45:35 yeah, obviously, I think that's a good insight.

Starting point is 00:45:40 I think in general, like it's always a struggle, you know, allocating resources when you're a super early-stage company and thinking long-term versus what are some quick wins. So yeah, that's certainly one of the challenges of doing any business, particularly a software business. Is that a place where you guys are looking for help from the open-source community, or is that more like

Starting point is 00:46:04 your bread and butter? Actually, the open source community already offers some help in that. So we've seen some people who want to port ZeroDB to mobile platforms for their own purposes, and we are happy to accept this help, because this was one of the reasons why we open sourced. Yeah, I mean, if someone wants to build an alternative client, we'd be super happy to see that. And we'd definitely support them in that effort. Cool. So speaking just more about the client side of things,

Starting point is 00:46:40 since you're putting so much of the smarts in the client and you have the decryption is client side, right? The key handling is client side. And the server is an encryption store, basically. There's some smarts in there, but like you said, it doesn't have the secrets. It can't share the secrets. Is it just kind of moving the ball around a little bit? Because now instead of your server being the source of truth to a certain degree, and if it gets compromised, you're in big trouble. If one of your clients gets compromised, is the whole game up?

Starting point is 00:47:13 Or is there something that segregates where if a client gets compromised, it's not a big deal? Well, even today, you have this problem with the smart server model. If your client is compromised, you still get your data exposed. But we just close one window on the server. But this is actually not the only thing we do. On the server, you can, of course, throttle the data so that when a client is compromised, the attacker couldn't steal absolutely everything in just a little bit.

Starting point is 00:47:57 Another thing is we have this sharing or proxy re-encryption piece. The way it works, we can enable somebody else, be it third party or yourself using a different device. We can enable the third party to query your data still without the server knowing anything. So let's say you can share data with your friend, and your friend can still do the queries which you allow him to do.

Starting point is 00:48:40 And the server doesn't know anything, but the server can revoke the access so that your friend with his key cannot access this data after a day or two. How does that work? Yeah, so it's based on a technology called proxy re-encryption. It's a pretty young family of encryption algorithms. The first one appeared in 1998. The way it works is, let's say you want to share your data with somebody with a different key pair. You take their public key, your private key, and calculate a special thing called a transformation key. You give this transformation key to the server. The only thing the server can do with the transformation key is transform data from being encrypted to you

Starting point is 00:49:31 into being encrypted for the third party you are sharing your data with. So we combine this encryption primitive with our query algorithm, and this allows us to pretty much share data on a granular level. So you can say, I want to share everything matching this query with the guy who has that public key.

Starting point is 00:49:56 And then that guy can pretty much query the data in that subset until this transformation key expires. When the transformation key expires, the server removes that and that guy cannot query this data anymore. So for example, if you are afraid that the third party can get the key compromised, you limit the time span of it to be very short. And you don't have to re-encrypt all your data again and again because you are not sharing the key with which actually the data are encrypted. I mean, that sounds like a real advancement.
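
The transformation-key idea can be illustrated with a toy version of the 1998 Blaze-Bleumer-Strauss (BBS98) scheme, built on ElGamal. This is a sketch under loud assumptions and is not claimed to be what ZeroDB uses: the group parameters are tiny and insecure, the function names are made up for illustration, and in BBS98 the transformation key is derived from both parties' private keys, whereas the scheme described above needs only the recipient's public key. What it does show is the core property: the server holding the transformation key can convert a ciphertext for one key into a ciphertext for another without ever seeing the plaintext.

```python
import secrets

# Toy group parameters (far too small for real use).
P = 2039  # safe prime, P = 2*Q + 1
Q = 1019  # prime order of the subgroup
G = 4     # generator of the order-Q subgroup of Z_P*


def keygen():
    """Return (private, public) with public = G^private mod P."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    """BBS98-style ElGamal: (m * G^r, pk^r) for an integer m in 1..P-1."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def transformation_key(sk_from, sk_to):
    """rk = sk_to / sk_from mod Q (toy: needs both private keys)."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk, ct):
    """Run by the server: turns G^(a*r) into G^(b*r); m stays hidden."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def decrypt(sk, ct):
    """Recover G^r from the second component, then divide it out."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)
    return (c1 * pow(g_r, -1, P)) % P


alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()

ct_alice = encrypt(alice_pk, 1234)             # stored encrypted to Alice
rk = transformation_key(alice_sk, bob_sk)      # handed to the server
ct_bob = reencrypt(rk, ct_alice)               # server-side transformation
print(decrypt(bob_sk, ct_bob))                 # 1234
```

In this toy model, revoking access amounts to the server deleting `rk`: the stored data never has to be re-encrypted, because the recipient never held the key the data was originally encrypted under. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.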

Starting point is 00:50:40 Proxy re-encryption, that just sounds cool, right? It does. Very cool. It's a very impressive term. We support proxy re-encryption. Of course. Yeah, it's been around for not a super long time. It's been around for a while, and there's actually been a couple of companies

Starting point is 00:50:56 that have tried to commercialize it in sort of a file-sharing context where you have files that are sitting up in the cloud that are encrypted under your keys and you want to share them with someone else. And I wouldn't say that's been super successful or hasn't really taken off, but I guess what's potentially interesting

Starting point is 00:51:14 about proxy re-encryption in the context of ZeroDB is that we can pair it with a lot of our other query protocols so you can share stuff on a much more granular level. Yeah, I was going to ask about use cases, because it was, like, incredibly impressive in my mind, but I kept thinking, like, how exactly would somebody use this to, like, an application advantage? I think file sharing is, you know, the one that comes to mind. Are there other ways that you guys see that being used that maybe even isn't currently being used or that can go to market? Right. So yeah, I mean, one kind of cool example would be, let's say, a healthcare app. So let's say you have a mobile application for storing people's personal medical

Starting point is 00:52:00 records. And obviously those are quite sensitive. If you're a user of this app, you don't necessarily want that app provider, that service provider to have access to your medical information, your medical data. So if someone were to build something like that on top of ZeroDB, they could guarantee to their users that their PHI was totally secure and totally owned by them. They control the keys. And of course, if they need to share it with, let's say, an insurance provider or their hospital or their doctor, they can use the proxy re-encryption piece for that. So I always think that's a pretty cool example for how something like that could be used.

Starting point is 00:52:42 Yeah, that is very cool. And that just brings up my thoughts of HIPAA and other such compliance things. Do you guys run into any of those? I mean, like you said earlier, your main customer currently is banks. And so HIPAA is not a thing you guys got to worry about. Perhaps PCI. I don't know what other regulations the banking industry is under. You guys probably do. But have you come against regulations so far when it comes to ZeroDB? So in a way, obviously, a lot of our customers have to deal with these things. Us ourselves as a company or how we think about our products, we want to be sort of an enabler. And we're obviously providing infrastructure, a developer tool for people to build applications

Starting point is 00:53:27 that potentially could be compliant with these types of regulations like HIPAA or Dodd-Frank type stuff in financial services. And you have a whole other set of regulations over in the EU. So we don't, you know, we're obviously not building applications ourselves at the moment.

Starting point is 00:53:43 So that's not something that we're having to directly deal with. But obviously, to the extent that our customers could potentially be using ZeroDB to, you know, be in compliance with these regulations, that's something that we at least need to be aware of. One interesting use case I can give you from the EU has to do with data sovereignty laws. So let's say you're a bank in Europe and you operate in a bunch of different countries and then you generate some customer data in Germany. It's very difficult for you to move that data,

Starting point is 00:54:15 that customer data in the clear across country borders, even inside of Europe. If you encrypt it, and it varies country by country, for example, Switzerland is quite strict, so it doesn't work there. But in general, if you encrypt that data, you have more flexibility in terms of what you can do with it, where you can move it. But of course, historically, the problem has been that once you encrypt it, you can't use it anymore. You can't query it. So a lot of banks actually will just have a data center in every single country that they operate in because of this data sovereignty issue.

Starting point is 00:54:46 Whereas if they were to use something like ZeroDB, they could have the keys in-country, encrypt the data in-country, ship it off to a data center in Luxembourg, for example, still have it be queryable, and only ever decrypt it in-country as well. So in that scenario, you could consolidate 25 data centers in 25 countries down to potentially a much more manageable handful. And that is actually one of the reasons why we are in London at the moment. We see a bunch of regulations which don't stop us from operating, but actually help us. And we are actually exploring that. So you said you're at the fintech innovation

Starting point is 00:55:27 lab in London. That's what you're doing there now? That's your incubator that you went through, right? Yep, that just started, actually, this Monday. So we're sort of diving in. This is new for you, though, like this? Yeah. Yeah, we have a few more questions on the tech side of things, but since this is the topic now, I was just talking to Jared behind the scenes about whether we should bring this back up again. So what are your goals, I guess, for this? How long is this process for you to be part of this incubator? Yes, so the FinTech lab lasts until mid-April. So we'll be primarily here in London, obviously.

Starting point is 00:56:00 We actually sit in Canary Wharf, which is sort of one of the banking districts in London where a lot of the banks are located. We'll be traveling a little bit to the EU to visit some of the banks in Italy and Germany, for example, and then occasionally back to the US, obviously, because we have things going on there as well. But it's a standard sort of three-month program, which is how long a lot of these accelerators last. It's actually run by Accenture. So they're obviously quite helpful from having experience working with a lot of these banks and selling into banks, which, you know, we don't have to get into. But that whole enterprise sales process, trying to sell into banks, is a whole other bugaboo, and in general it's quite difficult.

Starting point is 00:56:47 I mean, is your goal to move the company forward in this incubator or is your goal to move the technology forward in this incubator? I mean, is it one and the same? Yeah, they pretty much go hand in hand, I would say. You know, obviously it's important for us to make the product better

Starting point is 00:57:04 and not only more secure, but a big focus obviously for us is performance and limiting the trade-off that people are making. So obviously you have encryption and decryption overhead and things like that, but you want to make it as drop-in as possible so that people aren't really getting hit on the performance side when they're using something like ZeroDB as opposed to just having data in the clear or just encrypted at rest. And obviously when you're dealing with banks that have just massive amounts of

Starting point is 00:57:32 data, performance is a pretty big concern for them. So yeah, I mean, I think as you improve the product, you're moving the business forward at the same time. We have a few more questions on the performance stuff, but prior to that, I'm just thinking how, and maybe the question is really easy to answer, but how do you or how have you been able to financially set yourselves up to be able to take this time off and still live your lives

Starting point is 00:58:00 and still do whatever you do? I have no idea if you guys are families, dads or whatnot, so I'm just making assumptions. Human beings have lives, so how do – obviously we have lives. But how are you guys able to take this time to spend to develop the company away from Silicon Valley where you're normally at? What have you done? Do you have backing?

Starting point is 00:58:22 Do you have funding? What is that scenario like for your company? Yeah, so we've actually just been bootstrapped so far and been using our own money. I'd say it's just the two of us, co-founders and developers, at the moment, so we don't have a ton of expenses, you know, outside of, of course, you have to support yourself and you have to eat and you have to have somewhere to live, right? So there's some baseline sacrifice too. So it sounds like you're okay with that? Yeah, absolutely. I mean, I think the reality of doing a startup a lot of times is you're going to go through that ramen stage where you're of course not able to spend a lot of money, because there's just not a lot of money

Starting point is 00:58:59 there. We are hopefully, you know, fingers crossed, in the next couple weeks going to have a small angel round. That'll obviously help quite a bit and help us, you know, potentially hire a third engineer to move even faster. So potentially April, May time frame that'll happen, whenever the incubation stage is over? Yeah, so we would raise a round sometime early summer, late spring, towards the end of this. And so, Jared, chime in here wherever you want, but I mean, as we've mentioned before this, we've got this growing trend of, you know, RethinkDB,

Starting point is 00:59:32 and, you know, we can name a number of others that have come on the show and talked about, you know, building an open-source product, but also building a business alongside of that. Metabase rings a bell as well. That discussion we had there, it seems like databases are, well, not so much easy, but it seems like they're prime for this kind of

Starting point is 00:59:50 infrastructure. Yeah, infrastructure, these tools. So I guess that's just kind of an interesting take there. We're seeing, you know, this trend of companies building open source products. How does that span for you, I guess, moving forward? Like, what does that look like for, you know, what are you offering to anyone who would want to fund you? What is their get? You know, I don't know if you watch Shark Tank or not, but no one gives somebody money thinking they'll never get it back. Yeah. And obviously for us as entrepreneurs and business people from that side, we're, you know, we're obviously doing this.

Starting point is 01:00:26 Obviously, we want to build something cool and awesome that people can use. But also, of course, you want to build a sustainable business as well. So I think in general, I think if you think about going back to some of the things we talked about before with banks, moving a lot of their infrastructure from on-premise to the cloud, you know, starting to hit that way, that's starting to happen over the next, well, now a little bit, and particularly over the next three, five years. That's something that we can really help them with and accelerate that process. But even beyond that, you know, I think just in general with all the data breaches that have been happening and, you know, security becoming increasingly sort of top of mind, not only for business people, but for developers and

Starting point is 01:01:08 technical people as well. I mean, you need to think about, you know, how you can build applications that aren't going to lose your customers' data and destroy your business. That was exactly what Jared and I were talking about. As I mentioned, before every call we have some sort of sync up, and Jared asked me, he's like, how did this hit your radar? And he wasn't asking in an argumentative way. He was like, you know, genuinely, you know, how did this hit your radar? What was interesting to you, Adam?

Starting point is 01:01:31 And I was like, well, I mean, it hit Nightly, so that was one thing, and that was our own radar. But then also I was thinking, like, over the last four or five years, and I'm sure it's happened lots more before those times, but it's become more and more clear to me that we've had more and more cybercrime, so to speak. Sony was down for, I mean, it's a shame I couldn't play my PlayStation for a month, but Sony was down, and that was a big one there. And then you've had lots of different, you know, compromises, so to speak. Yeah, Target's compromise. And next thing you know, credit card data, all this data. And then not only that, you know, they've got my password if that stuff's not encrypted right or whatever.

Starting point is 01:02:15 I don't know. But then, you know, everyone's not secure at that point. So it made sense to have this conversation with you guys? I think over the last year, there happened maybe 13 or 14 big security compromises. And yeah, that is something, it was something which just started popping up again and again. And that's increasingly a problem. Yeah.

Starting point is 01:02:41 And I guess I think also like part of the bet of our company is if you look at sort of the last generation of big tech companies like Facebook, Twitter, LinkedIn, a lot of the value that they're creating is based off of their users' data. So they're monetizing that. But if you think about potentially what the next wave could look like, and actually Lawrence Lessig had an interesting post I think a week or so ago about this topic, it's how companies are now starting to think of customer data actually as a liability and not necessarily the source of value. So if you're, let's say a financial services company or a new insurance startup, insurance tech startup,

Starting point is 01:03:17 and you have data on your customers, that's a potential risk for your company. If you get hacked and you lose that, particularly when you're just starting to build trust with people, your business is basically dead. So if people start to build on top of things like ZeroDB and other options where they're not necessarily having to access that data directly,

Starting point is 01:03:40 they remove that liability and that threat to their business. Well, let's talk about that. Let's talk about building on top of ZeroDB. After the break, our final break, by the way, we will talk to you guys about, let's just imagine we're going to build ZeroBook. You know, we want to build a huge worldwide social network

Starting point is 01:04:00 on top of ZeroDB technology: scale implications, performance implications, day-to-day management, those type of things. And then, of course, getting started is a bit of that as well. So I think it'd be a good place to take this conversation, uh, after we hear from this last sponsor, and then we'll be right back. Every Saturday morning we ship out an email called Changelog Weekly that covers everything that hits our open source radar. It's our curated, editorialized take on what you need to pay attention to from this week in open source and software development. There's no algorithms.

Starting point is 01:04:31 There's no machines. It's just us paying attention to what we pay attention to so you don't have to. Head to changelog.com slash weekly to subscribe. And now back to the show. All right, we're back and we're ready to talk about using ZeroDB: practical applications, um, performance, scale, what have you, getting started. But let's start with performance, because you guys do have some posts on your blog about performance. And anytime you add a layer to the conversation, in which you're sending encrypted bits down the wire

Starting point is 01:05:08 and decrypting them in the client, there are performance implications. So we'll just kind of lay that on the table and say, if I'm using ZeroDB versus some other DB, which is exactly like ZeroDB minus the encryption stuff, what penalty am I paying? Right. So basically, uh, the biggest implication for us is that the way we work is the client talks to the server more than once while doing the query. So a typical database is sending a query to the server,

Starting point is 01:05:43 the server is doing something, and returning a result. In our case, the client talks to the server a little bit. So it sends a little bit, gets some information, decrypts it, sends something again, and that repeats several times. So what that comes down to is pretty much latency between client and server. For each query, you have a performance penalty of multiple times this latency. What helps with that a little bit is client-side caching of the top of the index, and that makes queries faster by about a factor of two. But I would say there is a positive thing as well, since a lot of logic and encryption and

Starting point is 01:06:40 decryption is happening on the client. If you have some database server and multiple clients, each client decrypts its own data on its side. So you have this decryption load parallelized between multiple clients, and this kind of takes this load off the server. So in certain cases, you can actually have performance higher than in a typical situation. It seems like, and I guess this is the same for traditional databases, but a write-heavy application where a lot of clients are writing

Starting point is 01:07:20 is going to be busting that client-side cache more often and require more round trips. That's true. Okay. So there is a penalty. Sometimes it's mitigated down to zero or negative, but something to definitely think about as a concern when going. And as it is with anything that you're going to adopt, there are trade-offs.
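
Editor's sketch: the round-trip cost described above can be put into rough numbers. This is a minimal model and not ZeroDB's implementation. It assumes the encrypted index is a tree of fixed depth, that every level not held in the client-side cache costs exactly one client-server round trip, and that nothing else contributes to latency. The function names, parameters, and example numbers are all hypothetical.

```python
def round_trips(tree_depth: int, cached_levels: int) -> int:
    """Number of client-server round trips for one index lookup.

    Assumption: each index level that is not in the client-side cache
    costs one round trip (fetch encrypted node, decrypt it locally,
    pick the next child to request).
    """
    if tree_depth < 1:
        raise ValueError("tree_depth must be at least 1")
    # The cache cannot hold more levels than the tree has, or fewer than zero.
    cached = max(0, min(cached_levels, tree_depth))
    return tree_depth - cached


def query_latency_ms(tree_depth: int, cached_levels: int, rtt_ms: float) -> float:
    """Latency-only estimate: round trips multiplied by network round-trip time."""
    return round_trips(tree_depth, cached_levels) * rtt_ms


if __name__ == "__main__":
    # Hypothetical numbers: a 4-level index and a 20 ms network round trip.
    cold = query_latency_ms(tree_depth=4, cached_levels=0, rtt_ms=20.0)
    warm = query_latency_ms(tree_depth=4, cached_levels=2, rtt_ms=20.0)
    print(f"no cache: {cold} ms, top two levels cached: {warm} ms")
```

With these made-up numbers, caching the top two of four levels halves the estimate, which is in the same range as the factor of two mentioned above. A write-heavy workload would invalidate cached levels more often, pushing `cached_levels` back toward zero.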

Starting point is 01:07:43 What about scaling? So, you know, like I mentioned before the break, I'm building this thing. I don't know if you've ever heard of it. It's called ZeroBook. And it's going to be huge. I've just got my first 5 million users. I don't know who they are because all their data is encrypted,

Starting point is 01:07:57 but I'm sure that they're awesome. And I'm trying to now expand my server-side infrastructure, my ZeroDB-based database infrastructure, past one machine. I'm thinking of sharding or replicating or whatever it is these people do at Facebook. What do I got to do? Yeah, so basically replication and sharding we inherit from ZODB or things built for that. Pretty much one is called ZRS, which is just replication. And there is a thing called NEO, which is actually already sharding.

Starting point is 01:08:39 And actually using that, we can scale it up. So I guess for actually scaling the thing you need all your data to reside on multiple servers. So when you get to the situation when all of your customers data doesn't fit onto

Starting point is 01:09:00 one server. So for this we can use these things I mentioned which are already built. And they are integrating pretty well with ZeroDB just because we use this existing technology stack. But of course there are some... There are some scalability thoughts about doing certain queries in an encrypted manner. So that is something we are constantly improving.
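
Editor's sketch: one reason sharding can sit underneath an encrypted database is that placing an object on a server only needs its identifier, not its plaintext. The example below is generic hash partitioning written for illustration. It is not how ZRS or NEO work, and `shard_for` and `ShardedStore` are hypothetical names, with in-memory dictionaries standing in for separate servers.

```python
import hashlib


def shard_for(object_id: bytes, num_shards: int) -> int:
    """Map an opaque object ID to a shard index using a stable hash."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(object_id).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Stores opaque ciphertext blobs; each dict stands in for one server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, object_id: bytes, ciphertext: bytes) -> None:
        # Routing looks only at the ID; the blob is never inspected.
        self.shards[shard_for(object_id, len(self.shards))][object_id] = ciphertext

    def get(self, object_id: bytes) -> bytes:
        return self.shards[shard_for(object_id, len(self.shards))][object_id]


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put(b"object-0001", b"\x8f\x02\x19")  # bytes stand in for ciphertext
    print(store.get(b"object-0001"))
```

The routing function never looks inside the blob, so the servers can stay unaware of what they hold. Queries that traverse an encrypted index across shards are a separate problem, which is the part described above as still being improved.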

Starting point is 01:09:34 Are there best practices that are coming about or things that you or your users are learning over time and kind of gathering together a corpus of knowledge about how to go about it? You mean about the performance and scalability? Yeah, just the scalability. But you said that there are things that you're learning about how to do the queries. Or is that all internal to the protocol and not to the person who's actually... Yeah, it is actually internal to the protocol. So I don't think at the moment developers should worry about that. If there is something they should worry about, we will definitely publish that. And if there are some recommendations. stuff. So it seems like it's an easier onboard if I'm building a Python client, something that can rely upon Python. So let's just start with that one. Let's say you sold me on the idea of ZeroDB. I'm ready to start coding up ZeroBook. How do I get started? Yeah, so we've got a little quick

Starting point is 01:10:37 start tutorial on docs.zerodb.io. And then you just grab zerodb-server from a GitHub repo. And it's pretty easy just to sort of get around, get started and start playing with it. You can populate it with some dummy data and we give you an example of how to write your data models and how to query the database. So pretty easy to at least get a sample of it. Seeing this as kind of a brave new world of databases, I'm assuming that there's somewhat

Starting point is 01:11:13 of a lack in terms of tooling around ZeroDB, you know, query generators or any sort of thing like that. Is that an area where you guys are looking to innovate as a company perhaps, or is that something that you just completely open up to the open source community and hope that they would build these kinds of infrastructure around your infrastructure? Yeah. I mean, there's obviously things that we have in our roadmap for the company itself that we'll plan on doing.

Starting point is 01:11:43 And then to the extent that other people do them first, that's great. We're happy to support people who are building stuff around ZeroDB and on top of ZeroDB. And even just in the couple of weeks that it's been open source, we've seen some of that already. I think, uh, in the first week, maybe, someone started playing around with Docker and put it into a container. I think a couple people have written plugins for various web frameworks.

Starting point is 01:12:10 So we're seeing some of that already. And yeah, obviously to the extent that that continues or increases, we're very happy to see any activity around ZeroDB. What do you say to the person who's not interested in Python at this point, when you guys still, you know, the JavaScript client's in the works, the Python client is all that we have, but they're willing to get involved and actually, you know, do an implementation? Um, getting started for that person, is the protocol documented, or is it open up the Python client and start to, you know, emulate

Starting point is 01:12:41 what you see there um what's the kind of process that you would imagine somebody would go through to implement their own? You mean to implement their own client in a different language? Yes, and then also just like that, you say it's not just a database, it's also a protocol. Is the protocol documented or is it out there? Do you just read the code?

Starting point is 01:13:02 Yeah, so we're actually publishing a paper, hopefully in the next week or so, maybe a little bit longer. Don't hold me to that. It'll have some more specifics and we'll also have sort of more on our threat model and security assumptions as well. Some of the crypto primitives that we're using

Starting point is 01:13:21 and potential sort of future optimization. So for understanding the protocol in more depth, that'll help quite a bit there. Actually, the protocol itself is derived from what was internal to ZODB. And we've talked to the founder of Zope and ZODB, Jim Fulton, who pretty much sees the future of ZODB as allowing alternative clients, which are non-Python, to work with that.

Starting point is 01:14:02 And since we are based on that, we are pretty much in line with his vision. So I guess we can pretty much work together with people who work on ZODB to enable these alternative clients. I guess it's kind of been somewhat the ask here, but not directly, you know, talked about the protocol, the client, you know, if somebody wanted to create their own client, you guys are in an incubator right now.

Starting point is 01:14:29 You've obviously got the next, you know, three or four months kind of, for the most part, mapped out in terms of what some of your goals are. But if you had the ear of the open source community and they were thinking similar to what Jared said earlier, hey, you know, I'm not digging Python, for example, I'm digging something else. Not so much just that, but how can the community step in and help you to move the ball along? If it's not, you know, your financial, or not your financial goals, but your goals around your business. If it's not that, if it's around this technology and moving it forward, what are the best areas that the open source community can step in and help out?

Starting point is 01:15:07 It's very much around building alternative clients, writing plugins. Another thing actually that helped a lot in the first couple of weeks after we open sourced it was just having people look at our implementation. And we had actually a lot of good feedback around things we could potentially do in the future to make it even more secure. One of the things that we do leak is access patterns. And what I mean by that is the server can observe, obviously, which encrypted buckets are being queried. And in theory, over time, they could deduce things like the order of the database from those access patterns.
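
Editor's sketch: the access-pattern leak described above can be shown with a toy model. The code below is an illustration under stated assumptions and not ZeroDB's protocol: the index is a small binary search tree with one node per bucket, `toy_cipher` is a deliberately insecure XOR stand-in for real encryption, and `Server`, `Client`, and every other name are hypothetical. The server never sees a plaintext key, but it does see which bucket IDs each lookup touches.

```python
import hashlib
import json


def toy_cipher(key: bytes, bucket_id: int, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Toy stand-in only, NOT secure.

    XOR is symmetric, so the same call encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key + bucket_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Server:
    """Holds opaque blobs and records which bucket IDs get requested."""

    def __init__(self):
        self.buckets = {}
        self.access_log = []

    def put(self, bucket_id, blob):
        self.buckets[bucket_id] = blob

    def get(self, bucket_id):
        self.access_log.append(bucket_id)  # this is all the server learns
        return self.buckets[bucket_id]


class Client:
    def __init__(self, server, key):
        self.server = server
        self.key = key
        self.next_id = 0

    def build(self, sorted_keys):
        """Store a balanced search tree, one encrypted node per bucket."""
        if not sorted_keys:
            return None
        mid = len(sorted_keys) // 2
        bucket_id = self.next_id
        self.next_id += 1
        node = {
            "key": sorted_keys[mid],
            "left": self.build(sorted_keys[:mid]),
            "right": self.build(sorted_keys[mid + 1:]),
        }
        blob = toy_cipher(self.key, bucket_id, json.dumps(node).encode())
        self.server.put(bucket_id, blob)
        return bucket_id

    def contains(self, root_id, target):
        """Walk the tree: fetch a bucket, decrypt locally, choose a child."""
        bucket_id = root_id
        while bucket_id is not None:
            blob = self.server.get(bucket_id)
            node = json.loads(toy_cipher(self.key, bucket_id, blob))
            if target == node["key"]:
                return True
            bucket_id = node["left"] if target < node["key"] else node["right"]
        return False


if __name__ == "__main__":
    server = Server()
    client = Client(server, key=b"client-side secret")
    root = client.build([10, 20, 30, 40, 50, 60, 70])
    client.contains(root, 30)
    client.contains(root, 70)
    print(server.access_log)  # bucket IDs only, no plaintext keys
```

Each lookup appears in the log as a path of bucket IDs starting at the root. Repeating a query repeats its path, and over many queries the paths expose the shape of the tree, which is the kind of deduction described above.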

Starting point is 01:15:47 So one of the sort of active research areas right now is something called Oblivious RAM, or ORAM, and that's designed to obfuscate these access patterns. And so that's something that we could potentially add in the future. So things like that, obviously, that help us sort of map out the roadmap for what ZeroDB will look like in the future. Um, that's helpful as well. So if someone's out there and they're listening, okay, well, I'm going to go to the org on GitHub, which is github.com slash zero hyphen DB. Would it be, uh, okay for them to hop into the ZeroDB, uh, repo and just drop an issue there and share some ideas, or are there different ways to communicate with you guys?

Starting point is 01:16:30 Yeah, that's certainly an option. You know, we're actively paying attention to GitHub and you can open an issue there. We have a Slack channel too that we're on most of the time, and we can be very responsive there. Um, I mean, we're pretty much online 24-7, so we're on Twitter too, so it's pretty easy to get a hold of us. Gotcha.

Starting point is 01:16:49 Is that Slack channel mentioned anywhere on zerodb.io? Oh, I guess it is right there, right there at the top. I missed it. It's too obvious. It's so obvious that I missed it. So you got a mailing list too, so you're sending out emails. How often are you emailing people about what's going on?

Starting point is 01:17:04 We don't email very often. I think we sent out two emails in the entire life of our mailing list. So one was, uh, you know, this was back in March, one was a follow-up blog post where we talked in a little bit more detail about how the protocol worked. And then the second was when we actually open sourced it in December. So we're not going to spam you too much. Yeah.

Starting point is 01:17:28 Well, if you're heading to zerodb.io, they've got a link to GitHub there, obviously, which is the org I just mentioned, the Slack channel to sign up to if you want to talk real time, or the mailing list and never get an email, or at least infrequently,

Starting point is 01:17:41 that would be good. Just the important stuff. Yeah. Well, that's good, right? I mean, you didn't promise a weekly email. Um, but, uh, email is really interesting. So, all right, fellas, we don't have many more questions for you. I know we wanted to talk quite a bit about that. We're running out of time for our show anyways, so we'll skip the closing questions here. This is twice in a row. We did it with Yehuda, and now we're doing it here in this show. So maybe it's becoming a trend.

Starting point is 01:18:08 Crazy. We're crazy. We'll see. I don't know. But, fellas, I want to thank you guys for taking the time to come on the show. And obviously to think proactively about the future of technology and being able to, as you said earlier, McLean, to bet on this technology. One, to build a company around it, and two, to open source it. Obviously, you guys will have some financial gains in the future with what you do with

Starting point is 01:18:29 it, but it's generous of you to give back so much to open source and trust the process of open source and all that good stuff. So that's really awesome. I want to thank our listeners for tuning in and also our members for tuning into the show this week. Our sponsors for the episode this week are Codeship, Toptal, and DigitalOcean. And if you're not a member yet, you can join the community for just 20 bucks a year, and we'll give you an all-access pass to everything we do. Head to changelog.com slash membership to hear more about that. Our next show is actually already recorded. It's already in the can,

Starting point is 01:19:06 but it's episode 191 with Richard Feldman. He's from NoRedInk discussing Elm, which is described as the best of functional programming in your browser. Jared, that was an awesome show, right? Yeah, it was. And some awesome news came out of NoRedInk just last week. Or was it this week? We had a post on it on the website

Starting point is 01:19:25 You had a conversation with him. Yeah. About, uh, what's going on there. You want to talk about that real quick? Well, they have it announced. We shared it on our blog. Uh, it's on SoundCloud as well. But we talked to Richard briefly about, uh, Evan Czaplicki, which I'm not sure that's exactly how you say his last name, so forgive me if that's wrong. But he's joining NoRedInk, which is a really interesting thing. They are stepping up to support him, and he's coming out of the work, and he's creating the Elm Foundation, the Elm Software Foundation. Really interesting to see that happen in there. So check out the blog, changelog.com, to find that.

Starting point is 01:19:59 There's an audio piece there as well. It's not in the main podcast feed, so if you're looking there, you're not going to find it there. But we mentioned both of our emails also in the show, so I'm going to mention that one more time. ChangeLog Weekly, we ship that every Saturday. It's our hand-curated editorial take on the week.

Starting point is 01:20:15 And we also ship ChangeLog Nightly, which is our radar as well. So as you can see, we create new shows from this. We're constantly watching this. So we ship that every single night at 10 p.m. Central Standard Time, covering the daily trends in open source on GitHub. You can subscribe to both of those at changelog.com slash weekly and changelog.com slash nightly, respectively. But, fellas and everyone on the call, that's it.

Starting point is 01:20:38 So let's say goodbye. Goodbye. Thanks for coming on, guys. Thanks, guys. This was a lot of fun. Bye. Bye. We'll see you next time.
