Programming Throwdown - 121 - Edge Computing with Jaxon Repp

Episode Date: October 22, 2021

What is "The Edge"?  The answer is that it means different things to different people, but it always involves lifting logic, data, and processing load off of your backend servers and onto other machines.  Sometimes those machines are spread out over many small datacenters, or sometimes they are in the hands of your customers.  In all cases, computing on the edge is a different paradigm that requires new ways of thinking about coding.  We're super lucky to have Jaxon on the show to share his experiences with edge computing and dive into this topic!

00:00:23 Introduction
00:01:15 Introducing Jaxon Repp
00:01:42 What is HarperDB?
00:08:10 Edge Computing
00:10:06 What is the "Edge"
00:14:58 Jaxon's history with Edge Computing and HarperDB
00:22:35 Edge Computing in everyday life
00:26:12 Tesla AI and data
00:28:09 Edge Computing in the oil industry
00:35:23 Docker containers
00:42:33 Databases
00:48:29 Data Conflicts
00:55:43 HarperDB for personal use
01:00:00 MeteorJS
01:02:29 Netflix, as an example
01:06:19 The speed of edge computing
01:08:43 HarperDB's work environment and who is Harper?
01:10:30 The Great Debate
01:12:17 Career opportunities in HarperDB
01:18:56 Quantum computing
01:21:22 Reach HarperDB
01:23:53 Raspberry Pi and HarperDB home applications
01:27:20 Farewells

Resources mentioned in this episode:

Companies:
HarperDB: https://harperdb.io/
MeteorJS: https://www.meteor.com/

Tools:
Raspberry Pi: https://www.raspberrypi.org/
Docker: https://www.docker.com/

If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/

Reach out to us via email: programmingthrowdown@gmail.com

You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM

Join the discussion on our Discord

Help support Programming Throwdown through our Patreon
★ Support this podcast on Patreon ★

Transcript
Starting point is 00:00:00 Hey everybody, this is an awesome episode. I'm really looking forward to this. Edge computing is one of these things where when I first learned about it, I thought it was just client-side computing. I thought it was something on the browser or something on your mobile device or something like that. When you think about it, there's actually a whole gradient between that and some backend server, right? So imagine Netflix releases a new episode of their most popular show, one they know is going to be super popular, and everybody just starts downloading it, and that would just completely blow them up, right? They also just can't put a two-gigabyte show on everybody's phone proactively. They can't do that either, right? So there has to be an answer there.
Starting point is 00:01:07 And edge computing is a big part of that answer. And so there's a lot of complexity around how this all works. And I'm so happy to have, you know, head of product at HarperDB, Jaxon Repp here to really dive into edge computing and learn as much as we can about this topic. So thanks for coming on the show, Jaxon.
Starting point is 00:01:24 Thank you very much for having me. I really appreciate it. Cool. So we always kind of start this off. How has COVID kind of changed Harper and changed your kind of work style? What do you feel has been the sort of big salient point from that? Well, we're a relatively young company formed in 2017.
Starting point is 00:01:46 We were in a co-working space. We had a bunch of different little isolated spaces together, and we had finally got enough traction, got our product where we thought, yeah, we took another round of funding and we leased an office space in January of 2020. Oh, wow. One month later, everybody was like, maybe that was a bad idea. And we became a fully distributed company in much the same way we are a distributed product, a distributed database. So after a few months, we took a company survey and every single person was more productive than they had been in the office. I think we all realized we had a much better work-life balance. My cats are much less lonely, although lonely was kind of their jam.
Starting point is 00:02:33 And we decided sort of as a company that the money we would have spent on rent, we think we're going to spend on annual or biannual retreats where employees can bring their families and we'll go somewhere like Mexico and do morning, you know, planning meetings and afternoon is yours. Because to be honest, we're still small enough, we can pull that off. And part two is, you know, a lot of us, at least at our company, are pretty okay with, you know, being able to better divide that line between, you know, software development is often one of those all-encompassing things that takes over your soul and all of your available hours. And I don't know
Starting point is 00:03:11 about you, but I'm not as young as I used to be and I have kids and I would like to see them. They are not terrible people. I agreed with you up until the end. No, just kidding. But no, I think you're totally right. I think, you know, having, you know, a get together at some cadence, maybe, you know, every year or every, you know, semi annually or something like that, you know, that is super nice. And that can, that can, you know, keep that bond going and like start that bond up with new folks. But coming into the office every day, I mean, you know, Patrick and I used to have huge commutes, like hour each way commutes. And that just eats so much of your day. And so many times, you know, you're there. And there were days where I went in and left and didn't even really talk to anybody. It's kind of
Starting point is 00:03:57 like, well, what was I doing for those two hours? Right. So, yeah, I think, you know, you can get a lot done. You can get by with not coming into the office every day. I think that's that's definitely something we've all taken away. So what happened with your lease? Like, were you able to break that lease or or how does that work? In general, I'm always fascinated with what happened to corporate property at this point. I think we got out of it and I think it was because basically they had entire multiple floor tenants who were trying to fight the same battle. And to be honest, I believe probably the real estate agents lawyers were too busy fighting real battles. Not to worry about little startup who took a corner unit with almost no windows.
Starting point is 00:04:41 I honestly, I look at it and I say, to your point about commutes, I used to commute an hour every day. And then I moved to Harbor DB and my commute was 15 to 20 minutes. And I was like, that is so much better. And then I switched to a five second commute from my bedroom to the living room. And there is something to be said for the decompression that comes with a commute oh that's true yeah from being sitting there and staring at your code like that kid who doesn't know why it works and doesn't know why it doesn't work and all of a sudden it works again and you're like okay i'm done and you walk out in the living room and kids don't respond that way they just you cannot you cannot debug them they are just a constant challenge. And, and to be able to be
Starting point is 00:05:27 present with them after you've spent, you know, a whole day banging your head against your desk, that is a skill in and of itself, being able to walk out and be present and not be focused on work still. Yeah. Oh man, you hit on something that's such a, such a good point. I mean, so a couple of things to riff on there is when I started taking a walk, so I started walking to work where I'd basically just walk in a circle around our neighborhood and that's my walk to work. And I just feel like mentally it kind of puts me at a different place. So far, it seems to be working.
Starting point is 00:05:58 Maybe it's a little bit of placebo effect there or something, but I feel like it's doing something. And then the other thing is, yeah, I feel, I feel like even at work, you know, there's, there'll be a situation that's totally on fire and you have to make some really hard decisions very quickly. And there's, it's a zero sum game and everyone's really upset. And then you go from that to, uh, let's say a, a meeting, a one-on-one meeting with somebody who's doing an amazing job. And you have to kind of switch gears from, you know, your, your, your sort of debate face to, to someone who's like extremely excited and happy and appreciative. And then that meeting ends and now you're with your family, which is like another dimension. So I feel like being able to toggle all of these
Starting point is 00:06:40 different persona that, that has been really, really difficult, um, over VC. Whereas, whereas, whereas in a real office, you would at least walk from one room to the other and you'd have time to sort of, you know, like reframe yourself over and over again. Yeah. Just sitting down to dinner and, you know, kind of crossing your hands and saying, all right, let's talk about your performance today. That's right. It's my understanding that you spilled food on your, on your shirt at the beginning of the day.
Starting point is 00:07:13 And then you had to walk around with that stain. That's clearly not the image we want to project Grayson. Yeah, that's right. You might have to go live next door. Oh man. So cool. It sounds like a, yeah, it really worked out for the best. And I think that this acceleration, I mean, definitely, you know, you wouldn't ever wish COVID on anybody or any country or anything like that. But there has been some real silver lining. I think this has been one of them where we've started to understand the working relationship better. Cool. So yeah, let's dive into edge computing. So initially, I thought edge computing was just on the browser, on the mobile app, right? And that's definitely like the extreme edge. I mean, there's definitely things you want to do in that space, but there's a whole bunch of stuff in between that and let's say an EC2 instance you have running. And so kind of walk us through what that really is,
Starting point is 00:08:05 like what is available there and what can people do on the edge? Well, the edge is defined loosely, to say the least. We used to think of the edge as not so far out as the browser because there's so many limitations in terms of what you can do. You're in a sandbox. So the next smallest compute unit that we would focus on, and this is both at HarperDB and my prior company, which is an IoT platform, things like Raspberry Pis, small microcomputer, you know, Jetson boards, stuff like that, where you can run code that handles a workload and will perform some smaller tasks than could be handled on a larger server up in the cloud. What you find though, is that
Starting point is 00:08:51 that hardware is not super ready for a lot of dynamic programming and workloads. If you want to be out in a vineyard, for example, an autonomous vineyard just outside of Tucson, Arizona under my previous company. And we are automatically watering the vines and using compute to analyze soil moisture content and humidity and temperature and canopy and shade and infrared and all of that stuff. And we are still using ruggedized raspberry pies out there because it's hard to find something that will give you the flexibility to install a platform or a database or whatever you might want to store or run on it and then likewise uh handle the reality that it rains outside or that it's 150 degrees sometimes when when you're down there measuring temperature so the hardware was a huge challenge and we were banging our head against that at
Starting point is 00:09:45 HarperEB because we knew that distributed computing would require distributed data. And we knew the benefits of distributed computing. Running an AI model at the edge on a small data set or stream data set is much more efficient than shipping it all up to the cloud, especially when you have intermittent connectivity, which is often the case out at what we call the edge. Can you describe it? So I thought that the edge was like maybe the ISP or something like, like what exactly is the edge? Is the edge like your house or, or is the edge some server in between you and the internet
Starting point is 00:10:21 or what exactly is that? The Edge is everything outside what I would call a major colo facility, like a major cloud provider. So we've partnered with Lumen, the old CenturyLink slash Level 3. They're pushing micro-edge data centers, which are still data centers and still much more server capacity than
Starting point is 00:10:45 I have ever had in my closet at home. But to them, that's the edge. And that was really the change for us to realize that the edge isn't a wearable because we can't be installed on it, but it is the edge to some people who can compile apps and put it on a watch. For some people, that's the edge. For a lot of people, you know, where your sensors are and where you're collecting that data is the edge. That's how you define it. So you fight the battle and you find the hardware to pick and survive there and collect that data. But for other people, the edge is simply my customers have a really fast connection. I can trust that that connection will exist. I just want to move my application closer to them so that the round trip to the API
Starting point is 00:11:33 is a millisecond instead of 300 milliseconds. That's the sort of performance that I want to get back. So we find that lots of people are defining the edge. The people who own giant cloud data centers people are defining the edge. I mean, the people who own giant cloud data centers are definitely defining the edge as a slightly smaller data center, slightly closer to the users. And the people who are building apps that collect sensor data and make use of that with machine learning, they're pushing it out further and solving those hardware challenges. I don't think we really need to truly define it more than that because tomorrow we're
Starting point is 00:12:06 going to invent some new technology that's inside me. Yeah, that's right. We're all the edge. Okay. Oh, that totally puts it in perspective because I did a bit of research prior to the show. People who listen to the show know I'm an AI person, so I don't have any background in full stack, but I did a bit of research and what I saw was things like Cloudflare Edge and AWS Lambda Edge. And that sounded like, as you said, just a smaller data center. And there's just a lot of them. But it sounds like Edge is much bigger than that. I mean, it's also your vineyard is a great example.
Starting point is 00:12:40 So in this case, you have this swarm of raspberry pies on this vineyard, and now that's the edge. So with Harper, for example, are you concerned with all of those different types of edge computing, or are you focused more on the former or the latter? out when I first joined HarperDB, we were very, very much focused on let's go into AI-powered classifications in the mining industry. And that's a very small, it might be a Dell Edge device or a Raspberry Pi for proof of concept. And let's do our calculations and let's provide real benefit. And we could absolutely do that. But the client loves it. The result is great. And they're like, cool, we ran that, but we can't really run a raspberry pie in this hot, smelting environment. So what hardware solutions do we have? And inevitably, you run across budget concerns because if we've got 150 of those across a plant and you're going to spend now $3,000 on a ruggedized piece of hardware, well, now all of a to create a HarperDB device with our product on it,
Starting point is 00:14:06 and we incur that capital cost. And now they can pay monthly so they can run it as OpEx. And from a budgetary perspective, it's very, very challenging to have that be the edge. And we found that the big guys with all the money who are absolutely pushing their giant cloud service offerings to these smaller data centers, they are just as desperate to move data and compute and functionality and capability to the edge. And they have much bigger pocketbooks and they can make it happen more quickly. So, I mean, not a lot more quickly. We can do POCs all day long for smaller companies, but there's still a real hardware problem out there for true edge computing. Yeah, that makes sense. So let's step back a little bit. Now we've defined edge computing,
Starting point is 00:14:55 which I think is super, super useful to set the frame. So what got you into edge computing? Give us kind of a bit of a background on your kind of story and what led you to HarperDB. Sure. I was a partially reformed software developer. Partially reformed. Partially reformed. This is my eighth startup. I was at a communications startup where we were joining together all your phone calls, texts, emails into threads for customer service. And the UI for that, it basically, I wrote React, like literally a year before React came out. Wait, how is that possible?
Starting point is 00:15:35 There's like a preview release or something? No, no, I wrote effectively the functionality of these modularized HTML because there wasn't one, but I knew that that's what we needed. Wow, isn't that amazing how, how there's like a, it's literally great minds think alike. Like, you know, there's, there's, there's this idea and it just, a lot of people kind of come to the realization at the same time. Yeah. It just seemed, it seemed so obvious. And then I immediately moved, uh, well, my wife was pregnant at the time and she, she kept getting more pregnant.
Starting point is 00:16:05 And I didn't want to pay cash for that baby. So I had to get a job with insurance. And I worked for DirecTV. And it was one of those jobs. I've never worked at a company where I had to wear khakis and a button down before. And it just didn't fit. So I started looking for my next opportunity. And I found an IoT platform, sort of a low-code drag and drop, drag in your sensor, block.
Starting point is 00:16:31 And now the data that comes off of that, capture that from port three and divide it in two and keep a running average in your memory buffer. And if it goes above, by the way, fetch a threshold from a database, if it goes above that limit, then send an email. So it was a super easy to use platform a la Node-RED, only sort of enterprise grade. And it was a great product. costs for any given operation dropped dramatically because we didn't have to have massively powered servers on the floor and run cables to everything. These could be wireless connections. We could use low energy Bluetooth if we had the ability. So there was a tremendous opportunity for us to capture and refine data and then not send every single piece of sensorized data,
Starting point is 00:17:26 every piece of sensor data up to the cloud. You very quickly, when you're in an installation with a thousand sensors, realize how much of your pipe you're taking up, sending everything up there to be analyzed. So it made sense from a filtering perspective to me. And then it was just about how can I make this easier, faster, more stable? What are the challenges that I have and how can I overcome them? That makes sense. And so that company that you're at, which had the IoT devices, that was from there you went to Harper. Is that the step before Harper? Correct. I was looking for a network fault tolerant data solution. And I actually, I found HarperDB just, I think may have actually searched for that. We had great SEO back then. Nice. and I integrated, I built a block for it for our platform. But I ran into some documentation issues.
Starting point is 00:18:26 Their postman collection appeared to be a little outdated. So I rewrote that postman collection and just sent him an email at hello at harperdb.io. I'm like, hey, your documentation appears to be a little out of date. Here's a new file. I rewrote it. So that worked for me. So I could use the postman collection basically locally. I said, thanks.
Starting point is 00:18:43 I continued to implement it, ran into a couple of things that I thought would be cool to have. And I wrote to them and eventually they just wrote me back and said, we would really like perhaps you to work for us. Cool. It would be cool if you could consult or just help out. I'm like, ironically, this office is about to go virtual and I have kids at home and I don't think that I can survive that. And so they invited me to office with them half time, which turned into full time.
Starting point is 00:19:11 And here I am two and a half years later. That is awesome. I mean, that's I think it's a really great story. We get a lot of folks asking, how do I get into the field? Right. And I mean, there's a perfect example where they, well, I mean, they might have looked you up, but let's assume they didn't. I mean, they might not know your college degree or whether you went to this bootcamp or that bootcamp, but you're
Starting point is 00:19:36 providing real value to them. And you knew, they knew that you knew what you were doing. And so they reached out to you and got that process going. That's a real sign. I think it's inspiration for people out there who want to get into the field of you just get in there and start using things and be a part of the vocal part of the community. And that can go a really long way. Yeah, I think one of the things I've noticed when I work with newly onboarded employees is they're very much two types, the ones that need to wait to be told, you know, what to do and how to solve a problem, or they, they always have questions. And then the ones that bring me two solutions to a problem they've encountered and they Google them or stack overflow them and, and found them and say,
Starting point is 00:20:24 I don't know which, but both of these would solve the problem. All day long, that makes me do a happy dance inside as opposed to the other. Likewise, when you're working with a product, I know that they don't want their documentation to be outdated. Nobody wants that. I hate documentation, but I was willing to do it because I needed that postman collection to work for me anyway. As soon as I got it to work for me, I'll just send it to them. And that way, nobody else has to have that problem. We have enough problems as programmers. Yeah, that's right.
Starting point is 00:20:54 Let's not let them fester out there in the world. Yeah, totally. Cool. Yeah, that's awesome. And so, yeah, that's great. So you were using HarperDB as part of this IoT project. You're communicating with them. And you said, wow, this is actually a really cool piece of technology. I want to go there full time. And also the idea of virtual thing. So let's jump into how do people write code for the edge? And how is that different from, you know, building a regular server like in PHP or something like that? Like what is, what makes the edge environment different to work with? It's a very good question because based on our previous, based on your previous question about what is the edge, I think that's changed for me a lot because for most people,
Starting point is 00:21:43 I think who've been doing this for a long time, because lots of people have been programming at the edge. And that's, you know, microprocessors written in low level C and, you know, basically super. I mean, you guys were rocket scientists, right? And you worked on things that went into space. So theoretically, you've worked on extremely resource constrained devices. And that always felt to me like what edge was. Edge is, it doesn't do a lot, but it's very purpose-driven. It's not super dynamic. And to be honest, once you put it on that board, it's never going to change. Yeah. Yeah. I think resources have changed and the Raspberry Pi kind of opened everybody's
Starting point is 00:22:22 mind to what was possible and that you've got Arduinos and all of these little things where you can add your custom code and even update it over time to continually adjust to changing workloads. Yeah, really to double click on that, our car connects to the internet. I mean, it's not a fancy, it's just a Honda Odyssey, but it connects to the internet when we get home. And one day we were out driving and I stopped at a red light and the car shut off and I panicked, but it turned out that this was just a new feature that had rolled out where the car literally turns off when you hit a red light. And then when you let go of the brake, it turns itself back on. And I guess that's somehow more economical, but it just randomly happened.
Starting point is 00:23:11 So we've kind of moved from, when Patrick and I were doing embedded work, there'd be this firmware update and you'd have to carry a briefcase with a laptop in it and a cord, and you'd go to the site and you'd plug it in and update the firmware. And it took thousands of dollars for you to fly halfway across the world to do that. And now my car just does it. I don't even know, right? I mean, it's just so different nowadays. I mean, I do feel like maybe they should send you an email telling you your car is going to shut off starting Friday.
Starting point is 00:23:36 You would think, right? It just randomly started happening. Don't freak out, but this is going to start happening. Yeah, but it shows a dramatic change. And to your point, Raspberry Pi is also just a massive game changer because it puts it in everybody's hands. I mean, I needed a lot of handholding to do embedded work, having no background in C or anything like that.
Starting point is 00:24:00 And now with Raspberry Pi, you have an entire Debian OS at the edge, which gives you a ton of flexibility. And so I think as we look at where the edge has moved, it becomes resource. And now you look at AWS Lambda and you look at our new feature custom functions. Realistically, JavaScript is the, and I don't want to start a flame war. I don't want to get you flame war i don't i don't want to get you guys like a million downvotes but javascript is an exceptionally easy language to learn and to be honest if sandboxed properly it can be less it can be as non-dangerous as you want it to be and it can be as performant as you as you architect it so So I feel like it's certainly the future of,
Starting point is 00:24:47 I think, at least edge prototyping. I would be hard pressed to say that there is still not going to be a use case once you figure out what you want to do with data at the edge to continually lower costs and perhaps solve that hardware problem permanently. You are going to be on more constrained devices and maybe you're going to use something like a Kotlin
Starting point is 00:25:09 that can compile down and run in the JVM or something that's going to be able to function out there but still has that, I don't know, the mental clock cycles of the developer in mind and the ease of use, the, I don't know, call it the usability, I guess. I want it to be usable because I always call it when we're talking about collecting sensor data and doing workloads at the edge. Right now it's so new. Everybody's been talking about edge
Starting point is 00:25:37 computing for years, but we are collecting so much data. And I call it the Rumsfeldian challenge because we just, we don't know what we don't know. So we better collect everything. And obviously transporting that all to the cloud is not ideal. So we will solve this problem, but I think it takes a lot of experimentation in the very beginning and you need something flexible for that. And so these wholly capable standalone Ubuntu environments like a Raspberry Pi are the ideal place for us to figure out what it is we're even going to do when we're out there at the edge. the AI for the Tesla autopilot. And he was effectively saying, well, we just need to get enough data and then we'll be done. And so it's really a data problem. I think that the challenge, I mean, I agree with him in principle, but practically the challenge is some data is
Starting point is 00:26:40 more important than others, right? So for example, if you're driving on a road here in Texas and there's nobody on the road and it's 70 mile an hour speed limit and you're just going on a straight road by yourself, that is a lot less interesting than you're part of a 17 car pilot, right? And so that second thing doesn't happen very often, but when it does, it's really important to collect that data
Starting point is 00:27:06 because you want every single time that happens, you want to learn as much as possible, right? Anytime there's a black swan event, you want to learn as much as possible. But as you said, you can't collect everything all the time or even half the time or even a tenth of a time. So you need something that's smart, that's saying, you know, is what's happening right now interesting? If it is, then start collecting it. If it's not, then throw it away. And that smart thing has to live on the edge, you know, by definition. And so I think that the differentiator, and I'm not an autonomous vehicle guy or anything like that, but, but just looking at the Tesla idea, I feel like the differentiator there is, can they do smart things at the edge? That's going to make or break that whole idea.
Starting point is 00:27:58 And so I, I think there's probably a hundred other examples where, where edge computing is going to make or break a lot of the next generation of tech and of ideas. I agree. And the classic example that I always talk about, I was working with the oil and gas industry, the turbines that are processing at their refineries, they spend 20,000 RPM. And if something goes wrong, it goes really wrong. And those things shut down and it's at peak natural gas prices. It's a million dollars a day that they're losing for just one turbine. So it shuts down and it's not able to refine it. They're losing a million dollars a day or in some cases a lot more than that. And they were collecting data and pushing it into an old school data historian from sensors. And their resolution was every five seconds. And you can look at the data points leading up to a failure and you can say, well, there's probably something there. Yeah, that's right.
Starting point is 00:29:00 And they had one guy. They introduced me to the one guy. And he comes in and he looks at it, and he's like, well, yeah, here's what happened, and everybody else in the room, we were all looking at the exact same screen, and we had no idea what he was talking about, and he's like, well, I remember once in this other place 20 years ago, I saw something like this. We didn't have sensors back then, but it was a lot like this. And he just had all of his tribal knowledge and he was 60 and he desperately wanted to retire,
Starting point is 00:29:32 but he could not. He would get called up in the middle of the night and have to fly off somewhere in the world to analyze something that had gone wrong. And so our mission in this consulting project with this company was to provide five you know, five millisecond resolution, but you cannot record that and put that all into a historian because you're going to overwhelm that test period. So we built a system that basically kept a rolling buffer of five minutes and would wait for an anomaly to occur, would wait through that anomaly or shut down if that's what happened, and then capture five minutes on the other side, wrap that up into a package, put that into a local edge instance of our product, which would then be transported up to the cloud for analysis.
Starting point is 00:30:17 So the ability to understand what an event is, what data led up to it, and capture that in higher resolution when you're just normal, steady state analysis of five second resolution where you're like, as long as the line is flat, everything's great. But as soon as that line starts to move a little bit and infinitesimally so, and you're talking about vibration, it's something that's spinning at 20,000 RPM. It's important to know all of those fluctuations and important to be able to look at them because not everybody has that tribal knowledge to understand that, you know, when it goes up by a fraction and down by a fraction every five seconds, that what you're really looking at
Starting point is 00:30:56 is very real problems with vibration in the 24 hours leading up to this thing flying through the wall and ruining everybody's day. Yeah. Yeah. I think the, uh, that problem of, you know, am I seeing something interesting? He's like a really phenomenal, I think it's a type of an active learning problem. And so for something like that. So, so I definitely, you know, I, I, uh, I think, you know, JavaScript is, is, uh, actually I, I truly enjoy writing TypeScript, which compiles down to JavaScript. I feel like it's a solid language. Do you think we'll get to a point
Starting point is 00:31:31 where the edge will be language independent? Or is there something about the edge where if you were to add more languages, it's just a lot of work? Is there something... Could you describe a little bit, what's this machine VM or what's this sort of box that runs at the edge in terms of software and what actually is going on there? Well, I mean, it depends if it's, what we found
Starting point is 00:31:57 is that the, the, the opportunity in POCs is that you can put anything out there. It could be any language and the box doesn't need necessarily to even survive the element. So you're not really hardware limited until you go out of proof of concept and into production, right? So once you figure out what your functionality is, then you start to look at what is the cost-effective hardware that can be run and how do I replicate this functionality out there? And from a resource perspective, will I have the benefit of an OS that supports Python? Could I run my statistics in a Python script? Or could I use JavaScript because I want to do a bunch of API calls to third-party resources? Or what language accomplishes my goal for the proof of concept
Starting point is 00:32:45 may well be different than the language that accomplishes the goal in final production. So in the same way TypeScript compiles down to JavaScript, Kotlin, you know, compiles down to something that runs in a JVM. I'm sure that somebody very smart, you know, is going to take some super easy language that hasn't even been invented yet that my children are going to learn how to write that will compile down to C and ultimately be able to be shipped off through chip as a service. And you're going to send me the thermostat program that my kid wrote to keep her room cooler in the middle of the summer and adjust the thermostat automatically so i feel like i i want to believe that you know language the language shouldn't
Starting point is 00:33:33 matter that the programming languages and to whatever end there are you know tribes that adhere and fight desperately for one language over the other, that probably all eventually goes away. And we all just drag and drop some boxes onto a screen and say, that's what I want it to do and go make it happen where I want it to happen. Containerization is obviously a huge movement and it has been in the cloud for applications. We see certainly in these smaller edge data centers, everything is containerized. They're only pushing containers out there nobody's installing on bare metal and you know even at the edge out at the vineyard we are running you know little docker containers
Starting point is 00:34:14 and those raspberry pi's so it's entirely possible to be you know a kubernetes cluster you know pushed out to the edge k33S is a great, really, really minimalist container management platform. But again, I don't know how the code gets out to the edge. And to be honest, I try not to care. I try to work on how does the part of the puzzle that I'm working on, how does it make everybody's life easier rather than more of a problem? Yep. Yep. That makes sense. I'm trying to make it just work. And somebody with a lot more money and a lot more time and who hasn't spent as much of their life banging their head against the hardware problem is probably going to have to solve that one because I think I may
Starting point is 00:34:59 have given up on it. Yeah, that makes sense. Yeah. I think maybe, you know, it started as a, I believe a lot of these Lambda functions started as, as sort of JavaScript kind of front end servers. And so, and so you have the browser running JavaScript. And so I think maybe it's a starting point that a lot of this, the cloud flare edge, I think is JavaScript only. And that's probably just because of their pedigree, like where they came from and their inspiration. And, but to your point, I think, is JavaScript only. And that's probably just because of their pedigree, like where they came from and their inspiration. But to your point, it's all run on VMs. And so it's just a matter of time before they'll say,
Starting point is 00:35:33 look, you can point us to your Docker Hub location that could be running just about anything and then just really open it up. It teaches you a lot. As you start to move into real- life deployments where containerization is the standard, it also teaches you how important it is to build a good Docker image. Cause that's one of the things I feel like, I feel like Donald's news,
Starting point is 00:35:58 Donald Newt's axiom about premature optimization is one of those things that can absolutely kill a company, but man, if you're gonna to spend it anywhere, a good Docker container is key. I think when I got to HarperDB, our Docker container was 350 megs. And I was like, it feels too big. I mean, it literally feels like it might be too big given that our actual installer, our actual installed binary is under a hundred and all we needed was Node.js, no JS, uh, I feel like we could do, we could, we could do this better. So we spent a lot of time just recently, actually with our new release, working on that and making it what we're, I guess, the industry term is a first-class citizen because truly I think containerized applications and workloads, and to be honest, dynamically distributed by large service providers that
Starting point is 00:36:48 provide rapid access to your application on demand, they don't want your Docker container running out on their edge servers all of the time. They may only want to follow the sun. And right now, the best framework we have for that is something like a Kubernetes cluster that can shut down and spin up and have access to persistent disks, certainly for data storage. But a lot of it is ephemeral. Yeah, that makes sense. I had an issue recently with AWS Lambda, where I think Lambda can only be 200 meg. That's their limit. And so I wanted to run some machine learning. And as anyone knows,
Starting point is 00:37:25 who's tried to install PyTorch or TensorFlow or any of these things, like you type, you know, pip install PyTorch, and then you get this, this, you know, in console progress bar telling you you're downloading like 900 megabytes. And you're like, what, you know, but it's just, I think it's because it has all these different optimizers, you know, if you're running, what? But I think it's because it has all these different optimizers. If you're running on Intel hardware, there's this thing called MKL, which is some kind of linear algebra thing. And if you're running on the GPU, they have that. And so it ends up being this massive thing that really can't be, at least I don't know how to decompose it. And so I think I ended up getting around that with some elastic
Starting point is 00:38:05 file system. So now Lambda function mounts this file system that Amazon is just holding onto for you. And you can have a bunch of Lambda functions all using this. But yeah, I think you start to hit a lot of limitations for good reason, because you're fanning this out now. It's not just some server that could be uber powerful sitting in the Midwest somewhere, but you're fanning this out now. It's not just some server that could be uber powerful sitting in the Midwest somewhere, but you're fanning this out to many, many different nodes, potentially all over the world. And so that just creates a lot of limitations
Starting point is 00:38:35 that people might've not had to deal with otherwise. It creates a tremendous number of limitations. I mean, it also creates a lot of opportunity for the challenges around logistics and that's the other thing that kubernetes you know for better or worse is very good at it's like i have an atom and i want this atom to do some work and i wanted to do some work here across all of these places and it's very easy to script it and it's very easy to spin it up and it's very easy to spin it down and that is that is truly you know as the container sizes get smaller and as the edge compute resources become more powerful you know you're just going to continue to push out and i
Starting point is 00:39:16 don't see any change in in you know containerized architectures coming because I can't imagine a better, more atomic way to send out a core piece of functionality than in that container. I mean, would I like it to have less overhead? Sure. Would I like it to be a little less complex? Yes. But it does a great job. And that's why DevOps people are so angry all the time. Yeah, that's right. I think, and you could please fill in the gaps here, but I think the way that the container system works is it's kind of like you start with some base image and then it keeps download Node.js and install it. You know, those commands are run starting from some frame of reference. Maybe it's an Ubuntu install or something like that. And so, you know, you don't have to actually copy the Ubuntu install because that's sort of your base image that everyone has agreed on.
Starting point is 00:40:19 This is the Ubuntu image. But you're copying over basically what you've done to that image. And so I guess, and walk us through this, but like shrinking the Docker container in this case, I guess means just, does it mean doing less things to the base image so that there's less to keep there? Well, there's a process called like a sequential build where you could bring in the Ubuntu image and then you install Node.js. But all you really need is Node.js because on top of Node.js, which is the only prerequisite for HarperDB, you install HarperDB. And so rather than carry the whole Ubuntu image, because again,
Starting point is 00:40:58 these are going to be installed over a Linux OS, right? That's what's running Docker or Linux subsystem on Windows. So you've got Linux. You don't need all of Ubuntu. You definitely need Node.js because we require that. So ultimately you want to install Node.js and there are Node.js based images, and then you can install HarperDB on top of that. And then in our case, because we persist to disk, and we're not just reading from inbound streaming data, we need to persist something, your data and your config. We, on container start, will the first time install, i.e. reach out to that persistent disk, set up all the files that we need, set up your config, set up your data store, set up your data files. And then ultimately
Starting point is 00:41:54 that becomes your install. So it's not plug and play because otherwise we wouldn't be able to persist any data, but it is as quick as it can be that first time. And then if it were to shut down and start up, it will look at its persisted disk, its file mount basically, and say, oh, well, all of those install files are there, so I'm good. I'm just going to basically spin up the APIs that HarperDB has and wait for somebody to try to talk to me. Got it. I see. And if it reads those files and it says this is a version eight of HarperDB, but on version nine, and it has some migration logic and all of that. Exactly. Got it. Cool. Cool. That makes sense. So cool. So let's dive into databases now. So we have, I think we've given a really good overview of edge computing. And so you kind of, you can see
Starting point is 00:42:42 kind of how this can follow. You have all these machines running, let's say, in the vineyard, and they want to do things without having to phone home. So they don't want to have to give all the data to some server, which could be a thousand miles away. They want to do some processing locally, a lot of processing locally, and then just send back the most important things. And so to do that and to coordinate that, we need to have some centralized place where we can have information. So to use the vineyard as an example, you know, maybe we want a centralized place where we store
Starting point is 00:43:20 what we consider to be like anomalous temperatures. And that could change as the season changes. And so we want to keep some information in all of these Raspberry Pis so that they're all kind of on the same page and they can kind of make decisions kind of as a unit, right? And so what you end up having to do is, if you were to write this by hand,
Starting point is 00:43:43 is do a lot of message passing. And anyone who's, you know, and I wrote Mame Hub a long time ago. So it's a peer-to-peer kind of video game thing. You know, anyone who's ever done peer-to-peer knows how hard it is. You know, getting two Raspberry Pis to talk to each other,
Starting point is 00:44:01 you know, unless they have a public IP address is super difficult. And just having a mesh network, even a mesh network of public computers, is really difficult. So kind of walk us through, like, what is kind of HarperDB? How does it solve this problem? And why is it able to do what it does? Sure. So HarborDB was built by developers. I love the phrase, by developers, for developers. It feels like every product is that, really, isn't it?
Starting point is 00:44:37 Well, maybe not for developers, but definitely by developers. You got to have that. Ultimately, it was built to solve a lot of the pain points that we found when we were building distributed applications. So I know that I have workloads that I want to run on disparate devices. I know that I'm going to collect some sensor data. I know that I want to run some calculations on that. I know that sometimes the sensor data comes in more quickly than I can run those calculations. And sometimes it comes in less quickly. I need to persist that in some way. So I'm writing it to a file or I'm holding it in RAM, except I lose power. And now all of a sudden I've lost all the data I was holding or, or I was able to make my calculation. And now I've reduced that stream of data to
Starting point is 00:45:20 the every 10 minute running average that I really want. And I have that 10 minute running average. And now I want to transmit that 10 minute running average to the next data point in that 10 minute running average off to the server that's going to analyze all of the 10 minute running averages across all of my data sensors that I'm collecting, except I lost my network connection. So now I need to build a buffer to hold that.
Starting point is 00:45:45 And oh, wait, somebody shut off the power again. So I lost my buffer. So that's a giant pain. And HarperDB is designed to function and to push that data storage out to the edge. So you can run your calculation, or sorry, you can collect your data from the sensor with app process.
Starting point is 00:46:03 And then you can simply put it into the database. Then you can have a, and it is persistent and it is ACID compliant. And you know that it's been stored. And then you can have a second process that will pull those things out and aggregate your 10 minute averages. And then it runs every 10 minutes and then it puts the result into a second table. And that table is now persisted and it is there and we know that we have it. And the fact that you want to now move that data over to, say, the cloud node for analysis, we have what we call clustering, which is not traditional database clustering, but we call our bi-directional table level data replication. So you don't have to replicate an entire database with HarperDB. You can literally choose within a schema or a table what records, sorry, what tables are going
Starting point is 00:46:52 which direction. I can publish it up. I could subscribe to say a thresholds table that might bring the thresholds for an alert down to the edge. And then when I create my 10 minute running average, I can publish that table up to the cloud. So my when I create my 10 minute running average, I can publish that table up to the cloud. So my application gets a lot simpler because I only need to make local host calls. I don't need to worry about network connectivity. I don't need to worry about holding in a memory buffer. I don't need to worry about what happens if the power goes off because I know it's persisted. I don't need to worry about, you know, wifi going out or, or whatever mesh network collapsing for a few seconds because somebody kicked a power cord. It's there. And when it gets plugged back in and HarperDB boots back up, it's going to say, oh, I've got these messages. Oh, I have not sent them. I'm going to send them now. And they'll send that. So really, if you look at what HarperDB does, it allows you to simplify your programming by just sitting there being an always on, always connected data fabric. So you can move your data
Starting point is 00:47:53 wherever you need, and you can do operations on it, and you don't need to move all of it. But ultimately, it reduces your application code to just making local calls. So it also, to some degree, allows you to bolt down that box a little more because you don't need your application code to be making calls out to third-party APIs. You could have a cloud server making those calls, putting the results of those calls into HarperDB, and then subscribing those calls, the results of that data, back down to a third-party API call table. And so I don't need to make those calls, the results of that data, back down to a third-party API call table. And so I don't need to make those calls from the edge.
Starting point is 00:48:30 Wow, that's cool. So how does the developer handle conflicts, right? So power goes out. You say that the rolling average is X, but because the power went out, someone else got there first, and they think the rolling average should be Y. Power comes back on, and now you have a conflict. Is there some API to handle that? Or is there something about the way the transactions are specified that there's always a logical way that gets resolved? How does that work? There is. In most of the instances that I'm
Starting point is 00:49:02 kind of describing, add node of HarperDB with a piece of compute, maybe some sensors hanging off the side of it. There won't be a conflict because the rolling average that I'm calculating is based on the sensors that I'm attached to this particular node. So there may be a thousand nodes, but their rolling average is going to be basically tied to their sensors. So there wouldn't be a conflict. However, when you're looking at other applications where perhaps my edge unit is not a Raspberry Pi in a field collecting sensor data, but instead is an edge node in one of those smaller data centers
Starting point is 00:49:38 and a user logs on and because of their IP address and their location, they're steered to one and they input some data into a form and that immediately is replicated up to the cloud, which powers the massive UI for the core application. However, somebody else is elsewhere and they may have entered a number into that value a little bit after, but their network connection was a little bit faster, and they get it there first.
Starting point is 00:50:13 So the way HarperDB handles that is we have timestamps and we look at a unified time server and say, this timestamp versus this timestamp, whoever last writer wins, and we can overwrite that. But there are often times where that will cause a further conflict where you can get into that. Ultimately, it's an age-old problem in distributed computing. And that is the conflict between data happening over a slow versus a fast network connection. And so our next chapter or our next version, or maybe two versions away, forget, we're looking at CRDTs, which are conflict-free replicated data types. So they have a bunch more metadata associated with them.
Starting point is 00:50:56 And they can therefore make comparisons and say, all right, I know you wanted to do this, but I have to handle this transaction first, even though it came in later. So while I have persisted you, I'm going to unwind you and rerun this and now run you. And the end result is going to be what was intended. Right now, you can do that with HarperDB simply through intelligent architecture. But if our motto is it should just work and, you know, sacrifice simplicity without sacrifice, which is our tagline, ultimately we should handle that for people automatically. So we always have an eye on that. And we're working, we're new enough that we're working with customers and we help them architect these solutions because, you know, distributed computing is a challenge for a lot of people that is new to them. And so we're not the subject matter experts. We're not the only subject matter experts, but we feel like we have a good handle on how we can architect around some of the limitations of existing solutions. And we're always looking forward to try to figure out what the best
Starting point is 00:52:01 long-term solution is going to be. Yeah, I remember reading about this with Bitcoin, where it's called the double spend problem, where basically two people are in different geographic regions, or maybe a better way of saying it, who are somehow far apart in the internet space, can both spend the same money at the same time. And then it might take a really long time for that to get resolved. And until it's fully resolved, if someone actually executes that spend, then now you've both been able to buy a coffee for the same price or something like that. And again, I'm not a crypto expert either,
Starting point is 00:52:37 but I think that what's going on there is there's, I guess like there's just, it becomes a kind of popularity contest where there's this big battle over who is right. And eventually there's a consensus. And so I would imagine, yeah, something like if you're using HarperDB for e out further and further and closer to the edge because you want to get low response times and you want to get that request. But then for most architectures, at least currently, there's one master database because that's how you solve that problem. And it's a giant vertically scaled instance that costs hundreds of thousands of dollars a month
Starting point is 00:53:30 sitting in Oregon. And ultimately you're going to overwhelm it with a thousand different servers running your Lambdas that are all going back to the same place. And we did a proof of concept with a large social company. And if you were in Buenos Aires and you hit their API, the ping to the endpoint, like the connection was almost instantaneous because they had a lambda running in South America in a data center. The data that would come back, your friends list took sometimes upwards of 11 seconds because it was all the way back in
Starting point is 00:54:07 so we ultimately realized that our benefit was we can handle pushing the data out to the edge we can handle with our new custom functions lambdas that are at the edge also so you're you're basically your data is right next to your Lambda that's trying to access it. And then we handle moving the data around and the data that we move around and replicate to all the other instances in a globally replicated cluster of HarperDB is the transaction. It can be as large as the initial operation, but it can also be smaller because it might not change everything at the end of the day. So we can move less data around and we can move it on pipe that we control because we understand the internal IP addresses, which are going to be faster than traditional external IP addresses. And it becomes a homogenous data set with very, very low latency for everybody who's interacting on it.
Starting point is 00:55:06 And now literally the only challenge that remains is to make sure that multiple actors acting on data at the same time are resolved correctly. So we say we are ACID compliant at the node and we are eventually consistent. So right now we can't do, for example, financial services, right? We're not going to solve that double spend problem. But there are a lot of places where that's not critical. Social media is certainly one of us, but we're working on a solution that would make us able to solve that problem. Very cool. So we talked about kind of mining equipment and some of these like really specialized environment. What about if someone is, here's a good example. What if someone's just building an email app? So an email
Starting point is 00:55:51 iPhone app, right? So, you know, they would want to have access to their emails. Obviously the server has a copy of their emails. It's kind of caching, but it's also really more like a database. I mean, you could imagine someone wanting all of their emails on their device, right? And so could someone use HarperDB for something like that? I mean, that's more of like a consumer facing, you know, like on their consumer device running an instance of HarperDB. Is that part of the sort of use space for that?
Starting point is 00:56:21 Absolutely. We don't run, we need Node.js, so we're not going to run on an iOS device. We were able to use Userland, which is an Android app that actually installs a Linux subsystem, a full Ubuntu copy, and we could run it there. It was not a recommended implementation, but you certainly can do it. You can get it running on an Android tablet. I built a vehicle telemetry app on a tablet that was completely self-contained. And it would store local data in HarperDB. And then when the tablet came within Wi-Fi range of the office, it would then replicate
Starting point is 00:56:56 that data into the cloud. And you would see the vehicle and its path and any violations from its thresholds immediately represented. So if it had cell service, it would be doing that in real time. If I shut off cell service, it would still collect that data, still persist that data. And when it had a network connection, it would push that up. So it's very, very possible to persist that data without maintaining that connection. And I think I forgot what the literal question was. Oh, yeah. So the question was, could you run HarperDB on an iPhone if you're building some app that needs a window of the data locally? So imagine I'm building an email app,
Starting point is 00:57:38 I go to airplane mode, I still want to see my emails, I delete a few, I come off airplane mode, it needs to sync. All of that sounds, I would put that in the hard category in terms of being able to do that correctly, where I don't delete the wrong email or have a double delete or something. And so it'd be amazing if there was, and there might, I haven't done a survey on this, but it'd be amazing if there was some technology out there where I could just use some library and I would have some snap, some, some, not snapshot, but some slice of the data locally on my phone and they would take care of everything else, which it sounds like what Harper's doing.
Starting point is 00:58:13 And then that's when you brought up the, the restriction around the, the Node.js and all of that. Yeah. And there are, there are pure like client side, JavaScript browser level, JavaScript libraries that, that can accomplish a lot of what we do. They'll make use of IndexedDB as an underlying key value store. We have an underlying key value store that we use called LMDB, which is Lightning Memory Map Database, which is extremely fast, very performant, written in C, but obviously it's just a key value store.
Starting point is 00:58:46 So it doesn't have all of the properties that you'd want in a database, SQL querying and indexing and stuff like that. So we've built all of HarvardDB's functionality on top of that. However, underlying that is a key value store. So could we, if we had unlimited time and resources, replicate all of that into just a client-side library that you could include in a browser app, have it sync data down from a cloud and be completely performant, self-standalone. And if your browser on your phone then reconnected to a network later, execute the exact sort of syncing that HarperDB does currently
Starting point is 00:59:27 from, say, Raspberry Pi or a smaller data center edge node. Absolutely. You could 100% do that. And there are a few solutions that do that. The challenge is they maintain those subscriptions. Maintaining those subscriptions is expensive on the server. So continually syncing that data back and forth and holding what is in effect a socket open so that you can subscribe to a specific query from, say, a server-side entity is very expensive. Subscribing to a table is a lot less specific because you're going to have a lot less individual subscriptions. It's not that customized. Once you start getting those query level subscriptions, it can become very expensive. Meteor.js was a great platform that did that. And it was built on top of MongoDB and it looked at transaction log to figure out what
Starting point is 01:00:20 real-time data needed to be pushed down, but it was incredibly resource inefficient. Oh, interesting. I was wondering because I remember when Meteor.js came out, I did try the demo. I think we talked about it on the show years ago and it looked magical. Like it looked like, okay, well, you know, I have this slice of user data and I just want it to exist over here.
Starting point is 01:00:43 And it just magically worked, but then it never took off. And it sounds maybe like this is why, like it just, it just at scale, it just fell apart. It was, it was, it was magical. It was truly magical. It was just, as soon as that group started to move to include other databases, they realized how incredibly challenging that was because they integrated it so closely. And so they ended up building an entire library that moved away from Meteor.js and ultimately became Prisma. That's what it was, Prisma.io. Oh yeah. I've heard of that too. Yep. Yeah. So that was the next iteration of that. That was the next iteration of how do we sync data between a client and a
Starting point is 01:01:29 server and do that in a more efficient way and not necessarily overwhelm with individual subscriptions. And they're all great use cases and they are truly magical for users, but they become incredibly resource intensive. So we are focusing on, I'd say less the long tail of simplicity and providing the bulk of functionality we can within what we know to be the limits of data replication between every single client on earth and one central data store. Because obviously the other challenge is if I give you access to every single piece of data,
Starting point is 01:02:06 then you could update that data. And now I have a billion clients that are all trying to resolve, you know, who did what, when, what was your network timestamp? You know, who came first? What's the right answer? And then you'll never get into financial services, which as you know, is where all the money is. Yeah, that's right. Closer to the money supply. Yeah. So it sounds like the Harper, sort of a center of mass for HarperDB is just using Netflix as an example. Netflix wants to push its most popular videos to the edge so that you don't have to go all the way to Los Gatos or wherever Netflix's data center is to get that video, right? And so you can imagine all over the world, there's a ton of these like small data centers hosting whatever the most popular Netflix
Starting point is 01:02:59 video is. And so you have this cache and so people will go to the server. The server will say, oh, yep, I have that video. It's one of these super popular videos for your region. Here it is. Or, oh, I don't have this really esoteric video about leopards or something. I'm going to have to go to the main data center and go fetch that. But any time you write any kind of logic or really do anything with computer with a computer, you're going to want to keep some records. Right. You're going to want to keep track of how many people watched each video. And so now you could every time someone goes to watch a video, you could phone home to the main server. But now you hit a whole bunch of other issues, as we talked about with that main server now getting bombarded with tons of requests all the time, and it doesn't scale. So what HarperDB could do is sit on these edge nodes, collect all of those statistics,
Starting point is 01:03:57 so that tomorrow Netflix knows what videos are the most popular and it can keep that fresh. And then all of that gets replicated as all these machines are ticking up this histogram of videos. And then at some point, maybe at the end of the day, someone or some process at Netflix can get a copy of this database that all these aggregators are sharing and read it and learn some intelligence from it. Did I explain the use case pretty well, or is there anything to add? You did. And I'd go one step further to say, you would run an AI machine learning model to actively compress all of the individual data points that maybe come through a Netflix UI, a user experience. I might hover over a movie. I might watch the trailer for
Starting point is 01:04:58 it. I might only get halfway through the trailer. If you've ever thumbed through your Netflix queue, gone past a row of films and gone back up, you'll see cover art change for films as they try to test different cover art to see if you'll click on that. So a lot of these decisions are simply like, we want to A/B test this thing automatically. But at some point, somebody is going to realize that there's an advantage to one of those covers versus the other cover, at which point that is going to become a policy that is rolled down to every single client. We're saying, this is the best cover for this. This is what gets people to click on this. Or based on this profile, we're going to show this cover and the demographic data
Starting point is 01:05:40 that we've classified. And we're going to run a machine learning model that will basically classify all of our users into one of three archetypes, and the cover art is defined by that archetype. All of that happens at the edge. Aside from the larger, you know, aggregation, or probably the strategies for that knowledge, which is going to happen in the cloud, most of it you want to have happen out there. Otherwise, you run into the same problem everybody ran into before distributed computing was even a thing,
Starting point is 01:06:14 which is, my God, we need this server to be literally the size of the planet. Right. Yeah. Yeah, that makes sense. I think, too, there was this study. I'm sure you're more familiar with this, Ian, but there was some study, I think Google did it back in like 2011, that said, basically, for every millisecond it takes their site to load, their product gets hit in some significant way, or maybe it was every 10 milliseconds. And so there are real economic advantages. It's one of these things that's probably innate, or it's probably subconscious. You're not sitting there looking at your watch saying, oh, this was 80 milliseconds. I'm out. But subconsciously, the product gets hit hard every 10 milliseconds it takes to return a result.
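The edge-counting pattern described a moment ago, where each node ticks up its own histogram locally and a central process merges the copies later, can be sketched in a few lines of plain JavaScript. This is a toy illustration with invented names, not HarperDB's actual replication mechanism:

```javascript
// Toy sketch of edge-side view counting with a later central merge.
// Each edge node increments its own counters locally (fast, no phone-home),
// and a central process merges the per-node snapshots at the end of the day.

class EdgeNode {
  constructor(id) {
    this.id = id;
    this.viewCounts = new Map(); // videoId -> local count
  }
  recordView(videoId) {
    this.viewCounts.set(videoId, (this.viewCounts.get(videoId) ?? 0) + 1);
  }
  snapshot() {
    return Object.fromEntries(this.viewCounts);
  }
}

// End-of-day merge: sum every node's counts per video.
function mergeSnapshots(snapshots) {
  const totals = {};
  for (const snap of snapshots) {
    for (const [videoId, count] of Object.entries(snap)) {
      totals[videoId] = (totals[videoId] ?? 0) + count;
    }
  }
  return totals;
}

const tokyo = new EdgeNode('tokyo');
const berlin = new EdgeNode('berlin');
tokyo.recordView('popular-show');
tokyo.recordView('popular-show');
berlin.recordView('popular-show');
berlin.recordView('leopard-documentary');

const totals = mergeSnapshots([tokyo.snapshot(), berlin.snapshot()]);
console.log(totals); // { 'popular-show': 3, 'leopard-documentary': 1 }
```

Because each node only ever increments its own counter and the merge just sums the per-node snapshots, no write ever conflicts with another; that is essentially the grow-only counter idea from the CRDT literature.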
Starting point is 01:07:00 And so anything you can push to the edge just will turn into material dollars and cents. Exactly. And I mean, ultimately, we say that at least in gaming and in computing that 16 milliseconds is what the human being can perceive as a delay. So you want it to be down at 16 milliseconds. And I mentioned a case study earlier where users in Buenos Aires were spending, you know, a few milliseconds connecting to a local API, but then data would take anywhere between 300 milliseconds and 11 seconds to bring back a friends list.
Starting point is 01:07:37 And when we started running our tests with our custom functions and the data, which had been replicated out, your friends list doesn't change all that often, but we'd replicated the data out right to where the endpoint was. And you were seeing response times of five to 10 milliseconds, which, you know, I, we knew that under load, we would see that push,
Starting point is 01:07:59 but our objective was under a hundred milliseconds and we, and we beat that easily, which you were never, ever going to do if all of the data still lived in Seattle. Yep. Yep. Totally makes sense. Cool. Yeah. I think we covered a ton of really good material here. I think we opened all the bookmarks, which is good. Let's jump into HarperDB as a company. So what's something that is kind of unique about HarperDB? It could be the way you play in your off sites. It could be the layout of the office. Or what's something where when you showed up at Harper or maybe through your tenure there, it's really made
Starting point is 01:08:39 Harper stand out in terms of the work environment? Well, I think if you go to the site, you'll see our logo is a dog. Harper's actually our CEO's dog's name. Oh, wow. Okay. All of our demo datasets, if you go to our postman collection, if you go to docs.harperdb.io, you'll see we have a ton of demos and our demo data sets are all the dogs owned by the people in the office and then a breeds table. So you can do a join of those data sets. So all of our demos are based on the concepts of dogs. And at the end of the day, it's about somebody who is hopelessly loyal to you, always there. And ultimately, they make your life better. And so if that is the driving
Starting point is 01:09:29 architecture of every employee we hire, every feature we look at on our feature board and say, do enough people want this? Is it going to make people's lives better? And a lot of us are multidisciplinary software guys. So we've seen lots of problems over time. And to be honest, this product was built to solve problems that the founders were having in a specific application at their former company. But they solve a lot of problems that I've had too. And there's no limit to problems you face as a programmer. And we call our approach ultimately collapsing the stack. So we have now effectively Lambda functions or old school, you might call them stored procedures, but they're written in JavaScript and they're super easy to deploy and they
Starting point is 01:10:19 make your life easier and better. And hopefully you can spend less time working on that and more time outside playing with your dog, which is all they really want. Yeah. Do you let dogs in the office? This is a great debate. I've worked at places where dogs are in the office. I never had an issue with it. Definitely some people didn't like it. And I've worked at places where dogs were banned and people really didn't like that either. What's Harper's take on dogs at the office? Well, when we had an office-
Starting point is 01:10:49 Oh, that's true too. When we had an office, dogs were absolutely welcome. The irony is that Harper was not a nice dog and Harper was the only dog. If Harper wanted to come to the office, no other dogs could come in the office. But otherwise, you know, we talked about conflict resolution and replicated data types. Ultimately there were also conflict-resolution dog types, where certain mixes of dogs were allowed in, but if that dog was going to come, we definitely knew you can't bring this dog because they will not get along. Yeah, you need operational transforms for dogs. This person has to get transformed across the hallway or something.
Starting point is 01:11:31 The SQL query: WHERE NOT IN. Yeah, that's right. WHERE dog NOT IN (dogs that disagree with this dog). I think in the last episode, we were talking with the CEO of Pinecone, which is a database that does vector arithmetic. And Patrick was bringing up R-trees. I think this would be a perfect example where we could have rectangles for each zone of influence for each dog. And if we get an overlap, that throws an alert or something. Yes.
Starting point is 01:12:01 The Venn diagram of dogs that don't get along is just a circle. Yes. We just can't put all these dogs in one room. It's just too many dogs. So, okay, so it's distributed. So, you know, are you hiring sort of interns or full-timers, and where are you hiring? What kind of people are you hiring? Can you kind of walk us through it? You know, I'm sure you have a careers page and people can check it out, but just ostensibly, at a high level, what are you looking for, you know, for HarperDB on the engineering side? What kind of persona are you looking for? We just had our first hiring round in a couple of years.
Starting point is 01:12:45 We built out all of our core functionality and now we're ready to, I think we got what I'd call the first versions of this where we're figuring out what is the thing supposed to do? How is it supposed to work? And what is our technical debt left over from that learning process? And we've cleaned that up. And so now we're out there looking for a new full stack developer. And we were looking for a designer and then an infrastructure developer, because we're finding that the bulk of the challenge, once the product is sound, we want to increase the size of the team that's helping build cool new features. But right now our features make it super easy to deploy and we think it meets the needs of most of our customers. Now it becomes the services layer that we put in place to help big customers solve their architecture problems
Starting point is 01:13:37 because distributed computing is a new paradigm for many of them. So an infrastructure developer, somebody familiar with taking a Kubernetes cluster and extending it across public and private clouds, figuring out how to make it work with edge devices and script all of the inter-node connectivity. So we're looking for obviously very smart people in DevOps, a full stack software engineer. Node.js is what we're written in. So Node.js is a prerequisite there. We are also not solving what I would call traditional programming problems. We're in a very, very specific space. So we're not looking for extremely experienced programmers. We're looking for people who sort of get it and understand the goal is to build something that's a joy to use. And as such, there might be a little more heavy lifting on our side so that there's a little less heavy lifting,
Starting point is 01:14:40 you know, on the parts of developers who are using our product. So to that end, we really, really like to be able to take somebody in and make sure that they care what the customer thinks. Because there's a lot of developers who want a functional spec and they want to build out according to code. And then they want to check out: I met the objective. And I'm like, guess what? The objectives are going to change every day, but there's one core thing, and that's that it'll only change
Starting point is 01:15:11 if it makes it easier to use, more stable, smaller, tighter, faster, whatever. And the other part is you're free to bring suggestions to the table just as much as our CTO or myself or our director of marketing, who's out there on dev.to, you know, reading all the articles and all the feedback on our blog posts. And it's like, you know what? Everybody hates this thing. Yep. They talk about how we don't solve it, but nobody solves it and everybody hates it. And maybe we should look at
Starting point is 01:15:45 that. And that's just as valid as an idea, um, as the idea that, you know, our clustering engine should perhaps change to something written in a lower level language so that it's faster. Yeah, totally makes sense. Yeah. So, so the job isn't just, uh, you know just inverting the binary tree or solving some really tricky dynamic programming problem or something like that. That's not actually the job. That might be something you have to learn as a rite of passage, but it's not the job. Yeah, I mean, you'll totally do those things. You'll totally 100% do that. But we've got so much of the core written that at this point, our patented data model and our indexing and all the things we do are really, really solid. I think we'd love somebody to become familiar enough with it that our CTO could take a day off. That'd be nice every once in a while, right?
Starting point is 01:16:46 But I think the other part is just the flexibility to say, I don't know the answer, but also nobody knows the answer. So let's figure out a way to write it two or three or 10 times, test it all, and figure out what the right answer is right now. One of the things I realized is that the most authoritative paper on resolving conflicts in distributed computing was written in 1984 by a woman at Microsoft. That's the paper that all of the articles eventually go back to; they cite her as the primary influence. And we've known it was a problem for a very long time.
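To make the conflict problem concrete: the simplest classic resolution strategy is last-write-wins, where every record carries a timestamp and the newest version survives a merge. A minimal sketch (a generic illustration, not how HarperDB resolves conflicts; real systems also have to worry about clock skew):

```javascript
// Two replicas update the same record while disconnected; on sync,
// last-write-wins keeps the version with the newest timestamp.
// Ties are broken deterministically by node id so every node picks
// the same winner regardless of merge order.

function lwwMerge(a, b) {
  if (a.updatedAt !== b.updatedAt) {
    return a.updatedAt > b.updatedAt ? a : b;
  }
  return a.nodeId < b.nodeId ? a : b;
}

const fromPhone  = { value: 'dark-cover',  updatedAt: 1000, nodeId: 'phone' };
const fromLaptop = { value: 'light-cover', updatedAt: 1200, nodeId: 'laptop' };

console.log(lwwMerge(fromPhone, fromLaptop).value); // 'light-cover'
```

The deterministic tie-break matters: if two nodes could pick different winners for the same timestamp, the replicas would diverge all over again. Last-write-wins is also lossy by design, which is part of why, as Jaxon says, the general problem still isn't solved.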
Starting point is 01:17:28 And people still end up with that article because we have not solved that problem yet. Yeah. I don't know if that is the same as Paxos. I've heard the name Paxos a lot. I think that's some way to do leader election and resolve conflicts. That paper, at least in my circle, seems to come up a lot. But what everyone tells me, and I'm sure you've seen this too, is it's great in theory. Everything's great in theory. And then in practice, you have to find out the right corners to cut
Starting point is 01:17:57 so that something doesn't take three months to be consistent. And also, on the flip side, doesn't have massive errors. And so playing that game, I think, is a question of what do the customers really value? That, at the end of the day, is what really matters. I think somewhere down the line, the idea of Raft consensus, leader election, all of that will fade away. And the data itself will contain bits of metadata that allow you to have a leaderless distributed system. So inherently all of the information you need to know about what you need to do is present for each node
Starting point is 01:18:41 to execute rather than having a central broker that kind of directs traffic because you could have a cluster or two leaders or failover or whatever, but inevitably it's going to be a single point of failure if you wait for that one person to make that decision. You're going to have lots of people doing things and it won't scale. So it is my thought that ultimately it will be able to decide in a deterministic manner by itself, just based on the data itself. And it has some interesting applications down the road
Starting point is 01:19:21 for quantum computing where, you know, you can make probabilistic determinations across massive data sets. And obviously databases are supposed to be deterministic. And there's a lot of debate about whether or not quantum computing could ever be used for data persistence or data logic. But I think there's a tremendous opportunity there to find the lowest energy solution or the probable lowest energy solutions for a query. It just requires a lot more qubits than we have right now. But 10 years down the line, man, my patent is going to be awesome. Yeah. And to the point you brought up earlier, I mean, there is a double spend problem and there are like very specific niche cases where you do have to spend that time and you do have to have sort of that
Starting point is 01:20:12 perfect answer. But the vast, vast majority of the time, you don't. And so in all of these instances, you can use edge computing, you can use things like HarperDB and all these edge computing services that kind of make it easy to deploy to the edge. Docker, all the things we talked about will be extremely, extremely important. And then that one time when you actually click the checkout button, that time, you know, it can go to the server and take a long time. And people kind of expect that. They expect, okay, if my credit card is going to get charged, I expect to wait a little bit. And you can kind of have the best of both worlds just by being smart about when to use A or B, and both of them are extremely, extremely important.
Starting point is 01:21:00 Exactly. It's the challenge that you want: you want it to be as fast as possible, but not too fast. Yeah, that's right. Yeah. As fast as possible without hubris, right? Exactly. That is the Sisyphean struggle, right? Yeah. We push the rock up the hill every day. Yeah. Very cool. Cool. So let's jump into how people can reach you and how people can learn more about HarperDB. And what are some good resources for folks out there? And alongside that, we have a lot of folks who are in university who'd love to know: can they try out HarperDB for free? Is there a permanent free tier? Or what are some of the opportunities for them? It'd be great to kind of cover some of those bases. Sure.
Starting point is 01:21:52 Our URL is harperdb.io. On there, we have a docs tab, which will teach you everything you need to know from getting started. You can install HarperDB locally. It just requires Node.js and NPM. You can just npm i -g harperdb. Super easy. We also have a management studio, which is web-based. Even your local instances, because obviously your browser is capable of making local network connections, you can manage local and cloud and other instances that you might have installed through our studio. That allows you to connect instances to each other, set up inter-node data replication at a table level, pub and sub,
Starting point is 01:22:38 as well as our new custom functions feature where you can hang your lambdas basically off the side of HarperDB at its own API endpoint. So you can not just use our operations API, but set up something with third-party authentication that makes a query, perhaps inserts some data, then runs another query, calculates an average, and inserts it into a time series table. It's a super cool piece of functionality that then you can package up a project, click a button and send it to any of the other HarperDB instances in your organization. So it's very easy to deploy these as well. That system is backed by our own AWS hosted collection of Lambdas. And obviously, if you've worked with Lambdas before,
Starting point is 01:23:25 you know that deploying them and writing them and getting them out everywhere is not necessarily always the easiest. So we took a note on that and we tried to make it easier. We think we accomplished it. We do have, within our studio, the ability to spin up a HarperDB Cloud instance, which is our database-as-a-service product,
Starting point is 01:23:44 which is your own EC2 node with HarperDB running on it. We have a free tier. There's also a free tier for the locally installed instances. And you can effectively network all of these things together, watch the data move around a system, run your local instance on a Raspberry Pi, collect some sensor data, watch that get replicated up. It's really, really easy to set up a very, you know, comprehensive distributed computing application in only a few minutes using HarperDB and the studio.
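As a rough idea of what driving a local instance looks like: HarperDB's operations API accepts JSON bodies describing an operation, roughly like the payload built below. The schema and table names here are made up, and the commented-out request shows the general shape only; the local port (9925) and basic-auth header are assumptions based on the documented defaults, so check docs.harperdb.io before relying on any of it:

```javascript
// Build a HarperDB operations-API-style request body for inserting records.
// Schema/table names are hypothetical; see docs.harperdb.io for specifics.

function buildInsert(schema, table, records) {
  return {
    operation: 'insert',
    schema,
    table,
    records,
  };
}

const body = buildInsert('dev', 'sensor_readings', [
  { id: 1, temperatureC: 21.5, source: 'raspberry-pi' },
]);

console.log(JSON.stringify(body));

// Sending it to a local instance would look roughly like this
// (assumed defaults; left commented out so the sketch stands alone):
// await fetch('http://localhost:9925', {
//   method: 'POST',
//   headers: {
//     'Content-Type': 'application/json',
//     Authorization: 'Basic ' + Buffer.from('user:pass').toString('base64'),
//   },
//   body: JSON.stringify(body),
// });
```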
Starting point is 01:24:19 That's super, super cool. So folks at home, you can, most people have a Raspberry Pi. We've been telling people to buy Raspberry Pis for what, half a decade or something. So you have a Raspberry Pi, you have a computer, you can run HarperDB on the Pi, run HarperDB on the computer. And then whenever you, you know, insert foo into table bar, it just shows up on the computer, which is pretty cool. I mean, there's a lot of really fun stuff you can do with that. If you want to have a Raspberry Pi, we're working on a Raspberry Pi
Starting point is 01:24:49 water fountain to control a water pump. That's the latest project the family's been doing. And so we could just have a database, which is just saying, when should I turn on the water fountain, or should I have it on right now? And then from our computer, we could just, you know, change a value in that database and boom, the water fountain shuts off. So there's a whole bunch of really fun stuff you can do with this. And then as you learn that technology and you go to a company that, you know, is moving a lot of bits and needs to do things at the edge for all the reasons we talked about, you'll have that experience. You'll be ready to go and you'll have sort of a leg up there. Absolutely. I wrote a thermostat program for my own house. I have old fan coil units that either have hot air going
Starting point is 01:25:37 through or hot water going through them or cold water going through them. And depending on that, based on the temperature, if you want it colder, you need to know what the temperature of the water is because you don't want it to just turn on. So you need this piece of data. And I wrote it on a Raspberry Pi with a little seven inch monitor on top of it. And it runs HarperDB that stores it. And it does a little predictive temperature curve. It does third-party calls to the weather service. It will turn on the cooling if there's cold water in there and it knows it's going to be hot later. It may turn on the air conditioning a little early. So it's a super easy and simple system
Starting point is 01:26:15 that then actuates the power button on any given fan coil unit in the house based on which direction that room faces. And is it time for it to be colder in here now? Or can I wait until later in the afternoon? It's a very, very simple proof of concept, but, you know, a commercial thermostat was never going to meet my needs. Yeah. Wow, that's super cool.
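Jaxon's thermostat logic boils down to a small decision rule: only run a fan coil if the water in the loop can actually push the room toward the target, and pre-cool early when the forecast says it will be hot later. A toy version of that rule, with all names and thresholds invented:

```javascript
// Toy version of the fan-coil decision: run the fan only if the water in
// the loop can actually move the room toward the target, and pre-cool when
// the forecast says it will be hot later. All thresholds are invented.

function shouldRunFan({ roomC, targetC, waterC, forecastHighC }) {
  const wantCooling = roomC > targetC || forecastHighC >= 30; // pre-cool on hot days
  const wantHeating = roomC < targetC;
  if (wantCooling) return waterC < roomC; // cold water available?
  if (wantHeating) return waterC > roomC; // hot water available?
  return false;
}

// Hot afternoon forecast plus cold water in the loop: pre-cool now.
console.log(shouldRunFan({ roomC: 23, targetC: 23, waterC: 10, forecastHighC: 33 })); // true
// Water is warmer than the room, so running the fan would heat it up.
console.log(shouldRunFan({ roomC: 26, targetC: 23, waterC: 30, forecastHighC: 33 })); // false
```

The interesting piece, as in the anecdote, is that the decision needs a data point the fan coil itself doesn't produce (the loop water temperature), which is exactly the kind of state you'd keep in a small local database.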
Starting point is 01:26:39 And yeah, and this way they can all talk to each other and they can all be aware of each other. So if one of them is going, you know, all out, the other ones know that, okay, maybe the temperature is going to drop and they can kind of bootstrap off of each other. Exactly. Yeah. Very cool. I'm just trying to save $2.
Starting point is 01:26:55 I just want to save $2. Yeah. That's the engineer thing, right? It's like, well, I could purchase this product for $99, I could purchase a Bugsnag subscription for $10 a month, but I think I'll write my own and spend three years, you know, writing it from scratch, under the guise of someday I'll productize this and I'll get all that money back. Yeah, that's right. Never works. Never works. Cool. Jaxon, it was so awesome having you on the
Starting point is 01:27:23 show. I learned a ton. I know Patrick and I have learned a ton about edge computing from you and I really appreciate it. Folks at home have learned a bunch. If you want to reach out to HarperDB, they're on Twitter. We'll post a link to that. Reach out on social media and show off what you've built. I think they'd love to see that. And I'll also post the site and everything else. Thank you so much for coming on the show. I really appreciate it. You're welcome. I had a great time. Cool.
Starting point is 01:27:55 And for everyone out there, thanks for subscribing to us on Patreon and checking out Audible on our behalf. We really appreciate that. And we will catch everyone in a couple of weeks. See you later. Music by Eric Barnwell. Programming Throwdown is distributed under a Creative Commons Attribution Share Alike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work, but you must provide an attribution to Patrick and I and share alike in kind.
