Programming Throwdown - Databases

Episode Date: June 26, 2014

This show covers Databases. Tools of the show: Jason: nvAlt / nvPy Patrick: Arduino. Books of the show: Jason: Negotiating For Dummies http://amzn.to/1jS9Yem Patrick: Steelheart http://amzn.t...o/1lsRxBv

Transcript
Starting point is 00:00:00 Hosting provided by Host Tornado. They offer website hosting packages, dedicated servers, and VPS solutions. HostT.net. Programming Throwdown, Episode 34, Databases. Take it away, Jason. Hey, everyone. So just real quick, I know we promised the episode on Swift. It is coming. Patrick and I are just really trying to learn
Starting point is 00:00:32 Swift, but it's taking a little bit longer than I anticipated, more just because, you know, there's not a whole lot out there, right? And so a lot of it is just understanding how the language works and things like that. So we'll definitely get to that. But in the meantime, we have a really awesome show here on databases. I don't know what you're talking about, man. I already wrote three top-of-the-chart App Store games. Oh, yeah?
Starting point is 00:00:58 Flappy Patrick. Yeah, and I quit my job. And I'm a billionaire. Oh, man. So from the mailbag, Ash Booth wrote in, and he said, Hey, guys, I discovered the show about a month ago. I'm an AI slash machine learning PhD student, and what's it like to go from academia to, you know, industry, to working? Well, they're
Starting point is 00:01:29 both working. But what's it like to go from working at a university, or studying at a university, to a full-time job? Maybe I should answer this, because you went straight from elementary school to working. I did have a job for most of high school and college. A full-time job. Anyways, so yeah. Well, yeah, you go first and then I'll chime in. I was just joking, actually. Okay.
Starting point is 00:01:56 So for me, I only did undergraduate, so four years of college. So I'm not an academic, long-term student. Well, you did a master's, right? I did a master's while I was working. Oh, I see. So you kind of did. Yeah, so I had already started working, and I had an internship actually after my second year of school. So I went to school for two years and I had already worked, and I think that eased some of it. So I recommend internships; they're a really good thing. Oh, definitely. Because engineering internships are almost always paid.
Starting point is 00:02:29 I've never heard of an unpaid engineering internship. Right. And so you get money. That's cool. And then also, it does expose you, you know, going back to school for the remainder of the time, to know what to study and what you're interested in and what the "real life" (I'm doing little quotes here) is all about. But I think the biggest transition is from the fact that an academic is perfectly free to say things like "always" and "never", and in real life you can't really say those things, because you have to bend or break a lot of rules, because you may be working on older technology, or the style of your team is just this. And even though academically
Starting point is 00:03:06 you may be able to show, like, this paper says this is the more optimal way to do this, it's like, well, we already have coded this, we've already invested the time and tested it, and, you know, whatever. Just kind of realizing, you know, we were talking about this today at lunch, that in Python, in the Python compiler, there's something called Timsort. And it's named after Tim. And the idea is
Starting point is 00:03:33 they need to do some kind of sorting. And again, I'm not going to go into the details. I don't even really know the details. But to some extent, part of the Python compiler needed to sort something. But they knew something about the data, like they knew certain chunks of the data were already in some kind of order. And so Timsort took advantage of some of the assumptions they could make about the data to sort faster than n log n, right? Which, just to recap, if you take Computer Science 101,
Starting point is 00:04:07 one of the first things you'll learn is that the two fastest sorts, theoretically, quicksort and merge sort, both run in n log n. But Timsort, you know, given the data that Timsort works on, runs in faster than n log n. And so the point is, when you get down to industry, you're not working on theoretical data. I mean, you're working on practical data that you can make a lot more assumptions on. And so a lot of the theory kind of goes out the window. Yep.
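A small sketch may help make the Timsort point concrete. Two clarifications, offered cautiously: Timsort is the algorithm behind CPython's built-in list sort (`sorted` and `list.sort`), and its worst case is still n log n; the speed-up comes on input that already contains ordered runs, where it can get close to linear. The snippet below counts comparisons on an already-sorted list and on a shuffled one. The helper name `count_comparisons` and the list size are illustrative choices, not anything from the show.

```python
import random
from functools import cmp_to_key


def count_comparisons(data):
    """Sort a copy of data with the built-in sort and count comparisons made."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    # sorted() returns a new list, so the input is left untouched.
    sorted(data, key=cmp_to_key(compare))
    return calls


n = 10_000
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

sorted_cmps = count_comparisons(already_sorted)
shuffled_cmps = count_comparisons(shuffled)

# Expect roughly n comparisons for the sorted input and
# on the order of n * log2(n) for the shuffled input.
print("already sorted:", sorted_cmps)
print("shuffled:      ", shuffled_cmps)
```

The gap between the two counts is the "assumptions about the data" idea from the discussion: the algorithm detects the existing run and has almost nothing left to do.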
Starting point is 00:04:45 That's my biggest thing. I don't know. Yeah. I guess my biggest thing is, I kind of look at it this way, like going into academia,
Starting point is 00:04:56 as a student or as faculty or as a research assistant or what have you, is a lot like owning your own business. As a student, you're really in charge of your own fate more than you are as an employee. As a student, within a matter of three or four months, you can fail out of college, right? But, you know, outside of sexually harassing someone or something, which
Starting point is 00:05:25 is very unlikely, which would require you to be a sociopath, outside of those crazy things, you'll almost never get fired within three to four months. For most companies, for you to be fired for performance reasons, it takes like 12 months. Yeah, you have to be messing up for a long time. Right, I mean, you really have to be going down the wrong path and know it for a while. But school, I mean, especially when you get to the higher levels, like a PhD, or even worse, a professor, if you don't get enough
Starting point is 00:06:05 grants accepted, and you don't get enough papers accepted... You know, there are conferences that are looking for good papers, but they're also very selective. And if you don't get these things through, then you don't get funding. And then you have to kind of beg your department to float you, and that, you know, ends badly. It's so easy to fail, just like running your own business, and to just be destroyed in your career. But then on the flip side, what you're given in academia is a lot more freedom. In school you can study whatever you want. If you want to be a biologist, you can be a biologist. You know, you can study anything. But if you get a job as, say, a software engineer, then obviously it's much more limited, right? So those are sort of the trade-offs.
Starting point is 00:06:56 Well, I guess I think about two other things. One is that in school, almost everything was what we call greenfield work. Everything is completely unwritten, right? You're writing stuff from scratch. And most of the time when you're working on a job, there's already something that exists. Yeah, it might just be requirements, but it's very rare. I mean, almost all my time spent coding is working in other people's code, working in legacy code. You know, even when I'm writing new code, it has to fit in with other people's code. Right, there is some of that in school, but it's not as much. And then the other thing is the continuity of work. So at least, if you're doing graduate work and research it's probably different,
Starting point is 00:07:40 but as an undergraduate, you know, you're doing little projects. Typically a long project would last a semester, and most times there are multiple projects per semester in a class. And at work, you could easily be working on basically the same thing for years, but easily months. Every night you go home, and you come back the next morning and you're doing the same thing you were the night before, and you just keep doing it over and over and over again. Yeah, right. Actually, there's this interesting TED talk, and we'll link to it in the show notes, but it's about what makes us feel good about our work. And the reason why I bring it up is they did this experiment where they had a control group where they started off paying them $5 to build this Lego. So they had a box of Lego and you had to build something. I think it was like a robot or something.
Starting point is 00:08:41 And they had the instructions, and after you built it, they put it aside, they took it to another room, and they gave you a new box and told you to build another one. But this time it was only for, I think, like four dollars, and they had some kind of power function, right? And so most people kept building the robots until it was down to like 70 cents. Then for the second group of people, they had to build the robot, and then they destroyed the robot in front of them, like disassembled it, and had them build it again. And that second group only built the robot once or twice and then said they were done. And so grad school is a little bit like being in that second group, because you do a lot of work.
Starting point is 00:09:25 A lot of it is alone. And even being a professor, you know, a lot of the work you're really like in your own world. I mean, you have students who are part of your team who kind of report to you, but you're really on your own. And a lot of the work, this is my opinion, so take it with a grain of salt. But a lot of that work kind of gets destroyed. You know, it goes into some research paper, and then hopefully someone in industry picks it up and does something with it. But I felt like there's a lot of that, kind of being in that second group. But when you are in industry, it's true that you sometimes can feel like you're a cog in a much larger machine, but on the flip side, you're part of a larger machine that's doing something pretty amazing.
Starting point is 00:10:09 And so that's sort of like the big tradeoff I see. All right. On to news. News. So this is pretty topical. There's a database, RethinkDB. And the impression I get is I don't know too much about Rethink, but the impression I get is they're trying to, you know, kind of compete with MongoDB
Starting point is 00:10:33 and some of these other document stores. And they released a new version that has a bunch of cool new features. So if you've never heard of RethinkDB, after you're done listening to this podcast and you are a database expert, you can definitely check it out. Oh, we can offer certifications. You're a certified database expert. Nice. We should team up with Udacity or something.
Starting point is 00:10:56 Oh, I got that email too. You did? Yeah, I was just thinking of that. Yeah, anyone who has a Udacity account got emailed a couple days ago from Sebastian Thrun himself, saying that they're going to start offering accreditation. All right, this next article is yours as well. DHCPIO, what is that? Yes, DHCP. So I used to use DynDNS, and actually I still use it. And they stopped doing free host names. I didn't know that. Yeah, but I had already kind of been using them for a while, and I got kind of nervous because I have, like,
Starting point is 00:11:41 a bunch of things tied to that DNS, scripts, things like that. So I just said, forget it, I'll pay them. I think it was like $20 for the year or something like that. But now I found out about this DHCPIO. I'm going to try it sometime this week. And it looks like it's exactly what DynDNS used to be, where it's free and you set up some kind of service that updates your IP automatically and things like that. So I'll check it out, I'll report back
Starting point is 00:12:12 next show and let you guys know if it's solvent or not. But at least it's good that there's other alternatives out there. Nice. So my next one was an interesting article I was reading. It's called Folding a Fractal. So fractals are, I think, you know, for some reason programmers are fascinated by fractals, and mathematicians. Why do you think that is? I think because, at least a lot of programmers I know, they don't consider themselves very artistic, but it's kind of something where you turn a little knob, you change a little equation, and you get something kind of beautiful out. Okay, that makes sense. It's like computer generated, like procedurally generated things, right? Programmers will build all sorts of, I always see procedurally generated moonscapes and landscapes and treescapes, and people just really
Starting point is 00:13:06 like a simple algorithm that creates something complex. Okay. Conway's Game of Life, right? All these things, you know, complexity from simplicity. And this article takes an interesting look, which I'd never heard about before, at, I believe it's the Julia fractal, and basically talks about the symmetry of it. And you always hear the typical thing, like, a fractal is something that takes place on the complex plane, so the vertical axis is the imaginary component and the horizontal axis is the real component. I think I'm getting that right. And, you know, you take a point there and you put it into an equation, you get a result, and then you take the result and put it back into the same equation. You keep doing that. And either the results will stay under some limit
Starting point is 00:13:56 or will kind of spiral out to an infinite number. And based on that, you kind of make colorings, and whether it's in the set or out of the set, and, you know, these beautiful things arise. And this one, you typically hear about in this very mathematical description that's kind of dry, kind of boring, I guess, but it's interesting. And this one, they talk about it in a way of kind of like folding and transformations, and you'll have to read it. They have some kind of cool animations. They talk about it, and it's just a different way of looking at it. But, you know, kind of interesting that even with a different approach you can kind of arrive at the same thing and kind of have a
Starting point is 00:14:34 different insight in how and why the shapes arise like they do. Yeah, this is really cool. I'm actually on their website now. This guy's website is incredible. Yeah, but it's poor performing. Well, so, yeah, it's really cool, but I had trouble loading it a couple times. Yeah, so I was just saying how great it is, and all of a sudden I was rotating his website around. You actually, like, move the header of his website around in 3D, and then I got an achievement in the website. I got achievement unlocked for rotating the logo around. But as part of getting that achievement, I broke the website. Like, I can't scroll anymore. Oh, nice.
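Going back to the fractal description a moment ago (take a point on the complex plane, feed it through an equation, feed the result back in, and see whether it stays bounded or escapes), here is a minimal sketch of that loop. The show never names the equation, so the usual quadratic form z*z + c is an assumption here, as are the constant `C`, the escape limit of 2, the iteration cap, and the grid dimensions. It prints a rough ASCII picture rather than a colored image.

```python
MAX_ITER = 50  # assumed cap on how long we keep iterating


def escape_time(z, c, max_iter=MAX_ITER, limit=2.0):
    """Return the iteration at which |z| first exceeds limit, or max_iter."""
    for i in range(max_iter):
        if abs(z) > limit:
            return i
        z = z * z + c  # put the result back into the same equation
    return max_iter


def render(c, width=61, height=21):
    """Return text rows: '#' where the point stayed bounded, '.' elsewhere."""
    rows = []
    for row in range(height):
        imag = 1.2 - 2.4 * row / (height - 1)      # vertical axis: imaginary part
        line = ""
        for col in range(width):
            real = -1.8 + 3.6 * col / (width - 1)  # horizontal axis: real part
            stayed_bounded = escape_time(complex(real, imag), c) == MAX_ITER
            line += "#" if stayed_bounded else "."
        rows.append(line)
    return rows


C = complex(-0.8, 0.156)  # arbitrary illustrative constant
print("\n".join(render(C)))
```

Coloring each point by the value `escape_time` returns, instead of a yes/no character, is the "colorings" step mentioned in the discussion.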
Starting point is 00:15:16 But I'm looking at the top of his header. So it is a, I mean, it is an interesting website, but it's not my favorite cup of tea. But okay. Yeah, a little too much going on, but it's pretty cool. I haven't seen a website like that in a while. So you can check out the link in the show notes or just search How to Fold a Julia Fractal. Yeah, definitely. So, yeah, I have a piece of news here: Wavepot music scripting. I have a bunch of friends who are music buffs. I tried to play the guitar once, and it was embarrassing. And I have no musical talent. Wait, wait, Guitar Hero.
Starting point is 00:15:49 I'm great. No, I'm not. I'm mediocre at that game. Well, there you go. That makes me a real musician. I'm also mediocre at Guitar Hero. But this is pretty cool. It's basically all JavaScript, so you can iterate pretty quickly.
Starting point is 00:16:07 And you just write some scripts. Basically they give you a function, and, you know, t comes in, where t is the current time, and what comes out is an amplitude. And they have a few samples where they just do the sine of t, cosine of t, and you can hear the little sine wave mix. And if you do sine of 2t, you can hear a higher frequency sine wave. And then they even have some crazy examples where they've made their own techno, their own electronic music, that you can just
Starting point is 00:16:46 execute in the browser. So I thought it was pretty cool. I've played with something like this before. I'm trying to remember the name of it now. It's the same thing, but it's not live in an editor. It was kind of like a scripting language that you would use, and it did something very, very similar to this. Interesting. Yeah, it seems pretty cool. I mean, I know nothing about music theory or anything like that, but this looks really fun. If you're into music and you want to build your own sick techno beats,
Starting point is 00:17:20 this seems like a really good way to do it. Nice. This kind of goes in line with our last one. Man, we have themed news reports. This is good. Yeah, totally. All right, time for book of the show. Book of the show. So, my book... I think you should go first. All right, I think you should go first. No, you should go first. I think you should go first. Let's make this win-win. Let's talk at the same time.
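A note on the music scripting item from the news segment: the idea described there (a function that takes the current time t and returns an amplitude) can be sketched outside the browser too. This is not the tool's actual API, just a rough Python analogue using the standard `wave` module. The sample rate, the 440 Hz and 880 Hz frequencies, the function names `tone` and `render`, and the output file name are all assumptions made for illustration.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 8000  # samples per second (assumed)


def tone(t):
    """Amplitude in [-1, 1] at time t in seconds: a sine plus one at double the frequency."""
    low = math.sin(2 * math.pi * 440 * t)
    high = math.sin(2 * math.pi * 880 * t)  # doubling the frequency raises the pitch
    return 0.5 * low + 0.5 * high


def render(func, seconds, path):
    """Sample func(t) at SAMPLE_RATE and write a mono 16-bit WAV file to path."""
    frames = bytearray()
    for n in range(int(seconds * SAMPLE_RATE)):
        amplitude = max(-1.0, min(1.0, func(n / SAMPLE_RATE)))  # clamp to valid range
        frames += struct.pack("<h", int(amplitude * 32767))
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(bytes(frames))


output_path = os.path.join(tempfile.gettempdir(), "tone.wav")
render(tone, 1.0, output_path)
print("wrote", output_path)
```

Swapping `tone` for any other function of t changes the sound, which is the quick-iteration appeal mentioned in the segment.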
Starting point is 00:17:51 Let's make a deal. How about I go first of book of the show, and then you go first tool of the show? Okay. All right. We just had there, ladies and gentlemen, was a negotiation, and I won only because I read Negotiating for Dummies. And I talked about this book a long time ago.
Starting point is 00:18:11 I'm getting ready to buy a car. We've talked about this on the show in the past, but Patrick and I both have pretty new families. Are you getting a minivan? I don't know. So I kind of think the minivan is kind of cool,
Starting point is 00:18:28 especially because, well, cool isn't the right word, but I like the minivan because it can hold, um, it's sort of like having a hatchback. Like it could hold bicycles on the back and things like that. Um,
Starting point is 00:18:40 but then I also see the, like SUV is kind of a nice way to sort of have a bigger car that could hold more people. So do people not like minivans just because they don't think it looks cool? A lot of people really don't like minivans. Yeah, I think the problem is I think minivans got such a bad stigma. It sort of, like, it sort of, like, has the whole soccer mom connotation. Like, I'm completely indebted to my kids' happiness, and my life, like, doesn't mean anything anymore. It's sort of, like, I feel like some people attribute minivans to that, to that idea.
Starting point is 00:19:26 And it's obviously not true, it's just a car. But, you know, so they came up with this SUV, where, if I understand correctly, it's like a truck chassis, so you could feel like you're driving something with muscle. But it's like, yeah, I have a chassis and a frame that can withstand Arctic tundra, but I'm going to drive three people in suburbia, you know. So you're relearning how to negotiate so that you don't overpay for your car. Exactly. So I'm still in the early stages. I'm still trying to figure out what kind of car to get and things like that, but I have this book hot on the fingertips for when I get into that. So when you show up to the car dealer, you just plop the book down,
Starting point is 00:20:14 and then they're just like, okay, okay, fine. Yeah, that's the plan. The plan is to just start actually just reading quotes from the book until they give me the price I want. But, no, I thought this book was great. And it's one of the few books that I actually use as a reference. Most of the time, in today's day and age, you just look things up online. Like, we have a lot of books of the show on programming languages and things like that, and they're great, especially if you don't know the language. Like I'm reading the Swift programming guide right now.
Starting point is 00:20:50 But like then if we do the show. Well, but at the end of the day, if you just forget how to use a function or something, you're not going to use that as your reference. You're going to Google it. But this is a book where I feel like I keep going back to it. So I figured I would go back to it here and make it a book of the show, which it hasn't been yet. So my book of the show is a fiction book called Steelheart by Brandon Sanderson. He writes a lot of books. This guy's crazy. I don't actually know how he writes as many books as he
Starting point is 00:21:22 does. Some authors, you know, it's like a long time between books, and he doesn't. He actually has multiple series going, this guy. But how do you think they do that? Like even Terry Brooks, or sorry, Terry Pratchett. Oh yeah, he writes a lot of books. Yeah, how do they do that? It's unbelievable. And his are mostly Discworld books, but he does write other books as well. Yeah, but this guy's got at least like three series going, I think, like three different universes or whatever, maybe more. And he comes out with, I'd have to go look, but it seems like every couple months there's a book coming out. Yeah, I don't know how people do it.
Starting point is 00:22:00 And some people, it's like, you know, I'm reading a trilogy and it's been like two or three years since the last book came out. It's like, I really want the next one, and the author's saying it's going to be another year at least, and it's like, ah. Yeah. So sometimes it's better just to wait till the trilogy is done and then start reading it. Yeah, the lady who wrote the vampire books. What's it called? The one with the vampires and the werewolves? Anne Rice? Like, Interview with the Vampire?
Starting point is 00:22:30 No, no, the one that they made the movies of. You know, the girl, like, she has a vampire and a werewolf, and she doesn't know which one to date. Oh, Twilight. Twilight, that's right. So the lady who wrote Twilight, apparently she was going to write, I guess, another, like some kind of epilogue book, but then the book got leaked out and so she canceled it. And that kind of blew my mind, because it's not like, you know, it's not like
Starting point is 00:23:02 you can't cancel a book. I mean, it takes so long. I just thought that was pretty wild. It takes so long and it's such an individual effort that, like, the idea that even if someone spoiled my book, that I would just throw away a year of my life. Yeah, but you wouldn't write that kind of book. Nobody's gonna spoil your programming book, my friend. I'm just saying, like, no matter what book I wrote, if somebody spoiled it. In the end, it compiles. If somebody spoiled anything that I spent a year working on, I would still just finish it. But, you know, then again, she's already written so many books, it probably doesn't matter. Yeah.
Starting point is 00:23:39 Some people at that point, like, it's probably not for the money anymore, right? Yeah, I guess if someone spoils it, there's nothing left. What happened with J.K. Rowling, right? The person who wrote Harry Potter wrote another book. Oh man, I wasn't prepared for this. But she wrote another book, but she wrote it under a pen name, and I think it was well reviewed, but people didn't get excited. It wasn't like a bestseller. I'm probably butchering this story. And then it kind of came out that she wrote it. Someone, like her publisher or something, leaked that it was her, and then all of a sudden it was, you know, through-the-roof sales, and all of a sudden it was the great book and all this stuff. Oh, wow. You know, but she was trying to dodge
Starting point is 00:24:16 it, you know, because she didn't need the money, so she probably just wanted to write and not have people view it in the light of Harry Potter. Yeah, it's interesting. There was some writer who gave a TED talk recently. I think the topic was, like, how do you survive your own success, or something like that. And basically she just said that she had huge writer's block because she was consumed with her own success. Not that she was gloating over herself every day, but it was more just, she felt like, how can she top that, you know? And the short story is that she just decided to write a book. She kind of just got the cojones to write the book, and the second book tanked. Everyone was like, this is
Starting point is 00:25:01 nothing like the first book. And then that sort of gave her the freedom to write the third book, which ended up being like her best book or something. But, yeah, apparently, again, like I was saying, it's such an individual effort that it's sort of like movies. You know, I would imagine at some point you're going to the movie just to see George Clooney, and it doesn't really matter what he does. It's just kind of a weird dynamic. Coming full circle, Steelheart by Brandon Sanderson. And this is kind of like an interesting take on the comic book superhero,
Starting point is 00:25:38 supervillain theme. Oh, okay. So it's kind of that way, but it's not a Marvel universe. It's not like DC Universe. And it's a book, not a comic book, but kind of in that kind of theme, but with some interesting twists on it. So it was nice, it was a good read. It's a rather short book. Okay, so it's not a comic book. No, but it's in that kind of world, right? Like, same kind of ideas. Cool, cool, cool. So I get to go first for tool of the show.
Starting point is 00:26:05 All right. Mine is Arduino. So we've talked about Arduinos before, and I recently used it for a project, and people at work were giving me a hard time because, you know, everybody's like, oh, Arduino's overpriced, and, you know, you can get this other ARM device,
Starting point is 00:26:21 which is 32-bit and 100 megahertz and only $5. And then by the time they finished that discussion, I actually had the project done on my Arduino. And I'm not like, Arduino is the best thing ever. A lot of people try to do stuff with it where you should just move to another processor. But if you're trying to do rapid prototyping, I've realized how wonderful of a tool it is, because there's so much example code and libraries and peripherals out there that just work with it, and you just do it and it works. And if you need to, it's the same thing
Starting point is 00:26:56 with kind of Linux distributions, in a way. It's nice if there's a big community, because somebody probably asked a question. So it's good if you can ask a question and eventually someone answers it. It's better if you just look for the question and it's already been asked and answered. With Arduino, don't you have to write in some wacky language, though? Yeah, but it's basically just C. Like, I just write in C and it works. Oh really? Okay. Sort of. I mean, I don't know, I don't do any applications programming in it, right? So it's all, manipulate some hardware, is what I'm typically doing with it. And so I'm normally grabbing some library and some example code and changing it just enough
Starting point is 00:27:30 to where it works, and it's really fast for that. And, you know, I install the IDE, I do the thing. I couldn't imagine trying to do a complicated project in it. It would probably be terrible. But like I said, in the time that I could have figured out how to get a compiler installed, properly compiling and properly flashing, and, you know, the bootloader working on a five dollar whatever other ARM board, it probably would have been very, very powerful and would have taken me weeks to get working before I started. And I was done in, you know, like a couple hours for the thing I was trying to do. Yeah, the Arduino has Wi-Fi, right? Some versions of it do, yeah. Yeah, because as long as it has Wi-Fi, you could do almost everything on a desktop. At least, you know, if you're prototyping, you don't really need to have that much logic on the... I'm not sure where you're going,
Starting point is 00:28:21 but yes. Well, no, I was saying, like, well, because, you know, if you could do everything on the desktop and just communicate with Arduino devices, then the IDE doesn't really matter that much, right? Yeah, so I mean, that's one way to look at it, right? Like, even some of the Arduinos have this where they can pretend to be basically a keyboard, right? And so there's interesting data logger Arduino stuff, where you just want to measure the temperature in your office, right? You could try to go buy some USB thermometer and figure out some SDK,
Starting point is 00:28:49 or you can hook an Arduino up to it and have it type in the numbers into a spreadsheet and push enter after the end of each reading. It's just kind of an interesting hack, but in some ways is really straightforward and quick. So if it works, it works. And if you need to go to something more complicated, you can, and you'll know sooner. So, and that's why Arduino is my tool of the show, even for, I don't consider myself an expert programmer, but someone who does a lot of embedded work. Like if I had a board that was my preferred setup and I went to it all the time, sure, maybe that would be better. But since I don't, I, you know, when I have to do stuff with hardware and I just need
Starting point is 00:29:30 it done, Arduino. Yeah. Cool. So what's your most recent home project with Arduino? With Arduino? So, it's kind of hard to explain. I'll do the other one. So I wanted to build something, just for entertainment value, to move a flag back and forth. So I want to just attach a servo motor to a flag and like
Starting point is 00:30:01 have it wave the flag, right? Okay. You know, very patriotic. I don't want to have to actually wave the flag, so I built something to wave the flag for me. So in preparation for July 4th, you're going to have a robotic, yeah, waver. So it's basically that, yeah. And so I programmed the Arduino and just told it to go from one extreme to the other, back and forth, back and forth, back and forth. And it was done in like... I already had a servo, I had an Arduino board, hacked it together, and it was done. Before that, I built an Arduino that moved two stepper motors that were on pulleys for strings. And they were attached to a pen, and they moved the pen around a piece of paper and drew pictures. What? That sounds amazing.
Starting point is 00:30:44 Oh, well, the hardware part was interesting, and then it was just programming after that. It was like, oh, this is kind of boring, actually. So you give it an image, and there's already tools, like people have done this before, which I guess is why it wasn't that interesting to me, because it kind of felt cheap. But you encode it, threshold or whatever, and then you do like the traveling salesman problem, like how do you go through all the points, the point cloud basically, to form a pattern that looks like the image, right? And then you send those coordinates for each of the paths, and the Arduino would move the pen to that, move it, move it, move it, and you just keep streaming the commands over.
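The pipeline just described (turn an image into points, order them with something like traveling salesman, then stream pen positions to the board) can be sketched roughly on the desktop side. Two loud assumptions: the ordering below is a greedy nearest-neighbor heuristic, a simple stand-in rather than whatever the existing tools use, and the geometry assumes the two pulleys sit at the top corners of the drawing area with y measured downward. The function names, the board width, and the sample points are hypothetical.

```python
import math


def nearest_neighbor_path(points, start=(0.0, 0.0)):
    """Greedy ordering: always visit the closest point not yet visited."""
    remaining = list(points)
    path = []
    current = start
    while remaining:
        closest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(closest)
        path.append(closest)
        current = closest
    return path


def string_lengths(x, y, width):
    """String lengths from pulleys at (0, 0) and (width, 0) to a pen at (x, y)."""
    left = math.hypot(x, y)
    right = math.hypot(width - x, y)
    return left, right


BOARD_WIDTH = 6.0  # assumed distance between the two pulleys
dots = [(5.0, 5.0), (1.0, 1.0), (3.0, 4.0), (2.0, 2.0)]  # stand-in for thresholded image points

for x, y in nearest_neighbor_path(dots):
    left, right = string_lengths(x, y, BOARD_WIDTH)
    # In a real setup each pair would be sent to the board, e.g. over serial.
    print(f"pen ({x}, {y}) -> left string {left:.3f}, right string {right:.3f}")
```

The greedy ordering is usually not the shortest possible tour, but it keeps pen travel reasonable and is easy to reason about when prototyping.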
Starting point is 00:31:20 That sounds awesome, man. You should document these. I mean, I would love to see a video of that working. Yeah, but like I said, it was something other people had already done, so I was just following their instructions. I wasn't really adding anything new. Patrick, we don't know those people. We know you. Okay, okay. Yes, I'm terrible. I don't contribute back to the internet. I hoard it all away in my head. Okay, that's pretty awesome. My tool of the show is definitely not as cool, but I was actually talking to a co-worker about sort of productivity tools. Um, I have a bunch of, like, you know, at work I have a bunch of email add-ons and all sorts of email hacks to do all sorts of crazy stuff. So, like, this naive Bayes filter
Starting point is 00:32:05 that runs on my email and um just to like you know it can actually shave hours off your workday like if you can get your productivity up uh like or at least by up i mean like more efficient wait it doesn't shave hours off your workday you just get more done in a workday uh yeah i guess that's unless you're like i don't know most people you kind of expected to work a certain amount right so like if you get done sooner do you just go home um i typically end up doing more like well so so i've never you know left at 1 p.m or something like that right but at the same time i don don't really follow any schedule. I just kind of have this feeling like, okay, now it's time to go.
Starting point is 00:32:52 So instead of working later, you can go home at a good time and still have gotten done a good work day's worth of work. Yeah, right. So this tool is called NV-Alt. There was originally a tool called Notational Velocity. I think the tool still exists. It's for OS X. It's pretty cool. It uses this wiki format.
Starting point is 00:33:13 So basically, you write in this wiki format, and it has this pane off to the right that has the rendered format that's updated in real time. And the cool thing about it is it just feels very natural. You write what you want the note to be. Then you hit enter and you start writing the note. And it has a search functionality. It has a bunch of cool stuff.
Starting point is 00:33:36 The thing is I don't really like the wiki format. I much prefer Markdown, especially the GitHub-flavored Markdown. The extended Markdown has tables and things like that. And so NV-Alt is an open-source alternative to notational velocity, which uses a markdown instead of wiki markup. So I used that. I found out, you know, because we were doing this show, I did some research, and there's actually one called NVPy,
Starting point is 00:34:05 which also supports Markdown, and it's written in Python, so it works on any OS. So you can actually take notes on Linux, those notes will magically work on OSX or on Windows or what have you. Also, it has sort of this, it can be locked with a password. So in other words, you can put all your notes on, say, like Dropbox or Google Drive or iCloud or what have you, and share them among your computers and not have to worry about somebody who doesn't know the password getting access to your notes. So it's got pretty good security there.
Starting point is 00:34:43 It has the indexing and all that. Oh, also, you can at any point turn your notes into sort of like a wiki thing. So it'll basically take all of your notes, render the HTML, and then put them in file names based on the title and create an index. So it's sort of like your own personal wiki that you could publish to the Internet. I've never used that feature, but I just think it's kind of cool. Like, one of these days I might need that. Um, but yeah, so NV-Alt: if you need to kind of take notes, whether you're a student or, you know, during meetings or something like that, I found that one really useful. Interesting. I'll have to check that. I don't have a good note-taking app, but I
Starting point is 00:35:20 do take lots of notes. Typically I used to use this Emacs extension. I forgot what it was called. But yeah, this is way better than that. This is similar to one I've considered before, todo.txt. Oh, okay. I haven't seen that one. Yeah. Yeah,
Starting point is 00:35:39 maybe I'll come back with that as a tool of the show if I try it. All right, yeah, let me know. Try this one and try the... I haven't heard of that one, so if that one's better, let me know. Okay, so on to databases. Data, big data. We're going to store this away and update the row containing this podcast file. No.
Starting point is 00:36:01 Yeah, the SQL syntax, for some reason, I could use it every day. There are some times where, let's say over the course of a month, I use SQL every day, and I still don't remember the syntax. I just don't know why. I remember there's select star in table or from table. See, there you go. Anyways, so before there were databases, which is probably going way back because databases have been around for a long time. Actually, how long?
Starting point is 00:36:42 Databases were around in the 1960s so before that and up to today even people use flat files right i mean it makes sense like you just have uh you know if you have some file on your computer um you can actually do uh you can actually seek in that file like most files are random access files um as long as you're not using like a tape or something like that and so you can actually just seek to any point in the file and read that point or write that point and so a lot of people will just write out like you know text files like for example if you have a web server every time somebody visits your website you might append a line to some flat file or raw text file saying who came to your website.
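As a rough sketch of the flat-file approach described here: append one line per visitor, remember the byte offset where each line starts, and later seek straight to that offset. The file name, the line format, and the helper names are made up for illustration.

```python
import os
import tempfile

# Hypothetical log file; a real web server would use its own path and format.
log_path = os.path.join(tempfile.mkdtemp(), "visits.log")


def append_visit(path, visitor):
    """Append one line to the flat file and return the byte offset where it starts."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make sure we are positioned at the end
        offset = f.tell()
        f.write((visitor + "\n").encode("utf-8"))
    return offset


def read_line_at(path, offset):
    """Seek directly to a byte offset and read the single line stored there."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\n")


offsets = [append_visit(log_path, name) for name in ["patrick", "jason", "jennifer"]]

# Random access: jump to the second entry without reading the first.
second_visitor = read_line_at(log_path, offsets[1])
```

This works as long as something keeps track of the offsets, which is exactly the bookkeeping a database takes over.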
Starting point is 00:37:30 And, you know, this is great for a lot of things. But it doesn't do everything. There's a lot of reasons to use a database. You know, one of them is indexing. So the idea here is you have a lot of data in your database. Let's say you have a list of names. So you have Patrick, you have Jason, you have Jennifer, Sophia, you have all these names, you know, a column just full of different people's names. So you know most of the time you want to look
Starting point is 00:38:04 someone up by their name, like, give me all the Patricks or give me all the Jasons. So what you don't want is to have to look through every name in your whole database and then pull out the Patricks as you see them. Like, you'd want to do some kind of pre-processing and build an index such that when you say, like, give me all the Patrick's, it's kind of ready to go. What's the difference between an index and a reverse index? Do you know? Nope. Don't know. I don't know.
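To make the "look through every name" versus "pre-processed index" contrast concrete, here is a minimal sketch in plain Python. The list of names and the function names are illustrative only; a real database builds and maintains a structure like this for you, typically with a tree or hash on disk.

```python
from collections import defaultdict

# A "column" of names; the position in the list stands in for the row number.
names = ["Patrick", "Jason", "Jennifer", "Sophia", "Patrick", "Jason"]


def scan_for(names, wanted):
    """No index: look at every row and keep the matches."""
    return [row for row, name in enumerate(names) if name == wanted]


def build_name_index(names):
    """Pre-processing step: map each name to the rows that contain it."""
    index = defaultdict(list)
    for row, name in enumerate(names):
        index[name].append(row)
    return index


name_index = build_name_index(names)

# With the index built, "give me all the Patricks" is a single lookup.
patrick_rows = name_index["Patrick"]
```

The lookup gives the same rows as the scan, but the cost of walking the data is paid once, when the index is built.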
Starting point is 00:38:36 I think, so, you know, a reverse index is where, I think it works like this. I think, let's say you have a bunch of records. An index says, OK, record 1 has Patrick in it. So if I look for record number 1, I want to get Patrick. Or if I look for record number 14,500, that one's Jason. That's an index.
Starting point is 00:39:06 Like you're indexing into the database. A reverse index is when you want to go the other way, like give me all the Jasons or all the Patricks. So, you know, even a flat file could have indexing because you could just say like every 10 megabytes in the file is a new person. Yeah, so it's just reversed versus forward it's just like a matter of how the data is stored versus how it's looked up kind of right so yeah like you said like if you have a log right you first you just start adding lines to the end of the
Starting point is 00:39:35 log. But eventually you want to have a way to go straight to a given day when it gets really, really big, right? And you don't want to have to parse all the days to find out where a given day starts. So you may put up at the top: day one is at byte this, day two is at byte this, day three is at byte this. Right. It's like a forward index, because it's kind of like the way it's stored. A reverse index would be like if for some reason you wanted to go the other way, like from a given entry to know what day it was in. Right, right. So it's not exactly the way it was stored. Um, you know, that data in this case is kind of a bad example, because it's probably stored in that portion of the log. But if it wasn't, like, hey, I'm at some part of a file, for some
Starting point is 00:40:15 reason I want to go back to, like, what day it was, you know, like going the other way. Yeah, exactly. Or like, tell me what days user X logged in or something like that. Yeah, and you need a reverse index. So the point is, databases do both of these. So you don't have to think about how to build... it's actually very hard building those structures. Um, so you don't have to think about that. The database will do that for you. Similarly to that discussion, it typically comes as data grows, right? So it starts off simple, you do something simple, that's fine. Then as it grows, you start having a problem, like caching. So if the file gets really big and you're accessing it in the same way a lot, if this is a flat file, you know, depending on your
Starting point is 00:41:01 ram and your cache maybe it's getting caching at like the cpu level but as it grows like bigger and bigger right like you want a more sophisticated caching scheme that understands the access patterns you're likely to have yeah exactly yeah i mean the the cpu will cache in its own like you know l1 and l2 cache but these are very small right but the uh you know a database can you know cache like 200 megabytes of information you know in main memory and it does like the caching between like your hard drive and the main memory and uh and again the way that most databases do the caching is very sophisticated so i know you don't want to be writing that yourself yeah no it grows like you could write simple caching yourself but yeah more complicated it's better for someone else to
Starting point is 00:41:51 have done it yeah definitely um another one is redundancy so you know you want uh you want all your data to be backed up in real time. Like imagine if you're processing orders, like you're a furniture, online furniture store or something. You know, if your computer dies or even if, you know, just the power goes out and you have to,
Starting point is 00:42:17 your computer hard resets, you know, if somebody is in the middle of an order and you double charge them or you forget to charge them or something, it could be a big deal, right? Or if you lose all of your data and you have no way to recover, um, that's a huge disaster, right? And then it gets really tricky because of the way you back things up. I mean, you could have infinite loops. Like, you'd have a student record which points to the
Starting point is 00:42:45 classroom they're in, and the classroom points to the students that are in that class. So you can have these sort of circular references and things like that. And so being able to back up something like that, some kind of structured data, and to recover it, these things are really tricky. And you don't want to be implementing them yourself
Starting point is 00:43:05 Similarly, trying to scale them. So running on one computer is all well and good: more memory, more CPU. Eventually that runs out, and now you have to scale to multiple computers, also called distributed databases, right? And now you have a whole host of other problems you've got to handle: do you put one table on one computer and another table on another computer, or some users' data on one computer and some on another? And then how do you split that up? Right. And that starts to bring into play a lot of different choices about how you want to scale the data and, you know, understanding the problem set so that you know that you're scaling in a way that's going to be beneficial for you. Yeah, exactly. Yeah, I mean, most of these databases, if you go to Amazon or Microsoft or Google or any of these big services, I mean, there's just a ton of computers that are
Starting point is 00:44:00 all working together to store like one collection of information and yeah trying to implement that again really hard very hard to get right even the experts don't get it right all the time right there's a lot of heuristics a lot of guessing and hoping that certain conditions aren't met and things like that and there's a lot of fingers that are crossed and and uh um and so it's definitely not like a solved problem and you don't want to be in the business of trying to solve it i mean unless unless you're getting a phd or something in databases but for the rest of us you know we should leave it to the experts um also segmentation right i mean most people access access data in certain patterns, and they might only need, you know, if you're looking at, say, like SQL, which we'll talk about later, row of data is kind of one atomic thing,
Starting point is 00:45:06 I'm going to break it up into pieces. Or as Patrick said earlier, I'm going to break each day and put each day in a different place. And it can look at sort of your access patterns and also some hints that you can give the database. And it can divide your data, segment your data into different pieces, such that it doesn't have to do a lot of extra reading. It can just read the parts that you want and do that very efficiently. I guess it should be pointed out some of this we're blurring a little bit, maybe it doesn't matter, between database and a database management system.
Starting point is 00:45:42 Oh, yeah, that's a good point. In some way like database. Yeah, go ahead. Oh, I was just gonna say that the, you know, it's kind of a whole package, right? Yeah, but in some ways, like a database is, is a theoretical design thing, right? Like, you know, for a relational database, I'm defining, you know, here's my schema, here are my relations, here are my constraints and like you know i want some physical implementation of that but you could arguably have different front ends that handle it differently based on
Starting point is 00:46:11 like how stuff is getting put in or taken out but in reality i don't know of anyone who does that for any practical reasons typically you have one database management system which is also managing kind of like the database the back end um but database management system is where things like what jason was talking about like observing usage patterns and deciding what to go ahead and pull out a priori you know say like oh i know you might going to look at this next that would be something that the management system would be handling versus like the database itself may have less of an influence on that. Yeah, that makes sense. I don't know. That may be arbitrary.
Starting point is 00:46:49 The other thing that is really good for a database is, initially you start off storing data, retrieving data, but then as you have more users and more data, you want to do analysis. You want to say, what kind of trends do I have? How many users are visiting my website each day? Is it growing or shrinking? What regions? And the more ways you want to slice and dice and analyze your numbers the more you're going to want them to be in a database which it's kind of understood how to write queries to do this otherwise you end up re-implementing a lot of that yourself over your whatever your other implementation of storing your data is. Yeah, that makes sense. Yeah, a lot of the database-driven languages are designed for people
Starting point is 00:47:30 who want to extract meaningful information from their data. And yeah, you're going to have to be re-implementing that, which is not fun. Yeah, another one, we'll just kind of round this out, is validation. So a lot of database management systems support triggers. So, in other words, if, say, you know, say an entry is written for a credit card transaction and it links to some credit card, but then that credit card isn't there, then what that means is clearly some system has made some mistake because it's tried to reference a credit card that doesn't exist. So you could have some trigger that says,
Starting point is 00:48:17 when I see a credit card ID for a credit card that's not in the credit card, you know, part of the database, then, you know, send an email or page somebody or something like that. Also, you could do client side. Well, I don't know if you'd really call it client side, but you could do sort of immediate validation, right? So if somebody tries to insert a record, you know, without inserting its sister record that it's joined with, or some other kind of validation. If somebody puts a negative age in the age column or something like that, then it just
Starting point is 00:48:55 won't accept that operation. It says, hey, if you're going to put an age, it has to be at least zero. It has to be a non-negative number. So a lot of these types of validation, this type of validation logic is very easy to do. They make it very easy and practical, which is these are all things that you'd have to write manually in C++ or something like that if you're writing your own database. So we're going to start talking about the different types of databases. And one thing to mention as a side note, and it's a very detailed subject, so maybe we'll just kind of gloss over it, is the issue of consistency. So this is when a single person is updating, or a single user computer process is updating your database. You typically don't run into this problem,
Starting point is 00:49:47 but it's when you have multiple users that you begin to have an issue. And consistency, well, the kind of gold standard, I guess, is ACID compliance, which is atomicity (or atomic things), consistency (I kind of grouped them all in together, but I guess consistency is one part of it), isolation, and durability. And we won't go through what each one means. You can read about it, and they have very precise definitions. Um, but these are basically: what are the guarantees I can state about when I successfully insert a row or update? What do I know has happened or not happened? So like, if you think about a bank, both my wife and I can go use the same ATM card,
Starting point is 00:50:34 or, you know, my ATM card, her ATM card, which linked to the same account. And we can try to draw money at the same time. And the system needs to be very carefully set so that we aren't able to draw, you know, both the full amount or the bank will end up giving us too much money. And so in that case, you want to make sure that you understand what happens when there are parallel transactions occurring. Right. I mean, in Patrick's case, the transactions are happening very slowly. But here, I mean, there's just there could be millions, maybe billions of people accessing your database. And so it becomes very likely that two people try to delete the same thing at exactly the same time. And so that could be very difficult to handle, right?
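A small sketch of the bank example, using Python's built-in `sqlite3` module. The `accounts` table, the starting balance, and the `withdraw` helper are assumptions made up for illustration, and the two withdrawals here run one after another rather than truly in parallel. The point is only that the check and the update happen in one atomic statement inside a transaction, so the second full withdrawal cannot succeed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # never allow overdraft
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Try to withdraw; return True only if the balance covered the amount."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        # rowcount is 1 if the row matched (enough money), 0 otherwise.
        return cur.rowcount == 1


# Two cards linked to the same account both ask for the full amount.
first = withdraw(conn, 1, 100)
second = withdraw(conn, 1, 100)
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

The naive alternative, reading the balance in one step and writing the new balance in another, is where two simultaneous requests could both see 100 and both be paid out.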
Starting point is 00:51:18 Or two people try to insert the same thing at the same time. Or you're in the middle of an operation and the power goes out. These are all just very difficult things to handle, and a lot of thought has gone into sort of how to deal with that. So, yeah, there's many different types of databases. Getting back to Patrick's point, this is sort of more we're going to talk about the actual database side,
Starting point is 00:51:45 and then we'll get to some implementations of management systems later on. But the two that I think most people will be familiar with are relational databases and then everything else, which everything else you've heard maybe NoSQL databases, but basically that's sort of a catch-all for just a varied assortment of other databases. And it just happens that because of changes in computation and networking and in sort of the Moore's Law kind of fading and things becoming more distributed, the NoSQL databases have become much more important. And so we'll cover those in general. So why don't you go first? What's a SQL database?
Starting point is 00:52:38 I mean, I think if you know any kind of database, this is the one you're likely to know. So SQL is the structured query language. This is where you see select, star from the table name, where, some limits you put on the query, and the query runs. And similarly, there's a schema associated with it. So when you create tables and you have constraints, and some of this is particular to a database management system, and some of it is part of a standards-based language that you can use. And there are all sorts of things which support SQL-like languages. So you can write a query like what I just described, select star from users where creation date is yesterday.
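The query described here can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `users` table, its columns, the sample rows, and the fixed "today" date are all invented for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")

# The schema: a table with named columns.
conn.execute("CREATE TABLE users (name TEXT, creation_date TEXT)")
conn.executemany(
    "INSERT INTO users (name, creation_date) VALUES (?, ?)",
    [("patrick", "2014-06-25"), ("jason", "2014-06-20")],
)

# "Today" is pinned to a fixed date so the example always gives the same result.
today = date(2014, 6, 26)
yesterday = today - timedelta(days=1)

# select star from users where creation date is yesterday
rows = conn.execute(
    "SELECT * FROM users WHERE creation_date = ?",
    (yesterday.isoformat(),),
).fetchall()
```

The `?` placeholder passes the date as a parameter instead of pasting it into the SQL string, which is the usual way to do it from application code.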
Starting point is 00:53:19 Something like that. It's a very common way. A lot of things use this because it's a a very straightforward notion um but ultimately underlying it the relational part says how do how do i provide links between two tables um and it gets into what you'll you'll hear the term normalized and denormalized data and patterns for that so you want to be efficient in saying like if i have a whole if i have a teacher and a teacher has a bunch of students uh the students also have classes right you can try to model you know kind of each row contains all the data necessary but that's you're replicating a lot of information and you're
Starting point is 00:53:56 not representing the relationships there: that there is a teacher entity, and a teacher entity is linked to student entities and linked to class entities. And those relationships can be one-to-one, many-to-one, one-to-many, many-to-many, right? You can kind of have constraints about what those things are like, and keys that index into other tables. So, like, a student ID is the key, and that might be what you would store in a class roster, a teacher's students, and then there might be like a student record which is keyed by student ID and contains like the student's address and their billing status, that kind of stuff.
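A minimal sketch of that structure, again with Python's built-in `sqlite3`. The table names, columns, and sample rows are assumptions for illustration: the student record is stored once, keyed by student ID, the roster stores only IDs, and a join puts the two back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per student, keyed by student_id.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name TEXT,
        address TEXT
    );
    -- One row per class.
    CREATE TABLE classes (
        class_id INTEGER PRIMARY KEY,
        teacher TEXT
    );
    -- The roster holds only IDs, not copies of the student records.
    CREATE TABLE roster (
        class_id INTEGER REFERENCES classes(class_id),
        student_id INTEGER REFERENCES students(student_id)
    );
    INSERT INTO students VALUES (123, 'Patrick', '1 Example St');
    INSERT INTO students VALUES (456, 'Jason', '2 Example St');
    INSERT INTO classes VALUES (1, 'Ms. Smith');
    INSERT INTO roster VALUES (1, 123);
    INSERT INTO roster VALUES (1, 456);
    """
)

# Join the roster back to the student table to get names for class 1.
names_in_class = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM roster r "
        "JOIN students s ON s.student_id = r.student_id "
        "WHERE r.class_id = ? ORDER BY s.student_id",
        (1,),
    )
]
```

If a student's address changes, only the one row in `students` is updated; every roster that mentions the ID picks up the change through the join.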
Starting point is 00:54:33 And that is the traditional relational database structure, and you use SQL to interface and do what are called joins between these tables to construct the information necessary to get the query to run and get the information that you are looking for out, whether it be to look up a specific set of information or do analysis. Yeah, definitely. Um, yeah, so a lot of SQL and these kind of things, as Patrick was saying, revolve around this concept of IDs and doing joins. So, for example, you might have, say, a student's table, and it has a record for every student, but then that record also has what's called a unique ID. So, you know, Patrick and all of his grades and things like that might be in the student table and might be given an ID of like 123.
Starting point is 00:55:29 And then in, say, the classroom table, you might have each classroom also has its own unique ID. And the classroom has a list of students. But instead of putting the whole student record there, you could just refer to them by ID. So you could say, okay, in this class, there is student 123, student 456, so on and so forth. And then you could take that list of IDs and go back to the student table and say, okay, who is 123? And so this sort of breaks a lot of these circular dependencies
Starting point is 00:56:07 or having to have the same information twice is by storing a lot of these almost like pointers. And there's very efficient ways of looking up things by their ID and things like that. So then we get to every other kind of database in the world as it seems now. So yeah, NoSQL databases have really taken over for a variety of reasons. There's several of them. One is KeyValueStore, and this is the oldest. I mean, this might even predate SQL databases.
Starting point is 00:56:39 But this is a very simple idea. Imagine if you had a SQL table with just one column. Well, you had, I guess, two columns, the ID and then some other column. So that's all a key value store is. You give it the key and it spits back the value. And so you can think of this almost like a cache. Most caches are key value stores. And so even though it's very intuitive, it's one of the oldest types of databases, it actually proves to be very effective. There's something called a column family database, which is similar to a key value store, but the idea is for one key, there may be many values. So again, if you look at, say, a SQL database, you have, you know, a student record,
Starting point is 00:57:35 and you might have the key for that record, and then a bunch of other columns with information, just student name, you know, grades, so on and so forth. A column family is very similar, where you have, you know, like a student ID, and then you also have, you know, a set of values about that student. The only difference is in SQL, the idea is the ID itself doesn't contain any information. So in other words, you know, like Patrick might be given an ID of 123, but that, like the application shouldn't know or care about that ID. It's just a number that is used in the database to, you know, look up Patrick. And it's even, most of the time it's given by the database. The application can't
Starting point is 00:58:26 assign that unique ID. And the IDs are chosen in a way which make indexing and things like that really easy. With a column family or a key value store, the assumption is that the key itself is important. So in other words, I'm not going to say Patrick is 123. You know, Patrick, the name, could be the key. And then the value is all the information on Patrick. And so you can actually have multiple keys with the same name. And, you know, there could be collisions there that have to get sorted out. You know, depending on how the management system works, the key might
Starting point is 00:59:06 have to be unique. But yeah, the idea there is with the column families and the key value stores that the key matters. Thirdly, there's a document store. And this is something that's becoming much more common recently.
Starting point is 00:59:21 But the idea of the document store is that it's much more unstructured so if you were to think of say XML or JSON or something like that you can represent a variety of things in XML or JSON and you could even you could have a collection of JSON objects which have different topologies right so in other words like I could have one document in a document store which is the Patrick document and has like a list of Patrick's grades but I could have another document in the same database which has you know Jason
Starting point is 01:00:02 and has... doesn't have that. It has something completely different, for whatever reason. So in the case of a document store, it's much looser. And just through advances in the technology and also computation power growing, we're able to have these more sophisticated database management systems
Starting point is 01:00:25 that can do all the things we talked about, the validation, the indexing, the caching, without having such a tight schema as we do with SQL. So even with such a loose schema as you have in a document store, you know, given the right kind of resources and things like that, you can still do a lot of the things that we said at the beginning. All right, so now quickly as like an overcap, we'll talk about because you may be thinking of a given database and like, oh, I wonder what it is,
Starting point is 01:00:57 or maybe we just mentioned some databases you might be interested in looking up. We're going to kind of go through some breakdown of the various implementations of the databases and some examples, if we can give them, and what kind they are. So the first kind of section is the lightweight, embedded, you know, probably-only-runs-on-one-computer database. And the first example that is very popular is SQLite, S-Q-L-I-T-E. And it is a SQL implementation, a relational database. But it can even be embedded in your program and is super simple to use.
Starting point is 01:01:37 And kind of one of those things, if you start to get any sort of complicated data you're trying to store in your program, you should consider using this instead because it may make your life uh you know easier and it's it's quite efficient it doesn't take you don't have to run separate server uh and connect to it you um it kind of reduces the complexity of that that's necessary yeah i mean i think it's even built into some languages like i know python has sqlite built in i didn't know that. Yeah, you really don't have to do a lot of work to get it up and running. And I know Jason's used BerkeleyDB for stuff, right? I think you were talking about that. Yeah, BerkeleyDB is pretty cool.
Starting point is 01:02:14 Actually, I'll list another one on here, MapDB. And so these are also embedded. So as Patrick was saying, let's say you have some program that you're writing and you need to store some kind of configuration. Like the people, you know, what sounds the people want when they log in or, you know, it's some kind of music app, you know, or sorry, some kind of like music creation program. You know, what settings do they want their piano set at? These kind of configuration things, these embedded databases are great for that because you could just load it from a file store to a file. It could be part of your program.
Starting point is 01:02:55 You don't have to go phone home or something like that, contact a server or anything. So SQLite is a relational database, so it uses SQL with the structure tables and things like that. If you just want a key value store, so again, you look up a key, you get a value. Then you can use BerkeleyDB or MapDB. BerkeleyDB is written in C++, but there are bindings for a bunch of different languages. Map DB is a lot better, but it's written in Java.
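For a feel of what an embedded key value store looks like in code, here is a sketch using Python's standard-library `dbm` module as a stand-in. It is not BerkeleyDB or MapDB, and the file location and setting names are invented for the example, but the shape is the same: a file on disk that lives alongside your program, where you give it a key and get back a value.

```python
import dbm
import os
import tempfile

# Hypothetical location for the program's settings file.
settings_path = os.path.join(tempfile.mkdtemp(), "settings")

# "c" opens the store for reading and writing, creating it if needed.
with dbm.open(settings_path, "c") as db:
    db[b"login.sound"] = b"chime"
    db[b"piano.volume"] = b"7"

# Later (for example, the next time the program starts), open it read-only
# and look a setting up by key. No server is involved; it is just a file.
with dbm.open(settings_path, "r") as db:
    login_sound = db[b"login.sound"].decode("utf-8")
    piano_volume = int(db[b"piano.volume"].decode("utf-8"))
```

Keys and values are raw bytes, so anything structured has to be encoded by the application, which is the trade-off against a relational table with typed columns.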
Starting point is 01:03:30 So the quick answer is if you're using Java, use Map DB. If you're using any other language, use Berkeley DB. Next, we move up the scale to something that you would probably be running on a server. But if you wanted something really fast, and you know, your data size was kept under control, but you wanted it to run in memory, you get to your next set of databases, the one that is you see all the time on like hacker news and other websites is Redis. Is that how you say it? I actually don't know. But I think so okay it's like all these names i only ever read them so actually i just happened to just total coincidence
Starting point is 01:04:10 i went to a redis uh meetup and and it is redis at least the people there all said redis all right so yep redis and that's a key value store so So it's gaining a lot of popularity recently. It's pretty lightweight and it's really fast if your data size is in the range that you can hold in main memory. Yeah, keep in mind like it, once you shut down Redis or even like periodically, it saves to disk. So it's not like, it's not truly in memory
Starting point is 01:04:42 where you shut down the computer and you lose everything. But the idea is, yeah, at any given time, all of the data is in memory. Yep. Another one is memcached. I think that's how you say it. Or do you just say memcached? I say memcached. Really?
Starting point is 01:04:56 I heard people say memcached, but okay. It could be. I mean, actually, I've only said it. I've never heard anyone else say it, now that I think about it. Okay. So I don't know. This is another one you hear about, often when somebody is having a performance problem with their SQL database. We'll talk about some of those in a second, but you
Starting point is 01:05:15 know, they're having problems because there's some specific data that just has a higher access rate or a different access pattern than what the SQL stuff was built to handle. And you'll hear people talk about this as a way to store data in memory, get very fast access times, and alleviate some of the slowness that can happen if you have too many concurrent users on a SQL database. Right. In contrast with Redis, with memcached, when you shut down the machine,
Starting point is 01:05:43 it does destroy the database. So the idea with memcached, as Patrick was saying, is to use it to augment another database, for when that database is too slow. So you can imagine that things that would be good for memcached would be the front page of your website, things like that. And if you use Ruby on Rails, or if you use a lot of these content management systems, most of them will support memcached.
Starting point is 01:06:15 And things like the front page of your website, they will put in main memory. I have no idea why I've heard people say memcached before, because it looks like it's just memcached. You're right. Oh, did you actually find a pronunciation? Well, I'm trying to look for it. I see other people saying that they hear it as memcached. I don't know. I can't find a definitive answer. I think memcache-d makes the most sense, because I think it's a memcache daemon. I think that's where the name comes from. Yes, but I don't know. I mean, just because it makes sense doesn't mean it's right. So it's kind of silly, but I remember when I first started encountering Linux, nobody knew
Starting point is 01:07:02 exactly how to say it. Like, none of my friends. This was before YouTube videos, I guess, so you couldn't look it up on YouTube. Yeah, I remember when people called it Linux. Yeah, right, exactly. Linux, Linux, Linux. People would say weird stuff. I've never heard that one. Yeah, I heard all sorts of strange things. But now I guess there's so much video, right? I don't know what exactly happened, but I've never heard people say anything other than Linux anymore. Yeah, that's a good point. I'm not sure what caused the transition. That would be really interesting if YouTube homogenized language. That'd be something really
Starting point is 01:07:47 interesting to study. I'm sure some linguist is looking into that right now. Moving on to disk-based server databases. Yeah, that's right. So the most obvious ones, the first ones you think about, are MySQL, SQL Server, and Postgres. These are ones that a lot of people have heard of. They've been around for a long time. Some are free, some are paid, and they have varying licenses. Yeah, can you explain? Do you understand the MySQL license?
Starting point is 01:08:20 No, I don't understand. No, no, don't ask me about any of that. I have no idea. It looks to me like if you use the MySQL client, even just the client, then you have to pay them money. That's how I read it, but that can't be true, so I don't know. There are people whose whole jobs are around optimizing databases and understanding what the differences are between them. I'm not that person. I appreciate those people, but I'm not them. Yeah, definitely. There's a whole art to making MySQL run the way you want it to, given your data, and Patrick and I are very happy to let people much more qualified than we are do that.
Starting point is 01:09:06 That is true. Yeah, so those are in the SQL family. If you're interested in the column family, which, again, just a reminder, is the key-value model where there can be multiple values per key, right? So if you have just one column in the column family, then you have a key-value database. You can think of the column family as being a superset of what a key-value database can do. In that category there are two big ones: one is Cassandra and the other is HBase. The short story is that the reason there are two is that they use different technologies for scaling.
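[Editor's sketch] The "column family is a superset of key-value" point can be pictured with nested dictionaries. This is only a toy data model in Python, not the Cassandra or HBase API; the helpers `put` and `get` and all table, row, and column names are invented for illustration.

```python
# A column-family table modeled as: row key -> {column name -> value}.
users = {}


def put(table, row_key, column, value):
    """Store one value under (row key, column), creating the row if needed."""
    table.setdefault(row_key, {})[column] = value


def get(table, row_key, column):
    """Return the value at (row key, column), or None if it is absent."""
    return table.get(row_key, {}).get(column)


# One key, several values: each value lives in its own named column.
put(users, "alice", "email", "alice@example.com")
put(users, "alice", "city", "Austin")
put(users, "bob", "email", "bob@example.com")  # rows need not share columns

# With exactly one column, the same structure behaves like a key-value store.
kv = {}
put(kv, "greeting", "value", "hello")
```

Here `get(users, "alice", "city")` finds a value while `get(users, "bob", "city")` finds nothing, and the single-column `kv` table is used exactly like a key-value store.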
Starting point is 01:09:52 So in other words, when you have to shard the database among, say, 10 or 100 computers, HBase and Cassandra each handle that differently. Let's just say the trade-off there is that Cassandra is easier to set up. Definitely, if you're running on one machine, Cassandra is way easier to set up, and even as you scale out, the setup and the maintenance are pretty easy. But on the flip side, if you do have performance issues, it can be kind of a nightmare.
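[Editor's sketch] Two common ways to decide which machine owns a row are hashing the key and splitting the sorted key space into ranges. Roughly speaking, Cassandra places rows by a hash of the key and HBase splits tables into sorted key ranges, but the Python below is a simplified illustration of the two ideas, not either system's actual algorithm. The node names and split points are hypothetical, and the modulo-based hash here is simpler than the consistent hashing that real systems use.

```python
import bisect
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical machine names


def hash_shard(row_key):
    """Pick a node from a stable hash of the key (spreads keys evenly)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Range sharding: each node owns a contiguous, sorted slice of the key space.
# Made-up boundaries: keys below "h" go to node-0, keys from "h" up to
# (but not including) "p" go to node-1, and everything else goes to node-2.
SPLIT_POINTS = ["h", "p"]


def range_shard(row_key):
    """Pick a node by where the key falls among the sorted split points."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, row_key)]
```

Hashing scatters neighboring keys across machines, which balances load but makes range scans touch every node. Range sharding keeps neighboring keys together, which helps scans but can create hot spots when many writes land in one range.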
Starting point is 01:10:28 That's just been my experience. HBase is harder to set up. It actually requires a lot of Apache libraries and services that you have to install manually, like ZooKeeper. So you actually have to install ZooKeeper, start a ZooKeeper server, and then install HBase and start an HBase server. So that's actually two servers. So it's got overhead, but on the flip side,
Starting point is 01:10:58 it just seems like tuning and understanding what's going on is a lot easier with HBase. So that's column family. Then for document stores, there's also a bunch of them. My favorite is MongoDB. I really think MongoDB is very clever in their API. I'm a big fan. They have this kind of cool, sort of REST-based JSON API. They also have a good Java API. They've recently added full-text indexing, which is pretty cool. So if you want to be able to do fuzzy text searches on a column, you can just annotate the column and, boom, you get fuzzy text searches. So just to recap, say one column of your database, or one document
Starting point is 01:11:59 of your database, is a website, with all the HTML for that website. You could just turn on this full-text indexing, and now you have a search engine over the whole web. Now, of course, it's probably not optimized for that much data, but you get the idea. It's still pretty powerful. If you're building your own wiki or something like that,
Starting point is 01:12:26 it's pretty cool to kind of get full-text searching for free. There's also RethinkDB, which we talked about at the beginning of the show. They're definitely new guys on the block, so they are iterating very quickly. I think it'll be pretty cool, but we'll have to sort of wait and see what they come out with in the future. It's still kind of early. All right. Well, that about wraps it up.
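[Editor's sketch] The full-text indexing idea mentioned above for a wiki can be illustrated with a tiny inverted index. This is a minimal Python sketch of the general technique, not MongoDB's implementation or API, and it does exact word matching only (no stemming or fuzzy matching). The sample documents and the function names `tokenize`, `build_index`, and `search` are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical documents, keyed by an id, standing in for wiki pages.
docs = {
    1: "Redis is an in-memory key-value store",
    2: "HBase and Cassandra are column family databases",
    3: "MongoDB is a document store with a JSON API",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(docs)
```

A query such as `search(index, "document store")` looks up each word in the index and intersects the matching id sets, so the cost depends on how many documents contain the words rather than on scanning every document.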
Starting point is 01:12:53 I'm hoping that we didn't repeat too much of what we said when we did query languages, but that was a long time ago. It's like 14 episodes ago. So hopefully there's not too much of a repeat. And thank you guys for all your emails. We get lots of posts on our Google Plus page, or community, I guess it's a community now. People post in there, and that's cool.
Starting point is 01:13:17 We see them. Even if we don't respond, we do see them. And emails to us are always appreciated. Yeah, thanks, Ash, for your question. That kind of got the whole episode kicked off. So, yeah, definitely post on the community if you have any questions or any feedback or anything like that. We do read it, and it does give us a lot of inspiration for the show. We will definitely do Swift as the next episode.
Starting point is 01:13:47 We'll have to beef up on our Swift between now and then, but yeah, I'm looking forward to that one. So, cool. I think that's a wrap. All right, cool. See you guys in a couple of weeks.
