Coding Blocks - Overview of Object Oriented, Wide Column, and Vector Databases

Starting point is 00:00:00 We now interrupt your regularly scheduled programming to bring you Coding Blocks! Guess who's back? Back again. Michael's back. My friend. That's awesome. Guess who's back? Guess who's back?

Starting point is 00:00:19 All right. So, hey, how you doing? What are you guys talking about? We're going to talk about some more database stuff. Hey, we're missing somebody. What's going on? What happened? You know what?

Starting point is 00:00:30 We had a call-in friend just a little while ago. When we started the recording, we sent out the invite thing, and he showed up, and he's in a moving van. Oh, yeah. Yeah, so Jay-Z's trying to use moving as an excuse, right? Like, what's up with that? Like, he couldn't have recorded from the van. I move all the time from one room to the next.

Starting point is 00:00:52 Like, I still get on the call. That's awesome. Yeah, but no, he'll be up here pretty soon in Georgia. Is he going to regret that, Georgia? I don't know. I don't know. I don't know. We're known for our beaches and our tourism, and it's a very big vacation destination. Yeah, but it's also a very hot destination. Did you think I was being serious with anything I just said?

Starting point is 00:01:21 A little bit. A little bit. Okay, well, we're going to have to work on your sarcasm meter. Well, part of it, too, is my wife came in and needed something right in the middle of all that. And so my brain was split two or three different ways, and

Starting point is 00:01:33 it's not working very well anyways this week. So you're throwing her under the bus on something that's going to be recorded and distributed across the world. Like, we're international. So this is a big deal, obviously. That's right, that's right. Yes. Maybe she won't listen. And shortly after Valentine's, so, you know. Yeah, through which I was sick. Now we know why. Yeah, right. I've got to make that up this weekend, by the way. That's a good reminder there. Thanks, LL. You just saved my tail. It doesn't sound like you're ready for it.

Starting point is 00:02:09 I'm not. I'm totally not. So yes, this episode, we're on 228. And last episode, we talked about six different types of databases and when you might use one over the other and why you shouldn't just use one for all of them and all that. That sounds like a short episode. Six different databases. It was funny. When I put it together, I was like, maybe this won't be terribly long, and then we went over two hours.

Starting point is 00:02:38 And so we scaled back on this one, because my voice isn't going to last two hours one way or the other, even if I only talk 10% of it. So, yeah. So we're going to pick this back up. And again, I want to give Brantley credit in Slack, because he's the one that was like, hey, maybe you guys could do an episode like this, and we were like, oh, that sounds great. So with that, I'm Alan Underwood. Wait, wait, wait. I thought it was Alan the Great. That's what you said last time, when you were like, you're going to change your name. That was your whole person table, and you're changing your name.

Starting point is 00:03:09 Yeah. Somebody listens. Somebody listens. So I really said that? Yeah. Well, Alan the Great, it is. I mean, I'm good with that. Here we go.

Starting point is 00:03:19 You don't even remember what you said. No, man. I told you my brain is not working. Okay. Well, you realize this is a podcast about coding, right? Okay. I've heard. I've heard.

Starting point is 00:03:31 So let's talk about cars. Jay-Z isn't here to stop us. All right. Well, I'm Michael. What's your title? What's your title? Just Michael? Michael Outlaw.

Starting point is 00:03:39 Yes. Thank you. Thank you for recognizing. All right. All right, there we go. All right, so first up, we need Outlaw back here to do this portion for us. And I heard how, like, you know, Jay-Z was making fun of me.

Starting point is 00:03:53 I didn't think he was making fun of me. He was struggling with the proper nouns, like iTunes and Spotify, the well-known ones, and instead, you know, getting all the made-up handles that people would use. Like, those parts he nailed, but iTunes? Yeah, I heard that. I heard that. All right, so, uh... difficult, man. Yeah, well, there's one in here that's definitely going to get me for sure. So, from iTunes, Calum 55555, thank you. From Audible, wood to prog. From Spotify, we have ian ghost merc, and if you haven't heard your name yet, you know I'm talking about you. Really?

Starting point is 00:04:51 Zirath? No, how would a name that starts with an X be pronounced? Xavier. So I guess you'd just say X, right? No, you'd say Zuh. But why would you say Zai? There's no I there. Or why?

Starting point is 00:05:04 What did you say? Zirath? Zureith? Zureth? I'm sorry. Okay, so it's got to be one of those five things I've said, but it's probably not, and that's because we all know that proper names are my kryptonite. And yeah, so we found it. That's amazing. Hey, and Calum, sorry if you threw out your back listening to an episode. Oh, yeah, just, you know, disclaimer: we're not liable for any injuries that occur while listening to this podcast. So, yes. Do we need, like, a whole... you know how, well, I guess this is only a thing in the U.S., from other things that we've read before, or I've read before, where only in the U.S. do they advertise medications, but there'll be, like, a big legalese block at the end of the commercial for all the side effects. Maybe we need to have a lawyer read all the things that Coding Blocks is not responsible for, and the side effects of Coding Blocks. Yeah, they can be talking about some medication on a commercial, and they're like, may cause heart failure and kidney disease, and, oh, oh, maybe I'll

Starting point is 00:06:12 keep those. This, I think it was, I want to say it was like a Reddit thing that I read. Where I read it? I read it. I read it. I read it good. Maybe that's why they named it that. They were talking about things that people who weren't from America found surprising when they came to visit America. And one of them was seeing manufacturers of different medications,

Starting point is 00:06:38 advertise the medication on TV. Like, there'd be, you know, commercials promoting its use. And yeah, so I guess, like, some of the Saturday Night Live-type skits, do you remember those from back in, like, I don't know,

Starting point is 00:06:51 10, 20 years ago, where the people in the commercial would start listing the side effects and they're like,

Starting point is 00:07:00 Hey, wait a minute, you know, like, may cause a desire to kill your business partner. And the other person's like, whoa, wait a minute. Hey, let's not take that. All right, so what are we talking about tonight, then?

Starting point is 00:07:15 Oh, wait, we forgot. Yeah, I'm sorry. We have one more thing. So we've been saving this for Jay-Z, because he had planned on being there. There's Orlando Code Camp coming up February 24th, so right after this episode drops. Again, if you're in the area, I mean, the three of us have been to it before, and it is a terrific event that they set up there. So, you know, if you're around there,

Starting point is 00:07:36 like within an hour's drive, I'd say go check it out. I mean, you'll have a good time, you'll meet people, and you'll learn some stuff. And we'll probably have a link in the show notes for that. All right, but now back to, where were we, right? So once again, two-thirds of the hosts are here to do the show, and let's talk about blah, blah, blah. Yes, let's do that. All right, so I'm gonna let Outlaw pick up this first one, and we'll chew on this. So again, we're talking about different types of database engines, and our reference is db-engines.com. We've used this for years, right? We've talked about different databases and whatnot for years in here. So, oh, and I didn't number these things, so I'm gonna let Outlaw take this first one, and we'll chat about

Starting point is 00:08:26 this one, because this one's interesting and different. So this should be near and dear to our hearts, you would think, right? Just from the name of it, object-oriented database systems, you would think, oh yeah, those would be wildly popular and among our favorites. Now, I ask you, before this, could you have named one? No, not a single one of these. I would have said, and I'm surprised it's not on the list... Actually, I'm going to go to DB-Engines and see if it even, well, first of all, it'd be funny if it's not even in there at all. But I would have thought that maybe Lotus Notes,

Starting point is 00:09:06 if you remember that one from way back in the day, maybe counted. Considered? Really? Well, that's what I was trying to think. I don't see it in here, though, by the way. Let's see. I think it was more like an Access-type thing. I think it was a true relational... No, you definitely had objects, and you could write programs. You know, you could definitely have code inside of things. I don't know what it counted as, but that's what I would have thought. But it's not in this list, according to the DB-Engines ranking. I mean, it's been a long time. Let's see. It is considered a semi-structured NoSQL database.

Starting point is 00:09:51 Okay. So, yeah, not an object database. So, honestly... That's crazy, really. So, did we say this is object-oriented database management systems? Because when I first heard this one, I was like, well, how's this different than NoSQL object databases, right? Like Mongo and those kinds of things. And that's why, when I saw the list of the databases they have here, I was like, I've never heard of any of these. I want to say, wasn't Neo4j one at one point in time? Neo... I don't know the answer. That's what I was going to see,

Starting point is 00:10:34 like, I wasn't sure. Did you already have... I'm sorry. I was curious about the object-oriented versus the document database, to see what's considered to be the difference. Well, that's what I was going to say. That's why I was thinking, is Mongo not? No, it's not. So here's what it boils down to. When they say that they store the data in the database

Starting point is 00:11:03 the way that it's modeled in the application, it doesn't mean they're storing, like, a JSON document, right? It's actually storing things, I guess, as binary-type things in whatever format. So if you have arrays and collections and all that kind of stuff, that's what it's actually storing behind the scenes. So it's not the same idea as taking whatever your model is and crushing it into a single document. It's actually storing it in that structural format, which is, I believe, the big difference between the two. Yeah, there's a not-incredibly-accepted answer on Stack Overflow. It only has like 10 upvotes, but it was saying that the difference here is that objects,

Starting point is 00:11:52 it's the actual objects that are stored, and not the JSON as you were describing. Right. So what's interesting, and we'll skip down here a little bit... So first off, let's mention the systems, right? Because again, we'd never heard of them. InterSystems Caché was one that's not even listed on their ranking page. InterSystems IRIS was number 92 on the list. db4o is ranked number 161. ObjectStore was 154, and Actian was 159. Never heard of a single one of those before doing this. But if we jump down... Hold on. I mean, while you're doing that, though, I've got to imagine that if you're storing the actual object that your code is using, that's got to be a big part

Starting point is 00:12:54 of the reason why these aren't as widely popular. I would imagine, and I'm coming into this, you know, completely naive, so I will accept that that's probably normally the case when you listen to these shows, for all of us, but, you know, I would imagine that that's going to limit your ability to iterate on your application. But I don't know, maybe there's, like, an Avro-type schema, something like that, in the background. It's like, well, here's the version of that object, so you could load it up or, you know, still use it. I don't know. It just seems like it would... it almost feels like,

Starting point is 00:13:36 you know how there's that. We've talked about this before, the whole separation-of-concerns kind of concept, right? Uncle Bob has preached that endlessly for decades now and has several books about it, right? And he's not the only one. There's a whole plethora of books out there describing those types of things. And this almost feels like you're not... Like, this almost feels worse. Yeah. Yeah, it's interesting. I mean, if you go specifically to the InterSystems IRIS page, you know, they basically are just talking about

Starting point is 00:14:13 how it's very performant, and, you know, you could do a lot of things with it. And sure, I totally get that, right? If you're storing things natively, the way that it's coming out of your app, yeah, I mean, it'd be really fast. Now, that's if you're going directly to that stuff. I don't know how it works querying across objects and all that kind of stuff, right? I have a feeling it's fairly complicated. But I kind of envision it as, like, LINQ queries, in terms of efficiency. Like, you already have this object available, and so I don't know that it wouldn't necessarily be performant. I mean, if you're trying to query something that's buried three levels deep in some sort of

Starting point is 00:14:58 object structure, and it has to do that across all of them, you know? That's why I'm saying I don't know how. In a database, I know how you solve that, right? You have indexes and all that kind of stuff. I mean, maybe it's that way here too. But one of the things that's interesting from this InterSystems IRIS page: ObjectScript and Python directly manipulate and read from the storage. So, direct-access-type stuff. Objects can also be exposed in other languages, like .NET, JavaScript, Java, and C++. And then they say on top of that, you can also use SQL syntax with it, and they have JDBC drivers and that kind of stuff. So again, it's really interesting. I think what Outlaw said from that Stack Overflow answer is really the big difference, right? It is actually storing those objects

Starting point is 00:15:52 directly, instead of translating them or marshalling them into some sort of document. And there might be some performance benefits to it, but it definitely has not caught on like almost all the other database types out there. Well, I could be wrong, but I kind of think of object-oriented databases as super special-purpose. They're going to have a super specific niche of problems that they go after and solve. They're not necessarily going to be your hammer database type that you use for most things, right?

Starting point is 00:16:35 Yeah, I would agree with that. I mean, if you go to the DB-Engines site that we mentioned, and you go to the encyclopedia and click on object-oriented database, they sort of have a notes page there, and at the very bottom of it they basically say that these object-oriented database systems have sort of fallen out of popularity because of the advent of ORMs and how good

Starting point is 00:17:06 they've gotten over the years, right? Like, you know, Entity Framework, Hibernate, whatever. So that might be the reason why they haven't caught on as much. I don't know, but I would agree with what Outlaw just said: if you're going to use this, you have a very specific use case in mind, right? I don't see anybody generally going out of their way and being like, oh, let's create this thing, and we're starting with this, right? I wonder... you know what would be a good one to find out?

Starting point is 00:17:36 Let's see. Let's go back to the interwebs. If I could spell correctly, what are we talking about? Pros and cons. I'm curious what I might find here. Oh, well, is this Enos? Wasn't that one of them? EnosDB?

Starting point is 00:17:57 No. Advantages, disadvantages. Advantage: complex data sets can be saved and retrieved quickly and easily, and object IDs are assigned automatically. Disadvantage: object databases are not widely adopted, and in some situations, the high complexity... What you just said, I mean, that makes total sense. If you know that you have a Person object or whatever, and it's got a huge object graph under it, right? Like reports, health insurance, all kinds of other garbage. If you're getting that, you just retrieve that object, and it brings you back the entire graph all at once, right? So that's why they're saying it could be very performant: if you're always operating at that top-level object, then sure.

Starting point is 00:18:50 But I imagine it would have, and I think this is what you were getting at before, a similar type of issue as in a document database: if one of those fields is an array, and you want to search for something in that array that has a specific attribute, and then maybe something else in that, and you want to join that to some other data set, that's where it would get problematic. Totally. If it's even possible.
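That trade-off can be sketched with a toy, in-memory stand-in. The `ToyObjectStore` class below is hypothetical and written only for illustration; it is not the API of InterSystems IRIS, db4o, ObjectStore, or Actian, and it ignores persistence and indexing entirely. Object IDs are assigned automatically and one lookup returns the whole object graph, but a question about something nested inside every graph turns into a walk over all stored objects.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Report:
    title: str
    year: int


@dataclass
class Person:
    name: str
    reports: list = field(default_factory=list)


class ToyObjectStore:
    """Hypothetical in-memory object store: objects kept as-is, keyed by an auto-assigned OID."""

    def __init__(self):
        self._objects = {}
        self._next_oid = count(1)

    def save(self, obj):
        oid = next(self._next_oid)  # object IDs are assigned automatically
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        # One lookup hands back the whole graph: the Person and all of its Reports.
        return self._objects[oid]

    def scan(self, predicate):
        # No index on nested fields, so every stored graph has to be visited.
        return [obj for obj in self._objects.values() if predicate(obj)]


store = ToyObjectStore()
alan_id = store.save(Person("Alan", [Report("Q1", 2023), Report("Q2", 2024)]))
mike_id = store.save(Person("Michael", [Report("Q1", 2024)]))

alan = store.get(alan_id)
print(alan.reports[1].title)  # Q2

# "Who has a report from 2023?" means digging into each person's reports array.
matches = store.scan(lambda p: any(r.year == 2023 for r in p.reports))
print([p.name for p in matches])  # ['Alan']
```

A real object database would presumably add indexes to avoid that full scan; the sketch only shows why the top-level lookup is cheap while the nested query is not.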

Starting point is 00:19:17 Yeah, it's interesting. So, you know, be aware that these things exist. I've never actually seen one used. That doesn't mean that it's not in some big project out there somewhere. Well, I mean, it made the list, so that has to say something for it. And Notes didn't. So, yeah, yeah, true, true.

Starting point is 00:19:36 Lotus Notes, is it in here anywhere? I mean, I did a search for Notes, and unless they changed the name, and maybe they did change the name, it's not on the list. They said no. Yeah. Apparently they're like, it's not even a database.

Starting point is 00:19:50 There are three, there are 401 items in this list. So for Lotus Notes not to make it, they really made somebody mad. I'm really curious. Did they change the name, maybe? And maybe we're looking at the wrong thing. Oh, it looks like IBM sold it off to HCL.

Starting point is 00:20:09 And it's still called Notes. Interesting. Yeah. I mean, it's been a minute. All right. So, yeah, we don't have a ton more to say about that one. It's just not a super widely adopted database type. If you want to play with it, it sounds like it's kind of interesting and cool. It'd be nice to know how it works, but

Starting point is 00:20:30 well, there was one other thing. I'm sorry. I was just gonna say, this is probably not going to be your first choice when you're looking to set up a new application, especially at a business, you know. Well, just to back up that Stack Overflow answer there: even though it only had the 10 upvotes, the author, I think, was definitely on to something, because even in the DB-Engines encyclopedia that you mentioned, there's the sentence where they said the goal was to be able to simply store the objects in a database in a way that corresponds to their representation in the pro... in a programming... Man, I can't even speak. These aren't even proper nouns. What is wrong with me?

Starting point is 00:21:07 The goal was to be able to simply store the objects in a database in a way that corresponds to their representation in a programming language without the need of conversion or decomposition. Yeah, so tightly coupled, just like you said. That's really what they were going for. Yeah, so instead of me giving you back a row of data or a document of data, you know, that you then have to figure out how to parse or, you know, use or whatnot, it's like, here's a pointer to that object.

Starting point is 00:21:34 Done. Yeah, exactly. Exactly. There's no marshalling whatsoever. But I wonder, like, I said pointer, though, but that's probably inaccurate, because I imagine, you know, especially with network latency, or just the network traffic in general, right? You're not sending a pointer back. So they have to be sending the object over the wire in some way.
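That guess can be illustrated with a rough Python analogy, which is an assumption and not how any of the products listed earlier actually serialize data. `pickle` stands in for the object-store path, turning the object graph into bytes that come back as the same typed objects, while the JSON path stands in for a document store, where the object first has to be decomposed into plain dicts and lists and the class information is gone on the way back.

```python
import json
import pickle
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Person:
    name: str
    addresses: list = field(default_factory=list)
    tags: set = field(default_factory=set)


alan = Person("Alan", [Address("1 Main St", "Atlanta")], {"host"})

# Document-store style: decompose into JSON-friendly types before sending.
doc = asdict(alan)                  # nested dataclasses become plain dicts
doc["tags"] = sorted(doc["tags"])   # a set is not JSON-serializable, so convert it
wire_json = json.dumps(doc)
from_json = json.loads(wire_json)   # plain dicts and lists; Person/Address are gone

# Object-store style (approximated with pickle): the object graph itself goes over the wire.
wire_bytes = pickle.dumps(alan)
from_pickle = pickle.loads(wire_bytes)

print(type(from_json).__name__, type(from_pickle).__name__)  # dict Person
print(from_pickle == alan)  # True
```

Keep in mind that `pickle` should only be used with trusted data, since unpickling can run arbitrary code; the point here is just the shape of what comes back.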

Starting point is 00:21:55 But yet there's no... Yeah, it's interesting. I honestly can't think of a case where I would want to use this except to experiment. Like, if you're going to write software for, like, a Mars rover, you know, or something like that, where it's super limited hardware and everything, and maybe you don't want to take the overhead and time to do type conversions or anything like that.

Starting point is 00:22:40 Like, you don't want to have to worry about that. You're just like, here's the thing that needs to be stored, and when I query it, I want that thing back exactly as it was, and I don't want to waste time trying to, you know, convert things, right? Yeah, I could see that. I mean, very limited hardware purposes, I guess, or where you're going to have limited abilities to, you know, do things with it. I mean, I even think about what happens when you change object schemas and stuff. Like,

Starting point is 00:23:09 is that going to be a problem? Yeah. I mean, that's what I was referring to when I made the Avro comment before, like if you needed to, if you did need to do that, like how do you, how do you iterate on your,

Starting point is 00:23:20 your design? But maybe, like I said, maybe it has object versions, so that you would know. Or maybe in your code you would have to have versions of your objects in your... Think about how disgusting that would be. God, no. Yeah, yeah, no, I don't want any part of that. Yeah. All right, so next up... That's all speculation, by the way, so somebody's gonna correct me and be like... Yeah, yeah. I mean, there's probably somebody out there

Starting point is 00:23:51 that's listening that has used them, and maybe they can fill us in. You know, feel free to drop a comment on codingblocks.net/episode228. And I'm Joe, by the way, so if I got anything wrong... Yes, I guess I'm still Alan the Great. So the next one up is wide column stores. Now, I know Jay-Z has had a little bit of experience with these, and I had messed with them a little bit. These are a little bit more popular, because they're all about massive horizontal scaling. And we talked about these quite a bit in Data... What was it? Data-driven application something.

Starting point is 00:24:34 Designing Data-Intensive Applications. Thank you. Thank you. See, yes, I'm struggling today. So you want to tell us some of the popular ones here? Sure. So coming in at number 12, one that I'm sure you've heard, Cassandra. You've probably heard of a lot of these.

Starting point is 00:24:51 Number 12, Cassandra. Number 26, HBase. And number 27, Azure Cosmos DB. Those were the most popular examples, according to db-engines.com. You know what's funny about this, man? Azure Cosmos DB shows up in almost every single database engine list. Like, I would love to know what they did behind the scenes to make this thing work for everything. Well, that one specifically... Oh, sorry. And is it really that good at everything? That's my question, right? Is it truly that amazing at all of it?

Starting point is 00:25:25 Yeah. Maybe it's just the new hammer. It's a globally distributed, horizontally scalable, multi-model database service. So the primary database models for Azure Cosmos DB: a key-value store and wide column DB... I mean, that checks a lot of boxes. A lot. Yeah. Where does it rank? It's all managed. It's ranked 27. Oh, yeah, I guess I already said that. Yeah, it was right there. I put it in, because when we started doing this, I was like, oh, man, it'd be nice to know how these fall, because when we first talked about the relational databases, they were one, two, three, and four, right? That's sort of a big deal. They're still kind of a thing. That reminds me, I did have an aside to add to that. Throw it on there. Well, okay, so rewind, then. In the last episode, I don't recall if this ever got called out, but, you know, part of the conversation was document DBs versus relational databases, and you gave an example of, like, street two, or an address where, like, oh, well, now I've got to have a street two, and that's going to be null for the

Starting point is 00:26:48 majority of places. Or, you know, the advantage of a document database was that you could have kind of free-form things, where only the properties that are needed for that specific piece are there. So you're going to save some space, blah, blah, blah. But, at least, I don't know if Oracle does this. I would imagine Oracle does. And maybe, I don't recall SQL Server doing this at all, though. But Postgres, you know, our one true love, and I say that jokingly, has the ability to do JSON as the column type. So you could have a mixture of relational and document in the one row.

Starting point is 00:27:34 And Postgres will allow you to query the elements of that JSON in your SQL statement like you would any other column. You know, so it kind of walks a fine line of, let's have a little bit of the best of both worlds in these specific use cases. And I mean, obviously, like with everything, you use it sparingly and wisely, but yeah. Yeah. Yeah. SQL Server had that functionality as well, right?

Starting point is 00:28:05 Like, they had some JSON parsing and things in it. Yeah. Now, Postgres has it as a column type, though. JSON as a column type. That's what I'm talking about. It's a first-class citizen in Postgres. Oh, you might be right. I don't know if they made it a column type, but they made JSON functionality, like parsing in columns.

Starting point is 00:28:25 Maybe they did. Maybe they... But I will tell you, regardless, Postgres did it better, in my opinion, because the JSON tools that were available in SQL Server, and I think the last time we touched it was like 2018, were a little frustrating, but they were there. Well, I mean, I'm looking at a Microsoft document now, and coincidentally, it goes back to SQL Server 2016, which is a bit long in the tooth, I don't know. But this article was updated, like, 10 days ago, so this is fairly recent, and it was talking about how your data type would be text, NVARCHAR, for the column, and then they have JSON functions you could use. Yeah, that's what I remember. It looks like people are using NVARCHAR(MAX) columns with it. And yeah, like I said, the tooling was there to be able to do some stuff, but it was not a pleasure to work with.
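To make the Postgres point concrete without needing a running server, here is a minimal sketch that swaps in SQLite through Python's built-in `sqlite3` module, assuming the bundled SQLite has its JSON functions (they are built in by default from SQLite 3.38 on). In this sketch SQLite keeps the JSON as plain text rather than in a dedicated column type, and `json_extract` pulls an element out inside the `WHERE` clause. In Postgres the column would be declared `jsonb` and the same filter would read `attrs->>'street2' IS NOT NULL`. The `places` table and its columns are made up for illustration, echoing the address-line-two example from the last episode.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: fixed relational columns plus one free-form JSON column.
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, city TEXT NOT NULL, attrs TEXT)")
rows = [
    (1, "Atlanta", json.dumps({"street": "1 Main St"})),
    (2, "Orlando", json.dumps({"street": "9 Elm St", "street2": "Apt 4"})),
]
conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)

# Query a JSON element right alongside ordinary columns.
# Rows without a street2 key simply yield NULL and are filtered out.
query = """
    SELECT city, json_extract(attrs, '$.street2')
    FROM places
    WHERE json_extract(attrs, '$.street2') IS NOT NULL
"""
result = conn.execute(query).fetchall()
print(result)  # [('Orlando', 'Apt 4')]
conn.close()
```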

Starting point is 00:29:29 Yeah. I just thought that was an important distinction to make for Postgres, that, you know, they have a type. If you're listening to these trying to figure out what type of database you want to use, that's an important consideration, I think. Yeah, I would agree. I would also say, like Outlaw said, be careful trying to make one thing do everything. But if you do have a use case where it's like, oh, you know, occasionally we need this document type, then yeah, go for it. But if your primary use case is, oh, I've got tons of documents, well, then maybe you should be considering something like Mongo, right? Yeah, or if maybe your primary use

Starting point is 00:30:10 case is a search engine or, you know, key-value or whatever, right? Right. Yeah, sure. I mean, know your use cases, right? So, sorry. So, back to wide column stores. Yes. Also known as extensible record stores. Clear as mud, right? Yeah. So let's make that a little bit less muddy. They can store large numbers of dynamic columns. And what the heck does that mean? So every record has a set of columns. Well, in a regular relational database system, in a schema-on-write system,

Starting point is 00:30:46 you have to define those columns up front, right? So we talked about our address table, and the wasted column was, like, address line two. Well, in a record with dynamic columns, you can just add the columns that you want, right? So it's almost like a document-type thing where you can just add whatever you want in there. But the difference is this isn't a document. You're actually storing a record, and it has these columns in it. And they say that you can have a large number of dynamic columns. Well, how many is that? They said you can store billions of columns in a record, and they say that's why these are also sometimes described as two-dimensional key-value stores, Google being the OG of this

Starting point is 00:31:35 category, or I should say, specifically, Google Big Table, or, I'm sorry, Bigtable, as Jay-Z would prefer we pronounce it. Was it? No, it wasn't first. Was it before? They wrote the white paper, didn't they?
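The "two-dimensional key-value store" description can be sketched as a map of maps: the row key picks a row, the column name picks a cell, and each row holds only the columns it actually has. This is a toy model built around a hypothetical `WideColumnTable` class; it assumes nothing about how Bigtable, HBase, or Cassandra actually store data and ignores column families, cell timestamps, sort order, and partitioning.

```python
from collections import defaultdict


class WideColumnTable:
    """Hypothetical toy wide column table: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # No schema to alter: writing a new column name simply adds it to this row.
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        # Two-dimensional lookup: first by row key, then by column name.
        return self._rows.get(row_key, {}).get(column)

    def columns(self, row_key):
        # Schema on read: the row itself reports which columns it has.
        return sorted(self._rows.get(row_key, {}))


t = WideColumnTable()
t.put("user:alan", "name", "Alan the Great")
t.put("user:mike", "name", "Michael Outlaw")
t.put("user:mike", "address_line_2", "Suite 228")
for day in range(1, 4):
    # Rows can grow very wide, e.g. one column per event or per day.
    t.put("user:alan", f"login:2024-02-{day:02d}", True)

print(t.columns("user:alan"))
print(t.get("user:mike", "address_line_2"))  # Suite 228
print(t.get("user:alan", "address_line_2"))  # None: the column doesn't exist for this row
```

Note how no row pays for a column it never wrote, which is the address-line-two problem from the relational table going away.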

Starting point is 00:32:03 Let's see, what was the name of that publication specifically was the big table, a distributed storage system for structured data. And it's interesting because in this original document table is not classified in big or not uppercased in big table, which is Jay-Z's gripe about big table, big table. Yes. And so this thing is a schema on read right because you can have these dynamic columns you know the record that comes back tells you what the columns are so you know you don't have to do a well-defined thing up front um now this this was a comment that the outlaw sort of made at the front. He's like, oh, so this isn't the same thing as like columnar storage in, in like a relational

Starting point is 00:32:49 database system. And we weren't supposed to talk about that. That was in private, man. Why are you throwing me under the bus? Well, see, I can only do this because I read all this, and I was like, huh, that's interesting. Yeah. Right. Yeah. So what they call out is columnar storage. And if you've never heard that term, it seems like we might have mentioned it back in the day.

Starting point is 00:33:13 We've definitely talked about columnar storage. Yeah. So it's basically for being able to do OLAP-type queries out of relational database systems. You know, I think a lot of the big ones have gone to it. We know SQL Server for sure had columnar storage, and a lot of other ones went to it. But the difference is, typically in a relational database, you're storing things in a row format, right? Well, when you go to columnar storage, it starts storing the data by column, because it's quicker to access for doing OLAP-type queries, so analytical processing queries.
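A toy comparison of the two layouts, using plain Python lists rather than anything resembling SQL Server's actual columnstore format: row storage keeps each record's values together, while columnar storage keeps each column's values together, so an analytical query such as a sum over one column only has to read that one column. Reassembling a single record from the columnar layout, on the other hand, means reading one position out of every column. The sales data is made up.

```python
# The same three records in both layouts (hypothetical sales data).
rows = [
    ("2024-02-01", "Atlanta", 120),
    ("2024-02-01", "Orlando", 80),
    ("2024-02-02", "Atlanta", 95),
]


def total_amount_rows(records):
    # Row-oriented: every record is visited, even though only the amount is needed.
    return sum(amount for _date, _city, amount in records)


# Column-oriented: each column's values sit together.
columns = {
    "date": [r[0] for r in rows],
    "city": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_columns(cols):
    # Only the "amount" column is touched; "date" and "city" are never read.
    return sum(cols["amount"])


def record_from_columns(cols, i):
    # Reassembling one record means reading position i from every column.
    return (cols["date"][i], cols["city"][i], cols["amount"][i])


print(total_amount_rows(rows), total_amount_columns(columns))  # 295 295
print(record_from_columns(columns, 1))  # ('2024-02-01', 'Orlando', 80)
```

That is the OLAP win, and it is also why a wide column store, which per DB-Engines still stores rows (just very wide ones), is a different animal.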

Starting point is 00:33:47 They say that the difference is wide column stores are not actually storing things in a columnar format. They are still storing them in a row format, just with tons of columns on them. So it is a completely different storage format and technique than the other one. I could have sworn, but now I'm thinking I'm wrong, that we've made the example, or talked about the example, of the back,

Starting point is 00:34:18 like the book index versus the table of contents. Oh, we might've, in that the index was more of an example. But now I feel like that's wrong, that I'm thinking of something else, like a reverse index. I think maybe. I can't remember, man. That seems like when we were talking about the formats, the log formats that they were writing out. It probably was in the early discussions of the,

Starting point is 00:34:47 let me see if I can find that. Like SSTables and... Yeah, it was probably early discussions around Designing Data-Intensive Applications. Yeah. Like the write-ahead logs and all that kind of stuff, and the formats that those are written in. But so, because Cassandra is so popular, that is kind of the one that I went to go grab the information from.
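To make the "dynamic columns" and "row format with tons of columns" points concrete, here is a toy in-memory sketch, assuming a simplified data model loosely in the spirit of the Bigtable paper. `ToyWideColumnStore` and its methods are hypothetical names, not Cassandra's or Bigtable's API; real systems add column families, timestamps, and on-disk SSTables.

```python
from collections import defaultdict


class ToyWideColumnStore:
    """Hypothetical in-memory model: row key -> {column name -> value}.

    Each row is still kept together (row-oriented), but a row carries only
    the columns that were written to it, so two rows in the same "table"
    can have completely different column sets.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Writing to a brand-new column needs no ALTER TABLE; it just appears.
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Schema on read: the returned record itself says which columns exist.
        # Sorting by column name is only to make the output deterministic.
        return dict(sorted(self._rows.get(row_key, {}).items()))


store = ToyWideColumnStore()
store.put("user:1", "email", "ana@example.com")
store.put("user:1", "city", "Atlanta")
store.put("user:2", "twitter", "@ana_example")
store.put("user:2", "pref:theme", "dark")

print(store.get_row("user:1"))  # {'city': 'Atlanta', 'email': 'ana@example.com'}
print(store.get_row("user:2"))  # {'pref:theme': 'dark', 'twitter': '@ana_example'}
```

The two rows share no columns at all, which is exactly the flexibility (and the schema-on-read burden) being described.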

Starting point is 00:35:11 And there are some very, so we'll have a link to it here. It's cassandra.apache.org. And they have a basics page. And one of the key, probably most important things about it is it is hyper horizontally scalable. And when we say hyper, they even had an example where they're like, oh, you could add 8,000 nodes. And Outlaw found something. Oh man, I'm so good. We did talk about it, and I was correct that I was wrong in trying to make the association of the index to the columnar storage, but it was the inverted index. And "Search-Driven Apps" was the title of the episode.

Starting point is 00:35:54 It was, what was that episode number? 83? Yeah, 83. Episode 83, "Search-Driven Apps." And we had talked about how what's called the index at the back of a book is actually an inverted index, because it tells you where to go for a specific word, versus the table of contents, which is a forward index that tells you where to go in the book for a document or a chapter. Okay, or because we made that analogy of the, you know,

Starting point is 00:36:25 the document to the chapter. His search skills on our website are amazing. So I know when I'm wrong. That's just a lot from episode 80. We're talking 83. Yeah. That's probably six years ago.
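The episode 83 analogy maps onto a tiny bit of code. In search-engine terms, a forward index maps each document to the words in it, and an inverted index flips that around so a word points to the documents that contain it, like the index at the back of a book. The chapter titles and text below are made up for illustration.

```python
# Hypothetical "book": chapter title -> chapter text.
chapters = {
    "Storage": "row storage and columnar storage",
    "Search": "inverted index for search",
    "Scaling": "columnar storage at scale",
}

# Forward index: document -> the set of words it contains
# (you look up a chapter and see what's in it).
forward_index = {title: set(text.split()) for title, text in chapters.items()}

# Inverted index: word -> the set of documents containing it
# (you look up a word and see where to go, like a back-of-book index).
inverted_index = {}
for title, words in forward_index.items():
    for word in words:
        inverted_index.setdefault(word, set()).add(title)

print(sorted(inverted_index["columnar"]))  # ['Scaling', 'Storage']
print(sorted(inverted_index["index"]))     # ['Search']
```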

Starting point is 00:36:40 That was a, you know what? I'm going to put a link in here, because there was a bunch of stuff that would probably be relevant to discussions about databases, like a reverse index was part of it, an inverted index, search engines, things like that. What's the date on that episode? Uh, this was June 10th, 2018, is when I published it. So it was almost six years ago. Yeah.

Starting point is 00:37:05 It's been a minute. That's insane. We've covered some ground here a little bit. All right. So yes, hyper horizontally scalable.

Starting point is 00:37:16 I think I mentioned, they said that, you know, they even gave an example of like, Oh, you can add 8,000 nodes, right? Like that's a lot of,

Starting point is 00:37:22 a lot of computers to store data and retrieve data, but that's what it's there for. Um, when you do this, though, at some point they even say, look, if you're looking at Cassandra, you're not running a single node. It doesn't make sense to run a single node, because you're not solving the problems that Cassandra was meant to solve. Right. And here are some of them. I want to run Bigtable on a single pod, right? Yeah, we got this. How many SCSI connections can you make to this thing? Um, so yeah, it prevents data loss due to hardware failures if you scale it, obviously, right? And they even talked about, and this is something that you should consider if you were doing anything for your business and you weren't going to the cloud or whatever.

Starting point is 00:38:12 You probably want to have these things in multiple regions, right? Different data centers around the country, or in multiple countries or something, so that if it did fail, like if you had a fire in one place and it melted all your computers in one spot, you're not going to lose anything, because it was also being distributed elsewhere. This was pretty interesting: you have the ability to tweak throughput of reads and writes in isolation. So that's pretty interesting.

Starting point is 00:38:42 This is another one that they said is huge. This is a big deal about Cassandra: because of the way that it's set up, in its distributed, quote unquote, manner, everything looks like a single point of entry. But on top of that, every single node acts like every other node. It's not like you have this one master node that does all the main stuff and then this other one down here does other things. This is truly like, hey, every node that you hit, they all act exactly the same and they all do the same function. So it makes it an easy-to-reason-about system in how it functions.

Starting point is 00:39:27 Yeah, I feel like, you know, because they referred to it as a masterless architecture. I don't know if they still refer to it that way, but that's the way it was referred to. Yeah, they have it in their notes; that's what they call it. Yeah, so like, because that's one of the defining characteristics when you talk about the problems related to a relational database, and you guys talked about this last time, which is that you can horizontally scale the reads, but

Starting point is 00:39:56 writes have to go through one single primary node, and that node is responsible for committing the transaction log. Then, once the writes are committed, they can be replicated across to other ones for distributed reads. So your writes can't be distributed. That's the downfall of relational databases as it relates to trying to scale horizontally, especially for big applications.
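The way masterless systems spread writes without a single primary is to make key ownership deterministic, so any node can work out which nodes own a key. Ketama and Cassandra's token ring are both built on consistent hashing, so here is a minimal generic sketch of that idea. It assumes MD5 as the hash and 64 virtual nodes per server (Cassandra's default partitioner actually uses Murmur3), and `HashRing`/`owners` are hypothetical names, not any library's API.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # MD5 here is just a stable, evenly spread hash, not a security choice.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        self.nodes = list(dict.fromkeys(nodes))  # de-duplicate, keep order
        # Each physical node owns many points ("virtual nodes") on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in self.nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owners(self, key: str, replicas: int = 1):
        """Walk clockwise from the key's token and collect distinct nodes."""
        if replicas > len(self.nodes):
            raise ValueError("more replicas requested than nodes in the ring")
        i = bisect.bisect(self._tokens, token(key)) % len(self._ring)
        found = []
        while len(found) < replicas:
            node = self._ring[i][1]
            if node not in found:
                found.append(node)
            i = (i + 1) % len(self._ring)
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
# Any node (or client) running this computes the same owners for the key,
# so no single primary has to hand out write assignments.
print(ring.owners("user:42", replicas=2))
```

A nice property of this design: adding a fourth node only moves the keys whose nearest ring point now belongs to the new node; every other key keeps its owner.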

Starting point is 00:40:22 Right. And in this case, unless you try and get cute, like, you can shard your databases, right? But then your application logic's in control of all of it, and that gets hyper complicated, right? Well, that's where, yeah, different sharding techniques come into play, where you're trying to decide what's responsible for a given part of the table, right? But in this case, though, with technologies like Cassandra or wide column stores, you're able to distribute the writes because there is no, like, you know,

Starting point is 00:40:56 master or primary node for those writes. But what I don't know is how they achieve that. That's where my knowledge of Cassandra is limited. I think we talked about this in Designing Data-Intensive Applications. Basically, when you do a write to it, and we sort of talk about this a little bit down here at the bottom, if you look at these last couple of bullet points, there's a configuration setting that says, hey, for consistency, how

Starting point is 00:41:28 consistent does this data need to be? And you can actually do it on your query to write the data and say, hey, when I write this, for me to get a success back, it needs to be consistent by being distributed to two additional nodes. Right. And so you could actually tweak how important you think this data is, right? Like, hey, I need this to be distributed to 10 other nodes before I feel comfortable that it's safe. You know, I knew we had talked about this at one point, about, like, the masterless type of thing, but I couldn't remember where it was. It was episode 172.
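The per-request "how many nodes must confirm this write" knob is what Cassandra calls a consistency level (for example ONE, QUORUM, or ALL). Here is a small sketch of just the counting rule behind it, assuming a replication factor of 3 with one replica down; the helper names are hypothetical, and the real system involves much more (coordinators, timeouts, repair).

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def write_acknowledged(replicas_up: list[bool], required_acks: int) -> bool:
    """Report success only if enough live replicas acknowledged the write."""
    acks = sum(1 for up in replicas_up if up)
    return acks >= required_acks


def reads_overlap_writes(rf: int, read_acks: int, write_acks: int) -> bool:
    """If R + W > RF, every read set shares at least one replica with every write set."""
    return read_acks + write_acks > rf


rf = 3
replicas_up = [True, True, False]  # one of three replicas is down

print(quorum(rf))                                        # 2
print(write_acknowledged(replicas_up, 1))                # True  (like ONE)
print(write_acknowledged(replicas_up, quorum(rf)))       # True  (like QUORUM)
print(write_acknowledged(replicas_up, rf))               # False (like ALL)
print(reads_overlap_writes(rf, quorum(rf), quorum(rf)))  # True
print(reads_overlap_writes(rf, 1, 1))                    # False
```

The last two lines show the trade-off: quorum reads plus quorum writes always overlap on at least one replica, while ONE/ONE is faster but can read a replica that hasn't seen the latest write yet.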

Starting point is 00:42:09 And we had talked about how, um, we mentioned another one too at the time, which isn't in the DB-Engines list: Ketama, K-E-T-A-M-A. Uh, I don't see it, it's not in there. Yeah, it's not in there at all. But it was one that we mentioned at the time, and they handle the partitioning for you, so that based on the number of nodes, it'll decide which node is responsible

Starting point is 00:42:39 or, you know, it'll randomly choose a node to be responsible for a specific set of partitions, and that's how they can distribute the writes. I mean, it's pretty cool stuff, right? I mean, this is the kind of stuff where if you want to keep data safe and available and all that, this is the type of engine you're looking at. And we mentioned Bigtable. That's another reason why a lot of people go with Bigtable: because they manage the solution for you, right? So this next point here is one of the big selling points for something like Cassandra or Bigtable: unlike a regular relational database, Oracle, SQL Server, Postgres, MySQL, whatever, if you want to scale those things, typically you need more processing

Starting point is 00:43:26 power, you need more RAM, you need more drives attached or whatever. And the problem there is at some point you're going to run into, well, we have the most expensive CPU you can buy now, um, or you're kind of capped, and not even on CPU. I can remember working somewhere where, for our database server, we bought the best SSD that we could get at the time, and that was a $20,000 SSD that only housed the database. But we wanted that IO. Those are the types of things where you get capped, right? And so the thing about Cassandra, and probably HBase and other ones like that as well, is you can scale this thing with cheap hardware. And, you know, when it's referred to in the professional sense, it's commodity hardware, and basically what they're

Starting point is 00:44:17 saying is you don't have to go buy some ultra high-end, you know, Supermicro motherboard that supports four CPUs and all this kind of stuff to be able to do it. You could, you totally could, or you could just go buy an off-the-rack, you know, Asus, a regular motherboard, throw a regular CPU in it, and put some RAM and some stuff on it. And the thing will scale out by just adding new regular computers to it. You know, I wonder, here's something, like, you know how Jay-Z had the Dockers, the new Git or whatever? Yeah. I wonder if another kind of controversial thing might be to say,

Starting point is 00:44:56 maybe the traditional relational databases are out. They're on their way out. This is the beginning of the end for them. And what I mean by that is, in place of those, you have database storage technologies that can deterministically decide, hey, this particular part of the writes is handled by this server, and that particular part of the writes is handled by some other server. And then they'll figure out how to mash it all together behind the scenes, like they'll handle replication behind the scenes. And that way,

Starting point is 00:45:36 you can horizontally scale both reads and writes while also ensuring data integrity, you know, among it. And, you know, as I'm saying this, as I'm thinking through this problem and how Cassandra solved their horizontally scalable writes problem, I'm like, oh, you know what? This sounds like Kafka, and Kafka didn't make the list of DB engines. And I take issue with that, because I definitely consider it a type of, it's a transaction log. So we talked about this actually, I think, um,

Starting point is 00:46:06 Jay-Z and I, for a second. So the reason why, I think both of us sort of agreed that it's not a database, is because you don't query it. It's a queue, it is a transaction log. Well, they do have KSQL. No, no, no, that's a different technology, that's not Kafka, right? That's the thing. But it was SQL, it was, isn't KSQL by Apache for Kafka, though? No, KSQL was written by Confluent. It was written by Confluent, right? And it was built on top of Kafka Streams, which is an application technology on top of Kafka. Because we actually had that same conversation, and it was like, no, Kafka is a fast, persistent message queue, right? And that's what it's made for. Now, whatever you want to do on top of it, sure, you can do all kinds

Starting point is 00:46:54 of crazy stuff, right? Like, people do it. But yeah, fine, fine. Going back to what you said, though, because they do the same thing, though. Flink is the same thing, and it's not a database either, and I would definitely agree that it's not. But this whole idea of being able to say, hey, I'm going to have N number of nodes responsible for handling whatever this task might be, be that task responding to a query, or be that task responding to, like, oh, some new data came in, let me figure out how to process it, like how Flink does. The idea that all of these things share, Cassandra, Kafka, Flink, is the idea that, hey, I'm going to deterministically decide

Starting point is 00:47:36 which node is responsible for that particular event, you know, query event or whatever. And behind the scenes, those things can replicate state among each other as needed, right? So that if the one that I deterministically decided on is no longer available, I can fall back to another one, right? Yeah. I mean, so to take that a step further. So I love the question: are the standard relational databases that we've all known and loved for, you know, three, four decades now, since the sixties, apparently, are those sort of going away? And that's sort of the surprising thing, right? I don't think they

Starting point is 00:48:24 are, because when you look at that DB-Engines ranking list, they're one, two, three, and four. Yeah. But if you go with what you were saying, right, are there going to be things that do the things that those systems are good at, but make them more scalable? The answer is yes, because that's what Cosmos DB is, right? Azure Cosmos DB is basically the, hey, come use me, I'll get rid of your headaches of trying to scale your databases and all that, but you still get the same development experience that you've known for a long time. Google has one, it's called Spanner, right?

Starting point is 00:49:05 And it's the same notion. And I wouldn't be surprised if AWS also has their own version of this. I don't know what it would be. But here's the problem, and this is why I think that relational database systems haven't gone anywhere yet. Do we know, is cloud, is it cheap? Are you asking me? Yeah, I was just wondering. I mean, it depends on what you're trying to do, right? So what if you were trying to have a database, if you're trying to, like, uh, you

Starting point is 00:49:42 know, you want to host a database for your family of, like, hey, here's our family tree. There might be a better way to do that more cost effectively. Sure. But if you're trying to, like, create the next, you know... Well, I think, what was it? Pinterest was the, or no, Instagram was the example that you guys had talked about last time, where, you know, in the article that was written in like 2012 or something, it was like, we have big data problems, we get 25 images a second. Right. And now it was like, we get 1,070

Starting point is 00:50:18 plus images a second. We think it's so fast we can't even count it, you know. So if you're trying to build the next Instagram, then, you know, you probably want to consider cloud.

Starting point is 00:50:32 I mean, it's like everything else in computer science, right? The answer that you don't want to hear is: it depends. It depends. And you really need to, like,

Starting point is 00:50:42 know what is going to be your use case before you start making architectural decisions. Well, here's the reality, though, right? Let's put it in the simplest possible terms. If you start running into a situation where your SQL Server or your Oracle can no longer handle what you've got, and you've already dumped $100,000 into the server that's running that thing, right? Maybe even more, in many situations. Then maybe it would make sense to be looking at something like Cloud Spanner from Google or Azure Cosmos DB. But the problem is, and this is where it starts really sucking, if you've gotten to the point where you're tipping over a hundred-thousand-dollar server,

Starting point is 00:51:30 you're going to be running into some decent monthly costs on getting that thing running up in the cloud, right? Because that means you've already hit a level of data and a level of complexity and querying needs and whatever that you pay for in the cloud, right? Like, you pay for the extra compute, you pay for the extra throughput, you pay for all that, right? And it's like going to a nice restaurant, right? And I don't want to make it out like

Starting point is 00:51:58 LongHorn isn't a nice restaurant, right? But if you go to LongHorn and you order yourself... That's fancy date night right there. And you order yourself a filet, and it came with two sides and whatever else. And you get that bread for free right up front, which, let's be honest, that's why everybody goes to LongHorn, because that bread, that butter. But you're in it for what? Thirty-five bucks, somewhere in that ballpark.

Starting point is 00:52:22 You go to a Ruth's Chris or a nicer steakhouse, you're paying for that steak. You're paying for each individual side. That's what the cloud is like, right? Oh, you want that side of mashed potatoes? Okay. Well, that's fine, we'll go ahead and bill you for that. You know, you want that extra throughput? You want this? That's what it's like. And so when you start looking at the cloud, you really have to start looking at, hey, what do I think my realistic throughput is going to be, what do I think my realistic CPU needs are going to be, and all that, because you have to budget for that stuff. Yeah, and you know, it's fair to call out that Ruth's Chris is

Starting point is 00:53:02 also like the most popular chain. Is it? For real? For what people would consider a nice restaurant. More than a Morton's? It was just something that I heard recently, and I just Googled it again just to see, like, hey, am I wrong? And this article was from January of this year. But now, here we go. This is for millennials.

Starting point is 00:53:28 For millennials, it's number one. So I don't know. We're almost at boomer hour. So I don't know if we want to, like, you know, if that's going to be a topic. There was a request that we make that a part of the show every time. But I've never been to a Ruth's Chris. I want to go so bad, and every time I'm like, dude, you should... I mean, we have one close by to us too, and it's not far at all. Look, look, I'm

Starting point is 00:53:55 not gonna say anything, because I don't want to sway your opinion one way or the other. You should go, for sure, and just, you know, whatever your favorite piece of steak is, just ask for that. Don't, don't even... Boomer hour started early. Wait a minute, hold on. Is this, generally, you like it or you don't? Do you really want me to tell you? I really want to know. Um, yes, it's good. Okay. Uh, I can make a better steak at home. Okay. I mean, I've heard that's probably true of, like, you know, most places, right? I mean, these aren't, like, Michelin five-star restaurants. Yeah.

Starting point is 00:54:32 I mean, I guess, so there's another popular chain. So LongHorn's good. Oh, there's another really popular chain around here: Texas Roadhouse. Texas, Texas? Yeah, Texas Roadhouse. But in Georgia. Yeah. Yeah.

Starting point is 00:54:42 It's going to be confusing to our overseas friends. Yeah. Sorry. In the US, there's a franchise called Texas Roadhouse. Texas Roadhouse in Georgia, yes. And I'd say that their steaks are every bit as good as probably Ruth's Chris. Now, the difference is Ruth's Chris is going to be giving you prime cuts of meat. At Texas Roadhouse, or even LongHorn, it's probably choice instead, which is, you know, one level down. But whatever, I mean, it's good. Now, I will tell you there is one big difference:

Starting point is 00:55:12 at Ruth's Chris, they put salt, pepper, and butter on top. That's it, right? That's all you need on a good piece of steak. If you go to some other places, you know, they might put some other type of pepper seasoning or whatever on it to give it that extra flavor, to kick it up a little bit. But regardless, it is a good steak. But is it something that I'm just always dying to go back to because I've never had one as good? No, no. That's consistent with things I've heard in the past too, though.

Starting point is 00:55:44 It doesn't stop me from wanting to go just to experience it. Yeah, agreed. Yeah, totally. Um, just to, you know, kind of close the loop on this whole Cassandra thing, I'm a little disappointed in ourselves. We haven't even brought up our past sponsor, DataStax. They had the whole solution of, you know, giving you a managed Cassandra environment, uh, you know, for you, and, you know, something for you to consider. In fact, you know what, I'm going to include the name of that product. It was Astra. Let's see, or is Astra, I should say. Anytime you start dealing with a bunch of hardware and having to make sure things are alive and all that, I mean, it turns into a pain. Oh, hey, one other thing, one other really

Starting point is 00:56:22 important thing to note here, and this is why Cassandra is so popular in terms of this whole distributed, you know, wide column storage: every node you add gives you linear scalability. So that's a big deal, right? Like, if you have one node and it can handle 1,000 queries a second, if you add a second node, then you can handle 2,000 queries a second, and 3,000 if you add a third. Now, I'm sure that it's not 100% linear, right? There's always going to be a cost overhead with any kind of distributed

Starting point is 00:56:57 network traffic and all that. But that's what they actually tell you on their pages. That is the glorious thing about it. Besides backing up your data and making sure it's consistent and available and all that, being able to scale it is very linear: add N nodes and you have N times the performance, roughly. So pretty cool. I would imagine, though, that, like, you start with one node and then go to two, but as

Starting point is 00:57:21 we've established with Cassandra, you would never have a single-node setup. Because as you were saying that, in my mind I was like, well, wait a minute, how can you make that guarantee? Because as you add nodes, you're going to have replication overhead, and, you know, that's going to bite into your available bandwidth, your IO bandwidth, both on disk and network, blah, blah, blah. But then that's why I wanted to call out, like, oh, well, because one node is never really realistic, you're probably starting with some minimum number, and probably five, if that. Well, let's say that the minimum number was three, right? And you might have replication of two,

Starting point is 00:57:54 right? Then, you know, if your replication is always two, even if you did go from three to five, it's still replication of two, right? Right. So, uh, you know, every node is going to have the same number of replication reads and writes in addition to incoming query reads and writes. So that's where you can guarantee, like, oh, it's going to grow linearly, because the replication count probably isn't going to change, right? Because I'm thinking of it from, like, a Kafka point of view, right? Like, your replication count isn't going to change just because you're adding nodes. Right, you add 10 more nodes, you're still only replicating to two every time you do your write or whatever. So yeah, yeah, it's pretty interesting. It's a very, and

Starting point is 00:58:39 there's a reason why it's popular. So, all right. So let's switch everything over to wide column stores. You've convinced me. That's right. Done. Bigtable all the things. That's right. All righty. Well, I guess we won't bother with mental blocks.
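The replication argument from the Cassandra discussion can be put in numbers. This is an idealized sketch, assuming load spreads evenly across nodes, every client write lands on a fixed number of replicas, and each node can absorb a hypothetical 1,000 replica writes per second; it ignores the coordination and network overhead the hosts mention, so treat it as an upper bound rather than a benchmark.

```python
def cluster_write_capacity(per_node_ops: float, nodes: int, replication_factor: int) -> float:
    """Client writes/sec a cluster can absorb if every write lands on
    `replication_factor` replicas and load spreads evenly across nodes."""
    if replication_factor > nodes:
        raise ValueError("replication factor cannot exceed node count")
    return per_node_ops * nodes / replication_factor


# Hypothetical numbers: each node can do 1,000 replica writes per second,
# and the replication factor stays at 2 as the cluster grows.
for n in (3, 5, 10):
    print(n, cluster_write_capacity(1_000, n, replication_factor=2))
# 3 1500.0
# 5 2500.0
# 10 5000.0
```

Because the replication factor stays fixed, capacity grows in direct proportion to the node count; if you raised the replication factor along with the cluster size, that linearity would disappear.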

Starting point is 00:58:56 We just already know that we're mental, and that's the only thing that blockheads need to know. That's right. Uh, but so, if you want to be one of the lucky few that leave us a review and hear your name called out, um, send your difficult-to-read names. You can find some helpful links at codingblocks.net/review, and we do greatly appreciate reading those reviews. Uh, some of them are comical, you know, like the one that Alan called out before, from Ian? No.

Starting point is 00:59:32 No, it was Kalem, about, you know, don't try to listen to what you're working on or you're going to break your back. You know, but then we also get some really heartfelt ones too, so we really do appreciate it, and it gives us inspiration to keep going sometimes, because sometimes there are happy moments, and sometimes there are darker moments.

Starting point is 00:59:55 Yeah. Wow, that took a dark turn. Sometimes we're sick. Sometimes we're moving. Yeah. There's all kinds of things going on.

Starting point is 01:00:02 I feel like I should grab a pipe and put on a coat. These are dark days at Coding Blocks. That's right. All right, well, let's get into vector database systems. Um, dude, this one, all right, this one is new to me completely. Never even heard of them, and it's sort of mind-blowing. So, okay, I'm coming into this completely cold, right? Originally, when I saw the name vector, I was thinking graph, but that is not the case. Uh-huh, that's what I'm saying. This one, so behind the scenes, how the sausage is made: I spent more time learning about this one than all the others combined. Well, I guess in fairness, most of the other ones I've touched over the years, but this one being brand new, this was truly just a learning experience. And, uh, I need to go in and find the rankings for these, because I forgot to put them in there.

Starting point is 01:01:06 So the first, the first, go ahead. No, no, no. Go. I was,

Starting point is 01:01:11 I was going to read some stuff about the vector DBs if you wanted to look that up. Okay. You do that real quick. I've got that link down there, uh, to Pinecone, which is probably the best that I've seen. Well,

Starting point is 01:01:21 I'll start with what's in the encyclopedia. Just that these are systems that specialize in storing, uh, how did they word it? They had: systems optimized for efficient storage, indexing, and querying of high-dimensional vector data, that use special algorithms and data structures to support similarity search, often used in machine learning or data mining, with a focus on performance, scalability, and flexibility.
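The "similarity search" in that definition boils down to: given a query vector, find the stored vectors pointing in the most similar direction. Here is a brute-force sketch using cosine similarity, assuming hand-made 3-dimensional vectors; production vector databases generally use approximate nearest-neighbor indexes (HNSW is a common one) instead of scanning every vector, and `top_k` is a hypothetical helper, not any product's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Brute-force nearest neighbors: score every stored vector, keep the best k."""
    scored = sorted(
        ((cosine_similarity(query, vec), key) for key, vec in vectors.items()),
        reverse=True,
    )
    return [key for _, key in scored[:k]]


# Hand-made 3-dimensional "embeddings" for illustration only; real
# embeddings come from a trained model and typically have hundreds or
# thousands of dimensions.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.9, 0.2],
}
query = [0.88, 0.12, 0.02]

print(top_k(query, vectors))  # ['cat', 'kitten']
```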

Starting point is 01:01:55 Right now. Well, I feel like I just read, like, a marketing thing, like some marketing guru came to you and just read every buzzword that he's recently seen. You're like, uh-huh, yeah, what were the requirements again? We'll buy it. What did you actually want this thing to do? Yeah. So one thing that's interesting here, so he just read that, and I mentioned this on the previous episode, these database websites that you go to can be an absolute wealth of information, like a hundred percent.

Starting point is 01:02:29 Now, what's surprising to me is this. So the popular ones are KDB, which was ranked number 52 overall out of the 401. Pinecone was 103, so it's, you know, double as far back in the list. And then Chroma was 139. Now, the reason why I'm bringing this up, because it's surprising to me: if you go to Pinecone, it's pinecone.io, if you go to that website, it is fantastic. If you want to learn anything about a vector database, I mean, it's better than just about anything else out there

Starting point is 01:03:18 that you could just Google and search for. Like, they just do so well on it. But the thing that was surprising to me is it's ranked second in this list of them. But if you go to the KDB database website, it doesn't feel like it's on the same level at all. Now, maybe it's an amazing thing, I don't know. But go ahead. Well, I think that's because, you know, in DB-Engines, they say that it's a high-performance time series database, and that the primary database models are time series and vector. So I'm assuming that the reason why it is ranked higher is because it's not just one thing.

Starting point is 01:04:02 Like, the other examples that you have here, like Pinecone, are only vector database models. Yeah, that might be it. But you know what? I don't know, man. It just, maybe I'm not even on their site. Hold on. Where is the KDB site? It's KX.com.

Starting point is 01:04:19 KX.com. Here we go. Why wouldn't it be KDB.com? I don't know. Yeah, they got products. What is KDB.com? Yeah, I don't know. If you lose me here, it's because I just got hacked on my computer by typing in a random, uh, name. That's always safe. Yeah. So anyways, I just bring that up because Pinecone's website is truly great. What we want to talk about here is going to be sort of a deep dive into what it's actually storing. Because if you don't understand that, then it

Starting point is 01:05:12 doesn't even matter to you that it's a vector database. And that's why I spent so much time on this, because first off, it was fascinating, and secondly, you know, if we're going to talk about it, we should at least be able to speak a little bit about it. So, yeah. So what is this thing? He already said it. It stores, and this is a technical term, it stores vector embeddings.

Starting point is 01:05:40 And it's able to retrieve them quickly. Okay. So that probably means something to 5% of the people listening. Maybe I'm being generous. I don't know. It didn't mean anything to me when I read it. Yeah, I'm reading through the problems they're trying to solve, and now I'm kind of starting to understand.

Starting point is 01:05:59 But I don't know. Well, let me see what you already had here. I don't want to, like, because I was going specifically off the Pinecone documentation. So maybe we keep going with what you've got. All right. Yeah. I mean, if you click on that page, I'm basically going straight down it now. I don't have as much detail as they have on the page. They, like, you know, talk through some examples and stuff.

Starting point is 01:06:18 But. Oh, well, you didn't do this part. They start with like the problems they're trying to solve. So maybe we do go through this part first. If you build a traditional application, your data structures are represented as objects that probably come from a database.

Starting point is 01:06:34 Your objects have properties that might map directly to a column, etc. Then over time, as the number of properties grows, so do the objects, to the point where you need to be more intentional about which properties you want for a given task, and you end up creating specialized representations for some of these objects

Starting point is 01:06:55 instead of having very fat objects. So I'm trying to think of an example where we had a very wide table, but we might have had an object that only had a smaller number of those properties. But I mean, we've all done it, especially with select results, right? Especially if you're relying more on stored procedures or routines and your objects represent the results coming back from those queries instead of the full objects in the database. Right. And then they go on to say, if you're dealing with unstructured data, you go through a similar process, except, you know, it's more on the code side.
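The "specialized representation" pattern described above can be sketched with two hypothetical dataclasses: a wide object that mirrors every column, and a narrow one built for a single task. The important part is that a developer hand-picks the fields, which is the step an embedding model automates.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    # Hypothetical "fat" object mirroring a wide table, one field per column.
    customer_id: int
    name: str
    email: str
    street: str
    city: str
    loyalty_points: int
    last_login: str


@dataclass
class CustomerBadge:
    # Hypothetical specialized representation: only what one screen needs.
    name: str
    loyalty_points: int


def to_badge(row: CustomerRow) -> CustomerBadge:
    # Manual feature selection: a person decides which properties matter here.
    return CustomerBadge(name=row.name, loyalty_points=row.loyalty_points)


row = CustomerRow(42, "Ada", "ada@example.com", "1 Main St", "Springfield", 120, "2024-02-01")
badge = to_badge(row)
print(badge)  # CustomerBadge(name='Ada', loyalty_points=120)
```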

Starting point is 01:07:44 But with vector embeddings, it's a form of automatic feature engineering where, instead of manually picking which things you want from your data, you have a pre-trained machine learning model that produces a representation of this data that's more compact while preserving what's meaningful about it. Okay, yeah, so you're definitely getting deeper into the weeds there. So I like where you're starting, with why you would do this. So what you just described is what we're trying to store. Yes. As for use cases, there are a few on their examples page. One of them is semantic search. So

Starting point is 01:08:26 if you think about something like Elasticsearch, which we've talked about in the past, there are words that are known to have certain synonyms, right? So if you were to search for, you know, whatever sentence, it's going to sort of do a smart replacement of some of those words in that sentence to try and find anything that's similar, right? So that's how regular search-engine-type stuff works. But if you're using something like a vector database, what you're trying to get to is better semantic search. So instead of searching for words that have similar meanings, you're trying to search for sentences that have similar meanings,

Starting point is 01:09:05 right? And so what it does is store the meanings of things in a 3D space, if you want to picture it that way, right? Imagine that you have this sphere in front of you (this is a simplistic view of it), and you have one sentence, and it's dead center of that sphere, right? Right in the middle. Then another sentence comes along, and it means roughly the same thing. So it's going to be close to it, right? Maybe it's behind it or on top of it or to the left or the right of it or whatever. But it's going to be located somewhere near that first one.

Starting point is 01:09:42 Now you have another sentence that comes in that means nothing like it, right? Like, the first two were talking about computer stuff and the third one was talking about cooking. It's going to be on the outer edge of that sphere, right? So that's what this vector storage is doing: putting things that have similar meanings close together. So semantic search was one example of that. Another one was audio search, right? You could take audio files, and it could do spectral analysis of the waves and the patterns in the audio, and anything with similar characteristics would be located in a similar region of that space, in this 3D thing. And I think it's more complex than just 3D space, right? Because there can be many more dimensions than three. But at any rate, those are the types of problems you're trying to solve: instead of

Starting point is 01:10:31 the, I hate to say it this way, but the simplistic type things that we've done for many decades as developers, right, like swapping out words, you're now trying to plot the meanings of things and the relationships between things, which is how AI models work nowadays, right? There's a sort of comprehension to these things that is all done mathematically. Well, yeah, that's where I kind of had this light bulb moment, because I made the comment a moment ago about the graphs, but I was mistakenly thinking of edges and vertices when I heard the word vector. And so I'm thinking like, oh wait, like graphs,

Starting point is 01:11:16 like network graphs, that kind of thing. Is that what we mean? And then I realized, oh wait, no, no, no, vector.

Starting point is 01:11:22 As in the math term vector. Yes. That's the type of thing that we're talking about. Yep. And that's where the name is coming from. So then I was like, oh, okay, all right, so not that. That helps to clarify it. All right, this is still a very complex part of it, though, for sure. Yeah, and it's a very small piece of it. And when I say small piece of it, I guess every database system is sort of a small piece of whatever the overall thing is. Right. So let's go ahead and go through the vector embeddings for developers. So this is a webpage

Starting point is 01:11:59 they have on the Pinecone website, and it's absolutely fantastic. I'm going to talk through it here, and, you know, Outlaw and I will be bouncing stuff back and forth, but by all means, I highly recommend going and checking this out, because they have some great visualizations and some good information on that page, and there'll be plenty of links to this document as well as others in the show notes. Yeah, for sure. So what is a vector? Just like Outlaw said a second ago, it's a mathematical structure that has a size (a magnitude) and a direction. So if you go back to math class days and you just have your two-dimensional graphs,

Starting point is 01:12:37 right, you have an X and a Y, and you put a point on there. Well, that was 2D space. If you open that up, like I said, if you think about it as a sphere or something, you now have an X, Y, and Z axis, right? So you're somewhere in a three-dimensional room, like a room in your house, right? Imagine you could put a point somewhere in your room, just float it there, and then make it a bigger or smaller ball, right? Let's just say that point is going to be represented by a tennis ball or a basketball or something bigger or smaller. That's kind of what this is. It is a

Starting point is 01:13:11 plot in 3D space with a size to it. All right, I already said that. And you know, you could think of it as (0, 0, 0) if you wanted it at the origin, I guess, is what we used to call that thing. And now they say that for developers, it's usually easiest to think about it as an array of numbers, which, I mean, sure, fine, whatever. I don't know why we can't think about it the other way. But sure enough, that's how we're going to represent it. OK, now this is where things get important. This is kind of what I was talking about with semantic search, right? If you think about vectors in space, there are some that are going to be close together, right?

Starting point is 01:13:56 Like, things that are more related: if we're talking about two people, they might be closely related. If you talk about a bicycle, it's going to be a little bit further out, right? But it's still associated with people, so it might be somewhat close by. And then if you talk about a piece of grass, then maybe it's way outside, because it's got nothing to do with either of them, right? So depending on how you're trying to model the data, things are going to be closer together or further apart in space. Oh, also, I thought that should be like a new, uh, you know, "Vectors in Space." Awesome. All right. The way you said it made me think of that. Man, I'm probably gonna say all kinds of weird stuff that

Starting point is 01:14:39 will sound like it should be from something stupid. This one gets so deep, though, because the thing about any machine learning type conversation is, when we talk about it in two dimensions, right, X and Y, it's easy for us to conceptualize: is this close to the (0, 0) point or is it not, right? And then you say, okay, well, I can introduce Z, so I have X, Y, Z. Now it's three-dimensional, and it's still a little bit okay for us to grasp. But when you start getting beyond three dimensions, it gets really difficult to even visualize some of that stuff. Right.
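For developers, the "array of numbers" view makes size (magnitude), direction, and closeness easy to compute. This sketch assumes hand-picked 3D coordinates for the people/bicycle/grass example; real embeddings come out of a model and usually have hundreds of dimensions. The formulas don't change when you go past three dimensions; only our ability to picture them does.

```python
import math


def magnitude(v: list[float]) -> float:
    # Length of the vector: square root of the sum of squared components.
    return math.sqrt(sum(x * x for x in v))


def direction(v: list[float]) -> list[float]:
    # Unit vector pointing the same way as v (assumes v is not all zeros).
    m = magnitude(v)
    return [x / m for x in v]


def distance(a: list[float], b: list[float]) -> float:
    # Euclidean distance; the same formula works in 2, 3, or 300 dimensions.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical coordinates, hand-picked to mirror the people/bicycle/grass example.
points = {
    "person_a": [1.0, 1.0, 1.0],
    "person_b": [1.2, 0.9, 1.1],
    "bicycle": [2.0, 1.5, 0.5],
    "grass": [8.0, -6.0, 4.0],
}

for name, vec in points.items():
    print(f"{name:>8}: distance from person_a = {distance(points['person_a'], vec):.2f}")

print(magnitude([3.0, 4.0]))              # 5.0
print(direction([3.0, 4.0]))              # [0.6, 0.8]
print(distance([0.0] * 4, [1.0] * 4))     # 2.0, a 4D example
```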

Starting point is 01:15:29 Totally. But I think for the purpose of this, let's keep it simple, you know, a couple of dimensions. Yeah, I think three is probably good. So they talk about the fact that vectors are extremely useful in machine learning because CPUs and GPUs are really good at math. Right. And if you haven't done machine learning type stuff but you've been curious, that's a lot of times why people are buying these high-end Nvidia cards, right? Because the CUDA cores are good at doing all this type of math.

Starting point is 01:16:04 So this is where we start moving a little bit further: vector embedding. We've been talking about vectors, right? The actual data points. Vector embedding is the process of converting virtually any data structure into vectors. So how do you actually get those plots? And they say that it's not quite as simple as a straight conversion, right? The reason is that you don't want to lose the data's meaning. Now, I can't even fathom how some of this stuff works in actual practice. So what they're saying is, if you were comparing two sentences, you wouldn't just compare the words; you want to compare whether the two sentences have the same meaning. And, you know, for anybody that

Starting point is 01:16:54 has learned English, which, if you're listening to this, I assume you have, there are all kinds of ways to say things that all mean the same thing, right? Oh, it's hard. You can have the same sentence mean different things. Yeah, totally, right. "Mary had a little lamb" is my favorite example. Mary had a little lamb: what does that mean? Did she own it? Did she eat it? Something else? Yeah, it reminds me of the comma one: "We went to eat, Grandma." If you leave that comma out before "Grandma," it's "We went to eat Grandma."

Starting point is 01:17:34 Yeah. But in the example that I gave, though, it's not even about grammar, like the comma or anything. It's literally the words. The,

Starting point is 01:17:44 the, the sentence could be written exactly the same. Mary had a little lamb and it, but yet it could mean different things. It's, it is incredibly difficult. So like what they're doing with large language models and all that kind of stuff now are crazy impressive.

Starting point is 01:17:59 So, to keep the meaning and produce vectors with relationships that make sense, you need embedding models. Now, this is where it goes a little bit further, right? They say many embedding models are created by passing large sets of labeled data to neural networks. If you have not worked with machine learning at all (I've only been on the outside of it; I know Outlaw was a little bit further into it), the easiest way I can think of to explain labeled data is, let's say that you're labeling photos. You've got photo data coming in. There might be a simple algorithm that says: is this a cat or a dog?

Starting point is 01:18:42 And when the photo comes in... Say what, CAPTCHAs? We're training Google. Yeah, basically, more or less. So you get that photo, and the whole purpose of that one particular function is to put a label on it: cat, dog, or maybe it couldn't find either one, right? In which case it doesn't put a label on it. And so as these things come through, you might have a thousand photos come through, and you've labeled them cats or dogs. So that's what this labeled data is. A person has made a decision on what this data is, and then you're feeding that into the machine learning algorithm to say, this is an example of a cat, if you see something similar to this.

Starting point is 01:19:26 Yes, just what he said. So neural networks are trained using supervised learning, typically, not always, but typically. And that supervised learning is what Outlaw just said: a person has already passed judgment on what this label should be. You give it to this neural network and tell it what you think it should be. And so the neural network starts learning based off the inputs that it gets. Right. And the reason they're called neural networks, in case anybody hasn't seen these things,

Starting point is 01:19:56 is that they're just huge networks of nodes processing data nonstop using mathematical functions, right? It's passing things from one function to the next function to the next function, and they're all just kind of chained together. Right. I think that's a decently easy way to describe it. I thought the name came from, um,

Starting point is 01:20:17 modeling the human brain, though. Oh, it probably came from that. Yeah, because you're making tons of decisions on everything that you see at every point, right? Let's see, I'm trying to find something about where it came from. All right, so while he's looking for that: using the supervised model, you pass in these large sets of data as pairs of inputs and labeled outputs, right? So you gave it its stuff and you told it what you think it should be. The values are then transformed in each layer of the neural network.

Starting point is 01:20:49 So as it goes through its functions: is this an animal? Yes. Is it a cat? Is it a dog? Whatever, right? It's going through this whole chain of questions.
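The supervised-learning loop described here (labeled input/output pairs go in, the model adjusts itself, and it should then handle inputs it hasn't seen) can be sketched, under heavy simplification, with a single artificial neuron trained by the classic perceptron rule instead of a multi-layer neural network. The features and data are made up, and in this sketch it is the weights that get adjusted on each mistake.

```python
# Hypothetical labeled data: (features, label) pairs a person has already judged.
# Made-up features: [pointy_ears, loves_fetch]; label 1 = cat, 0 = dog.
training_data = [
    ([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.7, 0.1], 1),
    ([0.2, 0.9], 0), ([0.1, 0.8], 0), ([0.3, 0.7], 0),
]


def predict(weights: list[float], bias: float, x: list[float]) -> int:
    # Weighted sum of the inputs plus a bias, then a step activation.
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train(data, learning_rate: float = 0.1, epochs: int = 20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in data:
            error = label - predict(weights, bias, x)
            if error:
                mistakes += 1
                # Perceptron rule: nudge the weights toward the labeled answer.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # Every labeled example is now classified correctly.
            break
    return weights, bias


weights, bias = train(training_data)
# Inputs the model never saw during training.
print(predict(weights, bias, [0.85, 0.15]))  # 1 (cat)
print(predict(weights, bias, [0.15, 0.95]))  # 0 (dog)
```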

Starting point is 01:21:01 With each round of training the neural network, it says that the activations at each layer are modified. And I assume that's basically tweaking its model to know, hey, if I see something that looks kind of like this, then it's a dog. If it has this feature, then it's a cat, right? Those are its activations. And then at the end of this, the goal is that the neural network will be able to provide an output for any given input, even if it hasn't seen that specific input before, right? So you fed it a thousand pictures of these animals that were cats and dogs. And when you feed it picture number one thousand and one, which wasn't in that original set, it's going to look at

Starting point is 01:21:42 the features that it determined made you decide that something was a cat versus a dog. And it's going to make that decision now, right? Because it's trained itself to sort of understand what it is. Now, the embedding model itself is essentially all those layers of that neural network minus the last one that did the labeling of the data. So rather than getting the label, you're getting the output of the last layer right before it made the decision. And I guess that's why it's called an embedding model: it knows how to figure out what it is that you're feeding it. And what you're storing in the vector database is what this embedding model produces. So it's super interesting that you're not storing final results, the labels, in this database.

Starting point is 01:22:34 You're storing the representation that determines the outputs for those inputs. And those things are placed in space so that the ones it considers closely related end up close to each other in that database. So, in maybe layman's terms, you're storing the interpretation of previous data. Based on: I've seen this thing before and I've been told that this is that, blah, blah, blah. So I interpret that to mean if I ever see these characteristics, then that can mean blah. Yeah. Yeah. It's putting these things close together. And then I think, Outlaw, you're on that page.
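The "all layers minus the last one" idea can be sketched with a tiny hand-wired network. The weights below are made up rather than trained, and real embedding models are vastly larger. `embed` stops before the final layer and returns the vector you would store in a vector database; `classify` runs the extra layer that turns that vector into a label.

```python
def relu(values: list[float]) -> list[float]:
    # Activation function: keep positive values, zero out negative ones.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One fully connected layer: each output is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


# Hypothetical "already trained" weights for a 2 -> 3 -> 2 network.
HIDDEN_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0, 0.0]
OUTPUT_W = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]  # rows: cat score, dog score
OUTPUT_B = [0.0, 0.0]
LABELS = ["cat", "dog"]


def embed(features):
    # Every layer except the last one: this vector is the embedding.
    return relu(dense(features, HIDDEN_W, HIDDEN_B))


def classify(features):
    # The full network: the embedding, then the final layer that picks a label.
    scores = dense(embed(features), OUTPUT_W, OUTPUT_B)
    return LABELS[scores.index(max(scores))]


x = [0.9, 0.1]
print(embed(x))     # what a vector database would store
print(classify(x))  # cat: what a plain classifier would return
```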

Starting point is 01:23:18 One of the things that's really cool is they show you a sample from one of the popular models. It's called Word2Vec, and the visualization basically shows you which words are similar, plotted in this 3D view. It's really cool. Right in the middle of this word vector cloud is the word "located." And then right around it, you'll see things like "northwest," "nearby," "housed," "area," "occupies." That makes sense.

Starting point is 01:23:54 "Located." They have that meaning, and all the way on the far outside you get words like "placed" or "found." Yeah. It knows that. And these aren't the furthest apart, right? These are just the ones on sort of the fringe that it still thinks are somewhat related, but there are tons of dots out there that are nowhere in the ballpark, and it doesn't even show them. So it's a really cool visualization to show you kind of what you're looking for

Starting point is 01:24:24 the output of these things to be and how it's being stored, right? This is visually how it's being stored in that database. I mean, if you haven't already figured it out, you definitely need a pretty good understanding of machine learning if you're going to use a vector database. Yeah. Right. You're not just jumping into this. I looked, because I was curious about the name neural network. And yeah, it comes from the biological neural network. According to Wikipedia, a neural network, also called a biological neural network, is an interconnected population of neurons.

Starting point is 01:24:57 And closely related are machine learning's artificial neural networks. The machine learning models are inspired by biological neural networks and consist of artificial neurons, which are the mathematical functions, the activations that you referred to, because those are activation functions designed to be analogous to the mechanisms of neural circuits. I mean, think about it. I'm trying to put this in the simplest terms I can, but the idea is that you get some kind of input, either visually, you know, or through one of your various senses,

Starting point is 01:25:38 you get some kind of input into your brain. And so all these synapses fire, right? But there's a connection between the things that fire. And there are people who study the brain, and they'll show different parts of the brain lighting up. Like, why does that part light up here, and that part there? Oh, there's memory, there's love, or there's anger, or whatever the different things are. Right. And that's what's happening.

Starting point is 01:25:58 We're trying to model that same type of firing in math. Right. So in that cloud that you referenced in the Pinecone document, depending on what your context is, the firing is going to lead you down a specific part of that path. Yeah, it's super cool. Yeah, I probably dumbed that down badly. No, that was... somebody's gonna say... No, that was a really good example, right? I mean, this is not simple stuff, right? I don't know. Outlaw, were you on a call recently where some guy was geeking out?

Starting point is 01:26:41 Yes. Oh, my dear God, man. I think we were talking about large language models or something. Yes, we were, and I was lost at slide one. Yeah, I'm trying to remember that specifically without getting too detailed, but I remember one of the slides was layer after layer after layer, and how one would feed into the next and feed into the next and feed into the next, and things like that. That's where the topic of machine learning is awesome. It's amazing, it's hyper complex, but the level of complexity also varies greatly. I don't say that to turn anybody off

Starting point is 01:27:27 from it who's not into it, because if you're on the using side of it, then it doesn't necessarily have to be as complex, right? I'm not trying to say that it's not still complex, but it doesn't have to be as complex, because there are plenty of libraries that have done the hard math for you. And I remember back in the day we talked about how Microsoft had various diagrams of, hey, depending on what type of thing you want to solve, here's the type of machine learning algorithm you want to use. So that kind of thing already exists for you. But where it gets much more complex is if you want to be on the side of creating new practices and patterns and algorithms and new mathematical models. That's where it gets hyper complex. And the conversation that you're referring to was more on the path of, hey, what if we flip the script and started doing things differently?

Starting point is 01:28:31 And that's the type of conversation for a very targeted audience, which might not have been who was all in attendance at that particular time, but it was still great, and also complex and hard. Yeah, I mean, it would almost be like somebody doing a deep dive on data structures for somebody who'd never been introduced to computer science at all, right? That's the level where people are just sitting there going, huh? Right? And I have no doubt that if this talk were put in front of the right people, they'd be like, oh, yeah, yeah, yeah. But put it in front of people who hadn't even been introduced to the concepts, right? Think organic chemistry before you've gotten into any of it. It's that level of, oh man, okay, I

Starting point is 01:29:14 need to go back and do a bunch of studying. And that's why this was so interesting to me, right? This whole vector embedding thing: until you get to what you're actually trying to store, and then it's like, oh, okay, I get it now, right? I'm sure that all the stuff to get to this point is complex. But like Outlaw said, even with Microsoft, the reason they publish that thing telling you the types of algorithms you might want to use for different situations is that they've already got those models, right? Azure has tons of machine learning models that you can just use as a developer, which is fantastic, right? And if you're doing .NET development, they have libraries that you can just import and say, hey, use this machine learning model for this particular data set. And then once you take the output of that kind of stuff, you could use it to potentially drop things into these vector databases.

Starting point is 01:30:08 You know, I said Microsoft, and then I remembered, as I started searching for that to include it, it wasn't Microsoft. It was Psychic Learn that had the really good one, which is a Python library, a super easy introduction, you know, a very friendly library for playing around with machine learning. And also, if you've never done anything with machine learning and it sounds daunting, I think if you started with decision trees as your first entry point into it, you'd be like, oh wait, that counts? Oh, that's not so bad. And then you could get into the more complex ones; neural networks are definitely among the more, uh, you know, complex

Starting point is 01:30:58 types that you get into. But I'll see if I can find that Psychic Learn drawing, and I'll include it in the links. Excellent. Yeah, so we've now covered what this thing is, right? It's pretty cool stuff. Like I said, I'd never really heard of it, and after digging into it, I was like, oh man, now I want to use it. But that means I have to go learn some stuff. So, you know, we led with some of the things they had. As a matter of fact, Pinecone's webpage, again, was just fantastic. They have an examples page.

Starting point is 01:31:40 And I want to list a couple of them in the show notes, but semantic search was one. They had chatbot agents, they had retrieval augmentation, image searches. They had all kinds of stuff that you could do with this. So if you can think of how you want some sort of relationship modeling to happen, this is a good choice for you as a data store. So we've actually talked about Psychic Learn a couple of times. Yeah, I believe so. Yeah, I remember hearing it. Let's see, episode 152, and episode 92. Well, maybe not 92. Azure Cosmos DB came up in episode 92. Cosmos DB.

Starting point is 01:32:32 I mean, that and some of the Google products, and like I said, Amazon has them as well. I wonder, let me see: AWS equivalent of Cosmos DB. What have they got? Let's see. Oh, here it is. They're saying DynamoDB. I don't find that to be right; unless they've really extended what DynamoDB is, that doesn't sound right. So there's a drawing from Psychic Learn called "Choosing the right estimator," and it asks you, okay, well, how much data are

Starting point is 01:33:06 you going to have? What are you trying to do? Are you trying to categorize this thing? Are you trying to label it? And it'll send you down a path of, here's a group of machine learning algorithms that might be appropriate for the type of job you're trying to do, and you go from there. Oh, that's beautiful. Yeah, a nice little path. It reminds me of a zoo map. Yeah, walking through it literally, like at the mall: start here, you are here. Yes. Oh, this is beautiful. Yeah, definitely check that out. And also, by the way, since we're talking about vector databases and how closely they tie into machine learning, Psychic Learn actually has a really good starting point for machine learning type

Starting point is 01:34:00 knowledge, you know, algorithms and whatnot. And it's pretty friendly to use. Hey, also, it sounds like he's saying "psychic learn," to me anyways. It's like "science kit learn," is what it is.

Starting point is 01:34:17 So S-C-I-K-I-T dash learn.org, for anybody that doesn't go to the show notes. So we'll just chalk that up as another proper noun that I can't say. It might also be that I can't hear well because my whole head's congested; it's some combination of those two. All right, all right. Well, yeah, now I know all about vector databases, and we only did three this time, and we're still at an hour and 40. Oh, wait, were we supposed to hit the record button? Oh, good God.

Starting point is 01:34:46 Awkward time. Can we start over? So with that, like I said, as we've been talking,

Starting point is 01:34:55 I've been adding a bunch of links to relevant parts of the things we've discussed. So definitely check out the Resources We Like section of the show notes; there are going to be a bunch of links in this episode. And with that, we head into Alan's favorite portion of the show: it's the Tip of the Week. All right, I've actually got some juicy ones. So, you know how when you're sick and you say juicy, that didn't sound right? That's about right. I hear your voice and I'm like, uh, no. It's very, very juicy. Here, hold this. It's juicy. I used the word phlegm on the show before and I got roasted for it. As well you should. Yeah, that's phlegm, phlegm. All right, so

Starting point is 01:35:37 we won't revisit that too much. Or maybe we should. Yeah. So I thought it couldn't get any worse. Right. So check this out. I don't know how you feel about Medium in general, like medium.com. I get some of their stuff, and I like some of it,

Starting point is 01:35:54 but some of it, I'm just like, man, this is just people writing things that are rehashes of everybody else's. Well, I got an article the other day that I thought was actually really good. And,

Starting point is 01:36:04 and I think, Outlaw, you'll like this one a lot. Docker has built some AI, we'll call it, into their stuff, so that you can use docker init on the CLI. What this allows you to do is write your application, whatever it is, right? Put it in Java, put it in Python, JavaScript, whatever you want to do. And then you go into the directory where you've done all that and type in docker init. And it will generate a very good Dockerfile for you, probably better than what you would have written from scratch, including some things that you wouldn't have thought about. And then you can just go in and tweak this thing

Starting point is 01:36:49 to get it to exactly what you want. But the goal is that it's basically doing most of the hard work for you. So that's pretty sweet. I was impressed by this. Yeah, I dig it. Have you actually tried it in the real world? Because they're giving a Python example in this document. How well does that translate to Kotlin? I have not yet. But, like... Are you using Maven and a POM? No? You're on your own. Initialization done. No, I do plan to use this. I absolutely do plan to use it. But I mean, what it puts out is pretty sweet. And this person's whole point was that they hate writing Dockerfiles, right? I don't know that I hate writing Dockerfiles, but I'd love it if it just added best

Starting point is 01:37:38 practices to it, right? You know, like any kind of template type thing for starting up an application, right? I remember the first time I went to do React, I used whatever their starter thing was, so it would generate templates that made sense. I love the fact that this bakes in best practices. Well, yeah, I mean, even with Angular, you can stub out a new controller or whatever, a new page, and it'll include unit tests and the service and all that kind of stuff for you. So I'm all on board with that, but I heard you say that you hate writing Dockerfiles, or the author does,

Starting point is 01:38:20 and I'm thinking to myself, man, am I sick? Because I actually like doing that. Yeah, it's an optimization thing. No, no. No, I think we probably need to see the same doctor because I like actually trying to figure out, hey, how can we make these things work more efficiently? And how can we bundle these things in a way that keeps these images as small as possible? It's an optimization thing that I just really like. Yeah.

Starting point is 01:38:46 I will take every Docker problem we ever have, because I always enjoy it. I never walk away going, oh, I hated working on that thing. Like,

Starting point is 01:38:56 I don't know why. Like I said, even with Git problems, when people... maybe that's why. I'm just like, no, I actually like it,

Starting point is 01:39:04 you know, when people have these kinds of problems, and I'm like, oh, let me see what we can do here. Yeah. And yeah, this author and I would not agree on that. I think Outlaw's a little bit sick, but that's fine. That's totally fine. All right. So the next tip was... this one made me so happy.

Starting point is 01:39:24 Okay. We've mentioned this site before, I know we have: epochconverter.com. I use this site so much. I feel like I should click on a PayPal button somewhere and give them, like, five bucks or something. So here's what I found out. It's fantastic, right? You can drop in an epoch. And if you don't know what epochs are, then you need to go back and listen to our "dating is hard" episodes. But basically, any time you're dealing with time zones or times around the world, the simple answer is an epoch. I believe it's the number of seconds since January 1st, 1970, is that right? I think so, yeah, something like that. Yeah, January 1st, 1970. So yeah, but people have made

Starting point is 01:40:16 it very hard over time besides just that, right? Because sometimes they'll do it in milliseconds, sometimes in nanoseconds. There are all kinds of ways to screw it up and make it really difficult on everybody. So at any rate, they have some useful tools on the page where you can just drop in milliseconds, nanoseconds, or seconds, and it'll convert it to a timestamp for you. You just hit the button and it'll give you the exact date. It'll give it to you in your time zone. It'll give it to you in GMT. It'll tell you how long ago that time was relative to now. It's just so good. You can also... go ahead. You were going to say? No, go ahead and finish. Okay. So you could also plug in your month, day, year, all that kind of stuff, and it'll give

Starting point is 01:40:56 it back to you as an epoch, even epoch milliseconds or whatever. It's so good. What I failed to realize, and I seriously was like, oh my goodness, how did I never see this? If you scroll down the page... and this is the problem: everything they have is sort of above the fold, and it's so good. Most of what you need, as just an "oh, I need this tool," is up there at the top. Well, if you start scrolling down the page, they give you all the information. What is epoch time? They tell you exactly what I mentioned, only better. And then further down: how do I get the current epoch time in PHP, Python, Perl, Ruby, Java, C#, Objective-C, C++11, Lua, VB, et cetera. They have the samples right there. I can't tell you, Outlaw, how many times I've googled

Starting point is 01:41:49 how do I get epoch in Python, or how do I get epoch in, you know, whatever, and you end up on 12 different Stack Overflow posts where people are doing things that make you go, that's wrong. Yep. They have it right here. Keep scrolling and it's got how to convert from human-readable to epoch and how to convert epoch to human-readable, in all the same languages. Yeah. I use this site a lot too, and I'm surprised. I went back and checked, and

Starting point is 01:42:17 it's never come up before, and yes, I have used this site... I can't tell you. I agree with you. There should be a "hey, let me buy you a cup of coffee" button. Countless times, you know, you're looking at some kind of a log and everything's in epoch or something, or your

Starting point is 01:42:38 create timestamps are in epoch or something, and you're like, okay, I don't read epoch. What time was that? Can you tell me in my own time zone? Was that yesterday? I just created something; is this the thing I just created, or am I looking at an old version of it? I absolutely love this site. I can't believe this is the first time in over 10 years that we've brought this thing up, and I've used it daily for five years at least, maybe more. I mean, it's ridiculous how much I use this website. They probably have my IP pinned somewhere. Yeah, you probably have your own dedicated node, because they're like, okay, any time this IP address comes in, send him over here. Right, right, we know this guy. He's good people.
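The conversions the site does boil down to arithmetic on the Unix epoch (seconds since midnight UTC on January 1st, 1970), plus the nuisance of figuring out whether a value is in seconds, milliseconds, microseconds, or nanoseconds. Below is a minimal Python sketch of both directions using only the standard library. The function names are hypothetical, and the digit-count heuristic in `guess_unit` is our own assumption, not necessarily how epochconverter.com decides. Note that `datetime` only resolves down to microseconds, so nanosecond input gets truncated.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The Unix epoch: midnight UTC on January 1st, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# How many units of each kind make up one second.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def guess_unit(value: int) -> str:
    """Guess an epoch value's unit from its digit count (a heuristic assumption).

    Present-day timestamps are about 10 digits in seconds, 13 in milliseconds,
    16 in microseconds, and 19 in nanoseconds.
    """
    digits = len(str(abs(value)))
    if digits <= 11:
        return "s"
    if digits <= 14:
        return "ms"
    if digits <= 17:
        return "us"
    return "ns"


def epoch_to_utc(value: int, unit: Optional[str] = None) -> datetime:
    """Epoch -> human readable, as a timezone-aware UTC datetime."""
    per_second = UNITS_PER_SECOND[unit or guess_unit(value)]
    seconds, remainder = divmod(value, per_second)
    # datetime stops at microseconds, so anything finer is truncated.
    micros = remainder * 1_000_000 // per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


def utc_to_epoch(dt: datetime, unit: str = "s") -> int:
    """Human readable -> epoch, as an integer in the requested unit."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    total_micros = (dt - EPOCH) // timedelta(microseconds=1)
    return total_micros * UNITS_PER_SECOND[unit] // 1_000_000


if __name__ == "__main__":
    stamp = 1_700_000_000_000  # 13 digits, so guessed as milliseconds
    print(epoch_to_utc(stamp))               # 2023-11-14 22:13:20+00:00
    print(epoch_to_utc(stamp).astimezone())  # same instant, in your local time zone
```

Insisting on timezone-aware datetimes in `utc_to_epoch` is a deliberate choice: a naive datetime is exactly the "was that yesterday, or is that my time zone?" confusion described above.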

Starting point is 01:43:25 Yeah. So, yes. So, again, link in the show notes. Highly recommend epochconverter.com. It's fantastic. All righty. Well, for mine, I'm going to continue on from the last tip I gave, which was about setting up a legacy contact for your Apple ID. And I forgot to mention at the time that you can also set up an account recovery contact.

Starting point is 01:44:07 And there is a big difference in why you might want one versus the other. So with the legacy contact, let's say that you're someone's legacy contact, and they pass, and you then want to say to Apple, hey, we need to get into their account, for whatever your reasons are. It doesn't matter. I don't know their password, but I need to get into their account. So you can go down that route, and Apple will eventually give you access to the account. But the downside of going the legacy contact path is that once you go down that path, any payment mechanisms on their devices will be lost. You won't be able to use those. That one's not the big one, though. The big one to worry about is the keychain: everything stored in the Apple keychain will be gone.

Starting point is 01:45:00 You won't gain access to that. And where that's a problem is that Apple introduced the ability to use the keychain as a password manager, right? A password manager that syncs across devices. So if you have a family member who, instead of using a LastPass or a Bitwarden or whatever, is using the built-in Keychain functionality in iOS, you would lose the passwords to all of the accounts they might have, which could be financial, right? Bank accounts, investment accounts, whatever. So that's the downside there. And especially if you're doing this because they haven't passed, but for whatever reason they either forgot or they don't know or whatever... the point is that there is an expectation that they're eventually going to be able to use this thing again, right? If you go down that legacy

Starting point is 01:46:08 contact path, you would be losing all of that stuff from the keychain, which is important. Well, that's good information. Also worth noting that the keychain syncs across macOS too, so it's not just iOS, if you're using the same Apple login across devices. Yeah, your Apple ID. Yeah, if you use your Apple ID, then that keychain is available across everything. So if they're doing stuff on computers, you'd lose that too. Now, the way account recovery can work, the expectation is, well, first of all, this needs to be someone you trust, right? Because you need to have some trust that their device can't get lost out in the wild, or their credentials

Starting point is 01:47:03 get lost out in the wild, and then someone else is like, oh, hey, let me reset your account. But the expectation is, let's say, Allen, that I'm an account recovery contact for you. Then the idea is that Allen would say, hey, I don't know my Apple ID password anymore. I need to reset it. Outlaw, can you send me the recovery code? And then I would be able to do that for you. And the hope there is that that's a much faster process. I had to go through another process, and I gained a lot of respect for it, because I was like, oh, this is actually super cool the way they do this. So the account recovery process, like I said, would be super fast, right? Allen forgot his password, and, you know, he asks the account recovery contact,

Starting point is 01:47:58 hey, I'm going to reset my password and I need you to send me the recovery code. And that's why it's a trust factor: you need to make sure that that person is not going to lose their account, right? There is another downside to that, though, because there is a waiting period that happens with it, and I'm not sure... So in my case, I had this happen with a family member where we didn't have a legacy contact or an account recovery contact. And there's another process you can go through where they have a waiting period. The waiting period exists for account recovery too, but I'm not sure if it's the same length of time. There's an Apple Support app, and I can download it and say, hey, I'm trying to help someone else out. And you can even do this at the Apple Store. I'm trying

Starting point is 01:48:57 to help someone else out; let me reset their password. And what will happen is, from the moment you initiate that process, they will start a waiting period, right? In my case, because we didn't have any of this set up, it was a 20-day waiting period. I'm not sure if it's the same when you're using account recovery, so just a little bit of a disclosure there. But the way that waiting period works is, in the case where I'm trying to help Allen by resetting Allen's password, if at any point during that 20-day waiting period Allen's account successfully authenticates to Apple, then that's it. The waiting period is done. Not that the waiting period starts over; I mean the waiting period is, poof, gone. You can't do anything. We're done with the waiting period. We don't care about the waiting period anymore. And so what that helps prevent is,

Starting point is 01:50:05 let's say that I'm a malicious user and I'm trying to reset Allen's password, and I'm like, hey, let's get into this recovery situation. But Allen knows his password. He's like, no, no, no, let me authenticate right here, right? And the 20 days are gone. So when you say the 20 days are gone, not only are they gone, it cancels your ability to even try to get in at that point? Right, right. The process is stopped. The process is over.

Starting point is 01:50:32 It's complete. But if there's never a successful authentication to Apple during that 20-day period, then, in the example I gave where I'm doing this on Allen's behalf, at the end of that 20th day I will get a notification, as will Allen's Apple ID, that says, hey, your waiting period is up. You can go reset your password here.
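The rules described here (a waiting period that was 20 days in this particular case, cancellation the moment the real owner successfully signs in rather than a restart of the clock, and a context-free code sent to the trusted helper only once the window fully elapses) amount to a small state machine. This is a minimal, hypothetical Python sketch of that behavior as told on the show, not Apple's implementation; the class, method names, and six-digit code format are all assumptions.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RecoveryRequest:
    """Hypothetical model of the waiting-period rules as described on the show.

    This is NOT Apple's implementation; names and the code format are made up.
    """

    started_at: datetime
    # 20 days is what applied in the case described; other cases may differ.
    waiting_period: timedelta = timedelta(days=20)
    cancelled: bool = False

    @property
    def deadline(self) -> datetime:
        return self.started_at + self.waiting_period

    def owner_authenticated(self, at: datetime) -> None:
        """A successful sign-in by the owner during the window ends the request.

        It does not restart the clock: the request is simply over.
        """
        if self.started_at <= at < self.deadline:
            self.cancelled = True

    def status(self, now: datetime) -> str:
        if self.cancelled:
            return "cancelled"
        if now < self.deadline:
            return "waiting"
        return "ready"

    def issue_reset_code(self, now: datetime) -> Optional[str]:
        """Only after the full window elapses does the trusted helper get a code.

        The notification carries no context; the helper relays the code to the
        owner only because they already know whose account it is.
        """
        if self.status(now) != "ready":
            return None
        return f"{secrets.randbelow(10**6):06d}"  # hypothetical six-digit code


if __name__ == "__main__":
    start = datetime(2024, 2, 1)
    attempt = RecoveryRequest(start)
    attempt.owner_authenticated(start + timedelta(days=3))  # the real owner signs in
    print(attempt.status(start + timedelta(days=25)))       # cancelled
```

Modeling cancellation as a terminal flag, rather than resetting `started_at`, mirrors the point made above: a malicious reset attempt doesn't get to try again just by waiting.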

Starting point is 01:51:06 And then what would happen is, when Allen goes to reset his password, whoever helped him out, which in this case was me, would get a code that I would then give back to Allen to say, here's the code to use to reset your password. And then you go through. The problem is, if your Apple ID is your iCloud email account, then you would never see that email saying, hey, you can reset your password, because you can't get into it. That's the whole problem, right? But I would get that notification, right? Because when you go through that process, you say, hey, we're going to use Michael, I trust him, so send it to him. And I would get a notification. A recovery code? Yeah, I would get a random, you know, thing like, oh, hey... and it'd be kind of cryptic, because

Starting point is 01:52:04 all you're going to get at the time is that you can now reset the password, right? It's not going to say, hey, for this specific case number, here's the code to do it. None of that's going to happen. It just says you can go do it, and because I specifically know that it's Allen we're talking about in this case, I can go do that, right? But think about it: there's no context. They don't want an information disclosure. Yeah, right. Exactly. From an information disclosure point of view, there is no information given to you for context, and because I do know what's happening, that's the only reason I would be able to do that, right? So it's super cool. Yeah. It's awesome.

Starting point is 01:52:45 I was so impressed and happy. Now, I'm not gonna lie, going through those 20 days is a bit stressful, right? You know, because

Starting point is 01:52:54 think about every account that you or a family member might have where the password is stored on the device, in Keychain, for example, or even in a LastPass or whatever. If you don't have access to the device, you might not even know what they were using as credentials to get into the LastPass or whatever, and they might not remember, especially if they're using one to store the other, because people sometimes do that too. But even from a two-factor point of view, right? Apple introduced two-factor authentication to iOS many versions ago now, right? But you're not going to get any of those two-factor authentication codes anymore.

Starting point is 01:53:54 And then think about all the two-factor codes that come as text messages, or automated robocalls telling you, "your authorization code is one, two, three, four, five, six." "Damn, that's the same as my luggage." You know, something like that, right? You could lose all of that. And furthermore, in this specific case, because the ID was entered incorrectly so many times, the phone itself was completely bricked, to where you can't do anything with it. The phone will bring up a message even on a reboot.

Starting point is 01:54:39 It doesn't matter. It'll bring up a message that says something to the effect of... I'll see if I can find the exact wording, but like, "this iPhone is unsupported." And it'll only let you make emergency phone calls or reset the phone. But to reset it, in other words,

Starting point is 01:54:59 to wipe it and reinstall everything, you have to know the Apple ID password. So you're in this catch-22 where you have a device that you can't use. You can hear it receive text messages, but you can't see them, and it won't even ring for a phone call. So you can't even receive a phone call, and like I said, some of those two-factor authentications call the number instead of texting it,

Starting point is 01:55:29 but you can't do any of that. So you can really get into a bad situation. But the point is, you can short-circuit a lot of this and make your life a lot easier by setting up account recovery contacts. And while you're at it, you can set up legacy contacts. But like I said, especially for account recovery, it should definitely be trusted individuals, people you're not concerned about losing access to their device or their credentials, with all the problems that could come with that.

Starting point is 01:55:58 So what he's talking about is... and I mean, I don't know the age of everybody listening to the show, but, you know, this is basically along the lines of... something, right? Or vice versa, if it's your parents, right? This is not a pleasant topic for anybody to ever talk about. But doing this planning up front, before anything happens, is way less painful than having to deal with it when you don't have access to anything, right? When you don't know how people pay their bills, or how they logged into things, or whatever. It's probably a conversation worth having with whoever's important to you, to try and nail some of this stuff down. Yeah. Well, along those lines too, I think we may have talked about this, I don't recall. Did we talk about that? Like,

Starting point is 01:57:11 LastPass has this kind of ability, they call it emergency access. If I recall, the way the LastPass version works is that I can say, hey, Allen can have emergency access to my account, but I can establish what the waiting period is. So I can say that Allen can request emergency access to my LastPass account, but he has to wait, let's say, three days. And during that three-day window, I can be

Starting point is 01:57:40 like, no, don't give him that access. Or if I've passed, for example, or I'm in a coma or whatever, then obviously I'm not going to be able to respond to that, and after that waiting period has expired, Allen would be able to get access to it, right? And I would have to assume that competing products like 1Password or Bitwarden have a similar concept. But yeah, the point, to Allen's point, is that regardless of your age, whichever end of the spectrum you're on, either you're trying to set this stuff up so that people have access to your stuff when you pass, or you need

Starting point is 01:58:24 to set it up so that you have access when loved ones pass, or whatnot. So yeah, it's just lessons learned from going through these sucky situations. But especially with technology nowadays, making sure that you have access to your loved ones' stuff, or they have access to yours, in case something happens is... I mean, not pleasant, but probably worth spending some time on. Well, if we're going to get into Boomer Hour for just one moment, let's do it. Let's also think about this, though, because I don't know about you, but we try to be as paperless as possible, right?

Starting point is 01:59:07 And, you know, imagine if you did not have access to any computing device. You've forgotten, you've lost everything: your computer, any kind of tablet, any kind of phone. You don't have access to it, period, right? And now someone else in the family is like, well, I don't even know what you have, right? There's nothing there. I don't know what bills you have, what bills might be coming in. How would I even be able to get into it to know, right? So yeah, it can be eye-opening when you have to go through it. Yeah, for sure. So save yourself the time and do this ahead of time. Hey, see, this is how we're in Boomer Hour. Or Boomer After Hour. Yeah. Hold on. So my wife,

Starting point is 02:00:01 the other day... so I've been sick all week, so I've been self-quarantining, right? Yeah. It's just the sexy Allen voice today. Oh, that's right. I do that sometimes just for fun. Sexy Allen, the great. So I've been holed up in my office all week, right?

Starting point is 02:00:19 And my wife was like, man, I don't feel like making dinner tonight. She was going to order pizza. Do you remember what it costs to order two large pizzas from, like, Papa John's? We're really going down Boomer Hour. Okay, no, I'm legit. Okay, yeah, yeah, yeah. Right, so what do you remember spending on two large pizzas from Papa John's or Domino's or Pizza Hut? I mean, what's a reasonable figure for you? Huh. I mean, I can remember when Domino's used to have the special $9.99 pizzas, right? Right, right. But, you know, that might date me, and I don't want to say that out loud.

Starting point is 02:00:57 It might be too late. That was years ago. So my wife... two large pizzas from Papa John's were, like, 65 bucks. She was like, no, thank you. So then I guess she was like,

Starting point is 02:01:12 you know what? This is ridiculous. I'm going to call up Pizza Hut, Domino's. They were all pushing 60 bucks. We're talking about Domino's and Papa John's, man. We're not talking... Yeah. I mean, look, it's totally fine pizza. I like Papa John's.

Starting point is 02:01:28 It's just fine, right? I like most pizza. But we're not talking Mellow Mushroom with their Holy Shiitakes and all this special stuff, right? We're talking about freaking bread with cheese on it. I'm trying to figure out how you got to $60 for two pizzas. I'm trying to go through this now. So here's the crazy part. Order online right now.

Starting point is 02:01:47 Domino's. Part of it was that she was going to have it delivered, and the delivery was basically going to be $20. There was a delivery fee plus a mandatory tip and whatever, and it was like $20. So the total was like $45 for two pizzas plus $20 for delivery, or 40 plus... you know, 50. And she was like, I can't. I mean, dude, I'm with you. Okay, let's even adjust for inflation over the past few years. I'm still thinking, okay, maybe $35, you know,

Starting point is 02:02:21 and then maybe a $10 delivery tip or something. Fine. But no, these things were bumping into 60, up to almost 70 bucks for two large pizzas, and she's like, I'm not doing it. So she got a freezer pizza and threw it in the oven, right? She's like, I'm just not... I can't. So yeah, I can't go down that path with Domino's, because they want you to either sign in or give them the address, and I'm like... there's another Boomer Hour thing: freaking give me the stuff on the website, man. Don't make me sign in to do everything in this world. I'm so tired of all that. Yeah. Wow, Boomer Hour really got serious there for a minute. I thought we were jokingly referring to Boomer Hour, and then we really got into complaining about the prices of stuff. So yeah, I was... ha. She

Starting point is 02:03:06 was mad. She was so mad that she skipped buying pizza from one of those places. So yeah. All righty. Well, for this and more Boomer Hour, subscribe to us on iTunes, Spotify, whatever. We're done. Goodbye. That's right. Hey, no, go to Slack. For real. Create stuff on Slack. Go to Slack: codingblocks.net/slack. You could have let it go. I was like, I'm going to tease him here and see what happens. And nope.

Starting point is 02:03:33 All right. We're done.
