Coding Blocks - Overview of Object Oriented, Wide Column, and Vector Databases
Episode Date: February 19, 2024
We have a different combination of the hosts for this episode where we continue the series on the types of database systems available and why you might choose one over another. Michael continues impressing by recalling everything we’ve ever said on our 500+ hours of podcasts, Allen enjoys learning about a database system he’d never […]
Transcript
We now interrupt your regularly scheduled programming to bring you Coding Blocks!
Guess who's back?
Back again.
Michael's back.
My friend.
That's awesome.
Guess who's back?
Guess who's back?
All right.
So, hey, how you doing?
What are you guys talking about?
We're going to talk about some more database stuff.
Hey, we're missing somebody.
What's going on?
What happened?
You know what?
We had a call-in friend just a little while ago.
When we started the recording, we sent out the invite thing,
and he showed up, and he's in a moving van.
Oh, yeah.
Yeah, so JZ's trying to use moving as an excuse, right?
Like, what's up with that?
Like, he couldn't have recorded from the van.
I move all the time from one room to the next.
Like, I still get on the call.
That's awesome.
Yeah, but no, he'll be up here pretty soon in Georgia.
Is he going to regret that, Georgia?
I don't know.
I don't know. I don't know. We're known for our beaches and our tourism, and it's a very big vacation destination.
Yeah, but it's also a very hot destination.
Did you think I was being serious with anything I just said?
A little bit.
A little bit.
Okay, well, we're going to have to work on your sarcasm meter.
Well, part of it, too, is my wife came in and needed something right in the middle of all that.
And so my brain was split two or three different ways, and it's not working very well anyways this week.
So you're throwing her under the bus on something that's going to be recorded
and distributed across the world.
Like we're international.
So this is a big deal, obviously.
That's right. That's right. Yes. Um, maybe she won't listen.
And shortly after Valentine's, so, you know.
Yeah, through which I was sick.
Now we know why.
Yeah, right. Uh, I've got to make that up this weekend, by the way. That's a good reminder there. Thanks, LL. You just saved my tail.
It doesn't sound like you're ready for it.
I'm not. I'm totally not. So yes, this episode, we're on 228. And last episode, we talked about
six different types of databases and when you might use one over the other and why you shouldn't
just use one for all of them and all that.
That sounds like a short episode.
Six different databases.
It was funny.
When I put it together, I was like,
maybe this won't be terribly long, and then we went over two hours.
And so we scaled back on this one because my voice isn't going to last two hours one way or the other,
even if I only talk 10% of it.
So, um, yeah, we're going to pick this back up. And again, I want to give Brantley credit in Slack,
because he's the one that was like, hey, maybe you guys could do an episode like this. We were
like, oh, that sounds great. So with that, I'm Allen Underwood.
Wait, wait, wait. I thought it was Allen the Great. That's what you said last time, when you were like, you're going to change your name.
That was your whole person table and you're changing your name.
Yeah.
Somebody listens.
Somebody listens.
So really say that.
Yeah.
Well, Allen the Great it is.
I mean, I'm good with that.
Here we go.
You don't even remember what you said.
No, man.
I told you my brain is not working.
Okay.
Well, you realize this is a podcast about coding, right?
Okay.
I've heard.
I've heard.
So let's talk about cars.
JZ isn't here to stop us.
All right.
Well, I'm Michael.
What's your title?
What's your title?
Just Michael?
Michael Outlaw.
Yes.
Thank you.
Thank you for recognizing.
All right.
All right.
There we go.
All right, so first up, we need Outlaw back here to do this portion for us.
And I heard how, like, you know, JZ was making fun of me.
I didn't think he was making fun of me.
He was struggling.
The proper nouns, like iTunes and Spotify, like, the well-known ones.
And instead, you know, getting all the made-up handles that people would use,
like, those parts he nailed. But iTunes?
Yeah, I heard that. I heard that. All right, so, uh, from...
Difficult, man.
Yeah, well, there's one in here that's definitely going to get me for sure. So, uh, from iTunes, Calum 55555, thank you. Uh, from Audible, wood to prog.
From Spotify, we have ian ghost merc, and if you haven't heard your name yet, you know I'm talking about you.
Really?
Zirath?
No, how would a name that starts with an X be pronounced?
Xavier.
So I guess you'd just say X, right?
No, you'd say Zuh.
But why would you say Zai?
There's no I there.
Or why?
What did you say?
Zirath? Zureith? Zureth? I'm sorry. Okay, so it's got to be one of those five things I've said, but it's probably not. And that's because we all know that proper names are my kryptonite.
And yeah, so we found it. That's amazing.
Hey, and Calum, sorry if you threw out your back, um, listening to an episode.
Oh, yeah. Just, you know, disclaimer: we're not liable for any injuries that occur while listening to this podcast.
So, yes. Do we need, like, a whole, uh... well, I guess this is only a thing in the U.S., from other things that I've read before, where, like, only in the U.S. do they advertise medications, but there'll be, like, a big legalese block at the end of the commercial for, like, all the side effects.
Maybe we need to have, like, a lawyer read all the things that Coding Blocks is not responsible for, and the side effects of Coding Blocks.
And yeah, they can be talking about zit medication on a commercial, and they're like, may cause heart failure and kidney disease. And, oh, maybe I'll keep those.
I think it was, I want to say it was, like, a Reddit thing that I read. Reddit, where I read it. I read it. Good. Um, maybe that's why they named it that. Um, where they were talking
about things that,
you know,
like,
like people who weren't from America that would come visit America and things
that they found surprising to them.
And one of them was seeing manufacturers of different medications,
advertise the medication on TV.
Like there'd be,
you know,
commercials promoting its use.
And yeah,
so I guess, like, some of the, like, Saturday Night Live type skits,
do you remember those from back in like,
I don't know,
10,
20 years ago where they would have like the,
the side effects,
you know,
the,
the,
the people in the commercial would start listening to the side effects and
they're like,
Hey,
wait a minute,
you know,
like, may cause a desire to kill your business partner.
And the other person's like,
whoa, wait a minute.
Hey, let's not take that.
All right, so what are we talking about tonight then?
Oh, wait, we forgot.
Yeah, I'm sorry.
We have one more thing.
So we've been selling this for JZ
because he had planned on being there.
There's Orlando Code Camp coming up February 24th, so right after this episode drops. Again, if you're in the area, I mean, the three of us have been to it before. It is a terrific event that they set up there. So, you know, if you're around there, like within an hour's drive, I'd say go check it out. I mean, you'll have a good time, you'll meet people, and you'll learn some stuff. So, uh, and we'll probably have a link in the show notes for that.
All right, but now back to where...
Right, so once again, two-thirds of the hosts are here to do the show, and let's talk about blah, blah, blah.
Yes, let's do that. All right, so I'm going to let Outlaw pick up this first one, and we'll chew on this. So again, we're talking about different types of database engines, and our reference is db-engines.com. And we've used this for years, right? Like, we've talked about different databases and whatnot for years in here. So, um, oh, and I didn't number these things, so I'm going to let Outlaw take this first one and we'll chat about this one, because this one's interesting and different.
So this should be near and dear to our heart, you would think, right? Like, just from the name of it: object-oriented database systems. You would think, like, oh yeah, those would be wildly popular and among our favorites. Now, I ask you, before this, could you have named one?
No, not a single one of these.
I would have said, and I'm surprised it's not on the list.
Actually, I'm going to go to DB-Engines and see if it even...
well, first of all, it'd be funny if it's not even in there at all.
But I would have thought that maybe Lotus Notes,
if you remember that one from way back in the day, maybe counted.
Really? Well, that's what I was trying to think. I don't see it in here, though, by the way.
Let's see. I think it was more like an Access type thing. I think it was a true relational...
No, you definitely had objects, and you could, like, write programs. You know, you could definitely have code inside of things.
I don't know what it counted as, but that's what I would have thought.
But it's not in this list, according to the DB-Engines ranking.
I mean, it's been a long time.
Let's see.
It is considered a semi-structured NoSQL database.
Okay.
So, yeah, not an object.
So, honestly.
That's crazy, really.
So, did we say this is object-oriented database management systems? Because this one, when I first heard it, I was like, well, how's this different than, like, NoSQL, like, object databases, right? Like Mongo and those kind of things.
And that's what, kind of, like, when I saw the list of the databases they have here,
that's why I was like, I've never heard of any of these. Um, I want to say,
wasn't Neo4j one at one point in time?
Neo... um, I don't know the answer to that. I was going to see,
like, I wasn't sure... did you already have... I'm sorry. Um, I was curious about the object-oriented
versus the document database
to see what's considered to be the difference.
Well, that's what I was going to say.
That's why I was thinking, is Mongo not?
No, it's not.
So here's what it boils down to.
When they say that they store the data in the database
the way that it's modeled in the application, it doesn't mean they're storing like a, they're not storing like a JSON document, right? It's
actually storing things, I guess, like binary type things in there in whatever format. So if
you have arrays and collections and all that kind of stuff, that's what it's actually doing behind the scenes.
So it's not the same idea as taking whatever your model is and then crushing it into a single document.
It's actually storing it in that structural type format is, I believe, the big difference between the two.
Yeah, there's a not incredibly accepted answer on Stack Overflow.
It only has like 10 upvotes, but it was saying that the difference here is that
it's the actual objects that are stored and not the JSON, as you were describing.
Right.
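To make that distinction concrete, here is a minimal Python sketch. It is only an illustration: `ToyObjectStore`, `Person`, and `Address` are hypothetical names made up for this example, and an in-memory dictionary of copied objects is a stand-in, not how InterSystems IRIS, db4o, or any real object database stores data. The document path flattens the object to JSON and gets plain dicts back; the object path hands back the same kind of object that went in.

```python
import copy
import json


class Address:
    def __init__(self, city, state):
        self.city = city
        self.state = state


class Person:
    def __init__(self, name, addresses):
        self.name = name
        self.addresses = addresses

    def primary_city(self):
        return self.addresses[0].city


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object database."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def save(self, obj):
        object_id = self._next_id  # IDs are assigned by the store
        self._next_id += 1
        self._objects[object_id] = copy.deepcopy(obj)
        return object_id

    def load(self, object_id):
        return copy.deepcopy(self._objects[object_id])


person = Person("Ada", [Address("Atlanta", "GA")])

# Document-style: the application flattens the graph to JSON itself.
document = json.dumps({
    "name": person.name,
    "addresses": [{"city": a.city, "state": a.state} for a in person.addresses],
})
from_document = json.loads(document)  # comes back as dicts and lists

# Object-style: the graph goes in and comes out as Person/Address instances.
store = ToyObjectStore()
person_id = store.save(person)
from_store = store.load(person_id)

print(type(from_document).__name__)  # dict
print(type(from_store).__name__)     # Person
print(from_store.primary_city())     # Atlanta
```

With the document version, the application would still have to rebuild a `Person` from the dictionary before it could call `primary_city()`.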
So what's interesting, and we'll skip down here a little bit... so first off, let's mention
the systems, right? Because, again, we'd never heard of them. Um, InterSystems Caché was one; that's not
even listed on their ranking page. Um, InterSystems IRIS, that was number 92 on the list. Uh, db4o, that's ranked number 161 in the list. Uh, ObjectStore was 154,
and Actian was 159 on the list. Never heard of a single one of those before doing this.
Um, but if we jump down...
Hold on. I mean, while you're doing that, though, like, I gotta imagine that
if you're storing the actual object that your code is using, like, that's got to be a big part
of the reason why these aren't as widely popular. Because that's going to, I would imagine, and
I'm coming into this, you know, completely naive, so I will, you know, accept that... that's probably normally the case
when you listen to these shows, but, for all of us. But, um, you know, I would imagine that that's
going to limit your ability to, uh, iterate on your application. But I don't know, maybe there's
like an Avro type schema, something like that, in the background.
It's like, well, here's the version of that object.
So you could load it up or, you know, still use it. I don't know.
It just seemed, it just seems like it would, it, it almost feels like,
you know how there's that.
We've talked about this before about like the whole separation of concerns
kind of concepts, right? Like, Uncle Bob has preached that, you know,
endlessly for decades now and has several books about it, right? And he's not the only one, right? Like, there's a
whole plethora of books out there that are, you know, describing those types of things.
And this almost feels like you're not. Like, this almost feels worse.
Yeah. Yeah. It's interesting. I mean, the reason why... like, if you
go specifically to the InterSystems IRIS page, you know, they basically are just talking about
how it's very performant and, you know, you could do a lot of things with it. And sure, I totally get
that, right? Like, if you're storing things natively, the way that it's coming out of your app, yeah, I
mean, it'd be really fast.
Now, that's if you're going directly to that stuff. I don't know how it works querying across
objects and all that kind of stuff, right? Like, I have a feeling it's fairly
complicated.
But I kind of envision it as, like, LINQ queries, though. Like, in terms of efficiency, like, you know, you already have this object available,
and so I don't know that it wouldn't necessarily be performant.
I mean, if you're trying to query something that's buried three levels deep in some sort of
object structure, and it has to do that across all of them... you know, that's why I'm saying, like, I
don't know how. In a database, I know how you solve that, right? Like, you have indexes and all that kind of stuff. I don't...
I mean, maybe it's that way here too. But one of the things that's interesting from this InterSystems
IRIS page: so ObjectScript and Python directly manipulate and read from the storage. So, like, direct access type stuff. Objects can also be
exposed in other languages like .NET, JavaScript, Java, and C++. And then they say, on top
of it, you can also use a SQL syntax with it. Um, and they have JDBC drivers and that kind of stuff.
So again, it's really interesting. I think what Outlaw said from that Stack
Overflow answer is really the big difference, right? Like, it is actually storing those objects
directly instead of translating them or marshalling them into some sort of document or something.
And there might be some performance benefits to it, but it
definitely has not caught on like almost all the other database types out there.
Well, I kind of view this like, um, I could be wrong, but I kind of think object-oriented databases
are going to be, like, super, uh, purpose... like, they're going to have a super specific niche of problems
that they're going to go after and solve.
They're not necessarily going to be your hammer database type
that you're going to use for like most things, right?
Yeah, I would agree with that.
I mean, on the DB-Engines ranking page,
they actually had a note towards the bottom.
If you go to the DB-Engines site that we
mentioned and you go to the encyclopedia and you click on object-oriented database, they sort of
have a notes page there, and at the very bottom of it they basically say that
these object-oriented database systems have sort of fallen out of popularity because of
the advent of ORMs and how good
they've gotten over the years, right? Like, you know, Entity Framework, Hibernate, whatever.
So that might be the reason why they haven't caught on as much. I don't know,
but I would agree, though, with what Outlaw just said: if you're going to use this,
you have a very specific use case in mind, right?
Like that, I don't see anybody generally going out of their way
and being like, oh, let's create this thing,
and we're starting with this, right?
I wonder, you know what would be a good one to find out.
Let's see.
Let's go back to the interwebs.
If I could spell correctly, what are we talking about?
Pros and cons.
I'm curious what I might find here.
Oh, well, is this Enos?
Wasn't that one of them?
EnosDB?
No.
Advantages, disadvantages.
Advantages: complex data sets can be saved and retrieved quickly and easily, and object IDs are assigned automatically. Disadvantages: object databases are not widely adopted. In some situations, the high complexity...
What you just said, I mean, that's actually,
that makes total sense. If you know that you have a person object or whatever, right, and let's just
say it's got a huge object graph under it, right? Like reports, um, health insurance,
all kinds of other garbage, right? You know that if you're getting that, you just retrieve that
object, and it brings you back the entire graph all at once, right? So that's why they're saying it could be very performant,
is if you're always operating at that top-level object, then sure.
But I imagine it would have, I think this is what you were getting at before,
which I think would be a similar type of issue in a document database,
where if one of those fields is an array,
and you want to search for something in that array that has a specific attribute,
and then maybe something else in that, and you want to join that to some other data set,
that that would be where it would get problematic.
Totally.
If even possible.
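A rough Python sketch of that trade-off, under stated assumptions: the `people` dictionary, its field names, and both helper functions are hypothetical and invented for illustration, and no real object or document database is being modeled. Fetching by top-level ID is a single lookup that returns the whole graph, while a question about something inside a nested array has to visit every object unless some extra index exists.

```python
# Hypothetical store: object ID -> whole object graph (nested dicts and lists).
people = {
    1: {"name": "Ada", "reports": [{"title": "Q1", "status": "final"}]},
    2: {"name": "Grace", "reports": [{"title": "Q1", "status": "draft"},
                                     {"title": "Q2", "status": "final"}]},
    3: {"name": "Linus", "reports": []},
}


def load_person(person_id):
    """Top-level fetch: one key lookup returns the entire graph."""
    return people[person_id]


def people_with_report_status(status):
    """Nested search: every object and every array element may be visited."""
    matches = []
    for person in people.values():
        if any(report["status"] == status for report in person["reports"]):
            matches.append(person["name"])
    return matches


print(load_person(2)["name"])              # Grace
print(people_with_report_status("draft"))  # ['Grace']
```

Joining those nested matches against another data set would add yet another pass on top of the scan, which is the part the hosts suspect gets painful.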
Yeah, it's interesting.
So, you know, be aware that these things exist.
I've never actually seen one used.
That doesn't mean that it's not in some big project out there somewhere.
Well, I mean, it made the list.
So that has to say something for it.
And notes didn't.
So, yeah, yeah, true, true.
Lotus Notes, is it in here anywhere?
I mean, I did a search for notes and, unless they changed the name...
maybe they did change the name.
It's not on the list.
They said no.
Yeah.
Apparently they're like,
it's not even a database.
There are 401 items in this list.
So for Lotus Notes
not to make it, like, they really made somebody mad.
I'm really curious.
Did they change the name maybe?
And maybe we're looking at the wrong thing.
Oh, it looks like IBM sold it off to HCL.
And it's still called Notes.
Interesting.
Yeah.
I mean, it's been a minute.
All right.
So, yeah, we don't have a ton more to say about that one.
It's just not a super widely adopted database type.
If you want to play with it, it sounds like it's kind of interesting and cool. It'd be nice to know how it works, but...
Well, there was one other thing.
I'm sorry, I was just going to say, this is probably
not going to be your first choice when you're going to look to set up a new application,
especially at a business, you know.
Well, just being able to back up that Stack Overflow answer there, because even
though it only had the 10 upvotes, you know, the author, I think, was definitely on to something.
Because even in the DB-Engines encyclopedia that you mentioned, there's the sentence where they said
the goal was to be able to simply store the objects in a database in a way that corresponds
to their representation in the pro... in a programming... I can't even speak. These aren't even proper nouns.
What is wrong with me?
The goal was to be able to simply store the objects in a database
in a way that corresponds to their representation
in a programming language without the need of conversion or decomposition.
Yeah, so tightly coupled, just like you said.
That's really what they were going for.
Yeah, so instead of me giving you back a row of data or a document of data, you know, that
you then have to figure out how to parse or, you know, use or whatnot, it's like, here's
a pointer to that object.
Done.
Yeah, exactly.
Exactly.
There's no marshalling whatsoever.
But I wonder, like, I said pointer though, but that's probably inaccurate because I imagine,
you know, especially if your network latencies or, you know, just the network traffic in general, right?
Like you're not sending a pointer back.
So they have to be sending the object over the wire in some way.
But yet there's no.
Yeah, it's interesting.
Yeah, it's interesting.
I just, honestly, I can't think of a case where I would just want to use this except to experiment.
Like, uh, like, um, if you're going to write software for, like, a Mars, you know, rover,
you know, or something like that, where it's like super limited, um, you know, hardware
and everything.
And maybe you don't want to take the overhead and time to do conversions and type conversions
or anything like that.
Like, you don't want to have to worry about that.
You're just like, here's the thing that needs to be stored.
And when I query it, I want that thing exactly back as it was done
and i don't want to waste time trying to you know convert things right yeah i could see that
yeah i mean very limited like hardware purposes i guess or like where you're going to have limited
abilities to you know do things with it.
I mean, I even think about what happens when you change object schemas and stuff.
Like,
is that going to be a problem?
Yeah.
I mean,
that's what I was referring to when I made the Avro comment before,
like if you needed to,
if you did need to do that,
like how do you,
how do you iterate on your,
your design?
But maybe,
like I said,
maybe it has, like, object versions, like, so that you would
know. Or maybe in your code you would have, like, specific objects, like, you would have to have
versions of your objects in your... think about how disgusting that would be.
God, no. Yeah, yeah.
No, I don't want any part of that. Yeah. All right, so next up...
That's all speculation, by the way, so
somebody's going to correct me and be like...
Yeah, yeah. I mean, there's probably somebody out there
that's listening that has used them, and maybe they can fill us in. You know, feel free to drop a comment
on, uh, codingblocks.net/episode228.
And I'm Joe, by the way, so if I got anything wrong...
Yes. I guess I'm still out on the Great. Uh, so
the next one up are wide column stores. Now, I know JZ has had a little bit of
experience with this. Uh, I had messed with it a little bit. And these are a little bit more popular
because they're all about massive horizontal scaling.
And we talked about these quite a bit in data.
What was it?
Data-driven application something.
Designing data-driven applications.
Thank you.
Thank you.
See, yes, I'm struggling today.
So you want to tell us some of the popular ones here?
Sure.
So coming in at number 12, one that I'm sure you've heard, Cassandra.
You've probably heard of a lot of these.
Number 12, Cassandra.
Number 26, HBase.
And number 27, Azure Cosmos DB.
were the most popular examples according to db-engines.com.
You know what's funny about this, man? Azure Cosmos DB
shows up in almost every single database engine list. Like, I would love to know what they did
behind the scenes to make this thing work for everything.
Well, that one specific... oh, sorry.
And is it really that good at everything, is my question, right? Like, is it truly that amazing at all of it?
Yeah, um, maybe it's just the new hammer. It's a globally distributed, horizontally scalable,
multi-model database service. So the primary database models for Azure Cosmos DB: a key-value store and wide column DB. I mean, that checks a lot of boxes.
A lot, yeah. Where does it rank?
It's all managed. It's ranked, uh, 27. Oh, yeah, I guess I already said
that.
Yeah, it was right there. I put it in. When we started doing this, I was like, oh
man, it'd be nice to know how these fall, because when we first talked about the relational databases, like, they were one, two, three, and four, right? Like,
that's sort of a big deal. They're still kind of a thing.
That reminds me, I did have an aside to add to
that.
Throw it on there.
Well, okay, so rewind, then. In the last episode, I don't recall if this ever got called out. But, you know, part of the conversation was document DBs versus relational databases, and you gave an example of, like, street, or an address, where, like, oh,
well now I've got to have a street and that's going to be null for the
majority of places. Or, you know,
the advantage of a document database was that you could have like kind of
free form kind of things like only the properties that are needed for that
specific piece are there. So you're going to save some space, blah, blah, blah.
But, I don't know if Oracle
does this, I would imagine Oracle does, and I don't recall SQL Server doing this at all,
though, but, uh, Postgres, our, you know, our one true love, um, I say that jokingly, has the ability to do JSON as the column type.
So you could have like a mixture of relational and document in the one row.
And Postgres will allow you to like query the elements of that JSON in your SQL statement like you would any other column.
You know, so it kind of walks a fine line of like,
let's have a little bit of the best of both worlds in these specific use cases.
And I mean, obviously with everything, you know,
you use it sparingly and wisely, but yeah.
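As a runnable sketch of that mixed relational-plus-document row, the example below uses Python's built-in `sqlite3` module as a stand-in, because it needs no server. This is an assumption-heavy substitute: SQLite keeps the JSON in a TEXT column and queries it with `json_extract`, which requires a SQLite build that includes the JSON functions, whereas Postgres has real `json` and `jsonb` column types and would express the same filter along the lines of `WHERE details->>'city' = 'Atlanta'`. The `customer` table and its columns are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational columns (id, name) next to one free-form JSON column (details).
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, details TEXT)"
)
rows = [
    (1, "Ada", json.dumps({"city": "Atlanta", "address_line_2": "Suite 4"})),
    (2, "Grace", json.dumps({"city": "Orlando"})),  # no address_line_2 at all
]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Filter on a field inside the JSON from ordinary SQL.
cursor = conn.execute(
    "SELECT name FROM customer WHERE json_extract(details, '$.city') = ?",
    ("Atlanta",),
)
atlanta_names = [name for (name,) in cursor]

print(atlanta_names)
```

Only the rows that need `address_line_2` carry it, which is the space-saving point made about document databases, while `id` and `name` stay ordinary typed columns.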
Yeah.
SQL Server had that functionality as well, right?
Like, they had some JSON parsing and things in it.
Yeah.
Now, Postgres as a column type, though. JSON as a column type.
That's what I'm talking about.
Like, it's a first-class citizen in Postgres.
Oh, you might be right.
I don't know if they made it a column type, but they made JSON functionality,
like, parsing in columns.
Maybe they did. But I will tell you, regardless, Postgres did it better, in my opinion.
Um, because the JSON tools that were available in SQL Server, I think last time we touched it was
like 2018, like, they were a little frustrating, but they were there.
Well, I mean, I'm looking at it on a Microsoft document
now. Coincidentally, it did come back for SQL Server 2016, which is a bit long in the tooth, I don't
know. But this article was, like, updated 10 days ago, so this is fairly recent. And it was talking about how your data type would be text, NVARCHAR, for the
column, and then they have JSON functions you could use.
Yeah, that's what I remember.
Yeah, it looks like people are doing NVARCHAR(MAX) with it. And then, yeah, I mean,
like I said, the tooling was there to be able to do some stuff, but it was not a pleasure to work with.
Yeah.
I just thought that was an important distinction to make though for Postgres that, you know, they have a type.
If you're listening to these, like trying to figure out like what, you know, type of database you want to use, that's an important consideration, I think.
Yeah, I would agree. I would also say, like Outlaw said, you know, be careful trying to make one thing do
everything. But if you do have a use case where it's like, oh, you know, occasionally
we need this document type, then, yeah, you know, go for it. But if your primary use case
is, oh, I've got tons of documents, well, then
maybe you should be considering something like Mongo, right?
Yeah, or if maybe your primary use
case is a search engine or, you know, key-value or whatever, right?
Right. Yeah, sure. I mean,
know your use cases, right? So, sorry, so back to wide column stores. Yes, also known as extensible record stores.
Clear as mud.
Right? Yeah. So let's make that a little bit less muddy.
They can store large numbers of dynamic columns.
And what the heck does that mean?
So every record has a set of columns.
Well, in a regular relational database system, in a, in a schema on write,
you have to define those columns up front, right? Like, so we talked about our address table and
the wasted column was like address line two. Well, in, in a record with dynamic columns,
you can just add columns that you want, right? So it's, it's almost like a document
type thing where you can just add whatever you want in
there. But the difference is this isn't a document. You're actually storing a record
and it has these columns in it. And they say that you can have a large number of dynamic columns.
Well, how many is that? They said you can store billions of columns in a record, and they say that that's why these
are also sometimes described as two-dimensional key-value stores. Google being the OG of this
category, or I should say specifically Google Big Table, or, I'm sorry, Bigtable, as, yeah, as, uh, JZ
would prefer we pronounce it.
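The "two-dimensional key-value store" description can be sketched in a few lines of Python. This is a toy under stated assumptions: `ToyWideColumnTable` and its methods are hypothetical names, and nothing here reflects how Cassandra, HBase, or Bigtable actually lay data out on disk or distribute it. The idea it shows is only that a value is addressed by two keys, a row key and a column name, and that each row can carry its own set of columns without any up-front schema.

```python
class ToyWideColumnTable:
    """Hypothetical model: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # No schema to alter: a new column simply appears on this one row.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        return self._rows.get(row_key, {}).get(column)

    def columns(self, row_key):
        # The row itself tells the reader which columns it has.
        return sorted(self._rows.get(row_key, {}))


table = ToyWideColumnTable()
table.put("user:1", "street", "1 Main St")
table.put("user:1", "address_line_2", "Suite 4")
table.put("user:2", "street", "9 Elm Ave")
table.put("user:2", "login:2024-02-19", "ok")  # column only this row has

print(table.columns("user:1"))  # ['address_line_2', 'street']
print(table.columns("user:2"))  # ['login:2024-02-19', 'street']
print(table.get("user:2", "address_line_2"))  # None
```

A row that never sets `address_line_2` stores nothing for it, in contrast to the mostly-null column from the address table example.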
Was it?
No, it wasn't first.
Was it before?
They wrote the white paper, didn't they?
According to the encyclopedia here, Google Bigtable is considered to be the origin of this class of database.
And the publicity was based on a now classic publication.
Let's see, what was the name of that publication? Specifically, it was "Bigtable:
A Distributed Storage System for Structured Data."
And it's interesting because in this original document, "table" is not
uppercased in "Bigtable,"
which is JZ's gripe about Big Table,
Bigtable.
Yes. And so this thing is schema on read, right? Because you can have these dynamic columns. You know, the record that comes back
tells you what the columns are. So, you know, you don't have to do a well-defined thing up front.
Um, now, this was a comment that Outlaw sort of made at the front. He's like, oh, so this isn't the same thing as, like, columnar storage in, like, a relational
database system.
And we weren't supposed to talk about that.
That was in private, man.
Why are you throwing me under the bus?
Well, I'd see, I can only do this because, because I read all this and I was like, huh,
that's interesting.
Um, yeah. Right. Yeah. So what they call out is columnar storage.
And if you've never heard that term, it seems like we might have mentioned it back in the day.
We've definitely talked about columnar storage.
Yeah. So it's basically for being able to do, like, OLAP type queries out of relational database systems.
You know, I think a lot of the big ones have gone to it. We know SQL Server for sure had
columnar storage, and a lot of other ones went to it. But the difference is, like, typically in a
relational database, you're storing things in a row format, right? Well, when you go to columnar
storage, it starts storing things in columns, putting the data on the columns, because it's
quicker to access for doing OLAP type queries. So analytical type processing queries.
They say that the difference is wide column stores
are not actually storing things in a columnar format.
They are still storing them in a row format
just with tons of columns on them.
So it is a different storage format and technique than the other one completely.
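Here is a small Python sketch of the two layouts being contrasted. It is a simplified picture with hypothetical data and function names, and real engines add compression, pages, and indexes that are not modeled. The row layout keeps each record's values together, which is the shape the hosts say wide column stores keep; the columnar layout keeps each column's values together, so an analytical query such as a sum only has to touch one list.

```python
# Row-oriented: one record per entry, all of its values kept together.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 30.0},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records):
    # Every whole record is visited even though only one field is needed.
    return sum(record["amount"] for record in records)


def total_from_columns(columns):
    # Only the single "amount" list is read.
    return sum(columns["amount"])


columns = to_columns(rows)
print(columns["amount"])            # [10.0, 20.0, 30.0]
print(total_from_rows(rows))        # 60.0
print(total_from_columns(columns))  # 60.0
```

Both functions return the same answer; the difference is how much unrelated data sits between the values the query actually needs.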
I could have sworn,
but now I'm thinking I'm wrong that we've made the example or talked about the
example of the back,
like the book index versus the table of contents.
Oh, we might've in that, that, that the index was more example.
But now I feel like that's wrong that I'm thinking of something else like a
reverse index. I think maybe.
I can't remember, man. That,
that seems like when we were talking about the the formats,
the log formats that they were writing out.
It was probably, it probably was in the early discussions of the,
let me see if I can find that.
Like SSTables and...
Yeah, it was probably early discussions around Designing Data-Intensive Applications.
Yeah.
Like the write-ahead logs and all that kind of stuff, and the formats that they do those in.
But because Cassandra is so popular, that is kind of the one that I went to go grab the information from.
We'll have a link to it here. It's cassandra.apache.org, and they have a basics page.
And one of the key, probably most important things about it is that it is hyper horizontally scalable. And when we say hyper, they even had an example where they're like, oh, you could add 8,000 nodes. And Outlaw found something.
Oh man, I'm so good. We did talk about it, and I was correct that I was wrong in trying to make the association of the index to columnar storage. It was the inverted index.
And Search Driven Apps was the title of the episode.
It was, what was that episode number?
83?
Yeah, 83.
Episode 83.
Search Driven Apps. And we had talked about how what is called an index is actually an inverted index, because it tells you where to go for a specific word, versus the table of contents, which is a forward index that tells you where to go in the book for a document or a chapter.
Okay.
Because we made that analogy of, you know, the document to the chapter.
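The book analogy above can be sketched in a few lines of Python. This is a toy illustration, assuming whitespace tokenization and a made-up three chapter "book"; real search engines add stemming, stop words, and scoring on top of the same idea.

```python
from collections import defaultdict

# Hypothetical "book": chapter number -> chapter text.
chapters = {
    1: "wide column stores scale writes",
    2: "vector databases store embeddings",
    3: "column stores and vector stores differ",
}


def build_forward_index(chapters):
    """Table of contents style: chapter -> the words it contains."""
    return {chapter: text.split() for chapter, text in chapters.items()}


def build_inverted_index(chapters):
    """Back-of-the-book style: word -> the chapters that mention it."""
    inverted = defaultdict(set)
    for chapter, text in chapters.items():
        for word in text.split():
            inverted[word].add(chapter)
    return inverted


forward = build_forward_index(chapters)
inverted = build_inverted_index(chapters)

print(forward[2])                  # what is in chapter 2
print(sorted(inverted["stores"]))  # which chapters mention "stores"
```

The forward index answers "what is in this chapter", while the inverted index answers "where does this word appear", which is the lookup a search box needs.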
His,
his search skills are amazing on our website.
So I know when I'm wrong.
That's just a lot from episode 80.
We're talking 83.
Yeah.
That's probably six years ago.
That was a,
you know what?
I'm going to put a link in here, because there was a bunch of stuff that would be, like,
probably relevant to discussions about databases, though. Like, a reverse index was part of it, and inverted index, search engines, things like that.
What's the date on that episode?
This was June 10th, 2018, is when I published it. So it was almost six years ago.
Yeah.
It's been a minute.
That's insane.
We've,
we've covered some ground here a little bit.
All right.
So,
so yes,
hyper horizontally scalable.
I think I mentioned,
they said that,
you know,
they even gave an example of like,
Oh,
you can add 8,000 nodes,
right?
Like that's a lot of,
a lot of computers to store data and retrieve data,
but that's what it's there for. When you do this, though, at some point they even say, look, if you're looking at Cassandra, you're not running a single node. It doesn't make sense to run a single node, because you're not solving the problems that Cassandra was meant to solve. Right? And here are some of them.
I want to run Bigtable on a single pod, right?
Yeah, we got this. How many SCSI connections can you make to this thing?
So yeah, it prevents data loss due to hardware failures, if you scale it, obviously. And they even talked about this, and it's something you should consider if you were doing anything for your business and you weren't going to the cloud or whatever.
You probably want to have these things in multiple regions, right?
Different data centers around, like, the country or multiple countries or something, so that if it did fail, like if you had a fire in one place and it melted all your computers in one spot,
then you're not going to lose anything
because it was also being distributed elsewhere.
This was pretty interesting.
You have the ability to tweak throughput
of reads and writes in isolation.
So that's pretty interesting.
This is another one that they said is huge.
This is a big deal about Cassandra: because of the way that it's set up in its, quote unquote, distributed manner, everything looks like a single point of entry. But on top of that, every single node acts like every other node. It's not like you have this one master node that does all the main stuff and then this other one down here does other things. This is truly, hey, every node that you hit, they all act exactly the same and they all do the same function. So it makes it an easy-to-reason-about system in how it functions.
Yeah, I feel like, you know, because they referred to it as like a masterless architecture.
I don't know if they still refer to it that way, but that's the way it was referred to.
Yeah, they have it in their notes that that's what they call it.
Yeah, so that's one of the defining characteristics when you talk about the problems related to a relational database, and you guys talked about this last time, which is that you can horizontally scale the reads, but writes have to go through one single primary node. That node is responsible for committing those writes to the transaction log, and then once they're committed, they can be replicated across to the other ones for distributed reads. So your writes can't be distributed. That's the downfall of relational databases as it relates to trying to scale horizontally, especially for big applications.
Right. And in this case, unless you try and get cute, you can shard your databases, right? And then your application logic's in control of all of it, and that gets hyper complicated.
Right, well, that's where different sharding techniques come into play, where you're trying to decide what's responsible for a given part of the table, right? But in this case, though, with technologies like Cassandra or wide column stores, you're able to distribute the writes, because there is no, you know, master or primary node for those writes.
But what I don't know is how they achieve that. That's where my knowledge of Cassandra is limited.
I think we talked about this in Designing Data-Intensive Applications. Basically, when you do a write to it, and we sort of talk about this a little bit down here at the bottom,
if you look at these last couple of bullet points, there's a configuration setting that says, hey, for consistency, how consistent does this data need to be?
And you can actually do it on your query to write the data and say, hey, when I write this, for me to get a success back, it needs to be consistent by being distributed to two additional nodes. Right? And so you could actually tweak how important you think this data is, right? Like, hey, I need this to be distributed to 10 other nodes before I feel comfortable that it's safe.
You know, I knew we had talked about this at one point, about the primary, masterless type of thing, but I couldn't remember where it was.
It was episode 172.
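The per-write consistency setting described a moment ago can be sketched as a toy simulation. This is not the Cassandra driver API; the `Replica` class, `write_with_consistency`, and `required_acks` are hypothetical names for illustration. Cassandra itself expresses the idea through consistency levels such as ONE, QUORUM, and ALL.

```python
class Replica:
    """A pretend storage node that may or may not be reachable."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def write(self, key, value):
        """Store the value and acknowledge, or fail if the node is down."""
        if not self.available:
            return False
        self.data[key] = value
        return True


def write_with_consistency(replicas, key, value, required_acks):
    """Send the write to every replica; succeed only with enough acks."""
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= required_acks


# Three replicas, one of them down (an assumption for the example).
replicas = [Replica("a"), Replica("b", available=False), Replica("c")]

relaxed = write_with_consistency(replicas, "user:1", "Ada", required_acks=2)
strict = write_with_consistency(replicas, "user:1", "Ada", required_acks=3)
print(relaxed, strict)
```

With one replica down, the write that needs two acknowledgements succeeds and the one that needs all three does not, which is the safety versus availability dial being described.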
And we had talked about how, um, we mentioned another one too at the time, which isn't in the DB-Engines list: Ketama, K-E-T-A-M-A.
Yeah, it's not in there at all.
But it was one that we mentioned at the time. They handle the partitioning for you, so that based on the number of nodes, it'll decide which node is responsible, or, you know, it'll randomly choose a node to be responsible for a specific set of partitions, and that's how they can distribute the writes.
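One common way to decide which node owns which slice of the data without a primary is a consistent hash ring, which is the technique Ketama is known for. The sketch below is a simplified illustration, not Ketama's or Cassandra's actual implementation; the `HashRing` class, the node names, and the choice of 100 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with virtual points per node."""

    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [node for _, node in points]

    def node_for(self, key):
        """Walk clockwise from the key's position to the next point."""
        index = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[index]


keys = [f"user:{i}" for i in range(1000)]
ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changes when a fourth node joins the ring.
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
print(len(moved), "of", len(keys), "keys moved")
```

Any client can compute the owner of a key on its own, so writes spread across nodes without a coordinator, and adding a node only relocates the keys that the new node takes over.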
I mean, it's pretty cool stuff, right?
I mean, this is the kind of stuff to where if you want to keep data safe and available and all that,
this is the type of engine you're looking at. And we mentioned Bigtable.
That's another reason why a lot of people go with Bigtable: they manage the solution for you, right? So this next point here, this is one of the big selling points for something like Cassandra or Bigtable. With a regular relational database, Oracle, SQL Server, Postgres, MySQL, whatever, if you want to scale those things, typically you need more processing power, you need more RAM, you need more drives attached or whatever. And the problem there is at some point you're going to run into, well, we have the most expensive CPU you can buy now, so we're kind of capped.
Not even CPU. I can remember working where we had our database server, and we bought the best SSD that we could get at the time, and that was a twenty thousand dollar SSD that only housed the database. But we wanted that I/O. Those are the types of things where you get capped, right?
And so the thing about Cassandra, and probably HBase and other ones like that as well, is you can scale this thing with cheap hardware. When it's referred to in the professional sense, it's commodity hardware. And basically what they're saying is you don't have to go buy some ultra high-end Supermicro motherboard that supports four CPUs and all this kind of stuff to be able to do it. You could, you totally could, or you could just go buy an off-the-rack, you know, Asus, a regular motherboard, throw a regular CPU in it, and put some RAM and some stuff on it. And the thing will scale out by just adding new regular computers to it.
You know, I wonder, here's something like, you know how Jay-Z had the "Docker's the new Git" or whatever?
Yeah.
I wonder another kind of controversial thing might be to say,
maybe the traditional relational databases are out.
They're on their way out.
This is the beginning of the end for them.
And what I mean by that is in place of those, you have database storage technologies that can deterministically decide, hey, this particular part of the writes is handled by this server, and that particular part of the writes is handled by some other server. And then they'll figure out how to mash it all together behind the scenes, like they'll handle replication behind the scenes. And that way, you can horizontally scale both reads and writes, while also ensuring data integrity. Because as I'm thinking through this problem and how Cassandra solved their horizontally scalable writes problem, I'm like, oh, you know what? This sounds like Kafka. And Kafka didn't make the list on DB-Engines. And I take issue with that, because I definitely consider it a type of... it's a transaction log.
So we talked about this, actually, I think, Jay-Z and I, for a second. The reason why I think both of us sort of agreed that it's not a database is because you don't query it. It's a queue. It is a transaction log.
Well, they do have KSQL.
No, no, no, that's a different technology. That's not Kafka, right? That's the thing.
Isn't KSQL by Apache, for Kafka, though?
No, KSQL was written by Confluent.
It was written by Confluent, right. And it was built on top of Kafka Streams, which is an application technology on top of Kafka. Because we actually had that same conversation, and it was like, no, Kafka is a fast, persistent message queue, right? And that's what it's made for. Now, whatever you want to do on top of it, sure, you can do all kinds of crazy stuff, right? Like, people do it.
Yeah, fine, fine. Going back to what you said, though, they do the same thing. Flink is the same thing, and it's not a database either, and I would definitely agree that it's not. But this whole idea of being able to say, hey, I'm going to have N number of nodes responsible for handling whatever this task might be, be that task responding to a query, or be that task responding to, like, oh, some new data came in, let me figure out how to process it, like how Flink does. The idea that all of these things share, Cassandra, Kafka, Flink, is the idea that, hey, I'm going to deterministically decide which node is responsible for that particular event, you know, query, event, or whatever. And behind the scenes, those things can replicate state among each other as needed, right? So that if the one that I deterministically decided on is no longer available, I can fall back to another one, right?
Yeah. I mean, so to take that a step further. So
I love the question: are the standard relational databases that we've all known and loved for, you know, three, four decades now, since the sixties, apparently, are those sort of going away? And that's sort of the surprising thing, right? I don't think they are, because when you look at that DB-Engines ranking list, they're one, two, three, and four.
Yeah.
But if you step in with what you were saying right there: are there going to be things that do the things that those systems are good at, but make them more scalable? The answer is yes, because that's what Cosmos DB is, right? Azure Cosmos DB is basically the, hey, come use me, I'll get rid of your headaches of trying to scale your databases and all that, but you still get the same development experience that you've known for a long time.
Google has one, it's called Spanner, right?
And it's the same notion.
And I wouldn't be surprised if AWS also has their own version of this.
I don't know what it would be.
But here's the problem.
And this is why I think that relational database systems haven't gone anywhere yet.
Do we know, is cloud cheap?
Are you asking me?
Yeah, I was just wondering.
I mean, it depends on what you're trying to do, right? If you're trying to have, like, you know, you want to host a database for your family of, like, hey, here's our family tree,
there might be a better way to do that more cost effectively.
Sure.
But if you're trying to create the next, you know...
Well, I think, what was it?
Pinterest was the... or no, Instagram was the example that you guys had talked about last time, where, you know, in the article that was written in like 2012 or something, it was like, we have big data problems, we get 25 images a second. Right? And now it was like, we get 1,070-plus images a second. We think. It's so fast, we can't even count it. So if you're trying to build the next Instagram, then you probably want to consider cloud.
I mean, it's like everything else in computer science, right? The answer that you don't want to hear is: it depends.
It depends.
And you really need to know what your use case is going to be before you start making architectural decisions.
Well, here's the reality, though, right? Let's put it in the simplest possible terms. If you start running into a situation where your SQL Server or your Oracle can no longer handle what you've got, and you've already dumped $100,000 into the server that's running that thing, right? Maybe even more, in many situations. Then maybe it would make sense to be looking at something like Cloud Spanner for Google or Azure Cosmos DB. But the problem is, and this is where it starts really sucking, if you've gotten to the point where you're tipping over a hundred thousand dollar server, you're going to be running into some decent monthly costs on getting that thing running up in the cloud, right? Because that means you've already hit a level of data and a level of complexity and querying needs and whatever, and you pay for that stuff in the cloud, right? You pay for the extra compute, you pay for the extra throughput, you pay for all that. And it's like going to a nice restaurant, right? And I don't want to make it out like Longhorn isn't a nice restaurant, right? But if you go to Longhorn and you order yourself...
That's fancy date night right there.
That's fancy date night right there. And you order yourself a filet, and it came with two sides and whatever else. And you get that bread for free right up front, which, let's be honest, that's why everybody goes to Longhorn, because of that bread, that butter. But you're in it for what? Thirty-five bucks, somewhere in that ballpark. You go to a Ruth's Chris or a nicer steakhouse, you're paying for that steak. You're paying for each individual side. That's what the cloud is like, right? Oh, you want that side of mashed potatoes? Okay, well, that's fine. We'll go ahead and bill you for that. You want that extra throughput? You want this? That's what it's like. And so when you start looking at the cloud, you really have to start looking at, hey, what do I think my realistic throughput is going to be? What do I think my realistic CPU needs are going to be? And all that, because you have to budget for that stuff.
Yeah. And, you know, it's fair to call out that Ruth's Chris is also, like, the most popular chain.
Is it? For real?
For what people would consider a nice restaurant.
More than a Morton's?
It was just something that I heard recently, and I just Googled it again just to see, like, hey, am I wrong?
And this article was from January of this year.
But now, here we go.
This is for millennials.
For millennials, it's number one.
So I don't know.
We're almost at boomer hour.
So I don't know if we want to like,
you know, if that's going to be a topic.
There was a request that we make that a part of the show every time.
But I've never been to a Ruth's Chris. I want to go so bad.
And every time, I'm like, dude, you should. I mean, we have one close by to us too, and it's not far at all. Look, I'm not going to say anything, because I don't want to sway your opinion one way or the other. You should go, for sure. And just, you know, whatever your favorite piece of steak is, just ask for that. Don't even...
Boomer hour started early.
Wait a minute, hold on. Is this, generally, you like it or you don't?
Do you really want me to tell you?
I really want to know.
Um, yes, it's good.
Okay.
I can make a better steak at home.
Okay. I mean, I've heard that's probably true of, like, you know, most places, right?
I mean, these aren't like Michelin five-star restaurants.
Yeah.
I mean, I guess so there, there's another popular chain.
So Longhorn's good. Oh, there's another really popular chain around here.
Texas Roadhouse, Texas, Texas.
Yeah.
Texas Roadhouse.
But in Georgia.
Yeah.
Yeah.
It's going to be confusing to our overseas friends.
Yeah.
Sorry.
In the US, there's a franchise called Texas Roadhouse.
Texas Roadhouse, in Georgia, yes. And I'd say that their steaks are every bit as good as probably Ruth's Chris. Now, the difference is, at Ruth's Chris, they're going to be giving you prime cuts of meat. At Texas Roadhouse, or even Longhorn, it's probably choice instead, which is, you know, one level down. But whatever, I mean, it's good. Now, I will tell you there is one big difference: at Ruth's Chris, they salt, pepper, and butter on top, and that's it, right? On a good piece of steak, that's all you need. If you go to some other places, you know, they might put some other type of pepper seasoning or whatever on it to give it that extra flavor, to kick it up a little bit. But regardless, it is a good steak. But is it something that I'm just always dying to go back to, because I've never had as good? No.
That's consistent with things I've heard in the past too, though.
It doesn't stop me from wanting to go just experience it.
Yeah, agreed. Yeah, totally.
Um, just to, you know, kind of close the loop on this whole Cassandra thing, I'm a little disappointed in ourselves that we haven't even brought up our past sponsor, DataStax. They had the whole solution of, you know, giving you a managed Cassandra environment, something for you to consider. In fact, you know what, I'm going to include it. The name of that product was Astra. Or is Astra, I should say.
Anytime you start dealing with a bunch of hardware and having to make sure things are alive and all that, I mean, it turns into a pain. Oh, hey, one other really important thing to note here, and this is why Cassandra is so popular in terms of this whole distributed, you know, wide column storage: every node you add is linear scalability. So that's a big deal, right? So if you have one node and it can handle 1,000 queries a second, if you add a second node, then you can handle 2,000 queries a second, and 3,000 if you add a third.
I'm sure that it's not 100% linear, right? There's always going to be a cost overhead with any kind of distributed network traffic and all that.
But that's what they actually tell you on their pages. That is the glorious thing of it.
Besides backing up your data and making sure it's consistent and available and all that, being able to scale it is very linear: add N nodes, and you have N times the performance, roughly.
So pretty cool.
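As a rough back-of-the-envelope model of that linear scaling claim, here is a small Python sketch. It rests on stated assumptions rather than anything measured: every node can absorb the same number of write operations per second, every client write is copied to a fixed number of replicas (the replication factor), and network overhead is ignored. The function name and the numbers are hypothetical.

```python
def cluster_write_throughput(nodes, per_node_ops, replication_factor):
    """Estimate client writes/sec for a cluster under a simple model.

    Each client write costs `replication_factor` node-level writes,
    so the cluster's total capacity is divided by that factor.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if nodes < replication_factor:
        raise ValueError("need at least as many nodes as replicas")
    return nodes * per_node_ops / replication_factor


# Assumed figures: 1,000 ops/sec per node, every write stored 3 times.
for node_count in (3, 6, 12):
    estimate = cluster_write_throughput(node_count, 1000, 3)
    print(node_count, "nodes ->", estimate, "client writes/sec")
```

Because the replication factor stays fixed as nodes are added, the estimate doubles whenever the node count doubles. Real clusters will fall somewhat short of this because of coordination and network traffic.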
I would imagine, though, that... like, you started with one node and then going to two, but as we've established with Cassandra, you would never have a single node setup. Because as you were saying that, in my mind, I was like, well, wait a minute, how can you make that guarantee? Because as you add nodes, you're going to have replication overhead, and that's going to bite into your available bandwidth, your I/O bandwidth, both on disk and network, blah, blah, blah. But that's why I wanted to call out, like, oh, well, because one node is never really realistic, you're probably starting with some minimum number, probably five, if that.
Well, let's say that the minimum number was three, right? And you might have replication of two.
Right. Then, you know, if your replication is always two, even if you did go from three to five, it's still replication of two.
Right.
So every node is going to have the same number of replication reads and writes in addition to incoming query reads and writes. So that's where you can guarantee, like, oh, it's going to grow linearly, because the replication count is probably not going to change. Because I'm thinking of it from, like, a Kafka point of view, right? Your replication count isn't going to change just because you're adding nodes.
Right. You add 10 more nodes, you're still only replicating to two every time you do your write or whatever. So yeah, it's pretty interesting, and there's a reason why it's popular.
So, all right. So let's switch everything over to wide column stores.
You've convinced me.
That's right.
Done.
Bigtable all the things.
That's right.
All righty.
Well, I guess we won't bother with mental blocks.
We just already know that we're mental, and that's the only things that blockheads need to know.
That's right. But if you want to be one of the lucky few that leave us a review and hear your name called out, send your difficult-to-read names. You can find some helpful links at codingblocks.net/review. And we do greatly appreciate reading those reviews. Some of them are comical, you know, like the one that Allen called out before from
Ian? No.
No, it was Kalem,
about, you know, don't try to listen to what you're working on or you're going to break your back.
You know, but then
we also get some really heartfelt ones
too, so
we do really appreciate, and it gives us inspiration to keep going sometimes,
because sometimes there's happy moments, and sometimes there's darker moments.
Yeah.
Wow, that took a dark turn.
Sometimes we're sick.
That took a dark turn.
Sometimes we're sick.
Sometimes we're moving.
Yeah.
There's all kinds of things going on.
I feel like I should grab a pipe and put on a coat.
These are dark days with Coding Blocks.
That's right. All right, well, let's get into vector database systems. Dude, this one... all right, this one is new to me completely. Never even heard of them, and it's sort of mind-blowing.
So, okay, I'm coming into this completely cold, right? Originally, when I saw the name vector, I was thinking graph, but that is not the case.
Uh-huh. That's what I'm saying. So, behind the scenes, how the sausage is made: I spent more time learning about this one than all the others combined. Well, I guess in fairness, most of the other ones I've touched over the years. But this one being brand new, this was truly just a learning experience. And I need to go in and find the rankings for these, because I forgot to put them in there.
So the first,
the first,
go ahead.
No,
no,
no.
Go.
I was,
I was going to read some stuff about the vector DBs if you wanted to look
that up.
Okay.
You do that real quick.
I've got that link down there to Pinecone, which is probably the best that I've seen.
Well, I'll start with what's in the encyclopedia.
Just that these are systems that specialize in storing... how did they word it? They're systems optimized for efficient storage, indexing, and querying of high-dimensional vector data, that use special algorithms and data structures to support similarity search, often used in machine learning or data mining, with a focus on performance, scalability, and flexibility.
Right. Now...
Well, I feel like I just read a marketing thing, like some marketing guru came to you and just read every buzzword that he's recently seen,
and you're like, uh-huh, yeah. What were the requirements again? We'll buy it. What did you actually want this thing to do?
Yeah. So one thing that's interesting here. So he just read that,
and I mentioned this on the previous episode: these database websites that you go to can be an absolute wealth of information, like, a hundred percent.
Now, what's surprising to me is this.
So the popular ones are KDB, which was ranked number 52 overall out of the 401.
Pinecone was 103, so it's, you know, twice as far back in the list. And then Chroma was 139. Now, the reason why I'm bringing this up is because it's surprising to me. If you go to Pinecone, it's pinecone.io, that website is fantastic.
If you want to learn anything about a vector database, I mean, it's better than just about anything else out there that you could just Google and search for. They do so good on it.
But the thing that was surprising to me is it's ranked second in this list of them. But if you go to the KDB database website, it doesn't feel like it's on that level at all.
Now, maybe it's an amazing thing. I don't know.
But go ahead.
Well, I think that's because, you know, in DB-Engines, they say that it's a high-performance time series database and that the primary database models are time series and vector.
So I'm assuming that the reason why it is ranked higher is because it's not just one thing. The other examples that you have here, like Pinecone, are only vector database models.
Yeah, that might be it.
But you know what?
I don't know, man.
Maybe I'm not even on their site.
Hold on.
Where is the KDB site?
It's KX.com.
KX.com.
Here we go.
Which, why wouldn't it be kdb.com?
I don't know. Yeah, they got products.
What is kdb.com?
Yeah, I don't know. If you lose me here, it's because I just got haxored on my computer machine by typing in a random, uh, name. That's always safe.
Yeah. So anyways, I just bring that up because Pinecone's website is truly great for what we want to talk about here, which is going to be sort of
a deep dive into what it's actually storing. Because if you don't understand that, then it
doesn't even matter to you that it's a vector database. And that's why I spent so much time
on this because first off, it was fascinating. And secondly, you know, if we're going to talk about it, we should at least be able to speak a little bit about it.
So, yeah.
So what is this thing?
He already said it.
It stores.
This is a technical term.
It stores vector embeddings.
And it's able to retrieve them quickly.
Okay.
So that probably means something to 5% of the people listening.
Maybe I'm being generous.
I don't know.
It didn't mean anything to me when I read it.
Yeah, I'm reading through the problems they're trying to solve,
and now I'm kind of starting to understand.
But I don't know.
Well, let me see what you already had here.
I don't want to, like... because I was going specifically off the Pinecone documentation.
So maybe we keep going with what you got.
All right.
Yeah.
I mean, if you click on that page, I mean, I'm basically going straight down it now. I don't have as much detail as they have on the page.
They like, you know, talk through some examples and stuff.
But.
Oh, well, you didn't do this part.
They start with like the problems they're trying to solve.
So maybe we do go through this part first.
If you build a traditional application, your data structures are represented as objects that probably come from a database. Your objects have properties that might map directly to a column, etc. Then over time, as the number of properties grows, so do the objects, to the point where you need to be more intentional about which properties you want for a given task, and you end up creating specialized representations for some of these objects, you know, instead of having very fat objects. So I'm trying to think of an example where we had a very wide table, but we might have had an object that only had a smaller number of those properties.
But I mean, we've all done it, especially in, like, select results, right? If you're relying more on stored procedures or routines, and your objects represent the results coming back from those queries instead of the full objects of the database.
And then they go on to say, if you're dealing with unstructured data, you go through a similar process, except it's more on the code side.
But with vector embeddings, it's a form of automatic feature engineering, where instead of manually picking which things you want from your data, you have a pre-trained machine learning model that will produce a representation of this data that's more compact while preserving what's meaningful about it.
Okay, yeah, so you're definitely getting deeper into the weeds there to bring that.
So I like where you're starting with reversing why you would do this.
So what you just described is what we're trying to store.
Yes.
A use case of it.
Like there's a few that they have an examples page.
A couple of use cases are semantic search. So
if you think about something like elastic search, we've talked about this in the past,
you, there are like words that are known to have certain synonyms, right? So, so if you were to
search for, you know, whatever sentence, it's going to sort of do a smart replacement of some
of those words in that sentence to try and
find anything that's similar, right? So that's how regular search engine type stuff works.
But if you're using something like a vector database, what you're trying to get to is a
better semantic search. So instead of searching for words that have similar meanings, you're
trying to search for sentences that have similar meanings,
right? And so what it does is it stores meanings of things in a 3D space, if you could think of
it, right? Like imagine that you have this sphere in front of you, and this is sort of a simplistic
view of it, but you have one sentence and it's dead center of that sphere, right? Like right
in the middle.
Then another sentence comes along and it means roughly the same thing.
So it's going to be sort of close to it, right?
Like maybe it's behind it or on top of it or to the left of it or the right of it or whatever.
But it's going to be located somewhere near that first one.
Now you have another sentence that comes in that means nothing like it, right?
Like the first two were talking about computer stuff and the third one was talking about cooking. It's going to be on the outer edge of that sphere,
right? Like, so that's what this vector storage is doing is putting things close together that have similar meanings. So semantic search was an example of that. Another one was audio search,
right? Like you could, you could take audio files and it could do like spectral
analysis of, of the, the waves and the patterns in the audio and anything that has similar type
things to that would be located in similar space in that vector, like in this, in this 3d thing.
And I think it's more complex than just a 3d plane, right? Because there could be multiple,
uh, I guess, layers of this, but at any rate, so those are the types of problems you're trying to solve is instead of
the, I hate to say it this way, but like the simplistic type things that we've done for,
for many decades as developers, right? Like, um, swapping out words, you're now trying to plot meanings of things and relationships to things, which is how AI models and stuff work nowadays, right?
Like it's, there's a comprehension sort of to these things that is all done mathematically.
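To make the "close together in space" idea concrete, here is a minimal sketch of comparing sentence vectors with cosine similarity, one common closeness measure. The three-number vectors and the sentences are made up for illustration; a real embedding model would produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-picked 3-D "embeddings" (assumption: not from a real model).
sentences = {
    "How do I restart the database server?": [0.9, 0.1, 0.0],
    "How can I reboot my DB host?": [0.8, 0.2, 0.1],
    "Simmer the sauce for twenty minutes.": [0.0, 0.1, 0.9],
}

query = sentences["How do I restart the database server?"]
for text, vector in sentences.items():
    # The two computer sentences score high; the cooking sentence scores low.
    print(f"{cosine_similarity(query, vector):.3f}  {text}")
```

The two computer-related sentences land close to each other and the cooking sentence lands far away, which is the behavior described above.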
Well, yeah, that's what I was, where like, I kind of had this like light bulb moment because I made the comment a moment ago about the graphs,
but I was mistakenly thinking of like edges and vertices when I heard the word vector.
And so I'm thinking like,
Oh wait,
like graphs,
like net graph networking,
like that kind of thing.
Is that what we mean?
And then I forgot like,
Oh wait,
no,
no,
no vector.
As in like the math term vector.
Yes. That's the type of thing that we're talking about. Yep. And that's where the name is coming from.
So then I was like, oh, okay. All right. So, not that that helps to clarify it. All right. This is still a very complex
part of it, though, for sure. Yeah. And it's a very small piece of it. And
when I say small piece of it, I guess
every database system is sort of a small piece of whatever the overall thing is. Right. So let's go
ahead and go through the Vector Embeddings for Developers. So this is a webpage
they have on the Pinecone website, and it's absolutely fantastic. I'm going to talk through
it here, and,
you know, Outlaw and I'll be bouncing stuff back and forth, but by all means, I highly recommend
going and checking this out because they have some great visualizations, they have some good
information on that page, and there'll be plenty of links to this document as well
as others in the show notes. Yeah, for sure. So what is a vector? So just
like Outlaw said a second ago, it's a mathematical structure that has a size and a direction. So
if you go back to math days and you just have your two-dimensional graphs,
right, you have an X and a Y and you put a point on there. Well, that was 2D space. If you open
that up, like I said,
if you maybe think about it as a sphere or something, you now have an X, Y, and a Z axis,
right? So there's, you're somewhere in a three-dimensional room, like in a room in your
house, right? If you could put a point somewhere in your room and just float it there and then make
it a bigger or a smaller ball, right? Let's just say that point is going to be represented
by like a tennis ball or a basketball or something bigger or smaller. That's kind of what this is. It is a
plot in 3D space with a size to it. All right. I already said that. And you know, you could think
of it as zero comma zero comma zero, if you wanted it at the origin, I guess, is what we used to call that thing.
And now they say for developers, it's usually easy to think about it as an array of numbers, which I mean, sure, fine, whatever.
I don't know why we can't think about it the other way.
But but sure enough, that's that's how we're going to represent it.
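As a rough sketch of the "array of numbers" view: in the usual math definition, a vector's size is its length measured from the origin (its magnitude), and its direction is the same vector scaled down to length 1. The example point and the function names below are illustrative assumptions.

```python
import math

# A 3-D vector written the way developers usually see it: an array of numbers.
point = [3.0, 4.0, 12.0]   # x, y, z (made-up example values)
origin = [0.0, 0.0, 0.0]

def magnitude(v):
    """Length of the vector, i.e. its distance from the origin."""
    return math.sqrt(sum(c * c for c in v))

def direction(v):
    """Unit vector pointing the same way as v (undefined for the origin itself)."""
    m = magnitude(v)
    return [c / m for c in v]

print(magnitude(point))   # size
print(direction(point))   # direction
```

The same two functions work unchanged for arrays with more than three entries, which is how this carries over to higher-dimensional embeddings.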
OK, now this is where things get important.
This is kind of what I was talking about with semantic search, right?
Like if you think about vectors in space, there's some that are going to be closely put together, right?
Like things that are more related, like if we're talking about two people, they might be closely related.
If you talk about a bicycle, it's going to be a little bit further out, right?
But it's still associated with people, so it might be somewhat closer by. And then if you talk,
you know, about a piece of grass, then maybe it's way outside because it's got nothing to do with
either of them, right? So depending on how you're trying to model the data, they're going to be
closer or further apart in space. Oh, also, I thought that should be like a new, uh, you know, Vectors in Space!
Awesome. All right. So, how you said it made me think of that.
Man, I'm probably gonna say all kinds of weird stuff that
will sound like it should be from something stupid. This one gets so deep, though,
because like the thing about like any machine learning type conversation is when we talk
about it on a two-plane level, right, X, Y, it's easy for us to conceptualize, like, is this close to the
zero, zero point or is it not? Right. And then you say, okay, well, I can introduce Z.
So I have X, Y, Z. Now it's like three dimensional. Um, and it's,
it's a little bit still okay for us to grasp,
but then when you start getting beyond that plane, three planes,
then it gets really, uh, difficult to even visualize some of that stuff. Right.
Totally. Um, but I think for the purpose of this, like, let's keep it simple,
you know, a couple of planes. Yeah. I think, I think three is probably good.
So they, they talk about the fact that vectors are extremely useful in machine learning because CPUs and GPUs are really good at math. Right. And, and that's why,
you know, if you,
if you haven't done machine learning type stuff and you've been curious a lot
of times that's where, you know, people be buying these high end, uh,
Nvidia cards, right?
Because these CUDA cores are good at doing all this type of math stuff.
So this is where we start moving a little
bit further. So vector embedding. So we've been talking about vectors, right? The actual data
points, vector embeddings. This is the process of converting virtually any data structure into
vectors. So how do you actually get those plots? And they say that it's not quite as simple as
just a straight conversion, right? And the reason is, is you don't want to lose the data's meaning.
Now, I can't even fathom how some of this stuff works, you know, in actual practice.
So what they're saying is, if you were comparing two sentences, you wouldn't just compare the words;
you want to compare if the two sentences have the same meaning. And, you know, for anybody that
has learned English, which if you're listening to this, I assume you have, like, there's all kinds of
ways to say things that all mean the same thing, right? Like, I mean... Oh, it's hard. You can have the
same sentence mean different things. Yeah, totally. Right. "Mary had a little lamb" is my favorite example. Mary
had a little lamb. What does that mean? Did she own it? Did she eat it? Something else? Like...
Yeah, it reminds me of comma skills. Uh,
"We went to eat, grandma."
If you leave that comma out at the end before grandma,
we went to eat.
Yeah.
But in the case that the,
but in the case,
the example that I gave though, like it's not even about,
you know,
grammatically like the comma or anything.
It's literally the words,
the,
the,
the sentence could be written exactly the same.
Mary had a little lamb and it,
but yet it could mean different things.
It's,
it is incredibly difficult.
So like what they're doing with large language models and all that kind of
stuff now are crazy impressive.
So to keep the meaning and produce vectors with relationships that make
sense,
this requires embedding models. Now these, this is where it gets going a little bit further,
right? So they say many embedding models are created by passing large sets of labeled data
to neural networks. If you have not worked with machine learning at all, I've only been on the outside of it.
I know outlaw was a little bit further into it, but labeled data is basically the easiest thing I can think of is let's say that you're labeling photos.
You know, you get photo data coming in.
You know, there might be a simple algorithm that says, is this a cat or a dog?
And when the photo comes in...
So, with CAPTCHAs, we're training Google. Yeah, basically, more or less. Um, so you get that photo and,
and the whole purpose of that one particular function is to put a label on it, cat, dog,
or maybe it couldn't find either one. Right. And so it doesn't put a label on it. And so
as these things come through, you know, you might have a thousand photos come through and you've labeled them cats or dogs.
So that's that's what this labeled data is.
It's like a person has made a decision on what this data is, and then you're feeding that into the machine learning algorithm to say, like, this is an example of a cat.
If you see something similar to this.
Yes. Just what he said. So neural networks are trained using supervised learning,
typically, not always, but typically. And that supervised learning is what outlaw just said,
you know, you already had a person pass judgment on what this label should be.
You give it to this neural network and, and you tell it what you think it should be.
And so the neural network starts learning based off the inputs that it gets.
Right.
And the reason they're called neural networks,
in case anybody hasn't seen these things,
is this just huge nodes of machines out there that are just processing data
nonstop using mathematical type functions,
right?
Like it's passing things from one,
like one function to the next function to the next function.
And they're all just kind of chained together. Right.
I think that's a decently easy way to describe it.
I thought it was, I thought the name came from, um,
like modeling the human brain though.
Oh, it probably came from. Yeah.
Cause you're making tons of decisions on everything that you see at every
point, right? Let's see, I'm trying to see if I can find something about where it came from, but... All right.
All right. So while he's looking for that: using the supervised model, you pass in these
large sets of data as pairs of inputs and labeled outputs, right? So you gave it its stuff and you
told it what you think it should be.
The values are then transformed in each layer of the neural network.
So as it goes through its functions, is this a cat or is this an animal?
Yes.
Is it a cat?
Yes.
Is it a dog?
Yes.
Whatever, right?
Like it's going through this whole chain of questions.
With each training of the neural network, it says that activations at each layer are modified.
And I assume that's, it's basically tweaking its model to know, you know, Hey, if I see something
that looks kind of like this, then it's, then it's a dog. If it, if it has this feature,
then it's a cat, right? Like those are its activations. And then at the end of this,
the goal is that eventually the neural network will
be able to provide an output for any given input. Um, even if it hasn't seen that specific input
before, right? So you fed it a thousand pictures of these animals that were cats and dogs.
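Here is a small sketch of that supervised-learning shape: train on pairs of inputs and labels, then ask for a label on an input that was never in the training set. To keep it short, a nearest-centroid classifier is swapped in for the neural network, and the features (weight in kilograms, an "ear pointiness" score) are invented for illustration.

```python
import math

# Hypothetical labeled data: ([weight_kg, ear_pointiness 0..1], label).
training = [
    ([4.0, 0.9], "cat"),
    ([5.0, 0.8], "cat"),
    ([3.5, 1.0], "cat"),
    ([20.0, 0.3], "dog"),
    ([30.0, 0.2], "dog"),
    ([25.0, 0.1], "dog"),
]

def train(examples):
    """'Learn' one average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the input."""
    return min(model, key=lambda label: math.dist(model[label], features))

model = train(training)
print(predict(model, [4.5, 0.7]))    # an input that was not in the training set
print(predict(model, [22.0, 0.4]))   # another unseen input
```

A real neural network learns far richer features than two hand-picked numbers, but the contract is the same: labeled examples in, a decision function out.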
And when you feed it a thousand and one that wasn't in that original, it's going to look at
the features that it determined made you decide
that something was a cat versus a dog. And it's going to make that decision now, right? Because
it's trained itself to sort of understand what it is. Now, the embedding model itself is essentially
all those layers of that neural network minus the last one that did the labeling of the data. So rather than getting the label data,
you're getting that last layer right before it made the decision. And that is sort of,
I guess the whole, that's why it's called an embedding model is because it knows how to
figure out what it is that you're doing. And that is what you're storing in the vector database is this embedding
model. So it's, it's super interesting that you're not storing results in this database.
You're storing the thing that determines the outputs of inputs in this database. And those
things are being put in spaces to where they think that they're close in relation to each other in that database.
So like in, in maybe like layman's terms kind of speak, you're storing the interpretation of previous data.
So based on, I've seen this thing before and I've been told that this is that blah, blah, blah.
So I interpret that to mean if I ever see these characteristics, then that can mean blah.
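The "all the layers minus the last one" idea can be sketched with a toy two-layer network. The weights below are hand-written placeholders, not a trained model. `embed` returns the output of the layer just before the labeling step, and in typical use it is those output vectors, stored alongside an ID for the original item, that get written to a vector database rather than the model itself.

```python
def relu(values):
    """Simple activation: negative numbers become zero."""
    return [max(0.0, v) for v in values]

def dense(weights, bias, inputs):
    """One fully connected layer: weights times inputs, plus bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

# Hypothetical, hand-written weights standing in for a trained network.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [0.0, 0.0, -1.0]
W_OUTPUT = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]]
B_OUTPUT = [0.0, 0.0]
LABELS = ["cat", "dog"]

def embed(inputs):
    """Everything except the final labeling layer; this output is the embedding."""
    return relu(dense(W_HIDDEN, B_HIDDEN, inputs))

def classify(inputs):
    """The full network: the embedding plus the last layer that picks a label."""
    scores = dense(W_OUTPUT, B_OUTPUT, embed(inputs))
    return LABELS[scores.index(max(scores))]

vector = embed([2.0, 1.0])
print(vector)                 # the vector you would store
print(classify([2.0, 1.0]))   # the label the full network would have produced
```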
Yeah.
Yeah.
It's putting these things close together. And then I think outlaw you, you're on that page.
One of the things that's really cool is they, you know, they show you a sample of one of the popular ones.
It's called Word2Vec, which is basically for showing you words that are similar in this 3D space.
It's really cool.
But right in the middle of this word vector thing, this word vector cloud maybe, is the word
"located."
And then right around it,
you'll see things like "northwest," "nearby," "housed," "area," "occupies."
That makes sense.
Located.
It has that meaning. All the way on the far outsides of these things, it has
the word "placed" or "found."
Yeah.
Like it knows that. And these aren't the furthest apart,
right? Like these are just ones that are on sort of the fringe edge that it still thinks are somewhat
related, but there's tons of dots out there that are like nowhere in the ballpark, and it doesn't
even show them. So it's a really cool visualization to show you kind of like what you're looking for
the output of these things to be and how it's being stored, right?
Like this is visually how it's being stored in that database.
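A nearest-neighbor lookup like the one in that visualization could be sketched as follows. The coordinates are invented for illustration and are not real Word2Vec output; a vector database does the same kind of "find the k closest points" query, just at much larger scale and with indexing.

```python
import math

# Made-up 3-D coordinates (assumption: NOT real Word2Vec vectors).
word_vectors = {
    "located": [0.50, 0.50, 0.50],
    "nearby": [0.55, 0.45, 0.50],
    "area": [0.45, 0.55, 0.55],
    "placed": [0.70, 0.30, 0.60],
    "found": [0.30, 0.75, 0.40],
    "banana": [0.05, 0.95, 0.05],
}

def nearest(word, k=3):
    """Return the k words whose vectors are closest to the given word's vector."""
    target = word_vectors[word]
    distances = [(math.dist(target, vec), other)
                 for other, vec in word_vectors.items() if other != word]
    return [other for _, other in sorted(distances)[:k]]

print(nearest("located"))
```

With these made-up numbers, "nearby" and "area" sit right next to "located", "placed" is on the fringe, and "banana" is nowhere in the ballpark.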
I mean, if you haven't already figured out,
you definitely need to have a pretty good understanding of machine learning.
If you're going to use a vector database. Yeah. Right.
You're not jumping into this.
I looked cause I was curious about the name for the neural network. And yeah, it comes from the biological neural network.
So like according to Wikipedia, a neural network, also called a biological neural network, is an interconnected population of neurons.
And closely related, in machine learning, are artificial neural networks. So the machine learning models are inspired by the biological
neural networks and consist of artificial neurons, which are the mathematical functions, which are the
activations that you referred to, because those are activation functions that are designed to be
analogous to the mechanisms of the neural circuits. I mean, it's like, think about like,
I think the idea is that like, if you had to,
I'm trying to put this in like the simplest way I can, but like you,
you, you get some kind of input either visually, you know,
or one of your various senses,
you get some kind of input into your brain.
And so like all these like synapses fire, right.
But there's like a connection of those things that fire.
And there are people that like study the brains and they'll show like
different parts of the brains lighting up. And like,
why does that part light up for here? And for there, like, Oh,
there's memory there's love or there's anger or whatever the different things
are. Right. And, and that's, what's happening.
We're trying to model that same type of firing in math.
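One way to picture "firing in math" is a single artificial neuron: a weighted sum of its inputs passed through an activation function. This is a minimal sketch using the sigmoid activation; the weights and inputs are arbitrary example values.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1; higher means 'fires more strongly'."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """A layer is just several neurons looking at the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Arbitrary example values: two inputs feed two hidden neurons, which feed one output.
hidden = layer([1.0, 0.0], [[2.0, -1.0], [-3.0, 0.5]], [0.0, 1.0])
output = neuron(hidden, [1.5, -2.0], 0.1)
print(hidden, output)
```

Chaining layers like this, each feeding the next, is the "layer after layer" structure that comes up later in the conversation.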
Right. So that like in that cloud that you showed, uh, or that you referenced
in that Pinecone, uh, document, depending on what your context is, like the firing is going to
lead you down a specific part of that path. Yeah, it's super cool. Yeah, I probably dumbed that down
badly. Somebody's gonna say... No, that was a really good example, right?
I mean, this is not simple stuff, right?
I don't know.
Outlaw, were you on a call recently where some guy was geeking out on?
Yes.
Oh, my dear God, man.
I think we were talking about large language models or something. Yes, we were. And I was lost at slide one. Yeah, I'm trying to remember that specific
without getting too detailed, but I remember one of the slides was like layer after
layer after layer, and like how one would feed into the next and
feed into the next and feed into the next and things like that. Like, that's where, um, the
topic of machine learning is awesome. It's amazing. It's hyper complex, but it also varies greatly
depending on like that level of complexity. I don't say that to turn anybody off
from it who's not into it, because if you're on the using side of it, then it doesn't
necessarily have to be as complex, right? Right. I'm not trying to say that it's not still complex,
but it doesn't have to be as complex because there's plenty of libraries that have done the hard math for you. And I remember like back in the day we talked about how
Microsoft had various diagrams of like, hey, depending on what type of thing you want to
solve, like here's the type of machine learning algorithm you want to use. So there's that
kind of thing that already exists for you. But where it gets super more complex is if you want to be on the side of, let's create new practices and patterns and algorithms and new mathematical models to do that.
That's where it gets like hyper complex.
And that's where the conversation that you're referring to was kind of more on the path of, hey, what if we flip the script and we started doing things?
And that's the type of conversation for a very targeted audience that might not have necessarily been all in attendance at that particular time,
but it was still great, but also complex and hard.
Yeah, I mean, it would almost be like somebody doing a deep dive on data structure to somebody who'd never been introduced to computer science at all, right?
Like, that's the kind of level of things where people are just sitting there going, huh?
Right?
And, I mean, I have no doubt that if this talk was put in front of the right people,
people would be like, oh, yeah, yeah, yeah, you know, but put in front of people that like hadn't even been introduced to the concepts, right? Like think
organic chemistry before you get into any of it. Like it's that level of, oh man, okay, I
need to go back and do a bunch of studying. And that's why this was so interesting to me, right?
Like this whole vector embedding thing, like until you get to what you're actually trying to store, you're like, oh, okay, I get it now,
right? Like, I'm sure that all the stuff to get to this point is complex. But like outlaw said,
like, even even so Microsoft, the reason why they even publish that thing that told you the types
of algorithms that you might want to use for different situations is they've already got those
models, right? So, so Azure has tons of machine learning models that you can just use as a developer, which is fantastic, right?
And if you're doing .NET development, they have the libraries that you can just import in and say, hey, use this machine learning model for this particular data set.
And then once you took the output of that kind of stuff, you could use it to potentially drop things into these vector databases.
You know, I said Microsoft, and then I remembered,
as I started to go searching for that to include that, it wasn't Microsoft.
It was scikit-learn that had the really good one,
which is a Python library, you know, a super easy introduction to it, you know, a very friendly
library for playing around with machine learning. And also too, if you've never like done anything
machine learning and it sounds daunting, I think if you started with decision trees as like your first entry point into it, you'd be
like, oh wait, that counts? Like, oh, that's not so bad. And then you could get into
the more complex ones. Like neural networks are definitely among the more, uh, you know, complex
types that you get into it with. But I'll see if I can find that scikit-learn
drawing and I'll include that in the links. Excellent. Yeah. So we, uh, yeah, we've now covered
what this thing is, right? So it's pretty cool stuff. Like I said, I'd never really
heard of it, and after digging into it, I was like, oh man, now I want to use it.
But that means I have to go learn some stuff.
So, you know, and we led with like some of the things they had.
As a matter of fact, the Pinecone, again, their webpage was just fantastic.
They have an examples page.
And I only listed a couple of them in the show notes, but semantic search was one. Um,
they had chatbot agents, they had, um, retrieval augmentation, image searches. They had all kinds of stuff that you could do with this. So, I mean, if you can think of how you want some sort of
relationship modeling to happen, this is a good choice for you to use as a
data store. So we've actually talked about scikit-learn a couple of times. So, yeah, I believe so. Yeah, I
remember hearing it. Let's see, episode 152 and in episode 92. Well, maybe not 92.
Azure Cosmos DB came up in episode 92.
Cosmos DB.
I mean, that and some of the Google products,
and like I said, Amazon has them as well.
I wonder, let me see.
AWS equivalent of Cosmos DB.
What do they got?
Let's see. Oh, here it is. They're saying DynamoDB. I don't find that to be... unless they've just really extended what DynamoDB is, that doesn't sound right.
So there's a, um, a drawing from scikit-learn called "Choosing the right estimator,"
and, you know, it tells you like, okay, well, how much data are
you going to have? What are you trying to do? Are you trying to categorize this thing? Are you trying
to label it? And it'll group you into a type.
Let me say that again: it'll send you down a path of like, here's a group of machine learning algorithms that might be
appropriate for the type of job you're trying to do, and you go from there. Oh, that's beautiful. Yeah,
a nice little, uh, path. It reminds me of like a zoo map. Yeah, walking through it, literally like,
you know, you're at the mall: start here, you are here. Yes. Oh, this is beautiful. Yeah, definitely check that out. And also, by the way, I mean, since we're
talking about vector databases and how closely that ties into machine learning, like scikit-learn
actually has like a really good, um, starting point for machine learning type, um,
knowledge, you know, like algorithms and whatnot. And it's kind of friendly to,
to use.
Hey,
also,
um,
it sounds like he's saying "psychic learn," to me anyways.
It's like "science kit" learn, is what it is.
So scikit-learn.org (S-C-I-K-I-T dash learn dot org) for anybody that doesn't go to the show notes.
So,
um,
We'll just chalk that up as another proper noun that I can't say. It might also be that I can't hear well because my whole head's
congested. It's some combination of those two. All right. All right. Well, yeah, now I know all about
vector databases, and we only did three this time and we're still an hour and 40. Oh, wait, were we supposed to hit the record button?
Oh,
good God.
Awkward time.
Can we start over?
So with that,
uh,
like I said,
this,
I've,
as we've been talking,
I've been adding a bunch of links to relevant parts of things that we've
discussed.
So definitely check out the Resources We Like section of the show notes;
there's going to be a bunch of, uh, links in this episode. And with that, we head into Allen's favorite portion of the show.
It's the tip of the week. All right. I've actually got some juicy ones. So you know how like when
you're working... When you're sick and you say "juicy," that didn't sound right. That's...
That's about right. I hear your voice and I'm like, uh, no, it's very, very juicy. Here, hold this, it's juicy. I used the word "phlegm" on a show
before and I got roasted for it. As well you should. Yeah. That's a phlegm. All right. So
we won't revisit that too much, or maybe we should. Yeah. Um, so I thought it couldn't get any worse.
Right.
So check this out.
Like,
I don't know how you feel about medium in general,
like medium.com.
Like I get,
I get some of their stuff and I like some of it,
but some of it,
I'm just like,
man,
this is like just people writing things that are like just rehashed of
everybody else's.
Well,
I got an article the other day that I thought was actually really good.
And,
and I think outlaw, you'll like this one a lot.
Docker has built in some AI, we'll call it, to their stuff to where you can use docker
init on the CLI.
And what this allows you to do is you write your application, whatever it is, right?
Put it in Java, put it in Python, JavaScript, whatever you want to do.
And then you go into that directory where you've done all that stuff and you type in docker init.
And it will generate a very good Dockerfile for you, probably better than what you would have done from scratch, and it includes
some things that you wouldn't have thought about. And then you can just go in and tweak this thing
to get it to what exactly you want. But the goal is it's basically doing most of the hard work for
you. So that's pretty sweet. I was impressed by this. Yeah, I dig it. Have you actually,
have you tried it in like
real world? Like, because they're giving a Python example in this document. Like how well does that
translate to Kotlin? I have not yet, but like my plans are... "Are you using Maven and a POM? No, you're on
your own. Initialization done." No, I absolutely do plan to use this. But I mean, it's pretty sweet
what it puts out. And this person's whole point was like, they hate writing Dockerfiles, right?
I don't know that I hate writing Dockerfiles. But I'd love for like, if it just added best
practices to it, right? You know, like any kind of template type thing to start up an
application, right? Like I remember, first time I went to do React, I would use whatever their starter thing was, so it would generate
templates that made sense. I love the fact that this bakes in best practices. Well, yeah, I mean,
like even Angular, you know, you can create, you can stub out like a new, uh, controller or whatever,
a new page, and it'll include like unit tests and the service and all that kind of stuff for you.
So I'm all on board with that, but
I heard you say that you hate
to write Dockerfiles, and I was like, or the author does,
and I'm thinking to myself, man, am I sick? Because I
actually like doing that.
Yeah, it's an optimization thing.
No, no.
No, I think we probably need to see the same doctor because I like actually trying to figure out, hey, how can we make these things work more efficiently?
And how can we bundle these things in a way that keeps these images as small as possible?
It's an optimization thing that I just really like.
Yeah.
I,
I will take every Docker problem we ever have.
Cause I always enjoy it.
I never,
I never walk away going like,
Oh,
I hated working on that thing.
Like,
I don't know why I think I'm,
like I said,
like even with Git problems when people like,
maybe that's why,
like,
I'm just like,
no,
I actually like,
you know,
when people have these kinds of problems and I'm like, oh, let me see what we can do here.
Yeah.
And yeah, this author and I would not agree on that.
I think, I think Outlaw's a little bit sick, but that's fine.
That's totally fine.
All right.
So the next code or the next tip was, this one made me so happy.
I, okay. We've mentioned this site before. I know
we have, there's epochconverter.com. I use this site so much, so much. I feel like I should click
on a PayPal button somewhere and give them like, you know, five bucks or something. So, but here's
what I found out. It's fantastic,
right? You can drop in, um, an epoch. And if you don't know what epochs are, then you need to go
back to listen to our dating is hard episodes. Um, but basically anytime you're dealing with,
with, uh, time zones or times around the world, the simple answer is an epoch, which I believe is the number of seconds since, um, January 1st, 1970. Is
that right? I think, yeah, something like that. Yeah, January 1st, 1970. Um, so yeah, but people have made
it very hard over time besides just that, right? Because sometimes they'll do it in milliseconds,
sometimes they'll do it in nanoseconds. There's all kinds of ways to screw it up and make it really
difficult on everybody. So at any rate, they have some useful tools on
the page where you can just drop in milliseconds, nanoseconds, seconds, and it'll convert it to a
timestamp for you. You can just hit the button and it'll give you the exact dates. It'll give
it to you in your time zone. It'll give it to you in GMT. It'll tell you the relativity of when that time was to now.
Like, it's just so good. You can also... go ahead. Were you going to say... No, go ahead, you finish.
Okay. So you could also plug in like your month, day, year, all that kind of stuff. And it'll give
it back to you in an epoch, even an epoch in milliseconds or whatever. Like it's so good. What I failed to realize, and I seriously,
I was like, Oh my goodness. How did I never see this? If you scroll down the page and this is the
problem, everything they have is sort of above the fold. And it's so good. Like most of what you need
as, as just a, Oh, I need this tool is up here at the top. Well, if you start scrolling down the
page, they give you all
the information. What is epoch time? They just tell you exactly what I mentioned, not as well.
And then further down, how do I get the current epoch time in PHP, Python, Perl, Ruby, Java,
C#, Objective-C, C++11, Lua, VB, et cetera. They have the samples there. I can't tell you, Outlaw, how many times I've Googled
"how do I get epoch in Python" or "how do I get epoch in..." and, you know, you end up on 12 different
Stack Overflows where people are doing things that you're like, that's wrong. Yep. They have it
right here. Keep scrolling and it's got how to convert it from human readable to epoch,
how to convert Epoch to human readable in all the same languages.
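For Python, a hedged sketch of those conversions using only the standard library might look like this. The helper names `epoch_to_utc` and `utc_to_epoch` are made up for this example; the epoch itself is midnight UTC on January 1st, 1970, and the unit argument covers the seconds, milliseconds, and nanoseconds variants mentioned above.

```python
import time
from datetime import datetime, timedelta, timezone

UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_to_utc(value, unit="s"):
    """Convert an integer epoch value in the given unit to a timezone-aware UTC datetime."""
    per_second = UNITS_PER_SECOND[unit]
    seconds, remainder = divmod(value, per_second)
    micros = remainder * 1_000_000 // per_second  # datetime keeps microseconds at most
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=micros)

def utc_to_epoch(moment, unit="s"):
    """Convert a timezone-aware datetime to an integer epoch value in the given unit."""
    delta = moment - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * UNITS_PER_SECOND[unit] // 1_000_000

now_seconds = int(time.time())   # current epoch in seconds
now_nanos = time.time_ns()       # current epoch in nanoseconds
print(epoch_to_utc(now_seconds).isoformat())
print(epoch_to_utc(1708300800123, "ms").isoformat())
```

Integer arithmetic is used throughout to avoid floating-point rounding on millisecond and nanosecond values; anything finer than a microsecond is dropped because that is the resolution of `datetime`.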
Yeah.
I've,
I use this site a lot too and I'm surprised I went back and checked.
We have,
it's never come up before and I have used,
yes,
I have used this site.
I can't tell you.
I agree with you.
There should be like, Hey, let me buy you a cup of coffee.
Like countless times when, you know, like you're looking at some kind of a log and everything's in epoch or
something, or your,
your create timestamps are in epoch or something.
And you're like, okay, I don't read epoch. What time was that?
Can you tell me in my own time zone? Was that yesterday? I just created something.
Is this the thing I just created, or am I looking at an old version of it? I absolutely love this
site. I can't believe that in over 10 years this is the first time we've brought this thing up, and I've used it daily for five at least, maybe more. I mean, it's ridiculous how much
I use this website. Like, they probably have my IP pinned somewhere. Yeah, you probably have your
own dedicated node, because they're like, okay, anytime this IP address, he's over here,
forget it. Right, right, we know this guy. He's good people.
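The log-reading case described above (what time was that in my own time zone, and was it yesterday?) could be sketched in Python roughly like this. The names are hypothetical, the seconds-versus-milliseconds check is only a heuristic, and a fixed UTC offset stands in for a real time zone; the standard `zoneinfo` module would be the way to handle daylight saving properly.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: raw values this large are epoch milliseconds, not seconds.
MILLIS_THRESHOLD = 100_000_000_000


def to_seconds(raw: int) -> float:
    """Normalize an epoch value from a log to seconds."""
    return raw / 1000 if abs(raw) >= MILLIS_THRESHOLD else float(raw)


def describe(raw: int, local_tz: timezone, now: Optional[datetime] = None) -> str:
    """Render an epoch value as local time plus a rough 'how long ago'."""
    now = now or datetime.now(timezone.utc)
    moment = datetime.fromtimestamp(to_seconds(raw), tz=timezone.utc)
    age = now - moment
    if age < timedelta(0):
        relative = "in the future"
    elif age < timedelta(minutes=1):
        relative = "just now"
    elif age < timedelta(hours=1):
        relative = f"{age.seconds // 60} minutes ago"
    elif age < timedelta(days=1):
        relative = f"{age.seconds // 3600} hours ago"
    else:
        relative = f"{age.days} days ago"
    local = moment.astimezone(local_tz).strftime("%Y-%m-%d %H:%M:%S %z")
    return f"{local} ({relative})"


if __name__ == "__main__":
    utc_minus_5 = timezone(timedelta(hours=-5))  # stand-in for a local zone
    print(describe(1708300800, utc_minus_5))
    print(describe(1708300800000, utc_minus_5))  # same instant, in milliseconds
```

Passing `now` explicitly is only there to make the function easy to check against a known moment; in everyday use it defaults to the current time.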
Yeah.
So, yes.
So, again, link in the show notes.
Highly recommend epochconverter.com.
It's fantastic.
All righty.
Well, for mine, I'm going to continue on with the last tip that I gave, which was related to setting up a legacy contact for your Apple ID.
And I forgot to mention at the time, you can also set up an account recovery.
And there is a big difference of why you might want one versus the other. So the legacy contact is, let's say that you have someone that you're a legacy contact for and they pass and you then want to say to
Apple, like, hey, we need to get into their account for whatever your reasons are. It doesn't matter.
But I don't know their password, and I need to get into their account. And so you can go down that route and Apple will eventually give you access into
the account. But the downside with going the legacy contact path is,
once you go down that path, any of their devices that have
payment mechanisms on them, that'll be lost. You won't be able to use that,
which is not the big one. The big one to worry about is the keychain:
everything stored in the Apple Keychain will be gone.
You won't be able to gain access to that. And where that's a problem is Apple introduced the ability
to use the keychain as a password manager, right? And a password manager that can
be synced across devices. So if instead, if you have a family member that instead of using like
a LastPass or a Bitwarden or whatever, if instead they're using this, the built-in functionality in iOS
with Keychain, you would lose the passwords to all of their accounts that they might have,
which could be financial related, right? Bank accounts, investment accounts, whatever.
So that's the downside to that. And especially if you're doing this because, hey, they haven't passed, but for whatever reason they might not be able to, you know, they either forgot or they don't know or whatever. The point is that there is hope, there is an expectation, that like, hey,
they're eventually going to be able to use this thing again, right? If you go down that legacy
contact path, you would be losing all of that stuff from the keychain, which is important. Well,
that's good information. Also worth noting that Keychain will also sync across macOS, so it's not
just iOS, if you're using the same, whatever your Apple login is, across devices.
Yeah, your Apple ID.
Yeah, if you use your Apple ID, then that keychain is available across everything.
So if they're doing stuff on computers, then you don't... And the way account recovery can work, the expectation is, well, first of all,
this needs to be someone you trust, right? Because, you know, you need to ensure you need to
have some trust that their device can't get lost out in the wild or, you know, their credentials
get lost out in the wild. And then they are like,
oh, hey, let me like reset your account. But the expectation is that like, let's say,
Alan, that I'm an account recovery contact for Alan, then the idea is that Alan would say, hey,
I don't know my Apple ID password anymore. I need to reset it. Outlaw, can you give me the recovery code? Can you send me
the recovery code? And then I would be able to do that for you. And the hope there is that that's a
much faster process. I had to go through another process, where I gained a lot of respect, because I was like, oh, this is actually
super cool the way they do this. So the account recovery process, like I said, would be
fast, right? Alan forgot his password, and, you know, he tells the account recovery contact,
hey, I'm going to reset my password and I need you to send me the recovery code. And that's why it's a trust factor. You need to make sure that that person is, you know, not going to
lose their account, right? There is another downside to that, though.
There is a waiting period that happens with that. And I'm not sure. So in my case, I had this happen
with a family member where we didn't have a legacy contact or account recovery. And there's another
process that you can go through where they have a waiting period. And the waiting period exists
for the account recovery too, but I'm not sure if it's the same length of
time. But there's an Apple Support app, and I can download it and I can say, hey,
I'm trying to help someone else out. And you can even do this at the Apple Store. I'm trying
to help someone else out, let me reset their password. And what will happen is, from the moment you initiate that process, they will start a waiting period, right? And in my case, because we didn't have all of this, it was a 20-day waiting period. And I'm not sure if it's the same in the case of using the account recovery. So just a little bit of a disclosure there.
But the way that waiting period works is if at any point during that 20-day waiting period,
if in the case of like I'm trying to help Alan by resetting Alan's password,
if at any point during that 20 day waiting period, if Alan's account successfully authenticates to Apple, then that's it. The waiting period is done.
Not that the waiting period starts over. I mean that the waiting period is poof.
You can't do anything.
We're done with the waiting period. We don't care about the waiting period anymore.
And so what that helps prevent is,
let's say that I am, you know, malicious user and I'm trying to reset Alan's password.
And I'm like, Hey, let's get into this recovery situation. But Alan knows his password. He's like,
no, no, no. Let me authenticate right here. Right? Like the 20 days is gone.
But so when you say the 20 days is gone, not only is it gone,
it cancels your ability to even try and get into it at that point.
Right, right.
The process is stopped.
The process is over.
It's complete.
But if during that 20-day waiting period you never
successfully authenticate to Apple,
then, because in the example here that I gave
I'm doing this on Alan's behalf,
at the end of that 20th day,
I will get a notification, as well as Alan's Apple ID,
that, hey, your waiting period is up. You can go reset your password here.
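The rules of that waiting period, as described here, can be modeled as a tiny state machine. This is only a sketch of the behavior described in the conversation, not how Apple actually implements it: the class and field names are invented, and the 20-day default is just the number from this one case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    WAITING = "waiting"      # clock is running
    CANCELLED = "cancelled"  # owner authenticated, so the request is dead
    READY = "ready"          # waiting period elapsed, reset may proceed


@dataclass
class RecoveryRequest:
    account: str             # whose password is being reset
    helper: str              # who started the request on their behalf
    started_at: datetime
    waiting_period: timedelta = timedelta(days=20)  # assumption: value from this anecdote
    status: Status = Status.WAITING

    def on_successful_login(self, when: datetime) -> None:
        """A successful sign-in during the wait cancels the request outright.

        The clock does not restart; the process is simply over.
        """
        deadline = self.started_at + self.waiting_period
        if self.status is Status.WAITING and when < deadline:
            self.status = Status.CANCELLED

    def check(self, now: datetime) -> Status:
        """Move to READY once the full waiting period has passed untouched."""
        deadline = self.started_at + self.waiting_period
        if self.status is Status.WAITING and now >= deadline:
            self.status = Status.READY
        return self.status


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    request = RecoveryRequest(account="alan", helper="michael", started_at=start)
    request.on_successful_login(start + timedelta(days=5))
    print(request.check(start + timedelta(days=25)))  # Status.CANCELLED
```

The design point is that one successful sign-in is treated as proof that the real owner still has access, which is what keeps a malicious request from quietly running out the clock.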
And then what would happen is, when Alan would go to reset his password,
whoever helped him out, which in this case was me, would get a code that I would then
give back to Alan to say, here's the code to use to reset your password. And then you go through. The problem is, if your Apple ID is your iCloud account,
then you would never see that email saying, Hey, you can reset your password because you can't get
into it. That's the whole problem, right? But I would get that notification, right? Because
when you would go through that process, you would say, hey, we're going to use Michael, so
send it, I trust him, send it to him. And I would get a notification. Party code?
Yeah, I would get a random, you know, thing like, oh, hey. And it'd be kind of cryptic, because
all you're going to get at the time is, like, you can now reset the password, right? But, you know, it's not going to be like, hey,
for a specific case number, or here's the code to do it. None of that's going to happen. You're just
going to get, you can go do it. And you're like, well, because I specifically know that it's Alan that
we're talking about in this case, I can go do that one. Right. But they're not like,
think about it from like a, there's no context. They don't want an information disclosure. Yeah.
Right. Exactly. Exactly. From an information disclosure point of view, there is no information
given to you, you know, for context, but because I do know what's happening, that's the only reason
why I would be able to do that. Right. So it's so super cool. Yeah. It's awesome.
I was like so super impressed and happy.
Now,
you know,
I'm not gonna lie.
You're going through that 20 days and it's a bit stressful,
right?
You know,
because like,
think about every account that you or a family member might have, where
either you have your password stored in something on the device,
like in Keychain, for example, or even in a LastPass or whatever.
But, you know, if you don't have access to the device, you might not even know what they
were using as credentials to get into the LastPass or whatever. And they might not remember, especially if you're using one to store the
other, kind of situations, because people sometimes do that too. But even from a two-factor point
of view, right? So Apple introduced the two-factor authentication ability to iOS many versions back, years
ago now. Right. But you're not going to get any of those two factor authentications anymore.
And then think about all the two factors that can happen with, um, um, like as text messages or,
or, or, uh, you know, automated, you know, robots calling you to tell you like your authorization code is
one, two, three, four, five, six. Damn, that's the same as my luggage. You know, something like
that, right? You could lose all of that. And furthermore, in this specific case, what happened was, because the ID was entered incorrectly
so many times, the phone itself was completely bricked, to where you can't do
anything with it.
It literally,
the phone will bring up a message even on a reboot.
It doesn't matter.
It'll bring up a message to where it'll say something to the,
I'll see if I can find the exact wording,
but like this iPhone is unsupported.
And it'll only let you make emergency phone calls or reset the phone,
but to reset it to,
you know,
in other words,
to wipe it and,
you know,
reinstall everything you have to know,
you have to know the iCloud,
you have to know the Apple ID password in order to reset it. So you're in this catch 22 where I
have this device that I can't use. You can hear it receive text messages, but you can't see the
text messages and it won't even ring for a phone call. So you can't even receive a phone call
because some of those like two factor authentications, like I said, are instead of text messaging, call the number,
but you can't do any of that.
So like you can really get into a bad situation.
But point is you can short circuit a lot of this and make your life a lot easier
by setting up account recoveries.
And while you're at it, you can set up legacy contacts.
But you, like I said, do it for account, especially the account recoveries.
It should definitely be trusted individuals that you don't, you're not concerned about like them
losing access to the device or their credentials and then the problems that could come with it.
So what he's talking about is, and I don't know the age of everybody listening to the show, but, you know, this is basically along the lines of, of, something, right. Or vice versa, if it's your parents, right? This is something that's not a pleasant topic at all for anybody to ever talk about. But doing this planning up front, before anything does happen, is way less painful than having to deal with it when you don't have access to
anything, right? And you don't know how people pay their bills or how they logged into
things or whatever. It's probably a conversation worth having with whoever's important
to you, to try and nail some of this stuff down. Yeah. Well, so, along those lines too,
I think we may have talked about this.
I don't recall.
Did we talk about that?
Like,
um,
LastPass has this kind of ability, they call it emergency access. And what
can happen,
if I recall the way the LastPass version works, is that I can say,
Hey,
Alan can have emergency access to my account, but I can establish what
that waiting period is. So I can say that Alan can request the emergency access to my LastPass
account, but he has to wait, let's say, three days. And during that three-day window, I can be
like, no, don't give him that access. Or if I've passed, for example, or I'm in a coma or
whatever, then obviously I'm not going to be able to respond to that. And after that waiting
period has expired, then Alan would be able to get access into it, right? And I would have
to assume that, you know, competing products like a 1Password or a Bitwarden, you know,
that they would have a similar concept to it. Um, but yeah,
the point is to Alan's point that, you know, regardless of what your age is like, you know,
you're either on one end of the spectrum or the other: either you're trying
to set up this type of stuff so that people have access to your stuff when you pass, or you need
to set it up so that
you have access when you have loved ones that pass or, you know, whatnot. So, yeah, it's
just lessons learned, you know, going through these sucky situations. But especially with
technology nowadays, making it to where you have access or availability, or your loved ones have access
or availability to yours in case something happens, is, I mean, not pleasant, but probably worth
spending some time on. Well, I mean, if we're going to get into boomer hour for just one
moment, let's also think about, too, because, I don't know about you, but we try to be as paperless as possible.
Right.
And, you know, imagine if you did not have access to any computing device, like you've,
you've forgotten, you've lost everything, your computer, your, any kind of tablets,
any kind of phones, like you don't have access to it, period.
Right.
And now someone else from the family is like, well, I don't even know what you have. Right.
Like there's nothing there. I don't know what bills you have, what bills might be coming in. How would I even be able to get into it to know, right? So yeah, it
can be eye-opening when you have to go through it. Yeah, for sure. So save yourself the time and do this
ahead of time. Hey, see how we're in boomer hour, or boomer after hour? Yeah. Hold on, so my wife
the other day, so I've been like sick all week. So I've like been self-quarantining, right?
Yeah.
It's just the sexy Allen voice today.
Oh, that's right.
I do that sometimes just for fun.
Um, sexy Allen, the great.
Uh, so I've been like holed up in my office all week.
Right.
And my wife was like, man, I don't feel like making dinner at night.
She was going to order pizza.
Like, you remember what it costs to order two large pizzas from like Papa John's? We're
really going down boomer hour. Okay. No, I'm legit. Okay, yeah, yeah, yeah. I'm okay. Right. So,
yeah, what do you remember spending on two large pizzas from like Papa John's or Domino's
or Pizza Hut? I mean, what's a reasonable figure
for you? Huh. I mean, I can remember when Domino's used to have like the special $9.99 pizzas, right?
Right, right. But, you know, that might date me, and I don't want to say that out loud.
It might be too late. Years ago. Am I too late? So my wife, like, two large pizzas from Papa John's,
they were like 65 bucks.
She was like,
no,
thank you.
Um,
so then I,
I guess she was like,
you know what?
This is ridiculous.
I'm going to call up like Pizza Hut, Domino's.
They were all pushing 60 bucks.
We're talking about Domino's and Papa John's, man.
Like we're not talking.
Yeah. I mean, look, it's totally fine pizza.
I like Papa John's.
It's just fine, right?
Like, I like most pizza.
But we're not talking Mellow Mushroom with their holy shiitakes and all this special stuff, right?
We're talking about freaking bread with cheese on it.
I'm trying to figure out how you got to $60 for two pizzas.
I'm trying to go through this now.
So here's the crazy part.
Order online right now.
Domino's.
Part of it was she was going to have it delivered.
And the delivery was going to basically be $20.
There was a delivery fee plus a mandatory tip and whatever.
And it was like $20.
So the total, like $45 was for two pizzas plus $20 of delivery or 40 plus, you know, 50.
And she was like, I can't like, I mean, dude, I'm, I'm with you. Like, okay, let's even adjust
for inflation over the past few years. Like I'm still thinking, okay, maybe $35, you know,
and then plus maybe a $10 delivery tip or something, fine. But no, these things were bumping
into 60, up to almost 70 bucks for two large pizzas. And she's like, I'm not doing it. So she
got her a freezer pizza, threw it in the oven, right? She's like, I'm just not, I can't. So yeah, I can't
go down that path with Domino's, because they want me to either sign in or give the
address. And I'm like, there's another boomer hour thing: freaking give me the stuff on the website, man.
Don't make me sign in to do everything in this world. I'm so tired of all those. Yeah. Wow, boomer
hour really got serious there for a minute. I thought we were jokingly referring to boomer
hour, and then we really got into complaining about the prices of stuff. So yeah, she
was mad. She was so mad that she skipped buying pizza from one of those places. So yeah.
All righty. Well, for this and more boomer hour, subscribe to us on iTunes, Spotify, whatever. We're
done. Goodbye. That's right. Hey, no, go to Slack. For real. Create stuff on Slack.
Go to Slack.
CodingBlocks.net slash Slack.
You could let it go.
I was like, I'm going to tease him here and see what happens.
And nope.
All right.
We're done.