Coding Blocks - Designing Data-Intensive Applications – Storage and Retrieval
Episode Date: March 2, 2020
In this episode, Allen is back, Joe knows his maff, and Michael brings the jokes, all that and more as we discuss the internals of how databases store and retrieve the data we save as we continue our deep dive into Designing Data-Intensive Applications.
Transcript
You're listening to Coding Blocks, episode one... how many? I don't know. We stopped... we've
been together too long, we haven't been together long enough, it's been too long, I can't remember
what number we are on. 127, let's call it.
I think that's it.
All right.
Subscribe to us and leave us a review on iTunes, Spotify, Stitcher, and more using your favorite
podcast app.
And check us out at codingblocks.net where you can find show notes, examples, discussion, and episode numbers.
We have a Slack, too, that you can hit us up at.
And you can email us at comments@codingblocks.net, too.
I got left out, man.
And you can follow us on Twitter at CodingBlocks
or you can head to www.codingblocks.net.
I actually trademarked that.
And you can find all our social links there at the top of the page.
With that, I'm Allen Underwood.
I'm Joe Zack.
And I'm Michael Outlaw.
This episode is sponsored by Datadog, a unified monitoring and analytics platform built for
developers, IT operations teams, and businesses in the cloud age, and educative.io.
Level up your coding skills quickly and efficiently, whether you're just starting, preparing for an interview,
or just looking to grow your skill set.
And Clubhouse is the fast and enjoyable project management platform that breaks down silos
and brings
teams together to ship value, not just features. All right. And today we are talking about the
data structures that power databases, based on the third chapter of Designing Data-Intensive
Applications. And this is my favorite chapter so far. And so I'm very excited about it and
glad you're here to be with us on this journey.
Hey, in fairness, each chapter that you read after the previous one was your favorite one,
right? Like that's kind of how it happened. Yeah. So, but I have read chapters since this one.
Oh, this is still the one that like sticks out to me. So there's other chapters that I like,
but this was the one where I was like, all right, let me get the popcorn. All right. I dig it.
Uh, well, I mean, I've got an opinion on that.
I would say that I'm not saying that the other chapters were bad, but compared to like past books that we've covered and everything,
I'm like, eh, you know, whatever.
I mean, it's good stuff.
What?
It was good stuff.
This is my favorite book that I've read so far.
But this chapter that we are about to cover tonight.
Okay.
Oh my
God.
This is
like the one chapter
like if you only
have one chapter of anything that
you are ever going to read for the rest of your
life and you want
to be a developer,
you need to read this chapter.
It's good.
It is.
It is.
Not only is it good,
it's that good is what you meant to say.
Yeah.
Yeah.
I corrected that for you. Yeah.
Because like,
I think back on it and I'm like,
man,
I wish I had this chapter at the start of my career.
Hmm.
It,
yeah,
we'll get into why here shortly,
but first.
Okay. So yeah, but first, uh, you know, as we like to do, we like to say thank you to everybody that took the time out of their busy day to leave us a review. Collector of Much Stuff, Momentum Mori, Brian Briefree, Isla Dar.
Oh, you put the L before the Y.
You can't do that.
I put the L before the Y.
Oh.
Isla.
Why are you still laughing at me?
How about Isla Dar or Isla Dar?
Okay.
There you go. That's what I was going to say.
Yeah, right.
And then James Speaker.
Very good.
All right.
And I got to say a big thanks to iDigily.
Appreciate those reviews.
You know we live for those.
So thank you very much.
Yeah.
And like, okay.
So I don't know if you gathered some of my excitement at the start, right? But I
think it's been a minute since the three of us have been together. It's been a little while, yeah.
Like at least a month. It's been kind of crazy. And it feels like it's, well, it actually feels like it's
been longer than that, though, right? Because, yeah, recording-wise it might have been a month,
but, you know, tack on another two weeks or so before that.
It's been a while since the three of us have gotten together.
We've all been crazy, crazy busy.
And I think even the last episode that we recorded, we weren't together.
I think you had to record remotely, if I remember right.
So this might be the first time this year.
I'm like giddy, like,
my friends are over!
Mom, can they
spend the night?
That's a great segue to say
that we're actually going to be hanging out in meat space
at Orlando Code Camp coming up
March 28th this year, and registration
is open by the time you hear this, and it's
free. And so not only do you
get to hang out with us, but free lunch and a free shirt. So if you are anywhere within travel distance
to Orlando on the 28th, you should come on down, because it's gonna be awesome. Looking at about
like 100 talks from a ton of speakers. It's gonna be fantastic. 14 different rooms, just jam-packed
with free, awesome, great talks. And us.
And us.
And we went down to it last year, and I think both of us spoke at it as well there.
And it really is a great event.
Like, Santosh and the people who put that thing together over there, like, they do a killer job.
So definitely come.
I mean, you'll learn a lot of stuff, and it's fun.
And I'm giving some sort of talk on Kubernetes. And Joe, you're giving some
sort of talk on... we're gonna be tracking UFOs with streaming architectures, Kafka, and GraphQL.
Very nice. Technically, we all three spoke at that conference, if you recall. Oh, Outlaw did a lot of
speaking. But no, no, no, I'm not talking about at the booth. I'm talking about
like in the rooms, remember? Oh, you were part of the panel. Well, that's right. I don't really count that, because you showed up like
in the last three minutes, like a boss. He was not fashionably late.
He showed up, and he did get carried out of the pre-party, almost by a guy. You were on
his shoulders. He was like taking you off to some cooler party. I don't know. That was weird.
Yeah, you're definitely telling this like in a weird way.
Yeah, go meet us for drinks.
We'll tell you the full story.
Yeah, that's right.
So definitely, if you're going to be in the area, come hang out with us.
Come talk to us. We definitely love to meet you all and come to our talks.
Hopefully, they'll be good.
Yeah, you'll definitely find me at the booth.
So definitely stop by, say hi. I'm sure I'll have some swag there for you to pick up.
And this year people won't be writing their email addresses down, right?
Like that stuff. We might've upgraded. It'll be a much smoother experience.
Can't read your handwriting. So also I want to mention for this episode, go ahead and drop that comment on the website,
and you will be eligible to win a free book that we'll ship to you.
International is totally fine.
We love it, in fact.
So go ahead and do that right now while you're thinking about it.
And that book would be Designing Data-Intensive Applications, as that is the topic we are covering.
And the most exciting chapter that we're going to dig in tonight,
storage and retrieval. Now-
We should say we're not going to finish that chapter tonight because you know how we do.
Wait, how do we do?
You know how we do.
Very long way.
Yeah. But I know what you're thinking, because you're like,
Outlaw, hold on. How can storage and retrieval be the most exciting
chapter of the book, and like the single chapter of any book that you should ever
read? It is. Have you ever wondered how databases work? That's why. I mean, yeah, let me put it to
you like this. We've each been in our careers for a minute and,
uh,
you know,
been using databases.
Did you ever think to take the time to think about like how the data was being
written to disk?
No.
Right.
It's something easy to overlook,
right?
Well,
you just assumed it was boring,
right?
Yeah.
That's the thing,
right?
Is you assume that,
well,
I mean, they figured all this out.
Why do I need to think about it?
I just need to think about the SQL queries.
There's the thing.
They figured it out.
Why do I have to think about how they did it?
Right?
I don't, right?
Like, why do I care?
I don't care.
All I got to do is focus on, like, does my query perform?
Do I need to add an index?
Is there already an index there I can use?
Blah, blah, blah.
Yep. And we already mentioned how I incorrectly think that we are in the golden
age of database systems
because that's not actually what golden age means.
But I still feel that way
because there are so many good choices to make
and it seems like we had a kind of explosion
of them a couple years ago. And
after kind of reading this chapter and
reading the rest of this book, I feel like I understand why there are so many, why they have differences,
why there isn't just one to rule them all, why they all exist, and the kinds of trade-offs
and things you have to consider when choosing one. And by looking at like a deep dive on
how it works underneath, I feel like I'm able to tie it into other things I've known about, like
data structures and algorithms,
trees,
things like that,
that I,
you know,
I kind of know a little bit about.
And so like bringing these two things together,
two worlds that I know a bit about, and finding a commonality between them has just been really exciting.
And,
and I don't want to take away anything that's coming up,
but like,
that was definitely one of the things that I loved about this chapter was that
it does talk about, it basically is like,
okay, hey, let's just think about this. Like, what if we had to write
our own database from scratch? Like, where might we start, right? And he starts off with just
writing a key value pair to a flat file using two simple bash functions that he creates in his script,
right? And just starting out small and then starts building on there. And then
as the chapter progresses and moves on, then the more complicated concepts that Joe just mentioned,
where he starts talking about, where you would start to think about like, Hey,
this is where other data structures might be beneficial, right?
This is where a B tree might be helpful.
This is where an LSM tree might be beneficial. Like those start,
those things start to like crop into the conversation. Right.
Just so organically.
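To make that concrete, here's a rough sketch of that starting point. The book does it with two tiny bash functions, db_set and db_get; this is just a Python analog of the same idea, with the file name and record format made up for illustration:

```python
# Minimal sketch of the book's append-only key-value idea
# (the book uses two bash one-liners; this is a rough Python analog).

DB_FILE = "database.txt"  # hypothetical file name

def db_set(key, value):
    # A write is always an append to the end of the file.
    with open(DB_FILE, "a") as f:
        f.write(f"{key},{value}\n")

def db_get(key):
    # A read scans the whole file and keeps the LAST match,
    # since later lines supersede earlier ones.
    result = None
    try:
        with open(DB_FILE) as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    result = v
    except FileNotFoundError:
        pass
    return result

db_set("alan", "123 Peachtree St")   # made-up data
db_set("alan", "456 Marietta Ave")   # an update is just another append
print(db_get("alan"))                # prints the latest value
```

Writes are a cheap append; reads are a full scan that keeps the last occurrence. That asymmetry is exactly what the rest of the chapter keeps building on.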
But do you think, I mean, just to set the ground here, do you think that this is more interesting to us now because data is now so massive, to where we need to understand that stuff, because those choices of the systems that you implement or adopt actually have a huge impact on how things work?
Is that why this stuff now feels more important to us?
Yeah, I keep wondering.
Is this so interesting to me because it's genuinely interesting, or is it just interesting to me because of its relevance to my day-to-day life?
And I don't know the answer.
I am guessing by the comments and feedback that we've been getting that a lot of people find it very interesting too.
So we've got that going for us, I guess.
Yeah, true.
I think for me, it's just a part of it is just having taken things for granted. Like, like,
I mean,
it's one thing to even talk about,
like,
did you ever care about how a database is written to disk?
Did you ever think to care about how a cube is written?
Right.
Right. Like,
yeah.
Like who cares?
Like OLAP.
I don't care.
Right.
Or at least I never thought about it before.
And now I,
you know,
reading this,
I'm like,
Oh man,
that's so awesome.
Right.
Yeah.
I mean, I'll tie this into some of the stuff that I know Joe and I have been working on. You look at things like a technology from Google Cloud called BigQuery.
And one of its claims to fame or one of the reasons why people want to use it is because they wrote their own storage format for the data that comes in
because it enables them to do faster analytical queries and that kind of stuff, right?
So that's all stuff that ties into what we'll be talking about here
and what we'll be continuing on as we get on through this chapter.
So I guess with that, let's go ahead and start with just some basics, right?
Because get everybody on the same playing field here.
Yeah, so I went to Wikipedia and looked up what a database actually was
because, um, you know, we throw that term around a lot. And I think a lot of times people will
just think of the databases that they're kind of used to using, but I really
wanted to kind of hone in on the definition because this book kind of starts separating
things a little bit and talking about the various different parts, particularly like the storage
engine and like the query engine. Like a couple episodes ago, we talked
about specifically the languages. And we talked about even how some graph databases, like you can
swap in and out, like the language and the syntax that you're using. And it is responsible for
mapping that to what it actually performs underneath. And so, you know, I just wanted
to kind of go take a look. And the basic definition, no matter where you look it up,
it's basically just like, yeah, it's organized data,
and sometimes you can access it.
And that's kind of it.
So the example that Al mentioned of writing with bash scripts to basically just update a file,
yeah, I mean, that's kind of a database.
That's it.
Just a collection of data.
And when you talk to a developer about it, though, of course, when you of a database. That's it. Just a collection of data. And then when you talk to a developer about it, though,
of course,
when you say a database,
like we've got these kind of these preconceived notions,
like,
uh,
we're generally talking about a database management system at that point.
We're talking about the database.
Yes.
But we're also talking about like the APIs and all the things that kind of go
around with that,
like the,
the query languages and,
you know,
even like the ways like you,
uh,
can organize that data,
either like sharding or partitioning or accessing the way you control access to it,
like whether, you know, like users or what you can do, like file permissions, work, stuff like that.
So those are the kind of things that we usually think of when we talk about databases.
We're really talking about that whole system there.
I guarantee you, if you asked any member of your team, your development team,
hey, I need you to create me a new database. No one is going to start with a file, right?
They're going to immediately jump off to like whatever the platform of choices that your group
already uses, be it Postgres or SQL Server or Oracle, whatever, MySQL, they're going to jump
into that and that's what they're going to create.
That's going to be your database, right?
Notice Microsoft Access doesn't count, right?
Even though technically it is.
It technically meets the definition here.
So one thing I want to point out is we said database management system.
There's actually two kind of big flavors of these things that are worth calling out
is you'll typically see them called RDBMSs, for relational database management systems. So all the ones that Outlaw
just said a second ago, SQL Server, Oracle, MySQL, Postgres, those are relational database systems,
right? And then you have the other ones that we've talked about in previous episodes that are
your NoSQL or your document databases, right? So your MongoDBs, I think CouchDB falls in there.
There's a lot of those, right?
So they're both database management systems
because they both have those APIs and those access controls
and all that kind of stuff.
But there are different technologies sitting on top of them
that turn them to either relational or document database storage.
So just keep that in your head that a database system is still nothing more than a collection
of data, right?
And how it's stored is the big difference in how it's used.
Yeah.
And you know, you mentioned how someone says create a new database.
I go to SQL Server and I right click and I do that.
But what's funny is like, depending on what databases you're using, some of the more modern
ones are multi-model now.
So if you go to like Cosmos and say create a new database, it's like, okay, well, tell me a little
bit about your use cases. Same with Dynamo. And even, you know, MySQL has different storage
engines that are better for different things. And so it's just kind of funny to see that our world
is expanding, which can be frustrating, because those are all new things that we need
to understand, and there's trade-offs associated with each of those decisions.
But it's also an exciting time to live.
There's things that are evolving and growing and things that we can do easily now that were really hard to do a couple years ago.
Yeah, when you started to go down the path of the two systems, for some reason,
I wasn't even thinking about relational versus document, even though that was just a topic of a recent episode that we did.
I thought you were going to go down the path of OLTP versus OLAP.
Right.
I just assumed that's where you – so maybe there's three.
Yeah, there's even more.
I mean –
Have we talked about that on this show?
I don't know in depth.
It's coming up in this chapter.
I don't think we're going to get there tonight, though.
Maybe we'll see.
So it'll be a surprise.
Let's keep hope alive, man.
One thing to point out here, though, is like Joe said, things are changing a lot.
Keep your eyes open.
I've actually got a blog post coming out about that.
Be aware of the things that are out there.
Don't just do what you've always done because you've always done it, right?
That doesn't necessarily make sense.
Look at what the use case is and pick the tools that make sense.
Right?
Oh, man.
We've always used VB.
And so tonight we're definitely going to be focusing on –
TIOBE says that's a big one.
You should use that one.
That app or that page hasn't been updated in like 10 years, I'm pretty sure.
So tonight we're going to be focused mainly on how things are basically stored and retrieved.
We're going to start going down this path.
The first one is kind of talking about why you should care about how the data is stored and retrieved.
And that's kind of something that before reading the book, I would have thought like, well,
I have no plans or interest in competing with Oracle or SQL Server or whatever.
So why should I care?
There's been enough to know how to perform well and how to write good queries and how
to use the analyzer and stuff in order to get the performance that way. Why should I care? Is it enough to know how to perform well and how to write good queries and how to use the analyzer and stuff in order
to get the performance that way?
Why should I care?
What I'm getting at is that
you also need to be able to
make choices about which storage engines
to use. If you don't understand the trade-offs
and why, say, Elastic is good at
some things and the things that it's
bad at too, then it's really easy
to either get suckered by marketing, or to go with
the default decision rather than actually making a decision.
So I think it's important to kind of have that knowledge.
And it's also just really fun to understand.
Well,
the part that I liked was the actual statement from the book.
Hey,
just because you're not going to create your own storage engine from scratch
doesn't mean that you shouldn't understand it.
Because like you said, it'll help you choose things better, right?
Like, yeah, like you said earlier, when was the last time that somebody was like, hey, we need a database?
And you were like, all right, I'm going to go write some bash shell scripts, right?
Like that's not what happens.
So, but understanding it is hugely important, I think.
I mean, this whole chapter focuses mostly on how you can optimize things for writes and how you can optimize things for reads.
And then where the trade-offs or balances are between those two needs in different paradigms that might exist, right?
And with the goal of, by the time you finish this chapter, you should at least have enough
of an understanding that you can then pick which technology best meets what your main
use case that you're trying to solve is going to be.
If the thing that you're trying to solve is going to be, I need very fast reads and the writes are less important, right? Then at least you can have an
understanding as to like what you should be looking for in your engine. You know, I like to
always bring up that course I took on Educative, Grokking the System Design Interview. You would
not believe how many times you're reading through like the Twitters or Ubers or whoever's architectures.
The question of what data storage you're going to use boils down to first deciding what your
read and write traffic looks like.
Because that informs, it lets you pick a whole kind of category.
And then when you start thinking about what you're going to query and whether it's like
transactional or basically doing aggregations and analytical, that's like a whole other
category. So you can just eliminate a huge number of choices by knowing those two things alone.
And it's really exciting to kind of see like, oh, if I start with these things,
like I can immediately kind of hone in on some things. And I bet if you kind of take a look at
slicing the business use cases on whatever you're working on today and thinking about what you're really doing with them,
you might quickly figure out that you are using a suboptimal solution. And that could be fine.
You know, maybe it's working fine for you. It doesn't mean it's, you know, that you shouldn't
still do that because, you know, you've got experience with it or you've already got it or
whatever. But it's still, it's good to know that like, Hey, there are tools that are specifically
designed for the things that I'm doing. And,
you know, hopefully,
if we've done it right
and this all works,
then where you're struggling with the things that that system is,
you know,
traditionally bad with, or where those mismatches are,
where the requirements don't quite match
the storage engine that you've got,
you should probably be feeling some
contention there.
And this will kind of explain why and what you could do about it.
And maybe it can't be stressed enough, in case you weren't already trying other things or didn't already know: you shouldn't just rely on, say, your SQL Server instance to
try to be your everything. But you know what, to be fair, and I agree with that, but here's the reason why people do,
is over time, the SQL servers, the oracles of the world, even the Postgres,
they've turned into Swiss army knives, right?
Like, if you need to schedule a job, it's built into it.
If you need to do some analytical type stuff, you can do it.
The SQL syntax is there. So it's understandable that
everybody's latched on to those things and they don't want to walk away from them because if you
know how to write a query, then you're like, hey, I know how to get out of this thing what I want
to get out of it, right? But what Joe just said and what Michael's getting at as well, the important
part is it might be suboptimal. So yeah, it'll work, right?
How many times did you fight that thing where it's like, oh, the query is now taking 30 seconds.
You know, it used to take half a second. Oh, well, you've pushed it past what it should do. Now it's
an online transactional database and it's your reporting database and it's your analysis database.
So now it's doing all these things and it's doing them all suboptimally now because they're all contending for those same resources.
So it's fair to know that you're probably doing it, but you're probably doing it because they built them to be able to do all this stuff, even though they might not be the ideal solution for it.
Yeah, your comment about like, yeah,
they are Swiss Army knives, but it can be suboptimal at other scales, is similar
to what Jeff Atwood has said, that everything is fast for small N.
So, yeah, you might be able to get away with text searching in SQL Server, which is a feature, right?
And, you know, if it's not at a large scale, that might be fine.
It might be good enough.
Yep.
But for an Uber?
Yeah, they can't do it. And I actually really like that Swiss Army knife analogy because Swiss Army knives have those little screwdrivers on them.
And you can totally unscrew a screw with that thing.
And you will be frustrated by the time that you're done with it.
But it would be a whole lot easier if you just had a Phillips head screwdriver that you could go do it.
They'll both get the job done, but one is going to give you a lot better experience than the other, right?
But the bonus is it comes with a toothpick, so you're probably okay.
It does.
I mean, that makes up for it.
Yeah.
And what is the toothpick in SQL Server?
Do we know?
Ooh.
Yeah, I don't know.
All right.
Leave a comment.
Let us know.
We'll build a book.
Right.
All right.
All right.
And so in this chapter, you should have a lot more knowledge about how to kind of choose and evaluate storage engines.
And that's really powerful and really interesting.
So now it's time to get into the fun stuff, right?
Like where we start digging into what is actually happening.
So, yeah, this is the part where you'll kind of get an appreciation for it.
So in the example that Outlaw started with earlier where there were two bash statements, right, and one that was, you know, set, and I think another one was get, the way that it works is it's an append-only file, right?
So you have this text file, and every time you go to write a record, you have a key and you have a value is the way that they're going about it in the book, right? And let's just say that the key is your
name and the value is, I don't know, a document about you, a contact information, right? So you
have outlaw that's going to write a line. It's going to have outlaws, the key, and then the
value is going to be his contact information, right? Alan, same thing. Joe, same thing.
The important part here is it's always write only, right?
You mean append only. Append only, yeah. Good point. You're always basically opening up that
file, writing to the very end of it, closing the file. So every time, if I update my address,
then I have a new line at the bottom of that file and I'm now in there twice.
Yeah. Now here's the beauty of that approach, because if all you're ever doing is appending to the end of the file, all you have to
do from a right perspective is just seek to the end of the file, boom, add your new line and you're
done. So what you're describing is a write-enhanced file format. You're just appending to
the end of the file, and it's super duper fast. Like
you said, it seeks to the end. It already knows where it is. Every operating system on the
planet is highly efficient at doing this. Yeah, I mean, for the most part. Most file systems, you know,
there's some differences, but they'll have a pointer to where the file starts, and they'll have like a
size, or, you know, basically some sort of indicator of where it ends. So just like array access, you can hop right to the end. Even better,
if you go ahead and just leave a thread open with that file open, and just have that one writer
constantly streaming data to it, you can even skip those steps. So all it is doing is just moving
data, you know, into that file. And I don't know if you're familiar with like zero copy,
but basically there's a couple ways you can kind of short-circuit a couple
of things in operating systems,
modern operating systems, where you can actually skip running through RAM if
you're writing like to a file.
So you can go directly from like a network card to disk,
which is crazy.
And there's some,
some caveats around there,
but that's kind of the gist of it.
So,
I mean,
we were talking about super duper ridiculously optimized,
like can't really imagine a better way to write data.
Yep.
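And just as one small, hedged illustration of that zero-copy idea (this is only one flavor of it, the file names are made up, and file-to-file sendfile is a Linux thing), Python exposes the kernel's sendfile call, which moves bytes between two file descriptors without bouncing them through user-space buffers:

```python
import os

# os.sendfile asks the kernel to move bytes between two file descriptors
# without a round trip through user-space buffers.
src = os.open("database.txt", os.O_RDONLY)
dst = os.open("database-copy.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
os.sendfile(dst, src, 0, os.fstat(src).st_size)
os.close(src)
os.close(dst)
```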
And one of the things that they like to call out in this particular section is this file that we're talking about is called a log.
So typically, as application developers, we think of logs as, oh, that's where the web server log is or that's where this log is.
My application log.
Yeah, my debug output, whatever.
I'm using Apache log4net, log4j.
All log means is a write-only file, right?
So that's what they're talking about.
Append only.
And append only.
Yes, not write only.
And append only, always writing to the end of it.
So that's the important part.
So they call it a log, and the thing here that is key also is it doesn't have to be human readable.
And in many cases it's not because it's not the most efficient way to store that data.
There's some beautiful ways they talk about later.
Yeah.
So, so just be aware log and not human readable, but it is append only.
And there's pointers to those, those keys or those records, right?
And already you can start to make connections in your
mind because then you start
talking about, well, it's a log.
You're like, oh, transaction log.
Transaction log, that's a thing.
For all databases.
We're talking about databases here, transaction logs.
And we're already starting with a very
quick definition of this log.
Yes, we were talking about the very beginnings of what an actual database is.
Yeah, it's funny.
Like you said, let's talk about databases.
And then you start talking about logs instantly.
Like my mind was like, who cares?
Okay, I guess we're getting to, you know, like how transactional systems work.
But no, it's like it's literally talking about the ways to quickly write data.
And the deal is, and the reason that, you know, in addition to just being efficient and good at appending is that if you think about the opposite, if you were going
and writing to a spot in a file, that means you have to seek for it. You have to find it. You have
to go to it. If that information is larger than the information that you're updating, then you've
got to make room by basically shifting everything else to the right. And if it's smaller, same thing,
you have to shift all this data. So it's grossly inefficient compared to appending. So we're not talking about
a micro-optimization here. We're talking about essentially
an order of magnitude difference over appending.
Hey, when we talk about this thing as a log,
does anyone else think of Ren and Stimpy
while we're talking about it?
A log, a log.
What rolls down stairs, alone or in pairs, and rolls over your neighbor's dog?
For fun, it's a wonderful toy.
It's great for a snack.
It's on your back.
I just think of all the countless hours of my life I've wasted sifting through logs when the problem was glaringly obvious in retrospect.
Wait, you don't think about Ren and Stimpy?
Even when he said that, you didn't think about it?
You know, I do, but it's like number four on the list.
Okay, fair enough.
All right.
Four.
We've got Yuletide logs.
We've got all sorts of other logs.
All right.
So the next thing we had, so you talked about
writing into the middle is way more expensive, right? Like order of magnitude more expensive.
Well, this is also where they start talking about how reading from a log is highly inefficient,
right? So we talked about the fact that this whole append only thing is amazing. It's fast. You go straight to the spot.
You put in your new data.
Now, if I say, hey, I want to get Alan's contact information right now, you got to scan through the entire file.
Until you get all the way to the last record.
You're going to find Alan's records in there, and then you're going to return the last one that you got.
And you're kind of skipping over something, but it's important, though.
Okay.
When you say the last one that you got,
because as we mentioned with this append only log,
you made the example of like where you could,
if you updated your entry,
it was in there a second time.
Right.
And so that's why it's important.
Like,
I don't know,
maybe you updated it 50 times.
Right.
Right.
But it's the last one that you really want.
Cause that's the one that has
the most correct information. Now, in this example file that we're talking about, our contact
database, you know, so far we've only really talked about like two records of real interest
in there that you put in there, like yours and mine. But, you know, there might be every name
in the United States inside of that thing.
And think about how many times an individual might change or update their contact information.
If you had to then go and scan that, you're like, okay, well, I need Alan Underwood's last entry.
Right.
And the important part here is you have to scan it.
Now, maybe there's some sort of hyper-efficient way
to reverse your way through a file.
I don't know.
I haven't really had to deal
with that kind of stuff that much.
But the key is you're scanning through it.
You don't know where things are.
And if you've ever worked with like,
what are they called?
Something plans, query plans in a database.
You'll typically see something that says it did an index scan
or it did an index seek.
If it did an index scan, it went through every single record, right?
And that's what we're saying.
This thing that we've said is highly optimized for a write
is not very efficient for a read up to this point.
An example like here is like the National Weather Service has like
things all over the place, like measuring wind speed and humidity and temperature,
just all over the US. And, you know, that's all sending data really quickly. We don't want to
lose stuff. So there's some sort of logger, some sort of fast ingestion system that's taking
all that data. But if someone wants to know what the temperature is in Oviedo, Florida,
then that's a terrible system to read from, you know, because like you said,
it's going to be at the end because there's, you know, repeats.
And if you want to know the temperature now,
and you kind of want to start, you know, at the end, you kind of move backwards.
And so it's just, it's not optimized for that.
And if that's an operation that you'd be, you know, doing all the time, then you don't want to wait four minutes for it to parse through that large file to find that information.
So you want to use something that's more appropriate for reading, although you can still take it in the fast way.
Yep, totally.
And let me back up here.
I said that you could reverse your way through a file, right?
That's assuming that you know that my record is closer to the end, right?
Like my record could have been the very first record in the file. The problem is you wouldn't know until you went through it. So you can see justifications for wanting a write-optimized system versus a read-optimized system.
That, yeah, with all the thousands of sensors that might exist out there in Joe's weather example,
you could see why you might want a write-optimized system available for that data to go into.
But you would then use some other system
to say, like, hey, what's the local weather? Yep, or some other form. It wouldn't be the same system,
which we might get into here in just a moment. Well, you know, even the resolution matters. Like,
you might have different systems. Like, you know, if I'm looking at the temperature, I want to know
what the temperature is like maybe right now, and maybe I want to know what it's probably gonna
be like tomorrow. If you're like a storm chaser and you're studying hurricanes, you maybe want to watch how the temperature
changes over the course of 11 minutes as the tornado comes in or something, you know. So you
want to get a lot of checkpoints. And so just the resolution, the fidelity that you want to look
at that data with means a lot to you. And so it'd be nice if you could have potentially different
systems that are optimized for those use cases, because sometimes you care a lot about the intermediary values and sometimes you don't.
I want options.
That's right.
And so this is where we start getting into some more of the next steps in building your own system.
So this whole problem of trying to find a record in this data set, scanning, we've already said, is not optimal.
And in many cases, it could be the worst, right?
Like it could be O of N, right?
You've got to go through every single record to find the one that you need.
So the way that you solve this problem is with indexes, right?
And all this is is another data structure to store data.
And we probably most commonly know it as basically a hash table, right?
So the whole thing of an index is, all right, so I know that the last time that I wrote Alan, he was in position five in this file.
You're going to have a hash table that has Allen as the key. And instead of
storing the record, it's going to have five saying, Hey, this is the position where you can go to in
the file to get that information. Yeah. Like the example with the temperature too, you know, like
if you know that the way you're going to be using this data most often is associated closely with
location, then it might make sense to you to have an index somewhere that basically keeps
track of where that information is sorted by location. So you might be able to go to the
index and say, I need info on Atlanta. And it says, you know, here's information on how to seek
to places that contain Atlanta that make it so you don't have to scan through that whole big file.
And you can just jump to this location or these 10 locations or whatever, some information that makes that quicker.
And that's a huge value when it comes to writing.
And it doesn't slow your ingestion down.
It just means that you have to take on the additional overhead of maintaining these indexes.
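To make that concrete, here's a hedged sketch of a hash index over the append-only log (Python; the names and the comma-delimited format are assumptions, not any particular engine's layout). The index is just an in-memory dict from each key to the byte offset of that key's latest record, updated on every append:

```python
# In-memory hash index over the append-only log:
# key -> byte offset of the most recent record for that key.

DB_FILE = "database.txt"  # hypothetical file name
index = {}

def db_set(key, value):
    with open(DB_FILE, "ab") as f:
        index[key] = f.tell()            # where this record will start
        f.write(f"{key},{value}\n".encode())

def db_get(key):
    offset = index.get(key)              # O(1) on average
    if offset is None:
        return None
    with open(DB_FILE, "rb") as f:
        f.seek(offset)                   # jump straight to the record
        line = f.readline().decode().rstrip("\n")
        _, _, value = line.partition(",")
        return value
```

The read no longer scans anything; it seeks straight to one record. The cost is that the dict has to be kept up to date on every single write.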
And this is another one of those cases where it – when we talk about like how other data structures could come into play here and like why it's important.
Right.
So Alan described this hash map,
you know,
with basically a key, Alan, and then a value,
which is the offset to go look in the main data file for Alan's contact information.
Right.
If you think back to the past episodes that we've talked about, on average,
a hash table lookup is O(1). So you're
already talking about like an extremely fast operation. You went from O(n), which,
well, it's not even O(n) of the contacts, right? Because in our append-only file, who knows how many
times Alan has been updated. So it'd still be O(n), but the n is not the total
number of contacts, it's the total number of updates.
So let's say that you had all the people in America, right?
330 million people, right?
And let's say that they were in there twice.
O of N is 660 million scans.
We're saying that with this hash table, it's one.
You go straight to the record.
On average.
On average.
Yeah.
Now there is worst case.
Right.
We won't talk about it, but it's
O of N. So something
you said though, Joe, is you said
that this does not impact
write performance. And I don't know if that's true.
It depends.
And that's where some of the different systems
and things start kind of taking different approaches
on things based on what they care about the most.
There are systems that will not kind of log the data until that index has been processed.
And so it kind of doesn't mark this thing as done and move on to the next until it's ready.
But I think, I don't know, I guess the kind of loggers that I'm thinking about are so afraid, you know,
about missing a message that usually they'll kind of defer to writing things down in
the log immediately and then processing after. But it doesn't have to be that way. You know,
there's sometimes when people talk about like queues, which are kind of data storage systems
that really care about write speeds. Sometimes we'll talk about message
queues that focus on guaranteeing at-least-once delivery, meaning it never drops a message.
So in everything they do,
they're going to try really hard
to always make sure they get the data no matter what,
even if it's slow, even if it's not processed or whatever,
they're always going to defer to that.
And you have the other kind,
which is at most once delivery,
which is the opposite,
where it defers to never delivering a message more than once.
So it kind of makes different tradeoffs.
And, you know, I'm sure there's even other specialties that kind of branch off from there.
So I do want to be careful about kind of making generalizations there.
But for the most part, you can think of, you know, the logging itself being fast,
but that data being accessible being more of a question mark.
So it depends on the storage engine.
And this is why I wanted to bring
it out, right? So first, let's back up and also talk about the fact that an index is based off
the original data. So anytime that you're indexing data, you get that original record in, you're
trying to create a fast lookup to it for the read performance. It's deriving that index based off
the original data.
Now, this is why I said it depends on the storage engine.
If you're talking about an online transactional database like SQL Server, Oracle, Postgres,
those type of things, the more indexes you have, the slower your write is.
Because it is an ACID compliant or whatever transactional system, when it writes that
record, it also has
to write all those indexes before it marks it as done. So yeah, good point. So we use the simple
thing of our names earlier as the key, right? Typically when you're indexing things, you might
also index it by additional stuff, right? So, so maybe when we wrote our contact information,
we had, we had our first name as the key initially.
But then the entire record had our first, middle, last name.
It also had our address. It had the zip code, all that kind of stuff.
You might want to add additional indexes.
You might want to find all the people that live in a particular zip code.
Right. Well, if you think about it, I mean, basically, it's almost like we're describing a phone book. And at that point,
it's a composite key,
which is what you're describing when you use more than one field.
And in that case,
it's last name,
first name, address that's the composite key in the phone book.
Yep.
And so here's the key part that I'm getting at:
you don't just have to think about it as one thing,
right?
So when we were talking about appending to this file, we were talking about there's a key and there's a value. An index
doesn't only have to be the key, right? So you could actually have another index that's derived
off that data that says, Hey, I want to create an index that's based off the zip code. And so now
you create a new index and it's going to keep pointers to all those other records where all
those people lived in that same zip code.
So that's why the write performance actually does suffer.
Because as you write that first record, depending on how many indexes you have backing that for search, it's having to go in and update all those locations and all those indexes as well.
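As a hedged illustration of that write-side cost (made-up fields; real engines do far more bookkeeping than this), picture the contact log with both a name index and a zip code index to maintain:

```python
from collections import defaultdict

DB_FILE = "contacts.txt"                 # hypothetical file name
name_index = {}                          # name -> offset of latest record
zip_index = defaultdict(list)            # zip code -> offsets of all matches

def db_set(name, zip_code, record):
    with open(DB_FILE, "ab") as f:
        offset = f.tell()
        f.write(f"{name},{zip_code},{record}\n".encode())
    # One append to the log, but one update PER index -- this is why
    # each additional index slows the write path down.
    name_index[name] = offset
    zip_index[zip_code].append(offset)

def find_by_zip(zip_code):
    # The payoff: reads by zip code seek straight to the matching records.
    results = []
    offsets = zip_index.get(zip_code, [])
    if not offsets:
        return results
    with open(DB_FILE, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            results.append(f.readline().decode().rstrip("\n"))
    return results
```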
Am I remembering it wrong?
There wasn't like a portion in this chapter where he was basically describing other systems, though.
And this might be where you were saying the engine matters.
Because I thought I recalled him describing another one where it was just writing to this transaction log, and it was used for crash recovery.
It could pick back up after the fact and then continue back to rebuild indexes and whatnot as necessary.
That was based off it snapshotting things as it went.
Yeah.
But yeah, that was different recovery models.
And that was actually the pros and cons of doing some of these systems.
So yeah, at any rate, going back to this, that's the trade-off.
You have fast write speeds.
If you need increased read speeds with this particular format we're talking about right now, you take a hit on the write as you write your indices or keep track of your indices.
Yeah, that's a good point.
I was kind of focusing extra on specifically logger-type systems there. But if you ever see a question that begins with,
the customer you're working with has 10,000 IoT devices, you can automatically rule out relational databases as one of the answers.
It's not the case.
It's not meant for ingesting that kind of fast data.
It's just not meant for that sort of thing, and it's going to fall over.
So a little tidbit there.
That's hilarious.
Today's episode of Coding
Blocks is sponsored by Datadog, the monitoring platform for cloud scale infrastructure and
applications. Datadog provides customizable dashboards, log management, and machine learning
based alerts in one fully integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context.
Monitor all your databases, cloud services, containers, and serverless functions
in one place with Datadog's 400-plus vendor-backed integrations.
If an outage occurs, Datadog provides seamless navigation between your logs,
infrastructure metrics, and application traces in just a few clicks to minimize downtime.
So go try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent.
Visit datadoghq.com slash codingblocks to see how you can enhance visibility into your stack with Datadog.
That URL, again, was datadoghq.com slash codingblocks.
Okay, well, how about we get into my favorite portion of the show.
It's time for a joke.
You didn't see that one coming, did you?
I didn't. I like it.
All right.
So our buddy James on Slack sent me this one, and I was like, oh, this is so great.
And so topical, too.
Are we talking about the cynical developer?
Yes, that would be the one.
Thank you.
Yes.
Another great podcast.
Yes.
Do you know how much space Brexit will free up for the EU?
I don't.
I hope it runs with Brexit.
That's all I know.
One GB.
Oh, gosh.
That's pretty good.
Man, James, that guy really knows his onions.
That's all I got to say about that.
Yeah.
All right.
So, that's actually really good.
Yeah.
So obviously it's time for Survey Says.
All right.
So a few episodes back, we asked the question that people really wanted to have an answer to.
Which sci-fi series is best and the choices were star trek damn it jim i'm a doctor
not a doc oh okay fine or star wars han shot first and i think this was i think we did this survey
around the time that uh because it's been a minute i think this was around the time
of uh what was the last movie can you remember the name yeah uh yeah the's been a minute i think this was around the time of uh what was
the last movie can you remember the name yeah uh yeah the last jedi last jedi thank you last
styles yeah the last what the last sky jess skywalker last guy it was like right around the
time the mandalorian was coming out too which i think has probably biased maybe the survey
Oh, okay. I guess we'll see. Good. Well, let's see then.
So we'll go ahead.
Joe's already throwing his opinions out there.
So we'll let Alan go first.
Oh, I didn't see that one coming.
No, I didn't.
So I'm going to say Star Wars, Han shot first.
We'll go with, I mean, there's only two chances.
So I got to go greater than 50%, right?
So let's say 51%.
And I have spoken.
Random.
Wait, wait.
Did you not watch The Mandalorian?
Yes, yes.
You don't remember the character?
Yes, you're right.
Yeah, that guy was great.
It was somewhat relevant.
All right.
Thank you.
Yeah, that was a fail on my part.
I will admit.
All right.
Good.
So I'm going to say with 33%.
What?
I'm going to say Star Trek, damn it, Jim.
I'm an intellectual, not an action hero.
Let me see if I understand the math here.
So you are supposing that Star Trek is the most popular answer
with only a third of the vote
between two choices.
Right.
I don't know.
I mean,
I might be right.
You might be. You know what,
Joe? I like your optimism.
You just might be right.
Find out.
We will, sir.
We will.
Only because we play by Price is Right rules.
Make it so.
Oh, God.
All right.
Well.
Okay. Well, okay.
Well, I have to be the bearer of bad news to one of you.
Care to take a gamble?
It might be.
I am wearing a Star Wars hat right now.
It's got to be true.
I have to have won that just because he went under 50%. So, yeah.
Alan, you won.
It was Star Wars.
I mean, surprise, surprise, you won.
And it was over 50%, wasn't it?
It was.
You know, that's the funny thing about math.
It was that Mandalorian.
So, yeah.
At about 60% of the vote, it was Star Wars.
And maybe Yoda might have weighed in a little bit on it.
Hey, look, let's be honest, right?
I don't care if you're a man or a woman.
We can all admit we want a little Baby Yoda.
Baby Yoda's pretty awesome.
Yeah, I want one.
I'm not going to lie.
I do have a couple Baby Yodas on my shelf, actually.
Oh, really?
Yeah.
That's a cute little thing.
I need one.
Yeah, and he has his own little force capability.
Right. How cute is that?
And his ears wiggle.
You go to change his diaper and he's like, no, go away.
He forces you away.
I think I'm actually going to go back and watch it again.
That little guy made me smile every time he came on
screen. Yeah, he was so cute.
Alright, well,
huh.
Who would have thought that 50% is the winning amount.
You know what?
Maybe the next episode we'll just like rehash some math.
Joe's about to drop off.
I was going to do another joke, but I don't know that we need to now.
I'm just saying, you know, replicators, right?
No money.
Totally communist universe. Wait, where did we just go there? Star Trek? Oh, okay. All right. No more Star Trek.
All right. So, yeah, so much for humor. Well, let's do another joke anyways. How about that? Oh, my head hurts.
So, uh, from Slack, our, how do you pronounce this one?
Our Bleeder, I hope I'm saying that right, gave me this one. This is
my joke for life.
Oh.
That's how... that's your biggest hint already.
So we've got the best chapter he's ever read plus the joke for life all in one episode.
All right.
You ready?
You ready for this?
I'm ready.
What does a developer do before starting their car?
Make – I don't know.
I have no clue.
Get in it.
Oh, my gosh.
I gave you such a big hint.
Such a hint.
I just know that you liked it.
I should have known.
Yeah, get in it.
Wow.
That's really good.
Okay.
Yep.
So, all right.
So, for today's survey, we ask the hard-hitting questions that other shows just don't even think to ask. And so, today's survey is, which fast food restaurant makes the best fries? Because the people want to know.
That's right. All right. So your choices are Arby's, Burger King, Checkers, Chick-fil-A, Hardee's, In-N-Out, Jack in the Box, McDonald's, Popeye's, Steak and Shake, or Wendy's.
And I'll give you a hint.
Some of you are going to be wrong.
And you want to know what's great about this particular survey is I have a feeling there's going to be a lot of passion in the answers behind these, right?
Yeah.
You're going to have to defend your answer in the comments, and that will enter you in
for a chance to win the book.
Oh, man.
I would actually love to see the dissertation as to why people chose one over
the other instead of just choosing,
like totally leave a comment.
Like,
yeah,
it's gotta be these.
And this is why.
And if you don't have pommes frites in your neck of the woods,
then write in and let us know what you like instead.
Oh,
man, that makes me remember.
So why do we call them French fries here,
but they're called chips overseas?
What is that?
A chip is a thin, sliced, fried thing.
Why are those called chips?
Like, why is fish and chips not fish and fries?
It hurts my brain.
Biscuits and cookies too, man.
I, you know, I don't know.
Wait.
Well, you know, I did forget one last joke before we leave this section because as it relates specifically to our survey that we already gave the answers to, Mike RG from Slack, you might have heard his name like once or three billion times.
Per episode.
Yeah, per episode.
He pointed me to a tweet from Parker Higgins that really makes a lot of sense and really gives you something to think about,
especially as it relates to our survey
and just this architectural type conversations
that we're having.
So Parker says,
I used to wonder why the interfaces on Star Trek
are so clunky,
given that it's centuries in the future,
but I guess that's just enterprise software for you.
That's good.
This episode is sponsored by Educative.io.
Every developer knows that being a developer means constantly learning.
New frameworks, languages, patterns, practices.
There's so many resources out there.
Where should you go?
Meet Educative.io.
Educative.io is a browser-based learning environment allowing you to jump right in and learn as quickly as possible without needing
to set up and configure your local environment. The courses are full of interactive exercises
and playgrounds that are not only super visual, but more importantly engaging. And the text-based
courses allow you to easily skim the course back and forth,
just like a book.
No need to scrub through hours of video just to get to the parts you care about.
The incredible thing about Educative.io is that all of their courses have free trials,
a 30-day return policy, so there's no risk to you.
You can try any course you want and see what you think of it.
And you're going to love it.
And here's the great thing. They recently introduced subscriptions. So now you
can go, our listeners can go to educative.io slash coding blocks, and you can get a 10%
off discount on any course or subscription. Again, that URL is educative.io slash coding blocks.
And, you know, I got to bring up my favorite course,
Grokking the System Design Interview,
in which they go over a bunch of common architectures for,
no, I shouldn't say common.
They go over architectures for prominent platforms,
like say YouTube or Twitter or Uber,
and break down how those systems are designed.
And it'll show you just how important it is to know the read,
write ratio and volume when you're trying to think about how to design a
system,
or if you're trying to interview doing a system design interview.
So I definitely recommend checking that out.
And remember,
they've got that 30 day return policy.
So if it's not for you,
then that's okay.
You can,
you can afford to try it out with no risk.
Hey, and with 10% off, you can't go wrong.
Yeah, absolutely. So make sure to start learning today by going to educative.io
slash codingblocks. That's E-D-U-C-A-T-I-V-E dot I-O slash codingblocks. And you can get that 10%
off any course or an additional 10% off of a
subscription.
So let's jump back into the conversation with hash indexes.
So,
I mean,
this is kind of a continuation of the hash map conversation that we were talking about before the break,
how we might store the key in a hash table and be able to then have the luxury
of doing an O of 1 lookup, and then that pointing us to an offset in the main data file that
we can then go and retrieve Alan's contact information.
That's right.
In a nutshell.
And what's interesting is they say that they did this.
I've never even,
I've heard of Riak, or Ri-ack.
I don't,
I don't even know how you say it,
but I've heard of it before. But they said that this is what's done for Bitcask,
which is the default storage engine for Riak.
The interesting thing, though,
is they store this entire set of keys in memory, so
super fast, but you gotta have enough RAM, right?
Yeah, I wonder what kind of applications people are doing with Riak. I haven't really looked
into it too much. Yeah, one thing I kind of learned recently is how often sometimes databases are
kind of embedded into different applications. Like, Kafka embeds RocksDB in their Kafka
Streams applications, and that's kind of like the most prominent example that I think of.
uh jaeger uh is the application i've been using for some tracing that lets you use oh sorry
oh uh just uh different kind of databases underneath,
including Elasticsearch can kind of power its stuff.
And there was one other example I wanted to give.
Well, what were you going to say about Jaeger lets you use...
Elasticsearch, ultimately it's like storage engine for displaying it.
Oh, the other one, Grafana,
you can have different storage engines that are kind of like underneath it.
And so what I think of is Grafana is a bunch of pretty graphs. Underneath you can do like Pr engines that are kind of like underneath it and so what i think of as graphon is a bunch of pretty graphs underneath you can do like prometheus or influx
or maybe there's other choices there but it's interesting to see that you um that these other
applications kind of are built around databases but don't necessarily expose that database to you
and uh you know that's there's nothing new about that. I just, like, I forget sometimes that so much of what I get out of applications
that I like out of applications that I use is often kind of granted to them
by the magical powers given by their kind of embedded database.
But how does SQLite not come to mind for you?
I've never used SQLite.
Yeah, I definitely have.
I mean, you talk about it like something that's, like, embedded everywhere.
Yeah, it was, like, de facto a lot of times for,
for mobile applications.
Right.
And even PWA,
all the things.
Yeah.
Right.
I guess I did a little bit when I was messing with Unity, it was easy to embed it in there. And that, that's a great use case. So like you want a relational database inside the game, like SQLite.
Great choice.
Yeah.
Yeah.
I mean, clearly the author, Martin Kleppmann, has had a lot of experience in a lot of different database technologies, because some of these I'd never heard of. Like the Riak, I was like, it sounded more like a car. You know, the funny part is it is actually pronounced "ree-ack." I had to go look it up, so Joe said it properly first. But yeah, the interesting thing about this one is they say that all the keys stay in memory, but you're still appending to that file constantly. So every time you write to that file, all you're doing is just going back with that O(1) lookup to get back to that key, update the new pointer, and away you go. So it's hyper efficient. Right. So that's Bitcask and Riak. Yeah.
Now you have to think, though, like, okay, if you're just always going to write to this file, like, what next? Like, you're eventually going to run out of disk space, right? Like, that can't be your strategy for life, right? Or can it? I mean, that's what I love about this particular chapter, by the way, is it just keeps building on. It's like, okay, well, here's the very first problem you're going to run into. Right. So the answer is obviously no, you can't do that. So you have to come to some other solutions. So that's where file segmenting and compaction come into play. So by that, what I mean is, we gave this example where we were using a bash function, set and get, to write contact information to a flat file. And we made some mistakes and we had to update Alan's information 50 times.
All right. So it's not until, you know, that 50th one; that's really the one that matters. So the compaction, what that would do is eliminate all those other ones and we would just store the one entry for Alan, but, in this homegrown version, into a new file, and that's the important part, right? So again, we're still in append-only mode. The big difference is, when you go through the compaction, you're reading through the old stuff and you're basically trying to merge that into a new file that is also going to be append-only and will eventually become the new log file that everything else is writing to.
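A rough sketch of that compaction step, assuming the same one-record-per-line, key-comma-value format from the bash example: replay the old segment front to back so the last write for each key wins, then write the survivors to a brand-new file that becomes the new append-only segment.

```python
def compact(old_segment_path, new_segment_path):
    # Replay the old segment in order; a later append for the same key
    # overwrites the earlier one, so after the loop we hold only the
    # latest value per key.
    latest = {}
    with open(old_segment_path) as old:
        for line in old:
            key, value = line.rstrip("\n").split(",", 1)
            latest[key] = value

    # Write one record per key to a fresh file. The old segment can be
    # deleted once readers have switched over to the new one.
    with open(new_segment_path, "w") as new:
        for key, value in latest.items():
            new.write(f"{key},{value}\n")
```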
You can imagine like if you're kind of designing a new system like this and you start going down this path and you realize that you're potentially going to run out of disk space.
You start thinking about how you might do this underneath is like I would probably pick a size like four gigs and I would just allocate that size on disk.
And then I would start at the top and start appending. And as I started to get close to that four gig limit, then I would go and allocate a
new file. And then as soon as I hit that limit, I've already got that next four gigs allocated
and open and, and I can run over there and do that. And that point I can drop my pointer to
the file that I had open, I can exit it. And then another process can come along and take a look at
older files at some point, whenever it chooses, and go through and kind of clean things up, compact.
And that's really powerful.
And it reminds me a little bit of garbage collection, except that it can cleanly segment
these things off by kind of saying, like, we're not garbage collecting the stuff that's
actively being written to right now because we've made this rule where we only ever write
to the end. You know, I mean, that's one of the subtleties of this book, of this chapter that I loved
about it, is that even in the scenario that Joe was just describing where you might pre-allocate
this four gig file, in this chapter, he specifically discusses, like, even the performance gain that you would get from sequential writes and reads by writing all of that in one contiguous block on a spinning hard drive, right?
And, like, what benefit you might get from that.
Just little things like that that, you know, if you weren't thinking about it, you know, and you could easily take for granted, right?
But he calls it out.
Yep.
This chapter is extremely thorough.
Yeah, he goes deep.
I mean, it's a step-by-step on how you would actually do this from scratch for a very basic but still functional database, right?
And one of the things that they point out in this whole compaction type thing, right,
to what Joe was saying is typically
these things happen in a background thread, right? So think about it, right? You have something
that's constantly writing to your live log file, and then, you know, it's approaching the time
where it's filling up and it needs to create this new segmentation, this new file. It's going to do
that on a background thread. And also in the background, it's going to try and go through and find all the latest, newest records for any particular key, write them to it. And as soon as
that's done, it's basically going to do exactly what Joe said. It's going to deallocate the
pointer to that old file, point it over to that new one where you've got that compacted data in
it and start writing to that. And then that garbage collection, that file garbage collection
can take place
and you can delete those old files if you want, right?
Because what we were talking about is
once you eventually run out of space,
not if you keep basically trimming the old stuff
as you go along.
So you can kind of envision
like where the system part of RDBMS comes into play because you can already imagine like, okay, I need a whole other separate process maybe to like manage some of this compaction and segments and whatnot and moving the pointers around, you know, versus another process.
It's just like, okay, you're going to give me data.
I'm going to take it in.
I'm just going to write it to the transaction log.
I'm not going to even think about it beyond that.
I'm just read, write, read, you know, read it from you and write it to the file.
Yeah.
I want to mention to the two systems I work with a lot day to day are Kafka and Elasticsearch.
And both of them have this concept directly.
It maps segments and compaction.
And both of them work exactly the same way.
And after kind of reading
this chapter and being able to kind of correlate the things that I've learned about like Elastic,
one thing that I've noticed is that like if Elastic, if you fill up your disk space,
it's a problem because it's not easy to clean up old records. When you try to compact,
it needs to allocate new space so it can go through its segments and write to a new file
before it can delete those old sections. So what that means is, if you've got 100% disk utilization, filled up, it's really hard to make more room and delete stuff. So you can't just go in and say, okay, fine, delete the top thousand or delete the oldest thousand records, because it's like, actually, I can't make any room to delete anything truly off disk, because there's no room for me to write it.
And so you can get into a kind of a bad problem there where you basically have to move stuff off disk in order to clean up some room and move that stuff back on.
So it can be a big problem in Kafka too.
Actually, one thing I ran into there I never really realized until reading this chapter is that they've got retention policies on their topics. And you can say, like, age data out after a couple of days. Or, I shouldn't say age, I should say, you know, maybe only keep 50 megabytes of data around for this. Retention policies.
And yeah, so I kind of thought like,
OK, cool.
So as soon as a new record gets written, maybe it looks at the oldest record and kicks it out. That's not how it works. It basically happens when those segments roll over. So whenever it kind of hits that limit on disk, that's when things get cleaned up or that's when things get
compacted. And that can have a big impact if you're writing a program and your program doesn't ever expect to get older data because you've got a retention policy set on it. But that data doesn't actually get cleared out of that segment until compaction occurs, which in Kafka's case doesn't actually occur until a segment essentially rolls, which is that process where you start writing to a new one. So if you've got a really low-volume topic, even though you've got a retention policy set to, say, like three days, if it's such low volume that it doesn't ever roll over the segment size, you could have a year's worth of data in there. Yeah. So if you're doing things for, like, government work or whatever, where you can't keep data longer than X days, or if you've got, like, a GDPR incident and you need to wipe someone's data and you think that you're clear because your retention policy is only three days, it may not actually be the case because of how these things work underneath. And so it can really trip you up if you don't understand that that's how these things work. Yeah, now after reading this chapter, I can go look at those systems and I kind of understand them more deeply, and I'm glad that they use the
same terms for these things. Yeah, it is really interesting, too, because what you're talking about, even in the Kafka world, is that the retention policy of size competes with the time retention policy, right? So what you're saying is, you might think, okay, well, I'll be smart about this, and I'll just make it to where, you know, these things roll over segments every minute or something, right? Because, hey, if I want to make sure that these records age out really fast, then I can do that. But the problem is, now you're creating these new files constantly, right? So you have the contention of creating the new segmentation file while you're closing out the other ones, and you're having to write to disk at the same time. So it's really a balancing act, right? Like, you're going to have to set the proper size to say, hey, I think I'll get this much data, so that it'll trigger this new segmentation file. And I need it to work within this amount of time, so that they're not competing with each other and you're constantly writing new segmentation files. Because in the Kafka world, probably very similar based off these very simple principles we're talking about here, each segment has basically a starting offset and an ending offset, so that when you go to seek to records, sort of, ish, in the Kafka world, it kind of knows where to go find them.
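Here's a toy model of the retention gotcha from a minute ago; this is just the shape of the behavior, not Kafka's actual code. Retention is only enforced on closed segments when the active one rolls, so a low-volume topic that never fills its active segment never ages anything out.

```python
import time

SEGMENT_MAX_BYTES = 1024           # toy segment size limit
RETENTION_SECONDS = 3 * 24 * 3600  # a "three day" retention policy

closed_segments = []
active = {"created_at": time.time(), "records": [], "bytes": 0}

def append(record: bytes):
    global active
    if active["bytes"] + len(record) > SEGMENT_MAX_BYTES:
        closed_segments.append(active)  # roll: close out the active segment
        active = {"created_at": time.time(), "records": [], "bytes": 0}
        enforce_retention()             # cleanup only ever happens on a roll
    active["records"].append(record)
    active["bytes"] += len(record)

def enforce_retention():
    # Whole closed segments are dropped once they age out. The active
    # segment is never touched, so if the topic is too quiet to ever
    # roll, records can outlive the retention policy by a long shot.
    cutoff = time.time() - RETENTION_SECONDS
    closed_segments[:] = [s for s in closed_segments if s["created_at"] >= cutoff]
```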
So all of these principles we're talking about here in this very simple implementation of a
database are used in a lot of storage systems that are now adopted by massive companies.
So one of the things here that I don't know if we covered is, while that background thread's running, where it's basically trying to create a new segmentation thing, they also point out in this implementation, you're still going to be writing your new records to that other file.
To the append-only log.
To the append-only log, and you're also still reading your offsets from that append-only log while this other segmentation file, or segmentation log, is being created, because you don't want to contend with that thing while it's creating it. Only when that thing's done do you switch over. And then they say, after it's done, after you've created that new segment file, then the old ones can be deleted. And then again, that of course is going to boil down to whether you have a retention policy or something, right? Like, maybe you just kill them as soon as they're done.
You know, I've seen guidance from Elasticsearch that says not to run compaction, or what they call force merge in their case, to basically clean up those sections, uh, segments, on an index that's actively being written to. And now that I think about it, it's like, okay, that makes sense, because we're writing to stuff while we're trying to clean it up. But I don't know what happens in that case. Like, does it mean that it doesn't clean up the active segment, because that makes sense to me, or does it, like, try to clean it up and make things get a little weird? Or do you lose your writes?
Does it kill those?
Yeah, I don't know.
I don't think it loses, but I don't know.
We've got a science experiment to try out.
But you know what's funny is with these systems,
benchmarking is really important with them,
not only just because your data is different,
but because there's so many different factors to play.
Like say, if you do set the segment size really low,
then you could slow things down.
But depending on your use case, maybe that's, you know, important and worth a tradeoff.
So you can make a little tiny setting change and drastically change the performance of your system as a whole.
And so it's really important to be able to kind of test that stuff before you roll the changes out. And that's how, you know, small config changes can bring down, you know, large data centers on Christmas Eve or whatever.
It's always the best time to bring down
your database too.
Lizard squad.
Lizard squad.
That took me a minute.
Yeah, that was a different
incident there, but it just reminded me.
So, some key factors making things work well. File format: it's mentioned here that CSV is not a great format for logs. That's comma-separated values. Typically you want to use something like a binary format that encodes the length of the string in bytes, with the actual string appended afterwards. Yeah. So someone would say, why?
Yeah.
So they basically said that CSV, again, because the format's not good for it,
by getting the length of it, you can basically store the offset to the end of that thing, right?
So I could jump to the next line without scanning through all these characters
or like regexing for a new line character or whatever.
So it's kind of like the thing we talked about at the beginning, where, like, a file system, depending on your file system, might contain the start and the length of the file, so if you need to hop to the end, it's got a really easy way to do that. Same thing here. So if you're trying to seek through these things as quickly as possible, like, it wants to be able to go line by line, and the fastest way to get to the next line is to know exactly where that next line begins, so you can just do a simple add in order to hop to the next line. Yep. When you think about, if you've got, like, a, you know, 300 gig file and you've got a billion lines to go through, that's a lot of hops, it's a lot of math, just to get to, you know, that last item there or somewhere near the bottom.
Yeah, it's important to call out, too, though, the reason why we're even talking about CSV is because we didn't make this callout before, but technically that was the format that he was storing everything in. It was like a key, comma, and then a string value. Yep. So that's why CSV came up. Yeah, good point, didn't even think of bringing that up.
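As a sketch of what a length-prefixed binary format buys you over CSV, assuming a simple 4-byte big-endian length header (an illustration, not the book's exact layout): skipping a record is one small read plus one seek, with no scanning for newline characters.

```python
import struct

def write_record(f, payload: bytes):
    # 4-byte length header, then the raw bytes.
    f.write(struct.pack(">I", len(payload)))
    f.write(payload)

def skip_record(f):
    # Read the header, then hop straight past the payload with one seek:
    # a simple add, instead of scanning for a delimiter.
    header = f.read(4)
    if len(header) < 4:
        return False  # end of file
    (length,) = struct.unpack(">I", header)
    f.seek(length, 1)  # whence=1 means relative to the current position
    return True
```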
And then, uh, deleting records requires some special attention, because you have to create a sort of a tombstone to record the delete for when you do the merge process. And that's something I hinted at with Elastic, where, um, you know, it does ingest things in a log-type format where it keeps appending, appending, appending. And if you delete an item in an index, then it doesn't go and remove that, because, as we said, you know, when we're dealing with logging systems that have to be really fast for ingestion, we typically only write to the end. And so in Elastic's case, what it does is it stores the fact that you deleted this document somewhere else, and when you query, and it does its filtering and does its magic, whatever, it needs to take that into account and say, oh, this one's been deleted, and exclude it from the results,
which is overhead but
we'll probably get to that later
and talk about how they could do that quickly. But the gist is to know that when you delete a document in something that's using this kind of mechanism underneath the hood, it doesn't automatically free up disk space. And so if you run out of space on Elasticsearch and you say, delete these 100,000 records, it might go and mark them as tombstoned. It might set that first bit to zero or whatever and say, hey, this is deleted. But it doesn't free that disk space up, so you still can't take in new records, even though it feels like it should have been a delete, had it successfully executed.
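A minimal sketch of that tombstoning idea in the same homegrown log format, using a made-up __tombstone__ sentinel as the marker (real engines typically flip a flag bit instead): the delete is just another append, and the space only truly comes back when a merge drops the dead keys.

```python
TOMBSTONE = "__tombstone__"  # hypothetical sentinel value for deletes

def db_delete(f, key):
    # Deleting is just appending a special record to the end of the log.
    f.write(f"{key},{TOMBSTONE}\n")

def merge(segment_paths, out_path):
    latest = {}
    for path in segment_paths:  # oldest segment first, so later writes win
        with open(path) as seg:
            for line in seg:
                key, value = line.rstrip("\n").split(",", 1)
                latest[key] = value
    with open(out_path, "w") as out:
        for key, value in latest.items():
            if value != TOMBSTONE:  # dead keys finally free their space here
                out.write(f"{key},{value}\n")
```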
I like to think that we live in a world now where emojis are such a big thing
that instead of writing a zero, it could just write a skull and crossbones.
Yep, literally puts in a – and that's something actually with Kafka, too. Since a lot of the systems that we talk about, a lot of modern kind of queuing systems or topics, deal with immutable messages, they really want you to keep things alive for, like, event sourcing, or so you can recreate the state of your document at any time, so deleting is really tough. And the way you do this is with the tombstones, like we mentioned. But you have to be careful with, you know, your clients. If you're doing something kind of naively and maybe starting from the beginning of time and building up some sort of system or map or picture, you've also got to be able to handle things like tombstoning and removing records as they come along, too. And so your clients have to be a little bit smarter about things. And it's kind of funny that you can do all these operations on something that ends up getting deleted a few minutes later.
Yeah, it is.
One of the cool parts about the Kafka world, at least if you're talking about Kafka streams,
they actually use the same term there as well.
So when you're trying to delete something from a streaming process, you tombstone it.
So you basically send it a key with a null
value, right? And it will mark it as ready to delete. So yeah, it seems kind of goofy at first,
but it makes sense to me now. But, like, when you think about event sourcing and replaying events, you might be tempted to say, why don't I just delete the records? It seems goofy that I'm going to do all this math on things that I'll end up, you know, later, maybe deleting 90% of. But these topics, or these systems, don't know what you're doing with the data. It doesn't know if you're making decisions based on the current state of that system. So in order to be able to replay things, it needs to replay everything, even if it ends up getting discarded at the end.
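For example, with the confluent-kafka Python client (just one client library among several that look similar), a tombstone is literally a produce call for the key with a null value:

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# A normal record for the key...
producer.produce("contacts", key=b"alan", value=b"555-0199")

# ...and its tombstone: same key, null value. On a compacted topic this
# tells Kafka the key can be removed during a later log compaction pass.
producer.produce("contacts", key=b"alan", value=None)

producer.flush()
```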
And this is like in a,
in a database,
this would be what's known as a logical delete instead of physical delete,
where you basically just mark a flag on a record and say, hey, it's deleted.
Ignore me when you're trying to show anything that's still alive.
You might also call it a soft delete.
A soft delete.
You might hear that term.
Yep.
Oh, yeah.
I should mention, too, that we've been focusing a lot on logs this episode.
We are still talking about relational databases right now.
Yeah.
Because this is a fundamental piece of how relational databases like SQL Server, Oracle, or Postgres work. This is a big part of how they work underneath.
So we're actually building up to
their specific data structures that are built
on these core kind of tenets of logging.
I want to correct that.
I would not say that. These are core concepts to databases, but we're not necessarily talking about a relational database.
We haven't talked about relating anything to anything.
We're just talking about how to store some data.
Yeah, I'm just saying that this is a –
So the concepts that we're talking about could apply to a document database.
They could apply to OLAP.
Like, we don't care yet.
Yeah, yeah, good point.
I did not phrase it well. What I meant to say is, this is an important facet that plays a big role in relational databases, as well as all these other systems, like I mentioned, Kafka and Elasticsearch. So it's not just relational databases, but we're getting there is what I'm trying to say. I would agree with that. Yeah, so we kind of already hit on this one, but, like, you know, crash recovery, right? Like
it's not a matter of if your server is going to crash, it's a matter of when.
And, you know, you mentioned, like, pre-allocating a four gig file, right? So if that's the size of your segment file, depending on the size of the segment files, it could take a minute for the server to spin back up, depending on how it's writing these files to disk, and, you know, what is it that's being written, and in what order is it being written? Right?
Yeah.
So this was the whole thing we were talking about, right? Like, if you didn't have this in-memory hash and now you have to rebuild this in-memory hash, you're going to have to scan that four gig file to rebuild it. That can take some time. And they said that Bitcask, what they do on occasion, so they're writing their log constantly, right? But at the same time, it will snapshot its in-memory hash and write that to disk. So if it did crash for some reason, when the thing comes back up, it can go load up that snapshot file, load that straight into memory without having to scan the four gig file. And then that way you have your pointers right back to that data.
And just to put some terminology around that, that snapshot file that you're referring to at that point would, in fact, be the index. It is. That is what's being kept in memory, but it occasionally does snapshot that index out to a file, and it can reread that occasionally.
Yep.
You know, in the case of a server crash.
Which, depending on the size of the file, could save minutes, right?
It could save maybe even more than that.
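Sketching that snapshot-and-restore idea (hand-waving the details of how Bitcask actually does it): periodically dump the in-memory hash of offsets to disk, so a restart loads the snapshot instead of rescanning a multi-gig segment file.

```python
import json

def snapshot_index(index: dict, path="index.snapshot"):
    # Periodically persist key -> offset so a crash doesn't force a full scan.
    with open(path, "w") as f:
        json.dump(index, f)

def restore_index(path="index.snapshot"):
    # On startup, load the snapshot straight into memory. You'd then only
    # replay whatever was appended after the snapshot was taken (omitted
    # here), instead of rescanning the whole four-gig file.
    with open(path) as f:
        return json.load(f)
```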
Well, that's why I called it, like, the four gig segment file, you know, since Joe had mentioned four gigs as the size of the file you might pre-allocate, right? Like, depending on how large those files are, you know, it could definitely have an impact. I mean, it's part of the trade-off that you're gonna make, right? Because either you take the overhead of writing a bunch of little files, or you're gonna, like, pre-allocate one large file so that you can have one large sequential file to read and write to. But then either way, yeah, either way, if it's four gigs of data in 100 files or four gigs of data in one file, you're still going to take a hit on trying to go scan through all that data. Well, it depends on what those hundred files represent, though. Well, let's say it's the same format, right? Right? Yeah. You're just going through it.
But one of the other things that they talk about here, another thing you need to be concerned about in this particular model that we've been building up, this underlying storage, is what happens with incomplete record writes. You know, this thing crashes mid-write, and the checksums say, hey, is this thing done? Is it complete? Is everything kosher? If it's not, then it can skip the bad sections and you kind of pick up where the good stuff left off.
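One way to picture those checksums, as a sketch: store a CRC with every record, and on recovery stop at the first record that doesn't verify, since that's where a crash cut the write short.

```python
import struct
import zlib

def write_record(f, payload: bytes):
    # Length and CRC32 header, then the payload itself.
    f.write(struct.pack(">II", len(payload), zlib.crc32(payload)))
    f.write(payload)

def read_valid_records(f):
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # clean end of file
        length, crc = struct.unpack(">II", header)
        payload = f.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return  # torn write from a crash: ignore everything from here on
        yield payload
```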
Finally, when I mentioned concurrency control. So, you got something, Outlaw? Well, I was just going to say, like, you know, even if you were to think back to Bitcask, cause you're talking about Bitcask, but none of us really have a lot of experience with that. But if you were to think back to, like, any other traditional kind of database, like a SQL Server, you know, you could think about how, in the ACID compliance that you mentioned earlier, right, where that thing isn't truly considered written until, like, all the indexes have been updated additionally, right? But if it did write to the log and, you know, maybe two out of five indexes got written and then it crashed,
you can start to imagine now like,
oh, here's where some of the spin-up time can come from on a restart because as it rereads that log of,
okay, what were the last things that were done?
Did those things finish?
Let me go finish those things.
So now you can kind of get an idea as to like,
how does it recover from that crash and still adhere to ACID. Yeah, there's, like, checkpoints all over the place, right? So it is interesting. And Jay-Z, you want to pick back up on the concurrency? Yep, just want to mention, so for concurrency, we kind of hit on this before,
but it's common for there only to be one writer
that has the open file pointer
and that is responsible for streaming that data
as quickly as possible to disk.
But it's also common to have multiple readers.
And we can do that, again,
because we know that the data written,
once it's written,
as long as we're doing a proper log,
is immutable.
So it's safe for multiple
readers to be in there in time so that's something we can parallelize out and uh you know do a couple
different interesting things with that without worrying about slowing down the writing at all
so no locking yep this is this is where the uh the questions that everybody's had bouncing around
on their heads like why why are you just writing only, right?
Like, so why not update?
Yeah, I mean, it seems terrible.
Like, you have to worry about writing this space.
If you've got data that's update heavy, which is a kind of right, but it's a specialized kind of right, then this seems, you know, like a terrible idea.
And so, you know, we kind of touched on some of these things before, and it can seem really inefficient at first. But like we mentioned, if you stick something into the middle of the file,
not only do you have to seek to it
and search for it and find it,
but then depending on if that data is larger or smaller,
you potentially have to bump all the data,
you know, one way or the other,
which is a pain.
If you imagine like even deleting the first line of a file
means you have to shift everything up,
you know, to read.
Now that I say that, I don't know if there's maybe optimization for that
in the file system, whatever.
But as far as I know, you have to shift everything in place.
Well, but I mean, if we're keeping true to the spirit of this chapter, though,
you're managing this file yourself.
Right, right.
So if you're going to delete the first entry from the file
then you're responsible for shifting everything up. Yeah, yeah, that's true. I just got hung up thinking about if there was some way maybe I could, because the point is to try to think about, like, you know, something has to do this shuffling around, right? And, like, where are some of, you know, the weak spots, the strengths, the advantages of different ways of even reading or writing or storing this stuff on disk?
Yep.
And we mentioned, too, how this is particularly efficient on spinning hard drives.
But one thing that the book mentioned we didn't really go into is that sequential operations are also more efficient on solid-state drives.
I hadn't heard this before, and I haven't looked deeply into it. I was curious if anyone looked it up, or why that was. I've never looked into it, but I have also seen that, at least on performance charts and stuff where they're comparing multiple drives, like your sequential versus your random reads and all that kind of stuff, like, there's massive differences in them. Yeah, I mean, it's generally like, if you really want to gauge the performance of any drive, be it a solid state or a spinning disk, it's the randoms that are going to be, like, the true measure of how fast it is, right? Because that's what your OS is doing, is just throwing stuff all over the place. Yeah, because the sequentials are usually going to be like that.
That's what they're all optimized for.
Right.
The sequential is if you're writing a massive chunk of a file at once.
Like you have a movie that you're bringing over and it's 30 gigs.
That's your sequential write and your read.
But that's not most of what's –
You hope.
But that's not what is most of your operating system, right?
It's usually files scattered all over the place, which is why your performance numbers are based off that.
Yeah.
When you get a 550 megabyte per second random, then that's good.
Yeah.
You're flying.
Well, actually, no.
That's not even good by –
Nowadays.
Nowadays.
Yeah.
I'm sorry.
I should have added a zero to that. That was a SATA 6. I'm sorry. 2007 called, they want their hard drive back, right? So I looked up
why SSDs, why it matters at all for sequential. And what they kind of say is, basically, if you're doing random access and you're writing to random spots on disk, then they can basically leave little holes. We're talking about kind of fragmentation, essentially. And if you're writing a lot of data,
those holes eventually need to be cleaned up.
And so you're kind of forcing
more
kind of garbage collection or defragmentation
type operations because
you're going to be filling up space
in an inefficient way compared to a sequential
write, which doesn't leave any of those holes
and is able to just keep streaming
that data out but
that's the same for both spinning and ssds yeah yeah i was just curious like i knew so i knew i
understood like you know you've got that physical pointer you know that physical like writer head
on the the spinning disc and so that made sense to me as to why sequential was really important
there but ssd is always kind of it's like thought of them as basically being like similar to ram
and so it's like why does it matter if it's sequential or not?
But it basically has to deal with – it has to do with how things get junked up essentially.
So similar type thing.
I just imagine – because I just assumed that it would be like even on that disk, there's still a controller.
There is.
And like any other process, it's got a certain number of threads that it can do stuff.
It has to know where to go get it and all that kind of stuff.
If it can just be like, oh, hey, start from this offset and read 10 gig,
that's going to be a lot faster operation than, okay, read from this offset and read 5 meg.
Now go to this offset and read 2 meg.
Now go to this offset and read a gig. Now go to this offset and read a gig. Now go
to this offset. You know what I'm saying?
That was just my assumption.
Well, it turns out that's part of it too. So I just found someone else that said, even on smaller writes, excluding this kind of filling up and having to do that cleanup, it's not a true zero-cost operation in order to kind of hop around, because when things aren't stored linearly, you do have to go back and do those calculations in order to kind of move around and read and write to different areas. So exactly what you said. Cool. It's close to zero, it's much, much faster, but it's not zero.
SSDs are such a beautiful thing.
True that.
So concurrency and crash recovery, we mentioned how logging systems deal with these, but it's much simpler.
When we get into relational databases more, we're going to talk about the things they have to do, where they go pretty far out of their way, in order to make those things work.
To keep this back in context, though, we're talking about it's much simpler if you're in append-only mode and not updating portions of the file.
Yeah, you're right.
So I only wanted to bring that out because we floated away a second.
So we're talking about why not update in the middle of the files, right?
Like why not go update the value for some key? And this is one of the reasons because the concurrency and the crash recovery are much
easier if you're always doing append-only. Yep. And merging old segments is a convenient and unobtrusive way to avoid fragmentation. It gives us a nice, convenient pattern to follow there.
It's low effort. It's hard to mess up.
And it just kind of works out really well in practice.
And so this fragmentation, I think it's important to understand what he was just talking about with the SSDs and the hard drives and all that.
Imagine, you know, a simple example because we're using my name or let's use Joe this time.
So let's say that he first went in there as Joseph Zach, right? And then at some point they come back and they're like, no, no, no, we want that as Joe. Well, because you don't
want to shift all those bits around on disk, because that's a really expensive operation,
especially if there's 10 gigs of data after him, probably what you're going to do is you're going
to write J-O-E there, and you're going to null out that next bit in there and just leave four bytes open. Right. And so that's where you start running into fragmentation. And that's why they're saying, if you do this append-only thing, then you don't have this fragmentation. You don't have all these empty blocks all over the place, because you're always putting it right at the very end. So updating in place causes fragmentation, and that's why they lean towards append-only mode.
And if you also think about it like this,
um,
if you were creating this database servers,
this database system,
and this system was only ever going to be used for this database,
you can kind of already get an idea where like,
if it's always pre-allocating these files of a certain size, right?
And it's always going to be like, you know,
from the beginning of this computer, this server's life,
it's always going to be pre-allocating them in a certain size,
sequential writes and reads.
And then, oh, I've got to get a new file, so I'm gonna go and pre-allocate another one, and then eventually I can age off this other one. You could see how the disk itself is always going to lend itself to being non-fragmented, you know, hopefully. More often than not, the disk itself will remain that way; you're just packing them in there.
At least you hope.
Yeah, I mean, that would be the hope for sure.
But yeah, you're just constantly churning over the same thing, right?
Like you're filling it up.
It's almost like filling a glass of water, pouring it out, filling it back up.
I mean, I'm kind of like being optimistic here, though,
and hoping that you're going to delete the same amount of records that you're inserting.
So maybe it's not true then.
But at least from like – but think about it, though, from the point of view of like this is also why systems will allow you to keep the data files in one place, temp files in another, and the log in another.
So you can allocate different disks for these things, for these purposes.
Yeah, there's all kinds of optimizations.
We're doing the simple database right now.
Yeah.
Simple.
Yeah.
All right, so we have some downsides here.
And this is going back to what we were originally talking about.
So the first one is the hash table must fit in memory, right?
If you don't have enough RAM, this was the whole Bitcask thing, right? Like our hash lookup thing.
Then you might, if you don't have enough memory,
then you're probably going to have to spill this over to the disk,
which isn't nearly as efficient, right?
Because now you can't just go straight to the spot in RAM
where that hash was to point to the location.
Now you're actually having to go, okay, well, it's not in RAM. Let me go find it on disk and
then go look it up. Right. So it's an additional couple of hops. Yeah. And then, you know,
another downside here, we haven't even talked about this one yet, but range queries are going
to be inefficient if you have to look up
each individual key. And so what I mean by that is like, if you had to do a query where you're like,
hey, give me all of the contacts for people whose names are from A to D, right? Well,
depending on how that's sorted, right, you could see like how that log, going through that log, is going to be inefficient.
If you had to go through that and look for at each individual entry to find the matches there, you know, match that range query.
Yeah, because the key is no longer actually the key, right?
Like it's not like it's stored as, you know, Alan, Joe, and Mike.
It's ABC 123, you know, DEF 245, whatever, because it's hashing that key for the fast lookup.
So, yeah, these range queries are going to be super expensive because they're probably going to be scans.
Scans through your hash table, more than likely.
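To see why, here's a sketch: a hash index keeps its keys in no particular order, so a range query degenerates into examining every key, where a sorted structure could seek once and read forward.

```python
def range_query(index: dict, f, start: str, end: str):
    # Hash indexes keep no key order, so "names from A to D" means
    # touching every key in the table: a scan of the whole index.
    results = []
    for key, offset in index.items():
        if start <= key <= end:
            f.seek(offset)
            results.append(f.readline())
    return results
    # A sorted index (think SSTables or B-trees, coming up) could instead
    # seek to 'start' once and read sequentially until it passes 'end'.
```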
Which is an interesting point.
Like when you see something doing like a table scan and something like SQL Server,
or you see something doing an index scan, neither are good.
But typically your index scan will probably still be faster than your table scan
because you're scanning a smaller chunk of data to get to it a lot of times.
So it's not efficient, but it's still more efficient than having to go through the entire data set in a lot of situations.
Yeah, I mean, another way that you could think about that, this is definitely getting a little bit ahead,
but to your point, the data file might contain every column.
So if you had 100 – if it was a really wide table, you have 100 columns in it versus the index.
Even in your composite key example, like in our phone book example, we gave three columns.
So it's already a much smaller –
Size.
Size-wise, it's already a much smaller, you know, the width of this thing is already smaller. So you can already see how, you know, the size would be greatly impacted, because now you're only talking about an index. And then, you know, there might be additional operations that could compact that even further that we haven't gotten to yet. Yep. So, so stay tuned.
Yeah.
This episode is sponsored by clubhouse.
Clubhouse is a fast and enjoyable project management platform that breaks
down silos and brings teams together to ship value,
not just features.
So let's face it.
Slow,
confusing UX is so last decade.
Clubhouse is lightning fast built for today's software teams
with only the features
and the best practices you need to succeed
and nothing else.
And here are a few highlights about Clubhouse.
They've got flexible workflows
so you can easily customize workflow states
for teams or projects of any size.
We've got advanced filtering
so you can quickly filter by project
or by team to see how everything is progressing.
And you can do sprint planning, so you can set your weekly priorities with iterations and let Clubhouse run the schedule.
And Clubhouse also integrates with the tools that you love.
They tie into existing tools, services, and workflows. You can create a story in Slack, update the status of a story with a pull request, preview designs from Figma links, or even build your own integration with their API, and a lot of other things.
And Clubhouse is an enjoyable collaboration tool.
Easy drag and drop UI, dark mode, emoji reactions, and even more.
When you're doing your best work and your team is clicking, life is good.
Clubhouse has recently made all core features completely free
for teams up to 10 users.
And they're offering CodingBlocks listeners
two additional free months on any paid plan
with unlimited users and access to premium features.
So give it a try.
You can go to clubhouse.io slash CodingBlocks.
That's clubhouse.io slash codingblocks to try it today. All right, so as far as resources, of course the book, Designing Data-Intensive Applications, is fantastic. Make sure to leave that comment if you want a chance to win that. And, uh, yeah, now it's on to Alan's favorite point of the show? Or not quite.
How about this?
How about this?
Because I had this question that was written into us,
and it came to mind because you mentioned if you were going after,
I think you said if you were to go after cloud certifications.
I think that's how you worded it, right?
Oh, I did get one.
I got one of those.
So the question that
we received was,
are certifications worth it?
So Joe...
And so I was kind of curious,
since you just recently
got your certification, maybe you would have
an opinion on
such a question.
Yeah, I do. Um, so I recently got the GCP, the Google Cloud Platform, ACE certification, which is their Associate Cloud Engineer. And that's the first certification I've ever gone after and gotten. And I've kind of avoided them for years, because I never really kind of saw the point. I had a couple bad experiences with tests that I feel like didn't really accurately judge how well I knew a subject. Like, I took a ColdFusion test once, and it was all about CF forms, and, like, nobody used those at the time. It was a terrible way of doing things, and I was
really upset about the test, and I was angry. I was like, screw this. But, you know, I've kind of changed my mind recently a little bit about
some certifications because, uh, kind of two things, two reasons. One is that when you study
for a certification, particularly for something that you're kind of newer at, it really highlights
the things that you, uh, may be overlooking. So if it's, you know, something you've been doing
for 10 years and you're already doing
that job, then studying for the certification isn't really going to highlight areas that you're
weak on unless you're particularly willing to bone up on that language just to do it.
But when you're first getting into something, it's really easy to not understand or not
understand that you're missing big areas or that you have big gaps or big misunderstandings in
your knowledge. So it was a great way for me to learn GCP and have a goal that I was going after.
So it was great for the knowledge aspect.
Also, I think that some certifications are particularly valuable now.
And particularly, I'm talking about cloud and specifically Kubernetes-type certifications.
There's a couple in there like Elastic, Kafka, I think that are really valuable now.
But those are all ones where it's like there's a lot of knowledge that you can be tested
on that could be really important because there are all these like dark little kind
of corners that will trip you up and, you know, bite you on the ankles and mess you
up.
And so I think by having those certifications, you kind of show that you've at least done
kind of a general lay of the land and that you aren't just kind of strongly focused on whatever your small kind of slice of working with that technology like a day job does.
It means that you've kind of kicked the dust off of most aspects of those platforms.
And so I think that me and some organizations are kind of coming around on valuing those
certifications higher.
I think security is probably another great space where if you're working in security
space, those certifications are highly valuable.
Same reasons.
Alan?
Yeah, I have similar feelings and probably even a little bit further.
So one of the things that we talked about in this
episode was the Kafka retention policy and things aging out or being pushed on a different
segmentation type stuff. The only reason I even know about most of that stuff is because I was
actually working towards getting Kafka developer certified, right? And that's some of the stuff
that you learn about as you're going through preparing for that. And that's some of the stuff that you learn about as you're going
through preparing for that. And that's not something I would have known; I would have assumed the same thing. Hey, I set the retention policy to seven days, so after seven days, that data is gone, right?
That's not the case, right? So it's, it's like he said, filling in those gaps is important.
I will also say, this isn't as important to me. And by no means do I want people to get hung up on going and getting certifications over getting experience on things.
I think that certifications help lend credibility to your knowledge and your experience and your ability to work on things.
However, there's going to be people that are like, oh, they say go get certifications.
I'm just going to focus on certifications. And unfortunately, you can go take tests on things all day long and not actually understand
how they work, right? So I feel like certifications lend, if you're experienced in something,
they also kind of give you some credibility to go along with it. So if you're talking to somebody
within your organization and they say,
hey, how do you think we should do this?
And you say, hey, I think that Kubernetes would be a good fit for this
because X, Y, and Z.
And they can look and say, oh, this guy's actually Kubernetes certified developer
or Kubernetes certified admin, right?
That lends some credibility so that it's not just, hey,
I've got this crazy developer over here saying that we should do this. So I think it's two things,
right? Fill in the gaps. I think it's important. And I do think that it can help you sell your
case in certain circumstances. And it could also be good for getting jobs because let's be realistic. Nowadays, your LinkedIn's, um,
that social profile and being able to put certifications on there is a big deal. And I
actually noticed at our job, in your personal profile, like in our HR personal profile,
there's a place where you can plug in certifications there. Right? So it can actually matter for you in multiple ways.
It takes a lot of time. It takes a lot of effort. Some cases it takes a lot of money,
but it can be worth it. It's funny. The downside to this is you guys, I know you both remember
this. You remember when MCSE and MCP were like the big things.
And there were so many exam companies that started up that it was, hey, come over here and take our training for $1,000 and we'll get you MCSE certified, right? And you had all these people getting MCSE certified that didn't know jack about how to set up systems.
And it kind of tainted the market back then.
And I think we've gotten past that maybe a little bit.
I don't know,
but I don't know.
So I'm always kind of torn on them.
So, the interesting thing is that both of you targeted infrastructure-y type certifications.
Nobody said a Java certification or any kind of application
developer certification. You both went after infrastructure type things. So that's curious.
And I don't disagree with it. It's funny that you bring up the MCSE certifications, because that's definitely put a bad, you know, opinion of certifications
in my mind where like, I never went after it. Because exactly like what you just described,
I worked with a guy, we hired this guy. He had every certification that you could imagine from Microsoft that you might want your network and sysadministrator to have.
And I'm not kidding when I tell you he asked me how to find the IP properties on the server.
And I'm like, really, man?
Because I don't have any of the certifications that you have.
You have every one that Microsoft offers.
And so it was at that point, I kid you not,
I was like, these things are worthless.
I will never waste a minute of my time
or a single dollar going after one.
Right.
Right?
Now, that said, you know, things have changed.
Yeah.
I will give you that.
It's been a minute.
And so I will give you that I probably have a very tainted old view of it in perspective, and I should change that.
Because, you know, it does look impressive when you can look at somebody's, uh, you know, LinkedIn profile and
you see those kinds of things. Right. But I think that what I would say is, you know, like whether
to go after them or not, I would definitely put the emphasis on the experience, like what you said, Alan. Like, that cannot be overstated enough.
And if after going through that process of gaining that experience and
whatever the thing is of choice, that's of interest to you,
if you're at a point where you can get the, get the certification for it,
by all means, man, go for it. Go ahead. Like, why not at that point? But I wouldn't start
at it from the inverse. I wouldn't start with like, oh, I should probably go get the certification
in this technology that I know nothing about. So I'm going to study for the certification exam
without trying to gain the experience. Because you can do that and you can get the certification. But when it comes time to, you know, show off that knowledge,
you're going to come flat. Yeah. Right. It's true. And I mean,
I'll give a good example.
So I have been working towards the Kubernetes developer.
I think it's CKAD maybe. I think that's right. At any rate, there's,
there's a Udemy course that is, you know, get your CKAD Kubernetes certification.
You know, this course will teach you how to do it.
But it's a great, I'm not using it to go get the certification.
It is a fantastic course to show you all the things that Kubernetes can do, right?
So, yes, it's a good study guide, but more or less, it's almost like what this podcast is for me.
What I think this podcast should be to most people is how do you improve your knowledge set without having to know everything? because it's really hard to do, right?
So there's no way I'm going to read through every single doc on the kubernetes.io website and go
through everything. But if I can go through this course and this guy's like, Hey, this is a config
map and this is what it's used for. You know, this is a security vault and this is what it's used for.
It's like, Oh, okay, cool, right? Now, when I go
do something, I'm not going to do it harebrained because I'm going to have an idea that, you know
what? I heard about this. I need to go look at it. So, to me, it's like an accompaniment, right?
Like it helps push you forward. It helps build your knowledge. But if you feel like you're close
to it, do it, right? Hey, look, I'm not going to lie.
If you go get your cloud certification in AWS, if you become a cloud certified architect in AWS,
probably gonna have a decent paycheck. If you get your cloud certification in Azure
as an architect, you're probably going to have a good paycheck, right? Like, so
these things can help you, but they should be part of your
experience as you go. Not the only thing, because if you get in there with a cloud architect
certification and you can't answer simple questions, you're messing it up for yourself
for a long time. And you're also messing it up for everybody else that comes after you.
Yeah. And that's where it's so tough. So, like, to the question about which ones, I don't have any specific recommendations. I would just say, like, if it's already something that you are in and you feel like you've mastered it, then, you know, and there's a certification available for it, go for it. By the way, I am ColdFusion certified. Nice. Very nice. Was it for CF7?
That's what I saw.
I took a practice exam, and it was like CF form out the wazoo, and I had a fit.
No, this was pre-Java version of ColdFusion.
So I think it was CF5.
I think it was CF5.
It was an Allaire certification back at the time.
I still had the bag, the green bag with the certification.
Yeah, it's been a minute.
That was like $700 back then, wasn't it?
I think I spent – I think it was $500 back in the day.
It was not cheap.
And I didn't study for it.
How about that?
And I still got it.
That's awesome.
I'm embarrassed to say what my last certification was in.
So let's just move on.
CPR?
CPR. No. That was a good one, though. I mean, I did get that one at one point. But yeah, like, real quick before we move on, I don't want to belabor this too long, though: why didn't we mention languages? Yeah, I mean, it's a good question. Why didn't you? I don't know why you didn't. To me, I think it's because, I mean, technically mine covered it, or at least I intended for my answer to cover it. Like, if that's what you feel like you're good at, then go for it. See, and I think I'm pretty good at languages, but I just don't, I don't know, maybe it's that I want to fill in my knowledge gaps elsewhere. I think that for some of them, you just don't hear a lot about it. Like Java, maybe Microsoft technologies, maybe a JavaScript one, I don't know. So I'll tell
you the reason is, like, if I see, you know, a JavaScript certification from, like, Udemy or Pluralsight or, uh, you know, Coursera or something, it makes me think that you spent a weekend with the course and took the test and did it. And all that's good, you know, it shows me that you care and you're driven. However, it doesn't tell me that you can program, that you can get the job done, and that you can hear a use case and then go off and make it happen.
But when I hear that someone has like John Calloway
from Six Figure Developer
just got the Azure DevOps certification
and I know he does a lot of work with it.
And so when I hear that he's got the certification,
it tells me that he went and he looked
and zeroed in on the area and made sure that he covered all his bases.
So if I talk to John and he's consulting with me on a job, I know if I ask him about something that he just doesn't – there's not like some big missing hole in his knowledge.
And it's like, for example, like billing is something that was big on GCP.
So if someone asks me a question about GCP, it's not like I don't know about billing accounts and how that works and how that relates to projects and stuff.
So like, you know, like a bare minimum that there's not some big fundamental misunderstanding
with how I think about the system.
And if you're someone like you code, say like you work with AWS technologies all day long
for three years, you may only work with like, you know, S3 and Dynamo and you may know nothing
about IAM or the networking or any of that stuff.
And so I think when it comes to technology,
the cloud technology specifically,
and those kind of infrastructure type ones,
it's really important for those people
to like show that they've got a lay of the land.
They've got a wide knowledge of that platform
and not just like a, you know,
kind of a very narrow view,
which can be really dangerous
if you hire someone who's only worked with S3 and Dynamo
and they're setting up your billing accounts and maybe
they don't know how to use the pricing calculator or don't even know that the pricing calculator
exists or something like that.
Well, that's where they need Flo for the pricing gun.
You know what?
I think this answer, though, made me realize why I don't think I've ever focused on languages
is because what you just said is you're thinking about use cases and outcomes
when you're thinking about infrastructure, things in the cloud, that kind of stuff.
Just because you passed a developer certificate doesn't mean you know how to program.
It just means you know the pieces of language.
You know the libraries.
You know where to go find things.
Well, I mean, for me, where I was thinking that, though,
was that for some, it's easy to say who the owner of that thing might be to even give the certification out.
So for Java, it's easy for, like, you know, back in the day Sun and now Oracle to be like, hey, I'm Oracle Java certified or whatever.
Right. You know, but if there's no quote-unquote owner to it, you know, like with JavaScript, then it's kind of like you went after a Udemy one.
It's like, OK, I mean, yeah.
Yeah.
You got the concept.
If it's your first job, you know, if you're coming out of high school or college or something and then you spend a weekend and got a React certification, I'm much more impressed by that than the person who just got out of college and didn't do that.
Totally.
Totally.
Yeah.
And we're not trying to downplay people that have gone and gotten their J2EE
certifications or anything like that.
Right?
Like, it's not a small amount of work.
It's just why I haven't chosen to do it.
Yeah.
If, okay, so, fair.
Then if you're brand new
to the industry, then, you know, the certifications might be better.
But if you've been in the industry for decades and you're like, hey, I just got a JavaScript certification from Udemy, then –
Well, I'm not talking –
I don't know.
Like when I said the Kubernetes thing, it was actually a course on how to get the CNCF Kubernetes certification.
Well, yeah, but that's an infrastructure one.
I was talking about a language one.
Right, yeah.
Getting a certificate for completing a course on Pluralsight or Udemy is probably not what you're aiming for.
Yeah.
All right.
Well, I think we already said that we were going to have the resources in this. Obviously, we're going to have a link to this book, Designing Data-Intensive Applications.
But with that, let's get into Alan's favorite portion of the show.
It's time for a joke.
I got him again.
One last one.
One last one.
I promise.
Last one.
So Arlene shared this one with me, and I thought this was pretty good because, you know, we've got springtime coming up, and you're going to want to get out there and whatnot and get active. So this was a – she sent me a screenshot of a tweet from Dad Jokes and said, I made a playlist for hiking. It has music from Peanuts, the
Cranberries, and Eminem. I call it my trail mix.
That's pretty good. I like it.
And with that, we head into Alan's favorite portion of the show.
It's the tip of the week.
Yeah, baby.
All right.
So seeing as I was missing last episode and got impersonated, or actually the episode before last where I got impersonated.
There wasn't an impersonation there.
That was wrong.
All right.
So the first one, I was chatting on Slack, which if you're not a member of our Slack, you should be because it's full of awesome.
I was chatting with Steven Leadbeater, and we were talking about security stuff and certificates and all kinds of randomness.
At any rate, he drops this link out there nonchalantly called keycloak.org, and it's amazing.
We've talked about if you go to create a side project or any project, you typically need authentication and all that kind of stuff.
And it can be a pain in the butt, right?
Like you want somebody to be able to log in with their Facebook account or Google or whatever.
There's this little thing called Keycloak that allows you to sort of painlessly do this.
And from what I understand, it can run in a container and it can federate your authentication and all kinds of things.
So check that out. It might make your life a lot easier if you're considering creating some sort
of membership type thing or anything that needs some sort of authentication.
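For a rough idea of the shape of it, here's a minimal sketch of running Keycloak in a container for local experimentation; the image path, the start-dev mode, and the admin environment variable names follow Keycloak's getting-started docs as best I recall, and they have changed between versions, so verify against keycloak.org:

    # Spin up a throwaway Keycloak instance in dev mode (not for production).
    docker run --rm -p 8080:8080 \
      -e KEYCLOAK_ADMIN=admin \
      -e KEYCLOAK_ADMIN_PASSWORD=admin \
      quay.io/keycloak/keycloak start-dev
    # Then browse to http://localhost:8080 to set up realms, clients,
    # and identity providers (Google, Facebook, and so on).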
All right. This one is not developer-centric, but for anybody that travels at all, man. So this came from Jamie from the .NET Core show.
So while I was over in London for the NDC conference, trying to get around in a place where public transportation is the thing and you haven't been there before can be a little overwhelming.
They got 50 different lines of railway systems or whatever.
And if you're trying to get from point A to point B, it can be really overwhelming.
There is an app on iOS and Android, I believe, called Citymapper.
You can go to citymapper.com.
It is amazing.
And when I say amazing, I cannot overstate that enough. What I should say is, if I needed to get from, I don't
know, wherever I was to another part of London, I could plug it in and it'd give me like eight
different ways I could get there. It would tell me roughly the times, how much walking you
had to do, what rails you had to get on, how much money it was going to cost you. Like, so from what
I understand, this is only available in, like, major cities where they have access to some of the infrastructure, so like the railway times, the subway times, and all
that stuff. Amazing. Like, killer application. It saved me many, many times.
And when you say it's limited, though, you're not kidding, man. I mean, you're talking about, like, how many cities? 41?
Okay, it's big cities, right?
But it's mostly cities with public transportation.
And one of the key things there is they even had Uber, right?
So if you wanted to skip all the public transportation, it would give you prices roughly for what it would cost you to Uber from point A to point B.
So that was really killer.
I see a smirk on Joe's face.
I don't know what that is.
I heard somebody's notifications, but I can't call them out on it because that also meant that
I was chatting while we were podcasting. That's awesome. All right. And then here's
my last one. I was paying attention there for what it's worth. I appreciate that. Yeah,
you're welcome. That's fine. Yeah. All right. So here's my last one, and this one's kind of interesting. So
you guys have heard, or at least if you haven't, you should know, of my love for Docker, right? And
I use Docker in all kinds of crazy ways. Outlaw and I were talking about it tonight, where he'll
use it just so he doesn't have to install software on a system, right? I have very similar type things.
Like, if I want to run Ruby, I don't want to install Ruby 2.6 and 2.7 and 2.5 and all those on my system, right?
Like I'd rather run a Docker container that has it all in there and I'm good. What's interesting is, when you're running Kubernetes, one of the dreams of it is you have this infrastructure and it can deploy containers out to different things, right?
That's the whole point of it.
Well, when you really dive into Kubernetes, you find out this notion of a node is kind of a server, all right?
And then you have containers that run on different nodes. Well, one of the things that always bugged me
about running Kubernetes locally, because when you install Docker, you can say, hey,
turn on Kubernetes and that's fine. But you have a one-node cluster, which really bothered me,
right? Like, because some of the cool stuff with Kubernetes is you can create taints and things
like that on the nodes where, let's say, for instance, you want to run a database in a Kubernetes cluster, and then you've got your application servers.
Well, that database server or that database needs to run on some beefy hardware, right?
So you want that thing to run on your most powerful node that you have.
And then your application stuff can run on all the other kind of nodes that are semi-powered, but they're not crazy powerful.
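As a sketch of that idea, this is roughly how reserving a node with a taint and a label works through kubectl; the node name and the label values here are made up for illustration:

    # Keep general workloads off the beefy node.
    kubectl taint nodes big-node-1 dedicated=database:NoSchedule
    kubectl label nodes big-node-1 hardware=beefy

    # A database pod that should land there then carries a matching
    # toleration and node selector in its spec:
    #   tolerations:
    #   - key: dedicated
    #     operator: Equal
    #     value: database
    #     effect: NoSchedule
    #   nodeSelector:
    #     hardware: beefy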
Well, one way you can do this is have multiple VMs on your laptop or your home computer,
and then you can register those things with your Kubernetes cluster. And then that way,
you can say, hey, I want to run my Kubernetes containers on these various different nodes,
right? They're all treated as servers. Okay. Well, you can do that. You could totally load up VirtualBox and then add a bunch of different Ubuntu servers or CentOS or whatever you want, but that's kind of a pain. Ubuntu
actually made this thing called Multipass that is really sweet. It's a command line way to spin up VMs quickly and easily. So if you wanted to say,
have four different Ubuntu instances running so that you had four nodes for your Kubernetes
cluster, it's basically multipass launch and then a name. And you have a VM running, and you
can pass in the number of CPUs, the amount of RAM you want, and that kind of stuff.
And you're good to go.
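Something like this is the shape of it; the exact flag names (for example --mem versus --memory) differ between Multipass versions, so treat it as a sketch:

    # Spin up four small Ubuntu VMs to act as Kubernetes nodes.
    for i in 1 2 3 4; do
      multipass launch --name "node$i" --cpus 2 --mem 2G
    done
    multipass list   # shows the running VMs and their IP addresses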
So this was something I stumbled across.
I've got to play with it some more, but it's really promising for being able to spin up VMs in a very lightweight way.
So those are those are my tips.
Oh, yeah, it's really cool.
I was blown away when I saw it. I do feel like, um, just to
elaborate though on the Docker thing. Like, I do feel like a lot of times,
I use Docker for the dumbest things possible. Like everybody will like spin up a Docker container or,
or like create a Docker image to be like, Oh, I want to run my app server and I'm going to like
install this in it and I'm going to install that in it and now I'm going to run it and I can just, boom,
I can hit it with all these different services and whatnot.
And I'm like, yeah, well, you know what?
I'm going to Docker run for one command
so I don't have to install whatever it is.
I don't want wget on my system.
Yeah, yeah.
That's right.
Dude, there's a whole slew of Microsoft containers for mimicking Linux commands.
Oh, we've talked about that.
I can't remember his name off the top of my head,
but I will definitely have a link to it in the resources we like
because I want to say his name is something, Stephen something.
He was titled the Docker Captain for Microsoft because we've referenced his repo before
of all the different Docker containers
or Docker images that he created,
Docker files that he has available
for all that kind of stuff.
And yeah, I'll be like,
oh no, I don't want to install Postgres,
so I'll just docker run Postgres
so that I can pg_dump some other database
and I'm done. Like, you know, why bother to install Postgres when there's a Docker image already available for it?
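That pattern looks roughly like this; the host, user, and database names are placeholders:

    # Run pg_dump from the official postgres image without installing anything;
    # stdout is redirected to a file on the host.
    docker run --rm -e PGPASSWORD=secret postgres \
      pg_dump -h db.example.com -U myuser mydb > mydb.sql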
Yep.
So, yeah, I use it for dumb things.
No, those are amazing things.
That's my guilty pleasure.
Okay.
So, my tip of the week.
So, one, okay, last episode, or two episodes back, Joe and I were talking, and Joe brought up his tip of the week, which was Muzzle.
Now, do you guys ever – am I – I can't be the only one that quite often goes and looks at the source for some of these things?
Oh, yeah.
Okay.
Thank God.
Yeah.
All right.
So Muzzle was Joe's tip of the week from a couple episodes back.
And if you haven't checked it out already, I cannot stress enough how you need to go
to MuzzleApp.com because it is hilarious to watch the messages that will fly in.
So for Alan's edification,
because you weren't on that episode:
if you get notifications,
they're coming in.
What Muzzle does is it will automatically silence those notifications on your
Mac when you are sharing your screen.
So it happens automatically for you.
Like Mac OS has the capability to turn off those,
you know, to get into a do not disturb mode, but this does it for you automatically. And if you
notice these messages flying in, they are hilarious. Right. And so I was curious because,
you know, I was like, man, I just want to read the full list of the messages. Like, you know,
I can sit here and watch them come in one by one, but sometimes like, you
know, you might blink and, you know, oh, it's already gone.
And so I just wanted to like, so I started hunting through the source because I just
wanted to read like the full set of messages.
And in doing so, I found this beautiful gem that was hidden in there that I didn't know was a thing until now,
which is, we've talked about, um, in the past, like Lorem Ipsum
for, uh, picture generators as well as text generators, right? Well, there's a randomuser.me site, which is a
random user generator. And what it'll do is it'll just return back like, hey, here's the name.
And here's the, if it's male or female, here's a photo for it. So if you wanted to create something
random, like what Muzzle has on their app for
showing like, Hey, you got this notification from Sergio and you want a photo to go along with it.
Like randomuser.me will give you. You hit the API and boom, you'll get back a random
result. So for example, if you were to go to muzzleapp.com
and you were to open up your dev tools and you watch your network tab, you will see a bunch of
calls coming to randomuser.me and you'll see what I'm talking about with what that payload looks
like. And it's just so awesome. You're like,
oh man, I never knew that was a thing. So if you ever need random users
for your system: randomuser.me.
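Trying it is a one-liner; the response fields noted in the comments are from memory of the API, so check randomuser.me for the exact shape:

    # Fetch one randomly generated user as JSON.
    curl https://randomuser.me/api/
    # The response has a "results" array whose entries include a name,
    # gender, email, and picture URLs you can drop into a mocked-up UI.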
That's cool. All right. So that's my first tip. Then my second one is, uh, I forget now, maybe it was a couple episodes back.
No, I think because I think Alan was here for that episode where I had talked about, dang, who was it?
Was it Russ that told us about it?
The Git playing cards?
Yes.
You remember that?
I remember that.
Okay. So I found this other one that, like, I gotta have this in my life now. It's a Git cheat sheet coffee mug, and here it is, right there for you.
But they have these for everything, so you can go to
remembertheapi.com and you can find all your favorite things there. And that's cool. They'll
just be cheat sheets for whatever. But you know, of course I was going to pick out the one
for Git. But yeah, there were some great options there. So, like, uh, let me just see,
like, what were some? Let me go back and refresh my memory of what some of
the other ones were. Um, okay. So a Vim cheat sheet, we've talked about our love of cheat
sheets. There was a regex cheat sheet. Those were both good.
Then they had them in mouse pads, water bottles, like whatever you want, you know.
Like, you want a travel mug for your coffee instead? They've got that
versus the traditional kind of coffee mug. You want, you know, just a notebook,
you need some stationery and you want your notebook to, like, stand out from everybody else?
Why not have a Git cheat sheet on it? Right. And it's like, remember,
you know, like when you were in school and you would get your new notebook, and it would have, like,
oh, hey, here's all the tables of conversions for metric measurements to imperial measurements or temperature measurements or whatever.
Yeah, so now you can have a Git cheat sheet on it.
So I thought that was pretty awesome.
Oh, there's a Kubernetes one.
Oh, was there?
I didn't even see the Kubernetes one.
Really?
Yeah, man. How am I still not seeing the Kubernetes one?
Docker CLI, Kubernetes. What? How come you're seeing more
cool stuff than I am? I'm on remembertheapi.com slash collection
slash mugs. Oh, okay.
Oh, okay. Yeah, yeah, yeah. Right. Oh, computational
complexity cheat sheet.
Yep.
Big O.
Tell me you don't want that in your life.
Right?
Oh, yeah.
Now I see the Docker CLI one.
That's great.
Cron cheat sheet, man.
I'm telling you. Oh, my gosh.
Everybody needs that mug.
Everybody.
You tell me you remember every position of that.
I always forget it.
I'm like, wait, is it going from least to most or most to least?
All right. Whatever.
Inside my dumb head.
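For the record, the five standard crontab fields run minute, hour, day of month, month, day of week, left to right; the script path below is just a made-up example:

    # ┌─ minute (0-59)
    # │ ┌─ hour (0-23)
    # │ │ ┌─ day of month (1-31)
    # │ │ │ ┌─ month (1-12)
    # │ │ │ │ ┌─ day of week (0-6, Sunday = 0)
    # │ │ │ │ │
    30 2 * * 1 /usr/local/bin/backup.sh   # 2:30 AM every Monday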
I want to do a cheat sheet mug delivery service.
Every month you get a different cheat sheet for some
sort of tech tool. Oh, yeah.
That's amazing. They need to do that. That would be
a great Christmas gift for everybody.
Okay.
The last one that I have here
is that
Alan actually told me about this one.
How did I not know about this already, A?
B, how has Alan not used this as a tip of the week?
And C, forgive me if Alan did use this as a tip of the week and I forgot, because I swear I was listening.
Reference to the comment earlier.
Yep.
But in your dev tools in Chrome, if you are in your JavaScript file, you're trying to debug something maybe,
and you know the specific function that you are looking for, you can type Control-Shift-O, and then it'll bring up a prompt,
and you can just type in your method name, for example,
and it'll navigate right to it instead of going to it by line number.
Yeah, I was watching Outlaw one day.
We were trying to figure something out,
and he kept doing a Control-F to go find stuff.
No, no, no. I was doing Control-G.
Oh, you were trying to go straight to the line or whatever,
and I was like, hey, dude, just do this. You type in the method name and you're right there.
He's like, oh. It's just one of those things, it's muscle memory, right? I've been doing it so long
that I don't even think about it anymore. So, very nice. Yeah. All right. Well, you guys just did like
seven tips, uh, and maybe eight. I only have one tip, but it's super good, probably, so that makes
sense. Uh, whatever podcast app you're using right now, um, after you finish this episode,
go and subscribe to Tabs and Spaces. This is a new podcast, but it's super lit. If you're a member
of the Slack, then you've seen many of the characters
around. We got, um, it's like an all-star cast. It's basically, like, I want to say, the Coding Blocks
of the UK. But we got Zach Braddy, uh, James from the Cynical Developer, and the Progman from .NET Core and
Waffling Taylors. It's an excellent show. They only got two episodes out, but I can just tell, it's
going to be amazing. And I see their first episode was 59 minutes, so I feel like, you know, there was a
decision there to keep it under an hour. And the second episode is 75 minutes, so I can see that
they're on a very similar trajectory to us, and they'll be at three-hour episodes in no time.
And I'm looking very much forward to listening to it. And it's great, it's conversational style.
So I'm going to wager, uh,
I'll wager a few euros or pounds or whatever. If you like this podcast, you're probably going
to like this one. You should check it out. Hey, so I haven't listened to it yet. I'm
absolutely going to, but I met Zach Braddy and I met Jamie Taylor. Unfortunately, James couldn't
make it. But if this podcast is half as entertaining and half as fun as the conversation
and time that we had talking while I was over there, it's got to be amazing. So yeah, you're gonna
be cracking up. It's great. That's excellent. I'm absolutely looking forward to it. And they've
even got a sweet-looking logo. Yeah, the site's really good. Some jerks. Hey, wait.
Our site's really good.
We can use a little touch-up.
There's this JAMstack thing.
Ain't nobody got time for that.
All right.
Well, we hope that you've enjoyed this episode as we've dug into how our databases write data to disk and retrieve it.
And this is just the start of this awesome conversation. Next up, we're going to be
talking about sorted string tables and log-structured merge trees and even B-trees.
That's when you get, like, a lot of bees around you. So be careful of those.
Yeah. So if you happen to be listening to us because a friend pointed you to us on
their website, or they're, you know, letting you use their device, you can find us on all your
favorite podcast platforms. So be sure to subscribe to us there if you haven't already
subscribed. So iTunes, Spotify, Stitcher, whatever your favorite podcast destination might be.
And if you haven't already left us a review, we would greatly appreciate it.
Obviously, you're going to get your name butchered by me.
So, you know, I can only say that, you know, you're welcome for that. You can find some helpful links there to leave those reviews at www.codingblocks.net slash review, as I remember how the internet works.
See, you can't mess with the trademark.
Yeah, see.
That's why you can't say it right.
Yeah, I was going to make it a Twitter thing or something.
That's right.
So while you're up there at codingblocks.net, check out our fantastic show notes, discussion,
examples, and more.
Load your feedback, questions, and rants
up into a big bag and come
and just drop them in the Slack. Boom.
By going to codingblocks.net
slash slack and sending yourself an invite.
Make sure you follow us on Twitter
at @CodingBlocks, or head over
to codingblocks.net, and you'll find all our
social links at the top of the page. Boom.
dude
33%
I wasn't that far off
Hey, you can lose and still be a winner.
Dude.
Oh, man.
That's amazing.
Bernie taught me that.
We got a show to do, guys.
There's two. There's two answers. I was only off by seven.
But it was the majority.
Oh, man.
That's so funny.
All right.
I think I'm good.
I can't hazmat.