Coding Blocks - Intro to Apache Kafka

Episode Date: May 26, 2024

We finally start talking about Apache Kafka! Also, Allen is getting acquainted with Aesop, Outlaw is killing clusters, and Joe was paying attention in drama class. The full show notes are available on... the website at https://www.codingblocks.net/episode235 News Intro to Apache Kafka What is it? Apache Kafka is an open-source distributed event streaming platform used […]

Transcript
Starting point is 00:00:00 All right, what are we talking about? So, uh, no talking. Why? Because it's 2024, man. They heard the music. All right. Um, hey, some of it has to go in, like, we have a website, codingblocks.net. Oh my god, who doesn't have a website? We never put this in the show notes, but we have a Slack. Go to codingblocks.slack.com or codingblocks.net/slack and sign up, because there are some amazing people in there and amazing conversations going on. And, uh, I'm Joe Zack. All right. All right. So who are you? I'm Alan Underwood. All right.
Starting point is 00:00:48 I'm Michael Outlaw. All right. Perfect. Well, today we're talking about Apache Kafka, not to be confused with the... was it Franz Kafka, born in, like, the 1800s? Oh, was that even up for debate? I didn't even think. Yeah.
Starting point is 00:01:04 I mean, you're talking about the mathematician, right? No, the metamorphosis person. What? The guy wrote a book about turning into a bug and his sister threw an apple at him and it stuck in his carapace. Why do you know this? Right? I love that book. What a weird book.
Starting point is 00:01:23 Somebody took a lot of time to write that book. Wow, dude. It's great. All right, so you know what else somebody took a lot of time to write? Apache Kafka. Yeah, a long time. Hey, so before we actually get into the show, on Apache Kafka's website they actually say up there: more than 80 percent of all Fortune 100 companies trust and use Kafka. And for those bad at math, that means at least 80 out of the 100. Well, actually 81. At least 81, yes, because it's more than 80%.
Starting point is 00:01:53 Yes. So, you know, that's, that actually does say something that says a lot, but before we get into all the, I always take question. I always take issue though with like,
Starting point is 00:02:02 it's so well, any kind of quote like that. I'm like, okay, I mean, clearly you mean 81 out of 100, but instead you got to make it sound fancy in marketing speak. More than 80%. Okay, well, that's 81. It might be 82. But wouldn't they have said 82? But does that sound as good as 80? Who knows?
Starting point is 00:02:21 Well, I mean, then wouldn't you have just said 82 out of 100? Yeah, maybe. But that 80% sounds 80 sounds real nice though doesn't it it does the trust gets me though it's like how did you figure that out is that a survey i don't know well they do have a lot of logos on their website which i assume they had to have gotten permission to put up there uh having worked with corporate worlds we know how hard that can be so yeah for sure um i like kafka yeah i like kafka before we get into why we like kafka and what it actually is we have to do our little news segment here so outlaw you're gonna read us any proper names today. Fine. Uh, from iTunes, we have,
Starting point is 00:03:07 and from Spotify, we have, and lastly, boy, I hope I get this one right from audible. That hurts. All right. So that's a big,
Starting point is 00:03:21 thank you. Great job with the pronunciation. Uh, thank you. Those were hard. Those were hard. Those were difficult. Yeah. The,
Starting point is 00:03:27 the, the air. Hey, if you, if you haven't had a chance, go to codingblocks.net/review and, you know, please leave us some kind words.
Starting point is 00:03:35 Kind words are better than harsh words. Uh, but the last bit of news, that we talked about last time and is coming up again here: September 7th is Atlanta DevCon, and atldevcon.com is their website. We might go to that. You know, we're all actually in the ATL area now, so we probably should go to it. Let's do it. But yeah, so coming up. Oh, we do have hats, don't we? We have stickers, we have hats, we have... Okay, I still haven't gone by the box, dude. Somebody reached out to me a month ago about... I need to go. I'm doing it tomorrow. I'm going by the box tomorrow. So apologies for whoever sent in the
Starting point is 00:04:18 swag it's lunch by the box tomorrow jay-z you want to make that happen you want to take a bet that i don't go to the box. Oh, I can't go to the box tomorrow. That's not very nice at all. Kind words. What are the odds? Give me some odds. 80%?
Starting point is 00:04:36 More than 80%? I'm going at 80%. That's a good number. That he doesn't go? 80% is a good number. It is a good number. 80% of all Fortune 100 companies use... Rust. Apache Coffee? Or should it be Apache Kafka? You know, we've talked more about what Kafka isn't than what it is. Yeah, I'm actually looking forward to this episode, because we've been saying that we need to talk about it for two years, and so here we go. Let's do it. So what is it? They actually... all right. So in fair disclosure here, I basically copied and pasted this from their front page, all of this.
Starting point is 00:05:17 So I'm probably gonna have to do some wordsmithing on this so that we don't get called for like copyright infringement or something, but it's an open source distributed event streaming platform used by thousands of companies, more than just 80% in the top 100 for high performance data. Now I, it doesn't seem like we should have to talk about that sentence much, but I actually want to, because it wasn't long ago that I did a presentation at NDC London,
Starting point is 00:05:48 where they had a completely different statement about what Kafka was at the time. Like, they have fully embraced this streaming platform thing now. Before, it was just a distributed, uh, queue that was, um... I can't think of the word for it. Uh, durable and persistent was what I was looking for. That was used to basically share data between different consumers and producers. That was essentially what the old definition was, but they've gone all in on this data streaming stuff. You know, we should say, too, that when we say streaming, a lot of times now people associate the word streaming with, like, video, uh, Netflix, you know, stuff like that. And that's not the kind of stream we're talking about. We're talking about basically data coming in in parts. So basically the opposite of batch processing, where you, like,
Starting point is 00:06:42 grab a whole bunch of something and you do it, and do it again on a schedule or whenever you complete. So that's the kind of streaming we're talking about. Yep. So, you have any thoughts on it, Outlaw, streaming versus the... Yeah, I was just trying to think of, like, the way Jay-Z was describing it. I was thinking about it even in more layman's terms, where, like, you know, you might be more familiar with processing like, hey, I'm going to select these hundred records and do this thing. But in a streaming kind of world, you haven't got that kind of time. So as each individual record comes in, you're trying to make a decision about what you do or don't do at that time. A streaming application, right? Yep, like every time a row comes in. Yep. So let's talk about some of the core capabilities of Kafka, and the first
Starting point is 00:07:33 one, and this is probably the reason why so many companies actually have embraced it, is high throughput. So they say that they have messages that are network limited, meaning it's not IO bound. It's basically at your NIC interface, right? With latencies as low as two milliseconds. Yeah. And what was that? I forget the word for it, but there's some configuration, something called like zero memory or something, where they're able to actually take in data directly from, like, the
Starting point is 00:08:02 network connection and write to disk, bypassing RAM completely. I forget what it's called, but, um, there's a couple of little tricks that make that high throughput possible. I know we're not diving into the details here, but, uh, you know, it's interesting even how they get that. Um, the way Kafka kind of differs from, like, traditional queues, where you kind of literally go, uh, request and get an item, like a unit of work, and then you process it and you put it back or whatever, and you have this overhead associated with each message. Kafka kind of tends to grab stuff almost like you're slicing off, like, butter or something, you know, like with a
Starting point is 00:08:34 knife, where you kind of get to pick how much you want, and then you kind of bite that off and go chew on it. And all this I think we said we weren't going to get into. Well, you know, it's funny, I just realized, because when I was putting the notes together, they did change their verbiage on their homepage so much that I think it actually overlooks what it is. And I didn't even put it in the notes. What it actually does is, it's a queue. It's just a persistent distributed queue, meaning it's not a database. You can't query it.
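For anyone who wants to see what reading from that persistent queue looks like from the client side, here's a minimal sketch using the Java consumer. The broker address, topic name, and group ID are made up for illustration; the point is that a consumer pulls batches and just tracks how far it has read, rather than deleting anything.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QueueReaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "example-group");           // the "read" marker is tracked per consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic")); // hypothetical topic name
            while (true) {
                // poll() hands back a batch -- the "slice of butter" -- not one unit of work at a time
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : batch) {
                    // react to each record as it arrives instead of waiting on a scheduled batch job
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
                // nothing is deleted from the topic; the group's offset just moves forward
            }
        }
    }
}

By default the client periodically auto-commits those offsets, which is the per-consumer-group "Alan's consumer group read this" bookkeeping described below.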
Starting point is 00:09:07 It's not... you put something in, you have something that can take it out, or not even take it out, can see that that thing was put in, and it basically does it in a queue order. So people are probably familiar with, uh, what are they called? Like, there are particular things, I can't think of what it's called, but, like, RabbitMQ is a really popular technology that's been used for a long time. And the biggest difference is, something goes into the queue, and as soon as it's read or used, it's gone, it gets deleted, right? In Kafka, that doesn't happen. You read it and it's marked sort of as read, but it's just a queue, it's just there. It's a persistent, very fast queue. And that mark is associated with the consumer, so it's like, I know that Alan read this piece of data, right? So I wanted to get that out of the way... Or Alan's consumer. Yes, consumer group. Yeah. So I wanted to get that out of the way, because
Starting point is 00:09:51 I just realized that it wasn't anywhere in there. So keep in mind, Kafka is nothing more than a very fast, persistent queue, like, in its very simplest terms. Do you agree with this statement? This is from Kafka, like, the top line of their documentation: event streaming is the digital equivalent of the human body's central nervous system. Ah, geez, marketing's at it again. Right? Yeah. I was like, oh... I mean, like, there's a lot going on? Sure, okay, I guess. And I guess the point is that, like, at any one point in the body, you know, a decision is immediately being made and it's moving on. But also, I guess, groups of parts of the body could decide to make decisions, like,
Starting point is 00:10:37 oh, we should move that arm away from that flame. I'll read this to you, see what you think, if they did a better job of event streaming. Who did it better in defining event streaming? Event streaming is the practice of capturing data in real time from event sources and storing these events durably for later retrieval, manipulating, processing, and reacting to the event streams in real time, as well as retrospectively, and routing the event streams to different
Starting point is 00:11:13 destination technologies as needed. Yeah, that's pretty good. I'll say it's okay, but it made a couple of assumptions about the use cases, like the durability, which is, you know,
Starting point is 00:11:23 kind of optional. And I forgot the first, the first thing I took kind of issue with. Oh, the real-time, which it doesn't have to be. But I understand that if we're talking about real-time event streaming, then I think that's all fair. It's just there's so many different ways you can configure it that you can just get all tangled up, wrapped around the axle, trying to describe it, and that's what I'm doing right now.
Starting point is 00:11:44 Sorry. All right. So we did it better is what I'm hearing. Okay, yeah, we did it better. You know, I will say, I searched their homepage for the word queue. Not on there. That's what I was saying. Yeah, I looked at the getting started thing and, um, same thing. The word queue is not mentioned anywhere, and they don't have the pub/sub stuff together. They discuss, uh, publishers and subscribers, of course. But it is in their documentation somewhere. Yeah. I mean, I guess they're basically trying to get you to buy into the idea of data streaming and forget about what's actually going on behind the scenes. But I think it's important to know, because, I mean,
Starting point is 00:12:25 I've had people ask me questions after presentations or whatever, and they're like, well, can I use it for this, or can I use it for that? And I'm like, well, I mean, just keep in mind, it's actually pretty simple-ish. You know, oversimplifying things here, but it's a fairly simple idea of what it's doing. So, you know, you can put tools on top of it to do some of the things you want, but it itself is just a queue. Yeah, it does a lot. It's got a lot of configurations that you can do really cool stuff with that really change it. Um, like, you know, some things like real time
Starting point is 00:12:55 or not, durability or not, or, um, searchability or not, even joinability, like the ability to kind of bring information together. A lot of that stuff is, uh, you know, you've got options. But it's definitely got things that it's good at, which is high throughput, scalable, permanent storage, high availability, and you've got things that it's bad at, like searching. Yeah, well, I mean, it's not made for it. Again, you put a tool on top of it. But yeah. So the next thing, and you just said the word, is scalable. So I'm going to read this because it's actually pretty impressive. And I've seen things from places like Microsoft where they actually say things like this: scale production clusters up to thousands of brokers, trillions of messages
Starting point is 00:13:36 per day, petabytes of data, hundreds of thousands of partitions, and can elastically expand and contract storage and processing. Like, those are big, big numbers. Yeah, pretty impressive. The elastically is interesting. When I think something scales elastically, that to me tells me, like, I can just say, oh, go from four to five or, you know, whatever. And that's not been my experience with, uh, Kafka. But hey, you know, maybe they know something I don't. Yeah, I was curious to learn that part too. Because when you talk about expanding and contracting the storage, maybe... it's probably my fault. I think of Kafka in a Kubernetes cluster kind of thing.
Starting point is 00:14:20 And I'm like, Kubernetes doesn't really allow you to reduce the size of your PVCs. That's a very manual thing. And like Joe was saying, if you decided, like, oh, hey, you know what, I want to add two more brokers to my cluster, you have to do some work to re-key things back. So I'm curious to learn, like, maybe there's some new capability that we aren't yet taking advantage of. And that could be because, skipping ahead, you know,
Starting point is 00:14:49 because we are in a Kubernetes world and rely on Strimzi, which is the operator implementation for us to be able to use Kafka in a Kubernetes world. Like, maybe there's a disconnect, you know, a lag there, you know?
Starting point is 00:15:04 Yeah. I think there's lots of stuff we can learn from this. I mean, we're not going to be talking about it on this episode, you know, a lag there, you know? Yeah. I think there's lots of stuff we can learn from this. I mean, we're not going to be talking about it on this episode, but like what Jay Z said, there's lots of configurations. Yeah. I mean,
Starting point is 00:15:13 there's control, and there's a whole ecosystem of tools that have grown up around this that kind of help with some of the stuff too. Yeah. And it's not just that there's lots of configurations; there's lots of configurations that you could do. So, um, the next thing they got is permanent storage, and this is one of the key ingredients of Kafka compared to something like RabbitMQ or MQ or some of the others out there.
Starting point is 00:15:38 It stores streams of data safely in a distributed, durable (asterisk), and fault-tolerant cluster. So the reason why Jay-Z called out a minute ago that, you know, durable is sort of in question: it depends on the number of brokers you deploy. If you only have one broker and no replication, then it's only as durable as that one broker is, right? If you have it on three brokers with no replication, again, you are tied to those disks that those things have if you don't have any replication set up. But as soon as you start creating a cluster with replication, now you have durable, fault-tolerant things. So it all boils down to your configuration. By the way, it was called zero copy, the thing I couldn't remember the name of before. And it's a little more nuanced than the way I said it before, but I
Starting point is 00:16:32 think it's close enough. Basically, they minimize, uh, having multiple copies of data kind of shuffled around in, like, memory and whatnot. So it's able to actually send data back and forth across the network without running through the Java program, like the broker, for example. And so it's just really efficient. And so zero copy doesn't actually mean literal zero. That's just kind of the guiding principle behind it. Okay.
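To make the durability point above concrete, here's a minimal sketch that creates a replicated topic with the Java AdminClient. The broker address, topic name, and counts are assumptions for illustration, not anything from the show.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed cluster address

        try (AdminClient admin = AdminClient.create(props)) {
            // Replication factor 3: every partition is copied to three brokers, so one dead disk
            // doesn't lose data. With a single broker you'd be stuck at replication factor 1.
            NewTopic topic = new NewTopic("example-durable-topic", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2")); // require two live copies before acking
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}

The min.insync.replicas setting only bites when a producer asks for acks=all; that pairing is what turns the asterisk on "durable" into an actual guarantee.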
Starting point is 00:17:18 So, zero copy: trying to keep it close to zero. All right. And then the last thing they have here for the core capabilities is high availability. And again, I just copied what they had: stretch clusters efficiently over availability zones or connect separate clusters across geographic regions. That's pretty important. Like, you know, if you're in a cloud world, right,
Starting point is 00:17:18 you want to have your data in more than one availability zone. So if, you know, heaven forbid, one of the data centers catches on fire or something happens, right? It gets flooded. Then you still have data available somewhere else. And so these things can connect and you're kind of safe. And I think outlaw you've played with some of this,
Starting point is 00:18:00 right? Which part, though? The, like, having the data replicated in multiple zones? Maybe, maybe you have... No, because... well, no, because we were looking at cheating it, but MirrorMaker would have been the way that I wanted to go. Which is, like... MirrorMaker, skipping ahead, is if you wanted to make a copy
Starting point is 00:18:42 of your cluster, if you wanted like a redundant cluster kind of situation. Hey, and just to note, again, we're not getting into some of the nitty-gritty stuff here, but they mentioned Strimzi a little while ago. So Strimzi is the operator, like they said, that allows you to deploy Kafka and manage it a little bit easier than if you're just doing it all manually in Kubernetes. Even if you're not in a Kubernetes world, it's worth looking at their documentation to see all the tools that they tie together that they use to manage and run Kafka clusters. Because he said MirrorMaker, there's Kafka Connect, there's all kinds of things that they have in their documentation
Starting point is 00:19:26 that would lead you, as a person dealing with Kafka, to know the tools that you should probably be looking at, that things like Strimzi have sort of standardized on. All right. So next up, the ecosystem. So this is where they start talking a little bit more marketing speak, right? So, built-in stream processing. And this is processing streams of events with joins, aggregations, filters, transformations, and more, using event time and exactly-once processing. So there's some caveats here. What they just said is not just Kafka; it's Kafka as a platform. So they're talking about their entire platform and all
Starting point is 00:20:09 the software that comes with their platform. They're not just talking about Kafka. So, events with joins, aggregations, filters, transformations, that is a different piece of software that operates on top of the Kafka technology, right? That's Kafka Streams or KSQL or any number of other things. Flink is one that we said we want to get into. Beam is another technology from Apache. Yeah. So there's a ton of things that happen here, and the actual underlying queue itself is what enables a lot of this. Connect to almost anything was another one of them. Kafka's out-of-the-box Connect interface integrates with hundreds of event
Starting point is 00:20:41 sources and event sinks, including Postgres, JMS, Elasticsearch, AWS S3, and more. By the way, that's Connect with a capital C. It's a solution for basically getting data in and out of, like, data sources. So, like, syncing data for either producing or consuming from Kafka. And it's a super powerful, and frustrating, but extremely useful technology. Because essentially, again, this isn't Kafka.
Starting point is 00:21:10 This is the software that's been built on top of Kafka as a platform that allows you to hook into... like, when it says it can hook into Postgres, it can basically do change data capture from Postgres. And so anytime something's updated in Postgres, it sends a message to this Connect instance, and it'll write that to a Kafka topic. And then you can have something that listens to that particular queue, that topic, and have it write it somewhere else for you, or transform it and then write it, whatever.
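As a rough idea of how little code is involved, here's a hedged sketch that registers the FileStreamSource connector that ships with Kafka by POSTing JSON to a Connect worker's REST API. The worker address, connector name, file, and topic are all assumptions, and a real CDC source like the Postgres one mentioned above would use its own connector class and config keys.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnectorSketch {
    public static void main(String[] args) throws Exception {
        // FileStreamSource just tails a file and writes each line to a topic -- the "hello world" of Connect
        String connectorJson = "{"
                + "\"name\": \"demo-file-source\","
                + "\"config\": {"
                + "  \"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "  \"tasks.max\": \"1\","
                + "  \"file\": \"/tmp/demo-input.txt\","
                + "  \"topic\": \"demo-file-topic\""
                + "}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // Connect's default REST port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}

No hand-rolled watermark or batch-size logic; the connector does the tailing, and removing it later is just a DELETE against the same endpoint.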
Starting point is 00:21:26 It's like this whole connect-the-dots type technology, right? It's almost like Legos for putting stuff, moving it from one place to another. There's another thing like this, though, that I'm trying to remember, and for some reason I can't think of it.
Starting point is 00:22:15 In a SQL Server world, using a linked server to get data from some other source into your SQL Server database. And that's kind of like... I'm not thinking of the right thing. Oh, you're talking about SQL Server Management... SSIS packages. Yeah, yeah, SSIS, where, like, you know, you could use that as a tool to get data in and out of your database. And that's how I kind of view Connect. It's like, it's a tool that can get data in and out of your Kafka cluster, and just like SSIS, it could be a variety of different, uh, you know, sources or outputs, just depending on what you're trying to do. Well, those are connectors. So just to be pedantic, uh, Connect is actually the server that kind of orchestrates the tasks and the jobs and stuff like that, and it
Starting point is 00:23:05 runs the connectors, and the connectors are what we're talking about. We tend to talk about them the same, but just in case someone is not familiar with it, I wanted to clear it up. Well, that's fair, I guess. Yeah, I guess what I... if we're talking about Kafka, though, I just wanted to cursorily say Connect and then not get lost in the weeds of everything about, you know, Postgres or Elasticsearch or Mongo or whatever. Yeah, if you want to sync data between two databases, Kafka is a great way to do it. You hook up Connect, uh, you set up two connectors, one from, like, say, Mongo, one to Postgres, and then say data comes out of here and it goes into there. And it's going to be really efficient. It's going to come out by tailing, like, the write-ahead log or whatever the, you know, change mechanism is, depending on the database,
Starting point is 00:23:22 and it's going to put it in in a really efficient way too. People spend a lot of time making these really good and really efficient, with documentation and all that stuff. You don't have to sit there and hand-code like, well, here's my watermark, and now let me go get the next batch of size 500. No, none of that stuff. All this stuff is available for you,
Starting point is 00:24:08 open source and well-documented. Declarative. Yeah, declarative, which is really nice. One thing that I think might be worth calling out that they said in this, in the what-is-it thing, is I think they actually rebranded Kafka instead of it just being the queue behind the scenes. They said it's an open source distributed event streaming platform. So I guess we need to quit separating what Kafka actually is and what their platform is, because it looks like they've rebranded it, which is why we're talking about Connect and built-in stream processing and all that kind of stuff. I guess we came from the old-school version of Kafka. Back in my day. Yeah, right. Exactly. So, client libraries. They come
Starting point is 00:24:48 with several. They have... read, write, and process streams of events in a vast array of programming languages. When they talk about a vast array, they really do have a lot. They have Python, they have C#, they have, uh, Java. Java is their predominant one. Um, they have it in C++, they have it in a bunch of different things. And I think a lot of the languages wrap a lot of the C libraries as well, or they used to, so they're super high-performant and really good. The Kafka protocol that they actually use to communicate with the brokers, they've done a really good job making it kind of forward-safe. And so you can use, like, a really old library to talk with the newest brokers.
Starting point is 00:25:13 Like, as long as it can kind of negotiate and agree on a shared protocol, which goes back to... I think 1.1 is the last one that was kind of... or, that was the first one. I think Kafka 1.1 was the first one that kind of introduced this paradigm. So if you're talking to a broker, you know, prior to 1.1, you're like 10 years in the past at this point. But everything since is going to work really great with these client libraries.
Starting point is 00:25:34 I'm going to look that up now. I don't even know what version we're on now. I don't either. But you know, the reason why a lot of it still does work is because the underlying storage system and communication protocol is very simple. It's not a complex thing, so they were able to hammer it out, and it's just been working well for a while. The current version of Kafka is 3.7. Wow. Okay.
Starting point is 00:25:34 And then the last thing that they call out here on the ecosystem is they have a large ecosystem of open source tools. Between stuff that Apache has actually done as well as confluent and, and a bunch of other companies on top of it, just even community type stuff. There are a ton of things that have been built for Kafka at this point. So their next thing is trust and ease of use. So I guess the trust, what they're talking about here is mission critical stuff.
Starting point is 00:26:09 So you can do some of the things that they mentioned here, like guaranteed ordering, zero message loss, efficient exactly once processing. So those are all options that you can configure. You don't have to configure. They don't have to be ordered exactly how you want it. They don't have to be exactly once. Like you can tell it, I don't care if it's exactly once. Like there's all kinds of things that you can do here, but they do give you the options. Yeah, that's pretty intense. When you think about it, these are not lightly given guarantees. Like if you've got a thousand broker cluster,
Starting point is 00:27:46 just imagine the data across a thousand brokers, and saying, like, we guarantee that we'll preserve the order of the events that you send us, or we promise you we're not going to lose anything once we've confirmed that we've got it. And, you know, exactly once, there's another thing. When you start talking about thousands of brokers, like, that's not trivial. That's a tough problem to solve. And a lot of these configurations, like Alan said, you're gonna pay for it in some way, whether it's, like, slowness or, you know, uh, you're gonna make some sort of trade-off on the CAP theorem there. It sounds like a challenge, though. I think I could lose it. Yeah. Oh, you could definitely make mistakes with Kafka. All right, so now we're back to the greater than 80% of Fortune 100. But they say trusted by thousands of organizations, which is greater than the hundred of the top Fortune 100. So, you know, more marketing speak, but it's legit. Uh, they say from internet giants to car manufacturers to stock exchanges, more than 5 million unique lifetime downloads. That's pretty big. Pretty good, yeah.
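Those delivery guarantees mentioned a moment ago (ordering, zero message loss, exactly once) map to producer settings rather than anything automatic. A sketch of what the knobs look like on the Java client, with the broker address and topic purely hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed cluster address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                // don't ack until the in-sync replicas have it: no message loss
        props.put("enable.idempotence", "true"); // broker de-duplicates retries, so retries can't create duplicates

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always land on the same partition, which is where ordering lives
            producer.send(new ProducerRecord<>("payments", "account-42", "charge:19.99"));
            producer.flush();
        }
    }
}

Full exactly-once across a read-process-write pipeline additionally involves the transactional APIs (or Kafka Streams), and as the hosts say, you pay for all of this in latency somewhere.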
Starting point is 00:27:46 and i mean they're legit i it really is a fantastic platform built on top of like i mean without going too crazy we've been running kafka for years now and i'd say by and large it's been pretty problem free would you agree yeah and there's times when we've made the mistakes and had to, you know, clean up after it. But, um, you know,
Starting point is 00:28:07 that's, you can't blame them for that. Right. What would you say outlaw? It just kind of runs, right? Um, I would say the problems that we have had would be self-induced.
Starting point is 00:29:11 I can't even think of a time when the client libraries, like the publishers and the subscribers, have ever had any problems. Like, it's truly been a very solid platform to build on top of. So yeah, I mean, I am trying to be very careful, because what I mean by the problems, you know, the mistakes of your own making: like, if you don't account for enough storage, right? You know, sure, you can definitely get yourself in a situation, but that's not its problem. That's, like, you didn't account for it correctly. Yeah. So configuration, uh, if you have critical data and you didn't have your retention periods set long enough on a topic, or if you weren't compacting data that was coming
Starting point is 00:29:33 into a topic, meaning that, you know, if you were doing, what are they called, CQRS systems or whatever, where it's like, uh, constant updates... it's not that, I can't even remember the pattern, the one where every record builds on top of each other, sort of like Git... Event sourcing. Event sourcing, yes. So if you were doing an event sourcing type thing, then that's fine, if you want to keep all of them,
Starting point is 00:30:08 but if you're trying to treat it more like a database thing, to where it always has the latest state, you can set compaction and do that. So there's all these configurations that you need to know about that can bite you in your application world. But for the most part, Kafka just chugs along. So, pretty cool. Um, and then the last thing here is they have a vast user community. They really do. They have one of the five most active projects at the Apache Software Foundation. That's pretty impressive. I don't know what the other four are, but it'd be interesting to find out.
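Circling back to those retention and compaction knobs for a second: they're just topic configs, and you can change them after the fact. A hedged sketch with the Java AdminClient, with the topic names assumed for illustration:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed cluster address

        try (AdminClient admin = AdminClient.create(props)) {
            // Event-sourcing style: keep every record, but only for 30 days
            ConfigResource eventLog = new ConfigResource(ConfigResource.Type.TOPIC, "order-events");
            AlterConfigOp longerRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(30L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);

            // Latest-state style: log compaction keeps only the newest record per key
            ConfigResource latestState = new ConfigResource(ConfigResource.Type.TOPIC, "order-state");
            AlterConfigOp compact = new AlterConfigOp(
                    new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET);

            admin.incrementalAlterConfigs(Map.of(
                    eventLog, List.of(longerRetention),
                    latestState, List.of(compact)
            )).all().get();
        }
    }
}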
Starting point is 00:30:23 HTTPD? Apache itself? That's the only one I can think of. Is that Apache? Isn't Nginx an Apache project? I don't know. I thought it was. Let's find out. Now everybody goes to the internet
Starting point is 00:30:23 machines. Is it Apache? I don't see Apache on the front page. Uh-oh. Uh-oh. This is not an Apache. What are the five? I thought Nginx replaced the Apache HTTP. That just goes to show you what I know.
Starting point is 00:30:45 235 episodes. We've reached the end of my knowledge. We're done. Man, I'm really going to have to look this up at some point. All right. Well, back to the show so we don't leave you guys hanging around. I found a nice list. Did you?
Starting point is 00:30:59 Yeah. So here's a couple things. So HTTP, which we used to just call Apache. Then there's Tomcat, Java server. You're using it in the background if you're doing Java, even if you don't know it. Apache Cassandra, we've talked about. Hadoop, Spark, we've talked about quite a bit. Flink, we've talked about quite a bit.
Starting point is 00:31:48 Wait, do we really think Cassandra is more active than Kafka? Oh no, I just found the 10. Oh, okay, the 10. Okay, got it. Yep. Uh, Kafka and Solr. Okay, I'm surprised Solr's still up there, with Elasticsearch taking over the search space pretty hardcore. Well, Solr is in Elasticsearch. It is. It's the core of it, right? Yep. Okay. Cool. Alright.
Starting point is 00:31:48 Well, yeah, leave us a review. Whatever. Come back. You know, it's just, it bugs you so much so I can't not stop now. I gotta keep going. I feel like it's like that
Starting point is 00:32:03 7-Minute Apps thing, like you're twitching. Like I feel like it's like that, you know, seven minute apps thing. Like you're twitching. Right. Exactly. Like, yeah, I think you need to try and stack overflow the review stars. That's,
Starting point is 00:32:12 that's what we need to see. Stack overflow. The review stars. Wait, how do you get more stars? Is it not a stack overflow? We want a, what's it called?
Starting point is 00:32:21 Oh, buffer. No, not a buffer. What's that? What's the number overflow that you stack? I can't even think of it. No, stack overflow is when you have too many things on the stack.
Starting point is 00:32:29 Yeah, arithmetic overflow. That's what we need. Thank you. Maybe we can go negative. No, why? Like, why? No, I had to go there. These guys, they're marketing geniuses, we're not.
Starting point is 00:32:49 Welcome to our world of 235 episodes of Marketing Genius. Real men of genius! I'm taking little pokes at Kafka for their marketing for their marketing uh materials in the meanwhile like what if you gave us a negative one review right yeah so maybe i shouldn't be talking smack poking holes in the ship while we're out at sea that's great it's my nature the one that was the story of the scorpion and the frog i can't help it what you know it's like one of Aesop's fables I think
Starting point is 00:33:25 or some fable like a there's like a frog and a scorpion and it was flooding and the scorpion's like yo can I get a ride and the frog's like
Starting point is 00:33:32 yeah hop on and halfway through the stream the scorpion stabs it in the back the frog's like let's do that for now we're both gonna die
Starting point is 00:33:40 and the scorpion's like well it's in my nature why'd you give me a ride why'd you trust me yeah you knew who I was beautiful and the scorpion's like, well, it's in my nature. Why'd you give me a ride? Why'd you trust me? You knew who I was. Beautiful.
Starting point is 00:33:50 Alright. Well, now that we got that business out of the way and you leave us all your stars until it overflows to negative, let's play a game of mental blocks with the real Man of Genius.
Starting point is 00:34:11 I can't remember what commercial that's from. Budweiser. Was it Budweiser? Yeah. Okay. Oh, man. That's been a long time ago. Here's to you, Mr. Real Man of Genius.
Starting point is 00:34:22 Yes. Okay. Okay. So. you mr real man a genius yes okay so um oh for those over the pond on the other side of the pond budweiser is a beer that you probably don't want to drink because you have better beers over there yeah we call those american beers all right so uh let's see. This is episode 235. So, Alan, you are up first, according to TechCo's trademark rules of engagement.
Starting point is 00:34:50 A very bad streak going at the moment. Yep. Yep. All right. And your categories are constitutional matters. Colorful, colorfully named people. Hell to the chef, marriage story, ironic man perms, and lastly, GI track. Wow.
Starting point is 00:35:20 Whatever the second category was. Colorfully named people? Yeah, let's do that one. Let's go four. Four. By some accounts, a character from Reservoir Dogs was the inspiration for this stage name of singer Alicia Moore. I don't know. Alicia Moore. I don't know.
Starting point is 00:35:51 Can you just, are you allowed to just say the last thing? Cause you could just pick a color. Blue. Joe for the steel. I know. Yeah. Nothing. Uh, you guys are both going to be upset. Pink. Oh, this deal? No, we got nothing.
Starting point is 00:36:06 You guys are both going to be upset. Pink. Unbelievable, man. Oh, man. Alright, Mr. Pink. Nobody wants to be Mr. Pink and that's why you're going to be Mr. Pink. Because I say you're Mr. Pink. I pick the names.
Starting point is 00:36:23 What? What? You remember that whole conversation like nobody wanted to're Mr. Pink. I picked the names. Okay. What? You don't, you don't remember that whole conversation? Like nobody wanted to be Mr. Pink. No. Okay.
Starting point is 00:36:33 He's just like moving on. I don't have time to educate these people on spoiler alert. In the nineties, there was a movie that came out. Okay. So, uh, Joe,
Starting point is 00:36:45 your topics are composer playlists, best picture winners in a nutshell, a year that ends in zero. We'll give you the, the events. You tell us what year the ends in zero. They all happened in. Okay. peninsulas
Starting point is 00:37:07 headquartered in and we'll need the name of the city or brown out brown is in quotes brown is in quotates because that means the word will appear in all the correct responses. I feel like these are easier just for you to help you out there, Joe, too. Like if that zero one, the year ends in zero, it also means it's divisible by 10.
Starting point is 00:37:34 So, Oh, look at that. I don't know if that probably five as well. Yeah. And also two, but I was trying to like keep things simple. This is tough. I feel like I could actually do a good job at three of these okay let's hear it um composer playlists only because i've been listening to that uh audio
Starting point is 00:37:52 thingy all right let's do it make them i'm not doing that one uh you're the ends in zero like there's only there's only so many uh-huh uh and then i feel like i could do headquarter didn't really give i'm gonna go with Years to End in Zero. I think I can do this. This is going to be embarrassing. What's your number? All the way. Oh, five.
Starting point is 00:38:15 He dialed it up. Wow, yeah, he did. Alright, here we go. The U.S. hockey team wins Olympic gold. Lin-manuel miranda is born oh geez how old are they yeah right look if you start getting if you get this you googled it period no and look i i don't know about the hockey but i I have seen Hamilton. So I'm going to go with 1980. That is correct.
Starting point is 00:38:53 Unbelievable, man. Golly. That was the year I was born, by the way. So, yeah. It's a shame. Unbelievable. They should have included the most important thing from that year, which is the hit single, Dancing on the Ceiling. Ooh.
Starting point is 00:39:11 What a feeling. Yeah. When you're dancing on the ceiling. All right, here we go. Alan, your topics are? Yep, five. Okay. Was that the column number or the row number that's how well that's what we're going for when i picked when i picked the column ah
Starting point is 00:40:25 English dialects and accents. Nailed it. I just, right there, I already got that. It's the only one I know. Under the hammer. In bookstores now. Do people go to a bookstore anymore? Maybe they mean digitally. Vitamins and minerals. Okay, you're gonna like this one: horse, hog, or dog. I'll give you the breeds and you tell me whether I'm talking about horses, hogs, or dogs. That one. And then the last category is 'Right You Are,' where 'you are' are the letters U and R. Each correct response will end with the letters U-R.
Starting point is 00:40:25 Ooh, I like that one too, but the horse hog and dog is pretty interesting. Let's do that one all in. Okay. Well, you're going to regret that. Great.
Starting point is 00:40:41 The large black and the large white. What? That is the answer. The large black and the large white. It's a hog. I don't know how you pulled that out. That has to be a guess. It was a guess.
Starting point is 00:41:01 I had a 33% chance of getting it right. Yeah, that's true. That's true. uh all right so we are into our final round of mental blocks category is play titles and you should go ahead and submit how much you care to wager to me privately i should remind you shouldn't have to but for some reason i have to i mean you know for some reason there it is all right let's see you said okay okay i got yours i got yours. I got yours. Okay. Here we go. We figured things out. We're getting there. This 1959 plays title was taken from a Langston Hughes poem that begins.
Starting point is 00:42:20 What happens to a dream deferred? You know this, Joe? I might. You're so ridiculous, man. Like, really? Say that... will you repeat that one more time for me there, Outlaw? Yeah. Joe, you are so ridiculous, man. You know this? Wait, is that not
Starting point is 00:42:40 the part you wanted me to repeat? I'm sorry. Rewind, rewind like 10 seconds for that. Oh, okay. I'm sorry. This 1959 play's title was taken from a Langston Hughes poem that begins, "What happens to a dream deferred?" And you can each, uh,
Starting point is 00:42:40 you know, send me your answer privately and did i get i don't i think i've gotten alan's yet oh he's typing it's funny because i was like just sitting here i was like okay just brainstorm all the plays you could think of and i started started listing in my brain. It was literally the first one I thought of. I don't – I'm not happy about this. Okay. Okay.
Starting point is 00:43:17 You know, the fun part is always trying to decide who to go first with. I mean, there's no question. Alan says he's not happy with his, so I kind of want to go with him first. You can do that. All right. So Alan says the title is What is a Broken Dream? And I do appreciate that you put that in question for him. There we go.
Starting point is 00:43:37 If he didn't, he loses. Oh, no. He loses. You wagered five points on that. I did. It was dangerous. I knew it. I wagered five points on that. I did. It was dangerous. I knew it. I knew you were going to do five.
Starting point is 00:44:38 And you have lost all five, because that is incorrect. Yes. Jay-Z, to make things interesting, also wagered five. I knew he was going to do that, too. His answer is Raisin in the Sun, and the correct answer is A Raisin in the Sun. What in the world, dude? Like, seriously, how much useless knowledge do you have stuffed away up there? That's ridiculous. You know how I know that? No. Because it was a movie we watched in drama class in eighth grade, and I always... I really liked it. I never forgot it. I think about it all the time. You need a hobby. The only thing I was worried about is I didn't know the name of the poet. I
Starting point is 00:45:31 knew that the name had been taken from a poem. That's so ridiculous, man. Yep. Hey, it's good, man. You know, I had the answers in front of me and I'm like, I don't know. Yeah, oh man, that's so frustrating. I think I'm on like a four-game losing streak, because all the obscure questions on the planet seem to come out and he knows them all. I happened to get plays for both of them, and I really don't know a lot about plays or musicals even. Like, when I said I saw Hamilton, I saw it on TV. You saw it? I didn't see it, though. I guess I need to stop watching NBA and NFL and start culturalizing myself somehow. Alan, have you not seen Hamilton? I don't even know what it is. What? Oh my god. Really? Yeah, what is it? You've never heard of the play? Like, it's been talked about for years now.
Starting point is 00:45:31 you do you even know who lin-manuel miranda is then do you even get that reference none oh really yeah if you had seen that then you would have guessed it's like either it's 1970 1980 or 1990 uh never never heard of it oh okay you should watch it it's on all right plus i'm gonna add it i'm gonna i got rid of disney plus so i'll have to find another way gotta go to broadway there we go i'll do broadway i love broadway well tickets when that came out though were like outrageous that was part of the reason why it made so much news because like when it first came out tickets were like outrage that was part of the reason why it made so much news because like when it first came out tickets were like ten thousand dollars a pop who we like they were super pricey to go see it when it very first came out oh man that leads me into if we're done with this
Starting point is 00:46:19 uh this tail whooping that i just received uh into, into boomer hour. So can I just finish with one thing though? Like if you've never heard of Hamilton, like it's, it's basically like a rap version of American history. Oh, it's like you, you hadn't, I can't,
Starting point is 00:46:36 I still can't believe you haven't heard. I'm trying to like give you hints here, man. I'm helping you. It says it came out based off a 2004 thing. I know dude, never heard of it. Was it 2004? I thought it was was based off a 2004 thing i know dude never heard of it was it 2004 i thought it was later based off based off something from 2004 the production came out off broadway in 2015
Starting point is 00:46:53 i'm on the wikipedia page it was huge yeah so i'm i'm good i will i will research this before our next get-together. Surprisingly good. I could be well-versed in it. All right, so Boomer Hour. So $10,000 tickets. It just sprang to mind. So I love the NBA playoffs. I've always loved the NBA playoffs.
Starting point is 00:47:20 I mean, you've got the best basketball players in the world battling, right? I hate watching the playoffs now with the advent of cell phones over the past 20 years. And you see people sitting on court seats on the court. Those tickets were probably 10 grand a piece or more. Oh, they're more. Yeah. If you're in the Knicks, they're way more. They're probably way more than that and two-thirds of the people are staring at the phone clicking around on i'm like are you
Starting point is 00:47:52 serious go give that seat to somebody sitting in the nosebleeds that actually cared to come to the game and watch it or get off your phone like it it drives me crazy to see like in in hd all the 4k glory ultra hd you know somebody holding their iphone they're staring at the iphone while while somebody's like you know hitting a game winning shot or something i'm like really really so that much money i just did a search for courtside hawks tickets and vivid seats says currently the hottest hawks tickets cost twenty five thousand four hundred twenty three dollars which could represent floor courtside and the hawks aren't even good guys that's my point that's my point like you're saying ten thousand dollars and i'm like no no a lot more than ten
Starting point is 00:48:41 thousand dollars i mean seriously it is it is disgusting to see how many people have that much money that are just like, ah, whatever. It doesn't matter. I mean, I'm just came here to chill because, you know, I want to be on camera, but I'm, I'm so busy chatting with my friends on Facebook or whatever that it is my mind boggling. So that's it for my boomer hour. You guys got anything? Yeah. mind mind-boggling so that's it for my boomer hour you guys got anything yeah i'm i'm looking at like there's a whole uh atlanta hawks subreddit that has a conversation on court side tickets i was trying to see if there was somebody in there but those are also like
Starting point is 00:49:17 seven years old you know seven year old conversation but so i mean depending on who they're playing i've actually seen where some of the tickets are like $1,500 or $2,000 for either right on the court or the row right behind. But, I mean, you're probably playing a team that's not really got any playoff hopes or anything. But still, $2,500 a ticket. You'd think that it'd be worth looking at the players on the court instead of that little screen that you probably stare at,
Starting point is 00:49:46 you know, 90% of your life. I just, I don't get it. I get, I gotta imagine though that a bulk of those tickets, uh, you know,
Starting point is 00:49:53 excluding like celebrity, you know, kind of things. Cause I remember like Jack, Jack Nicholson was famously like a big Lakers fan and he was always floor, you know, court side. But so excluding those i gotta imagine
Starting point is 00:50:06 that like just across the league in general that the vast majority of courtside tickets probably are corporate owned in order to be able to afford it you know yeah because it has to be because then it makes sense because from then the the corporation can kind of like make excuses for it to say like, okay, to justify it where they can say like, okay, well, you know, I could bring a customer here as part of a self pitch or whatever. Yeah, yeah, exactly. You know, and that's the crappy thing that that's how that's, you know, turned out, right. That so much of it is just, you know, corporations get to gobble it up. It's insanely unaffordable.
Starting point is 00:50:47 But yeah, I mean, it always kills me when I see people that have it made and they don't even seem to enjoy it or take advantage of it or whatever. It's not just basketball games. It's just the NBA playoffs are going on right now, and I see it every time I watch it. I'm like, that's so disgusting yeah i mean again you're not i bet you wouldn't see like a jack nicholson or like you know the super fans that you know that they're watching the game yeah yeah all right uh any other boomer you jay-z you got any boomer stuff no oh no not today i'll think of something no boomer stuff i never have boomer
Starting point is 00:51:25 stuff wait what are you talking about all right fair enough i'm i'm the boomer all right so back to the kafka platform event streaming outlaw already defined it uh he actually read this verbatim earlier so i'm gonna go over that one again. Oh, uh, whoops. It's all good. Uh, but what might be useful is all right. So we know that it's now this thing that sends data and all that kind of stuff.
Starting point is 00:51:56 And it's sort of real time or can be what, what does it use for? Like what are some real use cases? So I think there's some good ones here. Anybody want to hit some of these? What about processing payments and financial transactions in real time? Like it? Pretty good.
Starting point is 00:52:13 Well, I was thinking. Go ahead. No, go ahead. No, I don't want to. Well, I was thinking like IoT devices that you have out in the field. So like a bunch of sensors and whatnot that are all like just throwing in data. Yep, yep. Tracking automobiles and shipments in real time.
Starting point is 00:52:31 You know, Uber probably uses it behind the scenes. I haven't looked it up, but it would make a lot of sense. Yeah, totally. And connecting and sharing data from different divisions of the company. I'm surprised they didn't have the CDC case in there because I think syncing data between two databases seems like a really good use case yeah it's a big one i was actually going to say the same thing like just the whole notion of of getting data into a
Starting point is 00:52:55 central spot that other things can leverage is massive um and for the record, Uber does use Kafka. And as of 2021, they had one of the largest deployments of Kafka in the world. driver is as he's moving around it's got to be written somewhere and kafka is super fast and super durable so it's probably writing coordinates there every you know 30 seconds or something right to keep it up to date yeah the publisher i would think is just like putting all the data it's got in there and then the phone your consumer is actually there and kind of um grabbing whatever data it needs like it may not need the full history it just wants to know the current location of the car for example example. But you know, all that information can be handy after the fact. And it's also good for real time.
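That "just the current location" pattern is easy to sketch on the consumer side: key the events by driver ID and keep only the newest value per key. Everything here (topic, group, broker address) is made up for illustration.

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LatestLocationSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "latest-location-view");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, String> latestByDriver = new HashMap<>(); // driverId -> most recent "lat,lon"

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("driver-locations")); // hypothetical topic of keyed coordinate events
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // newer events for the same driver simply overwrite the older position
                    latestByDriver.put(record.key(), record.value());
                }
                System.out.println("current positions: " + latestByDriver);
            }
        }
    }
}

The full history still sits in the topic for anything that wants it after the fact; this consumer just doesn't care about it. A compacted topic is the broker-side version of the same idea.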
Starting point is 00:53:50 It's great. Well, not only, not only that, but think about like a Amazon deliveries. Have you ever seen your Amazon deliveries do the same thing? Yep. You know,
Starting point is 00:53:58 10 stops away. Yeah. Yeah. And you can like see where the driver is and everything. We, we live in such a spoiled world now where like we need immediate you know satisfaction and gratification like where's my thing that i bought you know i right i typed i typed on my computer that i wanted something
Starting point is 00:54:18 why is it not already here and so to to you know to satisfy you right like oh i can see the little pac-man moving on the screen now and it's almost closer oh speaking of which i know we we typically have long episodes but there's tons of good tips at the end of this one so even if you skip ahead or whatever you should you should at least check them out there's some good ones this time uh unlike those other episodes yeah all the all the past 234, they were terrible at tips, but these will be good today. Um, he's talking about yours,
Starting point is 00:54:48 Jay-Z. Yeah. That's fair. It's fair. Okay. So now of course, again, they've sort of branded themselves as a streaming platform.
Starting point is 00:54:58 So what does that mean? Why are they saying that? So they say that there's three key capabilities that make this thing a streaming platform. So, one, you can publish and subscribe to streams of events. And I think I even have the definition in here somewhere, but an event's nothing more than a record that's being written, right? So when they say event, it's just a record, conceptually. Okay. Yeah. Um, it can store these events durably and reliably. Again, we talked about that earlier. And you can
Starting point is 00:55:35 process these things in real time. Real time meaning, when it hits, then your thing is notified of it and can do something with it. Or you can even do it retrospectively, meaning, hey, I want to do some processing over data that's a month old. Now you can do that, right? So you can either do it on things that are coming in right now, or you can do it over the data that you've gathered over the past month, two months, three months, a year, whatever it is. Um, and then, so these are some of the details,
Starting point is 00:56:17 the nitty-gritty. You can deploy to bare metal, so, you know, on an actual computer or server that is set up, you can do it on virtual machines, or you can do it in containers, on-prem or in the cloud. And we've done both, with on-prem and in-the-cloud containers. And I think the three of us really like that particular approach. Yeah, I mean, I would definitely think that even if you were going to play around with it, regardless, I wouldn't say install it bare metal. I would just say use it in a container and experiment and learn and see what you like, what you don't like. There's no way I would recommend anybody, like, hey, go install Kafka on your laptop. No. And even, like, a container, like, you're going to be doing a Docker Compose at a minimum, because there's, like, you know, a broker, ZooKeeper, depending on your configuration. Like, you're going to end up with, like, five things running minimum by the time you're done.
Starting point is 00:57:11 Now, the one thing that does give me pause on the bare metal thing, that is interesting, you already brought it up earlier, Outlaw, is there are some things in Kubernetes, at least like GKE, Google's Kubernetes Engine, to where you can't, like, resize PVCs, and there's some things to that. It makes me wonder, and you sort of alluded to it, if doing things in VMs or even on bare metal would allow you to get around some of that stuff. And maybe there's even ways around it in Kubernetes that we're just not aware of. Well, I mean, how easy is it for you to reduce the size of your C drive right now on your... you're on a Windows machine right now, right? No, that's not any easier than it's going to be in Kubernetes. Well, not necessarily. If you do something like, uh, hot-swap drives, to where you have things that are, like, part of JBOD arrays or something, then maybe it's easier. I don't know. I don't know. I don't
Starting point is 00:57:55 something then maybe it's easier i don't know i don't know i don know. I can't think of any kind of file system where you can easily reduce the size. You can make it bigger. Yeah. You could make it bigger. But they actually said elastically change things earlier. So I'm open to learning what they have in store for me. Right. We should go learn about this.
Starting point is 00:58:21 But I mean, I remember years ago, not to go off too much of a tangent. I remember years ago, there were lots of debates when Docker was getting popular about, yeah, but do I really want to run my database in Docker because I'm going to lose performance because I have this abstraction on top of my OS and on top of my hardware. And there became a tipping point. And it was probably Kubernetes, to be honest, to it's like yeah i don't care that it's not as fast because now it's sort of a self-running healing maintaining sort of beast as opposed to running at bare metal but i think we've also we can't i don't think we had to sacrifice speed too though like i mean we're definitely in a tangent here unrelated to
Starting point is 00:59:02 kafka but i mean you don't have to give up speed to be in Kubernetes, right? Like depending on your node pool size, your node pool configuration, the type of disks that you're using, like there's a lot of variables that you have at your disposal. You do. I think, especially in the world of cloud nowadays, I think the scalability maybe outweighs the raw performance that you might get on a bare metal type thing, right? I mean, you could have a dedicated node pool just for like some kind of
Starting point is 00:59:32 database like uh you know whether it be like a postgres or a kafka or whatever you know whatever your storage engine is like you could have a whole dedicated node pool just for that that's super high performance CPU memory disk. Yeah, true. So it's pretty interesting. And then the last thing here, and this is kind of cool, is you can run these things self-managed, which we sort of do in terms of how we run Kafka. Or you can do it via various cloud providers, providers as managed services, which we had actually used confluent at one point in time, which they did a really good job making it to where you didn't even think about the platform,
Starting point is 01:00:14 right? You just had your producers and consumers and, or subscribers and life was great, right? Minus the credit card you might have hooked up to that. Oh, wait, I'm using that much space? Oops. The best thing was the UI. Yeah, there's a learning curve to Kafka for sure. Most of it has to do with the interface.
Starting point is 01:00:37 The lack of a UI. Right, none. All right, so how does Kafka work? And again, they're talking about the platform. It's not the actual storage system, so it's, it's evolved. Um, it's a distributed system that's composed of servers and clients that communicate using a highly performant TCP protocol. That's, that's it. It's pretty simple. That sounds like everything ever written, right? Yeah, everything ever. Yeah, exactly. Oh, you're the one that did it with the high performance TCP, for
Starting point is 01:01:07 oh well we weren't on the high performant one that was all we were missing yeah yours has the turbo button ours has the ultra button yeah so their servers they have a cluster of one or more servers they can run in multiple data centers or cloud regions or whatever right like it doesn't matter we already talked about the fact that they can scale out brokers this is an important terminology thing so kafka anytime you're talking about kafka storage you're talking about the brokers this is the storage layer this is where your data is written and you can have lots of them whenever we talk about Kafka, I'm only ever thinking of the brokers. Isn't that crazy?
Starting point is 01:01:46 I never think of Connect. And I think this is just the mindset of what is your use case of the type of work that you're doing at the time? And from the point of view of trying to size a Kafka cluster or maintain a Kafka cluster, For example, I'm just only ever thinking of like brokers. Well, maybe the zookeepers, cause that's part of that, you know, but I'm never thinking of connect and like the fact that we've even have connect. Cause I know it's coming up in our bullet point. It always, it just kind of bothers me. Cause I'm like, Oh, it's not Kafka. Talk about Kafka.
Starting point is 01:02:22 Yeah. It's the platform. Get on board. Hey, real quick, though, in the you mentioned Zookeeper, they've been talking about forever that they're getting rid of that. Have they yet? It's no longer the default configuration. Oh, interesting. OK. OK. But that's new. Like, yeah, they really slow played it.
Starting point is 01:02:40 So they announced it, made a big deal about it, and then like slowly spent the next five years rolling it out. But yeah, I mean, I think it's supposed to be ready for primetime, but it took a little while for like Strimzi to catch up too. So like Kafka would release something and then, you know, six months later, the Strimzi release would have it too. So between those two things, it's like seemingly always been unavailable. But now it's out there. It's ready. Go, go, go. I wonder if there's any major benefit to
Starting point is 01:03:06 using the non-Zookeeper version of it, because, I mean, Zookeeper is fairly hands off when it's set up and running. So, yeah, Zookeeper is great. There's a reason why everything bundles Zookeeper. It just does its thing. It's, it's been the standard for years, maybe a decade now. Um, all right, so yes, the next, the next bullet point that Outlaw was talking about is Kafka Connect servers. And these, we talked about it earlier, these are the servers that allow you to import and export data from
Starting point is 01:03:33 Kafka. Or, no, no, import data into Kafka from some other system and then shoot it out to another destination from your Kafka topics. Yeah, it's really nice. And it's got optional configurations for things like,
Starting point is 01:03:49 they call them SMTs, single message transforms. So if you say like, well, I'm getting this data out of this database, but I only care about these fields and I want to drop the others, or maybe I'm dealing with JSON and I want to pull, I want to extract this piece of JSON and move it here. And I want to rename this field to there. And now I want to reset and I want to re-snapshot and I only want to start tailing
Starting point is 01:04:08 the logs after yesterday. All those kind of things, all that stuff is just taken care of for you in configuration. It's really great. All declarative, like Outlaw mentioned, you can version control this stuff and see exactly what was in there. Kafka Connect is really nice. We've definitely had our struggles with it as well.
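For anyone curious what those single message transforms look like in practice, here's a rough sketch of a connector config. The transforms.* keys and the ReplaceField/ExtractField classes are standard Kafka Connect pieces, but the connector class and field names are made-up placeholders, and normally this would be JSON sent to the Connect REST API; it's shown as a Java map here just to keep all the examples in one language.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectorConfigSketch {
    public static void main(String[] args) {
        // These key/value pairs would normally be the JSON body you send to the
        // Kafka Connect REST API when creating or updating a connector.
        Map<String, String> config = new LinkedHashMap<>();
        // Hypothetical connector class; a real one also needs its own connection settings.
        config.put("connector.class", "com.example.SomeSourceConnector");
        // Chain two single message transforms (SMTs): drop fields we don't care about,
        // then pull one field up out of the value.
        config.put("transforms", "dropJunk,pullId");
        config.put("transforms.dropJunk.type", "org.apache.kafka.connect.transforms.ReplaceField$Value");
        // "exclude" is the newer name for this option; older Kafka versions called it "blacklist".
        config.put("transforms.dropJunk.exclude", "internal_notes,debug_blob");
        config.put("transforms.pullId.type", "org.apache.kafka.connect.transforms.ExtractField$Value");
        config.put("transforms.pullId.field", "customer_id");

        config.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```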
Starting point is 01:04:25 I definitely think, you know, when we talk about thousands of brokers running in clusters, they didn't say thousands of Kafka Connect running in a cluster. You know, I'm still kind of curious to see some stuff around that, but it's still really good. It is good. I will say we've had our struggles with it. Just like he said,
Starting point is 01:04:46 a lot of that has been not understanding configurations on when you set up your, your sinks and your, uh, what are they? Sinks and sources, sources, sources and sinks.
Starting point is 01:04:59 Like if you don't set the configuration, right, you can totally shoot yourself in the foot and you don't realize it until you go back. You're like, Oh, I got a hole in my foot. It was that. Oh, yeah. You know, there's tons and tons of configuration. I mean, it's complex.
Starting point is 01:05:13 If you think about it, you're hooking into a Postgres or a Cassandra or a file system or whatever, right? Like they all have different things that you have to be aware of. So it can be fairly complex, but it's a really, really powerful tool. Yeah, and there's tons of configurations, uh, via third parties for these, uh, connectors. You know, sometimes the third party is Apache, but it's still, like, if you're hooking up a sink and a source on Kafka Connect, you're dealing with two different types of documentation for two different things, and this one uses this terminology, this one uses that terminology, and, like, you'll find the settings, perfect, oh, latch ms, that's what I wanted, and you set it up, and then later you read in the documentation, like, oh, this setting doesn't work if you have
Starting point is 01:05:53 those other settings, you know, configured. And so it's just, uh, all the bad parts of working with third parties are going to be present here, you know. But, um, ultimately, uh, if you've seen some of the code that I wrote, you're very happy that I'm not writing this stuff by hand. Whatever, man. You know, the funny part is we've talked for, for quite some time about the fact that we, that we prefer declarative over imperative type approaches, but there is one really big downfall of declarative. And it's just what Jay-Z said. If you use this property, then this property doesn't work. And if you use that one,
Starting point is 01:06:28 so that's not always super apparent when you're setting up your declarative configurations and stuff, right? So that, that one's tough. You almost need a tool to generate things for you so that you get really clean declarative stuff. Sometimes.
Starting point is 01:06:51 I don't know that I understood what you meant as the, the downside to declarative. Like, I, I didn't get it. The documentation can sometimes be unclear of, if you choose this configuration up here, then these configurations down here won't be applied. Right. Oh, okay. I see what you're saying. And so it's almost like, it's almost like that problem of declarative over spaghetti code. Yeah. Over imperative every day of the week. But basically you mean, like, sometimes the documentation, and not, not just Kafka, this could be any documentation, sometimes a system's documentation doesn't necessarily specify the ramifications of picking this, that either that is now in use or that is no longer available,
Starting point is 01:07:32 blah, blah, blah. If you do this, then this one no longer applies. Or if you do this, the other one's overridden. It's a reading thing. I learned of a prime example of that this week. Are you sure?
Starting point is 01:07:46 Yeah. Where in Kubernetes, if you declare, and I shared this with Jay-Z the other day, if you declare a volume as a memory volume, right, then, and I never thought about it, hadn't crossed my mind, but whatever memory requests and limits you set, the container that's using that volume, the writes to that volume count against that memory limit. Oh,
Starting point is 01:08:18 okay. Which is, which makes sense, but it's also like, oh, but wait, that volume is a memory limit.
Starting point is 01:08:26 Right. Or, you know, that's the type. It's memory. But you weren't thinking about it, right? You're just thinking like tmpfs, you know, like high speed. But yeah, that's an example of where, like, oh, because you set this thing, now this other set of parameters are in play, and you have to be aware of that. And I tend to agree that I'll take declarative over imperative a lot. Every day? I don't, I don't know about every day, because there's some configurations that are so complex that you can't mentally map it out, right? Like you don't
Starting point is 01:08:57 even know what you did, you don't know what the state of your system is going to be. Then you write imperative code to create the declarative. There you go. There you go. Well, the, the funny part is, joking, well, that's the Maven versus Gradle world, right? And, and Maven was sort of the de facto Java compiling thing for a long time, and a lot of people started moving to Gradle because you have more power and control with the imperative approach. So, you know, whatever. They're both great, but, you know, I was just bringing it up as one of the things. Yeah, right. All right, so Kafka clusters are highly scalable and fault tolerant. We already said that.
Starting point is 01:09:36 You have clients. These are the things that allow you to read and write and process streams of events. Again, the processing streams of events is an application feature built on top of the underlying technology, and we already said they're available in a lot of languages. All right, I can't let this go. I gotta, like, you can loop with declarative, by the way. Okay. Oh, it's such a pain. It's done in SQL all the time, and we've already established that SQL is a declarative, uh, language. Or you can do it in YAML too. You can do it in YAML. I've seen it in YAML. It is disgusting. But that's not actually YAML, isn't that a Go feature? Yeah, that's the Go templating.
Starting point is 01:10:17 It's the Go templating, yeah. So, yeah, it's not really, yeah, it's, yeah, it's cheating. SQL is, SQL is an example of declarative because you're not telling the engine how to retrieve the pieces of data off of the disk or anything like that. You're saying, like, this is the data that I want. You go figure out the best way to get it. And that's why I have a million line SQL code sometimes. Because what would have been easy in imperative is really hard in the declarative world. That's true. All right, imagine if we lived in a world where all software had already been written and, and everything you ever needed to do was already written, and your only job as a quote
Starting point is 01:11:00 software engineer was to like configure different pieces of software to do new and unique things, right? Well, isn't that the new role of prompt engineers? It would basically be like Legos though, right? At that point, all the different Legos, each individual Lego shape would be the different software. Now you're just coming up with new ways to connect the blocks to build different things. Well, you know what's so funny about you saying that is that's kind of what they've tried to build cloud services
Starting point is 01:11:29 as. It is never that easy, right? I never would view cloud services like that. I mean, they're Lego blocks, though. AWS, I would definitely think they pitch it that way. The kind of marketing materials and even the, like, the logos and stuff they use kind of imply that you're like, oh, just plug this into that and glue that there with AWS Glue. They literally have that. I see where you're coming from with that, but yeah, no. It's never that easy. I think, well, I mean, we've talked about how all the good computer science was done in like the 1960s.
Starting point is 01:12:01 70s. Yeah. So maybe we're already there. Technology's just catching up to all of it now yep all right so we've been around since 1880 so yeah right there you go so we already mentioned that records or events are just records and even in the documentation they'll call them records now here's some important pieces of a record in a Kafka world. It has a key, it has a value, and it has an event timestamp. And then it can also have additional metadata. Now, I did personally, on a personal note, I wanted to share this. The key and value thing
Starting point is 01:12:39 will rock your world when you first start playing with Kafka applications, because a lot of the samples that you'll see online in the keys, they're just strings. And that's fine and dandy when you're looking at the examples and you're trying to make stuff work with it. But what you'll find out is like, oh, well, I'm just going to use a number here. Well, serializers and deserializers and all this stuff that happened in the client libraries can completely destroy you and waste days of your life while you're trying to figure it out. The important part I want to bring up here is a key can be anything. It can be an object. It can be a Java. If let's say that you're in a Java world, it can be a Java object. It can be an integer. It can be a string. It could be whatever you want it to be. The big thing is
Starting point is 01:13:25 you need to be able to serialize and deserialize that thing when it's coming in and out of Kafka. So same thing for a value. Value can be anything you want it to be, but you have to have a serializer and a deserializer set up so that it can read and write those data bits to and from Kafka. And that I think for me was the hardest thing for me to wrap my head around when all the samples online were showing strings for these things. But then when I go to use it, I'm like, well, I have a number in there. Like I have an integer. Why is it not working? And it's because you have to set up the proper ways to read and write it. So just wanted to say that in case anybody goes diving in and looking at this stuff.
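To make the serializer point concrete, here's a minimal sketch with the plain Java client. The broker address, topic, key, and value are all made up, but the serializer configs are the standard ones from the client library, and they're the part that bites you if they don't match the actual Java types you hand to the producer.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // The key below is an Integer, so the key serializer has to be one that
        // understands Integers; mismatched types fail at serialization time.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
        // The value is just a String here; real apps often plug in JSON or Avro serializers.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<Integer, String> producer = new KafkaProducer<>(props)) {
            // Key = 42 (an int, not the string "42"), value = some payload text.
            producer.send(new ProducerRecord<>("orders", 42, "{\"total\": 19.99}"));
            producer.flush();
        }
    }
}
```

A consumer reading this topic would need the matching IntegerDeserializer and StringDeserializer configured, which is usually where the "why is my number not working" pain shows up.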
Starting point is 01:14:08 Okay, so next. Somebody want to take this next one? I've talked to you. Well, we kind of talked about this already. There's two different users of Kafka. There's producers and consumers. And they're kind of like what you might think they are.
Starting point is 01:14:19 The producers are those things that are going to write to Kafka and the consumers are going to be those things that read from Kafka. And neither knows about each other. Like producers don't care about the consumers. Consumers don't care about the producers. They communicate. Basically, producers put stuff into the topics and consumers take them out. And you can have multiple producers producing to a single topic.
Starting point is 01:14:44 You can have multiple consumers reading from a single topic they can also be one in the same you could have a consumer that reads from one and writes to another but the point is is that yeah to jay-z's point they don't have to be related to each other they can be completely separate so and they're going back to our uber or amazon examples right? Like if my phone, if I'm watching all the drivers around me, uh, you know, when I'm using the Uber app, that's as a consumer and, you know, my app knows nothing of their producer app. That's writing up like, here's my current location. Right. Um, you know, that that's, that's it in a nutshell. Uh, yeah yeah i don't know if we want to get too far into it but yeah we'll leave it at that okay yep and oh uh one thing i wanted to mention is because i
Starting point is 01:15:34 mentioned that the butter analogy, that's kind of cool about, um, both producing and consuming, is like we tend to talk about things like as an event at a time, you know, like, uh, you swipe your badge or something and that's an event. But the way producers and consumers work kind of underneath the hood is they've got some, like, some smart logic in there about kind of batching things up, and it's all configurable, of course. So they can do, uh, things like a linger, so say, like, either every hundred events I'll send, or maybe every 10 milliseconds I'll send, or I'll send one at a time if you want. You know, that's all, it's all kind of configurable. But that's sort of the, the idea of, like, kind of taking, like, something, something like either an uncut loaf of bread
Starting point is 01:16:08 or a butter and deciding where you want to put the knife down and kind of slice off the chunk you want to take comes in. I want butter. Thanks. I always want butter. So that's one thing that's actually really important to note. Kind of what you just said there is, you know, hey, when I hit 100, send that in a batch or every 100 milliseconds, right? Or, you know, the least of the two. So if you only got 50 in 100 milliseconds, well, go ahead and send them to me because 100 milliseconds is hit.
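Those knobs map onto real producer settings. Here's a hedged sketch with the Java client, with made-up tuning numbers rather than recommendations, and note the size trigger is actually in bytes rather than an event count:

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class BatchingConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Wait up to 10 ms for more records before shipping a batch
        // (the "every 10 milliseconds I'll send" part of the conversation).
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        // ...or ship sooner once a batch reaches ~32 KB, whichever happens first.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);
        // acks=all means the broker only acknowledges once the in-sync replicas
        // have the record, which is the durability trade-off being described.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```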
Starting point is 01:16:41 And some people might be like, well, why don't I send them, you know, one at a time every time, just as soon as they come in? Well, because it's expensive, right? Like, especially if you have TLS set up between your client or your producer and, and Kafka, you know, that handshake takes time creating the one record, serializing that one record, send it out, takes time. Whereas if it can bundle it all up into one batch and throw it out at once, it can be way more efficient. So believe it or not, you can get, I mean, through these super micro batches that are going out, you can get way more throughput if you're doing that versus, you know, just sending everyone off as they come. And imagine too, like I configure the broker to be really fault tolerant. Like I don't want to miss a single thing. So what I do is I configure the broker so that basically it only says I have accepted this message
Starting point is 01:17:28 if everywhere, like all the replicas that I'm in charge of have my data. And that takes time, right? So you imagine if like we're doing that for every single little event, these things are coming in, you know, every couple of milliseconds.
Starting point is 01:17:41 And then the broker saying, stop. Okay, I got it. Stop. Okay, I got it. Rather than just saying, like, okay, look, here's my latest hundred, and the broker says, stop, okay, I got all hundred. Right. Yeah, I kind of like that one ack, that one ack or whatever. Yeah, like, I know that I'm probably, I'm
Starting point is 01:17:58 probably really bad about this where i like i'll equate one thing to another like you know like you learn one thing you're like okay you. You kind of start making analogies. So I kind of think of it as like, well, why would you want to do the batching that you were talking about? In the, like, do I send either the amount of time has passed or let me send 50. And I kind of think of it in terms of like well ultimately like if you were trying to make sure that your your frame your network frames are like like i want to like get the maximum value out of that so if i can fully pack that thing then that's to my advantage to
Starting point is 01:18:38 then send that one frame along rather than you know just partials here and there so i don't know well that's part of it the the other is kind of what Joe just said, right? Like, imagine that you get a hundred tickets in and I'm like, Hey Joe, I sent it to you. And then he's like, all right, I got it. All right. Number two to you. Got it. Number three. Got it. Right. Like that's a whole bunch of network chatter back and forth for that. Whereas if you just send a hundred at a time, it's like, I got them. Right. Like you just, I say it once. So, you know, it's easy to equate that stuff. And I think, I think that's how all our brains work, right? Well, we gotta, we gotta somehow pack it into something that we've
Starting point is 01:19:13 seen before. I will say too, well, this is where I'm, I was trying to not go too crazy into the producer, uh, producer consumer discussion, but I will say that the one exercise in Kafka that I hadn't experienced before with other things was in regards to sizing it, like just needing to be more aware of the clients and producers and the numbers the numbers that you like thought you might have of each because the whole way that you size a kafka cluster normally when you think about like sizing like a like if you were going to size a database server you're like well how much data
Starting point is 01:20:00 am i going to store on disk but in kafka it, well, how much data do you plan to put out through that network interface, right? Are you going to have like a single interface and then, you know, like because I'm excluded like bonded interfaces, but are you going to have like that one single interface? Like how much data are you going to be able to push in and out of that? And then that's going to like drive, you know, your, your, the size of your cluster or, you know, you know, be a big part of it, that discussion.
Starting point is 01:20:32 Yeah. I mean, for, for people trying to follow along with what he's saying, if you have a NIC that has a one gigabit connection, and you have 10 consumers on that connection, let's just say it's all distributed evenly, you've now got, you know, I said gigabit,
Starting point is 01:20:58 but, you know, it's divided by 10. So each one can only do, you know, roughly 100 megabits. And so as you add more consumers to it, then you're dividing that up more. So you're slowing down the consumers. Uh, and so if you want to increase that speed, then you may have to have more NIC throughput or whatever. So that's, that's exactly what he's saying. It's, it's totally different than, than database storage or file storage or something like that. Which is kind of exactly where we started it though. When, when there was the comment that Jay-Z made about like writing it straight off of the
Starting point is 01:21:28 NIC and bypassing RAM and going straight to the disk, you know, like that's really, like, if you want to size Kafka, it's all about that network, uh, I/O. And, and if you were going to go, like, on bare metal, then that's where, like, you know, bonded NICs, you know, where you could, like, have four, like, physical interfaces that act as one, you know, kind of thing. But yeah, and that's where, if you're running the stuff in the cloud, they can actually make it a lot easier for you, because you don't have to worry about that stuff. You just say, hey, I need this, and they'll just charge you happily for it, and you get it. So we need another credit card. Right, exactly, that first one doesn't have enough capacity. Uh, all right, so the, the last bit we're going to talk about here are topics, and, uh, I don't know, Jay-Z, you want to take this one? Uh, yeah, let's do it. Uh, so topics are
Starting point is 01:22:21 basically events, uh, groups of events, and they're kind of like folders on a file system. Or we've mentioned the term queue several times. It's basically the analog of a queue. But I think the main difference between topics and a queue is the intention. Like a queue, if you think about a grocery store or something like that, it's kind of got that idea of, like, one and done. You process the item and it goes, and so it's got that connotation. But, uh, topics are a little bit different. Like we
Starting point is 01:22:48 mentioned you can have multiple consumers and each can do their own thing so like maybe alan comes and reads from the topic and then michael comes and reads and then i come read the topic and so it just kind of makes a little bit more sense and you often hear the word topic used when we're talking about things like PubSub, for example, over just a normal queue. And like we said, they're multi-producer and multi-subscriber. And there could be zero, one, or many producers or subscribers to a topic that read from the topic respectively. It seems kind of goofy to have zero, uh there are times and i can imagine um we've done a couple things and like connect uses a specific topic for storing offsets which is a whole other kind of topic but basically it's like it's what it uses internally to kind of keep track of uh
Starting point is 01:23:38 the state of various producers and consumers and uh all that stuff is stored internally in Kafka in a topic. And even though it's stored in a topic, that thing almost serves more like a, just a way of just kind of persistent storage. So it's kind of an interesting case where like you wouldn't normally have a consumer for that, but it still makes sense because it's used internally. Hey, hold,
Starting point is 01:24:00 hold real quick. So when they say that it's like a folder, it actually is a folder, right? Like, yeah, if you're going to go look on a Kafka broker, hold real quick so when they say that it's like a folder it actually is a folder right like yeah if you're going to go look on a Kafka broker they have directories where they write the data and and so you'll have a topic directory and inside that you'll see a bunch of files that are all the events that were written and and depending on how the data is written across the brokers you can go on to another broker and you're going to see that same topic folder there with that same thing so when they say it's the equivalent of files and
Starting point is 01:24:29 folders it actually is files and folders on the different systems written to disk yeah that's why we've been in situations where we've run out of disk space we're like well what's the problem here what kind of data then we got like shell in and i i imagine the tools have gotten better now but back in back in the day even just a few years ago you had a shell in like df or dh or whatever and like actually look at those folders on this like oh this is the one that's extra big we got to delete this there's better ways to do all that now and there's probably better ways back then but it's still interesting just to kind of see it like how it's stored underneath yeah um but there's not one file per event they're stored like butter hey what were you gonna say well i was gonna say like another important thing to call out here though the reason
Starting point is 01:25:13 why okay so like it is written on disk in uh directories as follows and an an important thing to keep in mind like why you wouldn't make the analogy of like a table for example or like a document you don't want to think about schema necessarily because one thing that's kind of unique to this is that you can have events written to the same topic that have their own schema that have different schemas. But as part of that event, it says what schema it is. And there can be another topic that has the schemas to read from that so that other consumers can know how to deserialize those messages. And I mean, we're getting kind of like in the weeds, that's way ahead. But you know, the,
Starting point is 01:26:06 the important thing here is that like, don't, don't go after it with something in your mind that's super structured, like a table where everything in that, in that table has the same layout, even if there's some nulls, because that's not going to be the case in Kafka. Yeah.
Starting point is 01:26:24 It's just a stack of data. And what he's talking about with the schema is you don't have to do. You don't have to. Yeah, you could totally just write a bunch of JSON documents or text documents or whatever you wanted into a topic. But what he was talking about is a feature of the Kafka platform called Schema Registry that allows each event or record to have its own schema that can be looked up, referenced, and used. So again, that is in the weeds, but it is a feature that's available
Starting point is 01:26:52 on the platform as it comes, which is pretty amazing. Also can be a pain in the butt. For sure. And like we mentioned, these events can be read many times as necessary, and they're not deleted by the process that does the reading. As far as I know, there's no way for a consumer to directly say like, okay, I'm done with that. Delete it. Deleting is basically handled out of band. You can configure like retention policy that says only keep the last hundred, only keep the last day's worth. And then there's an out of band process that deals with kind of cleaning that stuff up.
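That retention policy is just per-topic configuration. Here's a hedged sketch using the Java AdminClient; the topic name, partition and replica counts, and the seven-day window are arbitrary values for illustration:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class CreateTopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("badge-swipes", 6, (short) 3)
                    // Keep events for roughly 7 days; a background cleanup process on the
                    // brokers removes older log segments. Consumers never delete anything.
                    .configs(Map.of("retention.ms",
                            String.valueOf(TimeUnit.DAYS.toMillis(7))));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```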
Starting point is 01:27:22 And that's part of the magic too. Deleting is handled per topic. We mentioned that too. You can cleaning that stuff up. And that's a part of the magic too. Deleting is handled per topic. We mentioned that too. You could do it off disk. It's going to mess some things up, but that's okay. It'll survive. And one other point to mention is
Starting point is 01:27:36 the performance of Kafka is not dependent on the amount of data, nor the duration of time the data is stored. So storing for longer periods of time is not a problem. And that's a really important distinction. And that's really ties into how scalable things are to say, like whether you have a hundred or a thousand, like you're still going to be able to get, you know,
Starting point is 01:27:55 or a thousand, thousand, thousand, you're still going to be able to get events just as quickly. Now there is one caveat to what you were saying about not being able to delete things out of band. There's one thing I can think of that can change that, and that's when you tombstone a record. If you have compaction turned on on a topic, I believe you can pass a null in for a value with the proper key, and it'll basically set it up for compaction the next time it does it, which means it should technically get rid of it. Compaction is cool. Yeah, it's an amazing thing, but I think that's the –
Starting point is 01:28:31 you can't technically – it's not like a delete from a database, right? Like I delete that record, it's gone. It's not like that. You can – again, it's called tombstoning a record, and the next time the compaction process runs, it should get rid of it out of the queue so yeah and compaction is cool so i i think we've talked about it probably before but basically the idea behind compaction is though it's like sometimes you might have multiple records for a
Starting point is 01:28:54 key for example if you're like thinking of a person in a theme park and every time they go on a ride they swipe their badge or their watch or whatever and it keeps track of them there and you could have all that stuff in a topic great but now if we're worried about efficiency we might want to compact that topic which basically goes by and says we only care about the last place we saw the person so as we're cleaning up let's go ahead and collapse all those keys and just keep the latest and like alan said if you tombstone that one you say basically don't keep it just get rid of it and so that's a way of kind of removing that permanence and we talked about how you can use Kafka to permanently store data.
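And the tombstone being described really is just a record with a real key and a null value. A minimal sketch, assuming a topic that's been configured with cleanup.policy=compact (the topic and key names here are made up):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value with a real key is the tombstone; once compaction runs on a
            // topic with cleanup.policy=compact, earlier values for this key (and
            // eventually the tombstone itself) get cleaned up.
            producer.send(new ProducerRecord<>("rider-locations", "rider-42", null));
            producer.flush();
        }
    }
}
```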
Starting point is 01:29:27 We talked about those offsets too that are stored in Kafka Connect. That's a great example of something where you want to keep the latest offset. You don't really care about the oldest data, but you also want to keep the data around here forever. So we're going to compact that topic. So we only basically ensure that the latest key and data for that key is, is stored long-term. All right. So I have a bonus question for you guys. Uh, so I'll start with you outlaw cause you're at the top here. What, what would you say, who should use Kafka and why? Whoa.
Starting point is 01:30:07 Okay. Well, you're fresh out of college. I'm looking for a job. Want something to paint on your resume. Yeah, good, good. Now, I would think that it's definitely like a situation where you have a lot of data that's going to be coming in uh rapid fire that you want to be able to do things then and there with it you you can't necessarily afford to wait to do things like you know oh i can i can do things like once a night or you know something like that like if you start finding yourself having to do things,
Starting point is 01:30:46 some kind of process many times a day, then you probably want to be able to react to the data as it's coming in. I guess the react would be a good word for it. Maybe I should trademark that. I should create a JavaScript framework for it. Oh, there you go. That's novel. What about you, Jay-Z? I would say
Starting point is 01:31:09 anywhere that you need that publisher subscriber model distributed across multiple computers or nodes, I think that it's the premier solution for that. It's the most popular one. Even the cloud services are going to be using that stuff behind the scenes for things like Kinesis and whatever Azure calls it.
Starting point is 01:31:27 So I think that would be the prime thing to consider. Also syncing data, stuff like that, multiple consumers. I just don't like seeing pub sub become part of a Kafka conversation, because it's so easy to confuse Kafka then with pub sub technologies that solve a different purpose. Even though, I mean, yeah, you have subscribers and publishers, whatever, but, you know, I mean, I don't know. Well, so my thinking is that it's kind of fair, because there are technologies that are built specifically for PubSub, like Rabbit is one of them, like RabbitMQ.
Starting point is 01:32:11 And so I do think it's fair to say if you're considering Rabbit, you should probably be considering Kafka too, just to see if it aligns better with your use cases than Rabbit does. So I like both of those, and I'll tack on mine because I don't think either way you hit on this one is if you're trying to share data between different, we even mentioned in one of the examples, if you're trying to share data between different sections of an organization or, or different parts of an organization or multiple organizations could use the same data, that is a great way to do it because it's almost like your central hub of where people can go to get information. So Outlaw mentioned SSIS. There are so many companies out there that are invested in SQL Server that have SSIS packages set up to import data from all kinds of different places to bring it into a system that
Starting point is 01:33:03 they can either use as a data warehouse or whatever. Well, Kafka can be that central location. And it's truly amazing because it's set up for crazy fast throughput, right? So, oh, I told accounting over here that I have customer, you know, purchase type data in this Kafka topic. they need it, they can get it, right? So, you know, between the data streaming, like what Outlaw said, between the publisher subscriber role that Jay-Z said, and just the sort of the backbone of where data can go that can be shared. I think that's a really good explanation of why you might want this in your organization. So, uh, in the resources we like, I have, I have a link to the powered by thing, which they've got a lot of logos on there. Like I said,
Starting point is 01:33:55 usually corporations don't easily give that stuff up, but there's a ton of logos on this page. So it's kind of fun to see. Yeah. There's a few of them on there. So, all right with that we head into alan's favorite portion of the show i was like i wonder if i can catch him all right well i've got three uh recommendations for things i've ever used but uh all look very cool uh have y'all heard of uh flipper zero no yeah i can't believe we've never talked about this it helps me go to sleep at night it does i mean it might so uh isn't this
Starting point is 01:34:39 no well yeah yes oh yeah yeah we have talked about this okay have i given it as a tip before wait no we haven't talked about this one we've talked about another one that looks like this one i think you may be thinking of the play date i don't think we've talked about the we yeah we have not talked about the flipper this is new this is truly flipper the dolphin flipper yeah i for some reason i've been getting ads from lately like i had a friend who showed it to me a couple years ago and i won't tell you what he's doing with it but uh let me just tell you a little bit this thing so flipper zero is a multi-functional interactive device mixed with a tamagotchi alan you probably don't know what tamagotchi is never heard of it have you ever
Starting point is 01:35:25 seen... Back in, why am I not surprised, boomer hour. People used to walk around, you'd have like little keychain devices, with like a thing with like an egg on it, and you could play with the egg or feed the egg, and eventually it would grow up into an animal. People would... You lost? Okay, it was like, you could buy it at like a toy store, where kids would have it and adults would have it. Well, anyway, this is kind of like that. So this is, um, an interaction device, right? How, how generic does that sound? Uh, it basically has a bunch of I/O, uh, capabilities. So RFID reader, uh, also producer, NFC, uh, near field communications, GPIO, whatever that is, Bluetooth, USB, um, and it's got a bunch of pins also, like you would have on an Arduino. So you could hook up like a five volt line and kind of like have it light up some LEDs
Starting point is 01:36:10 or hook up a button or something to it. But the things people do with it are like maybe they would hit the scan button and scan the low gigahertz spectrum. And if someone presses like a garage door opener nearby it would capture that or a key fob or your badge if you're badging into office so they're stealing hyundais with this and teslas yeah awesome okay apparently there was a lot of fun things they were doing with teslas that are like turn people's radios on or off or like do weird stuff with that um but it's it's basically like a little hacking device.
Starting point is 01:36:46 And it's got this Tamagotchi angle, too. It's got a dolphin. And so whenever you hack a device or you scan and get a Wi-Fi password or something, you would kind of level up your little pet, feed it a treat or whatever. And eventually the dolphin kind of grows like laser wings and who knows what else. So as your criminal empire expands,
Starting point is 01:37:06 your dolphin grows. Yeah. It's awesome. I will say it's, um, I'm sure there are lots of legitimate uses for it, but I mean, if you just Google around for five minutes,
Starting point is 01:37:16 you're going to see almost like it's like 80% people using it for like crap, like mischief. Uh, some of which is the kind of nice though, like, uh, people talking about being like at a, like a tavern, like a bar and eating and like using it to turn on tvs uh you know stuff
Starting point is 01:37:30 like that. It's got a bunch of built-in, uh, information for the common devices. Okay, that was the one that I, that I was like, okay, it reminded me, going back in the day, of where, like, uh, infrared used to be like out of the box on a lot of devices. So the days of Palm Pilots or the iPAQs, remember those? Going back to those. And because you just had the infrared, you're like, oh, I don't like that channel. You're at the bar and you're like, let me change the channel myself.
Starting point is 01:38:02 Right? Yeah. Stupid little stuff like that he's not kidding though like i mean he just kind of blew through all the technologies built into it but basically every type of communication protocol hardware platform thing out there this has it like all of them yep it's all open source we do a lot of cool stuff i saw people in the comments they're like i looked up like i actually got a list of seven cool things you can do. But I looked at the comments and people were saying like, oh, the local garage at my bar.
Starting point is 01:38:32 So they can basically exit and enter the parking garage without paying for it. Wow. So dumb stuff like that. So it's just like this little kind of device. And because it's got this stupid dolphin on it, it just encourages you to kind of bust it out every day and see what's around you and see if you can figure out something cool to do. Now, I do have this note from legal that requires that we say that Coding Blocks is not responsible for any, uh, laws
Starting point is 01:38:57 that you may break, that we encourage you to abide by all rules and regulations and laws of your, uh, municipality, blah blah blah, we do not sponsor, endorse, condone, blah blah blah blah blah. Yes. So says Outlaw. Yes. This is actually for improving security, though. One of the things you can do is, uh, bypass most, uh, Sentry Safe security. So if you got like a digital safe, you can kind of crack into it. Serious? No, no, Sentry Safe, I thought that was the company that, like, advertises, like, monitoring, home monitoring. Uh, let's see. Uh, so the one that I'm talking about is literally safes. Uh, might be the same. Bypassing the Sentry Safe. Sure enough. I'm buying one of these things. This
Starting point is 01:39:40 is so cool the stuff you do it yeah all right I'd never heard of it. I'm happy that you've introduced me to it. I realized after, though, it was a conversation with my son because this came up on Reddit earlier in the year, and he sent it to me. I thought it was neat. You can crash certain versions of iPhones. Wow, dude. It's pretty fun. Without a flipper, though, right? Yep. Nice. So that's one of my three tips uh haven't used it
Starting point is 01:40:09 had a friend doing some interesting stuff with it a couple years ago and I just happened to remember it. Uh, also, you know, just for fun, I looked up to see if there is a Kafka TUI, uh, text, or is it TUI? Yeah, text user interface. And there is this one called Kaskade, which looks really cool. I'm a big fan of TUIs in general, and this one looks like it follows, like, that same mold. So, uh, I mean, this looks very similar to the ones I've used before. So what I like about that is that you pop it open and it's like, oh, here's a list of your topics you could browse. You don't have to remember the name of it. If I just, uh, click on one of those, or I, I arrow to it, it's going to show me information about it. I can take various actions on it. This is so much
Starting point is 01:40:48 preferable to me than what's built into Kafka, which is basically a bunch of utilities that they throw in a bin directory. You have to shell into a broker box too. Then you have to pass 8 million flags for authentication and stuff. I'm much more a fan of using a tool like this.
Starting point is 01:41:04 We've talked about kafkactl before, or kcat, I think, whatever, you know, both of those. Uh, and those both have a paradigm where you have, like, a config file. It's like, this is my information, now I do not have to pass that with every command. I think kafkactl is now kcat. Oh, is it really? Wait, no, no, I'm thinking of kafkacat. Yeah, kafkacat is now kcat. Sorry. Okay. Yeah, I think it's still kafkactl, right? kafkactl is better. And there's the one, um, uh, guy, I forget his last name, his first name was Guy, and did a Kafka Connect ctl too. Really good. Anything's better than the Kafka shell scripts that come out of the box.
Starting point is 01:41:47 Oh, they're terrible. Yeah. They were better than nothing. Yeah. Marginally. But society has improved. Yes. I'll just do one example.
Starting point is 01:42:02 High level, they get everything done whatever it's fine but like if you want to like change some internal like partitions of like where uh topics are stored then the way it works is it's got this really cumbersome kind of paradigm where you like run one command to get all of them and then you can like change the file and then you save it again and you rerun again with the file to apply it. And then you go back and check and see if it's done. Just like super awkward. Not, it's just not designed for humans to use it.
Starting point is 01:42:30 Unfortunately. Well, that's going to make my tip of the week awkward. Great. I got one more of this one. Uh, thank you. Uh,
Starting point is 01:42:40 micro, uh, microStudio is a web-based integrated development environment for making simple games. And it's open source. And it's just like the GB Studio that I mentioned last time. What I like about it is that it's just got everything in one spot. And I like that I go to a website and say, create a new project, give it a title. And it's got me set up. I click on a tab to make the sprites. I click on a tab to make the sounds, one for the maps, one for the music. It just keeps me all in one spot. And it looks like it's based on, uh, well, I don't know if that's true
Starting point is 01:43:09 or not it looks like it might be influenced at least by vs code so it's got kind of a similar layout so it might be using some of that stuff underneath the covers i'm not sure but i don't know it just seems really nice and it's got a nice gallery of games that people made with it and being published for platforms like windows or web. That's really cool looking in. Oh, I'm in. You're in.
Starting point is 01:43:33 Got it. All right. So now I've got a handful of tips here as well. So the first one I found while I was doing stuff tonight. So I always forget that I start up Edge for whatever reason. And so when I go to search, my search results look odd to me, and it's because they're not Google. Did you Bing it?
Starting point is 01:43:53 I did. I Binged it. So Bing it. I Binged it. So when I, when I did it, they have gotten pretty aggressive with their AI stuff. And you can,
Starting point is 01:44:07 if you just go to Bing and you search for anything, you'll end up seeing the thing pop up at the top. But I also have a link straight to this thing. That's pretty cool. They have, so Copilot's their regular thing, right?
Starting point is 01:44:18 So you search for anything and it'll, it'll start doing the interactive like search prompt thing, which is cool. But then they've got a designer that you can click there to where you can just say, Hey, I want, you know, world with flamingos and whatever. And it'll make you a picture really cool. They have a vacation planner, which I was super excited about because I want to plan a vacation and I don't
Starting point is 01:44:37 really want to actually plan the vacation. If this can do it for me, that'd be amazing. So that's cool. They have a cooking assistant, and they also have a fitness trainer. So I haven't used a bunch of these, but, like, Microsoft is pushing hard getting more AI available to people freely that are targeting specific things that they know people care about, and this is really cool to me. So I thought I'd share that and, you know, hopefully you guys play with it, see something. So go do racing around the world on this microStudio thing, huh? Okay, cool. I'm just kidding.
Starting point is 01:45:11 Go ahead. All right. So the next thing that one, that one I thought was pretty cool. Just kind of stumbled upon it. The next one, this was something that I ran into that is sort of an interesting thing. So the three of us have talked about the fact that we work in Google Cloud quite a bit. And there are ways that you can scale your Kubernetes pods. And the simple, easy one that a lot of people end up doing with their policies is, hey, if CPU exceeds 80%, then add me a new pod, right? Or if memory gets eaten up by more than X amount
Starting point is 01:45:48 of percent, then, then scale out. And, and, you know, conversely, if, if CPU drops below 10% or something, then go ahead and scale down. So it's a good way to save money and, you know, I guess the earth, but not, but not frying things when you don't need to. So, however, there are things that may be more interesting to your application. So a good use case would be PubSub, right? You have a PubSub, which is kind of apropos for what we're talking about with Kafka here. You have messages coming into a queue and you have an application that's out there on in Kubernetes, it's running that, you know, it, it processes these messages that come in. And so maybe a better metric to use for
Starting point is 01:46:34 whether or not you need to scale your application is, hey, am I falling behind? Right? Like, are there way more messages coming in than I'm processing? So maybe a metric would be, you know, if the incoming is 100 per second and my processing is, you know, 50% less than that, right? Then that means I'm falling behind by half, you know, every second. Then maybe I want to scale that. Well, here's the problem. In Google, and from what I looked up, this is done differently in Google, AWS, and Azure. In Google, you can sort of create groups of resources, we'll call them, and the way that you do it in Google is by creating projects, right? So maybe you have one set of resources that are managed in project A, and then you have another set of resources. So let's just say that you have PubSub in project A and you have your Google cluster, your GKE cluster, in project B. Well, in order to make project B, your Kubernetes pods, scale, they need access to project A's metrics, right? Those pub sub metrics. Well, there's a way to do that in Google. That's really interesting. It's called scoping
Starting point is 01:47:51 your monitoring. So they have these metric scopes and it allows you to basically say, Hey, I want this project to have access to this project's metrics. And you just add a metric scope. So it's like a parent child relationship. And it's really awesome because one of the things about Google, Google cloud is the metrics that you get from their own internal infrastructure is free, right? So if you're leveraging PubSub, then all the metrics for PubSub are available to use for you for free. You know, the number of messages coming in, the messages per second, all that kind of stuff. Well, if you can share that metric scope
Starting point is 01:48:29 with another project, again, it's still free because you're not creating any custom metrics. You're not pushing metrics in. You're just allowing one system to use it. So this is a really nice way to be able to bridge your projects or your resource boundaries to where they can share metrics to be able to do things like scaling. And that's kind of a big deal. So that's pretty cool. Well, even though
Starting point is 01:48:51 I don't get to play in Azure and AWS as much nowadays, I thought that I'd share what I found on similar type functionality for those. Azure has what's called resource groups or subscriptions, and I have a link there to be able to do the same type thing to where you can share metrics across these resource groups or these, um, what did I call it? Subscriptions, so that you can do the same type thing, right? It's pretty awesome. Now, what I found out there, Azure and GCP are very similar in their approaches. What I found out was, AWS being probably the first to the game, they learned about these things a little bit later. And so theirs isn't quite
Starting point is 01:49:33 as clear cut, right? Like, so I said that in, in Google, you can log in with one credential and you can set up multiple projects. One project might be your GKE stuff. Another one might be your storage. Another one might be PubSub, whatever. And that's a way for you to kind of sort of isolate permissions and things around those. AWS doesn't really have that notion. For the most part, from what I could see is they expect you to create different accounts to log in with. And so one account is how you group your resources, another account. So you need Alan at whatever, and then Alan to whatever, in order to get those type of
Starting point is 01:50:11 separations, those, those resource boundaries. Uh, but they do have a way to be able to share that across it. And so I have a link to that as well, so that you could do Amazon cloud watch, uh, to where it's shared amongst different logins. So the struggle of being first. It really is. I mean, and it's no knock on AWS. Again, this is stuff that Google and Azure were able to do because they looked and said, oh, these are the problems that people are running into. Let us create these ways to solve those.
Starting point is 01:50:43 So super awesome tip, especially if you need it for things like scaling, or even if you need it for dashboards to look at what's going on with your infrastructure or whatever, right? Really good way to do this. All right. So I mentioned, God, it seems like forever ago, and I still haven't really gotten too far into my redoing my wifi here, but there were, I have made some steps. And one of the things I wanted to share is I think a lot of people, when they go to set up wifi around their house, they just put it where it's the most convenient a lot of times. Right. But when you know that there are gaps, which I outlaw, he had the ASUS stuff at one point and he was getting frustrated because it
Starting point is 01:51:25 wasn't hitting every spot in his house. Well, one thing you can do if you have an Android phone, there is an app on Android that will allow you to actually see the signal strength of all the wifi signals walking around your house or any spot. Right? And I say Android because Apple locks that stuff down. If you have a jailbroke phone, you might be able to do it with Android. It's just available to you. So you can download this Wi-Fi analyzer app and literally walk around your house and it'll show you the strength of your signal for all the Wi-Fi things in your place. It'll also show you the best channels to go to for the least amount of interference
Starting point is 01:52:07 and all this kind of stuff. It truly is awesome. So highly recommend that if you're looking at doing what I'm going to share next. But the channels are supposed to be like automatically decided, negotiated, like you don't have to hard code that. You would think so.
Starting point is 01:52:23 What's interesting is if you go look, and I have it on one of the Android phones I have laying around, they're very congested. A lot of the channels, it's like they strongly favor certain channels for whatever reason, and you have a bunch of Wi-Fi networks in your area all taking up the same three or five channels, while for other frequencies there are way more channels available. So it's kind of interesting that it goes that way. I'm trying to see if I can even do that on mine. I probably can't. On your iPhone? No, on my Wi-Fi, like I don't even think that's an option. Yeah, they may not. I mean, some Wi-Fi things are like, no, you just set me up and I'll run. Yeah.
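If you have a Linux laptop handy instead of an Android phone, NetworkManager's CLI gives a similar read on nearby SSIDs, channels, and signal strength; this is just one alternative way to survey, not the app mentioned above.

    # Rescan and list nearby networks with channel, frequency, and signal strength
    nmcli --fields SSID,BSSID,CHAN,FREQ,SIGNAL,SECURITY dev wifi list --rescan yes

    # Terse output sorted by channel, to spot which channels are crowded
    nmcli -t -f CHAN,SIGNAL dev wifi list | sort -t: -k1,1n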
Starting point is 01:53:05 So here's the reason why I bring it up. The Wi-Fi routers that I ended up getting, I think I mentioned, are the Omada ones. I got a box, yeah, I have the AX6000s, the long-range Wi-Fi 6E ones, whatever. So these are PoE, Power over Ethernet. Well, I don't want to go running Ethernet cables all around my house and not know if the thing's going to put out a good signal that will reach the various different areas, right? So what I did is, and I'll have a link to this, there is a little plug, basically a PoE injector, that you can plug into the wall, then plug a network cable into it and into whatever your PoE device is, and you can just carry it around, plug it in, and see how the thing will operate. So what I did is I took one of these Wi-Fi routers and I carried it around the house
Starting point is 01:54:08 and I'd plug it up and then I'd walk around with my Android device and see, hey, is this thing picking up well around the house with the signal that's being sent out from it? So without having to run Ethernet cables through my walls and all that stuff, and just guessing at whether or not it would show up well in places, I would plug this thing up, put it up on a ladder or something, you know, up high to where it would be similar to where it would be mounted on the ceiling, and look and see what the Wi-Fi signal was like. So that was actually pretty cool, and it allowed me to get a picture of where the weak spots are in the house and how it compared. So, like, one of the things I did is I put it right next to one of my ASUS routers and looked and saw, what's the Wi-Fi strength of this compared to
Starting point is 01:54:53 this as I move around, right? And so that was kind of a way to see apples to apples, like, what am I going to get when I replace this router with that router? So that was really nice. And then the last thing I'll share here, too, is one of the things that I sort of knew but didn't really realize until I started to do this: if you go the Omada route, which is what I've done, and I did it for the reasons that I said previously, which is security, right? Like, I can set up basically VLANs without really having to do a bunch of work, so that I can separate my IoT devices, I can separate my work computer, I can separate my personal stuff, I can separate all of this to where I don't have to worry about crosstalk
Starting point is 01:55:35 or things like somebody trying to get in from, you know, a light bulb that I bought that has IoT built in with no security baked into it or whatever. So if you just buy the Wi-Fi routers, then you have to use Omada, TP-Link's cloud controller service, which is probably not completely dissimilar to what all the Netgears and things out there are doing anyways, right? Like, hey, log into your Netgear account to see your stuff. I don't love it, because I don't love my routing and my network being known by an external cloud service. So there is a device that I'll have linked here as well that allows you to basically bypass the cloud part of it, and everything goes through this on-prem
Starting point is 01:56:32 little TP-Link box, so that all the management and everything is done through it instead of a cloud service. It's about a hundred bucks. Is it worth it? I think it probably is. I just really don't like my network map being in some other service provider's database somewhere. It's going to be there anyway. I still have to figure that out; it probably will be. I mean, ASUS and TP-Link, Netgear, all of them do it, right? Like, they want you to set up an account with them and all that kind of stuff. So I don't know, I still feel better with a local piece of hardware that should be doing all that routing stuff. You had Orbis before, right? Then ASUS.
Starting point is 01:57:15 Yep. And now on to the TP-Link. Yep. Yep. The Orbis are loved. They were fantastic. Matter of fact, I actually preferred them over the ASUS in terms of just how stable and reliable they were. They were amazing. I mean, I had them for years and I never had issues with them. So for an out-of-the-box, just-want-to-set-up-something-easy-and-get-it-running option, love them. Absolutely love them. But they have gone insane with some of their Orbi prices. One of their systems, like the Wi-Fi 7 one, is like $2,000. I'm looking at that. $2,300. Do not buy that thing.
Starting point is 01:57:51 Just don't. It's ridiculous. And the reason I say it is this, and I'll put this out there: Omada, the TP-Link, more commercial, business-type stuff, just came out with their Wi-Fi 7 thing, which is the equivalent of that $2,300 Orbi system Outlaw just mentioned, with the addition that it is way more secure. You can set up as many VLANs as you want, again, without it being super complicated. You just set new passwords and tell it, hey, I want it to be able to communicate with
Starting point is 01:58:25 things on this, or I don't, right? Or I want it to be able to see this, or I don't. That TP-Link system is basically the same thing. Like, that's an insane amount of money for that, so don't buy it. That's awkward, I just bought one, though, right? So anyways, with that whole thing all said and done, I mean, these TP-Link Wi-Fi 7 ones that just came out are like $190, so you could buy almost 10 of those to put around your house for the same price. And what that system you're looking at comes with is three, right, Outlaw? Yeah, for the $2,300 that was three. It comes with three of those things that you can put in different spots around your house. Now, those things will probably also give you cancer because
Starting point is 01:59:10 they've probably got 80 antennas sticking out of them pushing out a thousand dBs worth of gain. But at any rate, I'm even thinking about doing a video just to sort of go over what I just said, because I think it might be useful for people to see. Like, hey, you don't just have to guess where you put your router; you can actually test this thing out and look at it, even with a cheap old Android phone. You probably want something that can at least pick up the different frequencies, but it's got to be better than going and buying a Wi-Fi scanning tool that's going to cost you more and have less functionality. You know? Nah.
Starting point is 01:59:48 Nah. So that's it. That's it. I'm done. Okay. Well, I guess I'm the only one that's going to talk Kafka during our Kafka conversation here. Probably. So here's my tip of the week. So I don't know if you guys have run into this,
Starting point is 02:00:03 but using Kafka in Kubernetes with Strimzi, you can't just... Have you ever had yourself in a situation where, like, okay, let me take it outside of Kafka for a second: you're doing some local development, you've got your Kubernetes cluster up and running, and you need to, like, oh, for
Starting point is 02:00:27 whatever reason, I need to kill this set of things. You know, maybe it's a set of replicas, and you're like, okay, just let me set that deployment's --replicas equal to zero to scale them down and be done with it, or whatever, right? You would hope, yes. But with Kafka and Strimzi, it's not that easy to do. And in fact, Strimzi kind of makes it really difficult to do, because you're not supposed to. Like, the idea is you want your cluster running, right? Why would you not want your cluster running? Our job is to keep your cluster running. That's our job.
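For an ordinary Deployment, the quick scale-down being described is just something like this (the deployment name here is hypothetical):

    # Scale a regular Deployment down to zero replicas...
    kubectl scale deployment my-app --replicas=0

    # ...and back up when you're done
    kubectl scale deployment my-app --replicas=3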
Starting point is 02:01:04 So when you do want to, you know, scale down, and I'm going to say scale down the cluster, it's difficult. And I say it like that because I'm purposely using the wrong Strimzi terminology for it; the way they would refer to it is shutting down your Kafka cluster. So I'll have this set of commands in there, but it basically revolves around two commands, and all you're doing is setting an annotation on your cluster for pause reconciliation. You're either setting pause-reconciliation to true,
Starting point is 02:01:47 or you're removing that annotation altogether. And what that's telling Strimzi, the operator, is: hey, if this cluster has pause-reconciliation set to true, then I'm not going to intervene. Like, if I notice that you've deleted the StrimziPodSets,
Starting point is 02:02:04 for example, I'm not going to restart them, right? But otherwise, if that annotation isn't there and I see that something happened, I'm going to assume it crashed and I'm going to restart it, because my job is to keep that thing up and running at all times, right? So there's another command in there that would delete the StrimziPodSets, just as an example. But they're very specific that, under normal circumstances, you wouldn't delete the StrimziPodSets. Unless you had some specific reason, you definitely wouldn't do it while Strimzi is reconciling the configuration, where it would immediately kick everything back up.
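As a rough sketch of what that workflow looks like, assuming a Kafka resource named my-cluster in a kafka namespace (both names are placeholders, and the exact commands in the show notes may differ):

    # Tell the Strimzi operator to stop reconciling this Kafka cluster
    kubectl annotate kafka my-cluster strimzi.io/pause-reconciliation="true" -n kafka --overwrite

    # With reconciliation paused, deleting the StrimziPodSets shuts the brokers down
    # and the operator won't bring them back
    kubectl delete strimzipodsets -l strimzi.io/cluster=my-cluster -n kafka

    # Remove the annotation when you're done; the operator resumes managing
    # (and restarting) the cluster
    kubectl annotate kafka my-cluster strimzi.io/pause-reconciliation- -n kafka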
Starting point is 02:02:43 So those three example commands will be in the show notes. That is incredibly useful, by the way. It's really nice. Yeah, I read a little bit a while back on why Strimzi moved away from StatefulSets and got into the whole StrimziPodSet thing, and I don't remember all the reasons now, but it was pretty interesting. Part of it had to do with additional flexibility, because when you have a StatefulSet with a number of replicas, each replica has to be the same, and Strimzi wanted to be able to do things like, okay, we're migrating this to another version, or we're doing some sort of upgrade, so temporarily we want to change the definition on this one until it's stable and then roll it out to the next one.
Starting point is 02:03:28 So they just wanted more control, but they made a pretty good case for it. I'll see if I can find that article and put it in the show notes. It was interesting. Yeah, I'd like to read that. The frustrating thing about this was that I just happened to stumble across this explanation in a Strimzi Slack channel, right? Because I was trying to do some maintenance on a live Kubernetes Kafka cluster and I needed to selectively bring some things down. And, you know,
Starting point is 02:04:08 I was like, okay, every other way that I would do this, this is how you would do it, and they're specifically saying not to. Like, what is going on here? And then I stumbled across that Slack conversation and I'm like, oh,
Starting point is 02:04:21 okay, well, that is some gold there that is hidden in this place. Right. Not easy to find. Yep. So, yeah. All right.
Starting point is 02:04:35 Later. We done. That was ridiculous. I don't even remember what closing is. Whatever.
