Programming Throwdown - 156: Perl and Regular Expressions

Starting point is 00:00:00 Programming Throwdown, episode 156: Perl and regular expressions. Take it away, Jason. Hey everybody. I'm sure most people have tried ChatGPT, or at least have seen it and heard about it. Oh, my God. So everyone's heard about this by now. Even my parents are asking me about it and everything. So that's how you know it's really reached everybody. And I was looking into it and I thought this would be really cool if I could use it to scan my email. I had a bunch of ideas. I also had some ideas of maybe scanning some internal documents at work. But then, of course, all of my ideas either involve running it a zillion times, which I don't want to pay for, or running it at work, which is not a good idea.

Starting point is 00:01:06 So I was looking around at open source and I came across a couple of options. So there's one called GPT4All. It worked really, really well, amazingly well. I was really surprised because when the image thing came out, when DALL-E came out, the open source, you know, text-to-image stuff was not good at all. And it's still not really in a good place compared to the one from OpenAI. But this one is awesome. And you can download it onto your laptop or your desktop. It doesn't use that much RAM.

Starting point is 00:01:43 It's quantized. You don't need a supercomputer or anything. And you just ask it questions and it answers you. And it's pretty amazing. Like you just have it right there at your fingertips. I don't think it can run on a phone. I don't think it's that good, but it can run easily on most computers. The thing about GPT4All was when I went to, like, fine-tune the model or train a new model, it didn't work. Basically, they have a bunch of config files that they didn't put into the repo and it's just not in a place where you could

Starting point is 00:02:16 really train it yourself. At least, I couldn't figure it out. Also, I think the repo is literally a week old. So it's not like it's really had a chance to be put through its paces. So I found another one called Stanford Alpaca. And that one had a better repo and also kind of explained how both of them worked. So I'll kind of walk folks through that real quick. So if you remember from the ChatGPT episode, you take these large language models,

Starting point is 00:02:49 you give the results to humans, the humans like rank them, and then you use reinforcement learning to like tune the model to try to return things that people would rank highly. That's really expensive, right? You have to pay all these people to do that. And so what these two projects did, which is really clever, is they basically just asked ChatGPT questions and then got their answer. And then they're basically trying to memorize

Starting point is 00:03:20 ChatGPT's answers. So they start with a large language model and they say, oh, when I ask this, I expect that, and that is, you know, ChatGPT's answer. Now, they don't, like, super overfit to that. So it's like memorizing, but they do that just to, you know, try to recover that, the ML term is bias, but, like, try to recover that niche that ChatGPT had. And it works really well. I was impressed. The Stanford one, you have to download the large language model.
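As a rough illustration of the approach just described (collect a stronger model's answers and use them as training targets for a smaller model), here is a minimal Python sketch of the data-preparation step only. The prompt template, the field names, and the sample question/answer pairs are all made up for illustration; they are not taken from the GPT4All or Stanford Alpaca repositories.

```python
import json

# Hypothetical prompt template; real projects define their own format.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(instruction, teacher_answer):
    """Turn one (question, teacher answer) pair into a supervised training record."""
    return {
        "prompt": PROMPT_TEMPLATE.format(instruction=instruction),
        "completion": teacher_answer.strip(),
    }


def build_dataset(pairs):
    """Keep only pairs where the teacher model returned a non-empty answer."""
    return [build_example(q, a) for q, a in pairs if a and a.strip()]


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r) for r in records)


# Made-up pairs standing in for answers collected from a larger "teacher" model.
pairs = [
    ("Are humpback whales mammals?", "Yes. Humpback whales are mammals."),
    ("Name a scripting language known for regular expressions.", "Perl."),
    ("This one got no answer.", "   "),
]

dataset = build_dataset(pairs)
print(to_jsonl(dataset))
```

A file like this would then be fed to whatever fine-tuning setup is being used; that step is where the GPU memory questions raised in the conversation come in.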

Starting point is 00:03:56 I got a BitTorrent of it. It was 300 gigabytes. So it downloaded overnight. Yeah. I don't even know why I'm doing this on my personal computer. This is probably never going to actually work. I think you need, like, some industrial-strength computer, but I'm going to try it out. The model downloaded overnight, and I haven't had a chance to use it or anything. But yeah, it's very accessible, way more accessible

Starting point is 00:04:22 than the image and video stuff. And I think that's just because, you know, text is such an easier medium to operate in. So is the idea, like, I mean, I guess this stays as inference, like running it and asking it questions locally so that you don't have to send personal or company data up to the cloud? Okay, I get that. But is it accessible in a way where, like, I want to fine-tune it on something or have it think more thoughtfully? Like, I guess there's two things, like setting up that problem

Starting point is 00:04:53 and how difficult it is to set up fine tuning or training data. And then the second thing is actually like, do you need six GPUs and a bazillion gigabytes of GPU memory to be able to do those? Or can you actually do it on your computer given enough time? Yeah, I mean, I don't know yet. Well, I'm gonna have to figure it out. But the asking questions thing is very accessible.

Starting point is 00:05:17 Okay. But if you want to fine-tune it, like, let's say you wanted to read your internal company wiki and then, you know, be in a position where you could ask it questions like, you know, what are the repositories in my company? Like the Git repositories in my company. That is the part that I haven't got to yet. They did say that you need a GPU with like a ton of RAM. I don't remember the exact amount, but it was an insane amount of memory for a

Starting point is 00:05:46 GPU. So yeah, I'm not totally sure what's the best way to fine-tune this. Like maybe you take this 300 gigabyte file and you put it on Amazon and then you, you know, borrow an expensive Amazon machine. Sure, but then that Amazon machine now needs to be able to read your company wiki. Yeah, I have no idea how this is going to work. But I do think that there's something really powerful here. So this is the, I mean, like, I don't want to say killer app because everyone has the next killer app for all of this. But if you could come up with a way of doing this very straightforwardly, having, like, internal search. So this thing you mentioned, maybe people who aren't at big companies don't sort of have this

Starting point is 00:06:30 problem or realize this problem. This is a problem I run into constantly. And you said just, like, what are our repositories, but listing is one thing. It's just like there's 5,000 tasks you need to do, but you don't need to do all 5,000 routinely. Maybe your team or sub-team doesn't have that knowledge. So even just finding the team to ask, or maybe they've even gone ahead and put it on the wiki, like here's how you set up your account and do all this. So just asking a question like, how do I do task A with data set B? You can't go to Google and type that in because what you need is, like you pointed out, you need to create an account at this, you know, go to this portal and fill in your information, which will create you this role, then you need to go to this GitHub, clone this GitHub repository, and then run this command. And then your computer is

Starting point is 00:07:14 authenticated. And you do this. Asking a person on that team, if you could find the right team, is actually super straightforward. But finding that team, getting them to answer your call, and then forcing them to do that over and over again is very difficult. So everyone says, why don't you put it up on a wiki? They go, well, we did, we put it on this wiki, but you need to know the search terms to put in and exactly which system to search to find that. It's just a very difficult, subtle problem. And we tell folks on my team constantly, oh, put this up on a webpage, put this in a wiki. And we do. And then we constantly field questions from other teams like, well, why didn't you put it up? And it's like, we did, here. Well, it's not in this form, or it's not linked into this hierarchy. And it's like, yeah, but how do I find out that you wanted it

Starting point is 00:07:58 there? Like, this is a very, very difficult problem, I would say, for large companies. Yep. Yep. Yeah, totally right. You know, I think another cool idea here is you could make your own Alexa. I mean, it's not connected to the internet. So, you know, it can't do anything like that. But you could make a little Raspberry Pi with a camera, not a camera, sorry, with a speaker and a microphone. And then you could have a little Raspberry Pi somewhere in the living room. And you could say, Raspberry Pi, are humpback whales mammals, or something. And it would just run this GPT4All.

Starting point is 00:08:38 And it would just tell you what comes out. It'd be pretty cool. Do you know, is it capable? So one of the things that I actually just don't know, and I've not found a good way to look up: if I ask a question, does it need to already know it from the data it was trained on, or does it know how to reach out, get information, synthesize that, and then riff on that? So as an example, like, what's the weather going to be tomorrow? It can't have trained on

Starting point is 00:09:05 that, right? Like, so it would need to know how to reach out to a service, interpret that and give me, like, is it capable of doing that? Like it can't if it doesn't have internet access. So, but if you gave it that, does it know how to do that? Or are we not there yet? Not the open source one. So I think ChatGPT has like third-party apps, like plugins that you could install. It's not totally clear to me how that works. Like, how does ChatGPT? So yeah, that's a good example. If you say, what's the weather tomorrow? How does ChatGPT know that it needs to, you know, query your weather plugin for that? Not totally sure. It might be kind of early days there where it's kind of, you know, hand-coded. Like maybe you look for certain triggers in the query, like old-school conversational AI stuff, like the kind of stuff we used when we played Zork. You know, that's probably what they're doing. And they're saying, oh, you know, someone asked a question that has a lot of pluses and divisions in it, let's send this to the Wolfram Alpha plugin or

Starting point is 00:10:10 something. But yeah, that would be amazing if somehow, you know, when you trained it. So what you'd have to do to get this right is, in the training data, you would have to have data for that particular plugin. So like right now, they give, let's say, all of Wikipedia to the large language model. Well, maybe they would also give 10,000 weather queries. And then the answer to those queries isn't like regular tokens, like the weather is clear, but it's actually like the answer is some special magic token that when ChatGPT sees that magic token, it knows it needs to ask the weather people for something. Reaching out to the internet feels like a good segue to my first news article. So we'll take this as a forced time for news of the show. News.
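The "magic token" idea speculated about before the news segue can be sketched in a few lines of Python. Everything here is hypothetical: the `<|tool:NAME|>` marker format, the `fake_weather` stub, and the `PLUGINS` table are invented for illustration and say nothing about how ChatGPT plugins are actually implemented. The sketch only shows the dispatch side: if the model's output contains the marker, run the named plugin instead of showing the text to the user.

```python
import json
import re

# Hypothetical sentinel format: <|tool:NAME|>{json arguments}
TOOL_CALL = re.compile(r"<\|tool:(\w+)\|>(\{.*\})", re.DOTALL)


def fake_weather(args):
    """Stand-in for a real weather service; returns canned text."""
    return f"Forecast for {args['city']}: clear"


# Hypothetical registry mapping plugin names to callables.
PLUGINS = {"weather": fake_weather}


def handle_model_output(text):
    """Return plain text unchanged, or run the plugin named by the sentinel."""
    match = TOOL_CALL.search(text)
    if match is None:
        return text  # ordinary answer, no tool call requested
    name, raw_args = match.group(1), match.group(2)
    plugin = PLUGINS.get(name)
    if plugin is None:
        return f"(no plugin named {name!r})"
    return plugin(json.loads(raw_args))


print(handle_model_output("Humpback whales are mammals."))
print(handle_model_output('<|tool:weather|>{"city": "Orlando"}'))
```

The hand-coded alternative mentioned in the conversation, scanning the user's query for trigger words, would replace the regular expression on the model's output with a keyword check on the input.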

Starting point is 00:11:06 So my first article that I have is an open letter from, I guess, the AI community at large. I mean, I won't name drop like the specific people. It's not super relevant. But asking for a six month pause in what they call giant AI experiments. I don't know how you feel, Jason. Maybe you can go in a second. But just sort of

Starting point is 00:11:25 like reading this article, this isn't my sort of domain. So I guess I would be a layman here sort of like thinking through like the general societal implications here as opposed to the specifics of AI, but asking for, you know, a six-month pause and sort of making sure that we're ready and braced for the impact of the, I guess, societal changes and the danger of some of this. I mean, famously, I guess, Elon Musk, who had a lot of early ties to OpenAI,

Starting point is 00:11:52 but seems to now be saying he's not very associated with them and has some disagreements with them, which we won't dig into here. But I think he sort of has famously always said that, that he's worried that there's a very real danger that, you know, general purpose AI would sort of pose a danger to society and to humans, the exact nature of which seems to just slide from, we can mock it for the silliness it does

Starting point is 00:12:15 today to, like, all of a sudden, you know, somehow it figures out how to launch the nukes and we all die, right? I think, you know, this six-month pause is really interesting just as sort of a thought experiment in my mind, which is like, what does that amount to? What is the game theory of it? You know, let's say somehow the government steps in and, with OpenAI being a U.S. company and it being the U.S. government, you know, forces them to stop. But what about other countries? It's sort of like the same kind of problem as, you know, the nuclear arms stuff, right? Or climate change, you know, sort of policies,

Starting point is 00:12:55 all of these things, whether you agree with them or not, can only be as effective as the sort of enforcement or the number of people who voluntarily, you know, sign up to doing it, right? That's why it's called global, right? Global warming, because the whole globe has to participate. Yeah, this is very like, what do they call it? I guess it sort of gets into tragedy of the commons, like no single country cares enough to stop what they're doing. And, okay, this is a side topic. So on this one, though, I think this is very interesting. And I was reading some analysis of it by Scott Aaronson, who's a person we've talked about before on the show, who does a lot of talk about quantum computers and sort of like, are the experiments you see coming out, are they actually quantum computers?

Starting point is 00:13:40 Or are they classical computers done in a quantum way? And sort of some of those nuances there and what that all means, and P versus NP. And anyway, so he talks about these kinds of things. I didn't realize he has been working recently at OpenAI on, you know, AI safety. But in a personal response to some of this, he has some good questions, which is, why six months? Not six years or six weeks? Like, what do we think is the reason for six months? And just his analysis is like, this is a group of people signing on, but everyone actually kind of wants something different. And I'll steal this story

Starting point is 00:14:15 from him. He had an anecdote that he talked to someone he knew from academia who said they signed the letter, and he asked them about it, and they're like, no, I completely disagree with the letter. But I did a bunch of research for, you know, ChatGPT 4, and if the next one, ChatGPT 5, comes out, it ruins my research. Like, I just need six months because that's the iteration cycle of getting my papers out. So it's kind of funny, because why that person signed it and why, you know, I don't know anything about it, but I saw, like, oh, Steve Wozniak signed this letter. Those two people probably have very different reasons and incentives for signing such an open letter. So this is an interesting sort of news article. It's very prescient today for, like, you know, all of these things are saying, is it really the end?

Starting point is 00:15:01 Is it the beginning? Is it a fundamental shift? Is it another personal 3D TV in your house, where, you know, it's just a fad and it's going to die out? VR remains to be seen. We'll see what happens with that. But, you know, everyone's kind of pontificating about what the implications of this all are. Yeah, I mean, you know, when this show comes out, it'll be either on tax day or very close to tax day. So I think I would probably sign it just so that I could get some tax reimbursement from all these professors who won't be working for six months. Like we should get some kind of a kickback. I think, yeah, I mean, it's true. The six-month thing, you know, there's an interesting historical fact here, which is that when countries try to limit technology, it almost never works. Like, I think the one that comes to mind is when China banned the steam engine back in the 1800s. They did it because they had almost like a

Starting point is 00:16:08 you know, we have the taxi medallion system, right? You know, they had a guild for, you know, stagecoach and wagon transfers and all of that. That was all run by guilds, and the steam engine would completely decimate all of those jobs. And so it's kind of like, you know, for automation reasons, they banned the steam engine. And some people say that it set them back like over a century because the steam engine was just so important to the industrial revolution, which then caused this exponential growth in tech. I feel like, yeah, that might happen again here. You know, if someone tries to ban this technology, then whoever bans it might end up having a really hard time recovering from that later. I feel like the answer is almost never to pause or ban things, but to then build more things.

Starting point is 00:17:00 The constructive solution is almost always the long-term solution. I mean, to your point, you know, the six-month thing is by definition not a long-term solution. And so there was a Stanford, I think it was Stanford or Harvard, there was a researcher who, you know, using ChatGPT and a bunch of student-submitted assignments, wrote a classifier to classify whether something was written by ChatGPT or not. And so now, like, you know, K through 12 teachers can run student reports through this classifier and it'll say whether ChatGPT wrote it

Starting point is 00:17:38 or how much of it, like what's the likelihood that ChatGPT wrote it and the spread, right? Like did it write all of it? Did it write just one part of it? So I think that entering that arms race is really, you know, the long-term solution. I'll just say it and then drop it, but your mention of banning the steam engine and setting things back is, I guess, like unintended consequences. And there's some analysis too about, people sometimes,

Starting point is 00:18:06 again, not to keep doing it, but Elon Musk is always railing against short sellers on Tesla stock. And so there's like this equivalent with futures and options on various crops. And some by law are banned from having futures markets on them because they successfully lobbied that it would basically cause more volatility. But in retrospect, actually, look, there's more volatility in the ones without. Except then, look again, where and how the distribution is shaped is different. So you get more very extreme price movements, potentially, in the ones with options, but the sort of average

Starting point is 00:18:45 volatility, I'm going to get in over my head here, but the sort of general volatility you see over time is reduced. It's hard in these very dynamic systems to predict what small changes will do. Yep, yep. Yeah, I totally agree. We'll have to see what happens. I mean, they're up to, what was it? Something like 2,000 signatures. Yeah, 1,800 signatures as of when we're recording. So probably when this comes out, they'll be up to 2,000. Hey, maybe people hear this, it's like the like and subscribe. I'm just kidding. We're not advocating for you to go. Yeah. If you have to pick one, hit the subscribe button, hit the like button on this show. But if you want to do both, go for it. We'll post the link in the show notes. So related to this, I have an article from

Starting point is 00:19:31 Vice. It says, AI Will Smith Eating Spaghetti Will Haunt You for the Rest of Your Life. Haunt is not the right adjective or adverb. Yeah, it does stick with you, though. So, you know, we all talked about DALL-E, and we talked on this show even about DALL-E and Imagen and things that take text and generate images. So you could type in, you know, a dog in a space suit on the moon, and it will generate an image like that. One thing I've done that's really interesting is you can type in things that are abstractions. Like I put in, you know, love wrapped in swirls of happiness, and it's really interesting to see what kind of images come out of that. But yeah, these are just images. So what's the next level, the next dimension? That is time and video. And so someone has made the same kind of thing, but with video.

Starting point is 00:20:24 The thing about it is you just have to watch it. I mean, I can't really do it justice over the airwaves. But basically, it just moves in ways that are not Euclidean that really freak you out. And, you know, you look at the video. So someone put in Will Smith eating spaghetti. You look at the video and, yeah, it's definitely Will Smith eating spaghetti. I mean, it's completely unambiguous, but it's just so odd. Oh man, what was your take when you first saw that? I guess maybe it's biased by if

Starting point is 00:20:58 someone who shares a link tells you this is funny or that this is horrifying, but it was, like, funny. And so then when I watched it, I'm like, indeed, this is very funny. But as Jason said, when you take a minute, in my opinion, it's like you said with these concepts. It's definitely, like, oh, okay, Will Smith, the famous actor. If you know who it is, you'll recognize him, or even if you don't know the name, you would probably recognize the person. But it's not really the person. So it's definitely picked up on the attributes that make Will Smith recognizable, but if you take a screenshot, his face is distorted in very incorrect ways. So it's preserving the recognizability, but it's clearly not, like,

Starting point is 00:21:38 stitching existing video. It's hallucinating it, right, from, I guess, the latent space of, anyways. And there's definitely spaghetti, but the method is very unclear. Like, he's not really using his hands or a fork. It's just going in, like it's flying into his mouth. The spaghetti is going through his mouth. It's amazing. It's one of those things I always talk about, like, oh, AI is so dumb, you know, it can't recognize a bird, my five-year-old can recognize a bird. But if you ever had kids or look at kids' pictures, it's very interesting what they decide to draw and how they do recognize things. Like even to this day, my kids are older, asking, like, oh, draw

Starting point is 00:22:19 a picture of a person, sometimes they're missing a neck. Like they don't understand what makes a person a person. Or the famous one, and I saw a clip, I don't know if they did it often, where a news reporter asked a bicyclist who had just run a long race, like the Tour de France kind of thing, right? And they come up with a piece of paper: hey, in 10 seconds, can you draw a picture of a bicycle? This is someone who spends, like, their life on a bicycle. And it's actually very difficult for them to put down the features. You know, two wheels and a handlebar are great, but the geometry, the makeup. And this is not a person who's never seen one before. It's someone who tinkers for hours, you know, weight savings, all this stuff.

Starting point is 00:23:01 And they just can't quite get it correct. And this is sort of the same thing here. Like, it's trained on these things, but it doesn't get it quite right. Yeah, have you ever tried to draw the layout of your house or your backyard for somebody, like for, you know, a contractor or anything like that? It's pretty much impossible. Like, you know, you get the general thing, like, okay, there's a bedroom and the bedroom connects to this other room. And yeah, the bedroom is smaller than the living room. But then it's like the thicknesses of the walls and stuff. Your house isn't shaped like a rectangle.

Starting point is 00:23:33 It's, yeah. Yeah. It's like how wide the hallways are, how thick the walls are in between the rooms. It's like we have zero concept of that in our mental model. Yeah, it reminds me of if you ever watch a SLAM video and when they close the loop. So SLAM is simultaneous localization and mapping. So they're taking some, you know, camera or lidar through rooms, and what you end up with, because it's not a hundred percent accurate in the reckoning, is that rooms will be tilted and offset from each other. But

Starting point is 00:24:06 then when you sort of come at it via another path, it sort of snaps into place. And so I feel like what you're saying is, if you drew the flow of your house, the graph connection is probably 100% accurate, like which rooms connect to which rooms, right? But then you need the shape and some anchoring dimensions, and then you could probably deform it into place pretty accurately. But they always say this, like people live hundreds of years in a house and don't realize there's a three foot gap where like a million dollars is buried in the wall or whatever, because you just can't spatially tell distances accurately enough to know that there's some missing space in your room.
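The "snaps into place" effect of closing the loop can be shown with a toy Python example. This is only a sketch under simplifying assumptions: positions are 2D points, the drift is a made-up constant bias, and the correction simply spreads the end-to-start gap evenly along the path. Real SLAM systems use pose-graph optimization rather than this linear redistribution.

```python
def integrate(steps):
    """Dead-reckon 2D positions from a list of (dx, dy) steps, starting at (0, 0)."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for dx, dy in steps:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def close_loop(path):
    """Assume the last pose is really the first pose; spread the gap evenly along the path."""
    n = len(path) - 1
    err_x = path[-1][0] - path[0][0]
    err_y = path[-1][1] - path[0][1]
    return [(x - err_x * i / n, y - err_y * i / n) for i, (x, y) in enumerate(path)]


# A square walk whose measured steps carry a small constant bias (made-up numbers).
steps = [(1.0, 0.05), (0.05, 1.0), (-1.0, 0.05), (0.05, -1.0)]

drifted = integrate(steps)       # ends a little away from the start
corrected = close_loop(drifted)  # ends back at the start

print("before:", drifted[-1])
print("after: ", corrected[-1])
```

Before the correction the walk ends slightly off from where it began; after it, the end point coincides with the start and the intermediate points shift by a proportional share of the error.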

Starting point is 00:24:43 Yeah, that's totally right on. Completely unrelated, my next article. I guess it is images, so we got a theme going. So for this one, someone posted a robust image compression implementation from an old NASA paper. I think it was called ICER,

Starting point is 00:25:01 was what NASA had named it. Interestingly, when NASA develops stuff, and I'm going to be very careful here, it basically should end up in the public domain because it's paid for by the public. I think there are some caveats. One of the things is, you say, oh, okay, so if it's NASA, right, you think like space probes and robots on the surface of Mars and that kind of stuff. So the bandwidth is very limited. And that is definitely true. And it's very interesting to read how these videos and pictures we see get transmitted over time with budgets and allocations. And it's not like you have low speed connectivity all the time.

Starting point is 00:25:44 You have like bursts of low speed connectivity, and then long gaps. And so you're sending back initial stuff, and then refining it over time, which is very interesting. So the kind of cool part was this person, who didn't have, you know, as far as I can tell, a ton of background in this, wrote a C library that implemented this image compression, where you can kind of give it a budget, and it'll try to do the best it can. But the other thing they pointed out that I found particularly interesting, probably just because of my background, is it being robust to dropped packets. So we normally don't think about that. If you take a JPEG image and you just zero out one of the bytes randomly, or, God forbid, delete the byte and shrink the file, you're not going to get an image.
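To make the dropped-packet idea concrete, here is a toy Python sketch. It is not the ICER algorithm and involves no compression at all; it only illustrates one ingredient of loss tolerance, namely that each packet can be decoded on its own, so a lost packet leaves a hole instead of breaking the whole image. The packet layout (`start`, `rows`) and the gray fill value are invented for illustration.

```python
def packetize(rows, rows_per_packet):
    """Split an image (a list of pixel rows) into independently decodable packets."""
    packets = []
    for start in range(0, len(rows), rows_per_packet):
        packets.append({"start": start, "rows": rows[start:start + rows_per_packet]})
    return packets


def reassemble(packets, height, width, fill=128):
    """Rebuild the image from whichever packets arrived; missing rows become gray."""
    image = [[fill] * width for _ in range(height)]
    for p in packets:
        for offset, row in enumerate(p["rows"]):
            image[p["start"] + offset] = list(row)
    return image


# A tiny 6x4 "image" with made-up pixel values.
image = [[y * 10 + x for x in range(4)] for y in range(6)]

packets = packetize(image, rows_per_packet=2)
received = [p for i, p in enumerate(packets) if i != 1]  # pretend packet 1 was lost
result = reassemble(received, height=6, width=4)

for row in result:
    print(row)
```

Rows carried by the lost packet come out as flat gray while everything else is intact, which is the contrast with a format where one damaged byte makes the entire file unreadable.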

Starting point is 00:26:23 You actually just won't get anything. You're not going to get a crappy image. It literally won't parse. You'll have an error. Your library, your computer might crash. I don't even know what'll happen. Hopefully not. There's just literally nothing in it. It's just built on the assumption that all the data

Starting point is 00:26:40 is preserved. There might be a checksum. Maybe it'll know that it's not good and not render garbage, but you're certainly not going to get out just, like, a blurry image. And so what I thought was really cool here is thinking about a way of transmitting images, or videos, which are sequences of images, in a way where you don't have any assumption about, you know, packets getting through. Of course, if none get through, you get nothing. But, like, you know, just randomly dropping one. I love the idea. This is really interesting. I guess, like, the more packets you get, the more clear it is, kind of thing. Yeah, and I think it

Starting point is 00:27:14 would be cool if you imagine there's, like, really low bandwidth radio links you can get. I think LoRa is one of them, L-O-R-A, long range radio. But the bandwidth is very low. So they're normally just used for, like, sensor readings, text messages. But it would be cool to say, like, if you set up a camera at an edge node, where maybe you had poor radio connectivity, and you wanted to put a camera there, and you wanted to send back an image,

Starting point is 00:27:42 but your connection sometimes gets through and sometimes doesn't, and doesn't have a ton of bandwidth. I think there's a lot of interesting opportunities there. And it's funny how we take this for granted. I saw the other day you can go to the store for, sort of, animal watching, bird watching trail cameras. Like, get a security camera that you strap to a tree that has a cell phone subscription, and when there's motion it'll take a picture or a short video and upload it over your, you know, cell phone connection plan that you buy. You know, it's not a full plan. And it's just like, wow. And you just go to the store, right? I think you just go to the, you know, Best Buy or Walmart or whatever, and they have

Starting point is 00:28:18 one of these it's just like this is crazy to me how fast this sort of changed yeah i love how it says this library was designed with embedded systems in mind, but it should function on normal systems too. It's like saying, yeah, this was designed for Perl, but it'll work on normal languages as well. A little foreshadowing. The embedded stuff there, I think, is the call out because it's not

Starting point is 00:28:44 just the limited resources, but also things like embedded systems sometimes don't even have dynamic memory allocation. And you'd be surprised how few off-the-shelf libraries will work if you say you're not allowed to call new or malloc. Makes sense. All right, my second news item is called Dig This Vegas. There's a bunch of these. I found out there's one in Texas. I'm going to try and go. But basically these are kind of amusement parks where you get real-life construction equipment. So the one in Vegas, which a buddy of mine just came back from, they gave him an excavator and they put a car, like a, you know, junkyard car, on the ground, and he gets to just demolish it. And so as part of this experience, I think you start in a skid steer. So you drive a skid steer through a track, and then they have a bulldozer and you, you know, move some sand around, all that. And then they put you in an excavator, and your goal is, like, inside the car there's a box, and you have to get that box by, like, tearing

Starting point is 00:29:59 the roof off the car or throwing it around or something. And so they put you in this really heavy duty equipment and you get to have fun for like two hours and do crazy stuff. So I thought this was really cool. There's one over here in Texas. I'm going to take the kids at some point, but I got a real kick out of this. If you're ever in Vegas or there's one in Minneapolis, there's a bunch of them. If it's something that you'd be interested in, find out the locations and try it out sometime. It sounded really, really awesome. I don't have any firsthand experience, but the idea just immediately clicked with me. So looking forward to trying this at some point. Are you going to give the disclosure about that being a sponsored segment? No. Yeah, they're not paying us, unfortunately. I wish they were. Yeah, maybe they should be. No, yeah, so this is totally unsolicited. Yeah, we don't push

Starting point is 00:30:54 any product here, so it's all legit. I'm just teasing, Jason. I've seen something similar. I mean, that one sounds cool. I've seen a similar one where I was somewhere, and I guess it makes sense because it's a relatively cheap thing to do. They go to, I assume, a thrift store. I'm just going to hope that they buy damaged or broken TVs, things that don't work anymore. And then you pay and they put you in a kind of jumpsuit, with safety goggles, and give you a bat or a sledgehammer or an axe. And it's like, we will give you a room, and in this room we're going to have dishes and a TV and a sofa, and you can just go at it, smash them. They probably don't want you smashing the walls, but I assume they could build a fake wall if they wanted you

Starting point is 00:31:36 to. Yeah, so you pay for 15 minutes to just take a sledgehammer to whatever's in this room, and it's like, wow. Apparently, you know, malls today are having difficulties, so I assume rent is pretty low. It's a very low overhead operation to grab a bunch of junk from somewhere and put it in a room and charge people to go smash it. Yeah, coincidentally, I have a friend who's a physician, or not... Is that the right word? Basically, he's a surgeon.

Starting point is 00:32:07 Sorry. I have a friend who's a surgeon and he works for a hospital in downtown Orlando. He told me that they got someone in who went to one of these things that you're talking about. And basically, he swung the bat and it smashed the window of the car and a piece of glass flew up and cut an artery in his neck or something. So I'm pretty sure, lesson learned. I'm sure they take out all the glass now. But when this first was a thing, somebody got really, really hurt. Just happened to be in Florida. Well, on that happy

Starting point is 00:32:46 note, it's time for book of the show. My book of the show is How to Get in the Hospital. Now, my book of the show is It Doesn't Have to Be Crazy at Work. So I found this book fascinating. It's from DHH, and that's what everyone calls him. But I'm going to look up the actual person's name here for posterity's sake. David Heinemeier Hansson and Jason Fried. So basically these are the two folks who made Basecamp, the co-founders of the productivity software Basecamp. So a bit of background here, Basecamp is completely bootstrapped. So there's no VCs, there's no strategic investors, there's none of that. And so, you know, that means there's zero pressure.

Starting point is 00:33:30 It's not a public company. And so all they really have to do is make enough revenue to cover their expenses in terms of staying afloat as a company. So that gives them a very different perspective. But, you know, the book is all about the different experiments that they have tried in management. You know, they tried doing unlimited vacation, and they found out it was a terrible idea. And they tried this other thing, they tried another thing. And so it kind of walks through their journey starting Basecamp. Similar to the last book I recommended, from Tony Fadell, it is rather prescriptive. I mean, they're constantly kind of telling you what to do, what not to do. And so you always have to take that with a grain of salt.

Starting point is 00:34:13 I loved the anecdotes. They would talk about things that they did and what worked and what didn't. I mean, all the things that they tried, even the bad ideas, they make total sense. I mean, it seems completely reasonable to try any of these things. And so then hearing, you know, how they succeeded or failed or what, even the ones that succeeded, you know, how they had to make changes over time was fascinating. You know, I had a really great time reading it. I did finish the book a few weeks ago. So I read it cover to cover and it's relatively like homogenous. Every chapter is kind of like a different story. But once you've read the first chapter, you kind of see the pattern. And so yeah, really good content. Recommend it.

Starting point is 00:34:58 And you've now become immediately disgruntled with your current employer. The one that really stuck with me was the unlimited vacation, because what they talked about is exactly what I would think would happen in that situation, which is, you know, some people will take two, three, four weeks of vacation just like normal, and then you have a handful, like a percent of people, who will take no vacation ever, because that's the type of mentality they have, and they get super burnt out. And so the nice thing about "you have three weeks of vacation or it disappears" is that at some point, someone's working against their own interests by not taking it. So I thought

Starting point is 00:35:45 that was pretty salient. But yeah, it was a great read. All right, awesome. I have to check that out. Yeah, I think you're right. I think a lot of this stuff seems like it would make sense, right? First time anyone hears about unlimited vacation, I'm not going to say anyone, a lot of people probably think that would be amazing. And then it's like, yeah, but you do realize if all you did was take vacation, you're going to no longer have a job. And it's very difficult, like you said. How do you deal with the, I don't want to say guilt, that's not the right word. How do you deal with some managers being like, yeah, sure, do whatever, and other managers being like, no? And so, yeah, it does become very nuanced. Even with vacation, you have how much is sort of socially acceptable to take at

Starting point is 00:36:26 once. So most businesses either give you a set amount a year and it expires each year or you get some each year and it kind of builds up and then there's like a max and you, like you said, you have to either take it or lose it. But there is still some like, I don't want to say interaction negotiation about like, yes, you've earned four weeks of vacation because you haven't taken any for the last 18 months. But like taking it all at once is still a big ask rather than taking one week at a time. And so I think there's a lot of like a lot of difficulty in navigating the politics of that.

Starting point is 00:37:01 Yeah, totally. I mean, one thing that is really controversial, but I stand by it is that I feel like in many ways it is a zero sum game. And so, you know, people will say, well, we have this, you know, we at Company X have this performance rubric. And if theoretically, you know, everyone was a greatly exceeded expectations, according to our rubric, then company X would get everyone a greatly exceeds expectation and everyone would get a three times bonus. But, you know, is it a coincidence that that never happens? Like, you know, do you really think it's a coincidence that there's always the same relative distribution every single time? And even if that did happen, people want to get promoted and the distribution of levels seems to stay the same all the time, right? So yeah, I do think that given that there are parts of it that are you know relatively zero sum or

Starting point is 00:38:05 we'll say low positive sum, then there's going to be a class of people who will just be as competitive as humanly possible, in the short term, and in the long term they'll get, as I said earlier, totally burnt out. And so, yeah, it's really difficult. It's almost like getting someone to take medicine. It's bitter, but it's really important. Mine is a fiction book, which is The Prince of Fools by Mark Lawrence, which is the first in a series called The Red Queen's War. I haven't read the follow-up books, but this first book I did read. There was the, the, oh, I should have

Starting point is 00:38:46 looked up the, Silent, no, the Sisters... oh man, why can't I think of the name? Anyways, I had recommended books by Mark Lawrence before that were in a series, and so this is another series by him. And it's a pretty interesting take. It's not science fiction, but more fantasy. Oh, definitely fantasy. Is what you're thinking of Holy Sister or Grey Sister? Yeah, that's one of them. Well, there's a series there that's all part of the same... Oh, you're right, there's Red Sister. Yeah, yeah. I can't remember the name of the series, though. That was what I was searching for. It failed me; I didn't have my notes together. So The Prince of Fools is The Red Queen's War number one. They're introducing sort of a new world, this new concept. I think Mark Lawrence does a pretty good job of this, setting a very interesting, not grim world, but kind of like

Starting point is 00:39:29 things are are sort of like oppressive and problematic and you don't kind of always understand why or what all is going on and then as the story progresses you sort of get that feeling of like learning more and understanding the way the systems work and i think that's repeated here so i'm looking forward to reading the rest of this series. But there are kind of like two main characters here that are interesting dynamic because one is sort of like a privileged spoiled person. And the other is sort of like a very gruff, like raised on hard times,

Starting point is 00:39:59 kind of like a Norse warrior. And they kind of have to like end up going at things together and helping each other, even though like normally they wouldn't be associated. And so I found it a really nice read and looking forward to reading the rest of the series. Nice. So is this a completely separate thread or did you have to read the sister books first?

Starting point is 00:40:20 No, I, well, I haven't yet even seen any references to the other one. So got it. I don't think it's in the same universe. I think that's the way of sort of saying it. Yeah, that makes sense. Oh, very cool. Is this, do you think this is a good one for people who are just getting into like, like,

Starting point is 00:40:40 I guess, dark fantasy or epic fantasy? Oh, that's a good question. I don't know. This is hard. This is like asking, and we answered this one, but it's still a hard one for me, what would you do today to get into programming? It's like, when I got into it and today are very different, so it's hard. I've been reading fantasy for a very long time, so I'm not sure that I'm... I'd have to think. I'm sorry. I don't have a good response, Jason. No problem. Good question. I just don't know how to answer it. You know, I think maybe one answer is just, you know, any book that you pick up is going to be a good first

Starting point is 00:41:17 book. So if you're out there and you're looking for a really good hobby, you know, this is a great way to get started. I would say that having a book that pulls you in, regardless, you'll sort of make it through. I saw someone pointing out that there's a weird cultural thing, and I don't know how global it is, but that people have this thing when they start a book that they feel they've got to finish. They want to finish it even if they don't like it. They won't start a new book until they finish the first book. And so you get in this catch-22 where it's hard to make it through this book, so therefore you're not reading as much, and it's just a very weird dynamic. And I was like, oh yeah, actually I do feel this way. When I start a book, I just really want to finish it. And someone was pointing

Starting point is 00:42:02 out you should give it x pages or X hours of reading. And if you don't like it, just put it down. Don't finish it. Who cares? But I will say there have been some books that I really, really liked that took a long time to get into that like, you know, a third of the book was,

Starting point is 00:42:17 and I would love to critique them and say they should have done it differently or whatever. But it was sort of like, you probably wouldn't have had the payoff if you hadn't gone through the stuff. So I don't know how I feel about that assessment, but I definitely do fall into the camp of like, I don't, I want to finish a book. I don't like putting a book down, even if I don't like it, I've only done it once or twice. Yeah. See, I'm the opposite. I have a bunch of unfinished books.

Starting point is 00:42:41 I mean, maybe, I think that there is a fault the other direction to where if I read a book and it's not really like holding my interest, then I'll just move on to the next book. And then part of it is if you check out a book from a library, then, you know, if you don't finish in time, it just goes back to the library. And that's, I think, a big cause of it. Well, if you like these books, you should definitely check out our affiliate links. So if you want to get into epic fantasy or you want to read about it doesn't have to be crazy at work, please go to our show notes and click the link that helps us out that I don't exactly know how Amazon does it. But if you click the link, Amazon rewards us through some way of doing it. So we appreciate that.

Starting point is 00:43:34 And if you don't want to go through or if you already have these books, you can support us through Patreon. Or even if you do have the books, you can always support us through Patreon. That is patreon.com slash programming throwdown. And we appreciate everyone's support. Yes, thank you to everyone. Yeah, with that, we have the tool of the show. My tool of the show is actually a piece of hardware. What? Yeah, it's the reMarkable 2. It's really interesting. Again, we're not sponsored by them or anything, so this is totally organic. But it's very simple. All it is is an e-ink display that has a touch screen, and it's somehow set up so that the pen can write to it

Starting point is 00:44:13 but the palm of your hand or your finger don't accidentally mess things up. So they have some palm rejection, I think is what it's called. You can write really well. It's extremely responsive. And then on the back end, it can OCR your notes and turn them into text and everything. I haven't, to be honest, used any of those features. But the thing I really like about it is, you know, you can have very easy random access to all of your pages. So I used to use, you know, hold it up, even though we're an audio podcast, I'll hold it up anyways, just for my own sake. I used to use, you know, these little composition books.

Starting point is 00:44:49 And so I have like a ton of notes in them, things I want to remember, things from discussions I've had. But then, you know, I'm kind of flipping pages and this is just so much easier. So I have my notes, you know, I have each page in this kind of organized. And so if I want to go to another page, I just click on the name with the pen. I'm really digging it. Yeah, I mean, it's kind of pricey. I think it was, I didn't buy a cover or anything like that, but I still think it was something like $400 or $500.

Starting point is 00:45:19 But I'm getting a lot of use out of it, constantly taking notes at work. And so I feel like I am getting my money's worth. And overall, it's just a beautiful experience. Do you have any thought of, I saw Amazon released a Kindle version, the Scribe, that sounds very similar. Do you have any, like, why one over the other? Or one wasn't available when you bought the other? Yeah, it's a good question. The Kindle Scribe was pretty new when I was looking, and so it was a tough debate. I think ultimately I knew somebody who's really happy with the reMarkable, and so that's really what pushed me in that direction. I mean, part of it is, if I'm going to read a book, it'll almost

Starting point is 00:46:02 certainly be an audiobook. So that was another factor. I think if you are a reader, the Kindle Scribe makes more sense. Like this thing, I think you can put PDFs on it, but it's not designed for that. It's not very convenient. So if you're going to read, I think Kindle Scribe might be a better option. I think it's a little bit cheaper too, but I haven't tried it, so I can't vouch for it. The reMarkable works really, really well, and definitely get the pen with the eraser.

Starting point is 00:46:30 So it costs a little bit extra, but you know, I'm constantly kind of going through and erasing different lines or replacing them and everything. Nice. I didn't realize we could do actual tools for tool of the show. My tool of the show is a game. I checked; I don't think I've recommended it before. So here it is: Slay the Spire. This is not a new game. This has

Starting point is 00:46:52 been out for many years now. But I actually got back into it. I was traveling and looking for something that was easy enough to use while I was traveling, on an airplane, that kind of stuff. And this is, I guess you'd call it a card game, but it's a card game that you wouldn't build physically. I guess they might try to do a version of it. And I guess it's a roguelike game in that, as you progress through your journey, you can add cards to your hand. You can sort of figure out what you want to focus on. You kind of build up points, and then when you die, you can improve your character in some way. And I was having a really, really difficult time. I actually kind of put it down.

Starting point is 00:47:36 I was playing it again and again, and I really sucked at it. So I did a little bit of googling for some tips and realized, okay, there's a few main characters that you kind of unlock, and actually the first one, they were saying, is much harder to feel progression with. And so they were recommending the second one. I picked it up, trying to play with the second character, and sure enough, it clicked. I was like, oh, I understand a lot more. Part of these games is playing through them enough to understand what's ahead, because you're sort of making choices, and if you don't know what's up ahead... It's sort of like what we talked about with the book.

Starting point is 00:48:12 You've got to play it enough to understand the dynamics, so that you understand you can't just pick randomly, or whatever looks cool. You've got to kind of focus on a direction, and it's hard to have a direction if you don't know what the pacing is like or how often you can heal or these kinds of things. Anyways, Slay the Spire. Probably most people have heard about it or played it, but I enjoyed it. And once you get into it, you can actually just play quickly, and you get more to the strategy and a little less the "okay, what do I do on each turn?" Looking at all your cards, you know what they all say, you can play them very quickly, and so the pace picks up. Oh, very cool.

Starting point is 00:48:49 I've been getting into Hearthstone, which is somewhat similar, although it's not episodic. Oh yeah, another oldie but a goodie. Yeah, wow. Yeah, it's good times. I don't play with other people, though, because I'm not really competitive, but the single player is actually pretty fun. The nice thing about the single player is that you know that every level is pretty much either neutral or in your favor, and you're not going to get completely unbalanced cards. I played Hearthstone for a while, exactly as you describe, and I put it down and I've not picked it up again. I don't think you can really play it offline, which to me is hard when a lot of the times I can play is when I'm on an airplane or whatever. Internet connectivity is

Starting point is 00:49:32 getting better, but I thought there was still a compulsion to get the reward loop, unlock stuff, to do at least some multiplayer gaming. Like, oh, you should go into a battle, you should at least try to win one a day or one a week or during this time frame. Yeah, you get these unlocks, and even though they probably didn't matter because, like you said, they're pre-canned for a lot of the single player, I still felt like I was missing out, and so I dropped it. That makes sense. All right, on to Perl and regular expressions. What's your history with Perl? Like, have you used it? For the purposes of scripting, I kind of got more into, in the early thing, just using bash, and it always felt like using something that came with it was a little easier. And I was at the time a C programmer. And so anyway, Perl sort of always floated on the

Starting point is 00:50:37 periphery, and, you know, ye olde Patrick would go onto the internet, Dogpile, and search for something and get a Perl one-liner, right? So rather than using sed and awk, someone would give me a Perl one-liner, and I would have no clue what it said. And I would run it. But I always sort of bounced around where people would talk about using Perl and kind of interacting with it, but I just never got heavily into the development. But yeah, that was sort of my history. How about you? It's about the same. Yeah, I think I'd mainly use Perl as a few kind of scripts to manipulate some text. The big use case that I had for Perl is

Starting point is 00:51:18 exactly where, in a huge string or a huge group of files, I want to replace foo with bar in all the files. And then you can use sed and all that. But there's certain things, and I think it's gotten a lot better, but there was definitely a period of time where Perl regular expressions could do things that sed couldn't. And so at some point you would try to do something that you read on the internet and it wouldn't work in sed, and you'd have to go to Perl. Even actually pretty recently, yeah, the Mac version of sed, the one that comes with Mac or comes with FreeBSD... Oh, this is, yeah. ...doesn't have all the features. And so you have to go and get out of all.

Starting point is 00:52:06 Yeah. Yeah. You have to get GNU sed, and then run gsed. And so, you know, Perl doesn't have that problem. Every version of Perl, you know, every reasonable version of Perl, has all the regular expression stuff. When we were researching for this show, I tried looking for evidence or proof and I couldn't find any.

Starting point is 00:52:25 But one of the things that I always sort of felt was that Perl was the earliest example of, I don't want to say dependency management, that's too strong. But having a programming language where it wasn't just what was sort of as shipped. And not even like, Python talks about the concept of batteries included and having all these things. But today we think about Python, and no one that I know develops Python without using, I guess that's PyPI through pip, or some other distribution environment where they go out and pull down modules. Node even probably more so, with the Node package manager. Yep. But going

Starting point is 00:53:06 all the way back to my earliest memories, and again, I couldn't find any defense of it, Perl had the CPAN, and the CPAN was where you could go to get the modules, the downloads. Someone else had written something that did a thing, kind of like plugins, I guess. Today we know these are very common, but that was the Comprehensive Perl Archive Network. And it was a way for you to dynamically grow the capability of your programming language and pull down stuff that added capabilities or features that wouldn't have come out of the box. And again, I already kind of mentioned coming at this from being a C programmer,

Starting point is 00:53:47 even to this day, even in C++, this is still like not how things are done. Like in C++, you don't go like, oh, let me run some command. They do exist. It's just not the regular use of most C++ programmers. But like, you can't just, I think Conan is an example. Oh, I'm going to run this Conan command and download this other thing into my project

Starting point is 00:54:04 and have all this other capability. That's just not how it's sort of done. But it was cool that Perl had this. And I feel like this has become just the way it is today. But back when Perl and the CPAN were kind of in the zeitgeist, I feel like that was a pretty unique play. Yeah, totally right. Yeah, I think that was one of the great contributions of Perl, that it kind of created this network

Starting point is 00:54:29 effect. I think that culminated with GitHub, where now you can create a project on GitHub. Everyone's looking on GitHub for projects, and it kind of creates this whole network. I was noticing the other day, on the GitHub thing, how much easier it is to find snippets of code. We had open source projects before, right? Now, I know I'm an old person, but you would go to, remember SourceForge? But SourceForge had, was that an SVN repository? I don't even remember. And you could clone it, well, that's not what it was called,

Starting point is 00:54:59 download it and then look at the source code but who who did that? Now when you search on GitHub, like the source code is just right there and you can just click, you can get some snippet, you can figure out how to do something, right? Like Stack Overflow snippet. I feel like it's just crazy how much different it is than kind of the way it used to be. Yeah, yeah, totally. Oh, the other thing that has always kind of stuck to me

Starting point is 00:55:20 about Perl is how amazon.com is built on Perl. And I would talk to people who work at Amazon and they would say, yeah, I mean, obviously the database is not written in Perl or something, but the backend and front-end web stuff is all in Perl. And that blew my mind because, you know, it's really an outlier. But yeah, I'm sure Amazon hires just tons and tons of Perl people even to this day. Is that like how Facebook runs on PHP, but it's sort of a heavily, often heavily modified, whatever? Or do they really just use bog-standard Perl? Do you know? They've got to have something, right? They've got to have

Starting point is 00:56:05 something, because at that scale, even if you make the compiler one percent faster, that probably saves them enough money to justify some engineer working on that full-time. But yeah, I think the syntax is still very Perl-like. So I mean, getting back to our use case, Perl has this thing they call perl dash pi. It's actually perl dash p, dash i, dash e; you can just string them all together. And I'm going to try and see if I get this right. I think the i was interactive. I don't remember exactly what it was. But basically you could do perl dash pi, you give it a regular expression, and then you give it a directory, and it will just execute that on the whole directory. So it's similar to sed and awk. You could do perl dash pi, substitute foo with bar on this directory, and it'll just go through and take care of that for you. And I felt like, definitely back in the day, that was extremely powerful and useful. Like imagine, let's say you have some text content, maybe it's

Starting point is 00:57:17 coming out of your top command or something, and it's showing you your processes and what CPU they're using, and you have a log of this. And you want to put it into Excel. So you want to create a comma-separated value file. So maybe it's tab separated, and you want to make it comma separated. perl -pi will just knock that out for you very quickly. And you have sed and awk to do that too. But they have the issues we talked about earlier.
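As a rough illustration of the workflow just described: in Perl, `-p` loops over each input line and prints it, `-i` edits the named files in place, and `-e` supplies the code to run, so a command along the lines of `perl -pi -e 's/\t/,/g' top.log` rewrites a tab-separated log as comma-separated. The sketch below mimics that behavior in Python. The function names, the `*.log` glob, and the assumption that the log really is tab-separated are all illustrative and not from the episode.

```python
import re
from pathlib import Path


def tabs_to_commas(text):
    """Turn tab-separated text into comma-separated text."""
    # Rough Python counterpart of the Perl substitution s/\t/,/g.
    return re.sub(r"\t", ",", text)


def substitute_in_place(directory, pattern, replacement, glob="*.log"):
    """Apply one regex substitution to every matching file under a directory.

    Returns the names of the files that were actually changed.
    """
    changed = []
    for path in sorted(Path(directory).rglob(glob)):
        before = path.read_text()
        after = re.sub(pattern, replacement, before)
        if after != before:
            path.write_text(after)  # overwrite the file, as perl -i would
            changed.append(path.name)
    return changed
```

Calling `substitute_in_place("logs", r"\t", ",")` would convert every `.log` file under a hypothetical `logs` directory. This is a naive conversion: fields that already contain commas would need quoting before the result is a well-formed CSV.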

Starting point is 00:57:43 I didn't realize it was three different commands just squished together, like a p flag and an i flag and an e flag. It just felt like a word. So today I learned. But yeah, very powerful. And I think this thing that you're pointing out, which is inextricably kind of linked in with Perl and its history, at least in my mind, but I feel like with most people I talk to, is that no one talks about Perl without talking about regular expressions. Like even these use cases, we're sort of saying,

Starting point is 00:58:16 well, not the Amazon one, but the other ones Jason is pointing out is all about like regular expressions and pattern matching and replacing and, and that kind of workflow. Yeah, totally. And so that's one, you know, Perl really popularized regular expressions. But I would say, you know, in the long run, the regular expressions is really the lasting legacy.

Starting point is 00:58:42 And it's an extremely powerful system that really every engineer should learn and get familiar with. So for those who don't know or haven't ever used a regular expression before, can you maybe like go, is that even like a possible, like the 30,000, but like, what is a regular expression? Why do we have that as a named concept? Yeah, I think that, so find replace is obviously really important. You know, a simple example, let's say you created a variable or a function name is even a better example. You created a function a while back and, you know, the function was something like replace circle with square float or something is the name of

Starting point is 00:59:26 your function. And then later on, you find, oh, you know, I actually want to use doubles, I want to replace all the floats with doubles in my program. So I want to change this function name. But I also need to change all the people who are calling this function. So if your function name is unique enough, you could just go in your text editor and say, go in all the files and replace "replace circle with square float" with "replace circle with square double." And so it goes and replaces all those strings. But sometimes, when you're in the habit of doing that, especially for refactoring, you can get busted. Like imagine if your function name was,

Starting point is 01:00:05 you know, square root. And you're like, okay, I'm just going to go through my whole hard drive and replace square root or even this project or replace square root with something else. It's just going to be total pandemonium, right? You know, but if you could do something a little bit more advanced, like maybe, you know, there's places where you have, you know, square root in a comment, or there's places where you have square root, that's a variable name. So I want to say, like, let's actually look for a square root with an open parenthesis. And so you start getting more and more complicated, you know, find replace type things. Like another example is, maybe I want to find like contiguous sections of white space and replace all of it with a comma. So if I have, you know, Patrick space, space, space, space, space, Wheeler,

Starting point is 01:00:54 I don't want to have like comma, comma, comma, comma, comma. I just want to have one comma for that whole block, right? So you can imagine like a whole bunch of scenarios like this. And regular find or replace just can't do it. And so regular expressions were designed to do kind of text processing, more advanced text processing that we can't just simply do find or replace. Yeah, so I guess if I riff on that, if you go in your text editor and you, you know, special, special key F, whatever that is on your control or, or what is it command in Mac OS and, and you find something, it is like doing substring matching, right?
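The two scenarios described above, renaming a function only where it is followed by an open parenthesis and collapsing a run of whitespace into a single comma, can be sketched with Python's `re` module. The function names here and the use of `sqrt` as a stand-in for the "square root" function are hypothetical, chosen only for illustration.

```python
import re


def rename_calls(source, old_name, new_name):
    """Rename a function only where it is being called."""
    # \b stops the match from starting inside a longer identifier,
    # and the lookahead (?=\s*\() requires an open parenthesis next.
    pattern = r"\b" + re.escape(old_name) + r"(?=\s*\()"
    return re.sub(pattern, lambda match: new_name, source)


def whitespace_to_comma(text):
    """Replace each run of spaces or tabs with a single comma."""
    return re.sub(r"[ \t]+", ",", text)
```

For example, `rename_calls("y = sqrt(x)  # sqrt of x", "sqrt", "fast_sqrt")` changes the call but leaves the word in the comment alone, and `whitespace_to_comma("Patrick     Wheeler")` gives `"Patrick,Wheeler"`. One caveat: a call that appears inside a comment or a string would still be renamed, since the pattern knows nothing about the language's syntax.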

Starting point is 01:01:36 So it's looking for like a contiguous set of characters. So one space or two spaces or A, then the letter B. What Jason is calling out is sometimes that's not powerful enough. Because as an example, if you wanted to eliminate a word in a comment and you don't know where in the comment it takes place, and let's say I'll use the example of C++, and we know that we use slash slash at the start of the comments in our project, then what I want to say is I want to find slash slash, but then I want to skip to anywhere in that line where the word foo is,

Starting point is 01:02:11 and I want to replace it with the word bar. There's no way for you to do that with, like, Control-F or Control-R for find and replace. There's no way for you to specify in the normal default behavior of most of those to say, look here at the beginning of the line, skip some stuff, and then match this. That kind of matching isn't supported. So a lot of them do have the option of going into a regular expression mode

Starting point is 01:02:35 where you can kind of write this. And that's a useful feature. But as Jason is pointing out, you need a special grammar or a special language for describing these more complex operations. Yeah, I actually want to interject like one kind of story here. When I was a junior engineer, I would get really, really frustrated when people would ask me to refactor things in pull requests. They would say like, change the name of this function or, you know, indent in this way or whatever. And the reason why I was getting so frustrated is because it took me a ton of time, you know, and especially if somebody said, so this happened to me one time where they said, oh, actually, I liked it better the other way. It had to do with, um,

Starting point is 01:03:20 there was a variable that was singular and so they wanted me to make it plural. So let's just, to make it simple, say instead of apple, they wanted me to make it apples. And then when we actually looked at how it was used, they said, oh, no, I actually want to make it apple. And so, you know, not knowing regular expressions, it was taking me hours to make this change. And I was getting really frustrated, and it made me kind of frustrated at the whole refactoring thing in general, to the point where I was kind of like, look, you know, I guess I'll put my researcher hat on, like, look, you know, the code works, the numbers are like statistically accurate, like I don't care whether the variable

Starting point is 01:04:00 name is apple or apples right but you know, I do care about the code quality. It's just that I didn't have the tools to do that efficiently. And so I care about the code quality if it's going to take me five minutes or 10 minutes. I don't care about the code quality if it's going to take me a whole day or three days, right? And so I learned regular expressions way, way too late. And so I guess it's a public service announcement uh you know learn regular expressions you'll you'll you'll end up writing much better code as a result so one of the things about regular expressions that i never got into but

Starting point is 01:04:38 i always find interesting is there is like a formalism built out around them where, you know, we're describing very common end programmer operations where you want to run something, but sort of building parsers and the language and the grammars of all this, I don't think we're going to necessarily talk about here. But like, I just point out to people that that is like a thing that you will run into or hear people talking about various ways of kind of doing these. Or if you're ever going to write a domain specific language or some sort of thing where you're not just writing sort of a text input, but actually like something that helps script or run some of your programs, you'll kind of run into a lot of the more academic approaches and regular expressions are sort of

Starting point is 01:05:25 an end-use implementation, of which there are a couple of flavors. There's like a Unix flavor and the Perl flavor of exactly what the special characters are, but it's part of this larger academic grammar for sort of actually doing these kinds of operations, like pattern recognition and matching. Yep, yep. Another thing that that regular expressions does really well, that's hard to get

Starting point is 01:05:51 anywhere else is the grouping and the back references. And so the way this works is, you know, if you're also replacing, so if it's a find and replace, you might find a certain, like, actually, Patrick, your example was pretty good. Like you might say, okay, find a line where it starts with, you know, a couple of slashes. And so I think there's, what is it, the caret sign, I think the caret. So another thing about regular expressions, there's a ton of special characters. And you know, you can escape them to treat them as regular characters. But if you don't do that, they mean something completely different than "look for a caret". So if you see the caret

Starting point is 01:06:29 symbol, it actually means this only matches if it's the beginning of the line. So you might say caret slash slash dot star. And so dot means it could be any character, star means any number of whatever came before, so dot star is any number of any characters, and then, you know, foo, right? So I'm looking for, you know, a string that starts with the slashes, so it's a comment, and then has something else, and then has foo. Now if I want to replace that with something else, right, if I want to replace foo with bar, I need to keep all that other stuff. Like if I just say take that expression and replace it with bar, it's going to destroy the slashes and the dot star. Like my whole comment's going to be destroyed leading up to foo, and I'm just ending

Starting point is 01:07:18 up with bar. And then when I run that, it's going to be a compile error because I lost the slashes, right? So you can create what's called a group. So you can say, look for something that starts with the slashes and then anything, you know, a wildcard, and call that group one. And then look for foo after that. And then I want to replace that with group one, whatever that was, and then bar. That's the back reference. And so you can do these like really intricate find-replaces using regular expressions. And that ends up being extremely powerful. I think also, like, you're talking about refactoring, but like the common thing is like, if the thing you're trying to find and replace is like a substring of some other common
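As a rough illustration of the group and back reference idea Jason walks through, here is a small Python sketch. The helper names and the sample source text are invented for this example; the pattern follows his spoken "caret slash slash dot star foo", with a non-greedy `.*?` so that the first foo on the line is the one replaced.

```python
import re

SAMPLE = "// call foo here\nint foo = 1;\n"

def naive_replace(source: str) -> str:
    # Replaces the whole match, so the slashes and the comment text
    # leading up to "foo" are lost.
    return re.sub(r"^//.*foo", "bar", source, flags=re.MULTILINE)

def grouped_replace(source: str) -> str:
    # Group 1 captures the slashes plus everything before "foo";
    # \1 in the replacement (the back reference) puts that text back.
    return re.sub(r"^(//.*?)foo\b", r"\1bar", source, flags=re.MULTILINE)

print(naive_replace(SAMPLE))    # the comment line loses its "//"
print(grouped_replace(SAMPLE))  # the comment line keeps its "//"
```

In both versions the second line, which is real code rather than a comment, is untouched because it does not start with the slashes.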

Starting point is 01:08:06 thing in your code base i think this is where the regular expressions really help as jason's mentioning like capturing that like i'm looking for this only at the start of a word so it needs to follow a white space or it needs to be before an open paren but then also sometimes if you are trying to like apple to apples is a good example. Is that idempotent? What is the phrase where it's like, if you replace Apple with apples and you run that twice, you will get a lot of occurrences where you get apples S like Apple S S because Apple is a subset of the word apples.
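The apple to apples problem Patrick raises can be sketched in Python like this. The function names are hypothetical; the point is only that a word boundary (`\b`) on both sides makes the replacement safe to run more than once.

```python
import re

def naive_pluralize(source: str) -> str:
    # Plain substring replace: "apples" contains "apple", so a second
    # run turns "apples" into "appless".
    return source.replace("apple", "apples")

def safe_pluralize(source: str) -> str:
    # \bapple\b only matches "apple" as a whole word. In "apples" the
    # trailing "s" is a word character, so there is no boundary after
    # "apple" and nothing is replaced.
    return re.sub(r"\bapple\b", "apples", source)

text = "total = apple + 1"
print(naive_pluralize(naive_pluralize(text)))
print(safe_pluralize(safe_pluralize(text)))
```

One caveat with this sketch: underscore counts as a word character, so a name like `apple_count` would not be touched by the safe version.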

Starting point is 01:08:40 And so this is where it's really powerful. But also these back references and capturing this group of text allow you to repeat stuff. So whatever you found after this, like capture it and then like put it back twice. Like I want to repeat that thing, or I want to put it as part of something else. So, uh, I wouldn't encourage you to do that, but like if you were thinking about switching from snake case, right? Is that right? Where it's the underscores, to camel case. So it's like, if it's a start of the group or an underscore, then capitalize whatever's next. Like, imagine trying to do that. Like Jason described, it would take you hours and hours to like, search your whole code base for underscores, ignore most of them. Find the ones where there are variable names.

Starting point is 01:09:25 Like do these fixes. I'm not saying doing that with regular expressions would be really easy at all. It could probably be reasonably tricky if you don't know what you're doing. But it would be at least, you know, approachable. Yep, yep. And there's definitely times where I've been able to use some clever regular expressions. And it is indistinguishable from magic.
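For a single identifier, the snake case to camel case idea might be sketched as follows in Python. This is an assumption-heavy toy: `snake_to_camel` is a made-up helper, it only handles lowercase ASCII names, and it says nothing about the harder part Patrick mentions, which is deciding which underscores in a code base belong to variable names at all.

```python
import re

def snake_to_camel(name: str) -> str:
    # (?<=[a-z0-9]) requires a letter or digit before the underscore,
    # so leading underscores such as in "_private" are left alone.
    # Group 1 captures the letter after the underscore, and the
    # callback puts it back in upper case without the underscore.
    return re.sub(
        r"(?<=[a-z0-9])_([a-z])",
        lambda match: match.group(1).upper(),
        name,
    )

print(snake_to_camel("max_retry_count"))
print(snake_to_camel("_private"))
```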

Starting point is 01:09:43 Or maybe indistinguishable from hours of labor. There was one case in particular where we were using PyTorch, and, uh, there's a build system called Bazel for building code. And so we're using Bazel and PyTorch has Bazel as well, but the Bazels didn't quite match. There were some differences. And I ended up having to rename pretty much anything that had PyTorch in the build files. I had to kind of give it a prefix

Starting point is 01:10:17 so that there wasn't a collision with this other thing. And so using some clever regex I was able to do that, and it ended up being a, you know, 65,000 line PR. But in the PR description I just put the regex. And I purposely made a PR where all it did was this regex. That way it's not like mixed in with real work, right? Yeah, and I mean, to do that by hand, right? 65,000 lines, it would have been just intractable. We would have had to find another way around it. So regexes are super powerful,

Starting point is 01:10:59 highly recommend people learn them. And one really good way to do that is through these have you seen these patrick these websites where you can put in sample text and regexes and in real time on the website it'll you know show you the matches and the references and everything oh i've seen one that was like a tutorial but this one that you're you're you've got here in the show, this is like Godbolt, but for regular expressions. Oh, this is awesome. Wait, it's like what? Oh.

Starting point is 01:11:30 No, do your thing first. It'll make sense. I'll tell you at the end. Godbolt is like you put in a C or C++ and it'll run it through a compiler and show you the internal workings of it, whether it succeeded or not. Do the decompiling.

Starting point is 01:11:44 Yeah. You can check how loop unrolling does or doesn't work given different, like, compiler versions or pragmas. Oh, that is super cool. It settles a lot of debates about what's faster. That's awesome. Yeah, I'd never heard of that. That's really interesting. Is it called, it's called Godbolt? Godbolt, yeah. Okay, I just, T-I-L. Today I learned. Okay, sorry. That's a complete digression.

Starting point is 01:12:10 So, yeah, so this regex explorer, the way it works is, you know, you put in a sample, and, you know, oh, by the way, one thing we didn't mention is you can even do multi-line matching. So, you know, even if you have something really complicated that's multi-line, you can get it done. It gets tricky, but it's doable. But you can put in some sample text. And then as you're writing your regex, this is just running in the browser. It's not copying anything to the server or anything. While you're writing your regex, it's showing you in real time, you know, here's what you're matching. Here's what the groups are, if you want to reference them in your replace string. And here's what the replace string looks like if you're doing a find-replace. So if you're doing a find, it'll also, you know, tell you whether this line would have been found or not, or this part of this line. Extremely useful. I mean,

Starting point is 01:13:02 one thing that, you know, we didn't quite talk about with regex and Perl is, or maybe I'll just say regex, it's extremely difficult to read. You know, reading a regex and trying to understand what it's doing is very, very hard. Typically if I have a regex, I'll almost always put a comment in front explaining what I was trying to do with this regex, because otherwise it's like, you know, slash bracket regular bracket abc b-z close bracket slash close bracket. It's almost like hieroglyphics. And so the nice thing is you can take any regex, put it into this website, and then, you know, put in some sample text and see what matches and start to triage. That's definitely good advice.

Starting point is 01:13:48 Don't use regexes in your code to show off. If you use them, describe them. I'm not saying don't use them. That might be a bit too far. But I read some code base the other day that was using the ternary operator, you know, where you put a boolean expression, and then if it's true you do the first thing, and if it's false you do the second thing, and it's like a way of writing one-liners. It's almost always replaceable with an if-else. And so it was very frustrating because it takes 30 seconds to read what it's

Starting point is 01:14:20 saying instead of the, you know, five seconds of reading the if-else statement, and they didn't write a comment, they didn't, you know, anything. It's like, ah, why? And the regex is the same thing. Even if you know what you're doing, if you want someone to review it, not saying what you think it does requires the person to decompile it in their head and determine if that matches what they think the code should be doing. It's an extra step. If you write what you think it's doing, then they can just check that it matches that and then proceed. And that's actually just like a much better approach.
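One way to follow the "describe your regex" advice, at least in Python, is the `re.VERBOSE` flag, which lets the pattern carry its own comments. The date pattern below is purely a hypothetical example chosen for illustration; it is not something discussed in the episode.

```python
import re

# Intent: find dates written as YYYY-MM-DD, such as 2023-04-17.
ISO_DATE = re.compile(
    r"""
    \b                # start at a word boundary so longer digit runs are skipped
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    \b                # end at a word boundary
    """,
    re.VERBOSE,
)

match = ISO_DATE.search("released on 2023-04-17 in the morning")
if match:
    print(match.group("year"), match.group("month"), match.group("day"))
```

With the intent stated up front and each piece labeled, a reviewer can check the pattern against the description instead of decoding it from scratch.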

Starting point is 01:14:51 Yep, yep. I've started even, I don't know if the one I linked has this. I'm pretty sure it does. But as you're going through, you can, or maybe there's a button somewhere. There's basically a button where you can share the state of your regex you know exploration so so if you put in some sample text and you put in

Starting point is 01:15:12 uh you know a regex you can click this button now i you know i do think clicking the button causes them to save the sample text so if your if your sample texts are like government secrets or something that might not be a good idea. But you can click this, create like a shortcut URL, and then you could put that URL in the source code comments and say, this is the intent here. That's even better because now it takes people straight to that regex and they can see the decompiled result. I've often now gotten in the habit of putting either links to something like this or links to Wolfram Alpha. You'll see this in code all the time where it's like angle equals and then it's like cosine divided by this times this and then to the power of that. And then people

Starting point is 01:16:00 are just kind of scratching their head. So for things like that, I'll put a link to Wolfram Alpha. Yeah, I think maybe the salient point here is treat a regular expression like a complicated math equation and kind of give the right context. Man, you got the life pro tips buried here down at the end of the podcast, man. Yeah, people got to stay till the end. They have to listen to, uh, yeah, stay to the end. Great tips at the end. That's right. Yeah, producer, please make a note of that. Let's make sure folks catch the end of it. Are you just talking to yourself now, Jason? Yeah. Okay. Oh man. Very cool. Well, this was awesome. Yeah, I think, you know, we covered a language that, uh, this was a user request, actually, it's a multi, multi user request. Over the years, tons of folks have asked us to talk about Perl. You know, I think you can't talk about Perl without talking about regular expressions. And so we're able to kind of cover both of those, like almost a double header here. And, you know, thanks everyone for listening, staying till the end. And we appreciate it.

Starting point is 01:17:06 And thanks again for all your support. See you next time. Music by Eric Barndoller. Programming Throwdown is distributed under a Creative Commons Attribution Sharealike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work, but you must provide attribution to Patrick and I and sharealike in kind.
