The Standup with ThePrimeagen - How BAD is Test Driven Development?

Starting point is 00:00:00 So today's stand-up, we are going to talk about TDD. Okay, we're not going to have some topical topic. We're going to go after the big stuff, okay? TDD, is it good? Is it bad? Do you use it? TJ's just smiling, looking like an absolute champ in that freeze frame. Dude, somehow TJ's the only human I know that gets freeze-framed and it's a good picture.

Starting point is 00:00:21 Yep. Yeah, look at TJ. I can't handle this right now. I cannot handle this right now. All right, Casey, you probably have the strongest opinion potentially out of all of us. Okay. I doubt that, but okay. All right, because I also have a strong opinion, but I'm willing to...

Starting point is 00:00:42 It's probably you then. Yeah, I think Prime. You got to lead up the strong opinions here. I'll lead up with the strong one, which is, so I recently just tried another round of TDD. Prime, Prime. You have to say, hey, everyone, welcome to the stand-up. We got a great show for you tonight. I'm your host, ThePrimeagen.

Starting point is 00:01:01 With me as always are TJ. Hello. Trash. Hey, everyone. Casey Muratori. Great to be here, Prime. Thanks. Today's topic is tested.

Starting point is 00:01:12 Have you ever heard a podcast before? Thanks, everybody, for joining. You're watching the hottest podcast on the greatest software engineering topics. The Standup. Today we have with us Teej, who's in the basement fixing something. I know. Is that a Home Depot ad? Look, he's showing us his router. We have with us Trash Dev, broke toe, winning a judo competition, still living off that high,

Starting point is 00:01:40 and Casey Muratori, the actual good programmer among us. Oh, all right, I got a lot to live up to. Yeah, you got, don't worry, it's not hard to live up to, at least in this crowd. I'll kind of start us off on this one. So we've been developing a game called the Towers of Mordoria. We came up with the name from an AI, an AI generated that name for us. And effectively, I decided at one point I have this kind of really black box experience.

Starting point is 00:02:03 I have a deck of cards that I need to be able to draw cards out of. When they're played, they either need to go to the discard pile or the exhaust pile. When there are no more cards in my hand, or not enough to be able to draw cards, I need to be able to take all the discard cards (Teej is just flexing on us, touching grass right now), put all the discard cards, shuffle everything up, and then draw new cards back out, replenish the draw pile.

Starting point is 00:02:27 Right? This sounds like a TDD pinnacle problem. There's a black box, there's a few operations, I'm going to be able to do this. And so I just did this like a week and a half ago on stream. It was amazing. I did the whole thing. Designed the entire API in the test, then set up all the tests, set it all up to fail. Then it did work, or then it failed, then I made it work, then it worked, and it was great and everything was perfect. And I was like, there we go, we just did it. I just TDDed and it was a good experience. So then I took what I made and I went to integrate it into the game and realized I made an interface that's super good for testing and not actually good for the actual thing I was making, and I had to redo it.
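The deck behavior described above can be sketched in a few lines. This is only an illustration under assumptions, not the code from the stream: the `Deck` class, its method names, and the choice to reshuffle only when the draw pile runs dry are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Deck:
    """Hypothetical deck with draw, discard, and exhaust piles."""

    draw_pile: list[str]
    hand: list[str] = field(default_factory=list)
    discard_pile: list[str] = field(default_factory=list)
    exhaust_pile: list[str] = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def draw(self, count: int) -> list[str]:
        """Draw up to `count` cards, reshuffling the discard pile if needed."""
        drawn: list[str] = []
        for _ in range(count):
            if not self.draw_pile:
                self._replenish()
            if not self.draw_pile:
                break  # nothing left in draw or discard piles
            drawn.append(self.draw_pile.pop())
        self.hand.extend(drawn)
        return drawn

    def play(self, card: str, exhaust: bool = False) -> None:
        """Move a card from the hand to the discard or exhaust pile."""
        self.hand.remove(card)
        pile = self.exhaust_pile if exhaust else self.discard_pile
        pile.append(card)

    def _replenish(self) -> None:
        # Discards become the new draw pile; exhausted cards never return.
        self.draw_pile, self.discard_pile = self.discard_pile, []
        self.rng.shuffle(self.draw_pile)
```

An interface like this is easy to test in isolation, which is exactly the point being made: whether it is also the interface the game wants to call is a separate question that the tests cannot answer.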

Starting point is 00:03:06 And so it always, you know, that's kind of been a lot of my experience with TDD, is that either the thing is simple enough and a super black box enough that I can make it and both the testing interface and the usage interface are the same, or I create an interface that's really great for testing and it's actually not that great for the practical use case. And I know people talk about how it's a great way to develop your, you know, your architecture and all this kind of stuff. I've just yet to really buy it, but I will say the thing that I do like doing is, on problems that are hard to develop and have many moving constraints, I do like to co-create tests with it, meaning that I come up with an idea, how do I want to use it, I program it out, then I go create a test, which is backwards from TDD, and go, okay, this is what I want to see. Am I even correct on this? Can I get a fast iteration loop being like, no, no, no.

Starting point is 00:03:55 No, no, yes. Okay, got it. Next thing. I want to make sure it does this. No, no, no, no, yes. Okay, got it. And so I do like the fast iteration cycle of getting tests in early, but few problems feel like I can actually have that.

Starting point is 00:04:09 And I feel like with this idea that all things need to be tested immediately and tests need to be written first, you just write dumb interfaces. Everything becomes inversion of control. Everything's an interface. And you just have this abstract mess at the end of it because testing was the requirement, not, hey, use it when it's good.

Starting point is 00:04:26 So that's my TDD experience, and I feel fairly strong about that. And if you say TDD is good, I feel like I'm going to jump kick you. I can respond to that or Casey, if you have strong feelings. No, I think trash, let's go to trash take first. Yeah, no blockers on my end.

Starting point is 00:04:49 First, you couldn't, like, deter me from doing any task than to tell me I need to, like, start off by writing tests first off. Second, I kind of had a very similar experience recently building a little CLI. Hold on. Can you rewind for a second? What did you just say for your first point? I said you couldn't deter me from doing

Starting point is 00:05:06 work if you led off with me having to write tests first. I don't even know what that means. You mean you couldn't deter me any more, like, that's the most deterring thing? Man, am I stupid? You are. Can you try using a different word to describe what

Starting point is 00:05:22 you feel. Okay, let's reword this. Okay, let's reword this. Flip, please take this out. Flip, leave it. I don't like test-driven development. However, however, however, I just built something recently at work, which was very kind of similar to what Prime was saying. I ended up doing a little bit of test-driven development. But you wouldn't notice this, but I'm actually really kind of passionate about testing. So I'm going to kind of like, when I test, I would say 2% of the time I actually lead off with

Starting point is 00:05:55 test-driven development. And the rest is usually post-implementation because, one, requirements and all these things change afterwards, right? So it's like, why write tests for things that are potentially going to change and you don't really know what's going to happen? But I do write tests for very granular things that I think are going to be used all the time. So for instance, like if I wrote a parser for like my CLI, the parser's going to be reused by multiple commands, probably never going to change. So I typically will write that. And I'll also write tests for things that take multiple steps to test. So like, you know, people argue for end-to-end tests because you test just like a user. So that's kind of how I like to develop. I like to use it like I would use it for anybody

Starting point is 00:06:35 else. But if I got to do like three steps to get to the point I want to test, I'm like, screw this. I'm just going to write a test specifically for this, like, instance. And then I save time. The time I arrive at test-driven development is specifically in those scenarios. Otherwise, it's purely just like, you know, what do you call it, integration testing, stuff like that. Because you end up with stuff like, like Prime said, when you write tests, you write everything so granular and everything works so modular that you just don't even know what it's like when you actually put them together. So that's kind of like the way I approach things. That being said, like, people that kind of like live and die by test-driven development, I don't

Starting point is 00:07:14 I don't get it. Um, but I was hoping someone would be pro-TDD in this group so I could argue with them. That's all I got for right now. I'm pretty much right there with Trash on this one. My feeling on TDD is pretty much the same as my feeling on OOP, right? The problem is not if you end up with an object

Starting point is 00:07:38 or something that looks like an object in your code. In OOP, that's fine. The problem is the object-oriented part, the, like, thinking in terms of that wastes a lot of your time and leads you to bad architecture. I think TDD is the same way. I think testing is very good, but test-driven development, meaning forcing the programmer to think in terms of tests during development. Like, that is what your primary thing is, is I'm making tests to drive the development, the TD part.

Starting point is 00:08:04 That's actually the problem. I think the reason for that's pretty straightforward. And it's kind of weird because usually the people who advocate for TDD are the same people who would poo-poo any similar suggestion. Like, any suggestion that, like, oh, we should measure performance during development. They'll be like, that's ridiculous. You're wasting so much programmer time. We're like, what about the 8,000 tests that no one ever needed because we ended up deleting that system anyway? And they're like, well, you know, test-driven development is very good.

Starting point is 00:08:31 So I think, like, the problem with all of those sorts of things, any of them, is just putting emphasis on something instead of talking about it like what it really is, which is a tradeoff. If you're spending time making tests or you're orienting things towards making tests, that is necessarily programming time that isn't being spent doing something else. Like, programming is zero-sum. If you spend time doing one thing, you're not spending time doing another thing. And so, for example, if you end up spending a lot of your time making the interface revolve around tests, you weren't spending a lot of your time making the interface revolve around actual use cases, because actual use cases don't usually look like tests.

Starting point is 00:09:11 Therefore, sometimes you get the exact situation that Prime was talking about, where you do test-driven development and you end up with a completely unusable mess, which is actually very common in my experience, right? I wouldn't call it a mess. I write pretty good code. It was, it was like pretty easy. I just want to say. No, the API.

Starting point is 00:09:25 The API is a mess. Not the code, right? Like the code has been tested and it probably works. But the API is a mess, right? I mean, that's what happens. Because it wasn't designed specifically for the kinds of use cases it was meant to tackle. And so you end up with that problem. And again, it was shifting the developer's focus was all it took to make that happen.

Starting point is 00:09:45 Good APIs are hard to make in the first place, right? So if you're adding more complexity to the programmer's workflow when they're trying to make an API that's good for the actual use case, you're just going to get a worse result. Again, zero-sum. So what I would say is like testing is better as something that you do as the thing takes shape. When you're like, okay, we now have figured out like how this system works properly. We're really happy with it. The API is working nicely.

Starting point is 00:10:12 We check the performance and we think we've got like reasonable, we've got a reasonable path towards having this perform really well. Now let's start seeing, what is the overlay? Like, what are the parts we can now start to add some little substrate to, to test, that we think are going to have bugs that will be hard to find? Because that's usually the way I look at it. Like, what are the parts of this system whose bugs will be either very hard to find or really catastrophic? Like, they'll cost us a lot of money or they'll crash a user's computer or whatever, right? And then you put the tests for that, right? And that's how I've always approached it, and I think that works pretty well. And you do, you do want to do that step, but I don't think you want test-driven development. It just seems like a really bad idea. And to be honest, when I use software that comes from people who are like TDD, this organization, CD, whatever, it's full of bugs all the time. Like I'm not getting these pristine software things.

Starting point is 00:11:07 I mean, I imagine that the people at YouTube do a lot of testing and code reviews. And yet for the past eight years, the play button cannot properly represent whether something is playing or not. Eight years, right? Sometimes it's paused and it has the play thing and sometimes it's playing and it has the play thing. And sometimes both of those times it shows the pause thing. It's just, it doesn't seem like the testing is working, guys. So again, I'd say, you know, I think you have to kind of rethink how you're allocating that time. That's what I would say. Can we get TJ in here? Can we get playboy TJ? We don't know this one. Can we? Am I coming in loud and clear?

Starting point is 00:11:45 Yeah. Your voice sounds good. Your voice sounds good. I went outside like trash said. Thank you, thank you. So I would say, I'm, I think I agree mostly with what everybody is saying, although I would say for a few kinds of problems. The problem is I think people, when you say TDD, they just mean 10,000 different things. So this is like one of the problems, right? There's like people who are like TDD is only if you do red, green cycle before you write the

Starting point is 00:12:13 code, you have to write the tests. And like you can't write any code until you've written a failing test and blah, blah, blah. which like I get that maybe they're the people who invented the word so they can define it that way but like I find that not very useful for some stuff but there are like testing

Starting point is 00:12:33 or like projects where I think I test like basically everything and there's just ways that you can do it that are like more effective versus less effective like my favorite kind of testing is if you can find a way to do like snapshot or like golden testing whatever, you know, people have different names for it,

Starting point is 00:12:50 but where you can basically just write a test and then say, assert that the output is the same as it used to be, right? Like, so, and usually I've seen those called snapshot tests or expect tests. I find that those are like super... Yeah, yeah, or like people will use them in regression, but like the kinds that I've done it before, like, maybe I'm building up a complex data structure to, like, represent the state of like a bunch of users,

Starting point is 00:13:17 or maybe it's the parse tree of something, or maybe it's the type system, like whatever it is, if you can find a way to represent that in like a human readable way, usually just by like printing it in tests that I've done, you can just like compare the diff between this one and before you made the changes.

Starting point is 00:13:35 And they should be fast at least. If they're not fast, then that's a separate problem. Like your tests have to be fast for people to run them, otherwise they don't run them. Then those kinds of things tend to be, I find, like, very effective and give you a lot of confidence that you haven't, like, scuffed anything throughout the system with your changes, right? Because you're sort of like asserting

Starting point is 00:13:55 the entire state of something. And if something changes, it's really easy to update because you just say, oh, snapshot update this. Cool. Done. One second, all your tests are updated. You walk through them and change them. You get some other side benefits, too. Like in code review, you literally see the diff of the snapshot. And you can be like, um, that looks wrong. That used to say false and now says true, those things shouldn't have been flipped between those. So there are like certain problems where I've employed like nearly TDD in this snapshot style, in the sense that the snapshot always fails on the first time you run it

Starting point is 00:14:33 because there's nothing to expect. And then you accept that snapshot if it looks good or keep changing the code until it does look good. And then you like move on to the next thing. So but that, but it's kind of like a unique case. Like, I wouldn't want to snapshot test a, like, HTML page, right? It would just, like, there's no way that they're reproducible,

Starting point is 00:14:56 it feels like anyways. But, like, so there's, there's, it's only for like certain subsets of problems, which have been more effective, like, for me, because I've made a lot of dev tools in my career and stuff like that. So you want to do, like, verify that all of the references of something look like XYZ. Okay, very easy to do with snapshots.
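A minimal sketch of the snapshot (golden, expect) style described above, assuming the state can be printed in a stable, human-readable form. The helper names `render` and `check_snapshot`, the `update` flag, and the `.snap` file are hypothetical and not taken from any particular framework; real snapshot tools offer the same accept-or-diff workflow with more polish.

```python
import difflib
import json
import tempfile
from pathlib import Path


class SnapshotMismatch(AssertionError):
    """Raised when output has no accepted snapshot or differs from it."""


def render(state: dict) -> str:
    # Stable, human-readable text so the diff is reviewable in code review.
    return json.dumps(state, indent=2, sort_keys=True)


def check_snapshot(path: Path, actual: str, update: bool = False) -> None:
    """Compare `actual` to the stored snapshot, or accept it if `update`."""
    if update:
        path.write_text(actual)
        return
    if not path.exists():
        # First run always fails: there is nothing to expect yet.
        raise SnapshotMismatch(f"no snapshot at {path}; review, then update")
    expected = path.read_text()
    if expected != actual:
        diff = difflib.unified_diff(
            expected.splitlines(), actual.splitlines(),
            "snapshot", "actual", lineterm="",
        )
        raise SnapshotMismatch("\n".join(diff))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp) / "users.snap"
        state = {"users": [{"name": "ada", "active": False}]}
        check_snapshot(snap, render(state), update=True)  # accept the output
        check_snapshot(snap, render(state))  # passes while nothing changes
```

The weak point raised in the discussion lives in the `update=True` path: if nobody reviewing the diff knows what the output should be, accepting it proves nothing.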

Starting point is 00:15:16 Um, so that's kind of the main, the one thing that I wanted to throw out there is like that style of testing I find really powerful. And I've used it a lot. Like across projects you wouldn't really expect as well. I've always had the opposite experience with snapshot tests. I feel like it's okay maybe if like it's you and maybe some other people that actually know the expected output of a snapshot. My experience is it changes and they're like, oh, I'll just update it. And then the test passes again.

Starting point is 00:15:45 and then you're kind of just left with the snapshot. And you're just like, all right. So it's to the point where it's like, all right, let's just get rid of them because they're just changing every time requirements change and no one actually knows what the expected output is. One more point I want to make. Someone said something about like tests help refactoring.

Starting point is 00:16:01 I think that's true after a product is actually mature because you're just going to be, it's kind of like a snapshot test where you're just going to keep updating your tests every time something changes to the point where it's kind of pointless. So I always find when tests are like that, I usually just delete them immediately. Oh yeah. Go ahead, Trash. All right. I called you Trash. I'm like, yeah, no, you call me Trash.

Starting point is 00:16:21 Your comment filled Prime with so much joy. It translated into physical up and down motion. Yeah. I was literally watching him as I was talking and I was like, let's just see how hype we can get Prime. I wish I could see you guys. I see nothing. Yeah.

Starting point is 00:16:41 Yeah, we see you. We see you guys. Yeah. We get to you, TJ, enjoy grass. If you see a strange mushroom in the grass, do not eat it. Oh, yeah, don't eat it. Don't eat the strange mushroom, TJ. When you're mowing, there's a strange mushroom.

Starting point is 00:16:55 Guys, do I, I don't have my AI glasses. Do I eat this or not? Do I eat it or not, guys? No, don't eat the mushroom. Eat the mushroom. That's all I heard. Okay, I'm going for it. Prime, send your thing.

Starting point is 00:17:09 Go on. All right. All right, so the reason why I actually agree with Trash a lot on this one is that I've done a lot of snapshot testing and I think there probably does exist a world where this makes a lot of sense. It's just all the times I've ever done it, whenever a requirement changes, or I have to make a change that warrants changing the snapshot, I've also had to make some decent, some non-minor change also to some logic throughout the program. So when my snapshot breaks, I also don't know if I got it right. So now I'm like trying to hand eyeball

Starting point is 00:17:40 through this, this thing like one end of the line, like, okay, is this, is this really what I meant for it to be in this moment? How do I know I should change this from true to false? Like I have to re-reason about everything, and I find that true snapshot testing on big things feels very difficult, though I love it. Like, I want, like, ultimately that's what I always want.

Starting point is 00:17:59 I fall into the "hey, let's just make a golden" trap constantly. Like, that is like my default go-to. It's just like, it's golden time. And then I always get sad. As far as, like, the whole unit test breaking and mature product thing, I just find that testing generally, I do not think testing helps for refactoring unless it's a true end-to-end test. Like you use the product from the very tippy top and you only have the very bottom of the output. You have nothing else you look at.

Starting point is 00:18:27 But often I don't, like, that's really hard to test because I'll write a function that I know is very difficult to get right. And I just want to test those five things. So I'm going to make a test that's highly coupled to that piece. and if you decide to refactor it, yeah, the tests all break. Like, that's just a part of it. I wrote it so that this one thing works this one way, and I wrote a test that runs that one thing to make sure you didn't screw it up if you had to make a minor bug fix to it.

Starting point is 00:18:52 I don't try to do that whole. I just really think this whole idea of unit testing and, oh, it will prevent you from having headaches while refactoring has just been a lie from the beginning. And then what you'll hear is people do the exact same thing every time. Oh, that's because you're testing at the wrong level. It's just like, I'm just testing the things that I want to test. I don't test all the things. And what happens if you're testing on the wrong level just simply means that you broke a whole bunch of tests, but then some of your

Starting point is 00:19:17 tests didn't. And that gave you the confidence that you did the right thing. And I just don't, I just am not going to test the universe. I'm going to have a few end-to-end. I'm going to have the hard parts. And that's that. No more. I also think like, I mean, when I'm thinking back to times when I've done testing, like a lot of times when I will break out testing, it's for things that I know aren't really all that refactorable or changeable anyway. It's like, okay, I made this memory allocator, and we pretty much know what it does, right? You have these like two ways you can allocate memory from it,

Starting point is 00:19:53 and you have one way to free or something like that. And then I just have these tests to make sure that, you know, it doesn't end up in bad states or whatever it is. And that's a really useful thing, right? I've definitely done that. I do that for math libraries. Like we kind of all know what the sine of something should return. So if I'm making a math library that implements the sine function,

Starting point is 00:20:14 that's not like, I don't have to think too hard about refactoring sine because math has been around for like, you know, hundreds of years and isn't going to change tomorrow. Like, hell, bad news, guys, like React updated and now the definition, the mathematical definition of sine is different today. So I'm so happy that's not even a possibility that React does stuff with math because we need. You know, guys, there's still time. There's still time. Yeah. So, so I do, I do like them for that. And actually, like, there was a, I can think of a specific time when I was shipping game libraries when we had a really bad bug that caused a problem for one particular customer that took forever to find. And it was because it was in a thing that I had added to the library just for that customer as like a back door. Right. They, like, we need this one thing really quick. And I was like, okay. And I put it in there. And it had a bug in it. And of course, it wasn't in any of my standard testing or anything. It wasn't even really in anyone's use case, right? And, uh, and so like,

Starting point is 00:21:17 you know, that was a, to me, I was like, okay, that's a pretty, that's pretty good evidence that, like, the testing that I was doing before was good and, like, wasn't a waste of time because, like, the one time it didn't happen, it caused a significant problem, right? And so, you know, presumably I would have had more of those in actual other customers had all the stuff that normally ships in the library not been tested in the way that I thought it should have been.
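The math-library case is a good fit for tests because the contract is fixed by mathematics rather than by requirements. A small sketch, assuming a hypothetical hand-rolled `my_sin` (a Taylor series after range reduction, chosen only as a stand-in for a real implementation), with the standard library's `math.sin` used as the reference:

```python
import math


def my_sin(x: float, terms: int = 12) -> float:
    """Hypothetical sine: Taylor series after reducing x to [-pi, pi]."""
    x = math.remainder(x, 2 * math.pi)
    term = x
    total = x
    for n in range(1, terms):
        # Each term is the previous one times -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


def test_known_values() -> None:
    assert my_sin(0.0) == 0.0
    assert math.isclose(my_sin(math.pi / 2), 1.0, abs_tol=1e-9)
    assert math.isclose(my_sin(math.pi), 0.0, abs_tol=1e-9)


def test_odd_symmetry() -> None:
    for i in range(100):
        x = i * 0.1
        assert math.isclose(my_sin(-x), -my_sin(x), abs_tol=1e-12)


def test_matches_reference() -> None:
    for i in range(-100, 101):
        x = i * 0.1
        assert math.isclose(my_sin(x), math.sin(x), abs_tol=1e-9)


if __name__ == "__main__":
    test_known_values()
    test_odd_symmetry()
    test_matches_reference()
```

These tests survive any refactor of the internals, because nothing about what sine should return will change.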

Starting point is 00:21:44 And from then I was like, I was like, you know, never do a one-off add. Like if I'm going to add something for a customer, I'll add it like the right way and actually put in the testing for it or I'll just tell them I'm sorry I can't do that right now because, you know, I'm too busy or whatever. And so I do think like, yeah, when you can actually talk about this is isolated, this is what this thing does.

Starting point is 00:22:05 I'm pretty sure it's not going to change. I think the tests are great, but before then, yeah, it's just, and also, this was just kind of going off on a random anecdote, but just to circle back, the thing that Prime was saying is absolutely true, it's like refactor testing, not really. Like, that's only, you can only really get, like, a very high-level confidence from that. You can't really know, there could be all sorts of things that aren't really working because whole system testing like that is going to miss,

Starting point is 00:22:29 it's going to miss, like, weird edge cases that can overlap. And the only way you catch those with test-driven development is knowing about those edge cases and making tests to kind of target them. And you can't do that if you just refactored everything because now the edge cases are different, right? So, yeah, I think TDD people overstate the degree to which that really works. That's why I'm a huge fan of PDD, prod-driven development. I think that every time you release and you have a bunch of people that use your product, you should do a slow release. You should measure the outcome because it's just going to test all the weird things you're never going to be able to think of or set up. And if it starts, if it starts breaking, you know that you screwed up.
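The slow-release idea can be sketched as two small decisions: which users get the new version, and whether its measured outcome is worse than the old one. Everything here is an assumption made for illustration (the function names, hashing user IDs into buckets, and the error-rate comparison with a `tolerance` factor); real rollout systems measure far more than one number.

```python
import hashlib


def in_canary(user_id: str, percent: float) -> bool:
    """Deterministically place roughly `percent` of users in the new release."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100  # buckets are hundredths of a percent


def should_roll_back(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    tolerance: float = 1.5,
) -> bool:
    """Roll back if the canary error rate exceeds baseline by `tolerance`x."""
    if canary_requests == 0:
        return False  # no data yet, nothing to judge
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    # With a zero baseline rate, any canary error triggers a rollback.
    return canary_rate > baseline_rate * tolerance
```

Hashing the user ID keeps each user consistently on one side of the split, so the same person does not flip between versions from request to request.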

Starting point is 00:23:12 Previous version worked better than the new version. You can't release this new version. You got to back it up. Not P Diddy. PDD. Okay. Stop safe. There's no baby oil in this situation.

Starting point is 00:23:23 Okay. But I mean, to me, shots fired at Sonos or Sonas or whatever that company is called. I don't know what they did. I don't know what they did. I don't know what that. Yeah. So you don't, you, you, you guys are supposed to be the web dev people. What the hell are you talking about, Casey?

Starting point is 00:23:40 Sonos, the company that makes the speakers, nobody. I know, Bose. Are you kidding me? You guys are the web devs. Everything got. Beats by Dre. No, this is, this is like legendary. Like, everyone knew about this.

Starting point is 00:23:54 So there's this company called Sonos. They make like the most popular internet powered speakers in the world. Like everyone has these things. I mean, not me because I'll, there's no way in hell I'm connecting speakers to the internet. That's just a way to make them not work, which Sonos went ahead and demonstrated for me. They had a product that everyone liked.

Starting point is 00:24:11 They were an extremely popular speaker. They then did a big software update where they were like, we're rolling out our new system. Like, you know, kind of like how Google does every couple years. They're like, hey, guess what? Gmail's going to suck way harder than before today. Like you'll get five months of being able to switch between them and then you're on to the new bad one.

Starting point is 00:24:30 They did that, only there was no five months. They just basically said, like, here's the new rollout. And it was awful. Like, tons of feature regressions. It didn't really work. Customers couldn't do most of the things they used to. And it, like, tanked the company. Like, it was just, like, immediately everyone hated it. And nobody wanted to buy their stuff. And they had to do this walkback. And they couldn't roll back because they had redeployed all their servers. And I guess they didn't know how to, like, undeploy them. So they couldn't, you couldn't go back to the old app. It was, go read about this. I can't believe no one knows about this but me.

Starting point is 00:25:04 It was a massive failure. I know it. I know it. Never heard of them. I heard it. Okay. Casey, I don't know if you guys can hear me, but I heard that. Okay.

Starting point is 00:25:12 We can hear you. We can't really see you. You have very few expressions. That's fine. But I do. Okay. Your point is that hardware is definitely different than software. You know, if Twitter has a small breaking thing because they made something,

Starting point is 00:25:27 Twitter, all these major sites, they break all the time, right? I have much more forgiveness for that. This is, you know, it's a, you know, it's a small breaking thing. That's obviously different than, you know, you've got to play the field a little bit different when I say break production. If I'm building a game that only, you know, so many people play, you don't want it just crashing nonstop while you're testing features. There's probably some levels.

Starting point is 00:25:47 But if you're Twitter and all of a sudden you can't post for five minutes and only 0.001% of the people couldn't post for five minutes, it's just not going to make a huge splash and it's probably not going to deter them from actually using the product again. Well, I was actually, no, I was agreeing with you with the Sonos thing. What I was saying was they should have done exactly what you're talking about. They should have taken and deployed a small server bank for what they were doing. Oh, I see. Given it to 1%

Starting point is 00:26:10 of their users and their users would have been like, this is utter garbage and I do not want it. And they would have been like, okay, pull that on back. Nobody saw it. We're fine, right? So exactly what you're saying, I was totally agreeing with you. I'm like, Sonos, if they had done that, they would have totally saved their company literally like six months of hell. Had they

Starting point is 00:26:28 just given a slice of users this thing for a while, they would have been, you know, some significant percentage above beta test, right? Like so, like you said, you know, five, 10 percent of the people got this. They would have all been like, nope, nope, nope, nope, nope, nope. And then they would have, like, you know, been like, okay, okay. So that's all. I do want to, can I say one thing just about the testing part, though? Is that okay from before?

Starting point is 00:26:53 Maybe. Sure, that's what the podcast is about. Exactly. I like this. I like this. Go for it, TJ. My favorite project that I've ever worked on, Neovim, by the way. Can you go back to the grass? You were much more like better in the grass.

Starting point is 00:27:08 Really? Your house is blocking the tower. I got a notification that said your phone is overheating because I was in the sunshine. I was actually going to bring that up in the beginning. Like it literally said your phone is overheating. Make sure your body's the one covering. Okay. All right. Sit over. Create a shadow, TJ. So they can't. You guys are in the shadows now. You got a plank. Plank over your phone. Hey, what's up boys? Hey, what's up boys? This is like getting into like TJ profile pic territory where it's like on the dating app. Can you tell us some of your pet peeves, TJ? Do you like long walks on the beach? I'm a, I'm a long walks enjoyer. I'm really not a fan of short conversations.

Starting point is 00:27:47 I want to really find out about the real you. I don't know what else. I don't even know if any of my words are coming through. TJ is actually just describing himself right now. That's all TJ is doing. I love laughing. I love having fun. I like to make you laugh. This is just me to Prime. This is my pitch for getting on the podcast. It's true.

Starting point is 00:28:11 I'll make fun of Trash. Yo, if there's a guy named Trash on the podcast, I will make sure to make fun of him every episode. He uses one password. Are you joking? All right. All right. All right.

Starting point is 00:28:25 TJ, what's your take? My take is, my favorite project that I've worked on, and like, definitely the one with the most weird edge cases and probably users, Neovim, like, the test suite for Neovim is amazing. We have tons of, we have different, like, styles of tests. We have different ways to run them. We have all this different stuff. Tons of work went into making it.

Starting point is 00:28:48 And maybe that's just partially because Neovim is, like, trying to maintain compatibility with Vim on some stuff, like, literally forever, right? So there's some other requirements that are different. But, man, every time I did a PR and a random test broke, it was my fault. It was not the other test. It was not the snapshot tests. It was nothing else. And like random stuff. I mean, maybe it's just because Neovim's also like a really old C project. So you like change something and then a random global is destroyed. I don't know. It's like whatever. It's maybe not always the best. But like, so I don't, in my experience where people like actually

Starting point is 00:29:26 cared about it and like cared to maintain snapshots and make those happen. Like no one's even getting paid to work on Neovim. You know what I'm saying? And like the snapshots were valuable. The unit tests were valuable. The functional tests were valuable. And I used them in a ton of different PRs and across like really large refactors, adding Lua to auto commands, like doing lots of different stuff where I had to change tons and tons of the code.

Starting point is 00:29:49 I was just wrong every time until the tests went green. I think that probably goes to show that tests are as valuable as the people who wrote them. I was literally going to say that. Yeah, that's what I've been waiting to say, too, is I just wanted to say, but since I've been lagging, I didn't want to yell. I just wanted to yell skill issue to everything Trash has been saying. I would disagree, actually. I will defend Trash, actually. Casey's smarter than you.

Starting point is 00:30:16 Because here's the thing. That's true. Dang it. I suspect that some of what you're seeing, though, is not so much the skill of the people who did it, although they may have been very skilled. It's more just about when the tests were added. So at least for me, a lot of times what I'll do is I'll go, okay, if I, like, let's say I have a bug that I actually work on for a day. Like, like, this is, because most bugs are like, oh, yeah, I know what it is and you just fix it, right? But you get one and you're just like, what the hell is happening?

Starting point is 00:30:49 And you actually, it's like that once every few months where you're just like, okay, this is really weird and I'm not sure what it is. When I find those, I typically try to add a test or assertions or whatever I think would have caught it, right? Because I'm like, okay, this one's nasty and I wasn't expecting it. What can I add

Starting point is 00:31:10 that will trip this kind of thing up? And I'll add it. And I suspect that much of what gets seen in, like, mature products like Neovim is just, like, hey, if you've got some post-test discipline, like, after we made the product, when we're finding bugs, we

Starting point is 00:31:26 add the test that would have caught the bugs, those are often very good because they come out of what are the things that are likely to break when we do stuff and we put in a net there to catch them. And so those are like very good. And they're the opposite of TDD. They're like, oh, they're TDD. They're test-driven debugging. Maybe you could say or something like that. So I do think there's something to that that's a lot more.
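
The habit Casey describes, adding a test after a nasty bug so the same failure gets caught next time, might look roughly like the sketch below. The function `clamp_index` and the empty-buffer bug are hypothetical, invented purely for illustration; nothing here comes from Neovim or any project mentioned in the episode.

```python
def clamp_index(index: int, length: int) -> int:
    """Hypothetical helper: clamp a cursor index into [0, length - 1]."""
    if length <= 0:
        # The imagined day-long bug: an empty buffer used to yield -1 here.
        return 0
    return max(0, min(index, length - 1))


def test_clamp_index_empty_buffer_regression() -> None:
    # Added after the bug was found: pins the exact case that broke.
    assert clamp_index(5, 0) == 0


def test_clamp_index_normal_cases() -> None:
    # Ordinary behaviour, kept alongside the regression case.
    assert clamp_index(-3, 10) == 0
    assert clamp_index(4, 10) == 4
    assert clamp_index(99, 10) == 9


if __name__ == "__main__":
    test_clamp_index_empty_buffer_regression()
    test_clamp_index_normal_cases()
    print("regression tests passed")
```

The regression test is named after the failure it guards against, so a future break points straight back at the original bug.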

Starting point is 00:31:54 I had a couple other hot takes that I'll get to in a second, but I just wanted to leave it. I wanted to say that one based on what TJ said. I don't know if that's true about Neovim, but I might suspect it, right? It definitely is. The thing, though, that, like, I was more pushing back. I was like, I find the snapshot tests in Neovim invaluable.

Starting point is 00:32:10 Like, they're amazing. I find a bunch of these good end-to-end tests, truly end-to-end, like literally it runs a test and does user, I don't know if I just dropped. It does, like, user input and then asserts the outputs, right? I found those super valuable in Neovim. Probably just dropped. I also, no, you didn't drop. We heard your point.
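
As a rough sketch of the end-to-end style TJ describes (feed in user input, then assert on what the screen shows), here is a toy version. The `render_screen` function and its key names are hypothetical stand-ins, not Neovim's actual test harness, and the snapshot is stored inline rather than in a file to keep the example short.

```python
def render_screen(keys: list[str]) -> str:
    """Toy 'editor': apply keystrokes to a buffer and render numbered lines."""
    buffer = ""
    for key in keys:
        if key == "<BS>":      # backspace removes the last character
            buffer = buffer[:-1]
        elif key == "<CR>":    # enter starts a new line
            buffer += "\n"
        else:                  # anything else is typed literally
            buffer += key
    lines = buffer.split("\n")
    return "\n".join(f"{n}| {line}" for n, line in enumerate(lines, start=1))


# The stored snapshot: what the screen is expected to look like.
EXPECTED_SNAPSHOT = "1| hi\n2| yo"


def test_typing_snapshot() -> None:
    screen = render_screen(["h", "i", "x", "<BS>", "<CR>", "y", "o"])
    assert screen == EXPECTED_SNAPSHOT


if __name__ == "__main__":
    test_typing_snapshot()
    print("snapshot matches")
```

The test only knows about keystrokes in and rendered text out, so the buffer internals can be rewritten freely without touching the snapshot.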

Starting point is 00:32:33 I think also to put on top of it is think about the horizon of Neovim versus you starting up a new dev tool at work. And maybe there's something to be said about when are the goldens introduced? During the exploration phase? Or like, Neovim's just not going to be changing. My assumption is that a lot of Neovim doesn't change super hard. It's not like, hey, guys. Let's do a whole new data model today.

Starting point is 00:32:57 Wrong. That's all the ones. That's all the ones that everything you guys have pooed on that I found them the most valuable. When I changed literally the entire data model and how all auto commands work all inside of Neovim, it was the most useful to have the snapshot tests then. I mean, your testing is going to depend greatly on what you're building 100%. Sure. Yeah, yeah. But I'm just saying, like, we were fundamentally changing the data models.

Starting point is 00:33:21 Yeah, I'm just saying. That's what happened. I thought it was a pretty smart statement. Come on! You're really smart, Trash. I wouldn't poo that. I wouldn't poo that at all. You've noticed I have not pooed snapshot testing.

Starting point is 00:33:37 True. Good job, Casey. I love you. You're smarter than Trash. I love that I'm not getting dunked on. Can we start rearranging the windows based on who TJ thinks is smartest at any given time? And I'm guessing that it has a very high correlation

Starting point is 00:33:53 to who is agreeing with him. It's like I'm getting that sense. No, I'll just always put Trash at the end. It's fine, Casey. You don't have to worry. Today I'm going to tweet that Casey protected me from TJ. Yes. This is a pinnacle moment of my career.

Starting point is 00:34:07 I can't even see you guys' videos. So I have no idea what's happening. I'm talking to a black screen right now. Yeah, yeah. I've been just cracking up at your feed this whole time. I just can't handle it. It's pretty good. It's very good.

Starting point is 00:34:21 Like, the result is top notch of the feed. Unfortunately, I think this will be a stream-only feature. I don't know if T.J. will actually have this effect on the video, because Riverside will be uploading it. That's true. Yeah, that's true. He'll still be in weird, like,

Starting point is 00:34:35 his profile, his dating site profile pic phase, but it will be much smoother, yeah. So, so anyway, I wanted to add a hot, I would agree that snapshot testing can be good. It just depends on whether those snapshots are going to be stable. I mean, that's the bottom line. I would imagine what Prime and Trash were talking about is if you're trying to do snapshot testing at a level below

Starting point is 00:34:59 something that's actually consistent across the refactor, obviously that's not going to work, right? And so if your snapshot testing is like, someone mentioned a parse tree, well, if it's a parse of something that could be parsed different ways depending on how you chose to implement that subsystem, those snapshots are useless. If the parse always has to be the same because it's like parsing for the C language spec,

Starting point is 00:35:21 and it can't change because, like, you know, it's hard-coded. So if the thing's in C99 mode, then it had better produce that exact parse tree, that's a great snapshot test, right? And so I think the difference boils down to those things. It has to be about something that actually is, you know, not going to change. Idempotent to the release of the project. Otherwise, it's a limited time test. But, you know, that's just how it is.
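
Casey's distinction between snapshots of stable output and snapshots of implementation detail can be sketched with a toy example. Both evaluator functions below are hypothetical: they compute the same result for `"1+2+3"`, but their internal traces differ, so only the snapshot taken at the stable level survives the refactor from `evaluate_v1` to `evaluate_v2`.

```python
def evaluate_v1(expr: str) -> tuple[int, list[str]]:
    """Original implementation: folds left to right, logging each step."""
    total = 0
    trace = []
    for term in expr.split("+"):
        total += int(term)
        trace.append(f"add {term} -> {total}")
    return total, trace


def evaluate_v2(expr: str) -> tuple[int, list[str]]:
    """Refactored implementation: same answer, different internals."""
    terms = [int(term) for term in expr.split("+")]
    return sum(terms), [f"sum {terms}"]


STABLE_SNAPSHOT = 6  # the externally visible result
INTERNAL_SNAPSHOT = ["add 1 -> 1", "add 2 -> 3", "add 3 -> 6"]  # v1's internals

for evaluate in (evaluate_v1, evaluate_v2):
    result, trace = evaluate("1+2+3")
    assert result == STABLE_SNAPSHOT  # holds before and after the refactor
    print(evaluate.__name__, "internal snapshot matches:", trace == INTERNAL_SNAPSHOT)
```

The result-level snapshot passes for both versions, while the trace-level snapshot matches only the original implementation and would have to be regenerated after the refactor.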

Starting point is 00:35:45 Probably also the scope of the breaking, right? If only 10% of your golden is invalid because you made this change, that's a lot different than 100% of your goldens are invalid and you have to like rework all of them because of big kind of refactors. That was my last experience is that I had to do a bunch of testing against this games reporting thing at Netflix and to be able to do this auto dock and to be able to do it, I had to like create the data model of the incoming events. And then they are changing the incoming events like once a week because it was still an exploration phase.

Starting point is 00:36:15 So it's just like, well, not only is my input incorrect, now my output's incorrect. So I have to change both sides. So I'm just like, I guess I'm just redoing all the things again. Here we go. And so then it just became super hard to know if I was actually right or if I was breaking things. So I think at the outset I made it pretty clear, I don't like test-driven development as an idea. I don't like focusing the developer's attention on tests. I think they should understand tests and know how to deploy them smartly, but I don't think, do you right?

Starting point is 00:36:40 I do want to say there is a place where I do recommend literal test-driven development. And it is as follows. That is if the programmer is finding that they are low productivity on a particular task, and that task does not provide feedback. So, for example, if you have to implement some low-level feature component that has no visual output, no auditory output, no anything, it's just like an abstract thing that happens inside the computer, right? then sometimes programmers, like, they get bored working on these sorts of things, but a lot of programmers will, like, focus on things like numbers that come out of a thing.

Starting point is 00:37:30 Like, if a thing, if you run a thing and the number, like, you know, 37% pops out, suddenly the programmer's, like, super motivated to get it up to, like, 45% and 50%, and, right? So sometimes it's like, oh, okay, if you can turn this problem, if you can make a test that turns whatever you're working on into something with a concrete output that you feel some reward about when it goes up or improves, that can be a valuable use of test-driven development. And we do this in performance a lot. We'll make a little test around something that outputs things like how many cache misses did it have or how many cycles did it take to complete or whatever.
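
A minimal sketch of the kind of "little test that outputs a number" Casey mentions is shown below. Cache misses and cycle counts need hardware counters, so this version substitutes wall-clock nanoseconds from `time.perf_counter_ns`, which is a swapped-in metric rather than what the episode describes. The workload `sum_of_squares` is a hypothetical stand-in for whatever code is being tuned.

```python
import time
from typing import Callable


def best_time_ns(work: Callable[[], object], repeats: int = 5) -> int:
    """Run `work` several times and return the fastest run in nanoseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        work()
        timings.append(time.perf_counter_ns() - start)
    return min(timings)


def sum_of_squares(n: int) -> int:
    """Hypothetical workload standing in for the code under optimization."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    elapsed = best_time_ns(lambda: sum_of_squares(100_000))
    # The single concrete number to try to drive down.
    print(f"sum_of_squares(100_000): {elapsed} ns (best of 5)")
```

Taking the best of several runs reduces noise from the rest of the system, which makes the number easier to compare between changes.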

Starting point is 00:38:11 And it's like, ooh, now I'm like trying to get that number down, and that's a lot more fun than I'm just trying to make this system faster or whatever. So, like, a lot of times you can make it so that those things go together, and then you get the test or the good statistics gathering or whatever it is that you actually wanted anyway, and you get this motivation for the programmer, and they all kind of work together in this nice way. So I just wanted to put that out there as one place where I think it does kind of work to drive development by a test of some kind. First thing that comes to my mind is like code coverage reports. And you see that number really low,

Starting point is 00:38:48 and I have the opposite reaction where I just don't care and I want to quit. I don't think I've ever seen, like, a low code coverage report and I was like, you know what? Today we're making that number higher. I'm kind of just like, guys, we suck, and I'm not going to help us get better. Just no way.

Starting point is 00:39:07 All right. I mean, to be fair, if I go to a place and it's like 100% code coverage, I also go, uh-oh, right? Like, I'm also kind of worried about the 100% code coverage as well. I'm not going to say the company I was at.

Starting point is 00:39:20 I've been at many. Netflix. And there was a team that required 100% test coverage to the point where people were just like writing the most worthless test just to get like this like PR bot to just shut up. And to make it even like funnier, there was like bugs every day like in production. I was just like, what's happening here?
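
The situation Trash describes, 100% coverage alongside daily production bugs, can be illustrated with a deliberately buggy toy function. Everything below is hypothetical: `apply_discount` has an intentional bug, one check executes every line without asserting anything, and only the second check actually verifies behaviour.

```python
def apply_discount(price: int, percent: int) -> float:
    """Hypothetical function with a deliberate bug for illustration."""
    # Bug: the discount is added instead of subtracted.
    return price + price * percent / 100


def coverage_only_check() -> None:
    """Executes every line of apply_discount but asserts nothing."""
    apply_discount(100, 10)


def behaviour_check() -> None:
    """Asserts on the result, so the bug is actually detected."""
    assert apply_discount(100, 10) == 90


if __name__ == "__main__":
    coverage_only_check()
    print("coverage-only check: passed")
    try:
        behaviour_check()
        print("behaviour check: passed")
    except AssertionError:
        print("behaviour check: failed, bug caught")
```

A line-coverage tool would report the function as fully covered by the first check alone, even though that check can never fail.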

Starting point is 00:39:44 I don't know if anyone's ever experienced like a hundred percent like test coverage quota by their team. Yes, I did it with Falcor when I was at Netflix. 100% code coverage, bugs every day of my life. Oh, for Falcor, the one line production takedown? Wait, did you say Falcor? Uh-huh. Like the luck dragon from The NeverEnding Story? Yeah.

Starting point is 00:40:10 What? Why was it called that? You don't know what Falcor is, though? I do know what Falcor is. Luck dragon from The NeverEnding Story. Yes, it was named after the luck dragon from The NeverEnding Story. Tell the story, Prime. Tell the story. Tell it all so Casey can hear it and then we'll be done.

Starting point is 00:40:24 All right, I'll do a quick one. Casey, Casey, so how it works is that you would provide effectively an array with keys in it as the path into your data model in the back end. So say you want videos, some ID, title. It would return to you the title in that kind of as a tree, videos, or as a graph, or maybe as a forest is a better way to say it. Videos, ID, title. So if you wanted, say, a list, I want list zero through ten, title. It would go list zero through ten, which would be IDs, which would then link to videos by ID, title. So it's kind of like a graph-like data structure that makes it so that if you make duplicate requests,

Starting point is 00:40:57 it's able to go, oh, okay, I already have videos, whatever. I already have list item zero. I can kind of not request so much from the back end. The problem was, what happens if list seven doesn't exist? You don't know. So we have to return something back to you that says it does not exist. So we return something called like a, it's a boxed value. It's an atom of undefined. It lets you know, hey, there is a value there and the value there is nothing. You don't need to

Starting point is 00:41:22 request it again. It's not going to be something. So we created something called materialize. On the back end, that means you need to materialize missing values. Well, what happens if you request lists zero through 10,000, zero through 10,000, zero through 10,000, title? Well, you just created a gigantic number that we're going to materialize everything for you. And so with one simple line of code, a while loop in bash, you can DoS, or at one point, 2016, you could have DoSed and taken down Netflix permanently. There was no rolling back. It was in production for like two years. You could do it for all endpoints, for all devices, for everything, and just, it would never run again. And that's it. It was a simple while loop, and I wrote a lot of beautiful

Starting point is 00:42:03 code about it, and I did a lot of beautiful stuff with it, and it was very nice code. And then I also discovered it and reported it. And then later on, it got named the Repulsive Grizzly attack. Repulsive Grizzly? Yeah. So has no one at Netflix watched The NeverEnding Story? Shouldn't it be called the Gmork or something? Like it's a movie, it's probably on Netflix.
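
The blow-up Prime describes, where one short request with three ranges forces the back end to materialize an enormous number of values, comes down to multiplying range sizes. The sketch below is not Falcor's API: it models a path as a plain list whose keys are either names or `(start, end)` tuples, and it assumes ranges are inclusive on both ends.

```python
from math import prod
from typing import Union

# A key is either a literal name/index or an inclusive (start, end) range.
Key = Union[str, int, tuple[int, int]]


def expanded_path_count(path: list[Key]) -> int:
    """Count the concrete paths that a single ranged path expands to."""
    return prod(
        key[1] - key[0] + 1 if isinstance(key, tuple) else 1
        for key in path
    )


small = ["lists", (0, 10), "title"]
huge = ["lists", (0, 10_000), (0, 10_000), (0, 10_000), "title"]

print(expanded_path_count(small))  # 11
print(expanded_path_count(huge))   # 1000300030001
```

Under these assumptions a request of a few dozen characters expands to roughly a trillion values to materialize, which is why it could be driven from a simple loop.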

Starting point is 00:42:25 Okay, no, no, this wasn't some 9,000. It was a completely separate team that when they found out about it, they had to do a bunch of work, and it ended up being, you know, a fix on the Zuul gateway and all this kind of stuff. Because there are no grizzly bears in The NeverEnding Story. I didn't read the original book, because I think it's in

Starting point is 00:42:41 German or something? I don't know. Yeah. Yeah, but the nice news is I originally didn't come up with the materialized idea. But I did do a lot of refactoring that while refactoring and recreating everything, I stopped and went, wait a second. Something seems funny here. So then I trashed staging for a day where no one could use staging for a day. And then that's when I realized we made a mistake. When I say we, I mean me. And that's it. He's off.

Starting point is 00:43:12 It's like, this is the guy told my story, and I'm not participating in this podcast anymore. The transitions at the end of this podcast are just getting worse and worse. Is that stream? He walks off frame. What's happening? All right.

Starting point is 00:43:27 Is stream done? Anything you want to add to this at all? He's framed on. Are we still live? Yeah. We're still live? Dude, find a friend that smiles at you like TJ smiles at us. I think today's big takeaway is don't bring TrashDev into a project that needs help. He'll just quit.

Starting point is 00:43:43 Yeah. No, actually, I will help your product get better. Bye. He'll drive your code coverage metrics down to zero. Yeah. Code coverage is a lie anyways. Are we live? And then quit.

Starting point is 00:43:56 Oh, TJ's back again. All right. Well, what is happening? Don't know. See you later. No blockers on my end. I think we had a really productive meeting today, everyone. Everybody, enjoy your weekend.

Starting point is 00:44:17 Next quarter, it's going to be big. It's going to be the biggest quarter of our lifetime. Absolutely. You've got to hit our OKRs. Yep. Well, okay, then. I'm a gentleman, another professional podcast from the stand-up. All right.

Starting point is 00:44:34 Ending it. All right, hold on one second. I'm going to end the stream. Thanks, thanks everybody for being around here. The TJ is back. Big, whatever TJ is.


The gang discusses Test Driven Development and laughs a lot. As usual ;) Don’t forget, ssh terminal.shop...
