Coding Blocks - Boxing and Unboxing in .NET

Episode Date: September 28, 2013

This episode is all about boxing and unboxing. We discuss memory management, the pros (yes, there are a few!) and cons of boxing/unboxing, some of the weird side effects, and how you can avoid it with generics and ToString methods. Download the episode on iTunes or Stitcher and make sure to drop your feedback […]

Transcript
Starting point is 00:00:00 Hi, you're listening to Coding Blocks, Episode 2. Subscribe to us on iTunes, Stitcher, or your favorite podcast app. Be sure to give us reviews on iTunes. Visit us at codingblocks.net where you can find our show notes, examples, discussion, and more. Send feedback to comments at codingblocks.net. All right, so welcome to Coding Blocks. I'm Joe Zack. I'm Michael Outlaw.
Starting point is 00:00:29 And I'm Alan Underwood. And today we're discussing boxing and unboxing. Yeah, so we wanted to give you guys like a real world example. And Grand Theft Auto just came out, the new version. And let's say I wanted to ship a version or a copy of it to Michael because he's the outlaw. That joke never gets old. And depending on how I'm shipping it, whether it's UPS or FedEx or DHL or whatever,
Starting point is 00:00:53 the shipper doesn't really know what's in the box. They know how much it weighs, but they don't know it could be flammable or fragile. They're supposed to know if it's fragile. But in any case, the recipient knows how to use it and it's their responsibility to know. But the shipper just cares that they can treat it like any other box. They can put it on their planes and trains and trucks and get to the receiver and that's all they care about. And it's up to the recipient to know what to do with it. It's a little bit like our topic this week. We're going to be talking about boxing and unboxing.
Starting point is 00:01:23 I just want to give you a little background before we get started. All right, so that leads us into the stack. Just a real quick overview of the stack: it's a very small piece of memory, I believe one megabyte by default, where the variables within a method actually go. And I think Michael's going to give us a little bit of information about the type of variables that go onto the stack. Well, yeah. Okay. So we discussed it. What are the value types, right? I mean, we've said that there are different types like int, decimal, char, et cetera, structures, and enums. But what does that mean, though? Right. So value types have a fixed size, which is to say that an Int32 is always 32 bits,
Starting point is 00:02:06 whether it's negative one or 32 million, it's always 32 bits, or 4 bytes. And because that size is fixed, the compiler knows exactly how much space to allocate for value types, and the values themselves get copied around, as Alan just said. And so you can't store a reference type on the stack, because you don't know how big that thing is going to be at runtime. So instead you store a fixed-size pointer on the stack that points to the object on the heap, and that's actually one of the reasons why arrays are reference types instead of value types. Yeah, and everything that goes onto the stack is last in, first out. If you've dealt with arrays, you know of push and pop, or with any stack you have push and pop. So basically every variable that comes into your method gets added to the top of
Starting point is 00:02:51 the stack via a push, and then as the method exits, those all get popped off the top of the stack and you move out. So the thing about these value types is that when they get automatically added to the stack, you know their sizes at compile time, as Joe just said, so you know exactly how much memory needs to be allocated to that method at the time that it starts running. The compiler already knows that. And the other important thing about these stack variables is they literally only exist for the duration of the method. So when that method opens up, those variables are
Starting point is 00:03:25 allocated and put onto the stack, and then as soon as your method is over, it exits and all that memory is deallocated and available for the next method call. That's referred to as the scope, right? To put that in normal terms, the scope of the method. Right. It kind of begs the question: okay, so we've talked about value types, but what does this mean for reference types? Where are they stored? Joe kind of already threw out some information about reference types when he brought up the arrays. So where are they stored, then? Why don't we discuss that for a moment. So your reference type variables have their data stored on the heap, but there is actually a pointer stored on the stack that points over to that particular location, and when this
Starting point is 00:04:12 happens, these objects enter into what are called generations on the heap. You have generation zero, which holds short-lived heap objects that get collected frequently by the garbage collector, and generation one, which holds objects that have lived a bit longer. So anything that survives a garbage collection in generation zero gets promoted to generation one, and objects that keep surviving end up in generation two. Those older generations get collected less frequently because collecting them is more expensive.
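To make the value-type versus reference-type distinction a little more concrete, here is a minimal C# sketch, assuming a recent .NET SDK with top-level statements; the variable names are purely illustrative. It shows that an int is 4 bytes (32 bits) no matter what it holds, that assigning a value type copies the value, and that assigning an array, which is a reference type, copies only the reference, so both variables end up looking at the same data on the heap.

```csharp
using System;

// An int (System.Int32) is always 4 bytes, i.e. 32 bits, whatever value it holds.
int intSize = sizeof(int);

// Value type: assignment copies the value itself.
int a = 1;
int b = a;   // b gets its own copy
b = 2;       // changing b does not touch a

// Reference type: assignment copies only the reference (the pointer).
int[] first = { 1 };
int[] second = first;  // both variables now point at the same array on the heap
second[0] = 2;         // so the change is visible through 'first' too

Console.WriteLine($"sizeof(int) = {intSize} bytes");                  // sizeof(int) = 4 bytes
Console.WriteLine($"a = {a}, b = {b}");                               // a = 1, b = 2
Console.WriteLine($"first[0] = {first[0]}, second[0] = {second[0]}"); // first[0] = 2, second[0] = 2
Console.WriteLine($"int[] is a value type? {typeof(int[]).IsValueType}"); // int[] is a value type? False
```

Arrays are used here simply because the conversation mentions them; any class instance would behave the same way as `first` and `second`.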
Starting point is 00:04:45 And the other important thing to know about reference types is that, unlike stack variables and value types, these things can actually live long past the scope of the method that created them. So even though your method is gone and all the stack variables have been deallocated, your actual objects and values that are over on the heap still exist until garbage collection comes and picks them up and cleans them out. Okay, so we've given an overview of value types and reference types. So what, then, are the benefits of reference types, right? Why don't we discuss that for a moment. So as Alan mentioned before, for value types the size is known at compile time,
Starting point is 00:05:29 so the allocation for them can be worked out ahead of time. But reference types can be dynamic in size. We don't know what the size is going to be, except for the pointer that holds the address. The actual data, that blob, can be whatever size it needs to be on the heap. So that's one benefit of a reference type over a value type, right? And that's really important, because the default stack size is one megabyte. It's configurable, but one megabyte is nothing compared to the gigabytes that your heap could be. And another big thing to know is you would have absolutely no such thing as object-oriented programming without this heap. Right, like imagine writing a word processor with only, you know,
Starting point is 00:06:15 ints and DateTimes. Right. Okay, so here's the trick question, then. What about nullable types? Where do those fall? Take a guess: reference types or value types? Yeah, we actually had to Google this one. So basically, your nullable types are actually instances of the Nullable<T> struct, so they are actually value types. So I'll be honest,
Starting point is 00:06:40 when we originally threw this one out there as a question amongst ourselves, while we were discussing what we wanted to put together for the show, I immediately assumed... I just pictured, going back to the C days, a void pointer kind of type, like it would figure out at runtime what the type was going to be. I pictured a pointer to an object on the heap, so I was really shocked when we learned that it was a Nullable struct. Yeah, and it makes sense, because it's way more efficient.
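A quick way to check the nullable answer for yourself is a sketch like the one below (C#, top-level statements assumed; the variable names are illustrative). It confirms that int? is the Nullable<int> struct. It also shows one special rule the CLR applies when a nullable is boxed: an empty nullable becomes a null reference, and a nullable with a value is boxed as the plain underlying int, not as a boxed Nullable<int>.

```csharp
#nullable enable
using System;

// int? is shorthand for Nullable<int>, which is a struct, so it is a value type.
Console.WriteLine(typeof(int?).IsValueType);                 // True
Console.WriteLine(Nullable.GetUnderlyingType(typeof(int?))); // System.Int32

int? noValue = null;
int? hasValue = 5;

// A nullable with no value boxes to a null reference...
object? boxedNull = noValue;
Console.WriteLine(boxedNull == null);                        // True

// ...and a nullable with a value boxes to an ordinary boxed int.
object? boxedFive = hasValue;
Console.WriteLine(boxedFive?.GetType());                     // System.Int32
```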
Starting point is 00:07:09 But yeah, I think we all kind of guessed it was going to be an object. Well, I think it's one of those scenarios where too often we can fall into a pitfall where we just assume, okay, this is how I would implement it, so that's probably how it is, and then we don't bother to look into it. So it was really interesting to learn this detail when we actually did dive into it. Yeah, I think part of it is when I hear the word null, I think reference type.
Starting point is 00:07:34 And usually, well, not always, but a lot of times, those value types have some sort of default value, like zero for an int or false for a bool. Yeah, okay. So we've given a basic understanding, right, of how the memory is dealt with behind the scenes. So Joe, can you tell us what this boxing and unboxing thing is that we're trying to describe here? Sure.
Starting point is 00:08:04 So to put it plainly, boxing wraps a value type inside of a reference type and stores it on the heap. So if you remember, we said that values or value types are normally stored on the stack because they're fixed size and we get a lot of benefits from doing that. So boxing wraps that value type up inside a reference type. And unboxing almost does the reverse.
Starting point is 00:08:27 It converts one of these wrapper objects back to the value type on the stack. Okay, but wait a minute, though. So it's not exactly the reverse operation, right? You said that boxing puts a copy of it over there, but the unboxing operation is not the
Starting point is 00:08:52 exact opposite. The unboxing operation is just simply getting an address to the value type that's contained within that piece of memory there. Yeah, but that all really depends on which source you choose to believe, because, Michael, I think they were using, what was it, the bible? Yeah, the C#
Starting point is 00:09:12 for CLR, or rather CLR via C#. Yeah, so that's where they got their information. But if you actually go to MSDN, and we'll have a link to both, MSDN actually says that it's literally the opposite: you make a copy and you cast it. Okay, so there is some conflicting documentation out there, right? But in our own conversations about this, I'm kind of guessing that in a managed code scenario, what MSDN says is probably going to be right nine times out of 10, that the copy operation does follow getting the address. But if we discuss an unmanaged operation, then you could kind of see where the two could be separate, where unboxing would just simply be getting the pointer to the
Starting point is 00:10:06 value type contained within, and the actual copy would be an optional, separate step. So it kind of depends, and I'd be really curious to see behind the scenes which one is correct, right? Right, because right now it's basically definition differences. Also, Jeffrey Richter is the man, so he gets my benefit of the doubt. Is this our CLR via C# guy? It is. All right.
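Here is a minimal C# sketch of the two operations (top-level statements assumed, names illustrative). It boxes an int, shows that the box is an independent copy, unboxes it with a cast, and shows that unboxing to any type other than the exact boxed type throws an InvalidCastException, which is the same risk the casting discussion warns about. On the MSDN-versus-Richter question, one way the two descriptions fit together: the IL `unbox` instruction only produces the address of the value inside the box, while `unbox.any`, which is what the C# compiler typically emits for a cast like `(int)boxed`, does the unbox and the copy in one step. Treat that as a pointer for further reading rather than the final word.

```csharp
using System;

int number = 42;

// Boxing: the int is copied into a new object on the heap.
object boxed = number;

// The box holds its own copy, so later changes to 'number' don't affect it.
number = 100;

// Unboxing: the cast copies the value back out of the box.
int unboxed = (int)boxed;
Console.WriteLine(unboxed); // 42

// Unboxing must name the exact boxed type; even a widening conversion fails at runtime.
bool threw = false;
try
{
    _ = (long)boxed;
}
catch (InvalidCastException)
{
    threw = true;
}
Console.WriteLine(threw);   // True

// The safe route: unbox to the exact type first, then convert.
long right = (long)(int)boxed;
Console.WriteLine(right);   // 42
```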
Starting point is 00:10:38 All right, so boxing and unboxing have a bad name in the blogosphere, Stack Overflow, forums, and other wretched hives of scum and villainy, and there's a reason for that. Actually, there are seven reasons for that. Seven deadly... seven deadly sins, or reasons. Yeah, I kind of fell flat on that. Oh, do we want to try it again? Well, I didn't. No, we're good. All right, we're just going. All right.
Starting point is 00:11:08 So number one, boxed values take up more memory. So you want to expand upon that? Sure. So we said when you're talking about the stack, a 32-bit integer takes up 32 bits. But when we take that and throw it over on the heap, we actually have to add in the size of the pointer to that object, as well as the object's own overhead on the heap: a type object pointer and the sync block index, which is used for locking. So on a 32-bit machine, that 32-bit integer plus a 32-bit reference, a 32-bit type object pointer, and a 32-bit sync block index ends up being 128 bits.
Starting point is 00:11:45 On a 64-bit machine, the reference, the type pointer, and the header are each 64 bits, and the int gets padded out to 64 bits inside the object, so it's 256 bits. So four to eight times the size. That's pretty scary. So much for your light little integer. So number two, boxed values require an additional read. Okay. As compared to on the stack? Like, define that. Sure. So a value on the
Starting point is 00:12:11 stack is just right there. Boom, no pointer. We know exactly the size of it and exactly how to read it, so a 32-bit integer is just, boom, these 32 bits. But once we move that value over to the heap, we've got to fetch that boxed value by first getting the pointer, looking up that object on the heap, and then getting the value out of there. So it just means an additional read, or double the reads. So number three, your short-lived values actually clog the heap. Yeah, and that is true. If you just keep creating a ton of reference types, they will go into generation zero on the heap. And they could potentially, you know, depending on the size or the number of objects you throw on there,
Starting point is 00:13:04 fill up that heap pretty quickly. Generally speaking, a lot of things will be cleaned up fairly fast, but it is more overhead. Exactly. Also, reason number four, boxing and unboxing operations take extra CPU time. Well, you mean because of the work required to allocate the space on the heap, copy the value over, and then return the address? Right, exactly, which then kind of goes back into our clogging-the-heap example too, right? Like, if you imagine that scenario, how much longer it would take to process. Yep, just a little bit of extra overhead. And speaking of extra overhead, number five: casting. Casting isn't free, but it's generally considered
Starting point is 00:13:46 to be in that don't-worry-about-it category of performance hits, and it's probably the least of your concerns in any sort of real application. But the real problem with casting is that you get no compile-time safety checks, so you need to either check the type ahead of time or be smote by InvalidCastExceptions. I always worry about being smote. Smite me. So number six is implicit boxing, and this is one of the big ones. Yeah, this one's kind of cool.
Starting point is 00:14:23 Basically, an easy example is when you're doing a String.Format and you pass value types to it as parameters for the format string. Like, you might have an integer and a DateTime and a string in there. And if you don't explicitly call .ToString() on each of the value types, it's going to box each one into an object, and then String.Format calls ToString on that boxed object to turn it into a string. So literally, if I had {0} in my format string and I was passing in a number, let's say it wasn't even in a variable, just a hard-coded number one, you're saying that's going to get boxed unless I call ToString? That's correct. It'll get turned into an object on the heap, and then it will be converted to a string. So that's an implicit one that most people don't even know about. Sneaky. It is. All right, so number seven, finally, and this is the big one for me: they are almost completely
Starting point is 00:15:18 unnecessary. Generics solve most of these problems. Michael will be talking about that in a little bit, and we'll also be talking about some of the reasons that boxing is still around. But for now, just remember that boxing and unboxing is big, slow, ugly, sneaky, and largely unnecessary. Okay, so with that, let's get into some of the unintended consequences as they relate to interfaces, right? In our previous episode, we focused on interfaces, and as it relates to this conversation, there's still some conversation left to be had. So interfaces are defined as reference types within the language, okay? And why are they reference types?
Starting point is 00:16:03 I would have thought they'd be value types. I mean, there's not a whole lot to an interface, right? Okay, so interfaces are reference types because they can represent objects, and you're not necessarily going to know what those objects are ahead of time, until the developer has written his code and it's compiled. Okay, so we're not talking about the interface as it's written. We're talking about the object that's got that interface applied. That's implementing it. Right, that's right. Yeah.
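A small C# sketch of what that means in practice (top-level statements assumed, names illustrative). System.Int32 implements the non-generic IComparable interface, so storing an int in an IComparable variable has to box it, and the variable then refers to a separate copy on the heap. Because the non-generic CompareTo takes object, the argument passed to it gets boxed as well, which is the situation described next.

```csharp
using System;

int original = 5;

// IComparable is an interface, i.e. a reference type, so storing an int
// in an IComparable variable boxes a copy of it.
IComparable asInterface = original;
Console.WriteLine(asInterface.GetType()); // System.Int32 (a boxed int)

original = 10;                            // changes only the local copy
Console.WriteLine((int)asInterface);      // 5, the box kept the old value

// The non-generic CompareTo takes object, so the argument 3 is boxed too.
int result = asInterface.CompareTo(3);
Console.WriteLine(result > 0);            // True, because 5 > 3
```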
Starting point is 00:16:30 And so going back to the unintended consequences that are related to that, right? Interface definitions that have methods that take object, capital O, must box value types in that scenario. So, for example, let's discuss IComparable, right? Because it's an easy one to discuss. It only has one method to implement, which is the CompareTo method, and it takes in object. In that scenario, if I had a class, let's just call it MyClass, and I wanted to implement IComparable.CompareTo
Starting point is 00:17:07 and I expected that the other developers using my class would pass in an integer, then that integer would have to be boxed every time they called that CompareTo method. And furthermore, they might not even realize that they're calling it. And this goes back to Joe's point about boxing being sneaky, because let's say that they had a list of my objects and then they called an operation like List.Sort. That's going to, behind the scenes,
Starting point is 00:17:36 call my CompareTo method, and every time, that integer would be boxed. Yep. Right. Okay. So this begs the question,
Starting point is 00:17:50 how can we avoid boxing? Right. We can try to avoid boxing with generic interfaces. But before we get into that, we've mentioned generics a few times, and it might be helpful if people actually understood what a
Starting point is 00:18:05 generic is, and more specifically a generic interface. Okay, so a generic interface is an interface that's defined as taking in a T, right? So you've got the less-than, in T, greater-than, like IComparable<in T>. That's your interface type definition. When you see that, you know that it's a generic, and you're allowed to specify the type that your method is going to take in. That's a hard thing to describe in words. It's not a simple one. I feel like we as a developer community need to come up with a simple way to describe that one.
Starting point is 00:18:42 So say that again. So it's basically the class name, angle brackets, and your type? Yes, angle brackets. Well, in this case an interface name, but yeah.
Starting point is 00:18:52 Yeah. Well, I was being more generic than that, though, right? Like, if you're just looking at the documentation for the interface and you see IComparable<in T>,
Starting point is 00:19:04 then that's saying that the interface you're looking at in the documentation is a generic version of the interface. Now, when it comes time to actually implement the interface, that's when you're going to say something like MyClass : IComparable, and then inside the less-than and greater-than you put the type, so IComparable<Int32> or IComparable<string>, right, to specify which type you're going to implement IComparable for. So real quick, I think we kind of glossed over what generics really are, and it's funny that they're called generics because, in my mind, they're more specifics than generics. Yeah, it's one of those things. It's confusing.
Starting point is 00:19:46 But when you say that you have a generic, as opposed to something like a list of objects, or in the old days a Hashtable where you basically had objects, you get to define what type those objects are going to be. So when you're talking about generics and you have something like a list, you actually have a list of type MyClass. And so when he was saying something like List<MyClass>, that's basically telling you that every item that is stored in that list
Starting point is 00:20:18 is going to be of type MyClass. And that is what a generic is. So when he's talking about generic interfaces, that means you have an interface that defines the type it's going to work with. So how does that reduce boxing? Okay, well, by specifically defining the method signatures with the specific expected value types when we implement our interface, we're not going to have to cast in that int example that I gave before. We're going to have a specific CompareTo method that takes an int.
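A rough C# sketch of Michael's hypothetical MyClass, assuming it simply wraps a single int (the class, its Value property, and the string-parsing behavior are illustrative choices, not anything the framework prescribes). The non-generic IComparable.CompareTo(object) forces a caller's int to be boxed and then unboxed inside the method, while IComparable<int> takes the int directly. It also implements IComparable<string>, the multiple-types idea discussed in this segment, so a number typed in as text can be compared too.

```csharp
#nullable enable
using System;

var item = new MyClass(5);

// Non-generic path: the int 3 is boxed to object on the way in,
// and CompareTo(object) has to unbox it again.
int viaObject = ((IComparable)item).CompareTo(3);

// Generic path: the int travels as an int, so nothing is boxed.
int viaInt = ((IComparable<int>)item).CompareTo(3);

// Second generic implementation: accepts the number as text (e.g. from a prompt).
int viaString = ((IComparable<string>)item).CompareTo("7");

Console.WriteLine($"{viaObject} {viaInt} {viaString}"); // 1 1 -1

// Hypothetical class from the discussion: it just wraps one int.
public class MyClass : IComparable, IComparable<int>, IComparable<string>
{
    public int Value { get; }

    public MyClass(int value) => Value = value;

    // Non-generic IComparable: everything arrives as object.
    public int CompareTo(object? obj) => obj switch
    {
        null => 1,                                     // by convention, any instance is greater than null
        int number => Value.CompareTo(number),         // unboxes the int
        MyClass other => Value.CompareTo(other.Value), // no boxing: MyClass is already a reference type
        _ => throw new ArgumentException("Expected an int or a MyClass.", nameof(obj)),
    };

    // IComparable<int>: the int is passed by value, no box required.
    public int CompareTo(int other) => Value.CompareTo(other);

    // IComparable<string>: parses the text first; int.Parse throws on bad input.
    public int CompareTo(string? other) =>
        other is null ? 1 : Value.CompareTo(int.Parse(other));
}
```

The explicit interface casts are only there to force each overload; a plain `item.CompareTo(3)` would already pick the `int` overload and skip the box.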
Starting point is 00:20:55 And so it's not going to have to take that int and box it as an object that we then later might need to unbox, right? It's already going to be able to just pass the value of the int into it, right? So it provides some compile time type checking as well. So that's another benefit in addition to reducing the boxing. So you get the performance gain, but then you might even argue that you get the greater benefit of having the compile time checking so that as other developers are using your code,
Starting point is 00:21:26 they can be safe in how they use that. No casting. And yeah, so because it is the type that's expected in the method signature, it doesn't have to box it. So on top of that, generic interfaces can be implemented for multiple types, right? So for example,
Starting point is 00:21:46 I gave the MyClass example; I could have an IComparable implementation of CompareTo for both a string and an Int32, right? Well, with that, do you mean that your class is just overloading those methods? So because we're implementing the generic IComparable, we can overload this interface method with the types that we want to allow. Right. So it's overloading it, but it allows us to be specific as to what types we want to take in, so that other developers can use our code safely. So let me give you an example, right?
Starting point is 00:22:16 So let's say that I've got some class, and inside that class it has some memory set aside for an integer, right? But I want to be able to let my class be compared to other instances of itself by a number, okay?
Starting point is 00:22:39 And I'm saying number specifically rather than integer or any other data type, right? So I could implement one version of CompareTo that takes in an Int32. That way, programmatically, if you have some code where the value comes from some other data source and you already have that int, you can pass it in and that's fine. But if I also wanted to allow the string version of the number to be taken in, maybe from the command line, if I want to prompt the user, for example, then I can implement a version of IComparable CompareTo for the string
Starting point is 00:23:20 type and have a version that works for that, right? And so then the int comes in as its native type and isn't boxed, right? Right, for the int version. Yeah, so there are a lot of benefits to using the generic versions of the interfaces over the non-generic interfaces if you have that option. You do have to be careful, though, with some generic interfaces, because of some of the methods that you'll have to implement. If the generic interface inherits from a non-generic interface, the way IEnumerable<T> inherits from IEnumerable, you might have to implement some interface methods that still involve boxing. But as a general rule of thumb, if you have the option to support a generic interface over a non-generic one, you should go for the generic version, right? So earlier we mentioned that boxing and unboxing is almost unnecessary, but let's discuss why there still is a need for boxing and unboxing.
Starting point is 00:24:15 So, yeah, I mean, if you look at the pre-.NET 2.0 days, you basically had no choice. You had things like ArrayList and Hashtable, and in those days those only took in objects. So you were forced to box and unbox if you were using those; there was no way around it, unless you used fixed-size arrays, and that kind of killed the purpose. And then you also have the case where you inherit other people's code, whether it's third-party code or a code base that you have to work in that has a lot of this stuff in there. I mean, unless you
Starting point is 00:24:55 have the time to go in and refactor all that stuff, you're just kind of stuck with a little bit of boxing and unboxing. Also, .NET does some stuff under the covers that still uses boxing and unboxing. For example, dynamic and reflection both make use of it. Also, anywhere that you want to take params lists of both value and reference types, for example Console.WriteLine or String.Format: these both take params lists of objects, so you can pass value types like ints and DateTimes as well as reference types like classes, and anything else with a ToString, which is everything. So there are some tools that we can use if you want to actually look behind the scenes and figure out what's happening once you compile your code. There's ILDASM, which comes with your installation of Visual Studio.
Starting point is 00:25:52 There's also another utility, if you want a friendlier GUI, a free, open-source project called ILSpy. With either one of these you can open up or pass in the path to your library or executable, and it will show you the decompiled, or rather the intermediate language, version of it. And you can actually see the boxing and unboxing operations there: you can simply search for the box and unbox instructions. Right, and that's especially important because, as we mentioned, boxing and unboxing is sneaky, so it's hard to tell just by looking at the code when it's happening.
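For the cases listed a moment ago, here is a C# sketch (top-level statements assumed; Math.Max is just a convenient built-in method to reflect over). If you compile it and open the result in ILDASM or ILSpy, you should find box instructions on the ArrayList.Add calls and in the object[] argument array for the reflection call, and unbox instructions (typically unbox.any) on the casts back to int, but no boxing around the List<int> version.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Pre-.NET 2.0 style: ArrayList stores object, so every Add boxes the int
// and every read needs a cast (an unbox).
var oldStyle = new ArrayList();
for (int i = 0; i < 3; i++)
    oldStyle.Add(i);          // box
int oldSum = 0;
foreach (object o in oldStyle)
    oldSum += (int)o;         // unbox

// Generic List<int> stores the ints directly: no box, no cast.
var newStyle = new List<int>();
for (int i = 0; i < 3; i++)
    newStyle.Add(i);
int newSum = 0;
foreach (int n in newStyle)
    newSum += n;

// Reflection works in terms of object: arguments go in as object[] and the
// return value comes back as object, so value types are boxed both ways.
MethodInfo max = typeof(Math).GetMethod("Max", new[] { typeof(int), typeof(int) })!;
object? boxedResult = max.Invoke(null, new object[] { 3, 7 });
int maxValue = (int)boxedResult!;

Console.WriteLine($"{oldSum} {newSum} {maxValue}"); // 3 3 7
```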
Starting point is 00:26:30 So speaking of being sneaky, I wanted to mention a little trick we found when we were doing some research. As we mentioned, Console.WriteLine and String.Format both take in a format string and a params list of objects. One example we talked about offline was 99 bottles of beer on the wall. You would loop through in a for loop and Console.WriteLine this out, with a little token in the format string for the number 99. Unfortunately, what ends up happening is that as you count down from 99 to 0 or 1, the number ends up getting boxed that many times.
Starting point is 00:27:09 What we found is that if you call the ToString method, you avoid the boxing. That's because the ToString method actually exists on the value type itself, so you can just skip the boxing operation and pass it a string, which is what it needs anyway for the writing, and voila, no boxing. I feel like you should point out a proper programmer would count down to zero. Yeah, I couldn't remember. I've never gotten that far down, to no bottles of beer on the wall. I don't think I've ever made it out of the 90s. Counting starts at zero.
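The 99 bottles trick might look like this in C# (top-level statements assumed; counting all the way down to zero, per the proper-programmer rule; the verse text is simplified). Passing the int straight to String.Format selects the Format(string, object) overload and boxes it on every iteration. Calling ToString() first hands over a string, which is already a reference type, so there is nothing to box and the output is identical.

```csharp
using System;
using System.Collections.Generic;

var verses = new List<string>();

for (int bottles = 99; bottles >= 0; bottles--)
{
    // Boxes: 'bottles' is an int, but this overload's parameter is object.
    string boxedVerse = string.Format("{0} bottles of beer on the wall", bottles);

    // No box: int.ToString() runs on the value itself and returns a string.
    string unboxedVerse = string.Format("{0} bottles of beer on the wall", bottles.ToString());

    // Same text either way; only the boxing differs.
    if (boxedVerse != unboxedVerse)
        throw new InvalidOperationException("The two versions should always match.");

    verses.Add(unboxedVerse);
}

Console.WriteLine(verses[0]);                // 99 bottles of beer on the wall
Console.WriteLine(verses[verses.Count - 1]); // 0 bottles of beer on the wall
```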
Starting point is 00:27:42 That's true. All right, so that kind of leads us into our tips of the week. This week I've got just a little one that might be helpful for any of you out there trying to debug your programs: you can actually assign labels to your breakpoints. That way, when you go in and look at all your breakpoints, you can see some human-readable text that you wrote, instead of just some file out of nowhere at line 165. You can actually see, hey, this is the breakpoint to check this or whatever. That's beautiful.
Starting point is 00:28:14 Yeah, it's pretty nice. I wasn't even aware of it. So that's my tip of the week. All right, so my tip of the week is going to be a little bit different. Over the years, I've had a few developers stop by, see my workstation area and my window layout, and think it's kind of odd. It's like the Batcave.
Starting point is 00:28:37 So especially, and this is assuming a multi-monitor environment, within Visual Studio I like to tear off the pages and put them onto another screen. For example, on one monitor I'll have whatever primary piece of code I'm looking at, and I'll give it room to stretch vertically, top to bottom, with the Solution Explorer next to it so I can see a list of files from top to bottom. But then on the second monitor I'll have another set of files, and I might have one of them open just for reference purposes. Maybe I'm working with someone else's class or interface type and I just want to be able to make sure, okay, yeah, that's the method that
Starting point is 00:29:30 I meant to call. But then, you know, underneath that I'll also have maybe my Find Results, or if I'm actually going through the debugger, I might have the Output window and the Locals and Watch windows and whatnot, also on that second monitor. So I guess basically the tip here is: use all of your available space if you can. For me personally, I've gotten so accustomed to that type of work environment, where I can have a lot of information in front of my face without having to go searching for it, that when I,
Starting point is 00:30:06 I almost feel crippled when I go back to an interface where I have only one thing to look at. So that's my tip. And real quick,
Starting point is 00:30:18 just so you understand, when he says he tears one off, what he means is he literally clicks on the tab of an open file, or whatever section he wants to split off onto the other monitor, and drags it over there. Because when I first saw him do it, it kind of blew my mind. I was like, wait a second, you can do that without opening Visual Studio twice? Yeah, for whatever reason, that totally blew my mind. I'm spoiled. I've got dual monitors at work and at home, but it just never occurred to me to try and split up Visual Studio into
Starting point is 00:30:49 multiple windows, and it's just amazing when you see Michael working like that. Yeah, and, surprisingly to me at least, I've had several people comment over the years about the way I have my windows laid out. Like I said, I've grown so accustomed to it that I just find it's the easiest way, because I have a lot of information in front of me, especially if I'm actually walking through debugging, right? I can see all my breakpoints in one window. I can have something else open for reference purposes only. But then I can see all of the output that's coming out. I can see the locals. It's just really helpful to me.
Starting point is 00:31:27 Yeah, it's sweet. Yeah, and for my tip, I wanted to mention there's a really nice free decompiler that you can use from JetBrains. It's called dotPeek. That's spelled D-O-T-P-E-E-K, dotPeek. A buddy of mine used it to look up what Microsoft was doing in a few cases, and I'd just like to inspire you guys to try going out there and decompiling a few things, like maybe some XNA games, I don't know, and see what you see.
Starting point is 00:31:57 In your free time, should you feel up to it. I feel like there's a new game that just came out. Yeah, maybe you can decompile and recompile it. Have a little fun. All right, so with that, we will be putting up the links and the show notes on our site. You can find those at codingblocks.net slash episode 2. We've discussed value types versus reference types, and we hope you got a good understanding of that, as well as the stack versus the heap.
Starting point is 00:32:21 We've discussed boxing and unboxing, and that it's big, it's slow, it's sneaky, and at this point it's almost unnecessary. We've informed you of some unintended consequences as they relate to interfaces. But we've also reminded you that they're still used for things such as dynamic, reflection, and methods that take objects. And we remind you
Starting point is 00:32:46 to avoid them with generic collections and tricks like the ToString trick on your value types that we talked about. Yep. So that wraps up our show. Please do subscribe to us on iTunes or Stitcher or anywhere else, you know,
Starting point is 00:33:01 go up to our site and subscribe to our mailing list, and leave us a review if you're on iTunes or Stitcher or any of those. It'd be greatly appreciated, to help grow the show. And visit us at CodingBlocks.net. Any questions or comments, or anything you'd like us to clarify or go into more that might help you along, you can send to comments at CodingBlocks.net. And we'll be back soon with episode 3. Looking forward to it.
