Coding Blocks - Databases the SQL [see-kwuhl]

Episode Date: July 25, 2014

Welcome back for part 2 of the podcast about databases.  In this half, we discuss several of the things we believe that developers should know about databases.  From joins to unions, group by’s an...d indexing, we try to touch on a lot of the items that most developers should at least be familiar with when […]

Transcript
Discussion (0)
Starting point is 00:00:00 you're listening to coding blocks episode 14 subscribe to us and leave us a review on itunes stitcher and more using your favorite podcasting app and visit us at codingblocks.net where you can find show notes examples discussion and more and send your feedback questions and rants to comments at codingblocks.net and follow us on twitter at codingblocks or head over to www.codingblocks.net and you can find our other social links at the top of the page there. And with that, welcome to CodingBlocks. I'm Alan Underwood. I'm Joe Zach. And I'm Michael Outlaw. And welcome to part two. So dear listener, as you heard in the last episode, we ran a little long. I'm sorry, I spoke too much. So we decided to cut that up into two episodes, and this is the continuation of databases.
Starting point is 00:00:51 But before we get to that, we do have a little bit of news. So Michael and I actually went to a really cool event the other day, Build Guild in Atlanta. How would you describe that, Michael? It was just an opportunity to meet up with other technology-minded people and just have conversations around different technology uses. At a place with beer, video games, and amazing, amazing french fries. Yeah, okay, well, we could say that part too. But what was on the french fries? There was something special about these french fries. Yeah, I don't know, there say that part too. But what was on the french fries? There was something special about these french fries.
Starting point is 00:01:26 Yeah, I don't know. There was like some uber hot sauce stuff on there. It's got to be sriracha. There's got to be some sort of spicy mayonnaise. And fish. There was seaweed and then, yeah, the fish skin stuff. There was stuff definitely that didn't belong. That doesn't sound tasty to me.
Starting point is 00:01:38 It was like sushi on a french fry that was like really hot and spicy. And it looked like human skin. Yeah. But, yeah, I don't know. It was a really hot and spicy and it looked like human skin yeah but yeah it was really it was really cool meetup though i mean it it was originally well at least from like what i'd read about it i had thought that it was gonna be like just strictly web-minded individual you know 100 but you know there are people that were there that were you know uh were from a server admin background, you know, uh, and then there were some that were from, uh, like a PM role type, uh, some from, you know, just, uh, desk, you know,
Starting point is 00:02:14 desktop application, uh, type development. So there, there were, you know, you had, you know, all bases covered there. There was even a recruiter who didn't try to get anyone's name. Oh, I forgot about the recruiter. Really? Yeah, they were just really nice and normal. They just wanted to hang out and drink beer. He didn't ask for any cards. He didn't pass out any cards.
Starting point is 00:02:33 I was like, wow, this is the first. That's really bizarre. I don't know what that means. I think he was just there to hang out. So what was the predominant topic, though, of the night? Was there any one thing? I would say tales of woe oh really there's a lot of commiseration yeah yeah in what regard like you know people talked about troubles they had with you know like maybe mongo or um different pain points if whatever whatever technology stack you're using
Starting point is 00:03:01 name it yeah those pain points were discussed at one point so that that's pretty interesting because, I mean, that's fairly consistent. Then, like, there's nobody that's just, you know, basking in the glow of any one particular technology or no? Definitely not. Okay. No, and, like, all the tech, there was a pretty widespread, you know, in regards to, like, the different technologies used, different technology stacks, languages, whatnot. You know, there was pretty whatnot uh you know there is pretty much you know good representation there yeah it's kind of like when you go to a meetup and you end up like hanging out afterwards and talking a little bit or sometimes you go get drinks afterwards
Starting point is 00:03:33 it was basically that but the whole time so it's just really cool like talking with people who have similar interests yeah so it wasn't it wasn't like your you know your your normal meetup where you know you might go to to listen to a presentation on something or where you're trying to learn something. This was all about just more of the networking part of it. Tech talk, right? Yeah, just meeting with other people, networking. I hated it. I missed it.
Starting point is 00:03:55 I was supposed to go, and I wasn't able to. It was a good time. Yep. Awesome. Yep. Also, I just read this today. I haven't actually tried this yet, but we mentioned.peak a while back. It's a really nice free decompiler. And what I just read today is. I haven't actually tried this yet, but we mentioned.peak a while back. It's a really nice free decompiler. And what I just read today is that it can function now as a symbol server with version 1.2, which means that you can take your DLL that you don't have the source code for,
Starting point is 00:04:15 so maybe, say, some sort of game that you can get on Steam, and you can decompile it to get the code. And now, instead of recompiling and trying to run that, you can just go ahead and run the.peek as a SQL, as a server symbol. So you can actually debug that application while it's running with the source code that you just obtained from the compiled code. That's just amazing. That is pretty amazing. And kind of scary. Yep. Your code is not safe.
Starting point is 00:04:41 Yeah. And I imagine like, you know, 64-bit applications are a little bit weird but 32 you can change stuff you can change constants you can tweak stuff see what happens that's awesome yeah uh another thing i want to bring up that's actually about a month old now but in case anybody had not heard of this there is a site called code, and I believe it was www.codespaces.com, and it was like your GitHub or Bitbucket or anything like that, but it was a paid source code repository. So the reason I want to bring this up is not necessarily because it's new news,
Starting point is 00:05:17 but because the way, apparently, that they were taken down is they let somebody into their AWS admin, not knowingly, but they didn't have it locked it down. They didn't have it locked down with multi-factor authentication or anything like that. So somebody was actually able to gain access to it and basically hit the delete button on everything they had. Well, I think, I think there were a couple of takeaways from, from their experience. One of them had to do with, uh, you know, security.
Starting point is 00:05:43 So strong passwords, multi-factor authentication everywhere you can. I know I'm a fan. But also, too, from the developer point of view, another personal takeaway from that, though, was it's one thing to have a backup of your code, right? But you also got to think about like, well, what if that backup were to go down? So, you know, there is something to be said about having, you know, some resiliency there across your backups.
Starting point is 00:06:15 Yeah, because from what I understand, they were running completely on AWS. So they had backups, but their backups were from within their same AWS console. Yeah, but I mean like you, the developer, though. If Codespaces was your only place. Oh, yes, yes. Then, you know, so they say their rule about backups is to have one on-site and one off-site, right?
Starting point is 00:06:36 So, you know, if that's your off-site backup, as long as you had another one on-site, that might not be such a problem but if you are only relying on a third party party like um you know whether it be uh five uh was it frog creek um or or code spaces or somebody like that right like if you're relying 100 on them uh you know or god forbid if something were to happen to like a github well they're probably you know they're pretty big but you know what i mean there and you are responsible for your data ultimately so no matter who it is how big they but you know what I mean there. And you are responsible for your data ultimately. So no matter who it is, how big they are, you know, it's, it's your data. If it means something to you, you need to protect it. Yeah. And so that's interesting. I was thinking about from the opposite end,
Starting point is 00:07:14 if you're the person creating a service like that and you're trying to make backups, if you're doing it in AWS, you should probably have multiple different accounts all with MFA, right? So that you're backing up across different zones and different accounts or back up into another solution, whether it be an onsite, you know, set of drives or something that like you really need to think about whatever you're doing, making sure that you're resilient, especially if you have paying customers. I mean, it's dangerous, but security is definitely a huge thing.
Starting point is 00:07:42 So I do. I think I said frog. I meant fog. F yeah uh so so that was just something i want to bring up it's not necessarily like i said new news but it's still news that people are feeling so yeah you know and also actually just for just remember this i went to a really great uh owasp meetup that we have here in atlanta and they were actually talking about i didn't even know this existed, but there's an OWASP mobile top 10, which same thing like web applications that we talked about in episode four, except it's actually the top 10
Starting point is 00:08:13 vulnerabilities found on mobile apps. And there was a new number 10 this year, which was mobile code hardening. So there are actually a number of exploits and things you can do if you can decompile the code and put it up in like a free app store so um if you you know partake in uh of a like a jailbroken phone and you go to free um app stores because you want to get a hacked game or a free game or you know something else then there's a lot of bad stuff that you can do and you're going to see just a really great presentation there's a lot of bad stuff that you can do. And I got to see just a really great presentation.
Starting point is 00:08:47 I'll have a link to that so you can kind of see if that's something you're interested in. So I didn't get to attend, but I'm curious, though. Out of the top flaws that talked about, they might not have been specific. Maybe they weren't specific to operating systems. But I was curious as to, like, were the majority of them specific to any,, but I was curious as to like, were the majority of them specific to any,
Starting point is 00:09:06 like Android or iOS versus, you know? So the actual OWASP project looks at both, but in this case, it actually dealt with the binary after it was packaged. So they looked at the top 100 games, or sorry, the top 100 paid apps in both the iOS and the main main google android store and they found that 100 of the android apps had been uh decompiled and put up on these on a couple different um
Starting point is 00:09:32 free app stores and it was like 86 of the and the ios ones so it was more common in android but uh it was well okay not much better but you mean. Yeah, maybe I didn't say my question correctly. Because I wasn't referring specifically to the one about the, I think you said it was the 10th one on the list about the hacked app store. of all of the things to be aware of in regards to mobile devices, did the majority of them tend to swing to one particular mobile operating system over the other one? From what I gather, looking at the top 10, and I've only looked at the top 10 briefly. I really just kind of focused on the last one at the meetup. But it looks to be really agnostic, so they really don't talk much about one or the other. It's more about kind of the general purposes or problems that both have.
Starting point is 00:10:31 So like in OWASP, it's SQL injection, not like SQL injection in PHP or something specific. Yeah, it's interesting because you hear so much about how secure iOS is, as an example, compared to Android. vulnerabilities that they had mentioned was specific to a feature implemented primarily for iOS, you know, which was the bookmarklet, bookmarklet. Right. Say that bookmarklet feature. And if you guys aren't listening to security now, you should check it out. It's an awesome podcast and it's just fantastic. It's long too. So that's like an hour or two hours of my week every week.
Starting point is 00:11:24 Yeah. That's worth it. It's long, too, so that's like an hour or two hours of my week every week. Yeah. But it's worth it. Yeah, well spent. Yeah, so I want to give a thanks to Skinner MW. We got our 15th five-star review and write-up in iTunes, so thank you very much. We would love to have number 16 and 17, in the meantime i do want to give thanks to the skinner here and and i can't help but wonder like is that like really a name or is this like a you uh a call back to like principal skinner like you know what's happening there
Starting point is 00:11:57 yeah some people just get lucky with the last names as you well know. Yes. Me? Yeah, Outlaw, man. Outlaw and Apple. We always talk about Trent's last name. It's awesome. Tell it like it is, Jay-Z. Mr. Frank Underwood. Right. I'm so jealous. We also got a really nice comment from
Starting point is 00:12:22 Jim Basillo on www.cuttingblocks.net slash episode 12. And he had some – that's the one where we talked about kind of what we wanted to be when we grow up. And we talked a little bit about C Sharp versus Java. And he's just got like a really nice opinion on that. And he's using Java.
Starting point is 00:12:38 And he's done some.NET type stuff. But he's been using Spring Boot and is really liking it. And he's actually just got a really nice comment that you guys should go over to episode 12 and uh read and um i'd like to hear your side of the opinion too so yeah and and we're going to link to that in the show notes as well as this next one so you can either go there you can go to this episode and check it out uh before you go on the next one i did want to say that i really did appreciate the tone, though, in Jim's comment there because it wasn't, you know, so often, unnecessarily so, you know, there's this battle between, you know, languages like it matters. But, you know, he really took the time and put in some really well-worded response. Yeah.
Starting point is 00:13:26 That, you know, it really made you sit back and think, like, oh, yeah, those are some great points that he made there. Yeah. I mean, some of the points were, like, the link that he brought up that he loves in C Sharp, but that Java 8 now has lambdas. Like, he really did put a lot of time into this. And it's a fairly long write-back. So, again, it was an excellent write-up. I guess an overall premise to his comment there, though, was that for him, one of the things that he didn't like about working in.NET,
Starting point is 00:13:58 and maybe I'm restating this wrong, you tell me but was that um he he felt that um you know my you you were at the mercy of microsoft whatever they dictated that was the way it was going to go right and there it wasn't like the community decided things and that's what he liked about in the java world is like yeah you might have a thousand different options to implement some logging framework right but the community as a whole will filter through those until eventually you get to, well, here's the top one or two, and pretty much the community is going to standardize on this one. Yeah, and basically all the features end up getting implemented
Starting point is 00:14:39 into one or two that everybody loves. So, yeah, it was a really interesting take on that i mean but yeah i mean not i guess like my uh some summation of it though almost sounds like a disservice to microsoft because i do think they take some feedback from the community as well though so they do and i i'd say like generally speaking most people are pretty happy with a lot of the products that microsoft puts out because they're well baked, at least by the second or third revision, right? So, again, a very thoughtful post, and we all appreciate you taking the time to put it up there. Yep, thanks.
Starting point is 00:15:14 Yep, and then the next guy, Tron Anderson, he's actually left us feedback before and always very good feedback. And this one was in regards to the recursive table or query stuff that we talked about in episode 13 and he gave a link that we'll include in the show notes to where he said he implemented a closure table given uh this method and unfortunately none of us have actually read it we just saw the feedback a minute ago but we're going to put that up there and and we'll all be taking a read through it as well so definitely appreciate that as well tron thanks all right so let's get into the basics then let's talk about the types of joins so what do we got um my favorite is cross join
Starting point is 00:15:56 is uh i never get to use it and when i do it just it's kind of cool i don't want to show somebody like hey check it out i actually needed a cross join. So a cross join, for those of you out there, is basically a join that has no predicate. And it creates a Cartesian product. When would you want to do something like this? Whenever I want everything in one table mixed with everything in another. So if I wanted maybe all the different combinations. Can we just say it's like doing two select statements in one? So if I wanted like maybe all the different combinations of, you know, if I had like.
Starting point is 00:16:29 Can we just say it's like doing two select statements in one? No. Yeah, I don't know if I would say that. No. So an example that I was thinking of is if you have 24 hours in a day and you need a table that is going to say, all right, Sunday through Saturday, I need Sunday to give me 1 through 24, but I don't want to have to create this table on the fly. I don't want to have to write the code to create this table. You can do a Cartesian product where you say, all right,
Starting point is 00:16:53 select everything from this table, which is going to give you Sunday through Saturday, so your seven records, and then you're going to do a cross-join with a table that has the numbers one through 24 and so now you literally have a record for each day of the week with each hour right yep so you say cross join the order by day order by hour and boom yeah now you have monday one two three four five six seven eight nine ten blah blah blah tuesday i'm saying like you're still you're you're the two tables you're basically getting the results of the two tables in one combined set.
Starting point is 00:17:28 That's what I meant. But you can't really do two selects because it's not just the days and 1 through 24. It's every day 24 times. Yeah, yeah. You were basically saying, alright, everything in this table has to go to every one of these records in this table. Yeah, so it's
Starting point is 00:17:44 a Cartesian product. It's every possible combination from A to B. So that's what a cross-join is. And they're fantastically fun and can bring your entire database server down. Oh, and so what happens if you cross-join a table that has the days of the week and a table that has nothing in it?
Starting point is 00:18:02 Nada. I don't know if that's true. I'd have to write this, but I was wondering if I had my hour tables and there were no records in it nada i don't know if that's true i'd have to write this but i was wondering if like if i had my hour tables there was no records in it would it just show me monday to you know the sunday through saturday or would it just be no i really don't know i don't think you get anything back yeah i don't know i kind of feel like you would because to me like i always thought of um cross product as being like the or cross joins being like you know the kind of granddaddy of all the other joins and the rest of them just kind of deal with with the the where cloud or the on clauses uh but now i don't know i'm gonna have
Starting point is 00:18:32 to google this one while we're talking doggone it i was actually gonna type in it but i don't have a sql server up i'm running right now all right so so that's one uh how about you michael what's your favorite i just do a normal join. I don't do anything crazy like that. You don't type inner or anything? No, like rarely. No, I just type in a plain join normally. Explicit is best.
Starting point is 00:18:55 Is that bad? Oh, come on. No. I actually got into a heated debate with somebody about typing inner versus outer. I don't care. Leave it off. I don't want to type as much so which way so michael's talking about an inner join yeah i mean in like normally if i need to do anything crazier than that i'm already like whoa wait what am i doing oh come on let me rethink this all right we'll revisit but what's an inner joy i probably i probably don't even need to bother
Starting point is 00:19:22 so so what's an inner join get you uh great i didn't know i was gonna be tested oh yeah come on so so well that that's where you are providing a filter on it so uh a predicate on it to to join one columns values specifically on the column from one table specifically on the column from another table and you're only getting the results where those two match. So it basically throws away all the records where there's not a direct match. Yep. So that's your inner join. And then the other type of join is an outer join.
Starting point is 00:19:59 And you have your left outer joins and your right outer joins. And personally, I hate right outer joins. They do the same thing but if you're mixing that in with your left joins and you're really just going to confuse people just swap the tables around yeah yeah seriously make them all left joins well actually there's a third type of outer join there across the full outer yes i was about to bring that one up and that's the one that i was talking about earlier where uh you know if you've got one table of stuff no one doesn't this one will bring it back across join will not. So in that example where you have no hours, but you do have the Saturday through Sunday, you're going to get nothing from
Starting point is 00:20:31 the cross join, but you're going to get the days for the full outer join. Good call. Yeah. So in the, so to understand the full outer join better, let's talk about the left join or the right join on a left outer join, which I write left join because it's fairly obvious but a lot of people like to write left outer what that is is if you have records in the first table let's say one two three in the right table let's just say that you only have one if you say left select star from a left join b on this column you're going to get all one two three back from the from the left table but in the right table because you only had the one in there you'll get one back where it meets with the one from the left table but the other two columns will be null
Starting point is 00:21:18 or the other two rows will be null because there was no record that matched so a left join basically says give me everything in the left table in this expression. And if it's not in this right table, just leave it null. But if it had a record, go ahead and fill it in with the data that was there. So that's your left join. The right join is the same exact thing except flip it around. It's going to grab everything from the right table
Starting point is 00:21:37 where your records exist. And then anything in the left table that didn't match is just going to leave null, but it will give you a back of record and so the full outer join that the joe just mentioned is kind of the combination of the two anything that's in the left table is going to return you back if it's not on the right table it'll give you a null value for that record and it'll give you back everything in the right table and anything that was in the left table that didn't exist for it will give you a null value. So you get the full complement.
Starting point is 00:22:08 Yeah. And speaking of weird stuff, what do you guys think about normal forms? I know what normalized means to me. If I say that a table is normalized, it means to me that I basically got rid of anything that would be duplicated in that table. So an example would be if we've got an employee table and a manager table
Starting point is 00:22:29 and the manager's got an employee ID and employee as a manager ID, that would not be normalized. That's an example where these things have a reference we need to update in two places whenever you change that relationship. But I know there's first normal, second normal, third normal forms, and that's where my knowledge just stops. Oh, yeah. Yeah, that gets like getting into those levels of normality for databases.
Starting point is 00:22:52 It seems like it goes into things like data warehousing where they break apart like a date time field. And so it gets kind of ridiculous when you get into first normal form. Like they'll break it apart from the hours to the minutes to the seconds those will be three different fields in a table and so there's places for that but i think third normal form is what most people work in right right uh where the quickest example i can think of is if you had something like a product and you have a product type column, right? If in, in, in newbie situations, when we all first started, we probably had, you know, this is a motherboard. And so in the type column you would have motherboard listed for every single row, right? One or zero, right? Yeah.
Starting point is 00:23:38 So the way the, the, you could either do that or you might've just typed in motherboard for the type where in a normal form you would move the product type into its own table, give it an ID. So motherboard would be type one. And so then in that product table itself, for product type ID, you would have one for the motherboard, right? So that's when you start normalizing data. You're basically trying to get duplicates out of the table that aren't searchable duplicates.
Starting point is 00:24:07 Yeah, if you really want to hurt your brain, you can go into Wikipedia and look at the database normalization there, and they go all the way down to like the sixth normal form. Wow, too much. Yeah, beyond that, I was like, all right, you lost me. I'm out. I'm done. And this is where like you know the guys that are really interested in data in databases you know this is where like
Starting point is 00:24:30 they're all into it versus the guys that are you know more interested in in coding aspect that that's where you know we are is like okay we're done yeah and there's a there's actually a guy uh i follow on twitter named um well his twitter handle handle is SQLKris, S-Q-L-K-R-I-S. And he actually has been really open when I've had questions. So he's let me bug him, so maybe he'll let you bug him too. I'm kind of nominating him here, so sorry, SQLKris. Well, he put himself out there. He's got a blog on database stuff.
Starting point is 00:25:01 All right. Now, with that, I will say a lot of people will say oh everything needs to be in in normalized form or whatever uh that's true to a certain degree if you're writing reports it makes a lot of sense to create pre-compiled tables for reports and those are not going to be normalized you're bringing all the data in so that you can basically just spit it back out without doing a bunch of grouping and all that kind of stuff so you know in that regard yeah normal is definitely above my head well when you get into like the sixth degree of separation you know i feel like kevin bacon is going to jump in but you know one thing I hate is talking about motherboards
Starting point is 00:25:49 all of a sudden your data gets footloose you'll see an is motherboard column and then right next to it you'll see a motherboard type next to it and then the table will have stuff other than motherboards so for like three quarters of those records those motherboard is zero and the motherboard type is
Starting point is 00:26:06 null right and so you end up just getting this really junked up table and that's an empty string in sql server and generally speaking that's bad design yeah yeah that's actually really bad design like if the whole purpose of designing tables is you want you want to create a table that's almost always going to have a value in every single column, right? And that's when you know you've mostly got normal form that you need to be in, at least from a transactional point of view.
Starting point is 00:26:32 If you have this situation where is motherboard, is CPU, is memory, then you've probably designed it wrong and need to go back and take a look at that. Yeah, but refactoring databases is hard because you usually have to change the code around the same time and if you've got an application that's up 24 7 then you got a problem yeah refactoring a database is not easy at all from the database perspective it is right for all the applications that work with it you got to coordinate yeah it can be it can be a real job you got to coordinate well yeah i mean it almost begs
Starting point is 00:27:05 the question of uh i mean this wasn't really the topic per se but in regards to like abstracting the data away from your application what's the best route to go there then so that those database changes won't have the impact on your code oh that goes to application layers right like your dal and your maybe backwards compatible well i mean like as an example um i mean this was many years ago i was on one project in particular where but wait you're only 21 right yeah okay well i mean i was in high school thank you for mentioning that yes jerk so okay so it wasn't that many years ago um but we were on a project and uh as a as a design principle i don't even know who made the decision but it was decided that uh you know access
Starting point is 00:27:56 anything that you were going to do to the database whether it be uh you know any kind of crud operation regardless of the type it was going to be done through a stored procedure. And only the stored procedure would know the intimate details of the database. So that's one way that the abstraction was dealt with in that regard. So if it came time to refactor your database, as in the example that you brought up, right, then as long as the store procedure takes the same inputs and returns the same output,
Starting point is 00:28:27 you're good. Yeah, and that makes a lot of sense, actually. It's so annoying to work with, though. You update the table, you update the proc, you update the code, repeat. But you don't update the code. You don't update the code in that regard. That was the point.
Starting point is 00:28:38 If you add a column. Oh, yeah, okay, fair enough. Well, okay, but... Yeah, that's true. Yeah, I guess it's just one more step, not a big but i will say the only place where that fails and we've had discussions around this before is if if your database or if if the data for your application depends on two sources of data then that may not work so it's pulling from one database and maybe another database somewhere else you know it's pulling from a an hr type thing a people soft database or something and then your application database
Starting point is 00:29:08 you can't put everything in the store proc unless you set up link servers and all kind of stuff but well i mean okay so so excluding the the you know multiple database kind of weird scenarios which then might multiple server other questions like yeah you know well maybe you shouldn't be doing those type of uh queries um you know another maybe you shouldn't be doing those type of queries um you know another way just you know thinking off the top of my head that you might be able to do this to solve this abstraction layer would be through views the use of views so that only the views would know the true um if it has structure of the tables behind it if it has access to the external server yeah okay well, you're bringing up that weird example.
Starting point is 00:29:45 Yeah, yeah. But I will say that brings up another thing. Like you can join tables between databases on the same server. And quite often people do that. Whether it's a good thing or not is another question. It's a good thing, though. Yeah, you should probably have schemas within the same database, but you can join tables
Starting point is 00:30:05 from multiple databases i can drive a car with my feet doesn't mean that i should do it that way right i kind of do i mean i'm talking about the steering okay fair enough yeah you know actually i'm just talking about database design made me um think about something uh well you know we didn't really talk about cardinality much but there's this kind of this notion in database design of like one-to-one and one-to-many. And a one-to-one relationship would be something where my employee table has a manager ID field. There can only be, an employee can only have one manager.
Starting point is 00:30:34 And there's this notion of a one-to-many where a manager can be associated with multiple employees. So what I think is an interesting question is, what about one-to-one relationships where I've got a table and I've got three or four fields that are maybe even one that only applies to certain situations like to say that motherboard description. So to me, I kind of want to have a table that's like product motherboard information or something. And it's got a product ID and the motherboard description in there and and it's associated with the product. So the product can only have one, and theoretically I could put that column into the table,
Starting point is 00:31:11 but I don't like to do that because to me it's a separate logical entity, but I know a lot of people have different opinions on that. Yeah, that's interesting. That kind of came up. That was something I asked about at lunch the other day is that exact topic. So essentially what you're talking about is the equivalent of having a subclass in a programming language. So you have a, you have a product class and then you have a motherboard, which is a subtype of the product class, right? So in SQL, you're talking about having a product table and then a subtype of motherboard right so you have a separate motherboard table that one's interesting and kind
Starting point is 00:31:52 of hard to i i can't make any great arguments for or against it performance wise it would be awesome because you literally have a small subset of records the good thing is you break it out into a separate table you don't have a bunch of nulls in your product table for the motherboard type right right and you avoid these tables that have like 70 columns which are probably harder to do other stuff with okay but here's so this is what gets back to my article i wrote where this is the exact direction i was thinking about was, okay, so do you do that? Do you have a bunch of subtype tables, which by the way is going to be a Royal pain in the butt because you're going
Starting point is 00:32:31 to have to manage it all through your application somehow. So if you create a new subtype, you're going to have to have it generate your create table scripts, manage that stuff. So that if anything changes, your application needs to then be able to go in and modify the database stuff, generate your create table scripts, manage that stuff so that if anything changes, your application needs to then be able to go in and modify the database stuff, generate your views, generate everything so that it knows how to get this data. Right?
Starting point is 00:32:52 Yep. So the other alternative is what I had in my article to where you create a characteristic type table so that you have a product table and then you can assign as many attributes to it as you want. You can say it is a motherboard. All right. It it is and then i can add a description for this and so you can create as many as you want but now you're creating a bunch of rows to fill that need that you were talking about just for having a motherboard table yep so we've described scenarios where you have one table that goes wide with a lot of columns yep we've described scenarios where you have small you have uh in the in the example that you just gave where you you're select for
Starting point is 00:33:32 the product might be a long list yep of attributes so it goes it goes uh long instead of wide and number of rows in the in regards to the of rows. And then you have the other scenario, which is the Joe provider, which is short and short. And separated. But a lot of tables. A ton of tables. Which seems like it would become unruly. I think that's kind of like what WebSphere and those things do, though, right? Like everything you do generates like its own set of tables to manage them so not exactly not completely no okay i've heard i wouldn't i
Starting point is 00:34:11 wouldn't i wouldn't phrase i wouldn't put websphere in that same light i'm not certain but it does really so what he brought up was kind of the whole struggle i was having with with designing something was the what what jo mentioned, having a sub-type table. Performance would be off the charts. It would be fantastic because you're doing simple joints. Maintainability. That and you would also have like, you know, there would be a small amount of data in those tables too. But yeah, as you were about to say, I don't want to steal your thunder, but maybe I will.
Starting point is 00:34:46 Yeah, it just seems like it would be a nightmare to deal with that. And as soon as a new product came out, well, it's slightly different. So is that difference enough to warrant this new table? Or do you mix it in and just know like, okay, that's the one special case? Because that's the danger. Because as soon as you introduce that, okay, that's the one special case. And cause that's the danger. Cause as soon as you introduce that, well, this is the one special case,
Starting point is 00:35:07 then there's always going to be a second. Okay. Well, we'll have two special cases. Slippery SQL soap. Yeah. And so, yeah,
Starting point is 00:35:14 I mean, I'm really not a fan of, of doing it that way. It's, it's tough, but here's the downside to the way that I talked about the characteristic way. If you do that, it is infinitely scalable as far
Starting point is 00:35:25 as the kind of attributes you can add to a product. However, your query performance is going to take a major hit way on down the road when you have 10,000 products in your database and each one of those products now has 20 attributes. So 10,000 times 20, you know, you're up into the, uh, what, 200,000 realm records. And that's not even that much but that's where it all kind of gets hairy that is extremely flexible that's if you're trying to do you're trying to do a scenario where you want to get back every product with every one of its attributes right not even that's where that's where your 200,000 came from though so let's say that you do some sort of query in your application to say, all right, give me all the products that are under this category.
Starting point is 00:36:06 So now you got back, let's say, a thousand products and each one of these has 20 records associated. You just grab back 20,000 records. What really should have just been, you know, a thousand. You know what I'm saying? So you either have to creatively do this through your application to where you can get back these attributes and form them some sort of way to where you're not dealing with 20,000 records versus it. And again, that's why I wrote the article that I,
Starting point is 00:36:32 that I started putting together was man, it's real sweet. And I could write an interface quick, or I could write a user interface real quickly to deal with attributes. I don't think I could quickly write any kind of user interface to do the multiple table thing. I don't even know how I'd do it and make it workable.
Starting point is 00:36:54 So, I don't know. It's a dangerous, dangerous one. True. Well, we talked about joins a little bit. There are also some set operators which are kind of similar a little bit except different. So, the one I was thinking about right now is basically union. little bit um there are also some set operators which are kind of similar a little bit except um except different so the one i was thinking about right now is basically union and i was thinking
Starting point is 00:37:10 that's almost like uh you know we talked about horizontal and vertical it's it's kind of like vertically joining two data sets so it actually like appends the rows from one into the other yep there's also union all i'm not really sure the difference between union union all i just know that i always use union all okay so here's the only difference between them if you use union by itself it does a distinct if you do union all if you have let's say that you had michael alan and joe right and then you had another table i have michael alan and joe there we go yeah so you union that table to another table you're only going to get by michael allen and joe if you did a union all you get michael michael allen allen jojo so that that's the only difference union will get multiplied i wonder yeah i wonder which one
Starting point is 00:37:55 is more performant oh geez i wonder well i would think that the one that's distinct because it's less stuff right but uh you know i don't know. No, actually, it depends. So here's the deal. With union, if you have a very wide data set coming back, think about it like this. If you have 50 columns coming out of that and you do a union on another 50 column, it's essentially doing a group buy on every one of those columns.
Starting point is 00:38:21 So it can be really expensive. Now, depending on what you're doing, the union all may not be as effective because you might be getting back a million records as opposed to a thousand if there were a bunch of duplicated data between the tables. So it depends on the situation, but typically, generally speaking,
Starting point is 00:38:42 if you have a very wide data set with a lot of columns the union is going to be less performant yeah i just um looked up um sql authority.com which is an awesome website for i'm sure if you've done any sort of googling around sql you've been to this website and you'll recognize the guy instantly oh my god i was picturing it in my head yeah i've seen that guy's face right his name is penale dave is this the site that you're going to be referring to and it is yeah and uh he actually uh he's got a little thing here about union you know and he does that union all is faster which is kind of funny we've actually talked about this problem before of like finding distinct elements in two
Starting point is 00:39:17 different lists and we kind of came up with a hacky way or someone came up with a hacky way of converting the list to a hash table and then looping through that as a way of getting the actual algorithm complexity down but it looks like it isn't doing that particular algorithm so maybe there's a you know efficiency patch we can submit to sql server here i doubt it so a couple other set operators are accept and intersect and accept is kind of interesting because what it does is it says give me everything in table a except anything that shows up in table b so if if you have again michael alan and joe in there and and michael's the only one in table b we're basically just going
Starting point is 00:40:03 to give you back Alan and Joe. Because we say, we don't want this in the data set if it was in both tables. Intersect says, okay, no, just give me everything that is in between the two. And this one's interesting because intersect is literally the same thing as joining on every single column. If you're going to write a join, essentially, it's, you are basically saying, all right, everything that was in table A and has the exact same data in table B, give me back just those rows. So in the previous example, where all three of us were in the, in the first table and Michael's in the second table, if you did that intersect, you're
Starting point is 00:40:42 only going to get Michael back because he was the only one that was in both so those are ways of doing set base operations to get data back but it does kind of make you want to beg the question of like so okay let's let's focus in on the intersect and the accept right because these sound on the surface like operations that were going back to the left join right join type operation so it kind of it makes you want to beg the question like wait a minute why am i not just sticking to the join syntax why am i bringing in something else the only reason i can give you for that is you don't have to write all the join predicates that's really it so and and you're also the other good thing is you're not having to say okay give me first name from table a only not so if you did
Starting point is 00:41:33 a select all and you did the join you're going to get the duplicated columns in your data set in columns right so you'll have first name from table a and first name from table b if you do intersect you're just getting first name from the two tables that were intersected together. You're just getting one column. And then the other thing is, let's say that you have first name, last name, age, whatever. If you're doing the join, you're going to have, say, where A dot first name equals B dot first name, and B dot last name, or an A dot last name equals B dot last name, and A dotage equal b.age. If you do the intersect, you just say, hey, select first name, last name, and age from table A. Intersect, select first name, last name, age from table B.
Starting point is 00:42:13 Done. It gives it back to you. So your syntax gets a lot shorter, and you're not having to write all the join predicates. That's it. Yeah, interesting. I never really thought about it that way, but yeah, totally. Maybe that's why I've never written
Starting point is 00:42:28 in accept or intercept. Intersect. But Alan does it in all of his CTEs. No, I don't do those in CTEs. Just kidding. All right, so then let's talk about some aggregation. This is awesome because this is going to come up in every single job interview you ever have that deals with SQL.
Starting point is 00:42:49 It really will. Here's your golden nugget for the day. So what's the difference between a having and a where clause? It's kind of hard to explain, isn't it, in words. So when I think about having uh i know there's got to be a group by and group i kind of aggregates my data um like how do you even how do you explain a group by that's pretty much it like the answer is a having has to be used with a group by it happens but a group by doesn't have to be doesn't have to include a having doesn't have to have a
Starting point is 00:43:24 having but it's the only time you can use one as if there is a group by yeah so i can tell you when i use the having you know if i say select star from sort of table or you know select some column and comma count star i group by that column and then i say having count greater than one or two or something like that having's like a where clause for the aggregation though so you're typically doing your where on like you said count star greater than one or a max something less than five who knows you know but yeah that's all it is there's a where clause using your aggregations so what is the group by like how do you explain what a group by does uh a group by magic it's like you know when you need it it's one of those things. Like the final countdown starts playing, and you see a rest of development in the background.
Starting point is 00:44:12 Magic. I know for me, I think if I've got a table full of users, and I want to know how many Johns I have in that table, then I would do group by first name, and that's going to give me a distinct list of all the names because I'm grouping by all the values. So basically squashes all those johns into one john and then you do a comma and count star and it'll tell me how many records were squashed it'll tell me i had 77 johns yep that's a lot of bathrooms so the group by is really just a way of aggregating
Starting point is 00:44:40 information right i mean that's and to to say that just flat out probably doesn't make a lot of sense but you'll know it when you need it yeah exactly i mean you will come across the situation where you want to do exactly this and you'll remember this podcast and you will do it if you just go back to the standard math though uh your averages your maxes your mins your uh your counts that all those things can be accomplished with a group buy but you have to choose the columns that you want to do so like he said if you want to count the number of johns in there cool do that if you want to if you want to find out who has the max salary in a table you know you just say select uh actually let's say uh maybe you wouldn't just want to do the
Starting point is 00:45:24 table because you just did select max from the table, right? If you want to do it group by name, though, you could possibly find out which name has the max salary. You can't group by star. I feel like you should be able to group by star. I know everyone's in here has been in a situation where they group by everything. Well, no, the example, the example that I, that I find more frustrating when I want to use the group by though,
Starting point is 00:45:49 is that like, maybe I want to do a select star and then just say, but group that star by this field. Yeah. And that's the scenario that it's like, Oh, but now I have to go and specify everything else before the group by works.
Starting point is 00:46:01 Yeah. If I'm joining table a to table B and I'm counting the records that are associated in table b i should say group by a dot star am i wrong it would make life a lot easier yeah yeah so the group by is just that and you're having is what we said your where clause for your aggregations well the group returns it and and it makes the select a smaller set of rows. Yeah, like you said, it squashes them together. The having is just a predicate on the grouping to filter that result to a smaller set. By aggregations, yes. So if I can use group by to squash something, then why use distinct?
Starting point is 00:46:42 If I could say, you know, equival know, this SQL statement is the same. Select distinct first name or select first name group by first name. Those both do the same thing. However, if you wanted to get the count star, then you have to do the group by. So it's better to group. It's more
Starting point is 00:46:59 flexible. It's more flexible. Less maintenance. But just to go back for a moment, having that you said, you didn't have to use aggregation. That's what it's more flexible. Less maintenance. But just to go back for a moment, having, though, you said you didn't have to use aggregation. That's what it's used for. Because the thing is, you can use a where clause also with your select. Okay. I see where you're saying, like, useful for,
Starting point is 00:47:20 but my point was that it doesn't have to be used for. Yeah, probably not have to be, but that's typically what you use it for. Your where clause will narrow down your data set, and you're having filters out your aggregations yeah it's like so like uh if you want to find something where you know there's more than two rows that have or um while going back to going back to maybe your manager if you want to find all managers that have more than five employees that's an example where like you know i've done a having what was where i'm trying to find like things that have more than x number of something so having count star greater than five yeah i mean that's an example but going back to what you said with the cat with the distinct and the group by there are cases where you want to use count distinct
Starting point is 00:47:59 right right so uh if if you ever find yourself in a situation where you're like, well, I want to know the total number of unique names in the database as opposed to the total number of Johns, I want to know the total number of unique names, you would say count distinct F name, right? Yeah, it's funny. I think of distinct being like whatever I see. Why you got F somebody's name?
Starting point is 00:48:20 That ain't nice. Whenever I see distinct in a query, I automatically think that they're covering up some sort of mistake. So a lot of times you'll see people write a query and they'll have a join that's messed up somewhere and so when they're testing it they'll see two rows come back when they expected one and like well i don't know what's going on sequel's got a bug and so they put a distinct on it and it fixes it i'm pretty sure that's what that word is for though unfortunately i think you're probably right on the money with that it's the one magical sequel keyword that fixes all your queries problems yeah so whenever you see something weird just put a distinct in there and move on i had five rows
Starting point is 00:48:56 now i have one it worked yeah all right so magic all right the next thing oh dude one thing that drives me crazy is the different methods of getting things out of different databases. Like by far, my favorite row number feature exists in Oracle. You just use row num. Every data set that comes back, it creates its own virtual column that's row num. So if you want to say, hey, give me all the rows. Paging is a perfect example in applications, right? I want products
Starting point is 00:49:25 20 through 30 where row num between 20 and 30 done you're done you don't even think about it anymore sql server since 2005 added the row number function which essentially will give you the same type thing is not quite as pretty but you can do some sweet stuff with it. What are you doing with Ronum? Huh? You're doing some CTA stuff there, aren't you? No, no, no. CTA? CTA.
Starting point is 00:49:52 We're on a whole other level. Sorry, it's past 10, man. I just started going. Joe's about to fall out over there. No, no. So with Ronum and Oracle, it's just so easy. No, but he sounded like he wanted to do paging there. Yeah, yeah.
Starting point is 00:50:05 So think about it. You're on Google, right? You're on page one. It gives you results one through 10. like if you wanted to do paging there yeah yeah so so think about it you're on google right you're on page one it gives you results one through ten well you want to go to page two you just say hey where results were you know or were row number between 10 and 20 or 11 and 20 let me say my sequel you do like the limit 0 500 or limit 501 actually my sequel is disgusting i found a site that actually showed the hacks for it. It actually made me want to cry. I like that. I always get confused about which one's the start. Is it like start and how many to go? Or is it start and end?
Starting point is 00:50:34 I always got confused. No, no. That's to get back the numbers. But if you want the actual row number, so what you're describing works fine for getting back rows, you know, 20, 30. Why do you need the row number? But if you want the row number to show it on the page, good luck, my friend. It was nasty. It actually, I think I closed the page immediately.
Starting point is 00:50:53 I was afraid it was going to break my computer. Yeah, I would figure that out in code. It was bad. It was really bad. I actually used to have a code a long time ago when I first started where I would actually, it dealt with MySQL or SQL Server so, you know, it's my own custom ORM and I would actually, if it like
Starting point is 00:51:10 if it's MySQL, then do the limit if it's SQL Server, do the top Yeah, but I mean where Alan was going with that though, he didn't I don't feel like he did it, Jesse, because he started to go off when he started to say like when SQL Server introduced the row number function so really what he was talking about is the ranking functions that were added in the sql server the windowed
Starting point is 00:51:28 functions oh man i like i feel like this is getting back into like well why am i doing intersects again like because then you get into row number you get into rank you get into dense rank quartile it's like wait a minute hey we've made some magic happen with some dense rank and some row numbers. Yeah, but, you know, like when, okay, so there's row number, there's rank, there's dense rank, and there's intile, right? Yep. Ooh. You definitely need to Google that every time you use it.
Starting point is 00:51:57 Dude, I love me some dense rank. Yeah, I mean, like, seriously, we can put a link in the show notes to this one, though, because Microsoft has documentation specific, obviously, for SQL Server where they describe the purpose of the four different ranking functions. And it's just gross. And as Alan has mentioned, there have been some times where, you know, you need some of that magic. And so you sprinkle a little bit of that fairy dust on your code and then magic happens. And you're like, wow, thank you. Unicorns are flying everywhere.
Starting point is 00:52:27 Please make me never have to change this again. But just know that you just sold your soul to that little troll underneath the bridge and he's waiting for you because you're going to have to go back across that bridge to get home. Yeah. If you ever have to look at it again, you will literally have to reread exactly what you did in the first place. So let's move on to the next one. Varkar versus Mvarkar. Who put that in there? No, wait a minute.
Starting point is 00:52:57 Now, if we were talking, hold on. So I want to go back to when you guys corrected me earlier. Because if we were talking about the type, the data type, would you say car or would you say char? I said car. Car? Car.
Starting point is 00:53:11 Really? Car. Do you say character? No, hold on, hold on. Do you say character or character? What does car stand for? That's the thing I got here in. It's a char. I don't know i just okay whatever well that backfired moving right along i was wondering where you're going with that because it didn't
Starting point is 00:53:35 support anything you were trying to get no because forget it so i feel like i've actually looked this up a thousand times and thankfully stack overflow exists now so you can kind of get the canonical answer but i feel like every time i read it the answer is basically like well you really should use nvarchar but no one does we're going back to the database topic yeah so nvarchar can store unicode is basically the deal which is important if you need it but it just seems like most applications i end up working with don't need it, which is kind of weird, I guess, sticking to the USA or something. Yeah.
Starting point is 00:54:12 If you're working in English only, then VARCAR is enough. America? But the reason why most people don't use in VARCAR is because it eats up twice the space. Right. So, yeah. But VARCAR is already variable, so it's pretty good about space, right? Well, but it depends on the character sets
Starting point is 00:54:32 that you're going to be writing in there, though. So if you needed to support internationalization... You have to have Invar-Kar. ...then it's not up for discussion. Right. But every character... He's Invar-Kar. But every character stored in Invar-Kar eats up twice the space. Right. But every character stored in an Envarchar eats up twice the space.
Starting point is 00:54:47 Right. So, yeah. I mean, it's decent on space because it's variable, thus the V. But, you know, if you don't need Envarchar, you shouldn't use it. I think, like, Varchar is probably the most common data type in databases. Yeah. Well, because we're all lazy and we don't want to type in one extra character. And you'll know it if you accidentally do car
Starting point is 00:55:09 and you end up with all this white space because it's a padding. Char? Yeah. Only use car if you're using one character. All right. Yeah, that might not be the right thing to say, but whatever. The next one we were going to talk about were goods as keys yeah it's really convenient and if you've ever worked in a database in dev and a database in
Starting point is 00:55:32 production and you end up having some sort of status id and in your code you're like if status id equals five you know maybe you're being good pulling it from a config file and then you go and you insert that data into production and lo and and behold, that 5 is now an 11. So it's 5 on dev, 11 in production, and it's just nasty. So it's a pain. And you don't really run into that problem when you're dealing with GUIDs. However, the problem with GUIDs is they don't sort. There's no sortable thing on them.
Starting point is 00:56:01 So if you have your GUID as your key, you basically have your data in all kinds of random order and so all your queries are way more expensive yeah and it's it's also a large data type what is it like 32 bits or i don't even know it's it's really big yeah i think it's a 16 yes it's much bigger than an integer yeah they're pretty massive um but there is a new feature in sql server that is a sequential guid yeah but what's kind of gross what i don't like about that is that the sequential aspect of it makes it very predictable so if you're trying to use a goo and you probably shouldn't be using a goo it is uh you know for obscure reasons anyway but it's uh it's not really much better than you know just using integer as far as obscuring data.
Starting point is 00:56:47 And what I'm talking about when I say obscuring your data is if you've ever had an application where you've got like cookie.userid and you go in there and you change that 13 to 111, now you're somebody else. So it just kind of grossed me when you can see numbers there because it's actually leaking information about your application. So if you know your user 1,000 and you just signed up, then it's a pretty good indication that whatever you just signed up for has less than 1,000 actual users. But it also goes back to Joe's example with 5 becoming 11. It wouldn't happen here because your GUID is also based off your machine information. So it's off the time code, but it's also based off, like,
Starting point is 00:57:23 I don't know if it's a machine name or what so they'll be unique between how the data is getting in there though but i mean in the example that he provided though he he was using from a development perspective he was using one id and then he goes to the production and it's now that that the id for that row has changed but that's because they were both integer auto incrementincrementing. But if you had sequential GUIDs, so one that was created in dev and one created in prod, you wouldn't run into that issue. Because the one created in prod
Starting point is 00:57:52 will have its own unique signature on it. Right, which is bad, because my GUID in dev is FAA2B and the one in prod is... So you wouldn't have that problem. Right, right. You'd know that they came from different systems, but you wouldn't run into that collision, because the whole idea is like a lookup table right like you have a product type product type one over right but but the scenario that i'm describing though you would be the problem because
Starting point is 00:58:13 they're sequential good so like if you were expecting this good at this one particular position that i'm not going to call out all 32 bits of right you know you know and you were expecting it to end in five and now suddenly because that one already existed in your production database now that one's an 11 wouldn't happen that way because what he's talking about is so let's talk about like a lookup table of product types or something right you've got motherboards one cpus two okay in production people have an interface to be able to add rows to that table, right? So the problem he was talking about is you'd have a collision because development, you had your product types one through ten. Production, you have those as well, one through ten.
Starting point is 00:58:55 So now your one through tens don't match up. However, if you have sequential GUIDs, your GUID in development is going to be different than the GUID generated in production. It won't be like it's one, 2, 3, 4, 5. You're going to have ABCD in the development one and DEFG in the production one. Which is bad because that means I have to have a config now that tells me different things for different. But this is where I was saying,
Starting point is 00:59:15 it depends on how you initially loaded that production data, though. But anything created in production would be unique to production and anything created in development would be unique to development and anything created in development would be unique if you if you exported that data straight out of dev it wouldn't conflict straight into production would not conflict with production the ids would match on your initial dump yes okay and now if you create now go to joe's scenario where now you're trying to create something right and there's some key that maybe, for one reason or another, it doesn't it's not there, maybe you've wiped your dev database, that key is no longer
Starting point is 00:59:49 there, right, but that sequential GUID is in the production database, right, that's the scenario that you were describing. I was talking about a situation with basically auto identities where, you know, I created a new status and dev it's 5, I'm prod it's 11, and now, you know, my code's looking for 5, on prod it's 11 and now my code
Starting point is 01:00:06 is looking for 5. That's what I'm talking about except represented as a GUID instead of a number 5. No, but that's what I'm saying. The GUID signatures will be different in the two environments. Dev will always be, you'll know that the dev ones came from dev and they will never be able to clash with the ones
Starting point is 01:00:22 in production. Unless you change your dev one. Unless it depends on how the data got into the dev. And that's what I'm saying. Because you're saying if the GUID was generated the first time, a GUID that's generated in dev would never generate the same one in production is what you're saying. Oh, you're trying to say that. And I'm saying that that's not the case.
Starting point is 01:00:39 Let's say that the data was loaded, right? I don't know. Because it's sequential. Because they're sequential if you ever tried to have anything if you ever had if you ever had a scenario where you had to insert something into the middle i don't know if you can that's where i'm saying the problem would become i don't know if you can do that well that's actually a big problem with guids is um if you have them as a clustered index which deals with basically how the the data store on disk
Starting point is 01:01:03 and you're inserting records with new guids and these guids are basically random numbers then you're constantly shuffling the data around on disk in order to put those inserted values and if they're not sequential yeah sequential you know you always just pop it on well i wasn't even referring to it from a performance point of view but i was saying like you know if you're trying to get if you were trying you can't put something in the middle because, because the other ones are sequential. Whereas in your dev example, it won't be ordered.
Starting point is 01:01:29 It may have, it, it might not have been the middle on that one at that point in time. Right. But you know, but you could turn it off and then insert it and then turn it back on. Well, right. And that's what I'm saying.
Starting point is 01:01:41 It depends on how you got the data into that other system to begin with. Right. I don't know how that'd work. We'd have to do some testing on that. But yeah, generally speaking, I don't think GUIDs are great as your key column. Yeah, it's kind of weird. One thing that's nice is for distributed systems or really just in code too, like theoretically you can generate those GUIDs in code and kind of throw them up there.
Starting point is 01:02:03 But I just don't trust it. I know that's a, you know, 16-byte number. It's really huge. If I generate two GUIDs in a row, those are supposed to be different. But I just can't trust it. It drives me crazy. Yeah. All right, so we beat that one up.
Starting point is 01:02:21 Parameterized queries. This is something important for pretty much anybody. If you're writing application code, you need to use parameterized queries. Period. End of story. Or you're calling procs. One of the two. But the whole reason is, is one, security, and two is speed.
Starting point is 01:02:38 In a lot of cases, RDBMSs will cache query plans, and they do better with a parametrized query. I think newer databases have actually done a better job of doing performance on non-parameterized queries, but a parametrized query basically says that you're not going to be able to hack your database all that easily. So you definitely want to use that. We also spoke about that in the OWASP episode so that was episode four yep and that's codingblocks.net slash episode four yeah so go check that one out but you should absolutely be using that yep and we you know we really talked a lot about um kind of
Starting point is 01:03:19 SQL tips for developers but uh coming from the other way around, actually, personally, you mentioned earlier, SQL Chris asked us for any sort of coding tips for a SQL guy. And so we started, you know, talking and thinking about that a little bit. And, you know, I think that's a really good question. So if you are, you know, if your DBA friend asks you
Starting point is 01:03:39 what they should try doing or learning about in order to get started with coding, then what should they do yeah i don't have an answer for they should do yeah i thought about a couple different things um you know at first it was really kind of weird i started thinking about php just because it's so good at popping data out of a database and throwing it onto a web page so if you just want to kind of create a quick data-based application, then that's a good way to do it.
Starting point is 01:04:06 And you get to leverage your SQL knowledge and it probably will feel somewhat natural to you, which I never thought I would say about PHP. Well, when I originally saw the question though, usually, and I don't know if this is the case for SQL Chris, but I know for a lot of DBAs that I've known in the past, because of the nature of their job, especially from when they walk in day one on a brand new well because they need to manipulate text and take it from one format, put it into another in order to get it into that database the first time. That's an interesting take.
Starting point is 01:04:57 Yeah, it's funny. When I'm importing data in SQL Server now, a lot of times I'll use Sublime because it's got a really nice Regex support. And so I'll kind of throw the data in there. Then I'll run my little Regex to clean it up a bit before I throw it in the database. But now, with that said, though, and this being 2014, would you recommend Perl to somebody? Well, I actually replied to him on this. And I said, if I were going to try and do this as a SQL guy, I think the lowest barrier to entry goes back to what we mentioned on the last episode do some javascript so i i mentioned look up a node tutorial so that because
Starting point is 01:05:32 they're real easy to get started with and then without having to know a lot about language specifics library like you know you don't have to know about system libraries and all the all the goodies that dot net or java or any of those give you. You basically have a fairly simple programming language out of the box that you can get started with. And Node's really powerful. So that was kind of my thing was if you want to get started with the lowest barrier to entry, probably something like a Node.js would get you going. And there's a couple.
Starting point is 01:06:01 I can't speak to how well they are, but there's some SQL builders for JavaScript. At least with Perl, though, I was thinking you could take something that you know well, like a database, and you could actually interact with it programmatically. It's an interesting question. Crossing those paradigms is a lot harder for SQL guys coming to programming versus programming coming to SQL. I've noticed. Most SQL guys don't really want to play with coding that I've seen.
Starting point is 01:06:37 I don't understand that. Yeah. A real quick break here. Again, if you guys are listening and you're enjoying the show, please do go leave us a review on iTunes. Click the five stars, leave us a line, tell us how you think we're doing. Yeah, we definitely appreciate the written reviews. If you take the time to write us that review, we greatly appreciate it.
Starting point is 01:07:01 We'll even do you the favor of uh you know giving you a shout out on the show feel free to drop us a line and hit us up at uh comments at codingblocks.net and tell us your preferred form of shout out yep and i'll even record michael doing the happy dance if you leave a really nice review we will send you a great gif yeah Yeah, yeah, a gif. I'm going to go ahead and go on record as not accepting that. So if that does happen, if you do get that, just know that that was videoed against my will. I did not sign up for that. There was no consent given.
Starting point is 01:07:39 That's awesome. Awesome. All right, so we're nearing the end here, but I think we would be very remiss if we did not speak about performance of of yes so the answer is yes yes you should have performance you should have performance and how do you how do you have that performance exactly in sequel oh we're gonna get complicated do you just do you just get to write queries and be happy is that it yeah whatever entity spits out that's generally it's tough because like ignore the man behind the curtain that's not a good answer well it's like a sql you know sql server at least is not really
Starting point is 01:08:16 known for being very scalable so uh i think you know the more you can get out of the database as in like you're not going to the database for it and probably the better like doing stuff like caching or um you know trying to stick it in more appropriate data stores like we talked about like solar or something like that for certain types of data then that's how i usually approach performance but i know that's not exactly what you're going for i've never heard that it's not that scalable as far as far as performance i know it doesn't scale horizontally as well but well that's what i mean yeah i mean vertically it only goes so far right yeah but it's it's actually a pretty fast thing but so in order to make your queries faster there's this little thing called indexes that you need to be aware of and pretty much every database on the planet supports them in one form or another at least relational
Starting point is 01:08:59 databases and at least in sql server I haven't investigated all the different database servers, but you've got clustered indexes and you have non-clustered indexes. And this is also probably one of those interview questions a lot of people get hit with. And even the people asking the questions a lot of times have no idea what they're asking. A clustered index basically stores the data sorted in the table for you so if you create a clustered index on the table and you can only have one in sql server then you create what's called a clustered table and that basically means the data will be sorted by the the columns that you chose so if you have an employee id then if you select out of that table it's going to or when those rows are written
Starting point is 01:09:44 to the database it's actually going to put them in order. So 1, 2, 3, 4, 5 are going to be on the hard disk in that order. Now, the interesting thing about this that probably a lot of people don't know, if you actually have a clustered index on a SQL Server table, if you do a select from that table, it automatically comes out ordered for you. Because that's how it's pulling it off the disk. That's how it's stored on the disk. So you'll get 1, 2, 3, 4, 5 for your employee IDs. So then that takes you to non-clustered indexes.
Starting point is 01:10:16 And essentially what this is, is your data is stored, or a non-clustered table, if you have a table that does not have a clustered index on it, your data is stored in a non-clustered table if you have a table that does not have a clustered index on it your data is stored in a heap and it it typically stores it in order but it's not guaranteed so it'll write it as it comes in in whatever order it came in but you're not guaranteed that you're going to have employee one two three 12345 in that order stored on the disk. It could do 123, and then it's going to find some space for number 50 right in between those, and that's where it's going to squeeze it.
Starting point is 01:10:53 So that is a non-clustered index table, which means it's a non-clustered table. It seems to me like the clustered index is almost always the primary key. Yes, typically speaking. And the only example I can think of where it's not is if you've got a primary index or, yeah, sorry, primary key that's a unique identifier. And you don't want that to be clustered because of, you know, it's storing it in order on the disk and having to reshuffle all of that. Now, here's one of the situations where talking about the GUIDs we did earlier as a key really stinks if you make a good a clustered index on your on your database it's trying to sort garbage like your good by its very definition is random for the most part and so it's now having to figure out where to sort this stuff and it's really not a meaningful sort so when you do that you're creating
Starting point is 01:11:45 a lot of overhead on every insert into the database because it's having to resort that data in the table and shuffle records around so if you are using a guid as as your key in your table you probably should not create a clustered index on that now here here's when you know that you're ready for a beer okay this is the beer test when when you're when your queries involve you having to create your own index on the fly for results in a temp table right you my friend have earned a beer and we've all been there or we we have well yeah i mean there's times where it happens. Could we mention the CTEs earlier? And that's if you're going to do it in memory. But there's also times where you want to be able to reuse that result over and over and over.
Starting point is 01:12:33 So then you would write it out that index on the fly as part of your overall set of queries in order for you to get to the end result. Yeah, and they can be extremely useful even in that temporary situation. I mean, they could mean the difference between pulling back your records in zero seconds and five minutes, right? I mean, they can be extremely helpful. Yeah, indexes are huge. And so getting into a non-clustered index, essentially that is not stored in place in the table. That's stored outside in another location, but it points, if you have a clustered index on the table,
Starting point is 01:13:17 then that actually, that location gets stored in each one of the indexes. If you don't have a clustered index on there, then it's just going to try and point to the actual location of the data in the tables. And non-clustered indexes can be incredibly fast. You don't necessarily need a clustered index in every situation to get to that data. But a non-clustered index is your way of not sorting data in the table, but being able to point to it in a sorted manner outside the table. And so it's a fast lookup. So what's a filtered index?
Starting point is 01:13:50 Yeah, these are new in 2008. So these are kind of cool. So in the olden days when you created an index, you indexed every record in the table. In a filtered index if you know you're only going to be searching for a certain subset of data you can create what's called a filtered index that will only index that certain subset of data so if you have a table that has 80 million records in it but you know you're only ever going to be hitting these 1 million records you create a predicate on the index create and so only those 1 million records will be indexed okay so maybe I get like index
Starting point is 01:14:29 only the the records created in the last year or something exactly exactly and so when you when you're using your where clause to pull all the records within the last year it's going to hit only that subset of data so your record your your speed is going to be way faster uh so that's that's pretty interesting another thing uh here's the here's the downside you can create a ton of indexes and you don't even know if they work right i mean generally speaking i'd say a lot of people who learn about indexes are like oh i search on this field i'm going to create an index for that why don't you just index every field right so there's a couple things to know about indexes is
Starting point is 01:15:07 the the query optimizer at least in sql server will use one per table so creating a separate index for every field in the table is not going to help you generally speaking if you know that you are searching a an employee table by first name and last name all the time, then you'll create a composite index on that that includes both first name and last name. And then when you search, it can index very quickly and return back what you need. Well, the only way to really know whether or not that's working for you is obviously it's going to come back faster. But in SQL Server server in the management studio you can look at what's called the execution plan and if you see an index seek then you know your index is working if you see an index scan you're you're basically
Starting point is 01:15:56 doing a table scan except on a smaller on a narrower table essentially i'm trying to think if there's, I don't know, anything. Just creating indexes blindly is not going to help you. You need to know what your search, your queries are, what you use your where clauses on, what you use your join predicates on, and that's typically what you're going to create your indexes based off of. Yeah, and there is overhead associated with indexes so it's really fast for pulling stuff out when you use that index, but every time you insert, you've got to kind of pull out that data and throw in the heap.
Starting point is 01:16:36 Yeah. What about fill factors? I've seen that show. You got to grow stuff. That's awesome. So the fill factor, I think the default in SQL Server when you're creating an index is 80. But all that means is,
Starting point is 01:16:57 is it going to reserve any space for updates to this data? If you make a fill factor of 0 or 100, it's the same thing. It basically says, no, use up all the space on the row. So generally speaking, if this is data that's not being updated, if this is data in a report table, let's say, that really never gets changed,
Starting point is 01:17:15 you can make the fill factor 100 or 0 on the index. And that means that you're not going to waste any space. That index is going to be as efficient and lean as possible if you make it a fear uh fill factor a fear factor of 80 yeah i got you right i got you showing your age there outlaw yeah wait what so if you make it a fill factor of 80 that means basically leave 20 of room on this so if you update data in the table if your first name all of a sudden got bigger for some reason then it has space to update that index and point it back to that record without having to rearrange those records in the table or you probably want to leave that fill factor open for a little bit of wiggle room. If it's a reporting table to where that data never changes, just zero it or 100 it
Starting point is 01:18:09 and you're good to go. All right. And I think we have pretty much just exhausted a lot of information on databases here. All right. So resources we like for this particular particular episode i've actually got a few that i want to mention i'm getting into angular js development a little bit uh late to the game but you know i figure they've had time to perfect things yeah it's been four weeks yeah yeah but i figure they've had time to really uh perfect it and and there is some interesting stuff here so the first one that I want to mention that I've tweeted out actually through coding blocks and also mine is angular js there's a guy uh it is art and logic and he had three articles that he wrote for things that he's been
Starting point is 01:19:01 doing wrong a lot in angular js and the ways to fix it all the way from file structures to how where to put your logic and how to separate out your controller information and not mix in doms so excellent excellent articles so i highly recommend those i'll have links in the show notes another one is angular dash app This is literally not a tutorial. This is a baked app that is put up on GitHub that if you want to take a look at how it should be structured and how authorization and authentication happens, this application is there.
Starting point is 01:19:40 You can actually go through it. You can compile it. You have to use node node.js karma and bauer and i had the thing up and running in no time like it probably took me 15 or 20 minutes to have the whole thing running so definitely a nice resource if you really want to see how it should be laid out in an app that's running both server and client-side code. It's not a tutorial. It's just code there. And then the third one that I want to bring up is there was an excellent article on medium.com, and I'll leave a link to this as well.
Starting point is 01:20:14 But it was getting just strictly into the authentication and the authorization part of AngularJS because with a lot of the UI layers now for the web, most of your code is exposed because it's all JavaScript. And with tools like Google Chrome, Firefox, Firebug, even IE's debug tools, you can go in and manipulate every little piece of JavaScript that exists. So with that, it's very important that now more than ever, you also do server-side authentication and authorization on pretty much every call that's coming through and this guy laid it out beautifully the only thing is he doesn't describe all the files like where where the code goes but i'm assuming if
Starting point is 01:20:57 you're comfortable with angular and you're trying to get a foot up on this kind of stuff this will be a great article for you so i'll leave a link to that as well yep and i wanted to mention a foot up on this kind of stuff, this will be a great article for you. So I'll leave a link to that as well. Yep. And I wanted to mention a site called usetheindexluke.com. And I actually thought this was a blog. You didn't say it right. Hey, use the index, Luke. Luke.
Starting point is 01:21:17 Use the index, Luke. It's got a great little graphic on the site too. But I actually thought this was a blog until recently. And it turns out it's actually just a book. And I'm sorry, I have a correction. They actually have a blog too that I have not been subscribed to. So it looks like I messed something up there. But anyway, if you go to useindexlook.com, it is an online book and there's also a paper digital version you can buy. It's in English, French, French, Dutch, a few other languages.
Starting point is 01:21:48 And it's just got really nice in-depth information on SQL in general. So it actually has little kind of like takeouts there for specific platforms like Oracle, SQL Server. But it's got chapters on indexes, on joints, and it's just really in-depth. But it's also concise information. So there's not a lot of fluff here. It's just great information. There's a little character there.
Starting point is 01:22:08 It looks like a cross between E.T. and Yoda and Taco. It's not a graphic you're going to forget ever. It's definitely very interesting. Why is the taco holding a lightsaber? There's a Reddit that's pretty much dedicated to tacos and lightsabers, I'm sure. But also there's this really cool... There's about to be if there isn't already. There's a three-minute test that you can take
Starting point is 01:22:32 that'll kind of let you know how you stand on SQL performance. And it's got some tough questions, but you choose your database, MySQL, Oracle, SQL Server, and it's got some pretty tough questions. And really, it took me to the mat. So you guys should go there and try it.
Starting point is 01:22:47 We'll have a link in the show notes. Very cool. I think Alan's taking it right now. I was looking. Yeah, I actually clicked start. Sorry about that. All right, I'm back. There was one funny that Outlaw found this week that I don't even know why I'm talking about it because he should get credit for this.
Starting point is 01:23:08 There's html9responsiveboilerstrapjs.com. If you really want to take a break and you're over all these bootstraps and boilerplates and angulars and embers and all that basically if i could read from the summary of it i think this would please do you know describe it best in regards to html9 responsive boilerstrap js it's a flexible dependency free lightweight device agnostic modular baked in component framework mvc library shoelace strap to help you kickstart your responsive css based app architecture backbone kitchen sink tweety birds so that's straight from the site that's exactly what i was looking for yeah i think i think like like here this this is hilarious though i mean it's it's available on github you can download it if you need to. But of the frequently asked questions,
Starting point is 01:24:07 my absolute favorite was in regards to how do you install it. You just attack clone the GRIT repo, push merge, then RubyGem, the lymph node, JS, Shawana, and on a module and presto. It's a goldmine, especially if you've been digging through trying to find the perfect responsive stuff it's a nice break yeah yeah all right so with that let's get into the tips of the week yeah and uh i had a doozy this time so just discovered this you go last oh come on i'm dying I am chomping at the bit.
Starting point is 01:24:45 So, Alan, you had something. What was yours, Alan? No, no, I got to hear this one because I'm actually more excited about his than I am mine. Alright, fine. I get two tips a week for this one. It's so good. I'll use it next episode, too. But the sequel fiddle... Why not? Alan has.
Starting point is 01:25:02 Ease up off me. So, you've probably seen JS Fiddle around there. It's a nice little website that will let you send JavaScript to other people. Like if they're asking a question, it's a nice example that's actually working. They can run it and see what it does. They can mess with it.
Starting point is 01:25:17 You can send it to other people. SQL Fiddle is the same thing. So on one side, you have an area where you create your schema. You can create tables in your database server of your choice. You can insert some data. And then on the right you have your query.
Starting point is 01:25:30 So if someone wants to see an example of maybe the tree structure that I was talking about with the 0 to 100 and whatever with the managers and employees, you can actually write that out in SQL Fiddle, save it, and send it to someone. They could run queries, kind of experiment with it, and it's just really cool. Yeah, I about lost my mind when you showed this to me because my first question was like, well, what database servers do you use? He's like, well, MySQL. And then you go up there, there's a dropdown, and you can choose from like, I don't know, 12 different servers.
Starting point is 01:25:59 Like Postgres, MySQL, SQL Server, Oracle. It's fantastic. Yeah, it's really cool. And so if you've got a little SQL script, you can probably do this a lot easier. You can test your updates and stuff right here easier than you can create a backup, run it, see if it worked, restore the backup, try it again.
Starting point is 01:26:18 If you've got something weird you need to do. Yeah, that one's awesome. So I guess on to mine. So you don't test your query in production? What? Who doesn't do that? I'm sure you've been in a situation where you've nuked some data in your dev database, and you're like, oh, great.
Starting point is 01:26:34 I've got to figure out how to fill this back in so I can try it again. Not today. Not before lunch. That's why you should always write your where clauses first. Yes. Oh, man man we've all been there all right so uh my tip of the week are object initializers and c sharp uh many of you will know what these are those who don't may end up loving this so an object initializer is
Starting point is 01:26:56 when you go to rather than calling a constructor so if you have if you have a class that has a ton of properties the old school way of doing that is for every permutation that you can think of for for creating this particular instance of this class you would create a ton of different constructors well nowadays in c sharp you don't really need to do that you create a default constructor and it has to be public if it's not public this will fail but you create a default destructor or constructor that will have some pieces of destructor those are the days yeah that's rough some pieces of implementation in there that you need to have happen and then when you call your initializer it's basically going to be a new instance of the class and you're going to
Starting point is 01:27:40 do the curly braces and then you basically just set what you want those member variables to be equal to. It will call the base constructor first and then it'll set up the initialization members for you. And that's it. So it's real easy. Instead of having a bunch of different constructor signatures, you can just have one that does the meat of your work and then just pass in the member variables that you want set.
Starting point is 01:28:03 So we'll have a link up there, but for people who aren't aware of it, that will save you a lot of time in creating constructors just redundantly. Yeah, so with that, I have two I'm going to mention, both keeping in the theme of this episode. So we mentioned previously in part one, poor man's T-SQL formatter. So I thought that that deserved to be mentioned
Starting point is 01:28:31 as a tip of the week. So if you are a fan of Notepad++, poor man's T-SQL formatter is a plugin that you could apply there. And as we mentioned, you could take your query from anywhere else, plug it in there, format it, and it'll make it all nice and pretty for you. Obviously, you're not going to be able to query it from there. But it's a nice way of formatting that consistently if that's something that your OCD requires that you do.
Starting point is 01:29:00 And without using an online tool so that you don't know if people are stealing your sequel yeah you know yeah especially if you don't have the uh the bankroll to buy uh the red gate tools right they're pricey um so so i felt that one deserved but the other one i wanted to mention too is that if you aren't already using it is the sequel profiler tool that comes with SQL Management Studio. And that's a great little tool you can run alongside your app if you're trying to debug, especially for a brownfield application, if you're just trying to trace it to see, like, how bad is this thing before I get started into it, right? And you can just kind of see what that application is doing.
Starting point is 01:29:46 Also, there's a ton of different uses for it you could see like well how bad did entity make that query and you can see so it's a great little utility for actually being able to see the queries that are happening as they happen you know you can see it from not just a statement point of view but stored procedures that are getting called, things like that. So it's a great tool. A quick tip on that is if you are using SQL Profiler in a production environment, you probably don't want it to write to tables on that server because it is going to eat up a ton of IO. So you'll want it to set up to write to your local file system for the SQL trace. Yeah, I guess I should have mentioned that, because I was really thinking that you would use
Starting point is 01:30:28 this locally, not on a production server. Well, you use it locally, but you might be pointing to your production server. I'm sorry, connect to a... I wasn't thinking that you would use it to connect to a production database. Yeah, right. Dude, why else does it exist? How long have you been in this industry
Starting point is 01:30:44 and you've never seen this used on fraud? I'm not saying that people don't do it. I'm just saying what I was thinking when I wrote it. Okay, so from his standpoint, he's just running code and seeing what comes out without having to trace through every step of the code. Oh, yeah, me too. But if you ever do run into a problem where you just can't reproduce it because you don't have the same level of transactions in a development environment this is the tool that you use right i mean what else you're gonna do unless unless you have the bankroll again to get something like uh oh man i can't even think of the the the program sequel uh century sequel centuries one there's some other tools
Starting point is 01:31:20 out there but even then profiler will take it a step further, and you get to see everything. Yeah, I'm not saying that you couldn't. I'm just saying that from a developer's point of view, when I decided to mention this as the tip of the week, I was really intending it to be used from a development point of view in a development environment. Nice CYA. If you want to connect this to your production system, that's your own risk. Yeah, fair enough. So with that, we'll be putting all the links in the show notes.
Starting point is 01:31:51 They'll be available at codingblocks.net slash episode 13. Subscribe to us on iTunes. No, no, no, no. Episode 14. Oh, I'm sorry, 14. Yes, you're right. Thank you for that. So www.codingbox.net
Starting point is 01:32:05 slash episode 14 subscribe to us on iTunes, Stitcher and more using your favorite podcast app and be sure to give us reviews we really appreciate those, those go a long way to helping new listeners find us and us help people
Starting point is 01:32:20 become better programmers well I wouldn't go that far okay well yeah i just took that uh sequel performance quiz i'm not helping anybody on sequel oh really did you take it yep but what you don't want to i got five out of five but it was hard so all right so like i was saying if you could give us a review, we would greatly appreciate it. Yeah, where are we at? So yeah, visit us at CodyBlocks.net where you can find the show notes, examples, discussions, and more. Yeah, and send us your feedback, questions, and rants and comments to comments at CodyBlocks.net.
Starting point is 01:33:00 And make sure you follow us on Twitter. All right, that's a wrap

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.