Algorithms + Data Structures = Programs - Episode 10: snake_case vs camelCase (Naming - Part 3)

Episode Date: January 29, 2021

In this episode, Bryce and Conor complete the naming trilogy and talk about some of the most important questions in tech - indicated by the title.

Date Recorded: 2021-01-27
Date Released: 2021-01-29
CppC...ast Episode with Guy Davidson
Conor’s tweet as Guy about predicates
std::vector::empty()
std::is_empty()
std::filesystem::is_empty()
cudf::device_span
Ruby each_cons
Ruby each_with_index
Ruby 3.0 Static Typing
Crystal Programming Language
Crystal each_cons
Crystal each_with_index
snake_case, PascalCase, camelCase & kebab-case
Rename concepts to standard_case for C++20, while we still can
Julia Unicode Input
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Starting point is 00:00:00 We're not, we can't have a 60 minute episode. I think that was great. I think that all should be part of this episode. Naming. Part three. Welcome to ADSP The Podcast, episode 10, recorded on January 27th, 2021. My name is Conor, and today with my co-host Bryce, we're going to be wrapping up part three of our three-part series on naming. So let me ask you about this. So one thing we didn't talk about last time that I made a note
Starting point is 00:00:46 about was ordering in names. So let me give you a few examples of that. So the first is like noun underbar verb versus verb underbar noun. Like as an example, do you call a function apply underbar policy or policy underbar apply? And are those two even necessarily the same thing? And what really gets me here is I sometimes see codebases that do this inconsistently. Like they have one function in their API that follows the noun underbar verb, like policy underbar apply. And they have some other function that follows the verb underbar noun pattern. And I don't particularly care which of those two you follow, as long as you're consistent. I have a slight preference for phrasing it as something like verb underbar noun, because I like my identifiers and my code to sort of read as like plain English.
Starting point is 00:02:05 So like something like apply policy, like what does it do? It applies the policy. So do you have any thoughts on that? Well, so this is, this is great because this is like completely adjacent to one of the things I wanted to talk about.
Starting point is 00:02:20 So to first answer your question, I've never actually thought about this in terms of noun underscore verb versus verb underscore noun. But in thinking how I would just write code, I think if I were to go look back, it's always verb underscore noun. Especially because, like, I started my career in a mathematical domain, actuarial science. So there was a lot of, like, calculate underscore reserve, or calculate underscore premium, or determine underscore, you know, insert something. So I think, inevitably, yeah, if I look back, it was almost always verb underscore noun. But taking a step back, one thing that I thought was great, because I hadn't thought about it at the time, I heard it from Guy Davidson on a CppCast episode, is the idea of naming functions as verbs or nouns based on whether they are modifying or non-modifying functions. So there's a very classic pattern in Java and C++, and I've even seen it in like Herb Sutter's talks
Starting point is 00:03:28 where he shows some reflection example of where we're generating methods for you, where you generate these getters and setters. So you have, you know, you have get value and set value. And Guy's point was that you shouldn't ever need to prefix something with get underscore. If you're just retrieving something, you're not modifying. If it's a non-modifying method, just drop the get.
Starting point is 00:03:52 And anytime you have a modifying method, you should be using a verb. Amen to that. There's nothing I hate more than seeing a function that starts with get, you know, under bar. It's just unnecessary symbols. So when it comes to verbs and nouns, when you started talking about that, I was like, oh, is Bryce about to make my point for me? So this is not mine, I just adopted it, you know,
Starting point is 00:04:19 the first time I heard it was Guy. He probably heard it from someone else, but I really liked that. Don't judge Guy like that. For all you know, Guy was the brilliant genius that invented this. It's true. Well, we'll give Guy credit even if he did... we'll see. Yeah, maybe Guy will listen to this. We'll @ him on Twitter and he can tell us if he heard it from someone or he made it up, because I think it's a great rule. And Guy, for the record, I am choosing to assume that you made it up and Conor's choosing to assume
Starting point is 00:04:48 that you just took it from somebody else. Sure, sure, we'll stick with that. But on that episode, he actually made another point, about how you should prefix predicates, a term I had never even heard of at the time, with is and has. And so I think I asked him, you know, can you define predicate? And that's just a function
Starting point is 00:05:11 that returns a Boolean. And whenever you have that, you should prefix it with is or has. Or I think what ended up happening is that I said, what about functions like is even? Because technically is is a verb. And he said, oh, any function that starts with an is or has is a predicate, and that's an exception. Anyway, so this gets back to sort of the noun versus verb. So you can comment on everything I just said. But I think putting the verb first makes it more clear that the whole thing together is a verb.
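The naming rules discussed above (plain nouns for non-modifying accessors, verbs for modifying functions, is/has prefixes for predicates, and verb underscore noun for free functions) can be sketched in C++ roughly as follows. This is only an illustration: the `policy` class, its members, and `calculate_reserve` are hypothetical names invented here, loosely inspired by the calculate underscore reserve example, and are not from any real codebase.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical insurance-policy type, invented for illustration only.
class policy {
public:
    policy(std::string holder, double premium)
        : holder_(std::move(holder)), premium_(premium) {}

    // Non-modifying accessors: plain nouns, no get_ prefix.
    const std::string& holder() const { return holder_; }
    double premium() const { return premium_; }

    // Modifying member function: named with a verb.
    void apply_discount(double fraction) { premium_ *= (1.0 - fraction); }

    // Predicates (functions returning bool): is_ / has_ prefix.
    bool is_free() const { return premium_ == 0.0; }
    bool has_holder() const { return !holder_.empty(); }

private:
    std::string holder_;
    double premium_;
};

// Free function in verb_noun order; it has no side effects.
inline double calculate_reserve(const policy& p, double factor) {
    return p.premium() * factor;
}
```

At a call site this reads close to plain English: `p.apply_discount(0.1)` changes the object, `p.premium()` only reads it, and `p.is_free()` asks a yes/no question.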
Starting point is 00:05:43 When you're saying, you know, calculate reserve, really the key in those two words is calculate. And if it's actually a free function, so it's not modifying or non-modifying, ideally it's a pure function without side effects. And in that case, actually, I don't know what I prefer. Do I prefer calculate reserve or reserve? Because if you call it reserve, you just pass it the arguments. I don't know. So that question about the predicate, or that remark about the predicate function, is actually interesting.
Starting point is 00:06:16 And that's one of the few patterns that's actually in the proposed recommendations for standard library design guidelines. But as I mentioned a little earlier, one of the conclusions we came to when considering C++ library design guidelines in the C++ library design group last week was we want to try to avoid decorating names whenever it's not necessary. And by decorating names, I mean adding prefixes or suffixes just because a function or an identifier is some class of thing, sort of similar to the Hungarian naming convention where you embed the type of the thing into the name. But one of the ones that we'd actually suggested did make sense was, you know, to have the is under bar for predicates. One of the examples that somebody pointed out was the empty method on a lot of the standard library containers. It's kind of
Starting point is 00:07:34 ambiguous whether calling vector.empty empties the contents of the container or tells you whether it's empty or not. And the standard library is actually like a little bit inconsistent here. So as an example of that, C++'s std future has a valid method. Now that one's pretty unambiguous, you know, that that tells you whether it's valid or not, unlike empty, which is a bit more ambiguous. But the concurrency TS proposes some additions to std future. And one of those additions is an is ready function.
Starting point is 00:08:12 And I just find it amusing that, you know, I could understand inconsistency between different classes, but it really tickles me that there's a proposed inconsistency within this same class, that we would have is ready, which would be one predicate, and valid, which would be another. And I am inclined to say the world would have been better if std vector had called it is empty. So maybe that's one of those rare cases where the decoration is really truly useful. So I thought that it was a widely acknowledged fact that empty was a mistake and that going forward we were always supposed to use is empty. I can't find it right now, but I know that there's two examples in the standard that have actually started using is empty. One of them is std file system.
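To make the empty versus is empty ambiguity concrete, here is a small C++17 sketch. The namespace `naming` and both functions in it are hypothetical helpers written for this illustration; they are not part of the standard library. The only standard facilities assumed are `std::vector::empty()`, `std::vector::clear()`, and the C++17 free function `std::empty` from `<iterator>`.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

namespace naming {

// Hypothetical helper: gives generic code one is_-prefixed spelling by
// forwarding to std::empty, which works for standard containers and arrays.
template <typename Container>
bool is_empty(const Container& c) {
    return std::empty(c);
}

// Shows that vector::empty() is only a query; clear() is the call that
// actually removes the elements.
inline bool empty_is_only_a_query() {
    std::vector<int> v{1, 2, 3};
    const bool was_empty = v.empty();  // query: v still holds three elements
    const bool kept_elements = (v.size() == 3);
    v.clear();                         // this empties the container
    return !was_empty && kept_elements && v.empty();
}

}  // namespace naming
```

A reader who sees `v.empty();` on a line by itself cannot tell from the name alone that nothing happens, which is the argument for the is prefix here.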
Starting point is 00:09:06 The other one I can't find right now. You're absolutely right that that is the policy going forward, and that's what we were discussing last week, codifying it into an actual policy document. What was interesting, though, is that the other day I was code reviewing someone's code, and we have, I think it's in the cuDF repository, the open source library I work on, a device span, and it had an empty method. And whenever I see that, I comment,
Starting point is 00:09:35 oh, let's prefer is empty. And I have some links somewhere where I link to, you know, this is a widely regarded mistake and that going forward we should use is empty. And I think it was- Although if we're going to do that, we should probably give all of the C++ standard library things that currently have an empty an additional is empty, so that people can consistently just use is empty. You know, it's a pain if you have to remember, hmm, does this thing have an
Starting point is 00:10:06 empty or an is empty, and then it becomes even more of a pain in generic code. That's true. So yeah, it would be nice to get those. But where I was going with that is that the person, I think it was Vukasan, I hope he doesn't mind me shouting out his name, his response was, oh, we're keeping it consistent with std span. And I was like, wait, what? Span added empty? I thought we were not doing that anymore. Well, span probably did it for the reason I just mentioned, which is span probably called it empty and not is empty
Starting point is 00:10:36 for consistency with the existing things, yeah. See, this is the problem with consistency. I always say, and this is sort of like Bryce's law, I'd rather be consistently wrong than inconsistent. That's the idea. You know, why does span have empty and not is empty? Because it would rather be consistently wrong. Like, I'm not going to have any flexibility of thought on my opinions on matters. I'd rather just stick to my guns and be wrong than eventually be inconsistent and change my mind. That sounds like an awful law. I am happy with my law as stated.
Starting point is 00:11:41 Are you sure? I'm going to give you a couple episodes to think about whether you want this to be your law. I will think about that. So I have another story, for which I'll give a shout out to my coworker, John Bichon, who was actually my coworker at NVIDIA and also my coworker at Lawrence Berkeley National Lab, where we used to work. When we were both at Lawrence Berkeley, he was working on this asynchronous runtime system. And whenever you have such a system, and I worked on one myself called HPX, you tend to have some set of functions that are asynchronous and that have some async decorator on them. And in the codebases I had worked with, the async always came at the front.
Starting point is 00:12:27 So you'd have async underscore reduce or async underscore transform, et cetera. But in John's codebase, if I recall correctly, he would call it reduce underscore async, transform underscore async. And I one time asked him, hey, why do you put it at the end, not the start? And he told me that he wanted the most specific thing to be first. So what's the most specific thing of an asynchronous reduction algorithm? Well, it's of the family of reduction algorithms. That's more fundamental than it being a member of the family of things
Starting point is 00:13:27 that are asynchronous. So he sort of made an argument that the ordering of things in the name said something about the grouping or the categorization of those things. And I've also heard this claim made for things like code completion, that if you call it async underscore reduce and async underscore transform, then when you type async underscore, you're going to get almost everything in your library. Whereas if you do it the other way around, then if you type reduce, you'll get auto-completion for reduce, just the synchronous version, and reduce underscore async, and whatever other functions you have that are some form of reduction. And so I thought that was a very interesting notion, like having namespaces, but even just within the identifier itself.
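The suffix ordering described above can be sketched in C++ like this. The names `sketch::reduce` and `sketch::reduce_async` are hypothetical and are not taken from HPX or from the codebase mentioned in the conversation; the asynchronous version simply uses the standard `std::async` to run the synchronous one on another thread.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

namespace sketch {

// Synchronous member of the "reduce" family: sums the values.
inline int reduce(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Asynchronous member: the family name comes first and the async decoration
// last, so an alphabetical listing (or completion on "reduce") groups the two.
inline std::future<int> reduce_async(std::vector<int> values) {
    // The vector is taken by value and moved into the task so the task owns
    // its data and cannot outlive its input.
    return std::async(std::launch::async,
                      [v = std::move(values)] { return sketch::reduce(v); });
}

}  // namespace sketch
```

With a prefix scheme the same pair would be `reduce` and `async_reduce`, which sort far apart; whether that matters depends on how much a team leans on completion and alphabetical API listings.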
Starting point is 00:14:35 Yeah, no, I've actually spent a lot of time thinking about that, because there are examples where it's an amazing design for that exact reason: you end up with a list of algorithms that are alphabetically sorted, and then you see, you know, zip underscore, zip underscore, zip underscore, and you're clearly like, oh, okay, these are all doing roughly the same thing. However, you can go too far with it in some cases. So in Ruby, there is a collection of algorithms. I just looked it up. So they each start with each underscore: each cons, each entry, each slice, each with index, and each with object. And I don't know what all these do, but if I were to ask you, what do you think each underscore cons does? Like, do you think you could guess?
Starting point is 00:15:23 And our listeners can play along if they want. And the point that I'm about to make here is that, yes, they've put these in a family, you know, they put them in a group, but they've completely, in my opinion, obfuscated what these do, and the same-named algorithms in other languages are completely different. Interesting. In fact, I'll give you a hint about each cons. The cons is a reference to an old language. Yes. So it's like a reference to Lisp, right? Like to the operation where it takes the tail of a list? Well, cons is basically your procedure that creates a pair. So in Lisp, your linked list is just a bunch of conses.
Starting point is 00:16:12 Oh, yeah, yeah, yeah. But so I believe, and I can double check just to make sure that I'm not wrong, that with each cons, you pass it an integer, and it gives you the sliding windows of that length. So each cons two will give you adjacent pairs, and each cons three will give you a sliding window of length three where the step is one. And so this, in most languages, is called sliding or windows or windowed. You know, Rust, I think, just added a couple different versions of this in their 1.48 or whatever the version is that they're on. But yeah, so I completely agree that there are times where
Starting point is 00:16:51 prefixing everything with the same prefix is a really good way to have people explore the language and discover, oh, look, there's a very similar algorithm I could use. In other cases like this, I think each with index isn't bad. I think that's just basically, it's called... Yeah, that makes more sense. It's called different things in different languages. The most common name is enumerate from Python, where you just bundle each element with an index.
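For readers who have not met these two algorithms, here is a minimal C++ sketch of what they compute, using the more common names mentioned above. `sliding` and `enumerate` are hypothetical helpers written for this illustration, not standard library functions, and they return copies rather than views to keep the code short. Returning no windows when `n` is zero is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// All windows of length n with step 1 (what Ruby calls each_cons).
// Yields nothing when n is 0 or larger than the input.
template <typename T>
std::vector<std::vector<T>> sliding(const std::vector<T>& xs, std::size_t n) {
    std::vector<std::vector<T>> windows;
    if (n == 0 || n > xs.size()) return windows;
    for (std::size_t i = 0; i + n <= xs.size(); ++i) {
        // Copy the n elements starting at position i into a new window.
        windows.emplace_back(xs.begin() + i, xs.begin() + i + n);
    }
    return windows;
}

// Pairs each element with its index (each_with_index, or Python's enumerate).
template <typename T>
std::vector<std::pair<std::size_t, T>> enumerate(const std::vector<T>& xs) {
    std::vector<std::pair<std::size_t, T>> out;
    for (std::size_t i = 0; i < xs.size(); ++i) out.emplace_back(i, xs[i]);
    return out;
}
```

For the input 1, 2, 3, 4, `sliding(xs, 2)` produces the adjacent pairs (1, 2), (2, 3), (3, 4), which matches the description of each cons two.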
Starting point is 00:17:17 But yeah, in this case, I think they went too far. And it's really unfortunate because Crystal, which is actually a language I think I first heard about from you, it's basically Ruby statically typed, although I think Ruby actually just got static typing in like Ruby something point something. It's, yeah. Thank you for that very informative comment.
Starting point is 00:17:39 Ruby static typing. 3.0, I'll guess. The state of Ruby 3 typing. I'll find out if it's been added and add it to the show notes. I could be mixing up Scala 3 and Ruby 3. Because Scala 3, otherwise known as Dotty, is like... See, folks, this is why you shouldn't learn... This is why you should have learned only a few languages,
Starting point is 00:18:03 not a lot. Because then you don't get them mixed up. Touche. Fair point. Anyways, what was my point here? All right, Crystal is a statically typed version of Ruby that is supposed to have the performance of C. But they basically just took all the algorithm names from Ruby verbatim. Whereas Elixir, which is basically like Ruby plus Erlang, they looked at all the algorithms, took the good ones, and then renamed the bad ones.
Starting point is 00:18:31 So they didn't take any of the each underscore methods. They renamed them to, like, chunk and better names, which I think is the better approach: learn from other languages' mistakes, copy what they did right, and then, you know, iterate. Yeah. What do you think about naming styles? Well, so is this like snake case versus Pascal case versus... case? What is kebab case? I don't know what kebab case is. Yeah, well, kebab case is adjacent to... Okay, let's answer this question. So we've got snake case, Pascal, camel, and kebab case, which are like the four main ones.
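As a quick reference for the four styles just named, the sketch below converts a snake_case identifier into PascalCase (every word capitalized), camelCase (every word except the first capitalized), and kebab-case (words joined by hyphens). All function names here are hypothetical helpers written for this illustration, and the code assumes plain ASCII identifiers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a snake_case identifier into its words, skipping empty pieces.
inline std::vector<std::string> split_snake(const std::string& name) {
    std::vector<std::string> words;
    std::string current;
    for (char c : name) {
        if (c == '_') {
            if (!current.empty()) words.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Uppercase the first character of a word (ASCII only).
inline std::string capitalize(std::string word) {
    if (!word.empty()) {
        word[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(word[0])));
    }
    return word;
}

inline std::string to_pascal_case(const std::string& snake) {
    std::string out;
    for (const auto& w : split_snake(snake)) out += capitalize(w);
    return out;
}

inline std::string to_camel_case(const std::string& snake) {
    std::string out;
    const auto words = split_snake(snake);
    for (std::size_t i = 0; i < words.size(); ++i) {
        out += (i == 0) ? words[i] : capitalize(words[i]);
    }
    return out;
}

inline std::string to_kebab_case(const std::string& snake) {
    std::string out;
    for (const auto& w : split_snake(snake)) {
        if (!out.empty()) out += '-';
        out += w;
    }
    return out;
}
```

Note that the kebab-case result is not a valid C++ identifier, since the hyphen parses as a minus sign, which is one reason the style is limited to languages such as Lisps.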
Starting point is 00:19:13 There's a couple other variants, but I think those are the four main ones. I think most people have heard of snake case. It's the underscores in between words. Pascal case, the start of each word is capitalized. Camel case is Pascal case, but the first word is lowercase. And kebab case, for any Lispers out there, they'll know what kebab case is. It's where you hyphenate the words. I actually like kebab case quite a bit, although it does not work in a number of languages. It definitely doesn't work in a number of languages. But I think snake case
Starting point is 00:19:45 is my preference. I really strongly dislike camel case. I just think it's super unreadable, and it is jarring to me for the first word to not be capitalized. And I just want to understand why. Why should the first word not be capitalized but the rest of them should be? Pascal case I can be okay with, because it's consistently capitalized at the first letter of each word. That's okay. But no, I'm not okay with camel case. I do not share... I mean, I'm surprised, actually, that you have such, uh, not strong feelings about formatting, you know, you can do without, but then for that first letter in camel case, that's where you draw the line:
Starting point is 00:20:40 either capitalize or don't capitalize. Yeah. Is there a style where you don't use underscores but you don't capitalize anything, where you just mush everything together? Does that have a name? All lowercase? I don't know. We'll let our listeners tweet at us if that's a thing, and we'll mention it. What do you say? So in C++20, in the C++ standard for many years, the names for things that were the moral equivalent of concepts... we didn't have concepts in the language, but we had names for these sets of requirements. Those names always used Pascal case, which was nice because it made them distinct from the standard library identifiers, which all used snake case. And when we added concepts, we were originally going to add
Starting point is 00:21:34 the named concepts in the standard library as Pascal case, following the precedent of the named requirements that had been in C++ for many years, like copy constructible or random access iterator. Those names would have been spelled in Pascal case. But then a fairly late decision was made to make those concept names be snake case, which has led, I think, to some unfortunate ambiguities. So for example, std colon colon iterator is the name of a concrete type from like C++ 03 or C++ 11, whereas std colon colon random access iterator is the name of a concept. Whereas
Starting point is 00:22:29 with the Pascal casing, you had this, it was very visually clear when something was a concept versus a type because the concepts were in Pascal case and the concrete types were in snake case. But we decided we didn't like that. And I personally think it was a mistake. I don't know, what do you think about it? Do you think it's confusing that the concepts use the same naming pattern or the same naming style as types? I did say earlier in this discussion that, you know, I'm not a fan of, you know, these decorator type patterns where you have a prefix or a suffix or a particular style for things. But I guess in this case, I felt it would have been a good idea to have concepts be visually distinct from types
Starting point is 00:23:17 in the standard, in the C++ standard library. Yeah, I mean, I feel like that paper and that decision got made at either the first or the second committee meeting that I went to. I didn't spend most of my time in LEWG, but I did sit in on a couple of the discussions. I did not have anywhere near as strong an opinion as the people in the room. I think the biggest argument, if I recall, against mixing them was like, this was breaking precedent for the first time ever. And it's-
Starting point is 00:23:53 Although, you could argue it was going to break precedent either way, because it broke the precedent of the existing naming pattern. There was a thing that was referenced in the standard called random access iterator in Pascal case. It wasn't a programmatic concept because those didn't exist yet.
Starting point is 00:24:13 But the name of it now is inconsistent with what we used to call this set of requirements. So I think, and I wasn't in the room at the time of this discussion because I was chairing a different room, but if I had been in the room, I would have made the argument that there was going to be an inconsistency either way. And we should have gone with the one that was going to be less ambiguous and that was going to follow what the existing name of these things
Starting point is 00:24:38 was. Well, so although what you say is true, I think the argument was coming from trainers, or people whose primary occupation is teaching C++ or who are very exposed to it. And you mentioned that's the way it is in the standard. I don't think the majority of C++ developers are working their way through the standard. And then you can make the argument, well, it's not just the standard. It's also in EoP, Elements of Programming. I don't think it was just in the standard. If you go and look, cppreference uses it. I think there's a lot of code,
Starting point is 00:25:16 there's a lot of pre-C++20 code that referred to some of the core concepts from the standard using the Pascal case naming that the standard uses. I'm sure there's a lot of code out there that refers to, this type is copy constructible, Pascal case. So, you know, yes, you can make all those arguments, but I think the argument, and this is the thing, I'm doing a poor job because I'm trying to make someone else's argument for them. But I think their point was like, this is different. And this is going to be one more thing that I have to teach, that people, you know, whether it's students, or whether it's professionals that are, you know, getting training,
Starting point is 00:25:55 or just learning the language for the first time, they're going to ask, you know, oh, why is this different? And it's a short answer: it's a concept. But I think from the educator's point of view, if there's one thing that they don't need to teach, or that they don't need to add to a bullet list of points of, you know, insert why is this different, that's preferable. But now they do have to teach people that std colon colon iterator isn't the name of a concept, even though std colon colon random access iterator is the name of a concept. And a bunch of other little cases like that. Like, I don't remember what it was for.
Starting point is 00:26:33 There was another one around Boolean that was very odd. Yeah. Anyway, so we should get someone on that voted for it, and then you can debate them. But yeah, we have to go. But the last thing I want to say before we wrap this episode up is that, the same way that Lisps have a very sort of unique kebab style for some of the Lisp dialects, I actually think that Lisps have a feature that I wish a lot of other programming languages had when it comes to the naming of predicates. Do you know how you denote a predicate function in a Lisp, or in most? I don't. You use a question mark, which I actually really, really like, because instead of having to say is empty or has feature, which don't share the same prefix,
Starting point is 00:27:28 all of the predicates end in a question mark. So instead of is empty, it's just empty question mark. And then if you have, what's another one? Is even. So yeah, even and positive are not like is even or is positive. It's just even question mark, positive question mark. And I think that's super, super nice because there is actually some cases where is and has, you know, the is underscore and has underscore aren't the right prefixes. And then you sort of have to come up
Starting point is 00:27:57 with some awkward prefix that doesn't really work great. But if you can just add a question mark on the end, it becomes immediately clear, oh, this is just a predicate and I can use something simple. That's interesting. Yeah. I think that it would be interesting to see what a language like C++ would have looked like if we could have had a wider set of characters in identifiers. Like if we could have had question marks in identifiers or exclamation points in identifiers, what styles would that have led to?
Starting point is 00:28:33 Because I know languages that do allow question marks or exclamation points in identifiers, you know, those are often used to denote certain types of functions, etc. Like, I generally think that languages benefit greatly from having as many characters in the identifier set as possible without, of course, restricting, you know, what you have available as operators, etc. Yeah, Smalltalk is a language where you can define binary operations as any combination of a set of like 20 different ASCII characters. So you can come up with some really, really cool things that, yeah, you can't do in C++. You're limited by the set of binary operators that are overloadable in C++, of which we do have a bunch, but you can't just like
Starting point is 00:29:22 create one out of thin air that didn't exist before, whereas in other languages you can. Yeah, I'm always jealous of languages that can invent, you know, operators like that. I think it's a very powerful feature. It's one that I often wish I had access to in a language like C++. Julia, actually, I just learned, you basically can define any Unicode operator that you want, because it's a scientific language, and so in a lot of domain-specific applications they want to have, like, a logical and, that's, you know, the upside down V, and a logical or. With a lot of C++ compilers, you can get away with using Unicode in your function names,
Starting point is 00:30:07 not for operators, but in function names. And so there are some shenanigans you can get away with there. If you follow JF Bastien on Twitter, you're probably familiar with this. Yeah. I've also, I think it was in a Meeting C++ quiz, I saw it was just smiley face emojis. It was, you know, what is this output? Yeah, I remember that too. We were there together.
Starting point is 00:30:31 That was 2019. Man, you remember that? You remember that? That was a crazy month. We were on the road for like six weeks. That was when we went to... You were on the road longer than I was. Yeah, that was... I miss those days.
Starting point is 00:30:49 I miss those days. How long has it been since I've been on a plane? It's almost, I think, 11 months or 12 months now. Yeah, it's got to be, yeah, because it would have been Prague last year that we went on a plane. Oh, man, I miss it. I miss it. I miss it. It's all right. The vaccine.
Starting point is 00:31:08 My partner just got her second vaccine yesterday. Oh, awesome. My grandfather and my dad are getting their second dose this week. My grandmother finally got hers. So now all of the people over 80 in my immediate family have gotten one, so I'm happy about that. Yeah, it's awesome. It's happening slowly, but hopefully in 2022 we'll be able to meet up in person. That brings our naming trilogy to an end. We hope you have a great day and we'll see you in the next episode.
