CppCast - Pattern Matching

Starting point is 00:00:00 Thank you. Maker of intelligent development tools to simplify your challenging tasks and automate the routine ones. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice: CLion, ReSharper C++, or AppCode. Use the coupon code JetBrainsForCppCast during checkout. In this episode, we discuss initialization syntax in C++. Then we talk to Michael Park from Facebook. Michael talks to us about his proposal to add pattern matching for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how's it going today? I'm all right, Rob. Made it to 201. Made it to 201. It's 201. It's gonna take a little

Starting point is 00:01:45 getting used to, to start saying. Can you count that high? Yeah. I wonder if it's gonna be another big milestone when we hit a full byte. Is it 256 or 255? 255. Yeah. Well, we should have started counting from zero. I really wish that I had started my videos from, like, zero, and counted in, like, octal or something. That's what I wish I had done. Yeah, that'd be nice. Yeah. It's too late now. Too late.

Starting point is 00:02:14 Well, at the top of the episode, I'd like to read a piece of feedback. We got a bunch of tweets last week after we released episode 200. This week, this one is from Luke Trevorrow, saying: episode 200 of CppCast with Herb Sutter was a joy to listen to. Congratulations to Rob and Jason for making it to 200 quality podcasts. Don't ever give up. Which is a reference to us talking about how much longer we'll keep this thing going. But yeah, as long as there's C++ news to talk about, we don't have any plans to end it anytime soon. Is it fair to say 200 quality episodes? Could we say it's at least 170 quality episodes or something?

Starting point is 00:02:56 We were definitely still learning early on, but I would hope to think that the vast majority of them have been quality episodes. Yeah. Okay. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes. Joining us today is Michael Park. Michael is a software engineer at Facebook working on the C++ libraries and standards team.

Starting point is 00:03:22 His focus for C++ is to introduce pattern matching to facilitate better code. Michael, welcome to the show. Hey, thank you. Thank you for having me. So how long have you been at Facebook? I started April last year, so like a year and a bit. Okay. I feel like we've had many Facebook developers on here over the years.

Starting point is 00:03:43 Oh, cool. Victor went on recently, I know. Yeah, and you always have a pretty good presence at conferences from what I've seen, too. Yeah. CppCon especially, I think we have a lot of people at the con. Right. Well, Michael, we've got a couple news articles

Starting point is 00:03:58 to discuss. Feel free to comment on any of these, and we'll talk more about pattern matching, okay? Great. So this first one is a blog post from Timur Doumler. I know I've met him, from JetBrains, but I have trouble with that name. And this is a post, Initialization in C++17, which comes from a talk he gave, and he just had a lot of requests for this one slide he made on all the different types of initialization that you can do in C++17. And it's quite a little chart, isn't it? It is, yes. And you've definitely met him, because he goes to all of the C++ conferences

Starting point is 00:04:40 without realizing it. I think I've met him in like five different countries now or something. I know we talked to him along with Anastasia and Phil at CppCon last year. Yeah, both of whom I've also seen in five other countries. Yes. But yeah, is there anything you want to call out with this chart, Michael, Jason? 42 entries. I noticed that. That's interesting. I did not notice that. That's funny. That's funny.

Starting point is 00:05:13 Yeah, I think it's a useful chart for people. I think that some of the entries actually kind of are confusing to me. Like the box for default initialization for aggregates, for example, it says uninitialized. But I mean, like, if we have, you know, a struct x with a string in it, and you default initialize that, that's not uninitialized. It's going to default construct the members, which is exactly what it says for the other types with no user-provided default ctor row. Under default init, it says members are default initialized.

Starting point is 00:05:57 So that seems to be the exact same case for aggregates for default initialization. So I'm not exactly sure what's going on there. So that's, I think the point is if you had like a struct of three ints and that first column, all their values are undefined. Right, that's true.

Starting point is 00:06:14 But it's very much dependent on what's inside the aggregates, right? Yes, I would agree. So it's kind of interesting. If it said aggregates with default, aggregates with built-ins only or something like that... Or trivial aggregates or something like that, maybe? Yeah, something like that.
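To make the aggregate point concrete, here is a minimal sketch, assuming C++17. The struct and function names are hypothetical and chosen only for this illustration; they do not come from the chart or the episode. It shows that default-initializing an aggregate default-constructs a class-type member, while built-in members are only zeroed when the aggregate is value-initialized with braces.

```cpp
#include <cassert>
#include <string>

// Hypothetical aggregates, named only for this illustration.
struct WithString { std::string s; };  // member has a default constructor
struct ThreeInts  { int a, b, c; };    // built-in members only

// Default-initialization: the std::string member is default-constructed,
// so reading it is well-defined and it is empty.
inline bool default_init_gives_empty_string() {
    WithString x;  // default-initialized aggregate
    return x.s.empty();
}

// Value-initialization with {}: the built-in members are zeroed.
inline int value_init_sum() {
    ThreeInts t{};   // a, b and c are all 0
    // ThreeInts u;  // default-initialized: a, b and c would be indeterminate
    return t.a + t.b + t.c;
}
```

So whether "uninitialized" is accurate for a default-initialized aggregate depends on the member types, which is the distinction being drawn in the conversation.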

Starting point is 00:06:32 But it just says aggregates, so it's kind of like, I don't know what the... I think I understand what it's trying to say, but maybe it's just slightly inaccurate or something. Yeah, and it also says copy initializations for aggregates doesn't compile, but if I have a struct x with an int

Starting point is 00:06:52 in it, I can copy initialize one just fine. I'm not sure what's going on there either. I have a few questions, but in general I think it's a useful table for people. I feel like it must just be like what could fit in the square because like if you had a struct of an int and you do

Starting point is 00:07:10 equals five, because he has, you know, equals value there, that wouldn't compile, right? Because it doesn't have a constructor or a conversion operator or whatever that it could call. I guess that's what he means. Yeah, I think that must be the point. Yeah. But initialization is a crazy area that I actually kind of stayed away from for a while, just because I don't want to see all the stuff under the bed. But I see common mistakes, like people initializing an array with curly brace, zero, close curly brace, thinking that that zero-initializes the whole array. And it happens to work, right? Because it'll take the zero value for the first element and then zero-initialize the rest. But if people put open curly, one, close curly, thinking that it initializes all of the values to one, that is not what's going to happen.
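The array pitfall can be checked with a short sketch, assuming C++17. The function names are hypothetical. The first function uses the single-element brace form; the second shows one way to actually set every element to one.

```cpp
#include <array>
#include <cassert>

// {1} sets only the first element; the remaining elements are zeroed.
inline std::array<int, 4> braces_one() {
    int raw[4] = {1};
    return {raw[0], raw[1], raw[2], raw[3]};
}

// Filling explicitly is one way to get every element set to 1.
inline std::array<int, 4> all_ones() {
    std::array<int, 4> a{};
    a.fill(1);
    return a;
}
```

The same rule explains why `{0}` appears to work: the first element is set to zero explicitly and the rest are zeroed anyway.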

Starting point is 00:08:06 Right. Initialize the first value to one and the rest of them to zero, right? Zero. Exactly. Yeah. And so the curly zero, uh,

Starting point is 00:08:14 close curly is very misleading as to what's actually happening there. Yeah, that's a great point. I see that all the time. I think I will use this chart in my classes. I mean, just tell my students that this chart exists. But when it comes up sometimes in classes and teaching, I'm like, yeah, there's like 17 types of initialization in C++, but we don't really care. These are the

Starting point is 00:08:39 three rules that I want you to pay attention to. Yeah, yeah. I mean, we try to shove it under some sense of intuition, right? Right. Hopefully it comes out okay, but all the nitty-gritty is pretty ugly. Yeah, that's true. Okay, the next thing we have here is another update from Visual Studio 2019, and this is CMake 3.14 and performance improvements. So we've talked about this a fair bit, how they, you know, keep improving their CMake experience with Visual Studio. And with this one, it seems like the main improvement is that they upgraded to CMake 3.14. And with that, they can now use the file-based API. Jason, are you familiar with what that gives you? I am not. I read this and at first I was like,

Starting point is 00:09:26 wait, where are these performance improvements coming from? Oh, that's because of a change in CMake. But then I noticed that they said CMake had deprecated the CMake server, which I thought was interesting, because there was a very brief period where all of the build tools and compilers were like, we're going to stand up our own servers, and everyone can talk to the server and get all the information that they want. And then they're like, never mind. So apparently CMake server is no longer being used, and now there's this file-based API that I'm not familiar with. Neither am I. Okay, well, it seems like it's a good move, and, you know, if you're getting performance improvements out of it,

Starting point is 00:10:05 it's definitely a win. Well, yeah, we didn't actually... I don't think you've said it out loud. It's a two times performance improvement for CMake-based configuration. Yeah. It looks like they tested it on the LLVM code base. Which counts.

Starting point is 00:10:20 I mean, that's a large CMake project, is what I'm trying to say. Yeah, yeah, yeah. It's not like testing it on my toy project or something. No, no, absolutely not. Yeah, I'm sure it's a good test case. Yeah, I think it would actually be even more impressive if it was on a small project, because I think oftentimes it's harder to get big wins on small projects. All right. I mean, on the upside, a small project doesn't have a measurable time

Starting point is 00:10:45 when uploading CMake hopefully yeah but it seems pretty impressive yeah okay and then the last thing we have is Compiler Explorer released a beta version of code execution

Starting point is 00:11:01 support so you can now actually see the code you put into Compiler Explorer and see what the output would be from that code when you run it, in addition to the assembly and all the other good stuff that Compiler Explorer already did. Yes. I was going to say, for background, for people who

Starting point is 00:11:17 watch C++ Weekly, I've been showing this support for years. Because if you run it locally, you've been able to do execution with no problem. But Matt just had the time and opportunity to be able to, with some level of, you know, assurance that it's not going to take down the whole server, put this up on the official website now. Gotcha. Let's see. Okay. Well, Michael, can you start off by maybe giving us an intro to what exactly pattern matching is? I'm sure we've, you know, referenced it a couple times in talking about

Starting point is 00:11:52 other languages on the show, but I don't think we've ever gone in depth with it. Cool. Yeah, so pattern matching basically is a declarative approach to inspecting a value or values, rather than using something like an if-else chain where you would poke into a given value to check its state. So for example, if you have a point class, and you were to check whether this point is at the origin, you would often end up writing something like if point.x equals zero and point.y equals zero. And so we have some value, and we're reaching into that value to check some conditions, right? And oftentimes, these conditions are capturable in what we refer to as a pattern. And so a pattern generally is a description of a value.

Starting point is 00:12:51 And patterns, they either match or they don't match. And so in this case, we would introduce a pattern like square bracket, zero comma zero, close square bracket, which would capture the idea of first element zero, second element also zero. And this also has to be a two-tuple kind of thing. And so you can view pattern from... So similar terminology comes from regex, right? Where you have the regex pattern, which is a description of the string that you're looking for.

Starting point is 00:13:21 And then we just say, oh, given some arbitrary string, see if this pattern matches that string. And then if it matches, then you can actually specify which components of that string you want, which will then give you your regex matches, which is your capture groups, essentially. And so we can do something similar, generalized beyond strings, where we can say square bracket, 0 comma x, close square bracket, where the x captures the second value of the thing that we matched. Okay. And so you end up writing a switch kind of language construct. And rather than having cases that are constants or integral constants, you end up being able to do a lot more elaborate things in there. And then it'll match the first pattern. It'll dispatch to the statement or expression for which the pattern first matches.
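For the regex analogy, a small sketch using the standard `<regex>` library may help: the pattern describes the string, and the capture groups pick out the components you want. The date format and the function name `extract_year` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the captured year, or an empty string when the pattern does not match.
inline std::string extract_year(const std::string& text) {
    // Three capture groups: year, month, day (an assumed YYYY-MM-DD format).
    static const std::regex date_pattern(R"((\d{4})-(\d{2})-(\d{2}))");
    std::smatch m;
    if (std::regex_match(text, m, date_pattern)) {
        return m[1].str();  // m[0] is the whole match; m[1] to m[3] are the groups
    }
    return {};
}
```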

Starting point is 00:14:26 So you said, you know, a lot more things. Like, what exactly can I put in there? Like, I don't know, can I say if x is greater than 5 and y is less than 4, and that's one of my cases? So that would be provided through what we call a pattern guard. Okay. So there essentially would be a small pattern language, right? So the square brackets, which is kind of like inherited from the structured bindings notation.

Starting point is 00:14:56 After, let's say, for example... So let's go back to the point example, right? If we say, I want to match a point of x comma y, and then I want x to be greater than y, right? Then you would say open square bracket, x comma y, close square bracket, if x greater than y. Okay. And so you can put an if statement, essentially, after a pattern, inside which you can put arbitrary Boolean expressions. Now, this syntax you're talking about: you have a pattern matching library, and we didn't, I don't think, mention yet that you have a pattern matching proposal.
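The proposed syntax is not available in any shipping compiler, so as a rough comparison, here is what the point examples look like in today's C++17 with structured bindings and an if-else chain. `Point` as a `std::pair` and the function `classify` are hypothetical stand-ins; the comments map each branch to the pattern described in the conversation.

```cpp
#include <cassert>
#include <string>
#include <utility>

using Point = std::pair<int, int>;  // hypothetical stand-in for a point class

inline std::string classify(const Point& p) {
    const auto& [x, y] = p;                             // destructure, like the [x, y] pattern
    if (x == 0 && y == 0) return "origin";              // roughly the pattern [0, 0]
    if (x == 0)           return "on the y axis";       // roughly the pattern [0, y]
    if (x > y)            return "x greater than y";    // roughly [x, y] with a guard
    return "other";
}
```

The branches are tried in order and the first one that holds wins, which mirrors the first-match behavior described for the proposal.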

Starting point is 00:15:31 This syntax you're talking about is in reference to the proposal? This syntax is in reference to the proposal, yes. Okay. Yeah. So maybe we should dig into that a little bit. I mean, you also do have the pattern matching library. Is the library based on your proposal? Is it like a sample implementation?

Starting point is 00:15:49 No, the library actually came before the proposal. It was an experiment to see how far we could take a library. Well, it was twofold. The first was, could a library solution be enough? And the second was, if it's useful but it sucks enough, then, similar to Boost.Lambda, there would be some desire to actually make it a language feature, because, you know, the library is not sufficient. Right. I'm sorry, but you just mentioned Boost.Lambda. For our listeners who are not familiar with it, it is completely unusable. But, you know, it was useful enough that people wanted it, but it sucked enough that we wanted a language thing, right? Right. That's the balance we want to strike there. But yeah, it was,

Starting point is 00:16:53 I gave a talk on it in CppCon 2017 describing the challenges that I hit with it, namely around identifiers. Basically, it was pretty difficult trying to introduce identifiers inside a pattern because there are only so many contexts in which you can introduce an identifier in C++. And so inside of the pattern, which is essentially just a value with placeholders, which I called arg, because the placeholders inside the pattern in that library would get dispatched to the lambda,

Starting point is 00:17:27 which is on the right-hand side. Okay. And so, you know, again, going back to the point example, if I were to say something like, you know, product type, like ds, which stood for destructuring, if I were to say ds, open paren, arg, arg, close paren equals a lambda that

Starting point is 00:17:47 takes two arguments, when that pattern matches, the two elements of the pair would get dispatched to the lambda on the right-hand side. And so it of course hit a readability issue because when you have

Starting point is 00:18:04 complex patterns with arg, arg, arg, arg, like you have no idea what those are, right? We really want identifiers in place to be able to help, you know, denote what those fields are. And so what ended up, what we ended up doing then is like, okay, auto name equals arg, auto address equals arg, and then put those names inside the pattern. But that has to be declared beforehand, before you actually use it. Right. Which is like, it defeats the purpose of the whole thing. So, yeah.

Starting point is 00:18:41 So how is, I'm sorry, no, go ahead. Yeah, so I think as an experiment, it was a pretty successful one. It definitely helped to guide the language design quite a bit. But I think we need something more than a library solution. Is the library used actively by anyone right now? I hope not. That might be the first time we've ever had a library author on our show say, I hope no one's using my library. I like it. There has been... I mean, because I put zero effort into optimizing the actual runtime performance of it,

Starting point is 00:19:26 it was really an experiment to explore the API space. Right. And I think there... I mean, okay, so to be honest, there are some people who are using it in their school projects or something like that because they're writing a compiler in C++ or something and they want to have pattern matching because it's pretty useful in that context. But performance doesn't

Starting point is 00:19:52 quite matter for them. And so that's the context in which I was saying. I hope no one's using it. That's awesome. Yeah. I want to interrupt the discussion for just a moment to bring you a word from our sponsors. PVS Studio is a tool for detecting bugs and security weaknesses in the source code of programs written in C, C++, C Sharp, and Java. It works under 64-bit systems in Windows, Linux, and macOS environments, and can analyze source code intended for 32-bit, 64-bit, and embedded ARM platforms. The PVS Studio team writes a large number of articles on the analysis of well-known open-source projects. These articles can be a great source of inspiration to improve your programming practices. By studying the error patterns, you can improve your company's coding standards, as well as adopt good programming practices which

Starting point is 00:20:43 protect from typical errors. For example, you can stop being greedy on parentheses when writing complex expressions involving ternary operators. Subscribe to Facebook, Telegram, or Twitter to be informed about all publications by the PVS Studio team. Links are given in the podcast show notes for this episode. Well, I guess let's talk more about the pattern matching proposal. So you said it is kind of informed by the work you did on your library. Who else is authoring the proposal? Like, what's the current status of it? Right.

Starting point is 00:21:14 So there are three co-authors on the proposal: David Sankel, Sergei Murzin, and Dan Sarginson. And those guys are at Bloomberg. So the latest paper, actually, the preceding one was from David Sankel from 2015, I believe. He proposed language variant as well as pattern matching in a single paper back then. And the feedback from the committee was to separate the proposals into two different things.

Starting point is 00:21:51 And after that, I went and implemented this library to get some experience. And then he and I, well, all four of us. So the other two co-authors joined through David because they work at Bloomberg. And then we kind of, we had competing proposals initially. And then we kind of, you know, got together and figured out how to combine them. Yeah. So that's how it kind of came to be.

Starting point is 00:22:21 The latest status of the paper itself is that we presented it to the Evolution group in a Saturday session in Kona. That was back in February. And we're going to be scrambling for the next two weeks, because that's when the Cologne deadline is. So the meeting in Cologne will be held in July, but the pre-meeting mailing is due two weeks from now. Oh, well, okay, two weeks minus two days. So we have like 11 days or so. Yeah. For our listeners, when this airs, that means they'll have basically two weeks, if there's any papers they're hoping to get to Cologne. Yeah.

Starting point is 00:23:07 Yeah. Something like that. So you're not targeting C++ 20 though, or are you trying to get this into C++ 23? Yeah. Yeah. So it's targeted for 23. Um, 20 is feature complete. Um, it's just issues coming back and stuff like that.

Starting point is 00:23:23 It's very early in its stage still. There's not a full implementation of it. There is, you know, the implementation that was done in my library, as well as a library like Mach7, which was a library that was done by Yuriy Solodkyy, I believe is how you pronounce his last name, but I'm not 100% sure about that. So Yuriy, as his PhD work, I believe, did Mach7, which was a pattern matching library that used macros and such. But he did a lot of work in actually implementing optimizations for

Starting point is 00:24:06 getting performance that matched virtual dispatch, basically. And so there was a lot of performance work that was done there that I think we'll be able to draw from as well. But yeah, this is definitely headed for 23.

Starting point is 00:24:22 I hope it'll actually be one of C++23's marquee features. We'll see how that goes. It sounds potentially like it would change a lot of code, but I think we should kind of, I guess, dig into it some more. Yeah. You mentioned that you can match,

Starting point is 00:24:38 you can basically do destructuring and then have a guard. And I believe, looking at the proposal, you can also match on types? Yes. Okay. What else can you do? How exactly would the type matching work?

Starting point is 00:24:50 Yeah. Sorry, I didn't hear what Rob said. How exactly would the type matching work? Yeah, so the type matching happens through angle brackets. The primary use case is to match std variant. So there have been several complaints about the API of std visit as well as performance of std visit. I may have heard more of that than others because I have an implementation of variant. So people come to me with visit issues.

Starting point is 00:25:30 But basically the API for visits is kind of backwards, right? We have to first list the cases that we want to handle in a function object or something like that. And then we have to say, std visit of that function object, comma sequence of variants that you want to match. And so oftentimes we end up doing a single visitation. So it ends up being like visit function object, comma, variant.
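The "backwards" shape of the current API shows up even in a tiny sketch, assuming C++17. The visitor type `Describe` and the function `describe` are hypothetical names: the handlers have to be written first, as a function object, and the variant being inspected comes last in the `std::visit` call.

```cpp
#include <cassert>
#include <string>
#include <variant>

// The cases are listed first, as overloads of a function object.
struct Describe {
    std::string operator()(int) const { return "int"; }
    std::string operator()(const std::string&) const { return "string"; }
};

// A single visitation: visitor first, then the variant.
inline std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(Describe{}, v);
}
```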

Starting point is 00:25:55 As opposed to what people are more used to, which is the switch kind of syntax, where we should be able to just say, you know, switch on a variant and then list the cases that you want to handle. If those happen to be types, that's fine. And so what we want to allow is for people to say something like inspect v, where v is a variant, and then angle bracket int, angle bracket string, etc. So those are the cases in your inspect statement. And so the way that would work is you end up visiting the variant and then checking that the

Starting point is 00:26:33 resulting value actually matches that type. And if it does then it would dispatch that handler. So is it kind of like structured bindings where there's hooks so that you can implement your own thing to decide what the type of this object is or whatever? So the customization points for variant-like would be variant-size, variant-alternative, and get. Okay. And it relies on those already existing customization points?

Starting point is 00:27:02 Yes. Okay. Well, those are not customization points today, per se. Oh, okay. But tuple size, tuple elements, and get are for structured bindings. And so it's kind of a similar mapping to variant. So what would actually end up happening in the kind of like pseudo-translated C++ code

Starting point is 00:27:23 would be, you would do a std::visit with a generic lambda. And then inside the generic lambda, imagine you have a sequence of constexpr ifs where the checks are whether the type of the thing that came in is the type that we're trying to match.

Starting point is 00:27:41 Right. Um, and then, and so it's not, it doesn't quite have the semantics of std visit with an overloaded set with int and string, right, because if you were to

Starting point is 00:27:53 match a double, then with the pattern matching proposal, that wouldn't match. Whereas with the overloaded function object, it would convert and match. And so it has slightly different semantics. Okay. But in terms of the mechanisms, that's how it works with NU.
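The difference in semantics can be sketched in C++17 as follows. This is an approximation of the pseudo-translation described above, not the proposal's actual specification; `V`, `exact_match`, `overload_match`, and the `overloaded` helper (a common idiom, not part of the standard library) are names assumed for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

using V = std::variant<int, double, std::string>;

// Common helper idiom for building an overload set out of lambdas.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Exact-type, first-match dispatch: a double matches neither case.
inline std::string exact_match(const V& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) return "int";
        else if constexpr (std::is_same_v<T, std::string>) return "string";
        else return "no match";
    }, v);
}

// Overload-set dispatch: a double converts to int and lands in the int handler.
inline std::string overload_match(const V& v) {
    return std::visit(overloaded{
        [](int) -> std::string { return "int"; },
        [](const std::string&) -> std::string { return "string"; }
    }, v);
}
```

With a variant holding a `double`, the exact-type version falls through to "no match", while the overload set silently converts and reports "int".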

Starting point is 00:28:11 Okay. So it would be extensible for other user-defined types, not just variant. Yes. And does it work with RTTI for dynamic types? Right. So for polymorphic types, yeah, the,

Starting point is 00:28:27 the, the idea is that it would perform sequential dynamic casts semantically. Okay. Not implementation wise. Implementation wise, we can do, we can, we can,

Starting point is 00:28:40 we can do tricks that were explored in Mach7, which actually caches the result of dynamic cast and stuff like that, which comes out to be pretty impressive numbers. Really? Yeah, yeah. The paper is actually super interesting. It, like, stashes a vtable pointer

Starting point is 00:28:59 and, like, uses that for something. I don't remember the details exactly, but, yeah, there's some interesting stuff there. There's like a pseudo Duff's device thing in there that caches stuff. Yeah. So it wouldn't actually be

Starting point is 00:29:16 sequenced with dynamic cast, but it would be, semantically, that's what would happen. And so there are some nasty cases because dynamic cast allows weird things like cross-casts. And so

Starting point is 00:29:31 this is one of the things that's difficult is people intuitively look at examples and think especially the variant examples and think like, oh, why don't we just do best match here when we just have rare matching types. But the issue is that even if we're doing types,

Starting point is 00:29:51 for polymorphic types, for example, we can't really do best match. That's not really an option. And so we have to do linear search. And I think even though we could do best match for variant, it would be very confusing if these two very similar things have different semantics.
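A minimal sketch of the first-match semantics for polymorphic types, written as the sequential `dynamic_cast` chain described above. The class hierarchy and the function `name_of` are hypothetical. Note how ordering matters: a more derived case listed after its base is never reached.

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy, for illustration only.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
struct Square : Shape {};
struct RoundedSquare : Square {};

inline std::string name_of(const Shape& s) {
    if (dynamic_cast<const Circle*>(&s)) return "circle";
    if (dynamic_cast<const Square*>(&s)) return "square";          // first match wins
    if (dynamic_cast<const RoundedSquare*>(&s)) return "rounded";  // never reached
    return "unknown";
}
```

A real implementation would be free to avoid running the casts one by one, as long as the observable result is the same as this linear search.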

Starting point is 00:30:12 I think it would be much simpler for people to look at it and say, okay, we just do first match. That's a lot to know. And so those are the customization points. It's the tuple-like protocol,

Starting point is 00:30:27 the variant-like protocol for variant-like stuff, and the third one will be an any-like. And so dynamic cast is the any-like thing for polymorphic types, but if I had a std::any, for example,

Starting point is 00:30:43 and I wanted to do a sequence of type-based matching, then you could call any_cast for each of them. That sounds potentially very handy, because otherwise any is kind of a bit wordy to work with. Right, exactly. And so the customization point for that currently is any_cast, but maybe it should be did cast or something neutral. Yeah.
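For comparison, this is roughly what the sequence of `any_cast` calls looks like when written by hand in C++17. The function name `describe_any` is hypothetical. The pointer form of `std::any_cast` returns a null pointer on a type mismatch rather than throwing, which makes it suitable for trying alternatives in order.

```cpp
#include <any>
#include <cassert>
#include <string>

inline std::string describe_any(const std::any& a) {
    // Each any_cast on a pointer yields nullptr when the held type differs.
    if (const int* i = std::any_cast<int>(&a)) return "int " + std::to_string(*i);
    if (const auto* s = std::any_cast<std::string>(&a)) return "string " + *s;
    return "something else";
}
```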

Starting point is 00:31:10 So those are kind of the user-defined type customization points. But it gets more interesting because we can actually support user-defined patterns as well. And so this was something that I covered in my talk at C++Now, from which I actually got an audible wow from one of the members in the crowd. That's always fun, yeah. Yeah. So CTRE, which is Compile Time Regular Expressions. Right, Hana's library, yeah.

Starting point is 00:31:44 Hana's library, yeah. It's gotten pretty popular. One of the things that it can do is it can tell you the size of the resulting matches at compile time. So with std::regex, if I were to do a match, the thing that I get back is something like a vector of string views, essentially.

Starting point is 00:32:08 Right. Whereas with Hana's library, I get an array of string views of some compile-time size. Okay. So what you can do with that is you can say CTRE of some string, some regex pattern

Starting point is 00:32:23 dot match on some string, and then the result comes out to be some fixed-size array. I can immediately decompose that with structured bindings. For example, I can have a

Starting point is 00:32:39 date regex, and I can say date regex dot match of some string, and then on the left-hand side I can say auto, square bracket, you know, the entire match comma year comma month comma day, close square bracket, equals the regex match thing. Okay. And you can use the year, month, day immediately. Right. That's pretty cool. Yeah. So that's really nice. And so if you were to put this into the context of pattern matching, then if we were to want to test a string to see whether it's a date,

Starting point is 00:33:17 test whether it's a phone number, and then do different things based on it, then now we need to, you know, so we say, like, inspect S, and then what do you do? Like, we have a date regex, but we can't, like, call, we can't match the date regex with the S and then match on it immediately or anything, given what we've just talked about so far.

Starting point is 00:33:38 Okay. And so what we introduce is a notion of an extractor. So an extractor takes a value, calls extract on that extractor with the value, and then we pass on the result of that into a subsequent match, a subsequent pattern match. So we say something like extractor question mark pattern. Okay. Okay. Trying to visualize it all. Yeah. Yeah.
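The extractor idea can be approximated in today's C++17 as a sketch. Everything named here is an assumption for illustration: `DateExtractor`, `PhoneExtractor`, `try_extract`, and `classify_text` are hypothetical, the input formats are made up, and `std::regex` stands in for CTRE so the example stays self-contained. Each extractor either fails or returns a fixed-size array that is then destructured, which is the step the proposed `extractor ? pattern` form would express directly.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical extractor: yields {year, month, day} or nothing.
struct DateExtractor {
    std::optional<std::array<std::string, 3>> try_extract(const std::string& s) const {
        static const std::regex re(R"((\d{4})-(\d{2})-(\d{2}))");
        std::smatch m;
        if (!std::regex_match(s, m, re)) return std::nullopt;
        return std::array<std::string, 3>{m[1].str(), m[2].str(), m[3].str()};
    }
};

// Hypothetical extractor: yields {area code, local number} or nothing.
struct PhoneExtractor {
    std::optional<std::array<std::string, 2>> try_extract(const std::string& s) const {
        static const std::regex re(R"((\d{3})-(\d{3}-\d{4}))");
        std::smatch m;
        if (!std::regex_match(s, m, re)) return std::nullopt;
        return std::array<std::string, 2>{m[1].str(), m[2].str()};
    }
};

// Try each extractor in order; destructure the result of the first that succeeds.
inline std::string classify_text(const std::string& s) {
    if (auto date = DateExtractor{}.try_extract(s)) {
        const auto& [year, month, day] = *date;
        return "date " + year + "/" + month + "/" + day;
    }
    if (auto phone = PhoneExtractor{}.try_extract(s)) {
        const auto& [area, number] = *phone;
        return "phone " + number + ", area code " + area;
    }
    return "unknown";
}
```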

Starting point is 00:34:06 So let's say the extractor is like a date regex. So we say date regex, question mark, square bracket, year, month, day. Right. Okay. So what that would do is it would call date regex dot match on s, try extract on s, and that would return some array. Right. And then that array would be matched against the

Starting point is 00:34:32 square bracket year, month, day, which is on the right-hand side, right? And then if that matches, then on the right-hand side you'd have access to year, month, day to use immediately. Okay, and so you could do a sequence of these, basically. Right. And you can check if the string is a date, and do this, and if it's a phone number, then do this with the area code or whatever. So instead of a bunch of nasty if statements together,

Starting point is 00:35:01 combining CTRE with your pattern matching, you'd be able to do a nice clean table of all the different options. That's exactly right. Yeah. Okay. Yeah, sounds very nice. Yeah. So that was pretty cool. I'm kind of curious about two things, if you don't mind. In the cases where you do type matching, and you said that it's as if you had done an if constexpr or a dynamic cast or something in there, that does kind of imply, I don't know, I feel like there has to be some kind of case where it's handling the fact that the thing you're passing in might not necessarily be valid for all types that are matched, or all contexts or something. I don't know. It does feel like it does have to effectively do an if constexpr

Starting point is 00:35:52 in any of the things that do type matching. But that can't be right for the types that are runtime known. Never mind. I might just be overthinking this. So we can move on past that. Okay. Okay. To be clear, patterns do have requirements that are placed on the subject, which is the thing that's being matched.

Starting point is 00:36:16 So, for example, similar to structured bindings, if you were to do square bracket x comma y equals some expression, the expression is expected to have tuple size 2 with get 0 and get 1 valid. Valid get 0 and get 1. Right. And so those requirements are placed on the value that's being matched by any given pattern. Okay. And so...
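
As a sketch of the requirement just described, here is a small class that opts into the tuple protocol so that a two-name structured binding works on it. The Point class and sum_of_parts function are hypothetical examples. The standard pieces are std::tuple_size, std::tuple_element and a get with an index template parameter.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical type used only for illustration.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}

    // get<0>() and get<1>() are the two accessors structured bindings look for.
    template <std::size_t I>
    int get() const {
        static_assert(I < 2, "Point has exactly two elements");
        if constexpr (I == 0) return x_;
        else return y_;
    }

private:
    int x_;
    int y_;
};

// Tell the language that Point has tuple size 2 and that both elements are int.
namespace std {
template <>
struct tuple_size<Point> : integral_constant<size_t, 2> {};

template <size_t I>
struct tuple_element<I, Point> { using type = int; };
}  // namespace std

int sum_of_parts(const Point& p) {
    auto [x, y] = p;  // valid because tuple size is 2 and get<0>, get<1> exist
    return x + y;
}
```

If the tuple_size specialization said 3, or get<1> were missing, the binding in sum_of_parts would fail to compile. That is the kind of requirement a pattern places on its subject.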

Starting point is 00:36:45 So, I guess what I'm trying to ask is the thing on the right-hand side, unlike a switch statement where there is no new scope, the thing on the right-hand side is a new scope, it has its own types, it has its own context, whatever, like, it's its own thing.

Starting point is 00:37:02 It's not just... The right-hand side has its own... But you mean the right-hand side of a match case? Yeah, like the actual operation you're going to perform, I guess. Right, so that doesn't have a type. Right. Okay. So there are two forms of inspect that we're proposing. One is an inspect statement and the other is an inspect expression. Okay. So the first form is an inspect statement, which is very much like a switch, where the right-hand side is a statement. And so there's no type.

Starting point is 00:37:30 Okay. And there's no new scope or anything right there. Well, there is its own scope. Okay. Because we need it for the names that are being introduced in the pattern. Right. All right.

Starting point is 00:37:42 Okay. Yeah. And so it does have its own scope. In the expression case, it also has its own... it also has a type. Okay. And so the expression form

Starting point is 00:37:55 uses an equals greater than, which is a fat arrow, essentially, in the middle, instead of a colon. Oh. Okay. But an inspect expression is very useful because otherwise you end up doing

Starting point is 00:38:11 every case having a return statement and wrapping that whole thing in a lambda and specifying return types and then immediately calling it. And thinking about it kind of in the context of a switch statement, is there a default case? Yeah, so the default case essentially would

Starting point is 00:38:31 be, um, any... So, okay, this is an interesting part, actually, that we have to discuss during the meeting, which is the underscore, which is a pattern that matches anything. Okay. But if you look at it, any identifier matches anything. It just binds whatever is being matched. And so if you were to just put foo as the pattern, that would match anything and just give it an alias foo for the thing that's being matched. Right? Okay.
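
The workaround mentioned a moment ago, where every case returns and the whole switch is wrapped in a lambda that is called immediately, looks roughly like this in today's C++. Color and describe_color are hypothetical names used only for this sketch. The default label plays the role that a match-anything pattern would play in an inspect.

```cpp
#include <string>

enum class Color { Red, Green, Blue };  // hypothetical example type

std::string describe_color(Color c) {
    // An immediately-invoked lambda stands in for a switch that yields a value:
    // each case needs its own return, and the return type is spelled out.
    const std::string name = [&]() -> std::string {
        switch (c) {
            case Color::Red:   return "red";
            case Color::Green: return "green";
            default:           return "something else";  // catch-all case
        }
    }();  // the trailing () calls the lambda right away
    return "color is " + name;
}
```

An expression form of inspect would let the value be produced directly, without the lambda, the explicit return type or the per-case return statements.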

Starting point is 00:39:02 And so you could just use underscore in that way where underscore just introduces a new variable, a variable name, because it's a valid identifier today, right? But what that means is you couldn't do something like open square bracket, underscore, comma, underscore, close square bracket. Right. Because I'd have a redeclaration issue. Yeah, and people have been wanting to skip

Starting point is 00:39:24 structured binding elements forever doing something like that. Exactly. And so we brought a proposal to Kona. I think the number was P1469, which was to propose disallowing use of underscore in the context of structured bindings so that we could use it to ignore stuff in structured bindings but also in pattern matching. And that did not fly.
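
A small sketch of the redeclaration issue under the C++17 and C++20 rules being discussed here. The helper low_high and the two functions around it are hypothetical. A single underscore works as an ordinary name in a structured binding, two of them do not, and std::tie with std::ignore is the existing library way to skip an element.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper returning two values.
std::pair<int, int> low_high() { return {1, 9}; }

int only_high() {
    // A single underscore is just an ordinary variable name here.
    auto [_, high] = low_high();
    (void)_;  // the name exists and must be "used" to avoid warnings
    return high;
}

// Left as a comment because it does not compile under C++17/C++20:
// both bindings would declare a variable named _, which is a redeclaration.
//   auto [_, _] = low_high();

int only_high_with_ignore() {
    int high = 0;
    // std::ignore accepts and discards whatever is assigned to it.
    std::tie(std::ignore, high) = low_high();
    return high;
}
```

The std::tie form skips elements without inventing names, but it needs the target variable declared beforehand, which is part of why people wanted a placeholder inside the binding itself.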

Starting point is 00:39:55 People didn't want to, because it's an already existing variable name. People didn't want to introduce context sensitivity into it. Although technically a variable that's just underscore is reserved by the standard anyhow, right? Because anything that starts with an underscore...

Starting point is 00:40:15 No, that is not true. Underscore at the global namespace is reserved, but not inside namespaces. Underscore followed by a capital letter is reserved everywhere. That's true. Underscore, underscore, anywhere in your identifier is reserved.

Starting point is 00:40:35 But underscore followed by a lowercase letter is not. Lowercase is not. In a local scope. Yeah. Yeah. I still don't do it, though. Yeah, me neither. But the issue basically is that popular libraries like Google Mock have a

Starting point is 00:40:54 testing colon colon underscore, which is passed to their framework to indicate I don't care what gets matched here. And so in Google Mock, you can say stuff like, expect this function to be called with these arguments. And so you might say, expect call foo comma one comma two. And then you call foo with zero zero, and then the framework will tell you, oh, I expected a call to foo with one two, but you passed it zero zero. And if you didn't care what the value was for one of those things, you could pass underscore, which is their placeholder, which is an actual value there.
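
To make "underscore is an actual value" concrete, here is a self-contained sketch of how a library can define such a placeholder. Everything in the mini namespace is hypothetical and is not Google Mock's implementation. It only shows that an object named _ inside a namespace is legal C++ today, which is why giving underscore special meaning would affect existing code.

```cpp
#include <optional>

namespace mini {  // hypothetical names, for illustration only

struct Anything {};               // tag type meaning "I don't care"
inline constexpr Anything _{};    // a real object whose name is a single underscore

// Matches one int argument: either a specific expected value or anything.
class ArgMatcher {
public:
    ArgMatcher(int expected) : expected_(expected) {}
    ArgMatcher(Anything) : expected_(std::nullopt) {}

    bool matches(int actual) const {
        return !expected_ || *expected_ == actual;
    }

private:
    std::optional<int> expected_;  // empty means "match any value"
};

// Checks an actual call (x, y) against the two expected-argument matchers.
inline bool call_matches(ArgMatcher a, ArgMatcher b, int x, int y) {
    return a.matches(x) && b.matches(y);
}

}  // namespace mini
```

With this, mini::call_matches(1, mini::_, 1, 42) accepts any second argument, while mini::call_matches(1, 2, 0, 0) reports a mismatch.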

Starting point is 00:41:38 But their library knows how to interpret that to ignore those values. Right. And so because of these existing uses of underscore, we couldn't quite make that the otherwise case, which would have served as the default case, as you were asking about. Okay. So we need some different way to spell it. There has been some discussion of double underscore, which I... Because it's already reserved, right? Which is definitely reserved, right. But, yeah, I hate it. I think it's a terrible way to go, but I don't know.

Starting point is 00:42:14 We'll... Yeah. A question mark has been mentioned. There's always triple underscore. Shh. I did have a silly idea of allowing two or more underscores at which point you could actually

Starting point is 00:42:29 if you have a table of square-bracketed, structured-binding-looking matches, you could actually use seven underscores if you need to align your variables properly. Right. Yeah. But yeah, so I don't know. We'll see

Starting point is 00:42:46 what happens there but you know question mark has been brought up as an option but I think that would be challenging I think there are various extensions for like question mark colon I think there's a GCC extension that does question mark colon and so like if you were to use that

Starting point is 00:43:02 as a default case, you would end up with question mark colon stuff. Right. Okay. And that might be an issue. Yeah, C++'s grammar is crazy, so... Yeah. Oh, yeah. Yes. I mean, yeah, I really feel like... Sorry, go ahead. I think students at one point recently asked why lambdas have the syntax they do, and I was basically like, because it was the only option left. I mean, that's not 100% accurate, but I think it's pretty close to the truth, actually. Yeah. Working on this proposal, I really feel like I got myself into a 35-year-old Jenga game.

Starting point is 00:43:49 It's pretty challenging. Where do I poke? Is this thing going to fall? I don't know. Are there any particular use cases for pattern matching that we haven't already discussed that you think C++ developers are going to find really useful? Specific use cases for pattern matching? Yeah. I mean, I guess we haven't quite discussed trees all that much. Okay. But pattern matching comes up in tree traversals, but

Starting point is 00:44:20 also tree manipulations, quite a bit. And so one of the benefits of pattern matching is that it allows you to nest patterns, which works exactly like how you nest values, right? Like when we build values, we build by composition. You might have some int that you put inside of a pair, which you put as an alternative of a variant, which may go into a struct or whatever. So these things compose. And so in exactly the same way, you can build patterns via composition. And so you can think back to how you built

Starting point is 00:44:58 that value and then build a pattern to see, like, do I have that thing? And so with the tree, it works in a similar way as well, where I might say, like, check if the thing on the left is... So, okay, in an expression tree, for example, like an arithmetic expression tree, if you were to perform some simplifying operation, you actually need to look deeper than one level down to be able to figure out that you actually can do simplifications. Okay. I wish I had a concrete example to share with you guys, but I think it's concrete enough. So if you say on the left, I might have an int, but I need that thing to be zero. And if I have a plus, then just return the right-hand side.
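
The zero-plus-something simplification can be sketched with std::variant and std::get_if. The Expr and Add types and the helper functions are hypothetical and written only for this example. The point is that the check has to look two levels deep: the node must be a plus, and its left child must be an int equal to zero.

```cpp
#include <memory>
#include <utility>
#include <variant>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Hypothetical arithmetic expression tree: a node is an int literal or a sum.
struct Add { ExprPtr lhs, rhs; };
struct Expr { std::variant<int, Add> node; };

ExprPtr lit(int v) { return std::make_shared<Expr>(Expr{v}); }
ExprPtr add(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Add{std::move(l), std::move(r)}});
}

// Rewrites (0 + e) to e and leaves every other expression unchanged.
ExprPtr simplify(const ExprPtr& e) {
    // Level one: is this node a plus?
    if (const Add* sum = std::get_if<Add>(&e->node)) {
        // Level two: is the left child an int, and is that int zero?
        if (const int* left = std::get_if<int>(&sum->lhs->node)) {
            if (*left == 0) return sum->rhs;
        }
    }
    return e;
}
```

Each extra level of structure adds another nested get_if here, whereas a nested pattern would state the whole shape in one place.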

Starting point is 00:45:44 Sure. And I need to be able to do multiple levels of inspection before I can actually make this decision. Right. And so this kind of stuff comes up a lot in tree manipulation specifically. One of the poster children for it is red-black tree rebalancing. So with something like a std::map, which has a red-black tree underneath,

Starting point is 00:46:16 it actually constantly rebalances this tree to keep it at a reasonable height. And the process by which that happens is you inspect the structure of your children. And so it would be like, if you have left, left, like red, red, and then on the red, you have a black and a red

Starting point is 00:46:38 or something like that in some orientation, then you can reorient that such that we restore its invariant. And the code for it actually turns out to have four cases of the structure that you're looking for in the children. And then all of them actually merge into the same structure. And so the code for it ends up being like 10 lines of code. As opposed to oftentimes when you look at C++ or Java

Starting point is 00:47:05 examples of red-black trees, there's a left-rotate function, rotate-right function, left-left-rotate function, and all this kind of stuff, which gets very big and very complex as to how you reason about this code. And so it allows you to think structurally

Starting point is 00:47:21 when you actually modify these trees and things like that. So that's a use case that I think comes up quite a bit. Okay. I was wondering if the type matching feature can match multiple types and if so, does it then work for multi-method dispatch? Yes. Yeah. It's actually not included in the paper today, but the intent is that we allow matching multiple values. Okay. And so you want to be able to say inspect

Starting point is 00:47:58 a comma b comma c, which would give you an implicit tuple-like thing, and so you would put square brackets around your pattern to match the multiple values. But if those happen to be variants, for example, you could do double dispatch. Right. Okay, sorry. Yes, go ahead. Yeah, so it would basically do tuple matching of those things. But remember, each pattern can still answer the question of, does this thing match?
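
For comparison, double dispatch over two variants is already possible with std::visit, which accepts several variants at once. The Circle, Square and collide names are hypothetical, and the overloaded helper is the commonly used idiom rather than a standard library type. This is a sketch of what matching on a pair of values would express more directly.

```cpp
#include <string>
#include <variant>

struct Circle {};
struct Square {};
using Shape = std::variant<Circle, Square>;  // hypothetical example types

// Common helper idiom: merge several lambdas into one overload set.
template <class... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Picks the overload from the runtime alternative held by both arguments.
std::string collide(const Shape& a, const Shape& b) {
    return std::visit(
        overloaded{
            [](Circle, Circle) { return std::string("circle-circle"); },
            [](Circle, Square) { return std::string("circle-square"); },
            [](Square, Circle) { return std::string("square-circle"); },
            [](Square, Square) { return std::string("square-square"); },
        },
        a, b);
}
```

Every combination has to be covered by some overload or the call to std::visit will not compile, so the four lambdas act as the four rows of the dispatch table.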

Starting point is 00:48:36 And how that gets optimized underneath, we can do whatever. Compilers will have full power to do whatever they need to do to make that fast. But I think, in terms of performance, we have strictly more information than we did before, right? Which I think is a good thing. Yeah, I see that it gives the compiler more semantic, I guess, context for what you're trying to do. It doesn't just see a block of if statements. Exactly. Yep. Yeah.

Starting point is 00:49:07 It's very difficult for compilers to sift through arbitrary Boolean expressions. Right. And to deduce meaning from it. So we had Herb on last week, and he talked about his reflection and metaclasses proposal. You said you had a library implementation of this. Do you think that library implementation would be any different if you had reflection and metaclasses, or would the proposal change at all when those are available? Um, I think my answer would be no, probably. Reflection I'm somewhat familiar with. Metaclasses, I'm not that familiar with.

Starting point is 00:49:45 But one of the issues with the library solution really is the lack of notation available. And so if you look at a pattern inside the library, they're all just function calls, right? And so it's very hard to look at a pattern and deduce what the structure of it is. Whereas with the added notation for square brackets and angle brackets for types and things like that, when you see something like angle bracket point close angle bracket open square x, y, close square, you can kind of visually see, okay, I'm matching a point, and then I'm going to destructure that into a tuple-like thing. When you just see alternative, open paren, destructure, open paren, x, y,

Starting point is 00:50:41 it just looks like a sequence of nested function calls, and it's hard to deduce visually what's happening there. And I don't think metaclasses give you the ability to add new syntax. Yeah, I don't think so either. Yeah, and so part of it is visual inspection,

Starting point is 00:51:00 and so I don't quite think that it would help much in this case. That makes sense. Well, it's been great having you on the show today, Michael. We'll definitely keep track of the progress with the proposal going towards C++23. Sounds great. Yeah, thank you.

Starting point is 00:51:19 I guess we haven't mentioned the number for the proposal. Oh, yeah, go ahead. It's P1371. 1371. Yeah. Okay, well good luck at the Cologne meeting. Yeah, thank you very much. Thank you for having me on the show. Thanks. Thanks so much for listening in as we chat about

Starting point is 00:51:36 C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.

Starting point is 00:51:54 You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com cppcast and of course you can find all that info and the
