CppCast - rr

Episode Date: December 2, 2015

Rob and Jason are joined by Robert O'Callahan from Mozilla to discuss the RR project. Robert O'Callahan has a PhD in computer science from Carnegie Mellon and did academic research for a while a...t IBM Research, working on dynamic program analysis tools. At the same time he was contributing to Mozilla as a volunteer, until he switched gears to work full-time with Mozilla; Robert has been working on what became Firefox for over 15 years, mostly on layout and rendering in the browser engine and on related Web standards like CSS and DOM APIs. Lately he's been devoting about half of his time to rr.

News: Breaking all the Eggs in C++; The wind of change; Celebrating 30th anniversary of the first C++ compiler: let's find bugs in it
Robert O'Callahan: Robert O'Callahan's website; @rocallahan
Links: rr project; Mozilla on GitHub

Transcript
Starting point is 00:00:00 Episode 36 of CppCast with guest Robert O'Callahan recorded December 2nd, 2015. In this episode we discuss breaking all the eggs in C++. Then we'll interview Robert O'Callahan from Mozilla. Robert will talk to us about RR and how it can change the way you debug. Welcome to episode 36 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? Doing great, Rob. How about you? Doing good. We've got a lot of catching up to do,
Starting point is 00:01:13 because it's been about three or four weeks since we actually recorded our last episode. To listeners, it's only seemed like a week, because we pre-recorded a bunch, but we actually haven't talked in quite a while. Yes. Hopefully we remember how to do it. Yeah. So at the top of our episode, I'd like to read a piece of feedback. This one came in a couple weeks ago from Akshay, and he wrote in: Hey, I love CppCast. Can we have a person from Mozilla who works on Firefox or any of the projects, as they do tons of amazing C++ work like rr? I think that sounds like a great idea for a guest, Akshay.
Starting point is 00:01:48 So we'd love to hear your thoughts about the show as well. You can email us at feedback at cppcast.com or you can find us on all social networks at CppCast. We're on Twitter, Facebook, and iTunes. We appreciate those iTunes reviews as well. So joining us today is Robert O'Callahan. Robert has a PhD in computer science from Carnegie Mellon and did academic research for a while at IBM Research, working on dynamic program analysis tools. At the same time, he was contributing to Mozilla as a volunteer until he switched gears to work full-time with Mozilla. He's been working on what became Firefox for over 15 years,
Starting point is 00:02:26 mostly on layout and rendering in the browser engine, and on related web standards like CSS and DOM APIs. Lately, he's been devoting about half of his time to RR. Robert, welcome to the show. Great, thanks for having me. Well, that's an amazing time span there. When did Firefox actually get its name? Do you recall?
Starting point is 00:02:46 How long has that been? It's so long ago. I think it was 2002 or 2003. Wow. We actually went through a few different iterations of the name because of trademark issues. That's cool. Okay, so we have a couple news items we want to get through
Starting point is 00:03:03 before we start talking about RR. This first one is a Scott Meyers blog post titled Breaking All the Eggs in C++. And Jason, we were both kind of looking forward to this article because we kept talking to Scott after we had him on the show a couple weeks ago. And he kind of hinted that he was working on this type of article. So it's nice to finally see it. Yes, I was looking forward to it, definitely. Yeah. And just to go over a couple things he's suggesting: he wants to make it so, you know, if you're overriding a virtual function you have to say override, and getting rid of NULL and 0 in favor of nullptr. Some very interesting things.
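To make the two suggestions mentioned here concrete, the following is a minimal sketch of what `override` and `nullptr` buy you. The types and function names (`Shape`, `Circle`, `which`, `describe`) are made up for illustration and do not come from the blog post.

```cpp
#include <string>

// Hypothetical types for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    // 'override' asks the compiler to check that this really overrides a
    // base-class virtual. If the signature drifted (for example, if the
    // 'const' were dropped), this would be a compile error instead of a
    // silently unrelated new function.
    std::string name() const override { return "circle"; }
};

// Two overloads that the literal 0 and nullptr resolve differently.
inline std::string which(int) { return "int"; }
inline std::string which(const char*) { return "pointer"; }

// Calls through the base class, so the override is what gets dispatched.
inline std::string describe(const Shape& s) { return s.name(); }
```

With these definitions, `which(0)` picks the `int` overload even when the caller meant a null pointer, while `which(nullptr)` can only pick the pointer overload.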
Starting point is 00:03:45 And what he goes into at the end is that all these changes that he's proposing, you should be able to create a Clang-based tool in order to make these changes to a code base. And he's saying the standards community should basically work on deprecating and then removing these features over a course of 10 years. Which sounds pretty reasonable to me. What do you think, Jason?
Starting point is 00:04:08 I don't know how I feel about actually completely deprecating the features, but I could totally see modes and compilers or something that would say enable this stricter mode. I don't know. I mean, I guess maybe I was looking for something a little bit more drastic, like, let's really get rid of some stuff,
Starting point is 00:04:31 but I don't know what that would be yet. Yeah. Robert, what are your thoughts about this? You know, you can't under... It's hard to undervalue backwards compatibility. So changing stuff like this is really scary,
Starting point is 00:04:52 especially, you know, I'm used to working on the web where there's a whole lot of old, old websites that browsers have to keep running. And so we take backwards compatibility very, very seriously. I think the situation in C++ isn't quite so bad because if someone's running the compiler and compiling the program, then they must be, or hopefully they can actually change it
Starting point is 00:05:09 and fix it for a new C++ standard. Yeah, and we're only talking about changing this with future versions of the standard, like deprecating null and zero as null pointers in C++20 and then maybe removing the actual feature from C++23 or something like that. So unless you're updating your code base to a C++ whatever compiler, these things won't even affect you.
Starting point is 00:05:40 Well, and as Scott points out, you can use things like Clang's tools right now. Basically, all of these examples could totally happen tomorrow, and we could use Clang's tools to rewrite the code to meet the new standard. Right, right. So as long as that path exists, yeah. So speaking of getting into new versions of the standard, this next article comes from Meeting C++ where Jens actually tweeted out a survey
Starting point is 00:06:11 to see what version of C++ developers were using. It's a little biased because the people who are following the Meeting C++ Twitter account are more up to date with things, and they're active and interested in where C++ is going. But still, it's a very interesting poll, and C++11 is the most used version according to this Twitter poll with 57%, and 17% using C++14. So, you know, almost three quarters of users are using 11 or 14, which is pretty impressive. Jason, do you have any thoughts on this? Oh yeah. I mean, it's great to see people moving
Starting point is 00:06:51 that way. I think everyone I know who's currently not using the new standards is because of some business reason: they have to support ancient compilers, basically. Right. It's unfortunate. How about Mozilla, Robert? What is Mozilla using these days? We're mostly, well, we're using C++11 a lot, and it's been great. Our biggest problem is on Android. The libraries there are not so good, so especially STL stuff on Android is not so good.
Starting point is 00:07:22 But we're using 11. We've gotten into the habit of introducing new features as quickly as we can based on the platforms we're supporting. We're trying to keep our compilers up to date to make that possible. It's been good. I forgot about that. I've gotten a couple of support requests for my open source project for people complaining I can't compile on Android because of STL issues.
Starting point is 00:07:45 Yeah. Okay, and this last article is pretty interesting. We talked about a week ago about how it was the 30th anniversary of the first C++ compiler, Cfront. And someone went
Starting point is 00:08:01 ahead and looked at the code, which is now open source, and went and looked for bugs in it, which was pretty interesting. Jason, did you take a closer look at this one? I did, I did. And I would totally recommend reading this article, by the way, going through each of the examples,
Starting point is 00:08:18 because there are a couple of random... Well, okay, so the first thing that struck me, if I may, is Cfront. Let's see, I'm looking at the date here. 1983, right, was the very first type. So I'm looking at these calls to C standard library calls and really just letting it sink in that they have been unchanged for at least 32 years. And that just kind of impresses me a little bit.
Starting point is 00:08:45 But he's making some mistakes that would totally be caught by a modern compiler, like passing the incorrect type of arguments to fprintf. But there's a couple of little things and a couple of little tidbits, and Bjarne responded to this article, and they have his responses at the bottom. So it's a neat little piece of history, and you could probably learn something about good quality code. Yeah, definitely agree
Starting point is 00:09:11 with that. Okay, Robert, let's talk to you about RR. We actually went over this a couple weeks ago as a news item, but for anyone who hasn't listened to that episode, could you maybe give us an overview of RR? Sure. Well, I mean, as the name implies, RR is really about recording and replaying program executions. And what that means is that you can run RR on a program, which could basically be a set of processes, you know, a whole tree of processes, and it will record the entire execution in essentially perfect detail, which means that we can replay that
Starting point is 00:09:53 and the program will take the exact same memory and register states as it executes during the replay as it did during recording. That means basically all the information you need to debug a failed execution is there, and there's really nothing missing. So you get a perfect replay. And of course, to make that useful,
Starting point is 00:10:17 we have to make it possible for you to actually debug the program as it replays, and we do that. We have integration with GDB, so that you can apply GDB to the replay as it happens. And we actually then went further than that and made it possible to use reverse execution during the replay. So GDB has some commands like reverse-continue and reverse-step that work with things like breakpoints, and normally in GDB the implementations of those are either completely missing or really, really,
Starting point is 00:10:50 really slow. But with RR we can implement those very efficiently. Basically the idea is that you replay the program and take checkpoints periodically, and then, you know, if you do a reverse-continue command, we can restore to a previous checkpoint and then replay again until we get to the part of the program where we need to stop. So that's basically what we do.
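The checkpoint-and-replay idea described here can be sketched on a toy deterministic program. This is only an illustration of the general technique, not RR's implementation: `State`, `advance`, and `Replayer` are hypothetical names, the "program" is a single integer updated by a fixed formula, and a "breakpoint" is just a predicate on the state.

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Toy deterministic "program": its whole state is a step counter and one value.
struct State {
    std::uint64_t step = 0;
    std::uint64_t value = 1;
};

// One step of execution. The update is fixed, so replaying from the same
// state always produces the same sequence of states.
inline void advance(State& s) {
    s.value = s.value * 6364136223846793005ULL + 1442695040888963407ULL;
    ++s.step;
}

class Replayer {
public:
    // 'interval' is the number of steps between checkpoints; it must be > 0.
    explicit Replayer(std::uint64_t interval) : interval_(interval) {
        checkpoints_[0] = State{};
    }

    // Run forward to step 'target', saving a checkpoint every 'interval_' steps.
    void run_to(std::uint64_t target) {
        while (cur_.step < target) {
            advance(cur_);
            if (cur_.step % interval_ == 0) checkpoints_[cur_.step] = cur_;
        }
    }

    // Reverse-continue: move to the latest step strictly before the current
    // one at which 'hit' is true. Returns false (and stays put) if none exists.
    bool reverse_continue(const std::function<bool(const State&)>& hit) {
        std::uint64_t end = cur_.step;            // search window is [checkpoint, end)
        auto it = checkpoints_.lower_bound(end);  // first checkpoint at or after 'end'
        while (it != checkpoints_.begin()) {
            --it;                                 // nearest earlier checkpoint
            State s = it->second;                 // restore it
            State last = s;
            bool found = false;
            for (; s.step < end; advance(s)) {    // replay forward, remember last hit
                if (hit(s)) { last = s; found = true; }
            }
            if (found) { cur_ = last; return true; }
            end = it->first;                      // nothing here: try the previous window
        }
        return false;
    }

    const State& current() const { return cur_; }

private:
    std::uint64_t interval_;
    State cur_;
    std::map<std::uint64_t, State> checkpoints_;
};
```

For example, after `Replayer r(10); r.run_to(25);`, a reverse-continue with a breakpoint on "step is a multiple of 7" restores the checkpoint at step 20, replays forward, and stops at step 21. The cost of going backwards is bounded by the checkpoint interval rather than by the length of the whole run.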
Starting point is 00:11:14 Yeah. So what was the motivation to first create RR? Right. So there are really two motivations. One is that we have a lot of problems with intermittent test failure. So, you know, like most big projects, we have loads and loads and loads of automated tests. When you do a Firefox check-in, you know, we probably run hundreds, maybe thousands of machine hours of automated tests to validate that change.
Starting point is 00:11:49 Now, some of those tests have problems where once in a while, the test will fail. And this is because of a bug. It could be a bug in the code or it could be a bug in the test. It could be, you know, some non-deterministic weird thing that's happening in the system and the test farm. And so these are frustrating, right? Because, you know, whenever you do a check-in, you're running millions of tests and a handful of them will fail, and it's difficult to know whether you caused that or whether it's
Starting point is 00:12:21 something else that's happening. And so these intermittent failures are horrible, and they're hard to debug because, as we all know, debugging bugs that only occur once in a blue moon is really, really painful. One of the goals here was to make those bugs easy to debug by running the program until you catch one of those
Starting point is 00:12:39 test failures in RR and then you can replay that failure over and over again, get the same failure, same execution, and you'll be able to figure out what's going on. That's the idea. So that was the first goal. As well as that, though, we just want to make debugging more fun. I mean, we all spend, certainly at Mozilla and I think elsewhere, we spend a ton of time debugging.
Starting point is 00:13:06 You know, we've got loads of bug reports and we spend a lot of time fixing those bugs and a lot of that time is debugging. And so we wanted to make that better and it's pretty well known that when you're debugging, you're reasoning backwards from effects
Starting point is 00:13:21 back to causes. And so that's naturally something you want to do by running the program in reverse time, reverse time order, right? And we have different, well, we've all developed different strategies for simulating that. We use logging or we run the program lots of times
Starting point is 00:13:41 or we have some guesses about where the bug might be and we try to break there and see what's happening. But we're actually always working around the fact that debuggers normally only execute forwards. And the reason they do that is because it's hard to implement reverse execution. So we had the idea always as well that RR would let us implement
Starting point is 00:14:03 a really good reverse execution backend that programmers could really use and it would be efficient. And that's how it worked out. I'm kind of curious. You said these basically are features built into GDB already that you have made better versions of. Did I understand that correctly? That's right. So there's actually, GDB has several different backends implementing these features.
Starting point is 00:14:34 You know, there's, if you just use GDB out of the box on Linux, you can tell it to record program execution. And it will actually save some execution to a buffer and then implement its commands on top of that. But there are a lot of pretty big
Starting point is 00:14:56 limitations with the one you get out of the box. The main limitation is that the way that that's implemented is that when it's recording program execution, it basically single-steps the program and records the effects of every single instruction to memory. And so it's at least a thousand times slower than running your program normally. Wow. Whereas with RR, the overhead is more like 1.5 times or less. Wow.
Starting point is 00:15:27 So you can see there's a pretty big performance difference. Another limitation is that with GDB, because of the very, very intense logging it's doing, you can only record a small section of program execution before you basically run out of memory and explode. So with GDB, you can record maybe a second of execution if you're lucky, right? But with RR, we can record hours of
Starting point is 00:15:51 execution, and we've done that, and it's fine. Wow. So does that mean that standard GDB GUIs that are out there work with these features also then? I don't think so at the moment because I think these features are very rarely used
Starting point is 00:16:08 by developers because of those performance issues. And I don't know of any GDB UIs that support this other than the command line. I could be wrong. I haven't looked at them. I don't use them myself, so I don't know. Okay. I hope that they could be added pretty easily though.
Starting point is 00:16:24 All right, cool. So you talked about the goal, or one of your goals, was to make debugging more fun. What's the workflow like exactly when using RR? Right. So the workflow is that you just run your program under RR. You basically type rr and then, you know, your command line. And then RR runs the program. Everything should work as normal. You can interact with it. Because RR is low overhead, it should feel just like you're using it normally. And it does.
Starting point is 00:16:55 It really works. And so when you've finished recording your program, you can shut it down or just kill it, and then you can use an RR replay command, which basically drops you into GDB at the start of where the program started running. And then
Starting point is 00:17:18 you can basically just run GDB on it like normal. One limitation right now is that you don't get the actual window if you're debugging an interactive application because the actual window is part of the side effects of the program which aren't replayed. We're not writing to files or writing to sockets during the replay
Starting point is 00:17:36 because obviously that would kind of be crazy if your program did something dangerous or state-changing. But you can get all the console output, and of course you can set breakpoints anywhere, inspect anything, and of course you get those reverse execution features when you want them. There are other extensions to that workflow that we support.
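The point that side effects are not repeated during replay can be sketched with a toy log of "system call" results. This is a simplified illustration of the general record/replay idea, not RR's actual mechanism or API: `Session`, `call`, and `program` are hypothetical names, and each "system call" is modeled as a function returning a `long`.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not RR's API.
class Session {
public:
    // Start a recording.
    Session() : replaying_(false) {}
    // Start a replay of a previously recorded log.
    explicit Session(std::vector<long> log)
        : replaying_(true), log_(std::move(log)) {}

    // Run one "system call". While recording, perform the real operation and
    // log its result. While replaying, skip the real operation entirely and
    // hand back the logged result, so nothing outside the program is touched.
    long call(const std::function<long()>& real_operation) {
        if (!replaying_) {
            long result = real_operation();
            log_.push_back(result);
            return result;
        }
        if (next_ >= log_.size())
            throw std::runtime_error("replay diverged from recording");
        return log_[next_++];
    }

    const std::vector<long>& log() const { return log_; }

private:
    bool replaying_;
    std::vector<long> log_;
    std::size_t next_ = 0;
};

// The traced "program": reads an input, writes some output, and computes a
// value from both results. 'input_source' and 'output_sink' stand in for the
// outside world (a socket, a file, a window).
inline long program(Session& s, long& input_source, std::string& output_sink) {
    long x = s.call([&] { return input_source++; });             // nondeterministic read
    long n = s.call([&] { output_sink += "hello"; return 5L; });  // write with a side effect
    return x * 10 + n;
}
```

Recording the program with an input source at 7 returns 75 and appends to the sink. Replaying the saved log returns 75 again even if the input source now holds something else, and the sink stays empty because the write is never actually performed.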
Starting point is 00:17:58 So you can, for example, tell RR to replay the program and log an event number before every line of console output. And then you can replay to a particular line of console output. Interesting. We'll start the replay. We'll basically seek to that particular point and start replay there
Starting point is 00:18:17 so you can figure out what caused some particular line of output to be produced. And yeah, that's basically, those are the most useful workflows at the moment. It's quite common also to, if you've got a bug that's hard to reproduce, you might run RR
Starting point is 00:18:33 on your test framework over and over again, you know, on your test harness until you see the bug. And then you might, in between each run, if you've seen the bug, you would stop.
Starting point is 00:18:43 But if you haven't seen the bug, you kind of delete that recording and just try again, so that sort of workflow is also very common. And one more workflow that's common is if you're debugging a multi-process workload, which we do a lot in Firefox, because Firefox is sort of multi-process now at least
Starting point is 00:18:57 in nightly builds. And so if you record a workflow with lots of processes in it, you can get like a ps command that shows you all the processes that were spawned during that session, and you can attach to a particular process and debug that. Interesting. So you talked about how you have all these automated tests
Starting point is 00:19:18 that run against every commit. Are you now running those automated tests using RR or do you kind of go back to RR if you see a problem occur? We aren't doing that yet. And that's because there's a couple of reasons for that. We are looking into that. We have people who are actually setting that up. The main issue right now is that some test failures are hard to reproduce under RR.
Starting point is 00:19:44 It's because of the way that RR works. We basically simulate a single core, and then you can have a multi-threaded program, but we only run one thread at a time, and it's a technical thing to do with the way RR is designed. And that's fine for lots of bugs, but there are certain kinds of bugs that are hard to reproduce that way. And we're working on improving that situation, but it's kind of a research problem.
Starting point is 00:20:10 I can talk more about that later, what the issues are. Partly because of that, it's not really a good idea to run all our tests under RR. Okay. Because that might mask some bugs that could really happen. Some of the other issues are just sort of technical, like just actually getting it running in our cloud environment. All our Linux tests are running on Amazon, and getting it running there is actually a problem as well, because most of the cloud providers don't enable
Starting point is 00:20:41 the hardware performance counter features that RR relies on. So you can't actually run RR in the Amazon cloud. There is one company, DigitalOcean, that has a cloud setup where they actually enable those counters, so we can run there. So that's another issue, that right now all our tests are not running on the right cloud. I'm guessing you already answered this question, but is QA running the application using RR,
Starting point is 00:21:06 or do they kind of find a bug and then developers use it? It doesn't really reach QA yet. We're just having developers mostly using RR at the moment. So you said it relies on the hardware counters. That, I believe, implies it doesn't work in VirtualBox. Is that correct? Do you know? It doesn't work in VirtualBox at the moment. It does work in VMware.
Starting point is 00:21:30 Oh, okay. And it does work in Linux KVM. So virtualization isn't inherently a problem. Some VMs are able to virtualize the counters. It's just that... And Xen also supports PMU virtualization. So VirtualBox doesn't. It would be a neat project for someone to add a PMU driver to VirtualBox
Starting point is 00:21:51 since it's open source. We haven't gotten around to doing that, but it would be cool to do. Yeah, I could have used that for some performance monitoring I was doing recently. Kept running performance tests and not getting any counts back, and then I figured out why. Yeah. So we've talked about a couple of limitations that RR has, the single-threaded aspect,
Starting point is 00:22:11 not being able to run on some virtual machines. Are there any other limitations that are worth pointing out? Yeah. One big limitation is that this is really only for Linux, and it's quite interesting, I think, why. I mean, RR records user space execution, and it replays user space execution.
Starting point is 00:22:36 And we kind of need to understand everything that's happening in user space. And RR also needs to understand the kernel interface in considerable detail. And in Windows in particular, the interface of the kernel is an unstable and sort of mostly private interface that only Microsoft really understands.
Starting point is 00:22:58 And so Microsoft could implement something like RR, but it's very difficult for anyone else to do that. And it would also kind of be more work because an interface, as I understand it on Windows, changes version to version. Whereas on Linux, the system call interface is very well documented. It doesn't change. It's a stable interface.
Starting point is 00:23:17 It's backwards compatible. And any time you have a problem, you can actually dig into the Linux kernel source code and try to figure out what the kernel is actually doing, and that's very, very helpful. So the techniques of RR could be applied to other platforms, but anything that's not Linux is going to be a lot of work. It's going to basically be a rewrite, and on Windows especially it'll be very, very difficult to do. The other limitation that I mentioned... Sorry. Yep, I was going to ask, is RR able to run on Mac OS X? No. Again, it could be ported, and the port for Mac would be easier because you've got more source code available and stuff. But it
Starting point is 00:23:59 doesn't run there. Again, it would be a ton of work, like, it's going to have to be a rewrite, because a lot of what RR does is really about wrangling the system call interface and the interface between user space and the kernel. And those are different, quite different. Another limitation, which I mentioned earlier, is the single core limitation.
Starting point is 00:24:17 So if your program application is very parallel, you use a lot of cores, then RR is going to slow your program down massively. It's also quite likely that the kind of bugs you're interested in won't even show up when you're using it because of that. So that's a limitation. I mean, obviously there's, you know, Firefox can use multiple cores and we do have some, we do have parallelism, but most of our tests don't really exercise that. They're focused on exercising specific pieces of DOM and HTML and CSS functionality and other APIs, and those don't really require multiple cores to perform well.
Starting point is 00:24:55 There's also a few extra limitations to do with the kinds of kernel features that RR doesn't support. So the way that RR is designed is that we don't support sharing memory between the application that's being recorded
Starting point is 00:25:13 and things outside the application like device drivers, kernel device drivers, or other applications, other processes that aren't being traced. And that's just because of the way that RR works, which is that... Well, I'll go into that later, maybe.
Starting point is 00:25:30 But because of that limitation, that actually works fine on Linux. You can turn... Linux applications tend not to share memory with the rest of the system, except in a few really well-defined cases, like X11 shared memory or the Pulse Audio shared memory buffers,
Starting point is 00:25:49 and we can actually turn those features off and there's some kind of fallback for most of those features. And that's what we do, and everything works great. But if you have a program that needs to share memory with some other thing that you can't record in RR, and a particular case of this is graphics drivers. So if you're using direct rendering on Linux,
Starting point is 00:26:14 then you might be sharing memory directly with the graphics driver, and that could cause problems. So there's a few limitations like that. That's basically it, though. It's pretty general. I mean, we've run a huge number of different applications through RR successfully, and it's been good. You just alluded to maybe that you could go into the mechanism by which RR actually works.
Starting point is 00:26:40 Can you do that over the call? Yeah, I can probably give you... I mean, I think it's, yeah, I can. Okay. So, I mean, basically, the basic goal of, basic design goal of RR, which makes it different from pretty much everything else that's out there, is that we wanted to build a tool that had very low overhead, and we also didn't have many resources to build it. We only had a few interns and some half-time engineers. So we took the approach in the beginning
Starting point is 00:27:13 that we weren't going to instrument code. Most tools in this space, pretty much every other tool in this space, instruments code. You know, basically some kind of binary code rewriting system, something like Valgrind or DynamoRIO, basically where you've got the machine code of your application and you're translating it and adding instructions
Starting point is 00:27:33 into the instruction stream to record things and monitor things. We decided from the beginning that we didn't want to do that, and we decided to try to figure out how much we could do without doing that. And two reasons, really. One is that those systems are horrendously complicated and very fragile. So, you know, every time Intel extends the instruction set, which is like all the time, you have to update your code rewriter to handle all the new instructions,
Starting point is 00:27:58 which is a ton of work. And it also means you're always kind of behind, right? Someone brings out a new compiler and suddenly your tool doesn't work. The other problem is that it adds overhead. Rewriting binary code is expensive. You can get the overhead reasonably low, but it's always going to be there.
Starting point is 00:28:23 And especially it's there when you've got self-modifying code, right? And self-modifying code is actually incredibly common for web browsers because we've got these JavaScript engines that are generating code on the fly and also patching code on the fly. They do a lot of code patching for technical reasons. They've got these polymorphic inline caches, is the buzzword. And so, you know, rewriting binary code
Starting point is 00:28:48 is slow and difficult, and we didn't want to do that. So we don't do that, and it's basically the key differentiator for RR. And we use performance counters to measure progress. So the key problem for these systems is, if I've got some asynchronous event, like an alarm going off and trapping to some signal handler,
Starting point is 00:29:09 we need to be able to replay a certain amount of program execution and then deliver that interrupt at exactly the same point where it happened during recording. And that's difficult to do. That's really the hardest single problem we have. And so you need to be able to measure progress accurately through your program. And that's what we use performance counters for. So in RR, we actually count the number of retired conditional branches.
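The idea of using a branch count as the measure of progress, together with the register state to pin down the exact point, can be sketched on a toy machine. This is a simulation under simplifying assumptions, not RR's code: `Regs`, `Machine`, `record`, and `replay` are hypothetical names, the "hardware counter" is an ordinary variable, and the "signal handler" just adds 1000 to an accumulator.

```cpp
#include <cstdint>

struct Regs {
    std::uint32_t ip = 0;    // which instruction of the loop body runs next
    std::uint64_t acc = 0;   // accumulator
    std::uint64_t i = 0;     // loop counter
    bool operator==(const Regs& o) const {
        return ip == o.ip && acc == o.acc && i == o.i;
    }
};

// Where an asynchronous event was delivered during recording.
struct Point {
    std::uint64_t branches;
    Regs regs;
};

struct Machine {
    Regs r;
    std::uint64_t branches = 0;   // stands in for the hardware performance counter
    bool done = false;

    // Execute one instruction of a small loop that runs 'iterations' times.
    void step(std::uint64_t iterations) {
        switch (r.ip) {
            case 0: r.acc += r.i;  r.ip = 1; break;
            case 1: r.acc ^= 0x5a; r.ip = 2; break;
            case 2: ++r.i;         r.ip = 3; break;
            default:               // conditional branch that closes the loop
                ++branches;
                if (r.i < iterations) r.ip = 0; else done = true;
        }
    }
    void deliver_signal() { r.acc += 1000; }   // the "signal handler"
};

struct Recording {
    bool has_point;
    Point point;
    std::uint64_t result;
};

// Recording: the event arrives at an arbitrary instruction ('arrival_tick'
// models outside timing). We note the branch count and registers at that moment.
inline Recording record(std::uint64_t iterations, std::uint64_t arrival_tick) {
    Machine m;
    Recording rec{false, Point{0, Regs{}}, 0};
    for (std::uint64_t tick = 0; !m.done; ++tick) {
        if (tick == arrival_tick) {
            rec.has_point = true;
            rec.point = Point{m.branches, m.r};
            m.deliver_signal();
        }
        m.step(iterations);
    }
    rec.result = m.r.acc;
    return rec;
}

// Replay: no timing information is used. The event is delivered when the
// branch count and the registers both match what was recorded.
inline std::uint64_t replay(std::uint64_t iterations, const Recording& rec) {
    Machine m;
    bool delivered = !rec.has_point;
    while (!m.done) {
        if (!delivered && m.branches == rec.point.branches && m.r == rec.point.regs) {
            m.deliver_signal();
            delivered = true;
        }
        m.step(iterations);
    }
    return m.r.acc;
}
```

The branch count alone is not enough here, because several instructions execute between two branches and share the same count; the register comparison picks out the exact instruction. In this toy, delivering the event one instruction earlier or later changes the final accumulator, which is why the replay has to hit the same point exactly.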
Starting point is 00:29:35 And that's our measure of progress. And we combine that with the state of the current registers. So we say, okay, stop the program after we've executed this number of conditional branches and the registers look like this. And then we record that, and then during replay we actually get to that same point, and we can do that with no code
Starting point is 00:29:56 instrumentation by counting those branches using the performance counters. That's pretty much all I need to say about that. Okay. Very interesting. So what's the future look like for RR? Yeah, so there's tons of stuff that could be done with RR. It's a really powerful sort of baseline framework. Once you've got this record and replay
Starting point is 00:30:19 and also checkpointing capability, you can build a ton of interesting stuff on top of that. I really hope that people actually pick it up and do some of those things with it because we at Mozilla don't necessarily need and don't really have the time to actually build everything that you could do. So some of the projects
Starting point is 00:30:41 that I probably would like to do at Mozilla, one of them is really to focus on this problem of reproducibility. We do have bugs that are hard to reproduce that don't reproduce on RR easily, and I want to understand more about why that is and try to tweak RR to make those things more reproducible. We believe that a lot of those bugs are to do with the way that the scheduler works. RR's scheduler is pretty deterministic at the moment,
Starting point is 00:31:10 and maybe we need to introduce some randomness to the scheduler. But there are probably other things as well that are going on. So we want to figure out that problem. As I said earlier, it's a research problem. No one really knows how to do this, as far as I know. And so that's something we're going to have to investigate. Another, well, there's so much more that could be done, I don't even know where to start.
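To illustrate why a very regular scheduler might hide certain bugs, and why adding randomness might expose them, here is a toy simulation. It is only a sketch of the intuition, not how RR's scheduler works: two simulated threads each perform non-atomic increments (a load followed by a store) on one simulated core, and the names `Worker`, `run`, `run_fixed`, and `run_random` are hypothetical.

```cpp
#include <random>

struct Worker {
    int remaining;   // increments still to do
    int local;       // value last loaded from the shared counter
    bool loaded;     // true between the load and the store
};

// Run two workers on one simulated core, one at a time. 'next_slice' returns
// how many operations the running worker gets before the scheduler switches
// to the other one; it must return at least 1.
template <class NextSlice>
int run(int increments, NextSlice next_slice) {
    int shared = 0;
    Worker w[2] = {{increments, 0, false}, {increments, 0, false}};
    int cur = 0;
    while (w[0].remaining > 0 || w[1].remaining > 0) {
        int slice = next_slice();
        for (int k = 0; k < slice && w[cur].remaining > 0; ++k) {
            Worker& t = w[cur];
            if (!t.loaded) {                 // load half of the increment
                t.local = shared;
                t.loaded = true;
            } else {                         // store half of the increment
                shared = t.local + 1;
                t.loaded = false;
                --t.remaining;
            }
        }
        cur = 1 - cur;                       // context switch
    }
    return shared;
}

// Regular scheduler: every time slice has the same length.
inline int run_fixed(int increments, int slice) {
    return run(increments, [slice] { return slice; });
}

// Randomized scheduler: slice lengths of 1 to 5 operations drawn from a
// seeded generator, so a given seed still reproduces the same interleaving.
inline int run_random(int increments, unsigned seed) {
    std::mt19937 rng(seed);
    return run(increments, [&rng] { return 1 + static_cast<int>(rng() % 5); });
}
```

With a fixed slice of 4 operations, every switch lands between two complete increments, so `run_fixed(100, 4)` always returns 200 and the lost-update bug never shows. With random slice lengths, a switch can land between a load and its store, and the final count can come out below 200. Because the randomness comes from a seed, rerunning with the same seed gives the same interleaving, so the failure stays replayable.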
Starting point is 00:31:35 But another big, big area is sort of dynamic analysis of the replay trace. So imagine that you want to use something like Valgrind. Valgrind, I think is the correct pronunciation. Imagine you want to use it to find memory leaks or memory errors in your program. RR is good for debugging a bug once you've found it, but you might want to apply some dynamic analysis tools
Starting point is 00:32:02 to actually detect bugs. Well, we could actually do that during the replay. So you could record the program with really low overhead, and then you could turn on some instrumentation during replay to search for bugs. And that would be kind of neat because it means that the low overhead recording would mean your program wasn't too disturbed by the tools you're running, and then you can run the other stuff offline at your leisure. That would be awesome.
Starting point is 00:32:31 And a framework for doing that could be built, and it wouldn't be that hard, but it's just a bunch of time. Those are a couple of things that would be great to do. I could definitely see utilizing that. I was also wondering if maybe it's possible to monitor things like memory usage over time or something during the replay. Yeah, absolutely. Performance analysis
Starting point is 00:32:52 during recording and replay would be pretty interesting. Monitoring memory usage is one of those things that would be easy to do with instrumentation if you could instrument the replay only. Also, because RR is relatively low impact on the program execution,
Starting point is 00:33:09 you know, you could actually imagine, well, I've actually done this, actually. You can run a profiler during recording, and then you can replay and compare the profile to the replay. You can say, well, you know, we're spending a lot of time in this piece of code, you know, and then actually debug that, you know, without... And you've got your performance results already,
Starting point is 00:33:35 but now you're debugging the exact execution that produced those performance results. That's pretty cool. Okay, I might have to play with that. So, I mean, this is all specific to GDB, and as we mentioned during the news and stuff, everyone's talking about Clang and LLVM and whatever. Are there plans, or does it work with LLDB? I don't really know how these... Okay. I don't know. I don't think there's any fundamental issues there. I don't know if LLDB has the same kind of retargetable reverse execution backend that GDB has. If it doesn't, that could be added. And when it does, RR could be extended to support that.
Starting point is 00:34:17 RR really works at a very low level. So almost all the machinery of RR is pretty generic. It works on any binary code. It's not C++ specific. It's not language-specific or compiler-specific. In principle, you can debug all kinds of things with it. We have been debugging Rust programs with it. Some people have.
Starting point is 00:34:40 And, of course, Rust is LLVM-based. So LLVM itself is not a problem. LLDB would just require a bunch of work to plumb the interfaces through. Okay. Are there other tools in the space similar to RR that you're familiar with? Yeah. So there are, and there's been a ton of academic research in this area, and there's also been a few commercial projects
Starting point is 00:35:07 that have done this. And I think the most important one is there's a tool called UndoDB by Undo Software, which actually has very, very similar functionality to RR. The way you use it also is fairly similar. It's also Linux-specific. It also plugs into GDB.
Starting point is 00:35:28 You know, it's a great tool. And I think that, you know, as I understand it, you know, the main difference between RR and LLDB is... Sorry, the main difference between RR and UndoDB is that UndoDB uses code instrumentation to do its work, and so it has a little bit higher overhead than RR. It's also, they have to do,
Starting point is 00:35:53 that's also more work for them to keep it maintained and up to date with instruction sets. The one thing that UndoDB does have that RR doesn't have, that's really important, is ARM support. And there's some deep technical reasons that I probably shouldn't go into now why on ARM we can't actually do RR the way that we've done it.
Starting point is 00:36:13 And so you need to use code instrumentation on ARM, and so UndoDB works there, but RR does not. And so that's a good option if you're on ARM. Okay, so I was wondering before if RR would support Android, so I guess not. Right, well, Android x86. Okay. Not really what you're looking for.
Starting point is 00:36:34 But yes, we did look into ARM, but there are some really cool and interesting and deep and unfortunate technical reasons why it didn't work. They could be fixed by the ARM guys. I hope one day they will fix it. But anyway, so, yeah, the other tool that's really interesting and related to this is that years ago, VMware had a VM-based record and replay engine where they record and replay the execution of an entire virtual machine.
Starting point is 00:37:01 And you could actually run Windows in this. And so you could get record and replay for Windows. And we actually used that at Mozilla to deal with some really horrible bugs, and it was pretty good. I mean, it wasn't quite all there. Unfortunately, they cancelled the project before they really got the engineering really solid, I think. But I was very sad they cancelled that.
Starting point is 00:37:19 It was a great system, and we actually, I believe that at Mozilla there is still somewhere a Windows machine set up to run that. We've taken it off the network so that it doesn't get any Windows updates because otherwise it will stop working. But, yeah, that was a cool project. All right, so this is a C++ podcast. RR is written in C++, is that right?
Starting point is 00:37:46 That's right. So as a C++ developer, we've got to ask you the standard question of do you have any favorite features of C++ or favorite annoyances of C++? Yeah. C++ is cool. I really like
Starting point is 00:38:02 templates. I like the metaprogramming, the compile time, all the compile time stuff that you can do. I think that's really great. Being able to generate code essentially, the exact code that you want for a particular data structure or whatever, with specializations and things like that.
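As a small illustration of generating the exact code you want for a particular data structure, here is a C++ sketch that uses explicit template specializations to pick the storage type for a set of flags at compile time. The names `UIntOfSize`, `bytes_for_bits`, and `BitFlags` are invented for this example and do not come from RR or Firefox.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Primary template is intentionally left undefined: only the sizes that are
// specialized below are valid.
template <std::size_t Bytes>
struct UIntOfSize;

template <> struct UIntOfSize<1> { using type = std::uint8_t; };
template <> struct UIntOfSize<2> { using type = std::uint16_t; };
template <> struct UIntOfSize<4> { using type = std::uint32_t; };
template <> struct UIntOfSize<8> { using type = std::uint64_t; };

// Compile-time helper: how many bytes are needed to hold `bits` flags.
constexpr std::size_t bytes_for_bits(std::size_t bits) {
    return bits <= 8 ? 1 : bits <= 16 ? 2 : bits <= 32 ? 4 : 8;
}

// A flag set whose storage is chosen by the compiler from the flag count.
template <std::size_t Bits>
class BitFlags {
    static_assert(Bits >= 1 && Bits <= 64, "BitFlags supports 1 to 64 flags");
    using Word = typename UIntOfSize<bytes_for_bits(Bits)>::type;
    Word word_ = 0;

public:
    void set(std::size_t i) {
        assert(i < Bits);
        word_ = static_cast<Word>(word_ | (static_cast<Word>(1) << i));
    }

    bool test(std::size_t i) const {
        assert(i < Bits);
        return ((word_ >> i) & 1u) != 0;
    }
};

// The selection happens entirely at compile time, so each instantiation is
// exactly as large as it needs to be.
static_assert(sizeof(BitFlags<5>) == 1, "5 flags fit in one byte");
static_assert(sizeof(BitFlags<12>) == 2, "12 flags fit in two bytes");
static_assert(sizeof(BitFlags<40>) == 8, "40 flags need eight bytes");
```

A caller just writes `BitFlags<12> flags; flags.set(11);` and never sees which integer type was chosen.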
Starting point is 00:38:17 That's really neat. I'm not a huge fan of C++ to be honest with you guys. I think the new language features are cool, but the problem I see is that the complexity only grows and it's already incredibly complex and it's just getting more and more complex. And, you know, you really have to study to keep up, right?
Starting point is 00:38:40 Like, you know, I had to sit down, you know, for an hour or two and like figure out, you know, rvalue references. And I'm sure that I don't understand all the interactions they have with the rest of the language. So just, you know, I would really like, you know, people have talked about having a, at some point, you know, a restricted subset of C++ with all the good parts and the bad parts kind of disabled. And I think that would be really great. You know, I also, you know, because we use C++ to build a web browser and web browsers, you know, security is incredibly important.
Starting point is 00:39:20 And the unsafe C++ features, which, to be frank, you can't really live without, are a problem as well. So, you know, I would love to see, you know, a safe version of C++. You know, at Mozilla, we've got, maybe I shouldn't mention it on a C++ podcast, but we decided that we didn't really want to build, you know, another iteration of a web browser in C++ because of the safety issues, and so we built this Rust programming language which tries to deal with a lot of that. So I think those are the things that I worry about with C++.
Starting point is 00:39:57 But I do enjoy using it. We use C++ for Firefox for a reason. It's not just legacy. Right now, until now anyway, C++ has been the only language you'd be sane to do something like this in. It's okay. It's okay, yeah.
Starting point is 00:40:16 Earlier, when I read the piece of feedback, the listener talked about how Mozilla has lots of other projects that might be interesting for C++ developers. Are there any you could maybe mention? Obviously, we can't go too deep into anything. I think WebAssembly, what used to be called asm.js, and now I think it's turned into WebAssembly. I think that is a pretty interesting thing. I don't know if you guys have covered it. I haven't checked, but... Yeah, we had JF Bastien from Google a while back
Starting point is 00:40:48 talking about WebAssembly. Okay, cool. Okay, so that's a pretty big deal. So that's cool. I mean, you might, if you guys are really bold, you could invite a Rust person on to talk about C++ versus Rust. I think that's pretty
Starting point is 00:41:08 fun. Pretty fun podcast. But maybe you shouldn't. We've actually had someone from Rust on too. I have. Thank you for the suggestion. No, no. Cool. Okay. Jason, is there anything else
Starting point is 00:41:24 you wanted to ask? I think I covered all the questions I had. Okay. Well, Robert, where can people find you online or find more information about RR? So there's the rr-project.org website. It's the main landing page. People can email me. You've got my email address.
Starting point is 00:41:53 Also, we also discuss RR on IRC, if some people still use that. That is irc.mozilla.org, #research. That's where we hang out for that. So I'm often online there. So those are the main places. And we're always... Also, we're on GitHub, of course, and people should download RR and file
Starting point is 00:42:11 issues if they have them. Okay. Well, Robert, thank you so much for your time. It's a great pleasure. Thank you for having me. Thanks for joining us. Thanks so much for listening as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in,
Starting point is 00:42:28 or if you have a suggestion for a topic, I'd love to hear that also. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you can follow CppCast on Twitter, and like CppCast on Facebook. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
