CppCast - How CLion works under the hood

Starting point is 00:00:00 Episode 366 of CppCast with guest Dmitry Kozhevnikov, recorded 1st of August 2023. This episode is sponsored by the PVS-Studio team. The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we talk about new releases in the world of C++ tooling. And about why we should use lambdas instead of regular functions. Then, we are joined by Dmitry Kozhevnikov. Dmitry talks to us about how the CLion IDE works under the hood. Welcome to episode 366 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Timur Doumler, joined by my co-host for today, Matt Godbolt. Matt, how are you doing today?

Starting point is 00:01:14 I'm doing great. Thanks, Timur. How are you doing? I'm not too bad. Thanks. Yeah, so Phil is on vacation. So I'm doing a couple episodes with guest co-hosts. And Matt, thank you so much for coming on the show and being my co-host today. I'm very excited about having you here. Likewise. Thank you for asking me. I'm privileged to be here again.

Starting point is 00:01:33 Yeah. So how's it going on your end? What are you up to? It's going very well, thank you. So recovering from a couple of conferences back to back where, in fact, I met you, which has been the first time in a little while that you and I have seen each other in the flesh. We had CppNorth and C++ on Sea, and it was so, so lovely to see C++ people again

Starting point is 00:01:54 after so many years without having seen anyone or seeing them over video conference. So yeah, it's going great. I'm just sort of recovering now and a bit of a comedown after that. Yeah, yeah. So I've also been to both of those conferences, and it was good fun. Now I'm taking a bit of a break from conference travel. It's been quite a lot the last few months for me. But yeah, all right. So at the top of every episode, I'd like to read a piece of feedback, and this time I got some feedback from Richard Powell, whom I actually met at CppNorth, the conference that we just mentioned. Richard said, thank you guys so much for starting up CppCast again. I don't always get the chance to listen to it, but when I get

Starting point is 00:02:34 the chance to listen to it, I really love it. Thanks. Well, thank you, Richard, for the feedback. Richard actually went on and gave me another piece of feedback. He said that because we're an audio-only show, sometimes it's a bit difficult to follow the conversation if there's no visual material. For example, if you're talking about syntax or something like this, and he suggested actually that we can use a podcast feature where you can have different cover art for every section of the episode.

Starting point is 00:03:02 And so you can essentially use them as slides to visualize what you're talking about. I'm not sure this is something you want to do i think i need to look into this but i've never heard about this before no me neither yeah that's that sounds like a really intriguing idea um and i can certainly see you know i listen to some other podcasts and certainly some of them where they start to talk about code it would be useful to actually see the code on the screen but then the other thing is if you start relying on that too much then you lose your listenership who are walking their dog for example like myself uh or listen to it at 1.9x and can't keep up with the slides changing every two seconds like some people we know so i i could see that

Starting point is 00:03:39 being useful, but it also adds an awful lot more burden on the podcast producers to actually then generate visual content as well. Yeah, so that's why I'm a bit skeptical. A, it's more work, and nobody likes more work. And B, it's kind of a trade-off, right? Because it might be easier to follow the topic for the people who do watch this slide sort of content. Yeah. But then, yeah, a lot of people, like you said, they told me that they listen to CppCast while they're on a run or something like that, and we really don't want to lose them. It might make you lazy, because if you could rely on the visual aid the whole time, then you're not forced to explain it better for those folks who aren't able to see the picture. So yeah, an interesting idea nonetheless, though. So thank you, Richard. All right, so we'd like to hear

Starting point is 00:04:25 your thoughts about the show, and you can always reach out to us on Mastodon or on Twitter, or whatever it is called these days, or email us at feedback at cppcast.com. Joining us today is Dmitry Kozhevnikov. Dmitry has over 15 years of experience in C++ development. He began his career with a company specializing in GIS, simulators, and 3D entertainment systems, serving as a C++ software developer for seven years. Later, he joined JetBrains and contributed to the C++ language support in CLion. There, he worked on the in-house parser, semantic analysis, and refactoring features for C++ code. Currently, Dmitry is doing engineering management tasks on the CLion team.

Starting point is 00:05:05 He has a passion for developer tooling and dreams of making C++ development a universally pleasant and accessible experience. Dmitry, welcome to the show. Yeah, hi. Nice to meet you. Nice to see you all. I'm super excited to hear that you used to do stuff with, what was this, 3D entertainment systems.

Starting point is 00:05:24 What on earth was that? Yeah, well, basically the company had an in-house 3D, like sort of gaming engine, game engine, or something like that. Right. But yeah, and there are some simulators for ships and stuff doing on top of that. And as a sidekick, it was like some entertainment,

Starting point is 00:05:47 like, you know, 5D things that are installed in malls oh i see so not like a and people come come and play a game oh the one way that your chair rocks back and forth and like soap bubbles are flying around and yeah and some air breathes in your in your face and stuff right right there are like the rats run along and then you can feel something at the back of your heels and all that kind of stuff. Right, okay. That's cool. That's cool.

Starting point is 00:06:08 That was C++ as well, presumably. Yeah, that was all C++. That was very C++ heavy shop. I suppose, yeah. Cool, very cool. All right. Well, Dmitry, we'll get more into your work in just a few minutes.

Starting point is 00:06:21 But before we do that, we have a couple of news articles to talk about. So feel free to comment on any of these, okay? First one is a release in the C++ tooling world. Mold 2.0 got released. And Mold is a faster drop-in replacement for existing Unix linkers, such as LLVM lld and GNU gold.

Starting point is 00:06:41 And what the new major version 2.0 does is it switches the license from AGPL to MIT. There's a comment by the team saying they've been attempting to monetize Mold through a dual AGPL commercial license scheme. And that didn't work out as well as they hoped. And so they switched the whole thing to MIT now. And in addition to that license change, there are also some technical updates in this version, but no super major features, I think.

Starting point is 00:07:12 So I think it's mostly about the license change. Yeah, mostly the license, which is huge for commercial users. It was a bit vague before as to whether or not, if you use the linker that was AGPL, the binary that was linked by it was also bound by some of the clauses of it. So making it MIT makes it very, very clear what the legal situation is of the linked binary. And Mold is so fast, it's almost as fast as cp. So if you just copied all your .o files into a single

Starting point is 00:07:43 directory mold is only slightly slower than that and so we use it um for some of our debug builds so you get a really really fast uh you know edit debug link run test cycle in my in my day job and it's it's fabulous so this is great great news it means that i feel a lot more confident about using mold myself um and yeah it's i don't know how he's made it so quick yeah that's interesting i have to admit i have not heard about mold before uh but yeah it seems like quite a lot of people are relying on it because it is much faster so is that just it basically speeds up your your link at times if that's your bottleneck or is that absolutely yeah there are some other features

Starting point is 00:08:20 that it has. It has some sort of layout randomization options that you can use to make sure that your code isn't performance bound by linker layout. So, you know, you can run it multiple times and link multiple times and then run it and see if it gets faster and slower. You're like, well, there's some sensitivity to exactly how my code is laid out. But the main reason you do it is because it's just so blisteringly fast, you don't even notice that it's running anymore. And like, sometimes you get to the end of your program and there's like a two minute pause while the link happens, and you're like, oh, that's a shame, I only made a one line change. Right. Okay, we have another release, another major release in the tooling world. CLion 2023.2 actually got released last week. It has quite a few really spectacular new features.

Starting point is 00:09:07 There is an AI assistant. I mentioned this on the show, I think, a couple of episodes ago when there was an EAP, like an early access preview, that had the AI assistant. We talked about that one already on the show, I believe. This release also gives you registers in the debugger, which you now can inspect. There's integration with PlatformIO,

Starting point is 00:09:26 which is, I think, great for embedded people. Is that right, Mitya? Yeah, that's correct. Well, we had it before, but now it's redone, and now it's way more straightforward. It's especially great for hobbyists and newcomers. Well, it's also probably used by professional developers, but yeah, professional developers have other options,

Starting point is 00:09:51 but newcomers and hobbyists, that's who are using PlatformIO. So I think it would work great for them in CLion. All right. And you also added vcpkg integration, which I think is also quite exciting news. Yeah. The AI assistant is particularly interesting to me. I looked through the terms and conditions to see whether or not I could easily enable it. And so far, because of the amount of like back and forth, understandably, and the fact that I share

Starting point is 00:10:20 my account between my personal and my work means I clearly couldn't look my legal team in the eye and say, don't worry, none of my work code is going to go to this. But I'll be excited to turn it on and see what it comes up with. You know, I've only really dipped my toe in all of this AI nonsense, but if it's in my IDE, it'd be rude for me not to try it out. Yeah, that's actually a valid point. We're working on making it more straightforward and more clear from a legal point of view to understand, well, how do you disable the thing completely and how do you enable it when you need it. Oh, fantastic. Obviously your code is going to be sent somewhere for it to work. So you're probably very aware of how sometimes code

Starting point is 00:11:01 has to be sent to remote servers for some reasons. I won't just mention that. Well, it's not included by default into the release. You need to install it separately. So I know that it's a concern for a lot of people, so you're not going to get it automatically when you install CLion. But in general, I'm kind of excited and terrified at the same time about our future because, well, it will change things. I'm not yet sure what things it will change and how it will look like in several years, but it definitely will do something to the way we work. Yeah, I think we should also probably have an episode about this AI revolution that's happening right now, also in a C++ tooling world, along with the rest of everything. Probably we should talk about this at some point on the show. Today, we're going to talk more about CLion. But before we get to that, I have one more news item.

Starting point is 00:11:54 There was a blog post. I mean, there were a bunch of blog posts released in the last couple of weeks, but there was one that I found particularly interesting. It was written by Jonathan Müller, who I think has been around writing and talking about C++ since at least 2015. But he's actually now with think-cell, a German company in Berlin, and he's now writing blog posts for them, among other things. And this one is called "Should we stop writing functions?" and use lambdas instead. So basically, you know, half ironically what he's saying

Starting point is 00:12:29 is you shouldn't write functions anymore. We should write everything with lambdas. Like instead of writing, I don't know, a function that takes two integers and returns an integer, you just, you know, write constexpr auto, blah, blah, blah, equals, and then write a lambda expression instead

Starting point is 00:12:44 and you call that. And basically, you should just do that everywhere and just not write freestanding functions anymore. Because, you know, functions are kind of really broken, right? Like, you have ADL, which, like, is really complicated. You can overload them. You can, like, the way they do, like, template argument deduction

Starting point is 00:13:06 is kind of weird. And lambdas are so much better. There's no ADL, there's no overloading unless you want to, so you can construct these lambda overload objects, but by default, there isn't any overloading. Lambdas do this really cool thing, which I think I mentioned

Starting point is 00:13:21 also in my C++ Lambda Idioms talk. You can separate template arguments that are deduced and the ones that you explicitly have to specify so like the you can put like you can make like the lambda itself a template and then you have template arguments there and then you can make it an auto or you call this generalized lambda generic lambda i forget what they call but yeah with an auto parameter parameter and then those are the ones that are going to be deduced but they're like in two different places right so you have like full control i'd never thought of that yeah that's interesting yeah yeah that's a that's a really cool trick and then they're implicitly constexpr as well which is something

Starting point is 00:13:55 that i think we all want right you know if you if they're sprinkling constexpr like magic sauce throughout our code base is something we're all kind of a bit used to doing nowadays but you know it was nice for it to be the the default so it's an interesting idea um it also means that your code looks more and more like typescript or javascript which maybe is maybe it's not something you want to aspire to but it is an interesting idea um i don't know that i could go with it i'm too old and fuddy-duddy i think i still like uh functions but uh i i certainly appreciate the idea of of having something which is trying to show us that the defaults are broken yet again yeah i don't think this is going to make it into any by any company's actual coding

Starting point is 00:14:37 guidelines but i think it's a really interesting blog post summarizing the problems with functions that we have and kind of it helps us being more aware of them and like choose the other thing if if you want to you know yeah i wonder if there is any other mainstream language where the the community kind of agrees that functions are broken so i mean we already agreed that initializing a variable is broken right and so apparently functions are also broken i wonder what else is broken i mean with c++ it's been around so long now it's hard to find things that that aren't broken in any meaningful way. I mean, that's unfortunately part of the legacy that we bring with us as C++ folks. And the desire to be backwards compatible all the way back to pretty much C is that we have to put up with an awful lot of wrong defaults and

Starting point is 00:15:22 stuff. So yeah, I don't know how many other languages have so much baggage to deal with. Okay, so that concludes the news items for today. So we can transition to our main topic. And I want to say a little bit about like how this episode came about, because I wanted to do this episode for a very, very long time. What happened recently, a piece of personal news is that I actually quit my job. So I actually no longer work for JetBrains as of yesterday. So nobody can accuse me of like a conflict of interest here anymore that I actually quit my job. So I actually no longer work for JetBrains as of yesterday. So nobody can accuse me of like a conflict of interest here anymore. I can finally do this episode.
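For readers following along in text, the lambda idea from the news segment can be sketched roughly as follows. This is a minimal illustration, not code from the blog post: the names `add` and `convert_to` are made up for the example, and it assumes a C++20 compiler (for the template parameter list on the lambda). It shows a "function" written as a `constexpr auto` lambda object, and a lambda where the explicitly specified template argument (`To`) is kept separate from the deduced one (the `auto` parameter).

```cpp
#include <type_traits>

// A "function" written as a lambda object. It cannot be overloaded by
// accident and is not found through argument-dependent lookup (ADL).
// Lambdas are implicitly constexpr (since C++17) when their body allows it.
constexpr auto add = [](int a, int b) { return a + b; };

// C++20 template lambda: `To` has to be spelled out by the caller, while the
// type of `from` is deduced from the argument. The two kinds of template
// parameters live in two different places.
constexpr auto convert_to = []<typename To>(auto from) {
    return static_cast<To>(from);
};

// Compile-time checks of the behaviour described above.
static_assert(add(2, 3) == 5);
static_assert(convert_to.operator()<double>(7) == 7.0);
static_assert(std::is_same_v<decltype(convert_to.operator()<long>(7)), long>);
```

One trade-off visible in this sketch is the call syntax: passing an explicit template argument to a lambda's call operator needs the rather clunky `convert_to.operator()<double>(7)` spelling, which is one reason this style is unlikely to land in coding guidelines as-is.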

Starting point is 00:15:51 And so the idea was kind of, you know, when I was talking to people about CLion or working at the JetBrains booth at conferences, a lot of people came to me and were asking questions about how CLion actually works under the hood. It's not a compiler, right? It's an IDE. So it's kind of different, but like people are very curious. And so, yeah, I decided to invite Dmitry on the show to talk about this because I think it's a really cool topic. So yeah, Dmitry, welcome again to the show. Yeah. Hi again. So probably the most frequent question I actually got asked about how CLion actually works is which programming language is CLion actually written in?

Starting point is 00:16:29 Is it really all written in Java or is there anything else going on? Are there any native components written in C++? Yeah, so the very short answer is yes, but it actually depends. So indeed, like most of the code is written in JVM languages. So that's being Java, a bit of Groovy, and now Kotlin. And so CLion shares the platform, the IntelliJ platform, with a bunch of other JetBrains IDEs. And what I actually wanted to talk about here,

Starting point is 00:17:02 while we're on the JVM side, I wanted to talk about Kotlin a bit, and about how we made the transition to Kotlin. And actually now in JetBrains, in IntelliJ, well, in CLion as well, Kotlin is the go-to way to write new code. And it's actually explicitly preferable to write in Kotlin when you're working on CLion and other IDEs. And it was all done in a 20-year-old code base in a single monorepo where thousands of people have contributed. And Kotlin was integrated there over time. And what made it happen is that it has absolute top-notch interoperability with Java. So you basically can take any class, convert it to Kotlin or rewrite it in Kotlin,

Starting point is 00:17:54 start using it, and it's not just possible, the APIs are actually somewhat compatible. So you can use the collection APIs from Kotlin and from Java. And on both sides, it would be idiomatic, natural. And I wanted to talk about that because actually that is a direction I sometimes see C++ going recently. So there are a lot of activities, a lot of work in introducing some kind of successor to C++, whether it's Google's Carbon or Herb Sutter's cppfront, or like Swift is trying to do C++ interoperability and stuff like that.

Starting point is 00:18:35 So I think all these efforts have a lot to learn from how it's possible to gradually migrate to Kotlin over time. And it wouldn't be possible if we just had to rewrite bigger components on Kotlin and then just have some kind of compatibility layer or stuff like that. My understanding is that Kotlin, and you mentioned Groovy as well, and I know that there are other JVM targeting languages,

Starting point is 00:19:01 but the JVM itself enforces essentially the ABI between all of these components. And so in order to operate, they are forced through the ABI of the JVM. And that means that the way that you instantiate classes, the way that objects are created, the way that references are passed between them all has already been agreed upon. And so in order to be interoperable with the JVM makes you instantly interoperable with pretty much everything that is also JVM based. Whereas sort of C++ doesn't have the luxury of doing that because the ABI is much lower level and doesn't control, doesn't talk about a lot of this sort of nitty gritties. And in order to use something like C++, you actually need to look at the header files and understand them.

Starting point is 00:19:40 Whereas in a JVM language, you already have like this blob of bytecode. So do you see that there's, you know, obviously from the point of view of kotlin being a natural successor to java i understand that but like um i think it had fewer problems or am i misunderstanding yes indeed it had very fewer problems and it would be a way more challenging task for c++ but it also there are some uh more subtle and ergonomic things that has to be done uh has has to be done right. For example, you work with collections and basically collections and iterators

Starting point is 00:20:10 somewhat differently in Kotlin and in Java. But it's actually interoperable in both sides. So you can have a method in Kotlin that takes a collection and it has slightly different interfaces in Kotlin and in Java, and they are both idiomatic, and you can call it transparently from one side to the other. Okay, so the JVM isn't a panacea. You still have to do some very clever tricks in Kotlin

Starting point is 00:20:36 to make it interoperate really, really seamlessly with Java stuff. Cool. Yeah, you still have to design it properly. You still have to think about ergonomics of actually interoperating between languages. So how much of the code base is Kotlin now compared to, say, Java, roughly? Yeah, I don't remember, actually. But a decent amount of it.

Starting point is 00:20:54 Yes, a decent amount of it. Also, new code has for a few years been written in Kotlin. And Kotlin, forgive me if I'm wrong, is actually JetBrains' own language, right? This is something JetBrains came up with. Yeah, initially JetBrains came up with that. Then it's, well, kind of governed by the Kotlin Foundation now. So it's open source. And like there are some other major players

Starting point is 00:21:15 in the Kotlin Foundation right now, but it's still like mostly being developed by JetBrains. That's a pretty impressive achievement for a tool company to say, we've had so much experience. You know, the main product line, I think, IntelliJ here is the Java IDE that you sort of started with, as I understand it. And so you're like, you're so sick of trying to deal with Java that you've come up with your own alternative spelling of all of the way Java works in order to write your own program. That's cool. So now, obviously, the next obvious thing is that you should turn your attention to C++ and come up with the alternative syntax for C++, right? Probably, yes.

Starting point is 00:21:50 But yeah, someone needs to do it. It's probably not us. Yeah, because, well, we have some in-house experience of C++, obviously. Like, for example, the JDK we're working on top of. We actually work on top of a fork of OpenJDK. It's called JetBrains Runtime, and, well, it's maintained and developed also in JetBrains. It's also open source, and it has a bunch of fixes and additions compared to OpenJDK. It's mostly related to desktop applications: the font handling, the focus handling and stuff like that. So there are a lot of people working there on C++ because OpenJDK has a lot of C++ code there. It's actually a very marvelous engineering thing.

Starting point is 00:22:37 It's impressive. It works surprisingly well. Yeah. And it's very complicated, but it works very well in some cases. So you have a JDK written in C++, and then you use that to run Java and Kotlin, and then you use that to implement a front-end for C++? Yes, something like that, yeah.

Starting point is 00:22:54 That's fun. It's turtles all the way down. But are there any components in CLion itself that are actually written in C++? Yeah, sure. The bigger one is that for some IDE features, we're relying on ClangD. We have a fork of it.

Starting point is 00:23:09 Well, ClangD is a Clang-based daemon for IDEs. Well, it has an upstream implementation and we are maintaining something specific for CLion for that

Starting point is 00:23:22 in our fork. So, yeah. It's purely, well, it's LLVM and LLVM is purely C++ projects. So we're also working on C++ there. And does the ClangD, is that sort of integrated directly into your codebase? Are you doing like JNI calls to it or is it something that runs sort of independently?

Starting point is 00:23:41 Is there sort of a separation between the Java process and the C++ side? No, it actually runs as a separate process and, well, for a good reason. I love C++, but still C++ applications tend to crash. And it happens rarely, but it happens way, way, way more often than in a managed application like Java. And actually, I don't think it would be possible to write an IDE platform like IntelliJ in C++ because it requires the code of thousands of people,

Starting point is 00:24:13 including third-party developers, running in the same process and interoperating with each other. So the fact that it doesn't crash the IDE and everything can be recovered, it's really huge. But yeah, ClangD we're kind of controlling ourselves, and we can work on it, but we're still running it as a separate process. Got it, got it. So it sounds like you spend most of your day, well, it sounds like you're

Starting point is 00:24:37 managing folks these days more than writing code. But when you are writing code, obviously there's a lot of Kotlin going on. Is there anything you miss about C++? Using C++ as opposed to staring at the parse tree of C++ all day? Yeah. What I actually miss is, well, it's control over memory and control over lifetimes. So these are two main things that I'm actually missing. So it's actually really sad when we can't do some things that I consider trivial in C++.

Starting point is 00:25:12 like you can't uh have objects with efficient memory layout you need to have allocations there are not not real allocations they are managed by like jvm and they are pretty fast but they're still uh they still happen and yeah it's kind of uh became complicated became complicated sometimes and that's the main thing i missed and also the lack of more destructors and lack of more controlled lifetimes of things it's also it's funny when i see you know folks trying to write java for the first time who have been in the c++ world they'll immediately latch latch onto finalizers in Java as like a destructor. And it's like the first thing you learn is never, ever do anything in a finalizer. Just so many odd caveats and weird things.

Starting point is 00:25:56 It's unfortunate because you, as a C++ engineer, the RAII sort of semantics are so baked into your soul that you want to have it in other languages and you just can't get it without, you know. Yeah, and I think most JVM applications, large applications have some things implemented, a similar to RAII inside them, like just from user side, like from library side. Yeah, sorry. Yeah, I remember this.
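To make the RAII point concrete for readers who come from managed languages, here is a small sketch. Everything in it (`ScopedResource`, `events`, `run_scope`) is a hypothetical name invented for this illustration. It shows the property the speakers say they miss on the JVM: destructors run at a known point (the end of the enclosing scope) and in a known order (reverse order of construction), which a Java finalizer does not guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log so the order of operations can be observed.
inline std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// RAII wrapper: acquires in the constructor, releases in the destructor.
class ScopedResource {
public:
    explicit ScopedResource(std::string name) : name_(std::move(name)) {
        events().push_back("acquire " + name_);
    }
    ~ScopedResource() { events().push_back("release " + name_); }

    // Non-copyable, so each resource is released exactly once.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
};

// Runs one scope and returns the recorded sequence of events.
std::vector<std::string> run_scope() {
    events().clear();
    {
        ScopedResource file{"file"};
        ScopedResource lock{"lock"};
        events().push_back("work");
    }  // Destructors run here: first `lock`, then `file`.
    assert(events().size() == 5);
    return events();
}
```

Calling `run_scope()` records acquire file, acquire lock, work, release lock, release file, in that order. On the JVM the closest library-level equivalents are constructs like Java's try-with-resources or Kotlin's `use`, which have to be applied explicitly at each call site.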

Starting point is 00:26:24 Like when I first joined jetbrains as a developer i think that was back in 2017 uh you know i was hired and then i was told oh you're gonna be writing uh you know a lot of java and do you know any java and it's like no i don't and i kind of learned java on the job but like it felt so weird in the beginning where you just sprinkle these like naked news all over the place like new this new that and i'm like oh my god this is an allocation like this is going to be slow and then took me a while to like get used to the fact that it's fine this is how it works it's not going to be slow it's okay you know and yeah it's just a very different different way of coding i think what i really appreciated like in java is that like things were actually a lot easier to write

Starting point is 00:27:02 I could just write code and, you know, the computer is actually going to do what I wrote, which almost never happens to me when I write in C++. But yeah, on the other hand, as Dmitry said, you don't have this kind of control over what's actually happening, and that felt really weird for quite a while. But are there things that you prefer about working in a managed language? You know, Timur has just explained some of the things that were good for him. But what about you, Dmitry? Are there other things that when you go back to C++, if you do indeed write C++, you're like, oh, gosh, I have to do this thing. Yeah, well, the first thing, as I kind of mentioned, is all sorts of safety. So like not being able to actually crash the application

Starting point is 00:27:48 and recover reasonably from any kind of issues is really powerful. But the main thing I'm missing when doing C++ is actually all the tooling and control over whatever is happening in the program. So it starts from source-based tooling like the IDEs, inspections and stuff. They're way more powerful in both Java and Kotlin, and, well, you basically can write half of your code by just pressing shortcuts and it does a thing for you.

Starting point is 00:28:18 and then like that's the running joke though isn't it like you know we're talking with people it's like you know just hit control space or control enter until like your entire application is written. I suppose that's the chat GPT style or the sort of the open AI or stuff that we were talking about AI assistant is the logical conclusion of auto-completing your entire code base. Yeah, something like that.

Starting point is 00:28:39 But it actually, like, it actually makes sense when we're doing it in kotlin and java so yeah but it's not even that like the more powerful thing is like the control over runtime and stuff via tooling so the debuggers works way better and you can have reasonable memory snapshots where can you just get a snapshot of your application inspect all the objects with all the fields and see how they are related and like look at their values and it's well it's not always possible in in c++ and then have some instrumentations and have some code to be rewritten on the fly have compiler plugins plugins if you need it. It's all sort of possible in C++, but it's not accessible to

Starting point is 00:29:27 people. You need to be a really sophisticated engineer and really understand what's going on to access these things and in JVM it sort of happens naturally. And I guess the other thing that I think of when you mentioned about not crashing is that

Starting point is 00:29:43 there's no undefined behavior that I'm aware of in Java. So if you reference a null pointer rather than the compiler going, I can do anything I darn well like now, it rather pedestrianly throws a null pointer exception and you carry on with your life and someone else can catch it somewhere else and go, oh, yeah, that's fine. Never mind, there's a bug in that bit of the code, just don't run it again, you know? Which is a very different mindset, I think, from C++. Right. So speaking about C++, I'd like to mention my favorite topic, parsing. Yeah, so parsing C++ is hard.

Starting point is 00:30:15 I believe, Dmitry, you remember that you and I actually have done a CppCon talk about parsing C++ and how hard it is. Yeah, I do. A few years ago at CppCon. But in an IDE, it's even harder because you're working with half-written code, right? And you still kind of have to make sense of

Starting point is 00:30:31 what the user's writing. So how do you do that? How do you parse code in CLion and what parsers do you use? And can you talk a little bit about that? Yeah, sure. I probably will start with the easier part, with how we're using ClangD. So nowadays we're using ClangD basically for things that can be done on a single file. So when you show errors, when you show highlighting, when you do code completion, like the basic ones, it just completes the things that are imported, included into your file. So that's where we're using ClangD, and it uses the Clang parser under the hood. And, well, it works well, but that's actually not enough for what we are doing and what we want

Starting point is 00:31:14 to do, what we want to achieve. So at the same time we maintain our in-house parser, which is currently used for refactorings and project-wide things. It's actually pretty powerful, and it's pretty fascinating what it can do. Basically, it has to do everything that the compiler's parser has to do, and then more on top of it. For example, it can be incremental, in the sense that it can reparse a specific function when you type inside that function. That's something that current compilers can't really do. Then there's another one. As you know, as you parse C++ code, you need to do semantic analysis for that.

Starting point is 00:32:01 And what our in-house parser can do is perform only the necessary semantic analysis on the fly, and not all of it. So if your code is structured in a way that it's obvious what's written and it doesn't have any ambiguities, it can avoid doing any semantic analysis until it's actually needed. Is it that kind of stuff where you have, I don't know, an identifier somewhere and you don't know if it's a type or a variable or whatever, and you kind of have to execute arbitrary constexpr code

Starting point is 00:32:37 somewhere else to figure that out? Yeah, something like that. And there's also a common thing inside the IDE when you work with project-wide things. You have a name, and then you search for it, and you need to understand whether a match is actually a usage. So you need to do type inference and type checks on that. And what our in-house thing can do is do that on demand as well, lazily, and only for a single thing. It doesn't have to reparse all the files for that.
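As an illustration of the ambiguity discussed here, the following is a minimal sketch, not code from CLion. The names `Probe` and `flag` are made up for the example. The same token shape, `X::name * id`, is a multiplication in one function and a pointer declaration in the other, and the parser can only tell which by evaluating a constexpr function and picking the right template specialization.

```cpp
#include <cassert>

// Hypothetical helper: the primary template exposes `name` as a value...
template <bool IsType>
struct Probe {
    static constexpr int name = 6;
};

// ...while the specialization exposes `name` as a type.
template <>
struct Probe<true> {
    using name = int;
};

// The parser has to evaluate this to know which specialization is used.
constexpr bool flag() { return false; }

int value_case() {
    int y = 7;
    // Probe<false>::name is a value, so this is a multiplication.
    return Probe<flag()>::name * y;
}

int type_case() {
    // Probe<true>::name is a type, so this declares a pointer named p.
    Probe<true>::name * p = nullptr;
    return p == nullptr ? 1 : 0;
}
```

A parser that defers semantic analysis can skip this work for code where no such ambiguity exists, which is the saving described in the conversation.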

Starting point is 00:33:11 So, yeah, while the throughput of this, the raw performance, is obviously way lower than the compiler's in terms of megabytes per second and stuff like that, by doing tricks like that we can very often do things way faster than a compiler would do these things for us.

Starting point is 00:33:30 Because you're able to do them incrementally. I had no idea that you were reparsing on a function-by-function basis. That explains an awful lot. That's something I've often wondered. I'm tapping away quickly, and I expect when I type this completely new variable that I've just

Starting point is 00:33:45 finished typing, that I can immediately do dot and hit Ctrl+Space, and it's going to tell me all of the members that might be in there. And I'm very intolerant if it delays in any way, but you have to essentially pause the world in order to answer the question, what functions might be able to go here? And, you know, especially with the ADL stuff that you do nowadays, I think you can Ctrl+Space on pretty much anything and it goes, oh, there's a free operator that takes this as a first parameter, and I'll give you that as an option as well. Yeah, that's one of my favorite things to have here. But to be honest, just the normal completion is done by clangd, and it's also pretty fast because clangd,

Starting point is 00:34:26 unlike Clang itself, also has a bunch of optimizations that work on a single file. So if you have just a set of normal includes at the top of your file, just top-level includes one by one without any declarations, it doesn't reparse them.

Starting point is 00:34:44 It keeps them in memory, so it also works reasonably well. I see. Yeah, with our in-house parser, we can do it even more incrementally, even more granularly. And then is there a separate thing that does the syntax highlighting? Is that different again, or is that built into one of those parsers? Because I can only imagine that's literally character by character, you have to work out what color to print the next character as. Yeah, mostly that's clangd, like highlighting what's a variable and what's not. And when you see errors or warnings and stuff like that, the little squigglies and things underneath stuff? Yeah, it comes from both.

Starting point is 00:35:28 It depends on where it's implemented. So I use CLion in my day job, it's my daily driver, but I noticed that there are a bunch of other C++ tooling things that JetBrains provides: ReSharper, and then

Starting point is 00:35:43 there's Rider. Is that the other one, the Unreal Engine thing? Yeah, that's right. How much of the code... You mentioned you're in a monorepo, so I'm imagining you can share quite a lot of the code between these things, but they look on the face of it quite different from each other. Is there a lot of code reuse between them,

Starting point is 00:35:59 or is there a, yeah, how does that look? Well, actually not much, not much yet, because, well, it's a story that originates from very old times when there were two completely separate departments in JetBrains. One was doing IDEs and the other was doing the Visual Studio plugin, ReSharper.

Starting point is 00:36:28 And the Visual Studio plugin is actually running on C# and .NET infrastructure. So it's a different code base. And back then it was decided just to reimplement the same thing another time for ReSharper C++. And yeah, the only thing we are sharing right now is some test data for our parsers and actions. And yeah, that's obviously not ideal, and that's something we're working to do something about. I probably can't go into more details right now, but yeah, that's something we're very well aware of, and hopefully one day it won't be like that. But for now, yeah, for now they kind of share the same ideas, because, well, the ReSharper parser and

Starting point is 00:37:18 well, Rider, which also uses the ReSharper backend under the hood, is kind of the second attempt at the same idea that CLion in particular has. All right, we're going to take a little break because now we would like to mention a few words from our sponsor. This episode is supported by PVS Studio. PVS Studio is a static code analyzer created to detect errors and potential vulnerabilities

Starting point is 00:37:43 in C, C++, C#, and Java code. Podcast listeners can get a one-month trial of the analyzer with the CppCast23 promo code. Besides, the PVS Studio team regularly writes articles that are entertaining and educational at the same time on their website. For example, they've recently published a mini-book called 60 Terrible Tips for a C++ Developer. You can find links to the book

Starting point is 00:38:08 and to the trial license in the description. So one of the things that I'm most impressed with in all of the IntelliJ family of products is how wantonly I can completely change the code from my shell while the IDE is running in the background. You know, like I'll check out a different branch. I'll revert some files. I'll do some changes.

Starting point is 00:38:30 I'll copy things around. And then I'll alt-tab back to the application. And magically, it's kind of kept up with me and is completely fine. Whereas I remember in the old days of Visual Studio, this is like 15, 20 years ago, there was always this kind of button I had to press to synchronize state with the UI, all that kind of stuff. And I would always wonder, should I be shutting my IDE down before I edit stuff or whatever? But I just assume that CLion is going to keep up with me.

Starting point is 00:38:56 How on earth does that stuff work? Yeah, well, some people would argue it might work better. But yeah, we're actually working hard on making it work smoothly and painlessly. There are a bunch of tricks we're doing. On the one hand, we have a separate process that watches file system changes and then just syncs them back to the IDE state. So the IDE might be looking at a slightly out-of-date state, but it's always consistent. So the files won't change in the middle of some code analysis,

Starting point is 00:39:37 or it would be just canceled and restarted. But also specific to C and C++, there is also a bunch of things we're doing in our indexing and optimizations. We're sort of doing the same thing that modules are doing, but in a somewhat different way.

Starting point is 00:39:55 So we're trying to keep a parsed representation of header files and reuse it when possible. So when we reindex a changed file and we encounter an include directive, we need to understand what the preprocessor state is now and what the preprocessor state was

Starting point is 00:40:15 when we did it last time. Is it significantly different, and should the difference affect the parse tree of this header file? And if it doesn't, we don't go into this file and we just reuse the parsed representation of the header file that we're storing. So tricks like that, they're hard to do correctly.
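The reuse check described here can be sketched roughly as follows. This is an illustrative assumption about the general technique, not CLion's actual data structures: `ParsedHeader`, `MacroState`, `relevant`, and `can_reuse` are all hypothetical names. The idea is to remember which macros a header looked at and what they were defined as, and to reuse the stored result only if those particular macros are unchanged.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Macro name -> definition text. A macro that is absent is undefined.
using MacroState = std::map<std::string, std::string>;

// Hypothetical cache entry for one header.
struct ParsedHeader {
    std::set<std::string> macros_read;  // macros the header tested or expanded
    MacroState state_when_parsed;       // their definitions at parse time
    std::string summary;                // stand-in for the stored parse tree
};

// Project the current state onto only the macros this header looked at.
MacroState relevant(const MacroState& current,
                    const std::set<std::string>& names) {
    MacroState out;
    for (const auto& name : names) {
        auto it = current.find(name);
        if (it != current.end()) {
            out.emplace(name, it->second);
        }
    }
    return out;
}

// The cached parse is reusable if every macro the header depended on
// has the same definition (or is still undefined) as it was last time.
bool can_reuse(const ParsedHeader& cached, const MacroState& current) {
    return relevant(current, cached.macros_read) == cached.state_when_parsed;
}
```

In this sketch, defining an unrelated macro before the include leaves the cache entry valid, while defining a macro the header tested invalidates it. A real implementation would also have to track include guards, `#undef`, and nested includes, which is part of why this is hard to get right.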

Starting point is 00:40:38 They're complicated. That sounds like the kind of trick that you could imagine a compiler vendor implementing in the same way. You know, if you encounter this particular file with the same state ahead of time, then load some incremental slab that you've pre-done instead of reparsing everything again. But yeah, maybe it's because it's only a subset of the information that you're interested in, the types and the inferences and things like that.

Starting point is 00:41:06 But yeah, it would be lovely if we could just get all the benefits of modules without actually having to write modules or understand how they work. That's just me being lazy. Yeah, well, it's actually one of the things we're doing under the hood. It's sort of an experimental project. What we're trying to do,

Starting point is 00:41:28 like for our clangd and Clang backend, we're trying to do automatic modularization on the fly. So we have this thing now, it's called the clangd indexer. So upstream clangd has some index infrastructure. You can just run it and get a list of references and symbol uses, stuff like that.

Starting point is 00:41:56 And yeah, it's pretty powerful, except it works at the speed of the Clang parser, which is pretty fast, but we are not always satisfied with that for IDE tasks. So what we are trying to do, and it's an ongoing experimental project, is to automatically convert headers into modules if we think that they're suitable for that. Wow. And then keep some set of hot and often-used headers as modules somewhere on disk or in memory, and then reuse them as we go.

Starting point is 00:42:35 For now, it's in our fork. It's not open sourced. It's not suitable to be used in an actual compiler because, well, in an actual compiler you care about correctness much more than in an IDE, because in an IDE you can get an incorrect red squiggle

Starting point is 00:42:50 and in an actual compiler, you can just get the wrong behavior of the program. But eventually, someday, I dream of a world where we can do it automatically even in an actual compiler. That sounds very promising to me, for sure.

Starting point is 00:43:11 So another thing that occurred to me is that I assume I can throw pretty much any C++-ish code into CLion. So I use GCC at work, and we'll use every extension we can possibly find. We'll use GNU annotations, and we'll use inline assembly blocks. We'll use anything that's available to us we'll start using because we know which compiler we're using. But obviously that's not supported by every compiler.

Starting point is 00:43:38 Sometimes the things that I'm doing won't even work in Clang, which makes me wonder how on earth you deal with the fact that you're using ClangD, which doesn't necessarily support everything that GCC does. And also there may be other vendor extensions out there that other people might be using in their compiler and they can't use CLion if it can't understand their code. So how do you decide what goes into your sort of superset of all C++ features? Or is there a process for this? Or how does that work? Yeah, well, nowadays we're kind of using whatever Clang has.

Starting point is 00:44:11 And with a bit of fiddling with compiler flags and stuff like that, and with some Clang parser modes, you can get a lot of different behavior and simulate various other compilers. We obviously follow our users here, and we see what they need, what they request, and what they say doesn't work. We have some minor tweaks to the Clang frontend to match some commonly seen extensions. Right. Generally, I believe that Clang has, for recent years,

Starting point is 00:44:48 Clang has caught up with most of the major compilers. And we don't see compatibility with existing non-standard extensions or older non-standard extensions as a major pain point for users. Maybe these users just don't use CLion, but generally we don't see it as a huge problem. Right. But I mean, you target embedded, and they tend to have a few magical keywords, but I suppose most of those you can work around or are already

Starting point is 00:45:18 supported by Clang, because mostly Clang supports everything. So that's interesting. But obviously you've got multiple parsers to keep up to date too, so I could imagine it being painful. And is that the same, for example, for new language features? I had a few times the situation where I was using something that wasn't even in the latest standard yet, like deducing this or some new syntax, and then CLion was giving me red squiggles, obviously. Is that basically because Clang doesn't support it yet? Yeah, well, for new features, it's basically the same thing. So we're working on a pretty fresh Clang. We're kind of merging every two weeks,

Starting point is 00:45:59 and usually we're releasing a version of clangd which is ahead of the latest official Clang release. But lately it's actually become a somewhat concerning issue, because what I feel is that Clang development has stalled a bit. Well, it's not completely dead, obviously, and everyone's relying on it, but the speed at which it gets new features is not like it was before. Before, all the new standard features were implemented there first

Starting point is 00:46:35 and then came to other compilers, and I believe this is not always the case recently. Yeah, I noticed that too. Nowadays the Microsoft compiler is often actually the first one to implement something, like deducing this, for example, whereas a few years ago they would be the last compiler to get the latest syntax or whatever. And now they tend to be first. So that's quite interesting. Well, it's easier for us in our in-house parsers because, well, we have way more expertise there compared to the Clang one. And it's also kind of complicated to contribute new features to Clang when we have absolutely no expertise in, and no need to implement, backend support for them.

Starting point is 00:47:21 Probably you can't just contribute a frontend thing that doesn't work to the Clang frontend upstream. So there was this thing once, what we did is we merged the concepts branch into our clangd support before it was merged to upstream clangd.

Starting point is 00:47:38 So we tried that once, but yeah, it was kind of painful. We probably won't do it again. And yeah, all of that kind of reinforced our thinking that we shouldn't drop our in-house parsers for now, and should just work on them in parallel and see how it turns out. Because, well, especially now, we don't want to bet it all on Clang

Starting point is 00:48:06 and we want to provide the competition for that. So my day job relies on me having to access a remote computer to do most of the compilation work. I kind of work with a laptop and then everything else is remote. And I know that CLion supports some remote execution stuff, but that's a bit fussy for me. One thing I've noticed is that there's a new product line coming, Fleet, which seems to be like a completely redesigned experience where it's very much a client-server thing. And sort of first running the main part of the IDE remotely but showing the UI locally is an option.

Starting point is 00:48:45 So that's very appealing to me, but it doesn't have C++ support yet. Is that coming soon? Or where are we on that? Well, actually it does have C++ support. Oh, then I missed it. Darn. I need to go and check that out.

Starting point is 00:48:59 It's kind of bare bones. Yeah, so it's based exclusively on clangd. We are still using our clangd fork and we're still using this indexer optimization there, but it's not using CLion's in-house stuff. So it's a bit bare bones, a bit basic at the moment. But is this sort of the direction that JetBrains is going with their IDEs? Is this it? Or, you know, I mean, I can't imagine CLion is going to go away anytime soon,

Starting point is 00:49:29 but in the five-, ten-year time frame, is this sort of the way forward? Well, I don't think that CLion will go away even in five or ten years' time. And, well, Fleet has a different set of priorities and a different set of trade-offs. So it has this multi-process, remote-by-default architecture, with multiple workspaces, and multiple code analysis engines can connect to it, and all the remote handling and stuff like that.

Starting point is 00:50:00 And, well, it kind of offers this lightweight feel that CLion sometimes lacks. But yeah, we'll see how it goes. So for now CLion is alive and well, and we are not planning to switch to Fleet completely. What we're actually planning to do is to have great remote development capabilities in CLion as well. So we're also working on that. And Fleet is just probably targeted at a different user base and different use cases. Got it. Right, so we talked a lot about parsing and other things.

Starting point is 00:50:35 Is there any other really cool tech under the hood in CLion that you want to talk about? Yeah, well, one thing I want to mention is that CLion actually has a pretty powerful and kind of state-of-the-art data flow analysis engine. So it sees where the variables are defined, and then what values they can get, and how they pass between functions and stuff like that.

Starting point is 00:50:58 So there are a bunch of powerful analyses built on top of that in CLion, starting from just offering to simplify logical expressions if CLion knows that they're true or false. But it also goes beyond that and checks some nullability stuff, whether a local reference is leaked outside of a function, or maybe it can even check some lifetime issues beyond references. But yeah, it's actually a work in progress and it requires

Starting point is 00:51:32 probably better support from libraries and the standard library, because for the standard library we can hardcode all the lifetime interactions between types and objects, but for third-party libraries it's not possible. So we could really use a standardized and agreed-upon set of attributes or something like that about what's an owner, what's a pointer, and stuff like that. There are some examples of that in, what's it called, the GSL library, the support library for the Core Guidelines,

Starting point is 00:52:07 but I'm not sure it's widely used yet in these foundations. But yeah, it's a really cool piece of computer science inside CLion, and it has a solver built on a kind of binary decision tree called a binary decision diagram. That's super cool. I didn't realize that that wasn't something like Clang-Tidy running. I've seen those squiggly things in my IDE where it says, you know, this is always true or, you know, this can't be reached.
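For readers who haven't seen these diagnostics, here is a small made-up example of the kind of code such a data flow analysis can reason about. The function `classify` is hypothetical, and whether a given tool reports exactly these findings is an assumption; the point is only that the inner condition is implied by the outer one.

```cpp
#include <cassert>

// Hypothetical example: inside the outer branch x is known to be
// greater than 10, so the inner test can never be false.
int classify(int x) {
    if (x > 10) {
        if (x > 5) {   // always true on this path
            return 2;
        }
        return 1;      // can never be reached
    }
    return 0;
}
```

A compiler is free to accept this silently, since the program is well-formed. Tracking the possible values of `x` along each path is what lets an analyzer say that the condition is constant and the `return 1` is dead code.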

Starting point is 00:52:30 This is unreachable code. And I didn't realize that was something that was unique to the IDE itself. That's impressive. But, to ask the question: this is great in my IDE, but if it's just a set of annotations or squigglies in my IDE, it's no use to my other developers who sadly, despite my many controlling ways, have never picked up CLion. So we have a team that's mixed.

Starting point is 00:52:58 So is there ever an idea that we might be able to run this in CI somehow and get the analysis report back out in an automated way? If it just lives as a squiggly in my IDE, it's not as useful to me as it is as a standalone tool. Yeah, there is this somewhat new project at JetBrains. It's called Qodana. I think it's now going out of beta, and, well, it's basically a tool to run various analyses,

Starting point is 00:53:26 including JetBrains' in-house semantic analysis, on CI. And it has a bunch of features around reports and inspection results and showing them nicely, like having a baseline and just showing you newly failed inspections and stuff like that. Sadly, there is no properly supported C++ there yet, but we're working on that. I hope it will appear in some

Starting point is 00:53:54 foreseeable future. So I hope that would be the way to use these CLion built-in checks, some of which are pretty cool and pretty powerful, on CI. Like, for example, CLion also implements

Starting point is 00:54:11 a bunch of checks for automotive, like a subset of MISRA checks. So it's also something that obviously everyone can benefit from running in CI. So please stay tuned. It will appear at some point. All right.

Starting point is 00:54:26 Well, that's really cool. Yeah, I used data flow analysis quite a few times. For example, recently I did this C++ and safety talk at C++ on Sea and CppNorth, and I was playing around with examples that would be UB. And there were quite a lot of cases

Starting point is 00:54:43 where CLion said, hey, you know, this iterator might be invalidated or, you know, this is an out-of-bounds access and, you know, things like that. Yeah, I found that was very, very impressive. But I was also really curious, like, okay, is that Clang now? Is that your own thing? Yeah, maybe we are not doing a really great job of promoting that. Like showing it's actually our shiny thing.

Starting point is 00:55:03 So it kind of goes along with all the inspections we have from Clang and clangd and other stuff. And Clang-Tidy, you have Clang-Tidy stuff as well, right? Yeah, yeah. All right. So actually, beyond CLion or the world of IDEs, is there anything else going on in the world of C++ now that you think is really exciting or interesting and you're keeping an eye on? Yeah, well, I'm kind of looking closely at modules progress because, well, it's somewhat important for the IDE. I won't say it will solve all of our problems

Starting point is 00:55:37 because, well, if half of the people are using modules, it would be good for that half of the people, but we still need to support the other half which doesn't use modules. And this actually goes for all the C++ features and things from new standards and improvements. So if someone is using a new improvement, we need to support both this improvement

Starting point is 00:56:00 and what was before that. But yeah, I'm looking closely at modules and I'm kind of disappointed with how progress is going in the actual compilers, because I think the only thing that has production-ready support for C++20 modules is Visual Studio, and the rest of the compilers are lagging behind. So there is no cross-platform universal support. But yeah, actually, when everything is implemented, well, it's obviously very complicated

Starting point is 00:56:32 as anything else in C++, but the compiling pipeline and how things are built would make, I believe, way more sense than it does now when you have to have a lot of code in your header files. Yeah, so we're kind of nearing the end of our episode here. But before we wrap up,

Starting point is 00:56:55 is there anything else that you want to tell us? Anything else you want to mention? Or maybe you can tell people where they can reach you if they want to get in touch. If you want to reach me about some things about CLion, probably the best thing to do is you can either

Starting point is 00:57:12 write to CLion support if it's some specific question, or we have the CLion IDE account on Twitter. You can write to it, and that's probably the best way to communicate. If you want to reach me personally,

Starting point is 00:57:27 I think there are some contact details on the CppCast page, and feel free to reach me as well. All right. Well, thank you so much for coming on the show and talking to us about CLion and how it works under the hood. It was a really fascinating discussion, I think.

Starting point is 00:57:42 Yeah, thank you very much, Dmitry. Thank you for inviting me. Thank you. All right. So thank you again, Dmitry, for coming to the show. And thank you, Matt, for being my co-host for today. So thank you so much, both of you. Thanks so much for listening in as we chat about C++.

Starting point is 00:57:56 We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in. Or if you have a suggestion for a guest or topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate it if you can follow CppCast on Twitter or Mastodon. You can also follow me and Phil individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com. The theme music for this episode was provided by podcastthemes.com.
