CppCast - LFortran

Episode Date: July 1, 2021

Rob and Jason are joined by Ondřej Čertík from Los Alamos National Laboratory. They first talk about ISO papers and GitHub's Copilot AI programmer. Then they talk to Ondřej about LFortran, a modern LLVM-based Fortran compiler that can compile Fortran code into C++. News: June 2021 ISO Mailing, C++ Library Include Times, GitHub Copilot, CppCon Field Trip. Links: LFortran, Fortran, Fortran Package Manager. Sponsors: PVS-Studio. Write #cppcast in the message field on the download page and get one month license. Date Processing Attracts Bugs or 77 Defects in Qt 6; COVID-19 Research and Uninitialized Variables.

Transcript
Starting point is 00:00:00 Episode 306 of CppCast with guest Ondřej Čertík, recorded July 1st, 2021. Sponsor of this episode of CppCast is the PVS-Studio team. The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool. In this episode, we talk about ISO papers and GitHub Copilot. Then we talk to Ondřej Čertík from Los Alamos National Lab. Ondřej talks to us about the LFortran compiler and the Fortran language. Welcome to episode 306 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm okay, Rob. How are you doing?
Starting point is 00:01:22 Doing all right. Any news from you? Anything you want to share? Well, I guess I could share that it looks like traveling is starting to return. Everything seems to be going back to normal. Yeah, that's the thing. Hopefully this Delta variant nonsense won't shut things back down again. But yeah, from everything that I've read and seen, it seems for people who are vaccinated, it's no more risky than any of the other variants. It's just traveling very quickly through the people who aren't vaccinated right now. Right. And so if you aren't vaccinated, consider getting vaccinated. If you're not vaccinated, definitely go get one. I've been doing fine with it. I haven't gotten any
Starting point is 00:02:00 secret messages from Bill Gates or anything like that. There have been zero zombie outbreaks so far. Okay, well, at the top of every episode I like to read a piece of feedback. We got a tweet from Pratik Anand saying, late to the party but really enjoyed the conversation
Starting point is 00:02:19 on CppCast on Rigel Engine. I've contributed to the project too so even more happy for it to see some more recognition. It's an important step in video game preservation IMO. That was the Duke Nukem engine, right? Yeah. But we've done a couple episodes on kind of that topic lately, preserving old video games.
Starting point is 00:02:40 Yeah, I may or may not have actually sought out people who would be good interviews for that yeah well it's been fun okay uh we'd love to hear your thoughts about the show you can always reach out to us on facebook twitter or email us at feedback at cppcast.com and don't forget to leave us a review on itunes or subscribe on youtube joining us today is andre certic andre is a scientist at Los Alamos National Laboratory, originally from the Czech Republic. His background is computational physics and high performance computing. In addition, he has been actively involved in open source. He is the
Starting point is 00:03:15 original author of SymPy, SymEngine, L4Tran, and a co-founder of the 4Tran Lang organization. His current mission is to rejuvenate 4Tran, a language for high performance numerical computing. He likes and uses C++ as a great tool that allows him to deliver robust, very fast libraries and applications, including SimEngine and the L Fortran compiler. Andre, welcome to the show. Thank you for having me. I'm curious as we get into this interview, because I know the National Lab sponsor a lot of open source work. Is the work that you work on at the National Lab sponsor a lot of open source work. Is the work that you work on at the National Lab also open source? Is it related to these projects too?
Starting point is 00:03:52 Some parts. I'm funded a little bit on the L4Tran compiler. In fact, I'm funded on L4, sorry, on Fortran to C++ translation because as you probably know, a lot of people are moving away from Fortran to C++. And so I'm happy to help with that also. We'll get to that later. Awesome. Okay, thanks. Sounds good. Okay, so, Andre, we just have a couple news articles to discuss.
Starting point is 00:04:13 Feel free to comment on these, and we'll start talking more about Fortran and maybe SymPy and all that, okay? Awesome. Okay, first one we have is the June 2021 ISO mailing. Lots of new papers, as always. I didn't have a chance to go over any of these in too much detail. Jason, I did see there was one for constexpr for cmath and cstodlib. So it looks like the constexprizing of the standard library is continuing.
Starting point is 00:04:43 So that's always good. That one has been a long time coming because they have problems with... It's revision 8 of that paper. Yeah. Anything either of you wanted to call out with these papers?
Starting point is 00:04:57 Yeah, I like the stack trace from exception paper. It's the P2370R0. The idea, I think, if I understand correctly, that the language itself should somehow help you get the stack trace when you get an exception. That could be handy. Very nice for debugging. I do that by hand in every C++ project that I have. I get nice exceptions. Yeah, very nice. And it looks like there's movement on
Starting point is 00:05:24 executors of some sort standard execution um first revision of this paper i didn't see that one for managing asynchronous execution on a generic execution context so it looks like there might be maybe we'll get that into c++ 23 that would certainly be nice and i know a lot of other things are waiting for that to come in, right? Yeah. Yeah. All right. Next thing we have is a GitHub repository, which is really just a collection of the this graph, and I guess, maybe some code that they use to run it. C++ library include times. So it's a nice graph of a lot of the standard library headers and also some popular open source library headers
Starting point is 00:06:10 and how long it takes to include them if you are using these headers. And yeah, I guess maybe some of these were maybe a little bit surprising, like file system takes a huge amount of time to include maybe that's not surprising i don't know right now the only one that's surprising to me is that regex isn't the one that's at the end of it yeah that's true regex is actually very small they have a kernel library it takes 1.1 seconds and i tested it on my mac and it takes 1.1 seconds. And I tested it on my Mac, and it takes 6 milliseconds with Clang on my Mac.
Starting point is 00:06:46 So I'm a little bit surprised it takes 1.1 seconds on Windows. Yeah, this was all done using Visual Studio, I believe. So it would be nice to see if someone did this with GCC and Clang so you could compare these different popular libraries on the different platforms. Right. Yeah, another one that stands out here that I find a little disappointing is Speedlog, which I do like using.
Starting point is 00:07:11 And it's actually the highest one on this entire chart. Yeah, and I was a little surprised that that was the highest one and not like Windows 8 or something like that, that you would kind of think would be one of the worst offenders. Yeah. And most of these things, you know, like if you really care, you can isolate their usage. But something like speed log, you really can't.
Starting point is 00:07:32 I mean, realistically, if you want to use the same logger in all of your CDP files. Hmm. I have to give that a look. Okay. And then we also have, this is not necessarily directly c++ related but uh i see a lot of programmers on twitter talking about this over the past week uh github co-pilot got announced and this is uh an ai you know programming assistant it's available as a visual studio
Starting point is 00:08:01 code extension and it's you know powered by all the repositories on GitHub. Looks pretty fancy. I'm not sure if it's directly related to the Visual Studio AI tool that I know has been Visual Studio for at least the last release or two. Yeah. Last year or two. I've turned on the Visual Studio one and then not necessarily noticed a difference in the autocomplete, so I don't know if I was just using it wrong or if it didn't see something. This one looks a little different in that you can write your function name and it suggests
Starting point is 00:08:34 an entire block of code for you, which is maybe a little scary. Well, the question is, once this gets good enough, can I start training AIs instead of humans? Yeah, maybe. What do you think about this one, Andre? Yeah, I just think it's very interesting. I wonder how they do it, if they have an abstract syntax tree for every language, and how much semantics they encode, or if this is all machine learning. It would be interesting to know the details.
Starting point is 00:09:03 Yeah. Okay, and then the last thing we have, Jason, this is the announcement for the CppCon field trip for this year? Yes, the field trip. Now, you organized it two years ago. Did you have any hand in this one? I did not have a... No, not directly.
Starting point is 00:09:22 When it was being planned for the 2020 conference, it was the same plan that we wanted to do for the, I say we loosely, the meetup wanted to do for 2020. And the organizer brought up this idea and we all collectively agreed that this is kind of like, this is Colorado, right? This is going to give you a taste of colorado because uh the field trip will be starting at the gaylord and then going up into the the mountains um i don't know what would you hear like nine ten thousand feet something like that uh from the five thousand feet that we are at down here so from like uh sixteen hundred meters to like twenty,000 meters-ish around in that ballpark. And doing a narrow gauge railway that will take you to one of the old mining camps.
Starting point is 00:10:13 And you can go in, and I've done this trip before myself. It's been a long time. But you can actually see what a working mine used to be like. I'm pretty sure that they're planning to actually go into the mine. And then when that's done, then stop in Idaho Springs for Bojo's Pizza. And I know some of our friends who like to drive up to C++ now make a point of actually stopping in Idaho Springs just for Bojo's Pizza on the way from the Denver airport to Aspen.
Starting point is 00:10:43 So it's, yeah, it's a Colorado institution. And if you care about beer, I don't know how much time that you'll have, but there's two breweries within walking distance of Bojo's where you could grab and go and take it back to the hotel with you. Nice, nice. All right. Well, Andre, we've talked about Fortran on the show plenty of times on CBBCast, but usually we're talking about it as if it's very much a thing of the past,
Starting point is 00:11:14 a somewhat dead language, but you're actively working on it. Do you want to start us off there? Sure. Well, it is an old language one of the if not the oldest high level language started at ibm i believe in early 1950s and you're right that a lot of people think that it's it is dying or even dead but um and i also i most you know most of you who studied any kind of physics or any kind of engineering degree you probably know that your advisor typically has some kind of old Fortran code around
Starting point is 00:11:51 and you're stuck with it you have to resurrect it and fix some bugs in it so that's most people's experience with Fortran and then of course I knew Python and I was mostly using Python and C++ but then
Starting point is 00:12:06 I came back to Fortran in around 2010 I was at Lawrence Livermore National Lab as an intern and I was optimized I had some Python code with some Fortran kernels for some electronic structure and we were optimizing it and eventually we rewrote it in Fortran and that's when I kind of learned Fortran, the modern incarnation of Fortran. And I realized that it's a really nice language. It feels like Python and NumPy. And so for numerical computational things, it's very handy. Everything is in the language.
Starting point is 00:12:43 And then the compilers can optimize it beautifully. And so that hooked me in. And then as time goes, 2015-16, usually what I see the direction of let's say I already was working at Los Alamos National Lab. We have a lot of Fortran codes, a of production for trump codes but i only see one direction typically uh people want to move away from fortran to mostly to c++ so yes uh a lot of and if you talk to a lot of people um it's very common um that's kind of the sentiment that
Starting point is 00:13:18 fortran is a dying language um and so that's how i joined i decided to uh fix that and make sure it's not a dying language anymore. And also I should say the term dying. Every time I say it, I get a pushback from the Fortran community. I always say it's not dying. It's never been dead. You know, you should not be using such words. But, you know, it's one way to characterize it. So I prefer to use the term rejuvenate. Anyway, but yes, it's definitely in the 2015, 16, 17 timeframe, it was not doing well at all. And that's when I decided to join. So I wanted to help fix it, because I like the language.
Starting point is 00:14:00 We can get back to it later why, but I wanted fix it and so I thought well how what can I do even so I figured well there was only one organized community around Fortran there was the Fortran standards committee so I looked it up online and there's not much information there didn't know anybody there but there was a mailing list so I decided to join it and then I asked them you know is there a meeting and there was a meeting in Vegas I believe so I you know booked my flight and came to the meeting and I I just didn't know what to expect I know today you know is it a meeting you know or how does that even you know how many people are there and so on turns out there were about 20 people 15 20 people and 15-20 people,
Starting point is 00:14:46 and they were indeed working on new features to Fortran. And so I asked them, what's your vision? I feel the situation of Fortran as a language is not very good. Do you have any plans to fix it? I see a lot of codes around me moving away from Fortran and I don't see any code moving to Fortran. Do you have any kind of ideas how to fix that? The feeling I got was that they did not. I asked, well, one of the arguments that I heard there was that at universities that you don't have Fortran classes anymore and so that they teach C++ or I would even argue Python Let's say Well, so I told them well, I have some ideas and so
Starting point is 00:15:34 It was I had a very early prototype of L4chan which is a compiler I've been working on The prototype by the way was still in Python. I like Python. So it was in Python. It was just a prototype showing that it can work interactively. And so this L4TRAN compiler, I'll talk more about it later, but I showed them the prototype in a Jupyter notebook, interactively executing Fortran cells. And I think they liked it. And I was trying to motivate them that I think this is how we can attract a lot of new users from the Python and MATLAB Julia community. I still think that. And I will say that I think now we are in 2018 or 2019.
Starting point is 00:16:17 Since then I've written L4TRAN to C++ and we can talk more about the details later. And also, as I mentioned, the Fortran standards committee felt kind of secluded. So unless you were already on it, not many people knew much about it. So the other thing I've done is I created this GitHub repository for people to submit proposals to the committee and didn't think of it much. I thought, well, let's at least try, see what happens. And it was tremendously successful. I announced it on Twitter, and we immediately got dozens and dozens of people just coming in and opening issues and saying, I would like to fix that and that.
Starting point is 00:16:59 And one of the more popular, I would say, proposals is to release Fortran standard every three years instead of five years. Another proposal is to put the standard itself on GitHub and use GitHub as opposed to these old papers with hand diff, essentially, to propose changes to it and so on. Anyway, it was very successful. And then from that initial online community, we started an effort that we now call Fortran Lang. We first started with writing a standard library for Fortran. So Fortran itself as a language has a lot of features, as well as intrinsic functions like sine, cosine,
Starting point is 00:17:41 Bessel functions and so on. But there's a lot of things that almost every Fortran programmer would like to have, and they are not part of the language. And so we decided, well, why don't we write a standard library, sort of in the scope of MATLAB or SciPy. So all the special functions, all math functions, as well as all kinds of algorithms like sorting, one example. Fortran doesn't have sorting in the language.
Starting point is 00:18:05 And so we did that and then one thing led to another. So we then created a website and a logo for Fortran. If you Google Fortran or Bing or DuckDuckGo, the first or second page is the Fortranlang.org website. So that's the Fortran website. So we created the website for Fortran. And then I guess the most exciting project there, besides Fortran, is Fortran Package Manager called FPM.
Starting point is 00:18:38 We can talk more about that. Oh, wow. So that's how it all started. We also have a discourse forum. Discourse, for those who don't know, it's an online forum. It's sort of like a mailing list, but it allows you to edit your post. And it's a very nice way to communicate with the wide community online. And so we have hundreds of users.
Starting point is 00:19:00 So I'm very excited about all of these developments. Especially that within one year, we launched a website about a year ago. Within one year, we got in Google even to the second page after Wikipedia for Fortran. So things are changing. Oh, and the other thing that happened within last year was there is this Tiobe index of the kind of rank languages. I'm not quite sure exactly how they do it,
Starting point is 00:19:26 but somehow, out of nowhere, Fortran jumped from, I don't even know, 50th place to the first page. It made it. Again, I think it must be the web page, I assume, but I'll take any good news. And so this is all very positive. That all happened within the last year, pretty much.
Starting point is 00:19:47 So if we can get a little bit of timing perspective here, because you were talking about modern Fortran. And I think there's a good chance I'm going to get this wrong, but I'm pretty sure my dad learned Fortran 60 when he was in university. So what is the modern version of Fortran? You said it has been updated. When was it last standard, last released?
Starting point is 00:20:11 Well, I'll start in the back. So, there's Fortran 4. I don't know if that was the very first standard that they standardized, because they had multiple compilers and so on. After that, it was Fortran 66, then Fortran 77. After that, the next big it was Fortran 66, then Fortran 77. After that, the next
Starting point is 00:20:28 big revision was Fortran 90, then 95, 2003, 2008, and 2018. Okay. And in terms of kind of modern, what most people consider modern, so F77 or
Starting point is 00:20:44 Fortran 77, back then they used punch cards. And so card you know i had to you know i was born in 1983 so i've never used punch cards i had to look it up on youtube exactly how that works but i know some older folks might be thinking oh how is it that you don't know how it works but it's just a card and then it has the first i believe six columns are used as control characters and so when they converted those programs on punch cards into files uh you end up with a format that's called fixed form so the first six columns i believe or seven are control characters and so you cannot use them to write code you have to you have to put spaces there
Starting point is 00:21:21 or you can you can use them as comments or labels. And so that's called fixed form. It's very hard to program it because you have to put six spaces all the time and it's very particular. That was Fortran 77. Fortran 90 introduced what's called free form, which looks like Python. Essentially, you just write your code like any modern language. So that's Fortran 90. And the other thing that happened, Fortran 90 introduced modules with dependencies
Starting point is 00:21:54 and all that stuff, derived types. So that is like a C struct. And it also introduced some improvements to the arrays. You can allocate them at runtime. For Tron 77, I believe you have to allocate the array in the main program. You can, however, pass it to subroutines as unknown length.
Starting point is 00:22:14 That all works. So it's really cool when you think about it. It's for Tron 77. It's very old, and you write a subroutine just like you would today. So you accept the length of all the arrays you have, and you have multi-dimensional arrays, and this all works nicely. But Fortran 90 adds, you can actually allocate it at runtime at any time you want. So that's Fortran 90. And then since then, most of the additions are relatively small.
Starting point is 00:22:37 So Fortran 2008 adds objects and methods, stuff like that. And then the rest of the changes are just kind of fixes here and there. Things like F77 uses, I believe, slash and parenthesis as, you know, when you want to write an array, as an array delimiter. So for term 2003 or 2008, I can't remember, introduces square brackets, you know, to make it like Python, stuff like that. They also added like a parallel loop. It's called doConcurrent.
Starting point is 00:23:08 I believe that does it eight or so. They also added parallel arrays. They are called co-arrays into the language itself. Before that, well, it's not used too much. We can get to later why, but the idea is to put parallel features into the language itself and let the compilers handle that. So the co-arrays are
Starting point is 00:23:29 sort of like MPI, like one-sided MPI, if people are familiar with that. Yeah, I was going to say MPI is one of the parallel things that's supported on Fortran, right? Yes, yes it is. In fact, most codes, all the codes, I would say, use MPI for historical reasons. But co-arrays add that feature directly into the language,
Starting point is 00:23:47 and the syntax is beautiful. You just have arrays, and you just have, in brackets, you just put which MPI or which co-array, which rank you want to access, and the compiler handles and knows when the data is available and when it's needed, so it will start sending the data as soon as it has it. Interesting.
Starting point is 00:24:10 And so it's like one-sided MPI, but you don't have to worry about it. So it's very nice. The idea is very nice. In practice, the reason people don't use it too much is because, well, they already have the code using MPI, so you don't want to rewrite it. And for U-code, the compiler support wasn't great until just very recently. So then L-Portran, your compiler, which we'll, I guess, get to more in a minute parser supports full for 2018 If you know I encourage people to try it and report any bug that you have to find in the parser itself
Starting point is 00:24:51 You can use L for trans space FMT like format. It will format your code. It will for now it will skip comments and empty lines We are on there. We are still working on but everything else every single thing should work So it parses to AST, abstracts in Testtree and back to source code. The semantics we are now working on. So the current status is we are trying to identify a proxy app, it's called Snap, it's a particle transport code from Los Alamos actually but not written by me i know the people who wrote it but i have nothing to do with that and it's 495 and we are about half the way to be able to compile it we'll have the modules compile uh but but we're still fixing
Starting point is 00:25:35 things to actually compile that and the time frame we are really close i'm hoping within months you know and the end of summer i'm really hoping we are able to compile it. And at that point, we'll release MVP, Minimal Viable Product. We'll ask people to test it out and start using it. So that would be roughly Fortran 95 level. So the parser is full 2018, but the actual
Starting point is 00:25:57 compilation, you know, semantics and LLVM code and all that stuff, it will be roughly Fortran 95, which turns out to be really actually large. Most of the hard work is in there. You know, after this works, what will remain is just the objects and just kind of a runtime library,
Starting point is 00:26:14 all these functions, all kinds of condor cases and so on. But the hardest part will be behind us. Sponsor of this episode is the PVS Studio team. The team develops the PVS Studio Static Code Analyzer. The tool detects errors in C, C++, C Sharp, and Java code. When you use the analyzer regularly, you can spot and fix many errors right after you write new code. The analyzer does the tedious work of sifting through the boring parts of code.
Starting point is 00:26:39 It never gets tired of looking for typos. The analyzer makes code reviews more productive by freeing up your team's resources. Now you have time to focus on what's important, algorithms and high-level errors. Check out the team's recent article, Date Processing Attracts Bugs or 77 Defects in Qt 6, to see what the analyzer can do. The link is in the podcast description. We've also added a link there to a funny blog post, COVID-19 Research and uninitialized variable, though you'll have to decide by yourself whether it's funny or sad. Remember that you can extend the PVS Studio
Starting point is 00:27:09 trial period from one week to one month. Just use the CppCast hashtag when you're requesting your license. But before we talk more about like L4Trans and these other projects, you know, maybe we should just back up for a moment and go over how Fortran compares to other languages like C++, and why do you feel like it deserves all this attention, and why are you against people converting old Fortran code into C++,
Starting point is 00:27:41 which you said you're trying to get them to stop doing? Or you're encouraged. L4Trans has a C++, which you said you're trying to get them to stop doing. Or you're encouraged. Fortran has the C++ backend, so it translates Fortran into C++. Oh, it translates it into C++. So if people want to move away and just develop in C++, they will be able to use
Starting point is 00:27:57 Fortran to do that. So, this is also cool. But, in fact, I think, I'm hoping a lot of people will decide, well, if I'm not logged into Fortran, and if I can always translate to C++, maybe I will stay in Fortran. But, anyway, so why Fortran? So, the main, I would say, three motivations for Fortran is to essentially enable scientists and physicists and domain experts, engineers, to write domain-specific code,
Starting point is 00:28:27 essentially write numerical code, and be able to maintain it themselves. And I would say there are three kind of pillars. One is the basics mathematics is in the language itself. So Fortran has this pronunciation, has complex numbers, has all kinds of special functions. You know, F77 already had all that. It is more restrictive, I would say, and higher level than C++. So it's, for example, it has the multi-dimensional arrays, but in the language. It has pointers, but you cannot just point to to anything you have to declare what you point to as target so things like that. It's much more restricting. F77 if anybody plays with it, it's very restricting. It feels oh it's just so hard to do anything. It's very
Starting point is 00:29:18 restricting but the advantage of that is that it's very simple. There's not much to it. F77 is just a bunch of subroutines and functions and arrays and loops. And also that kind of design allows the compilers historically to optimize it really well, which means you get very good speed. So those are the three kind of advantages. I would say, if I can expand on that, historically the mission for Fortran from the very beginning was to allow scientists, engineers and domain experts to write programs that naturally express the mathematics and algorithms employed, are portable across HPC, so high performance computing systems, remain viable over decades of use and extract a high percentage of performance from the underlying hardware. So Fortran as a language it feels
Starting point is 00:30:11 high-level it doesn't have things like you cannot inline assembly for example the language doesn't allow you to do that. The language itself doesn't have a memory model. In other words, in C and C++ you can go into the bits and how floating point, you can rely on how floating point is represented and stuff like that. In Fortran, you can operate on floating point in a more abstract manner. And the reason is historical. Fortran ran on all kinds of machines that did not have IEEE, you know, floating point,
Starting point is 00:30:49 all kinds of weird, if you look up on, you know, all the high-performance competing systems in the past, they have all kinds of architectures. And yet the Fortran compilers were able to take the program and just compile it to run on the machine.
Starting point is 00:31:04 So that's the motivation. So then with that in mind, do you support or recommend or what would the process be like if your engineers, scientists, wanted to just keep code in Fortran that's part of a larger C++ application? And it sounds like you can just emit the C++
Starting point is 00:31:23 and potentially just link it together. Eventually. So I think there are multiple approaches. The basic approach, so I would say languages in modern era, they have to interoperate. So you need to be able to have a Fortran library and just call it C++ from C++
Starting point is 00:31:39 or from Python, just seamlessly. To support that, Fortran, to have some prototypes kind of showing that it works, it should provide the wrappers automatically for you. It's a compiler. It knows all the types, knows everything.
Starting point is 00:31:53 It should just allow you to use it from C++ and Python or Julia, MATLAB, whatever you name it, it should just work. So for example, from Python, there should be a library where you can just use, like import a module that happens to be a Fortran module,
Starting point is 00:32:08 and the compiler behind the scenes should just wrap it and make it available for you. Or you can use it to generate the wrappers as files. So traditionally, the way you wrap Fortran from other languages is you have to write, it's called ISOC binding. It's a special module in fortran which allows you to interface c either calling it or exposed to c and then from c plus plus you have to or python you have to then call this as a c library but then typically fortran has arrays so in python for example you don't want just a memory you want a numpy array that should be mapped to the same memory as in fortran. So there's quite a bit of
Starting point is 00:32:45 technical things involved and so there are tools that allow you to do that. F2Py is for Python. For C++, I don't know actually if there's a tool that... well, one issue with C++, there are a lot of C++ library that can handle arrays and so for each, you know, you have to have some kind of, you know, wrapping code. All of this should be in my opinion handled or at least helped by the compiler so that as a user you can just go to c++ and just start using it so that's one one answer um the if you want to go away from fortran i also think the compiler should allow you to just translate all your phone track on to c. Technically, so it seems to be working. What we have to do just,
Starting point is 00:33:26 what we are doing now is just kind of finish all the semantics and actually make it work for Fortran 95. But technologically, it seems to be working. Go ahead.
Starting point is 00:33:35 The code that's generated... code generated by tools is often not very maintainable. What is the code like that's generated from your tool? Yeah. So at LANL, Los Alamos National Lab, most people like the
Starting point is 00:33:51 Kokkos C++ library for arrays. So right now we just target Kokkos, but it's not tied to it. We can change that also, but later. So array expressions are transformed into Kokkos array expressions, and loops are simply transformed to use Kokkos as the array implementation in C++. It looks... We tried to make it as readable as we can. I think it can be done so that it's readable. So for F77, there is a tool called F2C,
Starting point is 00:34:27 and it translates F77 to C. That's an old tool, and its output is not that readable. I think it can be done much better. The subset that we can translate so far is very readable. Essentially, when you think about it, the compiler knows exactly what you are doing in Fortran, because it knows how to translate it to
Starting point is 00:34:49 LLVM for example or machine code. So it knows exactly the semantics, it knows that you have an array, it knows that you have a loop or a subroutine or a function, it knows exactly the type. So to make it readable in C++ all that's needed is to decide how you want to represent things in C++. And the huge advantage of C++ in this case is that because it's such a bigger language, there's multiple ways you can represent the Fortran things. And the Fortran semantics is very simple. Even the latest incarnation of Fortran is still very simple. The types map nicely to C++ types and then some of the
Starting point is 00:35:26 corner cases can be handled by the compiler, and the arrays, for example. So the Kokkos arrays pretty much have all the operations, and more than what the Fortran arrays allow you to do. The main thing they don't have is that in Fortran you can have array operations: you can slice, or you can operate on arrays as a whole. So those have to be rewritten as for loops. But again, the compiler has to do that anyway to emit machine code. So the compiler has all the technology to do that. And it's just about the C++ backend.
Starting point is 00:36:02 We are trying to write it in a way so that it's readable, so that people actually like what they get. So aside from the ability of LFortran to generate C++ code from your Fortran code, what else sets it apart from... Are there other Fortran compilers, or is there just one other one? What else sets this one apart? So there are about 12 Fortran compilers. But if you go to fortran-lang.org,
Starting point is 00:36:28 there is a section of compilers, and we list all of them and links to them. So in terms of open source compilers, there is GFortran, part of GCC. There is Flang, part of LLVM, and there is LFortran, which are the three main ones. There were a few more in the past, but they are not actively developed anymore.
Starting point is 00:36:46 Does that mean Intel's is no longer maintained? So those are open source. And then you have commercial. Oh, sorry. Okay. So there's Intel as the main, I would say, historically at least, that's the main kind of compiler in terms of delivering optimized code.
Starting point is 00:37:00 But there's the NAG compiler, there's the Cray compiler, AMD, IBM. All these companies typically have their own compiler. But I would say Intel and NAG are the two main ones that we use often, at least. So then how is LFortran different? It's interactive. So you can use it to compile to binaries like any other compiler, but you can also use it interactively, so it feels like Python or Julia. You can launch a REPL from the command line and it looks just like Python.
Starting point is 00:37:33 You just do integer i, and i equals 5, and you can start typing Fortran commands, and each command gets compiled to LLVM and machine code, loaded into memory, and executed. The way it works technically is just like Julia works. In fact, Julia was a huge inspiration for me to start LFortran. I kind of investigated how they do this and realized, oh, I think this can be done for Fortran also. And so that's one big difference. And the other difference is that it has multiple backends. And I would say the design of LFortran is something a little bit unique; I'll just quickly describe it. So there is a parser, the parser parses to an abstract syntax tree, AST, and then the AST gets transformed. So then all the semantics gets checked,
Starting point is 00:38:23 and then it gets transformed to a representation that we call the abstract semantic representation, ASR. Again, it's a standalone representation that represents just the semantics, so it has a symbol table and things like that, and you can print it to the screen, you can give it back to the user so they can see exactly what the compiler sees; it's all exposed anyway. And then all the backends just take this ASR and do something. So the LLVM backend generates LLVM code, the C++ backend generates C++ code, the Python wrapper backends that we will write will generate Python wrappers, or whatever. And then we also have an x86 direct machine code generation backend
Starting point is 00:39:03 that generates machine code very quickly. It's more of a prototype, just to see how fast it can be. It's very fast. It's about... well, on the artificial benchmark I tried, it's about 20 times faster than LLVM to compile. Oh, wow. To compile, okay. To compile, that's the key.
Starting point is 00:39:19 So that back-end will be used for development when you want to compile your stuff very quickly. Right. Yeah, because I'm... Sorry. Implementing all of the same optimizations that LLVM does would be very difficult. I'm not going to even attempt that. LLVM is so...
Starting point is 00:39:35 I could talk about it for hours, but LLVM is awesome. It's absolutely amazing. It has very little information, it's a low-level IR, and yet what it can optimize, what it can do, is just amazing. So we have these tests in Fortran that I test the compiler with, things like for loops, and it does
Starting point is 00:39:53 some calculation on an integer, let's say. Then at the end I have: if that integer is equal to 55, then exit with zero, otherwise exit with an error. And then when I run it through LLVM... so LLVM just sees a bunch of jumps, it doesn't even have for loops, and yet it can optimize everything out, and so the whole test is just a return zero. It's just amazing, absolutely amazing. So it's great for these kinds of low-level optimizations.
Starting point is 00:40:31 Where LLVM is not that great is if you want to optimize arrays at a higher level. Well, it's still amazing what it can do, even just not knowing anything; it can unroll things, it's amazing. But for Fortran, it's not good enough. For Fortran, what's really needed is to be able to optimize loops that operate on arrays at kind of a higher level. There are multiple different ways you can transform array operations into loops, and things like inlining and stuff like that. It's always better to do it at the higher level.
Starting point is 00:40:55 So ASR is the abstract semantic representation that LFortran uses before passing things to LLVM. It's a similar idea as MLIR. MLIR is a library that's part of LLVM now; it's built on top of LLVM. It allows you to represent arrays and for loops and if statements in the IR itself, precisely so that it can optimize them better. So it's a similar idea. Did you consider writing your Fortran parser and compiler in Fortran? Yes. But, yeah, I can maybe get back to why I chose C++.
Starting point is 00:41:35 I knew that I could deliver this in C++ and make it very fast. So one of the things I wanted, since I decided I'll do this, I wanted this to be the fastest compiler. I still want it. I think it will be. That's my goal. Fastest to compile. And to do that, everything has to be fast. So the internal representation, the AST and ASR, has to be as fast as possible.
Starting point is 00:41:56 And so to do that, I spent two months just benchmarking seven different ways you can represent a tree in C++: as a class, you know, with inheritance; as C structs with casting; and also how to visit the tree very efficiently. There are, again, so many different ways you can visit a tree in C++. You can have a visitor pattern, you can have just a switch; there are many different ways you can do it. And also with C structs, you can represent the AST as a union, or you can have each AST node as a struct of a different size. Also how you allocate the memory, and so on. I kind of reused my experience:
Starting point is 00:42:41 I've written this library called SymPy, which is a Python library for symbolic manipulation. It's very successful. It has a lot of users and contributors. It's Python. It's great. It's like Mathematica or Maple. It allows you to compute symbolic integrals and so on. But it's in Python.
Starting point is 00:42:56 It's very slow. So I decided, well, if I want to make it fast, how do I do that? And I tried many different approaches. I tried C. I tried Cython. Cython is this great tool that allows you to kind of speed up Python with C and so on. But eventually I realized the only tool that allows me to do that, to deliver, is C++. And you know, I don't want to spend too
Starting point is 00:43:19 much time on that, but essentially around 2015 we delivered a library called SymEngine, which is a C++ implementation of the core of SymPy. It's a tree in memory, and it's using reference counting as the memory management. We wrote our own reference counter, very optimized, and eventually we made it the fastest library for symbolic manipulation that we benchmarked. We benchmarked against Mathematica, Maple, Sage, SymPy of course, GiNaC. It seemed like it's the fastest. And then when I was writing the compiler, I realized a compiler is not really that different from SymEngine or SymPy. In fact, SymPy is a compiler.
Starting point is 00:44:01 I just never thought about it that way. But what SymPy does: it allows you to parse (it has all kinds of input parsers), it allows you to represent the symbolic expression in memory, allows you to apply operations on it in memory, and then it has code generation, including even LLVM. So it is a compiler. I just never thought about it that way. So using this experience from SymEngine, I realized, well, reference counting is one way, but I think there is a faster way to do that. And so after spending months benchmarking different approaches, I ended up using essentially just C structs, with a custom allocator, a linear allocator, so it just moves the pointer. And the way to visit it, the fastest
Starting point is 00:44:47 that I was able to get, is just a C switch that dispatches on the type; the type is an integer, and you cast to the struct. And then, because it's a lot of boilerplate code, I generate it. So I looked into how Python represents its abstract syntax tree, and they have this language called ASDL which represents every AST node. It's this nice kind of Haskell-style or ML-style language, and then they have a tool that can parse it, and in their case they generate C as the representation of the AST. So I took that and
Starting point is 00:45:23 I generate the very fast C++ implementation of this. And then, of course, to make it easier to use, we also generate a kind of CRTP pattern, which is very fast. It looks like classes, it looks like inheritance, but it's all at compile time. And I carefully benchmarked it to make sure there's no overhead, and there's no measurable overhead. So from the user's perspective (the user here being the compiler developer), it looks like a
Starting point is 00:45:50 class, pretty much, and you have a visit method and it gets called, and underneath is all this not really complicated but kind of tedious machinery that's generated automatically. So yeah, it's very fast; it can represent things fast. And then the parser is in Bison, and the tokenizer is in re2c. And so now the slowest part is LLVM. Even in debug mode it's just really slow. It's great for optimizing, but if you just want fast compilation... that's when I decided to see if we can generate machine code directly, and it's much, much faster.
Starting point is 00:46:28 So right now we are working on the LLVM backend because we want that. That's the most versatile and it will allow us to deliver, including some pretty good optimizations. But down the road, I would like to come back and see how with the direct machine code generation, it could make it very fast. And I think as a user, as a Fortran user now, I would love if the compiler can really generate the code 10 times faster than other compilers.
Starting point is 00:46:53 I think it would be really cool. So out of curiosity, did you benchmark LLVM to see where the slowdowns are, and see if you could contribute back to them to speed up your use cases, by any chance? I did not. I would be curious. I always assumed that it's just inevitable, that it's the way they represent the IR.
Starting point is 00:47:12 It's the fact that I have to even generate the IR. If I don't use LLVM, I generate the machine code just directly. I don't even do any assembly. I generate literally the machine code in memory right away. For all you know, there might just be a while loop in there that says: if it looks like we're generating Fortran, let's slow it down a little bit. That seems unlikely. Is the LFortran compiler considered ready
Starting point is 00:47:39 for production use, or is there still more work to be done before you recommend it for production? It's not ready for production. It's ready for testing. The parser, that should be complete; if there are any bugs, you know, we'll fix them. And then we have this proxy app, SNAP, that I mentioned. We are trying to compile this Fortran 95 code, and I'm hoping we'll be able to compile it in a matter of months, and we'll make a release once we can compile it. And then we'll be ready for first users. And you said at the intro,
Starting point is 00:48:11 you said your employer is at least partially funding these projects. Yes, so they fund me on Fortran to C++ translation, as a help to some of our internal teams that are moving away from Fortran. That backend should be ready also. The hardest part is not so much the backend; the hardest part is all the
Starting point is 00:48:32 semantics and all the modules and symbols, imports from modules, all this stuff. That's what we are working on right now. Once we can get all the semantics, the backend will be relatively quick to update, to get up to speed. Since you mentioned modules, and you talked about how Fortran has
Starting point is 00:48:47 modules in one of the newer versions, how does that compare to C++ modules? Are they comparable? I don't know. I haven't used C++ modules yet. No one has. A Fortran module is just a piece of code that has
Starting point is 00:49:03 subroutines, functions, variables, and then you write "use" and the name of the module, and it imports everything from the module, or you can just import one function. That's pretty much all there is to it. And so, how do the Fortran compilers handle it... so modules have dependencies, and CMake, for example, understands how these dependencies work, and it will call the Fortran compiler in the correct order. So then it compiles the modules,
Starting point is 00:49:31 you know, so it actually compiles them in order. And then each module gets compiled to a mod file; so, an object file and a mod file. And the mod file contains the compiler's internal representation of what symbols are in the module, so that when you compile the next module, it knows what to expect in the object file. We haven't really ever, I don't think, talked about this on the show, but CMake
Starting point is 00:49:52 has full Fortran support, right? Yes. Is CMake the standard, or de facto standard, for Fortran users as well? I would say so, yeah. Okay. That's what I would recommend, at least. Although we have a better solution, and that's the Fortran Package Manager, which is modeled on Cargo from Rust. And so it's a build system and it's a package manager
Starting point is 00:50:15 in one. And so then you don't need CMake. Essentially FPM, so it's called, the Fortran Package Manager. You can think of it as higher level: it has more information, all the information about your project, at a higher level than CMake. So we are planning to... so right now we just compile your code directly,
Starting point is 00:50:36 but we are planning to be able to generate a CMake project for your code. Some people might prefer that if you don't want to use FPM. The way Cargo works, if you're not familiar, is that it's opinionated. So it kind of assumes where things are on your disk, your files and so on. But if you follow the default layout, everything just works. It's great. And all the dependencies get compiled. It's a source package manager and build system in one. And so FPM is very similar. And so it can compile Fortran
Starting point is 00:51:15 files, but we are also planning to allow compiling C and C++ files, because a lot of projects use C, C++ and Fortran in one. So as long as you're willing to follow the layout, FPM can compile it for you also, or will be able to. That sounds pretty cool. So is FPM actually part of the Fortran 2018 standard? No.
Starting point is 00:51:40 FPM is part of the fortran-lang organization that we started about a year ago. And for me, this might be the most exciting project there, because it works today. You can use it today, still in, I would say, an alpha or beta kind of version, but
Starting point is 00:51:58 it works today. It works with any Fortran compiler, and you can use it to build. You can finally have dependencies, and finally create a Fortran package that others can use, which in the past was really, really hard. As I'm sure you know from C++,
Starting point is 00:52:15 similar problems; this is one way to fix them. So if you're listening to this, and maybe you have horror stories in your past from Fortran, or maybe you've never touched Fortran before (like I don't think I ever have), what would be your pitch to a C++ developer? Why should you look into Fortran?
Starting point is 00:52:34 If you are trying to solve some numerical or math application problem, and you like fast compilation and fast execution of your code, I think you should definitely look at it. If you like Python and NumPy and you enjoy using those, you will also like Fortran. It feels very similar; you just have to add types, pretty much, and the syntax is close. If you like Julia, give Fortran a shot also. If you like MATLAB... the way I pitch it to our postdocs is I ask them, what tool do you use
Starting point is 00:53:05 to prototype? They typically say, well, Python or MATLAB or Julia. And I ask them, when you want it to run fast for production, what do you use? And they say, well, C++ or Fortran. And then I say, well, wouldn't it be nice if you could use Fortran interactively from the beginning? Start in Fortran, let's say using LFortran, and develop your prototype. And then, because it's already in Fortran, you can just take it and put it in the production code, and it will also run fast. And I would say if your application is to write a compiler,
Starting point is 00:53:36 well, you know, as you see, I like Fortran, but I chose C++. So it's a great application for C++. Yeah, that sounds like, for numerical processing, you're saying this is where Fortran still has its niche, even after 65 years or whatever, right? Right. So, yeah.
Starting point is 00:53:57 Right. I would say the issue with Fortran is that the language is very nice, and still, I would say, has not realized its full potential. And the reason is that the tooling around Fortran and the compilers, I think, are a little bit lacking. And so that's what we are trying to fix. Awesome. Yeah, very cool. Anything else you want to tell our listeners about before we let you go?
Starting point is 00:54:16 I feel like we've gone over a lot. Yes. I would say, if you are interested, please join us. Go to fortran-lang.org and join our Discourse, or just contact me. I'm happy to get you up to speed. We are looking for contributors and users. Okay, and what's the best website to go to? fortran-lang.org
Starting point is 00:54:34 Okay, great. There are links to discourse and other things. Awesome. Thank you so much for coming on today, Andre. Thank you for having me. Thanks for coming on. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
Starting point is 00:54:56 feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info
Starting point is 00:55:19 and the show notes on the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.
