CppCast - LFortran
Episode Date: July 1, 2021
Rob and Jason are joined by Ondřej Čertík from Los Alamos National Laboratory. They first talk about ISO papers and GitHub's Copilot AI programmer. Then they talk to Ondřej about LFortran, a modern LLVM-based Fortran compiler that can compile Fortran code into C++.
News: June 2021 ISO Mailing, C++ Library Include Times, GitHub Copilot, CppCon Field Trip
Links: LFortran, Fortran, Fortran Package Manager
Sponsors: PVS-Studio. Write #cppcast in the message field on the download page and get one month license. Date Processing Attracts Bugs or 77 Defects in Qt 6. COVID-19 Research and Uninitialized Variables
Transcript
Episode 306 of CppCast with guest Ondřej Čertík, recorded July 1st, 2021.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we talk about ISO papers and GitHub Copilot.
Then we talk to Ondřej Čertík from Los Alamos National Lab.
Ondřej talks to us about the LFortran compiler and the Fortran language. Welcome to episode 306 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm okay, Rob. How are you doing?
Doing all right. Any news from you, anything you want to share?
Well, I guess I could share that it looks like training is starting to return. Everything seems to be going back to normal.
Yeah, that's the thing. Hopefully this Delta variant nonsense won't shut things back down again. But from everything that I've read and seen, it seems for people who are vaccinated it's no more risky than any of the other variants. It's just traveling very quickly through the people who aren't vaccinated right now.
Right. And so if you aren't vaccinated, consider getting vaccinated.
If you're not vaccinated, definitely go get one. I've been doing fine with it. I haven't gotten any secret messages from Bill Gates or anything like that. There have been zero zombie outbreaks so far.
Okay, well
at the top of every episode I like to
read a piece of feedback.
We got a tweet from
Pratik Anand saying, late to the party
but really enjoyed the conversation
on CppCast on Rigel Engine.
I've contributed to the project too
so even more happy for it to see some more recognition.
It's an important step in video game preservation IMO.
That was the Duke Nukem engine, right?
Yeah.
But we've done a couple episodes on kind of that topic lately,
preserving old video games.
Yeah, I may or may not have actually sought out people
who would be good interviews for that
Yeah, well, it's been fun.
Okay. We'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Ondřej Čertík. Ondřej
is a scientist at Los Alamos National
Laboratory, originally from the Czech Republic. His background is computational physics and high
performance computing. In addition, he has been actively involved in open source. He is the
original author of SymPy, SymEngine, and LFortran, and a co-founder of the fortran-lang organization. His current mission is to rejuvenate Fortran, a language for high performance numerical computing. He likes and uses C++ as a great tool that allows him to deliver robust, very fast libraries and applications, including SymEngine and the LFortran compiler.
Ondřej, welcome to the show.
Thank you for having me.
I'm curious as we get into this interview, because I know the National Labs sponsor a lot of open source work. Is the work that you work on at the National Lab also open source? Is it related to these projects too?
Some parts.
I'm funded a little bit on the LFortran compiler. In fact, I'm funded on Fortran to C++ translation, because as you probably know, a lot of people are moving away from Fortran to C++.
And so I'm happy to help with that also.
We'll get to that later.
Awesome. Okay, thanks.
Sounds good.
Okay, so, Ondřej, we just have a couple news articles to discuss.
Feel free to comment on these,
and we'll start talking more about Fortran and maybe SymPy and all that, okay?
Awesome.
Okay, first one we have is the June 2021 ISO mailing.
Lots of new papers, as always.
I didn't have a chance to go over any of these in too much detail.
Jason, I did see there was one for constexpr for cmath and cstdlib.
So it looks like the constexprizing of the standard library is continuing.
So that's always good.
That one has been a long time coming
because they have
problems with...
It's revision 8 of that paper.
Yeah.
Anything either of you wanted to call out
with these papers?
Yeah, I like the stack trace from
exception paper. It's the
P2370R0.
The idea, I think, if I understand correctly, is that
the language itself should somehow help you get the stack trace when you get an
exception. That could be handy. Very nice for debugging. I do that by hand
in every C++ project that I have. I get nice exceptions.
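He doesn't spell out his hand-rolled approach, but one way to approximate what P2370 proposes, assuming a C++23 toolchain that ships the stacktrace library, is a sketch like this:

    #include <iostream>
    #include <stacktrace>  // C++23
    #include <stdexcept>
    #include <string>

    // Hand-rolled approximation of the idea in P2370: capture the stack trace
    // at the throw site and carry it inside the exception message.
    struct traced_error : std::runtime_error {
      explicit traced_error(const std::string &msg)
          : std::runtime_error(msg + "\n" + std::to_string(std::stacktrace::current())) {}
    };

    int main() {
      try {
        throw traced_error("something went wrong");
      } catch (const std::exception &e) {
        std::cout << e.what() << '\n';  // message plus the captured trace
      }
    }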
Yeah, very nice. And it looks like there's movement on
executors of some sort, std::execution, the first revision of this paper. I didn't see that one; it's for managing asynchronous execution on a generic execution context. So it looks like maybe we'll get that into C++23. That would certainly be nice, and I know a lot of other things are waiting for that to come in, right?
Yeah. Yeah.
All right. Next thing we have is a GitHub repository, which is really just a collection of this graph and, I guess, maybe some code that they used to run it: C++ library
include times. So it's a nice graph of a lot of the standard library headers
and also some popular open source library headers
and how long it takes to include them
if you are using these headers.
And yeah, I guess some of these were maybe a little bit surprising, like filesystem takes a huge amount of time to include. Maybe that's not surprising, I don't know.
Right now the only one that's surprising to me is that regex isn't the one that's at the end of it.
Yeah, that's true. Regex is actually very small. There's one library there that takes 1.1 seconds, and I tested it on my Mac, and it takes 6 milliseconds with Clang. So I'm a little bit surprised it takes 1.1 seconds on Windows.
Yeah, this was all done using Visual Studio, I believe. So it would be nice to see if someone did this with GCC and Clang, so you could compare these different popular libraries on the different platforms.
Right.
Yeah, another one that stands out here that I find a little disappointing is spdlog, which I do like using. And it's actually the highest one on this entire chart.
Yeah, and I was a little surprised that that was the highest one, and not, like, windows.h or something like that, which you would kind of think would be one of the worst offenders.
Yeah. And most of these things, you know, if you really care, you can isolate their usage. But something like spdlog, you really can't. I mean, realistically, you want to use the same logger in all of your cpp files.
Hmm.
I have to give that a look.
Okay.
And then we also have, and this is not necessarily directly C++ related, but I see a lot of programmers on Twitter talking about it over the past week: GitHub Copilot got announced. And this is an AI programming assistant. It's available as a Visual Studio Code extension, and it's powered by all the repositories on GitHub. Looks pretty fancy. I'm not sure if it's directly related to the Visual Studio AI tool that I know has been in Visual Studio for at least the last release or two.
Yeah.
Last year or two.
I've turned on the Visual Studio one and then not necessarily noticed a difference in
the autocomplete, so I don't know if I was just using it wrong or if it didn't see something.
This one looks a little different in that you can write your function name and it suggests
an entire block of code for you, which is maybe a little scary.
Well, the question is, once this gets good enough, can I start training AIs instead of humans?
Yeah, maybe.
What do you think about this one, Ondřej?
Yeah, I just think it's very interesting.
I wonder how they do it, if they have an abstract syntax tree for every language,
and how much semantics they encode, or if this is all machine learning.
It would be interesting to know the details.
Yeah.
Okay, and then the last thing we have, Jason,
this is the announcement for the CppCon field trip for this year?
Yes, the field trip.
Now, you organized it two years ago.
Did you have any hand in this one?
I did not have a...
No, not directly.
When it was being planned for the 2020 conference, it was the same plan that we
wanted to do for the, I say we loosely, the meetup wanted to do for 2020. And the organizer brought
up this idea and we all collectively agreed that this is kind of like, this is Colorado, right?
This is going to give you a taste of Colorado, because the field trip will be starting at the Gaylord and then going up into the mountains. I don't know, what would you hear, like nine, ten thousand feet, something like that, up from the five thousand feet that we are at down here. So from like sixteen hundred meters to like three thousand meters-ish, around in that ballpark.
And doing a narrow gauge railway
that will take you to one of the old mining camps.
And you can go in, and I've done this trip before myself.
It's been a long time.
But you can actually see what a working mine used to be like.
I'm pretty sure that they're planning to actually go into the mine.
And then when that's done, then stop in Idaho Springs for Beau Jo's Pizza. And I know some of our friends who like to drive up to C++Now make a point of actually stopping in Idaho Springs just for Beau Jo's Pizza on the way from the Denver airport to Aspen. So it's, yeah, it's a Colorado institution. And if you care about beer, I don't know how much time you'll have, but there are two breweries within walking distance of Beau Jo's where you could grab and go and take it back to the hotel with you.
Nice, nice.
All right.
Well, Ondřej, we've talked about Fortran plenty of times on CppCast,
but usually we're talking about it as if it's very much a thing of the past,
a somewhat dead language, but you're actively working on it.
Do you want to start us off there?
Sure. Well, it is an old language, one of the oldest, if not the oldest, high-level languages. It started at IBM, I believe, in the early 1950s. And you're right that a lot of people think that it is dying, or even dead. And most of you who studied any kind of physics or any kind of engineering degree, you probably know that your advisor typically has some kind of old Fortran code around, and you're stuck with it. You have to resurrect it and fix some bugs in it. So that's most people's experience with Fortran.
And then, of course, I knew Python, and I was mostly using Python and C++. But then I came back to Fortran in around 2010. I was at Lawrence Livermore National Lab as an intern, and I had some Python code with some Fortran kernels for some electronic structure, and we were optimizing it, and eventually we rewrote it in Fortran. And that's when I kind of learned Fortran, the modern incarnation of Fortran. And I realized that it's a really nice language. It feels like Python and NumPy. And so for numerical, computational things, it's very handy. Everything is in the language. And then the compilers can optimize it beautifully. And so that hooked me in.
And then as time went on, 2015, '16, I was already working at Los Alamos National Lab. We have a lot of Fortran codes, a lot of production Fortran codes, but I only see one direction typically: people want to move away from Fortran, mostly to C++. And if you talk to a lot of people, it's very common; that's kind of the sentiment, that Fortran is a dying language. And so that's how I joined: I decided to fix that and make sure it's not a dying language anymore.
And also I should say, the term dying: every time I say it, I get pushback from the Fortran community. They always say it's not dying, it's never been dead, you know, you should not be using such words. But, you know, it's one way to characterize it, so I prefer to use the term rejuvenate. Anyway, but yes, definitely in the 2015, '16, '17 timeframe it was not doing well at all. And that's when I decided to join.
So I wanted to help fix it, because I like the language.
We can get back to why later. But I wanted to fix it, and so I thought, well, what can I do? I figured, well, there was only one organized community around Fortran: the Fortran standards committee. So I looked it up online, and there's not much information there. I didn't know anybody there, but there was a mailing list, so I decided to join it. And then I asked them, you know, is there a meeting? And there was a meeting, in Vegas I believe, so I booked my flight and came to the meeting. I just didn't know what to expect, you know: how does a meeting like that even work, how many people are there, and so on. Turns out there were about 15 to 20 people,
and they were indeed working on new features to Fortran. And so I asked them,
what's your vision? I feel the situation of Fortran as a language is not very good.
Do you have any plans to fix it? I see a lot of codes around me moving away from Fortran and I don't see
any code moving to Fortran. Do you have any kind of ideas how to fix that? The feeling I got was that
they did not. Well, one of the arguments that I heard there was that at universities you don't have Fortran classes anymore, and so they teach C++, or I would even argue Python, let's say.
So I told them, well, I have some ideas. I had a very early prototype of LFortran, which is a compiler I've been working on. The prototype, by the way, was still in Python. I like Python, so it was in Python. It was just a prototype showing that it can work interactively. And so this LFortran compiler, I'll talk more about it later, but I showed them the prototype in a Jupyter notebook, interactively executing Fortran cells. And I think they liked it. And I was trying to motivate them that I think this is how we can attract a lot of new users from the Python and MATLAB and Julia communities. I still think that. And I will say, I think now we are in 2018 or 2019. Since then, I've rewritten LFortran in C++, and we can talk more about the details later. And also, as I mentioned, the Fortran standards committee felt kind of secluded.
So unless you were already on it, not many people knew much about it.
So the other thing I've done is I created this GitHub repository for people to submit
proposals to the committee and didn't think of it much.
I thought, well, let's at least try, see what happens.
And it was tremendously successful.
I announced it on Twitter, and we immediately got dozens and dozens of people just coming in and opening issues and saying,
I would like to fix that and that.
And one of the more popular, I would say, proposals is to release Fortran standard every three years instead of five years.
Another proposal is to put the standard itself on GitHub and use GitHub as opposed to these old papers with hand diff, essentially, to propose changes to it and so on.
Anyway, it was very successful.
And then from that initial online community,
we started an effort that we now call Fortran Lang.
We first started with writing a standard library for Fortran.
So Fortran itself as a language has a lot of features,
as well as intrinsic functions like sine, cosine,
Bessel functions and so on.
But there's a lot of things that almost every Fortran programmer would like to have,
and they are not part of the language.
And so we decided, well, why don't we write a standard library,
sort of in the scope of MATLAB or SciPy.
So all the special functions, all math functions,
as well as all kinds of algorithms like sorting, one example.
Fortran doesn't have sorting in the language.
And so we did that and then one thing led to another.
So we then created a website and a logo for Fortran.
If you Google Fortran, or Bing or DuckDuckGo it, the first or second result is the fortran-lang.org website.
So that's the Fortran website.
So we created the website for Fortran.
And then I guess the most exciting project there,
besides Fortran, is Fortran Package Manager called FPM.
We can talk more about that.
Oh, wow.
So that's how it all started.
We also have a discourse forum.
Discourse, for those who don't know, it's an online forum.
It's sort of like a mailing list, but it allows you to edit your post.
And it's a very nice way to communicate with the wide community online.
And so we have hundreds of users.
So I'm very excited about all of these developments.
Especially that within one year, we launched
a website about a year ago.
Within one year, we got in Google even to second place, after Wikipedia, for Fortran.
So things are changing.
Oh, and the other thing that happened within the last year: there is this TIOBE index that kind of ranks languages.
I'm not quite sure exactly how they do it,
but somehow, out of nowhere,
Fortran jumped from, I don't even know,
50th place to the first page.
It made it.
Again, I think it must be the web page, I assume,
but I'll take any good news.
And so this is all very positive.
That all happened within the last year, pretty much.
So if we can get a little bit of timing perspective here,
because you were talking about modern Fortran.
And I think there's a good chance I'm going to get this wrong,
but I'm pretty sure my dad learned Fortran 60 when he was in university.
So what is the modern version of Fortran? You said it has been updated. When was the last standard released?
Well, I'll start at the back. So there's Fortran IV. I don't know if that was the very first standard that they standardized, because they had multiple compilers and so on. After that, it was Fortran 66, then Fortran 77. After that, the next big revision was Fortran 90, then 95, 2003, 2008, and 2018.
Okay.
And in terms of what most people consider modern: so F77, or Fortran 77. Back then they used punch cards. You know, I was born in 1983, so I've never used punch cards; I had to look it up on YouTube to see exactly how that works. I know some older folks might be thinking, oh, how is it that you don't know how it works? But it's just a card, and the first, I believe, six columns are used as control characters. And so when they converted those programs on punch cards into files, you end up with a format that's called fixed form. So the first six columns, I believe, or seven, are control characters, and you cannot use them to write code. You have to put spaces there, or you can use them for comments or labels. And so that's called fixed form. It's very hard to program in, because you have to put six spaces all the time, and it's very particular. That was Fortran 77. Fortran 90 introduced what's called free form, which looks like Python. Essentially, you just write your code like any modern language.
So that's Fortran 90. The other thing that happened: Fortran 90 introduced modules, with dependencies and all that stuff, and derived types, which are like a C struct. And it also introduced some improvements to the arrays: you can allocate them at runtime. In Fortran 77, I believe you have to allocate the array in the main program. You can, however, pass it to subroutines as unknown length; that all works. So it's really cool when you think about it: it's Fortran 77, it's very old, and you write a subroutine just like you would today. You accept the lengths of all the arrays you have, and you have multi-dimensional arrays, and this all works nicely. But Fortran 90 adds that you can actually allocate at runtime, at any time you want. So that's Fortran 90.
And since then, most of the additions are relatively small. So Fortran 2003 adds objects and methods, stuff like that. And then the rest of the changes are just kind of fixes here and there. Things like: F77 uses, I believe, slash and parenthesis as the array delimiter when you want to write an array. Fortran 2003 or 2008, I can't remember which, introduces square brackets, you know, to make it like Python, stuff like that. They also added a parallel loop; it's called do concurrent. I believe that was 2008 or so. They also added parallel arrays, called coarrays, into the language itself. It's not used too much; we can get to why later, but the idea is to put parallel features into the language itself and let the compilers handle that. So the coarrays are sort of like MPI, like one-sided MPI, if people are familiar with that.
Yeah, I was going to say MPI is one of
the parallel things that's supported on Fortran,
right? Yes, yes it is. In fact, most
codes, all the codes, I would say, use
MPI for historical reasons.
But co-arrays add that feature directly into the language,
and the syntax is beautiful.
You just have arrays, and in brackets you just put which coarray image, which rank, you want to access, and the compiler handles it and knows when the data is available and when it's needed, so it will start sending the data as soon as it has it.
Interesting.
And so it's like one-sided MPI, but you don't have to worry about it.
So it's very nice.
The idea is very nice.
In practice, the reason people don't use it too much is because, well, they already have the code using MPI, so you don't want to rewrite it. And for new code, the compiler support wasn't great until just very recently.
So then LFortran, your compiler, which we'll, I guess, get to more in a minute.
The parser supports full Fortran 2018. I encourage people to try it and report any bugs that you happen to find in the parser itself. You can use lfortran fmt, like format: it will format your code. For now it will skip comments and empty lines; we are still working on that, but everything else, every single thing, should work. So it parses to AST, abstract syntax tree, and back to source code. The semantics we are now working on. So the current status is: we identified a proxy app, it's called SNAP. It's a particle transport code, from Los Alamos actually, but not written by me; I know the people who wrote it, but I have nothing to do with it. It's Fortran 95, and we are about halfway to being able to compile it. We have the modules compiling, but we're still fixing things to actually compile it. And the timeframe: we are really close. I'm hoping within months, you know, by the end of summer, I'm really hoping we are able to compile it. And at that point, we'll release an MVP, a minimum viable product. We'll ask people to test it out and start using it. So that will be roughly Fortran 95 level. The parser is full 2018, but the actual compilation, you know, semantics and LLVM code and all that stuff, will be roughly Fortran 95, which turns out to be really quite large. Most of the hard work is in there. You know, after this works, what will remain is just the objects and kind of a runtime library, all these functions, all kinds of corner cases and so on. But the hardest part will be behind us.
Sponsor of this episode is the PVS Studio team.
The team develops the PVS Studio Static Code Analyzer.
The tool detects errors in C, C++, C Sharp, and Java code.
When you use the analyzer regularly,
you can spot and fix many errors right after you write new code.
The analyzer does the tedious work of sifting through the boring parts of code.
It never gets tired of looking for typos.
The analyzer makes code reviews more productive
by freeing up your team's resources.
Now you have time to focus on what's important, algorithms and high-level errors.
Check out the team's recent article, Date Processing Attracts Bugs or 77 Defects in Qt 6,
to see what the analyzer can do. The link is in the podcast description. We've also added a link
there to a funny blog post, COVID-19 Research and uninitialized variable, though you'll have to
decide by yourself whether it's funny or sad. Remember that you can extend the PVS Studio
trial period from one week to one month. Just use the CppCast hashtag when you're requesting your
license. But before we talk more about LFortran and these other projects, you know,
maybe we should just back up for a moment
and go over how Fortran compares to other languages
like C++, and why do you feel like
it deserves all this attention,
and why are you against people converting
old Fortran code into C++,
which you said you're trying to get them to stop doing?
Or I'm encouraging it. LFortran has a C++ backend, so it translates Fortran into C++.
Oh, it translates it into C++.
So if people want to move away and just develop in C++, they will be able to use LFortran to do that.
So this is also cool. But, in fact, I think, I'm hoping, a lot of people will decide, well, if I'm not locked into Fortran, and if I can always translate to C++, maybe I will stay in Fortran.
But anyway, so why Fortran? The main, I would say, three motivations for Fortran are to essentially enable scientists and physicists and domain experts, engineers, to write domain-specific code, essentially write numerical code, and be able to maintain it themselves.
And I would say there are three kind of pillars. One is that the basic mathematics is in the language itself. So Fortran has exponentiation, has complex numbers, has all kinds of special functions; you know, F77 already had all that. It is more restrictive, I would say, and higher level than C++. So, for example, it has multi-dimensional arrays, but in the language. It has pointers, but you cannot just point to anything; you have to declare what you point to as a target. Things like that. It's much more restricting. F77, if anybody plays with it, is very restricting; it feels like it's just so hard to do anything. But the advantage of that is that it's very simple. There's not much to it. F77 is just a bunch of subroutines and functions and arrays and loops. And also, that kind of design allows the compilers, historically, to optimize it really well, which means you get very good speed. So those are the three kind of
advantages. I would say, if I can expand on that, historically the mission for Fortran from the very beginning
was to allow scientists, engineers and domain experts to write programs that naturally express
the mathematics and algorithms employed, are portable across HPC, so high performance computing
systems, remain viable over decades of use and extract a high percentage of performance
from the underlying hardware. So Fortran as a language feels high-level. It doesn't have things like inline assembly, for example; the language doesn't allow you to do that. The language itself doesn't have a memory model. In other words, in C and C++ you can go into the bits; you can rely on how floating point is represented and stuff like that. In Fortran, you can operate on floating point in a more abstract manner. And the reason
is historical. Fortran ran on all kinds of machines that did not have IEEE floating point. If you look up all the high-performance computing systems in the past, they have all kinds of architectures. And yet the Fortran compilers were able to take the program and just compile it to run on the machine.
So that's the motivation.
So then with that in mind,
do you support or recommend
or what would the process be like
if your engineers, scientists,
wanted to just keep code in Fortran
that's part of a larger C++ application?
And it sounds like you can just emit the C++
and potentially just link it together.
Eventually. So I think there are multiple approaches. The basic approach: I would say languages in the modern era have to interoperate. So you need to be able to have a Fortran library and just call it from C++ or from Python, just seamlessly. To support that, LFortran, which has some prototypes kind of showing that this works, should provide the wrappers automatically for you. It's a compiler; it knows all the types, knows everything. It should just allow you to use it from C++ and Python or Julia, MATLAB, whatever you name it; it should just work. So for example, from Python, there should be a library where you can just import a module that happens to be a Fortran module, and the compiler behind the scenes should just wrap it and make it available for you.
Or you can use it to generate the wrappers as files.
So traditionally, the way you wrap Fortran from other languages is you have to write, it's called iso_c_binding, a special module in Fortran which allows you to interface with C, either calling it or being exposed to C. And then from C++ or Python you have to call this as a C library. But then, typically, Fortran has arrays, so in Python, for example, you don't want just memory; you want a NumPy array that should be mapped to the same memory as in Fortran. So there's quite a bit of technical things involved, and so there are tools that allow you to do that. F2PY is for Python.
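To make the mechanics concrete, here is a minimal sketch of the C++ side of such a binding. The Fortran subroutine in the comment is hypothetical, just to show the shape of what iso_c_binding exposes; real wrappers, especially for Fortran's richer array types, involve more machinery.

    #include <vector>

    // Hypothetical Fortran side, exposed via iso_c_binding:
    //   subroutine vec_scale(n, alpha, x) bind(c, name="vec_scale")
    //     use iso_c_binding
    //     integer(c_int), intent(in) :: n
    //     real(c_double), intent(in) :: alpha
    //     real(c_double), intent(inout) :: x(n)
    //   end subroutine
    // C++ then sees it as a plain C function (link against the compiled
    // Fortran object for this to actually resolve).
    extern "C" void vec_scale(const int *n, const double *alpha, double *x);

    int main() {
      std::vector<double> x{1.0, 2.0, 3.0};
      int n = static_cast<int>(x.size());
      double alpha = 2.0;
      vec_scale(&n, &alpha, x.data());  // Fortran operates on the same memory, no copies
    }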
For C++, I don't know actually if there's a tool that does that. Well, one issue with C++ is that there are a lot of C++ libraries that can handle arrays, and so for each one, you know, you have to have some kind of wrapping code. All of this should be, in my opinion, handled, or at least helped, by the compiler, so that as a user you can just go to C++ and just start using it. So that's one answer. And if you want to go away from Fortran, I also think the compiler should allow you to just translate all your Fortran code to C++. Technically, it seems to be working.
What we have to do, what we are doing now, is just kind of finish all the semantics and actually make it work for Fortran 95. But technologically, it seems to be working.
Go ahead.
The code that's generated, code generated by tools, is often not very maintainable. What is the code like that's generated from your tool?
Yeah.
So at LANL, Los Alamos National Lab, most people like the Kokkos C++ library for arrays. So right now we just target Kokkos, but it's not tied to it; we can change that later. So array expressions are transformed into Kokkos array expressions, and loops are simply transformed to use Kokkos as the array implementation in C++. We tried to make it as readable as we can; I think it can be done so that it's readable. For F77, there is a tool called f2c, and it translates F77 to C. That's an old tool, and not that readable. I think it can be done much better.
The subset that we can translate so far is very readable.
It's essentially, when you think about it,
so the compiler knows
exactly what you are doing in Fortran because it knows how to translate it to
LLVM for example or machine code. So it knows exactly the semantics, it knows
that you have an array, it knows that you have a loop or a subroutine or a function, it
knows exactly the type. So to make it readable in C++ all that's needed is to
decide how you want to represent things in C++.
And the huge advantage of C++ in this case is that because it's such a bigger language,
there's multiple ways you can represent the Fortran things.
And the Fortran semantics is very simple. Even the latest incarnation of Fortran is still very simple.
The types map nicely to C++ types and then some of the
corner cases can be handled by the compiler and the arrays for example. So the Cocos arrays, they
pretty much have all the operations and more than what the Fortran arrays allow you to do.
The only thing they don't have, the main thing they don't have is in Fortran you can have array
operations. So you can stack or you can operate on arrays as a whole.
So those have to be written to for loops.
But again, the compiler has to do that anyway to emit machine code.
So the compiler has all the technology to do that.
And it's just about the C++ backend.
We are trying to write it in a way so that it's readable,
so that people actually like what they get.
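As a hedged illustration only, not LFortran's actual output, a whole-array Fortran expression such as c = a + 2.0*b might lower to a Kokkos kernel along these lines:

    #include <Kokkos_Core.hpp>

    // Sketch of one plausible translation of the Fortran array expression
    //   c = a + 2.0*b
    // into Kokkos. Names and structure are illustrative; the real backend may differ.
    int main(int argc, char *argv[]) {
      Kokkos::initialize(argc, argv);
      {
        const int n = 1000;
        Kokkos::View<double *> a("a", n), b("b", n), c("c", n);
        // The whole-array operation becomes an explicit parallel loop.
        Kokkos::parallel_for("array_expr", n, KOKKOS_LAMBDA(const int i) {
          c(i) = a(i) + 2.0 * b(i);
        });
        Kokkos::fence();
      }
      Kokkos::finalize();
    }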
So aside from the ability of LFortran to generate C++ code from your Fortran code, what else sets it apart? Are there other Fortran compilers, or is there just one other one?
So there are about 12 Fortran compilers. If you go to fortran-lang.org, there is a section on compilers, and we list all of them, with links to them. In terms of open source compilers, there is GFortran, part of GCC; there is Flang, part of LLVM; and there is LFortran. Those are the three main ones. There were a few more in the past, but they are not actively developed anymore.
Does that mean Intel's is no longer maintained?
So those are open source.
And then you have commercial.
Oh, sorry. Okay.
So there's Intel as the main, I would say,
historically at least,
that's the main kind of compiler
in terms of delivering optimized code.
But there's the NAG compiler, there's the Cray compiler, AMD, IBM; all these companies typically have their own compiler. But I would say Intel and NAG are the two main ones that we use often, at least.
So then how is LFortran different?
It's interactive. So you can use it to compile to binaries like any other compiler, but you can also use it interactively, so it feels like Python or Julia. You can launch a REPL from the command line, and it looks just like Python. You just do integer :: i, and i = 5, and you can start typing Fortran commands, and each command gets compiled to LLVM and machine code, loaded into memory, and executed. The way it works technically is just like Julia works. In fact, Julia was a huge inspiration for me to start LFortran. I kind of investigated how they do this and realized, oh, I think this can be done for Fortran also. And so that's one big difference. And the
other difference is that it has multiple backends. And I would say the design of LFortran is something a little bit unique; I'll just quickly describe it. So there is a parser; the parser parses to an abstract syntax tree, AST, and then the AST gets transformed. All the semantics get checked, and then it gets transformed to a representation that we call the Abstract Semantic Representation, ASR. Again, it's a standalone representation that represents just the semantics, so it has a symbol table and things like that. You can print it to the screen; you can give it back to the user so they can see exactly what the compiler sees. And then all the backends just take this ASR and do something: the LLVM backend generates LLVM code, the C++ backend generates C++ code, the Python wrapper backends that we will write will generate Python wrappers, and so on. And then we also have an x86 direct machine code generation backend that generates machine code very quickly.
It's more of a prototype just to see how fast it can be.
It's very fast.
It's about, well, on the artificial benchmark I tried, it's about 20 times faster than LLVM
to compile.
Oh, wow.
To compile, okay.
To compile, that's the key.
So that back-end will be used for development when you want to compile your stuff very quickly.
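A hypothetical sketch of that "one ASR, many backends" shape, with all names illustrative rather than LFortran's real classes:

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Stand-in for the real semantic IR, which carries a symbol table and types.
    struct ASR { std::string program_name; };

    // Each backend consumes the same ASR and emits something different.
    struct Backend {
      virtual ~Backend() = default;
      virtual std::string emit(const ASR &ir) = 0;
    };

    struct CppBackend : Backend {
      std::string emit(const ASR &ir) override { return "// C++ for " + ir.program_name; }
    };

    struct LLVMBackend : Backend {
      std::string emit(const ASR &ir) override { return "; LLVM IR for " + ir.program_name; }
    };

    int main() {
      ASR ir{"demo"};
      std::vector<std::unique_ptr<Backend>> backends;
      backends.push_back(std::make_unique<CppBackend>());
      backends.push_back(std::make_unique<LLVMBackend>());
      for (auto &b : backends) std::cout << b->emit(ir) << '\n';
    }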
Right.
Yeah, because I'm...
Sorry.
Implementing all of the same optimizations
that LLVM does would be very difficult.
I'm not going to even attempt that.
LLVM is so...
I could talk about it for hours,
but LLVM is awesome.
The way...
It's absolutely amazing.
It has very little information; it's a low-level IR. And yet what it can optimize, what it can do, is just amazing. So we have these tests in Fortran that I test the compiler with, things like for loops. It does some calculation on an integer, let's say, and then at the end I have: if that integer is equal to 55, then exit with zero, otherwise exit with an error. And then when I run it through LLVM, which just sees a bunch of jumps, it doesn't even have for loops, it can optimize everything out, and so the whole test is just a return zero. It's just amazing. Absolutely amazing.
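A small C++ analogue of the kind of test he describes, which optimizers at -O2 typically fold down to a constant return:

    // Analogue of the Fortran test described above: a loop computes an integer,
    // and the program exits with 0 only if the result is 55.
    // Clang and GCC at -O2 typically reduce the whole program to "return 0".
    int main() {
      int sum = 0;
      for (int i = 1; i <= 10; ++i) {
        sum += i;  // 1 + 2 + ... + 10 == 55
      }
      return sum == 55 ? 0 : 1;
    }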
So it's great for these kinds of low-level optimizations. Where LLVM is not that great is if you want to optimize arrays at a higher level. I mean, it's still amazing what it can do, even just not knowing anything; it can unroll things; it's amazing. But for Fortran, it's not good enough. For Fortran, what's really needed is to be able to optimize loops that operate on arrays at kind of a higher level. There are multiple different ways you can transform array operations into loops, and things like inlining and stuff like that. It's always better to do it at the higher level.
So LFortran does these higher-level optimizations on the ASR, the Abstract Semantic Representation, before passing it to LLVM. It's a similar idea as MLIR. MLIR is a library that's part of LLVM now; it's built on top of LLVM. It allows you to represent arrays and for loops and if statements in the IR itself, precisely so that they can be optimized better. So it's a similar idea.
Did you consider writing your Fortran parser and compiler in Fortran? Yes.
But, yeah, I can maybe get back to why I chose C++. I knew that I could deliver this in C++ and make it very fast. So one of the things I wanted, since I decided I'd do this: I wanted this to be the fastest compiler. I still want that; I think it will be. That's my goal. Fastest to compile. And to do that, everything has to be fast. So the internal representation, the AST and ASR, has to be as fast as possible. And so to do that, I spent two months just benchmarking seven different ways you can represent a tree in C++.
As a class, you know, with inheritance; as C structs with casting; and also how to visit the tree very efficiently. There are, again, so many different ways you can visit a tree in C++: you can have a visitor pattern, you can have just a switch; there are many different ways you can do it. Also, with C structs, you can represent the AST as a union, or you can have each AST node as a struct of a different size. And also how you allocate the memory, and so on.
I kind of reused my experience here: I've written this library called SymPy, which is a Python library for symbolic manipulation. It's very successful; it has a lot of users and contributors. It's Python, it's great. It's like Mathematica or Maple: it allows you to compute symbolic integrals and so on. But it's in Python, and it's very slow.
So I decided, well, if I want to make it fast, how do I do that? And I tried many different approaches. I tried C. I tried Cython; Cython is this great tool that allows you to kind of speed up Python with C, and so on. But eventually I realized C++ is the only tool that allows me to do that, to deliver. And, you know, I don't want to spend too much time on that, but essentially around 2015 we delivered a library called SymEngine, which is a C++ implementation of the core of SymPy. It's a tree in memory, and it's using reference counting as the memory management. We wrote our own reference counter, very optimized, and eventually we made it the fastest library for symbolic manipulation that we benchmarked. We benchmarked against Mathematica, Maple, Sage, SymPy of course, and GiNaC. It seemed like it's the fastest.
And then, when I was writing the compiler, I realized a compiler is not really that different from SymEngine or SymPy. In fact, SymPy is a compiler; I just never thought about it that way. But what SymPy does: it has all kinds of input parsers, so it allows you to parse things; it allows you to represent the symbolic expression in memory; it allows you to apply operations on it in memory; and then it has code generation, including even LLVM. So it is a compiler. I just never thought about it that way.
So using this experience from SymEngine, I realized, well, reference counting is one way, but I think there is a faster way to do it. And so after spending months benchmarking different approaches, I ended up using essentially C structs and a custom allocator, a linear allocator, so it just moves a pointer. And the fastest way to visit it that I was able to get is just a C switch that dispatches based on the type; the type is an integer, part of the struct. And then, because it's a lot of boilerplate code, I generate it. So I looked into how Python represents its abstract syntax tree, and they have this language called ASDL, which describes every AST node. It's this nice kind of Haskell-style or ML-style language, and they have a tool that can parse it; in their case they generate C as the representation of the AST. So I took that, and I generate this very fast C++ implementation from it.
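To illustrate the idea of a linear (bump) allocator, here is a minimal, hypothetical sketch; LFortran's actual allocator is more elaborate:

    #include <cstddef>
    #include <new>
    #include <utility>
    #include <vector>

    // Minimal bump ("linear") allocator sketch: allocation is just a pointer
    // increment, and everything is released at once when the arena is destroyed.
    // Only use it for trivially destructible node types; no destructors are run.
    class Arena {
      std::vector<std::byte> buf_;
      std::size_t used_ = 0;

    public:
      explicit Arena(std::size_t bytes) : buf_(bytes) {}

      void *allocate(std::size_t n, std::size_t align) {
        // Round the offset up to the alignment (align must be a power of two).
        std::size_t p = (used_ + align - 1) & ~(align - 1);
        if (p + n > buf_.size()) throw std::bad_alloc{};
        used_ = p + n;
        return buf_.data() + p;
      }

      template <class T, class... Args>
      T *make(Args &&...args) {  // placement-new a node inside the arena
        return new (allocate(sizeof(T), alignof(T))) T{std::forward<Args>(args)...};
      }
    };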
And then, of course, to make it easier to use, we also generate a kind of CRTP pattern, which is very fast. It looks like classes, it looks like inheritance, but it's all at compile time. And I carefully benchmarked it to make sure there's no overhead, and there's no measurable overhead. So from the user perspective, and by user I mean the compiler developer, it looks like a class pretty much, and you have a visit method and it gets called, and underneath is all this not really complicated, but kind of tedious, machinery that's generated automatically.
Tedious.
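A hedged sketch of what such generated machinery can look like: integer-tag dispatch in a switch, wrapped in a CRTP base so the user writes what looks like a class. Names are illustrative, not LFortran's generated code.

    #include <iostream>

    // Tagged C-style structs: the tag is stored as the first member.
    enum class NodeType { Num, BinOp };
    struct Node  { NodeType type; };
    struct Num   { Node base; int value; };
    struct BinOp { Node base; Node *left; Node *right; char op; };

    // CRTP base: dispatch is a plain switch over the tag, resolved at compile
    // time to the derived class's visit_* methods, with no virtual calls.
    template <class Derived>
    struct Visitor {
      void visit(Node &n) {
        switch (n.type) {
          case NodeType::Num:
            static_cast<Derived &>(*this).visit_Num(reinterpret_cast<Num &>(n));
            break;
          case NodeType::BinOp:
            static_cast<Derived &>(*this).visit_BinOp(reinterpret_cast<BinOp &>(n));
            break;
        }
      }
    };

    struct Printer : Visitor<Printer> {
      void visit_Num(Num &n) { std::cout << n.value; }
      void visit_BinOp(BinOp &b) {
        std::cout << '(';
        visit(*b.left);
        std::cout << ' ' << b.op << ' ';
        visit(*b.right);
        std::cout << ')';
      }
    };

    int main() {
      Num one{{NodeType::Num}, 1}, two{{NodeType::Num}, 2};
      BinOp plus{{NodeType::BinOp}, &one.base, &two.base, '+'};
      Printer{}.visit(plus.base);  // prints (1 + 2)
      std::cout << '\n';
    }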
So yeah, it's very fast; it can represent things fast. And then the parser is in Bison, and the tokenizer is in re2c. And so now the slowest part is LLVM; even in debug mode, it's just really slow. It's great for optimizing, but if you just want fast compilation... that's when I decided to see if we could generate machine code directly, and it's much, much faster.
So right now we are working on the LLVM backend because we want that.
That's the most versatile and it will allow us to deliver, including some pretty good
optimizations.
But down the road, I would like to come back and see how with the direct machine code generation,
it could make it very fast.
And I think as a user, as a Fortran user now,
I would love if the compiler can really generate the code
10 times faster than other compilers.
I think it would be really cool.
So out of curiosity, did you benchmark LLVM
to see where the slowdowns are
and see if you could contribute back to them
to speed up your use cases by any chance?
I did not. I would be curious. I always assumed that it's just inevitable, that it's the way they represent the IR.
It's the fact that I have to even generate the IR.
If I don't use LLVM, I generate the machine code just directly. I don't even do any assembly.
I generate literally the machine code in memory right away.
For all you know, there might just be like a while loop in there that says, if it looks like we're generating Fortran, let's slow it down a little bit.
That seems unlikely.
Is the LFortran compiler considered ready for production use, or is there still more work to be done before you'd recommend it for production?
It's not ready for production; it's ready for testing. The parser should be complete; if there are any bugs, you know, we'll fix them. And then we have this proxy app, SNAP, that I mentioned. We are trying to compile this Fortran 95 code, and I'm hoping we'll be able to compile it in a matter of months, and we'll make a release once we can compile it. And then we'll be ready for first users.
And you said at the intro,
you said your employer is at least partially funding these projects.
Yes, so they fund me on Fortran to C++ translation, as help to some of our internal teams that are moving away from Fortran. That backend should be ready also. The hardest part is not so much the backend; the hardest part is all the semantics and all the modules and symbols, imports from modules, all this stuff. That's what we are working on right now. Once we can get all the semantics, the backend will be relatively quick to update, to get up to speed.
Since you mentioned modules
and you talked about how Fortran has
modules in one of the newer versions,
how does that compare to C++ modules?
Are they comparable? I don't know.
I haven't used C++ modules yet.
No one has.
The Fortran modules are just a piece of code that has subroutines, functions, and variables, and then you can use it: you write use and the name of the module, and it imports everything from the module, or you can just import one function. And that's pretty much all there is to it.
And the way the Fortran compilers handle this: modules have dependencies, so CMake, for example, understands how these dependencies work, and it will call the Fortran compiler in the correct order. So then it compiles the modules, and each module gets compiled to an object file and a mod file. And the mod file contains the compiler's internal representation of what symbols are in the module, so that when you compile the next module, it knows what to expect in the object file.
We haven't really ever, I don't think, talked about this on the show, but CMake
has full Fortran support, right? Yes. Is CMake
the standard or de facto standard for Fortran users
as well? I would say so, yeah. Okay. That's what I would recommend
at least.
Although we have a better solution, and that's the Fortran Package Manager, which is modeled on Cargo from Rust. And so it's a build system and a package manager in one, and then you don't need CMake. Essentially FPM, so it's called FPM, Fortran Package Manager, you can think of it as high level: it has more information, all the information about your project, at a higher level than CMake. So right now we just compile your code directly, but we are planning to be able to generate a CMake project for your code. Some people might prefer that if they don't want to use FPM.
The way Cargo works, if you're not familiar: it's opinionated. So it kind of assumes where things are on your disk, your files and so on. But if you follow that default layout, everything just works. It's great. And all the dependencies get compiled; it's a source package manager and build system in one. And so FPM is very similar. It can compile Fortran files, but we are also planning to allow compiling C and C++ files, because a lot of projects use C, C++, and Fortran in one. So as long as you're willing to follow the layout, FPM can compile it for you also, or will be able to.
That sounds pretty cool.
So is FPM actually part of the Fortran 2018 standard?
No.
FPM is part of the fortran-lang organization that we started about a year ago. And for me, this might be the most exciting project there, because it works today. You can use it today, still in, I would say, an alpha or beta kind of version, but it works today. It works with any Fortran compiler, and you can use it to build, so you can finally have dependencies and finally create a Fortran package that others can use, which in the past was really, really hard, as I'm sure you know from C++; similar problems. This is one way to fix them.
So if you're listening
to this and maybe you
have horror stories in your past from Fortran
or maybe you've never touched Fortran before, like I don't think I ever have,
what would be your pitch to a C++ developer?
Why should you look into Fortran?
If you are trying to solve some numerical or math application problem
and you like fast compilation and fast execution of your code, I think you should
definitely look at it.
If you like Python and NumPy and you enjoy using those, you will also like Fortran.
It feels very similar.
You just have to add types pretty much, and the syntax is close.
If you like Julia, I'd give Fortran a shot also. If you like MATLAB... well, the way I pitch it to our postdocs is, I ask them, what tool do you use to prototype? They typically say, well, Python or MATLAB or Julia. And I ask them, when you want it to run fast for production, what do you use? And they say, well, C++ or Fortran. And then I say, well, wouldn't it be nice if you could use Fortran interactively from the beginning? Start in Fortran, let's say using LFortran, and develop your prototype. And then, because it's already in Fortran, you can just take it and put it in the production code, and it will also run fast.
And I would say, if your application is to write a compiler... you know, as you see, I like Fortran, but I chose C++. So it's a great application for C++.
Yeah, that sounds like for numerical processing,
you're saying this is where Fortran still has its niche,
even after 65 years or whatever, right?
Right.
So, yeah.
Right.
I would say the issue with Fortran is: the language is very nice, and still, I would say, has not realized its full potential. And the reason is that the tooling around Fortran and the compilers, I think, are a little bit lacking. And so that's what we are trying to fix.
Awesome.
Yeah, very cool.
Anything else you want to tell our listeners about before we let you go?
I feel like we've gone over a lot.
Yes.
I would say if you are interested, please join us.
Go to fortran-lang.org and join our Discourse, or just contact me. I'm happy to get you set up.
We are looking for contributors and users.
Okay, and what's the best website to go to?
fortran-lang.org
Okay, great.
There are links to discourse and other things.
Awesome. Thank you so much
for coming on today, Ondřej. Thank you for having me.
Thanks for coming on.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the
podcast. Please let us know if we're discussing the stuff you're interested in, or if you have
a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow
CppCast on Twitter. You can also follow me, @robwirving, and Jason, @lefticus, on Twitter.
We'd also like to thank all our patrons
who help support the show through Patreon.
If you'd like to support us on Patreon,
you can do so at patreon.com slash cppcast.
And of course, you can find all that info
and the show notes on the podcast website
at cppcast.com.
Theme music for this episode
was provided by podcastthemes.com.