Advent of Computing - Episode 171: What Is a 4GL?

Starting point is 00:00:00 In 1958, an ambitious project was undertaken to develop a universal programming language. That language was initially called IAL, but it's better known as ALGOL. The goal wasn't necessarily to make the perfect language, but rather to make a language that could be used for everything by everyone. ALGOL was planned to be backed by a rigorous standard and to form a basis for international collaboration among programmers. It's a great idea on paper. While groundbreaking, ALGOL didn't really take the world by storm.

Starting point is 00:00:36 The reason for this was simple. The entire endeavor was cursed by an old sailor. It was doomed from the start. Now, it's not exactly true, but it does sound nice and dramatic, huh? ALGOL's goal was only one possible path for programming languages. Some believed that it would be preferable to have a single, fully general-purpose language that was used for everything. On the other end of the spectrum, we have specialization.

Starting point is 00:01:05 Rear Admiral Grace Hopper, a not-quite sailor who didn't necessarily deal out curses, believed otherwise. Her view, which was shared by many other programmers, was that we needed specialized languages for different tasks. Many languages for many purposes. From this, we can form a bit of a spectrum, from fully general-purpose languages to fully specialized. All languages today fall somewhere on that spectrum. Most are kind of in the middle, a mix of specialization and general-purpose features. But what happens out on the edges? Is it possible

Starting point is 00:01:43 to be too generalized? Or conversely, to be too special purpose? Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 171, What is a 4GL? So, yeah, I hope you listened to last episode. If you did, you remember how I mentioned that fourth-generation languages would be an interesting topic for another time? Well, I guess this counts as another time, right? What exactly is the story with 4GLs? In today's episode, I'm going to try and figure that out. Here's the very general rundown.

Starting point is 00:02:31 And this is all I really know going into the research for this episode. First generation languages were machine code. Second generation languages were assembly language. Third generation languages were procedural languages, aka normal-looking programming languages, like C or Basic. Then we get to generation number four. 4GLs are higher-level still. They use English-like syntax, and they're often described as non-procedural,

Starting point is 00:03:00 meaning you don't have to describe every step that the computer needs to take to get a result. They also tend to be used in conjunction with some tool like a database. I said last episode that I don't really like this generational classification system. There are a number of reasons for that. Languages don't really fit into nice buckets. That's one of the reasons I find them so interesting. You can have pretty primitive languages that have English-like syntax, and you can have languages created after the term 4GL was coined that are definitely not in that generation. Then why come up with these labels at all? And what really is a 4GL? This episode we're going to tackle those questions. I want to see how the term 4GL was created and if it comes out of a larger

Starting point is 00:03:51 tradition of language classification. I want to figure out what exactly makes a 4GL so different from some other kind of programming language. And ultimately, see if there is a good reason to throw those languages into these huge buckets. Let's kick things off with a frustrating and perhaps futile search of the literature. When was the term fourth generation language first used? Why do I want to start here, you may ask? Well, I almost want to say simple, really, but it's not that simple. I don't have a simple explanation for this being my starting point.

Starting point is 00:04:34 I want to find the first use of the term and use that as a bit of an anchor point for future inquiry. That's because the idea of a 4GL, as in some newfangled kind of language, is so vague as to be almost meaningless on its own. If we can find the first use of the term, then we can get some insight into what it meant in context in the period. That, in turn, should give us a springboard to keep moving. Usually, I'll start this kind of search with Wikipedia. Specifically, I like to steal their citations, and then use those citations to find more citations. It's a nice method, and it helps me do a quick breadth first search, and then I can identify

Starting point is 00:05:22 what sources are worth investigating further. The article for 4GLs, however, is bizarrely vague for some reason. It's also lacking a lot of useful citations. The time frame for the origin of 4GLs is sometime in the early 80s, but that page only cites two sources from the 80s, which makes no sense. That's just kind of sloppy at best. One of those sources that's cited on the Wikipedia page, and is also cited in other articles I found about 4GLs, is a 1982 book titled Application Development

Starting point is 00:06:02 Without Programmers by James Martin. This, on its own, is a very interesting text. The core argument of the book is that the current mode of software production is unsustainable. There's simply too much software that needs to be written and not enough programmers. The obvious solution would be to have more programmers, but Martin flips that on its head. He argues that it's not actually practical to have enough programmers for all these software needs. It would just be too difficult to initiate enough humans into the software arts. Rather, it would be better to create a tool that lets the laity author software. Now, Martin isn't exactly saying that we should cut out programming

Starting point is 00:06:51 entirely. He also isn't saying that anyone should be able to produce software. His scope here is very restricted. Martin is only really concerned with business software. Even more specifically, data processing and analysis software. In other words, he wants better tools for accounting. The solution that Martin proposes is this thing called the fourth generation language. Notably, Martin is a little bit odd with his description of 4GLs. He gives us a list of languages that fit his programmer-free requirements. He describes that list and then starts talking generations. To quote,

Starting point is 00:07:37 How should we describe the set of languages in figure 2.1? That's Martin's big list of languages he considers 4GLs. Continuing. They are sometimes called database languages, but some do not use a database. They are sometimes called non-procedural languages, but many contain procedural code. They are sometimes called fourth-generation languages.

Starting point is 00:08:02 The first generation was machine language. The second generation was languages of the level of assembler language. The third generation was machine-independent languages of the level of COBOL, PL/I, and BASIC, end quote. So there's the rundown of generations. But even still, Martin is a little slippery here. Here's what follows that passage. Quote, Fourth Generation Languages is a useful term.

Starting point is 00:08:31 However, it is sometimes misused. Any new language is likely to be called fourth generation by its promoters, even if it does not have the characteristics we have described. Fourth Generation language seems to be the best term we have if used with caution. A language should not be called fourth generation unless its users obtain results in one-tenth of the time with COBOL, or less, end quote. Martin is talking as if 4GL is an existing term. It's sometimes misused, which means it must already be in use.

Starting point is 00:09:05 Also, note that one-tenth of the time of COBOL thing: he's describing 4GLs in terms of productivity gains, not necessarily in terms of their design. That's interesting, and it's kind of a pattern we see. There's not exactly a specification list for all of the features a 4GL needs. There's just kind of this vibe that a 4GL has to be easy to use, has to be more productive to use than existing languages, and should be pretty good at data processing.

Starting point is 00:09:37 Now, going back to my first point, this book points to an earlier use of the term 4GL. So can we find that? Well, kind of. I was able to dig up a 1973 Computerworld article about RPG II of all things. If you don't remember, RPG is a language developed at IBM to replace punch card tabulators for report generation. This 1973 article calls RPG II a, quote, true fourth generation language.

Starting point is 00:10:13 It goes on to list the three earlier generations following the same formula as Martin. This is a bit of a clue, right? It's fair to say that Martin is working from an established tradition. It's also possible that the term is somewhat new in 73, or at least a little obscure. It would be weird to rehash a common concept so explicitly. The other angle here is the term 3GL, third generation language. When exactly does that show up in the literature?

Starting point is 00:10:48 From what I've seen, 3GL seems to have been part of the lexicon for quite a while. I can't pin down an exact origin point for this one either. Here's where I actually get to call up a neat source. Back in the day, jobs used to be advertised in newspapers. If you go trawling through scans of newspapers, you can find job postings for all kinds of super specific things. I've used this in the past to get approximate dates when companies start operating, or to figure out where offices were actually running out of.

Starting point is 00:11:23 Here, we can use it to figure out when 3GL enters common use. If a company is hiring using the term 3GL in an advertisement, then we can assume it's commonly understood in the community. The earliest references I can find are from ads in 1966. As a neat aside, these ads are actually for junior programmers who will be trained to use 3GLs on the job. These early ones are also primarily for COBOL jobs. So this kind of generational language is in use as early as 1966.

Starting point is 00:12:03 But, and here's the biggie, it's not in academic use. There are basically no academic works before 1984 that use the term 3GL or 4GL, or 1GL or 2GL for that matter. The only one I can dig up is actually talking about the third iteration of a project as a third generation language, which, I mean, yeah, that's a good use of the term, but it's a different thing. That backs up a complaint that I've seen online in quite a few places, that 4GL was a buzzword.

Starting point is 00:12:40 But it's a little more complicated than that. 3GL is initially used infrequently and casually. It's used in more business settings, but not academically. 4GL also shows up in casual business settings as early as 1973, but it's not until 81 and really 82 that it picks up steam. This, well, that leads to a bit of an issue. 4GL doesn't really have any temporal or evolutionary meaning. It kind of implies that it's the next step ahead of existing languages.

Starting point is 00:13:21 But it's not. 4GLs are a very specific style of programming language, not a specific era or period or even feature set. Case in point: RPG II. That's created in 1969 and is, in its period, called a fourth generation language. C is codified a few years later in the early 70s, and it's a 3GL. This is just a long-winded way to say, everything's a bit dumb here. Having these generations numbered, implying that they're sequential, that's very buzzwordy.

Starting point is 00:14:02 That's not very true to the real world. Bottom line is, we're dealing with a buzzword that does have some meaning on its own, but it's a little too fuzzy for my taste. That said, 4GLs do form a distinct category of programming languages. What are those languages like? We've talked about some of them on the show before. RPG is one, so is SQL, actually. To get a larger sense, let's pick out one of these earlier so-called 4GLs.

Starting point is 00:14:37 The language I want to focus on is called Nomad. Martin dedicates a whole chapter to this language, so I figure that's a good place to get an idea of what a 4GL is supposed to look like. What I want to establish here is that 4GLs don't exactly appear in the 1980s. Rather, they always existed. Or, you know, always within reason, right? The term appears as a way to group together languages with similar design goals. Nomad is developed in the early 70s, and it serves as a pretty good illustrative example. A big reason I like Nomad as an example is that the language was developed to replace a still

Starting point is 00:15:24 earlier language called RAMIS. That's R-A-M-I-S. Nomad isn't so much an improved version of this earlier language. Rather, it was developed to replace RAMIS. So we have an entire ecosystem of fourth-generation languages that exist, live, and die before the term 4GL really becomes popular. The tale of Nomad and RAMIS starts in the 1960s with the founding of National CSS. This was a time-sharing company. They sold compute hours on their mainframes to anyone with the scratch. But you don't just buy an account on a computer for the sake of having that account. The whole point was that clients would buy time on NCSS mainframes so they could run specific programs.

Starting point is 00:16:14 Early on, that was Fortran and COBOL compilers. Over time, NCSS would bring on more programs. Each new program gave users more tools and also made it possible to court new kinds of users. In 1970, this program called RAMIS made its way onto NCSS's big iron. RAMIS was a pretty unique offering in the lineup here. It's a database, report generator, and a command language to tie everything together. In other words, it's a classic 4GL. From the scant information I can find, RAMIS actually sounds kind of similar to COBOL.

Starting point is 00:16:55 Users would create files that stored raw data. They'd describe the structure of those files and then write simplified code to access and manipulate that data. The end product here is reports. The key part here is that RAMIS was meant to be used by non-programmers. As a 4GL, it's a tool for those outside of our circle. There's an oral history of the folk that would later develop Nomad that I've been working off of. Within that text, there's actually a neat argument over whether RAMIS was even a language at all. I'm falling on the side that it was a language, mainly because of how expansive the term

Starting point is 00:17:36 4GL ends up being. Anyway, how does this lead to Nomad? Why, corporate disagreement, of course. NCSS was licensing RAMIS directly from a company called Mathematica. No, not that Mathematica. Apparently, it's just some other company named Mathematica. Wolfram not involved.

Starting point is 00:17:58 So, this Mathematica company didn't really want to license to a time-sharing vendor. They wanted to sell software more directly. So, according to that Nomad oral history, it became increasingly difficult for NCSS to negotiate with Mathematica. As a result, NCSS decided they'd make their own 4GL offering. This would let them ditch RAMIS and Mathematica, while also going into business for themselves. It's a classic win-win. This led to a team within NCSS starting the designs for Nomad in 1973. This was not intended to be a recreation of RAMIS.

Starting point is 00:18:40 That part is crucial. Rather, NCSS wanted their own competing product. That meant that the team had the opportunity to create something better than RAMIS. They could address issues that users had with the older software, while also addressing issues they had with its vendor. I want to point out a few interesting aspects of Nomad's development. The first is that the initial team actually took up separate office space from NCSS's main office.

Starting point is 00:19:13 This was done so they could work uninterrupted. I bring this up not so much for the larger story, but because, well, that's not how we do things anymore. It's neat to see these kinds of stories where programmers just operate differently. There were three initial developers on this project, and one of them actually went off and found physical office space and rented it, just for the project. Once settled, the trio started to define a spec for Nomad. Bales, one of the developers, makes it clear the spec was very, well, ambitious would be a word. As he says multiple times

Starting point is 00:19:52 during the oral history interview, quote, the team threw everything and the kitchen sink into the spec. I never imagined we could actually deliver all of it. End quote. Feinleib, another of the developers, described the process this way. Quote, we sat around the table and we wrote stuff on the board, and we basically went through doing a lot of use cases.

Starting point is 00:20:15 If you want to think of it that way, saying, okay, here's what we're modeling, here's a report we'd like to produce, here's a data structure we'd like to model. And I remember us having a picture of an organization chart of a company, and we would constantly go back to that organization chart to design a report to say sales or something by month or whatever. And it was a fun time.

Starting point is 00:20:36 We had a great group, end quote. The use case thing here is particularly neat. Nomad is designed more like a product than a language. Whenever a new language is designed, you have to lay down a set of goals. Take Fortran as the most classic example possible. Fortran's design goals were speed and accessibility. Its designers wanted a language that an engineer or scientist could use, so it had to be relatively easy to understand.

Starting point is 00:21:07 They didn't want a whole lot of details about the computer creeping in. At the same time, they wanted it to produce programs that executed quickly. Those goals shaped how the language was designed and how the compiler was implemented. Note how broad those goals are. Fortran was developed as a very general purpose language. It's meant to be useful for anyone who wants to do math on a computer. That's a very wide field. Nomad, on the other hand, had much more specific design goals. You want to have sales reports by month or department. You want it to model an org chart for a business. The NCSS team was working from these specific use cases, and extrapolating outward.

Starting point is 00:21:52 This leads to much more specialization. This also kind of reminds me of RPG, another early 4th-gen language. RPG was designed specifically to allow old IBM tabulator nerds to create similar reports using a computer. That's a design that arose from a very specific use case and led to a very specialized program. Take this as another mark of similarity between 4GLs. It would take a year and a half to go from initial idea to a working product.

Starting point is 00:22:27 So what did we end up with? Oddly enough, Nomad is actually very, very similar to the later SQL in some regards. In other ways, well, let's just say it kind of shows its age. Let me go over some highlights. First of all, we have Nomad's database. It supports both relational and hierarchical data structures. In the early 70s, we're at this transitional point. Some of the earliest databases were hierarchical. Under that model, you have records that belong to some set of data, and those

Starting point is 00:23:03 records themselves may own deeper nested sets of records. It's simple but not super flexible. In the 70s, the first proposed designs for relational databases emerge. These designs are more flexible, but they're more complex to implement. SQL databases are relational, for example. I'm actually pretty sure you can just implement a hierarchical data structure using a relational database, but whatever. Folk like to have the older model around, I guess. To kick things off, a user would describe a schema for their database.
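
The aside about building a hierarchy on top of a relational database can be sketched in a few lines. This is only an illustration, assuming a modern SQLite through Python's standard sqlite3 module; it has nothing to do with how Nomad stored data. The unit table, its columns, the sample org chart, and the descendants helper are all made up for the example. Each record points at its owner through parent_id, and a recursive query walks down the nesting.

```python
import sqlite3

# Hypothetical org-chart data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unit ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "parent_id INTEGER REFERENCES unit(id))"
)
conn.executemany(
    "INSERT INTO unit (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Company", None),      # root record, owned by nothing
        (2, "Sales", 1),           # owned by Company
        (3, "Engineering", 1),     # owned by Company
        (4, "East Region", 2),     # owned by Sales
    ],
)


def descendants(conn, root_id):
    """Return (name, depth) for root_id and every record nested under it."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM unit WHERE id = ?
            UNION ALL
            SELECT u.id, u.name, t.depth + 1
            FROM unit AS u JOIN tree AS t ON u.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
        """,
        (root_id,),
    )
    return rows.fetchall()


for name, depth in descendants(conn, 1):
    print("  " * depth + name)
```

The parent pointer is the whole trick: the hierarchical model's ownership becomes one extra column, at the cost of needing a recursive query to read the tree back out.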

Starting point is 00:23:38 Application Development Without Programmers gives us a neat little example schema. And I can say, with confidence, it's not that simple. It actually looks a lot like COBOL. That comparison is coming up again, right? In the schema file, you define each table, its fields, and its keys. Crucially here, this is done with special syntax. What makes me think of COBOL is the fact that you have to define what type of data each field stores and its size

Starting point is 00:24:11 using kind of arcane special encoding. For instance, here's one of the example items. Item department number as A4. That last part, the as A4, defines the item type. A is for alphanumeric. Four means the character length. There's another item later in the example that's defined as a dollar sign followed by a pattern of nines with separators.

Starting point is 00:24:41 That's a set-size currency value. You can also define masks and constants here. You can define numbers that have upper and lower limits, but it all follows a similar pattern. You have to know this arcane way of structuring a data type. COBOL data definitions work in a very similar way. Instead of using some kind of operators or fancy syntax, you write out more of these special patterns.
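
To make the A4 idea concrete, here is a minimal Python sketch that decodes only the form described above: the letter A for alphanumeric, followed by a character length. It is not Nomad's actual parser. The names AlphaField, parse_item, and fits are invented for this example, and every other pattern, including the currency one, is deliberately left unsupported rather than guessed at.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AlphaField:
    """An alphanumeric item with a fixed maximum character length."""
    name: str
    length: int


# Only the "A<n>" form from the episode is handled; this is an assumption-free
# subset, not a description of the full Nomad type syntax.
_ALPHA_SPEC = re.compile(r"A(\d+)")


def parse_item(name, spec):
    """Turn a spec such as 'A4' into an AlphaField, or raise ValueError."""
    match = _ALPHA_SPEC.fullmatch(spec.strip().upper())
    if match is None:
        raise ValueError(f"unsupported type spec: {spec!r}")
    return AlphaField(name=name, length=int(match.group(1)))


def fits(field, value):
    """Check that a value is no longer than the item's declared length."""
    return len(value) <= field.length


# Hypothetical item name, loosely following the "department number" example.
dept = parse_item("DEPARTMENT_NUMBER", "A4")
print(dept, fits(dept, "D001"), fits(dept, "D0001"))
```

The point of the sketch is how much meaning sits in two characters: the type and the size are both packed into the pattern, which is exactly what has to be memorized.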

Starting point is 00:25:08 To be fair, things in COBOL are a little more clear, but it's still this strange magic that you have to memorize. I find that curious because of the whole English-like syntax part of 4GLs. However, we are looking at software from before that definition was popular. I found no contemporary texts that actually call Nomad a 4GL. It's only later that it gets put in that category. So we can't really expect it to fit every aspect of 4GLs. With that said, though, how does Nomad describe itself? Well, here's how it's explained in a 1983 reference manual. Quote, Nomad is a generalized database management capability designed to support a wide variety

Starting point is 00:25:57 of applications in a time-sharing environment. Nomad is a single-level, command-oriented system that allows a user to perform the full range of database management and reporting activities in the manner best suited to the specific application. Although there are no special environments in Nomad for data manipulation, the system can be conveniently discussed relative to three major areas of database management: database organization, maintenance, and report generation, end quote. You will note again the slippery language used here. It's pretty long-winded and non-specific.

Starting point is 00:26:36 There isn't really a simple label to put on this kind of software. This is kind of the power of that 4GL buzzword. By the 1980s, we have a collection of these language-tool hybrids that are broadly similar. By throwing that name on them, well, you can save a lot of ink. In later years, Nomad is just described as a 4GL. That's three letters, and anyone who's working within the business space would understand that. It's also a lot more precise, or at least it sounds a lot more precise, than saying Nomad is a database capability that allows you to perform operations

Starting point is 00:27:17 in three major areas of database management. Okay, I got a little sidetracked. So let me reel back into specifics. Once a schema is defined, you can get interactive. Nomad isn't just a database, and it isn't just a language. It's also an interactive environment. But you do have to have a schema to actually interact with that environment. You can actually enter data from the interactive shell itself. It's all command line, but it's actually pretty user-friendly. One way to enter data was with a command called prompt. This prompts a user to fill in a database entry one field at a time.

Starting point is 00:27:59 It quite literally walks you through adding data. To me, as a seasoned computer person, that sounds awful. That's a fully manual process, and us programmers hate to be manual. But this isn't for us. This is for business users. Like I said last episode, business users don't really want computers. They want appliances that do business things. If you have a table schema, then Nomad can walk you through entering data one field at a time.

Starting point is 00:28:29 It's handholding, but in a context where it's pretty handy. As far as appliances go, Nomad had all the best bells and whistles. Once you get some data loaded up, manually or otherwise (there are other options), you can generate reports. Now, the syntax here is very similar to SQL. In fact, the oral history I've been using has a couple of jokes where the authors say that, oh, SQL must have been made by a Nomad user. The syntax is very English-like. It has very few special characters.

Starting point is 00:29:04 One example from Application Development Without Programmers reads as, quote, list by product across month sum sales amount, end quote. That's one of the neat features of 4GLs. You can read their code like an English sentence. I'm not going to ruminate on the syntax here because, well, it's broadly similar to SQL. Besides, all 4GLs kind of look

Starting point is 00:29:31 and feel similar in a way. That example line of Nomad generates a report. In fact, you can probably tell me what it reports on. That would give you a listing of monthly sales figures for all your products. It's a really simple report. If you want something more sophisticated, you can actually save larger queries as a program. You can also format text with Nomad.
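
For a sense of how much that one report line stands in for, here is a rough Python sketch of the same idea: one row per product, one column per month, each cell a total. This reads the quoted line as a sum of sales amounts, which is an interpretation of the description above rather than documented Nomad semantics. The function name list_by_across_sum, the field names, and the sample sales rows are all hypothetical.

```python
from collections import defaultdict


def list_by_across_sum(rows, by, across, amount):
    """Total `amount` for each (by, across) pair, as a nested dict.

    The outer keys play the role of report rows and the inner keys
    play the role of report columns.
    """
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[by]][row[across]] += row[amount]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(cells) for key, cells in table.items()}


# Made-up sample data, purely for illustration.
sales = [
    {"product": "widget", "month": "Jan", "sales_amount": 100.0},
    {"product": "widget", "month": "Jan", "sales_amount": 50.0},
    {"product": "widget", "month": "Feb", "sales_amount": 75.0},
    {"product": "gadget", "month": "Jan", "sales_amount": 20.0},
]

report = list_by_across_sum(sales, "product", "month", "sales_amount")
for product, by_month in report.items():
    print(product, by_month)
```

Even this small version needs a loop, a grouping structure, and an accumulation step spelled out by hand, which is the procedural detail the report line hides.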

Starting point is 00:29:58 This is used to generate fancier reports. The feature is, again, very reminiscent of another language, RPG. RPG has a text formatter built into it, so you can generate very specific-looking reports. Again, all these 4GLs have a very similar feel and feature set. One thing that I didn't really expect to see, and I think kind of distinguishes Nomad, is that it can generate graphical plots. If you replace list with plot, then boom, you get a graphical representation of your data. Normally, you don't see something like plotting software built into a programming language.

Starting point is 00:30:40 It's usually an extra library or some outside option, or you might have to go even grab special software. So the inclusion of plots as a native command feels strange, but it kind of makes sense. Nomad and other 4GLs are bundles of all this vaguely related software. In the case of Nomad, you have a database, a report generator, plotting software, and even tools for creating interactive forms. That's right. You can rig up your own custom prompts. This is all strung together with a programming language. That means we kind of have to flip how we look at 4GLs, or at least I have to flip how I think about 4GLs. For Nomad, the programming language is really just a necessary tool.

Starting point is 00:31:32 The language is only a way to deal with the database and report generator and plotter and interactive forms. As such, it's not central. That inversion makes it easy to get confused by 4GLs, or at least easy for us initiated programmers to get confused. 4GLs aren't really about the language itself. They're about everything around the language. You can also see this on a meta level.

Starting point is 00:32:01 Let's take the contemporary articles, the oral history, and the later papers about Nomad and its history. In those sources, there is not a big emphasis on language design. Sure, there are short passages about nice language features, but it's not really at the center of discussion. These sources all focus on the business and application side of things. If you go and find similar sources for, say, ALGOL or C, you find pages and pages just about scope, or about the use of semicolons. You won't see that for many 4GLs. Nomad, for what it's worth, was received very well.

Starting point is 00:32:46 By the end of the 70s, you can find it referenced and discussed all over the place. But crucially, not really in academic circles. Articles on Nomad show up in trade magazines and books. You can find industry reports on adoption and sales. I actually ran into a number of articles on new trends in programming that use Nomad as a key example. But I can hardly find anything in ACM or IEEE publications. This is very much business stuff for business people. Let's keep that last bit in mind as we zoom out.

Starting point is 00:33:22 Nomad, like similar languages, is designed as a tool for business. It's not a language first. It's a language second. That means that 4GLs fall into a different niche than traditional programming languages. Rear Admiral Grace Hopper is one of my personal heroes. You should probably know that by now if you've listened to any amount of my content. One of her observations that I always think about is that the number of bugs in a program scales

Starting point is 00:33:57 in proportion to the number of lines of code in that program. Hence, the best program is one that has less code. Or, the more productive programmer writes fewer lines. It's a very zen kind of observation. It also, I guess, means that the best program in the world is one without a single line of code, but let's not think about that too much. We can use this as a lens to look at the larger arc of programming language development.

Starting point is 00:34:27 As languages become more sophisticated, a single line of code can do more. A line of C can do more than a line of assembly. A line of Nomad, well, that can do much more than a line of C. In theory, that means a Nomad program should have fewer bugs than a C program, and so on and so forth. That's one way to view things, and I think it's a very natural way to view things. However, that doesn't mean it's completely correct. The overall picture is a lot more complicated. We know that there isn't some smooth progression in technology.

Starting point is 00:35:07 Allow me to bring in my next source. It's a paper called The Impact of Fourth Generation Programming Languages by Alan Tharp of North Carolina State University. This article was published in the ACM in 1984. I like this source for two reasons. First, we get an actual academic publication that mentions 4GLs, so I like that. It means it's removed from the business side of things a little bit, which I think gives us more of a thousand-yard view into the whole 4GL phenomenon.

Starting point is 00:35:41 Second, it uses a pile of other contemporary sources to draw conclusions. So it's almost a meta-analysis of all of this research into 4GLs. Tharp himself is a computer science professor, so this paper is trying to see how 4GL adoption may impact education. I think that's, again, important because it will give us a read on how future impacts of 4GLs were viewed in the period. And there's one other reason why I like this paper. Tharp comes out the gate swinging with this remark, quote, the term language in 4GLs is a misnomer, since many software systems placed in this classification are not yet well enough integrated to be considered a language. At their current stage of development, the term

Starting point is 00:36:35 programmer productivity aids may be more accurate. A 4GL is the software development system which goes beyond the traditional 3GL in providing a greater programmer productivity. End quote. Language is a misnomer. That speaks volumes, right? I also think I'm going to use that on some of my coworkers when they submit poor software. So is Tharp taking a cheap shot at 4GLs? Well, I think it's part of a larger trend.

Starting point is 00:37:10 4GLs come at programming from a very different angle than, if I have to use the word, 3GLs. As such, there is a lot of debate in this period and even later over whether 4GLs are full-on programming languages. I mean, the dudes that wrote Nomad were arguing about the applicability of that term in an oral history. Are 4GLs programming languages? Strictly speaking, yes. We can take Nomad as an example. To be Turing complete, a language needs conditional branching. That matters because if a language is Turing complete,

Starting point is 00:37:51 it can be used to describe anything a computer can do. It makes it a full-on language. Nomad has conditional branching. If expression, then go to label. That means it's Turing complete. A Nomad program can do anything a computer can ever hope to do. For other 4GLs, things are a little more dubious. RPG, for instance, is only Turing complete on a technicality. You can kind of do

Starting point is 00:38:19 conditional execution in RPG, but you have to use some tricks, and it's not exactly comfortable to do it. But just because the language is Turing complete doesn't actually mean you want to use it for everything. This is another place where we should think about the fact that the best possible program actually contains no code. If a programming language doesn't want to do something, if you have to do all kinds of tricks and contortions to make it do something super specific, you're going to end up using a lot of code and you're going to end up doing things that aren't optimal. That could lead to bugs or just problems. A 4GL like Nomad is super specialized. You aren't going to be writing a compiler

Starting point is 00:39:05 in Nomad, even though it's technically possible. Would you consider a programming language that can only do a few things a real language? That's a pretty philosophical question. I think the fact that we're even considering that question means that 4GLs, well, they fall into a bit of a weird, liminal space. That's part of why I like Tharp's paper so much. Sure, he uses the term 4GL, but then he points out that they're closer to productivity aids than programming languages.
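
To make the conditional-branching point a little more concrete, here is a minimal sketch in Python. It is not Nomad syntax. The instruction names (`set`, `label`, `if_goto`) and the `run` function are hypothetical, made up purely for illustration. The sketch only shows that a single "if expression, then go to label" instruction is enough to build a loop, which is the building block the Turing completeness argument leans on. It is not a proof of anything.

```python
def run(program, env):
    """Run a list of (op, *args) tuples in a toy, made-up instruction set."""
    # Map each label name to its position so jumps can find it.
    labels = {args[0]: i for i, (op, *args) in enumerate(program) if op == "label"}
    pc = 0  # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "set":
            # Assign the value of an expression to a variable.
            name, expr = args
            env[name] = expr(env)
        elif op == "if_goto":
            # Conditional branch: if the expression is true, jump to the label.
            cond, target = args
            if cond(env):
                pc = labels[target]
                continue
        # "label" lines do nothing when executed; they only mark a position.
        pc += 1
    return env


# Sum the integers 1 through 10 using nothing but assignment and a conditional jump.
program = [
    ("set", "total", lambda e: 0),
    ("set", "i", lambda e: 1),
    ("label", "loop"),
    ("set", "total", lambda e: e["total"] + e["i"]),
    ("set", "i", lambda e: e["i"] + 1),
    ("if_goto", lambda e: e["i"] <= 10, "loop"),
]

result = run(program, {})
print(result["total"])  # 55
```

There is no `while` or `for` in the toy language itself. The loop comes entirely from jumping backward when a condition holds. That also hints at the comfort problem: it works, but you would not want to write a compiler this way.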

Starting point is 00:39:39 And he backs that up. By 1984, a number of studies had been carried out on the effectiveness of 4GLs. These studies specifically looked at programmer productivity. One 1983 study showed that 4GLs made application programming 20 times faster. Now, I can't find the original study. I want to. It's got to be fascinating. But anyway, Tharp cites and quotes it extensively, so we have some record.

Starting point is 00:40:15 Now, a 20 times improvement in speed is a huge difference. These days, we kind of quibble over if AI can boost programmer productivity by tens of percent. 20 times, as Tharp points out, is 2,000 percent faster, 2,000 percent more productive. Compare that to maybe 20, 30 percent? That's wild. That is kind of beyond comprehension. A 1981 article in Datamation made a very similar claim. This article was discussing management information systems, MIS, more business software. Specifically, it was looking at how a new 4GL called Focus could change offices. To quote, by programming the system entirely in Focus, the MIS was implemented with one-fifth the people

Starting point is 00:41:12 needed to do the same job in COBOL, end quote. The headline writes itself, right? With Focus, one programmer can do the work of five. However, expectations must be tempered. This productivity gain was observed specifically in business applications. The report that Tharp quotes used 11 business programs as benchmarks. 4GLs worked really well in their specific domain, but not generally. All these wild benefits are for very, very specific types of data processing software. This is where we go full circle back to Application Development Without Programmers. Martin sets up his book by hitting the numbers.

Starting point is 00:42:02 He's trying to show that the demand for new software will far outpace the speed of software development. To quote, If we assume no increase in programmer productivity, the figures above indicate that in 10 years' time, the industry will need 93.1 times as many programmers as now. There are approximately 300,000 programmers in the United States today. That suggests about 28 million programmers in 10 years' time.

Starting point is 00:42:32 Before long, the entire American workforce would be needed to program its computers. Ridiculous! End quote. When I first read that passage, I couldn't help but laugh. It is ridiculous. I am well over 10 years after Martin made that projection. As of 2023, there are an estimated 1.6 million professional programmers in the U.S. That's way off from 28 million.
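
For anyone who wants to check Martin's arithmetic, here is a quick sketch in Python. The inputs are just the figures quoted above (300,000 programmers, a factor of 93.1 over 10 years, and the 1.6 million estimate for 2023). The variable names are my own, and the implied yearly growth rate is a derived number that Martin doesn't state in the quoted passage.

```python
# Figures as quoted in the episode; variable names are illustrative only.
programmers_then = 300_000      # Martin's count of U.S. programmers
growth_factor = 93.1            # his projected multiplier over 10 years
programmers_2023 = 1_600_000    # the episode's 2023 estimate

# Martin's projection: current headcount times the growth factor.
projected = programmers_then * growth_factor
print(f"Projected programmers: {projected:,.0f}")  # 27,930,000

# How far the projection overshoots the 2023 estimate.
overshoot = projected / programmers_2023
print(f"Overshoot: {overshoot:.1f}x")

# Yearly growth implied by a 93.1x increase over 10 years (derived, not quoted).
annual_growth = growth_factor ** (1 / 10)
print(f"Implied yearly growth factor: {annual_growth:.2f}")
```

So the multiplication itself checks out: 93.1 times 300,000 does round to about 28 million. The problem is the assumption going in, not the math.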

Starting point is 00:43:03 So what's going on? Is Martin's math just plain off? Why would he predict that we need so many new programmers and so much new software? For this to make sense, we need to cast ourselves back to the period. Before Fortran, before Ratfor, when programmers were programmers and wrote in hexadecimal.

Starting point is 00:43:24 Well, maybe not quite that far back. Martin is specifically concerned with data processing software, aka business software. The guy's into accounting software. What did the landscape look like in 1982? I hope you can kind of answer that question at this point. It was a mix.

Starting point is 00:43:46 We did have 4GLs, RAMIS, Nomad, RPG, SQL, and many more acronyms and initialisms. We also had COBOL. At the time, a large amount of business software, read the majority of business software, was written in COBOL. In fact, COBOL is usually the second part of any headline about 4GLs. It's not really any wonder why I keep name-dropping that language. Such-and-such 4GL is 10 times as cool as COBOL. Replace 100 COBOL programmers with a single Nomad, that sort of thing.

Starting point is 00:44:24 Reading through all this 4GL stuff, I found a very heavy focus on comparisons to COBOL specifically. Almost all comparisons are to that older language. Secondarily, there will be comparisons to Fortran or PL/I, but COBOL is really the big boogeyman here. That makes my senses tingle a bit. I guess I should explain why. So, strap in for a bit of a ramble. The unstated premise in these articles is that data processing software is developed in a very specific way. Some business person, ideally in a suit, says they need some kind of report or analysis.

Starting point is 00:45:08 They go to their in-house programming staff and they plead their case. Then the programming staff writes a COBOL application once they have the time of day to do it. That program is then delivered to the business person and the analysis or report or whatever can be done. Martin uses this assumed workflow when he calculates this quote-unquote application gap. There are all these programs that business folk want. That's way too much for the current programming workforce. What are these applications? Well, it's a mix of ad hoc analysis and recurring analysis.

Starting point is 00:45:47 On the one hand, you have suits wanting to run what-if scenarios or look for patterns or run specific one-off reports. On the other hand, you have something like payroll or taxes that you have to do on a continuing recurring basis. The part that makes my senses tingle here is COBOL. This all assumes that all of these jobs had to be done from scratch and in COBOL. It's true that COBOL was the language of business, and it can do database-like work. But were there just no better tools?

Starting point is 00:46:26 The answer to that depends on the time period. Martin's book goes to print in 1982. The term 4GL was in use earlier, but it seems to really pick up steam in 81 before it becomes truly popular in 82. It would have taken a while for Martin to make all his observations anyhow. So I'm going to peg things to 1980. That's a nice round number after all. So what kind of business tools were out there by 1980 that could support ad hoc and recurring analysis?

Starting point is 00:46:59 Well, I mean, there's Nomad, right? There were extant 4GLs kicking around in the shadows. So that was an option. But they must have just not had that much traction. There were also accounting programs. This is actually a much, much larger topic. I'm not going to say it. I swear. I'll think about it another time, maybe.

Starting point is 00:47:25 There were a number of accounting packages that had replaced earlier tabulator-based systems. By 1980, there were primitive spreadsheet programs running on mainframes. There were tabulation programs. IBM would even hook you up if you were a client. Then in 1979, we get a sea change. That year, VisiCalc is released for the Apple II. It's the first graphical spreadsheet

Starting point is 00:47:50 program. I mean graphical as in you see a grid on the screen, just to be 100% clear. VisiCalc is a program that lets non-programmers do ad hoc and recurring analysis of data. It's not as capable as a 4GL. It can't work with as much data. It can't be as flexible, but it is easy to use. One huge promise of 4GLs was that non-programmers could use them. They don't just make programmers more productive, they're also supposed to make it so you need fewer programmers. In a perfect world, a suit would just have to think,

Starting point is 00:48:30 oh, gee, I wonder where our sales figures are for last month. Then they'd have to go tell their programmer, which is, of course, themselves, that they should write a report for that. Then they just have to open up a terminal, type in that query, and get the result. No dedicated programmer is needed. VisiCalc offers the same proposition,

Starting point is 00:48:53 or at least a very similar one. A spreadsheet isn't exactly like a database. It's much less powerful and much less flexible. However, it does let you do quick data manipulation and reporting. If set up correctly, you can do recurring tasks as well. VisiCalc also manages to completely remove the programmer from the loop. It's end-user software, plain and simple. So then it's really just the power aspect that's the main issue here.

Starting point is 00:49:24 A spreadsheet like VisiCalc is just less powerful than a full-blown 4GL. This isn't helped by the fact that VisiCalc only runs on wimpy little microcomputers. But I'm reminded of a much earlier advancement here. It's, of course, my beloved LGP-30. Can't go an episode without mentioning that computer, right? When Stan Frankel designs the LGP-30, he builds a very simple and underpowered machine. His logic was that not everyone needs a big, powerful computer. Most applications actually just require a little bit of computer.

Starting point is 00:50:02 VisiCalc does just that. For businesses that adopt microcomputers, VisiCalc can give you just a few benefits of a 4GL. And maybe that's all you need. What I'm getting at is that by 1980, there's kind of a predator waiting in the wings. Martin even gives VisiCalc a shout-out in his 1984 book, An Information Systems Manifesto.

Starting point is 00:50:27 In that book, Martin explains how some end users, given the correct tools, can cobble together what they need for complex data processing. This will happen on its own without any programmers. Those tools include spreadsheet programs like VisiCalc. In one corner, we have these weak applications. VisiCalc is much weaker than a 4GL, but it's very easy to use. In the other corner, we have 4GLs that can do basically anything

Starting point is 00:50:57 and process huge amounts of data, but require that users learn a programming system. I hope you can see where we're going here. Business users want an appliance. We saw this all the way back in the punch card era. Once IBM started selling computers, they had some resistance moving their old tabulator customers over to digital hardware.

Starting point is 00:51:21 As a result, IBM developed a number of tools, including RPG, which let old legacy customers use a computer as if it were a tabulator. These businesses don't want a computer. They want a box that does accounts payable, and maybe can occasionally spit out some more fancy reports. As we reach the early 80s, tools like VisiCalc become generally available. That's when we really get the microcomputer boom. The IBM PC hits desks in 1981, the Apple Macintosh does the same in 1984.

Starting point is 00:51:56 In fact, there's even something funny right between them. The Apple Lisa is released in 83. That machine ships with a suite of graphical office software, which includes a spreadsheet program. Microcomputers and office productivity software don't offer the same power as 4GL systems. However, they fit into the same niche. I mean, just look at that Tharp paper. 4GLs are a form of productivity software. My theory, which would take a lot of work to back up concretely, but I think feels correct,

Starting point is 00:52:34 is that the niche 4GLs targeted got eaten up from the bottom. Cheap microcomputers and boring office software ate up the low end of the market. When it comes to most business applications, you don't need custom software. A spreadsheet will do it just fine. So maybe the ultimate downfall of the fourth-generation language wasn't so much its complexity or even the fact that it was only a buzzword, but more that better and more user-friendly tools appeared. All right, that does it for today's episode. Fourth-generation languages aren't so much created as labeled. The term seems to have existed as far back as the early 70s, but it

Starting point is 00:53:22 doesn't really get popular until the 1980s. It's initially used to describe languages that already exist. I had really assumed that there was some language that came out and claimed to be the first 4GL, but no, it's the other way around. 4GLs appear on the scene slowly over time as these weird offbeat language systems. Then this term is created to label them. If we're just looking at the term, then yeah, 4GL has its place. It does manage to explain and group together all these languages that are doing very similar things. But if we look at 4GLs as their own languages, well,

Starting point is 00:54:02 that's a more interesting story. They straddle this line between language and productivity software. Some aren't even true languages, while others are very powerful. 4GLs are also very, very special purpose, almost to the point of being single purpose. This combines to make a pretty unique class of languages that's not really meant for your average run-of-the-mill programmer. Thanks for listening to Advent of Computing. You know the drill. I'll be back in two weeks.

Starting point is 00:54:34 You can find everything at adventofcomputing.com. And as always, have a great rest of your day.
