Advent of Computing - Episode 60 - COBOL Never Dies

Episode Date: July 11, 2021

COBOL! Just its name can strike terror in the hearts of programmers. This language is old, it follows its own strange syntax, and somehow still runs the world of finance and government. But is COBOL really as bad as it's made out to be? Today we are taking a look at the language's origins and how it's become isolated from nearly every other programming language in common use. Perhaps most importantly for me, we will see if Grace Hopper should really be blamed for unleashing this beast onto mainframes. Selected Sources: https://archive.org/details/historyofprogram0000hist - History of Programming Languages, contains Sammet's account of CODASYL https://archive.org/details/bitsavers_codasylCOB_6843924/ - COBOL 60 Manual https://sci-hub.do/10.1016/0066-4138%2860%2990042-2 - FLOW-MATIC/MATH-MATIC usage paper

Transcript
Starting point is 00:00:00 It's something of a fun pastime for programmers to dunk on old programming languages. I think some of that is certainly justified. Technology is always moving forward, and some languages are just left in the dust. You aren't really going to be writing a web server in BASIC or a fancy graphics-intensive game in FORTRAN. But that's not to say that these old languages don't have their own place, even if that place is just on a podcast about computer history. Fortran is an example of a downright archaic language that still has a very well-defined niche. The language is used for a lot
Starting point is 00:00:39 of supercomputers. It was built for doing math quickly and efficiently, and Fortran has managed to keep those core principles intact for decades. I've even used Fortran myself, and I can personally attest that there are just some jobs it does better than any other language out there. Then we have COBOL. It's probably up there as one of the easiest languages to make fun of. It's become something of a joke amongst programmers. COBOL is really old. It was first developed in the very late 1950s and early 1960s.
Starting point is 00:01:14 It just looks plain different from every other programming language. According to some, programming COBOL can even be harmful to your mental well-being. But despite all that, the language is vital to keeping the modern world running. I've seen estimates that there are upwards of 200 billion lines of COBOL code in production around the world. Now, that's billion with a B. Around half of all banks are purported to run on some amount of COBOL. And if you live in the United States, then I have a great example of how COBOL has affected you directly.
Starting point is 00:01:54 During the recent round of pandemic-era direct stimulus payments, the IRS ran into mainframe issues, which caused a bit of a delay with distribution. You see, doing this kind of direct stimulus payout was outside the normal operating procedure, so some code at the IRS had to be updated to deal with, you know, taxes, I guess. Well, that code just happened to be written in COBOL. And it turned out that most of the agency's staff that knew how to write the venerable language, well, they had retired. Finding new COBOL programmers isn't exactly an easy task, so there were some delays in getting the first round of checks out. This is just the surface level. It turns out that the dichotomy of COBOL, this bizarre but
Starting point is 00:02:46 crucially important language that almost no one knows, runs much deeper. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 60, COBOL Never Dies. Although the alternate title that I've been toying with is Don't Blame Grace Hopper for COBOL. Before I get into the actual show, I need to throw in a quick programming announcement. We're fast approaching the first Advent of Computing Q&A episode. If you haven't been keeping up lately, or if this is your first time experiencing the podcast, then let me tell you that we recently passed 50,000 all-time downloads. That's a really exciting number for me.
Starting point is 00:03:36 So to celebrate, I'm doing a bonus question and answer episode that should be airing in about a week. Maybe more like two weeks. Still working on the exact production schedule. It's going to be on an off week, so don't worry. If you don't like that sort of thing, then it won't take away from my usual content. In preparation, I'm collecting questions from my audience up until July 16th. That's going to be my cutoff date. So there's still time. If you've ever wanted to know something about the show, myself, or a topic I've covered, then send in your questions. I have a pinned tweet on my
Starting point is 00:04:10 Twitter account over at Advent of Comp, or you can just shoot me an email to adventofcomputing at gmail.com. Now, with that out of the way, let's get into today's actual episode. I know I just finished talking about Q&A, but I'm going to start this off with a bit of an early answer. I often get asked if I have any heroes in the field of computer science, and I do. It's Dr. Rear Admiral Grace Murray Hopper. There are a whole lot of people I look up to in the history of computing. There really have been decades upon decades of amazing scientists in the field. But Hopper has always been in a league of her own for me. Part of it comes down to her direct contributions to computer science.
Starting point is 00:04:59 She invented the compiler, a technology that's pretty much the foundation of modern programming. Part of it is her work to educate others and make computers approachable and accessible. But for me, what really pushes Hopper to the top of the list is how relatable she is. There are piles of crucial characters in the history of computing that seem pretty two-dimensional in the historical record. But Hopper's the exact opposite. You can get a feel for her personality even just in technical writings. It makes it easy to understand where she's coming from, which, for me, makes the history that she created all the more understandable. Hopper also had really modern views about computer programming. I think I've
Starting point is 00:05:46 brought this up on the show before, but in my opinion, she was really one of the first modern computer programmers. I mean, she was also one of the first computer programmers in general. So, as a programmer myself, Hopper is almost a patron saint. When she first started working with computers, there wasn't really this identity of a computer programmer. Programming wasn't differentiated from operating, maintaining, or even building a computer. Early on, she started to push back against that. She very literally carved out a spot for a new discipline. Now, that's not even getting into the really stubborn determination that led to Hopper's naval career. Well, naval careers. She didn't become a rear admiral
Starting point is 00:06:33 overnight, but I'm going to stop before I get too much further off topic. I'd like to keep this squeaky clean image of Grace Hopper in my head, but there's always been a bit of a blemish. Commonly, Hopper is cited as the creator of COBOL. Now, I've tried learning COBOL before, for, well, purely selfish, non-academic reasons. The language is still used in a lot of legacy projects. Common knowledge is that it's a big thing in banks. It's a little bit of a trope, but amongst programmers, there's always rumors of a friend of a friend that has a really well-paying gig maintaining old COBOL code. These rumored friends always demand really high wages, often laughably huge amounts of money, because COBOL runs a lot of mission-critical systems, and no one really knows COBOL anymore. So, if programmers are in such demand, then why don't more people learn the language? Well, I'll tell you from personal experience that COBOL is an awful programming language to learn.
Starting point is 00:07:40 I've tried to learn it multiple times, and I always give up pretty quickly. There's something unnatural about it, at least for me as a trained programmer. The language just plain looks and acts differently than I expect. Because of that, I've kind of been putting off talking about COBOL. I just wanted to think of it as this fluke in Hopper's portfolio and leave it at that. I finally decided to look into the matter, you know, like a good, brave-hearted academic should, and I'm really glad I did. In reality, Grace Hopper had a lot less to do with the bizarre language than a lot of people think. At least, Hopper didn't directly create COBOL herself. As with all aspects
Starting point is 00:08:27 of computer history, the story gets really complicated and there's a lot of interconnecting pieces. This episode will be untangling this mess a little bit, examining COBOL and finding out why Grace Hopper is so often associated with the language. Along the way, we'll see why COBOL is such a frustrating language, the dangers of design by committee, and try to figure out why COBOL's remained such an important piece of technology. To kick things off, we need to talk some more about Hopper. Like I said, she didn't create COBOL per se, but her work greatly influenced the language. And also, as it turns out, the roots of COBOL lie really close to the roots of programming languages in general.
Starting point is 00:09:12 It all goes back to automatic programming and the first compilers. Now, automatic programming, or automatic coding, is an antiquated term used to describe programming in really anything but assembly language or machine code. Automatic programming languages were, technically, high-level languages. That's really the closest descriptor we have nowadays, at least. The complication is that early on, we're dealing with so-called automatic programming systems, as in software that turns some type of input into machine code, and a computer can then execute that machine code. We'd probably call them compilers today, but there's some differences. This is a bit meandering, but my point here is that the language we use to describe programming has changed a lot, so
Starting point is 00:10:05 if I slip into calling something an automatic system, that's close enough to a compiler. Whatever you choose to call it, automatic programming gets its start at the beginning of the 1950s. This is early into the digital age. The first electronic digital computers sprang to life in 1945. The first stored program computers, basically the first machines that a modern programmer could recognize, became operational in 1948. Initially, these machines had to be programmed in machine code. An operator had to talk to the computer on its own terms. No one really liked doing that, so almost immediately researchers started looking for a solution. Some programmers put handy-dandy mnemonics in place of raw instruction numbers, thus creating early, simple assembly
Starting point is 00:11:00 languages. But you're still talking to a computer very much in an alien tongue. Then came 1951. That's the year that Hopper, at the time an employee of the Eckert-Mauchly Computer Corporation, became the latest programmer to take up this challenge. The result was the A0 system, the first rudimentary compiler. Grace Hopper's drive to pick up this project was simple. At the time, programming kind of sucked. During this period, she was working on UNIVAC-1, one of the very first commercial computers. Despite being the very cutting edge in digital technology, UNIVAC lacked many features that we take for granted today. Text editors didn't really exist, so machine code was usually written out by hand.
Starting point is 00:11:54 Univac instructions were a mix of numbers and letters. Each instruction was prefixed with the location to load it into memory, so you had to be careful about advancing the memory location correctly so things stayed in the proper order. By the time a program was ready to test, it would be multiple pages of these alphanumeric notes. To actually run your new program, it had to first be copied down onto punch cards, then transferred to magnetic tape, and finally loaded into Univac's banks of mercury memory. It's a mechanical and repetitive task, but Hopper found that it was really prone to human error. Quote, You had to copy these darn things, and I sure found out fast that programmers cannot copy things correctly. On Univac 1, we used a delta for a space, and if you wrote a careless delta, it could look like a 4. We also put a decimal in Univac 1.
Starting point is 00:12:50 Any number of people used Bs that turned into 13s. Programmers couldn't copy things correctly. On each of these routines, they started with 0, which when you put them into a new program, you had to add every one of the addresses to position it in the new program. Programmers couldn't add. End quote. Imagine trying to spot an error in hundreds or even thousands of lines of machine code, only to realize the problem came down to poor handwriting.
Starting point is 00:13:22 That's not just frustrating, that destroys any possible progress. It's not very hard to extrapolate this out into the future. How could programmers ever hope to collaborate on huge projects if handwriting and addition errors were core problems of the discipline? The leap that Hopper took was, in retrospect, obvious. I know I've covered this part of the story before all the way back in my episode on Fortran, but it's worth coming back to. One of her colleagues, Betty Holberton, had recently written an amazingly elegant data sorting program. Given a list of data and some parameters about that data, Holberton's program would create a customized sort program just for your dataset.
Starting point is 00:14:10 The idea of a program that output another executable program was totally new at this point. This allowed Holberton to create a highly optimized way to sort data. Hopper would take this concept, called generative programming, and run with it. The leap was to make a program that took so-called pseudocode as input and then output executable machine code. This new pseudocode would be a lot closer to English, so it would be easier to spot errors in. A single line could translate into many instructions, so you end up writing less code overall. That would also reduce the chance for errors.
Starting point is 00:14:51 Hopper called this initial compiler the A0 system, a new type of programming language with a focus on arithmetic. Now, like I said, this may sound repetitive to some long-time listeners, but I'm retreading this part for a good reason. A0 shares an important distinction with Hopper's upcoming works. This initial system was focused on reducing programmer error, thus making programming easier on the human side of things. A little detail that underlines this fact is that the A0 system used what's known as a one-pass compiler. That functions just like it sounds. You give it some code and it produces machine code all in one go. This is simple to use since a programmer just needs
Starting point is 00:15:38 to run the compiler once. There's only one step to mess up. However, a single-pass compiler isn't always the most flexible. There can be issues with performance. That's the big one, really. A one-pass compiler will run quickly itself, but the output program isn't very optimized. We could just explain that as A0 being a really primitive system, but I think Hopper would have stuck with this approach even if better technology was available. For her, A0 was a tool to make programming faster and, almost as importantly, easier for humans. If the final product worked and the process to get there was simple, then that was problem solved. Performance, at the end of the day, didn't really matter to Hopper. So what did A0 look like? Well,
Starting point is 00:16:38 that's hard to say. There isn't much preserved information about the first compiler. It was successful enough that Hopper and a few of her colleagues were able to get funding to pursue the idea of a compiler further. This would lead to A1, A2, and eventually A3, aka ARITH-MATIC. A2 and A3 are the most well-documented systems, so looking at these languages in the family gives us a pretty clear idea of what earlier A-series systems must have been like. I'm sad to inform you that we aren't dealing with anything close to a modern programming language. Each line of code in an A-series language was 12 characters long. The actual instruction took up the first few characters, with arguments taking up the rest of that space. There were variables... kinda. Variables were referenced by number instead of defined as a name. As an example, if you wanted to add 2 and 2 and then store it in
Starting point is 00:17:33 some variable 1, you could write AAO002002001. Now, keen observers among you may note that we're still very far off from anything recognizable or, dare I say, user-friendly. A-series languages look a lot closer to assembly language than anything high-level. The idea was that a programmer could write down their program as a flowchart or a series of notes. Then those notes could be translated by hand into the A system. Then Univac would actually do that final step of turning the A system pseudocode into machine code. There's still a couple of steps, and some places for errors are left in, but in general, the A system was reducing how much a programmer had to do. I don't want to get bogged down in the implementation details because
Starting point is 00:18:30 I could talk about that all day and I have before. Compiler design is fascinating and wildly complex, but I do want to draw your attention to one thing. Code written for A-series compilers was really easy for a computer to parse. A more advanced programming language like C or Java is written as pretty complicated text. Lines can be of varying length. Variables have names. Rules have to be enforced so naming conventions play well with each other. There are multiple types of symbols. Some symbols have matching pairs, some are unary. Processing that kind of structured text takes a complicated program, and actually a surprising amount of computational resources.
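To make that contrast concrete, decoding a fixed-format, A-series-style instruction takes nothing more than string slicing. Here's a minimal sketch; the exact field layout (a 3-character operation code followed by three 3-digit operands) is my own guess based on the 12-character example quoted above, not a documented format:

```python
def parse_instruction(line: str) -> tuple[str, list[str]]:
    """Decode one fixed-format, 12-character instruction by pure slicing.

    No tokenizer, no symbol table, no grammar -- every field sits at a
    known offset, which is part of what made these early compilers
    tractable on 1950s hardware.
    """
    assert len(line) == 12, "every line of code is exactly 12 characters"
    op = line[:3]                                        # operation code
    operands = [line[i:i + 3] for i in range(3, 12, 3)]  # three fixed-width fields
    return op, operands

# The episode's example: add variable 002 to variable 002, store in 001.
op, args = parse_instruction("AAO002002001")
print(op, args)
```

Compare that to what a C or Java compiler has to do just to find where one token ends and the next begins.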
Starting point is 00:19:19 By contrast, A-series code is simple. It's almost laughably simple. Each line is the same length. There are no names to keep track of. Everything is alphanumeric. Hopper didn't have to solve some complicated string comprehension problem to get an early compiler running. The downside is that while A2 and ARITH-MATIC may have been easier than programming in just raw hexadecimal and binary,
Starting point is 00:19:48 that only really helped existing programmers. A2 still looked like indecipherable gunk to anyone else. Doesn't matter if the file of numbers is half the size, that's still incomprehensible. This brings us to the general vicinity of 1955 and Hopper's next big leap into the unknown. She was starting to form an interesting opinion about compiler use. As Hopper recalled in a 1980 interview, quote, I decided there were two kinds of people in the world who were trying to use these things. One was people who liked using symbols, mathematicians and people like that. There was another bunch of people who were in data processing who hated symbols and wanted words,
Starting point is 00:20:37 word-oriented people very definitely. And that was the reason that I thought we needed two languages. End quote. Hopper was carrying enough weight inside EMCC that she could take on more ambitious projects. In 1955, two new programming language teams would form under her. One was to create a math-oriented language called, if you believe it, Mathmatic. The other was a business and data manipulation oriented language initially called B0, but it's better known as Flomatic. Mathmatic is essentially a more fully realized A-series compiler. You could even write inline A3 statements
Starting point is 00:21:21 in Mathmatic itself. It has a full algebra system, meaning that for the first time, someone programming a UNIVAC could just type out an equation as a line of code. Variables were named and assigned values using an equal sign. We're starting to see something we could recognize as a programming language. This phase of Hopper's new projects was also happening contemporaneously with Fortran. It's unclear how much crossover there was between these two languages, but both have very similar math systems. The least we can say is that by 1955, compiler design was just becoming complex enough to handle math in some, you know, reasonable way. While an interesting footnote,
Starting point is 00:22:07 Mathmatic is something of a dead end. It doesn't have any languages that inherit from it. Its sibling, Flomatic, would prove to be more influential. And all things being equal, Flomatic is obviously the better sounding language. Just imagine putting that on your resume: expert at Flomatic. Anyway, Flomatic was distinct in that it was designed for people who would traditionally not be programming a computer. Manuals and articles on Flomatic explain that it was meant for basically anyone in a business.
Starting point is 00:22:47 So was Mathmatic, but with Flomatic it's emphasized even more. One charmingly period manual boasts that Flomatic can help, quote, break the communications barrier between programming and management groups. What we're starting to see is that Hopper had a very different view of digital egalitarianism. For a competing and more widely held view, look no further than the contemporary Fortran. One of the language's major design goals was to find a way for more scientists and engineers to use computers. Work was also done throughout the project to ensure it would be useful for existing programmers, you know, people who were slaving
Starting point is 00:23:32 away with machine code. If you read very long about Fortran, you'll find a lot of details about the acrobatics the team went through to make sure the language produced fast programs. That was partly to please existing programmers. We can easily say that Fortran was aimed at making computing more accessible, but it was in a very closed-off niche. It was aimed at those that were already in the hard sciences, people who were already math freaks, people who liked symbols. When you get down to it, programming is just a distant cousin of the scientific method. Hopper saw things in a different way, and we can observe that really clearly in the whole flowmatic-mathematic split. She believed that computers could be a tool for anyone. But there wasn't any one-size-fits-all approach. Something like Fortran or Mathematic
Starting point is 00:24:27 could work well for STEM people. Those in business hadn't necessarily taken a calculus course. I know, I shudder to think of their loss. But that didn't mean that a computer would be wasted on them. In Hopper's view, these business folk just needed a different way to program a computer. So how does this ideology fit into a compiler? Well, I'll tell ya. Flomatic code was written almost entirely in English. I said earlier that Mathmatic looked roughly recognizable, and I guess I should really emphasize the roughly part there. A few lines of mathematical expressions in Mathmatic will look nearly identical to the same lines in, say, C or Fortran. But outside that,
Starting point is 00:25:18 Mathmatic is a lot more, well, verbose. It uses English words where other languages would use fancy characters. Flomatic takes us to a whole other level. In general, I'm going to be talking about Flomatic moving forward, but a lot of the points that I hit are just as applicable to its math-based sibling. In Flomatic, each line of code starts with a line number, just like in BASIC. These are used for flow control. So far, that's not bad. But let me change that.
Starting point is 00:25:56 To branch to a specific part of a program in Flomatic, one could write, test x against y, if less, go to sentence 10, period. Now, I figure a quick splash is the best way to approach the weirdness here. A line of code isn't called a line in Flomatic. That's far too pedestrian. It's called a sentence.
Starting point is 00:26:23 It contains nouns and verbs, and it has to end with a period. Semicolons are used to separate clauses. You may have missed it, but this is in fact an example of an if statement. There are no equal signs in phlegmatic. There are no greater or less than signs. There are basically no symbols here that you wouldn't find on a normal page of text. So, Flomatic is a bit of a weird language to wrap your head around. At least, it was until I started to realize something. This isn't in any manual, so we're back to the whole patented advent of computing speculation corner here. But it seems to me that Flomatic was partly designed to replace punch card tabulators.
Starting point is 00:27:12 Those were the machines that were used to handle punch cards way back in the day. And I guess it wasn't back in the day for Grace Hopper, but for us, it's back in the day. They weren't quite computers, but they could perform mathematical operations on a stack of punch cards, and then they could output the results either onto a readout or onto a new stack of punched cards. The machines were used by configuring how input values on cards should be mapped and manipulated to form outputs. Then cards were just run through one at a time. Crucially, these were often used for accounting and other business-related tasks.
Starting point is 00:27:53 and things might start to make a little bit of sense. The first part of a FLOMATIC program is its data definition. This part wasn't in plain English statements, but it wasn't quite in machine code either. Data structures were defined by filling out a worksheet, basically just a page with grid sections. This specified things like the name and location of a file on magnetic tape, if it took up more than one tape, that sort of thing. Importantly, it also specified the actual format that data entries on that file should follow. In a clumsy way, this is what passes for a typing system in Flomatic. Once you have the sheet filled out and passed into the computer, you basically get a
Starting point is 00:28:40 blank spreadsheet with preset columns. Each column, or field in FLOMATIC lingo, has a name. It has a design and a size. Going off a 1958 FLOMATIC manual, these definition worksheets make it look similar to defining a structure, just a more low-level kind of thing. Once you have all your data files designed, then you can move on to writing actual code. Since there isn't a huge amount of FLOMATIC code readily available, I'm going to be sticking pretty close to the 1958 manual's example programs. These all start with an input line, basically a statement to open a series of files. From there, you can read entries into memory line by line and work your way through
Starting point is 00:29:26 each file. There are, of course, options for manipulating data and outputting results to different files. So, in effect, a Flomatic program just defines mappings from input files to eventual output files. The actual language contains no real math facilities as near as I can tell. Instead, the manual recommends the use of machine code to handle tasks outside Flomatic's powers. A lot of early languages offer this kind of functionality. In this case, you can embed blocks of machine code directly into a Flomatic program. This is where I think the whole punch card tabulator framing is key. In general, a Flomatic program is going to be used to manage sorting, organizing, and moving around data.
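The overall shape of such a job, fixed-width record layouts declared up front and then a pass that maps input records to output records, can be sketched like this. The field names, widths, and the out-of-stock rule are all invented for illustration, not taken from the 1958 manual:

```python
# One record layout, declared up front -- the worksheet's job in FLOW-MATIC.
INVENTORY_LAYOUT = [("product_no", 8), ("quantity", 6)]  # (field name, width)

def parse_record(line: str, layout) -> dict:
    """Slice one fixed-width record into named fields."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record

def run_job(input_lines):
    """Map input records to an output file, tabulator-style."""
    output = []
    for line in input_lines:
        rec = parse_record(line, INVENTORY_LAYOUT)
        if int(rec["quantity"]) == 0:      # e.g. flag items with zero stock
            output.append(rec["product_no"] + " OUT OF STOCK")
    return output

print(run_job(["A1B2C3D4000000", "E5F6G7H8000012"]))
```

Every record goes in one side, gets tested and reshaped, and comes out the other, much like a deck of cards running through a tabulator.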
Starting point is 00:30:13 It's not meant for heavy mathematical tasks. Crucially here, you're working with what amounts to rows in a spreadsheet, or you could also say cards in a deck of punch cards. But the fact remains, this is a really weird programming language. So why on earth would Grace Hopper, by all accounts a brilliant programmer, design a language like this? Well, it all comes down to her views on accessible computing. Hopper contended that there couldn't be any one perfect programming language. There shouldn't be. Instead, she believed that there should be multiple languages each designed for a specific niche, just like you have different tools for different jobs.
Starting point is 00:30:59 A hammer is best for driving a nail, but you wouldn't want to use it to cut a board. As Hopper put it in an ACM keynote speech, quote, working with the people that I had worked with in data processing, I found that very few of them were symbol-oriented, very few were mathematically trained. They were business trained, and they were word manipulators rather than mathematics people, end quote. To a programmer, someone who is trained to be symbol-oriented, a language without symbols just doesn't look right. A language without native math functions just seems useless. But Hopper envisioned Flomatic as a tool for a
Starting point is 00:31:40 totally different niche. This decision worked out well. Soon after the first compilers were operational, Univac began selling Flomatic to customers. Crucially, they actually studied how it was being used in the field. For this, we can turn to a 1960 paper written by Alan Taylor, one of the programmers on the Flomatic project. This text, The Flomatic and Mathematic Automatic Programming System, explains the benefits of these new English-like languages really well. Specifics are a little vague since Taylor is working off client data and clients tend to be a little cagey about vendors releasing details. That aside, here's a particularly interesting example of Flomatic at work. One large industrial organization started training some 15 programmers a year. Instead of being
Starting point is 00:32:35 trained in machine code, as had always been done previously, these were given a short machine code course and a thorough two weeks' training in Flomatic. Taylor continues, quote, To date, they have successfully compiled around 200 programs, mainly of the accountancy type. This means the capacity of the computer over and above production runs has allowed programs to be compiled and, if necessary due to various errors, recompiled, and tested out at the rate of over one every two days. End quote. There's quite a bit of information in just that account, so let me pull out the important bits.
Starting point is 00:33:15 Taylor explains this throughout the paper, that clients found it was easy to train new programmers to write in Flomatic. It was at least a lot faster and easier than training programmers in machine code alone. Those new Flomatic programmers could also write programs a lot faster, and debugging was less of a hassle. But that very last part is what I find most informative. The idea that these programmers weren't just producing more code, but they were able to write a lot more separate programs. To me, that sounds like Flomatic was lowering the barrier to create a new program altogether. Instead of programming being a huge affair,
Starting point is 00:33:57 it was starting to become just a matter of course. The Flomatic manual even backs this up. Quote, The programming of many one-shot jobs formerly considered impractical is now not only feasible and economical, but also provides invaluable additional fact power to decision-making management. End quote. That may sound like a small detail, but what we're seeing here is a shift in how a programmer could work. If programming is made easier, then it's made more accessible as a tool. Need to answer a question quickly? Just
Starting point is 00:34:31 hammer out a short Flomatic program, compile it, and run it. You'll have a solution in a matter of minutes. Of course, some restrictions do apply. The big picture result here is that by using Flomatic, computers could be used to tackle new kinds of problems. We're seeing the broadening of computational horizons here. It's the little steps like these that really add up over time. The other big picture result of all of this, and one that Taylor spends a good amount of time covering, is that these new languages saved businesses money.
Starting point is 00:35:06 The article estimated that one company that adopted Mathematic in 1958 was able to reduce computer-associated costs by $11,000 over a two-year period. That's including reduced computer time, less time spent debugging, less time spent training staff, and other management overhead. Adjusting for inflation, that's over $100,000. Taylor doesn't give a financial analysis for Flowmatic, but we can assume that there would have been similar gains, at least qualitatively. The point is, Mathematic and Flowmatic were able to pay for themselves. And beyond that, companies that adopted these new languages saw a drastic change in how they used computers. Imagine being able to better utilize your digital resources while saving money. That's a winning combination.
Starting point is 00:35:58 I don't think anyone could say no to an offer like that. Well, at least anyone that was using a UNIVAC. This brings us nicely up to the period around 1960. I don't really like grouping topics into hard and fast epochs because, honestly, there's just a lot of bleed-over between time periods. This is especially true in computer history: if you try to place a hard date on when one era ends and another begins, you run into some weird issues. That said, important dates are a good way to help understand the progression of technology.
Starting point is 00:36:34 If you've listened to any of my other programming episodes, then you know where this is going. 1958 was the year of the first ALGOL committee. And yeah, we're going to have to touch on ALGOL again. I'm going to speedrun this so we don't have to go through the story yet another time. Problem. By 1958, some researchers were worried that there were too many different programming languages. Solution? Form an international committee of experts from science and industry to construct
Starting point is 00:37:06 a universal programming language. The product was ALGOL, a language designed to facilitate international collaboration in computing and the sciences. Of course, Grace Hopper was not part of this committee. The idea of a universal one-size-fits-all programming language was pretty much antithetical to her own views. However, there was a limit to things. After Flowmatic started hitting offices around the country, similar business-oriented languages came out of the woodwork. We start seeing in microcosm the same issue that the ALGOL committee was trying to deal with. One of these new languages, Comtran, was under development at IBM. More were sure to follow.
Starting point is 00:37:53 The fear here was that a massive fracturing of the market, a breaking of a niche, could occur. At the time, most programming languages were vendor-exclusive affairs. Fortran and Comtran ran on IBM hardware. Flowmatic and Mathematic were only for UNIVAC machines. Another business language, called FACT, was in development at Honeywell. If your company happened to be on RCA or GE hardware, then you didn't have access to any of those languages, but instead had your own vendor-specific options. What's the point of having an easy-to-learn business language if there are a dozen that fill that same niche? By 1959, this concern was spreading amongst the industry.
Starting point is 00:38:39 The only thing for it was, as usual, to call a meeting. That year, a group of concerned individuals met at the University of Pennsylvania to discuss the issue. One of the attendees, Edward Block, would write that the group was assembled at the behest of a programmer at Burroughs specifically to, quote, develop the specification for a common business language for automatic digital computers. In a way, this was a less ambitious contemporary to the ALGOL project. But the business group was tackling a slightly different task. Unlike ALGOL, a universal business language would need to be accessible to less skilled programmers. Ideally, it would be accessible to people who didn't program at all.
Starting point is 00:39:26 This first meeting was attended by a cross-section of the computing world. Among them were, of course, Grace Hopper, who we've already met, and one Jean Sammet. Sammet is important for, one, her contribution to the group, but two, she wrote extensively about her experiences for ACM years later. From that account, we know that this first meeting was really just a way to kick things off. The name wasn't decided on until later, but I'm going to drop it now to save us some time. The group would be known as the Committee on Data Systems Languages, or CODASYL. From the beginning, this was a volunteer organization, but operations were still going to cost money. After this first meeting, CODASYL contacted the Department of Defense to see if they were interested in sponsoring the group, maybe
Starting point is 00:40:17 fronting some money for some nicer meetings. Now, you see, this is where we run into one of those trends and forces things that makes history so interesting. The DoD was also trying to make its own business-oriented language, called Amaco. It's unclear how far along this language was. Supposedly, there was a compiler produced at some point. The prospect of an industry-standard language just sounded a lot better to the DOD. By May, Codacil had their first official meeting at, where else, the Pentagon. Samet recalled, quote, about 40 people attended the meeting, including 15 people from about seven government organizations, 11 users and consultants, including one but only
Starting point is 00:41:06 one person from a university, and 15 representatives of manufacturers. The manufacturers represented were Burroughs, GE, Honeywell, IBM, NCR, Philco, RCA, Remington Rand, Sylvania Electronic Products, and ICT. End quote. Hopper was in attendance representing UNIVAC. Samet represented Sylvania Electronic Products, and ICT. End quote. Hopper was in attendance representing UNIVAC. Samet represented Sylvania, but look at the composition here. Just 15 attendees were from computer manufacturers. Everyone else is outside the industry. The committee that drafted the first ALGOL spec was primarily industry experts and scientists. CODASIL was simply staffed by a different type of people.
Starting point is 00:41:50 From this meeting, official goals were set for this theoretical language that they started to call the Common Business Language, or CBL. The language should be easy to use. It should use as much English as possible. Symbols were just too hard to understand. It should use as much English as possible. Symbols were just too hard to understand. It should be machine-independent. And, in Samet's words, We need to broaden the base of who can state problems to a computer. CBL had to be easily learned by as many people as possible.
Starting point is 00:42:22 Now, I'll ask you, does that sound like a language we know? Perhaps it starts with the word flow and ends in matic? I don't think this is a coincidence so much as an example of the outsized effect that Grace Hopper had on the business and computing industry at this time. Amoco, the DoD-funded language, was heavily influenced by Flowmatic. RCA had been working on their own language that was roughly contemporary to the formation of Codacil. Howard Bomberg, one of the programmers on the RCA project, had worked with Grace Hopper during the early days of Flowmatic. So we could call it the tides of computing, or we could just call it Hopper's influence at work. However we want to look at it, Flowmatic was staged to be a huge
Starting point is 00:43:12 factor in whatever happened inside Codacil. But the path followed wouldn't be that direct. Remember, we're now dealing with a government-funded and government-administered project. First, everything would have to go through committees, and then subcommittees. A series of groups and timelines were established, but all that really matters are two committees, the short-range committee and the intermediate-range committee. Even of those two, the short-range was really where it was all at. It's also important to note that Hopper doesn't appear on any of the committee roles. After 1959, she would only serve as a
Starting point is 00:43:50 advisor to the executive committee. So her direct involvement with the fledgling CBL ends here. So let's look at the short range committee. It was tasked with examining existing options and possibly making a recommendation for a compromise language. They were scheduled to report back to the main body in early September. That gave them four months to complete their task. The expectation was some kind of short-term solution, hence the name short-range, maybe choosing a language for use until the intermediate and long-range committees could sort something out that was better. Oh, but that wasn't what happened.
Starting point is 00:44:31 Samet gives a blow-by-blow account of the committee minutes, but it's a little mind-numbing to actually read through. The short-range group broke up into at least three subcommittees. From there, pieces of a new language started to be developed. The best part of her account, or at least what I found the most resonant, was that, quote, some people felt the new language had gone too far. And honestly, that's a pretty reasonable view from everything I've read. After four months, the short-range committee returned to give their
Starting point is 00:45:05 recommendations. What they turned in was a very early draft specification for COBOL. Now, obviously, this was outside the committee's original scope. This was known and discussed in the committee. The chair was recorded as responding with, quote, I also want to point out, in the Constitutional Convention, if they had not taken this attitude, we would not have a Constitution, end quote. Now, I don't know if that sounded more appropriate in context. Samet just pulls that single line into her writing. But it should be clear that we're pretty far off the rails when a business programming language is compared to the U.S. Constitution. That might be a bad sign.
Starting point is 00:45:54 Now, Samet's account also gives us detailed examples of what design by committee really means. The core of COBOL was based off FLOMATIC and similar languages, so we see the same rough design form. One of the subcommittees that broke off the short-range committee designed the data descriptor part of COBOL. A separate subcommittee handled the actual statements that would be used in the language. After months of work, a joint meeting of the entire committee was held, and votes were taken on the draft design. Each statement, that is to say, each word used in COBOL was voted on. Informal polls were taken earlier to find out which keywords had a chance of passing the committee and which would be kicked back for
Starting point is 00:46:43 review. By September, the very rough draft was ready, and the short-range committee had a proposal. They submitted what they had and requested another two months to complete the specification. CODACIL, in full conference, granted the request. After some more fun committee work, a new draft was passed around. Edits were made by a few more committees, including the Intermediate Range Committee, and the first specification for COBOL, COBOL 60, was approved. I guess this brings us fully into the belly of the beast. It's time to take a look at COBOL proper. And this is also where we need to start addressing the oddities of this language in detail. From the outset, COBOL looks uncomfortable, at least if you're used to any other programming language. Part
Starting point is 00:47:34 of the reason for this is COBOL exists in a weirdly isolated group of programming languages. You can plot out languages on something like a family tree to show how they're interconnected, which languages influence which, and so on. These trees are useful because they give a broad overview of how different languages are connected. In these trees, we can see main families and big groupings of languages. The largest group, and the one with by far the most descendants and interconnections, has Fortran and Algol at its root. That's where we get C, C++, Java, even Python is in this family.
Starting point is 00:48:13 There are some exceptions, but these languages tend to have curly brackets and semicolons, discrete blocks of code, and similar control statements. Lisp is kind of off doing its own thing with some interplay with other languages. But then we have Flomatic, COBOL, and their descendants. We have such hits in that lineage as PL1 and REX, if anyone knows those languages, that is. There are only a handful of descendants along COBOL's lineage. It's almost a straight line. If you haven't used one of those handful of languages, then Kobol will look totally unfamiliar to you.
Starting point is 00:48:52 On a more technical note, Kobol's syntax is defined in its own unique way. This is getting into some deep-cut theory stuff, but I think it stands to mention. Algol-like languages are usually defined using what's called the Backus-Naur form. It's a formalized way to explain basically what a programming language looks like. It's a way to define its grammar. This also means that these languages follow a certain syntactic structure that's compatible. To really paint with a broad brush, they look the same. Sure, something like Python and C, you can tell the difference visually, but they still have the same feel because they still adhere to the same form. Kobold just doesn't roll with that. It has
Starting point is 00:49:42 its own meta-language for defining its own syntax. One of the nice things about Bacchus' Naur form is you can type it on a keyboard, you know, like you usually do with a programming language. It uses a lot of symbols, but they're all symbols on a keyboard. Kobol's meta-language is just its own thing altogether. It uses large brackets for displaying options. It's more graphical than textual. I'll link to some resources with examples, but the point here is the very bones of COBOL, right down to the operating theory on how to construct a language, are unrelated to nearly every other programming language. We can start to see that before we even get to COBOL's actual code. Like with FLOMATIC, a COBOL program is composed of more
Starting point is 00:50:32 than just source code. Initially, COBOL60 breaks everything down into three so-called divisions for defining data, environment, and procedures. That last one, the procedure division, is where actual COBOL code goes. The data division is where files, their formats, and variables are all defined. At least for me, this is the most mind-bendingly weird part of the 1960s specification. COBOL syntax, whatever division you're dealing with, is almost entirely composed of English words. But describing data needs to be more exacting. So the data division has this mix of verbose English explanations with numbers and very Cobol-specific abbreviations. Let's say you're describing an input file. That's a really common task.
Starting point is 00:51:27 You first have to define a file descriptor and give it a name like fdPayrollFile. Once you have a descriptor, you can start writing its layout as a tree-like hierarchy. This is done using so-called level numbers. Each line that describes the file has a numeric prefix that defines where it is in the hierarchy. O1 means it's at the root of the structure, O2 means it's below the root, and so on. You also have to define the type for each field. Now, I guess you could say COBOL is the most strictly typed language out there, sort of. The whole typing system is something totally different from anything I've ever experienced. This is accomplished using the PICTURE clause. The COBOL 60 manual describes PICTURE's function as, quote,
Starting point is 00:52:30 to show a detailed picture of the field structure and permit editing representations. What exactly does that mean? Like, I get it's used to define format, but that tells me so little. Why is it used to picture a field? The name picture comes from IBM's Comtran, but it really just sounds like they're using that name as a joke. It at least doesn't sound formal enough for a formal language specification. Going deeper into the picture clause doesn't really help matters. For example, let's say you wanted to find a floating point field called cost. It's going to store a money value. For that, you'd
Starting point is 00:53:11 write cost, pick, 9, open paren, 8, close paren, v, 9, 9, period. Now, go ahead. Tell me what that actually means. Unless you sit down with a manual, you probably can't. And believe me, it doesn't come across any better in print than it does in audio. The format syntax is a little wild. Here, 9 is used to signify a numeric value. The 8 in parentheses says that the field is 8 digits long. V, for whatever reason, is a decimal point. And then the trailing 9s represent two more digits after the decimal. And you have to end in a period because every sentence in COBOL ends with a period. So yeah, one of the design goals of COBOL was to be easy to understand for non-experts.
Starting point is 00:54:08 The data division doesn't fit that. There are also formatting options for characters, alphanumerics, spaces, optional decimals, and a handful of other types, including modifiers for left and right text alignment. types, including modifiers for left and right text alignment. You use these PIC clauses to describe fields in input and output files, plus for any variables. Later on, scope becomes a thing in COBOL, but in the 1960 spec, it's all global scope. Every variable has to be defined right where you go and define your file formats. What's interesting here, and something that's easy to miss, is that these pick statements serve multiple purposes. Sure, it's a kind of typing system, but it also handles formatting
Starting point is 00:54:59 and data parsing. Take the cost example I gave. If you output the value, it gets front-padded with zeros. No matter what the value cost holds, the printed output will always be eight numbers, a decimal point, and then two more numbers. At least that's a cool trick that simplifies programming. It makes your outputs consistent without having to constantly be formatting things. Now, if you've programmed anything before, you know that this kind of typing system isn't normal. This isn't how you do variables. If you're new to the whole realm of programming, then let me be the first to tell you, this is not how you usually do variables. Variables are a core feature of any programming language, so how you deal with variables dictates a lot about the language at large. Usually,
Starting point is 00:55:53 you have a handful of types for variables, such as integers, floating point numbers, characters, or strings of characters, just to name a few common types. In COBOL, you get to define your own types, which is actually really powerful. That's something that other languages could do, but not with the same level of flexibility as COBOL. And remember, when you're defining a data type, you're also defining its print format. The cost is that you have to explicitly define each variable's typing and format,
Starting point is 00:56:28 and you can only define that in the data division. So yeah, you get this cool flexibility, you get the power to make up your own variable types, but at the same time, this imposes limits on how you can go about programming. In something like ALGOL or C, there are restrictions on what kind of types you can have. You only have a handful of different variable types. But you can define these variables anywhere you want. It's a trade-off to be sure. What we get from all these factors is something I didn't expect.
Starting point is 00:57:04 The data division is competent, but difficult to use. To make matters worse, it bears little resemblance to the rest of COBOL. You see, the data and procedure divisions were designed by different subcommittees, so this is exactly what people mean when they say something was designed by committee. You can see these divisions in the code itself. Now, let's hit the procedure division next. This is where we start to see more of the FLOMATIC DNA. It's also where we can see more of COBOL's separation from other programming languages.
Starting point is 00:57:41 The 1960 manual describes this part of COBOL like a human language, like a language you'd speak. It's put in terms of verbs and nouns, subjects and objects. Each line of code is a sentence. They can be grouped into paragraphs. So yeah, this is very far removed from the mathematics-like programming languages. By comparison, the specification for ALGOL reads like a mathematical proof. Just like with FLOMATIC, much of COBOL is focused around data manipulation. Most keywords are borrowed from FLOMATIC directly. What's strange is that even standard programming conventions don't really apply once you're in COBOL world.
Starting point is 00:58:26 standard programming conventions don't really apply once you're in COBOL world. For instance, the basic conditional in most languages is the if statement. Usually you get something like if x then y else z. That's the norm. For COBOL, you get if x then y otherwise z. You can also construct compound and nested conditionals using the also if statement. In general, COBOL just has a lot of these little unexpected twists. If you haven't been trained on other languages, then you're probably fine. But coming from a background in anything else, you get confused from this really easily. One of these twists that I find wild to think about is Cobol's mathematics system. Or, I guess I should call it systems plural?
Starting point is 00:59:18 Man, this part is weird. But there is some logic to it, believe it or not. weird. But there is some logic to it, believe it or not. In COBOL, you can write out equations as you would with any other language using the compute statement. So if you need to do some math, you can just write compute C equals A plus B and call it a day. This follows all the expected rules for mathematics. Strip off the compute word, and this looks the same as an equivalent in Fortran or C. But there's a totally separate math system that uses no symbols. If you were so inclined, you could write out an entire equation in English, such as add A, B, giving C. You get all the hits, subtract, divide, multiply, exponentiate by. These statements don't
Starting point is 01:00:09 nest, so you have to use multiple lines to do more complicated math. Comparisons follow a similar pattern. You can use their symbols or the English equivalents. So why on earth is COBOL like this? Why on earth is COBOL like this? Under what circumstance would you want to write exponentiated by? Well, this brings us nicely into the environment division and the last major design goal of COBOL. This language was meant to be widely portable between computers. How do two separate math systems help with portability? Believe it or not, it all comes down to keyboards. Well, keyboards and character sets in general. Coble first hit the scene in 1960, and that's not exactly at the dawn of computing, but it's still early in the timeline. The idea of portability was just starting to become even something you'd want to discuss,
Starting point is 01:01:06 partly thanks to ALGOL. But we're still so early that no one really knows what portability will look like in practice. Later languages would approach this on a highly technical level, constructing code and compilers to be adaptable to different computer architectures. The big three issues were always memory layout, input-output devices, and computer character sets. In 1960, different computers were so incompatible that you couldn't even guarantee two systems could process the same text. The wild English language math system in COBOL solved the character set problem by catering to the least common denominator. In general, computers from any vendor could handle Latin
Starting point is 01:01:53 characters, numbers, and simple punctuation. The most exotic characters in COBOL are parentheses, so that fits the list. The weird data description system also starts to make more sense in the context of portability. One issue for early portable languages was figuring out how to handle variables. Computers from different vendors just handled memory in different ways. They operated on different sized chunks of data. How big should an integer variable be? Well, that could change from computer to computer, which could affect how code ran. COBOL circumvents that problem by just not having built-in types. Every variable you use has an explicit format. A compiler just has to figure out how to turn
Starting point is 01:02:40 a format into data in memory. But that's not set in stone by the specification. You get to do that for your own compiler. The net result is that COBOL can be easily adapted to work with any kind of memory. If your computer can store a single digit or a single character, it can work with COBOL. The third part, the input-output issue, was solved by COBOL's environment division. This part isn't very complicated at all. It just defines the type of computer you're using and defines your input and output devices. The compiler then figures out how to map those into files and whatnot. In general, that part of the program never has to change unless you're moving code to another computer. All things considered,
Starting point is 01:03:25 COBOL provides a really reasonable approach to portability before any other languages really tackled the issue. Alright, now this may sound like I've cut things off, but this is for a reason. This wraps up our discussion of the history and development of COBOL, and I think I've hit all but one of my personal design goals for this episode. Hopper was involved with the early development of programming languages in general. She created Flomatic, which was the closest thing to a predecessor of COBOL. But ultimately, she was only part of the larger committee that made COBOL. But ultimately, she was only part of the larger
Starting point is 01:04:05 committee that made COBOL itself. Hopper wasn't even part of the actual short-range committee that wrote the bulk of the spec. COBOL isn't necessarily a bad programming language, per se. It just exists in its own little world. It's isolated from other language families, its own little world. It's isolated from other language families, especially the algol-like languages that we all use, and that makes it appear alien to trained programmers. That doesn't make it bad so much as inaccessible to a lot of people in the industry. I'd be curious if anyone has recently taken up COBOL as their first language. I'd love to know if you had an easier time coming to terms with its idiosyncrasies. Now, there's still one central question that I haven't explicitly answered that I think will wrap this all up. Why is COBOL important? Why did it become popular,
Starting point is 01:04:58 and why does it persist? The CODASIL committee members, at least the ones who represented computer companies, all committed early on to producing their own COBOL compilers. Part of that agreement was that they would write to the COBOL specification as written. Compatibility here was key. By 1960, Codacil was already growing. Its ranks included Burroughs, GE, Honeywell, IBM, NCR, Philco, RCA, Univac, Sylvania, and ICT, plus Bendix and Control Data Corporation. That's a lot, with the only notable exception I can think of being DEC.
Starting point is 01:05:38 We're basically looking at the whole computer industry in America. Some companies planned to drop their own business languages entirely. Others, like IBM, planned to make their in-house languages COBOL-compatible. So once COBOL 60's spec is finalized, we see the industry at large putting the finishing touches on their own versions of the new language. And they're all compatible. August 1960 saw the first fruits of this work. That month, a COBOL program was compiled on an RCA mainframe. The same code, with almost no changes, was compiled and ran on a UNIVAC computer. The following years, more compilers were released, Codacil continued operations, and updates to the language were drafted.
Starting point is 01:06:28 What's utterly unique, at least in this early history, is that COBOL came out the gate with institutional backing. That's something that ALGOL never had at the same level. Neither did FORTRAN. Even later languages like C had to spread on their own. COBOL started out with a built-in market. Not only was the computer industry behind COBOL, so was the US government. They had paid good money for this language, so contracts started requiring code be written in COBOL. And I think that brings us full circle to 2020, the IRS having to fix a
Starting point is 01:07:08 bunch of COBOL code, and the growing realization that no one knows COBOL. Because if you're already in the computer industry, it's just not easy to learn COBOL. Thanks for listening to Advent of Computing. I'll be back in two weeks time with another piece of the story of computing's past. And hey, if you like the show, there are now a few ways you can support it. If you know someone else who'd be interested in the history of computing, then why not take a minute to share the show with them? You can also rate and review on Apple Podcasts. And if you want to be a super fan, then you can support the show through Advent of Computing merch or signing up as a patron on Patreon.
Starting point is 01:07:47 Patrons get early access to episodes, polls for the direction of the show, and bonus content. You can find links to everything on my website, adventofcomputing.com. If you have any comments or suggestions for a future episode, then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
