Command Line Heroes - A Language for the Web
Episode Date: April 20, 2021
The Hypertext Markup Language (HTML) gave everyone a foundation for building and viewing the World Wide Web. In 1995, its standardization led to dominance. Its simplicity helped it spread. And its solid common foundation helped shape the internet. Dr. Belinda Barnet explains what kind of framework was initially needed to build and navigate the Web. Jeff Veen describes the three ingredients Tim Berners-Lee combined to create HTML: the ideal language for the Web. Gavin Nicol recounts the need to standardize the quickly-growing language. And Gretchen McCulloch points out how HTML instills an inherent bias for English speakers to develop for the web. If you want to read up on some of our research on HTML, you can check out all our bonus material over at redhat.com/commandlineheroes. Follow along with the episode transcript.
Transcript
In medieval Europe, scholars had to converse in Latin.
For centuries, the British monarch spoke French.
And today, the business language of India is English.
Official languages have the power to unify people, but they don't always reflect
everybody's lived experience. And when we look at not just a country, but a worldwide web,
that struggle to impose a standard language can grow to epic proportions. This season, we've been exploring a pivotal year in the history of tech, 1995.
We already heard how it launched the dot-com bubble and how it led to the privatization of the Internet.
But 1995 was also the year when HTML, the language of the web, was standardized.
HTML's rapid evolution was crucial to the web's development and growth.
But some basic assumptions about who a coder is and whose language mattered were locked into place at the same time.
And once we began digging into HTML's past,
we realized a language can become a standard,
but it can never be neutral.
I'm Saran Yitbarek, and this is Command Line Heroes, an original podcast from Red Hat.
Today, hypertext markup language, HTML, is the mother tongue of the web.
The standard markup language for pretty much everything you see in a browser.
But right there in its name is a much older concept that predates HTML.
And that's the idea of hypertext.
Back in 1945, the engineer Vannevar Bush wrote an article for Life magazine where he imagined a futuristic machine.
This machine would allow you to display information on a screen.
And that information could be retrieved from a microfiche storage device that sat under the desk.
Bush was thinking this up decades before anything like the World Wide Web. But what
he'd proposed was the start of something big. Belinda Barnet, Senior Lecturer in Media and
Communications at Swinburne University of Technology, explains. What was most interesting
is that you could create links between pieces of information from different articles
in order to create what he called a trail through information.
And so this was really the first instance of a technical device that would create hyperlinks.
For years, Bush worked as the head of the U.S. Office of Scientific Research and Development.
So, even when he was delving into a bit of fantasy, he was still being influenced by academic
practices. That classic rule of academia, where one person's work is constantly linked to other
authorities. Bush imagined a machine that would make those links come to life. A machine
that would work like an academic's mind, connecting to every other mind it had ever encountered.
He called this imaginary machine the Memex, combining the words memory and extender.
And that's what it was, an extension of human memory, an extension of human thought.
Bush's Memex inspired generations of computer scientists to pursue that holy grail of preserved and interlinked knowledge.
But, yeah, he still lived in the 1940s.
He'd have to wait another decade until the Memex started coming to life. In the 1950s, Douglas Engelbart, who we've talked about on this podcast before,
was inspired to build a system of links, a living network of linked information.
And he brought together this idea of using a computer screen to display knowledge and information
and link it together in the manner of Vannevar Bush's trails
and create a system, which he eventually got funding for
at the Stanford Research Institute,
that was the first hypertext system.
Of course, he didn't call it hypertext yet.
The word itself was coined a decade later, in the 1960s, by philosopher Ted Nelson.
Though, as Barnet tells it, Nelson's version differed from Engelbart's in important ways.
Ted wanted something far more freeform, more like, as he put it, thought itself, which kind of meanders between things,
and there's no restriction on what you can connect to or at what level you can connect it.
He had also imagined that hyperlinks would not be one-way, but that they would be two-way.
But this basic concept of connecting together different pieces of information associatively
and forming trails through the information
was certainly evident in Ted's thinking in the 60s.
That distinction between one-way links and two-way links
has pretty profound consequences.
A web composed of two-way links would arguably
create an entirely different online experience.
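To make the distinction concrete, here is a sketch of a one-way link, written in the HTML that eventually won out (the page names are hypothetical):

```html
<!-- Page A carries the link, and only page A knows it exists. -->
<a href="pageB.html">Read more on page B</a>
<!-- Nothing in pageB.html records who points at it. In a two-way
     system of the kind Nelson imagined, the link would be visible
     and traversable from both ends. -->
```

That asymmetry is why a page on today's web has no built-in way to list everything that links to it.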
And at that point in history,
there was no obvious form that hypertext had to take. Our linked future was
still being imagined. Douglas Engelbart's hypertext system, which was called the oN-Line System, or NLS,
was not especially user-friendly. Only the truly technical were able to use it. And for years,
most early hypertext attempts had the same roadblock. But then,
along came a computer scientist named Tim Berners-Lee. While working as a contractor for
CERN in 1980, he created a document-sharing program called Inquire. And nine years after that,
he wrote a memo laying out a plan to use hypertext to take his work onto the global stage.
The result? A hypertext markup language unlike any hypertext system that had yet been developed.
Designing HTML and HTTP was really, really easy because I was the only one doing it. Berners-Lee had delivered something deceptively simple, something democratic, something that took hypertext across all borders. It allowed you to link not just on your own computer, but between countries and to all kinds of different places
outside of your own machine and your own unique operating environment.
That vision of a system to link computers and share files across huge distances brought
us closer than ever to Vannevar Bush's idea of a memex.
What had begun as a bit of abstract philosophy had suddenly transformed into a reality.
We asked Jeff Veen, who was part of the founding web team at Wired Magazine,
to talk us through the three ingredients that made HTML an ideal language for the web.
Today, Veen is a partner at VC fund TrueVentures.
First, there was the format's utter simplicity.
So you would essentially do a lowest common denominator for all kinds of documents.
Because back then, every word processor had a proprietary format.
We didn't all just use Word docs.
There was WordPerfect and, you know, there were so many other types
of files that it was impossible to share files with each other. So he made a very, very simple
file format. Next ingredient, a way to transport those simple files over the web. So you still see
that when you type in a web address or something and you see that HTTP, that hypertext transport protocol. And that was, again,
very, very simple, but easy enough that anybody could quickly make a browser or make a web server.
And so it caught on really quickly as being very simple. And the third and final ingredient,
Berners-Lee created URLs. Uniform resource locators, right? And that was a standard address
so you could find things.
Because again, back then there were all kinds of different systems. There was FTP for file
transfer and Gopher was a rudimentary information system. And you could tell that to other servers.
There were all these different protocols and places where servers would be.
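All three ingredients are visible in a single web address. Here is a sketch of that anatomy, using the address of the first website at CERN:

```html
<!-- http://info.cern.ch/hypertext/WWW/TheProject.html
     http          the transport protocol (HTTP), though ftp: and
                   gopher: addresses used the same URL shape
     info.cern.ch  the server to contact
     /hypertext/WWW/TheProject.html  the document on that server -->
<a href="http://info.cern.ch/hypertext/WWW/TheProject.html">The WWW project</a>
```

One standard address format, regardless of which protocol or server sat behind it.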
With those three ingredients, a simple file format,
simple transport protocol, and URLs,
Tim Berners-Lee had created a framework so easy, so infinitely adaptable,
it seemed to appeal to everybody that was racing onto the World Wide Web.
He had taken the established idea of hypertext
and made it global for the first time,
allowing us to not just link depending on your frame of reference, is that it was so simple that it didn't systems had offered, you weren't going to find any of that with your basic HTML.
It didn't have any of it, so many people, I think, discounted like, oh, this stuff is way too simple. It'll never catch on. Except those people were wrong. In fact, it was exactly the pared-down simplicity that made HTML incredibly popular.
In the mid-90s, it didn't even have 20 tags, 20 things you could do.
A few headers, some citations and block quotes.
It was pretty bare bones.
But that meant the barrier to entry was super low.
You could easily learn HTML in an afternoon.
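An entire page from that era could look something like this (a hypothetical sketch using only a few of those early tags):

```html
<html>
  <head>
    <title>My First Page</title>
  </head>
  <body>
    <h1>Hello, Web</h1>
    <p>A paragraph with a <a href="http://info.cern.ch/">link</a>.</p>
    <blockquote>A quoted passage, one of the handful of things
    mid-90s HTML could express.</blockquote>
  </body>
</html>
```

That really is the whole program: no compiler, no toolchain, just a text file a browser could render.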
And everybody did. That's why a tech thriller
movie like The Net made sense in 1995. Hacking into the internet's guts felt like something that,
sure, Sandra Bullock could probably pull off. Another obstacle that turned out to be a mirage
was the question of broken links. In the past, people had
fussed about the fact that moving a document meant you had to alert everyone who linked to it,
get them to update their link, or else the connection will be broken. Maintaining all
those links seemed incredibly onerous. But Berners-Lee decided not to worry about the
broken link problem at all. He gamed it out. A few broken links were the cost of simplicity and easy connections. Look, I'll just
make my stuff and you make your stuff and we'll link them together. And if they move, it breaks,
you know, and things like that helped with the popularity and growth of the web.
As the web grew, fueled by the rapid adoption of HTML, that pioneering attitude,
if it breaks, it breaks, was going to spread. Soon, people were sprucing HTML up, adding their
own tags without asking anybody's permission. And everybody had their own idea of how it should
evolve. A young Marc Andreessen was a classic example. He was working on the Netscape
browser and decided to give it an image tag. The authorities back at CERN thought they'd instigate
a thoughtful back and forth to determine how an image tag might work. Were we talking about
embedded objects of any kind? Maybe just icons? But Andreessen wrote back to inform those so-called authorities.
Yeah, I already made the image tag. You guys can make your own if you want.
And things just got more tangled from there. It was sort of the Wild West for a number of years,
especially as Silicon Valley sort of woke up to like, OK, this web thing is huge. And there is the potential for very
big businesses to be created here. If you're a longtime listener of this podcast,
you've probably heard me talk about the browser wars. Starting in the late 1990s,
web browsers like Microsoft's Internet Explorer and Netscape's Navigator and a whole bunch of
others began competing for dominance.
They all wanted to be the world's portal to the World Wide Web.
To distinguish themselves, everybody was doing what Marc Andreessen had done with his image tag. They were just creating their own thing, modifying HTML however they wanted.
The language and the experience of coding for the web was beginning to branch.
I would say that, you know, standardization was a necessity.
Gavin Nicol is a technology evangelist and CTO at Context Labs. He was in the field at the time
and remembers Tim Berners-Lee's first proposals for the World Wide Web. He pointed out that it
wasn't so much all those new tags that created
a problem. It was the fact that nobody was making the tags cohesive. Okay, hey, anybody can create
a tag, but the question becomes, if you have a tag, how do you display it? And how do you hook
up an event handler to that tag? So to a certain degree, it went beyond the question of,
okay, well, who gets to define the common set of tags. If the
web kept going down that road, it would soon become so disjointed there'd be siloed experiences
in each proprietary browser. In the same way that engineering task forces created SMTP
to standardize email or FTP to standardize file transfers, there needed to be a standard for how HTML would work.
It was absolutely crucial to the adoption of HTML. It would have happened with or without
the W3C, but it was crucial. The W3C that Nicol just mentioned
is the World Wide Web Consortium, founded in 1994. The consortium took it upon themselves to bring a little order to the growing chaos of
the browser wars. The W3C would provide guidance and standardization for HTML so that whatever
tags anybody might create, they would all be interoperable across any browser. This was the
task, standardization, that would allow the web's potential to be fully realized.
Standardization would allow a truly global web experience.
To be honest, the W3C was a great convening body and a great independent third party to get everybody who had strong commercial interests to sit down and talk about interoperability. So, in 1995, HTML was elevated from a bare set of tags to HTML 2.0.
In its new life, HTML became aligned with SGML,
the Standard Generalized Markup Language,
and standardization was brought to its processing model,
to its data representation, and to the handling of character encoding.
So it was the first rigorous standard, and that was key for interoperability.
HTML 2.0, the first standardized version, was released in 1995.
Problem solved, right? Everyone in the global internet community could look forward
to a happy, interoperable future. Not quite. Think for a moment about the global tech reality
that people lived with in 1995. Things may have been nicely ordered from the perspective of
Silicon Valley or from the perspective of big shots at the World Wide Web Consortium,
but that's not the whole world.
It was far from including everybody.
At that point, much of Asia and sub-Saharan Africa were still offline.
The idea of a global village was introduced by Marshall McLuhan in the 1960s,
but decades later, much of the world remained disconnected.
And that meant they didn't have a seat at the table.
They weren't part of any standardization process.
The new standardized HTML was a gateway to web development, but a gateway created by English speakers with all their biases along
for the ride. The only way to read HTML was from left to right, and the only way to write it was
in English. Standardization may have been inevitable, but it also created exclusions.
It ran the risk of turning non-English speakers into second-class citizens of the web.
And that was not inevitable. That was something people could fight to fix.
One of the big things that having programming languages be based on English does is it creates
this barrier for access to the field of coding.
Gretchen McCulloch is an internet linguist and the author of Because Internet:
Understanding the New Rules of Language.
We asked her to walk us through some of the ways English programming languages and markup languages give one part of the population an advantage.
And it goes way beyond being able to read commands.
If you can't read what people have put on Stack Exchange, or it takes you twice as long to read
because you don't speak the language, then that's also a significant barrier to using a programming
language. It's not just the technical words used in the code. It's also reading the help
documentation, reading the Q&As, reading all of the meta information about how to use that code, the libraries, all of this additional stuff, that all exists in English.
After 1995, a non-English coder working on the web would have been guaranteed to run into those barriers everywhere. Standardization guaranteed it.
Those of us in the English-speaking world often forget
about these hurdles. We think we're on some even playing field, coding alongside our peers in China
or India or Germany. But if you're a coder and your first language is English, just ask yourself,
what would your career look like if the code you were writing and all the supporting documents and communities you rely on, if all of that were in Chinese or Swahili?
Would your career be the same?
Would your ambitions?
Gavin Nicol was inspired to make HTML work for everybody.
In the 1990s, he was living in Japan, and he noticed firsthand how difficult it was for non-English speakers to code for the web.
It was a mess, to be frank.
Nicol was the one who looked at HTML, which at the time had no real character processing model, and decided to find a way to let everybody use it.
His solution was to use Unicode, a standard that handles text in almost all the world's
languages. By adopting Unicode as his character set, he managed to establish a formal model
for the internationalization of HTML. Partly it was practical, but also there was a mission
aspect to it as well. You know, the practicality came from the fact that I was working at NEC or living in Japan.
I was like, hey, listen, you know, I would love to be part of this global conversation.
But then also, I still believe this is, you know, there's an underrepresentation of non-Latin
voices in the global commentary.
So I really wanted to, you know, accelerate the pace of that.
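In practice, the internationalization Nicol pushed for surfaces as a character-encoding declaration in the page itself. A minimal sketch, assuming a UTF-8 document with illustrative Japanese text (this meta form predates the modern shorthand):

```html
<html>
  <head>
    <!-- Declares Unicode (UTF-8) so a browser can render any script -->
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>こんにちは</title>
  </head>
  <body>
    <p>こんにちは、世界</p>
  </body>
</html>
```

With a formal character model in place, the same markup could carry Japanese, Arabic, or Hindi as easily as English.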
Despite the great work that the W3C folks had done in 1995,
it was up to people like Nicol to take the HTML standard and open it up to all the world's
languages. And it became kind of a mission for me to make that easy for non-native speakers,
because I firmly believe that if you force everyone to speak in English,
you force everyone to sort of think in English. And that's a very sad thing,
because you end up losing the culture that is associated with the language.
Nicol believes people must be allowed to communicate and work in their own language.
To make the web an English-only zone would mean cutting off part of our shared humanity.
There's a thing called Conway's Law.
It's kind of like systems tend to evolve to represent the organizational structures that created them.
To me, Conway's Law is a kind of warning.
Make sure the organizational structures represent all of us, or else.
Don't be surprised when the systems
that evolve lock some people out. 1995 was a quarter century ago, and HTML has evolved to
HTML5 today. But the work is far from finished. Look around at the coding landscape, and English is still taking up a lot
of the oxygen. Sometimes it can feel like a foregone conclusion. Pascal, for example,
was created by a Swiss computer scientist who made it in English to appeal to the rest of the world.
Python, same thing, written in English by a creator in the Netherlands.
And Ruby uses English too, even though it was made in Japan.
Here's Gretchen McCulloch again.
So if you're a non-native English speaker and you're thinking,
oh, I want my programming language to be adopted by the most number of people,
you might say, well, I know that people are used to coding in English-based programming languages.
So I'm also going to create my programming language that has English-based keywords, because that's what people are used to.
It's a feedback loop and not a great one.
One of the things that I think we could do as a short-term way of calling attention to
the problem is when we talk about programming languages where the keywords are based on English,
we could call them that. The first website wasn't written in HTML. It was written in English HTML,
which opens up the possibility that what would Spanish HTML look like? What would Russian HTML
look like? What does your HTML look like? How will you get to program on the web?
And how will you make sure everyone else can do the same?
I mentioned at the top of this episode that, in medieval Europe, reading and writing meant working in Latin, even if you didn't speak Latin every day. Only Latin was allowed as a tool
for accessing the technology of writing, the technology of the printing press. Today, we look
back on that and it makes little sense. But how different are we really? Isn't it just as ridiculous
to expect everyone to code in English? And by the way, all you English-speaking coders out there, one day,
the shoe could be on the other foot. I don't think it's likely in the short term for a programming
language based on a language other than English to become dominant, but it's entirely possible
in the long term, because we know that Latin didn't last forever as the lingua franca.
1995 was the year that HTML became standardized.
But that moment in tech history sparked a decades-long discussion that's continuing to this day.
We're still finding ways to make the web's language a platform for everybody.
And this matters because we have no way of knowing what people from different backgrounds, different languages will build, what apps they'll design, what code they might write once they're given the chance to work with their own voice.
We just might be amazed by our own diversity.
And maybe that is the standard we should all be reaching for. Next time, we're diving into another of 1995's biggest
transformations, the fantastic emergence of web designers. Until then, find bonus material about HTML and all our 1995 stories over at redhat.com slash commandlineheroes.
I'm Saran Yitbarek, and this is Command Line Heroes, an original podcast from Red Hat.
Keep on coding.
Hi, I'm Mike Ferris, Chief Strategy Officer and longtime Red Hatter.
I love thinking about what happens next with generative AI.
But here's the thing.
Foundation models alone don't add up to an AI strategy.
And why is that?
Well, first, models aren't one-size-fits-all.
You have to fine-tune or augment these models with your own data, and then you have to serve
them for your own use case.
Second, one-and-done isn't how AI works.
You've got to make it easier for data scientists,
app developers, and ops teams to iterate together.
And third, AI workloads demand the ability
to dynamically scale access to compute resources.
You need a consistent platform,
whether you build and serve these models on-premise
or in the cloud or at the edge.
This is complex stuff,
and Red Hat OpenShift AI is here to help.
Head to redhat.com to see how.