Programming Throwdown - Wolfram Language and Mathematica
Episode Date: January 15, 2019
Happy New Year! Today we are sitting down with Stephen Wolfram, inventor of Mathematica, Wolfram Alpha, and Wolfram Language! In this super interesting episode, Stephen talks us through his journey as a mathematician, software architect, and language inventor. It was truly an honor to talk to Stephen and hear about his decades of experience. Check this interview out and give us feedback! Show notes: https://www.programmingthrowdown.com/2019/01/episode-86-wolfram-language-and.html
★ Support this podcast on Patreon ★
Transcript
Programming Throwdown, episode 86: Wolfram Language, with Stephen Wolfram. Take it away, Jason.
Hey everyone, happy new year! Welcome back, and we are starting the year off with an amazing interview.
You know, very rarely do we get the opportunity to interview someone who actually invented a language.
Every show we always talk about a language.
And here we have Stephen Wolfram on the show.
And Stephen, why don't you kind of describe yourself and Wolfram research and kind
of walk us through your journey? Oh boy. Okay. This is a describe yourself in a few words. I
remember I could tell you some interesting stories about times when people have said that to me, but
All right. Well, we have an hour, so you have plenty of words.
Well, let's see. For the last 32 years, I've been working on what's now Wolfram Language, and a couple of things that many people will know about have come out of that. One is Mathematica, which was first released in 1988, so we just had our 30th anniversary. And the other is Wolfram Alpha, which came out in 2009. And Wolfram Language is kind of the core of those things. I kind of started working, well, gosh, let's see. What's the story? Well, I got really
interested in physics when I was about 10 or 11 years old, which is a depressingly long time ago now.
That's the beginning of the 1970s. And one of the things about doing physics is you have to
calculate all kinds of things with math and so on. And I was very enthusiastic about physics and
very unenthusiastic about calculating things with math and so on. And so I kind of said, this is really boring and very mechanical
and should be able to be automated.
That was about 1972, 1973, when I first got exposed to a computer, which was a big thing the size of a large desk and so on,
programmed with paper tape and all those terrible things. But I started trying to figure out
how to essentially automate the computations that I might want to do for things like physics.
And that worked really well. And I discovered all kinds of interesting things in physics.
And I got my PhD when I was 20 at Caltech. And then I kind of, at that moment, actually, I was like, okay, I've been using all
these computer tools and I've kind of outgrown the ones that exist today. If I want to do all
sorts of science and things that I'm interested in, I'm going to have to basically build my own
tools. So I started building.
So can you dive into that thought process a little bit? How did you decide that you had sort of outgrown the tools?
I mean, people say Turing complete and all that stuff, right?
I'm sorry?
Oh, I was just going to say, you could say, oh, you could do anything in assembly language.
You could do anything in C.
So it actually requires, I think, a lot of, to some degree, experience,
or maybe wisdom is the right word, but to recognize that, you know, the productivity
hit you would take from having to build something, some tool from scratch is justified by the gain
you would get once you have that tool, right? Yeah, it's been interesting in my life. You know,
I spent, for example, a decade doing big basic science projects starting in 1991. It was after the first version of Mathematica came out. And I realized, it would have taken me 30 years to do
what I managed to do in 10 years. So with the five years before that that I spent developing Mathematica as a tool, in the end I came out way ahead relative to saying, I'll just cobble
together what I need to be able to do the things that I want to do. But back in 1979, the story was in a sense simpler because I had been, so the idea of sort
of doing mathematical and algebraic computation by computer was one that had originated in the
1960s. People had built a variety of kind of research systems for doing that. And I had used most of the ones that were at all practical. For some strange reason, other people never used them. It was like there were all these physicists and they were spending all their time with,
you know, pencil and paper and so on calculating things. And it was like, no, you know, I can just
use a computer to do that. So at that time you're talking like Fortran, mainly Fortran, right?
Well, at that time, most theoretical physicists did not use computers.
I mean, what happened was if you needed a computation done, you would find some programmer who would go off and do the computation.
And in terms of programming languages, yeah, Fortran was the main one.
It was still Fortran 66.
And, you know, if you were a real programmer,
you used assembly language.
And I remember people telling me,
oh, even in the 80s, people telling me,
oh, you know, all these high-level languages like C,
they're never going to catch on.
You know, if you want to do something serious,
it has to be assembly language.
Yeah, yeah.
That's kind of a story that will repeat, because in terms of Wolfram Language, you know, it's kind of a higher-level language than ones seen elsewhere. And people still say, oh, if you want to do something serious, you have to use some lower-level language. It isn't true. And, you know, I'm very confident about the course of history in terms of that.
Yeah, if you're willing to play the long game, you know, Moore's Law always works in your favor.
And eventually, almost any of these optimizations that you get from C end up not being that important.
But I don't think that's the point.
I think that the real point is it's sort of the conceptual level that you can reach.
I mean, we're jumping around a lot. And this is super interesting stuff, but if you want to go back to the narrative of how I came to start.
No, no, jumping around is fine. Yeah. Let's talk about the trade-offs between,
say, Mathematica and writing it in C. I think that's interesting.
Well, okay. So first thing you have to understand is in terms of Wolfram language,
how do I view that?
What do I think of it as?
I view it as a computational language, a little different from a programming language.
It's a super productive programming language, but more importantly, it's a computational
language.
It's a language for communicating computational ideas in an explicit, concrete form.
And it's a language that allows computational ideas to be
communicated and understood both by people and by machines. So kind of my goal in Wolfram Language
is to do something really very different from the goal of other kind of the traditional
programming languages. You know, the traditional programming languages, it's kind of, okay, I've got my computer in front
of me.
I need to tell it how to operate its transistors, so to speak, in the best possible way to get
what I want done, done.
What I'm more interested in is how do I represent in a language the way that I think about something computationally?
And then it's the job of, you know, us as the implementers of the language to get the computer
to take sort of the goals that humans have expressed and get them implemented as efficiently
as possible. Now, in terms of, you know, what we've tried to do in Wolfram Language is to build as much knowledge as possible into the language. So, you know, in a typical language, there's a fairly small core language, and then maybe people add libraries and all kinds of other things to it.
But in Wolfram Language, the idea is have it have as much as possible built in in a coherent way into the language.
And that's a lot of work. I mean, you know, for the last 32 years, basically, I've spent a large fraction of my time just trying to build lots of stuff
coherently into the language. I mean, pretty much, you know, every day, I'm working on kind of the
design of the language. In fact, for almost a year now, I've been live streaming a bunch of our internal design review meetings. I think we just passed 250 hours of live-streamed design review meetings.
So that's to the public? So anyone can just watch that?
Yeah, absolutely. It's on Twitch and Facebook Live and YouTube Live.
Wow, that's awesome.
It's really a pretty interesting dynamic. I mean, there are some pretty interesting, sophisticated people who tune in, and they give some great
comments.
I think we've done better design as a result of things people have suggested.
It's always interesting. Like, today, let's see, I did one today. And I did a couple yesterday, actually. One yesterday was interesting. It was about, well, this relates to the whole thing about the language knowing stuff.
So one of the things it knows is,
it knows geographic data about the world.
So it knows, you know, the locations of all the cities,
the populations of all the cities.
We happen to have curated the borders
of all historical countries that are known.
So there's a few thousand historical countries from, you know,
the Roman Empire to, you know, the different gyrations
of different countries and so on.
And so we got all that data, and the issue is how you deal
with the language design of talking about a historical country.
So, you know, we were starting off looking at, well, I was like,
let's look at the Roman Empire in 55 BC.
And then one of the guys who's worked on this stuff said,
well, actually the Roman Empire didn't exist in 55 BC.
It was the Roman Republic in 55 BC.
And it's like, okay, what does it mean for the entity?
You know, what counts as France, for example?
You know, it's changed its name at
various times. And so, you know, so these are questions that are an interesting mixture of
sort of real world issues and kind of precise language design. But more than that, this was
actually mostly about how do you think about the sort of time series character of country borders,
given that one only knows them every year,
and how does one deal with the fact that one has a time series which kind of holds to one side,
but not to the other, etc., etc., etc. But this is kind of the story of my life, is trying to figure out how to take these kinds of things and make a precise symbolic
language in which one can talk about, let's say, historical countries or today's live
stream thing happened to be about the display of data sets and figuring out how to parametrize the way that a hierarchical data set is displayed
and elided and so on.
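The "holds to one side" behavior he describes is essentially a step-function time series: each recorded border is taken to be valid from its own date up to, but not including, the next recorded date. A minimal sketch in Python of that lookup rule (the dates and polity names here are illustrative, not Wolfram's actual data model):

```python
from bisect import bisect_right

def value_at(snapshots, year):
    """snapshots: list of (year, value) pairs sorted by year.

    Returns the value of the last snapshot at or before `year`: the
    series "holds" rightward from each snapshot until the next one.
    """
    years = [y for y, _ in snapshots]
    i = bisect_right(years, year) - 1  # index of last snapshot <= year
    if i < 0:
        raise KeyError(f"no data at or before {year}")
    return snapshots[i][1]

# Hypothetical data: which polity held Rome, by (negative = BC) year.
rome = [(-509, "Roman Republic"), (-27, "Roman Empire"),
        (395, "Western Roman Empire")]

value_at(rome, -55)  # "Roman Republic": in 55 BC the Republic still holds
value_at(rome, -27)  # "Roman Empire": a boundary year takes the new value
```

The asymmetry in the transcript falls out of `bisect_right`: the interval is closed on the left (the snapshot's own date counts) and open on the right (it stops the instant the next snapshot starts).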
So one thing that has always interested me is this idea of sort of push versus pull when
it comes to data.
So for example, if we take, let's say, Google's web mirror, like Google's indexer,
right, they crawl the web, they gather a bunch of data. But the websites themselves are not
particularly participating. You know, they're sort of involuntarily participating.
Yeah, I mean, what we've done is very different. I mean, you know, what we've been interested in
for the last, I don't know, 20 years or so is collecting and curating as much data as possible about the world.
And so what we've done mostly is we work with sort of the primary providers of data, or we collect it ourselves, and we try and make as good as possible a kind of computable version of the world.
There's a big difference between data that's good enough for somebody to read it on a webpage
and data that's good enough to compute from.
You can say, well, you can do some NLP extraction of stuff and you can get it 80% right, and
that's fine so long as you've got a human to see that 20% of it is totally ridiculous. In fact, one use case that we've been much involved with
in computational contracts and smart contracts on blockchains and things like this. In the end, if you want those to allow one to run the world by machine-to-machine contract interactions, you have to have some source of facts about the world.
You know, you're selling weather insurance.
You want to know, did it actually rain yesterday in wherever?
And so, in fact, right now, the source in the world, for better or worse,
of computational facts is the Wolfram Alpha API, because we've collected
and made computable basically a huge range of kinds of facts in a way where those facts can
actually be used as inputs to computation in a consistent fashion. Oh, I've heard about this
where there's a, I think it's a betting site where you can bet on facts. Basically, you know, there's a prediction for tomorrow, and it's all done on blockchain.
Yeah, yeah. I mean, you know, that particular activity egregiously violates our Terms of Service.
Okay, all right.
People do it all the time. You know, one of the issues is, it's like, who's liable if we say so-and-so won the election in Bolivia or something?
And, you know, we do our best to get it right.
But it turns out, you know, a lot of people bet on it one way and it goes the other way.
It's like, how does this work?
I think what's emerging is whatever Wolfram Alpha says is true is true, which puts us
in a curious position.
And we're sort of working to make our computational fact infrastructure, which deals with, you know, deciding things at that moment, so to speak, as robust as possible. But I think much of the computational fact, computable knowledge that we deal with is knowledge that wasn't stuff that happened today. I mean, we do deal with plenty
that happens today, you know, earthquakes,
flight times, you know, all this kind of stuff. But I would say the majority of stuff people are
using right now is sort of facts that might be, you know, properties of a chemical or, you know,
box office results from a movie or something like this. Things which are, you know, firmly in the past, so to speak, rather than things that,
you know, just happened and are uncomplicated to verify.
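Programmatic access to the kinds of facts he mentions goes through the Wolfram|Alpha API. As a rough sketch, a query to the Full Results API is just an HTTP GET with an `input` question and an `appid` key (endpoint and parameter names as I understand the public docs; `YOUR_APP_ID` is a placeholder, and no network call is made here):

```python
from urllib.parse import urlencode

# Base endpoint of the Wolfram|Alpha Full Results API (per its public
# documentation, as I recall it; you register to get a real app id).
API = "https://api.wolframalpha.com/v2/query"

def fact_query_url(question, app_id="YOUR_APP_ID"):
    # The API answers free-form questions ("did it rain in Boston
    # yesterday?") with structured result "pods" rather than a web page
    # meant for human eyes, which is what makes the answers computable.
    params = {"appid": app_id, "input": question, "output": "JSON"}
    return API + "?" + urlencode(params)

url = fact_query_url("population of France in 1952")
```

A consumer of the facts (a smart-contract oracle, say) would fetch that URL and read the structured pods out of the JSON, rather than scraping prose.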
What about, is there, so, you know, there were things like OpenCyc, for example,
where people could specify, you know, a list of facts about the world. Is there any way to sort of scale this outwards so that anyone can contribute, with some way to deal with Byzantine contributions and things like that? Have you thought about things like that?
Of course we have, yeah. We've had a whole program of volunteer data
curators and so on. It hasn't been particularly successful. I would say that there are particular well-delineated types of data where people have been very good at contributing and been very helpful to us.
Like, for example, properties of fictional characters in books.
That was done by a big volunteer effort.
Oh, okay.
For more precise kinds of data, usually there's some sort of professional data collection operation that's doing it, or nobody has done it, but to do it consistently is non-trivial.
What we found is we employ lots of data curators, and we found that's more efficient than, you know, having volunteers do it because it's just, you know, we can train them.
They use a bunch of nice tools, which we could make available to people.
But, you know, it takes some training to get to be a good, efficient data curator.
And, you know, it's not everybody's cup of tea.
That makes sense.
I think there's also great ambiguity in the structure of almost any data, and so if you just give it to this heterogeneous group of volunteers, you're very rarely going to get a consistent structure.
Yeah. One of the challenges of data curation, and a thing I hadn't really realized, is that we've built this whole culture of doing data curation, and I hadn't really realized how unique that is.
Because we do a bunch of work with some of the world's largest companies, making basically private versions of Wolfram Alpha.
So the Wolfram Alpha that people know that powers Siri and also has just started powering Alexa, that's working on public data.
But we've also had a sort of growing business on making enterprise versions of Wolfram Alpha,
where we're taking the internal data of some large company and letting people ask unstructured questions about that internal data,
making use of public data plus
their internal data. So somebody might say, what were my sales between Christmas and New Year
in sub-Saharan Africa or something? And for that, you have to know things about the world,
like Christmas and New Year. You have to know what's sub-Saharan Africa. You have to be able
to know currency conversion rates, all those kinds of things. And to answer that question, you have to go be hooked up to the internal databases of
company X to go and answer that.
So we've done a bunch of those things.
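To make the "sales between Christmas and New Year in sub-Saharan Africa" example concrete, here is a toy sketch of the resolution step: public knowledge maps the phrases to computable values (a date range, a country set), which then filter the company's internal rows. Everything here, including the three stand-in countries, is invented for illustration:

```python
from datetime import date

# Stand-in for public, curated knowledge: phrases resolved to
# computable values. (Real sub-Saharan Africa has ~46 countries;
# three stand in here.)
PUBLIC_KNOWLEDGE = {
    "christmas to new year 2018": (date(2018, 12, 25), date(2019, 1, 1)),
    "sub-saharan africa": {"Nigeria", "Kenya", "Ghana"},
}

# Stand-in for "company X" internal data: (date, country, amount) rows.
internal_sales = [
    (date(2018, 12, 26), "Kenya", 1200.0),
    (date(2018, 12, 30), "Ghana", 800.0),
    (date(2018, 11, 2), "Kenya", 9999.0),    # outside the date range
    (date(2018, 12, 28), "France", 5000.0),  # outside the region
]

# Resolve the phrases once, then answer against internal data.
start, end = PUBLIC_KNOWLEDGE["christmas to new year 2018"]
region = PUBLIC_KNOWLEDGE["sub-saharan africa"]
total = sum(amt for d, c, amt in internal_sales
            if start <= d <= end and c in region)
# total == 2000.0
```

The point of the sketch is the split he describes: the date range and the country set come from shared world knowledge, while only the final filter touches the company's private database.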
And one of the things that's been interesting is that people are sort of horrified.
You mean you actually have to go and do things with humans to make all this stuff work?
Well, yes. I mean, you know, we do what we can. We've got very leading-edge machine learning stuff and NLP stuff and so on. But if you really want to make computable data, you can't use that stuff alone. You can use it to help prime various pumps, but in the end it's humans; there's a large volume of work to be done, and there's also judgment to be exercised.
And it's a tricky management issue how you propagate judgment calls up and down the chain of people who are working on this type of thing. You know, at what point do you propagate, I don't know, like if you're doing image curation, does this really count as an elephant, or is it a model, a plastic elephant? Do we really call this an elephant, or do we call it a toy? At some point somebody has to make a decision like that. But you have to kind of propagate that to the point where that decision is made by somebody with appropriate experience and who knows how it fits together with the rest of the system. But the person who's actually looking at pages and pages of images isn't the one who has to be figuring that out all the time.
I feel like there's something kind of philosophically profound about that.
You know, there's basically two types of jobs.
There are jobs that have almost infinite scalability
and then there are jobs that don't.
So for example, being a musician now
is one of these jobs that scales
in the sense that some famous musician will go in a studio and record a song once,
and then it gets played on the radio over the whole world to millions of people.
Being a software engineer is also one of these scalable jobs where you can design some software that's used by billions of people
with only, let's say, a few hundred engineers, right?
Yeah.
And everyone is really worried about, say, automation, taking a job that isn't scalable,
like truck driving, and turning it into a scalable job.
But in your case, you're kind of doing the opposite, where you are taking a job that
had been done with, say, NLP poorly, and you're changing the scale, changing that
sort of dynamic to get something that has really high fidelity.
And it's funny how even intuitively, on the surface, my first reaction is, oh, that doesn't
scale.
But maybe that's actually a good thing, right?
Maybe what we need is to have more of a human touch.
Oh, I don't know.
I think the fact is the thing to understand is what does it mean it doesn't scale?
The world is finite.
You know, when we started working on Wolfram Alpha, I had been interested in building something like Wolfram Alpha since I was a kid.
I've been interested in kind of being able to sort of collect information, answer questions from it, and so on. I mean, horrifyingly, I found a bunch
of stuff I did when I was like 12 years old, which is sort of pre-WolframAlpha stuff with,
you know, typewriter rather than, you know, data centers, so to speak. But it's always upsetting.
In a sense, I don't know whether it's upsetting
or satisfying to discover one's really been doing the same thing all one's life.
But anyway, I was interested to, you know, I'd been interested in kind of, can you
make computational knowledge real since I was a kid? And, you know, I had actually always assumed
that one had to kind of invent the whole of AI to do that. And eventually,
as a result of a bunch of science I did, I kind of realized that wasn't the case.
But when we started doing the Wolfram Alpha project, it was like, okay, let's take the team over to a big reference library. And let's look at this reference library. And it's like, it's a big
thing with, you know, full of books. And my basic statement was, okay, over the next few years,
we're going to grind all of the knowledge that's in here, and we're going to make it computable.
And, you know, it's a daunting task, but it is finite.
You know, it's a finite-sized room, so to speak.
And we got much more knowledge than was in, you know, the big reference library.
But it's worth realizing that, you know, our civilization has only collected sort of a finite amount of knowledge about things.
It's always surprising, you know, the amount of sort of purely structured data that there is in the world is quite big compared to, for example, the text content of the web.
I mean, I don't know whose estimate you want to take, but let's say the web is, I don't know, some modest number of tens of billions of pages
of text. There's actually much more than that in the structured data that even we have for Wolfram Alpha. So in other words, even if you say, well, gosh, let's use the scaling of the web
and try and deduce knowledge from it, there actually isn't that much there.
That makes sense, yeah.
But I think, for me, there are two big multipliers that go beyond the sort of curate the data
type thing.
One is knowledge about computation, knowledge about algorithms, how you compute things from
that data.
Because actually, most of the time, people don't just want that one number that was sitting in a database. They want
something that was the particular thing they wanted to know that is computed using some method
or model or algorithm or something from the raw data that might sit in the database or whatever.
Right. Or they're even looking for some signal. They want to decrease the entropy. So for example,
they have a bunch of sales data and they want to know which countries are outliers. So they want some decision tree that's going to say, this area of the world is performing great, this area of the
world is performing poorly, and they want your system to extract that high level information.
Right. But you see now, take that
example. It's an interesting example because it's an example where the kind of knowledge-based
approach that we use in Wolfram language is important. Because let's say you've got that
raw sales data. You've got the table of numbers. You've got the number for Luxembourg. You've got
the number for Germany. You've got et cetera, et cetera, et cetera. On its own, those numbers aren't all that significant because if you don't know, let's
say, the population or the GDP or the number of internet users or something in each of
those countries, it's hard to make all that much from it.
And so it becomes really important to have good computable data that you can use as part
of the kind of pipeline of actually doing things.
You could say, well, I'll go scrape that data from somewhere.
Okay, great.
Then you scrape the data, you discover that countries that have spaces in their names
don't work, et cetera, et cetera, et cetera.
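The Luxembourg/Germany point can be made concrete with a small sketch: raw sales flag the biggest country, but joining with an external fact like population, the kind of computable data he's describing, flags the real outlier. All figures here are invented:

```python
from statistics import mean, stdev

# Invented per-country figures: sales roughly track population, except
# Luxembourg, which punches far above its weight per capita.
population = {"Luxembourg": 615_000, "Germany": 83_000_000,
              "France": 67_000_000, "Austria": 8_900_000}
sales = {"Luxembourg": 90_000, "Germany": 4_000_000,
         "France": 3_300_000, "Austria": 440_000}

# Raw numbers alone just rank country size: Germany "wins".
biggest_raw = max(sales, key=sales.get)

# Joining with population makes the numbers comparable.
per_capita = {c: sales[c] / population[c] for c in sales}
mu, sigma = mean(per_capita.values()), stdev(per_capita.values())

# Flag countries more than one standard deviation from the mean.
outliers = {c for c, v in per_capita.items() if abs(v - mu) > sigma}
```

Without the population join, the "insight" is just that Germany is big; with it, Luxembourg stands out as the genuinely anomalous market.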
What I've been trying to do is to kind of add this layer of automated computational intelligence that people can kind of take for granted in computing things. You know, decades ago, when the first computers were coming out, it was like, well, you could take
for granted the fact that there'd be some, what was called a high-level language like
COBOL or FORTRAN on that computer.
Then a few years later, you could take for granted there'd be an operating system on
the computer.
Then a few years later, take for granted there'd be some kind of user interface or some kind
of networking support on the computer.
And what I'm trying to do, basically,
is to let people take for granted a sort of layer of computational intelligence that they can expect
to find on a computer. And where we've tried to take sort of the knowledge that our civilization
has accumulated and try to sort of inject that as something that can be automatically accessed
by anybody. That's the bigger picture of what we're trying to do and a couple of branches of
doing that. In Wolfram Alpha, we're trying to make a consumerized drive-by version of that
where you just ask a question with natural language. You can ask it by voice through intelligent assistance,
or you can type it on the website or whatever.
And in natural language, let me ask a question like, what was the population of France in 1952 or something; that's something one can easily ask as a kind of natural language question.
And that's what we're doing with Wolfram Alpha. With Wolfram Language, we have a precise language
in which a question like that can be expressed. It's very simple to express, but in which one can build up much more sophisticated kinds of things. It provides a more
precise way than human language to represent what one's talking about when one thinks about
things computationally. I mean, a way to think about this that sort of emerged recently is when
we think about contracts, people want to say, I want to say precisely what's going to happen.
Well, they try and write it in English. Actually, they wind up with some kind of legalese that's some kind of very stilted English
often. But what we're trying to do is to make it possible to say what you want to have happen in
code, in precise computational language code. But that code, in order to talk about what you want
to have happen, it has to be able to talk about things
in the world.
It's no good to just say, okay, I'm going to declare that there's an array of integers
of length whatever.
That's not really good enough.
You have to be able to say, if the person is more than 50 miles from this place, then
whatever.
That requires that you understand geo positions and distances and,
you know, all this kind of thing. You have to actually understand stuff about the world to
be able to express these things. But what I've been trying to do is to make a language in which
you can express those kinds of things, but you can do it precisely and you can do it in a way
where you can kind of build up many, many layers of complexity.
So when you interact with Wolfram Alpha, the typical thing will be a fragment of a sentence or maybe a whole sentence. When you look at what people do with Wolfram Language, well, ours is probably the biggest code base we have. I don't know what our total is, but I know Wolfram Alpha is about 15 million lines of Wolfram Language code. I think our whole Wolfram Language code base, which is mostly written in Wolfram Language now, is around 50 million lines.
So it's rather dense code, because it's a very high-level symbolic representation of things. And kind of the idea, my goal, is to let people, in every line of that code, make use of all the knowledge that our civilization has been able to accumulate, and, you know, not have to build everything kind of from the sand up, so to speak.
Yeah, that makes sense. I think it's fascinating. One of the things that I've felt in
the past is the way, and then I'll jump to something else, but the way Google has sort
of crawled the web, it's really kind of, I mean, you know, it's a great service, but one of the
issues with that is the data, as you said, hasn't been very structured and other people haven't been able to play a role.
Even if you don't have volunteers, you're still going out to these data providers who are ultimately going out to the actual source of that information.
And there's this sort of two-way conversation.
And I think that adds a lot to the richness of the data, which I think is awesome.
How do you deal with, let's say, questions around saliency?
So if someone wants to say, you know, tell me something interesting about my profits in the past 10 years or something like that.
Like, how do you take into account someone's mental model?
So when they ask something,
it's not just sort of a laundry list of details,
but you kind of take into account their intuition there, right?
Right. I mean, it's interesting.
When I started working on Wolfram Alpha, I kind of designed this framework for figuring out, okay, so what does the sort of automated report that Wolfram Alpha generates look like? And I thought we're going to have to iterate this a zillion times to make it, you know, useful
to people. Turns out, you know, with a bunch of fairly clever heuristics and things, we do
remarkably well. And, you know, because we obviously have data on what people click on and,
you know, where they read and so on. But it requires, you know, expert knowledge to build these things. So, I mean, those heuristics
are built by using the fact that, you know, somebody who actually knew a lot about this field
helped define those heuristics. If it was just like, let's automate it, let's try and put a
ranking algorithm in here and just do it based on, you know, we don't know
what this is. It's just a question of how the words are correlated and, you know, let's use,
you know, TF-IDF or something to do something. It turned out that that was actually one of the many things that I thought was going to be really difficult with Wolfram Alpha but wasn't. You know, we had a bunch of big advantages in terms of the way our algorithmic system, the framework, was built, and also the kind of way that we made use of, you know, experts all the way through what we've done. So making reports that are what people care about has not turned out to be the difficult thing.
Now, if you say, okay, let me – throw me some random data.
Tell me something interesting about it.
We actually have done that a bit.
We actually have the Pro version of Wolfram Alpha, where you can upload data sets, and it will try and do things like that.
I don't consider that particularly successful. I mean, people find it amusing, but I don't consider it kind of a core thing that people really care about at this point.
That makes sense. I mean, I think it's profoundly difficult to come up with some kind of distance metric there.
Like if someone gives you a bunch of financial data
and you just project that into some latent space,
you'll end up with some distance metric,
but it probably doesn't match their mental model.
And things that you think are really close
or really important actually aren't. Okay, so this, I mean, it happens to be something I've thought about a lot is the theory
of interestingness. So in other words, if you've, so I've studied a lot kind of simple programs out
in the computational universe, things like cellular automata and so on. And so you look at lots of
these cellular automata and you say, which ones of these are interesting, right? You know, you look at them and some of them look... oh, that's kind of cool, it does all these things. But which ones are interesting?
Or you can do the same thing for chemicals, let's say.
You can start enumerating possible,
you know, hydrocarbon structures and you can say,
which ones of these are interesting?
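The kind of enumeration Stephen describes is easy to reproduce. Here is a minimal sketch of an elementary cellular automaton (Rule 30, one he has written about extensively) in Python rather than Wolfram Language, where the built-in CellularAutomaton function does this directly:

```python
def step(cells, rule=30):
    """Advance one row of 0/1 cells by an elementary cellular automaton rule.
    Each new cell is the rule-table bit selected by its 3-cell neighborhood."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

# Evolve a single black cell for a few generations
row = [0, 0, 0, 0, 1, 0, 0, 0, 0]
history = [row]
for _ in range(3):
    row = step(row)
    history.append(row)
for r in history:
    print("".join("#" if c else "." for c in r))
```

Changing `rule` enumerates all 256 elementary automata; deciding which of the resulting pictures is "interesting" is the hard, cultural part.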
So interestingness is a very cultural kind of thing. That is, what is interesting to somebody depends on the whole history of how they got there.
And for us as a civilization, when we say, well, what's interesting to invent?
Or what mathematical theorems are worth proving?
Those are things which are very history and context dependent. And in fact, it's really, so, okay, in my world as a language
designer, here's a way to think about this. So, you know, you can imagine sort of concepts in the
world and you could say, well, which ones of these concepts are interesting? You know, we see stuff,
let's say, you know, in Stone Age or something, we saw, you know, I don't know, you know, different
cracks in mud or something or different kinds of puddles or something like this.
And at some point we decided, well, puddles were interesting enough that we would give them a name.
And so as human language emerged, you know, somebody said, OK, that's called a puddle.
And then, you know, people could discuss puddles and they could discuss lakes and they could discuss other things for which they have names, right? So in other words, there was sort of a... this cluster of things in the world condensed into a concept that was interesting enough to be given a name.
That's right. And it's actually, as you said, culture dependent. You know, the one that everyone talks about is how Eskimos have, I think, 50 words for snow, and that's just because snow occupies a big spot in their mental model of the world, right? Because they see so much of it.
Right, right, exactly. But I think... and so, okay, so what does a language designer do? Well, a language designer has
to think about, you know,
what are all the computations people might want to do,
of which there is sort of an infinite collection.
And if you, as I've done, you know,
study the sort of abstract space of possible computations and so on,
you realize there are even more of these possible computations to do.
The question is, which of these are interesting enough to give them names,
to make them primitives
in your computational language, and to have people think in terms of them?
And that's, you know, that I think is the sort of at a meta level, that is the role
of the language designer, is to figure out, you know, what are the repeated lumps of computational
work that you should define as primitives and build your language out of?
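A concrete example of naming a "repeated lump of computational work": Wolfram Language has a primitive called NestList for "apply a function repeatedly and collect the intermediate results". In a language without that primitive, you rewrite the same loop each time; a Python sketch of the idea:

```python
def nest_list(f, x, n):
    """Python analogue of Wolfram Language's NestList primitive:
    returns [x, f(x), f(f(x)), ...] with n applications of f."""
    results = [x]
    for _ in range(n):
        x = f(x)
        results.append(x)
    return results

print(nest_list(lambda v: 2 * v, 1, 5))  # [1, 2, 4, 8, 16, 32]
```

Once the pattern has a name, you start thinking in terms of it rather than in terms of loops.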
And what's really interesting to me is that when you build your language and you identify
certain primitives and you give people this language, they will start thinking in terms
of that language.
And so one of the achievements, I think, with Wolfram Language is that lots of things have
been invented and discovered with the language and with Mathematica and so on
over the years.
And I can't quantify it, but I think
some significant fraction got discovered
because we gave people a framework
for thinking about things that was of a computational nature
and that sort of condensed their kind of concepts
into these things, into these kind of conceptual anchors that are the
primitives of the language. And I think that that's, and so the way that works, it's kind of
an interesting spiral because as you build certain primitives, you get to think about things in
different ways. And when you've thought about those things in different ways, then you can
build another level of primitives. And we're kind of continually in the history of sort of civilization, we're continually building these layers of abstraction as a result of having
successfully sort of condensed our concepts into definite things that we give, for example,
names to. That makes sense. I mean, that's why, you know, if you go back, I'm probably not going to get this chronologically right, but if you go back far enough, you get to the point where only the most brilliant mathematicians could understand geometry. And now you can't get
a high school certificate without knowing geometry. And that's only because, as you said,
we've compacted that and we've made that so innate and so unconscious that it can be broadcast to
everybody. Yeah, that's right. I mean, I think one of the things that one can think about about education and the history of sort of development
of civilization again is, you know, as time has gone on,
we know more and more stuff.
And you might think, oh, my gosh, that means we have to, you know,
people have got to be educated for 100 years
in order to be sort of functional in the world
because there's all this stuff to know.
But the thing that that's ignoring is the fact that there have been these moments of
kind of where these kind of moments of unification and abstraction that happen that allow one
to take something like what you're just describing.
You don't have to learn every, you know, you don't have to learn every theorem in Euclid. You can know certain principles of geometry, and that allows you to
figure out a lot of things without having to go through every detail to get there. And that's,
but, you know, the thing that's important about that from the point of view of language design,
computational language design, it's kind of the same story. It's like, what do you have to put into the language that gives people a framework for their thinking that lets them make use of this process of abstraction?
And as far as I'm concerned, people don't yet understand the role of computational language in framing the way that people think about things. It's a funny thing
because people talk about the Sapir-Whorf hypothesis for human natural languages. And that's the hypothesis that the words in your language define how you think about things. And people argue about how valid Sapir-Whorf is, you know, whether it's really a significant thing or not. I think it's a real thing in human language, but it's somewhat weak in human language. In computational language, it's extremely strong. That is, in the language we've developed sort of a way of thinking about things. And you can kind of see that, you know, I know for myself,
and I can see with other people who are kind of fluent Wolfram Language users, I would say
speakers. It's a funny thing. Years ago, I happened to visit this group of, I think they were 11-year-olds at the time,
who had been learning Wolfram language and pretty smart kids.
And that was the only time I've ever heard people actually speak our computational language.
And it was a very bizarre experience for me because, you know, I invented this language,
but yet I couldn't process it fast enough.
You know, I'm just not used to hearing it as a spoken language.
It's very, very bizarre.
But anyway, I mean, people, it's, you know, it's interesting to me that, for example,
when I'm trying to think about something, you know, I can start typing Wolfram Language code much faster than I could explain to you what I'm about to do. So that's, you know, what it's like to be sort of fluent in that kind of computational thinking. And with our computational language as a way of expressing that computational thinking, the thoughts are forming, so to speak, around what will emerge as that language, rather than the thoughts forming in a way that I could express in human natural language and then sort of translating into computational language.
That makes sense. What about, I mean, you know, things tend to be moving towards,
let's say more of like these, what's the right word, like signal processing or statistical
based approaches where you don't actually know anything atomic. You just throw a bunch of data into some, you know, layered system, some neural network or something that's just this composition of embeddings, and then you cross your fingers kind of on the other side, right? And so, with that sort of trending, and people wanting to sort of automatically extract signal, where does Wolfram Language kind of fit into that? My guess is it would provide a very nice sort of basis set of data that then someone can go and do that kind of thing with. Or maybe it's that, no, those people are totally wrong and really...
Yeah, no, I don't think that's quite the right way to think about it. I mean, it's the same as the way that we use... so for a long time, these kind of soft tasks, like identifying whether this is a
picture of an elephant or a teacup, those were very hard for computers. And, you know, nowadays
with modern machine learning, that's rather easy. And so there are these tasks that had
traditionally, these kind of soft tasks that had traditionally been very hard. And what's happened,
you know, if you look at how that's interacting with Wolfram Language... I mean, we happen to have a rather wonderful sort of highest-level machine learning framework; what people have right now happens to be the thing we built on top of MXNet, you know, which is a low-level framework for machine learning. But, you know, the way that interacts with Wolfram Language is there's a function called ImageIdentify,
and the innards of that function make use of all of this kind of soft neural net stuff.
But at the end of the day, it's like, okay, we'll take a thousand images and we run image
identify on all of them, and then we start making histograms of how many rhinoceroses
and how many eagles were there in those pictures and so on. So in other words,
these kind of sort of soft things become elements of what is then ultimately going to be this big
structure that we build. I mean, if we think about it in terms of the way that humans work on stuff,
again, let's imagine it's a legal contract.
Somewhere in the legal contract, it will say,
if the bananas are too ripe, then they'll be rejected.
And that, if the bananas are too ripe, that's a soft question.
The whole chain of things around, then they'll be rejected,
and then this payment will happen, and that will happen and so on. That's all sort of symbolic structure. I mean,
we look at it in terms of how we as humans act, you know, symbolic structure is what we tend to
represent, like in a conversation like this, with our human language. But there are other things that we do that are the way that we automatically process
things in our visual system where we say, yeah, I'm looking at a bottle of water or something.
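That division of labor, a soft classifier feeding precise symbolic computation, can be sketched as follows. Here `identify` is a hypothetical stand-in for a neural classifier like ImageIdentify, not a real API:

```python
from collections import Counter

def identify(image):
    """Hypothetical stand-in for a soft classifier such as ImageIdentify.
    A real version would run a neural network; this stub just returns
    a pre-assigned label so the shape of the pipeline is visible."""
    return image["label"]

# A thousand images would go here; three are enough for the sketch
images = [{"label": "rhinoceros"}, {"label": "eagle"}, {"label": "rhinoceros"}]

# The soft step feeds a precise symbolic step: a histogram of labels
histogram = Counter(identify(img) for img in images)
print(histogram)
```

The classifier's output is fuzzy, but once it is reduced to symbols, everything downstream, counting, payment clauses, rejection rules, is exact.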
That's something that is happening at that level. Now, there is an interesting trade-off that's
happening right now. So there's certain kinds of tasks where it's absolutely true. A really good way to do it is you just give it a bunch of training examples.
You let it learn.
It's been interesting.
What does it mean to do data curation in a world of machine learning?
So for example, we have this neural network repository.
And the importance of that is that we are not there.
The main thing we're learning is things like feature extractors.
Right.
So, you know, that is a very useful piece of curation. It is sort of the fact that we now have a really good feature extractor for images, or, you know, a pretty good feature extractor for sounds, for example, as well. That then allows us to compute a lot of things.
And that's kind of a different kind of data curation.
I mean, just like when we were first doing sort of modern machine learning stuff, maybe
five, six years ago, one of the big advantages that we had was that we were very experienced
at doing data curation.
So when it was a question of, okay, take all these images and figure out what they're of, it wasn't just, oh, we'll just search the web and see what tags they have.
We actually had good processes for being able to inject some human knowledge into being able to do that. But I think there's another thing, which is sort of the trade-off between what is, you know,
what's some, when do you have a model of things?
I mean, so, for example, is machine learning the end of science?
It's an interesting question.
In other words, you know, one has been interested in physics,
for example, in saying, let's understand, let's write down some theoretical model
for this process, and let's work through that theoretical model and see what consequences it
has. That's sort of, you know, way number one for doing it. Way number two for doing it is say,
let's just take a bunch of examples and try and machine
learn the results. So for example, let's say you're trying to work out that sound your washing
machine is making. Is the washing machine about to blow up or is it perfectly happy? Well, one
approach you can take is to use, like, we have this whole system modeling, systems engineering system. You can actually have some sort of engineering model of your washing machine and you can try and compute from
it the resonant frequencies and work out vibrations and so on and figure out
what's going to happen to your washing machine or you can just say let's just
measure 10,000 washing machines and throw it all into a neural network and
see what comes out. What we're seeing is that people who use our technology stack can do both of these things with it.
And it's interesting to see in the last few years,
there are some things where, well, the black box method is working really well.
Other things where the black box method just doesn't seem to be making it.
You actually need to kind of understand what's going on.
Yeah, exactly. That's exactly it. I look at it as sort of three layers. The bottom layer is what questions. So, you know, here's a washing machine: what will happen today, will it break or not? And so you can take thousands of washing machines. You can say, these ones broke today, these ones didn't. We have a bit of a sampling problem, so let's say take millions of washing machines, and you can figure that out. And, you know, automated methods are getting very good at that.
Right.
But then if you move up to the next level, those are kind of how questions, or control theory questions. And so I feel as if when you do the pure automated way at the bottom, and then you try to do
control on top of that, it becomes very difficult because you don't actually know what you're
controlling because the signals are just effectively coming from noise.
You don't know the difference between that data and noise.
And you've extracted signal from it,
but now you're trying to control based on that.
And then the top layer is causal analysis.
So why am I even taking this policy?
Why am I even doing this controller based on these signals?
And you end up sort of kneecapping
or handicapping yourself
when you do too much automation,
too much unstructured learning
at one of those lower layers.
You make it impossible to build the layer above it.
Well, I mean, one way to think about this is
if you ask, is there a role
for kind of computational language,
for symbolic language,
or should we just throw everything into a black box machine learning system? Well,
if you look at the difference between humans and other animals, you kind of see what the payoff is
for symbolic language. That's the way in which we differ. Yes, you can get your average beaver or something
to do many of the tasks that we can expect to do without this construction of symbolic
language that allows us to express abstract things and so on. There's a level that you can do, which we were unable to do with computers properly until quite recently, but now those become components. In our sort of human development, you know, all the stuff we've built, our civilization, was made possible by this idea of symbolic language and by the idea of being able to express abstraction and so on.
And that's, you know, that's in a sense what, you know, what someone like me is trying to capture
in the computational language that we're building. I mean, what's interesting about the computational
language is it's kind of another level of sort of communication. I mean, in other words, when,
you know, back in the day, well, you know,
there was, you know, the most basic level of communication for life is genetic. You know,
you pass to your offspring certain information. Then there's things where sort of things are
automatically learnt in each generation, like your visual system, you know, learns from the
actual objects and correlations it sees in the world. Your visual system, you know, just like a neural net,
learns how to see, so to speak. But then, you know, the big innovation of our species
is that we get to actually communicate stuff from one generation to the next,
using sort of abstract symbolic representation of ideas.
But still, we're kind of stuck because, you know, one generation can write the books,
the next generation can be like, oh, we don't understand the books.
You know, when you talk to somebody, it's like you form the thoughts in your brain,
you express those thoughts in human language.
But then the person on the other
end has to do a whole bunch of work in their brain to absorb that language, to fit it into
their thought patterns, to be able to take action on the basis of it.
The thing that's pretty interesting in terms of computational language is we don't need
that second step.
Once we've expressed something in computational language, it is immediately executable. And so, in other words, the way I see that in terms of sort of the AI world is we get to,
instead of just having to, you know, we could like talk to our AIs in human natural language,
and we could try and say, yeah, you know, we want you to do this and that and the other,
and the AI can try and figure out what one's talking about, or one can, if one has a good way of expressing one's human goals
in a precise computational language, and you just say, okay, here's what I want to have happen.
Now, it's up to the AI to figure out how to make that happen in the most efficient possible way,
but you've expressed what your
goals are in a precise fashion. I mean, I think that, again, in a sense, the role of automation is, you know, humans define what they want, then it gets actuated as automatically as possible.
But a piece to that is humans have to define what they want. And now that with computation and AI
and so on, the set of things that we can get done automatically is very broad. There's a big focus
on, so how do you tell, in a sense, how do you tell the AIs what you want them to do? And that's, again, that's,
you know, that's why I care about computational language is that gives us this bridge between
our human goals and human thinking and what we can get computers, computation, and so on to do.
And that's, you know, that's why, you know, I see it as being a pretty important thing in the history of developing the trajectory of human civilization.
It's an important moment that we're at right now where we can automate a very wide range of things. The question is just to say, you know, what do we actually want?
And then to be able to describe what we want.
And that's, you know, that's the, we have a program for middle school kids.
And I think the tagline, last I knew at least, the tagline was, who's going to tell the AIs what to do?
Nice.
And it's, I think the, I mean, it's an interesting thing in terms of education and what people
should be learning about.
I mean, this idea of sort of thinking about things computationally is, you know, this
is the really important thing of this time in history. I mean, we are,
the sort of the 21st century is the time when the computational paradigm basically took over
everything. Yeah, I think one of the things there is, I'm seeing a lot of research lately around
these generative adversarial networks, right? And so the idea there is you want to project things into some latent space.
That was sort of the big thing of,
let's say the last decade or the last five years
was this sort of compositional embedding
and how powerful that is.
And a lot of the unsupervised methods
got kind of wiped out by deep learning, right?
Now you're seeing sort of these approaches to try to
reverse the process. So you'll see, for example, someone will use a GAN to generate endless amounts of birds, you know, that was one of the research papers I saw. Or, for example, they would use a GAN to reverse an embedding and generate a text description from a picture. So there's a picture of a person sitting on a bench that's in the test set, and the training set didn't have that, but it had a lot of benches and a lot of people,
and the system was smart enough to sort of reverse that embedding and create language.
But it's an extraordinarily difficult problem.
And, you know, as opposed to images where people have understood that convolutional nets and these things have really captured the regularity of images, we haven't really
done that for language.
And, you know, definitely the research should continue.
But I think an alternative might be, you know, at the educational level, getting people to communicate with computers in a slightly different way.
That might be the low-hanging fruit there rather than trying to reverse engineer the entire history of human language.
No, no, I think you're right.
I mean, I think that the human language is decently optimized for humans communicating with each other.
You know, computers, you know, we are in the business.
You know, we built this whole NLU system that, you know, is used in Wolfram Alpha that's in the business of taking the random things that people say to their phones or their, you know, cylinders in their kitchens or whatever.
And, you know, trying to make sense of those. And for small utterances we can do quite well. For building up a whole complicated story, you could do it, but it's really a pretty silly way to do it now.
Right. And the state becomes a big problem. So if someone says,
you know, I mean, you hear these... people will showcase these examples, but they're always very contrived. So someone will say... you know, Google was showing this off at one point. Someone would say, you know, what's the movie with this actor? And they would say, I don't know, Black Panther. And the person would say, what time is that? And that subsequent query would have the context of the first query.
But it's so hand-coded.
You're getting a system to keep state of the conversation in an automated way.
No one's even come close to that,
and we expected so much more progress than we've got as a community
that that might not be a good problem to be solving.
Right. I mean, so here's an analogy.
So back 500 years ago, literacy was not common.
But written, you know, people didn't know how to read and write.
So certain kinds of things could be expressed by, you know, spoken things.
People had to memorize. They're probably better at memorizing things back then. But, you know, there was a time when, for example, if you, you know,
longer ago in history, like with lawyers, you know, they would read the law. Somebody would
recite, well, not read it. They would recite the law. They knew every law and they would just recite
the law to decide what, you know, whether somebody was doing the right thing or not. Then when literacy came in,
it enabled a much greater level of sophistication and richness. You could create things that were
much more, you could write books and many people could read them and so on. I think that, you know, this whole question about whether human natural language is good for expressing certain kinds of things... when it comes to building big structures, it's not very good. You know, if you imagine doing some big piece of software engineering where every piece of it was written in human natural language, it really wouldn't work very well. Now, a couple of things to say about that. So one thing I found really interesting. So,
you know, I've worked on Wolfram language for, what was it, 15 years or so before I started
working on Wolfram Alpha. Wolfram language is very precise language. Everything is precisely
defined and very, you know, there's a precise meaning to everything. Then I started working on Wolfram Alpha, where the idea is people should just talk to it and say whatever they want to say. And
there's no documentation. It's just like, it's our job just to understand those weird things that
humans say to it. Okay, so I decided to adopt a completely different design philosophy. So in
Wolfram language, you know, one of the things is I want everything to be as consistent as possible. I want to minimize concepts. I want to have everything be sort of
infinitely factorable and so on and so on and so on. For Wolfram Alpha, it's just a pure do what
I mean type story. And it's just like deal with whatever crazy things the humans say, right?
So it was interesting for me because those are two very different design methodologies. And, you know, it took me a little while to get used to the second one.
But then, you know, I used to be very afraid of heuristics.
I used to say never use heuristics.
It's crazy.
And then, you know, in Wolfram Alpha, it's all heuristics.
It's like what does a human mean when they say, you know, 50 cents?
Oh, they mean something about money.
If they say 50 cent, they probably mean a rapper.
Oh, yeah.
It's very... but if they say 47 cent, you know, that's probably a mistake for 47 cents.
So it's just all heuristics.
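A toy version of that heuristic, with entirely made-up rules just to show the flavor (the real Wolfram Alpha NLU is vastly more elaborate):

```python
KNOWN_RAPPERS = {"50 cent"}  # hypothetical lookup table

def interpret(query):
    """Toy heuristic: decide whether '<number> cent(s)' means money or a musician."""
    q = query.lower().strip()
    if q.endswith(" cents"):
        return "money"                   # "50 cents" reads as an amount
    if q.endswith(" cent"):
        if q in KNOWN_RAPPERS:
            return "musician"            # "50 cent" is probably the rapper
        return "money (probable typo)"   # "47 cent" is probably a mistyped amount
    return "unknown"

print(interpret("50 cents"))   # money
print(interpret("50 Cent"))    # musician
print(interpret("47 cent"))    # money (probable typo)
```

And as Stephen points out shortly, nothing here is modular: a new rapper named 47 Cent would silently break the last rule.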
Doesn't that, I mean, when you said that example, it made me feel sort of agoraphobic, right?
I mean, it just sounds like there's no end to the amount of heuristics that you need to add to the system, right?
Well, yes, that's what I thought, right?
I thought, oh my gosh,
the thing's going to drown in heuristics, right?
But what I learned after a while
is there's a logic to heuristics.
And yes, there are lots and lots of them.
I mean, it's a, for example,
if you're into software quality assurance,
doing SQA on natural language understanding system
is one of the scariest things you can do in that business
because nothing is modular.
The whole system, you can end up,
if there's a new rapper who comes on the scene called 47 Cent,
that just blew up a bunch of your NLU for
financial transactions and so on.
So it's a very weird thing.
But the point is, you know, in working on Wolfram Alpha, it was just this, you know, the logic of heuristics.
Working on Wolfram Language, it's the sort of very precise,
very sort of formal way of doing things. And I didn't know, you know, I thought these were
two different branches. Very interesting thing happened a few years ago, when we started to
basically build Wolfram Alpha capabilities into Wolfram language. What I realized is that there
are little fragments... like when you write code and you need a list of the U.S. states, for example, you just type control-equals, list of U.S. states. That's natural language. You hit return, and it turns that into a precise symbolic representation that then becomes something precise that you can use in your program.
And so what ends up happening is there are these little fragments of natural language that you use for things like that. You don't say, you know,
map this pure function over this using natural language. That's a total lose. But you do say,
give me the list of state capitals in the U.S. or something. That you can say with natural language.
So there's this interesting sort of, you know of connection between these two things. I mean, it's worth realizing that the software
engineering stack that you need to make all this stuff work is somewhat complicated because when
you're dealing with, for example, the thing I just said, let's imagine you're mapping over some list
of state capitals. Okay. That list of, and you're doing it in some, you know,
version of Wolfram language that's sitting on your desktop computer.
That list of state capitals lives inside our knowledge base in the cloud.
It has to... it's sort of a language where it can make use of this kind of vast pool of knowledge that exists in the cloud.
It's downloaded, you know, sort of magically downloaded and cached and, you know, all those
wonderful things.
But it's sort of interesting that, again, that's a different kind of software engineering
experience, so to speak, to what one might be used to with, oh, that's just a language
and it compiles into machine code type thing.
Yeah, it seems like that whole concept, you know, I wonder if the programming language needs to be attached to this sort of common sense reasoning database. Or maybe, you know, maybe they do have to be attached. But my guess is there's probably a large use case where, let's say someone's building an app, the vast majority of their app is going to be written in a specific language. And that language is going to be designed around
showing buttons and showing, you know, sliders and things like that. But then they could take advantage of this sort of knowledge database and this knowledge language. It could almost be embedded into...
Sure. You know, that's the less interesting way to do it, okay?
Okay. All right.
Yes, we have a great way of making instant APIs
where people can call us
just for little bits
where they have figured out
that we can do them.
What's much more interesting
is to think about everything
you're doing in this kind of
very high level symbolic way.
And because you get to do a lot more,
like let's say you're
laying out those buttons.
Okay, well, we have a symbolic representation for, you know, a user interface, that for the last 20 years or something has been a completely programmatically manipulable thing.
So it's something where, yeah,
you could write, you know, hard code
to lay out these buttons in this way,
but you can have a symbolic structure
that is, you know,
computed with nice functional programming
sort of on the fly
that does that button layout.
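The idea of a UI as a symbolic, programmatically manipulable structure can be sketched in plain Python. The `Button` and `Column` names here are invented for illustration; Wolfram Language's actual constructs work differently, but the principle is the same: the interface is a computed value, not hard-coded widgets.

```python
from collections import namedtuple

# Hypothetical symbolic widgets: the interface is just data, built by
# ordinary functions, and it can be inspected or transformed before rendering.
Button = namedtuple("Button", ["label", "action"])
Column = namedtuple("Column", ["children"])

def city_buttons(cities):
    # Compute the layout from data rather than hand-coding each widget.
    return Column([Button(name, "goto:" + name) for name in sorted(cities)])

ui = city_buttons(["Paris", "Tokyo", "Oslo"])

# Because the UI is plain data, it can be queried like any other value.
labels = [b.label for b in ui.children]
```

A renderer would then walk this structure to draw real widgets; the point is that the button layout is produced functionally, on the fly, from whatever data you like.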
And being able to think about that,
or you could say, you know,
make the buttons be, I don't know,
let's say a crazy thing,
you know, let's say you want to,
you're putting buttons to represent
which city you want to go to next.
Well, you can actually use geographic data to place the buttons in your interface.
It's kind of a stupid idea.
But, you know, the whole point is…
It could be interesting.
Maybe you have a spinning globe or something like that.
Yeah, right.
But, I mean, the point is being able to…
The big thing with Wolfram Language is it's free to do fancy stuff.
You know, with most…
If somebody says, oh, by the way,
I've got, oh, I'm going to have 10 cities and let me do a traveling salesman tour
through those cities. And somebody says, oh my gosh, I got to figure out how to write this
traveling salesman thing. Download a library. Sometimes the library doesn't work.
Right. But the whole point is for us, it's free to do that stuff. It's just
sitting there. It's just some function that does it. And by the way, you asked much earlier about
sort of the trade-off between writing things in low-level languages and writing things in,
you know, a knowledge-based language, you know, like Wolfram language. Here's what I found.
You know, you imagine you were writing an algorithm. Let's say it's some,
oh, I don't know, some numerical computation algorithm, and you're going to write it in a low-level language.
But at some point in that algorithm, you say, gosh, if only I could do some algebraic analysis of what's going on, or if only I could use some piece of graph theory.
Well, you don't do it if you're writing in a low-level language, because it's just too hard to get that done. What ends up happening in our language, and what's been true for probably the last decade or so, is that most of the algorithms that we've produced, and we create a lot of algorithms, their building blocks are these very sophisticated things.
And the fact that those building blocks are kind of freely available is critical to having a much more sophisticated level of algorithm than you could achieve if you
were building it kind of from the sand. And by the way, the whole question of, is it efficient?
The answer is, gosh, if you're doing, let's say, solving a traveling salesman problem or something,
we have the world's best traveling salesman, you know,
problem algorithms.
You don't have to go, you know, if you were writing that from scratch,
you'd never use the world's best such algorithms because it'd be a big pain
to try and figure out how to do that.
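Wolfram Language ships tuned solvers for this kind of thing (e.g. `FindShortestTour`). For contrast, here is what the naive do-it-yourself version looks like in Python: a brute-force tour search that is only feasible for a handful of cities, which is exactly the pain being described.

```python
import math
from itertools import permutations

# Made-up example coordinates, purely for illustration.
cities = {"A": (0, 0), "B": (1, 5), "C": (5, 2), "D": (6, 6)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(order):
    # Length of the closed tour visiting cities in the given order.
    return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def brute_force_tsp(names):
    # Fix the first city so rotations of the same tour aren't re-counted;
    # still O(n!) overall, which is exactly why you want a library.
    first, rest = names[0], names[1:]
    return min((tuple([first, *p]) for p in permutations(rest)),
               key=tour_length)

best = brute_force_tsp(list(cities))
```

Production solvers use branch-and-bound and heuristics like Lin-Kernighan instead; the point of a knowledge-based language is that those are already "just sitting there."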
Yeah.
And, you know, then the question is how good can our compilers be?
And that's, you know, because that's the big, big thing. You know, back in the day, when I started doing computing, people would say, oh, if you're doing something serious, you've got to write in assembly language. These languages like C, they're just crazily inefficient. But of course, that's not true anymore, because optimizing compilers will typically do better than a human writing assembly, for most humans in most situations.
It's just not a good idea to write the assembly code yourself.
Or even another way of looking at it is the limiting factor now is the person's time and energy.
You know, the people, that's the commodity that doesn't scale. You can get a really large data center, but you can't necessarily
take someone's mental model and replicate it to 10,000 people, right?
Yeah, right. But I think more than that, and what happens, because we've built this kind of
language at a different level than people have tried to build languages before,
what happens there is there's just much more stuff automated.
So people who end up getting fluent in the language can just be much more productive.
They can write, you know, something that might take five lines of Wolfram Language code where it might take 200 lines of some other language.
But by the way, calling on five libraries that you then have to figure out, well, are they
really going to work and so on and so on and so on.
So I completely agree.
One of the things about Wolfram Language is my original idea was: pander to the humans, not to the computers. Thirty years ago or more, particularly when I was starting this, everybody was saying, oh, you have to make it easy for the computer. I'm saying, no, actually, make it as easy as possible for the humans. And the ultimate version of that is find a way for
the humans as easily as possible to express what they want in computational terms. And then it's
up to us, the implementers,
to try and make that run as efficiently as possible. And, you know, it's lots of work.
And, you know, like we're just building a very elaborate new layer of compilation technology
to basically take this sort of very high level language representation of things and really,
you know, turn it into LLVM and, you know, really have it grind all the way down, so to speak.
Yeah, that makes sense.
I think so most people, let's say they have some system,
maybe it's a service on the internet or maybe it's an app or it's a website or something like that,
and so they can interoperate with Wolfram Language. But you're saying that, you know, over time, Wolfram Language can eventually be used to drive, you know, apps and UIs and things like that?
Yeah, and it is right now. I mean, it's used in lots of large-scale enterprise applications. They are just running Wolfram Language code. I mean, they have some web server in the front end, and then behind it is, you know, let's say a private
version of our cloud technology, and it's just running, you know, Wolfram language code. And
that's, you know, there's lots of that. It's kind of always funny to me, you know, I go to,
I don't know, you know, I go to a pharmacy and I pick up a prescription and I realize,
oh my gosh, it's our technology that actually created, you know, the thing that I'm seeing
here that went from, you know, the doctor's prescription, you know, written out to the
kind of symbolic representation that's used to compute all kinds of things about, you know,
multiple drugs or whatever, and to compute, you know, the schedule. And, you know, in the end,
that label is probably,
there's probably some piece of Wolfram Language code that made that label, so to speak.
It's a weird thing in the world because there's an awful lot of stuff now that,
in the past, there was a lot of stuff where the R&D was done with our language. Now there's an increasing amount of stuff where the actual deployment is done with our language. But it's a strange thing. As a language designer, you realize,
gosh, it's our technology underneath there. And it's like, and so what? I'm still standing
in line picking this thing up, and it doesn't make any difference.
I also find it interesting, the flip side of that. You know, I'm interested in sort of education, and we should talk about that a bit, because I think there's some worthwhile things to say about that. But so I end up quite often, you know, talking to groups of kids and things like that.
And a lot of them use Wolfram Alpha.
And they kind of, you know, they notice my name, they know about Wolfram Alpha, and there's this moment of surprise when they realize that there's a person who is connected to this thing that exists on the internet and that they use every day.
That's right.
There's a real building and set of people behind all of these websites.
They're not just ephemeral.
Right, right.
And it's like there's a human, somebody decided to do this
and somebody actually built it.
Yeah, how is Wolfram Alpha answering questions
when you're here?
Yes, yes, yes, right.
We did that as an April Fool gag at one point.
We had a handwriting output from the thing
and we had a whole backstory about that.
Nice.
But I think the, you know, in terms of sort of the educational aspects of things,
I mean, this whole question about, you talked about, you know,
what's the right way to do things?
Is the right way to have computers kind of come and understand human natural language
or is the right way for humans to get to the point
where they can express things in a more computationally precise way. And I think the
thing, you know, I was talking about sort of the literacy transition 500 years ago.
There will be another such transition and it will be a transition where people can, where the typical
person understands computational language. and I like to think
that the things I've spent much of my life building will contribute to that, that people
will understand enough computational language that they can routinely express themselves
in that way. They can write that little contract. They can define, you know, the restaurant
menu will be in computational language, and they can say, well, actually, I want to make a change to this.
Let me, you know, change that piece of code, so to speak.
That makes sense.
It's sort of like solving a maze by, you know, beginning at the start and the finish and working your way to the middle.
You know, if we make it way too open-ended for the people, then there's just too much ambiguity. On the flip side, we can't expect everyone to write C code to order a hamburger, right? But there's probably a spot in the middle where people interface with computers with much more unconscious knowledge, similar to right now, how we can use algebra almost unconsciously. And that will help the whole process tremendously.
Right. No, I think that's, you know, as I say, language design is a curious kind of art, because it's all about kind of trying to think about how people think about things. It's not like doing basic science where you are purely trying to do something abstract.
And you have to be sort of, you have to be thinking about how does this relate to sort
of what people care about, how people think about things, and so on. And I think in, you know,
one of the things that's really neat right now is that, well, through the technology we built,
the, you know, the world's fanciest, you know, research scientists and so on, you know, a very
large collection of them use our technology to do their work every day. And it turns out that exact same
technology is now perfectly accessible to middle school type kids who can learn to express
themselves computationally in our language and who can, by the way, do things in the language that
they immediately care about. So it's not just this kind of abstract exercise of, you know, write a program that
sorts numbers or something. They can say, you know, let's take movie posters of, you know,
the popular movies of the last year and let's find out whether red is becoming a more popular
color or whatever it is. We always laugh about the, you know, if someone's applying to be a
data scientist or, you know, which is basically
Silicon Valley's version of just statistician, right? But if someone's applying for one of these
roles, they'll do a coding interview where they'll have to sort a list or invert a binary tree or
something like that. And it's kind of funny to see someone with, say, 20 years of experience,
you know, inverting a binary tree, and it's just kind of a running joke. But maybe that's because there's sort of this inertia, and maybe something better would be, you know, using a language that allows you to fetch a lot of information and solving a much more high-level problem.
Right, right. So, you know, a typical one that we've used a few times is, okay, there's a real problem in Wolfram Alpha: you're given a lat-long coordinate on the Earth, and you're going to produce a map. What's the default map scale that you should use for that particular lat-long coordinate on the Earth?
Okay, if the lat-long coordinate lands in the middle of the Pacific, you obviously don't want to show a one-mile radius map.
If the lat-long coordinate lands in the middle of Manhattan,
you probably don't want to show a 1,000 kilometer radius map.
And so the question is, what do you actually do?
And so the answer might be, well, you
look at the population density.
You look at the actual imagery in the map, you try and figure out density of that.
That's kind of a typical sort of computational thinking problem.
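One way to make that exercise concrete: a toy Python heuristic that maps local population density to a default map radius. The log-scale interpolation and its constants are invented for illustration, not Wolfram|Alpha's actual rule.

```python
import math

def default_map_radius_km(density_per_km2,
                          min_radius_km=1.0, max_radius_km=2000.0):
    # Dense areas (Manhattan) get a small radius; empty areas
    # (mid-Pacific) get a large one.  Interpolate on a log scale
    # between ~0.01 and ~10000 people per square km.
    if density_per_km2 <= 0:
        return max_radius_km
    lo, hi = math.log10(0.01), math.log10(10000)
    t = (math.log10(density_per_km2) - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    # t = 1 (high density) -> min radius; t = 0 (low density) -> max radius
    return max_radius_km * (min_radius_km / max_radius_km) ** t
```

A fancier version might blend in the entropy of the map features, as suggested above, but the shape of the exercise (define what you care about, then compute it) stays the same.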
And the whole point of our language is once you can define what you want, like you say, I care about population and people, I care about the entropy of the actual map features, or I care about, you know, you can invent lots of different kinds of things, then it becomes pretty easy to express those things and to, you know, try out your
algorithm. And that's a very different kind of exercise than the typical kind of computer science,
you know, sort of list type thing. It involves different kinds of thinking. And by the way,
it's thinking which is not really taught much in a lot of computer science education. This idea of let's just think computationally about real problems in the world, that's not what ends up being in those curricula. My worry about computer science education, particularly at the K-12 level, is that it'll go the same way as things have gone with math. I mean, what happens with math,
you know, most people's takeaway from studying math in school is, I don't like math. It's kind
of boring. It's very, you know, it's intricate, it's mechanical, it's kind of, why do I care?
Like, I've asked many kids, you know, quite sophisticated kids: the math you've learned in school, have you used it anywhere? And they're like, well, uh, no. And that's, you know,
that's kind of, so they learned it as a, oh, well, I've got to, you know, do my algebra,
calculus, whatever. And I'm doing that as a sort of separated, abstract thing, and it's something where I'm going through the mechanics of it. I'm learning how to factor a polynomial, how to do an integral, whatever. That's, like, to the horror of, particularly, one of my kids.
Yeah, one of the things that cracked me up when I was taking the SAT as a teenager was there was a question where somebody had 17 watermelons and they were at a grocery store, and the answer ended up being 17.
And it was just, I found that hilarious because I just pictured this person with 17 watermelons,
you know, in a shopping cart in a grocery store. And that is the stereotypical math problem. It's
just, you know, even in that environment where you had every opportunity to put, I don't know, something, I don't know, bottles of water or something.
But they chose something even more absurd than that.
And it just shows how disconnected it is from anything real.
Right.
I mean, look, a couple of comments about that, but I think, you know, low-level programming is the enemy of computational thinking in education. That is, if what you do is teach people how to write quicksort, you are, well, maybe you don't even get to that level, but, you know, if you teach them how to, I don't know.
Do a for loop or something.
Yes, right.
You know, most people will find that
kind of boring. Some people who might go on to be systems programmers will find that super
interesting. And that's great. But, you know, most people, just like all those people who won't go on to be pure mathematicians and who find a lot of the algebra stuff kind of boring, will similarly be turned off computation by being fed this kind of very low-level, why-do-I-care type stuff. And I think, you know, computer science education has gone through about four waves over time: the BASIC wave, which was one of the better ones, by the way, and then a series of other waves. And, you know, it's one of the things that I put a fair amount of effort into, although education is such a fragmented area. Well, it's easier in smaller countries, actually, where there's more central kind of decision making. But in a country like the U.S., it's really hard to, you know, sort of put in effort and get commensurate returns.
But it's, you know, this thing about, okay, let's teach computational thinking, and let's teach it by having people be able to sort of form computational ideas, and then make it the job of the implementers like us to make it as easy as possible for the kids. If the kids are getting the syntax wrong, that's not because the kids are stupid; it's because we haven't given the right user assistance prompts to make sure that they don't get the syntax wrong.
Whereas in math, you don't get to do that. Algebra is algebra; you don't get to have somebody thinking about how we provide it, and you don't get to say, oh, we're trying to automate this as much as possible so that the humans just have to do the real thinking part. It's just like, well, no, actually you have to, you know, add x and x and get 2x and all that kind of thing. And then you have to do that yourself. And I think, so, you know, I think
there's an important thing that happens in education there. I mean, for me it's interesting: we run a summer camp for high school students, and we also have a summer school for sort of grown-ups that we've been doing now for 16 years, where people come and, you know, do some original project that's never been done before.
And they've often never done a project where you go from sort of nothing and actually create something.
And it's really great.
With the tools that we have now, there's an awful lot of low-hanging fruit out there.
I mean, there's an awful lot of stuff where using, you know, using our language, using the whole sort of computational paradigm,
there are things that people can do. You know, a high school kid can come in and in two weeks
actually produce something really, really, really neat. You can, you know, there's a website with
all kinds of projects on it that kids have done. But, you know, that's something that is, you know,
it's a thing of our times that that's possible. And I think that I find it particularly, you know,
for me, it's particularly cool to see that, you know, these kids are able to make use of the same
tools that kind of the fanciest, you know, research scientists are using. And they're able to go find
things that, you know, are interesting new things, so to speak.
Yeah, let's talk about that for a minute, on the accessibility side. So this is something we actually haven't mentioned on the show, but, um, the vast majority of languages are just, you know, completely public, open source, and totally accessible. And the way that the
people who put their time and effort
into those languages support
themselves is usually through some
type of enterprise contract. So for
example
Java becomes really popular
Java's probably not a good example
Let's say Python
It's an example of the disaster of what you're describing.
Yeah, it's like, I immediately regret my decision. So, Python. It's really popular. You know, anyone can go and download the source code and look at all of it, so in a sense the value of the intellectual property there is questionable. But, you know, some enterprise customer starts using it, and now they need to reduce their risk, because they have a risk of something bad happening, you know, with their system, that's the fault of Python itself. And so to reduce their risk, they spend money. And that's
really what keeps that whole process moving. Now, I know for Mathematica, and, to be totally fair, I haven't kept totally up to date on Wolfram Language.
But I know for Mathematica, it's similar to MATLAB in the sense that it costs money to sort of even enter and start using it.
And so how does that work?
If someone's a student, how do they get started?
And sort of what's sort of, I guess, the business model to some degree?
And how does that sort of work operationally?
Right.
I mean, so, you know, I've been interested my whole life in kind of threading the needle
of figuring out how to do the most interesting, innovative stuff for the world and have that
be sustainable over the long term.
And I think I'm not doing too badly, in the sense that we've been able to consistently
be building a technology stack for the last 32 years.
It's interesting that when I started this, the first company, the first thing I built,
I was originally going to sort of do the open source thing.
I realized, gosh, there's no way to start this.
It was in 1981.
There's no way to support this. I have to start a company. I started a company of a certain kind,
you know, venture capital funded and so on. That didn't work out that well. I mean, the company
eventually did an IPO and did okay, but, you know, it didn't work out well in terms of having a long
term sustained vision. For the last 32 years, I've been pretty lucky in that I have a smallish private company, only 800 people, mostly R&D folk, and we've been able to
keep innovating and building things. Now, the problem that we have, namely, build this huge
computational language that has everything in it, it's not just one of these, oh, I'm going to spend
the weekend and come out with a new language and throw it out on GitHub type thing. It's a thing which is a
sustained, in this case, 32-year story. And it's a story where I was pretty pleased in the 30th
anniversary of Mathematica. I thought, oh, let me see whether I can take code that I wrote 30 years
ago, and there it was on an original Mac, and I get it off with floppies and things like that.
I take this code.
This was just a few months ago.
I take this code.
With great effort, I get it off the floppy and onto a modern computer, and by golly, it just runs in the latest version of Wolfram Language.
That's cool.
We have 30 year compatibility
and, and, you know, people, a lot of people in sort of the research world really appreciate this.
I mean, they, you know, they have stuff that they did, you know, 25 years ago and it just works.
And, you know, I think that the, um, the idea that, so, you know, my main goal has been sort
of thread the needle of being able to do sustained
innovative work for a long period of time and try and deliver something valuable to the world while
letting as many people as possible use it and have the ecosystem work. And, you know, we've done
a variety of different things. So, for example, you know, basically every major university
in the U.S. at least, and most around the world
has a site license for our technology. So for anybody at a university, it's free.
The university is effectively paying an enterprise license, but that means every
individual student and so on gets it for free. Nice. So they put in their EDU email address
or something like that?
Yeah.
I mean, supposedly the universities have ways that they distribute all these things.
But yes, and that's basically what happens.
And, you know, now Wolfram Alpha, the main Wolfram Alpha is just free to the world.
There's a version, there's a pro version that students, some students, well, a lot of students, end up getting because it has some additional capabilities like showing steps and so on for computations.
But Wolfram Alpha, the idea was just make it free for the world. Now, then, obviously, we have an API that's what gets used by folks like Apple and Amazon and so on.
And we have enterprise versions and all this kind of thing.
But that's one model.
Now, in terms of Wolfram Language, it's sort of interesting, actually. I've always had the belief that one wants to have the source of funding for something be as aligned as possible
with where the value is going, so to speak. So in other words, it's like the people who are
actually going to use this seriously should be, if they believe in it and they're going to pay for
it, then I want, and I don't want something where it's like, well, we'll make a free version, but we're really going to make our money off support.
So let's not make the free version too good.
Let's make the thing a little bit cockeyed so that we can make money from support, which is unfortunately what's happened.
Or let's make everything free and get a lot of people to use it and then say, oh, whoops, we got some patents on this.
You have to pay up lots of money because you're using our patents.
Yeah, especially when it comes to documentation. I mean, it's like every line of documentation that you provide with your open source language is literally taking revenue away from your business
model, which is to support people who are trying to use undocumented code.
Yeah, I mean, look, my point of view is, I just want, I mean, you know, I've tried to operate in a sense, a very sort of, you know, a straightforward business, so to speak. I mean,
you know, what we try to do is, you know, continue to innovate and produce the best language we can.
Of course, the fact that we're doing this in an ultimately commercial way
allows us to make use of all these data sources and all these other kinds of things,
which there's no way they would otherwise agree to.
If we said, oh, we're going to give everything away to everybody,
they'd say, well, you can't give our stuff away,
so forget having financial
data or something.
It's part of the ecosystem that one has to have everybody.
It has to be a sort of thing.
The other thing is, there's the question of how do you maintain coherence?
That's been my personal commitment, so to speak, for the last three decades or so: actually try and keep this language consistent and do unified development. I think, like, Python, I gather, recently had trouble with that. In our case, I'm, you know, in a position where I'm actually trying to lead this.
Now, there's the question of source code.
I actually don't really care whether, I mean, our source code isn't sort of publicly out on the internet. That's not something I care that much about, and I don't think anybody else cares that much about it either. You know, actually, for a decade we actually had a large chunk of source code out, you know, freely available. And the thing that was really funny was, you know, we were doing this because we thought, oh, people are going to be interested in all these mathematical algorithms.
Isn't that cool type thing?
It turned out when we finally did a sort of an assessment, did anybody care?
We discovered that a large swath of comments in the code were in Russian.
It happened that the people who were working on it were the people who natively speak Russian, who work on this stuff for us. And it's like, nobody read this. For a decade we had it out there, and nobody read it, because otherwise somebody would have said, oops, your comments are in Russian.
I think the open source model
makes sense when it's
low level, so for example
an operating system,
you know, if the operating system is not open source, then there's a worry that, you know, one day I won't be able to use my mouse or something like that, whereas with open source you could always write some interface driver. But for something like Mathematica, I agree. I mean, if I had to guess, I would guess that the source code is less important.
write some interface driver. But for something like Mathematica, I agree. I mean, if I had to guess, I would guess that the source code is less important.
What's more important is that somebody can make Wolfram language a part of their process.
Right.
Yeah.
Right.
So what we're trying to do, I mean, so, you know, I think we have a pretty good record of, you know, three decades of, you know, stable, compatible, you know, development, which I think is pretty impressive on the scale of languages.
And, you know, what, you know, what I want to do, you know, my goal is to have our language prosper, you know, forever, so to speak.
And, you know, what's the best way to do that?
I don't know precisely, but obviously we want to set it up so that whatever happens to our
company, et cetera, et cetera, et cetera, the language will always be available, et
cetera, et cetera, et cetera.
Actually, a thing we're doing is the Wolfram engine, which is kind of the core computational kernel of
the language, we're actually going to be changing the way that's licensed so that it's much
easier for people to incorporate it in things.
And basically the deal there is going to be, if you're developing stuff, you can just use
the engine for free.
You can just download it for free. You can use it for free. As soon as you want a production license, if you want to actually go
into production, you know, running some big commercial site, then you pay us. But while
you're just using, you know, using the thing to play around now, you know, if your product is R&D,
that's, you know, that's our traditional market, and that doesn't count as software development.
But if your goal is, I'm developing this, I'm playing around with it, I'm trying to develop some product,
then it's just a version that's free for developers.
And I don't know exactly how that's going to work out, how people will, at the end of the day, package things in a certain way.
But I still think that this business model makes sense.
The other thing that people don't really give a full appreciation to is, and I might not be saying this correctly, the sort of private charitable nature of people. I mean, the vast majority of people recognize value and will pay for things.
Yeah, especially if they're also making money, if it's part of their commercial process.
I mean, you know, my attitude is, you know, we give away software to lots of students and lots of different groups and so on.
But like on the Raspberry Pi computer, there's a Wolfram Language bundle on every Raspberry Pi computer freely available to people.
And there are lots of things like that.
My goal is to make sure that the people who are really deriving value, and that value might be doing the next great R&D thing, that value might be teaching a class, that value might be delivering a product.
The people who are really deriving what ends up being commercial value support the continued development of what we're doing. And, you know, I mean, it's been, you know, it's a complicated sort of threading of the
needle because, you know, I personally, you know, if we had infinite commercial resources,
maybe we'd be able to just give it away to everybody.
But I actually think that isn't even healthy.
I think it's good to have a situation where the people who are benefiting are the people who are supporting continued development.
And that allows us, I mean, it makes us do things that actually make sense for the world, so to speak.
That's right.
I mean, it's sort of like if you were to play one of these simulation video games like SimCity or something like that, and you give yourself infinite money: all of a sudden the game becomes completely unguided. And so you have no direction unless you have some customers.
Yeah.
With the later versions of SimCity, actually, a bunch of it was designed in Wolfram Language.
Cool.
I didn't know that.
That's just one of those.
Just, you know, and I think, you know, the rules, I don't know exactly all the things that were done there. I've seen a bunch of the simulations of cities, so to speak.
Right. I saw an interesting article about how, not to go on too much of a tangent, but how they
decided on the traffic model.
And effectively, it's a Monte Carlo approach where, at every time step, they randomly pick two different zoned places. So, for example, a residential-zoned place and then a commercial-zoned place. And they draw a path using the roads that the player has placed. And then they also decrease everything by a constant, so that they always have a distribution, a probability distribution. And over time, if many of those paths require the person to take the same road, then that naturally becomes sort of a very nice mathematical model of the traffic. And to do anything like that, you need to try many different
ideas. I'm quite certain that that was prototyped in Mathematica and Wolfram Language,
just knowing the team that was involved in doing that. Actually, I have to tell a personal story about traffic flow, which is kind of a...
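The Monte Carlo scheme Jason describes can be sketched in a few lines of Python. This is an illustrative guess at the mechanism only; the graph encoding, function names, and decay constant are hypothetical stand-ins, not the actual SimCity code:

```python
import random
from collections import deque

def shortest_path(roads, start, goal):
    """BFS shortest path over a road graph given as {node: [neighbor nodes]}."""
    prev, frontier = {start: None}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk predecessor links back to start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in roads[node]:
            if nxt not in prev:
                prev[nxt] = node
                frontier.append(nxt)
    return []  # goal unreachable with the roads placed so far

def simulate_traffic(roads, homes, shops, steps=500, decay=0.99, seed=0):
    """Each step: route one random home->shop trip, bump the traffic count on
    every node along the path, then decay all counts by a constant factor."""
    random.seed(seed)
    traffic = {node: 0.0 for node in roads}
    for _ in range(steps):
        for node in shortest_path(roads, random.choice(homes), random.choice(shops)):
            traffic[node] += 1.0
        for node in traffic:
            traffic[node] *= decay  # constant decay keeps values bounded
    return traffic
```

Roads that appear on many sampled paths accumulate high values, and the constant decay means the numbers settle into a steady relative load, acting like the probability distribution described above, rather than growing without bound.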
So when I was a kid, I was sort of interested in traffic flow as kind of an interesting
kind of mathematical type problem.
And I didn't really ever figure out terribly much that's interesting about it.
Then I started working on these things called cellular automata.
They're very simple programs where you just have a row of black and white cells, and you just have a rule that says if there's a black cell to the right and a white cell in the middle, et cetera, then do this.
And the main thing that I discovered about things like cellular automata
is it's very easy to get very complicated behavior
even when the rules are very simple.
And that turns out to be, you know, the basis of this 1,200-page book called A New Kind of Science. That book actually just today came out in paperback, after 16 years of being a pure hardcover book. But I digress.
Oh, cool. Is it on Kindle or anything like that?
Oh yeah, and there's actually a free version on the web.
Cool.
It's a very elaborate version on the web, which I've been slowly working through.
This is a fun story.
So all the science for that book, the book came out in 2002,
but all the science that I spent the decade before that working on for that book,
it's all in Wolfram language notebooks.
And so I wanted to sort of expose all that stuff.
And just recently, actually, it's ended up being me who has to do a bunch of this stuff, because I kind of understand the code I wrote before.
But so I've been live streaming some of these things, and I've been taking these notebooks that I made in 1992 and things like this. And it's really satisfying, because they just run. I just start them up and press shift-enter, and the code just runs. Some of it could be better now, because we have cleaner sort of functions, where we've used higher levels of abstraction to define how certain kinds of things work. So the code can be shorter and clearer, but it's pretty neat that all these research notebooks that I had just run. But in any case, I digress. I was talking about road traffic flow. So I studied these cellular automaton things, and then years went by, and, you know, I had sort of failed in my
youth to be able to do much with road traffic flow. And then, you know, somebody said, by the way, there's this cellular automaton model that people have developed of road traffic flow. And actually, it turns out the standard model now is a cellular automaton model based on a thing called Rule 184, which is the, well, the 184th rule in this enumeration of rules that I made. And so it's kind of amusing for me, because that's become, as I say, the standard model that people use for road traffic flow. And there it was, sort of right under my nose, the cellular automata that I studied.
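Rule 184 is simple enough to try in a few lines. Here is a minimal Python sketch (an editor's illustration, not Wolfram's code) of the traffic reading of the rule: each 1-cell is a car on a circular road, and a car advances one cell per step exactly when the cell ahead is empty:

```python
import random

RULE = 184  # binary 10111000: the output bit for each 3-cell neighborhood

def step(cells, rule=RULE):
    """One update of an elementary cellular automaton on a circular row.
    The neighborhood (left, center, right) indexes a bit of the rule number."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

random.seed(2)
road = [1 if random.random() < 0.55 else 0 for _ in range(60)]  # 55% car density
for _ in range(25):
    print("".join("#" if c else "." for c in road))  # '#' = car, '.' = empty
    road = step(road)
```

Cars are conserved at every step; below a density of one half the jams evaporate, while above it persistent stop-and-go waves drift backwards through the traffic, which is the qualitative behavior the standard road-traffic model captures.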
But it's actually, if one's interested in sort of the meta-theory of modeling, an interesting story, because what really happened, and what always happens with modeling, is that modeling is an idealization.
So you say, I'm going to model snowflake growth. You might say,
what really matters to me is exactly how fast the edges of the snowflake expand.
But the fact that my snowflakes are circular, I don't care about that. I really just care about
how fast the edges expand. Or I might say, what I really care about is the overall structure of
the snowflake, and I don't care whether I get exactly right how fast the edges expand. So modeling is always as much about what you care about as anything else; you're never going to get all the details right, because models are always an idealization.
And it's a little bit like what we were talking about a while back about sort of, you know,
what's interesting, so to speak, you know, and the fact that that's very culturally dependent,
so to speak.
And so what really happened with this road traffic flow thing is that the things I thought
were important about road traffic flow turned out not to be the things that are important.
And what the model, what this cellular automaton model captures is a bunch of stuff that I
really wasn't focused on.
I was focused on various quantitative things that it doesn't capture, but what it does
capture is the stuff that really matters for, you know, modeling current traffic jams and the effects of self-driving cars on them, and all that kind of thing. But okay, I think that was a digression.
No, I mean, one of the nice things about doing something such as inventing a language, or doing something in the pure math space, or publishing a paper in graph theory, one of these things, is that it has really wild
implications that get realized over many years. I found out recently that some research I had done as a student went into just figuring out the mass of a quark. And I know absolutely nothing about physics.
That's interesting. Yeah, I'd be curious what that was. So I worked on particle physics when I was a teenager, actually. And it's interesting in terms of sort of the meta-theory of fields, because I worked on particle physics in the late 1970s, when it was kind of the golden age of particle physics, and a bunch of new methodology had come in. There was this five-year period when there were just major new results every two weeks type thing.
Kind of the same way that machine learning is today.
Yeah. And so what's interesting about being involved in a field during its kind of golden age of hypergrowth is that there's low-hanging fruit lying all over the place.
And so I invented this thing, this way of studying the structure of particle events at particle accelerators, which I did when I was like 17 years old or something. And I was really pleased that, you know, when the Higgs particle was discovered, they used these things called event shapes that I'd invented as part of the kind of pipeline of doing the analysis and so on. So that was, you know, that's the benefit of being involved in a field
during its sort of hypergrowth phase.
From a sort of career planning point of view,
the downside is you've got a lot of competition
during that period.
I personally tend to prefer working on things
where I can do neat stuff, but nobody else cares.
And after it's done, I mean, my favorite thing, in a sense,
is producing what I tend to call alien artifacts,
things that nobody kind of thought would exist.
But once it exists, you can look at it and you can realize it's interesting.
So like Wolfram Alpha, I consider to be an example of kind of an alien artifact.
Like, people had kind of tried to do question answering stuff for years, but they didn't have knowledge bases, they didn't have symbolic languages underneath them, et cetera. That hadn't really gone anywhere.
Yeah, I think the thing that I noticed about WolframAlpha is it's sort of a blending of two
things. So there had been expert systems, like I talked about, you know, OpenCyc, and there had been WordNet, and there had been these sort of Prolog systems, obviously, driving a lot of that. There had been these systems, but they were far too rigid, right? And then on the other end, you have something like google.com, which is extremely open-ended but entirely data-driven, and there's no structure and no human intuition behind those results, right? And so I feel like Wolfram Alpha sort of walked that line very well.
Yeah, I think, I mean, you know, you mentioned things like Cyc. I had known Doug Lenat, the guy who created Cyc, from back in the early 80s.
And so I watched that whole progress.
And I have to say it was interesting, because when Wolfram Alpha came out, AI was at one of its real low points. I mean, people didn't really believe in AI. You know, in 2009, if people said, I'm working on AI, it's like, oh, that's not going anywhere type thing.
Yep.
I literally, my background is in deep neural networks.
And in 2009, I got a job writing JavaScript because nobody needed someone who knew anything
about neural networks.
That's good. Yeah. So you're a data point.
No, I think, I mean, it was interesting when Wolfram Alpha was coming out. I had a friend named Marvin Minsky, who was a sort of pioneer of the AI world.
And, you know, I saw Marvin,
oh probably a few weeks before Wolfram Alpha went live.
And I said, Marvin, let me show you something.
It's like I said, it's a question answering system, among other things.
And it's like, okay.
And so I type a couple of things in.
And then Marvin changes the subject, wants to talk about something different. I said, no, Marvin, this time it actually works.
And then he's like, oh, well, let me try something else.
Let me try something else.
Oh, wow.
And he's like, it was interesting to see because he'd seen so many kind of fake question answering systems over the course of many years.
He just decided it was impossible.
And why were we able to succeed? Basically,
I didn't really even understand this until sort of after the fact. I mean, as far as I was
concerned in building Wolfram Alpha, it was just use what cleverness I might have and just do a
bunch of engineering. But after the fact, I realized a couple of things. First of all,
we had a lot of actual knowledge and people have been trying to do natural language understanding
and so on without knowledge. They've been trying to abstractly do it. It's just like,
what's the structure? Let's parse the sentence and find the noun and the verb and so on.
That's much, much less relevant than knowing this is a city and its population is
roughly this and so on. And that's why, you know, that's why it's likely to be talking about
Springfield, Massachusetts, rather than Springfield, you know, Illinois or something like this.
And the second thing, which I realized, is that we actually had a target to turn the natural language into; namely, we had our symbolic language that could represent things about the world. And another thing that, again, only became sort of
clear after the fact is that a bunch of the, well, ideas about what one might call algorithm discovery that came out of my New Kind of Science project, we used a lot in building the way that Wolfram Alpha works. I mean,
you know, just to say something about that, I mean, you know, the traditional way to think
about building software is or has been, you know, let's write the lines of code to do what we want
to do. What I ended up discovering from sort of just looking at, you know, trillions of simple
programs is that there are simple programs that do really interesting things. And sometimes those things are really useful.
They might be good pseudorandom number generators. They might be good image processing things and
so on. And so we started developing this kind of methodology for just finding algorithms by knowing
what kind of a thing you want and then just going out and searching the space of possible programs.
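As a toy version of that kind of search, here is a Python sketch (an editor's illustration, not Wolfram's actual methodology) that exhaustively runs all 256 elementary cellular automaton rules and scores each one's center column with a crude randomness measure, the sort of property you would want from a pseudorandom generator:

```python
def ca_center_column(rule, width=63, steps=128):
    """Run an elementary CA from a single black cell on a circular row
    and return the sequence of center-cell values over time."""
    cells = [0] * width
    cells[width // 2] = 1
    column = []
    for _ in range(steps):
        column.append(cells[width // 2])
        cells = [(rule >> (4 * cells[(i - 1) % width] + 2 * cells[i]
                           + cells[(i + 1) % width])) & 1
                 for i in range(width)]
    return column

def randomness_score(bits):
    """Crude proxy for randomness: balanced 0/1 counts times the flip rate."""
    half = len(bits) / 2
    balance = 1 - abs(sum(bits) - half) / half          # 1.0 = perfectly balanced
    flips = sum(a != b for a, b in zip(bits, bits[1:]))  # 01/10 transitions
    return balance * flips / (len(bits) - 1)

# Exhaustive search of the whole space of 256 simple programs.
scores = {rule: randomness_score(ca_center_column(rule)) for rule in range(256)}
top = sorted(scores, key=scores.get, reverse=True)[:8]
```

Under even this crude test, trivial rules like Rule 0 score near zero, while chaotic rules such as Rule 30, whose center column Wolfram has described using as a random-number source, score far higher.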
I mean, in today's world, that seems less surprising
because with deep learning and so on,
one's sort of doing the same kind of thing,
although what we've done a lot more of
is just exhaustively going out and sampling the computational universe,
whereas in typical neural nets,
you're doing this kind of incremental sort of more like biological evolution,
you know, taking small steps and trying to make sure the animals don't die on the way type thing rather than just sampling the whole space.
But, you know, to go back to the approach of using essentially predicate logic to represent sort of facts about the world, common sense facts about the world. If something is dunked in water, then it is wet, those kinds of things.
Right. When Wolfram Alpha came out, you know, Doug Lenat said basically,
well, you succeeded in what I tried to do.
And, you know,
and the whole sort of common sense reasoning thing clearly wasn't able to get there.
What I realized was a sort of interesting conceptual point,
which is in a sense,
what Doug was trying to do was to use the state of
human thinking in the medieval time, in the Middle Ages. And I just cheated and used the
last 300 years of science. Because for him, if he wants to figure out some physics problem
with common sense reasoning, he's like, oh, well, you know, if you push on this,
then it'll push on that, which will make this happen. And it's kind of like you're reasoning
through it, like you're doing natural philosophy to figure out what's going to happen in the world.
For us, it's like, well, let's just, you know, just use all that science that people have figured
out. Let's just, you know, for the physics problem, let's turn that into, you know, Lagrangian mechanics and solve the differential equations and say, and the answer
is 7.5 or something. So essentially we're, you know, we're making use of all that additional
knowledge that's been developed in civilization beyond the pure human reasoning knowledge. And I
think that's what, you know, that's kind of, you know,
in a sense we cheated relative to the original mission of kind of common sense AI.
I think that the fact that you have these dynamical systems and they didn't is definitely part of it. But I think the other, even more important part is the resolving of the ambiguity, ultimately through, you know, heuristics, but also through statistics and through language understanding and things like that.
No, right. I mean, look, as an engineering project, our effort has been much more, I don't know, pragmatic, and in a sense making use of many more methods. I mean, not just saying, we're just going to use common sense reasoning to figure everything out.
We're using whatever method makes sense. And that's kind of a story, you know, that's,
that's part of kind of the whole Wolfram language story is put everything in so you can use whatever
makes sense. I mean, you know, when we come to do data science, for example, we like to talk about
kind of multi-paradigm data science. It's just everything is there, so whatever makes sense to do,
you can do, so to speak, rather than, oh, we've got a package that does this, so that's what we're
going to have to do. But I was going to say, we were talking earlier about math problems and watermelons and so on. And one of the use cases that I really wanted to make use of sort of common sense reasoning for was math word problems. We still have never managed to do this. When you talk about putting the watermelons in your cart or whatever, somebody might say, well, let's say you put the watermelons in bags in the shopping cart. And if X number of watermelons go in these bags,
how many of them are still visible as you're walking around the store?
Well, of course, to know that it will be visible
when you're walking around the store,
you have to know that once a watermelon has been put in a bag,
you can't see it anymore.
And that's kind of common sense reasoning.
And so, you know, if you set up a word problem like that, you have to have a layer of common sense reasoning.
You know, we might be able to, if we can get an equation out of it, you know, 3x plus 7 equals
14 or something, where x is the number of watermelons, then, you know, then we're all done.
And we can just use, you know, modern science, so to speak, to just crack that and get to the answer. But the thing about how many watermelons you can see in the shopping cart, we need common sense reasoning for. Now, I've been hoping that we'll be able to make use of what Doug has done for so many years. I mean, he's been very pessimistic about whether this is going to work, so it's never been tried. But, you know, in fact, we've reached the point where we can do basically up to middle-school-level math word problems using extremely boring, heuristic, template-y type techniques. We can't do more sophisticated ones, and it's sort of an interesting open problem to
be able to do those, and it's a place where one might be able to use, or one should be able to
use, you know, common sense reasoning. But it's interesting to realize that the space of things that one actually wants to do with sort of common sense reasoning is a narrow space. And it's a space where, in a sense, those things were built for humans to have to put effort in to figure them out. They're not natural things. I mean, another example might be, you know, let's say you're doing medical school and you're asking, which nerve goes near the such and such tendon of such and such? Well, you could do that with common sense reasoning, but actually, you know, we've got
a complete 3D map of human anatomy and you can just go and use computational geometry and answer
the question. And it's not, you know, even though a medical student might do it by thinking through,
oh, you know, this goes that place and that place. And so they're
using common sense reasoning, but that's not how, you know, we're going to do it with a computer.
It's interesting. It sounds like, so if you think about these word problems,
ultimately what they're trying to do is, they're really trying to do two things. One is sort of
obfuscate the core problem. So they're not just giving you 3x plus 7 equals 14, but they're wrapping it in some narrative that you have to understand.
So making the problem harder.
But then they're also trying to relate the problem to real life, which is why it was so funny to see the 17 watermelons in one shopping cart. So in a sense, if you could
generate relevant, if you could generate, say, SAT math questions that really connected with
somebody, that would be maybe more difficult than the counting they would do at a real grocery
store, but would closely emulate something
they would do in real life.
I feel like there's something really profound there.
So, I mean, because you've sort of understood
that it is truly a common sense reasoning problem.
You've understood the way,
the routine things that people do
and the math that people do on a routine basis.
And you've been able to sort of extrapolate that
to something more difficult, like differential equations
and figure out a way to sort of tie those two things together.
That's actually something profoundly complex.
Right. I mean, look, I think that the thing one would like people to learn is the stuff they can't just feed to computers and get automated. And, you know, the main thing is this question of taking a human thing you want to do and thinking
about it in computational enough terms that you can describe it in a precise language
and have a computer unambiguously know what you're talking about. And I think that process, I mean, when you talk about math word problems,
sort of the idea there is take these kind of fake situations in the world
and try and mathematize them.
I think the real thing that people should be learning how to do,
and very few people are actually learning this yet,
is take things in the world that are real
things in the world and learn how to think about them computationally and be able to express them
in such a way that both you can think about them more clearly. I mean, the remarkable thing for me,
at least, is, you know, I've been spending years kind of thinking about things computationally.
And so there are lots of kinds of, you know,
very everyday questions where I think about them in terms of breaking it down into the kinds of,
you know, structure that I would use to think about something computationally. And then by golly,
I can actually get to an answer. Whereas otherwise, it's very mushy.
You don't have a way to think about it. You know, if you don't have this sort of framework
and in our case, you know, computational language
to use to think about things,
you know, you can't form them precisely enough
in your mind to actually get to a conclusion.
I mean, it's the same problem as, you know,
Doug's common sense reasoning thing.
You just end up with all these: if it rains, then it will get wet, then this.
But it's all very kind of vague.
And it doesn't give you a precise structure to operate in. Talking of which, I think we're probably reaching the end here. I'm a data-oriented person, you know.
Yeah.
I do a lot of sort of personal analytics stuff. And I was somewhat horrified a number of years ago to discover that I'm the human who's collected more data on themselves than anybody else. And so one of the things I know is that I get sleepy at a very precise time, and that time was about 10 minutes ago. All right, so I'm going to start failing here.
No, I think that's great. So I wanted to, I mean, this has been just absolutely fantastic.
Thank you so much for coming on the show.
I think people are going to be absolutely fascinated to hear a true programming language inventor and to sort of hear your story and this dialogue.
I think it's been amazing.
Just to sort of conclude here, you know, we always ask two things. One is sort of, what is it like day to day at your company, in this case Wolfram Research? And are you hiring? If so, what kind of jobs are you hiring for, where, things like that?
Okay. So, I mean, for me, I spend most of my time figuring stuff out. It's pretty neat, actually. You know, I'm an unusual kind of CEO, because I'm deeply involved in the design of the product and so on. And, you know, we've got great people at the company.
And so an awful lot of what happens at the company just happens and doesn't require,
you know, the CEO to stick his nose into those things.
And in fact, you know, one thing that's been interesting in the last year, since we've been doing this live streaming of internal design review meetings, is that for folks who want to know what it's like to work at our company, well, there's 250 hours of what a bunch of meetings are like out there in the archives of the live-streamed stuff. I think that, so, our company is very geographically distributed. Our headquarters is in Champaign, Illinois. I have been a remote CEO for 28 years; I live near Boston. Although I happen to be tomorrow doing my about-three-times-a-year trip to Illinois.
But we end up having people kind of scattered all over the world. Actually, our HR department tries to maintain an accurate map, on the careers section of our website, of where people are in the world. And it's quite diverse, I would say.
In terms of where, well, we tend to not hire people in New York and Silicon Valley, because that's super expensive for us, and we don't really need the proximity to other things that Silicon Valley provides. I think we've ended up with, well, I would say, a wonderfully talented group of people from around the world.
Yeah, I know some parents who work at Wolfram Research who are in South Africa.
Yes.
Yeah, it's amazing.
Right. I've really come to appreciate the diversity of having people all over the world, in all kinds of different settings and situations and so on.
I think it adds a certain vibrancy to what happens that is something I certainly appreciate.
But we've ended up with lots of world experts on lots of kinds of things.
We've had lots of people who've been at the company a very long time.
And some people, they're at the company for a decade.
Then they say, oh, the grass is greener somewhere else.
And they go off somewhere else for a few years.
And then they come back, which is always satisfying.
It's satisfying for management to see that happen, so to speak.
Yeah, definitely.
Because it means we're doing something right.
But I think the thing that we've ended up with is that people who are good at thinking are the people who tend to succeed, because a lot of what we're doing is stuff that's never been done before. And, you know, my principle in building the company is very similar to my principle in building our language: automate as much as possible. And so, for me, you know, you mentioned JavaScript programming. There was a period of time when we had tons of JavaScript programmers. It's like, look, guys, why don't we build something higher level that just automates a bunch of this stuff? So we did that. Then those folks moved on to doing other things. For us, they end up being able to work on higher-level things.
For the company, the way we get away with only having 800 people is by the fact that we've automated a huge amount of what goes on internal to the company. We're using both the principles and the actual technology that we've built.
Like our testing system is, of course, all written in Wolfram language.
One of the more exotic things that we're now doing is writing our ERP transaction processing system in Wolfram language because we just got fed up with some of these pseudo open source things, which actually are costing way too much money.
And we said, these don't even work well.
Let's just build our own.
So that stuff is...
Now, in terms of hiring, one of the things I might mention is we have the summer school every year. It's usually about 70 people from around the world. It's a great collection of people. It's become a favorite sort of pseudo-vacation for a bunch of our R&D folk. And, in fact, most of the instructors at the summer school are alumni of the summer school, which reveals the fact that we recruit a lot of people from there.
Cool.
That makes sense.
It works really well because people come and they often have never thought about working
for the company.
They come and spend three weeks, and they do a project, and they get to know a bunch of people at the company. We get to see what they do, and they get to see what our approach to doing things is. It works very well. But in terms of what we're looking for, I mean, we're always looking for, you know, very talented people.
I would say that my approach to management tends to be, you know, we've got a lot of projects we want to do.
We've got a certain pool of people that, you know, have lots of talents.
And sort of the role of management is to solve the puzzle of how do you connect, you know,
the talented people with the projects that you want to get done.
And so we're more interested in kind of, you know, really talented people than we are in
the specifics of, oh, we want to get a such and such kind of person right now.
Yeah, that makes sense.
And, you know, it's sort of strange, because we're a company where we actually need things like history PhDs, because we actually do stuff with curating all military conflicts in history, that type of thing, where it's useful. Actually, another thing interesting to mention in connection with Wolfram Language is, if you look at the people who are... We have a whole lot of people who have, I would say, subject area backgrounds. And many of them were not, you know, quote-unquote computer people particularly before.
But they're very smart people.
And they, you know, and it's turned out many of them have become really, really strong Wolfram language programmers, which is an interesting thing to watch.
I mean, it's interesting to see that we see, you know,
in an educational level, for example, you know,
like in K through 12 education, it's like, well, okay,
which teachers are really going to understand this stuff?
Is it going to be the math teacher?
Is it going to be the computer science teacher who knows Java, for example?
Or is it going to be the head of the English department who's just a smart person who decides
that they're going to pick the stuff up? It turns out it's a diverse background of people,
and that's the same thing we see in people who end up being really strong
Wolfram Language programmers. So I guess the point there is that in this whole world of computational language, computational thinking is the most important thing. It's a bit of a disruption relative to the, you know, we're all going to learn to churn out the maximum number of lines of code and make sure that the type checking works correctly type of thing.
Right. Yeah, that makes sense.
Yeah, that totally makes sense. I think the camp sounds awesome, especially for, you know, people who are interested in, as you said, maybe they have a great background in history or particle physics or something like that, and this is a great way to jump into computation and programming and things like that.
No, I mean, look, for the one for grownups, that's the case. The one for high school kids is interesting, because it turns out that even at the fancy high schools, and we've tried very hard to also get kids not from the fancy high schools, it turns out that many fewer people are learning any kind of real programming, anything computational, in high school than you might think.
I mean, I kind of assumed with all the noise that's made about, oh, everybody's, you know,
doing...
Everyone's STEM crazy.
Right. You know, but it turns out a lot of these kids, and actually it's often the worst case: they did a C++ class and they found it boring, and they learned that the most important thing is to, I don't even know, put a semicolon every time or something.
Yes, yes, right.
Anyway, well, listen, I should be off.
Yeah, totally. So, just to recap real quick: so
if you're a student, you have access to Mathematica already; you just might not have known it. So definitely check it out. I mean, it's totally free. Go through your university and start playing with it, and see how many people there are in each country, and do all these cool things. If you aren't a student, I believe, correct me if I'm wrong, but you can still use it for free.
Yeah.
In fact, for the web version, you can just go to the front of our website, and it says immediate access, and start up Wolfram Programming Lab. Wolfram Programming Lab is the full Wolfram Language, but it's kind of themed for kids.
So if you don't mind that it's themed for kids, you get it for free.
Cool. Good to know.
And if you want to integrate it
with something you already have half written,
you can use the Wolfram language,
which is also free, right?
For personal use.
Yeah. I mean, with the Wolfram Engine, everything gets complicated at this point. But basically, it's, you know, go to our website and you should find a path.
Yeah, you can find good documentation. There's definitely at least a web way to, sort of like a sandbox where you could play around.
Yes, exactly right.
Cool. Thank you so much for coming on the show. It's been amazing.
It has gone a little long, but I'm sure the people are not going to be upset about that. This is an absolutely fantastic interview, and thanks for staying up late with us.
Yeah, sounds great. It was fun.
The intro music is Axo
by Binar Pilot.
Programming Throwdown is distributed under a Creative Commons Attribution Share Alike 2.0 license.
You're free to share, copy, distribute, transmit the work, to remix, adapt the work,
but you must provide attribution to Patrick and I and share alike in kind.