The Changelog: Software Development, Open Source - TensorFlow and Deep Learning (Interview)
Episode Date: September 9, 2016
Eli Bixby, Developer Programs Engineer at Google, joined the show to talk about TensorFlow, machine learning and deep learning, why Google open sourced it, and more.
Transcript
I'm Eli Bixby, and you're listening to The Change Log.
Welcome back, everyone. This is The Change Log, and I'm your host, Adam Stachowiak.
This is episode 219, and today Jared and I are talking to Eli Bixby about TensorFlow.
Eli is a developer programs engineer at Google. And this episode is produced in partnership with O'Reilly Media in conjunction with OSCON in London next month.
We'll be there.
Make sure you say hi.
Head to OSCON.com slash UK and use the code PCCL20.
Once again, PCCL20 to get 20% off your registration.
We talked to Eli today about what TensorFlow is, why it's open source, and the role it plays in deep learning.
We have three sponsors today, Toptal, Linode, and DataLayer, a conference organized by our friends at Compose.
Our first sponsor of the show is our friends at Toptal.
And this message is for all those team leaders out there looking to easily add new developers and designers to their team.
Easily scale up when you need to.
You got a big push coming. You got a new area of the product you've got to go into. You've got more need than you thought you would. You've got to go through all the hassle of putting a job out there and hiring people to find the right people. Well, that's a bunch of hard stuff that you don't need to even deal with. Call upon my friends at Toptal. That's T-O-P-T-A-L.com.
The cool thing about TopTal is you can hire the top 3% of freelance software developers
and designers. And what that means is they've got a rigorous screening process to identify the best.
So when you call upon them to help you place the right kind of people into your team, you know you're calling upon the best people out there.
Once again, go to TopTal.com. That's T-O-P-T-A-L.com. Or if you'd like a personal introduction,
email me, Adam, at changelog.com. And now on to the show.
We're back for a great show. This one, Jared, is in partnership with O'Reilly.
We're going to OSCON.
It's going to be a blast.
OSCON London.
In fact, it's not we.
It's you.
You're going.
That's right.
I'm pretty excited to go to London for OSCON and just because I've never been to London.
So it'll be fun.
Our guest today is Eli Bixby.
Eli, you're on the developer program. Well,
you're actually on the Google Cloud team, but you're a developer programs engineer
at Google. So what do you primarily work on? So I primarily work on machine learning. So right
now that's the various machine learning products. We're focusing right now on the cloud machine
learning platform launch, which uses
TensorFlow. So I'm doing a lot of TensorFlow. So you don't actually work on TensorFlow,
but you work around TensorFlow. Yeah. Yeah. So there's, you know,
the user contributor distinction. You know, I'm a I'm a big user, not a not a contributor,
except, you know, to docs occasionally. Gotcha. And so for the listeners today,
this show is all about TensorFlow, even though Eli is not a contributor to it. He does a lot of education around it.
And Eli, you're actually, you're doing a talk at OSCON London and you're doing a workshop with
Amy Unruh, also in Dev Relations with you. So your talk is deep learning with TensorFlow.
That's a beginner talk. And then the workshop is kind of a deeper dive into it. Can you
kind of break those down for us?
I guess, actually, before we do that, I want to mention that we actually have a code.
So if you're thinking about going to OSCON, we actually have a code.
Go to OSCON.com slash UK and use our code PCCL20 and you'll get 20% off the registration.
And make sure you tell your friends, too, because as we said, Jared's going to be at this conference. We want to meet all the listeners we can. We'll have t-shirts, we'll have stickers, Jared, we'll have mics. We'll be sitting down talking to people, and maybe even you, Eli, we can talk to you.
I'll be around.
But, uh, yeah, let's get back to your tutorial session and your talk. So kind of break down
what you're going to be covering there. Yeah. So the tutorial session, we get a little bit more in depth. Since it's the tutorial, we'd like to get people writing TensorFlow code. That's the goal. There's so much hype around it, we don't really need to hype it, but there is a big leap from, oh yeah, I've read some news articles about TensorFlow, to, I wrote a basic linear regression.
So that's the goal for that.
We start off, you know, just like.
If you've used other Python big data libraries, if you use NumPy and SciPy, here's a little bit of the correspondence, you know, write some very basic things.
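That jump to writing "a basic linear regression" is smaller than it sounds. As a rough illustration (in plain NumPy rather than TensorFlow, with made-up data), a gradient-descent line fit looks like this:

```python
import numpy as np

# Toy data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.05, size=100)

# Model: y_hat = w * x + b, trained by gradient descent on squared error
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(500):
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # close to the true slope 3 and intercept 2
```

The same fit in TensorFlow swaps the hand-written gradients for automatic differentiation, but the shape of the program is the same.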
And then we launch into sort of one of my favorite machine learning applications, which
is Word2Vec.
So embedding words in vector space is the description of that, which is kind
of cool.
It's just like words become these lists of numbers, and those lists of numbers are meaningful
and they encode the relationships between words.
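A toy sketch of that idea, with hand-made three-dimensional "embeddings" standing in for real learned Word2Vec vectors (all the numbers here are invented for illustration; real vectors are learned and have hundreds of dimensions):

```python
import numpy as np

# Hand-made toy "embeddings", purely illustrative.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The famous analogy: king - man + woman should land near queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w != "king"),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```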
The talk is going to be a little bit more of an overview.
Like, here's what you can do with TensorFlow.
You know, here's some neat tooling we provide to make it easier to use and to make it, you know, in production, work in production at scale, all that stuff.
And so that's going to be more of a demo. You know, we don't really have time to get people's hands wet, but I'll provide you with tons of resources to learn more during the talk.
Awesome. Adam, you mentioned our code PCCL20.
That's not the easiest code to remember.
So if you're interested in that 20% off, definitely check the show notes.
We'll include it in there.
Not only will we be there and we have that 20% off coupon,
we also have a free ticket to give away.
So if you want to come to OSCON London, hang out with me, Eli, have some fun,
learn more about TensorFlow and all the other things.
We have a free ticket and we're going to give it away to a lucky subscriber to ChangeLog Weekly,
which is our weekly newsletter that covers everything that hits our open source radar.
If you're not subscribed to that, now you have the best reason ever to go get subscribed
because we'll be giving away that ticket on weekly, I think, September the 17th.
So check that out.
Well, Eli, what we like to do is before we get into the main topic of TensorFlow, we
know that you're speaking about it at OSCON, but we don't know too much about you beyond what you're doing at Google today. We'd like to get
to know our guests a little bit and hear some background on where you're coming from and how
you got to become the developer programs engineer at Google. Can you give us a little bit of insight
into your history with software? Yeah, so I'm a recent grad, actually. So I graduated three years ago with a degree in math and computer science from Oberlin.
And I sort of wanted to do something that was a little bit different from normal software
engineering.
You know, I've always loved teaching.
I love sort of the art of presenting information well.
And so, you know, I was contacted about this position and it seemed like a great fit.
And from there, there was a need for people to cover this ML stuff
within developer relations,
especially TensorFlow was released about a year after I joined
and so it was all sort of coming online as I was joining
and so my math background sort of helped a little bit there.
And I've just been sort of drinking from the fire hose ever since then.
You mentioned you have a partner in crime there who kind of came up in a research, university-style environment, thinking that she wasn't going to be relevant to industry. And all of a sudden this machine learning and all these things that were just kind of the area of research and development are very relevant to many people. Can you expand on that for us?
Yeah. So, she has a PhD from Stanford in AI-related things, and she's sort of joking that it's all suddenly
relevant again. And that's because this idea of deep learning, it's based around... In the 80s,
they were called perceptrons, but it's based around this really old idea of like, let's build a like very high level abstraction of a
human brain. And maybe we can teach it things. And then sort of the, it kind of fizzled out,
because it turned out, it seemed like, no, we can't teach it things. But then it's like,
oh, actually, it turns out we just need a few hundred thousand more of these perceptrons, or, you know, a couple million more of these perceptrons, and we can teach it things. And that's, you know, obviously been enabled by hardware. They couldn't do that. It wasn't, you know, a failure of academia necessarily; they just couldn't do it in the 80s. And now we can. And so all of this research is now suddenly relevant, and there's this whole reawakening in this area of research.
That's kind of cool, too, because I know a lot of people that have gone to school for things that are really hard to get a job in, and this could have been one of them. Because, as you said, the desire to do this stuff didn't match the current technology, so it wasn't quite the right timing. But now is the right timing.
So you have all these people potentially that have a lot of wealth of knowledge and research around this thing that can now actually get jobs and be relevant again in that area.
Yeah, this is a story maybe
that gives hope to academics everywhere.
I was going to say,
it's kind of the opposite
of what many people experience
with their university degrees,
where many things that you learn go from relevant to later irrelevant, right? Or obsolete. And this
is like the opposite trend, which is surprising, I'm sure. Yeah, I guess there's always like,
it moves into something else in academia, but it is a sudden shift into industrial relevance,
which is uncommon. Yeah. So obviously we want to talk about TensorFlow.
And the cool thing is, Eli, is that while we're working with O'Reilly
on producing this show and going to OSCON,
this isn't the reason you're actually on.
It's sort of a two-for-one for us
because we wanted to have a conversation on machine learning,
but specifically TensorFlow and deep learning.
And when it came up like, hey, let's work with O'Reilly
on some promotion around OSCON and producing some shows
with some of their speakers and all that good stuff,
we went on a list, saw your name, saw TensorFlow,
and we're like, let's get him on because we want to do a show
around TensorFlow anyway.
So this is perfect timing.
But for the listeners who don't know much about TensorFlow
or haven't touched this whatsoever,
can you give a breakdown of what TensorFlow is, maybe even why it's even named TensorFlow?
Yeah, I love giving the name-based breakdown because it is pretty much the easiest way to do it.
So the tensor part, tensors are just an n-dimensional matrix generalization.
So, you know, a tensor has a shape, which is described by a vector. So a tensor with shape [50] is just a list of length 50. A tensor with shape [50, 1] is a list of 50 lists, all of length one, so on and so forth. So you can imagine this being useful to bundle up different kinds of data. Like, for example, a batch of images might be [10000, 256, 256, 3], right, where you have 10,000 of these images in a little batch, you know, they're 256 by 256 pixels, and each of them has three channels. So that's what a tensor is. The flow part is a little bit more complex, but if you've ever played around with like a
graph computation framework, like say like Spark, then you're familiar kind of with this idea of
like you build a computation graph and then that graph is deferred, you know, you do deferred
execution on that graph where you sort of calculate the upstream dependencies and use that to evaluate whatever node you want the value of. And so it's
just this computation graph where the edges are tensors. So tensors are flowing along this
computation graph, and then you get TensorFlow.
And it's all modeled around the actual brain, right? Like, the whole point of, I guess, TensorFlow in particular is that it's meant to be modeled around the brain's initial cognition.
Well, so the, the initial idea of deep learning is modeled around sort of this very abstract
idea of, of how neurons work that actually like neuroscientists have now largely rejected,
but we keep around cause it turns out to do good stuff. But TensorFlow, I would be, you know,
remiss if I didn't say, like, TensorFlow is more general than that, actually. It's, you know,
it's not just for deep learning. Arguably, any computation that you can describe as sort of,
that's efficient to describe as, like, a graph of tensors is a good fit for the TensorFlow
use case. So this includes like, you know, there's people internal to Google building like Bayesian
inference systems on top of TensorFlow. And like people have talked about working on like simulation
systems that are on top of TensorFlow. Granted, a lot of the like higher level, easy to use wrappers that we provide with TensorFlow are focused on deep learning because, you know, that's the first sort of application.
But it is sort of built to be an extensible, more general system that you can build on top of.
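The build-a-graph-then-execute pattern Eli describes can be illustrated with a toy deferred-execution engine. This is not TensorFlow's actual API, just the idea of recording operations first and only computing upstream dependencies when a node's value is requested:

```python
# A tiny deferred-execution graph: nodes record an operation and its
# inputs; nothing is computed until evaluate() walks the dependencies.
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def constant(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def evaluate(node):
    # Recursively evaluate upstream dependencies, then this node.
    if node.op == "const":
        return node.value
    args = [evaluate(i) for i in node.inputs]
    return args[0] + args[1] if node.op == "add" else args[0] * args[1]

# Build the graph first...
x = constant(2)
y = constant(3)
z = add(mul(x, y), constant(1))  # z = x*y + 1

# ...then run it.
print(evaluate(z))  # 7
```

In TensorFlow the values flowing along those edges are tensors rather than scalars, and the engine can place different parts of the graph on different devices.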
So deep learning is a kind of machine learning.
It's a specific tactic or strategy.
Is that fair to say?
Yeah.
So I think deep learning is generally,
it's kind of ambiguous,
but it's often synonymous with sort of like neural nets.
So this idea you have a node that resembles
or represents rather a neuron
and you have some inputs to this node and those inputs
go through sort of like an activation function and result in a single output and you have layers and
layers and layers of these nodes and if you think about sort of a layer of these nodes, you know, one layer feeding into the next, it becomes clear why tensors are important, right?
Because at that point, that activation function is really, rather than calculating it for a single node, you calculate it for the entire layer at once.
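That layer-at-once computation is just a matrix multiply followed by an elementwise activation. A minimal NumPy sketch (the sizes and the sigmoid activation are chosen arbitrarily for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# One layer: 4 inputs feeding 3 neurons. Instead of looping over neurons,
# a single matrix multiply computes every neuron's activation at once.
W = rng.normal(size=(3, 4))  # one row of weights per neuron
b = np.zeros(3)
x = rng.normal(size=4)       # input vector

layer_output = sigmoid(W @ x + b)
print(layer_output.shape)  # (3,), one activation per neuron
```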
And you're doing some, you know, fancy linear algebra stuff, which is why, you know, it pays to have this general system. And the Python wrappers are on top of a C++ API
that does all the fast tensor computation stuff.
It does the graph execution.
It's the graph execution engine.
And there's people out there building wrappers in other languages.
So it is really meant to be sort of a base
upon which the entire
community can build.
It seems like from Google's perspective, the, you know, the movement of saying, let's take this mathematical science or this research science and let's apply it to useful things, right? At first it was, you know, search. Now, expanding upon that in many different ways is kind of a, maybe not a secret sauce, but it's Google's move.
You know, like it's and it's a great move, by the way.
But but like because of this, like, all right, I'm an application developer and many people out there are building either web apps or mobile apps.
And like this stuff was very far from us in terms of something that we could possibly use or do or apply and make our products any better.
And all of these open source efforts, TensorFlow amongst others, is like, you know, I call it like bringing the cookies down to the bottom shelf where the kids can reach it.
Right. And so now we have opportunity at it, which is why the Changelog is interested in this kind of stuff. But doesn't it seem like you're taking the superpower that was inside Google, and you've open sourced it back in November of 2015?
And and now everybody can use it and everybody can get the cookies.
Like, aren't those cookies valuable?
Like, what do you think is the, in your opinion, or even if Google has an official stance,
that's fine as well.
But why open source something that's this research heavy and invested in?
Yeah.
So Google actually has an internal meeting every week.
TGIF, you've probably heard about it.
It's company wide.
Sergey and Larry stand up there and crack jokes.
But this is almost verbatim a question that an audience member asked.
So there's people inside Google that, you know, are thinking these thoughts too.
And there's a great doc that sort of circulated around the time of release by Jeff Dean, one of the progenitors of TensorFlow.
Sort of like why open source TensorFlow.
And a lot of it is because we've published a lot of white papers in the past.
So the MapReduce white paper, which resulted in Hadoop, sort of the Bigtable and Colossus white papers, which resulted in HDFS and many other distributed file systems, so on and so forth.
And when that happens, like there's smart people outside of Google, like a lot of them.
Right.
And they're going to take these white papers and they're going to turn them into APIs that will become the industry standard because they are founded on like such quality ideas.
So you end up, you know, quoting Jeff Dean, you end up with this ridiculous sentence like Google Cloud Bigtable now supporting the industry standard HBase API, which is ridiculous.
Right. We built Bigtable. We published this Bigtable white paper.
HBase was made, you know, significantly down the road.
HBase became the industry standard.
Now we're trying to expose Bigtable as this cloud service.
And all of a sudden, we have to adapt to this slightly and annoyingly different API.
So it really is part of the much larger decision Google has made,
just like bring the fire to whoever that Greek person you're supposed to bring the fire to is.
Or the cookies.
Yeah, or the cookies.
Yeah, the cookies was the metaphor we were using.
I keep hearing this too, though, from other people who have worked inside of Google for a bit, either as an intern or a full-on employee, that there's tools that you have inside of Google that, when you step outside of the Google sphere, you miss them.
And one of those, actually, we teed this up a little bit in the last show with Beyang Liu on Sourcegraph: there was this thing called Google Code Search.
And I'm sure you probably know about that, Eli, where you can actually search inside of Google for various code that is used. This is all kind of hearsay, basically, from Beyang, and that's kind of where Sourcegraph came from: there was this tool, this superpower that Google had, and stepping outside of that sphere you no longer have it, and they wanted something similar, and so they built Sourcegraph.
Yep. And if you look at Docker, Google's usage of Borg preceded Docker by, oh, I don't know, a decade. And, you know, the same story is repeated in many places in the industry. And so it's sort of been a big decision to say, like, yeah, this whole thing where open source replicates things that we're doing internally, we think there's a better way, you know, and we think part of that way is being part of the open source community, and just being the best place to run these open source frameworks on our
infrastructure. And that's really what cloud is about. You know, being the cheapest place being
the fastest place being the most convenient place. And it's also you know, there's, there's all sorts
of ancillary benefits, like, it's really hard to hire machine learning experts.
If you get an extra three months of time from each machine learning expert you hire because you don't have to teach them TensorFlow because they already know it because it's the industry standard, that's pretty great.
Yeah.
So let me just play devil's advocate and go the other way.
Of course, we're all about open source here, so you'll know I don't necessarily follow this reasoning, but you said something I had never considered.
That's a great insight.
And, like, we publish these white papers, people take the white papers, turn them into open source projects that we then end up having to align ourselves to.
Why not just open source it from the start and be the player in the game?
Why publish the white papers? Why don't you just keep it all to yourself, and then you don't have to worry about any of that stuff?
Yeah, I mean, that would be fine if people never left Google, right?
Or, you know, I mean, it would also be bad, because Google wins when computing as a whole advances, right? Like, the nice thing about the technology industry is we're all pushing against this abstract, like, how do we do this? We're not fighting for pieces of a pie, you know, we're fighting against the arbitrary foe of not being able to do stuff.
Right. Right. Well said. All right. Well, we're bumping up against our first
break. After the break, we're going to get into how you would use TensorFlow, why you would use TensorFlow, all these questions that I'm sure I'm not the only one who has flowing around in my head.
And Eli is going to help us out with all these things right after this.
Linode is our cloud server of choice.
Get up and running in seconds with your choice of Linux distro,
resources and node location, SSD storage, 40 gigabit network, Intel E5 processors.
Use the promo code changelog20 for a $20 credit.
Two months free.
One of the fastest, most efficient SSD cloud servers is what we're building our new CMS on.
We love Linode.
We think you'll love them too.
Again, use the code changelog20 for $20 credit.
Head to linode.com slash changelog to get started. All right, we are back with Eli Bixby demystifying
TensorFlow, trying to figure out not just why it's cool, but what can I use it for? Who is it for? These kind
of questions. So Eli, we'll start with that one. Who is TensorFlow built for? It's obviously not
just for Google. Otherwise, they'd keep it to themselves. But who should be using this?
Yeah, so the initial audience was sort of largely, you know, research scientists or sort of people
at the cutting edge of machine learning. But we really want to move it back and add layers on top. So it's usable for data scientists and
it's usable for devs that are learning some data science, that are learning some machine learning,
and just very accessible for them. And so that's kind of where the sort of layering comes in. And there's a lot of that.
So research scientists, do those types of people typically have any programming knowledge?
Are they, would we consider them non-engineers? Are they newbie engineers? Where do they sit at
in the gamut of being able to program? So they're all, I think, you know, fluent programmers,
but maybe not fluent software engineers is how I would describe them. And I think that's part
of why it's awesome to have like this environment, like Google, where you have, you know, domain
experts sitting next to really experienced software engineers who can kind of say, Oh,
you know, maybe you want to like frame your abstractions more like this because it'll be more extensible or save you maintenance costs
and that sort of stuff.
There's really value in bringing the two groups together.
I've also heard that when it comes to TensorFlow,
I guess more machine learning, more this deep learning in general,
it's like giving a machine eyes.
Almost that they can see something if
you gave it a picture it can kind of decipher what that is that's where you get some of those things
where if you hold your phone up and you look at a different language it'll repaint it in the proper
language can you expand on like this idea of giving the machine eyes yeah so i mean one of the
you know one of the first things that deep learning really excelled at above traditional machine learning techniques was computer vision.
So just to sort of step back, the distinction between machine learning and deep learning is, you know, deep learning is generally a subset of algorithms that require a lot more data and a lot more compute time.
But you maybe need to know a little bit less about the structure of the data.
You need a little bit less domain expertise maybe to structure your algorithm.
I don't want to make it sound simple because that would be disingenuous,
but you don't need to craft your algorithm.
Basically, you have more tolerance in what's going on in your data. And you can learn sort of like many layers of structure.
So I don't know if you guys have seen like the deep dream images.
No.
So they're like these crazy images that look like they were, you know, drawn by someone who just
sees eyes everywhere. So they just like take an image and
they like crazy it up and put eyes everywhere. And that's because someone ran the model backwards.
And the model has some layer of abstraction that is specifically looking for eyes, you know,
above all the layers that are like figuring out what shapes are and what colors are and what lines
and all of those angles are. There's some layer that's like, here's where I look for eyes. And that's the sort of thing that deep learning excels at: picking out higher-level features from sort of large data sets.
I think one such example in use at Google is the Google Photos recognition of not just color and location and metadata,
but actually objects in space.
This is a mountain.
Identification, right?
This is a mountain.
This is a dog, so on and so forth.
Is that particular feature using TensorFlow?
So there are, yeah.
So we've open sourced, actually, the Inception model. I forget which version we open sourced, but the model running in production to do image classification (that's the task's name) is called Inception.
And we open sourced a version of it, which is just TensorFlow code.
And you can actually train it yourself on your own data set. Or a really common task is to retrain part of the model on your data set and use it to fine tune. So like
image classification by itself might not be great at say, like learning what a flower is, right?
But all of those lower layers of abstraction, you know, here's where shapes are, here's distinguishing foreground from background, here's, you know, colors and lines, those are all useful for a variety of
tasks, one of which is, you know, figuring out what flower a flower is. And so if you retrain
a piece of this model on your data set, you can get much shorter training times, use much less data,
but still sort of specialize the algorithm.
So like, then you can pick out, you know, I don't know, flowers, like maybe an orchid from an iris
or whatever. So the other thing I wanted to briefly mention is because there's a lot of people
who want to integrate machine learning into their applications, but don't actually have a data set.
And so if you don't have a data set, TensorFlow is not for you. If you don't have a data set, though, we do provide cloud APIs that provide access to the internal machine learning that Google uses.
So, for example, we have a vision API where you can send us a JPEG or send us some image.
I don't remember all the formats, but we'll send you back like bounding boxes or and like
labels for entities or we'll send you back like the emotions that faces in the image
are displaying all that stuff.
And it's just like a REST API.
It's super easy to use.
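A label-detection request to that REST API has roughly this JSON shape (based on the public Cloud Vision v1 `images:annotate` endpoint; the image bytes here are placeholders, and the `requests.post` call is commented out so nothing is actually sent, since it needs a real API key):

```python
import base64

# Placeholder image bytes; in practice you'd read a real JPEG from disk.
image_bytes = b"\xff\xd8\xff..."  # truncated JPEG header, illustration only

# The request body: a base64-encoded image plus the features you want back.
payload = {
    "requests": [{
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
    }]
}

# To actually call the API, you would POST this JSON, e.g.:
# import requests
# requests.post("https://vision.googleapis.com/v1/images:annotate?key=API_KEY",
#               json=payload)
```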
So it's almost like it's a comparative thing.
You send it to Google, and you've already got this index of data that says, basically, all the learning has been done, and you provide that through an API.
Yeah. So there are, like, kind of two distinct phases of machine learning: there's training, and then there's inference. And so one of the things that TensorFlow is made to do is make it really easy to bundle up a trained model and then use it for inference elsewhere.
So, basically, you can export these models down to files that are, you know, maybe 100 megabytes. And you can put them on phones, you can put them on whatever device and do inference on that device, which is pretty cool. And so, yeah, behind the scenes, we have for
the vision API, you know, we have a trained model that we're using to do inference on the images
that you send us. And, you know, we periodically update that trained model so that you're getting,
you know, something near the best of what we have, whenever you call the vision API.
And we have a similar API for natural language
and translate, of course, and speech.
You can actually send us audio files
and we'll do speech to text for you.
So let's do this.
Well, let's give this a shot.
We'll see if it works.
I'm gonna give you kind of a basic use case
based on your idea about flowers.
And maybe you can help us out.
Give us kind of a high level soup to nuts, like how you
would accomplish this using TensorFlow or whatever tools we have available. So I run a website that's
all about orchids. I love orchids, but I hate other flowers. This is all hypothetical. I don't
care about orchids. But let's say I'm an orchid enthusiast and it's a user generated content
website. And so I want my users to be uploading
me pictures of orchids because I love orchids. I already have a compendium of images to use as a,
for training. Let's say I have, well, how many would I need to have? Let's say I have 10,000 orchid pictures.
Oh, let's say you have a hundred thousand to a million orchid pictures.
Okay. Let's say I have 3 million orchid pictures on my hard drive. And what I want to do is I want to allow a person to upload
a picture to a website, my website that I run, and my website will determine this is or is not
an orchid. I believe that's as simple as it can get. Yes or no, this is an orchid. And then I can
use that to either reject their picture or to accept it into my, you know, I have 3 million and
one. So can you take us through like high level? What would I start?
How would I get this feature done so that I can start having people upload orchid pictures today?
Okay.
So the first thing is like, you got to figure out what kind of model you're looking for,
what kind of model you want to build.
And like almost pretty much all models, almost all models are broken down into sort of two
categories.
So, all right, we're in the domain of supervised learning,
which means I have data and I want to teach you things from my data.
Within that domain, we have regression,
which is I want my model to spit out a number, 45.2.
Or classification, which is I want my model to spit out a category.
Orchid, not orchid.
So here we know we're in a classification problem.
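That regression/classification split often comes down to the model's output head. A toy NumPy sketch with invented weights (the same linear core, two different outputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented weights and input, purely for illustration.
w = np.array([0.5, -0.25])
b = 0.25
x = np.array([2.0, 1.0])
score = w @ x + b  # the linear core: 1.0

# Regression head: the raw number is the answer.
prediction = score
print(prediction)  # 1.0

# Classification head: squash to a probability, then pick a category.
p_orchid = sigmoid(score)
label = "orchid" if p_orchid > 0.5 else "not orchid"
print(label)  # orchid
```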
So then what we really need to do is we need to figure out, okay, what models have historically
been successful for image classification?
And you can go see, oh, Google has open sourced this inception model,
but it takes two weeks to train from scratch. And probably my 3 million image data set is not
going to be enough to train inception from scratch. It's this huge sprawling model.
But you say, okay, many image classification tasks share a lot in common. Like I was mentioning
before,
we have to identify these higher level features.
So maybe I'll just try retraining the top few layers of the inception model.
This is a task called transfer learning.
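A toy NumPy sketch of the transfer-learning idea: a frozen "pretrained" layer produces features, and only a small new head is trained on top. Everything here (data, sizes, weights) is invented for illustration; a real version would retrain the top layers of Inception in TensorFlow:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pretend this frozen matrix is the "pretrained" lower layers: it maps raw
# inputs to generic features (edges, shapes, ...). We never update it.
W_frozen = rng.normal(size=(5, 10))

# Toy task: classify 200 random inputs; labels depend on the frozen features.
X = rng.normal(size=(200, 10))
features = np.tanh(X @ W_frozen.T)          # (200, 5) feature vectors
true_w = rng.normal(size=5)
y = (features @ true_w > 0).astype(float)   # ground-truth labels

# Transfer learning: train only the small new head on top of the features.
w_head = np.zeros(5)
for _ in range(300):
    p = sigmoid(features @ w_head)
    grad = features.T @ (p - y) / len(y)    # logistic-regression gradient
    w_head -= 0.5 * grad

accuracy = np.mean((sigmoid(features @ w_head) > 0.5) == (y == 1))
print(accuracy)  # high, since only the tiny head needed training
```

Because only the head's handful of weights are trained, this needs far less data and time than learning the whole stack from scratch, which is the point Eli is making.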
And then you teleport to four weeks in the future
when my transfer learning sample has been published.
Okay.
And you look at that sample
and you look at how to do transfer learning. I'm sure
if you Google transfer learning TensorFlow, you'll find a sample that is not mine. Yeah. So that's
sort of the breakdown high level. Then what you're going to end up doing is you're going to run
this code and you're probably going to need to run it in a distributed environment. Inception
is a big model. You have a lot of data. So you're going to need to figure
out how to do that. You can run it. We have bundled it up. We provide basically wrappers
around code that starts a gRPC server. So I don't know if you guys know gRPC. It's this like
efficient RPC framework from Google. It uses the protobuf serialization format and it's basically
built to be fast for large amounts of
data and support streaming and all that good stuff.
gRPC is referenced a lot on GoTime.
We have another show called GoTime, so
it's all about Golang, but
gRPC is referenced a lot on that show.
Cool, yeah. gRPC
is what sort of runs behind the scenes
to enable distributed TensorFlow,
but the abstractions that
you use to distribute your model are actually really simple. It's literally it's like a Python
context. You're like with my GPU and then everything in that Python context runs on your GPU
and it's magic. So you're going to need to look up how to distribute it. Fortunately, there is a
tutorial on how to distribute TensorFlow inception training on Kubernetes that's published out there.
So Kubernetes is great because you can start up your Docker containers and, you know, Kubernetes will do all the DNS stuff for you.
And so you don't have to worry about too much about connecting all your TensorFlow servers.
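The "it's literally a Python context" abstraction Eli mentions can be mimicked in a few lines. This is a toy stand-in, not the real `tf.device()`: it only records which named device each op would land on, to show how a context manager scopes placement.

```python
# Toy version of device-placement-by-context-manager.
from contextlib import contextmanager

_placement = ["cpu:0"]   # stack of active device names
ops = {}                 # op name -> device it was placed on

@contextmanager
def device(name):
    _placement.append(name)
    try:
        yield
    finally:
        _placement.pop()

def add_op(op_name):
    # In TensorFlow, graph ops pick up the innermost enclosing device.
    ops[op_name] = _placement[-1]

add_op("input")
with device("gpu:0"):
    add_op("matmul")          # placed on gpu:0
    with device("worker:1"):
        add_op("softmax")     # placed on worker:1
add_op("output")              # back on cpu:0
```

Distributed TensorFlow extends the same idea: the device names just become tasks on remote gRPC servers instead of local chips.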
Or scratch all of the last few paragraphs,
you could use cloud machine learning
to train your distributed model,
which is Google offering training basically as a service.
So it's like hosted TensorFlow as a service.
But you're going to run into another problem
after you've trained,
which is how do I efficiently do inference?
Right.
Like Inception is this deep model that does lots and lots of linear algebra operations in order to do inference, even without having to calculate the updates for all your little parameters.
It's still going to do a lot of operations.
So how do you do that quickly?
And TensorFlow Serving is another project.
It's a C++ library that starts a gRPC server
that basically takes a serialized TensorFlow model,
starts up a server, and gives you a gRPC API
that you can call with your examples.
Or, again, instead of figuring out TensorFlow serving,
you can just upload your serialized model
to Cloud Machine Learning inference
and do inference from there.
And so now you have a nice gRPC API
that your website can call with your user's data.
You'll probably have to do a little bit of data munging
to turn it into the right format for inference,
but that's sort of the high-level overview.
Okay, so training is something that can take multiple weeks,
and you would love to have somebody else's data set already trained,
or you can't solve a problem today,
but maybe you can solve one next quarter.
But once you get your model trained,
you still have this inference problem,
which is more the way I'm hearing it.
It's like kind of the real time aspect of applying the model.
And please help my verbiage if I'm getting it wrong.
Yeah.
Against this particular piece of data that I'm holding onto right now.
Yeah.
And running it through that.
And so those are two separate problems,
both solved by TensorFlow in different ways. One is the serving project, one is TensorFlow proper. Am I
tracking you? Yep. Okay. Yeah, so a model really consists of two parts: the architecture,
which is how each individual node feeds into each other node, how those nodes' activation functions work,
and how you're updating the variables, which is the other piece of a model.
So you have all these variables that store the knowledge of the model. This
is what we mean when we open source, like, sometimes. So we open sourced SyntaxNet, for example.
This is an architecture for doing natural language processing in TensorFlow.
But it's separate from the trained model that we open sourced, which was Parsey McParseface.
I laugh whenever I hear it. Yeah. So which is, we like to be whimsical,
but which is the trained model with those variables
at their correct values, right?
So you can think of like all these sliders moving around
and some position of these enormous number of sliders
corresponds to a model that does the thing you want it to do.
And the process of training is finding the right position
for all of these sliders.
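The sliders metaphor maps directly onto code. A minimal sketch with one slider (plain Python, invented data): the architecture is the fixed function shape, the variable is the slider, and training nudges it to fit the data.

```python
# One-slider version of "training finds the right slider positions".

def architecture(w, x):
    # Fixed wiring: how inputs flow to outputs.
    return w * x

def train(data, w=0.0, lr=0.1, steps=100):
    # Gradient descent on squared error moves the slider w.
    for _ in range(steps):
        for x, y in data:
            pred = architecture(w, x)
            w -= lr * 2 * (pred - y) * x   # gradient of (pred - y)**2
    return w

data = [(1.0, 3.0), (2.0, 6.0)]   # secretly y = 3x
w = train(data)
print(round(w, 3))   # the slider settles near 3.0
```

A real model is the same story with millions of sliders; inference is then just `architecture(w, x)` with the sliders frozen.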
Gotcha. But even then you have your inference, which happens after you have the trained model. Yeah, so you fix all your sliders and then you're just
running your values through your architecture to get the result. Once you've got this trained, and you've
got a model, and you're doing inference, how do you maintain it for, I guess, accuracy?
Yeah.
What's the process there?
Yeah.
So this is like a great, there's like all of these sort of DevOps-y questions around
machine learning.
And I like, if you're looking to get involved in the machine learning community, I think
this is a great place to do it.
I think there's tons of tooling that, you know, developers are used to having that machine
learning experts are not used to having,
but they don't know that they want
or they do know that they want,
like stuff like CI.
And we're trying to solve a lot of that stuff in the cloud,
but I think there's a lot of room for additional tooling.
So like how we handle it in Cloud ML, for example,
is you retrain the model on your new data set.
So you say you went from 3 million to 6 million images and you want to retrain your model because you think you'll get a better accuracy.
Or you changed your model architecture and you want to retrain it on the same data set.
You retrain your model, you export another binary, you upload it to, you know, the inference API as, you know, a new version.
And suddenly your, you know, your front end or whatever
just has a new version to hit that maybe has a different accuracy.
Hopefully that was answering your question.
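The retrain-export-publish loop Eli describes can be sketched as a tiny version registry. This is an invented stand-in for Cloud ML's versioned inference API, not its actual interface; the "models" here are just functions.

```python
# Versioned model serving, in miniature: publish retrained models as
# new versions; callers hit the newest by default.

class ModelRegistry:
    def __init__(self):
        self.versions = {}

    def publish(self, version, model_fn):
        self.versions[version] = model_fn

    def predict(self, x, version=None):
        v = version if version is not None else max(self.versions)
        return self.versions[v](x)

registry = ModelRegistry()
registry.publish(1, lambda score: "orchid" if score > 0.5 else "not orchid")
# Retrained on more data -- exported and published as a new version:
registry.publish(2, lambda score: "orchid" if score > 0.4 else "not orchid")

print(registry.predict(0.45))             # served by v2: 'orchid'
print(registry.predict(0.45, version=1))  # old model: 'not orchid'
```

The front end never changes; it just starts getting answers from the new version, with whatever new accuracy it has.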
Yeah, I mean, I'm just wondering, like, from a developer standpoint,
what does it take to, I guess, actually manage the accuracy of it?
Not just from a programmatic standpoint and, you know,
the cloud that you mentioned for Google that you've got there
and all that stuff, like, you know, how do you, how do you know it's
being accurate?
How do you know you're getting back the right results, I guess.
And I guess you take that feedback from a user once they upload the orchid picture and
it's like, yes or no, this is an orchid.
And if, you know, if they say like, clearly I know this is an orchid.
I took the picture.
I grew the orchid.
It's an, it's an orchid.
Yeah.
So this is like a whole area of research. The easy
example is, whenever users correct you, you just file that away in your pocket and add it
to your data set. Does that go back into the training portion of it then? Yeah, then
you retrain your model. But there's a whole other field, which is called online learning.
And this is online in the sense not of the internet,
but of a continuous process, like an online sorting algorithm.
And this is basically like, how do we...
It's really hard.
I don't think there's a lot of people doing it in production, but it's like, how do we
give immediate feedback to our models and improve the accuracy?
And so you see how nascent the field is.
You as a layperson stumbled upon a very hard problem that we're not really solving in very many production systems yet.
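"Online" in the algorithmic sense means folding each new example into the model as it arrives, without storing or revisiting the whole data set. A toy version of that update rule (plain Python, made-up feedback values), maintaining a running estimate the way an online learner maintains its parameters:

```python
# Online update: incorporate one example at a time, keep no history.

class OnlineMean:
    """Running estimate updated incrementally from each new example."""
    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update(self, x):
        # Equivalent to recomputing the mean over all data seen so far,
        # but each user correction is folded in immediately.
        self.n += 1
        self.value += (x - self.value) / self.n

est = OnlineMean()
for feedback in [1.0, 0.0, 1.0, 1.0]:   # e.g. "yes/no, that was an orchid"
    est.update(feedback)
print(est.value)   # ~0.75
```

Real online learning does this with gradient updates to model weights instead of a mean, which is where the hard stability questions Eli alludes to come in.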
Right.
So even at Google scale or Apple scale, the people that are putting this into practical products, when it comes to retraining with new data sets, when Google's photos catalog goes from three petabytes to six petabytes,
do they just retrain the entire thing?
Or do they have, or you guys have, some of this stuff going where you're doing this incremental
training?
So this is where we do retrain a model.
But this is where things like transfer learning come in, where you'll maybe only
retrain a part of the model.
So this is very common when you want to change the architecture, but you don't want to retrain the whole thing from scratch.
That's the field of transfer learning.
That's the field of like, I want to somehow use the output.
And this is, this actually gets into a really interesting thing that I want to talk about, which is this like marketplace of models, which is sort of, I think, how we envision the future.
But you use the output of one model as the input to another model, or you use the output
of one model to train another model.
And there's lots of ways to do transfer learning, but it's basically one of the goals.
So one of the goals that I already talked about was like, you have an abbreviated data
set, an abbreviated amount of time, and you want to specialize an existing trained model. But one of the other goals is I changed my network architecture and I want to
use the learned parameters of my old network architecture on my new network architecture,
or, you know, I have a new data set and I want to reuse to some extent
the existing trained model. So these are all sort of hard problems, but it's very common to actually use the output
of one model as the input to another model.
So this is, for example,
I was talking about word2vec,
where we like embed these words in this vector
and the vector is meaningful.
So if you take the vector for king,
subtract the vector for man,
and add the vector for woman,
you get the vector for queen. They have these sort of crazy tied relationships.
So if you build a model that makes word2vec embeddings, you can use those word2vec
embeddings as the input to another model that, say, maybe figures out whether your comments are,
you know, trolls or not, because the first model has some like knowledge of the structure of the English
language.
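The king − man + woman ≈ queen arithmetic is easy to demonstrate with tiny made-up vectors. Real word2vec embeddings have hundreds of dimensions learned from text; these 3-d vectors are invented stand-ins chosen so the analogy works out, but the mechanics (vector arithmetic plus nearest-neighbor by cosine similarity) are the same.

```python
# Word-analogy arithmetic on toy embeddings.
import math

emb = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.1, 0.8, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# king - man + woman, componentwise:
query = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Nearest remaining word to the query vector:
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
print(best)   # queen
```

Because the embedding space carries that structure, it makes a useful input to downstream models, which is exactly the chaining Eli describes next.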
And so this sort of gets to how I think we see things a little bit farther
in the future, where, you know, we have this marketplace of models, where you need
to pick up a grab bag of models to accomplish your task,
but you don't necessarily have to do much retraining at all to get your goal,
or you don't have to be an ML expert to get to your goal.
You can just sort of like pick up this grab bag of models that does what you want.
And then ML experts can get paid for their expertise
in developing these models through this marketplace.
Yeah.
So I think that's very fascinating.
And I think a marketplace like that would flourish because as the proprietor of the
Orchid website, I don't care at all about any of that stuff.
I just want to know, is this an Orchid or not?
And so if you could just give me an API that was going to give me classification,
for instance. I'm sure they have to be more complex than that, but the point being, at the end of the day, most
people are trying to have just specific questions answered. And I'm sure I would be willing to pay
money to get that answered, as opposed to setting up a Kubernetes cluster, and so on and so forth,
maintaining the learning and all that stuff. Yeah, and that's sort of, I think, the
the natural language and the speech and the vision
and the Translate APIs are sort of the first shot at that
where you could probably use it for your Orchid website.
You just throw away every label that isn't Orchid
and then the ones that are Orchid
are the ones you say are an Orchid.
Right.
It'll give you back all the labels.
I don't think there's a way to say,
I only care about X label, but certainly that's something you could imagine all the labels. I don't think there's a way to say, I only care about X label.
But certainly that's something you could imagine in the future.
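The "throw away every label that isn't orchid" idea is a one-liner on the client side. The response shape below is invented for illustration, not the actual Vision API schema: assume some general label-detection API returns a list of scored labels, and the orchid site keeps only the one it cares about.

```python
# Filter a hypothetical multi-label API response down to one question:
# is this an orchid?

def is_orchid(api_labels, threshold=0.7):
    # api_labels: assumed [(label, confidence), ...] response format.
    return any(label.lower() == "orchid" and score >= threshold
               for label, score in api_labels)

response = [("flower", 0.98), ("orchid", 0.91), ("plant", 0.85)]
print(is_orchid(response))                       # True
print(is_orchid([("dog", 0.99), ("pet", 0.90)])) # False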
Right.
In the meantime, I'll probably just code it up against Mechanical Turk and let them decide if it's an Orchid or not.
Yeah.
That's just a bad joke.
I was going to say, you're going to basically rely back on humans again.
Yeah, exactly.
There are people who use Mechanical Turk to produce training data sets when they don't have data.
That would be smart.
It's kind of funny.
Yeah.
Yeah.
Well, you always need a human somewhere in the mix.
That's for sure.
We're coming up on our next break.
But we do want to dive deeper into TensorFlow models.
So let's break here.
When we come back, we'll dive a little further into that with Eli.
If you're focused on the data layer, there's an awesome conference being put on by our friends
at Compose. Monolithic databases are so 20th century. Today, teams are using a JSON document
store for scale, a graphing database to create connections, a message queue to handle messaging,
a caching system to accelerate responses, a time series database for streaming data, and a relational database for metrics
and more. It can be hard to stay on top of all your options, and that's why you should
attend.
While much talk in developer circles these days focuses on the app layer, not enough
attention is placed on the data layer, and data is the secret ingredient to ensuring
applications are optimized for speed, security, and user experience.
Hear talks from GitHub, Artsy, LinkedIn, Meteor, Capital One, and several startups, including Elemento and DynamiteDB.
Talks range from the Polyglot Enterprise to using GraphQL to expose data backed by MySQL, Elasticsearch, and more.
The conference is in Seattle on September 28th.
Tickets are just $99, and Changelog listeners get 20% off.
Head to datalayer.com and use the code CHANGELOG when you register.
We're back with Eli Bixby talking about TensorFlow.
Eli, we've got models, we've got layers,
we've got all sorts of things that make up TensorFlow.
Like help us break down the components here.
I think there's a lot of different layers of abstraction to work at, one of which is actually called layers.
So that's going to be confusing.
But there's a lot of different levels, maybe is better, of abstraction, depending on who you are.
So a lot of work, like I said, we sort of initially launched with this research scientist primary audience. But a lot of work since then has gone into building these easier to use levels
of abstraction on top of TensorFlow, like the core graph execution system.
So one of the big ones that we launched recently is tflearn,
which you can find at tf.contrib.learn.
And what that does is it provides sort of like,
if you use SciPy or any of those other sort of like ML packages
that sort of wrap up a whole model in like one line,
it does one of those.
So you can be like, okay, I got my deep neural net classifier
and I'm going to fit it on this data set that I have in Python
and then I'm going to train it or I'm going to evaluate.
And so you get like these nice like five line model definitions.
Sort of one rung down from that is layers,
which basically, you know, maybe turns five lines of code into one:
instead of defining some parameters and a function that takes the parameters, you just
define, here's a convolution layer, which is a thing that I won't get into.
And then there's wide and deep, which, there's a blog post about it, but it's like, okay,
here's my data set.
And it's sparse data.
So the difference between sparse and dense data is pretty simple.
Sparse data is like words where you've got a bunch of words and you don't necessarily know how they relate to each other.
So you can't number the words like word zero, word one, word two, word three.
Because why is cat more related to, you know, bucket than it is to dog? It just happened to be in the ordering that we gave it. There's no
proper ordering of words. So that's sparse data, whereas dense data is like, here's a bunch of
numbers. So if you had weights and heights and stuff, that would be dense data.
So if you know your data set,
you know things like whether it's sparse or dense about it,
you can just say, here's my data set,
like here's the columns,
and then like evaluate the relationship
between these two columns, stuff like that.
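The sparse-versus-dense distinction above is exactly why words usually get a one-hot (sparse) encoding: it avoids implying that "cat" is numerically closer to "bucket" than to "dog". A tiny plain-Python sketch with an invented vocabulary:

```python
# Sparse vs. dense features.

vocab = ["cat", "dog", "bucket"]

def one_hot(word):
    # Sparse: mostly zeros, a single 1; no spurious order between words.
    return [1 if w == word else 0 for w in vocab]

# Dense: the numbers themselves are meaningful measurements.
dense_example = [72.5, 1.80]   # e.g. weight (kg), height (m)

print(one_hot("dog"))   # [0, 1, 0]
```

Declaring which columns are sparse and which are dense is the kind of data-set description these higher-level APIs ask for.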
So these are really like great tools for data scientists
who maybe aren't ML experts,
but like have done a little bit of ML
and sort of want to move more
into ML. They're great places to hang out, great places to experiment with.
And then below all that is the fundamental, here's an optimizer that runs on the
model architecture that I spent, you know, 50 lines describing. Lots of moving parts, and it seems like
it's such a big project and effort
that just depending, like you said,
where you're coming from or who you are,
I think is what you said,
what you're trying to do with it,
it kind of informs where you're going to start
and where you should focus.
But what about just the curious developer
who doesn't want to fall behind
and everybody's mentioning ML
and TensorFlow is always out there on Hacker News
and she's just wondering,
I should try, I should learn more about this.
I've listened to the changelog on it.
I'm excited, but I'm still a little bit mystified.
So it makes sense, Eli,
that you're giving these talks and workshops
at OSCON and other conferences
because there's a lot to learn here.
But just give us like a getting started.
Like if you just had a friend at a bar that says,
hey, I'd like to give TensorFlow a shot.
What would you tell that person?
Well, so if they've never seen machine learning before,
sort of like the gold standard is the Stanford Coursera course on machine learning.
It's a great introductory course.
I can't recommend it enough.
Covers lots of topics.
And if you want sort of an extra challenge,
you can try like duplicating your assignments in TensorFlow
and that's totally possible.
Also, if you're familiar with machine learning
and you just want to get into TensorFlow,
there's a Udacity course, I believe,
on TensorFlow, on deep learning using TensorFlow for people who have machine learning experience, but not necessarily deep learning experience.
So, again, like the getting started guide is like always depends on on who you are.
And, you know, it's like, yeah, we don't really have an avenue for, you know, people who have never written a line of code in their life before.
That's, you know, pretty common of a lot of projects.
But it's hard to say, like, OK, where do you draw the line of people you're bringing into the fold of machine learning?
Yeah, I think there's never been a better time to learn machine learning.
And certainly you can start with some of those resources. There are a lot of great ML classes online that sort of started the
whole MOOC thing, and I would definitely recommend taking one of those, just so you can
get familiar with the terms. Honestly, there's a lot of terminology that I've been
throwing around, like classification and regression, and so much of it will start to click
as you're reading blog posts, as you're reading documentation.
It's this feedback effect.
The more terms you get, the more the things that you are reading every day anyways start to make sense.
And the more you learn from them.
So we will definitely link up resources.
You just mentioned the Stanford course.
If there's a Coursera course, give us those links, Eli,
and we will include those in the show notes
so those interested can find them easily.
Definitely myself might need to check out that Stanford course.
It sounds like a great way to at least demystify, like you said,
the terms and the jargon around it
because there is definitely a lot of surface area here.
I think even before the show,
you mentioned that I said,
I think I may have said the TensorFlow team at Google.
And you said, well, it's not like there's just,
you know, 12 people sitting in a room
and they do TensorFlow.
This is the kind of thing that kind of sprawls
across campuses and groups.
And there's people that work on it
from all different angles of the company.
Is that correct? Yeah, yeah. The number of people who contribute, you know,
is one number, but then the number of people who are using it at Google and outside of Google,
it's like, you know, my feeds blew up after it was released, because everyone was trying it.
But that level of excitement has stayed shockingly high. There's so many people publishing projects with it, which is
fun, because once you learn how to sort of understand what's going on in the models, or
once you learn how to read papers describing new models, there's so much
content out there to learn. And it's just getting over that initial hump,
getting the terminology.
But I totally think, you know,
I don't think it's like restricted
to this elite class of programmer, I think,
or elite class of academic person.
I really think it is like a thing all devs can learn.
I believe that's the case.
I think it's going to get easier and easier as well.
And it seems like there's still some infrastructure needed around kind of the DevOps-y concerns. The more we can remove
tangential education, in order to learn what you need to learn to use it.
Yeah, I think that will make that more and more true. You mentioned excitement and
people releasing other projects.
One project that got me excited, and maybe you can share some insights on this.
Maybe you can't, but it's out of the TensorFlow org, at least, on GitHub: the Magenta project.
Oh yeah.
Which is music and art generation with machine intelligence.
Can you share what that is with people?
Yeah.
So generative models, honestly, it's so cool. It's one of my favorite things in machine learning.
But it's basically this: you take data in, and you want to generate examples
of this data that fit in the class.
And so Magenta, you know, I don't know a lot about the specifics of Magenta, but it is an example of these generative models, where you take in music and you, you
know, generate music from the properties of all of the music that you've taken in, really.
It's pretty fascinating.
It's pretty fascinating.
And another example, a sort of big paper
in this area was on generative adversarial networks, which is a fun little thing
that I've been interested in looking into. But you can go see the same thing for images, actually.
And they generate these kinds of spooky, uncanny-valley images of faces or
bedrooms or whatever, where at first glance it passes off
as a bedroom, and then you realize that the window curtains are sheets or
something and not actually curtains, and the bed has a window in it or something.
Just little weird things out of place. But they do a remarkably good job of
generating this. So yeah,
there's a whole team working on
Magenta and these
generated...
What's the point of it?
For fun?
Yeah, I think it's for fun.
Push the boundaries?
Yeah, I think you get a lot of people,
you get a big injection of an academic
mindset into these communities. And
the academic mindset is very much, we want to do this thing because it's tricky,
and we have no ulterior motive for doing it other than the fact that it's tricky, and we
believe that we'll learn things from accomplishing this hard task. There
was actually a gallery showing in San Francisco of
generated artworks.
That's pretty sweet.
Yeah.
The readme, it says, you know, this is a project that asks: can we use machine learning to create compelling art and music?
It's like, can we do it is really the point of it. And if so, how? If not, why not?
Yep.
So it's very much, like he said, because it's tricky and also
because it gets people involved that otherwise wouldn't be, right? Like now you have this cross-
melding of musicians working with, you know, research scientists and so on. So interesting
stuff. I stumbled on a blog post on the Magenta blog, it's a Magenta MIDI interface. So I'm thinking,
like, okay, now you can actually, probably, as a
musician, create your own from your own keyboard, MIDI-wise, and then maybe pass that through
something, I don't know, to create random art, random music. Seems pretty interesting, though.
There's an API for that. Yeah, definitely inviting a lot of unique people into the mix of
programming, which is what I like about this too. Cause, like, the bigger picture here is why open source it in the first
place, right? And we kind of inched at that to a degree with, like, you know,
Google has released white papers before and had to deal with, you know,
other people making things, I guess, and having
to use that stuff that you'd actually white-papered. Now, in this case with
TensorFlow, you're actually stepping out and open sourcing it yourself. That may not be the only
reason to open source it, but now it's sort of this invitation to a lot more than
just academics, just developers, just engineers. Now it's musicians, now it's artists, now it's
a lot of different ways. Orchid lovers, obviously.
Well, yeah, this is what we love about open source, right? It's like, you know, the community is open and available, and we try to bring as many people in, because we all benefit from bringing as many people in as possible.
How often are you asked to the forefront for Google in terms of educating the developers, the masses, doing workshops and talks at OSCON, for example?
So you're on the forefront of this.
So I'm curious if you can help us maybe predict some future.
I'm not sure how far in the future we should go, but where do
you see things heading when it comes to machine learning, deep learning, TensorFlow, and others?
Yeah, so I guess this is recorded, so someone can point to this five years in the future
and be like, oh, this guy's wrong, he was totally wrong, or totally right, or this guy was
brilliant. Yeah, this is your big moment right here, don't blow it. Right. Yeah, you know, I think
that sort of marketplace that I was talking about is where I would love to see things go.
I'd love to see like a much sort of lower latency and higher bandwidth pipeline between academia and industry in this area.
And I think we're moving in that direction.
And I think TensorFlow is a great step forward.
I can't even begin to predict what the next sort of like big shockwave in the machine learning community is.
You know, there are people so much better qualified than myself to do that, that any attempt for me to do it would be insanity.
But I do think that you're going to see this, sort of, I'm making air quotes but you can't see them, the "ecosystem", you're going to see it become really big and become really diverse.
And I think sort of what we saw in the data science community, right, where we have this, like, enormous proliferation of tools, if we can keep from, like, duplication, I think we'll move so much faster than we have previously.
Yeah, it's hard to...
I don't know about concrete predictions.
We will have robot overlords by the year XYZ.
But I think it's pretty clear now to everyone
that the industry standard for X
is going to be open source.
And so I think we're hoping that it's TensorFlow. And I think we're hoping that people
are happy that it's TensorFlow because it's a good, you know, a good tool. And I do see that
happening. Well, speaking of people who have enough experience to forecast the future, our
other show in partnership with O'Reilly is from none other than Corey Doctorow, who's
keynoting the conference.
And so, Adam, we might have to ask him a similar question.
We can compare and contrast Eli's answer and Corey's answer.
And five years from now, we can decide, you know, who got it right.
I'm sure he'll have a much snappier, less rambly answer because he probably gets asked
that question all the time.
He probably does, yeah.
I think his keynote is actually on how you got here, that's actually the exact title of it. So
I'm curious, and I'm really jealous, Jared, because you're actually going to this conference
and I'm not. And it's also the first time that the Changelog is officially going international.
Our show has been international forever, basically.
But we've never actually stepped foot off of U.S. soil
to represent the changelog anywhere.
And so you get to be the cool one and do it first.
And you'll get to see this keynote firsthand
and actually get to talk to Corey face-to-face too.
But we'll have him on the show.
I think, when's he scheduled?
Do we know? Did we earmark that?
We don't have the notes, but a couple weeks out we're talking to Cory Doctorow,
probably about the larger picture of open source. You know,
he's got some really deep insights, a background from the EFF and a bunch of
stuff as an editor of Boing Boing. You know,
just a really well-rounded person.
So that might be kind of interesting. But
as we did mention, Eli, you've got the workshop Diving Into Machine Learning Through TensorFlow,
and you've also got the talk Deep Learning With TensorFlow at this upcoming OSCON in the United
Kingdom, over there in London. Once again, if you didn't hear that, the code to use is PCCL20.
Get 20% off your registration.
Tell your friends we want to meet you there.
When I say we, I mean Jared.
Bummer for me, of course.
OzCon.com slash UK is an easy way to get to the website for the upcoming OzCon.
Eli, before we close out, any closing thoughts from you on machine learning, deep learning, OSCON, your talks, your workshop, what else you want to share?
Yeah, I mean, I'll make sure to get you guys all those resources, you know, those classes.
And, you know, if you're looking for another podcast to learn about this sort of stuff, check out the GCP podcast, gcppodcast.com. But like I said before,
I think now is the best time
to start learning this stuff.
I think there's so many resources out there.
Like, don't be afraid to jump in.
I promise it's, you know, not impossible.
It's not too difficult.
Yeah. Just start learning about this stuff.
It's fun.
Cool.
And one more mention of OSCON: we actually will have a table.
We'll be in the Expo Hall.
I have no idea where that's at, but if you're going to be there, look for us somewhere in
the Expo Hall.
We'll have a table.
We'll have a big banner that says Change Log in front of it.
We'll have two microphones sitting there, so we're going to talk to attendees.
We're going to talk to some speakers.
And afterwards, we're working with O'Reilly to produce kind of a recap show. We're not sure
if it's gonna be one show or several small shows, we're not really sure, but we think
the general idea is one larger recap show at least, if not many smaller ones of different
conversations. But once again, OSCON.com slash UK, use the code PCCL20 for 20% off your registration.
See Eli, see Cory Doctorow, meet Jared, sit down and talk, all that fun stuff.
But Eli, thanks so much for this walkthrough of all this.
I know it's a deep conversation to have around machine learning and deep learning.
It's certainly not the easiest to navigate, but you did well.
And I appreciate you walking us through it. And I can see why you have the
developer programs engineer job at Google. That makes total sense. But it's time to say goodbye.
Listeners, thanks so much for tuning in. And again, if you want to go to OSCON, use our code,
get 20% off, meet us, talk to us, and hear from Eli and Corey there, as well as many others at
OSCON London.
But that's it, fellas.
So let's say goodbye on this show.
Goodbye.
Thanks, Eli.
Thanks.
Bye.