CppCast - TensorFlow
Episode Date: July 23, 2020

Rob and Jason are joined by Andrew Selle from Google. They first discuss Ranges support being added to Visual Studio, and Compiler Explorer's support for using some libraries. Then they talk to Andrew Selle from Google about machine learning with TensorFlow and TensorFlow Lite, which he was one of the initial architects for.

News
- Initial support for Ranges in MSVC
- Support for Libraries in Compiler Explorer
- CMake 3.18 Release

Links
- TensorFlow
- TensorFlow users
- TensorFlow on small and mobile devices
- Eigen library for linear algebra using expression templates
- C Bindings for TensorFlow
- AI Responsibilities

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get a one-month license
- PVS-Studio is now in Compiler Explorer!
- Free PVS-Studio for Students and Teachers
Transcript
Episode 257 of CppCast with guest Andrew Selle recorded July 23rd, 2020.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we discuss library support in Compiler Explorer.
Then we talk to Andrew Selle from Google.
Andrew talks to us about machine learning
with TensorFlow and TensorFlow Lite. Welcome to episode 257 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
Doing okay. Don't think I have too much news to share here. Very hot week here in North Carolina. You?
We've actually had a slight cool down. It's only been in the low 90s here instead of the upper 90s.
Oh, yes. It's nice and cold.
I'd like to comment that I did just now, like 10 minutes ago, make my CppCon registration, so I am planning to attend CppCon virtually. So just a reminder to our listeners that it's soon; the early bird expires sometime in August, right?
I still need to do that myself.
It's like the fifth, yeah. It's August, so it'll be like two weeks.
Two weeks, yeah. Yeah.
And it saves you a hundred dollars. It's $200 versus $300, so it's a pretty significant savings.
Yeah, definitely worth it to sign up early.
Okay, well, at the top of the episode, I'd like to read a piece of feedback. We got this email from Michael about the episode from two weeks ago with the Disney Hyperion renderer team, and he said, "This episode was great. I really enjoyed listening to Yining and David. Hearing about how Disney renders their movies was super fascinating. Keep up the good work."
And Yining, Yining Karl, was the one who actually recommended our guest for today.
Oh, that's cool.
Yeah.
We'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Andrew Selle. Andrew is a senior software engineer for TensorFlow Lite at Google and is one of its initial architects. He's also worked on improvements to the core and API of TensorFlow. Previously, he worked extensively on research and development of highly parallel numerical simulation techniques for physical phenomena, for film and physically based rendering. He worked on several Walt Disney Animation films, including Frozen and Zootopia. He holds a PhD in computer science from Stanford University.
Andrew, welcome to the show.
Well, thanks for having me.
I'm kind of curious what drove this move from animation simulation, physical simulation to TensorFlow.
Yeah, I get that question a lot.
I think if you look at the core of what you have to do to do physical simulation for film or physical simulation for physics,
it's actually a lot of similar skill sets to what you do for machine learning.
And at the same time, as I was doing things in film,
I started getting more interested in what was happening in machine learning,
and I wanted to give it a try.
Okay. Very cool.
I'm also curious what highly parallel means in your universe.
Yeah, so I think this has varied over time,
but when I got into physical simulation,
a big problem with doing things in film was that people were using single computers.
So one of the things that I worked on at Stanford
was extending physical simulation to work on MPI.
And of course, this had been done for supercomputing for a long time.
But we took sort of algorithms that had been applied to film,
and we scaled them.
So we did the first simulation of clothing
that was like a million triangles,
which most people were simulating around 10,000 at the time.
This was a while ago.
So people have gotten to these levels as just
bread and butter types of scale. But since then, you know, I've done a lot of GPU stuff. I've done
a lot of distributed parallelism and just a lot of micro optimization on single core as well.
I distinctly remember the first time I saw a clothing or a cloth simulation, physical simulation. I was at SIGGRAPH and I was in
high school. It was in like 1994 or something like that. And it seemed like complete magic at the
time. And it was like, they had a single piece of fabric that they were able to simulate, you know,
like things have come in a long way since then. Yeah. I mean, it turns out that the biggest problem for clothing is the collision detection.
So when you have a cloth fold over itself, you have to make sure it doesn't fly through itself.
And that's where sort of trivial parallelism where you say, oh, this part of the cloth is not anywhere near this other cloth starts to fail.
And you start to have this problem of like an N squared possible interactions where any point of the cloth
can contact any other point. And you have to check all of those in an efficient way.
So you do actually have to check them all, or is there a shortcut?
Well, I mean, you use spatial structures to accelerate that. One of the most common ones is a bounding box hierarchy, where you say, well, this triangle is within this bounding box, and this neighbor of it is within this bounding box. If we put those together, then we can say there's a bounding box that contains both of them, and you continue with that and create a hierarchy of bounding boxes. Then it reduces, you know, not the worst-case scenario, but the average-case scenario, so that it's tractable. I mean, in the worst case, you could have all the clothing compressed into a single tiny box, and that would be very difficult to solve, right?
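To make that concrete, here is a rough sketch of the broad-phase idea Andrew is describing: axis-aligned bounding boxes merged into a hierarchy, with a recursive test that prunes any pair of subtrees whose boxes don't overlap. The structure and names are illustrative only, not code from any production cloth solver, and building the hierarchy itself is elided.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Axis-aligned bounding box with an overlap test and a merge helper used
// when building the hierarchy bottom-up.
struct Aabb {
    float min[3], max[3];
    bool overlaps(const Aabb& o) const {
        for (int i = 0; i < 3; ++i)
            if (max[i] < o.min[i] || o.max[i] < min[i]) return false;
        return true;
    }
    static Aabb merge(const Aabb& a, const Aabb& b) {
        Aabb r;
        for (int i = 0; i < 3; ++i) {
            r.min[i] = std::min(a.min[i], b.min[i]);
            r.max[i] = std::max(a.max[i], b.max[i]);
        }
        return r;
    }
};

struct BvhNode {
    Aabb box;
    int triangle = -1;                     // leaf: index of a triangle
    std::unique_ptr<BvhNode> left, right;  // internal node: two children
};

// Collect candidate triangle pairs whose boxes overlap; everything else is
// culled without an exact (and much more expensive) triangle-triangle test.
void collectCandidates(const BvhNode& a, const BvhNode& b,
                       std::vector<std::pair<int, int>>& out) {
    if (!a.box.overlaps(b.box)) return;          // prune whole subtrees
    if (a.triangle >= 0 && b.triangle >= 0) {    // two leaves: record the pair
        out.emplace_back(a.triangle, b.triangle);
        return;
    }
    if (a.triangle < 0) {                        // descend into a's children
        collectCandidates(*a.left, b, out);
        collectCandidates(*a.right, b, out);
    } else {                                     // descend into b's children
        collectCandidates(a, *b.left, out);
        collectCandidates(a, *b.right, out);
    }
}
```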
Cool. Okay, well, Andrew, we're going to have a couple of news articles to discuss. Feel free to comment on any of these, and then we'll start talking more about TensorFlow. Okay?
Yeah, awesome.
All right, so this first one we have is an article on the Visual C++ blog, and it's that initial support for C++20 ranges is now available in the latest version, Visual Studio 2019 version 16.6. These are the first user-visible pieces of ranges support. Apparently they've been working on it kind of under the covers for a while, but now you can actually go and test it out, kick the tires.
I feel like it's more significant than that as well. It's not just ranges; it's ranges built on concepts.
Yeah. So for a long time, the ranges implementations that we've had have been built with concepts emulation, and now we're just about there; we can actually see this stuff coming together. And concepts, it looks like, has been in there for the past three dot releases.
So now they're using ranges with full concepts.
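For anyone who hasn't tried it yet, the kind of code this unlocks looks roughly like the following; it's a generic C++20 ranges snippet, not something specific to the MSVC implementation being discussed.

```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    // Lazily composed view pipeline; the range adaptors are constrained by
    // standard concepts (e.g. std::ranges::viewable_range) under the hood.
    auto evensSquared = v | std::views::filter([](int i) { return i % 2 == 0; })
                          | std::views::transform([](int i) { return i * i; });
    for (int x : evensSquared) std::cout << x << ' ';  // prints: 4 16 36
}
```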
Yeah, good stuff.
You had a chance to play around with ranges at all, Andy?
Unfortunately, no.
A lot of times we've been limited to C++14 in TensorFlow.
Oh, okay.
Well, there is a request here before we move on: please go try it out, kick the tires, submit bug reports. And a reminder that all this stuff is being developed on GitHub now.
The STL implementation for Microsoft is all there.
So check it out if you can, submit bug reports, whatever.
Yeah, definitely.
Okay, next thing we have is some new features
in Compiler Explorer.
Do you know how long Matt's been working on these, Jason?
Being able to link libraries is the new feature.
Well, I think, I don't know how long they've been working on it.
Okay.
Although I know that Matt would like us to point out
that this was not just Matt's work.
I don't know how much involvement he had in it.
It is quite the team that helps him with Compiler Explorer at this point.
So, but yeah, I think it's awesome because there's been a few times
when I've tried to use libformat in Compiler Explorer.
It's just unusable because you have to use the header-only version.
It takes forever to compile, whatever.
And so now they have libraries that you can link to.
Do they have a full list of libraries here?
It looks like they're listing a couple of the unit testing libraries,
but I'm not sure what other ones are available.
No, libformat's the only other one that I know specifically is called out,
plus the unit test ones.
I haven't looked myself, actually.
I may have just found it.
Yeah, Google Benchmark, Intel TBB, fmt, cross cables.
I'm not sure what that is.
Catch-2, DLib, Google Test.
That's fascinating, with Google Benchmark also being supported, because there's Quick Bench/Compiler Explorer integration, if you haven't seen that. So if you're in Compiler Explorer, you can hit a button and go to Quick Bench, and it'll just copy the code over, and from Quick Bench you can go back to Compiler Explorer. And having that library now shared between them means you can get a more complete integration between the two, I'm guessing. I haven't tried it yet, though.
Okay, and then the last thing we have is the CMake 3.18 release is out. Anything worth pointing out with this, Jason?
It's a big release. I'll ask Andy, did you read this? Do you have any interest in CMake?
I've used CMake a lot in the past, but I haven't recently. TensorFlow was using CMake
for Windows support before Bazel supported Windows. But now I think we're not using it
that much. But yeah, I enjoy using CMake for small projects. And the cross platform
support brings me back there when I need to do that often. Yeah, absolutely. So yeah,
there's nothing specific that I feel like
I need to call out on here. There's just so many little changes. And it's clear that... oh, no, wait, no, there is one thing I'll call out: profiling support. So if your CMake project currently
takes forever to configure and generate, you can run the profiler, figure out where it's spending
its time. Very cool. I need to try that. All right. Well,
Andy, we've had a couple people recently ask us to do an episode on TensorFlow. So could we maybe
start off by just letting you explain what exactly TensorFlow is for listeners who have
never worked with it, which I think includes both me and Jason. Sure. I mean, TensorFlow is a big project and it offers a lot of functionality, but the main
core idea behind it is that it's an open source library that lets you develop and train
machine learning models.
And the basis behind it was the idea that you can start from research and a researcher
can create a model and you can take it all the way to production and deployment on a wide variety of devices, including smaller devices like
mobile devices.
So given that scope, it has a lot of features and I'm not even an expert on a large majority
of them.
Okay.
That's interesting that it has mobile applications.
What does that look like to be doing machine learning on a phone?
I'm curious.
So on the phone, you typically do machine inference.
So that's the other half of it.
So once you train a machine learning model using data, you often want to use it to integrate into applications.
So a lot of the work has been doing inference on a mobile device where you carry the weights that describe the function and the program that describes the function,
and then you can evaluate it on whatever inputs there are.
We can get more into TensorFlow Lite,
which is the mobile product for part of TensorFlow in a bit.
Okay, so you've already used words like weights and models,
and for those of us who don't use it,
I would love the explain like I'm five
description of what machine learning is. Why are we training models? What are we doing with them?
How does this work? Yeah, definitely. So traditional machine learning is about creating a
function. And it's, you know, it processes some inputs, and it produces some outputs. How do you
create that function? Without machine learning, we have a way of doing that algorithmically. So we say, I want to make a function that computes the
sum of two numbers. I can do that. I know how to write an algorithm to do that. It's fairly
mathematical. The big difference in machine learning is that you use data. And how you use
data can vary a lot of different ways. But if you kind of think of the simplest thing you could do with machine learning,
like the simplest form of machine learning would be something like a linear regression.
So you have a bunch of XY data, and you want to find the line that most closely matches it.
So you think about that function, and the way you describe a line is basically a slope and an intercept, or two points. There are many ways to describe a line, but the idea is that given that
data, what is the best line that fits that data? And best can be described in many different ways.
Is it, you know, the distance, the perpendicular distance, the L2 norm, et cetera?
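As a concrete version of that simplest case, here is a small, self-contained sketch of an ordinary least-squares fit, choosing the line that minimizes the squared vertical distance (one of the "best" criteria mentioned above). It is purely illustrative and not TensorFlow code.

```cpp
#include <cstddef>
#include <vector>

// Closed-form least-squares fit of y = slope * x + intercept, minimizing the
// sum of squared vertical distances to the data points.
struct Line { double slope, intercept; };

Line fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // assumes the x values are not all equal
    Line l;
    l.slope = (n * sxy - sx * sy) / denom;
    l.intercept = (sy - l.slope * sx) / n;
    return l;
}
```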
But without getting into the details of like how you describe these
kind of error functions, let's just imagine something like, I want to determine red marbles from white marbles. How do I do that in real life? I might lay them on the floor, and they're all mixed up. And then I might want to say, well, which ones are red, and which ones are green, which ones are yellow, etc. I might start pushing them into piles, right?
And by pushing them into piles, it becomes much more obvious how many there are.
And I can quickly draw a line between them.
And that's kind of what machine learning tries to do.
It tries to warp the data in a way through a function so that these kind of decisions become obvious by some point in the function.
Is this sort of making some sense?
Shall I be a little bit more concrete?
Sure.
So one type of problem that is usually used for machine learning is the idea of classification.
So if we have an image and we want to put it through a function,
that function could maybe describe what that image is in terms of like a category. So if I have a classifier that determines what kind of dogs there are, if I have a bunch of
different dogs, you know, one of them might be a Boston Terrier, one of them might be a German
Shepherd. And I want to give that image and output sort of a number which represents a class or a
string which represents a class. Machine learning is about how I create that function.
Okay.
Yeah.
So the model is the function effectively?
Exactly.
Okay.
And this is not that different from traditional modeling.
So if I want to, if I drop a ball from a height
and I want to compute what its velocity is and what its position is, I might develop a model.
And that's a physics model.
That's a Newtonian physics model.
And I might do that empirically.
I might do that.
I might try to come up with a mathematical function that measures it really accurately.
And that's sort of what Newton did.
The problem with that approach is that it works really well for simple phenomena. But if you have complicated things, like this example of identifying different dogs in images,
it becomes much harder to do that algorithmically.
So the idea with machine learning is that if you can create a class of functions,
a model architecture that is perhaps really complicated,
can we make a way of getting the particular parameters of that model
to do a good job.
Okay.
Yeah.
So back to my linear example, if I have a particular line, and then I look at my data,
I can measure how good my line is doing against that data.
And if it's doing badly, can I improve it?
The way I might improve it is I push that line until it's at the right angle, and I push that line up and down until it matches the right bias. And that's what's happening in machine learning, but at a higher
dimensionality using much more complicated, higher matrix order things and nonlinearities.
If you kind of make that function more and more complicated, it's harder to intuitively see it,
but it's basically doing the same thing.
So with all of these machine learning algorithms, I mean, do you specify when you're training a model what dimensionality you want it to try to fit to the data? Or does it just do whatever it does?
Yeah, exactly. In
traditional machine learning, you would specify the model architecture. So you would say, okay,
if I want a linear fit to
something, I would choose how many dimensions in and how many dimensions out. I would also choose,
so like that would correspond to the image size in our example, and would correspond to how many
classifications. So concretely, if I'm just going to do a simple matrix as my model, and I say the image is, you know, 256 by 256, then I'm going to have, you
know, 256 squared elements on the rows of my matrix or the columns of my matrix. And then on
the other dimension of my matrix, I'll have the number of classifications, and my
desired output for that function might be what's called a one hot vector. So if I give it a Boston Terrier, and Boston Terrier is code three,
then I would get zero, zero, one, zero, all zeros. And if I say my German Shepherd is class zero,
then I would get one followed by all zeros. So that's one possible encoding.
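For reference, a one-hot encoding like the one Andrew describes is just a vector of zeros with a single one at the class index; a tiny sketch (with hypothetical class counts) might look like this:

```cpp
#include <vector>

// One-hot encoding: class k out of numClasses becomes a vector of zeros
// with a single 1.0 at position k.
std::vector<float> oneHot(int classIndex, int numClasses) {
    std::vector<float> v(numClasses, 0.0f);
    v[classIndex] = 1.0f;
    return v;
}
// e.g. oneHot(3, 8) -> {0, 0, 0, 1, 0, 0, 0, 0}
```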
Do you have to tell how many classifications there are up front?
Yeah, in this type of representation, you would.
Okay. Okay. Okay. Could you maybe talk a little bit more about how models are trained specifically with TensorFlow? Yeah. So as I said, TensorFlow is a framework to help you train ML models. And
there's a couple of things that come up over and over when you try to train a model. One is how do
you get data? How do you represent data? And how do you shove it through the system efficiently?
The second one is, how do you describe models
and how do you specify them?
So TensorFlow provides a library, tf.data,
which is a way of sending data within it.
The second thing it provides
is a way of specifying model architecture.
So there's a lot of conventions
that have come across from successful research
projects, like fully connected, which would be basically a matrix multiply layer. And then on
top of it, you might stack another thing, which is maybe a convolutional thing, which knows how to do
things like blurs and edge detections. And all these kind of layers can be accumulated in the
library and create a potential
model architecture. Then the next phase is when you actually start training. To actually start
training, you need to go from a particular set of parameters that perhaps were randomly initialized
that don't do a good job at all, and you try to perturb them until they're good quality. And the
way you do that is using gradients and differentiation. So if you
evaluate your function on a set of data, that gives you an output. From that output, you can
also compute what is the derivative with respect to all the variables, all the parameters in the
model. And that will give you sort of a perturbation that you can apply to all those variables that
will make it better, that will make the error less. Okay. And you keep doing that over and over again,
running your same data over and over again, against your model, computing these small
perturbations to the model using your gradient function. So TensorFlow is helping you define
that architecture. In terms of the gradients, it's also doing automatic differentiation, which basically allows you to specify your model architecture in a straightforward way and then compute the gradients automatically.
So you don't have to apply the chain rule and you know,
all the differentiation rules that everybody's forgotten from calculus.
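Sticking with the earlier line-fitting example, the training loop Andrew describes might look roughly like this when written by hand: evaluate the model, compute the derivative of the squared error with respect to each parameter, and nudge the parameters a little in the opposite direction. In a real TensorFlow program the derivative step is what automatic differentiation does for you; the hand-written gradient here is only feasible because this model has two parameters.

```cpp
#include <cstddef>
#include <vector>

// Gradient descent on mean squared error for y = slope * x + intercept.
void trainLine(const std::vector<double>& x, const std::vector<double>& y,
               double& slope, double& intercept,
               double learningRate = 0.01, int steps = 1000) {
    const std::size_t n = x.size();
    for (int s = 0; s < steps; ++s) {
        double gSlope = 0, gIntercept = 0;
        for (std::size_t i = 0; i < n; ++i) {
            const double err = (slope * x[i] + intercept) - y[i];
            gSlope += 2.0 * err * x[i] / n;  // d(mean err^2) / d slope
            gIntercept += 2.0 * err / n;     // d(mean err^2) / d intercept
        }
        slope -= learningRate * gSlope;      // the small "perturbation"
        intercept -= learningRate * gIntercept;
    }
}
```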
All right.
So I want to, let's see, maybe focus on something that might be appropriate to focus on.
But when you're saying you can run the input
through convolution matrix or whatever,
edge detection or something like that,
it kind of sounded to me like that means
for something like this dog classification example,
you're not necessarily just training one model.
You're not saying,
here's an image, now give me the output, do you then split it and like, say, okay, well,
I'm going to wait, I'm going to train a model that's based on edge detection, and I'm going to
train a model that's based on colors, and I'm going to train a model that's on something else,
and then have those things work collaboratively? Or is that all part of one process? So essentially, you're trying to create a huge composite function. And that
composite function can have multiple stages. And I think that's what you're sort of feeling;
your intuition is basically correct. So I was using the example of a matrix or the example of
a line, which is also a matrix. Those are simple linear models, those aren't that powerful. And it
turns out there's a, you know, a hard limit to what you can do with them. So the way people solve that is they add
multiple layers that do more steps, maybe just more steps of linear. And between those
linear layers, there's also nonlinear functions. And all these kind of functions allow you to have
more resolution power and represent more complexity. The problem with that
is that it gets harder to reason about, and it gets harder to train when you get deeper. So the
idea of deep learning, which came out a number of years ago, and that's one of the major drives of
AI recently, is that you can now tractably train those multi-layer models. Because traditionally,
you were not able to do that. It was not tractable at all. You didn't have enough
computation power, and you didn't have any way of dealing with the numerical issues that occur
when you do gradients across multiple layers. You can imagine, if I want to, you think about
the butterfly effect, right? I perturb some air somewhere. Does it cause a tsunami somewhere?
This is kind of what happens when you go through many, many layers of a machine learning model.
You try to figure out what the causality of a particular output is, and it becomes harder.
And that's called the vanishing gradient problem.
It turns out there were techniques like dropout and data augmentation that helped make it so that these types of problems were tractable.
And that's what allowed deep learning.
And so deep learning allows you to make multiple layers.
So when you were talking about the edge detection, what happens in a deep learning image model is that you have something that's very close to the image, which is computing very
low-level features like edges, like blurs.
And then the layers subsequent to that create higher-level features like maybe course shapes,
course orientations.
And as it goes down and down in the model, it gets higher and higher-level features until
it can successfully
do a classification problem. Okay. Okay. Today's sponsor is the PVS Studio team. The company
develops the PVS Studio Static Code Analyzer designed to detect errors in the code of programs
written in C, C++, C Sharp, and Java. Recently, the team has released a new analyzer version.
In addition to working under Windows, the C Sharp part of the analyzer can also operate under Linux
and Mac OS. However, for C++
programmers, it will be much more interesting to find
out that now you can experiment with the
analyzer in online mode on the godbolt.org
website. The project is called
Compiler Explorer and lets you easily try
various compilers and code analyzers.
This is an indispensable tool for
studying the capabilities of compilers.
Besides that, it's a handy assistant when it comes to demonstration of code examples.
You'll find all the links in the description of this episode.
I want to see if we can get more into the C++ being used for this.
But before we do that, could you tell us a little bit about TensorFlow Lite,
which in your bio you mentioned that you were one of the architects for?
Yeah, so TensorFlow is aimed at training ML models
and aimed at deploying them on the server for serving.
So if you wanted to do inference over many users
that were hitting the same server at once,
TensorFlow has been used for that.
There's a library, TensorFlow Serving, that does that.
But what emerged as we started deploying ML models on device is that the overheads were high in TensorFlow, which was okay for a server-based language because you basically have many inferences happening at once.
You have many pieces of data coming at once. So any sort of interpreter inefficiencies were not such a big deal
because they were amortized over really large amounts of data.
At the same time, there were other constraints like binary size
that became really important on mobile devices.
So an app developer doesn't want to have a huge binary
that they have to ship around.
So TensorFlow Lite had the goal to make the overhead of
individual operations be much smaller. It had the goal of having a very small binary size.
When we first shipped, we were about 100 kilobytes for the interpreter.
Wow.
And it also had the goal of basically having a very low latency to startup.
So you can imagine that there's a lot of kind of algorithms that you can use.
And if you have a lot of a big binary size, it's not such a big deal to load that huge VM image if you're not going to use it all the time, even if you're initializing large
parts of it.
If you're going to run for like 10 days doing a machine learning training, it doesn't matter
that it takes 10 seconds to load or whatever. But on a mobile device, when somebody's starting an interaction
with their app, and they want to get the result in like two seconds, then you want to minimize
the latency. So in TF Lite, we focused on those design constraints, and we made a like a subset
of features that would work well on mobile. And then we made a way in which you can take models from TensorFlow
and put them into TensorFlow Lite
so that you have a continuous authoring process.
So can TensorFlow Lite do the learning also,
or just the inference, the running of a model,
if I'm getting those terms right?
So it doesn't support the learning as a first-class citizen.
There are ways to do the gradient propagation manually. In fact, I think we have a blog post or an instruction on how to do that. But that's typically not done as commonly, though, as mobile devices are getting more powerful, and people want to do kind of more adaptive algorithms, it does occur. There are certain types of applications that are deployed
that do training, but it's not the most common path right now. Okay, so I'm curious what the
model actually looks like. What is this thing that you generate and then hand off to your mobile
device?
Yeah, so as we talked about, there are these different layers, and each layer might have something like, if it's a convolutional model, a set of filters. So you don't put a layer in and say it's an edge-detect model; you say it's a convolutional model, and you sort of learn the filters so that that layer can be an edge detection, it could be a blur, it could be some other swizzling of data.
Okay.
So those are called the weights.
So when you put a model on a device, when you serialize it,
you remember the weights and you also remember the topology.
So in TensorFlow Lite, we have a FlatBuffer, which is memory-mappable, that contains the weights and the topology.
Okay.
And so it has essentially a graph, a directed graph, of these layers, or we call them ops, I guess. And it has the weights associated with them.
So it can then run.
All right.
Thanks.
Okay.
I know TensorFlow has multiple bindings.
Are developers or scientists who use TensorFlow using the C++?
Or are they mostly using Python bindings?
How do they usually interact with it?
Yeah.
So I guess I haven't mentioned so far that TensorFlow is often interacted with through Python, but it's written in C++ and in Python.
Most researchers that are training models use Python.
Okay.
So that's how they interact with it.
Most people that are deploying onto mobile devices
are using a different language.
So if you're using Android,
you would probably be using Kotlin or Java.
If you're using iOS,
you might be using Objective-C or Swift.
And both of those,
you could write a library in C++ and use that.
So in terms of the bindings, TensorFlow has C bindings, which are kind of the way in which you can call TensorFlow. So you can load a model, you can run inference on a model using the C API.
And similarly, TF Lite has a C API, which you can invoke. So if you want to bind to Java, you would use JNI to write a binding layer
that connects to TensorFlow Lite, for example.
If you are going to use TensorFlow
or TensorFlow Lite from Objective-C,
well, Objective-C can use C++,
so you could just do it directly with no bindings.
Right.
Yeah.
So I think most bindings are written by hand
using the C layer.
And due to ABI compatibility, it's usually required that you write things in a C lowest
common denominator way to make them compatible across multiple compiler versions.
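As a rough picture of what that C-level surface looks like in practice, here is a minimal sketch of loading a TensorFlow Lite model and running inference through the C API: opaque handles, paired create/delete free functions, and explicit copy-in/copy-out of tensor buffers. The function names come from the public c_api.h header, but treat the exact signatures as something to verify against the version you build against rather than as details quoted from the episode.

```cpp
#include <vector>

#include "tensorflow/lite/c/c_api.h"  // the C-ABI surface discussed above

// Run one inference through the TF Lite C API. Error handling is omitted
// for brevity; each call below returns a status or pointer worth checking.
std::vector<float> runInference(const char* modelPath,
                                const std::vector<float>& input,
                                int outputSize) {
    TfLiteModel* model = TfLiteModelCreateFromFile(modelPath);
    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

    TfLiteInterpreterAllocateTensors(interpreter);

    // Copy the input into the interpreter's input tensor.
    TfLiteTensor* in = TfLiteInterpreterGetInputTensor(interpreter, 0);
    TfLiteTensorCopyFromBuffer(in, input.data(), input.size() * sizeof(float));

    TfLiteInterpreterInvoke(interpreter);

    // Copy the output tensor back out.
    const TfLiteTensor* out = TfLiteInterpreterGetOutputTensor(interpreter, 0);
    std::vector<float> result(outputSize);
    TfLiteTensorCopyToBuffer(out, result.data(), result.size() * sizeof(float));

    // Opaque handles are released with their matching free functions.
    TfLiteInterpreterDelete(interpreter);
    TfLiteInterpreterOptionsDelete(options);
    TfLiteModelDelete(model);
    return result;
}
```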
ABI comes up again, our nemesis.
We've had lots of conversations on the show about breaking ABI compatibility across C++ versions, so it's been a... yeah.
So how much effort is it? I'm trying to imagine how high-level, how small, what kind of a footprint do you try to keep with the C binding layer to make sure that all this is maintainable? Like, what is the surface area of C? How big should you make it?
Yeah, basically. Because anytime I've done language bindings to another language,
I use SWIG, which parses my C++ header files, and it does the work for me.
I've never personally done this official C binding
and then let people use that kind of thing.
So I'm kind of curious what that ends up looking like in your world.
Yeah.
So I think if you look at machine learning inference or even training,
typically you interact with a small surface of the interface.
If you're not involved in authoring individual nodes,
you really don't need to do anything except send inputs in and get outputs out.
So it's not as big of a surface as the whole thing.
And that means that for a long time,
the Python bindings were kind of a special case in TensorFlow.
The other thing that I would say is that even for that small surface area,
it's often useful to make bindings idiomatic.
So going back to your Swig example,
I've seen a lot of libraries, or a lot of applications, that have bound with SWIG, and they bound their whole C++ API. And that turns out to be really good if you're
kind of a C++ developer, and you want to prototype C++ things in Python. So I think Maya had a good,
like very direct API for doing this, which is a 3D animation tool. And I used it extensively.
And I've also done this in my own projects. But if you're trying to make something
that's idiomatic Python or idiomatic Swift
or idiomatic Objective-C,
you tend to want to rewrite the bindings
manually.
So you want to use the language features
and the language idioms
that are considered good for that particular language.
And that's why a lot of people handwrite their bindings.
Okay.
So the footprint, you're saying, is pretty low. It basically just comes down to loading a model and executing the model, as far as TensorFlow Lite goes.
Yeah, I mean, I believe there are ways to construct the model; I haven't looked at what the current situation of it is. But in terms of what most people use from C, it is to run inference, or to run a training loop, possibly.
In terms of our Python bindings, we do actually use a wrapper generator.
We just don't wrap the entire C++ library.
We wrap a smaller interface.
Oh, that's what you were saying before, that it was a special case thing.
Yeah.
And traditionally, we used Swig as well.
More recently, we switched to pybind11 for the TensorFlow library. And pybind11 is a library created by a graphics researcher who does a lot of rendering work, Wenzel Jakob. And he has a lot of other interesting C++ things; you should look into him as a future guest, I would say. He's done a lot of things on machine learning combined with graphics, which is really interesting, like differentiable rendering.
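For a flavor of what a hand-written, idiomatic binding looks like with pybind11, here is a minimal, generic example; it is standard pybind11 usage rather than TensorFlow's actual binding code.

```cpp
#include <pybind11/pybind11.h>

// A tiny C++ function exposed to Python through a hand-written binding.
int add(int a, int b) { return a + b; }

PYBIND11_MODULE(example, m) {
    m.doc() = "minimal example module";
    m.def("add", &add, "Add two integers");
}
```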
That's cool. So now I'm also curious, since you're talking about passing large chunks of data around through the C API, how do you deal with object lifetime? Because the C++ side of things is going to have some notion of ownership, and on the C side, do you have to do the typical thing where you create an object and then ask the binding to destroy it for you, or...?
Yeah, I mean, typically you'll have a free function associated with an opaque handle in the C API for TensorFlow.
Okay.
In terms of what we do in TensorFlow Lite, a lot of times we're memory-mapping the model, in which case we assume all those weights have an infinite lifetime.
And we try to not copy any of the really big data.
We only copy and create internal representations of the topology of the graph.
Just load it from the memory map whenever you need it.
Yeah, exactly. And in fact, one of the big differences
with what we did in TensorFlow and TensorFlow Lite
is how we dealt with memory allocation in general.
So TensorFlow for its traditional runtime
uses kind of a reference counted tensor handle.
So there's like a buffer and then tensors are sort of copy,
not quite copy on write, but that same kind of feeling, which is that you have these handles and multiple reference counted views of them to emulate kind of a value semantic situation.
In TF Lite, we do ahead-of-time memory planning, where we try to create an arena that's the whole memory that you would need. So it's kind of like an uber activation frame of what's needed for the model, where we can overlap different parts of the computation that are used at different times, to get minimal overhead in terms of memory allocation, or at least smaller, maybe not minimal.
Yeah, minimal becomes very tricky if that's truly your goal.
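A toy version of that ahead-of-time planning idea might look like the following: each tensor has a size and a first-use/last-use interval in the execution order, and tensors whose lifetimes don't overlap can share the same region of one big arena. This greedy first-fit sketch is only meant to illustrate the concept; TensorFlow Lite's real planner is more involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct TensorLifetime {
    std::size_t size;
    int firstUse, lastUse;   // inclusive interval in the execution order
    std::size_t offset = 0;  // filled in by the planner
};

// Assign each tensor an offset in a shared arena and return the arena size.
std::size_t planArena(std::vector<TensorLifetime>& tensors) {
    std::vector<const TensorLifetime*> placed;
    std::size_t arenaSize = 0;
    for (auto& t : tensors) {
        std::size_t offset = 0;
        bool moved = true;
        while (moved) {  // greedy first-fit: bump past conflicts until stable
            moved = false;
            for (const auto* p : placed) {
                const bool liveTogether =
                    t.firstUse <= p->lastUse && p->firstUse <= t.lastUse;
                const bool overlapsInArena =
                    offset < p->offset + p->size && p->offset < offset + t.size;
                if (liveTogether && overlapsInArena) {
                    offset = p->offset + p->size;  // move past the conflict
                    moved = true;
                }
            }
        }
        t.offset = offset;
        placed.push_back(&t);
        arenaSize = std::max(arenaSize, offset + t.size);
    }
    return arenaSize;  // total bytes needed for the shared arena
}
```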
So is TensorFlow Lite written as a separate library, or is it like a pared-down, ifdef'd-out version of TensorFlow?
Yeah, it's a separate library right now.
But does it share a number of the same operations? I think you said it shares similar operations.
There's some differences in operations. It defines some of its operations as fused versions. So, you know, we were talking about convolution. And it turns out that after a convolution, or before it, depending on how
you look at it, you often do a bias, which is just adding a vector to everything. And then you do some
nonlinear thing. And if you kind of imagine doing those all at once
while you're loading a single element,
or you do the bias and the activation,
which are kind of like a nonlinear function,
then you have reduced memory bandwidth,
like typical kernel fusion type strategy.
So TF Lite defines some of the key operations, like convolution plus bias-add, as fused operations to get higher performance.
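Sketching the fusion idea in isolation: instead of writing the convolution result to memory, reading it back to add the bias, and reading it again for the activation, the bias add and the nonlinearity happen while the accumulated value is still hot. The convolution itself is elided here; this is just an illustrative tail of a fused kernel, not TensorFlow Lite's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bias add and ReLU applied in one pass over each accumulated output value.
inline float fusedBiasRelu(float accumulator, float bias) {
    return std::max(accumulator + bias, 0.0f);
}

// Finalize all output accumulators; in a fused kernel this would happen as
// each value is produced, rather than as a separate pass like here.
void finalizeOutputs(std::vector<float>& accumulators,
                     const std::vector<float>& biases, std::size_t channels) {
    for (std::size_t i = 0; i < accumulators.size(); ++i) {
        accumulators[i] = fusedBiasRelu(accumulators[i], biases[i % channels]);
    }
}
```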
There's different implementations. A lot of the TF Lite implementations are ARM-optimized by hand using assembly, and we use another Google open source library called ruy for that,
which does basically fast quantized and floating point matrix multiplication,
which is a primitive that's used by a lot of these types of machine learning operations.
And you mentioned before that you are limited to C++14 with TensorFlow Lite.
Yeah. So, I mean, there's kind of a long story.
If you want to support software on a wide variety of machines, it tends to be,
you get a lot of complaints if you go too new, too fast. The other thing that's interesting
on the TensorFlow Lite side, which I'm much more familiar with, admittedly, is that we have a lot of
people that are trying to take TensorFlow Lite and take it to really small devices like
microcontrollers. And while microcontrollers have gotten to be a lot better
at handling new compiler toolchains,
since ARM is so ubiquitous and RISC-V as well coming up,
that means that you can use C++,
whereas traditional embedded developers would never touch C++.
They're still sometimes a little bit behind.
And it's sort of like sometimes the chips that you might choose,
you choose for hardware reasons,
and their tool chain might not be as advanced,
even though there are definitely microcontrollers
that make every version of C++ available.
That's cool, yeah.
On one hand, it's a shame that you can't move beyond C++14,
but also really cool that you're able to at least use C++14 on these devices.
Yeah, I mean, I see a lot of possible benefits of the C++17 and C++20 features.
I think if you look at TF data and some of the streaming operations that we do,
they look very similar to coroutines and using coroutines
directly might be really interesting. And then lots of other creature comforts.
So we've talked a lot about training these models and deploying them and TensorFlow Lite and small
devices and everything. And then now I'm like, well, what are people actually using TensorFlow
Lite for? What kinds of models are they executing on handheld
devices or microcontrollers? Yeah, I mean, so like, if you look at TensorFlow Lite, it's been
deployed on, like, over 4 billion devices.
And that's a lot.
Yeah. So it's been used in a lot of Google's core applications; you can imagine what types of things it might be used for.
If you look at an app like Google Photos and you type in something into the Google Photos search,
you can say, I want to find a flowerpot, and it will show me the flowerpots.
And that's basically an image classification.
And that's a model that's been trained.
There's some parts of that that can run on device,
some parts of that that run on server.
If you look at other models, like speech recognition is another big one.
There's been a lot of work on speech recognition,
and that's enabled such devices like Google Home,
where you can talk to it instead of having to interact with a traditional
input device. You can also do that on your phone. And then there's, like, again, some of that can
be done on device and some of that can be done on server. And it turns out that as mobile devices
get better and better, you can do more and more on the mobile device and have to rely on the server
less. Okay. It's fascinating just thinking about
in my personal career, how many times we've gone to, you know, everything on the server,
know everything local and everything on the server. Okay. Now let's find the balance that
makes sense for everyone. So it's interesting. Yeah. I mean, I think it's going to be a constant
push and pull and there's always trade-offs. And this also happened in graphics,
where you think about what you can render on a GPU versus what you can render on a CPU,
or what you can render in 30 frames per second
versus what you can render overnight.
And you're always going to be pushing both of those at the same time.
Another application that is really interesting
that we demoed at Google I.O. that I worked on was the idea of using pose.
Like if you point a camera at someone, you can actually figure out what kind of orientation all their limbs are in.
And you can do that and you can use that to kind of teach them things.
So in that case, we took a dance instructor and showed people how to do dance moves, and then we slowed it down.
And then we used that pose match to tell them when they're doing a good job and gave them a score and allowed them to improve and gave them feedback on what they're doing.
So a lot of different kind of applications that people have done with very specialized equipment like motion capture, like specialized cameras, now can be done with regular cameras on device.
And that's a really exciting thing because it's going to just make it more ubiquitous. All right. That's neat.
So has there been any, uh, I'm thinking about, you know, your desire to run these things on small,
fast, you know, quickly on small devices and whatever. Has anyone done any work on like
actually compiling the model itself?
Yeah, so this is kind of an interesting question.
There's been a lot of work on compilers for machine learning.
So part of TensorFlow, like released with TensorFlow is XLA,
which is actually a compiler.
And it takes as an input HLO, which is an instruction set,
which is basically linear algebra operations.
So you can actually take TensorFlow operations,
and they can be lowered to this dialect,
and then they can be compiled to CPU to GPU.
And this is actually how Google's TPU is actually fed.
You make programs into these XLA programs.
And the way this works in TensorFlow...
Yeah, what is it?
Processing unit or something?
Yeah, the TPU.
Yeah, Tensor Processing Unit.
Tensor Processing, okay, sorry.
Yeah, so essentially, yeah,
so that gives you the ability to compile things.
And a lot of the benefit that you can get from that
is kernel fusion,
like we had talked about with TensorFlow Lite.
You can imagine doing that on the fly.
You can imagine doing this with,
basically XLA is integrated as a JIT.
So you can basically tell TensorFlow to compile this thing.
And then that becomes a new function
that actually goes to a compiled version of it.
That's cool.
There's also another framework
that's been created called MLIR,
which was started at Google by Chris Lattner.
And it is essentially a framework for creating multiple levels of representation of IRs.
So you can build compilers for a wide variety of things.
And the observation there is that you might have different sorts of dialects or different IRs that are useful at different times. So you could imagine XLA being a dialect within it,
or you could imagine TensorFlow being a dialect within it. And in fact, the converter from TensorFlow to TensorFlow Lite is actually implemented using MLIR technology. So there's a lot of, you know, exciting things happening with compilation.
What traditional TensorFlow did for its implementation is it used a lot of Eigen. So this is kind of using C++ as a compiler for an EDSL, essentially. So a lot of the operations in TensorFlow were implemented in terms of Eigen. So at that stage, you can imagine, you asked if there's, like, heavy templatization of TensorFlow, you might ask that. And mostly it's at the Eigen stage.
But if you look at some of the operations, they're often templatized in their kernel implementations in terms of dimension and in terms of type.
So you've got a specialized version for type and dimension. And Eigen can sometimes do better with its packetization, where it's mapping these high
level operations that are linear algebra or tensor operations into packetized SSE forms
that work better on CPU.
All right.
That's pretty awesome.
Yeah.
I'm sorry.
Could you go over what Eigen is?
Because it sounds familiar, but I don't think it's something we've talked about in quite a while.
Yeah, so Eigen is a library that was created several years ago now to handle linear algebra.
So it's aimed to be a C++-oriented linear algebra library.
And it uses expression templates to sort of describe repetitive operations. So if you're doing, like, a bunch of operations on small matrices, you'll get a specialized small-matrix implementation for that particular size and type that can outperform a general one. So that's the basic idea of expression templates: try to take all the dynamic checks and turn them into templates.
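A small, generic Eigen example of what that looks like in practice: with fixed sizes and types as template parameters, the whole right-hand side below stays a single expression template until the assignment, where it is evaluated in one specialized pass rather than producing a temporary matrix per operation. This is plain Eigen usage, not TensorFlow internals.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3f a = Eigen::Matrix3f::Random();
    Eigen::Matrix3f b = Eigen::Matrix3f::Identity();

    // a + 2*b - a.transpose() is one expression template, evaluated in a
    // single fused loop specialized for 3x3 float matrices at assignment.
    Eigen::Matrix3f c = a + 2.0f * b - a.transpose();
    std::cout << c << "\n";
}
```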
Excellent.
So, yeah.
Oh, I was just going to say that
I believe that the tensor
functionality was added by a TensorFlow developer at the time. And that's the extension that was
used to create the first version of TensorFlow. I'm always in favor of more things being done
at compile time if they can be, of course. So that's, yeah. Yeah, I mean, there's a lot of
things that you can imagine doing at compile time.
So as I said, the kernel fusion is one thing. The other thing is this memory planning, the idea,
if you can infer what the shape of a tensor is, so tensors are basically multidimensional arrays,
and various of these machine learning operations could be formulated in a way that you can infer what the final shape of those tensors are. But if you change certain inputs, like the image size, then the shape of the output tensor might
be completely different. But if you can infer that it's the same, then you can allocate the
memory more efficiently, you can pre partition for parallelism, etc, etc. So yeah, lots of
opportunities.
And after the model's generated, you definitely know the sizes of these things.
Yeah, for some types of models.
There's also dynamic models where this is not the case.
So if you imagine just like a function,
there's some types of functions like a convolution.
If I stipulate that the image size is always something
or I resize it first,
then everything below that resize is known shape.
But if I take an operation,
I say, give me a tensor that has all the positive elements, then the size of that output tensor will be smaller or equal to the input tensor. So the shape is unknown. There, that's kind of a nice
case because the shape is bounded by the input shape, at least. So you might be able to do more there. But in general, you could create a function where there are no guarantees.
So something I was just chatting with my nephew about recently is some of the articles and stuff I've seen, you know, come up on Twitter or whatever about unintentional bias built into our machine learning models. And I'm just kind of curious if you have any opinion,
if you'd like to talk at all about the ethics of building these models
and a bias, intentional or otherwise, that can be built into them.
Yeah, so I'm not an expert on this area, but I think it's super important.
And one of the neat things is that Google is actually looking at this
because we take that really seriously.
There's a whole website devoted to AI responsibility that Google has.
Let me check the URL.
I think it's ai.google.com.
And it talks about what are some of the sources of bias, some best practices to avoid it.
And I think there's even extensions to some of our tools.
So we didn't talk about it, but TensorFlow has a visualization tool
that's really nice for understanding what's going on in your model called TensorBoard.
And one of the nice things, I believe there's an extension
that allows you to understand what the characterization of your data is.
But, I mean, personally speaking, I would imagine,
like one of the downsides of using data is that, you know,
you're vulnerable to not choosing the right data, having some kind of sampling problem with it.
And you need to actively work against this. And you actively need to compensate for that and be
aware of what you're putting into it. There is no magic bullet to it. People have to be vigilant.
Yeah, it sounds like it would be the kind of thing that would be at least today impossible to automatically detect bias in the system.
Right. Because, like, if you imagine an automatic detector, how would you make that? You would have to use data to make that detector. So at best, maybe you could compare detectors against other detectors or something.
Right. I mean, I think the valuable thing about machine learning models is that they are kind of observable.
You can put new inputs into and see how they do.
That doesn't mean you understand the internals.
There's a whole branch of AI about AI understandability, which is trying to take these models that are kind of trained through automatic means and try to make them interpretable. And some classes of training only want to create interpretable models.
You know, so you kind of think, interpretable by a human?
Yeah, that you're actually understanding how they work. And this... I mean, these areas are still under active research, and I'm not an expert, so I can't really say anything too intelligent about them.
But I think these are the kind of tools that we're going to need to improve the situation.
With any new technology, there's always this period of understanding how it's used, understanding other second-order implications that are really important.
That's a fascinating comment, if I understood you right.
You're saying that the average model that's created, a human cannot understand.
I mean, it depends on what types of models that you use.
That is one of the criticisms against deep learning often, which is that it's harder to
understand the model. People have found techniques to kind of analyze them. They're certainly being
validated against their validation set. So they're producing the right answer for all the set that you look into. But it's fundamentally an unsolved problem right now.
Interesting. Okay, well, and is there anything else you want to go over before we let you go?
I think we covered a lot of things. I don't want to, you know, overwhelm everybody. But
I mean, I think it's a really easy area to get into. There's lots of tutorials on doing machine learning. There's lots of YouTube
channels, lots of content on how to get started with this. And it's really fun. Pick some problem
that you want to do and do a weekend project with it. And I think you'll find that it's really
exciting what you can achieve with very little code. And the frameworks have made it a lot easier,
especially TensorFlow. One of the major things that we've been focusing on
the last couple of years
is making it easier to use TensorFlow.
The tf.keras library,
which is kind of a high-level API for creating models,
has made things way more understandable.
And there's still the lower level library
if you need to dive into a lot of details.
So I encourage everyone to give it a try.
Okay. Thanks very much.
Thank you.
Thanks.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the
podcast. Please let us know if we're discussing the stuff you're interested in, or if you have
a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.