CppCast - TensorFlow
Episode Date: July 23, 2020

Rob and Jason are joined by Andrew Selle from Google. They first discuss Ranges support being added to Visual Studio, and Compiler Explorer's support for using some libraries. Then they talk to Andrew Selle from Google about machine learning with TensorFlow and TensorFlow Lite, which he was one of the initial architects for.

News
- Initial support for Ranges in MSVC
- Support for Libraries in Compiler Explorer
- CMake 3.18 Release

Links
- TensorFlow
- TensorFlow users
- TensorFlow on small and mobile devices
- Eigen library for linear algebra using expression templates
- C Bindings for TensorFlow
- AI Responsibilities

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get a one-month license
- PVS-Studio is now in Compiler Explorer!
- Free PVS-Studio for Students and Teachers
Transcript
Episode 257 of CppCast with guest Andrew Selle recorded July 23rd, 2020.
Sponsor of this episode of CppCast is the PVS Studio team.
The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we discuss library support in Compiler Explorer.
Then we talk to Andrew Selle from Google.
Andrew talks to us about machine learning
with TensorFlow and TensorFlow Lite. Welcome to episode 257 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
Doing okay. Don't think I have too much news to share here. Very hot week here in North Carolina. You?
We've actually had a slight cool down. It's only been in the low 90s here instead of the upper 90s.
Oh, yes. It's nice and cold.
I'd like to comment that I did just now, like 10 minutes ago, make my CppCon registration, so I am planning to attend CppCon virtually. So just a reminder to our listeners that it's soon; the early bird expires sometime in August, right?
I still need to do that myself.
It's like the fifth, yeah. It's August, so it'll be like two weeks.
Two weeks, yeah. Yeah.
And it saves you a hundred dollars. It's $200 versus $300, so it's a pretty significant savings.
Yeah, definitely worth it to sign up early.
Okay, well, at the top of the episode, I'd like to read a piece of feedback. We got this email from Michael about the episode from two weeks ago with the Disney Hyperion renderer team, and he said, "This episode was great. I really enjoyed listening to Yining and David. Hearing about how Disney renders their movies was super fascinating. Keep up the good work."
And Yining, Yining Karl, was the one who actually recommended our guest for today.
Oh, that's cool.
Yeah.
We'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube.
Joining us today is Andrew Selle. Andrew is a senior software engineer for TensorFlow Lite at Google and is one of its initial architects. He's also worked on improvements to the core and API of TensorFlow. Previously, he worked extensively on research and development of highly parallel numerical simulation techniques for physical phenomena, for film and physically based rendering. He worked on several Walt Disney Animation films, including Frozen and Zootopia. He holds a PhD in computer science from Stanford University.
Andrew, welcome to the show.
Well, thanks for having me.
I'm kind of curious what drove this move from animation simulation, physical simulation to TensorFlow.
Yeah, I get that question a lot.
I think if you look at the core of what you have to do to do physical simulation for film or physical simulation for physics,
it's actually a lot of similar skill sets to what you do for machine learning.
And at the same time, as I was doing things in film,
I started getting more interested in what was happening in machine learning,
and I wanted to give it a try.
Okay. Very cool.
I'm also curious what highly parallel means in your universe.
Yeah, so I think this has varied over time,
but when I got into physical simulation,
a big problem with doing things in film was that people were using single computers.
So one of the things that I worked on at Stanford
was extending physical simulation to work on MPI.
And of course, this had been done for supercomputing for a long time.
But we took sort of algorithms that had been applied to film,
and we scaled them.
So we did the first simulation of clothing
that was like a million triangles,
which most people were simulating around 10,000 at the time.
This was a while ago.
So people have gotten to these levels as just
bread and butter types of scale. But since then, you know, I've done a lot of GPU stuff. I've done
a lot of distributed parallelism and just a lot of micro optimization on single core as well.
I distinctly remember the first time I saw a clothing or a cloth simulation, physical simulation. I was at SIGGRAPH and I was in
high school. It was in like 1994 or something like that. And it seemed like complete magic at the
time. And it was like, they had a single piece of fabric that they were able to simulate, you know,
like things have come in a long way since then. Yeah. I mean, it turns out that the biggest problem for clothing is the collision detection.
So when you have a cloth fold over itself, you have to make sure it doesn't fly through itself.
And that's where sort of trivial parallelism where you say, oh, this part of the cloth is not anywhere near this other cloth starts to fail.
And you start to have this problem of like an N squared possible interactions where any point of the cloth
can contact any other point. And you have to check all of those in an efficient way.
So you do actually have to check them all, or is there a shortcut?
Well, I mean, you use spatial structures to accelerate that. One of the most common ones is a bounding box hierarchy, where you say, well, this triangle is within this bounding box, and this neighbor of it is within this bounding box. If we put those together, then we can say there's a bounding box that contains both of them, and you continue with that and create a hierarchy of bounding boxes. Then it reduces, you know, not the worst-case scenario, but the average-case scenario, so that it's tractable. I mean, in the worst case, you could have all the clothing compressed into a single tiny box, and that would be very difficult to solve, right?
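To make that concrete, here is a rough sketch of the broad-phase idea Andrew is describing: axis-aligned bounding boxes merged into a hierarchy, with a recursive test that prunes any pair of subtrees whose boxes don't overlap. The structure and names are illustrative only, not code from any production cloth solver, and building the hierarchy itself is elided.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Axis-aligned bounding box with an overlap test and a merge helper used
// when building the hierarchy bottom-up.
struct Aabb {
    float min[3], max[3];
    bool overlaps(const Aabb& o) const {
        for (int i = 0; i < 3; ++i)
            if (max[i] < o.min[i] || o.max[i] < min[i]) return false;
        return true;
    }
    static Aabb merge(const Aabb& a, const Aabb& b) {
        Aabb r;
        for (int i = 0; i < 3; ++i) {
            r.min[i] = std::min(a.min[i], b.min[i]);
            r.max[i] = std::max(a.max[i], b.max[i]);
        }
        return r;
    }
};

struct BvhNode {
    Aabb box;
    int triangle = -1;                     // leaf: index of a triangle
    std::unique_ptr<BvhNode> left, right;  // internal node: two children
};

// Collect candidate triangle pairs whose boxes overlap; everything else is
// culled without an exact (and much more expensive) triangle-triangle test.
void collectCandidates(const BvhNode& a, const BvhNode& b,
                       std::vector<std::pair<int, int>>& out) {
    if (!a.box.overlaps(b.box)) return;          // prune whole subtrees
    if (a.triangle >= 0 && b.triangle >= 0) {    // two leaves: record the pair
        out.emplace_back(a.triangle, b.triangle);
        return;
    }
    if (a.triangle < 0) {                        // descend into a's children
        collectCandidates(*a.left, b, out);
        collectCandidates(*a.right, b, out);
    } else {                                     // descend into b's children
        collectCandidates(a, *b.left, out);
        collectCandidates(a, *b.right, out);
    }
}
```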
Cool. Okay, well, Andrew, we're going to have a couple of news articles to discuss. Feel free to comment on any of these, and then we'll start talking more about TensorFlow. Okay?
Yeah, awesome.
All right, so this first one we have is an article on the Visual C++ blog, and it's that initial support for C++20 ranges is now available in the latest version, Visual Studio 2019 version 16.6. These are the first user-visible pieces of ranges support. Apparently they've been working on it kind of under the covers for a while, but now you can actually go and test it out, kick the tires.
I feel like it's more significant than that as well. It's not just ranges; it's ranges built on concepts.
Yeah. So for a long time, the ranges implementations that we've had have been built with concepts emulation, and now we're just about there; we can actually see this stuff coming together. And concepts, it looks like, has been in there for the past three dot releases.
So now they're using ranges with full concepts.
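For anyone who hasn't tried it yet, the kind of code this unlocks looks roughly like the following; it's a generic C++20 ranges snippet, not something specific to the MSVC implementation being discussed.

```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    // Lazily composed view pipeline; the range adaptors are constrained by
    // standard concepts (e.g. std::ranges::viewable_range) under the hood.
    auto evensSquared = v | std::views::filter([](int i) { return i % 2 == 0; })
                          | std::views::transform([](int i) { return i * i; });
    for (int x : evensSquared) std::cout << x << ' ';  // prints: 4 16 36
}
```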
Yeah, good stuff.
You had a chance to play around with ranges at all, Andy?
Unfortunately, no.
A lot of times we've been limited to C++14 in TensorFlow.
Oh, okay.
Well, there is a request here before we move on: please go try it out, kick the tires, submit bug reports. And a reminder that all this stuff is being developed on GitHub now.
The STL implementation for Microsoft is all there.
So check it out if you can, submit bug reports, whatever.
Yeah, definitely.
Okay, next thing we have is some new features
in Compiler Explorer.
Do you know how long Matt's been working on these, Jason?
Being able to link libraries is the new feature.
Well, I think, I don't know how long they've been working on it.
Okay.
Although I know that Matt would like us to point out
that this was not just Matt's work.
I don't know how much involvement he had in it.
It is quite the team that helps him with Compiler Explorer at this point.
So, but yeah, I think it's awesome because there's been a few times
when I've tried to use libformat in Compiler Explorer.
It's just unusable because you have to use the header-only version.
It takes forever to compile, whatever.
And so now they have libraries that you can link to.
Do they have a full list of libraries here?
It looks like they're listing a couple of the unit testing libraries,
but I'm not sure what other ones are available.
No, libformat's the only other one that I know specifically is called out,
plus the unit test ones.
I haven't looked myself, actually.
I may have just found it.
Yeah, Google Benchmark, Intel TBB, fmt, cross cables.
I'm not sure what that is.
Catch-2, DLib, Google Test.
That's fascinating, with Google Benchmark also being supported, because there's Quick Bench/Compiler Explorer integration, if you haven't seen that. So if you're in Compiler Explorer, you can hit a button and go to Quick Bench, and it'll just copy the code over, and from Quick Bench you can go back to Compiler Explorer. And having that library now shared between them means you can get a more complete integration between the two, I'm guessing. I haven't tried it yet, though.
Okay, and then the last thing we have is the CMake 3.18 release is out. Anything worth pointing out with this, Jason?
It's a big release. I'll ask Andy, did you read this? Do you have any interest in CMake?
I've used CMake a lot in the past, but I haven't recently. TensorFlow was using CMake
for Windows support before Bazel supported Windows. But now I think we're not using it
that much. But yeah, I enjoy using CMake for small projects. And the cross platform
support brings me back there when I need to do that often. Yeah, absolutely. So yeah,
there's nothing specific that I feel like
I need to call out on here. There's just so many little changes. And it's clear that... oh, no, wait, no, there is one thing I'll call out: profiling support. So if your CMake project currently
takes forever to configure and generate, you can run the profiler, figure out where it's spending
its time. Very cool. I need to try that. All right. Well,
Andy, we've had a couple people recently ask us to do an episode on TensorFlow. So could we maybe
start off by just letting you explain what exactly TensorFlow is for listeners who have
never worked with it, which I think includes both me and Jason. Sure. I mean, TensorFlow is a big project and it offers a lot of functionality, but the main
core idea behind it is that it's an open source library that lets you develop and train
machine learning models.
And the basis behind it was the idea that you can start from research and a researcher
can create a model and you can take it all the way to production and deployment on a wide variety of devices, including smaller devices like
mobile devices.
So given that scope, it has a lot of features and I'm not even an expert on a large majority
of them.
Okay.
That's interesting that it has mobile applications.
What does that look like to be doing machine learning on a phone?
I'm curious.
So on the phone, you typically do machine inference.
So that's the other half of it.
So once you train a machine learning model using data, you often want to use it to integrate into applications.
So a lot of the work has been doing inference on a mobile device where you carry the weights that describe the function and the program that describes the function,
and then you can evaluate it on whatever inputs there are.
We can get more into TensorFlow Lite,
which is the mobile product for part of TensorFlow in a bit.
Okay, so you've already used words like weights and models,
and for those of us who don't use it,
I would love the explain like I'm five
description of what machine learning is. Why are we training models? What are we doing with them?
How does this work? Yeah, definitely. So traditional machine learning is about creating a
function. And it's, you know, it processes some inputs, and it produces some outputs. How do you
create that function? Without machine learning, we have a way of doing that algorithmically. So we say, I want to make a function that computes the
sum of two numbers. I can do that. I know how to write an algorithm to do that. It's fairly
mathematical. The big difference in machine learning is that you use data. And how you use
data can vary a lot of different ways. But if you kind of think of the simplest thing you could do with machine learning,
like the simplest form of machine learning would be something like a linear regression.
So you have a bunch of XY data, and you want to find the line that most closely matches it.
So you think about that function, and the way you describe a line is basically a slope and an intercept, or two points. There are many ways to describe a line, but the idea is that given that
data, what is the best line that fits that data? And best can be described in many different ways.
Is it, you know, the distance, the perpendicular distance, the L2 norm, et cetera?
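As a concrete version of that simplest case, here is a small, self-contained sketch of an ordinary least-squares fit, choosing the line that minimizes the squared vertical distance (one of the "best" criteria mentioned above). It is purely illustrative and not TensorFlow code.

```cpp
#include <cstddef>
#include <vector>

// Closed-form least-squares fit of y = slope * x + intercept, minimizing the
// sum of squared vertical distances to the data points.
struct Line { double slope, intercept; };

Line fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // assumes the x values are not all equal
    Line l;
    l.slope = (n * sxy - sx * sy) / denom;
    l.intercept = (sy - l.slope * sx) / n;
    return l;
}
```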
But without getting into the details of like how you describe these
kind of error functions, let's just imagine something like, I want to determine red marbles from white marbles. How do I do that in real life? I might lay them on the floor, and they're all mixed up. And then I might want to say, well, which ones are red, and which ones are green, which ones are yellow, etc. I might start pushing them into piles, right?
And by pushing them into piles, it becomes much more obvious how many there are.
And I can quickly draw a line between them.
And that's kind of what machine learning tries to do.
It tries to warp the data in a way through a function so that these kind of decisions become obvious by some point in the function.
Is this sort of making some sense?
Shall I be a little bit more concrete?
Sure.
So one type of problem that is usually used for machine learning is the idea of classification.
So if we have an image and we want to put it through a function,
that function could maybe describe what that image is in terms of like a category. So if I have a classifier that determines what kind of dogs there are, if I have a bunch of
different dogs, you know, one of them might be a Boston Terrier, one of them might be a German
Shepherd. And I want to give that image and output sort of a number which represents a class or a
string which represents a class. Machine learning is about how I create that function.
Okay.
Yeah.
So the model is the function effectively?
Exactly.
Okay.
And this is not that different from traditional modeling.
So if I want to, if I drop a ball from a height
and I want to compute what its velocity is and what its position is, I might develop a model.
And that's a physics model.
That's a Newtonian physics model.
And I might do that empirically.
I might do that.
I might try to come up with a mathematical function that measures it really accurately.
And that's sort of what Newton did.
The problem with that approach is that it works really well for simple phenomena. But if you have complicated things, like this example of identifying different dogs in images,
it becomes much harder to do that algorithmically.
So the idea with machine learning is that if you can create a class of functions,
a model architecture that is perhaps really complicated,
can we make a way of getting the particular parameters of that model
to do a good job.
Okay.
Yeah.
So back to my linear example, if I have a particular line, and then I look at my data,
I can measure how good my line is doing against that data.
And if it's doing badly, can I improve it?
The way I might improve it is I push that line until it's at the right angle, and I push that line up and down until it matches the right bias. And that's what's happening in machine learning, but at a higher
dimensionality using much more complicated, higher matrix order things and nonlinearities.
If you kind of make that function more and more complicated, it's harder to intuitively see it,
but it's basically doing the same thing.
So with all of these machine learning algorithms, I mean, do you specify when you're training a model what dimensionality you want it to try to fit to the data? Or does it just do whatever it does?
Yeah, exactly. In
traditional machine learning, you would specify the model architecture. So you would say, okay,
if I want a linear fit to
something, I would choose how many dimensions in and how many dimensions out. I would also choose,
so like that would correspond to the image size in our example, and would correspond to how many
classifications. So concretely, if I'm just going to do a simple matrix as my model, and I say the image is, you know, 256 by 256, then I'm going to have, you
know, 256 squared elements on the rows of my matrix or the columns of my matrix. And then on
the other dimension of my matrix, I'll have the number of classifications, and my
desired output for that function might be what's called a one hot vector. So if I give it a Boston Terrier, and Boston Terrier is code three,
then I would get zero, zero, one, zero, all zeros. And if I say my German Shepherd is class zero,
then I would get one followed by all zeros. So that's one possible encoding.
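For reference, a one-hot encoding like the one Andrew describes is just a vector of zeros with a single one at the class index; a tiny sketch (with hypothetical class counts) might look like this:

```cpp
#include <vector>

// One-hot encoding: class k out of numClasses becomes a vector of zeros
// with a single 1.0 at position k.
std::vector<float> oneHot(int classIndex, int numClasses) {
    std::vector<float> v(numClasses, 0.0f);
    v[classIndex] = 1.0f;
    return v;
}
// e.g. oneHot(3, 8) -> {0, 0, 0, 1, 0, 0, 0, 0}
```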
Do you have to tell how many classifications there are up front?
Yeah, in this type of representation, you would.
Okay. Okay. Okay. Could you maybe talk a little bit more about how models are trained specifically with TensorFlow? Yeah. So as I said, TensorFlow is a framework to help you train ML models. And
there's a couple of things that come up over and over when you try to train a model. One is how do
you get data? How do you represent data? And how do you shove it through the system efficiently?
The second one is, how do you describe models
and how do you specify them?
So TensorFlow provides a library, tf.data,
which is a way of sending data within it.
The second thing it provides
is a way of specifying model architecture.
So there's a lot of conventions
that have come across from successful research
projects, like fully connected, which would be basically a matrix multiply layer. And then on
top of it, you might stack another thing, which is maybe a convolutional thing, which knows how to do
things like blurs and edge detections. And all these kind of layers can be accumulated in the
library and create a potential
model architecture. Then the next phase is when you actually start training. To actually start
training, you need to go from a particular set of parameters that perhaps were randomly initialized
that don't do a good job at all, and you try to perturb them until they're good quality. And the
way you do that is using gradients and differentiation. So if you
evaluate your function on a set of data, that gives you an output. From that output, you can
also compute what is the derivative with respect to all the variables, all the parameters in the
model. And that will give you sort of a perturbation that you can apply to all those variables that
will make it better, that will make the error less. Okay. And you keep doing that over and over again,
running your same data over and over again, against your model, computing these small
perturbations to the model using your gradient function. So TensorFlow is helping you define
that architecture. In terms of the gradients, it's also doing automatic differentiation, which basically allows you to specify your model architecture in a straightforward way and then compute the gradients automatically.
So you don't have to apply the chain rule and you know,
all the differentiation rules that everybody's forgotten from calculus.
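Sticking with the earlier line-fitting example, the training loop Andrew describes might look roughly like this when written by hand: evaluate the model, compute the derivative of the squared error with respect to each parameter, and nudge the parameters a little in the opposite direction. In a real TensorFlow program the derivative step is what automatic differentiation does for you; the hand-written gradient here is only feasible because this model has two parameters.

```cpp
#include <cstddef>
#include <vector>

// Gradient descent on mean squared error for y = slope * x + intercept.
void trainLine(const std::vector<double>& x, const std::vector<double>& y,
               double& slope, double& intercept,
               double learningRate = 0.01, int steps = 1000) {
    const std::size_t n = x.size();
    for (int s = 0; s < steps; ++s) {
        double gSlope = 0, gIntercept = 0;
        for (std::size_t i = 0; i < n; ++i) {
            const double err = (slope * x[i] + intercept) - y[i];
            gSlope += 2.0 * err * x[i] / n;  // d(mean err^2) / d slope
            gIntercept += 2.0 * err / n;     // d(mean err^2) / d intercept
        }
        slope -= learningRate * gSlope;      // the small "perturbation"
        intercept -= learningRate * gIntercept;
    }
}
```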
All right.
So I want to, let's see, maybe focus on something that might be appropriate to focus on.
But when you're saying you can run the input
through convolution matrix or whatever,
edge detection or something like that,
it kind of sounded to me like that means
for something like this dog classification example,
you're not necessarily just training one model.
You're not saying,
here's an image, now give me the output, do you then split it and like, say, okay, well,
I'm going to wait, I'm going to train a model that's based on edge detection, and I'm going to
train a model that's based on colors, and I'm going to train a model that's on something else,
and then have those things work collaboratively? Or is that all part of one process? So essentially, you're trying to create a huge composite function. And that
composite function can have multiple stages. And I think that's what you're sort of feeling;
your intuition is basically correct. So I was using the example of a matrix or the example of
a line, which is also a matrix. Those are simple linear models, those aren't that powerful. And it
turns out there's a, you know, a hard limit to what you can do with them. So the way people solve that is they add
multiple layers that do more steps, maybe just more steps of linear. And between those
linear layers, there's also nonlinear functions. And all these kind of functions allow you to have
more resolution power and represent more complexity. The problem with that
is that it gets harder to reason about, and it gets harder to train when you get deeper. So the
idea of deep learning, which came out a number of years ago, and that's one of the major drives of
AI recently, is that you can now tractably train those multi-layer models. Because traditionally,
you were not able to do that. It was not tractable at all. You didn't have enough
computation power, and you didn't have any way of dealing with the numerical issues that occur
when you do gradients across multiple layers. You can imagine, if I want to, you think about
the butterfly effect, right? I perturb some air somewhere. Does it cause a tsunami somewhere?
This is kind of what happens when you go through many, many layers of a machine learning model.
You try to figure out what the causality of a particular output is, and it becomes harder.
And that's called the vanishing gradient problem.
It turns out there were techniques like dropout and data augmentation that helped make it so that these types of problems were tractable.
And that's what allowed deep learning.
And so deep learning allows you to make multiple layers.
So when you were talking about the edge detection, what happens in a deep learning image model is that you have something that's very close to the image, which is computing very
low-level features like edges, like blurs.
And then the layers subsequent to that create higher-level features like maybe course shapes,
course orientations.
And as it goes down and down in the model, it gets higher and higher-level features until
it can successfully
do a classification problem. Okay. Okay. Today's sponsor is the PVS Studio team. The company
develops the PVS Studio Static Code Analyzer designed to detect errors in the code of programs
written in C, C++, C Sharp, and Java. Recently, the team has released a new analyzer version.
In addition to working under Windows, the C Sharp part of the analyzer can also operate under Linux
and Mac OS. However, for C++
programmers, it will be much more interesting to find
out that now you can experiment with the
analyzer in online mode on the godbolt.org
website. The project is called
Compiler Explorer and lets you easily try
various compilers and code analyzers.
This is an indispensable tool for
studying the capabilities of compilers.
Besides that, it's a handy assistant when it comes to demonstration of code examples.
You'll find all the links in the description of this episode.
I want to see if we can get more into the C++ being used for this.
But before we do that, could you tell us a little bit about TensorFlow Lite,
which in your bio you mentioned that you were one of the architects for?
Yeah, so TensorFlow is aimed at training ML models
and aimed at deploying them on the server for serving.
So if you wanted to do inference over many users
that were hitting the same server at once,
TensorFlow has been used for that.
There's a library, TensorFlow Serving, that does that.
But what emerged as we started deploying ML models on device is that the overheads were high in TensorFlow, which was okay for a server-based language because you basically have many inferences happening at once.
You have many pieces of data coming at once. So any sort of interpreter inefficiencies were not such a big deal
because they were amortized over really large amounts of data.
At the same time, there were other constraints like binary size
that became really important on mobile devices.
So an app developer doesn't want to have a huge binary
that they have to ship around.
So TensorFlow Lite had the goal to make the overhead of
individual operations be much smaller. It had the goal of having a very small binary size.
When we first shipped, we were about 100 kilobytes for the interpreter.
Wow.
And it also had the goal of basically having a very low latency to startup.
So you can imagine that there's a lot of kind of algorithms that you can use.
And if you have a lot of a big binary size, it's not such a big deal to load that huge VM image if you're not going to use it all the time, even if you're initializing large
parts of it.
If you're going to run for like 10 days doing a machine learning training, it doesn't matter
that it takes 10 seconds to load or whatever. But on a mobile device, when somebody's starting an interaction
with their app, and they want to get the result in like two seconds, then you want to minimize
the latency. So in TF Lite, we focused on those design constraints, and we made a like a subset
of features that would work well on mobile. And then we made a way in which you can take models from TensorFlow
and put them into TensorFlow Lite
so that you have a continuous authoring process.
So can TensorFlow Lite do the learning also,
or just the inference, the running of a model,
if I'm getting those terms right?
So it doesn't support the learning as a first-class citizen.
There are ways to do the gradient propagation manually. In fact, I think we have a blog post or an instruction on how to do that. But that's typically not done as commonly, though, as mobile devices are getting more powerful, and people want to do kind of more adaptive algorithms, it does occur. There are certain types of applications that are deployed
that do training, but it's not the most common path right now. Okay, so I'm curious what the
model actually looks like. What is this thing that you generate and then hand off to your mobile
device?
Yeah, so as we talked about, there are these different layers, and each layer might have something like, if it's a convolutional model, a set of filters. So you don't put a layer in and say it's an edge-detect model; you say it's a convolutional model, and you sort of learn the filters so that that layer can be an edge detection, it could be a blur, it could be some other swizzling of data.
Okay.
So those are called the weights.
So when you put a model on a device, when you serialize it,
you remember the weights and you also remember the topology.
So in TensorFlow Lite, we have a FlatBuffer, which is memory-mappable, that contains the weights and the topology.
Okay.
And so it has essentially a graph, a directed graph, of these layers, or we call them ops, I guess. And it has the weights associated with them.
So it can then run.
All right.
Thanks.
Okay.
I know TensorFlow has multiple bindings.
Are developers or scientists who use TensorFlow using the C++?
Or are they mostly using Python bindings?
How do they usually interact with it?
Yeah.
So I guess I haven't mentioned so far that TensorFlow is often interacted with through Python, but it's written in C++ and in Python.
Most researchers that are training models use Python.
Okay.
So that's how they interact with it.
Most people that are deploying onto mobile devices
are using a different language.
So if you're using Android,
you would probably be using Kotlin or Java.
If you're using iOS,
you might be using Objective-C or Swift.
And both of those,
you could write a library in C++ and use that.
So in terms of the bindings, TensorFlow has C bindings, which are kind of the way in which you can call TensorFlow. So you can load a model, you can run inference on a model using the C API.
And similarly, TF Lite has a C API, which you can invoke. So if you want to bind to Java, you would use JNI to write a binding layer
that connects to TensorFlow Lite, for example.
If you are going to use TensorFlow
or TensorFlow Lite from Objective-C,
well, Objective-C can use C++,
so you could just do it directly with no bindings.
Right.
Yeah.
So I think most bindings are written by hand
using the C layer.
And due to ABI compatibility, it's usually required that you write things in a C lowest
common denominator way to make them compatible across multiple compiler versions.
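As a rough picture of what that C-level surface looks like in practice, here is a minimal sketch of loading a TensorFlow Lite model and running inference through the C API: opaque handles, paired create/delete free functions, and explicit copy-in/copy-out of tensor buffers. The function names come from the public c_api.h header, but treat the exact signatures as something to verify against the version you build against rather than as details quoted from the episode.

```cpp
#include <vector>

#include "tensorflow/lite/c/c_api.h"  // the C-ABI surface discussed above

// Run one inference through the TF Lite C API. Error handling is omitted
// for brevity; each call below returns a status or pointer worth checking.
std::vector<float> runInference(const char* modelPath,
                                const std::vector<float>& input,
                                int outputSize) {
    TfLiteModel* model = TfLiteModelCreateFromFile(modelPath);
    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

    TfLiteInterpreterAllocateTensors(interpreter);

    // Copy the input into the interpreter's input tensor.
    TfLiteTensor* in = TfLiteInterpreterGetInputTensor(interpreter, 0);
    TfLiteTensorCopyFromBuffer(in, input.data(), input.size() * sizeof(float));

    TfLiteInterpreterInvoke(interpreter);

    // Copy the output tensor back out.
    const TfLiteTensor* out = TfLiteInterpreterGetOutputTensor(interpreter, 0);
    std::vector<float> result(outputSize);
    TfLiteTensorCopyToBuffer(out, result.data(), result.size() * sizeof(float));

    // Opaque handles are released with their matching free functions.
    TfLiteInterpreterDelete(interpreter);
    TfLiteInterpreterOptionsDelete(options);
    TfLiteModelDelete(model);
    return result;
}
```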
ABI comes up again, our nemesis.
We've had lots of conversations on the show about breaking ABI compatibility across C++ versions, so it's been a... yeah.
So how much effort is it? I'm trying to imagine how high-level, how small, what kind of a footprint do you try to keep with the C binding layer to make sure that all this is maintainable? Like, what is the surface area of C? How big should you make it?
Yeah, basically. Because anytime I've done language bindings to another language,
I use SWIG, which parses my C++ header files, and it does the work for me.
I've never personally done this official C binding
and then let people use that kind of thing.
So I'm kind of curious what that ends up looking like in your world.
Yeah.
So I think if you look at machine learning inference or even training,
typically you interact with a small surface of the interface.
If you're not involved in authoring individual nodes,
you really don't need to do anything except send inputs in and get outputs out.
So it's not as big of a surface as the whole thing.
And that means that for a long time,
the Python bindings were kind of a special case in TensorFlow.
The other thing that I would say is that even for that small surface area,
it's often useful to make bindings idiomatic.
So going back to your Swig example,
I've seen a lot of libraries, or a lot of applications, that have bound with SWIG, and they bound their whole C++ API. And that turns out to be really good if you're
kind of a C++ developer, and you want to prototype C++ things in Python. So I think Maya had a good,
like very direct API for doing this, which is a 3D animation tool. And I used it extensively.
And I've also done this in my own projects. But if you're trying to make something
that's idiomatic Python or idiomatic Swift
or idiomatic Objective-C,
you tend to want to rewrite the bindings
manually.
So you want to use the language features
and the language idioms
that are considered good for that particular language.
And that's why a lot of people handwrite their bindings.
Okay.
So the footprint, you're saying, is pretty low. It basically just comes down to loading a model and executing the model, as far as TensorFlow Lite goes.
Yeah, I mean, I believe there are ways to construct the model; I haven't looked at what the current situation of it is. But in terms of what most people use from C, it is to run inference, or to run a training loop, possibly.
In terms of our Python bindings, we do actually use a wrapper generator.
We just don't wrap the entire C++ library.
We wrap a smaller interface.
Oh, that's what you were saying before, that it was a special case thing.
Yeah.
And traditionally, we used Swig as well.
More recently, we switched to pybind11 for the TensorFlow library. And pybind11 is a library created by a graphics researcher who does a lot of rendering work, Wenzel Jakob. And he has a lot of other interesting C++ things; you should look into him as a future guest, I would say. He's done a lot of things on machine learning combined with graphics, which is really interesting, like differentiable rendering.
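For a flavor of what a hand-written, idiomatic binding looks like with pybind11, here is a minimal, generic example; it is standard pybind11 usage rather than TensorFlow's actual binding code.

```cpp
#include <pybind11/pybind11.h>

// A tiny C++ function exposed to Python through a hand-written binding.
int add(int a, int b) { return a + b; }

PYBIND11_MODULE(example, m) {
    m.doc() = "minimal example module";
    m.def("add", &add, "Add two integers");
}
```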
That's cool. So now I'm also curious, since you're talking about passing large chunks of data around through the C API, how do you deal with object lifetime? Because the C++ side of things is going to have some notion of ownership, and on the C side, do you have to do the typical thing where you create an object and then ask the binding to destroy it for you, or...?
Yeah, I mean, typically you'll have a free function associated with an opaque handle in the C API for TensorFlow.
Okay.
In terms of what we do in TensorFlow Lite, a lot of times we're memory-mapping the model, in which case we assume all those weights have an infinite lifetime.
And we try to not copy any of the really big data.
We only copy and create internal representations of the topology of the graph.
Just load it from the memory map whenever you need it.
Yeah, exactly. And in fact, one of the big differences
with what we did in TensorFlow and TensorFlow Lite
is how we dealt with memory allocation in general.
So TensorFlow for its traditional runtime
uses kind of a reference counted tensor handle.
So there's like a buffer and then tensors are sort of copy,
not quite copy on write, but that same kind of feeling, which is that you have these handles and multiple reference counted views of them to emulate kind of a value semantic situation.
In TF Lite, we do ahead-of-time memory planning, where we try to create an arena that's the whole memory that you would need. So it's kind of like an uber activation frame of what's needed for the model, where we can overlap different parts of the computation that are used at different times, to get minimal overhead in terms of memory allocation, or at least smaller, maybe not minimal.
Yeah, minimal becomes very tricky if that's truly your goal.
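A toy version of that ahead-of-time planning idea might look like the following: each tensor has a size and a first-use/last-use interval in the execution order, and tensors whose lifetimes don't overlap can share the same region of one big arena. This greedy first-fit sketch is only meant to illustrate the concept; TensorFlow Lite's real planner is more involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct TensorLifetime {
    std::size_t size;
    int firstUse, lastUse;   // inclusive interval in the execution order
    std::size_t offset = 0;  // filled in by the planner
};

// Assign each tensor an offset in a shared arena and return the arena size.
std::size_t planArena(std::vector<TensorLifetime>& tensors) {
    std::vector<const TensorLifetime*> placed;
    std::size_t arenaSize = 0;
    for (auto& t : tensors) {
        std::size_t offset = 0;
        bool moved = true;
        while (moved) {  // greedy first-fit: bump past conflicts until stable
            moved = false;
            for (const auto* p : placed) {
                const bool liveTogether =
                    t.firstUse <= p->lastUse && p->firstUse <= t.lastUse;
                const bool overlapsInArena =
                    offset < p->offset + p->size && p->offset < offset + t.size;
                if (liveTogether && overlapsInArena) {
                    offset = p->offset + p->size;  // move past the conflict
                    moved = true;
                }
            }
        }
        t.offset = offset;
        placed.push_back(&t);
        arenaSize = std::max(arenaSize, offset + t.size);
    }
    return arenaSize;  // total bytes needed for the shared arena
}
```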
So is TensorFlow Lite written as a separate library, or is it like a pared-down, ifdef'd-out version of TensorFlow?
Yeah, it's a separate library right now.
But does it share a number of the same operations? I think you said it shares similar operations.
There's some differences in operations. It defines some of its operations as fused versions. So, you know, we were talking about convolution. And it turns out that after a convolution, or before it, depending on how
you look at it, you often do a bias, which is just adding a vector to everything. And then you do some
nonlinear thing. And if you kind of imagine doing those all at once
while you're loading a single element,
or you do the bias and the activation,
which are kind of like a nonlinear function,
then you have reduced memory bandwidth,
like typical kernel fusion type strategy.
So TF Lite defines some of the key operations, like convolution plus bias-add, as fused operations to get higher performance.
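Sketching the fusion idea in isolation: instead of writing the convolution result to memory, reading it back to add the bias, and reading it again for the activation, the bias add and the nonlinearity happen while the accumulated value is still hot. The convolution itself is elided here; this is just an illustrative tail of a fused kernel, not TensorFlow Lite's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bias add and ReLU applied in one pass over each accumulated output value.
inline float fusedBiasRelu(float accumulator, float bias) {
    return std::max(accumulator + bias, 0.0f);
}

// Finalize all output accumulators; in a fused kernel this would happen as
// each value is produced, rather than as a separate pass like here.
void finalizeOutputs(std::vector<float>& accumulators,
                     const std::vector<float>& biases, std::size_t channels) {
    for (std::size_t i = 0; i < accumulators.size(); ++i) {
        accumulators[i] = fusedBiasRelu(accumulators[i], biases[i % channels]);
    }
}
```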
There's different implementations. A lot of the TF Lite implementations are ARM-optimized by hand using assembly, and we use another Google open source library called ruy for that,
which does basically fast quantized and floating point matrix multiplication,
which is a primitive that's used by a lot of these types of machine learning operations.
And you mentioned before that you are limited to C++14 with TensorFlow Lite.
Yeah. So, I mean, there's kind of a long story.
If you want to support software on a wide variety of machines, it tends to be,
you get a lot of complaints if you go too new, too fast. The other thing that's interesting
on the TensorFlow Lite side, which I'm much more familiar with, admittedly, is that we have a lot of
people that are trying to take TensorFlow Lite and take it to really small devices like
microcontrollers. And while microcontrollers have gotten to be a lot better
at handling new compiler toolchains,
since ARM is so ubiquitous and RISC-V as well coming up,
that means that you can use C++,
whereas traditional embedded developers would never touch C++.
They're still sometimes a little bit behind.
And it's sort of like sometimes the chips that you might choose,
you choose for hardware reasons,
and their tool chain might not be as advanced,
even though there are definitely microcontrollers
that make every version of C++ available.
That's cool, yeah.
On one hand, it's a shame that you can't move beyond C++14,
but also really cool that you're able to at least use C++14 on these devices.
Yeah, I mean, I see a lot of possible benefits of the C++17 and C++20 features.
I think if you look at TF data and some of the streaming operations that we do,
they look very similar to coroutines and using coroutines
directly might be really interesting. And then lots of other creature comforts.
So we've talked a lot about training these models and deploying them and TensorFlow Lite and small
devices and everything. And then now I'm like, well, what are people actually using TensorFlow
Lite for? What kinds of models are they executing on handheld
devices or microcontrollers? Yeah, I mean, so like, if you look at TensorFlow Lite, it's been
deployed on, like, over 4 billion devices.
And that's a lot.
Yeah. So it's been used in a lot of Google's core applications; you can imagine what types of things it might be used for.
If you look at an app like Google Photos and you type in something into the Google Photos search,
you can say, I want to find a flowerpot, and it will show me the flowerpots.
And that's basically an image classification.
And that's a model that's been trained.
There's some parts of that that can run on device,
some parts of that that run on server.
If you look at other models, like speech recognition is another big one.
There's been a lot of work on speech recognition,
and that's enabled such devices like Google Home,
where you can talk to it instead of having to interact with a traditional
input device. You can also do that on your phone. And then there's, like, again, some of that can
be done on device and some of that can be done on server. And it turns out that as mobile devices
get better and better, you can do more and more on the mobile device and have to rely on the server
less. Okay. It's fascinating just thinking about
in my personal career, how many times we've gone to, you know, everything on the server,
know everything local and everything on the server. Okay. Now let's find the balance that
makes sense for everyone. So it's interesting. Yeah. I mean, I think it's going to be a constant
push and pull and there's always trade-offs. And this also happened in graphics,
where you think about what you can render on a GPU versus what you can render on a CPU,
or what you can render in 30 frames per second
versus what you can render overnight.
And you're always going to be pushing both of those at the same time.
Another application that is really interesting
that we demoed at Google I.O. that I worked on was the idea of using pose.
Like if you point a camera at someone, you can actually figure out what kind of orientation all their limbs are in.
And you can do that and you can use that to kind of teach them things.
So in that case, we took a dance instructor and showed people how to do dance moves, and then we slowed it down.
And then we used that pose match to tell them when they're doing a good job and gave them a score and allowed them to improve and gave them feedback on what they're doing.
So a lot of different kind of applications that people have done with very specialized equipment like motion capture, like specialized cameras, now can be done with regular cameras on device.
And that's a really exciting thing because it's going to just make it more ubiquitous. All right. That's neat.
So has there been any, uh, I'm thinking about, you know, your desire to run these things on small,
fast, you know, quickly on small devices and whatever. Has anyone done any work on like
actually compiling the model itself?
Yeah, so this is kind of an interesting question.
There's been a lot of work on compilers for machine learning.
So part of TensorFlow, like released with TensorFlow is XLA,
which is actually a compiler.
And it takes as an input HLO, which is an instruction set,
which is basically linear algebra operations.
So you can actually take TensorFlow operations,
and they can be lowered to this dialect,
and then they can be compiled to CPU to GPU.
And this is actually how Google's TPU is actually fed.
You make programs into these XLA programs.
And the way this works in TensorFlow...
Yeah, what is it?
Processing unit or something?
Yeah, the TPU.
Yeah, Tensor Processing Unit.
Tensor Processing, okay, sorry.
Yeah, so essentially, yeah,
so that gives you the ability to compile things.
And a lot of the benefit that you can get from that
is kernel fusion,
like we had talked about with TensorFlow Lite.
You can imagine doing that on the fly.
You can imagine doing this with,
basically XLA is integrated as a JIT.
So you can basically tell TensorFlow to compile this thing.
And then that becomes a new function
that actually goes to a compiled version of it.
That's cool.
There's also another framework
that's been created called MLIR,
which was started at Google by Chris Lattner.
And it is essentially a framework for creating multiple levels of representation of IRs.
So you can build compilers for a wide variety of things.
And the observation there is that you might have different sorts of dialects or different IRs that are useful at different times. So you could imagine XLA being a dialect within it,
or you could imagine TensorFlow being a dialect within it. And in fact, the converter from TensorFlow to TensorFlow Lite is actually implemented using MLIR technology. So there's a lot of, you know, exciting things happening with compilation.
What traditional TensorFlow did for its implementation is it used a lot of Eigen. So this is kind of using C++ as a compiler for an EDSL, essentially. So a lot of the operations in TensorFlow were implemented in terms of Eigen. So at that stage, you can imagine, you asked if there's, like, heavy templatization of TensorFlow, you might ask that. And mostly it's at the Eigen stage.
But if you look at some of the operations, they're often templatized in their kernel implementations in terms of dimension and in terms of type.
So you've got a specialized version for type and dimension. And Eigen can sometimes do better with its packetization, where it's mapping these high
level operations that are linear algebra or tensor operations into packetized SSE forms
that work better on CPU.
All right.
That's pretty awesome.
Yeah.
I'm sorry.
Could you go over what Eigen is?
Because it sounds familiar, but I don't think it's something we've talked about in quite a while.
Yeah, so Eigen is a library that was created several years ago now to handle linear algebra.
So it's aimed to be a C++-oriented linear algebra library.
And it uses expression templates to sort of describe repetitive operations. So if you're doing, like, a bunch of operations on small matrices, you'll get a specialized small-matrix implementation for that particular size and type that can outperform a general one. So that's the basic idea of expression templates: try to take all the dynamic checks and turn them into templates.
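A small, generic Eigen example of what that looks like in practice: with fixed sizes and types as template parameters, the whole right-hand side below stays a single expression template until the assignment, where it is evaluated in one specialized pass rather than producing a temporary matrix per operation. This is plain Eigen usage, not TensorFlow internals.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3f a = Eigen::Matrix3f::Random();
    Eigen::Matrix3f b = Eigen::Matrix3f::Identity();

    // a + 2*b - a.transpose() is one expression template, evaluated in a
    // single fused loop specialized for 3x3 float matrices at assignment.
    Eigen::Matrix3f c = a + 2.0f * b - a.transpose();
    std::cout << c << "\n";
}
```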
Excellent.
So, yeah.
Oh, I was just going to say that
I believe that the tensor
functionality was added by a TensorFlow developer at the time. And that's the extension that was
used to create the first version of TensorFlow. I'm always in favor of more things being done
at compile time if they can be, of course. So that's, yeah. Yeah, I mean, there's a lot of
things that you can imagine doing at compile time.
So as I said, the kernel fusion is one thing. The other thing is this memory planning, the idea,
if you can infer what the shape of a tensor is, so tensors are basically multidimensional arrays,
and various of these machine learning operations could be formulated in a way that you can infer what the final shape of those tensors are. But if you change certain inputs, like the image size, then the shape of the output tensor might
be completely different. But if you can infer that it's the same, then you can allocate the
memory more efficiently, you can pre partition for parallelism, etc, etc. So yeah, lots of
opportunities.
And after the model's generated, you definitely know the sizes of these things.
Yeah, for some types of models.
There's also dynamic models where this is not the case.
So if you imagine just like a function,
there's some types of functions like a convolution.
If I stipulate that the image size is always something
or I resize it first,
then everything below that resize is known shape.
But if I take an operation,
I say, give me a tensor that has all the positive elements, then the size of that output tensor will be smaller or equal to the input tensor. So the shape is unknown. There, that's kind of a nice
case because the shape is bounded by the input shape, at least. So you might be able to do more there. But in general, you could create a function where there are no guarantees.
So something I was just chatting with my nephew about recently is some of the articles and stuff I've seen, you know, come up on Twitter or whatever about unintentional bias built into our machine learning models. And I'm just kind of curious if you have any opinion,
if you'd like to talk at all about the ethics of building these models
and a bias, intentional or otherwise, that can be built into them.
Yeah, so I'm not an expert on this area, but I think it's super important.
And one of the neat things is that Google is actually looking at this
because we take that really seriously.
There's a whole website devoted to AI responsibility that Google has.
Let me check the URL.
I think it's ai.google.com.
And it talks about what are some of the sources of bias, some best practices to avoid it.
And I think there's even extensions to some of our tools.
So we didn't talk about it, but TensorFlow has a visualization tool
that's really nice for understanding what's going on in your model called TensorBoard.
And one of the nice things, I believe there's an extension
that allows you to understand what the characterization of your data is.
But, I mean, personally speaking, I would imagine,
like one of the downsides of using data is that, you know,
you're vulnerable to not choosing the right data, having some kind of sampling problem with it.
And you need to actively work against this. And you actively need to compensate for that and be
aware of what you're putting into it. There is no magic bullet to it. People have to be vigilant.
Yeah, it sounds like it would be the kind of thing that would be at least today impossible to automatically detect bias in the system.
Right. Because, like, if you imagine an automatic detector, how would you make that? You would have to use data to make that detector. So at best, maybe you could compare detectors against other detectors or something.
Right. I mean, I think the valuable thing about machine learning models is that they are kind of observable.
You can put new inputs into and see how they do.
That doesn't mean you understand the internals.
There's a whole branch of AI about AI understandability, which is trying to take these models that are kind of trained through automatic means and try to make them interpretable. And some classes of training only want to create interpretable models.
You know, so you kind of think, interpretable by a human?
Yeah, that you're actually understanding how they work. And this... I mean, these areas are still under active research, and I'm not an expert, so I can't really say anything too intelligent about them.
But I think these are the kind of tools that we're going to need to improve the situation.
With any new technology, there's always this period of understanding how it's used, understanding other second-order implications that are really important.
That's a fascinating comment, if I understood you right.
You're saying that the average model that's created, a human cannot understand.
I mean, it depends on what types of models that you use.
That is one of the criticisms against deep learning often, which is that it's harder to
understand the model. People have found techniques to kind of analyze them. They're certainly being
validated against their validation set. So they're producing the right answer for all the set that you look into. But it's fundamentally an unsolved problem right now.
Interesting. Okay, well, and is there anything else you want to go over before we let you go?
I think we covered a lot of things. I don't want to, you know, overwhelm everybody. But
I mean, I think it's a really easy area to get into. There's lots of tutorials on doing machine learning. There's lots of YouTube
channels, lots of content on how to get started with this. And it's really fun. Pick some problem
that you want to do and do a weekend project with it. And I think you'll find that it's really
exciting what you can achieve with very little code. And the frameworks have made it a lot easier,
especially TensorFlow. One of the major things that we've been focusing on
the last couple of years
is making it easier to use TensorFlow.
The tf.keras library,
which is kind of a high-level API for creating models,
has made things way more understandable.
And there's still the lower level library
if you need to dive into a lot of details.
So I encourage everyone to give it a try.
Okay. Thanks very much.
Thank you.
Thanks.
Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the
podcast. Please let us know if we're discussing the stuff you're interested in, or if you have
a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website at cppcast.com.
Theme music for this episode is provided by podcastthemes.com.