Orchestrate All the Things - OctoML announces the latest release of its platform, exemplifying growth in MLOps. Featuring CEO & Co-founder Luis Ceze
Episode Date: December 16, 2021
OctoML is announcing the latest release of its platform to automate deployment of production-ready models across the broadest array of clouds, hardware devices and machine learning acceleration engines. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast. I'm George Amadiotis and we'll be connecting the dots together.
OctoML is announcing the latest release of its platform to automate deployment of production-ready models across the broadest array of clouds, hardware devices and machine learning acceleration engines.
I hope you will enjoy the podcast. If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
Picking up from where we left off,
sort of, so as you said,
the last time we spoke was March,
and I think it was your Series A back then.
I think it was Series B, that's right.
Okay, or Series B, yeah.
I'm sorry, I may have missed a few, because, if you believe it, there are just so many fundraising rounds every single week lately.
I know, yeah, it's just easy to lose track.
I know, I hear you.
Yeah, the market's very active, which is a good thing.
You know, people are building stuff and capital is being deployed, right?
Yeah, yeah, exactly.
So I figured, okay, before we dive into the news and the community aspect as well,
let's do a little bit of catch up on where you were, let's say, last time we spoke
and how you have developed the company since then.
So unless I have missed something, you've done another funding round recently.
That's right, yeah. We raised our Series C a couple of months ago, to continue fueling the growth, motivated by everything that happened since we closed Series B: technical accomplishments, team growth, and all the partnerships that we built, culminating in even more fuel to continue growing faster.
Can I give a quick summary of some of the key things that happened since Series B? Would that be helpful?
Yeah, absolutely. Yes, please do.
Yeah, so in terms of technology, we made a ton more progress on Apache TVM: increasing its reach, the breadth of support for other hardware targets, as well as support for new frameworks and new types of models, model breadth and so on.
And notably, we announced partnerships with AMD, Qualcomm and ARM. We've demonstrated and publicized some of the cool work we've done with Microsoft on their Watch For video content monitoring system. They've been our partner customer there.
We grew the team; we more than doubled. We're close to 100 people right now.
We significantly improved and revamped our platform,
which you're gonna hear more about later today.
I guess that's another piece of the news.
We started building a really awesome marketing team. We hired David Messina from Docker, who has experience building a marketing program at a company built around open source. So he joined to build out our marketing team.
And then a long series of technical accomplishments that we can go through during the conversation. But yeah, where we are today: as I mentioned, we're about 100 people, and we've raised a little over $130 million worth of capital.
And we have a very ambitious plan for growth over the next year
and ramping up our software as a service product.
Okay.
Yeah, that's quite a lot, actually.
And I just went and checked the write-up from the previous time we talked. In there, among your plans, there wasn't any mention of another funding round, but I guess I probably know the answer to that before I even ask.
And you kind of hinted to it already.
So there's just so much capital flying around these days.
And I guess I've heard this story from many founders
that they were not actively looking to raise more money; it's more like VCs coming to them and telling them, well, you should be raising more, and offering them more, actually.
Yeah.
Yeah, no, sorry.
Go ahead.
Yeah, no, I can relate to that.
I feel ours is a mix of really building momentum.
Of course, we get calls all the time from venture capitalists trying to deploy capital, and I was saying no and pushing that out because we weren't actively fundraising. Then we had a board meeting that was pivotal in my thinking, where we realized: wow, we have all these opportunities to continue to grow and establish our presence, in enabling new hardware and new cloud platforms to work with TVM and our SaaS product, as well as end users coming to us, like, hey, we can grow even faster. And to grow faster, it's always more comfortable to grow faster with capital. So we have room for adjustment as the go-to-market effort catches up, right?
So yeah, it was this mix of significant interest, the progress we had made being very visible externally, together with our business momentum and all the opportunities we saw in front of us. It was like, hey, all the stars aligned, and we took the opportunity to raise money earlier than we were planning. And I think it was exactly the right decision, looking back a couple of months now.
And the other thing you also mentioned is headcount. So you said that you are over 100 people at the moment.
Yeah, yeah, yeah.
We're just crossing 100 now.
So you actually exceeded the goal you had set back then
because you said, well, that you were looking to grow the team
up to something like 70 people.
So you've exceeded that.
That's right, yeah, we exceeded that.
Yeah, I couldn't be more grateful to our team here; everyone in the company is mobilizing, recruiting great people. We really value team culture as well. It's something that we all take to heart, so I'm glad everyone's coming together into a team that we are very proud to be part of and excited to continue growing.
But yeah, we exceeded our expectations in terms of team growth there for sure, especially with the kind of people that we are hiring, right? On the technical side, we're looking for people with experience in machine learning, in compilers, in systems, in computer architecture, and finding that mix tends to be challenging. But luckily, we've been able to grow.
So yeah, it seems like you're doing quite well on that front
as well. Something else I wanted to ask you has to do with the product, and you already mentioned a few things on how it has grown technically. One thing I wanted to clarify, again because I was checking, is: has the name changed? Because last time I think you referred to it as the Octomizer, and now I couldn't find that name anywhere. So I guess now it's just OctoML, the product as well.
Yeah, thanks for bringing that up. So we really needed to simplify the messaging. It's the OctoML platform, right? So the platform now has the same name as the company.
It might be useful for me to articulate how we think about OctoML today and what the product is, and also get into a little bit of what the news is. So the way we think about OctoML today is essentially offering
performance automation and choice for folks to deploy machine learning models. Okay, so we are building this deployment platform
that enables folks to upload models
and get them ready to be deployed,
making the most of the hardware targets that they want to deploy on, including choosing on what hardware they should deploy their model, right?
So automating the entire process of benchmarking
across a range of choices.
So if you're going to deploy it in the cloud, you want to find the one that has the right throughput per dollar or the right performance characteristics.
If you're going to deploy it on the edge, you want to make sure it runs on the devices that you have access to.
Or you want to be able to choose what models you run on the hardware you've already deployed. Or you want to be able to choose the hardware to run your model at scale, right?
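(To make the throughput-per-dollar idea concrete, here is a tiny self-contained sketch of the kind of comparison being described. The target names and numbers are placeholders for illustration, not measured results.)

```python
# Toy illustration of a throughput-per-dollar comparison across deployment targets.
# Target names and numbers are placeholders, not measured results.
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    target: str
    latency_ms: float       # measured per-inference latency
    price_per_hour: float   # on-demand instance price in USD

    @property
    def throughput_per_dollar(self) -> float:
        inferences_per_hour = 3_600_000 / self.latency_ms  # single-stream estimate
        return inferences_per_hour / self.price_per_hour

results = [
    BenchmarkResult("cloud-a/cpu-large", latency_ms=12.0, price_per_hour=0.68),
    BenchmarkResult("cloud-b/cpu-large", latency_ms=9.5, price_per_hour=0.77),
    BenchmarkResult("cloud-c/gpu-small", latency_ms=3.1, price_per_hour=1.20),
]

best = max(results, key=lambda r: r.throughput_per_dollar)
print(f"Best throughput per dollar: {best.target} "
      f"({best.throughput_per_dollar:,.0f} inferences/USD)")
```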
And one thing that underpins a lot of the way we think is really accessibility and sustainability in AI. As you know, AI and ML models are very compute hungry and very memory hungry. They use a lot of resources. So at scale, they can be huge resource hogs, and we think that threatens progress in AI. It's interesting that you can think about sustainability not just in terms of energy and natural resource use directly, but also sustainability in terms of human effort, and we are confident we can attack both. And, as you know, there is this growing set of hardware targets
that folks can deploy models to,
and they all have their best use case. Being able to make that accessible to a broad set of users enables better sustainability, because you get better energy efficiency and better resource utilization. But also, making the entire process possible without having to recruit, find and use specialized and expensive engineering talent is something that brings sustainability from the human aspect.
Anyway, if you don't do this, if you don't make it easier for folks to make the most out of the deployment hardware, you're going to have skyrocketing costs. And if you rely on hand engineering, you're going to see delays in deploying models, and in the end, in actually offering innovative products to end customers. So that's where real automation comes into play. At the very core, we are about automating the engineering required to get models ready for deployment while making the most out of the hardware targets.
Okay, so you mentioned the ever-growing, let's say, options that people have in terms of deploying on various hardware infrastructures. And I think part of the news has to do with exactly that.
And one part has to do with the fact that you now support Azure Cloud as well.
And the other part has to do with new edge options, right?
Yeah, that's right.
And to be clear, what are we announcing now, what's the news here? It's really the latest version of our machine learning deployment platform. We're announcing it at TVMCon, which is happening now: tutorials start today, and the conference happens Thursday and Friday.
And we have deployment targets now.
We support the three big clouds, AWS, Azure, and Google Complete Platform
with both AMD and Intel chips.
So AMD and Intel CPUs, AMD GPUs
as targets in each of the clouds.
And they have different mixes and matches of these, right?
So being able to actually get your model
and see how it runs across all three clouds
in the specific instances gives people's choice in how they want to deploy it.
And then also on the edge side, we onboarded, and now have available in the platform, NVIDIA Jetson edge GPUs and ARM Cortex-A CPUs, ready to use in the platform.
And we're also improving performance. We've improved a lot of the core of TVM in how we produce high-performance code for a specific hardware target, using something we call auto-scheduling and auto-tuning. Those engines have improved significantly. And we've shown, you might have seen some of our recent blogs with technical data, that even on established models and architectures, it's not uncommon to offer 2x better performance with very little to no engineering effort.
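(For readers curious what the auto-scheduling and auto-tuning Luis mentions look like at the open source layer, here is a minimal sketch using Apache TVM's auto_scheduler API. It illustrates the open source path, not OctoML's platform code; the model file, input shape and tuning budget are placeholder assumptions, and exact APIs vary between TVM releases.)

```python
# Minimal sketch of TVM's auto-scheduling flow for a CPU target (illustrative only;
# exact APIs vary between TVM releases). Model file and input shape are placeholders.
import onnx
import tvm
from tvm import relay, auto_scheduler

# Import the model into Relay, TVM's high-level IR.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

target = tvm.target.Target("llvm -mcpu=skylake-avx512")  # the deployment CPU

# Extract tunable subgraphs and let the ML-guided search explore schedules for each.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=2000,  # total search budget across tasks
    runner=auto_scheduler.LocalRunner(repeat=3),
    measure_callbacks=[auto_scheduler.RecordToFile("tuning_records.json")],
))

# Compile the model using the best schedules found during the search.
with auto_scheduler.ApplyHistoryBest("tuning_records.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
lib.export_library("model_optimized.so")
```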
Is this answering what you were asking? I can go through some of the other things as well. We talked about the targets; I was about to go into the models and why the performance is better, but I can get to that later.
Yeah, yeah.
I also wanted to get to that as well.
But before we do,
I have an additional question on, again,
hardware deployment targets.
So I saw one of your latest blog posts, which was about M1 processors. And I also wanted to ask what's in your roadmap, basically. I know from the last time we spoke that you take the ecosystem view, basically, and you work in collaboration, let's say, with hardware vendors to be able to onboard them and make them part of the TVM platform. So I guess the fact that you now support M1 means that you must have worked with Apple, and there was interest on their side to join the ecosystem. And I'm wondering if you also have interest from up-and-coming, let's say, vendors with more exotic platforms, such as, I don't know, SambaNova and Cerebras and Graphcore and that lot?
Yeah, that's an excellent question. So let me comment on Apple's hardware first, and then we'll talk about the new up-and-coming ones. So with Apple, of course, they watch our results and we
know folks there. We did not work directly with Apple on those. In fact, it's Apple's style to control hardware and software and have an integrated offering, and we respect that.
But, you know, we actually worked on M1. We did M1 last year, and we've done M1 Max and Pro this year. The key thing there was to show that we can onboard new hardware onto TVM and use its machine learning-based search engine for optimizations to automate the process of onboarding new hardware. What we wanted to demonstrate was that in a couple of weeks, one engineer is able to onboard a new hardware target on TVM, get it up and running, run a model like BERT, a fairly complex model, and get really good performance out of it compared to a very strong baseline, in that case Apple's optimized TensorFlow implementation, right?
And we've demonstrated really good performance; there was a blog post last week, which is probably what you're referring to.
We have not onboarded that into the actual OctoML platform, the SaaS platform, yet, because these chips are just now starting to become available in the cloud, right? And honestly, even though TVM supports it fundamentally, adding that to our commercial SaaS platform requires us to spend on infrastructure. We want to see where customers are going; we can do it very quickly, but we take on the expense as customers demand it, right?
But the key thing is proving that the core technology works there. And again, in that case, we did not have to work with Apple, precisely because the machine learning-based engine discovers what the hardware can do, and how to do it well, automatically, right? So it works well.
Okay.
Does that answer your question?
Yep, yep.
Great. And then now let's talk about the new ones. It's just amazing to see all of the innovation going on, and I'm talking about new companies, not just research, right? New companies coming on board with new, exciting AI and ML chips. Several of them, I would say most of them, ping us to work with them, and they're interested in using TVM. And some of them do use TVM, some more publicly than others. They use it themselves, and we support them via the community or sometimes even directly.
In terms of working directly with us, being a supported platform, it's something that we are open to, except that we have to be very strategic, right? We are a fast-growing company, but every single hardware target takes resources to add, maintain and build infrastructure for, even though we can onboard them quickly. So the approach we are taking is that as customers start requesting a target, we can quickly add it to our platform. We're being very selective early on in who our partners are, because we want to focus on the broad installed base where customers want to deploy models today. But given that we can move very quickly to add new ones, we're going to move as the market goes there.
Yeah, yeah. It's fundamental technology, and it's great to see them watching and actually using it.
One up-and-coming company that I think is particularly interesting is SiMa.ai. They're building embedded computer vision engines, and they're very public about using TVM. We've worked with them closely. They participated in TVMCon last year, and they're participating in TVMCon this year. So that's one example of a company that is essentially using TVM and is very open about it; others are using TVM but are less open about it.
Okay, okay. Thank you. So yeah, I think
with that we can move on to the software side of things, and you briefly referred to it earlier. So part of the news that you're announcing is expanded model format support, adding ONNX Runtime and TensorFlow and TensorFlow Lite, right? And PyTorch coming soon.
That's right, yeah. So essentially, adding these native model importers makes it even easier for folks to upload their models into the platform
and get them ready for deployment. It's something that we think is going to make it even easier for folks to use the platform, without first having to convert to, say, ONNX themselves before they upload the model. And we now support acceleration engines natively, for comparison and for packaging. So ONNX Runtime, TensorFlow and TensorFlow Lite are not just model importers; we also support their native engines on the platform, so users can compare and contrast and choose which one they want to use.
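(As an aside, here is a short sketch of the open source TVM Relay frontends that this kind of native model import builds on, covering ONNX, TensorFlow Lite and PyTorch. The file names, shapes and model objects are placeholder assumptions, exact signatures vary by TVM release, and the hosted platform wraps the equivalent step behind a model upload.)

```python
# Illustrative sketch of Apache TVM's Relay model importers (open source layer).
# File names, input shapes and model objects are placeholders; APIs vary by release.
import onnx
import tflite
import torch
from tvm import relay

# ONNX: load a serialized model and convert it to a Relay module.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# TensorFlow Lite: parse the flatbuffer, then convert.
with open("model.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "float32"},
)

# PyTorch: trace to TorchScript first, then convert.
torch_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
scripted = torch.jit.trace(torch_model, torch.randn(1, 3, 224, 224))
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])
```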
And now we announced the ability to do very comprehensive benchmarking; we have this notion of a workflow. You upload the model and then you can choose what hardware targets you want in one single workflow, say against all clouds or specific edge targets. We do the optimization, auto-tuning, packaging and benchmarking, and provide this comprehensive data to help you make procurement decisions on how you're going to deploy in the cloud or which edge devices you're going to buy.
And also, as you're going to hear more about, we are in the process of announcing a collection of popular models that have been pre-optimized for cloud deployments, and people can go and see how well they run on different hardware targets and so on. We call this our version of a model zoo, right?
Okay, interesting. I don't recall that; maybe it was in the draft announcement, but if it was, I missed it. So what types of models are those?
Yeah, you're going to see more at TVMCon later, but in vision, for example, computer vision models. They're pre-optimized over a collection of hardware targets that we already support, for convenience and for folks to see how well our technology
performs. And those are going to be there ready to be used and compared and contrasted by the
end user.
Quick question on something I missed previously: edge deployments. I wanted to ask you about that, actually, when we talked about the hardware targets. But since you said that you're basically trying to see where the market is going and follow along, do you see increased demand for edge deployment?
Yeah, absolutely.
So more and more, we see customers wanting to deploy on ARM-based SoCs and NVIDIA Jetson-like edge GPUs, that's for sure. For example, there's one consumer electronics company that is looking at using NVIDIA Jetson-like GPUs for an end-user computer vision appliance, let's put it this way. And then there are automotive technology companies that are using computer vision in their cars; of course, those are beefier edge devices, and they come with a platform, right? So we definitely see growth and interest in edge devices.
There are plenty of opportunities in cloud deployments, and I would say there's definitely huge business to be made there.
But there's a lot of excitement in the edge.
And that's something that we pay a lot of attention to.
And we're ready to go there.
That's why we even already support some of the popular hardware targets there.
Okay.
I know one of the other, let's say, big goals down the road that you had last time we spoke was eventually being able to support training as well.
And I wonder how much progress you have made towards that goal?
Yeah, no, it's a great question.
So we've made very significant progress on the core technology aspect.
It's not on the platform yet.
It is in the roadmap for next year,
but it's not on the product right now.
And we made a ton of progress
on supporting training on TVM
and also integrating the rest of the ecosystem
required to enable training.
As you know, this is a significant lift.
But luckily, when you look at the actual core computational components in training,
that's what matters to us.
And we can support those really well by making the necessary extensions to support more dynamic behavior, say dynamic shapes, more control flow in the code and so on. And this is something that TVM is moving very fast towards.
So I guess the answer is that it's on the roadmap, and maybe it will come along sometime in the coming year?
That's right, yes. But the core technology is making a lot of progress, and this is very apparent in the TVM open source GitHub as well. And you're going to hear a lot about training support on TVM during TVMCon as well.
Okay, so that's a great segue for me to ask then.
I wanted to ask about that anyway.
The current parity status, let's say,
between OctoML, the product,
and Apache TVM, the project,
and also a little bit on the conference itself, since you started mentioning
it.
Yeah, I can do that.
So TVM is one of the engines, the core engine, that we support in our platform. It's not the only engine in the roadmap, because our goal is to offer the customer the ability to get the best performance possible on the hardware that they want. So one thing that's in our immediate roadmap is to mix and match support between TVM and existing stacks, say native support for NVIDIA, native support for ARM and Intel, and mix and match those to always offer guaranteed state-of-the-art performance to the customer, even including existing parallel stacks, right?
So that's one of the things that helps, you know, put in context how the platform relates to TVM, right?
So our commercial product, the OctoML SaaS platform, uses TVM and other parallel stacks. And on top of that, even for the TVM use case,
we need to feed TVM data, right, to do its job well. Because as I mentioned a few times before,
TVM uses machine learning for machine learning optimization. And the more it is used, the better it gets, because it collects data about past optimizations to use for future ones. And this data is something that we keep for our SaaS users. If you're going to use open source TVM standalone, you have to go and build that training set yourself.
And then one thing I want to touch on briefly,
sorry if I'm jumping back and forth here, George,
but I can't resist sharing some cool customer examples that you'll hear more about at TVMCon. There's this project called Microsoft Watch For; they essentially built a system that does computer vision over videos to find bad content, inappropriate content, bugs in videos and so on. They do huge amounts of computation to do that, which means that high efficiency and low cost matter to them. And our technology offers them anywhere from 1.2 to 3x throughput gain and cost reduction, using a technology that we call AutoScheduler together with AutoTVM.
And anyway, so that was one example
that I found exciting to bring up.
And the other one is Woven Planet.
So they're associated with Toyota; they're a software company principally for automated driving, right?
So self-driving, but also they're looking at smart cities,
robotics mapping and so on.
They rely heavily on ML and Edge ML to your point,
and they embedded TVM into their MLOps pipelines.
And also they call into our platform as part of their deployment flow.
And you're going to hear more about this at TVMCon.
So these are two examples, one on the edge and one in the cloud, of how people are using our technology.
Interesting.
And what about the community aspect?
And, you know,
with TVM being an Apache open source project
and this being their main event, I guess,
do you have any, I don't know,
any numbers you can share in terms of participation?
Yeah, the community grew a lot.
Now we are over 640 lifetime contributors.
And they come from close to 70 different institutions, principally industry institutions, but also
academic institutions, because a lot of folks do research with it.
And one of the things that we are making a push for at this conference as a community is what we call TVM Unity, which is this view of making TVM easier to use for all the key participants, which are ML engineers, system software engineers, and hardware vendors: making it easier to onboard new hardware, to onboard new optimizations, and so on. And also making it easier to mix and match pieces that TVM offers with other parts of the ecosystem. We are striving for TVM to be seen not as a competitor to some of the other stacks that overlap a bit, but as a place to glue everything together. We have strong integration with MLIR-based solutions for high-level optimizations and code generation. We have planned support for high-level frameworks.
PyTorch is the principal one that we are working hard on right now, together with other participants in the community, but also NumPy and so on. And then integration with the rest of the hardware vendors' system software stacks, like NVIDIA's TensorRT and cuDNN, and then Intel's MKL-DNN and oneDNN. So in terms of numbers, I don't know if I should spoil the surprise here, but last year we had a thousand participants, and this year we're well over that, let's put it this way.
Yeah, fair enough.
Did I answer your question, George? I feel like there was a lot here. Yes, yes, indeed.
Yeah, and I think typically I like to wrap up conversations with future plans,
but I think we've already covered that pretty much.
Yeah, great.
So, I don't know.
I'll leave the wrapping up part to you for a change.
Anything that you feel we left out and we should mention?
No, I think that, you know,
just to reiterate that we're all about
getting AI to be more sustainable
and more accessible.
And we want people to come and try it out, come and participate in the conference.
There's a lot of really good content.
I'm sure everyone in this space
would really enjoy it.
So, a call to action for folks to participate in the conference, and if you can't participate in the conference, come later and look at the content.
Well, tomorrow, when the embargo expires and this is published, people will get to hear and read about it, so we'll see if that works, the call to action.
Thank you, George.
Great. Thank you, it was nice catching up with you, and well, success with all of your plans.
Thank you very much. You too. Stay safe and healthy.
Nice seeing you again.
I hope you enjoyed the podcast.
If you like my work, you can follow Link Data Orchestration
on Twitter, LinkedIn, and Facebook.