Orchestrate All the Things - OctoML announces the latest release of its platform, exemplifying growth in MLOps. Featuring CEO & Co-founder Luis Ceze
Episode Date: December 16, 2021
OctoML is announcing the latest release of its platform to automate deployment of production-ready models across the broadest array of clouds, hardware devices and machine learning acceleration engines. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast. I'm George Amadiotis and we'll be connecting the dots together.
OctoML is announcing the latest release of its platform to automate deployment of production-ready models across the broadest array of clouds, hardware devices and machine learning acceleration engines.
I hope you will enjoy the podcast. If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
Picking up from where we left off,
sort of, so as you said,
the last time we spoke was March,
and I think it was your Series A back then.
I think it was Series B, that's right.
Okay, or Series B, yeah.
I'm sorry, I may have missed a few, because, if you believe it, there are just so many fundraising rounds every single week lately.
I know, yeah, it's just easy to lose track.
I know, I hear you.
Yeah, the market's very active, which is a good thing.
You know, people are building stuff and capital is being deployed, right?
Yeah, yeah, exactly.
So I figured, okay, before we dive into the news and the community aspect as well,
let's do a little bit of catch up on where you were, let's say, last time we spoke
and how you have developed the company since then.
So unless I have missed something, you've done another funding round recently.
That's right, yeah. We raised our Series C a couple of months ago, to continue fueling the growth, motivated by everything that happened since we closed Series B: technical accomplishments, team growth, and all the partnerships that we built, culminating in even more fuel to continue growing faster.
Can I give a quick summary of some of the key things that happened since Series B? Would that be helpful?
Yeah, absolutely. Yes, please do.
Yeah, so in terms of technology, we made a ton more progress on Apache TVM: increasing its reach, the breadth of support for other hardware targets, as well as support for new frameworks and new types of models, model breadth and so on.
And notably, we announced partnerships with AMD, Qualcomm and ARM. We've demonstrated and publicized some of the cool work we've done with Microsoft on their Watch For video content monitoring system. They've been our partner customer there.
We grew the team; we more than doubled. We're close to 100 people right now.
We significantly improved and revamped our platform,
which you're gonna hear more about later today.
I guess that's another piece of the news.
We started building a really awesome marketing team. We hired David Messina from Docker, who has experience building a marketing program at a company built around open source. So he joined to build out our marketing team.
And then a long series of technical accomplishments that we can go through during the conversation. But yeah, where we are today: as I mentioned, we're about 100 people, and we've raised a little over $130 million worth of capital.
And we have a very ambitious plan for growth over the next year
and ramping up our software as a service product.
Okay.
Yeah, that's quite a lot, actually.
And I just went and checked the write-up from the previous time we talked. In there, among your plans, there wasn't any mention of another funding round, but I guess I probably know the answer to that before I even ask.
And you kind of hinted to it already.
So there's just so much capital flying around these days.
And I guess I've heard this story from many founders
that they were not actively looking to raise more money; it's more like VCs coming to them and telling them, well, you should be raising more, and offering them more, actually.
Yeah.
Yeah, no, sorry.
Go ahead.
Yeah, no, I can relate to that.
I feel ours is a mix of really building momentum.
Of course, we get calls all the time from venture capitalists trying to deploy capital, and I was saying no and pushing that out because we weren't actively fundraising. Then we had a board meeting that was pivotal in my thinking, where we realized: wow, we have all these opportunities to continue to grow and establish our presence, in enabling new hardware and new cloud platforms to work with TVM and our SaaS product, as well as end users coming to us, like, hey, we can grow even faster. And to grow faster, it's always more comfortable to grow faster with capital. So we have room for adjustment as the go-to-market effort catches up, right?
So yeah, it was this mix of significant interest, the progress we had made being very visible externally, together with our business momentum and all the opportunities we saw in front of us. It was like, hey, all the stars aligned, and we took the opportunity to raise money earlier than we were planning. And I think it was exactly the right decision, looking back a couple of months now.
And the other thing you also mentioned is headcount. So you said that you are over 100 people at the moment.
Yeah, yeah, yeah.
We're just crossing 100 now.
So you actually exceeded the goal you had set back then
because you said, well, that you were looking to grow the team
up to something like 70 people.
So you've exceeded that.
That's right, yeah, we exceeded that.
Yeah, I couldn't be more grateful to our team here; everyone in the company is mobilizing, recruiting great people. We really value team culture as well. It's something that we all take to heart, so I'm glad everyone's coming together into a team that we are very proud to be part of and excited to continue growing.
But yeah, we exceeded our expectations in terms of team growth there for sure, especially with the kind of people that we are hiring, right? On the technical side, we're looking for people with experience in machine learning, in compilers, in systems, in computer architecture, and finding that mix tends to be challenging. But luckily, we've been able to grow.
So yeah, it seems like you're doing quite well on that front
as well. Something else I wanted to ask you has to do with the product, and you already mentioned a few things on how it has grown technically. One thing I wanted to clarify, again because I was checking, is: has the name changed? Because last time I think you referred to it as the Octomizer, and now I couldn't find that name anywhere. So I guess now it's just OctoML, the product as well.
Yeah, thanks for bringing that up. So we really needed to simplify the messaging. It's the OctoML platform, right? So the platform now has the same name as the company.
It might be useful for me to articulate how we think about OctoML today and what the product is, and also get into a little bit of what the news is. So the way we think about OctoML today is essentially offering
performance automation and choice for folks to deploy machine learning models. Okay, so we are building this deployment platform
that enables folks to upload models
and get them ready to be deployed,
making the most of the hardware targets that they want to deploy on, including choosing on what hardware they should deploy their model, right?
So automating the entire process of benchmarking
across a range of choices.
So if you're going to deploy it in the cloud, you want to find the one that has the right throughput per dollar or the right performance characteristics.
If you're going to deploy it on the edge, you want to make sure it runs on the devices that you have access to.
Or you want to be able to choose what models you run on the hardware you've already deployed. Or you want to be able to choose the hardware to run your model at scale, right?
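(To make the throughput-per-dollar idea concrete, here is a tiny self-contained sketch of the kind of comparison being described. The target names and numbers are placeholders for illustration, not measured results.)

```python
# Toy illustration of a throughput-per-dollar comparison across deployment targets.
# Target names and numbers are placeholders, not measured results.
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    target: str
    latency_ms: float       # measured per-inference latency
    price_per_hour: float   # on-demand instance price in USD

    @property
    def throughput_per_dollar(self) -> float:
        inferences_per_hour = 3_600_000 / self.latency_ms  # single-stream estimate
        return inferences_per_hour / self.price_per_hour

results = [
    BenchmarkResult("cloud-a/cpu-large", latency_ms=12.0, price_per_hour=0.68),
    BenchmarkResult("cloud-b/cpu-large", latency_ms=9.5, price_per_hour=0.77),
    BenchmarkResult("cloud-c/gpu-small", latency_ms=3.1, price_per_hour=1.20),
]

best = max(results, key=lambda r: r.throughput_per_dollar)
print(f"Best throughput per dollar: {best.target} "
      f"({best.throughput_per_dollar:,.0f} inferences/USD)")
```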
And one thing that underpins a lot of the way we think is really accessibility and sustainability in AI. As you know, AI and ML models are very compute hungry and very memory hungry. They use a lot of resources. So at scale, they can be huge resource hogs, and we think that threatens progress in AI. It's interesting that you can think about sustainability not just in terms of energy and natural resource use directly, but also sustainability in terms of human effort, and we are confident we can attack both. And, as you know, there is this growing set of hardware targets
that folks can deploy models to,
and they all have their best use case. Being able to make that accessible to a broad set of users enables better sustainability, because you get better energy efficiency and better resource utilization. But also, making the entire process possible without having to recruit, find and use specialized and expensive engineering talent is something that brings sustainability from the human aspect.
Anyway, if you don't do this, if you don't make it easier for folks to make the most out of the deployment hardware, you're going to have skyrocketing costs. And if you rely on hand engineering, you're going to see delays in deploying models, and in the end, in actually offering innovative products to end customers. So that's where real automation comes into play. At the very core, we are about automating the engineering required to get models ready for deployment while making the most out of the hardware targets.
Okay, so you mentioned the ever-growing, let's say, options that people have in terms of deploying on various hardware infrastructures. And I think part of the news has to do with exactly that.
And one part has to do with the fact that you now support Azure Cloud as well.
And the other part has to do with new edge options, right?
Yeah, that's right.
And to be clear, what are we announcing now, what's the news here? It's really the latest version of our machine learning deployment platform. We're announcing it at TVMCon, which is happening now: tutorials start today, and the conference happens Thursday and Friday.
And we have deployment targets now.
We support the three big clouds, AWS, Azure, and Google Complete Platform
with both AMD and Intel chips.
So AMD and Intel CPUs, AMD GPUs
as targets in each of the clouds.
And they have different mixes and matches of these, right?
So being able to actually get your model
and see how it runs across all three clouds
in the specific instances gives people's choice in how they want to deploy it.
And then also on the edge side, we onboarded, and now have available in the platform, NVIDIA Jetson edge GPUs and ARM Cortex-A CPUs, ready to use in the platform.
And we're also improving performance. We've improved a lot of the core of TVM in how we produce high-performance code for a specific hardware target, using something we call auto-scheduling and auto-tuning. Those engines have improved significantly. And we've shown, you might have seen some of our recent blogs with technical data, that even on established models and architectures, it's not uncommon to offer 2x better performance with very little to no engineering effort.
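(For readers curious what the auto-scheduling and auto-tuning Luis mentions look like at the open source layer, here is a minimal sketch using Apache TVM's auto_scheduler API. It illustrates the open source path, not OctoML's platform code; the model file, input shape and tuning budget are placeholder assumptions, and exact APIs vary between TVM releases.)

```python
# Minimal sketch of TVM's auto-scheduling flow for a CPU target (illustrative only;
# exact APIs vary between TVM releases). Model file and input shape are placeholders.
import onnx
import tvm
from tvm import relay, auto_scheduler

# Import the model into Relay, TVM's high-level IR.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

target = tvm.target.Target("llvm -mcpu=skylake-avx512")  # the deployment CPU

# Extract tunable subgraphs and let the ML-guided search explore schedules for each.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=2000,  # total search budget across tasks
    runner=auto_scheduler.LocalRunner(repeat=3),
    measure_callbacks=[auto_scheduler.RecordToFile("tuning_records.json")],
))

# Compile the model using the best schedules found during the search.
with auto_scheduler.ApplyHistoryBest("tuning_records.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
lib.export_library("model_optimized.so")
```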
Is this answering what you were asking? I can go through some of the other things as well. We talked about the targets; I was about to go into the models and why the performance is better, but I can get to that later.
Yeah, yeah.
I also wanted to get to that as well.
But before we do,
I have an additional question on, again,
hardware deployment targets.
So I saw one of your latest blog posts, which was about M1 processors. And I also wanted to ask what's in your roadmap, basically. I know from the last time we spoke that you take the ecosystem view, basically, and you work in collaboration, let's say, with hardware vendors to be able to onboard them and make them part of the TVM platform. So I guess the fact that you now support M1 means that you must have worked with Apple, and there was interest on their side to join the ecosystem. And I'm wondering if you also have interest from up-and-coming, let's say, vendors with more exotic platforms, such as, I don't know, SambaNova and Cerebras and Graphcore and that lot?
Yeah, that's an excellent question. So let me comment on Apple's hardware first, and then we'll talk about the new up-and-coming ones. So with Apple, of course, they watch our results and we
know folks there. We did not work directly with Apple on those. In fact, it's Apple's style to control hardware and software and have an integrated offering, and we respect that.
But, you know, we actually worked on M1. We did M1 last year, and we've done M1 Max and Pro this year. The key thing there was to show that we can onboard new hardware onto TVM and use its machine learning-based search engine for optimizations to automate the process of onboarding new hardware. What we wanted to demonstrate was that in a couple of weeks, one engineer is able to onboard a new hardware target on TVM, get it up and running, run a model like BERT, a fairly complex model, and get really good performance out of it compared to a very strong baseline, in that case Apple's optimized TensorFlow implementation, right?
And we've demonstrated really good performance; there was a blog post last week, which is probably what you're referring to.
We have not onboarded that into the actual OctoML platform, the SaaS platform, yet, because these chips are just now starting to become available in the cloud, right? And honestly, even though TVM supports it fundamentally, adding that to our commercial SaaS platform requires us to spend on infrastructure. We want to see where customers are going; we can do it very quickly, but we take on the expense as customers demand it, right?
But the key thing is proving that the core technology works there. And again, in that case, we did not have to work with Apple, precisely because the machine learning-based engine discovers what the hardware can do, and how to do it well, automatically, right? So it works well.
Okay.
Does that answer your question?
Yep, yep.
Great. And then now let's talk about the new ones. It's just amazing to see all of the innovation going on, and I'm talking about new companies, not just research, right? New companies coming on board with new, exciting AI and ML chips. Several of them, I would say most of them, ping us to work with them, and they're interested in using TVM. And some of them do use TVM, some more publicly than others. They use it themselves, and we support them via the community or sometimes even directly.
In terms of working directly with us, being a supported platform, it's something that we are open to, except that we have to be very strategic, right? We are a fast-growing company, but every single hardware target takes resources to add, maintain and build infrastructure for, even though we can onboard them quickly. So the approach we are taking is that as customers start requesting a target, we can quickly add it to our platform. We're being very selective early on in who our partners are, because we want to focus on the broad installed base where customers want to deploy models today. But given that we can move very quickly to add new ones, we're going to move as the market goes there.
Yeah, yeah. It's fundamental technology, and it's great to see them watching and actually using it.
One up-and-coming company that I think is particularly interesting is SiMa.ai. They're building embedded computer vision engines, and they're very public about using TVM. We've worked with them closely. They participated in TVMCon last year, and they're participating in TVMCon this year. So that's one example of a company that is essentially using TVM and is very open about it; others are using TVM but are less open about it.
Okay, okay. Thank you. So yeah, I think
with that we can move on to the software side of things, and you briefly referred to it earlier. So part of the news that you're announcing is expanded model format support, adding ONNX Runtime and TensorFlow and TensorFlow Lite, right? And PyTorch coming soon.
That's right, yeah. So essentially, adding these native model importers makes it even easier for folks to upload their models into the platform
and get them ready for deployment. It's something that we think is going to make it even easier for folks to use the platform, without first having to convert to, say, ONNX themselves before they upload the model. And we now support acceleration engines natively, for comparison and for packaging. So ONNX Runtime, TensorFlow and TensorFlow Lite are not just model importers; we also support their native engines on the platform, so users can compare and contrast and choose which one they want to use.
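(As an aside, here is a short sketch of the open source TVM Relay frontends that this kind of native model import builds on, covering ONNX, TensorFlow Lite and PyTorch. The file names, shapes and model objects are placeholder assumptions, exact signatures vary by TVM release, and the hosted platform wraps the equivalent step behind a model upload.)

```python
# Illustrative sketch of Apache TVM's Relay model importers (open source layer).
# File names, input shapes and model objects are placeholders; APIs vary by release.
import onnx
import tflite
import torch
from tvm import relay

# ONNX: load a serialized model and convert it to a Relay module.
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# TensorFlow Lite: parse the flatbuffer, then convert.
with open("model.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "float32"},
)

# PyTorch: trace to TorchScript first, then convert.
torch_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
scripted = torch.jit.trace(torch_model, torch.randn(1, 3, 224, 224))
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])
```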
And now we announced the ability to do very comprehensive benchmarking; we have this notion of a workflow. You upload the model and then you can choose what hardware targets you want in one single workflow, say against all clouds or specific edge targets. We do the optimization, auto-tuning, packaging and benchmarking, and provide this comprehensive data to help you make procurement decisions on how you're going to deploy in the cloud or which edge devices you're going to buy.
And also, as you're going to hear more about, we are in the process of announcing a collection of popular models that have been pre-optimized for cloud deployments, and people can go and see how well they run on different hardware targets and so on. We call this our version of a model zoo, right?
Okay, interesting. I don't recall that; maybe it was in the draft announcement, but if it was, I missed it. So what types of models are those?
Yeah, you're going to see more at TVMCon later, but in vision, for example, computer vision models. They're pre-optimized over a collection of hardware targets that we already support, for convenience and for folks to see how well our technology
performs. And those are going to be there ready to be used and compared and contrasted by the
end user.
Quick question on something I missed previously: edge deployments. I wanted to ask you about that, actually, when we talked about the hardware targets. But since you said that you're basically trying to see where the market is going and follow along, do you see increased demand for edge deployment?
Yeah, absolutely.
So more and more, we see customers wanting to deploy on ARM-based SoCs and NVIDIA Jetson-like edge GPUs, that's for sure. For example, there's one consumer electronics company that is looking at using NVIDIA Jetson-like GPUs for an end-user computer vision appliance, let's put it this way. And then there are automotive technology companies that are using computer vision in their cars; of course, those are beefier edge devices, and they come with a platform, right? So we definitely see growth and interest in edge devices.
There are plenty of opportunities in cloud deployments, and I would say there's definitely huge business to be made there.
But there's a lot of excitement in the edge.
And that's something that we pay a lot of attention to.
And we're ready to go there.
That's why we even already support some of the popular hardware targets there.
Okay.
I know one of the other, let's say, big goals down the road that you had last time we spoke was eventually being able to support training as well.
And I wonder how much progress you have made towards that goal?
Yeah, no, it's a great question.
So we've made very significant progress on the core technology aspect.
It's not on the platform yet.
It is in the roadmap for next year,
but it's not on the product right now.
And we made a ton of progress
on supporting training on TVM
and also integrating the rest of the ecosystem
required to enable training.
As you know, this is a significant lift.
But luckily, when you look at the actual core computational components in training,
that's what matters to us.
And we can support those really well by making the necessary extensions to support more dynamic behavior, say dynamic shapes, more control flow in the code and so on. And this is something that TVM is moving very fast towards.
So I guess the answer is that it's on the roadmap, and maybe it will come along sometime in the coming year?
That's right, yes. But the core technology is making a lot of progress, and this is very apparent in the TVM open source GitHub as well. And you're going to hear a lot about training support on TVM during TVMCon as well.
Okay, so that's a great segue for me to ask then.
I wanted to ask about that anyway.
The current parity status, let's say,
between OctoML, the product,
and Apache TVM, the project,
and also a little bit on the conference itself, since you started mentioning
it.
Yeah, I can do that.
So TVM is one of the engines, the core engine, that we support in our platform. It's not the only engine in the roadmap, because our goal is to offer the customer the ability to get the best performance possible on the hardware that they want. So one thing that's in our immediate roadmap is to mix and match support between TVM and existing stacks, say native support for NVIDIA, native support for ARM and Intel, and mix and match those to always offer guaranteed state-of-the-art performance to the customer, even including existing parallel stacks, right?
So that's one of the things that helps, you know, put in context how the platform relates to TVM, right?
So our commercial product, the OctoML SaaS platform, uses TVM and other parallel stacks. And on top of that, even for the TVM use case,
we need to feed TVM data, right, to do its job well. Because as I mentioned a few times before,
TVM uses machine learning for machine learning optimization. And the more it is used, the better it gets, because it collects data about past optimizations to use for future ones. And this data is something that we keep for our SaaS users. If you're going to use open source TVM standalone, you have to go and build that training set yourself.
And then one thing I want to touch on briefly,
sorry if I'm jumping back and forth here, George,
but I can't resist sharing some cool customer examples that you'll hear more about at TVMCon. There's this project called Microsoft Watch For; they essentially built a system that does computer vision over videos to find bad content, inappropriate content, bugs in videos and so on. They do huge amounts of computation to do that, which means that high efficiency and low cost matter to them. And our technology offers them anywhere from 1.2 to 3x throughput gain and cost reduction, using a technology that we call AutoScheduler together with AutoTVM.
And anyway, so that was one example
that I found exciting to bring up.
And the other one is Woven Planet.
So they're associated with Toyota; they're a software company principally for automated driving, right?
So self-driving, but also they're looking at smart cities,
robotics mapping and so on.
They rely heavily on ML and Edge ML to your point,
and they embedded TVM into their MLOps pipelines.
And also they call into our platform as part of their deployment flow.
And you're going to hear more about this at TVMCon.
So these are two examples, one on the edge and one in the cloud, of how people are using our technology.
Interesting.
And what about the community aspect?
And, you know,
with TVM being an Apache open source project
and this being their main event, I guess,
do you have any, I don't know,
any numbers you can share in terms of participation?
Yeah, the community grew a lot.
Now we are over 640 lifetime contributors.
And they come from close to 70 different institutions, principally industry institutions, but also
academic institutions, because a lot of folks do research with it.
And one of the things that we are making a push for at this conference as a community is what we call TVM Unity, which is this view of making TVM easier to use for all the key participants, which are ML engineers, system software engineers, and hardware vendors: making it easier to onboard new hardware, to onboard new optimizations, and so on. And also making it easier to mix and match pieces that TVM offers with other parts of the ecosystem. We are striving for TVM to be seen not as a competitor to some of the other stacks that overlap a bit, but as a place to glue everything together. We have strong integration with MLIR-based solutions for high-level optimizations and code generation. We have planned support for high-level frameworks.
PyTorch is the principal one that we are working hard on right now, together with other participants in the community, but also NumPy and so on. And then integration with the rest of the hardware vendors' system software stacks, like NVIDIA's TensorRT and cuDNN, and then Intel's MKL-DNN and oneDNN. So in terms of numbers, I don't know if I should spoil the surprise here, but last year we had a thousand participants, and this year we're well over that, let's put it this way.
Yeah, fair enough.
Did I answer your question, George? I feel like there was a lot here. Yes, yes, indeed.
Yeah, and I think typically I like to wrap up conversations with future plans,
but I think we've already covered that pretty much.
Yeah, great.
So, I don't know.
I'll leave the wrapping up part to you for a change.
Anything that you feel we left out and we should mention?
No, I think that, you know,
just to reiterate that we're all about
getting AI to be more sustainable
and more accessible.
And we want people to come and try it out, come and participate in the conference.
There's a lot of really good content.
I'm sure everyone in this space
would really enjoy it.
So, a call to action for folks to participate in the conference, and if you can't participate in the conference, come later and look at the content.
Well, tomorrow, when the embargo expires and this is published, people will get to hear and read about it, so we'll see if that works, the call to action.
Thank you, George.
Great. Thank you, it was nice catching up with you, and well, success with all of your plans.
Thank you very much. You too. Stay safe and healthy.
Nice seeing you again.
I hope you enjoyed the podcast.
If you like my work, you can follow Link Data Orchestration
on Twitter, LinkedIn, and Facebook.