Orchestrate all the Things - A troubleshooting platform for free: Netdata scores $14.2M funding to extend its open-source application monitoring platform. Featuring CEO / Founder Costas Tsaousis
Episode Date: September 23, 2020. Netdata, the company behind the eponymous open-source, distributed, real-time performance and health monitoring solution for systems and applications, has announced it completed a new round of financing of $14.2 million. We take the opportunity to dive into the Netdata story with CEO and Founder Costas Tsaousis: what they do and why, their open source approach and distributed architecture, and their position in the overall observability ecosystem. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis, and we'll be connecting the dots together.
Today's episode features Netdata CEO and founder Costas Tsaousis.
Netdata, the company behind the eponymous open-source distributed real-time performance and health monitoring solution,
has announced it completed a new round of financing of $14.2 million.
We take the opportunity to dive into the Netdata story with Tsaousis: what they do and why, their open-source approach and distributed architecture, and their position in the overall observability ecosystem.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
And I guess a typical way to start with someone we haven't been in touch with before, and also to kind of introduce yourself and the company to the world at large, would be to say a few words about how you got started, and to highlight a few key milestones, since the occasion today is your recent funding, and congratulations for that. So it would be interesting to know the way that led to this point, basically.
Okay, so my name is Costas Tsaousis. I am the founder
and the CEO of
Netdata.
I started Netdata in 2014,
and actually
it came out of
frustration. I was really pissed off at the monitoring systems that existed at that time.
I couldn't believe that most of the monitoring systems even today provide just an airplane
view of the infrastructure.
So they cherry-pick just a few very important metrics.
They cherry-pick some information, very limited information, and they try to use this information to provide a monitoring solution.
The problem, however, with this is that it is too abstract for actual work.
Many insights are lost, and for the sake of scalability, actually, we have very limited monitoring solutions. I decided to
build Netdata with completely different principles. So the
first thing is that I decided that the monitoring system should have all the metrics, all the
information in full resolution, per second resolution as a standard, and then work around the scalability problem.
So Netdata is a distributed solution.
We are not centralizing all the data to one place.
So, this provides infinite scalability with full resolution and all the insights.
At the same time, it is packaged in a very nice way
that allows people to install it
and seconds later to have a fully usable monitoring solution.
It's an opinionated monitoring solution,
so it comes pre-configured with all the metrics,
how they are correlated,
what alarms should be attached to them.
So seconds after installing it, Netdata is a fully featured monitoring solution, presenting meaningful dashboards, rendering all the information available and surfacing all the insights possible, including alarms.
So it also automatically attaches alarms to the metrics, etc.
I released this in 2016.
People loved it.
It went viral.
So today, Netdata is being downloaded about a million times every single day.
The community has more than 3 million unique users, and it grows by more than 6,000 new
users every single day.
So in 2018, I decided to start a company around Netdata.
I got the first serious seed funding from a Greek VC, Marathon.
And then I had a series of
more investments.
So we went to
seed extension, to Series A,
and now we just finalized
Series A1. The total funding so far is $31 million.
The company
is about 50 people,
all of them working remotely all over the world.
So what you outlined previously, highlighting the key points that brought you to where you are today, to me sounded like a kind of typical open source success story,
if you would call it that.
So you start a product or a project, rather.
It gets some traction to the point where the founder and the team around them see that
there is a commercial potential.
They get some funding and so on and so forth.
I would say typical.
What is not typical, at least doesn't look typical up to this point, is how you actually
monetize slash commercialize it.
Because what I would expect to see would be your core open source offering being obviously
open source, therefore free for people to use and modify as they see fit.
And what typically happens is that some people are inclined to just install it on their own
infrastructure and use it in their own way and take the associated burden with running
it and storage and management and all of these things. And again, typically what happens is that the company behind the product
has a cloud version running the software that they manage themselves
and they commercialize that basically.
And up to some extent, I see that NetData kind of follows this path.
But what surprised me is that there doesn't seem to be any paid service at all
at this point for what you're offering. And I also saw a blog post that you wrote recently
basically outlining precisely that. So, you know, it's kind of contradictory. How can
you raise, you know, this kind of funding and not have, you know, any paid services
at all? You obviously said that there will be some path to commercialization soon, but I'm surprised
that there hasn't been one already.
And I wonder what the path will look like.
So the first thing is that I have to say that I am an open source fan.
I believe that open source is one of the major breakthroughs of our time.
It distributes technology in an amazing way throughout the world.
So I really love it. And I admire all the business models that somehow influence, that change the world in a very strong and deep way. Such companies, for example, are GitHub itself. GitHub is a version control system for open source developers and the like. But their business model was so amazing that it made a big difference for the world. And today, GitHub, for example, does not sell version control. GitHub sells security features on top of the version control. Similarly,
there are more companies like Slack. Slack is a collaboration engine, but you cannot buy
collaboration from them. What you can buy is retention to the free collaboration.
Cloudflare is a CDN, but you cannot buy CDN from them.
What you can buy is enterprise plugins on top of the free CDN.
What these companies achieve is that they provide something so good to the world,
their services, and they provide these services for free, massively for free.
So I admire this business model.
And I believe that Netdata is in a position to execute such a business model.
So all the monitoring features, we plan to offer them for free.
We are a monitoring solution, but no, we're not going to sell monitoring.
We are going to sell enterprise plugins,
advanced retention, advanced security features, etc.,
on top of the free monitoring.
Does this answer your question?
It does, it does.
But the part that was kind of unclear to me was,
of course, what you said all makes sense,
but it's just what surprised me was that I haven't
actually seen any of those
value add for...
It's very early.
So what we have today is the open source agent, and this has been out there for a few years now, and it is viral. It is the best single node monitoring in the world. And this is how people come closer to us. What we are trying to do
is build
an infrastructure view
on top of the open source agent.
So we built a SaaS offering that connects to all your Netdata agents and fetches metadata, not the actual data, and attempts to offer the same service the agent offers for a single node, but for your whole infrastructure.
This is still in progress.
We're not there yet.
This is going to be free. Once this is realized and once we are at this point,
we are going to introduce new features.
We're not going to convert the old features,
the features that have been offered for free up to that point.
We're going to introduce new features that will be paid.
But we're not there yet.
We're still building the free thing.
Okay, okay.
It makes sense.
And, you know, from the venture capital point of view,
one could frame what you just said as, you know,
you're building your user base
and you're trying to be as successful as possible.
And then you can, you know,
your bet is basically on converting as much of this user base
to paid customers as possible.
Yes, exactly.
Okay, so with that, maybe it's a good opportunity
if you would like to highlight, to outline rather,
a few of those upcoming paid features that you talked about.
And one of the things that I saw, it wasn't exactly in the roadmap, but it was like an open community call for people that would like to contribute data towards your under-development machine learning feature. So I wonder, a, if this could possibly power some of those features that you mentioned, and b, what is the status of your development there?
No, we just released a feature, we call it Insights, that allows people to correlate metrics in a very efficient way. So you highlight a spike on a chart, and Netdata automatically finds which other metrics have been influenced, or have influenced the spike. So it correlates them somehow.
This is amazing because it is very efficient.
We tested this against other commercial offerings
and ours is a lot better.
So we believe that people should be...
Also, this is a free service.
So we want people to contribute more in this direction.
This is why we ask for people to come and contribute.
This is still very early, though.
We just released it yesterday, I think.
So let's see how this will proceed.
Engaging the community is very crucial for NetData, you know,
because as I said at the beginning, NetData is an opinionated monitoring.
It comes pre-configured with all the expertise of the community.
So every alarm, every metrics correlation, every title,
every description has been battle-tested by the community. All these people have contributed back fixes to make this, let's say, as accurate as possible.
So the same happens with our ML effort.
We want this to be top notch, to be the best that can be done.
And this is why we're asking for people to be engaged.
Okay.
Actually, the alarms part was something I also wanted to ask about
because, of course, you can correct me if I'm wrong,
but seeing the kind of architecture that you mentioned, so basically just getting metadata and not the data itself, and having a very fine-grained granularity, so more data points per unit of time, basically. And if I got it right, you're actually not retaining that data, which is what allows you to scale, basically, and to keep your operational costs under control. Is that correct?
Yes and no. So, the whole point with Netdata is that we distribute the data, we keep the data as close to the edge as possible.
That's the whole point of scalability.
So we want all this data to be either at the point, the server or node where it is collected, or as close to it as possible. So the Netdata agent allows streaming of metrics, so you can push metrics between Netdata servers.
We support all setups.
So we can support master, slaves, and proxies.
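For context, this streaming between agents is configured in the agent's stream.conf; a minimal sketch might look like the following (the hostname and API key are placeholders):

```ini
# On the child (the node being monitored): push metrics upstream.
[stream]
    enabled = yes
    destination = parent.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# On the parent (the centralization point): accept that child's stream.
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```

The two sections live in the stream.conf of the respective machines; the shared API key is what ties a child to the parent that accepts it.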
Now, the point is that we don't want to centralize the data at the cloud, at our centralized
infrastructure to control our costs.
Because if we centralize it, then all the services that we are offering for free would
not be possible.
That's the main reason. So what we want is for everyone to use the data.
Even the cloud hosting providers,
they can provide paid centralized data servers,
centralized points.
For us, this is totally valid.
We are not going to compete with them.
If they want to do it, our software is there, they are free to do it, and actually we are
going to help them.
So the idea, the primary goal here is to show the world a completely different way to monitor the infrastructure, a way that
provides all the surfaces, all the insights, and does not filter metrics, so has all the
granularity that is required.
This is the primary goal, and I think this is a noble cause, let's say, for Netdata.
Now, you also asked about the license or I lost it.
Yes, indeed. Actually, that was something that I skipped.
It was included in the list of topics I sent, but I didn't
actually ask. But based on what you just said, I kind of figured out the answer. What caught
my eye was the fact that your license is GPL as opposed to AGPL. And being an open source
aficionado, let's say myself, I know a thing or two about open source and the different
licenses and how they influence business models.
And this attracted my attention precisely because of the fact that having a GPL license
basically enables cloud operators to take your software and run it.
And this is not a position most open source vendors would like to find themselves in.
So that's why I wanted to ask.
But you basically just answered by saying that you're fine with that.
Yeah, so GPL is the same license as the Linux kernel itself.
And all the hosting and cloud providers, they use the Linux kernel.
So the answer is very simple.
We want them to use our software.
Our software is free to use by anyone, provided that they comply with the very limited requirements of GPL.
Mainly that if they change the software, they will give back their contributions. That's the whole point of GPL. Other than that, they are free to use the software in any way they see fit, even if they want to install it
and sell it as a service. That's perfectly fine for GPL.
I think that AGPL matters more if you want to embed parts of Netdata, parts of the open source software that is AGPL-licensed, in your own software. Of course, we don't want this to happen, so this is why we have not chosen AGPL. But GPL is okay, and cloud hosting providers can perfectly use it under GPL.
Okay, well, let's not get lost in the technicalities of different software licenses, because the bottom line is what you just said, that you're fine with, you know, anyone basically doing that. So that explains pretty much everything.
Another point I wanted to ask, in continuation to your previous answer actually, has to do with data retention. I mean, sure, again, it makes total sense that at least part of the reason why you do it the way you do it is, you know, to keep your operational costs reasonable and therefore be able to offer your services the way that you do. However, that kind of raised two issues, actually. One has to do with data backup, basically. So, you know, if my data stays at the edge where it's generated, and, you know, my node goes down, what happens? Of course, you could argue that it's my responsibility to have a backup for the data. But, you know, some people are going to ask, okay, do I also lose my metrics, and these insights that I may possibly have gotten out of that, which I would like to retain?
I'm guessing that this could possibly be one of your upcoming services,
but just wonder what your take is on that.
So the idea is that people are free to have a replication,
to replicate their data.
Of course, by default, when they install a single Netdata agent, the data is collected and stays there.
And of course,
there is a space limit
on how much data can be stored that way.
So the Netdata agent will rotate the data. It will overwrite the oldest data with the newest ones.
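This rotation is essentially a ring buffer: a fixed amount of space in which new samples displace the oldest ones. A minimal sketch of the behaviour (illustrative only, not the agent's actual storage engine):

```python
from collections import deque

RETENTION = 5  # keep only the 5 most recent samples
samples = deque(maxlen=RETENTION)  # a full buffer drops its oldest entry

# Collect 7 per-second samples into a buffer that holds only 5.
for second, value in enumerate([0.1, 0.3, 0.2, 0.9, 0.4, 0.7, 0.5]):
    samples.append((second, value))

# Seconds 0 and 1 have been rotated out; seconds 2..6 remain.
print(list(samples))
```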
However, Netdata can stream metrics between Netdata servers, so you can have centralization points within your network. And even if this is not enough, you can have centralization points for Netdata at a cloud provider, a hosting provider. And when you connect your Netdata to Netdata Cloud, we are able to detect the replication factor and warn you that, you know, this server has a replication factor of one. So if it goes down, or if it crashes or whatever, you're going to lose the data. So we help people set up the replication the way it should be.
The point, however, is not to centralize.
We don't want that. You know, a lot of people are afraid that free services, you know, sell data or this kind of stuff.
We are not centralizing any data.
So we don't want people's data.
We want to offer a service that will provide, that will make them more efficient,
that will make them more effective in troubleshooting their infrastructure,
but we don't want their data.
So even alarms and all this kind of stuff can be run at any point in their infrastructure, even at Netdata Cloud, but without us having all the data.
So I guess what you're saying is that basically this happens on the fly?
Yes, this happens on the fly.
The trade-off for doing that would typically be on performance.
So having your data locally stored, again, typically means that doing something on the fly would run much faster.
So is there actually a trade-off there? And how do you manage that?
So NetData is optimized for performance.
The fact is that we use the Netdata agents as a distributed database, and every data query at a Netdata agent responds in just a couple of milliseconds.
NetData is amazingly fast.
It is optimized for that. And even when you do
a synthetic query that needs to query 100 servers, we
query all the servers in parallel. So the delay is not increased; we're not summing up the delays of the individual nodes that are going to respond.
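The fan-out he describes can be sketched with a thread pool that queries every node at once, so total latency tracks the slowest node rather than the sum of all round trips. Here query_agent is a stand-in for a real HTTP call to an agent's API, simulated with a short sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def query_agent(node):
    """Stand-in for an HTTP query to one agent; simulates a ~2 ms reply."""
    time.sleep(0.002)
    return node, [1.0, 2.0, 3.0]  # hypothetical metric values

nodes = [f"node-{i:02d}" for i in range(100)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    results = dict(pool.map(query_agent, nodes))
elapsed = time.monotonic() - start

# All 100 queries are in flight together: elapsed stays close to one
# round trip, not the ~200 ms a sequential loop would accumulate.
print(f"queried {len(results)} nodes in {elapsed * 1000:.0f} ms")
```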
This means that, at the end of the day, the distributed nature of Netdata means that you have huge horsepower to execute queries, a lot better than anything else. And this is visible, you know, even today in the early service that Netdata Cloud offers.
You can see it in real time.
You can see it in action.
But you can query the data servers through NetData Cloud
and still the response is amazing.
It's very fast, extremely fast.
So I think that
at the end of the day, our setup
improves performance.
Of course, it is trickier, because the software is more complicated in order to achieve this. But I think this is a compromise: the software is harder to build, but it's faster to execute, mainly because it uses the resources of your whole infrastructure to do it.
Okay.
Since you mentioned querying, another question I wanted to ask had to do with, well, to tie that with the opinionated aspect that you mentioned earlier: when you deploy an agent, it's already pre-configured, you don't have to do much, and metrics start just pouring in by themselves, more or less. So, to tie that to the querying aspect: how much control do users have in case they actually want to deviate from what's pre-configured? So, for example, if I want to correlate metrics in a different way, or if I want to define higher-level KPIs, do users have that level of control? And if yes, how? Is there some kind of querying language or configuration that they can play with?
Yeah, so today users can do this in multiple ways.
The first thing is that metrics can be correlated together.
So someone can control how the metrics are correlated within Netdata.
Then you have the ability to build custom alarms using multiple metrics,
not one, and using statistical functions or even machine learning.
So all these features are offered for free today.
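For flavor, a custom alarm in the agent is defined in a health configuration file; a minimal sketch might look like this (the chart, thresholds, and lookup window are arbitrary examples, not recommended values):

```
 alarm: cpu_user_high
    on: system.cpu
lookup: average -1m unaligned of user
 units: %
 every: 10s
  warn: $this > 80
  crit: $this > 95
  info: average CPU user utilization over the last minute
```

The lookup line is where the statistical function goes (average here, but others are available), and the warn/crit lines are free-form expressions over the looked-up value.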
And we plan to enrich all of this in Netdata Cloud. So although today most of these are limited to one agent, because Netdata is mainly single node monitoring, in Netdata Cloud we plan to offer all of these features across the infrastructure.
It's not there yet.
It's not there.
Okay.
Thank you.
And I guess since we're almost out of time, then we may wrap up with, well, let's say,
a broader view on this space.
You mentioned earlier something about the complexity of, well, actually the complexity of building your own software. But what I wanted to ask you about was the complexity of building software in general, let's say, today.
And one of the themes raised in a recent survey that I had the chance to review, which collected input from many people active in the DevOps domain, was that software complexity is increasing with cloud development, breaking down monoliths, and all of these things.
So monitoring also becomes harder.
I wonder if you also see that as a challenge and what else do you see as a challenge and as an opportunity
for the space in general?
So observability is about three pillars.
One is metrics, the other is logs, and the third is traces.
Netdata addresses only the first of these, metrics, today.
So the idea is that what we tried to do is simplify the lives of these developers
as much as possible.
I believe that the key point, a key requirement for such solutions in our time, is not how configurable they are. Of course, Netdata is extremely configurable; you can customize it, and there are hundreds of thousands of options out there. What makes Netdata so good is that it is so easy and it has such sane defaults when you first install it.
So in the case of developers that you mentioned, what we want to offer to them is peace of mind, that they will install this thing and without configuring
much, they will get the best immediately out of it.
And I find this fascinating.
This is amazing to know that you can trust the software.
We have use cases where people use Netdata as a certification.
So they install it on their systems,
and if Netdata does not raise an alarm,
the system is good.
The system does not have any major issues.
So it is fascinating to see how people save time,
how people can be more productive
by using solutions like this in production.
Okay, thanks. I think we covered quite a lot of ground, and pretty much everything that I had on my list we went over. So, unless you have any closing comments to make? One thing that I would like to mention is that Netdata
is a complementary solution.
So we don't
aim to replace
Prometheus and Grafana
or Elastic or Splunk
or Datadog and the likes.
75%
of our user base
uses Netdata together with all these solutions, and more.
The whole point is that we believe that the world lacks a troubleshooting platform. There are many
solutions that are used for reporting, for all the functions that we call monitoring today.
But all of these solutions that exist today,
they cannot be used to effectively troubleshoot
performance issues in real time.
And this is where Netdata steps in.
This is what Netdata offers.
And so we want Netdata to be interoperable with all those.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn, and Facebook.