Orchestrate all the Things - A troubleshooting platform for free: Netdata scores $14.2M funding to extend its open-source application monitoring platform. Featuring CEO / Founder Costas Tsaousis

Episode Date: September 23, 2020

Netdata, the company behind the eponymous open-source, distributed, real-time performance and health monitoring solution for systems and applications, has announced it completed a new round of financing of $14.2 million. We take the opportunity to dive into the Netdata story with CEO and Founder Costas Tsaousis: what they do and why, their open source approach and distributed architecture, and their position in the overall observability ecosystem. Article published on ZDNet.

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Today's episode features Netdata CEO and founder Costas Tsaousis. Netdata, the company behind the eponymous open-source, distributed, real-time performance and health monitoring solution, has announced it completed a new round of financing of $14.2 million. We take the opportunity to dive into the Netdata story with Tsaousis: what they do and why, their open-source approach and distributed architecture, and their position in the overall observability ecosystem.
Starting point is 00:00:39 I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook. And I guess a typical way to start with someone we haven't been in touch with before, and also to kind of introduce yourself and the company to the world at large, would be to say a few words about how you got started. A few key milestones, actually, I would say would be a good thing to highlight, since the occasion today is your
Starting point is 00:01:14 upcoming founding and congratulations for that. So it would be interesting to know the way that led to this point basically. Okay, so my name is Kostas Tsaousis. I am the founder and the CEO of NEDATA. I started NEDATA in 2014 and actually it came out of
Starting point is 00:01:37 frustration. I was really pissed off of the monitoring systems that existed by that time. I couldn't believe that most of the monitoring systems even today provide just an airplane view of the infrastructure. So they cherry-pick just a few very important metrics. They cherry-pick some information, very limited information, and they try to use this information to provide a monitoring solution.
Starting point is 00:02:10 The problem, however, with this is that it is too abstract for actual work. Many insights are lost, and in the sake of scalability, actually, we have very limited monitoring solutions. I decided to build the data in a completely different, with completely different principles. So the first thing is that I decided that the monitoring system should have all the metrics, all the information in full resolution, per second resolution as a standard, and then work around the scalability problem. So, in a day-to-day, it's a distributed solution. We are not centralizing all the data to one place. So, this provides infinite scalability with full resolution and all the insights.
Starting point is 00:03:02 At the same time, it is packaged in a very nice way that allows people to install it and seconds later to have a fully usable monitoring solution. It's an opinionated monitoring solution, so it comes pre-configured with all the metrics, how they are correlated, what alarms should be attached to them. So seconds after installing it,
Starting point is 00:03:30 the data is a fully featured monitoring solution presenting meaningful dashboards throughout rendering all the information available and surfacing all the insights possible, including alarms. So it also automatically attaches alarms to the metrics, etc. I released this in 2016. People loved it.
Starting point is 00:03:58 It went viral. So today, the data is being downloaded about a million times every single day, even today. The community has more than 3 million unique users, and it grows by more than 6,000 new users every single day. So in 2018, I decided to start a company around new data. I got the first serious seed funding from a Greek VC, Marathon. And then I had a series of more investments.
Starting point is 00:04:48 So we went to seed extension, to Series A, and now we just finalized Series A1. The total funding so far is 31 million. The company is about 50 people,
Starting point is 00:05:04 all of them working remotely all over the world. So I was basically saying that what you outlined previously and highlighting the key points that brought you to where you are today, to me sounded like a kind of typical open source success story, if you would call it that. So you start a product or a project, rather. It gets some traction to the point where the founder and the team around them see that there is a commercial potential. They get some funding and so on and so forth.
Starting point is 00:05:43 I would say typical. What is not typical, at least doesn't look typical up to this point, is how you actually monetize slash commercialize it. Because what I would expect to see would be your core open source offering being obviously open source, therefore free for people to use and modify as they see fit. And what typically happens is that some people are inclined to just install it on their own infrastructure and use it in their own way and take the associated burden with running it and storage and management and all of these things. And again, typically what happens is that the company behind the product
Starting point is 00:06:27 has a cloud version running the software that they manage themselves and they commercialize that basically. And up to some extent, I see that NetData kind of follows this path. But what surprised me is that there doesn't seem to be any paid service at all at this point for what you're offering. And I also saw a blog post that you wrote recently basically outlining precisely that. So, you know, it's kind of contradictory. How can you raise, you know, this kind of funding and not have, you know, any paid services at all? You obviously said that, you said that there will be some path to commercialization soon, but I'm surprised
Starting point is 00:07:09 that there hasn't been one already. And I wonder what the path will look like. So the first thing is that I have to say that I am an open source fan. I believe that open source is one of the major breakthroughs of our time. It distributes technology in an amazing way throughout the world. So I really love it. all the business models that somehow they influence, they change the world in a very strong and deep way. Such companies, for example, are GitHub itself. GitHub is a version control system for open source developers and the likes.
Starting point is 00:08:05 But their business model was so amazing that it made a big difference for the world. And today, GitHub, for example, does not sell version control. GitHub sells a door, a security feature on top of the version control. GitHub sells a door, a security feature on top of the version control. Similarly, there are more companies like Slack. Slack is a collaboration engine, but you cannot buy collaboration from them. What you can buy is retention to the free collaboration. Cloudflare is a CDN, but you cannot buy CDN from them. What you can buy is enterprise plugins on top of the free CDN. What these companies achieve is that they provide something so good to the world,
Starting point is 00:08:57 their services, and they provide these services for free, massively for free. So I admire this business model. And I believe that the data is in position to execute such a business model. So all the monitoring features, we plan to offer them for free. We are a monitoring solution, but no, we're not going to sell monitoring. We are going to sell enterprise plugins, advanced retention, advanced security features, etc., on top of the free monitoring.
Starting point is 00:09:32 Does this answer your question? It does, it does. But the part that was kind of unclear to me was, of course, what you said all makes sense, but it's just what surprised me was that I haven't actually seen any of those value add for... It's very early.
Starting point is 00:09:52 So what we have today is the open source age and this is out there for a few years now and it is viral. It is the best single node monitoring in this world and this is viral. It is the best single node monitoring in this world.
Starting point is 00:10:08 And this is how people come closer to us. What we are trying to do is build an infrastructure view on top of the open source agent. So we built a SaaS offering that connects to all your data agents, fetches metadata, not the actual
Starting point is 00:10:24 data, fetches metadata, not the actual data, fetches metadata, and attempts to provide the same, to offer the same service the agent offers it for a single node, to offer it for your whole infrastructure. This is still in progress. We're not there yet. This is going to be free. Once this is realized and once we are at this point, we are going to introduce new features. We're not going to convert the old features,
Starting point is 00:10:53 the features that have been offered for free up to that point. We're going to introduce new features that will be paid. But we're not there yet. We're still building the free thing. Okay, okay. It makes sense. And, you know, from the venture capital point of view, one could frame what you just said as, you know,
Starting point is 00:11:14 you're building your user base and you're trying to be as successful as possible. And then you can, you know, your bet is basically on converting as much of this user base to paid customers as possible. Yes, exactly. Okay, so with that, maybe it's a good opportunity if you would like to highlight, to outline rather,
Starting point is 00:11:36 a few of those upcoming paid features that you talked about. And one of the things that I saw it wasn't exactly in roadmap but in it was like a on the novel community call for people that would like to contribute data towards your under development machine learning picture so I wonder a if this could possibly power some of those features that you that you mentioned and be what is the status of your development there? No, we just released a feature, we call it Insights, that allows people to correlate metrics in a very efficient way. So you highlight a spike on a chart and the data automatically finds which other metrics
Starting point is 00:12:34 have been influenced or have influenced the spike. So it correlates them somehow. This is amazing because it is very efficient. We tested this against other commercial offerings and ours is a lot better. So we believe that people should be... Also, this is a free service. So we want people to contribute more in this direction.
Starting point is 00:13:02 This is why we ask for people to come and contribute. This is still very early, though. We just released it yesterday, I think. So let's see how this will proceed. Engaging the community is very crucial for NetData, you know, because as I said at the beginning, NetData is an opinionated monitoring. It comes pre-configured with all the expertise of the community. So every alarm, every metrics correlation, every title,
Starting point is 00:13:39 every description has been battle-tested by the community, all these people have contributed back fixes to make this as accurate and as, let's say, as accurate as possible. So the same happens with our ML effort. We want this to be top notch, to be the best that can be done. And this is why we're asking for people to be engaged. Okay. Actually, the alarms part was something I also wanted to ask about because, of course, you can correct me if I'm wrong,
Starting point is 00:14:20 but seeing the kind of architecture that you mentioned, so basically just getting metadata and not the data itself and having a very fine-grained granularity. So more data points per unit of time, basically. And if I got it right, you're actually not retaining that data, which is what allows you to scale, basically, and to keep your operational costs under control. Is that correct? Yes and no. So, the whole point with the data is that we distribute the data, we keep the data as close to the edge as possible. That's the whole point of scalability.
Starting point is 00:15:11 So we want all this data to be either at the point that the server denotes they are collected or as close to it. So the data agent allows streaming of metrics. So you can push metrics between the data servers. We support all setups. So we can support master, slaves, and proxies. Now, the point is that we don't want to centralize the data at the cloud, at our centralized infrastructure to control our costs.
Starting point is 00:15:53 Because if we centralize it, then all the services that we are offering for free would not be possible. That's the main reason. So what we want is for everyone to use the data. Even the cloud hosting providers, they can provide paid centralized data servers, centralized points. For us, this is totally valid. We are not going to compete with them.
Starting point is 00:16:27 If they want to do it, our software is there, they are free to do it, and actually we are going to help them. So the idea, the primary goal here is to show the world a completely different way to monitor the infrastructure, a way that provides all the surfaces, all the insights, and does not filter metrics, so has all the granularity that is required. This is the primary goal, and I think this is a noble cause, let's say, for Mededa. Now, you also asked about the license or I lost it. Yes, indeed. Actually, that was something that I skipped.
Starting point is 00:17:22 It was included in the list of topics I sent, but I didn't actually ask. But based on what you just said, I kind of figured out the answer. What caught my eye was the fact that your license is GPL as opposed to AGPL. And being an open source aficionado, let's say myself, I know a thing or two about open source and the different licenses and how they influence business models. And this attracted my attention precisely because of the fact that having a GPL license basically enables cloud operators to take your software and run it. And this is not a position most open source vendors would like to find themselves in.
Starting point is 00:18:04 So that's why I wanted to ask. But you basically just answered by saying that you're fine with that. Yeah, so GPL is the same license as the Linux kernel itself. And all the hosting the cloud provides, they use the Linux kernel. So the answer is very simple. We want them to use our software. Our software is free to use by anyone, provided that he complies with the very limited requirements of GPL. Mainly that if they change the software, they will give back their contributions. That's the whole point of GPL. Other than that, they are free to use the software in any way they see fit, even if they want to install it
Starting point is 00:18:50 and sell it as a service. That's perfectly fine for GPL. I think that a GPL is more lesser if you want to embed more lesser if you want to embed parts of the data in your own software. Parts of the open source software that is
Starting point is 00:19:13 AGPL licensed to your software. Of course, we don't want this to happen. So this is why we have not chosen AGPL. But GPL is okay and cloud hosting providers they can they can perfectly
Starting point is 00:19:30 use it with Android GPL okay well let's not get lost in the technicalities of different software licenses because well the bottom line is what you just said that you're fine with you know anyone basically doing that.
Starting point is 00:19:46 So that explains pretty much everything. Another point I wanted to ask in continuation to your previous answer, actually, as to data retainment. I mean, sure, again, it makes total sense that at least part of the reason why you do it the way you do it is you know to keep your operational costs reasonable and therefore be able to offer your services the way that you do however I you know that kind of that raised two issues actually one has to do with data backup basically so you know
Starting point is 00:20:23 if my data stays at the edge where it's generated, and, you know, my node goes down, what happens? Of course, you know, you could argue that it's my responsibility to have a backup for the data. But, you know, some people are going to ask, okay, do I also lose my metrics? And, you know, these insights that I may possibly have gotten out of that, which I would like to retain. I'm guessing that this could possibly be one of your upcoming services, but just wonder what your take is on that. So the idea is that people are free to have a replication,
Starting point is 00:21:02 to replicate their data. Of course, by default, when they install a single data agent, the data are collected and are staying there. And of course, there is a space limit on how much data can be stored that way.
Starting point is 00:21:19 So the data, the agent will rotate the data. So it will override all the data with newer it will override older data with newer ones. The oldest data with the newest ones. So the idea is that, however, the data can stream metrics between the data servers. So you can have centralization points within your network. And even if this is not enough, you can have a centralization points within your network and even if this is not enough
Starting point is 00:21:46 you can have a centralization points for data at the cloud provider the hosting provider and when you connect your metadata to your data cloud where we are able to detect the replication factor and warn you that, you know, this server has a replication factor of one. So if it goes down or if it crashes or whatever, you're going to lose the data. So to help people set up the replication the way it should. The point, however, is not to centralize. We don't want, we are not, you know, a lot of people are afraid that free services offer,
Starting point is 00:22:32 you know, sell data or this kind of stuff. We are not centralizing any data. So we don't want people's data. We want to offer a service that will provide, that will make them more efficient, that will make them more effective in troubleshooting their infrastructure, but we don't want their data. So even alarms and all this kind of stuff can be run at any point in their infrastructure, even at the data cloud but without us
Starting point is 00:23:06 having all the data. So I guess what you're saying is that basically this happens on the fly. Yes, this happens on the fly. The trade-off for doing that would typically be on performance. So having your data locally stored, again, typically means that doing something on the fly would run much faster. So is there actually a trade-off there? And how do you manage that? So NetData is optimized for performance. The fact that we use the NetData agents as a distributed database, every data query at the NetData agent responds in just a couple of milliseconds.
Starting point is 00:23:59 NetData is amazingly fast. It is optimized for that. And even when you do a synthetic query that needs to query 100 servers, we query all the servers in parallel. So the delay is not increased. We're not summing up the individual nodes that are going to respond. This means that at the end of the day, the distributed nature of the data means that you have huge horsepower to execute queries a lot better than anything else. And this is visible, you know, even today in the premature service that NetData Cloud today offers.
Starting point is 00:24:50 You can see it in real time. You can see it in action. But you can query the data servers through NetData Cloud and still the response is amazing. It's very fast, extremely fast. So I think that at the end of the day, our setup improves performance.
Starting point is 00:25:11 Of course, it is trickier because the software is more complicated in order to achieve a few queries. But I think that this is a compromise. The software is harder to build, but it's faster to execute, mainly because it uses the resources of your whole infrastructure to do it.
Starting point is 00:25:35 Okay. Since you mentioned querying, another question I wanted to ask had to do with, well, and to tie that with the opinionated aspect that you mentioned earlier that, you know, when you deploy an agent, it's already pre-configured and, you know, it's already, you don't have to do much and metrics start just pouring in by themselves more or less. So, and to tie that to the queuing aspect was, well, how much of control do users have in case they actually want to deviate from what's pre-configured?
Starting point is 00:26:09 So, for example, if I want to correlate metrics in a different way, or if I want to define higher-level KPIs, do users have that level of control? And if yes, how? Is there some kind of querying language or configuration that they can play with? Yeah, so today users can do this in multiple ways. The first thing is that metrics can be correlated together. So someone can control how the metrics are correlated within the data. Then you have the ability to build custom alarms using multiple metrics,
Starting point is 00:26:52 not one, and using statistical functions or even machine learning. So all these features are offered for free today. And we plan to enrich all of this set in the data cloud. So although today most of them are limited in one agent, because the data is mainly single node monitoring, the agent, in the data cloud, we plan to offer all of these features across the infrastructure. It's not there yet.
Starting point is 00:27:26 It's not there. Okay. Thank you. And I guess since we're almost out of time, then we may wrap up with, well, let's say, a broader view on this space. You mentioned earlier something about the complexity of, well, actually the complexity of building your own software. But what I wanted to ask you was about the complexity
Starting point is 00:27:46 in building software in general, let's say, today. And one of the themes I have seen being raised in a recent survey that I had the chance to review, which collected input from many people active in DevOps domain, one of the aspects that they touched as an important challenge for them was the fact that software complexity is increasing with cloud development and breaking down monoliths and all of these things. So monitoring also becomes harder.
Starting point is 00:28:19 I wonder if you also see that as a challenge and what else do you see as a challenge and as an opportunity for the space in general? So observability is about three pillars. One is metrics, the other is logs, and the third is traces. The data addresses only the first of the metrics today. So the idea is that what we tried to do is simplify the lives of these developers as much as possible. I believe that the key point,
Starting point is 00:28:59 a key requirement for solutions, such solutions in our time, is not how configurable. Of course, Netdata is extremely configurable. You can customize it. There are hundreds of thousands of options out there. What makes the data so good is that it is so easy and it has so same defaults when you use it, when you first install it. So in the case of developers that you mentioned, what we want to offer to them is peace of mind, that they will install this thing and without configuring much, they will get the best immediately out of it. And I find this fascinating.
Starting point is 00:29:53 This is amazing to know that you can trust the software. We have used cases that people use the data as a certification. So they install it on their systems, and if the data does not raise an alarm, the system is good. The system does not have any major issues. So it is fascinating to see how people save time, how people can be more productive
Starting point is 00:30:21 by using solutions like this in production. Okay, thanks. I think we covered quite a lot of ground and I think pretty much everything that I had in my list we went over so unless you have any closing comments to make? One thing that I would like to mention is that NetData is a complementary solution. So we don't aim to replace Prometheus and Grafana
Starting point is 00:30:53 or Elastic or Splunk or Datadog and the likes. 75% of our user base uses NetData together with all these solutions and more solutions. The whole point is that we believe that the world lacks a troubleshooting platform. There are many solutions that are used for reporting, for all the functions that we call monitoring today.
Starting point is 00:31:25 But all of these solutions that exist today, they cannot be used to effectively troubleshoot performance issues in real time. And this is where Netdata steps in. This is what Netdata offers. And so we want Netdata to be interoperable with all those. I hope you enjoyed the podcast. If you like my work, you can follow Link Data Orchestration
Starting point is 00:31:51 on Twitter, LinkedIn, and Facebook.
