The Changelog: Software Development, Open Source - Mesos and Mesosphere DCOS (Interview)

Episode Date: July 31, 2015

Tobi Knaup, co-founder & CTO of Mesosphere joined the show to talk about the datacenter operating system, and all the open source around it....

Transcript
Starting point is 00:00:00 Welcome back everyone, this is The Changelog and I'm your host Adam Stacoviak. This is episode 167 and on today's show we're talking to Tobi Knaup, the CTO of Mesosphere. Tobi was the tech lead at Airbnb and ultimately left back in 2013 to start Mesosphere, a company that is building a data center operating system for the next generation of internet scale applications. We talked about Mesosphere,
Starting point is 00:00:37 Mesosphere DCOS, and all the open source around it: Apache Mesos, Docker, containerization, Linux, Kubernetes, CoreOS, Chronos, and everything in between. Great show today with Tobi. We have three awesome sponsors for this show: Codeship, Toptal, and DigitalOcean. Our first sponsor is Codeship.
Starting point is 00:01:00 They've launched a brand new feature called Organizations, and now you can create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflows. Maintain centralized control over your organization's projects and teams with Codeship's new Organizations plans. You can save 20% off any plan you choose for three months by using this code, the changelogpodcast. Again, that code is the changelogpodcast. And you'll save 20% off any premium plan for three months by using that code. Head to codeship.com slash the changelog to get started. And now on to the show. All right, everyone. We're back, and this is actually take number two with Tobi Knaup. He is the CTO of Mesosphere. Often Mesosphere and Mesos and Apache Mesos, these are all sort of like
Starting point is 00:01:59 mixed and intertwined, so hopefully during the show we'll talk to Tobi a bit about all that and get it settled out. But Tobi, this is take two, man. What do you think? Well, let's try this again. Let's hope we have better luck this time. Jared's also on the call too, but just to explain what happened here. We had a little glitch and we had to
Starting point is 00:02:19 reschedule, and today's that rescheduled day and Tobi's back. So if we didn't tell you, you wouldn't know, so I let the cat out of the bag. It's a nice object lesson, though, because as I said last week when we had our glitch, if Tobi had had a recording machine, a data center operating system, if his recording machine was spread out across a cluster of thousands, that little hardware failure would have been no big deal, right? No big deal at all, yeah. It's all about high availability.
Starting point is 00:02:49 High availability. Well, let's go back in the past a bit. So maybe let's set the tone for what the call is about. So obviously you're the CTO at Mesosphere. You've got a lot of cool stuff you're doing there, a lot of it operating around open source, a lot of stuff out there that's really picked up in the last year that's just got all sorts of things happening. But you were also the tech lead at Airbnb.
Starting point is 00:03:11 You've done lots of cool stuff in your past. So let's get to know you a bit and learn a bit about what your past is and kind of who you are. Sure, yeah. So, yeah, I'm Tobi. I grew up in beautiful Bavaria in Germany and moved to Silicon Valley about six years ago. And, you know, I went to school in Germany, did some work in, you know, machine learning and sentiment analysis. And then, you know, when you grow up in Germany and you work in tech, you always have these ideas, these romantic ideas, about what Silicon Valley is like, and, you know, everybody lives in the future there and stuff. And so I always, you know, wanted to check it out. And so, you know, I did an internship at a small startup in Silicon Valley, you know, a couple years ago. And then, um, you
Starting point is 00:04:00 know through a friend actually um got connected to the Airbnb founders a little later. And so I joined those guys pretty early on. I was engineer four at Airbnb. And so, yeah, you know, when you join that early, you wear many hats. So I did a lot of different things. I helped them scale their infrastructure, you know, helped hire a lot of the engineering team, built some of the backend services there, like search and the fraud detection service, built some features in the product too. So lots and lots of different things. And, you know, a couple of years into Airbnb, brought my best friend on board, Flo,
Starting point is 00:04:39 who's a co-founder of Mesosphere too. So brought him into Airbnb to build out the data infrastructure team. And so there, you know, we worked with Apache Mesos, which, you know, ultimately was the reason for starting Mesosphere because we, you know, we were pretty successful with Mesos at Airbnb. Flo had used it at Twitter before as well. And so, you know, that led us to start the company. So maybe since you mentioned Apache Mesos,
Starting point is 00:05:10 I mentioned it as well, and Mesosphere, can we knock down some hurdles in terms of terms and terminology? Apache Mesos, Mesosphere, Mesos DCOS. Totally. Let's help us out and explain to the audience the differences between all these names. Totally, yeah. So the first thing that was out there was Mesos. It was actually called Nexus before it was called Mesos, but there was another product called Nexus. So they changed the name.
Starting point is 00:05:36 So it started as a project at UC Berkeley at the AMPLab. And in fact, Ben Hindman, who's the third founder of Mesosphere, was one of the co-creators of the project. So it started there. The idea was to build a cluster management system. So sort of this layer that manages all the machines in a data center in a large cluster, and that provides APIs for building large-scale systems on top and making that process really easy. It became an Apache project a little later, so it was called Apache Mesos then. And, you know, Twitter was one of the largest backers initially. So that's Apache Mesos.
Starting point is 00:06:20 You know, been an Apache, top-level Apache project for a couple of years now. Mesosphere is the company that Flo and Ben and I created to commercialize Apache Mesos and to build a product around it. The way to think about Apache Mesos, or the way we like to think about it, is it's kind of like the Linux kernel. So it sits fairly low in the stack. It does a lot of cool stuff. It's a very sophisticated piece of technology. It's very high performance, a lot of really smart people working on it. But if you look at the Linux kernel, the Linux kernel is not Linux. There's a lot of things around the kernel that you need to run your applications
Starting point is 00:07:04 and to make the whole thing useful. So that's basically what DCOS is, the data center operating system, which is the main product that we're building at Mesosphere. So it has Apache Mesos at its core, but it has all the pieces around it too that make it, you know, a full operating system experience. So we got lots of different names there. Mesosphere, Apache Mesos, the kernel itself, basically. Let's go back to Airbnb where I didn't want to derail us too far off the conversation there, but I did want to set some tone in terms of the terms and things like that.
Starting point is 00:07:41 People just sort of stumble over, like, and part of this conversation is to demystify a bit of what's happening in the cloud. And so hopefully you can help us do that. But take us back to when you guys were originally starting Mesosphere. What inspirations were happening? What was happening at Airbnb in terms of the technology that made you guys eventually leave and start this company? Yeah, so it was really our experiences from both Airbnb and Twitter that led to it because Ben and Flo at Twitter and then Flo and I at Airbnb, we used Mesos for completely different use cases, actually. And the environment was completely different, too. Twitter runs their own data centers.
Starting point is 00:08:23 Airbnb is entirely on Amazon Web Services. So cloud and on-premise. And Twitter was running, you know, they're running pretty much all of their production services on top of Mesos. So, you know, it's things like search and the ad server and a lot of user-facing kind of request response type things. And at Airbnb, we were running big data stuff.
Starting point is 00:08:48 So we ran Hadoop on top of it, Cassandra, Spark, so big data analytics. And that was kind of where the idea for calling it a data center operating system came from. Because we looked at this and we're like, this can really run all the workloads you can run in a data center, the whole range of applications, kind of the same way that your desktop operating system is general purpose. There's not one
Starting point is 00:09:10 operating system that's great for doing development, and then there's another one that's great for doing graphic design, and a third one for doing Word and Excel. Operating systems are mostly general purpose. So that's kind of what drove this. And, you know, pre-Mesos, at those two companies, there were really a bunch of big challenges that we were able to solve with Mesos. One thing that both Airbnb and Twitter struggled with was kind of scaling up and being able to handle the user growth. So Twitter, if you remember the
Starting point is 00:09:46 fail whale, that was kind of like 2009, right? Right. You saw that a lot. And, you know, I think there's even like istwitterdown.com and all that stuff. It was on Hacker News all the time. The whole internet got angry and took out the pitchforks when Twitter was down. So, you know, what happened behind the scenes is Twitter started as a Ruby on Rails application. And, you know, millions of people showed up, hundreds of millions, tweeted a lot. The infrastructure couldn't scale with this. So they really needed to rethink stuff. And one thing they did is they took this monolithic Ruby on Rails application and broke it down into pieces, into microservices, micro backend services.
Starting point is 00:10:28 So, you know, there's like a different service for, you know, your timeline, maybe your search, your ads. Those are all different, different code bases, different services. And so one thing they needed is really a platform to run all these things because that's a lot of stuff to manage. And that's what they used Mesos for. And at Airbnb, it was a slightly different scenario. We wanted to use Hadoop, and we wanted to use it in sort of a self-serve way where we could start Hadoop clusters very quickly and then shut them down again. And we also wanted to be able to try out new data analytics tools as they come out.
Starting point is 00:11:09 A data analytics stack is never just Hadoop, it's always a combination of things. And we were using Kafka also at the time. So it's just a bunch of different tools. And we really wanted a platform to run all this stuff on, to make it really easy to install these things instead of spending a month or even multiple months trying to figure out how to install Hadoop or Kafka. So that was one use case at Airbnb. The other one, which was pretty interesting and probably the most advanced one: what we were doing at the time is we had one machine that had a crontab on it, and that basically ran the whole ETL and analytics pipeline. So it would do things like, you know, step one, dump the SQL database
Starting point is 00:11:53 to a text file and then maybe merge it with the web server logs and pull some other data from a key value store and, you know, build a data set from all that. And then another step would be, you know, take that and count the revenue, count, you know, other things, number of visitors, what have you. So there's always these multiple steps that depend on each other. And we were doing that at the time with cron. So you had to be like, okay, the first step should probably take like 30 minutes. So, you know, let's give it an hour and then run the next step. And so obviously, if that first step would take longer than an hour, for whatever reason, then everything would fall over
Starting point is 00:12:30 and you had to, like, manually debug things, and, you know, folks weren't happy because the reports weren't there. And so it was kind of a struggle. And the other thing too is, you know, the business was growing fast, and so these jobs would take longer to run over time. And this one single box that we had would just get overloaded. And so what we wanted to do to solve that problem is we wanted to build a system that could dynamically scale with the workload, with the ETL workload that's coming in. And that's what became Chronos, which we open sourced at Airbnb. So were you a part of that then, Chronos?
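Editor's note: the cron problem described above comes from encoding dependencies as guessed time offsets. A minimal sketch of the alternative, where each step starts only once the steps it depends on have finished, might look like the following. It is an illustration only and not Chronos's actual code or API; the job names and the `run_pipeline` helper are hypothetical, and the steps run locally in one process rather than on a cluster.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(jobs, deps):
    """Run each job only after every job it depends on has finished.

    jobs: dict mapping a job name to a zero-argument callable.
    deps: dict mapping a job name to the set of job names it depends on.
    Returns the list of job names in the order they ran.
    """
    order = []
    # static_order() yields every job after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        jobs[name]()  # blocks until the step is done, however long it takes
        order.append(name)
    return order


# Hypothetical ETL steps, loosely named after the ones in the conversation.
log = []
jobs = {
    "dump_sql": lambda: log.append("dump_sql"),
    "merge_logs": lambda: log.append("merge_logs"),
    "count_revenue": lambda: log.append("count_revenue"),
}
deps = {
    "dump_sql": set(),
    "merge_logs": {"dump_sql"},
    "count_revenue": {"merge_logs"},
}

order = run_pipeline(jobs, deps)
print(order)
```

With this shape, a slow first step delays the later steps instead of breaking them, which is the failure mode the fixed one-hour gap could not handle.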
Starting point is 00:13:08 Yeah, so it was mostly Flo's team. I contributed a little bit to it. But Flo was running the data infrastructure team, and they built that. And we built it. So we looked at this. We looked at the requirements that we had, being able to scale dynamically, elastically, being able to populate new machines as needed. And we were like, you know, this is a lot of, this is really hard, you know, this is,
Starting point is 00:13:31 you know, those are not trivial problems. But then we looked at Mesos and we're like, hey, wait a minute, you know, Mesos solves a lot of these things already. Like it has those things built in as primitives and we can just build on top of that and spend a lot less time to build this thing. And in fact, it took only three months to build the whole thing. And it had just like 3,000 lines of code, I think, somewhere around there. Which, you know, given that it's a distributed system that is fault tolerant and can scale elastically, can survive machine crashes and so on, that's pretty awesome. That's not a lot of code for that. So you guys got excited about Mesos, and so excited that you decided to start Mesosphere, a company kind of built on top
Starting point is 00:14:17 of Mesos. So I'm interested a little bit in the social and economic kind of background with the project, because you have it started at UC Berkeley. All of a sudden, huge players such as Airbnb, Twitter, more recently Apple, and many others hopping in and saying, this is something that we want, that we need, and we'd like to build upon. And then it became an Apache foundation project. So maybe just kind of explain that whole milieu a little bit and the corporate interests, the open source interests and break that down for us. Sure. Yeah. So it became an Apache project pretty early on, basically when Twitter
Starting point is 00:14:59 decided to really invest in it and make it the platform for running production. Before that, because it came out of the AMPLab in Berkeley, Berkeley had the rights and had the IP. And so Twitter, because the plan was to make it such a central piece of their stack, of their infrastructure, they wanted the IP to be owned by the Apache Foundation just so they could contribute to it as well and to have sort of this neutral entity.
Starting point is 00:15:31 So it's not the lab, it's not Twitter or any other company that owns it, but it's owned by the foundation. So that was a decision that they made pretty early on. And then UC Berkeley donated all the IP to the Apache Foundation. Ben Hindman became the chair for the project. And then it sort of went the way that most Apache projects go. So every Apache project actually has a lot of freedom over how they want to manage it. But, you know, every Apache project has this idea of committers.
Starting point is 00:16:13 And, you know, the way it works is when you set up an Apache project and it gets accepted, there's an initial set of committers. So at the time it was, you know, the folks from Berkeley that have worked on it in the past. And then the project set up a process for, you know, how do we accept new committers? And the goal there was really because it is such a, you know, central, such an important piece of the stack, the project has a really high bar for people becoming committers. So this typically takes at least six months for someone to write enough code, fix enough bugs, get enough credibility in the community to be accepted as a committer. And so usually what happens is there is a vote that gets done among the existing committers. So someone will propose a new person as a new committer, and then the existing committers make a vote.
Starting point is 00:17:08 And if there's enough votes, then that goes through, and that person becomes a committer. So being an Apache project, I assume it's the Apache license, right? That's right. Yeah, it's Apache 2 licensed. Which is one of the more free as far as you are free to build proprietary systems around it. That being said, you're building a company, a VC-funded company around Mesos. And I'm curious your thoughts on building a product or a service around software that ultimately is out of your control.
Starting point is 00:17:42 Yeah, so this actually works, you know, the model works really great for us. And, you know, we do have some control over the software because we are, you know, we're an active participant in the project. In fact, we have the most committers of any company on the Apache Mesos project. And so, you know, even though we don't own the IP
Starting point is 00:18:05 or we don't own the project, we have a big seat at the table. And what we really wanted to do is build substantial product around the open source project as well. So we really think that there's a lot we can add in terms of management tools and applications around Mesos and sort of, you know, the whole how do you operationalize this thing and really make it work in enterprise data centers.
Starting point is 00:18:37 That's really where we think we add a lot of value as a company. And so, you know, if a lot of these open source projects, you know, they're built by, you know, they're built by hackers and we kind of built these things for ourselves. And that works great. You know, those things, those tools work, they do the job, but they're not really built for enterprise environments. And so in Mesos' case, for example, when you set up a cluster and you use open source Mesos, out of the box, it's kind of open. You know, anybody can do anything. It does have some controls, but not enough to satisfy, you know, enterprise requirements where folks have, you know, really strict policies and auditing requirements. You know, especially when it's a bank, they have all sorts of regulations that they need to make sure they meet.
Starting point is 00:19:28 So what we're really doing there is building all these tools and APIs around Mesos to make it work in those environments too. Yeah. So it sounds like because Mesos is kind of the kernel of a distributed clustering system, there's a lot of other pieces to the operating system puzzle, and everybody, at least the large players, appear to be building their own. So Apple has something, I think proprietary, called Jarvis. Not sure if that's open source or proprietary. There's Marathon. You mentioned Chronos, which was built at Airbnb. Some of these are open
Starting point is 00:20:01 source, some of these are not. Apache Aurora. Can you kind of explain all... you mentioned Chronos, and you mentioned that there's other services and things that need to be built around it. Is there a comprehensive list of missing things that you need to have a data center operating system if all you start with is Apache Mesos? Yeah, so, you know, the best way to do this is really, you know, DCOS, the data center operating system, has a free community edition that you can just go to our website, launch a cluster on AWS and other clouds and just get started with it. It doesn't cost anything besides paying for the machines in the cloud.
Starting point is 00:20:43 That's really the best way to get started. And you get all the pieces, you get the CLI to interact with your cluster, you get the GUI to see what's going on, you get a package manager. So it's really all the things you know from Linux or other operating systems. There's an equivalent of that in the DCOS. So our package repository, so you can really easily say install Hadoop, Kafka, Cassandra, all these systems with one single command. The same way on Linux you would do apt-get install, we have DCOS package install. And in terms of the applications that run on top, you mentioned a bunch of them. I think the operating system analogy works really well there. So if you think of, let's pick macOS.
Starting point is 00:21:32 When you install macOS, it comes with a few applications pre-installed. So you fire it up for the first time. You already have Finder on there and you have a browser. Sort of the basics are there. The killer apps are there. But there's many other browsers you can run on macOS, right? You can use Chrome, you can use Firefox, or you can use Opera. And I think it works kind of the same way in the DCOS. You know, when we ship DCOS, it comes with Marathon, which is another open source project that we maintain at Mesosphere, which is kind of the equivalent of an init system that you know from Linux.
Starting point is 00:22:09 So it starts long-lived processes in your data center. So for example, your Ruby on Rails application or Node.js app, so anything you want to keep running forever, it does that. But that's just the Safari equivalent, right? That's the one that we ship. And we believe it's awesome. But if you want to use a different one, you can do that. And like you said, Apple built their own.
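Editor's note: the "init system for the data center" idea, keeping long-lived processes running forever, can be pictured as a loop that compares the desired number of instances with what is actually alive. The sketch below is a toy model under that assumption; it is not Marathon's implementation or API, and the `reconcile` function and the app names are hypothetical.

```python
def reconcile(desired, running):
    """Compute how many instances to start (positive) or stop (negative).

    desired: dict mapping an app name to the instance count that should run.
    running: dict mapping an app name to the instance count currently alive.
    Apps that are already at the desired count are left out of the result.
    """
    actions = {}
    for app in desired.keys() | running.keys():
        delta = desired.get(app, 0) - running.get(app, 0)
        if delta != 0:
            actions[app] = delta
    return actions


# Hypothetical cluster state: one node-api instance has crashed, and an
# old job is still running although nobody asked for it anymore.
desired = {"rails-web": 3, "node-api": 2}
running = {"rails-web": 3, "node-api": 1, "old-job": 1}

actions = reconcile(desired, running)
print(actions)
```

A real scheduler would run a check like this continuously and turn each entry into launch or kill requests against the cluster.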
Starting point is 00:22:33 It's Jarvis. HubSpot built one called Singularity. Twitter built one called Aurora. Netflix is working on one. I think this really shows that the data center operating system model works, right? Because you get this foundation and it allows developers out there at all these companies to build their own applications on top, to use the API and build something that's custom for the environment that works well for their needs, that works well with their workflow. And in fact, I'd argue, you know, those things are kind of like
Starting point is 00:23:06 platform as a service equivalents, PaaSes. And we've seen a lot of PaaSes in the past. And I would argue that none of them have really been that successful. And I think it's because they're generally pretty opinionated. They have one specific workflow. And that, you know, usually just works for a handful
Starting point is 00:23:25 of people. That same workflow does not work for every company. And so one thing that the DCOS allows you to do is really, you know, either take one of those existing things and modify them or just build your own completely if you want to have your own workflow. And in fact, I think there's more than a dozen in total that are PaaS-like systems that run on top of the DCOS. Another example is actually Docker Swarm, which they're also building on top of Mesos. So as developers, we always try to point out patterns and what's the same and what's different. And it seems like Mesos makes a lot of sense, to have that cluster management and scheduler shared, but everybody seems to be agreeing that the platform,
Starting point is 00:24:11 the Marathon, the Jarvis, this is where the concerns break out, and you can't actually... that's not shared infrastructure, huh? Could it be? Could those all be shared, like with one PaaS? And we all, you know, just like Apache Mesos. I noticed Aurora, which you said Twitter started, is an Apache project. How come it's not everybody's working on Apache Aurora, and then you guys are adding value at an even higher level? It's just because there's different needs at a low level? Right, yeah.
Starting point is 00:24:39 It's for that reason that I mentioned. I think all of these things take slightly different approaches. And, you know, Aurora and Jarvis and Singularity, they all have something that fills a specific need inside the company that built it. And that makes it less generalizable. And so, you know, and I don't think that's a bad thing. I think it's actually awesome that there's choice and, you know, that if you're new to the space, you can look at the patterns that each one of those systems use
Starting point is 00:25:12 and just pick the one that works best for you. Kubernetes is another example, which, you know, came out of Google and has sort of their workflow and their abstractions built in. You can run that on the DCOS as well. Yeah, I was just going to ask about Google because they seem to be the missing entity in the large players here, Amazon as well. We mentioned them a little bit, but Google has a thing called Borg.
Starting point is 00:25:36 Could you explain how Borg fits into this or doesn't fit into this? It absolutely fits in. Yeah. So actually Borg was probably the first system ever in this space. And, you know, Google uses it internally. It's not open source. They don't sell it. They wrote a paper about it. And in fact, Mesos takes some inspiration from Borg. You know, Google is a sponsor of the lab where it came from.
Starting point is 00:26:01 So there was, you know, a good exchange of ideas back then. It also does a few things differently than Borg, but definitely, you know, took a lot of the lab where it came from. So there was a good exchange of ideas back then. It also does a few things differently than Borg, but definitely took a lot of inspiration. So yeah, Borg is the cluster manager that Google uses internally for pretty much everything. So if you're using Gmail, that runs on Borg. If you're using Google search, it runs on Borg. They run all the databases. I think even Google file system runs that runs on Borg. If you're using Google search, it runs on Borg. They run all the databases. I think even Google file system runs on top of Borg. So it's really their one stack that they use internally to run all the things. Awesome.
Starting point is 00:26:35 Well, I want to ask a few more questions about Kubernetes and clear up exactly how that fits into everything because it seems like it does play nice in this ecosystem. But we'll take a quick break, hear from a sponsor, and when we get back, we will ask Tobi about Kubernetes. You've heard me talk about Toptal several times on this podcast, but today is different. I've got a special treat for you. I went out and spoke with a listener who a year ago had never heard of Toptal. He listened to the show just like you're doing right here, right now, today,
Starting point is 00:27:07 and heard us talk about Toptal and what they're all about, and he decided to get in touch. And now he's living the dream as a freelance software developer with Toptal. His name is Daniel Elzon, and I sat down and I talked with him. I said, hey, what is it that you love most about Toptal? Take a listen. Well, for me, the thing about Toptal, which I thought would be very hard for me personally as I transitioned to a more consulting role, was the way I would have access to new clients and what quality those would be. So I found that I've had access to awesome clients through Toptal, and it hasn't been that hard to find because they have a lot of choice.
Starting point is 00:27:47 And even more than that, there's enough choice, and I can actually be a little selective about what kinds of things I want to be working on. So I use that as a way to sort of hone my skills and go towards the technologies I think are worth investing in for the future. So whether it's, you know, including new front-end frameworks or doing a little DevOps work on the side, I usually am able to find clients who have the needs of the things I want to get better at. So that's been truly useful. All right, that was Daniel Lausanne, a listener of The Changelog and also a freelance software developer with Toptal. If you want to follow in Daniel's footsteps, go to toptal.com slash developers. That's T-O-P-T-A-L dot com slash
Starting point is 00:28:36 developers to learn more about what Toptal is all about, and tell them The Changelog sent you. All right, we're back talking about Apache Mesos, Mesosphere, the cloud, distributed systems. Curious about Kubernetes. You mentioned it previous to the break, but I'd kind of like you to explain it in more detail for us. Yeah, so Kubernetes is an open source project that Google kicked off last year, 2014. They announced it, I think around June last year, after working on it for a couple of
Starting point is 00:29:14 months. So it's really, it's a container manager, container orchestrator that uses a lot of the same abstractions and learnings from Google internally. You know, the things that Google learned over the years, building Borg and, you know, its multiple iterations. And so they took all those learnings and, you know, put it into a new open source project that they built from scratch. And that is what Kubernetes is. So it's, you know, it's a really nice tool.
Starting point is 00:29:48 It's really simple and easy to use. They have, it has kind of two main abstractions that, you know, Google found very useful for managing large numbers of containers. One of them is the idea of a pod, which is basically a group of containers that get launched together on the same physical machine that share the same network address and share the same volumes. So a use
Starting point is 00:30:15 case would be, for example, if you're running a web application in one container and you want to run some monitoring system right next to it or some logging agent right next to that web application, you can run that in another container, but they get launched together. And they share the volume, share the disk, so they have access to each other. So that's the idea of a pod. It's one of the things that are unique about Kubernetes.
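Editor's note: as a rough picture of the pod idea described above, the sketch below models a pod as a group of containers that is always placed as one unit, so every container in it ends up on the same machine with the same network address and the same volumes. This is a toy data model, not the Kubernetes API; the `Pod` class, the `place` function, and all the names and addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """A group of containers that is scheduled together as one unit."""
    name: str
    ip: str
    volumes: list
    containers: list = field(default_factory=list)


def place(pod, node):
    """Place the whole pod on one node.

    Returns a dict mapping each container name to its placement. Every
    container gets the same node, the same IP, and the same volumes.
    """
    return {
        container: {"node": node, "ip": pod.ip, "volumes": pod.volumes}
        for container in pod.containers
    }


# Hypothetical pod: a web application plus a logging agent next to it.
pod = Pod(
    name="web",
    ip="10.0.0.7",
    volumes=["/var/log/app"],
    containers=["rails-app", "log-agent"],
)

placement = place(pod, node="machine-42")
print(placement)
```

Because the logging agent shares the volume with the web application, it can read the application's log files directly, which is the use case mentioned in the conversation.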
Starting point is 00:30:41 The other one is this idea of labels and using labels to model dependencies in the system and discover other pieces in the system. So, you know, when you're running lots and lots of containers at scale, one problem is really how do you discover things? You know, how do you figure out where things are running and how do you say, you know, this web application depends on that database? And traditionally how we did that once, you know, this web application depends on that database. And traditionally how we did that once, you know, in the past is with DNS, right? We would just say, you know, my Rails application, you know, go talk to database.company.com. And, you know, that doesn't really work in very dynamic
Starting point is 00:31:18 and elastic environments where containers can move around and you don't really have this model of, you know, pinning an application to a specific machine and making sure it lives there forever. So yeah, what Kubernetes does instead is it gives you labels. So you can basically just say, my web app depends on the thing that is labeled type database and environment production, something like that. So those are kind of the two main abstractions in Kubernetes, pods and labels. And we like those a lot. And Kubernetes is a really popular open source project. It's getting a lot of traction.
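Editor's note: the label idea can be sketched as a simple lookup. Instead of pointing an application at a fixed hostname, you ask for whatever currently carries a given set of labels. The code below is an illustration only, not the Kubernetes API; the `select` helper, the container names, and the label values are hypothetical.

```python
def select(containers, **wanted):
    """Return the names of containers whose labels include every wanted pair."""
    return [
        c["name"]
        for c in containers
        if all(c["labels"].get(key) == value for key, value in wanted.items())
    ]


# Hypothetical set of running containers and their labels.
containers = [
    {"name": "db-1", "labels": {"type": "database", "environment": "production"}},
    {"name": "db-2", "labels": {"type": "database", "environment": "staging"}},
    {"name": "web-1", "labels": {"type": "web", "environment": "production"}},
]

# "My web app depends on the thing labeled type database, environment production."
matches = select(containers, type="database", environment="production")
print(matches)
```

If the database container moves to another machine or is replaced, the query stays the same; only the set of containers it matches changes.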
Starting point is 00:31:56 It's really great for running containers in sort of a microservices environment. And so that's why we decided to become part of the project too. And I think, you know, where we can really add value there is we can make it run really well in, you know, any environment where the DCOS runs. So you don't have, you know, the same way you don't have to go through the setup of the other distributed systems that we support. We make that really easy for Kubernetes as well. So you can literally say, you know, DCOS package install Kubernetes, and you're up and running. You have a Kubernetes cluster configured, and you can start running containers.
Starting point is 00:32:37 So because it's operating at the level of orchestrating Linux containers, it can actually sit on top of Mesos and on top of the DCOS that Mesosphere adds to Mesos. Is that right? That's right, yeah. So it sits sort of at the same level where all of our other services sit, sort of right next to Spark, Hadoop, Marathon, all these other things.
Starting point is 00:33:00 So I'm just going to try to lay out my understanding of this whole stack, and I just want you to tell me where it falls down or if I'm tracking you because I feel like I am, but then I turn around and I can't, and I realize I have no idea what's going on. So you have, and maybe Adam, this will help you as well. It'll definitely help me. Okay, so you have your hardware, right?
Starting point is 00:33:20 Yep. And then on that you have an operating system like Linux. That's right. And then on top of that you have Mesos, which turns many Linuxes, thousands, scalable up and down, into one clustered
Starting point is 00:33:35 thing. And then on top of that, now you add, it's not really an application layer, but now you can start adding your, this is where your Kubernetes... We call it services. Your services. So maybe you have a Hadoop service, or you have Kubernetes, which allows you then to manage Linux containers. So now you have a second layer of Linux, but abstracted away from the hardware now. Yeah. Which then inside of those containers, you could run your Hadoop, right? Exactly.
Starting point is 00:34:07 So I think you described it really well. So you have the hardware. The next layer up, well, Linux is there. The next layer up is Mesos. That's the layer that abstracts, not really abstracts, but manages the resources, manages the hardware resources. So it knows how big your cluster is.
Starting point is 00:34:26 It knows how many cores are available, how much memory is available. And it uses those resources from that one big pool, which is your whole data center or your whole cloud, and offers them to the services that run on top. So the services on top are kind of your building blocks. They're kind of your Legos that you use to build your business application.
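Editor's sketch: the resource-offer idea described above can be illustrated with a small Python model. It is a deliberate simplification and does not use the real Mesos scheduler API; `Offer`, `Task`, `accept_offers`, the agent names, and the first-fit placement rule are all assumptions made for the example. The resource layer knows how many cores and how much memory each machine has free and offers them upward; a service on top decides which of its pending tasks to launch on each offer.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str     # machine the resources live on
    cpus: float    # free cores being offered
    mem_mb: int    # free memory being offered


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def accept_offers(offers, pending):
    """First-fit placement: launch each pending task on the first offer with enough room."""
    launched = []
    remaining = list(pending)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        still_waiting = []
        for task in remaining:
            if task.cpus <= cpus and task.mem_mb <= mem:
                # Consume part of the offer and record where the task runs.
                cpus -= task.cpus
                mem -= task.mem_mb
                launched.append((task.name, offer.agent))
            else:
                still_waiting.append(task)
        remaining = still_waiting
    return launched, remaining


offers = [Offer("agent-1", cpus=2.0, mem_mb=4096), Offer("agent-2", cpus=4.0, mem_mb=8192)]
tasks = [Task("web", 1.0, 1024), Task("database", 3.0, 6144), Task("cache", 1.0, 2048)]
launched, waiting = accept_offers(offers, tasks)
```

In this toy run the web and cache tasks fit on the first machine and the database goes to the second. A real service would also handle declined offers, task failures, and changing cluster size, none of which is modeled here.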
Starting point is 00:34:46 So if you're building a web app, you need a database. So launch a database on the DCOS. If you're building a web app, you need a way to run containers. So use one of the container orchestrator services like Kubernetes, like Marathon, like Docker Swarm, and so on. So those are the building blocks. And then you use those building blocks to manage your application. So your application code goes into a Linux container. You give that Linux container to one of those orchestrators.
Starting point is 00:35:17 They run it in the cluster. You get the tools, the service discovery, for example, to let your application talk to the database that you launched earlier. That's how it all fits together. Gotcha. I think I follow that. Adam, does that clear it up for you? Yeah, I'm definitely tracking on that.
Starting point is 00:35:31 I mean, it's definitely still complicated, but I'm tracking for sure. It's a whole new world. Everything's different. Yeah, it is. And I think one of Kubernetes' pitches was Google's infrastructure for everybody. And a data center operating system is kind of the same idea. It's like you could have access to this kind of scale without having to manage all those tricky pieces below where you care about.
Starting point is 00:35:54 And we've been talking about big players, Apple, Google, Twitter, Airbnb. And so the question that pops up, because I'm just a little guy, you know, and as a lot of developers are out there, a lot of our listeners are developers wondering, like, is this something I even need to be caring about? Right? Somebody who maybe runs a couple servers that maybe I have a web server and a database server. Should we be paying attention to this stuff? Or is it really the world of Twitters and Airbnbs? I think everybody should be paying attention to this. And here's the reason. So I think when we build things today, we sort of have to, we always have to choose between, you know, building
Starting point is 00:36:38 things quickly or building things for scale. I've definitely been in that situation. If you look at, you know, Airbnb and Twitter in the early days, they were just a simple Ruby on Rails application that talked to a database, right? That's how they both started. And they sort of had to choose to build things quickly, build for time to market. And then when they started growing,
Starting point is 00:37:03 it became really hard to scale those things. So I think for all the developers out there that are working on something that they hope that one day will be big, I would say build it on top of the DCOS from day one. Because when it comes time to scale that thing, you'll have a lot less things to worry about. And, you know, I think even if you just have two servers, the DCOS can already add value, you know, if one of those two servers fails, it can move your applications to the other one. So I think that's already pretty awesome. I think, you know, the reason why we're seeing mostly big companies using this stuff right
Starting point is 00:37:42 now is because, you know, for them there's no alternative. Like, their pain in managing those many, many servers that they have is so big, there's just no alternative to automating the whole thing. And so, you know, their bleeding is bigger. That's why we're seeing a lot more of those guys using it. But, you know, if I were to start a company today and, you know, say, build a mobile app with a backend, I would definitely build it on the DCOS from day one. You don't think things are moving too quickly for those who don't touch it quite as often as, say, daily? Like a large ops team might. It's not moving so quickly that they'll just spend most of their time kind of playing catch up to this new tech?
Starting point is 00:38:26 No, I don't think so. And really that's our mission at Mesosphere is to really make this kind of an easy-to-use product. So you don't have to be a cluster management expert or a distributed systems PhD to run that thing. We want to make that really easy, you know, as easy as Linux, you know, and get to that same level of sort of turnkey experience. Maybe we should take some time now to break down Mesosphere then. So now we've talked about Mesos, Marathon, Chronos, and
Starting point is 00:38:57 the whole slew of things, Kubernetes, Borg even. Let's talk about what this does to bring it to a DCOS. So DCOS is, Mesosphere is the company, DCOS or Mesosphere DCOS is a product. Where is this shipping? Is it, I guess, do you download it? Is it a cloud service? How does this work? Right. So there's basically two ways to run it, which is on one of the public clouds like AWS and Google Cloud and Azure, or you can run it on your own machines if you have a bunch of machines in a data center somewhere or you own a whole data center. You can go to the Mesosphere website today, basically click a button and launch a fully configured cluster in one of the clouds.
Starting point is 00:39:53 So all the, you know, it just uses the standard provisioning tools that the cloud providers have, like CloudFormation on AWS, and it brings up all the machines, it's fully configured, you know, it takes about 10 minutes, and you're up and running. So it's not hosted by Mesosphere, but we just give you a template.
Starting point is 00:40:15 We redirect you to AWS. You log in with your own account, and you use that template to bring everything up, but it's your machines and your AWS account. So you manage the whole thing. If you want to run it in a data center, in your own data center on your own machines, we have sort of an early access version of that product. And we're giving that to, you know, a handful of design partners right now and early customers that are helping us,
Starting point is 00:40:44 you know, sort of polish it up. So it's kind of call us and we'll get back to you and help you install it at the moment. So when we go to the product page, we see Community Edition free and we see Enterprise Edition, let's talk. Is that the dividing line there? The Community Edition is what you can go and launch today
Starting point is 00:41:01 and the Enterprise version is what you can take to your own data center. Exactly, yep. Okay, so I guess since there's so much underlying tech under this, and we all know what open source is, the most easy way to ask is, why isn't community edition free? Or I guess it is free, but why isn't it open source? Is there a reason why you went the way you went with it?
Starting point is 00:41:23 Or do you plan on having that as a paid version at some point? So we're evaluating our options there right now. We love open source. We're a big contributor to Mesos, in fact, and to Marathon and Chronos and other open source projects. So today, the majority of code that we write is open source. And we're having the conversation right now about DCOS. You know, what should we do there? Should we make it entirely open source?
Starting point is 00:41:51 And parts of it are already open source. So, yeah, we'll probably have some news there sometime soon. Which parts are open source right now? So kind of the parts that I mentioned. So Mesos, Marathon, and Chronos, and all the other frameworks that we're running, like Cassandra. The integration between Cassandra and DCOS, for example, Kafka and DCOS, all that stuff is open source. So earlier when we talked about terms and tried to divide the lines a bit, so Apache Mesos is different from Mesos.
Starting point is 00:42:27 No, that's the same thing. Same thing? Okay. Apache Mesos, Mesos, same thing. Trying to make sure because I see it's a fork and I wasn't sure if it was, is it your own flavor of the fork or is it the real thing itself? Yeah, so we work with Apache Mesos. That's what we contribute to and that's the version that goes into DCOS also. So do you have any thoughts on the future of this then
Starting point is 00:42:52 in terms of how it plays back in open source? I know you've got components in there, but is it something where, you know, for example, the one that comes to mind right now is, just because the naming is so similar too, is GitLab, right? GitLab has an enterprise version and it has a community version and community is the open source free version. And then enterprise is something you can buy and install, or they even have hosted similar to GitHub. Right. Yeah. You know, for us, we, we really think that this is, you know, this makes
Starting point is 00:43:21 operating infrastructure so much easier and we really want to give it to everybody. We don't want to hold those things back. And so that's why we already have a free community edition that there's no charge for it. And we're thinking hard about open sourcing it as well. It has some implications, of course, but we're thinking through that process right now. Yeah, just curious because it seems like that would be the place to start if you were going to, and I figured that was the question on every listener's mind is, hey, why isn't community free then, or why isn't it open source?
Starting point is 00:43:57 If it's free, why not make it open source too? Right. Do you guys feel any pressure from, because you are VC funded, investors on the open source front? Do they push you away from it? Do they push you toward it? Is it a complete non-factor? So our investors are really awesome. Someone told me at some point, besides looking for money, you're really looking for a business partner. And, you know, our two biggest investors are Khosla Ventures and Andreessen Horowitz, and they've been really great to work with. And, you know, they really want to see
Starting point is 00:44:36 this, they want to see this be successful in the long run. And they give us a lot of freedom to run the company. So, you know, in large part it's our decision, you know, how much we want to do open source and how much we want to do closed. You know, not really getting pressure from the VCs on that. And they see the value of the open source too. And, you know, they've invested in other open source companies before. So, you know, they understand the model, they see the benefits. Cool, switching gears a little bit here. I'm thinking about languages and Apache Mesos itself, a C++ project.
Starting point is 00:45:16 It seems like a lot of the projects built on top of it, such as Marathon, are Scala, or is it Scala? I call it Scala. Scala, yeah. Thank you. Scala, so I'm right. Or at least according to a couple of us, I'm right. It's Tobi.
Starting point is 00:45:30 According to Tobi. We'll let you have the final say. So Scala, yeah, I'm a big fan of right tool for the right job and learning why tools are right for particular jobs. It seems like Scala is well fitted for this space. I'm wondering if you could speak to that. Yeah. So actually we have,
Starting point is 00:45:51 we're working in a lot of different languages in that, in that layer and sort of the DCOS services layer. So yeah, Chronos and Marathon are in Scala. There's a bunch that are in Java just, you know, because the project started that way, for example,
Starting point is 00:46:06 Cassandra and Hadoop and HDFS. And they're all Java. And then there's Go also. Kubernetes is written in Go. We're really language agnostic there. So Mesos has an API for a lot of languages. Python, in addition to those that we just mentioned. I think someone wrote Haskell bindings too. So you can really use pretty much anything,
Starting point is 00:46:31 and it's pretty easy to write your own language bindings. Mesos is right now getting a new HTTP-based API, which will ship fairly soon, so it'll be even easier to build language bindings. We picked Scala originally. I'm trying to think why we picked that. At the time, so this was three years ago, I think, at this point, when we started building Chronos, Java was still a very popular language for systems engineering. And Scala is also a JVM language, and we found it to be more expressive than Java.
Starting point is 00:47:11 You know, you had to write fewer lines of code. It allowed us to do functional programming, which was really interesting. And so, you know, that's why we went with it. It just seemed, you know, more modern, more effective, and more efficient in terms of the time it takes to write software. So that's why we went with it at the time. But I completely agree with you. I think it's all about finding the right tool for the job. And I think there's right now new exciting languages in systems engineering. Go is definitely getting a lot of, you know, picking up a lot of steam. Scala is getting more popular, too.
Starting point is 00:47:52 There's a few new ones like Rust. So we're really, you know, we're really language agnostic there. So if you want to develop for the DCOS, you're not tied to a specific language. Yeah, but with a few years of experience looking back now on that decision, do you feel like it was a good decision? Are you still bullish on Scala? Are you personally even? Because I notice you've been writing some as well. Just curious your thoughts on it.
Starting point is 00:48:21 And then if you are personally looking at Go or looking at these other things, Rust, or if that's more as a company. Right. So personally, I think, you know, it looks like the Go community is really, you know, growing and a lot of the newer tools that we're seeing in the systems engineering space are written in Go. So, you know, today I would probably build the thing in Go, you know, for that reason. I personally still like Scala a lot and I prefer it over Go actually. It's, you know, it has, Go is a very simple language, which is great.
Starting point is 00:48:54 You know, it has a low barrier of entry. It's, you know, the code that you write is very consistent because there's usually only, you know, only one way to do things or a few ways to do things. So those are huge benefits. Scala lets you do functional programming really well. And so they both have their strengths and weaknesses. For a project like a systems engineering project, a cluster manager, I would probably go with Go today. I'd probably still use Scala for, say, if I was doing something in data analytics, Spark-based, that's using Scala, or web backends and things like that. I would probably still go with Scala. So all the folks who we've gotten recently as listeners to the show, Jared, are pretty excited because we just came back from GopherCon, and most of those guys are either writing Go or interested in writing Go.
Starting point is 00:49:52 So their hands are in the air except for when you said Scala. Anyways. Well, we're going to take a quick break. We'll come back and ask you some really awesome closing questions. But we're going to take a break. We'll be right back. I have yet to meet a single person who doesn't love DigitalOcean. If you've tried DigitalOcean, you know how awesome it is.
Starting point is 00:50:14 And here at the Changelog, everything we have runs on blazing fast SSD cloud servers from DigitalOcean, and I want you to use the code CHANGELOG when you sign up today to get a free month. Run a server with one gig of RAM and 30 gigs of SSD drive space totally for free on DigitalOcean. Use the code CHANGELOG. Again, that code is CHANGELOG. Use that when you sign up for a new account. Head to DigitalOcean.com to sign up and tell them the Changelog sent you. All right, we're back. Great break there. And we got some awesome closing questions that many, many listeners are always just like, I love when they ask those questions. And the first question, I think, is maybe a call to arms, Tobi. So of the projects you have out there from the CLI to Marathon,
Starting point is 00:51:05 to all the different projects you mentioned, what are some call to arms, some ways that the open source community can help rally around what you're working on or what you're doing to amplify what you're doing or to just step in and help out? Right, yeah. So I think the goal of things like Marathon or the whole DCOS is really to automate everything.
Starting point is 00:51:30 So, you know, it's making ops people's lives better, SREs. So if you're working in that field and, you know, you hate doing the same thing over and over again, and you hate, you know, getting woken up in the middle of the night because some random machine in your data center failed, or in your cloud, you know, you may want to check out Marathon and the DCOS and, you know, see if it works for your use case and, you know, try it out and, you know, help us fix some bugs in there and help us make it work for even more use cases. And the GitHub org that you guys operate off of is just slash Mesosphere, right?
Starting point is 00:52:12 That's right, yeah. GitHub slash Mesosphere, tons of stuff on there, you know, from little weekend projects that some of our guys are working on. There's some really cool stuff there, actually. So, you know, I don't know if you guys have any HPC, high performance computing listeners, but we have a couple of guys on the team that are playing with stuff there like MPI.
Starting point is 00:52:32 So yeah, some really cool things there. Also, if you want to get started developing for the DCOS or building a distributed system, if you've never done that, there's a really cool example project on our GitHub called Rendler. It's a rendering web crawler. So, you know, it crawls the web, renders all the pages, shows them in a big graph. And it's basically example code for developing a distributed system on top of the DCOS. Very cool. It's in a lot of different languages. I think we have Scala, Java, Python, you know, a bunch of others. So it's a really good place to get started. It's called Wrangler, is that right? Rendler.
Starting point is 00:53:07 Rendler, okay. I just found it. Yeah, it's a couple pages back. We'll link to it in the show notes, but it's R-E-N-D-L-E-R, Rendler. Right. And it seems like it's based on the guy here on the readme. It looks like Joker. Is it the Joker?
Starting point is 00:53:27 It's the... What is that? Do you know? What's the guy? I think it's Riddler. Riddler, okay. Yeah, Riddler. I knew there was something like that. It looked like Joker for a second, but it was Riddler. But anyway, yeah. Well, you're not kidding when you say you have a lot of open source stuff out there, because six pages on there, slash Mesosphere, and I think it's 20 per page. Some of those are forks, obviously, but still lots of cool repos out there for those who want to go digging. Yeah. Next question, one we cannot skip, everyone's favorite, is who is your programming hero? My programming hero, I think, is Marc Andreessen, because the guy did so much. He wrote really the first usable web browser back in the day.
Starting point is 00:54:12 And he did so much for making the internet what it is, went on to start Netscape, which at the time, they came up with JavaScript, the Netscape browser, of course, the Netscape application server, which back then was basically the way to build web applications, the equivalent of Node.js and Ruby on Rails and all of those tools that we have today. So yeah, he's my hero. Very cool.
Starting point is 00:54:42 And a question we don't ask every single show, but I love asking this question, which is what's on your open source radar? So if you've got a weekend or even a week where you could just, like, take a vacation that's not really like a traditional vacation where you just go and travel and have fun, you actually, like, maybe, you know, go travel, have fun, and hack too. It's a hackcation or something like that, I don't know. But if you had some time where you weren't forced to work on what you work on daily, either by your own passions or your own commitments, if you just could take a weekend or a week, what would you play with? What would you work on? ...those. But, you know, things that we aren't working on as a company, I can think of two. There's a really cool deep learning framework also managed by UC Berkeley. It's called Caffe. So it lets you do, you know, deep learning, neural networks.
Starting point is 00:55:41 That's kind of a, you know, machine learning is kind of a passion of mine. So I'd probably check that out. Caffe. And the other one I'd take a look at, it's a monitoring tool built by SoundCloud. It's called Prometheus. That looks really cool too. Oh, wow, Adam. Probably check those out. That's a good tee up because that's our next episode. Isn't that next episode? It's definitely in the close pipeline. I can't remember if it's next or second to next. But, yeah, we're having the Prometheus team on. I think it is next week, actually. Yeah, it is.
Starting point is 00:56:10 And it's after this show. Right on. It's like we've been here to say that. Julius Volz. Is it just Julius coming on the show or who else is coming on the show? Probably just Julius, but possibly Bjorn as well. Bjorn would be awesome. Those guys were so awesome.
Starting point is 00:56:22 Cool, yeah. We hired, we have a few SoundCloud people here, and they're all raving about it. Yeah. We'll be talking to Prometheus soon, so maybe you can tune into that episode. So you got Caffe, which is a deep learning, I don't know what you'd describe that as, deep learning framework. Yeah, or a toolkit for deep learning. So why Prometheus? So I think, you know, I did a lot
Starting point is 00:56:48 of ops in my career, a lot of, you know, a lot of SRE, and so monitoring tools is always a hot topic. There's just so many shitty tools out there. So, you know, Prometheus really looks like something fresh, something different. I haven't really tried it out yet, so that would be my first thing to do, is just get it up and running and fire some data at it and see what it does, just play with it. Very cool. Well, Tobi, it was really awesome having you on the show. For those who don't know how to reach out to you, what's the best ways to get in touch?
Starting point is 00:57:24 Twitter, GitHub? Twitter works. My Twitter handle is, it's super gunter. So that's how you can find me. We'll link to that one. Super gunter, that's awesome. Cool, man. So any other closing thoughts before we close out the show for yourself? This was fun. This was fun? All right. That's a good closing thought. All right. Cool.
Starting point is 00:57:48 Well, I want to say a huge thanks to everyone who listens to this show and specifically those members out there that support this show. We're member supported. We're also sponsored. So the sponsors we have for this show are Codeship, Toptal, and DigitalOcean. We love those guys that make this show possible. Jared, you're awesome. And Tobi, you're awesome for joining us as well today. Until next
Starting point is 00:58:11 week when we talk about Prometheus, let's say goodbye, guys. See ya. See ya. Goodbye. We'll see you next time.
