Programming Throwdown - Reactive programming and the Actor model

Episode Date: September 28, 2018

Hey everyone! This episode is an absolutely fascinating interview with Jonas Bonér, creator of Akka. We dive into reactive programming, the actor model, and the Akka toolkit. Jonas also describes his journey as a developer that led him to create Akka and found Lightbend with Martin Odersky, the creator of Scala. Jonas brings a ton of in-depth technical discussion, so buckle up! :-) Show Notes: https://www.programmingthrowdown.com/2018/09/episode-82-reactive-programming-and.html ★ Support this podcast on Patreon ★

Transcript
Starting point is 00:00:00 Programming Throwdown, Episode 82: Reactive Programming and the Actor Model, with Jonas Bonér. Take it away, Jason. Hey! So we have a really cool interview. I'm sure a lot of you know about Scala and Akka. You've heard about this. We've talked about this on the show. And we have Jonas Bonér, who is... Are you the original creator of Akka? Is that right? Yeah, I started it back in 2009. The first launch of the product was in 2009. I started hacking on it a year earlier.
Starting point is 00:00:49 So, yes. Cool, excellent. Jonas is going to explain reactive programming and the actor model. He's going to talk us through that whole revolution. I think it's an amazing model, especially for UI and for a lot of other kinds of processes. And Jonas, as an expert, is going to explain it to all of us. So, Jonas, why don't you tell us your background: how did you end up getting really into this sort of programming model? Is it sort of a series of anti-patterns that you saw along the way that made you say, oh,
Starting point is 00:01:25 we need to build something new here? So what was the motivation for that, and what's that journey been like? It's been, I guess, about 10 years, right? Yeah, yes, time flies by quickly. It actually started a bit earlier than that, even. So my journey into distributed systems and concurrent systems and stuff like that, the journey towards Akka and the actor model and everything, started back when I joined BEA Systems back in the day. I was working then on an open source product, AspectWerkz. Aspect-oriented programming was quite popular back then;
Starting point is 00:02:10 I don't know if you remember AspectJ and such. AspectWerkz was later merged with AspectJ, et cetera. So there was a lot of bytecode weaving, adding these dynamic capabilities to Java and any language that implemented AOP. So I actually worked on that open source product while working at BEA Systems. But then I was headhunted by a small startup in 2003 or something like that, that used the technology that we had built, which was, of course,
Starting point is 00:02:45 open source. A small startup in the Valley called Terracotta. They did distributed systems, and essentially tried to cluster the JVM underneath the JVM itself: to maintain the programming model of, you know, threads and locks, and stretch that out across a set of distributed nodes, and do all the messiness underneath to make that work. Essentially, if we should get geeky here, the way it actually worked was that locks and memory barriers were translated into transactional scopes. So it was those transactions that were maintained across the JVMs. And they did that completely transparently, using aspect-oriented programming and the tech that I had built. So in that case, how do you...
Starting point is 00:03:46 How do you handle the data dependency there? I mean, if somebody is treating it as threads, they might not be aware that they have to broadcast all this data, right? Exactly, exactly. I mean, when I joined the company and went out promoting it and stuff, we actually got quite far, you know, selling people on the model. But the thing is that it was a completely broken model from the start. And that started to grow on me after, you know, being out at clients,
Starting point is 00:04:21 you know, after a couple of years. And we never really got it to work. And no surprise: I'm now a firm believer in the opposite way of approaching distributed systems, where you embrace the network, embrace the constraints of the network, instead of trying to hide it. If you try to hide it, in my opinion, it turns into this leaky abstraction that leaks so much that it becomes more or less useless. You know, we've seen this many, many times in the past, with, you know, RPC and distributed objects
Starting point is 00:04:58 or XA or anything like that. You know, it works to the point where it doesn't work, and when it doesn't work, everything falls apart. Right. Where you have network disconnections and partial failures, you have really no idea how you can recover, et cetera. So that, you know, made me completely lose faith in that model.
Starting point is 00:05:23 After that, I was doing consultancy on distributed systems in Java in general, you know, embracing CORBA and all the tools they have, EJBs and all the tools they have there. And it started to grow on me that all of this is just the wrong way of approaching the problem, and I went through this crisis, in a way. I started to dig through a lot of research papers and stuff, and then I chatted with a friend that had actually, ever since school, been programming in Erlang. You know, he went straight to Ericsson, and Ericsson had developed this esoteric and quite obscure language called Erlang. And when talking with him, I realized that, wow, this is the model I've been looking for. I mean, communication, or distributed communication, is first class.
Starting point is 00:06:20 We have true isolation. You know, failure can't cascade across components or across nodes, as we've so often seen in Java and things like that. And that set me on the path of first learning Erlang for real. I'd been tinkering with it back in school, and I, of course, knew what it was, being Swedish and having friends using it, but I'd never really taken it seriously. But I felt like, of course, I can't program Erlang day in and day out; all my clients, you know, my whole life, is on the JVM. So then, okay, I'd better find a way to port that
Starting point is 00:06:57 model, because essentially the model, the principles, they're so good, over to Java. And that's how I started Akka, back in 2008. Oh, I see. So did you encounter any really fundamental limitations of, or, say, differences between, the JVM and Erlang that were serious roadblocks for Akka? Or was it more just mechanical, you know, day in, day out, let's get it done? Were there any real technical hurdles there? Yeah, one of the things that people always tend to point out when they come from Erlang, and they're absolutely right about that, is that you can't
Starting point is 00:07:41 have such a thing as true isolation on the JVM, because it is a shared heap. So isolation is only by convention, but we try to have a programming model that sets up the boundaries and tries to make it easy to do the right thing. And regardless, even if you use Akka, you can always use reflection and stuff like that to bypass it, if you really, really want to shoot yourself in the foot. So that's one of the things that's hard. And also things like garbage collection.
Starting point is 00:08:24 Erlang has garbage collection, but it has it per actor. So you have more fairness, because you have control over the garbage collector. While in Akka, on the JVM, there's really no control we have over such low-level things, and sometimes there can be more long latency pauses and stuff like that. But of course, very often those can be mitigated by doing the right thing, and we try to push people in the direction of doing the right thing: not creating too much garbage and stuff like that. That makes sense. But still, it's not the same thing as in Erlang, you know, where there's full control
Starting point is 00:09:05 over everything. The VM is built with actors in mind, which is not the case for the JVM. So in the case of Akka, I mean, you could have all the actors running in separate processes, right? But that probably has its own expense, right? Yeah, no, we don't do that. So the way it works... you know, if we should go into the actor model, should we first explain the actor model? Yeah, that's a good point. So maybe let's spiral back a little bit and talk about, you know, what is the actor model?
Starting point is 00:09:44 What is reactive programming? If someone comes, let's say they're in their first year of university, they've taken systems software and they've written some code in C, and they say, what is the actor model? What does this mean? Yeah, sure. The actor model is actually quite old. I learned it from Erlang, but Erlang took it from papers by Carl Hewitt that he wrote in the early 70s or something like that. The actor model is a complete computational model, like lambda calculus: a foundational computational model that has a different take than most of the other ones. It models communication as first class, communication and behavior. And it's called the actor model because the actor is the unit of computation.
Starting point is 00:10:47 It's sort of the unit of work that you have. And, you know, actors are extremely lightweight. At least, I mean, now we're getting down to how they usually are implemented. But both Erlang and most C++ implementations that I've seen, as well as Akka, implement actors in a way that makes them extremely lightweight in terms of how they use resources.
Starting point is 00:11:17 For example, in Akka, you can run millions of them on a regular laptop. And they don't consume... Oh, I see. So it's almost like a green thread. Like a green thread or a coroutine or something like that. Exactly. So we do multiplexing on threads, you know,
Starting point is 00:11:34 so you have this N:M mapping of actors to threads. In Akka, it's called the dispatcher, but yeah, it's that sort of thread pool that you schedule them on, that you run them on. An actor only consumes heap. Yeah, just for people who might not be familiar: basically, if you create a thread, or even more so if you create a process, a lot is actually happening under the hood. You might say just new thread, and it seems really quick, or std::thread or something in C++, but actually, under the hood, the OS is doing a ton of work there,
Starting point is 00:12:11 and so you can't realistically create, let's say, a million processes on one computer. It's just not practical. And so with a lot of these green threads or coroutines, the idea is the system creates, let's say, 32 threads right at the beginning. And then when you say, hey, I want to do this bit of work on the side, it just uses one of those 32 threads that's already been created. And if you try to do 33 things at the same time, that 33rd thing just kind of sits in a queue. And once one of the threads is free, it starts offloading that queue.
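The queue-and-thread-pool mechanism Jason describes can be sketched as a toy in Python. The `Actor` and `Dispatcher` names here are invented for illustration; this is not Akka's API, just the N:M actor-to-thread idea: actors are plain objects with a mailbox, and a small fixed pool of worker threads drains a shared run queue.

```python
import queue
import threading

class Actor:
    """A toy actor: just a name and a mailbox. Hypothetical, not Akka's API."""
    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()

    def receive(self, message):
        return f"{self.name} got {message!r}"

class Dispatcher:
    """Multiplexes many actors onto a small, fixed pool of threads.
    Actors with pending messages wait in a run queue until a worker is free."""
    def __init__(self, num_threads=4):
        self.run_queue = queue.Queue()
        self.results = []
        self._lock = threading.Lock()
        for _ in range(num_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def tell(self, actor, message):
        actor.mailbox.put(message)
        self.run_queue.put(actor)   # schedule the actor, not a new thread

    def _worker(self):
        while True:
            actor = self.run_queue.get()   # the "33rd thing" waits here
            out = actor.receive(actor.mailbox.get())
            with self._lock:
                self.results.append(out)
            self.run_queue.task_done()

dispatcher = Dispatcher(num_threads=4)
actors = [Actor(f"actor-{i}") for i in range(10_000)]  # cheap: plain objects
for a in actors[:3]:
    dispatcher.tell(a, "ping")
dispatcher.run_queue.join()   # wait until the run queue is drained
print(sorted(dispatcher.results))
```

The point of the sketch is that an actor is just an object plus a mailbox, so creating ten thousand of them costs no more than creating ten thousand objects, while only four OS threads ever exist.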
Starting point is 00:12:49 Right, exactly. So that's essentially how we implement actors in Akka. I mean, nothing prevents you from implementing actors as one actor per process. But then, I think, the whole benefit of actors falls apart, in my opinion. Actors are so great because they are not mapped to threads, so you can easily create thousands or millions of them and spread them out on the cluster.
Starting point is 00:13:14 The way I tend to look at them is very similar to how things work in nature, mapping them to how ant colonies work, or bacteria, or something like that, where you have so many, right? So if a few of them fail, it doesn't really matter, because others can take over where the failed ones left off, without really affecting the overall health of the application. And that's essentially how I like to look at actors, and why I see them as a really powerful model for the cloud, where you have all these compute resources and all these nodes available. But a few other things are characteristically important to actors. One is that all communication is first class, built into the model.
Starting point is 00:14:10 Communication is distributed by nature, or by default, where local communication is just an optimization. It is a unification of distributed and local communication. That means that they both look the same. It looks the same whether you communicate with an actor that sits right next to you, or lives on another node, or in another data center.
Starting point is 00:14:36 It looks the same. And this is what we call location transparency. And one of the benefits of that is that if you have a reasonably advanced runtime underneath here that sort of understands how actors are being used and which ones are overloaded,
Starting point is 00:14:55 which ones are underutilized, which ones are actually failing now, and stuff, you know, that runtime can optimize these things by moving the actors around. And this is, in my opinion, the key to location transparency: that the cluster can optimize itself by shuffling actors around on different nodes without affecting the experience from the user's perspective, the client.
Starting point is 00:15:21 He never really has to even know that that is going on. That's a lot thanks to the dynamicity of the actor model. One thing that I didn't say is that the way you communicate with an actor is usually through some sort of reference. In Erlang, you have what is called the PID, the process ID, that you send messages to, or communicate with. In Akka, we have what is called the ActorRef. So you never communicate with the actor directly, but only through this sort of proxy, this reference. So this means that the actor can
Starting point is 00:15:55 live anywhere. And this sort of level of indirection also gives the runtime the opportunity to manage failure in a transparent way, by failing over to other actors, even on other nodes, or to route messages, you know, to other actors of the same type, and things like that. So that's a very powerful thing. And this is, you know, also leveraged
Starting point is 00:16:16 in the failure model. So that is sort of one of the key things also that I see from the actor model, that it has this notion of supervision, that actors watch out for each other. It's completely masterless, decentralized. There's no special actors or anything like that. But each actor can watch out for other actors. And if his buddy dies, he will get a notification and say, oh, your buddy died.
Starting point is 00:16:45 What do you want to do about it? And then he can choose to escalate, because it might be above his pay grade what to do; he perhaps didn't create the actor, or stuff like that. But then he escalates up the hierarchy. Or he might decide to do something about it and restart the actor, or take over his work, and things like that. And all that can be done thanks to isolation: these actors are completely self-contained, autonomous units that don't have any strong coupling with anything else. So basically, the end user could build, let's say, some checkpointing at the actor level.
Starting point is 00:17:25 And then if they say, oh, this person that I was dependent on is supposed to give me 10 units of work, and they gave me five units of work and then died, I could use my checkpoint to go back to an earlier state where that person hasn't given me any work, and then reboot them and keep going. Exactly. And the way it's normally organized is just like an organizational hierarchy, where you have bosses, or whatever you want to call it; we call it the actor parent. So any actor can be a parent, simply by creating workers. And when you receive a notification, a message, from one of your workers, then you can choose to do something about it or escalate up, you know. And you can also have this sort of sideways error notification,
Starting point is 00:18:20 because it might not be you that created the actor, but you might be dependent on him, right? Or something like that. So then it's also interesting to get a notification that the worker you're dependent on, or the actor you're dependent on, failed. And you know, you need to encode this type of cursor or index in the protocol, of course, so you know where you were when you died. But usually, I mean, if you retain that information, it can be relayed up to the parent, and the parent can then kick off yet another worker and resume him where the other one failed, so to speak, if that makes sense.
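The supervision-and-cursor idea in this exchange can be sketched as a toy. The class names and the cursor protocol are invented for illustration; real Akka supervision uses declared strategies (restart, resume, stop, escalate) rather than this hand-rolled loop.

```python
class Worker:
    """A toy worker that processes items one at a time and tracks a cursor,
    so the parent knows how far it got. Purely illustrative, not Akka's API."""
    def __init__(self, start_at=0):
        self.cursor = start_at

    def process(self, items):
        for item in items[self.cursor:]:
            if item == "bad":            # simulate a crash mid-stream
                raise RuntimeError(f"failed at index {self.cursor}")
            self.cursor += 1
        return self.cursor               # final cursor position

class Supervisor:
    """A toy parent: on failure it is notified, reads the dead worker's
    cursor, and spins up a fresh worker that resumes past the poison item."""
    def __init__(self):
        self.restarts = 0

    def run(self, items):
        worker = Worker()
        while True:
            try:
                return worker.process(items)
            except RuntimeError:
                self.restarts += 1
                # supervision decision: restart rather than escalate,
                # resuming just past the point of failure
                worker = Worker(start_at=worker.cursor + 1)

boss = Supervisor()
done = boss.run(["a", "b", "bad", "c", "d"])
print(done, boss.restarts)   # prints: 5 1 (final cursor, one restart)
```

The cursor relayed to the parent is exactly the "checkpoint" Jason describes: it lets the replacement worker resume where the failed one left off instead of redoing everything.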
Starting point is 00:19:06 Yeah, that makes sense. So how does this work with respect to data flow? Like, let's say you're the Play framework, for example. I guess all of these actors are communicating with the same database. And so is that the way that they read and write information? Yeah, that's a great question. That leads into one of the features we have in Akka. Persistence of state is nothing that's encoded in the core actor model. Erlang has their way of doing it; they're essentially using a distributed database called Mnesia, where they put information in. Sort of a key-value store, you
Starting point is 00:19:49 can say, a little bit more than that. While we have taken another route, where we based our persistence on event logging. So each actor, if you choose it to be persistent, has an event log in which it stores the events that it receives. I don't know if you're familiar with event sourcing versus command sourcing, or so. No, I've actually never heard of that, to be honest. So go for it. We don't need to go into those semantic details. But essentially, what it does is it simply logs the messages that come in, or the events
Starting point is 00:20:28 it creates representing the state change coming out of receiving that message, actually. It might receive a message... Actors are side-effecting; we haven't said that. They're not purely functional or anything.
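The event-logging idea can be sketched as a toy event-sourced counter. This is illustrative only, not Akka Persistence's real API: a command coming in produces an event representing the state change, the event is appended to a log, and a fresh instance can rebuild its state by replaying that log.

```python
class PersistentCounter:
    """A toy event-sourced actor: state changes are stored as events in a
    log; current state is derived purely from the events. Not Akka's API."""
    def __init__(self, event_log=None):
        self.event_log = [] if event_log is None else event_log
        self.count = 0
        for event in self.event_log:   # recovery: replay, no new side effects
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "incremented":
            self.count += amount

    def handle(self, command):
        # command ("increment", n) becomes event ("incremented", n):
        # the command says what to do, the event records what happened
        if command[0] == "increment":
            event = ("incremented", command[1])
            self.event_log.append(event)   # persist first...
            self._apply(event)             # ...then update in-memory state

actor = PersistentCounter()
for n in (1, 2, 3):
    actor.handle(("increment", n))

# simulate a crash and recovery: a fresh instance replays the same log
recovered = PersistentCounter(event_log=list(actor.event_log))
print(recovered.count)   # prints 6: state rebuilt from the events alone
```

The command/event naming split above hints at the event-sourcing-versus-command-sourcing distinction Jonas mentions: here it is the resulting events, not the raw incoming commands, that are logged.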
Starting point is 00:20:44 The whole point of actors is to do stuff when they receive messages, so they are side-effecting. And so whenever you receive a message, you have the opportunity to create an event representing the state change that you did as an effect of receiving that message. And then we have a way of logging that event, representing the state change, to a persistent event log that is replicated and fully durable, fully decentralized, and all those things. And then, since all, you know,
Starting point is 00:21:22 all events representing state changes are logged in order, as they happen, when an actor fails, what we do is simply replay the event log, bringing the actor up to speed to where it was. And the beauty of having this event log is that you can use it for a lot more things than just bringing up the actor when it failed. You can actually have other consumers of this event log: for example, replicas that sit and read events constantly, so they're always hot. Or for audit purposes: you have a strong audit log, and everything that
Starting point is 00:21:59 went on in the system, you can just go in and see. Or for debugging purposes: you can have one actor that essentially only sits and reads events and replicates them to an external system, that you can use for replaying things when things go wrong. Much slower, like event after event, as you're debugging the system to find out what went wrong, and things like that. So it opens up a lot of interesting things, having this type of architecture with event logging, I believe. Yeah, that totally makes sense. So what's the connection between the actor model and reactive programming? So reactive programming, as I understand it... I've used a little bit of Angular and some React.js
Starting point is 00:22:50 and things like that. And basically, in this environment, you have parts of the website that are just monitoring for variables to change. So, a very simple example: you have some Angular website. And you say at the top, it's going to say hello, name. And maybe at the very beginning, you know, the name is just empty or null or something like that. But then, very quickly,
Starting point is 00:23:16 you know, the name gets populated. And as soon as that happens, the hello name is monitoring that variable, and it says, oh, the name has changed to Jason, so let's put hello, Jason. And this all happens so quickly, and there's probably no rendering along the way, that people don't notice that's what's going on. But under the hood, there's a lot of monitoring and triggering. And so that's how I understand Angular. How does that reactive model compare,
Starting point is 00:23:50 or how does that work with actors? Yeah, that's a great question. First, I have to say that reactive has started to become a quite overloaded word. It means a lot of things. People have one context, or one way of mapping it: it means one thing to, for example, a web developer, another thing to perhaps a low-level systems programmer, et cetera,
Starting point is 00:24:21 et cetera. And the way I look at it is that I see it as two different categories. The whole family of reactive, I see it as two different things. First, we have reactive programming, and I can briefly explain what that means, at least what I think it is. But we also have what we usually call reactive systems. And I see them as two different things, you know: one is for programming local things, while the other one is to model distributed systems, distributed communication. And one is a subset of the other; I see reactive programming as a subset of reactive systems. I can get back to reactive systems,
Starting point is 00:25:05 but if we should start with reactive programming: I see it as a variation, or a subset, of asynchronous programming, where the whole idea is that the availability of new information drives the logic. So you have this data flow graph, you know, that's extremely lazy, that doesn't do anything unless the information is available. And as soon as it's available, it flows downstream, triggering a bunch of behaviors or
Starting point is 00:25:36 changing data flow variables and things like that. So it allows you to decompose the problem into multiple discrete steps that are well-defined. And each step can be executed in a fully asynchronous and non-blocking fashion, which is great; it maps very well to modern hardware, like multicore and things like that. And then they can be recomposed to produce a workflow or, as I said, a graph. And it usually reacts to a completely unbounded flow of information, like a stream of information, so there's really no end to it; it just reacts to its environment, so to speak. And reactive programming, as I said, the way I see it at least, is that
Starting point is 00:26:23 it's for local computation. It doesn't really have a distribution model. So it's more event-driven than message-driven. We perhaps don't need to go into all the semantic differences between them, but
Starting point is 00:26:40 the way I view it, in short, is that event-driven is like you simply emit events to whoever is interested. You know, it can be zero, it can be 200, it can be a million; it doesn't really matter, and the guy emitting the event really doesn't know. And it's all emitted in a fully local fashion. While message-driven is really about direct communication between parties. So it models direct, addressable communication.
Starting point is 00:27:10 You know who you're talking to and why, et cetera. And that means that you can actually use it to cross address boundaries, while reactive programming is really all within one address boundary. That makes sense. But the APIs, if we should get concrete from a programming perspective, you can categorize into two different groups.
Starting point is 00:27:35 One is callback-based. That's this old Node model; or, well, it's a lot older than that, of course, but Node popularized it. You have this event loop, with anonymous side effects and callbacks that, you know, read events from event sources. But the other one, which is becoming more popular now, is a more declarative model,
Starting point is 00:27:59 where you use functions, you use function composition, and, you know, things like map and filter and fold and things like groupBy, these things. And that's the approach that most of these distributed streaming products have also started to use. And no surprise, I think, because I definitely favor the declarative model: it allows for composition, while the callback model is really hard to compose, and it's really hard to do error handling with callbacks, et cetera. You know, callback hell and all these things. So we've actually seen a lot of interesting products in the space of the declarative approach to reactive
Starting point is 00:28:45 programming. None of this is new, by the way, but we've seen it being popularized. One is, for example, futures and promises. That's quite an old concept, but it's becoming really popular, both in web development as well as systems programming and application development. And we have them in Akka, of course. And the other one I will touch upon is streams. You can run these streams locally; we have streams in Akka, called Akka Streams, that implement the Reactive Streams specification. That all runs locally, and it's a way to orchestrate workflows and do local data processing in a very low-latency, highly efficient way. But you can also do it in a distributed fashion, you know.
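The two API styles can be contrasted in a short Python sketch. The `fetch` function and its callback signature are invented for illustration; the point is only the shape of the two styles, not any real framework's API.

```python
from functools import reduce

# Callback style: logic is buried in nested callbacks, which are hard to
# compose, and errors have to be threaded through by hand.
def fetch(value, on_success, on_error):
    if value < 0:
        on_error(ValueError("negative input"))
    else:
        on_success(value)

results = []
fetch(2,
      lambda v: fetch(v * 10, results.append, results.append),
      results.append)

# Declarative style: the same kind of pipeline as a composition of pure
# stages (map, filter, fold). Each stage is a discrete, well-defined step
# that could in principle run asynchronously; here it stays synchronous.
events = [1, 2, 3, 4, 5]                             # stand-in for a stream
pipeline = (x * 10 for x in events if x % 2 == 1)    # map + filter, lazy
total = reduce(lambda acc, x: acc + x, pipeline, 0)  # fold
print(results, total)   # prints: [20] 90
```

The declarative pipeline stays lazy until the fold consumes it, which is the "doesn't do anything unless the information is available" behavior Jonas describes, and each stage composes with the next without any nesting.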
Starting point is 00:29:32 So that sort of sums up my view on reactive programming. I don't know if you want me to touch on reactive systems as well. Yeah, definitely. That would segue really well into the next part. Yeah, go ahead. Yeah, so reactive systems, as I said, are definitely a superset of reactive programming, but they try to expand those capabilities into a set of good design principles, I'd say, for distributed systems. So we tried to encode that in what's called the Reactive Manifesto.
Starting point is 00:30:10 It's something I started and then later evolved through contributions by others. But it's essentially an attempt to come up with a common vocabulary and a set of design principles for building modern systems that are ready for multicore, for cloud computing, IoT, data streaming, and all of these things. And I always feel inclined to say that nothing of this is new. These principles can be traced back to the 70s and 80s: the work by Jim Gray, Pat Helland,
Starting point is 00:30:49 and some of my old-time heroes, and also Joe Armstrong of Erlang. And the foundation for reactive systems is message passing. It's not event-driven, if you want to be pure about the semantics, because, as I said, event-driven really doesn't have a way of doing addressing; you can't cross address boundaries.
Starting point is 00:31:10 While message passing, it's all about sending a message to someone, to a destination. So it's really the best way to model distributed communication, I believe. And I think the key here, what we build on when we talk about reactive systems, is that message passing really creates this sort of temporal boundary between components that allows them to be fully decoupled. They can be decoupled in time, and this is what allows for concurrency. And they can be
Starting point is 00:31:46 decoupled in space also, and this is what allows for distribution and mobility and location transparency, as we talked about: this level of indirection and this full isolation that also paves the way for self-healing systems, systems that recover from failure without affecting other parts of the system. So the Reactive Manifesto sort of ends up with the story that message passing can give you resilience through self-healing systems, through the properties of isolation, all these things, as well as elasticity: having the system grow and shrink on demand, living up to the promise of cloud computing, et cetera. So, yeah. A number of
Starting point is 00:32:30 questions. I mean, one question before I jump into a whole suite of other questions: you talk about the orchestration of this sort of dynamical system, right? Is there any work on, I wanted to say reactive, but any work on a really advanced sort of supervision? So, for example, for every actor type or every actor implementation, something could monitor those and say,
Starting point is 00:33:04 oh, this particular actor is very expensive. And I've learned it's expensive because this program has been running for 10 minutes, or it ran yesterday, or something like that. And so we could plan ahead and say, you know, these sets of actors need to run on their own machine because they're super expensive; these other ones are really lightweight. I mean, my guess is, I don't know if it's possible in the context of the JVM, but in general, it sounds like there's a lot of open research around supervising and moving around these actors
Starting point is 00:33:40 to get optimal performance. Absolutely. And we do that to some extent in Akka. And long-term, the vision that I have, I'd like to see fully adaptive systems, you know, leveraging AI and all these things,
Starting point is 00:33:59 you know, but that's sort of far out. So what we have in practice running today is simpler: metrics-based routers, for example. Those can act upon low-level metrics, you know, latency and throughput and these types of things, and, as you say, failure rates and things like that. And we also allow you to tag certain groups of actors as high priority or lower priority, or give them different roles and things like that, and that can also be taken into account in how you prioritize work across the cluster. But one thing that we haven't said, I sort of touched a little bit upon it, but one of the really fascinating things with actors is that even though we implement it in a statically compiled language like Java, for example, or Scala,
Starting point is 00:35:04 you know, it's actually implemented in Scala. I mean, the actor model by itself gives you a lot of dynamicity, because the actor, when it receives a message, can redefine its own behavior prior to receiving the next message. So it can completely change the way it behaves. And this means that it can actually, for example, if it feels like it is overloaded, turn itself into a router and spin up, you know, like 20 different routers, even on other machines,
Starting point is 00:35:41 and start relaying. And once the traffic decreases, it just kills them and restores the old behavior, doing the work itself. So this is probably a contrived example, or I don't know if it is that contrived, but things are usually more advanced than this, and we have capabilities that solve this better, but it tries to illustrate the
Starting point is 00:36:09 dynamicity of the actor model, that these actors are fully dynamic in the way that they redefine what they are, really, and as well as being moved around, you know, along with their state, and so that opens for a ton of possibilities. Yeah, I mean, I think it's fascinating.
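To make that dynamicity concrete, here is a toy, self-contained sketch of an actor redefining its own behavior between messages, in the spirit of Akka's `context.become`/`unbecome`. The names and structure here are invented for illustration; this is not the real Akka API, and a real actor would receive messages through a mailbox rather than direct calls.

```scala
// Toy sketch of an actor swapping its own behavior between messages.
// All names are made up for illustration; not the real Akka API.
import scala.collection.mutable.ListBuffer

class ToyActor {
  val log = ListBuffer.empty[String]

  // The current behavior is just a function from message to Unit.
  private var behavior: String => Unit = normal

  // Normal mode: do the work itself; switch behavior when overloaded.
  private def normal(msg: String): Unit = msg match {
    case "overload" => behavior = relaying   // redefine own behavior
    case m          => log += s"handled: $m"
  }

  // Relaying mode: pretend to forward work to spun-up routers.
  private def relaying(msg: String): Unit = msg match {
    case "recover" => behavior = normal      // restore the old behavior
    case m         => log += s"relayed: $m"
  }

  // In a real actor system, messages arrive via a mailbox, one at a
  // time; here we just call receive directly.
  def receive(msg: String): Unit = behavior(msg)
}

val a = new ToyActor
List("job1", "overload", "job2", "recover", "job3").foreach(a.receive)
println(a.log.mkString(", "))
// handled: job1, relayed: job2, handled: job3
```

Because messages are processed one at a time, swapping the behavior between messages needs no locking, which is what makes this trick safe in the actor model.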
Starting point is 00:36:29 I mean, I think an actor could try to say, like, is the data, you know, is the IO going to justify the, you know, the computation boost I get from spinning up a bunch of sub-actors? And if not, I'll just do it myself or something like that. Exactly. Exactly.
Starting point is 00:36:48 Exactly. I mean, it's always a trade-off. You know, there's no right answer. It's all contextual. And even if you have all the information, you can easily make the wrong call anyway, you know. So that's why there's that dynamicity, where there's no fixed topology. You know, the classic way to do it, at least the way I learned to build a distributed system,
Starting point is 00:37:13 was that you have to design the topology of the system up front, you know, where things should run, etc. You need to redeploy or reboot things to change it. But having this dynamicity means that the topology can evolve, and it will probably be completely different from how you initially deployed it after it has been running for a while. Yeah, that makes sense. So what's the difference between, let's say, the actor model in Akka and something like MPI? Yeah, yeah.
Starting point is 00:37:47 First, I'd say, you know, of course, MPI being native, it can be used for things that are probably a lot more low-latency, that don't have the overhead or the unpredictability of the garbage collector, things like that. But also, you know, I personally never used MPI, but does that map to threads directly,
Starting point is 00:38:10 one-to-one, or does it do multiplexing? Because that could be one of the differences. Yeah, I mean, I think MPI is very heavy-handed, so I think every MPI node definitely needs to be a thread, maybe even a process. I don't know if there's... I think there might be shared memory. So I think each MPI node has to be its own process, if I remember correctly. Yes, I'd say that there's nothing wrong with that model. I mean, there is a certain class of use cases that fits that model, by having essentially one worker sitting in the same place, so to speak, with all the caches hot, with no context switching at all. It can just do all the work that you give it with the lowest latency possible.
Starting point is 00:39:10 You know, that model is great when you have a limited and static number of workers, where you know that you have, for example, ten workers or five workers, and you just want to hand out work and have them go as fast as they can possibly go. The single-writer principle is a really good guiding principle when it comes to modern hardware: you just keep the caches hot, go as fast as you can, and don't let go, because then you might be suspended. The actor model, of course, can do that too, but a lot less efficiently, because an actor can be rescheduled on another thread, on another core. And it's really hard. There are, of course, native libraries, JNI libraries, that try to solve the problem of pinning threads to cores,
Starting point is 00:40:05 and we can pin actors to threads and things like that. But that sort of violates the idea of the model, because the idea of the model is to have hundreds of thousands of these things, which gives you a very different way of programming, and that fits another class of problems. So it's really comparing apples and oranges. For example, if you're on the JVM, there is a great library that implements more of the MPI model, called the Disruptor. It was created by a guy called Martin Thompson, working for LMAX, building sort of high-frequency trading exchanges and stuff like that. I'm not sure about the high-frequency part, but he's been involved in these types of things. It's actually a stock exchange
Starting point is 00:40:47 with, of course, extremely low latency guarantees, where they have a fixed number of workers and where the actor model would be a really bad fit. But for a large class of problems, especially when we talk about microservices, and in general cloud application development, streaming, all these things,
Starting point is 00:41:06 I think actors are a really good tool to use. Yeah, that makes sense. I mean, it's hard to know what something is capable of, but it's easier to talk about how it's typically used. And the way MPI is typically used is, as you said, one MPI node per machine usually, and very limited passing of data back and forth. A lot of it is really kind of done by hand. And I think the biggest thing is, it sounds like with the actor model, you can kind of build it on one machine and then be kind of confident that it will scale out, whereas with MPI, it's
Starting point is 00:41:47 much more tailored to your specific hardware and the specific environment. What about something like Spark or Hadoop, one of these, I don't know what you would call them, kind of big data ETL type libraries? What would be the trade-offs? Why would somebody use, let's say, Spark instead of Akka, or Akka instead of Spark, or something like that? That's a great question.
Starting point is 00:42:17 And I think they compose pretty well. First, you can definitely use Akka to build something like that. For example, Flink, I don't know if you know it, is built right on top of Akka, using all of these features that we talked about. And, you know, so it's more of a low-level programming model
Starting point is 00:42:38 when you only talk about the actor part of Akka. But what we have added is a lot of things on top. We have Akka Cluster, for example, which does peer-to-peer gossip-based clustering, similar to Dynamo or Chord, or I don't know what you're familiar with in terms of research papers, but, you know, similar to Cassandra: masterless, fully decentralized clustering. And another tool that ties into the streaming, as I said, is the Akka Streams library, which gives you a great toolbox to do quite advanced streaming, you know, fanning out, fanning in, doing data processing
Starting point is 00:43:27 and things like that. But we have chosen to only support local processing, focusing more on very, very low latency and high throughput. So if you want to do distributed stream processing, of course you can use something like Akka, but you will have to build a lot of things yourself. You can perhaps stitch things together. We actually have support for that, called StreamRefs, where you can stitch together
Starting point is 00:43:55 local Akka Streams nodes across the cluster. And for simple uses, that's great. That's why we built it: for customers that love that model but only want to be able to scale up a little bit. But if you have big needs for fast data processing, then you should definitely use the tools tailored for that, and then Spark Streaming or Flink or something like Google Cloud Dataflow are great models for that. Of course, I mean, they compose. You can have actors being the services at the endpoints, the stateful endpoints,
Starting point is 00:44:33 receiving data from external systems, or being the application point for microservices at the endpoints of the streaming pipeline. And so they absolutely compose nicely. That makes sense. One question about the actor model. Do the actors... so one thing that I think makes this a little different from something like Ray or one of these other systems
Starting point is 00:45:04 is I think the actors can actually send data before they have terminated. So most of the time, when you have these kinds of systems, at least I kind of think of it as this functional thing, where I send inputs to some process and I get back outputs, and that can fan out however it does. But with the actor model, something could be running maybe even perpetually, and it's sort of a permanent thing that's getting data and sending data, getting messages and sending messages back, but it doesn't have to just send back on termination. I feel like that opens up a lot of opportunity. Yeah, it's really good
Starting point is 00:45:45 that you point that out. I should have said that one of the areas where actors really shine is in being long-lived, addressable, stateful objects. Long-lived meaning that they outlive the context, the application context or scope, stuff like that. And addressable means that you have a stable reference, so you can always send a message to them.
Starting point is 00:46:16 And stateful is probably the most important thing, how they're distinguished from a lot of these other concurrency constructs, like most data flows, which are stateful within, but they terminate and then that's the end of it. Or futures, which have one value, but they're not really stateful in the sense that they
Starting point is 00:46:38 are long-lived, addressable, stateful things. So that's really how I would use them. If you only have a need for essentially stateless data processing, then I would not use actors; there are better tools for that. I would use some sort of stream processing, some sort of data flow graph, or rely on futures or promises; even for local computation I wouldn't rely on actors. But as soon as you have this exact need that you point out, then actors are extremely handy. So how does that work when you say they're addressable? Like, is there almost like a DNS type thing going on when you start an Akka system, where you can say, give me worker 23 or something
Starting point is 00:47:27 like that. I want to send them a message. We'll figure out a way to do that. Yeah, exactly. I mean, first, the way it works in Akka is that if you just use Akka actors, it doesn't give you any DNS capabilities. But it all starts from the top-level actor that creates actors, and then you need to make sure that you pass along the references that each actor needs to have,
Starting point is 00:47:54 and when an actor uses another actor, its reference, its ActorRef, the handle, is passed along. So you can just store that away and communicate back, etc. So that's how you can populate things. But that's, of course, very limiting. So one of the features of Akka Cluster, which adds the clustering capabilities, is what we call sharding, Akka Cluster Sharding, and that essentially gives you what you asked for. Essentially, the addresses of actors are gossiped around, and they can change.
Starting point is 00:48:34 You can have consistent hashing, consistently handing out actors across the node ring, the set of distributed nodes in the cluster. And when nodes come and go, that needs to be repartitioned. Actors will then be reallocated or moved around, and that information, the news of the DNS or addressing information, will be gossiped around, so everyone has the latest news, so to speak. It sounds like the whole problem of, you know, I need a shared, let's just say, key-value store, and we could treat DNS as a key-value store, like I have this sort of shared key-value store that needs to be replicated among all the actors.
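The scheme Jonas describes, hashing actors onto a ring of nodes so that repartitioning moves only a fraction of them, can be sketched with a toy consistent-hash ring. The names and details below are invented for illustration; this is not Akka's actual sharding implementation.

```scala
// Toy consistent-hash ring: a sketch of how cluster sharding can map
// actors ("entities") to nodes so that when nodes come and go, only a
// fraction of the actors have to move. Not Akka's real implementation.
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

final case class Ring(nodes: Set[String], vnodes: Int = 16) {
  // Place each node at several points ("virtual nodes") on a hash ring.
  private val ring: TreeMap[Int, String] =
    TreeMap.from(
      for { n <- nodes.toSeq; i <- 0 until vnodes }
        yield MurmurHash3.stringHash(s"$n#$i") -> n
    )

  // An entity is owned by the first node clockwise from its hash;
  // wrap around to the start of the ring if we fall off the end.
  def nodeFor(entityId: String): String = {
    val h = MurmurHash3.stringHash(entityId)
    ring.iteratorFrom(h).nextOption().getOrElse(ring.head)._2
  }
}

val before = Ring(Set("nodeA", "nodeB", "nodeC"))
val after  = Ring(Set("nodeA", "nodeB"))   // nodeC has left the cluster
val ids    = (1 to 1000).map(i => s"actor-$i")
val moved  = ids.count(id => before.nodeFor(id) != after.nodeFor(id))
println(s"$moved of ${ids.size} actors moved") // only nodeC's actors move
```

The key property: removing a node removes only its points on the ring, so every entity that was owned by a surviving node keeps exactly the same owner.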
Starting point is 00:49:26 That sounds like a hard problem. I mean, it sounds like there's conflicts that could happen. Someone could find out about an actor, and that actor's already dead, or something like that. It almost seems like there's a lot of complexity around having a consistent key-value store that's gossiped among all the actors. Like, how does that actually work? Absolutely, you're right there. And, you know, we rely on both old and recent research there. As I said, this epidemic gossiping is based on reasonably old papers now, and the same thing with the failure detection algorithms and things like that. But when it comes to replicating the state, we rely first on vector clocks, which is a quite old thing. Was it invented by Leslie
Starting point is 00:50:22 Lamport back in the day? Perhaps, I don't remember. I think he invented Lamport clocks, and then someone else invented vector clocks and stuff like that. But anyway, that's quite old. But we also, you know, rely on quite recent research when it comes to disseminating state, and that is something called CRDTs, conflict-free replicated data types, which is quite recent research. A vector clock
Starting point is 00:50:51 is actually a CRDT, but CRDTs sort of generalize that by giving you a way of expressing state in a fully monotonically increasing fashion, with a merge function, so that you can always merge. It sounds similar to this operational transform type stuff, where you can fast-forward any of these edits. It's very similar. I think it was parallel research: the CRDT work was, I think, started by Marc Shapiro at Microsoft, while the operational transform work was at Google, right? So I think it was done more or less in parallel.
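A tiny worked example of the CRDT merge idea just described: a grow-only counter (G-Counter), one of the simplest CRDTs. Its merge is an element-wise max, which is commutative, associative, and idempotent, so replicas converge no matter the order in which they exchange state. This is a sketch of the concept, not Akka's Distributed Data API.

```scala
// Toy G-Counter CRDT: each replica bumps its own slot, and merge takes
// the element-wise max. Because merge is commutative, associative, and
// idempotent, replicas converge regardless of gossip order.
final case class GCounter(slots: Map[String, Long] = Map.empty) {
  def increment(replica: String, by: Long = 1): GCounter =
    copy(slots.updated(replica, slots.getOrElse(replica, 0L) + by))

  def value: Long = slots.values.sum

  // The CRDT merge: monotonic, element-wise maximum over all slots.
  def merge(other: GCounter): GCounter =
    GCounter((slots.keySet ++ other.slots.keySet).map { k =>
      k -> math.max(slots.getOrElse(k, 0L), other.slots.getOrElse(k, 0L))
    }.toMap)
}

// Two replicas diverge, then gossip their state to each other.
val a = GCounter().increment("A").increment("A")   // replica A saw 2 increments
val b = GCounter().increment("B")                  // replica B saw 1 increment
println(a.merge(b).value)                          // 3
println(b.merge(a) == a.merge(b))                  // true: order doesn't matter
```

The "strong eventual consistency" mentioned below falls out of exactly these merge properties: once two replicas have seen the same updates, in any order, they hold identical state.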
Starting point is 00:51:36 But CRDTs have sort of exploded in the research arena after the initial papers, and there are now models for CRDTs that are not just the simple things like registers and counters, but also things like maps and sets and even, to some extent, graphs. So you can model quite rich data structures, as long as you adhere to the rules, and be sure that they are eventually consistent. They are not
Starting point is 00:52:06 strongly consistent, but you can be guaranteed that they will always converge eventually, you know, so you have strong eventual consistency. That makes sense. Yeah. So just to tie it back to something everyone knows: Google Docs. So you have a Google Doc, you're editing it, your friend is editing it, and let's say you both go to the same cursor and you both hit delete at exactly the same time to delete some character. Now what's going to happen is one of you is going to arrive first, and that delete is going to take effect. Then the second person shows up with the delete, and basically, to simplify this, there'll be some bookkeeping going on. So we know that when that person hit the delete key, we know their state when they did that. And so the system can modernize that delete, or bring it forward in time. And when it does that, it will encounter the previous delete.
Starting point is 00:53:01 And it will say, oh, this person didn't really intend to hit delete after that delete; they intended to do the same thing. And so we'll just, you know, not execute that second one, or we'll figure out some way to amend it. You know, if someone deletes a letter at exactly the same time another person deletes the entire line, then we'll just delete the whole line. But all of that is, you know, almost like a git rebase or something like that, except you can't rely on a person to do the merge. You have to come up with a set of rules
Starting point is 00:53:32 that can be executed autonomously. Right, right. And, you know, these sort of primitive ways of doing that, like last-write-wins and stuff like that, are usually not sufficient, because you will have data loss. And a lot of key-value stores implement it like that, which is quite fascinating, while others rely on things like vector clocks and even CRDTs to actually do a proper
Starting point is 00:53:58 merge without discarding data. Cool, that makes sense. So as far as recovering from node failure, is that something that Akka pushes onto the user? So I guess Akka provides the user with a notification of a failure, and then the developer has to figure out how to reconcile that? No, that's not the way it works. So if by the user
Starting point is 00:54:26 you mean the user of one of the actors: as I explained earlier, you have this level of indirection. You have the actor reference that you talk to, and you never really see or touch or know much about the
Starting point is 00:54:42 actor, unless you use monitoring software, of course, which you should. From a programming perspective, you can of course subscribe to a set of events on how the actor is doing, but you don't have to do that. But since the responsibility of recovering from runtime failures and node failures... what was the specific question you asked there? Yeah, no, it's fine. So specifically, if an actor fails, then I guess the system will spin up a copy, but then it also has to notify. Actually, that's an interesting thing. Like, if an actor fails, we have to know who is impacted by that, because it could actually be anybody, potentially.
Starting point is 00:55:34 So I guess potentially everyone has to be notified of that. Yeah, everyone that has, you know, essentially the way it works is that, you know, it can be any number of actor refs, references to that actor. And if an actor fails, it can be, of course, restarted on the same machine very quickly. But the bigger problem and the more common problem, or perhaps not more common problem, one is user error normally or running out of memory or something like that. But the more interesting problem is probably on node failure, where the whole machine goes down. And, you know, what's happening then is that we rely on ACA cluster there to do failure detection, you know, and that's of course
Starting point is 00:56:27 a really hard problem, because it's very easy to have false positives. I mean, you might think that the node you're communicating with is down. We do some heartbeating, you know, pinging around, and heartbeats might be delayed for various reasons. It might not be that the node failed; it might be that it's just doing garbage collection, because we're on the JVM, so it's just really, really slow and busy. Or it might just be overloaded with user requests. Or there might be a temporary network glitch, so the node is still alive.
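A toy version of the dilemma: a fixed-timeout heartbeat detector. Akka's real detector is an adaptive phi-accrual detector; this simplified, invented sketch just shows that a "down" verdict is always only a suspicion, since a slow or partitioned node looks exactly like a dead one.

```scala
// Toy heartbeat-based failure detector with an injectable clock.
// A fixed-threshold sketch, not Akka's adaptive phi-accrual detector.
final class Detector(timeoutMillis: Long) {
  private var lastHeartbeat = Map.empty[String, Long]

  def heartbeat(node: String, now: Long): Unit =
    lastHeartbeat += node -> now

  // "Suspected" rather than "dead": we cannot distinguish a crashed
  // node from one that is merely slow (GC pause) or partitioned away.
  def isSuspected(node: String, now: Long): Boolean =
    lastHeartbeat.get(node) match {
      case Some(t) => now - t > timeoutMillis
      case None    => true // never heard from it at all
    }
}

val d = new Detector(timeoutMillis = 1000)
d.heartbeat("node1", now = 0)
println(d.isSuspected("node1", now = 500))   // false: heartbeat is fresh
println(d.isSuspected("node1", now = 2000))  // true: dead, or just a long GC pause
```

Tuning the threshold trades false positives (declaring a slow node dead) against detection latency, which is why adaptive detectors track the observed heartbeat interval distribution instead of using a fixed cutoff.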
Starting point is 00:57:01 And that's really hard to do. So we have heuristics, and we have ways to define certain thresholds for these heuristics. And if, based on these heuristics, we have to decide that a node is down, then there are different algorithms for how to resolve that. First I have to say that the problem in most cases is a problem of split brain: you don't know, it might actually only be a network disconnect, splitting up the cluster into two
Starting point is 00:57:49 different halves. Then you have another problem, which is: which side of the data center should you let keep running? Because if you just spin up all the actors on both sides,
Starting point is 00:58:08 each thinking that the other half is down, then you run into the problem of duplication, and you can run into all kinds of data inconsistencies, right? So you need some sort of intelligence here to make a good decision. One half of the cluster needs to decide, I'm out, and the other half won, or vice versa. And there are different algorithms for that,
Starting point is 00:58:32 and we don't need to go into specifics, right? But they're all based on your needs for the use case. One example might be that there is one critical actor that you absolutely need for the system to function, and it's on one side. Then, of course, that side wins, even though it only has two nodes and there are 200 on the other one. Sorry, it's really bad luck. Or it's simple majority: the majority of nodes wins, while the smaller cluster has to reboot or hold.
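The two strategies just mentioned can be sketched in a few lines. The names here are hypothetical; Akka's real split-brain resolver offers more strategies than these (for example keep-oldest and lease-based variants).

```scala
// Toy split-brain resolver: given the node sets visible on each side of
// a suspected partition, decide whether "our" side keeps running. The
// losing side must shut itself down to avoid running duplicate actors.
final case class Partition(nodes: Set[String])

// Strategy 1: simple majority. The side with more nodes survives.
def keepMajority(mine: Partition, other: Partition): Boolean =
  mine.nodes.size > other.nodes.size

// Strategy 2: keep the side that hosts one designated critical node,
// regardless of how many nodes each side has.
def keepReferee(mine: Partition, referee: String): Boolean =
  mine.nodes.contains(referee)

val left  = Partition(Set("n1", "n2"))
val right = Partition(Set("n3", "n4", "n5"))
println(keepMajority(left, right))          // false: left side steps down
println(keepReferee(left, referee = "n1"))  // true: left hosts the critical node
```

Note both sides must reach the same verdict from their own local view; that symmetry (my majority implies your minority) is what prevents both halves from surviving, or both from shutting down.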
Starting point is 00:59:06 So this is really a hard problem. But once you've detected it, it's really about resuming the actors on the healthy nodes, repartitioning the cluster to get a balanced allocation of actors on the nodes that are still running, and also gossiping around the new address information, so all the actor refs in the cluster, meaning all clients, can start using the actors in their new locations. And the users of these actors should never find out, apart from the latency, of course, that it takes for this whole process to happen.
Starting point is 00:59:53 It seems like, to program in a defensive way, it seems really important to segment the data you're receiving from each actor. So, for example, a degenerate example is where you spin up a bunch of actors and, let's say, they send messages to you, and your job is to just concatenate all of these messages, or accumulate all of these messages, right? Maybe they're sending back numbers and you're just adding them up. And so you've added up, you're up to, you know, 1027, and then an actor dies. Well, now you're kind of in trouble, because you don't necessarily know the contribution.
Starting point is 01:00:32 You can't separate the contribution of that actor from the other ones. And so, unless you have some way of restarting that actor at exactly the right spot, that death needs to sort of cascade upwards, because you're inconsistent, right? So it seems like people have to program in a way where they're keeping track of who said what, and that way they don't end up in a situation where they can't recover. Right. Yeah, the way it's usually solved is by using event logging.
Starting point is 01:01:06 Then the actor doesn't need to keep track of that itself. But whatever made it to the actor, or rather, whatever the actor has actually done, is persisted. So it knows exactly where it was when it died, because it can just replay the log, bring itself up to speed, and continue to take more requests. While the sender of those messages, of course, knows what it sent, because it didn't get an ack of the message. And by the way, we have support for that: guaranteed delivery as well, through replay, so resending the message, deduplication, and at-least-once
Starting point is 01:01:52 delivery as well. So I think we have more or less the whole chain covered if you layer in these things. This is also one of the core philosophies of Akka: the bare-bones actor doesn't have any guarantees. It's fire-and-forget. You're on your own. And that's simply because that's the most performant, the least expensive, and some might want that.
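Tying this back to Jason's 1027 example: here is a toy event-sourced counter that persists every event to a log and rebuilds its state by replaying the log after a crash. This is a sketch of the idea behind event logging as Jonas describes it, with invented names, not the Akka Persistence API.

```scala
// Toy event-sourced counter: every state change is appended to an event
// log first, and a restarted instance rebuilds its state by replaying
// the log. A sketch of the event-logging idea, not a real journal.
import scala.collection.mutable.ListBuffer

final case class Added(amount: Long)

class DurableCounter(journal: ListBuffer[Added]) {
  private var total: Long = 0L

  // Recovery: replay every persisted event to rebuild in-memory state.
  journal.foreach(e => total += e.amount)

  def add(amount: Long): Unit = {
    journal += Added(amount)   // persist the event first...
    total += amount            // ...then apply it to in-memory state
  }

  def value: Long = total
}

val journal = ListBuffer.empty[Added]
val c1 = new DurableCounter(journal)
c1.add(1000); c1.add(27)

// "Crash" c1 and spin up a replacement backed by the same journal:
val c2 = new DurableCounter(journal)
println(c2.value)   // 1027: state recovered purely from the log
```

In a real system the journal would be durable storage rather than an in-memory buffer, but the recovery logic is the same: state is a fold over the event log.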
Starting point is 01:02:20 So why should you pay for more than you absolutely need? But we then have layers that you can layer in, in terms of reliability. For example, when it comes to communication, you essentially just use a mix-in, a trait called AtLeastOnceDelivery, that does what it says: at-least-once delivery, with deduplication and retransmission of messages, etc. So it gives you that reliability when it comes to the communication. That, of course, costs a lot. Well, not a lot, but it costs what it costs, of course.
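A minimal sketch of what such a layer does, with invented names and structure (not the actual Akka trait): the sender retransmits until it sees an ack, and the receiver deduplicates by message id, so redeliveries are harmless.

```scala
// Toy at-least-once delivery: retransmit until acked, deduplicate by
// message id on the receiving side. A hand-rolled sketch of the idea.
final case class Msg(id: Long, payload: String)

class Receiver {
  private var seen = Set.empty[Long]
  val delivered = scala.collection.mutable.ListBuffer.empty[String]

  // Returns an ack (the id). Duplicates are acked but not re-applied.
  def onMessage(m: Msg): Long = {
    if (!seen.contains(m.id)) {
      seen += m.id
      delivered += m.payload
    }
    m.id
  }
}

class Sender(receiver: Receiver, lossy: Msg => Boolean) {
  def send(m: Msg): Unit = {
    var acked = false
    while (!acked) {           // retransmit until acknowledged
      if (!lossy(m)) {         // the channel may drop a transmission
        acked = receiver.onMessage(m) == m.id
      }
    }
  }
}

val r = new Receiver
var drops = 2
// A flaky channel that drops the first two transmissions.
val s = new Sender(r, lossy = _ => { if (drops > 0) { drops -= 1; true } else false })
s.send(Msg(1, "hello"))
s.send(Msg(1, "hello"))       // duplicate send: deduplicated by the receiver
println(r.delivered.toList)   // List(hello)
```

This is exactly the "pay for what you need" trade-off: the retransmission buffer and the dedup set are the cost of upgrading fire-and-forget to at-least-once semantics.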
Starting point is 01:02:56 And you don't need to write it yourself. And when it comes to the consumer side, as I said, we allow you to layer in event logging, which, of course, also has a cost, because it needs to go down to disk and commit every message. And in order to do that in a strongly consistent fashion, it needs to wait until the message has been committed down to a reliable medium, which is pluggable, by the way, so you can plug in almost anything you like. But that also has a cost, you know,
Starting point is 01:03:28 but you can choose to layer in the guarantees you need, when you need them. So what about... I mean, now the whole blockchain thing is getting really popular. There are a lot of really interesting distributed technologies coming out that are distributed, I guess, over the public sphere, I guess is a way to say it. But you have things
Starting point is 01:03:49 like WebTorrent and things like that. And in this case, many clients can't actually reach each other, like many physical nodes can't communicate with each other, but you're relying on the whole system to have some sort of graph that is connected, right? And so, can Akka work over something like that? Like, does it have NAT punch-through? And is the design extensible to that? Or is it mostly for clusters where the nodes are able to communicate with each other pretty reliably, and all of that? First, I'll say that there are blockchain implementations on Akka, but I haven't used them myself, so I can't talk about the quality of them.
Starting point is 01:04:47 But nothing prevents you from... The actor model by itself, I think, lends itself very well to these types of distributed problems, because it's all just stateful nodes and efficient communication between them. But the way we've looked at Akka and the implementation of Akka is that we've always said that it should ideally be used in a trusted environment, because of security concerns and things like that.
Starting point is 01:05:17 You can tunnel it over TLS and things like that, and we have support for these types of security guarantees, but it's not really meant for that type of large, world-scale system. If all you used were stateless actors, then it would probably work quite well. But where it becomes tricky is with the stateful part. Because if you want strong consistency, you can only have one actor in charge, and you would need full replication of that state across everywhere, the whole world, etc. And it's not really meant for that. Then you need something like blockchain, which is meant for being shared in a distributed, fully reliable fashion. But marrying the two, having the communication model from actors and having the state be blockchains,
Starting point is 01:06:26 I mean, that could absolutely work, but I haven't explored it. But conceptually it could. Cool, that makes sense. So you founded a company, Lightbend, is that correct? Yeah. It's a company that... you know, when I created Akka back in 2009, I had no intention to really start a company, but it immediately became extremely popular, and I realized I had to start doing consultancy on it. And, you know, Akka grew out of the Scala community, so I knew a lot of people there, went to conferences, and I met with Martin Odersky, the creator of Scala. So he and I realized that, yeah, we should do something together. Akka's built on Scala, and Scala's really getting traction; Akka's getting traction. It could be interesting to form a company together. So we did that, and we launched it in 2011, and we later added the Play framework
Starting point is 01:07:31 and a bunch of tools around it. And the last year we've been working on a fast data platform for distributed streaming, making sense of the streaming jungle, and things like that. So it's been quite a ride. Cool. So I guess 2009, 2011... and, yeah, I think Scala was really big at Twitter; Twitter was really pushing Scala. Who are the other really heavy hitters that are using Scala and Play and things like that? Yeah, that's a great question.
Starting point is 01:08:10 I mean, a lot of the heavy hitters, unfortunately, we can't talk much about. You know, that's always the case. But a lot of investment banks, you know, most banks on Wall Street, are heavily invested in Scala. Oh, I didn't know that. But also, you know, LinkedIn, for example. And a lot of different verticals, like social media and also retail. There's a ton of other clients I... yeah. Cool. I don't exactly know who I can mention, you know, without asking.
Starting point is 01:08:49 Yeah, that makes sense. I'm hesitant. I could sort of list along, but then I might say something that I shouldn't say. Yeah, that totally makes sense. But absolutely, there's a ton of... you know, because Scala is of course open source, there is this blurry intersection between who is just a user and who is a client. But, you know, Scala has really been taking off immensely. I started using it in 2006, I think.
Starting point is 01:09:15 It was quite early. And, you know, back then... I remember the first conference, it was in 2010; it was like a small group meeting at EPFL. And it's been really growing immensely after that. And yeah, actually, I do some Scala, mostly Spark, at my job. I think it's fantastic. I mean, it's absolutely phenomenal. Sometimes I see some people using operators in a really confusing way, and so I have to, like, take a moment to say, okay, what is this? What is this operator doing? But overall, you know, the freedom
Starting point is 01:09:50 it gives you, I'm a big fan of languages that at least have the option of typing, you know, type safety. And the Spark, the Spark framework is absolutely fantastic. And so yeah, i'm a big fan but i probably started using it um around 2013 2014 so i've only been using it for a few years yeah i'm glad to hear that you like it i think i think for the use case that that you're talking about i think it's it's it's it's worth one where it really really shines because you know because of the functional side of Scala. And the things we talked about for reactive programming initially in this chat, having sort of first-class combinators, like map and filter and fold, and having them easily composed. These are things that's hard with a language like Java that doesn't have
Starting point is 01:10:45 first-class functions. I think that's why we see a lot of these projects like Spark and Flink. And Kafka, it's also written in Scala. Oh, I didn't know that. Yeah. I've also done Java
Starting point is 01:11:01 Hadoop. It's not pretty. It's like extract data function, but it's actually a class and it has a function called run or something like that. It becomes very verbose because everything has to be a class. With Scala, it feels much more native.
Starting point is 01:11:21 I mean, you can add two columns of a data set without having to create a class. I mean, in hindsight, it seems very obvious that that's the right design pattern, you know. Exactly, yeah, yeah. And just having, you know, first-class closures or something like that makes it so much easier.
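[Editor's note: the "add two columns without creating a class" case looks roughly like this in Spark's Scala API. The column names and data are invented for illustration; this is a sketch, not code from the discussion.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object AddColumnsDemo extends App {
  val spark = SparkSession.builder()
    .appName("demo")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // A tiny DataFrame with two integer columns.
  val df = Seq((1, 2), (10, 20)).toDF("a", "b")

  // No wrapper class required: a column expression is a first-class
  // value, so the sum is a one-liner.
  val withSum = df.withColumn("a_plus_b", col("a") + col("b"))
  withSum.show()

  spark.stop()
}
```

The contrast with classic Hadoop MapReduce, where the same transformation would be a full Mapper class with a method override, is the point being made here.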
Starting point is 01:11:41 And that said, you know, Java added lambdas, you know, so it's a little bit easier in Java now than it used to be. But, you know, lambdas are not functions, you know, in the true sense. So I think they miss out a lot on that. Yeah, that makes sense. So what about the company Lightbend? So it started out with you and the creator of Scala. Have you added a lot of people since then, or are you trying to keep it pretty focused, or what? Yeah, we've been growing slowly, I'd say. I mean, we've been around for, you know, about seven years now, and we're up to around 140 people now, which is
Starting point is 01:12:27 yeah, we're sort of pretty sized, well-sized engineering organization and sales and marketing and stuff like that. So all it takes to run a real company, when you start the startup, we were just seven guys, not in the garage, but at our homes. We started very distributed and we're still very distributed, which is really challenging, but we sort of try to grow slowly and only grow where we absolutely need. When it comes to being remote We're where I think we're around 20 countries now and we're almost at all continents, you know When they like Asia, you know both both Australia and New Zealand were in Africa We're South America us Europe. We're across the whole
Starting point is 01:13:21 All the whole world which is you know know it's true super exciting and fun but it's also very challenging when it comes to you know communication and meetings and all those things yeah i mean not to you know like go off on a tangent but uh this is something that that day-to-day you know we've been uh i've been talking about a lot um you're dealing with sort of time zone and and sort of being able to sort of franchise in a sense the the the organization right and so how do you deal with the fact that there's people all over the world they're all waking up going to sleep at different times like do you use like slack or something like that i mean what how do you sort of keep
Starting point is 01:14:02 consistency there when everyone's in, you know, different parts of the world? Yeah, that's a great question. You know, we don't have it all figured out. You sort of learn as you go. But one of the guiding principles that I've had, you know, when building the company from the start, is that there is no such thing as remote employees, only distributed teams. And the distinction I make here is that it's really, really hard
Starting point is 01:14:34 to have a team co-located and one or two guys remote, because that means that they talk over coffee and over lunch and things like that, and then they forget to pass that information on. And it becomes very split up and divided, and it's really, really hard on the people that are then remote. But if you talk about fully distributed teams, that means that every single one is remote. And even if they do happen to sit in the same office, they can't communicate that way. Of course, they can talk. But if there is important information, it has to be communicated in some sort of textual form, or perhaps by sending a screencast around or something like that.
Starting point is 01:15:16 But it's mainly text. So having this distinction that for a team, we have had teams that have been fully co-located, and then that's fine, but then everyone needs to be there, you know. Else it's a fully distributed team regardless of where people are. And having that mentality sort of helps. But it also makes it, you know, makes it hard, you know, when it comes to, makes it tedious, you know, having to write everything down. It takes more time to document everything thoroughly and things like that. And also, you know, it might be harder to do meetings. But one of the guiding principles that we've had in the past that we actually had to go away from a little bit now,
Starting point is 01:16:01 but that we tried to do is to keep all teams in the same time zone, because that means that they can hang out on Slack and things like that in real time and don't have to wake up and catch up on 200 messages. It's a really hard time figuring out the context. But as I said, we haven't managed to keep that for every team. But for example, the ACA team is fully distributed, but all in one, or it might be two time zones. Yeah, that makes sense. So that's something that also helps. Yeah.
Starting point is 01:16:37 But as you know, the tools, it's mainly Slack and email and Skype Hangouts or Zoom that we use now. So it's just, yeah, nothing fancy. Cool. That makes sense. Yeah, actually, I ran into somebody whose company had the exact opposite philosophy. They wanted each team to be distributed. And let's just say it doesn't work. Like that idea just is not a good idea. I think what you proposed is actually, I agree 100%.
Starting point is 01:17:05 I think it's very isolating to have, and it's never going to be balanced. So it's, I mean, it's, the odds are not that it's, the odds are high that it's going to be, as you said, one or two people out of 10, you know, across the globe. And the other eight are on the other side of the globe. So it doesn't really work. But yeah, once you go to the team level, yeah, yeah, exactly. Cool. So if someone is, um, so we have listeners all over the world. If someone is, um, you know, just, uh, in university, um, they have a degree in, let's say computer science or electrical engineering or something like that. And they're interested in, you know, and they're interested in a career at Lightbin.
Starting point is 01:17:49 So what opportunities are there? We're actually surprisingly, a number of people have reached out to us. And so they literally ended up finding jobs with people we interviewed, which I was pretty shocked. I didn't really expect that, but it turns out this is actually a really good medium for people who are, especially in university, but in general, just engineers. And so what sort of opportunities do you have at Lightbend?
Starting point is 01:18:15 And what does that look like? Yeah, we're very interested in people coming straight out of the university. They haven't been damaged yet. No, just kidding. But it's usually, you know, we've been hiring a lot of people right from the university or very close to coming out of the university at least. And, you know, and I think that, you know,
Starting point is 01:18:42 if you're interested in working on, you know, hard distributed systems type of things, multi-core concurrency related things, cloud computing, these type of things, streaming, and this whole thing with fast data and also machine learning, if those things interest you, you should absolutely apply. And since we are so distributed, if you're the right guy, you know, I mean, then we'll hire you wherever you are, more or less, because we have teams that are across all time zones.
Starting point is 01:19:15 All the way, you know, from Japan, you know, and, you know, far out in Asia, you know, Australia, New Zealand, all the way down to South America. And then, you know, the west coast of the U.S. So it's all over the place. Cool. Great. So what's an average day like for you or for, you know, an employee? If your day is really atypical and crazy, what's an average day like for someone who works at Lightpin? Yeah, I think it's different for different people.
Starting point is 01:19:52 You know, when it comes to me, I mean, me personally, I mainly work with colleagues in the U.S. So for me, I have meetings, you know, starting from 4 in the afternoon all the way up to 9, 10, 11 sometimes. So, but that's nice, you know, I can get a lot of, you know, sort of silent space, silent time and, you know, time for myself, you know, thinking, working, all that stuff, without being that interrupted in the day, and then I, you know, can be more social and discuss things in the afternoons.
Starting point is 01:20:28 One thing also that I have to mention when it comes to the average engineer's schedule is that we have a notion of roles. We do all the support ourselves. This is one of the things that our customers love, that we don't have a support organization per se, but it's actually the teams. You know, if you have a problem with ACA, it's someone from the ACA team that helps solving that.
Starting point is 01:20:54 That's sort of quite challenging, you know, taking on that role or that hat, so to speak, but also quite fun, you know. It can be fun to sort of see how people are using your software and helping them with the stuff that you built last week. That makes sense. Most engineers are juggling these two roles, developing and doing support.
Starting point is 01:21:24 Usually on a few days doing support and then and then you know a couple of weeks hacking and then back back and back to support and things like that so so cool that makes sense yeah i mean i think it's very hard to build something like this in in a vacuum right so i mean the best ideas are going to come from those discussions so exactly i think i think i mean without without them and without our open source community i have to say not just our customers you know but that's without the passionate community that we've had we we would never get them to get it this far i wouldn't be you know wouldn't even exist i think but wouldn't
Starting point is 01:21:59 be where it is even without the the open source. We've been getting so much from that. So many passionate people talking about it, encouraging things, but also rolling up their sleeves and actually sending in patches. And it's really an extremely good example of what can, what can mean on the good side of humanity, what can be achieved with people across all cultures and actually
Starting point is 01:22:29 collaborate into doing something substantial I think it's quite heartening to see actually. Cool, yeah that sounds absolutely amazing so yeah, Jonas it was absolutely amazing having you on the show, I actually learned a lot, this is one of the most educational episodes for me personally.
Starting point is 01:22:49 Thank you very much. I appreciate it. I think people are absolutely going to love it. Can you give us some sort of what's the best way to reach you or reach Lightbend? So what serves like some good, calm communication information there? Yeah, I think the best way to reach me is probably on Twitter. I'm jboner, J-B-O-N-E-R on Twitter. Just reach out. And else, you know, Jonas at LifeBand, if you want to send me emails,
Starting point is 01:23:18 lifeband.com. If you want to find out, you know, I have my personal website, jonasboner.com, but if you want to find out more about LightBin, just go to lightbin.com or aka.io, if you want to learn more about Aka. We have a ton of material, both on the Aka website and also on LightBin, all kinds of this stuff, webinars, recorded webinars and articles and all kinds of stuff. Very good. All of it is all of it is totally free to use right so basically you're the business model here
Starting point is 01:23:50 is that people can do anything they want with Aka college projects you know even commercial projects but then you're there to help them if if they need some extra functionality or they get stuck or something like that absolutely it's just open core model where everything you you know, in the core, you know, ACA, Play, Scala, and supporting tools, you know, are open source. And we try to help the community as much as our customers. But then we also have, you know, commercial tools on top, you know, things like monetary management and we have a full, as I talk about
Starting point is 01:24:26 the fast data platform is also commercial but everything that we covered in this podcast, in this call is fully open source and we're here to help. Cool, thank you so much again and
Starting point is 01:24:42 everyone, let us know what you think about the episode. Feel free to chat on the Discord. If you so much again. And yeah, everyone, let us know what you think about the episode. Feel free to chat on the Discord. If you have any questions, feel free to at both of us, Programming Throwdown at and jbonair at on Twitter or ask on the Discord and I can pass it along. But thanks again for this interview. It's fantastic. Yes, thanks, John.
Starting point is 01:25:06 That was awesome. Thanks a lot for having me. I really enjoyed chatting with you guys as well. So it was definitely mutually beneficial. The intro music is Axo by Binar Pilot. Programming Throwdown is distributed under a Creative Commons Attribution Sharealike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work, but you must provide attribution to Patrick and I and sharealike in kind.
