CoRecursive: Coding Stories - Tech Talk: Big Ball Of Mud

Episode Date: November 14, 2018

Tech Talks are in-depth technical discussions. Evolving software under constrained resources is a challenge, and I think we kid ourselves when we don't admit this. Software that is providing value often grows in scope until it is a mess. Today I talk to Wade Waldron about how to avoid this situation or recover from it. Big Ball of Mud is the title of a paper presented at the 1997 Pattern Languages of Programs (PLoP) conference, and I think it is super interesting. The researchers went out into the field to see what architectures software in industry was following. Big Ball of Mud is what they found, along with six other patterns with names like "Sweep It Under the Rug" and Reconstruction, which is the throw-it-away-and-start-again pattern.

Links:
- Big Ball Of Mud Paper
- Hexagonal Architecture
- Reactive Foundations Course
- Reactive Advanced Course

Check out other episodes like this. Philip Wadler: https://corecursive.com/021-gods-programming-language-with-philip-wadler/

This podcast originally published here: https://corecursive.com/22-big-ball-of-mud-architecture-and-services-with-wade-waldron/

Transcript
Starting point is 00:00:00 Welcome to CoRecursive, where we bring you discussions with thought leaders in the world of software development. I am Adam, your host. If you're anything like me, then learning how to build software in a sustainable way, a way where you don't continually build up technical debt, and have development slow down as the project gets more complex, has been a career-long struggle. Big Ball of Mud is the title of a paper presented in 1997 at PLoP, the Pattern Languages of Programs conference, and I think it's super interesting. The researchers went out into the field to see what software architectures
Starting point is 00:00:51 were being used in industry. And Big Ball of Mud is what they found, along with six other patterns with names like Sweep It Under the Rug and Reconstruction, which is the throw it all away and build it again and hope it's better the next time pattern. Anyhow, I think this is a hard problem. Evolving software under constrained resources is always going to be a challenge.
Starting point is 00:01:19 And we kid ourselves when we don't admit that it's hard. Today, I talked to Wade Waldron about how to avoid this situation or how to recover from it. If you like the show, spread the word, tell a friend, leave an iTunes review, or follow us on Twitter. If you're listening in your web browser on the website, subscribe to the podcast for a much better experience. Wade, thank you for joining me on the podcast. I'm glad to be here. So if you were at a dinner party, what would you tell somebody you did for a living? I'd probably try to avoid that question. I think it's usually a little awkward
Starting point is 00:02:06 to explain that, but I guess when I do get asked that question, I usually tell people I'm a software consultant. I'm not sure that necessarily explains things very well, but I guess that's the answer they get. And I actually got the request to interview you from a listener. And I started to dig into these courses you have on reactive architecture, and I found them to be very interesting. So I'd like to start with this question. What's a big ball of mud? So, I mean, it's an interesting question. One of the challenges, I think, with that particular question and with that term is sometimes it'll kind of get people's backs up, I guess, because they hear me talk about a big ball of mud and they say, no, no, no, that's not how I build a system. And, you know, so I think it's important when I talk about a big ball of mud like that, that I establish first off that I consider
Starting point is 00:03:07 this a worst case scenario. I do not by any means consider this to be the general case. So when I talk about a big ball of mud, what I'm talking about is usually a system that has been built in a monolithic way. So it's been built as a single application rather than a series of microservices, for example. But I'm not talking about every monolith. Not every monolith is a big ball of mud. There are plenty of monoliths that are probably extremely well-designed and extremely resilient and extremely robust. However, I have worked with monoliths specifically where they were not really well designed and were not very robust. And so in those cases, those monoliths, those particular systems were built in such a way that there was no clear separation between dependencies. So you had, you know, every piece of the system was depending on every other piece of the system, either directly through the code or sometimes through the database. And it's very difficult in those cases to sort of unravel where the clear
Starting point is 00:04:10 system boundaries are. Instead, what you end up with this big ball of mud where it's all sort of one cohesive blob and you can't really separate it into smaller pieces. If I have a big ball of mud, what should I do next? Um, well, I mean, you know, some of that depends on, uh, you know, what your goals are and what, uh, that big ball of mud is doing for you. Um, what I would suggest not doing is throwing away the big ball of mud and starting over. Uh, you know, that's something that I think is often tempting. You look at this big ball of mud and you don't know where to start and you say, well, you know, let's just build a new thing and replace the old thing. And then usually two or three years into that project, you realize that you didn't really understand that the old thing to begin with and now what you've got is a
Starting point is 00:04:57 big ball of smaller balls of mud or something like that. So what I would usually recommend in that case is to start looking for pieces of that big ball of mud that might be easier to isolate. So find a section of the code that is maybe not quite as intertwined as other sections and start looking for ways that you can disentangle that piece of code. What I would usually do is I would start by extracting that out into sort of a separate library or something like that,
Starting point is 00:05:30 that you can then basically remove or control the references to that library a little bit better. Once you have things sort of separated like that, then you can start looking at, okay, now how can I take this thing and potentially move it out of the ball of mud completely? How do I move it into a separate microservice or something like that? But first you have to start by finding that piece that isn't so tightly coupled to everything that the moment you try to move it, it's going to break. And that could be a challenge.
Starting point is 00:05:57 That's not always an easy thing to do, but you got to start somewhere. The other thing I would highly recommend is if you are in the situation where you have that big ball of mud, don't make it bigger. So when somebody comes in and says, hey, we need this new thing, don't just jump in and start putting it into the big ball of mud. Look at the possibility of, okay, this is a new thing. Can we build this new thing separate from the ball of mud so that at the very least we're not making the problem any bigger uh we're we're maintaining uh we're maintaining the existing problem but moving things uh in a way that encapsulates them and isolates them better yeah that's a great suggestion i think that
Starting point is 00:06:36 in my experience the tricky part with that is sort of what if the piece isn't that cohesive like they want this new thing but it kind of relates to what's there. Is there a way to, I don't know, is there a way to do this without just making that new thing like actually dependent on the monolith, but maybe, you know, across process boundaries, but it's still tied to it? There are, and it's something I have done in the past. Depending on the, I guess, maturity of the development team, you know, how familiar they are with different techniques and things like that, it may not be a problem. Also depends a little on what kind of infrastructure you have in place.
Starting point is 00:07:20 But for example, if I use a concrete case that I did, we had an application which was a big ball of mud. And we wanted to introduce, in this case, it was a rewards program for the company that I was working with at the time. And with that rewards program, we needed a lot of information that a little easier is we built that rewards system as a separate component. And then what we did is we went into the existing monolithic application and we found the places where the information we needed was being recorded. We kind of located and isolated those particular pieces of the application. And then we made the application essentially emit an event. And this is something that comes from event-driven architecture, which obviously we haven't talked about today, but it's something that reactive systems quite often focus on.
Starting point is 00:08:19 And what you do is you make that piece of the monolith emit an event, which can then be consumed later on by your new piece of the monolith emit an event, which can then be consumed later on by your new piece of the system. And so that event can be consumed. You can create your own sort of internal representation of the data as necessary based on what's coming out of that event. And so now you don't have to talk directly to the monolith. Instead, you indirectly consume these events. Those events are probably broadcast through a tool like Kafka or Kinesis or RabbitMQ or something like that. And so you consume those events and build up your own model based on that. And that's one way that you can separate yourself from the monolith.
Starting point is 00:08:58 And what's the advantage of emitting events as opposed to, I don't know, REST calls or something like that? So emitting events is using asynchronous messaging generally. And that tends to be a little bit more robust for a few different reasons. One of the reasons that it tends to be more robust is because if you emit an event, an event is something that you consume asynchronously. And as a result of that, it means that you don't necessarily require all pieces of your system to be active and functional at the same time. So for example, in the rewards example that I was giving, if the reward service was down, there's nothing stopping me from emitting that event anyway, right? So I emit the event,
Starting point is 00:09:45 even though the reward system is down. Now, when the reward system comes back up, I can just consume that event, even though I wasn't alive when the event was emitted. On the flip side, if the system that is emitting that event goes down, that doesn't stop the reward system from continuing to serve any requests that it has to serve. It also doesn't prevent it from consuming any events that were already omitted. So that's one aspect of it that I think is beneficial. It actually allows for more flexibility in terms of what's active and what isn't active. There's other things, too.
Starting point is 00:10:20 You know, again, because it's asynchronous, it means you become more decoupled in time. And so, you know, that has its own set of advantages. You're not expecting something to happen right now. You're expecting it to happen eventually. If you expect something to happen right now, then again, we go back into the failure scenario where if something fails for some reason and you need it right now, you have no choice but to fail the whole process. On the other hand, if something fails and you need it eventually now, you have no choice but to fail the whole process. On the other
Starting point is 00:10:45 hand, if something fails and you need it eventually, well, now you have other options that you can take. You can do retries. You can wait for the information to become available. You don't have to deal with it. You don't have to deal with that problem right now. So again, I think what it does is it allows the system to become more robust and less brittle over time. So that is sort of isolating one service from the other one in time, I think. So you have because because the event could be sitting in a buffer sitting in Kafka. Is there other ways that we should isolate services from each other? So, I mean, there's there's a ton of different ways to isolate services.
Starting point is 00:11:29 I kind of feel like they boil down to a few specific ones. So specifically, I think, you know, isolation in time, I think, is very important. Isolation of state, I think, is equally important, especially when you build microservices. And so when I talk about isolation of state, what I mean is microservices shouldn't share a database. Now, I want to clarify that statement a bit and say they shouldn't share, you know, tables and things like that in the database. They may actually all be operating within the same, you know, SQL database or whatever, a Cassandra database, doesn't matter. But they don't have access to each other's data through the database. If you want access to each other's data,
Starting point is 00:12:11 then you do that by communicating through the API that that service presents. And that helps to decouple services, which again, can make them more flexible. That makes it easier for services to evolve. I've been in situations, for example, where you're isolated or sorry, you lack that isolation. Everything depends on the database. And then you get into this awkward situation where you go, okay, the structure of this particular table is actually kind of awkward and I want to change it. But these 17 dozen other locations in the application all depend on that
Starting point is 00:12:46 table. So if I make a change here, I got to change all those other things. And that's the kind of situation we want to avoid. We'd like to be able to, you know, have the freedom to evolve our database as necessary for that particular microservice, for example. So that's isolation of state. There's also isolation of space. So in that case, what we're saying is microservices shouldn't depend on the location of another microservice. So this is different from a monolith where if you have a monolith, essentially everything is deployed as a single unit. And so because of that, you might have your reward service and your customer service and whatever other services you have, those are all deployed in the same location. And the monolith is largely dependent on the fact that those are deployed in the same location.
Starting point is 00:13:34 Now, you might have multiple copies of that deployed in different locations, but those copies usually don't know about each other. Any communication happens through the database. So within the individual application, everything is deployed in the same place. With microservices, that's different. We shouldn't necessarily require that a microservice be deployed on the same machine as another microservice, for example. We should be able to have the flexibility to deploy them across many machines. And what that gives us is that gives us the ability to scale and to be elastic. So now, you know, if we need maybe our customer service needs 10 copies and our reward service only needs three copies, that's fine.
Starting point is 00:14:22 You know, we can deploy as many copies of our customer service as we want because it's not directly tied to the location of the reward service or anything like that. So that's isolation of space. And to make sure I follow. So you were talking about this rewards example. We can change examples if you want. So we have a rewards service now. We've broken it out. And so it has its own database. And then when a user shops or something that would cause rewards to happen,
Starting point is 00:14:50 then we would emit events. And then am I on the right track here? Potentially, yeah. So you might have, for example, when a customer buys something, you would emit an event which indicates, you know, customer bought something for X amount of dollars. The reward service would receive that. And then the reward service would potentially know that that amount of money translates into this kind of reward, whatever that happens to be. So it receives an event, which is something like, I don't know, item purchased or
Starting point is 00:15:25 something like that. It receives that and then rewards the events or sorry, applies the appropriate rewards based on that event. Makes sense. I interrupted you. You were going to another type of isolation, I believe. Yeah. So we've got, um, so far we've got isolation of space, isolation of state and isolation of time. Um, and then the other one we've kind of already talked about a little bit and that's isolation of failure. Um, and so that's essentially just making sure that you have your system isolated in such a way that if one piece of the system fails, it doesn't bring down the whole system. Um, so, you know, for example, if our reward service fails, well, people should still be able to buy stuff. You know, we don't want to have a situation where
Starting point is 00:16:10 our reward system fails and therefore we have to say, oh, nobody can buy anything anymore. You know, so we want to isolate those failures such that one, a failure in one piece of the system, I mean, it might have an impact on another piece of the system, but it doesn't bring down the whole system. You know, uh, Netflix actually does this really well. Um, and, uh, you know, what they have is, um, I I'm kind of making some assumptions here based on my experience with Netflix. I've never worked for them or anything, so I don't really know, but, um, based on my assumption and my experiences with them, I don't know if you've ever noticed, but sometimes if you look at like the My List feature in Netflix, sometimes that disappears. I've seen that happen a couple of times in Netflix where I go to look for My List and it's just gone.
Starting point is 00:16:54 And I think what is happening in that case is the microservice or the service that supports the My List feature has disappeared. You know, it's failed for some reason or maybe it's being redeployed or whatever the case may be. It's gone. The rest of the application still works fine. In fact, if I wasn't specifically looking for the My List feature, I wouldn't even know it was gone because there's no like error message or anything like that. Everything just continues to work.
Starting point is 00:17:21 I can still watch videos. I just can't access the list. And so I think that's a really good example of how you can isolate failures in your application. I heard another example from Netflix, which I'm probably going to get wrong, but when you first go into Netflix, they have some personalized recommendations.
Starting point is 00:17:40 And I guess there's a service that generates that, but it's an expensive thing to generate. So it kind of is cached. So when you go in, it'll start generating and it will get shown from the cache. However, like lots of times they just don't have that. You haven't viewed it before. It hasn't been kicked off.
Starting point is 00:17:57 It's not in the cache. So they just have a default. They just show here's what we think everybody would like, right? Like everybody likes the movie Ghostbusters. I don't know what they put in the default recommendation, but it's not per se a failure, but I guess an explicit fallback, right? They're like, we're not assuming this is always here.
Starting point is 00:18:13 Yeah, well, I mean, I think that is a, it is a representation of a failure of some kind because potentially they could have a situation where, for example, maybe that service is actually unavailable and they fall back to the defaults. And, you know, again, that's a great way to hide or to isolate a particular kind of failure. So, you know, rather than failing completely and saying, look, you're looking for this kind of information, I can't give it to you.
Starting point is 00:18:39 What we do instead is we say, well, normally we give you this, you know, really rich, detailed information, but we lack that right now. So here's the next best thing we could give you and give us that instead. And I think I'm not sure. Again, I'm not sure if that's an actual implementation detail the Netflix uses, but it wouldn't surprise me. And I think it would be a really good example as well of isolating failures. Yeah, I had a previous interview with Jan. How do you say his name? Jan Mahachuk. And he was saying, like, just making these decisions explicitly is like a big change. Like when we have a monolithic app and like the user service, there's no user service.
Starting point is 00:19:21 The user part is just embedded in the application. So you never have to make an assumption about what should you do if there's no ability to look up users. But all of a sudden, when you split these things up, you can make these explicit decisions and say, well, maybe we can have a read-only mode if we can't authenticate this user or what have you. Yeah, and I think that's definitely true,
Starting point is 00:19:43 that it's not always obvious when you're building existing systems, monolithic systems, things like that. One of the things, you know, I teach a lot of courses as part of my job. And one of the things that I teach in one of my courses is I go through the exercise of breaking out a system into separate microservices. And then with the students, I'll actually sit down and kind of talk to them and say, okay, so we've got this series of microservices, right? Now, what happens if this microservice fails? And, you know, sometimes the immediate reaction is sort of like, well, I don't know, I guess nothing works. Okay, but what if we wanted it to work? What would we do if this service failed in order to allow it to continue to work? And that's, I find that's a really interesting process going
Starting point is 00:20:31 through with the students and, you know, talking to them about, you know, how could we change this system in some way so that we can tolerate a failure here? And so then we start to look at things like, well, what if we, instead of making a direct REST call, what if we emitted events, consumed those events, and created our own internal view of that data? If we do that, then when that external service fails, it doesn't matter because we have our own internal copy of that data that we can fall back on. Yes, the data might not be 100% up to date, but in a lot of cases, that doesn't really matter. In a lot of cases, mostly up to date is probably good enough. And in most cases, I would say mostly up to date is better than I can, now imagine that fails. What are you going to do?
Starting point is 00:21:26 Now imagine that fails. What are you going to do? I think that's a really good exercise to go through with any system, really. One potential way to deal with this that I've seen go badly is this service sometimes is busy. So if it doesn't respond in a certain amount of time, I'm just going to retry it. And then we have multiple services and they start, um, basically knocking something over. It starts to get slow. So you ask it again, have you, have you encountered
Starting point is 00:21:56 this problem before? Uh, I have, I've built a system like that, you know, to, to my own shame, I guess. Um, I, I built a system years ago where, um, it would attempt to, uh, send messages, uh, onto another aspect, another area of the system. Um, and then when that area of the system got busy, um, we'd end up getting timeouts. And so we'd, uh, retry and send more messages, um, to an already busy system and things would just get busier and busier and busier. And if you do that enough, then what ends up happening is the busy part of your system just collapses, uh, under the weight and everything falls over. Um, so, uh, yes, I've definitely
Starting point is 00:22:35 encountered this scenario. Um, there is a solution to that. And I think that solution is to be a little more polite, I guess, on the sending end. So, you know, rather than retrying over and over and over again until you kill that busy system, what we can do is we can use techniques like a circuit breaker. And what a circuit breaker does is essentially any requests go through the circuit breaker, whether they're successful or not. But what happens is as soon as a request fails for some reason, it trips that circuit breaker. And so once that circuit breaker gets tripped, what happens is now any requests that come through that circuit breaker just immediately fail and they fail with a message, something like
Starting point is 00:23:17 circuit breaker is open or something like that. And so you get this rapid failure. As a result, you know, you can retry as much as you want, but you're not actually putting any load on the external system because the circuit breaker is basically just preventing you from sending those messages on. Then eventually, after some predefined period of time, the circuit breaker flips over into what we call a half open state. And in the half open state, what it does is it allows one request through, just one, not a whole bunch. But what it does is it checks for that one request. And if that one request succeeds,
Starting point is 00:23:57 then we go back to normal operation. On the other hand, if that one request fails, then we assume the external service is still unavailable for some reason. And we go back to just blocking all of the external calls. What this does is it allows your system to be, I guess, more polite again to that external system so that you don't just drive it into the ground. And I mean, these circuit breakers are something that are implemented in various libraries. The libraries I work with are things
Starting point is 00:24:25 like ACCA and Logum. Both of them have built-in circuit breakers that you can use out of the box. There's other libraries that implement them too, though, you know, those are certainly not the only ones. You shouldn't be rolling it yourself at this point, I guess, is the... Yeah, there's no reason to build this yourself. You know, there's plenty of, plenty of options out there for, uh, for leveraging circuit breakers. So you have a lot of great principles here about making things work over time, making sure state's not shared. It's, it sounds like, or to steal your, your terminology from your course that the goal is to make these services autonomous, like able to stand on their own. Is that a correct characterization? Um's definitely one of the major goals. Yeah.
Starting point is 00:25:05 I mean, autonomy is a tricky thing, I think, because it's a really nice goal, but not one that we can never or ever necessarily reach completely. Like a fully autonomous system would be a very rare system, I think. But the further we can move along that path the better so you know the closer we can get to a fully autonomous system the better because that allows for all sorts of really interesting things I mean you know I guess to provide a bit of a definition when we talk about an autonomous system what we're talking about is a system that doesn't depend on anything right it depends only on itself and nothing else.
Starting point is 00:25:46 If you had a system like that, you could deploy as many copies of that as you wanted, and there would be nothing preventing it. You would never reach a point where there's a bottleneck or something that says you can't deploy any more of these. That means you could essentially scale forever. It also means you'd be totally resilient to any kind of failure because, again, you can deploy as many copies of these as you want, and if one of them fails, no big deal, you have 50 other copies.
Starting point is 00:26:10 So that allows you a lot of flexibility in terms of building a very robust system. Like I said, it's usually not easy or even necessarily possible to get to that point. So it's more about moving along that path and going as far along that path as possible. Yeah, I think it must be completely impossible. Like you have to have user input, for example, I guess we're excluding user interaction. Yeah, I mean, yes and no. You know, the trick and the reason why I say it's generally impossible to have this is because in order for you to have user interaction, you need some way for the user to know where all of the copies of the server are.
Starting point is 00:26:57 Right. And so in order for the user to know where all the copies of the server are, you need some sort of load balancer or something like that in between the user and all of the many different copies. The moment you introduce that load balancer, it's not a fully autonomous system anymore. You now have a dependency where the load balancer depends on all the services and the input or the users depend on the existence of that load balancer in order for this to work. So, you know, I think that's an example of where I typically say that it's probably going to be impossible to build a fully autonomous system, because at some point you're going to have a load balancer or something interfacing with the user. I can't think of a system off the top of my head where that wouldn't ever be the case. It's an interesting, it's an interesting game to play. Like, I think you could, so if I were going to make a service, and so it has persistent data,
Starting point is 00:27:49 but I want it to be totally autonomous. So I guess I would just emit things that would get stored in the database by something else, but at the same time, keep everything cached, like locally within that service. It sounds like a horrible idea, I guess, but it would mean if the database were down, I could just keep emitting these events and use my local cache. But yeah. Yeah. I mean, I would argue that if you have a database, you're not a fully autonomous system.
Starting point is 00:28:14 Um, you know, and again, that's where I say that, you know, fully autonomous systems are very, very difficult, if not impossible to build. Um, because if you have a database, well, now you have a dependency, right? Your system, your microservice or whatever it is, depends on the presence of that database. Now, you can improve autonomy there. So what you could do, and again, this is something I've actually done in the past, is I've actually built a system at one point where every instance of my microservice had its own database.
Starting point is 00:28:43 And so that improves autonomy because now I don't have a shared database. I have independent databases for each microservice and each instance of the microservice. And each one has its own copy of the data and everything else. That improves autonomy, but that's also really expensive. And so that's another thing that you have to consider when you do this is the further you move along this path to autonomy, oftentimes things get more expensive. And so there has to be a real value delivered in order to make this worthwhile. I would not, for example, recommend that everybody go out and build systems that create fully independent copies of databases for every unique instance of a microservice. That's probably more expensive than what most people need.
Starting point is 00:29:27 But it's something where if you reached a scalability limit and you realized that you couldn't go any further because you had this shared database, well, then that might be a place where you could say, well, how can we break that coupling? How can we isolate ourselves even more so that we don't have a shared database? And what benefit would that give us? And is it worth it? But is it worth it is always the key
Starting point is 00:29:51 question, I think. So you hit a big question that I have in this area, which is like, how micro, how monolithic? Is there guidelines that can be used to decide, you know, how many services are needed to serve this customer function or how do we decide how to cut these things up? So, I mean, I think, you know, part of that for me is the principles of isolation that we talked about. You know, the goal of microservices isn't necessarily, in my opinion, based on size. You know, I don't want to be one of those people who says that a microservice should only be 100 lines of code or something like that. I would prefer to say that a microservice should be as small as it needs to be in order to get the job done and remain isolated. So there's no, I mean, that's kind of a wishy-washy answer in the sense
Starting point is 00:30:45 that I'm not giving you a concrete answer, only that I would say, you have to look at your unique use case and say, okay, we can make this thing more isolated by doing X, Y, and Z, whatever that happens to be. But is that costing us more than the value it's delivering? Um, and so, you know, that's, that's kind of the key thing is, okay, so we could, you know, build our applications so that everything shares a single database, but they all have isolated tables that they don't access.
Starting point is 00:31:17 You know, that's better than having shared tables that everybody accesses. That's more isolated. So we isolate by creating those separate tables. That's kind of the first step. The second step might be, okay, are there other options within our database? So can we have, you know, different schemas, for example, if you're using like a SQL database, or if you're using a Mongo database, MongoDB has the concept of databases within your MongoDB.
Starting point is 00:31:46 And then Cassandra, I think they call them key spaces. So different databases have additional isolation techniques. And so that's better. Again, there's a little bit more of an overhead when you do that. And then probably for most use cases, that's going to do the job. But then there might be those rare use cases where you really need to scale beyond the ability of one database instance to handle. And then you start looking at, okay, well, what if I create a whole other instance of Cassandra that's just there to handle to ask yourself whether the cost to maintain that new thing justifies the benefit that you get out of it. Is the important distinction for microservices the complexity of the business requirements and how they interact or the actual scale of deployment and the usage? I think it's both in a lot of ways. So one of
Starting point is 00:32:48 the things that I think microservices do really well is they allow you to isolate complex business logic in one area rather than having it, you know, trapped in your database in a series of stored procedures or something like that that are used by multiple parts of the application, you can have, you know, a single microservice that just deals with this piece of business logic, however complex that business logic happens to be. What I like about that approach is it allows me to go into that microservice and sort of forget about all the other things for a while and just focus on that microservice and that piece of the business. And so I can sort of keep that all in my head without getting lost in the details of what everything else is doing with that. So I think, you know, it is good for isolating business logic. And I think that's very important. You know, I think actually that's one of the primary things
Starting point is 00:33:39 we talked about. How big should a microservice be? I think one of the primary things is to look at, you know, specific isolated pieces of business logic. In DDD, Domain Driven Design, we use the term bounded context, you know, look for those bounded contexts. And that's kind of where you start building a microservice because it allows you to isolate that business logic. So I think that's definitely a very important thing. However, the fact that you've created this isolation and then potentially introduced a new level of autonomy that you didn't have before, then enables you to scale in ways that you didn't previously have the ability to scale. So in that respect, this is something that enables scalability. So I think it's a little bit of both. It's both business logic and scalability that drives this. I think the term microservices, I think, became pretty hip maybe around 2015 or something. That's my recollection. And maybe there's a bit of a backlash now. But a long time ago, like maybe 2005, I remember people talking about service-oriented architecture.
Starting point is 00:34:44 Is there a difference? Is it rebranding? I think to some extent it's rebranding, but I would argue it's not completely rebranding. I would argue that microservices are a subset of service-oriented architecture. So service-oriented architecture would be kind of an umbrella term that covers microservices, but it also covers other things. One of the problems I think that happened with service-oriented architecture is over time, they started building infrastructure around doing service-oriented architecture. So you started getting like these enterprise message buses and things like that, enterprise service buses.
Starting point is 00:35:22 And these would have a lot of functionality built into them. They would do message passing between different parts of the system. They would do message versioning. They would do API versioning, all sorts of different things. And I think that sort of muddied the water a little bit because people got so focused on these enterprise service buses, which really wasn't necessarily the original purpose of service-oriented architecture. I think service-oriented architecture, the original purpose was really about isolation, which is, again, kind of what I've been talking about.
Starting point is 00:35:54 But on top of that, you know, some people build applications that they would call service-oriented architecture, but they build them in a monolithic style. So what they do is they build a single deployable unit, but within that single deployable unit, there are multiple services, essentially, and those services communicate with each other through a single or rather through discrete APIs. So each service presents an API. When another service needs data, it talks to that API. It doesn't go directly to the database. So they have isolation of state in that respect. What they didn't do necessarily is require that those individual services be deployed independently. And I think that's where microservices take all of those ideas of isolation of state and you know providing um that that api and communicating only through that api but then they also add the additional requirement that says
Starting point is 00:36:51 and these microservices have to be deployed independently um they're not deployed as a single unit and i think that's where the difference is and it seems like an all right solution, I guess. Like deploying these things as separate services isn't necessary to overcome the things you were talking about earlier, like having clear dependencies. It sounds like by the services talking to each other through their external APIs, they've covered that intertwined dependency risk.
Starting point is 00:37:29 Yeah, definitely. I think the thing to keep in mind is that, you know, again, going back to the principles of isolation that I talked about, you know, they cover off the isolation of state fairly well. And so that's a really nice thing. I think just by using service-oriented architecture, you tick that box, at least to some extent. Maybe there's ways that you don't, but I think generally speaking, service-oriented architecture does a really good job of ticking the isolation of state box. Where it falls down a little bit is not service-oriented architecture, but that sort of monolithic deployment style of service-oriented architecture, where it falls down is isolation of space. So again, because each individual service is packaged up into a single deployable unit, so all of your services get packaged up into this
Starting point is 00:38:18 one unit that you deploy, that means you don't have isolation of space. You are basically requiring that you have, you know, exactly the same number of copies of every service. And that limits your scalability and it limits your ability to handle failures because you can't, for example, say, well, I want 10 copies of my customer service and only three copies of my reward service. You lose that ability because it's all packaged up into one deployable unit. That makes sense. And there's a continuous delivery problem, I'm thinking,
Starting point is 00:38:51 wherein if you have these four services all wrapped up in one deployment and you want to roll out a new version of one of them, you have to switch them all at once. So you can't have the old user service still there and the new version of the reward service talk to it. Yeah, and this is one of the things that I think is really beneficial from a development perspective is when you are working
Starting point is 00:39:15 with that monolithic deployment style, if you've ever worked in an application that does that, oftentimes you get into this situation where you get like deploy day. So, okay, everybody, we're deploying today, which means nobody changed any of the code because we got to make sure that nothing moves between now and when we deploy. And then you've also got this problem where, you know, people are kind of talking to each other and saying, okay, I got my stuff in, did you get your stuff in? You know, we got to sync up everything before we deploy. And, you know, that all gets very expensive.
Starting point is 00:39:45 But then the other thing, too, is that you get into situations where, you know, I need to make a change. And it's just a very small change. It's a hot fix for a bug that I guess got deployed or whatever. I want to make that change and I want to deploy that. But now we've got this problem that, okay, maybe my change is small and we want to deploy just that change, but we can't. We have to deploy all this other stuff as well. And so, I mean, you know, there's ways that you can work around that to some extent with branching and things like that, but it starts to get awkward and the maintenance burden of that gets harder. What I like about working in
Starting point is 00:40:19 microservices is it allows you to say, I want to make that hotfix to that bug and deploy this service. I don't actually depend on anything else. So that's fine. You know, I can make that change to just that one service, deploy it. And that's not gonna, I'm not gonna have to worry about what everybody else changed in the meantime. The one thing that I think maybe can be worse is when you want to change a service and the things that utilize that service. Like when it was all a monolithic application, it could be a single commit. I guess the rollout is a bit trickier perhaps, but now when it's multiple things, if you need to make a breaking change, how would you handle that? I mean, I think you're right. I think the rollout of changes that affect
Starting point is 00:41:07 multiple services is arguably harder in a microservice-based approach than it was in a monolithic approach. I don't think there's too many people that would argue that. So yes, I absolutely agree. I think that is harder. Again, there's certain deployment techniques that can mitigate that to some extent, like you can do blue-green deploys, for example, where you still deploy each service individually and you deploy each service as some number of copies, but you kind of deploy them all into an inactive cluster and then you flip over from the active cluster to the inactive cluster.
Starting point is 00:41:43 So there's ways to sort of mitigate that. But it is more complicated, I think, is what it boils down to. I guess the way that I would suggest you mitigate that problem and the way that I have done this is you support the old API, right? So if you need to make a change to an API of one service and you have another service that depends on that API, right? So if you need to make a change to an API of one service and you have another service that depends on that API, when you make the change to the API, you have to support the old API as well. That is harder than it was with a monolith. With a monolith, you would just
Starting point is 00:42:14 make the change to the API, deploy everything at once, and you wouldn't have a problem. So this is one of the things where when you make the move from monoliths to microservices, you're going to get a lot of benefits. You're going to get some disadvantages as well. And so it's a matter of figuring out for your particular use case, do the advantages outweigh the disadvantages? I think when you're starting to talk about things like scalability, resiliency, things like that, if you've got a system that has to deal with, you know, millions of users or terabytes of data all the time, then we start to get into the situation where, yes, we probably do want to make that sacrifice. On the other hand, if you've got a system that's dealing with like, you know, 15 users an hour or something like that, and very small amounts of data, this might be a bit much, you know, you might not need this kind of resiliency and this
Starting point is 00:43:03 kind of scalability. I think that's why sometimes you were mentioning before, like, you know, you might not need this kind of resiliency and this kind of scalability. I think that's why sometimes you were mentioning before, like, you know, people having pushback when you say something about their monolithic app, but it, because it's probably started small and delivered a lot of value and then, and then grew and grew and delivered more and more value, you know, and along the way they're always taking these, these little steps to make it better um yeah i think so um you know i think in a lot of cases you get people who uh jump in and they they're a startup initially right and when you're a startup you don't have any users um but then over time that user base grows you know maybe you get a few hundred users a few thousand users you know up to a few million users whatever at some point your application starts to break break down, because you didn't build it under the understanding that you would have, you know,
Starting point is 00:43:50 that number of users. So, you know, I think that does happen. And I think that is one of the things where you start to get pushback is when, you know, a startup is first jumping in and they've got, you know, no users or a very small number of users doing a lot of this kind of stuff might be really expensive and really time consuming and not worth it, to be perfectly honest. Now, the thing that they have to consider, obviously, is, OK, so we don't have any users right now, but where do we want to be in a year? You know, if our goal in a year is to be at, you know, 10,000 users or whatever it is, are we going to be able to support that given the infrastructure that we've built?
Starting point is 00:44:36 If our goal is to be at 10 million users, are we going to be able to support that given the infrastructure that we built? And so, I mean, obviously, I guess everybody wants to be at 10 million users, but, you know, being realistic about it, is that a likely scenario? And so it's about figuring out again, how much is worthwhile right now? Is it worth going through all the effort right now so that we can be prepared in a year for when we get to the scale that we want to be at? And the tricky thing, I think, in that startup mode is maybe you don't really know what the future is going to hold because you need to get input from the customers. So this reward example, you may have an idea that a reward system is a good idea, but you may actually build it and nobody uses it and then want to remove it.
Starting point is 00:45:23 So I think that's why sometimes, you know, oh, we'll just add it to the existing code base because we don't know if it's a thing yet. Like this is just a proof of concept to see if users engage with this feature. Yeah, and I think that's okay, depending on how you do it. So again, if you're in the situation
Starting point is 00:45:41 where you've got, you know, this big ball of mud, you know, style architecture, then I think at that point, you really have to be more careful. You shouldn't just make the ball of mud bigger. That's not to say necessarily that you can't add the existing functionality into your existing monolith, maybe just to save on deployment hassles and things like that. But what you should do in that case, if you're going to add it to that existing monolith, you should add it to the monolith in an isolated way. And so that means, you know, kind of talking
Starting point is 00:46:16 along the lines of the service-oriented architecture style of monolith, where you create the reward system inside your monolith, but you provide an API and every other part of the monolith where you create the reward system inside your monolith, but you provide an API and every other part of the monolith that needs access to the data goes through that API. They don't go directly to the database. So then your reward system has its own isolated section of the database that it's fully in control of, and nobody gets to talk to that database. They just go through the API.
Starting point is 00:46:50 What you've done now, though, is you've put yourself in a position where if it turns out this reward thing does turn into a big deal, now what you can do is you can say, well, we've already got the tables and everything isolated. Nobody's accessing those tables except through the API. The API is already defined. It's clear. It's consistent, whatever. Let's just pull that API out into a microservice. And now we can do that.
Starting point is 00:47:10 We can pull it out into a microservice without a whole lot of hassle. And now we can start playing with the scaling options that we've talked about already. And so that gives you the flexibility to do that. The key is, again, don't make the existing problem worse. Always look for ways to make it better than it was before. And I think that gives a great segue to your opinions on this hexagonal architecture.
Starting point is 00:47:36 So if we're building a single app, how do we build it in such a way that the dependencies are not tangled? Yeah, so hexagonal architecture, I think, is a really interesting thing, something that I use heavily when I build my own applications. And what hexagonal architecture does is it sort of divides your application along clear boundaries. And so you have kind of at the center of the application, you have your domain. And your domain is like basically your business logic. It's all the things that are critical to the operation of your business, the rules that are associated with that business, the decisions that you have to make, all of that kind of stuff falls into your domain.
Starting point is 00:48:19 At the outside, the very outside edge of that, the very outside edge of your system, you have all the infrastructure you need to make the system work. And so that's, you know, things like your database, your user interface, you know, if you're using any kind of messaging platforms, you know, your messaging platforms will be out there. You know, so any of the technology that enables you to make your application work,
Starting point is 00:48:44 those kind of fall into the infrastructure category. And what hexagonal architecture does for me is it allows me to make very clear distinctions between what is domain and what is infrastructure. And so essentially what you do is you say, okay, within the domain, I'm not allowed to have any dependencies on infrastructure. So my domain doesn't know what kind of database I use. It doesn't know I'm using SQL. It doesn't know I'm using Cassandra.
Starting point is 00:49:08 It doesn't know whether I have a REST API or a user interface based on a website or something like that. It doesn't know those things. Those are all infrastructure. All it knows is things like, you know, when I get a request to reward a customer because they purchased something, this is how many reward points I will give based on, you know, the amount of money that they spent. That's a business rule. So what hexagonal architecture does is by forcing you to say your domain can't depend on your infrastructure, it forces you to introduce layers of isolation that then
Starting point is 00:49:47 enable you to make interesting decisions later on. So for example, you need stuff out of a database. I mean, that's going to happen at some point, but you don't need to know what kind of database it is or what that database looks like. You just know, for example, in the reward system, you need reward points. So you know that there is an API that you can call that gives you reward points. You build an interface or something in your application that does that. Then you have an implementation of that within the infrastructure that says, well, this happens to talk to SQL, or it happens to talk to Cassandra or whatever. Now that you've done that, you've created that separation between the domain, which just says, I need a way to get reward points, and the infrastructure that says,
Starting point is 00:50:32 I get reward points out of the database. Now that you have that separation, you can start doing interesting things like saying, okay, well, I realize that the database representation that I used here was actually very inefficient, so I'm going to rewrite that database representation. None of my domain code changes because your domain code is still just getting reward points. You're only changing that infrastructure layer. And so that allows for a lot of flexibility. I've done systems that use
Starting point is 00:51:00 hexagonal architecture where, for example, I have changed the underlying table structure of something in order to make it more efficient without basically just rewriting one class. And that's that class that's accessing the database, what we would call a repository in domain-driven design terms. So I've changed the implementation of the repository. The domain code stayed exactly the same. I've also changed to a totally separate database. So I've gone, for example, from MongoDB to a SQL database. And again, the domain code didn't have to change. Nobody using that service had to know that that change was made. No other services had to change because everything's isolated in state. I've gone further than that, though, because
Starting point is 00:51:45 on the flip side of that, if your infrastructure layer says, you know, I am operating through a REST API and I'm making calls into that domain, that domain presents sort of a clear API that says this is how you talk to that domain. Now what I can do is I can do things like say, okay, well, originally I had a REST API and it made these calls into this domain, but now I don't want a REST API. Now I want an event-driven system. Well, it just makes the same calls into the same domain. So you can add additional endpoints, maybe a REST endpoint and then an event-based endpoint, and then maybe later on a user interface based endpoint, they're all talking to the same thing. They're all talking to the same domain. And so you can make those kinds of changes.
Starting point is 00:52:29 You can potentially do things like rewrite the entire domain. And as long as that interface that you've provided, as long as that API to the domain remains consistent, you don't have to change anything on the infrastructure level. So there's lots of flexibility that comes when you do this properly. Yeah, it's, it's very interesting. And it seems to have a lot of principles that are great for keeping these dependencies from from, well, keeping the dependencies from being too coupled to each other. One thing I didn't understand about it is, I don't understand why it's a hexagon. Like I saw a picture of it. There's a hexagon in the middle. It says domain, but at the six sides, I don't really understand where the six sides come from. Yeah. You know, to be honest, I'm not sure either. When I first started learning hexagonal
Starting point is 00:53:19 architecture, I was introduced to it with three different names. So I was introduced to it with the name hexagonal architecture, which I found very confusing for the same kind of reason that you expressed. Why is it a hexagon? I was also introduced to the concept of ports and adapters, which is another name for it. And then I was introduced to it as onion architecture as well. In some ways, I think Onion architecture represents my understanding of it better, which is you have these different layers you have. So the inner layer is the domain. Outside of that, you have what you would call the API layer. And then outside of that, you have the
Starting point is 00:53:58 infrastructure layer. And the dependencies in these layers go from the outside in. So infrastructure depends on API. API depends on domain, but never the other way around. And I think logically, in my head, that makes sense. I'm not really sure why the original hexagon. I'll figure it out and I'll put a link somewhere. But yeah, I think what you said makes sense sense where it makes it easy to put in different implementations, um, which could be various sides perhaps.
Starting point is 00:54:31 Yep. Yep. Um, I saw here on your Twitter, it says that you're a science fiction author. Um, I do. Uh, I mean, I, I would say I'm a, uh, a wannabe author to some extent, uh extent of a little bit of science fiction, fantasy. So, yeah, I do a bit of writing on the side, nothing published, but I've written I've written one novel, which I'm kind of in the final stages of polishing up before I maybe start start farming it out to publishers. But and, you know, working on other projects here and there. That's awesome. What, who's your favorite, uh, author, uh, right now?
Starting point is 00:55:13 Uh, favorite author right now is, uh, Brandon Sanderson. Definitely. Um, he's written a number of books. Um, I think, uh, my favorite by him is the Mistborn, Mistborn trilogy, which is absolutely a fantastic series of books, uh, which I would highly recommend to anybody if you're interested in fantasy at all. I've never heard of it. I read a lot of science fiction, but not fantasy as much. I'll check it out, though. Brendan Sanderson dabbles in a little bit of science fiction. He's primarily a fantasy author, but he's had some short stories and things like that
Starting point is 00:55:46 that are more science fiction oriented, I think. I would say I probably read more fantasy, but I do read a little bit of science fiction here and there as well. I think my favorite science fiction book, actually, that could be a tough one. It probably is between Dune, Frank Herbert's Dune, and Orson Scott Card's Ender's Game would be kind of my top ones.
Starting point is 00:56:10 Oh, yeah, those are both great books. At some point, I read all the Frank Hebert books, and I loved them. They're great. So much detail in his uh world that he created yeah to me um dune is kind of the science fiction equivalent of uh the lord of the rings you know that that intense world building yeah that's definitely true well before we uh wrap up our talk i wanted to say that uh your your course that you're building is really great i went through uh quite a bit of it and i like the structure i love watching uh tech talks um but the thing i liked about about your structure you have is there's a talk portion there's questions there's answers makes it a little bit more uh engaged than just watching like a several hour talk i thought it was great yeah i think that was one of the things that we
Starting point is 00:57:05 really focused on when we were building the course was a couple of things. One is everybody learns differently. You know, so some people learn by watching, some people learn by listening, some people learn by reading, some people learn by answering questions and things like that. So we wanted to sort of hit as many of those different learning approaches as possible with the course. But the other thing too, is I didn't want the course to be something where you can just sort of like put it on in the background and tune out and not really pay any attention to that. I do that all the time. I'll start listening to something and I sort of wander off and don't pay attention to it. I wanted this to be something where you come out the other end
Starting point is 00:57:40 and you have actually absorbed the information. And so that sort of necessitated the introduction of the questions um we also try to find ways to use the questions as a a bit of a learning experience as well so the thing i liked was your um well the thing i liked was it takes a case study approach somewhat with this reactive barbecue uh yep it just made me want barbecue to be honest. You know, I actually, so we do we do in-person training of the, this same course. It's not exactly the same, but we have an in-person version of it. The exercises are all very different, much more interactive, obviously. But one of the things that I did during one of the teaches,
Starting point is 00:58:24 I think about a year ago is I spoke to the organizers of the one of the things that I did during one of the teaches, I think about a year ago, is I spoke to the organizers of the conference where I was teaching the course, it was the Reactive Summit. And I spoke to the organizers and specifically said, hey, can we organize some sort of barbecue meal, you know, during the course at some point, particularly because we were teaching in Texas. So it was sort of like, okay, we're teaching the reactive barbecue in Texas. I mean, come on, you have to have barbecue at some point. So they came through though. And we indeed actually had a nice barbecue meal the one day. So it was really nice for that. It would have been funny if actually
Starting point is 00:59:03 the barbecue ordering site went down during the process because it fits right into your case study. Yeah, yeah, absolutely. Well, maybe not funny when you're hungry. All right. Wade, thank you so much for your time. It's been a lot of fun. Yeah, no, it's been great. And, you know, I mean, you mentioned the course.
Starting point is 00:59:22 I think at this point we have three pieces of the course out, but we've got another bunch coming. So, you know, keep your eyes out, I guess, for the rest. Yeah, and actually, let's just touch on that. So what are the three courses you have so far? So the three courses are basically, the first one is kind of an introduction to reactive architecture. The second one is domain-driven design.
Starting point is 00:59:45 And then the third one is all about building reactive microservices. Awesome. And so that's part of one training path on the IBM cognitive class. So the training path is the LightBend Reactive Architecture Foundations. So we're going to be launching another training path shortly, which will include another three courses so that's great i'll put a link in the show notes for this episode yeah great that would be awesome well that is the show i would like to thank everyone who helped share the last episode with Philip Wadler.
Starting point is 01:00:26 It got some great attention on Reddit, our programming, where there were lots of interesting comments and critiques. If you made it this far, you must have enjoyed the show. So tell a friend about it, mention it online somewhere, whatever you can do. It helps grow the show. Talk to you next time.
