CoRecursive: Coding Stories - Tech Talk: Big Ball Of Mud
Episode Date: November 14, 2018
Tech Talks are in-depth technical discussions. Evolving software under constrained resources is a challenge, and I think we kid ourselves when we don't admit this. Software that is providing value often grows in scope until it is a mess. Today I talk to Wade Waldron about how to avoid this situation or recover from it. Big Ball of Mud is the title of a paper presented at the 1997 Pattern Languages of Programs conference, and I think it is super interesting. The researchers went out into the field to see what architectures software in industry was following. Big Ball of Mud is what they found, along with six other patterns with names like "Sweep It Under the Rug" and Reconstruction, which is the throw-it-away-and-start-again pattern.
Links:
Big Ball Of Mud Paper
Hexagonal Architecture
Reactive Foundations Course
Reactive Advanced Course
Check out other episodes like this - Philip Wadler: https://corecursive.com/021-gods-programming-language-with-philip-wadler/
This podcast originally published here: https://corecursive.com/22-big-ball-of-mud-architecture-and-services-with-wade-waldron/
Transcript
Welcome to CoRecursive, where we bring you discussions with thought leaders in the world of software development.
I am Adam, your host.
If you're anything like me, then learning how to build software in a sustainable way,
a way where you don't continually build up technical debt,
and have development slow down as the project gets more complex, has been a career-long struggle.
Big Ball of Mud is the title of a paper presented in 1997 at PLOP, Pattern Languages of Programs
Conference, and I think it's super interesting.
The researchers went out into the field to see what software architectures
were being used in industry.
And Big Ball of Mud is what they found, along with six other patterns
with names like Sweep It Under the Rug and Reconstruction,
which is the throw it all away and build it again
and hope it's better the next time pattern.
Anyhow, I think this is a hard problem.
Evolving software under constrained resources
is always going to be a challenge.
And we kid ourselves when we don't admit that it's hard.
Today, I talked to Wade Waldron about how to avoid
this situation or how to recover from it. If you like the show, spread the word, tell a friend,
leave an iTunes review, or follow us on Twitter. If you're listening in your web browser on the
website, subscribe to the podcast
for a much better experience. Wade, thank you for joining me on the podcast. I'm glad to be here.
So if you were at a dinner party, what would you tell somebody you did for a living?
I'd probably try to avoid that question. I think it's usually a little awkward
to explain, but I guess when I do get asked that question, I usually tell people I'm a
software consultant. I'm not sure that necessarily explains things very well, but I guess
that's the answer they get.
I actually got the request to interview you from a listener.
And I started to dig into these courses you have on the reactive architecture, and I found them to be very interesting.
So I'd like to start with this question.
What's a big ball of mud?
So, I mean, it's an interesting question. One of the challenges, I think, with that particular question and one of the challenges with that term is sometimes it'll kind of get people's backs up, I guess, because they hear me talk about a big ball of mud and they say, no, no, no, that's not how I build a system.
And, you know, so I think it's important when I talk about a big ball of mud like that, that I establish first off that this is, I consider
this a worst case scenario. I do not by any means consider this to be the general case. So when I
talk about a big ball of mud, what I'm talking about is usually a system that has been built
in a monolithic way. So it's been built as a single application rather than a series of microservices, for example.
But I'm not talking about every monolith. Not every monolith is a big ball of mud.
There are plenty of monoliths that are probably extremely well-designed and extremely resilient
and extremely robust. However, I have worked with monoliths specifically where they were not really well designed and were not very robust.
And so in those cases, those monoliths, those particular systems were built in such a way that there was no clear separation between dependencies.
So you had, you know, every piece of the system was depending on every other piece of the system, either directly through the code or sometimes through the database. And it's very difficult in those cases to sort of unravel where the clear
system boundaries are. Instead, what you end up with is this big ball of mud where it's all sort of
one cohesive blob and you can't really separate it into smaller pieces.
If I have a big ball of mud, what should I do next?
Well, some of that depends on what your goals are and what that big ball of mud is doing for you.
What I would suggest not doing is throwing away the big ball of mud and starting over.
That's something that I think is often tempting. You look at this big ball of mud and
you don't know where to start and you say, well, you know, let's just build a new
thing and replace the old thing. And then usually two or three years into that project, you realize
that you didn't really understand the old thing to begin with, and now what you've got is a
big ball of smaller balls of mud or something like that. So what I would usually recommend in that case is to start looking for pieces of that big ball of mud
that might be easier to isolate.
So find a section of the code
that is maybe not quite as intertwined as other sections
and start looking for ways
that you can disentangle that piece of code.
What I would usually do is I would start by extracting that out
into sort of a separate library or something like that,
that you can then basically remove or control the references
to that library a little bit better.
Once you have things sort of separated like that,
then you can start looking at, okay, now how can I take this thing
and potentially move it out of the ball of mud completely?
How do I move it into a separate microservice or something like that?
But first you have to start by finding that piece that isn't so tightly coupled to everything that the moment you try to move it, it's going to break.
And that could be a challenge.
That's not always an easy thing to do, but you got to start somewhere.
The other thing I would highly recommend is if you are in the situation where
you have that big ball of mud, don't make it bigger. So when somebody comes in and says,
hey, we need this new thing, don't just jump in and start putting it into the big ball of mud.
Look at the possibility of, okay, this is a new thing. Can we build this new thing separate from
the ball of mud so that at the very least we're not making the problem any bigger.
We're maintaining the existing problem, but moving things in a
way that encapsulates them and isolates them better.
Yeah, that's a great suggestion. I think that, in my experience, the tricky part with that is,
what if the piece isn't that cohesive?
Like, they want this new thing, but it kind of relates to what's there.
Is there a way to do this without making that new thing actually dependent on the monolith, but maybe, you know, across process boundaries, but it's still tied to it?
There are, and it's something I have done in the past.
Depending on the, I guess, maturity of the development team,
you know, how familiar they are with different techniques and things like that,
it may not be a problem.
Also depends a little on what kind of infrastructure you have in place.
But for example, if I use a concrete case that I did,
we had an application which was a big ball of mud. And we wanted to introduce, in this case, it was a rewards program for the company that I was working with at the time. And with that rewards program, we needed a lot of information that already existed in the monolithic application. What made it a little easier is we built that rewards system as a separate component.
And then what we did is we went into the existing monolithic application and we found the places where the information we needed was being recorded.
We kind of located and isolated those particular pieces of the application.
And then we made the application essentially emit an event.
And this is something that comes from event-driven architecture,
which obviously we haven't talked about today,
but it's something that reactive systems quite often focus on.
And what you do is you make that piece of the monolith emit an event,
which can then be consumed later on by your new
piece of the system. And so that event can be consumed. You can create your own sort of
internal representation of the data as necessary based on what's coming out of that event. And so
now you don't have to talk directly to the monolith. Instead, you indirectly consume these
events. Those events are probably broadcast through a tool like Kafka or Kinesis or RabbitMQ or something like that.
And so you consume those events and build up your own model based on that.
And that's one way that you can separate yourself from the monolith.
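To make the shape of that concrete, here is a minimal Scala sketch of the pattern Wade is describing: the monolith emits an event at the point where the information is recorded, and the new reward component consumes it and keeps its own model. The in-memory bus and all of the names here are hypothetical stand-ins; in a real system the events would flow through Kafka, Kinesis, RabbitMQ, or similar.

```scala
// Hypothetical sketch of decoupling via events; the in-memory bus stands in
// for a durable broker such as Kafka, Kinesis, or RabbitMQ.
final case class OrderCompleted(customerId: String, amountCents: Long)

object EventBus {
  private var subscribers: List[OrderCompleted => Unit] = Nil
  def subscribe(handler: OrderCompleted => Unit): Unit =
    subscribers = handler :: subscribers
  def publish(event: OrderCompleted): Unit =
    subscribers.foreach(_(event))
}

// Inside the monolith: where a purchase is recorded, emit an event
// instead of calling the reward code or its tables directly.
object CheckoutModule {
  def completeOrder(customerId: String, amountCents: Long): Unit = {
    // ... existing monolith logic: save the order, charge the card, etc.
    EventBus.publish(OrderCompleted(customerId, amountCents))
  }
}

// The new reward component consumes events and builds its own model;
// it never reaches into the monolith's code or database.
object RewardService {
  private var pointsByCustomer = Map.empty[String, Long]

  def start(): Unit = EventBus.subscribe { event =>
    val earned = event.amountCents / 100 // illustrative rule: one point per dollar
    val current = pointsByCustomer.getOrElse(event.customerId, 0L)
    pointsByCustomer = pointsByCustomer.updated(event.customerId, current + earned)
  }

  def points(customerId: String): Long = pointsByCustomer.getOrElse(customerId, 0L)
}
```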
And what's the advantage of emitting events as opposed to, I don't know, REST calls or something like that?
So emitting events is using asynchronous messaging generally.
And that tends to be a little bit more robust for a few different reasons.
One of the reasons that it tends to be more robust is because if you emit an event, an event is something that you consume asynchronously. And as a result of
that, it means that you don't necessarily require all pieces of your system to be active and
functional at the same time. So for example, in the rewards example that I was giving, if the
reward service was down, there's nothing stopping me from emitting that event anyway, right? So I
emit the event,
even though the reward system is down. Now, when the reward system comes back up,
I can just consume that event, even though I wasn't alive when the event was emitted.
On the flip side, if the system that is emitting that event goes down,
that doesn't stop the reward system from continuing to serve any requests that it has to serve.
It also doesn't prevent it from consuming any events that were already emitted.
So that's one aspect of it that I think is beneficial.
It actually allows for more flexibility in terms of what's active and what isn't active.
There's other things, too.
You know, again, because it's asynchronous, it means you become more decoupled in time.
And so, you know, that has its own set of advantages.
You're not expecting something to happen right now.
You're expecting it to happen eventually.
If you expect something to happen right now, then again, we go back into the failure scenario
where if something fails for some reason and you need it right now, you have no choice
but to fail the whole process.
On the other
hand, if something fails and you need it eventually, well, now you have other options that you can
take. You can do retries. You can wait for the information to become available. You don't have
to deal with it. You don't have to deal with that problem right now. So again, I think what it does
is it allows the system to become more robust and less brittle over time.
So that is sort of isolating one service from the other one in time, I think.
Because the event could be sitting in a buffer, sitting in Kafka.
Are there other ways that we should isolate services from each other?
So, I mean, there's a ton of different ways to isolate services.
I kind of feel like they boil down to a few specific ones.
So specifically, I think, you know, isolation in time, I think, is very important.
Isolation of state, I think, is equally important, especially when you build microservices.
And so when I talk about isolation of state, what I mean is microservices shouldn't share a database.
Now, I want to clarify that statement a bit and say they shouldn't share, you know, tables and
things like that in the database. They may actually all be operating within the same,
you know, SQL database or whatever, a Cassandra database, doesn't matter. But they don't have
access to each other's data through the database. If you want access to each other's data,
then you do that by communicating through the API that that service presents. And that helps
to decouple services, which again, can make them more flexible. That makes it easier for services to evolve.
I've been in situations, for example,
where you lack that isolation.
Everything depends on the database.
And then you get into this awkward situation where you go, okay, the structure of this particular table
is actually kind of awkward and I want to change it.
But these 17 dozen other locations in the application all depend on that
table. So if I make a change here, I got to change all those other things. And that's the kind of
situation we want to avoid. We'd like to be able to, you know, have the freedom to evolve our
database as necessary for that particular microservice, for example. So that's isolation
of state. There's also isolation of space.
So in that case, what we're saying is microservices shouldn't depend on the location of another microservice.
So this is different from a monolith where if you have a monolith, essentially everything is deployed as a single unit.
And so because of that, you might have your reward service and your customer service and whatever other services you have, those are all deployed in the same location.
And the monolith is largely dependent on the fact that those are deployed in the same location.
Now, you might have multiple copies of that deployed in different locations,
but those copies usually don't know about each other. Any communication happens through the
database. So within the individual application, everything is deployed in the same place.
With microservices, that's different.
We shouldn't necessarily require that a microservice be deployed on the same machine as another microservice, for example.
We should be able to have the flexibility to deploy them across many machines.
And what that gives us is that gives us the ability to scale and to be elastic.
So now, you know, if we need maybe our customer service needs 10 copies and our reward service only needs three copies, that's fine.
You know, we can deploy as many copies of our customer service as we want because it's not directly tied to the location of the reward service or anything like that.
So that's isolation of space.
And to make sure I follow.
So you were talking about this rewards example.
We can change examples if you want. So we have a rewards service now.
We've broken it out.
And so it has its own database.
And then when a user shops or something that would cause rewards to happen,
then we would emit events.
And then am I on the right track here?
Potentially, yeah.
So you might have, for example, when a customer buys something,
you would emit an event which indicates, you know, customer bought something for X amount of dollars.
The reward service would receive that.
And then the reward service would potentially know that that amount of money translates into this kind of reward, whatever that happens to be.
So it receives an event, which is something like, I don't know, item purchased or
something like that. It receives that and then applies the appropriate
rewards based on that event. Makes sense. I interrupted you. You were going to another
type of isolation, I believe. Yeah. So far we've got isolation of space, isolation of state, and
isolation of time. And then the other one we've kind of already talked about a little bit,
and that's isolation of failure. And so that's essentially just making sure that you have your
system isolated in such a way that if one piece of the system fails, it doesn't bring down the
whole system. So, you know, for example, if our reward service fails,
well, people should still be able to buy stuff. You know, we don't want to have a situation where
our reward system fails and therefore we have to say, oh, nobody can buy anything anymore.
You know, so we want to isolate those failures such that a failure in one piece of the
system might have an impact on another piece of the system, but it doesn't
bring down the whole system. Netflix actually does this really well.
Now, I'm kind of making some assumptions here based on my experience with
Netflix. I've never worked for them or anything, so I don't really know, but based on my
assumption and my experiences with them, I don't know if you've ever noticed, but sometimes if you look at like the My List feature in Netflix, sometimes that disappears.
I've seen that happen a couple of times in Netflix where I go to look for My List and it's just gone.
And I think what is happening in that case is the microservice or the service that supports the My List feature has disappeared.
You know, it's failed for some reason or maybe it's being redeployed or whatever the case may be.
It's gone.
The rest of the application still works fine.
In fact, if I wasn't specifically looking for the My List feature,
I wouldn't even know it was gone
because there's no like error message or anything like that.
Everything just continues to work.
I can still watch videos.
I just can't access the list.
And so I think that's a really good example
of how you can isolate failures in your application.
I heard another example from Netflix,
which I'm probably going to get wrong,
but when you first go into Netflix,
they have some personalized recommendations.
And I guess there's a service that generates that,
but it's an expensive thing to generate.
So it kind of is cached.
So when you go in, it'll start generating
and it will get shown from the cache.
However, like lots of times they just don't have that.
You haven't viewed it before.
It hasn't been kicked off.
It's not in the cache.
So they just have a default.
They just show here's what we think everybody would like, right?
Like everybody likes the movie Ghostbusters.
I don't know what they put in the default recommendation,
but it's not per se a failure,
but I guess an explicit fallback, right?
They're like, we're not assuming this is always here.
Yeah, well, I mean, I think that is a,
it is a representation of a failure of some kind
because potentially they could have a situation
where, for example,
maybe that service is actually unavailable
and they fall back to the defaults.
And, you know, again, that's a great way to hide or to isolate a particular kind of failure.
So, you know, rather than failing completely and saying, look, you're looking for this kind of information, I can't give it to you.
What we do instead is we say, well, normally we give you this, you know, really rich, detailed information, but we lack that right now.
So here's the next best thing we could give you, and we give you that instead.
And again, I'm not sure if that's an actual implementation detail that Netflix uses, but it wouldn't surprise me.
And I think it would be a really good example as well of isolating failures.
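As a rough illustration of that kind of explicit fallback (the Netflix internals above are guesswork, so the client and names here are hypothetical), a caller can wrap the personalized call and quietly degrade to a canned default list when the service is unavailable:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Hypothetical client for a personalization service that may be down or slow.
trait RecommendationClient {
  def personalized(userId: String)(implicit ec: ExecutionContext): Future[List[String]]
}

object Recommendations {
  // Canned, non-personalized defaults shown when personalization is unavailable.
  private val defaults = List("Popular Title A", "Popular Title B", "Popular Title C")

  // Isolate the failure: if the personalized call fails, fall back to defaults
  // instead of failing the whole page.
  def forUser(client: RecommendationClient, userId: String)
             (implicit ec: ExecutionContext): Future[List[String]] =
    client.personalized(userId).recover { case NonFatal(_) => defaults }
}
```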
Yeah, I had a previous interview with Jan.
How do you say his name? Jan Mahachuk.
And he was saying, like, just making these decisions explicitly is like a big change.
Like when we have a monolithic app and like the user service, there's no user service.
The user part is just embedded in the application.
So you never have to make an assumption about what should you do
if there's no ability to look up users.
But all of a sudden, when you split these things up,
you can make these explicit decisions and say,
well, maybe we can have a read-only mode
if we can't authenticate this user or what have you.
Yeah, and I think that's definitely true,
that it's not always obvious when you're building existing systems, monolithic systems, things like that.
One of the things, you know, I teach a lot of courses as part of my job.
And one of the things that I teach in one of my courses is I go through the exercise of breaking out a system into separate microservices. And then with the
students, I'll actually sit down and kind of talk to them and say, okay, so we've got this series
of microservices, right? Now, what happens if this microservice fails? And, you know, sometimes
the immediate reaction is sort of like, well, I don't know, I guess nothing works. Okay, but what
if we wanted it to work? What would we do if this service failed in order
to allow it to continue to work? And that's, I find that's a really interesting process going
through with the students and, you know, talking to them about, you know, how could we change this
system in some way so that we can tolerate a failure here? And so then we start to look at
things like, well, what if we, instead of making a direct
REST call, what if we emitted events, consumed those events, and created our own internal
view of that data?
If we do that, then when that external service fails, it doesn't matter because we have our
own internal copy of that data that we can fall back on.
Yes, the data might not be 100% up to date, but in a lot of cases, that doesn't really matter. In a lot of cases, mostly up to date is probably good enough. And in most cases, I would say mostly up to date is better than not available at all.
Now imagine that fails.
What are you going to do?
I think that's a really good exercise to go through with any system, really.
One potential way to deal with this that I've seen go badly
is this service sometimes is busy.
So if it doesn't respond in a certain amount of time,
I'm just going to retry it. And then we have multiple services and they start basically
knocking something over. It starts to get slow. So you ask it again. Have you encountered
this problem before?
I have. I've built a system like that, to my own shame,
I guess. I built a system years ago where it would
attempt to send messages onto another area of the system. And then
when that area of the system got busy, we'd end up getting timeouts. And so we'd retry
and send more messages to an already busy system, and things would just get busier and busier
and busier.
And if you do that enough, then what ends up happening is the busy part of your system just
collapses under the weight and everything falls over. So yes, I've definitely
encountered this scenario. There is a solution to that. And I think that solution is to be a
little more polite, I guess, on the sending end. So, you know, rather than retrying
over and over and over again until you kill that busy system, what we can do is we can use
techniques like a circuit breaker. And what a circuit breaker does is essentially any requests
go through the circuit breaker, whether they're successful or not. But what happens is as soon as
a request fails for some reason, it trips that circuit
breaker. And so once that circuit breaker gets tripped, what happens is now any requests that
come through that circuit breaker just immediately fail and they fail with a message, something like
circuit breaker is open or something like that. And so you get this rapid failure. As a result,
you know, you can retry as much as you want, but you're not actually putting any load on the external system because the circuit breaker
is basically just preventing you from sending those messages on. Then eventually, after some
predefined period of time, the circuit breaker flips over into what we call a half open state. And in the half open state,
what it does is it allows one request through,
just one, not a whole bunch.
But what it does is it checks for that one request.
And if that one request succeeds,
then we go back to normal operation.
On the other hand, if that one request fails,
then we assume the external service
is still unavailable for some reason.
And we go back to just blocking all of the external calls.
What this does is it allows your system to be, I guess, more polite again to that external system so that you don't just drive it into the ground.
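As a rough, single-threaded sketch of the closed / open / half-open behaviour Wade just described (not any particular library's API; as he notes next, Akka and Lagom already ship real circuit breakers):

```scala
import scala.util.{Failure, Success, Try}

// Toy circuit breaker: fail fast while open, let one trial call through when
// half open, resume when it succeeds. Not thread-safe; illustration only.
class CircuitBreaker(resetTimeoutMillis: Long) {
  private sealed trait State
  private case object Closed extends State
  private case class Open(since: Long) extends State
  private case object HalfOpen extends State

  private var state: State = Closed

  def call[A](operation: () => A): Try[A] = state match {
    case Open(since) if System.currentTimeMillis() - since < resetTimeoutMillis =>
      // Fail fast without putting any load on the struggling downstream service.
      Failure(new RuntimeException("circuit breaker is open"))
    case Open(_) =>
      state = HalfOpen // timeout elapsed: allow a single trial request through
      attempt(operation)
    case _ =>
      attempt(operation)
  }

  private def attempt[A](operation: () => A): Try[A] =
    Try(operation()) match {
      case success @ Success(_) =>
        state = Closed // trial or normal call succeeded: back to normal operation
        success
      case failure @ Failure(_) =>
        state = Open(System.currentTimeMillis()) // trip, or stay tripped
        failure
    }
}
```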
And I mean, these circuit breakers are something that are implemented in various libraries.
The libraries I work with are things
like Akka and Lagom. Both of them have built-in circuit breakers that you can use out of the box.
There's other libraries that implement them too, though, you know, those are certainly not the only
ones. You shouldn't be rolling it yourself at this point, I guess, is the... Yeah, there's no reason
to build this yourself. You know, there's plenty of options out there for leveraging circuit breakers.
So you have a lot of great principles here about making things work over
time, making sure state's not shared. It sounds like, to steal the terminology
from your course, that the goal is to make these services autonomous, like able to stand on their
own. Is that a correct characterization?
That's definitely one of the major goals.
Yeah.
I mean, autonomy is a tricky thing, I think, because it's a really nice goal, but not one
that we can ever necessarily reach completely.
Like a fully autonomous system would be a very rare system, I think.
But the further we can move along that path, the better. You know, the closer we can get to a
fully autonomous system, the better, because that allows for all sorts of really interesting things.
I mean, I guess to provide a bit of a definition, when we talk about an autonomous
system, what we're talking about is a system that doesn't depend on anything, right? It depends only
on itself and nothing else.
If you had a system like that,
you could deploy as many copies of that as you wanted,
and there would be nothing preventing it.
You would never reach a point where there's a bottleneck or something that says you can't deploy any more of these.
That means you could essentially scale forever.
It also means you'd be totally resilient to any kind of failure
because, again, you can deploy as many copies of these as you want,
and if one of them fails, no big deal, you have 50 other copies.
So that allows you a lot of flexibility
in terms of building a very robust system.
Like I said, it's usually not easy
or even necessarily possible to get to that point.
So it's more about moving along that path
and going as far along that path as possible. Yeah, I think it must be completely impossible.
Like you have to have user input, for example, I guess we're excluding user interaction.
Yeah, I mean, yes and no. You know, the trick and the reason why I say it's generally impossible to have this is because in order for you to have user interaction, you need some way for the user to know where all of the copies of the server are.
Right.
And so in order for the user to know where all the copies of the server are, you need some sort of load balancer or something like that in between the user and all of the many different copies. The moment you introduce that load
balancer, it's not a fully autonomous system anymore. You now have a dependency where the
load balancer depends on all the services and the input or the users depend on the existence of that
load balancer in order for this to work. So, you know, I think that's an example of where I typically say that it's probably going to be impossible to build a fully autonomous system, because at some point you're going to have a load balancer or something interfacing with the user.
I can't think of a system off the top of my head where that wouldn't ever be the case.
It's an interesting, it's an interesting game to play. Like, I think you could, so if I were going to make a service,
and so it has persistent data,
but I want it to be totally autonomous.
So I guess I would just emit things
that would get stored in the database by something else,
but at the same time, keep everything cached,
like locally within that service.
It sounds like a horrible idea, I guess,
but it would mean if the database were down, I could just keep emitting these events and use my local cache. But
yeah.
Yeah. I mean, I would argue that if you have a database, you're not a fully autonomous system.
And again, that's where I say that, you know, fully autonomous systems are very,
very difficult, if not impossible, to build. Because if you have a database, well,
now you have a dependency, right? Your system, your microservice or whatever it is,
depends on the presence of that database.
Now, you can improve autonomy there.
So what you could do, and again, this is something I've actually done in the past,
is I've actually built a system at one point
where every instance of my microservice had its own database.
And so that improves autonomy because now I don't have a shared
database. I have independent databases for each microservice and each instance of the microservice.
And each one has its own copy of the data and everything else. That improves autonomy,
but that's also really expensive. And so that's another thing that you have to consider when you
do this is the further you move along this path to autonomy, oftentimes things get more expensive.
And so there has to be a real value delivered in order to make this worthwhile.
I would not, for example, recommend that everybody go out and build systems that create fully independent copies of databases for every unique instance of a microservice.
That's probably more expensive than what most people need.
But it's something where if you reached a scalability limit
and you realized that you couldn't go any further
because you had this shared database,
well, then that might be a place where you could say,
well, how can we break that coupling?
How can we isolate ourselves even more
so that we don't have a shared database?
And what benefit would that give us? And is it worth it? But is it worth it is always the key
question, I think. So you hit a big question that I have in this area, which is like, how micro,
how monolithic? Are there guidelines that can be used to decide, you know, how many services are needed to serve this customer function, or how do we decide how to cut these things up?
So, I mean, I think, you know, part of that for me is the principles of isolation that we talked about.
You know, the goal of microservices isn't necessarily, in my opinion, based on size.
You know, I don't want to be one of those
people who says that a microservice should only be 100 lines of code or something like that.
I would prefer to say that a microservice should be as small as it needs to be in order to get the
job done and remain isolated. So there's no, I mean, that's kind of a wishy-washy answer in the sense
that I'm not giving you a concrete answer, only that I would say, you have to look at
your unique use case and say, okay, we can make this thing more isolated by doing X,
Y, and Z, whatever that happens to be.
But is that costing us more than the value it's delivering?
And so, you know, that's kind of the key thing is,
okay, so we could, you know, build our applications
so that everything shares a single database,
but they all have isolated tables that they don't access.
You know, that's better than having shared tables
that everybody accesses.
That's more isolated.
So we isolate by creating those
separate tables. That's kind of the first step. The second step might be, okay, are there other
options within our database? So can we have, you know, different schemas, for example, if you're
using like a SQL database, or if you're using a Mongo database, MongoDB has the concept of
databases within your MongoDB.
And then Cassandra, I think they call them key spaces.
So different databases have additional isolation techniques.
And so that's better.
Again, there's a little bit more of an overhead when you do that.
And then probably for most use cases, that's going to do the job. But then there might be those rare use cases where you really need to scale beyond the ability of one database instance to handle.
And then you start looking at, okay, well, what if I create a whole other instance of Cassandra that's just there to handle this one service? But then you have to ask yourself whether the cost to maintain that new thing justifies the benefit that you get out of it.
Is the important distinction for microservices the complexity of the business requirements and how they interact or the actual scale of deployment and the usage?
I think it's both in a lot of ways. So one of
the things that I think microservices do really well is they allow you to isolate complex business
logic in one area rather than having it, you know, trapped in your database in a series of
stored procedures or something like that that are used by multiple parts of the application,
you can have, you know, a single microservice that just deals with this piece of business logic,
however complex that business logic happens to be. What I like about that approach is it allows me to go into that microservice and sort of forget about all the other things for a while and just
focus on that microservice and that piece of the business.
And so I can sort of keep that all in my head without getting lost in the details of what everything else is doing with that. So I think, you know, it is good for isolating business logic.
And I think that's very important. You know, I think actually that's one of the primary things
we talked about. How big should a microservice be? I think one of the primary things is to look at, you know,
specific isolated pieces of business logic. In DDD, Domain Driven Design, we use the term
bounded context, you know, look for those bounded contexts. And that's kind of where you start
building a microservice because it allows you to isolate that business logic. So I think that's
definitely a very important thing. However, the fact that you've created this isolation and then potentially introduced a new level of autonomy that you didn't have before, then enables you to scale in ways that you didn't previously have the ability to scale. So in that respect, this is something that enables scalability. So I think it's a little bit of both. It's both business logic and scalability that drives this.
I think the term microservices became pretty hip maybe around 2015 or something.
That's my recollection. And maybe there's a bit of a backlash now.
But a long time ago, like maybe 2005, I remember people talking about service-oriented architecture.
Is there a difference?
Is it rebranding?
I think to some extent it's rebranding, but I would argue it's not completely rebranding.
I would argue that microservices are a subset of service-oriented architecture.
So service-oriented architecture would be kind of an umbrella term that covers microservices,
but it also covers other things.
One of the problems I think that happened with service-oriented architecture is over time, they started building infrastructure around doing service-oriented architecture.
So you started getting like these enterprise message buses and things like that, enterprise service buses.
And these would have a lot of functionality built into them.
They would do message passing between different parts of the system.
They would do message versioning.
They would do API versioning, all sorts of different things.
And I think that sort of muddied the water a little bit
because people got so focused on these enterprise service buses,
which really wasn't necessarily the original purpose of service-oriented architecture.
I think service-oriented architecture, the original purpose was really about isolation, which is, again, kind of what I've been talking about.
But on top of that, you know, some people build applications that they would call service-oriented architecture, but they build them in a monolithic style. So what they do is they build a single deployable unit,
but within that single deployable unit,
there are multiple services, essentially,
and those services communicate with each other
through a single or rather through discrete APIs.
So each service presents an API.
When another service needs data, it talks to that API. It doesn't go directly to the database. So they have isolation of state in that respect. What they didn't do necessarily is require that those individual services be deployed independently. And I think that's where microservices take all of those ideas of isolation of state, and, you know, providing that API
and communicating only through that API, but then they also add the additional requirement that says
these microservices have to be deployed independently. They're not deployed as a
single unit. And I think that's where the difference is.
And it seems like an all right solution, I guess.
Like deploying these things as separate services
isn't necessary to overcome the things you were talking about earlier,
like having clear dependencies.
It sounds like by the services talking to each other
through their external APIs,
they've covered that intertwined dependency risk.
Yeah, definitely. I think the thing to keep in mind is that, you know, again, going back to the
principles of isolation that I talked about, you know, they cover off the isolation of state
fairly well. And so that's a really nice thing. I think just by using service-oriented architecture,
you tick that box, at least to some extent. Maybe there's ways that you don't, but I think
generally speaking, service-oriented architecture does a really good job of ticking the isolation
of state box. Where it falls down a little bit is not service-oriented architecture itself, but that
sort of monolithic deployment style of service-oriented architecture. Where it falls down is isolation of space. So again, because each individual service
is packaged up into a single deployable unit, so all of your services get packaged up into this
one unit that you deploy, that means you don't have isolation of space. You are basically requiring that you have, you know,
exactly the same number of copies of every service.
And that limits your scalability and it limits your ability to handle failures
because you can't, for example, say, well, I want 10 copies of my customer service
and only three copies of my reward service.
You lose that ability because it's all packaged up into one deployable unit.
That makes sense.
And there's a continuous delivery problem, I'm thinking,
wherein if you have these four services
all wrapped up in one deployment
and you want to roll out a new version of one of them,
you have to switch them all at once.
So you can't have the old user service still there
and the new version of the reward service talk to it.
Yeah, and this is one of the things that I think is really beneficial
from a development perspective is when you are working
with that monolithic deployment style,
if you've ever worked in an application that does that,
oftentimes you get into this situation where you get like deploy day.
So, okay, everybody, we're deploying today, which means nobody changed any of the code because we got to make
sure that nothing moves between now and when we deploy. And then you've also got this problem
where, you know, people are kind of talking to each other and saying, okay, I got my stuff in,
did you get your stuff in? You know, we got to sync up everything before we deploy. And, you know,
that all gets very expensive.
But then the other thing, too, is that you get into situations where, you know, I need to make a change.
And it's just a very small change.
It's a hot fix for a bug that I guess got deployed or whatever.
I want to make that change and I want to deploy that.
But now we've got this problem that, okay, maybe my change is small and we want to deploy just that change,
but we can't. We have to deploy all this other stuff as well. And so, I mean, you know, there's
ways that you can work around that to some extent with branching and things like that, but it starts
to get awkward and the maintenance burden of that gets harder. What I like about working in
microservices is it allows you to say, I want to make that hotfix to that bug and deploy this service.
I don't actually depend on anything else. So that's fine. You know, I can make that change
to just that one service, deploy it. And that's not gonna, I'm not gonna have to worry about what
everybody else changed in the meantime. The one thing that I think maybe can be worse is when you
want to change a service and the things that utilize that service.
Like when it was all a monolithic application, it could be a single commit. I guess the rollout is
a bit trickier perhaps, but now when it's multiple things, if you need to make a breaking change,
how would you handle that? I mean, I think you're right. I think the rollout of changes that affect
multiple services is arguably harder in a microservice-based approach than it was in a
monolithic approach. I don't think there's too many people that would argue that. So yes, I
absolutely agree. I think that is harder. Again, there's certain deployment techniques that can mitigate that to some extent,
like you can do blue-green deploys, for example,
where you still deploy each service individually
and you deploy each service as some number of copies,
but you kind of deploy them all into an inactive cluster
and then you flip over from the active cluster to the inactive cluster.
So there's ways to sort of mitigate that.
But it is more complicated, I think, is what it boils down to.
I guess the way that I would suggest you mitigate that problem
and the way that I have done this is you support the old API, right?
So if you need to make a change to an API of one service
and you have another service that depends on that API, right? So if you need to make a change to an API of one service and you have
another service that depends on that API, when you make the change to the API, you have to support
the old API as well. That is harder than it was with a monolith. With a monolith, you would just
make the change to the API, deploy everything at once, and you wouldn't have a problem.
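A minimal sketch of what "support the old API as well" can look like in practice — the payload shapes and the v1/v2 split here are invented purely for illustration:

```scala
// Hypothetical example: v2 of the reward API adds a currency field.
// The service keeps accepting v1 requests while consumers migrate,
// instead of breaking callers that still send the old shape.
final case class RewardRequestV1(customerId: String, amountCents: Long)
final case class RewardRequestV2(customerId: String, amountCents: Long, currency: String)

object RewardApi {
  // New code path works in terms of the v2 shape.
  def award(request: RewardRequestV2): Unit = {
    // ... apply reward rules ...
  }

  // The old endpoint stays alive and adapts v1 payloads to the new shape,
  // assuming a default currency for legacy callers.
  def awardV1(request: RewardRequestV1): Unit =
    award(RewardRequestV2(request.customerId, request.amountCents, currency = "USD"))
}
```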
So this is one of the things where when you make the move from monoliths to microservices,
you're going to get a lot of benefits.
You're going to get some disadvantages as well. And so it's a matter of figuring out for your particular use case, do the advantages outweigh the disadvantages?
I think when you're starting to talk about things like scalability, resiliency, things like that, if you've got a system that has to deal with, you know, millions of users or terabytes of data all the time, then we start to get into the situation where, yes,
we probably do want to make that sacrifice. On the other hand, if you've got a system that's
dealing with like, you know, 15 users an hour or something like that, and very small amounts of
data, this might be a bit much, you know, you might not need this kind of resiliency and this
kind of scalability. I think that's why sometimes you were mentioning before, like, you know, you might not need this kind of resiliency and this kind of scalability. I think that's why sometimes you were mentioning before, like, you know, people having pushback
when you say something about their monolithic app, but it, because it's probably started small
and delivered a lot of value and then, and then grew and grew and delivered more and more value,
you know, and along the way they're always taking these little steps to make it better.
Yeah, I think so. You know, I think in a lot of cases you get people who jump in and they're a startup initially,
right? And when you're a startup, you don't have any users. But then over time that user base grows.
You know, maybe you get a few hundred users, a few thousand users, up to a few million users,
whatever. At some point your application starts to break down, because you didn't build it under the understanding that you would have, you know,
that number of users. So, you know, I think that does happen. And I think that is one of the things
where you start to get pushback: when, you know, a startup is first jumping in and they've got, you know,
no users or a very small number of users, doing a lot of this kind of stuff might be really
expensive and really time consuming and not worth it, to be perfectly honest.
Now, the thing that they have to consider, obviously, is, OK, so we don't have any users
right now, but where do we want to be in a year?
You know, if our goal in a year is to be at, you know, 10,000 users or whatever it is,
are we going to be able to support that given the infrastructure that we've built?
If our goal is to be at 10 million users, are we going to be able to support that given the
infrastructure that we built? And so, I mean, obviously, I guess everybody wants to be at 10 million users, but, you know, being realistic about it, is that a likely scenario?
And so it's about figuring out again, how much is worthwhile right now? Is it worth going through
all the effort right now so that we can be prepared in a year for when we get to the scale that we
want to be at? And the tricky thing, I think, in that startup mode is maybe you don't really know
what the future is going to hold because you need to get input from the customers.
So this reward example, you may have an idea that a reward system is a good idea,
but you may actually build it and nobody uses it and then want to remove it.
So I think that's why sometimes, you know, oh, we'll just
add it to the existing code base because we don't
know if it's a thing yet. Like this
is just a proof of concept to see if
users engage with this feature.
Yeah, and I think that's okay,
depending on how you do it.
So again, if you're in the situation
where you've got, you know, this big ball of mud,
you know, style architecture,
then I think at that point, you really have to be more careful.
You shouldn't just make the ball of mud bigger.
That's not to say necessarily that you can't add the new functionality into your existing
monolith, maybe just to save on deployment hassles and things like that.
But what you should do in that case, if you're going to add it to that existing monolith,
you should add it to the monolith in an isolated way. And so that means, you know, kind of talking
along the lines of the service-oriented architecture style of monolith, where you create
the reward system inside your monolith, but you provide an API
and every other part of the monolith that needs access to the data goes through that
API.
They don't go directly to the database.
So then your reward system has its own isolated section of the database that it's fully in
control of, and nobody gets to talk to that database.
They just go through the API.
What you've done now, though, is you've put yourself in a position where if it turns out this reward thing does turn into a big deal, now what you can do is you can say, well,
we've already got the tables and everything isolated.
Nobody's accessing those tables except through the API.
The API is already defined.
It's clear.
It's consistent, whatever.
Let's just pull that API out into a microservice.
And now we can do that.
We can pull it out into a microservice
without a whole lot of hassle.
And now we can start playing with the scaling options
that we've talked about already.
And so that gives you the flexibility to do that.
The key is, again, don't make the existing problem worse.
Always look for ways to make it better than it was before.
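A sketch of that "isolated inside the monolith" shape — still one deployable unit, but the reward feature sits behind a narrow API and owns its own tables, so it can later be pulled out into a service. The names are illustrative, not from any real codebase:

```scala
// Only this module touches the reward tables; the rest of the monolith
// depends on the trait, never on the database, which keeps the feature
// easy to extract into a microservice later.
trait RewardModule {
  def recordPurchase(customerId: String, amountCents: Long): Unit
  def pointsFor(customerId: String): Long
}

// Hypothetical implementation owning an isolated set of tables or schema.
class JdbcRewardModule(dataSource: javax.sql.DataSource) extends RewardModule {
  def recordPurchase(customerId: String, amountCents: Long): Unit = {
    // INSERT into reward-owned tables only, via dataSource.
  }
  def pointsFor(customerId: String): Long = {
    // SELECT from reward-owned tables only; 0 is just a placeholder.
    0L
  }
}

// Elsewhere in the monolith: depend on the API, never on the reward tables.
class CheckoutFlow(rewards: RewardModule) {
  def complete(customerId: String, amountCents: Long): Unit = {
    // ... existing order logic ...
    rewards.recordPurchase(customerId, amountCents)
  }
}
```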
And I think that gives a great segue to your opinions on this hexagonal architecture.
So if we're building a single app, how do we build it in such a way that the dependencies are not tangled?
Yeah, so hexagonal architecture, I think, is a really interesting thing,
something that I use heavily when I build my own applications.
And what hexagonal architecture does is it sort of divides your application along clear boundaries.
And so you have kind of at the center of the application, you have your domain.
And your domain is like basically your business logic. It's all the things that are critical
to the operation of your business, the rules that are associated with that business,
the decisions that you have to make, all of that kind of stuff falls into your domain.
At the outside, the very outside edge of that, the very outside edge of your system,
you have all the infrastructure you need to make the system work.
And so that's, you know, things like your database,
your user interface, you know,
if you're using any kind of messaging platforms,
you know, your messaging platforms will be out there.
You know, so any of the technology that enables you
to make your application work,
those kind of fall into the infrastructure category.
And what hexagonal architecture does for me
is it allows me to make very clear distinctions
between what is domain and what is infrastructure.
And so essentially what you do is you say,
okay, within the domain,
I'm not allowed to have any dependencies on infrastructure.
So my domain doesn't know what kind of database I use. It doesn't know I'm using SQL. It doesn't know I'm using Cassandra.
It doesn't know whether I have a REST API or a user interface based on a website or something
like that. It doesn't know those things. Those are all infrastructure. All it knows is things like,
you know, when I get a request to reward a customer because they purchased something,
this is how many reward points I will give based on, you know, the amount of money that
they spent.
That's a business rule.
So what hexagonal architecture does is by forcing you to say your domain can't depend
on your infrastructure, it forces you to introduce layers of isolation that then
enable you to make interesting decisions later on. So for example, you need stuff out of a database.
I mean, that's going to happen at some point, but you don't need to know what kind of database it
is or what that database looks like. You just know, for example, in the reward system, you need reward points. So you know that there is an API that you can call that gives you reward
points. You build an interface or something in your application that does that. Then you have
an implementation of that within the infrastructure that says, well, this happens to talk to SQL,
or it happens to talk to Cassandra or whatever. Now that you've done that, you've created that separation between the domain,
which just says, I need a way to get reward points,
and the infrastructure that says,
I get reward points out of the database.
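To make that separation concrete, here is a minimal ports-and-adapters sketch (a rough reading of what Wade describes, with hypothetical names): the domain defines the port it needs, and the infrastructure supplies interchangeable adapters for it.

```scala
// Domain side: business logic plus the "port" it depends on.
// Nothing here knows whether storage is SQL, Cassandra, or Mongo.
trait RewardPointsRepository { // port, defined by the domain
  def pointsFor(customerId: String): Long
  def addPoints(customerId: String, points: Long): Unit
}

class RewardDomain(repository: RewardPointsRepository) {
  // Illustrative business rule: one point per whole dollar spent.
  def rewardPurchase(customerId: String, amountCents: Long): Unit =
    repository.addPoints(customerId, amountCents / 100)
}

// Infrastructure side: interchangeable adapters for the same port.
class SqlRewardPointsRepository extends RewardPointsRepository {
  def pointsFor(customerId: String): Long = 0L           // would run SQL here
  def addPoints(customerId: String, points: Long): Unit = ()
}

class MongoRewardPointsRepository extends RewardPointsRepository {
  def pointsFor(customerId: String): Long = 0L           // would call Mongo here
  def addPoints(customerId: String, points: Long): Unit = ()
}

// Swapping the database touches only this wiring, never the domain code.
object Wiring {
  val domain = new RewardDomain(new SqlRewardPointsRepository)
  // val domain = new RewardDomain(new MongoRewardPointsRepository)
}
```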
Now that you have that separation,
you can start doing interesting things like saying,
okay, well, I realize that the database representation
that I used here was actually very inefficient,
so I'm going to rewrite that database representation. None of my domain code changes
because your domain code is still just getting reward points. You're only changing that
infrastructure layer. And so that allows for a lot of flexibility. I've done systems that use
hexagonal architecture where, for example, I have changed the underlying table structure
of something in order to make it more efficient by basically just rewriting one class.
And that's that class that's accessing the database, what we would call a repository
in domain-driven design terms. So I've changed the implementation of the repository. The domain
code stayed exactly the same. I've also changed to a totally separate
database. So I've gone, for example, from MongoDB to a SQL database. And again, the domain code
didn't have to change. Nobody using that service had to know that that change was made. No other
services had to change because everything's isolated in state. I've gone further than that, though, because
on the flip side of that, if your infrastructure layer says, you know, I am operating through a
REST API and I'm making calls into that domain, that domain presents sort of a clear API that
says this is how you talk to that domain. Now what I can do is I can do things like say, okay,
well, originally I had a REST API and it made these calls into this domain, but now I don't want a
REST API. Now I want an event-driven system. Well, it just makes the same calls into the same domain.
So you can add additional endpoints, maybe a REST endpoint and then an event-based endpoint,
and then maybe later on a user interface based endpoint, they're all talking to
the same thing. They're all talking to the same domain. And so you can make those kinds of changes.
You can potentially do things like rewrite the entire domain. And as long as that interface that
you've provided, as long as that API to the domain remains consistent, you don't have to change
anything on the infrastructure level. So there's lots of flexibility that comes when you do this properly.
Yeah, it's very interesting. And it seems to have a lot of principles that are
great for keeping the dependencies from being too
coupled to each other. One thing I didn't understand about it is, I don't understand
why it's a hexagon. Like I saw a picture of it. There's a hexagon in
the middle. It says domain, but at the six sides, I don't really understand where the six sides come
from. Yeah. You know, to be honest, I'm not sure either. When I first started learning hexagonal
architecture, I was introduced to it with three different names. So I was introduced to it with the name hexagonal architecture,
which I found very confusing for the same kind of reason that you expressed.
Why is it a hexagon?
I was also introduced to the concept of ports and adapters, which is another name for it.
And then I was introduced to it as onion architecture as well.
In some ways, I think Onion architecture represents my understanding of it
better, which is you have these different layers you have. So the inner layer is the domain.
Outside of that, you have what you would call the API layer. And then outside of that, you have the
infrastructure layer. And the dependencies in these layers go from the outside in.
So infrastructure depends on API.
API depends on domain, but never the other way around.
And I think logically, in my head, that makes sense.
I'm not really sure why the original hexagon.
I'll figure it out and I'll put a link somewhere.
But yeah, I think what you said makes sense, where it makes it easy to put in different
implementations, which could be various sides perhaps.
Yep. Yep.
I saw here on your Twitter, it says that you're a science fiction author.
I do. I mean, I would say I'm a wannabe author to some extent, of a little bit of science fiction, fantasy.
So, yeah, I do a bit of writing on the side, nothing published, but I've written one novel,
which I'm kind of in the final stages of polishing up before I maybe start farming it out to publishers.
But, you know, working on other projects
here and there.
That's awesome. Who's your favorite author right now?
Favorite author right now is Brandon Sanderson, definitely. He's written a number
of books. I think my favorite by him is the Mistborn trilogy, which is absolutely
a fantastic series of books, which I would highly recommend to anybody if you're interested in fantasy at all.
I've never heard of it.
I read a lot of science fiction, but not fantasy as much.
I'll check it out, though.
Brandon Sanderson dabbles in a little bit of science fiction.
He's primarily a fantasy author, but he's had some short stories and things like that
that are more science fiction oriented, I think.
I would say I probably read more fantasy,
but I do read a little bit of science fiction
here and there as well.
I think my favorite science fiction book,
actually, that could be a tough one.
It probably is between Dune, Frank Herbert's Dune,
and Orson Scott Card's Ender's Game would be kind of my top ones.
Oh, yeah, those are both great books. At some point, I read all the Frank Herbert books, and I loved them. They're great. So much detail in his world that he created.
Yeah, to me, Dune is kind of the science fiction equivalent of The Lord of the Rings, you know, that intense world building.
Yeah, that's definitely true.
Well, before we wrap up our talk, I wanted to say that
your course that you're building is really great. I went through quite a bit of
it and I like the structure. I love watching tech talks, but the thing I liked about
the structure you have is there's a talk portion, there's questions, there's answers. It makes it a
little bit more engaged than just watching like a several hour talk. I thought it was great.
Yeah, I think that was one of the things that we
really focused on when we were building the course was a couple of things. One is
everybody learns differently. You know, so some people learn by watching, some people learn by
listening, some people learn by reading, some people learn by answering questions and things
like that. So we wanted to sort of hit as many of those different learning approaches as possible
with the course. But the other thing too, is I didn't want the course to be something where you can just sort
of like put it on in the background and tune out and not really pay any attention to that.
I do that all the time. I'll start listening to something and I sort of wander off and
don't pay attention to it. I wanted this to be something where you come out the other end
and you have actually absorbed the information. And so that sort of necessitated
the introduction
of the questions. We also try to find ways to use the questions as a bit of a learning
experience as well.
So the thing I liked was it takes a case study
approach somewhat with this reactive barbecue.
Yep.
It just made me want barbecue, to be honest.
You know, I actually, so we do in-person training of this same course. It's not
exactly the same, but we have an in-person version of it. The exercises are all very different,
much more interactive, obviously. But one of the things that I did during one of the teaches,
I think about a
year ago, is I spoke to the organizers of the conference where I was teaching the course,
it was the Reactive Summit. And I spoke to the organizers and specifically said, hey, can we
organize some sort of barbecue meal, you know, during the course at some point, particularly
because we were teaching in Texas.
So it was sort of like, okay, we're teaching the reactive barbecue in Texas. I mean, come on,
you have to have barbecue at some point. So they came through though. And we indeed actually had
a nice barbecue meal the one day. So it was really nice for that. It would have been funny if actually
the barbecue ordering site went down during the process because it fits right into your case study.
Yeah, yeah, absolutely.
Well, maybe not funny when you're hungry.
All right.
Wade, thank you so much for your time.
It's been a lot of fun.
Yeah, no, it's been great.
And, you know, I mean, you mentioned the course.
I think at this point we have three pieces of the course out,
but we've got another bunch coming.
So, you know, keep your eyes out, I guess, for the rest.
Yeah, and actually, let's just touch on that.
So what are the three courses you have so far?
So the three courses are basically,
the first one is kind of an introduction to reactive architecture.
The second one is domain-driven design.
And then the third one is all about building reactive microservices.
Awesome.
And so that's part of one training path on the IBM Cognitive Class.
So the training path is the Lightbend Reactive Architecture Foundations.
So we're going to be launching another training path shortly,
which will include another three courses.
That's great. I'll put a link in the show notes for this episode.
Yeah, great. That would be awesome.
well that is the show i would like to thank everyone who helped share the last episode with Philip Wadler.
It got some great attention on Reddit, r/programming,
where there were lots of interesting comments and critiques.
If you made it this far, you must have enjoyed the show.
So tell a friend about it, mention it online somewhere, whatever you can do.
It helps grow the show.
Talk to you next time.