PurePerformance - PurePerformance Guest Host Series 01: Alois Reitbauer presents From Monolith to Microservices at Prep Sportswear
Episode Date: August 8, 2016
Alois Reitbauer (@AloisReitbauer) guest hosts - Mike Jones ( http://bit.ly/mjlnk ) takes us on a journey of how the team moved a monolithic application that was built by a remote team to a microservice architecture. Learn how they manage a couple of million lines of code with only 5 people while improving performance and availability. Mike also shares lessons learned on their journey and strategies for making the transition to microservices while having to keep the lights on for day-to-day business.
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody. We have a special episode of Pure Performance today.
My name is Brian Wilson, and I do not have Andy Grabner with me today.
I don't even have myself with me today because my colleague,
Alois Reitbauer of Dynatrace, he was so excited about the idea of these Pure Performance podcasts that he went out and bought a microphone of his
own and is going to be a guest host from time to time whenever he meets up with somebody
who he feels could offer some great insight as well. So today is the first episode with Alois
Reitbauer guest hosting. His guest today is Mike Jones, director of technology for Prep Sportswear.
I think this was a very interesting episode, actually, because we get to get a lot of insight behind what's behind a site like Prep Sportswear.
Do yourself a favor and go check out the website so you get an idea of what the front end is, because behind the scenes, there's quite a lot more going on. You know, they're handling everything from not just coding for the website, but IoT type of coding, Internet of Things. So they have to do coding for the embroidering machines, laser cutters, all sorts of aspects of not just the
order fulfillment and website, but also the manufacturing of those components.
So it's really quite a different take on the standard kind of e-commerce view
or set of responsibilities that most people have.
Also in this episode, you'll hear Mike talking about their conversion
from a monolithic code to microservices and how to get started along that journey,
restructuring your team so that you can start meeting some of those business challenges.
And they also talk about, you know, their APM transformation, you know, how they left
traditional monitoring behind and went with more robust APM style monitoring. And he also talks
about just some of the tools in general that they're using to help build that team and also
to get their work done in general. So I found this one very interesting. I hope you enjoy.
Alois is on Twitter at A-L-O-I-S-R-E-I-T-B-A-U-E-R.
You can follow him there,
and I'm sure you'll be hearing some more episodes from him in the future.
In the meantime, let me stop delaying and pass it off to Alois.
Hey, this is Alois Reitbauer from Dynatrace.
I'm here today with Mike Jones from Prep Sportswear, and we talk about their journey from monoliths to microservices.
Hello, Mike. Hello. So, Mike, tell us a bit about yourself and what Prep Sportswear does.
Sure. So, I've been with Prep Sportswear for about two and a half years now.
I'm a bit of a veteran.
I've been 20 years in the e-commerce space,
and I've worked with some brands that folks have heard of.
I think Classmates.com, eHow.com.
But I've been here with Prep, and Prep's a great opportunity.
There's a few things that brought me here. It's a custom print company, and we do our own manufacturing. It's a U.S.
manufacturer, and we carry designs for just about every middle school, high school, and
grade school in the country. So you can go find your kid's school and buy a shirt with their
mascot on it and have your kid's name on the back. Go to their sports games, that kind
of thing. So we've got a fun product. And the technology is we, of course, we support
the website, but we also support the manufacturing. So I get to work with the website one day, and then another day I can work with an embroidery
machine or a laser cutter.
Yeah, I saw that machine just over there.
You're like, we're the Internet of Things.
You guys can't see it, but they have this really big stitching machine there that they're
playing around with.
So it's really amazing.
Yeah, so it's fun.
It's a good variety of technology.
And, you know, it's here in the Seattle area, and we've got an amazing view of the water.
Guys, they really have an amazing view of the water.
It's like sixth floor overlooking the Seattle Harbor.
It's amazing here.
I really wonder how you guys actually get work done over here.
It's this amazing view, honestly.
You can see we're on the side that faces the city.
That's true.
The development team is actually looking the other direction.
That's true.
Okay.
So, yeah, we wanted to talk a bit about your story of moving from a monolithic application
to the microservices approach.
Sure.
Let's start at the beginning.
I think that's a good point to start at. So how did you get involved in this whole microservices journey?
Yeah, when I was brought on, we certainly had some platform issues. We had some scaling issues
around our peak period. So, you know, like a lot of retailers, Black Friday, Cyber Monday are big days.
And the Cyber Monday before I came on, I think it took something like 22 hours to process all the orders to be able to start fulfilling those things,
and that's forever.
We also had one of the lead engineers who was on call, essentially. Our production facility
is in Louisville, Kentucky.
They start at 6 o'clock in the morning, which means if something goes wrong, they're calling
us at 3 o'clock in the morning Pacific time.
And this poor gentleman, he got a call 17 days in a row at 3 o'clock in the morning
saying,
hey, there's something wrong with the platform.
Oh, it's good that you guys have Starbucks over here.
Yeah, so obviously something had to be done.
And we looked at doing a number of things, including introducing a service layer microservice tier.
So the 22 hours, maybe to put this a bit into context, what's your goal?
So how quickly do you want to deliver something if you order it on a website?
Sure.
Yeah.
So we're a custom print company that sells sportswear.
But what that means is we don't have pre-printed product on the shelf, right?
So every order that we have has to be, we take the shirt down,
we either put it through an embroidery machine,
or we apply the print onto it.
So everything's custom.
And what we try and do is ship within four days.
So if we have a delay of just getting the order ready, to get the design and
the product ordered, then, you know, we eat into that schedule.
And we're also talking, this is the run-up to Christmas.
So people, if they don't get their stuff, Santa's not happy,
the kid's not happy.
So yeah, time is everything.
and the
service tier that we're putting in and some of the changes
we're doing in our business process is really
letting us shave days
off of our production schedule
so since we've been
changing the technology we went from averaging
five or six
days in production to getting closer to two or three. And we're really looking towards the day
that we can do next-day shipping, same-day fulfillment.
That's great. And for, like, the people listening to the podcast, how big is your code base?
Yeah, so that's another fun thing.
When I'm talking about all the different pieces
of technology we support,
almost all of it was written from the ground up.
We don't use a lot of
third-party libraries.
So in order to support e-commerce, ERP, device integration, design manipulation,
all of that stuff together is about 4.5 million lines of code.
Yeah, and you probably have an army of developers working on it, right?
Yeah, right now I've got five developers.
You must be kidding me, right?
No, that is it.
So how many lines of code is this per developer? Like 800,000? Yeah, if math serves me correctly, something like that.
So that was quite a challenge.
I mean, just keeping this code up and running must be quite a challenge.
Yeah, I think one of the great stories that we have here at Prep
is going from that time where the you know, the lead developer is
answering his phone every day at three o'clock in the morning. We've really been able to,
you know, get the platform a lot more manageable, a lot more stable. And the phone just doesn't
ring like that anymore. This last peak period, we had one issue at 3 o'clock in the morning for the whole month.
So, you know, using tools like Dynatrace, you know, spending effort on breaking up the platform and, you know, just making it manageable, you know, has really paid off for us.
Yeah, so I can imagine, like you talked about here,
we have 4 million lines of code.
And you wanted to do microservices, totally understandable.
You said you wanted to make your platform more stable.
You want to have your order fulfillment faster.
If you look at 4 million lines of code,
so how do you really figure out, so where do you start?
Like there's a lot of places where you could start.
Yeah, and I think, talking about the fact that we really had some performance issues,
the stuff that was kind of on fire, I mean, that's where we started, because we had to
shore up those services. So the initial part was pretty easy.
We've got to do something about this.
Some of the other pieces that we've done, and we're not done, by the way.
This is certainly a work in progress.
We started with what's on fire, and then we took a look at some of the things that are pretty easy. They don't have entanglements on a lot of the rest of the system.
And then we're building on from there.
And what did you pick for your first microservices project?
Was it something already in the platform or some additional new stuff you wanted to add to it?
Yeah, I think the first real standalone service was a new feature. We had a... So our business model for a long time was a just-in-time model.
We didn't hold any inventory at all. So the minute somebody ordered something that night,
we'd go order from various suppliers and get it in the ship. For a lot of reasons, we decided
to start holding our own inventory. So building that system was the really first standalone system.
And how did the project work out for you?
What were the key successes, the key challenges that you saw in this first microservices project that you guys were doing?
Yeah.
I mean, the challenges, technically the challenges became really just the touch points with the rest of the monolith.
So getting that first feature set out, where we're tracking the units and we're inputting them in. A gotcha case that I came across was, you know, we're tracking inventory against orders correctly, which is great.
We tested that.
But then we had another part of the system that was going through, and when there was a problem with the order, it would cancel the original order
and it would create a replacement order.
And it would do it, of course,
outside of the scope of this new service.
So we had all this untracked inventory
that was being generated
because the other part of the system
was changing the state of the order.
So finding those things and fixing that,
tearing it out of the state machine, those are the challenges
that we had.
So what you're actually saying is you can't just look at
your architecture and say, like, this is
logically a service, this is logically a
service, this is a service, and then take
the service and kind of
rip it out of the heart of this
monolith. It's really about
looking at your business processes end-to-end
first and seeing how this plays together, because otherwise
you end up changing
one piece of your system. One
business process knows about the new service.
The other one does not know about
it, and then you end up with a great
microservice architecture. It's just no longer
doing what the business needs to do.
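The inventory gotcha described above, where a cancel-and-replace path changed order state outside the new service and left units untracked, can be sketched roughly as follows. This is a minimal illustration in Python rather than the team's C# code, and every name in it (`InventoryService`, `reserve`, `release`, `replace_order`, the SKU and order IDs) is hypothetical. The idea is only that every state change touching stock goes through one service.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InventoryService:
    """Hypothetical inventory service: the only place stock reservations change."""
    on_hand: Dict[str, int]
    # order_id -> {sku: quantity held for that order}
    reserved: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def reserve(self, order_id: str, sku: str, qty: int) -> None:
        if self.on_hand.get(sku, 0) < qty:
            raise ValueError(f"not enough stock for {sku}")
        self.on_hand[sku] -= qty
        items = self.reserved.setdefault(order_id, {})
        items[sku] = items.get(sku, 0) + qty

    def release(self, order_id: str) -> None:
        # Return every unit held for the order back to stock.
        for sku, qty in self.reserved.pop(order_id, {}).items():
            self.on_hand[sku] += qty

    def replace_order(self, old_order_id: str, new_order_id: str) -> None:
        # Cancel-and-replace goes through the service, so units stay tracked.
        items = dict(self.reserved.get(old_order_id, {}))
        self.release(old_order_id)
        for sku, qty in items.items():
            self.reserve(new_order_id, sku, qty)


inventory = InventoryService(on_hand={"hoodie-L": 10})
inventory.reserve("order-1", "hoodie-L", 2)
inventory.replace_order("order-1", "order-1R")
```

If the replacement path bypassed `replace_order` and created the new order directly, the two units would either stay reserved against a cancelled order or never be reserved for the new one, which is the kind of drift described in the conversation.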
Well, and for us in particular, it's challenging because we had a ton, a ton of turnover.
So this four and a half million lines of code is really built by the original software team
over eight years.
And basically almost all those folks left within a few months of each other.
So we've got this system, we're, you know, we're all new to the system. The amount of documentation
was fairly poor. And some of the stuff that was in there was in Russian. It was outsourced. So, you know, not only do we have a challenge, you know,
trying to meet with this architecture,
the architecture isn't well documented or understood.
So, yeah, that's a challenge.
I think it's also, in your case, like an architectural challenge as well.
Like you said, okay, it's four million lines of code.
It's not documented.
It's in Russian, partly at least.
It often breaks in production or it frequently breaks in production.
You bring on a team of new guys and tell them, okay, now we're refactoring this into microservices.
Yeah.
I know how to sell a job.
Yes.
Sign up here.
So how do you get the team motivated or keep the face up so that they're not afraid,
oh, if I touch something, I break even more, like this is like way over my head.
Yeah.
So how do you work with the team and what's like the framework that you're providing to them that they're happy?
Yeah, and that's really, I think, one of the benefits as we're moving things on to these services
is we've got separately testable, you know, the new stuff has that in mind.
But that's not true with the monolith.
And with the monolith, we've got a little fraction of it wrapped in unit tests. So we end up having a lot
of end-to-end tests. And, you know, they're expensive to maintain, they're expensive to run.
It's, you know, just certainly something that we're getting through. And, you know, it's also
helping everyone see the priority
of moving things to the new platform.
It's just expensive to run this monolith.
And how do you describe this to the business owner?
We met your CEO before.
So did you walk into his office and say,
so listen, we're not going to do anything
for the next six months to one year.
We have this new architecture.
It's called microservices. It's going to be great. You guys just have to hold on for a little bit, like a year or so. Or how do
you... obviously you didn't do that, right?
Yeah, I think he was really cooperative. I mean,
everyone understood when you've got a platform that has that many issues like something has to
change, right? So I think, you know, the business understood,
you know, why aren't we doing the old thing anymore?
You know, why move to microservices in particular?
I think that's, you know, something that I certainly had to sell a little bit
and something that they certainly bought on.
What was the key argument there when you sold it to them?
Especially to business.
I mean, they're not computer guys.
You can't say it's like a clean architecture.
They simply won't care, right?
Yeah.
Yeah, I think looking at the landscape
and seeing what some companies like Amazon did,
where they're building everything as a service,
they're able to expose that and create new markets where they're not really a website.
They're a collection of services.
And seeing how powerful that was for them, I think that makes an argument for us where we can look beyond just being a website
and be able to provide services.
So you have another story. You have, like, a big challenge ahead of you right now, and it has
to do with stitching. I really like that story, so maybe you want to share that microservice
story.
Yeah, okay. So one of the stories around the challenge with creating microservices out of what we have,
and the reason to do it, is our design service. We have a couple of different techniques
for creating our designs: there's embroidery, there's ink, there's appliqué. So for embroidery designs, they track how many stitches are required to create that design.
Well, lo and behold, the monolith, with how tightly integrated it is with everything,
if we want to price out a shirt, it actually calls the design service to get the stitch count
to figure out if we need to charge an extra dollar for the design.
So just the act of putting something in a cart that calls the pricing engine, which calls
the design service multiple times to find the price, shows how entangled things are and how
hard it is to separate the logic based on the existing implementation.
So needless to say, we shouldn't have to count stitches to price a shirt.
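One way to picture untangling pricing from the design service is to have the design side publish the one fact pricing needs (whether the design carries a surcharge) so the pricing engine never has to ask for a stitch count. The sketch below is in Python rather than the team's C#, and the threshold, the surcharge constant, and all function and class names are assumptions made up for illustration. The conversation only says that stitch count decides whether an extra dollar is charged.

```python
from dataclasses import dataclass

# Assumption: stitch count above which a design costs an extra dollar.
STITCH_SURCHARGE_THRESHOLD = 10_000
STITCH_SURCHARGE_CENTS = 100


@dataclass(frozen=True)
class DesignPricingInfo:
    """Hypothetical record published by the design service for pricing to read."""
    design_id: str
    surcharge_cents: int


def publish_pricing_info(design_id: str, stitch_count: int) -> DesignPricingInfo:
    # Run once when a design is created or changed, not on every cart action.
    over = stitch_count > STITCH_SURCHARGE_THRESHOLD
    return DesignPricingInfo(design_id, STITCH_SURCHARGE_CENTS if over else 0)


def price_shirt(base_price_cents: int, design: DesignPricingInfo) -> int:
    # Pricing reads the precomputed surcharge; it never counts stitches.
    return base_price_cents + design.surcharge_cents


simple = publish_pricing_info("mascot-small", stitch_count=4_000)
dense = publish_pricing_info("mascot-large", stitch_count=15_000)
```

With this shape, adding an item to the cart costs one lookup of already-computed data instead of repeated calls into the design service.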
Okay, that's something you guys are working on.
It kind of really shows that, again, it's kind of tightly related to your business.
First of all, it's about pricing that you have to figure out as a customer, I don't want to wait.
I want to check out and buy the stuff, so please tell me the price of it.
So what I really like about your story is everything is business driven.
People always talk about microservices that are driven by business and they like this
great architectural thing.
In your case, I think it's really perfect, the inventory service that you were talking
about. You've had a story
about payment providers as well that
kind of ties nicely into microservices
as well.
You have flexibility there, right?
Yeah.
I think some of the
practices with
delivering these services, and we can show
the business user,
hey, here's the status of the service, and here's the number of issues we've had with it.
So with payment providers, we're able to really easily expose who's responding quickly,
who's up and down, which providers have more fraud scores, that kind of thing,
surface that quickly via just a single service.
Whereas before, that data wasn't exposed anywhere, and it was just part of the checkout process.
So, you know, we get to surface properties of the service to, you know,
stakeholders really quickly. And they love that.
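As a rough picture of what surfacing payment provider properties from a single service might look like, the sketch below aggregates per-provider latency and success rate from recorded calls. It is written in Python for brevity, not taken from Prep Sportswear's system. The `PaymentCall` record, the `summarize` function, and the provider names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class PaymentCall:
    """Hypothetical record of one call to a payment provider."""
    provider: str
    latency_ms: float
    succeeded: bool


def summarize(calls: List[PaymentCall]) -> Dict[str, Dict[str, float]]:
    # Group calls by provider, then compute the figures a stakeholder would read.
    by_provider: Dict[str, List[PaymentCall]] = {}
    for call in calls:
        by_provider.setdefault(call.provider, []).append(call)
    return {
        provider: {
            "avg_latency_ms": mean(c.latency_ms for c in group),
            "success_rate": sum(c.succeeded for c in group) / len(group),
        }
        for provider, group in by_provider.items()
    }


calls = [
    PaymentCall("provider-a", 120.0, True),
    PaymentCall("provider-a", 180.0, True),
    PaymentCall("provider-b", 900.0, False),
    PaymentCall("provider-b", 300.0, True),
]
report = summarize(calls)
```

Because all provider traffic passes through one service, a summary like this has a single natural home, whereas in the monolith the same data was buried inside the checkout process.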
I want to, like, leap over to a bit of a different topic. You said you could have
replaced the whole team or you rebuilt the whole team.
Sure.
To put it that way. So how did you start with the team?
Yeah.
Building up the team. What were your priorities? Which people did, or which roles were, like, super critical for you to bring aboard first?
And how did you go about that?
Yeah, so this has always been a small team.
We, even with the offshore team, when I came on, there's 20 people in technology.
Of that 20, there was one dedicated operations person.
So almost everybody was in some kind of development capacity.
And we had operations challenges.
So one of the first things I did was hire a DBA and a DevOps engineer
so we could have folks dedicated to solving performance issues and getting
that off of the developer's plate.
The interesting thing about that is I actually had the developers challenge me on that.
They were like, well, no, we need more developers.
We've got to spend time fixing this.
And instead of having folks, you know,
take that off their plate. So I did it anyway. And it really proved pretty massive for us. I think,
you know, we were able to go through that big change in staff with having some key folks. And we ended up having faster services and less issues along the way,
which is pretty amazing.
How many people do you have in operations right now?
What, at DevOps role, most likely?
Yeah, so I have one Monday DevOps person.
I've got four full-line operations engineers.
And I assume they work closely together with the developers.
Yeah.
Yeah, we're all co-located.
I've got one IT individual, Chad, over in Louisville on site,
but everyone else is right here.
That's where your production facility is.
Yeah, correct.
So, yeah, we're really tight-knit.
I mean, we're 11 people total, so we have to be tight-knit.
Yeah.
So, I think you guys are also using, like, HipChat and ChatOps a lot to communicate with each other?
Yeah, yeah.
We're definitely posting charts, asking questions,
do a lot of communication there.
What does the rest of your technology and tools look like?
What are you guys using?
We already talked about HipChat, but whatever tooling.
Yeah, so PRTG is our systems monitor.
Of course, we're using Dynatrace.
We've got a couple other tools that are escaping my mind.
Jira, right, for tickets?
Yeah. Just see it on the wall here, guys.
They have Jira tickets on their wall.
Yeah, definitely Jira.
And I think we were using some older antiquated bug tracking system
before. Actually, almost all of our tools are new in the last couple of years. And going with that
is, you know, I changed the team around a little bit with responsibilities, but we also really changed as a company as far as adopting agile methods,
getting the whole company kind of understanding what the roles and responsibilities were,
and then just increasing communication and getting folks aligned on what we were working on.
So just that list of bugs up there on the wall, you know,
reminded me kind of coming back to something you were talking about earlier,
which was, you know, how did we get through all that change?
Well, one of the things we had to do with the business is take a look at that big backlog of stuff
and say, look, we've got to kill a lot of the stuff off.
We can't do, with five engineers and, you know, three ops guys,
we can't do 100 projects and replatform. So let's throw that out and kind of start over. So we started literally with
a new bug tracking system, cleared the decks for projects, and just really started over.
Do you still remember what your issue number one was that you closed?
Have no idea.
Two years ago seemed like a long time.
And for continuous delivery, do you use Jenkins or?
TeamCity.
TeamCity, okay.
Yeah, we're a C-sharp stack, and TeamCity works really well with C-sharp projects.
Yeah, which is also kind of interesting because very often we hear people talking about microservices.
They use a lot of, say, the fancy new technologies.
I'm not saying that your technology stack is not fancy,
but I think you have a kind of, from what you see most out there,
a bit of an uncommon technology stack running on, say, classic .NET, to put it that way.
Yeah.
Yeah, I mean, we definitely looked at, you know, Docker, Node,
or some other technologies that other folks are using more with microservices.
And there's a lot of folks doing great work with that.
But for us at the time, I think with the amount of change we had,
totally changing our technology platform too
was just a little too much to assume at the time.
So we're still able to do really good work
with the technology we have.
C-sharp, it's not horrible.
I'm not saying that it's bad,
because everybody else is using Node.
I think it's cool, because it adds this variety in there.
But you're also innovating on the platform side.
So we were just talking before,
you showed me your new servers that you just bought,
and you guys are also investing in OpenStack, right?
Yes, yes, we are.
We're just starting to roll that out.
Yeah, it's really crazy kind of how much technical debt
we had in a number of areas.
And we had a mix of virtualization platforms.
We had Hyper-V and I'm forgetting.
VMware?
Yeah, VMware.
That's a small company.
It's a simple virtualization.
Yeah.
Yeah, so, I mean, we had a mix of virtualization technologies,
and we had just a mix of OS instances, 2003, 2008, 2012.
So we're able to, you know, instead of babysitting all of those things and kind of, you know,
getting one of those things as a winner, we're just kind of starting over with OpenStack
and building from the ground up, and we're going to be able
to really collapse our
services, the number
of instances down and have a real
performant manageable
system.
I think it also plays
well into a continuous delivery pipeline
because you now can
really provision hardware
and machines as you go.
Before, you had to do it sometimes manually,
you mentioned.
Yeah.
Yeah.
No, absolutely.
You know, being able to spin up new instances, you know, is pretty magic.
Along the journey, did your approach to monitoring or performance management change from where
you started to where you are right now?
Oh, yeah, yeah. I mean, that's obviously going from a place where the platform was having a lot of challenges.
Our monitoring system wasn't effective.
It wasn't helping us find root causes.
It was just that thing that was blinking all the time and sending everyone email.
So taking somewhat immature products, we were using WhatsUp Gold as our system monitoring.
And part of its tool, part of its practice, I think our one ops guy, when someone would come and say, hey, there's a problem, he'd look at two charts.
He looked at application CPU and he looked at edge network graphs.
And if those things looked okay, he just kind of said, oh, I'll go talk to the engineers, you know, which wasn't very helpful. So now bringing in tools like Dynatrace where the operations folks can actually do that first deep level of analysis.
And we can also look at trends, see, hey, look, this is a thing that keeps getting in our way.
Let's go fix that and do it as a team effort being led from operations.
That's really allowed us to keep that phone from ringing at 3 o'clock in the morning.
Which is huge.
Yes.
Especially for the guy who has to answer the phone in the morning.
Another interesting thing that we were talking about before is that
you don't have this typical NOC room anymore.
Correct. You told me you bought a monitor to put it on there to share the data and
you have great dashboards that you showed me that you also built with Dynatrace.
Really combining also real user metrics and business metrics like conversions,
this kind of stuff, but it feels like monitoring is kind of more integrated
into your deeper workflows
that you have with HipChat and so it's more ubiquitous than like, okay, this is the monitoring
piece.
It at least feels to me like this.
Yeah.
Yeah, and I think you got to look over my shoulder a little bit and see some of our
HipChat conversations back and forth.
And, you know, we're looking at performance data
throughout the day and being able to talk about it. And in some
cases it's monitoring: hey, there's an issue. In other cases it's, hey, you know,
this data looks a little different. What's going on here? You know, is that good?
Is it bad? I don't know. Let's take a look.
So it's certainly more of just part of our everyday instead of looking at this board that says what's wrong.
So what are your recommendations?
So I think a lot of people out there,
they want to switch to microservices themselves.
They have similar problems that you guys might have right now.
Yep.
And after listening to your story, they might think, okay, this
is great, I want to do it as well. So what are your recommendations for them to get started,
or what would you kind of give them as a...
Well, I think, you know, certainly if we can do it with this big a code base and this small a team, I guarantee you all can do it.
You know, we didn't have the luxury of being able to dedicate a team that's building out the new microservice infrastructure, which a lot of folks do.
Do you think, by the way, it was an advantage for you to have that small team because you never had that option? I think the commitment from the whole company
had to be there then. So it's, you know, we decided this is what we're going to do. And,
you know, again, we have the backing of the business who said, okay, we're going to take
stuff off the queue. And, you know, so, yeah, I think having the small team made the investment higher,
which made it more interesting to everybody.
I was going to say, what I meant was you didn't have the,
even the opportunity to say, okay, let's keep everything running that we have right now.
We built a separate platform.
Yeah.
So it made your decisions much clearer and, like, kept them in a smaller frame.
Yeah, no, that's true too.
What we first decided to work on was certainly what's getting in our way. And, you
know, that's technical debt, right? It's, you know, how do you solve that, kind of irrespective of whether you're moving to services or not.
It's what is getting in your way and just find a way to take care of it.
And for us, moving things to microservices was that thing.
Any other recommendations you would give to somebody who's just thinking about starting it?
Yeah, I think one of the bigger challenges that we had, you
know, you need to certainly understand, if you've got a monolithic system, you know,
that state machine and what your data looks like there. The services can be
really easy to start. You know, that's the attraction, right? You've got this really clear interface and these are
your known inputs and outputs. It's what happens
to that data underneath, that shared data, how do you handle
those state changes? That for us has been
tricky and I think it is for a lot of folks.
An easy answer or a best practice is, well, segregate that data too.
That's just not possible for us, for the way we're doing the business right now.
So we've got to find some ways to manage that.
Okay, I think that's great advice.
I really like your story.
And it's really impressive if you walk along the office here and you see the small development team and know, okay, imagine these four million lines of code.
They're rebuilding a platform.
They all look really happy, so I assume they didn't get up at 3 a.m. today.
Yeah, nope, not today.
I think it's really a great story.
So, Mike, thanks for taking the time.
Great.
Okay, bye, folks.
Thank you, Alois and Mike.
It was great having you guys on today.
And I really look forward to some more of these from Alois himself.
And you all listeners may recall the last set of podcasts
we just released
was with Richard Dominguez
of Prep Sportswear.
He had a little bit
of a different take
and a different angle
of the aspects
of what they're doing.
So I find the two of these
back to back
to be very interesting
and really helps
fill out a complete picture.
So that's it for today.
If you have any questions
or comments
or you would like to be a guest on the show,
please reach out to us
either via email at
pureperformance@dynatrace.com
or you can reach us
by sending a tweet to
#pureperformance @dynatrace
and we'll be happy to entertain
the idea of you being a guest star on the show and claiming your 15 minutes of fame.
We love having guests, really, though, and we find these conversations fascinating.
So until next episode, thank you all for listening, and we'll see you soon.
Good luck with all of your performance.
Goodbye.