The Changelog: Software Development, Open Source - Node Black Friday at Walmart (Interview)

Starting point is 00:00:00 Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stachowiak. We're a member-supported blog, podcast, and weekly email covering what's fresh and what's new in open source. Check out the blog at thechangelog.com, our past shows at 5by5.tv slash changelog, and subscribe to our weekly email. It's called the Changelog Weekly. We ship it on Saturdays. You don't want to miss it, and you can subscribe at thechangelog.com slash weekly. This show is hosted by myself, Adam Stachowiak, and Andrew Thorp. Now, we recorded

Starting point is 00:00:42 this show in particular before the new year. Didn't have time to publish it before the new year, but at the tail end, you'll hear Andrew mention taking some time off. That's already happened. We missed you. We're back. We're excited. It's 2014.

Starting point is 00:00:56 And this is episode 116, and it's sponsored by DigitalOcean, FreshBooks, and Toptal. We'll tell you a bit more about FreshBooks and Toptal later on in the show. But our good friends over at DigitalOcean have some cool stuff happening. They're nearing their millionth droplet. And to celebrate, they're giving away $10,000 in hosting credit. $10,000 in hosting credit. You heard it right. To a lucky user who hits this milestone.

Starting point is 00:01:24 And there are three ways you can qualify. Number one, you got to be the user who spins up the millionth droplet. So that's number one. Number two, you've got to include your Twitter handle in the droplet's host name. So when you create the droplet, you got to put your Twitter handle in that host name. And number three, you have to tweet to DigitalOcean with the hashtag millionth droplet. For example, I'm going to be the millionth droplet on DigitalOcean. That would qualify you. If you do all three things, you're in it.

Starting point is 00:01:58 So try DigitalOcean today for free using our promo code changelogsentme. That's changelogsentme. That'll get you a $10 hosting credit as well as a chance, I guess, to potentially be the millionth droplet. So good luck to you, but head to digitalocean.com to get started. And now, on to the show. We're joined today by Eran Hammer to talk about hapi, a server framework for Node.js, and Node Black Friday, when Walmart went Node for Black Friday. So Eran, welcome to the show. Why don't you give us an introduction of who you are and what you do. Hey, glad to be here. I am the Node lead architect at Walmart. I'm part of the mobile group, and my team is basically focused on moving the

Starting point is 00:02:47 existing mobile services infrastructure from some legacy Java stuff to Node, and we basically drive all the API for the mobile clients. So what did you do before working at Walmart? Immediately before, I was about three years at Yahoo, focusing mostly on standards and focusing on interop and open web. I was one of the founders of the Open Web Foundation and did a lot of IPR work in terms of CLA and agreement there. And before that, I spent about 10 years on Wall Street building high-frequency trading systems. And, yeah, that kind of covers the last 15 years. So you've definitely been deep into the business side of things.

Starting point is 00:03:49 I've done a whole bunch of different things. My philosophy in life is that life is all about collecting experiences. So I tend to get bored with things and just switch to completely unrelated fields. So going from finance to just consumer web to retail. Let's talk a little bit about the retail. So what was behind the decision for Walmart to go to Node, and what was that process like? It wasn't a very intense process, to be honest. Basically, two years ago, Ben and Dion joined Walmart Mobile, and they were looking for ways to kind of move it to the 21st century from some really old stacks on Java it was using. And what was clear is that we're not going to be rewriting all the backend services, but we are going to be building

Starting point is 00:04:43 a new orchestration layer that's going to talk to a whole bunch of new and legacy systems, some of them using AS400 and offering you awesome SOAP APIs and others a little bit more modern with XML stuff. And so we don't want to implement any of that on the mobile clients. And what you want to do, you want to build an orchestration layer that kind of abstracts all the crap in the back or the good stuff in the back and then provides a uniform API to the mobile clients. So we were looking at different technologies and we just felt that Node was the right choice. That an orchestration layer that is mostly doing network, it's basically a glorified proxy with some data manipulation or data transformation.

Starting point is 00:05:34 But it's not, you know, no calculation. You're not pricing anything. You're not managing a complicated state. That's all done by the upstream account management and those. So Node looked like a good choice. And so we just went ahead and made a big bet that it's going to work out. So obviously Walmart's one of the biggest companies in the world. And in my experience with larger companies,

Starting point is 00:05:59 it's a lot harder to move a big ship than a small boat. And so what kind of pain inside of Walmart, if any, did you experience when you're presenting this, you know, this newer emerging technology as an alternative to, like, a reliable stack that's been around for a while? So there's still resistance coming from other teams. Within mobile, it was never an issue because mobile started out as a labs-like environment where our mandate was to experiment and try new things and use whatever technology we want. We were already introducing new things, whether it's iOS apps or Android apps, to the existing IT stack that was used there. So that wasn't a big deal.

Starting point is 00:06:47 But then going to the rest of the organization, when we went to the IT folks and the data centers to try to get some machines we can run it on, one of the first bumps we hit is that the version of Solaris that Walmart was running at the time could not support Node. We couldn't compile Node on that operating system. And so it took some time for us to convince enough people to get us some Linux boxes or SmartOS boxes that we can actually run stuff on. So it was more once you start interacting with the rest of the IT organization. And it wasn't much pushback. It's just we were asking them to do new things that they have never done before.

Starting point is 00:07:28 And those things take a lot of time. And if you think about it, Walmart runs all, I think it's like 17 countries now. So they're running all their operations all from the same set of data centers. So you're talking about the retail stores and the online all coming from pretty much the same spot. So you can imagine the change control in those data centers is quite insane. And for a good reason. You're talking about if you take down those data centers in the U.S.,

Starting point is 00:08:00 you're disrupting food supply for about 40% of the country. So the scale and the size and the magnitude of any change you're making is significant. So that's still the issue, but it's been very manageable. Yeah. So for Walmart Mobile, I guess, and I'm a little, I wouldn't say I'm fuzzy, I think I've made some assumptions, but you guys, did you launch the Node client for Black Friday? Or when did that launch actually happen? We deployed our, so we are working with a proxy strategy,

Starting point is 00:08:35 where basically the idea is to stick Node as a dumb proxy between the mobile clients and the existing services. And then slowly, based on business priorities and other requirements, start to hijack endpoints at the proxy and implement them in Node. So we've started doing that. But we're still proxying a large amount of the traffic through Node to the upstream services. So we first rolled this out in April. And we kind of ramped up to 100% of all

Starting point is 00:09:07 mobile traffic around June. And we've been running with all mobile traffic going through Node since June. The problem was that we suffered from a pretty awful memory leak that caused us basically to have to restart the services all the time. And so we were never sure up until the day of Black Friday that our system is actually ready for that capacity. And we had mitigation. We had other plans. We had failovers. So it wasn't like, oh, if this doesn't work, we don't have mobile services for Black Friday.
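A rough sketch of the proxy strategy described above, where Node starts as a pass-through and endpoints are taken over one at a time. Every name here (`nativeHandlers`, `proxyToLegacy`, `dispatch`, the `/cart` path) is hypothetical and not taken from Walmart's code, and a real proxy would forward the HTTP request rather than return a string.

```typescript
// Hypothetical illustration only: strings stand in for real HTTP responses.
type EndpointHandler = (path: string) => string;

// Endpoints that have been reimplemented in Node so far.
const nativeHandlers = new Map<string, EndpointHandler>();

// Default behaviour: pass the request through to the legacy upstream service.
function proxyToLegacy(path: string): string {
  return `proxied:${path}`;
}

// Use the native implementation when one exists, otherwise proxy.
function dispatch(path: string): string {
  const handler = nativeHandlers.get(path);
  return handler ? handler(path) : proxyToLegacy(path);
}

// On day one the map is empty and everything is proxied.
// Later, one endpoint at a time is "hijacked" by registering a handler.
nativeHandlers.set("/cart", (path) => `native:${path}`);
```

With this shape, taking over an endpoint is a one-line registration, and anything not yet migrated keeps flowing to the old services.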

Starting point is 00:09:47 That clearly is not going to be acceptable. Right. But we really didn't know up until the day of how well this is going to perform. So how well did it perform? Oh, it was the most boring thing ever. You had a tweet. I can't remember exactly, but it was something along those lines. The servers were bored.

Starting point is 00:10:05 Yeah, the servers were bored out of their minds, is what you said. That was pretty intense. And then you were also doing a lot of live tweeting at that time too, like keeping a lot of nerds on their toes, I'm sure, just kind of like watching your progress. And I know I was shopping and watching at the same time. I just was watching the, you know, NodeBF was the hashtag on Twitter. We linked out to it, and we'll put this in the show notes too

Starting point is 00:10:29 so y'all can catch up, those listening. But you said that the servers were bored. So what was that like? I mean, the servers were doing nothing. They averaged about 0.75% CPU. That's not 75%. That's 0.75%. And by the way, we had a bug in our monitoring system for a while

Starting point is 00:10:50 where we thought that the range was 0 to 1. And we were really worried for a while because we were constantly hitting 50% to 60% CPU. And then we started investigating, like, what is going on? I mean, we're not doing anything. Why is the CPU so high? It should really be like 20. And then we realized that it was a unit bug.

Starting point is 00:11:10 And we were actually at 0.5% CPU. So it was very uneventful. But the Node process was just sitting there doing nothing. Memory was completely stable. People have nicknamed my RSS charts my lasagna charts. Oh, yeah. It just looks like a bunch of squiggly but flat trending lines. Yeah, no, it was really, really boring. And as the night progressed and, you know, my team was all up,

Starting point is 00:11:49 everybody was coming up with suggestions of how we can just gently poke the servers to make something break. Just to kind of keep it a little more interesting. Everybody has their own suggestion of what we can do. So, I mean, like you said, Black Friday is like the biggest retail day, you know, of the year. And you guys, what were you planning for? You said you had no idea what was going to happen, but I mean, really no idea or were you hoping for the best or like, what was in y'all's mind? So, I mean, the industry as a whole, the average is basically 40 to 60% of annual revenues online happen between Thanksgiving and Christmas. Which if you think about it for a business, that's awful.

Starting point is 00:12:33 It's insane. Black Friday and Cyber Monday being the two busiest shopping days. Although now with all the retailers, Thanksgiving has become the number one shopping day. It's a little crazy, but that's where we've seen the most traffic, especially since everybody is looking up the deals. Not necessarily buying, but they're looking it up. So every year the business gives us estimates of how much traffic, what the multiplier is going to be, from both last year and from September of the same year. And so we were looking at, you know, two, three, four times multipliers in terms of volume. But more than anything, we really didn't know how well the upstream services were going to perform. So if you think about it, Node sits between the clients and the Java services,

Starting point is 00:13:31 and because Node is doing such a great job managing the incoming traffic, it's such a great little executable for managing sockets, it's basically acting as a queue. So the load balancer is basically sending traffic to the Node processes, and then they are trying to proxy it over to Java. And if Java is behind and starting to get slow, it doesn't really add much load on the Node process in terms of CPU because it's a non-blocking system. So it doesn't do anything.

Starting point is 00:14:01 It just sits there waiting for socket events. And no socket events are coming back. But what is happening is that we are growing in memory. Because we keep holding more and more and more sessions in memory until the Java stuff is ready. And in that environment, basically Node becomes a queue. And if Java gets very, very slow, then Node is, you know, the queue gets very, very big until at some point it blows up. So we didn't really know what to expect in terms of how big is this queue going to be? How well is the memory going to perform? So we basically dumped a lot of extra capacity.
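To make the "Node becomes a queue" point concrete, here is a back-of-the-envelope sketch using Little's law (average requests in flight equals arrival rate times the time each request waits). The request rate, latencies, and per-request memory below are invented purely for illustration and are not Walmart's figures; the function names are hypothetical too.

```typescript
// Little's law: average in-flight requests = arrival rate x time spent waiting.
// Latency is taken in milliseconds so the example stays in whole numbers.
function requestsInFlight(requestsPerSecond: number, upstreamLatencyMs: number): number {
  return (requestsPerSecond * upstreamLatencyMs) / 1000;
}

// Memory held by pending requests, assuming a fixed per-request footprint.
function heldMemoryBytes(
  requestsPerSecond: number,
  upstreamLatencyMs: number,
  bytesPerRequest: number
): number {
  return requestsInFlight(requestsPerSecond, upstreamLatencyMs) * bytesPerRequest;
}

// Illustrative assumptions only: 1,000 req/s and 50,000 bytes held per request.
const healthyUpstream = heldMemoryBytes(1000, 100, 50000); // upstream answers in 100 ms
const slowUpstream = heldMemoryBytes(1000, 5000, 50000); // upstream answers in 5 s
```

CPU barely changes between the two cases, but under these assumptions the slow upstream pins fifty times as much memory, which is the growth pattern described above.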

Starting point is 00:14:35 We ended up, I think, with about six times more capacity than we actually needed, which really contributed to the completely boring performance. So this was then a major success, right, for you and your team? This was huge. I think we kind of proved the entire stack. But also, the quote I've been using privately in conversation with the Node core team and a few other companies like Joyent and Voxer, I kept saying, I don't want to be what Twitter was for Rails.

Starting point is 00:15:13 I don't want to be the Rails-doesn't-scale guy of Node. Because even though it was largely bullshit at the time, it did cause quite significant damage to Rails adoption. Once Twitter was having problems, there was this backlash and people were going back to PHP because Facebook was on PHP and that was clearly much better. So I was like, I'm not going to be that guy. I'm not going to be the number one headline on Hacker News saying, Node doesn't scale. Just ask Walmart.

Starting point is 00:15:51 So, I was really freaking out about that. And that was like in the back of my mind as we were approaching this. And so, we kind of threw more capacity at it. We were watching it. Like, everybody on the Node core team was following it throughout the night, all waiting for anything, if anything goes wrong, to jump on IRC with us and help us fix it live. It really meant a lot to the community as a whole. Yeah, it seemed like it. I mean, that's really the way I took it, because we tweeted that night, follow NodeBF on Twitter, and lots

Starting point is 00:16:27 of retweets came from that. And I think your tweet alone got 82 favorites and 157 retweets. And it just seemed like a lot of people were just watching in real time, and a lot of people who were involved in Node were just kind of, you know, behind the scenes cheering to make sure that everything had gone successfully for you. Yeah, my follower count on Twitter jumped like a thousand for the night. It's crazy. It was quite funny. But, you know, like you said, it's big for the community, right? I mean, people that like Node, they want to see Node succeed. And so it's not just big for you and your team because you're proving an emerging technology to a very large company, but it is.

Starting point is 00:17:06 It's big for the whole community because, like you said, if something would have fallen flat on its face, whether it was your fault or an inherent problem with Node, then you're right. The rumors would have been Node can't scale. And, I mean, you still hear that every once in a while when people are talking about Rails, just when they haven't been maybe not in the community for the last couple of years. But you'll still every once in a while hear somebody say Rails can't scale. And that kind of sticks with you. So you're right. It's a good thing for the whole community, not just the Walmart Node branch. Yeah, this was a big deal.

Starting point is 00:17:41 And also, at Node Summit, I gave a talk. Basically, my plan for Node Summit was to give a talk about Black Friday. Of course, there was nothing to talk about, so I basically gave a talk about how everything went wrong all the way until Black Friday. And we only got the fix for the infamous memory leak in the week of. And we couldn't even verify it because we were doing daily releases. So we never actually got to observe the server over, you know, like 48 hours to see that the memory leak was actually fixed.

Starting point is 00:18:20 So it was all very theoretical. And so that was part of the thing, is that if the memory leak was still going on, it required us to restart our servers every seven days. And we were expecting up to 10x traffic. And, well, that means we'd have to restart the servers more than once a day. And on Black Friday, you don't really want to touch your servers. Right. So it was all very suspenseful, but it was kind of like edge-of-your-seat suspenseful, but boredom. So did you have somebody working on that memory leak all the time trying to find it, or what happened with that? So we found a memory leak, well, we saw the pattern of the memory

Starting point is 00:19:01 leak back in April already. And then by June, I was convinced of the memory leak. And it was one of those things where I argue with everybody, including my own team, that it is a memory leak. And they're all like, no, it must be something else. And I said, no, it's a Node memory leak. And they're like, no, it can't be a Node memory leak because somebody else would report it, right? Other people are using Node in production. And they're not seeing any of that behavior. And so we ended up spending quite a lot of effort putting quite a lot of monitoring into the system, where I was basically spending three months trying to find correlation between the memory leak,

Starting point is 00:19:39 when memory was spiking, to when something else was going on. So we added monitors for client disconnects and response times and concurrent connections and just connections per second. And really, we built in so much monitoring into it so we can start comparing it and nothing correlated, just absolutely nothing. We knew the more traffic we get overall, the more leak we get, but the leaking is not happening when the traffic is coming in. It's just that there's a correlation between the overall daily pattern.

Starting point is 00:20:15 And at some point, I found a few clues that I had some ideas, and I said, okay, we're going to make a configuration change that is going to double the amount of HTTP client calls that we're making. And I said, and watch, we'll do that and the memory leak will double itself. And people kept saying, no, no, no, it's not going to happen. And of course, memory leak doubled itself, which helped us zoom into the exact spot where we're leaking. And I was able to isolate that, and then I wrote a little program that showed it,

Starting point is 00:20:48 but it took about 12 hours of it to run to even show you a slight leak. And it was all from, what, closing file descriptors? No, it ended up being a missing handle scope in the C++ side of Node. It was one line missing in Node, in one function. It was basically two C++ words that was the bug, and in some cases it caused a four-byte leak per HTTP request. Yeah, so that takes a little while to add up, but that definitely adds up. Yeah. And so at the time, we were leaking about eight megs a day.
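A quick sketch of the arithmetic behind these numbers: four bytes per leaking request, about eight megs a day, and the seven-day restart cycle with up to 10x traffic mentioned earlier in the conversation. It assumes "megs" means 1024 × 1024 bytes, that every leaking request leaks exactly four bytes, and that the leak scales linearly with traffic; the function names are made up for illustration.

```typescript
const BYTES_PER_MB = 1024 * 1024; // assumption: "megs" are binary megabytes

// How many leaking requests per day would produce a given daily leak.
function leakingRequestsPerDay(leakMbPerDay: number, bytesPerLeak: number): number {
  return (leakMbPerDay * BYTES_PER_MB) / bytesPerLeak;
}

// How often a restart is needed if the leak grows linearly with traffic.
function restartIntervalDays(baselineDays: number, trafficMultiplier: number): number {
  return baselineDays / trafficMultiplier;
}

// Eight megs a day at four bytes each is about two million leaking requests a day.
const requestsBehindLeak = leakingRequestsPerDay(8, 4);

// A seven-day restart cycle at 10x traffic shrinks to 0.7 days.
const blackFridayInterval = restartIntervalDays(7, 10);
```

Under those assumptions the restart interval drops below one day, which matches the "more than once a day" concern.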

Starting point is 00:21:30 So we got it really, really low by mitigating it in other ways, but couldn't really solve it. And then TJ Fontaine from the Node core team was able to, he spent like three weeks on it. It took some crazy stuff. And there's a great blog post on the Joyent blog detailing exactly what TJ used, and it's kind of a little bit like black magic. So it's interesting, the blog post, though. Let me ask you, what was that experience like, going back and forth

Starting point is 00:22:06 with the Node core team and trying to prove this, and how receptive were they to you, like, you know, pointing this stuff out? So when we first reported it back in June or July, people were quite dismissive. Where basically

Starting point is 00:22:22 the theory was, like, there is no way we have such a gigantic memory leak and you're the only one seeing it. But at the end, I was able to actually come and say, hey, here's a little bit of Node code, and if you run it, it will show you the leak. And then of course they ran it and they were like, no, we're not seeing any leak. And I said, just leave it alone for 12 hours and come back to it tomorrow. And they did. And they opened it up and they're like, no, it still doesn't look like a leak. And I'm like, go ahead and plot your trend line on that chart. And then they did that and they're like, oh yeah, you know what? It looks like there's something going on there. And then as they added more instrumentation, they could actually

Starting point is 00:22:56 start seeing what was actually happening: V8 was building a gigantic array of undefined, which is where the four bytes came from. Basically, it was pointing to the canonical undefined reference within V8. It was just building a gigantic array of undefineds that were never getting cleaned. Right. So you said it was fixed the week of, is that right? It was fixed about two weeks before. But there was some build issue with the SmartOS distribution of the new version of Node, and so we missed a stress test because it was like a few hours late. And then after that we were busy with a few other things, so we ended up putting it in a week before and crossing our fingers. Yeah. How easy is it to stress test that, though? Leave your system in production for a day. Yeah.

Starting point is 00:23:56 It's, yeah, like my favorite thing to say about Walmart is that we're too big to stage. And the reality is that we could not really reproduce it for months. And we still can't reproduce it with the actual system. The only way I was able to reproduce it was with this little script I wrote that was creating a very specific scenario of bursts of traffic to actually stress Node in just the right way to make it happen.
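The "plot your trend line" step mentioned a moment ago can be sketched as an ordinary least-squares fit over memory samples: a slow leak hides inside normal sawtooth noise, but the fitted slope stays positive. The sample data below is synthetic and the names (`MemorySample`, `slopeMbPerHour`, `makeSamples`) are hypothetical; this is not the script or tooling used at Walmart or by the Node core team.

```typescript
interface MemorySample {
  hours: number; // time since the process started
  rssMb: number; // resident set size at that time
}

// Ordinary least-squares slope of memory over time, in MB per hour.
function slopeMbPerHour(samples: MemorySample[]): number {
  const n = samples.length;
  const meanX = samples.reduce((sum, s) => sum + s.hours, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.rssMb, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const s of samples) {
    numerator += (s.hours - meanX) * (s.rssMb - meanY);
    denominator += (s.hours - meanX) * (s.hours - meanX);
  }
  return numerator / denominator;
}

// Synthetic data: 24 hourly samples, a +/-5 MB sawtooth (think GC noise)
// layered on top of a steady drift of `driftMbPerHour`.
function makeSamples(driftMbPerHour: number): MemorySample[] {
  const samples: MemorySample[] = [];
  for (let h = 0; h < 24; h++) {
    const noise = h % 2 === 0 ? 5 : -5;
    samples.push({ hours: h, rssMb: 200 + driftMbPerHour * h + noise });
  }
  return samples;
}

const leakingSlope = slopeMbPerHour(makeSamples(0.3));
const flatSeries: MemorySample[] = makeSamples(0).map((s) => ({ hours: s.hours, rssMb: 200 }));
const flatSlope = slopeMbPerHour(flatSeries);
```

Neighbouring samples in the leaking series differ by about 10 MB of noise, far more than the 0.3 MB/hour drift, which is why the raw chart can look flat while the fitted slope is clearly positive.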

Starting point is 00:24:32 big win for Node. Let's talk a little bit about the actual, I mean, the implementation of what you guys did and then potentially get into hapi a little bit. So it's my understanding that you guys started out using Express as that was really the only option at the time. Is that right? So I started, the origins of hapi were really back at Yahoo. I was working on a project called Sled for about a year.

Starting point is 00:24:56 So I started when Node 0.2 had just come out. So I guess it was November three years ago. And I was building a collaborative list-making tool at Yahoo. And I used Express at the time. It was really the only option. It was really just Express and Connect. Those were the only two things. And Express was built on top of it.

Starting point is 00:25:22 And we used that for a while, but what was going on in that one year is that I found myself basically building a framework on top of Express because Express gave so little functionality. It basically was just a router with a little bit of helpers, but it wasn't really a full web framework that did all the things I want in terms of redirecting the right way and handling rendering views without having to constantly set up the view context. And at the time, also, the middleware ecosystem for Express was very, very young. I mean, basically, I was finding Express bugs on a daily basis and just IMing TJ and saying, hey, TJ, another one.

Starting point is 00:26:08 And so that has changed very dramatically. But when I went to Walmart, what I did is I basically took the Express layer that we'd built in the Sled project, which was open-sourced by Yahoo. So that was easy. And then we kind of said, OK, we're going to call this hapi, and it's going to be basically an Express layer that will add everything we needed. And if you look at what PayPal just did with their Kraken framework, they kind of did the same thing. They took Express and then they realized Express by itself is not a very useful framework for a large team,

Starting point is 00:26:44 so they went and added a bunch of abstraction and layers on top of that. Right. So we did that for a while, and it was working well. But then we started hitting the limits of what Express can do. And the biggest one is the way the router is designed in Express. It's basically just a gigantic array of regular expressions. And all it's doing is that whenever a request comes in, it just goes through the array in the order that you added your routes into the array, and it's doing a regex match against

Starting point is 00:27:16 each one of them. And when it finds a match, it calls the function that will match it. And all the middleware stuff is basically just adding a wildcard match into the array. There's no magic there. It's kind of beautiful in how simple the entire architecture of Express is. But when you're working in an enterprise environment, when you have multiple teams working on the same server, you're going to want the server to take care of, for example, collisions in your routes.
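The router design described above can be illustrated with a minimal sketch: an ordered array of regular expressions where the first match wins. This is not Express's actual source; `NaiveRouter`, `RouteEntry`, and the example paths are hypothetical, and handlers return strings instead of writing HTTP responses.

```typescript
type RouteHandler = (path: string) => string;

interface RouteEntry {
  pattern: RegExp;
  handler: RouteHandler;
}

class NaiveRouter {
  private routes: RouteEntry[] = [];

  // Registration just appends; nothing checks for overlap with earlier routes.
  add(pattern: RegExp, handler: RouteHandler): void {
    this.routes.push({ pattern, handler });
  }

  // Walk the array in registration order and call the first matching handler.
  dispatch(path: string): string | undefined {
    for (const route of this.routes) {
      if (route.pattern.test(path)) {
        return route.handler(path);
      }
    }
    return undefined; // no route matched
  }
}

const router = new NaiveRouter();
router.add(/^\/items\/\w+$/, () => "team A: item detail");
// Team B's route collides with team A's wildcard and is silently shadowed.
router.add(/^\/items\/new$/, () => "team B: new item form");
```

Here a request for `/items/new` is answered by team A's handler and nothing warns team B, which is the kind of collision a framework aimed at multiple teams would want to reject at registration time.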

Starting point is 00:27:46 You don't want to end up with, you know, two routes with the same path, two middlewares conflicting on what they're changing. And very fast, we got ourselves into middleware hell, which, I'm very proud to say, was a term that I started. Probably being one of the first people to actually use Express at such a large scale that we experienced it. And it was really painful. We wanted the framework to protect us from doing stupid things, and it wasn't. So we switched to Director from the Nodejitsu guys.

Starting point is 00:28:23 And we used Director for a little bit because Director just gives you a router and you can use it any way you want. But then we hit limitations there as well because of the way they store the route tree. And at that point we felt pretty good about just doing our own internals. We're talking about, you know, a year and a half into working on this environment. And the other thing is that when you start to build a real production dependency on these things, you kind of require a different level of maturity from the modules you're using,

Starting point is 00:28:59 from the open source stuff you're using. And we found ourselves trying our best to use public open source modules, but slowly but surely moving towards more and more of a code base that we were managing ourselves, just because the quality was more within our criteria. And if there was a problem, we could fix it right away. We didn't have to fork or start playing all those games or trying to find someone on, you know, Twitter to help us accept a pull request we've made. And so we're still using a significant amount of open source stuff, but whenever we hit a wall with one, we are much more trigger-happy now to fork it and create our own than we were a year ago.

Starting point is 00:29:48 Right. You guys are much more familiar with the whole environment now as a team and your needs in that environment. So there's a lot more confidence in that area, I would imagine. Yeah. We also feel like we're giving enough back. Like, if you talk to most of the leaders in the Node community, they're all about small, tiny modules. They all hate frameworks. And it's kind of funny. Whenever they talk about frameworks, you always hear one of them say, there's also hapi from the Walmart guys, which if you're doing large-scale stuff, that's actually a good solution.

Starting point is 00:30:29 But really, you don't need it. It's always this qualified love coming from a lot of the core Node people, which I respect, but at the same time, if you're working in a large team, you have a lot of people building stuff, you really want to have a plugin architecture that people can build their own stuff and then just deploy it together without having to coordinate every change, without having that one gigantic, scary routing table file that everybody has to constantly change to get their stuff into the server. So those were the things

Starting point is 00:31:04 we focused on the last year in terms of the framework. So it's interesting then because you guys started, so you did your own thing, and with hapi, you talk about how the Node community loves a bunch of little tiny modules. But you briefly mentioned it, but you went through a pretty modular approach

Starting point is 00:31:20 to how you deal with hapi, right? So first of all, your organization name on GitHub is Spumko. And why don't you real quick tell me the story you said about Spumko and where that came from. So the first module was called hapi, which was short for HTTP API. So it was really an acronym. But then as soon as I wrote it down, I was like, happy, happy, joy, joy. All my childhood, the Ren and Stimpy days just came flooding back. And so, of course, the second module we created was called Joy. And we called it J-O-I just to stick with the same spelling style of hapi. And after that, basically every new module we created

Starting point is 00:32:06 was based on some kind of Ren and Stimpy character or episode. And at some point, we were like over 30 public GitHub projects, which made life on the official Walmart Labs GitHub account quite miserable because GitHub doesn't give you any way to organize your stuff other than everything is flat in your organization. And when you have a couple hundred other people all using the same GitHub organization, the dashboard becomes useless. So we were like, okay, we need to move our stuff to a new org.

Starting point is 00:32:42 And what are we going to call it? And I was like, I don't want to call it another Walmart 2. Let's find something a little more creative. So we chose Spumko. Spumco with a C is the name of the animation studio that created Ren and Stimpy. And so we call our organization Spumko with a K as kind of a homage to that. Yeah. So all your plug-ins here. So you guys actually were brave enough to name one poop,

Starting point is 00:33:15 and that is the plug-in for or the module for kind of like exception error handling. Is that right? Well, the proper tagline is it's a plug-in for taking dumps. Yeah, processing a dump and cleaning up after an uncaught exception. So that's funny. That's hilarious. Yeah, it's a very modular approach. Now, this interesting design, is that kind of like to speak to, you know,

Starting point is 00:33:39 to kind of capture the heart of the Node community so that people can pick the modules as they want and kind of mix and match what they want for their different configuration? So I'm a big believer in an expansion-contraction pattern of development where you add features to your main framework, to the core of the framework, and then as they mature, as you gain some experience of how they work, you figure out if they should be abstracted out into their own submodule or if they should stay part of the core system. You're still keeping the same integrated experience overall, but in terms of code organization, you're still breaking it up. But I would argue that... so hapi itself, just the hapi module, it's a pretty heavy framework.

Starting point is 00:34:25 So I wouldn't try to portray it as a lightweight, modular solution. It's taking a very opinionated, hands-on approach to writing an HTTP web server or API server. And the reason for that is that we really want an integrated solution where when you define a route, you can define the caching policy all in one place. You can define your authentication. You can define your input validation. And everything just works out of the box. You don't have to then start basically finding the right plug-in to do this and the plug-in to do that. Everything we deemed as absolutely necessary for building any kind of modern

Starting point is 00:35:05 web application is built in. And so that's a core principle there. What we've done, though, is whatever we consider to be more of an optional component, for example, there's a very popular Express middleware called Passport that basically everybody's using for all their third-party authentication. So we created a wrapper for that, and that's called Travelogue. And so that's not part of core. It's a heavy plugin you can add in.
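As an illustration of the "define everything about a route in one place" idea described above, here is a minimal, self-contained sketch. It does not use hapi and does not reproduce hapi's configuration format; `RouteSketch`, `handle`, and every field name are hypothetical, chosen only to show caching policy, authentication, and input validation declared next to the handler.

```typescript
// Hypothetical shapes for illustration only; this is not hapi's route format.
type Query = Record<string, string | undefined>;

interface RouteSketch {
  method: "GET" | "POST";
  path: string;
  requireAuth: boolean;                       // authentication policy
  cacheTtlSeconds: number;                    // caching policy
  validate: (query: Query) => string | null;  // error message, or null when valid
  handler: (query: Query) => string;
}

interface ResponseSketch {
  status: number;
  body: string;
  cacheControl?: string;
}

// A tiny dispatcher that applies the policies declared on the route itself,
// so the handler never has to wire up auth, validation, or caching by hand.
function handle(route: RouteSketch, query: Query, isAuthenticated: boolean): ResponseSketch {
  if (route.requireAuth && !isAuthenticated) {
    return { status: 401, body: "authentication required" };
  }
  const problem = route.validate(query);
  if (problem !== null) {
    return { status: 400, body: problem };
  }
  return {
    status: 200,
    body: route.handler(query),
    cacheControl: "max-age=" + route.cacheTtlSeconds,
  };
}

// Everything about this (made-up) route lives in one object.
const productRoute: RouteSketch = {
  method: "GET",
  path: "/product",
  requireAuth: false,
  cacheTtlSeconds: 60,
  validate: (q) => (/^\d+$/.test(q.id ?? "") ? null : "id must be numeric"),
  handler: (q) => "product " + q.id,
};
```

Calling `handle(productRoute, { id: "42" }, false)` goes through validation and comes back with the cache header attached, while a non-numeric `id` is rejected before the handler runs.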

Starting point is 00:35:39 The other thing is that we designed a plugin system to basically avoid all the middleware hell that we've experienced before. So you can actually describe relationships between plugins. One plugin can actually say, I require another one to work. A plugin can actually be very specific about the order of execution to say, hey, first go run this CSRF plugin, and only then run the cookie one, or the other way around, depending on what you need. And so when you're loading them, you don't have to worry about the order in which you're loading them. As long as they describe the relationship, the hapi loader will make sure that it's done in

Starting point is 00:36:18 the right order based on how it's been prescribed. So we've done a lot of that stuff. We're trying to avoid it as much as possible, so we're discouraging people from building new hapi-specific plugins. So a lot of people are creating these plugins, and then they're all very disappointed when they're showing it to me. And I was like, why is it a plugin? Why is it not just a regular Node module that you use? There is this excitement of, you know, like, putting a stake in the ground and saying, I created the hapi plugin for this. But in practice, it's not necessary.
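The ordering behavior described above (plugins declare their relationships, and the loader works out the execution order regardless of load order) can be sketched as a small dependency sort. This is an assumption about the general technique, not hapi's actual loader; `PluginSketch`, `after`, and `orderPlugins` are hypothetical names.

```typescript
// Hypothetical plugin description; none of these names come from hapi itself.
interface PluginSketch {
  name: string;
  after?: string[]; // names of plugins that must run before this one
}

// Returns plugin names in an order that respects every `after` constraint,
// no matter in which order the plugins were loaded (depth-first topological sort).
function orderPlugins(plugins: PluginSketch[]): string[] {
  const byName: Record<string, PluginSketch | undefined> = Object.create(null);
  for (const p of plugins) {
    byName[p.name] = p;
  }

  const state: Record<string, "visiting" | "done" | undefined> = Object.create(null);
  const ordered: string[] = [];

  const visit = (name: string): void => {
    if (state[name] === "done") return;
    if (state[name] === "visiting") {
      throw new Error("Circular plugin dependency involving " + name);
    }
    const plugin = byName[name];
    if (!plugin) {
      throw new Error("Missing required plugin " + name);
    }
    state[name] = "visiting";
    for (const dep of plugin.after ?? []) {
      visit(dep); // everything this plugin depends on goes first
    }
    state[name] = "done";
    ordered.push(name);
  };

  for (const p of plugins) {
    visit(p.name);
  }
  return ordered;
}

// Loaded "cookie" first, but it declares that "csrf" must run before it.
const pluginOrder = orderPlugins([
  { name: "cookie", after: ["csrf"] },
  { name: "csrf" },
]);
```

Here `pluginOrder` puts `csrf` ahead of `cookie` even though `cookie` was listed first, which is the "don't worry about load order" property discussed in the interview. A missing or circular dependency throws instead of silently running in the wrong order.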

Starting point is 00:36:56 So we look at plugins as basically something that is directly interacting with the framework, that's directly adding functionality to the framework, that is defining routes. If all you want to do is parse a multipart form request, I would say don't build a plugin for that. Just write a module. Right. Yeah, it's a different way of viewing it, I guess. People kind of, I don't know what the mindset is, right? But let's say, you know, like Go, right? It's a pretty early adoption stage for the programming language Go. And so people love to write like the port of another solution for that thing, right? So people say, oh, let's say you didn't have your Passport wrapper for hapi,

Starting point is 00:37:45 then they would say, well, people use Passport for Express. I want to write the Passport for hapi. And I think that causes some fragmentation in the community, right? Because you can talk a long time about what the motivation is behind that, kind of looking for some sort of fame or whatever. But at the same time, they want to help the community by providing a solution. So I think Node and npm specifically kind of makes it pretty simple, right, to just use npm modules in general.

Starting point is 00:38:12 And so perhaps you're right that it makes more sense to write a module that is easier to manage in that way than trying to do a specific plug-in for hapi. Yeah, and the other thing is that really all you need in order to create a hapi plug-in is to export one function called register. And so that's all we're looking for when we're loading a plug-in. But, you know, if you're writing a module and you really want it to be easily absorbed into the hapi ecosystem, then, you know, write your module the way you would write it for anybody to use, and then just add one more exported function so it can also work as a

Starting point is 00:38:54 plugin for hapi. And I would argue that if you're doing that, might as well export one more function and then it can also work as a middleware for Express. If you design it properly, then it should be pretty easy to bridge the two for most of the basic stuff that people are looking for. Let's pause for a minute and give a shout out to our sponsor, FreshBooks. Now, we've been using FreshBooks literally for years and years and years. I got to say probably as far back as I can remember using FreshBooks is like 2007, 2008 maybe.

Starting point is 00:39:30 But they're a staple in our business and what we do. We would literally be lost if we didn't have FreshBooks to use for our invoice management. We absolutely, hands down, red-faced, love FreshBooks. Absolutely. So just in case you didn't know, we love FreshBooks. But if you're the kind of person who's still sending your

Starting point is 00:39:53 invoices with Word or Excel, and maybe you've kind of got your receipts kind of shoved into a shoe box, kind of keeping track of your expenses there and just kind of hoping things will work out, well, with FreshBooks, you can easily do all of that. You can create invoices online. You can capture and track expenses on the go. You can get real-time business reports with a few simple clicks. It's super, super easy, perfect for any large or small business. We absolutely love it.

Starting point is 00:40:21 And we want you to try it today for free. You can sign up today at GetFreshBooks.com. And here's the delicious part that FreshBooks is doing for our listeners. Every day they're giving away a birthday cake. That's right, a birthday cake to someone who signs up for a new account from this show. So for your chance to win, enter The Changelog in the How Did You Hear About Us section when you sign up for a new account. And just know that with FreshBooks, every day could be your birthday. So go sign up at getfreshbooks.com.
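Picking up the plugin discussion from before the sponsor break: the advice was to write an ordinary module first, then add one exported `register` function so it can be loaded as a plugin, and one more so it can act as Express middleware. The sketch below shows that shape without depending on either framework. The `ServerSketch` and `ExpressLikeMiddleware` types and the simplified `register(server)` signature are assumptions for illustration; the real hapi and Express signatures are richer and are not reproduced here.

```typescript
// Hypothetical minimal shapes; real hapi and Express objects are far richer.
interface ServerSketch {
  route(path: string, handler: (input: string) => string): void;
}

type ExpressLikeMiddleware = (
  req: { url: string; slug?: string },
  res: unknown,
  next: () => void
) => void;

// 1. The plain module function that anybody can call, framework or not.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into a dash
    .replace(/^-+|-+$/g, "");    // drop leading and trailing dashes
}

// 2. One extra export so a plugin loader that looks for `register` can use it.
function register(server: ServerSketch): void {
  server.route("/slug", (input) => slugify(input));
}

// 3. One more export so the same logic works as Express-style middleware.
function middleware(): ExpressLikeMiddleware {
  return (req, _res, next) => {
    req.slug = slugify(req.url);
    next();
  };
}

// What the module would export: the core function plus two thin adapters.
const slugModule = { slugify, register, middleware };
```

The point of the design is that `slugify` carries all the logic, while `register` and `middleware` are a few lines each, so supporting a second framework costs almost nothing.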

Starting point is 00:40:51 So when was 1.0 of hapi released? Or I guess a better question, I'm not sure what your versioning structure looks like. When was the production-ready hapi released? Well, it was in production before 1.0. But we're using the numbers, basically it's just a regular semver contract: you know, a patch is just a bug fix that's backward and forward compatible, a minor is backward compatible, and a major is not backward compatible. And so 1.0 came out, I think, in April. We got it out right together with the Node 0.10 release, and that has been used in production since then.
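The semver contract described above (patch and minor releases are backward compatible, major releases are not) can be written down as a small check. This is a sketch under stated assumptions: it handles only plain `MAJOR.MINOR.PATCH` strings, and it treats pre-1.0 minor bumps as breaking, which matches the later remark in this interview that every minor release before 1.0 was a breaking change. The function names are hypothetical.

```typescript
// Parses a plain "MAJOR.MINOR.PATCH" string; pre-release tags are out of scope.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error("Not a MAJOR.MINOR.PATCH version: " + version);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when moving from `from` to `to` should not break existing code
// under the contract described in the interview.
function isBackwardCompatibleUpgrade(from: string, to: string): boolean {
  const a = parseVersion(from);
  const b = parseVersion(to);

  // Downgrades are not considered safe in this sketch.
  const isNewerOrEqual =
    b[0] > a[0] ||
    (b[0] === a[0] && (b[1] > a[1] || (b[1] === a[1] && b[2] >= a[2])));
  if (!isNewerOrEqual) return false;

  // A major bump is, by definition, not backward compatible.
  if (a[0] !== b[0]) return false;

  // Assumption: before 1.0, a minor bump is treated as breaking too.
  if (a[0] === 0 && a[1] !== b[1]) return false;

  return true;
}
```

For example, going from 1.0.0 to 1.1.3 passes the check, while 1.4.2 to 2.0.0 or 0.8.0 to 0.9.0 does not.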

Starting point is 00:41:41 And we're working on 2.0 right now, hoping to ship it out next week. And there are really no major changes in it. It's just that it's been long enough that we've accumulated a little too much backward compatibility crap around it. As we've been using it, as people have been using it, we got a lot of feedback. And a lot of the decisions we made in April were no longer valid in September. All of a sudden, we were like, oh, we really don't want this configuration value to be in the same node as this configuration value because it doesn't work right when you're trying to use defaults. And so we made backward-compatible changes, but it got to a point now where it's kind of time to clean it up and do a breaking release.

Starting point is 00:42:28 So it's a very non-dramatic 2.0. Yeah. It was interesting. I was watching, I don't know what video it was, but like a hapi tutorial video that you all put out. And I just noticed somebody was adding routes, adding handlers with the route method on the server. And then I saw in the documentation that there was the addRoute method. And so I was like, I wonder where, you know, something's wrong. Or, you know, in my head, I was like, I bet there's a major change coming out that that's like breaking, you know, that's either deprecating

Starting point is 00:42:58 this or breaking it or something. And it was interesting to me that you all have an issue open for 2.0 breaking changes. That's a neat way to do it. And it was pretty, you know, simple for me to figure out what was going on and why I saw the kind of discrepancy between the two. So, you know, I guess my question is how actively are your issues on the project watched? Like, do most people that are using hapi know that these breaking changes are coming? I guess that's an awkward way to say it, basically, but how much activity have you guys had around, like, the open source issues, pull request kind of a thing?

Starting point is 00:43:39 So there's a couple hundred people that are actually actively watching the issues. At this point, pretty much everybody who's using it in any kind of serious capacity, if you have a production dependency on it, you're watching what's going on. And we've basically gone all in on GitHub as our everything. You know, it's our project management solution, our team communication solution. Basically, we put everything there. There is no other ticketing system for hapi

Starting point is 00:44:11 privately in Walmart or anywhere else. We basically made a decision that it's an open source project and we're going to run it completely as an open source project, even though I get bug reports from internal teams and I always say, like, go open an issue, and they're like, really?

Starting point is 00:44:27 This issue? You want me to put it on the web? I was like, yeah, go open an issue. I'm not embarrassed by it. It's a bug, and we'll fix it. Go open an issue. People will see the commit, so it's not like you can hide anything. But we've

Starting point is 00:44:43 also made extensive use of milestones even before GitHub kind of cleaned their act with versions. So we've been using milestones quite extensively. So we don't do release notes because all we're doing, we're very religiously tagging everything to an issue and then the issues are all part of milestones. So you can see exactly, if you just look up, you want to say, okay, what changed between this and this?

Starting point is 00:45:06 You can just bring up the milestones and you can see exactly what issues were associated. And then once we did that, we kind of added the breaking change label and we said, hey, you know what? If we're going to make a change that's going to be breaking, and before we were 1.0, basically every minor release was a breaking change.

Starting point is 00:45:25 Every one of them was like, oh, yeah, you can't upgrade unless you rewrote your entire app. And after that, it became a lot less significant. I think we made two breaking changes throughout 1.0, and both were for security reasons. So we changed the default of multi-part parser not to create files by default. Stuff like that that we felt like, you know what, this is a breaking change worth making. Yeah, the whole Node community had to do that pretty much, right?

Starting point is 00:45:57 If I remember correctly. Yeah. So we had a couple of breaking changes in 1.0 that were just that important. But overall, it hasn't been a big deal. And now we're working on 2.0, and we have the one issue where we collect everything. And it's more edited, and it's a lot more friendly for you to understand. But then every individual issue, the one where the change is actually being made, we also tag that. Because I'm not

Starting point is 00:46:30 expecting everybody to be able to sit there and go through my, you know, 300 breaking changes issues in 2.0 and read every one of those. I mean, that would be awful. So instead, we're basically doing it that way. And it's also great because then, once we're done, writing a migration guide is just kind of like doing some editorial on that particular issue. Yeah. And that's how we do it too. We're going to go and edit that, and that will be the migration guide. We're not going to actually publish a wiki page for that. So do you know anyone that's using hapi in production besides you guys at a large scale?

Starting point is 00:47:08 I don't know about large scale. I know that Mozilla was using it for some of their identity stuff. For some of their browser ID, they were using hapi. I don't know what's the status right now, but they were using it as of

Starting point is 00:47:24 a few months ago. I know MasterCard is using it for some of their new projects. Condé Nast, the publisher, they're using it as a building block for their new cross-platform environment. And, of course, Walmart is using it quite heavily right now for mobile. We're looking this year to expand beyond mobile to a lot of other areas of the e-commerce business. It has some significant adoption, but then others have made a decision to either build their own or use Express. So I think the default behavior for other people right now is to pick Express because everybody else is using Express.

Starting point is 00:48:15 And then they tend to, once they got into Express, they feel like it's too much right now to make changes, so they just keep building more and more layers on top to make it more manageable for them. So I'm hoping that as time passes and more people are seeing what we're doing with it, they can make a different decision. And for example, if you have an existing API and you want to take the proxy strategy to migrate to a new stack, which is a really great approach in terms of sticking your layer in between and slowly making changes. You don't have to go in. Otherwise, you have to sit in a dark room for a year and rebuild everything. And of course, we know how well that works in production, like when you ship the new version and nothing works right and then you know it's a year behind

Starting point is 00:49:08 and probably get canceled and everybody quits, right? So yeah. And basically you can deploy hapi with probably about 30 lines of your own code and get all the proxy functionality immediately at, you know, Walmart scale. Yeah. So that's kind of neat. Yeah, I was going to ask. I mean, that was kind of my next question was, you know, what's the future of Walmart look like for this kind of stuff? So you're on the mobile team. How much, you know, I don't know what the best way to ask is, but how much impact have you had on the other teams in Walmart? So it was kind of interesting because, about a year ago, there was some effort within the people in the company who like to set standards.

Starting point is 00:49:53 And they came to me and they said, you know, more people are asking us about Node. So can we make it formal that hapi is the official framework at Walmart? And I said, no, I don't want that to be the case. I don't want anybody to use hapi because some policy is dictating it. Because I wouldn't use it because you're telling me what to use. So I don't want to do it to other people. So we never actually promoted it within the company, and people just picked it up all on their own, which is kind of neat. All of a sudden, we're getting issues opened, and then after a few back and forths, they're like, oh, wait a minute, are you from the Santa Clara office? So it was kind of this funny

Starting point is 00:50:30 where we were meeting other people on the IRC channel, you know, co-workers that we have never met before. So some of the other teams were building smaller panels for the main website, like recommendations and the social stuff. They're using hapi to build their own stuff. And they have their own deployment, their own servers. And every once in a while, they'll come back. But the real goal for my team this year is going to be to look and see where we can add value

Starting point is 00:51:02 beyond mobile. As Walmart e-commerce as a whole is moving to new APIs and new technologies on the back end, we're all going to have to move to that stack. And then also we're expanding our mandate to other countries. So right now the mobile team is focused primarily on the US where we have Walmart and Sam's Club and we also are working on the mobile apps for Asda which is the Walmart brand in the UK. And Walmart is active in a lot more other countries including Mexico and Canada and China and Brazil. And it's a very long list. So we are at some point going to expand to those. So it's really seeing how much we can scale the Node engineering process beyond just scaling the software, but also scaling the engineering itself, like the writing of the software itself.

Starting point is 00:52:08 We're going to pause the show for just a minute and give a shout out to our awesome sponsor, Toptal. They've been sponsoring the show for a little bit, and we've had a chance to tell you about some really awesome stuff they're doing. I've been working with Brendan, their co-founder and CTO, and I mentioned that I wasn't quite sure what to expect from them. But I was excited about what they're doing. They're helping developers who want to freelance with some really awesome companies find ways to do that. And it's their mission. These guys are the real deal. They're engineers themselves from top to bottom. They're not technical recruiters trying to pimp developers. So if that's what you think, then you've got them completely pegged wrong. They're a network of elite engineers all around the world who work with some really

Starting point is 00:52:46 awesome clients. And for those of you out there who are freelancing or would like to freelance, you've got to check out Toptal. You can work on special projects with companies like Airbnb, Artsy, IDEO, and many others. You can work remotely. You can go to Andrew's favorite place, which is on a beach, or anywhere in the world. No office is required. And to get started, head to toptal.com slash developer, click join the best.

Starting point is 00:53:11 And because they want to work with only the best senior engineers out there, they've got a well-thought-out four-stage screening process that begins with a personal phone call via Skype to kind of get to know who you are and introduce you to who they are and what their mission is and see if you're a fit. And from end to end, the screening process includes an English speaking test, a timed algorithm test, technical interviews with core Toptal engineers, as well as a test project. And once you've made it past the screening process, the sky is the limit. And if you think you have what it takes, head to toptal.com slash developer right now to get started. Tell them The Changelog sent you, and enjoy. Now back to the show. One of the things I wanted to kind of implore, I don't know if that's even the right word at this point, but to kind of congratulate or maybe thank you

Starting point is 00:53:55 guys about was, you know, we've had on the show a few times in the last couple weeks discussions around, you know, how one deployment tool will come out and another deployment tool will come out and say, you know, we are better than X or we don't suck as much as X. And they'll kind of take a shot at the person that they're building on top of. And, you know, I was looking through hapi and you guys obviously are, I wouldn't say you're a competitor with Express, but you've definitely kind of entered the same space as Express. And I don't see anything on, you know, your website saying like, we're better than Express or the reason we're doing this is because Express stinks. And I personally just wanted to thank you guys for that because I think that's a good thing to get

Starting point is 00:54:32 away from in the open source community. Well, I mean, there's a couple reasons for that. I think I had, like, in the last year, I had one tweet where I said basically something like, you know, if you're doing something serious with Node, it's time to start looking beyond Express. I think I was a little more snarky about it. But really, there's two ways of looking at it. One is we are clearly the underdog in this space. Both Express and Restify, which is the Joyent API framework,

Starting point is 00:55:05 have a lot more deployed use cases than hapi has right now. We have more revenues going through it. So, you know, combined, there's definitely more money being bet on hapi than everybody else combined. But that's not a very meaningful statistic. I just enjoy saying it. Basically, you're just saying Walmart's using it at that point. Yes.

Starting point is 00:55:29 Well, I know if MasterCard puts some real revenue on it too, I think between those two, it's going to be pretty significant. But if you're the underdog and you're starting to basically say nasty things about the industry leader, like, you're really coming off as a dick. I mean, you're not really coming off as, like, you know, somebody who's like. And the thing is, the people who have created Express, you know, particularly the former LearnBoost guys, they're now with WordPress, which bought their Cloudup startup or spinoff. But those are all fantastic guys. I mean, they are just awesome people and brilliant engineers. So for me to go out and say anything bad about their work, I disagree with the choices they've made. And I think that architecturally what they've produced is not compatible with the

Starting point is 00:56:30 parts I want to have. But to say that it's bad, it's just going to be stupid. And the thing is, those are two very different philosophies. Express is very lightweight. It's basically just giving you a little bit of sugar on top of Node. And that's what most people want. So I don't think we need to actively go against it. But at the same time, we're definitely trying to get more people to adopt hapi. We're definitely trying to highlight where we think we're better than other frameworks in terms of the functionality we provide. But I think you can do it without being a dick. Yeah, absolutely.

Starting point is 00:57:12 And I think that's what you all are doing. So I congratulate and thank you for that. So for the new listeners of the show, at the end of every episode, we ask the same questions to our guests. So Eran, I'll go ahead and ask them to you now. The first one is for a call to arms. So something around hapi or any one of its modules or Node in general that you'd like to see

Starting point is 00:57:36 the open source community kind of pitch in and contribute to? Mostly just use it. We really are looking for more people to give it a try. And the thing is, if you try it and you don't like it, please tell us why. Like, go open an issue and say, I tried it, didn't like it, here's why I didn't like it, good luck with it. Like, we love issues. We actually don't have a Google group like most other projects because we just want everybody to open issues.

Starting point is 00:58:01 And we have a label called discussion. So we're basically using GitHub issues just like a mailing list. And we found it's basically creating a psychological barrier, so that people are less likely to be spammy and troll, where on a mailing list that's kind of expected. So it's working really well, but really my request is for people to just go and give it a try and play with it, find bugs, ask for more stuff, and we're happy to engage. Awesome. If you weren't working at Walmart or working on hapi, what would you be doing?

Starting point is 00:58:46 I would be a full-time farmer. That's awesome. Right now I'm only a part-time farmer. If I could afford to do that full time, that's definitely what I would be doing. There's a famous farming joke: a farmer goes to Vegas and wins the jackpot. So everybody's saying, what are you going to do now? And he kind of looks up and he thinks about it and says, I think I can continue being a farmer for another five years. That's awesome. What would you farm? You live out in California, so are you into like avocado? I'm actually not a big fan of the orchard stuff. So I have a small apple orchard, but mostly a lot of vegetables. And I have quite a lot of animals between chickens, ducks, geese, emus, alpacas, pigs, a bunch of beehives.

Starting point is 00:59:39 That's cool. Yeah. Beehives. You're definitely the first guest that we've had that has said that. But still, it kind of is a recurring theme. Yeah, yeah. Beehive. You're definitely the first guest that we've had that has said that. But still, it kind of is a recurring theme. It's very rare for us to get somebody that we would say, what would you rather be doing? And they would say, oh, I'd go into another technology industry or something like that.

Starting point is 01:00:05 Typically, developers and people that, in my experience, that sit behind a computer all day tend to want to do something with their hands if they had more time. So for you it would be farming, for me it would be woodworking, and for some people it's surfing and all that. So yeah, it's a common theme among developers that I found to kind of dream about doing things with your hands. Woodworking and bees. Yeah. Cool.

Starting point is 01:00:20 So I actually gave a talk at Real Time Conf in October. Basically, all I did was talk about food for an hour to engineers. And it was by far the most insane talk production I've ever put together. It was four months of preparation. I had to actually rent a U-Haul and drive it all the way to Portland from California because I had too much stuff. I couldn't ship it. That's crazy.

Starting point is 01:00:50 Yeah. You did it. Yeah, so developers, I'm sure, were very glad to hear you talk about food. It was fun. It was like a psychotic, you know, I think the budget was over five grand for the talk. It was crazy.

Starting point is 01:01:03 But yeah, and the video is online, so you should check it out. Yeah, we'll have to link to that. Our last question is for a programmer hero, somebody in your life that has been influential. I don't think anybody has been influential, but I would say Roberta Williams would be my childhood engineering hero. Of course, if you're not as old as me, she created all the King's Quest games. So together with her husband Ken, they created Sierra On-Line. And so, yeah, I grew up on those games. And all I wanted to do was reverse engineer them and figure out how they're done. Played my first King's Quest when I was probably 10 or 11 years old.

Starting point is 01:01:55 That's cool. Yeah, I have fond memories of games that I played when I was a kid. The one thing about this industry that has kind of amused me or shocked me at kind of both levels is you expect a lot of your coworkers to have spent a lot of their childhood playing video games on the computer. And for whatever reason, a lot of developers just didn't come that route.

Starting point is 01:02:19 So it's kind of interesting to me to bump into someone else that enjoyed a lot of the old school games that perhaps a lot of the newer developers kind of never even heard of. Yeah, my kids are playing King's Quest now, so it's fun. They're playing right next to me and they keep asking me, like, how do you spell this, spell that. That's awesome. Yeah. Well, cool. Well, hey, I wanted to say thanks again for joining us on today's show. We're here with Eran Hammer from Walmart Labs and Spumko, as they're so noted on GitHub, talking about hapi and Black Friday and

Starting point is 01:02:50 the success it was, and you guys are definitely doing a pretty awesome thing for the Node community. And I mean, shoot, Node should write white papers about Walmart because I think it will help to preemptively squash any Node can't scale arguments after hearing the success of Black Friday. I also wanted to give a shout out to our sponsors, DigitalOcean and Toptal for supporting the show. You can go to digitalocean.com to set up your cloud server today and make sure you use our promo code CHANGELOGSENTME. That's CHANGELOGSENTME in all caps to get a $10 hosting credit. And if you want to freelance with companies like Airbnb, Artsy, or IDEO, you can head to toptal.com

Starting point is 01:03:30 slash developer and click join the best to see if you have what it takes to join Toptal's network of elite engineers. Again, the URL is toptal.com slash developer. And that's it for this week. Thanks again to Eran Hammer for joining us.

Starting point is 01:03:48 And also thanks to the listeners for tuning in and for your support. If you haven't yet, you can subscribe to the Changelog Weekly. It's our weekly email where we share everything that hits our open source radar. You can subscribe at thechangelog.com slash weekly. I think we're off next week, right? We're going to encourage all of our developer friends and listeners to enjoy the holidays with your family and loved ones. And we will be back sometime in the new year. So until then, guys, let's say goodbye. Bye. We'll see you next time.

The Changelog: Software Development, Open Source - Node Black Friday at Walmart (Interview)

Eran Hammer joined the show to talk about Node.js and Black Friday at Walmart....
