The Changelog: Software Development, Open Source - Goliath, Event Machine, SPDY (Interview)
Episode Date: April 6, 2011. Wynn caught up with Ilya Grigorik, Founder and CTO of PostRank, to talk about Goliath, async Ruby web development, and Google's SPDY.
Transcript
Welcome to the Changelog episode 0.5.5. I'm Adam Stachowiak.
And I'm Wynn Netherland. This is the Changelog. We cover what's fresh and new in the world of open source.
If you found us on iTunes, we're also on the web at thechangelog.com.
We're also up on GitHub.
Head to github.com slash explore.
You'll find some trending repos, some featured repos from the blog,
as well as our audio podcast.
And if you're on Twitter, follow ChangeLog Show and me, Adam Stack.
And I'm pengwynn, P-E-N-G-W-Y-N-N.
This episode is sponsored by GitHub Jobs.
Head to thechangelog.com slash jobs to get started.
If you'd like us to feature your job on this show,
select advertise on the changelog when posting your job,
and we'll take care of the rest.
Mogwai's looking for an iOS, Android, Windows, mobile app developer.
Mogwai's backed by Marc Andreessen's Ning,
and they're looking for someone that is familiar with the mobile platform,
preferably with Java or C++ experience.
A BS or MS in Computer Science is a plus.
If you're interested full-time in Palo Alto, apply at lg.gd.com.
Python is in big demand over at Urban Mapping.
They're looking for a developer to join the core team of MapFluence, their hosted mapping and analytics platform.
Also looking for a Bachelor of Science in Computer Science,
an expert at Python, Django, and RESTful web services.
Also a big plus if you know MapReduce, Pig, Cascading, Hadoop, there it is,
all sorts of NoSQL stuff.
If you're interested, lg.gd slash 9e.
Fun episode this week.
Talked to Ilya Grigorik over at PostRank.
Got the scoop on Goliath,
their evented, non-blocking, asynchronous Ruby framework
built on top of EventMachine,
which is really, really cool.
That's a mouthful.
It is a mouthful.
I got the scoop on why our PostRank numbers
don't show any interaction with our feed.
So he pointed me to some things we can fix to clean up our Tumblr feed
so that we can see who's interacting with our content.
All 12 of you.
We had a couple design episodes there, but I have to comment on their design.
Their design is phenomenal.
PostRank, yeah, we got into that.
Ilya said he started with a Photoshop background,
and he was a designer first and got into development out of necessity
and made a career out of it.
He's a founder over at PostRank.
They do some really, really cool things around social media analytics and things
and some really high-volume throughput, and they do it all in Ruby.
Who says Rails can't scale?
That's right. Who says that stuff? I know some other podcasts.
That's harsh, but some other podcasts. Well, what do we have to promote this week?
Me? Me? You?
Oh, Red Dirt RubyConf. Don't miss it. A little birdie
told me there's a special bare-bones package that just went on sale today.
$199 gets you into the conference if you don't need anything.
There you go.
And we're also ordering another packet of stickers,
so stay tuned to that as well.
Cool.
If you are at CodeConf this weekend,
catch, I believe, Kenneth and Steve are going to be out there.
And if you are at Red Dirt RubyConf, as we mentioned, look us up.
We'll be doing a special live episode on the 21st.
Looking forward to that.
And stay tuned to some other great stuff this summer.
Cool.
Fun episode.
Want to get to it?
Let's do it.
Chatting today with Ilya Grigorik from PostRank.
So, Ilya, why don't you introduce yourself and a little bit about your role at PostRank.
Sure. So, I'm the founder, CTO, I guess, of PostRank. We're a fairly small company and startup, about 15 people at this point up in Waterloo, Canada.
And we're aggregating quite a bit of data from the social web.
Ended up building a framework called Goliath to do a lot of our API serving.
So here we are today.
You know, I think your name in Ruby circles has become almost synonymous with performance and high-performance Ruby scaling and things of that sort.
So what's your, I guess, journey to performance been like with Ruby and web frameworks?
Well, that's an interesting and loaded question.
And as far as Ruby and performance, you know, that's – so I think a lot of that work, especially stuff that you read on my blog,
has come around by necessity more so than anything.
It certainly wasn't a motivated or coordinated move towards that.
It's just that when we started PostRank, our focus has been around aggregating lots and lots of data.
So what today I guess is often called big data, archiving it and then processing it for a variety of kind of internal use cases and also our clients.
And it just so happens that Ruby was kind of my favorite language at the time, so we chose it as the primary platform.
And throughout that whole experience, we basically tried to figure out how do we make use of Ruby
because we were using it on the front end for stuff like Rails and everything else.
And we loved the productivity that it enabled us to have in terms of developing new products
and just iterating very fast, being able to reliably test and quickly test all this stuff,
unit testing, integration testing, and all the rest.
And we wanted to propagate all of that experience
throughout our entire infrastructure.
So that led to lots of interesting optimization work
in terms of we needed to build fast crawlers to collect that data.
So how do you do that with Ruby? And frankly, that's what got me started in many ways down
this whole path of web servers and clients and all the rest. And then extending that to, okay, well,
we downloaded this data, now we need to push it through five or six stages of processing.
So let's say you downloaded an RSS feed,
which is something that smells like XML.
It's not quite RSS.
It's malformed XML at that point.
Let's transform it to something like JSON,
which is something that we can actually work with,
and then let's run it through language analysis
and all these different steps.
So just trying to coordinate all of those steps and how do you do that,
what is the architecture that makes sense,
what is the right choice of language or a library for all of those things.
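A toy version of that feed-normalization step can be sketched with Ruby's standard library. The feed snippet, element names, and output shape here are invented for illustration, not PostRank's actual pipeline:

```ruby
require 'rexml/document'
require 'json'

# A small, well-formed RSS snippet standing in for a messy real-world feed.
rss = <<~XML
  <rss><channel>
    <item><title>Goliath released</title><link>http://example.com/goliath</link></item>
    <item><title>Fibers in 1.9</title><link>http://example.com/fibers</link></item>
  </channel></rss>
XML

# Parse the XML and pull out just the fields we care about.
doc = REXML::Document.new(rss)
items = doc.elements.to_a('rss/channel/item').map do |item|
  { 'title' => item.elements['title'].text,
    'link'  => item.elements['link'].text }
end

# Emit JSON, the "something we can actually work with" stage.
json = JSON.generate(items)
puts json
```

A real feed would need a far more defensive parser, but the shape of the transform stage is the same.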
So long story short, I think almost everything you'll find, for example, on my blog,
is directly correlated to what we've been doing or at some point researching or trying to improve within our infrastructure.
And that's, quite frankly, been more by necessity than any specific reason for, okay, I need to optimize this specific step of the infrastructure.
Your blog, igvita.com, has been a great resource for me, learning different tools in the Ruby stack.
And a set of those has been no-sequel options.
I think you've played with every one of them out there.
Do you have a favorite?
I do and I don't.
There's ones that we use and there's ones that we don't.
As everybody else, I think, at this point, quite fascinated with everything
that's going on in the space. It's definitely been a bit of an explosion. And just trying to dig in
beyond just the feature list, right? And trying to really understand what's going on,
what's the data schema, how does it actually affect how you build things? Because ultimately, I think
a lot of these solutions come down to:
you really need to put a lot of thought up front in terms of what you're designing for or what
you're optimizing for. Because frankly, MySQL is probably the right answer in 90% of the use cases
still for most people. And, you know, as developers, we may not like that because it's not the shiny
new thing, but usually, when you align the business goals with what you actually should be doing,
that's usually the right solution.
But having said that, at PostRank specifically, we've deployed...
Oh, let's see. So we definitely have a lot of MySQL.
We're scaling up a fairly large Cassandra cluster at this point in time.
We're logging about 50 or 60 gigs of data
into it every day today.
We have MongoDB for some highly unstructured data,
and it's great for that.
We have Redis for some of the data structure stuff.
We definitely have Memcache.
So it's a mixed bag of tools. And
I think you need to pick the right tools for the right job. It's not just a matter of,
you know, having a favorite. You just need to know what each tool is good for.
Let's switch over and talk about Goliath, your new project that runs on top of EventMachine. So how did this project come about?
Yeah, so Goliath is definitely not new from our perspective. And the background on this guy is,
you know, we actually started work on, I guess, the first version of Goliath back in,
oh boy, early 2008. So this has actually been something, a framework, that we've been using and
iterating on for a while. And what we released recently is technically version four of our
internal API stack. And back when we started in 2008, one of the first things that we realized
was the ecosystem around Ruby web servers wasn't that great.
I believe Mongrel was effectively kind of the de facto deployment target.
And we wanted something that
wouldn't lock us into the threaded model. We wanted something that would give us higher
concurrency. And we started looking around at the available alternatives.
Thin was just coming around.
I wouldn't even call it production-ready at that point.
Ebb, if you remember that guy,
which later evolved into Node.js, of course,
made some rounds.
But none of those solutions were really there in terms of providing a full stack for testing, development,
or even a sensible DSL at that point.
They were all pretty raw.
So given all of that, we effectively started our own project around it.
And the first version of Goliath started as just one file.
It was very simple.
It was fast.
It served just our needs and nothing else, as most projects start.
And then over time, we've started iterating and made a lot of different mistakes along the way, hence the version 4 by the end.
We had a mixed model where it was first fully evented, then we went a mix of threads and events, which worked, but it was lots of lessons learned there.
We did a complete rewrite with version 3, which was completely evented.
Didn't like where it actually ended up, and then ended up with version 4,
which is the most recent one,
which is the one we open sourced.
And today I'm going to call Goliath
kind of the 85, maybe approaching the 90% solution.
It's very simple to write a Hello World app from scratch.
That's very fast.
That runs on a raw TCP socket and
serves, I don't know, some insane amount of requests per second. It's fairly hard to get
to an 80% solution. You know, you really need to start to put some thought around how you handle
all the edge cases in the HTTP spec and, you know, how do you develop a good DSL around it
and all the rest.
And then getting to, you know, 90% and 100% is very hard.
That takes literally years.
And I think Goliath is kind of getting to that point, even though it's new in terms
of being an open source project.
It's definitely been something that we've worked on and spent a lot of time working
on for the past couple of years.
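For a sense of what that hello world looks like, here is a minimal sketch of a Goliath API. It assumes the goliath gem is installed, and the exact DSL may vary between versions:

```ruby
require 'goliath'

# A minimal Goliath API: subclass Goliath::API and return a
# Rack-style [status, headers, body] triple from #response.
class Hello < Goliath::API
  def response(env)
    [200, { 'Content-Type' => 'text/plain' }, 'Hello World']
  end
end
```

Saved as hello.rb, it can be started with something like `ruby hello.rb -sv -p 9000` and exercised with `curl http://localhost:9000/`.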
So at its core, Goliath is a non-blocking framework.
How much of a barrier to entry is that for the average Rubyist, do you think?
Well, that's an interesting question.
I'm not sure that it's much more of a barrier than any other framework, because what we tried to do with Goliath is actually to simplify, or almost hide,
the fact that it's completely asynchronous under the hood.
So of course, the first thing that you should think about when you hear asynchronous
is what does that mean for the programming style?
Usually when you think about asynchronous, you end up having to define callbacks
and functions which fire at some later time when the event completes.
So Node.js is something that you guys have discussed at length on this show before,
and that's definitely a great example of that, right?
With Goliath, we actually tried to take advantage of some of the features that Ruby 1.9 exposes
to hide some of that complexity.
And maybe I should step back here and say that
the version 3 that we wrote internally for Goliath
was actually completely asynchronous.
And it was very much the same flavor as Node.js
with all the libraries, except it was in Ruby.
And what we found, though, was after we ran with that
for about six months,
we found that the APIs that we were building
were getting complicated enough
such that the testing and the maintenance of them was becoming very, very expensive for us.
The code became complex.
It was very hard to maintain in an ongoing basis.
So we took a step back and said, look, this is not going to scale.
How do we solve this problem?
And we started looking around and realized that Ruby 1.9 has this really nice feature called fibers, which are continuations.
And if we were to do some extra work under the hood and within the actual library, we could actually hide a lot of that from the developer. Effectively, instead of having to define a callback, we
can do it for you and then make it look as if you have a completely synchronous API.
So at the end of the day, when you look at the code that you write for a Goliath API,
it looks completely synchronous. So you could, in fact, take your Rails code
and pretty much copy it over
and not worry about having to define extra functions,
callbacks, and all the rest.
You have very logical flow.
If else, you don't have to worry about callbacks
and errorbacks and all this kind of stuff.
So our goal is to actually simplify it
such that you don't have to think about it.
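That fiber mechanism can be sketched in plain Ruby 1.9+. Everything here (`async_fetch`, the one-shot `$pending` "event loop") is a stand-in invented for illustration, not Goliath or EventMachine internals:

```ruby
require 'fiber'

# A fake event loop: callbacks registered now, fired later.
$pending = []

# A toy callback-style API, standing in for an async HTTP client.
def async_fetch(url, &callback)
  $pending << [callback, "data for #{url}"]
end

# Wrap the callback behind a fiber so the call site reads synchronously.
def fetch(url)
  fiber = Fiber.current
  async_fetch(url) { |result| fiber.resume(result) }
  Fiber.yield # park here until the event loop resumes us with the result
end

result = nil
Fiber.new do
  result = fetch('http://example.com') # no callback in sight
end.resume

# Drain the fake event loop; this resumes the parked fiber.
$pending.each { |cb, data| cb.call(data) }
puts result # => "data for http://example.com"
```

The caller writes straight-line code while the library shuffles the callback plumbing behind `Fiber.yield` and `Fiber#resume`.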
And I think we succeeded at that
because for new guys that start with us at PostRank,
we just give them the framework
and they pretty much are oblivious
to the fact that it's underneath
as running on this asynchronous core.
The only thing they have to pay attention to
is, of course, the fact that they're using
the right libraries.
So they're not using a blocking library.
So let's talk about that for a moment. That was going to be my next question. So what's the Ruby landscape look like for non-blocking libraries?
Is it pretty good? Is it growing? Compared to Node.js, which is non-blocking, you know, by
default, right? And so the whole ecosystem that grew up around it has been non-blocking.
So Ruby, are we getting there,
or is there still a lot of work to be done to take advantage of this style of programming?
To be honest, I'm not sure how to answer that exactly,
because I think the most prevalently used framework within Ruby
for doing this kind of programming is Event Machine.
And Event Machine does have quite a bit of work
and drivers that have been built around it
for all of your common suspects.
So anything from Memcache to MySQL to Cassandra
to everything else, HTTP clients and so forth.
So as far as getting good coverage
in terms of your most common apps,
I think it's all there.
And I think most of the clients
are in good functioning state
and I haven't had too many problems with that.
Now, it's interesting that you compare that
to Node.js because
intentionally or not, I think
when Ryan picked JavaScript,
he basically made a break with everything.
He basically said, look, we're going to have to write
completely new drivers for just about everything.
And there's been a lot of work that's been done
in that space now. And I think now, if you're just starting with Node today, you already have a pretty good ecosystem of drivers for virtually all of the major components that you would need.
But in the process of doing so, because he completely broke away from any other language, he basically forced the user to always make the right choice in some sense.
Because in Node, you can't really make a mistake of picking the wrong driver.
Whereas in Ruby, if you're developing in Ruby,
you have to be very conscious of what it is that you're doing.
Because you could pull in some driver that all of a sudden is doing the wrong thing,
and your performance goes out the door.
So I think both are comparable.
There's obviously a reason as to why we chose to stick with developing Goliath.
And fundamentally, I think there's no reason to break apart from the Ruby language
and force yourself down the JavaScript path.
And I should say, I love JavaScript.
There's nothing wrong with it.
It's a great language.
But I just enjoy Ruby so much more.
And the type of code that you can write
with stuff like fibers and all the rest
is, to me, much more readable and maintainable.
And hence our development
and all of the work around Goliath.
The fact that we can reuse components like RSpec, Cucumber, and all the rest to drive our tests,
and we have access to all of the Ruby standard library.
It's a double-edged sword, right?
On one hand, you break apart from bad gems and libraries, which are blocking where they shouldn't be.
But at the same time, you do have the full capability
and library of all of the Ruby gems.
So you just have to be a little bit more careful.
Speaking of the Ruby library and the standard library
and the ecosystem of Ruby gems around it,
as a community, how do you think we're adapting to the move to 1.9?
I'm actually really pleased to see that a lot more people are migrating.
I believe just a couple of days ago I saw some announcements from the Rails core saying that the next version of Rails will require Ruby 1.9.
So it's no longer a suggested option. It's a required
option. And I think that's obviously big news. And I think overall, even though it seems like
it took a little bit longer than it should have to start moving the community to 1.9,
there seems to be a fairly big shift that has happened.
I'm going to say in the last six to eight months,
where more and more people are adopting 1.9 as their default platform. And, you know,
I think there's many different reasons for that.
Some of it is just the availability of better tooling like RVM and
everything else that just makes it much,
much easier to both develop and deploy against
multiple runtimes.
And then just the fact that more and more gem authors are paying attention to 1.9 now.
So I've been running on 1.9 as my primary platform for almost a year and a half or two
years at this point.
I develop all my gems on 1.9.
I only switch back to 1.8 to run the spec test.
And I think that's becoming the default now.
So I'm happy to say that we're getting there.
So in the readme for Goliath,
you mentioned performance numbers on MRI, JRuby, and Rubinius.
How important was it to you to publish those
and support Goliath on a multiple Ruby stack?
So I think this is one area that I'd love to explore
in the future with Goliath.
So initially, we developed Goliath to run on 1.9 MRI specifically,
so the C Ruby.
And we had a couple of dependencies in there
which were specifically C extensions.
So for example, Thin can only run on MRI
because it uses the Mongrel parser
and some C code under the hood.
And of course, EventMachine itself is a C++ core.
But EventMachine also has a Java version.
So when we were developing Goliath,
we tried to find and remove any bottlenecks
that would not allow us to run on multiple runtimes.
So we wanted to be able to run on JRuby.
And part of the reason for that is
MRI has a global interpreter lock.
And you're basically stuck to a single core. But on a runtime which doesn't have a global interpreter lock, then
in theory, nothing stops us from spinning up a bunch of operating system or OS threads
and running multiple reactors within the same process. And that, of course, opens up a lot of
interesting opportunities for simplifying the deployment and doing all this kind of stuff.
So to be honest, when we were removing these bottlenecks,
we were looking a little bit more to the future.
So with the hope that as these alternative runtimes,
and I know many people wouldn't consider,
or rather would consider JRuby to be their primary runtime,
not an alternative runtime,
as these systems develop, we can take advantage of the performance
that they can offer us with Goliath.
And, for example, JRuby is a very interesting one that I'm looking forward to
investigating in the future because, at the moment, fibers,
which we depend on fairly heavily in Goliath,
are pretty slow in JRuby. They are mapped directly to operating-system-level threads,
so they're expensive to spin up and maintain. But there are some patches and work in JRuby
that should change that dramatically, to the tune of making it even faster than
kind of the lightweight processes that we have currently on MRI.
And when that happens, it could well be the case that Goliath will run just several times
faster on JRuby than it does on MRI.
And I think that's a great story that we don't have to lock ourselves
to a specific runtime.
So you mentioned in the readme
that you suggest standing this up
behind HAProxy or an
Nginx equivalent. What do you guys run?
Primarily
HAProxy. That's
kind of our primary weapon of choice.
We do have some NGINX processes deployed.
The reason we prefer HAProxy is because it allows us to have much more control
over the load balancing and all the other parameters.
So more intelligent failover and all the rest.
And when we need additional features that Nginx can expose,
like, for example, do GZIP compression for us or something else,
then we deploy it as needed.
Talk a bit, if you would, how you're using it at PostRank.
Goliath?
Yes.
So Goliath we have deployed for a number of different applications.
One of the choices that we made very early on in terms of architecture was to build a lot of our
own infrastructure within PostRank around the idea of web services. So instead of specifying
or using some sort of an RPC mechanism, let's just use HTTP as our primary source.
So everything should talk over JSON and over HTTP.
So we rely on a lot of very high-performance endpoints within our system,
which are serving hundreds of requests a second for our own internal use and for our clients.
So we share the same endpoints.
So to do that, obviously we need something that is able to handle the concurrency and also to be able to handle features like HTTP pipelining and keep-alives, to minimize the overhead.
So internal services for request-response style requests.
We have streaming APIs.
So, for example, if you've ever worked with the Twitter search API,
you open a connection and it just feeds you data, JSON data.
We have some of those deployed as well.
So, we're streaming data over Goliath.
Goliath is also capable of doing streaming
uploads, which is something that we added fairly recently, such that, for example, if
a client is pushing you a, I don't know, let's say a five megabyte image, and you want to
store that into S3, you don't have to buffer that in memory, which is what most web servers
do today, at least in the Ruby space.
And then they give you the whole image,
and then you can push it to S3.
Goliath actually allows you to progressively load that
and push it directly to S3.
So those would be the primary use cases.
But between the Keep Alive support,
pipelining, and the streaming APIs, we easily push tens of gigabytes of data through that stack every day.
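The progressive-upload idea can be sketched generically. `stream_upload` and its block consumer below are hypothetical stand-ins (for, say, an S3 multipart upload), not Goliath's actual streaming API:

```ruby
require 'stringio'
require 'digest'

CHUNK_SIZE = 64 * 1024

# Read the body in fixed-size chunks and hand each one off immediately,
# so the full payload is never buffered in memory.
def stream_upload(io)
  bytes = 0
  while (chunk = io.read(CHUNK_SIZE))
    yield chunk            # forward to the storage backend right away
    bytes += chunk.bytesize
  end
  bytes
end

body   = StringIO.new('x' * 200_000)  # pretend this is a streamed upload
digest = Digest::MD5.new              # our stand-in "S3" consumer
total  = stream_upload(body) { |chunk| digest << chunk }
puts total # => 200000
```

The key property is that peak memory stays near `CHUNK_SIZE` regardless of the upload's total size.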
So the sort of client libraries you're using, I'm assuming you're doing some sort of parallel network transport for each of these.
So what's your basic favorite transport library?
So a lot of the, I'm not sure this is actually what you're asking,
but a lot of the messaging and communication that we do
in terms of coordinating web services within PostRank
is done over AMQP.
So, for example, some of the HTTP streaming web services
that we have, they quite literally act as direct front ends to AMQP queues, right, where we would connect to some endpoints after all the data has been processed and just stream that data to our clients.
Oh, gotcha. So all of your HTTP transport is then just a long persistent connection streaming sort of API?
Right. Yep.
Gotcha.
So PostRank, for those that don't know, is a way to show, among other things, what's popular on your particular blog.
We're dying to use this on the Changelog, but until we get off Tumblr, we can't; we've hit a snag. So PostRank uses the URLs that are in your feed to determine, I guess,
what sort of participation your audience is having with your
content by matching it to what's bookmarked on Delicious and other
social venues.
But Tumblr does not include the slug on the post items, right?
So they have the integer at the end.
So none of our content matches.
So every day I get an email saying that my PostRank content is so sad
because nobody's bookmarking our stuff.
Well, we can probably fix that.
And actually, so the crazy thing that we do at PostRank is, as you mentioned,
we aggregate this, what we call engagement activity, which is
effectively anytime somebody shares or does something around a piece of content on the web,
we want to know about it. So we aggregate, for example, every tweet that contains a URL or every
vote from Digg or Reddit or Hacker News and all these other sites, and every comment from all these sites as well.
So one way to picture what we're doing is we're trying to assemble a firehose
of all the different firehoses of the activities around all this content.
And we don't collect that data for specific URLs that we care about.
We collect that data for all of the URLs.
So as you can imagine, that's quite a bit of data.
So even though the plugin that you're referring to,
which is the top posts widget that we have,
is not picking up the right URL,
we have all the tweets and everything else for content around the changelog show.
So you can actually use our API and just send it all the URLs
that you guys have created.
Oh, gotcha.
And you can get the actual metrics,
or you can actually get the full conversation as well.
This is something that I alluded to earlier,
where we're pushing a lot of data into Cassandra.
That's what we're using it for.
We launched this project four or five
months ago where, for every activity that we collect (so, for example, if somebody today shares a tweet
with a link to one of the Changelog episodes), we'll actually store the content of that
tweet and all the associated metadata about it,
and then allow you to look it up on a URL basis.
So you can actually say, well, you know, I have this URL.
Show me all the activity.
So there's people bookmarking it on Delicious.
There's tweets.
There's Hacker News comments and all the rest.
And you can see that as just one stream.
Now, I've seen you guys hiring from time to time. To switch topics for a moment,
what would you tell
the job candidate that was looking
to get on at PostRank,
maybe new to the Ruby community or new
to even open source development?
What, as an employer, do you look for
in a developer?
Well, let's see.
A GitHub account,
that's always a good place to start, and a blog, right?
At the end of the day, and I've interviewed a lot of students specifically, so we're located in Waterloo in Canada,
and Waterloo has a fairly well-known computer science program, University of Waterloo.
So we interview a lot of co-op students for basically every semester.
We have at least a couple.
And honestly, one thing that always surprises me as I go through a pile of resumes, 50 to 100 each time,
is the fact that out of those 50 or 100, they're all bright computer science
students, very smart guys, usually guys, for good or for worse. Very few of them actually have
something that they're passionate about. Very few of them have a blog or something that they've written or contributed to. Very few of them have a GitHub account.
So frankly, my first pass over that stack of resumes is always just to look for,
do you have a blog or do you have a GitHub account?
And usually there's at least three or five people that match,
and I immediately put them to the side,
and I know that I'm going to interview them, even without considering or looking at the marks, because they're already showing something that
most people don't. But overall, I think the best people that we've hired, they've all had a
consistent streak of having projects that they're passionate about, that they've contributed to,
and having a history of open source contribution.
So how did you come to Ruby and what language background did you come from?
I think as many people, I started with PHP and Perl.
I actually was never much of a computer geek, if you will.
I got into web development through web design.
I was one of the Photoshop wranglers for a while
and effectively got into the whole programming world
by learning HTML and then learning that my clients
wanted more dynamic sites.
So I started doing PHP and then Perl, and then before I knew it,
I was in computer science.
And then before I knew it, I was doing Ruby.
So it's kind of an odd path.
You know, it's very similar to my own path.
And I tell folks that I feel like Merlin, living my life backwards,
started out on the front end, and I keep going deeper into the stack,
just trying to deliver on things that are in my head. And I think your blog just,
you know, oozes that design. What sort of commonality do you see between
design as a communication medium and programming as a communication medium?
I think they're one and the same in many ways.
To me, presentation is at least 50% of the actual deliverable product,
whatever that product may be.
And depending on the context,
that could be a nice packaging around your product.
It could be a nice DSL for a project that you built.
It could be a well-structured readme, right?
The ability to actually communicate something to another person
is, I think, the most important aspect.
And you really have to pay attention to what is most important,
because ultimately the process of design is more about subtraction
than adding stuff.
So you really need to be clear about what it is that you're trying to communicate,
whatever it is that you're working on,
a new open source project or a new design template.
Do you have a programming hero?
A programming hero?
Honestly, there's probably too many.
Give us one and don't say Linus.
Give us one.
I think one person that impressed me early on was Brad Fitzpatrick,
so of LiveJournal, memcached fame, and all the rest.
And I can't even say specifically why,
but I remember reading some interviews very early on about just how he started LiveJournal
and the work that they were doing around memcached,
Perlbal, and all the other projects that came out
that a lot of us don't even think about today
but run a lot of our infrastructure on.
And how it was, for him,
was always about just solving his own problem.
He never started with some grandiose vision of,
I need to build a really fast memory cache server.
It's just I have this specific problem at my company.
I started this project on a whim because my friend said I should.
And, you know, here I am just slugging it out.
Are we in a golden age of web development and perhaps just don't know it?
Golden age of web development, huh?
Has there been a better time to be a bit pusher on the web?
I think it's getting better and better, right?
So when I think about the skill set that you have, I think it's an incredibly valuable skill set as a web developer,
and I think it's only going to get more and more important,
especially with the spread of technologies like HTML5 and everything else.
One area that I haven't done much work on, and really want to get into, is mobile.
And just based on my own observations and research around that area,
it seems like more and more larger organizations that have spent a lot of time and effort
developing custom apps for each platform are now migrating to HTML5.
Facebook is a great example.
Twitter, all of these guys are converting their mobile clients to HTML5.
And when you think of HTML5, of course, you're doing CSS, JavaScript, and all the rest.
So I think it's only going to get more and more important.
In some ways, it's going to get more complicated, but it's also going to get more interesting as well.
You know, every time I go to your site, I see the tagline, a goal is a dream with a deadline.
And you're one of the most productive developers that I follow.
Are you goal-oriented?
Definitely, yes.
So how do you manage that workflow?
Well, let's see.
Remember the Milk?
I don't know if you've used the app.
Oh, yeah.
But I live and die by that thing.
I don't think there's anything specific about Remember the Milk,
other than that it's a great app. It's very clean.
It knows its purpose.
It doesn't get in the way.
But, you know, I definitely love my checklists.
Are you a GTD guy or you have your own workflow inside there?
I am definitely familiar with all the GTD stuff.
Over time, I think I realized that it's not the process.
I think a lot of people spend a lot of time focusing on how to improve your process
instead of actually doing stuff. So I can't say I'm a diehard GTD person, but I definitely follow
my inbox zero rules and make sure that I review my goals for the day or for the weekend and so on
and so forth. If there's any advice I could give to my college-age self, it would be that a little effort every
day will always outshine these big bursts of productivity.
What are some of the habits that you have that you think have made you more productive
as a developer?
Well, I think it's exactly what you said.
It's the small little things that add up over time.
I don't remember the exact quote,
but the general message is we tend to overestimate
what we can get done in a day
and underestimate what we can get done in a week or a month.
So it's not about doing heroic things on any given day
as much as it is just having a clean path towards
what's the next thing I need to do to move this thing along.
So a couple of closing questions.
Are you a Vim, TextMate, Emacs, or BBEdit guy?
So I don't have any religious allegiances to any one of the editors.
I do spend probably 50% of my time in Vim and TextMate.
So I switch between the two quite a bit.
This is where I outsource a lot of my discovery to my guests.
So what one project do we need to post on the changelog that we haven't covered yet?
One project.
Does it count if I don't give you a project but instead a technology?
Sure.
So I've been digging into SPDY.
And I don't know if you've paid attention to this,
but about a year ago or so, Google released this project, or I guess a study that they did, around a
new protocol that they were trying to define called SPDY.
And their goal was to see how can we speed up the performance of loading web pages, the
common web pages that we all visit, Yahoo.com, MSN.com, or even google.com, by over 50%.
And they took a low-level approach and said, well, of course there's JavaScript optimization,
compression, and all the rest, but what can we do at the protocol level?
And they basically came up with a whole bunch of ideas around, well, HTTP is maybe
not the ideal transport. When it was designed, at the beginning, we didn't pay much attention
to latency. And later we introduced functionality like HTTP pipelining, keep-alive, and
all the rest, which frankly don't even work most of the time. So this is a little-known fact,
but HTTP pipelining is disabled in all browsers except Opera.
And even Opera only uses it in very weird edge cases
where it can actually do so.
And that's primarily because a lot of the servers
don't support pipelining,
or when they claim to support it, they don't actually do it properly.
And then, of course, all the cache servers in between, which tend to break this kind of stuff.
So it's not a great protocol in the end, it turns out.
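The head-of-line blocking behind pipelining's failure can be sketched with a toy simulation: pipelined responses must come back in request order, so one slow response delays everything queued behind it. The timings below are invented purely for illustration; this is not real HTTP code.

```ruby
# Toy model of HTTP pipelining's head-of-line blocking: responses are
# delivered strictly in request order, so each response can't finish
# before the one ahead of it. Times are hypothetical milliseconds.
def pipelined_finish_times(processing_ms)
  finished_at = 0
  processing_ms.map { |t| finished_at += t }
end

# A 900ms dynamic page queued ahead of two 50ms images: the images
# can't arrive until the slow response is done.
pipelined_finish_times([900, 50, 50])  # => [900, 950, 1000]
```

Multiplexed streams, as discussed later in the interview, remove exactly this ordering constraint.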
So SPDY is about redoing a lot of that work and basically building a new protocol in place of HTTP.
And so they did this stuff about a year ago, released some numbers,
and basically showed that, yes, given some of these optimizations that we propose, we can actually get over 60% improvement in latency
for delivering these web pages.
They posted some source code, a client that was available in Chromium,
and after that I didn't see much coverage around it at all.
And just recently a thread popped up where they basically said that
if you're running Chrome and you're talking to Google Web Services,
then 90% of the traffic is going over SPDY.
So if you're a web developer today,
there's a high likelihood that you actually are using Chrome.
And if you're using a Google Web Service,
chances are you're not running over HTTP, you're running over SPDY,
which is really, really interesting.
That's amazing.
Yeah, exactly.
And I guess Google can actually do that because they control their own servers
and they control the browser.
So they're able to make this sort of change.
But of course, it's not a proprietary protocol.
The spec is out there.
So can we make use of that for our own web services?
I'd love to make PostRank web pages load 50% faster without actually
modifying any of our
UI code or anything in that respect. I'd love to just
replace the web server, make it talk SPDY, and off we go.
Has anything materialized as far as an Apache module or anything like that
to make it a little bit more palatable for the actual average developer?
Yeah, so they actually released an Apache module.
So, I'm not sure how it interacts with something like, let's say, Passenger;
I actually haven't tried it.
I wonder if we can make that work.
But what I've been digging into myself is,
I've been trying to build an actual parser for SPDY in Ruby,
in pure Ruby.
And this was more for my own education.
I find that the best way for me to learn is to actually try and build something
because I can read the spec and I kind of nod along
and I think I understand it.
And then I start to write code and I realize that I didn't get it at all.
So I'm actually working on one right now.
And it's both very simple and very interesting
in how they've made some of the decisions around
how the packet exchange should be done,
the fact that you can send multiple streams over the same TCP channel
and they can be intermixed and all the rest.
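A pure-Ruby parser like the one he describes would start with the frame header. As a hedged sketch, the SPDY draft spec gives every frame an 8-byte header: the top bit distinguishes control frames (15-bit version, 16-bit type) from data frames (31-bit stream ID), and both end with 8 flag bits and a 24-bit payload length. This is an educational sketch, not his actual project's code.

```ruby
# Sketch of parsing a SPDY frame header per the draft spec's layout.
def parse_frame_header(bytes)
  raise ArgumentError, "need at least 8 bytes" if bytes.bytesize < 8

  word1, word2 = bytes.unpack("NN")   # two 32-bit big-endian words
  flags  = (word2 >> 24) & 0xff
  length = word2 & 0x00ffffff         # 24-bit payload length

  if (word1 >> 31) == 1               # control bit set
    { control: true,
      version: (word1 >> 16) & 0x7fff,
      type:    word1 & 0xffff,
      flags:   flags, length: length }
  else                                # data frame: 31-bit stream ID
    { control: false,
      stream_id: word1 & 0x7fffffff,
      flags:     flags, length: length }
  end
end
```

For example, `[0x80020001, 0x0000000a].pack("NN")` parses as a version-2 control frame of type 1 (SYN_STREAM in the draft) with a 10-byte payload.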
So definitely a project or a technology to look into for a lot of web developers,
I think, because even though it's a fairly low-level web server type technology,
I think it's something that we should be paying attention to
because it's a significant improvement.
You know, we've had pretty much the same transport stack for years.
I can remember, I guess 15 years ago or so,
maybe more having to download and install a PPP stack
or a TCP IP stack for my operating system
just to connect to the Internet.
So maybe we're due for the next evolution on top of TCP
for basic dial tone of the web.
Yeah, absolutely.
And in fact, as I'm working on implementing this parser for SPDY,
the crazy thought that's going through my head is:
one of the core concepts behind SPDY is that the same TCP channel can transport multiple data streams at the same time.
So that means when a packet arrives, it actually tells you that I belong to the specific stream.
So you can request, for example, two images and data can be fetched in parallel.
Oh, wow. Which is not something that you can do with HTTP, because HTTP forces a strict
request-response order: you have to wait until you fetch the first full image, and then the server
will start sending you the second image.
With SPDY, you can actually intermix that data.
So if you have a slow resource, it doesn't block everybody else.
So you can make a request to a slow dynamic resource, but then fetch quick images in parallel.
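That interleaving can be sketched in a few lines of Ruby: frames tagged with a stream ID arrive mixed together on one channel and are reassembled into per-stream buffers. The `[stream_id, chunk]` shape here is invented for illustration and is not SPDY's actual framing.

```ruby
# Illustrative demultiplexer: interleaved frames from a single channel
# are reassembled into independent per-stream buffers, so a slow
# stream never blocks the others.
def demux(frames)
  streams = Hash.new { |h, k| h[k] = +"" }
  frames.each { |stream_id, chunk| streams[stream_id] << chunk }
  streams
end

# Stream 1 is a slow dynamic page; streams 2 and 3 are images whose
# frames arrive interleaved with it instead of queued behind it.
frames = [
  [1, "<html>"], [2, "img2-a"], [3, "img3-a"],
  [2, "img2-b"], [1, "</html>"], [3, "img3-b"]
]
demux(frames)  # streams 2 and 3 complete while stream 1 is in flight
```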
So you take that, and then you take a look at technologies like ZeroMQ.
And ZeroMQ is trying to do something similar, but
more generic. They're saying, hey, look, TCP is great, but we need message-oriented messaging.
You shouldn't have to worry about parsing out where the message ends; it all should
be message-oriented, and it all should be done as fast as possible.
And you should have all these different transports.
It shouldn't matter if you're sending data over TCP, UDP, or Unix pipe.
So I think if you think about what SPDY is doing and ZeroMQ is doing,
there's a really interesting opportunity there to connect the two
and build something very interesting.
You could
build a web server that
is completely message-oriented
and
you wouldn't need
an HAProxy or an
Nginx or anything else in between.
You could just bring up a Ruby process.
It would know where to connect. It would know how to
parse that message without having to implement an entire parser in C
just to parse out the boundaries of the message
and respond quickly without having to register with anybody
or say that I'm up or down.
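The message-oriented idea he's describing, where the receiver never has to guess where a message ends on the byte stream, can be sketched with simple length-prefixed frames. This is an illustration of the concept only, not ZeroMQ's (or SPDY's) actual wire protocol.

```ruby
# Length-prefix each message with a 4-byte big-endian size, so message
# boundaries survive the trip over a raw byte stream.
def frame(msg)
  [msg.bytesize].pack("N") + msg
end

# Pull every complete message out of a buffer; an incomplete trailing
# message is simply left for the next read.
def read_messages(buffer)
  messages = []
  until buffer.bytesize < 4
    len = buffer[0, 4].unpack1("N")
    break if buffer.bytesize < 4 + len   # partial message: wait for more
    messages << buffer[4, len]
    buffer = buffer[(4 + len)..-1]
  end
  messages
end

wire = frame("hello") + frame("world")
read_messages(wire)  # => ["hello", "world"]
```

With framing like this, the transport underneath (TCP, UDP, a Unix pipe) stops mattering to the application, which is the generic point ZeroMQ makes.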
Definitely exciting stuff.
We learned about ZeroMQ on the Zed Shaw interview. That was the first
time we'd heard of it, and we got a quick look at it there. We need to get somebody from the Chromium
project to talk about SPDY, which, when I first saw it, I guess when it first came out last
year sometime, I thought it was "spidey." SPDY, for those that are listening at home but don't have
access to the show notes, is pronounced "speedy," right here on the executive summary.
Well, yeah, thanks for joining us. It's definitely been fascinating to talk about Goliath
and this non-blocking async style of programming and some other things.
Great. Thanks a lot.