The Changelog: Software Development, Open Source - Goliath, Event Machine, SPDY (Interview)
Episode Date: April 6, 2011. Wynn caught up with Ilya Grigorik, Founder and CTO of PostRank, to talk about Goliath, async Ruby web development, and Google's SPDY.
Transcript
Welcome to the Changelog episode 0.5.5. I'm Adam Stachowiak.
And I'm Wynn Netherland. This is the Changelog. We cover what's fresh and new in the world of open source.
If you found us on iTunes, we're also on the web at thechangelog.com.
We're also up on GitHub.
Head to github.com slash explore.
You'll find some trending repos, some featured repos from the blog,
as well as our audio podcast.
And if you're on Twitter, follow ChangeLog Show and me, Adam Stack.
And I'm pengwynn, P-E-N-G-W-Y-N-N.
This episode is sponsored by GitHub Jobs.
Head to thechangelog.com slash jobs to get started.
If you'd like us to feature your job on this show,
select advertise on the changelog when posting your job,
and we'll take care of the rest.
Mogwai's looking for an iOS, Android, Windows, mobile app developer.
Mogwai's backed by Marc Andreessen's Ning,
and they're looking for someone that is familiar with the mobile platform,
preferably with Java or C++ experience.
A BS or MS in Computer Science is a plus.
If you're interested full-time in Palo Alto, apply at lg.gd.com.
Python is in big demand over at Urban Mapping.
They're looking for a developer to join the core team of MapFluence, their hosted mapping and analytics platform.
Also looking for a Bachelor of Science in Computer Science,
an expert at Python, Django, and RESTful web services.
Also a big plus if you know MapReduce, Pig, Cascading, Hadoop, there it is,
all sorts of NoSQL stuff.
If you're interested, lg.gd slash 9e.
Fun episode this week.
Talked to Ilya Grigorik over at PostRank.
Got the scoop on Goliath,
their evented, non-blocking, asynchronous Ruby framework
built on top of EventMachine,
which is really, really cool.
That's a mouthful.
It is a mouthful.
I got the scoop on why our PostRank numbers
don't show any interaction with our feed.
So he pointed me to some things we can fix to clean up our Tumblr feed
so that we can see who's interacting with our content.
All 12 of you.
We had a couple design episodes there, but I have to comment on their design.
Their design is phenomenal.
PostRank, yeah, we got into that.
Ilya said he started with a Photoshop background,
and he was a designer first and got into development out of necessity
and made a career out of it.
He's a founder over at PostRank.
They do some really, really cool things around social media analytics and things
and some really high-volume throughput, and they do it all in Ruby.
Who says Rails can't scale?
That's right. Who says that stuff? I know some other podcasts.
That's harsh, but some other podcasts. Well, what do we have to promote this week?
Me? Me? You?
Oh, Red Dirt RubyConf. Don't miss it. A little birdie
told me there's a special bare-bones package that just went on sale today.
$199 gets you into the conference if you don't need anything.
There you go.
And we're also ordering another packet of stickers,
so stay tuned to that as well.
Cool.
If you are at CodeConf this weekend,
catch, I believe, Kenneth and Steve are going to be out there.
And if you are at Red Dirt RubyConf, as we mentioned, look us up.
We'll be doing a special live episode on the 21st.
Looking forward to that.
And stay tuned to some other great stuff this summer.
Cool.
Fun episode.
Want to get to it?
Let's do it.
Chatting today with Ilya Grigorik from PostRank.
So, Ilya, why don't you introduce yourself and a little bit about your role at PostRank.
Sure. So, I'm the founder, CTO, I guess, of PostRank. We're a fairly small company and startup, about 15 people at this point up in Waterloo, Canada.
And we're aggregating quite a bit of data from the social web.
Ended up building a framework called Goliath to do a lot of our API serving.
So here we are today.
You know, I think your name in Ruby circles has become almost synonymous with performance and high-performance Ruby scaling and things of that sort.
So what's your, I guess, journey to performance been like with Ruby and web frameworks?
Well, that's an interesting and loaded question.
And as far as Ruby and performance, you know, that's – so I think a lot of that work, especially stuff that you read on my blog,
has come around by necessity more so than anything.
It certainly wasn't a motivated or coordinated move towards that.
It's just that when we started PostRank, our focus has been around aggregating lots and lots of data.
So what today I guess is often called big data, archiving it and then processing it for a variety of kind of internal use cases and also our clients.
And it just so happens that Ruby was kind of my favorite language at the time, so we chose it as the primary platform.
And throughout that whole experience, we basically tried to figure out how do we make use of Ruby
because we were using it on the front end for stuff like Rails and everything else.
And we loved the productivity that it enabled us to have in terms of developing new products
and just iterating very fast, being able to reliably test and quickly test all this stuff,
unit testing, integration testing, and all the rest.
And we wanted to propagate all of that experience
throughout our entire infrastructure.
So that led to lots of interesting optimization work
in terms of we needed to build fast crawlers to collect that data.
So how do you do that with Ruby? And frankly, that's what got me started in many ways down
this whole path of web servers and clients and all the rest. And then extending that to, okay, well,
we downloaded this data, now we need to push it through five or six stages of processing.
So let's say you downloaded an RSS feed,
which is something that smells like XML.
It's not quite RSS.
It's malformed XML at that point.
Let's transform it to something like JSON,
which is something that we can actually work with,
and then let's run it through language analysis
and all these different steps.
So just trying to coordinate all of those steps and how do you do that,
what is the architecture that makes sense,
what is the right choice of language or a library for all of those things.
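A toy version of that feed-normalization step can be sketched with Ruby's standard library. The feed snippet, element names, and output shape here are invented for illustration, not PostRank's actual pipeline:

```ruby
require 'rexml/document'
require 'json'

# A small, well-formed RSS snippet standing in for a messy real-world feed.
rss = <<~XML
  <rss><channel>
    <item><title>Goliath released</title><link>http://example.com/goliath</link></item>
    <item><title>Fibers in 1.9</title><link>http://example.com/fibers</link></item>
  </channel></rss>
XML

# Parse the XML and pull out just the fields we care about.
doc = REXML::Document.new(rss)
items = doc.elements.to_a('rss/channel/item').map do |item|
  { 'title' => item.elements['title'].text,
    'link'  => item.elements['link'].text }
end

# Emit JSON, the "something we can actually work with" stage.
json = JSON.generate(items)
puts json
```

A real feed would need a far more defensive parser, but the shape of the transform stage is the same.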
So long story short, I think almost everything you'll find, for example, on my blog,
is directly correlated to what we've been doing or at some point researching or trying to improve within our infrastructure.
And that's, quite frankly, been more by necessity than any specific reason for, okay, I need to optimize this specific step of the infrastructure.
Your blog, igvita.com, has been a great resource for me, learning different tools in the Ruby stack.
And a set of those has been no-sequel options.
I think you've played with every one of them out there.
Do you have a favorite?
I do and I don't.
There's ones that we use and there's ones that we don't.
As everybody else, I think, at this point, quite fascinated with everything
that's going on in the space. It's definitely been a bit of an explosion. And just trying to dig in
beyond just the feature list, right? And trying to really understand what's going on,
what's the data schema, how does it actually affect how you build things? Because ultimately, I think
a lot of these solutions come down to:
you really need to put a lot of thought up front in terms of what you're designing for or what
you're optimizing for. Because frankly, MySQL is probably the right answer in 90% of the use cases
still for most people. And, you know, as developers, we may not like that because it's not the shiny
new thing, but usually, when you align the business goals with what you actually should be doing,
that's usually the right solution.
But having said that, at PostRank specifically, we've deployed...
Oh, let's see. So we definitely have a lot of MySQL.
We're scaling up a fairly large Cassandra cluster at this point in time.
We're logging about 50 or 60 gigs of data
into it every day today.
We have MongoDB for some highly unstructured data,
and it's great for that.
We have Redis for some of the data structure stuff.
We definitely have Memcache.
So it's a mixed bag of tools. And
I think you need to pick the right tools for the right job. It's not just a matter of,
you know, having a favorite. You just need to know what each tool is good for.
Let's switch over and talk about Goliath, your new project that runs on top of EventMachine. So how did this project come about?
Yeah, so Goliath is definitely not new from our perspective. And the background on this guy is,
you know, we actually started work on, I guess, the first version of Goliath back in,
oh boy, early 2008. So this has actually been something, a framework, that we've been using and
iterating on for a while. And what we released recently is technically version four of our
internal API stack. And back when we started in 2008, one of the first things that we realized
was the ecosystem around Ruby web servers wasn't that great.
I believe Mongrel was effectively kind of the de facto deployment target.
And we wanted something that
wouldn't lock us into the threaded model. We wanted something that would give us higher
concurrency. And we started looking around at the available alternatives.
Thin was just coming around.
I wouldn't even call it production-ready at that point.
Ebb, if you remember that guy,
which later evolved into Node.js, of course,
made some rounds.
But none of those solutions were really there in terms of providing a full stack for testing, development,
or even a sensible DSL at that point.
They were all pretty raw.
So given all of that, we effectively started our own project around it.
And the first version of Goliath started as just one file.
It was very simple.
It was fast.
It served just our needs and nothing else, as most projects start.
And then over time, we've started iterating and made a lot of different mistakes along the way, hence the version 4 by the end.
We had a mixed model where it was first fully evented, then we went a mix of threads and events, which worked, but it was lots of lessons learned there.
We did a complete rewrite with version 3, which was completely evented.
Didn't like where it actually ended up, and then ended up with version 4,
which is the most recent one,
which is the one we open sourced.
And today I'm going to call Goliath
kind of the 85, maybe approaching the 90% solution.
It's very simple to write a Hello World app from scratch.
That's very fast.
That runs on a raw TCP socket and
serves, I don't know, some insane amount of requests per second. It's fairly hard to get
to an 80% solution. You know, you really need to start to put some thought around how you handle
all the edge cases in the HTTP spec and, you know, how do you develop a good DSL around it
and all the rest.
And then getting to, you know, 90% and 100% is very hard.
That takes literally years.
And I think Goliath is kind of getting to that point, even though it's new in terms
of being an open source project.
It's definitely been something that we've worked on and spent a lot of time working
on for the past couple of years.
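For a sense of what that hello world looks like, here is a minimal sketch of a Goliath API. It assumes the goliath gem is installed, and the exact DSL may vary between versions:

```ruby
require 'goliath'

# A minimal Goliath API: subclass Goliath::API and return a
# Rack-style [status, headers, body] triple from #response.
class Hello < Goliath::API
  def response(env)
    [200, { 'Content-Type' => 'text/plain' }, 'Hello World']
  end
end
```

Saved as hello.rb, it can be started with something like `ruby hello.rb -sv -p 9000` and exercised with `curl http://localhost:9000/`.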
So at its core, Goliath is a non-blocking framework.
How much of a barrier to entry is that for the average Rubyist, do you think?
Well, that's an interesting question.
I'm not sure that it's much more of a barrier than any other framework, because what we tried to do with Goliath is actually to simplify, or almost hide,
the fact that it's completely asynchronous under the hood.
So of course, the first thing that you should think about when you hear asynchronous
is what does that mean for the programming style?
Usually when you think about asynchronous, you end up having to define callbacks
and functions which fire at some later time when the event completes.
So Node.js is something that you guys have discussed at length on this show before,
and that's definitely a great example of that, right?
With Goliath, we actually tried to take advantage of some of the features that Ruby 1.9 exposes
to hide some of that complexity.
And maybe I should step back here and say that
the version 3 that we wrote internally for Goliath
was actually completely asynchronous.
And it was very much the same flavor as Node.js
with all the libraries, except it was in Ruby.
And what we found, though, was after we ran with that
for about six months,
we found that the APIs that we were building
were getting complicated enough
such that the testing and the maintenance of them was becoming very, very expensive for us.
The code became complex.
It was very hard to maintain in an ongoing basis.
So we took a step back and said, look, this is not going to scale.
How do we solve this problem?
And we started looking around and realized that Ruby 1.9 has this really nice feature called fibers, which are continuations.
And if we were to do some extra work under the hood and within the actual library, we could actually hide a lot of that from the developer. Effectively, instead of having to define a callback, we
can do it for you and then make it look as if you have a completely synchronous API.
So at the end of the day, when you look at the code that you write for a Goliath API,
it looks completely synchronous. So you could, in fact, take your Rails code
and pretty much copy it over
and not worry about having to define extra functions,
callbacks, and all the rest.
You have very logical flow.
If else, you don't have to worry about callbacks
and errorbacks and all this kind of stuff.
So our goal is to actually simplify it
such that you don't have to think about it.
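That fiber mechanism can be sketched in plain Ruby 1.9+. Everything here (`async_fetch`, the one-shot `$pending` "event loop") is a stand-in invented for illustration, not Goliath or EventMachine internals:

```ruby
require 'fiber'

# A fake event loop: callbacks registered now, fired later.
$pending = []

# A toy callback-style API, standing in for an async HTTP client.
def async_fetch(url, &callback)
  $pending << [callback, "data for #{url}"]
end

# Wrap the callback behind a fiber so the call site reads synchronously.
def fetch(url)
  fiber = Fiber.current
  async_fetch(url) { |result| fiber.resume(result) }
  Fiber.yield # park here until the event loop resumes us with the result
end

result = nil
Fiber.new do
  result = fetch('http://example.com') # no callback in sight
end.resume

# Drain the fake event loop; this resumes the parked fiber.
$pending.each { |cb, data| cb.call(data) }
puts result # => "data for http://example.com"
```

The caller writes straight-line code while the library shuffles the callback plumbing behind `Fiber.yield` and `Fiber#resume`.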
And I think we succeeded at that
because for new guys that start with us at PostRank,
we just give them the framework
and they pretty much are oblivious
to the fact that it's underneath
as running on this asynchronous core.
The only thing they have to pay attention to
is, of course, the fact that they're using
the right libraries.
So they're not using a blocking library.
So let's talk about that for a moment. That was going to be my next question. So what's the Ruby landscape look like for non-blocking libraries?
Is it pretty good? Is it growing? Compared to Node.js, which is non-blocking, you know, by
default, right? And so the whole ecosystem that grew up around it has been non-blocking.
So Ruby, are we getting there,
or is there still a lot of work to be done to take advantage of this style of programming?
To be honest, I'm not sure how to answer that exactly,
because I think the most prevalently used framework within Ruby
for doing this kind of programming is Event Machine.
And Event Machine does have quite a bit of work
and drivers that have been built around it
for all of your common suspects.
So anything from Memcache to MySQL to Cassandra
to everything else, HTTP clients and so forth.
So as far as getting good coverage
in terms of your most common apps,
I think it's all there.
And I think most of the clients
are in good functioning state
and I haven't had too many problems with that.
Now, it's interesting that you compare that
to Node.js because
intentionally or not, I think
when Ryan picked JavaScript,
he basically made a break with everything.
He basically said, look, we're going to have to write
completely new drivers for just about everything.
And there's been a lot of work that's been done
in that space now. And I think now, if you're just starting with Node today, you already have a pretty good ecosystem of drivers for virtually all of the major components that you would need.
But in the process of doing so, because he completely broke away from any other language, he basically forced the user to always make the right choice in some sense.
Because in Node, you can't really make a mistake of picking the wrong driver.
Whereas in Ruby, if you're developing in Ruby,
you have to be very conscious of what it is that you're doing.
Because you could pull in some driver that all of a sudden is doing the wrong thing,
and your performance goes out the door.
So I think both are comparable.
There's obviously a reason as to why we chose to stick with developing Goliath.
And fundamentally, I think there's no reason to break apart from the Ruby language
and force yourself down the JavaScript path.
And I should say, I love JavaScript.
There's nothing wrong with it.
It's a great language.
But I just enjoy Ruby so much more.
And the type of code that you can write
with stuff like fibers and all the rest
is, to me, much more readable and maintainable.
And hence our development
and all of the work around Goliath.
The fact that we can reuse components like RSpec, Cucumber, and all the rest to drive our tests,
and we have access to all of the Ruby standard library.
It's a double-edged sword, right?
On one hand, you break apart from bad gems and libraries, which are blocking where they shouldn't be.
But at the same time, you do have the full capability
and library of all of the Ruby gems.
So you just have to be a little bit more careful.
Speaking of the Ruby library and the standard library
and the ecosystem of Ruby gems around it,
as a community, how do you think we're adapting to the move to 1.9?
I'm actually really pleased to see that a lot more people are migrating.
I believe just a couple of days ago I saw some announcements from the Rails core saying that the next version of Rails will require Ruby 1.9.
So it's no longer a suggested option. It's a required
option. And I think that's obviously big news. And I think overall, even though it seems like
it took a little bit longer than it should have to start moving the community to 1.9,
there seems to be a fairly big shift that has happened.
I'm going to say in the last six to eight months,
where more and more people are adopting 1.9 as their default platform. And, you know,
I think there's many different reasons for that.
Some of it is just the availability of better tooling like RVM and
everything else that just makes it much,
much easier to both develop and deploy against
multiple runtimes.
And then just the fact that more and more gem authors are paying attention to 1.9 now.
So I've been running on 1.9 as my primary platform for almost a year and a half or two
years at this point.
I develop all my gems on 1.9.
I only switch back to 1.8 to run the spec test.
And I think that's becoming the default now.
So I'm happy to say that we're getting there.
So in the readme for Goliath,
you mentioned performance numbers on MRI, JRuby, and Rubinius.
How important was it to you to publish those
and support Goliath on a multiple Ruby stack?
So I think this is one area that I'd love to explore
in the future with Goliath.
So initially, we developed Goliath to run on 1.9 MRI specifically,
so the C Ruby.
And we had a couple of dependencies in there
which were specifically C extensions.
So for example, Thin can only run on MRI
because it uses the Mongrel parser
and some C code under the hood.
And of course, EventMachine itself is a C++ core.
But EventMachine also has a Java version.
So when we were developing Goliath,
we tried to find and remove any bottlenecks
that would not allow us to run on multiple runtimes.
So we wanted to be able to run on JRuby.
And part of the reason for that is
MRI has a global interpreter lock.
And you're basically stuck to a single core. But on a runtime which doesn't have a global interpreter lock, then
in theory, nothing stops us from spinning up a bunch of operating system or OS threads
and running multiple reactors within the same process. And that, of course, opens up a lot of
interesting opportunities for simplifying the deployment and doing all this kind of stuff.
So to be honest, when we were removing these bottlenecks,
we were looking a little bit more to the future.
So with the hope that as these alternative runtimes,
and I know many people wouldn't consider,
or rather would consider JRuby to be their primary runtime,
not an alternative runtime,
as these systems develop, we can take advantage of the performance
that they can offer us with Goliath.
And, for example, JRuby is a very interesting one that I'm looking forward to
investigating in the future because, at the moment, fibers,
which we depend on fairly heavily in Goliath,
are pretty slow in JRuby. They are mapped directly to operating-system-level threads,
so they're expensive to spin up and maintain. But there are some patches and work in JRuby
that should change that dramatically, to the tune of making it even faster than
kind of the lightweight processes that we have currently on MRI.
And when that happens, it could well be the case that Goliath will run just several times
faster on JRuby than it does on MRI.
And I think that's a great story that we don't have to lock ourselves
to a specific runtime.
So you mentioned in the readme
that you suggest standing this up
behind HAProxy or an
Nginx equivalent. What do you guys run?
Primarily
HAProxy. That's
kind of our primary weapon of choice.
We do have some NGINX processes deployed.
The reason we prefer HAProxy is because it allows us to have much more control
over the load balancing and all the other parameters.
So more intelligent failover and all the rest.
And when we need additional features that Nginx can expose,
like, for example, do GZIP compression for us or something else,
then we deploy it as needed.
Talk a bit, if you would, how you're using it at PostRank.
Goliath?
Yes.
So Goliath we have deployed for a number of different applications.
One of the choices that we made very early on in terms of architecture was to build a lot of our
own infrastructure within PostRank around the idea of web services. So instead of specifying
or using some sort of an RPC mechanism, let's just use HTTP as our primary source.
So everything should talk over JSON and over HTTP.
So we rely on a lot of very high-performance endpoints within our system,
which are serving hundreds of requests a second for our own internal use and for our clients.
So we share the same endpoints.
So to do that, obviously we need something that is able to handle the concurrency and also to be able to handle features like HTTP pipelining and keep-alives, to minimize the overhead.
So internal services for request-response style requests.
We have streaming APIs.
So, for example, if you've ever worked with the Twitter search API,
you open a connection and it just feeds you data, JSON data.
We have some of those deployed as well.
So, we're streaming data over Goliath.
Goliath is also capable of doing streaming
uploads, which is something that we added fairly recently, such that, for example, if
a client is pushing you a, I don't know, let's say a five megabyte image, and you want to
store that into S3, you don't have to buffer that in memory, which is what most web servers
do today, at least in the Ruby space.
And then they give you the whole image,
and then you can push it to S3.
Goliath actually allows you to progressively load that
and push it directly to S3.
So those would be the primary use cases.
But between the Keep Alive support,
pipelining, and the streaming APIs, we easily push tens of gigabytes of data through that stack every day.
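The progressive-upload idea can be sketched generically. `stream_upload` and its block consumer below are hypothetical stand-ins (for, say, an S3 multipart upload), not Goliath's actual streaming API:

```ruby
require 'stringio'
require 'digest'

CHUNK_SIZE = 64 * 1024

# Read the body in fixed-size chunks and hand each one off immediately,
# so the full payload is never buffered in memory.
def stream_upload(io)
  bytes = 0
  while (chunk = io.read(CHUNK_SIZE))
    yield chunk            # forward to the storage backend right away
    bytes += chunk.bytesize
  end
  bytes
end

body   = StringIO.new('x' * 200_000)  # pretend this is a streamed upload
digest = Digest::MD5.new              # our stand-in "S3" consumer
total  = stream_upload(body) { |chunk| digest << chunk }
puts total # => 200000
```

The key property is that peak memory stays near `CHUNK_SIZE` regardless of the upload's total size.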
So the sort of client libraries you're using, I'm assuming you're doing some sort of parallel network transport for each of these.
So what's your basic favorite transport library?
So a lot of the, I'm not sure this is actually what you're asking,
but a lot of the messaging and communication that we do
in terms of coordinating web services within PostRank
is done over AMQP.
So, for example, some of the HTTP streaming web services
that we have, they quite literally act as direct front ends to AMQP queues, right, where we would connect to some endpoints after all the data has been processed and just stream that data to our clients.
Oh, gotcha. So all of your HTTP transport is then just a long persistent connection streaming sort of API?
Right. Yep.
Gotcha.
So PostRank, for those that don't know, is a way to show, among other things, what's popular on your particular blog.
We're dying to use this on the Changelog, but until we get off Tumblr, we can't; we've hit a snag. So PostRank uses the URLs that are in your feed to determine, I guess,
what sort of participation your audience is having with your
content by matching it to what's bookmarked on Delicious and other
social venues.
But Tumblr does not include the slug on the post items, right?
So they have the integer at the end.
So none of our content matches.
So every day I get an email saying that my PostRank content is so sad
because nobody's bookmarking our stuff.
Well, we can probably fix that.
And actually, so the crazy thing that we do at PostRank is, as you mentioned,
we aggregate this, what we call engagement activity, which is
effectively anytime somebody shares or does something around a piece of content on the web,
we want to know about it. So we aggregate, for example, every tweet that contains a URL or every
vote from Digg or Reddit or Hacker News and all these other sites, and every comment from all these sites as well.
So one way to picture what we're doing is we're trying to assemble a firehose
of all the different firehoses of the activities around all this content.
And we don't collect that data for specific URLs that we care about.
We collect that data for all of the URLs.
So as you can imagine, that's quite a bit of data.
So even though the plugin that you're referring to,
which is the top posts widget that we have,
is not picking up the right URL,
we have all the tweets and everything else for content around the changelog show.
So you can actually use our API and just send it all the URLs
that you guys have created.
Oh, gotcha.
And you can get the actual metrics,
or you can actually get the full conversation as well.
This is something that I alluded to earlier,
where we're pushing a lot of data into Cassandra.
That's what we're using it for.
We launched this project four or five
months ago where, for every activity that we collect (so, for example, if somebody today shares a tweet
with a link to one of the Changelog episodes), we'll actually store the content of that
tweet and all the associated metadata about it,
and then allow you to look it up on a URL basis.
So you can actually say, well, you know, I have this URL.
Show me all the activity.
So there's people bookmarking it on Delicious.
There's tweets.
There's Hacker News comments and all the rest.
And you can see that as just one stream.
Now, I've seen you guys hiring from time to time. To switch topics for a moment,
what would you tell
the job candidate that was looking
to get on at PostRank,
maybe new to the Ruby community or new
to even open source development?
What, as an employer, do you look for
in a developer?
Well, let's see.
A GitHub account,
that's always a good place to start, and a blog, right?
At the end of the day, and I've interviewed a lot of students specifically, so we're located in Waterloo in Canada,
and Waterloo has a fairly well-known computer science program, University of Waterloo.
So we interview a lot of co-op students for basically every semester.
We have at least a couple.
And honestly, one thing that always surprises me as I go through a pile of resumes, 50 to 100 each time,
is the fact that out of those 50 or 100, they're all bright computer science
students, very smart guys, usually guys, for good or for worse. Very few of them actually have
something that they're passionate about. Very few of them have a blog or something that they've written or contributed to. Very few of them have a GitHub account.
So frankly, my first pass over that stack of resumes is always just to look for,
do you have a blog or do you have a GitHub account?
And usually there's at least three or five people that match,
and I immediately put them to the side,
and I know that I'm going to interview them, even without considering or looking at the marks, because they're already showing something that
most people don't. But overall, I think the best people that we've hired, they've all had a
consistent streak of having projects that they're passionate about, that they've contributed to,
and having a history of open source contribution.
So how did you come to Ruby and what language background did you come from?
I think as many people, I started with PHP and Perl.
I actually was never much of a computer geek, if you will.
I got into web development through web design.
I was one of the Photoshop wranglers for a while
and effectively got into the whole programming world
by learning HTML and then learning that my clients
wanted more dynamic sites.
So I started doing PHP and then Perl, and then before I knew it,
I was in computer science.
And then before I knew it, I was doing Ruby.
So it's kind of an odd path.
You know, it's very similar to my own path.
And I tell folks that I feel like Merlin, living my life backwards,
started out on the front end, and I keep going deeper into the stack,
just trying to deliver on things that are in my head. And I think your blog just,
you know, oozes that design. What sort of commonality do you see between
design as a communication medium and programming as a communication medium?
I think they're one and the same in many ways.
To me, presentation is at least 50% of the actual deliverable product,
whatever that product may be.
And depending on the context,
that could be a nice packaging around your product.
It could be a nice DSL for a project that you built.
It could be a well-structured readme, right?
The ability to actually communicate something to another person
is, I think, the most important aspect.
And you really have to pay attention to what is most important,
because ultimately the process of design is more about subtraction
than adding stuff.
So you really need to be clear about what it is that you're trying to communicate,
whatever it is that you're working on,
a new open source project or a new design template.
Do you have a programming hero?
A programming hero?
Honestly, there's probably too many.
Give us one and don't say Linus.
Give us one.
I think one person that impressed me early on was Brad Fitzpatrick,
so of LiveJournal, memcached fame, and all the rest.
And I can't even say specifically why,
but I remember reading some interviews very early on about just how he started LiveJournal
and the work that they were doing around memcached,
Perlbal, and all the other projects that came out
that a lot of us don't even think about today
but run a lot of our infrastructure on.
And how it was, for him,
was always about just solving his own problem.
He never started with some grandiose vision of,
I need to build a really fast memory cache server.
It's just I have this specific problem at my company.
I started this project on a whim because my friend said I should.
And, you know, here I am just slugging it out.
Are we in a golden age of web development and perhaps just don't know it?
Golden age of web development, huh?
Has there been a better time to be a bit pusher on the web?
I think it's getting better and better, right?
So when I think about the skill set that you have, I think it's an incredibly valuable skill set as a web developer,
and I think it's only going to get more and more important,
especially with the spread of technologies like HTML5 and everything else.
One area that I haven't done much work on, and really want to get into, is mobile.
And just based on my own observations and research around that area,
it seems like more and more larger organizations that have spent a lot of time and effort
developing custom apps for each platform are now migrating to HTML5.
Facebook is a great example.
Twitter, all of these guys are converting their mobile clients to HTML5.
And when you think of HTML5, of course, you're doing CSS, JavaScript, and all the rest.
So I think it's only going to get more and more important.
In some ways, it's going to get more complicated, but it's also going to get more interesting as well.
You know, every time I go to your site, I see the tagline, a goal is a dream with a deadline.
And you're one of the most productive developers that I follow.
Are you goal-oriented?
Definitely, yes.
So how do you manage that workflow?
Well, let's see.
Remember the Milk?
I don't know if you've used the app.
Oh, yeah.
But I live and die by that thing.
I don't think there's anything specific about Remember the Milk,
other than that it's a great app. It's very clean.
It knows its purpose.
It doesn't get in the way.
But, you know, I definitely love my checklists.
Are you a GTD guy or you have your own workflow inside there?
I am definitely familiar with all the GTD stuff.
Over time, I think I realized that it's not the process.
I think a lot of people spend a lot of time focusing on how to improve your process
instead of actually doing stuff. So I can't say I'm a diehard GTD person, but I definitely follow
my inbox zero rules and make sure that I review my goals for the day or for the weekend and so on
and so forth. If there's any advice I could give to my college-age self, it would be that a little effort every
day will always outshine these big bursts of productivity.
What are some of the habits that you have that you think have made you more productive
as a developer?
Well, I think it's exactly what you said.
It's the small little things that add up over time.
I don't remember the exact quote,
but the general message is we tend to overestimate
what we can get done in a day
and underestimate what we can get done in a week or a month.
So it's not about doing heroic things on any given day
as much as it is just having a clean path towards
what's the next thing I need to do to move this thing along.
So a couple of closing questions.
Are you a Vim, TextMate, Emacs, or BBEdit guy?
So I don't have any religious allegiances to any one of the editors.
I do spend probably 50% of my time in Vim and TextMate.
So I switch between the two quite a bit.
This is where I outsource a lot of my discovery to my guests.
So what one project do we need to post on the changelog that we haven't covered yet?
One project.
Does it count if I don't give you a project but instead a technology?
Sure.
So I've been digging into SPDY.
And I don't know if you've paid attention to this,
but about a year ago or so, Google released this project, or I guess a study that they did, around a
new protocol that they were trying to define called SPDY.
And their goal was to see how can we speed up the performance of loading web pages, the
common web pages that we all visit, Yahoo.com, MSN.com, or even google.com, by over 50%.
And they took a low-level approach and said, well, of course there's JavaScript optimization,
compression, and all the rest, but what can we do at the protocol level?
And they basically came up with a whole bunch of ideas around, well, HTTP is maybe
not the ideal transport. When it was designed, at the beginning, we didn't pay much attention
to latency. And later we introduced functionality like HTTP pipelining, keep-alive, and
all the rest, which frankly don't even work most of the time. So this is a little-known fact,
but HTTP pipelining is disabled in all browsers except Opera.
And even Opera only uses it in very weird edge cases
where it can actually do so.
And that's primarily because a lot of the servers
don't support pipelining,
or when they claim to support it, they don't actually do it properly.
And then, of course, all the cache servers in between, which tend to break this kind of stuff.
So it's not a great protocol in the end, it turns out.
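The head-of-line blocking behind pipelining's failure can be sketched with a toy simulation: pipelined responses must come back in request order, so one slow response delays everything queued behind it. The timings below are invented purely for illustration; this is not real HTTP code.

```ruby
# Toy model of HTTP pipelining's head-of-line blocking: responses are
# delivered strictly in request order, so each response can't finish
# before the one ahead of it. Times are hypothetical milliseconds.
def pipelined_finish_times(processing_ms)
  finished_at = 0
  processing_ms.map { |t| finished_at += t }
end

# A 900ms dynamic page queued ahead of two 50ms images: the images
# can't arrive until the slow response is done.
pipelined_finish_times([900, 50, 50])  # => [900, 950, 1000]
```

Multiplexed streams, as discussed later in the interview, remove exactly this ordering constraint.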
So SPDY is about redoing a lot of that work and basically building a new protocol in place of HTTP.
And so they did this stuff about a year ago, released some numbers,
and basically showed that, yes, given some of these optimizations that we propose, we can actually get over 60% improvement in latency
for delivering these web pages.
They posted some source code, a client that was available in Chromium,
and after that I didn't see much coverage around it at all.
And just recently a thread popped up where they basically said that
if you're running Chrome and you're talking to Google Web Services,
then 90% of the traffic is going over SPDY.
So if you're a web developer today,
there's a high likelihood that you actually are using Chrome.
And if you're using a Google Web Service,
chances are you're not running over HTTP, you're running over SPDY,
which is really, really interesting.
That's amazing.
Yeah, exactly.
And I guess Google can actually do that because they control their own servers
and they control the browser.
So they're able to make this sort of change.
But of course, it's not a proprietary protocol.
The spec is out there.
So can we make use of that for our own web services?
I'd love to make PostRank web pages load 50% faster without actually
modifying any of our
UI code or anything in that respect. I'd love to just
replace the web server, make it talk SPDY, and off we go.
Has anything materialized as far as an Apache module or anything like that
to make it a little bit more palatable for the actual average developer?
Yeah, so they actually released an Apache module.
So, I'm not sure how it interacts with something like, let's say, Passenger;
I actually haven't tried it.
I wonder if we can make that work.
But what I've been digging into myself is,
I've been trying to build an actual parser for SPDY in Ruby,
in pure Ruby.
And this was more for my own education.
I find that the best way for me to learn is to actually try and build something
because I can read the spec and I kind of nod along
and I think I understand it.
And then I start to write code and I realize that I didn't get it at all.
So I'm actually working on one right now.
And it's both very simple and very interesting
in how they've made some of the decisions around
how the packet exchange should be done,
the fact that you can send multiple streams over the same TCP channel
and they can be intermixed and all the rest.
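A pure-Ruby parser like the one he describes would start with the frame header. As a hedged sketch, the SPDY draft spec gives every frame an 8-byte header: the top bit distinguishes control frames (15-bit version, 16-bit type) from data frames (31-bit stream ID), and both end with 8 flag bits and a 24-bit payload length. This is an educational sketch, not his actual project's code.

```ruby
# Sketch of parsing a SPDY frame header per the draft spec's layout.
def parse_frame_header(bytes)
  raise ArgumentError, "need at least 8 bytes" if bytes.bytesize < 8

  word1, word2 = bytes.unpack("NN")   # two 32-bit big-endian words
  flags  = (word2 >> 24) & 0xff
  length = word2 & 0x00ffffff         # 24-bit payload length

  if (word1 >> 31) == 1               # control bit set
    { control: true,
      version: (word1 >> 16) & 0x7fff,
      type:    word1 & 0xffff,
      flags:   flags, length: length }
  else                                # data frame: 31-bit stream ID
    { control: false,
      stream_id: word1 & 0x7fffffff,
      flags:     flags, length: length }
  end
end
```

For example, `[0x80020001, 0x0000000a].pack("NN")` parses as a version-2 control frame of type 1 (SYN_STREAM in the draft) with a 10-byte payload.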
So definitely a project or a technology to look into for a lot of web developers,
I think, because even though it's a fairly low-level web server type technology,
I think it's something that we should be paying attention to
because it's a significant improvement.
You know, we've had pretty much the same transport stack for years.
I can remember, I guess 15 years ago or so,
maybe more having to download and install a PPP stack
or a TCP IP stack for my operating system
just to connect to the Internet.
So maybe we're due for the next evolution on top of TCP
for basic dial tone of the web.
Yeah, absolutely.
And in fact, as I'm working on implementing this parser for SPDY,
the crazy thought that's going through my head is:
one of the core concepts behind SPDY is that the same TCP channel can transport multiple data streams at the same time.
So that means when a packet arrives, it actually tells you that I belong to the specific stream.
So you can request, for example, two images and data can be fetched in parallel.
Oh, wow. Which is not something that you can do with HTTP, because HTTP forces a strict
request-response order: you have to wait until you fetch the first full image, and then the server
will start sending you the second image.
With SPDY, you can actually intermix that data.
So if you have a slow resource, it doesn't block everybody else.
So you can make a request to a slow dynamic resource, but then fetch quick images in parallel.
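That interleaving can be sketched in a few lines of Ruby: frames tagged with a stream ID arrive mixed together on one channel and are reassembled into per-stream buffers. The `[stream_id, chunk]` shape here is invented for illustration and is not SPDY's actual framing.

```ruby
# Illustrative demultiplexer: interleaved frames from a single channel
# are reassembled into independent per-stream buffers, so a slow
# stream never blocks the others.
def demux(frames)
  streams = Hash.new { |h, k| h[k] = +"" }
  frames.each { |stream_id, chunk| streams[stream_id] << chunk }
  streams
end

# Stream 1 is a slow dynamic page; streams 2 and 3 are images whose
# frames arrive interleaved with it instead of queued behind it.
frames = [
  [1, "<html>"], [2, "img2-a"], [3, "img3-a"],
  [2, "img2-b"], [1, "</html>"], [3, "img3-b"]
]
demux(frames)  # streams 2 and 3 complete while stream 1 is in flight
```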
So you take that, and then you take a look at technologies like ZeroMQ.
And ZeroMQ is trying to do something similar, but
more generic. They're saying, hey, look, TCP is great, but we need message-oriented messaging.
You shouldn't have to worry about parsing out where the message ends; it all should
be message-oriented, and it all should be done as fast as possible.
And you should have all these different transports.
It shouldn't matter if you're sending data over TCP, UDP, or Unix pipe.
So I think if you think about what SPDY is doing and ZeroMQ is doing,
there's a really interesting opportunity there to connect the two
and build something very interesting.
You could
build a web server that
is completely message-oriented
and
you wouldn't need
an HAProxy or an
Nginx or anything else in between.
You could just bring up a Ruby process.
It would know where to connect. It would know how to
parse that message without having to implement an entire parser in C
just to parse out the boundaries of the message
and respond quickly without having to register with anybody
or say that I'm up or down.
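The message-oriented idea he's describing, where the receiver never has to guess where a message ends on the byte stream, can be sketched with simple length-prefixed frames. This is an illustration of the concept only, not ZeroMQ's (or SPDY's) actual wire protocol.

```ruby
# Length-prefix each message with a 4-byte big-endian size, so message
# boundaries survive the trip over a raw byte stream.
def frame(msg)
  [msg.bytesize].pack("N") + msg
end

# Pull every complete message out of a buffer; an incomplete trailing
# message is simply left for the next read.
def read_messages(buffer)
  messages = []
  until buffer.bytesize < 4
    len = buffer[0, 4].unpack1("N")
    break if buffer.bytesize < 4 + len   # partial message: wait for more
    messages << buffer[4, len]
    buffer = buffer[(4 + len)..-1]
  end
  messages
end

wire = frame("hello") + frame("world")
read_messages(wire)  # => ["hello", "world"]
```

With framing like this, the transport underneath (TCP, UDP, a Unix pipe) stops mattering to the application, which is the generic point ZeroMQ makes.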
Definitely exciting stuff.
We learned about ZeroMQ on the Zed Shaw interview. That was the first
time we'd heard of it, and we got a quick look at it there. We need to get somebody from the Chromium
project to talk about SPDY, which, when I first saw it, I guess when it first came out last
year sometime, I thought it was "spidey." SPDY, for those that are listening at home but don't have
access to the show notes, is pronounced "speedy," right here on the executive summary.
Well, yeah, thanks for joining us. It's definitely been fascinating to talk about Goliath
and this non-blocking async style of programming and some other things.
Great. Thanks a lot.