Screaming in the Cloud - Episode 59: Rebuilding AWS S3 in a Weekend with Valentino Volonghi
Episode Date: May 8, 2019
About Valentino Volonghi
Valentino currently designs and implements AdRoll's globally distributed architecture. He is the President and Founder of the Italian Python Association that runs PyCon Italy. Since 2000, Valentino has specialized in distributed systems and actively worked with several Open Source projects. In his free time, he shows off his biking skills on his Cervelo S2 on 50+ mile rides around the Bay.
Links Referenced:
https://twitter.com/dialtone_
Adroll.com
Tech.adroll.com
Transcript
Hello and welcome to Screaming in the Cloud with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud. This episode of Screaming in the Cloud is sponsored by O'Reilly's Velocity 2019 conference.
To get ahead today, your organization needs to be cloud native.
The 2019 Velocity program in San Jose from June 10th to 13th is going to cover a lot of topics we've already covered on previous episodes of this show,
ranging from Kubernetes and site reliability engineering over to observability and performance. The idea here is to help you stay
on top of the rapidly changing landscape of this zany world called cloud. It's a great place to
learn new skills, approaches, and of course, technologies. But what's also great about almost
any conference is going to be the hallway track. Catch up with people who are solving interesting
problems, trade stories, learn from them,
and ideally learn a little bit more
than you knew going into it.
There are going to be some great guests,
including at least a few people
who've been previously on this podcast,
including Liz Fong-Jones and several more.
Listeners to this podcast can get 20% off of most passes
with the code CLOUD20.
That's C-L-O-U-D-2-0 during registration.
To sign up, go to velocityconf.com slash cloud. That's velocityconf.com slash cloud. Thank you to Velocity for sponsoring this podcast.
Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Valentino Volonghi, CTO of AdRoll. Welcome to the show.
Hey, Corey.
Thanks for having me on the show.
No, thanks for being had.
One, let's start at the very beginning.
Who are you and what do you do?
Well, I'm CTO at AdRoll Group.
And what AdRoll Group does is effectively build marketing tools for businesses that want to grow.
And they're looking to try to make sense of everything that is happening in marketing,
especially when it comes to digital marketing,
that effectively is going to help their businesses drive more customers to their websites
and turn them into profitable customers effectively.
Awesome. You've also been a community hero for AWS
for the last five years or so.
Yeah, I was lucky enough to be included
in the first group of community heroes,
which I think was started in 2014.
It still isn't completely clear to me
what exactly community heroes do
besides obviously helping the company,
and what we did to deserve to be called community heroes. I think lots of people such
as yourself are doing a great amount of work to help the community understand the cloud and
spreading the reasoning behind everything that is happening in the market these days. So maybe you should be a hero as well.
Unfortunately, my harsh line on no capes winds up being a bit of a non-starter for that.
And I've been told the wardrobe is very explicit.
Oh, okay. I didn't know that.
Exactly. It all comes down to sartorial choices and whatnot.
So you've been involved with using AWS
from a customer perspective for,
I'm betting, longer than five years.
Yeah, probably longer than a decade, actually.
Longer than a decade.
And it's amazing watching how that service is just,
I guess how all AWS services have evolved
over that time span,
where it's gone from,
yeah, it runs some VMs and some storage.
And if you want to charitably call it a network, you can because latency was all over the map.
And it's just amazing watching how that's evolved over a period of time where not only it was
iterating rapidly and improving itself, but it seemed like the entire rest of the industry was
more or less ignoring it completely as some sort of flash in the pan. I've never understood why they got the head start that they did.
Oh man, such a long, long time ago. I remember I was still in Europe, before I came over to start up AdRoll, but 2006, I think,
was when S3 was first released. And I remember starting to take a look at it and thinking, wow, now you can put files on a system out there
that you don't know really where it lives,
but I don't need to have my own machines anymore.
And it was the time that you used to buy co-locations online
and it was a provisioning process for all of those.
You needed to choose your memory size
and you typically got a
co-located, co-hosted, shared-host type situation. And it was expensive. And then, yeah, in 2007,
EC2 came out, and it felt like magic. And at that point in time, AdRoll was running in a data center out here
on Spear Street in San Francisco.
And I remember we had two database machines,
both RAID 5,
and one machine was humming along fine,
but the other one had two drives,
two drives that were failing in the RAID 5, and we ordered the replacement drives on Amazon or
whatever, Newegg, and I think they were on backorder at that time, and we needed to
wait for a week or two before those could arrive. At that moment in
time, I made the call. That's it.
We're not doing this anymore.
We are going on AWS.
Just give me two weeks and I'll migrate everything, I told the CEO, and then we'll be free from
the data center.
And I tell you, the costs will be exactly the same.
And actually, that's exactly what happened.
It took two weeks.
We moved all the machines over. The costs were exactly the same,
but we had no more needs to run to the store and provision extra capacity or buy extra capacity
or any of that stuff. It also allowed us massive amounts of flexibility.
And then very early on, it was funny because I think I've lived through all of the
stages of disbelief when it comes to AWS or cloud in general, where the first complaints were,
well, it's not performant enough. If you want to run MapReduce, you cannot run it inside AWS.
There's simply not enough I/O performance on the boxes.
I even lived in a period of time, I was following closely when GitHub was on
AWS at first, and then they moved to Rackspace afterwards because AWS wasn't fast enough even
for them. And they were working through some issues here and there.
Some of those things were obviously real, like true immaturity situations.
EBS has gone through a lot of ups and downs, but it's mostly been stable since
then. We're now living in a day and age where the EBS drives that you get from AWS are
super stable, but it never used to be like that.
You needed to kind of get adapted, get used to the fact that an EBS drive could fail or
the entire region could go down because of EBS drives, which has happened in US East
a few times in the past. But yeah, from those very few simple services
with very rudimentary and simple APIs,
it does feel like they have started to add more and more,
not only breadth,
because obviously that's evident to anybody at this point in time.
I don't think anybody can keep up with the number of services that are being released. But what's really surprising is that for the services
where they see value and where customers are seeing a lot of adoption and interest, they
can go to extreme depth with the functionality that they implement, the care with which they implement it, and
ultimately with how much of it is available for many of them.
Now you get over 160, I think, different types of instances.
It used to be that you only had six or seven, and now 160.
Some of them are FPGA instances, which I think there's only maybe a handful of
people in the world that can code those machines properly. And they certainly don't work at my
company right now. Well, that's always the fun question too, is do you think that going through
those early days where you were building out an entire ecosystem, sorry, an entire infrastructure on relatively unreliable instances and
disks and whatnot was a lesson that to some extent gets lost today. I mean, it
taught you early on, at least for me, that any given thing can fail, so
architecting accordingly was important. Now you wind up with ultra-reliable
things that never seem to fail, until one day they
do and everything explodes. Do you think it's leading to less robust infrastructures in the
modern era? It's possible. I think if people get on AWS thinking that we're going to run in the
cloud, so it's never going to fail because Amazon manages it, I think they're definitely
making a real mistake, a very short-sighted statement right there.
Not just because of that, in case of failures, but a couple of years ago, I think, maybe
three years ago, there were all of those Xen vulnerabilities coming out that Amazon needed
to patch and entire regions needed to be rebooted.
What do you do at that point when your infrastructure is not fully automated and capable of being restored without downtime in user-facing software?
You're going to need to pause development for weeks just in order to patch a high-urgency
vulnerability in your core infrastructure.
That's just an event that is not even a fault
of anybody. It's not even necessarily under full control of Amazon, and you
need to be ready for some of that stuff. So there are, I would say,
lots of companies that, especially in their first
journey of moving stuff inside AWS, tend to just replicate exactly what they have in their own
data center and just move it inside AWS. I know this because, for example, AdRoll has done that
the first time that we migrated into AWS. We first migrated just our boxes. And then we quickly
learned that it wasn't always that reliable. And so we needed to figure some of that stuff out for ourselves, and effectively you started to realize, in our case back then, that you
needed to work around many of those things. But as you said, today it isn't
quite that way, and to an extent Amazon almost makes a promise about many of
these services not failing, or taking care of your infrastructure for you.
For example, if you look at Aurora,
it's a stupendous, fantastic piece of database software.
It's extremely fast.
It's always replicated in multiple availability zones,
so multiple data centers.
The failover time is less than a second, I think, at the moment. And when you're tasked with solving a problem, building a service, you're going to choose
to build it on top of Aurora, neglecting to think about what happens if Aurora doesn't
answer me because the network goes down.
Or what happens if my machines go down because I misconfigured them? Some of the biggest
higher profile issues in terms of infrastructure of the last year alone, for example, with S3,
have been erroneous configuration changes being pushed to production. What do you do at that
point? Your system needs to be built in such a way that it's going to be resistant, at least
partially, to some of these things. And Amazon is trying to build a lot of
the tools around that stuff, but I think it still takes a lot of
presence of mind from the developers and architects to actually do this in a
thoughtful way: use the services that you need to use in a thoughtful way,
understand the perimeter of
your infrastructure, and particularly the assumptions you're making as you're building the infrastructure.
And if you can design a graceful degradation service where a failure of an entire subsystem
is not going to lead to a complete failure to serve a website, where you progressively get to
just a less useful website,
but still maintain the core service that you might offer, then it improves your infrastructure
quite a lot.
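To make that concrete, here is a minimal sketch of the fallback pattern being described, in Python. The function names and the cached items are hypothetical; the point is only that a timeout or error in the primary store degrades the response rather than failing the whole page.

```python
# A minimal sketch of graceful degradation, assuming a callable that hits
# the primary data store. Names and fallback contents are hypothetical.

FALLBACK = {"recommendations": ["popular-item-1", "popular-item-2"]}

def recommendations_for(user_id, query_primary, timeout_s=0.2):
    """query_primary is any callable that queries the primary database."""
    try:
        items = query_primary(user_id, timeout=timeout_s)
        FALLBACK["recommendations"] = items  # refresh the degraded-mode copy
        return {"source": "primary", "items": items}
    except Exception:
        # Timeout, network partition, misconfiguration: degrade, don't die.
        return {"source": "fallback", "items": FALLBACK["recommendations"]}
```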
I think this is where Chaos Monkey, Chaos Gorilla, Chaos Kong, or whatever it's
called for the region failure, come into play to try to exercise those muscles.
It's obviously important to have them going in production, but I think
even a good start would be to have those running as you're prototyping your software and
just see where the failures bring you. And another trend we've seen recently is the use of TLA+ as a
formal verification language where you can
effectively spec your system using these formal languages and then test it using verification
software so that it highlights places where your assumptions were not checking out with
reality effectively. The challenge that I've always had when looking at,
I guess, shall we say, older environments
and older architectures is that in the early days,
what you just described was very common,
where you wind up taking an existing on-prem data center app
and more or less migrating that wholesale directly
as a one-to-one migration into the cloud.
That was great when you could view the cloud as just a giant pile of, I guess, similar style
resources. But now with 150-something services in AWS alone, the higher-level services start to unlock
and empower different things that weren't possible back then, at least not without a
tremendous amount of work. You talk, for example, about not having enough people around who can program FPGAs.
Do you think that if you were building AdRoll today, for example, you would focus on higher
level services architecturally?
Would you go serverless?
Would containers be interesting?
Or would you effectively stick to the tried and true architecture that got you to where
you are?
Probably, I would probably do a mix.
I think what's important to evaluate when building infrastructure is the skill set of the people that you have working on your team.
And you certainly need to play to their strengths.
Ultimately, they are the ones building and maintaining your infrastructure,
not Amazon, not an external vendor,
and most certainly not the open source
maintainer of whichever project you use as an alternative. And the other aspect is
try to understand sometimes with subtle indications from Amazon,
which services Amazon is investing most of their energy or a lot of their energy
in so that you know that they
continue to grow and they continue to receive support and they continue to fix bugs and
issues because you know that they'll be with you for the rest of your company's life, for
example.
But on the other hand, a lot of times you write software just automatically without really thinking about the better
way to write something, just because you're used to it. And so typically it's
not an easy thing to just jump out of the habit of getting an instance going
to do something, and it might be a good idea at first. But if you develop a good process to test new
architectures and new ideas, you might quickly end up realizing, well, actually, I don't
need to run a t2.micro or whatever for running this particular thing with S3, where every
time a file is uploaded to S3, I run some checks on the file that was uploaded.
You might realize, well, maybe the
best thing to do is to try to play around with a Lambda function instead,
and that effectively fixes your entire problem. One area that, for example, we've
tested around, and it's on AdRoll's technical blog, is that we built a globally distributed eventually consistent counter that uses DynamoDB
and Lambda and S3 together, and effectively is able to aggregate all of
the counts that are happening in each of the remote regions into a single counter
in a central region that can then be synced back to each remote region.
This way we can keep track of, for example, in our case, how much money has been spent in each particular region
and be sure that this money is spent efficiently.
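As a rough illustration of the shape of that idea, and not AdRoll's actual implementation (theirs combines DynamoDB, Lambda, and S3; the table name, attribute names, and regions below are hypothetical), a per-region counter plus a central aggregation step might look something like this:

```python
import boto3

# Hypothetical table "regional_spend" with partition key "campaign_id" and a
# numeric "spend" attribute, deployed independently in each region.
REGIONS = ["us-east-1", "eu-west-1", "ap-northeast-1"]

def regional_increment(region, campaign_id, amount):
    """Each region only ever writes to its own counter; no cross-region writes.
    amount should be an int or Decimal (DynamoDB numbers)."""
    table = boto3.resource("dynamodb", region_name=region).Table("regional_spend")
    table.update_item(
        Key={"campaign_id": campaign_id},
        UpdateExpression="ADD spend :a",
        ExpressionAttributeValues={":a": amount},
    )

def aggregate(campaign_id, central_region="us-west-2"):
    """Run periodically (e.g. from a scheduled Lambda): sum the regional
    counters into a central total that each region can read back. The result
    is eventually consistent, which is fine for budget tracking."""
    total = 0
    for region in REGIONS:
        table = boto3.resource("dynamodb", region_name=region).Table("regional_spend")
        item = table.get_item(Key={"campaign_id": campaign_id}).get("Item", {})
        total += item.get("spend", 0)
    central = boto3.resource("dynamodb", region_name=central_region).Table("regional_spend")
    central.update_item(
        Key={"campaign_id": campaign_id},
        UpdateExpression="SET total_spend = :t",
        ExpressionAttributeValues={":t": total},
    )
    return total
```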
And the only other alternative way to do it is to set up a fairly complex database of your own and make sure that latency of updates is fast
enough, and that all the machines are up and running all the time. And if anything goes down,
it's a high-urgency situation, because your controls on the budgets go away.
So sometimes it's really useful, especially when dealing with problems in which communication and the
flow of information isn't particularly easy to grasp for an engineer,
to be able to remove an entire layer of a problem and rely on
someone else to provide the SLA that they are promising you. And so
effectively that's
the case for Lambda. There's obviously a particular
range of uses in which Lambda makes complete sense, from the point of
view of price and from the point of view of the resources needed or the type of
computation that runs on it. And if you can manage to keep this in your head, in your mind, when
you're making decisions, or you can make some tests, you can actually discover that maybe you
can use Lambda and get away with not having to solve quite a challenging problem at the end
of the day. So sometimes it helps rewriting some infrastructure just as an exercise.
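Going back to the earlier example of running checks whenever a file lands in S3, a minimal sketch of that kind of Lambda handler might look like the following. The bucket wiring (an S3 event notification on the bucket) and the specific check are assumptions made purely for illustration:

```python
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by an S3 event notification on the bucket (assumed). Runs a
    simple check on every uploaded object instead of a polling instance."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
        # Hypothetical check: flag empty or implausibly large uploads.
        if size == 0 or size > 5 * 1024**3:
            print(json.dumps({"bucket": bucket, "key": key, "suspicious_size": size}))
    return {"checked": len(records)}
```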
What I do at AdRoll, as CTO, is that I tend to not have a lot of direct reports. I consider each service at AdRoll to
be my direct report as a team, effectively, each of them being a team. And every six weeks,
they provide a short presentation in which they explain the budgets that they've
gone through, whether they have overspent or underspent and why. And among the many things,
they also talk about their infrastructure. We have diagrams of infrastructure. We talk about
new releases from Amazon and what would be a new way to build the same thing. And they evaluate
whether it would save money or not. And so you kind of need to have someone in the organization, especially if you're planning
to adopt some of the new technologies, whose role is effectively dedicated to being
up to date with what's going on in the world and knowing the infrastructure of your systems
and be able to make suggestions and then let the team make the decision at that point.
What's also sometimes hard to reconcile for some people
is that these services don't hold still.
And I think one of the better services
to draw this parallel to
is one I know you're passionate about.
Let's talk a little bit about S3.
Before we started recording the show,
you mentioned that you thought that it was pretty
misunderstood.
Yeah.
What do you mean by that?
Well, S3 has been, in my view, one of the closest things to magic that exists
inside AWS.
Until not long ago, the maximum amount of data that you could pull from S3 was one gigabit
per second on streams.
You were limited in the number of requests per second that you could run on the same
shard of S3.
There was no way of tagging objects.
The latency on the first byte, when S3 started, was in the 200-300 millisecond range.
It was expensive.
S3 probably has undergone some of the most cost-cutting that you could see out there.
And part of the decrease in cost
is that the standard storage class has become cheaper,
but also they have added several other storage classes
that you can move your data in and out of relatively
simply without having to change the service effectively. It's the very same API, but
different cost profile and storage mechanism. And when it all started, there was just one,
it was just US Standard, and it was pretty expensive to use, both in terms of
per-request cost and storage cost. But yeah, today there's a different limit on
single-stream bandwidth. The bandwidth on a single stream is not one gigabit per
second anymore, it's at least five gigabits per second. If you have
one of the instances that have hundred-gig networking
inside Amazon, you can get all of those hundred gigs out of S3 just by fetching multiple streams.
The latency that you get on the first byte is well below 100 milliseconds. Their range queries
are very well supported, so you could fetch blocks inside S3. S3 has turned into almost a database
now. With S3 Select, you can run filters directly on your files, by decompressing them on the fly
and recompressing them afterwards, or simply by reading richer formats like
Parquet, for example. It honestly is something where it's hard to imagine how you could build
everything that we have going on right now at AdRoll without S3.
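For context, here is a hedged sketch of what an S3 Select call looks like from Python with boto3; the bucket, key, column names, and filter are all hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Filter a gzipped CSV server-side so only matching rows cross the network.
resp = s3.select_object_content(
    Bucket="example-logs",           # hypothetical bucket
    Key="2019/05/01/events.csv.gz",  # hypothetical key
    ExpressionType="SQL",
    Expression="SELECT s.user_id, s.cost FROM S3Object s WHERE s.country = 'IT'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the filtered bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```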
It has gotten to the point where running an HDFS cluster for us is not really that useful.
If you look at EMR themselves, they have a version of HBase that runs backed by S3.
And I know of extremely big companies that have moved from running HBase backed by file system HDFS
to instead HBase backed by S3 that have had incredible improvements in performance
and the consistency of the performance of HBase.
HBase is very sensitive to the performance of the disks, because it's a consistency-first database, effectively. And if the region that is currently master,
sorry, if the server that is currently master for a region is slow, it ends up bringing down that entire region, effectively.
It's a service that has grown dramatically and we have experimented even with using it as a file system
by using user file descriptors in the kernel. More
recent versions of the Linux kernel allow user file descriptors. And if you have limited use for
writing, like we do, and you want to treat the file system like a write-once-read-many file system,
then S3 becomes actually surprisingly useful as well. Netflix published a blog article on their tech blog talking about, for example,
how they mount S3 as a local file system in order to
use FFmpeg to run movie decoding and transcoding.
Because effectively FFmpeg was not
created with the idea that S3 was around, and so it needs to have the
entire file available on the local system, or at least an entire block
available; it doesn't work well with streams. And so if you can abstract that
part away from the FFmpeg API and move it into the file system, you can
suddenly use S3 as some kind of almost a file
system.
And we have done a similar thing when it comes to processing columnar files or indexed files
from inside S3, where if you know exactly the range of data that you want to access,
you can just do it inside S3.
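A rough sketch of that kind of ranged, multi-stream access with boto3 follows; the bucket, key, part size, and worker count are hypothetical, but this is the general pattern for pulling blocks of an object in parallel rather than as one serial stream:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")

def fetch_range(bucket, key, start, end):
    # The HTTP Range header is inclusive on both ends.
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

def parallel_get(bucket, key, part_size=64 * 1024 * 1024, workers=16):
    """Fetch one large object as many concurrent ranged streams, which gets
    closer to the instance's full network bandwidth than a single stream."""
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    ranges = [(off, min(off + part_size, size) - 1)
              for off in range(0, size, part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch_range(bucket, key, *r), ranges)
    return b"".join(parts)

# Hypothetical usage: read only the byte range you need, or a whole object
# in parallel parts.
# data = parallel_get("example-data", "columnar/part-0001.parquet")
```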
We use it as a communication layer between the map and the reduce stage of our homegrown
MapReduce frameworks.
And again, it allows us to cut away thousands of hours per day of waiting time for downloading
a file to the local disk before processing it on local disk.
We can just process it right away and cache it on the box after it's been downloaded.
It's quite remarkable.
The speed increase, the cost decrease, the S3 Select.
I think we're going to see in the near future
databases that start to use S3
as the actual backend for their storage more and more
without worrying about the limitations of the current disk.
And effectively, we'll be able to scale in a stateless way,
adding as many machines as you want
and respond to as much traffic as you can
without needing to worry about failures either.
It's an incredible amount of opportunity
and possibility that is coming down
in the future that I'm really excited to see become real. I think that requires people to
update a lot of their understandings about it. I mean, one of the things that I've always noticed
that's been incredibly frustrating is that people believe it when it says it's simple storage
service. Oh, simple. And you look on Hacker News and that's generally the consensus.
Well, S3 doesn't sound hard. I can build one of those in a weekend. And you see a bunch of
companies trying to spin up alternatives to this. Companies no one's ever heard of before. Oh,
we're going to do S3, but on the blockchain is another popular one that makes me just roll my
eyes back in my head so hard I pass out. You're right. This is the closest thing to magic that I think you'll see
in all of AWS. And people haven't seemed to update their opinion. I think you're right.
It's getting closer to a database than almost anything else. But I guess the discussions
around it tend to be, well, a little facile, for lack of a better term. Well, there was this outage
a couple of years ago, and it went down for four hours in a single region, and that's a complete
non-starter, so we can't ever trust it. Who's going to be able to run their own internal data
store with better uptime than that? Remarkably few people. Yeah, I mean, AdRoll has used 17 exabytes of bandwidth from S3 just for our business
intelligence workload from EC2 to S3 this past month. I don't even know how to even start.
If a router communicating between S3 and whatever instance we have going around
goes out and we're out, we're out for good.
S3 has multiple different paths to reach EC2 and they are all redundant.
Each machine's internals there are obviously redundant.
They replicate the data in multiple zones and whatnot.
This bandwidth is available across multiple zones because I'm storing data inside the
region so it's already available in multiple data centers.
The number of boxes that are needed to aggregate to 17 exabytes as well is quite impressive.
We have no people thinking about this. We run over, I think we run over 20 billion
requests per month on S3. I'm pretty sure that bucket, if it were made public, would be one of
the biggest properties in terms of volume on the web. And I just can't see it. Processing 20 billion events per month with files that are sometimes significantly big,
it's going to take a lot of people.
Exactly.
People like to undervalue their own expertise, what their time costs, the opportunity cost
of focusing on that more than other stuff.
And you still see it with strange implementations of trying to mount S3 in a FUSE file system. Trying to treat it
like that has never worked out for anything I've ever seen, but people keep
trying. Yeah, the FUSE file system is an interesting one. I think
things might change in the future, but it needs to be done with some concept of
what you're doing. It really isn't a file system, but it works for a certain subset of the use cases.
And we're not even talking about necessarily yet all of the compliance side of things.
So encryption, ability to rotate your keys, to set permissions on who can or cannot access,
tagging each object, building rules for accessing the objects
or the prefix based on the tags available on that object using IAM policies.
Lifecycle transitions, object locks so no one can delete it, litigation hold options.
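As a rough sketch of what a lifecycle transition looks like in practice with boto3 (the bucket name, prefix, and retention periods here are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical policy: move cold data to Glacier after 90 days, to Deep
# Archive after a year, and expire it after roughly seven years.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-data",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```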
And then take a look at Deep Archive, $12,000 a year to store a petabyte.
That's who cares money.
Yeah, that's who cares money.
Exactly, absolutely right. Plus, it doesn't matter: at a certain point in time, if you're not compliant
and you're storing that much data, you just can't, so you might have to delete it all. There's a lot
of different security regulations. GDPR is incoming. Is your database going to help out to remain compliant? Well, GDPR
isn't incoming, actually, it came out a year ago. But with GDPR here and the California privacy
law coming next, at the end of this year, is your storage
system going to help you to become compliant? Who's going to build all of the compliance tools
on top of your
storage system and make sure that you remain compliant till kingdom come? So basically,
I mean, it's awesome. And I think it's a healthy exercise for every engineer
to always question what is the value that you're getting out of a service, and try to sketch
out or understand the infrastructure, try
to whiteboard it out and maybe do a quick cost estimation. But it's never enough to have just the
engineer in there: security is a stakeholder in this kind of decision, the operations team is
a stakeholder in this kind of decision, the business is a stakeholder in this kind
of decision. The business might not be happy, as you said, to spend $500,000 a year for two engineers to
work on S3 when they can spend $12,000 to store a petabyte for a year inside S3. It's
just a lot easier. Twelve grand is really who cares money.
Exactly, especially when you're dealing with
what it takes to build and run something that averages that much data. It becomes almost a
side note. And the durability guarantees remain there as well. It feels like one of those things
we could go on with for hours and hours. Yeah. And the other aspect that is very important is how close S3 is to the computing power.
Because as I said, 17 exabytes of data just for BI purposes, I cannot do that across data centers.
There is no way. That would cost everything from the business in terms of bandwidth costs.
Many times other vendors approach AdRoll,
obviously asking us to use their storage solution, but either,
to deploy you, I need my own data center, and then you're not
close to where the capacity is, or you are in another system where I don't
need a data center, but you're not located near my compute capacity.
And so I lose that piece of the equation
that makes all of the stuff that I want to do worthwhile.
To an extent, S3 is the biggest lock-in reason
behind EC2.
It really is hard to replicate all of the different
bits and pieces of technology that are built on top of S3, and in particular
being so close to so many services that are easy to integrate with each other,
things such as Lambda or EC2, makes it very compelling.
Other cloud vendors are obviously always playing catch-up and getting there,
but I don't think they're quite to the level
of customization, security, compliance,
and ease of use that really Amazon S3 has.
It also has really hard aspects to it as well,
but I think by and large, it's a huge success story.
If people are interested in hearing more,
I guess, of your wise thoughts on the proper application of these various
services, where on the internet can they find you? Oh, the easiest way to find me
is to shoot me questions or comments, or follow me on my Twitter account,
dialtone underscore. AdRoll also has a tech blog at tech.adroll.com. We usually publish
a lot of interesting articles about the ongoings with our infrastructure, things such as the
globally eventually consistent counter that I mentioned earlier, but also our extreme use of the spot market effectively, or our strange use of S3 as a quasi file system
for processing our MapReduce jobs,
which also are described in our blog.
And generally speaking,
I'm more than happy to answer questions
and whatnot at local events.
I usually go to as many local events as I can here
whether AWS user events or other meetups,
or go to a random set of other conferences as well.
Thank you so much for taking the time to speak with me today.
I appreciate it.
Thank you.
Valentino Volonghi, CTO of AdRoll.
I'm Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud.
You can also find more Corey at Screaminginthecloud.com
or wherever fine snark is sold.
This has been a HumblePod production. Stay humble.