Screaming in the Cloud - Episode 26: I’m not a data scientist, but I work for an AI/ML startup building on Serverless Containers
Episode Date: September 5, 2018

Do you deal with a lot of data? Do you need to analyze and interpret data? Veritone's platform is designed to ingest audio, video, and other data through batch processes to process the media and attach output, such as transcripts or facial recognition data. Today, we're talking to Christopher Stobie, a DevOps professional with more than seven years of experience building and managing applications. Currently, he is the director of site reliability engineering at Veritone in Costa Mesa, Calif. Veritone positions itself as a provider of artificial intelligence (AI) tools designed to help other companies analyze and organize unstructured data. Previously, Christopher was a technical account manager (TAM) at Amazon Web Services (AWS); lead DevOps engineer at Clear Capital; lead DevOps engineer at ESI; cloud consultant at Credera; and worked on Patriot/THAAD missile fire control in the U.S. Army. Besides staying busy with DevOps and missiles, he enjoys playing racquetball in short shorts and drinking good (not great) wine.

Some of the highlights of the show include:

- Various problems can be solved with AI; companies are spending time and money on it
- Tasks that are a little too intelligent to solve with simple software can still be automated
- Machine learning (ML) models are applicable for many purposes; real people with real problems, who are not academics, can use ML
- Fargate is instant-on Docker containers as a service; it handles infrastructure scaling, but involves some management expense
- Instant-on works with numerous containers, but there will probably be a point when it no longer delivers reasonable fleet performance on demand
- The decision to use Kafka was based on the workload: stream-based ingestion
- Veritone writes code that tries to avoid provider lock-in and makes each integration as decoupled as possible
- People spend too much time and energy being agnostic to their technology, giving up its benefits
- If you dream about seeing your name up in lights, Christopher describes the process of writing a post for AWS

Pain points: the newness of Fargate and unfamiliarity with it; limit issues; being unable to handle large containers.

Links:

- Veritone
- Christopher Stobie on LinkedIn
- Building Real-Time AI with AWS Fargate
- SageMaker
- Fargate
- Docker
- Kafka
- DigitalOcean
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This week's episode of Screaming in the Cloud is generously sponsored
by DigitalOcean. I would argue that every cloud platform out there biases for different things.
Some bias for having every feature you could possibly want offered as a managed service at
varying degrees of maturity. Others bias for, hey, we heard there's some money to be made in the cloud space. Can you give us some of it?
DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things,
and they all said more or less the same thing. Other offerings have a bunch of shenanigans
around root access and IP addresses.
DigitalOcean makes it all simple.
In 60 seconds, you have root access to a Linux box with an IP.
That's a direct quote, albeit with profanity about other providers taken out.
DigitalOcean also offers fixed price offerings. You always know what you're going to wind up paying this month,
so you don't wind up having a minor heart issue when the bill comes in.
Their services are also understandable without spending three months going to cloud school.
You don't have to worry about going very deep to understand what you're doing.
It's click button or make an API call and you receive a cloud resource.
They also include very understandable monitoring and alerting.
And lastly, they're not exactly what I would call small time. Over 150,000 businesses are using them today. So go ahead and
give them a try. Visit do.co slash screaming, and they'll give you a free $100 credit to try it out.
That's do.co slash screaming. Thanks again to DigitalOcean for their support of Screaming in the Cloud.
Hello and welcome to Screaming in the Cloud.
I'm Corey Quinn.
I'm joined this week by Christopher Stobie, who's the director of SRE at Veritone.
He's also a former TAM at AWS, but that's not really what I wanted to invite him here to talk about.
Instead, a blog post went out somewhat recently about architecture that he's been working on.
So first, welcome to the show, Christopher.
Hey, Corey. Good to be here. Thanks for having me.
No, thanks for being so generous with your time.
So let's start at the very beginning.
I first became aware that you folks existed with a post that was put up on the Amazon official architecture blog,
and I'll throw a link to it in the show notes,
that was titled Building Real-Time AI with AWS Fargate.
So I read that five or six times,
and eventually I had a vague idea
of what you were talking about
and did a little more digging.
So for those who were starting off
in the same place that I was,
Veritone is a company that likes to position itself as a provider of artificial intelligence
tools designed to help other companies analyze and organize unstructured data, such as audio,
video, and images. What does that mean, using small words?
Yeah, the description there is a little bit of a mouthful.
I think the best way I like to describe it is actually more of an anecdote or a story.
So with normal AI, if you want to do, say, something like image recognition or speech-to-text or this or that,
any of these different capabilities that exist,
you have to go write a service that connects to that engine,
and you have to write an API layer that's very specific and very singular.
Veritone abstracts all that and says, hey, you can learn the Veritone API,
and you can get access to any engine that we have in our ecosystem and make a single call and describe what you want
and get results against any of the
engines that we support. So I like to look at Veritone as a unification layer, a single API
for lots of different AI. It's easy to fall into the trap that I did when I started researching
into what it is you would actually build that, oh, you're talking about AI and machine learning.
It's probably a
few people who are sitting in a garage somewhere. They've gotten a seed round, maybe a Series A,
and holy crap, you're publicly traded on the NASDAQ. So this is no longer the sort of thing
that's just the remit of hobbyists or focused on what-if, far-future technology. This is something that the market believes in strongly.
This is something that's here today,
albeit one that's still being built into a clear-cut use case.
As things stand today, what problems might I have
that look like something that AI might be able to help me with?
Yeah, so I think there's a lot of different things
that can be solved with AI.
And I think a lot of really big companies
are applying a lot of time and money
into building the AI that changes the world.
But I think in the meantime, before 20 years passes,
there's a lot of menial stuff,
or maybe menial isn't the right word,
but there's a lot of tasks that can be automated
that are a little bit too intelligent to just write simple software around.
Things like analyzing court case documents, ingesting them and transcribing them from text
into an indexed searchable object in a database is something that traditionally was done by humans
and took a lot of time and energy.
Instead, you can scan a document, run it through a transcription engine, and you have your results indexed and searchable in a few hours or even faster, depending on whether or not you use Veritone.
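As a rough illustration of that single-call model, here is a minimal Python sketch in which you describe the capability you want rather than wiring up a specific engine. The endpoint, payload fields, and routing behavior are invented for illustration and are not Veritone's actual API.

```python
# Hypothetical sketch of a "single API for many engines" call.
# The URL, payload shape, and field names are placeholders, not
# Veritone's real interface.
import requests

API_URL = "https://api.example.com/v1/jobs"  # placeholder endpoint

def transcribe_and_index(document_url: str, token: str) -> dict:
    """Submit one job; the platform picks a transcription engine."""
    payload = {
        "target": document_url,
        "capability": "transcription",  # describe what you want...
        "engine": "any",                # ...not which engine does it
        "output": {"index": True},      # make results searchable
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # job handle to poll for the indexed transcript
```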
One of the interesting challenges about this entire space, from my perspective, is just the sheer applicability of machine learning models to different things.
A while back when SageMaker first came out,
I gave it a few months
and then asked on my ridiculous newsletter,
who's using SageMaker and for what?
Because personally, I'm not a data scientist.
I'm not someone who has the wherewithal
or the expertise to have intelligent conversations
around these things.
And what amazed me was, first, the sheer volume of replies I got.
Secondly, the fact that everyone was doing something different with it.
And lastly, that they all started with some form of the sentence,
I'm not a data scientist, but.
This is rapidly turning into something that real people with real problems who are not themselves academics are able to touch and use and get exposure to.
Yeah, I'm not a data scientist, but I definitely agree.
I think that AI is expanding, and we're growing into a field that just demands accuracy and results at a much faster pace than humans can deliver.
Absolutely. You wound up mentioning in your post that this entire system that you describe is built
around Fargate. For those who aren't aware, this is effectively instant-on Docker containers as a
service. Picture serverless Docker. And in addition to starting a war with that phrase,
you're effectively not that far from what this looks like.
It's you throw a Docker container at AWS, it handles all of the infrastructure scaling for you.
The downside to this, of course, is that first, there is some management expense tied into that. And second, on a one-to-one compute level, you will wind up spending more per container hour or per container second
than you would for a similar amount of compute on EC2.
Do you find that the value that you get
from having something managed entirely for you
offsets that economic cost?
Or is there a tipping point where,
okay, we're now large enough on these workloads
that moving to EKS or ECS or something else eventually becomes a foregone conclusion?
Yeah. So I think that when
we went out and we started doing this, we kind of looked at it twofold. We looked at it first
with the assumption that eventually Fargate would have some sort of a pre-provisioned billing,
kind of like pre-buying DynamoDB throughput or purchasing reserved instances.
Fargate is young, so I think we assumed that eventually it would have a better billing model than it currently does.
But given that it doesn't today, part of the architecture actually includes mixing and
matching Fargate and EC2.
So we have very bursty traffic, and we designed the mean of our traffic, the average load
to run on reserved instances on EC2
and for all the bursts to scale in Fargate.
So it is very expensive
and we were very conscious of the economic impact
that our company would have internally
if we went entirely Fargate.
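For flavor, here is a minimal boto3 sketch of the burst half of that split. It assumes the baseline load already runs as an ECS service on the reserved EC2 fleet, so only the Fargate overflow path is shown; the cluster, task definition, and subnet names are placeholders.

```python
# Burst-to-Fargate sketch: the steady-state fleet runs on reserved EC2,
# and spikes are absorbed by on-demand Fargate tasks.
import boto3

ecs = boto3.client("ecs")

def launch_burst_workers(count: int) -> None:
    """Start extra workers on Fargate to absorb a traffic spike."""
    ecs.run_task(
        cluster="media-processing",        # placeholder cluster name
        taskDefinition="ingest-worker:7",  # placeholder task definition
        launchType="FARGATE",
        count=min(count, 10),              # run_task launches at most 10 per call
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # placeholder
                "assignPublicIp": "DISABLED",
            }
        },
    )
```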
I think that there's a definite story
around when Fargate becomes acceptable, when using other things begins to make more sense.
And one thing I'm starting to see more and more of as I talk to people about this is the idea of traditionally what you would see with on-prem versus cloud.
You own the base and rent the peak.
I'm starting to see people with EKS or ECS clusters, or running kops or whatever it is to run Kubernetes,
but then having a burstability story that goes to Fargate since it's instantly available,
it scales effectively forever. And the only real downside is a bit of cost at stupendous scale.
Right. Yeah. I think with Fargate, just the flexibility that you get from being able to scale quickly,
it just outweighs any cost impact, in my opinion.
With EC2, even with optimized ECS AMIs, you're still looking at a minute to a minute 30
just for an instance to be available and ready for traffic.
And that doesn't even include starting the container.
Whereas a lot of our benchmarking with Fargate,
we were benchmarking five-second start times from nothing.
So having a container not exist and be ready in five seconds,
to me, given our workload, outweighed the financial impacts.
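If you wanted to reproduce that kind of benchmark yourself, a rough sketch with boto3's built-in waiter might look like the following. The waiter polls on a multi-second interval, so treat the measurement as approximate; all names are placeholders.

```python
# Rough cold-start timer: run one Fargate task and measure the time
# until ECS reports it RUNNING.
import time
import boto3

ecs = boto3.client("ecs")

def time_cold_start(cluster: str, task_def: str, subnet: str) -> float:
    started = time.monotonic()
    resp = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_def,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": [subnet]}
        },
    )
    task_arn = resp["tasks"][0]["taskArn"]
    # Blocks until the task reaches RUNNING; polls every few seconds,
    # so the result is an upper bound on the true start time.
    ecs.get_waiter("tasks_running").wait(cluster=cluster, tasks=[task_arn])
    return time.monotonic() - started
```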
Do you find that that instant-on experience works just as well
for one or two containers as it does for dozens, hundreds, or thousands? Or are there certain tipping points where it no longer is able to deliver reasonable fleet performance on demand? I'm not talking about service limits; I'm just talking about raw capacity. The cloud is not infinitely scalable. Source: tried it. At what point do you wind up seeing inflections, if any,
or aren't they really manifesting in the service?
They haven't manifested for us yet.
I assume, given all of the things in AWS, that it will eventually manifest.
Luckily, we haven't hit that problem yet, though.
Yeah, it turns out that all things are finite at a large enough scale.
This is not, incidentally, intended to sound in any way, shape, or form as a ding on Fargate.
It's just when someone approaches you with a new service and says, here you go, it's awesome,
I have an ops background. My immediate question is, terrific, where is it going to break? If you
don't know and understand what the failure modes look like, you're
in for a bad time when your customers discover them.
And they will discover them.
Yeah, I absolutely agree.
We ran our Fargate deployments through
a lot of load tests, just trying to break it,
basically trying to see when we started seeing issues.
And all things considered, given the amount of time we wanted to put into it,
we were not actually able to break it
from an error that was AWS related.
Right.
And I think that there's a lot of challenge
as far as trying to understand,
okay, is this something that's local to my account?
Is it local to this particular availability zone?
Is it local to the service itself?
Were you in the pre-announced beta period where it was just
limited to a few customers? Were you using it just from day one where it went GA? Was there
something else? Or am I not allowed to ask you that question? I don't actually know.
We were in the beta period, but only maybe a couple of weeks before it went GA.
By every account that I've been able to get,
Fargate is awesome.
My single complaint with it
is that its name is absolutely terrible.
It's almost like a code name
that's snuck out into the real world.
If I tell someone I'm using Fargate,
everyone looks at me blankly
unless they know exactly what it is.
There's no good way to infer a name from it,
such as Simple Storage Service.
Well, if I've never heard of S3,
I can probably ferret out what that means.
With Fargate, give up.
There's no good way to get there from first principles.
Yeah, the name is,
I always go immediately to Stargate,
which I assume most other nerds will as well.
Oh, thank God, it's not just me.
Something else that was of note in your blog post was that the queues that exist between your
components are using Kafka for communication between all these different pieces. Now,
let me qualify this. I am not at all interested in starting a religious war over what is the chosen queue and what is awful and only used by heretics.
But I will ask this, was it a difficult decision arriving at we'll use Kafka,
or was it a relatively straightforward shot?
It was relatively straightforward. We try and
be fairly agnostic and also not incite our own holy wars. We use a number of other queue services internally.
I think Kafka just made the most sense for the specific workload, stream-based ingestion.
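As a concrete illustration of that stream-based ingestion pattern, here is a minimal sketch using the kafka-python client. The broker address, topic name, and message shape are hypothetical, not Veritone's actual pipeline.

```python
# Minimal Kafka producer for stream-based ingestion (kafka-python).
# Broker, topic, and payload fields are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.internal:9092",  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_chunk(media_id: str, offset: int, chunk: dict) -> None:
    """Stream one chunk of ingested media onto the shared topic."""
    producer.send(
        "media-ingest",                # placeholder topic name
        key=media_id.encode("utf-8"),  # keeps a media item on one partition
        value={"media_id": media_id, "offset": offset, **chunk},
    )

producer.flush()  # block until buffered records are delivered
```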
So it wasn't too difficult of a decision.
There's a lot of noise these days about
picking only things that you can pick up and move as they are to another cloud provider.
Looking at what you've built, I'm not entirely sure what that would even begin to look like.
Was avoiding provider lock-in in any way, shape, or form
on your strategic roadmap?
Or was it, well, if we ever have to move, we'll deal with it then?
Or did I just cause a whole bunch of executives
to go completely white as they realize,
oh my word, we're locked in?
Veritone is very cognizant of vendor lock-in.
We actually have an offering of our product that you can ship and run in your own data centers.
So we're very cognizant of making sure
that when we write code that's specific
to a technology like Fargate,
we write it very small and use shims and make the actual integration as decoupled as possible. For example, after we did
the Fargate deployment, we reworked a lot of the APIs that use Fargate to use other things like
Kubernetes or Docker Swarm as well.
Gotcha. I like the model because you're starting off with
something that embraces whatever the provider is offering. And then you go back and add shim layers that wind up making it portable
if you need to. If you're going to be targeting the idea of being provider agnostic, especially
as you need to be as you're meeting your customers where they are in your use case, it makes perfect
sense. That's why it's a best practice, not you must always do this. I think that's a terrific
architectural model. First generation, embrace whatever it is the provider gives you. Generation two, let's see what we can do to decouple this
in some areas where it makes sense. Yeah, I think a lot of people get lost
spending too much time and energy on being agnostic to their technology. And I think it's important.
But I also think that you get to a certain point where you're giving up all the benefits that you
might have gained by using that technology just to be agnostic. And at that point, to me, it doesn't make a lot
of sense. So I like to try and design things around best use case for the workload and then
work from there.
Which is absolutely the right move. Wait, what do you mean you're not doing
something that's architecturally perfect in favor of chasing down something you'll never
need to implement? People just like to focus on the wrong part of the story.
Yeah, I agree entirely.
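To make the shim idea from a moment ago concrete, here is a minimal sketch of provider-specific launchers hidden behind one small interface, so Fargate can later be swapped for Kubernetes or Docker Swarm. The class and method names are hypothetical, not Veritone's code.

```python
# Shim pattern: the rest of the codebase talks only to ContainerLauncher,
# and each provider integration stays small and replaceable.
from abc import ABC, abstractmethod

class ContainerLauncher(ABC):
    @abstractmethod
    def launch(self, image: str, env: dict) -> str:
        """Start a container; return an opaque handle for status checks."""

class FargateLauncher(ContainerLauncher):
    def launch(self, image: str, env: dict) -> str:
        # Thin wrapper around ecs.run_task(launchType="FARGATE", ...)
        raise NotImplementedError

class KubernetesLauncher(ContainerLauncher):
    def launch(self, image: str, env: dict) -> str:
        # Thin wrapper around the Kubernetes Jobs API
        raise NotImplementedError
```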
Your blog post originally appeared on the Veritone corporate blog.
And as someone who writes an awful lot of blog posts myself, this is of personal interest to me.
Your blog was invited to have a guest spot on the AWS Architecture blog.
My blog posts generally get threatened with cease and desist letters if I go too far.
How did you wind up getting your post featured on something that is an AWS property?
So we're a pretty large customer for AWS.
We have, well, large in the sense that we give them money,
not large in comparison to other AWS customers. There's always a bigger fish.
We're big enough that they pay attention to our bill. And so I think that they noticed when we
started using Fargate that our solutions architect and TAMs all reached out and were like,
Hey, we see you guys are using this new technology. What are you using it for? We're
really interested in your use case. And it just set up some conversations with their product managers
and the lead architects around Fargate, where we kind of walked through what we were building.
And they asked us if we'd be interested in co-writing a blog for the AWS Architecture Series.
What was that process like? Was it essentially, here you go, here's a blog post that we wrote,
and they said, cool, and published it as is, and it surprised you?
Was there a 15-round revision process?
Sorry, for those of us who dream of one day seeing our name up in lights,
it's interesting to understand what it is to go through that process.
Yeah, it actually was surprising to me because our internal processes
took quite a lot longer than theirs.
We did a lot of review internally with our marketing and legal teams before we sent it to AWS just to make sure that we had all of our bases covered and we were talking about things in the right way.
And by the time we sent it to AWS, they actually had no revisions for us.
It was just a waiting period for them to
find the right time and blog series to send it out with. So from the time we gave it to them,
there was not really a lot of back and forth until they told us, hey, your blog's being published.
It's nice to wind up having it just sail through like that. I
generally tend to not write in a style that lends itself to that.
Yeah, I think we just got lucky.
Well, to that end,
anytime I've given a talk or written a blog post about a technical solution or an architecture I was proud of,
if I've then taken that post
and I go and show it to some of my coworkers
who worked with me on building that thing in the first place,
their response is, yeah, it's a great piece of fiction you wrote there.
That's not the project that I remember.
And they're right.
I'm of the personality type where I will block out some of the negative issues, mostly to
keep myself from waking up in the night screaming.
But it's always sort of a glossy, polished, final version.
And if you follow a lot of other blogs that discuss similar things, this is a common pattern.
There's generally some form of, I guess, wishful thinking and polishing it up.
And, oh, it's easy.
We just sat down at our computers one day, and that was 9 o'clock in the morning.
And by lunchtime, we had this architecture that appears in this blog post.
And I don't care if you're writing hello world, it's never that simple or easy to pull off.
So can you talk a little bit about behind the scenes?
What were the, I guess, pain points as you were building this out?
What didn't go according to plan?
What could have worked but didn't and needed to be worked around in some ways?
Sure. So I think the biggest issues we really had were based on how new Fargate is, and on our
general understanding, or laziness from not wanting to read the documentation:
running into things like limit issues. We were basically requesting too many containers to be launched
consecutively. AWS had to slap us on the wrist and tell us to stop. Luckily, we had some really
good conversations with the engineering team and the service team for Fargate, and we were able to
get a lot of these limits increased. But that back and forth and kind of not knowing what was going
on or why things were breaking was definitely a pain point. I think another pain point that we ran into with Fargate specifically was it doesn't
handle large containers very well. And with a lot of AI engines, you have these really, really big
flat files that are like six gigs. And trying to launch a six gig container in Fargate, if anyone
figures out how to do that well, please reach out to me.
I'd love to hear about it.
But for us, comparing a regular Go container that's 5 to 10 megs
to a 6-gig container was like 5 seconds compared to 15 minutes to launch containers.
It was very, very slow and painful,
and we actually ended up not being able to use Fargate for some of our
larger containers.
This is far from an isolated occurrence, incidentally. I've spoken with other
clients of mine who are in similar situations, and their question is, great, so how do I go about
effectively launching a 10-gigabyte container using... And I don't even need to listen to the
rest of that sentence, because the only answer, almost regardless of technology provider, is you don't. You don't launch a container that's
that large unless you have nothing but time because it's not going to be performant. Getting
it out to where it needs to go takes forever. And a whole host of things that arise from the idea
that containers are envisioned, for better or worse, as being a relatively lightweight, thin
thing that winds up being tossed to a bunch of things at the same time. Not, well, okay, it's easier for us to deploy our
container via one of those trucks that Amazon has, Snowmobile, that has 100 petabytes of storage in
the back because it takes too long to get out there over the network. At some tipping point,
this is in some ways the wrong tool for the job as it's currently being imagined.
Yeah, I agree. One of the reasons we use containers for everything, including these engines that maybe it doesn't make sense for, is that Veritone actually has another service called
Veritone Developer Application, or VDA. And this allows anyone in the world, you or me, to go write
an AI engine or any type of engine that you'd like and upload it into the Veritone system.
So if you want an engine that can tell you hot dog or not hot dog,
very specifically, you can write one and upload it to Veritone
and then use that engine later to go compare hot dogs.
So given the nature of all the different developers that would be
submitting code into our platform, we needed some sort of a common technology that would allow us
to ingest and deploy the engines that they submit to us in a predictable and similar fashion. So
Docker was the obvious choice.
I think that you're probably right based upon that. The challenge,
of course, is always trying to disambiguate the hype
from what people are actually doing and how they're approaching things.
It's everyone says I should be using this particular technology,
and that technology, incidentally, changes from week to week.
It's virtualization, it's cloud, it's containers, it's Kubernetes,
it's serverless, it's wait 20 minutes, we'll have another one of these.
But making sure the problem you have looks an awful lot like the one that the tool is aimed at
is usually a step that some people tend to gloss over.
Yeah, I definitely would agree.
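Circling back to the VDA discussion: here is a purely hypothetical sketch of the kind of uniform contract that lets arbitrary developer-submitted engines run as interchangeable containers. Veritone's real engine interface is not documented here and may differ.

```python
# Hypothetical engine contract: every engine container reads one JSON
# task from stdin and writes one JSON result to stdout, so the platform
# can deploy and invoke all of them the same way.
import json
import sys

def process(task: dict) -> dict:
    """Engine-specific logic goes here, e.g. hot dog / not hot dog."""
    return {"label": "hot dog", "confidence": 0.42}  # stub result

if __name__ == "__main__":
    print(json.dumps(process(json.load(sys.stdin))))
```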
So to that end, what advice would you give someone who read your blog post,
was entranced by it, and is determined to follow in your architectural
footsteps? Don't be afraid to fail because we failed numerous times trying to build this
the first couple of go-rounds. Just go in, dive into it, and you'll be surprised with how powerful
the technology can be. Fargate is a pretty cool tool, and I expect it to evolve to be, I think,
one of their biggest services over the next few years.
I suspect you're probably not going to be wrong on that.
Thank you so much for being so generous with your time.
Thanks, Corey. Appreciate it.
Christopher Stobie of Veritone. I'm Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud. You can also find more Corey
at screaminginthecloud.com
or wherever fine snark is sold.