Screaming in the Cloud - Episode 26: I’m not a data scientist, but I work for an AI/ML startup building on Serverless Containers
Episode Date: September 5, 2018

Do you deal with a lot of data? Do you need to analyze and interpret data? Veritone's platform is designed to ingest audio, video, and other data through batch processes to process the media and attach output, such as transcripts or facial recognition data. Today, we're talking to Christopher Stobie, a DevOps professional with more than seven years of experience building and managing applications. Currently, he is the director of site reliability engineering at Veritone in Costa Mesa, Calif. Veritone positions itself as a provider of artificial intelligence (AI) tools designed to help other companies analyze and organize unstructured data. Previously, Christopher was a technical account manager (TAM) at Amazon Web Services (AWS); lead DevOps engineer at Clear Capital; lead DevOps engineer at ESI; cloud consultant at Credera; and worked on Patriot/THAAD missile fire control in the U.S. Army. Besides staying busy with DevOps and missiles, he enjoys playing racquetball in short shorts and drinking good (not great) wine.

Some of the highlights of the show include:

- Various problems can be solved with AI; companies are spending time and money on it
- Tasks that are a little too intelligent to solve with simple software can still be automated
- Machine learning (ML) models are applicable for many purposes; real people with real problems, who are not academics, can use ML
- Fargate is instant-on Docker containers as a service; it handles infrastructure scaling, but involves some management expense
- Instant-on works with numerous containers, but there will probably be a point when it no longer delivers reasonable fleet performance on demand
- The decision to use Kafka was based on the workload: stream-based ingestion
- Veritone writes code that tries to avoid provider lock-in and makes each integration as decoupled as possible
- People spend too much time and energy being agnostic to their technology, giving up its benefits
- If you dream about seeing your name up in lights, Christopher describes the process of writing a post for AWS

Pain points: the newness of Fargate and unfamiliarity with it; limit issues; being unable to handle large containers.

Links:

- Veritone
- Christopher Stobie on LinkedIn
- Building Real-Time AI with AWS Fargate
- SageMaker
- Fargate
- Docker
- Kafka
- DigitalOcean
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This week's episode of Screaming in the Cloud is generously sponsored
by DigitalOcean. I would argue that every cloud platform out there biases for different things.
Some bias for having every feature you could possibly want offered as a managed service at
varying degrees of maturity. Others bias for, hey, we heard there's some money to be made in the cloud space. Can you give us some of it?
DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things,
and they all said more or less the same thing. Other offerings have a bunch of shenanigans
around root access and IP addresses.
DigitalOcean makes it all simple.
In 60 seconds, you have root access to a Linux box with an IP.
That's a direct quote, albeit with profanity about other providers taken out.
DigitalOcean also offers fixed price offerings. You always know what you're going to wind up paying this month,
so you don't wind up having a minor heart issue when the bill comes in.
Their services are also understandable without spending three months going to cloud school.
You don't have to worry about going very deep to understand what you're doing.
It's click button or make an API call and you receive a cloud resource.
They also include very understandable monitoring and alerting.
And lastly, they're not exactly what I would call small time. Over 150,000 businesses are using them today. So go ahead and
give them a try. Visit do.co slash screaming, and they'll give you a free $100 credit to try it out.
That's do.co slash screaming. Thanks again to DigitalOcean for their support of Screaming in the Cloud.
Hello and welcome to Screaming in the Cloud.
I'm Corey Quinn.
I'm joined this week by Christopher Stobie, who's the director of SRE at Veritone.
He's also a former TAM at AWS, but that's not really what I wanted to invite him here to talk about.
Instead, a blog post went out somewhat recently about architecture that he's been working on.
So first, welcome to the show, Christopher.
Hey, Corey. Good to be here. Thanks for having me.
No, thanks for being so generous with your time.
So let's start at the very beginning.
I first became aware that you folks existed with a post that was put up on the Amazon official architecture blog,
and I'll throw a link to it in the show notes,
that was titled Building Real-Time AI with AWS Fargate.
So I read that five or six times,
and eventually I had a vague idea
of what you were talking about
and did a little more digging.
So for those who were starting off
in the same place that I was,
Veritone is a company that likes to position itself as a provider of artificial intelligence
tools designed to help other companies analyze and organize unstructured data, such as audio,
video, and images. What does that mean, using small words?
Yeah, the description there is a little bit of a mouthful.
I think the best way I like to describe it is actually more of an anecdote or a story.
So with normal AI, if you want to do, say, something like image recognition or speech-to-text or this or that,
any of these different capabilities that exist,
you have to go write a service that connects to that engine,
and you have to write an API layer that's very specific and very singular.
Veritone abstracts all that and says, hey, you can learn the Veritone API,
and you can get access to any engine that we have in our ecosystem and make a single call and describe what you want
and get results against any of the
engines that we support. So I like to look at Veritone as a unification layer, a single API
for lots of different AI. It's easy to fall into the trap that I did when I started researching
into what it is you would actually build that, oh, you're talking about AI and machine learning.
It's probably a
few people who are sitting in a garage somewhere. They've gotten a seed round, maybe a Series A,
and holy crap, you're publicly traded on the NASDAQ. So this is no longer the sort of thing
that's just the remit of hobbyists or focused on what-if, far-future technology. This is something that the market believes in strongly.
This is something that's here today,
albeit one that's still being built into a clear-cut use case.
As things stand today, what problems might I have
that look like something that AI might be able to help me with?
Yeah, so I think there's a lot of different things
that can be solved with AI.
And I think a lot of really big companies
are applying a lot of time and money
into building the AI that changes the world.
But I think in the meantime, before 20 years passes,
there's a lot of menial stuff,
or maybe menial isn't the right word,
but there's a lot of tasks that can be automated
that are a little bit too intelligent to just write simple software around.
Things like analyzing court case documents, ingesting them and transcribing them from text
into an indexed searchable object in a database is something that traditionally was done by humans
and took a lot of time and energy.
Instead, you can scan a document, run it through a transcription engine, and you have your results indexed and searchable in a few hours or even faster, depending on whether or not you use Veritone.
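As a rough illustration of that single-call model, here is a minimal Python sketch in which you describe the capability you want rather than wiring up a specific engine. The endpoint, payload fields, and routing behavior are invented for illustration and are not Veritone's actual API.

```python
# Hypothetical sketch of a "single API for many engines" call.
# The URL, payload shape, and field names are placeholders, not
# Veritone's real interface.
import requests

API_URL = "https://api.example.com/v1/jobs"  # placeholder endpoint

def transcribe_and_index(document_url: str, token: str) -> dict:
    """Submit one job; the platform picks a transcription engine."""
    payload = {
        "target": document_url,
        "capability": "transcription",  # describe what you want...
        "engine": "any",                # ...not which engine does it
        "output": {"index": True},      # make results searchable
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # job handle to poll for the indexed transcript
```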
One of the interesting challenges about this entire space, from my perspective, is just the sheer applicability of machine learning models to different things.
A while back when SageMaker first came out,
I gave it a few months
and then asked on my ridiculous newsletter,
who's using SageMaker and for what?
Because personally, I'm not a data scientist.
I'm not someone who has the wherewithal
or the expertise to have intelligent conversations
around these things.
And what amazed me was, first, the sheer volume of replies I got.
Secondly, the fact that everyone was doing something different with it.
And lastly, that they all started with some form of the sentence,
I'm not a data scientist, but.
This is rapidly turning into something that real people with real problems who are not themselves academics are able to touch and use and get exposure to.
Yeah, I'm not a data scientist, but I definitely agree.
I think that AI is expanding, and we're growing into a field that just demands accuracy and results at a much faster pace than humans can deliver.
Absolutely. You wound up mentioning in your post that this entire system that you describe is built
around Fargate. For those who aren't aware, this is effectively instant-on Docker containers as a
service. Picture serverless Docker. And in addition to starting a war with that phrase,
you're effectively not that far from what this looks like.
It's you throw a Docker container at AWS, it handles all of the infrastructure scaling for you.
The downside to this, of course, is that first, there is some management expense tied into that. And second, on a one-to-one compute level, you will wind up spending more per container hour or per container second
than you would for a similar amount of compute on EC2.
Do you find that the value that you get
from having something managed entirely for you
offsets that economic cost?
Or is there a tipping point where,
okay, we're now large enough on these workloads
that moving to EKS or ECS or something else eventually becomes a foregone conclusion?
Yeah. So I think that when
we went out and we started doing this, we kind of looked at it twofold. We looked at it first
with the assumption that eventually Fargate would have some sort of a pre-provisioned billing,
kind of like pre-buying DynamoDB throughput or purchasing reserved instances.
Fargate is young, so I think we assumed that eventually it would have a better billing model than it currently does.
But given that it doesn't today, part of the architecture actually includes mixing and
matching Fargate and EC2.
So we have very bursty traffic, and we designed the mean of our traffic, the average load
to run on reserved instances on EC2
and for all the bursts to scale in Fargate.
So it is very expensive
and we were very conscious of the economic impact
that our company would have internally
if we went entirely Fargate.
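For flavor, here is a minimal boto3 sketch of the burst half of that split. It assumes the baseline load already runs as an ECS service on the reserved EC2 fleet, so only the Fargate overflow path is shown; the cluster, task definition, and subnet names are placeholders.

```python
# Burst-to-Fargate sketch: the steady-state fleet runs on reserved EC2,
# and spikes are absorbed by on-demand Fargate tasks.
import boto3

ecs = boto3.client("ecs")

def launch_burst_workers(count: int) -> None:
    """Start extra workers on Fargate to absorb a traffic spike."""
    ecs.run_task(
        cluster="media-processing",        # placeholder cluster name
        taskDefinition="ingest-worker:7",  # placeholder task definition
        launchType="FARGATE",
        count=min(count, 10),              # run_task launches at most 10 per call
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # placeholder
                "assignPublicIp": "DISABLED",
            }
        },
    )
```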
I think that there's a definite story
around when Fargate becomes acceptable, when using other things begins to make more sense.
And one thing I'm starting to see more and more of as I talk to people about this is the idea of traditionally what you would see with on-prem versus cloud.
You own the base and rent the peak.
I'm starting to see people with EKS or ECS clusters, or running kops or whatever it is to run Kubernetes,
but then having a burstability story that goes to Fargate since it's instantly available,
it scales effectively forever. And the only real downside is a bit of cost at stupendous scale.
Right. Yeah. I think with Fargate, just the flexibility that you get from being able to scale quickly,
it just outweighs any cost impact, in my opinion.
With EC2, even with optimized ECS AMIs, you're still looking at a minute to a minute 30
just for an instance to be available and ready for traffic.
And that doesn't even include starting the container.
Whereas a lot of our benchmarking with Fargate,
we were benchmarking five-second start times from nothing.
So having a container not exist and be ready in five seconds,
to me, given our workload, outweighed the financial impacts.
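If you wanted to reproduce that kind of benchmark yourself, a rough sketch with boto3's built-in waiter might look like the following. The waiter polls on a multi-second interval, so treat the measurement as approximate; all names are placeholders.

```python
# Rough cold-start timer: run one Fargate task and measure the time
# until ECS reports it RUNNING.
import time
import boto3

ecs = boto3.client("ecs")

def time_cold_start(cluster: str, task_def: str, subnet: str) -> float:
    started = time.monotonic()
    resp = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_def,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": [subnet]}
        },
    )
    task_arn = resp["tasks"][0]["taskArn"]
    # Blocks until the task reaches RUNNING; polls every few seconds,
    # so the result is an upper bound on the true start time.
    ecs.get_waiter("tasks_running").wait(cluster=cluster, tasks=[task_arn])
    return time.monotonic() - started
```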
Do you find that that instant-on experience works just as well
for one or two containers as it does for dozens, hundreds, or thousands? Or are there certain tipping points where it no longer is able to deliver reasonable fleet performance on demand? I'm not talking about service limits; I'm just talking about raw capacity. The cloud is not infinitely scalable. Source: tried it. At what point do you wind up seeing inflections, if any,
or aren't they really manifesting in the service?
They haven't manifested for us yet.
I assume, given all of the things in AWS, that it will eventually manifest.
Luckily, we haven't hit that problem yet, though.
Yeah, it turns out that all things are finite at a large enough scale.
This is not, incidentally, intended to sound in any way, shape, or form as a ding on Fargate.
It's just when someone approaches you with a new service and says, here you go, it's awesome,
I have an ops background. My immediate question is, terrific, where is it going to break? If you
don't know and understand what the failure modes look like, you're
in for a bad time when your customers discover them.
And they will discover them.
Yeah, I absolutely agree.
We ran our Fargate deployments through
a lot of load tests, just trying to break it,
basically trying to see when we started seeing issues.
And all things considered, given the amount of time we wanted to put into it,
we were not actually able to break it
from an error that was AWS related.
Right.
And I think that there's a lot of challenge
as far as trying to understand,
okay, is this something that's local to my account?
Is it local to this particular availability zone?
Is it local to the service itself?
Were you in the pre-announced beta period where it was just
limited to a few customers? Were you using it just from day one where it went GA? Was there
something else? Or am I not allowed to ask you that question? I don't actually know.
We were in the beta period, but only maybe a couple of weeks before it went GA.
By every account that I've been able to get,
Fargate is awesome.
My single complaint with it
is that its name is absolutely terrible.
It's almost like a code name
that's snuck out into the real world.
If I tell someone I'm using Fargate,
everyone looks at me blankly
unless they know exactly what it is.
There's no good way to infer a name from it,
such as Simple Storage Service.
Well, if I've never heard of S3,
I can probably ferret out what that means.
With Fargate, give up.
There's no good way to get there from first principles.
Yeah, the name is,
I always go immediately to Stargate,
which I assume most other nerds will as well.
Oh, thank God, it's not just me.
Something else that was of note in your blog post was that the queues that exist between your
components are using Kafka for communication between all these different pieces. Now,
let me qualify this. I am not at all interested in starting a religious war over what is the chosen queue and what is awful and only used by heretics.
But I will ask this, was it a difficult decision arriving at we'll use Kafka,
or was it a relatively straightforward shot?
It was relatively straightforward. We try and
be fairly agnostic and also not incite our own holy wars. We use a number of other queue services internally.
I think Kafka just made the most sense for the specific workload, stream-based ingestion.
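As a concrete illustration of that stream-based ingestion pattern, here is a minimal sketch using the kafka-python client. The broker address, topic name, and message shape are hypothetical, not Veritone's actual pipeline.

```python
# Minimal Kafka producer for stream-based ingestion (kafka-python).
# Broker, topic, and payload fields are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.internal:9092",  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_chunk(media_id: str, offset: int, chunk: dict) -> None:
    """Stream one chunk of ingested media onto the shared topic."""
    producer.send(
        "media-ingest",                # placeholder topic name
        key=media_id.encode("utf-8"),  # keeps a media item on one partition
        value={"media_id": media_id, "offset": offset, **chunk},
    )

producer.flush()  # block until buffered records are delivered
```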
So it wasn't too difficult of a decision.
There's a lot of noise these days about
picking only things that you can pick up and move as they are to another cloud provider.
Looking at what you've built, I'm not entirely sure what that would even begin to look like.
Was avoiding provider lock-in in any way, shape, or form
on your strategic roadmap?
Or was it, well, if we ever have to move, we'll deal with it then?
Or did I just cause a whole bunch of executives
to go completely white as they realize,
oh my word, we're locked in?
Veritone is very cognizant of vendor lock-in.
We actually have an offering of our product that you can ship and run in your own data centers.
So we're very cognizant of making sure
that when we write code that's specific
to a technology like Fargate,
we write it very small and use shims and make the actual integration as decoupled as possible. For example, after we did
the Fargate deployment, we reworked a lot of the APIs that use Fargate to use other things like
Kubernetes or Docker Swarm as well.
Gotcha. I like the model because you're starting off with
something that embraces whatever the provider is offering. And then you go back and add shim layers that wind up making it portable
if you need to. If you're going to be targeting the idea of being provider agnostic, especially
as you need to be as you're meeting your customers where they are in your use case, it makes perfect
sense. That's why it's a best practice, not you must always do this. I think that's a terrific
architectural model. First generation, embrace whatever it is the provider gives you. Generation two, let's see what we can do to decouple this
in some areas where it makes sense. Yeah, I think a lot of people get lost
spending too much time and energy on being agnostic to their technology. And I think it's important.
But I also think that you get to a certain point where you're giving up all the benefits that you
might have gained by using that technology just to be agnostic. And at that point, to me, it doesn't make a lot
of sense. So I like to try and design things around best use case for the workload and then
work from there.
Which is absolutely the right move. Wait, what do you mean you're not doing
something that's architecturally perfect in favor of chasing down something you'll never
need to implement? People just like to focus on the wrong part of the story.
Yeah, I agree entirely.
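To make the shim idea from a moment ago concrete, here is a minimal sketch of provider-specific launchers hidden behind one small interface, so Fargate can later be swapped for Kubernetes or Docker Swarm. The class and method names are hypothetical, not Veritone's code.

```python
# Shim pattern: the rest of the codebase talks only to ContainerLauncher,
# and each provider integration stays small and replaceable.
from abc import ABC, abstractmethod

class ContainerLauncher(ABC):
    @abstractmethod
    def launch(self, image: str, env: dict) -> str:
        """Start a container; return an opaque handle for status checks."""

class FargateLauncher(ContainerLauncher):
    def launch(self, image: str, env: dict) -> str:
        # Thin wrapper around ecs.run_task(launchType="FARGATE", ...)
        raise NotImplementedError

class KubernetesLauncher(ContainerLauncher):
    def launch(self, image: str, env: dict) -> str:
        # Thin wrapper around the Kubernetes Jobs API
        raise NotImplementedError
```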
Your blog post originally appeared on the Veritone corporate blog.
And as someone who writes an awful lot of blog posts myself, this is of personal interest to me.
Your blog was invited to have a guest spot on the AWS Architecture blog.
My blog posts generally get threatened with cease and desist letters if I go too far.
How did you wind up getting your post featured on something that is an AWS property?
So we're a pretty large customer for AWS.
We have, well, large in the sense that we give them money,
not large in comparison to other AWS customers. There's always a bigger fish.
We're big enough that they pay attention to our bill. And so I think that they noticed when we
started using Fargate that our solutions architect and TAMs all reached out and were like,
Hey, we see you guys are using this new technology. What are you using it for? We're
really interested in your use case. And it just set up some conversations with their product managers
and the lead architects around Fargate, where we kind of walked through what we were building.
And they asked us if we'd be interested in co-writing a blog for the AWS Architecture Series.
What was that process like? Was it essentially, here you go, here's a blog post that we wrote,
and they said, cool, and published it as is, and it surprised you?
Was there a 15-round revision process?
Sorry, for those of us who dream of one day seeing our name up in lights,
it's interesting to understand what it is to go through that process.
Yeah, it actually was surprising to me because our internal processes
took quite a lot longer than theirs.
We did a lot of review internally with our marketing and legal teams before we sent it to AWS just to make sure that we had all of our bases covered and we were talking about things in the right way.
And by the time we sent it to AWS, they actually had no revisions for us.
It was just a waiting period for them to
find the right time and blog series to send it out with. So from the time we gave it to them,
there was not really a lot of back and forth until they told us, hey, your blog's being published.
It's nice to wind up having it just sail through like that. I
generally tend to not write in a style that lends itself to that.
Yeah, I think we just got lucky.
Well, to that end,
anytime I've given a talk or written a blog post about a technical solution or an architecture I was proud of,
if I've then taken that post
and I go and show it to some of my coworkers
who worked with me on building that thing in the first place,
their response is, yeah, it's a great piece of fiction you wrote there.
That's not the project that I remember.
And they're right.
I'm of the personality type where I will block out some of the negative issues, mostly to
keep myself from waking up in the night screaming.
But it's always sort of a glossy, polished, final version.
And if you follow a lot of other blogs that discuss similar things, this is a common pattern.
There's generally some form of, I guess, wishful thinking and polishing it up.
And, oh, it's easy.
We just sat down at our computers one day, and that was 9 o'clock in the morning.
And by lunchtime, we had this architecture that appears in this blog post.
And I don't care if you're writing hello world, it's never that simple or easy to pull off.
So can you talk a little bit about behind the scenes?
What were the, I guess, pain points as you were building this out?
What didn't go according to plan?
What could have worked but didn't and needed to be worked around in some ways?
Sure. So I think the biggest issues we really had were based on how new Fargate is, and on our
general understanding, or laziness from not wanting to read the documentation:
running into things like limit issues. We were basically requesting too many containers to be launched
consecutively. AWS had to slap us on the wrist and tell us to stop. Luckily, we had some really
good conversations with the engineering team and the service team for Fargate, and we were able to
get a lot of these limits increased. But that back and forth and kind of not knowing what was going
on or why things were breaking was definitely a pain point. I think another pain point that we ran into with Fargate specifically was it doesn't
handle large containers very well. And with a lot of AI engines, you have these really, really big
flat files that are like six gigs. And trying to launch a six gig container in Fargate, if anyone
figures out how to do that well, please reach out to me.
I'd love to hear about it.
But for us, comparing a regular Go container that's 5 to 10 megs
to a 6-gig container was like 5 seconds compared to 15 minutes to launch containers.
It was very, very slow and painful,
and we actually ended up not being able to use Fargate for some of our
larger containers.
This is far from an isolated occurrence, incidentally. I've spoken with other
clients of mine who are in similar situations, and their question is, great, so how do I go about
effectively launching a 10-gigabyte container using... And I don't even need to listen to the
rest of that sentence, because the only answer, almost regardless of technology provider, is you don't. You don't launch a container that's
that large unless you have nothing but time because it's not going to be performant. Getting
it out to where it needs to go takes forever. And a whole host of things that arise from the idea
that containers are envisioned, for better or worse, as being a relatively lightweight, thin
thing that winds up being tossed to a bunch of things at the same time. Not, well, okay, it's easier for us to deploy our
container via one of those trucks that Amazon has, Snowmobile, that has 100 petabytes of storage in
the back because it takes too long to get out there over the network. At some tipping point,
this is in some ways the wrong tool for the job as it's currently being imagined.
Yeah, I agree. One of the reasons we use containers for everything, including these engines that maybe it doesn't make sense for, is that Veritone actually has another service called
Veritone Developer Application, or VDA. And this allows anyone in the world, you or me, to go write
an AI engine or any type of engine that you'd like and upload it into the Veritone system.
So if you want an engine that can tell you hot dog or not hot dog,
very specifically, you can write one and upload it to Veritone
and then use that engine later to go compare hot dogs.
So given the nature of all the different developers that would be
submitting code into our platform, we needed some sort of a common technology that would allow us
to ingest and deploy the engines that they submit to us in a predictable and similar fashion. So
Docker was the obvious choice.
I think that you're probably right based upon that. The challenge,
of course, is always trying to disambiguate the hype
from what people are actually doing and how they're approaching things.
It's everyone says I should be using this particular technology,
and that technology, incidentally, changes from week to week.
It's virtualization, it's cloud, it's containers, it's Kubernetes,
it's serverless, it's wait 20 minutes, we'll have another one of these.
But making sure the problem you have looks an awful lot like the one that the tool is aimed at
is usually a step that some people tend to gloss over.
Yeah, I definitely would agree.
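Circling back to the VDA discussion: here is a purely hypothetical sketch of the kind of uniform contract that lets arbitrary developer-submitted engines run as interchangeable containers. Veritone's real engine interface is not documented here and may differ.

```python
# Hypothetical engine contract: every engine container reads one JSON
# task from stdin and writes one JSON result to stdout, so the platform
# can deploy and invoke all of them the same way.
import json
import sys

def process(task: dict) -> dict:
    """Engine-specific logic goes here, e.g. hot dog / not hot dog."""
    return {"label": "hot dog", "confidence": 0.42}  # stub result

if __name__ == "__main__":
    print(json.dumps(process(json.load(sys.stdin))))
```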
So to that end, what advice would you give someone who read your blog post,
was entranced by it, and is determined to follow in your architectural
footsteps? Don't be afraid to fail because we failed numerous times trying to build this
the first couple of go-rounds. Just go in, dive into it, and you'll be surprised with how powerful
the technology can be. Fargate is a pretty cool tool, and I expect it to evolve to be, I think,
one of their biggest services over the next few years.
I suspect you're probably not going to be wrong on that.
Thank you so much for being so generous with your time.
Thanks, Corey. Appreciate it.
Christopher Stobie of Veritone. I'm Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud. You can also find more Corey
at screaminginthecloud.com
or wherever fine snark is sold.