Screaming in the Cloud - The Man Behind the Curtain at Zoph with Victor Grenu

Episode Date: October 25, 2022

About Victor

Victor is an Independent Senior Cloud Infrastructure Architect working mainly on Amazon Web Services (AWS), designing secure, scalable, reliable, and cost-effective cloud architectures and dealing with large-scale and mission-critical distributed systems. He also has long experience in Cloud Operations, Security Advisory, Security Hardening (DevSecOps), Modern Applications Design, Micro-services and Serverless, Infrastructure Refactoring, and Cost Saving (FinOps).

Links Referenced:

Zoph: https://zoph.io/
unusd.cloud: https://unusd.cloud
Twitter: https://twitter.com/zoph
LinkedIn: https://www.linkedin.com/in/grenuv/

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode is brought to us in part by our friends at Datadog. Datadog's a SaaS monitoring and security platform that enables full stack observability for developers,
Starting point is 00:00:41 IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500-plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third-party services in a single pane of glass. Combine these with drag-and-drop dashboards and machine learning-based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14-day trial and get a complimentary T-shirt when you install the agent. To learn more, visit datadoghq.com slash screaming in the cloud to get started. That's www.datadoghq.com
Starting point is 00:01:28 slash screaming in the cloud. Managing shards, maintenance windows, over-provisioning, ElastiCache bills. I know, I know, it's spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true auto-scaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co slash screaming. That's g-o-m-o-m-e-n-t-o dot c-o slash screaming. Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the best parts about running a podcast like this and trolling the internet of AWS things is every once in a while I get to learn something radically different than what I expected. For a long time, there's been this sort of persona or brand in the AWS
Starting point is 00:02:33 space, specifically the security side of it, going by Zoph, that's Z-O-P-H. And I just assumed it was a collective or a whole bunch of people working on things, and it turns out that, nope, it is just one person. And that one person is my guest today. Victor Grenu is an independent AWS architect. Victor, thank you for joining me. Hey, Corey. Thank you for having me. It's a pleasure to be here. So I want to start by diving into the thing that first really put you on my radar, though I didn't realize it was you at the time. You have what can only be described as an army of Twitter bots around the AWS ecosystem. And I don't even know that I'm necessarily
Starting point is 00:03:19 following all of them, but what are these bots and what do they do? Yeah, I have a few bots on Twitter that push notifications, some tweets, when things happen in the AWS security space, especially when AWS managed policies are updated by AWS. It comes from an initial project from Scott Piper. He was running a Git command on his own laptop to push the history of AWS managed policies. And he told me that I could automate this thing using a deployment pipeline and so on, and tweet every time a new change is detected from AWS. So the idea is to monitor every change on these policies. It's kind of wild because I built a number of somewhat similar Twitter bots,
Starting point is 00:04:15 only instead of trying to make them into something useful, I make them into something more than a little bit horrifying and extraordinarily obnoxious. Like there's a cloud boomer Twitter account that winds up tweeting every time Azure tweets something, only it quote-tweets them in all caps and says something insulting. I have an AWS releases bot called AWS Cwoud.
Starting point is 00:04:36 So that's C-W-O-U-D. And that winds up converting it to UwU-speak. It's like, yay, a new auto-scaling group. That sort of thing is obnoxious and offensive, but it makes me laugh. Yours, on the other hand, are things that I have notifications turned on for, just because when they announce something, it's generally fairly important. The first one that I discovered was your IAM changes bot, and I found some terrifying things coming out of that from time to time. What's the data source for that?
Starting point is 00:05:06 Because I'm just grabbing other people's Twitter feeds or RSS feeds. You're clearly going deeper than that. Yeah, the data source is the official AWS managed policies. In fact, I run the AWS CLI in the background, and I'm just doing a list-policies,
Starting point is 00:05:22 the list-policies command. And with this list, I'm doing a get of each policy that is returned, so I can keep the history in a Git repository and get the full history over time. I also maintain a list of deprecated policies. And I also run a kind of dogfooding initiative: policy validation with AWS's own tools, to validate the consistency and the accuracy of their own policies. So there is a policy validation with their own tool. You would think that that wouldn't turn up anything because their policy validator effectively acts as a linter.
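The tracking loop Victor describes (list the AWS managed policies, fetch each one, keep the history in Git, and tweet when something changes) comes down to diffing two snapshots. Below is a minimal Python sketch under stated assumptions: the snapshot fields loosely mirror what aws iam list-policies --scope AWS and aws iam get-policy-version return, while PolicySnapshot, diff_snapshots, and change_messages are hypothetical names, not part of his project.

```python
# Hypothetical sketch: diff two snapshots of AWS managed IAM policies.
# Field names loosely mirror `aws iam list-policies --scope AWS` output
# (PolicyName, DefaultVersionId); the document would come from a separate
# get-policy-version call. Nothing here talks to AWS.
from dataclasses import dataclass, field


@dataclass
class PolicySnapshot:
    name: str
    default_version_id: str
    document: dict = field(default_factory=dict)


def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two {policy_name: PolicySnapshot} maps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    updated = sorted(
        name
        for name in old.keys() & new.keys()
        if old[name].default_version_id != new[name].default_version_id
        or old[name].document != new[name].document
    )
    return {"added": added, "updated": updated, "removed": removed}


def change_messages(changes: dict) -> list:
    """One short notification line per change, e.g. for a bot to post."""
    labels = {"added": "New", "updated": "Updated", "removed": "Removed"}
    return [
        f"{labels[kind]} AWS managed policy: {name}"
        for kind in ("added", "updated", "removed")
        for name in changes[kind]
    ]


if __name__ == "__main__":
    before = {
        "ExamplePolicyA": PolicySnapshot("ExamplePolicyA", "v1", {"Version": "2012-10-17"}),
        "ExamplePolicyB": PolicySnapshot("ExamplePolicyB", "v4"),
    }
    after = {
        "ExamplePolicyA": PolicySnapshot("ExamplePolicyA", "v2", {"Version": "2012-10-17"}),
        "ExamplePolicyC": PolicySnapshot("ExamplePolicyC", "v1"),
    }
    for line in change_messages(diff_snapshots(before, after)):
        print(line)
```

Keeping each snapshot as files in a Git repository, as described, yields the same information through the commit history; the function just makes the comparison explicit.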
Starting point is 00:06:08 So if it throws an error, of course you wouldn't wind up pushing that. And yet somehow the fact that you have bothered to hook that up and have findings from it indicates that that's not how the real world works. Yeah, there are some, let's say, false positives, because we are running the policy validation with their own linter on their own policies. But this is something that is documented by AWS. There is an official page where you can find why the linter is not working on each policy, with an explanation for each finding. I'm thinking
Starting point is 00:06:47 of the ReadOnlyAccess managed policy, which is so long that the policy analyzer crashes on it. Excellent. It's odd to me that you have gone down this path because it's easy enough to look at this and assume that, oh, this must just be something you do for fun or as an aspect of your day job. So I did a little digging into what your day job is, and this rings very familiar to me. You are an independent AWS consultant, only you're based out of Paris, whereas I was doing this from San Francisco due to an escalatingly poor series of life choices on my part. What do you focus on in the AWS consulting world? Yeah, I'm running an AWS consulting boutique in Paris, and I'm working for a large customer in France.
Starting point is 00:07:39 And I'm doing mostly infrastructure stuff, infrastructure design for cloud-native applications, and I'm also doing some security audits and remediation for my customers. It seems to me that there's a definite divide as far as how people find the AWS consulting experience to be. And I'm not trying to cast judgment here, but the stories that I hear tend to fall into one of two categories. One of them is the story that you have, where you're doing this independently, you've been on your own for a while,
Starting point is 00:08:15 working specifically on this. And then there's the stories of, oh yeah, I work for a 500-person consultancy and we do everything as long as they'll pay us money. If they've got money, we'll do it. Why not? And it always seems to me, not to be overly judgy, but the independent consultants just seem happier about it because for better or worse, we get to choose what we focus on in a way that I don't think you do at a larger company. Yeah, it's the same in France or in Europe. There are a lot of consulting firms.
Starting point is 00:08:47 But with the pandemic and with the market where we are working in the cloud, on cloud-native solutions and so on, there is a lot of demand. And the natural path is to start by working for a consulting firm. And then when you are ready, when you have many AWS certifications, when you have customer experience, when you have a network of well-known customers and you have gained trust from your customers, I think it's natural to go out on your own, to be independent, and to choose your own projects and your own customers. I'm curious to get your take on what your perception of being an AWS consultant is when you're based in Paris versus, in my case, being based on the West Coast of the United States. And I know that's a bit of a strange question, but even when I travel, for example, over to the East Coast,
Starting point is 00:09:46 suddenly my own newsletter sends out three hours later in the day than I expect it to, and that throws me for a loop. The AWS announcements don't come out at two or three in the afternoon. They come out at dinnertime. And for you, it must be in the middle of the night when a lot of those things wind up dropping. The AWS stuff, not my newsletter. I imagine you're not excitedly waiting on tenterhooks to see what this week's issue of last week in AWS talks about, like I am. But I'm curious just that even beyond that,
Starting point is 00:10:13 how do you experience the market from what you're perceiving people in the United States talking about as AWS consultants versus what you see in Paris? It's difficult. In fact, I don't have much information about independents in the U.S. I know there are a lot, but I think it's more common in Europe. And yeah, it's an advantage to have a 10-hour time difference from the U.S., because a lot
Starting point is 00:10:40 of stuff happens on Pacific time, in the Seattle and San Francisco time zone. So, for example, for this podcast, my Monday is over right now. So, yeah, I have some advantage in time, but yeah. This is potentially an odd question for you, but I find an awful lot of the AWS documentation to be challenging, we'll call it. I don't always understand exactly what it's trying to tell me. And it's not at all clear that the person writing the documentation about a service, in some cases, has ever used the service.
Starting point is 00:11:18 And in everything I just said, there is no language barrier. This documentation was theoretically written in English, and most days I can stumble through a sentence in English and almost no other language. You obviously speak French as a first language, given that you live in Paris. It seems to be a relatively common affliction. How do you find interacting with AWS in French? Or is it just a complete non-starter and it all has to happen in English for you? No, in fact, the consultants in Europe, I think,
Starting point is 00:11:54 in fact, for my part, I'm using my laptop in English, I'm using my phone in English, I'm using the AWS console in English, and so on. So for the documentation, I switch to English first, because in other languages there is sometimes automated translation that can be very dangerous. So we all keep the documentation and the materials in English. It's wild to me just looking at how challenging so much of this stuff is, having to then work in a second language on top of that.
Starting point is 00:12:35 It just seems almost insurmountable to me. It's good that they have automated translation for a lot of this stuff, but that falls down in often hilariously disastrous ways sometimes. It's wild to me. Even taking most programming languages that folks have ever heard of, even if you program and speak no English, which happens in a large part of the world, you're still using if statements,
Starting point is 00:12:55 even if the term if doesn't mean anything to you localized in your language. It really is, in many respects, an English-centric industry. Yeah, completely. Yeah, even in front of a large French customer, I'm writing the PowerPoint presentation in English, some emails are in English,
Starting point is 00:13:17 even if all the folks in the thread are French. So, yeah. One other area that I wanted to explore with you a bit is that you are very clearly focused on security as a primary area of interest. Does that manifest in the work that you do as well? Do you find that your consulting engagements tend to have a high degree of focus on security?
Starting point is 00:13:41 Yeah. In my designs, when I'm doing AWS architecture, my main objective is to design security architecture and security patterns, and to apply best practices and least privilege. I also work on security audits for startups, for internal customers, for various companies, and I do the remediation afterward. To run my audits, I'm using some open source tooling, some custom scripts, and so on. I have a methodology that I run for each customer. Sometimes the goal is to prepare for a certification,
Starting point is 00:14:27 like PCI DSS, or to ensure that best practices are correctly applied on a workload or before a go-live. One of the weirdest things about this to me is that I've said for a long time that cost and security tend to be inextricably linked as far as being a sort of a trailing reactive afterthought for an awful lot of companies.
Starting point is 00:14:51 They care about both of those things right after they failed to adequately care about those things. At least in the cloud economic space, it's only money as opposed to, oops, we accidentally lost our customers' data. So I always find myself drifting in a security direction if I don't stop myself, just based upon a lot of the cost work I do.
Starting point is 00:15:12 Conversely, it seems that you have come from the security side and you find yourself drifting in a costing direction. Your side project is a SaaS offering called unusd.cloud. That's U-N-U-S-D dot cloud. And when you first mentioned this to me, my immediate reaction was, oh, great, another SaaS platform for costing. Let's tear this one apart too.
Starting point is 00:15:36 Except I actually like what you're building. Tell me about it. Yeah, unusd.cloud is a side project for me, and I've been working on it for, let's say, one year. It was a product that I deployed for some of my customers in their own accounts, and it was very useful. So I was thinking that it could be a SaaS product, and I've spent a few months shifting the product to a SaaS approach. The product aims to detect waste in AWS accounts across all AWS regions. It scans all your AWS accounts and all your regions, and it tries to detect unused EC2, RDS,
Starting point is 00:16:27 Glue dev endpoints, SageMaker, unattached EBS volumes, and so on. I don't build a new dashboard or a new Cost Explorer. It's just cost awareness: a notification by email, Slack, or Microsoft Teams. You just add your AWS account to the product and schedule it, let's say, once a day. It scans and sends you a cost-awareness, waste-detection notification, and you can act by turning off what is not used. What I like about this is it cuts at the number one rule of cloud economics, which is turn that shit off if you're not using it.
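As a rough illustration of the "notification, not dashboard" design, here is a minimal Python sketch of the reporting step only: per-region findings are grouped into one plain-text digest and wrapped in the {"text": ...} JSON body that Slack incoming webhooks accept. The Finding type, the function names, and the message format are assumptions for illustration; the episode does not describe how unusd.cloud formats its alerts.

```python
# Hypothetical sketch of a daily waste-scan report: findings from every
# region are grouped into one digest and wrapped in the {"text": ...} JSON
# body that Slack incoming webhooks accept. Names are illustrative only.
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    region: str         # e.g. "eu-west-3"
    resource_type: str  # e.g. "EC2", "RDS", "EBS"
    resource_id: str
    reason: str


def build_digest(account_alias: str, findings: list) -> str:
    """Render findings grouped by region, sorted for stable output."""
    if not findings:
        return f"[{account_alias}] No unused resources detected."
    by_region = defaultdict(list)
    for finding in findings:
        by_region[finding.region].append(finding)
    lines = [f"[{account_alias}] {len(findings)} possibly unused resource(s):"]
    for region in sorted(by_region):
        lines.append(f"{region}:")
        for f in sorted(by_region[region], key=lambda x: (x.resource_type, x.resource_id)):
            lines.append(f"  - {f.resource_type} {f.resource_id}: {f.reason}")
    return "\n".join(lines)


def slack_webhook_body(text: str) -> str:
    """JSON body for a Slack incoming webhook (to be POSTed to the webhook URL)."""
    return json.dumps({"text": text})


if __name__ == "__main__":
    digest = build_digest("demo-account", [
        Finding("us-east-1", "EC2", "i-0123456789abcdef0", "low CPU and NetworkOut for 7 days"),
        Finding("eu-west-3", "EBS", "vol-0123456789abcdef0", "not attached to any instance"),
    ])
    print(slack_webhook_body(digest))
```

Sending nothing but a short digest keeps the output actionable: the reader either turns the resource off or tags it to be ignored.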
Starting point is 00:17:13 You wouldn't think that I would need to say that, except that everyone seems to be missing that on some level. And it's easy to do. When you need to spin something up and it's not there, you're very highly incentivized to spin that thing up. When you're not using it, you have to remember that that thing exists. Otherwise, it just sort of sits there forever and doesn't do anything. It just costs money and doesn't generate any value in return for that. What you got right is you've also eviscerated my most common complaint about tools that claim to do this, which is you build in either an explicit rule of ignore this resource or ignore resources with the following tags.
Starting point is 00:17:52 The benefit there is that you're not constantly giving me useless advice like, oh yeah, turn off this idle thing. It's, yeah, that's there for a reason. Maybe it's my dev box. Maybe it's my backup site. Maybe it's the entire DR environment that I'm going to need at little notice.
Starting point is 00:18:08 It solves for that problem beautifully. And though a lot of tools out there claim to do stuff like this, most of them really fail to deliver on that promise. Yeah, I just want to keep it simple. I don't want to add an additional console and so on. And yeah, you are correct. You can apply a simple tag on your asset.
Starting point is 00:18:30 Let's say an EC2 instance. You apply the tag unusd with the value off, and then alerting is disabled for this asset. The detection is based on the CPU average and NetworkOut metrics. So when the instance has not been used in the last seven days, with a low CPU average and low network out, it is flagged as a suspect. One thing that I like about what you've done, though I also have some reservations about it, is that you have not done what so many of these tools do, which is, oh, just give us all the access in your account. It'll be fine.
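The rule Victor describes (an opt-out tag, a seven-day CPU average, and low NetworkOut) can be sketched in Python as below. This is an illustration under assumptions, not unusd.cloud's code: the tag key unusd and value off are as heard in the episode, and both thresholds are invented placeholders.

```python
# Hypothetical sketch of the idle-instance rule described in the episode:
# flag an instance when both its 7-day average CPU and NetworkOut are low,
# unless it carries the opt-out tag. The tag key/value are as heard in the
# episode; both thresholds are invented placeholders, not unusd.cloud's values.
from dataclasses import dataclass, field

OPT_OUT_TAG_KEY = "unusd"                      # assumption
OPT_OUT_TAG_VALUE = "off"
CPU_AVG_THRESHOLD_PERCENT = 1.0                # placeholder
NETWORK_OUT_AVG_THRESHOLD_BYTES = 1_000_000    # placeholder


def flatten_tags(ec2_tags) -> dict:
    """EC2 describe calls return tags as [{"Key": ..., "Value": ...}]."""
    return {tag["Key"]: tag["Value"] for tag in (ec2_tags or [])}


@dataclass
class InstanceUsage:
    instance_id: str
    cpu_avg_percent: float        # 7-day average of CPUUtilization
    network_out_avg_bytes: float  # 7-day average of NetworkOut
    tags: dict = field(default_factory=dict)


def is_opted_out(tags: dict) -> bool:
    """True when the opt-out tag is present with value "off" (case-insensitive)."""
    return tags.get(OPT_OUT_TAG_KEY, "").lower() == OPT_OUT_TAG_VALUE


def is_suspect(usage: InstanceUsage) -> bool:
    """Suspect = not opted out AND low CPU AND low outbound traffic."""
    if is_opted_out(usage.tags):
        return False
    return (
        usage.cpu_avg_percent < CPU_AVG_THRESHOLD_PERCENT
        and usage.network_out_avg_bytes < NETWORK_OUT_AVG_THRESHOLD_BYTES
    )
```

Requiring both metrics to be low is what keeps a CPU-idle box that still moves traffic from being flagged; as discussed later in the episode, a workload that is busy only in RAM stays invisible to a rule like this without the CloudWatch agent.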
Starting point is 00:19:12 You can trust us. Don't you want to save money? And yeah, but I also still want to have a company left when all is said and done. You are very specific on what it is that you're allowed to access, and it's great. I would argue on some level, it's almost too restrictive. For example, you have the ability to look at EC2, Glue, IAM (just to look at the account alias, which is great), RDS, Redshift, and SageMaker.
Starting point is 00:19:36 And all of these are simply list and describe. There are no gets in there other than in Cost Explorer, which makes sense. You're not able to go rummaging through my data and see what's there. But that also bounds you on some level to being able to look only at particular types of resources. Is that accurate?
Starting point is 00:19:52 Or are you using a lot of the CloudWatch stuff and Cost Explorer stuff to see other areas? In fact, it's least-privilege, read-only permissions, because I don't want too many questions from the security team. So it's fully read-only, and I've only added the detections that I currently support.
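To make "least privilege, read-only" concrete, here is a hypothetical policy in the spirit of the permissions discussed (EC2, Glue, the IAM account alias, RDS, Redshift, SageMaker, CloudWatch metrics, and Cost Explorer), plus a coarse Python check that every allowed action is a List, Describe, or Get call. The action names are real IAM actions, but this is an illustration, not the actual unusd.cloud template.

```python
# Hypothetical least-privilege, read-only policy in the spirit of what is
# described in the episode. This is NOT the actual unusd.cloud template;
# it is an illustration plus a coarse audit check.
import json

EXAMPLE_READ_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WasteDetectionReadOnly",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRegions",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeAddresses",
                "glue:ListDevEndpoints",
                "iam:ListAccountAliases",
                "rds:DescribeDBInstances",
                "redshift:DescribeClusters",
                "sagemaker:ListNotebookInstances",
                "sagemaker:ListEndpoints",
                "cloudwatch:GetMetricStatistics",
                "ce:GetCostAndUsage",
            ],
            "Resource": "*",
        }
    ],
}

READ_ONLY_PREFIXES = ("List", "Describe", "Get")


def non_read_only_actions(policy: dict) -> list:
    """Return Allow actions whose verb is not List*, Describe*, or Get*.

    This is a coarse prefix check: "Get" can still read data (for example
    s3:GetObject), so a real review should also look at which Gets are granted.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            _, _, verb = action.partition(":")  # "ec2:*" -> verb "*"
            if not verb.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


if __name__ == "__main__":
    print(json.dumps(EXAMPLE_READ_ONLY_POLICY, indent=2))
    print("Non-read-only actions:", non_read_only_actions(EXAMPLE_READ_ONLY_POLICY))
```

Adding a detection later (the snapshot example that follows) means adding actions to this list, which is why an outdated customer template simply leaves the new detection switched off rather than breaking the scan.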
Starting point is 00:20:14 Then if in some weeks or some months I add a new detection, let's say for snapshots, I will need to update it, so I will ask my customers to update their template. There is a mechanism inside the product to tell them that the template is obsolete, but it's not a breaking change. So the detection will continue, but without the new detection,
Starting point is 00:20:43 the new snapshot detection, let's say. So yeah, it's least privilege, and all I need is GetMetricStatistics from CloudWatch to detect unused assets, plus checks for things like detached Elastic IPs or detached EBS volumes. There is no CloudWatch in those detections. Also, to be clear, I am not suggesting that what you have done is at all a mistake, even if you bound it to those resources right now. Just because everyone loves to talk about these exciting, amazing, high-level services that AWS has put up there. For example, oh, what about DocumentDB or all these other, you know, Amazon Basics, MongoDB, same thing, or all of these other things
Starting point is 00:21:31 that they wind up offering. But you take a look at where customers are spending money and where they're surprised to be spending money, it's EC2. It's a bit of RDS. Occasionally it's S3, but that's a lot harder to detect automatically whether that data is unused.
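Victor notes that the detached Elastic IP and EBS volume checks need no CloudWatch data at all. A minimal Python sketch of those two checks, assuming input dicts shaped like boto3's ec2.describe_volumes() and ec2.describe_addresses() responses; the function names are hypothetical, and pagination and the per-region loop are left out.

```python
# Hypothetical sketch: detached EBS volumes and unassociated Elastic IPs can be
# spotted from EC2 describe responses alone, with no CloudWatch metrics.
# The input dicts mirror the shape of boto3's ec2.describe_volumes() and
# ec2.describe_addresses() responses; pagination and multi-region looping
# are left out.


def unattached_volume_ids(describe_volumes_response: dict) -> list:
    """Volumes in the "available" state are not attached to any instance."""
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if volume.get("State") == "available"
    ]


def unassociated_addresses(describe_addresses_response: dict) -> list:
    """An Elastic IP with no AssociationId is allocated but not in use."""
    return [
        address.get("AllocationId", address.get("PublicIp"))
        for address in describe_addresses_response.get("Addresses", [])
        if "AssociationId" not in address
    ]


if __name__ == "__main__":
    volumes = {"Volumes": [
        {"VolumeId": "vol-aaa", "State": "in-use"},
        {"VolumeId": "vol-bbb", "State": "available"},
    ]}
    addresses = {"Addresses": [
        {"AllocationId": "eipalloc-111", "PublicIp": "203.0.113.10", "AssociationId": "eipassoc-1"},
        {"AllocationId": "eipalloc-222", "PublicIp": "203.0.113.11"},
    ]}
    print(unattached_volume_ids(volumes))      # ['vol-bbb']
    print(unassociated_addresses(addresses))   # ['eipalloc-222']
```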
Starting point is 00:21:46 You haven't been using this data very much. Well, you see how the bucket is labeled archive backups or regulatory logs. Imagine that. What a ridiculous concept. Yeah. Whereas an idle EC2 instance sort of can wind up being useful on this. I am curious whether you encounter in the wild in your customer base, folks who are having idle-looking EC2 instances, but are in fact, for example, using a whole bunch of RAM,
Starting point is 00:22:15 which you can't tell from the outside without custom CloudWatch agents. Yeah, I'm not detecting this behavior for large usage of RAM, for example, or for a custom application that is low on CPU and doesn't talk to any other services over the network. But with this detection, with the current state of the detection, I'm covering a large majority of waste, because what I see from my
Starting point is 00:22:48 customers is that there are some teams, some data scientists or data teams, who are experimenting a lot with SageMaker, with Glue dev endpoints, and so on, and this is very expensive at the end of the day because they don't turn off the lights at the end of the day on Friday evening. So what I'm trying to solve here is to notify the team on Slack when they forget to turn off the most common waste on AWS,
Starting point is 00:23:19 so EC2, RDS, Redshift. I just now wound up installing it, while we've been talking, on my dedicated shitposting account. And sure enough, it already spat out a single instance it found, which, yeah, it was running an EC2 instance on the East Coast when I was just there
Starting point is 00:23:37 so that I had a DNS server that was a little bit more local. Okay, great. And it's a t4g.micro, so it's not exactly a whole lot of money, but it does exactly what it says on the tin. It didn't wind up nailing the other instances I have in that account
Starting point is 00:23:51 that I'm using for a variety of different things, which is good. And it further didn't wind up falling into the trap that so many things do, which is the, oh, it's costing you zero and your spend this month is zero because this account is where I dump all of my AWS credit codes. So many things say, oh, well, it's not costing you anything. So what's the problem?
Starting point is 00:24:14 And then that's how you accidentally lose $100,000 in Activate credits because someone left something running way too long. It does a lot of the right things that I would hope and expect it to do. And the fact that you don't do that is kind of amazing. Yeah, it was a need from my customer and an opportunity. It's a small bet for me because I'm trying to do some small bets, you know, the small bet approach. So the idea is to try a new thing. It's also an excuse for me to learn something new because building a SaaS is challenging. One thing that I am curious about, in this account, I'm also running the controller for
Starting point is 00:24:57 my home Wi-Fi environment. And that's not huge. It's a t3.small, but it is still something out there that sits there because I need it to exist. But it's relatively bored. If I go back and look over the last week of CloudWatch metrics, for example, it doesn't look like it's hugely busy. I mean, sure, there's some network traffic in and out as it updates itself and whatnot, but the CPU peaks out at a little under 2% used. It didn't warn on this, and it got it right. I'm just curious as to how you did that. What is it looking for to determine
Starting point is 00:25:28 whether this instance is unused or not? It's the magic. There is some intelligence. No, I'm just kidding, it's just statistics. I'm getting two metrics, the CPU average over the last seven days and the network out. And I'm getting the average on those metrics.
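The "just statistics" step can be sketched as two small Python helpers: one builds keyword arguments for CloudWatch GetMetricStatistics over the last seven days, the other averages the returned datapoints. The one-day period and the plain mean of daily averages are assumptions, not details from the episode; with boto3 the request would be passed as boto3.client("cloudwatch").get_metric_statistics(**metric_request(...)).

```python
# Hypothetical sketch of the metrics step: build arguments for CloudWatch
# GetMetricStatistics over the last seven days and average the datapoints.
# With boto3 this would be used as
#   boto3.client("cloudwatch").get_metric_statistics(**metric_request(...))
# The one-day period and the plain mean of daily averages are assumptions.
from datetime import datetime, timedelta, timezone
from typing import Optional

LOOKBACK = timedelta(days=7)
PERIOD_SECONDS = 86_400  # one datapoint per day


def metric_request(instance_id: str, metric_name: str, now: Optional[datetime] = None) -> dict:
    """Keyword arguments for one EC2 metric, e.g. CPUUtilization or NetworkOut."""
    end = now if now is not None else datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - LOOKBACK,
        "EndTime": end,
        "Period": PERIOD_SECONDS,
        "Statistics": ["Average"],
    }


def average_of(response: dict) -> Optional[float]:
    """Mean of the "Average" values in a GetMetricStatistics response."""
    values = [point["Average"] for point in response.get("Datapoints", [])]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    request = metric_request("i-0123456789abcdef0", "CPUUtilization")
    print(request["StartTime"], "->", request["EndTime"])
    sample = {"Datapoints": [{"Average": 0.8}, {"Average": 1.2}, {"Average": 1.0}]}
    print(average_of(sample))
```

Returning None when there are no datapoints lets a caller treat "no data" (for example, a stopped instance) differently from "low usage".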
Starting point is 00:25:51 And I'm making the assumption that this specific EC2 instance is not used because of these metrics, these averages. Yeah, it is wild to me just that this is working as well as it is. It just does exactly what I would expect it to do. It's clear that, and this is going to sound weird, but I'm going to say it anyway, this was built by someone who is looking to answer the question themselves and not from the perspective of, well, we need to build a product and we have access to all of this data from the API. How can we slice and dice it and add some value as we go? I really like the approach that you've taken on this.
Starting point is 00:26:33 I don't say that often or lightly, particularly when it comes to cloud costing stuff, but this is something I'll be using in some of my own nonsense. Thanks, I appreciate it. So I really want to thank you for taking as much time as you have to talk about who you are and what you're up to. If people want to learn more, where can they find you?
Starting point is 00:26:50 Mainly on Twitter. My handle is Z-O-P-H, Zoph. And yeah, on LinkedIn or on my company website, zoph.io. And we will, of course, put links to that in the show notes. Thank you so much for your time today.
Starting point is 00:27:09 I really appreciate it. Thank you, Corey, for having me. It was a pleasure to chat with you. Victor Grenu, independent AWS architect. I'm cloud economist,
Starting point is 00:27:20 Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that is going to cost you an absolute arm and a leg, because invariably, you're going to forget to turn it off when you're done. If your AWS bill keeps rising and your blood pressure is doing
Starting point is 00:27:48 the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
