Screaming in the Cloud - Understanding CDK and The Well Architected Framework with Matt Coulter
Episode Date: August 25, 2022

About Matt: Matt is a Sr. Architect in Belfast, an AWS DevTools Hero, Serverless Architect, author, and conference speaker. He is focused on creating the right environment for empowered teams to rapidly deliver business value in a well-architected, sustainable, and serverless-first way. You can usually find him sharing reusable, well-architected, serverless patterns over at cdkpatterns.com or behind the scenes bringing CDK Day to life.

Links Referenced:
Previous guest appearance: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/slinging-cdk-knowledge-with-matt-coulter/
The CDK Book: https://thecdkbook.com/
Twitter: https://twitter.com/NIDeveloper
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by Honeycomb.
When production is running slow, it's hard to know where problems originate.
Is it your application code, users, or the underlying systems?
I've got five bucks on DNS, personally.
Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ,
guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both
your team and your business. You should care more about one of those than the other. Which one is
up to you? Drop the separate pillars and
enter a world of getting one unified understanding of the one thing driving your business, production.
With Honeycomb, you guess less and know more. Try it for free at honeycomb.io slash screaming in the
cloud. Observability, it's more than just hipster monitoring.

In terms of what's available, AWS offers NVIDIA A100 GPUs on instances
that only come in one size and cost 32 bucks an hour.
Lambda offers instances that offer those GPUs
as single card instances for $1.10 an hour.
That's 73% less per GPU.
That doesn't require any long-term commitments
or predicting what your usage is going to look like years down the road. So if you need GPUs, check out Lambda. In beta,
they're offering 10 terabytes of free storage, and this is key, data ingress and egress are both
free. Check them out at lambdalabs.com slash cloud. That's L-A-M-B-D-A-L-A-B-S dot com slash cloud. Welcome to Screaming in the Cloud. I'm
Corey Quinn. One of the best parts about, well, I guess being me, is that I can hold opinions
that are probably polite and call them incendiary. And that's great because I usually like to back
them in data. But what happens when things change?
What happens when I learn new things?
Well, do I hold on to that original opinion with two hands and a death grip?
Or do I admit that I was wrong in my initial opinion about something?
Let's find out.
My guest today returns from earlier this year.
Matt Coulter is a senior architect, since he has been promoted,
at Liberty Mutual. Welcome back, and thanks for joining me.
Yeah, thanks for inviting me back, especially to talk about this topic.
Well, we spoke about it a fair bit at the beginning of the year, and if you're listening
to this and you haven't heard that show, it's not that necessary to go into. Mostly, it was me
spouting uninformed opinions about the CDK, the Cloud Development Kit. For those who are unfamiliar, I think of it more or
less as, what if you could just structure your cloud resources using a programming language you
claim to already know, but in practice copy and paste from Stack Overflow like the rest of us?
Matt, you probably have a better description of what the CDK is in practice.
Yeah. So we like to say it's imperative code written in a declarative way or declarative code written in an imperative way. Either way, it lets you write code that produces
CloudFormation. So it doesn't really matter what you write in your script. The point is,
at the end of the day, you still have the CloudFormation template that comes out of it.
So the whole piece of it is that it's a developer experience, developer speed play,
that if you're from a background that you're more used to writing a programming language than a
YAML, you might actually enjoy using the CDK over writing straight CloudFormation or SAM.
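To make Matt's "code that produces CloudFormation" framing concrete, here is a dependency-free sketch of the idea. This is not the real CDK API (in actual CDK you would instantiate constructs and run `cdk synth`); every name in it is invented purely for illustration.

```typescript
// Illustrative sketch only: the real CDK synthesizes templates for you via
// constructs and `cdk synth`. This hand-rolled function just shows the core
// idea that imperative code ultimately emits a declarative CloudFormation
// template. All identifiers (synthBucketTemplate, logicalId) are made up.
interface CfnTemplate {
  Resources: Record<string, { Type: string; Properties?: object }>;
}

function synthBucketTemplate(logicalId: string, versioned: boolean): CfnTemplate {
  return {
    Resources: {
      [logicalId]: {
        Type: "AWS::S3::Bucket",
        Properties: versioned
          ? { VersioningConfiguration: { Status: "Enabled" } }
          : {},
      },
    },
  };
}

// Whatever logic ran above, the deployable artifact is the JSON that comes out.
console.log(JSON.stringify(synthBucketTemplate("AssetsBucket", true), null, 2));
```

The point Matt makes holds either way: no matter what the code does, the thing that actually deploys is the template it emits.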
When I first kicked the tires on the CDK, my first initial obstacle, which I've struggled
with in this industry for a bit, is that I'm just good enough of a programmer to get myself
in trouble.
Whenever I wind up having a problem that Stack Overflow doesn't immediately shine a light
on, my default solution is to resort to my weapon of choice, which is brute force.
That sometimes works out, sometimes doesn't. And as I went through the CDK
a couple of times in service to a project that I'll explain shortly, I made a bunch of missteps
with it. The first and most obvious one is that AWS claims publicly that it has support in a bunch
of languages: .NET, Python, there's obviously TypeScript, there's Go support for it. I believe
that went generally available. And I'm sure I'm missing one or two, I think. Aren't I?
Yeah, so TypeScript, JavaScript, Python, Java, .NET, and Go. I think those are the currently
supported languages.
Java, that's the one that I keep forgetting. It's the block printing to JavaScript, which is
basically Java cursive. The problem I run into, and this is true
of most things in my experience, when a company says that we have deployed an SDK for all of the
following languages, there is very clearly a first-class citizen language and then the rest
that more or less drift along behind with varying degrees of fidelity. In my experience, when I
tried it for the first time in Python,
it was not a great experience for me.
When I learned just enough JavaScript,
and by extension, TypeScript, to be dangerous,
it worked a lot better.
Or at least I could blame all the problems I ran into on my complete novice status
when it comes to JavaScript and TypeScript at the time.
Is that directionally aligned with what you've
experienced given that you work in a large company that uses this? And presumably once you have more
than, I don't know, two developers, you start to take on aspects of a polyglot shop no matter where
you are on some level. Yeah. So personally, I jump between Java, Python, and TypeScript whenever I'm
writing projects. So when it comes to the CDK, you'd
assume I'd be using all three. I typically stick to TypeScript and that's just because personally
I've had the best experience using it. And for anybody who doesn't know the way CDK works for
all the languages, it's not that they have written a custom like SDK for each of these languages.
It's a case of it uses a node process underneath them
and the language actually interacts with,
it's like the compiled JavaScript version
is basically what they all interact with.
So it means there are some limitations
on what you can do in that language.
I can't remember the full list,
but it just means that it is native in all those languages,
but there's certain features that you might be like, ah,
whereas in TypeScript, you can just use all of TypeScript. And my first inclination was
actually, I was using the Python one and I was having issues with some compiler errors
and things that are just caused by that process. And talking in the
cdk.dev Slack community, there is actually a very active... Which is wonderful, I will point out.
Thank you.
There is actually an awesome Python community in there.
But if you ask them, they would all ask for improvements to the language.
So I personally, if someone's new, I always recommend they start with TypeScript
and then branch out as they learn the CDK so they can understand,
is this a me problem or is this a problem caused by the implementation? From my perspective, I didn't do anything approaching that level of deep dive.
I took a shortcut that I find has served me reasonably well in the course of my career.
When I'm trying to do something in Python and you pull up a tutorial, which I'm a big fan of
reading experience reports and blog posts, and here's how to get started. And they all had the same problem, which is step one, run npm install, and that's, hmm,
you know, I don't recall that being a standard part of the Python tooling. It is clearly designed
and interpreted and contextualized through a lens of JavaScript. Let's remove that translation
layer. Let's remove any weird issues I'm going to have in that transpilation process and just
talk in the language it's written in. Will this solve my problems? Oh, absolutely not. But it will remove
a subset of them that I am certain to go blundering into like a small lost child trying to cross an
eight lane freeway. Yeah. I've heard a lot of people say the same thing because the CDK CLI
is a node process. You need it no matter what language you use. So if they were
distributing some kind of universal binary that just integrated with the languages, it would
definitely solve a lot of people's issues with trying to combine languages at deploy time.
One of the challenges that I've had as I go through the process of iterating on the project,
but I guess I should probably describe it for those who have not been following along with
my misadventures. I write blog posts about it from time to time
because I need a toy problem to kick around sometimes because my consulting work is all
advisory and I don't want to be a talking head. I have a Twitter client called lasttweetinaws.com.
It's free. Go and use it. It does all kinds of interesting things for authoring Twitter threads.
And I wanted to deploy that to a bunch of different AWS regions,
as it turns out, 20 or so at the moment.
And that led to a lot of interesting projects
and having to learn how to think
about these things differently,
because no one sensible deploys
an application simultaneously
to what amounts to every AWS region
without canary testing
and having a phased rollout and the rest.
But I'm reckless and,
honestly, as said earlier, a bad programmer. So that works out. And trying to find ways to make this all work and fit together led iteratively towards me discovering that the CDK was really
kind of awesome for a lot of this. That said, there were definitely some fairly gnarly things
I learned as I went through, due in no small part to help I
receive from generous randos in the cdk.dev Slack team. And it's gotten to a point where it's
working. And as an added bonus, I even mostly understand what it's doing, which is just kind
of wild to me. It's one of those interesting things where because it's a programming language,
you can use it out of the box the way it's designed to be used, where you can just write your simple logic, which generates
your cloud formation. Or you can do whatever crazy logic you want to do on top of that to make your
app work the way you want it to work. And providing you're not in a company like Liberty, we're not
going to do a code review. If no one's stopping you, you can do your crazy experiments. And if
you understand that it's good. But I do think something like the multi-region deploy, I mean, with CDK, if you have a construct,
it takes in a variable that you can just say what the region is. So you can actually just write a
for loop and pass it in, which does make things a lot easier than, I don't know, trying to do it
with a YAML, which you can pass in parameters, but you're going to get a lot more complicated,
a lot quicker. The approach that I took philosophically was I wrote everything in a region-agnostic way,
and it would be instantiated and be told what region to run it in as an environment variable,
when the CDK deploy was called. And then I just deploy 20 simultaneous stacks through GitHub
Actions, which invoke a custom runner, so this runs inside of a Lambda function.
And that's just a relatively basic YAML file,
thanks to the magic of GitHub Actions matrix jobs.
So it fires off 20 simultaneous processes on every commit to the main branch.
And then after about two and a half minutes,
it has been deployed globally everywhere.
And I get notified in anything that fails,
which is always fun and exciting to learn those things.
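That matrix fan-out can be sketched roughly like the hypothetical workflow below. The region list is abbreviated (the real one is ~20 regions), and the runner label and deploy command are assumptions, not the actual repository's configuration (Corey mentions a custom Lambda-backed runner rather than GitHub-hosted ones).

```yaml
# Hypothetical GitHub Actions workflow sketch: one deploy job per region,
# fanned out on every push to main. Region list abbreviated; the deploy
# step and runner label are assumptions.
name: deploy-all-regions
on:
  push:
    branches: [main]
jobs:
  deploy:
    strategy:
      fail-fast: false          # one region failing shouldn't cancel the rest
      matrix:
        region: [us-east-1, us-west-2, eu-west-1, ap-southeast-2]  # ...and so on
    runs-on: ubuntu-latest      # swap in the custom runner label in practice
    steps:
      - uses: actions/checkout@v3
      - run: npx cdk deploy --require-approval never
        env:
          AWS_REGION: ${{ matrix.region }}
```

Because each matrix entry is an independent job, all regions deploy in parallel, which is what keeps the global rollout to a couple of minutes.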
That has been, overall,
just a really useful experiment and an experience because you're right,
you could theoretically run this as a single CDK deploy
and then wind up having it iterate
through a list of regions.
The challenge I have there is that
unless I start getting into
really convoluted asynchronous concurrency stuff,
it feels like it'll just take forever.
At two and a half minutes a region times 20 regions,
that's the better part of an hour on every deploy
and no one's got that kind of patience.
So I wound up just parallelizing it
a bit further up the stack.
That said, I bet there are relatively straightforward ways
given that async is a big part of JavaScript
to do this simultaneously.
One of the pieces of feedback I've seen about CDK
is if you have
multiple stacks in the same project, it'll deploy them one at a time. And that's just because it
tries to understand the dependencies between the stacks and then it works out which one should go
first. But a lot of people have said, well, I don't want that. If I have 20 stacks, I want all
20 to go once the way you're saying. And I have seen that people have been writing plugins to enable concurrent deploys
with CDK out of the box.
So it may be something that's,
it's not an out of the box feature,
but it might be something
that you can pull in a community plugin
to actually make work.
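Matt's earlier for-loop suggestion, one stack per region, can be sketched without the CDK libraries like this. In actual CDK code each entry would become `new MyStack(app, stackName, { env: { region } })`; here we only build the per-region configuration objects so the shape of the loop is clear, and all identifiers are illustrative.

```typescript
// Dependency-free sketch of the "for loop over regions" pattern.
// In real CDK, each config would feed a Stack's `env` prop.
interface StackConfig {
  stackName: string;
  env: { region: string };
}

const REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-2"]; // ...~20 in practice

function stackConfigs(appName: string, regions: string[]): StackConfig[] {
  return regions.map((region) => ({
    stackName: `${appName}-${region}`, // unique stack name per region
    env: { region },
  }));
}

console.log(stackConfigs("lasttweetinaws", REGIONS));
```

The trade-off discussed above still applies: defining all the stacks in one app is easy, but the CDK CLI deploys them sequentially by default, which is why the external fan-out (or a concurrency plugin) ends up faster.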
Most of my problems with it at this point
are really problems with CloudFormation.
CloudFormation does not support,
well, at all, SecureString parameters
from the AWS Systems Manager Parameter Store,
which is my default go-to for secret storage. And Secrets Manager is supported,
but that also costs 40 cents a month per secret. And not for nothing, I don't really want to have
all five secrets deployed to Secrets Manager in every region this thing is in. I don't really
want to pay $20 a month for this basically free application just to hold some secrets.
So I wound up talking to some folks in the
Slack channel, and what we came up with was
I have a centralized S3
bucket that has a JSON object that lives
in there. It's only accessible from the deployment
role, and it grabs that
at deploy time and stuffs it into environment
variables when it pushes these things out.
That's the only stateful part of all of this.
And it felt like that is on some level a pattern that a lot of people would benefit from if it had
better native support, with the counter-argument that if you're only deploying to one or two
regions, then Secrets Manager is the right answer for a lot of this, and it's not that big of a deal.
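A bucket policy for the pattern Corey describes might look roughly like the following. Everything here is hypothetical: the bucket name, object key, account ID, and role name are invented; the idea is simply to deny reads of the secrets object to every principal except the deployment role.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OnlyDeployRoleReadsSecrets",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-secrets-bucket/secrets.json",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::123456789012:role/example-deploy-role"
        }
      }
    }
  ]
}
```

An explicit Deny with a `StringNotEquals` carve-out beats relying on the absence of Allow statements, since it also blocks overly broad permissions granted elsewhere in the account.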
Yeah, and it's another one of those things if you were deploying in Liberty,
we'll say, well, your secret is unencrypted at runtime, so you probably need a KMS key involved in
that, which is, you know, the costs of KMS.
It depends on if it's a personal solution or if it's something for like a Fortune 100
company.
And if it's a personal solution, I mean, what you're saying sounds great: it is
restricted in S3, and that way only at deploy time can it be read.
It actually could be a custom construct that someone could build and publish out there to the construct library or the construct hub, I should say.
To be clear, the reason I'm okay with this from a security perspective is, one, this isn't a dedicated AWS account.
This is the only thing that lives in that account. And two, the only API credentials we're talking about are the
application-specific credentials for this Twitter client when it winds up talking to the Twitter
API. Basically, if you get access to these and are able to steal them and deploy somewhere else,
you get no access to customer data or user data, because this is not charged for anything. You get
no access to things that have been sent out.
All you get to do is submit tweets to Twitter,
and it'll have the string last tweet in AWS as your client
rather than whatever normal client you would use.
It's not exactly what we'd call a high-value target
because all the sensitive-to-a-user data
lives in local storage in their browser.
It is fully stateless.
Yeah, so this is what I mean,
like it's the difference in what you're using your app for. Perfect case of you can just go
into the Twitter app and just withdraw those credentials and do it again if something happens,
whereas, as I say, if you're building it for Liberty, it will not pass one of our
well-architected reviews just for that reason. If I were going to go and deploy this in a more, I guess, locked down environment,
I would be tempted to find alternate approaches
such as have it encrypted at rest via KMS
in S3 is one option.
So is having global DynamoDB tables
that wind up grabbing those things
or even grabbing it at runtime if necessary.
There are ways to make that credential
more secure at rest.
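For the DynamoDB variant, access can even be scoped down to specific items via the `dynamodb:LeadingKeys` condition key. A hypothetical policy sketch (table name, region, account ID, and key value are all invented) allowing one function to read only its own credential row:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyOwnCredentialRow",
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/example-credentials",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["twitter-api-secret"]
        }
      }
    }
  ]
}
```

Attached to a single Lambda function's execution role, this means even a compromised function elsewhere in the account can't read that partition key.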
It's just, I look at this
from a real world perspective of what is the actual attack surface on this. And I have a
really hard time just identifying anything that is going to be meaningful with regard to an exploit.
If you're listening to this and have a lot of thoughts on that matter, please reach out. I'm
willing to learn and change my opinion on things. Yeah. One thing I will say about the Dynamo
approach you mentioned, I'm not sure everybody knows this, but inside the same Dynamo table, you can scope
down a row. You can be like this row and this field in this row can only be accessed from this
one Lambda function. So there's, there's a lot of really awesome security features inside DynamoDB
that I don't think most people take advantage of, but they open up a lot of options for simplicity. Is that tied to the very recent
announcement about Lambda getting source ARN as a condition key? In other words, you can say this
specific Lambda function as opposed to a Lambda in this account. That was a relatively recent
advent that I haven't fully explored the nuances of. Yeah, that has opened a lot of doors. I mean,
the Dynamo being able to be locked down to your row has been around for a while, but the new Lambda SourceArn condition is awesome
because, yeah, as you say, you can literally say this thing as opposed to you have to start going
into tags or you have to start going into something else to find it. So I want to talk about something
you just alluded to, which is the Well-Architected Framework. And initially when it launched, it was a whole
framework and AWS made a lot of noise about it on keynote stages, as they are wont to do.
And then later they created a quote-unquote well-architected tool, which, let's be very
direct, it's the checkbox survey form, at least the last time I looked at it. And they now have,
I believe, six pillars of the Well-Architected Framework where they talk about things like security and cost; sustainability is the new pillar.
I don't know, absorbency or whatever the remainders are.
I can't think of them off the top of my head.
How does that map to your experience with the CDK?
Yeah, so out of the box, the CDK from day one was designed to have sensible defaults.
And that's why a lot of the things you deploy have opinions.
I talked to a couple of the heroes and they were like, I wish it had less opinions.
But that's why whenever you deploy something, it's got a bunch of configuration already in there.
For me in the CDK, whenever I use constructs or stacks or deploy anything in the CDK, I always build it in a well-architected way.
And that's such a loaded sentence.
Whenever you say the words well-architected,
people go, what do you mean?
And that's where I go through the six pillars.
And in Liberty, we have a process.
It used to be called SCORP because it was five pillars,
but now it's SCORPS because they added sustainability.
But that's where for every stack we'll go through it.
And we'll be like, okay, let's have the discussion.
And we will use the tool that you mentioned. I mean the tool, as you say, is a bunch of
tick boxes with a text box, but the idea is we'll get in a room, and as we build these starter patterns
or these pieces of infrastructure that people are going to reuse, we'll run the well-architected
review against the framework before anybody gets to generate it. And then we can say out of the box, if you generate this thing, these are the pros and
cons against the well-architected framework of what you're getting, because we can't make it
a hundred percent bulletproof for your use case because we don't know it, but we can tell you
out of the box what it does. And then that way you can keep building. So they start off with
something that is well-documented, how well-architected it is, and then
it makes it a lot easier to have those
conversations as they go forward, because you just have to talk about the delta. As they start adding
their own code, then you can go in and go, okay, you've added these 20 lines, let's talk about what
they do. And that's why I always think you can draw a strong connection between infrastructure as code and well-architected.
As I look through the actual six pillars of the well-architected framework, sustainability,
cost optimization, performance efficiency, reliability, security, and operational excellence,
as I think through the nature of what this shitpost thread Twitter client is, I am reasonably
confident across all of those pillars.
I mean, first off, when it comes to the cost optimization pillar,
please don't come to my house and tell me how that works.
Yeah, obnoxiously, the security pillar is sort of the thing
that winds up causing a problem for this,
because this is in an account deployed by a control tower.
And when I was getting this all set up,
my monthly cost for this thing was something like a dollar in charges.
And then another $16 for the AWS Config rule evaluations on all of the deploys, which just feels like a tax on going about your business, but fine, whatever.
Cost and sustainability, from my perspective, also tend to be hand in glove when it comes to this stuff.
When no one is using the client, it is not taking up any compute resources.
It has no carbon footprint of which to speak, by my understanding. It's very hard to optimize this down further from
a sustainability perspective without barging my way into the middle of an AWS negotiation with
one of its power companies. Yeah, so for everyone listening, watch as we do a live, well-architected
review. Oh yeah, I expect we should do this on Twitter one of these days.
I think it'd be a fantastic conversation,
or Twitch, or whatever the kids are using these days.
Yeah.
And again, so much of it, too,
is thinking about the context of security.
You work for one of the world's largest insurance companies.
I shitpost for a living.
The relative access and consequences
of screwing up the security on this
are nowhere near equivalent. And I think
that's something that often gets lost. The perfect be the enemy of the good.
So that's why, unfortunately, the well-architected tool is quite loose. So that's why they have the
Well-Architected Framework, which is a white paper that just covers everything, which is
quite big. And then they wrote specific lenses for, like, serverless or other use cases that are shorter. And then when you do a well-architected review, it's loose. And
it's sort of like, how are you applying the principles of well-architected and the conversation
that we just had about security. So you would write that down in the box and be like, okay,
so I understand if anybody gets this credential, it means they can post as Last Tweet in AWS.
And that's okay. That's the client, not the Twitter account, to be clear.
Yeah, so that's okay.
That's what you just marked down
in the Well-Architected Review.
And then if we go to do one
in the future,
you can compare it
and we can go,
oh, okay, so last time
you said this.
And you can go,
well, actually,
I decided to,
or we pivoted.
We're a bank now.
Yeah.
So that's it.
We do more than tweets now.
We decided to do
microtransactions
through cryptocurrency
over Twitter.
I don't know.
And that ends this conversation.
No, no.
But yeah, so if something changes,
that's what the Well-Architected Review is for.
It's about facilitating the conversation
between the architect and the engineer.
That's all it is.
This episode is sponsored in parts
by our friend EnterpriseDB.
EnterpriseDB has been powering enterprise applications with
PostgreSQL for 15 years, and now EnterpriseDB has you covered wherever you deploy PostgreSQL,
on-premises, private cloud, and they just announced a fully managed service on AWS and Azure called
Big Animal. All one word. Don't leave managing your database to your cloud vendor
because they're too busy
launching another half dozen
managed databases
to focus on any one of them
that they didn't build themselves.
Instead, work with the experts
over at EnterpriseDB.
They can save you time and money.
They can even help you migrate
legacy applications,
including Oracle,
to the cloud.
To learn more,
try Big Animal for free.
Go to biganimal.com slash snark
and tell them Corey sent you.
And the lenses also are helpful.
This is a serverless application,
so we're going to view it through that lens,
which is great because the original version
of the well-architected tool is,
oh, you built this thing entirely in Lambda.
Have you bought some reserved instances for it?
And it's, yeah, why do I feel like
I have to explain to AWS how their own systems work?
This makes it a lot more streamlined and talks about this, though it still does struggle
with the concept of, in my case, a stateless app.
That is still something that I think is not the common path.
Imagine that.
My code is also non-traditional.
Who knew?
Who knew?
The one thing that's good about it, if anybody doesn't know, they just updated the serverless lens about, I don't know, a week or two ago. So they added in
a bunch more use cases. So if you've read it six months ago or even three months ago,
go back and reread it because they spent a good year updating it.
Thank you for telling me that. That will, of course, wind up in next week's issue of Last
Week in AWS. You can go back and look at the archives and figure out what week we recorded this then.
Good work.
One thing that I have learned as well,
as of yesterday, as it turns out,
before we wound up having this recording,
obviously, because yesterday generally tends to come before today.
That is a universal truism,
is that I had to do a bit of refactoring
because what I learned when I was in New York
live tweeting the AWS Summit
is that the Route 53 latency record
works based upon where your DNS server is.
Yeah, that makes sense.
I use Tailscale and wind up using my Pi hole,
which lives back in my house in San Francisco.
Yeah, I was always getting US West 1
from across the country.
Cool.
For those weird edge cases like me,
because this is not the common case,
how do I force a local region?
Ah, I'll give
it its own individual region prepended as a subdomain. Getting that to work with both the
global lasttweetinaws.com domain as well as the subdomain on API Gateway through the CDK was not
obvious on how to do it. Randall Hunt over at Caylent was awfully generous and came up with a
proof of concept in about three minutes because he's Randall. And that was extraordinarily helpful. But a challenge
I ran into was that the CDK deploy would fail because the way that CloudFormation was rendered
and the way it was trying to do stuff, oh, that already has that domain affiliated in a different
way. I had to do a CDK destroy, then a CDK deploy for each one. Now, not the end of the world,
but it got me thinking.
Everything that I see around the CDK
more or less distills down
to either Greenfield
or a day one experience.
That's great,
but throw it all away and start over
is often not what you get to do.
And even though Amazon says
it's always day one,
those of us in, you know,
real companies don't get to just treat everything as brand new and throw away everything older than 18 months.
What is the day two experience looking like for you? Because you clearly have a legacy business.
By legacy, I of course use it in the condescending engineering term that means it makes actual money
rather than just telling
really good stories to venture capitalists for 20 years. Yeah, we still have mainframes running that
make a lot of money. So I don't mock legacy at all. "That piece of crap does about $4 billion
a year in revenue; perhaps show some respect." It's a common refrain. Yeah, exactly. So yeah,
anyone listening, don't mock legacy because as Corey says, it is running the business. But for us, when it comes to day two, it's something that I'm
actually really passionate about this in general, because it is really easy. Like I did it with CDK
patterns. It's really easy to come out and be like, okay, we're going to create a bunch of
starter patterns or quick starts or whatever flavor that you came up with. And then you're
going to deploy this thing
and we're going to have you in production in 30 seconds.
But even day one, later that day, not even necessarily day two,
it depends on who it was that deployed it
and how long they've been using AWS.
So you hear these stories of people who deployed something to experiment
and they either forget to delete it, it cost them a lot of money,
or they try to change it and it
breaks because they didn't understand what was in it. And this is where the community starts to
diverge in their opinions on what AWS CDK should be. There's a lot of people who think that
at the minute, CDK, even if you create an abstraction in a construct, even if I create
a construct and put it in the construct library that you get to use, it still unravels and deploys as part of your deploy. So everything that's
associated with it you now own, and you technically need to understand that at some point, because it
might in theory break. Whereas there's a lot of people who think, okay, the CDK needs to go server-side,
and an abstraction needs to stay an abstraction
in the cloud. And then that way, if somebody's looking at a 20-line CDK construct or stack,
then it stays 20 lines. It never unravels to something crazy underneath.
I mean, that's one approach. I think it'd be awesome if that could work. I'm not sure how the support
for that would work. You've got something running on the cloud. I'm pretty sure AWS aren't going to jump on a call to support some construct
that I deployed. So I'm not sure how that'll work in the open source sense. But what we're doing at
Liberty is the other way. So I mean, we famously have things like the software accelerator that
lets you pick a pattern and it creates your pipelines and you're deployed. But now what
we're doing is we're building a lot of telemetry and automated information around
what you deployed. So that way, and it's all based on well-architected common theme.
So that way, what you can do is you can go into-
It's partially auditability and partially, at a glance, figuring out, okay, are there some
things that can be easily remediated as we basically shift that whole thing left?
Yeah. So you deploy something and it should be good the second
you deploy it, but then you start making changes, because you're Corey. You just start adding some
stuff and you deploy it, and if it's really bad, it won't deploy. Like, that's the Liberty setup: there's
a bunch of rules that'll go, okay, that's really bad, that'll cause damage to customers. But there's
a large gap between bad and good that people don't really understand the difference of,
that can cost a lot of money or can cause a lot of grief for developers because they
go down the wrong path.
So that's why what we're now building is after you deploy, there's a dashboard that'll just
come up and be like, hey, we've noticed that your Lambda function has too little memory.
It's going to be slow.
You're going to have bad cold starts or, you know, things like that.
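(A minimal sketch of the kind of post-deploy rule Matt describes, where the deploy succeeds but tooling surfaces "legal but bad" settings afterward. The config shape, function names, and thresholds below are invented for illustration; a real implementation would read the deployed configuration from AWS.)

```typescript
// Illustrative post-deploy check in the spirit of the dashboard described:
// the deploy is allowed through, but settings in the gap between "bad" and
// "good" are flagged for the developer. All shapes/thresholds are made up.
interface LambdaConfig {
  functionName: string;
  memoryMb: number;
  timeoutSeconds: number;
}

interface Finding {
  functionName: string;
  message: string;
}

function reviewLambda(fn: LambdaConfig): Finding[] {
  const findings: Finding[] = [];
  // Not a deploy blocker: low memory won't damage customers,
  // but it means slow execution and worse cold starts.
  if (fn.memoryMb < 512) {
    findings.push({
      functionName: fn.functionName,
      message: `Only ${fn.memoryMb} MB of memory; expect slow runs and bad cold starts.`,
    });
  }
  // A very long timeout can silently cost money while a function hangs.
  if (fn.timeoutSeconds > 300) {
    findings.push({
      functionName: fn.functionName,
      message: `Timeout of ${fn.timeoutSeconds}s; a hung invocation bills the whole window.`,
    });
  }
  return findings;
}

const report = reviewLambda({ functionName: 'orders', memoryMb: 128, timeoutSeconds: 900 });
report.forEach((f) => console.log(`${f.functionName}: ${f.message}`));
```

The point of encoding checks like this is that hard-won operational knowledge runs automatically on every deploy instead of waiting for a review call.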
It's the knowledge that I've had to gain through hard fighting over the past couple of years, put into automation. And that way, combined with the Well-Architected reviews,
you actually get me sitting in a call going, okay, let's talk about what you're building
that hopefully guides people the right way. But I still think there's so much more we can do for day two, because even if you
deploy the best solution today, six months from now, AWS will release 18 new services that make
it easier to do what you just did. So someone also needs to build something that shows you the delta
to get to the best. And that would involve AWS, or somebody, thinking cohesively, like: this is how we use our products.
And I don't think there's a market for it as a third party company, unfortunately, but I do think that's where we need to get to.
That at day two, somebody can give automated advice, the way we're trying to do for Liberty, that says: I see what you're doing, but it would be better if you did this instead. Yeah, I definitely want to spend more time thinking about these
things and analyzing how we wind up addressing them and how we think about them going forward.
I learned a lot of these lessons over a decade ago. I was fairly deep into using Puppet and came
to the fair and balanced conclusion that Puppet was a steaming piece of crap. So the solution was
that I was one of the very early developers
behind SaltStack, which was going to do everything
right. And it was. And it was awesome
and it was glorious. Right
up until I saw an environment deployed by
someone else who was not as
familiar with the tool as I was,
at which point I realized: hell is other people's use cases, and the way that they contextualize these things.
You craft a finely balanced torque wrench.
It's a thing of beauty.
And people complain about the crappy hammer.
You're holding it wrong.
No, don't do it that way.
So I have an awful lot of sympathy
for people building platform level tooling like this,
where it works super well for the use case that they're in,
but other use cases aren't necessarily aligned in the same way.
It's a very hard nut to crack. Yeah. And like, even as you mentioned earlier,
if you take one piece of AWS, for example, API Gateway, and I love the API Gateway team, if you're listening, don't hate on me, but there's like 47,000 different ways you can deploy an API Gateway, and the CDK has to cover all of those.
It would be a lot easier if there were fewer ways to deploy the thing; then you could
start crafting user experiences on a platform. But whenever you start thinking that every AWS
component is kind of the same, like think of the amount of ways you can deploy a Lambda function
now, or think of containers; I'll not even go into the number of ways to run containers. If you're building a platform, either you support it all, and then it sort of gets quite generic, or you do what Serverless Cloud are doing now: Jeremy Daly's building this unique experience that's like, okay, the code is going to build the infrastructure, so just build a website and we'll do it all behind it. And I think they're really interesting because they're sort of opposites, in that one doesn't want to support everything, but it should theoretically, for their slice of customers, be awesome. And then the other one's like, well, let's see what you're going to do, let's have a go at it, and I should hopefully support it.
I think that there's so much that can be done on this,
but before we wind up calling it an episode,
I had one further question that I wanted to explore
around the recent results of the
community CDK survey that I believe is a quarterly event. And I read the analysis on this, and I
talked about it briefly in the newsletter, but it talks about adoption and a few other aspects of it.
And one of the big things it looks at is the number of people who are contributing to the CDK
in an open source context. Am I just thinking about this the wrong way
when I think that, well, this is a tool
that helps me build out cloud infrastructure.
Me having to contribute code to this thing at all
is something of a bug.
Whereas, yeah, I want this thing to work out super well.
Docker is open source,
but you'll never see me contributing things to Docker
as a pull request, because it does what it says on the tin. I don't have any problems that I'm aware of where, ooh, it should
do this instead. I mean, I have opinions on that, but those aren't pull requests. Those are complete,
you know, shifts of product strategy, which it turns out is not quite done on GitHub.
So it's funny. A while ago, I was talking to the lad who came up with the idea for the CDK, and CDK is pretty much the open source project for AWS, if you look at what they have and the thought behind it. It's meant to evolve into what people want and need. So yes, there is a product manager in AWS and a team fully dedicated to building it, but the ultimate aspiration was always that it should be bigger than AWS and it should be community driven. Now, personally, I'm not sure, like you just said,
what the incentive is, given that right now CDK only works with CloudFormation, which means that
you are directly helping with an AWS tool. But it does give me hope that there's CDK for Terraform and there's CDK for Kubernetes.
And there's other flavors based on the same technology as AWS CDK that potentially could
have a thriving open source community because they work across all the clouds. So it might
make more sense for people to jump in there. Yeah. I don't necessarily think that there's a
strong value proposition as it stands today for the idea of the CDK becoming something
that works across other cloud providers.
I know it technically has the capability,
but if I think that Python isn't quite a first-class experience,
I don't even want to imagine what other providers
are going to look like from that particular context.
Yeah, and that's from what I understand; I haven't personally jumped into CDK for Terraform.
And we didn't talk about it here, but in CDK you get your different levels of construct.
L1 is like a CloudFormation level construct, so everything that's in there directly maps
to a property in CloudFormation.
And then L2 is AWS's opinion on safe defaults.
And then L3 is when someone like me comes along and turns it into something that you
may find useful.
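(The levels Matt describes can be sketched in plain TypeScript. The class names below echo aws-cdk-lib's real naming convention, but this is an illustrative mock of the layering, not the actual library API.)

```typescript
// Conceptual sketch of CDK construct levels using plain classes.
// Names mirror aws-cdk-lib conventions; the real library is far richer.

// L1: a 1:1 mapping to a CloudFormation resource. Every property you'd
// write in raw CloudFormation is exposed directly.
class CfnBucket {
  constructor(
    public props: {
      bucketName: string;
      versioningConfiguration?: { status: string };
    },
  ) {}
}

// L2: AWS's opinion on safe defaults. It owns an L1 underneath and
// translates friendlier options into the raw properties.
class Bucket {
  readonly resource: CfnBucket;
  constructor(name: string, opts: { versioned?: boolean } = {}) {
    this.resource = new CfnBucket({
      bucketName: name,
      versioningConfiguration: opts.versioned ? { status: 'Enabled' } : undefined,
    });
  }
}

// L3: a pattern someone publishes: several L2s wired together behind
// one small, intention-revealing API.
class StaticWebsite {
  readonly bucket: Bucket;
  constructor(siteName: string) {
    // A real pattern would also create a CDN distribution, DNS records, etc.
    this.bucket = new Bucket(`${siteName}-assets`, { versioned: true });
  }
}

// A few lines of L3 code "unravel" into everything beneath them on deploy:
const site = new StaticWebsite('docs');
console.log(site.bucket.resource.props.versioningConfiguration?.status); // Enabled
```

This layering is exactly why the "it unravels on deploy" debate exists: the short L3 call still owns every L1 resource it expands into.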
So it's a pattern. As far as I know, CDK for Terraform is still on level one. They haven't
got the rich... And L4 is just hiring you as a consultant to come in and fix my nonsense for me.
That's it. L4 could be. Pulumi recently announced that you can use AWS CDK constructs inside it.
But I think it's one of those things where the constructs, if they can move across these
different tools, the way AWS CDK constructs now work inside Pulumi, and there's a beta version that works inside CDK for Terraform, then it may or may not make sense for people to contribute to this stuff, because we're now building at a higher level. It's just that the vision is hard for most people to get clear in their head, because it needs to be articulated and told as a clear strategy. And then, you know, as you say, it is an AWS product strategy, so I'm not sure what you get back out of contributing to the project. Other than, like, Thorsten, I should say. Thorsten, who wrote the book with me, is the number three contributor, I think, to the CDK, and that's just because he is such a big user of it that if he sees something that annoys him, he just goes in and tries to fix it.
So the benefit is he gets to use the tool.
But he is a super user,
so I'm not sure outside of super users
what the use case is.
I really want to thank you for,
I want to say,
spending as much time talking to me
about this stuff as you have,
but that doesn't really go far enough
because so much of how I think about this
invariably winds up linking back
to things that you have done
and have been advocating for in the community for such a long time.
It's not you personally; it's just that your fingerprints are all over this thing.
So it's one of those areas where the entire software development ecosystem is really built
on the shoulders of others who have done a lot of work that came before.
Often you don't get to any visibility of who those people are.
So it's interesting whenever I get to talk to someone
whose work I've directly built upon that I get to say thank you.
Thank you for this.
I really do appreciate how much more straightforward
a lot of this is than my previous approach
of clicking in the console and then lying about it
to provision infrastructure.
No worries. Thank you for the thank you.
I mean, at the end of the day, all of this stuff helps me as much as it helps everybody else. We're all just trying to make everything quicker for ourselves.
If people want to learn more about what you're up to, where's the best place for them to find you these days?
I mean, they can always take a job at Liberty. I hear good things about it.
Yeah, we're always looking for people at Liberty, so come look up our careers.
But Twitter is always the best place.
So I'm niDeveloper on Twitter.
You should find me pretty quickly
or just type Matt Coulter into Google.
You'll get me.
I like that.
It's always good when it's like,
oh, I'm the top Google result for my own name.
On some level, that becomes an interesting thing.
Some folks can do it super well.
John Smith has some challenges,
but yeah, most people are somewhere in the middle of that.
I didn't use to be number one, but there's a guy called the Kangaroo Kid in Australia, who is like a stunt driver, who was number one. And I always thought it was funny if people Googled the guy and thought it was me. That's not the case anymore. Thank you again for,
I guess, all that you do. And of course, taking the time to suffer my slings and arrows as I
continue to revise my opinion of the CDK upward. No worries. Thank you for having me.
Matt Coulter, senior architect at Liberty Mutual.
I'm cloud economist,
Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you've hated this podcast,
please leave a five-star review
on your podcast platform of choice
and leave an angry comment as well
that will not actually work
because it has to be
transpiled through a JavaScript engine first. If your AWS bill keeps rising and your blood
pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you,
not AWS. We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.