Screaming in the Cloud - Understanding CDK and The Well Architected Framework with Matt Coulter
Episode Date: August 25, 2022

About Matt: Matt is a Sr. Architect in Belfast, an AWS DevTools Hero, Serverless Architect, author, and conference speaker. He is focused on creating the right environment for empowered teams to rapidly deliver business value in a well-architected, sustainable, and serverless-first way. You can usually find him sharing reusable, well-architected, serverless patterns over at cdkpatterns.com or behind the scenes bringing CDK Day to life.

Links Referenced:
Previous guest appearance: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/slinging-cdk-knowledge-with-matt-coulter/
The CDK Book: https://thecdkbook.com/
Twitter: https://twitter.com/NIDeveloper
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by Honeycomb.
When production is running slow, it's hard to know where problems originate.
Is it your application code, users, or the underlying systems?
I've got five bucks on DNS, personally.
Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ,
guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both
your team and your business. You should care more about one of those than the other. Which one is
up to you? Drop the separate pillars and
enter a world of getting one unified understanding of the one thing driving your business, production.
With Honeycomb, you guess less and know more. Try it for free at honeycomb.io slash screaming in the
cloud. Observability, it's more than just hipster monitoring.

In terms of what's available, AWS offers NVIDIA A100 GPUs on instances
that only come in one size and cost 32 bucks an hour.
Lambda offers instances that offer those GPUs
as single card instances for $1.10 an hour.
That's 73% less per GPU.
That doesn't require any long-term commitments
or predicting what your usage is going to look like years down the road. So if you need GPUs, check out Lambda. In beta,
they're offering 10 terabytes of free storage, and this is key, data ingress and egress are both
free. Check them out at lambdalabs.com slash cloud. That's L-A-M-B-D-A-L-A-B-S dot com slash cloud. Welcome to Screaming in the Cloud. I'm
Corey Quinn. One of the best parts about, well, I guess being me, is that I can hold opinions
that are probably polite and call them incendiary. And that's great because I usually like to back
them in data. But what happens when things change?
What happens when I learn new things?
Well, do I hold on to that original opinion with two hands and a death grip?
Or do I admit that I was wrong in my initial opinion about something?
Let's find out.
My guest today returns from earlier this year.
Matt Coulter is a senior architect, since he has been promoted,
at Liberty Mutual. Welcome back, and thanks for joining me.
Yeah, thanks for inviting me back, especially to talk about this topic.
Well, we spoke about it a fair bit at the beginning of the year, and if you're listening
to this and you haven't heard that show, it's not that necessary to go into. Mostly, it was me
spouting uninformed opinions about the CDK, the Cloud Development Kit. For those who are unfamiliar, I think of it more or
less as, what if you could just structure your cloud resources using a programming language you
claim to already know, but in practice copy and paste from Stack Overflow like the rest of us?
Matt, you probably have a better description of what the CDK is in practice.
Yeah. So we like to say it's imperative code written in a declarative way or declarative code written in an imperative way. Either way, it lets you write code that produces
CloudFormation. So it doesn't really matter what you write in your script. The point is,
at the end of the day, you still have the CloudFormation template that comes out of it.
So the whole piece of it is that it's a developer experience, developer speed play,
that if you're from a background that you're more used to writing a programming language than a
YAML, you might actually enjoy using the CDK over writing straight CloudFormation or SAM.
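To make Matt's "code that produces CloudFormation" framing concrete, here is a dependency-free sketch of the idea. This is not the real CDK API (in actual CDK you would instantiate constructs and run `cdk synth`); every name in it is invented purely for illustration.

```typescript
// Illustrative sketch only: the real CDK synthesizes templates for you via
// constructs and `cdk synth`. This hand-rolled function just shows the core
// idea that imperative code ultimately emits a declarative CloudFormation
// template. All identifiers (synthBucketTemplate, logicalId) are made up.
interface CfnTemplate {
  Resources: Record<string, { Type: string; Properties?: object }>;
}

function synthBucketTemplate(logicalId: string, versioned: boolean): CfnTemplate {
  return {
    Resources: {
      [logicalId]: {
        Type: "AWS::S3::Bucket",
        Properties: versioned
          ? { VersioningConfiguration: { Status: "Enabled" } }
          : {},
      },
    },
  };
}

// Whatever logic ran above, the deployable artifact is the JSON that comes out.
console.log(JSON.stringify(synthBucketTemplate("AssetsBucket", true), null, 2));
```

The point Matt makes holds either way: no matter what the code does, the thing that actually deploys is the template it emits.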
When I first kicked the tires on the CDK, my first initial obstacle, which I've struggled
with in this industry for a bit, is that I'm just good enough of a programmer to get myself
in trouble.
Whenever I wind up having a problem that Stack Overflow doesn't immediately shine a light
on, my default solution is to resort to my weapon of choice, which is brute force.
That sometimes works out, sometimes doesn't. And as I went through the CDK
a couple of times in service to a project that I'll explain shortly, I made a bunch of missteps
with it. The first and most obvious one is that AWS claims publicly that it has support in a bunch
of languages: .NET, Python, there's obviously TypeScript, there's Go support for it. I believe
that went generally available. And I'm sure I'm missing one or two, I think. Aren't I?
Yeah, so TypeScript, JavaScript, Python, Java, .NET, and Go. I think those are the currently
supported languages.
Java, that's the one that I keep forgetting. It's the block printing to JavaScript, which is
basically Java cursive. The problem I run into, and this is true
of most things in my experience, when a company says that we have deployed an SDK for all of the
following languages, there is very clearly a first-class citizen language and then the rest
that more or less drift along behind with varying degrees of fidelity. In my experience, when I
tried it for the first time in Python,
it was not a great experience for me.
When I learned just enough JavaScript,
and by extension, TypeScript, to be dangerous,
it worked a lot better.
Or at least I could blame all the problems I ran into on my complete novice status
when it comes to JavaScript and TypeScript at the time.
Is that directionally aligned with what you've
experienced given that you work in a large company that uses this? And presumably once you have more
than, I don't know, two developers, you start to take on aspects of a polyglot shop no matter where
you are on some level. Yeah. So personally, I jump between Java, Python, and TypeScript whenever I'm
writing projects. So when it comes to the CDK, you'd
assume I'd be using all three. I typically stick to TypeScript and that's just because personally
I've had the best experience using it. And for anybody who doesn't know the way CDK works for
all the languages, it's not that they have written a custom like SDK for each of these languages.
It's a case of it uses a node process underneath them
and the language actually interacts with,
it's like the compiled JavaScript version
is basically what they all interact with.
So it means there are some limitations
on what you can do in that language.
I can't remember the full list,
but it just means that it is native in all those languages,
but there's certain features that you might be like, ah,
whereas in TypeScript, you can just use all of TypeScript. And my first inclination was
actually, I was using the Python one and I was having issues with some compiler errors
and things that are just caused by that process. And talking in the
cdk.dev Slack community, there is actually a very active... Which is wonderful, I will point out.
Thank you.
There is actually an awesome Python community in there.
But if you ask them, they would all ask for improvements to the language.
So I personally, if someone's new, I always recommend they start with TypeScript
and then branch out as they learn the CDK so they can understand,
is this a me problem or is this a problem caused by the implementation? From my perspective, I didn't do anything approaching that level of deep dive.
I took a shortcut that I find has served me reasonably well in the course of my career.
When I'm trying to do something in Python and you pull up a tutorial, which I'm a big fan of
reading experience reports and blog posts, and here's how to get started. And they all had the same problem, which is step one, run npm install, and that's, hmm,
you know, I don't recall that being a standard part of the Python tooling. It is clearly designed
and interpreted and contextualized through a lens of JavaScript. Let's remove that translation
layer. Let's remove any weird issues I'm going to have in that transpilation process and just
talk in the language it's written in. Will this solve my problems? Oh, absolutely not. But it will remove
a subset of them that I am certain to go blundering into like a small lost child trying to cross an
eight lane freeway. Yeah. I've heard a lot of people say the same thing because the CDK CLI
is a node process. You need it no matter what language you use. So if they were
distributing some kind of universal binary that just integrated with the languages, it would
definitely solve a lot of people's issues with trying to combine languages at deploy time.
One of the challenges that I've had as I go through the process of iterating on the project,
but I guess I should probably describe it for those who have not been following along with
my misadventures. I write blog posts about it from time to time
because I need a toy problem to kick around sometimes because my consulting work is all
advisory and I don't want to be a talking head. I have a Twitter client called lasttweetinaws.com.
It's free. Go and use it. It does all kinds of interesting things for authoring Twitter threads.
And I wanted to deploy that to a bunch of different AWS regions,
as it turns out, 20 or so at the moment.
And that led to a lot of interesting projects
and having to learn how to think
about these things differently,
because no one sensible deploys
an application simultaneously
to what amounts to every AWS region
without canary testing
and having a phased rollout and the rest.
But I'm reckless and,
honestly, as said earlier, a bad programmer. So that works out. And trying to find ways to make this all work and fit together led iteratively towards me discovering that the CDK was really
kind of awesome for a lot of this. That said, there were definitely some fairly gnarly things
I learned as I went through, due in no small part to help I
receive from generous randos in the cdk.dev Slack team. And it's gotten to a point where it's
working. And as an added bonus, I even mostly understand what it's doing, which is just kind
of wild to me. It's one of those interesting things where because it's a programming language,
you can use it out of the box the way it's designed to be used, where you can just write your simple logic, which generates
your cloud formation. Or you can do whatever crazy logic you want to do on top of that to make your
app work the way you want it to work. And providing you're not in a company like Liberty, we're not
going to do a code review. If no one's stopping you, you can do your crazy experiments. And if
you understand that it's good. But I do think something like the multi-region deploy, I mean, with CDK, if you have a construct,
it takes in a variable that you can just say what the region is. So you can actually just write a
for loop and pass it in, which does make things a lot easier than, I don't know, trying to do it
with a YAML, which you can pass in parameters, but you're going to get a lot more complicated,
a lot quicker. The approach that I took philosophically was I wrote everything in a region-agnostic way,
and it would be instantiated and be told what region to run it in as an environment variable,
when the CDK deploy was called. And then I just deploy 20 simultaneous stacks through GitHub
Actions, which invoke a custom runner, so this runs inside of a Lambda function.
And that's just a relatively basic YAML file,
thanks to the magic of GitHub Actions matrix jobs.
So it fires off 20 simultaneous processes on every commit to the main branch.
And then after about two and a half minutes,
it has been deployed globally everywhere.
And I get notified in anything that fails,
which is always fun and exciting to learn those things.
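That matrix fan-out can be sketched roughly like the hypothetical workflow below. The region list is abbreviated (the real one is ~20 regions), and the runner label and deploy command are assumptions, not the actual repository's configuration (Corey mentions a custom Lambda-backed runner rather than GitHub-hosted ones).

```yaml
# Hypothetical GitHub Actions workflow sketch: one deploy job per region,
# fanned out on every push to main. Region list abbreviated; the deploy
# step and runner label are assumptions.
name: deploy-all-regions
on:
  push:
    branches: [main]
jobs:
  deploy:
    strategy:
      fail-fast: false          # one region failing shouldn't cancel the rest
      matrix:
        region: [us-east-1, us-west-2, eu-west-1, ap-southeast-2]  # ...and so on
    runs-on: ubuntu-latest      # swap in the custom runner label in practice
    steps:
      - uses: actions/checkout@v3
      - run: npx cdk deploy --require-approval never
        env:
          AWS_REGION: ${{ matrix.region }}
```

Because each matrix entry is an independent job, all regions deploy in parallel, which is what keeps the global rollout to a couple of minutes.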
That has been, overall,
just a really useful experiment and an experience because you're right,
you could theoretically run this as a single CDK deploy
and then wind up having it iterate
through a list of regions.
The challenge I have there is that
unless I start getting into
really convoluted asynchronous concurrency stuff,
it feels like it'll just take forever.
At two and a half minutes a region times 20 regions,
that's the better part of an hour on every deploy
and no one's got that kind of patience.
So I wound up just parallelizing it
a bit further up the stack.
That said, I bet there are relatively straightforward ways
given that async is a big part of JavaScript
to do this simultaneously.
One of the pieces of feedback I've seen about CDK
is if you have
multiple stacks in the same project, it'll deploy them one at a time. And that's just because it
tries to understand the dependencies between the stacks and then it works out which one should go
first. But a lot of people have said, well, I don't want that. If I have 20 stacks, I want all
20 to go once the way you're saying. And I have seen that people have been writing plugins to enable concurrent deploys
with CDK out of the box.
So it may be something that's,
it's not an out of the box feature,
but it might be something
that you can pull in a community plugin
to actually make work.
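Matt's earlier for-loop suggestion, one stack per region, can be sketched without the CDK libraries like this. In actual CDK code each entry would become `new MyStack(app, stackName, { env: { region } })`; here we only build the per-region configuration objects so the shape of the loop is clear, and all identifiers are illustrative.

```typescript
// Dependency-free sketch of the "for loop over regions" pattern.
// In real CDK, each config would feed a Stack's `env` prop.
interface StackConfig {
  stackName: string;
  env: { region: string };
}

const REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-2"]; // ...~20 in practice

function stackConfigs(appName: string, regions: string[]): StackConfig[] {
  return regions.map((region) => ({
    stackName: `${appName}-${region}`, // unique stack name per region
    env: { region },
  }));
}

console.log(stackConfigs("lasttweetinaws", REGIONS));
```

The trade-off discussed above still applies: defining all the stacks in one app is easy, but the CDK CLI deploys them sequentially by default, which is why the external fan-out (or a concurrency plugin) ends up faster.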
Most of my problems with it at this point
are really problems with CloudFormation.
CloudFormation does not support,
well, at all, SecureString parameters
from the AWS Systems Manager Parameter Store,
which is my default go-to for secret storage. And Secrets Manager is supported,
but that also costs 40 cents a month per secret. And not for nothing, I don't really want to have
all five secrets deployed to Secrets Manager in every region this thing is in. I don't really
want to pay $20 a month for this basically free application just to hold some secrets.
So I wound up talking to some folks in the
Slack channel, and what we came up with was
I have a centralized S3
bucket that has a JSON object that lives
in there. It's only accessible from the deployment
role, and it grabs that
at deploy time and stuffs it into environment
variables when it pushes these things out.
That's the only stateful part of all of this.
And it felt like that is on some level a pattern that a lot of people would benefit from if it had
better native support, with the counter-argument that if you're only deploying to one or two
regions, then Secrets Manager is the right answer for a lot of this, and it's not that big of a deal.
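A bucket policy for the pattern Corey describes might look roughly like the following. Everything here is hypothetical: the bucket name, object key, account ID, and role name are invented; the idea is simply to deny reads of the secrets object to every principal except the deployment role.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OnlyDeployRoleReadsSecrets",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-secrets-bucket/secrets.json",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::123456789012:role/example-deploy-role"
        }
      }
    }
  ]
}
```

An explicit Deny with a `StringNotEquals` carve-out beats relying on the absence of Allow statements, since it also blocks overly broad permissions granted elsewhere in the account.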
Yeah, and it's another one of those things if you were deploying in Liberty,
we'll say, well, your secret is unencrypted at runtime, so you probably need a KMS key involved in
that, which is, you know, the costs of KMS.
It depends on if it's a personal solution or if it's something for like a Fortune 100
company.
And if it's a personal solution, I mean, what you're saying sounds great: it is
restricted in S3, and that way only at deploy time can it be read.
It actually could be a custom construct that someone could build and publish out there to the construct library or the construct hub, I should say.
To be clear, the reason I'm okay with this from a security perspective is, one, this isn't a dedicated AWS account.
This is the only thing that lives in that account. And two, the only API credentials we're talking about are the
application-specific credentials for this Twitter client when it winds up talking to the Twitter
API. Basically, if you get access to these and are able to steal them and deploy somewhere else,
you get no access to customer data or user data, because this is not charged for anything. You get
no access to things that have been sent out.
All you get to do is submit tweets to Twitter,
and it'll have the string last tweet in AWS as your client
rather than whatever normal client you would use.
It's not exactly what we'd call a high-value target
because all the sensitive-to-a-user data
lives in local storage in their browser.
It is fully stateless.
Yeah, so this is what I mean,
like it's the difference in what you're using your app for. Perfect case of you can just go
into the Twitter app and just withdraw those credentials and do it again if something happens,
whereas, as I say, if you're building it for Liberty, it will not pass one of our
well-architected reviews just for that reason. If I were going to go and deploy this in a more, I guess, locked down environment,
I would be tempted to find alternate approaches
such as have it encrypted at rest via KMS
in S3 is one option.
So is having global DynamoDB tables
that wind up grabbing those things
or even grabbing it at runtime if necessary.
There are ways to make that credential
more secure at rest.
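For the DynamoDB variant, access can even be scoped down to specific items via the `dynamodb:LeadingKeys` condition key. A hypothetical policy sketch (table name, region, account ID, and key value are all invented) allowing one function to read only its own credential row:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyOwnCredentialRow",
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/example-credentials",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["twitter-api-secret"]
        }
      }
    }
  ]
}
```

Attached to a single Lambda function's execution role, this means even a compromised function elsewhere in the account can't read that partition key.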
It's just, I look at this
from a real world perspective of what is the actual attack surface on this. And I have a
really hard time just identifying anything that is going to be meaningful with regard to an exploit.
If you're listening to this and have a lot of thoughts on that matter, please reach out. I'm
willing to learn and change my opinion on things. Yeah. One thing I will say about the Dynamo
approach you mentioned, I'm not sure everybody knows this, but inside the same Dynamo table, you can scope
down a row. You can be like this row and this field in this row can only be accessed from this
one Lambda function. So there's, there's a lot of really awesome security features inside DynamoDB
that I don't think most people take advantage of, but they open up a lot of options for simplicity. Is that tied to the very recent
announcement about Lambda getting source ARN as a condition key? In other words, you can say this
specific Lambda function as opposed to a Lambda in this account. That was a relatively recent
advent that I haven't fully explored the nuances of. Yeah, that has opened a lot of doors. I mean,
the Dynamo being able to be locked down to your row has been around for a while, but the new Lambda SourceArn condition is awesome
because, yeah, as you say, you can literally say this thing as opposed to you have to start going
into tags or you have to start going into something else to find it. So I want to talk about something
you just alluded to, which is the Well-Architected Framework. And initially when it launched, it was a whole
framework and AWS made a lot of noise about it on keynote stages, as they are wont to do.
And then later they created a quote-unquote well-architected tool, which, let's be very
direct, it's the checkbox survey form, at least the last time I looked at it. And they now have,
I believe, six pillars of the Well-Architected Framework where they talk about things like security and cost; sustainability is the new pillar.
I don't know, absorbency or whatever the remainders are.
I can't think of them off the top of my head.
How does that map to your experience with the CDK?
Yeah, so out of the box, the CDK from day one was designed to have sensible defaults.
And that's why a lot of the things you deploy have opinions.
I talked to a couple of the heroes and they were like, I wish it had less opinions.
But that's why whenever you deploy something, it's got a bunch of configuration already in there.
For me in the CDK, whenever I use constructs or stacks or deploy anything in the CDK, I always build it in a well-architected way.
And that's such a loaded sentence.
Whenever you say the words well-architected,
people go, what do you mean?
And that's where I go through the six pillars.
And in Liberty, we have a process.
It used to be called SCORP because it was five pillars,
but now it's SCORPS because they added sustainability.
But that's where for every stack we'll go through it.
And we'll be like, okay, let's have the discussion.
And we will use the tool that you mentioned. I mean the tool, as you say, is a bunch of
tick boxes with a text box, but the idea is we'll get in a room, and as we build these starter patterns
or these pieces of infrastructure that people are going to reuse, we'll run the well-architected
review against the framework before anybody gets to generate it. And then we can say out of the box, if you generate this thing, these are the pros and
cons against the well-architected framework of what you're getting, because we can't make it
a hundred percent bulletproof for your use case because we don't know it, but we can tell you
out of the box what it does. And then that way you can keep building. So they start off with
something that is well-documented, how well-architected it is, and then
it makes it a lot easier to have those
conversations as they go forward, because you just have to talk about the delta. As they start adding
their own code, then you can go in and go, okay, you've added these 20 lines, let's talk about what
they do. And that's why I always think you can draw a strong connection between infrastructure as code and well-architected.
As I look through the actual six pillars of the well-architected framework, sustainability,
cost optimization, performance efficiency, reliability, security, and operational excellence,
as I think through the nature of what this shitpost thread Twitter client is, I am reasonably
confident across all of those pillars.
I mean, first off, when it comes to the cost optimization pillar,
please don't come to my house and tell me how that works.
Yeah, obnoxiously, the security pillar is sort of the thing
that winds up causing a problem for this,
because this is in an account deployed by a control tower.
And when I was getting this all set up,
my monthly cost for this thing was something like a dollar in charges.
And then another $16 for the AWS Config rule evaluations on all of the deploys, which just feels like a tax on going about your business, but fine, whatever.
Cost and sustainability, from my perspective, also tend to be hand in glove when it comes to this stuff.
When no one is using the client, it is not taking up any compute resources.
It has no carbon footprint of which to speak, by my understanding. It's very hard to optimize this down further from
a sustainability perspective without barging my way into the middle of an AWS negotiation with
one of its power companies. Yeah, so for everyone listening, watch as we do a live, well-architected
review. Oh yeah, I expect we should do this on Twitter one of these days.
I think it'd be a fantastic conversation,
or Twitch, or whatever the kids are using these days.
Yeah.
And again, so much of it, too,
is thinking about the context of security.
You work for one of the world's largest insurance companies.
I shitpost for a living.
The relative access and consequences
of screwing up the security on this
are nowhere near equivalent. And I think
that's something that often gets lost. The perfect be the enemy of the good.
So that's why, unfortunately, the well-architected tool is quite loose. So that's why they have the
Well-Architected Framework, which is a white paper that just covers everything, which is
quite big. And then they wrote specific lenses for, like, serverless or other use cases that are shorter. And then when you do a well-architected review, it's loose. And
it's sort of like, how are you applying the principles of well-architected and the conversation
that we just had about security. So you would write that down in the box and be like, okay,
so I understand if anybody gets this credential, it means they can post as Last Tweet in AWS.
And that's okay. That's the client, not the Twitter account, to be clear.
Yeah, so that's okay.
That's what you just marked down
in the Well-Architected Review.
And then if we go to do one
in the future,
you can compare it
and we can go,
oh, okay, so last time
you said this.
And you can go,
well, actually,
I decided to,
or we pivoted.
We're a bank now.
Yeah.
So that's it.
We do more than tweets now.
We decided to do
microtransactions
through cryptocurrency
over Twitter.
I don't know.
And that ends this conversation.
No, no.
But yeah, so if something changes,
that's what the Well-Architected Review is for.
It's about facilitating the conversation
between the architect and the engineer.
That's all it is.
This episode is sponsored in parts
by our friend EnterpriseDB.
EnterpriseDB has been powering enterprise applications with
PostgreSQL for 15 years, and now EnterpriseDB has you covered wherever you deploy PostgreSQL,
on-premises, private cloud, and they just announced a fully managed service on AWS and Azure called
Big Animal. All one word. Don't leave managing your database to your cloud vendor
because they're too busy
launching another half dozen
managed databases
to focus on any one of them
that they didn't build themselves.
Instead, work with the experts
over at EnterpriseDB.
They can save you time and money.
They can even help you migrate
legacy applications,
including Oracle,
to the cloud.
To learn more,
try Big Animal for free.
Go to biganimal.com slash snark
and tell them Corey sent you.
And the lenses also are helpful.
This is a serverless application,
so we're going to view it through that lens,
which is great because the original version
of the well-architected tool is,
oh, you built this thing entirely in Lambda.
Have you bought some reserved instances for it?
And it's, yeah, why do I feel like
I have to explain to AWS how their own systems work?
This makes it a lot more streamlined and talks about this, though it still does struggle
with the concept of, in my case, a stateless app.
That is still something that I think is not the common path.
Imagine that.
My code is also non-traditional.
Who knew?
Who knew?
The one thing that's good about it, if anybody doesn't know, they just updated the serverless lens about, I don't know, a week or two ago. So they added in
a bunch more use cases. So if you've read it six months ago or even three months ago,
go back and reread it because they spent a good year updating it.
Thank you for telling me that. That will, of course, wind up in next week's issue of Last
Week in AWS. You can go back and look at the archives and figure out what week we recorded this then.
Good work.
One thing that I have learned as well,
as of yesterday, as it turns out,
before we wound up having this recording,
obviously, because yesterday generally tends to come before today.
That is a universal truism,
is that I had to do a bit of refactoring
because what I learned when I was in New York
live tweeting the AWS Summit
is that the Route 53 latency record
works based upon where your DNS server is.
Yeah, that makes sense.
I use Tailscale and wind up using my Pi hole,
which lives back in my house in San Francisco.
Yeah, I was always getting US West 1
from across the country.
Cool.
For those weird edge cases like me,
because this is not the common case,
how do I force a local region?
Ah, I'll give
it its own individual region prepended as a subdomain. Getting that to work with both the
global lasttweetinaws.com domain as well as the subdomain on API Gateway through the CDK was not
obvious on how to do it. Randall Hunt over at Caylent was awfully generous and came up with a
proof of concept in about three minutes because he's Randall. And that was extraordinarily helpful. But a challenge
I ran into was that the CDK deploy would fail because the way that CloudFormation was rendered
and the way it was trying to do stuff, oh, that already has that domain affiliated in a different
way. I had to do a CDK destroy, then a CDK deploy for each one. Now, not the end of the world,
but it got me thinking.
Everything that I see around the CDK
more or less distills down
to either Greenfield
or a day one experience.
That's great,
but throw it all away and start over
is often not what you get to do.
And even though Amazon says
it's always day one,
those of us in, you know,
real companies don't get to just treat everything as brand new and throw away everything older than 18 months.
What is the day two experience looking like for you? Because you clearly have a legacy business.
By legacy, I of course use it in the condescending engineering term that means it makes actual money
rather than just telling
really good stories to venture capitalists for 20 years. Yeah, we still have mainframes running that
make a lot of money. So I don't mock legacy at all. "That piece of crap does about $4 billion
a year in revenue; perhaps show some respect." It's a common refrain. Yeah, exactly. So yeah,
anyone listening, don't mock legacy because as Corey says, it is running the business. But for us, when it comes to day two, it's something that I'm
actually really passionate about this in general, because it is really easy. Like I did it with CDK
patterns. It's really easy to come out and be like, okay, we're going to create a bunch of
starter patterns or quick starts or whatever flavor that you came up with. And then you're
going to deploy this thing
and we're going to have you in production in 30 seconds.
But even day one, later that day, not even necessarily day two,
it depends on who it was that deployed it
and how long they've been using AWS.
So you hear these stories of people who deployed something to experiment
and they either forget to delete it, it cost them a lot of money,
or they try to change it and it
breaks because they didn't understand what was in it. And this is where the community starts to
diverge in their opinions on what AWS CDK should be. There's a lot of people who think that
at the minute, CDK, even if you create an abstraction in a construct, even if I create
a construct and put it in the construct library that you get to use, it still unravels and deploys as part of your deploy. So everything that's
associated with it you now own, and you technically need to understand that at some point, because it
might in theory break. Whereas there's a lot of people who think, okay, the CDK needs to go server-side,
and an abstraction needs to stay an abstraction
in the cloud. And then that way, if somebody's looking at a 20-line CDK construct or stack,
then it stays 20 lines. It never unravels to something crazy underneath.
I mean, that's one approach. I think it'd be awesome if that could work. I'm not sure how the support
for that would work. You've got something running on the cloud. I'm pretty sure AWS aren't going to jump on a call to support some construct
that I deployed. So I'm not sure how that'll work in the open source sense. But what we're doing at
Liberty is the other way. So I mean, we famously have things like the software accelerator that
lets you pick a pattern and it creates your pipelines and you're deployed. But now what
we're doing is we're building a lot of telemetry and automated information around
what you deployed. So that way, and it's all based on well-architected common theme.
So that way, what you can do is you can go into-
It's partially auditability and partially, at a glance, figuring out, okay, are there some
things that can be easily remediated as we basically shift that whole thing left?
Yeah. So you deploy something and it should be good the second
you deploy it, but then you start making changes, because you're Corey. You just start adding some
stuff and you deploy it, and if it's really bad, it won't deploy. Like, that's the Liberty setup: there's
a bunch of rules that'll go, okay, that's really bad, that'll cause damage to customers. But there's
a large gap between bad and good that people don't really understand the difference of,
that can cost a lot of money or can cause a lot of grief for developers because they
go down the wrong path.
So that's why what we're now building is after you deploy, there's a dashboard that'll just
come up and be like, hey, we've noticed that your Lambda function has too little memory.
It's going to be slow.
You're going to have bad cold starts or, you know, things like that.
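(A minimal sketch of the kind of post-deploy rule Matt describes, where the deploy succeeds but tooling surfaces "legal but bad" settings afterward. The config shape, function names, and thresholds below are invented for illustration; a real implementation would read the deployed configuration from AWS.)

```typescript
// Illustrative post-deploy check in the spirit of the dashboard described:
// the deploy is allowed through, but settings in the gap between "bad" and
// "good" are flagged for the developer. All shapes/thresholds are made up.
interface LambdaConfig {
  functionName: string;
  memoryMb: number;
  timeoutSeconds: number;
}

interface Finding {
  functionName: string;
  message: string;
}

function reviewLambda(fn: LambdaConfig): Finding[] {
  const findings: Finding[] = [];
  // Not a deploy blocker: low memory won't damage customers,
  // but it means slow execution and worse cold starts.
  if (fn.memoryMb < 512) {
    findings.push({
      functionName: fn.functionName,
      message: `Only ${fn.memoryMb} MB of memory; expect slow runs and bad cold starts.`,
    });
  }
  // A very long timeout can silently cost money while a function hangs.
  if (fn.timeoutSeconds > 300) {
    findings.push({
      functionName: fn.functionName,
      message: `Timeout of ${fn.timeoutSeconds}s; a hung invocation bills the whole window.`,
    });
  }
  return findings;
}

const report = reviewLambda({ functionName: 'orders', memoryMb: 128, timeoutSeconds: 900 });
report.forEach((f) => console.log(`${f.functionName}: ${f.message}`));
```

The point of encoding checks like this is that hard-won operational knowledge runs automatically on every deploy instead of waiting for a review call.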
It's the knowledge that I've had to gain through hard fighting over the past couple of years, put into automation. And that way, combined with the Well-Architected reviews,
you actually get me sitting in a call going, okay, let's talk about what you're building
that hopefully guides people the right way. But I still think there's so much more we can do for day two, because even if you
deploy the best solution today, six months from now, AWS will release 18 new services that make
it easier to do what you just did. So someone also needs to build something that shows you the delta
to get to the best. And that would involve AWS, or somebody, thinking cohesively, like: this is how we use our products.
And I don't think there's a market for it as a third party company, unfortunately, but I do think that's where we need to get to.
That at day two, somebody can give automated advice, the way we're trying to do for Liberty, that says: I see what you're doing, but it would be better if you did this instead. Yeah, I definitely want to spend more time thinking about these
things and analyzing how we wind up addressing them and how we think about them going forward.
I learned a lot of these lessons over a decade ago. I was fairly deep into using Puppet and came
to the fair and balanced conclusion that Puppet was a steaming piece of crap. So the solution was
that I was one of the very early developers
behind SaltStack, which was going to do everything
right. And it was. And it was awesome
and it was glorious. Right
up until I saw an environment deployed by
someone else who was not as
familiar with the tool as I was,
at which point I realized: hell is other people's use cases, and the way that they contextualize these things.
You craft a finely balanced torque wrench.
It's a thing of beauty.
And people complain about the crappy hammer.
You're holding it wrong.
No, don't do it that way.
So I have an awful lot of sympathy
for people building platform level tooling like this,
where it works super well for the use case that they're in,
but other use cases aren't necessarily aligned in the same way.
It's a very hard nut to crack. Yeah. And like, even as you mentioned earlier,
if you take one piece of AWS, for example, API Gateway, and I love the API Gateway team, if you're listening, don't hate on me, but there's like 47,000 different ways you can deploy an API Gateway, and the CDK has to cover all of those.
It would be a lot easier if there were fewer ways to deploy the thing; then you could
start crafting user experiences on a platform. But whenever you start thinking that every AWS
component is kind of the same, like think of the amount of ways you can deploy a Lambda function
now, or think of containers; I'll not even go into the number of ways to run containers. If you're building a platform, either you support it all, and then it sort of gets quite generic, or you do what Serverless Cloud are doing now: Jeremy Daly's building this unique experience that's like, okay, the code is going to build the infrastructure, so just build a website and we'll do it all behind it. And I think they're really interesting because they're sort of opposites, in that one doesn't want to support everything, but it should theoretically, for their slice of customers, be awesome. And then the other one's like, well, let's see what you're going to do, let's have a go at it, and I should hopefully support it.
I think that there's so much that can be done on this,
but before we wind up calling it an episode,
I had one further question that I wanted to explore
around the recent results of the
community CDK survey that I believe is a quarterly event. And I read the analysis on this, and I
talked about it briefly in the newsletter, but it talks about adoption and a few other aspects of it.
And one of the big things it looks at is the number of people who are contributing to the CDK
in an open source context. Am I just thinking about this the wrong way
when I think that, well, this is a tool
that helps me build out cloud infrastructure.
Me having to contribute code to this thing at all
is something of a bug.
Whereas, yeah, I want this thing to work out super well.
Docker is open source,
but you'll never see me contributing things to Docker
as a pull request, because it does what it says on the tin. I don't have any problems that I'm aware of where, ooh, it should
do this instead. I mean, I have opinions on that, but those aren't pull requests. Those are complete,
you know, shifts of product strategy, which it turns out is not quite done on GitHub.
So it's funny. A while ago, I was talking to the lad who came up with the idea for the CDK, and CDK is pretty much the open source project for AWS, if you look at what they have and the thought behind it. It's meant to evolve into what people want and need. So yes, there is a product manager in AWS and a team fully dedicated to building it, but the ultimate aspiration was always that it should be bigger than AWS and it should be community driven. Now, personally, I'm not sure, like you just said,
what the incentive is, given that right now CDK only works with CloudFormation, which means that
you are directly helping with an AWS tool. But it does give me hope that there's CDK for Terraform and there's CDK for Kubernetes.
And there's other flavors based on the same technology as AWS CDK that potentially could
have a thriving open source community because they work across all the clouds. So it might
make more sense for people to jump in there. Yeah. I don't necessarily think that there's a
strong value proposition as it stands today for the idea of the CDK becoming something
that works across other cloud providers.
I know it technically has the capability,
but if I think that Python isn't quite a first-class experience,
I don't even want to imagine what other providers
are going to look like from that particular context.
Yeah, and that's from what I understand; I haven't personally jumped into CDK for Terraform.
And we didn't talk about it here, but in CDK you get your different levels of construct.
L1 is like a CloudFormation level construct, so everything that's in there directly maps
to a property in CloudFormation.
And then L2 is AWS's opinion on safe defaults.
And then L3 is when someone like me comes along and turns it into something that you
may find useful.
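(The levels Matt describes can be sketched in plain TypeScript. The class names below echo aws-cdk-lib's real naming convention, but this is an illustrative mock of the layering, not the actual library API.)

```typescript
// Conceptual sketch of CDK construct levels using plain classes.
// Names mirror aws-cdk-lib conventions; the real library is far richer.

// L1: a 1:1 mapping to a CloudFormation resource. Every property you'd
// write in raw CloudFormation is exposed directly.
class CfnBucket {
  constructor(
    public props: {
      bucketName: string;
      versioningConfiguration?: { status: string };
    },
  ) {}
}

// L2: AWS's opinion on safe defaults. It owns an L1 underneath and
// translates friendlier options into the raw properties.
class Bucket {
  readonly resource: CfnBucket;
  constructor(name: string, opts: { versioned?: boolean } = {}) {
    this.resource = new CfnBucket({
      bucketName: name,
      versioningConfiguration: opts.versioned ? { status: 'Enabled' } : undefined,
    });
  }
}

// L3: a pattern someone publishes: several L2s wired together behind
// one small, intention-revealing API.
class StaticWebsite {
  readonly bucket: Bucket;
  constructor(siteName: string) {
    // A real pattern would also create a CDN distribution, DNS records, etc.
    this.bucket = new Bucket(`${siteName}-assets`, { versioned: true });
  }
}

// A few lines of L3 code "unravel" into everything beneath them on deploy:
const site = new StaticWebsite('docs');
console.log(site.bucket.resource.props.versioningConfiguration?.status); // Enabled
```

This layering is exactly why the "it unravels on deploy" debate exists: the short L3 call still owns every L1 resource it expands into.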
So it's a pattern. As far as I know, CDK for Terraform is still on level one. They haven't
got the rich... And L4 is just hiring you as a consultant to come in and fix my nonsense for me.
That's it. L4 could be. Pulumi recently announced that you can use AWS CDK constructs inside it.
But I think it's one of those things where the constructs, if they can move across these
different tools, the way AWS CDK constructs now work inside Pulumi, and there's a beta version that works inside CDK for Terraform, then it may or may not make sense for people to contribute to this stuff, because we're now building at a higher level. It's just that the vision is hard for most people to get clear in their head, because it needs to be articulated and told as a clear strategy. And then, you know, as you say, it is an AWS product strategy, so I'm not sure what you get back out of contributing to the project. Other than, like, Thorsten, I should say. Thorsten, who wrote the book with me, is the number three contributor, I think, to the CDK, and that's just because he is such a big user of it that if he sees something that annoys him, he just goes in and tries to fix it.
So the benefit is he gets to use the tool.
But he is a super user,
so I'm not sure outside of super users
what the use case is.
I really want to thank you for,
I want to say,
spending as much time talking to me
about this stuff as you have,
but that doesn't really go far enough
because so much of how I think about this
invariably winds up linking back
to things that you have done
and have been advocating for in the community for such a long time.
It's not you personally; it's just that your fingerprints are all over this thing.
So it's one of those areas where the entire software development ecosystem is really built
on the shoulders of others who have done a lot of work that came before.
Often you don't get to any visibility of who those people are.
So it's interesting whenever I get to talk to someone
whose work I've directly built upon that I get to say thank you.
Thank you for this.
I really do appreciate how much more straightforward
a lot of this is than my previous approach
of clicking in the console and then lying about it
to provision infrastructure.
No worries. Thank you for the thank you.
I mean, at the end of the day, all of this stuff helps me as much as it helps everybody else. We're all just trying to make everything quicker for ourselves.
If people want to learn more about what you're up to, where's the best place for them to find you these days?
I mean, they can always take a job at Liberty. I hear good things about it.
Yeah, we're always looking for people at Liberty, so come look up our careers.
But Twitter is always the best place.
So I'm niDeveloper on Twitter.
You should find me pretty quickly
or just type Matt Coulter into Google.
You'll get me.
I like that.
It's always good when it's like,
oh, I'm the top Google result for my own name.
On some level, that becomes an interesting thing.
Some folks can do it super well.
John Smith has some challenges,
but yeah, most people are somewhere in the middle of that.
I didn't use to be number one, but there's a guy called the Kangaroo Kid in Australia, who is like a stunt driver, who was number one. And I always thought it was funny if people Googled the guy and thought it was me. That's not the case anymore. Thank you again for,
I guess, all that you do. And of course, taking the time to suffer my slings and arrows as I
continue to revise my opinion of the CDK upward. No worries. Thank you for having me.
Matt Coulter, senior architect at Liberty Mutual.
I'm cloud economist,
Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you've hated this podcast,
please leave a five-star review
on your podcast platform of choice
and leave an angry comment as well
that will not actually work
because it has to be
transpiled through a JavaScript engine first. If your AWS bill keeps rising and your blood
pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you,
not AWS. We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.