Screaming in the Cloud - Would You Kindly Remind with Peter Hamilton
Episode Date: March 31, 2022
About Peter: Peter's spent more than a decade building scalable and robust systems at startups across adtech and edtech. At Remind, where he's VP of Technology, Peter pushes for building a sustainable tech company with mature software engineering. He lives in Southern California and enjoys spending time at the beach with his family.
Links:
Redis: https://redis.com/
Remind: https://www.remind.com/
Remind Engineering Blog: https://engineering.remind.com
LinkedIn: https://www.linkedin.com/in/hamiltop
Email: peterh@remind101.com
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Today's episode is brought to you in part by our friends at MinIO,
the high-performance Kubernetes-native object store that's built for the multi-cloud,
creating a consistent data storage layer for your public cloud instances,
your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as,
which depends probably on where you work.
Getting that unified is one of the greatest challenges facing developers and architects
today.
It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run
any workload, and the footprint to run anywhere.
And that's exactly what MinIO offers.
With superb read speeds in excess of 360 gigabytes per second
and a 100 megabyte binary
that doesn't eat all the data you've got on the system,
it's exactly what you've been looking for.
Check it out today at min.io slash download
and see for yourself.
That's min.io slash download. And be sure to tell them that I sent you.
This episode is sponsored in part by our friends at Vultr, spelled V-U-L-T-R,
because they're all about helping save money, including on things like, you know, vowels.
So what they do is they are a cloud provider that provides
surprisingly high performance cloud compute at a price that, well, sure, they claim it is better
than AWS's pricing. And when they say that, they mean that it's less money. Sure, I don't dispute
that. But what I find interesting is that it's predictable. They tell you in advance on a monthly
basis what it's going to cost. They have a bunch of advanced networking features. They have 19 global locations and scale
things elastically, not to be confused with openly, which is apparently elastic and open.
They can mean the same thing sometimes. They have had over a million users. Deployments take less
than 60 seconds across 12 pre-selected operating systems,
or if you're one of those nutters like me, you can bring your own ISO and install basically
any operating system you want. Starting with pricing as low as $2.50 a month for Vultr
Cloud Compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having
something of that scale on their own. Try Vultr today for free by visiting vultr.com slash
screaming, and you'll receive $100 in credit. That's v-u-l-t-r dot com slash screaming.
Welcome to Screaming in the Cloud. I'm Corey Quinn, and this is a fun episode.
It is a promoted episode, which means that our friends at Redis have gone ahead and sponsored this entire episode.
I asked them, great, who are you going to send me from generally your executive suite?
And they said, nah, you already know what we're going to say.
We want you to talk to one of our customers.
And so here we are.
My guest today is Peter Hamilton, VP of Technology at Remind. Peter, thank you for
joining me. Thanks, Corey. Excited to be here. It's always interesting when I get to talk to
people on Promoted Guest episodes when they're a customer of the sponsor. Because to be clear, you do not work for Redis.
This is one of those stories you enjoy telling,
but you don't personally have a stake
in whether people love Redis, hate Redis, adopt it or not,
which is exactly what I try and do on these shows.
There's an authenticity to people
who have in the trenches experience
who aren't themselves trying to sell the thing
because that is their entire job in this world.
Yeah. You just presented three or four different opinions, and I guarantee we felt all of them at
different times. So let's start at the very beginning. What does Remind do?
So Remind is a messaging tool for education, largely K through 12.
We support about 30 million active users across the country, over 2 million teachers, making sure that every student has equal opportunities to succeed and that we can facilitate as much learning as possible.
When you say messaging, that could mean a bunch of different things to a bunch of different people.
Once on a lark, I wound up sitting down, this was years ago, so I'm sure the number is a woeful underestimate now, and counting how many AWS services I could use to send a message from me to you.
And this is without going into the lunacy territory of, well, I can tag a thing and then mail it to you like a Snowball Edge or something.
No, this is using them as intended. I think I got to 15 or 16 of them. When you say messaging,
what does that mean to you? So for us, it's about communication to the end user. We will do
everything we can to deliver whatever message a teacher or a district administrator has to the
user. We go through SMS, text messaging. We go
through Apple and Google's push services. We go through email. We go through voice call,
really pulling at all the stops we can to make sure that these important messages get out.
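For illustration only, a minimal sketch of what that kind of multi-channel fan-out could look like; the channel classes, method names, and ordering here are hypothetical, not Remind's actual code.

```python
# Hypothetical sketch: try each channel a recipient has opted into, in priority order
# (e.g., push, SMS, email, voice), until one provider accepts the message.
from dataclasses import dataclass
from typing import Protocol


class Channel(Protocol):
    name: str

    def send(self, recipient: str, body: str) -> bool:
        """Return True if the provider accepted the message."""
        ...


@dataclass
class Message:
    recipient: str
    body: str


def deliver(message: Message, channels: list[Channel]) -> str | None:
    """Attempt delivery over each channel in priority order."""
    for channel in channels:
        try:
            if channel.send(message.recipient, message.body):
                # Record which medium actually delivered it, e.g., for an audit log.
                return channel.name
        except Exception:
            continue  # provider outage or bad address; fall through to the next medium
    return None  # nothing worked; surface for retry or manual follow-up
```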
And I can only imagine some of the regulatory pressure you almost certainly experience. It
feels like it's not quite to HIPAA levels where, oh, there's a private cause of action if any of this stuff gets out.
But people are inherently sensitive about communications involving their children.
I always sort of knew this in a general sense.
And then I had kids myself.
And, oh, yeah, suddenly I really care about those sorts of things.
Yeah.
One of the big challenges is you can build great systems that do the correct thing.
But at the end of the day, we're relying on a
teacher choosing the right recipient when they send a message. And so we've had to build a lot
of processes and controls in place so that we can kind of satisfy two conflicting needs. One is
to provide a clear audit log, because that's an important thing for districts to know
if something does happen, that we have clear communication. And the other is to also be
able to jump in and intervene when something inappropriate or mistaken is sent out to the
wrong people. Remind has always been one of those companies that has a somewhat exalted reputation
in the AWS space. You folks have been early adopters of a bunch of different services,
which, let's be clear, in the responsible way, not the,
well, they said it on stage, time to go ahead and put everything they just listed into production,
because we, for some godforsaken reason, view it as a to-do list. But you've been thoughtful about
how you approach things, and you have been around as a company for a while, but you've also been
making a significant push toward being cloud-native by certain definitions of that term. So I know this
sounds like a college entrance essay, but what does cloud native mean to you?
So one of the big gaps, if you take an application that was written to be deployed in a traditional
data center environment and just drop it in the cloud, what you're going to get is a flaky data
center. Well, that's not fair. It's also going to be extremely expensive.
Sorry, an expensive flaky data center.
There we go.
What we've really looked at, and a lot of this goes back to our history in the earlier
days, we ran on top of Heroku, and it was kind of the early days of what they call the
12-factor application.
But making aggressive decisions about how you structure your architecture and application
so that you fit in with some of the cloud tools that are available and, you know, with the operating
models that are out there. When you say an aggressive decision, what sort of thing are
you talking about? Because when I think of being aggressive with my approach to things like AWS,
it usually involves Twitter. And I'm guessing that is not the direction you intend that to go.
No, I think if you look at Twitter or Netflix or some of these players that, quite frankly, have defined what AWS is to us today through their usage patterns, not quite that.
Oh, I mean using Twitter to yell at them explicitly about things, because I don't do passive aggressive. I just do aggressive. Got it. No, I think in our case, it's been plotting a very narrow path that allows us to avoid some of the bigger pitfalls. We have our sponsor here, Redis. I can talk a little bit
about our usage of Redis and how that's helped us in some of these cases. One of the pitfalls
you'll find with pulling a non-cloud-native application and putting it in the cloud is that state is hard
to manage. If you put state on all your machines and machines go down, networks fail, all those things, you now no longer have access to
that state. And we start to see a lot of problems. One of the decisions we've made is try to put as
much state as we can into data stores like Redis or Postgres or something in order to decouple our
hardware from the state we're trying to manage and provide for our users so that we're more
resilient to those sorts of failures. I get the sense from the way that we're having this
conversation, when you talk about Redis, you mean actual Redis itself, not ElastiCache for Redis,
or as I'm tending to increasingly think about AWS's services, Amazon Basics for Redis.
Yeah, I mean, Amazon has launched a number of products. They have their ElastiCache,
they have their new MemoryDB. There's a lot of different ways to use this. We've relied pretty
heavily on Redis, previously known as Redis Labs, and their enterprise product in their cloud in
order to take care of our most important data, which we just don't want to manage ourselves.
Trying to manage that on our own, using something like ElastiCache, there are so many pitfalls,
so many ways that we can lose that data.
This data is important to us.
By having it in a trusted place and managed by a great ops team like they have at Redis,
we're able to then lean in on the other aspects of cloud native to really get as much value
as we can out of AWS.
I am curious. As I said, you've had a reputation as a company for a while
in the AWS space of doing an awful lot of really interesting things. I mean, you have a robust
GitHub presence. You have a whole bunch of tools that have come out of Remind that are great.
I've linked to a number of them over the years in the newsletter. You are clearly not afraid culturally to get your hands dirty and build things yourself,
but you are using Redis Enterprise as opposed to open source Redis. What drove that decision?
I have to assume it's not, wait, you mean I could get it for free as an open source project? Why
didn't someone tell me? What brought you to that decision? Yeah, a big part of this is what we could call operating leverage. Building a great
set of tools that allow you to get more value out of AWS is a little different story than babysitting
servers all day and making sure they stay up. So most of our contributions in the open source space have really been around,
here's how to expand upon these foundational pieces from AWS. Here's how to more efficiently
launch a suite of servers into an auto-scaling group. Here's our Troposphere and other pieces
there. This was all before Amazon's CDK product, but really it was, here's how we can more effectively use CloudFormation to capture our infrastructure as code. And so we are not afraid in any way
to invest in our tooling and invest in some of those things. But when we look at the trade-off
of directly managing stateful services and dealing with all the uncertainty that comes,
we feel our time is better spent working on our product and
delivering value to our users and relying on partners like Redis in order to provide that
stability we need. You raise a good point. An awful lot of the tools that you have put out there
are the best, from my perspective, approach to working with AWS services. And that is a relatively thin layer built on top of them
with an eye toward making the user experience more polished,
but not being so heavily opinionated
that as soon as the service goes in a different direction,
the tool becomes completely useless.
You just decide to make it a bit easier
to wind up working with specific environment variables
or profiles rather than what appears to be the AWS UX approach of,
oh, now type in your access key, your secret key, and your session token,
and we've disabled copy and paste.
Go, have fun.
You've really done a lot of quality-of-life improvements,
more so than a, "This is the entire system of how we do deploys, start to finish;
it's opinionated and sort of a take on what Netflix did once upon a time with Asgard."
It really feels like it's just the right level
of abstraction. We've done a pretty good job. I will say, years later, we felt that we got it
wrong a couple of times. It's been really interesting to see that, that there are times
when we say, oh, we could take these three or four services and wrap it up into this new concept
of an application. And over time, we have to start poking holes in that new layer, and we start to see
we would have been better served by sticking with as thin a layer as possible
that enables us rather than trying to get these higher level pieces.
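As an aside, the Troposphere-style tooling mentioned earlier amounts to generating CloudFormation from a small amount of Python, a thin layer rather than a new abstraction. A minimal sketch of that style, with a placeholder AMI, subnets, and resource names rather than anything Remind actually runs:

```python
# Generate a CloudFormation template for one service's auto-scaling group with troposphere.
from troposphere import Ref, Template
from troposphere.autoscaling import AutoScalingGroup, LaunchConfiguration

template = Template()

launch_config = template.add_resource(
    LaunchConfiguration(
        "WebLaunchConfig",
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="m5.large",
    )
)

template.add_resource(
    AutoScalingGroup(
        "WebAutoScalingGroup",
        MinSize="2",
        MaxSize="20",
        LaunchConfigurationName=Ref(launch_config),
        VPCZoneIdentifier=["subnet-aaaa1111", "subnet-bbbb2222"],  # spread across AZs
    )
)

# The generated JSON then goes through the normal infrastructure-as-code review flow.
print(template.to_json())
```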
It's remarkably refreshing to hear you say that, just because so many people love to tell the story on podcasts or on conference stages or whatever format they have of, this is what we built.
And it is an aspirationally superficial story about this.
They don't talk about the, well, first we went down these three wrong paths.
It's always a, oh, yes, obviously we are smart people and we only make the correct decision. And I remember in the before time, sitting in conference talks, watching people talk about great things they've done.
And I'll turn to the person next to me and say, wow, I wish I could be involved in a project like that.
And they'll say, yes, so do I.
And it turns out they work at the company the speaker is from.
Because all of these things tend to be the most positive story.
Do you have an example of something that you have done in your production environment that
going back, yeah, in hindsight, would have done that completely differently?
Yeah. So coming from Heroku, moving into AWS, we had a great open source project called Empire,
which kind of bridged that gap between them, but used Amazon's ECS in order to launch applications.
It was actually command line
compatible with the Heroku command when it first launched. So a very big commitment there.
And at the time, I mean, this comes back to a point I think we were talking about earlier,
where architecture, costs, infrastructure, they're all interlinked. And I'm a big fan of Conway's law,
which says that an organization's structure needs to match its architecture.
And so six, seven years ago, we're a heavy growth-based company,
and we have interns running around, doing all the things. And we wanted to have really strict
guardrails and a narrow set of things that our development team could do. And so we built a
pretty constrained, you will launch, you will have one Docker image per ECS service. It can only do these specific things. And this allowed our
development team to focus on pretty buttons on the screen and user engagement and experiments and
whatnot. But as we've evolved as a company, as we've built out a more robust business, we've
started to track revenue and costs of goods sold more aggressively, we've seen there's a lot
of inefficient things that come out of that. One particular example was we use pgBouncer for our
connection pooling to our Postgres databases. In the traditional model, we had an autoscaling
group for our pgBouncer, and then our autoscaling groups for the other applications would connect to
it. And we saw additional latency, we saw additional costs, and we eventually kind of tore all that
down and packaged that pgBouncer alongside the applications that needed it. And this was a
configuration that wasn't available in our first pass. It was something we intentionally did not
provide to our development team. And we had to unwind that. And when we did, we saw better
performance.
We saw better cost efficiency,
all sorts of benefits that we care a lot about now
that we didn't care about as much many years ago.
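A rough sketch of that sidecar arrangement: registering pgBouncer in the same ECS task as the application, so the app pools connections over localhost instead of crossing the network to a separate pooling fleet. The image names, ports, and task family are placeholders, not Remind's actual configuration.

```python
# Register an ECS task definition where the connection pooler rides alongside the app.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-with-pgbouncer",
    networkMode="awsvpc",  # containers in the task share a network namespace, so localhost works
    requiresCompatibilities=["EC2"],
    cpu="1024",
    memory="2048",
    containerDefinitions=[
        {
            "name": "app",
            "image": "example/web-app:latest",  # placeholder image
            "essential": True,
            # The app points at the local pooler rather than the database directly.
            "environment": [{"name": "DATABASE_URL", "value": "postgres://127.0.0.1:6432/app"}],
        },
        {
            "name": "pgbouncer",
            "image": "example/pgbouncer:latest",  # placeholder image
            "essential": True,
            "portMappings": [{"containerPort": 6432, "protocol": "tcp"}],
        },
    ],
)
```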
It sounds like you're describing some semblance
of an internal platform
where instead of letting all of your engineers effectively,
well, here's the console.
Ideally, you use some form of infrastructure as code.
Good luck, have fun.
You effectively gate access to that.
Is that something that you're still doing
or have you taken a different approach?
So our primary gate is our infrastructure as code repository.
If you want to make a meaningful change,
you open up a PR, got to go through code review.
You need people to sign off on it.
Anything that's not there may not exist tomorrow.
There's no guarantees. And we've occasionally gone around and just shut random servers down that
people spun up in our account. And sometimes people are a little grumpy about it, but you
really need to enforce that culture that we have to go through the correct channels and we have to
have this cohesive platform, as you said, to support our development efforts. So you're a messaging service in education.
So whenever I do a little bit of digging into backstories of companies and what has made,
I guess, an impression, you look for certain things and explicit dates are one of them,
where on March 13th of 2020, your business changed just a smidgen. What happened other than the obvious,
we never went outside for two years? So if we roll back a week from, you know,
that March 13th, we're looking at March 6th. On that day,
we sent out about 60 million messages over all of our different mediums, text, email, push notifications.
On March 13th, that was 100 million. And then a few weeks later, on March 30th, that was 177
million. And so our traffic effectively tripled over the course of those three weeks. And yeah,
that's quite a ride, let me tell you. The opinion that a lot of folks have who've not gotten to play in
sophisticated distributed systems is, well, what's the hard part there? You have an auto-scaling
group just spin up three times the number of servers in that fleet and problem solved. What's
challenging? A lot. But what did you find that the pressure points were? So I love that example
that your auto-scaling group will just work. By default, Amazon's auto scaling groups only support a thousand backends. So when your auto
scaling group goes from 400 backends to 1200, things break and not in ways that you would have
expected. You start to learn things about how database systems provided by Amazon have limits
other than CPU and memory. And they're clearly laid out that there's network bandwidth limits
and things you have to worry about.
We had a pretty small team in that time,
and we got in this cadence where every Monday morning
we would wake up at 4 a.m. Pacific
because as part of the pandemic, our traffic shifted.
So our East Coast users would be most active in the morning
rather than the afternoon.
And so at about
7 a.m. on the East Coast is when everyone came online. And we had our Monday morning crew there
and just looking to see where the next pain point was going to be. And we'd have Monday, walk through
it all. Monday afternoon, we'd meet together. We'd come up with our three or four hypotheses on
what will break if our traffic doubles again. And we'd spend the rest of that next week addressing those the best we could and repeat for the next Monday.
And we did this for three, four, five weeks in a row.
And finally, it stabilized.
But yeah, it's all the small little things.
The things you don't know about, the limits and places you don't recognize that just catch up to you.
And you need to have a team that can move fast and adapt quickly.
You've been using Redis for six, seven years, something along those lines. As an enterprise
offering, you've been working with the same vendor who provides this managed service for a while now.
What have been the fruits of that relationship? What is the value that you see by continuing to
have a long-term relationship with vendors? Because let's be serious, most of us don't
stay in jobs that long, let alone work with the same vendor.
Yeah. So coming back to the March 2020 story, many of our vendors started to see some issues
here that various services weren't scaled properly. We made a lot of phone calls to a
lot of vendors and working with them. And I'm very impressed with how Redis Labs at the time was able to respond. We hopped on a
call. They said, here's what we think we need to do. We'll go ahead and do this. We'll sort this
out in a few weeks and figure out what this means for your contract. We're here to help and support
in this pandemic because we recognize how this is affecting everyone around the world. And so I
think when you get in those deeper relationships,
those long-term relationships,
it is so helpful to have that trust,
to have a little bit of that give when you need it
in times of crisis and that they're there
and willing to jump in right away.
There's a lot to be said
for having those working relationships
before you need them.
So often, I think that a lot of
engineering teams just don't talk to their vendors to a point where they may as well be strangers.
You'll see this most notably, because at least I feel it most acutely, with AWS service teams.
They'll do a whole kickoff when the enterprise support deal is signed, three years go past,
and both the AWS team and the customer's team have completely rotated since then,
and they
may as well be strangers. Being able to have that relationship to fall back on in those really weird,
really, honestly, high stress moments has been one of those things where I didn't see the value
myself until the first time I went through a hairy situation where I found that that was useful.
And now it's, oh, I'm now biased instead for, oh, I can fit into the free tier of this service.
No, no, I'm going to pay and become a paying customer.
I'd rather be a customer that can have that relationship
and pick up the phone than someone whining at people
in a forum somewhere of, hey, I'm a free user
and I'm having some problems with production,
just never felt right to me.
Yeah, there's nothing worse than calling your account
rep and being told, oh, I'm not your account rep anymore. Somehow you missed the email,
you missed who it was prior to COVID. And we saw this a couple of times many, many years ago.
One of the things about Remind is every back to school season, our traffic 10Xs in about three
weeks. And so we're used to emergencies happening and unforeseen things happening.
And we plan the year
and try to do capacity planning and everything.
But we've been around the block a couple of times.
And so we have a pretty strong culture now
of leaning in hard with our support reps.
We have them in our Slack channels,
our AWS team we meet with often,
our Redis Labs folks, we have them on Slack as well. We're constantly talking about
databases that may or may not be performing as we expect them to. They're an extension of our team.
We have an incident, we get paged. If it's related to one of those services, we hit them in Slack
immediately and have them start checking on the back end while we're checking on our side.
So one of the biggest takeaways I wish more companies would have is that when you are dependent upon another company to effectively run your production infrastructure, they are no longer your vendor.
They're your partner, whether you want them to be or not.
And approaching it with that perspective really pays dividends down the road.
Yeah. One of the things you get when you've been
at a company for a long time and been in a relationship for a long time is growing together,
which is always an interesting experience. And sometimes there are some painful points. Sometimes
you're on an old legacy version of their product that you were literally the last customer on,
and you got to work with them to move off of. But you were there six years ago when they were just starting out,
and they've seen how you've grown, you've seen how they've grown,
and you've kind of been able to marry that experience together in a meaningful way.
Still dreaming of deploying apps instead of Hello World demos? Allow me to introduce you to Oracle's
always free tier. It provides over 20 free services and infrastructure, networking, databases,
observability, management, and security. And let me be clear here, it's actually free. There's no
surprise billing until you intentionally and proactively upgrade your account. This means you can provision
a virtual machine instance or spin up an autonomous database that manages itself,
all while gaining the networking, load balancing, and storage resources that somehow never quite
make it into most free tiers needed to support the application that you want to build. With
Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime.
You know that I always like to put asterisk next to the word free.
This is actually free, no asterisk.
Start now. Visit snark.cloud slash oci-free.
That's snark.cloud slash oci-free.
Redis is, these days, a data platform. Back once upon a time, I viewed it as
more of a caching layer, and I admit that the capabilities of the platform have significantly
advanced since those days when I viewed it largely through the lens of a cache. But one of the
interesting parts is that neither one of those use cases, in my mind, blends particularly well with heavy use of spot fleets.
But you're doing exactly that.
What are you folks doing over there?
Yeah, so as I mentioned earlier, coming back to some of the 12-factor app design, we heavily rely on Redis as sort of a distributed heap.
One of our challenges of delivering all these messages
is every single message has its in-flight state.
Here's the content.
Here's who we sent it to.
We wait for them to respond.
On a traditional application,
you might have one big server that stores it all in memory
and you get the incoming request and you match things up.
By moving all that state to Redis, all of our workers, all of our application servers, we know they can disappear at any point
in time. We use Amazon's spot instances in their spot fleet for all of our production traffic,
every single web service, every single worker that we have runs on this infrastructure.
And we would not be able to do that if we didn't have a reliable and robust place to store this data that is in-flight and currently being accessed.
So we'll have a couple hundred gigs of data at any point in time in a Redis database just representing in-flight work that's happening on various machines.
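A simplified sketch of that "distributed heap" pattern using redis-py: the in-flight delivery state lives in Redis rather than in any worker's memory, so a spot instance can disappear and another worker can pick up where it left off. The key names and fields are illustrative, not Remind's schema.

```python
# In-flight message state stored in Redis so no single machine owns it.
import json
import uuid

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def start_delivery(recipients: list[str], content: str) -> str:
    """Record a message's in-flight state before any worker starts sending it."""
    message_id = str(uuid.uuid4())
    r.hset(
        f"inflight:{message_id}",
        mapping={
            "content": content,
            "pending": json.dumps(recipients),
            "delivered": json.dumps([]),
        },
    )
    r.expire(f"inflight:{message_id}", 60 * 60 * 24)  # drop stale state after a day
    return message_id


def mark_delivered(message_id: str, recipient: str) -> None:
    """Any worker, on any host, can record progress against the shared state.

    A real implementation would make this read-modify-write atomic, for example
    with a Lua script; this sketch keeps it simple.
    """
    key = f"inflight:{message_id}"
    pending = json.loads(r.hget(key, "pending") or "[]")
    delivered = json.loads(r.hget(key, "delivered") or "[]")
    if recipient in pending:
        pending.remove(recipient)
        delivered.append(recipient)
    r.hset(key, mapping={"pending": json.dumps(pending), "delivered": json.dumps(delivered)})
```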
It's really neat seeing Spot Fleets being used as something more than a theoretical possibility.
It's something I've always been very interested in, obviously, given the potential cost savings.
They approach cheap as free in some cases.
But it turns out, we talked earlier about the idea of being cloud native versus the rickety expensive data center in the cloud. And an awful lot of applications are simply not built
in a way that, yeah, we're just going to randomly turn off a subset of your systems,
ideally with two minutes of notice, but all right, have fun with that. And a lot of times,
it just becomes a complete non-starter, even for stateless workloads, just based upon how
all of these things are configured. It is really interesting to watch
a company that has an awful lot of responsibility that you've been entrusted with
embraces that mindset. It's a lot more rare than you'd think.
Yeah. And again, sometimes we overbuild things and sometimes we go down paths that may have been a
little excessive, but it really comes down to your architecture. It's not just having everything running on spot. It's making effective use of SQS and other queuing
products at Amazon to provide checkpointing abilities. And so you know that should you
lose an instance, you're only going to lose a few seconds of productive work on that particular
workload and be able to kick off where you left off. It's properly using auto-scaling groups.
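A bare-bones sketch of that checkpointing idea with SQS: the worker deletes a message only after finishing it, so a reclaimed spot instance costs at most one redelivered unit of work. The queue URL and handler here are placeholders, not Remind's setup.

```python
# Spot-tolerant worker loop: delete from the queue only after the work is done.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-work-queue"  # placeholder


def handle(body: str) -> None:
    """Do one small, idempotent unit of work (placeholder)."""
    print("processing", body)


def worker_loop() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,    # long polling
            VisibilityTimeout=60,  # how long a lost instance can hold a message before retry
        )
        for message in resp.get("Messages", []):
            handle(message["Body"])
            # Deleting only after success is the checkpoint: an interrupted worker never
            # gets here, so the message is simply redelivered to a surviving instance.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```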
From the financial side, there's all sorts of weird quirks you'll see.
The spot market has a wonderful set of dynamics where the big instances are much, much cheaper
per CPU than the small ones are on the spot market.
And so structuring things in a way that you can co-locate different workloads onto the same hosts and hedge against that host going down by spreading across multiple availability zones.
I think there's definitely a point where having enough workload, having enough scale allows you to take advantage of these things.
But it all comes down to the architecture and design that really enables it.
So you've been using Redis for longer than I think many of our listeners have been in tech.
And the key distinguishing points for me between someone who is an advocate for a technology and
someone who's a zealot or a pure critic is they can identify use cases for which it's great and
use cases for which it is not likely to be a great experience.
In your time with Redis,
what have you found that it's been great at?
And what are some areas that you would encourage people
to consider more carefully before diving into it?
So we like to joke that five, six years ago,
most of our development process was,
I've hit a problem, can I use Redis to solve that problem?
And so we've tried every solution possible with Redis.
We've done all the things.
We have a number of very complicated Lua scripts
that are managing different keys in an atomic way.
Some of these have been more successful than others,
for sure.
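For a flavor of those Lua scripts, here is a hypothetical example run through redis-py: it moves an item between two keys as a single atomic server-side step, so no other client can observe a half-finished update. The key layout is illustrative, not one of Remind's actual scripts.

```python
# Atomic multi-key update via a server-side Lua script.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Pop an item from a pending list and record who claimed it, as one atomic step.
CLAIM_SCRIPT = """
local item = redis.call('LPOP', KEYS[1])
if not item then
  return nil
end
redis.call('HSET', KEYS[2], item, ARGV[1])
return item
"""

claim = r.register_script(CLAIM_SCRIPT)

# KEYS[1] = pending queue, KEYS[2] = hash of item -> worker, ARGV[1] = worker id
item = claim(keys=["queue:pending", "queue:claimed"], args=["worker-42"])
print("claimed:", item)
```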
Right now, our biggest philosophy is
if it is data we need quickly
and it is data that is important to us, we put it in Enterprise Redis, the cloud product from Redis.
Other use cases, there's a dozen things that you can use for a cache.
Redis is great for a cache.
Memcache does a decent job as well.
You're not going to see a meaningful difference between those sorts of products.
Where we've struggled a little bit has been when we have essentially relational data
that we need fast access to.
And we're still trying to find the clear path forward here
because you can do it and you can have atomic updates
and you can kind of simulate
some of the ACID characteristics you would have
in a relational database,
but it adds a lot of complexity
and adds a lot of overhead to our team
as we're continuing
to develop these products, to extend them, to fix any bugs we might have in there. And so we're
kind of recalibrating a bit. And some of those workloads are moving to other data stores where
they're more appropriate. But at the end of the day, if it's data that we need fast and it's data
that's important, we're sticking with what we got here because it's been working pretty well. It sounds almost like you started off with the mindset of
one database for a bunch of different use cases and you're starting to differentiate into purpose
built databases for certain things. Or is that not entirely accurate? There's a little bit of that.
And I think coming back to some of our tooling, as we kind of jumped on a bit of the microservice
bandwagon, we would see here's a small service
that only has a small amount of data
that needs to be stored.
It wouldn't make sense to bring up
an RDS instance or an Aurora Postgres instance
for that.
Let's just store it in an easy store like Redis.
And some of those cases have been great.
Some of them have been a little problematic.
And so as we've invested in our tooling
to make all of our databases accessible
and make it less of a weird trade-off between what the product needs, what we can do right now,
and what we want to do long-term and reduce that friction, we've been able to be much more
deliberate about the data source that we choose in each case.
It's very clear that you're speaking with a voice of experience on this, where this is not something that you just woke up and figured out.
One last area I want to go into with you is when I asked you what it is you care about primarily as an engineering leader and as you look at serving your customers as well, you effectively had a dual answer, almost off the cuff, of stability and security. I find
the two of those things are deeply intertwined in most of the conversations I have, but they're
rarely called out explicitly in quite the way that you do. Talk to me about that.
Yeah. So in our wild journey, stability has always been a challenge.
And we've always been in early startup mode where you're constantly pushing, what can we ship?
How quickly can we ship it?
And in our particular space, we feel that this communication that we foster between teachers and students and their parents is incredibly important.
And is a thing that we take very, very seriously.
And so a couple of years ago, we were trying to create this balance and create not just
the language that we could talk about on a podcast like this, but really framing
these concepts out to our company internally, to our engineers, to help them
to think as they're building a feature, what are the things they should think about?
What are the concerns beyond the product spec?
To work with our marketing and sales team to help them to understand why we're making
these investments that may not get a particular feature out by X date, but it's still a worthwhile
investment.
And so from the security side, we've really focused on building out robust practices and robust controls that don't necessarily lock us into a particular standard like PCI compliance or things like that, but really focusing on the maturity of our company and our culture as we go forward.
And so we're in a place now, we are ISO 27001.
We're heading into our third year.
We leaned in hard on our disaster recovery processes.
We leaned in hard on our bug bounties, pen tests, kind of found this incremental approach
that day one, I remember we turned on our bug bounty and it was a scary day as the reports
kept coming in.
But we take on one thing at a time and continue to build on it and make it an essential part of how we build systems.
It really has to be built in.
It feels like security is not something that can be slapped on as an afterthought, however much companies try to do that.
Especially, again, as we started this episode with, you're dealing with communication with people's kids.
That is something that people have remarkably little sense of humor around, and rightfully so. Seeing that there is as much,
if not more, care taken around security than there is stability is generally the sign of a
well-run organization. If there's a security lapse, I expect certain vendors to rip the power
out of their data centers rather than run in an insecure fashion.
And your job done correctly, which clearly you have gotten to, means that you never have to make that decision because you've approached this the right way from the beginning.
Nothing's perfect, but there's always the idea of actually caring about it being the first step.
Yeah.
And the other side of that was talking about stability.
And again, it's avoiding the either-or situation
between those two, stability and security.
We work on our cost of goods sold
and our operating leverage
and other aspects of our business.
And in every single one of them,
our co-number one priorities
are stability and
security. And if it costs us a bit more money, if it takes our dev team a little longer,
there's not a choice at that point. We're doing the correct thing.
Saving money is almost never the primary objective of any company that you really
want to be dealing with. Something bizarre is going on.
Yeah. Our philosophy on any cost reduction has been that this should have zero negative impact to our stability.
If we do not feel we can safely do this, we won't.
And coming back to the spot instance piece, that was a journey for us.
And, you know, we tested the waters a bit and we got to a point, we worked very closely with Amazon's team,
and we came to that conclusion that we can safely do this.
And we've been doing it for over a year and seen no adverse effects.
Yeah. And a lot of shops, I've talked to folks about this, where I go into a consulting project and, okay, there's a lot of things that could have been done before we got here.
Why hasn't any of that been addressed?
And the answer is, well, we tried to save money once and it caused an outage.
And then we weren't allowed to save money anymore.
And here we are.
And I absolutely get that perspective.
It's a hard balance to strike.
It always is.
Yeah.
The other aspect where stability and security kind of intertwine is you can think about
security as infosec and our systems and locking things down.
But at the end of the day, why are we doing all that?
It's for the benefit of our users.
And Remind as a communication platform, and the safety and security of our users, is just as dependent on us being up and available so that teachers can reach out to parents with important communication.
Things like attendance, things like natural disasters or lockdowns or any of the number of difficult situations schools find themselves in.
This is part of why we take that stewardship that we have so seriously is that being up and protecting a user's data just has such a huge impact on education in this country.
It's always interesting to talk to folks who insist they're making the world a better place.
It is, what do you do? We're improving ad relevance.
I mean, okay, great. Good for you.
You're serving a need, and I would not shy away from classifying what you do
fundamentally as critical infrastructure. And that is always a good conversation to have. It's nice
being able to talk to folks who are doing things that you can unequivocally look at and say,
this is a good thing. Yeah. And around 80% of public schools in the US are using Remind in some capacity.
And so we're not a product that's used in just a few specific regions; it's all across the board. One of my
favorite things about working at Remind is meeting people and telling them where I work and they
recognize it. They say, oh, I have that app. I use that app. I love it. And I spent years in ads
before this.
And I've been there and no one ever told me
they were glad to see an ad.
That was never the case.
And it's been quite a rewarding experience
coming in every day.
And as you said,
being part of this critical infrastructure,
that's a special thing.
I look forward to installing the app myself
as my eldest prepares to enter public school in the fall.
So now at least I'll have a hotline of exactly where to complain when I didn't get the attendance message.
Because, you know, there's no customer quite like a whiny customer.
They're still customers.
Happy to have them.
True.
We tend to be.
I want to thank you for taking so much time out of your day to speak with me.
If people want to learn more about what you're up to, where's the best place to find you? So from an engineering perspective at Remind, we have our
blog, engineering.remind.com. If you want to reach out to me directly, I'm on LinkedIn. It's a good
place to find me. Or you can just reach out over email directly, peterh at remind101.com.
And we will put all of that into the show notes. Thank you so much for your time. I appreciate it.
Thanks, Corey.
Peter Hamilton, VP of Technology at Remind.
This has been a promoted episode brought to us by our friends at Redis.
And I'm cloud economist, Corey Quinn.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform
of choice.
Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment that you will then hope that
Remind sends out to 20 million students all at once. If your AWS bill keeps rising and your
blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.