The Pragmatic Engineer - Building Reddit’s iOS and Android app
Episode Date: April 23, 2025

Supported by Our Partners
• Graphite: The AI developer productivity platform.
• Sentry: Error and performance monitoring for developers.

Reddit’s native mobile apps are more complex than most of us would assume: both the iOS and Android apps are about 2.5 million lines of code, have 500+ screens, and a total of around 200 native iOS and Android engineers work on them. But it wasn’t always like this.

In 2021, Reddit started to double down on hiring native mobile engineers, and they quietly rebuilt the Android and iOS apps from the ground up. The team introduced a new tech stack called the “Core Stack”, all the while users remained largely unaware of the changes. What drove this overhaul, and how did the team pull it off?

In this episode of The Pragmatic Engineer, I’m joined by three engineers from Reddit’s mobile platform team who led this work: Lauren Darcey (Head of Mobile Platform), Brandon Kobilansky (iOS Platform Lead), and Eric Kuck (Principal Android Engineer). We discuss how the team transitioned to a modern architecture, revamped their testing strategy, and improved developer experience, while also greatly improving the app’s user experience.

We also get into:
• How Reddit structures its mobile teams, and why iOS and Android remain intentionally separate
• The scale of Reddit’s mobile codebase and how it affects compile time
• The shift from MVP to MVVM architecture
• Why Reddit took a bet on Jetpack Compose, but decided (initially) against using SwiftUI
• How automated testing evolved at Reddit
• Reddit’s approach to server-driven mobile UI
• What the mobile platforms team looks for in a new engineering hire
• Reddit’s platform team’s culture of experimentation and embracing failure
• And much more!

If you are interested in large-scale rewrites or native mobile engineering challenges, this episode is for you.

Timestamps
(00:00) Intro
(02:04) The scale of the Android code base
(02:42) The scale of the iOS code base
(03:26) What the compile time is for both Android and iOS
(05:33) The size of the mobile platform teams
(09:00) Why Reddit has so many mobile engineers
(11:28) The different types of testing done in the mobile platform
(13:20) The benefits and drawbacks of testing
(17:00) How Eric, Brandon, and Lauren use AI in their workflows
(20:50) Why Reddit grew its mobile teams in 2021
(26:50) Reddit’s modern tech stack, Core Stack
(28:48) Why Reddit shifted from MVP architecture to MVVM
(30:22) The architecture on the iOS side
(32:08) The new design system
(30:55) The impact of migrating from REST to GraphQL
(38:20) How the backend drove the GraphQL migration and why it was worth the pain
(43:17) Why the iOS team is replacing SliceKit with SwiftUI
(48:08) Why the Android team took a bet on Compose
(51:25) How teams experiment with server-driven UI: when it worked, and when it did not
(54:30) Why server-driven UI isn’t taking off, and why Lauren still thinks it could work
(59:25) The ways that Reddit’s modernization has paid off, both in DevX and UX
(1:07:15) The overall modernization philosophy; fixing pain points
(1:09:10) What the mobile platforms team looks for in a new engineering hire
(1:16:00) Why startups may be the best place to get experience
(1:17:00) Why platform teams need to feel safe to fail
(1:20:30) Rapid fire round

The Pragmatic Engineer deep dives relevant for this episode:
• The platform and program split at Uber
• Why and how Notion went native on iOS and Android
• Paying down tech debt
• Cross-platform mobile development

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
Transcript
AI coding tools.
I'm interested.
How much do you use the current ones?
What do you use?
What is working?
I think Apple's built-in Xcode one is perfectly fine.
I use it a tremendous amount.
I feel like ChatGPT has officially replaced my knowledge of Git.
I can never remember how to use Git invocations properly.
So I just pop open a window and be like, hey, can you please reeducate me on this?
I use it a lot for those kinds of ad hoc things where I don't have an idea of where to start.
I've noticed that they are starting to do a lot more when you ask a question.
It can generate a lot more complicated output, like an integrated web project or something like that.
I'm excited to look back into that,
but yeah, on the mobile side,
I've asked it a couple of specific wrinkly questions
that I actually have to solve on a day-to-day basis.
I've never once gotten good code for those.
But it's a good rubber duck if you're trying to break down a problem
and carve it up into chunks and ask it focused questions.
You might get decent code.
So in personal projects, I've used Gemini built into Android Studio.
It's honestly pretty good.
I don't use it to ask questions and have it write code for me.
Honestly, I still don't believe that that's going to work very well.
But as like a very fancy, efficient auto-complete, it's fantastic.
Reddit completely rewrote its iOS and Android apps starting in 2021,
and did so without most of us noticing any of it.
But why did they do it and how?
Today, we reveal the details by talking with three members of Reddit's mobile platform team:
Lauren Darcey, Brandon Kobilansky, and Eric Kuck.
We cover the sheer size of Reddit,
how they employ more than 200 mobile engineers,
have around 2.5 million lines of code for both iOS and Android,
more than 500 screens and hundreds of modules.
How Reddit changed their API from REST to GraphQL
and why this was more tricky to do than most assumed it would be.
Why Android chose Jetpack Compose when Compose was still in alpha
and why iOS decided not to use SwiftUI at the time
and why the iOS team is reversing course now, and many more details.
If you're interested in how a fully remote mobile engineering team operates
and makes decisions and want to understand the challenges
of building mobile apps used by tens of millions of users,
this episode is for you.
If you enjoy the show, please subscribe to the podcast on any podcast platform and on YouTube.
Welcome to the podcast, all of you.
Thank you.
Thank you.
Happy to be here.
So, to start off, can we get a sense of the scale and shape of the Reddit code base?
And right now we're talking about the mobile code base, of course.
Yeah, I guess I can start with Android.
There's some like standard scale metrics.
I don't know, two and a half or so million lines of code, about 800 modules.
But the thing that really surprises people is we have about 580 screens.
You know, you think of Reddit and you think of like the feed and the post detail screen,
maybe a handful of others, but to this day, I do not understand how we have this many
screens or where they all go.
But we have a lot.
Brandon, you want to give the iOS take on that?
The iOS side is actually, I think, a pretty similar size.
We're behind on having a more specific concept of "screen" that I would trust.
But my favorite answer to this question is that we're large enough that if you ask me what scale we are, I have to ask you clarifying questions like, well, do you care about generated code? Do you care about specific kinds of targets? Because we've got a thousand Bazel targets, but I would have to take 15 minutes to categorize them or ask Brentley to figure it out. But yeah, it's big enough that we don't have to prove that it's too big.
And I guess one question for code bases of this size, for those who are not as familiar with, let's say, mobile, is the compile time. On backend and on web, it's usually not a big deal, although in TypeScript it can be, but on mobile it usually is.
You know, code bases like these, a million lines plus, would often take 10 minutes, 15, sometimes even longer than that to compile. What is your compile time, from fresh?
Again, this question causes me to make a face, because if you end up being a Reddit iOS developer, you end up asking, hey, what's my build time? I think if you build the entire app, everything, it's probably about 30 minutes, but that'll cache a bunch of things, and after that it's going to be way faster. But if you are exposed to a 30-minute build time, we're probably going to see you on a chart and be like, hi, why are you building the whole app? Would you like to be more productive? Can you try a couple of these different techniques? Because it's really hard to make that any faster.
Yeah, on the Android side, incremental builds are not bad at all. I think our clean builds are somewhere in the neighborhood of eight to nine minutes. But that's with remote build caching and a lot of other things we have in place to help things out.
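Remote build caching is a large part of why a clean build can land in the eight-to-nine-minute range. The episode doesn't describe Reddit's setup, so this is only a minimal sketch of the general idea, assuming a content-addressed cache: each target's key is a hash of its own sources plus its dependencies' keys, so a target is rebuilt only when something it depends on actually changed. `compute_keys`, `build`, and the dictionary standing in for the remote cache are hypothetical names for illustration, not Gradle's or Bazel's API, and the sketch assumes an acyclic graph where every dependency also appears in `sources`.

```python
import hashlib
from typing import Callable, Dict, List


def compute_keys(sources: Dict[str, str], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Derive a cache key per target from its sources and its dependencies' keys."""
    keys: Dict[str, str] = {}

    def key_for(target: str) -> str:
        if target not in keys:
            digest = hashlib.sha256()
            digest.update(target.encode() + b"\0")
            digest.update(sources[target].encode() + b"\0")
            # Folding in dependency keys means a change anywhere upstream
            # produces a new key for every downstream target.
            for dep in sorted(deps.get(target, [])):
                digest.update(key_for(dep).encode())
            keys[target] = digest.hexdigest()
        return keys[target]

    for target in sources:
        key_for(target)
    return keys


def build(
    sources: Dict[str, str],
    deps: Dict[str, List[str]],
    remote_cache: Dict[str, str],
    compile_target: Callable[[str], str],
) -> List[str]:
    """Compile only targets whose key is missing from the cache; return what was compiled."""
    compiled: List[str] = []
    for target, key in compute_keys(sources, deps).items():
        if key not in remote_cache:
            remote_cache[key] = compile_target(target)
            compiled.append(target)
    return compiled
```

Changing one target's sources invalidates that target and everything downstream of it, while untouched targets are still served from the cache, which is what keeps a "clean" local build from recompiling the whole app.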
Yeah, but just to clarify, as you said, and Brandon as well,
you wouldn't really do a clean build except maybe on your first day,
or if you've been away for a long time.
It's just a rare event, right?
Most devs, day to day, are going to do small incremental builds that take no more than a few minutes, and probably even less.
Yeah.
And essentially, we try to get people into selected focus targets. I think we have a couple of names for this, because we've had it for a long time and we've had varying versions of this tech.
With our playgrounds, we want you to take a selection of those targets and do what I think other places call a focus mode.
In those builds, it varies: if you're in the feed area, where you have a tremendous amount of dependencies, maybe it'll take 10 minutes, but for the small libraries that I might be working on, it's more like 30 seconds to a minute.
It barely takes any time at all if you're outside of the main project.
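Focus mode, as Brandon describes it, comes down to building only the selected targets plus whatever they transitively depend on. A minimal sketch, assuming the module graph is available as an adjacency map; the target names and `focus_targets` are hypothetical, and real build tools compute this closure themselves.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def focus_targets(selected: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return the selected targets plus everything they transitively depend on."""
    needed: Set[str] = set()
    queue = deque(selected)
    while queue:
        target = queue.popleft()
        if target in needed:
            continue  # already visited; shared dependencies are built once
        needed.add(target)
        queue.extend(deps.get(target, []))
    return needed


# Hypothetical module graph: each target maps to the targets it depends on.
MODULE_DEPS: Dict[str, List[str]] = {
    "app": ["feed", "post_detail", "settings"],
    "feed": ["design_system", "networking"],
    "post_detail": ["design_system", "networking"],
    "settings": ["design_system"],
    "design_system": ["core"],
    "networking": ["core"],
    "core": [],
}
```

Focusing on `design_system` pulls in only itself and `core`, while focusing on `feed` drags in more of the graph, mirroring why focus builds in a dependency-heavy area take longer than builds of a small library.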
And as context, you mentioned the number of lines of code, which we know is not a perfect metric,
but above a certain size, it does give you a sense, right?
Uber has a similar size, for example.
And again, a lot of startups will be a lot, a lot smaller.
What is the size of a team that maintains all of this, right?
All of Reddit's native apps?
It's a great question.
We have two dedicated platform teams for Android and iOS.
They're about 10 to 11 engineers apiece on a good day.
And they support about 200 mobile engineers across the company on different teams.
Wow.
I want to say we have about 20 feature teams.
And they are set up like fairly typical feature teams, with some Android, some iOS, and some web engineers,
maybe some dedicated backend engineers, an EM, a PM, a product designer, that sort of situation.
But what's special about the platform teams is they tend to be homogeneous.
So everyone on the iOS platform team is an iOS developer.
They're all together.
Oh, wow.
So they develop a little bit more subject matter expertise to support all of the feature iOS developers in their spaces.
Yeah, because at most places I've seen, the mobile platform team is usually a mix of roughly 50% Android and 50% iOS.
Yep.
We work more like sister teams instead of having one team that is both platforms.
We also have a web platform team and a UI platform team that owns the design system.
This episode is brought to you by Graphite, the developer productivity platform that helps developers create, review and merge smaller code changes, stay unblocked, and ship faster.
Code review is a huge time sink for engineering teams.
Most developers spend about a day per week or more reviewing code or blocked waiting for a review.
It doesn't have to be this way.
Graphite brings stacked pull requests, the workflow at the heart of the best-in-class internal
code review tools at companies like Meta and Google, to every software company on GitHub.
Graphite also leverages high-signal, codebase-aware AI to give developers immediate, actionable
feedback on their pull requests, allowing teams to cut down on review cycles.
Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel rely on Graphite
every day. Start stacking with Graphite today for free and reduce your time to merge from days
to hours. Get started at gt.dev/pragmatic.
That is G for Graphite, T for technology, .dev/pragmatic.
This episode is brought to you by Sentry.
Buggy lines of code and long API calls are impossible to debug,
and random app crashes are things no software engineer is a fan of.
This is why over 4 million developers use Sentry to fix errors and crashes
and solve hidden or tricky performance issues.
Sentry cuts debugging time in half:
no more soul-crushing log sifting,
or vague user reports like "it broke, fix it."
Get the context you need to know what happened, when it happened, and the impact,
down to the device, browser, and even a replay of what the user did before the error.
Sentry will alert the right dev on your team with the exact broken line of code so they can push a fix fast,
or let Autofix handle the repetitive fixes so your team can focus on the real problems.
Sentry helped Monday.com reduce their errors by 60%
and sped up time to resolution for Nextdoor by 45 minutes per dev per issue.
Get your whole team on Sentry in seconds by heading to sentry.io/pragmatic.
That is S-E-N-T-R-Y dot I-O slash pragmatic.
Or use the code Pragmatic on sign-up for three months on the team plan and 50,000 errors per month for free.
I have to ask this question because people are already asking in their heads,
why does Reddit have so many mobile engineers?
And by the way, I did get this when I was at Uber as well,
and you've gotten this so many times before, right?
But again, Eric, you already mentioned the 580 screens, which some people will just not really believe.
But you've also mentioned more than 100 mobile engineers, which is a lot.
That would make you one of the largest native mobile engineering organizations globally.
Maybe not the single biggest one, but probably top 10.
It's really scary when you say it that way.
In one app, that is, right?
Google, if you add it all up, will have a lot more combined, but not in one app.
Yeah.
So we talked about the screens.
We have a lot of feature surfaces.
But there's also a lot of mobile developers working in areas like building our safety features and building our eventing and our logging and our experimentation platform.
So not everyone is focused on feature surfaces, and our
ads platform is also an entire org unto itself.
So while a lot of the bigger teams are teams like the feed and the post detail experience,
the things that many Redditors know and love about Reddit, there's an entire team just
devoted to supporting mods.
So once you get into all of the different sorts of areas, plus an entire Developer Platform
that is trying to extend the experience to third-party developers to actually build on Reddit,
It gets big fast.
The other big investment we saw, I want to say a year and a half to two years ago,
is that we don't call them SDETs, but we actually have a lot more automation folks.
So a good portion of our actual builds are part of our test infra investment.
A couple of years ago we didn't have much test infra,
and now we have a wide variety of it.
So they account for quite a few of the builds that we see running today, too.
And by test infra and automation, do you mean end-to-end tests, integration tests,
some fancy AI stuff?
Like what kind of mix?
I'm assuming it's going to be a mix, right?
I was about to say yes until you added the AI part.
Brandon, Eric, you want to talk about the different kinds of tests we have in our test pyramid?
I guess the other part of this is we did have some hackathon projects that
did some AI testing stuff.
They're just not something we use in tooling a lot.
But testing-wise, we have, actually, I think we pulled metrics for what our code coverage is.
But basically, we ratchet to enforce, like, unit test coverage in various areas.
We have definitely a lot of integration tests.
And we also use a lot of E2E, or black-box, testing for things like accessibility assertions.
There's a lot that we actually do with it.
We also do weird stuff.
Like, I've been working for a while on using the UI testing infrastructure to
exercise the app and then hook it up to the leaks tool to do memory leak analysis inside of our end-to-end testing.
So, yes, the answer to your question is we have all of those in a pretty good mix.
And is this on simulators, or do you actually have a device lab where these things are deployed?
So probably.
Well, yeah, I completely forgot that we also partner with a vendor to do that as well.
Yes. So every kind of test.
I think we've come a long way.
When I joined, we didn't have coverage metrics,
but I want to say we were at maybe 2% unit test coverage,
and nothing beyond that.
So this is something that's really grown within the last, I don't know,
two, three years, where I think our testing story is quite a bit better than it used to be.
But I think we still have a long way to go, honestly.
I think we have the pyramid established.
but actually kind of filling in all the bricks on the pyramid has some ways to go.
And you've done way more testing than most engineering teams will do,
especially startups just starting out, which will not even dream of doing so.
What has been the benefit?
You know, you put in a bunch of work, right?
Like, because I do hear engineers asking like, why bother, right?
Like, it's going to be a lot of effort.
They're hard to maintain.
We know end-to-end tests, especially in mobile, are slow, et cetera.
What have you gotten out of this?
So if we go back a couple of years, like Eric was saying, we had almost zero percent test coverage,
and we were having a lot of incidents in prod that were easily preventable.
And so there was a whole shift-left approach of trying to find those things before our users
were telling us about them at scale. I think our test infra became a part of that,
but it was actually really hard to stand up in the first place. That was because we were in a
tech debt situation with really, really long build times and no test infra in
place yet. So we didn't have a lot of capacity where we could say, hey, we're willing to
increase our build times to add tests and be more protective. So you have a kind of
perfect storm: rapid growth in our engineering orgs, so lots more people running builds;
our builds are already really, really long; we want to introduce all this test coverage;
and we're constantly having incidents because we don't have test coverage in place.
And we were mostly relying on manual QA from an overseas vendor with a 12-hour turnaround,
so fixing things was really, really difficult.
So I think the fix involved trunk-based development, shifting where we're testing and what artifacts we're testing,
and introducing automation, ideally first in places where we felt it had value.
So if you're in a post-mortem and you notice that a test could have
resolved this in a way that we would never have this problem again, those are the tests that we
really tried to evangelize doing first: tests for that issue or similar issues. It's a lot
easier to convince people those are tests that pull their weight and are worth running. But yeah,
we did have ratchets in place to just kind of increase testing across the board. And a lot of
those contracting firms we brought in were there to actually help us build tests and
build a culture of testing.
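A coverage ratchet, as described here, lets coverage rise freely but fails CI when it drops below the best value recorded so far. A minimal sketch, assuming per-module coverage percentages are stored as a baseline between runs; `apply_ratchet`, its `tolerance` parameter, and the module names are hypothetical, not Reddit's tooling.

```python
from typing import Dict, List, Tuple


def apply_ratchet(
    baseline: Dict[str, float], current: Dict[str, float], tolerance: float = 0.0
) -> Tuple[List[str], Dict[str, float]]:
    """Compare per-module coverage to the recorded floor.

    Returns the modules that regressed (CI would fail on these) and the
    updated baseline, where any improvement raises the floor for next time.
    """
    regressions: List[str] = []
    new_baseline = dict(baseline)
    for module, coverage in current.items():
        floor = baseline.get(module, 0.0)  # new modules start with a floor of 0%
        if coverage + tolerance < floor:
            regressions.append(module)
        else:
            new_baseline[module] = max(floor, coverage)  # the ratchet only moves up
    return sorted(regressions), new_baseline
```

The design choice is that the floor is per module and only ever increases, so teams are never asked to hit an arbitrary global target, just to not go backwards.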
I will say the downsides you brought up are very real.
We've had lots of issues with flaky tests.
Like I said, we started with virtually no coverage.
And so now all these engineers who maybe have never written a test in their life
are trying to test code that was not written to be very testable.
So we ended up with a lot of bad tests, a lot of flaky tests, and a lot of pain around it.
So it is very real.
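One common way to separate flaky tests from genuinely broken ones, not necessarily what Reddit does, is to rerun a failing test on the same code: if it sometimes passes and sometimes fails, it is flaky. A minimal sketch, assuming each test is a callable that returns `True` on pass; `classify` and its `reruns` parameter are hypothetical.

```python
from typing import Callable, Dict


def classify(tests: Dict[str, Callable[[], bool]], reruns: int = 3) -> Dict[str, str]:
    """Label each test 'pass', 'fail', or 'flaky' by rerunning any failure."""
    labels: Dict[str, str] = {}
    for name, run in tests.items():
        outcomes = [run()]
        if not outcomes[0]:
            # Rerun on the same code: a genuine bug fails every time,
            # while a flaky test passes at least once.
            outcomes.extend(run() for _ in range(reruns))
        if all(outcomes):
            labels[name] = "pass"
        elif any(outcomes):
            labels[name] = "flaky"
        else:
            labels[name] = "fail"
    return labels
```

More reruns lower the chance of labeling a rarely passing flaky test as broken, at the cost of extra CI time, which matters when builds are already long.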
Yeah.
Tech debt exists everywhere, regardless of whether it's in your app or your test code.
One of the little tidbits that was really helpful to me is we had someone who joined Reddit after we'd kind of gone through this process.
And we were talking about some of our frustrations with the testing.
And he's like, this is the first place I've worked at that added a ratchet and it actually worked.
Like, yes, we have flaky tests.
If you have tests, you'll always have some flakiness.
That will just be true.
But you also have a lot...
Especially on mobile.
Yeah.
You also have a lot of tests.
And about what Eric was referring to with some of these iffy tests:
one of my favorite side effects of trying to build tests is that you're at least adding code
paths to inject your dependencies.
So you can at least reach that code in a test,
which means even if you can only write a bad one today,
you can probably write a good one tomorrow,
which is much simpler than starting from zero.
So the side effects have absolutely been worth it.
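The side effect Brandon mentions, adding code paths to inject dependencies, is what makes code reachable from a test at all. A minimal sketch, assuming a hypothetical `FeedLoader` that receives its network client through the constructor instead of creating one internally, so a test can pass in a fake. None of these names come from Reddit's codebase.

```python
from typing import List, Protocol


class PostsClient(Protocol):
    """Anything that can fetch post titles; in production this would hit the network."""

    def fetch_titles(self, subreddit: str) -> List[str]: ...


class FeedLoader:
    """Hypothetical feature code whose dependency is injected, not constructed inside."""

    def __init__(self, client: PostsClient) -> None:
        self._client = client

    def headline(self, subreddit: str) -> str:
        titles = self._client.fetch_titles(subreddit)
        return titles[0] if titles else "No posts yet"


class FakePostsClient:
    """Test double: returns canned data and records what it was asked for."""

    def __init__(self, titles: List[str]) -> None:
        self.titles = titles
        self.requests: List[str] = []

    def fetch_titles(self, subreddit: str) -> List[str]:
        self.requests.append(subreddit)
        return self.titles
```

Even a weak first test against `FeedLoader` proves the seam exists; once the constructor accepts a client, better tests (edge cases, error handling) are cheap to add later.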
So you're doing a lot of modern engineering practices.
One of the most modern things, and, you know,
maybe the future, with a lot of hype around it, is AI coding tools.
I'm interested, how much do you use the current ones?
What do you use?
What is working or how much does it help you or even if it helps?
And I'm asking this because, especially with native mobile, you know, these tools are not nearly as cutting-edge as they are on, let's say, web or some of the JavaScript frameworks.
I'm just interested in like today, you know, like just honestly like what are you getting out of it?
And what is working?
What is not really working?
This is a little bit of a tricky one to answer.
So in personal projects, I've used, you know, Gemini built into Android Studio.
It's honestly pretty good.
I don't use it to ask questions and have it write code for me.
Honestly, I still don't believe that that's going to work very well.
But as like a very fancy, efficient auto-complete, it's fantastic.
Yeah, I think Apple's built-in Xcode one is perfectly fine.
I use it a tremendous amount.
Like, I feel like ChatGPT has officially replaced my knowledge of Git.
I can never remember how to use Git invocations properly.
So I just pop open a window and be like, hey, can you please reeducate me on this?
I use it a lot for those kinds of ad hoc things where I don't have an idea of where to start.
I've noticed that they are starting to do a lot more when you ask a question;
it can generate a lot more complicated output, like an integrated web project or something like that.
I'm excited to look back into that.
But yeah, on the mobile side, I've asked it a couple of specific,
wrinkly questions that I actually have to solve on a day-to-day basis.
I've never once gotten good code for those.
But it's a good rubber duck if you're trying to break down a problem
and carve it up into chunks and ask it focused questions.
You might get decent code.
You're always going to have to revisit it.
But yeah, I think that's mostly for the bog-standard personal project stuff.
I do wonder if these tools will see a similar thing to what we're seeing with native mobile
development.
You know, native mobile development tooling is different from web and from backend.
If you're on iOS, you're using Xcode, right?
Xcode has different functionality than Android Studio or JetBrains IDEs have.
And the challenges are just different.
So I feel what we might see is certain teams using it, and in other domains it
might work better, and, you know, maybe mobile will either catch up or be slower, or, who knows,
maybe it's a different fit.
I think we are using it in a couple of ways, though.
So I agree with Eric that rapid prototyping
with the Android Studio Gemini features is pretty cool.
That is a lot of fun to do on the side
and for validating ideas
before you actually make them official Reddit ideas.
So is it fair for me to rephrase that:
for AI coding tools specifically on mobile,
it's somewhat useful.
Some people use it here and there.
But it sounds like the bigger use case
is actually using this technology,
LLMs, elsewhere, to add to your workflows,
and eventually maybe even your dev tools.
That sounds like the more promising and exciting path.
Did I sense this right?
Yeah, I think that's right.
I think our test insights are also somewhat powered by AI now, but we're not really using them as much as we could.
I would love to see them more.
We had a lot of investment in test infrastructure, but there's a lot of duplication in it.
And when a team goes and looks at how many tests they actually need and how many are P0s,
it's usually a very small subset of what is currently sitting there.
Being able to actually figure out which tests are pulling their weight post-ratchet
is a great place where I think we might see AI in use.
And let's talk about the interesting topic that we wanted to talk about.
You did a really big mobile tech modernization starting from around 2021.
Can you tell me what the state was?
Why did you decide to suddenly make a big change, which, as I understand, also led to hiring more native engineers?
And how did it go?
That's a big topic for us that we really like to talk about.
Let's see.
Let me try and set the stage.
So, 2021, Reddit is growing as a company.
They are trying to establish themselves more as an international company instead of a U.S. domestic company.
And they're seeing the world shift from a, like, web-based space to a mobile-based space,
especially overseas.
That led to a whole bunch of new strategy, which led to a whole bunch of hiring across
many, many teams, especially in the mobile space.
And so we went from a team that was probably in the range of
40 or 50 engineers across both platforms to around 200 within about two years.
And that is the situation: you start from a place of really, really poor build times,
like two and a half hours, with no test infrastructure, but with all of these big goals.
And it's also around that time that platform teams really started to form officially.
Brandon, you can correct me if I'm wrong here.
It was within that year or so that we transitioned from being like many companies, which beg and borrow platform support from everybody to keep things up and going,
to saying we need proper platform teams that are going to actually help us scale this way.
I think that time frame is about right.
Oh, no, actually, that's definitely right.
Because this was my brief dip as an EM, and I got to hire a couple of people.
But at least from my perspective on iOS, we had always been about three people supporting however big the org was.
So the iOS platform team was three people from the time we had 10 or 15 engineers until about the timeframe Lauren is describing.
And yeah, it was a lot.
It was a tremendous amount of work.
I can remember specific incidents about this.
But yeah, it was really a time where we could finally start developing, at least the start of, specific areas of expertise, instead of platform engineers being experts in everything and then having to pretend that all the context switching wasn't stressful.
So one of the things that happened, though, is we're hiring all this great talent because we really want to invest in mobile.
And they're showing up and onboarding is rough.
They have never seen build times anything like this.
if they ask what the tech stack is,
we're like, what team are you joining?
It depends on what team you're joining,
and then we'll tell you what your tech stack is, maybe, if there is one.
And so we're getting a ton of feedback from our existing developers
and the new people showing up that DevX is not bringing them any joy.
Feature teams have these big, ambitious new goals,
and the new hires join together and form a team,
but they can't execute because our codebase is really slow
and they can't figure out how to actually like deploy safely.
All of the things are kind of coming up at once.
Yeah.
Yeah.
I'd say one of my favorite, actually least favorite, I guess,
stories about this was when I joined.
I'll say I'm generally all for people on any team
getting to know the finer points of their tooling and the tech stack.
But there's one fact I strongly believe
no feature engineer should ever have to know.
And that is at what point their CI provider decides that it's more likely that a process
is stuck in an infinite loop than that you're actually still trying to build.
But when I joined, we weren't so lucky.
I regularly had to retry PRs after they timed out on CI after two and a half hours.
And you know, like a lot of these were from decisions that made a lot of sense when we were
small teams, small code base.
And we just didn't have the staffing to readdress them as we grew.
So yeah, we kind of reached a point where something had to be done.
So it wasn't just DevX, though.
This was also bleeding out into the users and their experience.
So this is around the time where we probably had our all-time worst stats on mobile across both platforms.
Did that show in the ratings or feedback?
All of the above.
I started in 2021.
And the week I started, our Android startup time was 13 seconds at P90.
I think, Brandon, our iOS was around 7 to 8 seconds at P90.
It was the worst of before Bolt.
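For readers unfamiliar with the notation, P90 is the 90th percentile: 90% of measured startups finished at or below that time, so it reflects the slow tail rather than the typical launch. A minimal sketch using the nearest-rank definition, which is one of several common percentile definitions; the sample timings are made up for illustration and are not Reddit data.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p percent of all samples are <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up cold-start timings in seconds, for illustration only.
startup_seconds = [2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.5, 5.2, 7.9, 13.0]
p50 = percentile_nearest_rank(startup_seconds, 50)  # 3.3
p90 = percentile_nearest_rank(startup_seconds, 90)  # 7.9
```

In this toy data the median is 3.3 seconds while the P90 is more than twice that, which is why tail percentiles surface slow launches that a median or average would hide.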
I was pulled out of my onboarding the first week for a feed incident that had no
observability.
We couldn't figure out what was going on, or even tell that we were not showing feeds.
So we were basically blind to the problems that we had.
we were just constantly reacting to problems at that point.
And so I think that all of this together became a really good, like, pivot point for the company to say,
these are all becoming really big liabilities for us.
We need to actually have a plan for how we're going to resolve our DevX issues so that we can actually build what we want to build.
But also, to dig ourselves out of the hole that we were in from an existing tech debt perspective, it needed to be a pretty comprehensive plan.
So that kind of sets the stage for how we bring together a group of people to define our tech
stack and we'll go from there.
And then what was your journey in the modernization?
What were the things that you put in place?
We talked before the recording about a monorepo, which is an interesting one.
Early on, executive leadership asked us for a prescription for what was going to be our big solve, and ideally whether we could have an umbrella effort that kind of trademarked the entire approach. That's how we ended up with a branded name for our tech stack, which is Core Stack.
So this is for the new stack, right?
This is our answer to everything. And I'll very quickly TL;DR it.
At the bottom of it, it is, how are we going to organize our code?
We agree that we are going to organize it in a monorepo for each platform,
modularized primarily by feature.
Above that, we're going to use a modern programming language.
We're going to commit to using the modern programming language for the given platform.
One thing that was helpful was that a lot of these particular choices were already well underway.
They just needed to be blessed and pushed into the primary way we wanted people to work,
which actually made most of these choices pretty easy.
But we were in the middle of a GraphQL transition from being a REST company.
So we committed to being a GraphQL first client and then a GraphQL only client.
Above that, we see a little bit of divergence across the two platforms, but we settled on MVVM of some flavor.
Do you want to explain why we chose MVVM?
Eric, Brandon?
Sure.
I mean, a lot of it was kind of forced by our UI framework choice of Compose, which we haven't talked about yet.
We were previously on MVP, the Model-View-Presenter architecture pattern, which even when I joined felt very out of date and a little odd.
But it was fine, right?
Like it's better to have consistency than to have a few things here and a few things there.
So we just kept going with it, despite it not really being anyone's favorite.
But the issue was kind of forced when we chose Jetpack Compose, because it is a reactive framework,
so we could no longer use imperative architecture patterns to drive it.
So we kind of had to settle on something else, and the industry has more or less
settled on MVVM or MVI or some flavor of that.
And then, sorry, just for those who don't know: MVVM is Model-View-ViewModel, right?
And MVP is Model-View-Presenter?
Yes, yes, that is correct.
And what is the difference between the two, right?
They sound pretty similar.
You have the model.
You have the view, and then you have a third thing, but they're different.
Yeah, so really what it boils down to is imperative or reactive.
There are several other differences, but that was the main driver for us.
Using a reactive UI framework, we were kind of forced to stop using imperative logic to drive it.
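To make the imperative-versus-reactive distinction concrete, here is a minimal TypeScript sketch with hypothetical names. In MVP the presenter calls methods on the view; in MVVM the view model only publishes state and the view re-renders from it, which is the shape a reactive framework like Compose expects. The tiny observable stands in for something like Kotlin's `StateFlow`; this is an illustration, not how Reddit's code is written.

```typescript
// --- MVP (imperative): the presenter pushes each individual change into the view. ---
interface TitleView {
  setLoading(loading: boolean): void;
  setTitle(title: string): void;
}

class TitlePresenter {
  constructor(private readonly view: TitleView) {}

  onDataLoaded(title: string): void {
    // Every UI change is an explicit command; forgetting one leaves the view stale.
    this.view.setLoading(false);
    this.view.setTitle(title);
  }
}

// --- MVVM (reactive): the view model exposes state; the view re-renders from it. ---
interface ScreenState {
  loading: boolean;
  title: string;
}

type StateListener = (state: ScreenState) => void;

class TitleViewModel {
  private state: ScreenState = { loading: true, title: "" };
  private listeners: StateListener[] = [];

  // Minimal observable, standing in for something like Kotlin's StateFlow.
  observe(listener: StateListener): void {
    this.listeners.push(listener);
    listener(this.state); // new observers immediately receive the current state
  }

  onDataLoaded(title: string): void {
    // The view model only produces a new state value; it never calls into the view.
    this.state = { loading: false, title: title };
    for (const listener of this.listeners) {
      listener(this.state);
    }
  }
}

// A reactive "view" is a pure function from state to output, like a composable.
function render(state: ScreenState): string {
  return state.loading ? "Loading..." : "Title: " + state.title;
}

// Usage: the view subscribes once and re-renders on every state change.
const renderedFrames: string[] = [];
const viewModel = new TitleViewModel();
viewModel.observe((state) => {
  renderedFrames.push(render(state));
});
viewModel.onDataLoaded("r/androiddev");
// renderedFrames is ["Loading...", "Title: r/androiddev"]
```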
Brandon, how about the iOS side?
So iOS, this was a little bit more complicated.
At that point, we were evaluating SwiftUI and where we were at.
I think our minimum deployment target was still iOS 14, so SwiftUI was, I think, technically possible. I can never remember. It was actually 13 at the time.
But SwiftUI introduced some problems from that perspective, and it was also still pretty new, because we're talking about 2021. It was three years old at that point,
and it was still pretty fresh. So we opted for a layer on top of, basically, UICollectionView that we called SliceKit.
I'm pretty sure we've had some blog posts about this. I can't remember exactly.
We talked about that for a while.
It had a lot of benefits.
If you imagine wrapping a screen in a collection view behind the scenes, there are some really nice things you get.
For example, getting Dynamic Type working was trivial.
I think Conrad's pretty happy with how accessibility can work in some of those areas.
But ultimately, we started running into issues, because writing a declarative wrapper around
UIKit is tremendously complicated.
And it turns out even Apple is not perfect at doing it.
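Part of why a declarative wrapper over a collection view is hard: features describe the whole list, but UIKit's batch-update APIs want explicit deletes and inserts, with delete indices relative to the old state and insert indices relative to the new one. Below is a deliberately simplified diff, not SliceKit's actual algorithm: it compares item identities only and ignores moves and in-place content changes, which a production diff has to handle.

```typescript
// The imperative batch a collection-view-style API expects.
interface BatchUpdate {
  deletes: number[]; // indices in the OLD snapshot
  inserts: number[]; // indices in the NEW snapshot
}

// Identity-only diff between two snapshots of item IDs (O(n*m), for illustration).
function diffSnapshots(oldIds: string[], newIds: string[]): BatchUpdate {
  const deletes: number[] = [];
  const inserts: number[] = [];
  oldIds.forEach((id, index) => {
    if (newIds.indexOf(id) === -1) deletes.push(index);
  });
  newIds.forEach((id, index) => {
    if (oldIds.indexOf(id) === -1) inserts.push(index);
  });
  return { deletes: deletes, inserts: inserts };
}

const update = diffSnapshots(["a", "b", "c"], ["a", "c", "d"]);
// update is { deletes: [1], inserts: [2] }
```

A pure reorder, such as `["a", "b"]` becoming `["b", "a"]`, produces no updates here at all, one example of the edge cases that make this layer hard to get right.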
SwiftUI, I would argue, is what we wanted at that point, at least in terms of the code we wanted to write.
It just wasn't quite ready yet.
So we're revisiting some of that, but I think we'll talk about that in a little bit.
Yeah, it was a pretty legit journey, though.
Yeah.
So we chose MVVM for both of our platforms, with a slightly different answer for what our UI framework was going to be.
We chose Compose on the Android side.
We chose SliceKit on the iOS side.
One is an industry standard and one is an internal choice, let's say.
And then on top of that, we built our new design system, which was ideally going to solve
for a lot of the feature delivery pain around people constantly reinventing the wheel.
I think that at one point there were like 15 spinner controls in each client.
And there was a guy on Reddit that just like continuously found new ones and posted about them.
I love that guy.
But yes, so it wasn't particularly revolutionary except in small parts.
A lot of it had already been started and committed to.
But some of these projects had been going for like three to six years already and not finishing,
which had left us in a place of the worst of both worlds.
When I got here, if you had an incident, our client was half planted in the old REST world
and half planted in the new GraphQL world. And you had to know which team, which project,
and which endpoint you were talking about before you could even figure out where to look for
observability on that element. And so if you could vastly reduce and simplify that for everyone,
then a lot of things would get easy.
And our promise for Core Stack
was that if you actually followed our simple prescription,
platform teams would be here to build easy mode tools
to provide golden paths
and to generally make it easy for our feature teams
to focus on building the best features
instead of struggling with the tools and the architecture.
So, starting from the bottom in terms of the API:
you mentioned there was
a transition from REST to GraphQL. Why did that happen, and what did that mean for the mobile clients?
Especially since, as we know, with native mobile you can't just ship a new
version and forget about the old versions. You either need backwards compatibility or you need
some sort of kill switch that forces people to update. WhatsApp does this, I think:
people using it will see, oh, this version of the app can no longer be used, you need to download
a new one. But until things like that are built, you can't really do that. So,
what was the impact? How long did that take?
And how did you respond?
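The kill switch described here can be as simple as comparing the installed app version with a server-controlled minimum at launch. A minimal sketch, assuming a hypothetical remote config field `minSupportedVersion` and dotted numeric version strings:

```typescript
// Hypothetical remote config fetched at app launch.
interface RemoteAppConfig {
  minSupportedVersion: string; // e.g. "2024.10.0" (format assumed)
}

// Compare dotted numeric versions segment by segment, so "2024.9.1" < "2024.10.0".
// Missing segments count as 0, so "1.2" equals "1.2.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const segments = Math.max(pa.length, pb.length);
  for (let i = 0; i < segments; i++) {
    const x = i < pa.length ? pa[i] : 0;
    const y = i < pb.length ? pb[i] : 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// True when the installed build is older than the minimum the backend still supports.
function mustForceUpdate(installedVersion: string, config: RemoteAppConfig): boolean {
  return compareVersions(installedVersion, config.minSupportedVersion) < 0;
}
```

Comparing numerically per segment matters: a plain string comparison would rank "2024.9.1" above "2024.10.0".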
Oh, my gosh.
It would be funny if I actually did the math on how long it took.
Like, I think right now, it's true that we're not in an active GraphQL migration.
But for most of my years at Reddit, there's been some kind of GraphQL migration going on.
We were tackling it early on.
A shout-out to Aiden, who was in charge of our original iOS GraphQL implementation:
that system is actually still in the code base today.
So we got five years out of that thing at least.
But it provided a lot of just benefits in terms of contracts
and knowing what the mobile app is actually doing
and which client is even making a request.
One of the happy accidents I actually like
is that because we have to name our GraphQL requests
in a specific way, there were a couple of incidents
that got solved in 15 minutes because we were like,
oh, wait, that call is only deployed on Android,
so this is an Android-specific issue.
So the, like, Max thing is here.
So, iOS on-call people, you can take off.
Like, we don't have to worry about this.
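GraphQL lets every operation carry a name, and a naming convention makes it possible to tell from server-side logs which client issued a failing request. The sketch assumes a hypothetical convention where the operation name starts with the owning platform (`Android_HomeFeed`, `Ios_PostDetail`); the conversation does not say what Reddit's actual convention is.

```typescript
type Platform = "android" | "ios" | "web" | "unknown";

// Hypothetical convention: "<Platform>_<OperationName>".
function platformForOperation(operationName: string): Platform {
  const prefix = operationName.split("_")[0].toLowerCase();
  switch (prefix) {
    case "android":
      return "android";
    case "ios":
      return "ios";
    case "web":
      return "web";
    default:
      return "unknown";
  }
}

// During an incident, bucket the failing operation names by platform
// to see whether the problem is confined to one client.
function affectedPlatforms(failingOperations: string[]): Platform[] {
  const result: Platform[] = [];
  for (const op of failingOperations) {
    const platform = platformForOperation(op);
    if (result.indexOf(platform) === -1) result.push(platform);
  }
  return result;
}

const affected = affectedPlatforms(["Android_HomeFeed", "Android_PostDetail"]);
// affected is ["android"], so iOS on-call can stand down
```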
But it was a lot.
But a lot of cool benefits.
And we got to build, like, sort of our own Apollo internally.
That finally proved to me that we definitely should just use Apollo
instead of trying to roll our own.
It turns out it's a complicated thing to build.
And then, just to hammer home the difference between REST and GraphQL:
REST is APIs and then JSON, right?
And then you need to do all your validation.
With GraphQL, you get more type safety.
Do you?
Yeah, like, oh, definitely.
So the nice thing is you can go and read the GraphQL spec.
It is actually like somewhere on the internet.
I actually had to read that a couple weeks ago.
Yeah, there's an explicit contract.
You're going to get types.
There's going to be a schema that enforces those types.
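A rough illustration of the contract difference. With REST, the client has to validate untyped JSON itself; with GraphQL, the server coerces results to the types declared in the schema, and codegen tools (the Apollo clients for Swift and Kotlin, for example) derive client types from each query. The schema, query, and generated type below are hypothetical and hand-written to show the shape.

```typescript
// A post as the client wants to use it.
interface Post {
  id: string;
  title: string;
  score: number;
}

// REST: the response is untyped JSON, so every field has to be checked by hand.
function parseRestPost(json: string): Post {
  const raw: any = JSON.parse(json);
  if (raw === null || typeof raw.id !== "string" || typeof raw.title !== "string" || typeof raw.score !== "number") {
    throw new Error("Unexpected REST payload shape");
  }
  return { id: raw.id, title: raw.title, score: raw.score };
}

// GraphQL: the schema is the contract. Hypothetical schema and query:
//
//   type Query { post(id: ID!): Post }
//   type Post  { id: ID!  title: String!  score: Int! }
//
//   query PostTitle($id: ID!) { post(id: $id) { id title } }
//
// A codegen tool would derive a result type from the query and schema, roughly:
interface PostTitleQueryData {
  post: { id: string; title: string } | null; // nullable because Query.post is nullable
}

// With the generated type, the compiler enforces what the query actually selected.
function postHeadline(data: PostTitleQueryData): string {
  // data.post.score would not compile: the query never asked for it.
  return data.post === null ? "[removed]" : data.post.title;
}
```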
And if you think about an older website like Reddit,
you can probably still find some of our old-school API documentation.
A lot of those endpoints are older than some of the people who listen to this podcast.
Reddit is 20-some years old, and, as Eric kind of alluded to before,
there were a bunch of reasonable decisions made when we shipped those.
After 10 years, some of those contracts have to be reworked.
So GraphQL was a great opportunity to apply a layer on top of our old services that established a clear contract and
allowed people to actually understand what clients want and what the backend needs to
surface. I would never go back. Go ahead, Eric.
I will say, though, I think we took a little bit
of a naive approach. We definitely have some learnings out of the migration. What we really did,
at least to start out, was take our REST API and do a one-to-one migration to GraphQL,
which is not how GraphQL is supposed to work, right? We took these very flat models and
pulled them over to GraphQL, and then we didn't really get that type safety. We got some nullability
safety, but that's about it. So we still ended up with massive amounts of client-side logic
that didn't belong there. And we still have a lot of these issues today: too much logic
in the app, things having to be redesigned in weird ways, or even just overfetching. A lot of
times we fetch massive amounts of data that we never even touch, just because it was in the REST
API, so we figured we probably needed it. So we made some mistakes. We have some learnings, and
it's the kind of thing we're still improving every day.
And then for this API migration, the backend API changing from REST to GraphQL, how did it go?
Was it the mobile teams telling the backend teams, hey, this is what we need, this is what the app
uses? And obviously there's a web team as well. Or was it more the other way around, where the
backend team said, okay, we're exposing this thing, here's what you're getting? Because
it's always a question of who's leading: you have the teams who are closer to the users, but there are also
the backend teams, which have a lot of considerations for performance, functionality, maintainability,
that kind of stuff.
That's actually one of my favorite things about GraphQL: as long as you have some amount
of collaboration, you can usually come up with something that both sides are happy about.
You write a query that asks only for the very specific fields you need,
and you can shape them into the data models the way you like them.
So the direct answer to your question is it depends on what team you're on.
Sometimes the backend developers are leading it.
Sometimes the client side are.
Sometimes there's a lot of collaboration.
But usually both sides can be happy just due to the flexibility of GraphQL.
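The overfetching problem and the flexibility described above both come down to the selection set. A sketch with hypothetical field names: the first query is a one-to-one port of a flat REST object, while the second asks only for what a feed card renders and is then mapped into the client's own model.

```typescript
// REST-shaped: a one-to-one port of a flat REST object, so everything comes back.
// (All field names here are hypothetical.)
const flatFeedQuery = `
  query HomeFeedFlat {
    homeFeed { id title score authorName authorIconUrl subredditName subredditIconUrl createdAt thumbnailUrl mediaJson }
  }`;

// Tailored: the feed card asks only for what it renders, nested the way the UI reads it.
const feedCardQuery = `
  query HomeFeedCards {
    homeFeed { id title author { name } subreddit { name } }
  }`;

// Response shape of the tailored query (normally generated by codegen).
interface FeedCardsData {
  homeFeed: Array<{ id: string; title: string; author: { name: string }; subreddit: { name: string } }>;
}

// The model the feed card actually uses.
interface FeedCard {
  id: string;
  headline: string;
  byline: string;
}

function toFeedCards(data: FeedCardsData): FeedCard[] {
  return data.homeFeed.map((post) => ({
    id: post.id,
    headline: post.title,
    byline: "r/" + post.subreddit.name + " by u/" + post.author.name,
  }));
}
```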
I think the web team probably started the migration initially.
But I feel like they pretty immediately pulled in platform people, at least.
I'm sure Aiden was in there as soon as Alex had the idea.
Like, so I think it started from there, but pretty quickly, like, we were in the room being
able to ask questions and provide feedback for sure.
Go ahead, Lori.
I was actually going to take a different answer on this one.
My entire experience at Reddit has involved a backend GraphQL team that was already established
and really helping drive the migration away from REST and our old infrastructure to GraphQL,
and then to a federated GraphQL model that they have been
evangelizing. They've been excellent partners to the client teams, but they have definitely
driven a lot of that adoption and helped us build our actual GraphQL experience as client
engineers. Some of that is because when we go out into the wider talent pool, most people
came in the door at Reddit not knowing GraphQL up front. That has changed over time, but knowing it was
a lot more unusual when we were growing. Most people came in with a REST background.
They had maybe even written a Reddit app themselves before they joined the company.
And so I think that our journey here involves us learning how to write idiomatic GraphQL
over time, learning what happens when you don't. And one of the other things I think that
had an impact on this particular project is our initial GraphQL implementation was slower
than our old rest infrastructure.
It had a latency tax at the, like, we got the flexibility in all the good parts we wanted,
but we had to take it with a latency tax up front for a while.
And so when we transitioned the main Reddit clients to GraphQL,
they were noticeably slower on the same features we had been running on our rest infrastructure before,
and we had to decide if that was a tradeoff we wanted to lean into and commit to and work through
or if we're going to rethink our approach.
Ultimately, we took longer to actually move over to our GraphQL infrastructure because
we worked with the GraphQL backend team to reduce that latency and basically make it much more
performant. And so by the time our clients moved over, it was in a much better place.
But there was a point in time where it was frustrating for feature teams to be migrating in that
direction with a latency hit, especially when we were really sensitive to
our app being slower for users at the time.
Yeah, that's the classic migration. There's a comic about migrations at Google by Manu Cornet, and it's like:
there are two paths through the systems.
There's the one that doesn't work yet and the one that's being deprecated.
And it's the usual story.
And he's got a request to like whatever he went to like translate it.
Because it's usually what happens with migrations, right?
Yep.
But I asked both Brandon and Eric last week if they feel like GraphQL was the right solution
for our size at the time.
And what did you say?
I don't know if it was right for our size at the time.
But I feel like this is such a great example of, A, one of my favorite things about
Reddit, and B, why you should
just hire tech experts and let them cook. Like, yes, it was slower initially.
But now, like, because we have that, we can do a bunch of code gen.
We can try to solve like a bunch of different tech problems.
And there's a bunch of optimizations that we can do because we have one thing.
We only have one kind of like networking layer to maintain.
Well, that's probably not true.
We probably have too many still.
But at least we can optimize the GraphQL one.
But because we made that bet early and let people
figure out how to make it more performant, I'm pretty happy with it. I'd do it again.
Yeah, nice. And then how did the architecture choices play out? You mentioned
that on iOS you decided to go with an in-house UI framework at first.
Where did that work out, and where are you right now? As ever, we are in a
period of transition. So I would still say that... well, actually, no, I would
fight somebody about it: it was absolutely worth it, for one
reason. Actually, let's talk about a few things. One, the feed code in our iOS
app is a very, very central component, and it spreads its design choices, I think,
pretty... So, just to clarify: the feed is when you open the app and you see your feed of
different subreddits, right? When you think about either just popping
open the app and seeing what we would call your home feed, the
automatically populated list of posts that we think you're really going to like,
that infrastructure is actually shared, well, a lot of it is shared, with
the subreddit feed if you look at a specific community.
And historically, there was a pretty sort of nasty cluster of like type hierarchies in profiles
as well.
Basically, anything where you would show a vertical scrolling list of posts is probably the same
infrastructure.
Before this, that was based on Texture.
And if you're listening to this, you probably
don't remember that framework or what it was, but it was something Facebook started and then
Pinterest shepherded for a long time that allowed you to do asynchronous and multi-threaded UI.
That created a tremendous amount of complexity. If you remember Lori's
examples of somebody who comes in who doesn't know a particular kind of pattern at Reddit:
nobody we onboarded knew Texture. So we wanted to get out of that. The first commitment to
SliceKit was a massive win in that we were back in Apple's nice main-thread UI land.
However, we ran into a lot of issues where it's really hard to reuse the components.
There's some boilerplate that we just can't quite crack.
So what we've been working on for a while and how we've been kind of going about this is,
I really strongly believe that if you're building infrastructure, the new thing should at least be
able to be hosted in the old thing.
So what we're doing is we're saying, hey, if you're using SliceKit, we're going to figure out ways that you can put individual SwiftUI-based cells in there.
And maybe you can easily A/B test those individual cells.
So you can figure out: is it performant? Is it accessible?
Like, does it meet all of your needs?
And you can kind of roll out at your own pace.
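The "host the new thing in the old thing" principle can be sketched as a per-cell decision inside an existing list screen. On iOS, UIKit's `UIHostingConfiguration` (iOS 16+) is one real way to put SwiftUI content in a collection view cell, though the conversation does not say which mechanism Reddit uses. The cell kinds and flag shape below are hypothetical.

```typescript
// Which implementation draws a given cell.
type CellRenderer = "legacy" | "modern";

interface Cell {
  kind: string; // hypothetical cell kinds, e.g. "postTitle", "voteBar"
  payload: string;
}

// Hypothetical per-cell experiment assignments from a remote flag service.
type ExperimentFlags = { [cellKind: string]: boolean };

function rendererFor(cell: Cell, flags: ExperimentFlags, modernKinds: string[]): CellRenderer {
  // Switch only when a modern implementation exists AND the experiment enables it;
  // everything else stays on the legacy path that already ships in production.
  const hasModern = modernKinds.indexOf(cell.kind) !== -1;
  return hasModern && flags[cell.kind] === true ? "modern" : "legacy";
}

function renderScreen(cells: Cell[], flags: ExperimentFlags, modernKinds: string[]): string[] {
  return cells.map((cell) => rendererFor(cell, flags, modernKinds) + ":" + cell.kind);
}

const plan = renderScreen(
  [{ kind: "postTitle", payload: "Hello" }, { kind: "voteBar", payload: "42" }],
  { postTitle: true, voteBar: true },
  ["postTitle"]
);
// plan is ["modern:postTitle", "legacy:voteBar"]
```

Because the legacy path stays the default, turning a flag off is enough to roll a single cell type back.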
And then we're essentially applying that principle so that, if you imagine our vertical breakdown of Core Stack, we are eventually replacing SliceKit with SwiftUI
as we prove out areas of this thing and make sure that it works. For example, SwiftUI's List is a little bit
suspect performance-wise right now because it's really complicated.
But essentially, we're replacing it on a many-year time scale, but we're doing that by making
sure that the teams who are currently leveraging it have support for everything that they shipped
in prod today.
I think one of the interesting things about this is that when we gave our Core Stack
prescription, a lot of us intended
it to be a starting place: once you separate the concerns out a bit and you
have an established pattern, we can build a roadmap away from that pattern if we decide that
there's something better out there. But a lot of people kind of took it as, oh, once we have
introduced all the new stuff, we'll be good forever, and we will never have to change our minds.
But a lot of the people who were trying to evangelize and build out this idea were trying to plan
for change for the long term because we knew that anything that we chose in the past,
we chose with a reason.
We had reached a point where those particular choices were not holding up over time and
we needed to change them.
But we understood that we probably would be having the same conversations, ideally three
to five years later, about whatever we were doing now.
So Slice Kit and Compose were bets on both sides.
They had different, like, risk profiles,
associated with them.
But we wanted to avoid a one-way door there, where we
couldn't change our minds later.
And even the SliceKit choice came with an intention to revisit it on a
regular basis to find the right time to consider SwiftUI.
And how did it work on Android?
Because you took a bet on Compose, and just like SwiftUI, Compose back in 2021
was pretty new.
Right now it's pretty widely accepted;
it's a great way to build things.
In fact, I'm not even sure there's really any competition in how to structure, like, a
model.
But back then, this was not the case.
It was like, here's a new framework.
We think it'll be good, but you don't know.
So I want Eric to answer this question,
but one thing I want to preface it with is when things are quite bad,
you have a lot more opportunity
to just take bigger bets and get away with them if they pay off.
So, Eric, why did we take the bet on Compose as early as we did, given how big...
Yeah, I don't know how proud of this we should be, probably not super proud.
But when we started working on our first Compose feature, it was still in alpha.
We shipped while it was still in beta.
Wow.
Yeah.
The reason we were able to do this, like Lori said, was that we knew we needed change, and
this looked like the next big thing. At the time, I was still on a feature team and I was working
on a brand-new feature, completely greenfield, a perfect fit for trying something brand new.
So I took a bet. I built the whole thing in Compose. I used MVVM instead of MVP,
and introduced a whole lot of new patterns that seemed like the way to go. The reason
I was comfortable doing this was that I was also willing to go back and rewrite it if the bet
didn't pay off. I was the sole developer on this feature. I completely owned it.
If it didn't work out and it was a terrible decision, I was willing to own that and, you know,
make it right. And just to confirm, because, yeah, I think what people might miss is like Reddit,
you know, like it's a massive app, like large amounts of users are using it. This is not just like
your, you know, your startup with 100 users. And this feature that you did, what was it like,
so it was like a nice to have feature or not something critical or it was critical, but you were
just like, okay, I'm going to make it work. It was, I wouldn't call it critical. So it's now gone,
but it was called Reddit Talk. I don't know how many people are familiar with that, but it was
the live voice chat room feature. When Clubhouse was taking off, right?
Yes, exactly. Yes, everyone was copying that for a while. So yeah, it was just kind of the perfect
opportunity. So it wasn't, it was launched to a very small audience. I think it was only like two
subreddits had the feature at first, and then we were, you know, slowly scaling it as we proved
it out. So it wasn't a super risky bet, but there was some risk there. Yeah. Well, reputation, right?
It still only took less than a month to adapt Talk, as it first came about, into the
golden example of what we ultimately decided was our fairly straightforward set of tech stack choices.
And so it was easy to adapt.
So it basically proved out the greenfield work as a starting point for our new prescription was going to work just fine.
Adapting some of our more legacy code and our larger critical-path features needed more than even our Core Stack prescription was going to give.
So they extended it in all sorts of interesting ways.
Now, one thing you mentioned that you're doing,
which is very interesting here, is server-driven UI.
And this is interesting because we talked about this with Lori,
but a lot of companies above a certain size,
especially when there's native mobile,
will come up with something that might feel familiar to,
let's say, React Native or something else, with the goal of
being able to drive some of the UI from backend changes.
What was your approach here?
Why did you do it?
How did you build it and how do you feel about it?
So, well, it's not an official part of our
Core Stack. Part of the Core Stack idea is that people can extend or experiment with anything
they think is going to work well on their surfaces because they're not all the same. We have
massive feed surfaces and we have chat surfaces and we have profiles and they all really need
different levels of complexity and capabilities. So we definitely had some teams try out
server-driven UI. I think, Brandon, you might have a good example where it worked out well.
Yeah, a lot of those are tremendously complicated, but one of the things that I think
has been shipped for a really long time is our reporting flow, our content reporting for
when you find something that you don't think should be on Reddit and you want to report it.
I would say that has some server-driven UI in it, but since you're really just driving reporting
reasons, where you're trying to ship a list of strings, I think they chose
a reasonable level of complexity that makes sense for that team.
So it works really well.
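The reporting flow is a good fit for a narrow form of server-driven UI: the server decides which reasons to show, and the client owns how they look. A minimal sketch with a hypothetical payload shape, parsed defensively so one malformed entry does not break the whole flow:

```typescript
// Hypothetical payload: the server only controls WHICH reasons appear.
interface ReportReason {
  id: string;
  label: string;
}

interface ReportFlowPayload {
  title: string;
  reasons: ReportReason[];
}

// Returns null for an unusable payload; drops individual malformed reasons.
function parseReportFlow(json: string): ReportFlowPayload | null {
  let raw: any;
  try {
    raw = JSON.parse(json);
  } catch (e) {
    return null; // not valid JSON at all
  }
  if (raw === null || typeof raw.title !== "string" || !Array.isArray(raw.reasons)) {
    return null;
  }
  const reasons: ReportReason[] = raw.reasons
    .filter((r: any) => r !== null && typeof r.id === "string" && typeof r.label === "string")
    .map((r: any) => ({ id: r.id, label: r.label }));
  return { title: raw.title, reasons: reasons };
}
```

Keeping the server's contract this small is what keeps the versioning burden low: older clients only ever need to draw a title and a list of labeled options.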
But other cases, not so much.
Eric, did you have an example of the nasty?
Yeah.
The first place we tried out server-driven UI was our biggest, most important surface,
which is maybe not where you want to start with experimentation generally.
On the surface, though, it made a lot of sense.
Like, we had tons of logic in each client that went into determining how to display posts
within a feed.
You know, overly smart clients aren't a good thing.
And the logic didn't even match between clients.
Like, there were times when you'd hold an Android phone and an iPhone next to each other and the feeds would look different, not intentionally, just because we got the logic wrong on one or the other.
And nobody knew which one was right.
But despite all the good reasons that went into this, it was a decision that we've kind of ended up regretting to an extent,
and that we're currently working to walk back from.
It turns out that even if the feed itself doesn't need the full post models to be able to display correctly,
everything it links to still does.
So, you know, if we make the feed server-driven UI and it no longer has any knowledge of the backing models,
we just end up having to do double fetching.
We have to fetch the server-driven UI definitions, then we have to fetch the actual models.
And this just introduces double the opportunity for errors.
And it has led to lots of bugs that users see in production.
I don't know if you've ever used the app
and you've tried to tap on a post and nothing happens.
That's because one or the other of the fetches failed.
So we're working on it.
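The double-fetch failure mode can be shown in a few lines. The types below are hypothetical; the point is that a tap needs both the server-driven layout and the backing model, so either request failing leaves the tap with nowhere to go.

```typescript
// Minimal result type for a network call (hypothetical; no real client library implied).
type FetchResult<T> = { ok: true; value: T } | { ok: false; error: string };

// What the server-driven UI response describes: how the card looks.
interface CardLayout {
  postId: string;
  text: string;
}

// What navigation needs: the backing post model, fetched separately.
interface PostModel {
  id: string;
  permalink: string;
}

// Tapping a card needs BOTH responses. If either fetch failed, there is no
// destination, which users experience as "I tapped and nothing happened".
function onCardTap(layout: FetchResult<CardLayout>, model: FetchResult<PostModel>): string | null {
  if (layout.ok === false) return null;
  if (model.ok === false) return null;
  return model.value.permalink;
}

// With two independent fetches that each succeed with probability p,
// both succeed with probability p * p, so for p near 1 the failure rate roughly doubles.
function bothSucceed(p: number): number {
  return p * p;
}
```

With a 99% success rate per request, for example, `bothSucceed(0.99)` is about 0.98, so roughly 2% of cards would be missing one of their two responses instead of 1%.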
It's interesting, because server-driven UI is something that everyone eventually goes to
with native mobile apps, because the nature of mobile is that
if you only have client-side logic, you need to ship that code,
maybe put it behind a feature flag, but if you want to add new business
logic, it will take you at least a week or two or however long it takes. So eventually
everyone comes to a realization: A, what if we just had a webpage wrapped in
an app? That doesn't work, but the next best thing is backend-driven UI. What if we just have
the business logic as JSON or something that we download from the server, and it should work?
But I've seen so many times where, again, most companies don't talk about
this publicly, but it goes sour. It just doesn't really work out, because eventually
what I've seen teams realize is, we need to be backwards compatible. We need versioning.
We need to support older clients. We're now running into, oh, we want this feature that we didn't
think about back then, and now we could add it, but our older version doesn't have it. What is your
take on why this is not taking off in mobile land in general? It
seems like everyone comes around and learns a lot of these things.
I actually think that we shouldn't give up on the idea, all right?
I think it's like any idea that we have had and failed at or whiffed.
You can say agile and it means different things to everybody.
You can, you can say modularization and you can name just as many times that has failed to deliver on its promises as it has worked.
And so I don't actually think that there's a real problem with the idea.
I think there's a problem with an implementation that works for the shape of the problem.
And we don't spend enough time making sure it fits. In this case, I don't think we took enough time to make sure it was going to solve the Reddit-shaped problem, as opposed to server-driven UI in the abstract.
It makes sense in the abstract,
but we didn't actually apply it to the actual reality of what our codebase and
infrastructure look like. There were parts of our infrastructure that were not ready for it.
We did not have an end-to-end commitment from everyone to work on the exact same plan at the
exact same time in order to support it. But I've also seen that happen with trunk-based
development and other things where it was like the fifth time you try, you get it right.
And then everyone is really happy with it. So I don't think it's necessarily the prescription
that is the entire problem.
I think it's an underdeveloped implementation
and plan
to make sure it can work in the cases that you want it to work.
I want to double-click on what Lori is saying
with a specific example.
So just in case any product manager is listening
who wants server-driven UI,
the reason that managing that complexity is so important
is that we know server-driven UI is possible,
and it can be done really well
because web browsers exist.
But that's the upper limit of this complexity.
And I don't have enough time to build Chrome really, really well.
So, like, if you want server-driven UI and you want the full feature that you're
specifying in a web browser, well, all you have to do is spend several billion dollars.
If we control complexity well, however, we can definitely solve this problem.
Just please, no web browsers.
But you also see that, like, WebView implementations in mobile are often not great
experiences for mobile developers.
So, like, there are ways.
in which we've tried to bridge that at different times, and sometimes it really works,
and sometimes it does not work. But I don't think we should stop trying on these
ideas, because I think we all agree that our clients being simpler, having less logic in them,
and being more nimble, it's like those are all capabilities we want. We just haven't figured out
how to deliver them well. And in the meantime, we have weekly, sometimes more than once a week
releases in order to keep the ability to change production, nowhere near what our web platform
can do in their continuous delivery pattern.
But it's pretty much the fastest you can deploy on mobile.
Oh, yeah.
There's also, I feel like, for Compose and for SwiftUI,
because the atoms are different, we might be in a case where it's easier to try some of these
things using the modern frameworks that are closer to DSLs.
Because if you think about a server-driven UI thing, it's probably really hard to do that in UIKit.
UIKit is not declarative.
So if you're shipping a declarative spec, it will probably change a lot.
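A sketch of why a declarative spec maps naturally onto declarative UI frameworks, and where the versioning problem shows up. The node shape, component names, and integer `minClientVersion` field are all hypothetical: an older client renders the components it knows and skips unknown or too-new ones instead of crashing.

```typescript
// Hypothetical server-driven UI node: a small declarative tree.
interface UiNode {
  type: string; // e.g. "column", "text", "button"
  text?: string;
  minClientVersion?: number; // assumed integer schema version
  children?: UiNode[];
}

// Component types this (older) client knows how to draw, and its schema version.
const SUPPORTED_TYPES = ["column", "text", "button"];
const CLIENT_SCHEMA_VERSION = 3;

// Walk the tree into flat draw commands. Unknown or too-new nodes are skipped
// rather than crashing: the backwards-compatibility problem in miniature.
function renderNode(node: UiNode, out: string[]): void {
  const known = SUPPORTED_TYPES.indexOf(node.type) !== -1;
  const tooNew = node.minClientVersion !== undefined && node.minClientVersion > CLIENT_SCHEMA_VERSION;
  if (!known || tooNew) {
    out.push("skipped:" + node.type);
    return;
  }
  if (node.type === "column") {
    for (const child of node.children || []) {
      renderNode(child, out);
    }
  } else {
    out.push(node.type + ":" + (node.text || ""));
  }
}

const screen: UiNode = {
  type: "column",
  children: [
    { type: "text", text: "Hello" },
    { type: "carousel" },
    { type: "button", text: "Join", minClientVersion: 5 },
  ],
};
const commands: string[] = [];
renderNode(screen, commands);
// commands is ["text:Hello", "skipped:carousel", "skipped:button"]
```

Silently skipping is only one policy; a real system also has to decide whether a missing component makes the whole screen unusable, which is where the versioning questions above come back.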
And so you've done a bunch of work over the last three or four years, starting from this modernization.
You moved to Compose on Android, you're moving to SwiftUI on iOS, and you added the monorepo.
We didn't talk about it.
but you added code ownership.
And you just did a bunch of things that made things better.
In the end, of course, you onboarded so many new mobile engineers as well.
Was this effort worth it?
Like, how can you tell how this big modernization, the Core Stack effort, paid off?
I think we have really good answers to this on the DevX side.
I think we have really great answers to this on the user side.
I know that our developer sentiment has improved,
and we are serving so many more developers than we were before.
And pretty much every stability and performance metric
has benefited from the changes that we've made.
But who wants to go first?
Android or iOS?
Sure, I'll go.
Yeah, I mean, like Lori said, our developer surveys have shown that people are just far happier than they used to be.
Those developer surveys were rough for a while.
And now they're overall, like, they're pretty positive.
But a lot of things have gotten easier.
Onboarding is easier.
We don't have to coach people into the, you know, myriad of patterns that we have.
When I used to interview people, you know, they were all excited like, yeah, Reddit sounds so cool.
I really want to work here.
What's your tech stack look like?
And I would tell them and I could just see their face like, uh, that sounds terrible.
And that's not the case anymore.
Like people are excited.
People like want to work in this kind of code base.
So we've seen a lot of really good stuff come out of it.
Even the types of engineers that we can hire,
we can hire more junior engineers on the feature teams
because you don't need to know all this, you know, legacy knowledge
that there's no reason to know anymore for the most part.
But yeah, there's obviously all the runtime user-facing benefits as well.
But I don't know.
I'm a developer.
I care about developer experience.
Brandon, where are the Texture crashes?
I think iOS is a little lagging behind in that, but I think, well, emphatically it was worth it.
Like, if we wouldn't have done that, I can't imagine how the heck anybody would be productive
with the expectations that we would have had.
But emphatically worth it.
And I think even in cases where I feel like the tech needs to get better,
it allows us to have a sort of specific path and a specific focus for people working on this.
Like I love that we now have like iOS people who are the de facto GraphQL experts.
That was not always the case.
And now as they like iterate on that tooling and on those technologies, like everybody gets benefits.
So we can progress in that area independently while I'm, you know, trying to figure out how we're going to get SwiftUI into this thing.
And structured concurrency also introduces a lot of different...
And to close on this, we have paths to, like, activate these projects. And I would much rather be in
this universe than the alternative. So I can give you some concrete examples of this. We used to
see Slack messages in our, like, shared guild channels. And you would have to figure out what
somebody was doing at a really detailed level, because it could be anything before you could help them.
And so it was actually really hard to mentor people, especially in a remote first company. You had to get a
whole bunch of context before you could answer a question about why their build was broken or what
they were trying to achieve. We have a much higher level of shared understanding when the question
gets asked at this point because most people are following goldenish paths most of the time.
So there's a lot less frustration, a lot less burnout on the people who are both the subject matter
experts and the people who are trying to become them. We did a whole bunch of analysis of
our DevX improvements to try and prove out, not just by surveys, but with some sort of
quantitative analysis, whether they were worth doing. I really wanted to be able to kind of take the
product approach of: could we defend this sort of effort like we would defend a product
initiative? And we definitely saw that breaking up the monoliths and modularizing and giving people
stronger ownership of their code areas improved a ton of productivity signals.
People were building new features like Reddit Recap with half the amount of people they did
the previous year.
They were able to finish their features sooner, so they would add these extra bells and
whistles to their actual deliveries, which had unintended consequences,
like they would introduce more animations.
And we'd be like, where did those come from?
Oh, no, our performance.
But no, all of those had like real business benefits
from a developer efficiency and productivity perspective
while increasing developer joy at the same time,
which was the kind of developer productivity we were after as platform team.
From the user side,
pretty much every one of our metrics that was
bugging us at the beginning of this effort is in a much better and more stable place,
and it very rarely sees regressions.
Our crash rate benefited from going to Swift and Kotlin.
Just from the nullability perspective alone,
we see very few NPE problems in production,
or hotfixes associated with obvious, avoidable problems like that.
I think we were up like 1.5 percentage points on crash-free rate,
which is a ridiculously great improvement on Android.
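The nullability point is easiest to see in code. In Kotlin, whether a value may be null is part of its type, so dereferencing a possibly-null value without handling the null case is a compile error rather than a `NullPointerException` at runtime. A minimal sketch; `User`, `displayName`, and `flairLength` are hypothetical names used only for illustration.

```kotlin
// Hypothetical model (illustrative only): the API may omit a user's flair,
// so the property is declared nullable with `String?`.
data class User(val name: String, val flair: String?)

// Writing `user.flair.length` here would not compile, because `flair` is `String?`.
// The compiler forces the null case to be handled explicitly.
fun displayName(user: User): String {
    // Elvis operator: return early when flair is absent;
    // after this line `flair` has the non-null type String.
    val flair = user.flair ?: return user.name
    return "${user.name} [$flair]"
}

// Safe-call operator: evaluates to null instead of throwing.
fun flairLength(user: User): Int? = user.flair?.length
```

`displayName(User("eric", null))` returns `"eric"`, and `displayName(User("eric", "mod"))` returns `"eric [mod]"`; in neither case can the code reach a null dereference.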
iOS was always a little bit more stable because they don't have quite as many devices in the wild.
But they also saw, I think, almost a 1% improvement over that sort of time period.
I never remember the percentages.
I only remember when the problem completely goes away.
Right.
So we are at a point of diminishing returns on investing in that space.
And we can focus on, we then focused
on startup times. Those are down to under, like, three to four seconds, even in the P90-plus sort of
range. And they've stayed there for years and people have noticed, which allows us to
basically move on to another problem we want to solve, like scroll performance or video
performance and working with the video team and stuff like that. So making these
improvements and then putting policies into place to keep them relatively stable, without big regressions,
allows everyone to redirect their time to whatever is the most
pressing thing in front of us, either as a company or bothering our users the most,
and try and focus on improving that space. And we can do that in part because everyone is kind of
a little bit more similarly shaped, which allows us to make a huge amount of assumptions.
We can assume that we can build a tool that will work for most of them most of the time,
which we now do.
We build a lot more tools and scripts and things to make things easy.
And we can build observability such that when we actually go looking at what the problem space looks like,
we can actually build a map of it, as opposed to a very strange world map with really strange boundary lines
where you can't actually see the whole problem because everyone is doing something different.
I want to double click on one of the things Lori said.
In case there are any prospective modernization enthusiasts in the audience,
It's like we're not doing this because like we think we have a better solution than the feature teams.
Like I feel like there's a bunch of boxes at Reddit where I don't want you to have to think about how to interact with our GraphQL service.
I just want you to like think about your feature and I want you to be able to be creative in the ways that she was describing.
I love the idea that like you're adding animations and you're like you're actually bringing your creativity as an engineer to like the features you're building.
The reason we're doing this is not to take flexibility away from people.
It's to eliminate a bunch of boring things so that you can do the actual important work.
If you're trying to do modernization, do it in service to your stakeholders, do it in service
to the teams that you support, not because you think Swift is a cool language, not because
you think SwiftUI is a cool thing, but because it has a benefit to the teams you support.
That's a huge reason for us behind this.
I agree with that.
I think that's one of the reasons why we focused on some of the places that people do not gravitate to, like dependency injection.
Like can we make that less painful for people?
Oh, yeah.
Because nobody loves dependency injection, with the possible exception of Drew, until he was in charge of it.
And now he can never not be the Anvil guy.
But we picked the places with pain points for our feature teams, not the ones where we were trying to, like, constrain them, to Brandon's point.
And then you're in the enviable position of being a mobile platform team that is actually iOS and Android.
So two pure native mobile platforms is quite rare in the industry.
And also working with a large native mobile engineering team, which obviously you need a larger team for a platform team to make sense.
My question is, what does it take for an iOS or Android engineer to work at a platform team like this?
In terms of when you're hiring, what are the traits that you're looking for?
Is it a lot of internal transfers I'm asking?
Because for a lot of mobile engineers, this is a little bit of a dream to get there at some point if they're lucky enough to have either their organization or, you know, to look for the select few companies that do have this thing.
What are ways that mobile engineers who are doing either native or cross platform can work towards skill sets or experience to increase their chances to later join a team like this?
I'll take this first.
So the people who have joined the teams in the last couple of years are a great mix of internal transfers and going out into the hiring pool and hiring externally.
We often find people grow in their feature orgs to be a top performer and they know everything about their scope in that space, but they have frustrations that they would really like to get resolved by
platform solutions that would also impact other teams. They are great recruits and we have gotten
some really great people come into the platform teams who then just immediately turned around
and helped solve real problems at Reddit with us and helped us understand them and scale
them to everybody else. So I can think of a couple people who've joined our teams that absolutely
meet that sort of, like, pattern. We've also gotten some really great talent from other companies
who've brought in fresh ideas from other companies at scale,
as well as from startups who are scrappy,
and Reddit is kind of messy.
And we often have zero-to-one problems that it does not take at-scale experience to solve.
It takes somebody looking at the problem and going,
I know exactly how to make this 100% better,
and just going and doing it.
So taking ownership over something and being practical
and having first principles is extremely important.
And then the last thing that I'm always looking for is,
so it's a little different between the platform teams here,
but I have a really robust rotation of DevX.
Everyone on my team doesn't necessarily want to be DevX all the time,
but they want to give back on the DevX side,
at least part of their job, like at least 25% for everybody.
Some people want to completely specialize on the DevX side,
but a whole bunch of people actually really enjoy helping other engineers work through their problems and such.
So I'm looking for that interest in giving back to other internal engineers as well as thinking of end users at the same time.
But if they do join, they get to work with awesome people like Brandon and Eric,
who probably have a different perspective on what they're looking for on their platform teams.
My joke answer is that I don't know why someone would choose to do this.
I just sort of kind of like stumbled into it.
It's very stressful and it's very hard.
Having said that, I would not choose any other position.
I want to be, you will drag me from this team kicking and screaming.
I think Lori provided a good perspective.
I want to talk about, like, what I think is the most important thing for an IC
who's trying to have a job on a platform team.
You need to sit in the consequences of your decisions.
Like, you need to, in my opinion, you should try to work at a tech company for a year or two
and actually see what happens after you ship a system and then the assumptions change and you
have to figure out how to keep this thing going.
There are systems that have, no joke, been in our codebase for five years,
and the platform team has been, like, evaluating whether to replace them or how to keep them going.
That's incredibly hard.
And I think that's, we have very talented software engineers across like our iOS org, right?
But I think the big difference between folks who are working in features and working on this is that it's more reasonable for a feature team to, like, take their code and, after the product doesn't work,
they just kind of delete it and move on.
Eric mentioned, like, Talk is not in our codebase anymore.
Most things that I've had to write at Reddit are still in the codebase.
I wish they weren't.
But you have to understand.
And I think what you get out of that is you get a bunch of software design intuition
because you have to like reevaluate your assumptions for an incredibly long time.
If you can do that, you're probably ready for platform stuff.
But take your time.
I don't want to burn you out before you're ready because this is hard.
I want to interject real quickly there.
One of the things that Brandon mentioned that I think is really important,
and it definitely comes up in all of our interviews for platform roles,
what we're not looking for is somebody that just wants to go introduce platform work,
that they are not going to use themselves or dogfood,
and assume they can solve something without understanding the ground.
And so that is one of the reasons why we like internal transfers
or even feature team to platform team transfers or growth,
simply because if you take product thinking in here,
we're just looking at a different set of users.
And if you do not take the feedback cycles from your developers,
you will absolutely miss on what you try and deliver to them as a platform team.
And that is a problem that we're always looking out for.
Like, are we open to that feedback?
Are we actually solving the problem that is the real problem at Reddit?
Eric, what about you?
Yeah, a huge plus-one to that.
I don't think platform work is a great first job, although we have someone on our team who's doing
great at it as their first job. But I don't think that there's really a blueprint for
becoming a platform engineer. Some people are jacks of all trades because you need that to some
extent. You have to know how the build system works, how to make a screen more performant,
just how the compiler works. There's just so much knowledge that you need. And then others are
just specialists. We have, you know, build experts. We have performance experts. We have people who
just go really deep in a certain area. So I don't know if there's a certain thing to do to become
a platform engineer other than having experience across a lot of different things and just having
an interest in it, I guess. I do think that, remember, you're talking to a couple of people...
I would entertain the argument that when I joined Reddit, it was probably not an air-quote
startup, but, like, it was a startup kind of environment.
But like both Eric and I like started in at Reddit when the teams were small enough that it was easier to get to that impact.
So one of the, like, things: if you're trying to get onto a platform team, it's probably easier to go to a startup and get a bunch of that experience than it is to go to, like, a scaled platform.
Because like our bar of what we expect our engineers to do has gotten way higher.
Like, my job now as a staff engineer compared to my job when, I think, I was a staff four or so years
ago: completely different, completely different expectations. So I think, like, being in a smaller
org where you can get exposed to a lot of those different things that Eric is talking about is
probably an easier path, but you should still apply. You might not get accepted this round,
but I'd still love to talk to you about how we do this. The last thing there is that, like,
I want to push back on platforms being elite teams a little bit. We're not. We're no smarter than anybody else.
Mostly from the perspective of I'm always worried we're going to introduce something that's going to reduce the psychological safety of this group.
They need to be able to break things in order to make big improvements.
And so having the ability to say you don't know an answer to something and not having any sort of problems with that.
And being able to just, like, really grow through a blameless post-mortem culture is extremely
important to platform teams. Otherwise, you always play it safe and you usually just continue to
introduce friction to teams instead of actually solving their real problems to help them, like,
reach their maximum potential as feature teams. It's too easy to be like, we'll just add another
lint rule so they avoid that footgun and suddenly you have a gazillion of them and everything
is slowing down because of all of their checks. So I think there's also just the brilliant
A-hole sort of platform
archetype that we specifically are not interested in.
We like the brilliant, humble person who really wants to work with other people
and help them solve their problems too.
It is definitionally like a position of service.
My job is to help engineers be more efficient.
It's not to ivory-tower designs and then pretend that I'm smarter than the users.
I'm really not.
Now, this is nice to hear, and I think it's just a good reminder.
As companies grow, like, there's more opportunities.
So, like, if you're lucky enough to work at a company that grows, which is hard to predict.
But when, you know, companies are on that trajectory and live up to expectations, they grow, and a lot of paths just open up.
Be that going to a platform team, changing stacks, going to a leadership position or, you know, transferring teams, working on a lot of them.
It's just a lot easier.
And on the other hand, companies that either shrink or stay kind of flat, everything will be a lot more challenging and difficult there.
I would say that like, befriend your platform team is the best way to become a platform engineer someday.
If you have a platform team, befriend them during hackathon weeks or do some mentorship inside projects.
Almost everyone has done something like that before they end up joining our team.
But also we really like having really good partners on the feature teams themselves
because they are very honest with us about what their real problems are
and they are the best source of that.
So there are also people who just never want to join a platform team
because they're really user-focused, and the end user
is the user they really, really want to focus on with their career.
And they are also some of our very best partners
because they tell us what they're trying to achieve
and we try and help them figure out how to do it.
They get early access to tech that's not ready yet.
We have to find people who want to use SwiftUI so we can deploy it at scale so we can verify all these issues.
If you're getting buddy-buddy with us, you're 100% going to be the first person in my head of like, oh, well, search is a great partner for us on the iOS platform of like, yeah, we'll get feedback from them and see what Alex thinks of this.
Meanwhile, ads is a really great litmus test because they have to monetize every feature, whether it's legacy or modern,
and therefore they are very sensitive to slow migrations.
So yeah, we go to different teams for different reasons and have, like, there's lots of ways to be part of the platform answer, even if you're not on a platform team or if you choose to specialize in it.
So, to close, let's just do some rapid questions where I'll shoot the question and then you tell me what pops up.
So what's the framework that you really like for its usability or elegance?
If we're choosing usability and elegance, I have to pick SwiftUI.
But I was thinking about this question.
I think my answer is actually GraphQL.js,
because that's the last framework that I interacted with that taught me a lot.
Would recommend; you should read it.
I think for me, it's Compose.
You know, I love how maintainable and testable it is.
It's just so much less boilerplate than anything I've used before.
Just really solid, you know, unidirectional design, events work their way up,
state works its way down.
And, you know, all these principles are appealing for far beyond just the UI layer.
You know, I know when you have a hammer, everything starts to look like a nail.
But, you know, if I'm being honest, a lot of things have been looking like very appealing nails lately.
And, you know, at Reddit, we do use it for both our UI and our presentation layer.
And it's been really great for us.
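"Events work their way up, state works its way down" describes unidirectional data flow, and the idea can be sketched without any UI framework. In this hypothetical Kotlin example (all names are invented for illustration and are not Reddit's code), the UI would only render an immutable `FeedState` and report `FeedEvent`s, while one pure `reduce` function owns every state transition. In a Compose app the state would typically be exposed through an observable holder and rendered by composables; this sketch leaves that out and uses a plain callback instead.

```kotlin
// Hypothetical example of unidirectional data flow (illustrative names only).

// State flows down: the UI receives an immutable snapshot and only renders it.
data class FeedState(
    val posts: List<String> = emptyList(),
    val upvoted: Set<String> = emptySet(),
    val loading: Boolean = false,
)

// Events flow up: the UI reports what happened and never mutates state itself.
sealed interface FeedEvent {
    object Refresh : FeedEvent
    data class Loaded(val posts: List<String>) : FeedEvent
    data class Upvote(val postId: String) : FeedEvent
}

// Pure reducer: (current state, event) -> next state.
fun reduce(state: FeedState, event: FeedEvent): FeedState = when (event) {
    FeedEvent.Refresh -> state.copy(loading = true)
    is FeedEvent.Loaded -> state.copy(posts = event.posts, loading = false)
    is FeedEvent.Upvote -> state.copy(upvoted = state.upvoted + event.postId)
}

// Minimal store: holds the current state, applies events, and pushes
// each new state to a render callback (a stand-in for the UI layer).
class FeedStore(private val render: (FeedState) -> Unit) {
    var state = FeedState()
        private set

    fun dispatch(event: FeedEvent) {
        state = reduce(state, event)
        render(state)
    }
}
```

Because `reduce` is a pure function, each transition can be unit-tested by passing in a state and an event and checking the result, without any UI in the loop.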
I do also like Compose the more I use it.
I was out of the actual IC world for a while,
but I have side projects where I've been really enjoying it again.
I picked a framework that is not a technology framework.
I am often thinking about the Westrum typology of organizational culture.
It's in the Accelerate book, and I think about it a lot.
It basically says that all cultures will eventually, like,
settle into a pathological power-oriented culture,
a bureaucratic rule-oriented culture,
or a generative performance-oriented culture.
And when I first read about it,
I thought the whole answer was to keep moving people away from power cultures
through bureaucratic into generative,
and that the best place for high-performing teams was in generative.
But now I have kind of come around to,
you have to choose the right tool for the job,
and there's a reason why, for example, a QA team or a deployment
situation might actually be more rule-oriented and have a lot more process around it than, say, how certain other teams could work.
So how do you actually work together when you have different cultures within teams?
How do you get anything done when you have different incentives?
So that's something where I think of every people problem as a system design problem.
And so my answer is: there are three types, and what is the interop between them?
And then, to close, what's a book that you would recommend and why?
I thought about this forever and I have to choose the K&R C book.
Like, I feel like it melted my brain a little bit when I really, really understood that, like, object-oriented programming is, like, just C structures with function pointers inside of them.
Like, go through that book, type out every code
example they have in it, and I promise at the end of it, you will be a confused programmer,
but you will be better. Can you repeat the book again?
Sorry, the K&R C book, like the classic Kernighan and Ritchie C book?
Did I get those names right?
Kernighan and Ritchie. You did. You did. I have it in eight different forms.
Is it like 35, 40 years old?
And the book's probably older than me.
Okay.
Still works.
It was a Bible for a reason. It is a good book.
So good.
So I read a lot.
I'm a really big fan of taking a first principle and trying to get a whole bunch of people to try it at scale in order to, like, change one small thing.
So I decided my recommendation right now is Kent Beck's Tidy First?
We read it as a team last year and had a whole multi-day conversation about how we could incorporate it.
Yeah, the little cat book. It's also extremely skinny, so you're not investing too much, but it has a lot of great ideas in it.
It's skinny, but lots of good ideas.
So, yeah, like, I would suggest, like, pick up the book and then just you can open it up to any part and just pick one thing and try it for a few weeks and see if it changes your flow.
But that particular one, I think, has launched a thousand little ideas with our team.
I think for me, this is where I start to lose
my credibility as an engineer. Project Hail Mary is my fiction book recommendation. Fantastic.
Couldn't put it down. Awesome. Well, thanks very much for this interesting conversation on what
it actually takes to build an app that is a lot more complex on the inside than a lot of people
who download and use it assume. This was great. Tons of fun. Thank you. Yeah. This was lots of fun.
I hope you enjoyed this behind-the-scenes look into how Reddit redid the core part of their native iOS
and Android apps over the course of several years.
Thanks very much to Lauren, Brandon and Eric,
and you can find all of them in the show notes linked below.
For more details on mobile engineering challenges,
you can find longer deep dives in The Pragmatic Engineer, as linked in the show notes.
I also wrote a book titled Building Mobile Apps at Scale
that contains more pointers on this topic.
If you've enjoyed this podcast, please do subscribe on your favorite podcast platform
and on YouTube.
This helps more people discover the podcast,
and a special thank you if you leave a rating.
Thanks, and see you in the next one.
