Programming Throwdown - 185: Workflow Orchestrators
Episode Date: November 4, 2025
Intro topic: Asymmetric Returns
News/Links:
- NanoChat by Andrej Karpathy: https://github.com/karpathy/nanochat
- Pydantic AI: https://www.marktechpost.com/2025/03/25/pydanticai-advancing-generative-ai-...agent-development-through-intelligent-framework-design/
- 1000th Starlink this year: https://spaceflightnow.com/2025/05/16/live-coverage-spacex-plans-morning-launch-of-starlink-satellites-from-california/
- ChatGPT Apps SDK: https://openai.com/index/introducing-apps-in-chatgpt/
Book of the Show:
- Patrick: The Will of the Many by James Islington, https://amzn.to/43IfU8Q
- Jason: Interview with DHH (creator of Ruby on Rails), https://www.youtube.com/watch?v=vagyIcmIGOQ
Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show:
- Patrick: Factorio, https://www.factorio.com/
- Jason: nip.io
Topic: Workflow Orchestrators
- Why: batch jobs (embarrassingly parallel), long-running tasks (e.g. transcoding video), checkpointing/resuming
- How: message queues, containerization, worker pools & autoscaling, history & backfill
- Steps to run workflows: containerize the workflow definition and send it to the cloud; containerize all the individual tasks; submit job(s)
- Examples: Airflow (legacy but dominant), Dagster (great UX for Python developers), Temporal (https://temporal.io/, the new hotness), Ray (low-level but very powerful), Kubeflow (designed for ML workflows, integrated dashboard)
★ Support this podcast on Patreon ★
Transcript
Programming Throwdown, Episode 185: Workflow Orchestrators.
Take it away, Patrick.
Welcome to another guaranteed to be fantastic episode.
That's right, yeah, it's money back.
I nailed it.
We've been doing this too long.
Okay.
I'm going to try something a little different.
I was thinking about this.
I don't have a good formulation of it.
So you can sort of help me.
All right.
But it's sort of been coming up in a number of things.
And the sort of most obvious one is when people talk about venture capital, which is, you know, investors, mostly tech investors.
And they invest in, you know, 100 companies.
But really, they're expecting sort of ten to do really well.
they're hoping the rest don't, you know,
they just assume they basically go to zero,
they kind of don't care,
and then they hope, you know,
one or two,
1,000 X or whatever
and pay for the risk
they take over the entire pool.
And, you know,
part of that is this sort of like asymmetric returns.
And then I began thinking,
it pops up in sort of other things.
You know, people were talking,
you know,
most recently I was listening to someone talking about,
about health recommendations,
like how much protein should you consume?
And people have this, you know,
oh, maybe you need this much, maybe that much, maybe. And then they sort of said, you know, up to a certain point, it kind of, you know, is very, very important. And then past another point, you know, you kind of get this like nonlinear relationship.
And then I began to think about it.
And they kind of said something that led me to thinking about it in terms of, in terms of work,
which is if you set a certain target goal for how much you learn, you know, in a given day
or continual learning or, you know, how much output you want at work, I think.
internally we have this, you know, sort of everything gets kind of normally distributed that
maybe some days you're going to be a little bit above, some days a little bit below, some days,
you know, very rarely a lot above, you know, very rarely a lot below. And in my head, at least,
you know, I think we do this thing that pops up intuitively over and over again where things
kind of clump together and become sort of normally distributed. And we think that the returns there end up being, you know, kind of not super impactful, like they even out, basically, that the unders and the overs, you know, are okay. But what I began to realize, and sorry, trying to bring
it full circle, it's a little cloudy in my mind, but back to the VC thing, this health thing,
that they were sort of saying, you kind of need to set your targets a little higher, because
actually, depending on where you are on this sort of like return curve, right, that is not flat.
it's not, you know, just one single slope that if you are at one of these inflection points
or lower, then the good days can be sort of 10x and the bad days are, you know, negative
some amount. And if you're really low, what ends up happening is those negative amounts are
sort of larger than the big amounts and you can sort of lose ground or, you know, not move
forward versus if you're higher up on the curve and you set your target, then even
the low days, you know, you're still sort of moving forward. And so when I talk about like learning or
output at work, if you, you know, sort of set your target as like my median is I get a little bit of
good stuff done. And some days I get a lot. Some days I don't. Some days I make more work for myself by
doing something stupid or saying something dumb at work, right? You kind of think about this as like
evening out to like slow, small steps. But in reality, setting yourself up for at least some of those
home runs. So setting yourself up positionally, but also just where your target is, like in how you expose yourself... it sort of drifts away, sorry, my analogy is breaking, but you sort of shift yourself into a position where, you know, you still have days that are below average, but your above-average days give you this exposure, this opportunity for just outsized growth, right? You're taking those really, really hard home runs at work. You're, you know, going for really stretch opportunities. And thinking about this, you know, it's just an interesting thing I've been juggling with, which is, I think a lot of us, myself included, a lot of the time I don't give myself this opportunity to have this
sort of outperformance, this exposure, this VC 1000X, right? You know, we talk about investment.
You do the boring. Everyone always says, just put all your money in the S&P 500 or the Russell 5000, you know, just put all your money in there, this very average sort of return. But maybe some
amount of it, maybe some small amount, you got to take those like really big swings and you've got
to limit it. But if it, you know, 1,000x, it could carry the whole portfolio. Yeah. So I can
I can riff on this. I mean, so I don't know if I mentioned this in a past episode, but there's an awesome book from Nassim Nicholas Taleb, The Black Swan. Have you read The Black Swan? Yes. Yep. Yeah. So, yeah, I'm a big proponent that there are black swans and there are white swans. In other words, like, there are... Wait, is white swan the opposite of Black Swan?
Well, white swans, okay, first of all, white swan is a term I literally just made up five seconds ago. But the white swan,
I would define it as like, you know, it's an unexpectedly high return.
Okay, okay.
Because by definition, using your VC example, you know, the VC funds, I've heard things like
they do 100 companies and they expect one company to make up for the other 99, so it's even
more extreme.
But so what that means is if put yourself in the chair of a VC, when you talk to one of these
hundred companies, you know, you're kind of expecting them to fail because, you know, statistically, they're going to be one of the 99, right?
And at some point, you know, you figure out which one the one is, I'm sure. But I would consider
that kind of like, like you're betting on an opposite of a black swan. Like, you're betting on an
unexpected, like huge boom, right? And so, yeah, I think that, you know, otherwise you wouldn't,
you wouldn't really take the risk, right?
If there wasn't the opposite of a, let's call it the white swan,
the white swan wasn't there, the VC wouldn't do what they do, right?
If one company, you know, 5xed and the other 99 went to zero, that wouldn't work.
So they're kind of counting on that.
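To put rough numbers on that, here's a tiny back-of-the-envelope calculation using the 100-company example from above; the dollar amounts are just illustrative:

```python
# Back-of-the-envelope: why one big winner has to carry the fund.
# Illustrative numbers only: $1 into each of 100 companies.
invested = 100  # $1 per company

# One company returns 1000x, the other 99 go to zero: roughly a 10x fund.
white_swan_portfolio = 1 * 1000 + 99 * 0

# One company returns only 5x, the other 99 go to zero: a 95% loss.
modest_portfolio = 1 * 5 + 99 * 0

print(white_swan_portfolio / invested)  # 10.0
print(modest_portfolio / invested)      # 0.05
```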
And I think that is kind of a good way to look at things.
I think that there's sort of certain moments where you have kind of eureka moments,
or you have kind of a burst or you just get lucky.
And so kind of budgeting for those kind of moments,
I think makes a lot of sense.
And being kind of ready to really capitalize.
That's the other thing is the VC hopefully is set up to where,
you know, when they do find the one,
that they can then, you know, 10x triple down on that one.
So same kind of thing.
Like you're constantly kind of exploring.
You see something pretty exciting.
and then you kind of bet big on it.
I do think that most returns are asymmetric.
Oh, the other part I was going to say is,
your example with investing,
I think that at a large enough scale,
things start to become normally distributed, right?
So in other words,
like if Broadcom has like a killer year,
well, it doesn't really matter
if you've invested in... I don't know if they're in the S&P 500, but if you invested in all of the S&P 500,
then their killer year gets sort of diluted, right? So like at scale, you don't really have
black swans or white swans, at least you hope, which is the reason why we can all be
sort of comfortable. We're not just anxiety ridden as, as we get older, right? So I think that
that at scale, when I think about investing, as someone who's not a very,
savvy investor at all, you know, I just think, well, their returns are going to be normally
distributed around 5% or 8% or whatever it is. But then individually, or when I look at things
at a smaller scale, I do think the returns are often asymmetric. Yeah, I think I might have mixed up
sort of two things I might not. So the one your Rift-on is good. So I think there's like taking it
to career growth, not that we always have to talk about that, but taking, I actually believe I
was bringing this up the other day. Most places, we overestimate the impact of a failure. Like, if your boss came and asked you, hey, do you want to try this really hard thing? You're like, I'm not sure. Like, what's really the downside of failing? I think most of the time, in most cases, not all cases, you know, this is not universal advice, I'm not a lawyer, I guess, but in most cases, the risk of failing is pretty low. So it's actually really capped on the downside, you know. And,
And it's, I guess, maybe similar to the VC example.
The chance of the VC having money taken away from them is non-existent.
So the worst case is capped at zero, but the top case is uncapped.
So if they do really bad, if 100 out of 100 fail, they lose all their money that they invested.
But it doesn't mean they invested all their money.
Correct.
But if one goes to Facebook scale or whatever, you know, then all of a sudden,
you know, they're filthy rich.
And so I think that's probably a good mapping.
I think there's another thing I was trying to slide in there,
but maybe now thinking about it,
I muddy the water.
I think it's separate,
which is just like individual performance day to day.
So like taking opportunities and getting that exposure,
but then day to day,
trying to set yourself up so that like,
like you're getting the reps in,
you're getting practice in,
like you're getting yourself like training and skill
and like setting your expectations for yourself maybe higher,
knowing that sometimes you're not going to do as well,
and then realizing when you practice being at a higher level
that you're going to get similarly,
it's not exactly the same though, I guess,
but you're going to get those outperformance days.
And those outperformance days may shock you with just,
like I've had that before.
You call it kind of like Eureka moment.
You're like doing some work.
And being up at that higher level,
you're going to potentially see more of them
where it's just like, what was I thinking that?
It was amazing.
Like, how do I get that in a bottle?
Like, I would love to be like that every day.
And it doesn't happen often, but I do feel like it happens more often when I'm operating at a high, not burning out, but when I'm expecting more out of myself and performing at a high level.
Yeah, I think that makes a lot of sense. I think that similar to the VC, you know, having that money to invest in the next round, like you also need to balance your work and life so that when you do have like that eureka moment, you can kind of really triple down on that.
And then go back to a more healthy state.
So I think, I don't know if we have a term for this in my family, but,
but there is kind of like this notion, like, oh, you know, dad's going into some workhole,
you know, he's going to come back in like a couple of days or something.
I call it the fugue, but yeah.
The fugue?
Yeah, isn't that like an organ thing where like the organ music gets really loud and like, you know.
Oh, look at you.
I've never, never, that's an SAT word I've never heard.
I have to look it up.
Maybe, yeah, a fugue state.
Here we go.
In music.
Yeah, so a fugue state is a dissociative... it's a psychological condition. So that's unexpected.
Oh, wait, this is bad.
No, this is like you have amnesia.
What?
I completely, okay, maybe I'm thinking, I guess I'm thinking of it.
Oh, Patrick's going dementia again, girls.
Oh, no.
Oh, no.
Hang on.
I don't know.
Maybe I'm just using this wrong.
Oh, well.
It's a word.
Like you said, maybe you just need to define a new word.
It's like a flow state.
That's why I think the like, you know, anyways.
Okay, but yeah, I just made up a new word for it apparently because this doesn't match any existing definition.
So if you use this, don't expect anyone to know what it is.
You know, just, okay, kind of, maybe this is too much of a segue, but you made me think of another thing.
I used to coach, you know, high-level ICs at Facebook.
So basically, I mean, obviously, like there's, I wasn't the highest level.
So there's people above me and everything.
But specifically, people who are trying to go either from five to six or six to seven,
I did a bunch of one-on-one coaching, and then I ran this thing called the IC Circle,
where we would just do more like one-to-many coaching, but it was more of like a community type thing.
And one thing really struck me.
This person said they didn't want, this was one-on-one.
They said they didn't want to go to level six.
because if you look at the stats,
you're something like 15 times more likely to get fired
if you're level six than if you're level five
for performance.
And they're like, well, you know,
I kind of want to take this promotion.
They also were, I think they were on a visa.
I mean, it's definitely part of it.
But they just felt like, well, my expected value is zero if I get fired.
And so keeping that probability low,
is, uh, actually... yeah. But you're looking... yeah, I guess it's not zero, you get a job somewhere else. But my point is, like, your expected value at Meta is zero if they fire you, right? And so the raise, the level six compensation, is higher, but not higher enough to justify that, right? So anyway, what I told this person at the time was basically, you know, okay, I totally get it. And I think there was kind of an issue with people at six and above just getting fired for performance. I said, but, you know, plan for the future, don't really plan for the present. So, like, okay, the present situation is you're looking around and you're seeing the level sixes on your team getting sent home. But you can't kind of base the future on the present. And...
I think that, yeah, you have to kind of, like the asymmetric returns kind of kicks in where, you know, by taking that chance and getting, getting that promotion now, you know, you can always kind of figure out your steady state afterwards. What you can't do, or it's very hard to control, is having that amazing, you know, product launch that causes you to get a promotion. Like, in other words,
If you have an asymmetric return and you don't capitalize on it, it might be very, very hard to bounce back from that.
But if you do capitalize on it, and then let's say, worst case, you know, you get fired for performance at the next level, I still feel like that was the right call.
And just to finish the story out, the person didn't get fired.
They're actually still at Facebook happily at their level.
So, I mean, this got talked about, not by me, but by people in the field, financial people, when Facebook was doing the outsized bonuses for hiring AI people.
I think they've turned it down now,
but they were saying a little bit of a similar thing,
there's an expectation of a continuing current state.
So in other words,
I don't have to take this right now
because I could take it in 12 months or 24 months.
And in reality, actually it turned out,
it was like a window of like 30 days.
Yeah.
It was like really short.
But it's sort of to what you were saying,
those opportunities you think they're,
they're going to come at some even spacing or always and it's that's not right it's not true your
ability to get a promotion your ability to be on that project your ability to start something new greenfield
make your mark whatever those things aren't evenly distributed in time and so yeah i yeah i agree with
you all right so just to follow up i i did find the correct definition of you i wasn't that crazy
although it apparently have a slightly negative connotation but it means to do something where you're
completely aware and focus but afterwards you can't remember
what happened. And so that sounds perfect. That's actually exactly it. So actually, it took me a little bit of searching. Apparently it's a very deep, like, back-catalog definition. It's like, what idiot wrote this code? And it was you a week ago. Yes. No, no. That's when, like, you get really
focused and you turn out all this stuff. My new one is writing code and then like having other people sort of
help me like finish it off or whatever and then submit it under their names. And then I think they wrote the code.
and they're like, no, you wrote this code.
Oops.
All right.
All right.
Time for news.
You're up first.
All right.
I'm up first.
So it won't be this weekend when this comes out, but that's okay.
Just recently, Andrej Karpathy, I believe is how you say his name.
You know, he's gained some renown recently for a little bit being the people's AI person, I guess.
Like, you know, he's not running a big AI company.
He clearly knows what he's doing, he can explain concepts, but he explains them in depth.
Let me, let me add a little bit of color there. So Andrej was the head of AI at Tesla, and he left,
and he basically is like, I don't want to say retire. That's not the right word because he's very
productive, but he quit to do exactly this, like to just build things for the people and teach
the people. And so he's, he's in a way like doing some monastic work right now.
Well, do we talk about SAT words? Monastic. Oh, there we go. Look at this.
Man, you think people thought they were coming here to talk about programming.
So, you know, in some ways, actually, I think, to your... thank you for that, actually, that color comment is very good. You know, I think our podcast tries to be, although I won't say we're as effective as this, something in a similar vein, trying to break down stuff for folks. He released a new
GitHub repo, sort of out of the blue, as far as I know, I don't think he was, I didn't know he
was working on it, called NanoChat. And NanoChat is, previously he had done some other
sort of examples of training, I won't say like toy, but like toy transformer networks and models, and getting them up to sort of GPT-2 level. Like,
like, wow, okay, it's just better than if you, if you remember, oh, I don't know, a while ago,
a decade ago, people were doing hidden Markov models trained on, you know, Harry Potter. And it would
produce words that, like, if you didn't look too closely, they kind of look like maybe sentences,
but they didn't make any sense. And then, you know, we kind of got into the ChatGPTs. These GPTs
are good. So he's sort of done some of that before. But now what he's done is an end-to-end thing: pre-training, doing the chat UI, you know, constructing the models and everything end-to-end in a single repo. You can really kind of study it. And really awesomely, if you don't know, training the frontier models,
I don't even know what the estimates are now
like $10 million to train
to train one.
They're very, very, very expensive.
So most training outside of the
maybe 10 companies in the world
is doing fine-tuning,
you know, other small things,
but not the sort of pre-training.
That's reserved for people who have much more money
to burn than myself.
Yeah, and 10 million, I think,
just is one instance.
So it's like if you miss a colon,
I mean, you're going to catch it earlier than that.
But like in theory,
If you have some data issue, now it's another 10 million.
Yeah, yeah, yeah.
You know, who knows how many times they have to do it.
Yeah, that's right.
And that was a big thing.
If you recall, DeepSeek got really popular for, you know... they had their moment.
They've kind of, you know, taken a little step back, you know, in terms of in the, you know, general awareness.
But that was their big thing was doing some of this pre-training cheaper.
Okay.
Anyways, back to the story.
So Andrej Karpathy's repo also made it so that you could do all of this for $100.
Now, $100 isn't nothing, but I'll make a couple observations. So, first one is, again, compared to that other amount: people spend more than that to go play a round of golf, you know, go do something. You know, it is a lot of money, I don't want to diminish it, but it is enough, like, if you're doing it for learning, if you're doing it for understanding, if you're, you know, really getting into this. It's attainable, it's a hobbyist-level amount that you could spend, which is awesome, on sort of relatively reasonable-spec hardware. And he has the scripts for doing all of it sort of set up, which is really awesome. But then, of course, the second observation I'll make is twofold.
One, the cost for his stuff will go down. If you wait, you know, six months, 12 months,
I would expect it to be cheaper as new hardware and instances and supply come online.
But the second thing is as a general sort of observation, a lot of this stuff, this is where
you're seeing the growth. You know, we're not seeing ChatGPT, whatever, five, or any of these other things become 2x better, 3x better. It's not really
happening, but the cost to train, the cost to run inference is all coming down dramatically.
And here, you know, I think is a little bit of an example of, you know, sort of the rise in the
hobbyist stuff and the decline of the cost, you know, you're coming to, you know, not exactly
a meeting point, but coming and converging together to where we can start to, not get to state
of the art, but push to something that five years ago would have been like,
Oh my gosh, like you did all of this run.
That would be crazy.
So it's a reminder of the insane progress we're seeing.
Yeah, totally.
Yeah.
I think it's a really good kind of recap.
Yeah, I mean, this stuff's super exciting.
I think a lot of the innovation now is,
and I think Andrej actually talks about this,
is around, you know, reverse engineering the best possible prompt.
I don't know if I brought this up, but I don't think we have.
I had a situation at my current job about a month or two ago.
We changed models.
So a model that we were using was deprecated.
And so we moved on to the next version of the same model.
And the performance went way, way down, like really, really far down.
Like, went from maybe 80% to 7% kind of thing.
And after a lot of debugging and kind of, you know, looking at a lot of numbers,
we realized that all we had to do was change the... so we had a prompt that was about three or four sentences in a paragraph. If we swapped the first two sentences, the performance went back to 80 percent. So basically, the ordering of the sentences, the way the thoughts are ordered... and it's not like the first way was illogical, right? It was two independent thoughts, and if you switch the order, massive improvement. And so what that says to me is that we humans shouldn't be writing these prompts.
I mean, you know, we write the first version,
but like at the bare minimum,
you know, we should have a system
that just like reorders them a few different ways
that picks the best ordering.
Like somehow breaks them into independent thoughts
and then tries different permutations.
But at the limit, it would be, you know,
some type of feedback mechanism
that's constantly updating all of these prompts.
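As a minimal sketch of that reordering idea, here's what a brute-force version might look like; the sentence splitting and the score_prompt evaluation function are hypothetical placeholders you would supply, not any particular library:

```python
# Hypothetical sketch: try every ordering of a prompt's sentences and keep the
# one that scores best on your own eval set. Only practical for a handful of
# sentences, since the number of permutations grows factorially.
from itertools import permutations

def best_prompt_order(sentences: list[str], score_prompt) -> tuple[str, float]:
    best_text, best_score = "", float("-inf")
    for ordering in permutations(sentences):
        candidate = " ".join(ordering)
        score = score_prompt(candidate)  # e.g., accuracy on a held-out eval set
        if score > best_score:
            best_text, best_score = candidate, score
    return best_text, best_score

# Usage (hypothetical): split your 3-4 sentence prompt and pass in a function
# that runs your benchmark with that candidate prompt and returns a score.
```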
But that, I think, is where I'm seeing a lot of the innovation, and I think it's because of cost, exactly as you said. When it costs 10 million to train a model, what we ultimately need, and I think NanoChat is a good way of getting there, is, you know, a set of benchmarks and a set of models that are small and trained very cheap. And then you can learn some insight, someone can invent the next thing, right? Like maybe it's SSMs, maybe it's diffusion chat models. Someone invents the next thing, and it blows everything else out of its weight class, but it's still nowhere near ChatGPT. But it totally dominates the very cheap ChatGPT, and so then you could say, okay, this is now worth spending 10 million on.
I mean, I guess that's partly where I think the running local models come in, you know, on your whatever home smart device, your personal assistant on your phone.
We were just checking before briefly.
There's a cloud outage this morning.
So that happens from time to time.
And so I think having some of these models run locally.
And I think when the hardware gets sophisticated enough and the models get distilled enough to be, like, absolutely, actually useful in those things, I bet you would see a lot more investment into that as, like, a focused product that you can have. Like, what is state of the art for a capped-size model, not unbounded?
Right.
Yeah, exactly.
All right.
My news article is Pydantic AI.
To be totally honest, I picked a random news article.
I really just wanted to talk about Pydantic AI.
I could have made it a tool, but I had a different tool.
So this is really cool.
I used this the other day.
I built a very simple test, like a prototype.
So I downloaded, remember PC Part Picker?
Yes.
Remember that website?
Yeah, I don't know if they're still around,
but someone posted basically CSV files
of all the inventory of PC Part Picker.
So there's a CSV file for heat sinks,
a CSV file for processors, et cetera, et cetera.
And I use Pydantic AI, which is this way to build
AI agents, which we should probably
do an entire show on,
but it's taking off, and
I think it's very interesting.
And I only give it access to one tool,
and the tool is
you know, DuckDB?
We've probably talked about this.
So the tool literally is
you can,
you pass in which CSV file you're interested in, video cards, CPU, etc., and a DuckDB SQL query. And this AI tool opens that CSV, runs the query, and returns back the results. And so that was it. And so this whole
AI agent thing with Pydantic AI, it was like 80 lines of code or something. And now I have this
thing where you can ask it to build you computers. Like you can literally type in, hey, I want a
computer with an NVIDIA GPU, at least 16 gigs of VRAM, for under $2,000. And it will actually
build a computer for you out of all those parts and it runs SQL queries and you can actually
see all the steps. First, it runs like a DESCRIBE SELECT on each of the CSV files so it can get all of their schemas. And then it starts running queries. Sometimes it'll type in a query that doesn't compile. DuckDB will send an error back, and Pydantic AI will actually catch the errors and turn them into strings and send those strings back to the AI. So the
AI, like, got an error and was like, oh, I need to change the query. And it did. And it eventually
gets there. And it generates, like, builds PCs that are good. So, but, but I just found this
amazing. And I'm sure there are other tools out there, but I just, I couldn't believe how in
just a matter of hours, I was able to go from nothing to all that. That is really cool. I mean, it's
really, like, commoditizing. Yeah, it comes down to the fact that, yeah, we can do that, and it's super cool.
Yeah.
So if you're doing agentic stuff,
this is the same company that made Pydantic,
which is a,
which is kind of like a data class system that's very popular and really well done.
Actually,
the same people made SQLModel and FastAPI and like a ton of amazing libraries.
So this is also, you know,
walking in good footsteps.
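For a rough idea of what a prototype like Jason's might look like, here is a minimal sketch assuming pydantic-ai's Agent and tool-decorator API plus DuckDB's read_csv_auto; the model string, file names, and result attribute are assumptions, not details from the episode:

```python
# Sketch of a one-tool agent that answers PC-part questions by running DuckDB
# SQL over local CSV files (file names and model choice are hypothetical).
import duckdb
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt=(
        "You help build PCs. Use the run_query tool to inspect the CSV files "
        "(e.g. cpu.csv, gpu.csv) and answer with a parts list and total price."
    ),
)

@agent.tool_plain
def run_query(csv_file: str, sql: str) -> str:
    """Run a DuckDB SQL query against one CSV file; refer to it as the table `data`."""
    try:
        con = duckdb.connect()  # in-memory database
        con.execute(f"CREATE VIEW data AS SELECT * FROM read_csv_auto('{csv_file}')")
        return str(con.execute(sql).fetchall())
    except Exception as exc:
        # Errors go back to the model as strings so it can fix its query and retry.
        return f"ERROR: {exc}"

result = agent.run_sync(
    "Build me a PC with an NVIDIA GPU, at least 16 GB of VRAM, for under $2,000."
)
print(result.output)  # .output in recent pydantic-ai versions (older ones used .data)
```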
My next news article is something that I actually botched, and then it turned out to be even more amazing than I thought it was.
Okay, here we go.
So I saw this news article, and it was 1,000 Starlinks after a launch, again, this weekend from SpaceX.
So all those Starlink satellites up.
And there's been actually a huge thing like the amounts of money they're making.
It's a private company still owned by SpaceX.
But the number of subscribers, they have like as far as like a tech store.
It's actually really astonishing.
I would love to have invested, would invest.
It's private.
So it's a bit of a thorny issue to try to get access.
But this is a great idea.
I'm really for it.
But we saw this.
I was talking about it that, you know, there's a thousand, thousand satellites.
And then I was like, wait a minute.
Hang on.
Let me look at this again.
And it's, you know, oh, a thousand Starlink satellites have been launched in 2025 so far.
So, but what I thought, what I already thought was amazing, is like, oh, there's a
thousand of these satellites, you know, orbiting around.
How cool is this?
It's like, no, that's incorrect. There have been a thousand launched just this year.
So then I, wait a minute, how many are there?
And it's like 8,400 and something in orbit.
There are so many in orbit now that actually every single day, one to two Starlink satellites
incinerate back into the atmosphere from sort of end-of-lifing, which is expected.
So they put them into, you know, relatively low Earth orbit.
So in those orbits, you know, there's a little bit of air friction and whatever.
They naturally decay.
They're meant to do that.
You don't want them staying up forever and, you know, getting obsolete and then just
drifting around a space junk.
So I think they actually do a process for, you know, intentionally sort of making it so they'll deorbit deliberately rather than just, you know, chaotically.
But basically there have been so many going up that one to two every single day are just basically burning up, and it's still this, like, insanely lucrative business doing all this crazy stuff. And there are 8,000-something up there. I found a visualizer you can click and actually see all of them, because, you know, the sort of orbit information is known, and it turns out orbital mechanics is, like, well understood, so you can actually just go and visualize the entire, like, mesh of all of the orbits, which is very well planned to get nice coverage.
You can see new ones that are being deployed
are like clustered together and slowly drift
to their, I guess, station.
I'm not familiar with the term.
But it's absolutely bonkers just to think about the scale of this.
And I don't think maybe I've talked about on the podcast
before, but I was talking about it with some local friend group,
you know, just sort of how if you had said this, you know,
whatever, 10 years ago, like, oh, yeah, there's like this darling thing.
It's roughly competitive with broadband.
It works great.
You can play video games on it.
Like it's not super laggy.
It's in this little dish, just a flat dish thing, and people would be like, what? No. And then now people are like, yeah, of course, that's called Starlink. Like, yeah, I have that in my cabin.
Yeah, it's, it's from...
Do they have coverage in the ocean, or, like, how does that work? Is it the kind of thing where it's only in places that can be populated, or is it literally circumnavigating?
They have a slightly different fee structure. I'm not clear if it's because they can or because there's a reason. But no, people have it on, like, their ships, on their traveling RVs, on their boats. Yeah.
And now, this weekend or whatever, they're demoing, and I guess maybe some people still don't know this, but there was, like, a couple weekends ago or whatever, they were talking about one of the airlines, I want to say United or American here in the United States, was going to be putting Starlink dishes on all of the planes. They've used a different satellite provider, so if you've ever gotten satellite on an airplane, it's, you know, it's not very good. Yeah.
But now they were like flying this reporter up and sort of they were able to take a FaceTime call and it felt like normal.
And they were like from an airplane.
And they were just like their mind was sort of like blown that, you know, flying through Spain.
I think Elon Musk even ended up retweeting about it.
And he's like, well, if you're traveling at whatever the orbital speed of a Starlink satellite is, I won't quote it because I'll get it wrong. He's like, even airplanes look basically still. And so, you know, you think flying in an airplane, like, oh my gosh, it's moving so fast.
Aren't there going to be all of this?
He's like, no, they basically look like they're standing still because we're flying so fast.
Wow, that's incredible.
And so it's just this interesting sort of normalization of what is just absolutely.
It's rocket science, right?
Like just all of this stuff, you know, being up there and going around.
So when the Starlink satellites are, you know, shuttled, is that shuttle reusable?
Or like, how does that work?
Yeah.
So they do the Falcon 9.
So it comes back and lands normally on the barge.
So they go up, launch it, and the second stage, the top part of the rocket, that goes up, and they don't relaunch that.
That's the new thing, the really big one that they make a big deal out of in Texas, the Starship and the booster that goes with that.
The full thing is supposed to be reusable.
So that's like where they see the future.
But for now, the first stage, which has most of the, you know, more of the rocket engines and everything.
basically goes up and then comes back and lands,
and those are the videos you see.
And so that's why they're able to put,
well, part of why they're able to put
so many satellites up in orbit.
But they still,
there is some portion of the launch vehicle
that doesn't get reused.
Got it. Makes sense.
Cool. All right.
My second news story is the introduction of apps in ChatGPT and the ChatGPT Apps SDK.
I feel like this is really powerful.
I feel like there's something here.
So just to explain at least how I think of it, because it's still very early, basically, and I don't have a lot of details on it yet, I don't think they even kind of have a lot of details yet themselves.
But the way I imagine this working is, let's say you ask chat GPT to play your favorite song, or you ask it to buy eggs from...
I guess if you say from
let's say buy eggs from Walmart
right
so right now if you do that
you know chat GPT is going to say hey I can't do
any of these things would you like me to do something
else it's always so polite it's like would you like me
to just talk to you about eggs
but you know in the future with
these apps
you will
install the Walmart app
and when you ask a question
you know, the OpenAI folks
will figure out from your question
whether they should route that question
to an app or not.
And if they decide they should,
it gets routed and now Walmart,
you know, which probably is using ChatGPT
anyways, but like now Walmart gets your question
and they're poised to
handle it differently. So for example,
if Walmart gets your question, they could order the eggs.
They have your identity
from you being logged into ChatGPT.
And so they can order the eggs and eggs show up at your house the next day, which is pretty cool.
I do feel like this is kind of like a new kind of app store moment, which is pretty exciting.
I do feel like we'll kind of look back on this and say, oh, like, I wish I'd built like the,
you know, whatever the most obvious app is, maybe groceries is it, I don't know.
But I feel like now is the time to think about, like, what is the app that obviously everybody wants?
Because apps aren't even allowed on the store yet.
I think they're going to announce a date when you can start submitting apps.
But you have access to the SDK.
So if you have something where you think chat would be a better experience than going on the web or whatever else you would do,
now is the time to build that thing, so that as soon as they announce the submission process, you could be first in line.
I do think there's something
just extraordinarily powerful here.
When this was launched,
I saw it was like
a thousand startups are killed
by one slide or something.
It was like the headline.
But I guess, maybe, I like the way you explain it. I guess if you develop your app, it's sort of like now you can host your own, like, server if you do something local, and your chat agent can query your, you know, local server that's answering something. So it's like, sort of, you do some training or upload some workflow or something, and then OpenAI will route ChatGPT queries to that. Internally, if you host something there, like, if you have state in their ecosystem, can you run arbitrary code? So it's not completely clear.
Yeah, let me see if I can explain it. So the idea is, you know, you submit an app, OpenAI approves it, now you're on their app store, right?
So a person who uses chat GPT installs your app, right?
Now, when that person asks questions to OpenAI,
OpenAI can route some of those questions to you.
And you could then use OpenAI as part of helping answer the question,
but they're now expecting you to respond with an answer.
So in the case, like you said,
So you're a grocery store.
You do work to make sure that, like,
your OpenAI app knows how to call your inventory and pricing,
you know, REST API.
And so you give all of that information or whatever to your subcomponent app.
And then, okay, and then the user installs that one,
which is why it doesn't have to disambiguate across every app on the app store.
Okay.
Yeah, I think that's right.
And so you can also build tools like I talked about with Pydantic AI.
So in addition to being so open-ended that ChatGPT just calls you with a string of text,
you can also have a tool.
So for example, maybe you have a buy groceries tool and that tool takes like some very specific structured data.
So like the name of the grocery and what quantity and all of that.
I mean, you still have to do all the validation yourself because you never know what ChatGPT is going to call your tool with, but at least that might be more structured than, you know, you just have a string of text and have to figure out what to do with that.
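As a generic illustration of validating that structured tool input with Pydantic (this is not the actual Apps SDK API; the tool name and fields are made up):

```python
# Generic illustration of validating a structured tool call with Pydantic;
# the field names here are hypothetical, not the Apps SDK schema.
from pydantic import BaseModel, Field, ValidationError

class BuyGroceriesInput(BaseModel):
    item: str = Field(min_length=1)
    quantity: int = Field(gt=0, le=100)  # reject nonsense like 0 or 1,000,000 eggs

def handle_buy_groceries(raw_args: dict) -> str:
    try:
        args = BuyGroceriesInput.model_validate(raw_args)
    except ValidationError as exc:
        # The model may call the tool with anything, so report back what was wrong.
        return f"Invalid request: {exc}"
    return f"Ordering {args.quantity} x {args.item}"

print(handle_buy_groceries({"item": "eggs", "quantity": 2}))
print(handle_buy_groceries({"item": "eggs", "quantity": -1}))
```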
Very cool.
Yeah, definitely check this out.
I mean, I feel like folks out there, you know, especially if you're a hobbyist, this is like something to get in on early, I feel.
Yeah, I guess that's like the dozen fart apps on the app store that made,
you know, tons of money in the first year of the Apple App Store.
Yeah, yeah, exactly.
You need the equivalent, the chart GPT, chat, I try to make it chart, fart, GPT, I don't know,
fart GPD, okay, oh, no, my, this is going to evolve, let's not do it.
Chart GPT.
Oh, stop, stop.
Oh, boy.
All right, let's just move to book at the show.
We'll have to get rid of the friendly tag or whatever, the safe-for-work tag.
I don't know if we have a friendly... no, anyways, my book of the show, just hard pivoting.
Segue, forcing a segue, is, I'm repeating, The Will of the Many. I had this book,
I believe a couple episodes ago. I had only just started it. I have finished it. It was really good.
I really like it. So I felt like I owed the people the like follow up that if I was going to pitch
it as a, you know, hey, I'm starting this book. It seems kind of interesting. I did finish it.
It is good, you know. And the second one comes out, probably by the time this episode's out,
it'll sort of be out. So if you haven't started this book, and it sounded interesting from my last pitch, I guess I could re-pitch it, but anyways. So, you know, it definitely has like an equivalent of kind of a magic system, a loose magic system, where people can sort of cede their, something called like a will, like sort of part of their energy, their life energy, I guess, it hasn't been, you know, fully explained yet, and they can sort of give it
to somebody else, and those are assembled into the government in a kind of form of a hierarchy,
a pyramid, and it's all about sort of the dynamics, the politics of that. And so, of course,
you know, a story about a boy who's an orphan and ends up getting, you know, sucked up into
intrigue and mystery and all of that good stuff. But by the author James Islington, who I've recommended
other books from before.
Definitely something that I would recommend now having finished it.
And I'm looking forward to the next one.
I guess I could have made in my book of the show.
But the next one seems like it's going to be really good as well.
And I'm looking forward to reading that.
All right.
Yeah, I'll definitely have to check that out.
I need to get more into reading fiction.
My book of the show is actually a podcast episode.
I'm cannibalizing our show here.
But basically, I had a long drive.
So, long story short, it was just me and the boys, me and my two sons,
and we were driving to Houston and back.
And, you know, they're playing, you know, Nintendo Switch or whatever they want to do.
So I'm like, I'm going to listen to a podcast.
And I listened to the whole six hours of this podcast with DHH, who's the person who made Ruby on Rails.
I thought it was fascinating, actually.
I kind of went in thinking, well, you know,
this is somebody I have kind of heard about.
A friend of mine told me we should have DHH on our show.
That would be kind of cool.
DHH, if you happen to be listening, which I doubt,
or somebody who knows him or something, totally down for that.
But, you know, I'd heard of him,
and I remember specifically him posting a lot about getting off of the cloud, and he ended up moving to, like, a colo.
So, you know, I thought it would be kind of
interesting, but I had
honestly a little bit low expectations
of just any podcast episode
where
it's just kind of a one-sided
where the person just basically
talks for six hours.
But I was like blown away,
actually. I felt like the content was
really interesting.
Actually, the whole thing really goes against
a lot of the
sort of like common themes
that you're seeing a lot of. So for
example
DHH like despises
going into any office
and so he's like remote work
is the only work
it's a really interesting
perspective
you know he talks about how
to make remote work work well
he's been doing this for like decades right
So he gives a lot of perspective on that.
The company, which I think is called 37signals,
they took $0 of funding.
So they basically bootstrapped themselves by doing contract work.
So they did contract work until they had kind of enough money.
And I guess maybe they structured the contract work in such a way where they could share the IP.
And then eventually open-sourced the IP, and it became, really, Ruby on Rails and all of that.
So $0
of venture, $0 of
anyone investing.
And he said something kind of
interesting. He said, you know, people
gave me a hard time
because they say, well,
you know, you,
so their product, which
I forgot the name, oh, base camp.
So their product is base camp, which is similar
to Jira. And
people say, look, you could have made Jira.
You know, your product is
is actually like in many ways better than Jira, and Atlassian is like a giant company,
and you're this tiny company.
And his response is interesting.
It's basically like it's a tiny company
but you know,
I own half of it.
And like Jason,
who's his co-founder,
owns the other half.
And actually like owning 50% of a tiny company
is awesome.
And I have more than enough money and all of that.
And it was like a real kind of like,
I guess like it is also like
I keep wanting to think of the word monastic,
but it is kind of like totally antithetical
to everything you see on social media.
Everything on social media is,
look at this big round we raise.
No one ever says,
look at this $30 million we raise.
We only had to give up, you know,
70% of the company.
Like no one ever says that part of it, right?
And everyone seems to be just, like, extolling all the virtues
of being in person.
And yeah,
I feel like just to have a different perspective, it was pretty mind-blowing.
So I highly recommend folks listen to it.
Also, my dad was a race car driver, so he spends about an hour or two talking about race car
driving, which hits close to home, which I loved hearing about.
That's more of a personal thing.
But I found the whole interview fascinating, and I'd highly recommend folks check it out.
It's six hours, so I think it counts as a book.
Fair enough.
Yeah, I think 37 signals is private.
I've looked before, I was just looking again to see,
but I guess one of those interesting things about private versus public,
and to your point, not only do they have a lot more discretion, like they just don't have to share a lot of information
and get judged about decisions.
And so because it doesn't have investors or all of this,
like even people, you know, here are some hacker news posts,
and they're just saying like, can we even estimate how many subscribers they have,
how much money?
And people are like throwing out numbers,
but they're all, you know, just speculation.
or, you know, based on some sort of swag at it. Like, they're not accurate. And it's just kind of interesting to think about, like, the amount of stress reduction that would give to someone, to say, we're just going to make the decision: can we continue to pay the people who work for us, can we continue to make money that we're happy with? And there's always that question, like, what's enough, you know? And if you don't have an answer to what's enough, then maybe you just end up in one
but if you can say, this is enough and he is saying, you know, I can do these really cool things,
drive race cars and have this business that doesn't make me hate my life. Then, you know,
that is awesome. It's great to see, like you said, sort of another way, instead of there being only one way. Yeah. I mean, somebody said, uh, the host of the podcast said,
You know, hey, one thing about working remote is, you know, it's kind of lonely.
How do you deal with that? How do you meet people? And the guy was like, DHH is like, get married, have a family.
He's like, yeah, I have kids running around the house. I'm never lonely.
You know, it's like, these are the kind of things that like, just like, just a different perspective.
Like, I've never heard anyone give any answer like that. And, you know, it's so different. I'm still honest.
I mean, we just got back from Houston, so I haven't had a lot of time to really, like, sleep on this.
But it's such a different perspective that I have to really think about it.
But I think there's definitely nuggets of wisdom there, absolutely.
And I do feel like, you know, I've kind of ping ponged between, you know, I was in the office.
COVID hit.
I worked remote for, what, four years.
Now I've been in the office of the past year.
And I do actually feel like, I feel like remote is better in many ways, but it has to be done correctly and you have to be in the right situation.
So I think, I think remote where there is an office and like 90% of the people are in it, I think that's hard.
Of course, you can make it work, but I think socially, like, it takes a lot of, you have to be very active on managing that.
I think remote where everyone's remote or in the office, you know, it's, it's, um, you're on
Zoom all the time anyways, because there's always, this is, okay, this is, this is my perspective.
Everyone's remote. Everyone's remote. Because you go into the office, but there's the Seattle
office or there's Bob who's remote. Like as soon as, as soon as you have more than one office or a single
person in the company who's remote, now you're remote, because you're on Zoom all day anyways. I guess my thought here is similar to what I was saying before. I don't think
there's one right way. I think to your point, it's situational. It's company by company. It's person by
person. Some people hate like, I couldn't be at home. It's noisy. Other people are like,
I can't stand the office. It's noisy. People dropping in on you. Like it's, you know, for almost any
statement, someone will say the same statement in positive and negative light.
And so like anything, my, how do you want to say?
My thought is you're going to have to try really hard to prove that there's a singular right answer and no nuance.
Anytime someone comes and says remote work is the worst, it doesn't make sense, it never works.
Or if someone says, the office is dumb, it never works, we should all be remote.
Like whenever you make a statement with no wiggle room, those are always the hardest to defend, right?
And it's like, come on, there's got to be a gradient here.
And, you know, if there's a gradient, I think that, you know, I will say, even with people fully in the office, having the setup where we could go fully remote means people have some flexibility. If they need to take a week, maybe they've taken a sick week or something away, or they needed to, you know, as I was pointing out, be home.
There's so much random stuff where people expect you to be at your place of living for delivery or for some improvement or to come in or whatever.
and then they change the date.
So having some flexibility to handle those situations
has been a net benefit,
regardless of sort of what percentage allocation
you spend office or at home.
Yeah, totally makes sense.
All right.
Tool of the show.
Oh, man, I think I'm losing my voice.
It's not good.
All right.
Mine is a game, and it's a game I am almost certain
I didn't look because I didn't want to know,
but I'm almost certain that I've recommended before.
Everyone knows it.
Everyone has it, but I don't care, because I got sucked back in.
I was free, and then I am no longer free.
I've been playing Factorio again.
And not even, although it looks amazing, not even the Space Age DLC.
I'm just back to playing vanilla, base, no-mod Factorio.
I've been playing it on my Switch, and it has been, it's always like a little awkward.
It's so much better on the PC with the keyboard.
I will admit, but once you get into the groove and you're like on an airplane, which I was recently a couple of times, you know, can sit on the couch with the family, you know, when I'm watching some show, you just really don't want to watch for the fifth time in a row, which whatever. Anyways, like, if you can quote the show and like, you know what's going to happen from the kids show, you know you've seen it too many times, which means they've seen it even more than you because you know you haven't watched it every time they've watched it.
Um, anyways, I'm sucked back into Factorio. If you've not played it, you either are going to assume it's going to be work, or amazing, or you won't understand it. I guess those are the three outcomes. You'll be like, this is just like work, it sucks, I don't want to work. Or you're going to go, I don't get it, this is, like, weird, and just, like, this is not fun. Or you're going to be like, what, you're going to spend 10 hours and then you have no clue what happened.
You know, I posted about a month ago, I had a friend, a close friend who's trying to grow his Twitter presence and just actively trying to do this.
And he posted saying, you know, I went from 1,300 followers to 7,000 followers in a month.
Here's a list of things I did.
And it was a list of things.
And I replied, and I was like, how I went from 1,300 followers to 1,300 followers in three years.
and it was a screenshot of Factorio.
But Factorio is amazing.
I mean, it's so satisfying.
It's like got to be one of the most satisfying feelings,
like building a factory.
I have a confession.
I have like a probably,
I don't want to look how many hours.
I don't actually,
maybe the switch does track.
I don't know how to look.
It's not prominent like on Steam.
So I'm not sure how many hours I've played.
I've never launched a rocket.
What?
Really?
Is that bad?
It's bad, right?
I need to do it.
It's become like a goal.
Like I, before I stop this time, I have to do it.
I just always do other stuff.
I just, I don't know.
Wow, that's amazing, though.
Actually, it's amazing that, like, you have built all these intrinsic rewards, you know?
It's like, oh, can I, like... well, give me an example of something that took you a long time that wasn't the rocket.
No, it's not.
No, no, it's worse than that.
I play for a while, I start to get to, like... and it reaches, like, there's always someplace, and it's always a little different, where you need to, like, do a bunch without, like, a good progression. So early game you can, like, progress, and I'm getting better, I think, actually, each time. And then I go away for a while, and I'm like, I don't have time to invest to, like, double my output here, build another one of these, or, you know, get a blueprint working or get the bots going. So I just, like, put it away for a while, or I get busy. And then I come back, and I don't remember where anything is, so I just start a new game, and then I just, like, start over and do it with all the improvements. So it's just like, what do you call it, like greasing the groove, like just keep playing the early game over and over again.
I know, that makes sense.
That makes sense.
It's bad.
I'm a bad person.
Like, I need to finish the game.
There's so much developer put in I haven't even experienced.
My kids play with a creative mode where you're not even a person.
You could just build anything anywhere and it's unlimited.
And you kind of like teleport around the map and they just build random things with it.
I turn the biters to peaceful. Like, the biters are there, but they won't attack you unless you attack them, because, yeah, I just, I don't know, it's too anxiety-inducing, man. Well, you know, I mean, you probably know this, but
the construction robots just totally change the profile of the game. Like, I have mine, like, near the end game, mine will be set up where I'll have basically stamps that, like, stamp an entire screen's worth of content. And so it's like four roboports, a bunch of these, like, electric towers, and then a bunch of solar panels, and it just stamps down. And then, because the roboports are part of the stamp, you know, it builds the roboports, and then it can build the next stamp. And so I'll just walk across the screen stamping, like, hundreds of these, and then I'm, like, done with power. It's like, okay, well, I don't have to think about power for the rest of the game.
Nice.
All right, my tool... oh, go ahead.
My tool of the show is nip.io, which is apparently a great way to get in trouble at work, as...
I still have no clue what this is.
So this is, like... I could see why workplaces wouldn't like this, but I'll explain what the tool is.
So the idea is, there are many places where, you know, you can't use just an IP address, you have to use a host name. And so you end up in your code writing all these things like, oh, if I'm local then use this IP address, otherwise use this host name. Or you end up in weird situations where, like, if you're running on Kubernetes, you know, then the host you want to talk to is just called database, and you can, like, ping database, you know, curl database, from the other nodes in that Kubernetes cluster.
But then when you're not running in Kubernetes, now it's like, oh, no, it's actually a local host.
It gets all confusing, right?
And so this nip.io is like a cool way to turn any IP address into a DNS.
So basically you take the IP address, whatever it is, you know, 1.1.1.1.
And then you add dot nip.io at the end.
and now you have a DNS,
but it's going to resolve to exactly that IP address.
So, you know, you're kind of trusting,
I will say that the reason it's probably blocked at Patrick's job is that you're kind of trusting these people to convert that DNS name to the correct IP address.
So, you know, you might put in like 2.2.2.2.2.9.0.9.0.0.0.0.2.2.2.2.2.2.2.2.2.
because the people behind that website have built like a DNS lookup that does that.
But they could change that code tomorrow.
They could make it so everything ending in .nip.io goes to Google, or goes to a web-based Bitcoin miner, or whatever.
So that is sort of a vulnerability that you're taking on.
It is, you know, powered by a company that seems pretty reputable, but that is a risk. But when you're doing debugging stuff, especially locally on your laptop, or you're getting started with a new project, it can be an easy kind of convenience.
Yeah, that's my tool of the show.
But I could see,
yeah, like you said,
there are specific use cases
where it would be really useful.
Yeah, totally.
There's a lot of code,
I mean, especially third-party code
that expects a domain name
that just will not take an IP address.
And this is a way to get around
it. Um, all right, so... oh, go ahead. No, no, no, I just... yeah, anyways.
Workflow orchestrators. We're gonna orchestrate the show here with a workflow orchestrator.
I'm distracted today, dude. We're just, like... it's like it's up over here. Um, so, so, okay.
Let's talk about why you need a workflow orchestrator.
So here's a very simple example.
And actually kind of spoiler alert,
I actually asked this as my interview question.
So if you ever end up getting interviewed by me,
you're going to probably get this question.
Let's say you're building YouTube.
And specifically, like, don't worry about the social networking or any of that, but just, you know, you upload a video, it transcodes your video a hundred different ways for mobile and, you know, low bandwidth and all of that.
And then when other people go to that URL, they get your video.
So, um, video is maybe not the best example of this because it takes a while to upload the video.
But, you know, you, you upload the video and, um, you know, then it has to go through and do a whole
bunch of work, right? That transcoding of that video to other formats, maybe you uploaded it as a QuickTime video, which you don't want to serve on the web, right? So that is a time-intensive process. And so it's not practical to just have someone wait on a web request, right? Because if that person just hits F5 or something and closes the connection, you know, how is that going to work, right? There are kind of two problems here. Problem number one is sometimes you have to do things
that take a long time. And then problem number two is sometimes you have to do things that are
kind of non-interactive where someone will leave and you need this thing to keep working and eventually
report back in some way. And the last category of why you'd need this would be, you know,
you have a lot of things, a huge batch of things that you need to do. So for example,
actually, this is a very real example for us.
I went through and re-transcribed all of the episodes of Programming Throwdown.
So if you look at the transcriptions, they're better than they were before.
The old system I was using was massively out of date,
and I felt like, well, since I upgraded it, I might as well just re-transcribe everything.
And so, you know, we have 180-some-odd episodes.
184 episodes.
And so that actually is expensive, but even if it wasn't, it's, you know, a batch job where I just have to submit a hundred of these and have it tell me when they're all done, right?
So that's the basic reason why you'll want to use a workflow orchestrator.
Under the hood they all work more or less the same way. So there's, you know, a set of message queues, and I think we've talked about pub/sub and these things in the past, but basically you need a way to submit jobs, and the jobs need a way to read your submission, and so there needs to be some kind of queue to handle that. If the job is multi-step, then when the first step is done, a new thing needs to be pushed onto this message queue saying, okay, we're ready for the second step, right? A job can fail, you know, like AWS could shut down that machine, the hard drive could fail on that machine, et cetera, et cetera. And so there's a lot of logic around this in the message queues: basically you send a keep-alive to the queue while you're still alive, and if the queue doesn't get a keep-alive for some amount of time that you specify, let's just say 10 minutes, it assumes that you're dead and it will rerun the task, like it will ask another machine to go and process it.
Okay, so message queues, a big part of it.
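Here's a minimal, hypothetical sketch of that keep-alive loop; the queue client and its pull, extend_lease, and ack methods are made-up stand-ins rather than any real library's API, though SQS and Pub/Sub expose the same ideas as visibility timeouts and ack deadlines:

```python
import threading

LEASE_SECONDS = 600  # the "10 minutes" from above

def process_with_heartbeat(queue, handler):
    """Pull one job, heartbeat while working, ack on success."""
    job = queue.pull(lease_seconds=LEASE_SECONDS)
    if job is None:
        return

    done = threading.Event()

    def heartbeat():
        # Keep extending the lease so the queue knows this worker is still alive.
        while not done.wait(timeout=LEASE_SECONDS / 3):
            queue.extend_lease(job.id, lease_seconds=LEASE_SECONDS)

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        handler(job.payload)   # the actual long-running work
        queue.ack(job.id)      # success: remove the job from the queue
    finally:
        done.set()             # stop heartbeating; if we crashed before ack,
                               # the lease expires and another worker retries
```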
And containerization, another really big part of it,
because you need a way to define some task on your computer,
but then ultimately send that task out to the cloud,
and there might be hundreds, maybe thousands,
tens of thousands of machines kind of running your task.
So it's not practical to, like, sudo apt-get install some package all those times.
So there needs to be some type of containerization.
And then also the workflow, so the thing that, you know, tells, you know,
the overall system what tasks need to be run and which ones depend on which other ones,
that also needs to be put in the cloud somewhere and containerized.
And so that's the second ingredient.
And the third and final ingredient is some type of worker pool,
some type of, you know, a pool of machines that are ready to do work.
And you typically pair this with some type of auto scaling.
So you have, especially if you have bursty work.
So imagine, well, YouTube is such a multinational thing that this wouldn't happen.
But imagine if you were building a version of YouTube and you're just getting started
where it may be way more popular in the U.S.,
you know, you'd want to scale those nodes down
when it's the wee hours in our time zones
and then scale them up when it's peak hours.
Yeah, I think that becomes important too
for, you know, something going viral.
So if your website goes viral,
you don't... you want to capture the signups. You want to capture whatever, right? So, in that case, the things people are signing up for or doing are kicking off these workflows.
I wanted to circle back to, you know,
talking about failures, but I think there's a couple of things maybe that are less obvious or
maybe important about these. So one is you were mentioning, you know, something like YouTube size,
but when you're kind of small, things generally fail because you did something wrong or there's
a bug or a problem. But when you reach a certain size, even if, you know, if you look at what I guess
is called mean time between failure, like you said, a hard drive or cosmic ray or whatever it may be,
But when you're running a million, you know, jobs a day,
what is the chance that at least one job is going to, you know, have an issue in a day?
It's non-trivial, right?
It's noticeable.
And so you're getting these failures because someone is walking into the server room
and pulling out a server rack and like blowing the dust out of it for maintenance or whatever, right?
And you can either have that be very carefully planned or just say, look, it's going to get handled.
And I think kind of in that same vein is, you see stuff like, I think Netflix was famous for doing Chaos Monkey, something that goes in and just basically kills off nodes or drops nodes to make sure, you know, nothing breaks. It just randomly sows chaos. I think the other thing...
That was my job when I was there. That is my job in my family. I'm just Chaos Monkey. Oh, Dad's here.
Okay, wow, I've been distracted today.
But the other thing is, you know, we were talking about, you know,
and I think we'll get to it in a minute,
but something like training an LLM and it's, you know, $10 million.
Some of these workflows could be very, very, very expensive.
And like you said, imagine there's some error.
But imagine the error occurs at like a very late stage.
You don't want to start all the way again from the beginning.
So being able to kind of, it's not exactly a hot swap, but being able to resume the pipeline with a new version of, like, a later node. Or even, maybe it's not an error, maybe the whole thing finishes, but then you realize, oh, here in this last stage, this last bit, we could do better or we could do something different. So having a system that is automatically preserving those stages and inputs and, you know, that kind of stuff, so you can rerun later parts, becomes super useful. And tying that all together with the dashboard and management. Which is, I have seen people not use one of these, and then you end up just basically reinventing them. So you can either not use them, or you can reinvent one yourself with half the features, and you just slowly try to rebuild a crappier version of an existing solution.
Yeah, totally, totally agree.
Yeah, I think the part you mentioned about history and backfill is super important. Like, there have been so many times where exactly what you said happens, you know, the job runs successfully, but it wasn't actually a success. Like, we actually went the past three days with no art, because, you know, maybe the upload failed, but it didn't let the job know that something was wrong. So, for example, a very common pattern is where you catch an error in Python or in any language. You catch the error and then log an error message. And that's great and all, but because you caught the error and you didn't bubble it all the way up, now your job thinks it finished successfully, like if you didn't set the exit code correctly on your program. And so, you know, that can happen, and then you find out about it. Even if you find out about it an hour later, if your website's popular enough, that could be millions of jobs that have run in that hour that need to be rerun. So having all of that ledger and then being able to backfill and say, okay, all the jobs from three days ago to now, I need to rerun all of them, and it just goes through and does that and autoscales to handle it, that's huge.
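To make the swallowed-exception point concrete, here's a minimal sketch; the transcode function and its failure are made up:

```python
import logging
import sys

def transcode_video(path: str) -> None:
    # Placeholder for the real work; imagine this shells out to ffmpeg or similar.
    raise RuntimeError("codec not supported")

def main() -> int:
    try:
        transcode_video("input.mov")
        return 0
    except Exception:
        logging.exception("transcode failed")
        # Anti-pattern: returning 0 here would make the orchestrator think the
        # task succeeded. Instead, surface the failure via the exit code so the
        # orchestrator can retry or alert.
        return 1

if __name__ == "__main__":
    sys.exit(main())
```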
And there's sort of this... you know how cars have, what's it, basically a backup system? So imagine your HUD just disappears, like the OS powering the heads-up display on your car fails. It's got some kind of backup where you'll at least see the miles per hour or something. That contingency, that's what I was thinking of: there's a contingent operating system in most of these cars. But now that contingent operating system really needs to not fail, right? Because now you're really hosed. And so similarly here, you know, the workflow orchestrator itself, if it starts failing, that can be a really difficult thing to survive, right?
And so to your point, you know, I've seen it as well where people try to build their own.
And it happens so innocently, you know, the way it happens is, you know, you have some task.
And this happens at big companies, you know, so it's not a small company thing specifically.
Like, you have some task and it's relatively minor.
And so you have this script that does it.
And, you know, that starts to succeed.
So it's like, well, you know, I'm just going to make a cron job and have it run this Python script every hour.
And then it's like, oh, okay, now I need to run it every minute.
Oh, now I need like three machines.
So I'll just spin up three EC2 instances, all running this Python script every minute.
And you kind of like, at some point, you have to just eat the technical debt and like set up a workflow orchestrator.
But I've seen too many cases where people go too long.
And then when there is an issue, it becomes very difficult.
I think the other term you'll hear with a lot of these is DAG, which isn't unique to these,
but that all of these jobs form a directed acyclic graph, like, a no-cycle graph.
So they're not looping.
So they have a start.
They process some number of jobs with some interdependencies, and then they reach a conclusion.
And so you'll hear them talked about that way.
Maybe help me there, Jason.
So, and not to throw a wrinkle in here, but workflow orchestration versus something that I would call more like a pipeline, like Spark or something. What's the sort of distinction between them, or where's the line, or is it blurry? Like, what's the difference
between something that would be viewed as more
like pipeline versus workflow
orchestration? Yeah, that's a great
question. I don't have an easy answer,
but basically I would say
okay, one thing a
workflow orchestrator cannot do
is it cannot fuse
nodes together.
So you kind of explicitly define, like, do this task, do that task, do that task,
and they all depend on each other or in a sequence or something.
And if tasks like B and C really could be fused into one operation, it won't do that
because it makes no assumption about what programming language or what version of
Ubuntu is running on that node or anything like that.
They're all just basically Docker containers that are getting spun up.
You can actually,
depending on the orchestrator you use,
some orchestrators have it where
you can spin up a container for a particular node,
and that container can do multiple tasks.
Let's say I have 10 videos.
I have to transcode all of them to QuickTime. I have a QuickTime node,
and that node could do all 10 of them.
Right.
But by default, most of these orchestrators,
you actually spin up a Docker container,
do that one thing, that one time,
and then tear it back down.
And so there's a pretty heavy spin-up, spin-down time
for tasks.
Conversely, with Spark, like, you kind of create, you do create sort of a DAG. And same with early versions of PyTorch, or if you use PyTorch's compile, or, you know, early versions of TensorFlow, you would create a DAG of operations. And then what the Spark compiler or the PyTorch compiler or TensorFlow will do is it will fuse operations, and it will create this super-optimized DAG. And it does that because all of the operations are within its own ecosystem. So, like, Spark has a limited vocabulary, as does PyTorch.
And so this being so open-ended,
it can just run any Docker container
with any arguments you want.
That open-endedness comes at a cost.
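For contrast, here's a minimal PySpark sketch of the kind of pipeline whose operators the engine can plan and pipeline together because it understands all of them; the paths are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fusion-demo").getOrCreate()

# Transformations are lazy: Spark builds a plan (a DAG of operators) and can
# pipeline the filter and the aggregation into stages it optimizes itself,
# something a generic orchestrator can't do with opaque Docker containers.
events = spark.read.json("s3://my-bucket/events/")  # hypothetical path
watched = events.filter(F.col("watch_seconds") > 0)
per_user = watched.groupBy("user_id").agg(
    F.sum("watch_seconds").alias("total_watch_seconds")
)

# Nothing executes until an action like write() is called.
per_user.write.mode("overwrite").parquet("s3://my-bucket/per-user/")
```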
All right, yeah, that makes sense.
Cool.
So, yeah, quick rundown on the steps to use a workflow orchestrator.
Typically, you containerize your tasks.
That's usually, you know, pretty straightforward.
Most people are using Docker anyways.
But you also have to containerize the workflow itself.
So the file that says, you know, run this task,
and then, uh, run the second task afterwards, like that, that logic has to go somewhere. Um, and so
all of these examples that we'll talk about in the future, uh, they all do it differently. Um,
and then after that, you're kind of off to the races. You submit jobs. Um, one thing that is, um, that varies a lot
from orchestrator to orchestrator is how to pass data among the tasks.
So the early versions of these,
the only way you could pass information
was through arguments.
So job one finishes,
and then it's allowed to inject
some command line arguments into job two.
But that's it.
And because it's a command line argument,
you can't stick like a whole JSON blob in there,
a large JSON blob or anything like that.
You couldn't put an image in there or something.
And so what you have to do for these early ones is use something else to store everything. So, you know, the task transcodes the video, and then it doesn't pass the transcoded video on to the next task, which maybe, like, sets a flag in a database saying it's ready. What it does is it puts the transcoded video into some object store, like S3 or something else, and then all it passes to the next task is maybe a unique ID of the video.
Later versions allow you to pass arbitrary data, but under the hood they're doing the same thing. So when you pass, like, a pandas DataFrame or an image or something from one task to the other, under the hood it's actually storing it in S3 in some temporary bucket, which has, like, a low retention policy, and then pulling it back out for the next task.
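Here's a hedged sketch of that object-store handoff pattern using boto3; the bucket, keys, and task functions are illustrative, and in a real orchestrator only the returned video ID would travel between tasks:

```python
import boto3

BUCKET = "my-transcode-bucket"  # hypothetical bucket
s3 = boto3.client("s3")

def transcode_task(video_id: str) -> str:
    """Task 1: transcode (stubbed), upload the artifact, return only the ID."""
    local_output = f"/tmp/{video_id}.mp4"
    # ... run the real transcode here, writing local_output ...
    s3.upload_file(local_output, BUCKET, f"transcoded/{video_id}.mp4")
    return video_id  # a small value the orchestrator can pass along

def publish_task(video_id: str) -> None:
    """Task 2: fetch the artifact by ID and do the next step."""
    s3.download_file(BUCKET, f"transcoded/{video_id}.mp4", f"/tmp/{video_id}.mp4")
    # ... e.g. set a "ready" flag in a database ...
```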
So, but yeah, that's all there is to it.
So it's really not that hard.
I would say this is in the category of things, kind of like functional programming languages, where it's kind of a unique paradigm.
If you've never used a workflow orchestrator, it can look very daunting.
But once you've learned one, now you kind of can do pretty much all of them pretty easily.
I haven't used one, but looking forward, a lot of these names are familiar.
So, uh, wait, really?
You haven't, uh, well, what about like a proprietary one or something?
I'm sure you, oh, yeah, uh, I guess that's true, but I feel like it's a bit different.
Like if someone else already has like all of this setup, I, I've sort of been a part of one of them, I guess.
So other people set up this stuff and I, you know, define some portion of some code that needs to run, and someone is sucking it up into a container and running it.
Yeah, that way.
But I've never sat down and had to write in the language.
of describing the workflow as an example.
That makes sense.
Yeah, at Meta, we had something called Datastorm,
which is very similar to the ones we'll talk about next,
but I never worked on Datastorm.
I was mostly a customer,
but they all are extremely similar.
And I think in a lot of these cases, like Datastorm's an example, there was something called Flume, which was similar to Spark, kind of a combination of Spark and a workflow orchestrator, and then someone made an open-source version of that. But yeah, I think most people will encounter workflow orchestrators; it might just be a homegrown one instead of one of these ones.
Yeah, that's true. But the big market-share ones are Airflow. Airflow
is an Apache project.
I don't know the history of
Airflow, but it's
kind of the 900 pound gorilla
in the room. So
if you're on AWS, there's something called MWAA, which is Managed Workflows for Apache Airflow. And basically, you don't even have to set up Airflow yourself on any computer or anything like that; they just do everything for you.
It's kind of like RDS for databases.
So with RDS, you don't have to, like, install Postgres on a machine yourself or anything
like that.
They have the same thing for Airflow.
Airflow is legacy.
It doesn't have a lot of the features of the other ones we'll talk about.
But you can feel confident that Airflow can do almost anything.
Like, if you can't get it done in Airflow,
what that means is of the thousands and thousands of developers,
nobody's tried to do what you're trying to do,
which is maybe a sign that you're doing something wrong.
It's kind of in that category where too big to fail, I guess.
But it is legacy. I think it originally came out of Airbnb before it became an Apache project. And because Airflow is so open-ended, you know, it's not hooked into any one language that well, which I would say is my biggest knock against Airflow.
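As a rough illustration of what an Airflow DAG file can look like; the task names and commands are placeholders, and the exact parameters vary a bit across Airflow versions:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="transcode_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule=None,   # only run when triggered manually or via the API
    catchup=False,
) as dag:
    transcode = BashOperator(
        task_id="transcode",
        bash_command="echo 'transcode the uploaded video here'",
    )
    notify = BashOperator(
        task_id="notify",
        bash_command="echo 'mark the video as ready'",
    )

    # ">>" declares the dependency edge: transcode must finish before notify runs.
    transcode >> notify
```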
Some other examples,
I used Dagster a lot.
Dagster is quite nice.
I think it's Python only,
or if it's not,
there's maybe a REST API
that would be pretty ugly to use,
but it's really meant for Python.
It's nice, it gives you all these decorators,
so you can decorate functions
as Dagster nodes.
And so when a function calls another, if it calls a Dagster node as a function, it'll return back a future, and Dagster kind of handles all of that for you. So it kind of pauses the operation of your function, calls that dependent task, and then comes back and gives you your result back, which is pretty clever, pretty cool.
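And here's a minimal sketch of what those Dagster decorators look like; the op names and logic are made up for illustration:

```python
from dagster import job, op

@op
def fetch_video_id() -> str:
    # Stand-in for "find the next video that needs work."
    return "video-123"

@op
def transcode(video_id: str) -> str:
    # Stand-in for the real transcode step.
    return f"s3://bucket/transcoded/{video_id}.mp4"

@op
def publish(transcoded_path: str) -> None:
    print(f"publishing {transcoded_path}")

@job
def video_pipeline():
    # Calling ops inside a @job doesn't run them immediately; it wires up the
    # dependency graph, and Dagster executes the ops when the job runs.
    publish(transcode(fetch_video_id()))
```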
So, Temporal... they actually just raised, I think, a Series B. I'm going to double-check. Yeah... oh, $2.5 billion is their valuation. A Series C is what they raised.
And
it's pretty cool. So it's
kind of a different take.
You don't have the dashboard. You don't have
a lot of the things we talked about that are important.
But it just gives you the
bare thing that really matters,
which is being able
to run a function
and know you're going to get an answer
regardless of how long that function takes.
So if you're building your own workflow orchestrator, I think Temporal would be a good thing to build it on top of. And it might even be that Dagster uses Temporal.
It's very possible.
But it is kind of like lower level,
you know, really calling individual functions durably.
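To give a flavor of that durable-function idea, here's a hedged sketch using Temporal's Python SDK; the workflow and activity here are illustrative, not anything from the episode:

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def transcode_video(video_id: str) -> str:
    # The side-effecting work lives in an activity; Temporal retries it for you.
    return f"s3://bucket/transcoded/{video_id}.mp4"

@workflow.defn
class TranscodeWorkflow:
    @workflow.run
    async def run(self, video_id: str) -> str:
        # The workflow's progress is persisted, so even if workers crash or the
        # activity takes hours, this call eventually returns a result.
        return await workflow.execute_activity(
            transcode_video,
            video_id,
            start_to_close_timeout=timedelta(hours=1),
        )
```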
Similar to Temporal, there's Ray. So Ray comes from this company called Anyscale. Ray is also pretty low level, so you're calling individual functions, and it has to be kind of in Python, and so it's hooking in at a pretty low level, kind of like Temporal. The thing that's really powerful about Ray is just how unbelievably fast it is.
So with Ray, you spin up all the machines in advance.
So, you know, all these other workflow orchestrators,
you're spinning up a machine, doing some work,
and then spinning it back down.
With Ray, you spin up a whole cluster.
When that whole cluster is up and running,
you then execute a bunch of functions,
and those functions get farmed out to that cluster you spun up
for that particular task you're doing.
And so it's a fully connected mesh of machines, and then when you finish your task, that entire cluster gets spun down. So you're paying a bigger hit up front, but you're getting, like, super, super low latency. Like, once the whole cluster is up, you never have to worry about any node requiring another machine or anything like that.
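And here's a minimal sketch of Ray's remote-function style, just to show how low level it is; the work function is a placeholder:

```python
import ray

ray.init()  # connects to (or starts) a Ray cluster

@ray.remote
def transcode_chunk(chunk_id: int) -> str:
    # Placeholder for real work; each call is scheduled onto the warm cluster.
    return f"chunk-{chunk_id}-done"

# Fan out many tasks with almost no per-task startup cost, then gather results.
futures = [transcode_chunk.remote(i) for i in range(100)]
results = ray.get(futures)
print(len(results), "chunks processed")
```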
And the final example I have, which folks should know about, is Kubeflow. Kubeflow is similar in a sense to Airflow. The big difference is they've integrated a dashboard where you can upload different statistics. So, for example, Airflow has a dashboard where you can monitor the jobs and see, oh, I had five failed jobs yesterday. Kubeflow has that too, but Kubeflow allows each run to
have a dashboard for that specific run.
And so it's like, okay, task three,
you know, output a time series graph,
and here it is.
Task four created a scatter plot,
and there it is.
And so for workflows that are producing statistical outputs, that's like a nice feature to have.
It sounds like some of them are,
like you said,
sort of lower level.
Like, Ray sort of sounds like it's for shorter... not shorter, but you want lower response times, lower latency, you want to be able to do lots and lots and lots of things, you want to sort of shard them all up and have them distributed. And then other things, like you were saying, like transcoding a video or something, you pretty much want to do a bunch of data reading, a bunch of processing, a bunch of data writing, as far as, like, externally, so you're dealing with large blobbed objects and very long run times. And so it feels like there isn't a generic solution that just sort of meshes over all of it, which I guess kind of makes
sense. And as you were sort of saying, I think whether one sort of container can be multi-purpose
or whether you really are needing different containers for all of the different jobs,
probably plays a huge role in sort of like how easy or hard it is. Yeah, yeah, totally.
It's one of these things that like the lines are a little blurry because I'm sure, I'm sure
someone's going to email us saying, oh, you could actually do this in Airflow or, you know,
Ray could also do that, but just generally speaking, like, at the baseline level, you know,
Ray is not durable.
So if your Ray job, you know, you're responsible for doing checkpointing and things like that,
you know, if your Ray job dies halfway through, it's just gone.
Whatever checkpoint you have is whatever you created.
You know, these other ones, you know, they'll just by their nature, they're constantly creating
checkpoints.
And so, yeah, there's basically several things you're juggling:
you know, how durable do you want these tasks to be,
these parts of your workflow,
and then what's your latency requirement,
and then what's your sort of cost you're willing to spend?
Because Ray, you know, for every...
So just to give an example,
let's say you have some type of task,
and that task has a list of Python third-party libraries
that need to be run, need to be included.
So if you do Airflow, you know, when that task node spins up, it spins up your Docker container
where you've already put those libraries in it, right, and it's ready to go, and it executes your code,
and that's fine. With Ray, you create a Docker container with those dependencies,
similar to with Airflow, but you spin up an entire cluster, and every node on that cluster
has all those Python dependencies.
You know, so if you have especially, like, heterogeneous things, where maybe you have one task that requires torch, which is like a two-gigabyte PyPI package, right, and then you have another task that's just sending an email, you know, that only requires the base Ubuntu, then Airflow is going to save you a little bit there. So yeah, there's no free lunch; there's like several things to juggle. I guess, so, some advice: um, I think
you should only use Ray if latency is important.
So if you're doing ML training, latency is very important.
You have this GPU, it's very expensive, you don't want to waste time on it.
So that's why Ray is extremely popular for, uh, training these AI models.
Um, if you're doing ML stuff or things where you're producing a lot of, um, you know,
graphs and a lot of those artifacts, I think Kubeflow could be a good choice.
And for everything else, I would use Dagster if what you're doing is in Python.
Otherwise, I would use Airflow.
So that's sort of the workflow diagram of which workflow orchestrator to pick.
Yeah.
You've just got it all prescribed out for us?
I guess it's called a flow chart.
But yeah, okay.
Oh, that's true.
Cool.
Yeah, I guess, you know, some things to avoid, other than rolling your own orchestrator: I think you can go
the other direction and kind of make things
super complicated with
auto scaling and a lot of these
other things. I think
you can take baby steps
with these things. Just like anything, you can kind of start
getting into it and end up
diving super, super deep
and saying like, oh, I'm going to preload
the nodes with these packages
and at the end of the day, like
you know, this, you want this
to save you time. You don't want this stuff to cost you a lot of time.
Makes sense. Thanks for the overview. Yeah.
Yeah, the education.
Yeah, this is a good thing to learn, folks. I think this is super, super useful.
If you are looking to get started, imagine you have a website that has any type of job that needs to get done.
I'm trying to think of a good example.
Like, you have a website that just OCRs pictures. Someone drags a picture onto your website, and you send them back some text or something in an email.
And nowadays, OCR is so bloody fast.
Like, you can just respond instantly.
But let's pretend we're still at a point where that took a long time.
You know, just spin up Temporal.
You know, it's, I think it's relatively easy to stand up.
I think they have a managed service too.
It's very reasonably priced.
and say, look, like, I'm going to call this function to do the OCR,
and that function is going to do the OCR and send the email,
and I'm just going to wrap it in Temporal, and now it's durable.
And that could be just an easy way to get started.
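If you went that route, kicking off that durable OCR function from your web handler might look roughly like this; the server address, workflow name, and task queue are assumptions, and the workflow itself would be defined with Temporal's decorators as sketched earlier:

```python
import asyncio
from temporalio.client import Client

async def submit_ocr_job(image_path: str) -> None:
    # Connect to a Temporal server (self-hosted or their managed cloud).
    client = await Client.connect("localhost:7233")

    # Start the workflow and return immediately; a separate worker process
    # executes it, and Temporal keeps it durable across crashes and restarts.
    await client.start_workflow(
        "OcrWorkflow",            # illustrative workflow name registered by your worker
        image_path,
        id=f"ocr-{image_path}",
        task_queue="ocr-tasks",
    )

if __name__ == "__main__":
    asyncio.run(submit_ocr_job("uploads/photo.png"))
```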
Cool.
All right.
Yeah, we hope you folks like the episode,
and definitely keep sending us, you know, requests for new content,
new show ideas.
We take them super seriously.
We've implemented many of them.
So our list is getting smaller.
So if you have a show topic, please Slack us or Discord us.
I mean, email us, however you want to reach out to us.
Thank you, everyone.
Music by Eric Barndollar.
Programming Throwdown is distributed under a Creative Commons attribution, share-alike 2.0 license.
You're free to share, copy, distribute, transmit the work, to remix, adapt the work,
but you must provide attribution to Patrick and I, and sharealike in kind.
