Embedded - 143: I'm Thinking of Unicorns
Episode Date: March 17, 2016

Dan Luu (@danluu) spoke with us about processor features, startups vs large companies, error handling, and computer science research. Dan's blog is danluu.com.

Some posts we talked about:
CPU features since 1980
Working at Startups vs. Large Companies
Recurring Postmortem Lessons
Efficacy of Computer Science Research Areas

Dan mentioned some conference proceedings he monitors.

For computer architecture:
ACM/IEEE International Symposium on Computer Architecture (ISCA): http://isca2016.eecs.umich.edu/
IEEE Computer Society Technical Committee on Microprogramming & Microarchitecture (MICRO): http://www.microarch.org/
High Performance Computer Architecture (HPCA): http://www.hpcaconf.org/

For software engineering:
International Conference on Software Engineering (ICSE): http://www.icse-conferences.org/
Foundations of Software Engineering (FSE): http://www.cs.ucdavis.edu/fse2016/

He also mentioned Operating Systems Design and Implementation (OSDI): https://www.usenix.org/conferences/byname/179
Transcript
This is Embedded FM.
I'm Elecia White, here with Christopher White.
This week we'll be talking with Dan Luu
about CPU features and software practices,
how they overlap.
One quick note, don't forget to check out our blog
and sign up for the newsletter.
Now, Dan.
Hi, Dan. Thanks for joining us.
Hi.
Could you tell us about yourself as though
we had just met at a conference
or some such?
Sure. So my background is
mostly CPU design. I worked at this
chip startup, Centaur, for like
seven and a half, I guess almost eight years.
And the last couple of years, I was at Google
doing hardware accelerators on an application that I think
they don't want people to talk about.
And then since I've been at Microsoft, doing
the same kind of thing, a different application. Here, the
application is networking, and they're much less secretive.
And so the basic idea is
Moore's Law is slowing down,
but people still want things to get faster,
and so people are moving functions from
software into dedicated hardware.
Cool. Okay, we have a lot more questions about that.
But before we get started with that, we do this thing called lightning round where we ask you short questions and want short answers.
And then we pretend to want short answers.
We usually ask for explanation, even though it says in the notes that you're not supposed to.
So are you ready?
I'm as ready as I'm going to be, I think.
So sure.
Favorite fictional robot?
I don't know the name, but that robot from Ancillary Justice,
which isn't really a robot,
which is sort of like, maybe that's not the right answer.
He was a robot.
I mean, he wasn't organic.
Go ahead.
I'm still breaking the rules already.
It's going to be one of those.
Worst compiler you've ever used?
Oh.
Good one.
I don't know.
There was one I used in school, but I don't know what it was called.
It was one of these things for, I guess, some relatively small microcontroller-y thing.
And some custom thing, but sorry, I don't remember the name.
But I remember that left shift didn't work.
And instead of erroring out, it would just not do it, which is, like, sort of
annoying to debug when you assume the compiler is going to actually, you know, do what you say.
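A hypothetical C sketch of that failure mode, assuming the symptom was a silently dropped shift (the code here is illustrative, not from the original toolchain):

```c
#include <assert.h>
#include <stdio.h>

int main(void) {
    unsigned char x = 0x01;
    /* On the buggy compiler, no shift instruction was emitted,
       so y came back as 0x01 with no error or warning. */
    unsigned char y = (unsigned char)(x << 3);
    assert(y == 0x08);     /* holds on a correct compiler; fires if the shift is dropped */
    printf("0x%02X\n", y); /* expect 0x08 */
    return 0;
}
```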
Yes.
8-bit or 32-bit?
Uh, 32.
Okay, well, while we're on this line: big or little endian?
Um, little, I suppose.
Hardware or software?
Both.
Least favorite planet?
Hmm.
Earth is allowed.
Yeah, anything except for Earth, right?
I find Earth relatively pleasant.
Okay.
Favorite processor of all time?
The Alpha 21064.
Oh, cool.
All right.
Yeah.
Yeah, you're nodding along with, you know, that. I remember Alpha.
I'm just looking confused.
Two of the days.
Okay.
Intel or ARM?
Depends.
On what?
Very politic.
It depends what you want to do.
I mean, for now, right, like people want ARM in servers,
but the performance is usually not up there for actual server workloads.
But for a lot of things,
you pay a pretty large tax for buying an Intel chip.
I don't necessarily mean because of the ISA inherently,
I just mean that Intel has high margins,
and they don't want to cannibalize their margins.
So even the low-end chips are more expensive
than they'd often like them to be.
And also, it's easier to get ARM IP.
You can technically get Intel IP,
but you have to be a very large, very important customer.
But anyone can go up to ARM and just get ARM IP. Well, you do have to plunk
down a fair chunk of cash. Yeah, okay.
I've been through that. It was not as pleasant as it sounded.
Fair enough. You have a
deeply technical blog. Could you tell us about it?
Yeah, so I don't know. It's funny. I think of it
as not very technical, but I guess it's relative. So the reason I write the blog is often I talk to
one of my friends who's someone who's very technical. I think it was like, you know,
very smart, knows a lot of stuff. And asked me, like, why does X happen for some value of X that
is, I think, probably obvious to me and all my coworkers. And it's like, oh, I thought this was obvious, but it's not actually obvious. It's only obvious if you've worked in this area for like, you know, five years or 10 years, right? And so I try to write down a lot of things that I think any of my coworkers would think are like too obvious to even write down.
And it turns out people like often don't find this stuff obvious, right? And so I think it
provides some value just because it's stuff that, like, I don't know, there's a bunch of people, I would say like thousands or 10,000 people, right, who know all this stuff, and they know it by heart. It seems like very trivial to them. And there's a bunch of people who are interested in this stuff, but there's not a good place to get started, right?
Like textbooks sort of explain one side of it, but it's like hard to just sit and read a whole textbook.
And they also don't keep up to date in a lot of areas.
And so that's sort of what my blog is about.
Does that make sense?
Yeah, it's like institutional knowledge at companies, which can be the same way, except it's spread across an entire industry.
Yeah, exactly.
Well, one of the posts that caught Christopher's eye, and he sent along to me, was where someone asked you what's new in CPUs since the 1980s. Which was just such a relevant question, because we work in embedded and our processors are way behind state of the art, and so I feel like I'm sort of working with 1990s processors.
What did you say to that person?
Oh, so this is a super long post, I think,
but to sort of summarize it, right?
First, I have this disclaimer that when I said new,
I just meant new for x86,
so I'm just talking about x86, basically Intel chips.
Because a lot of the ideas that are new to Intel x86,
they were done in supercomputers in the 70s or the 80s
or even the 60s or something like that.
So new is always relative.
A lot of stuff gets invented again.
And it's also like a lot of it was in papers,
again, like 10, 20, 30 years ago,
but it was only practically implemented recently.
But there's, wow.
I guess I don't want to go through it
and just list the whole post
because I think it's like 10,000 words.
But the high-level stuff are memory, caching, that stuff has changed a lot.
Out-of-order execution is, well, certainly new to x86 since 1980.
And it has a pretty large performance gain.
There's a bunch of security stuff that I mostly don't talk about.
So I try to avoid talking about all this low-level stuff that you probably sort of care about, actually, coming from the embedded world.
Like there's A20M and APIC and all this other stuff that you actually have to get right.
But most programmers aren't going to deal with it
because the OS or some driver deals with it.
And then there's also just a lot of things around,
I guess, how you can sort of focus more on things
that affect the programmer, like a normal programmer,
not somebody who's writing drivers or something like that,
and like how these performance things matter
and which ones you should actually care about
when you're sort of writing code.
Does that make sense?
It does.
I mean, that's the perspective that I have is I don't really care what's new inside the
processor.
I just want to know how it affects me.
And memory caches, that makes a lot of sense.
If I use this space, I only have a little bit, but it's very fast.
But out of order execution, that seems like a compiler thing.
Yeah, I mean, it's unusual that you have to care about this
unless you're writing assembly.
That's happened, yeah.
Yeah, so you sort of care about it.
If you're writing assembly and you're really doing optimization,
you sort of care about it.
There's this great tool from Intel.
I don't know what it stands for, but it's IACA.
And what it does is you can give it a snippet of code,
and it'll actually tell you,
assuming it's not memory limited, so assuming it's basically
execution limited, it'll tell you
what ports are busy at which time,
and you can do very micro-level optimizations,
because it has a model of the processor, and it'll
run your code through that. If you do that kind of thing,
then yeah, it's super important. But if you're
just writing high-level C++, Java,
Scala, Ruby, whatever, then
you just sort of write things down.
And you want to make sure you have some good memory locality.
You want to make sure you don't have branches that are totally unpredictable.
But otherwise, this kind of stuff shouldn't affect you directly.
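For reference, IACA is the Intel Architecture Code Analyzer. A minimal sketch of the workflow Dan describes, assuming the IACA_START/IACA_END markers from the iacaMarks.h header that ships with the tool (exact flags and macro spellings vary by IACA version):

```c
#include "iacaMarks.h" /* ships with IACA; defines the region markers */

/* Sum an array. The markers bracket the loop body so IACA analyzes
   just this region's port usage and throughput. */
long sum(const long *a, long n) {
    long total = 0;
    for (long i = 0; i < n; i++) {
        IACA_START
        total += a[i];
    }
    IACA_END
    return total;
}
```

You compile this to an object file and run something like iaca -arch HSW sum.o; the report shows which execution ports are busy at which time, assuming the snippet is execution-limited rather than memory-limited, which is the micro-level view described above.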
That makes sense.
It sort of reminded me of talking with Nate Tuck, I want to say many years ago,
but it wasn't that long ago, was it?
About the NVIDIA's Tegra K1,
where they had this thing that ran underneath the machine code
and tried to optimize the Java that was running in Android.
Chris, can you explain that a little better?
Well, it wasn't really trying to optimize the Java directly.
It was taking ARM code that might have been not well optimized
because it was produced by a generic kind of ARM targeting compiler,
and they played some games with microcode,
and I'm probably ruining this,
where they applied more direct optimizations
that they knew how to do at runtime on the CPU.
But this gets back to the whole kind of code rewriting,
microcode, transmeta's ideas.
Transmeta's a company, not a...
It used to be a company.
Where you have the chip architecture is actually micro-coded,
and then it can do other things and kind of morph the code
you provided in another layer below what the compiler optimization is doing.
And I just butchered that completely.
So that's something that's way out from embedded chips.
Yeah, I found that to be a super interesting idea.
It seems like it's still, like the implementation
is much harder than anyone expected, right?
I think Transmeta, I believe they spent almost
a billion dollars, and they brought in about
I think three or four million revenue, and the rest
they brought in through lawsuits or VC funding.
So that was clearly not a good return on investment.
Well, the lawsuit, I guess, you know, if you're a VC,
you don't care if the money comes from a lawsuit or from selling
the chip, but you know, it's still a little bit funny.
And then the Tegra chip,
it's still, I think it's
relative to the ROI, a lot more
effective than Transmeta, but it's still,
I don't know, they're not quite there yet in terms of
having something that really sort of, I don't know,
lives up to the promises that were
sort of originally made about Transmeta.
Yeah, and I think their goals are slightly different.
They're trying to solve some very
specific problems that they see with Android code
and the kinds and the quality of code
that is delivered to the CPU.
Yeah.
Which is kind of, I mean,
it's a problem that really didn't exist
in the Transmeta days.
Transmeta was trying to do something
completely different.
Yeah, that's fair.
So you were talking about hardware accelerators.
When I think of those, I think about little PLCs or FPGA blocks that do something very specific for an application.
And, you know, I'm very embedded, so I can be totally out there.
I think of like a GPU as sort of a very big hardware accelerator block or a DSP.
And Chris is shaking his head like I'm crazy.
So maybe I should ask you, what is a hardware accelerator block?
A GPU is a good example.
It accelerates a large class of things, and it's pretty fast for those.
I think people are now excited about GPUs,
not only for graphics, but for all kinds of HPC stuff, for deep learning.
My understanding is if you do deep learning stuff,
you get 8x or 10x improvement on GPU versus CPU.
But you can often do a lot better
if you build custom logic just for an application.
There's a paper by, I think,
a couple of students at Stanford recently
on deep learning. They built a chip
that only does deep learning.
So a GPU has all kinds of stuff that you don't need for deep learning.
It's able to do 64-bit floating-point operations,
for example.
At least I think that's probably true.
Well, obviously, it does graphics. It does all the stuff you don't need.
So it throws all that overboard.
And so their chip, you know, it has the exact amount of cache you need.
Well, they have some
room in case things get bigger, right? But they have approximately
the amount of cache you need, you know, for caching the things you want to cache
for deep learning or whatever, right?
They have a series of benchmarks. They benchmark against a couple
standard models, if you're not familiar with deep learning, you know, whatever.
But, you know, they use ImageNet and a couple other things that are, like,
standard things people do in deep learning,
and they report between a 50x and
1000x speed improvement over CPU.
And so this is pretty good, right?
So it's...
The fundamental reason... Well, they have a novel algorithm
for compression, which is sort of interesting, but ignoring that,
the fundamental reason is that, I guess, the less
general you make something, the sort of more
efficient it is at the thing you want to do.
And there's often these applications
that you really, really want to do now.
Like, I think that most phones now, for example,
have some sort of chip
that does specialized speech recognition, right?
You know, Apple has Siri,
you know, Google has the Google Now thing,
and you say, okay, Google,
please search for whatever, right?
And it's relatively wasteful to wake up the CPU
and have it, like, do some sort of,
you know, basically,
it'll probably do some deep learning thing
and try to recognize your voice.
We have a chip that just sits there
and does that automatically.
Automatically isn't quite the right word,
but it's specifically just designed to do that.
That can be much lower power.
And so that's the kind of thing I'm thinking of.
Well, that and encryption blocks,
I mean, that's a very specific,
application-specific hardware block
that a lot of system-on-chips include now.
Yeah, yeah, that's pretty popular.
I think Intel even has something...
Is that true? What do they have now?
Yeah, I think they have AES now in the CPU even.
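That extension is AES-NI. A minimal sketch of what it exposes, assuming a CPU with AES-NI and a compiler flag like gcc/clang's -maes; real code should use a vetted crypto library rather than hand-rolled rounds:

```c
#include <wmmintrin.h> /* AES-NI intrinsics; compile with -maes */

/* One AES encryption round on a 128-bit block. A full AES-128
   encryption is ten such rounds (the last via _mm_aesenclast_si128)
   plus key expansion, both omitted here. */
__m128i aes_round(__m128i state, __m128i round_key) {
    return _mm_aesenc_si128(state, round_key);
}
```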
Well, and
machine learning on GPUs
makes a lot of sense to me because
it's all matrix math.
And that's what
GPUs do.
And GPUs are cheap.
I mean, for the computational power you get, they're pretty cheap.
To go off and do something custom, yeah, you need that 1000x speedup that you're talking about, Dan, to justify it, right?
Yeah, yeah, absolutely.
And I think there's actually a group that did this sort of, well, commercialized isn't the right word,
but they did this at scale first,
and that's D. E. Shaw Research.
I think they did this with computational chemistry.
I actually don't know if they've actually
successfully made money off of this,
because I think basically there's a very rich person
who thinks this is a good idea.
They're just funding this.
They might have run it through hedge funds.
But I think they also targeted around a 1,000x speedup
for their application,
and I think they actually claim that this is what they get.
It's sort of useful.
I don't understand enough of the application domain to know whether or not this is a good idea.
Many people think this is a terrible idea.
Many people think it's a great idea, and I can't really judge between the two opinions.
Well, it depends on Moore's Law.
If Moore's Law is slowing down, as many people say,
then the processors aren't going to be getting faster as fast,
and so we're going to need to be smarter about implementing.
But if for some reason Moore's Law manages to continue through other means
if our processors keep getting faster
then designing specialized widgets is not going to help us
because we won't have to.
I mean, we can use the regular things.
This is like embedded systems.
You end up doing something very specific on your processor
because you are resource constrained,
and then another processor comes out in a month or six months or a year
that does all those things and has lower power and more RAM
and more space and more GPIOs,
and you're like, oh, I sunk so much of my life into this,
and now I could just upgrade
my processor.
Oh, and it's cheaper too.
Please shoot me.
That sort of thing.
I mean, it depends on what else is happening.
Yeah, I think that's definitely true, right?
If you're targeting like a 2x or 4x speedup and then the processor gets twice as fast
in a year and a half, right?
It probably took you a year and a half to do that design anyway, so it's not really worth it.
But the other thing that's happening is that there's
companies that are operating at larger and larger scale, right?
I think the numbers are all
pretty secret, but Google, Microsoft, Amazon
all clearly have over a million machines,
and sometimes by a pretty large number.
They're all looking at buying
probably more than that many machines in the next year,
and it's at that scale.
If you can do something that'll save you 5% of your computation, right?
It's actually worth a lot of money to do that.
So a lot of these companies have been hiring a lot of hardware people to go optimize just
a number of different things.
And I don't know, Google's pretty secret about this.
Amazon's pretty secret about this.
But Microsoft has actually talked about at least some of the applications, you know,
they're working on, we're working on.
And it's not that it's necessarily like an inherently better idea than it was 10 years
ago, but it's like the scale is just much larger than it used to be.
I think 10, 20 years ago, there weren't companies
that were doing this kind of thing
where they would have as many machines.
I remember seeing many, many, many years ago
an ad at Google's job boards that said something about
we need embedded software in order to minimize power.
And I was like, well, that sounds really, really boring.
And it was for their server farm.
And it didn't really occur to me that that, you know,
that actually could have been a really cool job.
Saving 5% of power on that scale is different than saving 5% of power on a wearable.
Yeah, yeah.
I mean, so this is a sort of made-up number, right? But imagine
they have a million machines, which I think is a very low estimate
for now. I think XKCD estimated
2 million, so maybe we can use 2 million, right? I think
the amortized cost for a machine is
like, Hennessy and Patterson has an estimate where it's
like 5 grand a year or something like that, right?
So let's say 2 million times 5,000.
That's like a pretty large number, right? So if
you shave 5% off of that, it'll pay your salary
not just for the year, right, but maybe for your lifetime. So, like, they're very happy to have people come in and do that kind of stuff. And it depends what you're interested in, but personally, I find that to be a pretty interesting problem.
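The arithmetic, as a quick sketch using the made-up numbers from above:

```c
#include <stdio.h>

int main(void) {
    double machines = 2e6;         /* the xkcd-style 2 million machine estimate */
    double cost_per_machine = 5e3; /* roughly 5 grand amortized per machine per year */
    double fleet = machines * cost_per_machine; /* $10 billion per year */
    double savings = 0.05 * fleet;              /* a 5% efficiency win */
    printf("fleet cost: $%.0f per year\n", fleet);
    printf("5%% savings: $%.0f per year\n", savings); /* $500 million per year */
    return 0;
}
```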
Well, it was like when I worked on consumer products, and we shaved a nickel off of a toy. They made several million of them, and yeah, that paid for my salary. It was cool.
So the blog. This is the sort of thing you talk about on the blog, and you do go into good detail. I mean, lots of interesting tidbits and facts and links and references. It's pretty cool. It's clearly something you take time to do. Why do you do it?
I don't know. I mean, that's a good question.
I guess I feel like there's things
that people would like to know, and
it's helpful to explain them.
Sometimes it's just an interesting fact.
Sometimes it's sort of like, the common
belief about a thing is, I believe, wrong.
And it's not that it's, you know,
I don't know, wrong isn't the right word. Let me give a specific
example to sort of describe what I'm trying to, like,
what I'm failing to articulate. Like, two examples
would be, for a long time, people thought
monorepos, that is, using, putting everything
in one version control system, was, like, a very bad
idea. Not everyone thought this, right? At the time,
Facebook, Google, Twitter, they were all either
doing this or moving towards this idea. Most people who
didn't work in one of these big companies thought this was just, like, the
stupidest thing they'd ever heard of, right? You talk about this,
like, that's crazy. This is clearly wrong in every possible way.
And so my argument wasn't that this is like a great idea necessarily,
but it's like not the worst idea in the world.
And there's like a reasonable argument to be made for the other side.
This is also true.
I have a post on like working for big companies versus working for a startup.
I don't know how this happened.
I think, you know, Paul Graham and Sam Altman,
and some other people whose writing is widely read,
they've made the argument that you should work at a startup
because, you know, basically it'll pay better.
The work is more interesting. It's just sort of better in
every possible way. And I don't think this is true,
right? I think there's a trade-off between the two.
And so I'm basically
like, I often sort of feel like everyone
believes something, and they shouldn't all believe
that this one thing is correct. They should believe there's
two possibilities or three possibilities and the trade-off between
them. And this is something that, I don't know, I guess a general
theme of my blog. And I'm like, oh, this sort of bugs me,
right? And so I write a thing trying to explain my position.
And I don't know. I feel like I often don't convince everyone or even most people,
but I convince some people and that makes me pretty happy.
Yes, this idea that there is no one right way. Every time I feel like that, I realize
that I have totally gotten it wrong. And diversity of thought is a great thing.
We should talk about different ideas.
Well, it's a phenomenon that comes up a lot when you hire new people.
Because often you'll hire new people who have a different set of myths than your team or your company.
And so it's hard to integrate them, right?
Because they come in and they say, why are you doing it this way?
This is insane.
And the rejoinder to that is, why would you want to do it any other way?
And so you've got these competing sets of priors, both of which are applicable probably in different situations.
But each group is thinking, this is the one true way.
And yeah, I think spreading, trying to break through that is very useful.
Spreading chaos is a good thing.
How long does it take you to prepare a post?
It depends a lot.
I mean, some of my posts is basically half an hour.
I sit down, I write a thing, I hit publish, and I'm done.
Some of the posts, especially the ones that require simulation or something like that,
or a lot of code, that can take a lot longer.
I also sometimes just send it for review to people, not because I care about the post,
but I sort of have this, I don't know,
I guess my writing is often process-focused.
I sort of don't care how the post comes out. That sounds bad,
but it's basically true, because I feel like I would never
actually write anything if I cared how the post comes out.
You can always improve a post, right?
And so my general goal that I have, I guess this is true
not just writing for almost anything, right, is for each post
I want to improve a thing
in my writing, if that makes any sense. So I'll send it to a friend
of mine who's good at reviewing things, or actually, there's also
a professional editor I'll sometimes use.
I'll get feedback, and I'll say, oh, okay,
I messed up these things. And I'll
fix them in the post, right? But my real goal is, in that
post, to not mess up those same things. And I'm
kind of slow, I think, with writing, at least, right? So it
takes me often a lot of posts
to fix just a broken
structural thing in my writing. And so when I'm doing that, if I take two or three passes, that can easily be like an hour or two per pass, so that might be, you know, three to six hours or something like that. So I guess I would say it varies between like half an hour and maybe 15 hours or 20 hours for some of the more coding-intensive posts.
And you post a couple times a month, is that right?
On average, yeah. I mean, that's probably true. Yeah, I think that's true.
Goals for yourself with that?
Not really. I try to avoid having anything but this process goal of improving my writing. I mean, I know some people like having, you know, I post once a week, I post twice a week, I post once a month. But for me, I feel like, you know, if I don't have anything to say or if I don't have time, I'd rather just not have this be like a high-stress thing. And so I'd rather just, you know, have this goal of, you know, every time I post, I've improved one thing about my writing.
Okay, so what sort of things do you try to improve about your writing?
I mean, are we talking use of semicolon,
passive voice, or ability to explain through pictures?
What sort of things are you looking at?
It's mostly, I mean, sometimes it's like a minor thing.
A thing that I do a lot is I don't explain graphs
because graphs, they feel very obvious to me,
so I don't feel like I need an explanation.
It turns out this is not true for some people,
so I should explain graphs better.
So that's a simple self-contained example.
But I think a lot of the stuff that I try to fix
is more structural.
It's like I have the tendency to have these structural problems
in my writing where I'll say one thing
and then I'll move on to another topic,
or I'll sort of insert another topic, and then I'll go back to the first thing.
And it's sort of confusing when things move back and forth.
And that kind of stuff, it's, I don't know, I guess you can,
well, you can probably tell from my talking, right?
I tend to ramble a lot.
So I do the same thing in writing.
But writing, you have a chance to easily go back and go fix these things, right?
So I try to fix that kind of stuff.
I'm a big fan of the elliptical writing.
You write it around in circles and then eventually you get to your point,
but it's at the center of a long spiral
where you touch on the same things over and over again.
But yeah, I see how some people don't like that.
That's a pretty big time commitment, though.
And you do it to become more effective at communicating,
not to become more effective at technology.
Is that right?
Yeah, I think so.
I mean, it helps me a little bit with
technology, because sometimes there are papers that I've read, and I feel like I understand them.
And when I have to sit down and explain them, you know, I'll realize there's these parts I
don't understand. So I'll have to go write a simulation or do some more stuff to understand
it. So it does help me a little bit. But yeah, it's mostly a communication thing.
What is your favorite post?
Huh? I don't know that I have a favorite post. Maybe I can think about that and get back to you.
But in general, I don't really have favorites.
So I'm probably not going to be able to come up with a favorite one.
Yeah, well, my next question is,
what is your favorite that is underappreciated,
one that you worked hard on or that helped you really fix some sort of,
you said, structural problem in your writing
that nobody noticed but you were pleased by?
So, hmm.
Yeah, that's another question where I have the same answer.
I'm trying to think, what is one that I've, hmm.
I don't know.
I mean, part of it is when I, there's some of my earlier posts, when I read them, I sort
of, I see these problems in writing that I fix, they bug me.
And so they sort of help me in the sense that writing that post helped me fix some problem.
But whenever I look at old, this is also true of code that I write,
whenever I look at old writing or old code, I usually don't like it
because it's less good than it would be had I done it now.
So I feel like I don't have a favorite in this category, right?
Because I know if I look at a thing that I just wrote recently,
in six months, I'll have the same problem.
And so I sort of don't like any of my writing.
Yeah, okay, I understand that. Sometimes I go back and my writing is better than I remember it. And sometimes I'm like, no, I would never do that that way again. What was I thinking? Didn't I know?
Yeah.
So do you know what you're going to write about next?
I have a few ideas just floating around my head, but none of them have really gelled in anything
that makes sense.
If you want, I can talk about them, but they're sort of incoherent.
It'll also be incoherent
as I talk about it.
Well, sure.
Incoherency is not required.
Okay.
A couple of things I've been thinking about are
one thing is, I feel like you get all these
ticks just from using computers, and a lot of
computer literacy is just like, you've picked up these ticks.
And what I mean is, there's actually an old Car Talk episode.
I don't know if you've heard of this.
I guess you might think of this as one of these very early podcasts, right?
It's an old radio show.
It's a couple of mechanics.
I like it a lot.
But anyways, so one day, I can't remember which mechanic, one of the two was talking to someone who has this car.
And, you know, they take the car for a test.
And they're like, oh, this car is completely unsafe to drive.
You basically can't steer the car. You can't even go in a straight line. And the guy's like, I don't know what you're talking about.
This car works fine. I drive it all the time. Everything's perfect.
And so the mechanic, you know, after arguing for a while, asked the guy who owns the car to drive.
And so the guy's driving and you know, he does actually go in a straight line.
But if you look at the steering wheel, he's like rapidly like just turning left, turning right, doing all this kind of stuff, right?
And the car somehow slowly developed this steering problem,
but because it happened so slowly,
he got used to it. He just started
doing all these things to fix it up.
I feel like I do this all the time when I use
software. It's just all kinds of weird stuff
that you do. It doesn't really make
any sense, but you're just sort of used to it.
I feel like we do this disservice to users a lot of the time.
We sort of say, oh, this should be obvious. Everyone should understand this.
You just run this series of 16 commands, right?
Of course, everyone should know these commands.
This makes perfect sense.
But really, I think a lot of computer stuff is really confusing,
and we could do a much better job of sort of making things easier.
But it's hard for us to notice, right?
Because we're all, in general, pretty good with computers,
and we sort of just automatically do all these really unreasonable things.
Those are the things, the voodoo that you have to do to get everything working,
to get all of my compilers running with all of my debuggers and all of that. Yes, the voodoo, I totally understand.
Sometimes I kind of remember the voodoo, but I don't. And that's
always the point where I'm like, I should be
writing this down and fixing it.
But instead I just relearn
whatever it was I needed to do.
I feel like there's a lot of things like that that
like you say, are ticks,
but we don't even fully understand them.
So
Like with Git, sometimes there's a series of commands that I do. I haven't really thought about what they mean. I just do them because incantation, that's the list of commands.
And I feel like that's a huge mistake, because when something goes wrong, you don't really understand how to unwind it.
I mean, I definitely agree it's a problem. I feel like mistake is sort of a strong term, right? Because I think there's so many of these that if you understood all of these, right, you would never get anything done. So I sort of have a list
of these things that I have written down somewhere.
And every day, I do like 40 of these things.
And maybe every once in a while, I'll go dig into one
and figure out why that is. But I feel like there's
too many, right? You can either get things done
or you can go understand everything you do, but you
can't do both.
Yeah, that's why they persist.
Well, it'd be better if they were encapsulated.
So I didn't even know that it was voodoo.
I just, it worked.
It should just work.
I don't know.
I have a big belief in,
please stop making me remember these things.
Okay, you mentioned one of your posts,
big company versus startup.
That one was really interesting.
I found you talked a lot about how there's this myth that startups are the place to work.
Big companies don't pay.
They are uninteresting.
They're just not a good idea.
And so there's this pervasive idea that if you're not at a startup, you're just wasting your time.
And you took the other perspective.
But you've been at some big companies.
So what did you find?
And how much do you think is your bias showing through?
Yeah, I don't know.
If I have a bias, I actually don't know what it is.
I mean, so personally, right, I feel like,
so I've worked at a startup for like seven, eight years.
I've worked at a couple of big companies.
And I was happier at the startup, actually,
than I have been at the big companies.
But I think for reasons that are sort of idiosyncratic,
I can't tell which one on average would make me happier.
So the advantages I talk about in the post
that I feel like are sort of some advantages of big companies, right,
is there's like large classes of problems that startups basically can't handle for various reasons, right?
They don't have the manpower.
They don't have the funding.
They just can't do it.
And often you can just work on, like, a world-class research problem, where you can use a lot of resources at a big company in a way that's basically impossible at a small company, right?
Like if you're at Google and you want to run a MapReduce across the history of the internet, you can just do it, right?
And small companies, they don't have access to data.
And even if they did, they don't have access to the compute and resources, right? But if you want to
use 10,000 machines at Google, it's not a big deal. You can just run a job
and use 10,000 machines. That's fine, right?
Whereas a startup, if you wanted
to use more than 200 machines,
and this is this one startup, you'd have to talk to people, right?
Because they only had 1,000 machines. So if you use
500, you're using half their machines, and this is sort of a problem.
But Google, it's not a big deal, 10,000.
They've got lots of machines lying around, right?
And so that's sort of one thing. And the other
thing is, if you look at, I don't know, well,
there's this Go match that was played recently,
right? AlphaGo, this Go program,
you know, beat someone who was, like,
at least, arguably, the world's strongest Go player.
And large companies, they can often, like,
muster the resources to work on problems like that, right?
And there's a lot of these problems lying around. There's a lot of just really,
really interesting research problems lying around that small companies can't touch.
And so that's one advantage. The other
advantage is that large companies, I think, and this
surprises people, and this is one of the reasons I wanted to write this down,
they often pay much better than startups do.
And I think, I don't know, it's for some reason
very hard to convince people of this. I remember reading the comments
on each of my posts, and a lot of people were like, no, these numbers must be
completely wrong. This cannot possibly be
correct. No one makes that much money.
And the funny thing was, when I passed this post around
to my friends, because I thought it would be controversial, to see
what's controversial, they're like, oh, no one
thought that would be controversial. People mentioned, oh, the numbers
may be a little conservative. But
the problem was that people thought the numbers were too high,
when actually the numbers were too conservative.
And this sort of happened again recently. I don't know if you read
Hacker News, but there's another discussion recently where the same thing
sort of happened. I think there was a New York Times
article that came out that mentioned that
engineers at sort of large cloud companies
are making between $300,000 and $1 million a year.
And a bunch of people were like, no, this is impossible,
right? This must be a total lie. No one makes that much money.
But if you talk to people at these companies,
$1 million a year is actually pretty extraordinary.
You have to be relatively senior. But $300,000 a year
is not considered
a particularly outstanding number for a senior engineer
at Google, Facebook, wherever.
Right.
And it's not that you should necessarily do that because you should make that much money.
Like maybe you don't want to for whatever reason.
Right.
But like, I think people should know this is a possibility.
Right.
And I feel like startups will often pay you like a third of that or half of that, maybe
if you're lucky.
And it's like, you should know that that's a trade off you're making.
Right.
If you actually go work at a startup.
Yeah, I totally agree.
And I was actually shocked to see that that was a myth reading your blog post because in my experience
and everybody who I've worked with in my history, it was always understood that startups were going
to pay you peanuts and give you the promise of stock options and, you know, later growth or
whatever. But I'd never, never thought that anybody thought that startups paid better than big
companies. I didn't think the disparity was that huge until I started talking to folks at Google and realizing that they were making just a crap load of money.
And back to your point of tools, I remember one person went from a startup to Google and just was like, I can ask for any tool I need, and nobody ever tells me no.
At a startup, sometimes you can't even buy the $250 debugger, because somebody is like, no, no, we have to save money. And you're like, that's going to save me 15 hours, give me the damn debugger. But they're like, oh no, we can't actually spend money.
And yet at Google, you would never have that problem.
That sort of thing.
Oh, you want the $100,000 oscilloscope?
Sure, just check it out.
It's very different because they do have that economy of scale as well as plethora of cash.
On the flip side, and you were talking about resources,
the startup experiences I've
been through, the velocity of work that got done and the, I guess the amount that could be
accomplished with a small team seemed almost an order of magnitude beyond what happened at big
companies. And I still see this today where large, large teams work on things
that I felt like teams of 10 or 15 people could do twice as fast. And I think that there's reasons
for that, but I wanted to hear what you thought about that, whether I'm crazy or not.
I mean, it's anecdotal, but yeah.
Yeah, it's very anecdotal.
I think that's definitely true in a lot of cases.
It feels sort of paradoxical in a lot of ways, right?
Because if you look at the, like Elecia mentioned, right,
at Google you can get access to way better resources.
It's not just that, too.
The internal tooling is just better.
I know a lot of people who work at Google and they stop doing open source stuff
because it's just the friction involved
in writing software outside Google
is so much higher than the friction involved
in writing software inside Google.
Like, I'll give one example of, maybe 20 examples that seem magical when you start working at Google.
They have this build system.
So you run blaze build X for some value of X, right?
And your build just works.
It takes between, I would say, 5 and 20 seconds.
This is assuming you don't do LTO and you don't do anything like that.
But if you're just building a dev build, it should take you between 5 and 20 seconds.
So one day, in fact, during orientation, the backend for this, which is called
Forge, it went down. And so I
tried building locally on my desktop, basically a
Hello World program of Flume. Flume is like a wrapper
on MapReduce. And after about an
hour, it was 2% done.
So this would take about 50 hours to build, right?
Without the system. And I've talked to people on
Google search, and it's like, there it's like
four days, right? And again, your build
takes 20 seconds. And so, in theory,
you should be much more productive. And in some
cases, you can be. I can think of some teams that are extremely productive.
They take advantage of this really well, and they don't get bogged down in stuff.
But at the same time, I don't know
what happens, but at big companies, there's often just a lot of
weird bureaucracy that doesn't make
any sense whatsoever. It's just like,
baffling this could happen. Actually, a friend of mine,
this is in Microsoft, in another org,
I guess I shouldn't name names, but they're in another org,
and apparently the GM
forgot to sign the thing
that you need to sign so people get promoted under them.
They were reminded a few times, they just didn't do it.
And I'm told this means no one under them will get promoted.
And this is baffling to me, right?
They know who should get promoted, they have this piece of paper,
but whatever, it says who should get promoted.
Can't they just fix that? Can't they just go back
and promote these people? It's not like
this happened eight years ago. This just happened
last month. And a lot of people
think this can't happen. And there's just so much
bureaucracy. This is a super obvious thing.
You just have to check a box somewhere
to make this happen, and they can't make it happen somehow.
And this kind of stuff just adds up and adds up
and adds up, and you lose a lot of opportunity because of that
kind of thing.
And people get annoyed at these little things that build up.
I also found at startups, I felt more needed.
Not more important, but more like, if I didn't go in and do my work,
make my contribution, the whole company didn't make progress.
I guess that is important.
All right.
Well, the corollary to that is that at startups, there are only a few people.
So everybody has to take on many, many roles.
Well, I loved the not being pigeonholed. To float between being somebody who helped customers, somebody who did development work, somebody who went into manufacturing and helped them debug their issues, and a manager, and even the person who went out and got lunch, because, you know, somebody had to. And so you do all of those in, well, usually about two hours,
and then you do it again a few times.
I liked that part.
But it does decrease the ability for specialization
and deep problems, the hard problems sometimes.
Yeah, and maybe it's that startups can be more productive
than big companies at certain things, is the best way to put it.
At certain things.
And big companies are better for other things.
But certainly pay, yeah, big companies.
Your startup lottery ticket, those stock options,
let's talk about what they're worth.
Yeah.
Let's see, what other posts?
You had so many. There was one where you talked about what has worked in computer science, and I fear that I didn't read that one properly, because you started out talking about someone else's article and listing what things had actually made real progress and what things were just not working in computer science.
And I think some of the things that had done well pre-1999, and post, up to 2015 when you wrote the post, were like bitmaps and GUIs and the web and algorithms. And I see all those things. They've been succeeding wildly. But things like security, certainly pre-1999, no. And between then and now, maybe.
Yeah, so there are things that I think everyone would agree worked very effectively and still work very effectively.
Like virtual memory, right?
You mostly don't even think about it because everything you use basically uses virtual memory.
This is not quite true.
Maybe embedded systems, some of them don't, right?
But for most consumer stuff, server stuff, you use virtual memory.
But the things that are not very effective, right?
So I think I mentioned software engineering, which means something specific.
What I mean is actually the field of research people call software engineering, not the idea of software
engineering, which is sort of different.
Like capability-based computing, I find that one interesting
because it's one of these ideas that I feel like a lot of
the smartest people I know think this is a great idea.
They try to build a system around it, and it doesn't
work. And then they're like, oh, maybe this is
harder than I thought. But I feel like it's sort of
in this class of ideas, it seems like a really brilliant
idea that we should definitely be doing, and then you try to do it
and it's like, somehow it doesn't quite work out.
Fancy type systems is another thing that
I think is also in this class of
things. It's really interesting and it makes
sense in some theoretical perspective, but it's been very hard
to actually make practical for some reason.
I guess probably the most controversial thing
is that I said RISC was no.
I can talk about why.
I think security,
I said was maybe, or even yes,
and people mostly don't agree.
I guess I said it moved from no to maybe,
because it wasn't effective,
and a lot of people still think security is a total joke,
and we basically don't do it at all.
And I think that's also valid, right?
We're still trying to figure out how to do security,
and no one really knows how to do it effectively.
I think security is a unique problem,
because it's the only one where
you have adversaries.
Any other technical
problem that you're working on in computer
science, you're trying to improve upon
an existing
scheme or something like that.
But you don't have people actively
taking the previous scheme and ruining it.
Whereas with security,
you're playing this game of cat and mouse all the time.
And maybe something you did that worked for five years now is completely useless.
And so you have to reinvent the entire field as soon as a hash gets broken
or a keying system doesn't work or somebody finds a hole somewhere.
So it seems like a unique area.
Yeah, I mean, it's sort of interesting to me, too,
basically how good at attacking and how persistent people are.
I think this is something you probably don't see too much at a startup.
But if you work at Google, Amazon, Facebook, whatever,
if you look at the actual attacks that are successful,
they're often extremely intricate.
And you sort of wonder how people came up with this.
It's like they open up a VM,
and they do this exact sequence of things, you know,
644 times, and when you do this,
it causes this thing to overflow. And it's like, how did they figure
out that this sequence of things that's like 19 steps
long causes this overflow, right? Because
they don't have your source code. Well, at least you're
guessing they don't have your source code. I mean, maybe they do.
That's also bad. But they probably don't have your
source code, right? So somehow it just did a bunch of stuff.
And eventually you realize, if they did this
very obscure sequence of things, something bad happens.
But not just once, they have to do it a bunch of times, right?
And then once this happens, then they'll suddenly
log on to 10,000 VMs, right? They'll do this all over the
place. But it's, I don't know, it's sort of
amazing to me how much effort
people put into this kind of stuff.
There is
recognition and money
to be made by doing it.
And as you pointed out earlier,
we're already pretty good at doing strange voodoo,
magical incantation steps to get things to do what we want.
So yeah, how did they come up with those 19 things
in order to break your security?
And yet, okay, yeah, we have to do that to make security work.
It's only 18 steps, so eh.
But software engineering research, let's go back to that,
because I didn't understand what you meant by that.
Yeah, so there's a couple, I guess, representative examples:
ICSE, the International Conference on Software Engineering,
and FSE, Foundations of Software Engineering.
I think these are the two premier conferences in software engineering.
And, I guess, research in software engineering,
the field has become more nebulous over time,
so it's harder to say what this is.
But it's sort of about, I think,
anything I say, some software engineering researcher will object to,
so I hesitate to even say anything.
But the areas that interest me are when people do empirical research,
they try to figure out what practices are good and what practices are not good.
They try to come up with tools that people will actually use. And I think that
if you compare the impact of these areas versus the impact of, say, research in
algorithms, research in machine learning, research in systems, and computer architecture, it's just very low.
It's mostly, most of the stuff we do does not come
from software engineering research, even when the stuff we do is something that could have been covered by software engineering
research, if that makes sense.
So I think as a field, the impact has not been very high.
I sort of don't really understand why.
But I find that to be sort of interesting, at least.
That is.
I mean, there's been a fair amount of research.
I don't know whether it's particularly software or general office work, but it says open offices
are terrible for thought-heavy work.
And you're like, yeah, exactly.
Every time I get interrupted, I write a bug.
That's just how it goes.
But the MBAs don't really like that answer,
so nobody cares about that software engineering research.
And other things like, okay, PCLint, yes, if we do test-driven development, if we do good static checking, it is better for everyone.
And yet, people don't use it for some reason.
They don't even turn warnings on, or they don't care about them. And you're like, this seems like the sort of research that, even if somebody is writing a paper that says if you do -Wall you will get 60% better software in 20% less time or whatever, those sort of head-in-the-sand behaviors just stick around.
And so even if they're doing the research, nobody's listening.
Do you agree?
Yeah, I find it to be sort of...
Or sorry, I don't know if the question was to me or not.
Oh, no, it was to you.
And was that what you meant by software research?
Yeah, and so, I mean, that's part of it.
So, yeah, I would say that that's half of what I meant.
I certainly agree with that.
I find it pretty weird.
So we use this tool,
just for example,
I mean, I could pick any example,
but for an FPGA vendor,
I won't name which one
because I think that
all their tools basically,
they have this problem.
We use their IP, right?
And I was just, you know,
running a build, right?
In the first phase of the build,
the map phase,
we get 13,000 warnings.
And if you actually
read the warnings,
I obviously haven't read all 13,000,
but some of them are really scary. It's like, you know,
PLL reset not connected correctly.
PLL may not lock.
And it's like, that's bad. If that ever
happens for real, our system is going to go down,
right? And, you know, if we deploy a million of these things,
this seems really bad.
So I asked someone about this. It's like, oh, yeah, so we
talked to the vendor, and they were just like, yeah, we just
have that warning. We didn't connect this up, but it's fine.
And the problem is, you know, maybe this warning is fine.
Who knows? But maybe it's okay.
But if you have 13,000 warnings,
maybe somewhere in that pile of 13,000 warnings,
one of them is not fine.
But because you have 13,000, right,
you can't figure out which ones actually matter.
And this is from our vendor, right?
Like, you would think they would do better about this.
But, like, you know, I've used the tools
from both major FPGA vendors,
and they both do the exact same thing, where their code
fails every lint check. It just produces
a bazillion warnings, and they just don't do it.
And it's not really clear why, but people just don't believe
in this stuff.
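A tiny, hypothetical C example of why a wall of ignored warnings is costly: with warnings suppressed this compiles silently, while gcc or clang with -Wall flags the bug immediately:

```c
#include <stdio.h>

int main(void) {
    int ready = 0;
    /* Bug: '=' assigns instead of '==' comparing, so the branch is
       always taken. -Wall reports it (-Wparentheses); with warnings
       off or drowned out, nothing. */
    if (ready = 1) {
        printf("system ready\n");
    }
    return 0;
}
```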
Well, that's really bad for the reason you state.
I had a situation a few
weeks back where
I had a weird linker error.
And because people
don't think very much of our tools,
they assumed that it was a bug in the linker.
And so I went back and forth and spent a lot of time
trying to figure out what was going on.
And it wasn't a bug in the linker.
It turned out to be something very dumb that we were doing.
But the natural reaction was, oh, yeah, just go bug them.
Don't spend any time trying to figure out if it's our fault.
If you don't trust your tools because
they get a bad reputation, then you're
not going to trust them when they're telling you you're doing
something wrong. Or you're going
to be swamped under, like you say,
a bunch of spurious cry wolf
kind of warnings.
Yeah, it's very bad.
Yeah, I agree. I feel like it's really, really
bad to develop on tools you don't trust, especially your compiler.
Your linkers are the same thing.
Because even when, yeah, I've had this experience
where I start just assuming the bug is in the compiler
or in the linker.
And with some compilers, this is actually not a bad assumption.
But it means that you're going to miss real bugs, right?
And it actually is your fault, and you can go and fix it.
And so I always end up, I don't know,
I always regret just doing that.
But at the same time, it's sort of the only way
to operate sometimes when you have a tool that is just so buggy that, you know,
you run into two or three new bugs each day. Yes. Yes. I have an oscilloscope I'm sending back.
Okay. So what other posts should we ask you about? Or should I ask you more about your blog in
general? Hmm. I don't know.
Is there anything else that you thought was particularly interesting?
I feel like, I don't know, there's so much stuff, right?
Like, I feel like sort of everything I write,
I write it down and then I'm like, oh, that's kind of interesting.
But I also sort of don't like it.
So it's hard for me to pick anything in particular.
Well, you've been writing two to three technical posts for several years.
So going through them all was actually an education.
It was really interesting.
Everything from error handling and postmortems to CPU bugs and what we can expect from Intel in the future.
Some of it was outside my normal scope, but it was neat.
I guess instead I'm going to ask you about your blog then, because you didn't...
Okay, sorry.
Now that you mention that,
I have a comment, if you don't mind.
Sorry for changing direction.
I suppose this is a pet peeve of mine, because
there was a study by...
Oh, man.
Sorry, I don't want to name the author, because I'll pronounce his name and it'll be terrible. But it was in OSDI, I think, in 2014. And
they looked at distributed systems, and they found that literally
the majority of what they call sort of
really bad failures, by which they mean the whole system
locks up or corrupts data, came from
bad error handling, right? And they looked at, like, why
the cases were often
really, really, really simple.
Sorry, I use the word really a lot.
But literally, I think in something like 27% of cases, they did not handle the error.
So it's just like, well, that's pretty simple.
And in 8% of the cases, the exception was over-caught by something. They mostly looked at Java stuff, so they have exceptions, right? So something would just swallow all exceptions. So now we're up to, what, 35% of errors where it's just basically not handled at all, right? And this is something that could be caught through, like, either
very simple static analysis, like, you can easily write your own linter
to catch this stuff, or just, like, very simple
testing, right? Just check that the error handling does
anything. And, I don't know, this happens
all the time. Like, it's not just this paper, right? There's another
paper by Remzi and
Andrea's group in Wisconsin. They look at file systems, they do a lot
of file system stuff, and they looked at file system
error handling, and they found literally
every file system they checked, well, I think ReiserFS was kind of okay, but no one uses ReiserFS, so every file system they checked other than ReiserFS would just ignore large classes of errors.
And there'd be comments like, I hope this error doesn't happen, or if we're here, you're screwed.
And it's like, well, you don't really want that in your file system,
but this is just how people write code.
And again, they wrote a very simple static analysis tool,
I think it was on the order of 4,000 lines of code,
and it just produces hundreds or thousands of these errors that aren't handled.
And sometimes it's actually correct to not handle the error,
but often it's a very bad bug.
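As an illustration of how small such a checker can be, here is a minimal sketch of a swallowed-exception linter using Python's ast module. It is only an illustration of the idea, in Python rather than Java or C, and not the tool from either paper:

    # Minimal "swallowed exception" linter sketch (illustrative only).
    # Flags except blocks that catch everything and/or do nothing.
    import ast
    import sys

    def check(source, filename="<string>"):
        findings = []
        for node in ast.walk(ast.parse(source, filename)):
            if isinstance(node, ast.ExceptHandler):
                # Bare `except:` or `except Exception:` catches too much.
                too_broad = node.type is None or (
                    isinstance(node.type, ast.Name) and node.type.id == "Exception"
                )
                # A body that is a single `pass` handles nothing at all.
                does_nothing = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
                if too_broad or does_nothing:
                    findings.append(
                        f"{filename}:{node.lineno}: "
                        + ("over-broad " if too_broad else "")
                        + ("no-op " if does_nothing else "")
                        + "exception handler"
                    )
        return findings

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            with open(path) as f:
                for finding in check(f.read(), path):
                    print(finding)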
And I don't know why, but people don't take error handling as seriously as they take the happy path.
And this seems like,
I don't know, maybe I'm just
too obsessive about this stuff, but this seems like
the wrong attitude to me. I feel like you should spend more time
on error handling than on the happy case.
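The testing half, just checking that the error handling does anything, can be equally small. A minimal sketch of error-path testing by fault injection; all of the names here are hypothetical, invented for illustration:

    # Minimal error-path test by fault injection (hypothetical names).
    import unittest
    from unittest import mock

    def save_record(store, record):
        # Toy function under test: must not lose write errors silently.
        try:
            store.write(record)
            return True
        except IOError:
            # The error path we want to exercise: report failure.
            return False

    class ErrorPathTest(unittest.TestCase):
        def test_write_failure_is_reported(self):
            store = mock.Mock()
            store.write.side_effect = IOError("disk full")  # injected fault
            # The call must fail loudly, not pretend the write worked.
            self.assertFalse(save_record(store, {"id": 1}))

    if __name__ == "__main__":
        unittest.main()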
So that
actually brings up something I wanted to ask about.
I worked
many thousands of years ago
at a company that was doing
full custom CPU development
for a networking product.
And the idea was to take
what had normally been ASICs or FPGAs,
and back then FPGAs were pretty weak,
and use the full custom CPU development process
on them to extract performance gains
and whatever.
And we got to work closely with the digital design guys.
I was in the software group doing routing protocols.
And we got to observe their work habits and the tools they used
and kind of their attitude toward development
versus software development.
And it was a different world.
My impression was, and this could be false, but
it seemed like they took many things much more seriously. And there was a much more well-developed
and accepted path for doing testing and doing verification. And we tried to apply some of
these concepts in the software, and we didn't have much success, partly because we just weren't committed to it, and partly because software schedules and hardware schedules, one thing is understood in one realm and not in another.
But I just wondered what you thought about that, what you thought about it, because you're bringing up error handling and playing fast and loose and the kind of way we write code. And it seems like we've developed this culture
where just kind of fly by night is the way to go,
whereas right over the fence where CPUs are being designed,
that doesn't seem to be happening.
Or am I just delusional?
No, no, I agree.
But I think that it's not that hardware people
are better about everything, right?
So for instance, on my team at Microsoft,
it was a very large effort to get
everyone to use version control.
In the software world, people would be shocked
if you had a team that didn't use version control.
But here, you have hardware people, and sometimes they're just like,
why do I need version control? It's much easier just to mail zip files around.
This drives me nuts. People think that's totally normal.
It's not that hardware people
are more rigorous in every way.
They're more rigorous about certain things.
But yeah, I think in testing,
I mean, part of it is they spend more time doing it, right?
Because the cost of a hardware bug is much more
than the cost of software bugs, so this sort of makes sense.
But I don't think it's just that. I think on average
software people, they just don't use
the right, I mean, this sounds bad to say,
maybe I'm totally wrong, right, not being a software person,
but they mostly don't use the most effective possible testing
techniques, so like per unit time, they're less efficient.
Something I've noticed is that if you go in and write a thing
that generates tests for you, as opposed to writing tests by hand,
you can often find a lot of bugs.
I've done this for a few different projects,
and usually in half an hour, you can pop out 30 bugs.
And they're often really bad bugs.
I tried using the Julia language for a while.
I wrote a very simple test generator for that.
I guess in software land, people call this a fuzzer.
One of the bugs was
exceptions sometimes aren't caught. And this is terrible
because if an exception isn't caught, it goes and terminates your program.
So this is one of the worst, I think, bugs you can have.
And it's just, this was
literally half an hour of work to go find bugs like this.
There were a few other showstopping bugs like that.
And just no one had done this, right?
Because nothing really encourages people to do this. And it's not like this is
rocket science, right? This is pretty simple. I'm not really
a software person. I was still able to write this thing
and it basically works.
It goes and generates a bunch of tests and runs them.
But for whatever reason,
people tend to do testing much more manually
in the software world.
And I don't know why this is.
And I find it to be, I don't know,
really inefficient.
And it's sort of baffling to me.
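A test generator of the kind described can be genuinely small. Here is a minimal sketch in Python; the original was written for Julia, so this is an illustrative analogue under that assumption, not the actual generator:

    # Minimal random test generator ("fuzzer") sketch. It throws random
    # inputs at a function and reports anything that raises unexpectedly.
    import random
    import string

    def random_value(depth=0):
        # Generate a random input: ints, floats, strings, or nested lists.
        choices = [
            lambda: random.randint(-2**31, 2**31),
            lambda: random.random(),
            lambda: "".join(random.choices(string.printable, k=random.randint(0, 20))),
        ]
        if depth < 2:
            choices.append(lambda: [random_value(depth + 1) for _ in range(random.randint(0, 5))])
        return random.choice(choices)()

    def fuzz(fn, iterations=10_000):
        failures = []
        for _ in range(iterations):
            arg = random_value()
            try:
                fn(arg)
            except Exception as e:  # any exception is recorded as a finding
                failures.append((arg, repr(e)))
        return failures

    if __name__ == "__main__":
        # Example target with a real bug: it assumes its input is a string,
        # so non-string inputs crash with AttributeError instead of the
        # clean ValueError callers expect.
        def parse_pair(s):
            a, b = s.split(",")
            return int(a), int(b)

        for arg, err in fuzz(parse_pair, 1000):
            if "ValueError" not in err:
                print("BUG:", repr(arg)[:60], "->", err)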
If they do tests at all.
Yeah.
Yeah, I can think of, you know,
I have a lot of friends at startups, right?
Some of them valued at some number of billions of dollars, and many of the systems
don't have tests, and people just push to production and see what happens. It's like, well, our system went down,
let's roll that back. And it's like, well,
I don't know.
To me, this is shocking, right? But this apparently works, right?
They're worth billions of dollars, you know, it works fine.
But I still sort of can't believe it when I hear about this kind of stuff.
Well, it goes back to cost.
I mean, the actual cost of rolling back feels free.
Now, all those people whose data you lost, or who you pissed off because you weren't up, or even mission-critical things, it still feels free.
It's just a button push.
And I don't know that that's real.
You're not going to TSMC and saying, well, we need you to
re-spin this chip.
You're not even going to re-spin
a board in a couple of weeks.
You're just pushing a button.
And so, why do we have
to have super fancy tests
if we can just push buttons?
We push buttons all day long.
Boy, is that not the right attitude.
I mean, something I find interesting is that, well, for one thing, this basically works,
right?
You very rarely hear about a company actually going under due to some very bad bug.
I mean, I think Knight Capital, this happened to them, right?
But it was pretty unusual.
But you very often hear stories about how things almost went horribly wrong.
I'm not going to name names, but I can give a couple examples.
So in one case, this company that has a database,
their replica started claiming it was primary,
and the replica was empty,
so it started basically deleting the entire database.
And their on-call didn't pick up,
and they had no backup on call or anything like that.
And so for like two hours,
the database was just deleting itself.
Luckily, someone noticed after two hours,
and they were able to, I think,
with a week and a half of manual work,
get their database back.
But had they not done this, they wouldn't have had a database.
They wouldn't have known anything about their users or anything.
And they have a competitor
in their space. And had this happened, I think they would have basically
gone bankrupt. And they're worth billions of dollars.
This was sort of a near thing.
Maybe it wasn't as near as it seems because this happens
all the time. They don't actually go bankrupt.
But for most, I would say
80% of startups I can think of, I'm thinking of
unicorns, companies worth a billion dollars or more.
I know of at least one story where they literally almost went bankrupt.
They were a few hours away from just ceasing operations for a week
because they would have just lost their back end and not be able to do anything.
And it's, I don't know, this somehow does get people to go and do this.
But on the other hand, maybe it's correct, right?
Because it's like, other than Knight Capital,
I can't think of any company that went bankrupt,
or software company that actually went bankrupt due to just test
failures or this kind of thing.
Well, that actually goes back to the flip side of software testing, and that is, you can fix it. You can fix it quickly, usually, if you don't shoot yourself in the foot too badly.
But it's hard to quantify the cost, though.
It is.
When you spin a board or when you have to spin a chip,
there is a fixed large value dollar cost.
And when you have to fix a bug, it's a sunk salary cost.
And it's hard to split those.
Or opportunity cost that you spend.
Oh, but those are much harder.
Three weeks cleaning up instead of advancing your product.
Those are much, much harder to quantify.
It's not cash that you give someone else.
A lot of these startups, it's hard to justify to other people
what you're doing if you're not adding features.
I was talking to someone I know at one of these startups.
This is worth tens of billions of dollars.
He mentioned that no one spends
more than about half a day tracking down a bug
because no one has the time to do that.
I think at the time I was at Google,
this is sort of shocking to me, because at Google I can think of multiple
examples where someone spent three to six months tracking down
just one bug. And they just
believe it's the right thing to do. We have this bug, it's causing
users problems, we should fix it. But at this startup,
and again, they're worth tens of billions of dollars.
They've raised over a billion dollars in funding, right? So it's not
like they don't have money, but they just have this attitude, like we have
to add features, right? This is the only thing that is important.
And if it takes more than half a day to track down a bug,
you know, forget it. It's too hard to track down.
We'll just live with it. And
the product is also quite buggy, by the way. I actually,
you know, I use it because it's like, you know, it's not
the only game in town, but it is one of only two products.
And the competitor is equally buggy, right?
So it's not like I can switch to the other one and use something better.
But it's just sort of, I don't know, I feel like it's sort of a matter of attitude, right?
Because a lot of these companies have raised some money that they could be serious about
it, but for whatever reason, they culturally just sort of don't want to be.
It is very much a cultural problem.
I mean, I've worked in places, I've worked on medical, we've both worked on medical,
and I've worked on FAA products, and there are even startups that understand bugs before features.
You have to fix the bugs before you start the features.
But it is so culturally different.
And being transplanted to a place where it's like, oh, features, just go on, go with the features.
We don't really care about the bugs.
We'll fix them later.
I can't.
I can't even live in that environment.
It makes me crazy.
Well, it's not even the case that that's a company culture.
Sometimes it's a team culture.
You have one team that does things conscientiously
and another team that flies around advancing, advancing.
And then you have a third culture or
management that doesn't understand
why you're spending all the time
not making features.
Okay, insert rant here
about bugs versus features.
And let me ask you,
do you advertise your blog, I ask,
because we have a blog that's sort of new and I'm
still trying to figure out how to get
people there.
I don't really advertise my blog.
I mean, when I first started my blog,
I posted to HN when I made a new post.
I don't really do this anymore.
Now people often post to HN.
And it's sort of, I don't know, it actually took a while.
I think it took a year before.
It's Hacker News, right?
Yeah, Hacker News.
It took maybe a year before I got, like, I don't know,
sort of widely known enough that people will just post it other places for me.
I mean, I suppose the one thing I do is I often tweet about a blog post when I have a new blog post. But that's, you know, I don't know. That's basically the only advertising I do,
I think. And what do you read? You've talked about some papers. Do you just go out and read
whatever? Or how do you find books or blogs or papers for your own consumption?
So I used to read a lot of books,
and I think a lot of my knowledge comes from books that I've read in years past.
This is a bad sign because I'm not really reading books anymore,
which means that five years from now,
I'll probably regret not having read books in this time period,
if that makes sense, right?
Because it's sort of this long-term thing.
Papers, I don't know.
There's certain conferences that I'll try to read,
at least skim the proceedings of.
So for computer architecture, like ISCA,
Micro, various conferences, whatever field you're in
there's certain conferences that have good papers
I'm also very lucky in that I'm sort of
I work in an area where a lot of people
read papers, the stuff just falls in my lap, people are like
hey, this paper is really interesting, have you read it?
and I'm like, oh no, let me check that out
and also sometimes now, because I've written enough about papers, people will email me their paper. They'll say, hey, I have this paper under submission to XYZ.
Can you read this and comment on it?
And sometimes I don't have time.
So I say, oh, this looks really interesting.
I don't have time to comment.
But often I do have time.
And that's sort of good.
But I think it's a lot.
I'm just sort of in the right place, such that I'm around a lot of people who read a lot of papers.
And I sort of am able to get access to a bunch of stuff.
Not access.
That's not the right word.
I'm able to get pointers to a lot of stuff that's interesting.
And I wouldn't have time to comb through myself, if that makes sense.
Oh yeah, that happens with the podcast sometimes,
that I get to meet interesting people because others suggest them.
Last week's guest, Sarah, I didn't know her until a listener, Crux,
emailed and said, you should have this person on, and she was a ball.
So yeah, I totally get how building an audience and community actually leads you back to having opportunities and pointers that you might not otherwise see.
So yeah, cool.
Have you gotten any other benefits from the blog?
So at one point, I tried running ads just to see.
I didn't think it would make much money, and it didn't. So I believe a standard figure, like cost per 1,000 views, is like, you know, a dollar.
So this is really not that much, right?
And so, like, if I look at my traffic,
like in terms of only non-adblocked views,
like in a really, really good month, it's like 300,000.
In a month where I don't post anything, it's like 30,000.
And so that's like, you know, 300,000.
That sounds like a large number, right?
But that's $300. So, you know, that's when I, you know,
post relatively, you know, a lot and, you know, whatever, I get lucky and people pass my posts
around or whatever. And so it's just, I don't know, it seems like monetary, directly monetizing
your blog seems difficult. And I haven't really tried to do it very much. I ran an experiment,
I didn't think it was very effective. And so I stopped. But I don't know, I feel like it helps
for job searching, like jobs, you know,
before I wrote a blog,
I had to go find jobs.
And now jobs just sort of come to me, right?
People will be like,
oh, hey, I have this role.
I'm hiring for this role.
Are you interested in this?
And, you know,
the roles are often
pretty interesting.
And even if I don't take the job,
in fact, mostly I haven't
taken the job, right?
Because I've only had
three jobs in my career.
But it's still interesting
to talk to these people
and hear what they're doing.
So I think the biggest benefit
is I get to meet people
and I sort of get to hear
about all these opportunities.
Yes, yes, exactly. We're in that boat too.
It's not necessarily that we get to do these things.
It's just that we get to find out about them and it's just neat.
I like that part. All right. Well, I,
I actually have to run off. It has been wonderful to talk to you.
Chris, do you have any other questions? No, I think we can wrap it up. Dan, do you have any last thoughts you'd like to leave us
with? You know, I have a thought, but it's sort of long. So if you have to run off, I don't want
to ramble for 10 minutes or whatever. So I think I'm okay for last thoughts. Thanks. Go ahead and
ramble. Go ahead and ramble. Okay, so there's something, this is something else I sort of want
to write a blog post about, but my view is sort of too incoherent, I think, to actually write down right now.
But I was just thinking about this because there's this acquaintance I know.
I think he worked at Microsoft for about 10 years in various contracting roles.
And I forwarded his resume around to people who were hiring because the job market right now is extremely hot.
It's sort of unbelievable to me, you know, how much people get paid nowadays.
It doesn't make any sense that it will go back down, but for now it's quite good.
And the response from a lot of companies where I have enough insight or enough connections that I can ask what happened,
they're like, oh, I don't know. This guy works with weird technologies, by which they mean he works with .NET
on Windows and not Linux, where it's like, oh, if he was really any good, would he have contracted?
He probably isn't any good, right? Because he's a contractor, not a full-time employee.
And it's just sort of like, this is very strange to me, right? I haven't worked with him, so I don't know if he's any good or not.
But he's not even getting interviews because
his background is just not
the, I don't know, prestigious background
or whatever, right? And a lot of these companies, they're happy
to hire kids out of MIT, Stanford, who
are very smart, but haven't actually done anything.
And this guy spent 10 years doing all these various things
and people won't even talk to him. And I feel like there's
this path dependence. And this is also true of
my blog, right? I got lucky a few times. People passed my blog
post around. Each time that happens, it increases the odds
I'll get lucky at the next blog post, and more people will pass
my blog around. There's also this path dependence in your
career that seems profoundly unfair.
This person, when they were just out of
school, they took a job, they happened to work on
that stuff. And now they're not
doomed, that's too strong a word, but
it's very hard for them to transition to working
in areas that are more trendy.
Like startups
that do web apps,
like Uber,
Lyft,
I don't know,
Stripe,
these guys.
Most of them
won't talk to this guy.
And it's not that he's,
I don't know,
I just find it weird,
I guess.
And there's also
a lot of research
on this area,
not on this specifically
because this is
sort of too new,
but there's a paper
that came out,
I think,
this past year
talking about
just looking at
the children of people who were drafted versus the children of people who weren't drafted, right?
Because this is sort of a randomized trial, you know, the draft is like a random lottery, right?
And they found that even, like, I think a generation or maybe even two generations out, I'd have to go reread it to make sure, but, like, income is still substantially reduced over the other case.
And there's also research on, you know, if you graduate into a recession, right, it takes about a decade to sort of normalize salary, right?
Just because, like, jobs aren't available, because jobs aren't available, you can't get an experience.
And when a recession sort of goes away, people would much rather hire this person out of school than this person with bad experience.
Because look at this person with bad experience.
They're like, well, if they're really any good, wouldn't they have better experience?
And it's, I don't know.
I feel like, well, there's two things.
One, it just feels wrong, and I don't like it.
But two, it feels like companies are missing out on a huge opportunity here.
There's a bunch of people, they're desperately competing for the same people, bidding them a lot.
To the point where we're talking about $300,000 to $1 million in compensation, right?
And there's a huge group of people who are probably just as good, and they just completely ignore them, right?
I sort of don't know why this is. It seems really weird.
So sorry, that was pretty long.
That was great. And actually, it ties back to diversity of thought. So many times, I see companies wanting to hire new college grads so they can train them up, forgetting that people with experience bring other things to the table beyond that education, beyond the pair of hands that can type. They have more experience in different areas. And I, yeah, cool. I look forward to that post.
Thanks.
Thank you for being with us, Dan.
Yeah, thanks for having me on. Thanks for your time. This was a lot of fun.
Good. My guest has been Dan Luu, hardware/software engineer at Microsoft. And you can find his blog
at danluu.com. That's D-A-N-L-U-U dot com.
There will be a link in the show notes, of course.
Thank you also to Christopher White
for producing and co-hosting.
Thank you for listening.
Please check out our blog and our newsletter.
You can find it on embedded.fm
along with a contact link in case you'd like to say hello.
Dan also has a contact link or you can send us joint emails, whatever. It'll work out. I'm sure
it'll be fine. That's enough for this week, I think. And next week it will just be Christopher
and myself. So if you have an email question you've been waiting to send us now is the time, and a final thought to tide you over from Mr. Penumbra's 24-hour bookstore,
which I quite liked as a book.
Nothing lasts long.
We all come to life and gather allies and build empires and die all in a single moment,
maybe a single pulse of some giant processor somewhere.
Embedded FM is an independently produced radio show that focuses on the many aspects of
engineering. It is a production of Logical Elegance, an embedded software consulting
company in California. If there are advertisements in the show, we did not put them there and do not
receive any revenue from them. At this time, our sole sponsor remains Logical Elegance.