Programming Throwdown - R
Episode Date: September 10, 2015
This show covers R: a language suitable for data mining and machine learning.
Book of the Show
Jason: The Hard Thing About Hard Things http://amzn.to/1UqMjDD
Patrick: Steel World http://amzn....to/1JMcsa5
★ Support this podcast on Patreon ★
Transcript
Programming Throwdown, episode 46. Take it away, Jason.
Awesome. So I'm gonna have a little rant
time here about tabs versus spaces.
I don't know if you've had this rant at your work,
but the Go spec, like the Go...
What do you call it?
Golang.
No, no, I know.
The Go style guide talks about using tabs,
and you actually have to do some trickery to make it support spaces.
And I hate tabs.
I mean, here's the thing.
I use tabs.
Well, I like the tab button because usually it does not insert a tab.
Usually it does something much more useful,
like take you to the beginning of a line or something like that.
So the tab key can stay.
But the tab character,
like for me, it's only good for one thing.
And that's this: I started replacing CSVs,
like comma-separated value files, with TSVs.
And the idea is like, you could have a CSV and some of those columns could have commas in them.
I mean, it could be, you know, last name, comma, first name
as one of the columns, right?
And so now you have to escape it out
and do all sorts of craziness, right?
But nobody should have a tab.
The tab character should not exist, in my opinion.
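The trade-off Jason describes can be sketched in Python with only the standard library: the `csv` module has to quote a field such as "Doe, Jane", while a tab-separated line can be split with a plain `split` once any tabs inside the data have been replaced with spaces. The helper names `to_csv_line`, `to_tsv_line`, and `from_tsv_line` are hypothetical, for illustration only.

```python
import csv
import io


def to_csv_line(fields):
    """Write one CSV row; the csv module quotes any field containing a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def to_tsv_line(fields):
    """Hypothetical helper: replace tabs inside each field with spaces, then join with tabs."""
    return "\t".join(field.replace("\t", " ") for field in fields) + "\n"


def from_tsv_line(line):
    """Because no field can contain a tab any more, a plain split recovers the columns."""
    return line.rstrip("\n").split("\t")


row = ["Doe, Jane", "engineer", "likes\ttabs"]
print(to_csv_line(row), end="")  # the "Doe, Jane" field comes out wrapped in quotes
print(from_tsv_line(to_tsv_line(row)))
```

The catch is the one Jason accepts: any tab that was in the data is permanently turned into a space.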
And so it's easy for me to just kill all the tabs,
like replace tabs with spaces in all of my data,
and then use a tab separated file
and just know that there's not going to be any weirdness, because the actual data isn't going
to have tabs in it. So for that, it's good. The tab character has, like, one purpose, and it's that.
But when I see source code with tabs in it, I don't know, it just drives me up the wall, especially
open source projects. I just don't know why they do that.
It just drives me crazy. So that's my tabs versus spaces rant: use spaces, please, for the love of God.
Should we all also stay at 80-character line widths? I don't have... do you have an opinion on that?
Yes, 80 sucks. It's way too short.
Yeah, I know. So I actually do prefer spaces over tabs, but I've never had a problem where I really cared that badly about it. But I hate 80-character line limits. It's so bad.
Yeah, I've never actually had 80. Let me think about that. So the other ones I've seen are 100 characters.
Yeah, that's what I've had. 100 is much more reasonable.
Right.
Probably like I've never seen it, but I can imagine like 120 being better.
And you don't want a complete run-on line.
Like that's true.
But the only reason for it is that a few people I know run, like, 80-character-wide terminal windows,
and they run like a bunch of them.
And then they can like, you know, all their source code they can have side by side.
That's just not how I work.
So to me, 80 characters is just really short, especially if you get too many indentation levels, which isn't even that
many, like three or four, right? Like, you're in a class, or, you know, in a function, you're in a... you
know, three or four levels down is easy, and then if you're using two spaces per indent, that's already
eight spaces gone, right?
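The budget being described is simple subtraction: indentation is spent before any code is typed. A trivial Python sketch; `usable_width` is a hypothetical helper, and continuation-line indents or long identifiers would eat further into whatever is left.

```python
def usable_width(line_limit: int, depth: int, spaces_per_indent: int) -> int:
    """Columns left for code after `depth` levels of indentation."""
    return line_limit - depth * spaces_per_indent


# Four levels at two spaces per level already uses 8 columns.
for limit in (80, 100, 120):
    print(f"{limit}-column limit: {usable_width(limit, 4, 2)} columns left for code")
```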
Yep. Another thing that I can't stand, but I don't really know how to fix, is
Eclipse. I mean, Eclipse does four spaces for an indentation level. You can
go into the Eclipse settings and change it to two, but then as soon as you reinstall Eclipse or
change machines or something, now you're
back at four. So I'm not really sure. Hopefully I can figure out some way to
tell Eclipse to go look at Dropbox for my style, or something like that.
You know, that's what I do with Emacs. Yeah, I use two for an indentation level, but then for an incomplete line it's four.
Oh, really? Oh, that's interesting.
Yeah, so if you have a line which you didn't complete, like it wraps around, when you wrap around, you indent it four on the next line.
Oh, I see. That's cool. It's a good idea. So actually, this is a pro tip, everyone: if you use Emacs or vi, put your .emacs.d, or whatever the vi equivalent is, in Dropbox.
I mean, that's been a total lifesaver for me.
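The indentation convention described above (two spaces per indentation level, four extra on a wrapped continuation line) can be roughly approximated with Python's standard `textwrap` module. This is only a sketch: `textwrap` breaks purely at whitespace, whereas an editor's continuation indent is syntax-aware, and `wrap_statement` is a hypothetical helper.

```python
import textwrap


def wrap_statement(text: str, depth: int, width: int = 100) -> list[str]:
    """Wrap one long statement: 2 spaces per indentation level,
    plus 4 extra spaces on every continuation line."""
    indent = " " * (2 * depth)
    return textwrap.wrap(
        text,
        width=width,                           # width includes the indentation
        initial_indent=indent,                 # first line: normal indentation
        subsequent_indent=indent + " " * 4,    # wrapped lines: 4 more spaces
    )


stmt = "result = compute_total(first_argument, second_argument, third_argument, fourth_argument)"
for line in wrap_statement(stmt, depth=2, width=40):
    print(line)
```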
I mean, I thought you were going to say: if you use Emacs or vi, move forward a decade or two.
Oh, come on.
I love Emacs.
Emacs has support for everything.
Like, the other day I needed to do Scala, and Emacs, it just worked.
Like, I spent about five seconds, five minutes, you know,
finding the Scala package and installing it, and then, boom,
I had, like, beautiful, like, you know, syntax highlighting
and all that stuff.
I mean, like, there's no editor that has that kind of support.
Maybe Sublime Text is getting there, but that's it.
Yeah, I mean, it's nice.
Yeah, the problem is, for certain parts of development,
you just shift it around.
So if I have code in C or in C++
and I bring it up in Eclipse or an IDE,
you can typically follow it,
use go to definition or go to declaration,
and see that.
And yes, I know people do this in vi or Emacs with, what is it, ctags or whatever,
but I've seen very few people who are on top of it and doing it well enough that
they're actually as efficient at getting into a new piece of code and understanding it as I can
be by using an IDE and actually going to the definition.
Yeah, that's a good point. That's a good point. So when I'm browsing around code that
I'm not familiar with, it's just so useful to be able to do that, and it just isn't the same.
When people use Emacs or vi, they typically end up opening a new window, grepping for the thing,
and then they may or may not have success.
I think you're right. As I said before, if I
could get the Eclipse configuration to somehow sit on
Dropbox and synchronize between my laptop and my desktop, and if I could find support for all the
languages... like, Eclipse would have to be able to do Python and Scala. I think it can do all of those.
But you're right.
If I could get Eclipse set up correctly, the go-to stuff is totally awesome.
But Eclipse can also do Emacs and vi emulation, right?
So in theory, if you could get the syntax highlighting right, you could still use all
the macros and key commands and stuff just like Emacs.
So at that point, someone would have to tell me
what the downside would be.
You're right. You're right. If you could get support for, like,
.proto files, if you could get syntax highlighting for that and, you know, Python, R, all the languages,
if Eclipse could be as fully featured as Emacs, then you should use Eclipse. I agree with that, because you could just
turn on the Emacs keys, right?
Yes, but then the argument is that it's so resource hungry.
Yeah, it's true, that was a problem before, but now, I mean, the laptop I'm on isn't, like,
top of the line, and it still has 16 gigs of RAM.
Yeah, there will never be one answer for everyone.
That's true, but Jason
tells you to use spaces, and deal with it.
Oh yeah. There's never one IDE answer, but there is
one tabs versus spaces answer, and it's spaces. And even if you use Eclipse, you should still know
how to use vi or Emacs, one of them, right? Like, I still know how to use vi, so when I'm on another person's
computer, or doing something on a computer that's not configured, you still need to know how to do it and be reasonably efficient.
Yep. A lot of people aren't, and that's a problem. They just use Eclipse because it's
the fastest way to get started, I guess, and then they become debilitated when it's not working, or
when they're using a new computer or whatever.
Yep. That's not good either.
Yeah, definitely. You should learn. I mean, Emacs is...
so technically vi gives you complete coverage. I have very seldom been
on machines that didn't have Emacs, but yeah, you should definitely learn one of the
two. And you should change your Eclipse to use spaces instead of tabs, please.
And make it dark, dark themed.
Oh yeah, everything should be dark themed.
I don't know why people don't do this, but I have my Mac OS,
OS X, set up to be dark themed.
All of my browsers are dark themed.
Eclipse is dark themed.
Everything should be dark themed.
Your eyes will thank you.
Wow.
I don't have everything dark-themed,
but all my coding stuff is dark-themed.
Honestly, like, if I could find a way
to invert everything but the graphics
on every website, you know,
some kind of Eclipse plugin,
I would probably do it.
That's pretty goth.
That's pretty goth.
All right.
I think it's time for the news.
News.
So the first news article is
Hack, a typeface designed for source code.
Wow, it's almost like we planned that lead-in.
What's that?
Talking about IDEs.
Oh, I know.
Yeah, we totally set that up.
We've been doing this for too long.
But, yeah, so Hack is pretty cool.
I've looked into it.
Basically, I used to use DejaVu for Emacs.
Hack is based on DejaVu, which is comforting for me.
But it has a few changes, which I really like.
One is that for the letters that are very slim, such as i and l,
they purposely give them an exaggerated serif
that hangs off to the right. And so it makes those letters,
you know, stand out a bit more.
It's monospace, of course,
because it's designed for source code.
And so when you have monospace,
I and L kind of tend to look kind of weird.
So it fixes that.
Also, I mean, just some subtle things.
The O and the zero, the zero has a dot in the middle.
So you can tell whether it's
O or zero. So it's got some things like that, which I think are pretty cool. I ended up
actually sticking with DejaVu, not because I didn't like Hack, but because I couldn't
figure out a way, without a lot of work, to install it on every machine. To install a font,
there's not some command-line trick you can do; you actually have to open up the font
and go and install it. So I'm still trying to figure out if there's a way to do that.
But overall, if it were as easy to get as DejaVu, I would be using it.
I'm not gonna lie, I don't even know what I use. You're, like, a font nerd. I just use whatever.
Yeah, Eclipse's default is actually pretty good. I'm looking at it, and I can't tell what's
different other than, like you said, the dot in the zero.
The i. If you look at the i in DejaVu and in Hack.
Yeah, but without looking at it, I can't tell what mine is, so
I'd have to go look at whatever font I have by default.
If your i is symmetric, like symmetric across the y-axis, then it's DejaVu.
Okay. Yeah, I am not a... what do they call someone who's a font nerd? It has a name, but I forget.
Really? Maybe not. I'll look it up.
All right. Speaking of aesthetics, we have another article: the Kalman filter in pictures.
So not using Kalman filtering for pictures, but describing how a Kalman filter works using pictures.
The first time I saw an explanation of the Kalman filter that was sort of like this was, I believe, from... what's the guy who does the Google self-driving car stuff and has the online course? Sebastian
Thrun.
Yes, that one. I think he has a video presentation about Kalman filtering, particle
filters, and Bayesian analysis that he walks through, and it was really good. That was the
first time I'd seen it presented in a similar way to this article, but this article also does a very
good job. It basically talks about how to think about it in the
Bayesian way, like what happens as you take measurements and update them, and it graphically
shows the distributions in a way that helps you
more intuitively understand what the Kalman filter is doing. I don't know that it gives you enough
that at the end of it you'd be able to go implement it, but you understand where
it's going even if you don't read the actual formulas. The formulas in here look
foreign to me, and I don't exactly know what they're saying. I'm sure if I sat down I could probably work through most of it, because I have actually
implemented a Kalman filter before, but I've forgotten it all since then. The pictures, though,
do help tell a story and jog my memory enough, and I thought it was a good representation of
why you would use a Kalman filter: what it does for you, how it's helpful, and that it's not just magic.
Cool. Yeah, this looks
awesome. I actually read half of this article a few days ago, and I still have to read the
rest of it, but it's awesome. I also highly recommend it. So, the Kalman filter in pictures.
Yeah, definitely check that out.
You know, we should say what a Kalman filter is.
Oh. So a
Kalman filter is a way... there are
variations of it, but in the simple form, if you're taking measurements over time, and the measurements
have noise in them, and you're trying to say how believable the measurement is at a given point in
time...
Yep, pretty much. It's a recursive Bayesian filter.
So the cool thing about it is because of the assumptions
a Kalman filter makes, you only need to know the current state
to infer the next state.
So it's not like you have to store the entire history in memory.
And so that part is pretty cool.
It's very memory efficient.
Yep, and then you predict what the new state should be. Then you take your measurement
and then you basically adjust where to predict next based on that.
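A minimal Python sketch of the predict/update loop just described, for a single scalar state that is assumed to stay roughly constant, so the prediction simply carries the estimate forward and adds process noise `q`. The class name, the noise values, and the sample measurements are made up for illustration. Note that the filter keeps only the current estimate `x` and its variance `p`, which is the memory-efficiency point above.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Scalar Kalman filter for a value assumed to be roughly constant.

    x: current estimate, p: variance of that estimate,
    q: process noise added at each prediction step (an assumed tuning value).
    """
    x: float
    p: float
    q: float

    def predict(self) -> None:
        # Constant-state model: the estimate carries over, uncertainty grows by q.
        self.p += self.q

    def update(self, z: float, r: float) -> None:
        # Blend in measurement z, whose noise variance is r.
        k = self.p / (self.p + r)     # Kalman gain: how much to trust z
        self.x += k * (z - self.x)    # move the estimate toward the measurement
        self.p *= (1 - k)             # the estimate is now less uncertain


kf = Kalman1D(x=0.0, p=1000.0, q=0.01)  # large p: the initial guess is barely trusted
for z in [10.2, 9.8, 10.1, 9.9, 10.3]:
    kf.predict()
    kf.update(z, r=0.5)
print(round(kf.x, 2), round(kf.p, 3))
```

To combine two sensors with different noise levels, one common approach is to call `update` once per sensor at each step, passing each sensor's own measurement variance `r`; the noisier sensor then gets a smaller gain.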
Yep. Although people do all sorts of tricks, like take the last four states and make those
states in the Kalman filter. So, like, yeah.
Oh, I've never seen that.
There's also ways of doing it where you have multiple measurements.
Oh, what is that called?
Consensus, right?
Is that right?
Oh, you take kind of voting?
Similar.
Well, you have things that are measuring stuff similarly, but differently.
Like if you have an inertial measurement unit, you have like two of them and they'll give you slightly different answers.
And you put them both into the Kalman filter in hopes of getting a better answer out,
if the way that they have error is different from each other, right?
That's right, yep.
So that was a bad example, but, oh, like dead reckoning on a car. If you have a car, you can count the
rotations of the wheel, but the wheel slips a lot, right? So how many times the wheel went around
isn't a great measurement on its own,
but if you combine it with GPS, the strengths and weaknesses of the two complement
each other very well, and you end up with a better estimate.
Exactly. Yep, there we go. That's a good example.
Yeah, cool. Very cool. So this next story is an honest guide to San Francisco startup life.
Honestly?
Honestly.
So this, here's the thing.
This article, I don't know if it's true or not.
I've never worked at a startup.
Wait, it says honest right in the title.
No, it's on the internet, but it's got to be true.
Yeah, I mean, he really shouldn't have put honest.
But, I mean, you know, it's hilarious is what it is. The guy is absolutely hilarious.
He has a way of writing, like a sense of sarcasm that I really appreciate.
Like, I'll just read an excerpt from it.
He's talking about getting to work.
He works in the city.
He doesn't live in SF, because he works at a startup, and he goes on about how he
can't afford to live in SF and everything. But he says, you know, driving to SF is like a theme
park ride, blah blah blah, but then he goes: those with a death wish ride a bicycle to work. It's
easy to spot a cyclist. If you see a guy with one side of his jeans rolled up to the shin,
he's a moron. If you see a guy on a bicycle, he's a cyclist.
You've got to read this.
It's hilarious.
A lot of it is true.
Okay, so this is a tongue-in-cheek article.
It's very tongue-in-cheek.
Very, very funny.
So he works in South of Market, but he says "south of market" is
what the returns at all the startups should be expected to be.
Yeah, right. Worse than market. Oh, that's terrible.
It's so funny, so definitely check it out. I mean, it does give you insights into working at a
startup in SF, or just being in the Bay Area in general, if you're interested. So there are
academic aspects of this article, but as Patrick said, it's just tongue-in-cheek. It's very funny.
I'm going to use this one tomorrow at work: "Bay Area companies have obvious names. Evernote makes note-taking apps,
Optimizely lets you optimize your websites, Google lets you google anything."
Oh, that's so good. That's amazing.
Okay, maybe this is not funny to any of our listeners who
don't live in this area, which is all of them.
Yeah, I mean, it goes into everything, like, you know, bringing
your dog to work, having an exercise ball as a chair. I mean, does all of that
happen in a lot of places that aren't, like, Seattle
or San Francisco or San Jose? I'm pretty sure these phenomena only happen... I mean, maybe in Austin,
I don't know, but obviously the density of this kind
of craziness is much higher in SF.
Okay. So that's the quote-unquote honest guide.
Yeah.
So the next news article I have is not news, but a contest,
the Underhanded C Contest.
And I thought I had read something recently about this,
and then I was looking on their website,
and I saw they have a new challenge.
I must have been reading results of a previous challenge.
But I thought this was really clever. So you may have heard of the Obfuscated C Contest.
Oh, I remember that.
Yep. It's like the worst thing ever. It's basically
people doing all the things in coding that you should never do, and trying to write
one-liners.
Did you read that article about the top 20 things you can do right before you quit a job you hate?
Oh, I did see this.
Yes, this is terrible.
I don't want to talk about any of them, because I feel like anyone who does that is a terrible person.
One of them was #define-ing break to a blank space.
And what that means is there are no compiler errors, but all of your breaks just
disappear. All of the cases in your switch statements just fall through to the next
one. If you ever have a while(true), it just never ends. But there's no compiler
error. They're all like that. There was one like redefining true to be false or something.
Oh, no, I think it was redefining true to be a random number generator.
So good.
No, it's so bad. Don't do that. Don't do that. Don't ever do that.
So this is underhanded in a similar way
to those things, I guess. Basically, they present a scenario that's supposed to
be a real-world challenge, and it has a simple
solution. It's not a programming challenge to figure out how to do it; it should be
really straightforward to actually accomplish the scenario they're putting forth.
But then your goal is to try to put flaws into the code that are very difficult to detect,
and you have to send
your source code, right? So you post your source code, and you could do things like
what Jason is alluding to: maybe you #define a macro that does
something unusual but is hard to detect. In any case, you're
trying to leak information that you shouldn't be. So in the one they have
now, I think you're a nuclear arms inspector, or you're working to sabotage nuclear arms inspectors
by allowing certain countries to influence the results of the nuclear tests. And
your code will get inspected, so your bugs have to slip through the inspection.
So you
have to do things like maybe subvert how the date function works, or have a clever integer
underflow that, when it happens, deliberately does something else. Anyway, you can read through
previous results. There are several years of previous results, and I think it's a professor; he or she posts up
analysis of the best ones. Short and simple ones tend to win, because they seem less
sinister and people will just glance over them, like, oh, it's so short, how could
there possibly be anything evil going on here? Versus if you write mountains and mountains of
unreadable text, people are gonna be suspicious. So check it out. It's an interesting approach to thinking about how
to break something, which hopefully helps you not to do it yourself.
This is amazing. Yeah, definitely.
I mean, I feel like I could spend a whole day just looking at this website. This is so good. Oh, this is so good.
Okay.
Okay.
Yep.
So this is amazing.
So we've talked about looking at fonts and describing them to you visually.
Then we talked about a web tutorial that was picture oriented.
So now Jason is just simply describing to us his feeling about looking at underhanded C code.
I think maybe we should go back to being on Twitch.
I stopped doing the Twitch thing because no one was really on it.
Well, some people were.
But yeah, it's just hard, right? People aren't all in this
time zone.
Yeah, exactly. But this is the kind of thing where, if I had been streaming my laptop
display, the people on Twitch would have actually seen what we were talking about.
Anyway, you guys, we're gonna post all the links to the blog.
So the one I was thinking about
that's in this what-to-do-before-you-leave-your-company list, yeah, it was #define-ing if
as a macro, so that instead of just checking the thing inside of the if,
it ANDs that with a random number check that evaluates to true 99% of the time.
So 99% of the time, it'll just be whatever you have inside of the parentheses.
In other words, it'll operate exactly as you intend it.
But then 1% of the time, it's just going to return false for no apparent reason.
And so then when you debug it, depending on how you're debugging it,
you're likely to just be like, nope, it's right, it's right, why is it not working? And then, why is it
wrong?
This is so amazing. The only thing about this file is... oh, I guess you're supposed to use a piece
of this file, because there are some duplicated things here.
Yeah, you need to hide some of it.
And you need to scroll it away, or it'd be too obvious, right?
If you just put all of these in one place.
I'm actually, I'm going to take some of this file
and put it in our logging.h at the top.
And I'm going to submit the code review just for fun.
This is actually our week of testing.
We're supposed to be testing things
so this would be
So when I do GDB debugging and you've #defined
if to something else... I'm trying to think.
If you're doing it just line by line,
I don't think you would ever be able
to tell, even if you were line-by-line
GDB debugging something.
But I tend to be
really paranoid, and I typically drop to assembly
very quickly.
Oh, really?
And then I would see it, yeah.
I've actually never done that, ever.
Really?
Yeah, I don't even know how to do that.
I spent a non-trivial amount of time last week writing custom assembly.
Wow, yeah, I have no idea how to do that.
Does that really help?
I mean, I feel like GDB is as low as you ever need to get.
No, it was really good.
Wow.
Okay, good to know.
The thing I was looking for is that I really wanted,
regardless of what optimization level you're at, something that had very specific
performance. I knew I could do it in two instructions taking three cycles, and the compiler
was only getting down to like five instructions, seven or eight cycles. And what I was worried about
is that someone would later find better optimizations
and would change the timing inadvertently,
and I needed it to have very specific timing.
Wow, that's amazing.
Cool, so time for book of the show.
Book of the show.
My book of the show is entrepreneurial.
It's The Hard Thing About Hard Things.
So what is the hard thing?
The hard thing about hard things is that you don't know the answer, and
you don't know if you're going to succeed or not.
I thought you were going to say that you don't
know that they're hard. That would be deep.
Well, I mean, I agree with you, but according to the book, you definitely know
that you're in hard times; you just don't really know what the right answer is. And so this
is from Ben Horowitz. If you've ever heard of Andreessen...
That's right.
If you've ever heard
of Andreessen Horowitz: Marc Andreessen and Ben Horowitz both founded Netscape,
I believe, or were, you know,
founding engineers at Netscape. I could be wrong about that, but Ben Horowitz was definitely a
founding engineer at Netscape. And it's all about, actually, Ben Horowitz. It's a lot about Ben
Horowitz's life, which was pretty interesting. I mean, at one point he was doing a startup,
he was feeling really good about the startup, and then he had some family trouble.
He had a daughter who was born with severe autism, and he needed medical benefits, things like that.
So he was forced to quit the startup that he founded and get, quote unquote, a corporate job.
And then he worked there for a while, went back to startups, and it's all about
the crazy volatility. His startup was completely falling apart, and so he had to sell it.
And it's this craziness where he's got talented people, so it's kind of worth money, but it's really
kind of an acqui-hire, because he doesn't have a good product, but people don't know he has a bad product yet. And it's all about how he pulls that off and sells the business for
a ton of money. It's just amazing. I mean, it talks a lot about, even...
this one guy... it talks a lot about B2B, which, for people who don't know, is business-to-business.
So most jobs are business-to-business. Compare that to Facebook, which is business-to-consumer:
there are all these people who come to Facebook. Actually, is Facebook even B2...? I think
Facebook is actually business-to-business, because advertisers come to Facebook and then Facebook
displays their...
That's not what they
mean when they're talking about it.
Well, yeah, it's kind of on the business-to-business side. Yeah, the point is,
a lot of these B2B companies have very few clients. This
would be more like Square, right? Like, Square is squarely... oh, I'm so sorry, I didn't mean to do that.
Square is very business-to-business oriented, right?
I mean, individuals are using it,
but they're really out there to help small businesses.
Exactly.
They're a business selling stuff to small business.
In this case, it's service.
So this guy's startup, they literally had one customer
that was 90% of the revenue.
And this one customer was complaining to them
because he wanted to use this one piece of
software, but he couldn't get permission to use it. He only had permission to use their software, and
they were orthogonal; it's like they were competing. But he wanted to use this other piece of software,
and he was just really mad. In general, they were saying he hated his life; he told them
he was depressed and everything, and yet he was in control of their whole business. So what they did was they
bought this company only so that this one client would get access to this company's software without
needing approval. So imagine that. I mean, you buy a company just to make this one guy happy,
and then because that guy was happy, he funded them for another year, and then they got
bought or something. It was just crazy things that you never thought would happen
at the highest levels of business, and this book talks about them. It's very interesting.
All right, mine is a work of science fiction. Steel...
Oh, not... the book itself is real, but it's a science fiction
book. It's called Steel World, by B.V. Larson, and it's about... I don't really want
to spoil it, but it's set in the near future, and Earth is only one of many planets that have people on them, and the guy is basically becoming a
mercenary to fight battles for other planets, and it follows him going through that. The
storyline actually sounded kind of cheesy. I got the book during an Audible sale, and
I listen to most of my books, all of my books, I'm not going to lie,
because I have a terrible commute.
And so I was listening to it, and I was like,
I got it on sale, I'll try it out.
And I was kind of like, meh.
But then I really got into it, and I was really digging it by the end.
So I really recommend it.
Cool.
Well, you gave me good advice with Ready Player One,
so I'll take your word on it.
It has some interesting concepts in it.
It's definitely not hard sci-fi, or at least...
Oh, good.
...not by my standards.
That's good. Yeah, for me, I'm currently reading
one that is very hard sci-fi, and I'm finding it kind of boring, so I'll probably have it
as my recommended book of the show next week.
But that's not a good endorsement.
I'm sorry, I was
gonna say, hopefully it has a twist at the ending or something. Maybe I'll finish another book in between. But definitely, Steel World was good. It's actually part of a series, the Undying
Mercenaries series, which gives a little bit of a clue, but I won't tell you what it was, because it
was a surprise to me and I enjoyed finding out what the surprise was once I heard it.
Cool, cool, cool. And then, as we've talked about previously, if you would like to get... that sounds so shilly.
Oh, well.
If you would like to get a free trial to Audible,
they're offering a one-month free subscription
where you get one free book
and then you get to keep it when your trial's over.
Just cancel it before the end
and there's no charge to you.
And you can do that and support the show
by going to www.audibletrial.com
slash programmingthrowdown, all lowercase, all one word.
That helps support the show.
We've had a number of people do Audible trials through that,
and that's really helped us out.
Thanks, guys.
Hopefully, you enjoyed whatever books you picked.
Write in and tell us what books you picked
or give your own book of the show recommendations
because we read a lot.
If you get the hard thing about hard things
or Steel World and you like them or you hate them,
let us know.
Well, if you hate them, don't tell us, please.
Because then I'll just feel guilty
that I made you get something you didn't like.
Well, maybe if you hated it compared to another book
that's actually way better that we didn't know about.
Ah, there we go.
Constructive criticism.
That's right.
All right, time for Tool of the Show.
Tool of the Show. so my tool of the show
is electron which is made by github it used to be called adam shell but it's pretty cool so
basically it's a node.js library and um the way it works is so a simple way of saying it is you can take a website and turn it into a desktop app, which is pretty cool.
We definitely, we have some internal websites at work that would be kind of just much better if they were desktop apps because, you know, you want them to live on the dock.
It's kind of weird if they're just a frame, you know, because they kind of do so much.
And so what this thing does is you can actually even just take the example.
If you're not like a Node.js expert or anything, it's fine.
You can just take their example code.
But it has this sort of app...
Like, it's kind of hard to explain, but it builds OS X, Windows, and Linux desktop apps, and in the app is a folder. In the case of OS X, apps are folders, so you just put it in the app folder. In the case of Linux and Windows, they do something kind of goofy, but it works; I think it's some kind of executable zip or something. But basically, you have this Node.js code,
which could be as simple as just one line
that says open URL www.google.com.
And now you have a google.com app.
But typically you would have it point to some internal app
for some internal site.
Or you can make a Facebook app.
It's kind of cool.
I'd imagine for things like Google and Facebook there are probably professional versions. People have probably done all sorts of cool stuff that you can leverage. But if you have, you know, a site of your own, or if you have your company wiki and you want to make a desktop app for that, you could do that. So just think of it as kind of like a desktop wrapper around your site.
So it just opens its own browser, but the browser is just your app?
Exactly. It uses Chromium under the hood. But yeah, basically you give it the title of your app
and you give it an app icon, and so, you know, it just launches that app,
but the window is just a browser window.
And it's pretty cool.
Definitely check it out.
All right.
Yeah, I'm looking for like screenshots,
trying to see what it looks like, but I can't find any.
Basically, it looks like you opened a browser.
You can choose-
But there's no back and forward?
That's right.
There's none of that.
No address bar?
You can choose, actually, whether you want the address bar or not.
So it's just a flag when you write the Node.js code.
So, yeah, if you don't have any address bar,
then it's literally like it feels a lot like an app.
But then also if you don't have an address bar and you don't have back and forward,
then you as the web developer have to make sure
that you allow people to navigate to where they need to go.
My tool of the show is not a tool, but a game.
Surprise!
And this is Does Not Commute, which is an awesome name for a game. This game is free to download on iOS. I don't know if there's an Android port; I guess I should have looked that up. Oh, well.
I will look. I actually don't know what the business model is. There is an Android app.
There is? Oh, awesome.
You should check it out. Apparently it's telling me "get premium checkpoints" if I buy, but I'm just saying "not now." And what it is, is it shows you a little top-down view, like... what was that? The original Grand Theft Auto was like that, right?
Yeah, that's right.
It's like the top-down view of the car, and you basically steer your car left or right through a neighborhood, trying to get it to a destination. Except that once you get there, you immediately start in a new location with a new car, and you need to drive it to a destination. Except that time is running in parallel with the previous car you drove, which is now also driving on the street you're trying to drive on. So it's following the path you chose to take, and you're driving through it. And then you keep doing this over and over again, up to a certain number of cars or whatever, but your street is getting more and more busy as all your previous runs are stacking on top of each other. And if you collide with yourself, now there's an accident, and you need to drive around the accident, and this kind of thing. So it's a really fun concept.
I just verified Does Not Commute.
I looked at the in-app purchases.
It only has one in-app purchase, and that is Does Not Commute Premium.
So my guess is at some point you'll have to pay $2, which is really not bad.
But it's not pay-to-win.
Oh, okay.
Yeah, it's not like...
Jason was concerned about that.
Yeah, the thing that scares me on free apps is if I go to the in-app purchases and it's like 100 rupees, or 100 of some weird currency, for 10 bucks. Then I know to stay away. I kind of knew it wouldn't be like that. But anyways, for me it's just like, I didn't need to keep playing it. It was just a really clever concept, and I liked playing it for a few minutes. I don't know that I would play it until the end, through all the levels.
Just check it out for a really cool, clever, original concept,
which doesn't feel like something that comes along that often.
Yeah, definitely.
It looks great.
Cool, cool.
So our language for today's show.
Our language is R.
Arr, matey.
So R actually came from S, and we'll talk all about that.
Wait, but R is before S.
R.
Oh, R is before S.
Oh, man, that totally messes everything up.
So this is a suggested...
By "messes everything up," you mean it doesn't matter. But yes, continue.
This is a suggested language. John Williams, thanks for writing in. He suggested we do R back in May, so we're a little late.
Better late than never.
Yeah, well, you know, we definitely keep track of everything, so we got to you eventually. And also Alfredo Galagos wrote in, and he was kind of wondering, what is data science?
What is data analytics? And what is R? So we had two people kind of both interested.
And it's something we know a lot about. So this is a good topic for us to cover.
So I guess... so what is data science? Yeah, before we get into R, let's kind of talk about data science and data analytics.
You hear a lot of different things about this.
Like Alana Mitt, who's a pretty famous mathematician who worked at Google for a while.
He says a data scientist is a statistician who works in San Francisco.
That is truth.
Yeah, there's definitely an element of truth to that.
But basically, data science is maybe easiest to talk about through some kind of example, right?
Let's say you are trying to sell tickets to a baseball game, right? So you want to figure
out a good price for tickets.
So you have a bunch of data that has,
you know, let's say you have all of StubHub's data.
So you know on the secondary market
how much people charged for tickets,
how much people paid,
how many of those tickets that were on sale
were purchased, things like that.
And you want to build kind of some model.
Like you want to understand this data
so that you could make your own.
You could do price gouging on StubHub
or arbitrage on StubHub.
So you might say,
well, let's just take the name of the team that's playing.
Let's just say you just focus on one stadium.
So the home team is always the same.
Let's just take the away team and take the average of the prices of the tickets.
And so for the Cardinals, it's $17.
And so I'm going to charge $17.
Well, that probably won't work, right?
Because there's different levels.
If you charge $17 for front row seats, then you're giving somebody a great deal, but you haven't accurately described the price of those tickets.
So you go back and you say, okay, I'm going to take into account the different levels of seats.
And now for each category of seats, I'll have a different price structure.
And now you're a little bit better off. And then you find out, oh, when they played on Saturday, the tickets were much more expensive,
but when they played Tuesday afternoon, the tickets were much cheaper, because of accessibility:
people work on Tuesdays.
So from looking at the data, you learn that, and then you decide, I need to add day of the week, or at
least I need to add weekday yes or no to my data. And so I need to split the model again,
and now just look at, you know, weekday games with the Cardinals and see if that data is more regular.
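To make the slicing idea concrete, here is a minimal sketch in Python using only the standard library. We use Python because the hosts compare it to R later in the episode. The `Sale` rows, field names, and prices are invented for illustration; a group-by on an R data frame does the same job.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Sale(NamedTuple):
    away_team: str
    seat_level: str  # e.g. "upper" or "field"
    weekday: bool    # True for Monday-Friday games
    price: float     # USD


# Hypothetical resale data for one home stadium.
SALES = [
    Sale("Cardinals", "upper", True, 12.0),
    Sale("Cardinals", "upper", False, 20.0),
    Sale("Cardinals", "field", True, 60.0),
    Sale("Cardinals", "field", False, 90.0),
    Sale("Cardinals", "upper", True, 14.0),
]


def average_price(sales, key):
    """Group sales by key(sale) and return the mean price of each group."""
    groups = defaultdict(list)
    for sale in sales:
        groups[key(sale)].append(sale.price)
    return {k: mean(prices) for k, prices in groups.items()}


# First pass: one average per away team (the "$17 for the Cardinals" idea).
by_team = average_price(SALES, key=lambda s: s.away_team)

# Later passes: also slice by seat level and by weekday vs. weekend.
by_segment = average_price(SALES, key=lambda s: (s.away_team, s.seat_level, s.weekday))
```

In this toy data, the single team-level average (39.2) hides a spread from 13.0 for weekday upper-deck seats to 90.0 for weekend field seats. That spread is why the segmentation keeps going.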
and so that's basically what a data scientist does,
is they take the existing data and just kind of start slicing it up
to the point where they can end up with very regular, nice distributions.
And presumably they could do this on a small subsection of data,
and that would scale across all the rest of the data.
So in other words, we just talked about how to handle the Giants' stadium, but presumably
that same logic applies to every baseball stadium.
So that's how a data scientist can kind of look at small slices of data and hopefully
still solve the bigger problem, right?
So that's a big part of it; that's the analysis part.
There's also building models.
So most of the time there's a goal to the data science.
So in this case, we wanted to become a ticket arbitrageur on StubHub.
So we want to know the effective value of a ticket. And so that becomes a regression
problem, right? So given all the things we know about the ticket, and then also we can use other
data sources. We could go to mlb.com and see, you know, are the Giants having a good year or a bad
year? Are the Cardinals having a good year or a bad year? Because that affects the sales, right?
So using all this data,
can we feed it into some machine learning model
and it would just spit out, you know,
oh, $57 a ticket for this ticket and be pretty accurate.
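The regression step can be sketched as ordinary least squares on a single feature. This is a toy built on assumptions: the feature (the home team's win percentage) and the three data points are hypothetical. A real model would use many features at once, for example in R something like `lm(price ~ win_pct + weekday, data = tickets)` on a hypothetical `tickets` data frame.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    return slope, y_bar - slope * x_bar


# Hypothetical history: home team's win percentage vs. average ticket price (USD).
win_pct = [40, 50, 60]
avg_price = [30, 40, 50]

slope, intercept = fit_line(win_pct, avg_price)
predicted = slope * 55 + intercept  # estimate for a team winning 55% of its games
```

Here the fit is exact (slope 1.0, intercept -10.0), so a 55% team is predicted at $45. Real ticket data would be far noisier.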
You might want to not just assign numbers
to data that you don't know about,
like in this case, future tickets.
You might want to classify them.
Like you might just want to say, okay, I don't want to know exactly what it's worth. I just want
to know which ones are, you know, selling for too cheap so that I can go and buy those and then
resell them. Which ones will sell out versus not sell out. Exactly. And so that's classification,
right? That's more of a categorical task where you want to say, is it A or is it B?
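For the "which ones are selling for too cheap" question, the simplest stand-in is a threshold on the gap between the asking price and an estimated value. This is a hand-written rule, not a learned classifier. A real one, such as logistic regression, would learn the boundary from labeled sales. The listings, the value estimates, and the 0.8 margin are all hypothetical.

```python
def classify(asking_price, estimated_value, margin=0.8):
    """Label a listing 'underpriced' if it asks less than margin * estimated value."""
    return "underpriced" if asking_price < margin * estimated_value else "fair"


# Hypothetical listings: (seat, asking price, estimated value from some model).
listings = [
    ("sec 101 row A", 30.0, 57.0),
    ("sec 305 row K", 18.0, 20.0),
]
labels = {seat: classify(ask, value) for seat, ask, value in listings}
```

The output is a category ("underpriced" or "fair") rather than a dollar amount, which is the difference between classification and regression described above.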
You also want to do clustering. So, for example, let's say you don't know which tickets are which.
Let's say in the data you don't have low level, high level; you don't know which seats are front row seats.
But you know that people will charge more for front row seats. So you could do a
clustering. You could say, okay, given all the tickets at the Cardinals game, I want to cluster
it based on price into two groups. And I'll just assume the more expensive group is the front row seat
group. Then there's also data embedding, which gets a little bit more complicated, but that's all about understanding the distance between two things.
So you might want to know, are these two tickets similar or different
with respect to their value?
And so embedding is a way to do that.
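One common way to "cluster on price into two groups" is k-means with k = 2. The sketch below does it in one dimension in plain Python. The prices are made up, and, as in the discussion, we simply assume the pricier cluster corresponds to the front row seats. R ships a general `kmeans()` function in its stats package.

```python
def two_means(prices, max_iter=100):
    """1-D k-means with k=2: split prices into a cheaper and a pricier cluster."""
    if min(prices) == max(prices):
        return list(prices), []  # nothing to split
    low, high = min(prices), max(prices)  # start the centroids at the extremes
    for _ in range(max_iter):
        cheap = [p for p in prices if abs(p - low) <= abs(p - high)]
        pricey = [p for p in prices if abs(p - low) > abs(p - high)]
        new_low, new_high = sum(cheap) / len(cheap), sum(pricey) / len(pricey)
        if (new_low, new_high) == (low, high):  # centroids stopped moving
            break
        low, high = new_low, new_high
    return cheap, pricey


# Hypothetical asking prices for one game, seat level unknown.
cheap, pricey = two_means([15, 18, 20, 22, 85, 90, 110])
```

Starting the two centroids at the minimum and maximum guarantees that neither cluster is empty on this kind of one-dimensional data.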
And so if you're interested in that kind of stuff,
there's data science competitions,
and they also have tutorials, on Kaggle.com.
So that's K-A-G-G-L-E dot com.
It's pretty cool. Even their tutorials are actually fake competitions.
Like they're competitions where there's no deadline
and there's no prize.
But you still get a feeling like you're doing this competition.
And it kind of gets you set up to do the real competitions.
I've done a few of them.
They're really fun.
I've never been in the top 10 or anything crazy like that.
A lot of those people are just extremely good.
But it's definitely worth checking out and you'll have a good time.
So what would you say is the difference between statistics, or what you learn in, let's say, a college undergraduate-level statistics class, and what
data science is about?
Yeah, good question. So, well, in undergrad you mainly learn how to, you
know, get the statistical properties of data. So you learn how to fit to a distribution,
what the confidence intervals mean, things like that.
In most classes, even graduate classes, they don't really talk about how to sort of...
I don't know if normalize is the right word.
Clean up the data.
Exactly. Like, people will tell you,
okay, given some data, here's how you can check and see if the data is normally distributed.
What that means is that you can find a Gaussian or a normal curve that sort of fits the density of the data.
So, you know, for example, if there's a lot of data around 20, and as you get further away from 20, there's kind of less and less data,
then you could kind of fit a Gaussian curve where the center is at 20, right?
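One rough first check of whether data looks normally distributed is to fit a mean and standard deviation and compare the data against the textbook 68% rule. The sketch below uses Python's `statistics.NormalDist` (Python 3.8+). The prices are invented, and a proper check would use a Q-Q plot or a formal normality test.

```python
from statistics import NormalDist

# Hypothetical ticket prices bunched around $20.
prices = [18, 19, 20, 20, 21, 22]
dist = NormalDist.from_samples(prices)  # estimates mean and sample standard deviation

# Share of the data within one standard deviation of the mean...
lo, hi = dist.mean - dist.stdev, dist.mean + dist.stdev
share_within_one_sigma = sum(lo <= p <= hi for p in prices) / len(prices)
# ...compared with what a true normal distribution predicts (about 0.683).
expected_share = dist.cdf(hi) - dist.cdf(lo)
```

Here 4 of 6 points (about 67%) fall within one sigma, against roughly 68% expected. Six points are far too few to conclude much.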
What they don't really tell you is,
if your data is not that, then what do you do, right?
So that's not really covered.
And the answer is kind of, as we talked about,
you do this segmentation.
Also, you have to do, as Patrick said,
you have to do a lot of cleanup.
It might be, you know, some people will post a ticket on StubHub and charge $50,000 just because they can, because they're just crazy.
Or who knows, like they just think it's cute or funny.
Or they think, oh, there's a chance that some insane person is going to buy this ticket.
Let me just see what happens.
And so, you know, what that ends up translating to are just anomalies in the data that you have to filter out.
Otherwise, they'll just completely dominate any type of machine learning.
Like it will spend so much energy trying to understand this ridiculous number that it will forget about the entire problem.
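Filtering out the $50,000 joke listings can be sketched with a robust rule based on the median and the median absolute deviation (MAD). We chose MAD over a mean-and-standard-deviation cutoff because a single huge value inflates the standard deviation itself and can hide behind it. The cutoff `k=5` and the prices are assumptions for illustration.

```python
from statistics import median


def drop_outliers(prices, k=5.0):
    """Keep prices within k median absolute deviations (MAD) of the median."""
    m = median(prices)
    mad = median([abs(p - m) for p in prices])
    if mad == 0:
        return list(prices)  # no spread to measure against; keep everything
    return [p for p in prices if abs(p - m) / mad <= k]


# Hypothetical listings for similar seats, including one joke listing.
listings = [40, 45, 50, 55, 60, 50000]
clean = drop_outliers(listings)
```

The median (52.5) and the MAD (7.5) barely notice the $50,000 listing, so that listing is the only one dropped.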
So data science is about all of that.
So this is not really related, but that occasionally happens on Amazon.
If you ever go look,
especially I find it on like used books.
If you ever look at used books that are out of print
or don't have a current like new edition.
I've seen this with board games,
used board games.
Ah, okay.
Yeah.
And it's almost like people have an algorithm
which says people won't buy the cheapest used something.
They'll buy the second cheapest used something.
So find whatever the lowest priced used
something is, and charge three percent more than it. And so then you end up with this weird
race of two or three people offering used books for sale for like ten thousand dollars.
Yep.
And it's not like a rare book. It's just an out-of-print book, or a previous edition
of a book, and it's very unclear what caused it. But if you were doing data scraping and trying to
say, like, hey, books on science fiction, is it worth collecting a random paperback book?
And you had in your data these things that are like ten thousand dollars. But that
doesn't mean anyone's actually buying them, so it's not really a good measure of their actual worth, but it would still
corrupt your data.
That's right. Yep, you're exactly right. Yeah, even with board games, there's this
game, Beyond the Hill I think it's called, and yeah, the game retailed for like
fifty dollars, just a normal board game, but it was out of print, and they were getting ready to
release a new edition.
I didn't know all of that.
I just went to look it up on Amazon.
And yeah, they said it's worth, you know, $50,000.
It's just ridiculous.
But yeah, all of that comes as your data.
But I'm also curious why that happens in the first place.
Yeah, I think you were right.
I think there's some people with algorithms.
I would imagine there's people who just do data science on Amazon
and buy and sell things,
and they don't even know really what they're buying and selling.
Or they don't actually have it.
They just plan to buy it from someone else.
And so their plan is like one person's bot is trying to be the cheapest,
and then the second person is trying to be the second cheapest.
And so then they just race each other, right?
Yep. Maybe, yeah. Pretty cool. I don't know, it's like high-frequency trading on Amazon, only in this one no one actually ends up buying it, because
there's nobody on the other side.
Yeah, I wonder how many high-frequency trading
companies have like 1% or 2% of their revenue
going to broken algorithms,
Like, they have a budget of $100 million,
and every year, $1 million or $2 million
just gets invested in a completely bogus way
because of some race like this.
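The Amazon price race described above can be simulated in a few lines. This is a hypothetical model of the behavior Patrick describes (each bot prices 3% above the other's listing), not anyone's actual repricing code. The starting price and number of rounds are made up.

```python
def price_race(start_price, markup_a=1.03, markup_b=1.03, rounds=100):
    """Two repricing bots that each price their listing relative to the other's.

    Bot A sets price = markup_a * (bot B's price); then bot B sets
    price = markup_b * (bot A's price). Nothing here checks whether anyone buys.
    """
    b = start_price  # bot B's initial listing
    history = []
    for _ in range(rounds):
        a = markup_a * b  # bot A reprices relative to B
        b = markup_b * a  # bot B reprices relative to A's new price
        history.append((a, b))
    return history


history = price_race(50.0)
final_a, final_b = history[-1]
```

Starting from $50, bot B's listing after 100 rounds is 50 * 1.03**200, about $18,500. More generally, the race runs away whenever markup_a * markup_b > 1, even if one of the bots is slightly undercutting.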
Someone said that there's a group of people
kind of doing... it's not really
high-frequency trading, because everyone means something slightly different by that,
but algorithmic trading, where, you know, it's just a computer running a program, not super
high speed, but basically working on the same timescale humans do.
Right.
Yeah, but that means something different. There's
other ones that watch for a pattern.
If a stock is up 10% in the last month and it's currently down more than 2% in the last day,
buy it, because chances are it'll be up again.
And so it might operate over the course of hours or days,
but there's no human intervening.
And so it's algorithmic, but not high frequency.
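The pattern-watching bot described above boils down to a rule like the one below. It is a toy encoding of the example from the conversation and assumes daily closing prices and roughly 21 trading days per month. It is illustrative only, not a trading strategy.

```python
def should_buy(closes):
    """Toy rule: up more than 10% over ~a month, but down more than 2% today.

    closes: daily closing prices, oldest first. Needs at least 22 entries
    (about 21 trading days in a month, plus today).
    """
    if len(closes) < 22:
        return False
    month_change = closes[-1] / closes[-22] - 1
    day_change = closes[-1] / closes[-2] - 1
    return month_change > 0.10 and day_change < -0.02


# Hypothetical price history: flat at $100, then a run-up and a dip today.
history = [100.0] * 20 + [115.0, 112.0]
signal = should_buy(history)
```

No human needs to intervene. A loop that calls `should_buy` once a day is enough to make this algorithmic without being high frequency.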
Although more frequent than maybe you or I might trade stocks, anyways. Right. And so if you find them, you can
tell how they behave, because they act in a certain way, not like a human per se. And basically,
people leave the bots running for some period of time because they're initially
profitable. Then, you know, other people figure out whatever
is being done and close down the profit by closing the arbitrage, and then for a while
the bots are just break-even, and then they get shut down when they start
losing money. But if you can find the ones that are still lurking out there that have gone break-even,
you can trick them into taking the other sides of your orders and make money
off of them. But I've never had anyone firsthand actually demonstrate this phenomenon
to me, so maybe it's just urban myth.
Yeah, I don't know. A lot of stuff in the
stock exchange, you just have to be there, and we're not there. But I would believe it. I
mean, I'm definitely not skeptical of that. I'm pretty
sure something like that happens. I mean, definitely if you're doing algorithmic trading, you probably
are taking advantage of other algorithmic traders. That's my guess. I mean, maybe not, maybe not
knowingly. But at any rate, I don't know.
So, all right, yeah, let's talk about R. So it is open source. That's
a good thing. If you've ever used MATLAB, you know how frustrating it is.
You know, there is Octave; we had a whole show on MATLAB and Octave. But Octave has
some issues; actually, they're similar to R's, so we'll talk about that.
But it's hard. The problem with going from MATLAB to Octave is, and this gets back to the B2B thing you were saying earlier, if you're at work,
you probably aren't going to use Octave if you work for a big company, because it's better for
them to just pay for MATLAB no matter how much it costs, right?
Right. And you just deal with it. So
then when you go home and you try to use Octave, it's different enough that it's frustrating.
Yeah, you're exactly right, you hit the nail on the head. So R is totally open source. There are some tools, like RStudio, that cost money if you're a company but are free if you're an individual. So R is that kind of, you know, matrix-based, linear algebra-based language. But one thing that is very different is R supports this thing called data frames.
And data frames are awesome. It's basically like, think of it as like a SQL database in memory.
That's kind of the best way to explain it. So imagine if you had the SQL database or many of SQL databases sitting in memory and the entire sort of language that you're, all the code that you're writing can kind of take advantage of that database.
Like you have these sort of pseudo SQL statements right inside of your code.
So your code might say, you know, grab this data and then take all the rows where this one column is greater
than 0.5. And now you have a new SQL database that just has those rows. And then, you know,
do some other things to it and then do a group by and things like that. So it's really powerful.
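The "SQL database in memory" analogy can be made literal with Python's built-in `sqlite3` module. The table and column names are hypothetical. In R, the same filter on a data frame `df` would be written `df[df$score > 0.5, ]`, followed by an aggregation.

```python
import sqlite3

# An in-memory SQLite database standing in for a data frame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (team TEXT, score REAL, price REAL)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("Cardinals", 0.7, 40.0),
        ("Cardinals", 0.3, 25.0),
        ("Dodgers", 0.9, 60.0),
        ("Dodgers", 0.6, 50.0),
    ],
)

# "Take all the rows where this one column is greater than 0.5," then group by.
rows = conn.execute(
    "SELECT team, AVG(price) FROM tickets WHERE score > 0.5 "
    "GROUP BY team ORDER BY team"
).fetchall()
conn.close()
```

The appeal of data frames is that the filter and group-by are ordinary expressions in the language, without a SQL string or a separate database connection.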
It's very cool. Highly recommended. I switched from MATLAB to R, well, not that recently,
like three years ago,
and I think it's great.
I'm a big fan.
It has tons of packages.
Almost all the packages work on every operating system,
so you don't have to worry about that.
It even has support for big data.
So there's this thing called RMR.
I actually never used it,
but it looks pretty cool. It looks like it would be
pretty intuitive.
It does map reduce in R.
The idea is, if you have a big
cluster full of machines,
as long as you use the RMR
variant of the various
functions, it will know to ship that data off to the cloud,
crunch on the numbers, and then ship back the result.
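RMR distributes the map, shuffle, and reduce pattern across a cluster; the same pattern can be sketched on one machine. This does not reproduce RMR's actual API. The records and the per-team average are invented. They show why reducers combine partial (sum, count) pairs instead of averages: pairs can be merged in any order on any worker.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit (key, value) pairs; here (team, (price, 1)) so averages can be combined."""
    team, price = record
    yield team, (price, 1)


def reduce_phase(a, b):
    """Merge two partial (sum, count) pairs; associative, so any worker can do it."""
    return a[0] + b[0], a[1] + b[1]


def map_reduce(records):
    shuffled = defaultdict(list)  # the "shuffle": group mapped values by key
    for record in records:
        for key, value in map_phase(record):
            shuffled[key].append(value)
    totals = {key: reduce(reduce_phase, values) for key, values in shuffled.items()}
    return {key: s / n for key, (s, n) in totals.items()}


averages = map_reduce([("Cardinals", 20.0), ("Cardinals", 30.0), ("Giants", 50.0)])
```

On a real cluster, the map calls and the per-key reductions would run on different machines. Only the small (sum, count) pairs would travel back over the network.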
Some of the cons about R is it's like many open source projects.
It's got this very fragmented community.
The documentation is, to be honest, pretty terrible. Well, the documentation on R
itself is great, but R kind of lives or dies by its packages, which is true for most of these
languages. And if you're used to, you know, the Apache Commons kind of
documentation, and just in general the kind of documentation you see on Java projects, you'll be just supremely disappointed. So the documentation in R's packages is
pretty terrible.
But you're mostly using R for... it's different from Python. Python is used
for scripting, but a lot of people do build applications with it. But I've never, I mean,
maybe you have, but I've never seen anyone even try to build an application with R. It's always a script that runs, right? Like an
application in that it runs from start to finish, but not like what you would normally build
with C or Java or C++.
Yep, yeah, you're right. So people are using it to do a series of
analyses and produce an output, as opposed to building a GUI with it.
Exactly.
Yep.
Yeah.
So many of the packages kind of lack support.
There are front ends that are in various states of disrepair,
kind of like we talked about QtOctave being discontinued back on the MATLAB episode.
So it's similar with R.
I mean, RStudio
has a lot of traction, and it's definitely in a good place now. But the rest of them are basically
defunct.
Yeah, for example, just to sort of illustrate this, there's this package called zoo for handling time series. And all I wanted to do was... I had some data that was not regular across time. So in other
words, one of the data points might be from Tuesday, and then Wednesday, and then Friday,
and Thursday is just missing. And so I just wanted to linearly interpolate all of the days that were
missing, and the zoo
package says it will just do that. And so I thought, oh great, I won't have to do it myself;
I'll just use this package. And it took me a whole day, just because the documentation was kind of terrible. I kept getting weird errors.
And so that's just something you kind of have to deal with.
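Jason wanted zoo to fill in missing calendar days by linear interpolation, and that is short enough to write by hand. The minimal sketch below assumes the data is a dict mapping `datetime.date` to a value. In zoo itself we believe the relevant function is `na.approx`, but check its documentation.

```python
from datetime import date, timedelta


def fill_missing_days(observations):
    """Linearly interpolate values for calendar days missing between observations.

    observations: dict mapping datetime.date -> float, with at least one entry.
    Returns a dict with an entry for every day from the first to the last date.
    """
    days = sorted(observations)
    filled = {}
    for left, right in zip(days, days[1:]):
        span = (right - left).days
        for offset in range(span):
            t = offset / span  # fraction of the way from left to right
            filled[left + timedelta(days=offset)] = (
                observations[left] * (1 - t) + observations[right] * t
            )
    filled[days[-1]] = observations[days[-1]]
    return filled


# Hypothetical series: Tuesday, Wednesday, (Thursday missing), Friday.
OBS = {date(2015, 9, 8): 10.0, date(2015, 9, 9): 12.0, date(2015, 9, 11): 18.0}
filled = fill_missing_days(OBS)
```

The missing Thursday gets 15.0, halfway between Wednesday's 12.0 and Friday's 18.0.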
But then once you get kind of better at R,
and once you've kind of gone through the wringer,
then it's awesome.
And it has incredible power.
You already spoiled that it was based on S.
That's right.
From Bell Labs, which seems to be the source
of everything in modern day life.
I know, isn't that amazing?
Do you know anybody who worked at Bell Labs?
No.
Is Bell Labs still going?
I guess it might be.
I know Xerox PARC is still going because I have a buddy who works there.
But Bell Labs.
Let me check.
Yeah, no.
It's just all mythical people from legend that worked at Bell Labs, but
I don't know anyone.
Yeah, you go ahead and keep talking. I'm gonna look up if Bell Labs is still a thing.
R was developed at the University of Auckland by Ross Ihaka and Robert Gentleman,
deriving some of the ideas that they got from S.
And as Jason pointed out,
a big deal with R is finding packages to help you do what you would like to do
because everyone likes doing less work.
But hopefully your package is easier to use
than Jason's anecdotal experience
about his time series plotting.
And so, some ones that we've used before:
Jason mentioned zoo,
although his recommendation was kind of weak.
Right.
There are also plotting ones, like ggplot2,
which is a very popular plotting package for R, built on the grammar of graphics.
And then if you need to do web visualizations,
you could use HTML widgets.
And if you're going to do some Markdown,
you could use R Markdown.
Yeah, definitely.
So Bell Labs is still going.
They had a couple of things pretty recently
that were noteworthy.
It said in July 2014,
Bell Labs broke the broadband internet speed record
with a new technology dubbed XG Fast
that promises 10 gigabits per second connectivity speeds.
Over what?
So, oh, 10 gigabits a second.
That's not that impressive if you're talking about fiber.
So I'm assuming they must be talking about the existing, you know, cable, right?
I don't know.
Yeah, because fiber can, I believe fiber.
Wait, 10, oh, no, sorry, 10 gigabits a second.
That's a lot, actually.
No, that's like, well, gigabit Ethernet is a thing, right?
Like, that's just a regular Ethernet.
And there's also specialized, like, server stuff, kind of 10 gigabit Ethernet.
You're right, you're right.
So, yeah.
But, yeah, anyways.
It's some sort of record.
It sounds cool.
Yeah, I mean, I guess not that much is happening.
But people probably weren't that excited when they heard about this thing called S or C, you know. But apparently single-letter program
names were the heyday at Bell Labs.
Oh, interesting. A few months ago, Nokia bought Bell Labs.
Oh, wow. So anyway, yeah, are there programming languages that we use
that came out of Bell Labs,
other than C and R?
C.
Obviously there was B,
but B preceded C, and I've never used B.
I've never even seen B.
Did D come out of Bell Labs?
All right,
this is going to be really boring.
We can look this up later and prepare
something.
So I already mentioned RStudio.
Oh, yeah, yeah. So RStudio is a great GUI.
There's also Rattle, which I haven't tried, but it looks pretty promising. But at this point RStudio is kind of...
I think there should be more pirate theming here. Is it just me that's onto the pirate theme? You have a programming language named R; like, come on.
Oh, man. So if you want to learn how to be a pirate,
you should go to Try R, tryr.codeschool.com.
Do they have a Code School for everything?
Like, if I just went to...
Code School is so big now. That guy, like, he writes... he's really prolific. Either
that, or he's developed a platform and other people are writing. I don't know what the deal is.
Right. Every time I'm like, oh, I need to learn a language for the show, I'm like, Code School, oh,
here we go. And it seems really reasonable, like the business model, which is often
basically: the first one is free, and then if you want more advanced
ones, you have to pay.
Yeah, it looks like they may have changed. Now they have Ruby on Rails,
CoffeeScript, Git, more Ruby, jQuery, Objective-C. They have a ton of stuff. That's pretty cool.
am i thinking of something else?
What's that?
No, it's Code School.
Yeah.
Huh.
Okay, yeah.
Sorry, they changed.
I was looking at their pricing thing.
They've changed how they're doing it now.
So what is it now?
Is it better? I don't know.
It says there's a per month subscription.
I don't know.
Oh.
I don't understand what that means.
All right.
So, okay.
I have no idea about Code School, blah, blah, blah.
But I've seen lots of good stuff in them before.
So if you've never seen them, check it out.
We both went through the R tutorial.
I mean, we already know R, but we went through it briefly.
And it looked pretty cool.
They're the ones who had Ruby on Rails for Zombies, or whatever it was.
I actually did that course a long time ago.
It's great.
Very cool.
Rails for Zombies.
So that's R in a nutshell in a pirate's hat.
It's pretty cool.
We should have had more pirate jokes.
Missed opportunity.
I'm still sort of torn about R versus Python with pandas.
So for people who don't know, pandas brings data frames to Python. So you know how I was
talking about data frames, the whole SQL kind of just built into your programming language?
Pandas is a library for Python that gives Python that power. And so once you have pandas, I mean,
Python is much better documented and things like that. But I've been using R for so long, like three years now,
so it's kind of hard to switch again to something else.
And R is definitely much more like MATLAB than NumPy is.
But this is pretty cool.
Check it out.
Definitely try R.
Try Python with Pandas.
Did you totally just undermine our entire podcast at the end by saying,
if you're new to this, you probably should use Python with pandas?
But if not...
Well, I think it's hard to say.
I mean, I think you should try both.
Okay, check them both out.
Make your own decision.
Tabs, spaces.
Jason doesn't care anymore.
Well, so the thing is, it's not fair for me to say
because I've been using Python for a decade.
And so it's easy for me to feel like, oh, it's much more comfortable for me to jump into pandas than R because I've been using Python for so long.
That's why I'm kind of torn.
So I feel like I'm biased here.
So definitely try both.
It's easy enough to pick up R and start doing some cool stuff.
R can ingest CSV and TSV (tab-separated) files very easily.
It's just a one liner.
And so it's easy to get up and running.
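For comparison, reading a TSV with Python's standard library is also nearly a one-liner; in R, `read.delim()` defaults to tab separators. The sample data below is hypothetical. The `quoting=csv.QUOTE_NONE` setting reflects Jason's earlier point that tab-separated data needs no escaping as long as the fields contain no tabs.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text (header row first) into a list of dicts.

    QUOTE_NONE treats quote characters literally: with no tabs inside the
    data, there is nothing to escape.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Commas inside a field ("last name, first name") need no special handling.
sample = "name\tteam\nSmith, John\tCardinals\nDoe, Jane\tGiants\n"
rows = read_tsv(sample)
```

With a comma-separated file, the same "Smith, John" field would have to be quoted or escaped.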
Just to recap a few things.
So my book of the show was The Hard Thing About Hard Things. Patrick's book of the show
was Steel World.
My tool of the show is
Electron. You can check it out on
GitHub. Patrick's was Does Not
Commute, the iOS and Android app.
And
we actually, the Patreon
is doing pretty good.
Yes, thank you. Some of our members, our
listeners suggested that we recap at the end of the show
for people who wanted to write it down at the end.
Yeah, definitely.
And thank you so much for your support on Patreon.
That's definitely helping us in our bandwidth costs.
We recently switched bandwidth.
We switched hosting providers.
And so let us know, especially if you have issues.
So if downloading this episode was very difficult for you
and it took three days or something ridiculous,
tell us, let us know,
because we're kind of messing around
with the infrastructure right now
and it's important to get that feedback.
Yep.
So thank you everyone who supports us on Patreon,
doing the Audible trials, buying the books of the show, or using our Amazon links for doing that. All of that is helping us out, guys. Thanks.
Yeah, we appreciate it.
All right, well, till next time.
Yep, see you later.
The intro music is Axo by Binar Pilot.
Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license.
You're free to share, copy, distribute, transmit the work, to remix, adapt the work,
but you must provide attribution to Patrick and me and share alike in kind.