The Changelog: Software Development, Open Source - VisiData is like duct tape for your data (Interview)
Episode Date: May 12, 2020
Saul Pwanson is the creator and maintainer of VisiData, a terminal interface for exploring and arranging tabular data. On this Maintainer Spotlight episode, Saul joins Jerod for a wide-ranging discussion on crossword puzzles, biographs, and Saul's open source gift to the world. Thanks to AJ for the suggestion!
Transcript
If I do things just for myself, I do a crappy job.
I don't seem to like myself very much.
I don't care about my experience.
And so I just do the bare minimum.
I think most people are kind of in the same boat.
And yet it's really nice to have really nice tools,
like to be able to feel good about the tools that you're using.
It's like, you know, driving a janky go-kart that you've made yourself versus a nice Ferrari.
And it's not worth it to build a Ferrari just for yourself.
So it's not worth it to make something like VisiData
just for my own use case.
But if I take the mentality that,
no, I'm going to do it for other people,
I'm going to do it once and for all,
like this is my gift to the world,
then I'm more motivated to do it well
and then I do do it well
and then I benefit from it too
because I get to have that tool
and have it be a nice thing.
Bandwidth for Changelog is provided by Fastly.
Learn more at fastly.com.
We move fast and fix things here at Changelog because of Rollbar.
Check them out at rollbar.com.
And we're hosted on Linode cloud servers.
Head to linode.com slash changelog.
Welcome back, friends. This is The Changelog, a podcast featuring the hackers, the leaders, and the innovators in the software world.
We have a great Maintainer Spotlight episode for you today.
Saul Pwanson is the creator and maintainer of VisiData, a terminal interface for exploring and arranging tabular data.
Saul joined me for a wide-ranging discussion. We talk crossword puzzles, we talk biographs,
and of course we talk about Saul's open-source gift to the world. Maintainer Spotlight is
produced in partnership with our friends at Tidelift. Check them out at Tidelift.com.
Okay, let's do the thing. First, let me give a shout-out to AJ in our community
Slack for pointing us towards Saul and VisiData. AJ wrote to me and said, "Saul and VisiData:
a super cool tool that I'm apparently way too late to discover, and a really nice guy to boot." So
thanks to AJ for the suggestion. Also a quick request for the listeners. If you have an open
source software maintainer in your life that deserves the spotlight, please do let us know.
You can hop in our Slack and chat me up there. It's free to join at changelog.com slash community,
or you can simply submit an episode request at changelog.com slash request
and mention our Maintainer Spotlight series.
Okay, with that said, Saul, thanks so much for coming on Maintainer Spotlight.
For sure. Thank you for having me.
Well, I am very excited.
We're here to talk about VisiData.
VisiData is an interactive multi-tool for tabular data.
It combines the clarity of a spreadsheet,
the efficiency of the terminal,
and the power of Python into a lightweight utility
which can handle millions of rows with ease.
How did I do reading that marketing copy?
Oh, that's actually, that's great.
It's always very interesting to hear somebody else say those words.
I think it's actually one of the best summaries
that I've managed to come up with,
but it doesn't actually, it's kind of a mouthful, right?
It is. Well, that's the thing with software. Sometimes you have to present what it is
and what it does so that people know what value that thing brings. That can be a
real challenge, I think, lots of times for us software developers; we're not so
good at describing things in prose. That being said, it does the job. Yeah, well, thanks. Well, you know, I actually
have been thinking a lot over the past couple of years, well, decade, really, but a couple of years,
about marketing. And as an engineer, I really don't like marketing. It feels kind of icky and
gross, like I should be able to make something that's awesome, and that should be sufficient,
right? But the older that I get and the more that I see how the
world works, it's not just a necessary evil; it's actually just necessary. And so I've
been thinking of it recently in terms of packaging for humans. I mean, you wouldn't really
release software without packaging, and the best software does have actual packages, where you release it to whatever ecosystem, right? Windows installer or whatever.
And marketing is kind of the packaging, but for brains. You know, if you want it to become not
just adopted but spread, even if you have a great thing, if it's not packaged well in the memetic
sense, then people themselves aren't going to spread it, even if they love it. That's the thing
I actually realized recently that's been kind of a little bit mind-blowing for me,
is that it's not just that they have to love it.
They have to love it and have something to repeat.
If there's not something to actually repeat that they feel confident about
or that will lodge in somebody else's brain, they kind of just won't.
Or they'll try, kind of have a really simple lodge, and it just gets stuck right there.
It's kind of like if you've ever seen a movie or read a book that you really appreciated and then you turn
around and try to tell somebody else about it. Because you can't exactly describe what you
liked about it or why it was great, you become a really bad shill, basically, as a salesperson.
You're like, you should just go see it, trust me. And they're like, well, I have a lot of movie options; I
might not trust you. Yeah. So yeah, if you can give
somebody a phrase to say, and then they can just use that phrase to describe your thing,
you enable them to help you. Exactly. Exactly. And so, yeah, those 30 words that I have on
the front page that I feel like describe it, they're a really good description, but not a
really good marketing package. And so I've actually been working on better marketing packages. And
like you said,
it's a constant challenge. Do you have any betas you want to you want to test right here with me,
you can give me a couple phrases, I'll tell you if they're any good. Yeah, I've got a handful.
They're more conceptual than well, some of them are phrases, some are conceptual. So
one of the concepts, I think probably the strongest one, is duct tape, like duct tape for
data, right? I feel like VisiData itself, if
you think about any one thing that it does, it doesn't... I mean, sure, it's fine. It's a CSV viewer,
editor, or whatever; you can kind of pick any one of those things. But it's not about that. It's about,
you know, you know this one tool and all of a sudden you've got all this potential. It's like
duct tape. Duct tape isn't anything really special, except that it's so sturdy and
universally useful that you can kind of slap it on anywhere and get something going when all
you need is a little bit of, you know, glue. And it's true of VisiData,
where, you know, you come in one end with a pile of JSON maybe, but you need a little
bit of CSV at the other end, and it's got to be just a little bit different, and it fits in really
great there. Yeah, so duct tape is one of the things I've been playing with as a meme. I think duct tape is a great metaphor
for software, especially software that does what VisiData does, so I think that's on point. You
should definitely start to work that into the messaging around VisiData. Before we get too
deep into the project, I want to talk a little bit about something else that you're enthusiastic about, and it seems like I'm seeing it a little bit in your
interest in packaging and words: you're a self-professed crossword enthusiast. So what is it
about crosswords that gets you going? What do you like about them? I like the density of
information. Not just information; there's a whole lot going on there,
right? You read a book and you kind of have to go through the words one at a time and turn
the page, and it's a bigger thing, whereas one single crossword puzzle can keep you, you know,
I'll use the word entertained, for hours, right? It's just this little puzzle, and there's just such
a tight little format. Every word, in fact every character, is carefully chosen and meaningful, right?
Sometimes, you know, it's a capital letter versus a lowercase letter that might change the entire meaning of a clue.
And yeah, so I like the fact that it's a really dense puzzle that you can kind of noodle over for a while.
And I also think that there's a lot of, you know, people think of crosswords as kind of like a throwaway thing. And this is actually something that I've wanted to say in public for a while that I don't think I've had a chance to yet, which is that the Library of Congress actually classifies crosswords as do not collect crossword monographs.
And I think it's probably a throwback.
Crosswords have been around for a hundred years. And for the first, I don't know, 30 or 40 years or so, maybe even longer,
they were kind of like lists of definitions and just kind of like,
whatever. Anyway,
you can kind of see the Library of Congress getting an unfilled
crossword book. Well, first of all, it might get filled, and
a solved crossword is useless or worthless, right?
That's a good point. Yeah.
But then the Library of Congress marks them as do not
collect, and that's one of the very few things that are marked as do not collect. They'll
even take one-off, weird alien conspiracy theory books as collect one or collect two of
those. And so crosswords are this weird thing. It's this trivial thing that nobody kind of
cares about, and yet it's a really great cross-section of culture.
You can really learn a lot about culture, not just by what's in there, but by what is presumed to be cultural knowledge, right? Like the fact that this person is famous
enough. They assume you can figure this out. They assume you know this. Yeah, exactly.
They'll assume you know this. So let me just say I'm a crossword neophyte. Like I've done a few,
I enjoy puzzle games. And so I've enjoyed crossword puzzles.
I never got deep into them.
And I've just recently learned,
probably through the power of podcasting,
a little bit of reading the story
I want you to tell about Gridgate.
But there's like this whole creation side of it.
So like I think of a crossword puzzle,
like it just kind of existed and I just do it, right?
But like there's a person that created that crossword puzzle
and there's this depth of creation where there's themes and there's fillers and there's all this
side which most of us never even consider. And so, looking at it from a creator's standpoint,
as you say, it's a window a little bit into the culture, maybe the zeitgeist, because as
a creator you assume that I can figure out this thing based on these clues.
And so there's like a cultural connection between you and I, maybe you can't, and that's
when you fail the crossword.
But if it's, if it's answerable, this particular clue is going to lead me to something that
I already know about.
Right.
That's interesting.
Yeah.
And that you know about, or that you kind of know deep in the recesses of your brain.
That happens to me quite frequently where it's like, I have no idea what they're talking about.
And then I get a few more letters and it's like, oh, I do remember that.
You're totally right.
And then the other thing is the most, the crosswords that I have had the best time with
are the ones where they actually teach me something where I get through it and I'll
fill in the last letters.
I'm like, oh, I see.
You're totally right.
You know, that clue, maybe it's a word that I already knew.
I had no idea what the clue, how the clue would have gotten me to that answer.
But now that I see the answer and associate with the clue, it's like, oh, well, thank you.
You actually gave me a new tidbit of information there.
That is cool.
So tell us about GridGate.
So there's another thing that I didn't even think about, which is that you can plagiarize.
You can plagiarize crosswords.
Of course, if it's your job, and a lot of people have a job to do that, right?
Like, I create the Sunday morning crossword for the New York Times.
So that's like the famous one, but all these local papers have crossword creators, right?
And so there's plagiarism that happens.
Well, so, um, I didn't know anything about this stuff until I got into creating crosswords myself.
And that was actually what got me into it.
Um, I had a manager that I really liked and he was leaving the job that I was working at.
And as a going-away present, I decided I was going to make... well, we did the New York Times crossword puzzle every day.
We'd put it on the kitchen table at work.
And then we kind of, I liked it because we did it kind of as a group.
You know, we all take a break every once in a while, go up and get a couple of answers and then go back to work.
And so when he was leaving, I decided as a little surprise gift, I was going to make a crossword for him.
That was all about our workplace.
And then we actually cut out the New York Times crossword for that day and pasted it in there.
So he didn't even know.
He was just doing the New York Times crossword puzzle, except it was the one that I created. And it wasn't a very good crossword or anything like that.
But the central answer was a phrase of his that I can't repeat on this show, and it was actually super cute. And of course our imaginations will have to fill in that
blank. And so when he got to there, of course, it was very nice. But
having done that once, I kind of realized, oh, this is super fun. As I was saying, it's just the right level of frustration. To make
a crossword is actually difficult, to get everything to line up in the right
way. I could imagine. I've never done it, but I could imagine it's challenging and frustrating to
a degree, but probably satisfying when you get it figured out. Very much so, yeah. And it requires a
level of creativity. It's not just, you know,
slapping some things together; this is kind of a deep art form. And so I subscribe to the
crossroads of crossword creators list that's been going on for many years, and I started
listening to them, and the more that I listened, the more it's like, no, this is an art
form. This is something where they pay attention to every little detail, so
my slapped-together crossword that I did in a couple of hours isn't going to cut it, no
matter what I do to it. And so then, as I got deeper into it, I made a couple more
crosswords, and then somebody actually even said, you know, have you thought
about submitting to the New York Times? And as I looked into that, I got a crossword
constructor's guide, and it's like, oh, wow,
I am nowhere near that level.
How would I get better at that?
It goes deep, doesn't it?
Right, it totally does.
And so, yeah,
so I decided to collect some crossword data
and, you know, I'm a data nerd.
So I scraped a couple of websites
and got a bunch of crosswords.
And then just,
I gave a talk on this last year at CSVConf.
I think it captures it pretty well.
But yeah, it was kind of, it was almost accidental.
I was just kind of looking for generic patterns in grid fill to see, you know, if S's were more common in this corner or that corner, or if any corners were duplicates.
And then it just turned out that a lot of puzzles were actually duplicated in whole or in part.
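As a rough illustration of the kind of grid-level comparison described here, the following is a minimal Python sketch. It assumes each puzzle's solution grid is stored as a list of equal-length row strings with `#` marking black squares; the names `grid_similarity` and `find_duplicates`, the 0.5 threshold, and the sample grids are hypothetical, and this is not the actual code or file format behind Saul's crossword dataset.

```python
from itertools import combinations

BLOCK = "#"  # assumed marker for black squares


def grid_similarity(a: list[str], b: list[str]) -> float:
    """Fraction of a's letter cells that hold the same letter in b.

    Grids with different shapes are treated as unrelated (score 0.0).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    letters = 0
    matches = 0
    for ra, rb in zip(a, b):
        for ca, cb in zip(ra, rb):
            if ca == BLOCK:
                continue  # only compare fillable cells
            letters += 1
            if ca == cb:
                matches += 1
    return matches / letters if letters else 0.0


def find_duplicates(grids: dict[str, list[str]], threshold: float = 0.5):
    """Return (id_a, id_b, score) for every pair at or above threshold, best first."""
    hits = []
    for (id_a, ga), (id_b, gb) in combinations(grids.items(), 2):
        score = grid_similarity(ga, gb)
        if score >= threshold:
            hits.append((id_a, id_b, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical sample grids: p2 differs from p1 in a single cell.
puzzles = {
    "p1": ["CAT", "ORE", "WED"],
    "p2": ["CAT", "ORE", "WET"],
    "p3": ["DOG", "ASK", "BUN"],
}
print(find_duplicates(puzzles))  # expected: one pair, ("p1", "p2"), scoring 8/9
```

The score is measured relative to the first grid's letter cells, which keeps the arithmetic simple. A sketch like this only catches whole-grid reuse at the same size; spotting a duplicated corner, as in the partial cases mentioned here, would need comparisons over sub-regions.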
And then, like I said, I didn't know anything about the crossword industry at that point.
You said something about how there are people who make the crossword every day.
And actually, it turns out that's not how it goes.
There's an editor for the crossword page.
So like Will Shortz does the New York Times.
And then there are a lot of famous crossword editors, at least famous in the crossword community.
But they solicit crosswords from the community, basically. They pay a couple hundred bucks, I mean, for the
better papers, for each crossword. And so, you know... So they select the ones they
want to use? Absolutely. And they edit them and make sure everything's all nice
and tight and proper. But yeah, I don't think Will Shortz has actually made his
own crossword, at least for the New York Times, in quite a long time. Seems like an easy job. He just looks at all the ones that come in; he's like, this one looks awesome,
I'm going to just print that one. Yeah, I'm sure it's that easy. That's how I would do it if
I was Will. Well, but then, you know, it's interesting. Just like any editor, he's going to have his finger on
the pulse of the zeitgeist too, right? What's appropriate, what's not appropriate. And he's
actually been taken to task recently for not having very many women constructors. You know, ideally, in our progressive society, 50% of the puzzles he
publishes would be by women, and they're nowhere near that. And so he's caught some flak for that,
and for some of the answers and clues and stuff that are in the puzzles too. It's interesting to
see what winds up getting people up in arms, and he's like, what? That seems fine. It's like, well,
I'm sorry, Will, you're now of a different generation, and the current generation doesn't think that's
okay. You know, so just like any editor, I think he's got to pay attention to that kind of stuff.
Any crossword editor has to do the same kind of thing, so it's a challenging job from an
artistic standpoint and from an audience perspective. Absolutely. Yeah. So whenever your thumb
is on the pulse of the zeitgeist, if you get outside of the cultural norms of the society that you're
editing for, then yeah, you get taken to task. That's so fascinating, that there's
something that's ostensibly as trivial as a crossword puzzle but is so deep and so
controversial. You know, when things happen... and of course, we're talking about words and their meanings.
And those things are important to folks.
So back to the plagiarism story.
So this, you started collecting this data.
You started collecting the crossword puzzle data.
And somebody used your database, which I think is published, you know, openly on your website,
to find out there's a lot of people, or maybe one person...
You can tell the story.
Someone's out there just duplicating these crosswords. Yeah, actually, it was me. I collected the data and then I actually posted
the data to Reddit, because I was kind of sick of dealing with it. I was mostly
interested in coming up with the file format and doing archival kind of research. But then, you
know, you post something to Reddit and then nobody picks it up, and it's kind of disappointing. And so
I was like, fine, I'll just go ahead and look for some stuff, whatever.
And then it just so happened that the first thing I looked for yielded some results.
And then I kept tugging at that string.
And lo and behold, there was a thing.
And there is a common thread.
There's a bunch of different cases of duplicated crosswords.
A lot of them are very interesting.
But the one that stood out was Timothy Parker,
who was the crossword editor for the USA Today
and his own syndicated service, Universal.
And he had been, it wasn't just that he had stolen,
I'm sorry, plagiarized one puzzle or two.
These were hundreds of puzzles.
And it was so egregious that it's really remarkable.
Actually, if you go to xd.saul.pw, that's where I have the site and the data for this thing.
And there's a pretty interesting visualization that I came up with well after the whole scandal was over.
But it's undeniable when you look at that thing.
There's a six-year period where literally hundreds and hundreds of puzzles were ganked or gutted or misappropriated.
Yeah.
And then there are other puzzles on there too.
There's plenty of other instances of so-called plagiarism from other authors,
but you wouldn't even notice it because of this one guy's malfeasance.
Yeah.
So there's a FiveThirtyEight feature all about this.
It's called "A Plagiarism Scandal Is Unfolding In The Crossword World," published
March 4th, 2016, with Saul's data. I'll link to that in the show notes for those
who are interested in diving deep into that. Super
interesting. One thing I want to say, and I'll give you a little bit more on this: one thing I
want to say is that Ollie Roeder put the plagiarism spin on the story, and that's the way that the story managed to unfold, but
it's not strictly clear what plagiarism means in the context of crosswords, right? We all have
this kind of notion that you shouldn't do certain things, but from my perspective, if the Library of
Congress doesn't even value crossword puzzles as a thing, are they even copyrightable? I've talked
with a couple of lawyers, and they seem to say that the whole puzzle is copyrightable, but individual clues aren't and individual pieces aren't.
And I think crosswords are actually kind of like a very interesting nexus of copyright and art and all this kind of stuff.
It's not clear to me that what Timothy Parker did was anything other than a little bit greasy, if
you will.
I like that word to describe that.
Yeah.
Like, it's not illegal.
It's not like maybe it's immoral.
I don't know.
People, of course, are getting up in arms.
They don't want their stuff reused.
But he didn't actually take crosswords from other outlets very often and redo them.
It was from his own outlet that he took them, and then he republished them in USA Today.
With some changes.
Right. And so it's like if somebody sold you a short story,
and you bought it and you published it, and then
you took that short story and republished it somewhere else, changing maybe, I
don't know, a third of the story. Is that plagiarism? I think we probably would say yes, it is. And there's
a term called self-plagiarism, where you take something that you wrote and pass it off as
something new, and it's this weird kind of gray area for me. It's not as clear-cut
as Ollie made it out to be. Well, this has been our visit to the seedy underworld of crossword
creation. Let's shift gears a little bit. I'll tell you, Saul, that I do a little bit of
legwork with guests coming on. And oftentimes I'll meet somebody and I don't know anything
about them. And I will tell you that with you, I feel the opposite. You have your website,
which we'll link up, saul.pw, has lots of information on it. I know things about you
that I don't know about some of my good friends. Like I know at age seven, uh, well, you broke your leg or was it your wrist? You broke your wrist. You broke your jaw at age 33.
You broke your arm at age 36. My first question is, is there a common theme to these
injuries? Are you secretly an Evel Knievel-style daredevil, or why are you breaking all these
bones, man? I'm secretly a klutz who tries
things beyond my level of capacity. So in every single instance of those things, I was doing
something I probably shouldn't have been doing and then wound up breaking something. So yeah,
overconfidence, I think. Welcome to the club. I've broken things myself for similar reasons.
I know other stuff. So, for instance, I know that between the
ages of 19 and 21 you spent a lot of your nights and weekends with somebody named Joey. You lived in
Urbana, Illinois. And that's, to me, a completely random fact, but you just
put that out there. I selected that fact, but tell everybody how I can find all this
stuff about you. I'm not a sleuth. Tell us about your... well, you actually have kind of a biograph. What's this thing you put on your
website? Yeah, it's called a biograph, or at least I call it a biograph. There is a thing that was made
maybe 100 years ago called a Histomap, and it was, I think, by Rand McNally, and it was basically
a time map. Over the course of about 2,000 years, they mapped the rise
and fall of civilizations, and so you can kind of see at a glance, like, here's when the Mongols
were, and here's when the Roman Empire was, and here's, you know, literally, you can see the
German, the European situation and the German Empire and stuff like that. And I love it as a
snapshot. I love things like that that take data
and present it in a visual format so you can see it at a glance. You don't have to pore through
tomes and tomes of text; you can just see it there, right? Yeah. And I was wondering what that would be
like to apply to my own life. I kind of feel like it's important for people to have this
kind of perspective on their life, right? We're all caught up in the day-to-day experience and what's
going on right now. We kind of forget that, well, wow, 10 years ago I was doing that, and
20 years ago I was doing that, and wow, I really did spend a lot of time in college with Joey, who
was my girlfriend at the time, and so that's why that was the case. And it's also interesting
too. I kind of want to put it out there. You say you're not a sleuth, but most people, I think,
look at something like that.
And if you look at the actual biograph that I've got, it's not pretty.
It's kind of a big, jarring mess of details and information.
But there's a lot of, like you noticed, a lot of interesting stories that are buried in there.
I feel comfortable sharing what I've shared because there's nothing really damning in there.
It's just a bunch of little facts.
But if you're looking and you're trying to piece together certain things,
you can discover things like that,
where like I had that person that I was really close with in college and we did
live together for a year.
And same thing is true of many other facts on that thing.
And so I feel like it's,
I don't want to say hidden in plain sight,
but you know,
anybody who's willing to do the research
can find out a lot of interesting details about my life.
But like I said, most people aren't.
Most people want to actually have me tell a story about that,
and I don't feel the need to tell that story.
So I just like to collect the data and put it up.
Yeah, well, it's a cool thing.
I'll describe it real quick visually,
and we'll, of course, link to it.
Everybody should go check this out.
On the left-hand side are the years. If you imagine an x-y axis (it's not really an x-y axis, but there
is a left-hand side), the years are going down, 2020 down to, I guess, the beginning of time. No,
1976. The beginning of my time. Yeah, the beginning of Saul's time. And then you have
different categories: residences, flatmates, weekday mornings, weekday afternoons, weekday evenings, weekends, and then specific events like years of your life. And then
they're just drawn out: where he was, who he was with, what he was up to during that time period.
And it's one of these things where you look at it, and I agree, at first it kind of just looks like a
mess. There's almost a brutalism to the design here, and you're like, what is this thing?
And then it becomes very clear pretty quickly. You're like, oh, wait a minute,
there's stories in here
like the different bone breaks
like the different flatmates and the time period
and like the how you were spending your time
and it's really a neat thing
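To make the layout described above concrete, here is a minimal text-mode sketch in Python: years run down the left side, newest first, with one column per category and each fact drawn as a labeled span of years. It is not the biograph library Saul mentions; the `Span` record, the `render` function, the column width, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One hypothetical biograph fact: a labeled run of years in one category."""
    category: str
    start: int  # first year, inclusive
    end: int    # last year, inclusive
    label: str


def render(spans: list[Span], first_year: int, last_year: int, width: int = 12) -> str:
    """Render spans as a text grid: one row per year (newest first), one column per category."""
    categories = list(dict.fromkeys(s.category for s in spans))  # keep first-seen order
    header = "year  " + "".join(c[:width].ljust(width) for c in categories)
    lines = [header.rstrip()]
    for year in range(last_year, first_year - 1, -1):
        cells = []
        for cat in categories:
            # First span in this category that covers this year, if any.
            text = next(
                (s.label for s in spans if s.category == cat and s.start <= year <= s.end),
                "",
            )
            cells.append(text[: width - 1].ljust(width))
        lines.append((f"{year}  " + "".join(cells)).rstrip())
    return "\n".join(lines)


# Hypothetical sample data, not anyone's real biography.
spans = [
    Span("residence", 2000, 2003, "City A"),
    Span("residence", 2004, 2005, "City B"),
    Span("flatmates", 2001, 2002, "Pat"),
]
print(render(spans, 2000, 2005))
```

Drawing each fact as a span rather than a single event is what lets overlaps, like who you lived with while living where, show up side by side in the same row.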
so I'm just curious is this easily reproducible
with somebody else's data
is there a tool that you use to build this thing
because I would love to have other people
be able to build this for their own life. It would be cool.
Yeah, there is a Python tool,
a Python library,
if you will, that we use. It's not that great.
You kind of have to write the stuff in Python.
I agree. I think it would be great if other people
could do this and did do this. I kind of want to make
this for each member of my family, for instance.
You know, take my dad: when
he went to school and these various events, especially the ones that happened before I was
even born. You know, I've heard about these things, but I can't place them,
and especially I can't place them in context with other history. You know, he says he and my mom got
married in whatever year, and it's like, oh, but that was the year that this happened. I don't have
that kind of context.
And so I would love to get that.
So yeah, I would like that.
Like I said, the library exists.
It's under devotees slash biograph,
I think it's called.
And people can use it.
People can look at it.
We actually have to do some work on that
and clean up the example
and update my own biograph
and stuff like that.
But if anybody is interested in helping put together their own,
I would love to either advise on it or help clean up the code myself
because that's, yeah, I think it's actually a very important thing.
I'm actually surprised that more people don't do this kind of stuff in general. Thank you.
[…] dependencies covering millions of open source projects across JavaScript, Python, Java, PHP,
Ruby, .NET, and more. Subscriptions include security updates, licensing verification and
indemnification, maintenance and code improvement, package selection and version guidance, roadmap
input, and more. The bottom line is this. You get all the capabilities you expect from commercial
software, but for all of the key open source software that you're already using and depending
on. Tidelift works with GitHub, GitLab, Bitbucket, and more. They support every
cloud platform out there. And of course, you can try it absolutely free. Start your free trial today
at Tidelift.com.
So one more thing I learned on your biograph, and this one's more on point: VisiData is, of course, a big
part of your life, so there it is on the biograph, and it seems like most of the work that you did on it, or at least started, was in 2017 and 2018 during weekday afternoons. So I'm curious, is this a work-related endeavor, or do you have a lot of free time in the afternoons? Tell us the story of VisiData. So it turns out that for some reason, leap years for me are very... I get really restive.
Something happens every four years.
And there's just like part of the cultural moment makes me really uneasy.
And anyway, this happened to be 2016.
It happened to be at the, turned out to be the end of a job that I was at.
And I was kind of looking around at my career
which has been about 20 years in software, and I realized I didn't have very much to show for it
at all, because, you know, you work for a company and you ship code, and most of the code I've actually,
you know, so-called shipped has gone nowhere, whether it's because the project was canceled or
it wasn't the right thing at the right time or whatever. And even the stuff that I did
ship, and they did put into a product,
it just gets absorbed into the board, right?
It's not mine.
It goes wherever it goes.
And it's just, I don't know, it's its own thing, right?
That's the industry.
And it's disappointing having all this time and energy
and I think things to say in terms of software
that I didn't have anything to show for it.
And so it actually turns out that I turned 40 right at the same time.
I don't think it's an accident that I started VisiData at that exact time.
It's a little bit of a midlife crisis, right?
Where I'm looking at my career and what's going on there.
And I took stock of the various projects that I'd had over the years.
And this was one that I had actually done at a previous job that I wanted for myself going forward.
At the next job, I was like, oh shoot, it'd be really great if I had this thing
just to look at some data really quickly. And of course my previous employer owned that code,
and so I didn't feel like I could actually, I couldn't steal it, right? And so I decided,
you know, I was going to rewrite it. I was just going to do it. I was going to do it once and
for all. And I have this kind of thing that I get into. It's
OAFA: once and for all. And I get into that mentality and it's a huge...
it's a mistake. It really is not the right way to do things.
Where does that come from?
So I think it's a self-management technique.
If I do things just for myself, I do a crappy job. I don't seem to like myself very much.
I don't care about my experience.
And so I just do the bare minimum.
I think most people are kind of in the same boat, right?
And yet it's really nice to have really nice tools,
like to be able to feel good about the tools that you're using.
It's like, you know, driving a janky go-kart that you've made yourself
versus a nice Ferrari.
And it's not worth it to build a Ferrari
just for yourself. It's literally, you know, there's an XKCD comic about this in terms of
when should you automate things, right? If it takes you this long to do it every time and you
only do it this many times a year, then you can spend this many hours making an automated
replacement. And so it's not worth it to make something like VisiData just for my own use case.
But if I take the mentality that,
no, I'm going to do it for other people,
I'm going to do it once and for all,
like, this is my gift to the world,
then I'm more motivated to do it well,
and then I do do it well,
and then I benefit from it too,
because I get to have that tool
and have it be a nice thing.
But like I said, it's not actually a good trade-off
just for myself.
I need to have somebody else that I'm doing it for,
even if that group or person never materializes.
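The XKCD comic Saul mentions (#1205, "Is It Worth the Time?") totals up how much time you can sink into automating a routine task before it stops paying for itself, over a five-year horizon. A minimal Python sketch of that arithmetic follows. The `users` multiplier is an illustrative addition, not part of the comic, meant to capture Saul's point that a tool built for many people earns a far bigger budget than one built only for yourself.

```python
from datetime import timedelta

# Five-year horizon, as in XKCD #1205 ("Is It Worth the Time?").
HORIZON_YEARS = 5


def automation_budget(saved_per_run: timedelta, runs_per_year: float,
                      users: int = 1, horizon_years: float = HORIZON_YEARS) -> timedelta:
    """Time you could spend automating a task before it costs more than it saves.

    `users` is an illustrative extension (not in the comic): the same saving
    repeated by every person who uses the tool.
    """
    return saved_per_run * (runs_per_year * horizon_years * users)


# Shaving 5 minutes off a weekly chore, for one person vs. for 100 people.
just_me = automation_budget(timedelta(minutes=5), runs_per_year=52)
everyone = automation_budget(timedelta(minutes=5), runs_per_year=52, users=100)
print(just_me)    # 21:40:00
print(everyone)   # 90 days, 6:40:00
```

About a day of effort is justified for a personal chore, while the same saving spread over a hundred users justifies months of work.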
And in the case of VisiData,
I mean, it is a generally useful tool
and people have come up and started using it.
And the more people that use it,
the more motivated I am to make it do more
or make it better in whatever sense.
So that's how that came about.
And also, you know, in 2016,
there happened to be some other event right around my birthday that kind of took its toll emotionally on me and, I'm sure, many other people. And I decided, I think part of this is also
channeling that kind of, you know, negative energy or fearful energy about the state of the world
into something that I can directly control and contribute and make the world a little bit of a better place.
So how long was it from conception,
"I'm going to do this once and for all," of VisiData to user number two? I assume you're
user number one.
Yep.
And so, you know, there's the conception, the idea, the excitement, usually, then there's the toil,
there's the perspiration, right?
The 1% inspiration, the 99% to get from you to two. Yeah. How long was it? How much work was it?
So I don't actually know, because it wound up being basically only a couple of weeks before I had
something that I was willing to show. And I posted it on Reddit again, and I got a couple of people
to basically complain that I had no unit tests, which is kind of funny.
Sounds like Reddit, right?
And so where I was getting at before was that at the end of this
job and this phase of my life, I decided to take a sabbatical. I took a year off very consciously
from work. I've done this a couple of times and I have, I have to say, it's one of the best things
I have ever done for myself. It definitely sets you back financially and, you know, puts a
crack in your career and stuff like that. But personally, I've never had any better times when
I've taken these sabbaticals. And VisiData was the focus of this. And I wound up actually
attending the Recurse Center in New York City for three months. And I would say that's where I got, if you will, my second user.
And, you know, everybody there is very supportive.
It's basically, it's a retreat, a programming retreat that you go on for three months with
these other people who are in the same program, kind of an unlearning environment, or I'm
sorry, an unschooling environment.
Unlearning, you forget stuff while you're there.
And so there was somebody there, Moritz,
who just picked up the tool and used it
for one of the projects he was working on.
And I was like, oh my God, thank you.
Like I had given a demo of the thing
and I was kind of used to like having people go,
oh, that's really great.
And then of course just ignoring it
because everybody's got all this crap
they were supposed to see
and that's what he cares to learn about.
And he actually picked it up and used it.
And I was very... I was kind of touched, I guess. Like, it's like, oh yeah, this
actually is useful in this sense. And I kept on working on it there, and I got a little more
support there. And I think that the thing that really kicked it off, my real,
you know, first user that I had that wasn't somebody that I knew, was when I released it just after the Recurse Center and it hit the front page of Hacker News. And I was like,
okay, great.
You know, people will vote up anything, and they'll kind of look at something for three minutes and whatever. But then one particular
person really took notice and really started using it and started contacting me, and that was Jeremy Singer-Vine.
And he's the data editor for BuzzFeed. And he's been using it ever since then.
So I'd say almost three years now, he's been using it himself.
And he's been a great source of both motivation and inspiration
in terms of what kinds of features we want to support, how he uses it.
So he does it a lot for data exploration.
If you get 1,000 data sets a week, you've got to be able to dive into each one of them
very, very quickly and bounce around
and kind of get the feel for it.
And half the data sets are completely
worthless, but you want to find that out as quickly as possible.
And VisiData is great for that. In terms of data exploration,
you can get anywhere really, really quickly.
And then you do use
better or stronger or whatever
more specific tools when you've
gotten to that point. But
for the first 10% of exploration,
I don't think there's anything that beats it.
So just on the homepage, visidata.org,
it shows off some of the things it can do.
It's a terminal tool, a visualization tool
inside your terminal.
And you talked about the duct tape analogy.
Right now you have as the tagline,
data science without the drudgery.
And the way you present this is pretty interesting, I think. So you say that it's duct tape, or kind of glue, stringing data together and getting something
out. And you have this double select, so on the left-hand side: you have a blank, but you need a
blank. And on the first one, it's like the typical data formats that we all know and love slash hate
in many cases, you know: CSV, JSON, H... I mean, there's lots of formats supported.
You can throw a SQLite database at it.
If you still got some Excel 97 files laying around,
VisiData can handle that.
And then on the right-hand side,
you have kind of the outputs that are possible,
and I think probably most people use it
for data exploration
and really the kind of data-science-y sleuthing that you're talking about,
so they probably use the terminal interface. But you can also output, you know, you can clean up text,
you can output, you know, plain text, JSON, these kinds of things. Has this been a feature set that
you just built up over time? Did you have like a first use case where it's like CSV in and terminal
interface out? What was your initial concept of what this thing should do?
Wow.
Yeah, that's a big question.
So I was calling it a CSV viewer, sorry, crappy CSV viewer for a long time.
And part of that's just, you know, some kind of false humility.
But also I kind of wanted to keep it a small project, right?
Like, you know, get in, get out and have it be a three month thing
But
in actuality, the thing that really drew me to the project in the first place was the flexibility, the kind of almost
universality of the interface, and
that's part of both the blessing and the curse of it, right, because of the structure.
So one of the central theses, I guess, of VisiData is
that everything is data, including metadata and internal data, and etc. And so one of the things
that you can do with VisiData, for instance, is you can go to a list of the columns of the current
sheet as a sheet itself. And you can modify the columns on that sheet, you can edit the names,
change the widths, you can change the types. Basically, you interact with the columns just like they're their own sheet. I don't know of any
other tool that treats stuff like that. I mean imagine if you had a hundred columns
in Excel and you could pop open an Excel spreadsheet of those columns and have that stuff reflected
back to the columns themselves on the original sheet. And that was one of the original features
in the tool when I had made it at F5 in the first place.
And I kind of, it was a little bit of a throwaway feature there. I was like, oh this is neat, okay, hey.
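Saul's "columns as a sheet" idea can be sketched in plain Python: the metadata sheet's rows are the very same column objects the data sheet uses, so an edit made on the columns sheet shows up on the original with nothing to copy back. The `Column`, `Sheet`, `columns_sheet`, and `set_cell` names below are hypothetical and do not reflect VisiData's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Column:
    name: str
    width: int = 10
    kind: type = str  # the column's value type


@dataclass
class Sheet:
    name: str
    columns: List[Column]
    rows: List[Any] = field(default_factory=list)


def columns_sheet(sheet: Sheet) -> Sheet:
    """A sheet describing `sheet`'s columns; its rows ARE those Column objects."""
    meta = [Column("name"), Column("width", kind=int), Column("kind")]
    return Sheet(name=f"{sheet.name}_columns", columns=meta, rows=sheet.columns)


def set_cell(sheet: Sheet, row_index: int, column_name: str, value: Any) -> None:
    """Edit one cell; on a columns sheet this mutates the underlying Column."""
    setattr(sheet.rows[row_index], column_name, value)


people = Sheet("people", columns=[Column("nm"), Column("age", kind=int)],
               rows=[["Ana", 31], ["Ben", 27]])
meta = columns_sheet(people)
set_cell(meta, 0, "name", "name")   # rename "nm" from the columns sheet
set_cell(meta, 1, "width", 4)       # narrow "age" the same way
print([(c.name, c.width) for c in people.columns])  # [('name', 10), ('age', 4)]
```

The design choice being illustrated is sharing rather than copying: because both sheets hold references to the same objects, there is no synchronization step to get wrong.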
But then as I got to making it again this time, I just kept finding more and more uses for it.
And it's just the internal structure is so that it's really easy
to add new loaders. And so, you know, you add a CSV loader, you add a JSON loader, and because Python
has such a rich ecosystem, you can go ahead and add an Excel loader. And it's just importing
this third-party library and just using it. They're really actually small things. And so, yeah,
it's increased and added over time. But really,
every one of those things is, you know, a handful of lines of code, because it fits
in with the structure of VisiData internally. And so in some sense, I'm a
little bit, I won't say frustrated, but there's all kinds of great data tools out there. But
they're usually very format specific, right? You have the CSV viewer, you have this JSON editor or whatever. And there's no need for that to be the case, right? I think
any tool that does data should be able to take data from many different formats and sources.
Obviously, you need like an ecosystem like Python to make that easy to do as a developer. But
it's all possible. And especially in this day and age, when there's so much code out there in the first place to do all this kind of stuff.
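To illustrate why each loader can stay a handful of lines once the surrounding structure exists, here is a minimal sketch of a loader registry keyed by file extension, using only Python's standard `csv` and `json` modules. The `loader` decorator, `LOADERS` table, and `open_table` function are hypothetical, not VisiData's plugin API, and the JSON loader assumes an array of flat objects.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

Table = Tuple[List[str], List[list]]  # (header, rows)

# Hypothetical registry mapping a file extension to a loader function.
LOADERS: Dict[str, Callable[[str], Table]] = {}


def loader(ext: str):
    """Decorator that registers a loader for files ending in `.ext`."""
    def register(fn: Callable[[str], Table]) -> Callable[[str], Table]:
        LOADERS[ext] = fn
        return fn
    return register


@loader("csv")
def load_csv(text: str) -> Table:
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


@loader("json")
def load_json(text: str) -> Table:
    records = json.loads(text)  # assumes a JSON array of flat objects
    header = list(records[0]) if records else []
    return header, [[rec.get(key) for key in header] for rec in records]


def open_table(filename: str, text: str) -> Table:
    """Dispatch on the extension; every loader returns the same (header, rows) shape."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return LOADERS[ext](text)


print(open_table("pets.csv", "name,legs\ncat,4\nbird,2\n"))
# (['name', 'legs'], [['cat', '4'], ['bird', '2']])
print(open_table("pets.json", '[{"name": "cat", "legs": 4}]'))
# (['name', 'legs'], [['cat', 4]])
```

Supporting another format means registering one more small function; a spreadsheet loader, for example, would wrap a third-party reader such as `openpyxl` and return the same `(header, rows)` shape.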
And one thing led to another. I don't know, I kind of just want to tout that year that I was on that sabbatical, I was working on it during the weekdays and I was being kind of obsessive about it.
There was the time that I was like, you know what, I want to make a Git interface with VisiData. And it's so neat to be able to open up
a Git history in VisiData
and be able to do a frequency
analysis of, for instance, a contributor
and see instantly who's got the most contributions
and then dive into their contributions.
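The contributor frequency analysis Saul describes boils down to counting commits per author. The sketch below does that with `collections.Counter` on the output of `git log --format=%an`, which prints one author name per commit. It is a stand-alone illustration, not how VisiData's Git support is implemented, and the sample names are made up.

```python
import subprocess
from collections import Counter
from typing import List, Tuple


def git_authors(repo_path: str = ".") -> List[str]:
    """One author name per commit in `repo_path` (needs `git` on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def contributor_frequency(authors: List[str]) -> List[Tuple[str, int]]:
    """(author, commit count) pairs, most prolific first."""
    return Counter(authors).most_common()


# Made-up sample so the example runs without a repository.
sample = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(contributor_frequency(sample))  # [('alice', 3), ('bob', 2), ('carol', 1)]
```

Against a real checkout you would call `contributor_frequency(git_authors("path/to/repo"))`; diving into one person's contributions is then a filter over the same log.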
And I think there's actually still
a lot of room for something like a Git interface where
you could, again, go ahead and bulk edit
Git commits and have it
automatically rebase everything for you.
So you're kind of like curating your Git history
in a tabular format.
The more that I work with it and see it,
the more I think that everything really is data.
I mean, of course, that's the case.
And if you treat things like that
and put everything into a tabular format,
you've got instant...
The organization really just makes everything
instantly more powerful.
That does sound intriguing.
And then the other thing that really was awesome was that October, I think,
I was inspired by drawille,
which is a terminal drawing library that works with Braille characters.
So you can get like eight times the resolution.
And I decided to just cram that into VisiData. And so VisiData has a graphing functionality.
Everything's a scatterplot, but you can do a lot of work with scatterplots and you can
do it all in the terminal and you can zoom in and zoom out and select areas.
And it's like, I was kind of personally myself also very surprised I was able to pull this
off and make it work as well as it does.
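The resolution trick behind drawille-style terminal graphics is that each Unicode Braille character (U+2800 to U+28FF) is a 2-by-4 grid of dots, so a single character cell can show eight plottable points. A minimal Python sketch of plotting pixels that way follows; it illustrates the encoding only and is not drawille's or VisiData's code, and `braille_plot` is a hypothetical name.

```python
from typing import Iterable, Tuple

BRAILLE_BLANK = 0x2800  # U+2800, the empty Braille pattern
# Bit for each dot of a 2-wide by 4-tall Braille cell, indexed [row][col].
DOT_BITS = [
    [0x01, 0x08],  # dots 1 and 4
    [0x02, 0x10],  # dots 2 and 5
    [0x04, 0x20],  # dots 3 and 6
    [0x40, 0x80],  # dots 7 and 8
]


def braille_plot(points: Iterable[Tuple[int, int]], width: int, height: int) -> str:
    """Render (x, y) pixels, y growing downward, as lines of Braille characters."""
    cols, rows = (width + 1) // 2, (height + 3) // 4
    cells = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:  # silently drop off-canvas points
            cells[y // 4][x // 2] |= DOT_BITS[y % 4][x % 2]
    return "\n".join("".join(chr(BRAILLE_BLANK + bits) for bits in row) for row in cells)


# An 8x8-pixel diagonal fits in just 4 characters across 2 lines.
print(braille_plot([(i, i) for i in range(8)], width=8, height=8))
```

Zooming, as Saul describes, amounts to changing how data coordinates are scaled onto this pixel grid before plotting.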
So I kind of, like, went nuts that year, basically, and
VisiData was where I went nuts.
That's awesome. So it's 2020, we're four years later, it's a leap
year. Sure it is. And so you're coming up on your four-year reset.
You've got VisiData 2.0 coming out. Happens to also be an election year, but surely unrelated. What's
going on now? You've got 2.0 on the rise. Tell us: obviously you're not
done with this project, because you're coming out with a new version. So are you looking to continue this into
the future?
Well, so this is actually very interesting. I want to get a 2.0 out because I want to have
a stable platform for people to develop their own plugins or apps or extensions for VisiData, their own loaders, for instance, right?
To have it be a platform that people can contribute to, with the whole everything-is-data concept.
And that doesn't happen unless you have a stable platform.
And I don't feel like the API... it's a little all over the place.
I want to be more consistent and documented
and all that kind of stuff. And I have to admit that
I mean, we've been talking about 2.0 for
months and months, if not a year at this
point. And the
thing that's holding up 2.0 actually
is SemVer.
And I
wanted to make this point because I feel
this deep in my bones where
I want to commit to a stable
API and I'm not there yet. I don't think this is there yet. I think there are things I definitely
want to change. And so I don't feel comfortable releasing it because according to SemVer,
once you make a major version, you're stuck with that interface, right? And I'm not comfortable
with this as it is. I feel like I have to go through this whole process of both, you
know, auditing the code or kind of combing through it and finding the bits that I don't like and changing the interface and then documenting the
interface. And so I've been kind of saying, well, you know, let's call it 2.0 because it actually
is, there have been some radical changes to it. We've added a lot of cool features like undo and
a lot of, I can't even go over the things. It's been so long since I've been on this,
since the last real, you know, quote, real release. And so we probably could release very soon
if I wasn't concerned about the API.
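For reference, the Semantic Versioning 2.0.0 rules Saul is weighing map each release to one of three bumps: an incompatible API change bumps major, a backward-compatible feature bumps minor, and a backward-compatible fix bumps patch. The sketch below encodes only those rules for plain `MAJOR.MINOR.PATCH` strings (no pre-release or build tags, and no special handling of `0.y.z`); the `bump` function and its change labels are illustrative.

```python
def bump(version: str, change: str) -> str:
    """Next SemVer version for a release whose most significant change is `change`.

    change: "breaking" (incompatible API change), "feature" (backward-compatible
    addition) or "fix" (backward-compatible bug fix).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


print(bump("1.5.2", "fix"))       # 1.5.3
print(bump("1.5.2", "feature"))   # 1.6.0
print(bump("1.5.2", "breaking"))  # 2.0.0
```

The arithmetic is trivial; the hard part Saul describes is the classification, since a change filed as a "feature" that later proves incompatible was really "breaking". (For that case, the SemVer FAQ's own advice is to release a new minor version that restores backward compatibility rather than to alter the released version.)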
But because of the whole SemVer mentality,
and knowing that, aside from just me saying we're not SemVer,
the second that I say we're not SemVer,
you attract a whole lot of kind of ire,
the ire of the open source community.
It's like, you should be SemVer.
And it's like, I don't know if Semver actually is the right thing in general.
I'm kind of like kind of coming away from that myself.
I think that it's actually very difficult to maintain APIs that are completely backwards
compatible.
And you wind up making mistakes, right?
You wind up making a change that seems like it's backward compatible, and it turns out
it's not.
And if you didn't bump the version then, are you supposed to then make a new release and
bump it when you discover it?
Right.
So what you end up doing is bumping major versions way more often.
Right, right.
As a result of that, which, in turn, is negative. I mean, some people don't like that, and it's like,
well...
Right. Is VisiData used as a dependency very often, or is it more of a command-line tool that
you use as, like, a final application? It seems more like the latter, but maybe there are people that are
embedding it or something.
Well, most people, I think, are using it like that. I think the thing I want to do is I
want to encourage people to use it as that. I think it's happening more and more. There's a
plugin architecture now, and actually there are some people who are making plugins for their own
things. I really want to encourage that because, you know, we get feature requests all the time
on the VisiData GitHub issues list. And as much as I like a lot of the feature ideas, I don't have the time
to implement that kind of stuff.
Even if I didn't have a job,
there are still other directions
that I would want to take it in.
But I want to encourage people
to do their own thing
and to experiment
and to feel free to ship code
outside of the VisiData release cycle.
Not only that,
I don't want to own
someone else's code.
You know, someone else makes a feature for VisiData, and I feel very uncomfortable taking it into VisiData if I don't fully understand it, right?
Because you've got to maintain it, right? They write it once and you maintain it forever.
Exactly. So they lob it in, and it's kind of like, I don't want to say no to this, you know, great
feature. And yet, the more experience I have in the software world, the more it's like, wow,
that's going to take effort on my part.
And I don't know how long.
I mean, I'm definitely going to be maintaining VisiData for a long time,
but I really kind of hope at some point,
maybe even in the near future,
that I can kind of set it aside and let it run itself
because I have other things I want to work on, right?
And so I think part of that is having a viable plug-in architecture
and ecosystem so that people can contribute.
But then in order to have that, you need to have a stable platform with a documented API and everything being stable.
Right. If it wasn't for the plugin architecture, I could say, well, just throw SemVer out. You're a tool;
people can just use the tool at whatever version they grab it at, and they can just update. There's
not really much of a reason to have SemVer for, like, an application that you use. But if you want to have a platform that people are building plugins for, then you do
have to make some decisions and some guarantees around APIs for them to feel comfortable doing
that. 'Cause I've definitely built plugins for unstable APIs, and I don't make that
mistake anymore.
Right. Yep.
When the ground gets swept out from under you... fool me once, shame on me,
but I'm not going to do that anymore.
So I understand your reticence to bump that.
I will say, I wanted to talk about the feature requests
that we get for VisiData.
And you mentioned you're interested in the process
behind open source software development.
For sure, yeah.
And I don't like having a huge amount of open issues on the project board,
because, I don't know, having 400 open issues just feels both unwieldy on my end, and
also I feel like it looks bad to people coming by, and they might think it's a buggy thing,
even if 390 of them are feature requests. And so we've actually come up with
a policy that we use. I kind of like
it, which is to mark it as a wishlist item, in brackets in the title, and then close it. I
basically make a comment saying, "This sounds great, I hope somebody can land it, if it's you or
anybody else, I'd love to help, whatever. Closing as per wishlist policy." And our policy is just, we
have, like, a Marie Kondo, keep-things-nice-and-neat-on-the-issues-board mentality about it.
And I hope that doesn't turn anybody off.
But I think it's really nice that we have, I think, like 20 open issues right now,
almost all of which are something that I think we probably actually should fix before we ship.
Or if we don't, we actually actively want help on,
as opposed to having, you know, hundreds of feature requests that would be nice to have,
but I can't put any time into right now.
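The wishlist policy Saul describes (tag the title in brackets, leave a comment, close the issue) is mechanical enough to sketch as a pure function over an issue record. The `[wishlist]` tag text, the comment wording, and the dict shape are assumptions for illustration; the resulting `title` and `state` values mirror fields that GitHub's REST "update an issue" endpoint accepts, but no API call is made here.

```python
WISHLIST_TAG = "[wishlist]"  # assumed tag text; the transcript only says "in brackets"
CLOSING_NOTE = "Sounds great! Closing as per wishlist policy; help landing it is welcome."


def apply_wishlist_policy(issue: dict) -> dict:
    """Return a copy of `issue` tagged as a wishlist item, closed, with a note."""
    title = issue["title"]
    if not title.lower().startswith(WISHLIST_TAG):  # don't tag twice
        title = f"{WISHLIST_TAG} {title}"
    comments = list(issue.get("comments", [])) + [CLOSING_NOTE]
    return {**issue, "title": title, "state": "closed", "comments": comments}


request = {"number": 101, "title": "Add a sparkline column", "state": "open"}  # made-up issue
closed = apply_wishlist_policy(request)
print(closed["title"], closed["state"])  # [wishlist] Add a sparkline column closed
```

Keeping the tag in the title, rather than only in a label, is what makes closed wishlist items easy to find with a plain text search of the issue tracker.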
Right. So how do you surface those wish lists in terms of somebody else coming to the project,
you know, dupes and whatnot?
So say I want a feature.
With wishlists closed, they're not visible to me.
I'm probably not going to search closed issues.
I'll look through the open issues
and see if somebody else has requested this.
But do you ever have people asking
for the same feature, or,
"by the way, this has already been closed"?
Is there a way of surfacing those wishlist items
and saying, like, here's what people want?
Maybe they can thumbs it up or something,
even though it's a closed item?
Yeah, I mean, people can search for closed items.
I think people do search through the issues as a whole
more often than you might think.
Like maybe not for a wishlist item,
but in general, maybe this is also how VisiData is.
It has so many features.
People are like, I couldn't possibly know
all the things that it does.
And so then they wind up searching through the issues
for how do I do this thing?
And only when they discover that it's not there at all,
do they file a feature request for it.
And so I feel like in that initial search,
where like, how do I do this one thing?
Then hopefully they find the wishlist item
where it does exist, or, if it hasn't already been suggested,
then they submit it.
Also, there's been a pretty active community around VisiData.
People will file issues and then somebody will chime in and say,
oh yeah, this is over here or I'm working on this or whatever.
And so there's a little bit of community interaction.
I wanted to thank AJ.
I think I know the AJ that you're talking about who recommended me for this show.
And he's been exactly one of those people who's been active in the project.
And even just as a voice
to talk to new people who come on,
who want to say,
who are filing bugs, whatever,
even just to have somebody
who responds quickly,
I think is a really good practice.
And so people will say,
hey, I want this feature
and somebody else will say,
oh yeah, we don't do that kind of thing.
Or here's where you can look
at how you do it yourself
or great idea, wishlist, close.
Right. Yeah, that's awesome. So on the About page, you do have a list
of contributors. I do see AJ on there now, as well as... is it Anja? Is that how you pronounce that name?
She seems like a major contributor.
Yeah, she's just been a great help. I met her at the Recurse Center.
She's been fantastic.
Awesome. And then you also have a bunch of patrons. So you said this was
your gift to the world. That
being said, people can throw you, as you call it, cold hard cash via Patreon. Curious how that
has gone: the idea for Patreon, if you like the platform, and then if you've had any patrons along
the way. So you've got maybe two dozen or so listed here who've helped support the project,
you know, non-code contributions, but maybe financial
contributions.
Yeah. First of all, I want to give a big thanks to Christian Warden and
October Swimmer, which is his Salesforce consultancy. He's our corporate sponsor. He's the
top contributor to my Patreon. And the reason I started Patreon was because
originally I had a TinyLetter where I would be sending
out content every week or however often I could do it.
And I found that I didn't get any engagement.
Like I sent things out to people and then you can ask up and down all day long.
You know, do you have it?
We've got this feature.
How about this question?
Whatever.
I got almost nothing.
And I felt like it was a lot of effort on my part to put together those things and I
didn't get any reward for it.
And so with Patreon, it's like, well, here's a very low effort way.
It's like I can collect kind of an audience,
and people can contribute to whatever level they feel comfortable with.
And, you know, I have to say that I want to address the asymmetry
in open source software development, where, yeah,
this is my so-called passion project, right,
where I put my heart and soul into it, but it's not... again, it's not worth it for me. I want people to... I want
people to contribute. I want people to promote it, to add code to it.
When you say it's not worth it for
you, you mean that the work you put in would not be worth that effort if you were the only one
using it?
Yes, right,
exactly.
But it is worth it, like, it's your passion project. It is worth it to you to do this, but not
if you were the only user. That's what you're trying to say?
Yes, exactly. Sorry, yes.
Okay, just
making sure I'm hearing you correctly.
Yes. The only thing that I get out of VisiData is looking at
the usage numbers. I have a way of figuring out how many people are using VisiData in general.
And I look at that chart and I'm like,
oh, I feel like warm and fuzzy
when I see that number going up, right?
Yeah.
Which is a classic ego mentality.
Well, it's your impact.
I mean, it's just showing you helping others.
There's some ego there,
but there's also some altruism there.
Right, yeah. And I think that if I was, if there was only 10 people who are using it on a regular
basis, I'd probably be like, okay, well, that's great, but I'll move on to something else. But,
you know, seeing that number grow and grow and grow makes me more motivated and having people
being willing to put down cold, hard cash, even if it's just a couple of bucks is meaningful.
It's like, no, it's really hard to get people to even put a couple of bucks towards something.
It is.
And so I'm so much more willing to work on something
if someone's subscribed via Patreon, if they have an idea.
And like I said, it's not that they're paying me to do it.
Like it's not worth it for me to get $3
to do whatever feature that takes me four hours.
But just the mere fact that they're, you know,
invested in any way is motivating.
And so that's the reason behind the Patreon being a Patreon thing. And I actually don't take
money out of it as an income for myself. I have a full-time job, and so I don't really need
that money in a certain sense. What I do with that money is I run other experiments. So, for instance,
this last winter I was working with a guy called,
his name is Tom Buckley Houston, and I hope I pronounced that right. Hopefully, Tom.
Anyway, I wanted to see if I could get VisiData on the web. And so he makes brow.sh,
as he calls it, which is a text interface to the browser. You can basically run a full
browser in your terminal. And you can do things like play YouTube videos through it. Like, it's
nice.
It is pretty cool.
Yeah, mind-blowing. So he's my kind of nuts. And so I contacted him and
wanted to hire him just for a little bit to see if we could get a version of VisiData on the web.
And he whipped something up, and we actually used GoTTY,
and we kind of got something working.
And I didn't wind up actually publishing it or pushing it out
because I know that I might be inviting a stampede,
and it's kind of a big hassle to get the ability to scale things
and make it be viable.
But as far as just having a very simple interface to a server that's running
VisiData, we totally... he totally pulled it off.
And so I use the money that I get from Patreon to support stuff like that.
Those kinds of, those kinds of things,
those little experiments that like, you know,
it's not worth it for me to spend a few thousand dollars to do something like
that. But if it's essentially, you know, quote free money, then well, sure.
Let's play around and see how this goes.
Plus, it's investing it back into the project,
which I'm sure your patrons would, I mean, they want to support you, but they want to support
VisiData. And so it's like, this is like directly supporting that, whether it's putting food on your
table or putting research and development into the tool itself. I think they're happy probably
either way. So it's a great use of those resources, especially if you don't personally need them to live your life.
Yeah, it's a good point.
Very cool.
Well, VisiData, awesome project.
Definitely has piqued my interest.
Thanks again to AJ for recommending it
and for being a key part of Saul's community around this.
If you're interested, of course,
all the links to all the things are in the show notes right
where they belong.
Saul, thanks so much for coming on Maintainer Spotlight.
Thanks for putting this project out there and all the cool stuff you're doing on your
website.
Any final words to the hackers out there, the open source community regarding VisiData
or what you're up to?
What would be your call to action?
If somebody is listening and they're like, Saul seems cool,
his project seems cool, what would you want them to do? Give it a try? Hop on your Patreon? Hop onto
your GitHub issues? What's the best way somebody can get involved with VisiData?
I would say that
the things that are most meaningful to me are to give it a try, so install it, play around with
it for, I'd say, an hour. It seems like if people can play around with it for an hour,
then they either get it or they don't.
If they get it, they're kind of hooked.
And then to say it out loud, say it in public, on Twitter or whatever,
and say why you like it and what has been mind-blowing for you.
I feel like that's the one piece that I haven't been able to get,
and it's hard for me to say it myself.
Obviously, I think it's great, but the more that I say it's great, people are like, yeah, sure, he loves his own baby, whatever. So yeah, if you do like it, then say it out loud.
Say it loud, say it proud.
There you go.
And yeah, otherwise, one other thing that I am looking for: I think that there's a
gap in VisiData's packaging in terms of Windows releases.
Like we have it for Linux and for Mac.
It comes as part of the Python installation stuff.
But you can run it on Windows
using, for instance, the Windows Subsystem for Linux.
But it's a little bit of a pain to install.
And so that turns a lot of people off.
And of course, a lot of people are running Windows.
So if there's anybody out there
who has experience with making a Windows
release, like an actual package for Windows, especially
from a Python project and especially for a Python
terminal project, I would love to talk
with them. We really could use a
Windows release manager. I think it would really increase
adoption and usage of
VisiData.
Very good.
Well, Saul, I've had a blast chatting with you.
Thanks so much for coming on the show and
thanks for all the open source work you do.
Thank you very much for having me.
It's very nice meeting you.
All right.
Comment at us on changelog.com.
Are you going to take VisiData for a test drive?
Interested in starting an open source project around biographs?
Have more questions for Saul about the seedy underworld of crossword creators?
Here's what to do.
Pop open your show notes, click the discuss this on changelog news link
and let your voice be heard.
It's super easy and totally free.
Thanks again to Tidelift,
our partners on this maintainer spotlight series.
Tidelift is managed open source,
backed by maintainers.
And hey, they have a new webinar right now.
Best practices for open source app development
in a downturn.
This isn't a downturn, is it?
Yeah, this is a downturn. Learn more at Tidelift.com. Our beats are farm fresh and we get them from
Breakmaster Cylinder. And we're brought to you by amazing people at companies who get it. Thanks
again to Fastly, Linode, and Rollbar. On the next episode, we have GitHub's CTO, Jason Warner. I
don't want to say too much, but trust me, you want to listen to that one. Subscribe if you haven't
at changelog.com slash podcast, or search for The Changelog in your favorite podcast app.
You'll find us.
That's all for now.
We'll talk to you again next time.