The Changelog: Software Development, Open Source - Data Science at OSCON (Interview)
Episode Date: November 10, 2017
We went back into the archives to conversations we had around data science at OSCON 2017. We talked with Vida Williams (Data Scientist) and Michelle Casbon (Director of Data Science at Qordoba) about the social impact of open data, personal data and transparency, privacy, the big data problem of public surveillance, electronic fingerprinting, the rift between data scientists and computer scientists, natural language processing, machine learning, and more.
Transcript
Bandwidth for Changelog is provided by Fastly.
Learn more at fastly.com.
And we're hosted on Linode servers.
Head to linode.com slash changelog.
This episode is brought to you by Bugsnag.
Bugsnag is mission control for software quality.
And on this segment, I'm talking with James Smith, co-founder and CEO of Bugsnag,
about the core problem they're solving for software teams,
and why you should head to bugsnag.com slash changelog to test it out with your team.
Let's start with this. You mentioned you and Simon. So at one point you obviously didn't have
this company, right? As founders, as engineers, you ran into a problem. What was that problem?
Why does Bugsnag exist? Simon, my co-founder, and I met in college. We went off to
build software for other companies: I ended up in a startup, he ended up in enterprise software,
and we had the same problem in both of these companies. When things break, it's really hard
to figure out how badly they're broken, who's impacted, and what to fix first. We both had
this problem ourselves, so we decided, hey, why is no one doing a good job of fixing this problem right now? So very much Bugsnag was born out of scratching our own itch, as they say. … new features to your customers, or you want to build cool new stuff, but at the same time you've
got to fix bugs, because no matter how good a coder you are, you're going to introduce bugs. But there's
no clear definition of where to set that slider. Should I be fixing bugs now, or should I be
releasing features? This tension exists, I think, in all product teams, all software teams.
If you don't have a tool like Bugsnag, it's very difficult for you to figure out
where to spend time.
And so that's the idea here,
is we're trying to help teams understand
whether they should be building or fixing,
because there's a bit of a delicate balance between both.
So if your team is unsure of how to spend their time
building or fixing, give Bugsnag a try.
It's free to get started with a 45-day extended trial exclusive to our listeners.
Head to bugsnag.com slash changelog.
And by Linode.
Everything we do here at Changelog is hosted on Linode servers.
Pick a plan, pick a distro, and pick a location.
And in seconds, deploy your virtual server. Drool-worthy hardware, SSD cloud storage, 40 gigabit network, Intel E5 processors,
simple, easy control panel, nine data centers, three regions: anywhere in the world, they've
got you covered. Head to linode.com slash changelog and get $20 in hosting credit.
Thank you.
… Data Science at OSCON 2017. We talked to Vida Williams, Data Scientist, Educator, and Entrepreneur, and also Michelle
Casbon, Director of Data Science at Qordoba.
We talked about the social impact of open data, personal data and transparency, privacy,
the big data problem of public surveillance, electronic fingerprinting, the rift between
data scientists and computer scientists, natural language processing, machine learning, and so much more. Enjoy the show.
Unless you're a data practitioner, in the world of open source developers
it's not really core to everything. I have to make a compelling case for it to be interesting.
I see data science and I get excited.
Yeah.
And I'm an open source developer.
So, yeah.
Maybe I'm the outlier.
No, well, it was interesting, because one of the things I talk about is open data.
That's specifically what I'm interested in: the social impact of open data.
How do we come together?
That's what we want to talk about.
But that's my thing.
Right.
And there's just now a burgeoning conversation around it.
I think we tried to have it, interestingly enough, 20 years ago,
but there wasn't an infrastructure for open data at the time.
Who's we?
Data practitioners.
I mean, my first big project was a DPA data project,
so that was big data before big data was big.
We were doing something stupid that 15 years later we knew not to do,
moving from mainframe into relational. Like, I don't want to do that
to that volume of data. That being said, at the time there were discussions
around transparency and open data and who should have access to it, but there
were no standards, there were no protocols, there was no access, there
were no platforms.
So now we're finally in a place where we can have this discussion because especially in the open source, all that stuff exists.
So now it's regathering the Avengers, if you will,
all the data superheroes and going,
hey, we can now hold everybody accountable for privacy,
for serenization, for protocols on access
in order to actually make a difference.
So why don't we do that?
So anyway, that's what the talk was about.
Cool.
Interesting.
We've actually had some shows.
We've been around for a while.
We started this show in 2009, and we've talked about open data,
mostly in the government space, a couple times.
Yeah.
Yeah.
I'm looking for some, like, older shows.
It's been a while.
Like, Civic Hacking with... this is like the first one,
with Luigi Montanez and Jeremy Carbaugh.
That was when they were both working at...
Sunlight Labs?
Yeah, Sunlight Labs.
Sunlight Foundation?
Yeah.
Well, now you have the Presidential Innovation Fellows, the PIFs, right,
who are in that whole White House-sponsored open data platform.
But an interesting question came up in my session about
whether this conversation had happened before, and what we do about the question of privacy.
So it was really like, okay, if everybody's supposed to have this personal data,
then how do we accomplish this with respect to privacy?
And my response was we need to hold,
we as data practitioners need to challenge the hypocrisy of privacy.
We want to put a camera everywhere and be able to develop reality TV,
and there's no privacy conversation there.
But all of a sudden you're a data point, and all of a sudden there's a need for privacy.
So we as practitioners need to actually challenge the definition of data, as though an image is somehow not data and thus exempt from privacy.
But if you're a number or some type of codified information, then all of a sudden it's privacy
rules.
That's interesting.
I never really considered the idea of cameras being everywhere,
and considering that, I hate that too.
I mean, I may be somewhat of a devil's advocate,
but I'm not sure of your perspective.
It kind of bugs me that you can take six data points
and figure out exactly who I am.
Absolutely.
Male, color, where I originated from, how much money I probably make,
if I have kids. You can take six data points and pretty much figure out roughly everything about me besides my name.
That's the world we live in, but should we accept that?
Is it okay to have all that?
And I was born in '79, so I'm 38 years old.
For people born today, it's like second nature.
They have no expectation of privacy.
Well, okay, so where I sit on it: I'm an introvert data geek, so I don't want anybody to know anything.
Okay, so maybe I'm not a devil's advocate.
No, no, no.
I don't want anybody... you know, I'm one of the first ones to say I'm falling off the grid for said period of time and you can't get me. But I think, having been in technology for so long, I also
strike a cool balance between the fact that in order for us
to have this technological infrastructure
and the innovation revolution that we're currently in,
we have already, as a country at minimum (the world a little bit less, but equally), made a decision to forgo privacy.
So now when we discuss privacy, we're only really talking about
it in the realm of making you, as a citizen, feel comfortable about
having given it up. Right? Anytime you start...
So it's already out there. You can't reverse it.
Right. It's already gone. Now, the problem that I have from a data scientist's perspective
is the definition of data. We refuse to call image information data,
and it is equally data.
We as a...
When we start talking about privacy laws,
we do not consider image, video, et cetera,
with the same standard as we do your credit card number,
your social security number, you know,
except for now we have technology
where if I put your picture up, I can equally find everything about you on the internet
that's associated with that image, right?
You're scaring me, Vida.
Come on now.
I mean, I'm just saying.
It's true.
It's like catfish, right?
You just throw that image in Google or whatever, this magic machine.
If you're trying to prevent catfish from happening, you might want to put the image up.
I'm just saying.
Yeah, that's true.
That's true.
But we don't have the same protocols and expectation around privacy.
Right.
And I'm saying there's a bit of a hypocrisy there.
And so in my space, when we're talking about making an actual difference in the world,
so we will not at all disclose the information of a youth who's in trouble at all, right?
But as soon as he's in a fight
or as soon as he's in some police exchange
or as soon as he's in whatever,
all privacy goes out of the window
because there's an image, there's a video,
and now we know everything, right?
Yeah.
But if we could have just, and this is my,
so one of my core spaces is child welfare.
I work a lot in education.
I work a lot in urban planning,
a lot of impact investing and a lot of those things
where I feel like we make communities safer.
How about if we just identified at the point in time
that he became a foster youth
and all of a sudden his environment is unstable?
Why couldn't we de-privacy, denude some of that data then
so that we could provide services that
could have helped him.
But now that is a privacy issue.
So I don't know where the lines are, but I know that we do not have
a rational way of discussing privacy via data in a way that is actually going to be beneficial
for humanity.
That's what I know.
So my thing is issuing a call to action to those who deal with data to begin the process
of discussing how do we templatize it, how do we standardize it, what protocols do we
put in place in order to make data more available and more consumable for impact.
That's my goal.
And I don't know if you're recording any of this.
We recorded all of it.
Did you really?
We've already started. We actually... this is like a soft opening here.
Yeah.
Unless you want to, like, resume it.
No, I was about to say that. Like, by the way, we've been recording this whole thing.
This is a good riff.
So let's keep it down.
We don't want that privacy here.
You know, we've been recording everything you're going to say.
I was going to say that.
Well, it's funny, because normally we'll do an intro thing and then we'll start.
Well, it was already going.
I was like, let's keep talking.
I was like, what were you thinking?
This is better than the show is going to be.
This is the show, y'all.
This is the show.
Yeah, so Vida Williams.
Vida Williams.
Lots to say.
From my perspective, I didn't realize this. Because I'm just a nerdy developer person, images are data,
video is data, my phone number is data.
I always saw it the same.
I didn't realize that the data practitioners, or the governmental
bodies, or the people making the decisions see imagery and video as completely distinct
things.
Well, think about it this way.
When you had the huge push for police to wear cams, right?
Like that was the answer to the interactions between police and youth, right?
The answer was, let's everybody wear a cam.
Body cam, yeah.
Right? So my response was, who is managing all that data, right?
How are you exactly organizing the fact that, well, we need to pick up this cam from this
person at this time, and who has the space?
Who's managing the space constraints for culling all of that data at once?
Those types of properties.
Is it archived?
Is it archived well?
Yeah.
Could it be used in court?
Absolutely.
All these things.
I never even thought about that.
Nobody does.
Nobody did, right? And that is where...
We do.
We should. That's where the data people come in. And we were nowhere in that conversation. So, yes, it's a social justice question, because the legislators want to say, yes, wear a body cam. And the data people are like, well, wait a minute. That's a yes and a no: yes, we should do it, but no, we can't. Right? And then how do you play that out later in the courts? And
then where's the question of privacy? If the people in the video are under 18, how
much can you show? You can't even print a child's name in a newspaper if there's been any type of sexual violence,
and yet you can show an entire video of a young person in some type of exchange
with police? Talk to me about privacy again. But because the data people are missing from those types of conversations,
those points are only discussed in our rooms behind our little screens
because we don't really like talking to people.
So what are they doing, then, with these cameras?
How are they dealing with the data?
Do you know?
I have no idea.
I honestly have no idea.
I have talked to a couple.
What's your best guess?
My best guess is they're not.
Just lose it.
So you think maybe it's around for a week until the SD card is formatted?
They'll have.
And, in fact, what will happen is we'll have some case that will challenge it, right,
where the data will need to be there.
The data, the film, the metadata, and the images will all need to be there.
And then the legislators of the day
will come up and say, you know what?
Our policy at that point in time was to archive it for seven days because of the volume of the data.
And unfortunately, it was cut before we could get there.
Right.
It'll be some answer like that, because that enables the legislators to vote yes,
and then the execution of it falls apart, and it's nobody's fault.
Yeah.
I'm starting to think of chain of custody and issues like that as well.
Exactly.
Who's the one who's maintaining the data?
Is it the same people who are called under question by the jail?
And that's why I said the metadata becomes very important.
Who picked it up?
Who cataloged it?
Where did they move it?
When did they move it?
We have electronic fingerprints.
That's all a data issue.
That's a development issue, right?
That's an infrastructure issue.
But we don't have the practices in place,
nor do we have the protocols in place, to deal with issues such as privacy. So now,
say I had a routine traffic stop. You know, he's got a camera on, he's taking a
picture of me. But later I go running for office. What if I cursed him out during that traffic stop?
Well, that video can resurface. Where's the privacy in that? That was a state-sanctioned video. So there's all kinds of questions of privacy that never come up
when you're dealing with data from an image perspective.
They always say you never have something to hide until you have something to hide.
That's the truth, though.
But in the era of data, you have everything to hide or nothing to hide. That's where we are now.
You don't even know what's out there to hide.
I'm out. I'm out. I'm going off grid. We're done here. I'll get my privacy back.
Oh boy. Does it kind of feel like that, where you throw your hands up and
you're like, what are we gonna do?
I did that years ago, when I knew that we gave up privacy. It
was just one of those things where I literally will fall off the grid for a moment, because I
know I'm never really off the grid.
I just don't want to talk to anybody.
So I think we're in the era of transparency.
I think the best opportunity we have as citizenry, and on our side of the house as developers, as infrastructure planners, as data people, is to begin to influence the legislation around it. It's to begin to have some expectation that we would be at the table as they're defining the rights and the wrongs of people as it has to do
with the information that we're culling. I think that's where we need to be and I
don't think that we're in the conversation at all. I don't think people
are thinking about let's bring the geeks to the table to discuss how this can happen.
I agree with that.
They want us there last.
We've made the solution, but it's too late. Go make it.
They want us to fix it. We've designed how
it should be.
Yeah, exactly. All the decisions
are made. Here's the spec.
Can you do this now? Two weeks.
We're ending this tomorrow.
Exactly.
Two weeks. Like, well, really, we needed this last week, so
we're going to pay you a hell of a lot of money to
maybe get it wrong, but we've got to roll it
out anyway, and then we'll just correct it on the back end.
Oh, man.
That's how it's going to go down.
That's how it goes down. But we can change that. That's why you're
doing this podcast.
We're raising awareness of it. Call to action: bring the geek Avengers out. We can
change this.
What's your biggest call to action for developers, data scientists, geeks out
there? What steps can we take? What can we do?
My biggest call to action is really:
get engaged with social justice issues, right?
There are not enough of us who apply our talents in spaces where our impact can be readily felt.
So three years ago, I went from working in corporate enterprise architecture and data
to deciding that if I was good enough at what I do to drive corporate missions forward,
Department of Defense missions forward, then if I took that same talent and applied it to child welfare and to these
other places, I could drive those missions forward just as fast. And I would think that
would be true for all of us: if we reapply all of our skill sets in these areas
and look at that as a donation, as much as we look at dollar donations, maybe we can start
effecting change in our communities.
Any low-hanging fruit in particular you
could mention?
Absolutely. Probably education is the biggest one right now:
how do we standardize education data so that we can actually show where
our students are successful, where they're struggling, which communities can
benefit from what types of actions, right? We just need data. We need platforms to
be able to nationalize
some of the results that we're getting from the education systems. If there's already
a mandate to produce education data, why isn't it standardized across the nation, right?
And who's holding them accountable for doing that? And then who's doing that type of reporting
that is accessible to educational practitioners, whether that's preschool programs or extracurricular education programs or whatever it is,
social workers or counselors.
So that's low-hanging fruit that's really easy
but has the biggest impact for our next decade.
Always got to take care of our future generation, right?
It would seem to be.
That's the best place to invest.
They don't even know that, you know, gone.
They don't even know that they're not supposed to tell you this information.
Yeah, really.
So that's probably my biggest call to action and the first industry that I would say we could be the most impactful.
So if people were listening to this and they're like, I love Vida.
She's awesome.
They can learn more about you.
Where do they go to find out more about you and what you're doing?
Well, the first thing I would have to do is tell you my name is not Vida, but Vida.
Oh, my goodness.
Which is fine.
Come on now.
You already said it 15 times, and I messed it up.
You waited this long.
I even said, are you Vida Williams?
I'm not even embarrassed now.
She said, yes, I am.
I'm just mad.
Oh, man.
More mad than embarrassed.
Vida.
The audience knows that I mess a lot of names up.
And I was going to say, it's not a big deal, because in Europe, they told me I say my name wrong anyway.
Okay.
What is it then?
It is Vida Williams.
Vida.
V-I-D-A.
Vida.
I went with the Spanish.
I was thinking Vida, like life in Spanish.
Me too.
Livin' la Vida Loca.
Yes.
That's what I said to Adam, and he rolled his eyes at me.
No.
That's what I...
That's it.
Livin' la Vida Loca.
Yes.
That's it.
And I am Vida Christy everywhere.
So on Twitter, on Google, via email, Gmail: you can always get me at Vida Christy.
We'll put the links to you in the show notes
and make sure everybody knows about you.
Awesome.
Any closing thoughts?
I just thank you for the opportunity to ramble for about 15 minutes. I mean, I don't get that too often.
That's pretty awesome.
We're happy to talk to you.
Thank you very much.
This episode is brought to you by GoCD.
GoCD is an open source continuous delivery server built by ThoughtWorks. It provides continuous delivery out of the box with its built-in pipelines, advanced traceability, and value stream visualization.
With GoCD, you can easily model, orchestrate, and visualize complex workflows from end to end.
It supports modern infrastructure with elastic on-demand agents and cloud deployments,
and their plugin ecosystem ensures GoCD
will work well in your unique environment.
To learn more about GoCD,
visit gocd.org slash changelog.
It's open source and free to use,
and there's also professional support
and enterprise add-ons available from ThoughtWorks.
Once again, gocd.org slash changelog.
And by TopTal. TopTal
is the best place to work as a freelancer or hire the top 3% of freelance talent out there:
developers, designers, and finance experts. In this segment, I talk with Josh Chapman,
a freelance finance consultant at TopTal, about the work he does and how TopTal helps him
legitimize being a freelancer. Take a listen.
Yeah, in my arena within TopTal, I specialize in everything from market research
to business plan creation, to pitch decks, to financial modeling, valuation. And then that
leads very naturally into fundraising strategy, capital raising strategy, investor outreach,
closing a deal, deal negotiation,
how to value the company, how to negotiate that.
And all those skill sets that I have continued to hone over on the TopTal side are ones that
I actually deploy every single day in my own company.
Freelancing can sometimes be seen as not legitimate or subpar work.
Now, I would argue that when you work with a company like TopTal, they put so much vetting into not only the companies that you work with, but also the talent that you work with (and I'm on the talent side), that it adds a level of legitimacy that isn't seen across other platforms.
And that, for me, on the talent side, is incredibly fruitful and awesome to be a part of, right? I enjoy the
clients. I enjoy the other talent that I get to talk to. I enjoy the TopTal team. And that creates
an overall positive experience, not only for TopTal, but for me as the talent and for the
client as the company on the other side. And that is really not the experience across
other platforms in the freelance market.
So if you're looking to freelance or you're looking to gain access to a network of top
industry experts in development, design, or finance, head to toptal.com. That's T-O-P-T-A-L.com
and tell them Adam from Changelog sent you. For those wanting a more personal introduction,
email me, adam at changelog.com.
So we're here with Michelle Casbon,
Director of Data Science at Qordoba.
And Michelle, you as well as Vida Williams,
another data scientist we spoke to at this show,
and I guess maybe others,
I just feel like we're sensing a thing
which I didn't know existed.
We were talking about it before we started recording,
but I wanted to get your explanation
because this is a social construct
that I have never experienced,
which is there seems to be a bit of a divide
between data scientists,
maybe with quotes around that,
and computer scientists with quotes around that,
or programmers.
What's up with that?
Yeah, that's a great question.
I think it stems from a lot of... so data science didn't really exist
until, I don't know, five years ago, 10 years ago. It's a new thing. And I think when companies started to bring data scientists on,
they sort of created these organizational structures that put a wall in between them.
And they have different skill sets for the most part, though there's definitely some overlap.
For engineering, you need a really strong programming background, but for data
science you need strong engineering and strong math and all of these other things in
addition. And so I feel like engineering kind of thought, well, their programming
skills aren't as strong, because they're really good at math. And then the data
scientists are like, well, they don't know anything about modeling,
so they're no good.
But I think it really boils down to organizational structures
and having that wall in between.
Because a lot of times data science will do some really amazing things with math,
and then they'll sort of like, hey, go implement that.
Go put it into production.
And an engineer is like, this library, it doesn't exist in Java. I don't know what kind of magic you expect me to do.
But that's sort of throwing things over the fence. And that kind of tension, I think,
has caused a lot of problems.
And that seems to have moved beyond the walls of the corporations, even to events like this, where I think you as well as Vida
both responded to us
in slightly different terms, like, are you sure
you want to talk to me?
I'm not a developer.
And our response to that is, like, wow,
sure, yes, we do.
We want to talk to you.
I had never been aware of this.
Well, to be fair, I didn't say I'm not a
developer, because data scientists definitely are.
Right, you didn't say you're not. Well, Vida said
she wasn't a developer. You just said, what's your audience? Fair enough. Like, maybe since it's newish, so to speak,
y'all haven't had time to congeal that well,
or hang out in the same rooms and realize that you're all human beings and you all have smarts
and can bring something to a changing landscape.
Yeah, I mean, logically, that makes sense.
It makes a lot of logical sense.
Humans aren't logical.
Right.
That's true.
Or emotional.
Very judgmental.
Very picky.
I don't know.
I guess it just seems like there are these two focuses.
One is on production code, you know, writing things that don't break.
And then there's the, no, but machine learning:
the math is the most important part.
And so I just think that with any two organizations,
just like between engineering and DevOps,
there's a lot of tension,
because the goals are a bit different.
Right, and in a certain sense,
because there's overlapping skill sets,
but not identical skill sets,
both sides feel threatened by the other one.
Yeah, that's a strong word, but.
Oh, that's too strong?
I mean, threatened is like, that's just a strong word.
Okay.
I'm not saying it's wrong.
I'm going to back it off.
How do you mean threatened?
Just curious.
I said it.
No, no, no.
But she says, she thinks it's strong.
Why is it strong?
Because I felt like it was.
Like in what way?
I felt like it was apropos.
I feel like it's right on too.
Yeah.
But different reaction here.
So please tell us.
So I think because we understand enough of
what the other side does, it's easy to be critical of how other people are
doing things. What I've seen make the
problem go away best is really just to take down those walls, so that
organizationally you're not two different people.
As you were saying, just sit together, work together, there's even like job descriptions.
Sitting together, yes, and like sharing titles.
So I consider myself a data science engineer, because I feel like that better describes
what I do. I have a background in engineering, and now I do a lot of machine
learning. My official title is Director of Data Science, but I don't feel like that's distinct from engineering anymore.
So NLP is what I focus on, and in order to do that, I have to be able to understand distributed
computing, and that didn't necessarily exist in traditional NLP.
And so now to be able to do machine learning, I really have to understand so much
of it and vice versa.
If anyone wants to implement any of these models, any of this NLP stuff, they really
kind of have to understand what the libraries are doing.
I guess what I'm saying is just that the more you can merge the roles and the everyday tasks, like whether that starts with
calling people data science engineers or merging titles somehow or giving people
the same sort of social status in the hierarchy, the engineering
hierarchy, either way I think the more those can merge and the more you can align
those goals, yeah, then the better people will work together.
It's a form of segregation, right?
Titles, wouldn't you say?
Well, you're literally segregating.
It's not a racial segregation like maybe that term is normally associated with,
but it's a segregation.
You're separating by roles and distinctions
when you should be melding more
and considering yourselves more of a cohesive unit.
It's what you learn in the military.
It's what you learn working with teams.
And the more you operate as a team, a fluid team, the better you are in the end result.
But in the military, you have titles.
You have the medic.
You have the engineer.
I didn't say that. The authority and structure is required, because you have to respect those above you who've had the experience a bit further down the road.
So that's still there, I think.
I mean, military is maybe a little different to compare perfectly.
It's not a one-to-one, but you still have structure.
You still have hierarchy.
But that doesn't mean that you can't be on the same team.
I agree.
And that also helps with the whole common goal thing.
Like, you're all working towards the same thing.
Right.
You don't have to be nailed down to a certain thing.
Yeah.
We just got to quit putting each other in boxes, man.
That's right, man.
No boxes, okay?
Don't put me in a box, right?
Box, not boxes.
I'm really encouraged by the fact that you guys, like, didn't even know that there was this tension.
That is definitely a good sign for the future.
I've started getting a hint of it, though.
I've been working with...
Daniel Whitenack?
No, Pete Soderling from DataEngConf.
He's great.
Yeah, Pete's great.
And so I've kind of caught some hint that there's this divide,
because, like, okay, why is it DataEngConf and not DataScienceConf?
Or just, like, why are there these nuances?
And so I didn't know about the animosity or the divide,
but I could sense that something was not perfect.
Not a cohesive world.
There was a distinction between the different roles.
Yeah, and I think his conference is part of the solution,
because he really addresses it.
And it's all about working together as data science engineers
and not as engineering and data science.
Not as individuals.
Yeah.
That's cool.
Let's talk about your talk and what you're here to talk about.
You said your focus is on natural language processing, speech recognition, stuff like
that.
Is that what your talk was about?
So it was about how we use NLP at Qordoba.
So we have a platform that helps people localize their products.
It doesn't really matter what the product is, but most everyone has a website or a mobile
app, anything like that.
We have a platform that helps people release that product in different markets.
So not just English speaking ones, but really across the globe.
And so my role within the engineering team is to work on the machine learning.
So my talk really set the stage for, okay, why is localization important?
Why should you even care about it?
Because these are the disasters that happen when you don't care about it.
And I went down into a few of the details about which tools we're using.
We built a lot of this on open source software.
I really couldn't imagine building it on anything else.
Like, open source really did enable us
to even create this platform.
Because of the cost, or because of what?
No, capabilities.
It's just better software.
Well, there are so many different components.
I don't think any one vendor provides that entire stack.
And even if I wanted to cobble all that together,
it would be extremely difficult.
It's much, much easier using open source tools
and they have gotten better so much faster.
What are some of the tools that you're using?
Let's see.
So at the heart of our machine learning,
we're using Spark's MLlib. We use its logistic
regression, random forest, stuff like that, and PredictionIO is what does
a lot of the NLP stuff.
And let's see, we're running that in Docker containers on Kubernetes. It's all in Scala.
And let's see, our storage layer is Redis, MariaDB, and Cassandra. I mean, there's a lot of stuff.
Yeah.
That's an interesting laundry list.
Yeah, basically it's all open source. It's almost all open source.
Basically a dream.
Yeah, like as an engineer, to be able to work with such amazing tools, it's really, really fun.
They didn't have to work too hard to recruit me.
Because of the mission. I mean, changing the world: being able to give people products that feel native to them,
even if they don't speak English, can do so much good in the world.
And then using the best tools out there to do it, the tools that engineers really want to use, that's a big plus.
Yeah, I love the branding.
Yeah, the branding is phenomenal. Qordoba. Have you seen the site?
Yeah, I have. It's beautiful. We have a great designer.
Yeah, I mean, I love the direction. It looks extremely trustworthy.
That's actually our newly unveiled site, because we just announced our funding.
We just closed our Series A funding round, and part of that was unveiling the new website.
So I'm glad you like it.
Congratulations on all that.
Why is this the first time we're hearing of Qordoba?
Why do you think?
So I've asked myself that question a lot.
When I first met the co-founders and I first heard about what they were building,
it was one of those times where I was just like, light bulb, how have I not thought of using
machine learning for that purpose? It's so well suited. It just makes sense. But I think a lot
of good ideas in the past are like that. They seem obvious once you've thought of them.
Right.
The thing about localization...
Exactly.
This circle was better than that square I was using.
The thing about the localization field
is that it just really hasn't changed much in 30, 35 years
and we're really here to take a lot of the tools
that work so well in other areas
and apply them to this sort of older, more traditional one.
And why hasn't anyone done it before?
I have no idea because it makes so much sense.
And it's really, really exciting to be a part of that
so early in the game at such an early stage of a startup.
It's a fantastic experience.
Cool.
Well, Michelle, thanks so much for sitting down with us.
Of course.
Any closing thoughts to share?
Any words of wisdom to part on?
For the data scientists out there, the data engineers out there, and the mathematicians
not melding well enough, what's going on?
Feel the love.
I guess I feel very personally invested in that whole data science versus engineering thing because I have one foot in both sides.
You're the hybrid.
I am definitely a hybrid.
And that's been a fantastic experience.
I haven't encountered any animosity in my personal teams.
Okay.
And so I guess I just want to see more of that. Just, everyone be nice.
Everybody be nice.
Be nice, please.
All right, thank you for tuning in to this episode of The Changelog.
If you enjoyed this show, share it with a friend and rate us in Apple Podcasts.
And thank you to our sponsors: Bugsnag, Linode, GoCD, and Toptal.
Also thanks to Fastly, our bandwidth partner. Head to Fastly.com to learn more.
We host everything we do on Linode cloud servers. Head to Linode.com slash changelog, check them out, and support the show.
The Changelog is hosted by myself, Adam Stachowiak, and Jerod Santo.
It's edited by Jonathan Youngblood.
The awesome music you've been hearing is produced by Breakmaster Cylinder.
You can find more episodes just like this at changelog.com
or by subscribing wherever you get your podcasts.
Thanks for listening.