The Changelog: Software Development, Open Source - Data Science at OSCON (Interview)

Episode Date: November 10, 2017

We went back into the archives to conversations we had around data science at OSCON 2017. We talked with Vida Williams (Data Scientist) and Michelle Casbon (Director of Data Science at Qordoba) about the social impact of open data, personal data and transparency, privacy, the big data problem of public surveillance, electronic fingerprinting, the rift between data scientists and computer scientists, natural language processing, machine learning, and more.

Transcript
Starting point is 00:00:00 Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. And we're hosted on Linode servers. Head to linode.com slash changelog. This episode is brought to you by Bugsnag. Bugsnag is mission control for software quality. And on this segment, I'm talking with James Smith, co-founder and CEO of Bugsnag, about the core problem they're solving for software teams,
Starting point is 00:00:24 and why you should head to bugsnag.com slash changelog to test it out with your team. Let's start with, you mentioned you and Simon. So you guys obviously at one point didn't have this company, right? So as founders, as engineers, you got to a problem. What was that problem? Why does Bugsnag exist? Simon and I, my co-founder, met in college. We went off to build software for other companies. I ended up in a startup, he ended up in enterprise software, and we had the same problem in both of these companies: when things break, it's really hard to figure out how badly they're broken, who's impacted, and what to fix first. So we both had this problem ourselves, so we decided, hey, why is no one doing a good job of fixing this problem right now? So very much Bugsnag was born out of scratching our own itch, as they say. ...new features to your customers, or you want to build cool new stuff, but at the same time you've
Starting point is 00:01:25 got to fix bugs, because no matter how good a coder you are, you're going to introduce bugs. But there's no clear definition of where to set that slider: should I be fixing bugs now or should I be releasing features? And so this tension exists, I think, in all product teams, all software teams. If you don't have a tool like Bugsnag, it's very difficult for you to figure out where to spend time. And so that's the idea here, is we're trying to help teams understand whether they should be building or fixing,
Starting point is 00:01:52 because there's a bit of a delicate balance between both. So if your team is unsure of how to spend their time building or fixing, give Bugsnag a try. It's free to get started with a 45-day extended trial exclusive to our listeners. Head to bugsnag.com slash changelog. And by Linode. Everything we do here at Changelog is hosted on Linode servers. Pick a plan, pick a distro, and pick a location.
Starting point is 00:02:17 And in seconds, deploy your virtual server. Drool-worthy hardware, SSD cloud storage, 40 gigabit network, Intel E5 processors, simple, easy control panel, nine data centers, three regions, anywhere in the world they've got you covered. Head to linode.com slash changelog and get $20 in hosting credit. Thank you. Data Science at OSCON 2017. We talked to Vida Williams, Data Scientist, Educator, and Entrepreneur, and also Michelle Casbon, Director of Data Science at Qordoba. We talked about the social impact of open data, personal data and transparency, privacy, the big data problem of public surveillance, electronic fingerprinting, the rift between data scientists and computer scientists, natural language processing, machine learning, and so much more. Enjoy the show. Unless you're a data practitioner in the world of open source developers,
Starting point is 00:03:39 it's not really on the core of everything. I have to make a compelling case to be interesting. I see data science and I get excited. Yeah. And I'm an open source developer. So, yeah. Maybe I'm the outlier. No, well, it was interesting because one of the things I talk about is open data. That's specifically what I'm interested in, but the social impact of open data.
Starting point is 00:03:59 How do we come together? That's what we want to talk about. But that's my thing. Right. And there's just now a burgeoning conversation around it. I think we tried to have it, interestingly enough, 20 years ago, but there wasn't an infrastructure for open data at the time. Who's we?
Starting point is 00:04:16 Data practitioners. I mean, my first big project was a DPA data project, so that was big data before big data was big. We were doing something stupid that 15 years later we knew not to do and that's moved from mainframe into relational like I don't want to do that to that volume of data. That being said at the time there were discussions around transparency and open data and who should have access to it but there were no standardizations, there were no protocols, there were no accesses, there
Starting point is 00:04:43 were no platforms. So now we're finally in a place where we can have this discussion because especially in the open source, all that stuff exists. So now it's regathering the Avengers, if you will, all the data superheroes and going, hey, we can now hold everybody accountable for privacy, for standardization, for protocols on access in order to actually make a difference. So why don't we do that?
Starting point is 00:05:06 So anyway, that's what the talk was about. Cool. Interesting. We've actually had some shows. We've been around for a while. 2009, we started this show, and we've talked about open data, mostly in the government space a couple times. Yeah.
Starting point is 00:05:19 Yeah. I'm looking for some, like, older shows. It's been a while. Like, Civic Hacking with, this, this is like the first one, with Luigi Montanez and Jeremy Carbaugh. That was when they were both working. Sunlight Labs? Yeah, Sunlight Labs.
Starting point is 00:05:33 Sunlight Foundation? Yeah. Well, now you have the Presidential Innovation Fellows, the PIFs, right, who are in that whole White House-sponsored open data platform. But an interesting question came up in my session about if this conversation was before and what do we do about the question of privacy? So it was really like, okay, so if everybody's supposed to have this personal data, then what is this, how do we accomplish this around privacy?
Starting point is 00:06:03 And my response was we need to hold, we as data practitioners need to challenge the hypocrisy of privacy. We want to put a camera everywhere and be able to develop in reality TV, and there's no privacy communication there. But all of a sudden, you're a data point, and there's all of a sudden a need for privacy. So we as practitioners need to actually challenge the definition of data as though image is somehow not data and thus exempted from privacy. But if you're a number or some type of codified information, then all of a sudden it's privacy rules.
Starting point is 00:06:41 That's interesting. I never really considered the idea of cameras being somewhere, and considering that, I hate that too. I mean, I may be somewhat of a devil's advocate, but I'm not sure your perspective. It kind of bugs me that you can take six data points and figure out exactly who I am. Absolutely.
Starting point is 00:07:00 Male, color, where I originated from, how much money I probably make, if I have kids. You can take six data points and pretty much figure out roughly everything about me besides my name. That's the world we live in, but should we accept that? Is it okay to have all that? And I'm born in 79, so I'm 38 years old. People born today's age, it's like second nature. They have no expectation of privacy. Well, and so, okay, so where I sit on it, I'm an introvert data geek, so I don't want anybody to know anything.
Starting point is 00:07:33 Okay, so maybe I'm not devil's advocate. No, no, no. I don't want anybody, you know, I'm one of the first ones to say I'm falling off the grid for said period of time and you can't get me. But I also, I think having been in technology for so long, strike a cool balance between the fact that in order for us to have this technological infrastructure and the innovation revolution that we're currently in, we have already as a country at minimum, world a little bit less, but equally made a decision to forego privacy.
Starting point is 00:08:01 So now when we discuss privacy, we're only talking about it really in the realm of making you feel comfortable at having you as a citizen for having given it up. Right? Anytime you start... So it's already out there. It's reversing it. Right. It's already gone. Now, the problem that I have from a data scientist's perspective is the definition of data. We will refuse to call image information data, and it is equally data. We as a...
Starting point is 00:08:30 When we start talking about privacy laws, we do not consider image, video, et cetera, with the same standard as we do your credit card number, your social security number, you know, except for now we have technology where if I put your picture up, I can equally find everything about you on the internet that's associated with that image, right? You're scaring me, Vida.
Starting point is 00:08:52 Come on now. I mean, I'm just saying. It's true. It's like catfish, right? You just throw that image in Google or whatever, this magic machine. If you're trying to prevent catfish from happening, you might want to put the image up. I'm just saying. Yeah, that's true.
Starting point is 00:09:03 That's true. But we don't have the same protocols and expectation around privacy. Right. And I'm saying there's a bit of a hypocrisy there. And so in my space, when we're talking about making an actual difference in the world, so we will not at all disclose the information of a youth who's in trouble at all, right? But as soon as he's in a fight or as soon as he's in some police exchange
Starting point is 00:09:29 or as soon as he's in whatever, all privacy goes out of the window because there's an image, there's a video, and now we know everything, right? Yeah. But if we could have just, and this is my, so one of my core spaces is child welfare. I work a lot in education.
Starting point is 00:09:43 I work a lot in urban planning, a lot of impact investing and a lot of those things where I feel like we make communities safer. How about if we just identified at the point in time that he became a foster youth and all of a sudden his environment is unstable? Why couldn't we de-privacy, denude some of that data then so that we could provide services that
Starting point is 00:10:05 could have helped him. But now that is a privacy issue. So I don't know where the lines are. I just know that we don't, I don't know where the lines are, but I know that we do not have a rational way of discussing privacy via data in a way that is actually going to be beneficial for humanity. That's what I know. So my thing is issuing a call to action to those who deal with data to begin the process
Starting point is 00:10:31 of discussing how do we templatize it, how do we standardize it, what protocols do we put in place in order to make data more available and more consumable for impact. That's my goal. And I don't know if you're recording any of this. We recorded all of it. Did you really? We've already started. We actually,
Starting point is 00:10:49 this is like a soft opening here. Yeah. Unless you want to like resume it. No, I was about to say that. Like, by the way, we've been recording this whole thing.
Starting point is 00:10:56 This is a good riff. So let's keep it down. We don't want that privacy here. You know, we've been recording everything you're going to say. I was going to say that. Well, it's funny because normally we'll do like an intro thing and then we'll start.
Starting point is 00:11:04 Well, she was glad it's already had a go. I was like, let's keep talking. I was like, what were you thinking? This is better than the show is going to be. This is the show, y'all. This is the show. This is the show. Yeah, so Vida Williams.
Starting point is 00:11:14 Vida Williams. Lots to say. From my perspective, I didn't realize this, so I've always considered it, but because I'm just like a nerdy developer person, like images are data, the video is data, my phone number is data. I always saw it the same. I didn't realize that the classification from the data practitioners or from the governmental bodies or people making the decisions, they see imagery and video as like completely distinct
Starting point is 00:11:39 things. Well, think about it this way. When you had the huge push for police to wear cams, right? Like that was the answer to the interactions between police and youth, right? The answer was, let's everybody wear a cam. Body cam, yeah. Right? So my response was, who is managing all that data, right? How are you exactly organizing the fact that, well, we need to pick up this cam from this
Starting point is 00:12:07 person at this time, and who has the space? Who's managing the space constraints for culling all of that data at once? Those types of properties. Is it archived? Is it archived well? Yeah. Could it be used in court? Absolutely.
Starting point is 00:12:21 All these things. I never even thought about that. Nobody does. Nobody did. Right. And that is where... We do. We should. ...where the data people come in. And we were nowhere in that conversation. So, yes, a social justice question because the legislators want to say, yes, wear a body cam. And the data people are like, well, wait a minute. That's like a yes, no, because that's a yes, we should do it. But a no, we can't. Right. And then how do you play that out later in the courts? And then where's the question of privacy then? The people in the video are under 18. How much can you show? You can't even tell a child's name if there's been any type of sexual violence in a newspaper, and yet you can show an entire video of a young person in some type of exchange with police? Talk to me about privacy again. But because the data people are missing from those types of conversations,
Starting point is 00:13:06 those points are only discussed in our rooms behind our little screens because we don't really like talking to people. So what are they doing, then, with these cameras? How are they dealing with the data? Do you know? I have no idea. I honestly have no idea. I have talked to a couple.
Starting point is 00:13:19 What's your best guess? My best guess is they're not. Just lose it. So you think maybe it's around for a week until the SD card is formatted? They'll have. And, in fact, what will happen is we'll have some case that will challenge it, right, where the data will need to be there. The data, the film, the metadata, and the images will all need to be there.
Starting point is 00:13:38 And we'll just call them the legislators of the day. We'll come up and say, you know what? Our policy at that point in time was to archive it seven days because of the volume of the data. And unfortunately, that was cut before we could get there. Right. It'll be some answer like that because then that enables the legislators to vote yes. And then the execution of it to fall defunct and it be nobody's fault. Yeah.
Starting point is 00:14:01 I'm starting to think of chain of custody and issues like that as well. Exactly. Who's the one who's maintaining the data? Is it the same people who are called under question by the jail? And that's why I said the metadata becomes very important. Who picked it up? Who cataloged it? Where did they move it?
Starting point is 00:14:14 When did they move it? We have electronic fingerprints. That's all a data issue. That's a development issue, right? That's an infrastructure issue. But we don't have the practices in place, and nor do we have the protocols in place to deal with issues such as privacy. So now, if you had a routine traffic stop, I would stop. You know, he's got a camera on, he's taking a
Starting point is 00:14:35 picture of me. But later I go running for office. What if I cursed him out during that traffic stop? Well, that video can resurface. Where's the privacy of that? That was a state-sanctioned video. So there's all kinds of questions of privacy that never come up when you're dealing with data from an image perspective. They always say you never have something to hide until you have something to hide. That's the truth, though. But in the era of data, you have everything to hide or nothing to hide. Like, that's where we are now. You don't even know what's out there to hide. I'm out. I'm out. I'm going off grid. We're done here. I'll get my privacy back. Oh boy. Does it kind of feel like that, you throw your hands up and
Starting point is 00:15:15 you're like, what are we gonna do? I did that years ago when I knew that we gave up privacy. It was just one of those things where I literally will fall off the grid for a moment, because I know I'm never really off the grid. I just don't want to talk to anybody. So I think we're in the era of transparency. I think the best opportunity we have as citizenry and on our side of the house as developers, as infrastructure planners, as data, is to begin to influence the legislation around it. It's to begin to have some expectation that we'd be at the table as they're defining what are the rights and the wrongs of people as it has to do with the information that we're culling. I think that's where we need to be and I don't think that we're in the conversation at all. I don't think people
Starting point is 00:15:56 are thinking about let's bring the geeks to the table to discuss how this can happen. I agree with that. They want us there last. We've made the solution. But it's too late. Go make it. They want us to fix it. We've designed how it should be. Yeah, exactly. All the decisions are made. Here's the spec. Can you do this now? Two weeks. We're ending this tomorrow. Exactly. Two weeks. Like, well, really, we needed this last week, so
Starting point is 00:16:15 we're going to pay you a hell of a lot of money to maybe get it wrong, but we've got to roll it out anyway, and then we'll just correct it on the back end. Oh, man. That's how it's going to go down. That's how it goes down. That's how it goes down. But we can change that. That's why you're doing this podcast. We're calling awareness to it. Call to action: bring the geek Avengers out. We can change this. What's your biggest call to action for developers, data scientists, geeks out
Starting point is 00:16:39 there? What's your biggest call to action steps? What can we do? My biggest call to action is really get engaged with social justice issues, right? There are not enough of us that apply our talents into spaces where our impacts can be readily felt. So three years ago, I went from working high-corp enterprise architecture and data to deciding that if I was so good at what I do, that I can drive corporate missions forward, Department of Defense missions forward, that if I use that same talent and applied it to child welfare and applied it into these other places, that I can drive those missions forward just as fast. And I would think that that would be true for all of us, that if we reapply all of our skill sets in these areas
Starting point is 00:17:19 and look at that as a donation, as much as we look at dollar donations, that maybe we can start affecting change in our communities. Any low-hanging fruit in particular you could mention? Absolutely. Probably education is the biggest one right now. Like, how do we standardize education data so that we can actually show where our students are successful, where they're struggling, which communities can benefit from what types of actions, right? We just need data. We need platforms to be able to nationalize some of the results that we're getting from the education systems. If there's already
Starting point is 00:17:50 a mandate to produce education data, why isn't it standardized across the nation, right? And who's holding them accountable for doing that? And then who's doing that type of reporting that is accessible to educational practitioners, whether that's preschool programs or extracurricular education programs or whatever it is, social workers or counselors. So that's low-hanging fruit that's really easy but has the biggest impact for our next decade. Always got to take care of our future generation, right? It would seem to be.
Starting point is 00:18:18 That's the best place to invest. They don't even know that, you know, gone. They don't even know that they're not supposed to tell you this information. Yeah, really. So that's probably my biggest call to action and the first industry that I would say we could be the most impactful. So if people were listening to this and they're like, I love Vida. She's awesome. They can learn more about you.
Starting point is 00:18:37 Where do they go to find out more about you and what you're doing? Well, the first thing I would have to do is tell you my name is not Vida, but Vida. Oh, my goodness. Which is fine. Come on now. You already said it 15 times, and I messed it up. You waited this long. I even said, are you Vida Williams?
Starting point is 00:18:51 I'm not even embarrassed now. She said, yes, I am. I'm just mad. Oh, man. More mad than embarrassed. Vida. The audience knows that I mess a lot of names up. And I was going to say, it's not a big deal, because in Europe, they told me I say my name wrong anyway.
Starting point is 00:19:03 Okay. What is it then? It is Vida Williams. Vida. V-I-D-A. Vida. I went on the Spanish. I was thinking Vida like life in Spanish.
Starting point is 00:19:11 Me too. Livin' la Vida Loca. Yes. What I said to Adam and he rolled his eyes at me. No. That's what I... That's it. Livin' la Vida Loca.
Starting point is 00:19:19 Yes. That's it. And I am Vida Christy everywhere. So on Twitter, on Google, via email, Gmail. Okay. You can always get me at Vida Christy.
Starting point is 00:19:29 We'll put the links in the show notes to you and make sure everybody knows about you. Awesome. Any closing thoughts? I just thank you for the opportunity
Starting point is 00:19:36 to ramble for about 15 minutes. I mean, I don't get that too often. That's pretty awesome. We're happy to talk to you. Very much.
Starting point is 00:19:43 Thank you. This episode is brought to you by GoCD. GoCD is an open source continuous delivery server built by ThoughtWorks. It provides continuous delivery out of the box with its built-in pipelines, advanced traceability, and value stream visualization. With GoCD, you can easily model, orchestrate, and visualize complex workflows from end to end. It supports modern infrastructure with elastic on-demand agents and cloud deployments, and their plugin ecosystem ensures GoCD will work well in your unique environment. To learn more about GoCD,
Starting point is 00:20:33 visit gocd.org slash changelog. It's open source and free to use, and there's also professional support and enterprise add-ons available from ThoughtWorks. Once again, gocd.org slash changelog. And by TopTal. TopTal is the best place to work as a freelancer or hire the top 3% of freelance talent out there for developers, designers, and finance experts. In this segment, I talk with Josh Chapman, a freelance finance consultant at TopTal, about the work he does and how TopTal helps him
Starting point is 00:21:04 legitimize being a freelancer. Take a listen. Yeah, in my arena within TopTal, I specialize in everything from market research to business plan creation, to pitch decks, to financial modeling, valuation. And then that leads very naturally into fundraising strategy, capital raising strategy, investor outreach, closing a deal, deal negotiation, how to value the company, how to negotiate that. And all those skill sets that I have continued to hone over on the TopTal side are ones that I actually deploy every single day in my own company.
Starting point is 00:21:40 Freelancing can sometimes be seen as not legitimate or subpar work. Now, I would argue that when you work with a company like TopTal, they put so much vetting into not only the companies that you work with, but also the talent that you work with, which I'm on the talent side, that it adds a level of legitimacy that isn't seen across other platforms. And that, for me, as the talent side, is incredibly fruitful and awesome to be a part of, right? I enjoy the clients. I enjoy the other talent that I get to talk to. I enjoy the TopTal team. And that creates an overall positive experience, not only for TopTal, but for me as the talent and for the client as the company on the other side. And that is really not seen or is the experience across other platforms in the freelance market.
Starting point is 00:22:26 So if you're looking to freelance or you're looking to gain access to a network of top industry experts in development, design, or finance, head to toptal.com. That's T-O-P-T-A-L.com and tell them Adam from the Changelog sent you. For those wanting a more personal introduction, email me, adam at changelog.com. So we're here with Michelle Casbon, Director of Data Science at Qordoba. And Michelle, you as well as Vida Williams, another data scientist we spoke to at this show, and I guess maybe other,
Starting point is 00:23:17 I just feel like we're sensing a thing which I didn't know existed. We were talking about it before we started recording, but I wanted to get your explanation because this is a social construct that I have never experienced, which is there seems to be a bit of a divide between data scientists,
Starting point is 00:23:30 maybe with quotes around that, and computer scientists with quotes around that, or programmers. What's up with that? Yeah, that's a great question. I think it stems from a lot of, so data science didn't really exist until, I don't know, five years ago,
Starting point is 00:23:46 10 years ago. It's a new thing. And I think when companies started to bring data scientists on, they sort of created these organizational structures that put a wall in between them. And they have different skill sets for the most part. So there's definitely some overlap. Engineering, you need a really strong programming background, but data science, you need strong engineering and strong math, all of these other things in addition. And so I feel like engineering kind of thought, well, their programming skills aren't as strong because they're really good at math. And then the data scientists are like, well, they don't know anything about modeling,
Starting point is 00:24:27 so they're no good. But I think it really boils down to organizational structures and having that wall in between. Because a lot of times data science will do some really amazing things with math, and then they'll sort of like, hey, go implement that. Go put it into production. And an engineer is like, this library, it doesn't exist in Java. I don't know what kind of magic you expect me to do. But that's sort of throwing things over the fence. And that kind of tension, I think,
Starting point is 00:24:55 has caused a lot of problems. And that seems to have moved beyond the walls of the corporations to even events like this, where I think yourself as well as Vida both responded to us in certain different terms, like, are you sure you want to talk to me? I'm not a developer. And our response to that is like, wow, sure, yeah, yes, we do.
Starting point is 00:25:21 We want to talk to you. I have never been aware... What's my response to that question? Well, that's okay. Well, and to be fair, I didn't say... That's okay. I didn't say I'm not a developer, because data scientists are definitely... Right, you didn't say you're not. Well, Vida said she wasn't a developer. You think it's just... She said, what's your audience? Well enough. Like, maybe not hanging out, like, since it's newish, so to speak, like, maybe y'all haven't gotten a time to congeal that well or hang out in the same rooms and realize that you're all human beings and you all have smarts and can bring something to a changing landscape of things.
Starting point is 00:25:58 Yeah, I mean, logically, that makes sense. It makes a lot of logical sense. Humans aren't logical. Right. That's true. Or emotional. Very judgmental. logical sense. Humans aren't logical. Right. That's true. Or emotional. Very judgmental. Very picky.
Starting point is 00:26:09 I don't know. I guess there's just, it seems like there are these two focuses. Like, one is just on sort of production code, you know, writing things that don't break. And then there's the, no, but machine learning. Like, the math is the most important part. And so I just think that, like, with any two organizations, just like between engineering and DevOps, like, there's the most important part. And so I just think that with any two organizations, just like between engineering and DevOps, like there's a lot of tension.
Starting point is 00:26:30 Because the goals are a bit different. Right, and in a certain sense, because there's overlapping skill sets, but not identical skill sets, both sides feel threatened by the other one. Yeah, that's a strong word, but. Oh, that's too strong? I mean, threatened is like, that's just a strong word.
Starting point is 00:26:47 Okay. I'm not saying it's wrong. I'm going to back it off. How do you mean threatened? Just curious. I said it. No, no, no. But she says, she thinks it's strong.
Starting point is 00:26:54 Why is it strong? Because I felt like it was. Like in what way? I felt like it was apropos. I feel like it's right on too. Yeah. But different reaction here. So please tell us.
Starting point is 00:27:01 So I think because we understand enough of what the other side does that it's easy to be critical of how other people are doing things. I think the best way to... so what I've seen to make the problem go away the best is really just to take down those walls, and, like, organizationally, you're not two different people. As you were saying, just sit together, work together. There's even, like, job descriptions. Sitting together, yes, and, like, sharing titles. So I consider myself a data science engineer because I feel like that better describes
Starting point is 00:27:37 what I do, because I do have a background in engineering and now I do a lot of machine learning. And, like, my official title is Director of Data Science, but I don't feel like that's distinct from engineering anymore. So NLP is what I focus on, and in order to do that, I have to be able to understand distributed computing, and that didn't necessarily exist in traditional NLP. And so now to be able to do machine learning, I really have to understand so much of it and vice versa. If anyone wants to implement any of these models, any of this NLP stuff, they really kind of have to understand what the libraries are doing.
Starting point is 00:28:17 I guess what I'm saying is just that the more you can merge the roles and the everyday tasks, like whether that starts with calling people data science engineers or merging titles somehow or giving people the same sort of social status in the hierarchy, the engineering hierarchy, either way I think the more those can merge and the more you can align those goals, yeah, then the better people will work together. It's a form of segregation, right? Titles, wouldn't you say? Well, you're literally segregating.
Starting point is 00:28:53 It's not a racial segregation like maybe that term is normally associated with, but it's a segregation. You're separating by roles and distinctions when you should be melding more and considering yourselves more of a cohesive unit. It's what you learn in the military. It's what you learn working with teams. And the more you operate as a team, a fluid team, the better you are in the end result.
Starting point is 00:29:13 But in the military, you have titles. You have the medic. You have the engineer. I didn't say that the authority and structure is required because you have to respect those above you who've had the experience a bit down the road. So that's still there, I think. I mean, military is maybe a little different to compare perfectly. It's not a one-to-one, but you still have structure. You still have hierarchy.
Starting point is 00:29:34 But that doesn't mean that you can't be on the same team. I agree. And that also helps with the whole common goal thing. Like, you're all working towards the same thing. Right. You don't have to be nailed down to a certain thing. Yeah. We just got to quit putting each other in boxes, man.
Starting point is 00:29:51 That's right, man. No boxes, okay? Don't put me in a box, right? Box, not boxes. I'm really encouraged by the fact that you guys, like, didn't even know that there was this tension. That is definitely a good sign for the future. I've started getting a hint of it, though. I've been working with...
Starting point is 00:30:06 Daniel Whitenack? No, Pete Soderling from DataEngConf. He's great. Yeah, Pete's great. And so I've kind of caught some edge that there's this divide because, like, okay, why is it DataEngConf and not DataScienceConf? Or just, like, why are there these nuances? And so I didn't know the animosity or the divide,
Starting point is 00:30:24 but I can sense that something was not perfect. Not a cohesive world. There was a distinction between the different roles. Yeah, and his conference is part of, I think, part of the solution because he really addresses it. And it's all about working together as data science engineers and not as engineering and data science. Not as individuals.
Starting point is 00:30:46 Yeah. That's cool. Let's talk about your talk and what you're here to talk about. You said your focus is on natural language processing, speech recognition, stuff like that. Is that what your talk was about? So it was about how we use NLP at Qordoba. So we have a platform that helps people localize their products.
Starting point is 00:31:05 It doesn't really matter what the product is, but most everyone has a website or a mobile app, anything like that. We have a platform that helps people release that product in different markets. So not just English speaking ones, but really across the globe. And so my role within the engineering team is to work on the machine learning. So my talk really set the stage for, okay, why is localization important? Why should you even care about it? Because these are the disasters that happen when you don't care about it.
Starting point is 00:31:37 And I went down into a few of the details about which tools we're using. We built a lot of this on open source software. I really couldn't imagine building it on anything else. Like, open source really did enable us to even create this platform. Because of the cost or because of the why? No, capabilities. It's just better software.
Starting point is 00:31:59 Well, I don't think any of... Like, there's so many different components. I don't think any one vendor provides that entire stack. And even if I wanted to cobble all that together, it would be extremely difficult. It's much, much easier using open source tools, and they have gotten better so much faster.
Starting point is 00:32:18 What are some of the tools that you're using? Let's see. So the heart of our machine learning, we're using Spark's MLlib. We use their logistic regression, random forest, stuff like that, libraries, and PredictionIO is what does a lot of the NLP stuff. And let's see, we're running that in Docker containers on Kubernetes. It's all in Scala. And let's see, our storage layer is using MariaDB and Cassandra. There's, I mean, there's a lot of stuff. Yeah, yeah, yeah. So I talked a little bit...
Starting point is 00:32:54 That's an interesting laundry list. Yeah, it's basically, it's all open source. It's almost all open source. Basically a dream. Yeah, like as an engineer, to be able to work with such amazing tools, it's really, really fun. They didn't have to work too hard to recruit me. Because the mission, I mean, changing the world, being able to give people products that feel native to them, even if they don't speak English, can really do so much good in the world by building that kind of platform. And then using the best tools out there to do it, the tools that engineers really want to use, that's a big plus. Yeah, I love the branding. Yeah, the branding is phenomenal. Qordoba, have you seen the
Starting point is 00:33:36 site? Yeah, I have. It's beautiful. We have a great designer. Yeah, I mean, I love the direction. It looks extremely trustworthy. That's actually our brand newly unveiled site because we just announced our funding. We just closed our Series A funding round, and part of that was unveiling the new website. So I'm glad you like it. Congratulations on all that. Why is it the first time we're hearing of Qordoba? Why do you think?
Starting point is 00:34:03 So I've asked myself that question a lot. When I first met the co-founders and I first heard about what they were building, it was one of those times where I was just like, light bulb, how have I not thought of using machine learning for that purpose? It's so well suited. It just makes sense. But I think a lot of good ideas in the past are like that. They seem obvious once you've thought of them. Right. The thing about localization... Exactly.
Starting point is 00:34:30 This circle was better than that square I was using. The thing about the localization field is that it just really hasn't changed much in 30, 35 years and we're really here to take a lot of the tools that work so well in other areas and apply it to this sort of older, more traditional one. And why hasn't anyone done it before? I have no idea because it makes so much sense.
Starting point is 00:34:52 And it's really, really exciting to be a part of that so early in the game at such an early stage of a startup. It's a fantastic experience. Cool. Well, Michelle, thanks so much for sitting down with us. Of course. Any closing thoughts to share?
Starting point is 00:35:06 Any words of wisdom to part on? For the data scientists out there, the data engineers out there, and the mathematicians not melding well enough, what's going on? Feel the love. I guess I feel very personally invested in that whole data science versus engineering thing because I have one foot in both sides. You're the hybrid. I am definitely a hybrid. And that's been a fantastic experience.
Starting point is 00:35:36 I haven't encountered any animosity in my personal teams. Okay. And so I guess I just want to see more of that. Just, everyone be nice. Everybody be nice. Be nice, please. All right, thank you for tuning in to this episode of The Changelog. If you enjoyed this show, share it with a friend,
Starting point is 00:36:00 rate us on Apple Podcasts, and thank you to our sponsors, Bugsnag, Linode, GoCD, and Toptal. Also, thanks to Fastly, our bandwidth partner. Head to fastly.com to learn more. We host everything we do on Linode cloud servers.
Starting point is 00:36:15 Head to linode.com slash changelog. Check them out, support the show. The Changelog is hosted by myself, Adam Stacoviak, and Jerod Santo. It's edited by Jonathan Youngblood. The awesome music you've been hearing is produced by Breakmaster Cylinder. You can find more episodes just like this at changelog.com or by subscribing wherever you get your podcasts.
Starting point is 00:36:35 Thanks for listening. Thank you.
