The Changelog: Software Development, Open Source - Operação Serenata de Amor (Artificial Intelligence, Data Science, Government Corruption 😱) (Interview)

Episode Date: October 31, 2017

Eduardo Cuducos joined the show to talk about Operação Serenata de Amor, an Artificial Intelligence and Data Science project that aims to inform the general public about government corruption and spending. We talked about how this artificial intelligence project analyzes claims for reimbursement from congresspeople to estimate the probability that they are illegal, how it monitors government spending, the technology behind it, and how other governments might be able to follow this model.

Transcript
Starting point is 00:00:00 Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. And we're hosted on Linode servers. Head to linode.com slash changelog. This episode is brought to you by Bugsnag. Bugsnag is mission control for software quality. And on this segment, I'm talking with James Smith, co-founder and CEO of Bugsnag, about the core problem they're solving for software teams
Starting point is 00:00:25 and why you should head to bugsnag.com slash changelog to test it out with your team. Let's start with, you mentioned you and Simon. So you guys obviously at one point didn't have this company, right? So as founders, as engineers, you got to a problem. What was that problem? Why does Bugsnag exist? Simon and I, my co-founder, I met in college. We went off to build software for other companies. I ended up in a startup. He ended up in enterprise software. And we had the same problem in both of these companies. When things break, it's really hard to figure out how badly they're broken, who's impacted and what to fix first. So we both had this problem ourselves. So we decided, hey, why is no one doing a good job of fixing this problem right now? So very much Bugsnag was born out of scratching our own itch,
Starting point is 00:01:09 as they say. One thing that we find all the time is that there's this tension in software teams or in product companies where you want to deliver new features to your customers, or you want to build cool new stuff. But at the same time, you've got to fix bugs because no matter how good a coder you are, you're going to introduce bugs. But there's no clear definition of where to set that slider. Should I be fixing bugs now, or should I be releasing features? And so this tension exists, I think, in all product teams, all software teams. If you don't have a tool like Bugsnag, it's very difficult for you to figure out where to spend time. And so that's the idea here is we're trying to help teams understand
Starting point is 00:01:50 whether they should be building or fixing because there's a bit of a delicate balance between both. Absolutely. I couldn't agree more. So if your team is unsure of how to spend their time building or fixing, give Bugsnag a try. It's free to get started with a 45-day extended trial exclusive to our listeners. Head to bugsnag.com slash changelog. You're listening to The Changelog, a podcast featuring the hackers, leaders, and innovators of open source. I'm Adam Stacoviak, editor-in-chief of Changelog. On today's show, we're talking with Eduardo Cuducos about Serenata de Amor, an artificial intelligence open data project for social control of government spending and public administration. We talk about how this project is capable of analyzing claims for reimbursement from
Starting point is 00:02:46 Congress people to determine if they're illegal or not, how it monitors the spending of governments, the technology behind it, and how other governments might be able to follow this model. So, Ed, you're going to have to help us out because we first learned about Serenata de Amor in our ping repo from Fabio Rem. Thank you, Fabio, for making us aware of this project. I have to admit, I probably would have never found it on my own. And so I'm grateful that it came to us because it's very cool what you guys have been up to with this project down in Brazil. Give us a little bit of the backstory.
Starting point is 00:03:23 Tell us what it is first and we'll dive in from there. Okay, good. So first of all, thanks, Fabio, once more. But anyway, I think we are pretty good at technology stuff. By we, I mean people who started this project. But some of us were not so familiar with politics in general. So at a certain point, Edu, one of the founders, decided to get more involved in politics, and he might have asked a question that goes something like, how can I use the knowledge I have in technology? He's a great data scientist.
Starting point is 00:04:08 He's a great developer. So how can I use this knowledge to kind of go into a sea that's unknown to me, like politics? And as we live in an age in which we have a lot of open data about our government, one possible answer was to use data science to understand what this data was telling us about our government, so we can know who we should vote for, or at least who we shouldn't vote for. So I think this is the very, very beginning of the idea of Serenata de Amor.
Starting point is 00:04:47 So if you were going to summarize where we're at today or what it does currently, just a big picture, we'll dive into the details, but what is it? What does it do? Well, so we started to take data from what I think would be the lower house in the USA, but here it is the Chamber of Deputies. Those guys incur a lot of expenses while they are doing their job, while they are representing us in the government. So we started to look at this data about those expenses and try to find bizarre things, suspicious things in these data sets, because this would be a kind of corruption or at least a kind of immoral or irregular
Starting point is 00:05:39 use of public money. So this is basically what we started with, and as we only had funding for a short stint, that's basically where we are today. We are still looking at federal data, not looking for, like, looking into, does that make sense? Anyway, into this data set, and trying to understand what people are doing with our money. By people I mean our representatives, and by our money, the money we pay to the government, basically taxes, etc. Yeah, so the tagline on the website is
Starting point is 00:06:16 Artificial Intelligence for Social Control of Public Administration. And that is, the project focuses around the Brazilian government and these officials that you're speaking of, these congresspeople, and uncovering via artificial intelligence and the system that you built, I think you said the word odd,
Starting point is 00:06:42 potential corruption in their spending. Is that fair? Yes, yes. Okay. Completely fair. I think if I might jump into an example, it may be easier to grasp. Yep. A very simple example.
Starting point is 00:07:00 Congresspeople can pay for their meals with public money, for example, but only their own meals, not the meal of someone who works for them or someone they're meeting for whatever reason. And we can use data science and machine learning to take all those receipts from restaurants and spot which one is really an outlier. So probably there's something odd with this outlier, and maybe the guy is paying for someone else's lunch when he or she shouldn't. So that's the kind of thing we're looking at. Right. So you gave us the initial idea and now we know what it is, what it does. Let's hear the backstory on bringing this software into existence. You said the idea happened. I'll lay out there
Starting point is 00:07:55 that two months of development was crowdfunded, almost 1300 people participated in that crowdfunding event. Tell us that story of saying when the founder had the idea for this project and how it came to be. So I think the founder is called Edu and he had the idea, I think he had this at a certain point that question in his mind, how can I use the knowledge I have
Starting point is 00:08:21 to go deeper in politics? And he approached some friends. I was one of them. And we looked at his idea and said, wow, that's amazing. Let's do something about it. And we were, like, the three of us, very familiar with crowdfunding for different reasons. And our idea was, okay, let's put up a project on a crowdfunding platform so we can raise money
Starting point is 00:08:52 to do a kind of MVP and kind of show people that this is possible. We can actually use technology. We can use machine learning to make sense of these, like, tons of data. But we couldn't expect that much from crowdfunding in terms of money. Like, it wasn't feasible to expect money to work for a full year on the project. So we kind of estimated that we could raise money for two months. And two months is a very short period of time for a data science project. So we decided that we should start with a very simple
Starting point is 00:09:47 idea, get our hands dirty on it, and just show people that it's really, really possible. So we gathered a team, I think we were six or seven in the beginning, and we put the project online, by project I mean the crowdfunding campaign, and it was roughly one year ago. I think it was September 2016. So for two months we were open to donations, and at the end we got money to actually work three months on the project, which was better than we expected. And so what we did is translate, let's say, law into code to try to spot in those data sets eight, or no, I think with eight I'm overestimating.
Starting point is 00:10:40 I think it was, in the beginning, five or six hypotheses of how people could be using public money in illegal or not so moral ways. So that's what we actually did, how it got started. Something that was pretty important in the process was that we got a lot of support from the media in Brazil. So in a certain week we got the cover of the biggest newspaper in the country and three minutes on the biggest TV news show in the country. So I think this gave us a lot of supporters, not only in terms of code, but people who were very interested in what we were doing.
Starting point is 00:11:32 Something you said that was pretty interesting there was translated law into code, if I heard you correctly. Is that what you said? Yes, yes. Jeez, what kind of process was that? It was literally going through law texts. So we have a document. I really don't know how to say what this document is in English.
Starting point is 00:11:56 It's not a law, but it's a kind of agreement from the lower house, from the Chamber of Deputies, saying how much money we're talking about and how representatives can use it. Okay, so kind of like guidelines on how public officials can use public funds to go about doing their jobs day to day. Yes, yes. And those guidelines, for some reason, have the same weight as the law if you actually decide to sue someone, for example, because there's no law above it. There could be, but there's no law. So this is kind of the main piece that a judge would look at if someone sued a congressperson and said, hey, you're not using this money as we expect you to. So first of all, we have to understand the law.
Starting point is 00:12:50 That's how it really begins. And I think the second step is a bit curious because we should, once we understand this piece of law, this agreement, these guidelines, we should think, okay, so if I was to try to, how can I say, like do a little work around to use this money in another way, how would I do it? And then we could think about, okay, let's say someone actually did this thing we just suggested in a kind of brainstorm, how could data tell us that someone did that?
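The three steps described here (read the guideline, imagine a workaround, ask how the data would reveal it) can be sketched as a small data structure. This is only an illustration, not the project's actual code: the `Claim` fields, the `Hypothesis` class, and the 100.0 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    # Hypothetical record shape; the real data set's fields are not described in the episode.
    congressperson: str
    category: str
    date: str  # ISO date string, e.g. "2016-09-01"
    amount: float


@dataclass
class Hypothesis:
    guideline: str                              # step 1: what the rule says
    workaround: str                             # step 2: how someone might bend it
    looks_suspicious: Callable[[Claim], bool]   # step 3: how the data would show it


# Illustrative only: the 100.0 cap is an invented number, not a legal limit.
expensive_meal = Hypothesis(
    guideline="Meals are reimbursable only for the congressperson.",
    workaround="Submit one receipt that actually covers several people.",
    looks_suspicious=lambda c: c.category == "meal" and c.amount > 100.0,
)


def flag(claims: List[Claim], hypothesis: Hypothesis) -> List[Claim]:
    """Return the claims that match the hypothesis' data signal."""
    return [c for c in claims if hypothesis.looks_suspicious(c)]


claims = [
    Claim("Deputy A", "meal", "2016-09-01", 35.0),
    Claim("Deputy B", "meal", "2016-09-01", 480.0),
    Claim("Deputy B", "fuel", "2016-09-02", 300.0),
]
flagged = flag(claims, expensive_meal)
for claim in flagged:
    print(claim.congressperson, claim.date, claim.amount)
```

Keeping the guideline and the imagined workaround next to the check makes it easier to explain to a non-programmer why a given claim was flagged.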
Starting point is 00:13:30 And then we start to really organize the law into code. So we have the law, we have a way to bypass it, and we have a way to analyze data that would say that people used exactly the bypass we thought about, and then it's kind of easier to write code on that. So let's lay out a few of the findings. I think some of the details here are important because it's specific to the setup that you have there in Brazil
Starting point is 00:14:03 with regards to the money going to the Congress people, how much they're willing, how much they're able to spend, and the fact that they do report these reimbursements or receipts or however it goes in that you can use for that data. So first, the cool thing is that the robot, which you guys gave it a name, right?
Starting point is 00:14:23 Rosie, is that the right name? Yes, Rosie. It's named after the character from the Jetsons. Yes, the maid. The maid, I think. Yeah, the maid from the Jetsons. Yeah. There you go.
Starting point is 00:14:38 Yeah, so Rosie has some findings, and they're on the website, which is linked up in the show notes for those interested. But just a few to summarize. So 219 Congress people max out their monthly spending. That's one of the findings. One Congress person that usually gets 30 gas tanks filled each month on average, which seems to be outside the norm. Two Congress people have claimed 13 meals paid in the same day. And as you said, they're able to pay their own meals, but not other people's. So that would be a strange one.
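A finding like "13 meals paid in the same day" is the kind of check that needs very little code. The sketch below is an assumption about how such a count could be done, not Rosie's implementation; the tuple layout, the names, and the `limit` of 3 meals are invented for illustration.

```python
from collections import Counter

# Hypothetical rows: (congressperson, category, date). Names and dates are made up.
claims = (
    [("Deputy A", "meal", "2017-03-14")] * 13
    + [("Deputy B", "meal", "2017-03-14")] * 2
    + [("Deputy A", "fuel", "2017-03-14")]
)


def many_meals_same_day(rows, limit=3):
    """Map (congressperson, date) to the meal count when it exceeds `limit`."""
    counts = Counter((who, day) for who, category, day in rows if category == "meal")
    return {key: n for key, n in counts.items() if n > limit}


suspicious = many_meals_same_day(claims)
for (who, day), n in suspicious.items():
    print(f"{who} claimed {n} meals on {day}")
```

The `limit` is a judgment call: set it too low and a long working day with several legitimate snacks gets flagged, too high and shared meals slip through.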
Starting point is 00:15:14 These are the kinds of things that Rosie is uncovering. And I think it's important to understand that this works because the Congress people are reporting this. Tell us the process of how they actually send in their reimbursements and what that all looks like. Okay, yeah. Brazil is a big country, and we have a lot of representatives, basically because we have a lot of different states with different sizes.
Starting point is 00:15:47 So in total, we have a little bit more than 500 representatives. It's 513, if I'm not wrong. So they are reimbursed for those kinds of expenses: expenses with meals, with security, with subscriptions to some content, whether it is digital or printed, doesn't matter. Consultancies, which kind of makes sense. They have to vote on a law and it's not their field of expertise, so they can hire a consultant or a consultancy to help them.
Starting point is 00:16:23 Transportation, because in theory in a democracy they should be close to the people who voted for them, so they can be reimbursed for traveling from the capital, which is in the middle of a massive country, to whatever part of the country elected him or her. Different kinds of expenses. So it works as a reimbursement process. They pay from their pockets and they save the receipt and they submit it to the house, to the Chamber of Deputies, saying, hey, I just spent, I don't know, 10 bucks on this, and show the receipt, and they are reimbursed. The problem is actually that we've been to the
Starting point is 00:17:07 Chamber of Deputies and there are actually four people working in this process. So they are in the office, they are not representatives, they are, how do you call it, public servants? Yeah, I'm not sure, but yes, they've got an office, and it's their job to kind of hang out there and do their thing. Yes, they are part of the government, but they are not the politicians themselves. They're the state, not the government. Public servants, yeah. Yeah. Anyway, so there are four people to actually receive and analyze and decide whether to reimburse or not all the receipts from 513 representatives. They said they actually receive more than one thousand five hundred, like fifteen hundred, claims for reimbursement
Starting point is 00:18:00 every day, and it's a massive job. It's basically impossible to handle this job without the help of technology. And actually they are handling this without the help of technology, and probably that's why we have a lot of work for Rosie to do, because they are handling it but we don't believe it's possible to do a good job. If I were in their shoes, I would probably miss a lot of things. And I guess that's what's happening. It's kind of like having checks and balances, right? You have a human doing a job, but at the same time, that person could make errors.
Starting point is 00:18:41 And that's going to happen as part of doing any job, right? But Rosie is there to cover the checks and balances, to make sure that what goes through the system, these human beings doing this awesome job, this hard job, is following the law. Yeah, yeah, that's it. Once, in a presentation about the project, I just made up a really, really simple example. I was saying, okay, so imagine you are one of these guys over there, and there's a pile of receipts on your table, and you have to just look through them and just say, is this a kind of very, very expensive meal, like something that is not right? And I just said to the audience, hey, imagine you have a pile as tall as the Lord of the Rings.
Starting point is 00:19:36 Each page is a different receipt. The book, of course, not the DVDs, anyway. So, yeah. Or movies, too. Yeah. One receipt per second of the movies. There you go. There you go. And I just ran a quick live coding session and said, okay, but imagine we have all those in a data set. For the project we use Python and pandas, so, I'm just guessing, but in like 10 lines of code I could, from a sample of 1,500 receipts, which is basically what the department gets every day, say, okay, from this we have just 13 that are from meals or that are
Starting point is 00:20:28 kind of outliers. So in one day you can look at 13 receipts, but you can't look at 1,500 receipts. So that's the idea. And with technology it's easy. It's like 10 lines of code, and you can automatically approve probably most of them and just pay attention to the ones that probably deserve this extra attention. That's the idea, I guess. Yeah, exactly. I mean, this is
Starting point is 00:20:56 this is classic human empowerment right so the combination of a computer and a human in this case you have the computer to basically flag outliers or oddities. And then the human to then, you're reducing the human's load from 1,500 to maybe a dozen. Like you said, the previous number was not even,
Starting point is 00:21:18 you couldn't even do that. And then a dozen sounds like it's, you can do that in an hour. And so in the actual, what's hard for computers, and they're getting better at it, but they're still not there yet, is actually detecting whether or not this is corruption. Is this a false positive? Is there an explanation?
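The "about 10 lines of code" idea of shrinking 1,500 receipts down to a handful for human review might look something like the sketch below. The guest mentions Python and pandas; this version uses only the standard library so it runs anywhere, and it swaps in a simple interquartile-range rule, which is an assumption rather than the method Rosie actually uses. The receipt amounts are synthetic.

```python
import statistics


def upper_outliers(amounts, k=1.5):
    """Return amounts above Q3 + k * IQR (a common rule-of-thumb fence)."""
    q1, _median, q3 = statistics.quantiles(amounts, n=4)
    fence = q3 + k * (q3 - q1)
    return [a for a in amounts if a > fence]


# Synthetic day of 1,500 meal receipts: 1,487 ordinary values between 20 and 49,
# plus 13 very large ones. Real data would be far messier than this.
ordinary = [20.0 + (i % 30) for i in range(1487)]
large = [400.0 + 10 * i for i in range(13)]
receipts = ordinary + large

flagged = upper_outliers(receipts)
print(len(receipts), "receipts,", len(flagged), "flagged for human review")
```

The code only narrows the pile. Deciding whether a flagged receipt is a false positive or has an explanation is still left to a person, as discussed above.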
Starting point is 00:21:40 And so Rosie doesn't do any actual reporting, right? Rosie just gives the information back to a person. Yes. It would make sense to have a system, though, to have humans process the data, you know, do data entry, basically. You know, obviously do human flagging as well as part of the process, but because the load is so massive, to reduce the thought process during data processing
Starting point is 00:22:06 or data entry, so to speak, and do that after the fact. I mean, that would make sense to me anyways, right? Like, this seems like the way it should be. Yes. I mean, like, by after the fact. You guys agree, of course. Yeah.
Starting point is 00:22:22 After the fact? The fact is what, the reimbursement itself? What do you mean? After the data entry. They're putting the data in, I'm assuming, to process it and say, okay, this person should get reimbursed
Starting point is 00:22:37 this amount of money they're just processing receipts essentially and applying it like here's a receipt you know for Ed and Ed got three receipts today. Boom. It's in the system. They didn't discern whether or not those three receipts was illegal or, or seemed illegal by any means.
Starting point is 00:22:55 They didn't look at the law. They just simply processed the information and put it into the system and allowed something like Rosie to do its job. Yeah. Actually, well, politicians are pretty clever, I must say that. Because there's another layer to the discussion, which is that, officially, the Chamber of Deputies, like the administration,
Starting point is 00:23:21 the public servants we were talking about, they are only there, that's what's written in this piece of paper that acts like the law, they're only there to say if the receipt used for the claim is a valid one. Let's say if the federal revenue ever gets a receipt, would the federal revenue say, yes, this is actually legal for revenue, whatever. Which is bizarre because the other side of this coin is that only the politician is, like the representative, is able by law to decide if the expense is claimable. Probably this word doesn't
Starting point is 00:24:05 even exist, but like, if he or she can claim the reimbursement for this expense. So say one of the representatives goes to a restaurant where we know for sure that one cannot pay more than $100 for a meal. And he goes there and says, okay, this is my receipt. I spent $500. Actually, by law, he's the only person allowed to decide if this is reimbursable or not. Which is even more bizarre. But I don't think we should get as far as this.
Starting point is 00:24:45 But anyway. You can only catch yourself. Yes, yes. They're pretty clever. But in spite of that, I think there's a lot of morality that we can put to work in our favor. And by our, I don't mean like the project.
Starting point is 00:25:04 I mean like actually the population of Brazil. Yeah. A classic example is that the law doesn't say anything about alcoholic drinks, so he could go to a liquor store or something. But actually there's the, I forgot the word. I think it's jurisprudence. Is this familiar? Say it again.
Starting point is 00:25:31 Jurisprudence or something. Yeah. When a lot of judges seem to take the same direction in similar cases, it's not the law, but it's how the judges probably, all of them, will push in this direction. I think what he's saying is when a lot of judges agree on a certain direction, I'm not sure what that term is called, but if you've got 11 of 12 judges agree on a direction, what is that? Yes, that's the point. I don't know the words, sorry.
Starting point is 00:26:03 But yeah, so about alcoholic drinks, we don't have anything written saying that it's forbidden, but we have this kind of shared understanding that this is not the purpose of this money. So you can actually report someone in this context, like he's using public money for alcoholic drinks, and even if it's not the law, probably the judge will... So there's a lot of nuance, basically, in this process. I mean, the question back was basically, you know, how do you take law and turn it into code? And so it's very nuanced. A lot of creative liberty could even be taken, considering cases like this one in particular, where, while alcohol may be discouraged, it's not lawfully enforced to not do it. It's just discouraged.
Starting point is 00:26:56 Yeah. Coming up, we ask Ed how he and his team got involved in this project and what their position is, whether they're civilians, government officials, or employees, or none of the above. And when they got started with this project, they started to report these discrepancies back to the government. And as you may assume, they got a really low rate of response. So they gave Rosie, this robot, a Twitter handle and started making these discrepancies public data, which started to obviously raise awareness, but also ruffle the feathers of those in power. To find out what happens next, stay tuned. This episode is brought to you by Linode, our cloud server of choice. Everything we do here at Changelog is hosted on Linode servers.
Starting point is 00:28:06 Pick a plan, pick a distro, and pick a location, and in seconds, deploy your virtual server. Drool-worthy hardware, SSD cloud storage, 40 gigabit network, Intel E5 processors, simple, easy control panel, nine data centers, three regions, anywhere in the world they've got you covered. Head to linode.com slash changelog and get $20 in hosting credit. And by CircleCI. CircleCI is how leading engineering teams deliver value faster. By automating the software
Starting point is 00:28:37 development process, using continuous integration and continuous delivery, you are free to focus on what matters most, which is building value for your customers. CircleCI is everything great teams need. Support for any language that builds on Linux, configurable resources, advanced caching options, custom environments, SSH access, security through full-level virtual machine isolation, interactive visual dashboard, first-class Docker support, and more. Get started with their free plan, which gives you unlimited projects and 1,500 builds per month. Plenty to get started with.
Starting point is 00:29:12 Head to circleci.com slash changelogpodcast. What about rewinding a bit? You said that to do this in the first place, the founder of this project wanted to get more into politics, and you say that you are working with the individuals processing these receipts. How did you all go about getting, one, the idea is great, but two, how did you get to actually be embedded into the government? It seems like... What is your position? Civilians? Government officials? Is this project governmentally sanctioned? How did that sell happen? How did you get there? Okay. We are not related to the government, I think.
Starting point is 00:30:11 When we started the project, a lot of the project... So this is happening outside the government? Yes, totally outside. We are mostly in a kind of hacker culture, I guess. There's a lot of nuance over there, but by hacker
Starting point is 00:30:27 culture I just mean the hands-on mode, really trying to not just
Starting point is 00:30:34 wave banners but like let's do some stuff like what can we do with whatever
Starting point is 00:30:39 we know is this awareness then so you're processing this data with Rosie and you're raising awareness back to the government saying, hey, here's
Starting point is 00:30:48 corruption happening consistently. Yes. I think that summarizes it pretty well. One very interesting point on that is that when we started to find something odd, we started to report it, and it's pretty funny, like the very first case we spotted was a guy drinking a Samuel Adams beer in Gordon Ramsay's restaurant in Las Vegas, a Brazilian representative. And we said, hey, we are paying for someone's beer in Vegas, that's, like, that's unexpected.
Starting point is 00:31:26 So we started to report, and actually we got a really low rate of response because actually they don't have to reply at all. So the Chamber of Deputies, if anyone from the population asks them something, like, hey, there's this data, this receipt here, it's kind of odd, can you clarify that for me? It's compulsory for them to give us a response, but it's not compulsory for the congressperson to report back to the Chamber of Deputies, like to this administrative part of the Chamber.
Starting point is 00:32:06 So we started to have a really low rate of responses. We did a kind of marathon of reports in one week and we reported almost a thousand cases and we just had, I think, 10% of response, which is pretty low. So from that point, we started to turn our attention not to officially reporting cases, but bringing them to the public, kind of a public arena, public place. So basically, we gave Rosie a Twitter account. I was going to say, it seems like the best way to call it out is just to make it public instead of saying
Starting point is 00:32:46 hey, can you tell me more about this? It's more like hey, this is happening. Yes, and until that point we're really afraid of publicizing some name of representatives claiming that there was a suspicion in his
Starting point is 00:33:02 reimbursement, because it was pointing fingers and that wasn't the idea. We shouldn't just point fingers. But the end of the story is that it didn't work as we expected. And when Rosie started to tweet cases, we were careful with the language, so she basically asks for help,
Starting point is 00:33:24 a kind of translation of the tweets, because she's a machine, so she's pretty repetitive in whatever she tweets, is: hey people, I found something suspicious here, can you help me look into it and have a say, is it really something odd or did I just mess it up? Because sometimes it happens, there are false positives. And this was pretty good because a lot of different people started to follow Rosie. And when she tweets, they start to ask the representative, like, hey, guy, what is this thing Rosie's saying about us? So it's, I don't know, maybe one, two, three, maybe ten people asking the
Starting point is 00:34:07 congressperson what the hell's going on with this reimbursement. And this was pretty effective, I think. So that's how we kept doing it. Rosie is still tweeting things. People ported the code so she... What's the Twitter for Rosie?
Starting point is 00:34:24 Rosie da Serenata, which is Rosie from Serenata, translating it. I can share the link, so you can put it on the podcast if you want. Absolutely. I think it's very interesting that the limiting factor in Rosie's effectiveness is the actual structure of the government
Starting point is 00:34:51 itself. Meaning that you'd have to reorganize the way it even works in order for the corruption reporting to have legal ramifications for these people. But you can't stop the spread of the information once it's been found and so while you had only 10 percent of respondents with these
Starting point is 00:35:15 claims that were being submitted now you can just say well if that's we can't restructure the government we can at least bring to light the corruption into the public forum, and then the individual people can hold their politicians to the fire. That's really cool. Yeah, and that's the idea. That wasn't our first option, but that's the only way we found, so we came up with that. It's interesting that your perspective first was to just, you know, silently whistleblow back to the government potential corruption or just, you know, potentially just an error, you know, or an oversight.
Starting point is 00:35:54 Not so much, you know, saying these people are wrong or they're, you know, they're breaking a law. Maybe it's by accident, who knows, to get essentially no response or lack of response or slow response in a lot of cases. And now turn it over to the public and say, here's a public data set of erroneous receipts happening in our government. And here's who's to blame. Yeah, it's interesting because we were criticized, because this is public shaming and you shouldn't do that. Like you're just pointing fingers and maybe people will bully some congressperson for no reason. But an interesting story is that in the very beginning, our idea was to put Rosie to work and she would give us back suspicions and then we would have a kind of blind review of
Starting point is 00:36:50 suspicions. We had a kind of Google Form for people interested in helping us investigate the suspicions, so we would sort cases and do this blind review. And only after, let's say, three people flagged it as, OK, this is not a false positive, then we would report it. And that was a disaster, because basically people didn't have the knowledge of the law we had. So people would say, like, OK, he was in Vegas and it was just a beer, like a small bottle. So it's not illegal or it's not immoral. So that's okay. When actually by the law, it's not written, but we were putting a lot of pressure on our shoulders, a lot of work on our desk to investigate all suspicions before reporting.
Starting point is 00:37:59 And then we were kind of afraid of just tweeting stuff and names. But it was, I think, a really good experience in spite of the blind review thing. And part of that is because a lot of our followers, or Rosie's followers, they ask us stuff. So can the congressperson do this or that? Which part of the law says this or that? Or how can I investigate that? This was really, really cool. Like people were not just public shaming. Of course, there were some doing it,
Starting point is 00:38:32 but it's not the kind of behavior we try to foster. We care a lot about communication, like words and how we put cases. And in our Facebook account, we really try to share our techniques of investigation, how we go from a receipt to a decision if it's a false positive or it's really suspicious. And this was pretty cool. Like people's interest in law
Starting point is 00:38:56 arose from suspicions like that. I like that you're not just raising the awareness, but you're also somewhat raising the education of the public's knowledge of what the law is and isn't. It's like a discussion, a forum around such things that many people would never engage or learn about, not given a medium like this. Pretty cool.
Starting point is 00:39:21 So the question is, you've got Rosie da Serenata. How scalable is Rosie? So you have this Twitter account and it's for Brazil and it's in Portuguese. And the question that I always seem to want to ask our guests is how scalable are these things? Like the idea, of course, is free and anybody in their government or their locality can go out and build their own system. But how scalable do you see this in light of taking the rules that are in Brazil, that are specific to Brazilian law, and porting the system, or maybe even just the idea of the system, to different localities? Because citizens of many countries will probably learn of something like this and say, oh, I would love to have something like this where I live. Okay, I have a lot to do
Starting point is 00:40:14 in this, a lot to talk about in questions like that. First of all, and I think the most basic step in this direction is that everything we do in terms of code, in terms of technology, is in English. Again, we've been criticized because, oh, we are leaving a lot of Brazilians out. Maybe they don't speak English or they don't feel comfortable in discussing issues on GitHub in English. But that's a decision we took and we embrace it. So all the code itself is in English, and so are all the comments and the discussions in this kind of technological forum that is GitHub. Because that's the idea, like people should use it in their own realities.
Starting point is 00:41:03 So this is the first thing. The second thing is that to this point we are kind of specialized in analyzing reimbursements. So if you have other kinds of public expenses, probably our classifiers won't fit perfectly, like you have to really write your own classifiers. But on the other hand, we try to design the software in a way that is pretty much pluggable. So you can have,
Starting point is 00:41:34 our architecture just requires basically an adapter, which says where Rosie can find data and a set of classifiers for this data. And all the pipeline would work the same. It doesn't matter if you're pulling data from Brazilian government or U.S. government or my city or whatever. So we try to be useful for other, not other countries only, of course, other countries for sure,
Starting point is 00:42:06 but even inside Brazil, it's different. Like if we are talking about a city hall or the federal government, it's completely different data sets. And anyway, but we try to be this pipeline where we can plug adapters for data and classifiers. So you can't skip knowing the laws of your country, your city, your state, and translating it into code.
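The adapter-plus-classifiers design described here can be pictured with a small sketch. This is not Rosie's actual code: the `Expense` fields, the `Adapter` protocol, `InMemoryAdapter`, `meal_price_outlier`, and `run_pipeline` are all hypothetical names invented for illustration, and the price limit is an arbitrary assumption rather than a rule from any law.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol


@dataclass
class Expense:
    """One public expense record (hypothetical, simplified shape)."""
    document_id: str
    category: str
    amount: float


class Adapter(Protocol):
    """Tells the pipeline where the data comes from."""
    def fetch(self) -> Iterable[Expense]: ...


# A classifier takes one expense and returns True when it looks suspicious.
Classifier = Callable[[Expense], bool]


class InMemoryAdapter:
    """Stand-in adapter; a real one would download a government data set."""
    def __init__(self, rows: Iterable[Expense]) -> None:
        self.rows = list(rows)

    def fetch(self) -> Iterable[Expense]:
        return iter(self.rows)


def meal_price_outlier(expense: Expense, limit: float = 200.0) -> bool:
    """Flag meals above an assumed price limit (the limit is made up)."""
    return expense.category == "meal" and expense.amount > limit


def run_pipeline(adapter: Adapter, classifiers: Dict[str, Classifier]) -> Dict[str, List[str]]:
    """Run every classifier on every expense; map document id to raised flags."""
    report: Dict[str, List[str]] = {}
    for expense in adapter.fetch():
        flags = [name for name, classify in classifiers.items() if classify(expense)]
        if flags:
            report[expense.document_id] = flags
    return report


if __name__ == "__main__":
    adapter = InMemoryAdapter([
        Expense("1", "meal", 35.0),
        Expense("2", "meal", 950.0),
    ])
    print(run_pipeline(adapter, {"meal_price_outlier": meal_price_outlier}))
```

Under these assumptions, moving to another city or country means writing a new adapter and a new set of classifiers that encode the local rules, while `run_pipeline` stays the same.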
Starting point is 00:42:35 Maybe if you find some similarities comparing your law to the Brazilian law we use, it will be way easier. If it's completely different, you'll probably have more work to do. But the idea is for us to grow the project to the point that we have a lot of references, and that would make it easier for people to use. Right now in Brazil, I've known about different cities or different initiatives trying to adapt our code to municipalities, just city halls basically.
Starting point is 00:43:11 And we try to support as much as we can. And also, I think there's this big thing of the idea of the project. So the day before yesterday, I was told some guys were looking into another kind of expense by the government. Again, I don't know the word in English, but when the government wants to hire a service or to buy something, he can't just, like the government can't just walk into the supermarket,
Starting point is 00:43:42 okay, I need this and buy it. It has to publicly advertise that he's looking for the service so every company can bid. I think they call that a call for proposals. Procurement. Yeah, procurement. Okay, okay, that's new. They essentially put out a call that,
Starting point is 00:44:00 hey, we're going to have a project coming up. We need to have proposals from an RFP. A lot of people have to bid on it, and it's a process. Okay, yeah. Like I'm building a bridge so different engineers can bid. Okay, I can build this bridge for this amount, and the government is kind of obliged to pick the cheapest one. So those guys, like I've known about them like two days ago, they did this for the city of Sao Paulo,
Starting point is 00:44:28 which is the biggest Brazilian city, for this RFP. So they are using NLP to cluster these calls by similarity. So when they have very similar calls with very different prices, there's something
Starting point is 00:44:45 wrong. So probably there's someone trying to take advantage of one or another call. So they actually, as far as I know, I've gone through their GitHub. As far as I know, they haven't used a single line of code we wrote. They could, it's all open-sourced. But I think the idea has spread, and this is amazing. This is really good. So you don't have to use Rosie or whatever code we write, but just using technology to help you make sense of public data, that's amazing. That's what we really expect to foster with this project. In this final segment of the show, we talk about the importance of open data,
Starting point is 00:45:35 but more importantly, making it accessible. This involves data scientists joining the effort to help make this not just public data, but usable public data. We also call out to all of our listeners in Brazil to reach out and get involved in this project. We'll be right back. This episode is brought to you by TopTal. TopTal is the best place to work as a freelancer or hire the top 3% of freelance talent out there for developers, designers, and finance experts. In this segment, I talk with Josh Chapman, a freelance finance consultant at TopTal, about the work he does and how TopTal helps him legitimize being a freelancer. Take a listen. Yeah, in my arena within TopTal, I specialize in everything from market
Starting point is 00:46:36 research to business plan creation to pitch decks to financial modeling, valuation. And then that leads very naturally into fundraising strategy, capital raising strategy, investor outreach, closing a deal, deal negotiation, how to value the company, how to negotiate that. And all those skill sets that I have continued to hone over on the TopTal side are ones that I actually deploy every single day in my own company. Freelancing can sometimes be seen as not legitimate or subpar work. Now, I would argue that when you work with
Starting point is 00:47:12 a company like TopTal, they put so much vetting into not only the companies that you work with, but also the talent that you work with, which I'm on the talent side, that it adds a level of legitimacy that isn't seen across other platforms. And that for me, as the talent side, that it adds a level of legitimacy that isn't seen across other platforms. And that for me, as the talent side, is incredibly fruitful and awesome to be a part of, right? I enjoy the clients. I enjoy the other talent that I get to talk to. I enjoy the TopTal team. And that creates an overall positive experience, not only for TopTal, but for me as the talent and for the client as the company on the other side. And that is really not seen or is the experience across other platforms in the freelance market. So if you're looking to freelance or you're looking to gain access to a network of top
Starting point is 00:47:56 industry experts in development, design, or finance, head to TopTal.com, that's T-O-P-T-A-L.com and tell them Adam from The Changelog sent you. For those wanting a more personal introduction, email me, adam at changelog.com. I think that's one of the things that gets me personally so excited about open data, public data from governments, is that it allows those people out there that have the ability to look at the data and examine it and potentially cross-examine how government spending is being done and put the power back into the people's hands versus just assuming that there is no corruption or no illegality taking place. It's, you know, that something, someone out there is, uh,
Starting point is 00:49:08 is looking into this in ways that aren't just trusted individuals. Yes. Yes. And I mean, sometimes, like, when we started the project (later they changed the API), the Chamber of Deputies used to follow this open data law, so it was compulsory for them to publicize this data. But they actually did it in a really massive XML file. It was like five gigabytes. So, okay, it's open data, it's out there, you can click and download, and there you have all the data from this department, from this part of the government.
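For readers who do write code, a file of that size can still be read one record at a time instead of being loaded whole. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse`; the tag names (`expenses`, `expense`, `congressperson`, `value`) are made up for the example and are not the Chamber of Deputies' real schema, and the function assumes each record is a direct child of the root element.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per record element, keeping memory use roughly constant.

    Assumes every record is a direct child of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            root.clear()  # drop records that were already handed to the caller


# Tiny stand-in for a multi-gigabyte file (hypothetical tag names).
SAMPLE = b"""<expenses>
  <expense><congressperson>A</congressperson><value>12.50</value></expense>
  <expense><congressperson>B</congressperson><value>980.00</value></expense>
</expenses>"""

if __name__ == "__main__":
    total = sum(float(r["value"]) for r in iter_records(io.BytesIO(SAMPLE), "expense"))
    print(total)
```

The same function accepts a file path in place of the `BytesIO` object, so the approach carries over to a real download. It helps programmers only, which is the accessibility point being made in the conversation.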
Starting point is 00:49:50 But actually, how accessible is a five gigabytes XML? It's basically like, I don't think I can open it in my computer, it doesn't have enough memory to handle a file this big. And also, is XML the proper file format to make that accessible? I think just tech people know that XML exists. Like if you tell my mom, probably she has no clue about how to open XML file. So I think it's really good to have open data, but we should be very critical
Starting point is 00:50:29 in pondering how accessible is it for people? And one step further, how can people actually make sense? Because if I open in Excel, some spreadsheet software, 1.6 million lines file,
Starting point is 00:50:52 how can I actually understand what these lines are telling me? So I think it's a really good thing to bring data science to help you make sense of this data. How many times have we said that, Jerod? It's nice to be open source, but wouldn't it be nice if it was also accessible? Open source is one thing, but then the accessibility of the project
Starting point is 00:51:13 or the data in this case, we said at least a dozen times in the show, I'm sure. Absolutely. Yeah. There's a lot of work to be done in taking public data that's public the way that Ed has described because it has to be
Starting point is 00:51:30 but there's no investment into it at all, they basically throw it over the wall, so to speak, and turning that into usable public data. And there's lots of foundations doing that kind of work, civic hackers and stuff like that. Because you say an XML file is bad. Well, in terms of programmatic access, it's actually kind of nice.
Starting point is 00:51:51 Like you said, it's just humongous and it's difficult to parse reliably. But what's worse is PDFs or scanned images. Those are terrible forms of public data. We can't get to the interesting work until we can get access to what is rightfully ours in whatever
Starting point is 00:52:14 locale you happen to be in. The citizens. It's a big problem. Totally. We started to write our own dashboard to browse data. So if going through an analysis, we want to look to more details on a specific reimbursement, we need a kind of dashboard. So just put the reimbursement number and you have all the information.
Starting point is 00:52:41 Which reminds me that we don't look only to the data set provided by the Chamber of Deputies, but we started to add layers of information. So, okay, the reimbursement was at that company, so go to the Federal Revenue to grab more data about the company. Then we have the address of the company, so we can ask Google for a picture, like Google Street View, of where this company is. So there's a lot of layers of data we add to the original data set. And actually this dashboard started as a kind of internal tool for us to do whatever we were doing, and now there's a big effort in the team, in the core of contributors, to make this dashboard more accessible. Because as an internal tool, it was really, really terrible in terms of UX.
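The layering idea can be sketched as a function that takes one reimbursement and attaches whatever extra sources are available. Everything here is an illustrative assumption: the field names (`supplier_id`, `address`), the `enrich` function, and the stand-in lookups are hypothetical, and no real Federal Revenue or Google service is called.

```python
def enrich(reimbursement, company_lookup, geocode=None):
    """Return a copy of the reimbursement with extra layers attached.

    company_lookup: callable mapping a supplier id to a company dict or None.
    geocode: optional callable mapping an address string to a location.
    """
    enriched = dict(reimbursement)
    company = company_lookup(reimbursement["supplier_id"])
    if company is not None:
        enriched["company"] = company
        if geocode is not None and company.get("address"):
            enriched["location"] = geocode(company["address"])
    return enriched


# Stand-ins for external sources (hypothetical data, no network access).
COMPANIES = {"123": {"name": "Example Restaurant", "address": "1 Example Street"}}


def fake_geocode(address):
    """Pretend geocoder that always returns the same coordinates."""
    return {"lat": 0.0, "lon": 0.0}


if __name__ == "__main__":
    row = {"document_id": "42", "supplier_id": "123", "value": 80.0}
    print(enrich(row, COMPANIES.get, fake_geocode))
```

Passing the lookups in as arguments keeps each layer optional, so a record whose company is unknown still comes back unchanged rather than failing.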
Starting point is 00:53:33 You have to know the reimbursement number and some other code, like numeric ID, to get to the data. And now we are trying to, okay, you can search by name and you can filter just your state and, let's say, reimbursements from this or that category, like meals from my state in 2015. And I think this is pretty interesting. We've seen journalists doing amazing jobs because we offer this tool for them to browse data, to browse government data. And that's something the government should have done, I guess. I think maybe they don't have enough people to do it. Maybe that's not their focus.
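The kind of query described here, "meals from my state in 2015", amounts to a filter in which every criterion is optional. A minimal sketch, assuming records are plain dictionaries with hypothetical keys (`congressperson`, `state`, `category`, `year`); this is not the dashboard's real code or API.

```python
def search(reimbursements, name=None, state=None, category=None, year=None):
    """Return the rows that match every criterion that was provided."""
    def matches(row):
        return (
            (name is None or name.lower() in row["congressperson"].lower())
            and (state is None or row["state"] == state)
            and (category is None or row["category"] == category)
            and (year is None or row["year"] == year)
        )
    return [row for row in reimbursements if matches(row)]


# Made-up rows for illustration only.
ROWS = [
    {"congressperson": "Person A", "state": "SC", "category": "meal", "year": 2015},
    {"congressperson": "Person B", "state": "SP", "category": "meal", "year": 2015},
    {"congressperson": "Person A", "state": "SC", "category": "fuel", "year": 2016},
]

if __name__ == "__main__":
    print(search(ROWS, state="SC", category="meal", year=2015))
```

A search box and a few dropdowns on top of a function like this are what remove the need to know a reimbursement number in advance.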
Starting point is 00:54:14 But this kind of dashboard, that is not technical. It doesn't require Python or Pandas or whatever for the user. This is something really important. It doesn't matter if it's a kind of civic activism stuff or if it's provided by the government itself. This is the kind of thing that really should exist out there. Maybe something that's sitting on everybody's thoughts, lips, whatever, as they're listening to the show is like,
Starting point is 00:54:39 this name, Serenata de Amor. Or serenade of love, if you, you know, translate it into English, serenade of love. What does it, why this name? Where did this come from? Okay. Like a literal translation wouldn't make sense because it'd be love serenade,
Starting point is 00:55:01 I guess. Serenade is like when people sing a song for someone he or she is in love with. Right, right. So, but actually this is the name of a very famous Brazilian chocolate, which is kind of even more bizarre, I guess. But the point is... A lot of our listeners are really into chocolate. Yeah? Okay, good thing. I know this by our downloads. Certain cities are really into chocolate. They make downloads in those cities. Just kidding. I'm assuming our listeners like chocolate.
Starting point is 00:55:36 I like chocolate. That's a pretty safe assumption. I mean, most people like chocolate. Who doesn't like chocolate? I mean, come on. But the real reason why we picked a chocolate name is that in the mid-90s, there was a Swedish politician. She was probably going to be the next prime minister of Sweden. And for some reason that I don't remember,
Starting point is 00:56:05 they started to investigate her. And they realized that she was using public money to buy stuff she wasn't supposed to buy. And one of those things was a single bar or maybe two bars of Toblerone. Yes, Toblerone, yes. Those are good. So it became known as the Toblerone Affair.
Starting point is 00:56:28 And yeah, I think you can Google for it. I don't know if we have a Wikipedia page for Toblerone Affair, but if you go to this politician page on Wikipedia, it's there, Toblerone Affair. It is there. Yeah, I'm on the Sahlin page. We're going to leave this up in the show notes so everybody can follow along.
Starting point is 00:56:49 This is hilarious that it's well, I guess not hilarious in hindsight but the fact that it's connected to chocolate you know. I think it has a lot to do with the kind of irregular or illegal usage of public money we expect to find using data science.
Starting point is 00:57:07 Because when there's a big corruption scandal of millions, billions, maybe, I don't know, trillions of whatever currency you're using, probably someone already spotted that and someone is working on it. But when we use big data, probably we are seeing a lot of small cases, a massive amount of small cases that hardly ever would be spotted by a human being. That was our hypothesis. And I think that's, well,
Starting point is 00:57:38 has a lot to do with the Toblerone affair. It was a level of corruption in terms of monetary value very low. So this is the main reason. In Brazil, there's a second reason that in Brazil, when our FBI, let's say our federal Bureau of Investigation is investigating something. Of course, they can't say I'm investigating this case with a very meaningful name, so they just give random names. I don't know how it
Starting point is 00:58:12 works elsewhere. So calling our project Serenata de Amor was a kind of joke with those random names our FBI uses. So it's usually Operation Something and something that makes no sense
Starting point is 00:58:28 like Operation Sandcastle, Operation Car Wash and then we have Operation Car Wash. Yeah, Car Wash, actually the biggest investigation on corruption in Brazil, it's going on.
Starting point is 00:58:43 Anyway, and there's a third reason, that it's basically our love serenade for our country, like the kind of gesture we can do as citizens to help our country. So this is the cheesy one, but I love it.
Starting point is 00:58:58 Great name. Lots of meaning inside that name. That's excellent. And how can we as the hacker community get involved, help you out, further your cause? We have lots of listeners in Brazil. We have listeners around the globe. How can we help out and get involved?
Starting point is 00:59:16 Well, we would be really, really proud if you get inspired by some of our ideas and try to do something local. I think you don't have to help us out in the sense of making this project better in Brazil for the levels of administration we are working at. Feel free to take the idea forward. This would be amazing. If you are just wanting to get deep in the code and stuff, we have a lot of issues from deployment to UX to DevOps to data science. A lot of analysis we would like to do and we just can't.
Starting point is 01:00:00 We just haven't had time to do it. And this is basically because we started with this big crowdfunding campaign. And since then, we basically had no other big fundraising except a recurring crowdfunding campaign we started when we ran out of money. So we are really glad because people are supporting us. But unfortunately, the amount of money we raise is not enough to put a couple of people, two, three, four people working full-time on the project. So right now we just have two or three part-time developers in the project.
Starting point is 01:00:42 So if you want to write code, there's a lot of things to write, from data science to development to UX to DevOps or whatever. And, like, there's a lot of communication stuff going on. So there's someone looking after our social media. There's people from law school helping us to dig into laws and think about new ways to get better results out of the report. Or maybe you think about new hypotheses that could be translated into classifiers. So actually, there's a lot of things to do if you want to help us. I think get started reading something about us. There's our website.
Starting point is 01:01:28 Probably the link will go in the podcast post. Or maybe in our GitHub if you're a more techie, savvy person. Feel free to drop a line on GitHub saying hi. Very good. Well, Ed, this was a lot of fun. Thanks for stopping by and telling us all about it. Thanks for this opportunity. It's really, really good
Starting point is 01:01:50 to talk about this project because not exactly the code we write, but the ideas underneath this code is really important for us. And it's really, really a pleasure. It's an honor to be here sharing these ideas with you guys. So thanks a lot for this opportunity.
Starting point is 01:02:09 Absolutely. All right. Thank you for tuning into the show this week. If you enjoyed the show, you know what to do. Share it with friends. Rate us on Apple Podcasts. Tell everybody you know, please. Thanks to our sponsors, Bugsnag, Linode,
Starting point is 01:02:26 CircleCI, and also TopTal. Big thanks to Fastly, our bandwidth partner. Head to fastly.com to learn more. We host everything we do on Linode cloud servers. Head to linode.com slash changelog. Check them out. Support this show.
Starting point is 01:02:42 This show is hosted by myself, Adam Stachowiak, and Jerod Santo. Editing is done by Jonathan Youngblood, and the awesome music you hear is produced by the mysterious Breakmaster Cylinder. You can find more shows just like this at changelog.com or wherever you subscribe to podcasts. Thanks for listening.
