The Changelog: Software Development, Open Source - Operação Serenata de Amor (Artificial Intelligence, Data Science, Government Corruption 😱) (Interview)
Episode Date: October 31, 2017
Eduardo Cuducos joined the show to talk about Operação Serenata de Amor, an Artificial Intelligence and Data Science project that aims to inform the general public about government corruption and spending. We talked about how this artificial intelligence project analyzes claims for reimbursement from congresspeople to estimate the probability that they are illegal, how it monitors government spending, the technology behind it, and how other governments might be able to follow this model.
Transcript
Bandwidth for Changelog is provided by Fastly.
Learn more at fastly.com.
And we're hosted on Linode servers.
Head to linode.com slash changelog.
This episode is brought to you by Bugsnag.
Bugsnag is mission control for software quality.
And on this segment, I'm talking with James Smith, co-founder and CEO of Bugsnag,
about the core problem they're solving for software teams
and why you should head to bugsnag.com slash changelog to test it out with your team.
Let's start with, you mentioned you and Simon. So you guys obviously at one point didn't have
this company, right? So as founders, as engineers, you got to a problem. What was that problem? Why
does Bugsnag exist?
Simon, my co-founder, and I met in college. We went
off to build software for other companies. I ended up in a startup. He ended up in enterprise software.
And we had the same problem in both of these companies. When things break, it's really hard
to figure out how badly they're broken, who's impacted and what to fix first. So we both had
this problem ourselves. So we decided, hey, why is no one doing a good job of fixing this problem right now? So very much Bugsnag was born out of scratching our own itch,
as they say. One thing that we find all the time is that there's this tension in software teams
or in product companies where you want to deliver new features to your customers,
or you want to build cool new stuff. But at the same time,
you've got to fix bugs because no matter how good a coder you are, you're going to introduce bugs.
But there's no clear definition of where to set that slider. Should I be fixing bugs now,
or should I be releasing features? And so this tension exists, I think, in all product teams,
all software teams. If you don't have a tool like Bugsnag, it's very difficult for you to figure out where to spend time.
And so that's the idea here is we're trying to help teams understand
whether they should be building or fixing
because there's a bit of a delicate balance between both.
Absolutely. I couldn't agree more.
So if your team is unsure of how to spend their time building or fixing,
give Bugsnag a try.
It's free to get started with a 45-day extended trial exclusive to our listeners. Head to bugsnag.com slash changelog.
You're listening to The Changelog, a podcast featuring the hackers, leaders, and innovators of open source.
I'm Adam Stachowiak, editor-in-chief of Changelog.
On today's show, we're talking with Eduardo Cuducos about Serenata de Amor, an artificial intelligence open data project for social control of government spending and public administration. We talk about how this project is capable of analyzing claims for reimbursement from
congresspeople to determine if they're illegal or not, how it monitors the spending of governments,
the technology behind it, and how other governments might be able to follow this model.
So, Ed, you're going to have to help us out because we first learned about Serenata de
Amor in our ping repo from Fabio Rem.
Thank you, Fabio, for making us aware of this project.
I have to admit, I probably would have never found it on my own.
And so I'm grateful that it came to us because it's very cool what you guys have been up to with this project down in Brazil.
Give us a little bit of the backstory.
Tell us what it is first and we'll dive in from there.
Okay, good. So first of all, thanks, Fabio, once more.
But anyway, I think we are pretty good at technology stuff. By we, I mean people who
started this project. But some of us were not so familiar with politics in general.
So at a certain point,
Edu, one of the founders, decided to get more involved in politics,
and he might have asked a question that goes something like,
how can I use the knowledge I have in technology?
He's a great data scientist.
He's a great developer.
So how can I use this knowledge to kind of go into a sea
that's unknown to me, like politics?
And as we live in an age that we have a lot of open data
about our government, it was kind of one possible
answer to use data science to understand what this data was telling us about our government.
So we can know who should we vote for, or at least who shouldn't we vote for.
So I think this is the very, very beginning of the idea of Serenata de Amor.
So if you were going to summarize where we're at today or what it does currently, just a
big picture, we'll dive into the details, but what is it? What does it do?
Well, so we started to take data from, I think it would be the lower house in the USA, but here
it is the Chamber of Deputies. Those guys incur a lot of expenses while they are
working, while they are representing us in the government. So we
started to look at the data about those
expenses and try to find bizarre things, suspicious things in these data sets,
because this would be a kind of corruption, or at least a kind of immoral or irregular
use of public money. So this is basically what we started with, and as we only had
funding for a short stint, basically that's still the point today: we are still
looking at federal data, looking into this data set,
and trying to understand what people are doing with our money.
By people I mean our representatives, and by our money,
the money we pay to the government,
basically taxes, etc.
Yeah, so the tagline on the website is
Artificial Intelligence for Social Control
of Public Administration.
And that is, the project focuses around
the Brazilian government and these officials
that you're speaking of, these congresspeople,
and uncovering via artificial intelligence
and the system that you built,
I think you said the word odd,
potential corruption in their spending.
Is that fair?
Yes, yes.
Okay.
Completely fair.
I think if I might jump into an example, it may be easier to grasp.
Yep.
A very simple example.
Congresspeople can pay for their meals with public money, for example, but only their own meals, not the meal of someone who works for them or someone they are meeting for whatever reason.
And we can use data science and machine learning to take all those receipts from restaurants and spot which one is
really an outlier. So probably there's something odd with this outlier,
and maybe the guy is paying for someone else's lunch when he
or she shouldn't. So that's the kind of thing we're looking at.
Right. So you gave us the initial idea and now we
know what it is, what it does. Let's hear
the backstory on bringing this software into existence. You said the idea happened. I'll lay out there
that two months of development was crowdfunded; almost 1,300 people participated in that crowdfunding
event. Tell us that story, from when the founder had the idea for this project
to how it came to be.
So the founder is called Edu,
and I think at a certain point
he had that question in his mind:
how can I use the knowledge I have
to go deeper in politics?
And he approached some friends.
I was one of them.
And we looked at his idea and said, wow, that's amazing.
Let's do something about it.
And we were, like, the three of us, very familiar with crowdfunding for different reasons.
And our idea was, okay, let's put up a project
on a crowdfunding platform so we can raise money
to do a kind of MVP and kind of show people
that this is possible.
We can actually use technology.
We can use machine learning to make sense of this, like, tons of data. But we couldn't expect that much from crowdfunding in terms of money.
Like, it wasn't feasible to expect money to work for a full year in the project.
So we kind of estimated that we could raise money for two months.
And two months is a very short period of time for a data science project.
So we decided that we should start with a very simple
idea, get our hands dirty on it, and just show people that it's really,
really possible. So we gathered a team, I think we were six or seven in the beginning,
and we put the project online. By project I mean the crowdfunding campaign.
It was roughly one year ago, I think it was September 2016. So for two months we were
open to donations, and at the end we got money to actually work three months on the project, which was better than we expected.
And so what we did is translate,
let's say, translate law into code to try to spot in those data sets
eight, or no, I think with eight I'm overestimating.
I think it was, in the beginning,
five or six hypotheses of how people could be using public money in illegal ways, or in not so moral ways.
So that's what we actually did, how it got started. What was pretty important in the
process was that we got a lot of support from the media in Brazil. In a certain
week we got the cover of the biggest newspaper in the country and
three minutes on the biggest TV news show in the country. So I think this gave us a lot of supporters,
not only in terms of code,
but people who were very interested in what we were doing.
Something you said that was pretty interesting there
was translated law into code, if I heard you correctly.
Is that what you said?
Yes, yes.
Jeez, what kind of process was that?
It was literally going through law texts.
So we have a document.
I really don't know how to say what this document is in English.
It's not a law, but it's a kind of agreement from the lower house, from the Chamber of Deputies, saying how much this money we're talking about is and
how representatives can use it.
Okay, so kind of like guidelines on how public officials can
use public funds to go about doing their jobs day to day.
Yes, yes. And those guidelines, for some reason, have the same weight as the law if you actually decide to sue someone, for example, because there's no law above it.
It could be, but there's no law.
So this is kind of the main piece that a judge would look at if someone sued a congressperson and said, hey, you're not using this money as we expect
you to.
So first of all, we have to understand law.
That's how it really begins.
And I think the second step is a bit curious because, once we understand this
piece of law, this agreement, these guidelines, we should think: okay, if I were to try
to, how can I say, do a little workaround to use this money in another way, how would
I do it?
And then we could think: okay, let's say someone actually did this thing we just
suggested in a kind of brainstorm,
how could data tell us that someone did that?
And then we start to really organize the law into code.
So we have the law, we have a way to bypass it,
and we have a way to analyze data that would say that people used
exactly the bypass we thought about,
and then it's kind of easier to write code for that.
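As an illustration of that three-step process (rule, workaround, signal in the data), here is a minimal sketch in Python with pandas. It is not Rosie's actual code. It takes the meal rule mentioned earlier, assumes the workaround is paying for other people's meals, and assumes the signal is an unusually high number of meal claims by one person on one day. The column names and the `MAX_MEALS_PER_DAY` threshold are hypothetical.

```python
import pandas as pd

# Hypothetical threshold chosen for illustration; not taken from the guidelines.
MAX_MEALS_PER_DAY = 3


def too_many_meals_per_day(receipts: pd.DataFrame,
                           max_meals: int = MAX_MEALS_PER_DAY) -> pd.DataFrame:
    """Return (congressperson, date) pairs with more meal claims than max_meals."""
    meals = receipts[receipts["category"] == "meal"]
    counts = (
        meals.groupby(["congressperson", "date"])
        .size()
        .reset_index(name="meals")
    )
    return counts[counts["meals"] > max_meals]


# Made-up sample data: "A" claims five meals on one day, "B" claims one.
sample = pd.DataFrame({
    "congressperson": ["A"] * 5 + ["B", "B"],
    "date": ["2016-09-01"] * 5 + ["2016-09-01", "2016-09-02"],
    "category": ["meal"] * 6 + ["fuel"],
    "value": [30.0, 42.0, 25.0, 38.0, 51.0, 27.0, 80.0],
})

suspicious = too_many_meals_per_day(sample)
print(suspicious)
```

A flagged row is only a suspicion to investigate, not proof of misuse.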
So let's lay out a few of the findings.
I think some of the details here are important
because it's specific to the setup that you have there in Brazil
with regards to the money going to the Congress people,
how much they're willing,
how much they're able to spend,
and the fact that they do report these reimbursements
or receipts or however it goes in
that you can use for that data.
So first, the cool thing is that the robot,
which you guys gave it a name, right?
Rosie, is that the right name?
Yes, Rosie.
It's named after the character from the Jetsons.
Yes, the maid.
The maid, I think.
Yeah, the maid from Jetsons.
Yeah.
There you go.
Yeah, so Rosie has some findings, and they're on the website,
which is linked up in the show notes for those interested.
But just a few to summarize. So 219 Congress people max out their monthly spending. That's
one of the findings. One Congress person that usually gets 30 gas tanks filled each month on
average, which seems to be outside the norm. Two Congress people have claimed 13 meals paid in the same day.
And as you said, they're able to pay their own meals,
but not other people's.
So that would be a strange one.
These are the kinds of things that Rosie is uncovering.
And I think it's important to understand that this works
because the Congress people are reporting this.
Tell us the process of how they actually send in their reimbursements
and what that all looks like.
Okay, yeah.
Brazil is a big country, and we have a lot of representatives,
basically because we have a lot of different states with different sizes.
So in total, we have a little bit more than 500 representatives.
It's 513, if I'm not wrong.
So they are reimbursed for those kind of expenses,
expenses with meals, with security, with if they want to subscribe for some content, whether
it is like digital or printed, doesn't matter.
Consultancies, which kind of makes sense.
They have to vote for a law and it's not their field of expertise, so they can hire a consultant
or a consultancy to help them.
Transportation, because in theory, in a democracy,
they should be close to the people who voted for them, so they can be reimbursed for traveling
from the capital, which is in the middle of a massive country, to whatever part of the country
elected him or her. Different kinds of expenses. So it works as a reimbursement process.
They pay from their pockets and they save the receipt and they submit it to the house,
to the Chamber of Deputies, saying, hey, I just spent, I don't know, 10 bucks on this,
and show the receipt, and they are reimbursed.
The problem is actually that we've been to the
Chamber of Deputies and there are actually four people working in this process. So they are in
the office, they are not representatives, they are, how do you call it?
Public servants?
Yeah, I'm not sure, but yes.
They've got an office, and it's their job to kind of hang out there and do their thing.
Yes, they are part of the government, but they are not the politicians
themselves. They're the state, not the government.
Public servants.
Yeah, yeah. Anyway, so
there are four people to actually receive and analyze and decide whether or not to reimburse all the
receipts from 513 representatives. They said they actually receive more than
one thousand and five hundred, like fifteen hundred, claims for reimbursement
every day and it's a massive job. It's basically impossible to handle this job without the help of technology.
And actually they are handling this without the help of technology, and probably that's
why we have a lot of work for Rosie to do, because they are handling it but we don't
believe it's possible to do a good job that way.
If I were in their shoes, probably I would miss a lot of things.
And I guess that's what's happening.
It's kind of like having checks and balances, right?
You have a human doing a job, but at the same time, that person could make errors.
And that's going to happen as part of doing any job, right?
But Rosie is there to cover
the checks and balances, to make sure that what goes through the system, with these human beings doing
this awesome job, this hard job, is following the law.
Yeah, yeah, that's it. Once, in a
presentation about the project, I just made up a really, really simple example.
I was saying, okay, so imagine you are one of these guys over there, and there's a pile of receipts on your table, and you have to just look through them and just say, is this a kind of very, very expensive meal, like something that is not right?
And I just said to the audience,
hey, imagine you have a pile as tall as The Lord of the Rings.
Each page is a different receipt.
The book, of course, not the DVDs, anyway.
So, yeah.
Or movies, too.
Yeah. One receipt per second of the movies.
There you go.
There you go. And I just ran like a quick live code and said, okay, but imagine you have
all those in a data set. For the project we use Python and pandas, so, I'm just
guessing, but in like 10 lines of code I could, from a sample of 1,500 receipts, which
is basically what the department gets every day, say, okay, from this we have just 13 that are from meals and that are
kind of outliers. So in one day you can look at 13 receipts, but you can't look
at 1,500 receipts. So that's the idea, and with technology it's easy: it's like 10
lines of code and you can automatically approve probably most of them and just
pay attention to the ones that probably
deserve this extra attention.
That's the idea, I guess.
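A minimal sketch of what such a ten-line triage could look like with pandas. This is not the live code from the presentation, which the transcript does not include. It assumes a table with hypothetical `category` and `value` columns, and it uses the interquartile-range rule as a stand-in for whatever outlier test was actually demonstrated.

```python
import pandas as pd


def flag_meal_outliers(receipts: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return meal receipts priced above Q3 + k * IQR of all meal receipts."""
    meals = receipts[receipts["category"] == "meal"]
    q1 = meals["value"].quantile(0.25)
    q3 = meals["value"].quantile(0.75)
    threshold = q3 + k * (q3 - q1)
    return meals[meals["value"] > threshold]


# Made-up day of claims; only the 500.00 meal is far from the rest.
day = pd.DataFrame({
    "category": ["meal", "meal", "meal", "fuel", "meal", "meal", "meal"],
    "value": [20.0, 25.0, 30.0, 900.0, 22.0, 28.0, 500.0],
})

to_review = flag_meal_outliers(day)
print(f"{len(to_review)} of {len(day)} receipts need a human look")
```

The point of the design is the one made in the conversation: the code does not decide anything, it only shrinks the pile a person has to read.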
Yeah, exactly. I mean, this is
classic
human empowerment, right? So the
combination of a computer and a human:
in this case you have the computer
to basically flag outliers or oddities.
And then the human...
you're reducing the human's load from 1,500 to maybe a dozen.
Like you said, with the previous number
you couldn't even do that.
And then a dozen sounds like you can do that in an hour.
And so what's hard for computers,
and they're getting better at it,
but they're still not there yet,
is actually detecting whether or not this is corruption.
Is this a false positive?
Is there an explanation?
And so Rosie doesn't do any actual reporting, right?
Rosie just gives the information back to a person.
Yes.
It would make sense to have a system, though, to have humans process the data,
you know, do data entry, basically.
You know, obviously do human flagging as well as part of the process,
but because the load is so massive, to reduce the thought process
during data processing
or data entry, so to speak,
and do that after the fact.
I mean, that would make sense to me anyways, right?
Like, this seems like the way it should be.
Yes.
I mean, like, by after the fact...
You guys agree, of course.
Yeah.
After the fact... the fact is what?
The reimbursement itself?
What do you mean?
After the data entry.
They're putting the data in, I'm assuming, to process it
and say, okay, this person should get reimbursed
this amount of money.
they're just processing receipts essentially
and applying it like here's a receipt
you know for Ed and Ed got three receipts today.
Boom.
It's in the system.
They didn't discern whether or not those three receipts was illegal or,
or seemed illegal by any means.
They didn't look at the law.
They just simply process the information and put it into the system and
allowed something like Rosie to,
to do its job.
Yeah.
Actually, well, politicians are pretty clever, I must say that.
Because there's another layer to the discussion, that is,
officially, the Chamber of Deputies, like the administration,
the public servants we were talking about, they are only
there, that's what's written in this piece of paper that acts like the law, they're only
there to say if the receipt used for the claim is a valid one.
Let's say, if the federal revenue ever gets this receipt, would the federal revenue say,
yes, this is actually legal for revenue, whatever.
Which is bizarre, because the other side of this coin is that only the politician,
the representative, is able by law to decide if the expense is claimable.
Probably this word doesn't
even exist but like if he or she can claim the reimbursement for this expense
So if one of the representatives goes to a restaurant where we know for sure
that one cannot pay more than $100 for a meal,
and he goes there and says, okay, this is my receipt,
I spent $500,
actually, by law, he's the only person allowed to decide if this is reimbursable or not.
Which is even more bizarre.
But I don't think we should get as far as this.
But anyway.
You can only catch yourself.
Yes, yes.
They're pretty clever.
But in spite of that,
I think there's a lot of morality
that we can put to work in our favor.
And by our, I don't mean like the project.
I mean like actually
the population of Brazil.
Yeah.
A classic example is that the law doesn't say anything about alcoholic drinks, so he
could go to a liquor store or something.
But actually there's the, I forgot the word. I think it's jurisprudence.
Is this familiar?
Say it again.
Jurisprudence or something.
Yeah.
When a lot of judges seem to take the same direction in similar cases, it's not the law, but it's how the judges probably, all of them, will push in this direction.
I think what he's saying is when a lot of judges agree on a certain direction,
I'm not sure what that term is called,
but if you've got 11 of 12 judges agree on a direction, what is that?
Yes, that's the point.
I don't know the words, sorry.
But yeah, so, about alcoholic drinks, we don't
have anything written that says it's forbidden, but we have a kind of shared understanding that
this is not the purpose of this money. So you can actually report someone
in this context, like, he's using public money for alcoholic drinks, and even if it's not the law, probably
the judge will...
So there's a lot of nuance, basically, in this process. I mean, the
question was basically, you know, how do you take law and turn it into code? And so
it's very nuanced. A lot of creative liberty could even be taken, considering cases like this one in particular, where alcohol isn't lawfully forbidden.
It's just discouraged.
Yeah.
Coming up, we ask Ed how he and his team got involved in this project and what their position is,
whether they're civilians, government officials, or employees, or none of the above.
And when they got started with this project, they started to report these discrepancies back to the government.
And as you may assume, they got a really low rate of response.
So they gave Rosie, this robot, a Twitter handle and started making these discrepancies public data,
which started to obviously raise awareness, but also ruffle the feathers of those in power.
To find out what happens next, stay tuned.
This episode is brought to you by Linode, our cloud server of choice.
Everything we do here at Changelog is hosted on Linode servers.
Pick a plan, pick a distro, and pick a location,
and in seconds, deploy your virtual server.
Drool-worthy hardware, SSD cloud storage,
40 gigabit network, Intel E5 processors,
simple, easy control panel,
nine data centers, three regions,
anywhere in the world they've got you covered. Head to linode.com slash changelog and get $20 in hosting credit.
And by CircleCI.
CircleCI is how leading engineering teams deliver value faster. By automating the software
development process, using continuous integration and continuous delivery, you are free to focus
on what matters most, which is building value for your customers. CircleCI is everything great teams need. Support for any language that builds
on Linux, configurable resources, advanced caching options, custom environments, SSH access,
security through full-level virtual machine isolation, interactive visual dashboard,
first-class Docker support, and more.
Get started with their free plan, which gives you unlimited projects and 1,500 builds per
month.
Plenty to get started with.
Head to circleci.com slash changelogpodcast.
What about rewinding a bit? When you talked about doing this in the first place, the founder of this project
wanted to get more into politics, and you say that you are working with the individuals processing these receipts. How did you all go about getting...
one, the idea is great, too, but how did you get to actually be embedded into the government, it
seems like? What is your position? Civilians? Government officials? Is this project
governmentally sanctioned? How did that sell happen?
How did you get there?
Okay.
We are not related to the government, I think.
When we started the project, a lot of the project...
So this is happening outside the government?
Yes, totally outside.
We are mostly in a kind of hacker culture, I guess.
There's a lot of nuance over there, but by hacker culture I just mean the hands-on mode, really trying to not just wave banners, but like, let's do some stuff. Like, what can we do with whatever we know?
Is this awareness then? So you're processing this data with Rosie and you're raising awareness back to the government saying, hey, here's
corruption happening consistently.
Yes.
I think that summarizes it pretty well. One very
interesting point on that is that when we started to find
something odd, we started to report it, and
it's pretty funny: the very
first case we spotted was a guy drinking a Samuel Adams beer in Gordon Ramsay's
restaurant in Las Vegas, a Brazilian representative. And we said, hey, we are paying
for someone's beer in Vegas, that's, like, unexpected.
So we started to report,
and actually we got a really low rate of response
because actually they don't have to reply at all.
So the Chamber of Deputies,
if anyone from the population asks them something, like, hey, there's this
data, this receipt here, it's kind of odd, can you clarify that for me? It's compulsory for
them to give us a response, but it's not compulsory for the congressperson to report back to the
Chamber of Deputies, like, to this administration part of the Chamber.
So we started to have a really low rate of responses.
We did a kind of marathon of reports in one week and we reported almost a thousand cases
and we just had, I think, 10% of response, which is pretty low.
So from that point, we started to turn our attention not to officially reporting cases,
but bringing them to the public, kind of a public arena, public place.
So basically, we gave Rosie a Twitter account.
I was going to say, it seems like the best way to call it out
is just to make it public instead of saying
hey, can you tell me more
about this? It's more like hey, this is happening.
Yes, and
until that point we were really
afraid of
publicizing the names of
representatives, claiming that
there was a suspicion in their
reimbursements,
because it was pointing fingers
and that wasn't the idea.
We shouldn't just point fingers.
But the end of the story is that it didn't work as we expected.
And when Rosie started to tweet cases,
we were careful with the language,
so she basically asks for help.
A kind of translation of the tweets, because she's a machine so she's pretty
repetitive in whatever she tweets, is: hey people, I found something suspicious
here, can you help me look into it and, like, have a say: is it really
something odd, or did I just mess it up?
Because like sometimes it happens, there's false positives.
And this was pretty good because a lot of different people started to follow Rosie. And when she tweets, they start to ask the representative, like, hey, guy, what is this thing Rosie's saying about us? So it's, I don't know, maybe one, two, three, maybe ten people
asking the
congressperson
what the hell's going on with this
reimbursement. And this
was pretty effective, I think.
So that's how
we kept going.
Rosie is still tweeting
things. People ported the code, so she...
What's the Twitter for Rosie?
Rosie da Serenata, which is Rosie from Serenata, translating it.
I can share the link,
so you can put it on the podcast if you want.
Absolutely. I think it's very interesting that the limiting factor in Rosie's
effectiveness is the actual structure of the government itself, meaning that you'd have
to reorganize the way it even works in order for the corruption reporting to have legal
ramifications for these people. But you can't stop the spread of
the information once it's been found. And so, while you had only 10 percent of respondents with these
claims that were being submitted, now you can just say, well, if we can't restructure the
government, we can at least bring the corruption to light in the public forum,
and then the individual people can hold their politicians to the fire.
That's really cool.
Yeah, and that's the idea.
That wasn't our first option, but that's the only way we found,
so we came up with that.
It's interesting that your perspective first was to just, you know, silently whistleblow back to the government potential corruption or just, you know, potentially just an error, you know, or an oversight.
Not so much, you know, saying these people are wrong or they're, you know, they're breaking a law.
Maybe it's by accident, who knows, to get essentially no response or lack of response or slow response in a lot of cases.
And now turn it over to the public and say, here's a public data set of erroneous receipts happening in our government.
And here's who's to blame.
Yeah, it's interesting because we were criticized, because this is public shaming and you shouldn't do that.
Like, you're just pointing fingers, and maybe people will bully some congressperson for no reason.
But an interesting story is that in the very beginning, our idea was to put Rosie to work and she
would give us back suspicions, and then we would have a kind of blind review of
suspicions. We had a kind of Google Form for people interested in
helping us investigate the suspicions, so we would sort cases and do this blind review.
And only after, let's say, three people flagged it as, OK, this is not a false positive, then we would report it.
And that was a disaster because basically people didn't have the knowledge of the law that we had.
So people would say, like, OK, he was in Vegas and it was just a beer, like a small bottle.
So it's not illegal or it's not immoral.
So that's okay.
When actually by the law, it's not written, but we were putting a lot of pressure on our shoulders, a lot of work on our desk to investigate all suspicions before reporting.
And then we were kind of afraid of just tweeting stuff and names.
But it was, I think, a really good experience in spite of the blind review thing.
And part of that is that because a lot of our followers or Rosie followers, they ask us stuff. So can the congressperson do this or that?
Which part of the law says this or that?
Or how can I investigate that?
This was really, really cool.
Like people were not just public shaming.
Of course, there were some doing it,
but it's not the kind of behavior we try to foster.
We care a lot about communication, like words and how we put cases.
And in our Facebook account,
we really try to share our techniques of investigation,
how we go from a receipt to a decision
if there's a false positive or it's really suspicious.
And this was pretty cool.
Like, people's interest in law
arose from suspicions like that.
I like that you're not just raising the awareness,
but you're also somewhat raising the education of the public's knowledge
of what the law is and isn't.
It's like a discussion, a forum around such things
that many people would never engage or learn about,
not given a medium like this.
Pretty cool.
So the question is, you've got Rosie da Serenata.
How scalable is Rosie? So you have this Twitter account, and it's for Brazil and it's in Portuguese, and the question that I always
seem to want to ask our guests is how scalable these things are. The idea, of course, is free,
and anybody in their government or their locality can go out and build their own system, but how scalable do
you see this, in light of taking the rules that are specific to Brazilian law
and porting the system, or maybe even just the idea of the system, to different localities?
Because citizens of many countries will probably learn of something like
this and say, oh, I would love to have something like this where I live.
Okay, I have a lot to talk about on questions like that. First of all, and I think the most
basic step in this direction is that everything we do in terms of code, in terms of technology, is in English.
Again, we've been criticized because, oh, we are letting a lot of Brazilians out.
Maybe they don't speak English or they don't feel comfortable in discussing issues on GitHub in English.
But that's a decision we took and we embrace it.
So all the code itself is in English, and all the comments and the discussions in this kind
of technological forum that is GitHub.
Because that's the idea, like people should use it to their own realities.
So this is the first thing. The second thing is that
to this point we are kind of specialized in analyzing reimbursements. So if you
have other kinds of public expenses probably our classifiers won't fit
perfectly like you have to really write your own classifiers.
But on the other hand,
we try to design the software
in a way that is pretty much pluggable.
So you can have,
our architecture just requires basically an adapter,
which says where Rosie can find data
and a set of classifiers for this data.
And all the pipeline would work the same.
It doesn't matter if you're pulling data from Brazilian government or U.S.
government or my city or whatever.
So we try to be useful for other, not other countries only, of course, other
countries for sure,
but even inside Brazil, it's different.
Like if we are talking about a city hall
or the federal government,
it's completely different data sets.
And anyway, but we try to be this pipeline
where we can plug adapters for data and classifiers.
So you can't skip knowing the laws of your country, your city, your state,
and translating it into code.
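The adapter-plus-classifiers design described above can be pictured with a minimal sketch. This is not Rosie's actual code: the `Expense` fields, the `Adapter` protocol, the `meal_over_limit` rule and its limit are all hypothetical names and values invented for illustration. It only shows the idea that the pipeline stays the same while the data source and the rules are swapped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass
class Expense:
    # Hypothetical record shape; real datasets have many more fields.
    document_id: str
    category: str
    amount: float


class Adapter(Protocol):
    """Tells the pipeline where to find data."""

    def fetch(self) -> Iterable[Expense]: ...


# A classifier takes one expense and returns True when it looks suspicious.
Classifier = Callable[[Expense], bool]


class InMemoryAdapter:
    """Stand-in adapter; a real one might download a government dataset."""

    def __init__(self, rows: list[Expense]) -> None:
        self.rows = rows

    def fetch(self) -> Iterable[Expense]:
        return iter(self.rows)


def meal_over_limit(expense: Expense, limit: float = 100.0) -> bool:
    # Illustrative rule only; the limit is made up, not taken from any law.
    return expense.category == "meal" and expense.amount > limit


def run_pipeline(adapter: Adapter, classifiers: dict[str, Classifier]) -> dict[str, list[str]]:
    """Run every classifier over every expense and collect flags per document."""
    suspicions: dict[str, list[str]] = {}
    for expense in adapter.fetch():
        flags = [name for name, check in classifiers.items() if check(expense)]
        if flags:
            suspicions[expense.document_id] = flags
    return suspicions


adapter = InMemoryAdapter([
    Expense("A1", "meal", 35.0),
    Expense("A2", "meal", 480.0),
    Expense("A3", "fuel", 900.0),
])
result = run_pipeline(adapter, {"meal_over_limit": meal_over_limit})
print(result)
```

Porting this to another city or country would mean writing a new adapter for the local data and new classifiers for the local rules, while `run_pipeline` stays untouched.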
Maybe if you find some similarities comparing your law to the Brazilian law we use, it will be way easier.
If it's completely different, you probably have more work to do.
But the idea is for us to grow the project to the point
that we have a lot of references,
and that would make it easier for people to use.
Right now in Brazil, I've known about different cities
or different initiatives trying to adapt our code to municipalities, just city halls basically.
And we try to support as much as we can.
And also, I think there's this big thing of the idea of the project. So the day before yesterday, I was told some guys were looking into
another kind of expense by the government.
Again, I don't know the word in English,
but when the government wants to hire a service
or to buy something,
he can't just, like the government
can't just walk into the supermarket,
okay, I need this and buy it.
It has to publicly advertise that he's looking for the service
so every company can bid.
I think they call that a call for proposals.
Procurement.
Yeah, procurement.
Okay, okay, that's new.
They essentially put out a call that,
hey, we're going to have a project coming up.
We need to have proposals from an RFP.
A lot of people have to bid on it, and it's a process.
Okay, yeah.
Like I'm building a bridge so different engineers can bid.
Okay, I can build this bridge for this amount, and the government is kind of obliged to pick the cheapest one.
So those guys, like I've known about them like two days ago,
they did this for the city of Sao Paulo,
which is the biggest Brazilian city,
for this RFP.
So they are using NLP
to cluster these calls
by similarity.
So when they have very similar calls
with very different prices,
there's something wrong.
So probably there's someone trying to take advantage of one or another call.
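The idea of comparing similar calls with very different prices can be sketched roughly as follows. This is not the São Paulo group's code, which the guest says he only browsed. It swaps in a very crude word-overlap (Jaccard) similarity in place of real NLP, and the sample calls, thresholds and function names are all assumptions made for illustration. Prices are assumed to be positive.

```python
from itertools import combinations


def tokens(text: str) -> set[str]:
    # Crude tokenizer: lowercase words, ignoring very short ones.
    return {word for word in text.lower().split() if len(word) > 2}


def jaccard(a: set[str], b: set[str]) -> float:
    # Share of words the two texts have in common (0.0 to 1.0).
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(calls, min_similarity=0.6, min_price_ratio=2.0):
    """Return pairs of call ids whose texts are similar but whose prices differ a lot."""
    flagged = []
    for (id_a, text_a, price_a), (id_b, text_b, price_b) in combinations(calls, 2):
        similarity = jaccard(tokens(text_a), tokens(text_b))
        ratio = max(price_a, price_b) / min(price_a, price_b)
        if similarity >= min_similarity and ratio >= min_price_ratio:
            flagged.append((id_a, id_b))
    return flagged


# Made-up sample data: (id, description, price).
calls = [
    ("c1", "supply of office paper for city schools", 10_000.0),
    ("c2", "supply of office paper for city clinics", 45_000.0),
    ("c3", "construction of a pedestrian bridge", 900_000.0),
]
flagged = suspicious_pairs(calls)
print(flagged)
```

A flagged pair is only a starting point for a human to investigate, in the same spirit as the false-positive checks mentioned earlier in the conversation.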
So they actually, as far as I know, I've gone through their GitHub. As far as I know,
they haven't used a single line of code we wrote. They could, it's all open-sourced. But
I think the idea has spread, and this is amazing. This is really good.
So you don't have to use Rosie or whatever code we write, but just using technology to
help you make sense of public data, that's amazing. That's what we really expect to foster with this project.
In this final segment of the show,
we talk about the importance of open data,
but more importantly, making it accessible.
This involves data scientists joining the effort to help make this not just public data,
but usable public data.
We also call out to all of our listeners in Brazil to reach out and get involved in this project.
We'll be right back.
This episode is brought to you by TopTal.
TopTal is the best place to work as a freelancer or hire the top 3% of freelance talent out there for developers, designers, and finance experts. In this segment, I talk with Josh Chapman, a freelance finance consultant at TopTal, about the work he does and how TopTal helps him legitimize being a freelancer. Take a listen.
Yeah, in my arena within TopTal, I specialize in everything from market research to business plan creation to pitch decks to financial modeling, valuation. And then that leads very naturally into fundraising strategy,
capital raising strategy, investor outreach,
closing a deal, deal negotiation,
how to value the company, how to negotiate that.
And all those skill sets that I have continued to hone
over on the TopTal side are ones that I actually deploy
every single day in my own company. Freelancing can
sometimes be seen as not legitimate or subpar work. Now, I would argue that when you work with
a company like TopTal, they put so much vetting into not only the companies that you work with,
but also the talent that you work with, which I'm on the talent side, that it adds a level
of legitimacy that isn't seen across other platforms. And that for me,
as the talent side, is incredibly fruitful and awesome to be a part of, right? I enjoy the
clients. I enjoy the other talent that I get to talk to. I enjoy the TopTal team. And that creates
an overall positive experience, not only for TopTal, but for me as the talent and for the client
as the company on the other side. And that is really not seen or is the experience across other platforms in the freelance market.
So if you're looking to freelance or you're looking to gain access to a network of top
industry experts in development, design, or finance, head to TopTal.com, that's T-O-P-T-A-L.com
and tell them Adam from The Changelog sent you.
For those wanting a more personal introduction, email me, adam at changelog.com.
I think that's one of the things that gets me personally so excited about open data,
public data from governments is that it allows those
people out there that have the ability to look at the data and examine it and potentially
cross-examine how government spending is being done and put the power back into the people's
hand versus just assuming that there is no corruption or no illegality taking place. It's, you know, that someone out there
is looking into this in ways that aren't just trusted individuals.
Yes. Yes. And I mean, when we started the project (later they changed the API), the Chamber of Deputies, they used to follow this open data law, so it was compulsory for them to publicize this data. But they actually did it in a really massive XML file. It was like five gigabytes. So okay, it's open data, it's out there, you can like click and download and there
you have all the data from this department, from this part of the government.
But actually, how accessible is a five gigabytes XML?
It's basically like, I don't think I can open it in my computer, it doesn't have enough
memory to handle a file this big.
And also, is XML the proper file format to make that accessible?
I think just tech people know that XML exists.
Like if you tell my mom, probably she has no clue about how to open XML file.
So I think it's really good to have open data, but
we should be very critical
in pondering how
accessible is
it for people?
And one step further,
how can people actually make sense?
Because if I open
in Excel, some spreadsheet software,
1.6 million lines file,
how can I actually understand what these lines are telling me?
So I think it's a really good thing to bring data science
to help you make sense of this data.
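On the point about a five-gigabyte XML file not fitting in memory: one common way around it is to stream the file record by record instead of opening it whole. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse`. The tag names (`expense`, `category`, `amount`) and the tiny sample document are hypothetical and do not reflect the Chamber of Deputies' real schema.

```python
import io
import xml.etree.ElementTree as ET


def total_by_category(source, record_tag="expense"):
    """Stream an XML file and sum amounts per category without loading it whole."""
    totals = {}
    # iterparse yields each element once its closing tag has been read.
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag != record_tag:
            continue
        category = element.findtext("category", default="unknown")
        amount = float(element.findtext("amount", default="0"))
        totals[category] = totals.get(category, 0.0) + amount
        # Drop the record's children so memory use stays low.
        element.clear()
    return totals


# A tiny in-memory stand-in for the real multi-gigabyte file.
sample = io.BytesIO(
    b"<expenses>"
    b"<expense><category>meal</category><amount>35.50</amount></expense>"
    b"<expense><category>fuel</category><amount>120.00</amount></expense>"
    b"<expense><category>meal</category><amount>14.50</amount></expense>"
    b"</expenses>"
)
totals = total_by_category(sample)
print(totals)
```

The same function accepts a file path in place of the in-memory sample, so the processing cost grows with the file while memory use stays small.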
How many times have we said that, Jared?
It's nice to be open source, but wouldn't it be nice if it was also accessible?
Open source is one thing, but then
the accessibility of the project
or the data in this case,
we said at least a dozen times in the show, I'm sure.
Absolutely.
Yeah.
There's a lot of work to be done
in taking public data
that's public the way that Ed has described
because it has to be
but there's no investment into it at all
they basically throw it over the wall so to speak
and take that into usable public data
and there's lots of foundations doing that kind of work
civic hackers and stuff like that
because you say an XML file is bad.
Well, in terms of programmatic access,
it's actually kind of nice.
Like you said, it's just humongous
and it's difficult to parse reliably.
But what's worse is PDFs or scanned images.
Those are terrible forms of public data.
We can't get to
the interesting work until we can get
access to what is
rightfully ours in whatever
locale you happen to be in.
The citizens.
It's a big problem.
Totally.
We started to
write our own dashboard to browse data.
So if going through an analysis, we want to look to more details on a specific reimbursement, we need a kind of dashboard.
So just put the reimbursement number and you have all the information.
Which reminds me that we don't look only at the data set provided by the Chamber of Deputies. We start to add layers of information. So okay, the reimbursement was at that company, so go to the federal revenue to grab more data about the company. Then we have the address of the company, so we can ask Google for a picture, like Google Street View, of where this company is. So there's a lot of layers of data we add to the original data set.
And actually this dashboard started as a kind of internal tool for us to do whatever we were doing, and now there's a big effort in the team, in the core of contributors, to make this dashboard more accessible.
Because as an internal tool, it was really, really terrible in terms of UX.
You have to know the reimbursement number and some other code, like numeric ID, to get
to the data.
And now we are trying to, okay, you can search by name and you can filter just your state and, let's say, reimbursements from this or that category, like meals from my state in 2015.
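A search like the one just described (by name, state, category and year) boils down to optional filters over reimbursement records. The sketch below is only an illustration of that idea, not the dashboard's code: the `Reimbursement` fields, the `search` function and the sample rows, including the names, are all made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reimbursement:
    # Hypothetical fields, chosen only to mirror the search described above.
    congressperson: str
    state: str
    category: str
    year: int
    amount: float


def search(rows, name: Optional[str] = None, state: Optional[str] = None,
           category: Optional[str] = None, year: Optional[int] = None):
    """Return rows matching every filter that was given; None means 'any'."""
    def matches(row: Reimbursement) -> bool:
        return (
            (name is None or name.lower() in row.congressperson.lower())
            and (state is None or row.state == state)
            and (category is None or row.category == category)
            and (year is None or row.year == year)
        )
    return [row for row in rows if matches(row)]


# Invented sample rows; the people and amounts are not real.
rows = [
    Reimbursement("Maria Silva", "SP", "meal", 2015, 82.0),
    Reimbursement("Joao Souza", "RJ", "meal", 2015, 40.0),
    Reimbursement("Maria Silva", "SP", "fuel", 2016, 150.0),
]
meals_sp_2015 = search(rows, state="SP", category="meal", year=2015)
print([r.amount for r in meals_sp_2015])
```

Searching by a human-readable name rather than a numeric ID is the usability change the guest describes.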
And I think this is pretty interesting.
We've seen journalists doing amazing jobs because we offer this tool for them to browse data, to browse government data.
And that's something the government should have done, I guess.
I think maybe they don't have enough people to do it.
Maybe that's not their focus.
But this kind of dashboard, that is not technical.
It doesn't require Python or Pandas or whatever for the user.
This is something really important.
It doesn't matter if it's a kind of civic activism stuff
or if it's provided by the government itself.
This is the kind of thing that really should exist out there.
Maybe something that's sitting on everybody's thoughts, lips, whatever,
as they're listening to the show is like,
this name.
Serenata de Amor.
Serenade of love, if you translate it into English.
Why this name?
Where did this come from?
Okay.
Like a literal translation wouldn't make sense because it'd be love serenade,
I guess.
Serenade is like when people sing a song for someone he or she is in love with.
Right, right.
So, but actually this is the name of a very famous Brazilian chocolate, which is kind of even more bizarre, I guess.
But the point is a lot of our listeners are really into chocolate.
Yeah, okay, good thing.
I know this by our downloads.
Certain cities are really into chocolate.
They make downloads in those cities.
Just kidding.
I'm assuming our listeners like chocolate.
I like chocolate.
That's a pretty safe assumption.
I mean, most people like chocolate.
Who doesn't like chocolate?
I mean, come on.
Yeah. But the real reason why we picked a chocolate name is that in the mid-90s, there was a Swedish politician.
She was probably going to be the next prime minister of Sweden.
And for some reason that I don't remember why,
they started to investigate her.
And they realized that she was using public money
to buy stuff she wasn't supposed to buy.
And one of those things was a single bar
or maybe two bars of Toblerone.
Yes, Toblerone, yes.
Those are good.
So it became known as the Toblerone Affair.
And yeah, I think you can Google for it.
I don't know if we have a Wikipedia page for Toblerone Affair,
but if you go to this politician page on Wikipedia,
it's there, Toblerone Affair.
It is there.
Yeah, I'm on the selling side.
We're going to leave this up in the show notes so everybody can follow along.
This is hilarious that it's well, I guess not hilarious in hindsight
but the fact that it's connected to chocolate
you know.
I think it has a lot to do
with the kind of
irregular
or illegal usage of public
money we expect to find using data science.
Because when there's a big corruption scandal of millions, billions, maybe, I don't know,
trillions of whatever currency you're using, probably someone already spotted that and someone
is working on it. But when we use big data, probably we are seeing a lot of small cases,
a massive amount of small cases
that hardly ever would be spotted by a human being.
That was our hypothesis.
And I think that's, well,
has a lot to do with the Toblerone affair.
It was a level of corruption
in terms of monetary value very low.
So this is the main reason. In Brazil, there's a second reason that in Brazil, when our FBI,
let's say our federal Bureau of Investigation is investigating something. Of course, they can't say
I'm investigating this case with a very
meaningful name, so they just
give random names. I don't know how it
works elsewhere.
So
calling our project Serenata de Amor was
a kind of joke with those random names
our FBI uses.
So
it's usually Operation Something
and something that makes no sense
like Operation Sandcastle,
Operation Car Wash
and then we have Operation
Car Wash.
Yeah, Car Wash, actually
the biggest
investigation on corruption
in Brazil, it's going on.
Anyway, there's a third reason, that it's basically our love serenade
for our country,
like the kind of gesture
we can do as a citizen
to help our country.
So this is the cheesy one,
but I love it.
Great name.
Lots of meaning inside that name.
That's excellent.
And how can we as the hacker community get involved,
help you out, further your cause?
We have lots of listeners in Brazil.
We have listeners around the globe.
How can we help out and get involved?
Well, we would be really, really proud
if you get inspired by some of our ideas
and try to do something local.
We think you don't have to
help us out in the sense of making this project better in Brazil for the levels of administration
we are working at. Feel free to take the idea forward. This would be amazing.
If you are just wanting to get deep in the code and stuff, we have a lot of issues from deployment to UX to DevOps to data science.
A lot of analysis we would like to do and we just can't.
We just haven't had time to do it. And this is basically because we started with this big crowdfunding campaign.
And since then, we basically had no other big fundraising except a recurring crowdfunding campaign we started when we ran out of money.
So we are really glad because people are supporting us.
But unfortunately, the amount of money we raise
is not enough to put a couple of people,
two, three, four people working full-time on the project.
So right now we just have two or three part-time developers
in the project.
So if you want to write code,
there's a lot of things to write,
from data science to development to UX to DevOps or whatever.
And, like, there's a lot of communication stuff going on.
So there's someone looking after our social media. There's people from law school helping us to dig into laws and think about new ways to get better results out of the report.
Or maybe you think about new hypotheses that could be translated into classifiers.
So actually, there's a lot of things to do if you want to help us. I think get started reading something about us.
There's our website.
Probably the link will go in the podcast post.
Or maybe in our GitHub if you're a more tech-savvy person.
Feel free to drop a line on GitHub saying hi.
Very good.
Well, Ed, this was a lot of fun.
Thanks for stopping by and telling us all about it.
Thanks for this opportunity.
It's really, really good
to talk about this project
because not exactly the code we write,
but the ideas underneath this code
is really important for us.
And it's really, really a pleasure.
It's an honor to be here
sharing these ideas with you guys.
So thanks a lot for this opportunity.
Absolutely.
All right.
Thank you for tuning into the show this week.
If you enjoyed the show, you know what to do.
Share it with friends.
Rate us on Apple Podcasts.
Tell everybody you know, please.
Thanks to our sponsors, Bugsnag, Linode,
CircleCI, and also
TopTal. Big thanks to
Fastly, our bandwidth partner. Head to
fastly.com to learn more. We
host everything we do on
Linode cloud servers. Head to
linode.com slash changelog. Check
them out. Support this show.
This show is hosted by myself, Adam Stachowiak,
and Jared Santo. Editing is
done by Jonathan Youngblood, and the awesome music you hear is produced by the mysterious
Breakmaster Cylinder. You can find more shows just like this at changelog.com
or wherever you subscribe to podcasts. Thanks for listening.