The Changelog: Software Development, Open Source - Open Government and the Citizen Coder (Interview)

Episode Date: February 1, 2011

Adam and Wynn caught up with Carl Tashian from Open Government to talk about OpenGovernment.org, OpenCongress.org, and the rise of the Citizen Coder....

Transcript
Starting point is 00:00:00 Welcome to The Changelog, episode 0.4.7. I'm Adam Stachowiak. And I'm Wynn Netherland. This is The Changelog. We cover what's fresh and new in open source. If you found us on iTunes, we're also on the web at thechangelog.com. We're also up on GitHub. Head to github.com/explore. You'll find some trending repos, some featured repos from the blog, as well as our audio podcasts. If you're on Twitter, follow @changelogshow. And me, @adamstac. And I'm @pengwynn, P-E-N-G-W-Y-N-N. This week's episode is sponsored by GitHub Jobs. Head to thechangelog.com to get started. And if you'd like us to feature your job on the show,
Starting point is 00:00:47 select Advertise on The Changelog when posting your job, and we will take care of the rest. If you're in the New York City area, the startup CrowdTap needs a behavior-driven development nut who contributes to open source and knows a Law of Demeter violation when he sees one. If you're interested, lg.gd.com. And if you're looking for the best gig any passionate engineer could ever imagine,
Starting point is 00:01:08 you've got to check out the software master gurus at Red Radiant. Check out lg.gd/6y. And London-based Alpha Sites needs a soup-to-nuts Rails dev familiar with every level of the application, from CSS to SQL and all the Haml and Sass in between. If you're in the Covent Garden
Starting point is 00:01:24 area, follow lg.gd.com. Not to mention, you'll be over there soon. I will be over there soon. If you're a tweep in London, give me a holler on Twitter. Let's hook up. And that's next week, huh? That is next week. Fun episode this week.
Starting point is 00:01:39 We talked to Carl Tashian over at Open Government. Talked about some of the state APIs they've got in development. This is probably more in your neck of the woods because of Tweet Congress and your, I guess, deep desire for APIs and whatnot, but this is a fun episode. Yeah, kind of a mashup of a lot of
Starting point is 00:01:57 passions that I have. I guess politics and APIs and Ruby and Rails. Fun stuff. And I think as devs, too, this is a really fun subject that you can dive into. They've got needs for front-end designers, too, so don't feel like you can't jump in just because you're not a Rails dev or a Rubyist. If you're into Haml or Sass, or you're just a front-end designer, dude, they need your help. So check out the project.
Starting point is 00:02:19 Absolutely. It's the rise of the citizen coder. So if you want to get involved and affect your government, this is the way to do it. Absolutely. Fun episode. Should we get to it? Let's do it. I'm chatting today with Carl Tashian from Open Government. Carl, I want you to introduce
Starting point is 00:02:43 yourself and tell us a little bit about what you do over at Open Government. Sure. So I'm the Director of Technology at Open Government. We're really small. We're based in New York. We're a nonprofit. And so I end up – I write a lot of the code and coordinate the development on both opencongress.org and opengovernment.org, which is our newest project. Why don't you give the folks a little bit of background about what those two projects are? Yeah, so Open Congress was sort of our flagship project, launched in 2007.
Starting point is 00:03:15 It's an open-source Rails application that lets you read bills in Congress and find out sort of what's going on in Congress, track people in the media, and comment on the bills. So comment on legislation and sort of integrate some of the social media stuff with what's happening in Congress. And it kind of gives, I think, a better interface than what you get from THOMAS, which is sort of the U.S. government's standard site for that. And OpenGovernment brings a lot of that stuff into the state legislatures. So we started with five states, Texas, California, Louisiana,
Starting point is 00:03:58 Maryland, and Wisconsin. So I see on your site you're a partner of the Sunlight Foundation. I know when we built TweetCongress.org, we leveraged their APIs heavily. Are you guys using code or just getting some backing from Sunlight? We're definitely using code. So the Open States Project is really the core data. They provide the core data for opengovernment.org. And they've really, like, they've done the hard work of this project in terms of, you know, they're writing all the scrapers.
Starting point is 00:04:30 That's also open source. So they scrape all of the state government websites and then provide a consistent API for legislators, bills, committees, you know, and votes and sort of everything that's going on. So as the person who's been directing the project for this past, I don't know how long this project's been going on, but I know when we first blogged about this on The Changelog, it was, I guess, about three weeks ago. And I got pretty excited about just transparency in government and what that means for us as
Starting point is 00:05:02 individuals. But what's the last month been like in terms of not just being open source, but also moving to this launch stage? I mean, it's been great. It's really rewarding to, you know, to finally launch something you've been working on for a while. I mean, we've been kind of working on this, I guess, on and off since last January. And, you know, I know it's been
Starting point is 00:05:25 sort of on the mind of David, our executive director, for a long time, to really reach down into the states and the local governments with some of the stuff that's been going on with Open Congress. So, yeah. It's been really exciting because
Starting point is 00:05:40 we're getting some volunteer developers now and getting some attention for it. And I just love to, you know, finally be getting some real feedback from, you know, all over the country from people using the site. So. So when you're pulling down news into Open Government, where are some of these sources coming from? I'm seeing that some are like, you know, Google News and the Follow the Money API and the Open States API. Tell me about some of the connections and how you're pulling back that data. Yeah, so the Open States API really provides the core data set. That's the official data from the state legislatures. And then we've got scrapers for, you know, Google News, don't tell anybody. And I think we're doing Technorati also a little bit.
Starting point is 00:06:35 And then we're pulling tweets on the client side. So if you look up a member of state legislature, you'll see tweets about them. We've got campaign contributions coming in from Follow the Money, which is an amazing organization that gathers and aggregates all this stuff. So that actually is coming through also. That's coming through another Sunlight API. So Sunlight took a lot of the Follow the Money data, and they sort of reformatted it a little bit. And they've got a site called Transparency Data,
Starting point is 00:07:11 and we're pulling the Follow the Money stuff from Transparency Data. I love how this is all a mashup like that. I mean, it's intense how a lot of different services can piggyback off each other, and essentially this is open source, and it's providing such great data. Yeah, and all of these APIs are open source too. Well, a lot of them. Definitely all the Sunlight stuff and our stuff. We also have a gem called GovKit that's part of this project. So Open Government relies on, we just wrote a gem to wrap those APIs,
Starting point is 00:07:36 the Sunlight APIs and a couple others. Actually, GovKit's in the queue to be covered on The Changelog. I'm writing a blog post about that one. I discovered it because I kind of play in the space. Do you know Luigi Montanez over at Sunlight? Oh, no, I don't. Okay.
Starting point is 00:07:54 Well, I came across GovKit the other night. I was looking at the Open Government GitHub page because he and I teamed up on a transparency data gem that wraps the sunlight transparency data API. And then I came across GovKit and I thought, what a great name. It's kind of like the gem Fog wraps all these cloud service providers. You know, GovKit kind of wraps all the different open government space APIs. It's really a cool name. Yeah, that's the, you know, that's what we're looking for. I'd like to see more in there. We just sort of did the ones that we needed for open government, but I hope that it can become a little bit of a hub for those APIs.
Starting point is 00:08:35 So do you have a political background at all prior to this project? Not really. So I worked at Zipcar for about five years, and I built a lot of the technology behind, um, behind Zipcar. And then I was freelance for a while and, you know, this opportunity came up and, um, I sort of, I mean, personally, I felt, uh, really just disengaged from politics. And, um, it seemed like an opportunity to change that, to see if there was a way I could sort of find a better connection with it
Starting point is 00:09:12 and help other people do that. So, yeah, that's how I got involved. I really think it's great. A project like this helps with transparency. And do you think that technology or projects like this advance the cause of transparency in any meaningful way?
Starting point is 00:09:32 Yeah, I do. I mean, I think it's a long road and we're sort of toward the beginning of it. This is definitely, I would say open government is a first step because, you know, a lot of, if you start to look at the bills, a lot of the legislation is, if you don't have a law background, it's almost impossible to understand
Starting point is 00:09:51 what's actually going on. And I think that, so with Open Congress, we've sort of worked, our fix for that in a way has been editorial content. So we have Donnie Shaw, who's just a fantastic blogger, writing about what's going on in Congress on our Open Congress blog. And I hope we can do some of that with open government as well and maybe have some bloggers in each state or something. Because I think that this data does need some editorial context around it for most people.
Starting point is 00:10:24 You mentioned you guys have been working on this since last January, not this most recent one. I guess we're still technically in January. I'm trying to advance to February as quick as we can, I guess. So it's been about a year since you've been on this project. What were some of the biggest technical challenges that you've overcome over this past year? I think the big challenge overall is that when you're merging a bunch of large data sets, there's always going to be a lot of sort of hidden anomalies and things that you've
Starting point is 00:11:01 got to work around. And yeah, that's been the challenge, is just lining things up. I've sort of learned a little bit about how to manage that, but it's still something I think we need to work on. How do you sync up six different data sets, especially when some of your fields are sort of overlapping? Like we get, you know, the URLs for photos of legislators from Sunlight for some of the legislators.
Starting point is 00:11:34 And then we go back to VoteSmart and get the rest, you know, or a lot more. And then it's sort of now you've got this field that's being updated from two different places. And those kinds of syncing problems seem to come up a lot. So that's some of the stuff we've been dealing with. Yeah, I think also with large data sets, there's always the SQL questions of sort of how do you aggregate things
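The two-source photo problem Carl describes can be handled with a per-field precedence rule. Below is a minimal Python sketch, assuming each source's record has already been fetched into a dict and that Sunlight wins over VoteSmart whenever both have a value; the `merge_field` helper and the source keys are hypothetical, not taken from the Open Government codebase.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple


def merge_field(
    records: Mapping[str, Mapping[str, Any]],
    field: str,
    precedence: Sequence[str],
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, source) for `field`, taking the first non-empty value
    from the sources listed in `precedence`, highest priority first."""
    for source in precedence:
        # A source may be missing entirely, or may lack this field.
        value = records.get(source, {}).get(field)
        if value is not None and value != "":
            return value, source
    # No source had a usable value.
    return None, None


# Hypothetical example: Sunlight has no photo for this legislator, VoteSmart does.
records = {
    "sunlight": {"name": "Example Legislator", "photo_url": None},
    "votesmart": {"photo_url": "https://example.org/photo.jpg"},
}
print(merge_field(records, "photo_url", ["sunlight", "votesmart"]))
```

Returning the source alongside the value makes it easy to store which feed last supplied a field, so a later sync from the lower-priority source doesn't silently overwrite a higher-priority one.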
Starting point is 00:12:03 and make the site, you know, run fast. So, you know, as you dive into Open States and the data that is coming out of that project, you know, in large corporate settings, you'd be surprised how much of the business actually runs, not on the sophisticated, um, high-end servers and things, but on CSVs and Excel spreadsheets that get passed around. What's the state of the data that you're finding at the state government level? Yeah, I mean, I think Sunlight deserves so much credit for really making that happen through, making that stuff available through great, like, clean APIs.
Starting point is 00:12:40 And that's definitely a breakthrough. I think you're totally right about the CSV files and Excel spreadsheets and stuff. Yeah. So when we look at the different data sets that you're kind of bringing together, what is the database backend that you're using, and how are you actually going over
Starting point is 00:12:56 some of those problems you just mentioned? Yeah, so we've got Postgres backend, and we're also using PostGIS to do some basic, like, geo stuff. You can type in your zip code on the front page and get a list of all of your representatives from the federal and state level. And we're also using GeoServer on this project, which ties into PostGIS. And that's pretty interesting because we map the geography of any vote in the legislature. So you can actually see red for Republican and blue for Democrats and then different shading depending on whether they voted yes or no or abstain. And that was pretty fun to put together also.
Starting point is 00:13:47 So that's the sort of, those are the pieces on the back end. We also have, we also did something with MongoDB, which I'm not sure if I'm going to regret this later because there's sort of no way to join those two databases. But I wanted a fast way to track the page views on the legislators and bills so we could show people, here are the most viewed pages and stuff. And I didn't really want to store all that stuff in Postgres, so we just set up a really simple MongoDB
Starting point is 00:14:18 that stores the pages, and it's recording through a little JavaScript hook on the client side that goes into a Rack app. So it's very basic and pretty much detached from the rest of the application. So that's the other piece, but I think it's pretty minor. Mongo's got some nice operators there for increment and decrement to handle
Starting point is 00:14:47 large arrays like that. Yeah, absolutely. I mean, it's so fast and we can really, you know, I think because it was the kind of thing where we couldn't use an external analytics service for it. We couldn't like pull from Google, this kind of stuff. We just really needed to track it locally, but we didn't need to track a lot. We just really wanted page views. We basically break it down by hour
Starting point is 00:15:10 and just make an entry in MongoDB for each hour and then how many views were on that object. So when the devs that are listening to this podcast right now, they're actually on GitHub right now, they're about to hit the fork button. When they hit that fork button, what are some of the things they could do to contribute to this project?
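The hourly page-view buckets Carl describes map naturally onto MongoDB's `$inc` operator combined with an upsert. The Python sketch below only builds the filter and update documents, so it runs without a database; with pymongo they would be passed to `collection.update_one(filt, update, upsert=True)`. The field names (`type`, `object_id`, `hour`, `views`) are hypothetical, not Open Government's actual schema, and the real tracker was a Rack app rather than Python.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Tuple


def hourly_pageview_update(
    object_type: str, object_id: int, when: datetime
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build the (filter, update) pair that records one page view in an hourly bucket."""
    if when.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        when = when.replace(tzinfo=timezone.utc)
    hour = when.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    filt = {"type": object_type, "object_id": object_id, "hour": hour}
    update = {"$inc": {"views": 1}}  # atomic increment; an upsert starts the count at 1
    # With pymongo (not run here): collection.update_one(filt, update, upsert=True)
    return filt, update


def count_views(events: Iterable[Tuple[str, int, datetime]]) -> Counter:
    """In-memory stand-in for the collection, handy for testing the bucketing."""
    counts: Counter = Counter()
    for object_type, object_id, when in events:
        filt, update = hourly_pageview_update(object_type, object_id, when)
        key = (filt["type"], filt["object_id"], filt["hour"])
        counts[key] += update["$inc"]["views"]
    return counts
```

Because the filter includes the truncated hour, every view in the same hour increments one document, and the upsert creates that document with a count of 1 on the first hit.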
Starting point is 00:15:28 Oh, well, there are just so many. It depends on the scale. So we've got some really good install instructions, so it's easy to get started. We've got everything from bug fixes that are needed right now.
Starting point is 00:15:45 I think that our test coverage could be a lot better. We're using RSpec and Cucumber. What else? On the bigger scale of things, I would love to see an API. One of the things that we've got that we pulled together here that I think nobody else sort of has right now is this district lookup thing where you can sort of say, here's my zip code or here's my address, and you can get back your legislators from the sort of federal and the state level together and all their contact information. I would love to see that as an API call that we could offer. And, you know, it's a pretty simple project, but that's just an example of something that we have coming up.
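The district lookup Carl would like to expose as an API comes down to running a federal and a state lookup for the same location and returning one combined payload. A minimal Python sketch follows, assuming the two lookups are supplied as functions (in Open Government they would presumably be backed by the PostGIS queries mentioned earlier); `lookup_all_levels`, the stub lookups, and the payload shape are all hypothetical.

```python
from typing import Callable, Dict, List

Legislator = Dict[str, str]
# A lookup takes a ZIP code and returns the legislators whose districts cover it.
Lookup = Callable[[str], List[Legislator]]


def lookup_all_levels(zip_code: str, federal: Lookup, state: Lookup) -> Dict[str, object]:
    """Combine federal and state lookups for one ZIP code into a single response payload."""
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"expected a 5-digit ZIP code, got {zip_code!r}")
    legislators: List[Legislator] = []
    for level, lookup in (("federal", federal), ("state", state)):
        for person in lookup(zip_code):
            # Tag each record with its level so clients can group the results.
            legislators.append({**person, "level": level})
    return {"zip": zip_code, "legislators": legislators}


# Hypothetical stub lookups standing in for real geographic queries.
def fake_federal(zip_code: str) -> List[Legislator]:
    return [{"name": "Federal Example", "chamber": "House"}]


def fake_state(zip_code: str) -> List[Legislator]:
    return [
        {"name": "State Example A", "chamber": "upper"},
        {"name": "State Example B", "chamber": "lower"},
    ]


print(lookup_all_levels("12345", fake_federal, fake_state))
```

One caveat for a real endpoint: a ZIP code can overlap more than one legislative district, so a ZIP-only lookup may legitimately return several candidates at each level, which is one reason a full-address lookup is also useful.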
Starting point is 00:16:29 So we had the guys from Sunlight on early in the life of the show. We're going to be talking to the Code for America guys, I guess, maybe in a couple of weeks. We love projects like Open Government that allow just developers to get involved and kind of give back. Do you see this as a trend where it's the citizen coder kind of advancing government and improving government around the fringe from the outside in, or do you see the government spaces just improving on their own just as the pace of technology improves?
Starting point is 00:17:04 I think both are going to happen sort of simultaneously. And I think that it's great to see this sort of citizen coder thing because those efforts, I think, can really push the bureaucracy to do more. And I think a good example of that is the story of the Crimespotting site in Oakland, where, you know, Stamen made this beautiful site that pulled the crime data from the city of Oakland and made it into, you know, a Flash thing. This was a couple of years ago, but they were scraping the data. And then the Oakland Police Department shut them down for a little bit. And then I think they had so much support and they had so much visibility for the
Starting point is 00:17:54 project that Oakland just sort of had to cave and say, okay, we'll open it back up again and we'll work with you. And then two years later, San Francisco came and said, yeah, we want Crimespotting too. And here's the data. And they handed over like a perfect KML file or whatever of, you know, of all the stuff. So I think that's my understanding of how that went down. And I think that's, you know, that's a great sort of outcome, you know, for that kind of project. So I would love to see more of that. The piece that I like a lot, at least from a numbers perspective and this view perspective, is the money trail view.
Starting point is 00:18:32 You can actually see where a lot of your money is going in your state and stuff like that. But what I find, this view seems to be, like it doesn't make complete sense to me. So how do you help not only pull back this data, but also make sense of it? Yeah, I mean, I think that's really the next step. We could use more in terms of visualizations of that stuff.
Starting point is 00:18:51 I think that the money trail, there's a lot of data there, and we're aggregating it in the simplest possible way right now. We had to make some changes on that kind of at the last minute. So, you know, I think that could be made more clear. And I think that overall, like that's the arc of the project is how can we just keep making all this data that we're bringing in more clear and more understandable for people. So, I mean, we're really just getting started, you know. It reminds me a lot of the DocumentCloud project. Are you familiar with that one? We actually use Jammit, which is their asset packaging thing.
Starting point is 00:19:34 And I know that they're, were they a Knight fellow or something? Yeah. Sure. You know what I love about the project is not only is it helping to turn documents into data, but at the same time it's giving so much back to the open source community as byproducts. Yeah, that's great. And that's what we hope to do too. Yeah, it's a challenge because when you're running a service, you know, okay, so I think one of the things that makes sort of open government different as an open source project is that it's actually a service, right?
Starting point is 00:20:04 It's like it's a full Rails app. There aren't a whole lot of those. It's mostly gems that people are contributing to. So it's this trade-off. I think GovKit, we've broken that piece off, and I think there's more we could be doing. Maybe some of this Mongo analytics stuff could become its own gem, for example.
Starting point is 00:20:25 But on the other hand, it's interesting to be able, I think if you're just learning Ruby on Rails, to be able to see a whole app and sort of here's how it works. And I would hope that there are some sort of best practices here that we're using or that we can be using to exemplify a good Rails app. So when you actually mentioned the kind of help that you're looking for and you said you need more help in the visualization sections of this application, does that mean that somebody like a designer, for example,
Starting point is 00:20:59 or someone who's an infographics nut that just really gets in can take this large set of data and make it make sense in unique ways that can be viral or communicated well. Is that the kind of help you're looking for as well, or is it strictly the programming side? Yeah, absolutely. That would be really helpful. And I think they go together. I was just looking at Protovis, that great JavaScript library for visualizations,
Starting point is 00:21:23 and, you know, that kind of thing, right? There are a lot of opportunities on the site. Now that we have the data in place, I don't think it would be very hard to add in some of those things. So. So with five states open, there's plenty of room for opportunity here. Absolutely. Yeah. And the schedule is, so Sunlight is actually working on the scrapers for the other 45. And so that's sort of an ongoing project. And we will launch, you know, as we can, more states. Is that kind of stemming from some of the progress needed on Open States as well? Because I'm not sure we talked about it too much.
Starting point is 00:22:03 Yeah, yeah. And they definitely need developers there. All the scrapers are in Python. And I mean, that's really, like, as I said before, I think that's the hardest piece of this whole project. I mean, they've really made it easy for us because we just sort of pull a data feed from them. And it's very, they've made it really consistent over the last few months. And, you know, whereas going to these state legislative sites to scrape that stuff just seems like it would be a really tough problem that they've taken on.
Starting point is 00:22:32 I don't know if you've looked at the state legislatures, but a lot of these websites are not pretty. And the data is not in a consistent format at all. Well, I don't think it's their highest priority, I guess. But that's a different subject and a different kind of podcast. So we have Open Congress. We have Open Government. We have Open States.
Starting point is 00:22:54 What else are we opening up? Yeah, I mean, that's really – those are our projects right now. Open treasury? Yeah, I mean, let's do it. I arbitrarily linked to this, I guess, pretty heavily viewed YouTube video about how this person could not – in Congress could not answer a simple question, which was where did this large stack of money go, which was $6 trillion or something like that. It was just a huge amount of money. And when I look at scenarios like that and she's fumbling over her answers and we can't get clear yes or no, this is what happened or this is who we gave it to or this is how we're tracking this money. As a citizen who pays taxes and who does all the right things and trusts our government, I've got to look at our treasury and say, how are we putting out these bailouts? And how are we – and I understand the reasons, and this is not the state of this podcast, but, you know, is Open Treasury next?
Starting point is 00:24:14 You know, I don't know for us whether that's, I would love to see it. You know, I'd love to see that kind of thing. I think that these projects are so critical right now. Just as you're saying, there's just so much sort of stuff that's going on behind the scenes. And yeah, I mean, totally. Open Treasury, I think there's a lot that can be done around elections too, you know? And sort of figuring out how to, I mean, I spend hours and hours
Starting point is 00:24:40 every time there's an election just trying to make the right choices on the ballot. And I still don't feel like I know whether I did or not in the end. It's like it's really – I think we're always painted into a corner because we – our choices are what we're given, not what we actually truly elect in some cases. So I mean –
Starting point is 00:24:56 Yeah. I don't think you ever get to make a right choice when you're at an election booth. You know what I love about the open government space though? It really cuts across both sides of the aisle. When we created Tweet Congress, we were surprised. We got an initial set of seed data from Sunlight. We were really surprised to find a two-to-one Republican-to-Democrat advantage of politicians using Twitter, at least at the federal level. But what we found was the developers that were interested in that space were just as fervent on both sides of the aisle. We met a lot of friends, you know, in that space.
Starting point is 00:25:29 Yeah. Because it's really like, you know, it's the objective, uh, sort of criteria that everybody's looking for. It's the objective information. Um, hopefully. Right. So Ruby and Rails is the platform behind the open government website. So what were some of the decisions around choosing that as a technology? Well, we are sort of already invested in it with Open Congress and have been doing that since 2007. And I also – my background is I did freelance Rails development for three years. And I really just, I love Rails.
Starting point is 00:26:06 I think it's a great platform for making web apps. I'm curious about, I didn't look at the code base, but I'm curious if they're using Sass. We're using Haml, but we're not using Sass. Yeah. And Haml's been great. Yeah. So OpenCongress is all ERB, and when I just look at the difference in the files,
Starting point is 00:26:29 it's just astounding. I'm always amazed at people's picks: they'll either hate both Haml and Sass, or they'll say, I love Haml and hate Sass, or vice versa. It's always intriguing to me to listen to the, I guess, opinions behind that. So why Haml and not Sass?
Starting point is 00:26:52 I think Sass could work for us, and we're just not there yet with it. I guess my only concerns about these things are just, you know, how long does it take to actually serve up the page in the end? And I feel pretty good about Haml. I think that it's pretty clear now that it's just as fast as ERB in most cases, right? Yeah, that's the beauty of Sass. I end up pre-compiling all my stylus sheets and don't even really integrate with the server.
Starting point is 00:27:21 You know, just spit out the CSS from Sass and link it up like you normally would. Yeah. So I think we've just got to get that into our workflow. Um, it's probably not even a question. Um, yeah, I mean, given how much I love Haml and we love working with it, um, it seems like that would be great. We've been asking a series of questions here in the last few episodes, just to kind of get a better look at the developers that we're profiling. So a series of either-or questions, and you can say none of the above if it so fits. So, Bash or Zsh?
Starting point is 00:27:55 Bash. I just know the answer to Haml or ERB. Yes, Haml. Your terminal font. Oh, I've got to look that up. I keep changing it. Because Kenneth and I are creating a site that's going to help you pick your terminal font. We just had this brainstorm about an hour ago.
Starting point is 00:28:15 That's a great idea. We have this debate over Menlo versus Inconsolata. I know you're an Inconsolata fan there, Adam, but the serifs are too much for me. Am I? I thought I switched to Menlo. Oh, did I get you to switch? Because you twisted my arm, yeah. I browbeat you into switching to Menlo. Yeah, I couldn't. Well, I used to be an Anonymous Pro user, that's why.
Starting point is 00:28:34 Ah, Anonymous Pro, that was the one. Yeah. Yeah, so I'm using Consolas in TextMate, but then I just looked in my terminal and I've got Bitstream Vera Sans here. I think I need to change that. That was going to be my next question. TextMate, Emacs, or Vim? It sounds like you're a TextMate guy. Yeah, yeah, definitely. Although I do use Vim a lot
Starting point is 00:28:51 and have for a long time. Depends on the circumstances, doesn't it? Well, Carl, we're at the point we like to ask the cool question about what you're doing in open source. So, what in open source right now has got you excited that you want to fork and play with?
Starting point is 00:29:09 I love all of the stuff that we have incorporated into Open Government. I mean, I think this project wouldn't be possible without, not just Rails, but these gems like Jammit, things like MongoDB. I mean, I'm just excited about the whole ecosystem of Ruby and Rails gems, and, yeah, I guess it's hard for me to choose. I think I like to think about it like whatever – I just like – I like that there's always an option for whatever I'm working on, whatever the job is. It always seems like there is something that's going to help me along the way in the ecosystem right now. So let's say you had a long weekend, a four-day weekend,
Starting point is 00:30:07 and you had no open government work to do whatsoever. You weren't even going to touch it. What would you play with? Oh, good question. That's a loaded question because it assumes that he doesn't have a life like we don't have lives. I think that, I mean, I'm kind of going back, I'm getting back into, I'm learning a lot right now.
Starting point is 00:30:28 Like, I think things like MongoDB and sort of document store and a lot of the real-time stuff is really exciting to me right now. And, you know, it's hard to kind of keep up with all sides of the web app, sort of the stuff that's happening for the innovations that are happening sort of on the back end and then the more like client side stuff. And I'm getting excited about the client side stuff and the sort of the real time, you know,
Starting point is 00:30:56 more stripped down sites. Like there's a part of me that wishes we had been able to use Mongo for this project because I think it's exciting. I think it's suitable for, somewhat suitable for the data that we're using. And, you know, but it was also just like a lot of, like, I, you know, my background is Oracle from Zipcar, and I'm very comfortable with SQL, and I'm very comfortable with that setup. So that's the trade-off.
Starting point is 00:31:28 And I think it's great what we've got, but I also could totally see this working with more of a document store model. Well, thanks for joining us, Carl. It's been fun talking about open government, and we'll keep an eye on the GitHub to see what bits you guys are releasing next. Yeah, thanks, you guys.
Starting point is 00:31:49 I love The Changelog, and it's so important to have a hub for all of these projects and have somebody talking about them, so thank you. We enjoy doing it. Thank you.
