Archive for the 'Data' Category
Geocoding
geopy is a geocoding toolbox for Python. See the website for installation instructions. It uses third-party geocoders (such as Google Maps), so you can add geographic coordinates to the addresses in your application. I cooked up my first Python script to use it: you give it a CSV file with addresses, and it returns a CSV file with the addresses plus latitude and longitude. It might be useful to someone out there.
import csv
from geopy import geocoders

g = geocoders.Google('your-google-maps-api-here')

writer = csv.writer(open("out.csv", "wb"))
# Header row: address, city, state, country, latitude, longitude.
writer.writerow(("endereco","cidade","estado","pais","latitude","longitude"))
reader = csv.reader(open("endereco.csv"))
for row in reader:
    # Join address, city, state and country into a single query string.
    now = row[0]+","+row[1]+","+row[2]+","+row[3]
    try:
        place, (lat, lng) = g.geocode(now)
    except Exception:
        # The lookup failed, so mark the coordinates as missing.
        place, (lat, lng) = "NA", ("NA", "NA")
    writer.writerow((row[0],row[1],row[2],row[3],lat, lng))
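The script above appears to target Python 2 (note the "wb" mode) and an early geopy release. A minimal Python 3 sketch of the same idea follows. It assumes the geocoder is passed in as a function that returns either None or an object with latitude and longitude attributes, which is how the geocode methods in current geopy behave. The names geocode_rows and geocode_csv are illustrative and are not part of geopy.

```python
import csv


def geocode_rows(rows, geocode):
    """Yield each address row extended with latitude and longitude.

    `geocode` is any callable that takes a query string and returns None or
    an object with `.latitude` and `.longitude` (an assumed interface, chosen
    to match what geopy's `geocode` methods return).
    """
    for row in rows:
        fields = list(row[:4])
        query = ", ".join(fields)  # address, city, state, country
        try:
            location = geocode(query)
        except Exception:
            location = None  # treat lookup errors like "not found"
        if location is None:
            lat, lng = "NA", "NA"
        else:
            lat, lng = location.latitude, location.longitude
        yield fields + [lat, lng]


def geocode_csv(in_path, out_path, geocode):
    """Read addresses from in_path and write them with coordinates to out_path."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["endereco", "cidade", "estado", "pais", "latitude", "longitude"])
        writer.writerows(geocode_rows(csv.reader(src), geocode))
```

With a recent geopy, a call along the lines of `geocode_csv("endereco.csv", "out.csv", GoogleV3(api_key="your-google-maps-api-here").geocode)` should do the job, where `GoogleV3` is imported from `geopy.geocoders`. Passing the geocoder in as a function also makes it easy to swap services or to test the script without network access.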
Election 2002 data (Brazil)
For unknown reasons, the TSE pulled the Access files for the 2002 election. I haven’t looked at them in a while, but here are the two that I have:
and
VotoMun_DadosCand_2002.mdb.zip
These are big files. Let me know if you find out that only one of the two is needed, or if you have problems downloading: eleoni at gmail dot com
Data Visualization for the Masses
There are a few websites and startup companies floating the idea of being a repository and visualization engine for data that anyone can upload. Swivel, for example, made a splash in the past few months as the “YouTube for data”.
I experimented with Swivel and a few others. The main problem with all of them is the lack of tools for conditioning, at least in an easy way. Conditioning is important for constructing small-multiple plots, or even for plotting groups in different colors in a scatter plot. For example, take the ideal point data I uploaded:
In order to plot different parties in different colors I would have to (as far as I know) upload a different dataset for each party! It goes without saying that this is unnecessarily burdensome.
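Conditioning on a grouping variable is a small operation once the data sit in a single table. As a rough Python sketch (the rows, the party labels, and the helper name split_by_group are made up for illustration and are not taken from the ideal point dataset), one table can be split into one series per party:

```python
from collections import defaultdict


def split_by_group(rows, group_key, x_key, y_key):
    """Split one table into {group: ([x values], [y values])}."""
    series = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = series[row[group_key]]
        xs.append(row[x_key])
        ys.append(row[y_key])
    return dict(series)


# Hypothetical toy rows; the numbers are placeholders, not real estimates.
rows = [
    {"party": "A", "dim1": -0.8, "dim2": 0.1},
    {"party": "B", "dim1": 0.6, "dim2": -0.3},
    {"party": "A", "dim1": -0.5, "dim2": 0.4},
]

by_party = split_by_group(rows, "party", "dim1", "dim2")
for party, (xs, ys) in sorted(by_party.items()):
    print(party, xs, ys)
```

Each series can then go to its own plotting call, for instance matplotlib's `plt.scatter(xs, ys, label=party)` inside the loop, which draws every party in a different color from a single dataset.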
Many Eyes is also very impressive, with more advanced visualizations. It is Java-based and unfortunately does not play so nicely with Firefox on the Mac. I also had problems with the ideal points dataset there: it doesn’t allow one to create a scatter plot with only two variables (!!!), requiring a third to be displayed as the size of the symbols.
The focus on bar charts on both platforms is also annoying… dotplots and boxplots would be nice.
There is another site, data360, but I didn’t have much luck with it. It is more “professional”, allowing one to pull data directly from the web automatically. But its focus on time series data makes it simply unusable for the example I tried.
Of course, all three are just beginning, and we might be hearing a lot about them soon. And the graphs they make are not half bad, especially once you consider Rich Chart Live. This one is ugly! And I mean, it makes you miss Excel kind of ugly:

Electoral Data on the TSE site
The Tribunal Superior Eleitoral has on its [site](http://www.tse.gov.br/) electoral data for local and national elections since 1994. Although the data are archived in the horrible Microsoft Access format, they are very easy to extract and manipulate in your program of choice.
Unfortunately, they moved to a web-only service starting with the 2002 elections, making it much harder to get all the data. We emailed them recently and, fortunately, they still use MS Access internally and were kind enough to send us a CD-ROM with the data. It is a huge file, and I am still working out the wrinkles before I post a processed version in our data section. Nevertheless, if someone needs it before then, just email me and we can arrange something. I cannot post the Access files before we move to a new server, since it is a 122 MB zip file! If you want to donate to this noble cause (our own server account), click on the donate button in the sidebar. You will have our most sincere thanks… and plenty of bandwidth and storage for Brazilian political data!
[Incidentally, I would like to add that the site is going well. We had more than 500 unique visitors in October, and we already show up as the first link if you [google brazilian politics](http://www.google.com/search?hl=en&q=brazilian+politics&btnG=Google+Search)! ]
On the other hand, although (very) buggy, the web tools are kind of neat. If you haven’t checked them out already, look at the maps and text reports about the distribution of votes for the winning candidates across each state [here](http://www.tse.gov.br/partidos/principal.html) (see a screenshot below). You can play around with the layers and colors interactively (they use Adobe’s implementation of a new graphics format called SVG). You have to download the viewer here to use the tools.

They now also have electoral data at the “zona eleitoral” (electoral subdivision of municipalities) level, but I don’t have the database for it yet. Anyway, it is very cool if you want to explore the electoral patterns for specific candidates, with no GIS knowledge required.
Analyzing the Brazilian Supreme Court
It was almost like a soccer match. The most popular Brazilian politics bloggers (in Portuguese, of course), NOBLAT and Josias de Souza, broadcast vote by vote an important decision in the Brazilian Supreme Court. Federal Deputy Jose Dirceu, former President of the PT and Chief of Staff (at times called “Prime Minister”) in the Lula government, was trying to stop the ongoing expulsion procedure in the Camara.
The episode illustrates a broader pattern. Supreme Courts have been playing an increasingly important role in economic and political issues in contemporary Latin American politics. However, our understanding of how decisions are made in these bodies is quite incipient. Not much data has been assembled in an easily accessible form, and our theoretical understanding is in even worse shape.
This is the main reason Antonio Pedro Ramos and I embarked on a project to estimate the ideal points of the Brazilian Supreme Court (STF) Justices. We ask two questions: (a) To what extent do ideal point models account for the votes cast in the STF? (b) How are the preferences of outside actors, in particular the Executive, related to those of the Supreme Court Justices?
Linking IBGE to TSE databases
As promised, I have been working on linking TSE (electoral) data to IBGE (maps, census, etc.) data by using municipality names. The names are very far from consistent across the two sources, so I wrote some ad hoc programs in Stata that seem to work quite well. The strategy was to strip all accents and “extra” words like “de”, “da”, and state names, and then merge on the processed names state by state.
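The Stata programs are not reproduced here, but the strategy can be sketched in Python. This is an illustration under assumptions rather than the original code: the connector-word list, the function names, and the sample records are all hypothetical, and stripping state names is left out for brevity.

```python
import re
import unicodedata

# Connector words to drop; an assumed list, extend as needed.
EXTRA_WORDS = {"DE", "DA", "DO", "DAS", "DOS", "D"}


def normalize_name(name):
    """Return an accent-free, upper-case key with connector words removed."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = re.findall(r"[A-Z0-9]+", no_accents.upper())
    return " ".join(w for w in words if w not in EXTRA_WORDS)


def link_by_name(tse_rows, ibge_rows):
    """Match (state, name) records state by state on the normalized name."""
    ibge_index = {(state, normalize_name(name)): name for state, name in ibge_rows}
    matched, unmatched = [], []
    for state, name in tse_rows:
        key = (state, normalize_name(name))
        if key in ibge_index:
            matched.append((state, name, ibge_index[key]))
        else:
            unmatched.append((state, name))
    return matched, unmatched


# Sample records for illustration only.
tse = [("RJ", "SAO JOAO DA BARRA"), ("SP", "SANTA BARBARA D OESTE"), ("SP", "CIDADE INEXISTENTE")]
ibge = [("RJ", "São João da Barra"), ("SP", "Santa Bárbara d'Oeste")]
matched, unmatched = link_by_name(tse, ibge)
```

Keeping the state in the key prevents two municipalities with the same name in different states from being merged, and the unmatched list shows which names still need a manual fix.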

