cluelessresearch.com

political methodology, brazilian politics, etc.

Archive for the 'Graphs' Category

Regression plots

I am writing a paper with a coauthor promoting the use of graphs instead of tables in political science. We did some research on the current use of tables and graphs and found out that a substantial proportion of the tables is devoted to the display of regression results.

So we thought that creating graphs to display regression tables was essential to our task. That is, to turn this:

into a nice graph.

We are currently revising the paper for (re)submission, and are still undecided on how to display such graphs. Here are some of the several revisions, starting with one of the first (back in November):

This one is from later in the same month:

And the two I am currently looking at, which take out the boxes around the plots. This one is minimalist:

And the next one has the x-axis repeated in each plot:

My coauthor thinks the boxes are necessary; I cite Tufte over and over and say they aren’t. I think we will have to arrange an intercontinental boxing match to settle the issue.

No comments

New maps

Jos complains about my 1995 technique for creating the animations (animated gifs) and the lack of interactivity. Perhaps this flash(y) version will be of greater appeal to him.

The whole 1982-2006 period is posted. If you pay attention, the map changes slightly to reflect changes in the distribution of seats across the nation, e.g. the creation of Tocantins following the 1988 Constitution (1990 map), or the increase in the number of deputies elected from São Paulo in 1994.

1 comment

Spatial distribution of Parties in Brazil

I’ve been collecting Brazilian electoral data for my dissertation for some time, and have always wondered about how to display the somewhat massive data available in an efficient manner. Take the best case scenario: 27 districts (states) x 4 elections (since 1994) x 7 largest parties=756 data points. This is a lot of numbers to look at in a table! Imagine using aggregate data at the municipality level: 5000 x 4 x 7! No, you do the math…

Maps, of course, are one way to display the data. The major problem then is that the widely varying population density in Brazil would produce a misleading map of the voting distribution across the country. That is the reason I am considering using cartograms, as discussed in the last post (link). Displayed below is a whole set of cartograms displaying data for the Câmara dos Deputados for the past four elections. The idea now is that areas in the cartogram should be proportional to the number of seats assigned to each district (which in Brazil are the states). Given the high degree of malapportionment, the cartogram looks somewhat different from the one based on population or vote totals we presented previously. The time dimension is presented as a movie, so it is easy to follow the spatial distribution of seats for each party throughout the recent elections.

animation.gif
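
To make the "area proportional to seats" idea concrete, here is a minimal sketch in Python (not the R code behind the maps, and not the cartogram algorithm itself). It only computes how much each district's area would have to grow or shrink; the three districts and their numbers are made up for illustration.

```python
# Made-up toy districts: name -> (area on the ordinary map, seats).
# These are NOT real Brazilian figures, just an illustration.
districts = {
    "West": (600.0, 8),
    "Center": (300.0, 22),
    "Coast": (100.0, 70),
}

def seat_scale_factors(districts):
    """Return, per district, the factor by which its area must change
    so that area shares equal seat shares (total area is preserved)."""
    total_area = sum(area for area, _ in districts.values())
    total_seats = sum(seats for _, seats in districts.values())
    factors = {}
    for name, (area, seats) in districts.items():
        target_area = total_area * seats / total_seats
        factors[name] = target_area / area
    return factors

factors = seat_scale_factors(districts)
for name, factor in factors.items():
    print(f"{name}: area multiplied by {factor:.2f}")
```

A factor below 1 means the district shrinks in the cartogram and a factor above 1 means it is blown up. The hard part, which this sketch does not attempt, is redrawing the boundaries so that neighbours stay neighbours.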

Now I only have to figure out how to put this in paper format…

No comments

Cartogram for the 2006 election, 2nd round

The Brazilian electoral court (TSE - Tribunal Superior Eleitoral) has finally posted the 2006 election results in a format suitable for researchers. This past week I got the data in shape for analysis in my dissertation and decided it was a good time to do some charts. As usual, the plots were done in R, this time using the maptools package.

The second round was a landslide in favour of the incumbent, Lula da Silva, from the PT (Workers’ Party). He got around 61% of the votes, while Geraldo Alckmin got 39%.

Perhaps more interesting is the spatial distribution of the votes. The individual units in the map are what the Brazilian Geographic and Statistical Institute (IBGE) calls “mesoregions”, but the original data is by municipality and electoral zone.

Original projection

It is noticeable how the Northeast is overwhelmingly red, indicating Lula won there by extremely wide margins. On the other hand, margins were much thinner in the south, in the center-west and in São Paulo.

I’ve always been dissatisfied with maps like this, since they overrepresent areas such as the west of the country, where the population (and therefore vote) density is much lower than on the coast. Ditto for country areas versus the big cities. Yet, the geographical representation allows us to grasp the overall pattern and correlate it with facts that we know. For example, the northeast is much poorer than the south, so we immediately recognize that Lula did worse in richer areas.

Cartograms are a way to “correct” the overrepresentation of low density areas. By correction, of course, we mean distorting, but that is the whole point of the procedure. For voting and other social science data, geographic distance is just as arbitrary. Gastner and Newman invented one method to produce cartograms that seems to work very well in practice (paper here). The original software was written in C, but there is a Java version by Frank Hardisty which I used, since it uses shapefiles as input and output. Click here to take a look at maps for the 2004 US election.

Cartogram

Most of the Brazilian west is dramatically shrunk, while the big cities (particularly São Paulo) are blown up several times over. In fact, I find the cartogram particularly helpful in showing the votes in the big metropolitan areas, and comparing them to areas in the countryside. Although it is interesting, I wonder if the cartogram is too distorted to be useful, and would be interested in hearing other opinions.

No comments

Conditioning

Brian Mulloy, one of the founders of Swivel, wrote a nice comment on my post explaining how Swivel is in fact able to condition on data categories. For example, if you want to do a graph highlighting a particular category, or even using only data from a particular category, you are able to. The process has to start in the dataset view.

Dmitry Dimov calory and Dmitry Dimov costs

See his comment for the full explanation.

I guess I have to spend more time on it, but it still doesn’t seem to be able to do what I want. I downloaded my own data in csv format and created a couple of figures using the ggplot package in R. I don’t expect Swivel to have the same flexibility, since its objectives are very different from those of academic statistical software. However, I don’t see why something like this would not be possible in a web application in the not so distant future.

color by party

color by party, one plot per state

The code:

Read more

No comments

Nike+

After losing my dear last gen ipod shuffle on the plane, I “had” to buy another ipod. I decided to get a nano, and on a whim got the nike+ sensor since I decided to start running again. In theory you should buy special nike sneakers that have a place in the sole to put the sensor in.

Since I usually don’t have $100 lying around, I spent $30 on a pair of New Balance I found on sale, and got this totally geeky thing that includes a velcro pouch that you can put in your shoelaces.

So, there I went, first on the treadmill at the gym. After some calibration, and tying the shoes a little tighter, everything was fine. Very precise instrument, for the price. Once I got back to Rochester on the weekend, I decided to go out for that nice run in the snow (no gym membership there) and, to my surprise, the sensor stopped working… After some experimentation, I am pretty sure the thing doesn’t like the cold. (Who can blame it, really?) With the actual nike sneakers it wouldn’t be much of a problem, since it should be fairly warm inside your shoes… then again, $100 for a pair of shoes…

This pretty much made it impossible for me to complete my goal set at the Nike+ website, 60 miles in a month. I didn’t actually run 60 miles, more like 45, but only 6 runs were recorded, as displayed in this incredibly junky chart:

Nike Plus runs

Which brings us to my second point in this post. I am not a fan of bar charts in general, but this one takes the cake. Note how the bars start at -1 !!! Amazing, you don’t even start and you have already run a mile… take that as a morale booster!

In any case, it does allow one to pull the data and display it in all its (text) glory in the sidebar that you should be able to see on the right. I used the wordpress plugin Nike+ stats, in case you are wondering.

No comments

Data Visualization for the Masses

There are a few websites/startup companies trying to fly the idea of being a repository and visualization engine for data that anyone can upload. Swivel, for example, made some splash in the past few months as the “youtube for data”.

I experimented with Swivel and a few others. The main problem with all of them is the lack of tools allowing conditioning, at least in an easy way. Conditioning is important for constructing small multiple plots, or even plotting groups in different colors in a scatter plot. For example, take the ideal point data I uploaded:

2nd Dimension by 1st Dimension

In order to plot different parties in different colors I would have (as far as I know) to upload a different dataset for each party! It goes without saying that this is unnecessarily burdensome.
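
For the record, conditioning is not a complicated operation. Here is a minimal sketch in Python (standard library only, not what any of these sites run): split one dataset by a category and give each group its own color. The rows are invented for illustration, and `condition_on` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented ideal-point rows: (legislator, party, 1st dimension, 2nd dimension).
rows = [
    ("A", "PT", -0.8, 0.1),
    ("B", "PSDB", 0.5, -0.2),
    ("C", "PT", -0.6, 0.3),
    ("D", "PFL", 0.9, 0.4),
]

def condition_on(rows, key_index):
    """Split rows into groups keyed by the value found at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)

by_party = condition_on(rows, key_index=1)

# One color per party, assigned in sorted order so the mapping is stable.
palette = ["red", "blue", "green", "orange"]
colors = {
    party: palette[i % len(palette)]
    for i, party in enumerate(sorted(by_party))
}

# Each group could now be drawn as its own layer (or its own small panel).
for party in sorted(by_party):
    points = [(x, y) for _, _, x, y in by_party[party]]
    print(party, colors[party], points)
```

The same `by_party` grouping serves both uses mentioned above: one plot with a color per group, or one small plot per group.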

Many Eyes is also very impressive, with more advanced visualization plots. It is Java-based, and does not play so nicely in Firefox on the Mac, unfortunately. I also had problems with the ideal points dataset there. It doesn’t allow one to create a scatter plot with only two variables (!!!), requesting a third to be displayed as the size of the symbols.




The focus on bar charts on both platforms is also annoying… dotplots and boxplots would be nice.

There is another site, data360, but I didn’t have much luck with it. It is more “professional”, allowing one to pull data directly from the web automatically. But its focus on time series data makes it simply unusable for the example I tried.

Of course, all three are just beginning and we might be hearing a lot about them soon. And the graphs they make are not half bad. Not until you consider rich chart live, that is. This one is ugly! And I mean the makes-you-miss-excel kind of ugly:

1 comment

Dotplots for Regression tables

Continuing our project on tables to graphs, I am writing a function with the objective of taking a regression table in which multiple models are displayed and turning it into a graph.

The way it is set up right now is as follows. The user supplies a list with the estimates and (possibly two) confidence intervals. I will write a function that gets these values from regression models for the user’s convenience.

Vdotplot(Y,label.x=label.vec.model)
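
To give an idea of the kind of input involved, here is a minimal sketch in Python (not the R function above, whose internals are behind the link). It builds, for each coefficient, the estimate plus two intervals from a standard error. The coefficients, the 90% and 95% levels, and the name `dotplot_input` are all assumptions made for illustration.

```python
# Invented coefficients for one model: name -> (estimate, standard error).
model = {
    "income": (0.42, 0.10),
    "education": (-0.15, 0.12),
}

Z_90 = 1.645  # normal critical value for a 90% interval (assumed level)
Z_95 = 1.96   # normal critical value for a 95% interval (assumed level)

def dotplot_input(model):
    """Return, per variable, the estimate with inner (90%) and outer (95%)
    normal-approximation confidence intervals."""
    out = {}
    for name, (est, se) in model.items():
        out[name] = {
            "estimate": est,
            "inner": (est - Z_90 * se, est + Z_90 * se),
            "outer": (est - Z_95 * se, est + Z_95 * se),
        }
    return out

values = dotplot_input(model)
for name, v in values.items():
    lo, hi = v["outer"]
    print(f"{name}: {v['estimate']:.2f} [{lo:.3f}, {hi:.3f}]")
```

In a dotplot the estimate becomes the dot, the inner interval a thick line and the outer interval a thin one, so a reader can see at a glance which intervals cross zero.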

Read more

1 comment

From tables to graphs

I embarked on a paper project with John Kastellec to document the use of statistical tables versus graphs in political science and provide examples about how many (even most) of the tables can be transformed into graphs. That is, we aim to do for a political science audience what Andrew Gelman and coauthors did for statistics.

The main reason we use tables is that, well, they are a hundred times (or more) easier to produce. In that respect, our plan is to provide code showing how we did the examples and also provide functions that transform what we identified as the most frequent kinds of tables into (hopefully beautiful) graphs. We aim, therefore, to make the translation from tables to graphs easy.

For the time being, the code and examples will be hosted here at brazilianpolitics.org. My first contribution is a function that takes a matrix of correlations and produces an image plot like Figure 8 in Zheng, Salganik and Gelman, “How many people do you know in prison?: Using overdispersion in count data to estimate social structure in networks”, JASA 101(474): 409-423.

(links to code and graphs always point to the latest version, and are continually updated. use it at your own risk.)

Read more

Comments are off for this post

Presidential Elections Page

I created a page to keep track of the 2006 Presidential elections polls (with pretty graphs!) I plan to keep it updated on a weekly basis. Comments on the plot and other suggestions are welcome in this post.

files:

data (stata format)

code (R, using ggplot)

1 comment