cluelessresearch.com

political methodology, brazilian politics, etc.

Archive for the 'R' Category

QuickR

There is a new (I think) website for learning R that looks pretty decent: Quick-R (http://www.statmethods.net/). It was created by Robert Kabacoff, whom I had the pleasure of meeting several months ago. We discussed R briefly at that time, and he was just getting into it. Apparently he has been busy! The intended audience is users of SAS/SPSS/Stata transitioning to R. If that fits your bill, go ahead and take a look. If it doesn’t and you are already an experienced user, I am sure there are more than a couple of people you can point the website to.

No comments

Regression plots

I am writing a paper with a coauthor promoting the use of graphs instead of tables in political science. We did some research on the current use of tables and graphs and found out that a substantial proportion of the tables is devoted to the display of regression results.

So we thought that creating graphs to display regression tables was essential to our task. That is, to turn this:

into a nice graph.

We are currently revising the paper for (re)submission, and are still undecided on how to display such graphs. Here are some of the several revisions, starting with one of the first (back in November):

This one is from later in the same month:

And here are the two I am currently looking at, taking out the boxes around the plots. This one is minimalist:

And the next one has the x-axis repeated in each plot:

My coauthor thinks the boxes are necessary; I cite Tufte over and over and say they aren’t. I think we will have to arrange an intercontinental boxing match to settle the issue.

No comments

Approximate matching in R

One frequent problem when working with data from multiple sources is how to match names that, at best, are approximately equal. Typical examples include matching country names from multiple data sets and matching political candidate names from electoral and legislative sources.

For country names, it is not a big deal, since the datasets consist of two hundred or so names at most, but even then the task can be boring and error-prone. For electoral candidates the problem escalates quickly. In Brazilian politics, for example, one has to match the 513 elected legislators in the lower chamber to the 5000+ candidates. What to do then? Hire monkeys undergraduate assistants? Outsource to India?

A better (and cheaper) way is to use the computer to do the grunt work for you. Again, R comes to the rescue, this time with the agrep function (authored by David Meyer, based on C code by Jarkko Hietaniemi, with modifications by Kurt Hornik).

Agrep by itself doesn’t help much, but help is only some useful little wrappers away. One function, agrep.match [link], does the following: a) sets names to lower case and kills multiple white spaces; b) given these transformations, matches exactly; c) with what hasn’t matched, matches approximately with a decreasing threshold of “aproximateness” (is that even a word? it is like so in the R help file); d) returns the indexes with matched and unmatched names and the corresponding thresholds used.
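The four steps above can be sketched in base R. This is not the agrep.match function linked above, just a minimal illustration of the same idea: the helper names (normalize_names, approx_match_sketch), the threshold values, and the rule of accepting only a single unambiguous candidate are all assumptions made for the example.

```r
## Hypothetical helpers for illustration only (not the author's agrep.match).

## Step a: lower case, collapse repeated white space, trim the ends
normalize_names <- function(x) {
  x <- tolower(x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

approx_match_sketch <- function(x, table, thresholds = c(0.05, 0.1, 0.2)) {
  x.norm <- normalize_names(x)
  table.norm <- normalize_names(table)

  ## Step b: exact matching on the normalized names (threshold recorded as 0)
  index <- match(x.norm, table.norm)
  threshold <- ifelse(is.na(index), NA_real_, 0)

  ## Step c: approximate matching on what is left, loosening the
  ## max.distance passed to agrep one step at a time
  for (d in thresholds) {
    for (i in which(is.na(index))) {
      hits <- agrep(x.norm[i], table.norm, max.distance = d, fixed = TRUE)
      if (length(hits) == 1) {  # assumption: keep only unambiguous candidates
        index[i] <- hits
        threshold[i] <- d
      }
    }
  }

  ## Step d: indexes (NA when unmatched) and the threshold that was used
  data.frame(name = x, index = index, matched = table[index],
             threshold = threshold, stringsAsFactors = FALSE)
}

## Usage with made-up names
sources <- c("Brazil", "United   States", "Venezuala", "Kiribati")
targets <- c("argentina", "brazil", "united states", "venezuela", "uruguay")
res <- approx_match_sketch(sources, targets)
print(res)
```

Note that agrep looks for the pattern anywhere inside each candidate string, so short names can match inside longer ones; checking the matches made at the loosest thresholds by eye is still a good idea.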

See example of usage in this pdf document.

Note that both links are to my current and constantly updated development version. It might break or already be broken. No warranties are made, yada yada yada. The development is in its early stages, but it seems to work. If anyone has comments, suggestions or bug reports, drop me a line (e.leoni AT gmail DOT com).

No comments

New maps

Jos complains about my 1995 technique for creating the animations (animated gifs) and about the lack of interactivity. Perhaps this flash(y) version will be of greater appeal to him.

The whole 1982-2006 period is posted. If you pay attention, the map changes slightly to reflect changes in the distribution of seats across the nation. E.g. the creation of Tocantins following the 1988 Constitution (1990 map), or the increase in the number of deputies elected from São Paulo in 1994.

1 comment

Spatial distribution of Parties in Brazil

I’ve been collecting Brazilian electoral data for my dissertation for some time, and have always wondered about how to display the somewhat massive data available in an efficient manner. Take the best case scenario: 27 districts (states) x 4 elections (since 1994) x 7 largest parties=756 data points. This is a lot of numbers to look at in a table! Imagine using aggregate data at the municipality level: 5000 x 4 x 7! No, you do the math…

Maps, of course, are one way to display the data. The major problem then is that the widely varying population density in Brazil would produce a misleading map of the voting distribution across the country. That is the reason I am considering using cartograms, as discussed in the last post (link). Displayed below is a whole set of cartograms displaying data for the Câmara dos Deputados for the past four elections. The idea now is that areas in the cartogram should be proportional to the number of seats assigned to each district (which in Brazil are the states). Given the high degree of malapportionment, the cartogram looks somewhat different from the one based on population or vote totals we presented previously. The time dimension is presented as a movie, so it is easy to follow the spatial distribution of seats for each party throughout the recent elections.

animation.gif

Now I only have to figure out how to put this in paper format…

No comments

Cartogram for the 2006 election, 2nd round

The Brazilian electoral court (TSE - Tribunal Superior Eleitoral) has finally posted the 2006 election results in a format suitable for researchers. This past week I got the data in shape for analysis in my dissertation and decided it was a good time to do some charts. As usual, the plots were done in R, this time using the maptools package.

The second round was a landslide in favour of the incumbent, Lula da Silva, from the PT (Workers’ Party). He got around 61% of the votes, while Geraldo Alckmin got 39%.

Perhaps more interesting is the spatial distribution of the votes. The individual units in the map are what the Brazilian Geographic and Statistical Institute (IBGE) calls “mesoregions”, but the original data is by municipality and electoral zone.

Original projection

It is noticeable how the Northeast is overwhelmingly red, indicating Lula won there by extremely wide margins. On the other hand, margins were much thinner in the south, in the center-west and in São Paulo.

I’ve always been dissatisfied with maps like this, since they overrepresent areas such as the west of the country, where the population (and therefore vote) density is much lower than on the coast. Ditto for country areas versus the big cities. Yet, the geographical representation allows us to grasp the overall pattern and correlate it with facts that we know. For example, the northeast is much poorer than the south, so we immediately recognize that Lula did worse in richer areas.

Cartograms are a way to “correct” the overrepresentation of low density areas. By correction, of course, we mean distorting, but that is the whole point of the procedure. For voting and other social science data, geographic distance is just as arbitrary. Gastner and Newman invented one method to produce cartograms that seems to work very well in practice (paper here). The original software was written in C, but there is a Java version by Frank Hardisty, which I used since it takes shapefiles as input and output. Click here to take a look at maps for the 2004 US election.

Cartogram

Most of the Brazilian west is dramatically shrunk, while the big cities (particularly São Paulo) are blown up several times over. In fact, I find it particularly helpful in showing the votes in the big metropolitan areas, and comparing them to areas in the countryside. Although it is interesting, I wonder if the cartogram is too distorted to be useful, and would be interested in hearing other opinions.

No comments

Conditioning

Brian Mulloy, one of the founders of Swivel, wrote a nice comment on my post explaining how Swivel is in fact able to condition on data categories. For example, if you want to do a graph highlighting a particular category, or even using only data from a particular category, you are able to. The process has to start in the dataset view.

Dmitry Dimov calory and Dmitry Dimov costs

See his comment for the full explanation.

I guess I have to spend more time on it, but it still doesn’t seem to be able to do what I want. I downloaded my own data in CSV format and created a couple of figures using the ggplot package in R. I don’t expect Swivel to have the same flexibility, since its objectives are very different from those of academic statistical software. However, I don’t see why something like this would not be possible in a web application in the not so distant future.

color by party

color by party, one plot per state

The code:

Read more

No comments

Stata’s outreg equivalent for R

[Update 4/9/2008 -- A comment below suggests using the R package memisc, function mtable. It seems to be a useful package overall (much more stuff than just tables), and the tables it generates look pretty good. I didn't know about it, thanks for pointing it out.]

A nice feature of Stata is the large number of ado files that help in creating tables of coefficient estimates that you can cut and paste into your (yuck!) Word document or, much more elegantly, that produce LaTeX code for a table you can include in your LaTeX document. R has a somewhat similar feature in the package xtable, but it currently lacks the ability to produce a single table in which the columns hold the results from different models or specifications.

Some time ago Ajay Narottam Shah published some code on the R-help list to do just that. I took just part of it and tweaked it a bit. You give it a matrix of coefficients and a matrix of standard errors and it produces the LaTeX code.

Here it is (link):

latex.table <- function(coef.mat, se.mat, digits = 3, table.command = TRUE) {
  ## Returns one string of LaTeX code: a column per model, each coefficient
  ## row followed by a row of standard errors in parentheses.
  nc <- ncol(coef.mat)
  coef.mat <- round(coef.mat, digits)  # use the digits argument
  se.mat <- round(se.mat, digits)
  text.now <- NULL
  if (table.command) {
    text.now <- c(text.now, "\\begin{table}\n")
    text.now <- c(text.now, "\\centering\n")
  }
  ## one centered column per model, plus one for the row labels
  text.now <- c(text.now, "\\begin{tabular}[R]{", rep("c", nc + 1), "}\n")
  text.now <- c(text.now, "\\hline\n")
  ## header row with the model names
  for (j in 1:ncol(coef.mat)) {
    text.now <- c(text.now, " & ", colnames(coef.mat)[j])
  }
  text.now <- c(text.now, "\\\\\n\\hline\n")
  for (i in 1:nrow(coef.mat)) {
    ## print coef estimates (empty cell when the term is not in the model)
    text.now <- c(text.now, rownames(coef.mat)[i])
    for (j in 1:ncol(coef.mat)) {
      if (is.na(coef.mat[i, j])) {
        text.now <- c(text.now, " & ")
      } else {
        text.now <- c(text.now, " & ", coef.mat[i, j])
      }
    }
    text.now <- c(text.now, "\\\\\n")
    ## print SEs
    for (j in 1:ncol(coef.mat)) {
      if (is.na(se.mat[i, j])) {
        text.now <- c(text.now, " & ")
      } else {
        text.now <- c(text.now, " & ", sprintf("(%s)", se.mat[i, j]))
      }
    }
    text.now <- c(text.now, "\\\\[1mm]\n")
  }
  text.now <- c(text.now, "\\\\\n")
  text.now <- c(text.now, "\\hline")
  text.now <- c(text.now, "\n")
  text.now <- c(text.now, "\\end{tabular}\n")
  if (table.command) text.now <- c(text.now, "\\end{table}\n")
  paste(text.now, collapse = "")
}

So, for

tmp.estimates
            cluster jack-knife   lmer lmerMcmc edvreg
iquality     -0.503     -0.503 -0.414   -0.413 -0.454
iqualityrep   0.030      0.030  0.019    0.019  0.019
gdppc         0.022      0.022  0.024    0.023  0.047

and

tmp.se
  cluster jack-knife   lmer lmerMcmc edvreg
1  0.1230     0.1623 0.1593    0.213 0.1905
2  0.0056     0.0065 0.0092    0.012 0.0092
3  0.0149     0.0280 0.0327    0.042 0.0634

latex.table(tmp.estimates,tmp.se,table.command=TRUE)

"\\begin{table}\n\\centering\n\\begin{tabular}[R]{cccccc}\n\\hline\n & cluster & jack-knife & lmer & lmerMcmc & edvreg\\\\\n\\hline\niquality & -0.503 & -0.503 & -0.414 & -0.413 & -0.454\\\\\n & (0.123) & (0.162) & (0.159) & (0.213) & (0.19)\\\\[1mm]\niqualityrep & 0.03 & 0.03 & 0.019 & 0.019 & 0.019\\\\\n & (0.006) & (0.006) & (0.009) & (0.012) & (0.009)\\\\[1mm]\ngdppc & 0.022 & 0.022 & 0.024 & 0.023 & 0.047\\\\\n & (0.015) & (0.028) & (0.033) & (0.042) & (0.063)\\\\[1mm]\n\\\\\n\\hline\n\\end{tabular}\n\\end{table}\n"

which you can dump to a file as:


cat(latex.table(tmp.estimates,tmp.se,table.command=TRUE),file="table.tex")

producing

table

Nifty!
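If you are starting from fitted models rather than ready-made matrices, the two inputs can be assembled along these lines. This is only a sketch: the two lm fits on the built-in mtcars data and the helper name collect are assumptions for illustration, not the models shown above.

```r
## Illustrative models on a built-in dataset (assumed for this sketch only)
fits <- list(
  base = lm(mpg ~ wt, data = mtcars),
  full = lm(mpg ~ wt + hp, data = mtcars)
)

## Union of term names, so models with different regressors line up by row
terms.all <- unique(unlist(lapply(fits, function(f) names(coef(f)))))

## Hypothetical helper: pull one column of the coefficient summary from each
## model, leaving NA where a term does not appear in that model
collect <- function(column) {
  sapply(fits, function(f) {
    tab <- coef(summary(f))
    out <- setNames(rep(NA_real_, length(terms.all)), terms.all)
    out[rownames(tab)] <- tab[, column]
    out
  })
}

coef.mat <- collect("Estimate")    # rows are terms, columns are models
se.mat <- collect("Std. Error")
print(round(coef.mat, 3))
print(round(se.mat, 3))
```

The NA entries are what the function above turns into empty cells, so the resulting coef.mat and se.mat can be passed to it directly.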

2 comments