Hairy Toad
I wonder whether it is common practice in other countries for the mainstream press to call the president names like “sapo barbudo” (hairy toad). Or is it just a sign of prejudice against an uneducated president born in the poor Northeast?
Conditioning
Brian Mulloy, one of the founders of Swivel, wrote a nice comment on my post explaining that Swivel is in fact able to condition on data categories. For example, if you want a graph that highlights a particular category, or one that uses only data from that category, you can make it. The process has to start in the dataset view.
See his comment for the full explanation.
I guess I have to spend more time with it, but it still doesn’t seem able to do what I want. I downloaded my own data in CSV format and created a couple of figures using the ggplot package in R. I don’t expect Swivel to have the same flexibility, since its objectives are very different from those of academic statistical software. However, I see no reason why something like this couldn’t be possible in a web application in the not-so-distant future.
The code:
Stata’s outreg equivalent for R
[Update 4/9/2008 -- A comment below suggests using the R package memisc, function mtable. It seems to be a useful package overall (much more stuff than just tables), and the tables it generates look pretty good. I didn't know about it, thanks for pointing it out.]
A nice feature of Stata is the large number of ado files that help create tables of coefficient estimates that you can cut and paste into your (yuck!) Word document or, much more elegantly, that produce LaTeX code for a table you can include in your LaTeX document. R has a somewhat similar feature in the xtable package, but it currently lacks the ability to produce a single table whose columns hold the results from different models or specifications.
Some time ago Ajay Narottam Shah posted some code to the R-help list that does just that. I took part of it and tweaked it a bit. You give it a matrix of coefficients and a matrix of standard errors, and it produces the LaTeX code.
Here it is:
```r
latex.table <- function(coef.mat, se.mat, digits = 3, table.command = TRUE) {
  # coef.mat and se.mat: matrices of the same shape, one row per variable
  # and one column per model; NA entries are printed as empty cells.
  nc <- ncol(coef.mat)
  coef.mat <- round(coef.mat, digits)
  se.mat <- round(se.mat, digits)
  text.now <- NULL
  if (table.command) {
    text.now <- c(text.now, "\\begin{table}\n")
    text.now <- c(text.now, "\\centering\n")
  }
  text.now <- c(text.now, "\\begin{tabular}[R]{", rep("c", nc + 1), "}\n")
  text.now <- c(text.now, "\\hline\n")
  ## header row: one column per model
  for (j in seq_len(nc)) {
    text.now <- c(text.now, " & ", colnames(coef.mat)[j])
  }
  text.now <- c(text.now, "\\\\\n\\hline\n")
  for (i in seq_len(nrow(coef.mat))) {
    ## print coef estimates, labeled with the variable name
    text.now <- c(text.now, rownames(coef.mat)[i])
    for (j in seq_len(nc)) {
      if (is.na(coef.mat[i, j])) {
        text.now <- c(text.now, " & ")
      } else {
        text.now <- c(text.now, " & ", coef.mat[i, j])
      }
    }
    text.now <- c(text.now, "\\\\\n")
    ## print SEs in parentheses, with a little extra space after the row
    for (j in seq_len(nc)) {
      if (is.na(se.mat[i, j])) {
        text.now <- c(text.now, " & ")
      } else {
        text.now <- c(text.now, " & ", sprintf("(%s)", se.mat[i, j]))
      }
    }
    text.now <- c(text.now, "\\\\[1mm]\n")
  }
  text.now <- c(text.now, "\\\\\n")
  text.now <- c(text.now, "\\hline")
  text.now <- c(text.now, "\n")
  text.now <- c(text.now, "\\end{tabular}\n")
  if (table.command) text.now <- c(text.now, "\\end{table}\n")
  paste(text.now, collapse = "")
}
```
So, for

```
tmp.estimates
            cluster jack-knife   lmer lmerMcmc edvreg
iquality     -0.503     -0.503 -0.414   -0.413 -0.454
iqualityrep   0.030      0.030  0.019    0.019  0.019
gdppc         0.022      0.022  0.024    0.023  0.047
```

and

```
tmp.se
  cluster jack-knife   lmer lmerMcmc edvreg
1  0.1230     0.1623 0.1593    0.213 0.1905
2  0.0056     0.0065 0.0092    0.012 0.0092
3  0.0149     0.0280 0.0327    0.042 0.0634
```

the call

```r
latex.table(tmp.estimates, tmp.se, table.command = TRUE)
```

returns

```
"\\begin{table}\n\\centering\n\\begin{tabular}[R]{cccccc}\n\\hline\n & cluster & jack-knife & lmer & lmerMcmc & edvreg\\\\\n\\hline\niquality & -0.503 & -0.503 & -0.414 & -0.413 & -0.454\\\\\n & (0.123) & (0.162) & (0.159) & (0.213) & (0.19)\\\\[1mm]\niqualityrep & 0.03 & 0.03 & 0.019 & 0.019 & 0.019\\\\\n & (0.006) & (0.006) & (0.009) & (0.012) & (0.009)\\\\[1mm]\ngdppc & 0.022 & 0.022 & 0.024 & 0.023 & 0.047\\\\\n & (0.015) & (0.028) & (0.033) & (0.042) & (0.063)\\\\[1mm]\n\\\\\n\\hline\n\\end{tabular}\n\\end{table}\n"
```
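which you can dump to a file with:

```r
cat(latex.table(tmp.estimates, tmp.se, table.command = TRUE), file = "table.tex")
```

and then pull into your LaTeX document with \input{table.tex}, producing a typeset table with each model in its own column and the standard errors in parentheses below the estimates.
Nifty!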
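If your estimates come from several fitted models, the two input matrices still have to be assembled first. Here is a minimal sketch of one way to do that, assuming the models are lm() fits; the three specifications on R’s built-in mtcars data, the model names m1 to m3, and the helper extract() are hypothetical illustrations, not part of Shah’s code.

```r
# Hypothetical example: three nested specifications on the built-in mtcars data.
fits <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars),
  m3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Union of all coefficient names, in order of first appearance.
vars <- unique(unlist(lapply(fits, function(f) names(coef(f))), use.names = FALSE))

# Hypothetical helper: fill a variables-by-models matrix from one column of
# each model's coefficient table; NA where a model omits a variable.
extract <- function(fits, vars, column) {
  out <- matrix(NA_real_, nrow = length(vars), ncol = length(fits),
                dimnames = list(vars, names(fits)))
  for (j in seq_along(fits)) {
    tab <- coef(summary(fits[[j]]))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
    out[rownames(tab), j] <- tab[, column]
  }
  out
}

coef.mat <- extract(fits, vars, "Estimate")
se.mat   <- extract(fits, vars, "Std. Error")
```

The two matrices can then go straight into latex.table(coef.mat, se.mat). Because the rows are the union of all coefficient names, a variable that a model leaves out comes through as NA, which the function prints as an empty cell.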
R in OS X - Making quartz device work from terminal or Emacs
It’s buggy, but it works! I was getting really tired of X11 in OS X…
You need Apple’s developer tools installed (they come on the Tiger DVD or can be downloaded from the Apple Developer Connection). Then install and load CarbonEL:

```r
install.packages("CarbonEL", repos = "http://rforge.net/", type = "source")
library(CarbonEL)
```

and then

```r
quartz()
```

opens a graphics window in OS X.
Nike+
After losing my dear last-gen iPod shuffle on the plane, I “had” to buy another iPod. I decided to get a nano and, on a whim, got the Nike+ sensor, since I had decided to start running again. In theory you are supposed to buy special Nike sneakers that have a slot in the sole for the sensor.
Since I usually don’t have $100 lying around, I spent $30 on a pair of New Balance I found on sale, and got this totally geeky thing that includes a Velcro pouch you can attach to your shoelaces.
So there I went, first on the treadmill at the gym. After some calibration, and tying the shoes a little tighter, everything was fine. A very precise instrument for the price. Once I got back to Rochester for the weekend, I decided to go out for a nice run in the snow (no gym membership there) and, to my surprise, the sensor stopped working… After some experimentation, I am pretty sure the thing doesn’t like the cold. (Who can blame it, really?) With actual Nike sneakers it wouldn’t be much of a problem, since it should be fairly warm inside your shoes… then again, $100 for a pair of shoes…
This pretty much made it impossible for me to reach the goal I set on the Nike+ website, 60 miles in a month. I didn’t actually run 60 miles, more like 45, but only 6 runs were recorded, as displayed in this incredibly junky chart:
Which brings me to my second point in this post. I am not a fan of bar charts in general, but this one takes the cake. Note how the bars start at -1!!! Amazing: you haven’t even started and you have already run a mile… take that as a morale booster!
In any case, it does let you pull the data and display it in all its (text) glory in the sidebar, which you should be able to see on the right. I used the WordPress plugin Nike+ stats, in case you are wondering.
Political Reform
The never-ending political reform debate is a constant source of amusement and befuddlement for anyone with even a paltry knowledge of the effects of political institutions. Fernando Rodrigues, for example, argues that the system as it is, dysfunctional as it is, is better than most of the proposed reforms. Although I do not agree with the specifics of his arguments, he is probably right in his overall assessment.
Case in point: the reforms that the president of the Câmara, Arlindo Chinaglia, wants to start discussing on the floor in the next week or so. They propose a change to a closed-list proportional electoral system. Comparative scholars everywhere know that such a system is expected to increase the roll-call discipline of legislators. The mechanism is simple: legislators who do not follow their party’s recommendations risk not being placed at or near the top of the list in the next election.
And herein lies the Brazilian twist: under the proposed reform, legislators running for reelection are placed at the top of the list by default! Plus ça change…
Read on for the relevant excerpt from the bill (in Portuguese).
Help Desk - Introducing the Book
Someone I helped at work this week sent me this video. Very funny. I think it is Norwegian, but it has subtitles.
Data Visualization for the Masses
A few websites/startup companies are trying out the idea of being a repository and visualization engine for data that anyone can upload. Swivel, for example, made a splash in the past few months as the “YouTube for data”.
I experimented with Swivel and a few others. The main problem with all of them is the lack of tools for conditioning, at least in an easy way. Conditioning is important for constructing small multiple plots, or even for plotting groups in different colors in a scatter plot. For example, take the ideal point data I uploaded:
In order to show different parties in different colors I would have (as far as I know) to upload a separate dataset for each party! It goes without saying that this is unnecessarily burdensome.
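For contrast, the usual approach in R is to keep everything in one data frame with the group as a column and let the plotting code do the conditioning. Below is a minimal base-R sketch of both uses mentioned above, colors by group and small multiples; the data are simulated, and the parties A, B, C and the two ideal-point dimensions are hypothetical stand-ins, not the dataset I uploaded.

```r
set.seed(1)
# Simulated stand-in for ideal-point data: one row per legislator,
# with party stored as a column rather than as separate datasets.
ideal <- data.frame(
  party = factor(rep(c("A", "B", "C"), each = 20)),
  dim1  = c(rnorm(20, -1, 0.3), rnorm(20, 0, 0.3), rnorm(20, 1, 0.3)),
  dim2  = rnorm(60, 0, 0.5)
)

pdf(NULL)  # draw off-screen for this sketch; omit when working interactively

# 1. A single scatter plot with parties distinguished by color.
cols <- c("tomato", "seagreen", "steelblue")
plot(ideal$dim1, ideal$dim2, col = cols[as.integer(ideal$party)], pch = 19,
     xlab = "First dimension", ylab = "Second dimension")
legend("topright", legend = levels(ideal$party), col = cols, pch = 19)

# 2. Small multiples: one panel per party, all on common axes.
groups <- split(ideal, ideal$party)
op <- par(mfrow = c(1, length(groups)))
for (p in names(groups)) {
  with(groups[[p]], plot(dim1, dim2, main = p, pch = 19,
                         xlim = range(ideal$dim1), ylim = range(ideal$dim2)))
}
par(op)
invisible(dev.off())
```

The lattice package that ships with R does the small multiples in a single call, xyplot(dim2 ~ dim1 | party, data = ideal), which is the kind of one-step conditioning these web tools are missing.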
Many Eyes is also very impressive, with more advanced visualizations. It is Java-based and, unfortunately, does not play so nicely with Firefox on the Mac. I also had problems with the ideal points dataset there: it doesn’t allow one to create a scatter plot with only two variables (!!!), insisting on a third to be displayed as the size of the symbols.
The focus on bar charts on both platforms is also annoying… dotplots and boxplots would be nice.
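Both alternatives take a line each in base R. A minimal sketch on simulated data (the group sizes and the score variable are hypothetical): a dot plot places group summaries on a common scale without the visual weight of bars, and a box plot shows each group’s whole distribution.

```r
set.seed(2)
# Simulated scores for three groups of unequal size.
scores <- data.frame(
  party = factor(rep(c("A", "B", "C"), times = c(15, 25, 10))),
  score = c(rnorm(15, mean = -1), rnorm(25, mean = 0), rnorm(10, mean = 1))
)

pdf(NULL)  # draw off-screen for this sketch; omit when working interactively

# Dot plot of group medians, sorted so the ranking is easy to read.
med <- sapply(split(scores$score, scores$party), median)
dotchart(sort(med), xlab = "Median score", pch = 19)

# Box plot of each group's distribution via the formula interface;
# boxplot() also returns the summary statistics it drew.
bp <- boxplot(score ~ party, data = scores, ylab = "Score")

invisible(dev.off())
```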
There is another site, data360, but I didn’t have much luck with it. It is more “professional”, allowing one to pull data directly from the web automatically. But its focus on time series data makes it simply unusable for the example I tried.
Of course, all three are just getting started, and we might be hearing a lot about them soon. And the graphs they make are not half bad, at least not compared with rich chart live. That one is ugly! And I mean makes-you-miss-Excel ugly:

Statistics and Programming
Last month at Machine Learning there was a discussion about the creation of a machine learning department at Carnegie Mellon University. The discussion following the post was fairly interesting, in particular this pearl by John Langford:
I regard ‘rogramming as the missing member of reading, ‘riting, and ‘rithmetic, and I’ve found a statistical understanding of the world genuinely valuable.
I couldn’t agree more. As social scientists, we need much better training in programming if we are to take part in, and benefit from, the fast-paced increase in available computing power. The question is how to properly train graduate students in the social sciences, who tend to start the program with very little knowledge of, or interest in, programming.
It seems to me that the available classes and books badly miss the mark. They either assume a lot of programming experience (e.g., S Programming by Venables and Ripley, and Programming with Data by John Chambers) or focus too much on the statistics side of things without a proper discussion of basic programming concepts.
I am certainly not alone in this assessment. Back in November 2006, Jan de Leeuw made the following suggestion on the StatCompute mailing list:
This is the title of a series of free computer programming textbooks, started by Alan Downey. There are versions for C++, Java, Logo, and Python.
…
Since the LaTeX for these books is also freely available, it may not be too hard (and possibly quite useful) to make an R/S version. What seems to be common practice is to edit the original LaTeX and then add your name to the author list.
But should (most) social scientists learn how to program? This was the topic of an extended discussion on PolMeth. The main argument against it was that we should leave programming to the “pros”, i.e., the programmers at Stata/SAS/R: perhaps we shouldn’t trust computer code written by us plain social scientists. My own take is that a large chunk of data analysis is simply indistinguishable from programming. The problem is that it is currently done mostly in a non-reproducible, error-prone, and downright ugly way. Even the most basic understanding of flow control, loops, and data structures would be a quantum leap for much of current statistical practice, at least in political science.
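To make the point concrete, here is a small sketch of the kind of loop-and-data-structure code I have in mind. It uses R’s built-in mtcars data purely for illustration, not any political science dataset: the same regression is fit separately for each group, and the results are collected in one matrix instead of being re-run and copied by hand for every subset.

```r
# One data frame per group (cars with 4, 6, and 8 cylinders).
by.cyl <- split(mtcars, mtcars$cyl)

# Preallocate a results matrix: one row per group, one column per coefficient.
results <- matrix(NA_real_, nrow = length(by.cyl), ncol = 2,
                  dimnames = list(names(by.cyl), c("intercept", "slope.wt")))

# Fit the same model in each group and store its coefficients.
for (g in names(by.cyl)) {
  fit <- lm(mpg ~ wt, data = by.cyl[[g]])
  results[g, ] <- coef(fit)
}

results
```

Rerunning the whole analysis after a data correction is then a matter of sourcing the script again, which is exactly what point-and-click workflows make hard.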
Therefore, something should be done to close the gap. Luckily, this is not a problem only for us lowly social scientists. Allen Downey, the author of “How to Think Like a Computer Scientist”, is currently finishing up a book on “Physical Modeling in MATLAB”, which appears to have the same basic idea, although focused on a “real” science and a different language. It is a “free book” covered by the GNU Free Documentation License, so it is possible to adapt it (or even combine it with parts of the “How to Think…” series) to make it more applicable to social scientists.





