cluelessresearch.com

political methodology, Brazilian politics, etc.

Archive for the 'Statistics' Category

Random Coefficients: Monte Carlo Experiments

[edited 9/10/06]

Neal Beck and Jonathan Katz have a new paper coming out in Political Analysis. As hard as I tried, I could not replicate their results using the very simple data generating process they describe in the article. Keep reading if you want to know why.

The DGP in the article:

[tex] \beta=5; \sigma^2_x=.01; \sigma^2_e=1; \gamma=1.8 [/tex]

[tex] x_{i,t} \sim N(0,\sigma^2_x) [/tex]

[tex] \beta_i \sim N(\beta,\gamma^2) [/tex]

[tex] y_{i,t}=\beta_i x_{i,t}+e_{i,t} [/tex]

When the number of units [tex]N[/tex] and the number of periods [tex]T[/tex] are both 20, for example, they show a marked difference in efficiency between the pooled least squares estimator and the maximum likelihood random coefficient model they advocate (Pinheiro and Bates's NLME). However, in my attempts to replicate their results, no such difference appeared. Why?

I asked Neal for the replication files, and he told me to email Jonathan. It turns out that they actually drew x from:

[tex] x_{i,t} \sim N(1,\sigma^2_x) [/tex]

(Contrary to what the article states, they also did not hold [tex]x[/tex] fixed across simulations, but that is minor.)

Can that be it? I thought about it for a long time, and then it hit me: the interactive model makes the location and scale of [tex]x[/tex] in the DGP matter. To see why, let’s rewrite the model for [tex]\beta_i[/tex] in deviation form:

[tex] u_i=\beta_i - \beta [/tex]

[tex] y_{i,t}=\beta x_{i,t}+ u_i x_{i,t} + e_{i,t} [/tex]

Now suppose the DGP for [tex]x[/tex] is the original [tex]x[/tex] plus 1. Does that change anything besides the intercept? Call the new outcome [tex]\hat y[/tex]:

[tex] \hat y_{i,t}=\beta (x_{i,t}+1) + u_i (x_{i,t}+1) + e_{i,t} [/tex]

[tex] \hat y_{i,t}=\beta+ \beta x_{i,t} + u_i x_{i,t} + u_i + e_{i,t} [/tex]

The difference between the two DGPs is:

[tex] \hat y_{i,t}-y_{i,t}=\beta + u_i [/tex]
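A quick numerical check of this algebra, as a sketch: draw one set of [tex]x_{i,t}[/tex], [tex]u_i[/tex] and [tex]e_{i,t}[/tex] from the DGP above, build [tex]y[/tex] and [tex]\hat y[/tex] from the same draws, and confirm that their difference is exactly [tex]\beta + u_i[/tex]. The variable names and the seed are my own choices, not taken from the B&K replication files.

```python
import numpy as np

# Parameters from the DGP above; N units observed over T periods.
beta, sigma2_x, gamma = 5.0, 0.01, 1.8
N, T = 20, 20

rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(sigma2_x), size=(N, T))  # x_{i,t} ~ N(0, sigma2_x)
u = rng.normal(0.0, gamma, size=(N, 1))              # u_i = beta_i - beta, sd gamma
e = rng.normal(0.0, 1.0, size=(N, T))                # e_{i,t} ~ N(0, 1)

y = beta * x + u * x + e                  # original DGP
y_hat = beta * (x + 1) + u * (x + 1) + e  # same draws, x shifted by 1

diff = y_hat - y
# Every entry in row i equals beta + u_i: a unit-specific (random) intercept.
print(np.allclose(diff, beta + u))  # True
```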

Not so suddenly, we have what looks like _random intercepts_, since [tex]u_i[/tex] appears by itself. In other words, the random coefficient can be decomposed into a random intercept part plus a random coefficient part.

Does demeaning then solve the issue? Of course not! We do not control the data generating process, so demeaning whatever [tex]x[/tex] we actually observe only shifts the fixed intercept and nothing more: writing [tex]x_{i,t}=\bar x + (x_{i,t}-\bar x)[/tex], the term [tex]u_i \bar x[/tex] is still in [tex]y[/tex] and still behaves as a random intercept. The key is that [tex]u_i x_{i,t}[/tex] is unobservable, completely outside our control.
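The demeaning argument can be checked by simulation. The sketch below is my own illustration (the helper `unit_intercepts`, the seed and the larger number of units are hypothetical choices): it draws the DGP with [tex]x[/tex] centered at 0 and at 1, demeans the observed [tex]x[/tex] by its sample mean, fits a separate least squares line for each unit, and reports the spread of the unit intercepts. With [tex]x[/tex] centered at 1 the intercepts should still vary across units by roughly [tex]\gamma[/tex], because [tex]u_i \bar x[/tex] sits in [tex]y[/tex]; with [tex]x[/tex] centered at 0 they should vary only by sampling noise.

```python
import numpy as np

beta, sigma_x, gamma = 5.0, 0.1, 1.8  # sigma_x = sqrt(.01)
N, T = 200, 20  # more units than in the article, so the spread is estimated precisely


def unit_intercepts(x_mean, rng):
    """Per-unit OLS intercepts after demeaning the observed x (hypothetical helper)."""
    x = rng.normal(x_mean, sigma_x, size=(N, T))
    u = rng.normal(0.0, gamma, size=(N, 1))
    y = beta * x + u * x + rng.normal(0.0, 1.0, size=(N, T))
    xc = x - x.mean()  # demean whatever x we actually observe
    # np.polyfit with deg=1 returns [slope, intercept]; keep each unit's intercept.
    return np.array([np.polyfit(xc[i], y[i], 1)[1] for i in range(N)])


rng = np.random.default_rng(1)
sd_centered = unit_intercepts(0.0, rng).std()
sd_shifted = unit_intercepts(1.0, rng).std()
print(f"x ~ N(0, .01): sd of unit intercepts = {sd_centered:.2f}")
print(f"x ~ N(1, .01): sd of unit intercepts = {sd_shifted:.2f}")
```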

Key lessons:

a) Models with random slopes should be compared to models with random intercepts.

b) Interaction models keep fooling the best of us.

c) When compared to a random intercepts model (aka random effects), the efficiency gain of the MLE random coefficient model would be small in the B&K experiments.
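Lesson (c) can be illustrated with a small Monte Carlo, under an explicit substitution: I do not fit the MLE random coefficient model here, and as a rough stand-in for a model that lets intercepts vary by unit I use the within estimator (one intercept per unit, i.e. unit fixed effects). With [tex]N=T=20[/tex], the sketch compares the sampling spread of the pooled OLS slope and the within slope when [tex]x[/tex] is centered at 0 and at 1. The number of replications and the seed are arbitrary choices.

```python
import numpy as np

beta, sigma_x, gamma = 5.0, 0.1, 1.8
N, T, reps = 20, 20, 1000


def slopes(x_mean, rng):
    """Return (pooled OLS slope, within slope) for one simulated panel."""
    x = rng.normal(x_mean, sigma_x, size=(N, T))
    b_i = rng.normal(beta, gamma, size=(N, 1))  # beta_i ~ N(beta, gamma^2)
    y = b_i * x + rng.normal(0.0, 1.0, size=(N, T))
    # Pooled OLS with a single common intercept.
    xd, yd = x - x.mean(), y - y.mean()
    pooled = (xd * yd).sum() / (xd ** 2).sum()
    # Within estimator: demean by unit, i.e. one intercept per unit.
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    within = (xw * yw).sum() / (xw ** 2).sum()
    return pooled, within


rng = np.random.default_rng(2)
results = {}
for x_mean in (0.0, 1.0):
    draws = np.array([slopes(x_mean, rng) for _ in range(reps)])
    sd_pooled, sd_within = draws.std(axis=0)
    results[x_mean] = (sd_pooled, sd_within)
    print(f"x mean {x_mean}: sd pooled = {sd_pooled:.2f}, sd within = {sd_within:.2f}")
```

In this setup, shifting [tex]x[/tex] to mean 1 should inflate the spread of the pooled slope while leaving the within slope roughly unchanged, which is the random-intercept component at work.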


Estimated dependent variable regression

In a recent Political Analysis special issue (Volume 13 Number 1, Winter 2005) there is a suggestion to estimate “two level models” for cross-country survey data in two steps: the first step is a within-country regression, and the second step regresses the estimated coefficients from the first step (say, the intercepts) on country-level covariates.
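The two-step procedure can be sketched in a few lines. This is a minimal illustration on simulated data, not Jeff Lewis's Stata code: the number of countries, the respondents per country, the country-level covariate `z` and all coefficients are made up for the example. Step one runs OLS within each country and keeps the intercept and its estimated sampling variance; step two regresses those estimated intercepts on the country-level covariate by plain, unweighted OLS.

```python
import numpy as np

rng = np.random.default_rng(3)
J, n_j = 50, 200                  # hypothetical: 50 countries, 200 respondents each
z = rng.normal(size=J)            # hypothetical country-level covariate
alpha = 1.0 + 0.5 * z + rng.normal(0.0, 0.3, size=J)  # true country intercepts

a_hat = np.empty(J)   # first-step intercept estimates
v_hat = np.empty(J)   # their estimated sampling variances (used by weighted variants)
for j in range(J):
    x = rng.normal(size=n_j)                    # individual-level covariate
    y = alpha[j] + 2.0 * x + rng.normal(size=n_j)
    X = np.column_stack([np.ones(n_j), x])
    coef, rss, _, _ = np.linalg.lstsq(X, y, rcond=None)
    s2 = rss[0] / (n_j - X.shape[1])            # residual variance within country j
    a_hat[j] = coef[0]
    v_hat[j] = s2 * np.linalg.inv(X.T @ X)[0, 0]

# Second step: OLS of the estimated intercepts on the country covariate.
Z = np.column_stack([np.ones(J), z])
gamma_hat, *_ = np.linalg.lstsq(Z, a_hat, rcond=None)
print(gamma_hat)  # should be close to [1.0, 0.5]
```

The sampling variances in `v_hat` are what the weighted second-step estimators discussed below make use of; this sketch deliberately ignores them.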

Jeff Lewis kindly provided me with his ancient (2000) Stata code; I corrected a very small bug and added an option for Borjas weights.

The Lewis and Linzer article:

Lewis, J.B. & Linzer, D.A. Estimating Regression Models in Which the Dependent Variable Is Based on Estimates. Political Analysis, 2005.

and, for the (very similar) Borjas weights, my article with John Huber and Georgia Kernell in the same issue:

Huber, J.D.; Kernell, G. & Leoni, E.L. Institutional Context, Cognitive Resources and Party Attachments Across Democracies. Political Analysis, 2005, 13, 365-386.

See also

Borjas, G.J. & Sueyoshi, G.T. A two-stage estimator for probit models with structural group effects. Journal of Econometrics, 1994, 64, 165-182.

Hanushek, E.A. Efficient Estimators for Regressing Regression Coefficients. American Statistician, 1974, 28, 66-67.
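The weighting idea behind Hanushek (1974) and Lewis and Linzer (2005) can be sketched as follows: weight each country by the inverse of its total error variance, the second-stage residual variance [tex]\sigma^2[/tex] plus the first-stage sampling variance [tex]\omega_j[/tex]. This is my own method-of-moments version, not a transcription of the ado file, and the function name `fgls_edv` is hypothetical. It estimates [tex]\sigma^2[/tex] from the OLS residuals using [tex]E[e'e]=\sigma^2(J-k)+\sum_j\omega_j-\mathrm{tr}((Z'Z)^{-1}Z'\Omega Z)[/tex], with [tex]\Omega=\mathrm{diag}(\omega_j)[/tex], truncates it at zero, and then runs weighted least squares. Borjas weights are not implemented here.

```python
import numpy as np


def fgls_edv(Z, a_hat, omega):
    """Second-step weighted regression of estimated coefficients a_hat on Z.

    omega holds the first-step sampling variances of a_hat (hypothetical helper).
    Returns the WLS coefficients and the estimated residual variance sigma2.
    """
    J, k = Z.shape
    # Unweighted OLS first, to get residuals e.
    b_ols, *_ = np.linalg.lstsq(Z, a_hat, rcond=None)
    e = a_hat - Z @ b_ols
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    # (Z.T * omega) scales column j of Z' by omega_j, giving Z' Omega.
    correction = np.trace(ZtZ_inv @ (Z.T * omega) @ Z)
    sigma2 = (e @ e - omega.sum() + correction) / (J - k)
    sigma2 = max(sigma2, 0.0)  # a variance cannot be negative
    w = 1.0 / (sigma2 + omega)
    # WLS by rescaling each row by sqrt(w_j).
    sw = np.sqrt(w)
    b_wls, *_ = np.linalg.lstsq(Z * sw[:, None], a_hat * sw, rcond=None)
    return b_wls, sigma2


# Usage on hypothetical simulated inputs.
rng = np.random.default_rng(4)
J = 2000
z = rng.normal(size=J)
Z = np.column_stack([np.ones(J), z])
omega = rng.uniform(0.1, 2.0, size=J)   # hypothetical sampling variances
sigma2_true = 0.5
a_hat = 1.0 + 0.5 * z + rng.normal(0.0, np.sqrt(sigma2_true + omega))
b, s2 = fgls_edv(Z, a_hat, omega)
print(b, s2)  # b should be close to [1.0, 0.5] and s2 close to 0.5
```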

[edit 3/27/2007: fixed links]

Stata ado file

Stata example using simulated data


Total effects

Question for the stats experts:

There are three equations with the same set of independent variables, which we collect in the matrix [tex]X[/tex]:
[tex] risk1=X\gamma+e_1 [/tex]

[tex] risk2=X\zeta+e_2 [/tex]

[tex] cc=X\beta+\delta_1 risk1 + \delta_2 risk2 + e_3 [/tex]

We can rewrite cc as:

[tex] cc=X\beta+\delta_1 (X\gamma+e_1) + \delta_2 (X\zeta+e_2) + e_3 [/tex]

[tex] =X\beta+X(\delta_1\gamma) + X(\delta_2\zeta) +e_1\delta_1+e_2\delta_2+ e_3 [/tex]

Assume that the covariances between [tex]e_1,e_2[/tex] and [tex]e_3[/tex] are all zero and let [tex]r=e_1\delta_1+e_2\delta_2+ e_3[/tex]. Then we have the following equation

[tex] cc=X(\beta+\delta_1\gamma+\delta_2\zeta)+r [/tex]

Can this be estimated by least squares? Does it yield an estimate of the “total effect” as referred to in the path analysis/structural equation modelling literature?
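One way to probe the question is a quick simulation. The sketch below assumes, beyond the stated zero covariances among [tex]e_1[/tex], [tex]e_2[/tex] and [tex]e_3[/tex], that the errors are independent of [tex]X[/tex]; the sample size, the columns of [tex]X[/tex] and all parameter values are hypothetical. It fits the reduced-form regression of cc on [tex]X[/tex] by least squares and compares it with the product-of-coefficients quantity [tex]\beta+\delta_1\gamma+\delta_2\zeta[/tex] built from the three equations fitted separately.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 10_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical true parameters.
gamma = np.array([0.5, 1.0, -0.5])
zeta = np.array([-1.0, 0.3, 0.8])
beta = np.array([2.0, -0.4, 0.6])
delta1, delta2 = 0.7, -1.2

risk1 = X @ gamma + rng.normal(size=n)
risk2 = X @ zeta + rng.normal(size=n)
cc = X @ beta + delta1 * risk1 + delta2 * risk2 + rng.normal(size=n)


def ols(A, b):
    """Least squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Reduced form: regress cc on X alone.
total_reduced = ols(X, cc)

# Structural route: fit each equation, then combine the coefficients.
g_hat = ols(X, risk1)
z_hat = ols(X, risk2)
coef = ols(np.column_stack([X, risk1, risk2]), cc)
b_hat, d1_hat, d2_hat = coef[:k], coef[k], coef[k + 1]
total_structural = b_hat + d1_hat * g_hat + d2_hat * z_hat

total_true = beta + delta1 * gamma + delta2 * zeta
print(total_reduced)
print(total_structural)
print(total_true)
```

With the same [tex]X[/tex] in every equation, the two estimated routes agree exactly in-sample (up to rounding), because the residual of the third OLS regression is orthogonal to [tex]X[/tex].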
