Example 3: Weighted regression with subsets

Selecting the by option in zelig() partitions the data frame and then automatically loops the specified model through each partition. Suppose that mydata is a data frame with variables y, x1, x2, x3, and state, with state a factor variable with 50 unique values. Let's say that you would like to run a weighted regression where each observation is weighted by the inverse of the standard error on x1, estimated for that observation's state. In other words, we need to first estimate the model for each of the 50 states, calculate 1 / SE(x1_i) for each state i, and then assign these weights to each observation in mydata.

Estimate the model separately for each state using the by option in zelig():
```
z.out <- zelig(y ~ x1 + x2 + x3, by = "state", data = mydata, model = "ls")
```
Now z.out is a list of 50 regression outputs.

Extract the standard error on x1 for each of the state level regressions.

```
se <- numeric(50)                      # Initialize the vector of standard errors.
for (i in 1:50) {                      # vcov() returns the variance-covariance matrix.
  se[i] <- sqrt(vcov(z.out[[i]])[2,2]) # Since we have an intercept, the 2nd
}                                      # diagonal value corresponds to x1.
```
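
The same extraction step can be sketched without an explicit loop. The example below is only an illustration under stated assumptions: it uses base R's lm() and split() in place of zelig(..., by = "state"), and the data frame toy is hypothetical simulated data with three states rather than 50. It looks up the x1 entry of the variance matrix by name instead of by position, which avoids relying on the order of terms in the formula.

```r
# Hypothetical toy data (not from the original example): 3 states, 20 rows each.
set.seed(1)
toy <- data.frame(
  state = rep(1:3, each = 20),
  x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60)
)
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 0.5 * toy$x3 + rnorm(60)

# Fit one regression per state. split() returns the groups in sorted
# order of the state values, so fits[[1]] belongs to state 1, and so on.
fits <- lapply(split(toy, toy$state),
               function(d) lm(y ~ x1 + x2 + x3, data = d))

# Standard error of the x1 coefficient in each fit, selected by name.
se <- sapply(fits, function(f) sqrt(vcov(f)["x1", "x1"]))
```

Here se is a named numeric vector with one element per state, so 1 / se gives the state-level weights directly.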

Create the vector of weights.
```
wts <- 1 / se
```
This vector wts has 50 values that correspond to the 50 sets of state-level regression output in z.out.
To assign the vector of weights to each observation, we need to match each observation's state designation to the appropriate state. For simplicity, assume that the states are numbered 1 through 50.
```
mydata$w <- NA            # Initialize the empty variable
for (i in 1:50) { 
  mydata$w[mydata$state == i] <- wts[i]
}
```
We use mydata$state as the index (inside the square brackets) to assign values to mydata$w. Thus, whenever state equals 5 for an observation, the loop assigns the fifth value in the vector wts to the variable w in mydata. If we had 500 observations in mydata, we could use this method to match each of the 500 observations to the appropriate wts.
If the states are character strings instead of integers, we can use a slightly more complex version. This version assumes that the elements of wts are in the same order as the sorted state labels:
```
mydata$w <- NA
idx <- sort(unique(mydata$state))
for (i in seq_along(idx)) {
  mydata$w[mydata$state == idx[i]] <- wts[i]
}
```
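
As a possible alternative to the loops above, base R's match() can assign all the weights in one step, and it works the same way for integer and character state labels. This is a minimal sketch on hypothetical toy vectors (state, idx, and wts here are made up for illustration), and it relies on the same assumption as the loop: the weights are stored in the order of the sorted state labels.

```r
# Hypothetical toy inputs: one state label per observation.
state <- c("TX", "CA", "NY", "CA", "TX")
idx <- sort(unique(state))     # "CA" "NY" "TX"
wts <- c(0.5, 2.0, 4.0)        # assumed to be in the same order as idx

# match() returns, for each observation, the position of its state in idx;
# indexing wts by those positions expands the weights to observation level.
w <- wts[match(state, idx)]
```

With a data frame, the last line would become mydata$w <- wts[match(mydata$state, idx)].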

Now we can run our weighted regression:

```
z.wtd <- zelig(y ~ x1 + x2 + x3, weights = w, data = mydata,
               model = "ls")
```
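
To make concrete what a weighted least squares fit does with such weights, the sketch below uses base R's lm(), which minimizes the weighted sum of squared residuals when given a weights argument. Whether zelig() handles weights in exactly the same way is an assumption here, not something this example establishes. The data frame d is hypothetical simulated data.

```r
# Hypothetical toy data with an observation-level weight w.
set.seed(2)
d <- data.frame(x1 = rnorm(30), w = runif(30, 0.5, 2))
d$y <- 1 + 2 * d$x1 + rnorm(30)

# Weighted fit: minimizes sum(w * residual^2).
fit_w <- lm(y ~ x1, data = d, weights = w)

# Equivalent unweighted fit: multiply the response, the intercept column,
# and x1 by sqrt(w), and suppress the automatic intercept with "0 +".
fit_s <- lm(I(sqrt(w) * y) ~ 0 + sqrt(w) + I(sqrt(w) * x1), data = d)
```

The two fits return the same coefficient estimates, which shows that observations with larger weights (here, states with smaller standard errors on x1) pull the fitted line more strongly.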

Gary King 2011-11-29