Gary King Homepage Previous: Quick Overview Up: Conducting Analyses after Matching Next: Reference Manual

Examples

We now give four examples using the Lalonde data. They are meant to be read sequentially. You can run these example commands by typing demo(analysis). Although we use the linear least squares model in these examples, a wide range of other models are available in Zelig (for the list of supported models, see http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html.

To load the Zelig package after installing it, type

> library(Zelig)

Model-Based Estimates
In our first example, we conduct a standard parametric analysis and compute quantities of interest in the most common way. We begin with nearest neighbor matching with a logistic regression-based propensity score, discarding control units outside the convex hull of the treated units (King and Zeng, 2007; King and Zeng, 2006):
> m.out <- matchit(treat ~ age + educ + black + hispan + nodegree + 
                    married + re74 + re75, method = "nearest", discard
                    = "hull.control", data = lalonde)
Then we check balance using the summary and plot procedures (which we don't show here). When the best balance is achieved, we run the parametric analysis:
> z.out <- zelig(re78 ~ treat + age + educ + black + hispan + nodegree + 
                  married + re74 + re75, data = match.data(m.out), 
                  model = "ls")
and then set the explanatory variables at their means (the default) and change the treatment variable from a 0 to a 1:
> x.out <- setx(z.out, treat=0)
> x1.out <- setx(z.out, treat=1)
and finally compute the result and examine a summary:
> s.out <- sim(z.out, x = x.out, x1 = x1.out)
> summary(s.out)

Average Treatment Effect on the Treated
We illustrate now how to estimate the average treatment effect on the treated in a way that is quite robust. We do this by estimating the coefficients in the control group alone.

We begin by conducting nearest neighbor matching with a logistic regression-based propensity score:

> m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + 
                    married + re74 + re75, method = "nearest", data = lalonde)
Then we check balance using the summary and plot procedures (which we don't show here). We reestimate the matching procedure until we achieve the best balance possible. (The running examples here are meant merely to illustrate, not to suggest that we've achieved the best balance.) Then we go to Zelig, and in this case choose to fit a linear least squares model to the control group only:
> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree + 
                  married + re74 + re75, data = match.data(m.out1, "control"), 
                  model = "ls")
where the "control" option in match.data() extracts only the matched control units and ls specifies least squares regression. In a smaller data set, this example should probably be changed to include all the data in this estimation (using data = match.data(m.out1) for the data) and by including the treatment indicator (which is excluded in the example since its a constant in the control group.) Next, we use the coefficients estimated in this way from the control group, and combine them with the values of the covariates set to the values of the treated units. We do this by choosing conditional prediction (which means use the observed values) in setx(). The sim() command does the imputation.
> x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE)
> s.out1 <- sim(z.out1, x = x.out1)
Finally, we obtain a summary of the results by
> summary(s.out1)

Average Treatment Effect (Overall)
To estimate the average treatment effect, we continue with the previous example and fit the linear model to the treatment group:
> z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree + 
                  married + re74 + re75, data = match.data(m.out1, "treat"), 
                  model = "ls")
We then conduct the same simulation procedure in order to impute the counterfactual outcome for the control group,
> x.out2 <- setx(z.out2, data = match.data(m.out1, "control"), cond = TRUE)
> s.out2 <- sim(z.out2, x = x.out2)
In this calculation, Zelig is computing the difference between observed and the expected values. This means that the treatment effect for the control units is the effect of control (observed control outcome minus the imputed outcome under treatment from the model). Hence, to combine treatment effects just reverse the signs of the estimated treatment effect of controls.
> ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)
The point estimate, its standard error, and the $ 95\%$ confidence interval is given by
> mean(ate.all)
> sd(ate.all)
> quantile(ate.all, c(0.025, 0.975))

Subclassification
In subclassification, the average treatment effect estimates are obtained separately for each subclass, and then aggregated for an overall estimate. Estimating the treatment effects separately for each subclass, and then aggregating across subclasses, can increase the robustness of the ultimate results since the parametric analysis within each subclass requires only local rather than global assumptions. However, fewer observations are obviously available within each subclass, and so this option is normally chosen for larger data sets.

We begin this example by conducting subclassification with four subclasses,

> m.out2 <- matchit(treat ~ age + educ + black + hispan + nodegree + 
                    married + re74 + re75, data = lalonde, 
                    method = "subclass", subclass = 4)
When balance is as good as we can get it, we then fit a linear regression within each subclass by controlling for the estimated propensity score (called distance) and other covariates. In most software, this would involve running four separate regressions and then combining the results. In Zelig, however, all we need to do is to use the by option:
> z.out3 <- zelig(re78 ~ re74 + re75 + distance, 
                  data = match.data(m.out2, "control"), 
                  model = "ls", by = "subclass")
The same set of commands as in the first example are used to do the imputation of the counterfactual outcomes for the treated units:
> x.out3 <- setx(z.out3, data = match.data(m.out2, "treat"), fn = NULL, 
                 cond = TRUE)
> s.out3 <- sim(z.out3, x = x.out3)
> summary(s.out3)
It is also possible to get the summary result for each subclass. For example, the following command summarizes the result for the second subclass.
> summary(s.out3, subset = 2)

How Adjustment After Exact Matching Has No Effect
Regression adjustment after exact one-to-one exact matching gives the identical answer as a simple, unadjusted difference in means. General exact matching, as implemented in MatchIt, allows one-to-many matches, so to see the same result we must weight when adjusting. In other words: weighted regression adjustment after general exact matching gives the identical answer as a simple, unadjusted weighted difference in means. For example:

> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde, 
                   method = "exact")
> m.data <- match.data(m.out)
> ## weighted diff in means 
> weighted.mean(mdata$re78[mdata$treat == 1], mdata$weights[mdata$treat==1]) 
        - weighted.mean(mdata$re78[mdata$treat==0], mdata$weights[mdata$treat==0])
[1] 807
> ## weighted least squares without covariates
> zelig(re78 ~ treat, data = m.data, model = "ls", weights = "weights")

Call:
zelig(formula = re78 ~ treat, model = "ls", data = m.data, weights =
"weights")

Coefficients:
(Intercept)        treat  
       5524          807 

> ## weighted least squares with covariates
> zelig(re78 ~ treat + black + hispan + educ, data = m.data, model = "ls", 
        weights = "weights")

Call:
zelig(formula = re78 ~ treat + black + hispan + educ, model = "ls",
data = m.data, weights = "weights")

Coefficients:
(Intercept)        treat        black       hispan         educ  
        314          807        -1882          258          657



Gary King 2010-12-11