Previous: Quick Overview | Up: Conducting Analyses after Matching | Next: Reference Manual |
We now give four examples using the Lalonde data. They are meant to be read sequentially. You can run these example commands by typing demo(analysis). Although we use the linear least squares model in these examples, a wide range of other models are available in Zelig (for the list of supported models, see http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html.
To load the Zelig package after installing it, type
> library(Zelig)
> m.out <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", discard = "hull.control", data = lalonde)Then we check balance using the summary and plot procedures (which we don't show here). When the best balance is achieved, we run the parametric analysis:
> z.out <- zelig(re78 ~ treat + age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out), model = "ls")and then set the explanatory variables at their means (the default) and change the treatment variable from a 0 to a 1:
> x.out <- setx(z.out, treat=0) > x1.out <- setx(z.out, treat=1)and finally compute the result and examine a summary:
> s.out <- sim(z.out, x = x.out, x1 = x1.out) > summary(s.out)
We begin by conducting nearest neighbor matching with a logistic regression-based propensity score:
> m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", data = lalonde)Then we check balance using the summary and plot procedures (which we don't show here). We reestimate the matching procedure until we achieve the best balance possible. (The running examples here are meant merely to illustrate, not to suggest that we've achieved the best balance.) Then we go to Zelig, and in this case choose to fit a linear least squares model to the control group only:
> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "control"), model = "ls")where the "control" option in match.data() extracts only the matched control units and ls specifies least squares regression. In a smaller data set, this example should probably be changed to include all the data in this estimation (using data = match.data(m.out1) for the data) and by including the treatment indicator (which is excluded in the example since its a constant in the control group.) Next, we use the coefficients estimated in this way from the control group, and combine them with the values of the covariates set to the values of the treated units. We do this by choosing conditional prediction (which means use the observed values) in setx(). The sim() command does the imputation.
> x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE) > s.out1 <- sim(z.out1, x = x.out1)Finally, we obtain a summary of the results by
> summary(s.out1)
> z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "treat"), model = "ls")We then conduct the same simulation procedure in order to impute the counterfactual outcome for the control group,
> x.out2 <- setx(z.out2, data = match.data(m.out1, "control"), cond = TRUE) > s.out2 <- sim(z.out2, x = x.out2)In this calculation, Zelig is computing the difference between observed and the expected values. This means that the treatment effect for the control units is the effect of control (observed control outcome minus the imputed outcome under treatment from the model). Hence, to combine treatment effects just reverse the signs of the estimated treatment effect of controls.
> ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)The point estimate, its standard error, and the confidence interval is given by
> mean(ate.all) > sd(ate.all) > quantile(ate.all, c(0.025, 0.975))
We begin this example by conducting subclassification with four subclasses,
> m.out2 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = lalonde, method = "subclass", subclass = 4)When balance is as good as we can get it, we then fit a linear regression within each subclass by controlling for the estimated propensity score (called distance) and other covariates. In most software, this would involve running four separate regressions and then combining the results. In Zelig, however, all we need to do is to use the by option:
> z.out3 <- zelig(re78 ~ re74 + re75 + distance, data = match.data(m.out2, "control"), model = "ls", by = "subclass")The same set of commands as in the first example are used to do the imputation of the counterfactual outcomes for the treated units:
> x.out3 <- setx(z.out3, data = match.data(m.out2, "treat"), fn = NULL, cond = TRUE) > s.out3 <- sim(z.out3, x = x.out3) > summary(s.out3)It is also possible to get the summary result for each subclass. For example, the following command summarizes the result for the second subclass.
> summary(s.out3, subset = 2)
> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde, method = "exact") > m.data <- match.data(m.out) > ## weighted diff in means > weighted.mean(mdata$re78[mdata$treat == 1], mdata$weights[mdata$treat==1]) - weighted.mean(mdata$re78[mdata$treat==0], mdata$weights[mdata$treat==0]) [1] 807 > ## weighted least squares without covariates > zelig(re78 ~ treat, data = m.data, model = "ls", weights = "weights") Call: zelig(formula = re78 ~ treat, model = "ls", data = m.data, weights = "weights") Coefficients: (Intercept) treat 5524 807 > ## weighted least squares with covariates > zelig(re78 ~ treat + black + hispan + educ, data = m.data, model = "ls", weights = "weights") Call: zelig(formula = re78 ~ treat + black + hispan + educ, model = "ls", data = m.data, weights = "weights") Coefficients: (Intercept) treat black hispan educ 314 807 -1882 258 657