We now give four examples using the Lalonde data. They are meant to
be read sequentially. You can run these example commands by typing
`demo(analysis)`. Although we use the linear least squares model
in these examples, a wide range of other models are available in Zelig
(for the list of supported models, see
`http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html`.

To load the Zelig package after installing it, type

> library(Zelig)

**Model-Based Estimates**- In our first example, we conduct a
standard parametric analysis and compute quantities of interest in
the most common way. We begin with nearest neighbor matching with a
logistic regression-based propensity score, discarding control units
outside the convex hull of the treated units
(, ,):
> m.out <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", discard = "hull.control", data = lalonde)

Then we check balance using the summary and plot procedures (which we don't show here). When the best balance is achieved, we run the parametric analysis:> z.out <- zelig(re78 ~ treat + age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out), model = "ls")

and then set the explanatory variables at their means (the default) and change the treatment variable from a 0 to a 1:> x.out <- setx(z.out, treat=0) > x1.out <- setx(z.out, treat=1)

and finally compute the result and examine a summary:> s.out <- sim(z.out, x = x.out, x1 = x1.out) > summary(s.out)

**Average Treatment Effect on the Treated**- We illustrate now how
to estimate the average treatment effect on the treated in a way
that is quite robust. We do this by estimating the coefficients in
the control group alone.
We begin by conducting nearest neighbor matching with a logistic regression-based propensity score:

> m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", data = lalonde)

Then we check balance using the summary and plot procedures (which we don't show here). We reestimate the matching procedure until we achieve the best balance possible. (The running examples here are meant merely to illustrate, not to suggest that we've achieved the best balance.) Then we go to Zelig, and in this case choose to fit a linear least squares model to the control group only:> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "control"), model = "ls")

where the`"control"`option in`match.data()`extracts only the matched control units and`ls`specifies least squares regression. In a smaller data set, this example should probably be changed to include all the data in this estimation (using`data = match.data(m.out1)`for the data) and by including the treatment indicator (which is excluded in the example since its a constant in the control group.) Next, we use the coefficients estimated in this way from the control group, and combine them with the values of the covariates set to the values of the treated units. We do this by choosing conditional prediction (which means use the observed values) in`setx()`. The`sim()`command does the imputation.> x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE) > s.out1 <- sim(z.out1, x = x.out1)

Finally, we obtain a summary of the results by> summary(s.out1)

**Average Treatment Effect (Overall)**- To estimate the average
treatment effect, we continue with the previous example and fit the
linear model to the
*treatment group*:> z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "treat"), model = "ls")

We then conduct the same simulation procedure in order to impute the counterfactual outcome for the*control group*,> x.out2 <- setx(z.out2, data = match.data(m.out1, "control"), cond = TRUE) > s.out2 <- sim(z.out2, x = x.out2)

In this calculation, Zelig is computing the difference between observed and the expected values. This means that the treatment effect for the control units is the effect of control (observed control outcome minus the imputed outcome under treatment from the model). Hence, to combine treatment effects just reverse the signs of the estimated treatment effect of controls.> ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)

The point estimate, its standard error, and the confidence interval is given by> mean(ate.all) > sd(ate.all) > quantile(ate.all, c(0.025, 0.975))

**Subclassification**- In subclassification, the average treatment
effect estimates are obtained separately for each subclass, and then
aggregated for an overall estimate. Estimating the treatment
effects separately for each subclass, and then aggregating across
subclasses, can increase the robustness of the ultimate results
since the parametric analysis within each subclass requires only
local rather than global assumptions. However, fewer observations
are obviously available within each subclass, and so this option is
normally chosen for larger data sets.
We begin this example by conducting subclassification with four subclasses,

> m.out2 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = lalonde, method = "subclass", subclass = 4)

When balance is as good as we can get it, we then fit a linear regression within each subclass by controlling for the estimated propensity score (called`distance`) and other covariates. In most software, this would involve running four separate regressions and then combining the results. In Zelig, however, all we need to do is to use the`by`option:> z.out3 <- zelig(re78 ~ re74 + re75 + distance, data = match.data(m.out2, "control"), model = "ls", by = "subclass")

The same set of commands as in the first example are used to do the imputation of the counterfactual outcomes for the treated units:> x.out3 <- setx(z.out3, data = match.data(m.out2, "treat"), fn = NULL, cond = TRUE) > s.out3 <- sim(z.out3, x = x.out3) > summary(s.out3)

It is also possible to get the summary result for each subclass. For example, the following command summarizes the result for the second subclass.> summary(s.out3, subset = 2)

**How Adjustment After Exact Matching Has No Effect**- Regression
adjustment after exact one-to-one exact matching gives the identical
answer as a simple, unadjusted difference in means. General exact
matching, as implemented in MatchIt, allows one-to-many matches, so to see
the same result we must weight when adjusting. In other words: weighted
regression adjustment after general exact matching gives the identical
answer as a simple, unadjusted weighted difference in means. For
example:
> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde, method = "exact") > m.data <- match.data(m.out) > ## weighted diff in means > weighted.mean(mdata$re78[mdata$treat == 1], mdata$weights[mdata$treat==1]) - weighted.mean(mdata$re78[mdata$treat==0], mdata$weights[mdata$treat==0]) [1] 807 > ## weighted least squares without covariates > zelig(re78 ~ treat, data = m.data, model = "ls", weights = "weights") Call: zelig(formula = re78 ~ treat, model = "ls", data = m.data, weights = "weights") Coefficients: (Intercept) treat 5524 807 > ## weighted least squares with covariates > zelig(re78 ~ treat + black + hispan + educ, data = m.data, model = "ls", weights = "weights") Call: zelig(formula = re78 ~ treat + black + hispan + educ, model = "ls", data = m.data, weights = "weights") Coefficients: (Intercept) treat black hispan educ 314 807 -1882 258 657