Next: Checking Balance Up: Statistical Overview Previous: Statistical Overview   Contents

Preprocessing via Matching

If $t_i$ and $X_i$ were independent, we would not need to control for $X_i$, and any parametric analysis would effectively reduce to a difference in means of $Y$ between the treated and control groups. The goal of matching is to preprocess the data before the parametric analysis so that the actual relationship between $t_i$ and $X_i$ is eliminated or reduced, without introducing bias or increasing inefficiency too much.

When matching, we select, duplicate, or selectively drop observations from our data, and we do so without inducing bias as long as the rule we use is a function only of $t_i$ and $X_i$ and does not depend on the outcome variable $Y_i$. MATCHIT includes many methods that offer this preprocessing, among them exact, subclassification, nearest neighbor, optimal, and genetic matching. For many of these methods the propensity score, defined as the probability of receiving the treatment given the covariates, is a key tool. To avoid changing the quantity of interest, most MATCHIT routines retain all treated units and select (or weight) control units to include in the final data set; this enables one to estimate the average treatment effect on the treated (the purpose of which is described in Section [*]).
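The selection rule described above can be sketched in a few lines. The following is a minimal illustration in plain Python, not MATCHIT's implementation: the function name `nearest_neighbor_match` and the toy data are hypothetical, and `score` stands in for an already-estimated propensity score (or any scalar summary of $X_i$). The rule consults only the treatment indicator and the score, never the outcome, and it keeps every treated unit, assuming there are at least as many controls as treated units.

```python
def nearest_neighbor_match(t, score):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    t     : list of 0/1 treatment indicators
    score : list of scalar scores (assumed here to be estimated
            propensity scores; any function of X would do)
    Returns a dict mapping each treated index to its matched control index.
    The outcome Y is never an input, so the rule cannot depend on it.
    """
    treated = [i for i, ti in enumerate(t) if ti == 1]
    available = [i for i, ti in enumerate(t) if ti == 0]
    matches = {}
    for i in treated:
        if not available:
            break  # ran out of controls; remaining treated units stay unmatched
        # Closest unused control on the score.
        j = min(available, key=lambda c: abs(score[c] - score[i]))
        matches[i] = j
        available.remove(j)
    return matches


# Hypothetical data: two treated units, four candidate controls.
t = [1, 1, 0, 0, 0, 0]
score = [0.80, 0.35, 0.30, 0.75, 0.10, 0.50]

matches = nearest_neighbor_match(t, score)
# Indices retained in the matched data set: all treated plus selected controls.
kept = sorted(matches) + sorted(matches.values())
```

Greedy matching depends on the order in which treated units are processed; it is used here only because it is the shortest rule to write down.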

MATCHIT implements and evaluates the choice of the rules for matching. Matching sometimes increases efficiency by eliminating heterogeneity or by deleting observations outside the region where a model can reasonably be used to extrapolate. One needs to be careful, however, not to lose too many observations in matching, or the loss in efficiency will outweigh the reduction in bias that is achieved.

The simplest way to obtain good matches (as defined above) is to use one-to-one exact matching, which pairs each treated unit with one control unit for which the values of $X_i$ are identical. However, with many covariates and finite numbers of potential matches, sufficient exact matches often cannot be found. Indeed, many of the other methods implemented in MATCHIT attempt to balance the overall covariate distributions as much as possible when sufficient one-to-one exact matches are not available.
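One-to-one exact matching as described above can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not MATCHIT's code: the function name `exact_match_one_to_one` and the covariate values are hypothetical, and covariates are assumed to be discrete so that equality is meaningful.

```python
from collections import defaultdict


def exact_match_one_to_one(t, X):
    """Pair each treated unit with one unused control whose covariates are identical.

    t : list of 0/1 treatment indicators
    X : list of covariate tuples, one per unit
    Returns (pairs, unmatched_treated), where pairs is a list of
    (treated_index, control_index). The outcome Y is never consulted.
    """
    # Index the control units by their full covariate profile.
    controls_by_x = defaultdict(list)
    for i, (ti, xi) in enumerate(zip(t, X)):
        if ti == 0:
            controls_by_x[tuple(xi)].append(i)

    pairs, unmatched = [], []
    for i, (ti, xi) in enumerate(zip(t, X)):
        if ti != 1:
            continue
        pool = controls_by_x[tuple(xi)]
        if pool:
            pairs.append((i, pool.pop(0)))  # use each control at most once
        else:
            unmatched.append(i)  # no control shares this covariate profile
    return pairs, unmatched


# Hypothetical data: (age, sex) for three treated and four control units.
t = [1, 1, 1, 0, 0, 0, 0]
X = [(30, "f"), (45, "m"), (52, "f"), (30, "f"), (45, "m"), (45, "m"), (60, "m")]

pairs, unmatched = exact_match_one_to_one(t, X)
```

In this toy data the third treated unit has no control with the same covariate values and so goes unmatched, which is the difficulty the text describes: each added covariate makes identical profiles rarer.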

A key point in () is that matching methods by themselves are not methods of estimation: every use of matching in the literature involves an analysis step following the matching procedure, but almost all analyses use a simple difference in means. This procedure is appropriate only if exact matching was conducted. In almost all other cases some adjustment is required, and there is no reason to degrade your inferences by using an inferior method of analysis, such as a difference in means, even while improving them via preprocessing. Thus, with MATCHIT, you can improve your analyses in two ways. MATCHIT analyses are "doubly robust" in that if either the matching analysis or the analysis model is correct (but not necessarily both) your inferences will be statistically consistent. In practice, the modeling choices you make at the analysis stage will be much less consequential if you match first.
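The contrast between a raw difference in means and an adjusted analysis on matched data can be illustrated with a small sketch. It assumes a single covariate and a linear analysis model, uses only the Python standard library, and is not how MATCHIT or its companion tools perform the analysis step; the function names and the data are hypothetical. The outcome is constructed as $Y = 2t + X$, so the true treatment effect is 2, and the matched sample is left slightly imbalanced on $X$, as happens when matching is not exact.

```python
from statistics import mean


def diff_in_means(y, t):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return mean(treated) - mean(control)


def residualize(v, x):
    """Residuals from a least-squares regression of v on x with an intercept."""
    vbar, xbar = mean(v), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (vi - vbar) for xi, vi in zip(x, v)) / sxx
    return [(vi - vbar) - slope * (xi - xbar) for xi, vi in zip(x, v)]


def adjusted_effect(y, t, x):
    """Coefficient on t in the linear regression y ~ 1 + t + x.

    Computed by partialling x out of both y and t and regressing the
    residuals on each other, which gives the same coefficient as the
    full regression (Frisch-Waugh-Lovell theorem).
    """
    ry, rt = residualize(y, x), residualize(t, x)
    return sum(a * b for a, b in zip(rt, ry)) / sum(b * b for b in rt)


# Hypothetical matched sample with residual imbalance in x
# (treated mean of x is 5, control mean is 13/3).
t = [1, 1, 1, 0, 0, 0]
x = [3, 5, 7, 2, 5, 6]
y = [2 * ti + xi for ti, xi in zip(t, x)]  # true effect is 2 by construction

naive = diff_in_means(y, t)
adjusted = adjusted_effect(y, t, x)
```

Here the difference in means absorbs the leftover imbalance in $X$ and comes out at about 2.67, while the adjusted estimate recovers the constructed effect of 2. With exact matching the two would coincide, which is why the simple difference in means is appropriate only in that case.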


RBuild autobuild user 2011-10-24