Next: User's Guide to MATCHIT Up: Statistical Overview Previous: Checking Balance   Contents


Conducting Analyses after Matching

Without matching, the most common way to compute quantities of interest from a parametric analysis is to (statistically) hold some explanatory variables constant, change others, and compute predicted or expected values, taking their difference or ratio, all by using the parametric functional form. For causal inference, this means examining the effect on the expected value of the outcome variable of changing $T$ from 0 to 1 while holding the pretreatment control variables $X$ constant at their means or medians. This, and indeed any other appropriate analysis procedure, is a perfectly reasonable way to proceed with analysis after matching. If it is the chosen way to proceed, then either treated or control units may be deleted during the matching stage, since the same parametric structure is assumed to apply to all observations.
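As a worked illustration of this quantity, the difference version can be written as follows. The notation $\bar{X}$ (the vector of means or medians of the pretreatment covariates) and the linear specification with coefficients $\alpha$, $\beta$, $\gamma$ are assumptions introduced here for illustration only:

$$
E(Y \mid T=1, X=\bar{X}) - E(Y \mid T=0, X=\bar{X})
$$

Under the assumed linear model $E(Y \mid T, X) = \alpha + \beta T + X\gamma$, this difference reduces to $\beta$, whatever value $X$ is held at. In a nonlinear model such as logit, the difference generally depends on the value at which $X$ is held.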

In other instances, researchers wish to reduce the assumptions inherent in their statistical model and so want to allow for the possibility that the treatment effect varies over observations. In this situation, one popular quantity of interest is the average treatment effect on the treated (ATT). For the treated group, the potential outcomes under control, $Y_i(0)$, are missing, whereas the outcomes under treatment, $Y_i(1)$, are observed, and the goal of the analysis is to impute the missing outcomes $Y_i(0)$ for observations with $T_i=1$. We do this via simulation using a parametric statistical model such as regression, logit, or others (as described below). Once those potential outcomes are imputed from the model, the estimate of individual $i$'s treatment effect is $Y_i(1)-\widehat{Y}_i(0)$, where $\widehat{Y}_i(0)$ is a predicted value of the dependent variable for unit $i$ under the counterfactual condition where $T_i=0$. The in-sample average treatment effect for the treated individuals can then be obtained by averaging this difference over all observations $i$ for which in fact $T_i=1$. Most MATCHIT algorithms retain all treated units and choose a subset of the control units, possibly with some units repeated, so that estimating the ATT is straightforward. If one chooses options that allow matching with replacement, or any solution that has different numbers of controls (or treated units) within each subclass or stratum (such as full matching), then the parametric analysis following matching must accommodate these procedures, for example by using fixed effects or weights, as appropriate. (Similar procedures can also be used to estimate various other quantities of interest, such as the average treatment effect, by computing it for all observations, but then one must be aware that the quantity of interest may change during the matching procedure as some control units may be dropped.)
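Written out, the averaging step described above gives the following estimator. The symbol $n_1$ (the number of matched treated units) and the name $\widehat{\mathrm{ATT}}$ are notation assumed here, not taken from the text:

$$
\widehat{\mathrm{ATT}} = \frac{1}{n_1} \sum_{i:\, T_i=1} \left( Y_i(1) - \widehat{Y}_i(0) \right),
\qquad n_1 = \sum_i T_i .
$$

Each term uses the observed outcome $Y_i(1)$ of a treated unit and its model-based imputation $\widehat{Y}_i(0)$, so only the counterfactual half of each difference depends on the parametric model.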

The imputation from the model can be done in at least two ways. Recall that the model is used to impute the value that the outcome variable would take among the treated units if those treated units were actually controls. Thus, one reasonable approach would be to fit a model to the matched data and create simulated predicted values of the dependent variable for the treated units with $T_i$ switched counterfactually from 1 to 0. An alternative approach would be to fit a model without $T$ by using only the outcomes of the matched control units (i.e., using only observations where $T_i=0$). Then, given this fitted model, the missing outcomes $Y_i(0)$ are imputed for the matched treated units by using the values of the explanatory variables for the treated units. The first approach will usually have lower variance, since all observations are used, and the second may have less bias, since no assumption of constant parameters across the models of the potential outcomes under treatment and control is needed. See () for more details.
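The two imputation approaches described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not MATCHIT's own procedure: it assumes a linear outcome model with a single covariate, uses point predictions rather than simulated predicted values, and runs on made-up matched data. The function names `solve`, `ols`, `att_pooled`, and `att_controls_only` are hypothetical helpers introduced for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def att_pooled(t, x, y):
    """Approach 1: fit y ~ 1 + t + x on all matched units, then predict
    each treated unit's outcome with t switched counterfactually to 0."""
    b0, _bt, bx = ols([[1.0, ti, xi] for ti, xi in zip(t, x)], y)
    diffs = [yi - (b0 + bx * xi) for ti, xi, yi in zip(t, x, y) if ti == 1]
    return sum(diffs) / len(diffs)


def att_controls_only(t, x, y):
    """Approach 2: fit y ~ 1 + x on matched controls only, then impute
    Y_i(0) for treated units from their own covariate values."""
    ctrl = [(xi, yi) for ti, xi, yi in zip(t, x, y) if ti == 0]
    b0, bx = ols([[1.0, xi] for xi, _ in ctrl], [yi for _, yi in ctrl])
    diffs = [yi - (b0 + bx * xi) for ti, xi, yi in zip(t, x, y) if ti == 1]
    return sum(diffs) / len(diffs)


# Made-up matched sample (an assumption for illustration): the outcome is
# generated exactly as y = 1 + 2*x + 3*t, so the true effect is 3.
t = [1, 1, 1, 0, 0, 0]
x = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]
y = [1.0 + 2.0 * xi + 3.0 * ti for ti, xi in zip(t, x)]

print("pooled model ATT:       ", att_pooled(t, x, y))
print("controls-only model ATT:", att_controls_only(t, x, y))
```

Because the toy outcome is exactly linear with a constant effect, both approaches recover approximately the same ATT here. They would diverge when the relationship between the outcome and the covariates differs between treated and control units, which is the bias-variance trade-off the paragraph above describes.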

Other quantities of interest can also be computed at the parametric stage, following any procedures you would have followed in the absence of matching. The advantage of matching first is that, if the matching is done well, your answers will be more robust to many small changes in parametric specification.


RBuild autobuild user 2011-10-24