The summary() Command

The summary() command gives measures of the balance between the treated and control groups in the full (original) data set, and then in the matched data set. If the matching worked well, the measures of balance should be smaller in the matched data set (smaller values of the measures indicate better balance).

The summary() output for subclassification is the same as that for other types of matching, except that the balance statistics are shown separately for each subclass, and the overall balance in the matched samples is calculated by aggregating across the subclasses, where each subclass is weighted by the number of units in the subclass. For exact matching, the covariate values within each subclass are guaranteed to be the same, and so the measures of balance are not output for exact matching; only the sample sizes in each subclass are shown.
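As a rough illustration of the aggregation described above, the following base R sketch combines per-subclass mean differences into one overall figure, weighting each subclass by its number of units. The numbers and variable names are hypothetical, and this is not MatchIt's internal code.

```r
# Hypothetical per-subclass results for a single covariate.
subclass_mean_diff <- c(0.30, -0.10, 0.05)  # treated minus control, per subclass
subclass_n         <- c(40, 100, 60)        # number of units in each subclass

# Overall balance: average of the subclass statistics,
# with each subclass weighted by its size.
overall_mean_diff <- weighted.mean(subclass_mean_diff, w = subclass_n)
print(overall_mean_diff)
```

Larger subclasses dominate the aggregate, so a poorly balanced but small subclass can be hidden in the overall figure; the per-subclass output remains worth reading.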

Balance statistics: The statistics that the summary() command provides include means, the original control group standard deviation (where applicable), mean differences, standardized mean differences, and (median, mean, and maximum) Quantile-Quantile (Q-Q) plot differences. In addition, the summary() command will report (a) the matched call, (b) how many units were matched, unmatched, or discarded due to the discard option (described below), and (c) the percent improvement in balance for each of the balance measures, defined as $100(|a| - |b|)/|a|$, where $a$ is the balance before and $b$ is the balance after matching. For each set of units (original and matched data sets, with weights used as appropriate in the matched data sets), the following statistics are provided:
1. "Means Treated" and "Means Control" show the weighted means in the treated and control groups.
2. "SD Control" is the standard deviation calculated in the control group (where applicable).
3. "Mean Diff" is the difference in means between the groups.
4. The final three columns of the summary output give summary statistics of a Q-Q plot (see below for more information on these plots). Those columns give the median, mean, and maximum distance between the two empirical quantile functions (treated and control groups). Values greater than 0 indicate deviations between the groups in some part of the empirical distributions. The plots of the two empirical quantile functions themselves, described below, can provide further insight into which part of the covariate distribution has differences between the two groups.
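
The statistics listed above can be sketched in base R for a single covariate. This is a minimal illustration under stated assumptions, not MatchIt's implementation: the data are hypothetical, all units have weight 1, the Q-Q gaps are evaluated on a fixed grid of probabilities chosen here for convenience, and percent improvement is assumed to be 100(|a| - |b|)/|a|, with a the balance before matching and b the balance after. The function names `balance_stats` and `percent_improvement` are invented for this example.

```r
# Hypothetical toy values of one covariate (unit weights assumed throughout).
x_treated         <- c(2, 4, 6, 8, 10)
x_control_full    <- c(1, 1, 2, 3, 3, 4, 5, 12)  # all controls, before matching
x_control_matched <- c(2, 3, 5, 8, 10)           # controls kept after matching

balance_stats <- function(xt, xc, probs = seq(0.05, 0.95, by = 0.05)) {
  # Absolute gap between the two empirical quantile functions on a common grid.
  qq_gap <- unname(abs(quantile(xt, probs) - quantile(xc, probs)))
  c(means_treated = mean(xt),
    means_control = mean(xc),
    sd_control    = sd(xc),
    mean_diff     = mean(xt) - mean(xc),
    eqq_med       = median(qq_gap),
    eqq_mean      = mean(qq_gap),
    eqq_max       = max(qq_gap))
}

# Assumed definition: 100 * (|before| - |after|) / |before|.
percent_improvement <- function(before, after) {
  100 * (abs(before) - abs(after)) / abs(before)
}

full    <- balance_stats(x_treated, x_control_full)
matched <- balance_stats(x_treated, x_control_matched)
improvement <- percent_improvement(full[["mean_diff"]], matched[["mean_diff"]])

print(round(rbind(full, matched), 3))
print(improvement)
```

A positive percent improvement means the measure moved toward zero after matching; a negative value means balance on that measure got worse.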

Additional options: Three options to the summary() command can also help with assessing balance and respecifying the propensity score model, as necessary. First, the interactions = TRUE option with summary() shows the balance of all squares and interactions of the covariates used in the matching procedure. Large differences in higher-order interactions are usually a good indication that the propensity score model (the distance measure) needs to be respecified. Similarly, the addlvariables option with summary() will provide balance measures on additional variables not included in the original matching procedure. If a variable (or interaction of variables) not included in the original propensity score model has large imbalances in the matched groups, including that variable in the next model specification may improve the resulting balance on that variable. Because the outcome variable is not used in the matching procedure, a variety of matching methods can be tried, and the one that leads to the best resulting balance can be chosen. Finally, the standardize = TRUE option will print out standardized versions of the balance measures, where the mean difference is standardized (divided) by the standard deviation in the original treated group.
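
To make the ideas behind interactions = TRUE and standardize = TRUE concrete, here is a small base R sketch that computes standardized mean differences for two covariates, their squares, and their interaction. It is only an illustration of the arithmetic: the data and the helper `std_mean_diff` are hypothetical, no weights are used, and MatchIt's own computation may differ in detail.

```r
# Hypothetical toy data: treatment indicator and two covariates.
treat <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
age   <- c(30, 35, 40, 45, 20, 25, 30, 50, 55, 60)
educ  <- c(12, 14, 16, 12, 10, 12, 12, 16, 11, 9)

# Mean difference divided by the treated-group standard deviation.
# When evaluating a matched sample, the divisor should still come from
# the original (pre-matching) treated group, as the text describes.
std_mean_diff <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# Covariates plus their squares and pairwise interaction.
terms <- cbind(age = age, educ = educ,
               "age^2" = age^2, "educ^2" = educ^2,
               "age:educ" = age * educ)

smd <- apply(terms, 2, std_mean_diff, treat = treat)
print(round(smd, 3))
```

A term whose standardized difference stays large after matching is a candidate for inclusion in the next propensity score specification.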

RBuild autobuild user 2011-10-24