Additional Arguments for Specification of
Distance Measures
The following arguments specify distance measures that are used for
matching methods. These arguments apply to all matching methods except exact matching.
- distance: the method used to estimate the distance
measure (default = "logit", logistic regression) or a
numerical vector containing the user's own distance measure. Before using any
of these techniques, it is best to understand the theoretical
groundings of these techniques and to evaluate the results. Most of
these methods (such as logistic or probit regression) define the
distance by first estimating the propensity score, defined as the
probability of receiving treatment, conditional on the covariates.
Available methods include:
- "mahalanobis": the Mahalanobis distance measure.
- binomial generalized linear models with one of the following
link functions:
- "logit": logistic link
- "linear.logit": logistic link with linear propensity
score
- "probit": probit link
- "linear.probit": probit link with linear propensity
score
- "cloglog": complementary log-log link
- "linear.cloglog": complementary log-log link with linear
propensity score
- "log": log link
- "linear.log": log link with linear propensity score
- "cauchit": Cauchy CDF link
- "linear.cauchit": Cauchy CDF link with linear propensity
score
- generalized additive models with one of the following link
functions (see help(gam) for more options):
- "GAMlogit": logistic link
- "GAMlinear.logit": logistic link with linear propensity
score
- "GAMprobit": probit link
- "GAMlinear.probit": probit link with linear propensity
score
- "GAMcloglog": complementary log-log link
- "GAMlinear.cloglog": complementary log-log link with
linear propensity score
- "GAMlog": log link
- "GAMlinear.log": log link with linear propensity score
- "GAMcauchit": Cauchy CDF link
- "GAMlinear.cauchit": Cauchy CDF link with linear
propensity score
- "nnet": neural network model. See help(nnet) for
more options.
- "rpart": classification trees. See help(rpart)
for more options.
- distance.options: optional arguments for estimating the
distance measure. The input to this argument should be a list. For
example, if the distance measure is estimated with a logistic
regression, users can increase the maximum number of IWLS iterations
with distance.options = list(maxit = 5000). Find additional
options for generalized linear models using help(glm) or
help(family), for generalized additive models using help(gam),
for neural network models using help(nnet), and for classification
trees using help(rpart).
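As a rough illustration of what the binomial generalized linear model
options for distance compute, the following base R sketch fits a
logistic regression and derives both a propensity score and a linear
propensity score. It does not call MatchIt. The simulated data frame
dat, the variables treat, x1, and x2, and the reading of the "linear"
variants as the linear predictor (the propensity score on the link
scale) are assumptions made for illustration. The control argument
shows the kind of glm setting that distance.options = list(maxit = 5000)
is meant to adjust.

```r
# Simulated data; every name here is hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.25 * dat$x2))

# Logistic regression of treatment on the covariates.
# maxit raises the cap on IWLS iterations (see help(glm.control)).
fit <- glm(treat ~ x1 + x2, data = dat,
           family = binomial(link = "logit"),
           control = list(maxit = 5000))

# Propensity score: estimated probability of treatment given covariates.
pscore <- fitted(fit)

# Linear propensity score, assumed here to be the linear predictor,
# that is, the logit of the propensity score.
lin_pscore <- predict(fit, type = "link")
```

Swapping "logit" for "probit", "cloglog", "cauchit", or "log" in
binomial(link = ...) gives the corresponding base R analogue of the
other link functions, although the "log" link may need starting values
to converge.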
- discard: specifies whether to discard units that fall
outside some measure of support of the distance measure (default =
"none", discard no units). Discarding units may change the
quantity of interest being estimated by changing the observations left in the analysis.
Enter a logical vector indicating which units should be discarded,
or choose from the following options:
- "none": no units will be discarded before matching.
Use this option when the units to be matched are substantially
similar, such as in the case of matching treatment and control
units from a field experiment that was close to (but not fully)
randomized (e.g., Imai 2005), when caliper matching will
restrict the donor pool, or when you do not wish to change the
quantity of interest and the parametric methods to be used
post-matching can be trusted to extrapolate.
- "hull.both": all units that are not within the convex
hull will be discarded. See King and Zeng (2006, 2007) for
information about the convex hull in this context and as a measure
of model dependence.
- "both": all units (treated and control) that are
outside the support of the distance measure will be discarded.
- "hull.control": only control units that are not
within the convex hull of the treated units will be discarded.
- "control": only control units outside the support of
the distance measure of the treated units will be discarded. Use
this option when the average treatment effect on the treated is of
most interest and when you are unwilling to discard
non-overlapping treatment units (which would change the quantity
of interest).
- "hull.treat": only treated units that are not within
the convex hull of the control units will be discarded.
- "treat": only treated units outside the support of
the distance measure of the control units will be discarded. Use
this option when the average treatment effect on the control units
is of most interest and when you are unwilling to discard control units.
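The support-based options ("both", "control", "treat") can be sketched
in base R as follows. This is only one plausible reading, in which
"support" means the observed range of the distance measure in the
relevant group; it is not taken from the package source, and the helper
discard_by_support and the toy vectors are hypothetical. The convex
hull options are not covered here.

```r
# Hypothetical helper: returns TRUE for units that would be discarded.
discard_by_support <- function(distance, treat,
                               rule = c("none", "both", "control", "treat")) {
  rule <- match.arg(rule)
  d_t <- distance[treat == 1]  # distances of treated units
  d_c <- distance[treat == 0]  # distances of control units
  # TRUE where d falls outside the observed range of ref
  out_of <- function(d, ref) d < min(ref) | d > max(ref)
  switch(rule,
    none    = rep(FALSE, length(distance)),
    control = treat == 0 & out_of(distance, d_t),
    treat   = treat == 1 & out_of(distance, d_c),
    # both: keep only units inside the overlap of the two ranges
    both    = distance < max(min(d_t), min(d_c)) |
              distance > min(max(d_t), max(d_c))
  )
}

# Toy data: four control units followed by four treated units.
distance <- c(0.05, 0.20, 0.40, 0.60, 0.30, 0.50, 0.70, 0.95)
treat    <- c(0, 0, 0, 0, 1, 1, 1, 1)

discard_by_support(distance, treat, "control")  # flags the controls at 0.05 and 0.20
discard_by_support(distance, treat, "treat")    # flags the treated units at 0.70 and 0.95
discard_by_support(distance, treat, "both")     # flags all four of those units
```

In this toy example "control" leaves every treated unit in place, so
the treated group (and hence the average treatment effect on the
treated) is unchanged, whereas "both" also drops two treated units.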
- reestimate: specifies whether the model for the distance
measure is re-estimated after units are discarded (default =
FALSE, no re-estimation). The input must be a logical value.
Re-estimation may be desirable for efficiency reasons, especially
if many units were discarded, so that the post-discard samples are
quite different from the original samples.
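To make the idea of re-estimation concrete, here is a minimal base R
sketch: a logistic model is fitted to the full sample, control units
outside the treated units' range of the distance measure are flagged,
and the same model is then refitted to the remaining units. The
simulated data, the variable names, and the range-based discard rule
are all assumptions for illustration and do not reproduce the package's
internal code.

```r
# Simulated data; every name here is hypothetical.
set.seed(2)
n <- 300
dat <- data.frame(x = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(dat$x))

# Step 1: estimate the distance measure on the full sample.
fit_all <- glm(treat ~ x, data = dat, family = binomial)
dat$distance <- fitted(fit_all)

# Step 2: flag control units outside the treated units' range.
rng_t <- range(dat$distance[dat$treat == 1])
dat$discarded <- dat$treat == 0 &
  (dat$distance < rng_t[1] | dat$distance > rng_t[2])

# Step 3: re-estimate the same model on the units that remain.
kept <- dat[!dat$discarded, ]
fit_kept <- glm(treat ~ x, data = kept, family = binomial)
kept$distance_reest <- fitted(fit_kept)
```

Without re-estimation, matching would use the distance column from
step 1; with it, matching would use distance_reest from step 3.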
Gary King
2011-01-08