Subclassification

When there are many covariates (or some covariates can take a large number of values), finding sufficient exact matches will often be impossible. The goal of subclassification is to form subclasses such that, within each subclass, the distribution (rather than the exact values) of the covariates is as similar as possible for the treated and control groups. Various subclassification schemes exist, including one based on a scalar distance measure such as the propensity score, estimated using the distance option (see Section 4.1.0.2.2). Subclassification is implemented in MATCHIT using method = "subclass".

The following example script can be run by typing demo(subclass) at the R prompt:

> m.out <- matchit(treat ~ re74 + re75 + educ + black + hispan + age, 
                   data = lalonde, method = "subclass")

The above syntax forms 6 subclasses, the default number, based on a distance measure (the propensity score) estimated using logistic regression. By default, each subclass will have approximately the same number of treated units.
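To make the default behavior described above concrete, the following base R sketch forms 6 subclasses by hand from an estimated propensity score. It is only an illustration, not MATCHIT's internal code: the data are simulated, the names dat, x1, x2, n_sub and cuts are hypothetical, and assigning units beyond the treated range to the outermost subclasses is a simplifying assumption.

```r
# Illustrative sketch on simulated data; not MATCHIT's internal code.
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 0.4 * x2))
dat <- data.frame(treat, x1, x2)

# Step 1: estimate the propensity score by logistic regression.
ps_fit <- glm(treat ~ x1 + x2, data = dat, family = binomial)
dat$distance <- fitted(ps_fit)

# Step 2: use quantiles of the propensity score among treated units as cut
# points, so that each subclass holds roughly the same number of treated units.
n_sub <- 6
cuts <- quantile(dat$distance[dat$treat == 1],
                 probs = seq(0, 1, length.out = n_sub + 1))

# Assumption of this sketch: widen the outer cut points so that every unit,
# including controls outside the treated range, falls in some subclass.
cuts[1] <- -Inf
cuts[n_sub + 1] <- Inf

# Step 3: assign each unit to a subclass (integer codes 1..n_sub).
dat$subclass <- cut(dat$distance, breaks = cuts, labels = FALSE)

# Treated and control counts by subclass.
table(subclass = dat$subclass, treat = dat$treat)
```

In the resulting table the treated counts are nearly equal across subclasses, while the control counts typically vary, because the cut points are taken from the treated units only.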

Subclassification may also be used in conjunction with nearest neighbor matching, described below, by leaving the default of method = "nearest" but adding the option subclass. With this option, MATCHIT first selects matches using nearest neighbor matching, then places the matched units into subclasses and adds a variable to the output object indicating subclass membership.
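The two-stage idea (match first, then subclassify the matched units) can be sketched in base R as follows. This is a simplified stand-in rather than MATCHIT's implementation: the data are simulated, the greedy 1:1 matching without replacement and the highest-to-lowest processing order are assumptions of the sketch, and names such as match_of and matched are hypothetical.

```r
# Illustrative sketch on simulated data; a simplified stand-in, not MATCHIT's code.
set.seed(2)
n <- 300
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + x))
distance <- fitted(glm(treat ~ x, family = binomial))

treated <- which(treat == 1)
available <- which(treat == 0)

# Stage 1: greedy 1:1 nearest neighbor matching on the propensity score,
# without replacement, taking treated units from highest to lowest score.
match_of <- rep(NA_integer_, n)
for (i in treated[order(distance[treated], decreasing = TRUE)]) {
  if (length(available) == 0) break
  j <- available[which.min(abs(distance[available] - distance[i]))]
  match_of[i] <- j
  available <- setdiff(available, j)
}
matched_treated <- treated[!is.na(match_of[treated])]
matched_controls <- match_of[matched_treated]
matched <- c(matched_treated, matched_controls)

# Stage 2: place the matched units into subclasses, using quantiles of the
# propensity score among the matched treated units as cut points.
n_sub <- 6
cuts <- quantile(distance[matched_treated],
                 probs = seq(0, 1, length.out = n_sub + 1))
cuts[1] <- -Inf
cuts[n_sub + 1] <- Inf

# Subclass membership variable; unmatched units stay NA.
subclass <- rep(NA_integer_, n)
subclass[matched] <- cut(distance[matched], breaks = cuts, labels = FALSE)

table(subclass = subclass[matched], treat = treat[matched])
```

Only matched units receive a subclass here; unmatched controls keep a missing value, which mirrors the idea of a membership variable defined on the matched sample.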

Gary King 2010-12-11