Technical Details

Determining convex hull membership is computationally feasible even for large numbers of explanatory variables and observations thanks to the solution proposed in King & Zeng (2006), which eliminates the most time-consuming part of the problem: the characterization of the convex hull itself. In addition, they show that the remaining (implicit) point location problem can be expressed as a linear programming exercise, making it possible to take advantage of existing well-developed algorithms designed for other purposes to speed up the computation. Specifically, a counterfactual is in the convex hull of the explanatory variables if there exists a feasible solution to the following standard form linear programming problem:

min	$\displaystyle C'\eta$
s.t.	$\displaystyle A'\eta=B'$	(1)
	$\displaystyle \eta \geq 0$

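where $C$ is a vector of zeros (so that there is no objective function to minimize); $\eta$ is a vector of coefficients; $A'$ is $X'$, the transposed matrix of explanatory variables, with an additional, final row of $1$'s; and $B'$ is the counterfactual $x'$ with an additional, final element equal to $1$. The final row and element force the elements of $\eta$ to sum to one, so any feasible $\eta$ expresses the counterfactual as a convex combination of the observations.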
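As a rough illustration of the constraint structure in (1), the following Python sketch assembles the two constraint objects and checks whether a candidate coefficient vector satisfies them. It assumes the data matrix is called `X` (one row per observation) and the counterfactual is called `x`; the function names `build_constraints` and `is_feasible` and the toy data are hypothetical, chosen only for this example. It does not solve the linear program; it only verifies a proposed solution.

```python
def build_constraints(X, x):
    """Assemble the equality constraints of problem (1).

    X is a list of n observations, each a list of K values (assumed layout).
    x is the counterfactual, a list of K values.
    Returns (A_t, b): A_t is X transposed with a final row of 1's,
    and b is x with a final element equal to 1.
    """
    n = len(X)
    K = len(x)
    A_t = [[X[i][k] for i in range(n)] for k in range(K)]
    A_t.append([1.0] * n)       # final row of 1's: weights must sum to one
    b = list(x) + [1.0]         # final element equal to 1
    return A_t, b


def is_feasible(A_t, b, eta, tol=1e-9):
    """Return True if eta >= 0 and A_t times eta equals b (within tol)."""
    if any(w < -tol for w in eta):
        return False
    for row, target in zip(A_t, b):
        total = sum(a * w for a, w in zip(row, eta))
        if abs(total - target) > tol:
            return False
    return True


# Toy data: three observations on two explanatory variables (a triangle).
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
x = [0.5, 0.5]

A_t, b = build_constraints(X, x)
# Candidate weights on the three observations; they are nonnegative,
# sum to one, and reproduce x, so x lies in the convex hull of X.
inside = is_feasible(A_t, b, [0.5, 0.25, 0.25])
print(inside)
```

In practice a linear programming solver would search for a feasible $\eta$ rather than being handed one; the sketch only shows what such a solver would have to satisfy.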

The default Gower distance (which is suitable for both quantitative and qualitative data) between a pair of $K$-dimensional points $x_{i}$ and $x_{j}$ is defined simply as the average absolute distance between the elements of the two points divided by the range of the data:

$\displaystyle G_{ij} = \frac{1}{K}\sum_{k=1}^K \frac{\left\vert x_{ik} - x_{jk}\right\vert}{r_k}$

(2)

where the range is $r_k = \max(X_{.k}) - \min(X_{.k})$ and the min and max functions return the smallest and largest elements respectively in the set including the $k$th element of the explanatory variables $X$.

The optional squared Euclidean distance (which is suitable only for quantitative data) between points $x_{i}$ and $x_{j}$ is given by the familiar definition, i.e. the sum of the squared differences between the elements of the two points:

$\displaystyle E_{ij} = \sum_{k=1}^K (x_{ik} - x_{jk})^2$ .

(3)
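The two distances in (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, points are plain lists of numbers, and every variable is assumed to have a nonzero range so that the division in (2) is defined.

```python
def column_ranges(X):
    """Range r_k = max(X_.k) - min(X_.k) for each column k of X."""
    return [max(col) - min(col) for col in zip(*X)]


def gower_distance(xi, xj, ranges):
    """Equation (2): average of range-scaled absolute differences.

    Assumes every entry of ranges is nonzero.
    """
    K = len(xi)
    return sum(abs(a - b) / r for a, b, r in zip(xi, xj, ranges)) / K


def squared_euclidean_distance(xi, xj):
    """Equation (3): sum of squared element-wise differences."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))


# Toy data: three observations on two explanatory variables.
X = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
r = column_ranges(X)                          # ranges are 4.0 and 20.0

g = gower_distance(X[0], X[1], r)             # (2/4 + 20/20) / 2
e = squared_euclidean_distance(X[0], X[1])    # 2**2 + 20**2
print(g, e)
```

Because each term in the Gower distance is divided by the range of its variable, the second variable's larger scale does not dominate the result, whereas it accounts for almost all of the squared Euclidean distance in this toy example.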

Gary King 2010-08-12