x.out <- setx(z.out, fn = list(numeric = mean, ordered =
median, others = mode), data = NULL, cond = FALSE, ...)
The setx() command lets you choose values for the explanatory
variables, with which sim() will simulate quantities of
interest. There are two types of setx() procedures:
- You may perform the usual unconditional prediction (by
default, cond = FALSE), by explicitly choosing the values of
each explanatory variable yourself or letting setx() compute
them, either from the data used to create z.out or from a new
data set specified in the optional data argument. You may
also compute predictions for all observed values of your explanatory
variables using fn = NULL.
- Alternatively, for advanced uses, you may perform
conditional prediction (cond = TRUE), which predicts
certain quantities of interest by conditioning on the observed value
of the dependent variable. In a simple linear regression model,
this procedure is not particularly interesting, since the
conditional prediction is merely the observed value of the dependent
variable for that observation. However, conditional prediction is
extremely useful for other models and methods, including the
following:
- In a matched sampling design, the sample average treatment
effect for the treated can be estimated by computing the
difference between the observed dependent variable for the treated
group and their expected or predicted values of the dependent
variable under no treatment.
- With censored data, conditional prediction will ensure that
all predicted values are greater than the censored observed
values.
- In ecological inference models, conditional prediction
guarantees that the predicted values are on the tomography line
and thus restricted to the known bounds.
- The conditional prediction in many linear random effects (or
Bayesian hierarchical) models is a weighted average of the
unconditional prediction and the value of the dependent variable for
that observation, with the weight being an estimable function of the
accuracy of the unconditional prediction.
When the unconditional prediction is highly certain, the weight on
the value of the dependent variable for this observation is very
small, hence reducing inefficiency; when the unconditional
prediction is highly uncertain, the relative weight on the
unconditional prediction is very small, hence reducing bias.
Although the simple weighted average expression no longer holds in
nonlinear models, the general logic still holds and the mean square
error of the measurement is typically reduced.
In these and other models, conditioning on the observed value of the
dependent variable can vastly increase the accuracy of prediction
and measurement.
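The weighted-average logic described for linear random effects models can be illustrated with a minimal base-R sketch. It assumes a simple normal-normal setup in which the unconditional prediction and the observed value each have a known variance; the function name conditional_prediction and its arguments are hypothetical and are not part of Zelig.

```r
# Normal-normal sketch of conditional prediction as a weighted average.
# All names here are hypothetical; this is not Zelig code.
conditional_prediction <- function(uncond, y_obs, var_uncond, var_obs) {
  # The weight on the observed value grows with the uncertainty
  # (variance) of the unconditional prediction.
  w <- var_uncond / (var_uncond + var_obs)
  list(weight_on_observed = w,
       prediction = w * y_obs + (1 - w) * uncond)
}

# Highly certain unconditional prediction: the observed value gets little weight.
certain <- conditional_prediction(uncond = 0, y_obs = 2,
                                  var_uncond = 0.01, var_obs = 1)

# Highly uncertain unconditional prediction: the observed value dominates.
uncertain <- conditional_prediction(uncond = 0, y_obs = 2,
                                    var_uncond = 100, var_obs = 1)
```

In this sketch the weight is a function of the two variances only, which mirrors the statement above that the weight depends on the accuracy of the unconditional prediction.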
The setx() arguments for unconditional prediction are
as follows:
- z.out, the zelig() output object, must be
included first.
- You can set particular explanatory variables to specified
values. For example:
> z.out <- zelig(vote ~ age + race, model = "logit", data = turnout)
> x.out <- setx(z.out, age = 30)
setx() sets the variables not explicitly listed to
their mean if numeric, and their median if ordered factors, and
their mode if unordered factors, logical values, or character
strings. Alternatively, you may specify one explanatory variable
as a range of values, creating one observation for every unique
value in the range:
> x.out <- setx(z.out, age = 18:95)
This creates 78 observations, with age set to 18 in the first
observation, 19 in the second observation, up to 95 in the 78th
observation. The other variables are set to their default values,
but this may be changed by setting fn, as described next.
- Optionally, fn is a list that lets you choose a different
function to apply to explanatory variables of class:
- numeric, which is mean by default,
- ordered factor, which is median by default, and
- other variables, which consist of logical variables,
character strings, and unordered factors, and are set to their
mode by default.
Any function may be applied to numeric variables. For ordered
factors, mean defaults to median, and mode is the only
available option for other types of variables. In the special case
fn = NULL, setx() returns all of the observations.
- You cannot perform other math operations within the fn
argument, but you can use the output from one call of setx() to
create new values for the explanatory variables. For example, to
set the explanatory variables to one standard deviation below their
mean:
> X.sd <- setx(z.out, fn = list(numeric = sd))
> X.mean <- setx(z.out, fn = list(numeric = mean))
> x.out <- X.mean - X.sd
- Optionally, data identifies a new data frame (rather
than the one used to create z.out) from which the setx() values are calculated. You can use this argument to set
values of the explanatory variables for hold-out or out-of-sample
fit tests.
- The cond argument is always FALSE for unconditional
prediction.
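The default choices described for fn (mean for numeric variables, median for ordered factors, mode for everything else) can be sketched in base R as follows. This is an illustration of the rule only, not Zelig's implementation: the helpers stat_mode, ordered_median, and central_values, the toy data frame, and the tie-breaking choices (first most frequent value, lower median level) are all assumptions made for this example.

```r
# Hypothetical helpers, not part of Zelig: one central value per column.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]  # first most frequent value, as a string
}

ordered_median <- function(x) {
  codes <- sort(as.integer(x))                 # integer level codes, NAs dropped
  levels(x)[codes[(length(codes) + 1) %/% 2]]  # lower median level
}

central_values <- function(data, numeric_fn = mean) {
  lapply(data, function(col) {
    if (is.numeric(col)) numeric_fn(col)            # numeric: user-chosen function
    else if (is.ordered(col)) ordered_median(col)   # ordered factor: median
    else stat_mode(col)                             # everything else: mode
  })
}

# Toy data with one numeric, one ordered, and one character variable.
toy <- data.frame(
  age  = c(20, 30, 40, 50),
  educ = factor(c("low", "mid", "mid", "high"),
                levels = c("low", "mid", "high"), ordered = TRUE),
  race = c("white", "white", "others", "white"),
  stringsAsFactors = FALSE
)

defaults <- central_values(toy)                   # mean / median / mode
spread   <- central_values(toy, numeric_fn = sd)  # only numeric columns change
```

As in the fn = list(numeric = sd) example above, swapping the numeric function changes only the numeric columns; the ordered and unordered variables keep their median and mode.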
If you wish to calculate risk ratios or first differences, call setx() a second time to create an additional set of values for
the explanatory variables. For example, continuing from the example
above, you may create an alternative set of explanatory variable
values one standard deviation above their mean:
> x.alt <- X.mean + X.sd
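To make the two quantities concrete, the following base-R sketch computes a first difference and a risk ratio from two sets of explanatory variable values. It does not call Zelig: it assumes simulated data, a logit model fitted with glm(), and point predictions rather than simulated quantities of interest. All object names (x_base, x_alt, p_base, p_alt) are hypothetical.

```r
# Simulated data standing in for a real data set (an assumption of this sketch).
set.seed(1)
n    <- 500
age  <- sample(18:95, n, replace = TRUE)
vote <- rbinom(n, 1, plogis(-1 + 0.03 * age))

# Logit model fitted with base R rather than zelig().
fit <- glm(vote ~ age, family = binomial)

# Two sets of explanatory variable values: one standard deviation
# below and one standard deviation above the mean.
x_base <- data.frame(age = mean(age) - sd(age))
x_alt  <- data.frame(age = mean(age) + sd(age))

# Predicted probabilities under each set of values.
p_base <- unname(predict(fit, newdata = x_base, type = "response"))
p_alt  <- unname(predict(fit, newdata = x_alt,  type = "response"))

first_difference <- p_alt - p_base  # difference in predicted probabilities
risk_ratio       <- p_alt / p_base  # ratio of predicted probabilities
```

These are point estimates only; sim() would instead simulate the quantities of interest from the two setx() objects, which also conveys their uncertainty.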
The required arguments for conditional prediction are as
follows:
- z.out, the zelig() output object, must be included
first.
- fn, which equals NULL to indicate that all of the
observations are selected. You may only perform conditional
inference on actual observations, not the mean of observations or
any other function applied to the observations. Thus, if fn
is missing, but cond = TRUE, setx() coerces fn =
NULL.
- data, the data for conditional prediction.
- cond, which equals TRUE for conditional prediction.
Additional arguments, such as any of the variable names, are ignored
in conditional prediction, since the actual values of each observation
are used.