Previous: Single Response Variable Models: Up: Managing Statistical Model Inputs Next: Easy Ways to Manage


Multivariate models: Bivariate Normal example

Most common models have one systematic component. For 2#2 observations, the systematic component varies over observations 70#70 . In the case of the Normal regression model, the systematic component is 68#68 (13#13 is not estimated as a function of covariates).

In some cases, however, your model may have more than one systematic component. In the case of bivariate probit, we have a dependent variable 71#71 observed as (0,0), (1,0), (0,1), or (1,1) for 70#70 . Similar to a single-response probit model, the stochastic component is described by two latent (unobserved) continuous variables (72#72 , 73#73 ) which follow the bivariate Normal distribution:

74#74 75#75 76#76  

where for 77#77 , 78#78 is the mean for 79#79 and 80#80 is a correlation parameter. The following observation mechanism links the observed dependent variables, 81#81 , with these latent variables
82#82 58#58 83#83  

The systemic components for each observation are

84#84 58#58 85#85  
86#86 58#58 87#87  

In the default specification, 80#80 is a scalar (such that 88#88 only contains an intercept term).

If so, we have two sets of parameters: 89#89 and 80#80 . This implies the following describe.bivariate.probit() function:

describe.bivariate.probit <- function() {
  category <- "dichotomous"
  package <- list(name = "mvtnorm",       # Required package and 
                  version = "0.7")        #  minimum version number
  mu <- list(equations = 2,               # Systematic component has 2
             tagsAllowed = TRUE,          #  required equations
             depVar = TRUE, 
             expVar = TRUE), 
  rho <- list(equations = 1,              # Optional systematic component
             tagsAllowed = FALSE,         #   (estimated as an ancillary
             depVar = FALSE,              #    parameter by default)
             expVar = TRUE), 
  pars <- parameters(mu = mu, rho = rho)
  list(category = category, package = package, parameters = pars)
}

Since users may choose different explanatory variables to parameterize 90#90 and 91#91 (and sometimes 80#80 ), the model requires a minimum of two formulas. For example,

formulae <- list(mu1 = y1 ~ x1 + x2,                         # User input
                 mu2 = y2 ~ x2 + x3)
fml <- parse.formula(formulae, model = "bivariate.probit")   # [1]
D <- model.frame(fml, data = mydata)
X <- model.matrix(fml, data = D)
Y <- model.response(D)
At comment [1], parse.formula() finds the describe.bivariate.probit() function and parses the formulas accordingly.

If 80#80 takes covariates (and becomes a systematic component rather than an ancillary parameter), there can be three sets of explanatory variables:

formulae <- list(mu1 = y1 ~ x1 + x2, 
                 mu2 = y2 ~ x2 + x3, 
                 rho = ~ x4 + x5)

From the perspective of the programmer, a nearly identical framework works for both single and multiple equation models. The (parse.formula()) line changes the class of fml from "list" to "multiple" and hence ensures that model.frame() and model.matrix() go to the appropriate methods. D, X , and Y are analogous to their single equation counterparts above:

Given for the bivariate probit probability density described above, the likelihood is:

94#94

where I is an indicator function and This implies the following log-likelihood:
99#99 58#58 100#100  
    101#101  

(For the corresponding R code, see Section [*] below.)



Gary King 2011-11-29