Single Response Variable Models: Normal Regression Model

Previous: Describe the Statistical Model

Next: Multivariate models: Bivariate Normal

Single Response Variable Models: Normal Regression Model

Let's say that you are trying to write a Normal regression model with stochastic component

54#54

with scalar variance parameter 55#55 , and systematic component 56#56 . This implies two sets of parameters in your model, and the following describe.normal.regression() function:

describe.normal.regression <- function() {
  category <- "continuous"
  mu <- list(equations = 1,              # Systematic component
             tagsAllowed = FALSE,
             depVar = TRUE,
             expVar = TRUE)
  sigma2 <- list(equations = 1,          # Scalar ancillary parameter
                 tagsAllowed = FALSE,
                 depVar = FALSE,
                 expVar = FALSE)
  pars <- list(mu = mu, sigma2 = sigma2)
  list(category = category, parameters = pars)
}

To find the log-likelihood:

57#57	58#58	59#59
	58#58	60#60
	58#58	61#61
	58#58	62#62
63#63	58#58	64#64
	58#58	65#65
	66#66	67#67

In R code, this translates to:

ll.normal <- function(par, X, Y, n, terms) {
  beta <- parse.par(par, terms, eqn = "mu")             # [1]
  gamma <- parse.par(par, terms, eqn = "sigma2")        # [2]
  sigma2 <- exp(gamma)
  -0.5 * (n * log(sigma2) + sum((Y - X %*% beta)^2 / sigma2))
}

At Comment [1] above, we use the function parse.par() to pull out the vector of parameters beta (which relate the systematic component 68#68 to the explanatory variables 69#69 ). No matter how many covariates there are, the parse.par() function can use terms to pull out the appropriate parameters from par. We also use parse.par() at Comment [2] to pull out the scalar ancillary parameter that (after transformation) corresponds to the 13#13 parameter.

To optimize this function, simply type:

out <- optim(start.val, ll.normal, control = list(fnscale = -1),
             method = "BFGS", hessian = TRUE, X = X, Y = Y, terms = terms)

where

start.val is a vector of starting values for par. Use set.start() to create starting values for all parameters, systematic and ancillary, in one step.
ll.normal is the log-likelihood function derived above.
"BFGS" specifies unconstrained optimization using a quasi-Newton method.
control = list(fnscale = -1) specifies that R should maximize the function (omitting this causes R to minimize the function by default).
hessian = TRUE instructs R to return the Hessian matrix (from which you may calculate the variance-covariance matrix).
X and Y are the matrix of explanatory variables and vector of dependent variables, used in the ll.normal() function.
terms are meta-data constructed from the model.frame() command.

Please refer to the R-help for optim() for more options.

To make this procedure generalizable, we can write a function that takes a user-specified data frame and formula, and optional starting values for the optimization procedure:

normal.regression <- function(formula, data, start.val = NULL, ...) {

  fml <- parse.formula(formula, model = "normal.regression") # [1]
  D <- model.frame(fml, data = data)
  X <- model.matrix(fml, data = D)
  Y <- model.response(D)
  terms <- attr(D, "terms")
  n <- nrow(X)

  start.val <- set.start(start.val, terms)

  res <- optim(start.val, ll.normal, method = "BFGS",        
               hessian = TRUE, control = list(fnscale = -1),
               X = X, Y = Y, n = n, terms = terms, ...)      # [2]

  fit <- model.end(res, D)                                   # [3]
  fit$n <- n
  class(fit) <- "normal"                                     # [4]
  fit                                                        
}

The following comments correspond to the bracketed numbers above:

The parse.formula() command looks for the describe.normal.regression() function, which changes the user-specified formula into the following format:
```
list(mu = formula,         # where `formula' was specified by the user
     sigma = ~ 1)
```
The ... here indicate that if the user enters any additional arguments when calling normal.regression(), that those arguments should go to the optim() function.
The model.end() function takes the optimized output and the listwise deleted data frame D and creates an object that will work with setx().
Choose a class for your model output so that you will be able to write an appropriate summary(), param(), and qi() function for your model.

Gary King 2011-11-29