Previous: Describe the Statistical Model Up: Managing Statistical Model Inputs Next: Multivariate models: Bivariate Normal


Single Response Variable Models: Normal Regression Model

Let's say that you are trying to write a Normal regression model with stochastic component

54#54

with scalar variance parameter 55#55 , and systematic component 56#56 . This implies two sets of parameters in your model, and the following describe.normal.regression() function:
describe.normal.regression <- function() {
  category <- "continuous"
  mu <- list(equations = 1,              # Systematic component
             tagsAllowed = FALSE,
             depVar = TRUE,
             expVar = TRUE)
  sigma2 <- list(equations = 1,          # Scalar ancillary parameter
                 tagsAllowed = FALSE,
                 depVar = FALSE,
                 expVar = FALSE)
  pars <- list(mu = mu, sigma2 = sigma2)
  list(category = category, parameters = pars)
}

To find the log-likelihood:

57#57 58#58 59#59  
  58#58 60#60  
  58#58 61#61  
  58#58 62#62  
63#63 58#58 64#64  
  58#58 65#65  
  66#66 67#67  

In R code, this translates to:
ll.normal <- function(par, X, Y, n, terms) {
  beta <- parse.par(par, terms, eqn = "mu")             # [1]
  gamma <- parse.par(par, terms, eqn = "sigma2")        # [2]
  sigma2 <- exp(gamma)
  -0.5 * (n * log(sigma2) + sum((Y - X %*% beta)^2 / sigma2))
}
At Comment [1] above, we use the function parse.par() to pull out the vector of parameters beta (which relate the systematic component 68#68 to the explanatory variables 69#69 ). No matter how many covariates there are, the parse.par() function can use terms to pull out the appropriate parameters from par. We also use parse.par() at Comment [2] to pull out the scalar ancillary parameter that (after transformation) corresponds to the 13#13 parameter.

To optimize this function, simply type:

out <- optim(start.val, ll.normal, control = list(fnscale = -1),
             method = "BFGS", hessian = TRUE, X = X, Y = Y, terms = terms)
where Please refer to the R-help for optim() for more options.

To make this procedure generalizable, we can write a function that takes a user-specified data frame and formula, and optional starting values for the optimization procedure:

normal.regression <- function(formula, data, start.val = NULL, ...) {

  fml <- parse.formula(formula, model = "normal.regression") # [1]
  D <- model.frame(fml, data = data)
  X <- model.matrix(fml, data = D)
  Y <- model.response(D)
  terms <- attr(D, "terms")
  n <- nrow(X)

  start.val <- set.start(start.val, terms)

  res <- optim(start.val, ll.normal, method = "BFGS",        
               hessian = TRUE, control = list(fnscale = -1),
               X = X, Y = Y, n = n, terms = terms, ...)      # [2]

  fit <- model.end(res, D)                                   # [3]
  fit$n <- n
  class(fit) <- "normal"                                     # [4]
  fit                                                        
}
The following comments correspond to the bracketed numbers above:
  1. The parse.formula() command looks for the describe.normal.regression() function, which changes the user-specified formula into the following format:
    list(mu = formula,         # where `formula' was specified by the user
         sigma = ~ 1)
    
  2. The ... here indicate that if the user enters any additional arguments when calling normal.regression(), that those arguments should go to the optim() function.
  3. The model.end() function takes the optimized output and the listwise deleted data frame D and creates an object that will work with setx().
  4. Choose a class for your model output so that you will be able to write an appropriate summary(), param(), and qi() function for your model.



Gary King 2011-11-29