Inputs

formula
A formula object. The left side of the formula is the collection of symptoms. The right side is the cause of death. For example, if there are 5 symptoms in total, named fever, coughing, chestpain, dizziness, and shortbreath, and the cause of death variable is death, then the formula can be written as:

       formula=cbind(fever, coughing, chestpain, dizziness, shortbreath)~death
or for short:
       formula=cbind(fever, ... ,shortbreath)~death
Note that the short form requires the symptom variables to be located in a consecutive block in the data, starting with fever and ending with shortbreath. The current version also requires the variable on the right-hand side of the formula, death in this example, to be present in the community sample. If it is unknown in the community sample, the user needs to create such a variable with arbitrary numerical values.
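As a minimal sketch of the note above, the following base R code writes out the full formula for the five example symptoms and adds a placeholder death column to a community sample that lacks one. The data frame community and all of its values are made up for illustration, and the object name va.formula is hypothetical.

```r
# Hypothetical community sample: symptoms only, no cause of death recorded.
community <- data.frame(
  fever       = c(1, 0, 1, 1),
  coughing    = c(0, 0, 1, 0),
  chestpain   = c(1, 1, 0, 0),
  dizziness   = c(0, 1, 0, 1),
  shortbreath = c(0, 0, 1, 1)
)

# The formula needs `death` on the right-hand side, so create it with
# arbitrary numerical values (all 1 here; the choice is a placeholder).
community$death <- rep(1, nrow(community))

# Full form of the formula: symptoms on the left, cause of death on the right.
va.formula <- cbind(fever, coughing, chestpain, dizziness, shortbreath) ~ death

# Check that every variable named in the formula exists in the data.
all(all.vars(va.formula) %in% names(community))
```

Creating the formula does not evaluate it, so this check can be run before any estimation is attempted.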
data
A list of two datasets. The first is the hospital data, which contains a known cause of death for each individual and a collection of symptoms from verbal autopsy studies. The second is the community data, where typically only the symptoms are available from the verbal autopsy study. The cause of death may also be known in the community data if this is a validation study, but it will not be used during estimation. Variable names must be exactly the same in the two datasets.
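A small sketch of how such a list might be assembled in base R is given below. The two data frames and their values are invented for illustration, va.data is a hypothetical name, and whether the list elements should carry names is not stated here, so the list is left unnamed and relies on order alone.

```r
# Hypothetical hospital sample: symptoms plus a known cause of death.
hospital <- data.frame(
  fever    = c(1, 0, 1),
  coughing = c(0, 1, 1),
  death    = c(1, 2, 3)
)

# Hypothetical community sample with the same variable names.
# `death` holds arbitrary placeholder values here.
community <- data.frame(
  fever    = c(0, 1, 1, 0),
  coughing = c(1, 1, 0, 0),
  death    = c(1, 1, 1, 1)
)

# Variable names must match exactly. identical() is a strict check
# that also requires the same column order.
stopifnot(identical(names(hospital), names(community)))

# Hospital data first, community data second.
va.data <- list(hospital, community)
```

Running the name check before estimation surfaces a misspelled or missing column early, rather than partway through a long run.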
nsymp.vec
A vector of positive integers containing the different values of nsymp that can be used by va(). For a total of J causes of death and a total of ns symptoms in the sample, nsymp.vec can be set to the vector a:b, where a is the smallest integer such that $2^a > J$ and b is typically set to floor(0.75*ns). If the sample size is small, b can be set to a smaller value to keep the function from exiting because of data sparsity. No default value is set.
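The rule of thumb for the range a:b can be sketched in a few lines of base R. The values of J and ns below are hypothetical, and the upper bound is taken to be floor(0.75 * ns), which is an assumed reading of the rule rather than something confirmed by the package.

```r
J  <- 5    # hypothetical number of causes of death
ns <- 20   # hypothetical number of symptoms in the sample

# a: smallest positive integer with 2^a > J.
a <- 1
while (2^a <= J) a <- a + 1   # a is 3 here, since 2^2 = 4 <= 5 < 8 = 2^3

# b: assumed upper bound, three quarters of the symptoms, rounded down.
b <- floor(0.75 * ns)         # b is 15 here

nsymp.vec <- a:b
nsymp.vec
```

With a small sample, b could be lowered by hand (for example nsymp.vec <- a:10) to reduce the risk of sparsity problems.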
n.subset
A positive integer specifying the total number of symptom subsets to draw, and thus the number of estimations performed. The default is 300.
prob.wt
An integer (0 or 1) or a vector of weights that determines how likely each symptom is to be selected for a subset. When prob.wt is a user-supplied vector, it must be a vector of probabilities that sum to 1, and its length must equal the total number of symptoms. When prob.wt=1, binomial weights are used, which are proportional to the inverse of the variance of each reported binary symptom variable. When prob.wt=0, all symptoms are equally likely to be selected. The default is 1.
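To illustrate what a user-supplied weight vector could look like, the sketch below computes inverse-variance weights for simulated binary symptoms and normalizes them to sum to 1. The simulated data and the names symptoms, p.hat, and w are hypothetical, and this is not claimed to reproduce the package's internal calculation for prob.wt=1; it only follows the description above.

```r
set.seed(1)

# Hypothetical binary symptom matrix: 200 individuals, 5 symptoms
# with different reporting rates.
rates    <- c(0.1, 0.2, 0.3, 0.4, 0.5)
symptoms <- sapply(rates, function(p) rbinom(200, 1, p))
ns       <- ncol(symptoms)

# Sample variance of a binary variable: p * (1 - p).
p.hat <- colMeans(symptoms)
v     <- p.hat * (1 - p.hat)

# A symptom reported by nobody or by everybody has zero variance and
# would produce an infinite weight, so guard against it.
stopifnot(all(v > 0))

# Weights proportional to the inverse variance, normalized to sum to 1.
w <- (1 / v) / sum(1 / v)

# Equal selection probabilities, for comparison.
w.equal <- rep(1 / ns, ns)

# Drawing one subset of 3 symptoms with the inverse-variance weights.
sample(ns, size = 3, prob = w)
```

Under these weights, symptoms with reporting rates near 0 or 1 are selected more often than symptoms reported by about half of the sample.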
boot.se
A logical value. If TRUE, bootstrap standard errors of the CSMF are estimated. This typically takes a lot of computing time. It is strongly recommended to set boot.se=FALSE in va.gcv. The default is FALSE.
nboot
A positive integer. If boot.se=TRUE, it specifies the number of bootstrap samples taken to estimate the standard errors of the CSMF. The default is 1.
printit
Logical value. If TRUE, the progress of the estimation procedure will be printed on the screen.
print.reg.size
Logical value. If TRUE, the size of the regression matrix is printed at each step of subsampling. This helps the user choose the number of symptoms to subsample. It is recommended to print the size of the regression matrix for different values of nsymp with a small value of n.subset.



Gary King 2010-09-01