
Inputs

formula
A formula object. The left-hand side of the formula is the collection of symptoms; the right-hand side is the cause of death. For example, if there are five symptoms in total, named fever, coughing, chestpain, dizziness, and shortbreath, and the cause-of-death variable is death, then the formula can be written as:

       formula=cbind(fever, coughing, chestpain, dizziness, shortbreath)~death
or for short:
       formula=cbind(fever, ... ,shortbreath)~death
Note that the short form requires the symptom variables to be located in a consecutive block in the data, starting with fever and ending with shortbreath. The current version also requires the variable on the right-hand side of the formula, death in this example, to be present in the community sample. If it is unknown in the community sample, the user needs to create such a variable with arbitrary numerical values.
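As a minimal sketch of the placeholder requirement described above, the following builds a toy community sample (hypothetical data, for illustration only), adds a death column with an arbitrary numerical value, and writes out the long form of the formula.

```r
# Hypothetical community sample that records symptoms only (toy data).
community <- data.frame(
  fever       = c(1, 0, 1),
  coughing    = c(0, 0, 1),
  chestpain   = c(1, 1, 0),
  dizziness   = c(0, 1, 0),
  shortbreath = c(1, 0, 0)
)

# The cause of death is unknown here, so create the variable with an
# arbitrary numerical value, as the text above asks.
if (!"death" %in% names(community)) {
  community$death <- 0
}

# Long form of the formula from the example above.
va_formula <- cbind(fever, coughing, chestpain, dizziness, shortbreath) ~ death
```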
data
A list of two datasets. The first is the hospital data, which contains a known cause of death for each individual and a collection of symptoms from verbal autopsy studies. The second is the community data, where typically only the symptoms are available from the verbal autopsy study. The cause of death may also be known in the community data if this is a validation study, but it will not be used during estimation. Variable names must be exactly the same in the two datasets.
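A small sketch of assembling the data argument might look as follows. The data frames are toy data, and the list element names hospital and community are an assumption made for readability; the text above only fixes the order (hospital first, community second).

```r
# Toy hospital sample: symptoms plus a known cause of death.
hospital <- data.frame(
  fever    = c(1, 0, 1, 1),
  coughing = c(0, 1, 1, 0),
  death    = c(1, 2, 1, 3)
)

# Toy community sample: same variables, death filled with a placeholder.
community <- data.frame(
  fever    = c(0, 1, 1),
  coughing = c(1, 1, 0),
  death    = c(0, 0, 0)
)

# Variable names must match exactly across the two datasets.
stopifnot(setequal(names(hospital), names(community)))

# Hospital data first, community data second.
va_data <- list(hospital = hospital, community = community)
```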
nsymp
A positive integer specifying the number of symptoms to be drawn from the full set of symptoms at each iteration when estimating the cause-specific mortality fractions. For the choice of nsymp, refer to King and Lu (2006). For practical purposes, we give the following recommendations: for a total number of causes of death D<=10, use 7-12 symptoms; for D>10, use 12-18 symptoms. If the number of observations is large in both the hospital and community samples (for example, over 1000 cases in total), use more symptoms; otherwise use fewer. Sensitivity analysis can also be used to choose nsymp. In general, the results stabilize once nsymp is within a suitable range. Default=16.
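The rule of thumb above can be written down as a small helper. The function name suggest_nsymp_range is hypothetical and is not part of the package; it only encodes the two recommended ranges and leaves the choice within a range (more symptoms for large samples, fewer otherwise) to the user.

```r
# Hypothetical helper: recommended range of nsymp for a given number of
# causes of death D, following the recommendations in the text above.
suggest_nsymp_range <- function(n_causes) {
  stopifnot(is.numeric(n_causes), length(n_causes) == 1, n_causes >= 1)
  if (n_causes <= 10) {
    c(lower = 7, upper = 12)   # D <= 10: use 7-12 symptoms
  } else {
    c(lower = 12, upper = 18)  # D > 10: use 12-18 symptoms
  }
}

range_small <- suggest_nsymp_range(8)
range_large <- suggest_nsymp_range(15)
```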
n.subset
A positive integer specifying the total number of draws of different subsets of symptoms. Default=300.
method
A string specifying the computational procedure used to estimate the cause-specific mortality fractions (CSMF). When method="quadOpt", the CSMF is estimated via constrained quadratic programming; the solve.QP routine from the quadprog package is called to perform the constrained quadratic optimization. When method="constrainLS", the CSMF is estimated via constrained least squares. The default method is "quadOpt", as it is faster and more stable.
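To illustrate the kind of problem that constrained quadratic programming solves here, the sketch below recovers a vector of fractions constrained to the simplex with quadprog::solve.QP. It is an illustration under stated assumptions, not the internal code of va: the matrix X (symptom-profile probabilities given each cause) and the vector true_csmf are made-up toy values, and the example runs only if the quadprog package is installed.

```r
# Toy matrix: each column is a distribution over 4 symptom profiles for one
# cause of death (hypothetical values; each column sums to 1).
X <- matrix(c(0.6, 0.2, 0.1, 0.1,
              0.1, 0.5, 0.3, 0.1,
              0.2, 0.1, 0.2, 0.5), nrow = 4)
true_csmf <- c(0.5, 0.3, 0.2)
y <- as.vector(X %*% true_csmf)   # implied symptom-profile distribution

if (requireNamespace("quadprog", quietly = TRUE)) {
  D <- ncol(X)
  # solve.QP minimizes -d'b + 0.5 * b'Db, which matches least squares
  # when Dmat = X'X and dvec = X'y.
  Dmat <- crossprod(X)
  dvec <- as.vector(crossprod(X, y))
  # Constraints t(Amat) %*% b >= bvec; the first (meq = 1) is an equality:
  # sum(b) == 1, followed by b[j] >= 0 for every cause j.
  Amat <- cbind(rep(1, D), diag(D))
  bvec <- c(1, rep(0, D))
  csmf <- quadprog::solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
}
```

Because y is generated without noise, the constrained solution should coincide with true_csmf up to numerical error; with real data the constraints are what keep the estimates non-negative and summing to one.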
fix
A vector of strings that allows the user to fix a subset of the cause-specific mortality fractions at predetermined values (based on, e.g., information obtained from other sources or prior knowledge). For example, setting fix=c("malaria=0.15", "injuries=0.05") fixes the mortality fractions due to malaria and injuries at 15% and 5%, respectively. Running va in this case will then attempt to allocate only the remaining 80% of the deaths. The default is NA, which means no constraint is imposed.
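The arithmetic in the example above can be checked with a few lines of R. This is only a sanity check on a user-written fix vector, assuming the "cause=value" format shown above; it is not how va itself parses the argument.

```r
fix <- c("malaria=0.15", "injuries=0.05")

# Split each "cause=value" string into its name and numeric value.
parts <- strsplit(fix, "=", fixed = TRUE)
fixed_values <- setNames(
  as.numeric(sapply(parts, `[`, 2)),
  trimws(sapply(parts, `[`, 1))
)

# Fraction of deaths left to be allocated among the remaining causes.
remaining <- 1 - sum(fixed_values)
stopifnot(remaining >= 0)
```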
bound
A vector of strings that allows the user to set fixed lower and upper bounds for a given subset of the cause-specific mortality fractions (based on, e.g., information obtained from other sources or prior knowledge). For example, running va with bound=c("0.2 < HIV < 0.35", "0.1 < TB < 0.15") restricts the mortality fraction due to HIV to between 20% and 35% and the fraction due to TB to between 10% and 15%. Causes not specified here are bounded only by 0 and 1, and the full set of fractions is still constrained to the simplex (non-negative and summing to 1). The default is NA, which means no constraint is imposed.
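Before passing a bound vector, it may help to confirm that the bounds are mutually consistent with the simplex constraint. The sketch below assumes the "lower < cause < upper" format shown above and is a user-side check only, not the parser used by va.

```r
bound <- c("0.2 < HIV < 0.35", "0.1 < TB < 0.15")

# Split each string on "<" into lower bound, cause name, and upper bound.
pieces <- lapply(strsplit(bound, "<", fixed = TRUE), trimws)
bounds <- data.frame(
  cause = sapply(pieces, `[`, 2),
  lower = as.numeric(sapply(pieces, `[`, 1)),
  upper = as.numeric(sapply(pieces, `[`, 3))
)

# Each interval must be non-empty and lie in [0, 1]; the lower bounds
# together must leave room for the fractions to sum to 1.
stopifnot(all(bounds$lower < bounds$upper))
stopifnot(all(bounds$lower >= 0), all(bounds$upper <= 1))
stopifnot(sum(bounds$lower) <= 1)
```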
prob.wt
An integer flag (0 or 1) or a vector of weights that determines how likely each symptom is to be selected in a subset. When prob.wt is a user-supplied vector, it must be a vector of probabilities that sum to 1, and its length must equal the total number of symptoms. When prob.wt=1, binomial weights are used, which are proportional to the inverse of the variance of each reported binary symptom variable. When prob.wt=0, all symptoms are equally likely to be selected. Default=1.
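One way to build a user-supplied weight vector in the spirit of the binomial weights described above is sketched below. The symptom data are toy values, and the variance formula p(1-p) for a binary variable is an assumption about what "binomial weights" means here; the exact computation used internally when prob.wt=1 may differ.

```r
# Toy binary symptom data (hypothetical values, one column per symptom).
symptoms <- data.frame(
  fever     = c(1, 0, 1, 1, 0, 1),
  coughing  = c(0, 0, 1, 0, 0, 0),
  chestpain = c(1, 1, 0, 0, 1, 0)
)

# Variance of a binary variable with reporting rate p is p * (1 - p).
p <- colMeans(symptoms)
v <- p * (1 - p)
stopifnot(all(v > 0))   # symptoms with no variation have undefined weights

# Weights proportional to the inverse variance, normalized to sum to 1 and
# with one entry per symptom, as required for a user-supplied prob.wt.
prob_wt <- (1 / v) / sum(1 / v)
```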
boot.se
Logical value. If TRUE, bootstrap standard errors of the CSMF are estimated. This option typically takes a lot of computing time. The default is FALSE.
nboot
A positive integer. If boot.se=TRUE, it specifies the number of bootstrapping samples taken to estimate the standard errors of CSMF. The default is 300.
printit
Logical value. If TRUE, the progress of the estimation procedure is printed on the screen.
clean.data
Logical value. If TRUE, va automatically deletes the symptom variables (left-hand side of the formula) that show no variation (all 0s or all 1s). If FALSE, the user must make sure the data are cleaned beforehand (which is recommended).
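For users who prefer to clean the data beforehand, as recommended above, a minimal sketch of dropping symptoms with no variation from a single data frame is given below. The data are toy values, and the sketch does not claim to reproduce what clean.data=TRUE does internally.

```r
# Toy symptom data: coughing is all 0s and chestpain is all 1s.
symptoms <- data.frame(
  fever     = c(1, 0, 1, 1),
  coughing  = c(0, 0, 0, 0),
  chestpain = c(1, 1, 1, 1),
  dizziness = c(0, 1, 0, 1)
)

# A symptom has variation if it takes more than one distinct value.
has_variation <- vapply(symptoms, function(x) length(unique(x)) > 1, logical(1))

dropped <- names(symptoms)[!has_variation]
cleaned <- symptoms[, has_variation, drop = FALSE]
```

Any symptom removed this way should also be removed from the formula, and the same columns should be kept in both datasets so that variable names continue to match.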
print.reg.size
Logical value. If TRUE, the size of the regression matrix is printed at each step of subsampling. This helps the user choose the number of symptoms to subsample. It is recommended to print the size of the regression matrix for different values of nsymp, using a small value of n.subset.



Gary King 2010-09-01