codeVA: Running automated method on VA data

Description

Runs an automated cause-of-death assignment method on verbal autopsy (VA) data.

Usage

codeVA(
  data,
  data.type = c("WHO2012", "WHO2016", "PHMRC", "EAVA", "customize")[2],
  data.train = NULL,
  causes.train = NULL,
  causes.table = NULL,
  model = c("InSilicoVA", "InterVA", "Tariff", "NBC", "EAVA")[1],
  Nchain = 1,
  Nsim = 10000,
  version = c("4.02", "4.03", "5")[2],
  HIV = "h",
  Malaria = "h",
  phmrc.type = c("adult", "child", "neonate")[1],
  convert.type = c("quantile", "fixed", "empirical")[1],
  age_group = c("neonate", "child")[1],
  ...
)

Value

A fitted object; its class depends on the model used.
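As a hedged illustration of what can be done with the returned object, the sketch below fits InterVA-4 on the RandomVA1 example data and then summarizes the fit. It assumes the openVA helper functions getCSMF() and getTopCOD() are available in the installed version; it is not meant as the only way to inspect results.

```r
library(openVA)

# Example WHO 2012 format data referenced in the data.type argument.
data(RandomVA1)

# Fit InterVA-4; write = FALSE is passed through to the InterVA function
# so that no output files are written.
fit <- codeVA(data = RandomVA1, data.type = "WHO2012", model = "InterVA",
              version = "4.03", HIV = "h", Malaria = "l", write = FALSE)

# Printed summary of the fitted object.
summary(fit)

# Assumed openVA helpers: cause-specific mortality fractions and the
# most likely cause for each death.
csmf <- getCSMF(fit)
top <- getTopCOD(fit)
head(top)
```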

Arguments

data

Input VA data, see data.type below for more information about the format.

data.type

The codeVA function currently supports the following five input data types.

WHO2012: InterVA-4 input format using the WHO 2012 questionnaire. For an example, see data(RandomVA1). The first column should be the death ID.
WHO2016: InterVA-5 input format using the WHO 2016 questionnaire. For an example, see data(RandomVA5). The first column should be the death ID.
PHMRC: PHMRC data format. The raw PHMRC long format data will be processed internally following the steps described in McCormick et al. (2016). For an example, see ConvertData.phmrc.
EAVA: EAVA data format using the WHO 2016 questionnaire, as produced by EAVA::odk2EAVA().
customize: Any dichotomized dataset in which "Y" denotes "presence", "" denotes "absence", and "." denotes "missing". The first column should be the death ID.
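The coding used by the customized input type can be sketched in base R as follows. The raw data frame, its column names, and the helper to_customize() are hypothetical and serve only to illustrate the "Y" / "" / "." convention with the death ID in the first column.

```r
# Hypothetical raw survey answers; all names here are illustrative.
raw <- data.frame(
  ID = c("d1", "d2", "d3"),
  fever = c("yes", "no", NA),
  cough = c("no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Map one column of answers to the dichotomized coding:
# "Y" = presence, "" = absence, "." = missing.
# Anything other than "yes" or "no" is treated as missing.
to_customize <- function(x) {
  out <- rep(".", length(x))
  out[!is.na(x) & x == "yes"] <- "Y"
  out[!is.na(x) & x == "no"] <- ""
  out
}

# Keep the ID column first and recode every symptom column.
custom <- raw
custom[-1] <- lapply(raw[-1], to_customize)
custom
```

A data frame shaped like custom, with a cause-of-death column added to the training portion, is the kind of input the customized type expects.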

data.train

Training data with the same columns as data, plus one additional column giving the cause-of-death label. It is not used if data.type is a WHO type ("WHO2012" or "WHO2016") and model is "InterVA" or "InSilicoVA". For the WHO and "customize" types, the first column must also be the death ID.

causes.train

The name of the column in the training data that holds the cause-of-death label.

causes.table

List of causes to consider in the training data. Defaults to NULL, which uses all the causes present in the training data.

model

Currently supports five models: "InSilicoVA", "InterVA", "Tariff", "NBC", and "EAVA".

Nchain

Parameter specific to the "InSilicoVA" model. Currently not used.

Nsim

Parameter specific to the "InSilicoVA" model. Number of iterations to run the sampler.

version

Parameter specific to the "InterVA" model. Currently supports "4.02", "4.03", and "5". For InterVA-4, "4.03" is strongly recommended because it fixes several major bugs in "4.02", which is included only for backward compatibility. Version "5" implements the InterVA-5 model, which requires a different input data format.

HIV

Parameter specific to the "InterVA" model. HIV prevalence level; can take the values "h" (high), "l" (low), and "v" (very low).

Malaria

Parameter specific to the "InterVA" model. Malaria prevalence level; can take the values "h" (high), "l" (low), and "v" (very low).
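The InterVA-specific arguments (version, HIV, Malaria) can be combined as in the sketch below, which runs InterVA-5 on the RandomVA5 example data in WHO 2016 format. The prevalence levels chosen here are arbitrary and only illustrate the allowed values; write = FALSE is an extra argument passed through to the underlying InterVA function.

```r
library(openVA)

# Example WHO 2016 format data referenced in the data.type argument.
data(RandomVA5)

# InterVA-5 needs WHO 2016 input, so data.type and version must agree.
fit_v5 <- codeVA(data = RandomVA5, data.type = "WHO2016", model = "InterVA",
                 version = "5",
                 HIV = "l",       # low HIV prevalence (arbitrary choice)
                 Malaria = "v",   # very low malaria prevalence (arbitrary choice)
                 write = FALSE)

summary(fit_v5)
```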

phmrc.type

Which PHMRC data format is used. Currently only "adult" and "child" are supported; "neonate" will be supported in the next release.
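A possible workflow for PHMRC input is sketched below. It assumes that the openVA function getPHMRC_url() returns the location of the adult PHMRC file, that the file can be downloaded, and that it contains the columns site and gs_text34; the train/test split by site is only one illustrative choice.

```r
library(openVA)

# Assumption: getPHMRC_url("adult") points to the raw adult PHMRC data.
# This step needs network access and downloads the full file.
PHMRC_adult <- read.csv(getPHMRC_url("adult"))

# Illustrative split: hold out one study site as test data.
is_test <- which(PHMRC_adult$site == "AP")
test <- PHMRC_adult[is_test, ]
train <- PHMRC_adult[-is_test, ]

# The cause-of-death label column is assumed to be "gs_text34".
fit_phmrc <- codeVA(data = test, data.type = "PHMRC", model = "InterVA",
                    data.train = train, causes.train = "gs_text34",
                    phmrc.type = "adult", convert.type = "fixed",
                    write = FALSE)
```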

convert.type

Type of data conversion used when calculating the conditional probabilities (the probability of each symptom given each cause of death) for the InterVA and InSilicoVA models. "quantile" and "fixed" usually give similar results in practice.

quantile: the rankings of P(S|C) are obtained by matching quantiles to the distribution of the default InterVA P(S|C) table.
fixed: P(S|C) values are matched to the closest values in the default InterVA P(S|C) table.
empirical: no ranking is calculated; the empirical conditional probabilities are used directly, which forces updateCondProb to be FALSE for the InSilicoVA algorithm.
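The difference between the three conversion types can be illustrated with a small base R sketch. This is not the package's internal code: the probability levels, the default table entries, and the empirical values below are all hypothetical, and the quantile step is only one plausible reading of quantile matching.

```r
# Hypothetical allowed probability levels; the real ones come from the
# default InterVA P(S|C) table and are not reproduced here.
default_levels <- c(0.001, 0.01, 0.05, 0.2, 0.5, 0.8)

# Hypothetical default table entries (levels repeated as they might occur).
default_values <- c(0.001, 0.001, 0.01, 0.05, 0.05, 0.2, 0.5, 0.8)

# Hypothetical empirical P(S|C) values estimated from training data.
empirical <- c(0.04, 0.45, 0.7, 0.004)

# "fixed": snap each empirical value to the closest allowed level.
fixed_map <- sapply(empirical, function(p) {
  default_levels[which.min(abs(default_levels - p))]
})

# "quantile": replace each empirical value by the default value found at
# the same quantile position of the default distribution.
positions <- ecdf(empirical)(empirical)
quantile_map <- quantile(default_values, probs = positions,
                         type = 1, names = FALSE)

# "empirical": use the estimates unchanged.
empirical_map <- empirical

data.frame(empirical, fixed_map, quantile_map, empirical_map)
```

In this toy setting, the fixed mapping depends only on the distance to each level, while the quantile mapping depends on how the empirical values rank against one another.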

age_group

Parameter specific to the "EAVA" model; identifies the age group of the input VA data. Possible values are "neonate" and "child".

...

Other arguments passed to insilico, InterVA, interVA_train, tariff, and the nbc function in the nbc4va package. See the respective package documentation for details.

References

Tyler H. McCormick, Zehang R. Li, Clara Calvert, Amelia C. Crampin, Kathleen Kahn and Samuel J. Clark (2016) Probabilistic cause-of-death assignment using verbal autopsies. Journal of the American Statistical Association. https://arxiv.org/abs/1411.3042

James, S. L., Flaxman, A. D., Murray, C. J., & Population Health Metrics Research Consortium. (2011). Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Population Health Metrics, 9(1), 1-16.

Zehang R. Li, Tyler H. McCormick, Samuel J. Clark (2014) InterVA4: An R package to analyze verbal autopsy data. Center for Statistics and the Social Sciences Working Paper, No.146

http://www.interva.net/

Miasnikof P, Giannakeas V, Gomes M, Aleksandrowicz L, Shestopaloff AY, Alam D, Tollman S, Samarikhalaj, Jha P. Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths. BMC Medicine. 2015;13:286.

Henry D. Kalter, Abdoulaye-Mamadou Roubanatou, Alain Koffi, and Robert E. Black. (2015). Direct estimates of national neonatal and child cause-specific mortality proportions in Niger by expert algorithm and physician-coded analysis of verbal autopsy interviews. Journal of Global Health 5(1):010415.

Examples

library(openVA)

# Split the example data into a test set and a training set
data(RandomVA3)
test <- RandomVA3[1:200, ]
train <- RandomVA3[201:400, ]

# InSilicoVA trained on the labeled data; auto.length is passed to insilico
fit1 <- codeVA(data = test, data.type = "customize", model = "InSilicoVA",
               data.train = train, causes.train = "cause",
               Nsim = 1000, auto.length = FALSE)

# InterVA with high HIV and low malaria prevalence; write is passed to InterVA
fit2 <- codeVA(data = test, data.type = "customize", model = "InterVA",
               data.train = train, causes.train = "cause", write = FALSE,
               version = "4.02", HIV = "h", Malaria = "l")

# Tariff; nboot.sig is passed to tariff
fit3 <- codeVA(data = test, data.type = "customize", model = "Tariff",
               data.train = train, causes.train = "cause",
               nboot.sig = 100)