hyreg2_het: Estimating hybrid models accounting for heteroscedasticity in continuous data

Description

Estimation of a hybrid model using continuous and dichotomous data, e.g. EQ-5D data.

Usage

hyreg2_het(
  formula,
  formula_sigma = NULL,
  data,
  type,
  type_cont,
  type_dich,
  k = 1,
  control = NULL,
  stv = NULL,
  stv_sigma = NULL,
  offset = NULL,
  opt_method = "BFGS",
  optimizer = "optim",
  lower = -Inf,
  upper = Inf,
  latent = "both",
  id_col = NULL,
  classes_only = FALSE,
  variables_both = NULL,
  variables_dich = NULL,
  variables_cont = NULL,
  ...
)

Value

Model object of class flexmix (a list of models if latent is "cont" or "dich", see Details). Coefficients whose names end in "_h" are the coefficients for heteroscedasticity.

Arguments

formula: linear model formula. Using | xg includes a grouping variable xg, see Details.
formula_sigma: linear formula for sigma estimation. If formula_sigma is not provided, formula (excluding any grouping variable) is used by default, see Details.
data: a data.frame containing the data, see Details.
type: either the name of the column in data containing an indicator of whether an observation is continuous or dichotomous (as character), or a vector containing the indicator, see Details.
type_cont: value of type referring to continuous data, see Details.
type_dich: value of type referring to dichotomous data, see Details.
k: numeric. Number of latent classes to be estimated via flexmix::flexmix().
control: control list for flexmix::flexmix().
stv: named vector or list of named vectors containing start values for all coefficients in formula, including theta but not sigma (sigma is modeled via formula_sigma), see Details.
stv_sigma: named vector with start values for sigma estimation. Names must correspond to the coefficients of formula_sigma (including "(Intercept)" if the formula has an intercept), see Details.
offset: offset as in flexmix::flexmix()
opt_method: character, optimization method to be used in optimizer, default "BFGS".
optimizer: character, optimizer to be used in bbmle::mle2(), default "optim".
lower: numeric, lower bound for censored data, default -Inf. If this is used, opt_method must be set to "L-BFGS-B".
upper: numeric, upper bound for censored data, default Inf. If this is used, opt_method must be set to "L-BFGS-B".
latent: character, data type to use in component identification, must be one of "both", "cont" or "dich", default "both", see Details.
id_col: character, name of the grouping variable, only needed if latent != "both", see Details.
classes_only: logical, default FALSE, indicates whether the function should perform only classification rather than both classification and model estimation; only possible for latent != "both", see Details.
variables_both: character vector; variables to be fitted on both continuous and dichotomous data. If not specified, all variables from formula are used. If provided and not all variables from formula are included, variables_cont and variables_dich must be provided as well, while one of them can be NULL, see Details.
variables_dich: character vector; variables to be fitted only on dichotomous data. If provided, variables_both and variables_cont must be provided as well.
variables_cont: character vector; variables to be fitted only on continuous data. If provided, variables_both and variables_dich must be provided as well.
...: additional arguments for flexmix::flexmix() or bbmle::mle2()

formula

A classic R formula of the form y ~ x1 + x2 + … should be provided. Additionally, a grouping variable for repeated measures can be included by using "| xg", where xg is the column containing the group memberships. The resulting formula looks like this: y ~ x1 + x2 + … | xg. In flexmix, the variable after "|" is a grouping factor for repeated measurements: all observations with the same value of xg are assigned to the same latent class, so their likelihood contributions are combined when class memberships are computed. One possible grouping variable is an id number identifying the answers given by the same participant. We highly recommend using a grouping variable, since otherwise the algorithm for k = 2 tends to classify all continuous data into one estimated class and all dichotomous data into the other.
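The grouping part of such a formula can be separated from the model part with base R alone. The sketch below uses a hypothetical helper, split_grouped_formula() (not part of hyreg2_het), to obtain the formula without the grouping variable, which is the form needed, for example, when calling model.matrix() to check which start values are required.

```r
# Hypothetical helper (not part of the package): split a formula of the form
# y ~ x1 + x2 | xg into the model formula and the name of the grouping variable.
split_grouped_formula <- function(f) {
  rhs <- f[[3]]  # right-hand side; "|" binds more loosely than "+"
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    model <- f
    model[[3]] <- rhs[[2]]  # keep only the part before "|"
    list(model = model, group = all.vars(rhs[[3]]))
  } else {
    list(model = f, group = character(0))  # no grouping variable given
  }
}

parts <- split_grouped_formula(y ~ x1 + x2 + x3 | id)
parts$model  # formula without the grouping variable
parts$group  # name of the grouping variable
```

Because "|" has lower precedence than "+" in R, everything before "|" ends up in the first argument of the "|" call, so the model part stays intact.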

data

A data frame with the following columns: all independent variables (x) and the dependent variable y used in formula; one column for the grouping variable xg if grouping should be used (e.g. id numbers of participants with repeated measurements); and one column indicating whether each observation belongs to the continuous or the dichotomous data, with the entries type_cont and type_dich. For example, for a column called "type" with the entries "TTO" for continuous data points and "DCE" for dichotomous data points, type_cont is "TTO" and type_dich is "DCE". Each row corresponds to one observation (one data point).
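A small toy data set in this long format can be built with base R as sketched below. The values are random and only illustrate the layout; the column names (id, type, x1 to x3, y) and the labels "TTO" and "DCE" are assumptions chosen to match the example in the text.

```r
# Toy data in the long format described above (random values, illustration only).
set.seed(1)
n_id <- 3
toy_data <- data.frame(
  id   = rep(1:n_id, each = 4),                    # grouping variable xg
  type = rep(c("TTO", "TTO", "DCE", "DCE"), n_id), # continuous / dichotomous indicator
  x1   = rbinom(4 * n_id, 1, 0.5),
  x2   = rbinom(4 * n_id, 1, 0.5),
  x3   = rbinom(4 * n_id, 1, 0.5)
)
# Continuous rows get a value on a continuous scale, dichotomous rows a 0/1 choice.
toy_data$y <- ifelse(toy_data$type == "TTO",
                     round(runif(nrow(toy_data)), 2),
                     rbinom(nrow(toy_data), 1, 0.5))
type_cont <- "TTO"
type_dich <- "DCE"

# One row per observation; each id has both continuous and dichotomous rows.
table(toy_data$id, toy_data$type)
```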

start values (stv)

If the same start values stv are to be used for all latent classes, they must be given as a named vector. Otherwise (if different start values are assumed for each latent class), a list of named vectors should be used, with one entry per latent class. Each start value vector must include a start value for theta, and currently the name "theta" must be used for it. In contrast to hyreg2, the vectors must not include sigma, since hyreg2_het estimates sigma via formula_sigma and stv_sigma. If users are unsure for which variables start values must be provided, this can be checked by calling colnames(model.matrix(formula, data)). In this call, the formula should not include the grouping variable.
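The names needed for stv can be derived from model.matrix() as described above. The following is a minimal sketch on a made-up data frame; the start values themselves are arbitrary placeholders, and the object name stv_k2 is only illustrative.

```r
# Made-up data: only the column names matter for deriving start-value names.
toy_data <- data.frame(y  = c(0.5, 1, 0, 1), x1 = c(0, 1, 0, 1),
                       x2 = c(1, 0, 0, 1),   x3 = c(0, 0, 1, 1),
                       id = c(1, 1, 2, 2))
formula_no_group <- y ~ -1 + x1 + x2 + x3  # grouping variable left out
coef_names <- colnames(model.matrix(formula_no_group, toy_data))

# Same start values for every latent class: one named vector, plus theta
# (no sigma, since hyreg2_het models sigma through formula_sigma).
stv <- setNames(c(rep(0, length(coef_names)), 1), c(coef_names, "theta"))

# Different start values per class (here k = 2): one named vector per class.
stv_k2 <- list(
  setNames(c(0.2, 0, 1, 1),     c(coef_names, "theta")),
  setNames(c(0.5, 0.5, 0.5, 1), c(coef_names, "theta"))
)
```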

formula_sigma, stv_sigma

To account for heteroscedasticity in the continuous data, an additional formula formula_sigma and a vector of start values for it (stv_sigma) can be specified. formula_sigma must be linear, and stv_sigma must contain start values for all parameters used in formula_sigma. If neither formula_sigma nor stv_sigma is provided, the same inputs as for formula (without the grouping variable) and stv (without sigma) are used. The estimates for sigma can be identified in the model output by the suffix "_h". Note that, when using hyreg2_het, neither stv nor stv_sigma may include sigma, because sigma is estimated with its own formula (in contrast to hyreg2, where sigma must always be specified in stv).
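The names required in stv_sigma can be checked in the same way as for stv. In this sketch (made-up data, arbitrary start values), formula_sigma keeps its intercept, so "(Intercept)" must appear among the names, while "sigma" must not.

```r
# Made-up data for illustration only.
toy_data <- data.frame(y  = c(0.5, 1, 0, 1), x1 = c(0, 1, 0, 1),
                       x2 = c(1, 0, 0, 1),   x3 = c(0, 0, 1, 1))
formula_sigma <- y ~ x1 + x2 + x3  # linear, includes an intercept
sigma_names <- colnames(model.matrix(formula_sigma, toy_data))

# One start value per coefficient of formula_sigma (placeholder values).
stv_sigma <- setNames(rep(0.1, length(sigma_names)), sigma_names)
stv_sigma
```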

latent, id_col, classes_only

In some situations, it can be useful to identify the latent classes on only one type of data while estimating the model parameters on both types. In such cases, the argument latent specifies the type of data on which the classification is done. If "cont" or "dich" is used, id_col must be given as a character string naming the grouping variable used for classification. Groups that have only continuous or only dichotomous observations may be removed from the data. The estimation then proceeds in two steps. First, a model is estimated only on the continuous (or only on the dichotomous) data, and the resulting classification is stored. Second, using this classification, model parameters are estimated separately for each identified class on both types of data.

In this case, the output object is a list: each of the first k elements contains the estimated parameters for one latent class, and element k+1 is a data frame with the classifications from the first step. When classes_only is set to TRUE, the second step is skipped and only the estimated classes from the first step are returned.
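Before choosing latent = "cont" or latent = "dich", it may help to see which groups have observations of only one type. The base R sketch below is a hypothetical illustration, not the package's internal code; it assumes an id column (the id_col) and a type column with the entries "TTO" and "DCE".

```r
# Hypothetical check (not the package's internal code): which groups have
# both continuous ("TTO") and dichotomous ("DCE") observations?
toy_data <- data.frame(id   = c(1, 1, 2, 2, 3, 3),
                       type = c("TTO", "DCE", "TTO", "TTO", "DCE", "DCE"))
counts <- table(toy_data$id, toy_data$type)  # rows: ids, columns: types
keep_ids <- rownames(counts)[counts[, "TTO"] > 0 & counts[, "DCE"] > 0]

# Groups 2 and 3 have only one type of data; only group 1 has both.
data_kept <- toy_data[as.character(toy_data$id) %in% keep_ids, ]
```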

variables_both, variables_cont, variables_dich

It is possible to specify partial coefficients, i.e. coefficients that are used only on continuous or only on dichotomous data.

Example: Suppose different models should be specified for continuous and dichotomous data:
Model continuous data: y ~ x1 + x3
Model dichotomous data: y ~ x1 + x2
The formula input must then include all variables that occur in either model: y ~ x1 + x2 + x3
The assignment of variables to data types is then achieved via the input arguments variables_both, variables_cont and variables_dich:
variables_both = "x1",
variables_cont = "x3" and
variables_dich = "x2".
Every variable included in the provided formula (except the grouping variable) must appear in exactly one of these vectors. One of the variables_ vectors can also be NULL if no variables should be used only on that type of data.
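The "exactly one of these vectors" rule can be checked before fitting. The helper check_variable_split() below is hypothetical (not part of the package) and assumes the formula is passed without the grouping variable, since all.vars() would otherwise also return the grouping column.

```r
# Hypothetical helper: TRUE if every right-hand-side variable of the formula
# appears in exactly one of the three vectors (NULL vectors are allowed).
check_variable_split <- function(formula, both = NULL, cont = NULL, dich = NULL) {
  rhs_vars <- all.vars(formula[[3]])  # variables on the right-hand side
  assigned <- c(both, cont, dich)     # c() silently drops NULL entries
  setequal(rhs_vars, assigned) && anyDuplicated(assigned) == 0
}

ok_full    <- check_variable_split(y ~ x1 + x2 + x3, both = "x1", cont = "x3", dich = "x2")
ok_null    <- check_variable_split(y ~ x1 + x2 + x3, both = c("x1", "x2"), cont = "x3")
missing_x2 <- check_variable_split(y ~ x1 + x2 + x3, both = "x1", cont = "x3")
```

The first two calls follow the rule (the second with variables_dich left as NULL), while the third leaves x2 unassigned and therefore fails the check.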

Author

Svenja Elkenkamp, Kim Rand and John Grosser

Details

See the details for the individual inputs given above.

Examples

formula <- y ~ -1 + x1 + x2 + x3
formula_sigma <- y ~ x1 + x2 + x3

k <- 1
# start values for the coefficients of formula plus theta (no sigma in hyreg2_het)
stv <- setNames(c(0.2, 0, 1, 1), c(colnames(simulated_data_norm)[3:5], "theta"))
# start values for the coefficients of formula_sigma, including its intercept
stv_sigma <- setNames(c(0.2, 0, 1, 1), c(colnames(simulated_data_norm)[3:5], "(Intercept)"))
control <- list(iter.max = 1000, verbose = 4)
# remove a leftover 'counter' object from an earlier run, if one exists
if (exists("counter", inherits = FALSE)) rm(counter)
mod <- hyreg2_het(formula = formula,
                  formula_sigma = formula_sigma,
                  data = simulated_data_norm,
                  type = simulated_data_norm$type, # or "type"
                  stv = stv,
                  stv_sigma = stv_sigma,
                  k = k,
                  type_cont = "TTO",
                  type_dich = "DCE_A",
                  opt_method = "L-BFGS-B",
                  control = control,
                  latent = "both",
                  id_col = "id"
)
summary_hyreg2(mod)
