mlmm: mlmm function for missing response in multilevel model.

Description

The mlmm function fits a Bayesian multilevel model to responses that are not missing at random. It is motivated by the analysis of mass spectrometry data with response-dependent, not-missing-at-random missingness, where known covariates are associated with the missingness.

Usage

mlmm(formula_completed, formula_missing, formula_subject, pdata,
  respond_dep_missing = TRUE, pidname, sidname, iterno = 100, chains = 3,
  pathname, thin = 1, seed = 125, algorithm = "NUTS",
  warmup = floor(iterno/2), adapt_delta_value = 0.85, savefile = TRUE,
  usefit = T, stanfit)

Arguments

formula_completed

The main regression model formula. It uses the same formula format as lmr() and defines the first-level response and its explanatory variables.

formula_missing

The logistic regression model formula for the missingness indicator. It uses the same formula format as formula_completed.

formula_subject

The second-level formula in the multilevel model, which defines the second-level unit (such as the subject) and its explanatory variables.

pdata

The dataset containing the response and predictors in long format. The response is a vector, with an indicator variable identifying the unit each value belongs to. The data must include the following basic variables: an indicator variable for the first-level response unit, a second-level indicator variable for the subject or sampling unit, an indicator of missingness, and an indicator of censoring. Missingness and censoring are two distinct classifications, so no observation should be flagged as both missing and censored. See the example and the reference papers for the data structure.

respond_dep_missing

A logical indicator of whether the missingness depends on the response value.

pidname

Variable name defining the multilevel response unit, e.g. protein name or gene name.

sidname

Variable name defining the subject unit, e.g. patient id or sampling id.

iterno

Number of iterations for posterior sampling.

chains

rstan parameter defining the number of chains used for posterior sampling.

pathname

Path to save output summary results

thin

rstan parameter defining how often iterations are saved (every thin-th iteration is kept).

seed

Random seed passed to rstan.

algorithm

rstan parameter with three options: c("NUTS", "HMC", "Fixed_param").

warmup

Number of warmup (burn-in) iterations in Stan.

adapt_delta_value

The adaptive delta value, an adaptation parameter for the sampling algorithm. It must lie between 0 and 1; the default is 0.85.

savefile

A logical variable indicating whether the sampling files are saved.

usefit

A logical variable indicating whether the model uses an existing fit.

stanfit

The name of the fitted Stan model read from an .rds file.
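
As a rough guide to how iterno, warmup, thin and chains interact, the sketch below counts the posterior draws that would be retained. It assumes that mlmm() forwards these values unchanged to rstan's sampler, which is not confirmed here, and that (iterno - warmup) is divisible by thin. The helper name retained_draws is hypothetical and is not part of mlmm or rstan.

```r
# Hypothetical helper: counts retained post-warmup draws, assuming the
# arguments are passed straight through to the Stan sampler.
retained_draws <- function(iterno = 100, chains = 3, thin = 1,
                           warmup = floor(iterno / 2)) {
  per_chain <- (iterno - warmup) %/% thin  # post-warmup draws kept per chain
  c(per_chain = per_chain, total = per_chain * chains)
}

retained_draws()                         # defaults from the Usage section
retained_draws(iterno = 2000, thin = 2)  # longer run, every 2nd draw kept
```

With the defaults this gives 50 draws per chain and 150 in total, which is small for posterior summaries; the example call further down uses iterno = 5 only to keep the run short.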

Value

The function returns the result fitted by stan(). It contains the summarized parameters from all chains and summary results for each chain. The plot() function returns a visualization of the mean and parameters.
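
Because the stanfit argument refers to a fitted model read from an .rds file, a fit can in principle be written to disk and read back with base R. The sketch below shows only that round trip, using a placeholder list in place of a real fit. The file name is hypothetical, and the commented mlmm() call is an assumption about how usefit and stanfit are combined, not documented behaviour.

```r
# Placeholder standing in for the object returned by mlmm(); a real fit
# would come from a call such as the one in the Examples section.
fit <- list(note = "placeholder for a fitted model object")

rds_path <- file.path(tempdir(), "mlmm_fit.rds")  # hypothetical file name
saveRDS(fit, rds_path)              # write the object to disk
fit_reloaded <- readRDS(rds_path)   # read it back, e.g. in a later session

identical(fit, fit_reloaded)        # check that the round trip preserved it

# Assumed reuse pattern (not run; how mlmm() consumes the object is not
# specified on this page):
# mlmm(..., usefit = TRUE, stanfit = fit_reloaded)
```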

Examples

# NOT RUN {
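library(MASS)
library(mlmm)
set.seed(150)
# Covariate and treatment indicator for 1000 observations
var2 <- abs(rnorm(1000, 0, 1))
treatment <- c(rep(0, 500), rep(1, 500))
# Gene (first-level unit) and subject ids; every column has length 1000
geneid <- rep(1:100, 10)
sid <- c(rep(1:25, 20), rep(26:50, 20))
# Gene-level means: multivariate normal with a Wishart-distributed covariance
cov1 <- rWishart(1, df = 100, Sigma = diag(rep(1, 100)))
u <- rnorm(100, 0, 1)
mu <- mvrnorm(n = 1, mu = u, cov1[, , 1])
sdd <- rgamma(1, shape = 1, scale = 1/10)
# Response: covariate and treatment effects plus gene-specific noise
var1 <- (1/0.85) * var2 + 2 * treatment
for (i in 1:1000) var1[i] <- var1[i] + rnorm(1, mu[geneid[i]], sdd)
# Missingness depends on the covariate and, weakly, on the response itself
miss_logit <- var2 * (-0.9) + var1 * 0.01
probmiss <- exp(miss_logit) / (exp(miss_logit) + 1)
miss <- rbinom(1000, 1, probmiss)
table(miss)
pdata <- data.frame(var1, var2, treatment, miss, geneid, sid)
pdata$var1[pdata$miss == 1] <- NA  # blank out responses flagged as missing
pidname <- "geneid"
sidname <- "sid"
# Copy and paste the following formulas into the mlmm() call as needed
formula_completed <- var1 ~ var2 + treatment
formula_missing <- miss ~ var2
formula_censor <- censor ~ 1  # not used by the mlmm() call below
formula_subject <- ~ treatment
pathdir <- getwd()

model3 <- mlmm(formula_completed = var1 ~ var2 + treatment, formula_missing = miss ~ var2,
               formula_subject = ~ treatment, pdata = pdata, respond_dep_missing = TRUE,
               pidname = "geneid", sidname = "sid", pathname = pathdir,
               iterno = 5, chains = 1, savefile = FALSE)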
# }
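
The missingness mechanism simulated in the example can be looked at in isolation. The sketch below is a simplified illustration, not the model that mlmm() fits: the variable names, the response model, and the sample size are illustrative assumptions, and only the logit coefficients are borrowed from the example above. It draws a missingness indicator whose logit depends on a covariate and, weakly, on the response itself, which is what makes the data not missing at random.

```r
set.seed(1)
n <- 1000
x <- abs(rnorm(n))        # covariate associated with missingness
y <- 2 + x + rnorm(n)     # illustrative response, not the mlmm model

# Logit of the missingness probability: covariate effect plus a small
# dependence on the response value.
miss_logit <- -0.9 * x + 0.01 * y
prob_miss <- plogis(miss_logit)   # inverse logit, exp(l) / (exp(l) + 1)

miss <- rbinom(n, 1, prob_miss)   # 1 = response is missing
y_obs <- ifelse(miss == 1, NA, y) # observed response with gaps

all.equal(prob_miss, exp(miss_logit) / (exp(miss_logit) + 1))
mean(miss)                        # observed share of missing responses
```

plogis() is used here because it is the built-in inverse logit in base R; it yields the same probabilities as the explicit exp() expression in the example.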
