MEGB (version 0.1)

simLong: Simulate a low/high-dimensional, linear/nonlinear longitudinal dataset.

Description

Simulates data from a p-dimensional linear or nonlinear mixed-effects model given by: $$Y_i(t)=f(X_i(t))+Z_i(t)\beta_i+\epsilon_i$$ where \(Y_i(t)\) is the outcome at time \(t\) for the \(i\)th individual; \(X_i(t)\) are the input predictors (fixed effects) at time \(t\) for the \(i\)th individual; \(Z_i(t)\) is the random-effects design at time \(t\) for the \(i\)th individual, with random-effect coefficients \(\beta_i\); and \(\epsilon_i\) is the residual error with variance \(\sigma^2\). In the linear case, \(f(X_i(t)) = X_i(t)\theta\) with every component of \(\theta\) equal to 1; in the nonlinear case, the approach of Capitaine et al. (2021) is adapted.
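To make the linear case of the model concrete, here is a minimal base-R sketch. It is not the MEGB implementation: the function name `sim_linear_sketch` and all internal names are hypothetical, the random-effects design is assumed to be \(Z_i(t) = (1, t)\), every predictor is treated as relevant, and the predictors are drawn as independent normals (so the within-subject correlation controlled by rho_W is ignored).

```r
# Illustrative sketch of the linear model only; NOT the MEGB source code.
# All names here (sim_linear_sketch, b0, b1, ...) are hypothetical.
sim_linear_sketch <- function(n, p, time_points,
                              rho_Z = 0.5,
                              random_sd_intercept = 2,
                              random_sd_slope = 1,
                              noise_sd = 1) {
  id   <- rep(seq_len(n), each = time_points)
  time <- rep(seq_len(time_points), times = n)
  N    <- n * time_points

  # Fixed-effect predictors: independent standard normals (an assumption).
  X <- matrix(rnorm(N * p), nrow = N, ncol = p,
              dimnames = list(NULL, paste0("X", seq_len(p))))
  theta <- rep(1, p)  # every coefficient equals 1

  # One correlated (intercept, slope) pair per individual.
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  b0 <- random_sd_intercept * z1
  b1 <- random_sd_slope * (rho_Z * z1 + sqrt(1 - rho_Z^2) * z2)

  # Y_i(t) = X_i(t) theta + b0_i + b1_i * t + eps
  Y <- as.vector(X %*% theta) + b0[id] + b1[id] * time +
    rnorm(N, sd = noise_sd)

  data.frame(id = id, time = time, Y = Y,
             RandomIntercept = b0[id], RandomSlope = b1[id], X)
}

set.seed(1)
sketch <- sim_linear_sketch(n = 5, p = 3, time_points = 4)
dim(sketch)  # n * time_points rows, p + 5 columns
```

The sketch returns the same column layout that the Value section describes, which makes it easy to compare against an actual `simLong()` call.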

Usage

simLong(
  n,
  p,
  rel_p = 6,
  time_points,
  rho_W = 0.5,
  rho_Z = 0.5,
  random_sd_intercept = 2,
  random_sd_slope = 1,
  noise_sd = 1,
  linear = TRUE
)

Value

A data frame of dimension (n*time_points) by (p+5) containing the following elements:

  • id: vector of the individual IDs.

  • time: vector of the time realizations.

  • Y: vector of the outcome variable.

  • RandomIntercept: vector of the Random Intercept.

  • RandomSlope: vector of the Random Slope.

  • Vars: the remaining p columns, corresponding to the fixed-effect variables.
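The stated dimensions can be checked with a short sketch. The expected shape is computed from the description above; the comparison against a real call runs only if the MEGB package happens to be installed, and it reuses argument values from the Examples section. The variable names (`expected_dim`, `sim`) are illustrative.

```r
# Expected shape implied by the Value description.
n <- 17; p <- 6; time_points <- 10
expected_dim <- c(n * time_points, p + 5)
expected_dim  # 170 rows, 11 columns

# Compare with an actual simulation only when MEGB is available.
if (requireNamespace("MEGB", quietly = TRUE)) {
  set.seed(1)
  sim <- MEGB::simLong(n = n, p = p, rel_p = 6,
                       time_points = time_points, linear = FALSE)
  print(identical(dim(sim), as.integer(expected_dim)))
  print(names(sim)[1:5])  # the five non-predictor columns
}
```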

Arguments

n

[numeric]: Number of individuals.

p

[numeric]: Number of predictors.

rel_p

[numeric]: Number of relevant predictors (true predictors that are correlated with the outcome). The default value is rel_p=6 if linear and rel_p=2 if nonlinear.

time_points

[numeric]: Number of realizations (time points) per individual. The Usage signature sets no default, so a value must be supplied.

rho_W

[numeric]: Within-subject correlation. The default value is rho_W=0.5.

rho_Z

[numeric]: Correlation between intercept and slope for the random effect coefficients. The default value is rho_Z=0.5.

random_sd_intercept

[numeric]: Standard deviation of the random intercept. The default value in the Usage signature is random_sd_intercept=2; the example below uses \(\sqrt{0.5}\).

random_sd_slope

[numeric]: Standard deviation of the random slope. The default value in the Usage signature is random_sd_slope=1; the example below uses \(\sqrt{3}\).

noise_sd

[numeric]: Standard deviation of the residual error (noise). The default value in the Usage signature is noise_sd=1; the example below uses 0.5.

linear

[logical]: If TRUE, a linear mixed-effects model is simulated; otherwise, a semi-parametric model similar to the one used in Capitaine et al. (2021) is simulated.
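As a worked illustration of how rho_Z, random_sd_intercept and random_sd_slope fit together: under the usual parametrization of a bivariate random effect (an assumption here, since this page does not state the exact form MEGB uses), the covariance matrix of the random intercept and slope, written with the illustrative symbols \(\sigma_0\) (intercept standard deviation), \(\sigma_1\) (slope standard deviation) and \(\rho_Z\), would be:

$$\Sigma_\beta=\begin{pmatrix}\sigma_0^2 & \rho_Z\,\sigma_0\sigma_1\\ \rho_Z\,\sigma_0\sigma_1 & \sigma_1^2\end{pmatrix}$$

With the values shown in the Usage signature (random_sd_intercept = 2, random_sd_slope = 1, rho_Z = 0.5), the off-diagonal term is \(0.5 \times 2 \times 1 = 1\), giving:

$$\Sigma_\beta=\begin{pmatrix}4 & 1\\ 1 & 1\end{pmatrix}$$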

Examples

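library(MEGB)  # provides simLong()
set.seed(1)
# Simulate 17 individuals, 6 predictors (all relevant), 10 time points each,
# from the nonlinear (semi-parametric) model.
data <- simLong(n = 17, p = 6, rel_p = 6, time_points = 10, rho_W = 0.6, rho_Z = 0.6,
                random_sd_intercept = sqrt(0.5),
                random_sd_slope = sqrt(3),
                noise_sd = 0.5, linear = FALSE)
head(data)   # first six rows of the data
# Plot the outcome trajectories, one grey line per individual: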
w <- which(data$id==1)
plot(data$time[w],data$Y[w],type="l",ylim=c(min(data$Y),max(data$Y)), col="grey")
for (i in unique(data$id)){
  w <- which(data$id==i)
  lines(data$time[w],data$Y[w], col='grey')
}
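# Plot each fixed-effect predictor over time (panel titles need the latex2exp package):
oldpar <- par(no.readonly = TRUE)  # save graphical parameters to restore afterwards
oldopt <- options()
par(mfrow=c(2,3), mar=c(2,3,3,2))
for (i in seq_len(ncol(data[,-1:-5]))){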
  w <- which(data$id==1)
  plot(data$time[w],data[,-1:-5][w,i], col="grey",ylim=c(min(data[,-1:-5][,i]),
  max(data[,-1:-5][,i])),xlim=c(1,max(data$time)),main=latex2exp::TeX(paste0("$X^{(",i,")}$")))
  for (k in unique(data$id)){
    w <- which(data$id==k)
    lines(data$time[w],data[,-1:-5][w,i], col="grey")
  }
}
par(oldpar)
options(oldopt)
