simLong: Simulate a low- or high-dimensional, linear or nonlinear longitudinal dataset.

Description

Simulates a $p$-dimensional linear or nonlinear mixed-effects model given by $$Y_i(t)=f(X_i(t))+Z_i(t)\beta_i+\epsilon_i,$$ where $Y_i(t)$ is the outcome at time $t$ for the $i$th individual; $X_i(t)$ are the input predictors (fixed effects) at time $t$ for the $i$th individual; $Z_i(t)$ are the random-effect covariates at time $t$ for the $i$th individual, with $\beta_i$ the corresponding random-effect coefficients (a random intercept and a random slope); and $\epsilon_i$ is the residual error with variance $\sigma^2$. In the linear case, $f(X_i(t)) = X_i(t)\theta$ with every entry of $\theta$ equal to 1; in the nonlinear case, the approach of Capitaine et al. (2021) is adapted.
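To make the linear case of the model concrete, here is a minimal R sketch that simulates it directly. It is not the package's implementation, and `sim_lmm_sketch` is a hypothetical name. Several details are assumptions, not taken from the documentation: time runs over $t = 1, \dots, T$; $Z_i(t) = (1, t)$; each predictor is an AR(1) series within a subject with lag-1 correlation `rho_W`; and only the first `rel_p` predictors enter $f$ (with coefficient 1), the rest having coefficient 0.

```r
# Hypothetical sketch of the linear case (linear = TRUE); NOT simLong's source.
# Assumptions: t = 1, ..., time_points; Z_i(t) = (1, t); each predictor is an
# AR(1) series within a subject with lag-1 correlation rho_W; theta_j = 1 for
# the first rel_p predictors and 0 for the rest.
sim_lmm_sketch <- function(n, p, rel_p, time_points, rho_W = 0.5, rho_Z = 0.5,
                           random_sd_intercept = 2, random_sd_slope = 1,
                           noise_sd = 1) {
  stopifnot(rel_p <= p)
  t_grid <- seq_len(time_points)
  # Lower Cholesky factor of the AR(1) within-subject correlation matrix
  L_W <- t(chol(rho_W^abs(outer(t_grid, t_grid, "-"))))
  # Lower Cholesky factor of the 2 x 2 covariance of (intercept, slope)
  cov01 <- rho_Z * random_sd_intercept * random_sd_slope
  L_Z <- t(chol(matrix(c(random_sd_intercept^2, cov01,
                         cov01, random_sd_slope^2), 2, 2)))
  theta <- c(rep(1, rel_p), rep(0, p - rel_p))
  Z <- cbind(1, t_grid)                       # random-effect design Z_i(t)
  rows <- lapply(seq_len(n), function(i) {
    # time_points x p predictors; each column has AR(1) correlation over time
    X <- L_W %*% matrix(rnorm(time_points * p), time_points, p)
    colnames(X) <- paste0("X", seq_len(p))
    beta_i <- as.vector(L_Z %*% rnorm(2))     # (random intercept, random slope)
    Y <- as.vector(X %*% theta + Z %*% beta_i +
                   rnorm(time_points, sd = noise_sd))
    data.frame(id = i, time = t_grid, Y = Y,
               RandomIntercept = beta_i[1], RandomSlope = beta_i[2], X)
  })
  do.call(rbind, rows)
}

set.seed(1)
sim <- sim_lmm_sketch(n = 17, p = 6, rel_p = 2, time_points = 10)
dim(sim)   # 170 rows (17 x 10), 11 columns (6 + 5)
```

Drawing correlated vectors as a Cholesky factor times independent standard normals keeps the sketch in base R, with no dependency on `MASS::mvrnorm`.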

Usage

simLong(
  n,
  p,
  rel_p = 6,
  time_points,
  rho_W = 0.5,
  rho_Z = 0.5,
  random_sd_intercept = 2,
  random_sd_slope = 1,
  noise_sd = 1,
  linear = TRUE
)

Value

A data frame of dimension (n*time_points) by (p+5), containing the following columns:

id: vector of the individual IDs.
time: vector of the time points.
Y: vector of the outcome values.
RandomIntercept: vector of the random intercept.
RandomSlope: vector of the random slope.
Vars: the remaining p columns, corresponding to the fixed-effect predictors.
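The layout above can be checked and split with a small helper. `split_simlong` is hypothetical (not part of the package), and it assumes the five named columns come first, in the order listed, which matches how the example indexes the predictors with `data[,-1:-5]`. The toy data frame only illustrates the shape.

```r
# Hypothetical helper, not part of the package. Assumes the five bookkeeping
# columns come first, in the order listed in the Value section.
split_simlong <- function(df, n, p, time_points) {
  meta_cols <- c("id", "time", "Y", "RandomIntercept", "RandomSlope")
  stopifnot(is.data.frame(df),
            nrow(df) == n * time_points,       # one row per individual and time point
            ncol(df) == p + 5,                 # 5 bookkeeping columns + p predictors
            identical(names(df)[1:5], meta_cols))
  list(meta = df[, meta_cols],
       X = as.matrix(df[, -(1:5), drop = FALSE]))  # (n*time_points) x p predictors
}

# Toy data frame with the documented layout (n = 2, time_points = 3, p = 2)
toy <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, 2), Y = rnorm(6),
                  RandomIntercept = rep(c(0.1, -0.2), each = 3),
                  RandomSlope = rep(c(0.5, 0.3), each = 3),
                  X1 = rnorm(6), X2 = rnorm(6))
parts <- split_simlong(toy, n = 2, p = 2, time_points = 3)
dim(parts$X)   # 6 2
```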

Arguments

n: [numeric]: Number of individuals.
p: [numeric]: Number of predictors.
rel_p: [numeric]: Number of relevant predictors (predictors that actually affect the outcome). The default value is rel_p=6 if linear and rel_p=2 if nonlinear.
time_points: [numeric]: Number of realizations (time points) per individual. It has no default value in the usage above; the example uses time_points=10.
rho_W: [numeric]: Within-subject correlation. The default value is rho_W=0.5.
rho_Z: [numeric]: Correlation between intercept and slope for the random effect coefficients. The default value is rho_Z=0.5.
random_sd_intercept: [numeric]: Standard deviation of the random intercept. The default value is random_sd_intercept=2.
random_sd_slope: [numeric]: Standard deviation of the random slope. The default value is random_sd_slope=1.
noise_sd: [numeric]: Standard deviation $\sigma$ of the residual error $\epsilon_i$. The default value is noise_sd=1.
linear: [boolean]: If TRUE, a linear mixed-effects model is simulated; otherwise, a semi-parametric model similar to the one used in Capitaine et al. (2021) is simulated.
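The arguments random_sd_intercept, random_sd_slope and rho_Z combine into one 2 x 2 covariance matrix for the random intercept and slope. The sketch below shows this standard construction from two standard deviations and a correlation, evaluated at the usage defaults. How simLong applies rho_W internally is not documented, so the AR(1) form in `ar1_cor` is an assumption; both function names are hypothetical.

```r
# Covariance of (random intercept, random slope) built from two standard
# deviations and their correlation rho_Z: [s0^2, r*s0*s1; r*s0*s1, s1^2].
random_effects_cov <- function(random_sd_intercept, random_sd_slope, rho_Z) {
  cov01 <- rho_Z * random_sd_intercept * random_sd_slope
  matrix(c(random_sd_intercept^2, cov01, cov01, random_sd_slope^2), nrow = 2)
}

# Assumption (not documented): rho_W is a lag-1 AR(1) correlation, so time
# points |j - k| steps apart within a subject have correlation rho_W^|j - k|.
ar1_cor <- function(time_points, rho_W) {
  idx <- seq_len(time_points)
  rho_W^abs(outer(idx, idx, "-"))
}

Sigma <- random_effects_cov(2, 1, 0.5)   # usage defaults
Sigma                  # variances 4 and 1 on the diagonal, covariance 1 off it
ar1_cor(4, 0.5)[1, ]   # 1.000 0.500 0.250 0.125
```

With |rho_Z| < 1 and positive standard deviations the matrix is positive definite, so it can be passed to `chol()` to draw correlated random effects.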

Examples


set.seed(1)
# Simulate 17 individuals, 6 predictors and 10 time points (nonlinear case)
data <- simLong(n = 17, p = 6, rel_p = 6, time_points = 10, rho_W = 0.6, rho_Z = 0.6,
                random_sd_intercept = sqrt(0.5),
                random_sd_slope = sqrt(3),
                noise_sd = 0.5, linear = FALSE)
head(data)   # first six rows of the data
# Plot the outcome trajectory of every individual:
w <- which(data$id == 1)
plot(data$time[w], data$Y[w], type = "l", ylim = range(data$Y), col = "grey")
for (i in unique(data$id)) {
  w <- which(data$id == i)
  lines(data$time[w], data$Y[w], col = "grey")
}
# Plot the trajectories of each fixed-effect predictor:
oldpar <- par(no.readonly = TRUE)
oldopt <- options()
X <- data[, -(1:5)]   # fixed-effect predictor columns
par(mfrow = c(2, 3), mar = c(2, 3, 3, 2))
for (i in seq_len(ncol(X))) {
  w <- which(data$id == 1)
  plot(data$time[w], X[w, i], col = "grey", ylim = range(X[, i]),
       xlim = c(1, max(data$time)),
       main = latex2exp::TeX(paste0("$X^{(", i, ")}$")))
  for (k in unique(data$id)) {
    w <- which(data$id == k)
    lines(data$time[w], X[w, i], col = "grey")
  }
}
par(oldpar)     # restore graphical parameters
options(oldopt) # restore options
