gen_SMSNCens_sample: Generate simulated censored data under heavy-tailed distributions

Description

Simulates a univariate linear regression dataset with censored and/or missing values in the response variable, assuming that the errors follow a scale mixture of skew-normal (SMSN) distribution.

Usage

gen_SMSNCens_sample(
  n,
  x,
  beta,
  sigma2,
  lambda,
  nu,
  cens = "Int",
  pcens = 0,
  pna = 0,
  family = "ST"
)

Value

A list with the following components:

y: Fully observed response values (uncensored).
yc: Incomplete response values.
cc: Censoring indicator: 0 for observed values and 1 for censored or missing values.
UL: Vector of upper limits of the censoring interval. Equal to NULL for left or right censoring. For missing data, equal to Inf.

Arguments

n: Integer. Sample size to be generated.
x: Numeric matrix of covariates (dimension n x p). Must not contain missing values.
beta: Numeric vector of regression coefficients of length p.
sigma2: Positive numeric scalar. Scale parameter of the SMSN class.
lambda: Numeric scalar. Shape parameter that controls the skewness in the SMSN class. Ignored when family = "N", "T" or "CN".
nu: Distribution-specific parameter: for "ST" or "T", nu is a scalar > 2 (degrees of freedom); for "SCN" or "CN", a vector (nu1, nu2) with values in (0,1). Ignored for "SN" and "N".
cens: Character string indicating the type of censoring: "Left", "Right" or "Int". Default is "Int".
pcens: Proportion of censored observations. Must be between 0 and 1. Default is 0.
pna: Proportion of missing values (treated as extreme interval censoring). Must be between 0 and 1. Only allowed when cens = "Int". Default is 0.
family: Character string indicating the error distribution family. Possible values: "SN" (Skew-Normal), "ST" (Skew-t), "SCN" (Skew Contaminated Normal), "N" (Normal), "T" (Student-t) and "CN" (Contaminated Normal). Default is "ST".
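
To make the roles of sigma2, lambda and nu more concrete, the following base R sketch draws skew-t errors through the usual stochastic representation of the SMSN class (a skew-normal draw divided by the square root of a Gamma(nu/2, nu/2) mixing variable) and builds a response from x and beta. The helper r_skew_t is hypothetical and is not part of the package; the package's own generator may differ, for example in how the errors are located or centered.

```r
# Hypothetical helper for illustration only; not the package's implementation.
r_skew_t <- function(n, sigma2, lambda, nu) {
  delta <- lambda / sqrt(1 + lambda^2)
  # Standard skew-normal draw: delta * |T0| + sqrt(1 - delta^2) * T1
  z <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
  # Mixing variable U ~ Gamma(shape = nu/2, rate = nu/2) gives the skew-t
  u <- rgamma(n, shape = nu / 2, rate = nu / 2)
  sqrt(sigma2) * z / sqrt(u)
}

set.seed(1997)
n    <- 500
x    <- cbind(1, rnorm(n))   # intercept plus one covariate
beta <- c(2, -1)

# Errors with positive skewness (lambda > 0) and heavy tails (small nu)
eps <- r_skew_t(n, sigma2 = 1, lambda = 3, nu = 3)

# Linear predictor plus error gives the fully observed response
y <- as.vector(x %*% beta) + eps
```

With lambda = 0 the draw reduces to a symmetric Student-t error, which corresponds to the "T" family where lambda is ignored.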

Details

Incomplete observations are produced from the generated response as follows:

Left censoring: values below a cutoff point (determined by pcens) are replaced by that cutoff, indicating that the true value is less than or equal to it.
Right censoring: values above a cutoff point (also determined by pcens) are replaced by that cutoff, indicating that the true value is greater than or equal to it.
Interval censoring: a subset of observations is randomly selected (its size determined by pcens), and each selected value is replaced by an interval centered at the true value.
Missing data: an additional subset of observations (its size determined by pna) is replaced by unbounded intervals of the form (-Inf, Inf), representing complete uncertainty about the true value.
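
The procedures above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the cutoffs are taken as empirical quantiles of the response, the interval half-width (half_width) is a hypothetical constant, the names LL and UL for the interval limits are chosen here for clarity, and normal errors stand in for the SMSN errors.

```r
# Illustrative sketch only; the package's internal rules may differ.
set.seed(1)
n     <- 200
y     <- 2 - rnorm(n) + rnorm(n)  # stand-in response (normal errors assumed)
pcens <- 0.1
pna   <- 0.05

# Left censoring: values below the pcens quantile are set to the cutoff
cut_left <- quantile(y, probs = pcens, names = FALSE)
cc_left  <- as.numeric(y <= cut_left)
yc_left  <- pmax(y, cut_left)

# Right censoring: values above the (1 - pcens) quantile are set to the cutoff
cut_right <- quantile(y, probs = 1 - pcens, names = FALSE)
cc_right  <- as.numeric(y >= cut_right)
yc_right  <- pmin(y, cut_right)

# Interval censoring: a random subset is replaced by an interval
# centered at the true value (half_width is a hypothetical choice)
half_width <- 0.5
idx_int <- sample(n, size = round(pcens * n))
LL <- y
UL <- y
LL[idx_int] <- y[idx_int] - half_width
UL[idx_int] <- y[idx_int] + half_width

# Missing data: a further, disjoint subset becomes (-Inf, Inf)
idx_na <- sample(setdiff(seq_len(n), idx_int), size = round(pna * n))
LL[idx_na] <- -Inf
UL[idx_na] <- Inf

# Indicator: 1 for interval-censored or missing, 0 for observed
cc_int <- as.numeric(seq_len(n) %in% c(idx_int, idx_na))
```

In this sketch, mean(cc_int) equals pcens + pna up to rounding, and every true value lies inside its interval [LL, UL].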

Examples

set.seed(1997)

# Generate covariates and true parameter values
n      <- 500
x      <- cbind(1, rnorm(n))
beta   <- c(2, -1)
sigma2 <- 1
lambda <- 3
nu     <- 3

# Generate a simulated dataset under the SMSN-ICR model, with interval censoring and missing values
sample <- gen_SMSNCens_sample(n = n, x = x, beta = beta, sigma2 = sigma2,
                         lambda = lambda, nu = nu, cens = "Int",
                         pcens = 0.1, pna = 0.05, family = "ST")

# Fit the SMSN-ICR model using the generated data
fit <- CensRegSMSN(sample$cc, x, sample$yc, cens = "Int", UL = sample$UL, get.init = TRUE,
                   show.envelope = TRUE, family = "ST")