CensRegSMSN (version 0.0.1)

gen_SMSNCens_sample: Generate simulated censored data under heavy-tailed distributions

Description

Simulates a univariate linear regression dataset with censored and/or missing values in the response variable, assuming that the error follows a distribution in the scale mixtures of skew-normal (SMSN) class.

Usage

gen_SMSNCens_sample(
  n,
  x,
  beta,
  sigma2,
  lambda,
  nu,
  cens = "Int",
  pcens = 0,
  pna = 0,
  family = "ST"
)

Value

A list with the following components:

y

Fully observed response values (uncensored).

yc

Incomplete response values.

cc

Censoring indicator: 0 for observed cases and 1 for censored or missing cases.

UL

Vector of upper limits of the censoring intervals. NULL for left or right censoring. For missing observations, the upper limit is Inf.

Arguments

n

Integer. Sample size to be generated.

x

Numeric matrix of covariates (dimension n x p). Must not contain missing values.

beta

Numeric vector of regression coefficients of length p.

sigma2

Positive numeric scalar. Scale parameter of the SMSN class.

lambda

Numeric scalar. Shape parameter that controls the skewness in the SMSN class. Ignored when family = "N", "T" or "CN".

nu

Distribution-specific parameter: for "ST" or "T", nu is a scalar > 2 (degrees of freedom); for "SCN" or "CN", a vector (nu1, nu2) with values in (0,1). Ignored for "SN" and "N".

cens

Character string indicating the type of censoring: "Left", "Right" or "Int". Default is "Int".

pcens

Proportion of censored observations. Must be between 0 and 1. Default is 0.

pna

Proportion of missing values (treated as extreme interval censoring). Must be between 0 and 1. Only allowed when cens = "Int". Default is 0.

family

Character string indicating the error distribution family. Possible values: "SN" (Skew-Normal), "ST" (Skew-t), "SCN" (Skew Contaminated Normal), "N" (Normal), "T" (Student-t) and "CN" (Contaminated Normal). Default is "ST".
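
The constraints that family places on nu can be checked before calling the generator. The following is a minimal sketch in base R, written only from the argument descriptions above. The helper check_nu is hypothetical and is not part of CensRegSMSN; the package may validate its arguments differently.

```r
# Hypothetical helper (not part of CensRegSMSN): checks whether `nu`
# matches the constraints documented for each `family` value.
check_nu <- function(family, nu = NULL) {
  family <- match.arg(family, c("SN", "ST", "SCN", "N", "T", "CN"))
  if (family %in% c("ST", "T")) {
    # Degrees of freedom: a single value greater than 2
    return(is.numeric(nu) && length(nu) == 1L && nu > 2)
  }
  if (family %in% c("SCN", "CN")) {
    # Contaminated normal: a pair (nu1, nu2), each strictly inside (0, 1)
    return(is.numeric(nu) && length(nu) == 2L && all(nu > 0 & nu < 1))
  }
  TRUE  # "SN" and "N": nu is ignored, so any value is acceptable
}

check_nu("ST", 3)            # TRUE
check_nu("CN", c(0.1, 0.5))  # TRUE
check_nu("T", 1.5)           # FALSE
```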

Details

Incomplete observations are produced by applying the following procedures to the generated response variable:

  • Left censoring: values below a cutoff point (determined by pcens) are replaced by that cutoff, indicating that the true value is less than or equal to it.

  • Right censoring: values above a cutoff point (also determined by pcens) are replaced by that cutoff, indicating that the true value is greater than or equal to it.

  • Interval censoring: a subset of observations is randomly selected (with size determined by pcens), and each selected value is replaced by an interval centered at the true value.

  • Missing data: an additional subset of observations (with size determined by pna) is replaced by unbounded intervals of the form (-Inf, Inf), representing complete uncertainty about the true value.
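
The procedures listed above can be illustrated with a short base R sketch. This is not the package's implementation: it assumes that the cutoffs are empirical quantiles of the response, uses normal errors as a stand-in for SMSN errors, and picks an arbitrary interval half-width (half_width), since the page does not state how the interval width is chosen. The names lower and upper are also illustrative.

```r
set.seed(1)

n     <- 200
x     <- cbind(1, rnorm(n))
beta  <- c(2, -1)
y     <- as.vector(x %*% beta + rnorm(n))  # normal errors as a stand-in (assumption)
pcens <- 0.1
pna   <- 0.05

# Left censoring: values at or below the pcens-quantile are set to the cutoff.
cut_left <- quantile(y, probs = pcens, names = FALSE)
cc_left  <- as.integer(y <= cut_left)
yc_left  <- pmax(y, cut_left)

# Right censoring: values at or above the (1 - pcens)-quantile are set to the cutoff.
cut_right <- quantile(y, probs = 1 - pcens, names = FALSE)
cc_right  <- as.integer(y >= cut_right)
yc_right  <- pmin(y, cut_right)

# Interval censoring plus missing values on disjoint random subsets.
half_width <- 0.5  # hypothetical half-width of each censoring interval
idx_cens   <- sample.int(n, size = round(pcens * n))
idx_na     <- sample(setdiff(seq_len(n), idx_cens), size = round(pna * n))

cc_int <- integer(n)
cc_int[c(idx_cens, idx_na)] <- 1L

lower <- y
upper <- y
lower[idx_cens] <- y[idx_cens] - half_width  # interval centered at the true value
upper[idx_cens] <- y[idx_cens] + half_width
lower[idx_na]   <- -Inf                      # missing: unbounded interval (-Inf, Inf)
upper[idx_na]   <- Inf
```

For observed cases the lower and upper limits in this sketch both equal the true value, so the indicator cc_int is what distinguishes exact observations from censored or missing ones.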

Examples

set.seed(1997)

# Generate covariates and true parameter values
n      <- 500
x      <- cbind(1, rnorm(n))
beta   <- c(2, -1)
sigma2 <- 1
lambda <- 3
nu     <- 3

# Generate a simulated dataset under the SMSN-ICR model, with interval censoring and missing values
sample <- gen_SMSNCens_sample(n = n, x = x, beta = beta, sigma2 = sigma2,
                              lambda = lambda, nu = nu, cens = "Int",
                              pcens = 0.1, pna = 0.05, family = "ST")

# Fit the SMSN-ICR model using the generated data
fit <- CensRegSMSN(sample$cc, x, sample$yc, cens = "Int", UL = sample$UL, get.init = TRUE,
                   show.envelope = TRUE, family = "ST")