cnorm.shash: Fit a Sinh-Arcsinh (ShaSh) Regression Model for Continuous Norming

Description

This function fits a Sinh-Arcsinh (ShaSh; Jones & Pewsey, 2009) regression model for continuous norm score modeling, where the distribution parameters vary smoothly as polynomial functions of age or other predictors. The ShaSh distribution is well-suited for psychometric data as it can flexibly model skewness and tail weight independently, making it ideal for handling floor effects, ceiling effects, and varying degrees of individual differences across age groups. In a simulation study (Lenhard et al, 2019), the ShaSh model demonstrated superior performance compared to other parametric approaches from the Box Cox family of functions. In contrast to Box Cox, Sinh-Arcsinh can model distributions including zero and negativ values.

Usage

cnorm.shash(
  age,
  score,
  weights = NULL,
  mu_degree = 3,
  sigma_degree = 2,
  epsilon_degree = 1,
  delta = 1,
  delta_degree = NULL,
  control = NULL,
  scale = "T",
  plot = TRUE
)

Value

An object of class "cnormShaSh" containing the fitted model results. This is a list with components:

mu_est: Numeric vector of estimated coefficients for the location parameter mu(age). The first coefficient is the intercept, subsequent coefficients correspond to polynomial terms.
sigma_est: Numeric vector of estimated coefficients for the scale parameter log(sigma(age)). Note: These are coefficients for log(sigma) to ensure sigma > 0.
epsilon_est: Numeric vector of estimated coefficients for the skewness parameter epsilon(age).
delta: The fixed tail weight parameter value used in fitting.
delta_est: Numeric vector of estimated coefficients for the tail weight parameter delta(age) - in case, a degree has been set.
se: Numeric vector of standard errors for all estimated coefficients (if Hessian computation succeeds).
mu_degree, sigma_degree, epsilon_degree: The polynomial degrees used for each parameter.
result: Complete output from the optim function, including convergence information, log-likelihood value, and other optimization details.

Arguments

age

A numeric vector of predictor values (typically age, but can be any continuous predictor).

score

A numeric vector of response values (raw test scores). Must be the same length as age. The value range is unresticted and it can include zeros and negative values.

weights

An optional numeric vector of weights for each observation. Useful for incorporating sampling weights. If NULL (default), all observations are weighted equally.

mu_degree

Integer specifying the degree of the polynomial for modeling the location parameter mu(age). Default is 3. Higher degrees allow more flexible modeling of how the central tendency changes with age, but may lead to overfitting with small samples. Common choices:

1: Linear change with age
2: Quadratic change (allows one inflection point)
3: Cubic change (allows two inflection points, suitable for most developmental curves)
4+: Higher-order changes (use cautiously, mainly for large samples)

sigma_degree

Integer specifying the degree of the polynomial for modeling the scale parameter sigma(age). Default is 2. This controls how the variability (spread) of scores changes with age. Lower degrees are often sufficient as variability typically changes more smoothly than location.

epsilon_degree

Integer specifying the degree of the polynomial for modeling the skewness parameter epsilon(age). Default is 1. This controls how the asymmetry of the distribution changes with age.

delta

Fixed tail weight parameter (must be > 0). Default is 1. This parameter controls the heaviness of the distribution tails and is kept constant across all ages in this implementation.

delta = 1: Normal-like tail behavior (baseline)
delta > 1: Heavier tails, higher kurtosis (more extreme scores than normal distribution)
delta < 1: Lighter tails, lower kurtosis (fewer extreme scores than normal distribution)

delta_degree

Instead of the fixed delta parameter, you can also model delta as a polynomial function of age. The default setting is NULL, which means delta is fixed. If you specify an integer value (e.g., 1, 2), the function will fit a polynomial of that degree to model how tail weight changes with age. The degree has to be set explicitly and should be kept low (1 or 2) to avoid overfitting.

control

An optional list of control parameters passed to the optim function for maximum likelihood estimation. If NULL, sensible defaults are chosen automatically based on the model complexity. Common parameters to adjust:

factr: Controls precision of optimization (default: 1e-8)
maxit: Maximum number of iterations (default: n_parameters * 200)
lmm: Memory limit for L-BFGS-B (default: min(n_parameters, 20))

Increase maxit or decrease factr if optimization fails to converge.

scale

Character string or numeric vector specifying the type of norm scale for output. This affects the scaling of derived norm scores but does not influence model fitting:

"T": T-scores (mean = 50, SD = 10) - default
"IQ": IQ-like scores (mean = 100, SD = 15)
"z": z-scores (mean = 0, SD = 1)
c(M, SD): Custom scale with specified mean M and standard deviation SD

plot

Logical indicating whether to automatically display a diagnostic plot of the fitted model. Default is TRUE.

Author

Wolfgang Lenhard

Details

This implementation uses the Jones & Pewsey (2009) parameterization of the Sinh-Arcsinh distribution. Parameters are estimated using maximum likelihood via the L-BFGS-B algorithm.

The Sinh-Arcsinh Distribution

The ShaSh distribution is defined by the transformation: $$X = \mu + \sigma \cdot \sinh\left(\frac{\text{arcsinh}(Y) - \epsilon}{\delta}\right)$$ where Y is a standard normal variable, Y ~ N(0,1).

This transformation provides:

mu: Location parameter (similar to mean)
sigma: Scale parameter (similar to standard deviation)
epsilon: Skewness parameter (epsilon = 0 for symmetry)
delta: Tail weight parameter (delta = 1 for normal-like tails)

Model Selection

Choose polynomial degrees based on:

Sample size (higher degrees need more data)
Theoretical expectations about developmental trajectories
Model comparison criteria (AIC, BIC)
Visual inspection of fitted curves

For most applications, mu_degree = 3, sigma_degree = 2, epsilon_degree = 2 provides a good balance of flexibility and parsimony.

References

Jones, M. C., & Pewsey, A. (2009). Sinh-arcsinh distributions. *Biometrika*, 96(4), 761-780.

Lenhard, A., Lenhard, W., Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. PLoS ONE, 14(9), e0222279. https://doi.org/10.1371/journal.pone.0222279

Examples

Run this code

if (FALSE) {
# Basic usage with default settings
model <- cnorm.shash(age = children$age, score = children$raw_score)

# Custom polynomial degrees for complex developmental pattern
model_complex <- cnorm.shash(
  age = adolescents$age,
  score = adolescents$vocabulary_score,
  mu_degree = 4,      # Complex mean trajectory
  sigma_degree = 3,   # Changing variability pattern
  epsilon_degree = 2, # Skewness shifts
  delta = 1.3         # Slightly heavy tails
)

# Even more complex model, including a polynomial for the delta parameter
model_complex2 <- cnorm.shash(
  age = adolescents$age,
  score = adolescents$vocabulary_score,
  mu_degree = 4,      # Complex mean trajectory
  sigma_degree = 3,   # Changing variability pattern
  epsilon_degree = 2, # Skewness shifts
  delta_deree = 2     # Quadratic change for the tail weight
)

# Homogeneous population with light tails
model_selective <- cnorm.shash(
  age = gifted$age,
  score = gifted$achievement,
  delta = 0.8,        # Lighter tails for selected population
  sigma_degree = 1    # Simple linear variance change
)

# With sampling weights
model_weighted <- cnorm.shash(
  age = survey$age,
  score = survey$score,
  weights = survey$sample_weight
)

# Custom optimization control for difficult convergence
model_robust <- cnorm.shash(
  age = mixed$age,
  score = mixed$score,
  control = list(factr = 1e-6, maxit = 2000),
  delta = 1.5
)

# Compare model fit
compare(model_complex, model_complex2)
}

Run the code above in your browser using DataLab