This function fits a Sinh-Arcsinh (shash; Jones & Pewsey, 2009) regression model for continuous norm score modeling, where the distribution parameters vary smoothly as polynomial functions of age or other predictors. The shash distribution is well-suited for psychometric data as it can flexibly model skewness and tail weight independently, making it ideal for handling floor effects, ceiling effects, and varying degrees of individual differences across age groups. In a simulation study (Lenhard et al, 2019), the shash model demonstrated superior performance compared to other parametric approaches from the Box Cox family of functions. In contrast to Box Cox, Sinh-Arcsinh can model distributions including zero and negativ values.
cnorm.shash(
age,
score,
weights = NULL,
mu_degree = 3,
sigma_degree = 2,
epsilon_degree = 2,
delta_degree = 1,
delta = 1,
control = NULL,
scale = "T",
plot = TRUE
)
An object of class "cnormShash" containing the fitted model results. This is a list with components:
Numeric vector of estimated coefficients for the location parameter mu(age). The first coefficient is the intercept, subsequent coefficients correspond to polynomial terms.
Numeric vector of estimated coefficients for the scale parameter log(sigma(age)). Note: These are coefficients for log(sigma) to ensure sigma > 0.
Numeric vector of estimated coefficients for the skewness parameter epsilon(age).
The fixed tail weight parameter value used in fitting.
Numeric vector of estimated coefficients for the tail weight parameter delta(age) - in case, a degree has been set.
Numeric vector of standard errors for all estimated coefficients (if Hessian computation succeeds).
The polynomial degrees used for each parameter.
Complete output from the optim
function, including convergence information,
log-likelihood value, and other optimization details.
A numeric vector of predictor values (typically age, but can be any continuous predictor).
A numeric vector of response values (raw test scores). Must be the same length as age. The value range is unresticted and it can include zeros and negative values.
An optional numeric vector of weights for each observation. Useful for incorporating sampling weights. If NULL (default), all observations are weighted equally.
Integer specifying the degree of the polynomial for modeling the location parameter mu(age). Default is 3. Higher degrees allow more flexible modeling of how the central tendency changes with age, but may lead to overfitting with small samples. Common choices:
1: Linear change with age
2: Quadratic change (allows one inflection point)
3: Cubic change (allows two inflection points, suitable for most developmental curves)
4+: Higher-order changes (use cautiously, mainly for large samples)
Integer specifying the degree of the polynomial for modeling the scale parameter sigma(age). Default is 2. This controls how the variability (spread) of scores changes with age. Lower degrees are often sufficient as variability typically changes more smoothly than location.
Integer specifying the degree of the polynomial for modeling the skewness parameter epsilon(age). Default is 2. This controls how the asymmetry of the distribution changes with age.
Integer specifying the plynomial for modelling the tail weight parameter delte(age). Default is 1. The tail weight can be fixed as well in case of numerical instability. In that case, set 'delta_degree' to NULL and specify a value for delta instead. Recommendation: Keep delta_degree low to avoid overfitting.
Fixed tail weight parameter (must be > 0). Default is 1. This parameter controls the heaviness of the distribution tails and is kept constant across all ages in this implementation. It is only used, if 'delta_degree' is set to NULL. Common values:
delta = 1: Normal-like tail behavior (baseline)
delta > 1: Heavier tails, higher kurtosis (more extreme scores than normal distribution)
delta < 1: Lighter tails, lower kurtosis (fewer extreme scores than normal distribution)
An optional list of control parameters passed to the optim
function for
maximum likelihood estimation. If NULL, sensible defaults are chosen automatically based on
the model complexity. Common parameters to adjust:
factr
: Controls precision of optimization (default: 1e-8)
maxit
: Maximum number of iterations (default: n_parameters * 200)
lmm
: Memory limit for L-BFGS-B (default: min(n_parameters, 20))
Increase maxit
or decrease factr
if optimization fails to converge.
Character string or numeric vector specifying the type of norm scale for output. This affects the scaling of derived norm scores but does not influence model fitting:
"T": T-scores (mean = 50, SD = 10) - default
"IQ": IQ-like scores (mean = 100, SD = 15)
"z": z-scores (mean = 0, SD = 1)
c(M, SD): Custom scale with specified mean M and standard deviation SD
Logical indicating whether to automatically display a diagnostic plot of the fitted model. Default is TRUE.
Wolfgang Lenhard
This implementation uses the Jones & Pewsey (2009) parameterization of the Sinh-Arcsinh distribution. Parameters are estimated using maximum likelihood via the L-BFGS-B algorithm. In case, optimization fails, try reducing model complexity by reducing polynomial degrees or fixing the delta parameter.
The shash distribution is defined by the transformation: $$X = \mu + \sigma \cdot \sinh\left(\frac{\text{arcsinh}(Y) - \epsilon}{\delta}\right)$$ where Y is a standard normal variable, Y ~ N(0,1).
This transformation provides:
mu: Location parameter (similar to mean)
sigma: Scale parameter (similar to standard deviation)
epsilon: Skewness parameter (epsilon = 0 for symmetry)
delta: Tail weight parameter (delta = 1 for normal-like tails)
Choose polynomial degrees based on:
Sample size (higher degrees need more data)
Theoretical expectations about developmental trajectories
Model comparison criteria (AIC, BIC)
Visual inspection of fitted curves
For most applications, mu_degree = 3, sigma_degree = 2, epsilon_degree = 2, delta_degree = 1 provides a good balance of flexibility and parsimony.
Jones, M. C., & Pewsey, A. (2009). Sinh-arcsinh distributions. *Biometrika*, 96(4), 761-780.
Lenhard, A., Lenhard, W., Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. *PLoS ONE*, 14(9), e0222279. https://doi.org/10.1371/journal.pone.0222279
if (FALSE) {
# Basic usage with default settings
model <- cnorm.shash(age = children$age, score = children$raw_score)
# Custom polynomial degrees for complex developmental pattern
model_complex <- cnorm.shash(
age = adolescents$age,
score = adolescents$vocabulary_score,
mu_degree = 4, # Complex mean trajectory
sigma_degree = 3, # Changing variability pattern
epsilon_degree = 2, # Skewness shifts
epsilon_degree = NULL, # set to NULL to activate fixed delta
delta = 1.3 # Slightly heavy tails
)
# With sampling weights
model_weighted <- cnorm.shash(
age = survey$age,
score = survey$score,
weights = survey$sample_weight
)
# Custom optimization control for difficult convergence
model_robust <- cnorm.shash(
age = mixed$age,
score = mixed$score,
control = list(factr = 1e-6, maxit = 2000),
delta = 1.5
)
# Compare model fit
compare(model, model_complex)
}
Run the code above in your browser using DataLab