The shash
family implements the four-parameter sinh-arcsinh (shash) distribution of
Jones and Pewsey (2009). The location, scale, skewness and kurtosis of the density can depend
on additive smooth predictors. Useable only with gam, the linear predictors are specified
via a list of formulae. It is worth carefully considering whether the data are sufficient to support
estimation of such a flexible model before using it.
shash(link = list("identity", "logeb", "identity", "identity"),
b = 1e-2, phiPen = 1e-3)
An object inheriting from class general.family
.
vector of four characters indicating the link function for location, scale, skewness and kurtosis parameters.
positive parameter of the logeb link function, see Details.
positive multiplier of a ridge penalty on kurtosis parameter. Do not touch it unless you know what you are doing, see Details.
Matteo Fasiolo <matteo.fasiolo@gmail.com> and Simon N. Wood.
The density function of the shash family is
shash
can model skewness to either side, depending on the sign of
The density is based on the expression given on the second line of section 4.1 and equation (2) of Jones and Pewsey (2009), and uses the simple reparameterization given in section 4.3.
The link function used for
Jones, M. and A. Pewsey (2009). Sinh-arcsinh distributions. Biometrika 96 (4), 761-780. tools:::Rd_expr_doi("10.1093/biomet/asp053")
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 tools:::Rd_expr_doi("10.1080/01621459.2016.1180986")
###############
# Shash dataset
###############
## Simulate some data from shash
set.seed(847)
n <- 1000
x <- seq(-4, 4, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(4, 1, 1)
mu <- X %*% beta
sigma = .5+0.4*(x+4)*.5 # Scale
eps = 2*sin(x) # Skewness
del = 1 + 0.2*cos(3*x) # Kurtosis
dat <- mu + (del*sigma)*sinh((1/del)*asinh(qnorm(runif(n))) + (eps/del))
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")
plot(x, dat, xlab = "x", ylab = "y")
## Fit model
fit <- gam(list(y ~ s(x), # <- model for location
~ s(x), # <- model for log-scale
~ s(x), # <- model for skewness
~ s(x, k = 20)), # <- model for log-kurtosis
data = dataf,
family = shash, # <- new family
optimizer = "efs")
## Plotting truth and estimates for each parameters of the density
muE <- fit$fitted[ , 1]
sigE <- exp(fit$fitted[ , 2])
epsE <- fit$fitted[ , 3]
delE <- exp(fit$fitted[ , 4])
par(mfrow = c(2, 2))
plot(x, muE, type = 'l', ylab = expression(mu(x)), lwd = 2)
lines(x, mu, col = 2, lty = 2, lwd = 2)
legend("top", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)
plot(x, sigE, type = 'l', ylab = expression(sigma(x)), lwd = 2)
lines(x, sigma, col = 2, lty = 2, lwd = 2)
plot(x, epsE, type = 'l', ylab = expression(epsilon(x)), lwd = 2)
lines(x, eps, col = 2, lty = 2, lwd = 2)
plot(x, delE, type = 'l', ylab = expression(delta(x)), lwd = 2)
lines(x, del, col = 2, lty = 2, lwd = 2)
## Plotting true and estimated conditional density
par(mfrow = c(1, 1))
plot(x, dat, pch = '.', col = "grey", ylab = "y", ylim = c(-35, 70))
for(qq in c(0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999)){
est <- fit$family$qf(p=qq, mu = fit$fitted)
true <- mu + (del * sigma) * sinh((1/del) * asinh(qnorm(qq)) + (eps/del))
lines(x, est, type = 'l', col = 1, lwd = 2)
lines(x, true, type = 'l', col = 2, lwd = 2, lty = 2)
}
legend("topleft", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)
#####################
## Motorcycle example
#####################
# Here shash is overkill, in fact the fit is not good, relative
# to what we would get with mgcv::gaulss
library(MASS)
b <- gam(list(accel~s(times, k=20, bs = "ad"), ~s(times, k = 10), ~1, ~1),
data=mcycle, family=shash)
par(mfrow = c(1, 1))
xSeq <- data.frame(cbind("accel" = rep(0, 1e3),
"times" = seq(2, 58, length.out = 1e3)))
pred <- predict(b, newdata = xSeq)
plot(mcycle$times, mcycle$accel, ylim = c(-180, 100))
for(qq in c(0.1, 0.3, 0.5, 0.7, 0.9)){
est <- b$family$qf(p=qq, mu = pred)
lines(xSeq$times, est, type = 'l', col = 2)
}
plot(b, pages = 1, scale = FALSE)
Run the code above in your browser using DataLab