tobit1: Tobit I family for censored GAM

Description

This function implements the Tobit I family for the mgcv package.

Usage

tobit1(link="identity", left.threshold=-Inf, right.threshold=Inf, 
theta=NULL, initial.theta=0)

Value

An object inheriting from class family for use with the mgcv package.

Arguments

link: The link function: one of "log", "identity", "inverse", "sqrt", or a power link.
left.threshold: Threshold value, for which response values below this will be censored. Can be a scalar or a vector of same length as the response.
right.threshold: Threshold value, for which response values above this will be censored. Can be a scalar or a vector of same length as the response.
theta: The log std error. If left NULL, it is estimated.
initial.theta: Optional parameter to set initial theta value if it is being estimated - change this value if the variance is very different from 1.

Details

Under the Tobit I model, given a value sigma and a conditional mean of mu, and left and right threshold values lt and rt, response values y are distributed as

lt if z < lt

rt if z > rt

z otherwise

where z is distribution Normal(mu, sigma). (Note that we have extended the model to include right as well as left censoring.

This function allows a non-linear relationship be estimated between mu and the covariates in a restricted maximum likelihood approach, via application of Wood (2016). We allow differing levels of censorship across dataset by allowing the left and right thresholds to be different between data points.

See the examples for more details of how to fit in practice.

References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association. <URL: http://arxiv.org/abs/1511.03864>

Examples

Run this code

# Generate random data
set.seed(1)
x <- matrix(2*rnorm(300), 100)
yn <- 2*x[,3] + 4*cos(x[,1]*2)
y <- yn + rnorm(100)
ycensored <- pmax(y, 0) # data left-censored at 0
ycensored <- pmin(ycensored, 4) # data right-censored at 4

par(mfrow = c(3,3))

# True model
plot(gam(y ~ s(x[,1]) + s(x[,2]) + s(x[, 3])), ylim=c(-5, 5), main = "True")

# Naive estimation
plot(gam(ycensored ~ s(x[,1]) + s(x[,2]) + s(x[, 3])), ylim=c(-5, 5), main = "Naive")

# Tobit I estimation
m <- gam(ycensored ~ s(x[,1]) + s(x[,2]) + s(x[, 3]), family = tobit1(left.threshold=0))
summary(m) #note x[,2] is not significant
m$family$getTheta(FALSE) #estimate of theta
m$family$getTheta(TRUE) #estimate of sigma = exp(theta)
plot(m, ylim = c(-5, 5), main = "Tobit I") 

# More realistic dataset requires some data processing
data(nutrient)
# For this dataset, all values >1 are censored. At location D values are censored below at 0, other
# locations are censored at 0.1.
head(nutrient)
nut = data.frame(day = nutrient$day, location = factor(nutrient$location), y=as.numeric(nutrient$y))
table(nutrient$y[is.na(nut$y)])
summary(nut$y)
nut$upper = 10
nut$lower = ifelse(nut$location == "D", 0, 1)
# Recode the data to communicate which is and isn't censored
nut$y[nutrient$y %in% c("<1", "<0")] = -Inf
nut$y[nutrient$y %in% c(">10")] = Inf
# Missing values are best removed here, or can cause confusion later
nut = na.omit(nut)

# Fit including a random effect for location
m = gam(y~ s(day) + s(location, bs="re") , data = nut, 
family = tobit1(left.threshold=nut$lower, right.threshold = nut$upper))

gam.vcomp(m)
anova(m)
summary(m)

Run the code above in your browser using DataLab