glmssn: Fitting Generalized Linear Models for Spatial Stream Networks

Description

This function works on objects of class SpatialStreamNetwork to fit generalized linear models with spatially autocorrelated errors using normal likelihood methods (including REML) and quasi-likelihood for Poisson and Binomial families. The spatial formulation is described in Ver Hoef and Peterson (2010) and Peterson and Ver Hoef (2010).

Usage

glmssn(formula, ssn.object, family = "Gaussian", CorModels = c("Exponential.tailup","Exponential.taildown","Exponential.Euclid"), use.nugget = TRUE, use.anisotropy = FALSE, addfunccol = NULL, trialscol = NULL,  EstMeth = "REML", useTailDownWeight = FALSE, trans.power = NULL,trans.shift = 0, control = list(max.range.factor = 4, trunc.pseudo = NULL, maxiter.pseudo = 20, beta.converge = 1e-05))

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

ssn.object

an object of class SpatialStreamNetwork, representing a spatial stream network. This contains the variables used in the model.

family

the error distribution and link function to be used in the model. This is a character string that is either "Gaussian", "Poisson", or "Binomial."

CorModels

a vector of spatial autocorrelation models for stream networks. The individual models should be of different "types" tail-up, tail-down, Euclidean, or NULL for a non-spatial model. The tailup models include: "Exponential.tailup" (default), "LinearSill.tailup", "Spherical.tailup", "Mariah.tailup" "Epanech.tailup"; tail-down models include: "Exponential.taildown" (default), "LinearSill.taildown", "Spherical.taildown", "Mariah.taildown", "Epanech.taildown"; Euclidean distance models include: "Spherical.Euclid", "Gaussian.Euclid", "Exponential.Euclid" (default), "Cauchy.Euclid". The first 4 tailup and taildown models are described in Ver Hoef and Peterson (2010), and the "Epanech" models are described in Garreta, Monestiez, and Ver Hoef (2010), and the 4 Euclidean distance models are standard spatial covariance models. If this is NULL, then use.nugget = TRUE will impose independence between observations, or a classical regression analysis non-spatial model. Basic random effects can be included in the model here also. See examples below.

use.nugget

add a nugget effect, default is TRUE. This can be thought of as a variance component for independent errors, adding a variance component only along the diagonal of the covariance matrix.

use.anisotropy

use anistropy for the Euclidean distance based spatial model in CorModels

addfunccol

the name of the variable in the SpatialStreamNetwork object that is used to define spatial weights. For the tailup models, weights need to be used for branching. This is an additive function and is described in Ver Hoef and Peterson (2010). See example below.

trialscol

name of the variable in the SpatialStreamNetwork object that contains the sample size when a binomial distribution is used. If NULL, a sample size of 1 is assumed, and the response variable must be binary (0 or 1).

EstMeth

Estimation method; either "ML" for maximum likelihood, or "REML" for restricted maximum likelihood (default).

useTailDownWeight

use stream segment weighting in the tail-down model? Default is FALSE. Weighting is same as for tail-up models, based on an additive function.

trans.power

power transformation for the response variable in case of Gaussian data. It must be between 0 and 0.5, and if 0, a natural log is used.

trans.shift

a shift (addition or subtraction) applied to the response variable prior to the power tranformation

control

a list of control parameters, consisting of four items: 1) max.range.factor; this sets the maximum range as a function of the maximum distance among observed data locations, 2) trunc.pseudo; this sets a truncation value for pseudo-data for the quasi-models (family binomial and poisson). Because the data are modeled on a log or logit scale, exponentiation can cause numerical overflows, so this sets an upper bound, 3) maxiter.pseudo; this sets the maximum number of iterations when creating pseudo data for quasi-models. 4)beta.converge; this sets convergence criteria on fixed effect estimates. When all changes in the fixed effect estimates are less than beta.converge during an iteratively reweighted least squares update, then iteration stops. The default setting for control is control = list(max.range.factor = 4, trunc.pseudo = NULL, maxiter.pseudo = 20, beta.converge = 1e-5)

Value

glmssn returns an object of class "glmssn". This is a list of 5 objects, with the following structure:

    outpt <- list(
    args = list( ## stores all arguments used in function call
    formula = formula,
    zcol = dataXY.out$respvecs$response.col, # response column
    family = family,
    CorModels = CorModels,
    useTailDownWeights = useTailDownWeights,
    use.nugget = use.nugget,
    use.anisotropy = use.anisotropy,
    addfunccol = addfunccol,
    trialscol = trialscol,
    EstMeth = EstMeth,
    trans.power = trans.power,
    trans.shift = trans.shift
    ),ssn.object = ssn.object, # input object of class "SpatialStreamNetwork"sampinfo = list( # sample information
    # indicator vector for non-missing response values
    ind.obs = ind[order(data[,"pid"])], 
    sample.size = n.all, # total number of records in the data frame
    # number of records with non-missing response values
    obs.sample.size = n.allxy, 
    missing.sample.size = n.all - n.allxy, # number of missing response values
    rankX = p, # rank of X
    # vector of the response variable
    z = zt[order(dataXY.out$datasets$data2[,"pid"])], 
    X = X2[order(dataXY.out$datasets$data2[,"pid"]),], # design matrix
    effnames = dataXY.out$Xmats$effnames,
    setzero = dataXY.out$indvecs$setzero,
    setNA = dataXY.out$indvecs$setNA,
    setNA2 = dataXY.out$indvecs$setNA2,
    cutX1toX2 = dataXY.out$indvecs$cutX1toX2,
    StdXDataFrame = dataXY.out$Xmats$StdXDataFrame
    ),estimates = list(
    theta=parmest, # estimated covariance parameters
    # estimated covariance matrix
    V = V[order(dataXY.out$datasets$data2[,"pid"]),
    order(dataXY.out$datasets$data2[,"pid"])],
    # inverse of estimated covariance matrix
    Vi = Vi[order(dataXY.out$datasets$data2[,"pid"]), 
    order(dataXY.out$datasets$data2[,"pid"])],
    betahat = b.hat, # estimated fixed effects
    covb = covb, # estimated covariance matrix of estimated fixed effects
    # inverse of estimated covariance matrix of estimated fixed effects
    covbi = covbi, 
    m2LL = m2LL # -2 times log-likelihood
    ),optimOutput=parmest.out
    )

Details

Models for glmssn are specified symbolically, similar to lm and other models in R. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of fixed effect linear predictors for the response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second. See model.matrix for further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on. A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.

The spatial formulation is described in Ver Hoef and Peterson (2010) and Peterson and Ver Hoef (2010).

References

Garreta, V., Monestiez, P. and Ver Hoef, J.M. (2010) Spatial modelling and prediction on river networks: up model, down model, or hybrid? Environmetrics 21(5), 439--456.

Peterson, E.E. and Ver Hoef, J.M. (2010) A mixed-model moving-average approach to geostatistical modeling in stream networks. Ecology 91(3), 644--651.

Ver Hoef, J.M. and Peterson, E.E. (2010) A moving average approach for spatial statistical models of stream networks (with discussion). Journal of the American Statistical Association 105, 6--18. DOI: 10.1198/jasa.2009.ap08248. Rejoinder pgs. 22--24.

Examples

Run this code


library(SSN)
# NOT RUN 
# mf04p <- importSSN(system.file("lsndata/MiddleFork04.ssn", 
#	  package = "SSN1.1"), predpts = "pred1km", o.write = TRUE)
#   use SpatialStreamNetwork object mf04p that was already created
data(mf04p)
#make sure mf04p has the correct path, will vary for each users installation
mf04p@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")

# The models take a little time to fit, so they are NOT RUN
# Uncomment the code to run them
# Alternatively, you can load the fitted models first to look at results
data(modelFits)

## Non-spatial model
# fitNS <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, CorModels = NULL, 
#    EstMeth = "REML", family = "Gaussian")
#make sure fitSN has the correct path, will vary for each users installation
fitNS$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitNS)

## Random effect model using STREAMNAME as our random effect
#fitRE <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME"))
#make sure fitRE has the correct path, will vary for each users installation
fitRE$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitRE)
## random effects details
fitREBLUP <- BLUP(fitRE)
str(fitREBLUP)
fitREBLUP$Mean

## Basic spatial model with a random effect
#fitSpRE1 <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME","Exponential.Euclid"))
#make sure fitSpRE1 has the correct path, will vary for each users installation
fitSpRE1$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitSpRE1)

## Spatial stream tail-up model with a random effect
#fitSpRE2 <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME","Exponential.tailup"),
#    addfunccol = "afvArea")
#make sure fitSpRE2 has the correct path, will vary for each users installation
fitSpRE2$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitSpRE2)

## 3 component spatial model
#fitSp <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("Exponential.tailup","Exponential.taildown",
#    "Exponential.Euclid"), addfunccol = "afvArea")
#make sure fitSp has the correct path, will vary for each users installation
fitSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")

## Summarise last model
summary(fitSp)

## AIC for last model
AIC(fitSp)

## Generalised R-squared for last model
GR2(fitSp)

## Look at variance components in more detail
covparms(fitSp)
varcomp(fitSp)

## Compare models
InfoCritCompare(list(fitNS, fitRE, fitSpRE1, fitSpRE2, fitSp))

## Fit a model to binary data
#binSp <- glmssn(MaxOver20 ~ ELEV_DEM + SLOPE, mf04p,
#  CorModels = c("Mariah.tailup", "Spherical.taildown"),
#  family = "binomial", addfunccol = "afvArea")
#make sure binSp has the correct path, will vary for each users installation
binSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(binSp)

## Fit a model to count data
#poiSp <- glmssn(C16 ~ ELEV_DEM + SLOPE, mf04p,
#  CorModels = c("LinearSill.tailup", "LinearSill.taildown"),
#  family = "poisson", addfunccol = "afvArea")
#make sure poiSp has the correct path, will vary for each users installation
poiSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(poiSp)

Run the code above in your browser using DataLab