glmssn: Fitting Generalized Linear Models for Spatial Stream Networks

Description

This function works on objects of class SpatialStreamNetwork to fit generalized linear models with spatially autocorrelated errors using normal likelihood methods (including REML) and quasi-likelihood for Poisson and Binomial families. The spatial formulation is described in Ver Hoef and Peterson (2010) and Peterson and Ver Hoef (2010).

Usage

glmssn(formula, ssn.object, family = "Gaussian", CorModels =
  c("Exponential.tailup","Exponential.taildown","Exponential.Euclid"),
  use.nugget = TRUE, use.anisotropy = FALSE, addfunccol = NULL, trialscol = NULL, 
  EstMeth = "REML", useTailDownWeight = FALSE, trans.power = NULL,trans.shift = 0,
  control = list(max.range.factor = 4, trunc.pseudo = NULL,
  maxiter.pseudo = 20, beta.converge = 1e-05))

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

ssn.object

an object of class SpatialStreamNetwork, representing a spatial stream network. This contains the variables used in the model.

family

the error distribution and link function to be used in the model. This is a character string that is either "Gaussian", "Poisson", or "Binomial."

CorModels

a vector of spatial autocorrelation models for stream networks. The individual models should be of different "types" tail-up, tail-down, Euclidean, or NULL for a non-spatial model. The tailup models include: "Exponential.tailup" (default), "L

use.nugget

add a nugget effect, default is TRUE. This can be thought of as a variance component for independent errors, adding a variance component only along the diagonal of the covariance matrix.

use.anisotropy

use anistropy for the Euclidean distance based spatial model in CorModels

addfunccol

the name of the variable in the SpatialStreamNetwork object that is used to define spatial weights. For the tailup models, weights need to be used for branching.

trialscol

name of the variable in the SpatialStreamNetwork object that contains the sample size when a binomial distribution is used. If NULL, a sample size of 1 is as

EstMeth

Estimation method; either "ML" for maximum likelihood, or "REML" for restricted maximum likelihood (default).

useTailDownWeight

use stream segment weighting in the tail-down model? Default is FALSE. Weighting is same as for tail-up models, based on an additive function.

trans.power

power transformation for the response variable in case of Gaussian data. It must be between 0 and 0.5, and if 0, a natural log is used.

trans.shift

a shift (addition or subtraction) applied to the response variable prior to the power tranformation

control

a list of control parameters, consisting of four items: 1) max.range.factor; this sets the maximum range as a function of the maximum distance among observed data locations, 2) trunc.pseudo; this sets a truncation value for pseudo-data for the

Value

argsInformation on arguments used in the function call to glmssn
ssn.objecta copy of the input object of class SpatialStreamNetwork, so that the model fit is directly tied to an SpatialStreamNetwork object
sampinfosample information
estimatesEstimates of the covariance parameters
optimOutputOutput from last call to optim to enable the user to check for correct convergence
glmssn returns an object of class "glmssn". This is a list of 5 objects, with the following structure:
outpt <- list( args = list( ## stores all arguments used in function call formula = formula, zcol = dataXY.out$respvecs$response.col, # response column family = family, CorModels = CorModels, useTailDownWeights = useTailDownWeights, use.nugget = use.nugget, use.anisotropy = use.anisotropy, addfunccol = addfunccol, trialscol = trialscol, EstMeth = EstMeth, trans.power = trans.power, trans.shift = trans.shift ),
ssn.object = ssn.object, # input object of class "SpatialStreamNetwork"
sampinfo = list( # sample information # indicator vector for non-missing response values ind.obs = ind[order(data[,"pid"])], sample.size = n.all, # total number of records in the data frame # number of records with non-missing response values obs.sample.size = n.allxy, missing.sample.size = n.all - n.allxy, # number of missing response values rankX = p, # rank of X # vector of the response variable z = zt[order(dataXY.out$datasets$data2[,"pid"])], X = X2[order(dataXY.out$datasets$data2[,"pid"]),], # design matrix effnames = dataXY.out$Xmats$effnames, setzero = dataXY.out$indvecs$setzero, setNA = dataXY.out$indvecs$setNA, setNA2 = dataXY.out$indvecs$setNA2, cutX1toX2 = dataXY.out$indvecs$cutX1toX2, StdXDataFrame = dataXY.out$Xmats$StdXDataFrame ),
estimates = list( theta=parmest, # estimated covariance parameters # estimated covariance matrix V = V[order(dataXY.out$datasets$data2[,"pid"]), order(dataXY.out$datasets$data2[,"pid"])], # inverse of estimated covariance matrix Vi = Vi[order(dataXY.out$datasets$data2[,"pid"]), order(dataXY.out$datasets$data2[,"pid"])], betahat = b.hat, # estimated fixed effects covb = covb, # estimated covariance matrix of estimated fixed effects # inverse of estimated covariance matrix of estimated fixed effects covbi = covbi, m2LL = m2LL # -2 times log-likelihood ),
optimOutput=parmest.out )

Details

Models for glmssn are specified symbolically, similar to lm and other models in R. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of fixed effect linear predictors for the response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second. See model.matrix for further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on. A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.

The spatial formulation is described in Ver Hoef and Peterson (2010) and Peterson and Ver Hoef (2010).

References

Garreta, V., Monestiez, P. and Ver Hoef, J.M. (2010) Spatial modelling and prediction on river networks: up model, down model, or hybrid? Environmetrics 21(5), 439--456.

Peterson, E.E. and Ver Hoef, J.M. (2010) A mixed-model moving-average approach to geostatistical modeling in stream networks. Ecology 91(3), 644--651.

Ver Hoef, J.M. and Peterson, E.E. (2010) A moving average approach for spatial statistical models of stream networks (with discussion). Journal of the American Statistical Association 105, 6--18. DOI: 10.1198/jasa.2009.ap08248. Rejoinder pgs. 22--24.

Examples

Run this code

library(SSN)
# NOT RUN 
# mf04p <- importSSN(system.file("lsndata/MiddleFork04.ssn", 
#	  package = "SSN1.1"), predpts = "pred1km", o.write = TRUE)
#   use SpatialStreamNetwork object mf04p that was already created
data(mf04p)
#make sure mf04p has the correct path, will vary for each users installation
mf04p@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")

# The models take a little time to fit, so they are NOT RUN
# Uncomment the code to run them
# Alternatively, you can load the fitted models first to look at results
data(modelFits)

## Non-spatial model
# fitNS <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, CorModels = NULL, 
#    EstMeth = "REML", family = "Gaussian")
#make sure fitSN has the correct path, will vary for each users installation
fitNS$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitNS)

## Random effect model using STREAMNAME as our random effect
#fitRE <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME"))
#make sure fitRE has the correct path, will vary for each users installation
fitRE$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitRE)
## random effects details
fitREBLUP <- BLUP(fitRE)
str(fitREBLUP)
fitREBLUP$Mean

## Basic spatial model with a random effect
#fitSpRE1 <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME","Exponential.Euclid"))
#make sure fitSpRE1 has the correct path, will vary for each users installation
fitSpRE1$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitSpRE1)

## Spatial stream tail-up model with a random effect
#fitSpRE2 <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("STREAMNAME","Exponential.tailup"),
#    addfunccol = "afvArea")
#make sure fitSpRE2 has the correct path, will vary for each users installation
fitSpRE2$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(fitSpRE2)

## 3 component spatial model
#fitSp <- glmssn(Summer_mn ~ ELEV_DEM + netID,
#    ssn.object = mf04p, EstMeth = "REML", family = "Gaussian",
#    CorModels = c("Exponential.tailup","Exponential.taildown",
#    "Exponential.Euclid"), addfunccol = "afvArea")
#make sure fitSp has the correct path, will vary for each users installation
fitSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")

## Summarise last model
summary(fitSp)

## AIC for last model
AIC(fitSp)

## Generalised R-squared for last model
GR2(fitSp)

## Look at variance components in more detail
covparms(fitSp)
varcomp(fitSp)

## Compare models
InfoCritCompare(list(fitNS, fitRE, fitSpRE1, fitSpRE2, fitSp))

## Fit a model to binary data
#binSp <- glmssn(MaxOver20 ~ ELEV_DEM + SLOPE, mf04p,
#  CorModels = c("Mariah.tailup", "Spherical.taildown"),
#  family = "binomial", addfunccol = "afvArea")
#make sure binSp has the correct path, will vary for each users installation
binSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(binSp)

## Fit a model to count data
#poiSp <- glmssn(C16 ~ ELEV_DEM + SLOPE, mf04p,
#  CorModels = c("LinearSill.tailup", "LinearSill.taildown"),
#  family = "poisson", addfunccol = "afvArea")
#make sure poiSp has the correct path, will vary for each users installation
poiSp$ssn.object@path <- system.file("lsndata/MiddleFork04.ssn", package = "SSN")
summary(poiSp)

Run the code above in your browser using DataLab