modgam: Fit a Spatial Logistic Generalized Additive Model (GAM)

Description

Calculates crude or adjusted odds ratios on a user-supplied grid for spatial analysis, using a generalized additive model with a LOESS smooth for geolocation. Includes an optional permutation test that provides global and local tests of the null hypothesis that spatial location is not associated with the outcome.

Usage

modgam(rdata, rgrid, permute = 0, m = "adjusted", sp = NULL, keep = F, verbose = T, ...)

Arguments

rdata

Data set (required). The data must be structured so that the outcome (coded as 0s and 1s) is in the 1st column and X and Y location values are in the 2nd and 3rd columns. If additional columns are provided and m = "adjusted", then the additional columns

rgrid

A data frame containing the values at which to generate odds ratio predictions (required). X and Y location values must be in the 1st and 2nd columns. Additional covariates values for predictions may be provided in additional columns of rgrid. If m = "

permute

The number of permutations of the data set for testing the significance of location. permute = 0 (default) produces no permutation tests. If permute > 0, locations are randomly permuted in order to simulate the distribution of results under the null hypo

m

Model type for the GAM. Options are "crude" or "adjusted" (default). If "crude", only the spatial smooth term for location is included as a predictor in the model (i.e., "crude" is a synonym for "unadjusted"). If "adjusted", all covariates in the data

sp

Span size for the LOESS smooth of spatial location. If sp = NULL (default), an optimal span size is determined using the optspan function.

keep

Logical value indicating whether to store and return the pointwise odds ratios for every permuted data set. These values aren't necessary for mapping or cluster identification, and storing them slows the permutation tests, so the default is keep = FALSE.

verbose

Logical value indicating whether to print the GAM model statement, the percentile rank for the global deviance statistic in the permutation test, and the progress of the permutation test (report completion of every 10 permutations). The default is verbos

...

Further arguments to be passed to the gam function (e.g., weights).

Value

grid

A data frame with X and Y locations from rgrid.

OR

The estimated odds ratio for each point on the grid, from the spatially smoothed GAM model. The referent is a logistic model without spatial location.

m

Whether the GAM was unadjusted (only spatial location as a predictor) or adjusted (included other covariates in addition to spatial location). "Crude" is a synonym for "unadjusted."

span

The span size used for the LOESS smooth of the effects of location.

gamobj

The GAM model object from the fit to the data.

global

The p-value based on the deviance statistic comparing models with and without spatial locations, using the distribution of deviance statistics from the permuted data sets to represent the null distribution. For a test of H0: spatial location is unassociated with risk of the outcome (adjusting for any covariates if m = "adjusted"), reject H0 if the percentile rank is below alpha. WARNING: modgam uses a conditional permutation test, which produces inflated type I error rates; Young et al. (2010) recommend using alpha = 0.025 to limit the type I error rate to approximately 5%.

pointwise

For each point on the grid, the percentile rank of the local log odds for the model compared to the local log odds distribution from the permuted data sets. This result is needed to define clusters of the outcome (spatial regions where risk is statistically significantly high or low).

permutations

A matrix containing null permuted odds ratio estimates for each point on the grid, with the results for each permutation in a separate column.
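
The percentile ranks reported in global and pointwise can be illustrated with a small base R sketch. This is not the package's internal code: the deviance statistics and log odds below are simulated, every object name is hypothetical, and the 0.025 and 0.975 pointwise cutoffs are an assumption chosen only for illustration.

```r
set.seed(1)
n_perm <- 999

# Hypothetical deviance statistics from the permuted data sets
# (simulated here; in practice they come from refitting the GAM)
null_dev <- rchisq(n_perm, df = 4)
obs_dev  <- 15   # hypothetical statistic from the observed data

# Global p-value: share of statistics, observed one included,
# that are at least as large as the observed statistic
global_p <- mean(c(obs_dev, null_dev) >= obs_dev)

# Hypothetical local log odds at 5 grid points:
# observed values, and one column per permutation under the null
n_grid       <- 5
obs_logodds  <- c(-2, -0.1, 0, 0.2, 2.5)
null_logodds <- matrix(rnorm(n_grid * n_perm, sd = 0.5), nrow = n_grid)

# Pointwise percentile rank: share of permuted log odds below the
# observed value at each grid point (row i is compared with obs_logodds[i])
pointwise <- rowMeans(null_logodds < obs_logodds)

# Flag grid points in either tail (illustrative cutoffs, an assumption)
cluster <- ifelse(pointwise < 0.025, "low",
           ifelse(pointwise > 0.975, "high", "none"))
```

Grid points flagged "low" or "high" in a sketch like this correspond to the regions of unusually low or high risk that the pointwise result is meant to identify.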

Warning

Permutation tests are computationally intensive, often requiring several hours for a large data set.

Details

The model used to fit the data is a generalized additive model with a LOESS smooth for geolocation (Hastie and Tibshirani, 1990; Kelsall and Diggle, 1998; Webster et al., 2006): $$\ln \left(\frac{\pi_{i}}{1-\pi_{i}}\right) = S(x_{i},y_{i}) + \mathbf{Z_{i}} \boldsymbol{\beta}$$ where $\pi_{i}$ is the probability that the outcome is 1 for participant i, $x_{i}$ and $y_{i}$ are spatial coordinates for participant i (i.e., projected distance east and north, respectively, from an arbitrarily defined origin location), $S(.,.)$ is a 2-dimensional smoothing function (currently LOESS), $\mathbf{Z_{i}}$ is a row vector of covariate values for participant i, and $\boldsymbol{\beta}$ is a vector of unknown regression parameters (including an intercept).

When a permutation test is requested, for each permutation the geolocations (paired x and y coordinates) in the data set are randomly reassigned to participants, consistent with the null hypothesis that geolocation is not associated with risk for the outcome. Note that this procedure preserves associations between other covariates and risk of the outcome, so the permutations only test the significance of the smoothed geolocation. See the references for more details.
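
The reassignment of geolocations described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code modgam runs: the data are simulated, the column layout follows the rdata convention (outcome, X, Y, then covariates), and permute_locations and the age covariate are hypothetical names introduced here.

```r
set.seed(42)
n <- 200

# Hypothetical data laid out like rdata:
# outcome in column 1, X and Y in columns 2 and 3, covariates after
dat <- data.frame(
  outcome = rbinom(n, 1, 0.3),
  X       = runif(n),
  Y       = runif(n),
  age     = rnorm(n, mean = 50, sd = 10)
)

# Reassign (X, Y) pairs to participants at random.
# Each pair moves as a unit; outcome and covariates stay in place,
# so covariate-outcome associations are preserved.
permute_locations <- function(d) {
  idx <- sample(nrow(d))
  d[, 2:3] <- d[idx, 2:3]
  d
}

perm <- permute_locations(dat)
```

Each permuted data set would then be refit with the same model, and the resulting deviance statistics and local log odds would form the null distributions used for the global and pointwise tests.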

References

Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).

Kelsall J, Diggle P. Spatial variation in risk of disease: a nonparametric binary regression approach. J Roy Stat Soc C-App 1998, 47:559-573.

Vieira V, Webster T, Weinberg J, Aschengrau A, Ozonoff D. Spatial analysis of lung, colorectal, and breast cancer on Cape Cod: An application of generalized additive models to case-control data. Environmental Health 2005, 4:11. http://www.ehjournal.net/content/4/1/11

Webster T, Vieira V, Weinberg J, Aschengrau A. Method for mapping population-based case-control studies using Generalized Additive Models. International Journal of Health Geographics 2006, 5:26. http://www.ij-healthgeographics.com/content/5/1/26

Young RL, Weinberg J, Vieira V, Ozonoff A, Webster TF. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting. International Journal of Health Geographics 2010, 9:37. http://www.ij-healthgeographics.com/content/9/1/37

http://www.busrp.org/projects/project2.html

Examples


# modgam, predgrid, MAmap, and MAdata are provided by the MapGAM package
library(MapGAM)

# Load base map in SpatialPolygonsDataFrame format
# This map was read from ESRI shapefiles using the readShapePoly function
data(MAmap)
# Load data and create grid on base map
data(MAdata)
gamgrid <- predgrid(MAdata, MAmap)    # requires PBSmapping package
# Fit crude GAM model to the MA data using span size of 50%
# and predict odds ratios for every point on gamgrid
fit1 <- modgam(MAdata, gamgrid, m = "crude", sp = 0.5)
# Get summary statistics for pointwise crude odds ratios
summary(fit1$OR)

# Fit adjusted GAM model to the MA data using span size of 50%,
# and run a permutation test (25 permutations is too few for real use)
fit2 <- modgam(MAdata, gamgrid, permute = 25, m = "adjusted", sp = 0.5)
