model.mapmake: Map Making

Description

Applies models to either ERDAS Imagine image (.img) files or ESRI Grids of predictors to create detailed prediction surfaces. Large predictor files are handled by reading the .img files in chunks: the predictions for each chunk of data are written to the output .txt file before the next chunk is read.

Usage

model.mapmake(model.obj= NULL, folder = NULL, MODELfn = NULL, 
rastLUTfn = NULL, na.action = NULL, numrows = 500, map.sd = FALSE, 
asciifn = NULL, asciifn.mean = NULL, asciifn.stdev = NULL, 
asciifn.coefv = NULL, make.img = TRUE, n.trees = NULL)

Arguments

model.obj

R model object. The model object to use for prediction, if the model has been previously created. The model object must be of type RF or SGB. (Eventually planned to include "GAM".) If NULL (the default), a model is generated

folder

String. The folder used for all output from predictions and/or maps. Do not add ending slash to path string. If folder = NULL (default), a GUI interface prompts user to browse to a folder. To use the working directory, specify folde

MODELfn

String. The file name to use to save the generated model object. If MODELfn = NULL (the default), a default name is generated by pasting model.type_response.type_response.name. If the other output filenames are left unspecified

rastLUTfn

String. The file name (full path or base name with path specified by folder) of a .csv file for a rastLUT. Alternatively, a dataframe containing the same information. The rastLUT must include 3 columns:

na.action

String. Model validation. Specifies the action to take if there are NA values in the prediction data or if there is a level or class of a categorical predictor variable in the validation test set or the production (mapping) data set, but no
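
As a rough illustration of the two choices described in the Examples ("na.omit" and "na.roughfix"), the base R sketch below handles a category that was absent from the training data. The NLCD values and object names are hypothetical, and this is not the package's internal code.

```r
# Hypothetical training and mapping values for a categorical predictor (NLCD).
train.nlcd <- factor(c("41", "42", "42", "52", "42"))
map.nlcd   <- c("42", "71", "41", NA)   # "71" never appears in the training data

# Most common category in the training data.
most.common <- names(which.max(table(train.nlcd)))

# na.omit-style handling: unseen categories and NA values stay NA,
# so no prediction would be made for those pixels.
omit.nlcd <- factor(map.nlcd, levels = levels(train.nlcd))

# na.roughfix-style handling: substitute the most common training category,
# so that a prediction can still be made.
roughfix.nlcd <- omit.nlcd
roughfix.nlcd[is.na(roughfix.nlcd)] <- most.common

print(data.frame(original = map.nlcd,
                 na.omit = omit.nlcd,
                 na.roughfix = roughfix.nlcd))
```

The trade-off is that "na.roughfix" always yields a prediction, but the substituted pixels may need to be masked out afterwards.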

numrows

Integer. Map Production. The number of rows to be predicted at a time.
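
The chunking controlled by numrows can be pictured with a minimal base R sketch. The raster height below is a made-up number, and the loop body only prints the row ranges; reading the predictor layers and writing predictions is left as a comment because the package's internal code is not shown here.

```r
# Hypothetical raster height; numrows is the argument documented above.
nrow.raster <- 1234
numrows     <- 500

# First and last row of each chunk; the final chunk may be shorter.
chunk.start <- seq(1, nrow.raster, by = numrows)
chunk.end   <- pmin(chunk.start + numrows - 1, nrow.raster)

for (i in seq_along(chunk.start)) {
  # A real implementation would read these rows from every predictor layer,
  # predict, and append the predictions to the output text file here.
  cat("chunk", i, ": rows", chunk.start[i], "to", chunk.end[i], "\n")
}
```

A smaller numrows lowers peak memory use at the cost of more read and write passes.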

map.sd

Logical. Map Production. If map.sd = TRUE, maps of mean, standard deviation, and coefficient of variation of the predictions from all the trees are generated for each pixel. If map.sd = FALSE (the default), only the predicted

asciifn

String. Map Production. Filename of output file for map production. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder. If asciifn = NULL (th

asciifn.mean

String. Map Production. Used if map.sd = TRUE and response.type = "continuous". Filename of output file for mean of trees. The filename can be the full path, or it can be the simple basename, in which case the output will be t

asciifn.stdev

String. Map Production. Used if map.sd = TRUE and response.type = "continuous". Filename of output file for standard deviation of trees. The filename can be the full path, or it can be the simple basename, in which case the ou

asciifn.coefv

String. Map Production. Used if map.sd = TRUE and response.type = "continuous". Filename of output file for coefficient of variation of trees. The filename can be the full path, or it can be the simple basename, in which case
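
The three per-pixel summaries written when map.sd = TRUE can be sketched in base R as follows. The matrix of per-tree predictions is simulated, and the coefficient of variation is assumed here to be the plain ratio of standard deviation to mean; the package may scale it or treat zero means differently.

```r
set.seed(38)

# Hypothetical per-tree predictions: 4 pixels (rows) by 50 trees (columns).
tree.pred <- matrix(rnorm(4 * 50, mean = 20, sd = 3), nrow = 4)

# Per-pixel summaries across all trees.
pixel.mean  <- rowMeans(tree.pred)
pixel.stdev <- apply(tree.pred, 1, sd)
pixel.coefv <- pixel.stdev / pixel.mean   # assumed definition: sd / mean

print(round(cbind(mean = pixel.mean, stdev = pixel.stdev, coefv = pixel.coefv), 3))
```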

make.img

Logical. Map Production. If make.img = TRUE, the function creates Imagine image files in addition to the ASCII grid files of map output.

n.trees

Integer. SGB models. The number of stochastic gradient boosting trees for an SGB model. If n.trees=NULL (the default) the model creation code will increase the number of trees 100 at a time until OOB error rate stops improving. The gb

Value

The function does not return a value. Instead, it writes an ASCII grid file and an Imagine image file of map information (suitable for importing into a GIS) to the specified folder.
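
For categorical responses with non-numeric labels, the Details section notes that the map holds integer class codes and that a CSV look-up table is written alongside it. The sketch below shows one way such a coding could work in base R; the class labels, column names, and the choice to follow factor level order are assumptions for illustration only.

```r
# Hypothetical predicted classes for six pixels.
pred.class <- factor(c("CONIFER", "ASPEN", "SHRUB", "CONIFER", "ASPEN", "CONIFER"))

# Integer code for each pixel: this is what would be stored in the map.
pixel.code <- as.integer(pred.class)

# Look-up table linking each integer code back to its category label.
lut <- data.frame(code = seq_along(levels(pred.class)),
                  category = levels(pred.class),
                  stringsAsFactors = FALSE)
print(lut)

# Decode the map values back to labels using the look-up table.
decoded <- lut$category[match(pixel.code, lut$code)]
```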

Details

model.mapmake() can be run in traditional R command mode, where all arguments are specified in the function call. It can also be run in a full push-button mode, where you type the simple command model.mapmake() and GUI pop-up windows ask questions about the type of model, the file locations of the data, and so on. When running model.mapmake() on non-Windows platforms, file names and folders need to be specified in the argument list, but other push-button selections are handled by the select.list() function, which is platform independent.

For map making, the R package rgdal is used to read spatial rasters, including .img files, into R. The data for production mapping should be in the form of pixel-based raster layers representing the predictors in the model. If there is more than one predictor or raster layer, the layers must all have the same number of columns and rows. The layers must also have the same extent, projection, and pixel size, for effective model development and accuracy. The layers must also be in either ESRI Grid or ERDAS Imagine image (single or multi-band) raster data format, with continuous or categorical data values.

When creating maps of non-rectangular study regions, there may be large portions of the rectangle where you have no predictors and are not interested in making predictions. The suggested value for the pixels outside the study area is -9999. These pixels will be ignored in the predictions, thus saving computing time, and will be exported as -9999. Any value other than -9999 will be treated as a legal data value, and a prediction will be generated for each such pixel. Note: in Imagine image files, if the specified NODATA value is set to -9999, any -9999 pixels will be read into R as NA, and if na.action = "na.roughfix", predictions will be attempted for these pixels. This will increase the computation time, and these predictions will need to be masked out when the final map is imported back into a GIS.

The function model.mapmake() outputs an ASCII grid file and an Imagine image file of map information suitable for importing into a GIS. Small maps can also be imported back into R using the function read.asciigrid() from the sp package. For binary response models, the output is the predicted probability of presence for each pixel. For continuous response models, the output is the predicted value for each pixel. For categorical response models, the map output depends on the category labels. If the categorical response variable is numeric, the map output uses the original numeric categories. If the categories are non-numeric, the map output is an integer class code for each pixel, one code per level of the factored response, and a CSV file containing a look-up table is also generated to associate the integer codes with the original values of the response categories. The first predictor from predList is used to determine the projection of the output Imagine image file.
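
The handling of -9999 pixels described above can be sketched in base R. The small predictor matrix and the toy.predict() function are hypothetical stand-ins for a raster layer and a fitted model; the sketch only shows the masking idea, not the package's own implementation.

```r
nodata <- -9999

# Hypothetical 3 x 4 predictor layer; -9999 marks pixels outside the study area.
tcb <- matrix(c(120, 135, nodata, nodata,
                110, 128, 140,    nodata,
                 98, 105, 117,    125), nrow = 3, byrow = TRUE)

# Stand-in for a fitted model: any function from predictor values to predictions.
toy.predict <- function(x) 0.5 * x + 10

# Predict only where the predictor is valid; all other pixels stay -9999.
valid <- tcb != nodata
pred  <- matrix(nodata, nrow = nrow(tcb), ncol = ncol(tcb))
pred[valid] <- toy.predict(tcb[valid])

print(pred)
```

Skipping the masked pixels is what saves computing time on study areas that fill only part of the bounding rectangle.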

References

Breiman, L. (2001). Random Forests. Machine Learning, 45:5-32.

Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Stat., 29(5):1189-1232.

Friedman, J.H. (2002). Stochastic gradient boosting. Comput. Stat. Data An., 38(4):367-378.

Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3):18-22.

Ridgeway, G. (1999). The state of boosting. Comp. Sci. Stat., 31:172-181.

Examples

###########################################################################
############################# Run this set up code: #######################
###########################################################################

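# load the package (later sections also use read.asciigrid() from sp and readGDAL() from rgdal):
library(ModelMap)

# set seed:
seed=38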

# Define training and test files:

qdata.trainfn = system.file("external", "helpexamples","DATATRAIN.csv", package = "ModelMap")

# Define folder for all output:
folder=getwd()	


#identifier for individual training and test data points

unique.rowname="ID"


###########################################################################
############## Pick one of the following sets of definitions: #############
###########################################################################


########## Continuous Response, Continuous Predictors ############

#file name to store model:
MODELfn="RF_Bio_TC"				

#predictors:
predList=c("TCB","TCG","TCW")	

#define which predictors are categorical:
predFactor=FALSE	

# Response name and type:
response.name="BIO"
response.type="continuous"


########## Binary Response, Continuous Predictors ############

#file name to store model:
MODELfn="RF_CONIFTYP_TC"				

#predictors:
predList=c("TCB","TCG","TCW")		

#define which predictors are categorical:
predFactor=FALSE

# Response name and type:
response.name="CONIFTYP"

# This variable is 1 if a conifer or mixed conifer type is present, 
# otherwise 0.

response.type="binary"


########## Continuous Response, Categorical Predictors ############

# In this example, NLCD is a categorical predictor.
#
# You must decide what you want to happen if there are categories
# present in the data to be predicted (either the validation/test set
# or in the image file) that were not present in the original training data.
# Choices:
#       na.action =  "na.omit"
#                    Any validation datapoint or image pixel with a value for any
#                    categorical predictor not found in the training data will be
#                    returned as NA.
#       na.action =  "na.roughfix" 
#                    Any validation datapoint or image pixel with a value for any
#                    categorical predictor not found in the training data will have
#                    the most common category for that predictor substituted,
#                    and then a prediction will be made.

# You must also let R know which of the predictors are categorical, in other
# words, which ones R needs to treat as factors.
# This vector must be a subset of the predictors given in predList

#file name to store model:
MODELfn="RF_BIO_TCandNLCD"			

#predictors:
predList=c("TCB","TCG","TCW","NLCD")

#define which predictors are categorical:
predFactor=c("NLCD")

# Response name and type:
response.name="BIO"
response.type="continuous"



###########################################################################
########################### build model: ##################################
###########################################################################


### create model ###

model.obj = model.build( model.type="RF",
                       qdata.trainfn=qdata.trainfn,
                       folder=folder,		
                       unique.rowname=unique.rowname,		
                       MODELfn=MODELfn,
                       predList=predList,
                       predFactor=predFactor,
                       response.name=response.name,
                       response.type=response.type,
                       seed=seed,
                       na.action="na.roughfix"
)



###########################################################################
############ Then Run this code to predict map pixels #####################
###########################################################################

# A single model was built from the training data, 
# but it will be applied to two sets of image data, one from 2001 and one from 2004

####################################################################################################

### Create a list of the filenames (including paths) for the rast Look up Tables ###


rastLUTfn.2001 <- paste(system.file(package="ModelMap"),"/external/helpexamples/LUT_2001.csv",sep="")
rastLUTfn.2004 <- paste(system.file(package="ModelMap"),"/external/helpexamples/LUT_2004.csv",sep="")


### Load rast LUT tables, and add path to the filenames in column 1 ###

rastLUT.2001 <- read.table(rastLUTfn.2001,header=FALSE,sep=",",stringsAsFactors=FALSE)
rastLUT.2004 <- read.table(rastLUTfn.2004,header=FALSE,sep=",",stringsAsFactors=FALSE)

rastLUT.2001[,1] <- paste(system.file(package="ModelMap"),"external/helpexamples",rastLUT.2001[,1],sep="/")
rastLUT.2004[,1] <- paste(system.file(package="ModelMap"),"external/helpexamples",rastLUT.2004[,1],sep="/")                                      


### Define filenames for map  output ###

asciifn.2001 <- "RF_BIO_TCandNLCD_01.txt"
asciifn.2004 <- "RF_BIO_TCandNLCD_04.txt"


asciifn.2001 <- paste(folder,asciifn.2001,sep="/")
asciifn.2004 <- paste(folder,asciifn.2004,sep="/")


### Define Number of rows of raster to read in at one time ###
# if R crashes with the warning "unable to assign...", lower this number

numrows=500


### Create ascii text files of predicted map data ###

model.mapmake( model.obj=model.obj,
               folder=folder,		
               rastLUTfn=rastLUT.2001,
           # Model Validation Arguments	
               na.action="na.roughfix",
           # Mapping arguments
               numrows = numrows,						
               asciifn=asciifn.2001
               )

model.mapmake( model.obj=model.obj,
               folder=folder,		
               rastLUTfn=rastLUT.2004,
           # Model Validation Arguments	
               na.action="na.roughfix",
           # Mapping arguments
               numrows = numrows,						
               asciifn=asciifn.2004
               )

###########################################################################
######### run this code to create maps in R (For small maps only!)#########
###########################################################################

### Define Color Ramp ###

l <- seq(100,0,length.out=101)
c <- seq(0,100,length.out=101)
col.ramp <- hcl(h = 120, c = c, l = l)


### read in map data ###

mapgrid.2001 <- read.asciigrid(asciifn.2001,as.image=TRUE)
mapgrid.2004 <- read.asciigrid(asciifn.2004,as.image=TRUE)


### create map ###

dev.new(width = 8, height = 4)
opar <- par(mfrow=c(1,2),mar=c(3,3,2,1),oma=c(0,0,3,4),xpd=NA)

zlim <- c(0,max(mapgrid.2001$z,mapgrid.2004$z,na.rm=TRUE))
legend.label<-rev(pretty(zlim,n=5))
legend.colors<-col.ramp[trunc((legend.label/max(legend.label))*100)+1]

image(mapgrid.2001, col = col.ramp,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2001 Imagery",side=3,line=1,cex=1.2)

image(mapgrid.2004, col = col.ramp,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2004 Imagery",side=3,line=1,cex=1.2)

legend(	x=max(mapgrid.2004$x),y=max(mapgrid.2004$y),
	legend=legend.label,
	fill=legend.colors,
	bty="n",
	cex=1.2
)
mtext("Predictions",side=3,line=1,cex=1.5,outer=TRUE)
par(opar)


###########################################################################
##### Run this code to map predictor data in R (For small maps only!) #####
###########################################################################

### Define Color Ramps ###

l <- seq(100,0,length.out=101)
c <- seq(0,100,length.out=101)
col.ramp.1 <- hcl(h = 15, c = c, l = l)
col.ramp.2 <- hcl(h = 70, c = c, l = l)
col.ramp.3 <- hcl(h = 150, c = c, l = l)


dev.new(width = 9, height = 6)
opar <- par(mfcol=c(2,3),mar=c(3,3,2,1),oma=c(0,0,3,4),xpd=NA)

#band 1
predgrid.2001=readGDAL(rastLUT.2001[1,1],band=rastLUT.2001[1,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[1,1],band=rastLUT.2004[1,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

# na.rm=TRUE keeps NODATA pixels (read in as NA) from turning zlim into NA
zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.1,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[1,2],side=3,cex=1.5)
mtext("2001 Imagery",side=2,cex=1.5,line=1)

image(predgrid.2004, col = col.ramp.1,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2004 Imagery",side=2,cex=1.5,line=1)


#band 2
predgrid.2001=readGDAL(rastLUT.2001[2,1],band=rastLUT.2001[2,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[2,1],band=rastLUT.2004[2,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.2,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[2,2],side=3,cex=1.5)

image(predgrid.2004, col = col.ramp.2,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")


#band 3
predgrid.2001=readGDAL(rastLUT.2001[3,1],band=rastLUT.2001[3,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[3,1],band=rastLUT.2004[3,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.3,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[3,2],side=3,cex=1.5)

image(predgrid.2004, col = col.ramp.3,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")


mtext("Predictor Imagery",side=3,line=1,cex=1.5,outer=TRUE)
par(opar)
