AsciiGridImpute: Imputes/Predicts data for Ascii Grid maps

Description

AsciiGridImpute finds nearest neighbor reference observations for each point in the input grid maps and outputs maps of selected Y-variables in a corresponding set of output grid maps.

AsciiGridPredict applies a predict function to each point in the input grid maps and outputs maps of the prediction(s) in corresponding output grid maps (see Details).

One row of each grid map is read and processed at a time thereby avoiding the need to build huge objects in R that would be necessary if all the rows of all the maps were processed together.

Usage

AsciiGridImpute(object,xfiles,outfiles,xtypes=NULL,ancillaryData=NULL,
                ann=NULL,lon=NULL,lat=NULL,rows=NULL,cols=NULL,
                nodata=NULL,myPredFunc=NULL,...)
AsciiGridPredict(object,xfiles,outfiles,xtypes=NULL,lon=NULL,lat=NULL,
                 rows=NULL,cols=NULL,nodata=NULL,myPredFunc=NULL,...)

Value

An invisible list containing the following named elements:

unexpectedNAs: A data frame listing the map row numbers and the number of NA values generated by the predict function for each row. If none are generated for a row the row is not reported, if none are generated for any rows, the data frame is NULL.
illegalLevels: A data frame listing levels found in the maps that were not found in the xlevels for the object. The row names are the illegal levels, the column names are the variable names, and the values are the number of grid cells where the illegal levels were found.
outputLegend: A data frame showing the relationship between levels in the output maps and those found in object. The row names are level index values, the column names are variable names, and the values are the levels. NULL if no factors are output.
inputLegend: A data frame showing the relationship between levels found in the input maps and those found in object. The row names are level index values (this function assumes they correspond to numeric values on the maps), the column names are variable names, and the values are the levels. NULL if no factors are input. This information is consistent with that in xlevels.

Arguments

object

An object of class yai, any object for which a predict function is defined, or an object that is passed to a predict function you define using argument myPredFunc. See Details.

xfiles

A list of input file names where there is one grid file for each X-variable. List elements must be given the same names as the X-variables they correspond with and there must be one file for each X-variable used when object was built.

outfiles

One of these two forms:

(1) A file name that is understood to correspond to the single prediction returned by the generic predict function related to object or returned by myPredFunc. This form only applies to AsciiGridPredict, when the object is not class yai.
(2) A list of output file names where there is one grid file for each desired output variable. While there may be many variables predicted for object, only those for which an output grid is desire need to be specified. Note that some predict functions return data frames, some return a single vector, and often what is returned depends on the value of arguments passed to predict. In addition to names of the predicted variables, the following two special names can be coded when the object class is yai: For distance=“filename” a map of the distances is output and if useid=“filename” a map of integer indices to row numbers of the reference observations is output. When the predict function returns a vector, an additional special name of predict=“filename” can be used.

xtypes

A list of data type names that corresponds exactly to data type of the maps listed in xfiles. Each value can be one of: "logical", "integer", "numeric", "character". If NULL, or if a type is missing for a member of xfiles, type "numeric" is used. See Details if you used factors as predictors.

ancillaryData

A data frame of Y-variables that may not have been used in the original call to yai. There must be one row for each reference observation, no missing data, and row names must match those used in the original reference observations.

ann

if NULL, the value is taken from object. When TRUE, ann is used to find neighbors, and when FALSE a slow exact search is used (ignored for when method randomForest is used when the original yai object was created).

lon

if NULL, the value of cols is used. Otherwise, a 2-element vector given the range of longitudes (horizontal distance) desired for the output.

lat

if NULL, the value of rows is used. Otherwise, a 2-element vector given the range of latitudes (vertical distance) desired for the output.

rows

if NULL, all rows from the input grids are used. Otherwise, rows is a 2-element vector given the rows desired for the output. If the second element is greater than the number of rows, the header value YLLCORNER in the output is adjusted accordingly. Ignored if lon is specified.

cols

if NULL, all columns from the input grids are used. Otherwise, cols is a 2-element vector given the columns desired for the output. If the first element is greater than one, the header value XLLCORNER in the output is adjusted accordingly. Ignored if lat is specified.

nodata

the NODATA_VALUE for the output. If NULL, the value is taken from the input grids.

myPredFunc

called by AsciiGridPredict to predict output using the object and newdata from the xfiles. Two arguments are passed by AsciiGridPredict to this function, the first is the value of object and the second is a data frame of the new predictor variables created for each row of data from your input maps. If NULL, the generic predict function is called for object.

...

passed to myPredFunc, predict, or impute.

Author

Nicholas L. Crookston ncrookston.fs@gmail.com

Details

The input maps are assumed to be Asciigrid maps with 6-line headers containing the following tags: NCOLS, NROWS, XLLCORNER, YLLCORNER, CELLSIZE and NODATA_VALUE (case insensitive). The headers should be identical for all input maps, a warning is issued if they are not. It is critical that NODATA_VALUE is the same on all input maps.

The function builds data frames from the input maps one row at a time and builds predictions using those data frames as newdata. Each row of the input maps is processed in sequence so that the entire maps are not stored in memory. The function works by opening all the input and reads one line (row) at a time from each. The output file(s) are created one line at time as the input maps are processed.

Use AsciiGridImpute for objects builts with yai, otherwise use AsciiGridPredict. When AsciiGridPredict is used, the following rules apply. First, when myPredFunc is not null it is called with the arguments object, newdata, ... where the new data is the data frame built from the input maps, otherwise the generic predict function is called with these same arguments. When object and myPredFunc are both NULL a copy newdata used as the prediction. This is useful when lat, lon, rows, or cols are used in to subset the maps.

The NODATA_VALUE is output for every NODATA_VALUE found on any grid cell on any one of the input maps (the predict function is not called for these grid cells). NODATA_VALUE is also output for any grid cell where the predict function returns an NA.

If factors are used as X-variables in object, the levels found the map data are checked against those used in building the object. If new levels are found, the corresponding output map grid point is set to NODATA_VALUE; the predict function is not called for these cells as most predict functions will fail in these circumstances. Checking on factors depends on object containing a meaningful member named xlevels, as done for objects produced by lm.

Asciigrid maps do not contain character data, only numbers. The numbers in the maps are matched the xlevels by subscript (the first entry in a level corresponds to the numeric value 1 in the Asciigrid maps, the second to the number 2 and so on). Care must be taken by the user to insure that the coding scheme used in building the maps is identical to that used in building the object. See Value for information on how you can check the matching of these codes.

Examples

Run this code


## These commands write new files to your working directory

# Use the iris data
data(iris)

# Section 1: Imagine that the iris are planted in a planting bed.
# The following set of commands create Asciigrid map
# files for four attributes to illustrate the planting layout.

# Change species from a character factor to numeric (the sp classes
# can not handle character data).

sLen <- matrix(iris[,1],10,15)
sWid <- matrix(iris[,2],10,15)
pLen <- matrix(iris[,3],10,15)
pWid <- matrix(iris[,4],10,15)
spcd <- matrix(as.numeric(iris[,5]),10,15)

# Create and change to a temp directory. You can delete these steps
# if you wish to keep the files in your working directory.
curdir <- getwd()
setwd(tempdir())
cat ("Using working dir",getwd(),"\n")

# Make maps of each variable.
header = c("NCOLS 15","NROWS 10","XLLCORNER 1","YLLCORNER 1",
           "CELLSIZE 1","NODATA_VALUE -9999")
cat(file="slen.txt",header,sep="\n")
cat(file="swid.txt",header,sep="\n")
cat(file="plen.txt",header,sep="\n")
cat(file="pwid.txt",header,sep="\n")
cat(file="spcd.txt",header,sep="\n")


write.table(sLen,file="slen.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(sWid,file="swid.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(pLen,file="plen.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(pWid,file="pwid.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(spcd,file="spcd.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)

# Section 2: Create functions to predict species

# set the random number seed so that example results are consistant
# normally, leave out this command
set.seed(12345)

# sample the data
refs <- sample(rownames(iris),50)
y <- data.frame(Species=iris[refs,5],row.names=rownames(iris[refs,]))

# build a yai imputation for the reference data.
rfNN <- yai(x=iris[refs,1:4],y=y,method="randomForest")

# make lists of input and output map files.

xfiles <- list(Sepal.Length="slen.txt",Sepal.Width="swid.txt",
               Petal.Length="plen.txt",Petal.Width="pwid.txt")
outfiles1 <- list(distance="dist.txt",Species="spOutrfNN.txt",
                  useid="useindx.txt")

# map the imputation-based predictions for the input maps
AsciiGridImpute(rfNN,xfiles,outfiles1,ancillaryData=iris)
# read the asciigrids and get them ready to plot
spOrig <- t(as.matrix(read.table("spcd.txt",skip=6)))
sprfNN <- t(as.matrix(read.table("spOutrfNN.txt",skip=6)))
dist <- t(as.matrix(read.table("dist.txt",skip=6)))

# demonstrate the use of useid:
spViaUse <- read.table("useindx.txt",skip=6)
for (col in colnames(spViaUse)) spViaUse[,col]=as.character(y$Species[spViaUse[,col]])

# demonstrate how to use factors:
spViaLevels  <- read.table("spOutrfNN.txt",skip=6)
for (col in colnames(spViaLevels)) spViaLevels[,col]=levels(y$Species)[spViaLevels[,col]]

identical(spViaLevels,spViaUse)

if (require(randomForest))
{
  # build a randomForest predictor
  rf <- randomForest(x=iris[refs,1:4],y=iris[refs,5])
  AsciiGridPredict(rf,xfiles,list(predict="spOutrf.txt"))
  sprf <- t(as.matrix(read.table("spOutrf.txt",skip=6)))
} else sprf <- NULL

# reset the directory to that where the example was started.
setwd(curdir)

par(mfcol=c(2,2),mar=c(1,1,2,1))
image(spOrig,main="Original",col=c("red","green","blue"),
      axes=FALSE,useRaster=TRUE)
image(sprfNN,main="Using Impute",col=c("red","green","blue"),
      axes=FALSE,useRaster=TRUE)
if (!is.null(sprf))
  image(sprf,main="Using Predict",col=c("red","green","blue"),
      axes=FALSE,useRaster=TRUE)
image(dist,main="Neighbor Distances",col=terrain.colors(15),
      axes=FALSE,useRaster=TRUE)

Run the code above in your browser using DataLab