AsciiGridImpute: Imputes/Predicts data for Ascii Grid maps

Description

AsciiGridImpute finds nearest neighbor reference observations for each point in the input grid maps and outputs maps of selected Y-variables in a set of output grid maps.

AsciiGridPredict applies a predict function to each point in the input grid maps and outputs maps of the prediction(s) in one or more output grid maps (see Details).

One row of the each grid maps is read and processed at a time thereby avoiding the need to build huge objects in R that would be necessary if all the rows of all the maps were processed together.

Usage

AsciiGridImpute(object,xfiles,outfiles,xtypes=NULL,ancillaryData=NULL,
                ann=NULL,lon=NULL,lat=NULL,rows=NULL,cols=NULL,
                nodata=NULL,myPredFunc=NULL,...)
AsciiGridPredict(object,xfiles,outfiles,xtypes=NULL,lon=NULL,lat=NULL,
                 rows=NULL,cols=NULL,nodata=NULL,myPredFunc=NULL,...)

Arguments

object

An object of class yai, any other object for which a predict function is defined, or an object that is passed to a predict function you define

xfiles

A list of input file names where there is one grid file for each X-variable. List elements must be given the same names as the X-variables they correspond with and there must be one file for

outfiles

One of these two forms:

{A file name that is understood to correspond to the single prediction returned by the generic predict function related to object

Value

An invisible list containing the following named elements:
unexpectedNAsA data frame listing the map row numbers and the number of NA values generated by the predict function for each row. If none are generated for a row the row is not reported, if none are generated for any rows, the data frame is NULL.
illegalLevelsA data frame listing levels found in the maps that were not found in the xlevels for the object. The row names are the illegal levels, the column names are the variable names, and the values are the number of grid cells where the illegal levels were found.
outputLegendA data frame showing the relationship between levels in the output maps and those found in object. The row names are level index values, the column names are variable names, and the values are the levels. NULL if no factors are output.
inputLegendA data frame showing the relationship between levels found in the input maps and those found in object. The row names are level index values (this function assumes they correspond to numeric values on the maps), the column names are variable names, and the values are the levels. NULL if no factors are input. This information is basically the same as that in xlevels.

item

xtypes
ancillaryData
ann
lon
lat
rows
cols
nodata
myPredFunc
...

code

predict

Details

The input maps are assumed to be Asciigrid maps with 6-line headers containing the following tags:

NCOLS, NROWS, XLLCORNER, YLLCORNER,
  CELLSIZE

and NODATA_VALUE (case insensitive). The headers should be identical, a warning is issued if they are not. It is critical that NODATA_VALUE is the same on all input maps.

The function builds data frames from the input maps one row at a time and builds predictions using those data frames as newdata. Each row of the input maps is processed in sequence so that the entire maps are not stored in memory. The function works by opening all the input and reads one line (row) at a time from each. The output file(s) are created one line at time as the input maps are processed.

Use AsciiGridImpute for objects builts with yai, otherwise use AsciiGridPredict. When AsciiGridPredict is used, the following rules apply. First, when myPredFunc is not null it is called with the arguments object, newdata, ... where the new data is the data frame built from the input maps, otherwise the generic predict function is called with these same arguments. When object and myPredFunc are both NULL a copy newdata used as the prediction. This is useful when lat, lon, rows, or cols are used in to subset the maps.

The NODATA_VALUE is output for every NODATA_VALUE found on any grid cell on any one of the input maps (the predict function is not called for these grid cells). NODATA_VALUE is also output for any grid cell where the predict function returns an NA. If factors are used as X-variables in object, the levels found the map data are checked against those used in building the object. If new levels are found, the corresponding output map grid point is set to NODATA_VALUE; the predict function is not called for these cells as most predict functions will fail in these circumstances. Checking on factors depends on object containing a meaningful member named xlevels, as done objects model objects produced by lm. Note that objects produced by randomForest version 4.5-19 and prior do not contain xlevels; use function addXlevels to add the necessary element if one or more factors are used.

Asciigrid maps do not contain character data, only numbers. The numbers in the maps are matched the xlevels by subscript (the first entry in a level corresponds to the numeric value 1 in the Asciigrid maps, the second to the number 2 and so on). Care must be taken by the user to insure that the coding scheme used in building the maps is identical to that used in building the object. See Value for information on how you can check the matching of these codes.

Examples

Run this code

## These commands write new files to your working directory

# Use the iris data
data(iris)

# Section 1: Imagine that the iris are planted in a planting bed.
# The following set of commands create Asciigrid map
# files for four attributes to illustrate the planting layout.

# Change species from a character factor to numeric (the sp classes
# can not handle character data).

sLen <- matrix(iris[,1],10,15)
sWid <- matrix(iris[,2],10,15)
pLen <- matrix(iris[,3],10,15)
pWid <- matrix(iris[,4],10,15)
spcd <- matrix(as.numeric(iris[,5]),10,15)

# Make maps of each variable.

header = c("NCOLS 15","NROWS 10","XLLCORNER 1","YLLCORNER 1",
           "CELLSIZE 1","NODATA_VALUE -9999")
cat(file="slen.txt",header,sep="")
cat(file="swid.txt",header,sep="")
cat(file="plen.txt",header,sep="")
cat(file="pwid.txt",header,sep="")
cat(file="spcd.txt",header,sep="")


write.table(sLen,file="slen.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(sWid,file="swid.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(pLen,file="plen.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(pWid,file="pwid.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)
write.table(spcd,file="spcd.txt",append=TRUE,col.names=FALSE,
            row.names=FALSE)

# Section 2: Create functions to predict species

# set the random number seed so that example results are consistant
# normally, leave out this command
set.seed(12345)

# sample the data
refs <- sample(rownames(iris),50)
y <- data.frame(Species=iris[refs,5],row.names=rownames(iris[refs,]))

# build a yai imputation for the reference data.
rfNN <- yai(x=iris[refs,1:4],y=y,method="randomForest")

# make lists of input and output map files.

xfiles <- list(Sepal.Length="slen.txt",Sepal.Width="swid.txt",
               Petal.Length="plen.txt",Petal.Width="pwid.txt")
outfiles1 <- list(distance="dist.txt",Species="spOutrfNN.txt")

# map the imputation-based predictions for the input maps
AsciiGridImpute(rfNN,xfiles,outfiles1,ancillaryData=iris)

# build a randomForest predictor
rf <- randomForest(x=iris[refs,1:4],y=iris[refs,5])

# map the predictions for the input maps
outfiles2 <- list(predict="spOutrf.txt")
AsciiGridPredict(rf,xfiles,outfiles2,xtypes=NULL,rows=NULL)

# read the species maps and plot them using the sp package classes
if (require(sp)) {
   spOrig <- read.asciigrid("spcd.txt")
   sprfNN <- read.asciigrid("spOutrfNN.txt")
   sprf <- read.asciigrid("spOutrf.txt")
   dist <- read.asciigrid("dist.txt")

   par(mfcol=c(2,2))
   image(spOrig)
   title("Original")
   image(sprfNN)
   title("Using Predict")
   image(sprf)
   title("Using Impute")
   image(dist)
   title("Neighbor Distances")
}

Run the code above in your browser using DataLab