- model.obj
R model object. The model object to use for prediction. The model object must be of type "RF" (random forest), "QRF" (quantile random forest), or "CF" (conditional forest). The ModelMap package does not currently support SGB models.
- qdata.trainfn
String. The name (full path or base name with path specified by folder) of the training data file used for building the model (file should include columns for both response and predictor variables). The file must be a comma-delimited file *.csv with column headings. qdata.trainfn can also be an R dataframe. If predictions will be made (predict = TRUE or map=TRUE) the predictor column headers must match the names of the raster layer files, or a rastLUT must be provided to match predictor columns to the appropriate raster and band. If qdata.trainfn = NULL (the default), a GUI interface prompts user to browse to the training data file.
- qdata.testfn
String. The name (full path or base name with path specified by folder) of the independent data set for testing (validating) the model's predictions. The file must be a comma-delimited *.csv file with column headings, and the column headings must be the same as those in the training data file. qdata.testfn can also be an R dataframe. If qdata.testfn = NULL (default), a GUI interface asks the user if there is a test set available, then prompts the user to browse to the test data file. If no test set is desired (for example, because cross-fold validation will be performed or, for RF models, Out-Of-Bag estimation will be used), set qdata.testfn = FALSE. If no test set is given, and qdata.testfn is not set to FALSE, the GUI interface asks if a proportion of the data should be set aside as an independent test set. If this is desired, the user will be prompted to specify the proportion to set aside as test data, and two new data files will be generated in the output folder. The new file names will be the original data file name with "_train" and "_test" appended to the end of the file names.
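The "_train" / "_test" split described above can be sketched in base R. This is only an illustration of the idea, not ModelMap's own code: the helper name `split_train_test`, its arguments, and the toy data are all hypothetical.

```r
# Hypothetical helper (not part of ModelMap): set aside a proportion of the
# rows as a test set and write "<base.name>_train.csv" / "<base.name>_test.csv".
split_train_test <- function(qdata, base.name, folder, prop.test = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)            # reproducible split if a seed is given
  n <- nrow(qdata)
  test.rows <- sample(seq_len(n), size = round(n * prop.test))
  is.test <- seq_len(n) %in% test.rows          # logical index is safe when no rows are drawn

  train.fn <- file.path(folder, paste0(base.name, "_train.csv"))
  test.fn  <- file.path(folder, paste0(base.name, "_test.csv"))
  write.csv(qdata[!is.test, , drop = FALSE], train.fn, row.names = FALSE)
  write.csv(qdata[is.test, , drop = FALSE], test.fn, row.names = FALSE)
  list(train = train.fn, test = test.fn)
}

# Toy data: an ID column, a response, and one predictor.
qdata <- data.frame(ID = 1:10, y = rnorm(10), x1 = runif(10))
out <- split_train_test(qdata, "mydata", tempdir(), prop.test = 0.3, seed = 42)
```

The two returned paths could then be used as qdata.trainfn and qdata.testfn.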
- folder
String. The folder used for all output from predictions and/or maps. Do not add a trailing slash to the path string. If folder = NULL (default), a GUI interface prompts the user to browse to a folder. To use the working directory, specify folder = getwd().
- MODELfn
String. The file name to use to save the generated model object. If MODELfn = NULL (the default), a default name is generated by pasting model.type_response.type_response.name. If the other output filenames are left unspecified, MODELfn will be used as the basic name to generate other output filenames. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder.
- response.name
String. The name of the response variable used to build the model. The response.name must be a column name from the training/test data files. If the model.obj was constructed in ModelMap with the model.build() function, then model.diagnostics() can extract the response.name from the model.obj. If the model was constructed outside of ModelMap, then you may need to specify the response.name. In particular, if a SGB model was constructed with the aid of Elith's code, it is necessary to specify the response.name argument, as all models constructed with this code are given a response name of "y.data". If the response.name argument differs from the response name in the model.obj, the specified argument is given preference, and a warning is generated.
- unique.rowname
String. The name of the unique identifier used to identify each row in the training data. If unique.rowname = NULL, a GUI interface prompts user to select a variable from the list of column names from the training data file. If unique.rowname = FALSE, a variable is generated of numbers from 1 to nrow(qdata) to index each row.
- diagnostic.flag
String. The name of a column used to identify a subset of rows in the training data or test data to use for model diagnostics. This column must be either a logical vector (TRUE and FALSE) or a vector of zeros and ones (where 0=FALSE and 1=TRUE). If this argument is used, model diagnostics that depend on predicted and observed values will be calculated from a subset of the training or test data. These include the confusion matrix and threshold criteria for binary response models and the scatterplot for continuous response models. The output file of predicted and observed values will have an additional column indicating which rows were used in the diagnostic calculations. Note that for cross validation, the entire training dataset will be used to create cross validation predictions, but only the predictions on the rows indicated by diagnostic.flag will be used for the diagnostics.
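As a rough illustration of how a flag column restricts the diagnostics, the base R sketch below builds a confusion matrix from only the flagged rows. The data frame, its column names, and the 0.5 threshold are assumptions made for the example; they are not ModelMap output.

```r
# Toy table of observed and predicted values (column names are illustrative).
pred <- data.frame(
  ID   = 1:6,
  obs  = c(1, 0, 1, 1, 0, 0),
  pred = c(0.9, 0.2, 0.4, 0.8, 0.6, 0.1),
  flag = c(1, 1, 0, 1, 0, 1)      # 0/1 coding, as allowed for diagnostic.flag
)

use <- as.logical(pred$flag)      # 0 -> FALSE, 1 -> TRUE
threshold <- 0.5                  # assumed cut-off for calling a presence

# Confusion matrix computed only from the flagged rows.
conf.mat <- table(
  predicted = factor(pred$pred[use] >= threshold, levels = c(FALSE, TRUE)),
  observed  = factor(pred$obs[use] == 1, levels = c(FALSE, TRUE))
)
```

Rows with flag = 0 still appear in `pred`, but they do not contribute to `conf.mat`.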
- seed
Integer. The number used to initialize randomization to build RF or SGB models. If you want to produce the same model later, use the same seed. If seed = NULL (the default), a new seed is created each run.
- prediction.type
String. Prediction type. "TEST", "CV", "OOB" or "TRAIN". If prediction.type = "TEST", validation predictions will be made on the test set provided by qdata.testfn. If prediction.type = "CV", cross validation will be used on the training data provided by qdata.trainfn. If model.obj is a Random Forest model and prediction.type = "OOB", the Out-of-Bag predictions will be calculated on the training data. If model.obj is a Stochastic Gradient Boosting model and prediction.type = "TRAIN", the predictions will be calculated on the training data, but these predictions should be used with caution, as this will lead to overoptimistic estimates of model quality. A *.csv file of the unique id, observed, and predicted values is generated and put in the specified (or default) folder.
- MODELpredfn
String. Model validation. A character string used to construct the output file names for the validation diagnostics, for example the prediction *.csv file, and the graphics *.jpg, *.pdf and *.ps files. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder. If MODELpredfn = NULL (the default), a default name is created by pasting MODELfn and "_pred".
- na.action
String. Model validation. Specifies the action to take if there are NA values in the predictor data or if there is a level or class of a categorical predictor variable in the validation test set, but not in the training data set. By default, model.diagnostics() will use the same na.action as was given to model.build(). There are 2 options: (1) na.action = "na.omit", where any data point with NA or any new levels for any of the factored predictors is removed from the data; (2) na.action = "na.roughfix", where a missing categorical predictor is replaced with the most common category, and a missing continuous predictor is replaced with the median. Note: data points with missing response values will always be omitted.
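The "na.roughfix" rule described above (median for continuous predictors, most common category for categorical ones) can be sketched in a few lines of base R. The function `rough_fix` and the toy data are hypothetical and only mirror the described behaviour; they are not the code ModelMap runs.

```r
# Hypothetical re-implementation of the rule described for "na.roughfix".
rough_fix <- function(df) {
  for (nm in names(df)) {
    x <- df[[nm]]
    miss <- is.na(x)
    if (!any(miss)) next
    if (is.numeric(x)) {
      x[miss] <- median(x, na.rm = TRUE)            # continuous: column median
    } else if (is.factor(x)) {
      counts <- table(x)                            # NA is excluded by default
      x[miss] <- names(counts)[which.max(counts)]   # categorical: most common level
    }
    df[[nm]] <- x
  }
  df
}

# Toy predictors with one missing value each (names are illustrative).
qdata <- data.frame(
  elev  = c(100, NA, 300, 500),
  cover = factor(c("forest", "forest", NA, "shrub"))
)
fixed   <- rough_fix(qdata)   # behaviour described for na.action = "na.roughfix"
omitted <- na.omit(qdata)     # behaviour described for na.action = "na.omit"
```

With "na.omit" the two incomplete rows are dropped, while the rough fix keeps all four rows.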
- v.fold
Integer (or logical FALSE). Model validation. The number of cross validation folds to use when making validation predictions on the training data. Only used if prediction.type = "CV".
- device.type
String or vector of strings. Model validation. One or more device types for graphical output from model validation diagnostics.
Current choices:
| Device type | Output |
| --- | --- |
| "default" | default graphics device |
| "jpeg" | *.jpg files |
| "none" | no graphics device generated |
| "pdf" | *.pdf files |
| "png" | *.png files |
| "postscript" | *.ps files |
| "tiff" | *.tif files |
- DIAGNOSTICfn
String. Model validation. Name used as base to create names for output files from model validation diagnostics. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder. Defaults to DIAGNOSTICfn = MODELfn followed by the appropriate suffixes (e.g. ".csv", ".jpg").
- res
Integer. Model validation. Pixels per inch for jpeg, png, and tiff plots. The default is 72 dpi, which is suitable for on-screen viewing. For printing, the suggested setting is 300 dpi.
- jpeg.res
Integer. Model validation. Deprecated. Ignored unless res is not provided.
- device.width
Integer. Model validation. The device width for diagnostic plots in inches.
- device.height
Integer. Model validation. The device height for diagnostic plots in inches.
- units
Model validation. The units in which device.height and device.width are given. Can be "px" (pixels), "in" (inches, the default), "cm" or "mm".
- pointsize
Integer. Model validation. The default pointsize of plotted text, interpreted as big points (1/72 inch) at res ppi.
- cex
Integer. Model validation. The cex for diagnostic plots.
- req.sens
Numeric. Model validation. The required sensitivity for threshold optimization for binary response model evaluation.
- req.spec
Numeric. Model validation. The required specificity for threshold optimization for binary response model evaluation.
- FPC
Numeric. Model validation. The False Positive Cost for threshold optimization for binary response model evaluation.
- FNC
Numeric. Model validation. The False Negative Cost for threshold optimization for binary response model evaluation.
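The text above does not spell out how req.sens, req.spec, FPC and FNC enter the threshold optimization, so the following base R sketch shows only one plausible reading: a cost criterion that minimizes FPC × (false positives) + FNC × (false negatives), and a sensitivity criterion that picks the highest threshold still meeting req.sens. The data, the threshold grid, and both rules are assumptions for illustration.

```r
# Toy observed (0/1) and predicted (probability) values.
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.92, 0.81, 0.63, 0.34, 0.72, 0.41, 0.22, 0.13)
thresholds <- seq(0, 1, by = 0.05)

# Sensitivity, specificity, and error counts at each candidate threshold.
stats <- as.data.frame(t(sapply(thresholds, function(th) {
  call.pos <- pred >= th
  c(threshold = th,
    sens = sum(call.pos & obs == 1) / sum(obs == 1),
    spec = sum(!call.pos & obs == 0) / sum(obs == 0),
    FP   = sum(call.pos & obs == 0),
    FN   = sum(!call.pos & obs == 1))
})))

# Assumed cost criterion: a false negative is twice as costly as a false positive.
FPC <- 1
FNC <- 2
cost <- FPC * stats$FP + FNC * stats$FN
best.cost <- stats$threshold[which.min(cost)]

# Assumed sensitivity criterion: highest threshold with sens >= req.sens.
req.sens <- 0.75
best.sens <- max(stats$threshold[stats$sens >= req.sens])
```

Raising FNC relative to FPC pushes the cost-based threshold lower, since missed presences are penalized more heavily than false alarms.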
- quantiles
Numeric Vector. QRF models. The quantiles to predict. A numeric vector with values between zero and one. If the model was built without specifying quantiles, quantile importance cannot be calculated, but quantiles can still be used to specify prediction quantiles. If the model was built with quantiles specified, then the model quantiles will be used for the importance graph. If quantiles are not specified for model building or diagnostics, prediction quantiles will default to quantiles=c(0.1,0.5,0.9).
- all
Logical. QRF models. all=TRUE uses all observations for prediction. all=FALSE uses only a certain number of observations per node for prediction (set with argument obs). Unlike in the quantregForest package itself, the default in ModelMap is all=TRUE, to more closely parallel ordinary random forest models.
- subset
CF models. NOT SUPPORTED. Only needed for prediction.type="CV" for CF models. An optional vector specifying a subset of observations to be used in the fitting process. Note: subset is not yet supported for cross validation diagnostics.
- weights
CF models. NOT SUPPORTED. Only needed for prediction.type="CV" for CF models. An optional vector of weights to be used in the fitting process. Non-negative integer valued weights are allowed as well as non-negative real weights. Observations are sampled (with or without replacement) according to probabilities weights/sum(weights). The fraction of observations to be sampled (without replacement) is computed based on the sum of the weights if all weights are integer-valued, and based on the number of weights greater than zero otherwise. Alternatively, weights can be a double matrix defining case weights for all ncol(weights) trees in the forest directly. This requires more storage but gives the user more control. Note: weights is not yet supported for cross validation diagnostics.
- mtry
Integer. Only needed for prediction.type="CV" for CF models (for RF and QRF models mtry will be determined from the model object). Number of variables to try at each node of Random Forest trees.
- controls
CF models. Only needed for prediction.type="CV" for CF models. An object of class ForestControl-class, which can be obtained using cforest_control (and its convenience interfaces cforest_unbiased and cforest_classical). If controls is specified, then the stand-alone arguments mtry and ntree are ignored, and these parameters must be specified as part of the controls argument. If controls is not specified, model.build defaults to cforest_unbiased(mtry=mtry, ntree=ntree) with the values of mtry and ntree specified by the stand-alone arguments.
- xtrafo
CF models. Only needed for prediction.type="CV" for CF models. A function to be applied to all input variables. By default, the ptrafo function from the party package is applied.
- ytrafo
CF models. Only needed for prediction.type="CV" for CF models. A function to be applied to all response variables. By default, the ptrafo function from the party package is applied.
- scores
CF models. NOT SUPPORTED. Only needed for prediction.type="CV" for CF models. An optional named list of scores to be attached to ordered factors. Note: scores is not yet supported for cross validation diagnostics.