errorStats: Compute error components of k-NN imputations

Description

Error properties of estimates derived from imputation differ from those of regression-based estimates because the two methods include a different mix of error components. This function computes a partitioning of error statistics as proposed by Stage and Crookston (2007).

Usage

errorStats(mahal, ..., scale = FALSE, pzero = 0.1, plg = 0.5, seeMethod = "lm")

Arguments

mahal

An object of class yai computed with method="mahalanobis".

...

Other objects of class yai for which statistics are desired. All objects should be built from the same data and variables as the first argument.

scale

When TRUE, the errors are scaled by their respective standard deviations.

pzero

The lower tail p-value used to pick reference observations that are zero distance from each other (used to compute rmmsd0).

plg

The upper tail p-value used to pick reference observations that are substantially distant from each other (used to compute rmsdlg).

seeMethod

Method used to compute SEE (the standard error of estimate): seeMethod="lm" uses lm and seeMethod="gam" uses gam. In both cases, the model formul…
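The sketch below illustrates two of these arguments on simulated data, under stated assumptions. Part 1 computes an SEE for one Y-variable as the residual standard error of an lm fit, assuming the Y-variable is regressed on all X-variables (the actual model formula used by errorStats is not reproduced here). Part 2 shows one possible reading of pzero and plg: neighbor distances are split at their pzero and 1 - plg quantiles. The distances, variable names, and the quantile rule are hypothetical and may differ from what errorStats does internally.

```r
# Part 1: SEE for one Y-variable via lm(), in the spirit of seeMethod = "lm".
# Assumption: Y is regressed on all X-variables; the data are simulated.
set.seed(42)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 0.8)

fit <- lm(y ~ x1 + x2, data = dat)
see <- summary(fit)$sigma        # residual standard error of the fit

# The same quantity by hand: sqrt(RSS / residual degrees of freedom)
see_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Part 2: tail cutoffs in the spirit of pzero and plg.
# Assumption: observations are ranked by (hypothetical) neighbor distance
# and split at quantiles; errorStats's internal rule may differ.
dist  <- rexp(200)               # simulated neighbor distances
pzero <- 0.1                     # default lower tail probability
plg   <- 0.5                     # default upper tail probability

near_zero <- dist <= quantile(dist, probs = pzero)     # "zero distance" set
far_apart <- dist >= quantile(dist, probs = 1 - plg)   # "larger distance" set

c(see = see, n_near_zero = sum(near_zero), n_far_apart = sum(far_apart))
```

With 200 distinct distances, this rule keeps the 20 smallest as the near-zero set and the 100 largest as the large-distance set.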

Value

A list that contains several data frames. The column names of each are a combination of the name of the object used to compute the statistics and the name of the statistic. The row names correspond to the Y-variables from the first argument. The data frame names are as follows:

common

Statistics used to compute other statistics.

name of first argument

Error statistics for the first yai object.

names of ... arguments

Error statistics for each of the remaining yai objects, if any.

The statistics named in the columns are:

see

Standard error of estimate for individual regressions fit to the corresponding Y-variables.

rmmsd0

Root mean square difference for imputations based on method="mahalanobis" (always based on the first argument to the function).

mlf

Square root of the model lack of fit: $\sqrt{\mathrm{see}^2 - \mathrm{rmmsd0}^2/2}$.

rmsd

Root mean square error.

rmsdlg

Root mean square error of the observations with larger distances.

sei

Standard error of imputation: $\sqrt{\mathrm{rmsd}^2 - \mathrm{rmmsd0}^2/2}$.

dstc

Distance component: $\sqrt{\mathrm{rmsd}^2 - \mathrm{rmmsd0}^2}$.

Note that, unlike Stage and Crookston (2007), all statistics reported here are in natural units, not squared units.
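The arithmetic behind mlf, sei, and dstc can be checked in a few lines of R. The values below are hypothetical, picked only so that every square root has a non-negative argument; they do not come from Stage and Crookston (2007) or from TallyLake. Halving rmmsd0^2 is consistent with the fact that two independent observations with the same expected value and variance $\sigma^2$ have an expected squared difference of $2\sigma^2$, so rmmsd0^2/2 can be read as an estimate of the error variance that no method can remove.

```r
# Hypothetical statistics (natural units) for a single Y-variable.
see    <- 5.0   # standard error of estimate
rmmsd0 <- 6.0   # RMS difference at (near) zero distance
rmsd   <- 7.0   # RMS difference of the imputations

# Components, using the formulas listed in the Value section
mlf  <- sqrt(see^2  - rmmsd0^2 / 2)   # model lack of fit:   sqrt(25 - 18) = sqrt(7)
sei  <- sqrt(rmsd^2 - rmmsd0^2 / 2)   # error of imputation: sqrt(49 - 18) = sqrt(31)
dstc <- sqrt(rmsd^2 - rmmsd0^2)       # distance component:  sqrt(49 - 36) = sqrt(13)

round(c(mlf = mlf, sei = sei, dstc = dstc), 4)
# mlf 2.6458, sei 5.5678, dstc 3.6056
```

With estimated values, an argument such as see^2 - rmmsd0^2/2 can come out negative; in this sketch, sqrt() would then return NaN with a warning.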

Details

See http://www.fs.fed.us/rm/pubs_other/rmrs_2007_stage_a001.pdf

References

Stage, A.R.; Crookston, N.L. (2007). Partitioning error components for accuracy-assessment of near neighbor methods of imputation. For. Sci. 53(1):62-72. http://forest.moscowfsl.wsu.edu/gems/StagePartitioningFS.pdf

Examples

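library(yaImpute)

data(TallyLake)

# Variances of the eight Y-variables; see col A in Table 3 in Stage and Crookston
diag(cov(TallyLake[, 1:8]))

# Mahalanobis imputation for reference observations only (noTrgs = TRUE),
# using an exact rather than approximate neighbor search (ann = FALSE)
mal <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8], ann = FALSE,
           noTrgs = TRUE, method = "mahalanobis")

# The same data imputed with most similar neighbor (MSN)
msn <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8], ann = FALSE,
           noTrgs = TRUE, method = "msn")

# Variable "see" for "mal" matches col B (when squared and scaled).
# Other columns don't match exactly because Stage and Crookston used
# different software to compute the values.
errorStats(mal, msn)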
