valesta: Function to compute prediction statistics based on observed values

Description

Computes three prediction statistics for comparing observed and predicted values of a response variable: the root mean square difference ($RMSD$), the aggregated difference ($AD$), and the absolute aggregated difference ($AAD$). All three are based on the difference $$r_i = y_i - \widehat{y}_i$$ where $y_i$ and $\widehat{y}_i$ are the observed and the predicted values of the response variable $y$ for the $i$-th observation, respectively. The observed and predicted values must be expressed in the same units.
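
For reference, the three statistics are commonly written as shown here. This is a sketch that assumes the usual definitions, with $n$ the number of observations for which both $y_i$ and $\widehat{y}_i$ are available; the page itself does not state the formulas, so the cited references and the package source remain the authority on the exact form.

$$RMSD = \sqrt{\frac{1}{n}\sum_{i=1}^{n} r_i^2}, \qquad AD = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad AAD = \frac{1}{n}\sum_{i=1}^{n} \left| r_i \right|$$

Under these definitions, $AD$ keeps the sign of the differences and so reflects overall bias, while $RMSD$ and $AAD$ measure the typical size of the differences regardless of sign.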

Usage

valesta(y.obs = y.obs, y.pred = y.pred, want.percent = TRUE, want.n = FALSE)

Value

The output depends on want.percent. If want.percent=TRUE, the function returns a vector with six prediction statistics: (rmsd, rmsd.p, ad, ad.p, aad, aad.p), where rmsd.p is the RMSD expressed as a percentage, and likewise for ad.p and aad.p. If want.percent=FALSE, it returns a vector with three prediction statistics: (rmsd, ad, aad).

Arguments

y.obs: observed values of the variable of interest
y.pred: predicted values of the variable of interest
want.percent: A logical option requesting that the prediction statistics also be computed as a percentage of the sample mean of y.obs. The default is TRUE.
want.n: A logical option to add the sample size n to the output. The default is FALSE.

Author

Christian Salas-Eljatib.

Details

The function computes the three statistics described above in (a) the units of the response variable and (b) as a percentage. To express a statistic as a percentage, it is divided by the mean observed value of the response variable.
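
As an illustration of the computation described above, here is a minimal sketch in base R. It is not the package's source code: the helper name `pred_stats_sketch` is hypothetical, the formulas assume the usual mean-based definitions of RMSD, AD and AAD, and it is an assumption that pairs with a missing value are dropped before both the statistics and the mean of the observed values are computed.

```r
# Hypothetical helper for illustration only; not the valesta() implementation.
pred_stats_sketch <- function(y.obs, y.pred) {
  # Assumption: keep only the pairs where both values are available
  ok <- !is.na(y.obs) & !is.na(y.pred)
  y.obs <- y.obs[ok]
  y.pred <- y.pred[ok]

  r <- y.obs - y.pred          # differences r_i = y_i - yhat_i
  n <- length(r)               # number of complete pairs

  rmsd <- sqrt(sum(r^2) / n)   # root mean square difference
  ad   <- sum(r) / n           # aggregated difference (keeps the sign)
  aad  <- sum(abs(r)) / n      # absolute aggregated difference

  # Percentages: each statistic relative to the mean observed value
  y.mean <- mean(y.obs)
  c(rmsd = rmsd, rmsd.p = 100 * rmsd / y.mean,
    ad = ad,     ad.p   = 100 * ad / y.mean,
    aad = aad,   aad.p  = 100 * aad / y.mean,
    n = n)
}

# Small usage example with made-up numbers
pred_stats_sketch(y.obs = c(10, 20, 30), y.pred = c(12, 18, 33))
```

The element names follow the output described in the Value section, with n appended at the end. Where valesta() places n in its output is not stated on this page, so that position is also an assumption.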

References

Salas C, Ene L, Gregoire TG, Nasset E, Gobakken T. 2010. Modelling tree diameter from airborne laser scanning derived variables: a comparison of spatial statistical models. Remote Sensing of Environment 114(6):1277–1285. doi:10.1016/j.rse.2010.01.020
Salas C. 2002. Ajuste y validación de ecuaciones de volumen para un relicto del bosque de roble-laurel-lingue. Bosque 23(2):81–92. doi:10.4067/S0717-92002002000200009

Examples

# Create a simulated data frame
set.seed(1234)
df <- data.frame(Y = rnorm(30, 30, 9), X = rnorm(30, 450, 133))
head(df)
descstat(df)
# Fit a candidate model
mod1 <- lm(Y ~ X, data = df)
# Use the valesta function
valesta(y.obs = df$Y, y.pred = fitted(mod1))
# If some of the predicted values are missing (e.g. because of a
# missing predictor variable), the number of observations can be reported
df2 <- data.frame(y.obs = df$Y, y.pred = fitted(mod1))
df2[c(14, 26), 2] <- NA
descstat(df2)
# Notice the different sample size
valesta(y.obs = df2$y.obs, y.pred = df2$y.pred, want.n = TRUE)
# Thus, only 28 observations are used, as it should be.
