cvq2 (version 1.1.0)

q2: Model prediction power calculation.

Description

Determines the prediction power of a model. The model is applied to an external data set, and the observations in that set are compared with the model predictions. If no external data set is available, the prediction power is calculated by cross-validating the model data set.
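The comparison against an external data set can be sketched in base R. This is a minimal illustration, not the package's implementation: `ext_q2_sketch` is a hypothetical helper, the toy data are made up, and the form of the predictive squared correlation coefficient used here (one minus the squared prediction errors divided by the squared deviations from the mean of the model data observations) is an assumption that this page does not spell out.

```r
# Hypothetical helper (not part of cvq2): predictive squared correlation
# coefficient of a linear model on an external data set.
# ASSUMPTION: q^2 = 1 - sum((pred - obs)^2) / sum((obs - mean(y_model))^2)
ext_q2_sketch <- function(model_data, predict_data, formula) {
  response <- all.vars(formula)[1]            # left-hand side, e.g. "y"
  model    <- lm(formula, data = model_data)  # calibrate on the model data
  pred     <- predict(model, newdata = predict_data)
  obs      <- predict_data[[response]]
  1 - sum((pred - obs)^2) / sum((obs - mean(model_data[[response]]))^2)
}

# Made-up toy data, for illustration only.
train  <- data.frame(x = 1:6, y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0))
ext    <- data.frame(x = c(7, 8, 9), y = c(7.1, 7.9, 9.0))
q2_ext <- ext_q2_sketch(train, ext, y ~ x)
q2_ext
```

The sketch takes the response name from the left-hand side of the formula, so it only works for formulas with an untransformed response.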

Usage

looq2( modelData, formula = NULL, round = 4, extOut = FALSE,
  extOutFile = NULL )

cvq2( modelData, formula = NULL, nFold = N, nRun = 1,
  round = 4, extOut = FALSE, extOutFile = NULL )

q2( modelData, predictData, formula = NULL, round = 4,
  extOut = FALSE, extOutFile = NULL )

Arguments

modelData
The model data set, consisting of the parameters $x_1$, $x_2$, ..., $x_n$ and an observation y
predictData
The prediction data set, consisting of the parameters $x_1$, $x_2$, ..., $x_n$ and an observation y
formula
The formula used to predict the observed value, such as $y$ ~ $x_1 + x_2 + \ldots + x_n$, DEFAULT: NULL. If NULL, a generic formula is derived from the data set, assuming that the last column contains the observed value
nFold
During each run of the cross-validation, the model data set modelData is randomly partitioned into nFold equal-sized subsets (test sets), DEFAULT: N, $2
nRun
Number of times the cross-validation is applied to the data set. This corresponds to the number of individual predictions per observed value, DEFAULT: 1, $1
round
The rounding value used in the output, DEFAULT: 4
extOut
Extended output, DEFAULT: FALSE. If extOutFile is not specified, the output is written to stdout()
extOutFile
Write extended output into file (implies extOut = TRUE), DEFAULT: NULL
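The default behaviour described for the formula argument (last column is the observation, all other columns are parameters) can be illustrated with base R's `reformulate()`. This is a sketch under that stated assumption only; `derive_formula` is a hypothetical helper and is not taken from the package source.

```r
# Hypothetical helper (not part of cvq2): build "<last column> ~ <all others>".
derive_formula <- function(data) {
  cols <- names(data)
  stopifnot(length(cols) >= 2)
  reformulate(termlabels = head(cols, -1),   # every column except the last
              response   = tail(cols, 1))    # last column is the observation
}

# Made-up toy data, for illustration only.
toy <- data.frame(x1 = c(1, 2, 3), x2 = c(2, 1, 4), y = c(3, 3, 7))
f <- derive_formula(toy)
deparse(f)  # "y ~ x1 + x2"
```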

Value

  • q2(): The method q2 returns an object of class "q2". It contains information about the model calibration and its prediction performance on the external data set.
  • cvq2(), looq2(): The methods cvq2 and looq2 return an object of class "cvq2". It contains information about the model calibration and its prediction performance described by the model data set. Furthermore, this object contains data about the cross-validation applied to the model data set.

concept

  • cross-validation
  • Pearson correlation coefficient
  • squared correlation coefficient

Details

The calibration of modelData, including the conventional squared correlation coefficient, $r^2$, is calculated with a linear regression.

q2() (aliases: qsq(), qsquare()): The model described by modelData is used to predict the observations of predictData. These predictions are used in the $q^2_{tr}$ equation to calculate the predictive squared correlation coefficient.

cvq2() (aliases: cvqsq(), cvqsquare()): A cross-validation is performed for modelData. To this end, modelData ($N$ elements) is split into nFold disjoint, equal-sized test sets (subsets). Each test set consists of $k$ elements:

$$k = \left\lceil\frac{N}{nFold}\right\rceil$$

If $\frac{N}{nFold}$ is not an integer, some test sets consist of $k-1$ elements. The remaining $N-k$ elements are merged into the training set for this test set and describe the model M'. This model is used to predict the observations in the test set. Note that M' differs slightly from the model M used for the $r^2$ calculation, because the $k$ values of the test set are missing. Each observation from modelData is predicted once. The differences between the predictions and the observations within the test sets are used to calculate the PREdictive residual Sum of Squares (PRESS). Furthermore, for each training set, the mean of the observed values, $y_{mean}^{N-k,i}$, is calculated. With PRESS and $y_{mean}^{N-k,i}$, the modified $q^2_{cv}$ equation is used to calculate the predictive squared correlation coefficient.

If $k > 1$, the cross-validation can be repeated to reduce bias. In each iteration (nRun $= 1 \ldots x$), the test sets are compiled anew at random. Within one iteration, each observation is predicted once. If nFold $= N$, a single iteration is sufficient.

looq2(): Same procedure as cvq2() (see above), but with nFold $= N$ set implicitly to perform a Leave-One-Out cross-validation. Leave-One-Out cross-validation needs only one iteration (nRun = 1).
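The cross-validation procedure described above can be sketched in base R for a single run. This is an illustration only, not the package's code: `cv_q2_sketch` and its arguments are hypothetical, the toy data are made up, and the final expression assumes that the modified $q^2_{cv}$ equation has the form $1 - PRESS / \sum_i (y_i - y_{mean}^{N-k,i})^2$, which this page does not state explicitly.

```r
# Hypothetical helper (not part of cvq2): one run of n_fold cross-validation
# for a linear model, returning a cross-validated q^2.
# ASSUMPTION: q^2_cv = 1 - PRESS / sum((obs - mean of training observations)^2)
cv_q2_sketch <- function(data, formula, n_fold = nrow(data), seed = 1) {
  n <- nrow(data)
  stopifnot(n_fold >= 2, n_fold <= n)
  response <- all.vars(formula)[1]           # left-hand side, e.g. "y"

  # Randomly assign every row to one of n_fold disjoint test sets; the sizes
  # are ceiling(n / n_fold), or one element fewer if n / n_fold is no integer.
  set.seed(seed)
  fold <- sample(rep(seq_len(n_fold), length.out = n))

  press <- 0  # PREdictive residual Sum of Squares
  ss    <- 0  # squared deviations from the training-set means
  for (i in seq_len(n_fold)) {
    test  <- data[fold == i, , drop = FALSE]
    train <- data[fold != i, , drop = FALSE]
    model <- lm(formula, data = train)       # model M', fitted without the test set
    pred  <- predict(model, newdata = test)
    obs   <- test[[response]]
    press <- press + sum((pred - obs)^2)
    ss    <- ss + sum((obs - mean(train[[response]]))^2)
  }
  1 - press / ss
}

# Made-up toy data, for illustration only.
noise <- c(0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1)
toy   <- data.frame(x = 1:10, y = 2 * (1:10) + noise)
q2_loo   <- cv_q2_sketch(toy, y ~ x)               # n_fold = N: Leave-One-Out
q2_3fold <- cv_q2_sketch(toy, y ~ x, n_fold = 3)   # test sets of 4, 3 and 3 rows
```

Repeating the call with different `seed` values mimics several runs with newly drawn test sets. With `n_fold` equal to the number of rows the partition is the same for every seed, which matches the remark that Leave-One-Out needs only one iteration.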

Examples

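library(cvq2)

# Cross-validation with the default nFold = N
data(cvq2.setA)
result <- cvq2( cvq2.setA, y ~ x1 + x2 )
result

# 3-fold cross-validation, one run
data(cvq2.setB)
result <- cvq2( cvq2.setB, y ~ x, nFold = 3 )
result

# 3-fold cross-validation, repeated in 5 runs
data(cvq2.setB)
result <- cvq2( cvq2.setB, y ~ x, nFold = 3, nRun = 5 )
result

# Leave-One-Out cross-validation
data(cvq2.setA)
result <- looq2( cvq2.setA, y~x1+x2 )
result

# Prediction power on the external data set cvq2.setA_pred
data(cvq2.setA)
data(cvq2.setA_pred)
result <- q2( cvq2.setA, cvq2.setA_pred, y~x1+x2 )
result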
