lwplsr_agg: Aggregation of KNN-LWPLSR models with different numbers of LVs

Description

Ensemblist method where the predictions are calculated by averaging the predictions of KNN-LWPLSR models (lwplsr) built with different numbers of latent variables (LVs).

For instance, if argument nlv is set to nlv = "5:10", the prediction for a new observation is the simple average of the predictions returned by the models with 5 LVS, 6 LVs, ... 10 LVs, respectively.

Usage

lwplsr_agg(
    X, Y,
    nlvdis, diss = c("eucl", "mahal"),
    h, k,
    nlv,
    cri = 4,
    verb = FALSE
    )
# S3 method for Lwplsr_agg
predict(object, X, ...)

Value

For lwplsr_agg: object of class Lwplsr_agg

For predict.Lwplsr_agg:

pred: prediction calculated for each observation, which is the most occurent level (vote) over the predictions returned by the models with different numbers of LVS respectively
listnn: list with the neighbors used for each observation to be predicted
listd: list with the distances to the neighbors used for each observation to be predicted
listw: list with the weights attributed to the neighbors used for each observation to be predicted

Arguments

X: For the main function: Training X-data ( $n, p$ ). --- For the auxiliary function: New X-data ( $m, p$ ) to consider.
Y: Training Y-data ( $n, q$ ).
nlvdis: The number of LVs to consider in the global PLS used for the dimension reduction before calculating the dissimilarities. If nlvdis = 0, there is no dimension reduction.
diss: The type of dissimilarity used for defining the neighbors. Possible values are "eucl" (default; Euclidean distance), "mahal" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)).
h: A scale scalar defining the shape of the weight function. Lower is $h$ , sharper is the function. See wdist.
k: The number of nearest neighbors to select for each observation to predict.
nlv: A character string such as "5:20" defining the range of the numbers of LVs to consider (here: the models with nb LVS = 5, 6, ..., 20 are averaged). Syntax such as "10" is also allowed (here: correponds to the single model with 10 LVs).
cri: Argument cri in function wdist.
verb: Logical. If TRUE, fitting information are printed.
object: For the auxiliary function: A fitted model, output of a call to the main function.
...: For the auxiliary function: Optional arguments. Not used.

Examples

Run this code


## EXAMPLE 1

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

nlvdis <- 5 ; diss <- "mahal"
h <- 2 ; k <- 10
nlv <- "2:6" 
fm <- lwplsr_agg(
    Xtrain, Ytrain, 
    nlvdis = nlvdis, diss = diss,
    h = h, k = k,
    nlv = nlv)
names(fm)
res <- predict(fm, Xtest)
names(res)
res$pred
msep(res$pred, Ytest)

## EXAMPLE 2

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

nlvdis <- 5 ; diss <- "mahal"
h <- c(2, Inf)
k <- c(10, 20)
nlv <- c("1:3", "2:5")
pars <- mpars(nlvdis = nlvdis, diss = diss,
    h = h, k = k, nlv = nlv)
pars
res <- gridscore(
    Xtrain, Ytrain, Xtest, Ytest, 
    score = msep, 
    fun = lwplsr_agg, 
    pars = pars)
res

## EXAMPLE 3

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

K = 3
segm <- segmkf(n = n, K = K, nrep = 1)
segm
res <- gridcv(
    Xtrain, Ytrain, 
    segm, score = msep, 
    fun = lwplsr_agg, 
    pars = pars,
    verb = TRUE)
res

Run the code above in your browser using DataLab

Get 50% off unlimited learning

Description

Usage

Value

Arguments

Examples