PPtreeExt (version 0.1.0)

findproj_Ext: Find Optimal Projection for Class Separation

Description

Finds an optimal 1D projection of multivariate data that best separates classes, using Linear Discriminant Analysis (LDA) or Penalized Discriminant Analysis (PDA), then determines a classification cutpoint on the projected data using entropy-based splitting.

Usage

findproj_Ext(
  origclass,
  origdata,
  PPmethod = "LDA",
  q = 1,
  weight = TRUE,
  lambda = 0.1
)

Value

A list with the following components:

Index

Numeric value representing the optimization criterion achieved by the best projection. Higher values indicate better class separation.

Alpha

Numeric vector of length ncol(origdata) containing the optimal projection direction coefficients. This vector defines the linear combination of original variables that maximizes class separation.

C

Numeric scalar representing the optimal cutpoint (threshold) on the projected data. This value is determined using entropy-based splitting and divides observations into two groups for classification.

IOindexL

Logical vector of length nrow(origdata) indicating which observations have projected values less than or equal to the cutpoint C (projdata <= C). These observations are assigned to the left node/class.

IOindexR

Logical vector of length nrow(origdata) indicating which observations have projected values greater than the cutpoint C (projdata > C). These observations are assigned to the right node/class.
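
The relationship between Alpha, C, IOindexL and IOindexR can be illustrated with a small base R sketch. The values of Alpha and C below are hypothetical stand-ins rather than output from findproj_Ext, and the sketch assumes the projected data are simply origdata %*% Alpha; whether the function centers or scales the data first is not stated on this page.

```r
# Hypothetical inputs standing in for a fitted result (not package output).
origdata <- matrix(c(1, 2,
                     2, 1,
                     4, 5,
                     5, 4), ncol = 2, byrow = TRUE)
Alpha <- c(1, 1) / sqrt(2)  # assumed unit-length projection direction
C <- 4.5                    # assumed cutpoint on the projected scale

# One projected value per observation (assumes a plain linear projection).
projdata <- as.vector(origdata %*% Alpha)

# The two logical indicators are complements of each other.
IOindexL <- projdata <= C   # observations sent to the left node
IOindexR <- projdata > C    # observations sent to the right node
```

Because the two conditions are mutually exclusive and exhaustive, every observation falls into exactly one node.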

Arguments

origclass

Factor or numeric vector containing the class labels for each observation.

origdata

Numeric matrix or data frame containing the predictor variables. Each row represents an observation and each column represents a variable.

PPmethod

Character string specifying the projection pursuit method. Either "LDA" (Linear Discriminant Analysis, default) or "PDA" (Penalized Discriminant Analysis).

q

Integer specifying the dimension of the projected data. Default is 1 for 1D projection.

weight

Logical indicating whether to use weighted LDA index calculation. Default is TRUE.

lambda

Numeric penalty parameter for the PDA method. Default is 0.1. Only used when PPmethod = "PDA".

Details

This function performs projection pursuit to find a one-dimensional projection that optimally separates classes in multivariate data. The process involves:

  1. Finding the optimal projection direction using either LDA or PDA

  2. Projecting all observations onto this direction

  3. Determining an optimal cutpoint using entropy-based splitting

  4. Creating binary classification indicators based on the cutpoint

The cutpoint is calculated to minimize the weighted entropy of the resulting split. In the edge case where the cutpoint equals the maximum projected value, no observation would satisfy projdata > C and the right node would be empty, so the function uses the second-largest projected value instead to ensure a valid split.
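
The four steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names lda_direction, class_entropy and entropy_cut are hypothetical, the direction is taken as the leading eigenvector of the within-class scatter inverse times the between-class scatter (a textbook LDA direction), and candidate cutpoints are midpoints between adjacent projected values, so the maximum-value edge case does not arise here.

```r
# Illustrative sketch only; helper names are hypothetical, not from PPtreeExt.

# Step 1: textbook LDA direction from within (W) and between (B) scatter.
lda_direction <- function(cls, X) {
  X <- as.matrix(X)
  cls <- as.factor(cls)
  p <- ncol(X)
  grand_mean <- colMeans(X)
  W <- matrix(0, p, p)
  B <- matrix(0, p, p)
  for (g in levels(cls)) {
    Xg <- X[cls == g, , drop = FALSE]
    mg <- colMeans(Xg)
    W <- W + crossprod(sweep(Xg, 2, mg))        # within-class scatter
    B <- B + nrow(Xg) * tcrossprod(mg - grand_mean)  # between-class scatter
  }
  a <- Re(eigen(solve(W, B))$vectors[, 1])      # leading eigenvector
  a / sqrt(sum(a^2))                            # normalize to unit length
}

# Shannon entropy of the class labels in one node.
class_entropy <- function(cls) {
  p <- table(cls) / length(cls)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Step 3: cutpoint minimizing the size-weighted entropy of the two nodes.
entropy_cut <- function(cls, z) {
  u <- sort(unique(z))
  cand <- (head(u, -1) + tail(u, -1)) / 2       # midpoints (assumed candidates)
  score <- vapply(cand, function(cc) {
    left <- z <= cc
    (sum(left) * class_entropy(cls[left]) +
       sum(!left) * class_entropy(cls[!left])) / length(z)
  }, numeric(1))
  cand[which.min(score)]
}

X <- as.matrix(iris[, 1:4])
cls <- iris$Species

alpha <- lda_direction(cls, X)       # step 1: projection direction
z <- as.vector(X %*% alpha)          # step 2: project the observations
C_cut <- entropy_cut(cls, z)         # step 3: entropy-based cutpoint
IOindexL <- z <= C_cut               # step 4: left-node indicator
IOindexR <- z > C_cut                #         right-node indicator
```

The sign of an eigenvector is arbitrary, so which group of classes lands in the left node can differ between implementations; only the partition itself is meaningful.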

References

Lee, Y. D., Cook, D., Park, J. W., and Lee, E. K. (2013). PPtree: Projection Pursuit Classification Tree. Electronic Journal of Statistics, 7, 1369-1386.