SCA: Stepwise Cluster Analysis (SCA) Model

Description

This function implements a Stepwise Cluster Analysis (SCA) model for multivariate data analysis. The SCA model recursively partitions the data space based on Wilks' Lambda statistic, creating a tree structure that can be used for prediction. The function includes comprehensive input validation for data types, missing values, and sample size requirements, and supports both single and multiple predictants.

Usage

SCA(Training_data, X, Y, Nmin, alpha = 0.05, resolution = 1000, verbose = FALSE)

Value

A list containing:

Tree: The SCA tree structure
Map: Mapping information for predictions
XName: Names of predictors used
YName: Names of predictants
type: Mapping type (currently "mean")
totalNodes: Total number of nodes in the tree
leafNodes: Number of leaf nodes
cuttingActions: Number of cutting actions performed
mergingActions: Number of merging actions performed
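
As a minimal sketch of how the returned list might be inspected, the helper below collects the documented summary fields into one object. The function summarise_sca and the mock_model object are hypothetical and not part of the SCE package. The mock values are made up for illustration and only mimic the field names listed above.

```r
# Hypothetical helper (not part of SCE): gather the documented summary
# fields from a list shaped like the value returned by SCA().
summarise_sca <- function(model) {
  count_fields <- c("totalNodes", "leafNodes", "cuttingActions", "mergingActions")
  required <- c(count_fields, "XName", "YName", "type")
  missing_fields <- setdiff(required, names(model))
  if (length(missing_fields) > 0) {
    stop("Missing fields: ", paste(missing_fields, collapse = ", "))
  }
  # Named numeric vector of the four node/action counters
  counts <- vapply(count_fields, function(f) as.numeric(model[[f]]), numeric(1))
  list(
    predictors  = model$XName,
    predictants = model$YName,
    type        = model$type,
    counts      = counts
  )
}

# Mock object standing in for a fitted model; all numbers are invented
mock_model <- list(
  Tree = list(), Map = list(),
  XName = c("Prcp", "Tmax"), YName = "Flow", type = "mean",
  totalNodes = 7, leafNodes = 4, cuttingActions = 3, mergingActions = 0
)

model_summary <- summarise_sca(mock_model)
print(model_summary$counts)
```

With a real fitted model, the same fields can be read directly, for example Model$leafNodes.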

Arguments

Training_data: A data.frame or matrix containing the training data. Must include all specified predictors and predictants. Must not contain missing values.
X: A character vector specifying the names of independent variables (e.g., c("Prcp","SRad","Tmax")). Must be present in Training_data. All variables must be numeric.
Y: A character vector specifying the name(s) of dependent variable(s) (e.g., c("swvl3","swvl4")). Must be present in Training_data. All variables must be numeric.
Nmin: Integer specifying the minimum number of samples in a leaf node for cutting. Must be positive and less than the number of samples in Training_data.
alpha: Numeric significance level for clustering, between 0 and 1. Default value is 0.05.
resolution: Numeric value specifying the resolution for splitting. Controls the granularity of the search for optimal split points. Default value is 1000.
verbose: A logical value indicating whether to print progress information during model building. Default value is FALSE.
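
The constraints listed above can be checked before calling SCA. The sketch below is a hypothetical pre-check, not the package's own validation code. The function name check_sca_inputs is invented here. The sketch assumes that alpha must lie strictly between 0 and 1 and that resolution must be positive, which the documentation does not state explicitly.

```r
# Hypothetical pre-check (not part of SCE) mirroring the documented
# argument constraints. Assumptions: alpha strictly in (0, 1) and
# resolution > 0.
check_sca_inputs <- function(Training_data, X, Y, Nmin,
                             alpha = 0.05, resolution = 1000) {
  if (!is.data.frame(Training_data) && !is.matrix(Training_data)) {
    stop("Training_data must be a data.frame or matrix")
  }
  vars <- c(X, Y)
  absent <- setdiff(vars, colnames(Training_data))
  if (length(absent) > 0) {
    stop("Variables not found: ", paste(absent, collapse = ", "))
  }
  used <- as.data.frame(Training_data)[, vars, drop = FALSE]
  if (!all(vapply(used, is.numeric, logical(1)))) {
    stop("All X and Y variables must be numeric")
  }
  if (anyNA(used)) {
    stop("Training_data must not contain missing values")
  }
  if (!is.numeric(Nmin) || length(Nmin) != 1 || Nmin <= 0 || Nmin >= nrow(used)) {
    stop("Nmin must be positive and less than the sample size")
  }
  if (!is.numeric(alpha) || length(alpha) != 1 || alpha <= 0 || alpha >= 1) {
    stop("alpha must be between 0 and 1")
  }
  if (!is.numeric(resolution) || length(resolution) != 1 || resolution <= 0) {
    stop("resolution must be a positive number")
  }
  invisible(TRUE)
}

# Small made-up data set for illustration
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), y = c(1, 0, 1, 0))
check_sca_inputs(toy, X = c("a", "b"), Y = "y", Nmin = 2)
```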

Author

Xiuquan Wang <xxwang@upei.ca> (original SCA)
Kailong Li <lkl98509509@gmail.com> (Resolution-search-based SCA)

Details

The SCA model building process involves:

1. Input validation:
- Data type and structure checks (data.frame or matrix)
- Missing value checks in both predictors and predictants
- Numeric data validation for all variables
- Sample size requirements (must be greater than Nmin)
- Parameter validation (alpha, Nmin, resolution)
2. Data preparation:
- Conversion of input data to matrix format
- Dimension checks and storage
- Parameter initialization
3. Tree construction:
- Recursive partitioning based on Wilks' Lambda
- Node splitting and merging
- Leaf node creation
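
To illustrate the statistic that drives the partitioning, the sketch below computes the textbook two-group Wilks' Lambda for the predictant values on either side of a candidate split, together with the exact F transformation that holds for two groups. It is not the package's internal code. The function name wilks_lambda_split is hypothetical, and comparing the p-value with alpha to accept a split is an assumption about how the significance level is used.

```r
# Textbook two-group Wilks' Lambda (hypothetical helper, not part of SCE).
# Ya, Yb: predictant values (rows = samples) in the two candidate sub-clusters.
wilks_lambda_split <- function(Ya, Yb) {
  Ya <- as.matrix(Ya)
  Yb <- as.matrix(Yb)
  d  <- ncol(Ya)   # number of predictants
  na <- nrow(Ya)
  nb <- nrow(Yb)
  # Sum of squares and cross-products about the column means
  sscp <- function(M) crossprod(sweep(M, 2, colMeans(M)))
  W   <- sscp(Ya) + sscp(Yb)      # within-groups SSCP
  Tot <- sscp(rbind(Ya, Yb))      # total SSCP
  lambda <- det(W) / det(Tot)
  # For two groups this F transformation is exact
  df2    <- na + nb - d - 1
  f_stat <- (1 - lambda) / lambda * df2 / d
  p_value <- pf(f_stat, d, df2, lower.tail = FALSE)
  list(lambda = lambda, F = f_stat, p_value = p_value)
}

# Simulated sub-clusters with clearly different means
set.seed(1)
Ya <- matrix(rnorm(40, mean = 0), ncol = 2)
Yb <- matrix(rnorm(40, mean = 3), ncol = 2)
res <- wilks_lambda_split(Ya, Yb)

# Assumed decision rule: treat the split as significant when p_value < alpha
alpha <- 0.05
res$p_value < alpha
```

A Lambda close to 0 indicates well-separated sub-clusters, while a Lambda close to 1 indicates that the two groups are nearly indistinguishable.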

Examples

  ## Load SCE package
  library(SCE)

  ## Load training and testing data
  data("Streamflow_training_10var")
  data("Streamflow_testing_10var")

  ## Define independent (X) and dependent (Y) variables
  Predictors <- c("Prcp","SRad","Tmax","Tmin","VP","smlt","swvl1","swvl2","swvl3","swvl4")
  Predictants <- c("Flow")

  ## Build the SCA model
  Model <- SCA(
    Training_data = Streamflow_training_10var,
    X = Predictors,
    Y = Predictants,
    Nmin = 5,
    alpha = 0.05,
    resolution = 1000
  )
