SCE (version 1.0.0)

SCA: Stepwise Cluster Analysis (SCA) Model

Description

This function implements a Stepwise Cluster Analysis (SCA) model for multivariate data analysis. The SCA model recursively partitions the data space based on Wilks' Lambda statistic, creating a tree structure that can be used for prediction. The function includes comprehensive input validation for data types, missing values, and sample size requirements, and supports both single and multiple predictants.
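
As an illustration of the statistic named above, the following base R sketch computes Wilks' Lambda for a two-group split as the ratio of the determinant of the pooled within-group sums-of-squares-and-cross-products (SSCP) matrix to that of the total SSCP matrix. The helper name wilks_lambda and the simulated data are assumptions made for this example; the sketch is not taken from the SCE source and does not reproduce how SCA compares the statistic against alpha.

```r
# Hypothetical helper (not part of SCE): Wilks' Lambda for a grouping of
# the rows of a response matrix Y. Lambda = det(W) / det(T), where T is the
# total SSCP matrix and W is the pooled within-group SSCP matrix.
# Values near 0 indicate well-separated groups; values near 1 indicate
# little separation.
wilks_lambda <- function(Y, group) {
  Y <- as.matrix(Y)
  sscp <- function(M) {
    centered <- sweep(M, 2, colMeans(M))  # subtract column means
    crossprod(centered)                   # t(centered) %*% centered
  }
  total <- sscp(Y)
  within <- Reduce(`+`, lapply(split(seq_len(nrow(Y)), group),
                               function(idx) sscp(Y[idx, , drop = FALSE])))
  det(within) / det(total)
}

# Simulated data: both responses shift when x crosses 0.5
set.seed(1)
x <- runif(200)
y <- cbind(y1 = ifelse(x < 0.5, 0, 3) + rnorm(200, sd = 0.5),
           y2 = ifelse(x < 0.5, 1, 2) + rnorm(200, sd = 0.5))

lambda_good <- wilks_lambda(y, x < 0.5)   # split at the true breakpoint
lambda_poor <- wilks_lambda(y, x < 0.05)  # split far from the breakpoint
print(c(good = lambda_good, poor = lambda_poor))
```

The split at the true breakpoint yields the smaller Lambda, which is why a partitioning procedure of this kind would prefer it.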

Usage

SCA(Training_data, X, Y, Nmin, alpha = 0.05, resolution = 1000, verbose = FALSE)

Value

A list containing:

  • Tree: The SCA tree structure

  • Map: Mapping information for predictions

  • XName: Names of predictors used

  • YName: Names of predictants

  • type: Mapping type (currently "mean")

  • totalNodes: Total number of nodes in the tree

  • leafNodes: Number of leaf nodes

  • cuttingActions: Number of cutting actions performed

  • mergingActions: Number of merging actions performed

Arguments

Training_data

A data.frame or matrix containing the training data. Must include all specified predictors and predictants. Must not contain missing values.

X

A character vector specifying the names of independent variables (e.g., c("Prcp","SRad","Tmax")). Must be present in Training_data. All variables must be numeric.

Y

A character vector specifying the name(s) of dependent variable(s) (e.g., c("swvl3","swvl4")). Must be present in Training_data. All variables must be numeric.

Nmin

Integer specifying the minimum number of samples a leaf node must contain to be considered for cutting. Must be a positive integer smaller than the number of samples in Training_data.

alpha

Numeric significance level for clustering, between 0 and 1. Default value is 0.05.

resolution

Numeric value specifying the resolution for splitting. Controls the granularity of the search for optimal split points. Default value is 1000.

verbose

A logical value indicating whether to print progress information during model building. Default value is FALSE.

Author

Xiuquan Wang <xxwang@upei.ca> (original SCA)

Kailong Li <lkl98509509@gmail.com> (Resolution-search-based SCA)

Details

The SCA model building process involves:

  1. Input validation:

    • Data type and structure checks (data.frame or matrix)

    • Missing value checks in both predictors and predictants

    • Numeric data validation for all variables

    • Sample size requirements (must be greater than Nmin)

    • Parameter validation (alpha, Nmin, resolution)

  2. Data preparation:

    • Conversion of input data to matrix format

    • Dimension checks and storage

    • Parameter initialization

  3. Tree construction:

    • Recursive partitioning based on Wilks' Lambda

    • Node splitting and merging

    • Leaf node creation
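
The input validation in step 1 can be sketched in base R as shown below. The helper check_sca_inputs, its error messages, and the treatment of alpha as strictly between 0 and 1 are assumptions made for illustration; the checks actually performed inside SCA() may differ in order, wording, and detail.

```r
# Hypothetical helper mirroring the documented checks; not SCE's internal code.
check_sca_inputs <- function(Training_data, X, Y, Nmin,
                             alpha = 0.05, resolution = 1000) {
  # Data type and structure
  if (!is.data.frame(Training_data) && !is.matrix(Training_data)) {
    stop("Training_data must be a data.frame or matrix")
  }
  # All named predictors and predictants must be present
  vars <- c(X, Y)
  missing_vars <- setdiff(vars, colnames(Training_data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in Training_data: ",
         paste(missing_vars, collapse = ", "))
  }
  used <- as.data.frame(Training_data)[, vars, drop = FALSE]
  # Numeric data and missing values
  if (!all(vapply(used, is.numeric, logical(1)))) {
    stop("All predictors and predictants must be numeric")
  }
  if (any(is.na(used))) {
    stop("Training_data must not contain missing values")
  }
  # Parameters and sample size
  if (!is.numeric(Nmin) || length(Nmin) != 1 || Nmin <= 0) {
    stop("Nmin must be a single positive number")
  }
  if (nrow(used) <= Nmin) {
    stop("Sample size must be greater than Nmin")
  }
  # Assumption: the bounds 0 and 1 themselves are excluded
  if (!is.numeric(alpha) || length(alpha) != 1 || alpha <= 0 || alpha >= 1) {
    stop("alpha must be between 0 and 1")
  }
  if (!is.numeric(resolution) || length(resolution) != 1 || resolution <= 0) {
    stop("resolution must be a single positive number")
  }
  invisible(TRUE)
}

# Small made-up data set for demonstration only
demo <- data.frame(Prcp = c(1.2, 0.0, 3.4, 2.1),
                   Tmax = c(21, 25, 19, 22),
                   Flow = c(10.5, 8.1, 14.2, 11.0))

ok <- check_sca_inputs(demo, X = c("Prcp", "Tmax"), Y = "Flow", Nmin = 2)

# Nmin larger than the sample size triggers the sample size check
bad <- tryCatch(check_sca_inputs(demo, X = "Prcp", Y = "Flow", Nmin = 10),
                error = function(e) conditionMessage(e))
print(bad)
```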

Examples

  ## Load SCE package
  library(SCE)

  ## Load training and testing data
  data("Streamflow_training_10var")
  data("Streamflow_testing_10var")

  ## Define independent (x) and dependent (y) variables
  Predictors <- c("Prcp","SRad","Tmax","Tmin","VP","smlt","swvl1","swvl2","swvl3","swvl4")
  Predictants <- c("Flow")

  ## Build the SCA model
  Model <- SCA(
    Training_data = Streamflow_training_10var,
    X = Predictors,
    Y = Predictants,
    Nmin = 5,
    alpha = 0.05,
    resolution = 1000
  )
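
Once a model has been built, the summary counts listed under Value can be read directly from the returned list. The sketch below assumes the element names documented there (totalNodes, leafNodes, cuttingActions, mergingActions) and skips the model fit when the SCE package is not installed; the object name node_counts is introduced here for illustration only.

```r
# Inspect the summary counts of a fitted SCA model.
# Guarded so the snippet still runs where SCE is not installed.
if (requireNamespace("SCE", quietly = TRUE)) {
  library(SCE)
  data("Streamflow_training_10var")

  Model <- SCA(
    Training_data = Streamflow_training_10var,
    X = c("Prcp", "SRad", "Tmax", "Tmin", "VP",
          "smlt", "swvl1", "swvl2", "swvl3", "swvl4"),
    Y = c("Flow"),
    Nmin = 5,
    alpha = 0.05,
    resolution = 1000
  )

  # Element names as documented in the Value section
  node_counts <- Model[c("totalNodes", "leafNodes",
                         "cuttingActions", "mergingActions")]
  str(node_counts)
} else {
  node_counts <- NULL
  message("Package 'SCE' is not installed; skipping the example.")
}
```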
