rSCA (version 2.0)

rSCA.modeling: Multivariate Modeling with Stepwise Cluster Analysis

Description

This function models the relationship between a multivariate dependent variable and the independent variables, and represents that relationship as a clustered tree. The information for the clustered tree is saved into two text files: a tree file (file name: tree_***.txt) and a map file (file name: map_***.txt). The tree file stores the structure of the clustered tree, and the map file contains the detailed information of the leaf nodes. These two files are usually generated in your current working directory. If debug mode is enabled, a log file (file name: log_***.txt) is also generated in the current working directory.
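
As a rough illustration of locating these output files afterwards, the sketch below lists the files in a directory whose names start with the tree_, map_, or log_ prefixes described above. The helper name `find_sca_outputs` and the regular expression are assumptions made for this example and are not part of rSCA. The `***` portion of each file name is matched loosely because its exact format is not specified here, and the demonstration uses empty placeholder files rather than real model output.

```r
# Hypothetical helper (not part of rSCA): list SCA output files in a directory,
# grouped by their prefix ("tree", "map", or "log").
find_sca_outputs <- function(dir = getwd()) {
  files <- list.files(dir, pattern = "^(tree|map|log)_.*\\.txt$")
  prefix <- factor(sub("_.*$", "", files), levels = c("tree", "map", "log"))
  split(files, prefix)
}

# Demonstration with empty placeholder files in a temporary directory.
demo_dir <- tempfile("sca_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
                                c("tree_demo.txt", "map_demo.txt", "notes.txt"))))

outputs <- find_sca_outputs(demo_dir)
print(outputs)
```

Calling `find_sca_outputs()` with no argument searches the current working directory, which is where the files are usually written.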

Usage

rSCA.modeling(alpha = 0.05, xfile, yfile, x.row.names = FALSE, 
        x.col.names = FALSE, y.row.names = FALSE, y.col.names = FALSE, 
        x.missing.flag = "NA", y.missing.flag = "NA", x.type = ".txt", 
        y.type = ".txt", mapvalue = "mean", GSS = FALSE, debug = FALSE)

Arguments

alpha
significance level for clustering, usually between 0.001 and 0.1. Default value is 0.05.
xfile
a string to specify the full filename of the independent (x) data file. Only *.txt and *.csv files are supported.
yfile
a string to specify the full filename of the dependent (y) data file. Only *.txt and *.csv files are supported.
x.row.names
a logical value to specify if the independent (x) data file contains row names or not. Default value is FALSE.
x.col.names
a logical value to specify if the independent (x) data file contains column names or not. Default value is FALSE.
y.row.names
a logical value to specify if the dependent (y) data file contains row names or not. Default value is FALSE.
y.col.names
a logical value to specify if the dependent (y) data file contains column names or not. Default value is FALSE.
x.missing.flag
a string to specify the missing flag used in the independent (x) data file. Default value is "NA".
y.missing.flag
a string to specify the missing flag used in the dependent (y) data file. Default value is "NA".
x.type
a string to specify the type of independent (x) data file. Default value is ".txt".
y.type
a string to specify the type of dependent (y) data file. Default value is ".txt".
mapvalue
a predefined string to specify how the information of leaf nodes will be stored in the map file. A full list of options for mapvalue is: {mean, max, min, median, interval, radius, variation, random}. Default value is "mean". Use "interval" to get an inter
GSS
a logical value to specify if the Golden Section Search method will be used for seeking the best cutting point. Default value is FALSE.
debug
a logical value to specify if the debug mode is enabled or not. Default value is FALSE. A log file will be created in your current working directory if debug mode is enabled.
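
The file-related arguments can be illustrated with base R alone. The sketch below writes a small data frame containing one missing value to a space-delimited text file with a header row and no row names, a layout that would correspond to `x.col.names = TRUE`, `x.row.names = FALSE`, and `x.missing.flag = "NA"`. The data values, the variable names, and the assumption that a space-delimited layout (as used in the Examples section) is what the function expects are all illustrative. The `rSCA.modeling` call is left commented out so that the block runs without the package installed.

```r
# Illustrative data only; the values are made up for this sketch.
x <- data.frame(A = c(0.095, 0.810, NA), B = c(0.044, 0.058, 0.077))

# Write a space-delimited file with a header row and no row names,
# using "NA" as the missing-value flag.
xfile <- tempfile(fileext = ".txt")
write.table(x, file = xfile, sep = " ", na = "NA",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

# Read the file back to confirm the layout before modeling.
check <- read.table(xfile, header = TRUE, na.strings = "NA")
print(check)

# Assumed corresponding call (requires rSCA and a y file prepared the same way):
# rSCA.modeling(xfile = xfile, yfile = yfile,
#               x.col.names = TRUE, y.col.names = TRUE,
#               x.missing.flag = "NA", y.missing.flag = "NA")
```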

Value

  • treefile: a string giving the name of the tree file, tree_***.txt.
  • mapfile: a string giving the name of the map file, map_***.txt.
  • logfile: a string giving the name of the log file, log_***.txt. If debug mode is disabled, the value of logfile is NA.
  • type: a string indicating how the information of leaf nodes is stored. Generally, "type" has the same value as the input parameter "mapvalue".
  • totalNodes: a number indicating how many nodes are included in the clustered tree.
  • leafNodes: a number indicating how many leaf nodes are generated in the clustered tree.
  • cuttingActions: a number indicating how many cutting actions are executed during the modeling process.
  • mergingActions: a number indicating how many merging actions are executed during the modeling process.
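
Assuming the return value is a list whose components carry the names above and can be accessed with `$`, a quick summary might look like the sketch below. The `example_result` list is a hand-made stand-in with invented values so that the block runs without rSCA, and `describe_sca_result` is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of rSCA): build a one-line summary from a
# result list that has the components documented above.
describe_sca_result <- function(res) {
  log_part <- if (is.na(res$logfile)) "no log file" else paste("log:", res$logfile)
  sprintf("%s / %s (%s); %d nodes, %d leaves, %d cuts, %d merges; %s",
          res$treefile, res$mapfile, res$type,
          as.integer(res$totalNodes), as.integer(res$leafNodes),
          as.integer(res$cuttingActions), as.integer(res$mergingActions),
          log_part)
}

# Stand-in for a real result; every value here is invented for illustration.
example_result <- list(treefile = "tree_demo.txt", mapfile = "map_demo.txt",
                       logfile = NA, type = "mean",
                       totalNodes = 9, leafNodes = 5,
                       cuttingActions = 4, mergingActions = 0)

summary_line <- describe_sca_result(example_result)
print(summary_line)
```

With a real model, the same helper would be applied to the object returned by `rSCA.modeling`, for example `describe_sca_result(myModel)`.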

References

Wang, Xiuquan, Guohe Huang, Qianguo Lin, Xianghui Nie, Guanhui Cheng, Yurui Fan, Zhong Li, Yao Yao, and Meiqin Suo (2013). A stepwise cluster analysis approach for downscaled climate projection - A Canadian case study. Environmental Modelling & Software, 49: 141-151.

Huang, Guohe (1992). A stepwise cluster analysis method for predicting air quality in an urban environment. Atmospheric Environment (Part B. Urban Atmosphere), 26(3): 349-357.

Liu, Y. Y. and Y. L. Wang (1979). Application of stepwise cluster analysis in medical research. Scientia Sinica, 22(9): 1082-1094.

Examples

## Load rSCA package
library(rSCA)

## X data file
xdata <- c("A B C D", "0.095 0.044 39.9 27",
           "0.810 0.058 9.1 8", "0.101 0.077 11.4 14",
           "0.006 0.141 20.5 29", "0.070 0.281 27.3 26",
           "0.481 0.514 30.2 48", "0.120 0.286 36.4 39",
           "0.480 0.199 40.9 27", "0.112 0.101 29.9 18",
           "0.026 0.203 48.1 28", "0.128 1.235 48.2 61",
           "2.681 0.439 51.1 98", "1.601 0.333 56.1 99",
           "1.398 0.455 19.3 103", "1.256 0.314 14.9 17",
           "2.618 0.609 9.1 19", "1.217 0.880 17.2 73",
           "1.411 2.115 19.6 203", "0.245 6.839 49.2 296",
           "0.724 3.060 17.1 192", "0.019 2.252 29.1 123",
           "1.321 5.730 41.1 288", "0.903 3.078 39.0 97",
           "0.714 1.013 16.7 5", "0.581 1.398 11.7 57",
           "0.080 1.734 10.2 52", "0.120 1.848 6.6 132",
           "0.089 1.357 10.3 148", "0.112 0.585 19.3 79",
           "0.192 0.675 6.9 39", "0.301 1.937 11.9 6")
xdatafile <- tempfile()
writeLines(xdata, xdatafile)

## Y data file
ydata <- c("Y1 Y2 Y3", "0.020 0.034 10.01",
           "0.011 0.011 6.92", "0.016 0.018 9.53",
           "0.022 0.018 5.04", "0.031 0.029 8.90",
           "0.057 0.036 9.98", "0.040 0.048 12.96",
           "0.061 0.050 9.84", "0.023 0.031 8.84",
           "0.025 0.020 4.66", "0.041 0.042 9.02",
           "0.070 0.029 11.37", "0.077 0.022 11.88",
           "0.105 0.038 11.06", "0.038 0.027 11.64",
           "0.058 0.019 8.25", "0.051 0.050 10.01",
           "0.073 0.038 9.20", "0.123 0.080 9.91",
           "0.089 0.046 9.37", "0.073 0.039 7.99",
           "0.139 0.069 13.28", "0.095 0.048 9.80",
           "0.034 0.040 8.50", "0.055 0.034 9.21",
           "0.020 0.050 8.67", "0.070 0.036 8.03",
           "0.058 0.039 8.01", "0.057 0.031 6.30",
           "0.050 0.014 7.92", "0.039 0.040 8.08")
ydatafile <- tempfile()
writeLines(ydata, ydatafile)

## Modeling with SCA: default parameters
myModel = rSCA.modeling(xfile = xdatafile, yfile = ydatafile, 
              x.col.names = TRUE, y.col.names = TRUE)

## Modeling with SCA: alpha = 0.1, with debug mode enabled
myModel = rSCA.modeling(alpha = 0.1, xfile = xdatafile, yfile = ydatafile, 
              x.col.names = TRUE, y.col.names = TRUE, debug = TRUE)
			  
## Modeling with SCA: alpha = 0.05, use interval for leaf nodes
myModel = rSCA.modeling(xfile = xdatafile, yfile = ydatafile, 
              x.col.names = TRUE, y.col.names = TRUE, mapvalue = "interval")