dCUR: dCUR

Description

Dynamic CUR is a function that boosts the CUR decomposition by varying k (the number of principal components), the number of columns, and the number of rows used. Its ultimate purpose is to find the stage that minimizes the relative error. The classic CUR and its extensions can be used in dCUR.

Usage

dCUR(
  data,
  variables,
  standardize = FALSE,
  dynamic_columns = FALSE,
  dynamic_rows = FALSE,
  parallelize = FALSE,
  skip = 0.05,
  ...
)

Value

dCUR returns a list of lists. Each inner list represents a stage and contains:

k: number of principal components with which the leverage scores are computed.
columns: number of columns selected.
rows: number of rows selected.
relative_error: relative error obtained: \(\frac{||A-CUR||}{||A||}\)
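
Given a result with the structure described above, the stage with the smallest relative error can be picked out with base R. The sketch below uses a hand-made list, not real dCUR output: the object `stages` and all of its numbers are hypothetical and only mimic the documented element names.

```r
# Hypothetical stages shaped like the documented return value; the numbers
# are invented for illustration and are not output of dCUR.
stages <- list(
  list(k = 1, columns = 2, rows = 10, relative_error = 0.42),
  list(k = 2, columns = 3, rows = 10, relative_error = 0.17),
  list(k = 3, columns = 3, rows = 20, relative_error = 0.23)
)

# Extract the relative error of every stage as a numeric vector.
errors <- vapply(stages, function(s) s$relative_error, numeric(1))

# The best stage is the one with the smallest relative error.
best_stage <- stages[[which.min(errors)]]
str(best_stage)
```

The same pattern should carry over to a real result as long as each stage exposes a `relative_error` element.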

Arguments

data: a data frame that contains the variables to use in the CUR decomposition, plus any external variables with which you want to correlate them.
variables: the variables used to compute the leverage scores in the CUR analysis. The names of external variables must not be included. dplyr package notation can be used to specify the variables (see examples).
standardize: logical. If TRUE, the data is standardized (by subtracting the mean and dividing by the standard deviation).
dynamic_columns: logical. If TRUE, an iterative process begins in which leverage scores are computed for each number of principal components from 1 to k, and for each column proportion up to c (the proportion of columns to be selected from the data matrix).
dynamic_rows: logical. If TRUE, an iterative process begins in which leverage scores are computed for each number of principal components from 1 to k, and for each row proportion up to r (the proportion of rows to be selected from the data matrix).
parallelize: logical. If TRUE, the CUR analysis is parallelized.
skip: numeric. It specifies the step by which the proportion of columns and rows to be selected changes between stages.
...: additional arguments to be passed to CUR.
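
As a rough illustration of what `standardize` and the leverage scores refer to, the sketch below standardizes a toy matrix and computes column leverage scores from the top k right singular vectors. It assumes the textbook definition (squared entries of the first k right singular vectors, averaged over k); the package's internal computation may differ in detail, and the matrix `A` is simulated.

```r
set.seed(1)
A <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)  # toy data, not AASP

# Equivalent of standardize = TRUE: subtract each column's mean and
# divide by its standard deviation.
A_std <- scale(A, center = TRUE, scale = TRUE)

# Column leverage scores from the top-k right singular vectors
# (assumed textbook definition, not taken from the dCUR source).
k <- 2
V_k <- svd(A_std)$v[, seq_len(k), drop = FALSE]
leverage <- rowSums(V_k^2) / k

leverage
```

Under this definition the scores are non-negative and sum to 1, so they can be read as an importance distribution over the columns.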

Author

Cesar Gamboa-Sanabria, Stefany Matarrita-Munoz, Katherine Barquero-Mejias, Greibin Villegas-Barahona, Mercedes Sanchez-Barba and Maria Purificacion Galindo-Villardon.

Details

This function serves as a basis for selecting the best combination of k (principal components), c (number of columns) and r (number of rows), that is, the stage that minimizes the relative error \(\frac{||A-CUR||}{||A||}\). It thereby optimizes the number of columns used in the analysis while ensuring a percentage of explained variability of the data matrix, and it facilitates interpretation of the data set by reducing the dimensionality of the original matrix.
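
The relative error of a single stage can be sketched in base R as follows. This is only an illustration under stated assumptions: the Frobenius norm is assumed (the text does not say which norm is used), U is built as the product of the pseudo-inverse of C, the matrix A, and the pseudo-inverse of R, and the row and column subsets are chosen arbitrarily instead of by leverage scores. The helper `pinv` is hypothetical and not part of dCUR.

```r
set.seed(42)
A <- matrix(rnorm(40 * 8), nrow = 40, ncol = 8)  # toy data matrix

# Moore-Penrose pseudo-inverse built from the SVD (base R only).
pinv <- function(M, tol = 1e-10) {
  s <- svd(M)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

col_idx <- c(1, 3, 5, 7)        # arbitrary column subset
row_idx <- seq(1, 40, by = 2)   # arbitrary row subset

C <- A[, col_idx, drop = FALSE]
R <- A[row_idx, , drop = FALSE]
U <- pinv(C) %*% A %*% pinv(R)

# Relative error ||A - CUR|| / ||A||, here with the Frobenius norm.
relative_error <- norm(A - C %*% U %*% R, type = "F") / norm(A, type = "F")
relative_error
```

dCUR repeats this kind of computation for every combination of k, column proportion and row proportion, then reports the error of each stage.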

If skip = 0.1, then for each k the analysis is tested with column proportions of 0, 0.1, 0.11, 0.22, ...; the same applies to rows. A very small skip is therefore not recommended, since it multiplies the number of stages for which the CUR analysis must be run.
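
To get a feel for how quickly the number of stages grows as skip shrinks, the sketch below counts stages on a simplified grid. It assumes the proportions run from skip up to the maximum in steps of skip, which is not necessarily the exact grid the package builds; `count_stages` and its arguments are hypothetical names used only here.

```r
# Count the stages on a simplified grid: every k from 1 to `k`, crossed with
# every column proportion and every row proportion (assumed grid).
count_stages <- function(skip, k = 15, max_columns = 0.25, max_rows = 0.25) {
  column_props <- seq(skip, max_columns, by = skip)
  row_props <- seq(skip, max_rows, by = skip)
  k * length(column_props) * length(row_props)
}

# Compare a coarse, a medium and a fine skip.
stage_counts <- sapply(c(0.1, 0.05, 0.01), count_stages)
stage_counts
```

Because rows and columns are both varied, halving skip roughly quadruples the work on this grid, which is why parallelizing helps most when skip is small.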

Parallelizing the function improves its speed significantly.

Examples

# k, rows, columns and cur_method are passed on to CUR through `...`.
results <- dCUR::dCUR(
  data = AASP,
  variables = hoessem:notabachillerato,
  k = 15,
  rows = 0.25,
  columns = 0.25,
  skip = 0.1,
  standardize = TRUE,
  cur_method = "sample_cur",
  parallelize = TRUE,
  dynamic_columns = TRUE,
  dynamic_rows = TRUE
)
results
