suggestK: Suggest optimal K value for the factorization

Description

This function sweeps through a series of k values (the number of ranks the datasets are factorized into). For each k value, it repeats the factorization from a number of random starts and records the objective error of each run. The recommended k value is the one whose objective errors have the lowest variance across random starts.
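The selection rule described above can be sketched in base R on a toy table. This is only an illustration, not the package's implementation: the data are made up, and the column names `k`, `rand`, and `objErr` are assumptions rather than the documented layout of the returned `stats` data frame.

```r
# Toy results: 3 candidate k values, 4 random starts each.
# Column names (k, rand, objErr) are hypothetical, chosen for illustration.
stats <- data.frame(
    k = rep(c(5, 10, 15), each = 4),
    rand = rep(1:4, times = 3),
    objErr = c(
        100.0, 104.0, 97.0, 102.0,  # k = 5
        90.0, 90.5, 89.8, 90.2,     # k = 10
        85.0, 88.0, 83.0, 86.5      # k = 15
    )
)

# Variance of the objective error across random starts, per k
varByK <- tapply(stats$objErr, stats$k, var)

# The k with the lowest variance is the one that would be suggested
suggestedK <- as.numeric(names(varByK)[which.min(varByK)])
print(varByK)
print(suggestedK)
```

In this toy table the runs at k = 10 agree most closely with each other, so 10 is picked even though k = 15 reaches lower objective errors.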

The methodology is still under active testing and the function is subject to change. Please report any issues you encounter.

So far we have found that a wider step between k values (e.g. 5, 10, 15, ...) gives a more stable variance than a narrower step (e.g. 5, 6, 7, ...).

Note that this function is expected to take a long time when a large number of random starts (e.g. 50) is requested for a robust suggestion. It is safe to interrupt the run (e.g. with Ctrl+C); the function will still return the objective errors recorded for the runs completed so far.
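One common way to obtain this kind of interrupt-safe behavior in R is to wrap the sweep in `tryCatch()` with an `interrupt` handler. The sketch below is a generic illustration under that assumption and is not taken from the package source; `sweepWithInterrupt` and `runOnce` are hypothetical names, and `runOnce` stands in for a single factorization run that returns its objective error.

```r
# Hypothetical helper: sweep k values and random starts, keeping partial
# results if the user interrupts. Not the rliger implementation.
sweepWithInterrupt <- function(kTest, nRandomStart, runOnce) {
    records <- list()
    tryCatch({
        for (k in kTest) {
            for (i in seq_len(nRandomStart)) {
                err <- runOnce(k, i)
                # Store each completed run immediately so it survives an interrupt
                records[[length(records) + 1]] <- data.frame(
                    k = k, rand = i, objErr = err
                )
            }
        }
    }, interrupt = function(e) {
        message("Interrupted; returning ", length(records), " completed run(s).")
    })
    # Combine whatever was recorded (NULL if nothing completed)
    do.call(rbind, records)
}

# Usage with a stand-in objective: a placeholder formula instead of a real run
res <- sweepWithInterrupt(
    kTest = c(5, 10),
    nRandomStart = 3,
    runOnce = function(k, i) 100 / k + i
)
print(res)
```

Because every finished run is appended before the next one starts, the handler only needs to stop the loop; nothing already recorded is lost.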

Usage

suggestK(
  object,
  kTest = seq(5, 50, 5),
  nRandomStart = 10,
  lambda = 5,
  nIteration = 30,
  nCores = 1L,
  verbose = getOption("ligerVerbose", TRUE)
)

Value

A list containing:

stats: A data frame containing the k values, objective errors, and random starts.
figure: A ggplot2 object showing the objective errors and their variance for each k value. The left y-axis corresponds to the dots and bands; the secondary (right) y-axis corresponds to the blue line, which shows the variance.

Arguments

object: A liger object.
kTest: A numeric vector of k values to be tested. Default 5, 10, 15, ..., 50.
nRandomStart: Number of random starts for each k value. Default 10.
lambda: Regularization parameter. Default 5.
nIteration: Number of iterations for each run. Default 30.
nCores: Number of cores to use for each run. Default 1L.
verbose: Whether to print progress messages. Default getOption("ligerVerbose"), or TRUE if that option is not set.

Examples


pbmcPlot <- scaleNotCenter(pbmcPlot)
# Minimal test example; too small to give a meaningful recommendation
# \donttest{
suggests <- suggestK(
    object = pbmcPlot,
    kTest = c(2, 3),
    nRandomStart = 2,
    nIteration = 2
)
suggests$figure
# }
