
SpaCCr (version 0.1.0)

SpaCC_CV: Perform Cross-Validation to Select the gamma/Sparsity Level

Description

Perform cross-validation over a sequence of regularization parameters (gamma.seq) to select gamma, and hence the sparsity level.

Usage

SpaCC_CV(X, w, gamma.seq, nfolds = 5, nu = 1/nrow(X), verbose = FALSE,
         tol.base = 1e-04, tol.miss = 1e-04, max.iter.base = 5000,
         max.iter.miss = 500, parallel = FALSE, frac = 1)

Arguments

X
A subject (n) by variable (p) matrix; the data
w
A vector of length p-1; weights for clustering
gamma.seq
A vector of positive scalars; regularization parameter sequence
nfolds
A positive integer; number of cross-validation folds
nu
A positive scalar; augmented Lagrangian parameter
verbose
Logical; should messages be printed?
tol.base
A small positive scalar; convergence tolerance for base SpaCC problem.
tol.miss
A small positive scalar; convergence tolerance for missing data problem.
max.iter.base
A positive integer; maximum number of iterations for base SpaCC problem
max.iter.miss
A positive integer; maximum number of iterations for missing data problem
parallel
Logical; should the CV paths be computed in parallel?
frac
A positive scalar between 0 and 1; fraction of the hold-out set to use
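The weight vector w has one entry per pair of adjacent variables, so its length is p-1. The example below builds it from genomic coordinates with an exponential decay in distance and a hard cutoff. The following is a minimal sketch that wraps that same logic in a function; make_spacc_weights is a hypothetical helper, not part of SpaCCr, and its default sig and max.dist values are simply copied from the example.

```r
# Hypothetical helper (not part of SpaCCr): builds the length-(p-1) weight
# vector w from the coordinates of p ordered variables, following the
# weighting used in the SpaCC_CV example.
make_spacc_weights <- function(coords, sig = 1/5e3, max.dist = 20000) {
  stopifnot(is.numeric(coords), length(coords) >= 2)
  d <- diff(coords)          # distance between each pair of adjacent variables
  w <- exp(-sig * d)         # weight decays exponentially with distance
  w[d > max.dist] <- 0       # no fusion across gaps larger than max.dist
  w
}
```

With the example's data, w.values <- make_spacc_weights(Coordinates) should reproduce the w.values computed there.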

Value

A list with elements:
ErrMat
A length(gamma.seq) by nfolds matrix containing the error on the out-of-fold data
SpMat
A length(gamma.seq) by nfolds matrix containing sparsity levels
gamma.seq
The original gamma.seq, sorted from largest to smallest
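Because ErrMat and SpMat are stored fold by fold, a common first step is to average them across folds. The sketch below does that and returns one row per gamma value. summarize_spacc_cv is a hypothetical helper, not part of SpaCCr, and it assumes that row i of ErrMat and SpMat corresponds to the i-th entry of the returned (largest-to-smallest) gamma.seq.

```r
# Hypothetical helper (not part of SpaCCr). Assumes rows of ErrMat/SpMat are
# aligned with the returned gamma.seq (sorted largest to smallest).
summarize_spacc_cv <- function(cv) {
  nfolds <- ncol(cv$ErrMat)
  data.frame(
    gamma    = cv$gamma.seq,                            # regularization parameter
    mean.err = rowMeans(cv$ErrMat),                     # mean out-of-fold error
    se.err   = apply(cv$ErrMat, 1, sd) / sqrt(nfolds),  # standard error across folds
    mean.sp  = rowMeans(cv$SpMat)                       # mean sparsity level
  )
}
```

After running the example, summarize_spacc_cv(CVRes) gives a table that can be plotted or inspected to see how error and sparsity change with gamma.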

Examples

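library(SpaCCr)
library(dplyr)
library(tidyr)
data("methy")
# Keep a small subset: the first 20 probes (rows) and the first 10 columns
methy <- methy[1:20,1:10]
Coordinates <- methy$Genomic_Coordinate
# Reshape the probe-by-subject table into a subject-by-probe matrix
methy %>%
 as_tibble() %>%
 select(-Chromosome,-Genomic_Coordinate) %>%
 gather(Subject,Value,-ProbeID) %>%
 spread(ProbeID,Value) -> X
SubjectLabels <- X$Subject
X <- X[,-1] %>% as.matrix()
nsubj <- nrow(X)
nprobes <- ncol(X)
# Weights between adjacent probes decay exponentially with genomic distance
# and are set to zero when two probes are more than 20000 apart
diff.vals <- diff(Coordinates)
too.far <- diff.vals > 20000
sig = 1/5e3
w.values <- exp(-sig*diff.vals)
w.values[too.far] = 0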

# Solver settings
verbose = TRUE
tol.base = 1e-4
tol.miss = 1e-4
max.iter.base = 5000
max.iter.miss = 500
# 20 gamma values, log-spaced from 0.1 to 10
ngam = 20
gamma.seq <- exp(seq(log(1e-1),log(1e1),length.out=ngam))
# Center each subject's row across probes, then run 5-fold CV
CVRes <- SpaCC_CV(X=t(scale(t(X),center=TRUE,scale=FALSE)),
                 w=w.values,
                 gamma.seq=gamma.seq,
                 nfolds=5,
                 nu=1/nsubj,
                 verbose=verbose,
                 tol.base=tol.base,
                 tol.miss=tol.miss,
                 max.iter.base=max.iter.base,
                 max.iter.miss=max.iter.miss,
                 parallel=FALSE,frac = 1)
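Once CVRes is available, gamma still has to be chosen from the CV errors. This page does not prescribe a selection rule, so the sketch below offers two common heuristics: the gamma with the smallest mean out-of-fold error, and the one-standard-error rule, which swaps in a general CV convention rather than anything documented for SpaCCr. select_gamma is a hypothetical helper, and the 1-SE branch assumes that a larger gamma yields a sparser (more fused) fit.

```r
# Hypothetical helper (not part of SpaCCr). Chooses gamma from a SpaCC_CV
# result by minimum mean CV error ("min") or by the one-standard-error
# rule ("1se"). Assumes rows of ErrMat align with cv$gamma.seq.
select_gamma <- function(cv, rule = c("min", "1se")) {
  rule <- match.arg(rule)
  err <- cv$ErrMat
  m  <- rowMeans(err)                          # mean error per gamma
  se <- apply(err, 1, sd) / sqrt(ncol(err))    # standard error across folds
  i.min <- which.min(m)
  if (rule == "min") return(cv$gamma.seq[i.min])
  # 1-SE rule: among gammas whose mean error is within one SE of the
  # minimum, take the largest one (assumed to be the sparsest fit)
  ok <- which(m <= m[i.min] + se[i.min])
  max(cv$gamma.seq[ok])
}
```

For example, select_gamma(CVRes, "1se") trades a small increase in CV error for a sparser solution, while select_gamma(CVRes) returns the error-minimizing gamma.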
