'tgs_cor' is very similar to 'stats::cor'. Unlike the latter, it uses
all available CPU cores to compute the correlation much faster. Its
basic implementation of 'pairwise.complete.obs' is also more efficient,
giving a considerable overall run-time advantage.
Unlike 'stats::cor', 'tgs_cor' implements only two modes of treating
data containing NA, which are equivalent to 'use="everything"' and
'use="pairwise.complete.obs"'. Please refer to the documentation of
'stats::cor' for more details.
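
The difference between the two NA modes can be seen with 'stats::cor'
alone. The following is a minimal sketch on made-up data; the matrix 'x'
and its column names are illustrative only.

```r
# Three columns; column "a" has one missing value.
x <- cbind(a = c(1, 2, 3, 4, NA),
           b = c(2, 4, 6, 8, 10),
           c = c(5, 3, 4, 1, 2))

# use = "everything": an NA in either column of a pair makes that
# pair's correlation NA.
r_all <- stats::cor(x, use = "everything")

# use = "pairwise.complete.obs": each pair of columns is correlated
# using only the rows in which both values are present.
r_pair <- stats::cor(x, use = "pairwise.complete.obs")

print(r_all["a", "b"])   # NA
print(r_pair["a", "b"])  # computed from the first four rows only
```

Pairs of columns without any NA, such as "b" and "c" here, give the same
value in both modes.
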
'tgs_cor(x, y, spearman = FALSE)' is equivalent to
'cor(x, y, method = "pearson")'.

'tgs_cor(x, y, spearman = TRUE)' is equivalent to
'cor(x, y, method = "spearman")'.

'tgs_cor(x, y, pairwise.complete.obs = TRUE, spearman = TRUE)' is
equivalent to
'cor(x, y, use = "pairwise.complete.obs", method = "spearman")'.

'tgs_cor(x, y, pairwise.complete.obs = TRUE, spearman = FALSE)' is
equivalent to
'cor(x, y, use = "pairwise.complete.obs", method = "pearson")'.
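
One way to check such an equivalence is to compare the two results
numerically. The sketch below uses random test data and assumes the
'tgstat' package is installed; the comparison is skipped otherwise.
Dimension names are dropped before comparing, since this text does not
state whether both functions label their output identically.

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)   # 20 rows, 10 columns
y <- matrix(rnorm(100), nrow = 20)   # 20 rows, 5 columns
x[sample(length(x), 10)] <- NA       # scatter a few missing values

# Reference result from stats::cor.
ref <- stats::cor(x, y, use = "pairwise.complete.obs", method = "spearman")

# Compare against tgs_cor only if tgstat is available.
if (requireNamespace("tgstat", quietly = TRUE)) {
  res <- tgstat::tgs_cor(x, y, pairwise.complete.obs = TRUE, spearman = TRUE)
  print(all.equal(unname(res), unname(ref)))
}
```
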
'tgs_cor' can output its result in "tidy" format: a data frame with three
columns named 'col1', 'col2' and 'cor'. Only correlation values whose
absolute value is greater than or equal to 'threshold' are reported. For
auto-correlation (i.e. when 'y=NULL') a pair of columns numbered X and Y
is reported only if X < Y.
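
The tidy layout can be illustrated in base R by converting an ordinary
correlation matrix. This is a sketch of the described format, not the
'tgs_cor' implementation: the helper 'tidy_cor' is hypothetical, and
labelling columns by name (rather than by number) is an assumption.

```r
# Hypothetical helper: convert a square auto-correlation matrix 'm'
# into a data frame with columns col1, col2 and cor.
tidy_cor <- function(m, threshold = 0) {
  stopifnot(nrow(m) == ncol(m))
  # upper.tri() keeps only pairs with row index < column index (X < Y);
  # which() silently drops entries where the comparison is NA.
  idx <- which(upper.tri(m) & abs(m) >= threshold, arr.ind = TRUE)
  labels <- if (is.null(colnames(m))) seq_len(ncol(m)) else colnames(m)
  data.frame(col1 = labels[idx[, "row"]],
             col2 = labels[idx[, "col"]],
             cor  = m[idx],
             stringsAsFactors = FALSE)
}

x <- cbind(a = c(1, 2, 3, 4, 5),
           b = c(2, 4, 6, 8, 11),
           c = c(5, 3, 4, 1, 2))
m <- stats::cor(x)

strong <- tidy_cor(m, threshold = 0.9)   # keeps only the pair (a, b)
print(strong)
```

Because the threshold is applied to the absolute value, a strongly
negative correlation is kept as well: 'tidy_cor(m, threshold = 0.78)'
additionally reports the pair (a, c), whose correlation is -0.8.
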
'tgs_cor_knn' works similarly to 'tgs_cor'. Unlike the latter, it returns
only the highest 'knn' correlations for each column in 'x'. The result of
'tgs_cor_knn' is always output in "tidy" format.
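
The selection of the highest 'knn' correlations per column can be sketched
in base R as follows. The helper 'cor_knn' is hypothetical and only
mirrors the described output; unlike 'tgs_cor_knn' it builds the full
correlation matrix first, so it has none of the memory benefit. Ranking by
signed value rather than absolute value is an assumption.

```r
# Hypothetical helper: for each column of 'x', keep the 'knn' columns
# of 'y' with the highest correlation, in tidy form.
cor_knn <- function(x, y, knn) {
  m <- stats::cor(x, y)          # full NCX x NCY matrix
  k <- min(knn, ncol(m))
  rows <- lapply(seq_len(nrow(m)), function(i) {
    top <- order(m[i, ], decreasing = TRUE)[seq_len(k)]
    data.frame(col1 = i, col2 = top, cor = unname(m[i, top]))
  })
  do.call(rbind, rows)
}

x <- cbind(c(1, 2, 3, 4, 5, 6))
y <- cbind(c(1, 2, 3, 4, 5, 6),      # identical to x
           c(6, 5, 4, 3, 2, 1),      # reversed
           c(1, 3, 2, 5, 4, 6))      # similar to x, slightly shuffled

nearest <- cor_knn(x, y, knn = 2)
print(nearest)
```
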
One of the reasons to prefer 'tgs_cor_knn' over a pair of calls to
'tgs_cor' and 'tgs_knn' is the reduced memory consumption of the former.
In the auto-correlation case (i.e. 'y=NULL'), when the number of columns
NC exceeds the number of rows NR, the memory consumption of 'tgs_cor' is
proportional to NCxNC. In contrast, 'tgs_cor_knn' would consume memory
proportional to max(NCxNR,NCxKNN) in the same scenario. Similarly,
'tgs_cor(x,y)' would consume memory proportional to NCXxNCY, whereas
'tgs_cor_knn(x,y,knn)' would reduce that to max((NCX+NCY)xNR,NCXxKNN).
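
As a worked example of these proportions, the element counts for the
auto-correlation case can be computed directly. The sizes below are made
up, and the conversion to gigabytes assumes 8-byte double values and
ignores any constant factors, so the figures indicate scale only.

```r
nc  <- 100000   # number of columns (NC)
nr  <- 1000     # number of rows (NR)
knn <- 50       # neighbours kept per column (KNN)

full_elems <- nc * nc                  # tgs_cor: NCxNC
knn_elems  <- max(nc * nr, nc * knn)   # tgs_cor_knn: max(NCxNR, NCxKNN)

full_gb <- full_elems * 8 / 1e9        # about 80 GB
knn_gb  <- knn_elems * 8 / 1e9         # about 0.8 GB

print(full_elems / knn_elems)          # ratio of the two element counts
```

With these sizes the ratio is NC/NR = 100, since NCxNR dominates NCxKNN
whenever NR exceeds KNN.
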