Run the Uniform Manifold Approximation and Projection (UMAP) algorithm to find a low dimensional embedding of the input data that approximates an underlying manifold.
cuml_umap(
  x,
  y = NULL,
  n_components = 2L,
  n_neighbors = 15L,
  n_epochs = 500L,
  learning_rate = 1,
  init = c("spectral", "random"),
  min_dist = 0.1,
  spread = 1,
  set_op_mix_ratio = 1,
  local_connectivity = 1L,
  repulsion_strength = 1,
  negative_sample_rate = 5L,
  transform_queue_size = 4,
  a = NULL,
  b = NULL,
  target_n_neighbors = n_neighbors,
  target_metric = c("categorical", "euclidean"),
  target_weight = 0.5,
  transform_input = TRUE,
  seed = NULL,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
)
The input matrix or dataframe. Each data point should be a row and should consist of numeric values only.
An optional numeric vector of target values for supervised dimension reduction. Default: NULL.
The dimension of the space to embed into. Default: 2.
The size of local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Default: 15.
The number of training epochs to be used in optimizing the low dimensional embedding. Default: 500.
The initial learning rate for the embedding optimization. Default: 1.0.
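Because this is only the initial learning rate, the step size shrinks as training proceeds. The following base R sketch shows the linear decay schedule used by the reference umap-learn optimizer; that cuML follows exactly the same schedule is an assumption here, and the variable names are illustrative only.

```r
# Linearly decaying learning rate, following the reference umap-learn
# optimizer (assumed, not confirmed, to match cuML's behaviour).
learning_rate <- 1      # initial learning rate (the default above)
n_epochs <- 500L        # number of training epochs (the default above)

epoch <- seq_len(n_epochs) - 1L                  # epochs 0 .. n_epochs - 1
alpha <- learning_rate * (1 - epoch / n_epochs)  # step size used in each epoch

# Step size at the start, midpoint, and end of training.
print(alpha[c(1, n_epochs / 2, n_epochs)])
```

Under this schedule, a larger n_epochs means a gentler decay, so the learning rate and the epoch count are best tuned together.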
Initialization mode of the low dimensional embedding. Must be one of "spectral", "random". Default: "spectral".
The effective minimum distance between embedded points. Default: 0.1.
The effective scale of embedded points. In combination with min_dist, this determines how clustered/clumped the embedded points are. Default: 1.0.
Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection. Default: 1.0.
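The interpolation can be sketched in base R on a small matrix of directed membership strengths. This follows the formula used by the reference umap-learn implementation; the helper name combine_fuzzy_sets and the example matrix are hypothetical, and cuML's GPU code is assumed (not confirmed) to compute the same quantity.

```r
# Combine directed membership strengths A[i, j] into a symmetric graph.
# combine_fuzzy_sets is a hypothetical helper, not part of the cuml package.
combine_fuzzy_sets <- function(A, set_op_mix_ratio = 1) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A),
            set_op_mix_ratio >= 0, set_op_mix_ratio <= 1)
  intersection <- A * t(A)                # product t-norm
  union <- A + t(A) - intersection        # matching t-conorm (probabilistic sum)
  set_op_mix_ratio * union + (1 - set_op_mix_ratio) * intersection
}

# Toy example: point 1 -> 2 has strength 0.8, point 2 -> 1 has strength 0.5,
# and point 3 -> 2 has strength 1 with no edge in the reverse direction.
A <- matrix(c(0,   0.8, 0,
              0.5, 0,   0,
              0,   1,   0), nrow = 3, byrow = TRUE)

fuzzy_union <- combine_fuzzy_sets(A, set_op_mix_ratio = 1)
fuzzy_intersection <- combine_fuzzy_sets(A, set_op_mix_ratio = 0)
print(fuzzy_union)
print(fuzzy_intersection)
```

With a pure union, the one-directional edge between points 2 and 3 survives; with a pure intersection it is dropped, because an edge is kept only when both points count each other as neighbors.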
The local connectivity required -- i.e. the number of nearest neighbors that should be assumed to be connected at a local level. Default: 1.
Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples. Default: 1.0.
The number of negative samples to select per positive sample in the optimization process. Default: 5.
For transform operations (embedding new points using a trained model), this controls how aggressively to search for nearest neighbors. Default: 4.0.
More specific parameters controlling the embedding. If not set, then these values are set automatically as determined by min_dist and spread. Default: NULL.
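To give a sense of how a and b relate to min_dist and spread, the sketch below reproduces the curve fit used by the reference umap-learn implementation (and by the uwot R package): the curve 1 / (1 + a * x^(2 * b)) is fitted to a target that equals 1 below min_dist and decays exponentially beyond it. The helper name find_ab_params is borrowed from those packages and is not part of cuml; that cuML arrives at the same values by the same procedure is an assumption.

```r
# Fit a and b so that 1 / (1 + a * x^(2b)) approximates the target curve
# implied by min_dist and spread. Not a cuml function; illustration only.
find_ab_params <- function(min_dist = 0.1, spread = 1) {
  x <- seq(0, 3 * spread, length.out = 300)
  # Target: full membership below min_dist, exponential decay above it.
  y <- ifelse(x < min_dist, 1, exp(-(x - min_dist) / spread))
  fit <- stats::nls(y ~ 1 / (1 + a * x^(2 * b)),
                    start = list(a = 1, b = 1))
  coef(fit)
}

ab <- find_ab_params(min_dist = 0.1, spread = 1)
print(ab)
```

Supplying a and b directly bypasses this fit, which is mainly useful when reproducing an embedding whose curve parameters are already known.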
The number of nearest neighbors to use to construct the target simplicial set. Default: n_neighbors.
The metric for measuring distance between the actual and the target values (y) if using supervised dimension reduction. Must be one of "categorical", "euclidean". Default: "categorical".
Weighting factor between data topology and target topology. A value of 0.0 weights entirely on data, a value of 1.0 weights entirely on target. The default of 0.5 balances the weighting equally between data and target.
If TRUE, then compute an approximate representation of the input data. Default: TRUE.
Optional seed for pseudo random number generator. Default: NULL. Setting a PRNG seed will enable consistency of trained embeddings, allowing for reproducible results to 3 digits of precision, but at the expense of potentially slower training and increased memory usage. If the PRNG seed is not set, then the trained embeddings will not be deterministic.
Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off".
A UMAP model object that can be used as input to the cuml_transform() function. If transform_input is set to TRUE, then the model object will contain a "transformed_data" attribute containing the lower dimensional embedding of the input data.
# NOT RUN {
library(cuml)
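# Supervised UMAP on the four iris measurements, with species as the target.
model <- cuml_umap(
  x = iris[1:4],
  y = iris[[5]],
  n_components = 2,
  n_epochs = 200,
  transform_input = TRUE
)

# Cluster the 2-D embedding; the seed fixes the k-means starting centers.
set.seed(0L)
print(kmeans(model$transformed_data, iter.max = 100, centers = 3))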
# }