Carry out an embedding of new data using an existing embedding. Requires the model returned by calling umap or tumap with ret_model = TRUE.
umap_transform(X, model, init_weighted = TRUE, search_k = NULL,
tmpdir = tempdir(), n_epochs = NULL, n_threads = max(1,
RcppParallel::defaultNumThreads()/2), n_sgd_threads = 0,
grain_size = 1, verbose = FALSE)
X: The new data to be transformed, either a matrix or a data frame. Must have the same columns in the same order as the input data used to generate the model.
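Because X must match the training data column for column, it can help to check this yourself before calling umap_transform. The sketch below is a hypothetical pre-flight check written in base R; `check_new_data` is not part of uwot, and it does not claim to reproduce whatever validation uwot performs internally.

```r
# Hypothetical helper (not part of uwot): verify that new data has the same
# number of columns, with the same names in the same order, as the data that
# was used to build the model.
check_new_data <- function(new_data, train_data) {
  if (ncol(new_data) != ncol(train_data)) {
    stop("new data has ", ncol(new_data), " columns; expected ", ncol(train_data))
  }
  train_cols <- colnames(train_data)
  # Only compare names when the training data actually has column names
  if (!is.null(train_cols) && !identical(colnames(new_data), train_cols)) {
    stop("column names or order differ from the training data")
  }
  invisible(TRUE)
}

iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]
check_new_data(iris_test, iris_train)  # passes silently

# Reversing the column order fails the check
res <- tryCatch(check_new_data(iris_test[, 5:1], iris_train),
                error = function(e) conditionMessage(e))
```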
model: Data associated with an existing embedding, i.e. the result of calling umap or tumap with ret_model = TRUE.
init_weighted: If TRUE, then initialize the embedded coordinates of X using a weighted average of the coordinates of the nearest neighbors from the original embedding in model, where the weights used are the edge weights from the UMAP smoothed knn distances. Otherwise, use an unweighted average.
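The difference between the two initializations can be sketched in base R. This is an illustration of the averaging described for init_weighted, not uwot's internal code: `train_coords` (the original embedding, one row per training point), `nn_idx` (indices of each new point's nearest training neighbors) and `nn_w` (the matching edge weights) are assumed inputs with hypothetical names.

```r
# Sketch of the init_weighted behaviour (assumed data layout, not uwot's code):
# each new point starts at the (optionally weighted) mean of the embedded
# coordinates of its nearest training neighbors.
init_new_coords <- function(train_coords, nn_idx, nn_w, weighted = TRUE) {
  n_new <- nrow(nn_idx)
  out <- matrix(0, nrow = n_new, ncol = ncol(train_coords))
  for (i in seq_len(n_new)) {
    neighbors <- train_coords[nn_idx[i, ], , drop = FALSE]
    # Unweighted average = every neighbor gets weight 1
    w <- if (weighted) nn_w[i, ] else rep(1, ncol(nn_idx))
    # neighbors * w scales row j of neighbors by w[j] (column-major recycling)
    out[i, ] <- colSums(neighbors * w) / sum(w)
  }
  out
}

# Three training points embedded at (0, 0), (10, 0) and (0, 10)
train_coords <- matrix(c(0, 10, 0, 0, 0, 10), ncol = 2)
# One new point whose neighbors are training points 1 and 2, with weights 3 and 1
nn_idx <- matrix(c(1, 2), nrow = 1)
nn_w <- matrix(c(3, 1), nrow = 1)

init_new_coords(train_coords, nn_idx, nn_w, weighted = TRUE)   # 2.5 0
init_new_coords(train_coords, nn_idx, nn_w, weighted = FALSE)  # 5.0 0
```

The weighted start sits closer to the more strongly connected neighbor, which is the intent of using the smoothed knn edge weights.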
search_k: Number of nodes to search during the neighbor retrieval. The larger the value, the more accurate the results, but the longer the search takes. By default, the value used when building the model is used.
tmpdir: Temporary directory to store nearest neighbor indexes during nearest neighbor search. Default is tempdir(). The index is only written to disk if n_threads > 1; otherwise, this parameter is ignored.
n_epochs: Number of epochs to use during the optimization of the embedded coordinates. A value between 30 and 100 is a reasonable trade-off between speed and thoroughness. By default, this value is set to one third of the number of epochs used to build the model.
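As a worked example of the default, a model built with 500 epochs would be transformed with roughly 167 epochs. The helper below is hypothetical, and the rounding and the lower bound of 1 are assumptions made for this sketch; the documentation only states "one third".

```r
# Hypothetical helper illustrating the documented default for n_epochs in
# umap_transform: one third of the epochs used to build the model.
# round() and the minimum of 1 are assumptions, not uwot's exact rule.
default_transform_epochs <- function(model_epochs) {
  max(1, round(model_epochs / 3))
}

default_transform_epochs(500)  # 167
default_transform_epochs(200)  # 67
```

Passing n_epochs explicitly overrides this default, for example to stay within the suggested 30 to 100 range.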
n_threads: Number of threads to use (except during stochastic gradient descent). Default is half the number recommended by RcppParallel.
n_sgd_threads: Number of threads to use during stochastic gradient descent. If set to > 1, then results will not be reproducible, even if `set.seed` is called with a fixed seed before running.
grain_size: Minimum batch size for multithreading. If the number of items to process in a thread falls below this number, then no threads will be used. Used in conjunction with n_threads and n_sgd_threads.
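The grain_size rule can be expressed as a simple decision. This is a hypothetical sketch: `use_threads` is not a uwot function, the even split of items across threads is an assumption, and treating only n_threads > 1 as multithreaded follows the tmpdir description above rather than any confirmed internal logic.

```r
# Hypothetical sketch of the grain_size rule: multithreading is only worth it
# if each thread would receive at least grain_size items. Assumes work is
# split evenly across threads and that n_threads <= 1 means single-threaded.
use_threads <- function(n_items, n_threads, grain_size = 1) {
  n_threads > 1 && (n_items / n_threads) >= grain_size
}

use_threads(50, 4, grain_size = 1)   # TRUE: 12.5 items per thread
use_threads(50, 4, grain_size = 20)  # FALSE: 12.5 < 20, run single-threaded
use_threads(50, 1)                   # FALSE: only one thread requested
```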
verbose: If TRUE, log details to the console.
Returns a matrix of coordinates for X transformed into the space of the model.
Note that some settings are incompatible with the production of a UMAP model via umap: external neighbor data (passed via a list to the nn_method parameter), and factor columns that were included in the UMAP calculation via the metric parameter. In the latter case, the model produced is based only on the numeric data. A transformation is possible, but factor columns in the new data are ignored.
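To see which part of a data frame a model based "only on the numeric data" would use, the sketch below drops non-numeric columns with base R. It mirrors the documented behaviour for illustration only; `numeric_only` is a hypothetical helper, not uwot's internal column handling.

```r
# Hypothetical helper mirroring the note above: keep only the numeric columns
# of a data frame, so factor columns (such as iris$Species) are ignored.
numeric_only <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}

iris_test <- iris[101:150, ]
x <- numeric_only(iris_test)
names(x)  # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```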
library(uwot)
iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]
# You must set ret_model = TRUE to return the extra data needed by umap_transform
iris_train_umap <- umap(iris_train, ret_model = TRUE)
iris_test_umap <- umap_transform(iris_test, iris_train_umap)