biosurvey (version 0.1.1)

find_clusters: Detection of clusters in 2D spaces

Description

Detects clusters of points in a two-dimensional space using one of two methods: hierarchical clustering or k-means.

Usage

find_clusters(data, x_column, y_column, space,
              cluster_method = "hierarchical", n_k_means = NULL,
              split_distance = NULL)

Arguments

data

matrix or data.frame that contains at least two columns.

x_column

(character) name of the column in data to be used as the x-axis.

y_column

(character) name of the column in data to be used as the y-axis.

space

(character) space in which clusters will be detected. There are two options: "G" for geographic space and "E" for environmental space.

cluster_method

(character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical".

n_k_means

(numeric) number of clusters to be identified when cluster_method = "k-means"; default = NULL.

split_distance

(numeric) distance used to separate clusters when cluster_method = "hierarchical": in meters if space = "G", or Euclidean distance if space = "E"; default = NULL.
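
The two units accepted by split_distance can be illustrated with a small base R sketch. The helper names haversine_m and euclidean are hypothetical and are not part of biosurvey; the haversine formula on a spherical Earth (radius assumed to be 6371000 m) is only one common way to obtain distances in meters from longitude and latitude, and may differ from what the package computes internally.

```r
# Hypothetical helper: great-circle distance in meters between two
# longitude/latitude points (decimal degrees), using the haversine formula.
# The Earth radius of 6371000 m is an assumption of this sketch.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: Euclidean distance between two points, as would
# apply to unitless environmental axes such as principal components.
euclidean <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# One degree of latitude along the same meridian (geographic space)
d_geo <- haversine_m(0, 0, 0, 1)

# Two points in an environmental space
d_env <- euclidean(0, 0, 3, 4)

d_geo
d_env
```

A value of split_distance that is sensible in one space (for example, 4 on principal component axes) would therefore be far too small in the other, where distances are expressed in meters.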

Value

A data frame containing data and an additional column defining clusters.

Details

The two clustering methods make different assumptions, so either one may perform better than the other depending on the pattern of the data. A method that works well on some data sets may fail on others.

The k-means method tends to perform better when clusters are compact, roughly spherical, and of similar size. The hierarchical clustering algorithm usually takes more time than the k-means method.
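
The contrast between the two approaches can be sketched with base R functions from the stats package. This is an illustration on simulated data, not the internal implementation of find_clusters; the complete linkage method, the cut height of 4, and the simulated points are all assumptions of the sketch.

```r
set.seed(1)

# Two well-separated, compact groups of 50 points each (simulated data)
pts <- rbind(
  cbind(x = rnorm(50, mean = 0, sd = 0.3), y = rnorm(50, mean = 0, sd = 0.3)),
  cbind(x = rnorm(50, mean = 10, sd = 0.3), y = rnorm(50, mean = 10, sd = 0.3))
)

# k-means: the number of clusters must be chosen in advance
km <- stats::kmeans(pts, centers = 2, nstart = 10)
km_clusters <- km$cluster

# Hierarchical: build a tree from pairwise Euclidean distances, then cut it
# at a distance threshold; the number of clusters follows from the threshold
tree <- stats::hclust(stats::dist(pts), method = "complete")
hc_clusters <- stats::cutree(tree, h = 4)

# Cross-tabulate the two assignments
table(kmeans = km_clusters, hierarchical = hc_clusters)
```

With k-means the analyst fixes the number of clusters, whereas with a hierarchical tree the analyst fixes a distance and the number of clusters is a result. This mirrors the roles of n_k_means and split_distance in the arguments above.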

Examples

# Load the package
library(biosurvey)

# Data
data("m_matrix", package = "biosurvey")

# Cluster detection in environmental space using hierarchical clustering
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)

# First rows of the result, including the added cluster column
head(clusters)
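
For comparison, the same data could be clustered with the k-means option documented above. This is a sketch that assumes biosurvey is installed; the value n_k_means = 3 is arbitrary and chosen only for illustration, and the call is wrapped in a check so that it is skipped when the package is unavailable.

```r
# Run only if biosurvey is installed
if (requireNamespace("biosurvey", quietly = TRUE)) {
  data("m_matrix", package = "biosurvey")

  # k-means needs the number of clusters; split_distance is left as NULL
  clusters_km <- biosurvey::find_clusters(m_matrix$data_matrix,
                                          x_column = "PC1", y_column = "PC2",
                                          space = "E",
                                          cluster_method = "k-means",
                                          n_k_means = 3)
  head(clusters_km)
}
```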