stream (version 0.2-0)

DSD_GaussianStatic: Static Gaussians Data Stream Generator

Description

A data stream generator that produces a data stream with static Gaussians.

Usage

DSD_GaussianStatic(k=2, d=2, mu, sigma, p, noise=0, noise_range)

Arguments

k
Determines the number of clusters.
d
Determines the number of dimensions.
mu
A matrix of means for each dimension of each cluster.
sigma
A list of length k of covariance matrices.
p
A vector of probabilities that determines the likelihood of generating a data point from a particular cluster.
noise
Noise probability between 0 and 1. Noise is uniformly distributed within noise_range (see below).
noise_range
A matrix with d rows and 2 columns. The first column contains the minimum values and the second column contains the maximum values for noise.

Value

Returns a DSD_GaussianStatic object (subclass of DSD_R, DSD), which is a list of the defined parameters. The parameters are either passed in to the function or created internally. They include:

  • description: A brief description of the DSD object.
  • k: The number of clusters.
  • d: The number of dimensions.
  • mu: The matrix of means of the dimensions in each cluster.
  • sigma: The covariance matrices.
  • p: The probability vector for the clusters.
  • noise: A flag that determines whether noise is generated.

Details

DSD_GaussianStatic creates a mixture of k d-dimensional static Gaussians in approximately unit space. The centers mu and the covariance matrices sigma can be supplied; otherwise they are randomly generated. The probability vector p defines, for each cluster, the probability that the next data point will be drawn from it (defaults to equal probability).

The generation method is similar to the one suggested by Jain and Dubes (1988).
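
To make the sampling scheme concrete, the following base R sketch draws points from a static Gaussian mixture with optional uniform noise. It is only an illustration of the idea described above, not the package's implementation: the helper name sample_static_gaussians is hypothetical, mu is assumed to be a k x d matrix with one row per cluster, and the Cholesky-based sampler is one common way to draw from a multivariate Gaussian.

```r
# Hypothetical helper, not part of the stream package: draws n points from
# a static Gaussian mixture with optional uniform noise, using base R only.
# Assumes mu is a k x d matrix (one row per cluster) and sigma is a list of
# k covariance matrices, each d x d and positive definite.
sample_static_gaussians <- function(n, mu, sigma, p,
                                    noise = 0, noise_range = NULL) {
  k <- nrow(mu)
  d <- ncol(mu)
  points <- matrix(NA_real_, nrow = n, ncol = d)
  cluster <- rep(NA_integer_, n)  # NA marks a noise point

  for (i in seq_len(n)) {
    if (noise > 0 && runif(1) < noise) {
      # Noise point: uniform between the min (column 1) and max (column 2)
      # of noise_range in each dimension.
      points[i, ] <- runif(d, min = noise_range[, 1], max = noise_range[, 2])
    } else {
      # Pick a cluster according to p, then draw from its Gaussian.
      j <- sample.int(k, size = 1, prob = p)
      # chol() returns an upper-triangular R with t(R) %*% R == sigma[[j]],
      # so t(R) %*% z has covariance sigma[[j]] when z is standard normal.
      z <- rnorm(d)
      points[i, ] <- mu[j, ] + drop(t(chol(sigma[[j]])) %*% z)
      cluster[i] <- j
    }
  }
  list(points = points, cluster = cluster)
}

set.seed(42)
mu <- rbind(c(-0.5, -0.5), c(0.5, 0.5))
sigma <- list(diag(0.01, 2), diag(0.01, 2))
res <- sample_static_gaussians(200, mu, sigma, p = c(0.5, 0.5),
                               noise = 0.2,
                               noise_range = rbind(c(-1, 1), c(-1, 1)))
```

Here res$points holds the generated coordinates and res$cluster the cluster index of each point, with NA for noise points. Because nothing in the loop depends on earlier draws, the mixture stays the same for the whole stream, which is the sense in which the Gaussians are static.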

References

Jain and Dubes (1988) Algorithms for Clustering Data, Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

See Also

DSD

Examples

# create data stream with three clusters in 2D
dsd1 <- DSD_GaussianStatic(k=3, d=2)

# plotting the data
plot(dsd1)

# create data stream with specified clusters and 20% noise
dsd2 <- DSD_GaussianStatic(k=2, d=2, 
    mu=rbind(c(-.5,-.5), c(.5,.5)), 
    noise=0.2, noise_range=rbind(c(-1,1),c(-1,1)))
plot(dsd2)