skm (version 0.1.5.4)

skm_mls: skm_mls

Description

A selective k-means problem solver; a wrapper over skm_mls_cpp.

Usage

skm_mls(x, k = 1L, s_colname = "s", t_colname = "t", d_colname = "d", w_colname = NULL, s_ggrp = integer(0L), s_must = integer(0L), max_it = 100L, max_at = 100L, auto_create_ggrp = TRUE, extra_immaculatism = TRUE, extra_at = 10L)

Arguments

x
data.table of s - t - d(s, t) rows, where s and t must be character and d must be numeric. Note that d need not be Euclidean, or any true distance, and need not even be symmetric: d(s, t) can be unequal to d(t, s). View d as a measure of the cost of assigning one to the other.
k
number of centers
s_colname
s - name of the column in x that holds s.
t_colname
t - name of the column in x that holds t.
d_colname
d - view d as the cost of assigning t to s. A problem with a different fixed cost for using each s as a source can be solved either by modifying the input data or by building the fixed cost into the algorithm. I prefer to modify the data so that the algorithm stays clean and clear; I will show how in a vignette.
w_colname
w - optional: when not NULL, the objective minimized is d * w, that is, the weighted cost of assigning t to s.
s_ggrp
s_init will be drawn by stratified sampling from s with respect to s_ggrp.
s_must
length
max_it
maximum number of iterations to run when optimizing the result.
max_at
maximum number of attempts (repeats) to run in search of the optimum.
auto_create_ggrp
boolean indicator of whether to automatically create the group structure from the first letter of s when s_ggrp is integer(0).
extra_immaculatism
boolean indicator of whether to make extra runs to improve result consistency when multiple successive values of k are specified, e.g., k = c(9L, 10L).
extra_at
an integer specifying the number of extra runs when argument extra_immaculatism is TRUE.
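
To make the shape of x concrete, the following is a minimal sketch in base R of an asymmetric s - t - d table with a weight column, the weighted cost d * w described under w_colname, and one assumed reading of auto_create_ggrp (grouping s by its first letter). The values, the helper column dw and the integer coding of the groups are illustrative assumptions, not the package's internals.

```r
# Toy long-format input: one row per (s, t) pair, with an asymmetric cost d.
x <- data.frame(
  s = c("a1", "a1", "b1", "b1"),
  t = c("a1", "b1", "a1", "b1"),
  d = c(0, 3, 5, 0),   # d(a1, b1) = 3 but d(b1, a1) = 5: not symmetric
  w = c(2, 1, 2, 1),   # hypothetical weights, one value per row
  stringsAsFactors = FALSE
)

# Weighted cost as described for w_colname: d * w.
x$dw <- x$d * x$w

# Assumed reading of auto_create_ggrp: group the sources by the first
# letter of s (the integer coding here is an illustration only).
s_unique <- unique(x$s)
s_ggrp <- as.integer(factor(substr(s_unique, 1L, 1L)))
names(s_ggrp) <- s_unique
```

Because skm_mls expects a data.table, a table like this would need to be converted (for example with data.table::as.data.table) before being passed as x.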

Value

data.table

o - objective - based on d_colname

w - weighting - based on w_colname

k - k - based on k - input

s - s - based on s_colname

d - weighted average value of d_colname, weighted by w_colname, when s are selected.

Details

A selective k-means problem is defined as finding a subset of k rows of an m x n matrix such that the sum of the column minima over those rows is minimized.
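
The definition can be checked on a tiny matrix by exhaustive search. The sketch below is a brute-force illustration in base R; the function name skm_brute and the example matrix are hypothetical, and this is not the algorithm used by skm_mls or skm_mls_cpp.

```r
# Brute-force selective k-means on a small cost matrix (illustration only).
# Rows are candidate sources s, columns are targets t.
skm_brute <- function(m, k) {
  stopifnot(is.matrix(m), k >= 1L, k <= nrow(m))
  cand <- combn(nrow(m), k)   # every subset of k row indices, one per column
  obj <- apply(cand, 2L, function(idx) {
    # sum of column minima over the selected rows
    sum(apply(m[idx, , drop = FALSE], 2L, min))
  })
  best <- which.min(obj)
  list(rows = cand[, best], objective = obj[best])
}

m <- matrix(c(0, 4, 9,
              4, 0, 3,
              8, 1, 0), nrow = 3L, byrow = TRUE)
res <- skm_brute(m, 2L)
res$rows       # rows 1 and 3
res$objective  # 1
```

Exhaustive search evaluates choose(m, k) subsets, so it is only practical for very small inputs.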

skm_mls takes a data.table (data.frame) as input rather than a matrix. It assumes the data.table holds s - t - d(s, t) for all combinations of s and t, and chooses the k values of s that minimize sum(min(d(s, t) over the selected k of s) over t).
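
As a sketch of that long-format objective, the base R code below builds a complete s - t - d table and evaluates sum(min(d(s, t)) over t) for a given selection of s. The helper skm_objective and the data are hypothetical and only illustrate the formula; they are not part of the skm package.

```r
# Long-format s - t - d table covering every (s, t) combination.
# expand.grid varies s fastest, so rows run s1..s3 for t1, then s1..s3 for t2.
x <- expand.grid(s = c("s1", "s2", "s3"), t = c("t1", "t2"),
                 stringsAsFactors = FALSE)
x$d <- c(1, 5, 2,   # d(s1, t1), d(s2, t1), d(s3, t1)
         4, 0, 6)   # d(s1, t2), d(s2, t2), d(s3, t2)

# Objective for a given selection: for each t take the cheapest selected s,
# then sum over t.
skm_objective <- function(x, selected) {
  sub <- x[x$s %in% selected, ]
  sum(tapply(sub$d, sub$t, min))
}

skm_objective(x, c("s1", "s2"))  # 1
skm_objective(x, c("s2", "s3"))  # 2
```

With the package installed, the same table converted to a data.table could be passed along the lines of skm_mls(x, k = 2L), following the Usage signature; that call is not run here.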