data.table with s - t - d(s, t): s - t - d
where s and t must be character and d must be numeric.
Be aware that d need not be a Euclidean distance, or any formal distance at
all, and need not even be symmetric - d(s, t) can differ from d(t, s). View d
as a measure of the cost of assigning one to the other.
k
number of centers
s_colname
s
t_colname
t
d_colname
d - view d as the cost of assigning t to s.
A problem with a different fixed cost for using each s as a source can be
solved either by modifying the input data or by building the cost into the
algorithm - I prefer to modify the data so that the algorithm stays clean and
clear - I will show how in the vignette.
w_colname
w - optional: when not NULL, the objective to minimize becomes
d = d * w, i.e., the weighted cost of assigning t to s.
s_ggrp
s_init will be drawn by stratified sampling from s with respect to s_ggrp.
s_must
length
max_it
maximum number of iterations to run when optimizing the result.
max_at
maximum number of attempts/repeats to run in search of the optimum.
auto_create_ggrp
boolean indicator of whether to automatically create the group
structure using the first letter of s when s_ggrp is integer(0).
extra_immaculatism
boolean indicator of whether to make extra runs to improve
result consistency when multiple successive values of k are specified, e.g.,
k = c(9L, 10L).
extra_at
an integer specifying the number of extra runs when argument
extra_immaculatism is TRUE.
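
As a rough illustration of the input layout the arguments above describe, the following base R sketch builds a small long-format table with one row per (s, t) pair. The column names `s`, `t`, `d`, `w`, the helper column `dw`, and all values are made up for illustration. The table is built as a plain data.frame, on the assumption that it would be converted (e.g., with `data.table::as.data.table()`) before being passed on.

```r
# Hypothetical toy input: 3 candidate sources (s) and 3 targets (t).
# All names and numbers here are illustrative, not taken from the package.
ids <- c("a1", "b1", "b2")
x <- expand.grid(s = ids, t = ids, stringsAsFactors = FALSE)

# d is a cost, not a metric: it does not have to be symmetric.
cost <- matrix(c(0, 4, 7,
                 2, 0, 3,
                 9, 1, 0),
               nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
x$d <- cost[cbind(x$s, x$t)]   # d(s, t): cost of assigning t to s

# Optional weight per target t, e.g. demand at t (assumed values).
w_by_t <- c(a1 = 10, b1 = 1, b2 = 5)
x$w <- unname(w_by_t[x$t])

# With a weight column, the quantity being minimized is d * w.
x$dw <- x$d * x$w

str(x)
```

Here d("a1", "b1") and d("b1", "a1") differ on purpose, and s and t are character while d and w are numeric, matching the requirements stated for the input.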
Value
data.table
o - objective - based on d_colname
w - weighting - based on w_colname
k - k - based on k - input
s - s - based on s_colname
d - weighted average value of d_colname, weighted by w_colname, when s are selected.
Details
A selective k-means problem is defined as finding a subset of k rows from
an m x n matrix such that the sum of the column minima is minimized.
skm_mls takes a data.table (data.frame) as input rather than a matrix:
given a data.table of s - t - d(s, t) for all combinations of s and t, it
chooses the k of s that minimize sum(min(d(s, t) over selected k of s) over t).
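
To make the objective concrete, here is a minimal base R sketch that evaluates it by exhaustive enumeration on a toy matrix. The functions `skm_objective` and `brute_force_skm` and the matrix values are hypothetical and exist only for this illustration. This is not a description of how skm_mls searches; enumerating every subset is only feasible for very small inputs.

```r
# Toy cost matrix: rows are candidate sources s, columns are targets t.
# Values are made up for illustration.
d <- matrix(c(1, 8, 9, 7,
              5, 2, 3, 8,
              9, 7, 2, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("s1", "s2", "s3"), paste0("t", 1:4)))

# Objective for a chosen set of rows: sum over t of the column minimum.
skm_objective <- function(d, rows) {
  sum(apply(d[rows, , drop = FALSE], 2, min))
}

# Enumerate every subset of k rows and keep the one with the lowest objective.
brute_force_skm <- function(d, k) {
  subsets <- utils::combn(nrow(d), k, simplify = FALSE)
  values <- vapply(subsets, function(rows) skm_objective(d, rows), numeric(1))
  best <- which.min(values)
  list(s = rownames(d)[subsets[[best]]], o = values[best])
}

res <- brute_force_skm(d, k = 2)
print(res)
```

For k = 2 the three candidate subsets score 13 (s1, s2), 11 (s1, s3), and 10 (s2, s3), so the sketch selects s2 and s3 with an objective of 10.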