mFLSSSparImposeBounds: Multithreaded multidimensional Subset Sum in bounded solution space given error thresholds

Description

For comparison, function mFLSSSpar() puts no bounds on the solution space so it can sort mV internally in a special order to accelerate computing speed. Function mFLSSSparImposeBounds() bounds the solution space and keeps the order of mV. See Examples for its application to the generalized assignment problem [1].

Usage

mFLSSSparImposeBounds(
  maxCore = 7L,
  len,
  mV,
  mTarget,
  mME,
  LB = 1L : len,
  UB = (nrow(mV) - len + 1L) : nrow(mV),
  solutionNeed = 1L,
  tlimit = 60,
  dl = ncol(mV),
  du = ncol(mV),
  targetsOrder = NULL,
  useBiSrchInFB = FALSE,
  avgThreadLoad = 8L
  )

Arguments

maxCore

See maxCore in mFLSSSpar().

len

See len in mFLSSSpar().

See mV in mFLSSSpar().

mTarget

See mTarget in mFLSSSpar().

mME

See mME in mFLSSSpar().

See LB in FLSSS().

See UB in FLSSS().

solutionNeed

See solutionNeed in mFLSSSpar().

tlimit

See tlimit in mFLSSSpar().

See dl in mFLSSSpar().

See du in mFLSSSpar().

targetsOrder

This argument is mainly for research and not recommended for use. Depending on certain structural characteristic of mV, mFLSSSpar() and mFLSSSparImposeBounds() would break the mining task into a collection of no more than len * (nrow(mV) - len) + 1 independent subtasks. Threads work on these subtasks, sequentially coordinated by an atomic counter [1]. Different subtasks have different probabilities of yielding a qualified subset, thus the order of subtasks matters to the mining time cost. targetsOrder is an index vector of size len * (nrow(mV) - len) + 1 for ordering the subtasks. targetsOrder <- NULL makes a special order, and is implicitly the choice in mFLSSSpar(). This order is empirically optimal based on simulations. See the package documentation for details.

useBiSrchInFB

See useBiSrchInFB in mFLSSSpar().

avgThreadLoad

See avgThreadLoad in mFLSSSpar().

Value

A list of index vectors.

References

[1] Atomic template class in Intel TBB. An atomic counter is used to coordinate heterogeneous subtasks to avoid idle threads. The atomic operation overhead is negligible compared to the time cost of the lightest subtask.

Examples

Run this code

# NOT RUN {
rm(list = ls()); gc()
subsetSize = 7L
supersetSize = 60L
dimension = 5L # dimensionality

# }
# NOT RUN {
# Create a supertset at random:
N = supersetSize * dimension
superset = matrix(1000 * (rnorm(N) ^ 3 + 2 * runif(N) ^ 2 +
                  3 * rgamma(N, 5, 1) + 4), ncol = dimension)
rm(N)


# Make up the lower and upper bounds for the solution space:
tmp = sort(sample(1L : supersetSize, subsetSize))
tmp2 = sort(sample(1L : supersetSize, subsetSize))
lowerBounds = pmin(tmp, tmp2)
upperBounds = pmax(tmp, tmp2)
rm(tmp, tmp2)


# Exclude elements not covered by 'lowerBounds' and 'upperBounds':
remainIndex = unique(unlist(apply(cbind(lowerBounds, upperBounds), 1,
                                  function(x) x[1] : x[2])))
lowerBounds = match(lowerBounds, remainIndex)
upperBounds = match(upperBounds, remainIndex)
superset = superset[remainIndex, ]


# Plant a subset sum:
solution = apply(rbind(lowerBounds, upperBounds), 2, function(x)
  sample(x[1] : x[2], 1))
subsetSum = colSums(superset[solution, ])
subsetSumError = abs(subsetSum) * 0.01 # relative error within 1%
rm(solution)


rst = FLSSS::mFLSSSparImposeBounds(
  maxCore = 2L, len = subsetSize, mV = superset, mTarget = subsetSum,
  mME = subsetSumError, LB = lowerBounds, UB = upperBounds,
  solutionNeed = 1, tlimit = 2, dl = ncol(superset), du = ncol(superset),
  targetsOrder = NULL, useBiSrchInFB = FALSE, avgThreadLoad = 8L)


# Verify:
cat("Number of solutions = ", length(rst), "\n")
if(length(rst) > 0)
{
  cat("Solutions unique: ")
  cat(length(unique(lapply(rst, function(x) sort(x)))) == length(rst), "\n")
  cat("Solution in bounded space: ")
  cat(all(unlist(lapply(rst, function(x)
    sort(x) <= upperBounds & sort(x) >= lowerBounds))), "\n")
  cat("Solutions correct: ")
  cat(all(unlist(lapply(rst, function(x)
    abs(colSums(superset[x, ]) - subsetSum) <= subsetSumError))), "\n")
} else
{
  cat("No solutions exist or timer ended too soon.\n")
}
# }

Run the code above in your browser using DataLab