chooseGroupNames: Choose Column Most Likely For Sample-Names

Description

This function inspects all columns of mat to find which ones are likely choices for sample-names, and then derives group-names by stripping terminal enumerators. Ideal sample-names contain replicates indicated by terminal enumerators.

Usage

chooseGroupNames(
  mat,
  useCoNa = NULL,
  method = "median",
  sep = c("_", "-", " ", ".", "=", ";"),
  rmTxt = NULL,
  asUnique = TRUE,
  partEnumerator = FALSE,
  fullReport = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a character vector with group-names (and sample-names as names of entries) or, if fullReport=TRUE, a list with $group, $sampleNames and $col (index of the column from mat and name of the column).

Arguments

mat: (matrix or data.frame) contains possible choices for sample-names
useCoNa: (character) optional custom choice for columns of mat to check; if NULL all columns will be used/checked
method: (character) decides how to choose the number of groups: min, low, med, high, max or mode. Note: the arguments asUnique and partEnumerator influence which columns of mat will be evaluated/checked
sep: (character) separators considered when searching and removing common words
rmTxt: (character, length=1) optional removal of custom text (e.g. variable file-extensions); rmTxt does not need to occur in all instances
asUnique: (logical) requires all (potential) sample-names to be unique (i.e. no repeats) to be considered for group-names; also removes all candidate columns with all different names
partEnumerator: (logical) when TRUE allows some instances of (potential) sample-names without enumerator: a1, a2, b (i.e. some without enumerator)
fullReport: (logical) if TRUE returns list with $group, $sampleNames, $col (index of column from mat and name of column)
silent: (logical) suppress messages if TRUE
debug: (logical) additional messages for debugging
callFrom: (character) allows easier tracking of messages produced

Details

The basic idea is that the column containing (good) sample-names has all different entries, and that stripping terminal enumerators reveals the grouping of replicates. Note: the arguments asUnique and partEnumerator influence which columns of mat will be evaluated/checked.
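
The idea of stripping terminal enumerators can be illustrated with a minimal base-R sketch. Note that stripEnum() is a hypothetical helper written only for this illustration; it is not part of the package and is not the implementation used by chooseGroupNames(). It assumes an enumerator is a run of trailing digits, optionally preceded by one of the default separators listed for sep.

```r
# Hypothetical helper (illustration only, not the package's implementation):
# remove trailing digits plus one optional preceding separator (_ space . = ; -)
stripEnum <- function(x) sub("[_ .=;-]?[0-9]+$", "", x)

# Sample-names with replicates marked by terminal enumerators
sampNa <- paste(rep(c("b", "B"), each = 3), 1:3)   # "b 1" "b 2" "b 3" "B 1" "B 2" "B 3"

# Group-names, carrying the sample-names as names of the entries
grp <- stripEnum(sampNa)
names(grp) <- sampNa
grp

# Number of replicates per group (3 for "b" and 3 for "B")
table(grp)

# Names without enumerator are left unchanged (compare argument partEnumerator)
stripEnum(c("a1", "a2", "b"))                      # "a" "a" "b"
```

In this sketch all six sample-names are unique, while stripping the enumerators leaves two groups of three replicates each, which is the pattern the function looks for when ranking candidate columns.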

Examples

mat <- cbind(a=letters[1:6], b=paste(rep(c("b","B"), each=3), 1:3), c=rep(1,6),
  d=gl(3,2), e=rep(c("e","E"),3), f=paste(rep(c("F","f","ff"), each=2), 1:2))
chooseGroupNames(mat, method="median")         # col 2 (b/B)
chooseGroupNames(mat, method="median", fullReport=TRUE) 
chooseGroupNames(mat, method="min")            # col 2 (b/B)
chooseGroupNames(mat, method="max")            # col 6 (F/f/ff)
chooseGroupNames(mat, method="max", asUnique=FALSE) # col 1 (a..)