wrMisc (version 1.15.4)

chooseGroupNames: Choose Column Most Likely For Sample-Names

Description

This function examines all columns of mat to determine which ones are likely choices for sample-names, and then derives group-names by stripping terminal enumerators. Ideal sample-names contain replicates indicated by terminal enumerators.

Usage

chooseGroupNames(
  mat,
  useCoNa = NULL,
  method = "median",
  sep = c("_", "-", " ", ".", "=", ";"),
  rmTxt = NULL,
  asUnique = TRUE,
  partEnumerator = FALSE,
  fullReport = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a character vector of group-names (with sample-names as names of the entries) or, if fullReport=TRUE, a list with $group, $sampleNames and $col (index of the column from mat and name of the column).

Arguments

mat

(matrix or data.frame) contains possible choices for sample-names

useCoNa

(character) optional custom choice for columns of mat to check; if NULL all columns will be used/checked

method

(character) decides how to choose the number of groups: min, low, med, high, max or mode. Note that the arguments asUnique and partEnumerator influence which columns of mat will be evaluated/checked.

sep

(character) separators considered when searching and removing common words

rmTxt

(character, length=1) optional removal of custom text (eg variable file-extensions); rmTxt does not need to occur in all instances

asUnique

(logical) requires all (potential) sample-names to be unique (ie no repeats) to be considered for group-names; also removes all candidate columns with all different names

partEnumerator

(logical) when TRUE, allows some instances of (potential) sample-names to lack an enumerator: a1, a2, b (ie some without enumerator)

fullReport

(logical) if TRUE, returns a list with $group, $sampleNames and $col (index of the column from mat and name of the column)

silent

(logical) suppress messages if TRUE

debug

(logical) additional messages for debugging

callFrom

(character) allows easier tracking of messages produced

Details

The basic idea is that the column containing (good) sample-names holds entries that are all different, and that stripping terminal enumerators reveals the grouping of replicates. Note that the arguments asUnique and partEnumerator influence which columns of mat will be evaluated/checked.
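
The idea of stripping terminal enumerators can be illustrated with a minimal base-R sketch. This is not the implementation used by chooseGroupNames; the vectors sampNa and partNa and the regular expression are assumptions made up for illustration only.

```r
# Hypothetical sample-names: two groups with three replicates each
sampNa <- c("ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3")

# A good candidate column holds entries that are all different
allDistinct <- anyDuplicated(sampNa) == 0

# Strip a terminal enumerator: an optional separator followed by trailing digits
# (assumed pattern, the package may treat separators differently)
grp <- sub("[_. -]?[[:digit:]]+$", "", sampNa)
names(grp) <- sampNa      # keep the sample-names as names of the entries
grp
table(grp)                # number of replicates per group

# Partial enumerators: some names carry no terminal enumerator (a1, a2, b)
partNa <- c("a1", "a2", "b")
hasEnum <- grepl("[[:digit:]]+$", partNa)
hasEnum
```

Here grp groups the six hypothetical samples by their shared prefix, and hasEnum flags which names in partNa end in an enumerator, the situation that partEnumerator=TRUE is meant to tolerate.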

See Also

rmSharedWords, replicateStructure, protectSpecChar

Examples

mat <- cbind(a=letters[1:6], b=paste(rep(c("b","B"), each=3), 1:3), c=rep(1,6),
  d=gl(3,2), e=rep(c("e","E"),3), f=paste(rep(c("F","f","ff"), each=2), 1:2))
chooseGroupNames(mat, method="median")         # col 2 (b/B)
chooseGroupNames(mat, method="median", fullReport=TRUE) 
chooseGroupNames(mat, method="min")            # col 2 (b/B)
chooseGroupNames(mat, method="max")            # col 6 (F/f/ff)
chooseGroupNames(mat, method="max", asUnique=FALSE) # col 1 (a..)