subsampleDb
subsampleDb will sample the same number of sequences for each gene, family
or allele (specified with mode) in data. Samples or subjects can
be subsampled independently by setting group.
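The three mode levels are assumed here to follow the usual IMGT naming hierarchy, from least to most specific: family (IGHV1), gene (IGHV1-2) and allele (IGHV1-2*02). The base R sketch that follows shows how a single, non-tied call could be reduced to each level. call_to_mode is a hypothetical helper and is not part of tigger, whose own parsing may differ.

```r
# Hypothetical helper (not part of tigger): reduce single IMGT-style allele
# calls such as "IGHV1-2*02" to the requested level of specificity.
call_to_mode <- function(calls, mode = c("gene", "allele", "family")) {
  mode <- match.arg(mode)
  switch(mode,
    # allele: keep the full call, e.g. "IGHV1-2*02"
    allele = calls,
    # gene: drop the allele suffix starting at "*", e.g. "IGHV1-2"
    gene = sub("\\*.*$", "", calls),
    # family: drop everything from the first "-" or "*", e.g. "IGHV1"
    family = sub("[-*].*$", "", calls)
  )
}

calls <- c("IGHV1-2*02", "IGHV1-2*04", "IGHV3-23*01")
call_to_mode(calls, "allele")  # "IGHV1-2*02" "IGHV1-2*04" "IGHV3-23*01"
call_to_mode(calls, "gene")    # "IGHV1-2" "IGHV1-2" "IGHV3-23"
call_to_mode(calls, "family")  # "IGHV1" "IGHV1" "IGHV3"
```

With mode = "gene", the two IGHV1-2 alleles fall into the same subset; with mode = "allele", they are sampled as separate subsets.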
subsampleDb(
data,
gene = "v_call",
mode = c("gene", "allele", "family"),
min_n = 1,
max_n = NULL,
group = NULL
)
Returns a subsampled version of the input data.
data: data.frame containing repertoire data.
gene: name of the column in data with allele calls. Default
is v_call.
mode: one of c("gene", "family", "allele") defining the degree of
specificity regarding allele calls when subsetting sequences.
Determines how data will be split into subsets from
which the same number of sequences will be subsampled. See
also group.
min_n: minimum number of observations to sample from each group. A group with fewer observations than the minimum is excluded.
max_n: maximum number of observations to sample for all mode groups.
If NULL, it will be set automatically to the size of
the smallest group. If max_n is larger than the available
number of sequences for any mode group, it will be
automatically adjusted, and the effective max_n used
will be the size of the smallest mode group.
group: columns containing additional grouping variables, e.g. sample_id.
These groups will be subsampled independently. If max_n
is NULL, a max_n will be automatically set for each group.
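The interaction of min_n, max_n and group can be sketched in base R as follows. effective_max_n is a hypothetical helper, not tigger's implementation, and it assumes that groups below min_n are dropped before the smallest remaining group is found; returning 0 when every group is excluded is also an assumption. The toy data.frame is made up for illustration, with sample_id playing the role of group.

```r
# Hypothetical helper (not part of tigger): compute the effective max_n for
# one set of mode groups, given the number of sequences in each group.
effective_max_n <- function(group_sizes, min_n = 1, max_n = NULL) {
  # Groups with fewer than min_n observations are excluded (assumed here to
  # happen before the smallest remaining group is found).
  kept <- group_sizes[group_sizes >= min_n]
  if (length(kept) == 0) return(0L)  # assumption: nothing left to sample
  smallest <- min(kept)
  # NULL means "use the smallest group"; a larger request is capped at it.
  if (is.null(max_n)) smallest else min(max_n, smallest)
}

effective_max_n(c(40, 25, 3))                          # 3
effective_max_n(c(40, 25, 3), min_n = 5)               # 25
effective_max_n(c(40, 25, 3), min_n = 5, max_n = 10)   # 10
effective_max_n(c(40, 25, 3), min_n = 5, max_n = 100)  # 25

# Made-up repertoire with two samples, used as the grouping column.
db <- data.frame(
  sample_id = c(rep("S1", 7), rep("S2", 5)),
  gene = c(rep("IGHV1-2", 4), rep("IGHV3-23", 3),
           rep("IGHV1-2", 2), rep("IGHV3-23", 3)),
  stringsAsFactors = FALSE
)

# Each sample gets its own max_n, computed from its own gene-group sizes.
per_sample <- sapply(split(db$gene, db$sample_id), function(genes) {
  effective_max_n(table(genes), min_n = 1, max_n = NULL)
})
per_sample  # S1 = 3, S2 = 2
```

Because max_n is derived separately for each sample, a sample with a small gene group does not force a small max_n on the other samples.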
data will be split into gene, allele or family subsets (mode) from
which the same number of sequences will be subsampled. If mode = "gene",
then for each gene in the field gene from data, a maximum of
max_n sequences will be subsampled. Input sequences
that have multiple gene calls (ties) can be subsampled from any of their calls,
but these duplicated samplings will be removed, and the final
subsampled data will contain unique rows.
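A minimal base R sketch of the tie handling described above, assuming mode = "gene", calls already reduced to the gene level, and ties stored as comma-separated calls in one column (as in AIRR-formatted v_call fields). subsample_by_gene is a hypothetical helper, not tigger's implementation; it indexes with sample.int to avoid the base::sample pitfall where a length-one vector is treated as a range.

```r
# Hypothetical sketch (not tigger's code): sample up to max_n rows per gene,
# letting tied calls ("A,B") count toward each of their genes, then keep
# each sampled row only once.
subsample_by_gene <- function(db, gene_col = "gene", max_n) {
  # One entry per (row, call) pair so that a tied row belongs to every call.
  calls <- strsplit(db[[gene_col]], ",", fixed = TRUE)
  long <- data.frame(
    row  = rep(seq_len(nrow(db)), lengths(calls)),
    gene = unlist(calls),
    stringsAsFactors = FALSE
  )
  # For each gene, draw at most max_n of the rows carrying that call.
  picked <- unlist(lapply(split(long$row, long$gene), function(rows) {
    rows[sample.int(length(rows), min(max_n, length(rows)))]
  }), use.names = FALSE)
  # A tied row drawn for two genes appears twice; keep unique rows only.
  db[sort(unique(picked)), , drop = FALSE]
}

set.seed(1)
db <- data.frame(
  sequence_id = paste0("seq", 1:6),
  gene = c("IGHV1-2", "IGHV1-2", "IGHV1-2,IGHV3-23",
           "IGHV3-23", "IGHV3-23", "IGHV4-34"),
  stringsAsFactors = FALSE
)
sampled <- subsample_by_gene(db, max_n = 1)
sampled
```

Row seq3 is tied between IGHV1-2 and IGHV3-23, so it can be drawn for both genes but appears at most once in the result; with max_n = 1 the output therefore has two or three rows, and seq6 (the only IGHV4-34 sequence) is always kept.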