This function i) splits a mutation dataset into training, validation and test sets, and ii) converts each set from annotated mutation format (MAF) to a sparse mutation matrix, as described for the function get_table_from_maf().
get_mutation_tables(
maf,
split = c(train = 0.7, val = 0.15, test = 0.15),
sample_list = NULL,
gene_list = NULL,
acceptable_genes = NULL,
for_biomarker = "TIB",
include_synonymous = TRUE,
dictionary = NULL,
seed_id = 1234
)
maf (dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol' and 'Variant_Classification'.
split (double) A vector of three positive values named 'train', 'val' and 'test', giving the proportions of samples to assign to each set.
sample_list (character) Optional parameter specifying the set of samples to include in the mutation matrices.
gene_list (character) Optional parameter specifying the set of genes to include in the mutation matrices.
acceptable_genes (character) Optional parameter specifying a set of acceptable genes, for example those present in an Ensembl database.
for_biomarker (character) Used to define the dictionary of mutations. See the function get_mutation_dictionary() for details.
include_synonymous (logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrices.
dictionary (character) Optional parameter specifying the mutation dictionary to use directly. See the function get_mutation_dictionary() for details.
seed_id (numeric) Value passed to set.seed(), so that the random split is reproducible.
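To illustrate what the split, seed_id and set.seed() arguments control, here is a minimal base R sketch. It is an assumption about how such a split could work, not the package's own code: split_samples() is a hypothetical helper that normalises the proportions, shuffles the unique sample IDs after calling set.seed(seed_id), and hands out consecutive blocks to 'train', 'val' and 'test'.

```r
# Hypothetical helper (not part of the package): randomly assign samples to
# 'train', 'val' and 'test' in the requested proportions.
split_samples <- function(samples,
                          split = c(train = 0.7, val = 0.15, test = 0.15),
                          seed_id = 1234) {
  stopifnot(all(c("train", "val", "test") %in% names(split)), all(split > 0))
  # Reorder and normalise so the proportions sum to 1.
  split <- split[c("train", "val", "test")] / sum(split)
  # Seeding first makes the shuffle, and hence the split, reproducible.
  set.seed(seed_id)
  ids <- unique(samples)
  shuffled <- ids[sample.int(length(ids))]
  n <- length(shuffled)
  # Cumulative cut points, e.g. 70, 85, 100 for 100 samples.
  cuts <- round(cumsum(split) * n)
  cuts[length(cuts)] <- n  # guard against floating-point shortfall
  groups <- rep(names(split), times = diff(c(0, cuts)))
  base::split(shuffled, factor(groups, levels = names(split)))
}

res <- split_samples(paste0("SAMPLE_", 1:100))
lengths(res)  # train 70, val 15, test 15
```

Shuffling once and cutting at cumulative proportions guarantees the three sets are disjoint and together cover every sample; the package may round or order samples differently.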
A list of three items named 'train', 'val' and 'test'. Each element contains a sparse mutation matrix for the samples in that set, alongside the other information described for the output of the function get_table_from_maf().
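The conversion to a sparse mutation matrix can be sketched with the Matrix package (shipped with R as a recommended package). This is a hypothetical illustration, not get_table_from_maf() itself: maf_to_sparse() is an invented helper, and treating the MAF label "Silent" as the synonymous class is an assumption based on the MAF convention rather than on this package's documentation.

```r
library(Matrix)

# Hypothetical sketch (not the package's get_table_from_maf()): build a sparse
# sample-by-gene matrix of mutation counts from a MAF-like data frame.
maf_to_sparse <- function(maf, sample_list = NULL, gene_list = NULL,
                          include_synonymous = TRUE) {
  keep <- rep(TRUE, nrow(maf))
  # Assumption: synonymous variants carry the MAF label "Silent".
  if (!include_synonymous) {
    keep <- keep & !(maf$Variant_Classification %in% "Silent")
  }
  if (!is.null(sample_list)) keep <- keep & maf$Tumor_Sample_Barcode %in% sample_list
  if (!is.null(gene_list)) keep <- keep & maf$Hugo_Symbol %in% gene_list
  maf <- maf[keep, , drop = FALSE]

  # Rows and columns follow the requested lists if given, otherwise the data.
  # Samples in sample_list with no mutations become all-zero rows.
  sample_levels <- if (is.null(sample_list)) sort(unique(maf$Tumor_Sample_Barcode)) else sample_list
  gene_levels <- if (is.null(gene_list)) sort(unique(maf$Hugo_Symbol)) else gene_list
  samples <- factor(maf$Tumor_Sample_Barcode, levels = sample_levels)
  genes <- factor(maf$Hugo_Symbol, levels = gene_levels)

  # sparseMatrix() sums entries sharing the same (i, j), so repeated
  # sample/gene pairs become counts.
  sparseMatrix(
    i = as.integer(samples),
    j = as.integer(genes),
    x = 1,
    dims = c(length(sample_levels), length(gene_levels)),
    dimnames = list(as.character(sample_levels), as.character(gene_levels))
  )
}
```

Calling such a helper once per sample group, for example `lapply(groups, function(s) maf_to_sparse(maf, sample_list = s, gene_list = genes))` for a named list `groups` of sample IDs and a shared gene vector `genes`, yields a list shaped like the 'train'/'val'/'test' output described above; fixing the gene columns keeps the matrices aligned across sets. The real function also returns additional information that this sketch omits.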
# NOT RUN {
tables <- get_mutation_tables(example_maf_data$maf, sample_list = paste0("SAMPLE_", 1:100))
print(names(tables))
print(names(tables$train))
# }