
ggDNAvis (version 0.3.2)

extract_and_sort_sequences: Extract, sort, and add spacers between sequences in a dataframe

Description

This function takes a dataframe containing sequences and metadata, recursively splits it into nested groups defined by grouping_levels, and inserts a break between groups at each level, with the size of each break given by the corresponding value in grouping_levels. Within each lowest-level group, reads are sorted by sort_by, in the direction set by desc_sort.

Default values are set up to work with the included dataset example_many_sequences.

The returned sequences vector is ideal input for visualise_many_sequences().

Also called by extract_methylation_from_dataframe() to produce input for visualise_methylation().
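
The grouping-and-spacer behaviour described above can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_group_and_sort() is a hypothetical helper, the toy data is invented, only a single grouping level is handled, and spacers are assumed to go between groups only (whether the real function also pads before the first or after the last group is not stated here).

```r
# Toy data; column names mirror the documented defaults, rows are invented.
toy <- data.frame(
    family = c("F1", "F1", "F2", "F2", "F2"),
    sequence = c("GGC", "GGCGGC", "GGA", "GGAGGAGGA", "GGAGGA"),
    stringsAsFactors = FALSE
)
toy$sequence_length <- nchar(toy$sequence)

# Hypothetical helper (assumption, not part of ggDNAvis):
# one grouping level, `gap` blank strings inserted between groups.
sketch_group_and_sort <- function(df, group_col, gap, sort_col, desc = TRUE) {
    chunks <- split(df, df[[group_col]])
    out <- character(0)
    for (i in seq_along(chunks)) {
        chunk <- chunks[[i]]
        # Sort rows within the group by the chosen column
        chunk <- chunk[order(chunk[[sort_col]], decreasing = desc), ]
        if (i > 1) {
            # Spacer between groups only (assumption)
            out <- c(out, rep("", gap))
        }
        out <- c(out, chunk$sequence)
    }
    out
}

result <- sketch_group_and_sort(toy, "family", gap = 2, sort_col = "sequence_length")
print(result)
```

With several grouping levels, the same split-sort-pad step would be applied again inside each chunk, using the gap for that level.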

Usage

extract_and_sort_sequences(
  sequence_dataframe,
  sequence_variable = "sequence",
  grouping_levels = c(family = 8, individual = 2),
  sort_by = "sequence_length",
  desc_sort = TRUE
)

Value

character vector. The sequences, grouped and ordered as specified, with blank sequences ("") inserted as spacers between groups.
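
Because spacers are ordinary empty strings, they can be told apart from real sequences with base R. The sketch below uses an invented vector shaped like the documented return value rather than actual output from the function.

```r
# Invented vector shaped like the documented return value:
# sequences with "" spacers between groups.
sequences <- c("GGCGGC", "GGC", "", "", "GGAGGA")

is_spacer <- !nzchar(sequences)       # TRUE where the element is a blank spacer
n_reads <- sum(!is_spacer)            # number of real sequences
reads_only <- sequences[!is_spacer]   # vector with spacers dropped

print(n_reads)
print(reads_only)
```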

Arguments

sequence_dataframe

dataframe. A dataframe containing the sequence information and all required metadata. See example_many_sequences for an example of a compatible dataframe.

sequence_variable

character. The name of the column within the dataframe containing the sequence information to be output. Defaults to "sequence".

grouping_levels

named numeric vector. The names give the variables (columns) used to define the groups/chunks, and the values give how large a gap should be left between groups at that level. Set to NA to turn off grouping.

Defaults to c("family" = 8, "individual" = 2), meaning the highest-level groups are defined by the family column, and there is a gap of 8 between each family. Likewise the second-level groups (within each family) are defined by the individual column, and there is a gap of 2 between each individual.

Any number of grouping variables and gaps can be given, as long as each grouping variable is a column within the dataframe. It is recommended that lower-level groups are more granular and subdivide higher-level groups (e.g. first divide into families, then into individuals within families).

To change the order of groups within a level, make that column a factor with the desired level order, e.g. example_many_sequences$family <- factor(example_many_sequences$family, levels = c("Family 2", "Family 3", "Family 1")) to change the order to Family 2, Family 3, Family 1.
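
The effect of setting factor levels can be checked on a small invented dataframe before applying it to real data. This sketch only shows how base R orders a factor by its levels; it does not call the package.

```r
# Invented data using the family labels from the text above.
toy <- data.frame(
    family = c("Family 1", "Family 2", "Family 3", "Family 2"),
    stringsAsFactors = FALSE
)

# Levels define the group order: Family 2, then Family 3, then Family 1.
toy$family <- factor(toy$family, levels = c("Family 2", "Family 3", "Family 1"))

# order() on a factor follows the level order, not alphabetical order.
ordered_rows <- toy[order(toy$family), , drop = FALSE]
ordered_families <- as.character(ordered_rows$family)
print(ordered_families)
```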

sort_by

character. The name of the column within the dataframe that should be used to sort/order the rows within each lowest-level group. Set to NA to turn off sorting within groups.

Recommended to be the length of the sequence information, as is the case for the default "sequence_length", which was generated via example_many_sequences$sequence_length <- nchar(example_many_sequences$sequence).
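
For a dataframe that lacks a length column, one can be added the same way. A minimal sketch on invented sequences:

```r
# Invented sequences; the column names match the documented defaults.
toy <- data.frame(
    sequence = c("GGC", "GGCGGCGGC", "GGCGGC"),
    stringsAsFactors = FALSE
)

# nchar() counts characters per string, giving one length per row.
toy$sequence_length <- nchar(toy$sequence)
print(toy)
```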

desc_sort

logical. Whether rows within groups should be sorted by the sort_by variable in descending (TRUE, default) or ascending (FALSE) order.
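
The two sort directions correspond to the decreasing argument of base R's order(). The sketch below uses invented read names and lengths and assumes nothing about how the package sorts internally.

```r
# Invented read lengths, named so the resulting order is easy to read.
read_lengths <- c(read_a = 6, read_b = 3, read_c = 9)

# Analogous to desc_sort = TRUE: longest first.
descending <- names(read_lengths)[order(read_lengths, decreasing = TRUE)]

# Analogous to desc_sort = FALSE: shortest first.
ascending <- names(read_lengths)[order(read_lengths, decreasing = FALSE)]

print(descending)
print(ascending)
```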

Examples

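# Default settings: group by family (gap of 8), then by individual (gap of 2),
# sorting by sequence length, descending, within each individual
extract_and_sort_sequences(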
    example_many_sequences,
    sequence_variable = "sequence",
    grouping_levels = c("family" = 8, "individual" = 2),
    sort_by = "sequence_length",
    desc_sort = TRUE
)

# Group by family only (gap of 3), sorting by sequence length, ascending
extract_and_sort_sequences(
    example_many_sequences,
    sequence_variable = "sequence",
    grouping_levels = c("family" = 3),
    sort_by = "sequence_length",
    desc_sort = FALSE
)

# No grouping: sort all sequences together by length, descending
extract_and_sort_sequences(
    example_many_sequences,
    sequence_variable = "sequence",
    grouping_levels = NA,
    sort_by = "sequence_length",
    desc_sort = TRUE
)

# Group by family then individual, with sorting within groups turned off
extract_and_sort_sequences(
    example_many_sequences,
    sequence_variable = "sequence",
    grouping_levels = c("family" = 8, "individual" = 2),
    sort_by = NA
)

# Neither grouping nor sorting
extract_and_sort_sequences(
    example_many_sequences,
    sequence_variable = "sequence",
    grouping_levels = NA,
    sort_by = NA
)

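# Output the quality column instead of the sequence column,
# grouped by individual (gap of 3) and sorted by quality, ascending
extract_and_sort_sequences(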
    example_many_sequences,
    sequence_variable = "quality",
    grouping_levels = c("individual" = 3),
    sort_by = "quality",
    desc_sort = FALSE
)
