Converts a list of run-length encodings (RLEs) into a data frame with 16 features after mappability profiling and nucleotide filtering.
convert_rle_to_df(
covs,
unreliable_region_version = "1_4_0",
unreliable_region_enabled = TRUE,
additional_nucleotide_info = data.frame()
)
Returns a data frame that contains the mapping result for each virus segment that the plant sample reads are aligned to and an RLE list of coverage information.
covs: A list of coverage profile(s) in RLE format, for one or more samples.
unreliable_region_version: The version (character string) of the unreliable
regions of the virus segments. Default is "1_4_0". It includes the
mappability profile from a host genome (currently only Arabidopsis
thaliana) and virus references, and the regions whose GC% and A% are
over 60% and 45%, respectively.
unreliable_region_enabled: Default is TRUE. If TRUE, the input will be
checked against unreliable_region_df. If FALSE, this step will be
skipped.
additional_nucleotide_info: Additional nucleotide information for virus
segments that are not included in nucleotide_info. The information
provided must be a data frame that follows the format of
nucleotide_info. Default is an empty data frame.
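The nucleotide thresholds mentioned for the unreliable regions (GC% over 60%, A% over 45%) can be illustrated with a minimal base R sketch. The sequence, the variable names, and the idea of scoring one fixed window are assumptions made for illustration only; how the package defines its windows and combines the two thresholds is not stated here.

```r
# Hypothetical sequence window; not taken from any real virus reference.
window <- "GCGCGGCCATGCGCGCAT"

# Split the string into single bases.
bases <- strsplit(window, "")[[1]]

# Percentage of G or C bases, and percentage of A bases, in the window.
gc_pct <- 100 * mean(bases %in% c("G", "C"))
a_pct <- 100 * mean(bases == "A")

# Compare against the thresholds quoted in the argument description.
high_gc <- gc_pct > 60
high_a <- a_pct > 45
```

In this toy window 14 of the 18 bases are G or C, so only the GC threshold is exceeded.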
Converts a list of run-length encodings (RLEs) into a data frame.
The returned data frame contains 16 features, computed after mappability profiling and nucleotide filtering, for training a machine learning model.
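For readers unfamiliar with run-length encoding, the following sketch shows the idea with base R's rle(). The depth vector is made up, and the RLE objects expected by covs may be of a different class than the base R one used here, so treat this only as an illustration of the encoding, not of the function's input or of its 16 features.

```r
# Hypothetical per-base read depth along a short segment.
depth <- c(0, 0, 0, 5, 5, 7, 7, 7, 7, 0)

# Compress consecutive equal depths into runs (length, value).
depth_rle <- rle(depth)
depth_rle$lengths  # 3 2 4 1
depth_rle$values   # 0 5 7 0

# Expanding the runs recovers the original per-base vector.
restored <- inverse.rle(depth_rle)
stopifnot(identical(restored, depth))
```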
if (FALSE) {
df <- convert_rle_to_df(example_cov)
}