sequence_length_summary_covariate: Summarize Sequence Lengths by Covariate

Description

This function calculates summary statistics for sequence lengths (the number of elements in each sequence), grouped by a specified covariate. For each covariate value it reports the mean, standard deviation, median, minimum, and maximum length, along with the number of distinct elements and a p-value comparing that number with shuffled sequences.
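The length statistics described above can be approximated in base R. The sketch below is illustrative only and is not the package's implementation; it assumes elements are separated by whitespace, and the intermediate names (`elements`, `seq_lengths`, `groups`, `length_summary`) are hypothetical.

```r
# Illustrative sketch only; not the package's implementation.
sequences <- c("hello world", "hello world hello", "hello world hello world")
covariate <- c("A", "B", "A")

# Split each sequence on whitespace and count its elements.
elements <- strsplit(trimws(sequences), "\\s+")
seq_lengths <- as.numeric(lengths(elements))

# Group the lengths by covariate value, then summarize each group.
groups <- split(seq_lengths, covariate)
length_summary <- data.frame(
  covariate = names(groups),
  mean_seq_elements = vapply(groups, mean, numeric(1)),
  sd_seq_elements = vapply(groups, sd, numeric(1)),
  median_seq_elements = vapply(groups, median, numeric(1)),
  min_seq_elements = vapply(groups, min, numeric(1)),
  max_seq_elements = vapply(groups, max, numeric(1)),
  row.names = NULL
)
length_summary
```

In this sketch a covariate value with a single sequence (here `B`) gets `NA` for the standard deviation, because `sd()` needs at least two observations. Whether the package reports `NA` in that case is not stated in this page.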

Usage

sequence_length_summary_covariate(sequences, covariate)

Value

A data frame with the following columns:

covariate: The value of the covariate.
mean_seq_elements: The mean length of sequences for this covariate value.
sd_seq_elements: The standard deviation of the sequence lengths for this covariate value.
median_seq_elements: The median length of sequences for this covariate value.
min_seq_elements: The minimum length of sequences for this covariate value.
max_seq_elements: The maximum length of sequences for this covariate value.
distinct_elements: The number of distinct elements for this covariate value.
pvalue_distinct_elements: The p-value from comparing the observed number of distinct elements for this covariate value with the number found in shuffled sequences.
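This page does not say how the sequences are shuffled or which direction the comparison takes. As a rough illustration of the general idea, the sketch below assumes that all elements are pooled and permuted across sequences while sequence lengths stay fixed, and that the p-value is the one-sided proportion of shuffles with at most as many distinct elements as observed. Both choices, and every name in the block, are assumptions made for illustration and may differ from what the function actually does.

```r
# Illustrative sketch only; the shuffling scheme and test direction are assumptions.
set.seed(1)
sequences <- c("a b a", "c d e f", "a a b", "g h c d")
covariate <- c("A", "B", "A", "B")

elements <- strsplit(trimws(sequences), "\\s+")
pooled <- unlist(elements)
# Index of the sequence that each pooled element belongs to.
seq_id <- rep(seq_along(elements), lengths(elements))
levels_cov <- sort(unique(covariate))

# Number of distinct elements within each covariate value.
count_distinct <- function(x) {
  vapply(levels_cov, function(lv) {
    length(unique(x[covariate[seq_id] == lv]))
  }, integer(1))
}

observed <- count_distinct(pooled)

# ASSUMPTION: null distribution from permuting elements across all sequences.
n_shuffles <- 200
shuffled <- matrix(
  replicate(n_shuffles, count_distinct(sample(pooled))),
  nrow = length(levels_cov),
  dimnames = list(levels_cov, NULL)
)

# ASSUMPTION: one-sided p-value (shuffles with as few or fewer distinct elements).
pvalue_distinct_elements <- rowMeans(shuffled <= observed)
pvalue_distinct_elements
```

Under these assumptions, a small p-value would suggest that a covariate value uses fewer distinct elements than expected if elements were distributed at random.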

Arguments

sequences: A character vector in which each entry is one sequence, written as elements separated by spaces.
covariate: A vector of covariates with the same length as `sequences`, used to group the sequences.
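If the raw data hold one vector of elements per sequence, they can be collapsed into the space-separated strings described above before calling the function. This is a minimal sketch; `raw_sequences` is a hypothetical input, not something the package provides.

```r
# Hypothetical raw data: one character vector of elements per sequence.
raw_sequences <- list(
  c("hello", "world"),
  c("hello", "world", "hello")
)
covariate <- c("A", "B")

# Collapse each vector into a single space-separated string.
sequences <- vapply(raw_sequences, paste, character(1), collapse = " ")

# The two arguments must line up one-to-one.
stopifnot(is.character(sequences), length(sequences) == length(covariate))
```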

Examples

sequences <- c('hello world', 'hello world hello', 'hello world hello world')
covariate <- c('A', 'B', 'A')
sequence_length_summary_covariate(sequences, covariate)