AnimalSequences (version 0.2.0)

sequence_length_summary_covariate: Summarize Sequence Lengths by Covariate

Description

This function calculates summary statistics for sequence lengths, grouped by a specified covariate. For each covariate value it reports the mean, standard deviation, median, minimum, and maximum sequence length, along with the number of distinct elements and a p-value comparing that number to shuffled sequences.

Usage

sequence_length_summary_covariate(sequences, covariate)

Value

A data frame with the following columns:

covariate

The value of the covariate.

mean_seq_elements

The mean length of sequences for this covariate value.

sd_seq_elements

The standard deviation of the sequence lengths for this covariate value.

median_seq_elements

The median length of sequences for this covariate value.

min_seq_elements

The minimum length of sequences for this covariate value.

max_seq_elements

The maximum length of sequences for this covariate value.

distinct_elements

The number of distinct elements for this covariate value.

pvalue_distinct_elements

The p-value comparing the number of distinct elements to shuffled sequences for this covariate value.
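This page does not say how the sequences are shuffled or which tail the p-value refers to. As one possible reading only, the sketch below shuffles the covariate labels across sequences and asks how often a group ends up with at most as many distinct elements as were observed. The helper `perm_pvalue_distinct`, its `group` and `n_perm` arguments, the label-shuffling scheme, and the one-sided direction are all assumptions made for illustration; none of them is taken from AnimalSequences.

```r
# Hypothetical helper, NOT part of AnimalSequences: permutation p-value for
# the number of distinct elements observed in one covariate group.
perm_pvalue_distinct <- function(sequences, covariate, group, n_perm = 999) {
  # Split each sequence into elements (assumes single-space separators)
  elements <- strsplit(sequences, " ", fixed = TRUE)

  # Count distinct elements across all sequences carrying the given label
  count_distinct <- function(labels) {
    length(unique(unlist(elements[labels == group])))
  }

  observed <- count_distinct(covariate)

  # Shuffle the labels across sequences and recount each time
  shuffled <- vapply(
    seq_len(n_perm),
    function(i) count_distinct(sample(covariate)),
    integer(1)
  )

  # Assumed one-sided test: share of shuffles with at most as many distinct
  # elements as observed, with the usual +1 correction
  (sum(shuffled <= observed) + 1) / (n_perm + 1)
}

set.seed(1)  # make the shuffles reproducible
sequences <- c("a b a", "a a b b", "c d e f", "c e g h", "a b", "d f g")
covariate <- c("X", "X", "Y", "Y", "X", "Y")
p_x <- perm_pvalue_distinct(sequences, covariate, group = "X")
print(p_x)
```

With the +1 correction the result always lies in (0, 1], and the smallest attainable value is 1 / (n_perm + 1).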

Arguments

sequences

A character vector in which each entry is one sequence, written as elements separated by spaces.

covariate

A vector of covariates with the same length as `sequences`, used to group the sequences.
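As a rough illustration of this input format, the sketch below splits each entry on single spaces with base R and checks that the covariate supplies one value per sequence. The names `elements` and `seq_lengths` are illustrative only, and splitting on a single space is an assumption about how the package tokenises its input.

```r
# Illustrative input: one string per sequence, elements separated by spaces
sequences <- c("hello world", "hello world hello", "hello world hello world")
covariate <- c("A", "B", "A")

# The covariate must supply exactly one value per sequence
stopifnot(length(sequences) == length(covariate))

# Split each sequence into its elements (assumes single-space separators)
elements <- strsplit(sequences, " ", fixed = TRUE)

# Number of elements in each sequence
seq_lengths <- lengths(elements)
print(seq_lengths)
```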

Examples

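# Three sequences of space-separated elements
sequences <- c('hello world', 'hello world hello', 'hello world hello world')
# One covariate value per sequence
covariate <- c('A', 'B', 'A')
# Summarize sequence lengths within each covariate value
sequence_length_summary_covariate(sequences, covariate)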
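To see what the length columns describe for this input, the same quantities can be computed by hand in base R. This is a sketch of the documented columns, not the package's implementation: `manual_summary` is an illustrative name, the single-space split is an assumption, and the p-value column is left out because it depends on a shuffling scheme this page does not describe.

```r
sequences <- c("hello world", "hello world hello", "hello world hello world")
covariate <- c("A", "B", "A")

# Split into elements (assumes single-space separators) and measure lengths
elements <- strsplit(sequences, " ", fixed = TRUE)
seq_lengths <- as.numeric(lengths(elements))

# Group the lengths by covariate value
lengths_by_group <- split(seq_lengths, covariate)
groups <- names(lengths_by_group)

manual_summary <- data.frame(
  covariate = groups,
  mean_seq_elements = vapply(lengths_by_group, mean, numeric(1)),
  sd_seq_elements = vapply(lengths_by_group, sd, numeric(1)),
  median_seq_elements = vapply(lengths_by_group, median, numeric(1)),
  min_seq_elements = vapply(lengths_by_group, min, numeric(1)),
  max_seq_elements = vapply(lengths_by_group, max, numeric(1)),
  # Distinct elements pooled over all sequences in the group
  distinct_elements = vapply(
    groups,
    function(g) length(unique(unlist(elements[covariate == g]))),
    integer(1)
  ),
  row.names = NULL
)
print(manual_summary)
```

Group A holds sequences of length 2 and 4, so its mean and median are both 3. Group B holds a single sequence of length 3, for which base R's `sd()` returns `NA`; the package may handle that case differently.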
