GencoDymo2 (version 1.0.2)

df_to_fasta: Convert Data Frame to FASTA File

Description

Converts a data frame containing sequence IDs and sequences into a FASTA-formatted file, optionally compressed as gzip.

Usage

df_to_fasta(df, id_col, seq_col, output_file = NULL, gzip = TRUE, verbose = TRUE)

Value

No return value. The function is called for its side effect: it writes a FASTA file to the path given in output_file.

Arguments

df

A data frame with at least two columns: one for sequence IDs and one for sequences.

id_col

A character string specifying the column name containing sequence IDs.

seq_col

A character string specifying the column name containing sequence data.

output_file

A character string specifying the output file path. Must be supplied: if left as NULL (the default), the function stops with an informative message.

gzip

A logical indicating whether to compress the output as a gzip file. Defaults to TRUE.

verbose

A logical indicating whether to print progress messages. Defaults to TRUE.

Details

This function is intended for large sequence datasets. It checks that the ID and sequence columns exist in df, writes the records in chunks to limit memory use, optionally gzip-compresses the output, and reports progress when verbose is TRUE.
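
The chunked approach described above can be illustrated in base R. The sketch below is not the GencoDymo2 source: the helper name write_fasta_chunked, the chunk_size parameter and its default, and the wording of the error messages are all assumptions made for illustration. It only shows one way column validation, on-the-fly gzip compression, and chunked writing can fit together.

```r
# Hypothetical helper (NOT the package implementation): write a data frame
# to FASTA in chunks, optionally gzip-compressed.
write_fasta_chunked <- function(df, id_col, seq_col, output_file,
                                gzip = TRUE, chunk_size = 10000L) {
  # Validate inputs before opening any connection.
  if (is.null(output_file)) stop("Please provide 'output_file'.")
  missing_cols <- setdiff(c(id_col, seq_col), names(df))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in 'df': ", paste(missing_cols, collapse = ", "))
  }

  # gzfile() compresses on the fly; file() writes plain text.
  con <- if (gzip) gzfile(output_file, open = "wt") else file(output_file, open = "wt")
  on.exit(close(con), add = TRUE)

  n <- nrow(df)
  if (n == 0) return(invisible(NULL))
  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)
    # Interleave header and sequence lines: >id1, seq1, >id2, seq2, ...
    lines <- as.vector(rbind(paste0(">", df[[id_col]][idx]),
                             as.character(df[[seq_col]][idx])))
    writeLines(lines, con)
  }
  invisible(NULL)
}

# Usage: a tiny chunk size forces more than one pass through the loop.
demo <- data.frame(
  transcript_id = c("ENST0001", "ENST0002"),
  sequence = c("ATGCTAGCTAG", "GCTAGCTAGCT")
)
out <- tempfile(fileext = ".fa.gz")
write_fasta_chunked(demo, "transcript_id", "sequence", out, chunk_size = 1L)

# Read the compressed file back to inspect what was written.
con <- gzfile(out, open = "rt")
fasta_lines <- readLines(con)
close(con)
```

Only one chunk of formatted lines is held in memory at a time, so peak memory depends on chunk_size rather than on the number of rows in df.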

Examples

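# Build an output path in the session's temporary directory
temp_dir <- tempdir()
temp_output <- file.path(temp_dir, "output.fa.gz")

# Example input: one column of sequence IDs, one column of sequences
seq_data <- data.frame(
  transcript_id = c("ENST0001", "ENST0002"),
  sequence = c("ATGCTAGCTAG", "GCTAGCTAGCT")
)

# gzip = TRUE by default, so the output is gzip-compressed
df_to_fasta(seq_data, "transcript_id", "sequence", temp_output)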
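
A gzip-compressed FASTA file such as the one written above can be read back with base R to check its contents. The snippet below is a sketch that runs without the package: it writes a small stand-in file itself, and it assumes each record is one header line followed by a single unwrapped sequence line, which the df_to_fasta output may or may not match. The names fasta_path and records are illustrative.

```r
# Stand-in for a df_to_fasta() output file: a tiny gzip-compressed FASTA.
fasta_path <- tempfile(fileext = ".fa.gz")
con <- gzfile(fasta_path, open = "wt")
writeLines(c(">ENST0001", "ATGCTAGCTAG", ">ENST0002", "GCTAGCTAGCT"), con)
close(con)

# Read it back; gzfile() decompresses transparently.
con <- gzfile(fasta_path, open = "rt")
fasta_lines <- readLines(con)
close(con)

# Assumption: one sequence line per record (no line wrapping).
is_header <- startsWith(fasta_lines, ">")
records <- data.frame(
  transcript_id = sub("^>", "", fasta_lines[is_header]),
  sequence = fasta_lines[!is_header],
  stringsAsFactors = FALSE
)
```

To inspect a real output file, point fasta_path at it and skip the first four lines of the snippet.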