ORscraper (version 0.1.0)

extract_values_from_tables: Extract values from tables within text

Description

This function analyzes the subset of text lines delimited by the start and end markers and extracts mutations, pathogenicity, frequencies, codifications, and changes.

Usage

extract_values_from_tables(
  lines,
  mutations,
  genes_mutated = list(),
  pathogenicity = list(),
  frequencies = list(),
  codifications = list(),
  changes = list(),
  values = list(),
  start = "Variantes de secuencia de ADN",
  start2 = "   Variaciones del número de copias",
  end = "Genes analizados",
  end2 = "Comentarios adicionales sobre las variantes"
)

Value

A list of extracted data with five elements, in this order: genes, pathogenicity, frequencies, codifications, and changes.

Arguments

lines

Character vector. Lines of text to process.

mutations

Character vector of known mutation identifiers.

genes_mutated

Ordered list to store extracted gene data.

pathogenicity

Ordered list to store extracted pathogenicity information.

frequencies

Ordered list to store extracted frequency data.

codifications

Ordered list to store extracted codification data.

changes

Ordered list to store extracted changes data.

values

Aggregated list of extracted information.

start

Starting marker for the relevant table section.

start2

Secondary starting marker for the table section, used when the table is split across two pages.

end

Text marker indicating the end of the subset.

end2

Secondary end marker.
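
To make the role of the start and end markers concrete, here is a minimal sketch of how a marker-delimited subset of lines could be selected. It is not the package's implementation: subset_between_markers is a hypothetical helper, the sample lines are invented, and the use of fixed (non-regex) matching is an assumption.

```r
# Hypothetical helper, not part of ORscraper: return the lines strictly
# between the first line containing `start` and the next line containing `end`.
subset_between_markers <- function(lines, start, end) {
  start_idx <- grep(start, lines, fixed = TRUE)
  if (length(start_idx) == 0) return(character(0))  # start marker not found
  first <- start_idx[1]

  # Only consider end markers that appear after the start marker
  end_idx <- grep(end, lines, fixed = TRUE)
  end_idx <- end_idx[end_idx > first]
  last <- if (length(end_idx) > 0) end_idx[1] - 1 else length(lines)

  if (last <= first) return(character(0))           # nothing between markers
  lines[(first + 1):last]
}

# Invented lines imitating a report layout (placeholder gene names)
example_lines <- c(
  "Encabezado del informe",
  "Variantes de secuencia de ADN",
  "GENE1  c.100A>G  p.(Lys34Glu)  12.5%",
  "GENE2  c.200C>T  p.(Arg67Cys)  48.0%",
  "Genes analizados",
  "GENE1, GENE2, GENE3"
)

table_lines <- subset_between_markers(
  example_lines,
  start = "Variantes de secuencia de ADN",
  end = "Genes analizados"
)
print(table_lines)
```

If the end marker is missing, this sketch falls back to the last line of the input, which is one possible design choice; the actual function may handle that case differently.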

Examples

InputPath <- system.file("extdata", package = "ORscraper")
files <- read_pdf_files(InputPath)
lines <- read_pdf_content(files[1])  # Example with the first file

genes_file <- system.file("extdata/Genes.xlsx", package = "ORscraper")
genes <- readxl::read_excel(genes_file)
mutations <- unique(genes$GEN)

TableValues <- extract_values_from_tables(lines, mutations)
mutateGenes <- TableValues[[1]]
pathogenicity <- TableValues[[2]]
frequencies <- TableValues[[3]]
codifications <- TableValues[[4]]
changes <- TableValues[[5]]
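
As a possible follow-up, the five returned elements can be combined into a single table for inspection. The sketch below uses a mock stand-in for the return value so that it runs without the package data; the values are invented, and it assumes the five elements are parallel lists of equal length, which the documentation does not state explicitly.

```r
# Mock stand-in for the value returned by extract_values_from_tables();
# all values are invented for illustration.
TableValuesMock <- list(
  list("GENE1", "GENE2"),                       # genes
  list("Pathogenic", "Uncertain significance"), # pathogenicity
  list("12.5", "48.0"),                         # frequencies
  list("c.100A>G", "c.200C>T"),                 # codifications
  list("p.(Lys34Glu)", "p.(Arg67Cys)")          # changes
)

# Assumption: one entry per variant in every element
stopifnot(length(unique(lengths(TableValuesMock))) == 1)

# Flatten each element and bind them column-wise
variants <- data.frame(
  gene          = unlist(TableValuesMock[[1]]),
  pathogenicity = unlist(TableValuesMock[[2]]),
  frequency     = unlist(TableValuesMock[[3]]),
  codification  = unlist(TableValuesMock[[4]]),
  change        = unlist(TableValuesMock[[5]]),
  stringsAsFactors = FALSE
)
print(variants)
```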