grepreaper (version 0.1.0)

grep_read: Efficiently read and filter lines from one or more files using grep, returning a data.table.

Description

Efficiently reads and filters lines from one or more files using grep and returns the result as a data.table.

Usage

grep_read(
  files = NULL,
  path = NULL,
  file_pattern = NULL,
  pattern = "",
  invert = FALSE,
  ignore_case = FALSE,
  fixed = FALSE,
  show_cmd = FALSE,
  recursive = FALSE,
  word_match = FALSE,
  show_line_numbers = FALSE,
  only_matching = FALSE,
  nrows = Inf,
  skip = 0,
  header = TRUE,
  col.names = NULL,
  include_filename = FALSE,
  show_progress = FALSE,
  ...
)

Value

A data.table whose structure depends on the options:

  • Default: Data columns with original types preserved

  • show_line_numbers=TRUE: Additional 'line_number' column (integer) with source file line numbers

  • include_filename=TRUE: Additional 'source_file' column (character)

  • only_matching=TRUE: Single 'match' column with matched substrings

  • show_cmd=TRUE: A character string containing the grep command is returned instead of a data.table
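
The return shapes listed above can be explored with a small sketch like the following. It assumes grepreaper is installed and a grep binary is available on the PATH; the file name and its contents are made up for illustration, and the calls use only arguments shown in the Usage section. No output is shown because the exact column layout is not stated here beyond the column names above.

```r
# Sketch only: assumes grepreaper is installed and grep is on the PATH.
# The file name and its contents are hypothetical.
csv_path <- file.path(tempdir(), "sales_example.csv")
writeLines(
  c("region,units",
    "north,10",
    "south,20",
    "north,30"),
  csv_path
)

if (requireNamespace("grepreaper", quietly = TRUE) && nzchar(Sys.which("grep"))) {
  # Default: lines matching "north", returned as a data.table
  matched <- grepreaper::grep_read(files = csv_path, pattern = "north")

  # Request the documented 'line_number' and 'source_file' columns
  annotated <- grepreaper::grep_read(
    files = csv_path,
    pattern = "north",
    show_line_numbers = TRUE,
    include_filename = TRUE
  )

  # Return the grep command as a character string instead of running it
  cmd <- grepreaper::grep_read(
    files = csv_path,
    pattern = "north",
    show_cmd = TRUE
  )

  print(matched)
  print(annotated)
  print(cmd)
}
```

Inspecting the command with show_cmd = TRUE before running a search on large files is a cheap way to check that the pattern and flags are what you intended.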

Arguments

files

Character vector of file paths to read.

path

Optional. Directory path to search for files.

file_pattern

Optional. A pattern to filter filenames when using the path argument. Passed to list.files.

pattern

Pattern to search for within files (passed to grep).

invert

Logical; if TRUE, return non-matching lines.

ignore_case

Logical; if TRUE, perform case-insensitive matching (default: FALSE).

fixed

Logical; if TRUE, pattern is a fixed string, not a regular expression.

show_cmd

Logical; if TRUE, return the grep command string instead of executing it.

recursive

Logical; if TRUE, search recursively through directories.

word_match

Logical; if TRUE, match only whole words.

show_line_numbers

Logical; if TRUE, include line numbers from source files. Headers are automatically removed and lines renumbered.

only_matching

Logical; if TRUE, return only the matching part of the lines.

nrows

Integer; maximum number of rows to read.

skip

Integer; number of rows to skip.

header

Logical; if TRUE, treat the first row as a header. If FALSE, the first row is read as a row of data.

col.names

Character vector of column names.

include_filename

Logical; if TRUE, include source filename as a column.

show_progress

Logical; if TRUE, show progress indicators.

...

Additional arguments passed to fread.
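
As an illustration of how several of these arguments might combine, the sketch below searches a directory rather than a fixed list of files. It assumes grepreaper is installed and grep is on the PATH; the directory, file names, and contents are hypothetical, and the behaviour when files is left as NULL and path is supplied is inferred from the argument descriptions above rather than verified.

```r
# Sketch only: assumes grepreaper is installed and grep is on the PATH.
# Directory, file names, and contents are hypothetical.
log_dir <- file.path(tempdir(), "grep_read_example")
dir.create(log_dir, showWarnings = FALSE)
writeLines(
  c("level,message", "ERROR,disk full", "info,started"),
  file.path(log_dir, "app1.csv")
)
writeLines(
  c("level,message", "error,timeout", "INFO,stopped"),
  file.path(log_dir, "app2.csv")
)

if (requireNamespace("grepreaper", quietly = TRUE) && nzchar(Sys.which("grep"))) {
  errors <- grepreaper::grep_read(
    path = log_dir,              # directory to search for files
    file_pattern = "\\.csv$",    # filename filter, passed to list.files
    pattern = "error",           # text to search for within each file
    ignore_case = TRUE,          # match "ERROR" as well as "error"
    include_filename = TRUE,     # record which file each row came from
    nrows = 100                  # cap on the number of rows read
  )
  print(errors)
}
```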