Extracts metadata from file names and directory structure using regular expressions, then checks and cleans it.
clean_metadata(
  project_dir = NULL,
  project_files = NULL,
  file_type = "wav",
  subset = NULL,
  subset_type = "keep",
  pattern_site_id = create_pattern_site_id(),
  pattern_aru_id = create_pattern_aru_id(),
  pattern_date = create_pattern_date(),
  pattern_time = create_pattern_time(),
  pattern_dt_sep = create_pattern_dt_sep(),
  pattern_tz_offset = create_pattern_tz_offset(),
  order_date = "ymd",
  quiet = FALSE
)
Data frame with extracted metadata
project_dir: Character. Directory where project files are stored. File paths will be used to extract information and must actually exist.
project_files: Character. Vector of project file paths. These paths can be absolute or relative to the working directory, and don't actually need to point to existing files unless you plan to use clean_gps() or other sampling steps down the line. Must be provided if project_dir is NULL.
file_type: Character. Type of file (extension) to summarize. Default "wav".
subset: Character. Text pattern to mark a subset of files/directories to either "keep" or "omit" (see subset_type).
subset_type: Character. Either "keep" (default) or "omit" files/directories which match the pattern in subset.
pattern_site_id: Character. Regular expression to extract site ids. See create_pattern_site_id(). Can be a vector of multiple patterns to match.
pattern_aru_id: Character. Regular expression to extract ARU ids. See create_pattern_aru_id(). Can be a vector of multiple patterns to match.
pattern_date: Character. Regular expression to extract dates. See create_pattern_date(). Can be a vector of multiple patterns to match.
pattern_time: Character. Regular expression to extract times. See create_pattern_time(). Can be a vector of multiple patterns to match.
pattern_dt_sep: Character. Regular expression to mark separators between dates and times. See create_pattern_dt_sep().
pattern_tz_offset: Character. Regular expression to extract time zone offsets from file names. See create_pattern_tz_offset().
order_date: Character. Order that the date appears in. "ymd" (default), "mdy", or "dmy". Can be a vector of multiple patterns to match.
quiet: Logical. Whether to suppress progress messages and other non-essential updates.
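As a rough illustration of how subset and subset_type interact, the base R sketch below marks paths that match a text pattern and then either keeps or omits them. It assumes the pattern is matched against the full file path with grepl(); the package's actual internals may differ, and the file paths are made up for the example.

```r
# Hypothetical file paths (not from the package)
files <- c("a/P01_1/rec1.wav", "a/P02_1/rec2.wav", "a/P02_3/rec3.wav")

# Text pattern marking the subset of interest
subset <- "P02"

# Assumption: the pattern is tested against the whole path
matched <- grepl(subset, files)

kept    <- files[matched]   # what subset_type = "keep" would retain
omitted <- files[!matched]  # what subset_type = "omit" would retain
```

With these paths, "keep" retains the two P02 recordings and "omit" retains only the P01 recording.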
Note that times are extracted by first combining the date, date/time separator and the time patterns. This means that if there is a problem with this combination, dates might be extracted but date/times will not. This mismatch can be used to determine which part of a pattern needs to be tweaked.
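The mismatch described above can be reproduced with a minimal base R sketch. The patterns and file names here are simplified, hypothetical stand-ins (not the defaults returned by the create_pattern_*() helpers), and the extraction helper is only an assumption about how such a combination could be applied.

```r
# Simplified, hypothetical patterns (not the package defaults)
pattern_date   <- "[0-9]{8}"
pattern_dt_sep <- "(_|T)"
pattern_time   <- "[0-9]{6}"

# Hypothetical file names: the second uses "-" as the date/time separator,
# which pattern_dt_sep does not cover
files <- c("P01_1/20200502T053000_ARU.wav",
           "P02_3/20200503-061500_ARU.wav")

# Return the first match of `pattern` in each element of `x`, or NA if none
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# Dates are matched with the date pattern alone
dates <- extract_first(pattern_date, files)

# Date/times are matched with date + separator + time combined
date_times <- extract_first(
  paste0(pattern_date, pattern_dt_sep, pattern_time), files
)

data.frame(file = files, date = dates, date_time = date_times)
```

In this sketch the second file gets a date but an NA date/time, which points to the separator pattern as the part that needs tweaking.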
See vignette("customizing", package = "ARUtools") for details on customizing clean_metadata() for your project.
clean_metadata(project_files = example_files)
clean_metadata(project_files = example_files, subset = "P02")