
LightLogR (version 0.9.2)

remove_partial_data: Remove groups that have too few data points

Description

This function removes groups from a dataframe that do not have sufficient data points. Groups consisting of a single data point are always removed; such groups are common after using aggregate_Datetime().
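As a rough illustration of what removing single-data-point groups means, here is a minimal base-R sketch. It is not LightLogR's implementation; the data frame, the Id grouping column, and the values are hypothetical.

```r
# Minimal base-R sketch (not LightLogR's code): drop groups that contain
# only a single row, as can happen after aggregation.
# The data frame below is hypothetical.
df <- data.frame(
  Id   = c("A", "A", "A", "B"),
  MEDI = c(10, 12, 11, 500)
)

# Count rows per group, then keep only groups with more than one row
group_size <- table(df$Id)
keep_ids   <- names(group_size)[group_size > 1]
reduced    <- df[df$Id %in% keep_ids, ]
reduced
```

Group B, which has only one row, is dropped, while the three rows of group A remain.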

Usage

remove_partial_data(
  dataset,
  Variable.colname = Datetime,
  threshold.missing = 0.2,
  by.date = FALSE,
  Datetime.colname = Datetime,
  show.result = FALSE,
  handle.gaps = FALSE
)

Value

If show.result = FALSE (the default), a reduced dataframe without the groups that did not have sufficient data. If show.result = TRUE, a summary of the data instead.

Arguments

dataset

A light logger dataset. Expects a dataframe. If not imported by LightLogR, take care to choose sensible variables for the Datetime.colname and Variable.colname.

Variable.colname

Column name that contains the variable for which to assess sufficient data points. Expects a symbol. Needs to be part of the dataset. Default is Datetime, which only makes sense when groups consisting of a single data point need to be removed.

threshold.missing

either

  • the proportion of missing data above which a group gets removed (e.g., the default 0.2 corresponds to 20%). Expects a numeric scalar.

  • the duration of missing data above which a group gets removed. Expects either a lubridate::duration() or a character string that can be converted to one, e.g., "30 mins".
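The two forms of threshold.missing can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper too_much_missing() is hypothetical, missing data are assumed to be explicit NA values at a regular interval, a group is assumed to be removed only when the threshold is exceeded, and base-R difftime objects stand in for lubridate::duration() to keep the sketch dependency-free.

```r
# Hypothetical helper, for illustration only: does a group have too much
# missing data? `threshold` is either a proportion (numeric, e.g. 0.2) or a
# maximum missing duration given as a base-R difftime (assumption: stands in
# for lubridate::duration()).
too_much_missing <- function(values, interval, threshold) {
  n_missing <- sum(is.na(values))
  if (inherits(threshold, "difftime")) {
    # Duration form: total missing time = missing rows * regular interval
    missing_secs <- n_missing * as.numeric(interval, units = "secs")
    missing_secs > as.numeric(threshold, units = "secs")
  } else {
    # Proportion form: share of missing rows within the group
    n_missing / length(values) > threshold
  }
}

# 10 rows at a 15-minute interval, 3 of them missing (30%, i.e. 45 minutes)
x    <- c(1, NA, 3, NA, 5, 6, NA, 8, 9, 10)
step <- as.difftime(15, units = "mins")

too_much_missing(x, step, 0.2)                              # TRUE: 30% exceeds 20%
too_much_missing(x, step, as.difftime(60, units = "mins"))  # FALSE: 45 min stay below 60 min
```

The same group can therefore pass or fail depending on which form of threshold is used, which is why the examples further down produce different results for "2 days".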

by.date

Logical. Should the data be (additionally) grouped by day? Defaults to FALSE. The additional grouping does not persist beyond the function call.
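The idea behind a per-day assessment can be sketched in base R: a temporary date column is added, the share of missing values is computed per Id and day, and the helper column is dropped again. This is a minimal sketch, not LightLogR's code; the data, the Date helper column, and the 20% cutoff applied per day are assumptions for illustration.

```r
# Minimal base-R sketch of per-day assessment (illustration only).
# Hypothetical data: one participant, hourly readings over two days,
# with most of day 2 missing.
times <- seq(as.POSIXct("2023-08-28 00:00", tz = "UTC"),
             by = "1 hour", length.out = 48)
df <- data.frame(Id = "P1", Datetime = times, MEDI = 100)
df$MEDI[30:45] <- NA  # 16 of 24 hours missing on day 2

# Temporary grouping by Id and calendar day
df$Date <- as.Date(df$Datetime, tz = "UTC")
df$share_missing <- ave(as.numeric(is.na(df$MEDI)), df$Id, df$Date,
                        FUN = mean)

# Keep Id/day combinations with at most 20% missing; drop the helper columns
# so the extra grouping does not persist
kept <- df[df$share_missing <= 0.2, c("Id", "Datetime", "MEDI")]
```

Only the complete first day survives; the second day, with two thirds of its values missing, is removed while the participant as a whole is kept.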

Datetime.colname

Column name that contains the datetime. Defaults to Datetime, which is automatically correct for data imported with LightLogR. Expects a symbol. Needs to be part of the dataset. Must be of type POSIXct.

show.result

Logical, whether the output of the function is a summary of the data (TRUE) or the reduced dataset (FALSE, the default).

handle.gaps

Logical, whether the data should be processed with gap_handler(). Defaults to FALSE. If TRUE, gap_handler() is called with the argument full.days = TRUE.
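Gap handling matters because rows that are absent altogether (implicit gaps) do not show up as NA values, so they cannot be counted as missing. The following base-R sketch makes implicit gaps explicit by joining hypothetical 10-minute data onto a complete time grid. It illustrates the general idea only and is not gap_handler()'s implementation.

```r
# Minimal base-R sketch of making implicit gaps explicit (illustration only).
# Hypothetical 10-minute data where the 08:30 and 08:40 rows are absent.
obs <- data.frame(
  Datetime = as.POSIXct("2023-08-28 08:00", tz = "UTC") +
    c(0, 10, 20, 50, 60) * 60,
  MEDI = c(120, 130, 125, 140, 150)
)

# Counting NA values in the raw data finds nothing missing
sum(is.na(obs$MEDI))  # 0

# Build the full regular time grid and left-join the observations onto it
grid <- data.frame(Datetime = seq(min(obs$Datetime), max(obs$Datetime),
                                  by = "10 min"))
filled <- merge(grid, obs, by = "Datetime", all.x = TRUE)

# The absent rows now appear as explicit NA values
sum(is.na(filled$MEDI))  # 2
```

Without this step, a missing-data threshold would only see explicit NA values, which is consistent with the warning mentioned in the examples when implicit gaps are present.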

Examples

library(LightLogR)

#create sample data with gaps
gapped_data <-
  sample.data.environment |>
  dplyr::filter(MEDI < 30000)

#check their status, based on the MEDI variable
gapped_data |> remove_partial_data(MEDI, handle.gaps = TRUE, show.result = TRUE)

#the function will produce a warning if implicit gaps are present
gapped_data |> remove_partial_data(MEDI, show.result = TRUE)

#one group (Environment) exceeds the default threshold of 20% missing data and is removed
gapped_data |> remove_partial_data(MEDI, handle.gaps = TRUE) |> dplyr::count(Id)
#for comparison
gapped_data |> dplyr::count(Id)
#with a different threshold, e.g., up to 2 days of missing data allowed, the results differ
gapped_data |>
  remove_partial_data(MEDI, handle.gaps = TRUE, threshold.missing = "2 days") |>
  dplyr::count(Id)

#removal can also be assessed per day within groups
gapped_data |>
  remove_partial_data(MEDI, handle.gaps = TRUE, by.date = TRUE, show.result = TRUE) |>
  head()
