
siftr (version 1.1.0)

options_sift: Set and get options related to how sift() runs.

Description

  • sift_limit (Integer; default 25)

    • How many matches should sift() print? This saves you from locking up R by accidentally printing a summary that is thousands of columns long.

  • sift_guessmax (Integer; default 1000)

    • Running summary statistics on very large dataframes (hundreds of columns, millions of rows) can take a long time. This option controls the point at which sift() decides that a dataframe has too many rows to use as-is, and starts randomly sampling from it instead.

    • For any dataframe with nrow() <= sift_guessmax, the entirety of each column is used for summary stats like "Missing %" and "Peek at unique values". Above this row count, n = sift_guessmax elements of each column are randomly sampled without replacement to compute these stats, and a warning glyph is shown alongside them to indicate that they are estimates.

    • Factor variables are never sampled; their levels are used in full.

  • sift_peeklength (Integer; default 3000)

    • When sift() creates a dictionary, it generates a "peek" that previews the unique values of each column. Query results show only a small part of that peek, but the full peek is used when searching the dictionary. This option controls how long (in characters) the full peek is allowed to be. The practical maximum is around 30,000 characters. The default of 3000 characters is about as long as a 1-page Word document at default settings.
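To make the interaction between sift_guessmax and sift_peeklength concrete, here is a minimal base-R sketch of the behaviour described above. The helper name build_peek(), the ", " separator, and the use of substr() for truncation are all assumptions made for illustration; this is not siftr's actual implementation.

```r
# Hypothetical helper for illustration; not siftr's actual implementation.
build_peek <- function(x, guessmax = 1000L, peeklength = 3000L) {
  estimated <- FALSE
  if (is.factor(x)) {
    # Factors are never sampled: every level is used.
    values <- levels(x)
  } else {
    if (length(x) > guessmax) {
      # Too many rows: sample guessmax elements without replacement.
      x <- sample(x, size = guessmax, replace = FALSE)
      estimated <- TRUE
    }
    values <- unique(x)
  }
  # Assumed format: values joined by ", " and cut to peeklength characters.
  peek <- substr(paste(values, collapse = ", "), 1, peeklength)
  list(peek = peek, estimated = estimated)
}

set.seed(42)
small <- build_peek(c("a", "b", "a"))                    # used in full
big   <- build_peek(seq_len(50000), peeklength = 40L)    # sampled and truncated
fct   <- build_peek(factor(rep(c("x", "y"), 5000)))      # factor: levels only
```

In this sketch the estimated flag plays the role of the warning glyph: it is TRUE only when the column was sampled.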

Usage

options_sift(
  key = c("sift_limit", "sift_guessmax", "sift_peeklength"),
  val = NULL
)

Value

The option's value. If invoked with no arguments (options_sift()), prints the status of all options to the console and returns NULL.

Arguments

key

(String) The name of an option: one of "sift_limit", "sift_guessmax", or "sift_peeklength".

val

(Optional) A new value for the option, if you want to change it.
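The get/set behaviour described by these arguments can be approximated with a thin wrapper around base R's options() and getOption(). The function below, options_demo(), is a hypothetical stand-in written only to illustrate the calling convention; how siftr validates keys, and what it returns after setting a value, are assumptions here.

```r
# Hypothetical stand-in for illustration; this is not siftr's source code.
options_demo <- function(key = c("sift_limit", "sift_guessmax", "sift_peeklength"),
                         val = NULL) {
  if (missing(key)) {
    # No arguments: print every option and return NULL.
    for (k in key) cat(k, "=", format(getOption(k)), "\n")
    return(invisible(NULL))
  }
  key <- match.arg(key)  # assumed validation: reject unknown option names
  if (!is.null(val)) {
    # Store the new value under the given name via options().
    do.call(options, stats::setNames(list(val), key))
  }
  getOption(key)
}

previous <- getOption("sift_limit")
options_demo("sift_limit", 50)   # set, then return the value now stored
options_demo("sift_limit")       # get
options(sift_limit = previous)   # restore whatever was there before
```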

Examples

options_sift("sift_limit")       # Return the option's current value.
options_sift("sift_limit", 100)  # Change the value to something else.
options_sift("sift_limit", 25)   # Change it back.

# Options set with this function are stored in R's options() interface.
options("sift_limit")
getOption("sift_limit")
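Because these options live in R's options() interface, base R alone is enough to change one temporarily and restore it afterwards. The sketch below relies only on the documented behaviour of options(), which returns the previous values invisibly; the helper name with_sift_limit() is hypothetical and not part of siftr.

```r
# Hypothetical helper: evaluate `code` with sift_limit temporarily changed.
with_sift_limit <- function(limit, code) {
  old <- options(sift_limit = limit)  # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them even if `code` errors
  force(code)                         # lazily evaluated, so it sees the new value
}

inside <- with_sift_limit(100, getOption("sift_limit"))
inside
```

The argument is only evaluated at force(code), after the option has been set, so the expression sees the temporary value while the caller's setting is left unchanged on exit.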
