
tidycells (version 0.2.2)

collate_columns: Collate Columns Based on Content

Description

Applied after compose_cells, this function rearranges and renames attribute columns so that they are properly aligned, based on the content of those columns.

Usage

collate_columns(
  composed_data,
  combine_threshold = 1,
  rest_cols = Inf,
  retain_other_cols = FALSE,
  retain_cell_address = FALSE
)

Arguments

composed_data

Output of compose_cells (preferably unprocessed).

combine_threshold

A numeric threshold (between 0 and 1) for content-based collation of columns. (Default 1)

rest_cols

Number of remaining ("rest") columns to keep beyond those joined based on combine_threshold. (Default Inf)

retain_other_cols

Whether to keep other intermediate (and possibly less important) columns. (Default FALSE)

retain_cell_address

Whether to keep cell-address columns such as row, col and data_block. These may be required for traceback. (Default FALSE)
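To build some intuition for combine_threshold, the base-R sketch below scores how much content two attribute columns share and combines them only when that score reaches the threshold. It is a hypothetical illustration, not the package's actual algorithm: the helper names content_overlap and should_combine, and the use of Jaccard overlap of distinct values as the similarity measure, are assumptions made for this example.

```r
# Hypothetical sketch, NOT tidycells internals: Jaccard overlap of the
# distinct values in two attribute columns, a number in [0, 1].
content_overlap <- function(x, y) {
  ux <- unique(as.character(x))
  uy <- unique(as.character(y))
  length(intersect(ux, uy)) / length(union(ux, uy))
}

# Hypothetical rule: combine two columns when their overlap meets the threshold.
should_combine <- function(x, y, combine_threshold = 1) {
  content_overlap(x, y) >= combine_threshold
}

col_a <- c("Math", "Physics", "Chemistry")
col_b <- c("Physics", "Math", "Chemistry", "Math")
col_c <- c("Math", "Biology")

should_combine(col_a, col_b)                           # identical value sets
should_combine(col_a, col_c)                           # overlap is 1/4
should_combine(col_a, col_c, combine_threshold = 0.2)  # lower threshold
```

Under this sketch, a threshold of 1 combines only columns whose distinct values coincide exactly, while lower thresholds tolerate partial overlap.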

Value

A data.frame with collated columns.

Details

  • Dependency on stringdist: If the stringdist package is installed, approximate string matching is enhanced. As a result, the outcome may differ depending on whether stringdist is installed.

  • Possibility of randomness: If an attribute column contains many distinct values, a representative sample of that column is drawn. It is therefore recommended to call set.seed beforehand whenever reproducibility matters.
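Both points above can be illustrated in base R. The sketch below first shows why set.seed makes sampling of distinct values reproducible, then shows how an approximate string comparison can give different numbers depending on whether stringdist is available. The helpers sample_distinct and string_dissimilarity, the sample size max_n, and the choice of Jaro-Winkler versus scaled edit distance are all assumptions for illustration; they are not the functions or methods tidycells uses internally.

```r
# Hypothetical helper, NOT part of tidycells: draw a representative sample
# of the distinct values in an attribute column when there are too many.
sample_distinct <- function(x, max_n = 5) {
  u <- unique(x)
  if (length(u) <= max_n) return(u)
  sample(u, max_n)
}

vals <- paste0("value_", 1:100)

# Resetting the same seed before each call yields the same sample.
set.seed(123)
s1 <- sample_distinct(vals)
set.seed(123)
s2 <- sample_distinct(vals)
identical(s1, s2)

# Hypothetical dissimilarity for two single strings, NOT tidycells internals:
# use stringdist's Jaro-Winkler distance when the package is installed,
# otherwise fall back to base R's edit distance scaled to [0, 1].
string_dissimilarity <- function(a, b) {
  if (requireNamespace("stringdist", quietly = TRUE)) {
    stringdist::stringdist(a, b, method = "jw")
  } else {
    as.numeric(utils::adist(a, b)) / max(nchar(a), nchar(b), 1)
  }
}

string_dissimilarity("Mathematics", "Mathematic")
```

The two branches of string_dissimilarity return different values for the same pair of strings, which mirrors how results can vary with and without stringdist installed.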

Examples

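library(tidycells)

# Load the example cell-level data shipped with tidycells
d <- readRDS(
  system.file("extdata", "marks_cells.rds", package = "tidycells", mustWork = TRUE)
)

# Classify cells based on numeric content, then analyze the cell structure
d <- numeric_values_classifier(d)
da <- analyze_cells(d)

# Compose the analyzed cells, printing an overview of the attributes
dc <- compose_cells(da, print_attribute_overview = TRUE)

# Collation may draw a random sample, so fix the seed for reproducibility
set.seed(1)
collate_columns(dc)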
