StatsTFLValR (version 1.0.0)

get_column_info: Extract Column Metadata from a Data Frame

Description

Inspects a data frame and returns a summary of metadata for each column, including column name, label, format, class/type, missingness, uniqueness, and (optionally) SAS-style display for Date variables (e.g., DATE9 -> 09JUL2012).
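
The DATE9 rendering mentioned above can be approximated in base R. The helper below, sas_date9(), is hypothetical and not part of StatsTFLValR. It builds the month abbreviation from the built-in month.abb constant, so unlike format(x, "%b") its output does not depend on the session locale.

```r
# Hypothetical helper (not exported by StatsTFLValR): format Date values
# in SAS DATE9 style, e.g. 2012-07-09 -> "09JUL2012".
sas_date9 <- function(x) {
  x <- as.Date(x)
  out <- paste0(
    format(x, "%d"),                                  # zero-padded day
    toupper(month.abb[as.integer(format(x, "%m"))]),  # English month, upper case
    format(x, "%Y")                                   # four-digit year
  )
  out[is.na(x)] <- NA_character_                      # keep missing dates missing
  out
}

sas_date9(as.Date(c("2012-07-09", "2024-01-03", NA)))
# "09JUL2012" "03JAN2024" NA
```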

Usage

get_column_info(
  df,
  include_attributes = TRUE,
  exclude_attributes = c("class", "row.names"),
  label_attr = c("label", "var.label", "labelled", "Label"),
  format_attr = c("format", "format.sas", "Format", "displayWidth"),
  compute_ranges = TRUE,
  sas_date_display = TRUE
)

Value

A tibble with one row per column of df, containing the following metadata fields:

  • column: Column name

  • label: Label attribute (if present)

  • format: Format attribute (if present; e.g., DATE9.)

  • class: Class(es)

  • typeof: Underlying storage type

  • n: Total number of values (column length)

  • n_missing: Number of NAs

  • n_unique: Number of unique values

  • min_raw/max_raw: Minimum/maximum as raw values (numeric and Date/datetime columns, when compute_ranges = TRUE)

  • min_disp/max_disp: Minimum/maximum as display strings (SAS-like for dates when sas_date_display = TRUE)

  • sample_disp: First non-missing value as a display string (SAS-like for dates when sas_date_display = TRUE)

  • attribute_names: Comma-separated attribute names (after exclusions)

  • attributes: List-column of attributes after exclusions (only when include_attributes = TRUE)
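
The count fields listed above (class, typeof, n, n_missing, n_unique) can be sketched in base R as follows. summarise_columns() is a hypothetical illustration, not the package's implementation: it returns a data.frame rather than a tibble, omits labels, formats, and ranges, and counts NA as one distinct value in n_unique, which may differ from what get_column_info() does.

```r
# Hypothetical sketch of the per-column counts; not StatsTFLValR's code.
summarise_columns <- function(df) {
  data.frame(
    column    = names(df),
    class     = vapply(df, function(x) paste(class(x), collapse = ", "), character(1)),
    typeof    = vapply(df, typeof, character(1)),
    n         = vapply(df, length, integer(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    # NA counts as one distinct value here (an assumption of this sketch)
    n_unique  = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

summarise_columns(data.frame(AGE = c(45, 50, NA), TRTAN = c(1L, 2L, 1L)))
```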

Arguments

df

A data.frame or tibble. The input dataset whose column metadata should be extracted.

include_attributes

Logical. If TRUE, includes a list-column of full attributes (after exclusions).

exclude_attributes

Character vector of attribute names to drop from the attributes list.
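
Dropping excluded attributes and building the comma-separated attribute_names value can be sketched with base R's attributes(). kept_attributes() is a hypothetical helper that uses the documented default exclusions; it illustrates the idea and is not the package's code.

```r
# Hypothetical helper: a column's attributes minus the excluded names
kept_attributes <- function(x, exclude = c("class", "row.names")) {
  a <- attributes(x)
  if (is.null(a)) return(list())   # plain vectors carry no attributes
  a[setdiff(names(a), exclude)]    # keeps original order, drops excluded names
}

arm <- factor(c("Placebo", "Active"))
attr(arm, "label") <- "Treatment arm"

kept <- kept_attributes(arm)
paste(names(kept), collapse = ", ")  # "levels, label"
```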

label_attr

Character vector of attribute names to check (in order) for a label.

format_attr

Character vector of attribute names to check (in order) for a format.
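
Checking label_attr and format_attr "in order" amounts to a first-match lookup. first_attr() below is a hypothetical sketch of that rule, not the package's internal function. Passing exact = TRUE matters: without it, attr(x, "format") would partially match an attribute named format.sas and bypass the intended order.

```r
# Hypothetical helper: value of the first attribute in `candidates` present on x
first_attr <- function(x, candidates) {
  for (nm in candidates) {
    val <- attr(x, nm, exact = TRUE)   # exact = TRUE disables partial matching
    if (!is.null(val)) return(as.character(val)[1])
  }
  NA_character_                        # none of the candidates is set
}

age <- structure(c(45, 50, NA), var.label = "Age (years)", format.sas = "8.")

first_attr(age, c("label", "var.label", "labelled", "Label"))         # "Age (years)"
first_attr(age, c("format", "format.sas", "Format", "displayWidth"))  # "8."
first_attr(1:3, c("label", "var.label"))                              # NA
```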

compute_ranges

Logical. If TRUE, computes min/max for numeric and date/datetime types.
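
The range rule can be sketched as below. column_range() is hypothetical: it returns NULL for columns that are not numeric, Date, or POSIXct, and also for all-missing columns, where range(x, na.rm = TRUE) would otherwise warn and return infinite values.

```r
# Hypothetical helper: min/max for numeric, Date and POSIXct columns, else NULL
column_range <- function(x) {
  is_rangeable <- is.numeric(x) || inherits(x, c("Date", "POSIXct"))
  if (!is_rangeable || all(is.na(x))) return(NULL)
  range(x, na.rm = TRUE)               # keeps the Date/POSIXct class
}

column_range(c(45, 50, NA))                           # 45 50
column_range(as.Date(c("2024-01-03", "2024-01-01")))  # "2024-01-01" "2024-01-03"
column_range(c("01", "02", "03"))                     # NULL (character column)
```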

sas_date_display

Logical. If TRUE, Date and POSIXct values in the display columns (min_disp, max_disp, sample_disp) are rendered SAS-style (e.g., 09JUL2012).

Examples

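library(StatsTFLValR)

# Small example dataset: a character ID, a numeric column with one NA,
# an integer code, and a Date column
df <- data.frame(
  USUBJID = c("01", "02", "03"),
  AGE     = c(45, 50, NA),
  TRTAN   = c(1L, 2L, 1L),
  ASTDT   = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
  stringsAsFactors = FALSE
)

# One row of metadata per column of df
get_column_info(df)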