
pointblank (version 0.5.1)

scan_data: Thoroughly scan the table data so as to understand it better

Description

Generates an HTML report that examines the input table data in detail. Before calling up an agent to validate the data, it's a good idea to understand the data with some level of precision. Make this the initial step of a well-balanced data quality reporting workflow. To keep the output digestible, the report is divided into the following sections:

Overview

Table dimensions, duplicate row count, column types, and reproducibility information

Variables

A summary for each table variable and further statistics and summaries depending on the variable type

Interactions

A matrix plot that shows interactions between variables

Correlations

A set of correlation matrix plots for numerical variables

Missing Values

A summary figure that shows the degree of missingness across variables

Sample

A table that provides the head and tail rows of the dataset

The output HTML report is viewable in the RStudio Viewer and can also be integrated in R Markdown HTML reports. If you need the output HTML as a string, it's possible to get that by using as.character() (e.g., scan_data(tbl = mtcars) %>% as.character()). The resulting HTML string is a complete HTML document where Bootstrap and jQuery are embedded within.
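As a minimal sketch of the as.character() route described above, the report can be converted to a string and written to disk. The names out_file and html_string and the temporary-directory path are illustrative assumptions, and the call is guarded so the snippet does nothing when pointblank is not installed.

```r
# Sketch only: `out_file` and `html_string` are illustrative names,
# not part of the pointblank API.
out_file <- file.path(tempdir(), "table_scan.html")

if (requireNamespace("pointblank", quietly = TRUE)) {
  # Build the report, then convert it to a complete HTML document string
  report <- pointblank::scan_data(tbl = mtcars)
  html_string <- as.character(report)

  # Write the standalone document so it can be opened in any browser
  writeLines(html_string, out_file)
}
```

Because the string is a complete document with Bootstrap and jQuery embedded, the written file should not need any accompanying assets.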

Usage

scan_data(
  tbl,
  sections = c("overview", "variables", "interactions", "correlations", "missing",
    "sample"),
  navbar = TRUE,
  reporting_lang = NULL
)

Arguments

tbl

The input table. This can be a data frame, a tibble, a tbl_dbi object, or a tbl_spark object.

sections

The sections to include in the finalized Table Scan report. A character vector of section names is required here. The sections in their default order are: "overview", "variables", "interactions", "correlations", "missing", and "sample". The vector can contain fewer elements, and the order can be changed to suit the desired layout of the report. For tbl_dbi and tbl_spark objects, the "interactions" and "correlations" sections are excluded.
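The section rules above can be sketched as a small helper. choose_sections() is a hypothetical function written for this illustration and is not part of pointblank. It only mirrors the documented behavior (a subset in any order, with "interactions" and "correlations" dropped for tbl_dbi and tbl_spark inputs) so that the requested layout can be inspected before the report is built.

```r
# The six section names in their documented default order
all_sections <- c("overview", "variables", "interactions",
                  "correlations", "missing", "sample")

# Hypothetical helper (not a pointblank function): validate the requested
# sections and drop the two that are excluded for database-backed tables.
choose_sections <- function(tbl, wanted = all_sections) {
  stopifnot(is.character(wanted), all(wanted %in% all_sections))
  if (inherits(tbl, c("tbl_dbi", "tbl_spark"))) {
    # setdiff() keeps the order of its first argument
    wanted <- setdiff(wanted, c("interactions", "correlations"))
  }
  wanted
}

# A shorter report with the sample shown first
layout <- choose_sections(mtcars, c("sample", "overview", "missing"))

if (requireNamespace("pointblank", quietly = TRUE)) {
  report <- pointblank::scan_data(tbl = mtcars, sections = layout)
}
```

For an in-memory data frame the helper returns the requested vector unchanged, so it can be passed straight to the sections argument.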

navbar

Should there be a navigation bar anchored to the top of the report page? By default this is TRUE.

reporting_lang

The language to use for label text in the report. The default, NULL, produces English ("en") text. Other options include French ("fr"), German ("de"), Italian ("it"), and Spanish ("es").
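The navbar and reporting_lang arguments can be combined as in the following sketch. check_lang() and report_langs are hypothetical names introduced for illustration. They only encode the language codes listed above, so that an unsupported code fails early. The scan_data() call is guarded in case pointblank is not installed.

```r
# Language codes listed in the argument description above
report_langs <- c(English = "en", French = "fr", German = "de",
                  Italian = "it", Spanish = "es")

# Hypothetical helper (not a pointblank function): pass NULL through
# unchanged and reject any code that is not in the documented list.
check_lang <- function(lang = NULL) {
  if (is.null(lang)) return(NULL)
  stopifnot(is.character(lang), length(lang) == 1L)
  lang <- tolower(lang)
  if (!lang %in% report_langs) {
    stop("Unsupported reporting language: ", lang)
  }
  lang
}

if (requireNamespace("pointblank", quietly = TRUE)) {
  # A French-language report without the top navigation bar
  report_fr <- pointblank::scan_data(
    tbl = mtcars,
    sections = c("overview", "sample"),
    navbar = FALSE,
    reporting_lang = check_lang("fr")
  )
}
```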

Function ID

1-1

See Also

Other Planning and Prep: action_levels(), col_schema(), create_agent(), validate_rmd()

Examples

# NOT RUN {
# Get an HTML report that describes all of
# the data in the `dplyr::storms` dataset
# scan_data(tbl = dplyr::storms)

# }
