Generates an HTML report that scours the input table data. Before calling up an agent to validate the data, it's a good idea to understand the data with some level of precision. Make this the initial step of a well-balanced data quality reporting workflow. The report is divided into several sections to make it more digestible:
Table dimensions, duplicate row count, column types, and reproducibility information
A summary for each table variable and further statistics and summaries depending on the variable type
A matrix plot that shows interactions between variables
A set of correlation matrix plots for numerical variables
A summary figure that shows the degree of missingness across variables
A table that provides the head and tail rows of the dataset
The output HTML report is viewable in the RStudio Viewer and can also be integrated in R Markdown HTML reports. If you need the output HTML as a string, it's possible to get that by using as.character() (e.g., scan_data(tbl = mtcars) %>% as.character()). The resulting HTML string is a complete HTML document where Bootstrap and jQuery are embedded within.
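As a minimal sketch of the as.character() route described above, the HTML string can be written to a file for sharing. The helper save_html_string() and the output file name are hypothetical and not part of pointblank; the scan itself only runs in an interactive session with pointblank installed.

```r
# Hypothetical helper (not part of pointblank): write an HTML string to a file
save_html_string <- function(html, path) {
  writeLines(html, con = path)
  invisible(path)
}

if (interactive() && requireNamespace("pointblank", quietly = TRUE)) {
  # Build the report, then convert it to a single HTML string
  report <- pointblank::scan_data(tbl = mtcars)
  html <- as.character(report)

  # Save the self-contained document to a temporary location
  save_html_string(html, file.path(tempdir(), "mtcars_scan.html"))
}
```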
scan_data(
  tbl,
  sections = c("overview", "variables", "interactions", "correlations", "missing",
    "sample"),
  navbar = TRUE,
  reporting_lang = NULL
)
tbl
The input table. This can be a data frame, a tibble, a tbl_dbi object, or a tbl_spark object.
sections
The sections to include in the finalized Table Scan report. A character vector of section names is required here. The sections in their default order are: "overview", "variables", "interactions", "correlations", "missing", and "sample". This vector can contain fewer elements, and the order can be changed to suit the desired layout of the report. For tbl_dbi and tbl_spark objects, the "interactions" and "correlations" sections are excluded.
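The following sketch shows one way to build a reordered subset of sections before passing it to scan_data(). The helper pick_sections() and its remote argument are hypothetical, not part of pointblank; they only mirror the rules stated above (valid names, and no "interactions" or "correlations" for tbl_dbi and tbl_spark inputs).

```r
# Section names in their default order, as listed in this documentation
all_sections <- c(
  "overview", "variables", "interactions",
  "correlations", "missing", "sample"
)

# Hypothetical helper (not part of pointblank): validate the requested
# sections and drop those that are excluded for database and Spark tables
pick_sections <- function(wanted, remote = FALSE) {
  unknown <- setdiff(wanted, all_sections)
  if (length(unknown) > 0) {
    stop("Unknown section(s): ", paste(unknown, collapse = ", "))
  }
  if (remote) {
    wanted <- setdiff(wanted, c("interactions", "correlations"))
  }
  wanted
}

# A shorter report with the sample rows shown first
sections <- pick_sections(c("sample", "overview", "missing"))

if (interactive() && requireNamespace("pointblank", quietly = TRUE)) {
  report <- pointblank::scan_data(tbl = mtcars, sections = sections)
}
```

The order of the vector is kept as given, so the report layout follows whatever order is requested.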
navbar
Should there be a navigation bar anchored to the top of the report page? By default this is TRUE.
reporting_lang
The language to use for label text in the report. By default, NULL will create English ("en") text. Other options include French ("fr"), German ("de"), Italian ("it"), and Spanish ("es").
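A short sketch of combining the navbar and reporting_lang arguments, assuming the language codes listed above. The vector supported_langs is an illustrative name and not something pointblank exports; the scan only runs in an interactive session with pointblank installed.

```r
# Language codes named in this documentation (illustrative vector)
supported_langs <- c("en", "fr", "de", "it", "es")

# Choose German labels and guard against a mistyped code
lang <- "de"
stopifnot(lang %in% supported_langs)

if (interactive() && requireNamespace("pointblank", quietly = TRUE)) {
  # Report with German label text and no navigation bar
  report <- pointblank::scan_data(
    tbl = mtcars,
    navbar = FALSE,
    reporting_lang = lang
  )
}
```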
Function ID: 1-1
Other Planning and Prep: action_levels(), col_schema(), create_agent(), validate_rmd()
# NOT RUN {
# Get an HTML report that describes all of
# the data in the `dplyr::storms` dataset
# scan_data(tbl = dplyr::storms)
# }