Summary of a data frame consisting of: variable names, labels if any, factor levels, frequencies and/or numerical summary statistics, and valid/missing observation counts.
dfSummary(x, round.digits = 2, varnumbers = TRUE,
labels.col = length(label(x, all = TRUE)) > 0, valid.col = TRUE,
na.col = TRUE, graph.col = TRUE, style = "multiline",
plain.ascii = TRUE, justify = "left", omit.headings = FALSE,
max.distinct.values = 10, trim.strings = FALSE, max.string.width = 25,
split.cells = 40, split.table = Inf, ...)
x: A data frame.
round.digits: Number of significant digits to display in numerical
summaries and in frequency proportions. Defaults to 2.
varnumbers: Logical. Should the first column contain the variable number?
Defaults to TRUE.
labels.col: Logical. If TRUE, variable labels (as defined with
rapportools, Hmisc or summarytools' label functions)
will be displayed. By default, the labels column is shown if at least
one column has a defined label.
valid.col: Logical. Include a column indicating the count and proportion of
valid (non-missing) values. TRUE by default.
na.col: Logical. Include a column indicating the count and proportion of
missing (NA) values. TRUE by default.
graph.col: Logical. Display the barplots / histograms column in html
reports. TRUE by default.
style: Style to be used by pander when
rendering the output table. Defaults to “multiline”. The only other valid
option is “grid”. Style “simple” is not supported for this particular
function, and “rmarkdown” will fall back to “multiline”.
plain.ascii: Logical. pander argument; when
TRUE, no markup characters will be generated (useful when printing
to console). Defaults to TRUE.
justify: String indicating alignment of columns; one of “l” (left), “c” (center), or “r” (right). Defaults to left alignment.
omit.headings: Logical. Set to TRUE to omit headings. Defaults to FALSE.
max.distinct.values: The maximum number of values to display frequencies for. If a variable has more distinct values than this number, the remaining frequencies will be reported as a whole, along with the number of additional distinct values. Defaults to 10.
trim.strings: Logical; for character variables, should leading and
trailing white space be removed? Defaults to FALSE. See details
section.
max.string.width: Limits the number of characters to display in the
frequency tables. Defaults to 25.
split.cells: A numeric argument passed to pander.
It is the number of characters allowed on a line before splitting the cell.
Defaults to 40.
split.table: pander argument which determines the maximum
width of a table. Keeping the default value (Inf) is recommended.
...: Additional arguments passed to pander.
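The capping behaviour described for max.distinct.values can be pictured with a small base R sketch. This is only an illustration under assumptions, not the summarytools implementation: the helper name capped_freqs and the "[ n others ]" label are hypothetical.

```r
# Hypothetical helper (not part of summarytools): keep the `max_distinct`
# most frequent values and report the remaining ones as a single line.
capped_freqs <- function(x, max_distinct = 10) {
  counts <- sort(table(x), decreasing = TRUE)          # most common first
  counts <- setNames(as.vector(counts), names(counts)) # plain named vector
  if (length(counts) <= max_distinct) {
    return(counts)
  }
  shown <- counts[seq_len(max_distinct)]
  rest  <- counts[-seq_len(max_distinct)]
  # Label is an assumption made for this sketch only
  other_label <- sprintf("[ %d others ]", length(rest))
  c(shown, setNames(sum(rest), other_label))
}

x <- c("a", "a", "a", "b", "b", "c", "d", "e")
res <- capped_freqs(x, max_distinct = 2)
print(res)
```

With max_distinct = 2, the two most frequent values keep their own counts and the three remaining values are summed into one entry.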
A data frame containing as many rows as there are columns in x,
with additional attributes that inform the print function. Columns of the
output data frame are:
Number indicating the order in which column appears in the data frame.
Name of the variable, along with its class(es).
Label of the variable (if applicable).
For factors, a list of their values, limited by the
max.distinct.values parameter. For character variables, the most
common values (in descending frequency order), also limited by
max.distinct.values. For numerical variables, common univariate
statistics (mean, std. deviation, min, med, max, IQR and CV).
For factors and character variables, the frequencies
and proportions of the values listed in the previous column. For numerical
vectors, number of distinct values, or frequency of distinct values if
their number is not greater than max.distinct.values.
An ASCII histogram for numerical variables, and an ASCII barplot for factors and character variables.
Number and proportion of valid values.
Number and proportion of missing (NA) values, including NaN values.
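As a rough guide to what the numerical statistics and the valid/missing counts listed above represent, the following base R sketch computes comparable quantities for a single vector. It is not the package's code: num_summary is a hypothetical helper, and CV is assumed here to mean standard deviation divided by mean.

```r
# Hypothetical helper (not part of summarytools): univariate statistics
# plus valid / missing counts for one numeric vector.
num_summary <- function(v, digits = 2) {
  valid <- v[!is.na(v)]        # is.na() is TRUE for both NA and NaN
  stats <- c(
    mean = mean(valid),
    sd   = sd(valid),
    min  = min(valid),
    med  = median(valid),
    max  = max(valid),
    IQR  = IQR(valid),
    CV   = sd(valid) / mean(valid)  # assumed definition of CV
  )
  list(
    stats   = round(stats, digits),
    valid   = length(valid),
    missing = sum(is.na(v))
  )
}

s <- num_summary(c(2, 4, 4, 6, NA, NaN))
print(s)
```

Because is.na() also flags NaN, both the NA and the NaN in the example are counted as missing, in line with the description of the last column.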
The default plain.ascii = TRUE option makes results
appear cleaner in the console. When rendering with rmarkdown,
set this option to FALSE.
When trim.strings is set to TRUE, trimming is done
before calculating frequencies, so the reported frequencies
reflect the trimmed values.
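The effect of trimming before counting can be seen with plain base R, independent of summarytools. The vector below is made-up illustration data, and trimws() stands in for whatever trimming the package performs internally.

```r
# Made-up values: three spellings of "yes" that differ only in white space
x <- c("yes", " yes", "yes ", "no")

raw_counts     <- table(x)          # untrimmed: each spelling counted apart
trimmed_counts <- table(trimws(x))  # trimmed first: spellings are merged

print(raw_counts)
print(trimmed_counts)
```

Without trimming there are four distinct values; after trimming, the three variants collapse into a single "yes" entry with a count of 3.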
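# NOT RUN {
library(summarytools)
data(tobacco)          # example data set shipped with summarytools
dfSummary(tobacco)
# }
# NOT RUN {
view(dfSummary(iris))  # summarytools' view(), not utils::View()
# }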