This function was inspired by the excellent skimr package for R.
See the Details and Examples sections below, and the vignettes on the modelsummary website:
https://vincentarelbundock.github.io/modelsummary/
https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html
datasummary_skim(
data,
type = "numeric",
output = "default",
fmt = "%.1f",
histogram = TRUE,
title = NULL,
notes = NULL,
align = NULL,
escape = TRUE,
...
)
data: A data.frame (or tibble).
type: the type of variables to summarize: "numeric" or "categorical" (character).
output: filename or object type (character string).
Supported filename extensions: .docx, .html, .tex, .md, .txt, .png, .jpg.
Supported object types: "default", "html", "markdown", "latex", "latex_tabular", "data.frame", "gt", "kableExtra", "huxtable", "flextable", "DT", "jupyter". The "modelsummary_list" value produces a lightweight object which can be saved and fed back to the modelsummary function.
Warning: Users should not supply a file name to the output argument if they intend to customize the table with external packages. See the 'Details' section.
LaTeX compilation requires the booktabs and siunitx packages, but siunitx can be disabled or replaced with global options. See the 'Details' section.
The default output formats and table-making packages can be modified with global options. See the 'Details' section.
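The warning about file names can be illustrated with a short sketch: request a table object, customize it with the table-making package, and save it only at the end. This assumes the modelsummary and gt packages are installed; it uses `datasummary`, which accepts the same `output` argument, and the formula, title, and file name are illustrative choices rather than anything prescribed by this page.

```r
# Hypothetical destination for the finished table.
out_file <- file.path(tempdir(), "summary.html")

if (requireNamespace("modelsummary", quietly = TRUE) &&
    requireNamespace("gt", quietly = TRUE)) {
  library(modelsummary)

  # Ask for a gt object instead of writing straight to a file.
  tab <- datasummary(mpg + hp ~ Mean + SD, data = mtcars, output = "gt")

  # Customize the object with functions from the gt package.
  tab <- gt::tab_header(tab, title = "Summary of mtcars")

  # Save only once the customization is done.
  gt::gtsave(tab, filename = basename(out_file), path = dirname(out_file))
}
```

Passing `output = "summary.html"` directly would write the file immediately and leave no object to customize.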
fmt: determines how to format numeric values
integer: the number of digits to keep after the period, as in format(round(x, fmt), nsmall=fmt)
character: passed to the sprintf function (e.g., '%.3f' keeps 3 digits with trailing zeros). See ?sprintf
function: returns a formatted character string.
NULL: does not format numbers, which allows users to include functions in the "glue" strings in the estimate and statistic arguments.
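The formatting rules above can be tried out in base R, without the package. This is a sketch of the expressions the list mentions, not of the package's internal code; the object names and the percentage formatter are made up for illustration.

```r
x <- c(3.14159, 2.5, 10)

# Integer fmt: the expression quoted in the list above, with fmt = 2.
# format() pads a vector to a common width, so trimws() may be useful.
fmt_int <- 2
as_integer_fmt <- format(round(x, fmt_int), nsmall = fmt_int)

# Character fmt: the string is handed to sprintf().
as_sprintf_fmt <- sprintf("%.3f", x)

# Function fmt: any function that returns a character vector.
# fmt_pct is a hypothetical helper that renders proportions as percentages.
fmt_pct <- function(v) sprintf("%.0f%%", 100 * v)

print(trimws(as_integer_fmt))
print(as_sprintf_fmt)
print(fmt_pct(0.256))
```

A function such as `fmt_pct` could then be supplied as `fmt = fmt_pct`, assuming the installed version accepts functions as described above.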
histogram: include a histogram (TRUE/FALSE). Supported for:
type = "numeric"
output is "html", "default", "jpg", "png", or "kableExtra"
PDF and HTML documents compiled via Rmarkdown or knitr
See the Examples section below for how to use datasummary to include histograms in other formats such as markdown.
title: string
notes: list or vector of notes to append to the bottom of the table.
align: A string with a number of characters equal to the number of columns in the table (e.g., align = "lcc"). Valid characters: l, c, r, d.
"l": left-aligned column
"c": centered column
"r": right-aligned column
"d": dot-aligned column. For LaTeX/PDF output, this option requires at least version 3.0.25 of the siunitx LaTeX package. These commands must appear in the LaTeX preamble (they are added automatically when compiling Rmarkdown documents to PDF):
\usepackage{booktabs}
\usepackage{siunitx}
\newcolumntype{d}{S[ input-open-uncertainty=, input-close-uncertainty=, parse-numbers = false, table-align-text-pre=false, table-align-text-post=false ]}
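Because the `align` string needs one character per column, it can be convenient to build it programmatically. The sketch below assumes a table with one label column followed by two statistic columns; `n_stats` and `align_spec` are illustrative names, and the `datasummary` call is guarded because it requires the modelsummary package.

```r
# One left-aligned label column followed by n_stats centered columns.
n_stats <- 2
align_spec <- paste0("l", strrep("c", n_stats))
print(align_spec)

if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  # Columns: variable name, Mean, SD (three in total, matching "lcc").
  datasummary(mpg + hp ~ Mean + SD, data = mtcars,
              align = align_spec, output = "markdown")
}
```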
escape: boolean. TRUE escapes or substitutes LaTeX/HTML characters which could prevent the file from compiling/displaying. This setting does not affect captions or notes.
...: all other arguments are passed through to the table-making functions kableExtra::kbl, gt::gt, DT::datatable, etc., depending on the output argument. This allows users to pass arguments directly to datasummary in order to affect the behavior of other functions behind the scenes.
The behavior of modelsummary can be affected by setting global options:
modelsummary_factory_default
modelsummary_factory_latex
modelsummary_factory_html
modelsummary_factory_png
modelsummary_get
modelsummary_format_numeric_latex
modelsummary_format_numeric_html
modelsummary supports five table-making packages: kableExtra, gt, flextable, huxtable, and DT. Some of these packages have overlapping functionalities. For example, three of those packages can export to LaTeX. To change the default backend used for a specific file format, you can use the options function:
options(modelsummary_factory_html = 'kableExtra')
options(modelsummary_factory_latex = 'gt')
options(modelsummary_factory_word = 'huxtable')
options(modelsummary_factory_png = 'gt')
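Since these are ordinary R options, they stay in effect for the rest of the session. A minimal base R sketch for changing a backend temporarily and then restoring the previous setting might look like this; the choice of "gt" for HTML is only an example.

```r
# options() returns the previous values invisibly, so they can be restored.
old <- options(modelsummary_factory_html = "gt")

# The option is now visible to any code that queries it.
current <- getOption("modelsummary_factory_html")
print(current)

# ... create tables here ...

# Restore whatever was set before (including "unset").
options(old)
```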
modelsummary can use two sets of packages to extract information from statistical models: the easystats family (performance and parameters) and broom. By default, it uses easystats first and then falls back on broom in case of failure. You can change the order of priorities or include goodness-of-fit extracted by both packages by setting:
options(modelsummary_get = "broom")
options(modelsummary_get = "easystats")
options(modelsummary_get = "all")
By default, LaTeX tables enclose all numeric entries in the \num{} command from the siunitx package. To prevent this behavior, or to enclose numbers in dollar signs (for LaTeX math mode), users can call:
options(modelsummary_format_numeric_latex = "plain")
options(modelsummary_format_numeric_latex = "mathmode")
A similar option can be used to display numerical entries using MathJax in HTML tables:
options(modelsummary_format_numeric_html = "mathjax")
Arel-Bundock V (2022). “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software, 103(1), 1-23. doi:10.18637/jss.v103.i01.
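if (FALSE) {
library(modelsummary)

# Recode two columns so the data contain logical and factor variables
dat <- mtcars
dat$vs <- as.logical(dat$vs)
dat$cyl <- as.factor(dat$cyl)

# Summarize the numeric variables (the default), then the categorical ones
datasummary_skim(dat)
datasummary_skim(dat, "categorical")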
# You can use `datasummary` to produce a similar table in different formats.
# Note that the `Histogram` function relies on unicode characters. These
# characters will only display correctly in some operating systems, under some
# locales, using some fonts. Displaying such histograms on Windows computers
# is notoriously tricky. The `modelsummary` authors cannot provide support to
# display these unicode histograms.
f <- All(mtcars) ~ Mean + SD + Min + Median + Max + Histogram
datasummary(f, mtcars, output="markdown")
}