Computes summary statistics (e.g., mean, standard deviation, median) for a specified column (given as a character string) in a data frame, grouped by one or more grouping variables in that data frame (also given as character strings). The statistics to report can be customized, and the results can be exported to an 'Excel' file.
f_summary(
data,
data.column,
...,
show_n = TRUE,
show_mean = TRUE,
show_sd = TRUE,
show_se = TRUE,
show_min = TRUE,
show_max = TRUE,
show_median = TRUE,
show_Q1 = TRUE,
show_Q3 = TRUE,
digits = 2,
export_to_excel = FALSE,
close_generated_files = FALSE,
open_generated_files = TRUE,
output_file = NULL,
output_dir = NULL,
save_in_wdir = FALSE,
open_excel = TRUE,
check_input = TRUE,
eval_input = FALSE,
digits_excel = NULL,
detect_int_col = TRUE
)
A data frame containing the computed summary statistics, grouped by the specified variables. This data frame can be automatically saved as an 'Excel' file using export_to_excel = TRUE.
data: A 'data.frame', 'data.table' or 'tibble', i.e. the input data to be summarized.
data.column: A character string, vector or list of character strings. The name of the column(s) in data for which summary statistics will be calculated.
...: One or more character strings specifying the grouping variables in data. At least one grouping variable must be provided.
show_n: Logical. If TRUE, the summary statistic n will be included in the output.
show_mean: Logical. If TRUE, the summary statistic mean will be included in the output.
show_sd: Logical. If TRUE, the summary statistic sd will be included in the output.
show_se: Logical. If TRUE, the summary statistic se will be included in the output.
show_min: Logical. If TRUE, the summary statistic min will be included in the output.
show_max: Logical. If TRUE, the summary statistic max will be included in the output.
show_median: Logical. If TRUE, the summary statistic median will be included in the output.
show_Q1: Logical. If TRUE, the summary statistic Q1 will be included in the output.
show_Q3: Logical. If TRUE, the summary statistic Q3 will be included in the output.
digits: Integer. Number of digits to which the results are rounded. If digits = NULL, no rounding is applied (default is digits = 2). Note that this rounding is independent of the rounding in the exported 'Excel' file.
export_to_excel: Logical. If TRUE, the (unrounded) summary results will be exported to an 'Excel' file. Default is FALSE.
close_generated_files: Logical. If TRUE, closes open 'Excel' files so that the newly generated file can be saved. Default is FALSE.
open_generated_files: Logical. If TRUE, opens the generated 'Excel' files so the results can be viewed directly after creation. Files are stored in tempdir(). Default is TRUE.
output_file: Character string specifying the name of the output file. Default is "dataname_summary.xlsx".
output_dir: Character string specifying the directory of the output file. Default is tempdir(). If output_file already contains a directory name, output_dir can be omitted; if it is given, it overrides the directory specified in output_file.
save_in_wdir: Logical. If TRUE, saves the file in the working directory. Default is FALSE, to avoid unintended changes to the global environment. If output_dir is specified, it overrides save_in_wdir.
open_excel: Logical. If TRUE and export_to_excel is also TRUE, the generated 'Excel' file will be opened automatically. Default is TRUE.
check_input: Logical. If TRUE, checks the input and stops the function if the input is incorrect. Default is TRUE.
eval_input: Logical. If TRUE, the function evaluates the third function argument, which should be a character vector with the group-by columns. Default is FALSE, which allows group-by columns to be written without quotes.
digits_excel: Integer. Round cells in the 'Excel' file to the number of digits specified. If digits_excel = NULL, no rounding is applied (default is digits_excel = NULL). Note that, to preserve formatting, numbers will be stored as text.
detect_int_col: Logical. If TRUE, columns in a data.frame containing only integers will be displayed without decimal digits. Columns containing a mix of integers and decimal values will display all values with the specified number of digits. If FALSE, each individual cell is evaluated: integer values are displayed without decimal digits, and values with decimals are displayed with the specified number of digits. Default is TRUE.
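The difference between column-wise and cell-wise integer detection, as described for detect_int_col, can be illustrated with a small base R sketch. The helper format_numeric() and its by_column argument are hypothetical names introduced here for illustration; they are not part of the package, and the package's actual formatting code may differ.

```r
# Hypothetical helper (not part of the package): formats a numeric vector
# either column-wise (by_column = TRUE) or cell-wise (by_column = FALSE).
format_numeric <- function(x, digits = 2, by_column = TRUE) {
  is_whole <- x == round(x)  # TRUE where a value has no decimal part
  if (by_column) {
    # Column-wise: drop decimals only if every value is a whole number
    if (all(is_whole)) {
      return(formatC(x, format = "f", digits = 0))
    }
    return(formatC(x, format = "f", digits = digits))
  }
  # Cell-wise: decide for each value separately
  ifelse(is_whole,
         formatC(x, format = "f", digits = 0),
         formatC(x, format = "f", digits = digits))
}

counts <- c(4, 6, 8)    # only whole numbers
mixed  <- c(4, 6.5, 8)  # whole numbers and decimals

col_counts <- format_numeric(counts)                    # "4" "6" "8"
col_mixed  <- format_numeric(mixed)                     # "4.00" "6.50" "8.00"
cell_mixed <- format_numeric(mixed, by_column = FALSE)  # "4" "6.50" "8"
```

The column-wise variant keeps a column visually aligned, while the cell-wise variant gives the most compact display for each value. Missing values are not handled in this sketch.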
Sander H. van Delden plantmind@proton.me
The function computes the following summary statistics for the specified column:
n: number of observations
mean: mean
sd: standard deviation
se: standard error of the mean
min: minimum value
max: maximum value
median: median
Q1: first quartile
Q3: third quartile
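As a cross-check, the statistics listed above can be approximated in base R. This is a minimal sketch, not the package's implementation: it assumes that se is computed as sd / sqrt(n), that quartiles follow the default of quantile() (type 7), and that missing values are dropped. The helper summarise_one() is a hypothetical name used only here.

```r
# Hypothetical helper (not part of the package): computes the nine statistics
# for one numeric vector.
# Assumptions: se = sd / sqrt(n); quartiles use quantile()'s default (type 7);
# missing values are removed before computing anything.
summarise_one <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(n      = n,
    mean   = mean(x),
    sd     = sd(x),
    se     = sd(x) / sqrt(n),
    min    = min(x),
    max    = max(x),
    median = median(x),
    Q1     = unname(quantile(x, 0.25)),
    Q3     = unname(quantile(x, 0.75)))
}

# Summarise mtcars$hp grouped by cyl
groups <- split(mtcars$hp, mtcars$cyl)
stats  <- do.call(rbind, lapply(groups, summarise_one))

# One row per group, rounded to 2 digits
summary_hp <- data.frame(cyl = names(groups), round(stats, 2), row.names = NULL)
print(summary_hp)
```

If the values returned by f_summary differ slightly from this sketch, the most likely cause is a different quantile type or a different treatment of missing values.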
Each of these summary statistics can be removed by setting e.g. show_n = FALSE. The results are grouped by the specified grouping variables and returned as a data frame. If export_to_excel is set to TRUE, the results are saved as an 'Excel' file. By default the file is written to tempdir() and named "dataname_summary.xlsx"; use output_file, output_dir or save_in_wdir to change the name or location.
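The arguments output_file, output_dir and save_in_wdir together determine where the 'Excel' file is written. The sketch below shows one way the documented precedence could be expressed in base R. The helper resolve_output_path() and its data_name argument are hypothetical, not part of the package. The ordering between a directory embedded in output_file and save_in_wdir is not documented, so the order used here is an assumption.

```r
# Hypothetical helper (not part of the package): resolves the export path.
# Documented rules: output_dir overrides a directory given in output_file and
# overrides save_in_wdir; the default directory is tempdir(); the default
# file name is "<dataname>_summary.xlsx".
# Assumption: a directory embedded in output_file takes priority over
# save_in_wdir.
resolve_output_path <- function(data_name, output_file = NULL,
                                output_dir = NULL, save_in_wdir = FALSE) {
  if (is.null(output_file)) {
    output_file <- paste0(data_name, "_summary.xlsx")
  }
  file_dir <- dirname(output_file)  # "." when no directory is given
  if (!is.null(output_dir)) {
    dir <- output_dir
  } else if (file_dir != ".") {
    dir <- file_dir
  } else if (save_in_wdir) {
    dir <- getwd()
  } else {
    dir <- tempdir()
  }
  file.path(dir, basename(output_file))
}

# Default: temporary directory and default file name
p1 <- resolve_output_path("mtcars")
# output_dir overrides the directory given in output_file
p2 <- resolve_output_path("mtcars", output_file = "results/hp.xlsx",
                          output_dir = "reports")
# save_in_wdir = TRUE with no directory given elsewhere
p3 <- resolve_output_path("mtcars", save_in_wdir = TRUE)
```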
# Example usage:
# Create a summary of mtcars for data column hp grouped by cyl and gear,
# and remove Q1 and Q3 from the output.
# Note that a variable can be written as "hp" or as hp; only the data frame itself (data) must be given without quotes.
summary_mtcars <- f_summary(mtcars, "hp", "cyl", "gear", show_Q1 = FALSE, show_Q3 = FALSE)
print(summary_mtcars)
# Create a summary for iris
summary_iris <- f_summary(iris, Sepal.Length, Species)
# Print the table with a column width of 10 characters and a table width of 70 characters
print(summary_iris, col_width = 10, table_width = 70)