Takes a data frame and returns a table of statistics with entries for each column.
# S3 method for data.frame
get_stats(
  x,
  t_skew = 2,
  t_kurt = 3.5,
  t_avail = 0.65,
  t_zero = 0.5,
  t_unq = 0.5,
  nsignif = 3,
  ...
)
Returns a data frame of statistics, with one row for each column of x.
x
: A data frame with only numeric columns.
t_skew
: Absolute skewness threshold. See details.
t_kurt
: Kurtosis threshold. See details.
t_avail
: Data availability threshold. See details.
t_zero
: A threshold between 0 and 1 for flagging indicators with a high proportion of zeroes. See details.
t_unq
: A threshold between 0 and 1 for flagging indicators with a low proportion of unique values. See details.
nsignif
: Number of significant figures to round the output table to.
...
: Arguments passed to or from other methods.
The statistics (columns in the output table) are as follows (entries correspond to each column):
Min
: the minimum
Max
: the maximum
Mean
: the (arithmetic) mean
Median
: the median
Std
: the standard deviation
Skew
: the skew
Kurt
: the kurtosis
N.Avail
: the number of non-NA values
N.NonZero
: the number of non-zero values
N.Unique
: the number of unique values
Frc.Avail
: the fraction of non-NA values
Frc.NonZero
: the fraction of non-zero values
Frc.Unique
: the fraction of unique values
Flag.Avail
: a data availability flag. Columns with Frc.Avail < t_avail are flagged as "LOW", otherwise "ok".
Flag.NonZero
: a flag for columns with a high proportion of zeros. Columns with Frc.NonZero < t_zero are flagged as "LOW", otherwise "ok".
Flag.Unique
: a unique value flag. Columns with Frc.Unique < t_unq are flagged as "LOW", otherwise "ok".
Flag.SkewKurt
: a skew and kurtosis flag, which indicates possible outliers. Columns with abs(Skew) > t_skew AND Kurt > t_kurt are flagged as "OUT", otherwise "ok".
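The flag rules above can be sketched in base R for a single numeric vector. This is only an illustration, not the package's implementation: flag_column() is a hypothetical helper, the fractions are assumed to be taken relative to the full column length, and the skewness and (excess) kurtosis are simple moment-based estimators that may differ from the ones get_stats() uses.

```r
# Hypothetical helper (not part of the package): recompute the fractions and
# flags for one numeric vector, using the default thresholds from the usage.
flag_column <- function(x, t_skew = 2, t_kurt = 3.5,
                        t_avail = 0.65, t_zero = 0.5, t_unq = 0.5) {
  n <- length(x)
  obs <- x[!is.na(x)]

  # Assumption: every fraction uses the full column length as denominator.
  frc_avail   <- length(obs) / n
  frc_nonzero <- sum(obs != 0) / n
  frc_unique  <- length(unique(obs)) / n

  # Assumption: moment-based skewness and excess kurtosis.
  z <- (obs - mean(obs)) / sqrt(mean((obs - mean(obs))^2))
  skew <- mean(z^3)
  kurt <- mean(z^4) - 3

  # isTRUE() keeps the flag at "ok" when skew/kurt are undefined
  # (for example, a constant column gives NaN).
  is_out <- isTRUE(abs(skew) > t_skew && kurt > t_kurt)

  data.frame(
    Frc.Avail     = frc_avail,
    Frc.NonZero   = frc_nonzero,
    Frc.Unique    = frc_unique,
    Flag.Avail    = if (frc_avail < t_avail) "LOW" else "ok",
    Flag.NonZero  = if (frc_nonzero < t_zero) "LOW" else "ok",
    Flag.Unique   = if (frc_unique < t_unq) "LOW" else "ok",
    Flag.SkewKurt = if (is_out) "OUT" else "ok"
  )
}

# A fully observed column that is mostly zeros, with one extreme value
res <- flag_column(c(rep(0, 18), 1, 100))
print(res)
```

In this sketch the example column has no missing values, so Flag.Avail is "ok", while the zero, unique-value and skew/kurtosis flags are all raised.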
The aim of this table, among other things, is to check the basic statistics of each column/indicator and to identify any possible issues, such as low data availability, a high proportion of zeros, and/or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().
See also vignette("analysis").
# stats of mtcars
get_stats(mtcars)
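
A further sketch, showing how the thresholds might be changed and the flagged columns picked out of the result. It assumes the COINr package, which provides get_stats(), is installed; the threshold values are purely illustrative, not recommendations.

```r
# Assumes the COINr package is installed; the block is skipped otherwise.
if (requireNamespace("COINr", quietly = TRUE)) {
  # Illustrative thresholds: stricter availability, looser outlier test
  stats <- COINr::get_stats(mtcars, t_avail = 0.9, t_skew = 1,
                            t_kurt = 2, nsignif = 2)

  # Keep only the rows of the statistics table that raise at least one flag
  has_flag <- stats$Flag.Avail == "LOW" |
    stats$Flag.NonZero == "LOW" |
    stats$Flag.Unique == "LOW" |
    stats$Flag.SkewKurt == "OUT"
  flagged <- stats[has_flag, ]
  print(flagged)
}
```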