Given a coin and a specified data set (dset), returns a table of statistics with entries for each column.
# S3 method for coin
get_stats(
x,
dset,
t_skew = 2,
t_kurt = 3.5,
t_avail = 0.65,
t_zero = 0.5,
t_unq = 0.5,
nsignif = 3,
out2 = "df",
...
)
Either a data frame or an updated coin; see out2.
x: A coin.
dset: A data set present in .$Data.
t_skew: Absolute skewness threshold. See details.
t_kurt: Kurtosis threshold. See details.
t_avail: Data availability threshold. See details.
t_zero: A threshold between 0 and 1 for flagging indicators with a high proportion of zeroes. See details.
t_unq: A threshold between 0 and 1 for flagging indicators with a low proportion of unique values. See details.
nsignif: Number of significant figures to round the output table to.
out2: Either "df" (default) to output a data frame of indicator statistics, or "coin" to output an updated coin with the data frame attached under .$Analysis.
...: Arguments passed to or from other methods.
The statistics (columns of the output table) are as follows; each row of the table corresponds to one column (indicator) of the selected data set:
Min: the minimum
Max: the maximum
Mean: the (arithmetic) mean
Median: the median
Std: the standard deviation
Skew: the skewness
Kurt: the kurtosis
N.Avail: the number of non-NA values
N.NonZero: the number of non-zero values
N.Unique: the number of unique values
Frc.Avail: the fraction of non-NA values
Frc.NonZero: the fraction of non-zero values
Frc.Unique: the fraction of unique values
Flag.Avail: a data availability flag. Columns with Frc.Avail < t_avail are flagged as "LOW", otherwise "ok".
Flag.NonZero: a flag for columns with a high proportion of zeros. Columns with Frc.NonZero < t_zero are flagged as "LOW", otherwise "ok".
Flag.Unique: a unique value flag. Columns with Frc.Unique < t_unq are flagged as "LOW", otherwise "ok".
Flag.SkewKurt: a skew and kurtosis flag indicating possible outliers. Columns with abs(Skew) > t_skew AND Kurt > t_kurt are flagged as "OUT", otherwise "ok".
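 
The statistics and flags listed above can be sketched in base R for a single indicator. This is a minimal illustration, not COINr's implementation: the helper indicator_stats() is hypothetical, skewness and kurtosis are assumed to be moment-based (kurtosis as excess kurtosis), and the non-zero and unique fractions are assumed to be taken relative to the number of available values.

```r
# Hypothetical helper (not part of COINr): get_stats()-style statistics and
# flags for one numeric indicator. Estimators and denominators are assumptions.
indicator_stats <- function(x, t_skew = 2, t_kurt = 3.5, t_avail = 0.65,
                            t_zero = 0.5, t_unq = 0.5, nsignif = 3) {
  v <- x[!is.na(x)]                       # available (non-NA) values
  d <- v - mean(v)
  skew <- mean(d^3) / mean(d^2)^1.5       # moment-based skewness (assumed)
  kurt <- mean(d^4) / mean(d^2)^2 - 3     # moment-based excess kurtosis (assumed)
  stats <- c(
    Min = min(v), Max = max(v), Mean = mean(v), Median = median(v),
    Std = sd(v), Skew = skew, Kurt = kurt,
    N.Avail = length(v), N.NonZero = sum(v != 0), N.Unique = length(unique(v))
  )
  stats["Frc.Avail"] <- stats[["N.Avail"]] / length(x)               # share of all units
  stats["Frc.NonZero"] <- stats[["N.NonZero"]] / stats[["N.Avail"]]  # assumed denominator
  stats["Frc.Unique"] <- stats[["N.Unique"]] / stats[["N.Avail"]]    # assumed denominator
  # Round the numeric statistics, then add flags computed from unrounded values
  out <- as.data.frame(as.list(signif(stats, nsignif)))
  out$Flag.Avail <- ifelse(stats[["Frc.Avail"]] < t_avail, "LOW", "ok")
  out$Flag.NonZero <- ifelse(stats[["Frc.NonZero"]] < t_zero, "LOW", "ok")
  out$Flag.Unique <- ifelse(stats[["Frc.Unique"]] < t_unq, "LOW", "ok")
  out$Flag.SkewKurt <- ifelse(abs(skew) > t_skew & kurt > t_kurt, "OUT", "ok")
  out
}

# Example: 10 units, 5 missing, mostly zeros among the available values
indicator_stats(c(0, 0, 0, 1, 2, NA, NA, NA, NA, NA))
```

For this input, availability (5 of 10) and the non-zero fraction (2 of 5) fall below their default thresholds and are flagged "LOW", while the unique fraction (3 of 5) is "ok".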
The aim of this table, among other things, is to check the basic statistics of each column (indicator) and to identify possible issues such as low data availability, a high proportion of zeros and/or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().
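 
Requiring both high absolute skewness and high kurtosis targets distributions with a long tail driven by a few extreme points. The hypothetical helper skew_kurt_flag() below applies the Flag.SkewKurt rule with the default thresholds to a symmetric vector and to the same kind of vector with one extreme value. It assumes moment-based skewness and excess kurtosis; the estimator COINr uses internally may differ.

```r
# Hypothetical helper (not part of COINr) illustrating the Flag.SkewKurt rule:
# "OUT" only if abs(skew) > t_skew AND kurtosis > t_kurt, otherwise "ok".
skew_kurt_flag <- function(x, t_skew = 2, t_kurt = 3.5) {
  v <- x[!is.na(x)]
  d <- v - mean(v)
  skew <- mean(d^3) / mean(d^2)^1.5      # moment-based skewness (assumed)
  kurt <- mean(d^4) / mean(d^2)^2 - 3    # moment-based excess kurtosis (assumed)
  list(skew = skew, kurt = kurt,
       flag = if (abs(skew) > t_skew && kurt > t_kurt) "OUT" else "ok")
}

skew_kurt_flag(1:20)$flag           # symmetric data: "ok"
skew_kurt_flag(c(1:19, 200))$flag   # one extreme value: "OUT"
```

A single value of 200 among 1 to 19 pushes skewness to roughly 4 and excess kurtosis to roughly 14, so both conditions hold.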
The table can be returned either to the coin or as a standalone data frame; see out2.
See also vignette("analysis").
library(COINr)
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)
# get table of indicator statistics for raw data set
get_stats(coin, dset = "Raw", out2 = "df")