skewsamp

The goal of skewsamp is to provide access to sample size estimation methods for group comparisons where the underlying data are skewed and thus violate the assumptions for common methods of sample size estimation.

In particular, skewsamp offers an approach based on generalized linear models (GLM) as described by Cundill & Alexander (2015), and the “NECDF” (Noether Empirical Distribution Function) approach, which is based on the nonparametric Wilcoxon-Mann-Whitney test in the location shift paradigm as described by Chakraborti, Hong, & van de Wiel (2006).

Installation

You can install the package directly from GitHub:

# install.packages("devtools") # if you do not have devtools already installed, you need it for the installation
devtools::install_github("https://github.com/jobrachem/skewsamp")

Documentation

All functions are documented, so you can use R’s built-in help system. You can also refer to the online documentation, which includes a list of all functions.

Simulation study

We verified the correctness of our implementation through extensive simulations. The data, code and final report are available on the Open Science Framework. Note that the report is written in German.

The simulations revealed that the GLM-based approach (Cundill & Alexander, 2015) works robustly. The nonparametric NECDF approach depends on pilot data and can substantially underestimate the required sample size. Please consult the report linked above for further details.
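To give a rough idea of what such a check can look like, here is a minimal sketch in base R. It is not the simulation code from the report: the helper simulate_power and all of its arguments are hypothetical names chosen for this illustration. It draws gamma-distributed data for two groups, fits a gamma GLM with log link, and counts how often the group effect is significant at a given per-group sample size.

```r
# Illustrative power check, NOT the authors' simulation code.
# simulate_power() and its arguments are hypothetical names.
set.seed(1)

simulate_power <- function(n_per_group, mean0, mean1, shape,
                           alpha = 0.05, n_sim = 500) {
  group <- factor(rep(c(0, 1), each = n_per_group))
  rejections <- replicate(n_sim, {
    # Gamma with mean m and shape k has rate k / m
    y <- c(rgamma(n_per_group, shape = shape, rate = shape / mean0),
           rgamma(n_per_group, shape = shape, rate = shape / mean1))
    fit <- glm(y ~ group, family = Gamma(link = "log"))
    p_value <- summary(fit)$coefficients["group1", "Pr(>|t|)"]
    p_value < alpha
  })
  # Share of simulated studies that reject the null hypothesis
  mean(rejections)
}

# 44 per group corresponds to rounding up a per-group estimate of 43.74
power_hat <- simulate_power(n_per_group = 44, mean0 = 1, mean1 = 0.5, shape = 1)
power_hat
```

If the sample size estimate is adequate, the empirical power should land near the target power. With only 500 replications the estimate is noisy, so increase n_sim for a sharper answer.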

Usage

Example 1

Sample size determination in the GLM approach for gamma-distributed data:

library(skewsamp)
skewsamp::n_gamma(mean0 = 1, effect = 0.5, shape0 = 1, alpha = 0.05, power = 0.9)
#> Estimated sample size for group difference.
#> Generalized Regression, Gamma Distribution, link: log 
#> 
#> N (total)         87.48 
#> n0 (Group 0)      43.74 
#> n1 (Group 1)      43.74 
#> 
#> Effect size       0.5 
#> Effect type       1 - (mean1/mean0) 
#> Type I error      0.05 
#> Target power      0.9 
#> Two-sided         TRUE 
#> 
#> Call: skewsamp::n_gamma(mean0 = 1, effect = 0.5, shape0 = 1, alpha = 0.05, 
#>     power = 0.9)
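The numbers printed above can be cross-checked by hand. The sketch below uses a closed-form expression for a two-group comparison of gamma means on the log scale. It assumes that both groups share the same shape parameter (shape1 = shape0), which is an assumption made for this check and not something stated above. It is meant as a plausibility check of the output, not as a description of the package internals.

```r
# Hand calculation for Example 1 (a sketch under stated assumptions).
mean0  <- 1
effect <- 0.5
shape0 <- 1
alpha  <- 0.05
power  <- 0.9

# Effect type is 1 - (mean1 / mean0), so:
mean1  <- mean0 * (1 - effect)
# Assumption for this check: equal shape in both groups
shape1 <- shape0

z_alpha <- qnorm(1 - alpha / 2)  # two-sided test
z_beta  <- qnorm(power)

n_per_group <- (z_alpha + z_beta)^2 * (1 / shape0 + 1 / shape1) /
  (log(mean0) - log(mean1))^2
n_total <- 2 * n_per_group

round(c(n_per_group = n_per_group, n_total = n_total), 2)
```

Under these assumptions the calculation gives about 43.74 per group and 87.48 in total, matching the output of n_gamma above. In practice the per-group value would be rounded up to the next integer.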

Example 2

Sample size determination in the location shift approach. This approach requires pilot data, which we draw from an exponential distribution for the sake of the example:

library(skewsamp)
skewsamp::n_locshift(s1 = rexp(10), s2 = rexp(10), delta = 0.5, alpha = 0.05, power = 0.9)
#> Estimated sample size for group difference.
#> Wilcoxon-Mann-Whitney Test, Location shift 
#> 
#> N (total)         97.35 
#> n0 (Group 0)      48.68 
#> n1 (Group 1)      48.68 
#> 
#> Effect size       0.5 
#> Effect type       location shift 
#> Type I error      0.05 
#> Target power      0.9 
#> Two-sided         FALSE 
#> 
#> Call: skewsamp::n_locshift(s1 = rexp(10), s2 = rexp(10), delta = 0.5, 
#>     alpha = 0.05, power = 0.9)
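Because the pilot samples in this example come from rexp() without a fixed seed, the estimate changes from run to run, so your output will generally differ from the one shown. The sketch below illustrates the underlying idea with base R only: Noether's (1987) formula turns a value of p into a total sample size, and p is estimated from pilot data. The helper noether_total_n and the simple plug-in estimate of p are assumptions made for illustration. The plug-in estimate in particular is cruder than the NECDF-based estimate the package uses, so the result is not expected to match n_locshift.

```r
# Noether's (1987) formula for the total sample size of the
# Wilcoxon-Mann-Whitney test. noether_total_n() is a hypothetical helper.
noether_total_n <- function(p, alpha = 0.05, power = 0.9,
                            frac = 0.5, two_sided = FALSE) {
  z_alpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  # frac is the share of the total sample allocated to the first group
  (z_alpha + z_beta)^2 / (12 * frac * (1 - frac) * (p - 0.5)^2)
}

set.seed(42)  # fix the seed so the pilot data are reproducible
s1 <- rexp(10)
s2 <- rexp(10)
delta <- 0.5

# Crude plug-in estimate of p: share of pilot pairs with s1 < s2 + delta.
# This is a simplification, not the package's NECDF estimator.
p_hat <- mean(outer(s1, s2 + delta, "<"))
n_pilot_based <- noether_total_n(p_hat)

# The same formula with a fixed, assumed value of p
n_fixed <- noether_total_n(p = 0.7)
```

With only ten observations per pilot sample, p_hat varies considerably across seeds. Since the formula divides by (p - 0.5)^2, small changes in p_hat translate into large changes in the estimated sample size, which illustrates why this approach is sensitive to the pilot data.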

Install

install.packages('skewsamp')

Version

1.0.0

License

MIT + file LICENSE

Maintainer

Johannes Brachem

Last Published

December 16th, 2021

Functions in skewsamp (1.0.0)

n_negbinom

Calculate sample size for negative binomial distribution

find_smaller_index

Finds the index of the smaller neighbour of the given value in the vector x.

n_noether

Noether's (1987) formula for obtaining a sample size estimation for the two-sample Wilcoxon-Mann-Whitney test.

extend_sample

Extends a vector by adding a lower and upper boundary.

pemp

Empirical cumulative distribution function (ECDF)

n_poisson

Calculate sample size for Poisson distribution

demp

Empirical probability density function (EPDF)

resample_n_locshift

Compute a distribution of estimates of N based on two pilot samples.

estimate_p

Computes an empirical estimate of p (\(P(X < X + \delta)\))

n_glm

Calculate sample size for a group comparison via generalized linear models

resample_n_locshift_one

Compute n_resamples estimates of N

n_locshift

Estimate N on the basis of two pilot samples.

qemp

Empirical quantile function

remp

Draws random values from the ECDF obtained from sample

n_binom

Calculate sample size for binomial distribution

n_locshift_bound

Compute an upper bound for the sample size based on two pilot samples.

n_gamma

Calculate sample size for gamma distribution

n_locshift_one

Estimate N on the basis of one pilot sample.

create_lower_extension

Computes the lower boundary value for the empirical CDF

create_upper_extension

Computes the upper boundary value for the empirical CDF