srvyr v0.3.5

'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Readme

srvyr

srvyr brings parts of dplyr’s syntax to survey analysis, using the survey package.

srvyr focuses on calculating summary statistics from survey data, such as the mean, total or quantile. It supports many dplyr verbs (such as summarize, group_by, and mutate), pipe-able functions, rlang’s style of non-standard evaluation, and more consistent return types than the survey package.

You can try it out:

install.packages("srvyr")
# or for development version
# devtools::install_github("gergness/srvyr")

Example usage

First, describe the variables that define the survey’s structure with the function as_survey(), passing as bare column names the same variables that you would use in functions from the survey package like survey::svydesign(), survey::svrepdesign() or survey::twophase().

library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
   as_survey_design(strata = stype, weights = pw)
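For readers coming from the survey package, the following sketch builds the same stratified design both ways and compares what each package returns. It assumes the api data shipped with survey; the object names (design_survey, design_srvyr, design_wrapped, est_survey, est_srvyr) are illustrative only.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# survey package: design variables are passed as one-sided formulas
design_survey <- survey::svydesign(
  ids = ~1, strata = ~stype, weights = ~pw, data = apistrat
)

# srvyr: the same design, described with bare column names
design_srvyr <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# An existing survey object can also be wrapped with as_survey()
design_wrapped <- as_survey(design_survey)

# survey returns a "svystat" object; srvyr returns a tibble
est_survey <- survey::svymean(~api00, design_survey)
est_srvyr <- design_srvyr %>%
  summarise(api00 = survey_mean(api00))
```

Both estimates should agree, since srvyr calls the survey package's functions; only the shape of the result differs.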

Now many of the dplyr verbs are available.

  • mutate() adds or modifies a variable.
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
  • summarise() calculates summary statistics such as mean, total, quantile or ratio.
dstrata %>% 
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 1 x 3
#>   api_diff api_diff_low api_diff_upp
#>      <dbl>        <dbl>        <dbl>
#> 1     32.9         28.8         37.0
  • group_by() and then summarise() creates summaries by groups.
dstrata %>% 
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 3 x 4
#>   stype api_diff api_diff_low api_diff_upp
#>   <fct>    <dbl>        <dbl>        <dbl>
#> 1 E        38.6         33.1          44.0
#> 2 H         8.46         1.74         15.2
#> 3 M        26.4         20.4          32.4
  • Functions from the survey package are still available:
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
#> 
#> Call:
#> svyglm(formula = api99 ~ stype, design = dstrata)
#> 
#> Survey design:
#> Called via srvyr
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   635.87      13.34  47.669   <2e-16 ***
#> stypeH        -18.51      20.68  -0.895    0.372    
#> stypeM        -25.67      21.42  -1.198    0.232    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 16409.56)
#> 
#> Number of Fisher Scoring iterations: 2
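The verbs above can be combined with the other estimators mentioned for summarise(). A minimal sketch, assuming the same api data; the output column names (enroll_total, api_ratio) and the object name by_type are chosen here for illustration:

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Several estimators in one summarise() call, computed per school type
by_type <- dstrata %>%
  group_by(stype) %>%
  summarise(
    # estimated total enrollment, with its standard error
    enroll_total = survey_total(enroll, vartype = "se"),
    # ratio of 2000 to 1999 API scores, with a confidence interval
    api_ratio = survey_ratio(api00, api99, vartype = "ci")
  )
by_type
```

Each estimator contributes its own estimate column plus the variability columns requested through vartype, so the result stays a single tibble with one row per group.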

What people are saying about srvyr

[srvyr] lets us use the survey library’s functions within a data analysis pipeline in a familiar way.

Kieran Healy, in Data Visualization: A practical introduction

  1. Yay!

Thomas Lumley, in the Biased and Inefficient blog

Contributing

I do appreciate bug reports, suggestions and pull requests! I started this as a way to learn about R package development, and am still learning, so you’ll have to bear with me. Please review the Contributor Code of Conduct, as all participants are required to abide by its terms.

If you’re unfamiliar with contributing to an R package, I recommend the guides provided by RStudio’s tidyverse team, such as Jim Hester’s blog post or Hadley Wickham’s R packages book.

Functions in srvyr

Name Description
svychisq Chi-squared tests of association for survey data.
%>% Pipe operator
current_svy Get the survey data for the current context
rlang-tidyeval Tidy eval helpers from rlang
tbl_svy tbl_svy object.
unweighted Calculate an unweighted summary statistic from a survey
srvyr-se-deprecated Deprecated SE versions of main srvyr verbs
tbl_vars List variables produced by a tbl.
set_survey_vars Set the variables for the current survey variable
as_survey_design Create a tbl_svy survey object using sampling design
summarise_all Manipulate multiple columns.
srvyr srvyr: A package for 'dplyr'-Like Syntax for Summary Statistics of Survey Data.
survey_mean Calculate the mean and its variation using survey methods
summarise Summarise multiple values to a single value.
survey_total Calculate the total and its variation using survey methods
group_by Group a (survey) dataset by one or more variables.
survey_var Calculate the population variance and its variation using survey methods
survey_ratio Calculate the ratio and its variation using survey methods
survey_quantile Calculate the quantile and its variation using survey methods
groups Get/set the grouping variables for tbl.
collect Force computation of a database query
as_survey_rep Create a tbl_svy survey object using replicate weights
cascade Summarise multiple values into cascading groups
as_survey Create a tbl_svy from a data.frame
as_tibble Coerce survey variables to a data frame (tibble)
dplyr_single Single table verbs from dplyr
as_survey_twophase Create a tbl_svy survey object using two phase design
get_var_est Get the variance estimates for a survey estimate
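Two of the functions listed above, cascade() and unweighted(), are not shown in the README example. The sketch below is one way they might be used together, assuming the same api data; the names with_total, api00_mean and n_schools are illustrative, and dplyr::n() is namespaced on the assumption that dplyr itself is not attached.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# cascade() summarises each stype group and then all groups combined;
# unweighted() computes a plain (non-survey) statistic, here the sample size
with_total <- dstrata %>%
  group_by(stype) %>%
  cascade(
    api00_mean = survey_mean(api00),
    n_schools = unweighted(dplyr::n())
  )
with_total
```

The result has one row per school type plus an extra row for the whole sample, which saves a second summarise() call when a table needs a total line.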

Vignettes of srvyr

Name
acs_m.RData
extending-srvyr.Rmd
save_acs_data.R
srvyr-database.Rmd
srvyr-vs-survey.Rmd

Details

Type Package
Date 2019-07-07
URL http://gdfe.co/srvyr, https://github.com/gergness/srvyr
BugReports https://github.com/gergness/srvyr/issues
License GPL-2 | GPL-3
LazyData TRUE
Encoding UTF-8
VignetteBuilder knitr
RoxygenNote 6.1.1
NeedsCompilation no
Packaged 2019-07-09 11:52:43 UTC; greg
Repository CRAN
Date/Publication 2019-07-09 12:10:03 UTC