cchsflow

cchsflow supports the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018).

The CCHS is a population-based cross-sectional survey of Canadians that has been administered every two years since 2001. There are approximately 130,000 respondents per cycle. Studies use multiple CCHS cycles to examine trends over time and increase sample size to examine sub-groups that are too small to examine in a single cycle.

The CCHS is one of the largest and most robust ongoing population health surveys worldwide. The CCHS, administered by Statistics Canada, is Canada's main general population health survey. Information about the survey is available on the Statistics Canada website. The CCHS has a Statistics Canada Open Licence.

Concept

Each cycle of the CCHS contains over 1000 variables that cover the four main topics: sociodemographic measures, health behaviours, health status and health care use. The seemingly consistent questions across CCHS cycles entice you to combine them together to increase sample size; however, you soon realize a challenge...

Imagine you want to use BMI (body mass index) for a study that spans CCHS 2001 to 2018. BMI seems like a straightforward measure that is routinely collected worldwide. Indeed, BMI is included in all CCHS cycles. You examine the documentation and find that the variable HWTAGBMI in the CCHS 2001 corresponds to body mass index, but that in other cycles the variable name changes to HWTCGBMI, HWTDGBMI, HWTEGBMI, etc. On reading the documentation, you notice that some cycles round the value to one decimal, whereas other cycles round to two decimals. Furthermore, some cycles don't calculate BMI for respondents younger than 20 or older than 64 years. Also, some cycles calculate BMI only if height and weight are within specific ranges. These types of changes occur for almost all CCHS variables. Sometimes the changes are subtle and difficult to find in the documentation, even for seemingly straightforward variables such as BMI. cchsflow harmonizes the BMI variable across different cycles.
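One way to picture the naming and rounding problem is as a lookup table that maps each dataset to its own BMI column. The sketch below uses toy data and base R only. The helper harmonize_bmi_sketch, the lookup table bmi_lookup, the dataset labels and all values are assumptions made for illustration; this is not how cchsflow implements its harmonized BMI variable.

```r
# Toy data standing in for two CCHS cycles. All values are made up.
cycle_2001  <- data.frame(HWTAGBMI = c(22.4, 27.1))
cycle_later <- data.frame(HWTCGBMI = c(24.53, 31.27))

# Hypothetical lookup table: which source column holds BMI in each dataset.
bmi_lookup <- c(cycle_2001 = "HWTAGBMI", cycle_later = "HWTCGBMI")

# Hypothetical helper: copy the cycle-specific column into a common name
# and round to a common precision (one decimal, the coarser of the two).
harmonize_bmi_sketch <- function(data, dataset_name) {
  source_name <- bmi_lookup[[dataset_name]]
  data.frame(
    dataset = dataset_name,
    bmi     = round(data[[source_name]], 1)
  )
}

# Stack the two datasets now that they share column names and precision.
combined <- rbind(
  harmonize_bmi_sketch(cycle_2001, "cycle_2001"),
  harmonize_bmi_sketch(cycle_later, "cycle_later")
)
print(combined)
```

Rounding everything to the coarser precision is only one possible choice, and it discards information from the cycles that report two decimals.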

Usage

cchsflow creates harmonized variables (where possible) between CCHS cycles. Searching for BMI in variables (described in the Introduction section of the variableDetails.csv vignette) shows that HWTGBMI calculates BMI to two decimal places for all cycles and all respondents, using each respondent's untruncated height and weight.
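As a rough illustration of what calculating BMI from height and weight to two decimal places means, the sketch below applies the standard formula, weight in kilograms divided by height in metres squared. The function name bmi_sketch and the input values are hypothetical; the package's own derived-variable functions may handle units, ranges and missing values differently.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2, rounded to two decimals.
bmi_sketch <- function(height_m, weight_kg) {
  round(weight_kg / height_m^2, 2)
}

# Two made-up respondents.
example_bmi <- bmi_sketch(height_m = c(1.60, 1.75), weight_kg = c(55, 80))
print(example_bmi)
```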

Calculate a harmonized BMI variable for CCHS 2001 cycle

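    # Load cchsflow, which includes the CCHS test data (e.g. cchs2001_p)
    library(cchsflow)

    # Recode the 2001 test data into the harmonized BMI variable HWTGBMI
    cchs2001_BMI <- rec_with_table(cchs2001_p, "HWTGBMI")
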

Notes printed to console indicate issues that may affect BMI classification for your study.

    Loading cchsflow variable_details
    Using the passed data variable name as database_name
    NOTE for HWTGBMI : CCHS 2001 restricts BMI to ages 20-64
    NOTE for HWTGBMI : CCHS 2001 and 2003 codes not applicable and missing 
    variables as 999.6 and 999.7-999.9 respectively, while CCHS 2005 onwards codes 
    not applicable and missing variables as 999.96 and 999.7-999.99 respectively
    NOTE for HWTGBMI : Don't know (999.7) and refusal (999.8) not included
    in 2001 CCHS"
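The notes above mention that not-applicable and missing responses are stored as numeric codes in the 999 range, with different precision depending on the cycle. If these codes are left in place they distort any summary of BMI. The base R sketch below shows one simple way to treat them as missing. The helper drop_bmi_codes and the toy values are assumptions for illustration; cchsflow applies its own missing-value handling when recoding.

```r
# Toy BMI values using the codes named in the notes above. Values are made up.
bmi_2001 <- c(23.4, 999.6, 999.9)
bmi_2005 <- c(27.85, 999.96, 999.99)

# Hypothetical helper: any value at or above 999 is a not-applicable or
# missing code rather than a real BMI, so replace it with NA.
drop_bmi_codes <- function(bmi) {
  bmi[!is.na(bmi) & bmi >= 999] <- NA
  bmi
}

clean_2001 <- drop_bmi_codes(bmi_2001)
clean_2005 <- drop_bmi_codes(bmi_2005)

# Summaries now ignore the coded values when na.rm = TRUE is used.
mean(c(clean_2001, clean_2005), na.rm = TRUE)
```

Collapsing all codes into a single NA loses the distinction between not applicable, don't know and refusal, which may matter for some studies.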

Important notes

Combining CCHS across survey cycles will result in misclassification error and other forms of bias that affect studies in different ways. The transformations that are described in this repository have been used in several research projects, but there are no guarantees regarding their accuracy or appropriate uses. Thomas and Wannell describe methodological issues when combining CCHS cycles.

Care must be taken to understand how specific variable transformation and harmonization with cchsflow affect your study or use of CCHS data. Across survey cycles, almost all CCHS variables have had at least some change in wording and category responses. Furthermore, there have been changes in survey sampling, response rates, weighting methods and other survey design changes that affect responses.

Installation

    # Install release version from CRAN
    install.packages("cchsflow")

    # Install the most recent version from GitHub
    devtools::install_github("Big-Life-Lab/cchsflow")

New variables not yet added to the CRAN version

You can download and use the latest version of variables.csv and variable_details.csv from GitHub.

What is in the cchsflow package?

cchsflow package includes:

  1. variables.csv - a list of variables that can be transformed across CCHS surveys.

  2. variable_details.csv - information that describes how the variables are recoded.

  3. Vignettes - that describe how to use R to transform or generate new derived variables that are listed in variables.csv. Transformations are performed using rec_with_table(). variables.csv and variable_details.csv can be used with other statistics programs (see issue).

  4. Demonstration CCHS data - cchsflow includes a random sample of 200 respondents from each CCHS PUMF file from 2001 to 2018. These data are used for the vignettes. The CCHS test data is stored in /data as .RData files. They can be read as a package database.

    # read the CCHS 2017-2018 PUMF test data
    test_data <- cchs2017_2018_p

This repository does not include the full CCHS data. Information on how to access the CCHS data is available here. The Canadian university community can also access the CCHS through ODESI (see health/Canada/Canadian Community Health Survey).

Roadmap

Projects on the roadmap can be found here.

Contributing

Please follow this guide if you would like to contribute to the cchsflow package.

We encourage PRs for additional variable transformations and derived variables that you believe may be helpful to the broad CCHS community.

Currently, cchsflow supports R through the rec_with_table() function. The CCHS community commonly uses SAS, Stata and other statistical packages. Please feel free to contribute to cchsflow by making a PR that creates versions of rec_with_table() for other statistical and programming languages.

Statistics Canada Attribution

CCHS data used in this library are accessed and adapted in accordance with the Statistics Canada Open Licence Agreement.

Source from Statistics Canada, Canadian Community Health Survey 2001 to 2018 PUMF, accessed March 2022. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.

Adapted from Statistics Canada, Canadian Community Health Surveys 2001 to 2018 PUMF, accessed March 2022. This does not constitute an endorsement by Statistics Canada of this product.

Monthly Downloads

210

Version

2.1.0

License

MIT + file LICENSE

Maintainer

Kitty Chen

Last Published

May 26th, 2022

Functions in cchsflow (2.1.0)

DPSDPP: Depression Scale - Predicted Probability
active_transport1_fun: Daily active transportation (2001-2005)
DPSDSF: Derived Depression Scale - Short Form Score
active_transport2_fun: Daily active transportation (2007-2014)
bmi_fun_cat: Categorical BMI (international standard)
bmi_fun: Body Mass Index (BMI) derived variable
ALW_2A5: Number of drinks - Thursday
GEN_02A2: Satisfaction with life (GEN_02A/GEN_02A2)
SMKDSTY_fun: Type of smokers
LBFA_31A: Occupation Group (9 categories)
RACDPAL_fun: Participation and Activity Limitation
LBFA_31A_a: Occupation Group (5 categories)
active_transport3_fun: Daily active transportation (2015-2018)
LBFA_31A_b: Occupation Group (6 categories)
COPD_Emph_der_fun1: COPD_Emph_der_fun1
SMKG207_fun: Age started to smoke daily - former daily smoker
diet_score_fun: Diet score
SPS_5_fun: Five-item social provision scale (SPS-5)
cchs2003_p: 2003 CCHS PUMF subset data (200 respondents)
get_data_variable_name: Get Data Variable Name
cchs2001_p: 2001 CCHS PUMF subset data (200 respondents)
cchs2015_2016_p: 2015-2016 CCHS PUMF subset data (200 respondents)
cchs2014_p: 2014 CCHS PUMF subset data (200 respondents)
adl_fun: Derived needs help with tasks
ALW_2A7: Number of drinks - Saturday
diet_score_fun_cat: Categorized diet score
adjusted_bmi_fun: Adjusted Body Mass Index (BMI) derived variable
set_data_labels: Set Data Labels
if_else2: if_else2
COPD_Emph_der_fun2: COPD_Emph_der_fun2
age_cat_fun: Derived categorical age
cchs2005_p: 2005 CCHS PUMF subset data (200 respondents)
multiple_conditions_fun1: Number of chronic conditions (5 chronic conditions)
multiple_conditions_fun2: Number of chronic conditions (6 chronic conditions)
smoke_simple_fun: Simple smoking status
time_quit_smoking_fun: Time since quit smoking
cchs2007_2008_p: 2007-2008 CCHS PUMF subset data (200 respondents)
binge_drinker_fun: Binge drinking
variable_details: variable_details.csv
cchs2009_s: 2009 CCHS synthetic subset data (200 respondents)
cchs2009_2010_p: 2009-2010 CCHS PUMF subset data (200 respondents)
SMKG040_fun: Age started smoking daily - daily/former daily smokers
SMKG203_fun: Age started to smoke daily - daily smoker
adl_score_5_fun: The number of activities of daily living tasks that require help.
cchs2012_s: 2012 CCHS synthetic subset data (200 respondents)
cchs2010_p: 2010 CCHS PUMF subset data (200 respondents)
cchs2013_2014_p: 2013-2014 CCHS PUMF subset data (200 respondents)
energy_exp_fun: Daily energy expenditure in leisure activity
low_drink_short_fun: Short term risks due to drinking
cchs2011_2012_p: 2011-2012 CCHS PUMF subset data (200 respondents)
compare_value_based_on_interval: Compare Value Based On Interval
pack_years_fun: Smoking pack-years
merge_rec_data: Merge recoded data
rec_with_table: Recode with Table
cchs2017_2018_p: 2017-2018 CCHS PUMF subset data (200 respondents)
cchs2010_s: 2010 CCHS synthetic subset data (200 respondents)
cchs2012_p: 2012 CCHS PUMF subset data (200 respondents)
food_insecurity_der: Food insecurity
pack_years_fun_cat: Categorical smoking pack-years
recode_variable_NA_formating: Recode NA formatting
resp_condition_fun1: resp_condition_fun1
immigration_fun: Immigration by ethnicity and settlement
is_equal: is equal
label_data: label_data
low_drink_long_fun: Long term risks due to drinking
pct_time_fun: Percent time in Canada
pct_time_fun_cat: Categorical percent time in Canada
low_drink_score_fun: Low drinking score (all cycles)
low_drink_score_fun1: Low drinking score (select cycles)
resp_condition_fun2: resp_condition_fun2
variables: variables.csv
resp_condition_fun3: resp_condition_fun3
recode_columns: recode_columns
ALW_1: Any alcohol past week
ALW_2A1: Number of drinks - Sunday
ALW_2A2: Number of drinks - Monday
ALW_2A3: Number of drinks - Tuesday
ALW_2A4: Number of drinks - Wednesday
ALW_2A6: Number of drinks - Friday
ALWDWKY: Number of drinks consumed in the past week
ALWDDLY: Average daily alcohol consumption
ALCDTTM: Type of drinker (12 months)
ALCDTYP: Type of drinker