recodeflow

Introduction

What is recodeflow?

recodeflow recodes variables from multiple data sets into harmonized variables.

recodeflow provides the basic functions and templates needed to define, recode, and harmonize variables for any dataset.

Why should I use recodeflow?

Recoding and cleaning your data is typically the most time-consuming step of a project. Existing functions such as sjmisc::rec() and dplyr::recode() work well, but they are limited to recoding one variable at a time.

The recodeflow package takes data cleaning and recoding one step further. recodeflow allows you to recode multiple variables at the same time, and harmonize variables across similar databases even when the variables and variables' categories change.
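To make "harmonize" concrete, here is a minimal base R sketch of the manual work that recodeflow is meant to replace. The dataset names, column names, and category codes are hypothetical, and the code does not use any recodeflow function.

```r
# Hypothetical data: the same concept stored under different names and codes.
survey_a <- data.frame(SEX = c(1, 2, 2, 1))
survey_b <- data.frame(GENDER = c("M", "F", "F", "M"), stringsAsFactors = FALSE)

# Hand-written harmonization: one lookup vector per source dataset.
lookup_a <- c("1" = "male", "2" = "female")
lookup_b <- c("M" = "male", "F" = "female")

# Map each source column onto a shared `sex` variable and stack the results.
harmonized <- rbind(
  data.frame(source = "survey_a",
             sex = unname(lookup_a[as.character(survey_a$SEX)]),
             stringsAsFactors = FALSE),
  data.frame(source = "survey_b",
             sex = unname(lookup_b[survey_b$GENDER]),
             stringsAsFactors = FALSE)
)

print(harmonized)
```

Every additional variable or dataset needs another hand-written lookup, which is the repetition a table-driven approach avoids.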

recodeflow also helps reduce errors, documents the recoding process, and ensures that your new variables have labels and other metadata.

Even if your project has few variables, recodeflow can save you time.

How does recodeflow work?

Use the worksheets variables and variable_details to list your variables and state how to recode each variable.

Once your variables are defined, use recodeflow functions to clean and recode your data. The main recodeflow function is rec_with_table, which recodes variables within your dataset(s) based on how you've defined them in the worksheets variables and variable_details.
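The idea of recoding from a worksheet can be sketched in base R as follows. This is an illustration of the table-driven approach, not the implementation of rec_with_table. The toy data, the `recTo` column, the closed-range notation `[low,high]`, and the helper `recode_one` are assumptions made for this example; `variable`, `variableStart`, and `recFrom` are borrowed from names that appear in the function reference.

```r
# Toy dataset (hypothetical values).
dat <- data.frame(
  age   = c(23, 47, 68, 35),
  smoke = c(1, 2, 3, 9)
)

# Illustrative stand-in for a variable_details worksheet: one row per rule.
# variable = new variable, variableStart = source column,
# recFrom = source value or closed range, recTo = recoded value (assumed name).
details <- data.frame(
  variable      = c("age_group", "age_group", "age_group",
                    "smoker", "smoker", "smoker"),
  variableStart = c("age", "age", "age", "smoke", "smoke", "smoke"),
  recFrom       = c("[18,39]", "[40,64]", "[65,120]", "1", "2", "3"),
  recTo         = c("18-39", "40-64", "65+", "daily", "occasional", "never"),
  stringsAsFactors = FALSE
)

# Apply every rule for one new variable; values matching no rule stay NA.
recode_one <- function(data, rules) {
  x <- data[[rules$variableStart[1]]]
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(rules))) {
    from <- rules$recFrom[i]
    if (grepl("^\\[.*\\]$", from)) {
      # Closed range written as "[low,high]".
      bounds <- as.numeric(strsplit(gsub("\\[|\\]", "", from), ",")[[1]])
      hit <- !is.na(x) & x >= bounds[1] & x <= bounds[2]
    } else {
      # Single value.
      hit <- !is.na(x) & x == as.numeric(from)
    }
    out[hit] <- rules$recTo[i]
  }
  out
}

# Recode every variable listed in the table in one pass.
recoded <- as.data.frame(
  lapply(split(details, details$variable),
         function(rules) recode_one(dat, rules)),
  stringsAsFactors = FALSE
)

print(recoded)
```

Because the rules live in a table rather than in code, adding a variable means adding rows, and the table itself documents how each variable was recoded.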

What's included in recodeflow?

The recodeflow package includes:

  • functions required to clean and recode variables.

  • worksheets:

    • variables: a list of variables to recode, and
    • variable_details: a mapping of variables across datasets and a list of instructions for recoding variables.

We've also created the following documentation to help you understand recodeflow:

  • how-to guides: examples of how to use recodeflow and adapt it for your dataset,
  • articles that describe package elements (e.g., variables) in detail,
  • references that describe all recodeflow functions, and
  • example data to demonstrate recodeflow functions and templates.

Where is recodeflow used?

Currently, recodeflow is used in packages that harmonize health surveys and health administrative databases.

  • cchsflow is a package that harmonizes variables across cycles of the Canadian Community Health Survey (CCHS). cchsflow is published.

  • raiflow is a package that will harmonize variables within the Resident Assessment Instruments (RAI) from various sources: Canada's Continuing Care Reporting System (CCRS) and Ontario's Resident Assessment Instrument for Home Care (RAI-HC). raiflow is currently under development.

Roadmap

Projects on the roadmap are listed in the recodeflow GitHub repository under the Projects tab.

Contributing

Please follow the recodeflow contribution guide if you would like to contribute to the recodeflow package.

Install

install.packages('recodeflow')

Monthly Downloads

219

Version

0.1.0

License

MIT + file LICENSE

Maintainer

Rostyslav Vyuha

Last Published

June 9th, 2021

Functions in recodeflow (0.1.0)

build_missing_const_node

Build Constant node for a missing value for a variable.

get_margins

Extract margins from character vector.

get_start_var_name

Get variable name from variableStart using database name.

attach_cont_value_nodes_for_start_var

Attach continuous Value nodes for start variable.

build_variable_field_ref_node

Build FieldRef node for variable.

compare_value_based_on_interval

Compare Value Based On Interval

get_var_sheet_row

Get variable row from variable sheet.

get_variable_type_data_type

Get data type for variable type.

format_recoded_value

Recode NA formatting

example_der_fun

example_der_fun calculates chol*bili

get_var_details_row_indices

Get all variable details row indices for a variable.

get_data_variable_name

Get Data Variable Name

create_id_row

ID role creation

create_label_list_element

Create label list element

get_var_details_rows

Get all variable details rows for a variable and database combination.

build_trans_dict

Build a TransformationDictionary node.

build_ranged_derived_field_apply_node

Build Apply node with interval nodes for DerivedField node.

is_equal

Checks whether two values are equal including NA

rec_with_table

Recode with Table

is_left_open

Extract margins from character vector.

recode_to_pmml

Creates a PMML document from variable and variable details sheets for specified database.

select_vars_by_role

Vars selected by role

build_numeric_derived_field_apply_node

Build Apply node with singleton numeric for DerivedField node.

is_right_open

Extract margins from character vector.

label_data

label_data

get_margin_closure

Get closure type for a margin.

set_data_labels

Set Data Labels

is_numeric

Check if a character object can be converted to a number.

is_rec_from_range

Check if recFrom is a range for a variable details row.

recode_columns

recode_columns

attach_apply_nodes

Attach Apply nodes to a parent node.

build_derived_field_value_node

Build Value node for DerivedField node.

build_data_field_for_var

Build DataField node for variable.

build_derived_field_node

Build DerivedField node.

add_data_field_children_for_start_var

Add DataField child nodes for start variable.

build_data_field_for_start_var

Build DataField node for start variable.

attach_derived_field_child_nodes

Attach child nodes to DerivedField node.

attach_cat_value_nodes_for_start_var

Attach categorical value nodes to DataField node for start variable.

attach_range_value_nodes

Attach Value nodes to DataField node. Used when `recFrom` has a value range.
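
Several of the helpers listed above deal with value ranges, NA-aware equality, and numeric checks. The base R sketch below illustrates what such helpers might do. The function names (`equal_with_na`, `can_be_number`, `parse_interval`, `in_interval`) are hypothetical, the interval notation `[low,high)` is an assumption, and none of this is the package's own code.

```r
# Hypothetical helpers for illustration only.

# Equality that treats two NAs as equal and NA versus a value as unequal.
equal_with_na <- function(a, b) {
  if (is.na(a) && is.na(b)) return(TRUE)
  if (is.na(a) || is.na(b)) return(FALSE)
  a == b
}

# TRUE when a character value can be converted to a number.
can_be_number <- function(x) {
  !is.na(suppressWarnings(as.numeric(x)))
}

# Split an interval string such as "[1,5)" into bounds and closure flags.
parse_interval <- function(interval) {
  interval <- trimws(interval)
  n <- nchar(interval)
  bounds <- as.numeric(strsplit(substr(interval, 2, n - 1), ",")[[1]])
  list(
    low        = bounds[1],
    high       = bounds[2],
    left_open  = substr(interval, 1, 1) == "(",
    right_open = substr(interval, n, n) == ")"
  )
}

# Check whether a value lies inside the interval, respecting open and closed ends.
in_interval <- function(value, interval) {
  p <- parse_interval(interval)
  above <- if (p$left_open) value > p$low else value >= p$low
  below <- if (p$right_open) value < p$high else value <= p$high
  above && below
}

in_interval(1, "[1,5)")  # TRUE: the left end is closed
in_interval(5, "[1,5)")  # FALSE: the right end is open
```

Keeping closure handling in one place means a boundary value such as 5 is assigned to exactly one of two adjacent ranges like `[1,5)` and `[5,10)`.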