
retroharmonize

The goal of retroharmonize is to facilitate retrospective (ex-post) harmonization of data, particularly survey data, in a reproducible manner. The package provides tools for organizing the metadata, standardizing the coding of variables, variable names and value labels, including missing values, and for documenting all transformations, with the help of comprehensive S3 classes.

The package is currently being generalized from problems solved in the not-yet-released eurobarometer package.

Installation

The package is available on CRAN:

install.packages("retroharmonize")

The development version has new features related to the create_codebook() functions. It can be installed from GitHub with:

# install.packages("devtools")
devtools::install_github("rOpenGov/retroharmonize")

A PDF manual is available for download for the 0.2.0 release.

Retrospective data harmonization

The aim of retroharmonize is to provide tools for reproducible retrospective (ex-post) harmonization of datasets that contain variables measuring the same concepts but coded in different ways. Ex-post data harmonization enables better use of existing data and creates new research opportunities. For example, harmonizing data from different countries enables cross-national comparisons, while merging data from different time points makes it possible to track changes over time.
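To make the "coded in different ways" problem concrete, here is a minimal base R sketch with invented toy data: two waves ask the same yes/no question but use different numeric codes and label spellings. A hypothetical lookup (label_map, harmonized_codes and harmonize_codes() are illustrative names, not part of the retroharmonize API) maps each value to its original label and then to one common code.

```r
# Toy data (invented): the same yes/no question coded differently in two waves
wave1 <- c(1, 2, 2, 9)
wave1_labels <- c("Yes" = 1, "No" = 2, "Don't know" = 9)

wave2 <- c(0, 1, 1, 8)
wave2_labels <- c("yes" = 1, "no" = 0, "DK" = 8)

# Hypothetical harmonized coding, keyed by a normalized label
harmonized_codes <- c("yes" = 1, "no" = 0, "do_not_know" = 99)
label_map <- c("yes" = "yes", "no" = "no",
               "don't know" = "do_not_know", "dk" = "do_not_know")

# Recode via value -> original label -> normalized label -> harmonized code
harmonize_codes <- function(x, labels) {
  original_label <- names(labels)[match(x, labels)]
  normalized <- label_map[tolower(original_label)]
  unname(harmonized_codes[normalized])
}

harmonized <- c(harmonize_codes(wave1, wave1_labels),
                harmonize_codes(wave2, wave2_labels))
print(harmonized)
```

Going through the labels rather than the raw numbers matters because the same number can mean different things across waves: "No" is coded 2 in the first wave and 0 in the second, so a purely numeric recode would silently mix them up.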

Retrospective data harmonization is associated with challenges including conceptual issues with establishing equivalence and comparability, practical complications of having to standardize the naming and coding of variables, technical difficulties with merging data stored in different formats, and the need to document a large number of data transformations. The retroharmonize package assists with the latter three components, freeing up the capacity of researchers to focus on the first.

Specifically, the retroharmonize package proposes a reproducible workflow, including a new class for storing data together with the harmonized and original metadata, as well as functions for importing data from different formats, harmonizing data and metadata, documenting the harmonization process, and converting between data types. See the package documentation for an overview of its functionality.
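The renaming and merging steps of such a workflow can be sketched in base R. This is an illustration under assumptions, not the package's method: the toy waves, the crosswalk table and the helper rename_wave() are hypothetical. retroharmonize provides its own functions for these steps, such as harmonize_var_names() and merge_waves(), which this sketch does not use.

```r
# Two toy survey waves (invented data) that name the same variables differently
wave_a <- data.frame(q1 = c(1, 0), country = c("AT", "BE"),
                     stringsAsFactors = FALSE)
wave_b <- data.frame(qa1 = c(0, 1), cntry = c("CZ", "DE"),
                     stringsAsFactors = FALSE)

# Hypothetical crosswalk: original variable name -> harmonized name, per wave
crosswalk <- data.frame(
  id            = c("wave_a", "wave_a", "wave_b", "wave_b"),
  var_name_orig = c("q1", "country", "qa1", "cntry"),
  var_name      = c("trust", "geo", "trust", "geo"),
  stringsAsFactors = FALSE
)

# Rename one wave's columns and record the wave identifier in every row,
# so each merged row can be traced back to its source file
rename_wave <- function(df, wave_id, crosswalk) {
  rows <- crosswalk[crosswalk$id == wave_id, ]
  names(df) <- rows$var_name[match(names(df), rows$var_name_orig)]
  df$id <- wave_id
  df
}

# rbind() on data frames matches columns by name
merged <- rbind(
  rename_wave(wave_a, "wave_a", crosswalk),
  rename_wave(wave_b, "wave_b", crosswalk)
)
print(merged)
```

Keeping the crosswalk as a table, rather than hard-coding the renames, means the table itself doubles as documentation of the transformation.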

The new labelled_spss_survey() class is an extension of haven’s labelled_spss class. It not only preserves the variable and value labels and the user-defined missing range, but also attaches an identifier, such as the filename or the wave number, to the vector. Additionally, it preserves, as metadata attributes, the original variable names, labels, value codes and value labels from the source data alongside the harmonized variable names, labels, value codes and value labels. This way, the harmonized data also contain the pre-harmonization record, and the stored original metadata can be used for validation and documentation.
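The idea of a vector that carries both harmonized and original metadata can be illustrated with plain R attributes. This is only a sketch of the concept: make_survey_vector() and the attribute names name_orig and labels_orig are hypothetical, and the real class is created with labelled_spss_survey(), which may store its metadata differently.

```r
# Illustrative only: a numeric vector carrying label metadata as attributes
make_survey_vector <- function(x, labels, na_values, label, id,
                               name_orig, labels_orig) {
  structure(
    x,
    labels      = labels,      # harmonized value labels
    na_values   = na_values,   # user-defined missing value codes
    label       = label,       # harmonized variable label
    id          = id,          # identifier, e.g. file name or wave number
    name_orig   = name_orig,   # variable name in the source file
    labels_orig = labels_orig  # value labels in the source file
  )
}

trust <- make_survey_vector(
  x = c(1, 0, 0, 99),
  labels = c("yes" = 1, "no" = 0, "do_not_know" = 99),
  na_values = 99,
  label = "Trust in parliament",
  id = "wave_1",
  name_orig = "q12a",
  labels_orig = c("Yes" = 1, "No" = 2, "Don't know" = 9)
)

# The pre-harmonization record travels with the harmonized values
attr(trust, "id")
attr(trust, "labels_orig")

# Treat user-defined missing codes as NA before computing statistics;
# as.vector() drops the attributes from the atomic vector
valid <- as.vector(trust)
valid[valid %in% attr(trust, "na_values")] <- NA
mean(valid, na.rm = TRUE)
```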

The vignette Working With The labelled_spss_survey Class provides more information about the labelled_spss_survey() class.

In the vignette Harmonize Value Labels, we discuss the characteristics of the labelled_spss_survey() class and demonstrate the problems this class solves.

We also provide three extensive case studies illustrating how the retroharmonize package can be used for ex-post harmonization of data from cross-national surveys: Afrobarometer, Arab Barometer, and Eurobarometer.

The creators of retroharmonize are not affiliated with Afrobarometer, Arab Barometer, Eurobarometer, or the organizations that design, produce or archive their surveys.

We have started building experimental data APIs that run retroharmonize regularly to improve known statistical data sources. See the Digital Music Observatory, the Green Deal Data Observatory, and the Economy Data Observatory.

Citations and related work

Citing the data sources

Our package has been tested on the microdata of three harmonized survey programs. Because retroharmonize is not affiliated with any of these data sources, to replicate our tutorials or work with the data, you have to download the data files from these sources and cite those sources in your work.

Afrobarometer data: cite Afrobarometer.

Arab Barometer data: cite Arab Barometer.

Eurobarometer data: The Eurobarometer raw data and related documentation (questionnaires, codebooks, etc.) are made available by GESIS, ICPSR and through the Social Science Data Archive networks. You should cite your source; in our examples, we rely on the GESIS data files.

Citing the retroharmonize R package

For main developer and contributors, see the package homepage.

This work can be freely used, modified and distributed under the GPL-3 license. To cite the package in publications, use:

citation("retroharmonize")
#> 
#> To cite package 'retroharmonize' in publications use:
#> 
#>   Daniel Antal (2021). retroharmonize: Ex Post Survey Data
#>   Harmonization. https://retroharmonize.dataobservatory.eu/,
#>   https://ropengov.github.io/retroharmonize/,
#>   https://github.com/rOpenGov/retroharmonize.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {retroharmonize: Ex Post Survey Data Harmonization},
#>     author = {Daniel Antal},
#>     year = {2021},
#>     note = {https://retroharmonize.dataobservatory.eu/,
#> https://ropengov.github.io/retroharmonize/,
#> https://github.com/rOpenGov/retroharmonize},
#>   }

Contact

For contact information, see the package homepage.

Code of Conduct

Please note that the retroharmonize project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version: 0.2.0

License: GPL-3

Maintainer: Daniel Antal

Last Published: November 2nd, 2021

Functions in retroharmonize (0.2.0)

suggest_var_names: Suggest variable names
suggest_permanent_names: Suggest permanent names
metadata_initialize: Initialize a metadata data frame
metadata_create: Create a metadata table
document_survey_item: Document survey item harmonization
label_normalize: Normalize value and variable labels
here: Here
document_waves: Document survey lists
create_codebook: Create a codebook
as_factor: Convert a labelled_spss_survey vector to a factor
convert_to_labelled_spss: Convert to haven_labelled_spss
subset_waves: Subset all surveys in a wave
subset_save_surveys: Subset and save surveys
harmonize_na_values: Harmonize na_values in haven_labelled_spss
labelled_spss_survey: Labelled vectors for multiple SPSS surveys
pull_survey: Pull a survey from a survey list
harmonize_values: Harmonize the values and labels of labelled vectors
merge_waves: Merge waves
read_dta: Read Stata DTA (`.dta`) files
harmonize_var_names: Harmonize the variable names of surveys
as_labelled_spss_survey: Convert labelled to labelled_spss_survey
harmonize_waves: Harmonize waves
collect_val_labels: Collect labels from a metadata file
read_surveys: Read survey files
concatenate: Concatenate haven_labelled_spss vectors
%>%: Pipe operator
na_range_to_values: Harmonize user-defined missing value ranges
read_rds: Read a survey from an rds file
read_spss: Read SPSS (`.sav`, `.zsav`, `.por`) files; write `.sav` and `.zsav` files
survey: Survey data frame
retroharmonize: Retrospective harmonization of survey data files
validate_harmonize_labels: Validate the harmonize_labels parameter; check that "from", "to", and "numeric_values" are of equal lengths.
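The last entry only states that validate_harmonize_labels() checks that "from", "to" and "numeric_values" have equal lengths. Below is a base R sketch of such a check, using the hypothetical name check_harmonize_labels(); it is not the package's implementation, the example label values are invented, and the real function's arguments and messages may differ.

```r
# Hypothetical check mirroring the documented rule:
# "from", "to" and "numeric_values" must all have the same length.
check_harmonize_labels <- function(harmonize_labels) {
  required <- c("from", "to", "numeric_values")
  missing_elements <- setdiff(required, names(harmonize_labels))
  if (length(missing_elements) > 0) {
    stop("Missing element(s): ", paste(missing_elements, collapse = ", "))
  }
  # Length of each required element; all three must agree
  lengths_found <- vapply(harmonize_labels[required], length, integer(1))
  if (length(unique(lengths_found)) != 1) {
    stop("'from', 'to' and 'numeric_values' must have equal lengths.")
  }
  invisible(TRUE)
}

good <- list(
  from = c("Yes", "No", "Don't know"),
  to = c("yes", "no", "do_not_know"),
  numeric_values = c(1, 0, 99)
)
bad <- list(from = c("Yes", "No"), to = c("yes"), numeric_values = c(1, 0))

check_harmonize_labels(good)      # passes silently
try(check_harmonize_labels(bad))  # reports an error about unequal lengths
```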