incadata v0.6.1


by Erik Bulow

Recognize and Handle Data in Formats Used by Swedish Cancer Centers

Handle data in formats used by cancer centers in Sweden, both from 'INCA' (the current register platform; see <http://www.incanet.se> for more information) and from the older register platform 'Rockan' (used in the Western and Northern parts of the country). All variables are coerced to suitable classes based on their format. Dates (in various formats, such as with a missing month or day, with or without a century prefix, or with just a week number) are all recognized as dates and coerced to the ISO 8601 standard (Y-m-d). Boolean variables (internally stored as 0/1, or as "True"/"False"/blanks when exported) are coerced to logical. Variable names ending in '_Beskrivning' and '_Varde' will be character, and 'PERSNR' will be coerced (if possible) to a valid personal identification number 'pin' (by the 'sweidnumbr' package). The package also allows the user to interactively choose whether a variable should be coerced into a potential format even though not all of its values conform to the recognized pattern. It also contains a caching mechanism that temporarily stores data sets with their newly decided formats, so that the identification process does not have to be rerun each time. Finally, it includes a mechanism to aid the documentation process for projects built on data from 'INCA'.

Readme

NOTE: This package is still in beta! Please report any issue!

incadata

Motivating example

Some INCA formats are strange!

  1. All of these are valid dates: 1985-05-04, "", 19850504, 19850500, 19850000, 8513
  2. This is an INCA internal Boolean: c(0, 1, 0, 1, 0, 0)
  3. This is an INCA exported Boolean: c(NA, "True", NA, "True", NA, NA)
  4. This is a valid personal identification number: 19470101000X (note the last "X")
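To make items 2 and 3 concrete, here is a minimal base R sketch of how the two Boolean representations could be mapped to the same logical vector. It does not use incadata and is not the package's implementation. It also assumes that a blank (NA) in an exported Boolean means FALSE.

```r
# Base R only; an illustration, not the incadata implementation.
internal <- c(0, 1, 0, 1, 0, 0)                 # INCA internal Boolean
exported <- c(NA, "True", NA, "True", NA, NA)   # INCA exported Boolean

# 0/1 maps directly to FALSE/TRUE.
internal_lgl <- as.logical(internal)

# Assumption: a blank (NA) in an exported Boolean is treated as FALSE.
exported_lgl <- !is.na(exported) & exported == "True"

# Both representations now carry the same information.
identical(internal_lgl, exported_lgl)
```

If blanks should instead remain missing, `exported == "True"` alone would keep them as NA.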

The INCA workflow today requires that you use a data frame "df" when working online, but read your data from disk when working offline. This forces you either to maintain different scripts for different development stages, or to include an "if else" clause that identifies the current environment.
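The "if else" clause described above might look roughly like the following base R sketch. The function name `read_register_data` and the semicolon-separated file format are assumptions made for illustration; they are not part of incadata.

```r
# Hypothetical helper (not part of incadata): return the data frame "df"
# if one exists in the given environment (as on INCA), otherwise read a
# local file. Assumption: the local export is a semicolon-separated CSV.
read_register_data <- function(path, env = globalenv()) {
  # inherits = FALSE avoids matching stats::df, which is a function.
  on_inca <- exists("df", envir = env, inherits = FALSE) &&
    is.data.frame(get("df", envir = env, inherits = FALSE))
  if (on_inca) {
    get("df", envir = env, inherits = FALSE)
  } else {
    read.csv2(path, stringsAsFactors = FALSE)
  }
}
```

The check is stricter than `exists("df")` because base R always has a function called `df` in the stats package, so a plain existence check would succeed even when working offline.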

Working with register data often requires good knowledge of the form structure and access to register documentation, which must be found online.

What can "incadata" do for you?

The incadata package will recognize all peculiarities above and will coerce all formats into reasonable ones. It will also:

  • Always use lower case names since these are generally easier to work with
  • Treat data frames as "tibbles" since these have some advantages over regular data frames.
  • Add an id column to data frames in order to always have an identification variable at hand (regardless of whether the data has none or one of PERSNR, PNR or PAT_ID)
  • Enhance the data with some automatically decoded variables (relying on the decoder package)
  • Let you cache your data sets between work sessions in order to speed up the data loading and munging process
  • Let you use a single data reading/munging function regardless of whether you work on INCA or locally
  • Let you interactively engage in the coercion of variable formats. This is handy, for example, if a variable is almost a date but has some additional entries that are not recognised as such.
  • Provide a mechanism for project documentation, for easy access to and storage of INCA register documentation (see vignette "incadoc").

Introduction

Some learning resources, in their recommended order. Note that these refer to the published CRAN version of the documentation. Please also check any uncertainties against the current development version after installing the package from Bitbucket (documentation might differ heavily during the initial development and evaluation phase of the package).

  1. Vignette: incadata
  2. Vignette: rccdates
  3. Vignette: incadoc
  4. function documentation
  5. PDF Reference manual

Install

# A stable version of the package can be installed from CRAN:
install.packages("incadata")

# The latest development version can be installed from Bitbucket.
# Set argument `build_vignettes = TRUE` to also build the vignettes linked above:
devtools::install_bitbucket("cancercentrum/incadata")

Functions in incadata

Name             Description
as.Dates         Converting potential date to Date vector
as.incadata      Identify data formats used by INCA and Rockan
is.incalogical   Coerce to logical if value is logical according to INCA
lt               Lead time from one date to another
documents        Download and possibly open INCA documentation
dplyr_methods    dplyr methods for INCA data
ex_data          Synthetic example data from INCA
find_documents   List all documents for a register
find_register    Find register by name
id               Add id variables to data frame
use_incadata     Use incadata from file or dataframe df
next_method      Function to create methods for generics
reexports        Objects exported from other packages

Vignettes of incadata

Name
incadata.Rmd
incadoc.Rmd
rccdates.Rmd

Details

Type Package
License GPL-2
RoxygenNote 6.0.1.9000
VignetteBuilder knitr
URL https://www.bitbucket.org/cancercentrum/incadata
BugReports https://www.bitbucket.org/cancercentrum/incadata/issues
LazyData true
NeedsCompilation no
Packaged 2017-07-28 12:39:59 UTC; erikbulow
Repository CRAN
Date/Publication 2017-07-28 12:46:05 UTC
