
dataPreparation (version 0.1)

prepareSet: Preparation pipeline

Description

Full pipeline for preparing your dataSet. It performs the following steps:

  • Correct set: identify dates and numerics that are hidden in strings

  • Transform set: compute differences between every pair of dates; if `key` is provided, aggregate according to this key

  • Filter set: remove constant, duplicated ("in double") and bijection variables (columns that are a one-to-one mapping of another column); if `digits` is provided, round numerics

  • Handle NA: perform fastHandleNa

  • Shape set: put the result in the requested shape (`finalForm`) with acceptable column formats
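To make the steps above more concrete, here is a simplified base R sketch of the correct, transform and filter steps. It is an illustration under stated assumptions, not the dataPreparation implementation: the helper names (`to_numeric_if_possible`, `to_date_if_possible`, `sketch_pipeline`), the single date format, the `diff.<a>.<b>` column naming and the choice to keep the original date columns are all hypothetical, and bijection filtering and NA handling are left out.

```r
# Hypothetical helper (not part of dataPreparation): turn a character column
# into numeric when every non-missing value parses as a number.
to_numeric_if_possible <- function(x) {
  if (!is.character(x)) return(x)
  parsed <- suppressWarnings(as.numeric(x))
  if (all(is.na(parsed) == is.na(x))) parsed else x
}

# Hypothetical helper: turn a character column into Date when every
# non-missing value parses with the given format.
to_date_if_possible <- function(x, format = "%Y-%m-%d") {
  if (!is.character(x)) return(x)
  parsed <- as.Date(x, format = format)
  if (all(is.na(parsed) == is.na(x))) parsed else x
}

# Hypothetical, simplified pipeline (bijections and NA handling omitted).
sketch_pipeline <- function(df, analysisDate = NULL, digits = NULL,
                            format = "%Y-%m-%d") {
  # Correct set: recover numerics and dates hidden in strings
  df[] <- lapply(df, to_numeric_if_possible)
  df[] <- lapply(df, to_date_if_possible, format = format)

  # Transform set: day differences between every pair of date columns
  date_cols <- names(df)[vapply(df, inherits, logical(1), what = "Date")]
  if (length(date_cols) >= 2) {
    for (pair in combn(date_cols, 2, simplify = FALSE)) {
      df[[paste("diff", pair[1], pair[2], sep = ".")]] <-
        as.numeric(difftime(df[[pair[1]]], df[[pair[2]]], units = "days"))
    }
  }

  # Optional: difference between analysisDate and each date column
  if (!is.null(analysisDate)) {
    for (col in date_cols) {
      df[[paste("diff", col, "analysisDate", sep = ".")]] <-
        as.numeric(difftime(analysisDate, df[[col]], units = "days"))
    }
  }

  # Filter set: drop constant columns, then exact duplicate columns
  is_constant <- vapply(df, function(x) length(unique(x)) <= 1, logical(1))
  df <- df[, !is_constant, drop = FALSE]
  df <- df[, !duplicated(as.list(df)), drop = FALSE]

  # Optional rounding of numeric columns (Date columns are not numeric)
  if (!is.null(digits)) {
    num_cols <- vapply(df, is.numeric, logical(1))
    df[num_cols] <- lapply(df[num_cols], round, digits = digits)
  }
  df
}
```

For example, calling `sketch_pipeline(raw, analysisDate = as.Date("2017-02-01"), digits = 1)` on a data.frame whose columns are all character converts `"0.123"`-style strings to numbers, `"2017-01-01"`-style strings to Dates, adds the day-difference columns and drops any column holding a single value.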

Usage

prepareSet(dataSet, finalForm = "data.table", verbose = TRUE, ...)

Arguments

dataSet

Matrix, data.frame or data.table

finalForm

"data.table" or "numerical_matrix" (default: "data.table")

verbose

Should the function print progress messages? (logical, default: TRUE)

...

Additional parameters to tune the pipeline (see Details)

Value

A data.table or a numerical matrix (according to finalForm).
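With finalForm = "numerical_matrix" every column has to end up numeric. How prepareSet encodes non-numeric columns is not described on this page, so the base R sketch below only illustrates one common approach under that assumption: numeric, logical and Date columns are kept as numbers and character or factor columns are one-hot encoded. The function name `to_numeric_matrix` and the `<column>.<level>` naming are hypothetical.

```r
# Hypothetical helper (not the dataPreparation implementation):
# convert a data.frame into a numeric matrix.
to_numeric_matrix <- function(df) {
  blocks <- lapply(names(df), function(col) {
    x <- df[[col]]
    if (is.numeric(x) || is.logical(x) || inherits(x, "Date")) {
      # Numeric-like columns become a single numeric column
      matrix(as.numeric(x), ncol = 1, dimnames = list(NULL, col))
    } else {
      # Character/factor columns: one 0/1 indicator column per level
      f <- factor(x)
      m <- outer(as.character(f), levels(f), "==") * 1
      colnames(m) <- paste(col, levels(f), sep = ".")
      m
    }
  })
  do.call(cbind, blocks)
}
```

For a data.frame with a numeric `age` column and a character `sex` column taking values "F" and "M", this returns a matrix with columns `age`, `sex.F` and `sex.M`.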

Details

Additional arguments are available to tune the pipeline:

  • key name of a column of dataSet according to which dataSet should be aggregated (character)

  • analysisDate A date at which the dataSet should be aggregated (differences between every date and analysisDate will be computed) (Date)

  • digits The number of digits after the decimal point (optional, numeric; if set, fastRound will be performed)

  • dateFormats List of date formats used in dataSet (list of characters)

  • name_separator string to separate parts of new column names (string)

  • functions aggregation functions for numeric columns (list of functions)
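The key, functions and name_separator arguments work together when the set is aggregated. The base R sketch below shows the idea under explicit assumptions: `aggregate_by_key` is a hypothetical name, `functions` is assumed to be a named list (the examples on this page pass an unnamed one), and each output column is assumed to be named `<column><name_separator><function name>`. This is not necessarily how prepareSet names or computes its aggregates.

```r
# Hypothetical helper (not the dataPreparation implementation):
# aggregate every numeric column by `key` with each function in `functions`.
aggregate_by_key <- function(df, key, functions, name_separator = ".") {
  num_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], key)
  result <- NULL
  for (fname in names(functions)) {
    # One row per key value, one column per numeric column
    agg <- aggregate(df[num_cols], by = df[key], FUN = functions[[fname]])
    names(agg)[-1] <- paste(num_cols, fname, sep = name_separator)
    # Combine the per-function results side by side on the key
    result <- if (is.null(result)) agg else merge(result, agg, by = key)
  }
  result
}
```

For example, with `power <- function(x) sum(x^2)`, calling `aggregate_by_key(df, "country", list(min = min, power = power), name_separator = "_")` returns one row per country with columns such as `age_min` and `age_power`.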

Examples

# NOT RUN {
# "NOT RUN" means that these examples are not run on CRAN because they take a long time.
# You can still run them yourself!

# Load the package and the messy example set
library(dataPreparation)
data(messy_adult)

# Take a look at the set
head(messy_adult)

# Compute the full pipeline
clean_adult <- prepareSet(messy_adult)

# With a reference date
adult_agg <- prepareSet(messy_adult, analysisDate = as.Date("2017-01-01"))

# Add aggregation by country
adult_agg <- prepareSet(messy_adult, analysisDate = as.Date("2017-01-01"), key = "country")

# With some new aggregation functions
power <- function(x) {sum(x^2)}
adult_agg <- prepareSet(messy_adult, analysisDate = as.Date("2017-01-01"), key = "country",
                        functions = c(min, max, mean, power))
# }
