dataPreparation (version 1.1.2)

Automated Data Preparation

Description

Handles most of the painful data preparation for a data science project with a minimum amount of code. It takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

Install

install.packages('dataPreparation')
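
Once installed, a first session might look like the sketch below. It assumes the installed version exposes the argument names used here (`data_set` as the first argument, `verbose`) and that `fast_filter_variables` works with its default settings. The variable names `messy`, `constant_cols`, and `cleaned` are only illustrative.

```r
# Sketch only: assumes the package is installed and that the argument
# names used below match the installed version.
library(dataPreparation)

# messy_adult is one of the example data sets listed in the function index
data("messy_adult")

# Work on a copy as a precaution: data.table-based functions may modify
# their input by reference
messy <- data.table::copy(messy_adult)

# Indices of columns that hold a single value
constant_cols <- which_are_constant(messy, verbose = FALSE)

# Filter out columns that carry no extra information
cleaned <- fast_filter_variables(messy, verbose = FALSE)

# Compare the shape before and after filtering
dim(messy_adult)
dim(cleaned)
```

Setting `verbose = FALSE` only silences the progress messages. Leaving it at its default is useful for a first run, since the messages report which columns were dropped.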

Monthly Downloads

762

Version

1.1.2

License

GPL-3 | file LICENSE

Maintainer

Emmanuel-Lin Toulemonde

Last Published

September 2nd, 2025

Functions in dataPreparation (1.1.2)

fast_filter_variables

Filtering useless variables
fast_handle_na

Handle NA values
adult

Adult data set from the UCI repository
aggregate_by_key

Automatic data_set aggregation by key
fast_is_equal

Fast checks of equality
as.POSIXct_fast

Faster date transformation
build_bins

Compute bins
generate_from_factor

Recode factor
fast_round

Fast round
find_and_transform_numerics

Identify numeric columns in a data_set
generate_date_diffs

Date difference
messy_adult

Adult with some ugly columns added
identify_dates

Identify date columns
compute_weight_of_evidence

Compute weight of evidence
compute_probability_ratio

Compute probability ratio
generate_from_character

Recode character
date_format_unifier

Unify dates format
generate_factor_from_date

Generate factor from dates
data_preparation_news

Show the NEWS file
one_hot_encoder

One hot encoder
prepare_set

Preparation pipeline
set_col_as_date

Set columns as POSIXct
tiny_messy_adult

First 500 rows of messy_adult
get_most_frequent_element

Get most frequent element
shape_set

Final preparation before ML algorithm
set_col_as_numeric

Set columns as numeric
target_encode

Target encode
which_are_included

Identify columns that are included in others
remove_rare_categorical

Filter rare categories
set_as_numeric_matrix

Numeric matrix preparation for Machine Learning
remove_percentile_outlier

Percentile outlier filtering
which_are_constant

Identify constant columns
set_col_as_character

Set columns as character
same_shape

Give same shape
remove_sd_outlier

Standard deviation outlier filtering
un_factor

Unfactor factor with too many values
which_are_in_double

Identify double columns
set_col_as_factor

Set columns as factor
which_are_bijection

Identify bijections
build_date_factor

Date Factor
build_encoding

Compute encoding
build_scales

Compute scales
build_target_encoding

Build target encoding
fast_discretization

Discretization
fast_scale

Scale
description

Describe data set
find_and_transform_dates

Identify date columns