dataprep (version 0.1.5)

Efficient and Flexible Data Preprocessing Tools

Description

Preprocess data efficiently and flexibly with a set of filtering, deletion, and interpolation tools. The methods rest on the principles of completeness and accuracy, on the threshold method, and on linear interpolation; they are implemented through constraint conditions, time completion and recovery, and fast calculation and grouping. The key preprocessing steps are deletion of variables and observations, outlier removal, and interpolation of missing values (NA); which steps apply depends on how incomplete and how dispersed the raw data are. Compared with ordinary methods, they clean data more accurately, keep more samples, and add no outliers after interpolation. Consecutive NA are identified automatically via run-length-based grouping, which is used in observation deletion, outlier removal, and NA interpolation, so interpolation does not generate new outliers. A conditional extremum is proposed to realize point-by-point weighed outlier removal, which saves non-outliers from being removed. In addition, time series interpolation with values to refer to within short periods further ensures reliable interpolation. These methods are based on and improved from the reference: Liang, C.-S., Wu, H., Li, H.-Y., Zhang, Q., Li, Z. & He, K.-B. (2020).
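
The run-length idea in the description can be illustrated with base R alone. The sketch below is not the package's implementation: the helper names `na_run_length` and `interpolate_short_gaps` and the `max_gap` parameter are hypothetical, chosen only for this example. It groups consecutive NA with `rle()` and fills a gap by linear interpolation only when the gap is short, so long gaps are left as NA rather than bridged.

```r
# Illustrative sketch in base R; helper names and `max_gap` are assumptions,
# not part of the dataprep API.

# For each element, the length of the NA run it belongs to (0 for non-NA).
na_run_length <- function(x) {
  runs <- rle(is.na(x))
  rep(ifelse(runs$values, runs$lengths, 0L), runs$lengths)
}

# Fill interior NA runs of at most `max_gap` values by linear interpolation.
# Longer runs, and leading or trailing NA, are left as NA.
interpolate_short_gaps <- function(x, max_gap = 2L) {
  run_len <- na_run_length(x)
  # approx() ignores NA pairs and interpolates between the remaining points.
  filled <- approx(seq_along(x), x, xout = seq_along(x))$y
  fill <- is.na(x) & run_len <= max_gap
  x[fill] <- filled[fill]
  x
}

x <- c(1, NA, 3, NA, NA, NA, 7, 8)
na_run_length(x)                         # 0 1 0 3 3 3 0 0
interpolate_short_gaps(x, max_gap = 2L)  # 1 2 3 NA NA NA 7 8
```

Here the single missing value is filled from its neighbours, while the run of three NA exceeds `max_gap` and stays missing; such a run would be a candidate for observation deletion instead.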

Install

install.packages('dataprep')

Monthly Downloads

323

Version

0.1.5

License

GPL (>= 2)

Maintainer

Chun-Sheng Liang

Last Published

January 15th, 2022

Functions in dataprep (0.1.5)

obsedele

Delete observations with variable(s) containing too many consecutive missing values (NA) in time series

dataprep

Data preprocessing with multiple steps in one function

percdata

Calculate the top and bottom percentiles of each selected variable

descdata

Fast descriptive statistics

melt

Turn variable names and values into two columns

descplot

View the descriptive statistics via plot

optisolu

Find an optimal combination of interval and times for condextr

condextr

Remove outliers using point-by-point weighed outlier removal by conditional extremum

data1

Example data (data1, particle number concentrations in SMEAR I Varrio forest)

percoutl

Traditional percentile-based outlier removal

shorvalu

Interpolation with values to refer to within short periods

zerona

Turn zeros to missing values

percplot

Plot the top and bottom percentiles of each selected variable

data

Example data (particle number concentrations in SMEAR I Varrio forest)

varidele

Delete variables containing too many missing values (NA)
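
To make a few of the steps listed above concrete, here is a minimal base R sketch of three of them: turning zeros into NA, deleting variables with too many NA, and traditional percentile-based outlier removal. It does not call the package's functions, whose arguments are not shown on this page. The helper names, the thresholds, and the toy data frame are all assumptions made for illustration.

```r
# Illustrative sketch in base R; none of these helpers belong to dataprep.

# Toy data; column names and values are made up.
df <- data.frame(
  a = c(0, 2, 3, 4, 100, 6, 7, 8, 9, 10),
  b = c(NA, NA, NA, NA, NA, NA, 1, 2, 3, 4)
)

# Turn zeros into missing values in every column.
zeros_to_na <- function(df) {
  df[] <- lapply(df, function(v) replace(v, !is.na(v) & v == 0, NA))
  df
}

# Drop variables whose fraction of NA exceeds `max_na_fraction`.
drop_sparse_variables <- function(df, max_na_fraction = 0.25) {
  df[, colMeans(is.na(df)) <= max_na_fraction, drop = FALSE]
}

# Set values outside the bottom and top percentiles to NA.
remove_percentile_outliers <- function(v, bottom = 0.05, top = 0.95) {
  limits <- quantile(v, probs = c(bottom, top), na.rm = TRUE)
  replace(v, !is.na(v) & (v < limits[[1]] | v > limits[[2]]), NA)
}

cleaned <- drop_sparse_variables(zeros_to_na(df))
cleaned$a <- remove_percentile_outliers(cleaned$a)
names(cleaned)  # "a"
cleaned$a
```

In this toy run, column `b` is dropped because 60% of its values are missing, and the value 100 in column `a` is removed as an outlier. The percentile cut also removes the smallest remaining value, which is not an obvious outlier; this loss of ordinary values is the kind of side effect the conditional extremum approach is described as avoiding.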