cleanerR (version 0.1.1)

How to Handle your Missing Data

Description

How should missing data be handled? Based on the concept of almost functional dependencies, this package proposes a method to fill in missing data and also helps you see which data are missing. The user can specify an error measure and how many combinations to test the dependencies against: the closer this number is to the length of the dataset, the more precise the result, but the higher the number, the longer the process takes to finish. If the program cannot predict a value with the accuracy the user requires, it does not fill it in; the user can then choose to allow a larger error or deal with the data in another way.
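
The description does not spell out the exact error measure. A common way to quantify an almost functional dependency X -> Y is the share of rows that agree with the most frequent Y value within each group of identical X values, where 1 means X determines Y exactly. The base R sketch below assumes that measure; dependency_accuracy is a hypothetical helper, not a cleanerR function, and cleanerR's own measure may differ.

```r
# Hypothetical helper, not a cleanerR function: share of complete rows whose
# goal value equals the most frequent goal value among rows with the same
# predictor values (1 means the predictors fully determine the goal).
dependency_accuracy <- function(df, predictors, goal) {
  # Keep only rows where every predictor and the goal are observed
  observed <- stats::complete.cases(df[, c(predictors, goal), drop = FALSE])
  complete <- df[observed, , drop = FALSE]
  if (nrow(complete) == 0) return(NA_real_)
  # One group per distinct combination of predictor values
  key <- interaction(complete[, predictors, drop = FALSE], drop = TRUE)
  # In each group, count the rows that agree with the group's most common goal value
  hits <- tapply(complete[[goal]], key, function(y) max(table(y)))
  sum(hits) / nrow(complete)
}

# Example: city almost determines state; the row with a missing city is ignored
df <- data.frame(
  city  = c("A", "A", "B", "B", "B", NA),
  state = c("X", "X", "Y", "Y", "Z", "X"),
  stringsAsFactors = FALSE
)
acc <- dependency_accuracy(df, "city", "state")  # 4 of 5 rows agree: 0.8
```

Counting only agreement with the group's most frequent value means the measure also tells you how often a mode-based fill would have been right on the observed rows.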

Install

install.packages('cleanerR')

Monthly Downloads

16

Version

0.1.1

License

MIT + file LICENSE

Maintainer

Rafael Pereira

Last Published

February 10th, 2019

Functions in cleanerR (0.1.1)

GenerateCandidates

GenerateCandidates asks for a data frame and some parameters and returns every combination of columns that can predict the goal column within the error given as input. The result is a list whose first element holds the combinations and whose second element holds their error measures. To get the best parameters, call BestVector.
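
The search itself is not shown on this page. Assuming a combination's error is 1 minus the dependency accuracy (share of rows matching the most frequent goal value within each group of identical predictor values), a brute-force version can be sketched in base R with combn(). The names generate_candidates_sketch, dependency_accuracy, max_size and max_error are hypothetical, not cleanerR's API; the returned list only mirrors the documented shape (combinations first, errors second).

```r
# Hypothetical measure (not cleanerR's): share of complete rows matching the
# most frequent goal value within each group of identical predictor values.
dependency_accuracy <- function(df, predictors, goal) {
  observed <- stats::complete.cases(df[, c(predictors, goal), drop = FALSE])
  complete <- df[observed, , drop = FALSE]
  if (nrow(complete) == 0) return(NA_real_)
  key <- interaction(complete[, predictors, drop = FALSE], drop = TRUE)
  sum(tapply(complete[[goal]], key, function(y) max(table(y)))) / nrow(complete)
}

# Hypothetical sketch: try every combination of up to `max_size` non-goal
# columns and keep those whose error (1 - accuracy) is at most `max_error`.
generate_candidates_sketch <- function(df, goal, max_size, max_error) {
  others <- setdiff(names(df), goal)
  combos <- list()
  errors <- numeric(0)
  for (k in seq_len(min(max_size, length(others)))) {
    for (cols in combn(others, k, simplify = FALSE)) {
      err <- 1 - dependency_accuracy(df, cols, goal)
      # Small tolerance so that, e.g., 1 - 0.8 still passes max_error = 0.2
      if (!is.na(err) && err <= max_error + 1e-12) {
        combos[[length(combos) + 1]] <- cols
        errors <- c(errors, err)
      }
    }
  }
  # Combinations first, error measures second
  list(combos, errors)
}

df <- data.frame(
  city  = c("A", "A", "B", "B", "C"),
  zone  = c("n", "s", "n", "s", "n"),
  state = c("X", "X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)
cands <- generate_candidates_sketch(df, "state", max_size = 2, max_error = 0.1)
```

Here city alone and city plus zone both determine state exactly, while zone alone (accuracy 0.4) is rejected. The number of combinations grows quickly with max_size, which is one reason an exhaustive search becomes slow on wide datasets.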

WorstAccuracyTable

WorstAccuracyTable asks for a data.table, a vector of column indices, and the goal column, and returns the minimum possible accuracy of filling in missing values.

WorstAccuracy

WorstAccuracy asks for a data frame, a vector of column indices, and the goal column, and returns the minimum possible accuracy of filling in missing values.

NA_VALUES

NA_VALUES asks for a data frame and returns a table of how many missing values are in each column.
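
The exact output format of NA_VALUES is not shown here. For comparison, the same per-column count can be computed in base R; this is an equivalent sketch, not the package's code, and the example data are made up.

```r
# Base R equivalent of a per-column missing-value count (not NA_VALUES itself)
df <- data.frame(
  age  = c(23, NA, 41, NA),
  city = c("A", "B", NA, "C"),
  stringsAsFactors = FALSE
)
# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
missing_per_column <- colSums(is.na(df))
print(missing_per_column)  # age: 2, city: 1
```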

GenerateCandidatesTable

GenerateCandidatesTable asks for a data.table and some parameters and returns every combination of columns that can predict the goal column within the error given as input. The result is a list whose first element holds the combinations and whose second element holds their error measures. To get the best parameters, call BestVector.

MeanAccuracyTable

MeanAccuracyTable asks for a data.table, a vector of column indices, and the goal column, and returns the expected accuracy of filling in missing values, assuming the dataset is representative.

MeanAccuracy

MeanAccuracy asks for a data frame, a vector of column indices, and the goal column, and returns the expected accuracy of filling in missing values, assuming the dataset is representative.

CandidatesTable

CandidatesTable is the implementation of Candidates that asks for a data.table object.

AutoComplete

AutoComplete asks for a data frame, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.
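
The fill rule from the description (predict only when the required accuracy is met, otherwise leave the value missing) can be sketched in base R as follows. This is a hypothetical illustration, not cleanerR's algorithm: fill_by_dependency and min_accuracy are made-up names, and it assumes the prediction is the most frequent goal value among observed rows with the same predictor values.

```r
# Hypothetical sketch, not cleanerR's algorithm: fill missing `goal` values with
# the most frequent goal value among observed rows that share the same predictor
# values, but only when that value covers at least `min_accuracy` of the group.
# Cells that cannot be predicted accurately enough are left as NA.
fill_by_dependency <- function(df, predictors, goal, min_accuracy = 0.9) {
  pred <- df[, predictors, drop = FALSE]
  # One string key per row built from its predictor values
  key <- do.call(paste, c(unname(as.list(pred)), sep = "\u001f"))
  has_predictors <- stats::complete.cases(pred)
  known  <- has_predictors & !is.na(df[[goal]])
  target <- which(has_predictors & is.na(df[[goal]]))
  for (i in target) {
    values <- df[[goal]][known & key == key[i]]
    if (length(values) == 0) next  # this predictor combination was never observed
    candidates <- unique(values)
    counts <- tabulate(match(values, candidates))
    best <- which.max(counts)
    # Fill only if predicting the group's mode meets the requested accuracy
    if (counts[best] / length(values) >= min_accuracy) {
      df[[goal]][i] <- candidates[best]
    }
  }
  df
}

df <- data.frame(
  city  = c("A", "A", "A", "B", "B", "B", "A", "B"),
  state = c("X", "X", "X", "Y", "Z", "Y", NA, NA),
  stringsAsFactors = FALSE
)
strict  <- fill_by_dependency(df, "city", "state", min_accuracy = 0.9)
lenient <- fill_by_dependency(df, "city", "state", min_accuracy = 0.6)
```

With min_accuracy = 0.9 only the city A row is filled (its group is all X); the city B row stays NA because Y covers only 2 of 3 observed rows. Lowering the threshold to 0.6 fills it with Y, which mirrors the trade-off of accepting a larger error.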

AutoCompleteTable

AutoCompleteTable asks for a data.table, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

BestAccuracy

BestAccuracy asks for a data frame, a vector of column indices, and the goal column, and returns the maximum possible accuracy of filling in missing values.

BestAccuracyTable

BestAccuracyTable asks for a data.table, a vector of column indices, and the goal column, and returns the maximum possible accuracy of filling in missing values.

CompleteDataset

CompleteDataset asks for a data frame, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

BestVector

BestVector asks for a data frame and some parameters and returns the best combination of columns for predicting the missing values.
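
Choosing a single best combination can be sketched as an exhaustive search that keeps the highest-scoring column set. This assumes the same mode-based dependency accuracy as a scoring rule; best_vector_sketch, dependency_accuracy and max_size are hypothetical names, not cleanerR's API or parameters.

```r
# Hypothetical measure (not cleanerR's): share of complete rows matching the
# most frequent goal value within each group of identical predictor values.
dependency_accuracy <- function(df, predictors, goal) {
  observed <- stats::complete.cases(df[, c(predictors, goal), drop = FALSE])
  complete <- df[observed, , drop = FALSE]
  if (nrow(complete) == 0) return(NA_real_)
  key <- interaction(complete[, predictors, drop = FALSE], drop = TRUE)
  sum(tapply(complete[[goal]], key, function(y) max(table(y)))) / nrow(complete)
}

# Hypothetical sketch: pick the combination of at most `max_size` columns whose
# dependency accuracy for `goal` is highest.
best_vector_sketch <- function(df, goal, max_size) {
  others <- setdiff(names(df), goal)
  best <- NULL
  best_accuracy <- -Inf
  # Smaller combinations are tried first, and a strict ">" means that on ties
  # the combination found first (the one with fewest columns) is kept
  for (k in seq_len(min(max_size, length(others)))) {
    for (cols in combn(others, k, simplify = FALSE)) {
      acc <- dependency_accuracy(df, cols, goal)
      if (!is.na(acc) && acc > best_accuracy) {
        best <- cols
        best_accuracy <- acc
      }
    }
  }
  list(columns = best, accuracy = best_accuracy)
}

df <- data.frame(
  city  = c("A", "A", "B", "B", "C"),
  zone  = c("n", "s", "n", "s", "n"),
  state = c("X", "X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)
res <- best_vector_sketch(df, "state", max_size = 2)  # city alone, accuracy 1
```

Preferring the smallest combination on ties is a design choice of this sketch: fewer predictor columns means fewer rows are excluded because one of their predictors is itself missing.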

BestVectorTable

BestVectorTable asks for a data.table and some parameters and returns the best combination of columns for predicting the missing values.

CompleteDatasetTable

CompleteDatasetTable asks for a data.table, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

Candidates

Candidates asks for a data frame and some parameters and returns how well the chosen columns can predict the goal column. It should mostly be used together with generate_candidates or, preferably, BestVector if you only want the best possible combination for prediction.