cleanerR (version 0.1.1)

How to Handle your Missing Data

Description

How should you deal with missing data? Based on the concept of almost functional dependencies, this package proposes a method to fill in missing data and helps you see which data is missing. The user can specify a measure of error and how many combinations to test the dependencies against. The closer that number is to the length of the dataset, the more precise the result, but the longer the process takes to finish. If the program cannot predict with the accuracy specified by the user, it does not fill in the data; the user can then choose to increase the allowed error or handle the data another way.
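
To make the idea of filling values from an almost functional dependency concrete, here is a minimal base R sketch. It does not use cleanerR and is not the package's implementation: the helper `fill_by_dependency` and its `min_share` threshold are hypothetical names chosen for illustration, and the sketch assumes a character goal column. A missing value is filled only when the rows that share the same predictor values agree strongly enough on the goal value.

```r
# Toy data: `city` is almost determined by `zip`; two values are missing.
df <- data.frame(
  zip  = c("10001", "10001", "10001", "20002", "20002", "30003"),
  city = c("NYC", "NYC", NA, "DC", "DC", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of cleanerR): fill NAs in `goal` with the most
# frequent goal value among complete rows that share the same predictor values,
# but only if that value's share within the group is at least `min_share`.
fill_by_dependency <- function(data, predictors, goal, min_share = 0.9) {
  # One key per row, built from the predictor columns.
  key <- do.call(paste, c(data[predictors], sep = "\r"))
  # Only originally observed values count as evidence.
  known <- !is.na(data[[goal]])
  for (i in which(!known)) {
    votes <- table(data[[goal]][known & key == key[i]])
    if (length(votes) == 0) next  # no complete row shares this key
    if (max(votes) / sum(votes) >= min_share) {
      data[[goal]][i] <- names(votes)[which.max(votes)]
    }
  }
  data
}

filled <- fill_by_dependency(df, predictors = "zip", goal = "city")
print(filled)
```

In this sketch the row with zip 30003 stays missing because no complete row shares its key, which mirrors the behaviour described above of leaving data unfilled when the requested accuracy cannot be met.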

Install

install.packages('cleanerR')

Monthly Downloads

16

Version

0.1.1

License

MIT + file LICENSE

Maintainer

Rafael Pereira

Last Published

February 10th, 2019

Functions in cleanerR (0.1.1)

GenerateCandidates

Takes a data frame and some parameters and returns, as a list, all combinations of columns for prediction that satisfy the error given as input. The first element of the list holds the combinations and the second holds their measures of error. To get the best parameters, call BestVector.

WorstAccuracyTable

Takes a data.table, a vector of column indices, and the goal column, and returns the minimum possible accuracy of filling the missing values.

WorstAccuracy

Takes a data frame, a vector of column indices, and the goal column, and returns the minimum possible accuracy of filling the missing values.

NA_VALUES

Takes a data frame and returns a table of how many missing values each column contains.
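
For comparison, the same kind of per-column count can be obtained with base R alone. The sketch below does not call NA_VALUES and makes no claim about its exact output format; the data frame `df` is made up for illustration.

```r
# Made-up data with missing values in two columns.
df <- data.frame(
  id    = 1:5,
  age   = c(23, NA, 31, NA, 40),
  group = c("a", "b", NA, "b", "a"),
  stringsAsFactors = FALSE
)

# is.na() on a data frame yields a logical matrix;
# colSums() then counts the TRUE entries in each column.
na_counts <- colSums(is.na(df))
print(na_counts)
```
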
GenerateCandidatesTable

Takes a data.table and some parameters and returns, as a list, all combinations of columns for prediction that satisfy the error given as input. The first element of the list holds the combinations and the second holds their measures of error. To get the best parameters, call BestVector.

MeanAccuracyTable

Takes a data.table, a vector of column indices, and the goal column, and returns the expected accuracy of filling the missing values, assuming the dataset is representative.

MeanAccuracy

Takes a data frame, a vector of column indices, and the goal column, and returns the expected accuracy of filling the missing values, assuming the dataset is representative.

CandidatesTable

Implementation of Candidates that takes a data.table object.

AutoComplete

Takes a data frame, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

AutoCompleteTable

Takes a data.table, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

BestAccuracy

Takes a data frame, a vector of column indices, and the goal column, and returns the maximum possible accuracy of filling the missing values.

BestAccuracyTable

Takes a data.table, a vector of column indices, and the goal column, and returns the maximum possible accuracy of filling the missing values.

CompleteDataset

Takes a data frame, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

BestVector

Takes a data frame and some parameters and returns the best combination of columns to predict the missing values.

BestVectorTable

Takes a data.table and some parameters and returns the best combination of columns to predict the missing values.

CompleteDatasetTable

Takes a data.table, a vector of column indices, and the goal column, and returns the data frame with the missing values filled in.

Candidates

Takes a data frame and some parameters and returns how closely the chosen columns can predict the goal column. It should mostly be used with generate_candidates or, if you only want the best possible combination for prediction, preferably with BestVector.
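
One way to picture "how closely the chosen columns can predict the goal column" is sketched below in base R. This is an assumed measure for illustration only, not cleanerR's definition or the return value of Candidates: the hypothetical helper `dependency_score` reports the share of complete rows whose goal value matches the most common goal value among rows with the same predictor values. A score of 1 would mean the predictors determine the goal exactly in the observed data.

```r
# Toy data: zip 10001 maps to two different cities, so the
# dependency zip -> city holds only approximately.
df <- data.frame(
  zip  = c("10001", "10001", "10001", "20002", "20002"),
  city = c("NYC", "NYC", "Newark", "DC", "DC"),
  stringsAsFactors = FALSE
)

# Hypothetical measure (an assumption, not the package's definition).
dependency_score <- function(data, predictors, goal) {
  # Use only rows where the goal value is observed.
  complete <- data[!is.na(data[[goal]]), , drop = FALSE]
  # One key per row, built from the predictor columns.
  key <- do.call(paste, c(complete[predictors], sep = "\r"))
  # For each key, count how many rows carry the most common goal value.
  modal_counts <- tapply(complete[[goal]], key, function(v) max(table(v)))
  sum(modal_counts) / nrow(complete)
}

score_zip <- dependency_score(df, predictors = "zip", goal = "city")
print(score_zip)
```

Comparing such a score across different sets of predictor columns is the kind of search that the description of BestVector refers to, although the package may rank combinations differently.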