DupChecker (version 1.10.2)

validateFile: validateFile

Description

The function calculates the MD5 fingerprint of each file in the table and then checks whether any two files share the same fingerprint. Files with the same fingerprint are treated as duplicates. The function returns a table containing all duplicated files and the datasets they belong to.
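The idea behind the check can be sketched in a few lines of base R. This is an illustration only, not DupChecker's own code: it assumes tools::md5sum for the fingerprint, and the directory and file names are made up for the demo.

```r
# Sketch only: illustrates MD5-based duplicate detection, not DupChecker internals.
# Create three small files, two of which have identical content.
demoDir <- file.path(tempdir(), "md5_demo")   # hypothetical demo directory
dir.create(demoDir, showWarnings = FALSE)
files <- file.path(demoDir, c("a.txt", "b.txt", "c.txt"))
writeLines("same content", files[1])
writeLines("same content", files[2])
writeLines("different content", files[3])

# tools::md5sum returns a named character vector with one MD5 hash per file.
md5 <- unname(tools::md5sum(files))

# A fingerprint counts as duplicated if it occurs more than once.
isDup <- md5 %in% md5[duplicated(md5)]
dupFiles <- basename(files[isDup])
print(dupFiles)
```

Here a.txt and b.txt are flagged because their contents, and therefore their fingerprints, are identical, even though their names differ.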

Usage

validateFile(fileTable, saveMd5File = TRUE)

Arguments

fileTable
a table with columns named "dataset" and "file"; the "file" column should contain the full name (path) of each file.
saveMd5File
whether the calculated MD5 fingerprints should be saved to a local file
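As a rough illustration of the expected input, a fileTable could be assembled by hand as a data frame. The dataset names and file paths below are hypothetical placeholders; in practice the table usually comes from buildFileTable, as in the example on this page.

```r
# Hypothetical dataset names and file paths, for illustration only.
fileTable <- data.frame(
  dataset = c("datasetA", "datasetA", "datasetB"),
  file = c("/data/datasetA/sample1.CEL",
           "/data/datasetA/sample2.CEL",
           "/data/datasetB/sample1.CEL"),
  stringsAsFactors = FALSE
)

# The two column names required by the argument description must be present.
stopifnot(all(c("dataset", "file") %in% colnames(fileTable)))
```

Passing this table to validateFile would only succeed if the listed files actually exist, so the sketch stops at constructing the table.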

Value

A list containing two tables. The first has three columns: "dataset", "file" and "md5". The second is the duplication table, in which each row corresponds to an MD5 fingerprint, each column to a dataset, and each cell holds the corresponding file name. The example on this page reads the duplication table as result$duptable and also tests result$hasdup.
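The layout of the duplication table can be pictured with a small base R sketch. It is an assumption about the shape only, built from the description above; the fingerprints, dataset names, and file names are invented, and the actual object returned by validateFile may differ in class or detail.

```r
# Sketch only: invented data showing the described layout, not DupChecker output.
filetable <- data.frame(
  dataset = c("setA", "setA", "setB"),
  file = c("a1.CEL", "a2.CEL", "b1.CEL"),
  md5 = c("md5_x", "md5_y", "md5_x"),   # placeholder fingerprints
  stringsAsFactors = FALSE
)

# Keep only rows whose fingerprint occurs more than once.
dupMd5 <- unique(filetable$md5[duplicated(filetable$md5)])
dups <- filetable[filetable$md5 %in% dupMd5, ]

# Rows are fingerprints, columns are datasets, cells hold file names.
duptable <- tapply(dups$file, list(dups$md5, dups$dataset),
                   paste, collapse = ";")
print(duptable)
```

In this toy case the single shared fingerprint produces one row, with a1.CEL under setA and b1.CEL under setB.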

Examples

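library(DupChecker)

# Look for a "DupChecker" directory next to the session's temporary directory
rootDir <- file.path(dirname(tempdir()), "DupChecker")
# Build the table of datasets and files found under rootDir
datafile <- buildFileTable(rootDir = rootDir)
if (nrow(datafile) > 0) {
  result <- validateFile(datafile)
  # Write the duplication table only when duplicates were found
  if (result$hasdup) {
    duptable <- result$duptable
    write.csv(duptable, file = "duptable.csv")
  }
}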