ScraperLedger.class: ScraperLedger

Description

Ledger of the scraping status of each object. Supports the following states: queued, scraping, scraped, failed, exception, skipped.

ScraperLedger.class

R6Class object.

initialize(): Initializes the object
addFilename(source, source.filename): Adds a filename to the ledger
getIdFilename(source, source.filename): Returns the id (row) of the given source and filename in the ledger
getCRCFilename(source, source.filename): Returns the CRC of the source filename, used for storing it in the db folder
updateStatus(source, source.filename, status, status.field = 'status', scraped.polyhedron = NA, obs = ''): Updates the status of the given source and filename in the ledger
savePreloadedData(): Internal method that saves a file with an estimate of the time required to scrape each filename
loadPreloadedData(): Loads a file with an estimate of the time required to scrape each filename
getSizeToTimeScrape(sources, time2scrape = 60): Estimates how many filenames can be scraped within a time frame, based on the data retrieved with loadPreloadedData
resetStatesMetrics(): Resets the usage metrics of the different status values
countStatusUse(status.field, status): Adds one use to the metrics for the given status.field and status
getFilenamesStatusMode(mode, sources = sort(unique(self$df$source)), max.quant = 0, order.by.vertices.faces = FALSE): Gets a list of the filenames in the ledger with a given mode (a grouping of statuses)
getFilenamesStatus(status, sources = sort(unique(self$df$source)), max.quant = 0, order.by.vertices.faces = FALSE): Gets a list of the filenames in the ledger with the specified status
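
Example

The add / update / query cycle described by the methods above can be illustrated with a small base R sketch. This is not the package's implementation: make_ledger() is a hypothetical closure-based stand-in for the R6 class, the column layout of the data frame is assumed, the source name "demo" and the filenames are placeholders, and treating max.quant = 0 as "no limit" is an assumption about the intended semantics.

```r
# Hypothetical stand-in for the ledger, written with closures in base R.
# It only mirrors the method names listed above; it is not the package code.
make_ledger <- function() {
  valid_status <- c("queued", "scraping", "scraped",
                    "failed", "exception", "skipped")
  # Assumed layout: one row per (source, source.filename) pair.
  df <- data.frame(id = integer(0), source = character(0),
                   source.filename = character(0), status = character(0),
                   obs = character(0), stringsAsFactors = FALSE)

  getIdFilename <- function(source, source.filename) {
    # Row index of the pair, or integer(0) when it is not in the ledger.
    which(df$source == source & df$source.filename == source.filename)
  }

  addFilename <- function(source, source.filename) {
    id <- getIdFilename(source, source.filename)
    if (length(id) == 0) {
      # New entries start in the "queued" state.
      id <- nrow(df) + 1L
      df <<- rbind(df, data.frame(id = id, source = source,
                                  source.filename = source.filename,
                                  status = "queued", obs = "",
                                  stringsAsFactors = FALSE))
    }
    id
  }

  updateStatus <- function(source, source.filename, status, obs = "") {
    stopifnot(status %in% valid_status)
    id <- getIdFilename(source, source.filename)
    if (length(id) == 0) stop("filename is not in the ledger")
    df$status[id] <<- status
    df$obs[id] <<- obs
    invisible(id)
  }

  getFilenamesStatus <- function(status,
                                 sources = sort(unique(df$source)),
                                 max.quant = 0) {
    sel <- df[df$status %in% status & df$source %in% sources, ]
    # Assumption: max.quant = 0 means "return every match".
    if (max.quant > 0) {
      sel <- sel[seq_len(min(nrow(sel), max.quant)), ]
    }
    sel$source.filename
  }

  list(addFilename = addFilename, getIdFilename = getIdFilename,
       updateStatus = updateStatus, getFilenamesStatus = getFilenamesStatus)
}

ledger <- make_ledger()
ledger$addFilename("demo", "0.txt")
ledger$addFilename("demo", "1.txt")
ledger$updateStatus("demo", "0.txt", "scraped")
queued <- ledger$getFilenamesStatus("queued")
print(queued)
```

Keeping one row per (source, filename) pair makes addFilename idempotent, so a scraping run can be re-queued without duplicating ledger entries.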

For flexible and reproducible configuration for package development