Ledger recording the scraping status of each object. Supported states: queued, scraping, scraped, failed, exception, skipped.
ScraperLedger.class
R6Class object.
initialize()
Initializes the object.
addFilename(source, source.filename)
Adds a filename to the ledger.
getIdFilename(source, source.filename)
Returns the id (row) of the ledger entry matching the source and source.filename parameters.
getCRCFilename(source, source.filename)
Returns the CRC of the source filename, used for storing it in the db folder.
updateStatus(source, source.filename, status, status.field = 'status', scraped.polyhedron = NA, obs ='')
Updates the status of the ledger entry matching the source and source.filename parameters.
savePreloadedData()
Internal method that saves a file with an estimate of the time required to scrape each filename.
loadPreloadedData()
Loads a file with an estimate of the time required to scrape each filename.
getSizeToTimeScrape(sources, time2scrape = 60)
Estimates how many filenames can be scraped within a time frame, based on the data retrieved with loadPreloadedData.
resetStatesMetrics()
Resets the usage metrics of the different status values.
countStatusUse(status.field, status)
Adds one use to the metrics for the status.field and status parameters.
getFilenamesStatusMode(mode, sources = sort(unique(self$df$source)), max.quant = 0, order.by.vertices.faces = FALSE)
Gets a list of the filenames in the ledger with a defined mode (a grouping of statuses).
getFilenamesStatus(status, sources = sort(unique(self$df$source)), max.quant = 0, order.by.vertices.faces = FALSE)
Gets a list of the filenames in the ledger with the specified status.
For flexible and reproducible configuration for package development
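The ledger methods listed above can be illustrated with a minimal sketch in base R. This is not the ScraperLedger implementation: it stands in a plain data frame for the R6 object, and the function names (new_ledger, add_filename, get_id_filename, update_status, get_filenames_status), the column layout, the "queued" initial state, and the sample source and filenames are all assumptions made for illustration.

```r
# Hypothetical stand-in for the ledger: a plain data frame, NOT the
# ScraperLedger R6 class. Column names are assumptions.
valid.states <- c("queued", "scraping", "scraped",
                  "failed", "exception", "skipped")

new_ledger <- function() {
  data.frame(source = character(0), source.filename = character(0),
             status = character(0), obs = character(0),
             stringsAsFactors = FALSE)
}

# Analogue of addFilename(): append a row. Starting in the "queued"
# state is an assumption of this sketch.
add_filename <- function(ledger, source, source.filename) {
  rbind(ledger,
        data.frame(source = source, source.filename = source.filename,
                   status = "queued", obs = "",
                   stringsAsFactors = FALSE))
}

# Analogue of getIdFilename(): row index of the matching entry.
get_id_filename <- function(ledger, source, source.filename) {
  which(ledger$source == source &
          ledger$source.filename == source.filename)
}

# Analogue of updateStatus(): validate the state, then update the row.
update_status <- function(ledger, source, source.filename, status, obs = "") {
  stopifnot(status %in% valid.states)
  id <- get_id_filename(ledger, source, source.filename)
  stopifnot(length(id) == 1)
  ledger$status[id] <- status
  ledger$obs[id] <- obs
  ledger
}

# Analogue of getFilenamesStatus(): filenames currently in a given state.
get_filenames_status <- function(ledger, status) {
  ledger$source.filename[ledger$status == status]
}

# Sample source and filenames are made up for the example.
ledger <- new_ledger()
ledger <- add_filename(ledger, "source.a", "a.txt")
ledger <- add_filename(ledger, "source.a", "b.txt")
ledger <- update_status(ledger, "source.a", "a.txt", status = "scraped")
queued <- get_filenames_status(ledger, "queued")
```

The sketch returns a modified copy of the data frame from each function, whereas an R6 object such as the one documented here would update its own state in place.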