This function can be used to prepare R objects from remote or local data
sources. Its purpose is to provide a reproducible version of a series of
commonly used steps for getting, loading, and processing data.
The function has two stages: getting data (downloading, extracting from
archives, loading into R) and post-processing (for Spatial* and Raster*
objects, this is crop, reproject, mask/intersect).
To trigger the first stage, provide url or archive.
To trigger the second stage, provide studyArea or rasterToMatch.
See examples.
prepInputs(targetFile = NULL, url = NULL, archive = NULL,
alsoExtract = NULL,
destinationPath = getOption("reproducible.destinationPath", "."),
fun = NULL, quick = getOption("reproducible.quick"),
overwrite = getOption("reproducible.overwrite", FALSE),
purge = FALSE, useCache = getOption("reproducible.useCache", FALSE),
...)
targetFile: Character string giving the path to the eventual file
(raster, shapefile, csv, etc.) after downloading and extracting from a zip
or tar archive. This is the file before it is passed to postProcess.
Currently, the internal checksumming does not checksum the file after it is
postProcessed (e.g., cropped/reprojected/masked). Wrapping prepInputs in
Cache will do a sufficient job in these cases. See table in preProcess.
url: Optional character string indicating the URL to download from.
If not specified, then no download will be attempted. If no entry exists in
the CHECKSUMS.txt (in destinationPath), the file will be created, or the
entry appended to it. This CHECKSUMS.txt entry will be used in subsequent
calls to prepInputs or preProcess, comparing the file on hand with the ad hoc
CHECKSUMS.txt. See table in preProcess.
archive: Optional character string giving the path of an archive
containing targetFile, or a vector giving a set of nested archives
(e.g., c("xxx.tar", "inner.zip", "inner.rar")). If there are one or more
inner archives, but they are unknown, the function will try all until it
finds the targetFile. See table in preProcess.
alsoExtract: Optional character string naming files other than targetFile
that must be extracted from the archive. If NULL, the default, then it will
extract all files. Other options: "similar" will extract all files with the
same filename (without file extension) as targetFile. NA will extract
nothing other than targetFile. A character vector of specific file names
will cause only those to be extracted. See table in preProcess.
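As a rough illustration of what the "similar" option means, the base R sketch below selects the members of an archive listing whose file name, with the extension removed, matches that of targetFile. The helper name similarFiles and the file listing are hypothetical, and this is not the package's internal code.

```r
# Hypothetical helper: keep files that share targetFile's base name
# once the file extension is removed.
similarFiles <- function(targetFile, filesInArchive) {
  stem <- tools::file_path_sans_ext(basename(targetFile))
  stems <- tools::file_path_sans_ext(basename(filesInArchive))
  filesInArchive[stems == stem]
}

# Hypothetical archive listing
filesInArchive <- c("ecozones.dbf", "ecozones.prj", "ecozones.shp",
                    "ecozones.shx", "readme.txt")
similar <- similarFiles("ecozones.shp", filesInArchive)
```

Here similar holds the four ecozones.* files and leaves out readme.txt, which is the kind of selection a shapefile and its sidecar files need.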
destinationPath: Character string of a directory in which to download
and save the file that comes from url. It is also where the function
will look for archive or targetFile. NOTE (still experimental):
To prevent repeated downloads in different locations, the user can also set
options("reproducible.inputPaths") to one or more local file paths to
search for the file before attempting to download. The default for that
option is NULL, meaning do not search locally.
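A minimal sketch of setting and restoring that option with base R's options() follows. The folder name sharedInputs is a hypothetical placeholder, and because the feature is described as experimental, its behaviour may differ between package versions.

```r
# Hypothetical shared folder; replace with a real directory on your machine.
sharedInputs <- file.path(tempdir(), "sharedInputs")
dir.create(sharedInputs, showWarnings = FALSE)

# Set the option, keeping the previous value so it can be restored later
oldOpts <- options(reproducible.inputPaths = sharedInputs)
currentPaths <- getOption("reproducible.inputPaths")

# Restore the previous setting (NULL by default, i.e., no local search)
options(oldOpts)
```

Saving the return value of options() and passing it back is the usual way to avoid leaving a session-wide setting behind.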
fun: Function or character string indicating the function to use to load
targetFile into an R object, e.g., in the form with package name:
"raster::raster".
overwrite: Logical. Should downloading and all the other actions occur even if the files pass the checksums or are all there?
purge: Logical or Integer. 0/FALSE (default) keeps the existing
CHECKSUMS.txt file, and prepInputs will write or append to it. 1/TRUE
will delete the entire CHECKSUMS.txt file. For other options, see details.
useCache: Passed to Cache in various places. Defaults to getOption("reproducible.useCache").
...: Additional arguments passed to fun (i.e., user supplied),
postProcess and Cache. Since ... is passed to postProcess, these arguments
will also be passed into the inner functions, e.g., cropInputs.
See details and examples.
The first stage (getting data) consists of the following steps. See
preProcess for combinations of arguments.
1. Download from the web via either drive_download or download.file;
2. Load into R using raster, shapefile, or any other function passed in
   with fun;
3. Checksumming of all files during this process. This is put into a
   CHECKSUMS.txt file in the destinationPath, appending if it is already
   there, overwriting the entries for the same files if entries already
   exist.
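The idea behind the checksumming step can be sketched with base R alone: record a checksum for a file, then recompute it later to decide whether the file on hand is unchanged. The sketch uses tools::md5sum as a stand-in. The file, its contents, and the table layout are assumptions for illustration and do not reproduce the actual CHECKSUMS.txt format or the algorithm the package uses.

```r
# Create a small file to stand in for a downloaded input (hypothetical content)
dPath <- file.path(tempdir(), "checksumDemo")
dir.create(dPath, showWarnings = FALSE)
f <- file.path(dPath, "input.csv")
writeLines(c("a,b", "1,2"), f)

# Record a checksum table (this layout is illustrative only)
recorded <- data.frame(file = basename(f),
                       checksum = unname(tools::md5sum(f)),
                       stringsAsFactors = FALSE)

# Later: recompute and compare to decide whether the file is unchanged
unchanged <- identical(unname(tools::md5sum(f)), recorded$checksum)

# A modified file no longer matches the recorded checksum
writeLines(c("a,b", "3,4"), f)
changedDetected <- !identical(unname(tools::md5sum(f)), recorded$checksum)
```

A mismatch like the one flagged by changedDetected is the situation in which a fresh download or extraction would be needed.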
The second stage (post-processing) will be triggered if either rasterToMatch
or studyArea is supplied. It consists of the following steps:
1. Fix errors. Currently the only errors fixed are for SpatialPolygons,
   using buffer(..., width = 0);
2. Crop using cropInputs;
3. Project using projectInputs;
4. Mask using maskInputs;
5. Determine the file name with determineFilename, via filename2;
6. Optionally, write that file to disk via writeOutputs.
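To make the crop and mask steps concrete, here is a small sketch that calls the raster and sp packages directly on in-memory objects. It is an assumption that cropInputs and maskInputs perform roughly these operations. The raster, the triangular study area, and all object names are hypothetical, and the reprojection step is left out because both objects share a coordinate reference system.

```r
library(sp)
library(raster)

# Hypothetical 10 x 10 raster covering x and y in [0, 10]
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- 1:100

# Hypothetical triangular study area inside the raster extent
coords <- cbind(c(2, 8, 2, 2), c(2, 2, 8, 2))
studyArea <- SpatialPolygons(list(Polygons(list(Polygon(coords)), "s1")))
crs(studyArea) <- crs(r)

cropped <- crop(r, studyArea)       # trim to the study area's bounding box
masked <- mask(cropped, studyArea)  # set cells outside the polygon to NA
```

Cropping first keeps the masking step cheap, since the mask then only has to visit cells within the bounding box of the study area.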
NOTE: checksumming does not occur during the post-processing stage, as
there are no file downloads. To achieve fast results, wrap prepInputs
with Cache.
NOTE: sf objects are still very experimental.
Raster* and Spatial* objects: If rasterToMatch or studyArea is used, then
this will trigger several subsequent functions, specifically the sequence
crop, reproject, mask, which appears to be a common sequence in spatial
simulation. See postProcess.spatialObjects.
Understanding various combinations of rasterToMatch and/or studyArea:
please see postProcess.spatialObjects.
The options for controlling purging of the CHECKSUMS.txt file are:
0 | keep file
1 | delete file
2 | delete entry for targetFile
4 | delete entry for alsoExtract
3 | delete entry for archive
5 | delete entry for targetFile & alsoExtract
6 | delete entry for targetFile, alsoExtract & archive
7 | delete entry that is failing (i.e., for the file downloaded by the url)
Apart from deleting the whole file (1), purge will only remove entries in the
CHECKSUMS.txt that are associated with targetFile, alsoExtract or archive.
When prepInputs is called, it will write a CHECKSUMS.txt file, or append to
it if it already exists. If the CHECKSUMS.txt is not correct, use this
argument to remove it.
See also: downloadFile, extractFromArchive, postProcess.
# NOT RUN {
# This function works within a module; however, currently,
# sourceURL is not yet working as desired. Use url.
# }
# NOT RUN {
# download a zip file from internet, unzip all files, load as shapefile, Cache the call
# First time: don't know all files - prepInputs will guess, if download file is an archive,
# then extract all files, then if there is a .shp, it will load with raster::shapefile
library(reproducible) # provides prepInputs, Cache and asPath
dPath <- file.path(tempdir(), "ecozones")
shpEcozone <- prepInputs(destinationPath = dPath,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip")
# Robust to partial file deletions:
unlink(dir(dPath, full.names = TRUE)[1:3])
shpEcozone <- prepInputs(destinationPath = dPath,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip")
unlink(dPath, recursive = TRUE)
# Once this is done, can be more precise in operational code:
# specify targetFile, alsoExtract, and fun, wrap with Cache
ecozoneFilename <- file.path(dPath, "ecozones.shp")
ecozoneFiles <- c("ecozones.dbf", "ecozones.prj",
                  "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx")
shpEcozone <- prepInputs(targetFile = ecozoneFilename,
                         url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip",
                         alsoExtract = ecozoneFiles,
                         fun = "shapefile", destinationPath = dPath)
unlink(dPath, recursive = TRUE)
# Add a study area to crop and mask to
# Create a "study area"
library(sp)
library(raster)
coords <- structure(c(-122.98, -116.1, -99.2, -106, -122.98,
                      59.9, 65.73, 63.58, 54.79, 59.9),
                    .Dim = c(5L, 2L))
Sr1 <- Polygon(coords)
Srs1 <- Polygons(list(Sr1), "s1")
StudyArea <- SpatialPolygons(list(Srs1), 1L)
crs(StudyArea) <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# specify targetFile, alsoExtract, and fun, wrap with Cache
ecozoneFilename <- file.path(dPath, "ecozones.shp")
# Note, you don't need to "alsoExtract" the archive... if the archive is not there, but the
# targetFile is there, it will not redownload the archive.
ecozoneFiles <- c("ecozones.dbf", "ecozones.prj",
                  "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx")
shpEcozoneSm <- Cache(prepInputs,
                      url = "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip",
                      targetFile = reproducible::asPath(ecozoneFilename),
                      alsoExtract = reproducible::asPath(ecozoneFiles),
                      studyArea = StudyArea,
                      fun = "shapefile", destinationPath = dPath,
                      filename2 = "EcozoneFile.shp") # passed to determineFilename
plot(shpEcozone)
plot(shpEcozoneSm, add = TRUE, col = "red")
unlink(dPath, recursive = TRUE) # recursive = TRUE is needed to remove a directory
# Big Raster, with crop and mask to Study Area - no reprojecting (lossy) of raster,
# but the StudyArea does get reprojected, need to use rasterToMatch
dPath <- file.path(tempdir(), "LCC")
lcc2005Filename <- file.path(dPath, "LCC2005_V1_4a.tif")
url <- file.path("ftp://ftp.ccrs.nrcan.gc.ca/ad/NLCCLandCover",
                 "LandcoverCanada2005_250m/LandCoverOfCanada2005_V1_4.zip")
# messages received below may help for filling in more arguments in the subsequent call
LCC2005 <- prepInputs(url = url,
                      destinationPath = asPath(dPath),
                      studyArea = StudyArea)
plot(LCC2005)
# if wrapped with Cache, will be fast second time, very fast 3rd time (via memoised copy)
LCC2005 <- Cache(prepInputs, url = url,
                 targetFile = lcc2005Filename,
                 archive = asPath("LandCoverOfCanada2005_V1_4.zip"),
                 destinationPath = asPath(dPath),
                 studyArea = StudyArea)
# Using dlFun -- a custom download function -- passed to preProcess
test1 <- prepInputs(targetFile = "GADM_2.8_LUX_adm0.rds", # must specify currently
                    dlFun = "raster::getData", name = "GADM", country = "LUX", level = 0,
                    path = dPath)
# }