Create the aws argument of tar_resources() to specify optional AWS settings for tar_target(..., repository = "aws"). See the format argument of tar_target() for details.
tar_resources_aws(
bucket,
prefix = targets::path_objects_dir_cloud(),
region = NULL,
part_size = 5 * (2^20),
endpoint = NULL
)
bucket: Character of length 1, name of an existing bucket to upload and download the return values of the affected targets during the pipeline.
Character of length 1, "directory path" in the bucket where the target return values are stored.
region: Character of length 1, AWS region containing the S3 bucket. Set to NULL to use the default region.
part_size: Positive numeric of length 1, number of bytes for each part of a multipart upload (except the last part, which is the remainder). In a multipart upload, each part must be at least 5 MB.
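For example, a hypothetical configuration that raises the part size to 100 MB for large objects (the bucket name is a placeholder):

tar_resources_aws(
  bucket = "yourbucketname", # placeholder bucket name
  part_size = 100 * (2^20)   # 100 MB per part; the default is 5 MB
)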
endpoint: Character of length 1, URL endpoint for S3 storage. Defaults to the Amazon AWS endpoint if NULL. Example: to use the S3 protocol with Google Cloud Storage, set endpoint = "https://storage.googleapis.com" and region = "auto". Also make sure to create HMAC access keys in the Google Cloud Storage console (under Settings => Interoperability) and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables accordingly. After that, you should be able to use S3 storage formats with Google Cloud Storage buckets. There is one limitation, however: even if your bucket has object versioning turned on, targets may fail to record object versions. Google Cloud Storage in particular has this incompatibility.
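A minimal sketch of that Google Cloud Storage setup, with a placeholder bucket name and placeholder HMAC credentials:

# Supply the HMAC keys created in the Google Cloud Storage console.
Sys.setenv(
  AWS_ACCESS_KEY_ID = "YOUR_HMAC_ACCESS_KEY",    # placeholder
  AWS_SECRET_ACCESS_KEY = "YOUR_HMAC_SECRET_KEY" # placeholder
)
tar_resources_aws(
  bucket = "your-gcs-bucket", # placeholder bucket name
  endpoint = "https://storage.googleapis.com",
  region = "auto"
)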
Object of class "tar_resources_aws"
, to be supplied
to the aws
argument of tar_resources()
.
Functions tar_target() and tar_option_set() each take an optional resources argument to supply non-default settings of various optional backends for data storage and high-performance computing. The tar_resources() function is a helper to supply those settings in the correct manner. Resources are all-or-nothing: if you specify any resources with tar_target(), all the resources from tar_option_get("resources") are dropped for that target. In other words, if you write tar_option_set(resources = resources_1) and then tar_target(x, my_command(), resources = resources_2), then everything in resources_1 is discarded for target x.
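A minimal sketch of this behavior, using a hypothetical bucket name and command:

# Somewhere in your target script file (usually _targets.R):
tar_option_set(
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "yourbucketname") # resources_1
  )
)
tar_target(
  x,
  my_command(),
  resources = tar_resources(
    qs = tar_resources_qs(preset = "fast") # resources_2
  )
)
# Target x uses only resources_2: the aws settings from tar_option_set()
# are dropped because the target-level resources argument replaces them.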
See the cloud storage section of https://books.ropensci.org/targets/data.html for instructions.
Other resources: tar_resources_clustermq(), tar_resources_feather(), tar_resources_fst(), tar_resources_future(), tar_resources_gcp(), tar_resources_parquet(), tar_resources_qs(), tar_resources_url(), tar_resources()
# NOT RUN {
# Somewhere in your target script file (usually _targets.R):
tar_target(
name,
command(),
format = "qs",
repository = "aws",
resources = tar_resources(
aws = tar_resources_aws(bucket = "yourbucketname"),
qs = tar_resources_qs(preset = "fast")
)
)
# }