AzureStor (version 2.0.1)

list_adls_files: Operations on an Azure Data Lake Storage Gen2 filesystem

Description

Upload, download, or delete a file; list files in a directory; create or delete directories.

Usage

list_adls_files(filesystem, dir = "/", info = c("all", "name"),
  recursive = FALSE)

multiupload_adls_file(filesystem, src, dest, blocksize = 2^22, lease = NULL, use_azcopy = FALSE, max_concurrent_transfers = 10)

upload_adls_file(filesystem, src, dest, blocksize = 2^24, lease = NULL, use_azcopy = FALSE)

multidownload_adls_file(filesystem, src, dest, overwrite = FALSE, use_azcopy = FALSE, max_concurrent_transfers = 10)

download_adls_file(filesystem, src, dest, overwrite = FALSE, use_azcopy = FALSE)

delete_adls_file(filesystem, file, confirm = TRUE)

create_adls_dir(filesystem, dir)

delete_adls_dir(filesystem, dir, recursive = FALSE, confirm = TRUE)

Arguments

filesystem

An ADLSgen2 filesystem object.

dir, file

A string naming a directory or file respectively.

info

Whether to return names only, or all information in a directory listing.

recursive

For list_adls_files and delete_adls_dir, whether the operation should recurse through subdirectories. For delete_adls_dir, this must be TRUE to delete a non-empty directory.

src, dest

The source and destination files for uploading and downloading. Paths are allowed. For uploading, src can also be a textConnection or rawConnection object, which allows transferring in-memory R objects without creating a temporary file. For downloading, dest can also be NULL or a rawConnection object; see Details.

blocksize

The number of bytes to upload per HTTP(S) request.

lease

The lease for a file, if present.

use_azcopy

Whether to use the AzCopy utility from Microsoft to do the transfer, rather than doing it in R.

max_concurrent_transfers

For multiupload_adls_file and multidownload_adls_file, the maximum number of concurrent file transfers. Each concurrent file transfer requires a separate R process, so limit this if you are low on memory.

overwrite

When downloading, whether to overwrite an existing destination file.

confirm

Whether to ask for confirmation on deleting a file or directory.
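
As a rough illustration of the blocksize argument, the sketch below estimates how many data-carrying requests a transfer would need at the two default block sizes shown in Usage. The helper n_requests and the 100 MiB file size are hypothetical, chosen only for the arithmetic, and the estimate ignores any additional requests the service may need to create or finalise the file.

```r
# Hypothetical helper (not part of AzureStor): estimated number of upload
# requests for a file of `size` bytes when `blocksize` bytes go per request.
n_requests <- function(size, blocksize) ceiling(size / blocksize)

file_size <- 100 * 2^20          # an assumed 100 MiB file

2^22 / 2^20                      # multiupload_adls_file default, in MiB
2^24 / 2^20                      # upload_adls_file default, in MiB

n_requests(file_size, 2^22)      # estimate at the smaller default
n_requests(file_size, 2^24)      # estimate at the larger default
```

A larger blocksize means fewer requests per file, at the cost of holding more data in memory for each request.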

Value

For list_adls_files, if info="name", a vector of file/directory names. If info="all", a data frame giving the file size and whether each object is a file or directory.

For download_adls_file, if dest=NULL, the contents of the downloaded file as a raw vector.

Details

upload_adls_file and download_adls_file are the workhorse file transfer functions for ADLSgen2 storage. Each takes a single source and a single destination. These are normally filenames, although connections are also accepted, as described below.

multiupload_adls_file and multidownload_adls_file upload and download multiple files at once. They parallelise file transfers by deploying a pool of R processes in the background, which can be significantly more efficient when transferring many small files. The src argument is a wildcard pattern that expands to one or more files, and the dest argument should be a directory.
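
To preview which local files a wildcard pattern would pick up before calling multiupload_adls_file, base R's Sys.glob can be used. This is only a sketch: it assumes the pattern follows ordinary shell-style globbing, which this page does not state, and the scratch directory and file names are made up for illustration.

```r
# Create a scratch directory with a few dummy files (illustrative names).
scratch <- file.path(tempdir(), "logfiles_demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, c("jan01.zip", "jan02.zip", "notes.txt")))

# Expand the wildcard locally to see which files match the pattern.
matched <- Sys.glob(file.path(scratch, "*.zip"))
basename(matched)

# Clean up the scratch directory.
unlink(scratch, recursive = TRUE)
```

Only the two .zip files match here; notes.txt is left out.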

The file transfer functions also support working with connections, which allows transferring R objects without creating temporary files. For uploading, src can be a textConnection or rawConnection object. For downloading, dest can be NULL or a rawConnection object. In the former case, the downloaded data is returned as a raw vector; in the latter, it is written into the connection. See the examples below.
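
The connection mechanics can be tried out locally without a storage account. The sketch below uses only base R: it builds the kind of rawConnection that could be passed as src when uploading, and shows how a raw vector (the form returned when dest=NULL) is turned back into an R object. No ADLS functions are called, and the variable names are illustrative.

```r
# Serialize an R object to a raw vector held in memory.
rds <- serialize(iris, NULL)

# A read-only raw connection over those bytes; an object like this is what
# would be passed as `src` to an upload.
con <- rawConnection(rds)
bytes <- readBin(con, "raw", n = length(rds))
close(con)

# A download with dest=NULL returns a raw vector; `bytes` stands in for it.
restored <- unserialize(bytes)
identical(restored, iris)
```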

By default, download_adls_file will display a progress bar as it is downloading. To turn this off, use options(azure_dl_progress_bar=FALSE). To turn the progress bar back on, use options(azure_dl_progress_bar=TRUE).
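
When the progress bar should be off only for part of a script, one possible pattern is to save the previous setting and restore it afterwards. This is a base R sketch that relies on options() returning the old values; the download calls themselves are left as a placeholder comment.

```r
# Switch the progress bar off, keeping the previous setting in `old`.
old <- options(azure_dl_progress_bar = FALSE)
getOption("azure_dl_progress_bar")

# ... calls to download_adls_file would go here ...

# Restore whatever was set before (including "not set at all").
options(old)
```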

See Also

adls_filesystem, az_storage, storage_download, call_azcopy

Examples

# NOT RUN {
fs <- adls_filesystem("https://mystorage.dfs.core.windows.net/myfilesystem", key="access_key")

list_adls_files(fs, "/")
list_adls_files(fs, "/", recursive=TRUE)

create_adls_dir(fs, "/newdir")

upload_adls_file(fs, "~/bigfile.zip", dest="/newdir/bigfile.zip")
download_adls_file(fs, "/newdir/bigfile.zip", dest="~/bigfile_downloaded.zip")

delete_adls_file(fs, "/newdir/bigfile.zip")
delete_adls_dir(fs, "/newdir")

# uploading/downloading multiple files at once
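# dest has no default, so a destination directory must be given
# ("/uploaded_data" is an illustrative name)
multiupload_adls_file(fs, "/data/logfiles/*.zip", "/uploaded_data")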
multidownload_adls_file(fs, "/monthly/jan*.*", "/data/january")

# uploading serialized R objects via connections
json <- jsonlite::toJSON(iris, pretty=TRUE, auto_unbox=TRUE)
con <- textConnection(json)
upload_adls_file(fs, con, "iris.json")

rds <- serialize(iris, NULL)
con <- rawConnection(rds)
upload_adls_file(fs, con, "iris.rds")

# downloading files into memory: as a raw vector, and via a connection
rawvec <- download_adls_file(fs, "iris.json", NULL)
rawToChar(rawvec)

con <- rawConnection(raw(0), "r+")
download_adls_file(fs, "iris.rds", con)
unserialize(con)

# }
