file_in: Declare input files and directories.

Description

file_in() marks individual files (and whole directories) that your targets depend on.

Usage

file_in(...)

Arguments

...

Character vector, paths to files and directories.

Value

A character vector of declared input file or directory paths.
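
As a minimal sketch of the return value, assuming drake is installed: outside of a plan, file_in() should simply hand back the paths it was given. The paths below are hypothetical and do not need to exist for the call itself.

```r
library(drake)

# Hypothetical paths, for illustration only.
# Called outside of a plan, file_in() returns its arguments
# as a character vector; no files are read or checked here.
paths <- file_in("data/raw.csv", "data/lookup")
print(paths)
```

The dependency tracking itself only happens when file_in() appears inside a command in drake_plan().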

Keywords

drake_plan() understands special keyword functions for your commands. With the exception of target(), each one is a proper function with its own help file.

target(): declare more than just the command, e.g. assign a trigger or transform. Examples: https://ropenscilabs.github.io/drake-manual/plans.html#large-plans.
file_in(): declare an input file dependency.
file_out(): declare an output file to be produced when the target is built.
knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.
ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.
no_deps(): tell drake to not track the dependencies of a piece of code. drake still tracks the code itself for changes.
drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.

Details

As of drake 7.4.0, file_in() and file_out() have experimental support for URLs. If the file name begins with "http://", "https://", or "ftp://", make() attempts to check the ETag to see if the data changed since the last run. If no ETag can be found, drake reuses the ETag from the last make() and registers the file as unchanged (which prevents your workflow from breaking if you lose internet access). If this approach to tracking remote data does not work for you, consider a custom trigger: https://ropenscilabs.github.io/drake-manual/triggers.html.
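
The two approaches described above could be sketched as follows. This is an illustration, not a recommended setup: the URL is a placeholder, the target names are hypothetical, and refreshing once per day via Sys.Date() is an assumed policy chosen only to show the shape of a custom trigger. Building the plans does not touch the network; only make() would.

```r
library(drake)

# Option 1: declare the URL with file_in() so that make()
# attempts to track the remote file through its ETag.
# The URL is a placeholder, not a real data source.
plan_url <- drake_plan(
  remote_data = read.csv(file_in("https://example.com/data.csv"))
)

# Option 2: skip file_in() and use a custom trigger instead.
# Here the target is rebuilt whenever the value of Sys.Date()
# differs from the value recorded at the previous make().
plan_trigger <- drake_plan(
  remote_data = target(
    read.csv("https://example.com/data.csv"),
    trigger = trigger(change = Sys.Date())
  )
)

plan_url
plan_trigger
```

Note that file_in() is given a literal string in the command rather than a variable holding the URL, which matches how the examples on this page declare files.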

Examples

# NOT RUN {
isolate_example("contain side effects", {
# The `file_out()` and `file_in()` functions
# just take in strings and return them.
file_out("summaries.txt")
# Their main purpose is to orchestrate your custom files
# in your workflow plan data frame.
plan <- drake_plan(
  out = write.csv(mtcars, file_out("mtcars.csv")),
  contents = read.csv(file_in("mtcars.csv"))
)
plan
# drake knows "mtcars.csv" is an output file of the first target, `out`,
# and a dependency of `contents`. See for yourself:

make(plan)
file.exists("mtcars.csv")

# You can also work with entire directories this way.
# However, in `file_out("your_directory")`, the directory
# becomes an entire unit. Thus, `file_in("your_directory")`
# is more appropriate for subsequent steps than
# `file_in("your_directory/file_inside.txt")`.
plan <- drake_plan(
  out = {
    dir.create(file_out("dir"))
    write.csv(mtcars, "dir/mtcars.csv")
  },
  contents = read.csv(file.path(file_in("dir"), "mtcars.csv"))
)
plan

make(plan)
file.exists("dir/mtcars.csv")

# See the connections that the file relationships create:
config <- drake_config(plan)
vis_drake_graph(config)
})
# }