git2rdata (version 0.1)

relabel: Relabel Factor Levels by Updating the Metadata

Description

Imagine the situation where we have a dataframe with a factor variable and we have stored it with write_vc(optimize = TRUE). The raw data file contains the factor indices and the metadata contains the link between the factor index and the corresponding label. See vignette("version_control", package = "git2rdata"). In such a case, relabelling a factor can be fast and lightweight by updating the metadata.

Usage

relabel(file, root = ".", change)

Arguments

file

the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. file is a path relative to the root.

root

The root of a project. Can be a file path or a git-repository. Defaults to the current working directory (".").

change

either a list or a data.frame. In case of a list is a named list with named vectors. The names of list elements must match the names of the variables. The names of the vector elements must match the existing factor labels. The values represent the new factor labels. In case of a data.frame it needs to have the variables factor (name of the factor), old (the old) factor label and new (the new factor label). relabel() ignores all other columns.

Value

invisible NULL.

See Also

Other storage: list_data, prune_meta, read_vc, rm_data, write_vc

Examples

Run this code
# NOT RUN {
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")

# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- ds <- data.frame(a = c("a1", "a2"), b = c("b2", "b1"))
junk <- write_vc(ds, "relabel", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)

# Define new labels as a list and apply them to the git2rdata object.
new_labels <- list(
  a = list(a2 = "a3")
)
relabel("relabel", repo, new_labels)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)
git2r::add(repo, "relabel.*")
cm <- commit(repo, "relabel using a list")

# Define new labels as a dataframe and apply them to the git2rdata object
change <- data.frame(
  factor = c("a", "a", "b"),
  old = c("a3", "a1", "b2"),
  new = c("c2", "c1", "b3")
)
relabel("relabel", repo, change)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)

# clean up
junk <- file.remove(
  rev(list.files(repo_path, full.names = TRUE, recursive = TRUE,
                 include.dirs = TRUE, all.files = TRUE)),
  repo_path)
# }

Run the code above in your browser using DataCamp Workspace