
Note: a newer version (1.10.0) of this package is available.

arkhe

Overview

A dependency-free collection of simple functions for cleaning rectangular data. This package lets you detect, count and replace values, or discard rows/columns, using a predicate function. In addition, it provides tools to check conditions and return informative error messages.


To cite arkhe in publications use:

Frerebeau N (2024). arkhe: Tools for Cleaning Rectangular Data. Université Bordeaux Montaigne, Pessac, France. R package version 1.9.0. doi:10.5281/zenodo.3526659. https://packages.tesselle.org/arkhe/.

This package is a part of the tesselle project https://www.tesselle.org.

Installation

You can install the released version of arkhe from CRAN with:

install.packages("arkhe")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("tesselle/arkhe")

Usage

## Load the package
library(arkhe)

## Set seed for reproducibility
set.seed(12345)

## Create a matrix
X <- matrix(sample(1:10, 25, TRUE), nrow = 5, ncol = 5)

## Add NA
k <- sample(1:25, 3, FALSE)
X[k] <- NA
X
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   NA    7   10    7
#> [4,]   NA   NA    6    3    2
#> [5,]    8   10    1    9    4

## Count missing values in rows
count(X, f = is.na, margin = 1)
#> [1] 0 0 1 2 0

## Count non-missing values in columns
count(X, f = is.na, margin = 2, negate = TRUE)
#> [1] 4 3 5 5 5

## Find rows with any NA
detect(X, f = is.na, margin = 1)
#> [1] FALSE FALSE  TRUE  TRUE FALSE

## Find columns without any NA
detect(X, f = is.na, margin = 2, negate = TRUE, all = TRUE)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

## Remove rows with any NA
discard(X, f = is.na, margin = 1, all = FALSE)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   10    1    9    4

## Remove columns with any NA
discard(X, f = is.na, margin = 2, all = FALSE)
#>      [,1] [,2] [,3]
#> [1,]    1    4    4
#> [2,]    8    8   10
#> [3,]    7   10    7
#> [4,]    6    3    2
#> [5,]    1    9    4

## Replace NA with zeros
replace_NA(X, value = 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8    0    7   10    7
#> [4,]    0    0    6    3    2
#> [5,]    8   10    1    9    4
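To make the "predicate applied over a margin" idea behind count(), detect() and discard() concrete, here is a minimal base R sketch. The helpers count_by(), detect_by() and discard_by() are hypothetical and are not part of arkhe; they assume only that the predicate returns a logical matrix with the same shape as its input, as is.na() does. arkhe's own implementations may work differently.

```r
## Hypothetical base R helpers (NOT part of arkhe) sketching how a
## predicate can be applied to a matrix row-wise (margin = 1) or column-wise (margin = 2).
count_by <- function(x, f, margin = 1, negate = FALSE) {
  hits <- f(x)                  # logical matrix, same shape as x
  if (negate) hits <- !hits     # count values that do NOT match instead
  apply(hits, margin, sum)      # number of matches per row/column
}

detect_by <- function(x, f, margin = 1, negate = FALSE, all = FALSE) {
  hits <- f(x)
  if (negate) hits <- !hits
  ## all = TRUE: every value must match; all = FALSE: at least one value
  reduce <- if (all) base::all else base::any
  apply(hits, margin, reduce)
}

discard_by <- function(x, f, margin = 1, all = FALSE) {
  flagged <- detect_by(x, f, margin = margin, all = all)
  if (margin == 1) x[!flagged, , drop = FALSE] else x[, !flagged, drop = FALSE]
}

## Small example matrix: row 2 and columns 1 and 3 contain NA
Y <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

count_by(Y, is.na, margin = 1)                 # NA per row: 0 2
count_by(Y, is.na, margin = 2, negate = TRUE)  # non-NA per column: 1 2 1
detect_by(Y, is.na, margin = 2)                # TRUE FALSE TRUE
discard_by(Y, is.na, margin = 2)               # keeps only column 2

## Base R analogue of replace_NA(Y, value = 0)
Y[is.na(Y)] <- 0
```

Passing the predicate as an argument is what keeps the helpers generic: swapping is.na for is.infinite or function(x) x == 0 would target infinite values or zeros with the same code.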

Translation

This package provides translations of user-facing communications, such as messages, warnings and errors. By default, the preferred language is taken from the locale. This can be overridden by setting the environment variable LANGUAGE (you only need to do this once per session):

Sys.setenv(LANGUAGE = "<language code>")

Languages currently available are English (en) and French (fr).
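If you only want French messages temporarily (for example, inside a script shared with others), one option is to save the current value of LANGUAGE and restore it afterwards. This is a minimal base R sketch under that assumption; the variable name old_language is illustrative only.

```r
## Remember the current setting ("" if LANGUAGE is not set)
old_language <- Sys.getenv("LANGUAGE")

## Switch user-facing messages to French
Sys.setenv(LANGUAGE = "fr")

## ... call arkhe functions here ...

## Restore the previous setting, unsetting the variable if it was not set before
if (nzchar(old_language)) {
  Sys.setenv(LANGUAGE = old_language)
} else {
  Sys.unsetenv("LANGUAGE")
}
```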

Contributing

Please note that the arkhe project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


Version: 1.9.0

License: GPL (>= 3)


Maintainer: Nicolas Frerebeau

Last Published: December 9th, 2024

Functions in arkhe (1.9.0)

assert_numeric: Check Numeric Values
assert_length: Check Object Length(s)
arkhe-deprecated: Deprecated Functions in arkhe
assert_square: Check Matrix
assert_lower: Check Numeric Relations
arkhe-package: arkhe: Tools for Cleaning Rectangular Data
assert_type: Check Data Types
assert_constant: Check Numeric Trend
assert_dim: Check Object Dimensions
append_column: Add a (Named) Vector as a Column
assert_missing: Check Missing Values
append_rownames: Convert Row Names to an Explicit Column
assert_unique: Check Duplicates
clean_whitespace: Remove Leading/Trailing Whitespace
confidence_binomial: Confidence Interval for Binomial Proportions
assert_package: Check the Availability of a Package
confidence_mean: Confidence Interval for a Mean
describe: Data Description
assert_names: Check Object Names
interval_credible: Bayesian Credible Interval
detect: Find Rows/Columns Using a Predicate
compact: Remove Empty Rows/Columns
is_scalar: Scalar Type Predicates
discard: Remove Rows/Columns Using a Predicate
math_gcd: Greatest Common Divisor
assign: Assign a Specific Row/Column to the Column/Row Names
bootstrap: Bootstrap Estimation
confidence_multinomial: Confidence Interval for Multinomial Proportions
math_lcm: Least Common Multiple
interval_hdr: Highest Density Regions
jackknife: Jackknife Estimation
count: Count Values Using a Predicate
predicate-data: Utility Predicates
keep: Keep Rows/Columns Using a Predicate
check_class: Class Diagnostic
null: Default Value for NULL
label_percent: Label Percentages
predicate-trend: Numeric Trend Predicates
predicate-attributes: Attributes Predicates
remove_constant: Remove Constant Columns
get: Get Rows/Columns by Name
predicate-matrix: Matrix Predicates
predicate-names: Names Predicates
predicate-numeric: Numeric Predicates
predicate-type: Type Predicates
remove_zero: Remove Rows/Columns with Zeros
replace_Inf: Replace Infinite Values
remove_Inf: Remove Rows/Columns with Infinite Values
remove_NA: Remove Rows/Columns with Missing Values
sparsity: Sparsity
remove_empty: Remove Rows/Columns with Empty Strings
concat: Concatenate
conditions: Conditions
replace_NA: Replace Missing Values
replace_zero: Replace Zeros
scale_midpoint: Rescale Continuous Vector (minimum, midpoint, maximum)
validate: Validate a Condition
scale_range: Rescale Continuous Vector (minimum, maximum)
replace_empty: Replace Empty Strings
seek: Search Rows/Columns by Name
with_seed: Evaluate an Expression with a Temporary Seed
assert_empty: Check Object Filling
assert_infinite: Check Infinite Values