
arkhe

Overview

A dependency-free collection of simple functions for cleaning rectangular data. This package allows you to detect, count and replace values, or discard rows/columns, using a predicate function. In addition, it provides tools to check conditions and return informative error messages.
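The general "check a condition, then fail with an informative message" pattern can be illustrated in base R. The helper below, `assert_no_missing()`, is hypothetical and is not part of arkhe (see the assert_*() entries in the function list for the package's own checks). It signals an error whose class, `error_missing_values`, is also an assumption made for this sketch, so callers can catch that specific failure with `tryCatch()`.

```r
## Hypothetical helper (not arkhe's API): fail if `x` contains NA values
assert_no_missing <- function(x) {
  arg <- deparse(substitute(x))  # text of the argument, for the message
  n <- sum(is.na(x))
  if (n > 0) {
    ## Build a classed error condition so it can be handled selectively
    cond <- structure(
      list(
        message = sprintf("`%s` must not contain missing values (%d found).", arg, n),
        call = sys.call()
      ),
      class = c("error_missing_values", "error", "condition")
    )
    stop(cond)
  }
  invisible(x)  # return the input unchanged when the check passes
}

## Catch only this kind of error and keep its message
res <- tryCatch(
  assert_no_missing(c(1, NA, 3)),
  error_missing_values = function(e) conditionMessage(e)
)
res
```

Giving the error a class of its own means that calling code can react to missing values without also silencing unrelated errors.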


To cite arkhe in publications use:

Frerebeau N (2025). arkhe: Tools for Cleaning Rectangular Data. Université Bordeaux Montaigne, Pessac, France. doi:10.5281/zenodo.3526659 <https://doi.org/10.5281/zenodo.3526659>, R package version 1.11.0, <https://packages.tesselle.org/arkhe/>.

This package is a part of the tesselle project https://www.tesselle.org.

Installation

You can install the released version of arkhe from CRAN with:

install.packages("arkhe")

And the development version from Codeberg with:

# install.packages("remotes")
remotes::install_git("https://codeberg.org/tesselle/arkhe")

Usage

## Load the package
library(arkhe)

## Set seed for reproducibility
set.seed(12345)

## Create a matrix
X <- matrix(sample(1:10, 25, TRUE), nrow = 5, ncol = 5)

## Add missing values (NA)
k <- sample(1:25, 3, FALSE)
X[k] <- NA
X
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   NA    7   10    7
#> [4,]   NA   NA    6    3    2
#> [5,]    8   10    1    9    4

## Count missing values in rows
count(X, f = is.na, margin = 1)
#> [1] 0 0 1 2 0

## Count non-missing values in columns
count(X, f = is.na, margin = 2, negate = TRUE)
#> [1] 4 3 5 5 5

## Find rows with at least one NA
detect(X, f = is.na, margin = 1)
#> [1] FALSE FALSE  TRUE  TRUE FALSE

## Find columns without any NA
detect(X, f = is.na, margin = 2, negate = TRUE, all = TRUE)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

## Remove rows with any NA
discard(X, f = is.na, margin = 1, all = FALSE)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   10    1    9    4

## Remove columns with any NA
discard(X, f = is.na, margin = 2, all = FALSE)
#>      [,1] [,2] [,3]
#> [1,]    1    4    4
#> [2,]    8    8   10
#> [3,]    7   10    7
#> [4,]    6    3    2
#> [5,]    1    9    4

## Replace NA with zeros
replace_NA(X, value = 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8    0    7   10    7
#> [4,]    0    0    6    3    2
#> [5,]    8   10    1    9    4
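To make the semantics of these predicate-based calls explicit, here is a base-R sketch that reproduces the outputs above without loading arkhe. It assumes, based only on the printed results, that `count()` sums the predicate along `margin` (1 = rows, 2 = columns), that `detect()` reduces it with `any()` (or `all()` when `all = TRUE`), that `negate = TRUE` inverts the predicate, and that `discard()` drops whatever `detect()` flags. The matrix is typed in from the printed output instead of being regenerated from the seed.

```r
## Matrix copied from the printed output above (filled column by column)
X <- matrix(
  c(3, 10, 8, NA, 8,
    2, 6, NA, NA, 10,
    1, 8, 7, 6, 1,
    4, 8, 10, 3, 9,
    4, 10, 7, 2, 4),
  nrow = 5, ncol = 5
)

## ~ count(X, f = is.na, margin = 1): number of NA per row
n_na_rows <- rowSums(is.na(X))
## ~ count(X, f = is.na, margin = 2, negate = TRUE): non-NA per column
n_ok_cols <- colSums(!is.na(X))
## ~ detect(X, f = is.na, margin = 1): rows holding at least one NA
rows_with_na <- apply(is.na(X), 1, any)
## ~ detect(X, f = is.na, margin = 2, negate = TRUE, all = TRUE)
cols_without_na <- apply(!is.na(X), 2, all)
## ~ discard(X, f = is.na, margin = 1, all = FALSE)
X_rows <- X[!rows_with_na, , drop = FALSE]
## ~ discard(X, f = is.na, margin = 2, all = FALSE)
X_cols <- X[, cols_without_na, drop = FALSE]
## ~ replace_NA(X, value = 0)
X_zero <- X
X_zero[is.na(X_zero)] <- 0
```

The arkhe versions express all of these through one predicate argument `f`, so swapping `is.na` for another function (for instance one that tests for zeros) changes what is counted, detected or discarded without rewriting the surrounding code.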

Translation

This package provides translations of user-facing communications, such as messages, warnings and errors. By default, the preferred language is taken from the locale. This can be overridden by setting the environment variable LANGUAGE (you only need to do this once per session):

Sys.setenv(LANGUAGE = "<language code>")

Languages currently available are English (en), French (fr) and Spanish (es).
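For example, a session could switch to French and later restore the previous setting using base R's `Sys.getenv()` and `Sys.setenv()`. Saving the old value first is a precaution added in this sketch, not something arkhe requires.

```r
## Remember the current preference so it can be restored afterwards
old_language <- Sys.getenv("LANGUAGE")

## Ask for French messages, warnings and errors
Sys.setenv(LANGUAGE = "fr")
Sys.getenv("LANGUAGE")

## Go back to the previous preference
Sys.setenv(LANGUAGE = old_language)
```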

Contributing

Please note that the arkhe project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



Version: 1.11.0
License: GPL (>= 3)
Maintainer: Nicolas Frerebeau
Last Published: May 12th, 2025

Functions in arkhe (1.11.0)

assert_square: Check Matrix
assert_missing: Check Missing Values
assert_numeric: Check Numeric Values
assign: Assign a Specific Row/Column to the Column/Row Names
check_class: Class Diagnostic
count: Count Values Using a Predicate
concat: Concatenate
get: Get Rows/Columns by Name
confidence_multinomial: Confidence Interval for Multinomial Proportions
confidence_mean: Confidence Interval for a Mean
interval_credible: Bayesian Credible Interval
jackknife: Jackknife Estimation
confidence_binomial: Confidence Interval for Binomial Proportions
keep: Keep Rows/Columns Using a Predicate
confidence_bootstrap: Nonparametric Bootstrap Confidence Interval
assert_package: Check the Availability of a Package
compact: Remove Empty Rows/Columns
is_scalar: Scalar Type Predicates
clean_whitespace: Remove Leading/Trailing Whitespace
interval_hdr: Highest Density Regions
discard: Remove Rows/Columns Using a Predicate
math_lcm: Least Common Multiple
detect: Find Rows/Columns Using a Predicate
predicate-type: Type Predicates
null: Default Value for NULL
describe: Data Description
remove_Inf: Remove Rows/Columns with Infinite Values
label_percent: Label Percentages
replace_Inf: Replace Infinite Values
replace_zero: Replace Zeros
replace_empty: Replace Empty String
predicate-matrix: Matrix Predicates
math_gcd: Greatest Common Divisor
predicate-names: Names Predicates
predicate-attributes: Attributes Predicates
predicate-data: Utility Predicates
seek: Search Rows/Columns by Name
replace_NA: Replace Missing Values
sparsity: Sparsity
remove_empty: Remove Rows/Columns with Empty String
validate: Validate a Condition
resample_uniform: Draw Uniform Random Sample
resample_multinomial: Draw Multinomial Random Sample
with_seed: Evaluate an Expression with a Temporary Seed
remove_NA: Remove Rows/Columns with Missing Values
scale_midpoint: Rescale Continuous Vector (minimum, midpoint, maximum)
scale_range: Rescale Continuous Vector (minimum, maximum)
remove_zero: Remove Rows/Columns with Zeros
predicate-trend: Numeric Trend Predicates
predicate-numeric: Numeric Predicates
remove_constant: Remove Constant Columns
conditions: Conditions
assert_empty: Check Object Filling
append_column: Add a (Named) Vector as a Column
assert_lower: Check Numeric Relations
append_rownames: Convert Row Names to an Explicit Column
assert_length: Check Object Length(s)
assert_dim: Check Object Dimensions
arkhe-deprecated: Deprecated Functions in arkhe
arkhe-package: arkhe: Tools for Cleaning Rectangular Data
assert_constant: Check Numeric Trend
assert_infinite: Check Infinite Values
assert_names: Check Object Names
bootstrap: Nonparametric Bootstrap Estimation
assert_unique: Check Duplicates
assert_type: Check Data Types
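The null entry (Default Value for NULL) refers to the common "use a fallback when a value is NULL" idiom. A base-R sketch of it as an infix operator follows; the name `%||%` is an assumption for this example (base R itself has provided a `%||%` since version 4.4.0), so check the arkhe reference manual for the exact exported name.

```r
## Return `y` when `x` is NULL, otherwise return `x` unchanged (sketch)
`%||%` <- function(x, y) {
  if (is.null(x)) y else x
}

opts <- list(sep = ",")
opts$sep %||% ";"  # "," because the value is present
opts$dec %||% "."  # "." because opts$dec is NULL
```

Only NULL triggers the fallback: `NA` or a zero-length vector is returned as-is, which keeps "missing value" and "not set" distinct.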
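The with_seed entry evaluates an expression under a temporary seed. The helper below, `with_seed_sketch()`, is a hypothetical base-R illustration of that technique and not arkhe's implementation or signature. It saves the global `.Random.seed`, sets the requested seed, evaluates the lazily passed expression, and then restores (or removes) the previous state.

```r
## Hypothetical sketch (not arkhe's code): evaluate `expr` with a fixed seed,
## then put the caller's random number generator state back
with_seed_sketch <- function(seed, expr) {
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = env, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = env)
    } else {
      rm(".Random.seed", envir = env)
    }
  })
  set.seed(seed)
  expr  # the promise is forced here, after the seed has been set
}

set.seed(42)
before <- .Random.seed
a <- with_seed_sketch(1, runif(3))
b <- with_seed_sketch(1, runif(3))
identical(a, b)                  # same seed, same draws
identical(.Random.seed, before)  # outer RNG state untouched
```

Restoring the state in `on.exit()` means the outer random stream is left untouched even if the expression fails.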
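The math_gcd and math_lcm entries compute the greatest common divisor and least common multiple. The scalar helpers below are hypothetical base-R sketches of the textbook method (Euclid's algorithm, then lcm(a, b) = |ab| / gcd(a, b)). They are not arkhe's functions, whose arguments are not described on this page.

```r
## Hypothetical helpers (not arkhe's API); inputs are single whole numbers
gcd_sketch <- function(a, b) {
  a <- abs(a)
  b <- abs(b)
  ## Euclid: replace (a, b) by (b, a mod b) until the remainder is zero
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

lcm_sketch <- function(a, b) {
  if (a == 0 || b == 0) return(0)
  abs(a * b) / gcd_sketch(a, b)
}

gcd_sketch(12, 18)  # 6
lcm_sketch(12, 18)  # 36
```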
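The scale_range entry rescales a continuous vector using its minimum and maximum. A base-R sketch of linear min-max rescaling follows; the function name `rescale_sketch()`, the default target range `c(0, 1)`, and the handling of constant input are assumptions for this example, not arkhe's documented behavior.

```r
## Hypothetical sketch: map x linearly from [min(x), max(x)] onto `to`
rescale_sketch <- function(x, to = c(0, 1)) {
  from <- range(x, na.rm = TRUE)
  if (from[1] == from[2]) {
    ## Design choice: a constant vector maps to the middle of `to`
    out <- rep(mean(to), length(x))
    out[is.na(x)] <- NA
    return(out)
  }
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

rescale_sketch(c(2, 4, 6, 10))                 # 0.00 0.25 0.50 1.00
rescale_sketch(c(2, 4, 6, 10), to = c(-1, 1))  # -1.0 -0.5  0.0  1.0
```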
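The confidence_mean entry computes a confidence interval for a mean. One common choice is Student's t interval, mean ± t(1 - α/2, n - 1) · s / √n, sketched below in base R. This assumes roughly normal data. The name `ci_mean_sketch()` and the sample values are invented for the example, and arkhe's `confidence_mean()` may use other options or defaults.

```r
## Hypothetical sketch: two-sided Student's t confidence interval for a mean
ci_mean_sketch <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  q <- qt(1 - (1 - level) / 2, df = n - 1)     # t quantile
  c(lower = mean(x) - q * se, upper = mean(x) + q * se)
}

x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2)
ci_mean_sketch(x)
```

The same interval is reported by base R's `t.test(x)$conf.int`, which is a convenient way to check the arithmetic.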
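The jackknife entry refers to jackknife estimation. In its standard form, the statistic is recomputed n times with one observation left out. The bias is then (n - 1)(mean of replicates - full estimate), and the standard error is the square root of (n - 1)/n times the sum of squared deviations of the replicates. The function `jackknife_sketch()` below is a hypothetical base-R illustration of that recipe and not arkhe's interface.

```r
## Hypothetical sketch: jackknife bias and standard error of a statistic
jackknife_sketch <- function(x, statistic) {
  n <- length(x)
  theta_hat <- statistic(x)
  ## Leave-one-out replicates: the statistic with observation i removed
  theta_i <- vapply(seq_len(n), function(i) statistic(x[-i]), numeric(1))
  theta_bar <- mean(theta_i)
  bias <- (n - 1) * (theta_bar - theta_hat)
  se <- sqrt((n - 1) / n * sum((theta_i - theta_bar)^2))
  c(estimate = theta_hat, bias = bias, se = se)
}

x <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3)
res <- jackknife_sketch(x, mean)
res
```

For the sample mean the jackknife bias is zero and the standard error reduces exactly to `sd(x) / sqrt(length(x))`, which makes this case a useful sanity check.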
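The remove_constant entry removes constant columns. A base-R sketch of that cleaning step on a data frame follows. The helper name `drop_constant_sketch()` and the decision to ignore `NA` when judging constancy are assumptions for this example, not a description of arkhe's function.

```r
## Hypothetical sketch: drop columns holding at most one distinct non-NA value
drop_constant_sketch <- function(df) {
  is_constant <- vapply(df, function(col) {
    length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  df[, !is_constant, drop = FALSE]
}

df <- data.frame(
  id = 1:3,
  site = c("A", "A", "A"),   # constant: dropped
  depth = c(1.2, NA, 1.2),   # constant once NA is ignored: dropped
  value = c(5, 7, 6)
)
drop_constant_sketch(df)
```

Because `drop = FALSE` is used, the result stays a data frame even when only one column survives.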