kibior: easy scientific data handling, searching and sharing with Elasticsearch

Version: 0.1.1

TL;DR

What: kibior is an R package dedicated to easing the pain of data handling in science, notably with biological data.
Where: kibior uses Elasticsearch as its database and search engine.
Who: kibior is built for data science and data manipulation, for any data-related action or need, notably sharing data. It mainly targets bioinformaticians and, more broadly, data scientists.
When: Available now from this repository or from CRAN.
Public instances: Use the $get_kibio_instance() method to connect to Kibio and access known datasets. See Kibio datasets at the end of this document for a complete list.
Cite this package: In an R session, run citation("kibior").
Publication: Coming soon.

Main features

This package allows:

  • Pushing, pulling, joining, sharing and searching tabular data between an R session and one or multiple Elasticsearch instances/clusters.
  • Querying and filtering massive datasets with the Elasticsearch engine.
  • Multiple live Elasticsearch connections to different addresses.
  • Method autocompletion in proper environments (e.g. R CLI, RStudio).
  • Importing and exporting datasets from and to files.
  • Server-side execution for most operations (i.e. on Elasticsearch instances/clusters).

How

Install

# Get from CRAN
install.packages("kibior")

# or get the latest from Github
devtools::install_github("regisoc/kibior")

Run

# load
library(kibior)

# Get a specific instance (replace the host and port with your own;
# 9200 is the default Elasticsearch HTTP port)
kc <- Kibior$new("server_or_address", 9200)

# Or try something bigger...
kibio <- Kibior$get_kibio_instance()
kibio$list()
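
When the connection code above runs in a script, a missing package or an unreachable server will stop everything that follows. The sketch below wraps the same Kibior$new(host, port) call in a guard. The helper name connect_or_null and the "localhost"/9200 defaults are illustrative assumptions, not part of kibior, and the sketch assumes the constructor raises an R error when the server cannot be reached.

```r
# Hypothetical helper (not part of kibior): returns a connection or NULL.
connect_or_null <- function(host = "localhost", port = 9200) {
  # Skip cleanly when the package is not installed
  if (!requireNamespace("kibior", quietly = TRUE)) {
    message("Package 'kibior' is not installed.")
    return(NULL)
  }
  # Assumption: Kibior$new() signals an error if the server is unreachable
  tryCatch(
    kibior::Kibior$new(host, port),
    error = function(e) {
      message("Could not connect to ", host, ":", port,
              " (", conditionMessage(e), ")")
      NULL
    }
  )
}

kc <- connect_or_null("localhost", 9200)
if (is.null(kc)) {
  message("No connection available; skipping Elasticsearch examples.")
}
```

Returning NULL instead of stopping lets the calling script decide whether the Elasticsearch steps are optional.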

Examples

Here is a sample of the features offered by KibioR. See the Introduction vignette for more advanced usage.

Example: push datasets

# The pipe operator %>% comes from magrittr (also re-exported by dplyr)
library(magrittr)

# Push data (R memory -> Elasticsearch)
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")

Example: pull datasets

# Pull data with columns selection (Elasticsearch -> R memory)
kc$pull("sw", query = "homeworld:(naboo || tatooine)", 
              columns = c("name", "homeworld", "height", "mass", "species"))
# see vignette for query syntax
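
To make the query and columns arguments of the pull call above more concrete, here is a rough local equivalent in base R that does not contact Elasticsearch. In Elasticsearch query string syntax, || is an OR operator, so the query keeps rows whose homeworld matches either term. The toy data frame and its values are made up for illustration, and the case-insensitive comparison is an assumption about how the field is analyzed on the server.

```r
# Hand-written toy rows standing in for the "sw" dataset (made-up values)
sw <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Tatooine", "Naboo", "Alderaan", "Naboo"),
  height    = c(172, 165, 150, 96),
  mass      = c(77, 45, 49, 32),
  species   = c("Human", "Human", "Human", "Droid"),
  eye_color = c("blue", "brown", "brown", "red"),
  stringsAsFactors = FALSE
)

# query = "homeworld:(naboo || tatooine)": match either term.
# Assumption: matching ignores case, so values are lowercased first.
keep <- tolower(sw$homeworld) %in% c("naboo", "tatooine")

# columns = c(...): return only the requested columns
wanted <- c("name", "homeworld", "height", "mass", "species")
local_pull <- sw[keep, wanted]
print(local_pull)
```

The difference in practice is that kc$pull applies the filter and the column selection on the Elasticsearch side, so only the matching rows and columns travel to the R session.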

Example: copy datasets

# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")

Example: delete datasets


# Delete datasets
kc$delete("sw_copy")

Example: list, match dataset names

# List available datasets
kc$list()

# Search for index names starting with "s"
kc$match("s*")

Example: get column names and list unique values of a column

# Get columns of all datasets starting with "s"
kc$columns("s*")

# Get unique values of a column
kc$keys("sw", "homeworld")

Example: some basic Elasticsearch statistical methods

# Count number of lines in dataset
kc$count("st")

# Count number of lines with query (name of the storm is Anita)
kc$count("st", query = "name:anita")

# Generic stats on two columns
kc$stats("sw", c("height", "mass"))

# Specific descriptive stats with query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
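
As a point of comparison for the count and avg calls above, the following base R sketch computes the same kind of results on small in-memory data frames. It does not use kibior or Elasticsearch. The toy rows are made up, and two behaviours are assumptions: that query matching ignores case, and that missing values are skipped when averaging.

```r
# Hand-written toy rows standing in for "st" and "sw" (made-up values)
st <- data.frame(
  name = c("Anita", "Anita", "Bess", "Clara"),
  wind = c(30, 45, 25, 60),
  stringsAsFactors = FALSE
)
sw <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Naboo", "Naboo", "Tatooine", "Naboo"),
  height    = c(96, 165, 172, 196),
  mass      = c(32, 45, 77, NA),
  stringsAsFactors = FALSE
)

# Local analogue of kc$count("st"): number of rows
n_all <- nrow(st)

# Local analogue of kc$count("st", query = "name:anita")
n_anita <- sum(tolower(st$name) == "anita")

# Local analogue of kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
naboo <- sw[tolower(sw$homeworld) == "naboo", c("height", "mass")]
avg_naboo <- colMeans(naboo, na.rm = TRUE)  # assumption: missing values are skipped

print(c(n_all = n_all, n_anita = n_anita))
print(avg_naboo)
```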

Example: join

# Inner join between:
#   1/ an Elasticsearch-based dataset with query ("sw"), 
#   2/ and an in-memory R dataset (dplyr::starwars) 
kc$inner_join("sw", dplyr::starwars, 
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
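
The join above combines three steps on the left side: filter rows with left_query, keep left_columns, then match on the by key. A minimal local sketch of that logic with base R merge() is shown below. It is not how kibior implements the join, and the toy tables and their values are made up for illustration.

```r
# Hand-written toy tables (made-up values)
left <- data.frame(
  name       = c("A", "B", "C", "D"),
  hair_color = c("black", "blond", "black", "brown"),
  mass       = c(84, 77, 79, 49),
  height     = c(183, 172, 177, 150),
  stringsAsFactors = FALSE
)
right <- data.frame(
  name    = c("A", "C", "E"),
  species = c("Human", "Human", "Droid"),
  stringsAsFactors = FALSE
)

# left_query = "hair_color:black": filter rows first, while the
# hair_color column is still available
left_rows <- left[left$hair_color == "black", ]

# left_columns = c("name", "mass", "height"): then keep the requested columns
left_sel <- left_rows[, c("name", "mass", "height")]

# by = "name": an inner join keeps only names present on both sides
joined <- merge(left_sel, right, by = "name")
print(joined)
```

Note that the filter column (hair_color) does not need to be among the returned columns, which is why the filter runs before the column selection here.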

License

GPL-2

Maintainer

Régis Ongaro-Carcy

Last Published

January 28th, 2021

Functions in kibior (0.1.1)

  • Kibior not-equals operator
  • Kibior equals operator
  • Static - Tests if packages are installed
  • Static - Kibior is instance
  • Static - initiate a direct instance to Kibio public repository
  • kibior: KibioR, a Kibio and Elasticsearch data manipulation package.