Samuel Macedo

4 packages on CRAN

99.99th percentile

This is an R wrapper for the AWS Command Line Interface (CLI) that provides methods for managing user configuration on Amazon Web Services. You can create as many profiles as you want, manage them, and delete them. Profiles created with this tool work with all AWS products, such as S3, Glacier, and EC2. It also provides a function that installs the AWS CLI automatically, but you can download and install it manually if you prefer.

sparkhail

cran
99.99th percentile

'Hail' is an open-source, general-purpose, 'python'-based data analysis tool with additional data types and methods for working with genomic data, see <https://hail.is/>. 'Hail' is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). 'Hail' is exposed as a 'python' library, using primitives for distributed queries and linear algebra implemented in 'scala', 'spark', and increasingly 'C++'. 'sparkhail' is an R extension built on the 'sparklyr' package. The idea is to help R users use 'hail' functionality with the well-known 'tidyverse' syntax, see <https://www.tidyverse.org/>.

99.99th percentile

This is a 'sparklyr' extension integrating 'VariantSpark' and R. 'VariantSpark' is a framework based on 'scala' and 'spark' for analyzing genome datasets, see <https://bioinformatics.csiro.au/>. It was tested on datasets with 3000 samples, each containing 80 million features, in both unsupervised clustering approaches and supervised applications such as classification and regression. Genome datasets are usually written in VCF, a text file format used in bioinformatics for storing gene sequence variations. 'VariantSpark' is therefore a great tool for genome research, because it can read VCF files, run analyses, and return the output in a 'spark' data frame.

sparklyr

cran
99.99th percentile

R interface to Apache Spark, a fast and general engine for big data processing, see <http://spark.apache.org>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.