
Javier Luraschi
18 packages on CRAN
1 package on GitHub
Enables 'DBI' compliant packages to integrate with the 'RStudio' connections pane and the 'pins' package. It automates the display of schemata, tables, and views, as well as a preview of each table's top 1,000 records.
Low-level socket-based interface for calling the Spark API via the RBackend server included in Spark.
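For illustration, a minimal sketch of this low-level invocation pattern as it is exposed today through 'sparklyr': invoke() calls a method on a JVM object over the backend connection.

    library(sparklyr)
    sc <- spark_connect(master = "local")
    # Call the version() method on the SparkContext JVM object via the backend
    spark_context(sc) %>% invoke("version")
    spark_disconnect(sc)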
'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
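A minimal sketch of typical usage, writing a data frame through Arrow's columnar format to Parquet and reading it back:

    library(arrow)
    # Write a data frame as Parquet, then read it back in
    write_parquet(mtcars, "mtcars.parquet")
    df <- read_parquet("mtcars.parquet")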
Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.
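A minimal sketch, assuming the 'cloudml' package, a hypothetical 'train.R' training script, and a configured Google Cloud SDK:

    library(cloudml)
    # Submit the local training script to Cloud ML Engine ("train.R" is hypothetical)
    job <- cloudml_train("train.R")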
R bindings for 'GeoSpark' <http://geospark.datasyslab.org/>, extending the 'sparklyr' <https://spark.rstudio.com/> package to make distributed geocomputing easier. 'sf' is a package that provides simple features <https://en.wikipedia.org/wiki/Simple_Features> access for R and is a leading geospatial data processing tool. The 'geospark' package brings the same simple features access as 'sf', but running on the Spark distributed system. A setup sketch follows below.
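A minimal setup sketch, assuming 'geospark' exposes the register_gis() registration step described in its README; spatial predicates such as ST_Contains then become available inside 'dplyr' queries against Spark:

    library(sparklyr)
    library(geospark)
    sc <- spark_connect(master = "local")
    # Register GeoSpark's spatial SQL functions with the Spark session (assumed helper)
    register_gis(sc)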
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
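A minimal sketch, assuming this entry is the 'knitr' package; knit() weaves a literate-programming source document into its output:

    library(knitr)
    # "report.Rmd" is a hypothetical R Markdown source file
    knit("report.Rmd")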
R interface to 'MLflow', an open source platform for the complete machine learning life cycle, see <https://mlflow.org/>. This package supports installing 'MLflow', tracking experiments, creating and running projects, and saving and serving models.
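A minimal tracking sketch with the 'mlflow' R API, logging a hyperparameter and a metric inside a run:

    library(mlflow)
    mlflow_start_run()
    mlflow_log_param("alpha", 0.5)   # hyperparameter for this run
    mlflow_log_metric("rmse", 0.25)  # resulting model metric
    mlflow_end_run()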
A tool for drawing sassy 'UML' diagrams based on a simple syntax, see <https://www.nomnoml.com>. Supports styling, R Markdown and exporting diagrams in the PNG format.
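A minimal sketch rendering a diagram from nomnoml's arrow syntax:

    library(nomnoml)
    # Draw a two-node association
    nomnoml("[sparklyr] -> [Spark]")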
Pin remote resources into a local cache to work offline, improve speed and avoid recomputing; discover and share resources in local folders, 'GitHub', 'Kaggle' or 'RStudio Connect'. Resources can be anything from 'CSV', 'JSON', or image files to arbitrary R objects.
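A minimal sketch, assuming the classic 'pins' API of this era; pin() caches an object and pin_get() retrieves it from the cache:

    library(pins)
    pin(mtcars, "mtcars")   # cache locally (or on a registered board)
    pin_get("mtcars")       # retrieve later, even offline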
Provides tools to show and draw image pixels using 'HTML' widgets and 'Shiny' applications. It can be used to visualize the 'MNIST' dataset for handwritten digit recognition or to create new image recognition datasets.
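A minimal sketch, assuming the 'pixels' package's get_pixels() gadget; it opens a drawing surface and returns the drawn pixels, e.g. to label a new digit:

    library(pixels)
    # Opens a 'Shiny' gadget and returns the drawn image as pixel intensities
    digit <- get_pixels()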
'Hail' is an open-source, general-purpose, 'python'-based data analysis tool with additional data types and methods for working with genomic data, see <https://hail.is/>. 'Hail' is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). 'Hail' is exposed as a 'python' library, using primitives for distributed queries and linear algebra implemented in 'scala', 'spark', and increasingly 'C++'. 'sparkhail' is an R extension built on the 'sparklyr' package; the idea is to help R users use 'hail' functionality with the well-known 'tidyverse' syntax, see <https://www.tidyverse.org/>.
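A connection sketch under stated assumptions: the hail_config() and hail_context() helpers below are recalled from the package README of this period and may have changed.

    library(sparklyr)
    library(sparkhail)
    # Assumed helpers: hail_config() prepares the Spark session for Hail,
    # hail_context() wraps the running session in a Hail context
    sc <- spark_connect(master = "local", config = hail_config())
    hl <- hail_context(sc)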
R interface to Apache Spark, a fast and general engine for big data processing, see <http://spark.apache.org>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
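A minimal end-to-end sketch: connect, copy a data frame to the cluster, and run a 'dplyr' query that is translated to Spark SQL:

    library(sparklyr)
    library(dplyr)
    sc <- spark_connect(master = "local")
    mtcars_tbl <- copy_to(sc, mtcars)
    # dplyr verbs execute inside Spark, not in local R
    mtcars_tbl %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE))
    spark_disconnect(sc)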
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This allows reading files from the Common Crawl project <http://commoncrawl.org/>.
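A minimal sketch, assuming spark_read_warc() keeps the (sc, name, path) signature from the package documentation:

    library(sparklyr)
    library(sparkwarc)
    sc <- spark_connect(master = "local")
    # "sample.warc" is a hypothetical local WARC file
    warc <- spark_read_warc(sc, "warc", "sample.warc")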
A collection of 'HTML', 'JavaScript', and 'CSS' assets that dynamically generate beautiful documentation from a 'Swagger' compliant API: <https://swagger.io/specification/>.
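A minimal sketch under a stated assumption: swagger_index() below is assumed to return the path to the bundled index page, per the package README as remembered.

    library(swagger)
    # Assumed helper: open the bundled 'Swagger UI' assets in a browser
    browseURL(swagger_index())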
Tools to deploy 'TensorFlow' <https://www.tensorflow.org/> models across multiple services. Currently, it provides a local server for testing 'cloudml' compatible services.
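A minimal sketch, assuming serve_savedmodel() from 'tfdeploy' and a hypothetical exported 'SavedModel' directory:

    library(tfdeploy)
    # "savedmodel/" is a hypothetical directory exported from 'TensorFlow'
    serve_savedmodel("savedmodel")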
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al. (2019) <arXiv:1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
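A minimal sketch of the two levels the description mentions, raw tensor operations and a small neural-network module:

    library(torch)
    x <- torch_randn(2, 3)    # random input tensor
    w <- torch_randn(3, 1)
    torch_matmul(x, w)        # low-level tensor operation
    net <- nn_linear(3, 1)    # a one-layer network module
    net(x)                    # forward pass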
This is a 'sparklyr' extension integrating 'VariantSpark' and R. 'VariantSpark' is a framework based on 'scala' and 'spark' for analyzing genome datasets, see <https://bioinformatics.csiro.au/>. It has been tested on datasets with 3,000 samples, each containing 80 million features, in both unsupervised clustering approaches and supervised applications such as classification and regression. Genome datasets are usually written in VCF, a text file format used in bioinformatics for storing gene sequence variations. 'VariantSpark' is thus a useful tool for genome research, because it can read VCF files, run analyses, and return the output in a 'spark' data frame.
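A workflow sketch under stated assumptions: the vs_*() helpers below follow the package README as remembered, and the file names are hypothetical.

    library(sparklyr)
    library(variantspark)
    sc <- spark_connect(master = "local")
    vsc <- vs_connect(sc)                          # assumed VariantSpark context helper
    vcf <- vs_read_vcf(vsc, "sample.vcf")          # hypothetical VCF input
    labels <- vs_read_labels(vsc, "labels.txt")    # hypothetical phenotype labels
    vs_importance_analysis(vsc, vcf, labels, n_trees = 100)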