
cocoreg

cocoreg is an R package for extracting shared variation in collections of datasets using regression models. The current stable release is available on CRAN:

http://cran.r-project.org/package=cocoreg

The algorithm is described in the paper:

Jussi Korpela, Andreas Henelius, Lauri Ahonen, Arto Klami, Kai Puolamäki. Using regression makes extraction of shared variation in multiple datasets easy. Data Mining and Knowledge Discovery, 2016. URL: http://dx.doi.org/10.1007/s10618-016-0465-y

The authors' version is available in this repository as cocoreg_plain.pdf. The final publication is available at link.springer.com.

Usage

A minimal usage example:

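library(cocoreg)
# Create a synthetic toy data collection
dc <- create_syn_data_toy()
# Run cocoreg on the datasets of the collection
ccr <- cocoreg(dc$data)
# Variation shared by all datasets (available only for synthetic datasets)
shared.by.all.df <- variation_shared_by(dc, 'all')
# Plot the observed data, the shared variation, and the cocoreg output
ggplot_dclst(list(observed = dc$data, shared = shared.by.all.df, cocoreg = ccr$data))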


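library(reshape) # attach reshape explicitly: importing it from the namespace does not work as expected
# Compare the shared variation and the cocoreg output variable by variable
ggcompare_dclst(list(shared = shared.by.all.df, cocoreg = ccr$data))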

Overview

The most important functions in cocoreg are:

  • cocoreg() which extracts shared variation from a collection of datasets

  • Visualization functions: ggplot_dclst() and ggcompare_dclst() for lists of data collections, ggplot_dflst() for a list of data.frames (i.e. one data collection), and ggplot_df() for a single data.frame (one dataset)
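
The terms used above (dataset, data collection, list of data collections) map onto plain R structures. The following is a minimal base R sketch of that nesting, not code from the package: the signal, the variable names, and the requirement that all datasets have the same number of rows are assumptions made for illustration.

```r
# Sketch of the data structures named in the overview (base R only).
# The shared signal and all variable names below are hypothetical.
set.seed(1)
n  <- 100
tt <- seq(0, 2 * pi, length.out = n)
shared <- sin(tt)  # variation present in every dataset

# A dataset is a single data.frame
ds1 <- data.frame(x1 = shared + rnorm(n, sd = 0.1),
                  x2 = 2 * shared + rnorm(n, sd = 0.1))
ds2 <- data.frame(y1 = -shared + rnorm(n, sd = 0.1))
ds3 <- data.frame(z1 = 0.5 * shared + rnorm(n, sd = 0.1))

# A data collection is a list of data.frames
dc <- list(ds1 = ds1, ds2 = ds2, ds3 = ds3)

# Basic sanity checks (assumption: observations are row-aligned across datasets)
stopifnot(all(vapply(dc, is.data.frame, logical(1))))
stopifnot(length(unique(vapply(dc, nrow, integer(1)))) == 1)

# A list of data collections is a named list of such lists
dclst <- list(observed = dc)
str(dclst, max.level = 2)
```

With cocoreg installed, a list like dc has the same shape as the dc$data object passed to cocoreg() in the usage example, and dclst has the shape passed to ggplot_dclst(). Whether hand-built data needs further preparation is not stated here; the package's validate_data() function is listed for checking a collection.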

Installation

Install the release version from CRAN:

install.packages("cocoreg")

Or the development version from GitHub:

# install.packages("devtools")
# library(devtools)
devtools::install_github("bwrc/cocoreg-r")
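
To check whether the package is already installed before running either command, something like the following can be used. This is a small sketch using base R only; the helper name is_installed is hypothetical and not part of cocoreg.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

if (is_installed("cocoreg")) {
  # Report the installed version
  print(packageVersion("cocoreg"))
} else {
  message("cocoreg is not installed; run install.packages(\"cocoreg\")")
}
```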

Version: 0.1.1

License: MIT + file LICENSE

Maintainer: Jussi Korpela

Last published: May 30th, 2017

Functions in cocoreg (0.1.1)

  • apply_dc_meta: Apply extracted properties of a data collection to a data collection (restore)
  • average_R2_dflst: Computes the R^2 (variance explained) between two lists of data.frames
  • PCA_cocoreg_interface: PCA projection using cocoreg interface
  • RGCCA_cocoreg_interface: COCOREG style analysis using RGCCA projection
  • create_syn_data_uds: A data collection with one unrelated dataset
  • create_syn_data_uvar: A collection with unrelated variables
  • BGFA_cocoreg_interface: Apply GFA using the same interface as cocoreg()
  • BGFA_joint_info: Project BGFA components common to all datasets back to the original space
  • create_Z_linear: Contains functions to create synthetic datasets with different properties
  • create_mappings: Generate all possible pairwise mappings between the given multivariate
  • dflst_pca: Apply PCA to the data after catenating data.frames horizontally
  • dl_remove_NA: Remove rows with NA values from a list of data.frames
  • mapping_glmnet: Define a mapping function using glmnet::glmnet
  • cocoreg: The Common Components by Regression (CoCoReg) algorithm
  • cocoreg_by_path: Compute D_joint for dataset i separately for all paths
  • create_syndata_mv: Create multivariate synthetic data
  • mapping_lm: Mapping stats::lm
  • rename_variables: Rename variables of a data collection
  • repmat: Replicate matrix to create a larger one
  • create_syndata_pwl: A non-linear data collection using piecewise linearity
  • df_ggplot_melt: Melt data.frame into ggplottable format
  • df_scale: Apply scale on a numeric data.frame
  • SCA_cocoreg_interface: SCA projection using cocoreg interface
  • add_notches: Add notch-like gaussian snippets to an existing signal x
  • data_matrix_rmse: Compute RMSE between data.matrices dm1 and dm2
  • mapping_rf: Mapping randomForest
  • mapping_rlm: Mapping MASS::rlm
  • row_suffle_variability: Determine the variability of matrices under row shuffling
  • dc_variability: Compute ds_variability for all datasets in a data collection
  • generate_mapping_function: Generate a mapping function between two datasets
  • generate_paths: Generate all/some paths between points
  • dflst2df: Catenate a list of data.frames vertically to a single data.frame
  • dflst2dfmelt: Combine a list of data.frames to a single molten data.frame
  • generate_paths_cyclic: Generate cyclic paths
  • generate_paths_noncyclic: Generate non-cyclic paths
  • ggplot_dflst: Plot a list of data.frames using ggplot2
  • make_data_gauss_2d: Make 2D gauss data (maybe obsolete)
  • mappings_R2_matrix: Extract R2 values from a list of mappings using summary()
  • se: Standard error of mean
  • create_syn_data_puvar: A data collection with variables that "become unrelated during measurement"
  • create_syn_data_toy: An illustrative synthetic data collection
  • cshift: Circularly shift vector elements
  • compose: Calculate the composition formed by applying all functions
  • compose_all: Calculate the average of the composition formed by applying all functions
  • df_scale_ols: Scales variables in data.frame dfx using ordinary least squares such
  • data_collections2ggdf: Catenate a set of data collections (lists of data.frames) into a single molten data.frame
  • dflst_add_ds: Add a data.frame (dataset) to a list of data.frames (datasets)
  • dflst_dsnames2varnames: Append dataset names to variable names of the respective dataset
  • get_dc_meta: Extract important properties of data collection
  • dflst2array: Catenate a list of data.frames to a matrix along dim
  • dl_scale: Run scale() on a list of data.frames
  • ds_variability: Compute variability_components for a dataset
  • ggplot_dclst: Plotting data collections using ggplot
  • ggplot_df: Plotting data.frame using ggplot
  • mapping_lmridge: Define a mapping function using MASS::lm.ridge
  • mapping_pcr: Define a mapping function using pls::pcr
  • get_range_datalist: Get [min, max] of a list of numeric objects
  • nplst_reorder_grid: Reorders a nested list of ggplots
  • pathify: Create a path
  • to_unit_vec: Make vector of unit norm
  • get_starting_dataset: Helper function to get the starting dataset based on
  • ggcompare_dclst: Compare data collections variable by variable
  • mapping_svm: Mapping svm
  • mapping_svm_sigmoid: Mapping svm using sigmoid
  • rmse: Compute RMSE between vectors v1 and v2
  • rotation_matrix: A rotation matrix
  • wrapper_BGFA: Run BGFA by Klami et al. using data format conventions of this repo
  • matrix_variability: Compute "variance" of the matrices using Frobenius norm
  • validate_data: Validate a data collection for use with cocoreg
  • var_explained: Sum-of-squares values showing what portion of variance in dvec is explained
  • traverse_nested_list: Apply fun to the bottom level of a nested list structure
  • vecnorm: Compute Euclidean norm of vector
  • vector_variability: Compute "variance" of the vectors using var()
  • ss: Sum of squares
  • svm_sigmoid: SVM using sigmoid kernel
  • variability_components: Compute total, within group and between group variability using fun
  • variation_shared_by: Return a specific variation component
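
Several of the small numeric helpers listed above (rmse, se, vecnorm, to_unit_vec) correspond to standard formulas. The sketch below writes those formulas out in base R, assuming the textbook definitions. The *_sketch names are hypothetical, and the package's own implementations and argument names may differ.

```r
# Base R sketches of the textbook formulas; these are not the cocoreg functions.

# Root mean squared error between two numeric vectors of equal length
rmse_sketch <- function(v1, v2) sqrt(mean((v1 - v2)^2))

# Standard error of the mean: sample standard deviation over sqrt(n)
se_sketch <- function(x) sd(x) / sqrt(length(x))

# Euclidean norm of a vector
vecnorm_sketch <- function(x) sqrt(sum(x^2))

# Rescale a non-zero vector to unit norm
to_unit_vec_sketch <- function(x) x / vecnorm_sketch(x)

rmse_sketch(c(1, 2, 3, 4), c(1, 2, 3, 8))  # 2
se_sketch(c(2, 4, 4, 6))                   # sqrt(8 / 3) / 2, about 0.816
vecnorm_sketch(c(3, 4))                    # 5
to_unit_vec_sketch(c(3, 4))                # 0.6 0.8
```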