# partition v0.1.0


## Agglomerative Partitioning Framework for Dimension Reduction

A fast and flexible framework for agglomerative partitioning. 'partition' uses an approach
called Direct-Measure-Reduce to create new variables that maintain a
user-specified minimum level of information. Each reduced variable is also interpretable:
each original variable maps to one and only one variable in the reduced data set. 'partition'
is also flexible: how variables are selected for reduction, how information loss is measured,
and how the data are reduced can all be customized.
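To make Direct-Measure-Reduce concrete, here is a minimal base-R sketch of a single step. It loosely mirrors the components the default partitioner prints in the example below (Minimum Distance (Pearson), Intraclass Correlation, Scaled Mean), but it is not partition's implementation. The distance `1 - |r|`, the one-way ICC(1) used as the information measure, and the helpers `icc1()` and `dmr_step()` are all assumptions chosen for illustration. An agglomerative procedure repeats steps like this until no candidate reduction meets the threshold.

```r
# Minimal sketch of ONE Direct-Measure-Reduce step in base R.
# Assumptions, not partition's internals: distance = 1 - |Pearson r|,
# information = one-way ICC(1), reducer = scaled row mean.

# One-way intraclass correlation: rows are observations, columns are the
# variables being combined
icc1 <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(x)
  row_means <- rowMeans(x)
  msb <- k * sum((row_means - mean(x))^2) / (n - 1)   # between-row mean square
  msw <- sum((x - row_means)^2) / (n * (k - 1))       # within-row mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

dmr_step <- function(df, threshold) {
  # Direct: choose the pair of columns with the smallest distance
  dist_mat <- 1 - abs(cor(df))
  diag(dist_mat) <- Inf
  pair <- which(dist_mat == min(dist_mat), arr.ind = TRUE)[1, ]
  target <- df[, pair, drop = FALSE]

  # Measure: information retained if this pair were reduced
  information <- icc1(target)

  # Reduce: accept the reduction only if information meets the threshold
  if (information < threshold) {
    return(list(data = df, information = information, reduced = FALSE))
  }
  new_df <- df[, -pair, drop = FALSE]
  new_df$reduced_var_1 <- as.numeric(scale(rowMeans(target)))
  list(data = new_df, information = information, reduced = TRUE)
}

# Usage: a and b share a latent signal, c is independent noise
set.seed(1)
z <- rnorm(200)
df <- data.frame(
  a = z + rnorm(200, sd = 0.3),
  b = z + rnorm(200, sd = 0.3),
  c = rnorm(200)
)
res <- dmr_step(df, threshold = 0.6)
res$reduced       # TRUE: a and b were combined
names(res$data)   # "c" "reduced_var_1"
```

Raising `threshold` above the pair's ICC makes the step decline the reduction and return the data unchanged, which is the behavior the `threshold` argument of `partition()` controls.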

## Readme

# partition


## Installation

You can install the development version of partition from GitHub with:

```
# install.packages("remotes")
remotes::install_github("USCbiostats/partition")
```

## Example

```
library(partition)
set.seed(1234)
df <- simulate_block_data(c(3, 4, 5), lower_corr = .4, upper_corr = .6, n = 100)
# don't accept reductions where information < .6
prt <- partition(df, threshold = .6)
prt
#> Partitioner:
#> Director: Minimum Distance (Pearson)
#> Metric: Intraclass Correlation
#> Reducer: Scaled Mean
#>
#> Reduced Variables:
#> 1 reduced variables created from 2 observed variables
#>
#> Mappings:
#> reduced_var_1 = {block2_x3, block2_x4}
#>
#> Minimum information:
#> 0.602
# return reduced data
partition_scores(prt)
#> # A tibble: 100 x 11
#> block1_x1 block1_x2 block1_x3 block2_x1 block2_x2 block3_x1 block3_x2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -1.00 -0.344 1.35 -0.526 -1.25 1.13 0.357
#> 2 0.518 -0.434 -0.361 -1.48 -1.53 -0.317 0.290
#> 3 -1.77 -0.913 -0.722 0.122 0.224 -0.529 0.114
#> 4 -1.49 -0.998 0.189 0.149 -0.994 -0.433 0.0120
#> 5 0.616 0.0211 0.895 1.09 -1.25 0.440 -0.550
#> 6 0.0765 0.522 1.20 -0.152 -0.419 -0.912 -0.362
#> 7 1.74 0.0993 -0.654 -1.26 -0.502 -0.792 -1.03
#> 8 1.05 2.19 0.913 0.254 0.328 -1.07 -0.976
#> 9 -1.07 -0.292 -0.763 0.437 0.739 0.899 -0.342
#> 10 -1.02 -0.959 -1.33 -1.57 -1.11 0.618 0.153
#> # … with 90 more rows, and 4 more variables: block3_x3 <dbl>,
#> # block3_x4 <dbl>, block3_x5 <dbl>, reduced_var_1 <dbl>
# access mapping keys
mapping_key(prt)
#> # A tibble: 11 x 4
#> variable mapping information indices
#> <chr> <list> <dbl> <list>
#> 1 block1_x1 <chr [1]> 1 <int [1]>
#> 2 block1_x2 <chr [1]> 1 <int [1]>
#> 3 block1_x3 <chr [1]> 1 <int [1]>
#> 4 block2_x1 <chr [1]> 1 <int [1]>
#> 5 block2_x2 <chr [1]> 1 <int [1]>
#> 6 block3_x1 <chr [1]> 1 <int [1]>
#> 7 block3_x2 <chr [1]> 1 <int [1]>
#> 8 block3_x3 <chr [1]> 1 <int [1]>
#> 9 block3_x4 <chr [1]> 1 <int [1]>
#> 10 block3_x5 <chr [1]> 1 <int [1]>
#> 11 reduced_var_1 <chr [2]> 0.602 <int [2]>
unnest_mappings(prt)
#> # A tibble: 12 x 4
#> variable information mapping indices
#> <chr> <dbl> <chr> <int>
#> 1 block1_x1 1 block1_x1 1
#> 2 block1_x2 1 block1_x2 2
#> 3 block1_x3 1 block1_x3 3
#> 4 block2_x1 1 block2_x1 4
#> 5 block2_x2 1 block2_x2 5
#> 6 block3_x1 1 block3_x1 8
#> 7 block3_x2 1 block3_x2 9
#> 8 block3_x3 1 block3_x3 10
#> 9 block3_x4 1 block3_x4 11
#> 10 block3_x5 1 block3_x5 12
#> 11 reduced_var_1 0.602 block2_x3 6
#> 12 reduced_var_1 0.602 block2_x4 7
# use a lower minimum-information threshold and the k-means partitioner
partition(df, threshold = .5, partitioner = part_kmeans())
#> Partitioner:
#> Director: K-Means Clusters
#> Metric: Minimum Intraclass Correlation
#> Reducer: Scaled Mean
#>
#> Reduced Variables:
#> 2 reduced variables created from 7 observed variables
#>
#> Mappings:
#> reduced_var_1 = {block3_x1, block3_x2, block3_x5}
#> reduced_var_2 = {block2_x1, block2_x2, block2_x3, block2_x4}
#>
#> Minimum information:
#> 0.508
# use a custom partitioner
part_icc_rowmeans <- replace_partitioner(
part_icc,
reduce = as_reducer(rowMeans)
)
partition(df, threshold = .6, partitioner = part_icc_rowmeans)
#> Partitioner:
#> Director: Minimum Distance (Pearson)
#> Metric: Intraclass Correlation
#> Reducer: <custom reducer>
#>
#> Reduced Variables:
#> 1 reduced variables created from 2 observed variables
#>
#> Mappings:
#> reduced_var_1 = {block2_x3, block2_x4}
#>
#> Minimum information:
#> 0.602
```
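The custom partitioner above swaps the default "Scaled Mean" reducer for `rowMeans`. If the scaled mean is simply the row mean standardized to mean 0 and standard deviation 1 (an assumption here; see `?scaled_mean`), the two reducers differ only in units, not in the ordering of observations. That is consistent with the output above, where the custom partitioner reports the same mapping and minimum information (0.602) as the default. A base-R sketch, where `scaled_mean_sketch()` is a hypothetical stand-in:

```r
# Hypothetical stand-in for a scaled-mean reducer: average each row,
# then center and scale the result to mean 0 and standard deviation 1
scaled_mean_sketch <- function(x) as.numeric(scale(rowMeans(x)))

set.seed(42)
z <- rnorm(100)
block <- cbind(x3 = z + rnorm(100, sd = 0.5), x4 = z + rnorm(100, sd = 0.5))

scaled <- scaled_mean_sketch(block)
plain <- rowMeans(block)   # plain row means, as in the rowMeans-based reducer above

cor(scaled, plain)            # 1 (up to floating-point error): same ordering
c(mean(scaled), sd(scaled))   # approximately 0 and 1
```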

partition also supports a number of ways to visualize partitions and
permutation tests; these functions all start with `plot_*()`. These
functions all return ggplots and can thus be extended using ggplot2.

```
plot_stacked_area_clusters(df) +
  ggplot2::theme_minimal(14)
```
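Permutation tests compare the observed data with data whose between-variable correlations have been destroyed. The base-R sketch below shows that idea, assuming each column is shuffled independently so that every variable keeps its marginal distribution. `permute_df_sketch()` is hypothetical; check `?permute_df` and `?test_permutation` for what partition actually does.

```r
# Hypothetical sketch: shuffle each column independently. Marginal
# distributions are kept, but correlations between columns are destroyed.
permute_df_sketch <- function(df) {
  df[] <- lapply(df, sample)
  df
}

set.seed(7)
z <- rnorm(500)
df <- data.frame(a = z + rnorm(500, sd = 0.3), b = z + rnorm(500, sd = 0.3))
perm <- permute_df_sketch(df)

cor(df$a, df$b)       # strong correlation in the observed data
cor(perm$a, perm$b)   # near zero after permutation
```

Partitioning many permuted copies gives a reference distribution for how much reduction would happen by chance alone.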

## Functions in partition

| Name | Description |
| --- | --- |
| build_next_name | Create new variable name based on prefix and previous reductions |
| as_partition_step | Create a partition object from a data frame |
| map_partition | Map a partition across a range of minimum information |
| corr | Efficiently fit correlation coefficient for matrix or two vectors |
| is_partition_step | Is this object a partition_step? |
| part_kmeans | Partitioner: K-means, ICC, scaled means |
| direct_distance | Target based on minimum distance matrix |
| is_partition | Is this object a partition? |
| part_minr2 | Partitioner: distance, minimum R-squared, scaled means |
| linear_k_search | Search for best k using the linear search method |
| direct_k_cluster | Target based on K-means clustering |
| k_searching_forward | Assess k search |
| filter_reduced | Filter the reduced mappings |
| calculate_new_variable | Calculate or retrieve stored reduced variable |
| matrix_is_exhausted | Have all pairs of variables been checked for metric? |
| find_min_distance_variables | Find the index of the pair with the smallest distance |
| fit_distance_matrix | Fit a distance matrix using correlation coefficients |
| assign_partition | Process a dataset with a partitioner |
| binary_k_search | Search for best k using the binary search method |
| guess_init_k | Guess initial k based on threshold and p |
| summarize_partitions | Summarize and map partitions and permutations |
| measure_icc | Measure the information loss of reduction using intraclass correlation coefficient |
| mutual_information | Calculate the standardized mutual information of a data set |
| mapping_key | Return partition mapping key |
| %>% | Pipe operator |
| plot_area_clusters | Plot partitions |
| search_k | Search for the best k |
| scaled_mean | Average and scale rows in a data.frame |
| part_icc | Partitioner: distance, ICC, scaled means |
| count_clusters | Helper functions to print partition summary |
| under_threshold | Compare metric to threshold |
| direct_measure_reduce | Apply a partitioner |
| pull_composite_variables | Access mapping variables |
| fill_in_missing | Process reduced variables when missing data |
| rewind_target | Set target to last value |
| return_if_single | Reduce targets if more than one variable, return otherwise |
| icc | Calculate the intraclass correlation coefficient |
| measure_min_icc | Measure the information loss of reduction using the minimum intraclass correlation coefficient |
| test_permutation | Permute partitions |
| update_dist | Only fit the distances for a new variable |
| increase_hits | Count and retrieve the number of metrics below threshold |
| icc_r | Calculate the intraclass correlation coefficient |
| measure_min_r2 | Measure the information loss of reduction using minimum R-squared |
| is_partitioner | Is this object a partitioner? |
| partition | Agglomerative partitioning |
| is_same_function | Are two functions the same? |
| k_exhausted | Have all values of k been checked for metric? |
| measure_std_mutualinfo | Measure the information loss of reduction using standardized mutual information |
| find_algorithm | Which kmeans algorithm to use? |
| plot_permutation | Plot permutation tests |
| measure_variance_explained | Measure the information loss of reduction using the variance explained |
| partition_scores | Return the reduced data from a partition |
| part_pc1 | Partitioner: distance, first principal component, scaled means |
| cat_bold | Print to the console in color |
| part_stdmi | Partitioner: distance, mutual information, scaled means |
| paste_director | Lookup partitioner types to print in English |
| reduce_mappings | Create a mapping key out of a list of targets |
| reduce_scaled_mean | Reduce selected variables to scaled means |
| reduce_cluster | Reduce a target |
| replace_partitioner | Replace the director, metric, or reducer for a partitioner |
| permute_df | Permute a data set |
| simplify_names | Simplify reduced variable names |
| simulate_block_data | Simulate correlated blocks of variables |
| reduce_first_component | Reduce selected variables to first principal component |
| reduce_kmeans | Reduce selected variables to scaled means |
| as_partitioner | Create a partitioner |
| as_measure | Create a custom metric |
| as_director | Create a custom director |
| as_reducer | Create a custom reducer |
| get_indices | Process mapping key to return from partition() |
| all_columns_reduced | Check if all variables reduced to a single composite |
| append_mappings | Append a new variable to mapping and filter out composite variables |
| as_partition | Return a partition object |
| all_done | Mark the partition as complete to stop search |
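`simulate_block_data()` generates the block-structured data used in the example above. The base-R sketch below imitates that structure under a simplifying assumption: every pair of variables within a block has the same correlation `rho`, induced by one shared latent factor per block, whereas the real function takes `lower_corr` and `upper_corr` bounds. `simulate_blocks_sketch()` is hypothetical and only borrows the `blockB_xJ` column naming seen in the example output.

```r
# Hypothetical sketch: blocks of correlated standard-normal variables.
# Each block has one latent factor; x = sqrt(rho) * latent + sqrt(1 - rho) * noise
# has unit variance and correlation rho with every other variable in its block.
simulate_blocks_sketch <- function(block_sizes, rho, n) {
  blocks <- lapply(seq_along(block_sizes), function(b) {
    p <- block_sizes[b]
    latent <- rnorm(n)   # recycled down each column of the noise matrix
    x <- sqrt(rho) * latent + sqrt(1 - rho) * matrix(rnorm(n * p), n, p)
    colnames(x) <- paste0("block", b, "_x", seq_len(p))
    x
  })
  as.data.frame(do.call(cbind, blocks))
}

set.seed(1234)
sim <- simulate_blocks_sketch(c(3, 4, 5), rho = 0.5, n = 2000)
dim(sim)                              # 2000 12
cor(sim$block1_x1, sim$block1_x2)     # close to rho = 0.5
cor(sim$block1_x1, sim$block2_x1)     # close to 0: different blocks
```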

## Vignettes of partition

| Name |
| --- |
| extending-partition.Rmd |
| introduction-to-partition.Rmd |


## Details

| Field | Value |
| --- | --- |
| Type | Package |
| License | MIT + file LICENSE |
| Encoding | UTF-8 |
| LazyData | true |
| LinkingTo | Rcpp, RcppArmadillo |
| RoxygenNote | 6.1.1 |
| URL | https://uscbiostats.github.io/partition/, https://github.com/USCbiostats/partition |
| BugReports | https://github.com/USCbiostats/partition/issues |
| Language | en-US |
| VignetteBuilder | knitr |
| NeedsCompilation | yes |
| Packaged | 2019-05-16 16:08:33 UTC; malcolmbarrett |
| Repository | CRAN |
| Date/Publication | 2019-05-17 07:00:04 UTC |
| Suggests | covr, knitr, rmarkdown, spelling, testthat |
| Imports | crayon, dplyr (>= 0.8.0), forcats, ggplot2 (>= 3.0.0), infotheo, magrittr, MASS, pillar, purrr, Rcpp, rlang, stringr, tibble, tidyr |
| Depends | R (>= 3.3.0) |
| Contributors | Joshua Millstein |
