title: "Introduction to hicream" author: "Elise Jorge, Toby Hocking, Sylvain Foissac, Pierre Neuvial, and Nathalie Vialaneix" date: "2026-01-30" output: html_document: toc: yes keep_md: TRUE self_contained: yes vignette: > %\VignetteIndexEntry{Introduction to hicream} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
Introduction
hicream is an R package designed to perform Hi-C data differential
analysis. More specifically, it performs a pixel-level differential analysis
using diffHic and, using a two-dimensional connectivity constrained
clustering, renders a post hoc bound on True Discovery Proportion for each
cluster. This method allows to identify differential genomic regions and
quantifies signal in those regions.
library("hicream")## Loading required package: reticulateHow to load external data?
Using hicream requires to load data that corresponds to Hi-C matrices and
their index, the latter in bed format. In the following code, the paths to Hi-C
matrices and the index file path are used in the loadData function alongside
the chromosome number. The option normalize = TRUE allows to perform a cyclic
LOESS normalization. The output is an InteractionSet object.
replicates <- 1:2
cond <- "90"
allBegins <- interaction(expand.grid(replicates, cond), sep = "-")
allBegins <- as.character(allBegins)
chromosome <- 1
nbChr <- 1
allMat <- sapply(allBegins, function(ab) {
matFile <- paste0("Rep", ab, "-chr", chromosome, "_200000.bed")
})
index <- system.file("extdata", "index.200000.longest18chr.abs.bed",
package = "hicream")
format <- rep("HiC-Pro", length(replicates) * length(cond) * nbChr)
binsize <- 200000
files <- system.file("extdata", unlist(allMat), package = "hicream")
exData <- loadData(files, index, chromosome, normalize = TRUE)## chr(0)## ## chr(0)## pighic dataset
The pighic dataset has been produced using 6 Hi-C matrices (3 in each
condition) obtained from two different developmental stages of pig embryos (90
and 110 days of gestation) corresponding to chromosome 1. The dataset contains
two elements, the first is an output of the loadData function corresponding to
normalized data. The second is the vector giving, for each matrix, the condition
it belongs to.
data("pighic")
head(pighic)## $data
## class: InteractionSet
## dim: 21 6
## metadata(1): width
## assays(2): counts offset
## rownames: NULL
## rowData names(2): anchor1.id anchor2.id
## colnames: NULL
## colData names(1): totals
## type: ReverseStrictGInteractions
## regions: 6
##
## $conditions
## [1] "90" "90" "90" "110" "110" "110"Perform hicream test
Once data have been loaded, the three steps of the analysis can be performed.
Use performTest function
In this first step, the output of a loadData object is used, with a vector
indicating the condition of each sample. Here, we use data stored in pighic.
performTest outputs the result of the diffHic pixel-level differential
analysis.
resdiff <- performTest(pighic$data, pighic$conditions)
resdiff## Testing for differential interactions using diffHic.
##
## region1 region2 p.value p.adj logFC
## 1 1125 1125 0.7318482 0.8538229 -0.01830721
## 2 1125 1126 0.0562111 0.2342558 -0.16338780
## [ reached 'max' / getOption("max.print") -- omitted 19 rows ]summary(resdiff)## Summary of diffHic results.## p.value p.adj logFC
## Min. :0.0002064 Min. :0.004334 Min. :-0.163388
## 1st Qu.:0.0669302 1st Qu.:0.234256 1st Qu.:-0.034500
## Median :0.2751093 Median :0.525209 Median : 0.003099
## Mean :0.3960997 Mean :0.580792 Mean : 0.022271
## 3rd Qu.:0.6924986 3rd Qu.:0.853823 3rd Qu.: 0.040991
## Max. :0.9537715 Max. :0.953771 Max. : 0.349596Several plotting options allow to look at the $p$-value map, adjusted $p$-value map or log-fold-change map.
plot(resdiff)plot(resdiff, whichPlot = "p.adj")plot(resdiff, whichPlot = "logFC")Perform two-dimensional connectivity-constrained clustering
The AggloClust2D function uses either the output of loadData
or resdiff function. If the output of loadData is used, the clustering is
performed on the counts. Using the output of resdiff amounts to
performing clustering on adjusted $p$-values and log-fold-changes.
In both cases, a connectivity graph of pixels is built and a two-dimensional
hierarchical clustering under the connectivity constraint is performed.
The function renders a clustering
corresponding to the optimal number of clusters found by the elbow heuristic.
However, a clustering corresponding to any other number of clusters (chosen by
the user) can be obtained by specifying a value for the input argument
nbClust.
res2D_counts <- AggloClust2D(pighic$data)
res2D_counts ## Tree obtained from constrained 2D clustering.
##
## Call:
## AggloClust2D(pighic$data)
##
## Cluster method : Constrained HC with Ward linkage from sklearn
## Distance : euclidean
## Number of objects: 21
##
##
## Optimal number of clusters: 7
##
## Clustering:
## 1 1 5 2 2 2 1 1 2 4 ...summary(res2D_counts)## Summary of 2D constrained clustering results.##
## Call:
## AggloClust2D(pighic$data)
##
## Cluster method : Constrained HC with Ward linkage from sklearn
## Distance : euclidean
## Number of objects: 21res2D_diff <- AggloClust2D(resdiff)
res2D_diff ## Tree obtained from constrained 2D clustering.
##
## Call:
## AggloClust2D(resdiff)
##
## Cluster method : Constrained HC with Ward linkage from sklearn
## Distance : euclidean
## Number of objects: 21
##
##
## Optimal number of clusters: 3
##
## Clustering:
## 1 1 1 1 3 1 1 1 1 1 ...summary(res2D_diff)## Summary of 2D constrained clustering results.##
## Call:
## AggloClust2D(resdiff)
##
## Cluster method : Constrained HC with Ward linkage from sklearn
## Distance : euclidean
## Number of objects: 21Plotting the output shows the tree that corresponds to the hierarchy of clusters.
plot(res2D_diff)Perform post hoc inference
Post hoc inference is performed by the postHoc function, using the output
of performTest, and a clustering (the latter can be obtained with
AggloClust2D). The user sets a level of test confidence $\alpha$, typically
equal to $0.05$.
clusters <- res2D_diff$clustering
alpha <- 0.05
resposthoc <- postHoc(resdiff, clusters, alpha)
resposthoc## Posthoc results.
##
## # A tibble: 21 × 10
## # Groups: clust [3]
## region1 region2 clust TDP p.value p.adj logFC meanlogFC varlogFC
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1125 1125 1 0 0.732 0.854 -0.0183 -0.00828 0.00467
## 2 1125 1126 1 0 0.0562 0.234 -0.163 -0.00828 0.00467
## 3 1125 1127 1 0 0.800 0.884 -0.00890 -0.00828 0.00467
## 4 1125 1128 1 0 0.882 0.926 -0.00775 -0.00828 0.00467
## 5 1125 1129 3 0 0.0313 0.234 0.275 0.275 NA
## 6 1125 1130 1 0 0.269 0.525 -0.0659 -0.00828 0.00467
## 7 1126 1126 1 0 0.692 0.854 0.0280 -0.00828 0.00467
## 8 1126 1127 1 0 0.186 0.489 -0.0648 -0.00828 0.00467
## 9 1126 1128 1 0 0.409 0.716 0.0364 -0.00828 0.00467
## 10 1126 1129 1 0 0.251 0.525 0.0487 -0.00828 0.00467
## # ℹ 11 more rows
## # ℹ 1 more variable: propPoslogFC <dbl>summary(resposthoc)## Summary of post hoc results.## TDP p.value p.adj logFC
## Min. :0.00000 Min. :0.0002064 Min. :0.004334 Min. :-0.163388
## 1st Qu.:0.00000 1st Qu.:0.0669302 1st Qu.:0.234256 1st Qu.:-0.034500
## Median :0.00000 Median :0.2751093 Median :0.525209 Median : 0.003099
## Mean :0.04762 Mean :0.3960997 Mean :0.580792 Mean : 0.022271
## 3rd Qu.:0.00000 3rd Qu.:0.6924986 3rd Qu.:0.853823 3rd Qu.: 0.040991
## Max. :1.00000 Max. :0.9537715 Max. :0.953771 Max. : 0.349596
## propPoslogFC
## Min. :0.4737
## 1st Qu.:0.4737
## Median :0.4737
## Mean :0.5238
## 3rd Qu.:0.4737
## Max. :1.0000Plotting the output of postHoc function shows the minimal proportion of True
Discoveries in each cluster.
plot(resposthoc)Session Information
sessionInfo()## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Paris
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] hicream_0.0.3 reticulate_1.44.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2
## [3] dplyr_1.1.4 farver_2.1.2
## [5] viridis_0.6.5 Biostrings_2.78.0
## [7] S7_0.2.1 bitops_1.0-9
## [9] RCurl_1.98-1.17 fastmap_1.2.0
## [11] GenomicAlignments_1.46.0 XML_3.99-0.20
## [13] digest_0.6.39 lifecycle_1.0.5
## [15] capushe_1.1.3 statmod_1.5.1
## [17] magrittr_2.0.4 compiler_4.5.2
## [19] rlang_1.1.7 sass_0.4.10
## [21] tools_4.5.2 yaml_2.3.12
## [23] rtracklayer_1.70.1 knitr_1.51
## [25] Rhtslib_3.6.0 labeling_0.4.3
## [27] S4Arrays_1.10.1 curl_7.0.0
## [29] DelayedArray_0.36.0 plyr_1.8.9
## [31] RColorBrewer_1.1-3 abind_1.4-8
## [33] BiocParallel_1.44.0 withr_3.0.2
## [35] BiocGenerics_0.56.0 grid_4.5.2
## [37] stats4_4.5.2 Rhdf5lib_1.32.0
## [39] edgeR_4.8.2 ggplot2_4.0.1
## [41] scales_1.4.0 MASS_7.3-65
## [43] SummarizedExperiment_1.40.0 cli_3.6.5
## [45] rmarkdown_2.30 crayon_1.5.3
## [47] auk_0.9.1 generics_0.1.4
## [49] otel_0.2.0 metapod_1.18.0
## [51] rstudioapi_0.18.0 reshape2_1.4.5
## [53] rjson_0.2.23 httr_1.4.7
## [55] rhdf5_2.54.1 cachem_1.1.0
## [57] stringr_1.6.0 splines_4.5.2
## [59] parallel_4.5.2 restfulr_0.0.16
## [61] XVector_0.50.0 matrixStats_1.5.0
## [63] vctrs_0.7.0 Matrix_1.7-4
## [65] jsonlite_2.0.0 IRanges_2.44.0
## [67] S4Vectors_0.48.0 dendextend_1.19.1
## [69] adjclust_0.6.11 locfit_1.5-9.12
## [71] limma_3.66.0 jquerylib_0.1.4
## [73] glue_1.8.0 codetools_0.2-20
## [75] stringi_1.8.7 gtable_0.3.6
## [77] GenomeInfoDb_1.46.2 GenomicRanges_1.62.1
## [79] BiocIO_1.20.0 UCSC.utils_1.6.1
## [81] tibble_3.3.1 pillar_1.11.1
## [83] rhdf5filters_1.22.0 csaw_1.44.0
## [85] htmltools_0.5.9 Seqinfo_1.0.0
## [87] BSgenome_1.76.0 R6_2.6.1
## [89] sparseMatrixStats_1.22.0 diffHic_1.40.0
## [91] evaluate_1.0.5 lattice_0.22-7
## [93] Biobase_2.70.0 cigarillo_1.0.0
## [95] png_0.1-8 Rsamtools_2.26.0
## [97] bslib_0.9.0 Rcpp_1.1.1
## [99] InteractionSet_1.38.0 gridExtra_2.3
## [101] SparseArray_1.10.8 xfun_0.55
## [103] MatrixGenerics_1.22.0 pkgconfig_2.0.3