FisHiCal (version 1.1)

# prepareCalib: Prepare Hi-C calibration

## Description

Builds a calibration function by fitting a subset of FISH distances and Hi-C frequencies with a power law model (see Details). The user must provide either the number of distances to fit (taken in increasing order) or an explicit subset of selected distances. The distance threshold can be estimated in one of two ways or provided explicitly.

## Usage

prepareCalib(data, npoints, threshold = NULL, useMax = TRUE, delta = 0.05, buffer = 1.0)

## Arguments

data
A data frame with 2 mandatory columns: distances and frequencies, standing for matching FISH distances and Hi-C frequencies, respectively. This data structure can be prepared with prepareData.
npoints
An integer or an integer vector. If an integer n is given, then the shortest n distances and their matching frequencies will be used. Otherwise, the indices in the integer vector will indicate the subset of distances and frequencies to use from 'data'.
threshold
Optional numeric, set to NULL by default. If provided, will be used as the distance range threshold of the calibration.
useMax
Optional Boolean, set to TRUE by default and ignored if 'threshold' is given. When TRUE, the maximal provided FISH distance will be used for the distance range threshold. Otherwise, the threshold will be estimated as the maximal FISH distance that presents a small enough deviation (< delta) from the model.
delta
Optional numeric, set to 0.05 by default and ignored if 'threshold' is given or if 'useMax' is set to TRUE. Defines the acceptable deviation from the model, when the distance range threshold is estimated from the fit (see details).
buffer
Optional numeric, set to 1.0 by default and ignored if 'useMax' is set to FALSE. Defines a constant that is added to the threshold value when 'useMax' is set to TRUE.
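
The two forms of 'npoints' described above can be sketched as follows. This is only an illustration: `select_points` is a hypothetical helper, not a FisHiCal function, and treating any length-one value as a count is an assumption about how the two cases are told apart.

```r
# Hypothetical helper (not part of FisHiCal) illustrating the two forms of 'npoints'.
select_points <- function(data, npoints) {
  if (length(npoints) == 1) {
    # Single integer n: take the rows holding the n shortest distances
    # (assumption: a length-one value is always read as a count).
    idx <- order(data$distances)[seq_len(npoints)]
  } else {
    # Integer vector: use the values directly as row indices into 'data'.
    idx <- npoints
  }
  data[idx, , drop = FALSE]
}

# Toy data, made up for this sketch.
toy <- data.frame(distances   = c(3, 1, 2, 5, 4),
                  frequencies = c(10, 50, 20, 2, 5))

shortest_two <- select_points(toy, 2)        # rows with distances 1 and 2
chosen_rows  <- select_points(toy, c(1, 4))  # rows 1 and 4 of 'toy'
```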

## Value

A list with the following objects:
calib
a list defining the calibration, with the following objects: f - the calibration function (the power law model), and params - a list of parameters for f (the parameters of the model and the threshold).
fit
the return value of lm, used to solve the linear regression

## Details

We use a power law model to relate a set of FISH distances, $D$, and a matching set of contact frequencies, $C$:

$C \sim \beta D^{\alpha}$

Taking the log of this equation gives a linear dependency:

$\log(C) \sim \log(\beta) + \alpha \log(D)$

Here, we consider only a subset of distances for solving the latter equation and estimate $\alpha$ and $\beta$ with a linear regression. The threshold $t$, defining the range limit of Hi-C (a distance above which Hi-C frequencies are no longer informative), could be set to the maximal distance in $D$, or estimated more restrictively from the fit:

$t = \max \{ D : | e^{(\log(C) - \log(\beta))/\alpha} - D | < \delta \}$
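
The fit and the fit-based threshold can be sketched directly with lm. This is a minimal illustration under stated assumptions, not the package's implementation: the data are synthetic, and the variable names (`D`, `C`, `log_beta`, `D_hat`) are chosen here for readability.

```r
# Synthetic data for illustration only (not taken from the package).
set.seed(1)
D <- sort(runif(30, min = 0.2, max = 3))       # hypothetical FISH distances
C <- 5 * D^(-1.5) * exp(rnorm(30, sd = 0.05))  # hypothetical contact frequencies

# Fit log(C) ~ log(beta) + alpha * log(D) on the n shortest distances.
n   <- 10
idx <- order(D)[seq_len(n)]
fit <- lm(log(C[idx]) ~ log(D[idx]))

log_beta <- unname(coef(fit)[1])  # intercept estimates log(beta)
alpha    <- unname(coef(fit)[2])  # slope estimates alpha

# Invert the model: the distance the fit predicts for each observed frequency.
D_hat <- exp((log(C) - log_beta) / alpha)

# Threshold: the largest observed distance whose prediction deviates by less than delta.
delta     <- 0.05
ok        <- abs(D_hat - D) < delta
threshold <- if (any(ok)) max(D[ok]) else NA_real_
```

With useMax = TRUE the threshold would instead be based on the maximal distance, as described under Arguments.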

## References

Y. Shavit, F.K. Hamey, P. Lio', FisHiCal: an R package for iterative FISH-based calibration of Hi-C data, 2014 (submitted).

## Examples

data(match)
npoints = 10 # number of points to fit

# prepareCalib computes threshold according to the fit
# useMax is set to FALSE
res = prepareCalib(match, npoints, useMax = FALSE)
calib = res$calib
calib
fit = res$fit
alpha = calib$params[[1]]
beta = calib$params[[2]]
threshold = calib$params[[3]]

# plot
plot(match$frequencies ~ match$distances, xlab = "distances", ylab = "frequencies")
lines((exp(beta)*match$distances^alpha)~match$distances, col = "red")
plot(log(match$frequencies) ~ log(match$distances),
     xlab = "log(distances)", ylab = "log(frequencies)")
abline(fit, col = "red")

# plot the estimated threshold
abline(h = beta + log(threshold)*alpha, lty = 3)