Function which accepts a result from bibit, bibit2 or bibit3 and (re-)applies the column extension procedure. If the result already contains extended biclusters, these are deleted.
bibit_columnextension(result, matrix, arff_row_col = NULL, BC = NULL,
extend_columns = "naive", extend_mincol = 1, extend_limitcol = 1,
extend_noise = 1, extend_contained = FALSE)
matrix
The binary input matrix.
BC
A numeric/integer vector of the biclusters (BCs) which should be extended. The behaviour differs for the 3 types of input result:
bibit
BC directly takes the corresponding biclusters from the result and extends them (e.g. BC=c(1,10) is remapped to c("BC1","BC1_Ext1","BC2","BC2_Ext1") in the new output).
bibit2
BC corresponds with the original non-extended biclusters from the bibit2 result. These original biclusters are selected and extended (e.g. BC=c(1,10) selects biclusters c("BC1","BC10"), which are then remapped to c("BC1","BC1_Ext1","BC2","BC2_Ext1") in the new output).
bibit3
BC corresponds with the biclusters obtained when combining the FULLPATTERN and SUBPATTERN results. For example, choosing BC=1 would select only the single FULLPATTERN bicluster of each pattern and try to extend it (e.g. BC=c(1,10) selects biclusters 1 and 10 from the combined full pattern and subpattern result, meaning the full pattern BC and the 9th subpattern BC, which are then remapped to c("BC1","BC1_Ext1","BC2","BC2_Ext1") in the new output).
extend_columns
Column Extension Parameter. Can be either "naive" or "recursive", which applies a naive or a recursive column extension procedure, respectively. (See the Details section for more information.)
Based on the extension, additional biclusters are created in the Biclust object. They can be recognised by the "_Ext" suffix in the column and row names of the RowxNumber and NumberxCol slots.
The info slot will also contain some additional information. Inside this slot, BC.Extended contains info on which original biclusters were extended, how many columns were added, and how many extra extended biclusters this resulted in.
Warning: Using a percentage-based extend_noise in combination with the recursive procedure will result in a large number of biclusters and increase the computation time considerably. Depending on the data, when using recursive in combination with a noise percentage it is advised to keep the percentage reasonably small (e.g. 10%). Another remedy is to sufficiently increase extend_limitcol, either as a percentage or as an integer, to limit the pool of candidate columns.
extend_mincol
Column Extension Parameter. The minimum number of columns with which a bicluster must be extended before the result is saved. (Default=1)
extend_limitcol
Column Extension Parameter. The number (extend_limitcol>=1) or percentage (0<extend_limitcol<1) of 1's that a column (subsetted on the BC rows) should at least contain to be a candidate for extending the bicluster. (Default=1) Increase this parameter if the recursive extension takes too long: limiting the pool of candidates decreases computation time, but also restricts the results more.
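As a rough illustration of this candidate filter, the sketch below treats extend_limitcol either as an absolute count or as a fraction of the bicluster rows. The helper name candidate_columns and the rounding of the percentage threshold with ceiling() are assumptions made for illustration. This is not the package's internal code.

```r
# Hypothetical helper (not part of BiBitR): find the columns that qualify as
# extension candidates for a bicluster with rows `bc_rows` and columns `bc_cols`.
candidate_columns <- function(mat, bc_rows, bc_cols, extend_limitcol = 1) {
  sub <- mat[bc_rows, , drop = FALSE]   # subset on the BC rows
  ones <- colSums(sub)                  # number of 1's per column
  # Assumption: a value in (0,1) is a fraction of the BC rows, rounded up.
  threshold <- if (extend_limitcol < 1) {
    ceiling(extend_limitcol * length(bc_rows))
  } else {
    extend_limitcol
  }
  cand <- which(ones >= threshold)
  setdiff(cand, bc_cols)                # columns already in the BC are excluded
}

toy <- matrix(c(1, 1, 1, 1,
                1, 1, 0, 1,
                1, 1, 0, 0), nrow = 3, byrow = TRUE)
candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 1)    # columns 3 and 4
candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 0.5)  # column 4 only
```

In this toy matrix a higher threshold removes column 3 from the pool, which is the mechanism by which a larger extend_limitcol shortens the recursive search.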
extend_noise
Column Extension Parameter. The maximum allowed noise (in each row) when extending the columns of the bicluster. Can take the same values as the noise parameter.
extend_contained
Column Extension Parameter. Logical value indicating whether the extended results should be checked for containing each other (contained results are deleted). Default = FALSE. This can be a lengthy procedure for a large number of biclusters (>1000).
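To make the containment check concrete, here is a minimal sketch under stated assumptions: each bicluster is represented as a list with rows and cols index vectors (without repeated indices), and a bicluster counts as contained when both its rows and its columns are subsets of another bicluster's. The function remove_contained is hypothetical and does not reproduce the package's implementation.

```r
# Hypothetical sketch (not BiBitR's internal code): drop biclusters that are
# fully contained in another one. Each bicluster is a list(rows = , cols = ).
remove_contained <- function(bcs) {
  n <- length(bcs)
  keep <- rep(TRUE, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j || !keep[j]) next
      inside <- all(bcs[[i]]$rows %in% bcs[[j]]$rows) &&
                all(bcs[[i]]$cols %in% bcs[[j]]$cols)
      same <- inside &&
              length(bcs[[i]]$rows) == length(bcs[[j]]$rows) &&
              length(bcs[[i]]$cols) == length(bcs[[j]]$cols)
      # Exact duplicates: only the first occurrence is kept.
      if (inside && (!same || j < i)) {
        keep[i] <- FALSE
        break
      }
    }
  }
  bcs[keep]
}

bcs <- list(list(rows = 1:3, cols = 1:4),
            list(rows = 1:3, cols = 1:3),   # contained in the first
            list(rows = 4:5, cols = 1:2))
length(remove_contained(bcs))
```

The pairwise comparison is quadratic in the number of biclusters, which gives some intuition for why such a check becomes slow for thousands of results.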
A Biclust S4 Class object or bibit3 S3 list Class object
Column Extension is an optional procedure which can be applied after the BiBit algorithm (with noise).
The procedure adds extra columns to a BiBit bicluster, taking into account the allowed extend_noise level in each row.
The primary goal, after applying BiBit with noise, is to also try to add some noise to the 2 initial `perfect` rows.
Other parameters such as extend_mincol and extend_limitcol can further restrict which extensions are discovered.
The procedure can be done either naively (fast) or recursively (slower but more thorough), as set by the extend_columns parameter.
"naive"
Subsetting on the bicluster rows, the candidate columns are ordered by the number of 1's they contain, from most to fewest. In this order, each column is then checked sequentially and added if the resulting BC is still within the row noise levels. This has 2 major consequences:
If 2 columns are identical, the first in the dataset is added, while the second is not (depending on the noise level allowed per row).
If 2 non-identical columns are viable to be added (correct row noise), the column with the most 1's is added. Afterwards the second column might not be viable anymore.
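The naive procedure described above can be sketched roughly as follows. This is an illustrative reimplementation, not BiBitR's internal code: the function name naive_extend is hypothetical, and max_zeros is assumed to be an absolute number of 0's allowed per row, counted over the original and the added columns together.

```r
# Hypothetical sketch of a naive column extension (not BiBitR's internal code).
# Assumption: `max_zeros` is an absolute number of 0's allowed in each row,
# counted over the original and the added columns together.
naive_extend <- function(mat, bc_rows, bc_cols, max_zeros = 0) {
  sub <- mat[bc_rows, , drop = FALSE]
  cand <- setdiff(seq_len(ncol(mat)), bc_cols)
  # Order candidates by number of 1's (descending); ties keep dataset order.
  cand <- cand[order(-colSums(sub[, cand, drop = FALSE]))]
  for (j in cand) {
    trial <- c(bc_cols, j)
    zeros_per_row <- rowSums(sub[, trial, drop = FALSE] == 0)
    if (all(zeros_per_row <= max_zeros)) bc_cols <- trial  # keep the column
  }
  bc_cols
}

toy <- matrix(c(1, 1, 1, 1,
                1, 1, 1, 0,
                1, 1, 0, 0), nrow = 3, byrow = TRUE)
# Column 3 (two 1's) is tried before column 4 (one 1). Once column 3 is added,
# row 3 has used its single allowed 0, so column 4 is no longer viable.
naive_extend(toy, bc_rows = 1:3, bc_cols = 1:2, max_zeros = 1)
```

The toy call illustrates the second consequence: column 4 would be acceptable on its own, but it is rejected because the greedy order already added column 3.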
"recursive"
Conditioning the group of candidates on the allowed row noise level, each possible/allowed combination of added columns is checked. Only the resulting biclusters with the highest number of extra columns are saved. This can of course result in multiple extensions for 1 bicluster if several results share the maximum number of added columns.
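A minimal sketch of such a recursive search is given below, under stated assumptions. The function name recursive_extend is hypothetical, max_zeros is assumed to be an absolute number of 0's allowed per row (counted over original and added columns together), and the depth-first enumeration is one possible way to visit every allowed combination. It is not taken from the package source.

```r
# Hypothetical sketch of a recursive column extension (not BiBitR's internal code).
# Assumption: `max_zeros` is an absolute number of 0's allowed per row,
# counted over the original and the added columns together.
recursive_extend <- function(mat, bc_rows, bc_cols, max_zeros = 0) {
  sub <- mat[bc_rows, , drop = FALSE]
  cand <- setdiff(seq_len(ncol(mat)), bc_cols)
  base_zeros <- rowSums(sub[, bc_cols, drop = FALSE] == 0)
  found <- list()                        # every allowed set of added columns

  explore <- function(added, zeros, remaining) {
    found[[length(found) + 1]] <<- added
    for (k in seq_along(remaining)) {
      j <- remaining[k]
      new_zeros <- zeros + (sub[, j] == 0)
      if (all(new_zeros <= max_zeros)) {
        # Only later candidates are passed on, so each set is visited once.
        explore(c(added, j), new_zeros, remaining[-seq_len(k)])
      }
    }
  }
  explore(integer(0), base_zeros, cand)

  sizes <- lengths(found)
  best <- found[sizes == max(sizes)]     # keep only the largest extensions
  lapply(best, function(a) c(bc_cols, a))
}

toy <- matrix(c(1, 1, 1, 1, 0,
                1, 1, 1, 0, 1,
                1, 1, 0, 0, 1), nrow = 3, byrow = TRUE)
# Two extensions tie at two added columns: {3, 5} and {4, 5}.
recursive_extend(toy, bc_rows = 1:3, bc_cols = 1:2, max_zeros = 1)
```

Because every allowed combination is enumerated, the work grows quickly with the number of candidates, which is consistent with the advice to restrict the candidate pool when the recursive option is slow.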
Note: Both procedures are followed by a fast check for duplicate biclusters among the extensions. Any duplicates are deleted from the final result.
library(BiBitR)
set.seed(1)
data <- matrix(sample(c(0,1),100*100,replace=TRUE,prob=c(0.9,0.1)),nrow=100,ncol=100)
data[1:10,1:10] <- 1 # BC1
data[11:20,11:20] <- 1 # BC2
data[21:30,21:30] <- 1 # BC3
data <- data[sample(1:nrow(data),nrow(data)),sample(1:ncol(data),ncol(data))]
result <- bibit2(data,minr=5,minc=5,noise=0.1,extend_columns = "recursive",
extend_mincol=1,extend_limitcol=1)
result
# Re-apply the column extension to biclusters 1 and 10 of the bibit2 result
result2 <- bibit_columnextension(result=result,matrix=data,arff_row_col=NULL,BC=c(1,10),
                                 extend_columns="recursive",extend_mincol=1,
                                 extend_limitcol=1,extend_noise=2,extend_contained=FALSE)
result2