find_hits: Identification of putative hits using Zvalues or MIPHENO empirical pval

Description

Returns a dataframe containing all the 'hits', where 2 or more observations in source and/or in ID pass the threshold set by the supplied criteria.

Usage

find_hits(data=data, ID= 'LOCUS', source=NULL, values=list(start=11, stop=21), var.cuts=FALSE, low.cut=NULL, high.cut=NULL, cutoff=0.05, Z=NULL, ...)

Arguments

data

Dataframe containing a column of identifiers and column(s) of assay data providing scores to determine if an individual is a putative hit.

ID

The name of the column containing individual identifiers. Must contain the same values as source.

source

A list of individuals (contained in data) to be tested to see if they are a hit.

values

Values (or columns) in data that are to be used to determine if an observation is a putative hit.

var.cuts

Logical; will variable cutoffs be used for each of the assays (columns)? If TRUE, both high.cut and low.cut must be provided.

low.cut

A list of values (same length as the number of assay columns) giving the MAXIMUM value for an observation to be considered BELOW 'normal'.

high.cut

A list of values (same length as the number of assay columns) giving the MINIMUM value for an observation to be considered ABOVE 'normal'.

cutoff

p value below which observations are considered a putative hit.

Z

Z score cutoff for an observation to be considered a hit.

...

Other parameters.

Value

find_hits returns a dataframe containing putative hits and data for other individuals in their group.

Details

This function uses data coming out of the cdf.pval function or data with Z scores. Suggestions for using p-value data are given below. The whole data object can be used, even if it contains additional descriptors.

ID refers to the identifier for individuals; it does not need to be unique. source is optional and contains a list of identifiers to be tested for putative hits. If there are multiple individuals with the same ID (e.g., in the same test group), then over half of them need to meet the criteria to be a putative hit. values indicates the columns containing values to evaluate, with start = the position of the first column and stop = the position of the last column.

If you wish to use a different cutoff for each column, set var.cuts = TRUE and supply lists for both low.cut and high.cut that correspond to the largest value to be considered a hit on the low side (e.g., low abundance) and the smallest value to be considered a hit on the high side (e.g., high abundance), respectively. Alternatively, cutoff is used for data coming out of cdf.pval: if cutoff = 0.05, then values <= 0.025 and values >= 0.975 will be considered putative hits. If Z scores are provided (or other criteria where values >= abs(x) are considered a hit), then Z should be used to define a cutoff.

data are subsetted based on the column (ID) either by all levels (e.g., group A, group B) or by source, if provided. Each column in values (e.g., assay) is evaluated to see if any individuals in that column meet the criteria for a putative hit. If more than half of the individuals meet the criteria to be a putative hit for that column, all the individuals belonging to that level are put into the output data frame. If not, then the remaining columns are evaluated or it moves to the next level. Individual responses that are low or high are evaluated separately.
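The selection rule described above can be sketched in base R. This is an illustrative reimplementation, not the package's source: the helpers flag_column and group_is_hit, the toy data frame, and its column names are all hypothetical. It also makes two assumptions the text does not settle: that a Z cutoff is applied symmetrically (values <= -abs(Z) count as low, values >= abs(Z) count as high), and that "more than half" is taken over the non-missing observations in a group.

```r
# Hypothetical helper (not part of MIPHENO): flag low and high observations
# in one assay column using one of the three criteria described in Details.
flag_column <- function(x, cutoff = 0.05, Z = NULL,
                        low.cut = NULL, high.cut = NULL) {
  if (!is.null(low.cut) && !is.null(high.cut)) {
    # Variable cutoffs: explicit bounds for this column
    low  <- x <= low.cut
    high <- x >= high.cut
  } else if (!is.null(Z)) {
    # Z scores: assumed symmetric cutoff around zero
    low  <- x <= -abs(Z)
    high <- x >= abs(Z)
  } else {
    # Empirical p-values: cutoff is split between the two tails
    low  <- x <= cutoff / 2
    high <- x >= 1 - cutoff / 2
  }
  # Missing values never count as hits
  list(low = low & !is.na(low), high = high & !is.na(high))
}

# Hypothetical helper: a group is a putative hit for a column when more than
# half of its non-missing observations are low, or more than half are high.
# Low and high responses are counted separately, as the text describes.
group_is_hit <- function(x, ...) {
  f <- flag_column(x, ...)
  n <- sum(!is.na(x))
  n > 0 && (sum(f$low) > n / 2 || sum(f$high) > n / 2)
}

# Toy data: two groups, two assay columns of p-value-like scores
toy <- data.frame(
  LOCUS  = c("A", "A", "A", "B", "B", "B"),
  assay1 = c(0.01, 0.02, 0.50, 0.40, 0.60, 0.99),
  assay2 = c(0.30, 0.50, 0.70, 0.20, 0.45, 0.55),
  stringsAsFactors = FALSE
)
assay_cols <- c("assay1", "assay2")

# Keep every ID for which at least one assay column meets the majority rule
hit_ids <- Filter(function(id) {
  rows <- toy$LOCUS == id
  any(vapply(assay_cols,
             function(col) group_is_hit(toy[rows, col]),
             logical(1)))
}, unique(toy$LOCUS))

# All individuals of a hit group are returned, not only the extreme ones
hits <- toy[toy$LOCUS %in% hit_ids, ]
```

In this toy data, group A has two of three assay1 values at or below 0.025, so all three of its rows are kept. Group B has only one extreme value, so it is dropped. Passing Z, or low.cut and high.cut, to group_is_hit switches the criterion in the same way the corresponding find_hits arguments are described to.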

References

Bell SM, Burgoon LD, Last RL. MIPHENO: Data normalization for high throughput metabolite analysis. BMC Bioinformatics 2012, 13(10)

Examples


 #See the sweave document in the corresponding paper for examples
