aggregateSegmentExpression: Aggregating genes across copy number segments.

Description

Calculates average expression of genes grouped by common segment membership.

Usage

aggregateSegmentExpression(epg, segments, dataset = "hsapiens_gene_ensembl",
                           mingps = 20, GRCh = 37, host = NULL)

Value

List with fields:

eps: Segment-by-cell matrix of expression values.
gps: Segment-by-cell matrix of the number of expressed genes.

Arguments

epg: Gene-by-cell matrix of expression. It is recommended to cap extreme UMI counts (e.g. at the 99% quantile) and to include only cells expressing at least 1,000 genes.
segments: Matrix in which each row corresponds to a copy number segment as calculated by a circular binary segmentation algorithm. Must contain at least the following columns:
chr - chromosome;
startpos - the first genomic position of a copy number segment;
endpos - the last genomic position of a copy number segment;
CN_Estimate - the copy number estimated for each segment.
dataset: Dataset to download from BioMart.
mingps: Minimum number of expressed genes a segment needs to contain in order to be included in output.
GRCh: Human reference genome version to be used for annotating gene coordinates.
host: Host address used by BioMart.
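
The preprocessing recommended for epg can be sketched in base R as follows. This is a minimal illustration on made-up data, not code from the package: the names toy_epg, capped, filtered and min_genes are hypothetical, and it assumes the 99% quantile is taken over all entries of the matrix, which the documentation does not specify.

```r
# Toy gene-by-cell UMI count matrix (hypothetical data, for illustration only)
set.seed(1)
toy_epg <- matrix(rpois(50 * 6, lambda = 2), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:6)))

# Cap extreme UMI counts at the 99% quantile.
# Assumption: the quantile is taken over all entries of the matrix.
cap <- unname(quantile(toy_epg, 0.99))
capped <- toy_epg
capped[capped > cap] <- cap

# Keep only cells that express enough genes (count > 0).
# The documentation recommends 1,000 genes; the toy matrix has only 50 genes,
# so a smaller, hypothetical threshold is used here.
min_genes <- 40
genes_per_cell <- colSums(capped > 0)
filtered <- capped[, genes_per_cell >= min_genes, drop = FALSE]
```

With real data, min_genes would be set to 1000 and filtered would be passed as the epg argument.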

Author

Noemi Andor

Details

Let \(S := \{S_1, S_2, \ldots, S_n\}\) be the set of \(n\) genomic segments obtained from DNA-sequencing a given sample (e.g. from bulk exome-sequencing, scDNA-sequencing, etc.). Genes are mapped to their genomic coordinates using the biomaRt package and assigned to a segment based on those coordinates. Genes are then grouped by segment membership to obtain the average number of UMIs and the number of expressed genes per segment \(S_j\) per cell \(i\).
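
The grouping step described above can be illustrated with a small base R sketch. It is not the package's implementation. Gene coordinates are hard-coded in a hypothetical toy_genes table instead of being fetched with biomaRt, a gene is assigned to a segment by its start position, the average is taken over all genes in the segment (including those with zero counts), and the mingps filter is omitted. All of these are assumptions made for illustration.

```r
# Hypothetical gene coordinates; the package obtains these from BioMart
toy_genes <- data.frame(
  gene  = paste0("g", 1:6),
  chr   = c(1, 1, 1, 1, 2, 2),
  start = c(100, 200, 300, 1500, 50, 400),
  stringsAsFactors = FALSE
)

# Toy segments with the required columns: two on chr 1, one on chr 2
toy_segments <- cbind(chr = c(1, 1, 2),
                      startpos = c(1, 1001, 1),
                      endpos = c(1000, 2000, 1000),
                      CN_Estimate = c(2, 3, 1))
rownames(toy_segments) <- paste0(toy_segments[, "chr"], ":",
                                 toy_segments[, "startpos"], "-",
                                 toy_segments[, "endpos"])

# Toy gene-by-cell expression matrix (columns are cells c1 and c2)
toy_epg <- matrix(c(2, 0, 4, 6, 1, 3,
                    0, 0, 3, 2, 5, 0),
                  nrow = 6,
                  dimnames = list(toy_genes$gene, c("c1", "c2")))

# Assign each gene to the segment that contains its start position
seg_of_gene <- vapply(seq_len(nrow(toy_genes)), function(g) {
  hit <- which(toy_segments[, "chr"] == toy_genes$chr[g] &
               toy_segments[, "startpos"] <= toy_genes$start[g] &
               toy_segments[, "endpos"] >= toy_genes$start[g])
  if (length(hit) == 1) hit else NA_integer_
}, integer(1))

# Per segment and cell: mean expression (eps) and number of expressed genes (gps)
eps <- gps <- matrix(NA_real_, nrow(toy_segments), ncol(toy_epg),
                     dimnames = list(rownames(toy_segments), colnames(toy_epg)))
for (j in seq_len(nrow(toy_segments))) {
  sub <- toy_epg[which(seg_of_gene == j), , drop = FALSE]
  eps[j, ] <- colMeans(sub)
  gps[j, ] <- colSums(sub > 0)
}
```

Here eps and gps mirror the shape of the two fields in the returned list: one row per segment and one column per cell.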

Examples

data(epg)
data(segments)
# \donttest{
X <- aggregateSegmentExpression(epg, segments, mingps = 20, GRCh = 38)
# }
