
misha (version 5.3.1)

gvtrack.create: Creates a new virtual track

Description

Creates a new virtual track.

Usage

gvtrack.create(
  vtrack = NULL,
  src = NULL,
  func = NULL,
  params = NULL,
  dim = NULL,
  sshift = NULL,
  eshift = NULL,
  filter = NULL,
  ...
)

Value

None.

Arguments

vtrack

virtual track name

src

source (track or intervals). Use NULL for sequence-based functions (PWM, k-mer, and masked-sequence summarizers). For value-based tracks, provide a data frame with columns chrom, start, end, and one numeric value column. The data frame functions as an in-memory sparse track and supports all track-based summarizer functions. Intervals must not overlap.

func

function name (see Details)

params

function parameters (see Details)

dim

use 'NULL' or '0' for 1D iterators. '1' converts a 2D iterator to (chrom1, start1, end1); '2' converts a 2D iterator to (chrom2, start2, end2)

sshift

shift of 'start' coordinate

eshift

shift of 'end' coordinate

filter

genomic mask to apply. Can be:

  • A data.frame with columns 'chrom', 'start', 'end' (intervals to mask)

  • A character string naming an intervals set

  • A character string naming a track (must be intervals-type track)

  • A list of any combination of the above (all will be unified)

  • NULL to clear the filter

...

additional PWM or k-mer parameters, supplied as named arguments

Details

This function creates a new virtual track named 'vtrack' with the given source, function and parameters. 'src' can be either a track, intervals (1D or 2D), or a data frame with intervals and a numeric value column (value-based track). The tables below summarize the supported combinations.

Value-based tracks

Value-based tracks are data frames containing genomic intervals with associated numeric values. They function as in-memory sparse tracks without requiring track creation in the database. To create a value-based track, provide a data frame with columns chrom, start, end, and one numeric value column (any name is acceptable). Value-based tracks support all track-based summarizer functions (e.g., avg, min, max, sum, stddev, quantile, nearest, exists, size, first, last, sample, and position functions), but do not support overlapping intervals. They behave like sparse tracks in aggregation: values are aggregated using count-based averaging (each interval contributes equally regardless of length), not coverage-based averaging.
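
The distinction between count-based and coverage-based averaging can be seen in plain R, without misha. The sketch below is illustrative only: the data frame `vals` and all variable names are made up for this example, it assumes the iterator interval contains both source intervals in full, and it does not reproduce misha's internals.

```r
# Hypothetical value-based track: two intervals of very different lengths.
vals <- data.frame(
    chrom = "chr1",
    start = c(100, 300),
    end   = c(110, 400),   # lengths: 10 bp and 100 bp
    score = c(10, 20)
)

# Count-based averaging (the sparse-track behaviour described above):
# every interval contributes equally, regardless of its length.
count_avg <- mean(vals$score)

# Coverage-based averaging, shown only for contrast:
# each interval is weighted by the number of base pairs it covers.
len <- vals$end - vals$start
coverage_avg <- weighted.mean(vals$score, w = len)

print(c(count_based = count_avg, coverage_based = coverage_avg))
```

The two numbers differ because the longer interval dominates the coverage-based mean but counts only once in the count-based mean.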

Track-based summarizers

| Source | func | params | Description |
|---|---|---|---|
| Track | avg | NULL | Average track value in the iterator interval. |
| Track (1D) | exists | vals (optional) | Returns 1 if any value exists (or specific vals if provided), 0 otherwise. |
| Track (1D) | first | NULL | First value in the iterator interval. |
| Track (1D) | last | NULL | Last value in the iterator interval. |
| Track | max | NULL | Maximum track value in the iterator interval. |
| Track | min | NULL | Minimum track value in the iterator interval. |
| Dense / Sparse / Array track | nearest | NULL | Average value inside the iterator; for sparse tracks with no samples in the interval, falls back to the closest sample outside the interval (by genomic distance). |
| Track (1D) | sample | NULL | Uniformly sampled source value from the iterator interval. |
| Track (1D) | size | NULL | Number of non-NaN values in the iterator interval. |
| Dense / Sparse / Array track | stddev | NULL | Unbiased standard deviation of values in the iterator interval. |
| Dense / Sparse / Array track | sum | NULL | Sum of values in the iterator interval. |
| Dense / Sparse / Array track | quantile | Percentile in [0, 1] | Quantile of values in the iterator interval. |
| Dense track | global.percentile | NULL | Percentile of the interval average relative to the full-track distribution. |
| Dense track | global.percentile.max | NULL | Percentile of the interval maximum relative to the full-track distribution. |
| Dense track | global.percentile.min | NULL | Percentile of the interval minimum relative to the full-track distribution. |

Track position summarizers

| Source | func | params | Description |
|---|---|---|---|
| Track (1D) | first.pos.abs | NULL | Absolute genomic coordinate of the first value. |
| Track (1D) | first.pos.relative | NULL | Zero-based position (relative to interval start) of the first value. |
| Track (1D) | last.pos.abs | NULL | Absolute genomic coordinate of the last value. |
| Track (1D) | last.pos.relative | NULL | Zero-based position (relative to interval start) of the last value. |
| Track (1D) | max.pos.abs | NULL | Absolute genomic coordinate of the maximum value inside the iterator interval. |
| Track (1D) | max.pos.relative | NULL | Zero-based position (relative to interval start) of the maximum value. |
| Track (1D) | min.pos.abs | NULL | Absolute genomic coordinate of the minimum value inside the iterator interval. |
| Track (1D) | min.pos.relative | NULL | Zero-based position (relative to interval start) of the minimum value. |
| Track (1D) | sample.pos.abs | NULL | Absolute genomic coordinate of a uniformly sampled value. |
| Track (1D) | sample.pos.relative | NULL | Zero-based position (relative to interval start) of a uniformly sampled value. |

For max.pos.relative, min.pos.relative, first.pos.relative, last.pos.relative, sample.pos.relative, iterator modifiers (including sshift / eshift and 1D projections generated via gvtrack.iterator) are applied before the position is reported. In other words, the returned coordinate is always 0-based and measured from the start of the iterator interval after all modifier adjustments.
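
The arithmetic behind the relative positions can be sketched in base R. The coordinates and shift values below are invented for illustration, and the snippet only mirrors the rule stated above (offset measured from the start of the modified interval); it does not call misha.

```r
# Hypothetical iterator interval and shifts (as would be set via gvtrack.iterator).
iter_start <- 1000
iter_end   <- 1100
sshift <- -50
eshift <- 50

# Interval actually scanned once the modifiers are applied.
eff_start <- iter_start + sshift
eff_end   <- iter_end + eshift

# Suppose the maximum value sits at this absolute coordinate.
max_pos_abs <- 1020

# Zero-based offset from the start of the *modified* interval,
# not from the original iterator start.
max_pos_relative <- max_pos_abs - eff_start
print(max_pos_relative)
```

Measured from the unshifted start the offset would be 20; after the 50 bp upstream extension it becomes 70.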

Interval-based summarizers

| Source | func | params | Description |
|---|---|---|---|
| 1D intervals | distance | Minimal distance from center (default 0) | Signed distance using normalized formula when inside intervals, distance to edge when outside; see notes below for exact formula. |
| 1D intervals | distance.center | NULL | Distance from iterator center to the closest interval center, NA if outside all intervals. |
| 1D intervals | distance.edge | NULL | Edge-to-edge distance from iterator interval to closest source interval (like gintervals.neighbors); see notes below for strand handling. |
| 1D intervals | coverage | NULL | Fraction of iterator length covered by source intervals (after unifying overlaps). |
| 1D intervals | neighbor.count | Max distance (>= 0) | Number of source intervals whose edge-to-edge distance from the iterator interval is within params (no unification). |

2D track summarizers

| Source | func | params | Description |
|---|---|---|---|
| 2D track | area | NULL | Area covered by intersections of track rectangles with the iterator interval. |
| 2D track | weighted.sum | NULL | Weighted sum of values where each weight equals the intersection area. |
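
A small base R sketch of the two 2D summarizers, under the assumption that the track rectangles do not overlap one another. The rectangles, values, and query below are invented, and the code is an illustration of the definitions rather than misha's implementation.

```r
# Hypothetical 2D track rectangles [x1, x2) x [y1, y2) with one value each.
rects <- data.frame(
    x1 = c(0, 50), x2 = c(20, 120),
    y1 = c(0, 50), y2 = c(10, 150),
    value = c(2, 3)
)

# Hypothetical 2D iterator interval.
query <- list(x1 = 0, x2 = 100, y1 = 0, y2 = 100)

# Width and height of each rectangle's intersection with the query
# (zero when there is no intersection along that axis).
w <- pmax(0, pmin(rects$x2, query$x2) - pmax(rects$x1, query$x1))
h <- pmax(0, pmin(rects$y2, query$y2) - pmax(rects$y1, query$y1))
inter <- w * h

area <- sum(inter)                        # analogue of func = "area"
weighted_sum <- sum(rects$value * inter)  # analogue of func = "weighted.sum"
print(c(area = area, weighted_sum = weighted_sum))
```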

Motif (PWM) summarizers

| Source | func | Key params | Description |
|---|---|---|---|
| NULL (sequence) | pwm | pssm, bidirect, prior, extend, spat_* | Log-sum-exp score of motif likelihoods across all anchors inside the iterator interval. |
| NULL (sequence) | pwm.max | pssm, bidirect, prior, extend, spat_* | Maximum log-likelihood score among all anchors (per-position union across strands). |
| NULL (sequence) | pwm.max.pos | pssm, bidirect, prior, extend, spat_* | 1-based position of the best-scoring anchor (signed by strand when bidirect = TRUE); coordinates are always relative to the iterator interval after any gvtrack.iterator() shifts/extensions. |
| NULL (sequence) | pwm.count | pssm, score.thresh, bidirect, prior, extend, strand, spat_* | Count of anchors whose score exceeds score.thresh (per-position union). |

K-mer summarizers

| Source | func | Key params | Description |
|---|---|---|---|
| NULL (sequence) | kmer.count | kmer, extend, strand | Number of k-mer occurrences whose anchor lies inside the iterator interval. |
| NULL (sequence) | kmer.frac | kmer, extend, strand | Fraction of possible anchors within the interval that match the k-mer. |

Masked sequence summarizers

| Source | func | Key params | Description |
|---|---|---|---|
| NULL (sequence) | masked.count | NULL | Number of masked (lowercase) base pairs in the iterator interval. |
| NULL (sequence) | masked.frac | NULL | Fraction of base pairs in the iterator interval that are masked (lowercase). |
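
What the masked summarizers measure can be mimicked on a plain character string. The sequence below is made up, and the snippet simply counts lowercase characters; it is a sketch of the definition, not a call into misha.

```r
# Hypothetical soft-masked sequence: lowercase letters mark masked bases.
dna <- "ACGTacgtNNacGT"
chars <- strsplit(dna, "")[[1]]

# Analogue of masked.count: number of lowercase characters.
masked_count <- sum(chars %in% letters)

# Analogue of masked.frac: masked bases divided by sequence length.
masked_frac <- masked_count / length(chars)

print(c(masked_count = masked_count, masked_frac = masked_frac))
```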

The sections below provide additional notes for motif, interval, k-mer, and masked sequence functions.

Motif (PWM) notes

  • pssm: Position-specific scoring matrix (matrix or data frame) with columns A, C, G, T; extra columns are ignored.

  • bidirect: When TRUE (default), both strands are scanned and combined per genomic start (per-position union). The strand argument is ignored. When FALSE, only the strand specified by strand is scanned.

  • prior: Pseudocount added to frequencies (default 0.01). Set to 0 to disable.

  • extend: Extends the fetched sequence so boundary-anchored motifs retain full context (default TRUE). The END coordinate is padded by motif_length - 1 for all strand modes; anchors must still start inside the iterator.

  • Neutral characters (N, n, *) contribute the mean log-probability of the corresponding PSSM column on both strands.

  • strand: Used only when bidirect = FALSE; 1 scans the forward strand, -1 scans the reverse strand. For pwm.max.pos, strand = -1 reports the hit position at the end of the match (still relative to the forward orientation).

  • score.thresh: Threshold for pwm.count. Anchors with log-likelihood >= score.thresh are counted; only one count per genomic start.

  • Spatial weighting (spat_factor, spat_bin, spat_min, spat_max): optional position-dependent weights applied in log-space. Provide a positive numeric vector spat_factor; spat_bin (integer > 0) defines bin width; spat_min/spat_max restrict the scanning window.

  • pwm.max.pos: Positions are reported 1-based relative to the final scan window (after iterator shifts and spatial trimming). Ties resolve to the most 5' anchor; the forward strand wins ties at the same coordinate. Values are signed when bidirect = TRUE (positive for forward, negative for reverse).
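
The relationship between pwm, pwm.max, pwm.max.pos, and pwm.count can be sketched in base R for a forward-strand scan. This is a simplified illustration under stated assumptions: the prior is added to each frequency and rows are renormalized (misha's exact handling may differ), only A/C/G/T are handled, only anchors whose motif fits inside the string are scored, and the helper `score_anchors`, the sequence, and the threshold are all hypothetical.

```r
# Hypothetical 3-position motif; rows are motif positions.
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1,
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE,
    dimnames = list(NULL, c("A", "C", "G", "T"))
)

# ASSUMPTION: prior is added to every frequency, then each row is renormalized.
prior <- 0.01
p <- pssm + prior
p <- p / rowSums(p)

# Forward-strand log-likelihood of the motif anchored at each position.
score_anchors <- function(dna, p) {
    bases <- strsplit(toupper(dna), "")[[1]]
    k <- nrow(p)
    n_anchor <- length(bases) - k + 1
    vapply(seq_len(n_anchor), function(i) {
        cols <- match(bases[i:(i + k - 1)], colnames(p))
        sum(log(p[cbind(seq_len(k), cols)]))
    }, numeric(1))
}

scores <- score_anchors("TTACGTT", p)

# Numerically stable log-sum-exp over all anchors (analogue of "pwm").
pwm_score <- max(scores) + log(sum(exp(scores - max(scores))))
pwm_max <- max(scores)               # analogue of "pwm.max"
pwm_max_pos <- which.max(scores)     # 1-based; the first maximum wins ties
score_thresh <- log(0.25)            # arbitrary example threshold
pwm_count <- sum(scores >= score_thresh)   # analogue of "pwm.count"
```

Because the log-sum-exp aggregates every anchor, `pwm_score` is never smaller than `pwm_max`; the two are close when a single anchor dominates.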

Spatial weighting enables position-dependent weighting for modeling positional biases. Bins are 0-indexed from the scan start. When using gvtrack.iterator() shifts (e.g., sshift = -50, eshift = 50), bins index from the expanded scan window start, not the original interval. Both strands use the same bin at each genomic position. Positions beyond the last bin reuse the final bin's weight. If the window size is not divisible by spat_bin, the last bin is shorter (e.g., scanning 500 bp with 40 bp bins yields bins 0-11 of 40 bp plus bin 12 of 20 bp). Use spat_min and spat_max to restrict scanning to a range divisible by spat_bin if needed.
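
The bin layout described above can be checked with a few lines of base R. The snippet reproduces the 500 bp / 40 bp example and the rule that positions past the last supplied weight reuse the final weight; the `spat_factor` vector here is a made-up example, and the code does not invoke misha.

```r
window_size <- 500
spat_bin <- 40

# Zero-based offset of every position from the scan-window start,
# mapped to its zero-based bin index.
offsets <- seq_len(window_size) - 1
bin <- offsets %/% spat_bin

# Number of positions per bin: bins 0-11 are full, bin 12 is shorter.
bin_sizes <- table(bin)
print(bin_sizes)

# Hypothetical weights for five bins only; later bins reuse the last weight.
spat_factor <- c(0.5, 1.0, 2.0, 1.0, 0.5)
weight <- spat_factor[pmin(bin, length(spat_factor) - 1) + 1]
```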

PWM parameters can be supplied either as a single list (params) or via named arguments (see examples).

Interval distance notes

distance: Given the center 'C' of the current iterator interval, returns 'DC * X/2' where 'DC' is the normalized distance to the center of the interval that contains 'C', and 'X' is the value of the parameter (default: 0). If no interval contains 'C', the result is 'D + X/2' where 'D' is the distance between 'C' and the edge of the closest interval.

distance.center: Given the center 'C' of the current iterator interval, returns NaN if 'C' is outside of all intervals, otherwise returns the distance between 'C' and the center of the closest interval.

distance.edge: Computes edge-to-edge distance from the iterator interval to the closest source interval, using the same calculation as gintervals.neighbors. Returns 0 for overlapping intervals. Distance sign depends on the strand column of source intervals; returns unsigned (absolute) distance if no strand column exists. Returns NA if no source intervals exist on the current chromosome.

For distance and distance.center, distance can be positive or negative depending on the position of the coordinate relative to the interval and the strand (-1 or 1) of the interval. Distance is always positive if strand = 0 or if the strand column is missing. The result is NA if no intervals exist for the current chromosome.

Difference between distance functions: The distance function measures from the center of the iterator interval (a single coordinate point) to the closest edge of source intervals when outside, or returns a normalized distance within the interval when inside. The distance.center function measures from the center of the iterator interval to the center of source intervals. The distance.edge function measures edge-to-edge distance between intervals, exactly like gintervals.neighbors. Use distance.edge when you need the same distance computation as gintervals.neighbors within a virtual track context.
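
The edge-to-edge idea behind distance.edge and neighbor.count can be illustrated in base R. The helper `edge_distance` below is hypothetical and unsigned; it assumes half-open intervals and a plain gap-size convention, so boundary handling and strand-dependent signs in gintervals.neighbors may differ from this sketch.

```r
# Unsigned edge-to-edge distance between a query interval and each source
# interval; 0 when the two overlap.
# ASSUMPTION: simple gap size between half-open intervals [start, end).
edge_distance <- function(q_start, q_end, src_start, src_end) {
    pmax(0, src_start - q_end, q_start - src_end)
}

# Hypothetical source intervals.
src <- data.frame(start = c(100, 500, 900), end = c(200, 600, 1000))

# Query interval [250, 300): distance to every source interval.
d <- edge_distance(250, 300, src$start, src$end)

closest <- min(d)            # analogue of distance.edge (unsigned)
n_near <- sum(d <= 250)      # analogue of neighbor.count with params = 250
print(c(closest = closest, n_near = n_near))
```

Note that every source interval is tested separately for the neighbor count, in line with the "no unification" remark in the table.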

K-mer notes

  • kmer: DNA sequence (case-insensitive) to count.

  • extend: If TRUE (default), counts kmers whose anchor lies in the interval even if the kmer extends beyond it; when FALSE, only kmers fully contained in the interval are considered.

  • strand: 1 counts forward-strand occurrences, -1 counts reverse-strand occurrences, 0 counts both strands (default). For palindromic kmers, consider using 1 or -1 to avoid double counting.

K-mer parameters can be supplied as a list or via named arguments (see examples).
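
The strand options and the double counting of palindromic k-mers can be demonstrated on a plain string. The helpers `revcomp` and `count_forward` are hypothetical, the sequence is invented, and only k-mers fully contained in the string are considered (comparable to extend = FALSE); the fraction shown uses forward-strand anchors as the denominator, which is an assumption rather than misha's documented normalization.

```r
# Reverse complement of a DNA string.
revcomp <- function(s) {
    comp <- chartr("ACGT", "TGCA", toupper(s))
    paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Occurrences of `kmer` read on the forward strand; overlapping matches count.
count_forward <- function(dna, kmer) {
    k <- nchar(kmer)
    n <- nchar(dna) - k + 1
    if (n < 1) {
        return(0L)
    }
    starts <- seq_len(n)
    sum(substring(toupper(dna), starts, starts + k - 1) == toupper(kmer))
}

dna <- "ATATCGAT"

n_fwd <- count_forward(dna, "AT")            # like strand = 1
n_rev <- count_forward(dna, revcomp("AT"))   # like strand = -1
n_both <- n_fwd + n_rev                      # like strand = 0

# ASSUMPTION: forward-strand fraction = matches / number of anchor positions.
frac_fwd <- n_fwd / (nchar(dna) - nchar("AT") + 1)

print(c(forward = n_fwd, reverse = n_rev, both = n_both))
```

Since "AT" is its own reverse complement, each site is found once per strand, which is why the note above suggests using strand = 1 or -1 for palindromic k-mers.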

Modify iterator behavior with 'gvtrack.iterator' or 'gvtrack.iterator.2d'.

See Also

gvtrack.info, gvtrack.iterator, gvtrack.iterator.2d, gvtrack.array.slice, gvtrack.ls, gvtrack.rm, gvtrack.filter

Examples

gdb.init_examples()

gvtrack.create("vtrack1", "dense_track", "max")
gvtrack.create("vtrack2", "dense_track", "quantile", 0.5)
gextract("dense_track", "vtrack1", "vtrack2",
    gintervals(1, 0, 10000),
    iterator = 1000
)

gvtrack.create("vtrack3", "dense_track", "global.percentile")
gvtrack.create("vtrack4", "annotations", "distance")
gdist(
    "vtrack3", seq(0, 1, length.out = 10), "vtrack4",
    seq(-500, 500, 200)
)

gvtrack.create("cov", "annotations", "coverage")
gextract("cov", gintervals(1, 0, 1000), iterator = 100)

pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1, # Example PSSM
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")
gvtrack.create(
    "motif_score", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE, prior = 0.01)
)
gvtrack.create("max_motif_score", NULL, "pwm.max",
    pssm = pssm, bidirect = TRUE, prior = 0.01
)
gvtrack.create("max_motif_pos", NULL, "pwm.max.pos",
    pssm = pssm
)
gextract(
    c(
        "dense_track", "motif_score", "max_motif_score",
        "max_motif_pos"
    ),
    gintervals(1, 0, 10000),
    iterator = 500
)

# Kmer counting examples
gvtrack.create("cg_count", NULL, "kmer.count", kmer = "CG", strand = 1)
gvtrack.create("cg_frac", NULL, "kmer.frac", kmer = "CG", strand = 1)
gextract(c("cg_count", "cg_frac"), gintervals(1, 0, 10000), iterator = 1000)

gvtrack.create("at_pos", NULL, "kmer.count", kmer = "AT", strand = 1)
gvtrack.create("at_neg", NULL, "kmer.count", kmer = "AT", strand = -1)
gvtrack.create("at_both", NULL, "kmer.count", kmer = "AT", strand = 0)
gextract(c("at_pos", "at_neg", "at_both"), gintervals(1, 0, 10000), iterator = 1000)

# GC content
gvtrack.create("g_frac", NULL, "kmer.frac", kmer = "G")
gvtrack.create("c_frac", NULL, "kmer.frac", kmer = "C")
gextract("g_frac + c_frac", gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_content"
)

# Masked base pair counting
gvtrack.create("masked_count", NULL, "masked.count")
gvtrack.create("masked_frac", NULL, "masked.frac")
gextract(c("masked_count", "masked_frac"), gintervals(1, 0, 10000), iterator = 1000)

# Combined with GC content (unmasked regions only)
gvtrack.create("gc", NULL, "kmer.frac", kmer = "G")
gextract("gc * (1 - masked_frac)",
    gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_unmasked"
)

# Value-based track examples
# Create a data frame with intervals and numeric values
intervals_with_values <- data.frame(
    chrom = "chr1",
    start = c(100, 300, 500),
    end = c(200, 400, 600),
    score = c(10, 20, 30)
)
# Use as value-based sparse track (functions like sparse track)
gvtrack.create("value_track", intervals_with_values, "avg")
gvtrack.create("value_track_max", intervals_with_values, "max")
gextract(c("value_track", "value_track_max"),
    gintervals(1, 0, 10000),
    iterator = 1000
)

# Spatial PWM examples
# Create a PWM with higher weight in the center of intervals
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1,
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.1, 0.7
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")

# Spatial factors: low weight at edges, high in center
# For 200bp intervals with 40bp bins: five bins starting at offsets 0, 40, 80, 120, 160
spatial_weights <- c(0.5, 1.0, 2.0, 1.0, 0.5)

gvtrack.create(
    "spatial_pwm", NULL, "pwm",
    list(
        pssm = pssm,
        bidirect = TRUE,
        spat_factor = spatial_weights,
        spat_bin = 40L
    )
)

# Compare with non-spatial PWM
gvtrack.create(
    "regular_pwm", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE)
)

gextract(c("spatial_pwm", "regular_pwm"),
    gintervals(1, 0, 10000),
    iterator = 200
)

# Using spatial parameters with iterator shifts
gvtrack.create(
    "spatial_extended", NULL, "pwm.max",
    pssm = pssm,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5),
    spat_bin = 40L
)
# Scan window will be 280bp (100bp + 2*90bp)
gvtrack.iterator("spatial_extended", sshift = -90, eshift = 90)
gextract("spatial_extended", gintervals(1, 0, 10000), iterator = 100)

# Using spat_min/spat_max to restrict scanning to a window
# For 500bp intervals, scan only positions 30-470 (440bp window)
gvtrack.create(
    "window_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_min = 30, # 1-based position
    spat_max = 470 # 1-based position
)
gextract("window_pwm", gintervals(1, 0, 10000), iterator = 500)

# Combining spatial weighting with window restriction
# Scan positions 50-450 with spatial weights favoring the center
gvtrack.create(
    "window_spatial_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5, 1.0, 0.5, 0.5),
    spat_bin = 40L,
    spat_min = 50,
    spat_max = 450
)
gextract("window_spatial_pwm", gintervals(1, 0, 10000), iterator = 500)
