Rgff (version 0.1.6)

check_gff: Test consistency and order of a GFF file

Description

This function tests the consistency and order of a GFF file.

Usage

check_gff(inFile, fileType = c("AUTO", "GFF3", "GTF"))

Value

A data frame of detected issues, each with a short code name, a description and an estimated severity. If no issues are detected, the function returns an empty data frame.
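Because the return value is a data frame, the presence of issues can be tested by counting rows. The following is a minimal sketch, not part of the package: `has_issues` is a hypothetical helper, and the call is guarded so that it only runs when Rgff is installed.

```r
# Hypothetical helper (not part of Rgff): TRUE when the data frame
# returned by check_gff() contains at least one row.
has_issues <- function(issues) {
  is.data.frame(issues) && nrow(issues) > 0
}

if (requireNamespace("Rgff", quietly = TRUE)) {
  # Example file shipped with the package
  gff3_path <- system.file("extdata", "eden.gff3", package = "Rgff")
  issues <- Rgff::check_gff(gff3_path)

  if (has_issues(issues)) {
    print(issues)
  } else {
    message("No issues detected in ", gff3_path)
  }
}
```

Counting rows avoids relying on the column names of the result, which are not listed on this page.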

Arguments

inFile

Path to the input GFF file

fileType

Format of the input file (GFF3 or GTF). The default, AUTO, determines the format from the file name.
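As a short sketch of how the two arguments combine, the calls below run the check once with the default `fileType` and once with the format stated explicitly. The example file is the one used in the Examples section; the variable names are illustrative only, and the calls are guarded so that they only run when Rgff is installed.

```r
# Path to the example file shipped with Rgff; system.file() returns ""
# when the package or file cannot be found.
gff3_path <- system.file("extdata", "eden.gff3", package = "Rgff")

if (requireNamespace("Rgff", quietly = TRUE)) {
  # Default: fileType = "AUTO", format determined from the file name
  issues_auto <- Rgff::check_gff(gff3_path)

  # Explicit: state the format instead of relying on the file name
  issues_gff3 <- Rgff::check_gff(gff3_path, fileType = "GFF3")
}
```

Stating the format explicitly may be useful when a file name does not carry a recognizable extension.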

Details

The following list gives the code and description of each issue detected in GFF3 files:

NCOLUMNS_EXCEEDED

Input file contains lines with more than 9 fields

NCOLUMNS_INFERIOR

Input file contains lines with less than 9 fields

TOO_MANY_FEATURE_TYPES

Input file contains too many (more than 100) different feature types

NO_IDs

ID attribute not found in any feature

DUPLICATED_IDs

There are duplicated IDs

ID_IN_MULTIPLE_CHR

The same ID has been found in more than one chromosome

NO_PARENTs

Parent attribute not found in any feature

MISSING_PARENT_IDs

There are missing Parent IDs

PARENT_IN_DIFFERENT_CHR

There are features whose Parent is located in a different chromosome

PARENT_DEFINED_BEFORE_ID

Feature ids referenced in Parent attribute before being defined as ID

NOT_GROUPED_BY_CHR

Features are not grouped by chromosome

NOT_SORTED_BY_COORDINATE

Features are not sorted by start coordinate

NOT_VALID_WARNING

File cannot be recognized as valid GFF3. Parsing warnings.

NOT_VALID_ERROR

File cannot be recognized as valid GFF3. Parsing errors.

The following list gives the code and description of each issue detected in GTF files:

NCOLUMNS_EXCEEDED

Input file contains lines with more than 9 fields

NCOLUMNS_INFERIOR

Input file contains lines with less than 9 fields

TOO_MANY_FEATURE_TYPES

Input file contains too many (more than 100) different feature types

NO_GENE_ID_ATTRIBUTE

gene_id attribute not found in any feature

MISSING_GENE_IDs

There are features without gene_id attribute

NO_GENE_FEATURES

Gene features are not included in this GTF file

DUPLICATED_GENE_IDs

There are duplicated gene_ids

GENE_ID_IN_MULTIPLE_CHR

The same gene_id has been found in more than one chromosome

NO_TRANSCRIPT_ID_ATTRIBUTE

transcript_id attribute not found in any feature

MISSING_TRANSCRIPT_IDs

There are features without transcript_id attribute

NO_TRANSCRIPT_FEATURES

Transcript features are not included in this GTF file

DUPLICATED_TRANSCRIPT_IDs

There are duplicated transcript_ids

TRANSCRIPT_ID_IN_MULTIPLE_CHR

The same transcript_id has been found in more than one chromosome

DUPLICATED_GENE_AND_TRANSCRIPT_IDs

Same id has been defined as gene_id and transcript_id

NOT_GROUPED_BY_CHR

Features are not grouped by chromosome

NOT_SORTED_BY_COORDINATE

Features are not sorted by start coordinate

NOT_VALID_WARNING

File cannot be recognized as valid GTF. Parsing warnings.

NOT_VALID_ERROR

File cannot be recognized as valid GTF. Parsing errors.
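The codes listed above can be used to test a result for one specific issue. The sketch below is an illustration under stated assumptions: `has_issue_code` is a hypothetical helper, and since the column names of the returned data frame are not given here, it searches every column for the code. The `mock_issues` data frame, including its column names and severity values, is invented purely to demonstrate the helper.

```r
# Hypothetical helper (not part of Rgff): TRUE when `code` appears in any
# column of the data frame returned by check_gff().
has_issue_code <- function(issues, code) {
  if (!is.data.frame(issues) || nrow(issues) == 0) {
    return(FALSE)
  }
  any(vapply(
    issues,
    function(column) code %in% as.character(column),
    logical(1)
  ))
}

# Mock result for illustration only; column names and severity values
# are assumptions, not the package's actual output.
mock_issues <- data.frame(
  code = c("NO_PARENTs", "NOT_SORTED_BY_COORDINATE"),
  description = c(
    "Parent attribute not found in any feature",
    "Features are not sorted by start coordinate"
  ),
  severity = c("example", "example"),
  stringsAsFactors = FALSE
)

has_issue_code(mock_issues, "NOT_SORTED_BY_COORDINATE")  # TRUE
has_issue_code(mock_issues, "DUPLICATED_IDs")            # FALSE
```

With a real result, the first argument would be the value returned by `check_gff()`.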

Examples

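# Locate the example GFF3 file shipped with the package
test_gff3 <- system.file("extdata", "eden.gff3", package = "Rgff")
# Check the file and return a data frame of detected issues
check_gff(test_gff3)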