parseInputFile takes a mixed file containing group identifiers (numeric) and gene names, returning the list of group identifiers and genes with the remaining columns removed.
The package was originally written to work from a file laid out thus:
group_id1
gene_name1
gene_name2
group_id2
gene_name1
gene_name3
The methods assume that both group identifiers and gene names are alphanumeric: group identifiers, where present, begin with a number, and gene names begin with a letter.
Please note that this populates the vector with only the alphanumeric string beginning each line of the input file. Also, RNA genes (beginning with ENSG000) are excluded.
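The parsing rules described above can be illustrated with a minimal sketch. This is not the package's implementation of parseInputFile: the helper name sketch_parse_lines and the example lines are hypothetical, and the sketch assumes that "alphanumeric" means letters and digits only, so that a tab, space or other separator ends the leading string.

```r
# Hypothetical helper, NOT the package's parseInputFile(); it only mirrors
# the behaviour described in the text above.
sketch_parse_lines <- function(lines) {
  # Keep the leading run of letters and digits on each line. Anything after
  # the first non-alphanumeric character (e.g. further columns) is dropped,
  # and lines with no leading alphanumeric string are skipped.
  lead <- regmatches(lines, regexpr("^[[:alnum:]]+", lines))
  # Exclude entries beginning with ENSG000, as the note above describes.
  lead[!startsWith(lead, "ENSG000")]
}

# Made-up input lines: two group identifiers, two gene names with trailing
# columns, and one ENSG000 entry that should be excluded.
example_lines <- c("101", "BRCA1\textra column", "ENSG00000123456",
                   "102", "TP53 note")
parsed <- sketch_parse_lines(example_lines)

# Group identifiers begin with a digit; gene names begin with a letter.
is_group <- grepl("^[0-9]", parsed)
```

Here parsed holds the retained leading strings in file order, and is_group marks which of them would be treated as group identifiers under the stated assumption.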
parseInputFile(x, file)
"parseInputFile"(x, file)
geneanno: vector of character strings