extractContentDOM: Extract Main HTML Content from DOM
Description
The function extracts the main content of an HTML document
using its Document Object Model (DOM). The idea is based on
the observation that the main content of an HTML document
sits in a subnode of the DOM tree with a high text-to-tag
ratio. Internally, this function also calls assignValues,
calcDensity, getMainText and
removeTags.
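The text-to-tag idea can be illustrated with a minimal Python sketch. It is not the package's R implementation and does not reproduce assignValues, calcDensity, getMainText or removeTags. All names in it (Node, TreeBuilder, annotate, plain_text, extract_main_text) and the min_density threshold are hypothetical. The selection rule is also an assumption: among the nodes whose ratio of text characters to tags reaches the threshold, keep the one that holds the most text.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TEXT = {"script", "style"}  # their content is not readable text


class Node:
    """Element node; children holds Node objects and raw text strings."""
    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text_len = 0   # non-whitespace characters in the subtree
        self.tag_count = 0  # elements in the subtree, including this one


class TreeBuilder(HTMLParser):
    """Builds a simple tree from the parser events."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def annotate(node):
    """Fill in text_len and tag_count for node and all descendants."""
    node.text_len, node.tag_count = 0, 1
    for child in node.children:
        if isinstance(child, Node):
            annotate(child)
            node.text_len += child.text_len
            node.tag_count += child.tag_count
        elif node.tag not in SKIP_TEXT:
            node.text_len += sum(len(word) for word in child.split())


def iter_nodes(node):
    """Yield node and every element below it."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def plain_text(node):
    """Text of the subtree with tags removed and whitespace collapsed."""
    if node.tag in SKIP_TEXT:
        return ""
    parts = [c if isinstance(c, str) else plain_text(c) for c in node.children]
    return " ".join(" ".join(parts).split())


def extract_main_text(html, min_density=20.0):
    """Return the text of the densest large subtree (assumed heuristic)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    annotate(builder.root)
    candidates = [n for n in iter_nodes(builder.root)
                  if n is not builder.root
                  and n.text_len / n.tag_count >= min_density]
    if not candidates:
        return ""
    return plain_text(max(candidates, key=lambda n: n.text_len))


if __name__ == "__main__":
    sample = """<html><body>
    <div id="nav"><a href="/">Home</a><a href="/a">About</a></div>
    <div id="main"><p>Density based extraction keeps the subtree that
    holds most of the text.</p></div>
    </body></html>"""
    print(extract_main_text(sample))
```

A fixed threshold is the simplest choice for a sketch. A real page would likely need it tuned, or replaced by a ratio relative to the whole document.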
References
http://www.elias.cn/En/ExtMainText
http://ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/
Gupta et al., DOM-based Content Extraction of HTML Documents, http://www2003.org/cdrom/papers/refereed/p583/p583-gupta.html