extractContentDOM: Extract Main HTML Content from DOM
Description
Extracts the main content of an HTML document using its Document
Object Model (DOM). The approach rests on the observation that the
main content of an HTML document usually sits in a subnode of the
DOM tree with a high text-to-tag ratio. Internally, this function
calls assignValues, calcDensity, getMainText and removeTags.
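
The text-to-tag ratio idea can be illustrated with a small, language-independent sketch. The Python code below is not the implementation of extractContentDOM or of its helper functions; all names in it (Node, TreeBuilder, main_text, the candidates tuple) are hypothetical, and it assumes that the "main" node is simply the candidate block element whose subtree has the most text characters per tag.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not become the "current" node.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """Minimal DOM node: a tag plus an ordered mix of text strings and child nodes."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.items = tag, parent, []

    def text(self):
        # Concatenate all text in this subtree and normalise whitespace.
        raw = " ".join(
            item.text() if isinstance(item, Node) else item for item in self.items
        )
        return " ".join(raw.split())

    def tag_count(self):
        # Number of tags in this subtree, including the node itself.
        return 1 + sum(i.tag_count() for i in self.items if isinstance(i, Node))

    def density(self):
        # Text-to-tag ratio: characters of text per tag in the subtree.
        return len(self.text()) / self.tag_count()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating stray or missing end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        # Skip whitespace-only runs and the contents of script/style elements.
        if data.strip() and self.current.tag not in ("script", "style"):
            self.current.items.append(data.strip())


def main_text(html, candidates=("div", "article", "section", "td")):
    """Return the text of the candidate element with the highest text-to-tag ratio."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, stack = None, [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(i for i in node.items if isinstance(i, Node))
        if node.tag in candidates and (best is None or node.density() > best.density()):
            best = node
    return best.text() if best else ""


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a><a href="/a">About</a><a href="/c">Contact</a></div>
<div id="main"><p>This is the long main article text of the page, with many words and few tags.</p></div>
<div id="footer"><span><a href="/x">Imprint</a></span></div>
</body></html>
"""

if __name__ == "__main__":
    print(main_text(SAMPLE))
```

In the sample page, the navigation and footer blocks contain several tags but little text, so their ratio is low, while the article block carries one long sentence inside two tags and is selected. A threshold on the ratio, rather than a single maximum, would be a natural variation of this sketch.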
References
http://www.elias.cn/En/ExtMainText
http://ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/
Gupta et al., DOM-based Content Extraction of HTML Documents,
http://www2003.org/cdrom/papers/refereed/p583/p583-gupta.html