The IS index is calculated as: IS = (Σ_i 1/freq_i) × freq_ngram × n_lexical
where freq_i is the frequency of the i-th word in the n-gram (the sum runs over
every word in the n-gram), freq_ngram is the frequency of the n-gram itself, and
n_lexical is the number of lexical words in the n-gram.
IS_norm is the length-normalized version: IS_norm = IS / L^2, where L is the
n-gram length in words.
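A minimal sketch of the two formulas, assuming raw frequency counts and a per-token list of booleans marking which words are lexical. The function name `is_scores` and its parameters are hypothetical and are not this module's API.

```python
from typing import Mapping, Sequence


def is_scores(
    ngram: Sequence[str],
    word_freq: Mapping[str, int],
    ngram_freq: int,
    is_lexical: Sequence[bool],
) -> tuple[float, float]:
    """Hypothetical helper returning (IS, IS_norm) for one n-gram."""
    # (sum over every word in the n-gram of 1 / freq_i)
    inv_freq_sum = sum(1.0 / word_freq[word] for word in ngram)
    # n_lexical: how many words in the n-gram are lexical
    n_lexical = sum(1 for flag in is_lexical if flag)
    score = inv_freq_sum * ngram_freq * n_lexical
    # IS_norm divides by the squared n-gram length L
    length = len(ngram)
    return score, score / length ** 2


# "strong" occurs 4 times, "tea" 2 times, "strong tea" 2 times; both are lexical.
print(is_scores(("strong", "tea"), {"strong": 4, "tea": 2}, 2, [True, True]))
# → (3.0, 0.75)
```

Here (1/4 + 1/2) × 2 × 2 = 3.0, and dividing by L² = 4 gives 0.75.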
OPTIMIZATION: Only n-grams that both start and end with a lexical word (as
defined by the 'pos' parameter) are generated, which prunes the candidate set
and reduces computation time.
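The boundary filter can be sketched as follows. This assumes the input is a list of (word, POS tag) pairs and that the 'pos' parameter behaves like a set of tags counted as lexical. The name `lexical_ngrams` and the example tag set are hypothetical.

```python
from collections import Counter


def lexical_ngrams(
    tagged: list[tuple[str, str]], n: int, lexical_pos: set[str]
) -> Counter:
    """Hypothetical helper: count n-grams whose first and last tokens are lexical."""
    counts: Counter = Counter()
    for i in range(len(tagged) - n + 1):
        # Check only the two boundary tags before building the n-gram, so
        # windows that fail the check are never sliced, counted, or scored.
        if tagged[i][1] not in lexical_pos or tagged[i + n - 1][1] not in lexical_pos:
            continue
        counts[tuple(word for word, _ in tagged[i:i + n])] += 1
    return counts


tokens = [("the", "DET"), ("strong", "ADJ"), ("cup", "NOUN"), ("of", "ADP"), ("tea", "NOUN")]
print(lexical_ngrams(tokens, 3, {"NOUN", "ADJ", "VERB", "ADV"}))
# → Counter({('cup', 'of', 'tea'): 1})
```

Of the three trigram windows, only "cup of tea" survives. "the strong cup" starts with a determiner and "strong cup of" ends with a preposition. Function words may still appear inside a kept n-gram.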