Converts multiline documents to a single line. Strips extra whitespace and punctuation. Replaces digits with 'X's. Converts non-alphabetic characters to spaces.
clean.text(bigcorp)
bigcorp: A tm Corpus object.
# NOT RUN {
library( tm )
txt = c( "thhis s! and bonkus 4:33pm and Jan 3, 2015. ",
         " big space\n dawg-ness?" )
a <- clean.text( VCorpus( VectorSource( txt ) ) )
a[[1]]
# }
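To make the described transformations concrete, here is a minimal base R sketch that applies them to plain character vectors. It is not the package's implementation of clean.text: the helper name clean_text_sketch is hypothetical, the order of the steps is an assumption, and it works on strings, not on a tm Corpus.

```r
# Hypothetical helper: mimics the documented behaviour on a character vector.
# Assumption: digits are replaced before other non-alphabetic characters,
# so the inserted "X" placeholders survive the second step.
clean_text_sketch <- function(x) {
  x <- gsub("[0-9]", "X", x)          # each digit becomes "X"
  x <- gsub("[^[:alpha:]]", " ", x)   # punctuation, newlines, etc. become spaces
  x <- gsub("\\s+", " ", x)           # collapse runs of whitespace
  trimws(x)                           # drop leading and trailing spaces
}

txt <- c("thhis s! and bonkus 4:33pm and Jan 3, 2015. ",
         " big space\n dawg-ness?")
cleaned <- clean_text_sketch(txt)
print(cleaned)
```

Under these assumptions the first string becomes "thhis s and bonkus X XXpm and Jan X XXXX" and the second becomes "big space dawg ness". The actual output of clean.text on a Corpus may differ in details such as case handling or trailing whitespace.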