This data set provides some basic quantitative measures for all texts in the Brown corpus of written American English (Francis & Kucera 1964).
BrownStats
A data frame with 500 rows and the following columns:
ty: number of distinct types
to: number of tokens (including punctuation)
se: number of sentences
towl: mean word length in characters, averaged over tokens
tywl: mean word length in characters, averaged over types
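The difference between the token-averaged and type-averaged word lengths can be illustrated with a minimal sketch on a toy token vector. The tokens below are hypothetical and not taken from the Brown corpus, and how the original measures treat punctuation tokens when computing word length is not stated here, so the toy example simply contains none.

```r
# Toy token sequence (hypothetical, not drawn from the Brown corpus)
tokens <- c("the", "cat", "saw", "the", "other", "cat")
types  <- unique(tokens)      # distinct word forms: the, cat, saw, other

to   <- length(tokens)        # number of tokens
ty   <- length(types)         # number of distinct types
towl <- mean(nchar(tokens))   # every occurrence counts, so frequent words weigh more
tywl <- mean(nchar(types))    # each distinct word counts exactly once

print(c(ty = ty, to = to, towl = towl, tywl = tywl))
```

Because frequent words tend to be short, the token-based mean is typically lower than the type-based mean, as it is in this toy case (20/6 versus 14/4).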
Marco Baroni <baroni@sslmit.unibo.it>
Francis, W. N. and Kucera, H. (1964). Manual of information to accompany a standard sample of present-day edited American English, for use with digital computers. Technical report, Department of Linguistics, Brown University, Providence, RI.
LOBStats