Cleans, vectorizes, and appends lexical norms to all content words in a language corpus. The user controls options for stopword removal and lemmatization, and selects up to three psycholinguistic dimensions whose norms are yoked to each content word in the original conversation transcript.
prep_dyads(
dat_read,
lemmatize = TRUE,
omit_stops = TRUE,
which_stoplist = "Temple_stops25",
remove_backchannel = FALSE,
verbose = TRUE
)

dataframe with text cleaned and vectorized to a one-word-per-row format. Lexical norms and metadata are appended to each content word. Cleaned text appears under a new column called 'Text_Clean'. Any selected dimensions (e.g., word length) and metadata are also appended to each word along with speaker identity, turn, and Event_ID (conversation identifier).
dataframe produced by the read_dyads() function
logical, should words be lemmatized (converted to their base morphological form), default is TRUE
logical, should stopwords be removed, default is TRUE
user-specified stopword list, with options "none", "SMART", "MIT_stops", "CA_OriginalStops", or "Temple_stops25". "Temple_stops25" is the default list
logical, should turns consisting entirely of stopwords (e.g., "Uhm, yeah") be removed or preserved as NA. Removal 'squishes' the preceding and following turns together into one; if NAs are preserved (the default, FALSE), they are later interpolated.
logical, display detailed output such as error messages and progress, default is TRUE
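
A minimal usage sketch of the workflow described above. The `"my_transcripts/"` directory is hypothetical, and passing it positionally to read_dyads() is an assumption; only the prep_dyads() arguments shown come from this documentation.

```r
# Assumed: a package exposing read_dyads() and prep_dyads() is loaded,
# and "my_transcripts/" is a hypothetical folder of dyadic transcripts.
dat_read <- read_dyads("my_transcripts/")

# Clean and vectorize to one word per row, appending lexical norms.
# Defaults are written out explicitly for clarity.
dat_prep <- prep_dyads(
  dat_read,
  lemmatize = TRUE,
  omit_stops = TRUE,
  which_stoplist = "Temple_stops25",
  remove_backchannel = FALSE,
  verbose = TRUE
)

# Cleaned words appear in the new 'Text_Clean' column,
# alongside speaker identity, turn, and Event_ID.
head(dat_prep$Text_Clean)
```

With remove_backchannel = FALSE, all-stopword turns survive as NA rows and are interpolated downstream; setting it to TRUE instead merges the surrounding turns into one.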