Bridging the Gap Between Qualitative Data and Quantitative Analysis

qdap automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, and syllables, along with other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions let the user aggregate data by any number of grouping variables, and their output integrates with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of text mining and natural language processing.



Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. The package stands as a bridge between qualitative transcripts of dialogue and statistical analysis & visualization.
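The aggregation by grouping variables described above can be sketched with the DATA set that ships with qdap (the fictitious classroom dialogue listed in the function index). This is a minimal illustration rather than a recommended workflow: it assumes qdap is already installed, and the search terms "fun" and "computer" are arbitrary choices made for the example.

```r
library(qdap)

# DATA is the fictitious classroom dialogue bundled with qdap.
# `state` holds the utterances; `person`, `sex` and `adult` can serve
# as grouping variables.

# Count occurrences of the chosen terms for each speaker.
term_counts <- with(DATA, termco(state, person, c("fun", "computer")))
print(term_counts)

# Pass a list to aggregate by more than one grouping variable at once.
by_sex_adult <- with(DATA, termco(state, list(sex, adult), c("fun", "computer")))
print(by_sex_adult)

# The counts() generic extracts the raw counts from the returned object.
raw_counts <- counts(term_counts)
```

The same pattern (a text variable, then one grouping variable or a list of them) is shared by many of the functions in the index below, which is what makes it straightforward to switch the unit of analysis.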


To download the development version of qdap:

Download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it. Alternatively, use the pacman package to install the development version (you may want to install the development version of reports first):

if (!require("pacman")) install.packages("pacman")
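The snippet above only makes sure pacman is available. As a hedged sketch of the remaining step, the development version could then be pulled from GitHub with pacman's p_load_gh(). The repository name "trinker/qdap" is an assumption that does not appear in the text above, so check it against the project page before running this.

```r
# Install pacman from CRAN if it is missing.
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

# Assumption: the development sources are hosted in the GitHub
# repository "trinker/qdap".
# p_load_gh() installs the package from GitHub if needed, then attaches it.
pacman::p_load_gh("trinker/qdap")
```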



You are welcome to:

Note: If you are reporting a bug, make sure you have first read the Cleaning Text & Debugging vignette.

Functions in qdap

Name Description
Animate.discourse_map Discourse Map
Animate.formality Animate Formality
Network Generic Network Method
automated_readability_index Readability Measures
Network.formality Network Formality
Search Search Columns of a Data Frame
Animate.lexical_classification Animate Lexical Classification
Animate.polarity Animate Polarity
as.tdm tm Package Compatibility Tools: Apply to or Convert to/from Term Document Matrix or Document Term Matrix
Network.lexical_classification Network Lexical Classification
bag_o_words Bag of Words
bracketX Bracket Parsing
Network.polarity Network Polarity
add_incomplete Detect Incomplete Sentences; Add | Endmark
build_qdap_vignette Replace Temporary Introduction to qdap Vignette
cm_code.transform Transform Codes
add_s Make Plural (or Verb to Singular) Versions of Words
beg2char Grab Begin/End of String to Character
cm_combine.dummy Find Co-occurrence Between Dummy Codes
wfm Word Frequency Matrix
+.Network Add themes to a Network object.
colcomb2class Combine Columns to Class
blank2NA Replace Blanks in a dataframe
cm_code.exclude Exclude Codes
colsplit2df Wrapper for colSplit that Returns Dataframe(s)
capitalizer Capitalize Select Words
%&% qdap Chaining
cm_code.overlap Find Co-occurrence Between Codes
clean Remove Escaped Characters
counts.flesch_kincaid Readability Measures
cm_df.fill Range Coding
cm_2long A Generic to Long Function
cm_df.temp Break Transcript Dialogue into Blank Code Matrix
comma_spacer Ensure Space After Comma
cm_range2long Transform Codes to Start-End Durations
common Find Common Words Between Groups
Animate Generic Animate Method
DATA Fictitious Classroom Dialogue
counts.object_pronoun_type Question Counts
counts.formality Formality
Animate.character Animate Character
DATA.SPLIT Fictitious Split Sentence Classroom Dialogue
counts.polarity Polarity
counts.word_length Word Length Counts
cm_time.temp Time Span Code Sheet
DATA2 Fictitious Repeated Measures Classroom Dialogue
Title Add Title to Select qdap Plots
counts.subject_pronoun_type Question Counts
Trim Remove Leading/Trailing White Space
counts.termco Term Counts
counts.word_position Word Position
freq_terms Find Frequent Terms
duplicates Find Duplicated Words in a Text String
check_spelling Check Spelling
counts.coleman_liau Readability Measures
cm_time2long Transform Codes to Start-End Times
colSplit Separate a Column Pasted by paste2
check_spelling_interactive.character Check Spelling
counts Generic Counts Method
counts.pronoun_type Question Counts
counts.question_type Question Counts
cm_long2dummy Stretch and Dummy Code cm_xxx2long
htruncdf Dataframe Viewing
dir_map Map Transcript Files from a Directory to a Script
counts.SMOG Readability Measures
cm_range.temp Range Code Sheet
gantt_plot Gantt Plot
counts.end_mark_by Question Counts
gantt_rep Generate Unit Spans for Repeated Measures
all_words Searches Text Column for Words
left_just Text Justification
key_merge Merge Demographic Information with Person/Text Transcript
outlier_labeler Locate Outliers in Numeric String
Dissimilarity Dissimilarity Statistics
adjacency_matrix Takes a Matrix and Generates an Adjacency Matrix
dist_tab SPSS Style Frequency Tables
diversity Diversity Statistics
paste2 Paste an Unspecified Number Of Text Columns
plot.character_table Plots a character_table Object
plot.automated_readability_index Plots an automated_readability_index Object
gantt Gantt Durations
plot.cmspans Plots a cmspans object
plot.cm_distance Plots a cm_distance object
plot.cumulative_polarity Plots a cumulative_polarity Object
inspect_text Inspect Text Vectors
plot.cumulative_lexical_classification Plots a cumulative_lexical_classification Object
hamlet Hamlet (Complete & Split by Sentence)
is.global Test If Environment is Global
check_spelling_interactive.check_spelling Check Spelling
imperative Intuitively Remark Sentences as Imperative
vertex_apply Apply Parameter to List of Igraph Vertices/Edges
incomplete_replace Denote Incomplete End Marks With "|"
plot.freq_terms Plots a freq_terms Object
multigsub Multiple gsub
mcsv_r Read/Write Multiple csv Files at a Time
plot.animated_lexical_classification Plots an animated_lexical_classification Object
plot.animated_polarity Plots an animated_polarity Object
check_spelling_interactive.factor Check Spelling
end_inc Test for Incomplete Sentences
gantt_wrap Gantt Plot
plot.gantt Plots a gantt object
plot.polarity_count Plots a polarity_count Object
plot.end_mark_by Plots an end_mark_by Object
phrase_net Phrase Nets
check_text Check Text For Potential Problems
gradient_cloud Gradient Word Cloud
chunker Break Text Into Ordered Word Chunks
cm_df.transcript Transcript With Word Number
mraja1 Romeo and Juliet: Act 1 Dialogue Merged with Demographics
cm_df2long Transform Codes to Start-End Durations
plot.polarity_score Plots a polarity_score Object
common.list list Method for common
plot.pos_preprocessed Plots a pos_preprocessed Object
condense Condense Dataframe Columns
mraja1spl Romeo and Juliet: Act 1 Dialogue Merged with Demographics and Split
plot.Network Plots a Network Object
counts.fry Readability Measures
plot.cumulative_animated_polarity Plots a cumulative_animated_polarity Object
plot.table_proportion Plots a table_proportion Object
plot.table_count Plots a table_count Object
plot.pronoun_type Plots a pronoun_type Object
object_pronoun_type Count Object Pronouns Per Grouping Variable
plot.cumulative_combo_syllable_sum Plots a cumulative_combo_syllable_sum Object
plot.diversity Plots a diversity object
plot.end_mark_by_count Plots an end_mark_by_count Object
counts.linsear_write Readability Measures
counts.word_stats Word Stats
outlier_detect Detect Outliers in Text
potential_NA Search for Potential Missing Values
plot.formality Plots a formality Object
sentiment_frame Power Score (Sentiment Analysis)
plot.formality_scores Plots a formality_scores Object
preprocessed.pronoun_type Question Counts
cumulative Cumulative Scores
discourse_map Discourse Mapping
plot.end_mark Plots an end_mark Object
Filter.all_words Filter
dispersion_plot Lexical Dispersion Plot
preprocessed.question_type Question Counts
plot.object_pronoun_type Plots an object_pronoun_type Object
NAer Replace Missing Values (NA)
print.Dissimilarity Prints a Dissimilarity object
plot.animated_discourse_map Plots an animated_discourse_map Object
print.Network Prints a Network Object
print.check_spelling_interactive Prints a check_spelling_interactive Object
plot.animated_formality Plots an animated_formality Object
kullback_leibler Kullback Leibler Statistic
plot.lexical_classification_score Plots a lexical_classification_score Object
plot.linsear_write Plots a linsear_write Object
lexical_classification Lexical Classification Score
cm_code.blank Blank Code Transformation
cm_code.combine Combine Codes
cm_distance Distance Matrix Between Codes
plot.coleman_liau Plots a coleman_liau Object
plot.question_type Plots a question_type Object
cm_dummy2long Convert cm_combine.dummy Back to Long
plot.question_type_preprocessed Plots a question_type_preprocessed Object
multiscale Nested Standardization
plot.sums_gantt Plots a sums_gantt object
plot.polarity Plots a polarity Object
plot.rmgantt Plots a rmgantt object
plot.combo_syllable_sum Plots a combo_syllable_sum Object
plot.sent_split Plots a sent_split Object
plot.wfdf Plots a wfdf object
plot.lexical_classification Plots a lexical_classification Object
print.check_text Prints a check_text Object
counts.automated_readability_index Readability Measures
plot.cumulative_end_mark Plots a cumulative_end_mark Object
name2sex Names to Gender
plot.syllable_freq Plots a syllable_freq Object
counts.character_table Term Counts
print.cumulative_animated_polarity Prints a cumulative_animated_polarity Object
plot.wfm Plots a wfm object
plot.word_stats Plots a word_stats object
counts.pos Parts of Speech
print.cumulative_combo_syllable_sum Prints a cumulative_combo_syllable_sum Object
plot.word_position Plots a word_position object
plot.word_stats_counts Plots a word_stats_counts Object
counts.pos_by Parts of Speech
print.cumulative_lexical_classification Prints a cumulative_lexical_classification Object
preprocessed.subject_pronoun_type Question Counts
plot.subject_pronoun_type Plots a subject_pronoun_type Object
plot.lexical_classification_preprocessed Plots a lexical_classification_preprocessed Object
plot.sum_cmspans Plot Summary Stats for a Summary of a cmspans Object
end_mark Sentence End Marks
preprocessed.word_position Word Position
plot.type_token_ratio Plots a type_token_ratio Object
plot.word_proximity Plots a word_proximity object
plot.cumulative_formality Plots a cumulative_formality Object
env.syl Syllable Lookup Environment
plot.weighted_wfm Plots a weighted_wfm object
print.cumulative_animated_formality Prints a cumulative_animated_formality Object
print.cumulative_animated_lexical_classification Prints a cumulative_animated_lexical_classification Object
preprocessed.pos Parts of Speech
exclude Exclude Elements From a Vector
plot.cumulative_syllable_freq Plots a cumulative_syllable_freq Object
print.cumulative_syllable_freq Prints a cumulative_syllable_freq Object
print.discourse_map Prints a discourse_map Object
polarity Polarity Score (Sentiment Analysis)
formality Formality Score
print.linsear_write Prints a linsear_write Object
preprocessed.end_mark_by Question Counts
print.cumulative_polarity Prints a cumulative_polarity Object
pos Parts of Speech Tagging
print.lexical_classification_preprocessed Prints a lexical_classification_preprocessed Object
print.linsear_write_count Prints a linsear_write_count Object
preprocessed.pos_by Parts of Speech
new_project Project Template
preprocessed.formality Formality
print.end_mark_by_preprocessed Prints an end_mark_by_preprocessed object
print.animated_polarity Prints an animated_polarity Object
print.animated_lexical_classification Prints an animated_lexical_classification Object
print.end_mark_by Prints an end_mark_by object
print.object_pronoun_type Prints an object_pronoun_type object
print.lexical_classification_score Prints a lexical_classification_score Object
print.inspect_text Prints an inspect_text Object
print.SMOG Prints a SMOG Object
print.pos Prints a pos Object.
ngrams Generate ngrams
print.adjacency_matrix Prints an adjacency_matrix Object
print.character_table Prints a character_table object
print.check_spelling Prints a check_spelling Object
print.coleman_liau Prints a coleman_liau Object
print.cm_distance Prints a cm_distance Object
print.pos_preprocessed Prints a pos_preprocessed object
print.pos_by Prints a pos_by Object.
print.diversity Prints a diversity object
print.end_mark Prints an end_mark object
print.polarity_count Prints a polarity_count Object
print.wfm Prints a wfm Object
print.wfm_summary Prints a wfm_summary Object
print.polarity Prints a polarity Object
print.readability_count Prints a readability_count Object
print.readability_score Prints a readability_score Object
print.kullback_leibler Prints a kullback_leibler Object.
proportions.character_table Term Counts
proportions.end_mark_by Question Counts
proportions.question_type Question Counts
print.sum_cmspans Prints a sum_cmspans object
print.subject_pronoun_type Prints a subject_pronoun_type object
print.phrase_net Prints a phrase_net Object
proportions.pronoun_type Question Counts
raj Romeo and Juliet (Unchanged & Complete)
raj.act.1 Romeo and Juliet: Act 1
plot.SMOG Plots a SMOG Object
print.linsear_write_scores Prints a linsear_write_scores Object
print.pronoun_type Prints a pronoun_type object
plot.discourse_map Plots a discourse_map Object
print.word_proximity Prints a word_proximity object
print.table_count Prints a table_count object
print.word_stats Prints a word_stats object
print.table_proportion Prints a table_proportion object
plot.animated_character Plots an animated_character Object
scores.object_pronoun_type Question Counts
plot.cumulative_animated_formality Plots a cumulative_animated_formality Object
print.ngrams Prints an ngrams object
print.qdapProj Prints a qdapProj Object
plot.cumulative_animated_lexical_classification Plots a cumulative_animated_lexical_classification Object
plot.end_mark_by_preprocessed Plots an end_mark_by_preprocessed Object
plot.lexical Plots a lexical Object
plot.linsear_write_count Plots a linsear_write_count Object
plot.kullback_leibler Plots a kullback_leibler object
print.qdap_context Prints a qdap_context object
plot.linsear_write_scores Plots a linsear_write_scores Object
scores.polarity Polarity
proportions.termco Term Counts
proportions.subject_pronoun_type Question Counts
scores.pos_by Parts of Speech
plot.end_mark_by_proportion Plots an end_mark_by_proportion Object
strip Strip Text
scores.pronoun_type Question Counts
strWrap Wrap Character Strings to Format Paragraphs
proportions.pos Parts of Speech
plot.table_score Plots a table_score Object
plot.termco Plots a termco object
print.sums_gantt Prints a sums_gantt object
raj.act.1POS Romeo and Juliet: Act 1 Parts of Speech by Person. A dataset containing a list from pos_by using the mraja1spl data set (see pos_by for more information).
proportions.pos_by Parts of Speech
print.syllable_sum Prints a syllable_sum object
raj.act.5 Romeo and Juliet: Act 5
raj.act.2 Romeo and Juliet: Act 2
plot.word_cor Plots a word_cor object
trans_context Print Context Around Indices
qcombine Combine Columns
plot.end_mark_by_score Plots an end_mark_by_score Object
raj.demographics Romeo and Juliet Demographics
sample.time.span Minimal Time Span Data Set
trans_venn Venn Diagram by Grouping Variable
print.word_list Prints a word_list Object
scores Generic Scores Method
plot.flesch_kincaid Plots a flesch_kincaid Object
print.word_position Prints a word_position object.
print.word_stats_counts Prints a word_stats_counts object
scores.formality Formality
plot.word_length Plots a word_length Object
scores.fry Readability Measures
preprocessed Generic Preprocessed Method
word_list Raw Word Lists/Frequency Counts
word_diff_list Differences In Word Use Between Groups
word_length Count of Word Lengths Type
qcv Quick Character Vector
summary.wfdf Summarize a wfdf object
word_network_plot Word Network Plot
preprocessed.check_spelling_interactive Check Spelling
pronoun_type Count Object/Subject Pronouns Per Grouping Variable
plot.pos Plots a pos Object
print.all_words Prints an all_words Object
qheat Quick Heatmap
plot.pos_by Plots a pos_by Object
qprep Quick Preparation of Text
summary.wfm Summarize a wfm object
print.animated_character Prints an animated_character Object
raj.act.3 Romeo and Juliet: Act 3
print.automated_readability_index Prints an automated_readability_index Object
print.boolean_qdap Prints a boolean_qdap object
raw.time.span Minimal Raw Time Span Data Set
syllable_sum Syllabication
plot.readability_count Plots a readability_count Object
read.transcript Read Transcripts Into R
print.colsplit2df Prints a colsplit2df Object.
plot.readability_score Plots a readability_score Object
raj.act.4 Romeo and Juliet: Act 4
synonyms Search For Synonyms
scores.SMOG Readability Measures
print.combo_syllable_sum Prints a combo_syllable_sum object
preprocessed.lexical_classification Lexical Classification
replace_abbreviation Replace Abbreviations
preprocessed.object_pronoun_type Question Counts
scores.automated_readability_index Readability Measures
print.lexical_classification Prints a lexical_classification Object
replace_contraction Replace Contractions
weight Weight a qdap Object
print.lexical_classification_by Prints a lexical_classification Object
rm_row Remove Rows That Contain Markers
word_associate Find Associated Words
rm_stopwords Remove Stop Words
pres_debate_raw2012 First 2012 U.S. Presidential Debate
pres_debates2012 2012 U.S. Presidential Debates
scores.end_mark_by Question Counts
scores.character_table Term Counts
print.animated_discourse_map Prints an animated_discourse_map Object
print.animated_formality Prints an animated_formality Object
print.cumulative_end_mark Prints a cumulative_end_mark Object
print.question_type Prints a question_type object
scores.flesch_kincaid Readability Measures
print.cumulative_formality Prints a cumulative_formality Object
print.question_type_preprocessed Prints a question_type_preprocessed object
print.flesch_kincaid Prints a flesch_kincaid Object
scores.word_position Word Position
print.formality Prints a formality Object
print.table_score Prints a table_score object
print.termco Prints a termco object.
print.formality_scores Prints a formality_scores object
print.which_misspelled Prints a which_misspelled Object
print.fry Prints a fry Object
print.polarity_score Prints a polarity_score Object
print.polysyllable_sum Prints a polysyllable_sum object
scores.coleman_liau Readability Measures
print.sent_split Prints a sent_split object
scores.question_type Question Counts
print.sub_holder Prints a sub_holder object
scores.word_stats Word Stats
print.word_associate Prints a word_associate object
proportions.formality Formality
print.trunc Prints a trunc object
subject_pronoun_type Count Subject Pronouns Per Grouping Variable
print.type_token_ratio Prints a type_token_ratio Object
print.word_cor Prints a word_cor object
scores.subject_pronoun_type Question Counts
print.word_length Prints a word_length object
proportions.object_pronoun_type Question Counts
t.DocumentTermMatrix Transposes a DocumentTermMatrix object
t.TermDocumentMatrix Transposes a TermDocumentMatrix object
word_cor Find Correlated Words
word_count Word Counts
proportions.word_length Word Length Counts
proportions.word_position Word Position
summary.cmspans Summarize a cmspans object
prop Convert Raw Numeric Matrix or Data Frame to Proportions
qtheme Add themes to a Network object.
question_type Count of Question Type
termco Search For and Count Terms
proportions Generic Proportions Method
termco_c Combine Columns from a termco Object
word_proximity Proximity Matrix Between Words
word_position Word Position
qdap qdap: Quantitative Discourse Analysis Package
rajPOS Romeo and Juliet Split in Parts of Speech
qdap_df Create qdap Specific Data Structure
rajSPLIT Romeo and Juliet (Complete & Split)
replace_symbol Replace Symbols With Word Equivalents
random_sent Generate Random Dialogue Data
replacer Replace Cells in a Matrix or Data Frame
scores.termco Term Counts
rank_freq_mplot Rank Frequency Plot
replace_number Replace Numbers With Text Representation
scores.word_length Word Length Counts
replace_ordinal Replace Mixed Ordinal Numbers With Text Representation
scores.lexical_classification Lexical Classification
scores.linsear_write Readability Measures
space_fill Replace Spaces
spaste Add Leading/Trailing Spaces
tot_plot Visualize Word Length by Turn of Talk
trans_cloud Word Clouds by Grouping Variable
scrubber Clean Imported Text
sentSplit Sentence Splitting
visual Generic visual Method
visual.discourse_map Discourse Map
speakerSplit Break and Stretch if Multiple Persons per Cell
stemmer Stem Text
type_token_ratio Type-Token Ratio
unique_by Find Unique Words by Grouping Variable
word_stats Descriptive Word Statistics
Animate.gantt Gantt Durations
Animate.gantt_plot Gantt Plot
Vignettes of qdap

