childeswordfreq (version 0.2.0)

word_counts: Get word counts by speaker role

Description

Counts word frequencies by speaker role in a slice of CHILDES. Target words come from a CSV file with a word column or from an in-memory character vector, and the results are written to an Excel workbook containing Word_Frequencies, Dataset_Summary, File_Speaker_Summary, and Run_Metadata. If no word list is provided, all types in the selected slice are counted (FREQ-style “all words” mode).

Usage

word_counts(
  word_list_file = NULL,
  output_file,
  words = NULL,
  collection = NULL,
  language = NULL,
  corpus = NULL,
  age = NULL,
  sex = NULL,
  role = NULL,
  role_exclude = NULL,
  wildcard = FALSE,
  collapse = c("none", "stem"),
  part_of_speech = NULL,
  tier = c("main", "mor"),
  normalize = FALSE,
  per = 1000L,
  zipf = FALSE,
  include_patterns = NULL,
  exclude_patterns = NULL,
  sort_by = c("word", "frequency"),
  min_count = 0L,
  freq_ignore_special = TRUE,
  db_version = "current",
  cache = FALSE,
  cache_dir = NULL,
  ...
)

Value

Invisibly returns output_file after writing the workbook.

Arguments

word_list_file

Optional path to a CSV file with a column named word. If NULL and words is also NULL, all types in the slice are counted.

output_file

Path to the output .xlsx file.

words

Optional character vector of target words/patterns. Ignored if word_list_file is provided. If both are NULL, all types are counted.

collection

Optional CHILDES collection filter.

language

Optional CHILDES language filter, e.g., "eng".

corpus

Optional CHILDES corpus filter, e.g., "Brown".

age

Optional numeric: single value or c(min, max) in months.

sex

Optional: "male" and/or "female".

role

Optional character vector of roles to include.

role_exclude

Optional character vector of roles to exclude.

wildcard

Logical; treat "%" as any number of characters and "_" as one character (token mode).
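The wildcard semantics described above can be illustrated with a small base R sketch. The helper childes_pattern_to_regex() is hypothetical and not part of childeswordfreq; it also assumes that "%" may match zero characters and that a pattern must match the whole word, neither of which this page states.

```r
# Hypothetical helper (not part of childeswordfreq): translate a CHILDES-style
# pattern into an anchored regular expression.
# Assumptions: "%" matches zero or more characters, "_" matches exactly one,
# and the pattern must cover the whole word.
childes_pattern_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  pieces <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"
    } else if (ch == "_") {
      "."
    } else if (grepl("[[:alnum:]]", ch)) {
      ch
    } else {
      paste0("\\", ch)  # escape punctuation so it is matched literally (PCRE)
    }
  }, character(1))
  paste0("^", paste(pieces, collapse = ""), "$")
}

words <- c("go", "going", "goes", "ago", "got")

# perl = TRUE because the escaping above relies on PCRE rules
starts_with_go <- words[grepl(childes_pattern_to_regex("go%"), words, perl = TRUE)]
go_plus_one    <- words[grepl(childes_pattern_to_regex("go_"), words, perl = TRUE)]
ends_with_go   <- words[grepl(childes_pattern_to_regex("%go"), words, perl = TRUE)]
```

Under these assumptions, "go%" selects every word beginning with "go", while "go_" selects only three-letter words beginning with "go".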

collapse

Either "none" or "stem". Using "stem" triggers token mode.

part_of_speech

Optional POS filter, e.g., c("n","v") (token mode).

tier

Which tier to count from: "main" or "mor".

normalize

Logical; if TRUE, add per-N rate columns.

per

Integer denominator for rates (for example 1000 for per-1k).
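As a rough illustration of what a per-N rate means, the sketch below scales a raw count by the size of the slice. The helper per_n_rate() and the choice of total tokens in the slice as the denominator are assumptions; this page does not say which total the package divides by.

```r
# Hypothetical helper (not part of childeswordfreq): rate per `per` tokens.
# Assumption: the denominator is the total number of tokens in the slice.
per_n_rate <- function(count, total_tokens, per = 1000L) {
  as.numeric(count) * per / total_tokens
}

counts       <- c(the = 120L, go = 30L)  # made-up counts for illustration
total_tokens <- 6000L                    # made-up slice size

rates_per_1k <- per_n_rate(counts, total_tokens, per = 1000L)
```

With these made-up numbers, "the" occurs 20 times per 1,000 tokens and "go" 5 times per 1,000 tokens.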

zipf

Logical; if TRUE, also add Zipf columns (log10 per-billion).
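A Zipf value in the sense given above is the base-10 logarithm of a word's frequency per billion tokens. The sketch below shows that arithmetic only; zipf_value() is a hypothetical helper, the denominator is assumed to be the total token count of the slice, and the package may handle zero counts or smoothing differently.

```r
# Hypothetical helper (not part of childeswordfreq): log10 of frequency per
# billion tokens. Assumption: total_tokens is the size of the selected slice.
zipf_value <- function(count, total_tokens) {
  per_billion <- as.numeric(count) * 1e9 / total_tokens
  log10(per_billion)  # a count of 0 yields -Inf in this unsmoothed sketch
}

# A word seen once in a million tokens occurs 1,000 times per billion tokens
zipf_rare   <- zipf_value(1, 1e6)
# A word seen 10,000 times in a million tokens occurs 1e7 times per billion
zipf_common <- zipf_value(1e4, 1e6)
```

On this scale each additional unit corresponds to a tenfold increase in relative frequency.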

include_patterns

Optional character vector of CHILDES-style patterns, using "%" and "_" to restrict output to matching words (FREQ-style +s).

exclude_patterns

Optional character vector of CHILDES-style patterns to drop from the output.

sort_by

Final sort order: "word" (alphabetical) or "frequency" (descending Total).

min_count

Integer; drop rows with Total < min_count (after counting).
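The combined effect of min_count and sort_by can be pictured on a small data frame. This is a sketch under assumptions: filter_and_sort() is a hypothetical helper, the data are made up, and breaking frequency ties alphabetically is a choice made here rather than documented behavior. Only the Total column name and the rule "drop rows with Total < min_count" come from this page.

```r
# Hypothetical helper (not part of childeswordfreq): apply min_count, then sort.
filter_and_sort <- function(freq, min_count = 0L, sort_by = c("word", "frequency")) {
  sort_by <- match.arg(sort_by)
  kept <- freq[freq$Total >= min_count, , drop = FALSE]  # drop Total < min_count
  ord <- if (sort_by == "word") {
    order(kept$word)                 # alphabetical
  } else {
    order(-kept$Total, kept$word)    # descending Total; ties by word (assumed)
  }
  kept <- kept[ord, , drop = FALSE]
  rownames(kept) <- NULL
  kept
}

freq <- data.frame(
  word  = c("the", "go", "zebra", "apple"),
  Total = c(120L, 30L, 1L, 4L),
  stringsAsFactors = FALSE
)

by_word <- filter_and_sort(freq, min_count = 2L, sort_by = "word")
by_freq <- filter_and_sort(freq, min_count = 2L, sort_by = "frequency")
```

In both results "zebra" is absent because its Total of 1 falls below min_count = 2.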

freq_ignore_special

Logical; if TRUE, drop "xxx", "www", and any word starting with 0, &, +, -, or # (FREQ default ignore rules).
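The ignore rules listed above amount to a simple membership and prefix test. The following base R sketch shows one way to express them; is_freq_special() is a hypothetical helper and not the package's internal implementation.

```r
# Hypothetical helper (not part of childeswordfreq): flag words that the
# description above says are dropped when freq_ignore_special = TRUE.
is_freq_special <- function(words) {
  words %in% c("xxx", "www") |      # unintelligible / untranscribed markers
    grepl("^[0&+#-]", words)        # words starting with 0, &, +, -, or #
}

tokens <- c("the", "xxx", "www", "0is", "&um", "+...", "-ing", "#pause", "go")
kept   <- tokens[!is_freq_special(tokens)]
```

Only "the" and "go" survive the filter in this made-up token list.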

db_version

CHILDES database version label to record in metadata.

cache

Logical; if TRUE, cache CHILDES queries on disk.

cache_dir

Optional cache directory when cache = TRUE.

...

Reserved for future extensions; currently unused.

Details

By default, the function uses exact type counts. It switches to token mode when wildcard = TRUE, collapse = "stem", or part_of_speech is supplied. Set tier = "mor" to count from the MOR tier instead of the main tier.
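The mode switch described above can be summarized as a single condition over three arguments. The sketch below is an illustration only; uses_token_mode() is a hypothetical helper, and the package's actual dispatch logic may differ.

```r
# Hypothetical helper (not part of childeswordfreq): would a given combination
# of arguments trigger token mode, according to the description above?
uses_token_mode <- function(wildcard = FALSE,
                            collapse = c("none", "stem"),
                            part_of_speech = NULL) {
  collapse <- match.arg(collapse)
  isTRUE(wildcard) || collapse == "stem" || !is.null(part_of_speech)
}

default_mode  <- uses_token_mode()                              # exact type counts
wildcard_mode <- uses_token_mode(wildcard = TRUE)
stem_mode     <- uses_token_mode(collapse = "stem")
pos_mode      <- uses_token_mode(part_of_speech = c("n", "v"))
```

Any one of the three options is enough to leave the default exact-type mode.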

Examples

if (FALSE) {
# Minimal example (not run during R CMD check)
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(word = c("the","go")), tmp_csv, row.names = FALSE)

out_file <- tempfile(fileext = ".xlsx")
word_counts(
  word_list_file = tmp_csv,
  output_file    = out_file,
  language       = "eng",
  corpus         = "Brown",
  age            = c(24, 26)
)

# All-words mode (no word list; counts every type in the slice)
out_all <- tempfile(fileext = ".xlsx")
word_counts(
  word_list_file = NULL,
  words          = NULL,
  output_file    = out_all,
  language       = "eng",
  corpus         = "Brown",
  age            = c(24, 26)
)
}
