phrase_counts: Count phrase matches in CHILDES utterances (experimental)

Description

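Matches surface phrases in the main utterance text and returns per-phrase counts, along with a dataset summary and run metadata. When wildcard = TRUE, phrases may contain simple wildcards: * (any characters) and ? (exactly one character). When normalize = TRUE, counts are also reported as rates per per_utts utterances.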

Usage

phrase_counts(
  phrases,
  collection = NULL,
  language = NULL,
  corpus = NULL,
  age = NULL,
  sex = NULL,
  role = NULL,
  role_exclude = NULL,
  wildcard = FALSE,
  ignore_case = TRUE,
  normalize = FALSE,
  per_utts = 10000L,
  db_version = "current",
  cache = FALSE,
  cache_dir = NULL,
  output_file = NULL
)

Value

If output_file is NULL, returns a tibble of phrase counts; otherwise writes an Excel file and returns the file path (invisibly).

Arguments

phrases: Character vector of phrases or patterns.
collection, language, corpus, age, sex, role, role_exclude: CHILDES filters.
wildcard: Logical; enable * and ? in phrases.
ignore_case: Logical; case-insensitive matching.
normalize: Logical; if TRUE, add rates per per_utts utterances.
per_utts: Integer; number of utterances that rates are expressed per (default 10000).
db_version: CHILDES DB version to query (default "current"); recorded in the run metadata.
cache: Logical; cache CHILDES queries on disk.
cache_dir: Optional cache directory.
output_file: Optional .xlsx path; if NULL, returns a tibble.
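
The meaning of wildcard, ignore_case, normalize and per_utts can be illustrated with a small base-R sketch. This is not the implementation used by phrase_counts(): the helper glob_to_regex() and the toy utterances are hypothetical, and counting utterances that contain at least one match (rather than counting every match) is an assumption made for the illustration.

```r
# Hypothetical helper (not part of the package): turn a phrase containing
# * and ? into a regular expression, escaping all other punctuation.
glob_to_regex <- function(phrase) {
  chars <- strsplit(phrase, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "*") {
      ".*"                      # * stands for any run of characters
    } else if (ch == "?") {
      "."                       # ? stands for exactly one character
    } else if (grepl("[[:alnum:][:space:]]", ch)) {
      ch                        # letters, digits and spaces are kept as-is
    } else {
      paste0("\\", ch)          # other punctuation is matched literally
    }
  }, character(1))
  paste(parts, collapse = "")
}

# Toy data standing in for CHILDES utterance text.
utterances <- c("I want to go", "do you want it", "She wanted to play", "no more")

pattern <- glob_to_regex("want* to")

# ignore.case = TRUE mirrors ignore_case = TRUE.
hits <- grepl(pattern, utterances, ignore.case = TRUE, perl = TRUE)

# Assumption: one count per utterance that contains a match.
count <- sum(hits)

# Rate per per_utts utterances, as produced when normalize = TRUE.
per_utts <- 10000L
rate <- count / length(utterances) * per_utts
```

With these four utterances, the pattern matches "I want to go" and "She wanted to play", so the count is 2 and the rate is 5000 per 10000 utterances.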

Details

Tier targeting is not applied in phrase mode. Phrases are matched in the main utterance text. For tier-constrained contexts around words, use contexts_for(..., mode = "word", tier = "mor").
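
Examples

The calls below sketch typical usage with the arguments listed under Usage. The filter values ("Eng-NA", "Brown", "CHI") are assumptions about what the CHILDES filters accept and may need adjusting; the phrases and the output file name are arbitrary. The calls query CHILDES, so they are guarded and only run when phrase_counts() is available in the session.

```r
# Example phrases; any character vector works.
phrases <- c("want to", "going to", "I don't know")

# Guard so the snippet is harmless when the function is not attached.
if (exists("phrase_counts", mode = "function")) {
  # Counts for child utterances in one corpus, with rates per 10000 utterances.
  # Filter values are assumed, not taken from the documentation above.
  counts <- phrase_counts(
    phrases,
    collection = "Eng-NA",
    corpus = "Brown",
    role = "CHI",
    normalize = TRUE,
    per_utts = 10000L
  )
  print(counts)

  # Wildcard patterns, written to an Excel file instead of returned as a tibble.
  out_path <- phrase_counts(
    c("want* to", "wh?t"),
    corpus = "Brown",
    role = "CHI",
    wildcard = TRUE,
    output_file = file.path(tempdir(), "phrase_counts.xlsx")
  )
}
```

Because the file path is returned invisibly when output_file is set, assign the result (as with out_path above) if the path is needed later.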