qdap (version 2.2.4)

bracketX: Bracket Parsing

Description

  • bracketX - Apply bracket removal to character vectors.
  • bracketXtract - Apply bracket extraction to character vectors.
  • genX - Apply general chunk removal to character vectors. A generalized version of bracketX.
  • genXtract - Apply general chunk extraction to character vectors. A generalized version of bracketXtract.
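To make the removal behaviour concrete, here is a minimal base-R sketch of what bracket removal does for the four bracket types. It is not qdap's implementation: strip_brackets is a hypothetical helper, it assumes brackets are not nested, and its fix.space step only collapses repeated whitespace. It does not run scrubber, so a space left before punctuation (as in "chicken !") is kept.

```r
# Hypothetical helper (not part of qdap): base-R sketch of bracket removal.
# Assumes brackets are not nested.
strip_brackets <- function(text, bracket = "all", fix.space = TRUE) {
  # One PCRE pattern per bracket type: opening bracket, any characters other
  # than the closing bracket, then the closing bracket.
  patterns <- c(
    curly  = "\\{[^}]*\\}",
    square = "\\[[^\\]]*\\]",
    round  = "\\([^)]*\\)",
    angle  = "<[^>]*>"
  )
  bracket <- match.arg(bracket, c(names(patterns), "all"), several.ok = TRUE)
  types <- if ("all" %in% bracket) names(patterns) else bracket
  for (p in patterns[types]) {
    text <- gsub(p, "", text, perl = TRUE)
  }
  if (fix.space) {
    # Collapse the runs of whitespace left behind and trim both ends.
    # (qdap's scrub step does more cleaning than this.)
    text <- trimws(gsub("\\s+", " ", text))
  }
  text
}

examp_text <- c("I love chicken [unintelligible]!",
                "Me too! (laughter) It's so good.[interrupting]",
                "Yep it's awesome {reading}.", "Agreed. {is so much fun}")
strip_brackets(examp_text, c("square", "round"))
```

Only the requested types are touched, so the curly-bracketed text in the last two elements survives this call.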

Usage

bracketX(text.var, bracket = "all", missing = NULL, names = FALSE,
  fix.space = TRUE, scrub = fix.space)

bracketXtract(text.var, bracket = "all", with = FALSE, merge = TRUE)

genX(text.var, left, right, missing = NULL, names = FALSE,
  fix.space = TRUE, scrub = TRUE)

genXtract(text.var, left, right, with = FALSE, merge = TRUE)

Arguments

text.var
The text variable
bracket
The type of bracket (and encased text) to remove. This is one or more of the strings "curly", "square", "round", "angle" and "all". These strings correspond to: {, [, (, < or all four types.
missing
Value to assign to empty cells.
names
logical. If TRUE the sentences are given as the names of the counts.
fix.space
logical. If TRUE extra spaces left behind from an extraction will be eliminated. Additionally, a missing space (e.g., "text(no space between text and parenthesis)") is replaced with a single space (e.g., "text (space between text and parenthesis)").
scrub
logical. If TRUE scrubber will clean the text.
with
logical. If TRUE returns the brackets and the bracketed text.
merge
logical. If TRUE the results of each bracket type will be merged by sentence. FALSE returns a named list of lists of vectors of bracketed text per bracket type.
left
A vector of character or numeric symbols as the left edge to extract.
right
A vector of character or numeric symbols as the right edge to extract.
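The left and right arguments generalize brackets to arbitrary delimiter pairs. The sketch below is a hypothetical base-R stand-in (extract_between is not a qdap function): it treats each delimiter as a literal string via PCRE's \Q...\E quoting, matches non-greedily up to the first right edge, and returns one character vector per input element. Matches are grouped by delimiter pair rather than by position in the text; that ordering is an assumption of the sketch, not a documented property of genXtract.

```r
# Hypothetical helper (not part of qdap): base-R sketch of delimiter-based
# extraction in the spirit of genXtract. Delimiters are matched literally.
extract_between <- function(text, left, right, with = FALSE) {
  stopifnot(length(left) == length(right))
  left <- as.character(left)    # numeric delimiters such as 98 become "98"
  right <- as.character(right)
  lapply(text, function(s) {
    out <- character(0)
    for (i in seq_along(left)) {
      # \Q...\E quotes the delimiters (PCRE); .*? stops at the first right edge.
      p <- paste0("\\Q", left[i], "\\E.*?\\Q", right[i], "\\E")
      m <- regmatches(s, gregexpr(p, s, perl = TRUE))[[1]]
      if (!with && length(m) > 0) {
        # Drop the delimiters, keeping only the enclosed text.
        m <- substr(m, nchar(left[i]) + 1, nchar(m) - nchar(right[i]))
      }
      out <- c(out, m)
    }
    out
  })
}

x <- c("Where is the /big dog#?",
       "I think he's @arunning@b with /little cat#.")
extract_between(x, c("/", "@a"), c("#", "@b"))
extract_between(x, c("/", "@a"), c("#", "@b"), with = TRUE)
```

Setting with = TRUE keeps the delimiters, mirroring the with argument described above.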

Value

  • bracketX - returns a vector of text with brackets removed.
  • bracketXtract - returns a list of vectors of bracketed text.
  • genX - returns a vector of text with chunks removed.
  • genXtract - returns a list of vectors of extracted text.
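The difference between merge = TRUE and merge = FALSE is mostly one of shape. As a rough illustration, here is a hypothetical extract_brackets helper (not bracketXtract itself; it has no "all" option, defaulting instead to all four types, and it ignores nesting). Merging combines the per-type results element by element, while not merging keeps a named list with one entry per bracket type.

```r
# Hypothetical helper (not bracketXtract itself): shows the result shapes
# produced with and without merging. Assumes brackets are not nested.
extract_brackets <- function(text, bracket = c("curly", "square", "round", "angle"),
                             merge = TRUE) {
  delims <- list(curly = c("{", "}"), square = c("[", "]"),
                 round = c("(", ")"), angle = c("<", ">"))
  bracket <- match.arg(bracket, several.ok = TRUE)
  # One entry per bracket type; each entry is a list holding one character
  # vector of enclosed text per element of `text`.
  per_type <- lapply(delims[bracket], function(d) {
    p <- paste0("\\Q", d[1], "\\E(.*?)\\Q", d[2], "\\E")
    lapply(regmatches(text, gregexpr(p, text, perl = TRUE)),
           function(m) sub(p, "\\1", m, perl = TRUE))
  })
  if (!merge) return(per_type)
  # merge = TRUE: combine the bracket types element by element.
  lapply(seq_along(text), function(i)
    unlist(lapply(per_type, `[[`, i), use.names = FALSE))
}

examp_text <- c("I love chicken [unintelligible]!",
                "Me too! (laughter) It's so good.[interrupting]",
                "Yep it's awesome {reading}.", "Agreed. {is so much fun}")
merged <- extract_brackets(examp_text, c("square", "round"))
split <- extract_brackets(examp_text, c("square", "round"), merge = FALSE)
```

Here split$square[[2]] holds "interrupting" and split$round[[2]] holds "laughter". In merged[[2]], the two values appear together in the order the types were requested.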

References

http://stackoverflow.com/q/8621066/1000343

See Also

regex

Examples

examp <- structure(list(person = structure(c(1L, 2L, 1L, 3L),
    .Label = c("bob", "greg", "sue"), class = "factor"), text =
    c("I love chicken [unintelligible]!",
    "Me too! (laughter) It's so good.[interrupting]",
    "Yep it's awesome {reading}.", "Agreed. {is so much fun}")), .Names =
    c("person", "text"), row.names = c(NA, -4L), class = "data.frame")

examp
bracketX(examp$text, "square")
bracketX(examp$text, "curly")
bracketX(examp$text, c("square", "round"))
bracketX(examp$text)


bracketXtract(examp$text, "square")
bracketXtract(examp$text, "curly")
bracketXtract(examp$text, c("square", "round"))
bracketXtract(examp$text, c("square", "round"), merge = FALSE)
bracketXtract(examp$text)
bracketXtract(examp$text, with = TRUE)

paste2(bracketXtract(examp$text, "curly"), " ")

x <- c("Where is the /big dog#?",
    "I think he's @arunning@b with /little cat#.")
genXtract(x, c("/", "@a"), c("#", "@b"))

x <- c("Where is the L1big dogL2?",
    "I think he's 98running99 with L1little catL2.")
genXtract(x, c("L1", 98), c("L2", 99))

DATA$state  #notice number 1 and 10
genX(DATA$state, c("is", "we"), c("too", "on"))
