
midfieldr (version 1.0.1)

filter_cip: Subset rows that include matches to search strings

Description

Subset a CIP data frame, retaining rows that match or partially match a vector of character strings. Columns are not subset unless specified with the optional select argument.

Usage

filter_cip(keep_text = NULL, ..., drop_text = NULL, cip = NULL, select = NULL)

Value

A data.table subset of cip with the following properties:

  • Rows matching elements of keep_text but excluding rows matching elements of drop_text.

  • All columns or those specified by select.

  • Grouping structures are not preserved.

Arguments

keep_text

Character vector of search text for retaining rows, not case-sensitive. Can be empty if drop_text is used.

...

Not used. Forces later arguments to be used by name.

drop_text

Optional character vector of search text for dropping rows, default NULL.

cip

Data frame to be searched. If NULL (the default), the cip data set is searched.

select

Optional character vector of column names to return, default all columns.

Details

Search terms can include regular expressions. Because the search uses grepl(), non-character columns (if any) that can be coerced to character are also searched for matches. Columns are subset by the values in select only after the row search concludes, so a match found in a column that select later drops still retains its row.

If none of the optional arguments are specified, the function returns the original data frame.
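The keep, drop, then select order described above can be illustrated with a small base R sketch. This is not the midfieldr source code: the helpers rows_matching() and filter_sketch() and the toy_cip data frame are hypothetical names introduced here, and joining the search strings with "|" is an assumption about how multiple patterns might be combined.

```r
# Illustrative sketch only, NOT the midfieldr implementation.
# toy_cip is a small stand-in for a CIP table (rows chosen for illustration).
toy_cip <- data.frame(
  cip6     = c("140801", "150201", "540101"),
  cip6name = c("Civil Engineering, General",
               "Civil Engineering Technology",
               "History, General"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row in which any column matches
# any of the patterns (case-insensitive regular expressions).
rows_matching <- function(df, patterns) {
  pattern <- paste(patterns, collapse = "|")  # assumption: OR the patterns
  hits <- lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  Reduce(`|`, hits)
}

# Hypothetical helper: keep rows, then drop rows, then select columns.
filter_sketch <- function(df, keep_text = NULL, drop_text = NULL, select = NULL) {
  if (!is.null(keep_text)) df <- df[rows_matching(df, keep_text), , drop = FALSE]
  if (!is.null(drop_text)) df <- df[!rows_matching(df, drop_text), , drop = FALSE]
  if (!is.null(select))    df <- df[, select, drop = FALSE]
  df
}

# Keep rows mentioning "civil", then drop those mentioning "technology".
kept <- filter_sketch(toy_cip, keep_text = "civil", drop_text = "technology")
```

With no optional arguments, filter_sketch(toy_cip) returns toy_cip unchanged, mirroring the behavior stated above for filter_cip().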

Examples

# Subset using keywords
filter_cip(keep_text = "engineering")

# Multiple passes to narrow the results
first_pass <- filter_cip("civil")
second_pass <- filter_cip("engineering", cip = first_pass)
filter_cip(drop_text = "technology", cip = second_pass)

# drop_text argument, when used, must be named
filter_cip("civil engineering", drop_text = "technology")

# Subset using numerical codes
filter_cip(keep_text = c("050125", "160501"))

# Subset using regular expressions
filter_cip(keep_text = "^54")
filter_cip(keep_text = c("^1407", "^1408"))

# Select columns
filter_cip(keep_text = "^54", select = c("cip6", "cip4name"))
