custom_grep: Retrieve Text Between XML Tags

Description

Extract text from a string containing XML or HTML tags. Text enclosed by the tags of interest is returned. If multiple tagged substrings are found, they are returned as separate elements of a list or character vector.

Usage

custom_grep(xml_data, tag, format = "list")

Value

List or vector where each element corresponds to an in-tag substring.

Arguments

xml_data: String (of class character and length 1): corresponds to the PubMed record or any string including XML/HTML tags.
tag: String (of class character and length 1): the tag of interest, given without the enclosing < and > characters.
format: c("list", "char"): specifies the output format, either a list ("list") or a character vector ("char").
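The behavior described above can be illustrated with a minimal base-R sketch. This is not the easyPubMed implementation: `extract_between_tags()` and `demo_string` are hypothetical names introduced only for illustration, and the sketch assumes that `tag` contains no regular-expression metacharacters and that the tagged text does not span multiple lines.

```r
# Hypothetical helper (NOT part of easyPubMed): a base-R sketch of
# extracting the text enclosed by a given XML/HTML tag.
extract_between_tags <- function(xml_data, tag, format = c("list", "char")) {
  format <- match.arg(format)
  stopifnot(is.character(xml_data), length(xml_data) == 1,
            is.character(tag), length(tag) == 1)

  # Match <tag> or <tag attr="...">, then lazily capture up to </tag>
  pattern <- paste0("<", tag, "(\\s[^>]*)?>(.*?)</", tag, ">")
  hits <- regmatches(xml_data, gregexpr(pattern, xml_data, perl = TRUE))[[1]]

  # Drop the opening and closing tags, keeping only the inner text
  inner <- sub(pattern, "\\2", hits, perl = TRUE)

  if (format == "list") as.list(inner) else inner
}

# Assumed input: a string with two <strong> elements
demo_string <- paste0(
  "The <strong>itsy bitsy</strong> spider ",
  "went up the <strong>water spout</strong>.")

res_char <- extract_between_tags(demo_string, tag = "strong", format = "char")
res_list <- extract_between_tags(demo_string, tag = "strong", format = "list")
print(res_char)
```

With `format = "char"` the sketch returns a character vector with one element per tagged substring; with `format = "list"` the same substrings are wrapped in a list. A string without the requested tag yields an empty result in either format.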

Author

Damiano Fantini damiano.fantini@gmail.com

Details

The `custom_grep()` function is now obsolete. It is a helper function that will be replaced by `easyPubMed:::EPM_custom_grep()`, an internal function that will not be exported. `custom_grep()` will be retired in 2026.

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

try({
  ## extract substrings based on regular expressions
  string_01 <- paste0(
    "The itsy bitsy spider ", 
    "Went up the water spout. Down came the rain ", 
    "And washed the spider out.")
  print(string_01)
  custom_grep(xml_data = string_01, tag = "strong", format = "char")
  custom_grep(xml_data = string_01, tag = "strong", format = "list")
}, silent = TRUE)