explore: Launch Shiny app for exploration of text collection

Description

Launch Shiny app for exploration of text collection. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).

explore() explores a 'corporaexplorerobject' created with the prepare_data() function. App settings can optionally be specified in the arguments to explore().

explore0() is a convenience function for exploring a data frame or character vector directly, without first creating a corporaexplorerobject using prepare_data(); the object is instead created on the fly as the app launches. It is functionally equivalent to explore(prepare_data(dataset, use_matrix = FALSE)).

Usage

explore(
  corpus_object,
  search_options = list(),
  ui_options = list(),
  search_input = list(),
  plot_options = list(),
  ...
)

explore0(
  dataset,
  arguments_prepare_data = list(use_matrix = FALSE),
  arguments_explore = list()
)

Arguments

corpus_object

A corporaexplorerobject created by prepare_data.

search_options

List. Specify how search operations in the app are carried out. Available options:

use_matrix Logical. If the corporaexplorerobject contains a document term matrix, should it be used for searches? (See prepare_data.) Defaults to TRUE.
regex_engine Character. Specify regular expression engine to be used (defaults to "default"). Available options:
- "default": use the re2r package (https://github.com/qinwf/re2r) for simple searches and the stringr package (https://github.com/tidyverse/stringr) for complex regexes (i.e. when special regex characters are used).
- "stringr": use stringr for all searches.
- "re2r": use re2r for all searches.
optional_info Logical. If TRUE, information about the search method (regex engine and whether the search was conducted in the document term matrix or in the full text documents) is displayed.
allow_unreasonable_patterns Logical. If FALSE, the default, the app will not allow patterns that would result in an enormous number of hits or lead to a very slow search. (Examples of such patterns include '.' and '\b'.)
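
Why a pattern such as '.' counts as unreasonable can be seen in a small sketch. It uses base R's regex functions purely for illustration, not the re2r or stringr engines the app relies on; the sample text and the helper count_hits() are made up for this example and are not part of corporaexplorer.

```r
# Made-up sample text for illustration.
text <- "This is a document about January."

# Hypothetical helper: count the matches of a pattern in a single string.
count_hits <- function(pattern, x) {
  m <- gregexpr(pattern, x)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# A specific term produces a single hit ...
hits_specific <- count_hits("document", text)

# ... whereas '.' matches every character in the text.
hits_any_char <- count_hits(".", text)
```

In a large corpus, the second kind of pattern produces one hit per character, which is the situation allow_unreasonable_patterns = FALSE guards against.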

ui_options

List. Specify custom app settings (see example below). Currently available:

font_size. Character string specifying font size in document view, e.g. "10px".

search_input

List. Gives the opportunity to pre-populate the following sidebar fields (see example below):

search_terms: The 'Term(s) to chart and highlight' field. Character vector with maximum length 5.
highlight_terms: The 'Additional terms for text highlighting' field. Character vector.
filter_terms: The 'Filter corpus?' field. Character vector.
case_sensitivity: Should the 'Case sensitive search' box be checked? Logical.
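
As a rough sketch of how the four fields fit together, the list can be built and checked before the app is launched. The terms and the small data frame below are invented for illustration; only the list element names and the functions explore() and prepare_data() come from this page.

```r
# Pre-populate the sidebar fields; all terms are illustrative.
search_input <- list(
  search_terms = c("document", "January"),  # at most 5 terms
  highlight_terms = "about",
  filter_terms = "not",
  case_sensitivity = FALSE
)

# The documented limit on search_terms can be checked up front.
stopifnot(length(search_input$search_terms) <= 5)

# Launch only in an interactive session with the package installed.
if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  df <- data.frame(
    Date = as.Date("2020-01-01") + 0:2,
    Text = paste("This is a document about", month.name[1:3])
  )
  corpus <- corporaexplorer::prepare_data(df, corpus_name = "Sketch corpus")
  corporaexplorer::explore(corpus, search_input = search_input)
}
```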

plot_options

List. Specify custom plot settings (see example below). Currently available:

max_docs_in_wall_view. Integer specifying the maximum number of documents to be rendered in the 'document wall' view. Default value is 12000.
plot_size_factor. Numeric. Tweaks the corpus map plot's height. Value > 1 increases height, value < 1 decreases height. Ignored if value <= 0.
documents_per_row_factor. Numeric. Tweaks the number of documents included in each row in 'document wall' view. Value > 1 increases number of documents, value < 1 decreases number of documents. Ignored if value <= 0.
document_tiles. Integer specifying the number of tiles used in the tile chart representing occurrences of terms in document. Ignored if value < 1 or if value > 50.
colours. Character vector of length 1 to 6. Specify the order of the colours used to represent search (and highlight) terms in plots and documents. The default order and available colours are defined by the character vector c("red", "blue", "green", "purple", "orange", "gray"). Passing e.g. plot_options = list(colours = c("gray", "green")) will change that order to c("gray", "green", "red", "blue", "purple", "orange"). Arguments with duplicated colours or with colours not present in the default character vector will be ignored.
tile_length. Either "scaled" or "uniform". With "scaled", the default, the length of the tiles in document wall view and day corpus view will vary according to length of document (see the tile_length_range argument in prepare_data()). If "uniform", all tiles will be of equal length.
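
The reordering rule described for colours can be reproduced in a few lines of base R. This is only a sketch of the documented behaviour, not the package's actual implementation; reorder_colours() is a hypothetical helper introduced here.

```r
# Default colour order as stated in the documentation.
default_colours <- c("red", "blue", "green", "purple", "orange", "gray")

# Hypothetical helper (not part of corporaexplorer): move the requested
# colours to the front and keep the remaining defaults in their usual order.
reorder_colours <- function(custom, default = default_colours) {
  # Mirror the documented rule: input with duplicated or unknown colours
  # is ignored, so the default order is returned unchanged.
  if (anyDuplicated(custom) > 0 || !all(custom %in% default)) {
    return(default)
  }
  c(custom, setdiff(default, custom))
}

colour_order <- reorder_colours(c("gray", "green"))
```

Here colour_order matches the order given in the colours entry for plot_options = list(colours = c("gray", "green")).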

...

Other arguments passed to runApp in the Shiny package.

dataset

Data frame or character vector as specified in prepare_data().

arguments_prepare_data

List. Arguments to be passed to prepare_data() in order to override this function's default argument values.

arguments_explore

List. Arguments to be passed to explore() in order to override this function's default argument values.

Value

Launches a Shiny app.

Details

For explore0(): by default, no document term matrix will be generated, meaning that the data will be prepared for exploration faster than by using the default settings in prepare_data(), but also that searches in the app are likely to be slower.
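
The trade-off can be made explicit through arguments_prepare_data. The sketch below assumes only what this page states: use_matrix is an argument of prepare_data(), and explore0() sets it to FALSE by default. The sample texts are invented.

```r
# Invented sample texts.
texts <- paste("This is a document about", month.name)

# explore0() default: no document term matrix, so preparation is faster.
fast_prep <- list(use_matrix = FALSE)

# Override: build the matrix, trading slower preparation for
# potentially faster searches in the app.
matrix_prep <- list(use_matrix = TRUE)

# Launch only in an interactive session with the package installed.
# The second app starts once the first has been stopped.
if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  corporaexplorer::explore0(texts, arguments_prepare_data = fast_prep)
  corporaexplorer::explore0(texts, arguments_prepare_data = matrix_prep)
}
```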

Examples

# NOT RUN {
library(corporaexplorer)

# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if (interactive()) {

# Running exploration app:
explore(corpus)
explore(corpus,
        search_options = list(optional_info = TRUE),
        ui_options = list(font_size = "10px"),
        search_input = list(search_terms = c("Tottenham", "Spurs")),
        plot_options = list(max_docs_in_wall_view = 12001,
                            colours = c("gray", "green")))

# Running app to extract documents:
run_document_extractor(corpus)
}
if (interactive()) {

explore0(rep(sample(LETTERS), 10))

explore0(rep(sample(LETTERS), 10),
  arguments_explore = list(search_input = list(search_terms = "Z"))
)

}
# }
