flickr_caption_dataset: Flickr Caption Datasets

Description

Flickr8k and Flickr30k image captioning datasets.

Usage

flickr8k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
flickr30k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

Value

A torch dataset of class flickr8k_caption_dataset (from flickr8k_caption_dataset()) or flickr30k_caption_dataset (from flickr30k_caption_dataset()). Each element is a named list:

x: an H x W x 3 integer array representing an RGB image.
y: a character vector containing all five captions associated with the image.
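The element structure described above can be checked with a small helper. This is a minimal base R sketch: `check_caption_item()` is a hypothetical helper, not a function exported by the package, and the synthetic item stands in for a real dataset element.

```r
# Hypothetical helper (not part of the package): checks that an element
# matches the documented structure, i.e. an H x W x 3 image array `x`
# and a character vector `y` holding five captions.
check_caption_item <- function(item) {
  d <- dim(item$x)
  is.list(item) &&
    all(c("x", "y") %in% names(item)) &&
    length(d) == 3 && d[3] == 3 &&
    is.character(item$y) && length(item$y) == 5
}

# Synthetic stand-in for a real element: a 4 x 6 RGB image and five captions.
fake_item <- list(
  x = array(0L, dim = c(4, 6, 3)),
  y = paste("caption", 1:5)
)
check_caption_item(fake_item)  # TRUE
```

The helper checks only dimensions and lengths, not pixel types or value ranges.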

Arguments

root: Character. Root directory under which the dataset is stored, in a dataset-specific subdirectory (for example, root/flickr30k for Flickr30k). Default is tempdir().
train: Logical. If TRUE, loads the training split; if FALSE, loads the test split. Default is TRUE.
transform: Optional function to transform input images after loading. Default is NULL.
target_transform: Optional function to transform the target, i.e. the character vector of captions y. Default is NULL.
download: Logical. Whether to download the dataset if not found locally. Default is FALSE.
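The transform and target_transform arguments take R functions that are applied to x and y respectively. The sketch below assumes the default H x W x 3 image layout from the Value section. `to_channels_first()` and `first_caption()` are hypothetical helpers written for illustration, not functions exported by the package.

```r
# Hypothetical image transform: reorder an H x W x 3 array to 3 x H x W
# (channels first), the layout many torch models expect.
to_channels_first <- function(img) {
  aperm(img, c(3, 1, 2))
}

# Hypothetical target transform: keep only the first of the five captions,
# lower-cased.
first_caption <- function(captions) {
  tolower(captions[[1]])
}

img <- array(seq_len(2 * 3 * 3), dim = c(2, 3, 3))
dim(to_channels_first(img))                          # 3 2 3
first_caption(c("A dog runs.", "B", "C", "D", "E"))  # "a dog runs."

# Passing them to the constructor (not run; requires a download):
if (FALSE) {
  ds <- flickr8k_caption_dataset(
    transform = to_channels_first,
    target_transform = first_caption,
    download = TRUE
  )
}
```

Keeping a single caption is only one option. A training loop may instead want all five captions, in which case target_transform can stay NULL.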

Details

The Flickr8k and Flickr30k collections are image captioning datasets of roughly 8,000 and 30,000 color images, respectively, each paired with five human-annotated captions. The images are RGB with varying spatial resolutions, and both datasets are widely used to train and evaluate vision-language models.
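Because every image carries five captions, a common preprocessing step flattens the data into one (image, caption) pair per row, giving five times as many pairs as images. The following is a minimal base R sketch of that step. `expand_caption_pairs()` is a hypothetical helper, and the small caption list is synthetic rather than taken from either dataset.

```r
# Hypothetical helper: turn a list of per-image caption vectors into one row
# per (image, caption) pair.
expand_caption_pairs <- function(captions_list) {
  n_caps <- lengths(captions_list)
  data.frame(
    image_id = rep(seq_along(captions_list), times = n_caps),
    caption = unlist(captions_list, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

caps <- list(
  paste("image 1, caption", 1:5),
  paste("image 2, caption", 1:5)
)
pairs <- expand_caption_pairs(caps)
nrow(pairs)  # 10 (2 images x 5 captions)
```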

Examples


if (FALSE) {
# Load the Flickr8k caption dataset
flickr8k <- flickr8k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr8k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.

# Load the Flickr30k caption dataset
flickr30k <- flickr30k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr30k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.
}