arete (version 0.1)

create_training_data: Create training data for GPT

Description

Opens WebAnno TSV files that follow the RECODE structure and builds training data for large language models in one of several export formats.

Usage

create_training_data(
  input,
  prompt = NULL,
  service = "GPT",
  aggregate = TRUE,
  export_type = "jsonl",
  out_path = NULL
)

Value

matrix / data.frame

Arguments

input

character or list. Either a character vector of paths to WebAnno TSV 3.3 files, from which both the text and the annotated data are taken, or a list with two elements: 1) paths to .txt or .pdf files (e.g. "./folder/file.pdf") from which the text is taken, and 2) paths to WebAnno TSV 3.3 files from which the annotation data is taken.

prompt

character. Custom prompt attached to each text when the training data is built. If NULL (the default), a built-in default prompt is used.

service

character. Service to be used. Currently only "GPT" is available.

aggregate

logical. If TRUE and export_type is "csv", a single csv is created.

export_type

character. Either "jsonl" or "csv". If "jsonl", a single file is created in which each line is a JSON object specifying the input (prompt and text) and the expected output (data).

out_path

character. Path where the training data will be saved.
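
To make the "jsonl" layout described under export_type more concrete, the following is a minimal base R sketch of a file in which each line is one JSON object pairing an input (prompt and text) with an expected output. It is not the code arete uses: the field names "input" and "output", the helper names json_string() and make_jsonl_line(), and the sample strings are all assumptions made for illustration, and the escaper only covers the most common characters.

```r
# Illustrative sketch only; NOT arete's implementation.
# Field names ("input", "output") and helper names are hypothetical.

# Minimal JSON string escaper: handles backslash, double quote,
# newline and tab only (other control characters are not covered).
json_string <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)  # backslash first
  x <- gsub("\"", "\\\"", x, fixed = TRUE)
  x <- gsub("\n", "\\n", x, fixed = TRUE)
  x <- gsub("\t", "\\t", x, fixed = TRUE)
  paste0("\"", x, "\"")
}

# One training example becomes one line: prompt and text form the
# input, the annotated data forms the expected output.
make_jsonl_line <- function(prompt, text, data) {
  paste0(
    "{\"input\": ", json_string(paste(prompt, text)),
    ", \"output\": ", json_string(data), "}"
  )
}

# Made-up example records.
lines <- c(
  make_jsonl_line("Extract species and locations:",
                  "Found \"Apis mellifera\"\nnear Lisbon.",
                  "Apis mellifera | Lisbon"),
  make_jsonl_line("Extract species and locations:",
                  "No records in this paragraph.",
                  "")
)

# Write one JSON object per line, then read the file back.
out_file <- tempfile(fileext = ".jsonl")
writeLines(lines, out_file)
read_back <- readLines(out_file)
```

Because embedded newlines are escaped, every record stays on a single physical line, which is what lets a consumer process the file line by line.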

Examples

example = system.file("extdata/insecta_annot_1.tsv", package = "arete")

create_training_data(input = example, service = "GPT", export_type = "jsonl")
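
The input argument also accepts a two-element list, with text files first and annotation files second. The sketch below shows one way such a list might be assembled. The file paths are hypothetical placeholders, and the assumption that both vectors have the same length and are in matching order is not stated in the documentation; the call itself only runs when arete is installed and the files exist.

```r
# Hypothetical paths; replace with real files.
text_paths  <- c("./folder/file_1.pdf", "./folder/file_2.pdf")
annot_paths <- c("./folder/file_1.tsv", "./folder/file_2.tsv")

# Assumption: one annotation file per text file, in the same order.
stopifnot(length(text_paths) == length(annot_paths))

# Element 1: text sources (.txt or .pdf); element 2: WebAnno TSV 3.3 files.
input <- list(text_paths, annot_paths)

# Only attempt the call when the package and all files are available.
if (requireNamespace("arete", quietly = TRUE) &&
    all(file.exists(unlist(input)))) {
  training <- arete::create_training_data(
    input = input,
    service = "GPT",
    export_type = "jsonl"
  )
}
```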