arete (version 0.1)

get_geodata: Call a Large Language Model (LLM) to extract species geographic data

Description

Sends an API request to extract species data from a document. Currently only service = "GPT" is supported; more services are planned, including both proprietary and open-source models. Uses the API...

Usage

get_geodata(
  path,
  user_key,
  service = "GPT",
  model = "gpt-3.5",
  tax = NULL,
  outpath = NULL,
  outliers = FALSE,
  verbose = TRUE
)

Value

matrix. Contains the extracted information.

Arguments

path

character. Path to a file with species data in either pdf or txt format, e.g. "./folder/file.pdf".

user_key

list. Two elements: the first is a character string with the user's API key, the second is a logical indicating whether the user's account has access to premium features. Both free and premium keys are allowed.

service

character. Service to be used. Right now, only requests using OpenAI's ChatGPT are available.

model

character. Model name from given service to be used. You may use any of the models listed on OpenAI's developer platform. If you are unsure which model to use, we recommend picking "gpt-3.5" (default) or "gpt-4o", as these will pick our recommended model from that version.

tax

character. Binomial name of the species to restrict extraction to. Supplying it most often improves the model's performance.

outpath

character. Path to save output to, in the format "path/to/file/file_prefix".

outliers

logical. Whether or not results should be processed using the methods described in gecko::outliers.detect().

verbose

logical. Whether output should be printed.
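
The descriptions of path and user_key above imply a few constraints that can be checked before a request is sent. The following is a minimal sketch in base R, not part of arete: the helper name check_geodata_args is hypothetical, and the element names key and premium are taken from the example call on this page rather than from a documented requirement. Whether get_geodata performs similar checks itself is not stated here.

```r
# Hypothetical helper (not part of arete): checks that the arguments
# match the descriptions in the Arguments section before calling
# get_geodata().
check_geodata_args <- function(path, user_key) {
  # File extension, lower-cased so "FILE.PDF" is also accepted
  ext <- tolower(tools::file_ext(path))
  stopifnot(
    ext %in% c("pdf", "txt"),      # only pdf or txt documents
    is.list(user_key),             # user_key must be a list
    length(user_key) == 2,         # with exactly two elements
    is.character(user_key[[1]]),   # first: the API key
    is.logical(user_key[[2]])      # second: premium access flag
  )
  invisible(TRUE)
}

# Assumed element names, following the example call on this page
user_key <- list(key = "your key here", premium = FALSE)
check_geodata_args("./folder/file.pdf", user_key)
```

The check only inspects the file name and the shape of the list; it does not confirm that the file exists or that the key is valid.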

See Also

arete_setup

Examples

if (FALSE) {
  # Path to the example document returned by arete_data()
  file_path <- arete_data("holzapfelae")

  get_geodata(
    path = file_path,
    user_key = list(key = "your key here", premium = TRUE),
    model = "gpt-4o",
    outpath = "./out"
  )
}
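
The outpath argument takes a prefix of the form "path/to/file/file_prefix", so the directory part has to be writable when results are saved. This page does not say whether get_geodata creates missing directories, so the sketch below assumes it may not and creates the directory beforehand. The helper name prepare_outpath and the "arete_results" folder are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of arete): make sure the directory
# part of an outpath prefix exists before it is passed to get_geodata().
prepare_outpath <- function(outpath) {
  out_dir <- dirname(outpath)   # "path/to/file" from "path/to/file/file_prefix"
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  invisible(outpath)
}

# Illustrative prefix inside the session's temporary directory
outpath <- file.path(tempdir(), "arete_results", "holzapfelae")
prepare_outpath(outpath)
```

The returned value is the unchanged prefix, so the call can be placed directly in the outpath argument if preferred.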
