edgemodelr (version 0.2.0)

edge_load_model: Load a local GGUF model for inference

Description

Loads a GGUF-format model file from disk and returns a context object used for local text generation.

Usage

edge_load_model(model_path, n_ctx = 2048L, n_gpu_layers = 0L,
  n_threads = NULL, flash_attn = TRUE)

Value

An external pointer to the loaded model context. Pass it to edge_completion() for text generation, and release it with edge_free_model() when finished.

Arguments

model_path

Path to a .gguf model file

n_ctx

Maximum context length (default: 2048)

n_gpu_layers

Number of layers to offload to GPU (default: 0, CPU-only)

n_threads

Number of CPU threads for inference (default: NULL = use all hardware threads). Set to a lower value to leave cores free for other tasks.

flash_attn

Enable flash attention for faster inference (default: TRUE). Reduces memory usage and improves speed. Set to FALSE for maximum compatibility.
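The arguments above can be combined to tune resource usage. A minimal sketch (the model path is hypothetical; assumes the package is installed) that follows the n_threads advice of leaving some cores free for other tasks:

```r
# Pick a thread count that leaves two cores free, per the n_threads advice.
total_cores <- parallel::detectCores()
n_threads <- max(1L, total_cores - 2L)

# Hypothetical path; replace with a real .gguf file.
model_path <- "~/models/model.Q4_K_M.gguf"
if (file.exists(model_path)) {
  ctx <- edge_load_model(
    model_path,
    n_ctx = 4096L,         # larger context window than the 2048 default
    n_gpu_layers = 0L,     # CPU-only; raise this to offload layers to a GPU build
    n_threads = n_threads,
    flash_attn = TRUE      # default; set FALSE for maximum compatibility
  )
  edge_free_model(ctx)
}
```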

Examples

if (FALSE) {
# Load a TinyLlama model (requires model file)
model_path <- "~/models/TinyLlama-1.1B-Chat.Q4_K_M.gguf"
if (file.exists(model_path)) {
  ctx <- edge_load_model(model_path, n_ctx = 2048)

  # Generate completion
  result <- edge_completion(ctx, "Explain R data.frame:", n_predict = 100)
  cat(result)

  # Load with explicit threading control
  ctx2 <- edge_load_model(model_path, n_threads = 4, flash_attn = TRUE)

  # Free both models when done
  edge_free_model(ctx)
  edge_free_model(ctx2)
}
}