# Usage examples for model_load(). Wrapped in `if (FALSE)` (the standard
# \dontrun idiom) because they require a local GGUF file, network access,
# and/or a GPU, so they must never execute during checks.
if (FALSE) {
  # Load a local GGUF model from disk
  model <- model_load("/path/to/my_model.gguf")

  # Download from Hugging Face and cache locally
  # (fixed: use `<-`, not `=`, for assignment)
  hf_path <- "https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf"
  model <- model_load(hf_path)

  # Load with GPU acceleration (offload 10 layers)
  model <- model_load("/path/to/model.gguf", n_gpu_layers = 10)

  # Download to a custom cache directory
  model <- model_load(hf_path,
                      cache_dir = file.path(tempdir(), "my_models"))

  # Force a fresh download, ignoring any cached copy
  model <- model_load(hf_path,
                      force_redownload = TRUE)

  # High-performance settings for large models
  model <- model_load("/path/to/large_model.gguf",
                      n_gpu_layers = -1,   # All layers on GPU
                      use_mlock = TRUE)    # Lock model in memory

  # Load with reduced verbosity (quiet mode)
  model <- model_load("/path/to/model.gguf", verbosity = 2L)
}
# Run the code above in your browser using DataLab