VectrixDB

Overview

Zero config. Text in, results out. Pure R - no Python required.

VectrixDB is a lightweight, pure R vector database with built-in text embeddings. No external services, no API keys, no Python: install the package with its R dependencies and start searching.

Installation

# Install from GitHub
devtools::install_github("knowusuboaky/vectrixdb-r")

# Required dependencies
install.packages(c("R6", "text2vec", "digest", "Matrix"))

# Optional for better performance
install.packages(c("RcppAnnoy", "word2vec", "stopwords"))

Quick Start

library(VectrixDB)

# Create and add - ONE LINE
db <- Vectrix$new("my_docs")$add(c("Python is great", "Machine learning is fun", "R is awesome"))

# Search - ONE LINE
results <- db$search("programming")

# Get top result
print(results$top()$text)
#> "R is awesome"

Features

Pure R - No Python Required

Unlike other vector databases, VectrixDB is 100% R:

- TF-IDF embeddings - Built-in, works out of the box
- BM25 search - Using text2vec
- Optional word vectors - Load GloVe or word2vec files
- ANN indexing - Fast search with RcppAnnoy
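To make the TF-IDF idea concrete, here is a minimal base R sketch of TF-IDF vectors scored by cosine similarity. It is an illustration only, not VectrixDB's internal code: the tokenizer, the smoothed IDF formula, and the helper names (`tokenize`, `term_counts`, `cosine`) are assumptions chosen for this example.

```r
# Toy corpus (same texts as the Quick Start) and a query.
docs  <- c("Python is great", "Machine learning is fun", "R is awesome")
query <- "machine learning"

# Lowercase and split on non-alphanumeric characters (assumed tokenizer).
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z0-9]+")
  lapply(tokens, function(t) t[nzchar(t)])
}

doc_tokens <- tokenize(docs)
vocab <- sort(unique(unlist(doc_tokens)))

# Term-frequency matrix: one row per text, one column per vocabulary term.
# Tokens outside the vocabulary are ignored.
term_counts <- function(token_list) {
  t(vapply(token_list,
           function(t) as.numeric(table(factor(t, levels = vocab))),
           numeric(length(vocab))))
}

tf <- term_counts(doc_tokens)

# Smoothed inverse document frequency (one common variant, assumed here).
idf <- log((1 + length(docs)) / (1 + colSums(tf > 0))) + 1

doc_vecs  <- sweep(tf, 2, idf, `*`)
query_vec <- term_counts(tokenize(query))[1, ] * idf

# Cosine similarity between the query and every document.
cosine <- function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

scores <- cosine(doc_vecs, query_vec)
best <- docs[which.max(scores)]
print(best)
```

In this sketch a query that shares no terms with the corpus gets an all-zero vector, so it cannot be scored; word vectors or a custom embedding function are the usual way around that limitation.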

Search Modes

# Dense (semantic) search using TF-IDF/word vectors
results <- db$search("query", mode = "dense")

# Sparse (keyword/BM25) search
results <- db$search("query", mode = "sparse")

# Hybrid (dense + sparse with RRF fusion)
results <- db$search("query", mode = "hybrid")

# Ultimate (hybrid + reranking)
results <- db$search("query", mode = "ultimate")
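The hybrid mode combines a dense and a sparse ranking with Reciprocal Rank Fusion (RRF). The sketch below shows the general technique in base R. The function name `rrf_fuse`, the document IDs, and the constant `k = 60` are assumptions for illustration; the package exports its own `rrf_fusion`, whose signature and defaults are not shown in this document.

```r
# Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d).
# k = 60 is a commonly used constant; whether VectrixDB uses it is an assumption.
rrf_fuse <- function(rankings, k = 60) {
  ids <- unique(unlist(rankings))
  scores <- setNames(numeric(length(ids)), ids)
  for (ranking in rankings) {
    ranks <- seq_along(ranking)
    scores[ranking] <- scores[ranking] + 1 / (k + ranks)
  }
  sort(scores, decreasing = TRUE)
}

dense_ranking  <- c("doc_a", "doc_b", "doc_c")  # hypothetical dense results
sparse_ranking <- c("doc_b", "doc_d", "doc_a")  # hypothetical sparse results

fused <- rrf_fuse(list(dense_ranking, sparse_ranking))
print(names(fused))
```

Because RRF only uses rank positions, it needs no score normalization between the dense and sparse retrievers, which is why it is a common choice for hybrid search.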

Embedding Options

# Default: TF-IDF (no external files needed)
db <- Vectrix$new("docs")

# With GloVe word vectors (download from Stanford NLP)
db <- Vectrix$new("docs", model = "glove", model_path = "glove.6B.100d.txt")

# With word2vec (download or train your own)
db <- Vectrix$new("docs", model = "word2vec", model_path = "GoogleNews-vectors.bin")

# Custom embedding function
my_embed <- function(texts) {
  # Replace with your own logic; random vectors are only a placeholder
  matrix(rnorm(length(texts) * 100), nrow = length(texts))
}
db <- Vectrix$new("docs", embed_fn = my_embed, dimension = 100)
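A custom embedding function should return the same vector for the same text, which the `rnorm()` placeholder does not do. As a hedged illustration, the sketch below is a deterministic hashed bag-of-words embedder in base R. The name `hashed_embed` and the character-hash scheme are hypothetical; the only contract taken from the example above is "character vector in, matrix with one row per text and `dimension` columns out".

```r
# Deterministic hashed bag-of-words embedder (hypothetical example).
hashed_embed <- function(texts, dimension = 100) {
  out <- matrix(0, nrow = length(texts), ncol = dimension)
  for (i in seq_along(texts)) {
    tokens <- strsplit(tolower(texts[i]), "[^a-z0-9]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    for (tok in tokens) {
      # Simple position-weighted character hash mapped to a column index.
      codes <- utf8ToInt(tok)
      bucket <- sum(codes * seq_along(codes)) %% dimension + 1
      out[i, bucket] <- out[i, bucket] + 1
    }
    # L2-normalise non-empty rows so dot products are cosine similarities.
    norm <- sqrt(sum(out[i, ]^2))
    if (norm > 0) out[i, ] <- out[i, ] / norm
  }
  out
}

emb <- hashed_embed(c("R is awesome", "R is awesome", "Machine learning is fun"))
print(dim(emb))
```

Following the constructor call shown above, such a function would presumably be passed as `embed_fn = hashed_embed` together with `dimension = 100`.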

Metadata Filtering

# Add with metadata
db$add(
  texts = c("Python guide", "ML tutorial", "R handbook"),
  metadata = list(
    list(category = "programming", year = 2024),
    list(category = "ai", year = 2024),
    list(category = "programming", year = 2023)
  )
)

# Search with filter
results <- db$search("guide", filter = list(category = "programming"))
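Conceptually, a filter such as `list(category = "programming")` keeps only the documents whose metadata matches every listed field. The base R sketch below illustrates that idea with exact-equality matching. This semantics and the helper `matches_filter` are assumptions; the package's `Filter` class may support richer operators.

```r
metadata <- list(
  list(category = "programming", year = 2024),
  list(category = "ai", year = 2024),
  list(category = "programming", year = 2023)
)
texts <- c("Python guide", "ML tutorial", "R handbook")

# TRUE when every field in `filter` is present in `meta` with an equal value.
matches_filter <- function(meta, filter) {
  all(vapply(names(filter), function(key) {
    !is.null(meta[[key]]) && identical(meta[[key]], filter[[key]])
  }, logical(1)))
}

keep <- vapply(metadata, matches_filter, logical(1),
               filter = list(category = "programming"))
print(texts[keep])
```

Only the documents that pass the filter would then be ranked against the query.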

Reranking

# MMR for diversity
results <- db$search("AI", rerank = "mmr", diversity = 0.7)

# Cross-encoder style reranking
results <- db$search("AI", rerank = "cross-encoder")
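Maximal Marginal Relevance (MMR) trades relevance against redundancy: each step picks the candidate that is similar to the query but dissimilar to what is already selected. The base R sketch below illustrates the general algorithm. The function `mmr_select`, its `lambda` parameter, and the toy vectors are assumptions; how the `diversity` argument above maps onto this trade-off is not documented here, and the package's own `mmr_rerank` may differ.

```r
# Cosine similarity between two numeric vectors.
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Greedy MMR: lambda = 1 ranks purely by relevance, lambda = 0 purely by novelty.
mmr_select <- function(query, candidates, top_k = 2, lambda = 0.5) {
  relevance <- apply(candidates, 1, cos_sim, b = query)
  selected <- integer(0)
  remaining <- seq_len(nrow(candidates))
  while (length(selected) < top_k && length(remaining) > 0) {
    mmr <- vapply(remaining, function(i) {
      # Redundancy is the highest similarity to anything already selected.
      redundancy <- if (length(selected) == 0) 0 else
        max(vapply(selected,
                   function(j) cos_sim(candidates[i, ], candidates[j, ]),
                   numeric(1)))
      lambda * relevance[i] - (1 - lambda) * redundancy
    }, numeric(1))
    pick <- remaining[which.max(mmr)]
    selected <- c(selected, pick)
    remaining <- setdiff(remaining, pick)
  }
  selected
}

query <- c(1, 0)
candidates <- rbind(
  c(1.0, 0.0),   # 1: identical to the query
  c(0.99, 0.1),  # 2: near-duplicate of candidate 1
  c(0.6, 0.8)    # 3: less relevant but different
)

by_relevance <- mmr_select(query, candidates, lambda = 1)
diversified  <- mmr_select(query, candidates, lambda = 0.3)
print(by_relevance)
print(diversified)
```

With `lambda = 1` the near-duplicate is returned second; with `lambda = 0.3` it is skipped in favour of the more distinct third candidate.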

Results API

results <- db$search("query")

# Access results
results$top()          # Get top result
results$texts()        # All result texts as vector
results$ids()          # All result IDs
results$scores()       # All scores
results$length()       # Number of results
results$get(2)         # Get second result

# Iterate
results$foreach(function(r) {
  cat(sprintf("%s: %.3f\n", r$id, r$score))
})

Advanced Usage

VectrixDB Class

# Create database
vdb <- VectrixDB$new("./my_data")

# Create collection with custom dimension
collection <- vdb$create_collection("docs", dimension = 100)

# Add vectors directly
collection$add(
  ids = c("doc1", "doc2"),
  vectors = matrix(rnorm(200), nrow = 2),
  metadata = list(list(a = 1), list(a = 2)),
  texts = c("First doc", "Second doc")
)

# Search with a query vector of the collection's dimension (random placeholder)
query_vector <- rnorm(100)
results <- collection$search(query_vector, limit = 10)

# Hybrid search
results <- collection$hybrid_search(
  query = query_vector,
  query_text = "search terms",
  limit = 10
)
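For intuition about what a vector search returns, here is a minimal base R sketch of exact (brute-force) nearest-neighbour search by cosine similarity. It is not the package's implementation, which can also use an ANN index; the helper names `normalize_rows` and `brute_force_search` and the random data are assumptions for illustration.

```r
set.seed(42)
ids <- c("doc1", "doc2", "doc3")
vectors <- matrix(rnorm(3 * 100), nrow = 3)

# L2-normalise rows so a dot product equals cosine similarity.
normalize_rows <- function(m) m / sqrt(rowSums(m^2))

# Exact top-`limit` search by cosine similarity.
brute_force_search <- function(query_vector, vectors, ids, limit = 10) {
  q <- query_vector / sqrt(sum(query_vector^2))
  sims <- as.vector(normalize_rows(vectors) %*% q)
  ord <- head(order(sims, decreasing = TRUE), limit)
  data.frame(id = ids[ord], score = sims[ord], stringsAsFactors = FALSE)
}

# Query with a slightly perturbed copy of doc2's vector.
query_vector <- vectors[2, ] + rnorm(100, sd = 0.01)
hits <- brute_force_search(query_vector, vectors, ids, limit = 2)
print(hits)
```

Brute-force search compares the query with every stored vector, so its cost grows linearly with collection size; approximate indexes trade a little recall for speed on large collections.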

REST API Server

# Start server (requires plumber package)
vectrix_serve(path = "./my_data", port = 7377)

Performance Tips

  1. Use RcppAnnoy for large collections - Automatically enabled for 100+ docs
  2. Pre-train embeddings - Fit the embedder once on your corpus
  3. Use word vectors - GloVe/word2vec provide better semantic search than TF-IDF

# Pre-train for better embeddings
db <- Vectrix$new("docs")
# Add your corpus - TF-IDF will fit automatically
db$add(large_corpus)
# Now searches will use the fitted vocabulary

Comparison with Python Version

| Feature | R Version | Python Version |
| --- | --- | --- |
| Dependencies | Pure R | Python + ONNX |
| Default embeddings | TF-IDF | MiniLM (neural) |
| Word vectors | GloVe, word2vec | Same + more |
| ANN indexing | RcppAnnoy | usearch |
| API server | plumber | FastAPI |

The R version prioritizes simplicity and R-native tools. For neural embeddings, use the Python version or provide a custom embed_fn.

Dependencies

Required:

- R6 - OOP classes
- text2vec - TF-IDF, BM25, vocabulary
- digest - ID generation
- Matrix - Sparse matrices

Optional:

- RcppAnnoy - Fast ANN search
- word2vec - Load word2vec models
- stopwords - Remove stopwords
- RSQLite, DBI - Persistent storage
- plumber - REST API

License

Apache License 2.0

Author

Kwadwo Daddy Nyame Owusu Boakye

Install

install.packages('VectrixDB')

Version

1.1.2

License

Apache License (>= 2)

Maintainer

Kwadwo Daddy Nyame Owusu Boakye

Last Published

February 20th, 2026

Functions in VectrixDB (1.1.2)

- CacheEntry - Cache Entry
- DistanceMetric - Distance Metric Enumeration
- CacheStats - Cache Statistics
- Collection - Collection Class
- EnhancedSearchResults - Enhanced Search Results
- DocumentChunker - Document Chunker
- CommunityDetector - Simple Community Detector
- ENGLISH_STOPWORDS - English Stopwords
- Community - Community
- DenseEmbedder - Dense Embedder using word2vec or GloVe
- FacetConfig - Facet Configuration
- Entity - Entity
- FacetAggregator - Facet Aggregator
- FacetResult - Facet Result
- ExtractorType - Extractor Types
- ExtractionResult - Extraction Result
- FacetValue - Facet Value
- Filter - Filter Class for Metadata Filtering
- FileCache - File Cache
- GlobalSearchResult - Global Search Result
- KeywordAnalyzer - Keyword Analyzer
- HNSWIndex - HNSW Index
- LateInteractionEmbedder - Late Interaction Embedder (Simplified ColBERT-style)
- GraphRAGPipeline - GraphRAG Pipeline
- LLMProvider - LLM Provider Types
- InMemoryStorage - In-Memory Storage
- GraphSearchType - Graph Search Types
- GlobalSearcher - Global Searcher
- MMRReranker - Maximal Marginal Relevance (MMR) Reranker
- GraphRAGConfig - GraphRAG Configuration
- KnowledgeGraph - Knowledge Graph
- LocalSearcher - Local Searcher
- RegexExtractor - Regex Entity Extractor
- NoCache - No-Op Cache
- Relationship - Relationship
- RerankerEmbedder - Reranker (Cross-Encoder Style Scoring)
- Results - Search Results Collection
- LocalSearchResult - Local Search Result
- Result - Single Search Result
- MemoryCache - Memory Cache
- TextAnalyzer - Text Analyzer
- SentenceEmbedder - Sentence Embedder using Word Vectors
- VectorCache - Vector Cache
- SQLiteStorage - SQLite Storage
- SubGraph - SubGraph
- SimpleStemmer - Simple Stemmer
- SparseEmbedder - Sparse Embedder (BM25/TF-IDF)
- Vectrix - VectrixDB Easy API - The Simplest Vector Database
- advanced_search - VectrixDB Advanced Search Features
- acl_config_from_list - Create ACL Config from List
- cache_config_from_env - Create Cache Config from Environment
- check_python_module - Check if Python module is available
- TextUnit - Text Unit
- SearchMode - Search Mode Enumeration
- cli_print - Print CLI Message
- cache - VectrixDB Cache Layer
- VectrixDB - VectrixDB Database Class
- create_default_graphrag_config - Create Default GraphRAG Config
- VectrixDB-package - VectrixDB: Lightweight Vector Database with Embedded Machine Learning Models
- create_pipeline - Create GraphRAG Pipeline
- create_sentence_embedder - Create a sentence embedder with automatic download
- create_hnsw_index - Create HNSW Index
- cosine_similarity - Compute cosine similarity
- create_cache - Create Cache
- generate_id - Generate deterministic ID from text
- download_vectors - Download pre-trained word vectors
- cli - VectrixDB Command Line Interface
- create_vector_cache - Create Vector Cache
- load_glove_text - Load GloVe text format
- get_cli_config - Get CLI Config
- load_hnsw_index - Load HNSW Index
- mmr_rerank - Maximal Marginal Relevance (MMR) reranking
- euclidean_distance - Compute euclidean distance
- server - VectrixDB Server Functions
- storage - VectrixDB Storage Classes
- set_cli_config - Set CLI Config
- text_analyzer_english - Create English Text Analyzer
- load_word_vectors - Load word vectors into memory
- load_word2vec_binary - Load word2vec binary format
- vdb_dashboard_simple - Launch Simple Dashboard
- quick_search - Quick search - Index texts and search immediately
- types - VectrixDB Types and Classes
- read_word2vec_model - Load word2vec model via optional runtime namespace lookup
- utils - VectrixDB Utility Functions
- vdb_interactive - Start Interactive CLI
- vdb_add_dir - Batch Add from Directory
- vdb_info - Collection Info
- vdb_add - Add Documents
- normalize_language_tag - Normalize language tag to supported values
- word_vectors - Word Vector Management
- word2vec_available - Check whether word2vec package is installed
- graphrag - VectrixDB GraphRAG Module
- download_word_vectors - Download pre-trained word vectors
- embedders - VectrixDB Embedders (Pure R Implementation)
- cli_table - Print Table
- format_time - Format time duration
- normalize_vectors - Normalize vectors for cosine similarity
- vdb_delete - Delete Collection
- text_analyzer_standard - Create Standard Text Analyzer
- parse_acl - Parse ACL String
- vdb_import - Import from File
- vdb_get - Get Document
- vdb_dashboard - Launch VectrixDB Dashboard
- vdb_create - Create Collection
- tokenize_text_by_language - Language-aware tokenizer used across embedders and keyword search
- reranker - Advanced Reranking Module
- hnsw - VectrixDB HNSW Index
- rrf_fusion - Reciprocal Rank Fusion (RRF)
- vectrix_create - Create a new Vectrix collection
- vectrix_open - Open an existing Vectrix collection
- vectrix_info - Display VectrixDB information
- vectrix_serve - Start VectrixDB server
- vdb_export - Export Collection
- text_analyzer_simple - Create Simple Text Analyzer
- text_analyzer_keyword - Create Keyword Text Analyzer
- vdb_list - List Collections
- vdb_delete_docs - Delete Documents
- vdb_stats - Collection Statistics
- vdb_search - Search Collection
- vdb_open - Open Collection
- ACLOperator - ACL Operator Types
- BaseCache - Base Cache
- AdvancedReranker - Advanced Reranker with Learned Weights
- ACLFilter - ACL Filter
- ACLPrincipal - ACL Principal
- AnalyzerChain - Analyzer Chain
- CacheConfig - Cache Configuration
- ACLConfig - ACL Configuration
- CLIConfig - CLI Configuration
- CacheBackend - Cache Backend Types