Type: Package
Title: Local Language Model Inference
Version: 0.1.0
Author: Pawan Rama Mali [aut, cre]
Maintainer: Pawan Rama Mali <prm@outlook.in>
Description: Enables R users to run large language models locally using 'GGUF' model files and the 'llama.cpp' inference engine. Provides a complete R interface for loading models, generating text completions, and streaming responses in real-time. Supports local inference without requiring cloud APIs or internet connectivity, ensuring complete data privacy and control. References: 'Gerganov' et al. (2023) https://github.com/ggml-org/llama.cpp.
License: MIT + file LICENSE
URL: https://github.com/PawanRamaMali/edgemodelr
BugReports: https://github.com/PawanRamaMali/edgemodelr/issues
Encoding: UTF-8
Depends: R (≥ 4.0)
LinkingTo: Rcpp
Imports: Rcpp (≥ 1.0.0), utils, tools
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
SystemRequirements: C++17, GNU make or equivalent for building
Note: This package includes a self-contained 'llama.cpp' implementation resulting in a larger installation size (~56MB) to provide complete functionality without external dependencies.
Config/testthat/edition: 3
RoxygenNote: 7.3.3
NeedsCompilation: yes
Packaged: 2025-09-17 19:22:58 UTC; aeroe
Repository: CRAN
Date/Publication: 2025-09-22 12:00:08 UTC
edgemodelr: Local Language Model Inference
Description
Enables R users to run large language models locally using 'GGUF' model files and the 'llama.cpp' inference engine. Provides a complete R interface for loading models, generating text completions, and streaming responses in real-time. Supports local inference without requiring cloud APIs or internet connectivity, ensuring complete data privacy and control. References: Gerganov et al. (2023) https://github.com/ggml-org/llama.cpp.
Details
The edgemodelr package provides R bindings for local language model inference using llama.cpp and GGUF model files. This enables completely private, on-device text generation without requiring cloud APIs or internet connectivity.
Main Functions
edge_load_model: Load a GGUF model file
edge_completion: Generate text completions
edge_stream_completion: Stream text generation in real-time
edge_chat_stream: Interactive chat interface
edge_quick_setup: One-line model download and setup
edge_free_model: Release model memory
Model Management
edge_list_models: List available pre-configured models
edge_download_model: Download models from Hugging Face
Getting Started
Basic usage workflow:
1. Download a model: setup <- edge_quick_setup("TinyLlama-1.1B")
2. Generate text: edge_completion(setup$context, "Hello")
3. Clean up: edge_free_model(setup$context)
For interactive chat:
setup <- edge_quick_setup("TinyLlama-1.1B")
edge_chat_stream(setup$context)
Examples
See comprehensive examples in the package:
- system.file("examples/getting_started_example.R", package = "edgemodelr")
- system.file("examples/data_science_assistant_example.R", package = "edgemodelr")
- system.file("examples/text_analysis_example.R", package = "edgemodelr")
- system.file("examples/creative_writing_example.R", package = "edgemodelr")
- system.file("examples/advanced_usage_example.R", package = "edgemodelr")
Run examples:
# Getting started guide
source(system.file("examples/getting_started_example.R", package = "edgemodelr"))
# Data science assistant
source(system.file("examples/data_science_assistant_example.R", package = "edgemodelr"))
System Requirements
C++17 compatible compiler
Sufficient RAM for model size (1GB+ for small models, 8GB+ for 7B models)
GGUF model files (downloaded automatically or manually)
Privacy and Security
This package processes all data locally on your machine. No data is sent to external servers, ensuring complete privacy and control over your text generation workflows.
Author(s)
Pawan Rama Mali <prm@outlook.in>
See Also
Package repository: https://github.com/PawanRamaMali/edgemodelr
llama.cpp project: https://github.com/ggml-org/llama.cpp
GGUF format: https://github.com/ggml-org/ggml
Build chat prompt from conversation history
Description
Build chat prompt from conversation history
Usage
build_chat_prompt(history)
Arguments
history: List of conversation turns with role and content
Value
Formatted prompt string
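Examples
A minimal sketch, assuming each conversation turn is a list with role and content elements as described under Arguments:
# Build a prompt from a short two-turn conversation
history <- list(
  list(role = "user", content = "What is a data.frame in R?"),
  list(role = "assistant", content = "A data.frame is a table-like structure of columns.")
)
prompt <- build_chat_prompt(history)
cat(prompt)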
Interactive chat session with streaming responses
Description
Interactive chat session with streaming responses
Usage
edge_chat_stream(ctx, system_prompt = NULL, max_history = 10, n_predict = 200L,
temperature = 0.8, verbose = TRUE)
Arguments
ctx: Model context from edge_load_model()
system_prompt: Optional system prompt to set context
max_history: Maximum conversation turns to keep in context (default: 10)
n_predict: Maximum tokens per response (default: 200)
temperature: Sampling temperature (default: 0.8)
verbose: Whether to print responses to console (default: TRUE)
Value
NULL (runs interactively)
Examples
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context
if (!is.null(ctx)) {
# Start interactive chat with streaming
# edge_chat_stream(ctx,
# system_prompt = "You are a helpful R programming assistant.")
edge_free_model(ctx)
}
Clean up cache directory and manage storage
Description
Remove outdated model files from the cache directory to comply with CRAN policies about actively managing cached content and keeping sizes small.
Usage
edge_clean_cache(
cache_dir = NULL,
max_age_days = 30,
max_size_mb = 500,
interactive = TRUE,
verbose = TRUE
)
Arguments
cache_dir: Cache directory path (default: user cache directory)
max_age_days: Maximum age of files to keep, in days (default: 30)
max_size_mb: Maximum total cache size in MB (default: 500)
interactive: Whether to ask for user confirmation before deletion
verbose: Whether to print detailed messages about cleaning progress
Value
Invisible list of deleted files
Examples
# Clean cache files older than 30 days
edge_clean_cache()
# Clean cache with custom settings
edge_clean_cache(max_age_days = 7, max_size_mb = 100)
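A non-interactive variant that may be useful in scripts, using the documented interactive and verbose arguments:
# Clean cache without prompting for confirmation or printing progress
edge_clean_cache(interactive = FALSE, verbose = FALSE)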
Generate text completion using loaded model
Description
Generate text completion using loaded model
Usage
edge_completion(ctx, prompt, n_predict = 128L, temperature = 0.8, top_p = 0.95)
Arguments
ctx |
Model context from edge_load_model() |
prompt |
Input text prompt |
n_predict |
Maximum tokens to generate (default: 128) |
temperature |
Sampling temperature (default: 0.8) |
top_p |
Top-p sampling parameter (default: 0.95) |
Value
Generated text as character string
Examples
model_path <- "model.gguf"
if (file.exists(model_path)) {
ctx <- edge_load_model(model_path)
result <- edge_completion(ctx, "The capital of France is", n_predict = 50)
cat(result)
edge_free_model(ctx)
}
Download a GGUF model from Hugging Face
Description
Download a GGUF model from Hugging Face
Usage
edge_download_model(model_id, filename, cache_dir = NULL,
force_download = FALSE, verbose = TRUE)
Arguments
model_id |
Hugging Face model identifier (e.g., "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF") |
filename |
Specific GGUF file to download |
cache_dir |
Directory to store downloaded models (default: "~/.cache/edgemodelr") |
force_download |
Force re-download even if file exists |
verbose |
Whether to print download progress messages |
Value
Path to the downloaded model file
Examples
# Download TinyLlama model
model_path <- edge_download_model(
model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
filename = "tinyllama-1.1b-chat-v1.0.q4_k_m.gguf"
)
# Use the downloaded model (example only - requires actual model)
if (FALSE && file.exists(model_path)) {
ctx <- edge_load_model(model_path)
response <- edge_completion(ctx, "Hello, how are you?")
edge_free_model(ctx)
}
Free model context and release memory
Description
Free model context and release memory
Usage
edge_free_model(ctx)
Arguments
ctx |
Model context from edge_load_model() |
Value
NULL (invisibly)
Examples
model_path <- "model.gguf"
if (file.exists(model_path)) {
ctx <- edge_load_model(model_path)
# ... use model ...
edge_free_model(ctx) # Clean up
}
List popular pre-configured models
Description
List popular pre-configured models
Usage
edge_list_models()
Value
Data frame with model information
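Examples
A short example; the exact columns of the returned data frame are not documented here, so only generic inspection is shown:
# List pre-configured models and inspect the result
models <- edge_list_models()
head(models)
str(models)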
Load a local GGUF model for inference
Description
Load a local GGUF model for inference
Usage
edge_load_model(model_path, n_ctx = 2048L, n_gpu_layers = 0L)
Arguments
model_path: Path to a .gguf model file
n_ctx: Maximum context length (default: 2048)
n_gpu_layers: Number of layers to offload to GPU (default: 0, CPU-only)
Value
External pointer to the loaded model context
Examples
# Load a TinyLlama model
model_path <- "~/models/TinyLlama-1.1B-Chat.Q4_K_M.gguf"
if (file.exists(model_path)) {
ctx <- edge_load_model(model_path, n_ctx = 2048)
# Generate completion
result <- edge_completion(ctx, "Explain R data.frame:", n_predict = 100)
cat(result)
# Free model when done
edge_free_model(ctx)
}
Quick setup for a popular model
Description
Quick setup for a popular model
Usage
edge_quick_setup(model_name, cache_dir = NULL, verbose = TRUE)
Arguments
model_name |
Name of the model from edge_list_models() |
cache_dir |
Directory to store downloaded models |
verbose |
Whether to print setup progress messages |
Value
List with model path and context (if llama.cpp is available)
Examples
# Quick setup with TinyLlama
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context
if (!is.null(ctx)) {
response <- edge_completion(ctx, "Hello!")
cat("Response:", response, "\n")
edge_free_model(ctx)
}
Control llama.cpp logging verbosity
Description
Enable or disable verbose output from the underlying llama.cpp library. By default, all output except errors is suppressed to comply with CRAN policies.
Usage
edge_set_verbose(enabled = FALSE)
Arguments
enabled: Logical. If TRUE, enables verbose llama.cpp output. If FALSE (default), suppresses all output except errors.
Value
Invisible NULL
Examples
# Enable verbose output (not recommended for normal use)
edge_set_verbose(TRUE)
# Disable verbose output (default, recommended)
edge_set_verbose(FALSE)
Stream text completion with real-time token generation
Description
Stream text completion with real-time token generation
Usage
edge_stream_completion(ctx, prompt, callback, n_predict = 128L, temperature = 0.8,
top_p = 0.95)
Arguments
ctx: Model context from edge_load_model()
prompt: Input text prompt
callback: Function called for each generated token. Receives a list with token info.
n_predict: Maximum tokens to generate (default: 128)
temperature: Sampling temperature (default: 0.8)
top_p: Top-p sampling parameter (default: 0.95)
Value
List with full response and generation statistics
Examples
model_path <- "model.gguf"
if (file.exists(model_path)) {
ctx <- edge_load_model(model_path)
# Basic streaming with token display
result <- edge_stream_completion(ctx, "Hello, how are you?",
callback = function(data) {
if (!data$is_final) {
cat(data$token)
flush.console()
} else {
cat("\n[Done: ", data$total_tokens, " tokens]\n")
}
return(TRUE) # Continue generation
})
edge_free_model(ctx)
}
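A sketch of stopping generation early, assuming that returning FALSE from the callback halts generation, as the comment in the example above implies:
model_path <- "model.gguf"
if (file.exists(model_path)) {
  ctx <- edge_load_model(model_path)
  tokens_seen <- 0
  result <- edge_stream_completion(ctx, "Count from one to one hundred:",
    callback = function(data) {
      if (!data$is_final) {
        tokens_seen <<- tokens_seen + 1
        cat(data$token)
        flush.console()
      }
      tokens_seen < 20  # return FALSE after 20 tokens to stop early
    })
  edge_free_model(ctx)
}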
Check if model context is valid
Description
Check if model context is valid
Usage
is_valid_model(ctx)
Arguments
ctx: Model context to check
Value
Logical indicating if context is valid
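Examples
A minimal sketch, assuming the model file exists locally; whether the context is reported as invalid after freeing is an assumption based on the function's purpose:
model_path <- "model.gguf"
if (file.exists(model_path)) {
  ctx <- edge_load_model(model_path)
  is_valid_model(ctx)   # expected TRUE while the context is loaded
  edge_free_model(ctx)
  is_valid_model(ctx)   # expected FALSE once the context has been freed
}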