
BYOB AI

Bring Your Own AI allows enterprises to use their own language models and AI infrastructure with Archivus, maintaining full control over AI processing.


Why BYOB AI?

Data Privacy

Sensitive documents never leave your infrastructure for AI processing. Process everything locally with no external API calls.

Model Control

Use models fine-tuned for your industry, terminology, or compliance requirements.

Cost Optimization

High-volume users can reduce costs by running models on their own infrastructure.

Air-Gapped Compliance

Required for defense, government, and highly regulated industries where external AI services are prohibited.


Supported AI Backends

Local LLM Providers

| Provider | Models | GPU Required | Notes |
| --- | --- | --- | --- |
| Ollama | Llama, Mistral, Qwen, Gemma | Recommended | Easy setup, production-ready |
| vLLM | Most HuggingFace models | Yes | High-throughput serving |
| TGI | Most HuggingFace models | Yes | Hugging Face's inference server |
| LocalAI | Various | Optional | CPU-friendly option |
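
Most of these backends expose an OpenAI-compatible HTTP endpoint that Archivus can be pointed at. As a rough sketch of standing up vLLM (the model name, port, and flags below are examples and may vary by vLLM version):

# Serve a HuggingFace model behind vLLM's OpenAI-compatible API
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

# Smoke test: list the models the server exposes
curl http://localhost:8000/v1/models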

Cloud AI Providers

| Provider | Models | Use Case |
| --- | --- | --- |
| OpenAI | GPT-4, GPT-4 Turbo | Your API key, your billing |
| Anthropic | Claude 3.5, Claude 3 | Your API key, your billing |
| Azure OpenAI | GPT models | Enterprise Azure deployment |
| Google Vertex AI | Gemini, PaLM | GCP-hosted models |
| AWS Bedrock | Various | AWS-managed AI |

Custom Models

Deploy your own fine-tuned models:

  • Custom document classification models
  • Industry-specific entity extraction
  • Domain terminology embeddings
  • Compliance-trained summarization
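
If your fine-tune is exported as a GGUF file, one lightweight way to serve it locally is to register it with Ollama; the file name, model name, and parameter below are placeholders:

# Package a fine-tuned GGUF as an Ollama model
cat > Modelfile <<'EOF'
FROM ./my-finetuned-model.gguf
PARAMETER temperature 0.2
EOF

ollama create my-doc-classifier -f Modelfile
ollama run my-doc-classifier "Classify this document: ..."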

Architecture

graph LR
    subgraph "Your Infrastructure"
        M[Your LLM Server]
        E[Your Embedding Server]
    end
    subgraph "Archivus"
        A[Document Processor]
        B[AI Router]
        C[Knowledge Graph]
    end

    A --> B
    B --> M
    B --> E
    M --> C
    E --> C

Processing Flow

  1. Document Upload - New document enters the system
  2. AI Router - Archivus routes AI requests to your configured backend
  3. Your Infrastructure - Processing happens on your models
  4. Results Return - AI outputs are returned to Archivus for storage
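
For an OpenAI-compatible backend, steps 2-4 amount to ordinary API calls from Archivus to your server, so the same call can be made by hand to verify the backend independently (endpoint and model name are examples):

# The kind of chat-completion request that gets routed to your backend
curl https://your-llm-server.internal/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "system", "content": "Summarize the following document."},
      {"role": "user", "content": "..."}
    ]
  }'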

What Your AI Handles

| Task | Description |
| --- | --- |
| Text Extraction | OCR and content extraction |
| Classification | Document type identification |
| Summarization | Executive summaries |
| Entity Extraction | People, organizations, dates, amounts |
| Embeddings | Semantic search vectors |
| Q&A | Natural language document queries |
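
Each task resolves to either a completion call or an embedding call against your backend. For example, producing a semantic search vector with Ollama looks roughly like this (the model matches the configuration example below):

# Generate an embedding vector with Ollama's native API
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Quarterly revenue report for FY2024"}'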

Configuration

Ollama

ai:
  provider: ollama
  endpoint: http://localhost:11434
  models:
    chat: llama3.2
    embedding: nomic-embed-text
  timeout: 300s
  max_tokens: 4096

OpenAI (Your Key)

ai:
  provider: openai
  api_key: ${OPENAI_API_KEY}  # Your organization's key
  models:
    chat: gpt-4-turbo
    embedding: text-embedding-3-large
  organization: org-xxxxx  # Optional

Azure OpenAI

ai:
  provider: azure-openai
  endpoint: https://your-instance.openai.azure.com
  api_key: ${AZURE_OPENAI_KEY}
  api_version: "2024-02-01"
  deployments:
    chat: gpt-4-deployment
    embedding: embedding-deployment

Custom Endpoint

For any OpenAI-compatible API:

ai:
  provider: openai-compatible
  endpoint: https://your-llm-server.internal/v1
  api_key: ${YOUR_API_KEY}
  models:
    chat: your-model-name
    embedding: your-embedding-model
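
Before pointing Archivus at a custom endpoint, confirm that it actually speaks the OpenAI API; a quick check (URL and key are placeholders) is:

# List the models exposed by the OpenAI-compatible server
curl https://your-llm-server.internal/v1/models \
  -H "Authorization: Bearer $YOUR_API_KEY"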

Recommended Models for Document Intelligence

| Task | Recommended | Alternative |
| --- | --- | --- |
| General Chat | Llama 3.2 (8B) | Mistral 7B |
| Long Documents | Qwen2.5 (32K context) | Llama 3.2 |
| Embeddings | nomic-embed-text | mxbai-embed-large |
| Fast Classification | Gemma 2B | Phi-3-mini |
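
If you are running Ollama, the models above can be pulled directly from the Ollama library (exact tags may differ between releases):

ollama pull llama3.2
ollama pull qwen2.5
ollama pull nomic-embed-text
ollama pull mxbai-embed-large
ollama pull gemma2:2b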

Hardware Requirements

CPU Only

Suitable for lower-volume or non-time-sensitive processing:

  • 16+ CPU cores
  • 32+ GB RAM
  • Models: Gemma 2B, Phi-3-mini

Single GPU

Good balance of performance and cost:

  • NVIDIA RTX 4090 (24GB) or better
  • Supports 7-8B parameter models
  • Models: Llama 3.2, Mistral 7B

Multi-GPU

For high throughput or larger models:

  • 2-8x NVIDIA A100 or H100
  • Supports 70B+ parameter models
  • Enables batch processing

Performance Tuning

Ollama Optimization

# Environment variables for Ollama
OLLAMA_NUM_PARALLEL=4      # Concurrent requests
OLLAMA_MAX_LOADED_MODELS=2 # Models kept in memory
OLLAMA_KEEP_ALIVE=24h      # Keep models loaded
OLLAMA_GPU_LAYERS=33       # GPU offloading (adjust for VRAM)
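
How these are applied depends on how Ollama is deployed; one common pattern is passing them to the official Docker image (the volume and port follow Ollama's standard Docker setup):

# Run Ollama with tuning variables set
docker run -d --gpus=all \
  -e OLLAMA_NUM_PARALLEL=4 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_KEEP_ALIVE=24h \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama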

Batch Processing

For high-volume document processing:

  • Enable request batching for embeddings (see the example below)
  • Use async processing for non-blocking operations
  • Configure queue priorities for different document types
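
Batching matters most for embeddings, where many document chunks can be embedded in a single request. Against an OpenAI-compatible endpoint this simply means passing an array of inputs (endpoint and model are examples):

# Embed several document chunks in one request
curl https://your-llm-server.internal/v1/embeddings \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-embedding-model",
    "input": ["chunk one ...", "chunk two ...", "chunk three ..."]
  }'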

Caching

AI responses are cached to reduce redundant processing:

  • Identical queries return cached results
  • Embeddings cached by content hash
  • Classification results cached per document

Fallback Configuration

Configure fallback providers for resilience:

ai:
  primary:
    provider: ollama
    endpoint: http://primary-gpu:11434
  fallback:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  fallback_on:
    - timeout
    - server_error
    - rate_limit

When the primary provider fails, Archivus automatically falls back to the secondary.


Security Considerations

Network Isolation

  • Run AI servers on private networks
  • Use internal DNS for service discovery
  • No internet access required for local LLMs

API Key Security

  • Store keys in secret management (Vault, AWS Secrets Manager); see the example below
  • Rotate keys periodically
  • Audit API key usage
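
As one example, with HashiCorp Vault's KV store the key never needs to appear in a config file; it is stored once and injected into the environment at deploy time (the secret path is a placeholder):

# Store the key once
vault kv put secret/archivus/ai openai_api_key="sk-..."

# Inject it into the environment that runs Archivus
export OPENAI_API_KEY=$(vault kv get -field=openai_api_key secret/archivus/ai)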

Model Security

  • Verify model checksums before deployment (example below)
  • Use official model sources
  • Monitor for model drift or tampering
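
Checksum verification can be as simple as comparing the downloaded weights against the digest published by the model source (file names below are placeholders):

# Compute the digest of the downloaded weights
sha256sum my-finetuned-model.gguf

# Or verify against a published checksum file
sha256sum --check my-finetuned-model.gguf.sha256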

Cost Comparison

Example: 10,000 Documents/Month

| Approach | Cost | Notes |
| --- | --- | --- |
| Archivus AI | ~$200/month | Platform credits, fully managed |
| Your OpenAI Key | ~$150/month | Your billing, OpenAI rates |
| Local Ollama | ~$50/month | Electricity + amortized hardware |

Break-even for local LLM hardware typically occurs at 20,000-50,000 documents/month depending on infrastructure costs.


Getting Started

1. Choose Your Backend

  • Quick Start: Ollama with Llama 3.2
  • Enterprise Cloud: Azure OpenAI or AWS Bedrock
  • Maximum Control: Self-hosted vLLM

2. Deploy and Test

# Ollama quick start
ollama pull llama3.2
ollama pull nomic-embed-text
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'

3. Configure Archivus

Update your Archivus configuration with AI backend settings.

4. Validate

Run test documents through the pipeline to verify AI processing.


Next Steps