Semantic Search

Learn how to use Archivus’s AI-powered semantic search to find documents by meaning, not just keywords.


Semantic search understands the meaning behind your query, not just the exact words. It finds documents even when they use different terminology.

Example:

  • Query: “tax return documents”
  • Finds: “1040 forms”, “income tax filing”, “tax documents”, “IRS forms”, etc.

Quick Start

curl "https://api.archivus.app/api/v1/search?q=find+all+contracts+expiring+soon&mode=semantic" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"

Response:

{
  "results": [
    {
      "document": {
        "id": "doc_abc123",
        "filename": "service-agreement.pdf",
        "ai_summary": "Service agreement expiring December 31, 2025..."
      },
      "score": 0.92,
      "relevance": "high"
    }
  ],
  "mode": "semantic",
  "query": "find all contracts expiring soon",
  "total": 5
}
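
Once the response is parsed, the fields most callers need are the filename and score of each hit. The sketch below shows one way to pull them out. `summarize_results` is a hypothetical helper name, and `sample` is a trimmed copy of the response above rather than live API output.

```python
def summarize_results(response):
    """Return (filename, score) pairs, highest score first."""
    rows = [
        (item["document"]["filename"], item["score"])
        for item in response.get("results", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Trimmed copy of the documented response shape (not live data)
sample = {
    "results": [
        {
            "document": {
                "id": "doc_abc123",
                "filename": "service-agreement.pdf",
                "ai_summary": "Service agreement expiring December 31, 2025...",
            },
            "score": 0.92,
            "relevance": "high",
        }
    ],
    "mode": "semantic",
    "query": "find all contracts expiring soon",
    "total": 5,
}

for filename, score in summarize_results(sample):
    print(f"{score:.2f}  {filename}")
```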

How It Works

Vector Embeddings

  1. Document Processing: Documents are converted to vector embeddings (1536 dimensions)
  2. Query Processing: Your search query is converted to the same vector format
  3. Similarity Matching: Archivus finds documents with similar vectors
  4. Hybrid Scoring: Combines semantic similarity (70%) with text relevance (30%)
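
The four steps above can be sketched in a few lines. This is a toy illustration, not Archivus's implementation: the three-element vectors stand in for the 1536-dimensional embeddings, `cosine_similarity` and `hybrid_score` are hypothetical helper names, and cosine similarity is assumed as the similarity measure since the text does not name one. Only the 70/30 weighting comes from the steps above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def hybrid_score(semantic, text, semantic_weight=0.7, text_weight=0.3):
    """Weighted blend of semantic similarity and text relevance."""
    return semantic_weight * semantic + text_weight * text


# Toy embeddings; real ones would have 1536 dimensions
query_vec = [0.9, 0.1, 0.0]
doc_vec = [0.8, 0.2, 0.1]

semantic = cosine_similarity(query_vec, doc_vec)
score = hybrid_score(semantic, text=0.5)  # text relevance of 0.5 is made up
print(round(score, 3))
```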

Why It’s Better

Traditional Keyword Search:

  • Finds “contract renewal” only if those exact words appear
  • Misses “service agreement extension”
  • Requires exact terminology match

Semantic Search:

  • Finds “contract renewal”, “service agreement extension”, “term extension”, etc.
  • Understands synonyms and related concepts
  • Works with natural language queries

Code Examples

Python:

import requests

class ArchivusSemanticSearch:
    def __init__(self, api_key, tenant):
        self.api_key = api_key
        self.tenant = tenant
        self.base_url = "https://api.archivus.app/api/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-Subdomain": tenant
        }
    
    def search(self, query, limit=20, filters=None):
        url = f"{self.base_url}/search"
        params = {
            "q": query,
            "mode": "semantic",
            "limit": limit
        }
        
        # Copy only the recognized filter keys into the query string
        for key in ("folder_id", "tag_id", "date_start", "date_end"):
            if filters and key in filters:
                params[key] = filters[key]
        
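        # Time out on stalled connections and raise on HTTP error statuses
        response = requests.get(url, headers=self.headers, params=params, timeout=30)
        response.raise_for_status()
        return response.json()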
    
    def enhanced_search(self, query, filters=None, min_score=0.7):
        url = f"{self.base_url}/search/enhanced"
        data = {
            "query": query,
            "min_score": min_score
        }
        if filters:
            data["filters"] = filters
        
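        # Time out on stalled connections and raise on HTTP error statuses
        response = requests.post(url, headers=self.headers, json=data, timeout=30)
        response.raise_for_status()
        return response.json()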

# Usage
search = ArchivusSemanticSearch("YOUR_API_KEY", "your-tenant")

# Basic semantic search
results = search.search("contracts expiring in Q4 2025")

# Filtered search
results = search.search(
    "employee benefits documents",
    filters={
        "folder_id": "folder_hr",
        "date_start": "2025-01-01"
    }
)

# Enhanced search
results = search.enhanced_search(
    "Find all contracts with renewal clauses",
    filters={
        "tags": ["contract"],
        "ai_categories": ["legal"]
    }
)

JavaScript:

class ArchivusSemanticSearch {
  constructor(apiKey, tenant) {
    this.apiKey = apiKey;
    this.tenant = tenant;
    this.baseURL = 'https://api.archivus.app/api/v1';
    this.headers = {
      'Authorization': `Bearer ${apiKey}`,
      'X-Tenant-Subdomain': tenant
    };
  }
  
  async search(query, limit = 20, filters = {}) {
    const url = new URL(`${this.baseURL}/search`);
    url.searchParams.set('q', query);
    url.searchParams.set('mode', 'semantic');
    url.searchParams.set('limit', limit);
    
    Object.entries(filters).forEach(([key, value]) => {
      if (value) url.searchParams.set(key, value);
    });
    
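    const response = await fetch(url, { headers: this.headers });
    // Surface HTTP errors instead of parsing an error body as results
    if (!response.ok) throw new Error(`Search failed: HTTP ${response.status}`);
    return response.json();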
  }
  
  async enhancedSearch(query, filters = {}, minScore = 0.7) {
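    const response = await fetch(`${this.baseURL}/search/enhanced`, {
      method: 'POST',
      headers: { ...this.headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, filters, min_score: minScore })
    });
    // Surface HTTP errors instead of parsing an error body as results
    if (!response.ok) throw new Error(`Enhanced search failed: HTTP ${response.status}`);
    return response.json();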
  }
}

// Usage
const search = new ArchivusSemanticSearch('YOUR_API_KEY', 'your-tenant');

// Basic semantic search
const results = await search.search('contracts expiring in Q4 2025');

// Filtered search
const filteredResults = await search.search('employee benefits', 20, {
  folder_id: 'folder_hr',
  date_start: '2025-01-01'
});

// Enhanced search
const enhancedResults = await search.enhancedSearch(
  'Find all contracts with renewal clauses',
  {
    tags: ['contract'],
    ai_categories: ['legal']
  }
);

Query Tips

Use Natural Language

Good:

  • “Find all contracts expiring soon”
  • “Documents about employee benefits”
  • “Contracts with renewal clauses”
  • “Invoices from last quarter”

Less Effective:

  • “contract expiring” (too short; the query may be handled as a keyword search instead)
  • “doc OR contract OR agreement” (keyword syntax)

Be Specific

Better:

  • “Q4 2025 service contracts”
  • “Employee health insurance documents”
  • “Contracts requiring 30-day notice”

Less Specific:

  • “contracts”
  • “documents”
  • “files”

Combine with Filters

Use semantic search for discovery, filters for refinement:

# Find all contracts (semantic search)
results = search.search("service agreements")

# Then filter by date
results = search.search(
    "service agreements",
    filters={"date_start": "2025-01-01"}
)

Understanding Results

Score

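Relevance score (0.0 to 1.0). A score that falls on a boundary belongs to the higher band, consistent with the strict cutoff on the lowest band: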

  • 0.8 - 1.0: Highly relevant
  • 0.6 - 0.8: Relevant
  • 0.4 - 0.6: Somewhat relevant
  • < 0.4: Low relevance
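
The bands above translate directly into a small lookup. This is a sketch under one assumption: a score that sits exactly on a boundary (0.8, 0.6, 0.4) is placed in the higher band, which matches the strict "less than 0.4" cutoff in the last bullet. `relevance_band` is a hypothetical helper, and its labels are the band names from this list, not the values of the API's `relevance` field.

```python
def relevance_band(score):
    """Map a 0.0-1.0 relevance score to the band names listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "highly relevant"
    if score >= 0.6:
        return "relevant"
    if score >= 0.4:
        return "somewhat relevant"
    return "low relevance"


for value in (0.92, 0.65, 0.41, 0.12):
    print(value, relevance_band(value))
```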

Hybrid Scoring

Archivus uses hybrid scoring:

  • 70% semantic similarity (vector search)
  • 30% text relevance (keyword matching)

The semantic component catches synonyms and related concepts, while the text component rewards documents that contain the exact query terms.

Result Structure

{
  "results": [
    {
      "document": {
        "id": "doc_abc123",
        "filename": "contract.pdf",
        "ai_summary": "...",
        "highlight": "...matching text..."
      },
      "score": 0.92,
      "relevance": "high"
    }
  ],
  "mode": "semantic",
  "query": "contract renewal",
  "total": 5
}

Advanced Features

Enhanced Search (Pro+)

AI-enhanced search with advanced filtering:

curl -X POST https://api.archivus.app/api/v1/search/enhanced \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find all contracts expiring in Q4",
    "filters": {
      "date_range": {
        "start": "2025-10-01",
        "end": "2025-12-31"
      },
      "tags": ["contract"],
      "ai_categories": ["legal"]
    },
    "include_similar": true,
    "min_score": 0.7
  }'
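
When the date range follows calendar quarters, as in the request above, the `start` and `end` values can be computed rather than typed by hand. The following is a minimal sketch. `quarter_date_range` is a hypothetical helper, and the payload simply mirrors the field names used in the curl example.

```python
from datetime import date, timedelta


def quarter_date_range(year, quarter):
    """Return ISO start/end dates for a calendar quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    start = date(year, 3 * (quarter - 1) + 1, 1)
    # The quarter ends one day before the next quarter starts
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, 3 * quarter + 1, 1)
    end = next_start - timedelta(days=1)
    return {"start": start.isoformat(), "end": end.isoformat()}


# Request body shaped like the curl example above
payload = {
    "query": "Find all contracts expiring in Q4",
    "filters": {
        "date_range": quarter_date_range(2025, 4),
        "tags": ["contract"],
        "ai_categories": ["legal"],
    },
    "include_similar": True,
    "min_score": 0.7,
}
print(payload["filters"]["date_range"])
```

The resulting dictionary can be passed as the JSON body of the POST request.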

Hybrid Search

Combine semantic and keyword search:

curl -X POST https://api.archivus.app/api/v1/search/hybrid \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "contract renewal terms",
    "semantic_weight": 0.7,
    "text_weight": 0.3
  }'
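
The same request body can be built in Python. This sketch assumes the two weights are complementary and sum to 1.0, as they do in the example above; the text does not say whether the API requires that. `build_hybrid_payload` is a hypothetical helper, and no request is sent here.

```python
import json


def build_hybrid_payload(query, semantic_weight=0.7):
    """Build a hybrid-search body with complementary weights."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0.0 and 1.0")
    # Assumption: text_weight is whatever remains after semantic_weight.
    # Rounding hides floating-point noise such as 0.30000000000000004.
    text_weight = round(1.0 - semantic_weight, 10)
    return {
        "query": query,
        "semantic_weight": semantic_weight,
        "text_weight": text_weight,
    }


payload = build_hybrid_payload("contract renewal terms")
print(json.dumps(payload, indent=2))
```

Raising `semantic_weight` favors conceptual matches, and lowering it favors exact term matches.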

Performance

Speed

  • Typical: 300-800ms
  • With filters: 200-500ms
  • Cached queries: < 100ms

Cost

  • Semantic search: 0.5 AI credits per query
  • At this rate, 1,000 queries consume 500 AI credits
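
For budgeting, the per-query rate can be turned into a quick estimate. This sketch uses only the 0.5 credits per query figure stated above. `estimate_credits` is a hypothetical helper, and it makes no assumption about whether cached or enhanced queries are billed differently.

```python
def estimate_credits(query_count, credits_per_query=0.5):
    """Estimate AI credits consumed by a number of semantic searches."""
    if query_count < 0:
        raise ValueError("query_count must not be negative")
    return query_count * credits_per_query


print(estimate_credits(1000))
```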

Best Practices

Query Strategy

  1. Start broad - Use semantic search to discover documents
  2. Narrow down - Add filters based on initial results
  3. Refine query - Adjust query based on results quality

Performance

  1. Use filters - Narrow search scope for faster results
  2. Limit results - Use limit parameter (default: 20)
  3. Cache results - Store results client-side when possible
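
Client-side caching, the third tip above, could look like the following. This is a minimal in-memory sketch, not an Archivus feature: `CachedSearch` is a hypothetical wrapper, the five-minute TTL is an arbitrary choice, and `fake_search` stands in for a real API call so the example runs without network access.

```python
import json
import time


class CachedSearch:
    """Wrap a search function with a small time-limited in-memory cache."""

    def __init__(self, search_fn, ttl_seconds=300, clock=time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}

    def search(self, query, **params):
        # Serialize the arguments so nested filter dicts can form a key
        key = json.dumps({"q": query, "params": params}, sort_keys=True)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._search_fn(query, **params)
        self._cache[key] = (now, result)
        return result


calls = []


def fake_search(query, **params):
    """Stand-in for a real API call; records each invocation."""
    calls.append(query)
    return {"query": query, "results": []}


cached = CachedSearch(fake_search, ttl_seconds=60)
cached.search("service agreements", limit=20)
cached.search("service agreements", limit=20)  # served from the cache
print(len(calls))
```

In practice the wrapped function could be the `search` method of the client class shown earlier. Results may go stale if documents are added during the TTL window, so a short TTL is the safer default.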

Accuracy

  1. Be specific - More specific queries = better results
  2. Use filters - Combine semantic search with filters
  3. Review scores - Check relevance scores for quality
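
Reviewing scores can be automated by dropping weak matches before they reach the user. The sketch below filters a parsed response on the client side. `filter_by_score` is a hypothetical helper, the 0.6 threshold is an arbitrary choice, and the sample data is invented for illustration.

```python
def filter_by_score(response, min_score=0.6):
    """Return a copy of the response keeping results at or above min_score."""
    kept = [
        item
        for item in response.get("results", [])
        if item.get("score", 0.0) >= min_score
    ]
    # "total" is left untouched: it still reflects the server-side count
    return {**response, "results": kept}


# Invented sample in the documented response shape
sample_response = {
    "results": [
        {"document": {"id": "doc_1", "filename": "contract.pdf"}, "score": 0.92},
        {"document": {"id": "doc_2", "filename": "memo.pdf"}, "score": 0.35},
    ],
    "mode": "semantic",
    "query": "contract renewal",
    "total": 2,
}

strong = filter_by_score(sample_response)
print([item["document"]["filename"] for item in strong["results"]])
```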

Use Cases

Similar Documents

# Find documents similar to a specific document
results = search.search("documents similar to contract abc123")

Concept Discovery

# Discover documents about a concept
results = search.search("employee benefits and insurance")

# Search across all documents for a concept
results = search.search("compliance requirements")

Questions? Check the FAQ or contact support@ubiship.com