Semantic Search
Learn how to use Archivus’s AI-powered semantic search to find documents by meaning, not just keywords.
What is Semantic Search?
Semantic search understands the meaning behind your query, not just the exact words. It finds documents even when they use different terminology.
Example:
- Query: “tax return documents”
- Finds: “1040 forms”, “income tax filing”, “tax documents”, “IRS forms”, etc.
Quick Start
Basic Semantic Search
curl "https://api.archivus.app/api/v1/search?q=find+all+contracts+expiring+soon&mode=semantic" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"
Response:
{
  "results": [
    {
      "document": {
        "id": "doc_abc123",
        "filename": "service-agreement.pdf",
        "ai_summary": "Service agreement expiring December 31, 2025..."
      },
      "score": 0.92,
      "relevance": "high"
    }
  ],
  "mode": "semantic",
  "query": "find all contracts expiring soon",
  "total": 5
}
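The query string in the request above is URL-encoded (spaces become `+`). As a minimal sketch, the same URL can be built with Python's standard library; `build_search_url` is a hypothetical helper name used only for illustration, not part of any Archivus SDK.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example above.
BASE_URL = "https://api.archivus.app/api/v1"


def build_search_url(query, mode="semantic"):
    """Return the GET /search URL with the query safely URL-encoded."""
    # urlencode turns spaces into "+" and percent-encodes reserved characters.
    return f"{BASE_URL}/search?{urlencode({'q': query, 'mode': mode})}"


print(build_search_url("find all contracts expiring soon"))
```

Encoding the query this way avoids broken requests when a search phrase contains characters such as `&` or `?`.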
How It Works
Vector Embeddings
- Document Processing: Documents are converted to vector embeddings (1536 dimensions)
- Query Processing: Your search query is converted to the same vector format
- Similarity Matching: Archivus finds documents with similar vectors
- Hybrid Scoring: Combines semantic similarity (70%) with text relevance (30%)
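To make the similarity-matching step concrete, here is a small sketch that ranks documents by cosine similarity. This is an illustration under assumptions: the metric Archivus actually uses is not stated here, and the vectors below are made-up 3-dimensional toys rather than real 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, 3 dimensions instead of 1536).
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "service-agreement.pdf": [0.8, 0.2, 0.1],
    "holiday-menu.pdf": [0.0, 0.1, 0.9],
}

# Sort document names from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked[0])
```

Documents whose vectors point in nearly the same direction as the query vector rank first, regardless of whether they share any exact words with the query.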
Why It’s Better
Traditional Keyword Search:
- Finds “contract renewal” only if those exact words appear
- Misses “service agreement extension”
- Requires exact terminology match
Semantic Search:
- Finds “contract renewal”, “service agreement extension”, “term extension”, etc.
- Understands synonyms and related concepts
- Works with natural language queries
Code Examples
Python - Semantic Search
import requests


class ArchivusSemanticSearch:
    """Minimal client for the Archivus search endpoints."""

    def __init__(self, api_key, tenant):
        self.api_key = api_key
        self.tenant = tenant
        self.base_url = "https://api.archivus.app/api/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-Subdomain": tenant
        }

    def search(self, query, limit=20, filters=None):
        """Run a semantic search (GET /search) with optional filters."""
        url = f"{self.base_url}/search"
        params = {
            "q": query,
            "mode": "semantic",
            "limit": limit
        }
        # Forward only the filter keys this client knows about.
        if filters:
            for key in ("folder_id", "tag_id", "date_start", "date_end"):
                if key in filters:
                    params[key] = filters[key]
        response = requests.get(url, headers=self.headers, params=params)
        return response.json()

    def enhanced_search(self, query, filters=None, min_score=0.7):
        """Run an enhanced search (POST /search/enhanced)."""
        url = f"{self.base_url}/search/enhanced"
        data = {
            "query": query,
            "min_score": min_score
        }
        if filters:
            data["filters"] = filters
        response = requests.post(url, headers=self.headers, json=data)
        return response.json()
# Usage
search = ArchivusSemanticSearch("YOUR_API_KEY", "your-tenant")

# Basic semantic search
results = search.search("contracts expiring in Q4 2025")

# Filtered search
results = search.search(
    "employee benefits documents",
    filters={
        "folder_id": "folder_hr",
        "date_start": "2025-01-01"
    }
)

# Enhanced search
results = search.enhanced_search(
    "Find all contracts with renewal clauses",
    filters={
        "tags": ["contract"],
        "ai_categories": ["legal"]
    }
)
JavaScript - Semantic Search
class ArchivusSemanticSearch {
  constructor(apiKey, tenant) {
    this.apiKey = apiKey;
    this.tenant = tenant;
    this.baseURL = 'https://api.archivus.app/api/v1';
    this.headers = {
      'Authorization': `Bearer ${apiKey}`,
      'X-Tenant-Subdomain': tenant
    };
  }

  async search(query, limit = 20, filters = {}) {
    const url = new URL(`${this.baseURL}/search`);
    url.searchParams.set('q', query);
    url.searchParams.set('mode', 'semantic');
    url.searchParams.set('limit', limit);
    Object.entries(filters).forEach(([key, value]) => {
      if (value) url.searchParams.set(key, value);
    });
    const response = await fetch(url, { headers: this.headers });
    return response.json();
  }

  async enhancedSearch(query, filters = {}, minScore = 0.7) {
    const response = await fetch(`${this.baseURL}/search/enhanced`, {
      method: 'POST',
      headers: { ...this.headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, filters, min_score: minScore })
    });
    return response.json();
  }
}
// Usage
const search = new ArchivusSemanticSearch('YOUR_API_KEY', 'your-tenant');

// Basic semantic search
const results = await search.search('contracts expiring in Q4 2025');

// Filtered search
const filteredResults = await search.search('employee benefits', 20, {
  folder_id: 'folder_hr',
  date_start: '2025-01-01'
});

// Enhanced search
const enhancedResults = await search.enhancedSearch(
  'Find all contracts with renewal clauses',
  {
    tags: ['contract'],
    ai_categories: ['legal']
  }
);
Query Tips
Use Natural Language
Good:
- “Find all contracts expiring soon”
- “Documents about employee benefits”
- “Contracts with renewal clauses”
- “Invoices from last quarter”
Less Effective:
- “contract expiring” (too short, may use keyword search)
- “doc OR contract OR agreement” (keyword syntax)
Be Specific
Better:
- “Q4 2025 service contracts”
- “Employee health insurance documents”
- “Contracts requiring 30-day notice”
Less Specific:
- “contracts”
- “documents”
- “files”
Combine with Filters
Use semantic search for discovery, filters for refinement:
# Find all contracts (semantic search)
results = search.search("service agreements")

# Then filter by date
results = search.search(
    "service agreements",
    filters={"date_start": "2025-01-01"}
)
Understanding Results
Score
Relevance score (0.0 to 1.0):
- 0.8 - 1.0: Highly relevant
- 0.6 - 0.8: Relevant
- 0.4 - 0.6: Somewhat relevant
- < 0.4: Low relevance
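The bands above can be turned into a small client-side helper. This is a sketch with one stated assumption: the listed ranges share their boundary values (0.8, 0.6, 0.4), so the code treats each lower bound as inclusive. The function name and returned labels are illustrative and are not the values of the API's `relevance` field.

```python
def relevance_band(score):
    """Map a 0.0-1.0 relevance score to the descriptive bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # Assumption: each band includes its lower bound.
    if score >= 0.8:
        return "highly relevant"
    if score >= 0.6:
        return "relevant"
    if score >= 0.4:
        return "somewhat relevant"
    return "low relevance"


print(relevance_band(0.92))  # prints "highly relevant"
```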
Hybrid Scoring
Archivus uses hybrid scoring:
- 70% semantic similarity (vector search)
- 30% text relevance (keyword matching)
This lets conceptually related documents rank well while exact keyword matches still count toward the score.
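As a worked example of the 70/30 split, the sketch below combines two component scores as a weighted sum. It assumes both components are already normalized to the 0.0 to 1.0 range and that the combination is linear; how Archivus computes and normalizes each component internally is not described here.

```python
def hybrid_score(semantic, text, semantic_weight=0.7, text_weight=0.3):
    """Weighted sum of a semantic-similarity score and a text-relevance score."""
    return semantic_weight * semantic + text_weight * text


# A document with strong semantic similarity but only moderate keyword overlap:
# 0.7 * 0.9 + 0.3 * 0.5 = 0.63 + 0.15 = 0.78
print(round(hybrid_score(0.9, 0.5), 2))  # prints 0.78
```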
Result Structure
{
  "results": [
    {
      "document": {
        "id": "doc_abc123",
        "filename": "contract.pdf",
        "ai_summary": "...",
        "highlight": "...matching text..."
      },
      "score": 0.92,
      "relevance": "high"
    }
  ],
  "mode": "semantic",
  "query": "contract renewal",
  "total": 5
}
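A short sketch of reading this structure in Python follows. The sample dictionary mirrors the response above; `summarize_results` is a hypothetical helper name, and the code assumes only the fields shown in this example are present.

```python
# Sample response shaped like the structure above.
sample_response = {
    "results": [
        {
            "document": {
                "id": "doc_abc123",
                "filename": "contract.pdf",
                "ai_summary": "...",
                "highlight": "...matching text...",
            },
            "score": 0.92,
            "relevance": "high",
        }
    ],
    "mode": "semantic",
    "query": "contract renewal",
    "total": 5,
}


def summarize_results(response):
    """Return (id, filename, score) tuples, tolerating missing keys."""
    rows = []
    for hit in response.get("results", []):
        doc = hit.get("document", {})
        rows.append((doc.get("id"), doc.get("filename"), hit.get("score")))
    return rows


for doc_id, filename, score in summarize_results(sample_response):
    print(f"{score:.2f}  {filename}  ({doc_id})")
```

Using `.get()` keeps the loop from raising `KeyError` if a field such as `highlight` is absent from some results.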
Advanced Features
Enhanced Search (Pro+)
AI-enhanced search with advanced filtering:
curl -X POST https://api.archivus.app/api/v1/search/enhanced \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find all contracts expiring in Q4",
    "filters": {
      "date_range": {
        "start": "2025-10-01",
        "end": "2025-12-31"
      },
      "tags": ["contract"],
      "ai_categories": ["legal"]
    },
    "include_similar": true,
    "min_score": 0.7
  }'
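The `date_range` in the request above covers Q4 2025. If you build such filters programmatically, a helper like the one sketched below can compute the bounds. `quarter_date_range` is a hypothetical name, and the sketch assumes calendar quarters (Q1 starts in January) rather than a fiscal calendar.

```python
import calendar
from datetime import date


def quarter_date_range(year, quarter):
    """Return {"start", "end"} ISO dates for a calendar quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    start_month = 3 * (quarter - 1) + 1
    end_month = start_month + 2
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, end_month)[1]
    return {
        "start": date(year, start_month, 1).isoformat(),
        "end": date(year, end_month, last_day).isoformat(),
    }


filters = {"date_range": quarter_date_range(2025, 4), "tags": ["contract"]}
print(filters["date_range"])
```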
Hybrid Search
Combine semantic and keyword search:
curl -X POST https://api.archivus.app/api/v1/search/hybrid \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "contract renewal terms",
    "semantic_weight": 0.7,
    "text_weight": 0.3
  }'
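When the weights come from user input, it can help to validate them before sending the request. The sketch below builds the same JSON body as the curl example. Note the assumption: whether the API requires the two weights to sum to 1.0 is not stated here, so that check is a client-side convention, and `build_hybrid_payload` is a hypothetical helper name.

```python
import math


def build_hybrid_payload(query, semantic_weight=0.7, text_weight=0.3):
    """Build the request body for POST /search/hybrid with basic weight checks."""
    for name, weight in (("semantic_weight", semantic_weight), ("text_weight", text_weight)):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    # Assumption: weights should sum to 1.0 (not confirmed by the API docs).
    if not math.isclose(semantic_weight + text_weight, 1.0):
        raise ValueError("semantic_weight and text_weight should sum to 1.0")
    return {
        "query": query,
        "semantic_weight": semantic_weight,
        "text_weight": text_weight,
    }


payload = build_hybrid_payload("contract renewal terms")
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, in the same way the `enhanced_search` method in the Python example sends its body.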
Performance
Speed
- Typical: 300-800ms
- With filters: 200-500ms
- Cached queries: < 100ms
Cost
- Semantic search: 0.5 AI credits per query
- Very affordable - At 0.5 credits per query, 1,000 searches use 500 AI credits
Best Practices
Query Strategy
- Start broad - Use semantic search to discover documents
- Narrow down - Add filters based on initial results
- Refine query - Adjust query based on results quality
Performance
- Use filters - Narrow search scope for faster results
- Limit results - Use the limit parameter (default: 20)
- Cache results - Store results client-side when possible
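Client-side caching can be as simple as remembering recent responses for a short time. The sketch below is one possible approach, not an Archivus feature: `SearchCache` is a hypothetical class, the 5-minute default lifetime is an arbitrary choice, and `fetch` stands in for whatever function performs the real API call.

```python
import time


class SearchCache:
    """Keep search responses in memory for ttl_seconds, keyed by query."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # query -> (time stored, response)

    def get_or_fetch(self, query, fetch):
        """Return a fresh cached response, or call fetch(query) and cache it."""
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = fetch(query)
        self._entries[query] = (now, result)
        return result


# Demo with a stand-in fetch function that records how often it is called.
calls = []


def fake_fetch(query):
    calls.append(query)
    return {"query": query, "results": []}


cache = SearchCache(ttl_seconds=60)
cache.get_or_fetch("service agreements", fake_fetch)
cache.get_or_fetch("service agreements", fake_fetch)  # served from the cache
print(len(calls))  # prints 1
```

In real use, `fetch` could be a function that wraps the `search` method of the Python client from the Code Examples section. If filters vary between calls, include them in the cache key as well.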
Accuracy
- Be specific - More specific queries = better results
- Use filters - Combine semantic search with filters
- Review scores - Check relevance scores for quality
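One way to act on relevance scores is to drop weak matches on the client and sort the rest. The sketch below assumes the response shape shown under Result Structure; `keep_relevant` is a hypothetical helper, and the 0.6 cutoff simply reuses the lower bound of the "Relevant" band.

```python
def keep_relevant(response, min_score=0.6):
    """Return results scoring at least min_score, highest score first."""
    hits = [
        hit for hit in response.get("results", [])
        if hit.get("score", 0.0) >= min_score
    ]
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


# Made-up response with three hits of varying quality.
response = {
    "results": [
        {"document": {"id": "doc_a"}, "score": 0.55},
        {"document": {"id": "doc_b"}, "score": 0.92},
        {"document": {"id": "doc_c"}, "score": 0.71},
    ]
}

for hit in keep_relevant(response):
    print(hit["document"]["id"], hit["score"])
```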
Use Cases
Finding Related Documents
# Find documents similar to a specific document
results = search.search("documents similar to contract abc123")
Concept Discovery
# Discover documents about a concept
results = search.search("employee benefits and insurance")
Cross-Document Search
# Search across all documents for a concept
results = search.search("compliance requirements")
Next Steps
- Chat with Documents - Ask questions about search results
- Auto-Tagging - AI-powered document tagging
- API Reference - Complete search API documentation
Questions? Check the FAQ or contact support@ubiship.com