Hybrid Analytics
DuckDB-Powered Analytics for Real-Time Insights (Team+)
Overview
Archivus implements a hybrid analytics architecture that combines PostgreSQL (OLTP) with DuckDB (OLAP) to deliver 10-100x faster analytical queries without impacting operational workloads.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Archivus Data Layer │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Operational Data (OLTP) │ │
│ │ PostgreSQL │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │Documents│ │ Chats │ │ Users │ │ Folders │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ └───────┼───────────┼───────────┼───────────┼───────────────┘ │
│ │ │ │ │ │
│ └───────────┴─────┬─────┴───────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ ETL Sync │ (Background workers) │
│ │ (Periodic) │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────────────────▼─────────────────────────────────┐ │
│ │ Analytical Data (OLAP) │ │
│ │ DuckDB + S3 Parquet │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ document_ │ │ chat_ │ │ usage_ │ │ │
│ │ │ analytics │ │ analytics │ │ analytics │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Query Examples
Query Type
PostgreSQL
DuckDB
Improvement
Daily document counts (30 days)
450ms
12ms
37x faster
Monthly active users
890ms
25ms
35x faster
Storage growth trend
1.2s
45ms
27x faster
Top tags analysis
2.1s
78ms
27x faster
AI credit usage breakdown
3.4s
120ms
28x faster
Cross-tenant aggregation*
8.5s
340ms
25x faster
*Super admin only
Why DuckDB?
Feature
PostgreSQL
DuckDB
Primary use
Transactions (OLTP)
Analytics (OLAP)
Storage
Row-based
Column-based
Aggregations
Moderate
Excellent
Concurrent writes
Excellent
Limited
Query latency
Low for CRUD
Low for analytics
Data Model
Parquet Files on S3
s3://archivus-analytics/
├── tenant_{id}/
│ ├── documents/
│ │ ├── 2026-01-18.parquet
│ │ ├── 2026-01-17.parquet
│ │ └── ...
│ ├── chats/
│ │ └── ...
│ ├── usage/
│ │ └── ...
│ └── ai_traffic/
│ └── ...
Schema Example: Document Analytics
-- DuckDB table backed by Parquet
CREATE TABLE document_analytics AS
SELECT * FROM read_parquet ( 's3://archivus-analytics/tenant_*/documents/*.parquet' );
-- Schema
document_id UUID
tenant_id UUID
created_at TIMESTAMP
file_type VARCHAR
file_size_bytes BIGINT
page_count INTEGER
has_ai_analysis BOOLEAN
embedding_count INTEGER
tag_count INTEGER
view_count INTEGER
chat_count INTEGER
Available Analytics
Document Analytics
GET /api/v1/analytics/documents?period= 30d
Response:
{
"period" : "30d" ,
"summary" : {
"total_documents" : 1523,
"total_size_gb" : 12.4,
"documents_added" : 234,
"documents_analyzed" : 189
} ,
"by_type" : [
{ "type" : "pdf" , "count" : 890, "size_gb" : 8.2} ,
{ "type" : "docx" , "count" : 412, "size_gb" : 2.8} ,
{ "type" : "xlsx" , "count" : 221, "size_gb" : 1.4}
] ,
"daily_trend" : [
{ "date" : "2026-01-18" , "added" : 12, "deleted" : 2} ,
{ "date" : "2026-01-17" , "added" : 8, "deleted" : 0}
]
}
AI Usage Analytics
GET /api/v1/analytics/ai-usage?period= 30d
Response:
{
"period" : "30d" ,
"summary" : {
"total_credits_used" : 4520,
"total_requests" : 2890,
"avg_credits_per_day" : 150.7
} ,
"by_operation" : [
{ "operation" : "chat" , "credits" : 2100, "requests" : 1050} ,
{ "operation" : "summary" , "credits" : 890, "requests" : 445} ,
{ "operation" : "analysis" , "credits" : 1200, "requests" : 400} ,
{ "operation" : "search" , "credits" : 330, "requests" : 660}
] ,
"by_user" : [
{ "user_id" : "user_123" , "credits" : 1200, "requests" : 450} ,
{ "user_id" : "user_456" , "credits" : 980, "requests" : 320}
]
}
Storage Analytics
GET /api/v1/analytics/storage?period= 90d
Response:
{
"period" : "90d" ,
"current" : {
"total_size_gb" : 45.2,
"quota_gb" : 100,
"utilization_percent" : 45.2
} ,
"growth" : {
"last_30_days_gb" : 8.5,
"projected_30_days_gb" : 9.2,
"days_until_quota" : 183
} ,
"by_folder" : [
{ "folder" : "/contracts" , "size_gb" : 12.4, "documents" : 456} ,
{ "folder" : "/invoices" , "size_gb" : 8.9, "documents" : 2341}
]
}
Search Analytics
GET /api/v1/analytics/search?period= 30d
Response:
{
"period" : "30d" ,
"summary" : {
"total_searches" : 3420,
"avg_results" : 8.5,
"avg_latency_ms" : 340
} ,
"popular_queries" : [
{ "query" : "contract terms" , "count" : 89} ,
{ "query" : "invoice 2025" , "count" : 67}
] ,
"search_success_rate" : 0.92,
"zero_result_queries" : 274
}
ETL Process
Sync Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ETL Pipeline │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PostgreSQL │────▶│ Worker │────▶│ S3 Parquet │ │
│ │ (Source) │ │ (Extract) │ │ (Target) │ │
│ └─────────────┘ └──────┬──────┘ └─────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ DuckDB │ │
│ │ (Query) │ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Sync Schedule
Data Type
Sync Frequency
Latency
Documents
Every 5 minutes
~5 min
Chat sessions
Every 15 minutes
~15 min
AI usage
Hourly
~1 hour
Storage metrics
Daily
~24 hours
Incremental Updates
-- Only sync changed records
SELECT * FROM documents
WHERE updated_at > : last_sync_time
AND tenant_id = : tenant_id ;
Query API
Custom Analytics Queries
POST /api/v1/analytics/query
Content-Type: application/json
{
"query" : "document_growth" ,
"parameters" : {
"start_date" : "2026-01-01" ,
"end_date" : "2026-01-18" ,
"group_by" : "week"
}
}
Response:
{
"query" : "document_growth" ,
"results" : [
{ "week" : "2026-W01" , "documents_added" : 45, "total" : 1450} ,
{ "week" : "2026-W02" , "documents_added" : 52, "total" : 1502} ,
{ "week" : "2026-W03" , "documents_added" : 21, "total" : 1523}
] ,
"execution_time_ms" : 23
}
Available Query Types
Query Type
Description
document_growth
Document count over time
storage_trend
Storage usage trend
ai_usage
AI credit consumption
user_activity
User engagement metrics
search_patterns
Search behavior analysis
tag_distribution
Tag usage analysis
Cost Efficiency
Pricing Model
Component
Cost
S3 Storage
~$0.023/GB/month
DuckDB Compute
Ephemeral (no always-on)
ETL Workers
Shared infrastructure
vs Traditional Data Warehouse
Aspect
Traditional DW
Archivus Hybrid
Fixed costs
$500-5000/month
$0
Storage
$0.10-0.50/GB
$0.023/GB
Query pricing
Per-query or reserved
Included
Setup complexity
High
Built-in
Tenant Isolation
Analytics data maintains tenant isolation:
-- All analytics queries filtered by tenant
SELECT * FROM document_analytics
WHERE tenant_id = current_setting ( 'app.tenant_id' );
-- Parquet files organized by tenant
s3 : // archivus - analytics / tenant_ { tenant_id } / ...
API Reference
Analytics Endpoints
Endpoint
Method
Description
/analytics/documents
GET
Document analytics
/analytics/ai-usage
GET
AI credit usage
/analytics/storage
GET
Storage metrics
/analytics/search
GET
Search analytics
/analytics/users
GET
User activity
/analytics/query
POST
Custom queries
/analytics/export
POST
Export analytics data
Tier Availability
Feature
Pro
Team
Enterprise
Basic analytics
Dashboard only
Full API
Full API
Custom queries
-
Limited
Unlimited
Data retention
30 days
90 days
Custom
Export
-
CSV
CSV, Parquet
Real-time sync
-
15 min
5 min
Best Practices
Query Optimization
Use date ranges - Limit query scope
Specify aggregation level - Daily, weekly, monthly
Filter early - Use query parameters
Data Management
Monitor sync status - Ensure ETL healthy
Review data freshness - Understand latency
Archive old data - Manage storage costs
Ready to analyze? Explore Analytics Dashboard