Document Analysis
Learn about Archivus’s advanced AI analysis features including summarization, entity extraction, and classification.
Overview
Archivus provides comprehensive AI analysis of your documents:
- Summarization - Quick and full summaries
- Entity Extraction - People, organizations, dates, amounts, locations
- Classification - Automatic categorization
- Duplicate Detection - Find similar documents
- Sensitive Data Scanning - PII, financial data detection
Summarization
Quick Summary
Get a 2-3 sentence summary:
curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "quick"
  }'
Response:
{
  "summary": "This document is a service agreement between Acme Corp and XYZ Inc, covering a 24-month term with payment terms of Net 30 days.",
  "type": "quick",
  "created_at": "2025-12-16T10:30:00Z"
}
Full Summary
Get a comprehensive summary:
curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "full"
  }'
Response:
{
  "summary": "This service agreement document outlines the terms and conditions between Acme Corp (Service Provider) and XYZ Inc (Client) for the provision of consulting services.\n\n**Key Terms:**\n- Term: 24 months starting January 1, 2025\n- Payment: Net 30 days, $50,000 monthly\n- Termination: Either party with 30 days notice\n\n**Obligations:**\n- Service Provider: Deliver consulting services as specified\n- Client: Make timely payments\n\n**Important Dates:**\n- Start Date: January 1, 2025\n- End Date: December 31, 2026",
  "type": "full",
  "key_points": [
    "24-month service agreement",
    "$50,000 monthly payment",
    "30-day termination notice"
  ],
  "created_at": "2025-12-16T10:30:00Z"
}
Entity Extraction
Extract people, organizations, dates, amounts, and locations:
curl https://api.archivus.app/api/v1/documents/doc_abc123/entities \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"
Response:
{
  "entities": {
    "people": [
      {"name": "John Doe", "role": "CEO", "confidence": 0.95},
      {"name": "Jane Smith", "role": "CFO", "confidence": 0.92}
    ],
    "organizations": [
      {"name": "Acme Corp", "type": "company", "confidence": 0.98},
      {"name": "XYZ Inc", "type": "company", "confidence": 0.97}
    ],
    "dates": [
      {"value": "2025-01-01", "type": "start_date", "confidence": 0.95},
      {"value": "2026-12-31", "type": "end_date", "confidence": 0.94}
    ],
    "amounts": [
      {"value": 50000, "currency": "USD", "type": "monthly_payment", "confidence": 0.96}
    ],
    "locations": [
      {"name": "New York, NY", "type": "city", "confidence": 0.93}
    ]
  }
}
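One way to act on the confidence values in this response is to flag entries below a cutoff for manual review. The following is a minimal sketch, not part of the Archivus API: the function name `low_confidence_entities` and the 0.95 cutoff are assumptions chosen for illustration, and the sample payload is trimmed from the response above.

```python
def low_confidence_entities(response, threshold=0.95):
    """Return (category, entity) pairs whose confidence is below threshold.

    The threshold is an illustrative cutoff, not a value defined by Archivus.
    """
    flagged = []
    for category, items in response.get("entities", {}).items():
        for item in items:
            # Entities without a confidence field are treated as unverified.
            if item.get("confidence", 0.0) < threshold:
                flagged.append((category, item))
    return flagged


# Sample payload trimmed from the entity extraction response.
sample = {
    "entities": {
        "people": [
            {"name": "John Doe", "role": "CEO", "confidence": 0.95},
            {"name": "Jane Smith", "role": "CFO", "confidence": 0.92},
        ],
        "locations": [
            {"name": "New York, NY", "type": "city", "confidence": 0.93},
        ],
    }
}

for category, entity in low_confidence_entities(sample):
    print(category, entity["name"], entity["confidence"])
```

With this sample, Jane Smith and New York, NY are flagged, while John Doe sits exactly at the cutoff and passes.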
Classification
Documents are categorized automatically. Retrieve a document to read its categories:
curl https://api.archivus.app/api/v1/documents/doc_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"
Response includes:
{
  "id": "doc_abc123",
  "ai_categories": ["legal", "business", "contract"],
  "classification": {
    "primary": "legal",
    "secondary": ["business", "contract"],
    "confidence": 0.95
  }
}
Code Examples
Python - Document Analysis
import requests


class ArchivusAnalysis:
    """Thin client for the Archivus document analysis endpoints."""

    def __init__(self, api_key, tenant, timeout=30):
        self.api_key = api_key
        self.tenant = tenant
        self.base_url = "https://api.archivus.app/api/v1"
        # The 30-second default is an arbitrary choice; tune it for your workload.
        self.timeout = timeout
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-Subdomain": tenant,
        }

    def _request(self, method, path, **kwargs):
        # Raise on HTTP errors instead of treating an error body as a result.
        response = requests.request(
            method,
            f"{self.base_url}{path}",
            headers=self.headers,
            timeout=self.timeout,
            **kwargs,
        )
        response.raise_for_status()
        return response.json()

    def summarize(self, document_id, summary_type="quick"):
        return self._request(
            "POST",
            f"/documents/{document_id}/summarize",
            json={"type": summary_type},
        )

    def get_entities(self, document_id):
        return self._request("GET", f"/documents/{document_id}/entities")

    def get_classification(self, document_id):
        # Classification is returned as part of the document itself.
        doc = self._request("GET", f"/documents/{document_id}")
        return doc.get("classification", {})

    def find_duplicates(self, document_id):
        return self._request("GET", f"/documents/{document_id}/duplicates")

    def scan_sensitive_data(self, document_id):
        return self._request("POST", f"/documents/{document_id}/scan-sensitive")


# Usage
analysis = ArchivusAnalysis("YOUR_API_KEY", "your-tenant")

# Get quick summary
summary = analysis.summarize("doc_abc123", "quick")
print(summary["summary"])

# Get full summary
full_summary = analysis.summarize("doc_abc123", "full")
print(full_summary["summary"])
print("Key points:", full_summary["key_points"])

# Extract entities
entities = analysis.get_entities("doc_abc123")
print("People:", entities["entities"]["people"])
print("Organizations:", entities["entities"]["organizations"])

# Get classification (empty dict if the document has none)
classification = analysis.get_classification("doc_abc123")
print("Category:", classification.get("primary"))

# Find duplicates
duplicates = analysis.find_duplicates("doc_abc123")
print("Duplicate documents:", duplicates["duplicates"])

# Scan for sensitive data
sensitive = analysis.scan_sensitive_data("doc_abc123")
print("PII found:", sensitive["pii_detected"])
JavaScript - Document Analysis
class ArchivusAnalysis {
  constructor(apiKey, tenant) {
    this.apiKey = apiKey;
    this.tenant = tenant;
    this.baseURL = 'https://api.archivus.app/api/v1';
    this.headers = {
      'Authorization': `Bearer ${apiKey}`,
      'X-Tenant-Subdomain': tenant
    };
  }

  async summarize(documentId, type = 'quick') {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/summarize`, {
      method: 'POST',
      headers: { ...this.headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ type })
    });
    return response.json();
  }

  async getEntities(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/entities`, {
      headers: this.headers
    });
    return response.json();
  }

  async getClassification(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}`, {
      headers: this.headers
    });
    const doc = await response.json();
    return doc.classification || {};
  }

  async findDuplicates(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/duplicates`, {
      headers: this.headers
    });
    return response.json();
  }

  async scanSensitiveData(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/scan-sensitive`, {
      method: 'POST',
      headers: this.headers
    });
    return response.json();
  }
}

// Usage
const analysis = new ArchivusAnalysis('YOUR_API_KEY', 'your-tenant');

// Get quick summary
const summary = await analysis.summarize('doc_abc123', 'quick');
console.log(summary.summary);

// Get full summary
const fullSummary = await analysis.summarize('doc_abc123', 'full');
console.log(fullSummary.summary);
console.log('Key points:', fullSummary.key_points);

// Extract entities
const entities = await analysis.getEntities('doc_abc123');
console.log('People:', entities.entities.people);
console.log('Organizations:', entities.entities.organizations);

// Get classification
const classification = await analysis.getClassification('doc_abc123');
console.log('Category:', classification.primary);

// Find duplicates
const duplicates = await analysis.findDuplicates('doc_abc123');
console.log('Duplicate documents:', duplicates.duplicates);

// Scan for sensitive data
const sensitive = await analysis.scanSensitiveData('doc_abc123');
console.log('PII found:', sensitive.pii_detected);
Duplicate Detection
Find similar or duplicate documents:
curl https://api.archivus.app/api/v1/documents/doc_abc123/duplicates \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"
Response:
{
  "duplicates": [
    {
      "document_id": "doc_def456",
      "similarity": 0.95,
      "reason": "Very similar content"
    },
    {
      "document_id": "doc_ghi789",
      "similarity": 0.87,
      "reason": "Similar structure"
    }
  ]
}
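The response mixes near-identical documents with looser matches, so callers will usually want to apply their own cutoff. The sketch below shows one possible approach; `likely_duplicates` and the 0.9 default are hypothetical choices for illustration, not documented Archivus behavior.

```python
def likely_duplicates(response, min_similarity=0.9):
    """Return IDs of documents at or above the cutoff, best match first.

    The 0.9 default is an illustrative assumption, not an Archivus setting.
    """
    matches = [
        d for d in response.get("duplicates", [])
        if d.get("similarity", 0.0) >= min_similarity
    ]
    matches.sort(key=lambda d: d.get("similarity", 0.0), reverse=True)
    return [d["document_id"] for d in matches]


# Sample payload copied from the duplicate detection response.
sample = {
    "duplicates": [
        {"document_id": "doc_def456", "similarity": 0.95, "reason": "Very similar content"},
        {"document_id": "doc_ghi789", "similarity": 0.87, "reason": "Similar structure"},
    ]
}

print(likely_duplicates(sample))
print(likely_duplicates(sample, min_similarity=0.8))
```

A stricter cutoff suits automatic cleanup, while a looser one is more appropriate when a person reviews the candidates.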
Sensitive Data Scanning
Scan for PII, financial data, and credentials:
curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/scan-sensitive \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"
Response:
{
  "pii_detected": true,
  "pii_types": ["email", "phone", "ssn"],
  "financial_data": true,
  "credentials": false,
  "details": [
    {
      "type": "email",
      "value": "john.doe@example.com",
      "location": "page 2, line 15"
    },
    {
      "type": "ssn",
      "value": "***-**-1234",
      "location": "page 3, line 8"
    }
  ]
}
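Because `details` can carry unmasked values such as the email address above, it may be safer to report findings by type and page instead of logging the raw entries. The following sketch assumes every `location` follows the "page N, line M" pattern seen in the sample; the source does not confirm that format, and `findings_by_page` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed location format, inferred from the sample response only.
LOCATION_RE = re.compile(r"page (\d+), line (\d+)")


def findings_by_page(response):
    """Map page number to the finding types on that page, omitting raw values.

    Findings whose location does not match the assumed format go under None.
    """
    pages = defaultdict(list)
    for detail in response.get("details", []):
        match = LOCATION_RE.fullmatch(detail.get("location", ""))
        page = int(match.group(1)) if match else None
        pages[page].append(detail.get("type", "unknown"))
    return dict(pages)


# Sample payload trimmed from the sensitive data scan response.
sample = {
    "pii_detected": True,
    "details": [
        {"type": "email", "value": "john.doe@example.com", "location": "page 2, line 15"},
        {"type": "ssn", "value": "***-**-1234", "location": "page 3, line 8"},
    ],
}

print(findings_by_page(sample))
```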
Cost
| Feature | Credits | Description |
|---|---|---|
| Quick Summary | 1 | 2-3 sentence summary |
| Full Summary | 2 | Comprehensive summary |
| Entity Extraction | 2 | Extract people, orgs, dates, amounts |
| Classification | 1 | Automatic categorization |
| Duplicate Detection | 1 | Find similar documents |
| Sensitive Data Scan | 1 | Scan for PII, financial data |
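The table can be turned into a rough budget estimate for a batch job. This sketch assumes credits are charged once per feature per document, which the table does not state explicitly; the dictionary keys and the `estimate_credits` function are hypothetical names introduced here.

```python
# Credit costs copied from the table above; the keys are illustrative names.
CREDITS = {
    "quick_summary": 1,
    "full_summary": 2,
    "entity_extraction": 2,
    "classification": 1,
    "duplicate_detection": 1,
    "sensitive_data_scan": 1,
}


def estimate_credits(features, document_count=1):
    """Estimate total credits for running the given features on each document.

    Assumes one charge per feature per document.
    """
    unknown = [f for f in features if f not in CREDITS]
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    return document_count * sum(CREDITS[f] for f in features)


# Full summary (2) + entity extraction (2) + sensitive data scan (1) on 100 documents.
total = estimate_credits(
    ["full_summary", "entity_extraction", "sensitive_data_scan"],
    document_count=100,
)
print(total)  # 500
```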
Best Practices
Summarization
- Use quick summaries - For overviews and lists
- Use full summaries - For detailed analysis
- Cache summaries - Store summaries client-side
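Client-side caching can be as simple as an in-memory dictionary keyed by document ID and summary type. The sketch below is one possible shape, not an Archivus feature: `SummaryCache` is a hypothetical class, and a stand-in fetch function replaces the real API call so the example runs offline. It has no expiry, so it assumes documents do not change during the cache's lifetime.

```python
class SummaryCache:
    """In-memory cache keyed by (document_id, summary_type).

    fetch_summary is any callable taking (document_id, summary_type),
    for example the summarize method of the Python client shown earlier.
    """

    def __init__(self, fetch_summary):
        self._fetch = fetch_summary
        self._store = {}

    def get(self, document_id, summary_type="quick"):
        key = (document_id, summary_type)
        if key not in self._store:
            # Only the first request for a given key reaches the API.
            self._store[key] = self._fetch(document_id, summary_type)
        return self._store[key]


# Stand-in for the real API call that records how often it is invoked.
calls = []


def fake_fetch(document_id, summary_type):
    calls.append((document_id, summary_type))
    return {"summary": f"{summary_type} summary of {document_id}", "type": summary_type}


cache = SummaryCache(fake_fetch)
cache.get("doc_abc123")          # fetched
cache.get("doc_abc123")          # served from cache
cache.get("doc_abc123", "full")  # different type, fetched
print(len(calls))  # 2
```

Since each summary request costs credits, avoiding repeat calls for the same document and type saves both latency and budget.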
Entity Extraction
- Review confidence scores - Verify low-confidence entities
- Use for automation - Extract data for workflows
- Combine with tags - Use entities as tags
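To use entities as tags, entity names need to be normalized into a consistent tag form. The following sketch shows one way to do that; the slug format, the choice of categories, the 0.9 confidence floor, and the function name `entities_to_tags` are all assumptions, and applying the tags to a document is left out because the source does not describe a tagging endpoint.

```python
import re


def entities_to_tags(response, categories=("people", "organizations"), min_confidence=0.9):
    """Build a sorted list of lowercase, hyphenated tags from entity names.

    Entities below min_confidence are skipped so doubtful names do not become tags.
    """
    tags = set()
    for category in categories:
        for item in response.get("entities", {}).get(category, []):
            if item.get("confidence", 0.0) < min_confidence:
                continue
            # Collapse anything that is not a letter or digit into a single hyphen.
            slug = re.sub(r"[^a-z0-9]+", "-", item.get("name", "").lower()).strip("-")
            if slug:
                tags.add(slug)
    return sorted(tags)


# Sample payload trimmed from the entity extraction response.
sample = {
    "entities": {
        "people": [
            {"name": "John Doe", "role": "CEO", "confidence": 0.95},
            {"name": "Jane Smith", "role": "CFO", "confidence": 0.92},
        ],
        "organizations": [
            {"name": "Acme Corp", "type": "company", "confidence": 0.98},
            {"name": "XYZ Inc", "type": "company", "confidence": 0.97},
        ],
    }
}

print(entities_to_tags(sample))
```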
Classification
- Review categories - Verify AI classification
- Use for organization - Organize by category
- Combine with folders - Use categories for folder structure
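A folder structure can be derived from the `classification` object by nesting the first secondary category under the primary one. This is a sketch under stated assumptions: the nesting rule, the 0.8 confidence floor, the `unsorted` fallback, and the `folder_for` name are illustrative choices, not Archivus conventions.

```python
from pathlib import PurePosixPath


def folder_for(classification, min_confidence=0.8, fallback="unsorted"):
    """Return a folder path such as legal/business for a classification object.

    Low-confidence or missing classifications go to the fallback folder
    so they can be reviewed by hand.
    """
    primary = classification.get("primary")
    if not primary or classification.get("confidence", 0.0) < min_confidence:
        return PurePosixPath(fallback)
    secondary = classification.get("secondary") or []
    # Use at most one secondary category to keep the tree shallow.
    return PurePosixPath(primary, *secondary[:1])


# Sample classification copied from the document response.
sample = {
    "primary": "legal",
    "secondary": ["business", "contract"],
    "confidence": 0.95,
}

print(folder_for(sample))
print(folder_for({"primary": "legal", "confidence": 0.5}))
```

Routing uncertain documents to a review folder keeps the AI classification from silently misfiling them.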
Next Steps
- Chat with Documents - Ask questions about documents
- Semantic Search - Find documents by meaning
- API Reference - Complete analysis API documentation
Questions? Check the FAQ or contact support@ubiship.com