Document Analysis

Learn about Archivus’s advanced AI analysis features, including summarization, entity extraction, and classification.


Overview

Archivus provides comprehensive AI analysis of your documents:

  • Summarization - Quick and full summaries
  • Entity Extraction - People, organizations, dates, amounts
  • Classification - Automatic categorization
  • Duplicate Detection - Find similar documents
  • Sensitive Data Scanning - PII, financial data detection

Summarization

Quick Summary

Get a 2-3 sentence summary:

curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "quick"
  }'

Response:

{
  "summary": "This document is a service agreement between Acme Corp and XYZ Inc, covering a 24-month term with payment terms of Net 30 days.",
  "type": "quick",
  "created_at": "2025-12-16T10:30:00Z"
}

Full Summary

Get a comprehensive summary:

curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "full"
  }'

Response:

{
  "summary": "This service agreement document outlines the terms and conditions between Acme Corp (Service Provider) and XYZ Inc (Client) for the provision of consulting services.\n\n**Key Terms:**\n- Term: 24 months starting January 1, 2025\n- Payment: Net 30 days, $50,000 monthly\n- Termination: Either party with 30 days notice\n\n**Obligations:**\n- Service Provider: Deliver consulting services as specified\n- Client: Make timely payments\n\n**Important Dates:**\n- Start Date: January 1, 2025\n- End Date: December 31, 2026",
  "type": "full",
  "key_points": [
    "24-month service agreement",
    "$50,000 monthly payment",
    "30-day termination notice"
  ],
  "created_at": "2025-12-16T10:30:00Z"
}

Entity Extraction

Extract people, organizations, dates, and amounts:

curl https://api.archivus.app/api/v1/documents/doc_abc123/entities \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"

Response:

{
  "entities": {
    "people": [
      {"name": "John Doe", "role": "CEO", "confidence": 0.95},
      {"name": "Jane Smith", "role": "CFO", "confidence": 0.92}
    ],
    "organizations": [
      {"name": "Acme Corp", "type": "company", "confidence": 0.98},
      {"name": "XYZ Inc", "type": "company", "confidence": 0.97}
    ],
    "dates": [
      {"value": "2025-01-01", "type": "start_date", "confidence": 0.95},
      {"value": "2026-12-31", "type": "end_date", "confidence": 0.94}
    ],
    "amounts": [
      {"value": 50000, "currency": "USD", "type": "monthly_payment", "confidence": 0.96}
    ],
    "locations": [
      {"name": "New York, NY", "type": "city", "confidence": 0.93}
    ]
  }
}

Classification

Automatically categorize documents:

curl https://api.archivus.app/api/v1/documents/doc_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"

Response includes:

{
  "id": "doc_abc123",
  "ai_categories": ["legal", "business", "contract"],
  "classification": {
    "primary": "legal",
    "secondary": ["business", "contract"],
    "confidence": 0.95
  }
}

Code Examples

Python - Document Analysis

import requests

class ArchivusAnalysis:
    def __init__(self, api_key, tenant):
        self.api_key = api_key
        self.tenant = tenant
        self.base_url = "https://api.archivus.app/api/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-Subdomain": tenant
        }
    
    def summarize(self, document_id, summary_type="quick"):
        """Request a summary ("quick" or "full") for a document."""
        url = f"{self.base_url}/documents/{document_id}/summarize"
        response = requests.post(url, headers=self.headers, json={"type": summary_type})
        response.raise_for_status()
        return response.json()
    
    def get_entities(self, document_id):
        """Fetch extracted entities (people, organizations, dates, amounts)."""
        url = f"{self.base_url}/documents/{document_id}/entities"
        response = requests.get(url, headers=self.headers)
        response.raise_for_status()
        return response.json()
    
    def get_classification(self, document_id):
        """Return the classification block from the document record."""
        url = f"{self.base_url}/documents/{document_id}"
        response = requests.get(url, headers=self.headers)
        response.raise_for_status()
        return response.json().get("classification", {})
    
    def find_duplicates(self, document_id):
        """List documents similar to the given one."""
        url = f"{self.base_url}/documents/{document_id}/duplicates"
        response = requests.get(url, headers=self.headers)
        response.raise_for_status()
        return response.json()
    
    def scan_sensitive_data(self, document_id):
        """Trigger a scan for PII, financial data, and credentials."""
        url = f"{self.base_url}/documents/{document_id}/scan-sensitive"
        response = requests.post(url, headers=self.headers)
        response.raise_for_status()
        return response.json()

# Usage
analysis = ArchivusAnalysis("YOUR_API_KEY", "your-tenant")

# Get quick summary
summary = analysis.summarize("doc_abc123", "quick")
print(summary["summary"])

# Get full summary
full_summary = analysis.summarize("doc_abc123", "full")
print(full_summary["summary"])
print("Key points:", full_summary["key_points"])

# Extract entities
entities = analysis.get_entities("doc_abc123")
print("People:", entities["entities"]["people"])
print("Organizations:", entities["entities"]["organizations"])

# Get classification
classification = analysis.get_classification("doc_abc123")
print("Category:", classification.get("primary"))

# Find duplicates
duplicates = analysis.find_duplicates("doc_abc123")
print("Duplicate documents:", duplicates["duplicates"])

# Scan for sensitive data
sensitive = analysis.scan_sensitive_data("doc_abc123")
print("PII found:", sensitive["pii_detected"])

JavaScript - Document Analysis

class ArchivusAnalysis {
  constructor(apiKey, tenant) {
    this.apiKey = apiKey;
    this.tenant = tenant;
    this.baseURL = 'https://api.archivus.app/api/v1';
    this.headers = {
      'Authorization': `Bearer ${apiKey}`,
      'X-Tenant-Subdomain': tenant
    };
  }
  
  async summarize(documentId, type = 'quick') {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/summarize`, {
      method: 'POST',
      headers: { ...this.headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ type })
    });
    if (!response.ok) throw new Error(`Summarize failed: ${response.status}`);
    return response.json();
  }
  
  async getEntities(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/entities`, {
      headers: this.headers
    });
    if (!response.ok) throw new Error(`Entity extraction failed: ${response.status}`);
    return response.json();
  }
  
  async getClassification(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}`, {
      headers: this.headers
    });
    if (!response.ok) throw new Error(`Document fetch failed: ${response.status}`);
    const doc = await response.json();
    return doc.classification || {};
  }
  
  async findDuplicates(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/duplicates`, {
      headers: this.headers
    });
    if (!response.ok) throw new Error(`Duplicate check failed: ${response.status}`);
    return response.json();
  }
  
  async scanSensitiveData(documentId) {
    const response = await fetch(`${this.baseURL}/documents/${documentId}/scan-sensitive`, {
      method: 'POST',
      headers: this.headers
    });
    if (!response.ok) throw new Error(`Sensitive scan failed: ${response.status}`);
    return response.json();
  }
}

// Usage
const analysis = new ArchivusAnalysis('YOUR_API_KEY', 'your-tenant');

// Get quick summary
const summary = await analysis.summarize('doc_abc123', 'quick');
console.log(summary.summary);

// Get full summary
const fullSummary = await analysis.summarize('doc_abc123', 'full');
console.log(fullSummary.summary);
console.log('Key points:', fullSummary.key_points);

// Extract entities
const entities = await analysis.getEntities('doc_abc123');
console.log('People:', entities.entities.people);
console.log('Organizations:', entities.entities.organizations);

// Get classification
const classification = await analysis.getClassification('doc_abc123');
console.log('Category:', classification.primary);

// Find duplicates
const duplicates = await analysis.findDuplicates('doc_abc123');
console.log('Duplicate documents:', duplicates.duplicates);

// Scan for sensitive data
const sensitive = await analysis.scanSensitiveData('doc_abc123');
console.log('PII found:', sensitive.pii_detected);

Duplicate Detection

Find similar or duplicate documents:

curl https://api.archivus.app/api/v1/documents/doc_abc123/duplicates \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"

Response:

{
  "duplicates": [
    {
      "document_id": "doc_def456",
      "similarity": 0.95,
      "reason": "Very similar content"
    },
    {
      "document_id": "doc_ghi789",
      "similarity": 0.87,
      "reason": "Similar structure"
    }
  ]
}
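A response like this can drive a simple cleanup pass. A minimal sketch that keeps only candidates at or above a similarity threshold; the `near_duplicates` helper and the 0.9 threshold are illustrative, not part of the API:

```python
# Hypothetical duplicates payload, shaped like the response above.
response = {
    "duplicates": [
        {"document_id": "doc_def456", "similarity": 0.95, "reason": "Very similar content"},
        {"document_id": "doc_ghi789", "similarity": 0.87, "reason": "Similar structure"},
    ]
}

def near_duplicates(payload, threshold=0.9):
    """Return IDs of documents at or above the similarity threshold."""
    return [d["document_id"] for d in payload["duplicates"] if d["similarity"] >= threshold]

print(near_duplicates(response))                 # ['doc_def456']
print(near_duplicates(response, threshold=0.8))  # ['doc_def456', 'doc_ghi789']
```

Tune the threshold to your tolerance: a high value surfaces only near-exact copies, a lower one also catches documents with similar structure.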

Sensitive Data Scanning

Scan for PII, financial data, and credentials:

curl -X POST https://api.archivus.app/api/v1/documents/doc_abc123/scan-sensitive \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Tenant-Subdomain: your-tenant"

Response:

{
  "pii_detected": true,
  "pii_types": ["email", "phone", "ssn"],
  "financial_data": true,
  "credentials": false,
  "details": [
    {
      "type": "email",
      "value": "john.doe@example.com",
      "location": "page 2, line 15"
    },
    {
      "type": "ssn",
      "value": "***-**-1234",
      "location": "page 3, line 8"
    }
  ]
}
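A scan result like the one above can feed a review queue. A minimal sketch; `needs_review` is an illustrative helper, and the triage rule (flag on PII or credentials) is an assumption about your compliance policy, not part of the API:

```python
def needs_review(scan):
    """Flag a document for manual review if PII or credentials were found."""
    return scan["pii_detected"] or scan["credentials"]

# Hypothetical scan payload, shaped like the response above.
scan = {
    "pii_detected": True,
    "pii_types": ["email", "phone", "ssn"],
    "financial_data": True,
    "credentials": False,
}

print(needs_review(scan))  # True
```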

Cost

Feature               Credits  Description
-------------------   -------  ------------------------------------
Quick Summary         1        2-3 sentence summary
Full Summary          2        Comprehensive summary
Entity Extraction     2        Extract people, orgs, dates, amounts
Classification        1        Automatic categorization
Duplicate Detection   1        Find similar documents
Sensitive Data Scan   1        Scan for PII, financial data
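The credit costs in this table can be combined into a simple budget estimate before running a batch job. A minimal sketch; the `CREDITS` mapping mirrors the table above, and `estimate_credits` is an illustrative helper, not part of the API:

```python
# Per-feature credit costs, taken from the table above.
CREDITS = {
    "quick_summary": 1,
    "full_summary": 2,
    "entity_extraction": 2,
    "classification": 1,
    "duplicate_detection": 1,
    "sensitive_scan": 1,
}

def estimate_credits(features, document_count):
    """Total credits to run the given features over a batch of documents."""
    per_document = sum(CREDITS[f] for f in features)
    return per_document * document_count

# Full summary + entity extraction over 100 documents: (2 + 2) * 100
print(estimate_credits(["full_summary", "entity_extraction"], 100))  # 400
```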

Best Practices

Summarization

  1. Use quick summaries - For overviews and lists
  2. Use full summaries - For detailed analysis
  3. Cache summaries - Store summaries client-side
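Caching summaries client-side (point 3) can be as simple as memoizing the fetch, since a stored document's summary rarely changes between requests. A minimal sketch using Python's `functools.lru_cache`; `fetch_summary` is a stand-in for a real API call such as `ArchivusAnalysis.summarize`:

```python
import functools

calls = {"count": 0}

def fetch_summary(document_id, summary_type):
    """Stand-in for an API call such as ArchivusAnalysis.summarize."""
    calls["count"] += 1
    return f"summary of {document_id} ({summary_type})"

@functools.lru_cache(maxsize=256)
def cached_summary(document_id, summary_type="quick"):
    # Each unique (document_id, summary_type) pair hits the API once;
    # repeat lookups are served from the in-memory cache.
    return fetch_summary(document_id, summary_type)

cached_summary("doc_abc123")
cached_summary("doc_abc123")  # served from cache, no second API call
print(calls["count"])  # 1
```

Since each summary costs credits, a cache like this also keeps repeated views of the same document from re-billing.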

Entity Extraction

  1. Review confidence scores - Verify low-confidence entities
  2. Use for automation - Extract data for workflows
  3. Combine with tags - Use entities as tags
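Reviewing confidence scores (point 1) can be partly automated by filtering extraction results before using them in workflows or tags. A minimal sketch; the `high_confidence` helper and the 0.9 floor are illustrative choices, not part of the API:

```python
def high_confidence(entities, minimum=0.9):
    """Keep only entities at or above a confidence floor."""
    return {
        kind: [e for e in items if e["confidence"] >= minimum]
        for kind, items in entities.items()
    }

# Hypothetical extraction result, shaped like the entities response above.
entities = {
    "people": [
        {"name": "John Doe", "role": "CEO", "confidence": 0.95},
        {"name": "J. Doe?", "role": "unknown", "confidence": 0.41},
    ],
}

filtered = high_confidence(entities)
print([e["name"] for e in filtered["people"]])  # ['John Doe']
```

Entities that fall below the floor are good candidates for a manual review queue rather than silent discard.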

Classification

  1. Review categories - Verify AI classification
  2. Use for organization - Organize by category
  3. Combine with folders - Use categories for folder structure
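Organizing by category (points 2 and 3) often starts with grouping document IDs by their primary classification. A minimal sketch; `group_by_category` is an illustrative helper operating on document records shaped like the classification response above:

```python
from collections import defaultdict

def group_by_category(documents):
    """Group document IDs by their primary AI classification."""
    folders = defaultdict(list)
    for doc in documents:
        primary = doc.get("classification", {}).get("primary", "uncategorized")
        folders[primary].append(doc["id"])
    return dict(folders)

# Hypothetical document records with classification blocks.
docs = [
    {"id": "doc_abc123", "classification": {"primary": "legal"}},
    {"id": "doc_def456", "classification": {"primary": "legal"}},
    {"id": "doc_xyz999", "classification": {}},
]

print(group_by_category(docs))
# {'legal': ['doc_abc123', 'doc_def456'], 'uncategorized': ['doc_xyz999']}
```

The resulting mapping translates directly into a folder structure, with an "uncategorized" bucket for documents whose classification needs review.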

Questions? Check the FAQ or contact support@ubiship.com.