The Three Pillars
Archivus is built on three foundational architectural pillars. Every feature, every design decision, every engineering choice strengthens at least one pillar.
Understanding these pillars is essential to understanding what makes Archivus fundamentally different from traditional document platforms.
Pillar 1: Knowledge Substrate
Every piece of data becomes a node in a tenant-scoped knowledge graph
Traditional document systems store files. Search systems index text. Archivus does something different: it extracts knowledge and builds a structured graph.
What Gets Extracted
When you upload a document, Archivus extracts:
Entities
- People (names, roles, contact information)
- Organizations (companies, departments, divisions)
- Locations (addresses, regions, facilities)
- Concepts (products, services, topics)
- Temporal markers (dates, timeframes, events)
Relationships
- Employed by (Person → Organization)
- Authored (Person → Document)
- Located at (Entity → Location)
- References (Document → Entity)
- Mentions (Document → Concept)
- Supersedes (Document → Document)
Claims
Factual statements that can be verified, contradicted, or supported:
Claim: "Contract ABC requires 60 days notice for termination"
├─ Source: vendor_agreement_2024.pdf
├─ Section: 12.3
├─ Confidence: 0.98
├─ Valid from: 2024-03-15
└─ Status: Active
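As a rough illustration, the claim above could be modeled as a small immutable record. This is only a sketch: the class name `Claim`, its field names, and the `valid_to` field are assumptions made for the example, not the Archivus schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """A verifiable factual statement (hypothetical shape, for illustration)."""
    text: str
    source: str                       # document the claim was extracted from
    section: str                      # location inside the source document
    confidence: float                 # extraction confidence in [0.0, 1.0]
    valid_from: date                  # first day the claim is considered valid
    valid_to: Optional[date] = None   # assumed field: None means "still valid"
    status: str = "active"

    def __post_init__(self) -> None:
        # Reject confidence values outside the unit interval.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


claim = Claim(
    text="Contract ABC requires 60 days notice for termination",
    source="vendor_agreement_2024.pdf",
    section="12.3",
    confidence=0.98,
    valid_from=date(2024, 3, 15),
)
```

Making the record frozen reflects the idea that a claim is preserved as extracted; a later correction would be a new claim rather than an in-place edit.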
Context
Every node carries critical context:
- Temporal: When was this valid?
- Geographic: Where does this apply?
- Provenance: Who said it? What's the source?
- Confidence: How certain are we?
Not Triples—Quadruples
Traditional knowledge graphs use triples: (Subject, Predicate, Object). Archivus uses quadruples: (Entity, Relationship, Entity, CONTEXT). Context isn't metadata—it's the structure that makes knowledge actionable.
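One way to picture the difference is a triple with a fourth, structured element. The sketch below is an assumption-laden illustration: `Context`, `Quad`, the field names, and the relationship label `requires_notice_days` are invented for the example and do not describe the internal Archivus representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import NamedTuple, Optional


@dataclass(frozen=True)
class Context:
    """Context attached to an edge; the field names are illustrative."""
    valid_from: Optional[date] = None   # temporal: when did this become valid?
    valid_to: Optional[date] = None     # temporal: when did it stop being valid?
    region: Optional[str] = None        # geographic: where does it apply?
    source: Optional[str] = None        # provenance: where was it stated?
    confidence: float = 1.0             # how certain is the extraction?


class Quad(NamedTuple):
    subject: str
    relationship: str
    obj: str
    context: Context


quad = Quad(
    subject="Contract ABC",
    relationship="requires_notice_days",   # hypothetical relationship name
    obj="60",
    context=Context(
        valid_from=date(2024, 3, 15),
        source="vendor_agreement_2024.pdf, section 12.3",
        confidence=0.98,
    ),
)

# Dropping the last element gives the plain triple a classic graph would store.
triple = quad[:3]
print(triple)  # ('Contract ABC', 'requires_notice_days', '60')
```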
Why This Matters
1. Persistent Intelligence
Knowledge accumulates. Every document enriches the graph. Unlike session-based chatbots that forget, your knowledge graph remembers—and grows smarter over time.
2. Relationship Discovery
The graph reveals connections that text search cannot:
- "Show me all people who authored documents related to Project X"
- "What companies are mentioned in contracts signed in Q4 2024?"
- "Find all documents that reference both Entity A and Entity B"
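The last query in the list can be sketched as a set intersection over an inverted index of graph edges. The edge list, the function names, and the document identifiers below are made-up sample data; a production graph store would expose its own query language.

```python
from collections import defaultdict

# (document, relationship, entity) edges; all values are invented sample data.
edges = [
    ("doc-1", "references", "Entity A"),
    ("doc-1", "references", "Entity B"),
    ("doc-2", "references", "Entity A"),
    ("doc-3", "references", "Entity B"),
    ("doc-3", "mentions", "Entity A"),
]


def build_index(edges, relationship):
    """Map each entity to the set of documents linked to it by `relationship`."""
    index = defaultdict(set)
    for doc, rel, entity in edges:
        if rel == relationship:
            index[entity].add(doc)
    return index


def documents_referencing_all(index, entities):
    """Return the documents that reference every entity in `entities`."""
    doc_sets = [index.get(entity, set()) for entity in entities]
    return set.intersection(*doc_sets) if doc_sets else set()


index = build_index(edges, "references")
both = documents_referencing_all(index, ["Entity A", "Entity B"])
print(sorted(both))  # ['doc-1']
```

Note that `doc-3` is excluded because it only mentions Entity A; a plain text search for both names would have returned it.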
3. Temporal Awareness
Facts change. The graph tracks when:
- Current version vs. historical versions
- Superseded information marked but preserved
- Time-based queries: "What did we know about X in January?"
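A time-based query of this kind amounts to an "as of" filter over validity intervals. The following is a minimal sketch; the `Fact` class, the `as_of` helper, and the dates in the sample data are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None   # None: not superseded yet


def as_of(facts, subject, when):
    """Return the facts about `subject` that were valid on the day `when`."""
    return [
        f for f in facts
        if f.subject == subject
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
    ]


# Superseded facts stay in the list; they are just closed off with valid_to.
facts = [
    Fact("Vendor A notice period", "30 days", date(2024, 1, 1), date(2024, 6, 20)),
    Fact("Vendor A notice period", "60 days", date(2024, 6, 20)),
]

january = as_of(facts, "Vendor A notice period", date(2024, 1, 31))
july = as_of(facts, "Vendor A notice period", date(2024, 7, 1))
print([f.value for f in january])  # ['30 days']
print([f.value for f in july])     # ['60 days']
```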
4. Contradiction Detection
Claims can conflict. The graph makes this explicit:
- Document A says X
- Document B says Y
- These contradict
- Both preserved, conflict surfaced
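In the simplest case, conflict detection means grouping claims that describe the same attribute of the same subject and checking whether they disagree. The sketch below assumes claims are already normalized into comparable values; the dictionary keys and the sample claims are hypothetical.

```python
from collections import defaultdict

# Sample claims; keys and values are invented for illustration.
claims = [
    {"subject": "Vendor A", "attribute": "termination_notice_days",
     "value": 60, "source": "Document A"},
    {"subject": "Vendor A", "attribute": "termination_notice_days",
     "value": 30, "source": "Document B"},
    {"subject": "Vendor B", "attribute": "termination_notice_days",
     "value": 90, "source": "Document C"},
]


def find_conflicts(claims):
    """Group claims by (subject, attribute) and keep groups whose values differ."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[(claim["subject"], claim["attribute"])].append(claim)
    return {
        key: group
        for key, group in grouped.items()
        if len({claim["value"] for claim in group}) > 1
    }


conflicts = find_conflicts(claims)
for (subject, attribute), group in conflicts.items():
    sources = ", ".join(f"{c['source']} says {c['value']}" for c in group)
    print(f"Conflict on {subject} / {attribute}: {sources}")
```

Nothing is deleted: both claims remain in `claims`, and the conflict is reported alongside them.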
Data Sources
The knowledge graph ingests from multiple sources:
| Source Type | Examples | Integration Method |
|---|---|---|
| Documents | PDFs, Word, contracts, reports | AI extraction |
| Voice | Meetings, calls, conversations | Real-time transcription + extraction |
| Connectors | POS systems, review platforms, APIs | Structured schema mapping |
| Research | Web intelligence, fact-checking | Autonomous research with provenance |
| Manual | User corrections, verified updates | High-confidence direct input |
Everything flows into one unified graph.
Pillar 2: Symbolic Reasoning
Query the graph BEFORE calling the LLM
Most AI systems jump straight to neural generation. Archivus takes a different path: symbolic reasoning first, neural generation second.
The Reasoning Pipeline
Step 1: Entity Extraction
User asks: "What are our vendor termination obligations?"
System extracts entities:
- Type: vendor
- Type: termination clause
- Type: obligation
Step 2: Graph Traversal
Query the knowledge graph:
- Find all entities of type "vendor"
- Find relationships of type "has_contract"
- Find clauses of type "termination"
- Retrieve with full context and provenance
Step 3: Relationship Discovery
Build inference chains:
- Vendor A has Contract X
- Contract X contains Clause 12.3
- Clause 12.3 specifies 60-day notice
- Therefore: Vendor A requires 60-day notice
Step 4: Contradiction Check
Symbolically verify consistency:
- Are there multiple termination clauses for Vendor A?
- Do any contradict?
- Which is most recent?
- Surface conflicts before synthesis
Step 5: Temporal Filtering
Apply time-based logic:
- Is this clause current or superseded?
- When was it last updated?
- Are there pending amendments?
Step 6: Provenance Assembly
Build the evidence chain:
- Fact → Claim → Document → Section → Page
- Every link preserved
- Ready for verification
Step 7: Neural Synthesis
NOW call the LLM:
- Here are the verified facts
- Here are the relationships
- Here are any contradictions
- Generate a fluent response grounded in this structure
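The pipeline above can be sketched end to end with a toy in-memory graph. This is a simplified illustration covering traversal (Step 2), temporal filtering (Step 5), and provenance assembly into a prompt (Steps 6 and 7); it stops before any model call. The data layout, function names, file names, and dates are assumptions, not the Archivus implementation.

```python
from datetime import date

# Toy graph: vendor -> contracts -> clauses. All values are sample data.
CONTRACTS = {"Vendor A": ["Contract X"]}
CLAUSES = {
    "Contract X": [
        {"type": "termination", "text": "30-day notice", "section": "12.3",
         "source": "original_contract.pdf",
         "valid_from": date(2024, 1, 1), "valid_to": date(2024, 6, 20)},
        {"type": "termination", "text": "60-day notice", "section": "12.3",
         "source": "amendment_2.pdf",
         "valid_from": date(2024, 6, 20), "valid_to": None},
    ]
}


def traverse(vendor, clause_type):
    """Step 2: follow vendor -> has_contract -> clause of the requested type."""
    for contract in CONTRACTS.get(vendor, []):
        for clause in CLAUSES.get(contract, []):
            if clause["type"] == clause_type:
                yield contract, clause


def is_current(clause, today):
    """Step 5: keep only clauses whose validity interval contains `today`."""
    return clause["valid_from"] <= today and (
        clause["valid_to"] is None or today < clause["valid_to"]
    )


def build_prompt(question, vendor, today):
    """Steps 6-7: attach provenance to each fact and build the LLM input text."""
    lines = [
        f"- {vendor} / {contract}, section {clause['section']}: "
        f"{clause['text']} (source: {clause['source']})"
        for contract, clause in traverse(vendor, "termination")
        if is_current(clause, today)
    ]
    facts = "\n".join(lines) or "- (no matching facts)"
    return f"Answer using only these verified facts:\n{facts}\n\nQuestion: {question}"


prompt = build_prompt(
    "What are our vendor termination obligations?", "Vendor A", date(2025, 1, 1)
)
print(prompt)
```

The model only ever sees the filtered, sourced fact list, so the superseded 30-day clause never reaches the synthesis step.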
Why This Matters
1. Grounded Responses
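The LLM receives only what the graph provides and is asked to synthesize from it. Because every fact in the response must trace back to the knowledge base, unsupported statements can be detected and rejected rather than passed to the user.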
2. Explainable AI
Every response can be deconstructed:
- Why did you say X? → Because Fact A and Fact B
- Where did Fact A come from? → Document Y, Section Z
- How confident are you? → 0.95 based on extraction method
3. Contradiction Surfacing
The system doesn't hide inconsistency:
Warning: Conflicting termination requirements detected
Current requirement: 60 days (per Amendment 2)
Previous requirement: 30 days (per Original Contract)
Changed on: 2024-06-20
4. Inference Chains
Complex questions get logical answers:
Query: "Who has access to confidential Project X information?"
Reasoning chain:
- Project X has confidential classification
- Document D123 references Project X
- User Alice authored Document D123
- User Bob has access to Document D123
- Therefore: Alice and Bob have potential access
All symbolic. All verifiable.
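The reasoning chain above can be reproduced with a couple of joins over a set of facts. The sketch below is illustrative only: the relationship names (`references`, `authored`, `has_access_to`) and the extra `Carol` fact are assumptions added to show that unrelated documents are excluded.

```python
# Facts as (subject, relationship, object) triples; sample data only.
facts = {
    ("Document D123", "references", "Project X"),
    ("Alice", "authored", "Document D123"),
    ("Bob", "has_access_to", "Document D123"),
    ("Carol", "authored", "Document D999"),   # unrelated document
}

ACCESS_RELATIONSHIPS = ("authored", "has_access_to")


def potential_access(facts, project):
    """Return {person: [chain, ...]} where each chain lists the facts used."""
    # First hop: documents that reference the project.
    documents = {s for s, p, o in facts if p == "references" and o == project}
    chains = {}
    # Second hop: people linked to one of those documents.
    for s, p, o in facts:
        if p in ACCESS_RELATIONSHIPS and o in documents:
            chain = [(s, p, o), (o, "references", project)]
            chains.setdefault(s, []).append(chain)
    return chains


chains = potential_access(facts, "Project X")
for person in sorted(chains):
    for chain in chains[person]:
        print(person, "<-", " ; ".join(" ".join(step) for step in chain))
```

Each answer carries the chain of facts that produced it, which is what makes the conclusion checkable by a reviewer.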
The Research Foundation
Academic research on "Context Graph Reasoning" (2024-2025) shows that adding entity descriptions and relationship context improves LLM accuracy by 11-25%. Archivus implements this research at production scale.
Pillar 3: Federated Intelligence
Enterprises share verified facts, not raw data
The ultimate vision: organizations collaborate through intelligence sharing while maintaining complete data sovereignty.
The Core Principle
Intelligence flows. Data stays home. Always.
Documents, PDFs, voice recordings, raw data—these never leave your boundary. What flows between organizations are verified facts with provenance.
How Federation Works
Traditional data sharing:
Company A → Sends entire database → Company B
Problems:
• Complete data exposure
• Privacy violation risk
• Compliance challenges
• IP leakage
• No control after transfer
Federated intelligence:
Company A → Shares verified claims with provenance → Company B
What flows:
• Structured facts: "Product X demand increased 40% in Q2"
• Provenance: Source, confidence, temporal validity
• Trust signals: Verification status, contradiction flags
What stays home:
• Raw sales data
• Internal forecasts
• Customer identities
• Proprietary algorithms
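The split between what flows and what stays home can be enforced with an allowlist applied at the boundary. The following sketch assumes a hypothetical internal record layout; the field names and the `to_federated_claim` helper are invented for illustration.

```python
# Only fields on this allowlist may leave the organization (assumed names).
SHAREABLE_FIELDS = frozenset({
    "statement", "source_type", "confidence",
    "valid_from", "valid_to", "verification_status",
})

# Hypothetical internal record mixing shareable facts with raw data.
internal_record = {
    "statement": "Product X demand increased 40% in Q2",
    "source_type": "sales_forecast",
    "confidence": 0.92,
    "verification_status": "verified",
    "raw_sales_rows": [("order-1", "customer-17", 1200)],
    "customer_ids": ["customer-17"],
}


def to_federated_claim(record):
    """Copy only allowlisted fields; everything else stays inside the boundary."""
    return {key: value for key, value in record.items() if key in SHAREABLE_FIELDS}


outbound = to_federated_claim(internal_record)
print(sorted(outbound))
```

An allowlist is used instead of a blocklist so that any field added to the internal record later is private by default.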
Use Cases
Supply Chain Intelligence
Manufacturer knows demand forecast. Distributor needs planning signal. Raw sales data is proprietary.
Solution:
- Manufacturer extracts verified claim: "Product X: +40% demand Q2"
- Claim includes confidence: 0.92
- Claim includes source type: "sales_forecast"
- Distributor receives signal, updates procurement
- No raw data exchanged
M&A Due Diligence
Acquiring company needs contract intelligence. Target company has confidential agreements.
Solution:
- Target creates temporary federated access
- Scope: specific entity types (contracts, obligations)
- Duration: 90-day window
- Queries logged for audit
- Raw contracts never transferred
- After close/abandon, access revoked
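A scoped, time-limited grant with an audit trail might look like the sketch below. It is a rough illustration under stated assumptions: the `FederatedGrant` class, its fields, and the dates are hypothetical, and a real deployment would persist the log and authenticate the grantee.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FederatedGrant:
    """Temporary access for one counterparty (hypothetical model)."""
    grantee: str
    entity_types: frozenset           # scope: entity types that may be queried
    starts: date
    duration_days: int = 90           # the 90-day window from the use case
    revoked: bool = False
    audit_log: list = field(default_factory=list)

    def allows(self, entity_type, on):
        """Check one query against scope, window, and revocation; log the result."""
        expires = self.starts + timedelta(days=self.duration_days)
        allowed = (
            not self.revoked
            and self.starts <= on < expires
            and entity_type in self.entity_types
        )
        self.audit_log.append((on, entity_type, allowed))
        return allowed


grant = FederatedGrant(
    grantee="Acquirer",
    entity_types=frozenset({"contract", "obligation"}),
    starts=date(2025, 1, 1),
)

print(grant.allows("contract", date(2025, 2, 1)))   # True: in scope, in window
print(grant.allows("employee", date(2025, 2, 1)))   # False: out of scope
print(grant.allows("contract", date(2025, 4, 1)))   # False: window has ended
grant.revoked = True                                 # deal closed or abandoned
print(grant.allows("contract", date(2025, 2, 1)))   # False: access revoked
```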
Industry Consortiums
Law firms share legal precedent intelligence. Client confidentiality is paramount.
Solution:
- Verified claims: "Clause type X succeeded in Y jurisdictions"
- Anonymized: client identities removed
- Aggregated: pattern-level intelligence
- Provenance: "verified by 12 firms"
- Individual case details stay private
The Trust Layer
Federation requires trustless verification. When Company B receives claims from Company A, how does B verify without trusting A's database?
The solution: Cryptographic anchoring
Every federated fact exchange can be anchored to a public consensus ledger:
- Company A extracts verified claim
- Claim receives cryptographic hash
- Hash anchored to public ledger with timestamp
- Company B receives claim + anchor reference
- Company B verifies via public ledger
- No need to trust Company A—cryptographic proof replaces trust
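The anchoring steps above can be sketched with a content hash over a canonical serialization of the claim. In this illustration the public ledger is replaced by an in-memory dictionary, and the function names, the anchor reference, and the timestamp are all assumptions; it shows the verification idea, not a ledger integration.

```python
import hashlib
import json


def claim_digest(claim):
    """SHA-256 over a canonical JSON form, so equal claims hash identically."""
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Stand-in for a public ledger: anchor reference -> (digest, timestamp).
ledger = {}


def anchor(claim, anchor_ref, timestamp):
    """Company A: record the claim's hash, never the claim's source data."""
    ledger[anchor_ref] = (claim_digest(claim), timestamp)


def verify(claim, anchor_ref):
    """Company B: recompute the hash and compare it with the anchored one."""
    entry = ledger.get(anchor_ref)
    return entry is not None and entry[0] == claim_digest(claim)


claim = {
    "statement": "Product X demand increased 40% in Q2",
    "source_type": "sales_forecast",
    "confidence": 0.92,
}
anchor(claim, "anchor-001", "2025-01-01T00:00:00Z")   # sample reference and time

print(verify(claim, "anchor-001"))                          # True
print(verify(dict(claim, confidence=0.99), "anchor-001"))   # False: altered claim
```

Sorting keys before hashing matters: without a canonical form, two parties serializing the same claim in a different key order would compute different digests.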
Progressive Implementation
Federation is a journey:
Today: Foundation
- Tenant-scoped knowledge graphs
- Trust chain protocols
- Permission scoping architecture
Tomorrow: Cross-Organization
- Federated queries between enterprises
- Verified fact sharing protocol
- Cryptographic provenance
Future: Networks
- Industry consortiums
- Supply chain intelligence networks
- Cross-border compliance verification
The TCP/IP Analogy
TCP/IP federated computer networks while preserving autonomy. Each network retained sovereignty but could exchange packets. Archivus federates knowledge networks while preserving data sovereignty. Each organization retains control but can exchange verified facts.
How the Pillars Work Together
The three pillars are not independent. Each relies on the others, as the following example shows.
Example: Complex Query
User asks: "What are all active vendor contracts with auto-renewal clauses expiring in 2025?"
Pillar 1 (Knowledge Substrate) provides:
- All vendor entities
- All contract entities
- All clause entities of type "auto-renewal"
- Temporal data: effective dates, expiration dates
- Relationships: vendor → has_contract → contains_clause
Pillar 2 (Symbolic Reasoning) executes:
- Graph traversal: vendors with active contracts
- Temporal filter: expiration in 2025
- Clause type filter: auto-renewal
- Contradiction check: conflicting renewal terms?
- Inference: which require action before expiration?
- Provenance assembly: link each result to source
Pillar 3 (Federation) enables (future):
- Query across subsidiary knowledge graphs
- Aggregate parent + child company contracts
- Maintain data sovereignty (contracts stay local)
- Return verified facts, not raw documents
Result: Structured, verifiable, temporally-aware results that can be audited, shared, and trusted.
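The symbolic part of this query reduces to three filters plus provenance on each hit. The sketch below runs it against a small list of invented contract records; the record layout, identifiers, and file names are assumptions for illustration.

```python
from datetime import date

# Sample contract nodes; every value here is made up.
contracts = [
    {"id": "C-1", "vendor": "Vendor A", "status": "active",
     "expires": date(2025, 3, 31), "clauses": {"auto-renewal", "termination"},
     "source": "contract_c1.pdf"},
    {"id": "C-2", "vendor": "Vendor B", "status": "active",
     "expires": date(2026, 1, 15), "clauses": {"auto-renewal"},
     "source": "contract_c2.pdf"},
    {"id": "C-3", "vendor": "Vendor C", "status": "terminated",
     "expires": date(2025, 9, 1), "clauses": {"auto-renewal"},
     "source": "contract_c3.pdf"},
    {"id": "C-4", "vendor": "Vendor D", "status": "active",
     "expires": date(2025, 11, 30), "clauses": {"termination"},
     "source": "contract_c4.pdf"},
]


def auto_renewals_expiring_in(contracts, year):
    """Active contracts with an auto-renewal clause that expire in `year`."""
    hits = [
        c for c in contracts
        if c["status"] == "active"            # graph traversal: active only
        and "auto-renewal" in c["clauses"]    # clause type filter
        and c["expires"].year == year         # temporal filter
    ]
    hits.sort(key=lambda c: c["expires"])     # soonest expiration first
    # Provenance assembly: each result names the document it came from.
    return [
        {"contract": c["id"], "vendor": c["vendor"],
         "expires": c["expires"].isoformat(), "source": c["source"]}
        for c in hits
    ]


results = auto_renewals_expiring_in(contracts, 2025)
print(results)
```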
The Architectural Promise
These three pillars create capabilities that cannot be achieved with traditional architectures:
- Knowledge accumulates instead of being queried and forgotten
- Reasoning is explainable instead of black-box
- Contradictions are surfaced instead of hidden
- Intelligence can federate while data stays sovereign
This isn't incremental improvement. This is architectural transformation.