The Three Pillars¶

Archivus is built on three foundational architectural pillars. Every feature, every design decision, every engineering choice strengthens at least one pillar.

Understanding these pillars is essential to understanding what makes Archivus fundamentally different from traditional document platforms.

Pillar 1: Knowledge Substrate¶

Every piece of data becomes a node in a tenant-scoped knowledge graph¶

Traditional document systems store files. Search systems index text. Archivus does something different: it extracts knowledge and builds a structured graph.

What Gets Extracted¶

When you upload a document, Archivus extracts:

Entities

People (names, roles, contact information)
Organizations (companies, departments, divisions)
Locations (addresses, regions, facilities)
Concepts (products, services, topics)
Temporal markers (dates, timeframes, events)

Relationships

Employs (Person → Organization)
Authored (Person → Document)
Located at (Entity → Location)
References (Document → Entity)
Mentions (Document → Concept)
Supersedes (Document → Document)

Claims

Factual statements that can be verified, contradicted, or supported:

Claim: "Contract ABC requires 60 days' notice for termination"
  ├─ Source: vendor_agreement_2024.pdf
  ├─ Section: 12.3
  ├─ Confidence: 0.98
  ├─ Valid from: 2024-03-15
  └─ Status: Active

Context

Every node carries critical context:

Temporal: When was this valid?
Geographic: Where does this apply?
Provenance: Who said it? What's the source?
Confidence: How certain are we?

Not Triples—Quadruples

Traditional knowledge graphs use triples: (Subject, Predicate, Object). Archivus uses quadruples: (Entity, Relationship, Entity, CONTEXT). Context isn't metadata—it's the structure that makes knowledge actionable.
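A minimal Python sketch of the idea, assuming a simple in-memory representation: the fourth element of each quadruple carries the four context dimensions listed above. The class and field names (`Context`, `Quad`, `valid_from`, `region`, `holds`) are hypothetical, not Archivus's actual schema; the sample values reuse the contract claim shown earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Context:
    valid_from: date            # Temporal: when the fact became valid
    valid_to: Optional[date]    # Temporal: None means still valid
    region: Optional[str]       # Geographic: None means it applies everywhere
    source: str                 # Provenance: who said it / where it came from
    confidence: float           # Confidence: 0.0 to 1.0


@dataclass(frozen=True)
class Quad:
    subject: str
    relationship: str
    obj: str
    context: Context

    def as_triple(self) -> tuple[str, str, str]:
        # Dropping the context leaves the classic (Subject, Predicate, Object) triple.
        return (self.subject, self.relationship, self.obj)

    def holds(self, on: date, region: Optional[str] = None) -> bool:
        # The context answers "is this true here, at this date?"
        c = self.context
        in_time = c.valid_from <= on and (c.valid_to is None or on < c.valid_to)
        in_place = c.region is None or region is None or c.region == region
        return in_time and in_place


notice = Quad(
    "Contract ABC", "requires_notice_days", "60",
    Context(date(2024, 3, 15), None, None, "vendor_agreement_2024.pdf, section 12.3", 0.98),
)
print(notice.as_triple())              # ('Contract ABC', 'requires_notice_days', '60')
print(notice.holds(date(2024, 6, 1)))  # True
print(notice.holds(date(2024, 1, 1)))  # False
```

`as_triple()` shows what a triple store keeps; everything `holds()` needs lives in the context that the triple discards.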

Why This Matters¶

1. Persistent Intelligence

Knowledge accumulates. Every document enriches the graph. Unlike session-based chatbots that forget everything when a conversation ends, the knowledge graph persists, and each new document adds entities, relationships, and claims to it.

2. Relationship Discovery

The graph reveals connections that text search cannot:

"Show me all people who authored documents related to Project X"
"What companies are mentioned in contracts signed in Q4 2024?"
"Find all documents that reference both Entity A and Entity B"
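A minimal sketch of the first and third queries above over a hypothetical in-memory edge list. Node names and edges are illustrative, and a production graph store would run the same logic as indexed traversals. Reading "related to Project X" as the Mentions relationship (Document → Concept) is an assumption.

```python
from collections import defaultdict

# Hypothetical edges: (source, relationship, target).
edges = [
    ("doc1", "references", "Entity A"),
    ("doc1", "references", "Entity B"),
    ("doc2", "references", "Entity A"),
    ("doc3", "references", "Entity B"),
    ("alice", "authored", "doc1"),
    ("bob", "authored", "doc2"),
    ("doc1", "mentions", "Project X"),
]

# Index incoming edges by (target, relationship) for reverse lookups.
in_edges = defaultdict(set)
for src, rel, dst in edges:
    in_edges[(dst, rel)].add(src)


def documents_referencing_all(*entities: str) -> set[str]:
    # Intersect the sets of documents that reference each entity.
    sets = [in_edges[(e, "references")] for e in entities]
    return set.intersection(*sets) if sets else set()


def authors_of_documents_about(concept: str) -> set[str]:
    # Two hops: concept <-mentions- document <-authored- person.
    docs = in_edges[(concept, "mentions")]
    return {person for d in docs for person in in_edges[(d, "authored")]}


print(documents_referencing_all("Entity A", "Entity B"))  # {'doc1'}
print(authors_of_documents_about("Project X"))            # {'alice'}
```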

3. Temporal Awareness

Facts change. The graph tracks when:

Current version vs. historical versions
Superseded information marked but preserved
Time-based queries: "What did we know about X in January?"
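A minimal sketch of an "as of" query, assuming each fact records when it took effect and, once replaced, when it was superseded. Superseded facts stay in the list, matching the "marked but preserved" behavior above. The `Fact` class, field names, and values are illustrative. The sketch models only validity time; answering "what did we know" strictly would also require the time each fact was recorded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: date
    superseded_on: Optional[date] = None  # set when a newer fact replaces this one; never deleted


def as_of(facts: list[Fact], subject: str, when: date) -> list[Fact]:
    """Return the facts about `subject` that were in force on `when`."""
    return [
        f for f in facts
        if f.subject == subject
        and f.valid_from <= when
        and (f.superseded_on is None or when < f.superseded_on)
    ]


history = [
    Fact("Vendor A", "notice_days", "30", date(2023, 1, 1), superseded_on=date(2024, 6, 20)),
    Fact("Vendor A", "notice_days", "60", date(2024, 6, 20)),
]

print([f.value for f in as_of(history, "Vendor A", date(2024, 1, 15))])  # ['30']
print([f.value for f in as_of(history, "Vendor A", date(2024, 7, 1))])   # ['60']
```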

4. Contradiction Detection

Claims can conflict. The graph makes this explicit:

Document A says X
Document B says Y
These contradict
Both preserved, conflict surfaced
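A minimal sketch of surfacing conflicts, assuming claims are normalized to (subject, attribute, value, source) and that two claims conflict when they assign different values to the same subject and attribute. Real contradiction detection would likely need semantic comparison rather than string equality; all names here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    attribute: str
    value: str
    source: str


def find_conflicts(claims: list[Claim]) -> dict[tuple[str, str], list[Claim]]:
    """Group claims by (subject, attribute); a group with more than one
    distinct value is a conflict. No claim is discarded."""
    groups = defaultdict(list)
    for c in claims:
        groups[(c.subject, c.attribute)].append(c)
    return {key: group for key, group in groups.items() if len({c.value for c in group}) > 1}


claims = [
    Claim("Contract ABC", "notice_days", "60", "Document A"),
    Claim("Contract ABC", "notice_days", "30", "Document B"),
    Claim("Contract ABC", "governing_law", "NY", "Document A"),
]

for (subject, attr), group in find_conflicts(claims).items():
    print(f"Conflict on {subject}.{attr}: " + ", ".join(f"{c.value} ({c.source})" for c in group))
# Conflict on Contract ABC.notice_days: 60 (Document A), 30 (Document B)
```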

Data Sources¶

The knowledge graph ingests from multiple sources:

Source Type	Examples	Integration Method
Documents	PDFs, Word, contracts, reports	AI extraction
Voice	Meetings, calls, conversations	Real-time transcription + extraction
Connectors	POS systems, review platforms, APIs	Structured schema mapping
Research	Web intelligence, fact-checking	Autonomous research with provenance
Manual	User corrections, verified updates	High-confidence direct input

Everything flows into one unified graph.

Pillar 2: Symbolic Reasoning¶

Query the graph BEFORE calling the LLM¶

Most AI systems jump straight to neural generation. Archivus takes a different path: symbolic reasoning first, neural generation second.

The Reasoning Pipeline¶

Step 1: Entity Extraction

User asks: "What are our vendor termination obligations?"

System extracts entities:
- Type: vendor
- Type: termination clause
- Type: obligation

Step 2: Graph Traversal

Query the knowledge graph:
- Find all entities of type "vendor"
- Find relationships of type "has_contract"
- Find clauses of type "termination"
- Retrieve with full context and provenance

Step 3: Relationship Discovery

Build inference chains:
- Vendor A has Contract X
- Contract X contains Clause 12.3
- Clause 12.3 specifies 60-day notice
- Therefore: Vendor A requires 60-day notice
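Steps 2 and 3 can be sketched together: traverse typed edges from vendors through contracts to termination clauses, then render each matching path as an explicit inference chain. The graph contents, `clause_props`, and `termination_chains` are hypothetical stand-ins for the knowledge graph and its query layer.

```python
from collections import defaultdict

# Hypothetical typed graph.
node_type = {"Vendor A": "vendor", "Contract X": "contract", "Clause 12.3": "clause"}
clause_props = {"Clause 12.3": {"clause_type": "termination", "notice": "60-day notice"}}
edges = [
    ("Vendor A", "has_contract", "Contract X"),
    ("Contract X", "contains_clause", "Clause 12.3"),
]

out = defaultdict(list)  # (node, relationship) -> targets
for src, rel, dst in edges:
    out[(src, rel)].append(dst)


def termination_chains():
    """Step 2: traverse vendor -has_contract-> contract -contains_clause-> clause.
    Step 3: turn each matching path into an explicit inference chain."""
    for vendor in (n for n, t in node_type.items() if t == "vendor"):
        for contract in out[(vendor, "has_contract")]:
            for clause in out[(contract, "contains_clause")]:
                props = clause_props.get(clause, {})
                if props.get("clause_type") != "termination":
                    continue
                yield [
                    f"{vendor} has {contract}",
                    f"{contract} contains {clause}",
                    f"{clause} specifies {props['notice']}",
                    f"Therefore: {vendor} requires {props['notice']}",
                ]


for chain in termination_chains():
    print("\n".join(chain))
```

Because every step of the chain is a traversed edge, each line of the conclusion can be traced back to a specific relationship in the graph.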

Step 4: Contradiction Check

Symbolically verify consistency:
- Are there multiple termination clauses for Vendor A?
- Do any contradict?
- Which is most recent?
- Surface conflicts before synthesis

Step 5: Temporal Filtering

Apply time-based logic:
- Is this clause current or superseded?
- When was it last updated?
- Are there pending amendments?

Step 6: Provenance Assembly

Build the evidence chain:
- Fact → Claim → Document → Section → Page
- Every link preserved
- Ready for verification
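A minimal sketch of provenance assembly, assuming claims live in a lookup keyed by claim ID. `CLAIMS`, `evidence_chain`, the claim ID, and the page number are hypothetical; the document name, section, and confidence come from the claim example in Pillar 1. Failing loudly on an unknown claim ID is a design choice that keeps unsupported facts from reaching synthesis.

```python
# Hypothetical claim store keyed by claim ID; the page number is illustrative.
CLAIMS = {
    "claim-001": {
        "text": "Contract ABC requires 60 days' notice for termination",
        "document": "vendor_agreement_2024.pdf",
        "section": "12.3",
        "page": 14,
        "confidence": 0.98,
    },
}


def evidence_chain(fact: str, claim_id: str) -> list[str]:
    """Build the Fact -> Claim -> Document -> Section -> Page chain.
    Raises KeyError for an unknown claim, so an unsupported fact cannot
    silently pass through."""
    c = CLAIMS[claim_id]
    return [
        f"Fact: {fact}",
        f"Claim: {claim_id} ({c['text']}, confidence {c['confidence']})",
        f"Document: {c['document']}",
        f"Section: {c['section']}",
        f"Page: {c['page']}",
    ]


print(" -> ".join(evidence_chain("Vendor A requires 60-day notice", "claim-001")))
```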

Step 7: Neural Synthesis

NOW call the LLM:
- Here are the verified facts
- Here are the relationships
- Here are any contradictions
- Generate a fluent response grounded in this structure
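This page does not say which model or client performs synthesis, so this sketch stops at assembling the grounded prompt that would be sent to it. The function name, argument shapes, and prompt wording are assumptions; the sample inputs reuse the vendor example above.

```python
def build_grounded_prompt(
    question: str,
    facts: list[tuple[str, str]],   # (fact text, source reference)
    relationships: list[str],
    contradictions: list[str],
) -> str:
    """Assemble a prompt that hands the LLM only verified, sourced material."""
    lines = [
        "Answer using ONLY the facts below and cite the bracketed source for each statement.",
        "If the facts are insufficient, say so instead of guessing.",
        "",
        f"Question: {question}",
        "",
        "Verified facts:",
    ]
    lines += [f"- {text} [{source}]" for text, source in facts]
    lines += ["", "Relationships:"]
    lines += [f"- {rel}" for rel in relationships]
    if contradictions:
        # Contradictions are passed through explicitly so the answer must surface them.
        lines += ["", "Contradictions (mention these in the answer):"]
        lines += [f"- {c}" for c in contradictions]
    return "\n".join(lines)


prompt = build_grounded_prompt(
    "What are our vendor termination obligations?",
    facts=[("Vendor A requires 60-day notice", "vendor_agreement_2024.pdf, section 12.3")],
    relationships=["Vendor A has Contract X", "Contract X contains Clause 12.3"],
    contradictions=["30 days per Original Contract vs 60 days per Amendment 2"],
)
print(prompt)
```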

Why This Matters¶

1. Grounded Responses

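The LLM synthesizes from what the graph provides, not from open-ended recall. This sharply limits its room to introduce facts that don't exist in the knowledge base, and every statement it makes can be checked against the supplied evidence.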

2. Explainable AI

Every response can be deconstructed:
- Why did you say X? → Because Fact A and Fact B
- Where did Fact A come from? → Document Y, Section Z
- How confident are you? → 0.95 based on extraction method

3. Contradiction Surface

The system doesn't hide inconsistency:

Warning: Conflicting termination requirements detected

Current requirement: 60 days (per Amendment 2)
Previous requirement: 30 days (per Original Contract)
Changed on: 2024-06-20

4. Inference Chains

Complex questions get logical answers:

Query: "Who has access to confidential Project X information?"

Reasoning chain:
- Project X has confidential classification
- Document D123 references Project X
- User Alice authored Document D123
- User Bob has access to Document D123
- Therefore: Alice and Bob have potential access
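The chain above can be reproduced as a small rule over hypothetical fact sets. Each user is returned together with the facts that justify the conclusion, which is what makes the result verifiable. The set and function names are illustrative, and treating authorship as implying potential access mirrors the chain above rather than any documented rule.

```python
# Hypothetical facts mirroring the reasoning chain.
classification = {"Project X": "confidential"}
references = {("D123", "Project X")}   # (document, project)
authored = {("Alice", "D123")}         # (user, document)
has_access = {("Bob", "D123")}         # (user, document)


def potential_access(project: str) -> dict[str, list[str]]:
    """Map each user with potential access to the facts that justify it."""
    if classification.get(project) != "confidential":
        return {}
    result: dict[str, list[str]] = {}
    for doc, proj in references:
        if proj != project:
            continue
        base = [f"{project} has confidential classification", f"Document {doc} references {project}"]
        for user, d in authored:
            if d == doc:
                result.setdefault(user, base + [f"User {user} authored Document {doc}"])
        for user, d in has_access:
            if d == doc:
                result.setdefault(user, base + [f"User {user} has access to Document {doc}"])
    return result


for user, chain in potential_access("Project X").items():
    print(user, "<-", "; ".join(chain))
```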

All symbolic. All verifiable.

The Research Foundation

Academic research on "Context Graph Reasoning" (2024-2025) shows that adding entity descriptions and relationship context improves LLM accuracy by 11-25%. Archivus implements this research at production scale.

Pillar 3: Federated Intelligence¶

Enterprises share verified facts, not raw data¶

The ultimate vision: organizations collaborate through intelligence sharing while maintaining complete data sovereignty.

The Core Principle¶

Intelligence flows. Data stays home. Always.

Documents, PDFs, voice recordings, raw data—these never leave your boundary. What flows between organizations are verified facts with provenance.

How Federation Works¶

Traditional data sharing:

Company A → Sends entire database → Company B
Problems:
  • Complete data exposure
  • Privacy violation risk
  • Compliance challenges
  • IP leakage
  • No control after transfer

Federated intelligence:

Company A → Shares verified claims with provenance → Company B

What flows:
  • Structured facts: "Product X demand increased 40% in Q2"
  • Provenance: Source, confidence, temporal validity
  • Trust signals: Verification status, contradiction flags

What stays home:
  • Raw sales data
  • Internal forecasts
  • Customer identities
  • Proprietary algorithms
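The "what flows / what stays home" split can be enforced as an allow-list projection applied before anything leaves the tenant. In this hypothetical sketch, `InternalClaim`, `SHAREABLE_FIELDS`, and `to_federated` are illustrative names; the sample claim, confidence, and source type come from the supply chain use case below, while the internal rows and customer IDs are made up.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class InternalClaim:
    statement: str
    confidence: float
    source_type: str
    period: str          # temporal validity, e.g. a quarter
    verified: bool
    contradicted: bool
    # Internal-only material: must never cross the boundary.
    raw_rows: list = field(default_factory=list)
    customer_ids: list = field(default_factory=list)


# Allow-list: only these fields may leave the tenant.
SHAREABLE_FIELDS = ("statement", "confidence", "source_type", "period", "verified", "contradicted")


def to_federated(claim: InternalClaim) -> dict:
    record = asdict(claim)
    return {name: record[name] for name in SHAREABLE_FIELDS}


internal = InternalClaim(
    statement="Product X: +40% demand Q2",
    confidence=0.92,
    source_type="sales_forecast",
    period="Q2",
    verified=True,
    contradicted=False,
    raw_rows=[{"sku": "X", "units": 1200}],
    customer_ids=["cust-17", "cust-42"],
)
print(to_federated(internal))
```

An allow-list is used instead of a deny-list so that any field added to the internal record later is withheld by default.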

Use Cases¶

Supply Chain Intelligence

The manufacturer knows the demand forecast. The distributor needs a planning signal. The raw sales data is proprietary.

Solution:
- Manufacturer extracts verified claim: "Product X: +40% demand Q2"
- Claim includes confidence: 0.92
- Claim includes source type: "sales_forecast"
- Distributor receives signal, updates procurement
- No raw data exchanged

M&A Due Diligence

The acquiring company needs contract intelligence. The target company has confidential agreements.

Solution:
- Target creates temporary federated access
- Scope: specific entity types (contracts, obligations)
- Duration: 90-day window
- Queries logged for audit
- Raw contracts never transferred
- Access revoked once the deal closes or is abandoned
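The M&A grant can be modeled as a scoped, time-boxed, revocable permission that logs every query, including denied ones. This is a hypothetical sketch: `FederatedGrant`, its fields, the grantee name, and the dates are illustrative, while the scope and 90-day window mirror the solution list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class FederatedGrant:
    grantee: str
    allowed_entity_types: frozenset[str]
    expires_at: datetime
    revoked: bool = False
    audit_log: list[tuple[datetime, str, str, bool]] = field(default_factory=list)

    def authorize(self, entity_type: str, query: str, now: datetime) -> bool:
        allowed = (
            not self.revoked
            and now < self.expires_at
            and entity_type in self.allowed_entity_types
        )
        # Every query is logged for audit, whether or not it was allowed.
        self.audit_log.append((now, entity_type, query, allowed))
        return allowed

    def revoke(self) -> None:
        self.revoked = True


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = FederatedGrant(
    grantee="Acquirer Inc",
    allowed_entity_types=frozenset({"contract", "obligation"}),
    expires_at=start + timedelta(days=90),
)
print(grant.authorize("obligation", "termination obligations", start))  # True
print(grant.authorize("customer", "top customers", start))              # False: out of scope
grant.revoke()
print(grant.authorize("contract", "change-of-control clauses", start))  # False: revoked
print(len(grant.audit_log))                                             # 3
```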

Industry Consortiums

Law firms share legal precedent intelligence. Client confidentiality is paramount.

Solution:
- Verified claims: "Clause type X succeeded in Y jurisdictions"
- Anonymized: client identities removed
- Aggregated: pattern-level intelligence
- Provenance: "verified by 12 firms"
- Individual case details stay private

The Trust Layer¶

Federation requires trustless verification. When Company B receives claims from Company A, how does B verify without trusting A's database?

The solution: Cryptographic anchoring

Every federated fact exchange can be anchored to a public consensus ledger:

Company A extracts verified claim
Claim receives cryptographic hash
Hash anchored to public ledger with timestamp
Company B receives claim + anchor reference
Company B verifies via public ledger
No need to trust Company A's database: the anchor proves the claim is unaltered and existed at the anchored time
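A minimal sketch of the hash-and-verify steps, assuming SHA-256 over canonical JSON (sorted keys, fixed separators) so both parties hash identical bytes. The in-memory `PUBLIC_LEDGER` dict is a hypothetical stand-in for a real public ledger, and the anchor reference format is made up. Note what the check establishes: the claim is unaltered since anchoring and existed at the recorded time; it does not by itself prove the claim is true.

```python
import hashlib
import json
from datetime import datetime, timezone


def claim_hash(claim: dict) -> str:
    # Canonical JSON so the same claim always hashes to the same digest.
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a public ledger: anchor reference -> (hash, timestamp).
PUBLIC_LEDGER: dict[str, tuple[str, str]] = {}


def anchor(claim: dict) -> str:
    """Company A: hash the claim and record the hash with a timestamp."""
    ref = f"anchor-{len(PUBLIC_LEDGER) + 1}"
    PUBLIC_LEDGER[ref] = (claim_hash(claim), datetime.now(timezone.utc).isoformat())
    return ref


def verify(claim: dict, ref: str) -> bool:
    """Company B: recompute the hash and compare it with the ledger entry."""
    entry = PUBLIC_LEDGER.get(ref)
    return entry is not None and entry[0] == claim_hash(claim)


claim = {"statement": "Product X demand increased 40% in Q2", "confidence": 0.92}
ref = anchor(claim)
print(verify(claim, ref))                          # True
print(verify({**claim, "confidence": 0.99}, ref))  # False: claim was altered
```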

Progressive Implementation¶

Federation is being built in stages:

Today: Foundation
- Tenant-scoped knowledge graphs
- Trust chain protocols
- Permission scoping architecture

Tomorrow: Cross-Organization
- Federated queries between enterprises
- Verified fact sharing protocol
- Cryptographic provenance

Future: Networks
- Industry consortiums
- Supply chain intelligence networks
- Cross-border compliance verification

The TCP/IP Analogy

TCP/IP federated computer networks while preserving autonomy. Each network retained sovereignty but could exchange packets. Archivus federates knowledge networks while preserving data sovereignty. Each organization retains control but can exchange verified facts.

How the Pillars Work Together¶

The three pillars are not independent—they're symbiotic:

Example: Complex Query¶

User asks: "What are all active vendor contracts with auto-renewal clauses expiring in 2025?"

Pillar 1 (Knowledge Substrate) provides:
- All vendor entities
- All contract entities
- All clause entities of type "auto-renewal"
- Temporal data: effective dates, expiration dates
- Relationships: vendor → has_contract → contains_clause

Pillar 2 (Symbolic Reasoning) executes:
- Graph traversal: vendors with active contracts
- Temporal filter: expiration in 2025
- Clause type filter: auto-renewal
- Contradiction check: conflicting renewal terms?
- Inference: which require action before expiration?
- Provenance assembly: link each result to source

Pillar 3 (Federation) enables (future):
- Query across subsidiary knowledge graphs
- Aggregate parent + child company contracts
- Maintain data sovereignty (contracts stay local)
- Return verified facts, not raw documents

Result: structured, verifiable, temporally aware answers that can be audited, shared, and trusted.
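A minimal sketch of the Pillar 1 and Pillar 2 portions of this query, assuming the vendor → has_contract → contains_clause traversal has already been flattened into one record per contract. `ContractRecord`, `auto_renewals_expiring`, and the sample data are hypothetical; the contradiction check and inference steps are omitted for brevity. Each result keeps its source document, so the answer stays auditable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContractRecord:
    vendor: str
    contract: str
    status: str            # "active" or "terminated"
    expires: date
    clause_types: set[str]
    source: str            # provenance: document the contract was extracted from


contracts = [
    ContractRecord("Vendor A", "C-100", "active", date(2025, 3, 31), {"auto-renewal", "termination"}, "vendor_a_msa.pdf"),
    ContractRecord("Vendor B", "C-200", "active", date(2026, 1, 15), {"auto-renewal"}, "vendor_b_msa.pdf"),
    ContractRecord("Vendor C", "C-300", "terminated", date(2025, 6, 30), {"auto-renewal"}, "vendor_c_msa.pdf"),
    ContractRecord("Vendor D", "C-400", "active", date(2025, 9, 1), {"termination"}, "vendor_d_msa.pdf"),
]


def auto_renewals_expiring(records: list[ContractRecord], year: int) -> list[ContractRecord]:
    # Status filter, temporal filter, and clause-type filter, ordered by expiration.
    return sorted(
        (r for r in records
         if r.status == "active"
         and r.expires.year == year
         and "auto-renewal" in r.clause_types),
        key=lambda r: r.expires,
    )


for r in auto_renewals_expiring(contracts, 2025):
    print(f"{r.vendor} {r.contract} expires {r.expires} (source: {r.source})")
# Vendor A C-100 expires 2025-03-31 (source: vendor_a_msa.pdf)
```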

The Architectural Promise¶

These three pillars create capabilities that cannot be achieved with traditional architectures:

Knowledge accumulates instead of being queried and forgotten
Reasoning is explainable instead of black-box
Contradictions are surfaced instead of hidden
Intelligence can federate while data stays sovereign

This isn't incremental improvement. This is architectural transformation.

