Knowledge Graph Architecture

The Knowledge Graph is the substrate layer that transforms unstructured data into queryable, verifiable intelligence.

The Problem

LLMs alone have no persistent memory. They hallucinate connections. They sound confident but don't actually know anything.

Enterprises need answers they can trust and prove.

The Solution: Contextual Knowledge Graph

A tenant-scoped graph where every piece of intelligence—extracted from documents, voice, connectors, or research—becomes a verifiable node with rich context.

graph LR
    subgraph Sources["Intelligence Sources"]
        DOC[Documents]
        VOICE[Voice Conversations]
        CONN[Data Connectors]
        WEB[Web Research]
    end

    subgraph Graph["Knowledge Graph"]
        ENT[Entities]
        REL[Relationships]
        CLAIMS[Claims]
        PROV[Provenance]
    end

    subgraph Output["Grounded AI"]
        QUERY[Query Graph]
        VERIFY[Verify Facts]
        RESPOND[Generate Response]
    end

    DOC --> Graph
    VOICE --> Graph
    CONN --> Graph
    WEB --> Graph

    Graph --> QUERY
    QUERY --> VERIFY
    VERIFY --> RESPOND

The Quadruple Model

Traditional knowledge graphs use triples:

(Subject, Predicate, Object)
(Obama, president_of, USA)

Archivus uses quadruples with rich context:

(Entity, Relationship, Entity, CONTEXT)

Context = {
    temporal:    { from: "2009-01-20", until: "2017-01-20" },
    geographic:  { country: "USA" },
    provenance:  { source: "inauguration.pdf", confidence: 0.99 },
    supporting:  ["Barack Obama was inaugurated as the 44th President..."]
}

Why this matters: "Who is the president?" has different answers at different times. Without temporal context, the graph cannot reason about change.
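As a data structure, a quadruple is simply a triple plus a context object. A minimal sketch in Python, with field names taken from the example above (the actual storage schema is not shown here):

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Rich context attached to every quadruple (illustrative fields)."""
    temporal: dict = field(default_factory=dict)    # {"from": ..., "until": ...}
    geographic: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)  # source + confidence
    supporting: list = field(default_factory=list)  # evidence sentences

@dataclass
class Quadruple:
    subject: str
    relationship: str
    obj: str
    context: Context

fact = Quadruple(
    "Obama", "president_of", "USA",
    Context(
        temporal={"from": "2009-01-20", "until": "2017-01-20"},
        geographic={"country": "USA"},
        provenance={"source": "inauguration.pdf", "confidence": 0.99},
        supporting=["Barack Obama was inaugurated as the 44th President..."],
    ),
)
```

The same triple can appear many times with different temporal contexts; only the context object distinguishes them.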

Core Components

1. Entities

People, organizations, concepts, products, events—anything with identity.

Properties:
  • Name and aliases
  • Entity type (person, organization, location, concept, etc.)
  • Description (long-form context for LLM injection)
  • External identifiers (Wikidata QID, Wikipedia URL)
  • Confidence score
  • Source provenance

Key insight: Entity descriptions improve LLM accuracy by 11-25% (arXiv:2406.11160v3). The LLM performs better when it knows about the entity, not just its name.

2. Relationships

Connections between entities with temporal and geographic validity.

Types:
  • employs, employed_by
  • authored, published_by
  • located_at, part_of
  • mentions, references
  • supports, contradicts (for claims)
  • supersedes (for versioning)

Context fields:
  • Valid from / valid until (temporal bounds)
  • Geographic scope
  • Confidence score
  • Evidence sentences
  • Source provenance

3. Claims

Factual statements extracted from sources.

Properties:
  • Claim text (the assertion)
  • Source type (document, web, connector, research)
  • Confidence (0.0 - 1.0)
  • Validation status (unverified, verified, disputed)
  • About entities (linked)
  • Supporting evidence
  • Content hash (for verification)

Claim Network: Claims can support or contradict other claims. This enables epistemic reasoning—reasoning about knowledge itself.

4. Provenance Tracking

Every node tracks:
  • Source type (primary/secondary/tertiary)
  • Source ID (document, connector, research finding)
  • Extraction method (AI, schema mapping, manual)
  • First seen / last updated timestamps
  • Confidence scoring

This is what makes "Show me the sources" answerable.

Intelligence Flow

From Documents

1. User uploads PDF
2. Text extraction (OCR if needed)
3. AI entity extraction (Claude)
   → People, organizations, dates, amounts
4. Relationship extraction
   → "John Smith works at Acme Corp"
5. Claim extraction with evidence
   → "Revenue increased 20%" + [supporting sentence]
6. Knowledge Graph insertion
   → Entities deduplicated, relations created, claims stored
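The deduplication in step 6 can be sketched as a name-and-alias match. This is an illustrative assumption: a production matcher would likely also use embeddings and external identifiers such as Wikidata QIDs.

```python
def normalize(name: str) -> str:
    """Case-fold and collapse whitespace for matching."""
    return " ".join(name.lower().split())

def upsert_entity(graph: dict, name: str, aliases=(), source=None):
    """Insert an entity, deduplicating by normalized name or alias (sketch)."""
    key = normalize(name)
    for ent in graph.values():
        known = {normalize(ent["name"])} | {normalize(a) for a in ent["aliases"]}
        if key in known or {normalize(a) for a in aliases} & known:
            # Match found: merge aliases and accumulate provenance.
            ent["aliases"] = sorted(set(ent["aliases"]) | set(aliases))
            if source:
                ent["sources"].append(source)
            return ent
    ent = {"name": name, "aliases": list(aliases),
           "sources": [source] if source else []}
    graph[key] = ent
    return ent

g = {}
upsert_entity(g, "Acme Corp", aliases=["Acme"], source="contract.pdf")
merged = upsert_entity(g, "acme corp", source="invoice.pdf")
# merged now carries provenance from both documents
```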

From Voice

1. Real-time transcription (Deepgram)
2. AI extraction during conversation
   → Extract entities, claims, relationships
3. Knowledge Graph update
   → All intelligence captured and linked

From Data Connectors

1. Connector syncs structured data (e.g., Google Reviews, POS)
2. Schema mapping to entities
   → Direct entity creation (no LLM needed)
3. Knowledge Graph insertion
   → Linked to existing entities

Query-Time: CGR3 Pipeline

CGR3 = Context Graph Reasoning (Retrieve → Rank → Reason)

Stage 1: Retrieve

  • Extract entities from user's question
  • Semantic search using embeddings
  • Text search on names and descriptions
  • Retrieve connected claims

Stage 2: Rank

Weight signals:
  • Similarity: How relevant to the query?
  • Confidence: How certain is this claim?
  • Recency: When was this last updated?
  • Authority: What's the source credibility?
  • Corroboration: How many sources agree?
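These signals can be combined with a weighted sum; the weights below are illustrative assumptions, not documented defaults:

```python
def rank_score(claim, weights=None):
    """Combine the five ranking signals into one score (illustrative weights)."""
    w = weights or {"similarity": 0.35, "confidence": 0.25,
                    "recency": 0.15, "authority": 0.15, "corroboration": 0.10}
    return sum(w[k] * claim[k] for k in w)

claims = [
    {"id": "A", "similarity": 0.9, "confidence": 0.98, "recency": 0.6,
     "authority": 0.85, "corroboration": 0.7},
    {"id": "B", "similarity": 0.7, "confidence": 0.45, "recency": 0.9,
     "authority": 0.50, "corroboration": 0.2},
]
ranked = sorted(claims, key=rank_score, reverse=True)
```

Tenant-level authority overrides (see Source Authority below) would feed into the `authority` signal before scoring.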

Stage 3: Reason

Build structured context for LLM:

VERIFIED FACTS (confidence >= 0.8):
- Claim A (source: contract.pdf, 98% confidence)
- Claim B (source: Wikidata, 95% confidence)

CONFLICTING INFORMATION:
- Claim C says X (source: old_report.pdf, 75% confidence)
- Claim D says Y (source: new_report.pdf, 92% confidence)
[TEMPORAL DIFFERENCE]: D supersedes C (newer date)

UNCERTAIN (low confidence):
- Claim E (source: unverified_email.pdf, 45% confidence)

LLM receives structured truth instead of raw documents.
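The structured block above could be assembled along these lines; a sketch that omits the conflicting-information section for brevity and assumes each claim carries `text`, `source`, and `confidence` fields:

```python
def build_context(claims):
    """Bucket ranked claims into the structured sections shown above (sketch)."""
    verified = [c for c in claims if c["confidence"] >= 0.8]
    uncertain = [c for c in claims if c["confidence"] < 0.5]
    lines = ["VERIFIED FACTS (confidence >= 0.8):"]
    lines += [f"- {c['text']} (source: {c['source']}, {c['confidence']:.0%} confidence)"
              for c in verified]
    lines.append("UNCERTAIN (low confidence):")
    lines += [f"- {c['text']} (source: {c['source']}, {c['confidence']:.0%} confidence)"
              for c in uncertain]
    return "\n".join(lines)

ctx = build_context([
    {"text": "Claim A", "source": "contract.pdf", "confidence": 0.98},
    {"text": "Claim E", "source": "unverified_email.pdf", "confidence": 0.45},
])
```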

Contradiction Detection

The graph symbolically detects conflicts before the LLM generates a response.

Types:
  • Temporal: Same claim, different dates
  • Direct: Claim A contradicts Claim B
  • Semantic: Similar claims with different values

Resolution:
  • Surface all conflicting claims to the LLM
  • Label the contradiction type
  • Provide resolution hints from AI analysis
  • Let the LLM explain the conflict in fluent language
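A minimal sketch of symbolic conflict detection, assuming each claim records what it is `about`, its `value`, and an `as_of` date (hypothetical field names):

```python
from datetime import date

def detect_conflicts(claims):
    """Flag pairs of claims about the same subject with different values.

    Temporal conflicts are resolved by supersession: the newer claim is
    preferred, but both are surfaced to the LLM with a label (sketch).
    """
    conflicts = []
    for i, a in enumerate(claims):
        for b in claims[i + 1:]:
            if a["about"] == b["about"] and a["value"] != b["value"]:
                older, newer = sorted((a, b), key=lambda c: c["as_of"])
                kind = "temporal" if older["as_of"] != newer["as_of"] else "direct"
                conflicts.append({"type": kind,
                                  "superseded": older["id"],
                                  "preferred": newer["id"]})
    return conflicts

claims = [
    {"id": "C", "about": "revenue_2024", "value": "10M", "as_of": date(2024, 3, 1)},
    {"id": "D", "about": "revenue_2024", "value": "12M", "as_of": date(2024, 9, 1)},
]
result = detect_conflicts(claims)
```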

Entity Enrichment

External knowledge providers contribute to the graph without replacing existing data.

Sources:
  • Wikidata: Structured properties, canonical QIDs
  • Wikipedia: Prose context, descriptions, categories
  • Industry databases: Domain-specific enrichment

Source Merging:
  • Duplicate claims are merged, not rejected
  • Sources accumulate with a corroboration bonus (+5% per source)
  • Temporal conflicts are tracked when sources disagree
  • Everything maintains provenance
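The merge rule might look like this; the +5% per-source bonus and the 1.0 cap follow the description above, while the field names and the tie-break on base confidence are assumptions:

```python
def merge_claim(existing, incoming):
    """Merge a duplicate claim: accumulate sources, apply a +5% corroboration
    bonus per new source, and cap confidence at 1.0 (illustrative rule)."""
    if incoming["source"] not in existing["sources"]:
        existing["sources"].append(incoming["source"])
        existing["confidence"] = min(1.0, existing["confidence"] + 0.05)
    # Keep the higher base confidence if the new source is more certain.
    existing["confidence"] = min(1.0, max(existing["confidence"],
                                          incoming["confidence"]))
    return existing

claim = {"text": "Revenue increased 20%",
         "sources": ["report.pdf"], "confidence": 0.80}
merge_claim(claim, {"source": "wikidata", "confidence": 0.75})
# sources now include both; confidence rises to ~0.85
```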

Source Authority

Tenant-configurable trust multipliers:

Source Type          Default Authority
Internal Documents   0.85
Wikidata             0.80
Wikipedia            0.75
Data Connectors      0.70
Web Research         0.60
User Input           0.50

Tenants can override these to match their trust models.

Temporal Reasoning

The graph supports queries like:
  • "Who was the CEO in Q3 2024?"
  • "What contracts were active on January 1, 2025?"
  • "Show me claims that changed between March and June"

All relationships and claims track valid_from and valid_until timestamps.
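A point-in-time filter over these bounds can be sketched as follows, assuming half-open intervals and `None` for records that are still valid:

```python
from datetime import date

def active_on(claims, day):
    """Return claims whose [valid_from, valid_until) interval covers `day`.
    An open valid_until (None) means 'still valid' (sketch)."""
    return [c for c in claims
            if c["valid_from"] <= day
            and (c["valid_until"] is None or day < c["valid_until"])]

contracts = [
    {"id": "alpha", "valid_from": date(2024, 6, 1),
     "valid_until": date(2024, 12, 31)},
    {"id": "beta", "valid_from": date(2024, 11, 1), "valid_until": None},
]
live = active_on(contracts, date(2025, 1, 1))
```

The same predicate, expressed in SQL over indexed timestamp columns, is what makes "active on January 1, 2025" a cheap query rather than an LLM inference.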

Federation-Ready Design

All entities and claims are designed to flow across organizational boundaries:

What federates:
  • Entity references (canonical IDs)
  • Verified claims (with full provenance)
  • Relationship signals (anonymized if needed)
  • Trust scores

What stays home:
  • Documents (source material)
  • Raw data (never leaves)
  • PII (protected)

Security & Isolation

Every Knowledge Graph table enforces:
  • Row-Level Security (RLS) with tenant_id
  • Service role bypass for background enrichment
  • Audit logging on all mutations

A tenant can never see another tenant's graph.

Performance

  • 100% embedding coverage for semantic search
  • 86.7% entity enrichment rate from external sources
  • Indexes: 41+ specialized indexes for graph traversal
  • Caching: Redis caching for frequently accessed entities

What This Enables

Without Knowledge Graph        With Knowledge Graph
"AI said X"                    "AI said X, here's the source"
No temporal context            "This was true in Q3 2024"
Hallucinated connections       "No evidence for this relationship"
Black box reasoning            "Here's the chain of inference"
Session-based memory           Persistent organizational knowledge
No contradiction detection     "Sources A and B disagree"

The Result

Every question Archie answers is grounded in verified facts from your organization's Knowledge Graph.

Not "here's what I think"—here's what I know, and why I know it.


The foundation of verifiable intelligence.