Knowledge Graph Architecture¶
The Knowledge Graph is the substrate layer that transforms unstructured data into queryable, verifiable intelligence.
The Problem¶
LLMs alone have no memory. They hallucinate connections. They sound confident but don't know anything.
Enterprises need answers they can trust and prove.
The Solution: Contextual Knowledge Graph¶
A tenant-scoped graph where every piece of intelligence—extracted from documents, voice, connectors, or research—becomes a verifiable node with rich context.
```mermaid
graph LR
    subgraph Sources["Intelligence Sources"]
        DOC[Documents]
        VOICE[Voice Conversations]
        CONN[Data Connectors]
        WEB[Web Research]
    end
    subgraph Graph["Knowledge Graph"]
        ENT[Entities]
        REL[Relationships]
        CLAIMS[Claims]
        PROV[Provenance]
    end
    subgraph Output["Grounded AI"]
        QUERY[Query Graph]
        VERIFY[Verify Facts]
        RESPOND[Generate Response]
    end
    DOC --> Graph
    VOICE --> Graph
    CONN --> Graph
    WEB --> Graph
    Graph --> QUERY
    QUERY --> VERIFY
    VERIFY --> RESPOND
```

The Quadruple Model¶
Traditional knowledge graphs use triples:

```
(Entity, Relationship, Entity)
```

Archivus uses quadruples with rich context:

```
(Entity, Relationship, Entity, CONTEXT)

Context = {
  temporal:   { from: "2009-01-20", until: "2017-01-20" },
  geographic: { country: "USA" },
  provenance: { source: "inauguration.pdf", confidence: 0.99 },
  supporting: ["Barack Obama was inaugurated as the 44th President..."]
}
```
Why this matters: "Who is the president?" has different answers at different times. Without temporal context, the graph cannot reason about change.
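As a minimal TypeScript sketch, a quadruple might be shaped like this. The field names are illustrative, not the production schema:

```typescript
// Illustrative quadruple shape; names are hypothetical, not the actual schema.
interface Quadruple {
  subject: string;       // entity ID, e.g. "ent_barack_obama"
  predicate: string;     // relationship type, e.g. "holds_office"
  object: string;        // entity ID, e.g. "ent_us_president"
  context: QuadContext;  // the fourth element that triples lack
}

interface QuadContext {
  temporal?: { from?: string; until?: string };        // ISO 8601 dates
  geographic?: { country?: string; region?: string };
  provenance: { source: string; confidence: number };  // confidence in 0.0 - 1.0
  supporting: string[];                                 // evidence sentences
}
```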
Core Components¶
1. Entities¶
People, organizations, concepts, products, events—anything with identity.
Properties:

- Name and aliases
- Entity type (person, organization, location, concept, etc.)
- Description (long-form context for LLM injection)
- External identifiers (Wikidata QID, Wikipedia URL)
- Confidence score
- Source provenance
Key insight: Entity descriptions improve LLM accuracy by 11-25% (arXiv:2406.11160v3). The LLM performs better when it knows about the entity, not just its name.
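A hypothetical TypeScript shape for an entity, mirroring the properties listed above (field names are assumptions for illustration):

```typescript
// Hypothetical Entity record; mirrors the properties above, not a real schema.
interface Entity {
  id: string;
  tenantId: string;  // tenant-scoped isolation
  name: string;
  aliases: string[];
  type: "person" | "organization" | "location" | "concept" | "product" | "event";
  description?: string;  // long-form context injected into LLM prompts
  externalIds?: { wikidataQid?: string; wikipediaUrl?: string };
  confidence: number;    // 0.0 - 1.0
  provenance: { sourceId: string; sourceType: string };
}
```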
2. Relationships¶
Connections between entities with temporal and geographic validity.
Types:

- employs, employed_by
- authored, published_by
- located_at, part_of
- mentions, references
- supports, contradicts (for claims)
- supersedes (for versioning)

Context fields:

- Valid from / valid until (temporal bounds)
- Geographic scope
- Confidence score
- Evidence sentences
- Source provenance
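Sketched in TypeScript, again with illustrative field names rather than the real schema:

```typescript
// Hypothetical Relationship edge with temporal and geographic validity.
interface Relationship {
  id: string;
  fromEntityId: string;
  toEntityId: string;
  type:
    | "employs" | "employed_by"
    | "authored" | "published_by"
    | "located_at" | "part_of"
    | "mentions" | "references"
    | "supports" | "contradicts"
    | "supersedes";
  validFrom?: string;      // ISO 8601; open-ended when absent
  validUntil?: string;
  geographicScope?: string;
  confidence: number;      // 0.0 - 1.0
  evidence: string[];      // sentences that justify this edge
  provenance: { sourceId: string; sourceType: string };
}
```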
3. Claims¶
Factual statements extracted from sources.
Properties:

- Claim text (the assertion)
- Source type (document, web, connector, research)
- Confidence (0.0 - 1.0)
- Validation status (unverified, verified, disputed)
- About entities (linked)
- Supporting evidence
- Content hash (for verification)
Claim Network: Claims can support or contradict other claims. This enables epistemic reasoning—reasoning about knowledge itself.
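A claim, including the support/contradict links that form the claim network, might look like this (hypothetical shape):

```typescript
// Hypothetical Claim record; the supports/contradicts arrays form the claim network.
interface Claim {
  id: string;
  text: string;  // the assertion itself
  sourceType: "document" | "web" | "connector" | "research";
  confidence: number;  // 0.0 - 1.0
  validationStatus: "unverified" | "verified" | "disputed";
  aboutEntityIds: string[];      // linked entities
  supportingEvidence: string[];  // evidence sentences
  contentHash: string;           // e.g. a SHA-256 of the claim text, for verification
  supports?: string[];           // IDs of claims this one supports
  contradicts?: string[];        // IDs of claims this one disputes
}
```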
4. Provenance Tracking¶
Every node tracks:

- Source type (primary/secondary/tertiary)
- Source ID (document, connector, research finding)
- Extraction method (AI, schema mapping, manual)
- First seen / last updated timestamps
- Confidence scoring
This is what makes "Show me the sources" answerable.
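As a sketch, the provenance fields above map to a shape like this (names assumed):

```typescript
// Hypothetical provenance record attached to every node.
interface Provenance {
  sourceType: "primary" | "secondary" | "tertiary";
  sourceId: string;  // document, connector, or research finding ID
  extractionMethod: "ai" | "schema_mapping" | "manual";
  firstSeen: string;    // ISO 8601 timestamps
  lastUpdated: string;
  confidence: number;   // 0.0 - 1.0
}
```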
Intelligence Flow¶
From Documents¶
```
1. User uploads PDF
        ↓
2. Text extraction (OCR if needed)
        ↓
3. AI entity extraction (Claude)
   → People, organizations, dates, amounts
        ↓
4. Relationship extraction
   → "John Smith works at Acme Corp"
        ↓
5. Claim extraction with evidence
   → "Revenue increased 20%" + [supporting sentence]
        ↓
6. Knowledge Graph insertion
   → Entities deduplicated, relations created, claims stored
```
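A hypothetical orchestration of this pipeline in TypeScript. Every helper here is an assumed internal function, not a real library API; payload shapes would reuse the Entity, Relationship, and Claim sketches above:

```typescript
// Assumed internal helpers, declared for the sketch; not real APIs.
type Extracted = { entities: unknown[]; relationships: unknown[]; claims: unknown[] };

declare function extractText(pdf: Uint8Array): Promise<string>;      // step 2, OCR fallback inside
declare function extractEntities(text: string): Promise<unknown[]>;  // step 3, LLM-backed
declare function extractRelationships(text: string, entities: unknown[]): Promise<unknown[]>; // step 4
declare function extractClaims(text: string, entities: unknown[]): Promise<unknown[]>;        // step 5
declare function insertIntoGraph(tenantId: string, payload: Extracted): Promise<void>;        // step 6, dedup inside

async function ingestDocument(tenantId: string, pdf: Uint8Array): Promise<void> {
  const text = await extractText(pdf);
  const entities = await extractEntities(text);
  const relationships = await extractRelationships(text, entities);
  const claims = await extractClaims(text, entities);
  await insertIntoGraph(tenantId, { entities, relationships, claims });
}
```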
From Voice¶
```
1. Real-time transcription (Deepgram)
        ↓
2. AI extraction during conversation
   → Extract entities, claims, relationships
        ↓
3. Knowledge Graph update
   → All intelligence captured and linked
```
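The key difference from the document flow is that extraction runs incrementally. A hypothetical handler for finalized transcript segments (the extractor and graph writer are assumed internals, not the Deepgram SDK surface):

```typescript
// Assumed internals for the sketch; not the Deepgram SDK.
type Extracted = { entities: unknown[]; relationships: unknown[]; claims: unknown[] };

declare function extractIntelligence(segment: string): Promise<Extracted>;
declare function insertIntoGraph(tenantId: string, payload: Extracted): Promise<void>;

async function onTranscriptSegment(tenantId: string, segment: string): Promise<void> {
  // Extract from each finalized segment so the graph updates while the
  // conversation is still in progress, rather than only after it ends.
  const extracted = await extractIntelligence(segment);
  await insertIntoGraph(tenantId, extracted);
}
```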
From Data Connectors¶
```
1. Connector syncs structured data (e.g., Google Reviews, POS)
        ↓
2. Schema mapping to entities
   → Direct entity creation (no LLM needed)
        ↓
3. Knowledge Graph insertion
   → Linked to existing entities
```
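A minimal schema-mapping sketch: a structured connector record becomes an entity and a claim deterministically, with no LLM call. All field names and the default confidence are assumptions for illustration:

```typescript
// Hypothetical connector record; field names are assumed.
interface GoogleReview {
  author: string;
  rating: number;      // 1-5
  text: string;
  reviewedAt: string;  // ISO 8601
}

function mapReview(review: GoogleReview) {
  const entity = {
    name: review.author,
    type: "person",
    provenance: { sourceType: "connector", extractionMethod: "schema_mapping" },
  };
  const claim = {
    text: `${review.author} rated the business ${review.rating}/5`,
    sourceType: "connector",
    confidence: 0.95,  // structured data gets a high default confidence (assumed value)
    supportingEvidence: [review.text],
    validFrom: review.reviewedAt,
  };
  return { entity, claim };
}
```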
Query-Time: CGR3 Pipeline¶
CGR3 = Context Graph Reasoning (Retrieve → Rank → Reason)
Stage 1: Retrieve¶
- Extract entities from user's question
- Semantic search using embeddings
- Text search on names and descriptions
- Retrieve connected claims
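A hypothetical retrieval step combining these signals; the vector, text, and claim lookups are assumed internals, not a specific database client:

```typescript
// Assumed internal search functions, declared for the sketch.
declare function embed(text: string): Promise<number[]>;
declare function vectorSearch(tenantId: string, vector: number[], limit: number): Promise<string[]>; // entity IDs
declare function textSearch(tenantId: string, query: string, limit: number): Promise<string[]>;      // entity IDs
declare function claimsForEntities(tenantId: string, entityIds: string[]): Promise<unknown[]>;

async function retrieve(tenantId: string, question: string): Promise<unknown[]> {
  // Run semantic and lexical search in parallel, then union the entity hits.
  const [semantic, lexical] = await Promise.all([
    embed(question).then(v => vectorSearch(tenantId, v, 20)),
    textSearch(tenantId, question, 20),
  ]);
  const entityIds = [...new Set([...semantic, ...lexical])];
  return claimsForEntities(tenantId, entityIds);
}
```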
Stage 2: Rank¶
Weight signals:

- Similarity: How relevant to the query?
- Confidence: How certain is this claim?
- Recency: When was this last updated?
- Authority: What's the source credibility?
- Corroboration: How many sources agree?
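One way to combine these signals is a weighted sum. The weights below are purely illustrative, not the tuned production values:

```typescript
// Illustrative ranking; weights are assumptions, not production values.
interface RankSignals {
  similarity: number;     // 0-1, embedding similarity to the query
  confidence: number;     // 0-1, claim confidence
  recency: number;        // 0-1, decays with time since last update
  authority: number;      // 0-1, tenant-configurable source authority
  corroboration: number;  // 0-1, scaled count of agreeing sources
}

function rankScore(s: RankSignals): number {
  return (
    0.35 * s.similarity +
    0.25 * s.confidence +
    0.15 * s.recency +
    0.15 * s.authority +
    0.10 * s.corroboration
  );
}
```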
Stage 3: Reason¶
Build structured context for LLM:
```
VERIFIED FACTS (confidence >= 0.8):
- Claim A (source: contract.pdf, 98% confidence)
- Claim B (source: Wikidata, 95% confidence)

CONFLICTING INFORMATION:
- Claim C says X (source: old_report.pdf, 75% confidence)
- Claim D says Y (source: new_report.pdf, 92% confidence)
  [TEMPORAL DIFFERENCE]: D supersedes C (newer date)

UNCERTAIN (low confidence):
- Claim E (source: unverified_email.pdf, 45% confidence)
```

The LLM receives structured truth instead of raw documents.
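A simplified sketch of assembling that context block; the thresholds and the `RankedClaim` shape are assumptions for illustration:

```typescript
// Hypothetical input shape for the context builder.
interface RankedClaim {
  text: string;
  source: string;
  confidence: number;
  contradicts?: string[];  // IDs of claims this one disputes
}

function buildContext(claims: RankedClaim[]): string {
  const line = (c: RankedClaim) =>
    `- ${c.text} (source: ${c.source}, ${Math.round(c.confidence * 100)}% confidence)`;

  // Partition claims into the three buckets shown above.
  const conflicting = claims.filter(c => c.contradicts?.length);
  const verified = claims.filter(c => c.confidence >= 0.8 && !c.contradicts?.length);
  const uncertain = claims.filter(c => c.confidence < 0.8 && !c.contradicts?.length);

  return [
    "VERIFIED FACTS (confidence >= 0.8):", ...verified.map(line),
    "CONFLICTING INFORMATION:", ...conflicting.map(line),
    "UNCERTAIN (low confidence):", ...uncertain.map(line),
  ].join("\n");
}
```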
Contradiction Detection¶
The graph symbolically detects conflicts before the LLM generates a response.
Types:

- Temporal: Same claim, different dates
- Direct: Claim A contradicts Claim B
- Semantic: Similar claims with different values

Resolution:

- Surface all conflicting claims to the LLM
- Label the contradiction type
- Provide resolution hints from AI analysis
- Let the LLM explain the conflict in fluent language
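One of these checks, sketched symbolically: two claims about the same entity and predicate conflict when their validity windows overlap but their values differ. The shape and open-ended-window handling are assumptions:

```typescript
// Hypothetical claim shape for the temporal conflict check.
interface TimedClaim {
  entityId: string;
  predicate: string;
  value: string;
  validFrom: string;    // ISO 8601
  validUntil?: string;  // open-ended when absent
}

function temporalConflict(a: TimedClaim, b: TimedClaim): boolean {
  if (a.entityId !== b.entityId || a.predicate !== b.predicate) return false;
  if (a.value === b.value) return false;  // same value: corroboration, not conflict
  // Treat missing end dates as still-valid.
  const aEnd = a.validUntil ?? "9999-12-31";
  const bEnd = b.validUntil ?? "9999-12-31";
  // ISO 8601 date strings compare correctly as plain strings.
  return a.validFrom <= bEnd && b.validFrom <= aEnd;
}
```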
Entity Enrichment¶
External knowledge providers contribute to the graph without replacing existing data.
Sources:

- Wikidata: Structured properties, canonical QIDs
- Wikipedia: Prose context, descriptions, categories
- Industry databases: Domain-specific enrichment

Source Merging:

- Duplicate claims are merged, not rejected
- Sources accumulate with corroboration bonus (+5% per source)
- Temporal conflicts tracked when sources disagree
- Everything maintains provenance
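The corroboration bonus reduces to simple arithmetic. A sketch, assuming the bonus is capped at 1.0:

```typescript
// +5% per additional agreeing source, capped at 1.0 (the cap is an assumption).
function mergedConfidence(baseConfidence: number, totalSources: number): number {
  const bonus = 0.05 * Math.max(0, totalSources - 1);
  return Math.min(1.0, baseConfidence + bonus);
}

// e.g. a 0.80 claim corroborated by two additional sources -> 0.90
```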
Source Authority¶
Tenant-configurable trust multipliers:
| Source Type | Default Authority |
|---|---|
| Internal Documents | 0.85 |
| Wikidata | 0.80 |
| Wikipedia | 0.75 |
| Data Connectors | 0.70 |
| Web Research | 0.60 |
| User Input | 0.50 |
Tenants can override these to match their trust models.
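A hypothetical shape for those overrides, with keys mirroring the table above and values replacing the defaults:

```typescript
// Hypothetical per-tenant override map; keys and values are illustrative.
const authorityOverrides: Record<string, number> = {
  internal_documents: 0.95,  // this tenant trusts its own documents more
  web_research: 0.40,        // ...and the open web less
};
```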
Temporal Reasoning¶
The graph supports queries like:

- "Who was the CEO in Q3 2024?"
- "What contracts were active on January 1, 2025?"
- "Show me claims that changed between March and June"
All relationships and claims track valid_from and valid_until timestamps.
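A minimal "as of" filter over those bounds, treating a missing bound as open-ended:

```typescript
// Anything with valid_from / valid_until bounds can be filtered to a point in time.
interface Timed {
  validFrom?: string;   // ISO 8601; open-ended when absent
  validUntil?: string;
}

function activeAt<T extends Timed>(items: T[], isoDate: string): T[] {
  return items.filter(i =>
    (!i.validFrom || i.validFrom <= isoDate) &&
    (!i.validUntil || i.validUntil >= isoDate)
  );
}

// e.g. activeAt(contracts, "2025-01-01") answers "active on January 1, 2025"
```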
Federation-Ready Design¶
All entities and claims are designed to flow across organizational boundaries:
What federates:

- Entity references (canonical IDs)
- Verified claims (with full provenance)
- Relationship signals (anonymized if needed)
- Trust scores

What stays home:

- Documents (source material)
- Raw data (never leaves)
- PII (protected)
Security & Isolation¶
Every Knowledge Graph table enforces:

- Row-Level Security (RLS) with tenant_id
- Service role bypass for background enrichment
- Audit logging on all mutations
A tenant can never see another tenant's graph.
Performance¶
- 100% embedding coverage for semantic search
- 86.7% entity enrichment rate from external sources
- 41+ specialized indexes for graph traversal
- Redis caching for frequently accessed entities
What This Enables¶
| Without Knowledge Graph | With Knowledge Graph |
|---|---|
| "AI said X" | "AI said X, here's the source" |
| No temporal context | "This was true in Q3 2024" |
| Hallucinated connections | "No evidence for this relationship" |
| Black box reasoning | "Here's the chain of inference" |
| Session-based memory | Persistent organizational knowledge |
| No contradiction detection | "Sources A and B disagree" |
The Result¶
Every question Archie answers is grounded in verified facts from your organization's Knowledge Graph.
Not "here's what I think"—here's what I know, and why I know it.
The foundation of verifiable intelligence.