Proactive Research Engine

Autonomous Research with Source of Truth Grounding (Pro+)

Overview

The Proactive Research Engine extends Archie’s capabilities beyond your document library. Research the web, validate claims against external sources, and ground findings in your verified Source of Truth documents.

Core Capabilities

Web Research

Search and synthesize information from the web:

User: "Research current market rates for cloud storage services"

Archie:
1. Searching Perplexity Sonar for "cloud storage pricing 2026"
2. Found 12 relevant sources
3. Synthesizing findings...

Results:
- AWS S3: $0.023/GB/month (Standard)
- Azure Blob: $0.0184/GB/month (Hot tier)
- Google Cloud: $0.020/GB/month (Standard)
- Wasabi: $6.99/TB/month (no egress fees)

Sources:
- aws.amazon.com/s3/pricing
- azure.microsoft.com/pricing/details/storage
- cloud.google.com/storage/pricing

Power Mode (Pro+)

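Power Mode queries several search providers in parallel, aggregates the results (deduplication, ranking, synthesis), and then grounds them against your Source of Truth: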

┌─────────────────────────────────────────────────────────────┐
│                     Research Query                           │
│       "Compare enterprise document management solutions"     │
└──────────────────────────┬──────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
        │ Perplexity│ │  Brave  │ │   Bing    │
        │   Sonar   │ │ Search  │ │   News    │
        └─────┬─────┘ └────┬────┘ └─────┬─────┘
              │            │            │
        ┌─────▼────────────▼────────────▼─────┐
        │         Result Aggregation           │
        │   Deduplication │ Ranking │ Synthesis│
        └─────────────────┬───────────────────┘
                          │
                    ┌─────▼─────┐
                    │ Grounding │
                    │  Against  │
                    │   SoT     │
                    └───────────┘

Power Mode Providers:

Perplexity Sonar - AI-powered search synthesis
Brave Search - Privacy-focused web search
Bing News - News and current events
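
The fan-out and aggregation steps in the diagram can be sketched roughly as follows. This is a minimal illustration, not Archie's implementation: the three provider functions are hypothetical stubs returning fixed URLs, and the ranking rule (URLs returned by more providers rank first) is an assumption, since this guide does not describe how ranking works.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three Power Mode providers. Each returns a
# list of result URLs; real provider clients are not shown in this guide.
def perplexity_sonar(query):
    return ["https://a.example/doc", "https://b.example/doc"]

def brave_search(query):
    return ["https://b.example/doc", "https://c.example/doc"]

def bing_news(query):
    return ["https://b.example/doc", "https://a.example/doc"]

PROVIDERS = [perplexity_sonar, brave_search, bing_news]

def power_mode_search(query, providers=PROVIDERS):
    """Query all providers in parallel, then deduplicate and rank the URLs."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        result_lists = list(pool.map(lambda provider: provider(query), providers))
    # Count how many providers returned each URL; set() avoids counting a
    # URL twice if one provider repeats it.
    hits = Counter(url for results in result_lists for url in set(results))
    # Assumed ranking rule: more providers first, ties broken by URL.
    return sorted(hits, key=lambda url: (-hits[url], url))

ranked = power_mode_search("Compare enterprise document management solutions")
print(ranked)
```

Running the providers on a thread pool keeps total latency close to that of the slowest provider rather than the sum of all three.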

Source of Truth Grounding

Validate research findings against your verified documents:

User: "Research competitor pricing and compare to our current contracts"

Archie:
1. Researching competitor pricing...
2. Found external data on 5 competitors
3. Grounding against Source of Truth collection...

Findings:
- Market average: $50/user/month
- Our current contract (Contract-2024.pdf): $45/user/month
- Our contract rate is 10% below the market average

⚠️ Discrepancy found:
- External source claims competitor X offers $35/user
- Source of Truth (Vendor-Analysis.pdf) shows $42/user
- Recommend verification before decisions

Confidence: High (0.89) - Grounded in 3 Source of Truth documents

Research Tools

propose_research

Initiate a research task:

POST /api/v1/archie/chat
{
  "message": "Research current trends in AI document processing",
  "tools": ["propose_research"]
}

Response:
{
  "research_id": "res_123",
  "status": "proposed",
  "scope": {
    "query": "AI document processing trends 2026",
    "sources": ["web", "news"],
    "depth": "comprehensive"
  },
  "estimated_credits": 5
}
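
As a rough sketch, the request above could be built from Python with the standard library. The base URL and the bearer-token header are assumptions for illustration; this guide does not state the host or the authentication scheme.

```python
import json
import urllib.request

# Assumed base URL; replace with your deployment's host.
BASE_URL = "https://archie.example.com"

def build_propose_research_request(message, api_key):
    """Build (but do not send) a chat request that enables propose_research."""
    body = {"message": message, "tools": ["propose_research"]}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/archie/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; check your account's API settings.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_propose_research_request(
    "Research current trends in AI document processing", api_key="YOUR_API_KEY"
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.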

search_web

Direct web search:

{
  "tool": "search_web",
  "parameters": {
    "query": "enterprise OCR accuracy benchmarks",
    "sources": ["perplexity"],
    "max_results": 10
  }
}

ground_finding

Validate against Source of Truth:

{
  "tool": "ground_finding",
  "parameters": {
    "finding": "Market leader offers 99.5% OCR accuracy",
    "source_of_truth_collection": "verified_benchmarks"
  }
}

Response:
{
  "grounding_status": "partial_match",
  "confidence": 0.75,
  "supporting_documents": [
    {
      "document": "OCR-Benchmarks-2025.pdf",
      "relevant_excerpt": "Leading providers achieve 98-99% accuracy...",
      "match_score": 0.82
    }
  ],
  "discrepancies": [
    "External claim: 99.5% vs Document claim: 99% maximum"
  ]
}
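
One way a client might act on a response like this is to flag the finding for human review whenever discrepancies are present or confidence is low. The sketch below assumes a review threshold of 0.8, which is an illustrative value and not a documented default.

```python
def grounding_action(response, min_confidence=0.8):
    """Return ("accept" | "review", reasons) for a ground_finding response."""
    reasons = []
    confidence = response["confidence"]
    # min_confidence is an illustrative threshold chosen for this sketch.
    if confidence < min_confidence:
        reasons.append(f"confidence {confidence:.2f} below {min_confidence:.2f}")
    # Every reported discrepancy is treated as a reason to review.
    reasons.extend(response.get("discrepancies", []))
    return ("review" if reasons else "accept", reasons)

example = {
    "grounding_status": "partial_match",
    "confidence": 0.75,
    "supporting_documents": [
        {"document": "OCR-Benchmarks-2025.pdf", "match_score": 0.82}
    ],
    "discrepancies": ["External claim: 99.5% vs Document claim: 99% maximum"],
}

action, reasons = grounding_action(example)
print(action, reasons)
```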

validate_claim

Fact-check external claims:

{
  "tool": "validate_claim",
  "parameters": {
    "claim": "Company X processes 1 million documents per day",
    "validation_sources": ["web", "source_of_truth"]
  }
}

Response:
{
  "claim_status": "unverified",
  "web_evidence": {
    "supporting": 2,
    "contradicting": 1,
    "neutral": 3
  },
  "source_of_truth_evidence": {
    "found": false,
    "message": "No relevant documents in Source of Truth"
  },
  "recommendation": "Verify directly with Company X"
}
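
To make the evidence counts easier to compare across claims, a client could reduce them to a support ratio. This helper is a hypothetical convenience, not part of the API; it only reads the fields shown in the response above.

```python
def evidence_summary(response):
    """Summarize web evidence counts from a validate_claim response."""
    web = response["web_evidence"]
    total = web["supporting"] + web["contradicting"] + web["neutral"]
    return {
        # Share of web sources that support the claim (0.0 if none were found).
        "support_ratio": round(web["supporting"] / total, 2) if total else 0.0,
        "sources_checked": total,
        "has_sot_evidence": response["source_of_truth_evidence"]["found"],
    }

example = {
    "claim_status": "unverified",
    "web_evidence": {"supporting": 2, "contradicting": 1, "neutral": 3},
    "source_of_truth_evidence": {"found": False},
}

summary = evidence_summary(example)
print(summary)
```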

Research Task Management

Task Lifecycle

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Proposed   │────▶│   Active    │────▶│  Completed  │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────▼──────┐
                    │   Paused    │
                    │ (User input │
                    │   needed)   │
                    └─────────────┘
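
The lifecycle can be modelled client-side as a small state machine. In this sketch the `paused` and `completed` status strings and the Paused to Active resume transition are assumptions: the diagram names the states but does not show how a paused task resumes, and only `proposed` and `active` appear in the example responses.

```python
from enum import Enum

class TaskStatus(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    PAUSED = "paused"        # assumed string value
    COMPLETED = "completed"  # assumed string value

# Transitions drawn in the diagram, plus PAUSED -> ACTIVE, which is an
# assumption about how a task resumes once the user has provided input.
ALLOWED = {
    TaskStatus.PROPOSED: {TaskStatus.ACTIVE},
    TaskStatus.ACTIVE: {TaskStatus.COMPLETED, TaskStatus.PAUSED},
    TaskStatus.PAUSED: {TaskStatus.ACTIVE},
    TaskStatus.COMPLETED: set(),
}

def can_transition(current, target):
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED[current]

print(can_transition(TaskStatus.PROPOSED, TaskStatus.ACTIVE))
print(can_transition(TaskStatus.COMPLETED, TaskStatus.ACTIVE))
```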

API Endpoints

Endpoint	Method	Description
`/research/tasks`	GET	List research tasks
`/research/tasks`	POST	Create research task
`/research/tasks/{id}`	GET	Get task details
`/research/tasks/{id}/status`	GET	Get task status
`/research/tasks/{id}/results`	GET	Get research results
`/research/tasks/{id}/report`	GET	Get intelligence report
`/research/tasks/{id}/cancel`	POST	Cancel task

Research Task Status

GET /api/v1/research/tasks/res_123/status

Response:
{
  "task_id": "res_123",
  "status": "active",
  "progress": {
    "phase": "grounding",
    "percent_complete": 75,
    "sources_searched": 8,
    "documents_grounded": 3
  },
  "preliminary_findings": 5,
  "estimated_completion": "2026-01-18T10:35:00Z"
}
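
A client waiting on a task would typically poll this endpoint until the task stops running. The sketch below takes the HTTP call as an injected `fetch_status` function so it stays independent of any particular client library. The stop conditions (`completed` or `paused`) and the polling interval are assumptions.

```python
import time

def wait_for_task(fetch_status, task_id, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the task is completed or paused, then return the last status."""
    for _ in range(max_polls):
        status = fetch_status(task_id)
        # "completed" and "paused" are assumed status strings; a paused task
        # needs user input, so polling further would not help.
        if status["status"] in ("completed", "paused"):
            return status
        sleep(interval)
    raise TimeoutError(f"{task_id} did not finish after {max_polls} polls")

# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"task_id": "res_123", "status": "active", "progress": {"percent_complete": 75}},
    {"task_id": "res_123", "status": "completed", "progress": {"percent_complete": 100}},
])
final = wait_for_task(lambda task_id: next(canned), "res_123", sleep=lambda seconds: None)
print(final["status"])
```

Status checks cost no credits according to the tools table in the API Reference, so polling does not add to the cost of a task.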

Intelligence Reports

Each completed research task automatically generates an intelligence report in Markdown, ready to share or publish.

Report Modes

Standard Mode (Pro):

Concise research brief (800-1500 words)
Executive summary with key takeaway
Findings organized by confidence level
Tiered source list

Power Mode (Team+):

Comprehensive intelligence report (2000-3500 words)
Quick Facts metrics table
Table of Contents with linked sections
Detailed evidence assessment with source quality matrix
Contradictions & limitations analysis
Strategic implications with risk monitoring
Prioritized recommendations with success metrics

Report Structure (Power Mode)

# [Topic]: Intelligence Report

**Report Date:** January 19, 2026
**Classification:** Research Intelligence Report
**Confidence Level:** HIGH - Multiple Tier 1 sources verified

---

## Executive Summary

[4-5 sentence summary of key findings]

> **Bottom Line:** [One powerful sentence for executives]

---

## Quick Facts

| Metric | Value |
|--------|-------|
| Total Findings | 47 |
| Verified Findings | 12 |
| Source Quality | Primarily Tier 1 |
| Confidence Level | HIGH |

---

## Table of Contents
1. Key Findings
2. Detailed Analysis
3. Evidence Assessment
4. Contradictions & Limitations
5. Strategic Implications
6. Recommendations
7. Sources & Methodology

---

[Full sections follow...]

Accessing Reports

Via UI:

1. Navigate to the Research tab
2. Click the completed task
3. View the full report or download it as Markdown

Via API:

GET /api/v1/research/tasks/{id}/report

Response:
{
  "task_id": "res_123",
  "topic": "AI regulations in Canada",
  "report": "# AI Regulations in Canada: Intelligence Report\n\n..."
}
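
Because the `report` field holds the report as a Markdown string, saving it locally takes only a few lines. This sketch assumes you have already decoded the JSON response into a dictionary; the file naming scheme is an arbitrary choice.

```python
import tempfile
from pathlib import Path

def save_report(response, directory="."):
    """Write response["report"] to <task_id>.md and return the file path."""
    path = Path(directory) / f"{response['task_id']}.md"
    path.write_text(response["report"], encoding="utf-8")
    return path

example = {
    "task_id": "res_123",
    "topic": "AI regulations in Canada",
    "report": "# AI Regulations in Canada: Intelligence Report\n\n...",
}

# Write to a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report(example, tmp)
    saved_name = saved.name
    first_line = saved.read_text(encoding="utf-8").splitlines()[0]

print(saved_name, first_line)
```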

Report Features

Feature	Standard	Power Mode
Executive Summary	✓	✓
Bottom Line Callout	✓	✓
Quick Facts Table	-	✓
Table of Contents	-	✓
Evidence Assessment	-	✓
Source Quality Matrix	-	✓
Risk Monitoring Table	-	✓
Success Metrics	-	✓

Source of Truth Collections

Marking Collections as Source of Truth

PUT /api/v1/collections/{id}
{
  "is_source_of_truth": true,
  "sot_priority": 1,
  "sot_categories": ["pricing", "contracts", "policies"]
}

Source of Truth Hierarchy

Priority 1 - Highest authority (e.g., signed contracts)
Priority 2 - Official documents (e.g., policies)
Priority 3 - Reference materials (e.g., guidelines)

When grounding, higher-priority sources (lower priority numbers) take precedence.
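
The precedence rule can be illustrated with a short sketch. The match records below are hypothetical: `Pricing-Guidelines.pdf`, the `value` field, and the priorities assigned to each document are made up for the example, and the actual grounding logic may weigh other signals as well.

```python
def authoritative_match(matches):
    """Pick the match from the highest-authority document (lowest sot_priority)."""
    return min(matches, key=lambda match: match["sot_priority"])

# Hypothetical grounding matches that disagree on a per-user price.
matches = [
    {"document": "Pricing-Guidelines.pdf", "sot_priority": 3, "value": "$48/user/month"},
    {"document": "Contract-2024.pdf", "sot_priority": 1, "value": "$45/user/month"},
]

winner = authoritative_match(matches)
print(winner["document"], winner["value"])
```

Here the signed contract (Priority 1) overrides the reference material (Priority 3).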

Epistemic Metadata

Each research finding carries a confidence score, the factors behind that score, its provenance, and any known limitations:

{
  "finding": "Cloud storage market growing 25% YoY",
  "epistemic_metadata": {
    "confidence": 0.85,
    "confidence_factors": {
      "source_reliability": 0.9,
      "recency": 0.8,
      "corroboration": 0.85
    },
    "provenance": {
      "primary_source": "Gartner Research Report 2026",
      "secondary_sources": ["AWS Blog", "Azure Documentation"],
      "grounded_in_sot": true,
      "sot_documents": ["Market-Analysis-2025.pdf"]
    },
    "limitations": [
      "Data from Q3 2025, may not reflect recent changes"
    ]
  }
}
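
This guide does not say how the overall confidence is derived from the individual factors. As one plausible reading only, an unweighted mean of the three factors reproduces the 0.85 in the example above. The sketch below shows that calculation along with a simple review filter whose 0.8 threshold is an assumption.

```python
from statistics import mean

metadata = {
    "confidence": 0.85,
    "confidence_factors": {
        "source_reliability": 0.9,
        "recency": 0.8,
        "corroboration": 0.85,
    },
    "provenance": {"grounded_in_sot": True, "sot_documents": ["Market-Analysis-2025.pdf"]},
}

def factor_mean(meta):
    """Unweighted mean of the confidence factors (an assumed aggregation)."""
    return round(mean(list(meta["confidence_factors"].values())), 2)

def needs_review(meta, threshold=0.8):
    """Flag findings that are low-confidence or not grounded in SoT documents."""
    return meta["confidence"] < threshold or not meta["provenance"]["grounded_in_sot"]

print(factor_mean(metadata), needs_review(metadata))
```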

Use Cases

Competitive Intelligence

"Research our top 3 competitors' pricing and compare to our current rates"
→ Web search for competitor pricing
→ Ground against internal pricing documents
→ Generate comparison report

Due Diligence

"Research this company's financials and validate against our assessment"
→ Search financial news and reports
→ Ground against internal due diligence docs
→ Flag discrepancies for review

Policy Compliance

"Research latest GDPR requirements and compare to our current policy"
→ Search for GDPR updates
→ Ground against current privacy policy
→ Identify gaps and recommendations

Market Research

"Research current trends in document AI and summarize for leadership"
→ Multi-source web research
→ Synthesize findings
→ Generate executive summary

Credits and Limits

Operation	Credits	Notes
Basic web search	2	Single provider
Power Mode search	5	Multi-provider parallel
Ground finding	2	Per finding validated
Validate claim	2	Per claim checked
Full research task	5-15	Depends on scope
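
For budgeting, the per-operation costs in the table can be added up before running a set of operations. This is a sketch for individually invoked operations only. The table does not break down how a full research task's 5-15 credits are composed, so that case is not modelled, and the operation keys are names chosen for this example.

```python
# Per-operation credit costs taken from the table above.
CREDITS = {
    "basic_search": 2,
    "power_search": 5,
    "ground_finding": 2,
    "validate_claim": 2,
}

def estimate_credits(plan):
    """Sum credits for a plan mapping operation name -> number of calls."""
    return sum(CREDITS[operation] * count for operation, count in plan.items())

# One Power Mode search, three findings grounded, one claim validated.
plan = {"power_search": 1, "ground_finding": 3, "validate_claim": 1}
total = estimate_credits(plan)
print(total)  # 5 + 3*2 + 2 = 13
```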

Tier Limits

Tier	Research Tasks/Day	Power Mode
Pro	20	Available
Team	100	Available
Enterprise	Unlimited	Available

Best Practices

Research Quality

Be Specific - Narrow queries yield better results
Use Source of Truth - Ground findings in verified docs
Check Confidence - Review epistemic metadata
Verify Discrepancies - Don’t auto-trust external sources

Cost Optimization

Start Narrow - Expand scope if needed
Use Basic Search First - Upgrade to Power Mode if insufficient
Batch Related Queries - Combine related research

Source of Truth Management

Curate Carefully - Only mark verified documents as SoT
Set Priorities - Establish clear authority hierarchy
Keep Current - Update SoT documents regularly
Categorize - Enable targeted grounding

API Reference

Research Tools in Archie

Tool	Description	Credits
`propose_research`	Start research task	2
`search_web`	Web search	2-5
`get_research_status`	Check progress	0
`ground_finding`	Validate vs SoT	2
`validate_claim`	Fact-check claim	2

Research Task Endpoints

Endpoint	Method	Description
`/research/tasks`	GET	List tasks
`/research/tasks`	POST	Create task
`/research/tasks/{id}`	GET	Task details
`/research/tasks/{id}/results`	GET	Task results

Ready to research? Ask Archie: “Research current market trends in [your topic]”