Proactive Research Engine

Autonomous Research with Source of Truth Grounding (Pro+)


Overview

The Proactive Research Engine extends Archie’s capabilities beyond your document library. Archie can research the web, validate claims against external sources, and ground findings in your verified Source of Truth (SoT) documents.


Core Capabilities

Web Research

Search and synthesize information from the web:

User: "Research current market rates for cloud storage services"

Archie:
1. Searching Perplexity Sonar for "cloud storage pricing 2026"
2. Found 12 relevant sources
3. Synthesizing findings...

Results:
- AWS S3: $0.023/GB/month (Standard)
- Azure Blob: $0.0184/GB/month (Hot tier)
- Google Cloud: $0.020/GB/month (Standard)
- Wasabi: $6.99/TB/month (no egress fees)

Sources:
- aws.amazon.com/s3/pricing
- azure.microsoft.com/pricing/details/storage
- cloud.google.com/storage/pricing

Power Mode (Pro+)

Multi-provider parallel search for comprehensive research:

┌─────────────────────────────────────────────────────────────┐
│                     Research Query                           │
│       "Compare enterprise document management solutions"     │
└──────────────────────────┬──────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
        │ Perplexity│ │  Brave  │ │   Bing    │
        │   Sonar   │ │ Search  │ │   News    │
        └─────┬─────┘ └────┬────┘ └─────┬─────┘
              │            │            │
        ┌─────▼────────────▼────────────▼─────┐
        │         Result Aggregation           │
        │   Deduplication │ Ranking │ Synthesis│
        └─────────────────┬───────────────────┘
                          │
                    ┌─────▼─────┐
                    │ Grounding │
                    │  Against  │
                    │   SoT     │
                    └───────────┘

Power Mode Providers:

  • Perplexity Sonar - AI-powered search synthesis
  • Brave Search - Privacy-focused web search
  • Bing News - News and current events
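
The fan-out and aggregation flow in the diagram can be sketched roughly as follows. This is an illustration only, not Archie's implementation: the three provider functions are hypothetical stubs that return canned results, and deduplicating by URL then ranking by score is an assumed strategy.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical provider stubs; real providers would call external search APIs.
def perplexity_sonar(query):
    return [{"url": "https://example.com/a", "score": 0.9}]

def brave_search(query):
    return [{"url": "https://example.com/a", "score": 0.7},
            {"url": "https://example.com/b", "score": 0.8}]

def bing_news(query):
    return [{"url": "https://example.com/c", "score": 0.6}]

PROVIDERS = [perplexity_sonar, brave_search, bing_news]

def power_mode_search(query):
    # Query every provider in parallel.
    with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
        batches = list(pool.map(lambda provider: provider(query), PROVIDERS))
    # Deduplicate by URL, keeping the highest-scoring copy of each result.
    best = {}
    for batch in batches:
        for hit in batch:
            current = best.get(hit["url"])
            if current is None or hit["score"] > current["score"]:
                best[hit["url"]] = hit
    # Rank by score, best first.
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)

results = power_mode_search("Compare enterprise document management solutions")
```

Running the providers in a thread pool means the total wait is roughly that of the slowest provider rather than the sum of all three.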

Source of Truth Grounding

Validate research findings against your verified documents:

User: "Research competitor pricing and compare to our current contracts"

Archie:
1. Researching competitor pricing...
2. Found external data on 5 competitors
3. Grounding against Source of Truth collection...

Findings:
- Market average: $50/user/month
- Our current contract (Contract-2024.pdf): $45/user/month
- Our contract rate is 10% below the market average

⚠️ Discrepancy found:
- External source claims competitor X offers $35/user
- Source of Truth (Vendor-Analysis.pdf) shows $42/user
- Recommend verification before decisions

Confidence: High (0.89) - Grounded in 3 Source of Truth documents
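
The arithmetic behind the example above can be reproduced in a few lines. This is a sketch under assumptions: the function names and the 5% tolerance are hypothetical, and the rule Archie actually uses to flag a discrepancy is not documented here.

```python
def percent_below(reference, value):
    """Percentage by which value sits below reference."""
    return (reference - value) * 100 / reference

def is_discrepancy(external, sot, tolerance=0.05):
    # Hypothetical rule: flag when the external figure differs from the
    # Source of Truth figure by more than `tolerance`, relative to SoT.
    return abs(external - sot) / sot > tolerance

print(percent_below(50, 45))   # 10.0 (contract rate vs. market average)
print(is_discrepancy(35, 42))  # True (external $35 vs. SoT $42)
```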

Research Tools

propose_research

Initiate a research task:

POST /api/v1/archie/chat
{
  "message": "Research current trends in AI document processing",
  "tools": ["propose_research"]
}

Response:
{
  "research_id": "res_123",
  "status": "proposed",
  "scope": {
    "query": "AI document processing trends 2026",
    "sources": ["web", "news"],
    "depth": "comprehensive"
  },
  "estimated_credits": 5
}
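
A request like the one above could be built from Python as sketched below. The host name, the token, and the bearer-token header are assumptions made for illustration; only the path and the JSON body come from the example.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; use your deployment's URL
API_TOKEN = "YOUR_TOKEN"              # hypothetical; the auth scheme is an assumption

def build_propose_research_request(message):
    # Body mirrors the documented example: a message plus the tool to enable.
    payload = {"message": message, "tools": ["propose_research"]}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/archie/chat",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )

req = build_propose_research_request(
    "Research current trends in AI document processing"
)
# To send it: with urllib.request.urlopen(req) as resp: proposal = json.load(resp)
```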

search_web

Direct web search:

{
  "tool": "search_web",
  "parameters": {
    "query": "enterprise OCR accuracy benchmarks",
    "sources": ["perplexity"],
    "max_results": 10
  }
}

ground_finding

Validate against Source of Truth:

{
  "tool": "ground_finding",
  "parameters": {
    "finding": "Market leader offers 99.5% OCR accuracy",
    "source_of_truth_collection": "verified_benchmarks"
  }
}

Response:
{
  "grounding_status": "partial_match",
  "confidence": 0.75,
  "supporting_documents": [
    {
      "document": "OCR-Benchmarks-2025.pdf",
      "relevant_excerpt": "Leading providers achieve 98-99% accuracy...",
      "match_score": 0.82
    }
  ],
  "discrepancies": [
    "External claim: 99.5% vs Document claim: 99% maximum"
  ]
}
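
One way to act on a ground_finding response is to route anything with discrepancies or low confidence to a human reviewer. The sketch below assumes a cutoff of 0.8, which is a hypothetical value rather than a documented default; the field names are taken from the response above.

```python
def review_grounding(response, min_confidence=0.8):
    """Summarize a ground_finding response for a human reviewer."""
    docs = response.get("supporting_documents", [])
    # Strongest supporting document by match_score, if any were returned.
    best = max(docs, key=lambda d: d["match_score"], default=None)
    return {
        # min_confidence is a hypothetical cutoff, not a documented default.
        "needs_review": bool(response.get("discrepancies"))
        or response["confidence"] < min_confidence,
        "best_document": best["document"] if best else None,
    }

response = {
    "grounding_status": "partial_match",
    "confidence": 0.75,
    "supporting_documents": [
        {"document": "OCR-Benchmarks-2025.pdf", "match_score": 0.82}
    ],
    "discrepancies": ["External claim: 99.5% vs Document claim: 99% maximum"],
}
verdict = review_grounding(response)
```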

validate_claim

Fact-check external claims:

{
  "tool": "validate_claim",
  "parameters": {
    "claim": "Company X processes 1 million documents per day",
    "validation_sources": ["web", "source_of_truth"]
  }
}

Response:
{
  "claim_status": "unverified",
  "web_evidence": {
    "supporting": 2,
    "contradicting": 1,
    "neutral": 3
  },
  "source_of_truth_evidence": {
    "found": false,
    "message": "No relevant documents in Source of Truth"
  },
  "recommendation": "Verify directly with Company X"
}

Research Task Management

Task Lifecycle

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Proposed   │────▶│   Active    │────▶│  Completed  │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────▼──────┐
                    │   Paused    │
                    │ (User input │
                    │   needed)   │
                    └─────────────┘
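
The lifecycle can be modeled as a small state machine, which is handy when validating status changes on the client side. This is a sketch: the "paused" and "completed" status strings are assumed spellings, the Paused to Active transition (resuming after user input) is an assumption the diagram does not show, and cancellation is left out.

```python
from enum import Enum

class TaskStatus(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    PAUSED = "paused"        # assumed spelling
    COMPLETED = "completed"  # assumed spelling

# Transitions read off the diagram. PAUSED -> ACTIVE is an assumption;
# the diagram only draws ACTIVE -> PAUSED.
ALLOWED = {
    TaskStatus.PROPOSED: {TaskStatus.ACTIVE},
    TaskStatus.ACTIVE: {TaskStatus.PAUSED, TaskStatus.COMPLETED},
    TaskStatus.PAUSED: {TaskStatus.ACTIVE},
    TaskStatus.COMPLETED: set(),
}

def can_transition(current, target):
    """Return True if the lifecycle permits moving from current to target."""
    return target in ALLOWED[current]
```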

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /research/tasks | GET | List research tasks |
| /research/tasks | POST | Create research task |
| /research/tasks/{id} | GET | Get task details |
| /research/tasks/{id}/status | GET | Get task status |
| /research/tasks/{id}/results | GET | Get research results |
| /research/tasks/{id}/cancel | POST | Cancel task |

Research Task Status

GET /api/v1/research/tasks/res_123/status

Response:
{
  "task_id": "res_123",
  "status": "active",
  "progress": {
    "phase": "grounding",
    "percent_complete": 75,
    "sources_searched": 8,
    "documents_grounded": 3
  },
  "preliminary_findings": 5,
  "estimated_completion": "2026-01-18T10:35:00Z"
}
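
A client will typically poll the status endpoint until the task stops running. The helper below is a minimal sketch: `fetch_status` is a caller-supplied function (hypothetical) that performs the GET and returns the parsed JSON, and treating any status other than "proposed" or "active" as final is an assumption.

```python
import time

def wait_for_task(fetch_status, task_id, interval=5.0, max_polls=60,
                  sleep=time.sleep):
    """Poll until the task leaves the 'proposed'/'active' states."""
    for _ in range(max_polls):
        status = fetch_status(task_id)
        if status["status"] not in ("proposed", "active"):
            return status
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"{task_id} still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.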

Intelligence Reports

Each completed research task automatically generates an intelligence report, formatted so it can be shared without further editing.

Report Modes

Standard Mode (Pro):

  • Concise research brief (800-1500 words)
  • Executive summary with key takeaway
  • Findings organized by confidence level
  • Tiered source list

Power Mode (Team+):

  • Comprehensive intelligence report (2000-3500 words)
  • Quick Facts metrics table
  • Table of Contents with linked sections
  • Detailed evidence assessment with source quality matrix
  • Contradictions & limitations analysis
  • Strategic implications with risk monitoring
  • Prioritized recommendations with success metrics

Report Structure (Power Mode)

# [Topic]: Intelligence Report

**Report Date:** January 19, 2026
**Classification:** Research Intelligence Report
**Confidence Level:** HIGH - Multiple Tier 1 sources verified

---

## Executive Summary

[4-5 sentence summary of key findings]

> **Bottom Line:** [One powerful sentence for executives]

---

## Quick Facts

| Metric | Value |
|--------|-------|
| Total Findings | 47 |
| Verified Findings | 12 |
| Source Quality | Primarily Tier 1 |
| Confidence Level | HIGH |

---

## Table of Contents
1. Key Findings
2. Detailed Analysis
3. Evidence Assessment
4. Contradictions & Limitations
5. Strategic Implications
6. Recommendations
7. Sources & Methodology

---

[Full sections follow...]

Accessing Reports

Via UI:

  1. Navigate to the Research tab
  2. Click a completed task
  3. View the full report or download it as Markdown

Via API:

GET /api/v1/research/tasks/{id}/report

Response:
{
  "task_id": "res_123",
  "topic": "AI regulations in Canada",
  "report": "# AI Regulations in Canada: Intelligence Report\n\n..."
}

Report Features

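| Feature | Standard | Power Mode |
|---------|----------|------------|
| Executive Summary | ✓ | ✓ |
| Bottom Line Callout | ✓ | ✓ |
| Quick Facts Table | - | ✓ |
| Table of Contents | - | ✓ |
| Evidence Assessment | - | ✓ |
| Source Quality Matrix | - | ✓ |
| Risk Monitoring Table | - | ✓ |
| Success Metrics | - | ✓ |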

Source of Truth Collections

Marking Collections as Source of Truth

PUT /api/v1/collections/{id}
{
  "is_source_of_truth": true,
  "sot_priority": 1,
  "sot_categories": ["pricing", "contracts", "policies"]
}

Source of Truth Hierarchy

  1. Priority 1 - Highest authority (e.g., signed contracts)
  2. Priority 2 - Official documents (e.g., policies)
  3. Priority 3 - Reference materials (e.g., guidelines)

When grounding, higher priority sources take precedence.
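
The precedence rule can be illustrated with a short sketch. It assumes each grounding match carries the `sot_priority` of its collection, which is a hypothetical data shape; `Pricing-Guidelines.pdf` and the values shown are invented for the example.

```python
def resolve_conflict(matches):
    """Pick the match from the highest-authority collection.

    Priority 1 is the highest authority, so the lowest number wins.
    """
    return min(matches, key=lambda m: m["sot_priority"])

matches = [
    {"document": "Pricing-Guidelines.pdf", "sot_priority": 3, "value": "$48/user"},
    {"document": "Contract-2024.pdf", "sot_priority": 1, "value": "$45/user"},
]
winner = resolve_conflict(matches)
print(winner["document"])  # Contract-2024.pdf
```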


Epistemic Metadata

Research results include confidence and provenance tracking:

{
  "finding": "Cloud storage market growing 25% YoY",
  "epistemic_metadata": {
    "confidence": 0.85,
    "confidence_factors": {
      "source_reliability": 0.9,
      "recency": 0.8,
      "corroboration": 0.85
    },
    "provenance": {
      "primary_source": "Gartner Research Report 2026",
      "secondary_sources": ["AWS Blog", "Azure Documentation"],
      "grounded_in_sot": true,
      "sot_documents": ["Market-Analysis-2025.pdf"]
    },
    "limitations": [
      "Data from Q3 2025, may not reflect recent changes"
    ]
  }
}
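
In the sample above, the overall confidence happens to equal the plain average of the three confidence factors. Whether Archie actually computes it that way is not stated, so treat the following as a consistency check on the sample rather than the scoring formula.

```python
from statistics import fmean

factors = {"source_reliability": 0.9, "recency": 0.8, "corroboration": 0.85}

# Unweighted mean of the factors, rounded to two decimals (an assumed formula).
overall = round(fmean(factors.values()), 2)
print(overall)  # 0.85
```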

Use Cases

Competitive Intelligence

"Research our top 3 competitors' pricing and compare to our current rates"
→ Web search for competitor pricing
→ Ground against internal pricing documents
→ Generate comparison report

Due Diligence

"Research this company's financials and validate against our assessment"
→ Search financial news and reports
→ Ground against internal due diligence docs
→ Flag discrepancies for review

Policy Compliance

"Research latest GDPR requirements and compare to our current policy"
→ Search for GDPR updates
→ Ground against current privacy policy
→ Identify gaps and recommendations

Market Research

"Research current trends in document AI and summarize for leadership"
→ Multi-source web research
→ Synthesize findings
→ Generate executive summary

Credits and Limits

| Operation | Credits | Notes |
|-----------|---------|-------|
| Basic web search | 2 | Single provider |
| Power Mode search | 5 | Multi-provider parallel |
| Ground finding | 2 | Per finding validated |
| Validate claim | 2 | Per claim checked |
| Full research task | 5-15 | Depends on scope |
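
The per-operation costs above make it straightforward to estimate a budget before running individual tools. The sketch below sums per-operation credits; whether a full research task is billed as the sum of its operations or as a flat 5-15 credits is not specified, so the helper covers only the individually priced operations. The dictionary keys are hypothetical names.

```python
# Per-operation credit costs from the table above (keys are hypothetical names).
CREDITS = {
    "basic_search": 2,
    "power_mode_search": 5,
    "ground_finding": 2,
    "validate_claim": 2,
}

def estimate_credits(operations):
    """operations maps an operation name to how many times it will run."""
    return sum(CREDITS[name] * count for name, count in operations.items())

# One Power Mode search, three findings grounded, one claim validated.
total = estimate_credits(
    {"power_mode_search": 1, "ground_finding": 3, "validate_claim": 1}
)
print(total)  # 13
```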

Tier Limits

| Tier | Research Tasks/Day | Power Mode |
|------|--------------------|------------|
| Pro | 20 | Available |
| Team | 100 | Available |
| Enterprise | Unlimited | Available |

Best Practices

Research Quality

  1. Be Specific - Narrow queries yield better results
  2. Use Source of Truth - Ground findings in verified docs
  3. Check Confidence - Review epistemic metadata
  4. Verify Discrepancies - Don’t auto-trust external sources

Cost Optimization

  1. Start Narrow - Expand scope if needed
  2. Use Basic Search First - Upgrade to Power Mode if insufficient
  3. Batch Related Queries - Combine related research

Source of Truth Management

  1. Curate Carefully - Only mark verified documents as SoT
  2. Set Priorities - Establish clear authority hierarchy
  3. Keep Current - Update SoT documents regularly
  4. Categorize - Enable targeted grounding

API Reference

Research Tools in Archie

| Tool | Description | Credits |
|------|-------------|---------|
| propose_research | Start research task | 2 |
| search_web | Web search | 2-5 |
| get_research_status | Check progress | 0 |
| ground_finding | Validate vs SoT | 2 |
| validate_claim | Fact-check claim | 2 |

Research Task Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /research/tasks | GET | List tasks |
| /research/tasks | POST | Create task |
| /research/tasks/{id} | GET | Task details |
| /research/tasks/{id}/results | GET | Task results |


Ready to research? Ask Archie: “Research current market trends in [your topic]”