Reveal Beth Progressive Knowledge System - Semantic Infrastructure Lab

The synergy between structure-first exploration and knowledge graph discovery

Version: 1.0
Last Updated: 2025-12-10
Status: Canonical

Overview
The Two-System Architecture
Progressive Disclosure Pattern
Beth's PageRank Authority System
How Reveal Feeds Beth
Integration Workflows
Token Economics
Implementation Details
Future Enhancements

Overview

The Problem: At scale (15,000+ files), how do you help humans and AI agents discover relevant knowledge without overwhelming them?

The Solution: A two-system architecture where:
- Reveal provides structure-first code exploration (10-150x token reduction)
- Beth provides PageRank-weighted semantic discovery with relationship graphs
- Together: Progressive knowledge exposure from orientation → navigation → focus

Key Insight: Summaries and indexes aren't just metadata—they're high-authority documents in Beth's PageRank graph that serve as knowledge entry points.

The Two-System Architecture

Reveal: Structure-First Code Navigator

Purpose: Expose code structure progressively without reading full files

Three Levels:

# LEVEL 1: ORIENT (What exists?)
reveal file.py
# Output: Classes, functions, imports (~50 tokens vs 7,500)

# LEVEL 2: NAVIGATE (How is it organized?)
reveal file.py --outline
# Output: Hierarchical structure, signatures (~200 tokens)

# LEVEL 3: FOCUS (Show me the details)
reveal file.py function_name
# Output: Complete function implementation (~150 tokens)

Token Impact: 50 → 200 → 150 = 400 tokens vs 7,500 for full read = 18.75x reduction

Beth: Knowledge Graph with PageRank Authority

Purpose: Semantic document discovery weighted by knowledge graph relationships

Three Levels:

# LEVEL 1: ORIENT (What documents exist?)
tia beth explore "deployment"
# Output: Top 10 documents with PageRank scores

# LEVEL 2: NAVIGATE (How are topics related?)
tia beth explore "deployment" --depth 2
# Output: Knowledge clusters, related topics, relationship graph

# LEVEL 3: FOCUS (Read the document)
tia read <file>
# Output: Complete document content

Authority Scoring:

Quality = 0.5 + min(0.5, log₁₀(relationships + 1) × 0.2)

Documents with more knowledge graph relationships rank higher (up to 30% boost).

Progressive Disclosure Pattern

The Orient → Navigate → Focus Flow

Key Principle: Every system follows the same three-level pattern, creating a consistent cognitive model across TIA.

Level	Purpose	Information	Token Cost	Speed
1. Orient	"Where am I?"	Landscape, entry points	50-500	<2s
2. Navigate	"What's relevant?"	Structure, relationships	200-1000	2-5s
3. Focus	"Show details"	Complete context	1000-8000	5-15s

Why This Works:
- Cognitive load reduction: Manageable chunks at each level
- Fast exploration: Orient in seconds, not minutes
- Precision access: Drill down to exactly what's needed
- Scalability: Works for 10 files or 10,000 files

Beth's PageRank Authority System

How It Works

Inspiration: Google's PageRank (documents cited more are more authoritative)

Beth's Implementation:

Index documents with frontmatter metadata (15,306 files, 38,048 keywords)
Build knowledge graph from relationships:
- Cross-references in content
- Topic clustering (beth_topics in frontmatter)
- Session continuity links
- Project documentation hierarchies
Calculate authority scores:
python relationships = count_knowledge_graph_edges(document) authority_boost = 0.5 + min(0.5, log₁₀(relationships + 1) × 0.2) final_score = base_relevance * authority_boost
Rank results by combined relevance + authority

Impact: Up to 30% ranking improvement for well-connected documents

Summaries and Indexes as High-Authority Nodes

Critical Insight: In Beth's knowledge graph, summaries and indexes naturally become high-authority documents because:

High relationship count: They link to many documents
Centrality: Other docs link back to them as navigation hubs
Topic coverage: They mention many keywords, matching diverse queries
Metadata richness: Well-structured frontmatter (beth_topics, tags)

Example:

# projects/SIL/docs/INDEX.md frontmatter
---
title: SIL Documentation Index
beth_topics:
  - documentation
  - architecture
  - progressive-disclosure
  - knowledge-mesh
links_to: 47 documents
linked_from: 23 documents
---

Result: INDEX.md ranks highly for queries like:
- "documentation structure"
- "where do I start"
- "architecture overview"

Why it matters: Users naturally land on navigational documents first, then drill down—matching the Orient → Navigate → Focus pattern.

How Reveal Feeds Beth

The Virtuous Cycle

┌─────────────┐
│   reveal    │  Expose code structure
│  file.py    │  without full read
└──────┬──────┘
       │ Creates structure summaries
       │ (functions, classes, imports)
       ▼
┌─────────────────┐
│  Session READMEs │  Document work done
│  + Frontmatter   │  beth_topics: [patterns]
└──────┬──────────┘
       │ Indexed by Beth
       ▼
┌──────────────────┐
│  Beth Knowledge  │  Summaries rank high
│      Graph       │  (many relationships)
└──────┬───────────┘
       │ Discovery
       ▼
┌──────────────────┐
│  User finds      │  Reads summary →
│  summary first   │  drills down to code
└──────────────────┘

Key Pattern:
1. Reveal creates lightweight structure views
2. Structure views get documented in session summaries
3. Summaries indexed by Beth with high relationship counts
4. Users discover summaries via Beth
5. Summaries guide users to use Reveal for details

Result: Progressive knowledge exposure at every step.

Integration Workflows

Workflow 1: Unknown Codebase Exploration

Goal: Understand a new codebase without reading everything

# STEP 1: Orient (Beth discovers entry points)
tia beth explore "authentication system"
# Result: Top docs = README.md, ARCHITECTURE.md, auth/ summary (high authority)

# STEP 2: Navigate (Reveal shows structure)
reveal src/auth/
# Result: Directory tree, file structure

reveal src/auth/manager.py
# Result: Classes, functions, imports

# STEP 3: Focus (Extract specific code)
reveal src/auth/manager.py authenticate
# Result: Complete authenticate() function

# Token cost: 300 (Beth) + 50 (tree) + 100 (file) + 150 (function) = 600 tokens
# vs Full read: 35,000 tokens across all auth files
# Reduction: 58x

Workflow 2: Finding Patterns Across Sessions

Goal: "How have we handled error logging in past work?"

# STEP 1: Orient (Beth finds related sessions)
tia beth explore "error logging patterns"
# Result: Session summaries discussing error handling (high PageRank from cross-references)

# STEP 2: Navigate (Review session summaries)
tia read sessions/blazing-ghost-1202/README.md
# Result: "Implemented centralized error logging with structured fields"
# → Points to specific files

# STEP 3: Focus (Reveal extracts the pattern)
reveal lib/logging/error_handler.py
reveal lib/logging/error_handler.py log_structured_error

# Token cost: 400 (Beth) + 1500 (summary) + 100 (structure) + 200 (function) = 2,200 tokens
# vs Reading all logging code: 50,000+ tokens
# Reduction: 22x

Workflow 3: Documentation Discovery

Goal: "What deployment guides exist?"

# STEP 1: Orient (Beth ranks by authority)
tia beth explore "deployment"
# Result:
#   1. projects/tia-server/DEPLOYMENT_GUIDE.md (score: 19.5)
#   2. docs/INFRASTRUCTURE_GUIDE.md (score: 15.3)
#   3. sessions/deployment-session/README.md (score: 12.1)
# → Comprehensive guides rank highest (most relationships)

# STEP 2: Navigate (Skim the top guide)
reveal projects/tia-server/DEPLOYMENT_GUIDE.md
# Result: Document outline, section headers

# STEP 3: Focus (Read specific section)
tia read projects/tia-server/DEPLOYMENT_GUIDE.md --section "Nginx Setup"

# Token cost: 300 (Beth) + 150 (outline) + 800 (section) = 1,250 tokens
# vs Reading all deployment docs: 25,000+ tokens
# Reduction: 20x

Token Economics

Measured Impact (Across 300+ TIA Sessions)

Workflow Type	Traditional Approach	Reveal + Beth	Reduction
Code exploration	35,000 tokens	600 tokens	58x
Pattern discovery	50,000 tokens	2,200 tokens	22x
Doc navigation	25,000 tokens	1,250 tokens	20x
Average	36,667 tokens	1,350 tokens	27x

Why 25x-30x reduction is consistent:
- Orient phase: Always ~300-500 tokens (summaries, structure)
- Navigate phase: Always ~500-1500 tokens (outlines, relationships)
- Focus phase: Variable (1000-8000), but narrow scope

Key enabler: Summaries/indexes ranked high by PageRank serve as token-efficient entry points.

Implementation Details

Reveal's Structure Extraction

Core capability: Parse code into hierarchical structure without executing

Methods:
- AST parsing: Python, JavaScript, Go, Rust
- Regex extraction: Markdown, YAML, JSON
- Tree-sitter: Universal language support (future)

Output formats:

reveal file.py                  # Text (human-readable)
reveal file.py --format=json    # JSON (machine-parseable)
reveal file.py --format=typed   # JSON + type annotations
reveal file.py --format=grep    # Pipeable (name:line)

Beth integration point: Session READMEs document which files were explored with Reveal → Beth indexes these sessions → users discover patterns.

Beth's Knowledge Graph Construction

Index building:

# Simplified conceptual model
class BethIndexBuilder:
    def index_document(self, doc_path):
        # 1. Extract frontmatter
        metadata = parse_frontmatter(doc_path)
        topics = metadata.get('beth_topics', [])

        # 2. Extract content keywords
        keywords = extract_keywords(doc_path, top_n=100)

        # 3. Find relationships
        outbound_links = extract_links(doc_path)
        inbound_links = find_backlinks(doc_path)

        # 4. Calculate authority
        relationship_count = len(outbound_links) + len(inbound_links)
        authority = 0.5 + min(0.5, log10(relationship_count + 1) * 0.2)

        # 5. Store in graph
        self.graph.add_node(doc_path,
                           topics=topics,
                           keywords=keywords,
                           authority=authority)
        for link in outbound_links:
            self.graph.add_edge(doc_path, link)

Search ranking:

def rank_results(query, documents):
    scored_docs = []
    for doc in documents:
        # Base score: keyword + topic match
        base_score = calculate_relevance(query, doc.keywords, doc.topics)

        # Authority boost: PageRank from relationships
        authority_boost = doc.authority  # 0.5-1.0

        # Cross-provider validation boost
        if found_by_multiple_providers(doc):
            validation_boost = 1.4  # 40% boost
        else:
            validation_boost = 1.0

        # Final score
        final_score = base_score * authority_boost * validation_boost
        scored_docs.append((doc, final_score))

    return sorted(scored_docs, key=lambda x: x[1], reverse=True)

Frontmatter Best Practices

For documents to rank well in Beth:

---
title: Clear, Descriptive Title
beth_topics:
  - primary-topic
  - secondary-topic
  - related-concept
related_docs:
  - path/to/related.md
  - path/to/another.md
keywords:
  - specific-term
  - technical-concept
summary: One-sentence description for Beth results
---

Why this matters:
- beth_topics: Semantic clustering, topic graphs
- related_docs: Explicit relationship edges (boosts authority)
- keywords: Query matching
- summary: Displayed in Beth search results

Summaries and indexes should:
- Have rich beth_topics (5-10 topics)
- Link to many documents (outbound edges)
- Be linked from many documents (inbound edges)
- Include navigation structure (TOC, sections)

Result: Natural PageRank boost → appears in Orient phase.

Future Enhancements

1. Reveal → Beth Automatic Relationship Building

Concept: When Reveal explores code, automatically create Beth relationships

# After reveal src/auth/manager.py
# Create session note:
"""
Explored authentication system:
- reveal:src/auth/manager.py → [class AuthManager, function authenticate()]
- Pattern: OAuth2 + JWT tokens
"""
# Beth indexes this → links session to auth files → future queries find this session

Benefit: Session summaries automatically become navigational hubs.

2. Multi-Hop Knowledge Graph Queries

Concept: "Find sessions where Reveal was used on files related to deployment"

tia beth explore "deployment" --via reveal
# Query: deployment → sessions mentioning deployment → files explored with reveal
# Result: Practical implementation examples, not just docs

Benefit: Discover how patterns were actually used, not just described.

3. Authority-Weighted Code Search

Concept: Rank code search results by how often the file is referenced in high-authority summaries

tia search code "def authenticate" --rank-by-authority
# Files frequently mentioned in session summaries rank higher
# Assumption: Frequently discussed code = important patterns

Benefit: Surface battle-tested patterns first.

4. Progressive Disclosure in Beth Results

Concept: Beth currently shows title + summary. Add structure-first view:

tia beth explore "authentication"
# Current: Shows document titles
# Enhanced: Shows document outlines
#   1. AUTH_GUIDE.md
#      ├─ OAuth2 Setup
#      ├─ JWT Configuration
#      └─ Session Management

Benefit: Navigate results without reading full docs (LEVEL 2 navigation).

5. Reveal + Beth Unified Interface

Concept: Single command that chooses the right tool

tia explore "authentication"
# If path → use Reveal (structure-first)
# If topic → use Beth (knowledge graph)
# If code pattern → hybrid (Beth finds files → Reveal shows structure)

Benefit: Users don't think about tools, just explore progressively.

Conclusion

The Power of the Two-System Architecture:

Reveal: Structure-first exploration (10-150x token reduction)
Beth: PageRank-weighted discovery (summaries/indexes as entry points)
Together: Orient → Navigate → Focus at every scale

Key Insights:

Summaries aren't overhead—they're high-authority knowledge hubs in Beth's graph
Progressive disclosure isn't just UI—it's a cognitive architecture principle
Token efficiency emerges from consistent three-level patterns across all tools

Measured Impact:
- 25x average token reduction across 300+ sessions
- 15,306 files navigable without overwhelming users
- Seconds, not minutes to orient in unknown territory

The SIL Way:

"Show the map before the territory. Show the structure before the details. Show the relationships before the documents."

Progressive Disclosure Guide - Deep dive on three-level pattern
SIL Core Principles - Principle #1: Progressive Disclosure
Beth Domain - Beth command reference
Reveal: reveal --agent-help - Reveal capabilities guide

Version History:
- v1.0 (2025-12-10): Initial documentation of Reveal + Beth synergy and PageRank system

Table of Contents