"Sharpen your chisel before working the wood. Monitor tool effectiveness before trusting results."
Rank: #10 - META-FEEDBACK PRINCIPLE
The Core Insight
Before using tools to do work, verify the tools themselves are working effectively. This is semantic system hygiene - analogous to "sharpen your chisel before woodworking" or "calibrate your instruments before measuring."
The Pattern:
Before using Beth → Check: Is Beth index healthy?
Before using reveal → Check: Does reveal work on target files?
Before using search → Check: Are search results relevant?
Before deploying agents → Check: Are their tools functioning?
Why This Matters:
- Bad tools produce bad work (garbage in → garbage out)
- Tool degradation is invisible without monitoring
- Early detection prevents cascading failures
- Feedback loops require working sensors
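A minimal sketch of this guard pattern in Python; the ToolGuard class and the registered checks are hypothetical illustrations of "verify before use", not part of the TIA codebase:

from typing import Callable, Dict

class ToolGuard:
    """Run a registered health check before handing a tool to a caller."""

    def __init__(self):
        self._checks: Dict[str, Callable[[], bool]] = {}

    def register(self, tool_name: str, health_check: Callable[[], bool]) -> None:
        self._checks[tool_name] = health_check

    def require(self, tool_name: str) -> None:
        check = self._checks.get(tool_name)
        if check is None or not check():
            raise RuntimeError(f"Tool '{tool_name}' failed its health check; fix it before use")

# Usage (hypothetical checks):
# guard = ToolGuard()
# guard.register("beth", lambda: beth_index_is_fresh())
# guard.require("beth")  # raises if the index is stale or missing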
The Problem: Invisible Tool Degradation
Scenario 1: Beth Index Corruption
# User: "Find deployment docs"
tia beth explore "deployment"
# Returns: 0 results
# Without monitoring, you assume:
❌ "No deployment docs exist" (wrong conclusion)
# With monitoring, you discover:
✅ "Beth index is stale/corrupted" (root cause)
Scenario 2: Search Indexing Lag
# User just created: docs/NEW_FEATURE.md
tia search all "NEW_FEATURE"
# Returns: 0 results
# Without monitoring:
❌ "File doesn't exist?" (confusion)
# With monitoring:
✅ "Search index hasn't rebuilt yet" (understanding)
Scenario 3: Reveal Version Mismatch
# CLAUDE.md has examples for reveal v0.15
# But system has reveal v0.9
# Without monitoring:
❌ Agent tries --check flag → command fails → confusion
# With monitoring:
✅ "reveal outdated, upgrade available" (actionable)
The Core Problem: Tool failures masquerade as "no information exists" rather than surfacing as "the tool is broken."
The Solution: Systematic Tool Monitoring
Level 1: Boot-Time Health Checks
Already Implemented in tia-boot:
## System Validation
✅ Tasks
✅ Search
✅ Domains
✅ AI
✅ Semantic
✅ Gemma
✅ Beth index healthy (14,459 files, 36,910 keywords)
✅ Beth
✅ Infrastructure
What This Catches:
- Beth index corruption
- Missing dependencies
- Service failures
- Configuration errors
Pattern: Every session starts with tool validation.
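A minimal sketch of how such a boot-time validation pass could be structured, assuming tools are exercised via the tia CLI in a subprocess; the check functions are illustrative, not the actual tia-boot implementation:

import subprocess

def check_beth_index() -> tuple[bool, str]:
    # Hypothetical check: verify the Beth index answers a trivial known query
    result = subprocess.run(["tia", "beth", "explore", "SIL"],
                            capture_output=True, text=True)
    ok = result.returncode == 0 and result.stdout.strip() != ""
    return ok, "Beth index healthy" if ok else "Beth returned no results"

def run_boot_checks() -> bool:
    checks = {
        "Beth": check_beth_index,
        # "Search": check_search_index,  # further hypothetical checks
    }
    all_ok = True
    for name, check in checks.items():
        ok, detail = check()
        print(f"{'✅' if ok else '❌'} {name}: {detail}")
        all_ok = all_ok and ok
    return all_ok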
Level 2: Pre-Task Tool Verification
Before relying on a tool, verify it works for your specific use case.
Example 1: Beth Effectiveness Check
# BEFORE doing research on "authentication patterns"
# First, verify Beth can find known-good docs:
tia beth explore "SIL core principles"
# Expected: Should return SIL_CORE_PRINCIPLES.md (this doc!)
# If returns 0 results → Beth broken, fix before continuing
# If returns expected docs → Beth working, proceed with confidence
Example 2: Reveal Version Check
# BEFORE relying on reveal features
reveal --version
# Shows: reveal 0.9.0
# Check against CLAUDE.md expectations
# CLAUDE.md expects: reveal v0.15+ (for --check flag)
# Decision:
# - Upgrade reveal, OR
# - Don't use --check flag (not available)
Example 3: Search Relevance Check
# BEFORE complex search task
# Test search quality with known query:
tia search all "tia-boot"
# Expected: Should find bin/tia-boot
# If no results → search index broken
# If wrong results → search needs tuning
# If correct results → proceed
The Pattern:
Known Query (Calibration) → Verify Expected Result → Proceed or Fix
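A minimal sketch of a single calibration check, assuming the tia CLI is invoked via subprocess and that the expected document name is known in advance; the helper name and exact output matching are assumptions:

import subprocess

def calibrate_beth(known_query: str = "SIL core principles",
                   expected_doc: str = "SIL_CORE_PRINCIPLES.md") -> bool:
    """Run a known-good query and confirm the expected document appears."""
    result = subprocess.run(["tia", "beth", "explore", known_query],
                            capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        print("❌ Beth calibration failed: no results - fix the index before continuing")
        return False
    if expected_doc not in result.stdout:
        print(f"⚠️ Beth responded but did not return {expected_doc} - results may be degraded")
        return False
    print("✅ Beth calibration passed, proceed with confidence")
    return True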
Level 3: Continuous Quality Monitoring
Track tool effectiveness over time.
Beth Health Metrics
# Regular health checks
tia beth health
# Reports:
# - Index size (files, keywords)
# - Last rebuild time
# - Coverage % (files indexed / files discovered)
# - Query success rate
# Example output:
Beth Health Report
==================
Index Size: 14,459 files, 36,910 keywords
Last Rebuild: 2 hours ago
Coverage: 98.7% (14,459 / 14,651 files)
Avg Query Time: 362ms
Success Rate: 87% (queries returning >0 results)
⚠️ Warning: 192 files not indexed (permission errors)
💡 Tip: Run `tia beth rebuild` to refresh
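A minimal sketch of how coverage and staleness could be derived from index statistics; the field names are assumptions and the real tia beth health command may compute these differently:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class BethIndexStats:
    files_indexed: int
    files_discovered: int
    last_rebuild: datetime

    @property
    def coverage(self) -> float:
        # Fraction of discovered files that made it into the index
        return self.files_indexed / max(self.files_discovered, 1)

    @property
    def stale(self) -> bool:
        return datetime.now() - self.last_rebuild > timedelta(hours=24)

def report(stats: BethIndexStats) -> None:
    print(f"Coverage: {stats.coverage:.1%} ({stats.files_indexed} / {stats.files_discovered} files)")
    if stats.stale:
        print("⚠️ Index >24h old - run `tia beth rebuild`")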
Search Quality Metrics
# Track search effectiveness
tia search metrics
# Reports:
# - Query patterns (most common searches)
# - Hit rate (% queries with results)
# - Result relevance (click-through on top results)
# - Index freshness (last update)
Search Metrics (Last 7 Days)
=============================
Total Queries: 342
Hit Rate: 94% (322/342 found results)
Avg Results: 8.2 per query
Index Freshness: 6 hours old
Top Queries:
1. "tia-boot" (45 queries, 100% hit rate)
2. "SIL" (38 queries, 97% hit rate)
3. "reveal features" (22 queries, 91% hit rate)
⚠️ Zero-result queries (20):
- "new_feature_xyz" (file not indexed yet)
- "deployment automation" (poor term matching)
Reveal Quality Checks
# Verify reveal works on representative files
reveal --check projects/scout/lib/core.py
# Reports:
# - Parse success/failure
# - Structure extraction quality
# - Performance (time to parse)
Reveal Quality Check: projects/scout/lib/core.py
=================================================
✅ Parse: Success
✅ Structure: 12 classes, 45 functions extracted
✅ Performance: 127ms
⚠️ Note: 2 complex decorators skipped (unsupported syntax)
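A minimal sketch of running reveal over representative files and recording success and latency, assuming reveal accepts a file path argument (as in the workflows below) and that a non-zero exit code indicates a parse failure:

import subprocess
import time
from pathlib import Path

def check_reveal(paths: list[Path]) -> None:
    """Run reveal on representative files and record success and latency."""
    for path in paths:
        start = time.monotonic()
        result = subprocess.run(["reveal", str(path)], capture_output=True, text=True)
        elapsed_ms = (time.monotonic() - start) * 1000
        status = "✅" if result.returncode == 0 else "❌"
        print(f"{status} {path}: {elapsed_ms:.0f}ms")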
Level 4: Automated Feedback Loops
Tools monitor themselves and auto-correct.
Auto-Rebuild Triggers
# Beth auto-rebuilds when staleness is detected
import logging
from datetime import timedelta

logger = logging.getLogger(__name__)

class BethMonitor:
    def check_health(self):
        # Rebuild when the index is more than a day old
        if self.index_age > timedelta(hours=24):
            logger.warning("Beth index >24h old, triggering rebuild")
            self.rebuild_index()
        # Rebuild when too much of the corpus is missing from the index
        if self.coverage < 0.95:
            logger.warning(f"Beth coverage {self.coverage:.1%}, rebuilding")
            self.rebuild_index()
Search Index Auto-Update
# Search watches the file system and auto-indexes new and modified files
from pathlib import Path

class SearchMonitor:
    # Uses the module-level logger configured in the BethMonitor example above
    def on_file_created(self, path: Path):
        logger.info(f"New file detected: {path}, indexing...")
        self.index_file(path)

    def on_file_modified(self, path: Path):
        logger.info(f"File modified: {path}, re-indexing...")
        self.reindex_file(path)
Tool Version Alerts
# During boot, check for outdated tools
tia-boot
# Output includes:
⚠️ Update available: reveal 0.16.0 (you have 0.9.0)
Update with: pip install --upgrade reveal-cli
⚠️ Update available: scout 2.1.0 (you have 1.8.0)
Update with: cd projects/scout && git pull
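A minimal sketch of the version comparison behind such an alert, assuming the tool is installed as the reveal-cli package (as in the upgrade hint above) and that a minimum expected version is pinned in configuration; both assumptions are illustrative:

from importlib.metadata import version, PackageNotFoundError

def check_min_version(package: str, minimum: str) -> bool:
    """Compare the installed version of a package against a pinned minimum."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"❌ {package} is not installed")
        return False

    def parse(v: str) -> tuple[int, ...]:
        # Naive numeric parse; pre-release suffixes are ignored in this sketch
        return tuple(int(part) for part in v.split(".") if part.isdigit())

    if parse(installed) < parse(minimum):
        print(f"⚠️ Update available: {package} {minimum}+ expected (you have {installed})")
        return False
    return True

# check_min_version("reveal-cli", "0.15.0")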
Real-World Workflows
Workflow 1: Research Task with Tool Verification
# Task: Research "authentication patterns" across codebase
# STEP 0: Verify tools BEFORE starting
tia-boot # Validates all tools
tia beth explore "SIL" # Calibration check (known-good query)
# Expected: Returns SIL docs
# ✅ Beth working
# STEP 1: Now proceed with confidence
tia beth explore "authentication patterns"
# Returns: 12 results
# STEP 2: If unexpected results
# Before assuming "no auth docs exist"
# Check: Is Beth index fresh?
tia beth health
# Shows: Last rebuild 3 days ago, coverage 87%
# → Stale index! Rebuild and retry
tia beth rebuild
tia beth explore "authentication patterns"
# Returns: 24 results (was missing 12 docs!)
Workflow 2: Code Exploration with Reveal Check
# Task: Understand structure of large Python project
# STEP 0: Verify reveal works
reveal --version
# v0.9.0
# Check: Does it work on a known file?
reveal bin/tia-boot
# ✅ Returns structure successfully
# STEP 1: Proceed to target
reveal projects/scout/lib/orchestrator.py --outline
# Returns clear hierarchy
# STEP 2: Extract specific function
reveal projects/scout/lib/orchestrator.py run_campaign
# ✅ Returns function implementation
Workflow 3: Deployment with Tool Checks
# Task: Deploy new SIL documentation to staging
# STEP 0: Verify deployment tools
tia secrets get github:gh_session # ✅ Auth works
gh auth status # ✅ GitHub CLI authenticated
tia git health # ✅ Git repo healthy
# STEP 1: Proceed with deployment
cd projects/SIL
tia git make-clean # Clean up repo
git push origin staging # Deploy
# STEP 2: Verify deployment
curl https://semanticinfrastructurelab.org/docs/ # ✅ Live
The Feedback Loop Structure
This is a meta-feedback loop - monitoring the monitors:
┌─────────────────────────────────────────────────┐
│ Primary Feedback Loop (Intent → Execution) │
│ │
│ User Intent → Tool Usage → Results → Learning │
│ ↑ │
│ │ │
│ │ Are tools working? │
│ │ │
└────────────────────┼─────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────┐
│ Meta-Feedback Loop (Tool Quality) │
│ │
│ Boot Checks → Health Monitoring → Auto-Repair │
│ ↓ ↓ ↓ │
│ ✅ Beth ⚠️ Coverage 🔧 Rebuild │
│ ✅ Search ⚠️ Staleness 🔧 Reindex │
│ ✅ Reveal ⚠️ Version 🔧 Upgrade │
└─────────────────────────────────────────────────┘
Connection to SEMANTIC_FEEDBACK_LOOPS.md:
- Primary loop: Measure intent-execution alignment
- Meta loop: Measure tool-effectiveness alignment
- Both required: Can't have good execution with broken tools
Connection to SEMANTIC_OBSERVABILITY.md:
- Observability instruments the primary loop (user satisfaction)
- Tool monitoring instruments the meta loop (tool health)
- Nested observability: Observe the observers
Application to Agent Systems
This is critical for autonomous agents - an agent cannot self-correct when its tools are broken.
Scout Agent Tool Checks
# Before Scout starts a research campaign
class ToolFailureError(Exception):
    pass

class ScoutPreflightCheck:
    def verify_tools(self):
        checks = [
            self.check_llm_api(),      # Can reach Groq/Anthropic?
            self.check_search(),       # Search index working?
            self.check_beth(),         # Beth healthy?
            self.check_file_access(),  # Can read/write files?
        ]
        if not all(checks):
            raise ToolFailureError("Preflight checks failed, aborting")
        logger.info("✅ All tools verified, proceeding with campaign")
Agent-Ether Tool Monitoring
# Agent-Ether monitors tool health during multi-agent orchestration
from pathlib import Path

class AgentEtherMonitor:
    def before_agent_spawn(self, agent_config):
        # Verify agent has working tools
        for tool in agent_config.required_tools:
            if not self.verify_tool(tool):
                logger.error(f"Tool {tool} not working, cannot spawn agent")
                return False
        return True

    def verify_tool(self, tool_name: str) -> bool:
        """Run calibration check on tool"""
        if tool_name == "beth":
            # Known-good query
            results = beth.search("SIL")
            return len(results) > 0
        elif tool_name == "reveal":
            # Can parse a simple file?
            test_file = Path("bin/tia-boot")
            return reveal.extract_structure(test_file) is not None
        # ... other tools
Measuring Tool Quality
Quantitative Metrics
Beth Health:
- Index coverage: >95% (files indexed / files discovered)
- Query success rate: >85% (queries with results)
- Index freshness: <24 hours old
- Avg query time: <500ms
Search Health:
- Hit rate: >90% (queries finding results)
- Index lag: <1 hour (time from file change to indexed)
- Result relevance: >80% (user clicks top 3 results)
Reveal Health:
- Parse success: >98% (files successfully parsed)
- Performance: <200ms for typical files
- Version currency: Within 2 minor versions of latest
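A minimal sketch of turning these thresholds into a single pass/fail report, using the illustrative numbers from the health reports above; the ToolMetric dataclass is an assumption, not an existing TIA interface:

from dataclasses import dataclass

@dataclass
class ToolMetric:
    name: str
    value: float
    threshold: float
    higher_is_better: bool = True

    def healthy(self) -> bool:
        # A metric is healthy when it is on the right side of its threshold
        return self.value >= self.threshold if self.higher_is_better else self.value <= self.threshold

metrics = [
    ToolMetric("Beth index coverage", 0.987, 0.95),
    ToolMetric("Beth query success rate", 0.87, 0.85),
    ToolMetric("Beth avg query time (ms)", 362, 500, higher_is_better=False),
    ToolMetric("Search hit rate", 0.94, 0.90),
]

for m in metrics:
    print(f"{'✅' if m.healthy() else '⚠️'} {m.name}: {m.value}")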
Qualitative Indicators
Green Flags (tools working well):
- ✅ Beth consistently finds expected docs
- ✅ Search returns relevant results quickly
- ✅ Reveal parses complex files without errors
- ✅ Boot checks pass every session
- ✅ Zero tool-related support questions
Red Flags (tool degradation):
- ❌ Beth returning 0 results for known topics
- ❌ Search missing recently created files
- ❌ Reveal failing on valid Python files
- ❌ Boot checks showing warnings
- ❌ Users complaining "can't find anything"
Implementation Checklist
For TIA System
- [x] Boot-time health checks (tia-boot validation section)
- [ ] Beth health command (tia beth health)
- [ ] Search metrics (tia search metrics)
- [ ] Reveal version check (auto-notify on outdated)
- [ ] Auto-rebuild triggers (Beth/search staleness detection)
- [ ] Tool calibration tests (known-good query suite; see the sketch below)
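A minimal sketch of what such a known-good query suite might look like as a pytest module, assuming the tia CLI is available on PATH; the specific queries mirror the calibration examples earlier in this document:

import subprocess
import pytest

CALIBRATION_QUERIES = [
    # (command, text expected somewhere in the output)
    (["tia", "beth", "explore", "SIL core principles"], "SIL_CORE_PRINCIPLES"),
    (["tia", "search", "all", "tia-boot"], "tia-boot"),
]

@pytest.mark.parametrize("command,expected", CALIBRATION_QUERIES)
def test_calibration_query(command, expected):
    result = subprocess.run(command, capture_output=True, text=True)
    assert result.returncode == 0, f"{' '.join(command)} exited with {result.returncode}"
    assert expected in result.stdout, f"Expected '{expected}' in output"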
For Agents (Scout, Agent-Ether)
- [ ] Preflight checks (verify tools before starting work)
- [ ] Mid-flight monitoring (detect tool failures during execution)
- [ ] Graceful degradation (fallback when tools fail)
- [ ] Tool failure reporting (alert human when tools broken)
For Documentation
- [ ] Add to SIL_CORE_PRINCIPLES.md (Principle #10)
- [ ] Update CLAUDE.md template (emphasize tool verification)
- [ ] Create tool health guide (how to monitor each tool)
- [ ] Document calibration tests (known-good queries for each tool)
Connection to Existing SIL Principles
Synergy with Other Principles
#1: Progressive Disclosure:
- Tool monitoring uses progressive disclosure (boot checks → health reports → detailed diagnostics)
#2: Composability First:
- Each tool monitors itself independently
- Monitoring tools are composable (beth health + search metrics + reveal check)
#8: Human-in-the-Loop:
- Tool degradation alerts require human attention
- Auto-repair for low-risk (rebuild index), human approval for high-risk (upgrade tools)
#9: Examples as Multi-Shot Reasoning Anchors:
- Calibration tests ARE examples (known-good queries)
- Agents learn "this is what good results look like"
Extends Existing Work
SEMANTIC_FEEDBACK_LOOPS.md:
- Primary feedback: User intent → execution → measurement
- Meta feedback: Tool health → monitoring → auto-repair
- Nested loops: Can't measure execution quality with broken tools
SEMANTIC_OBSERVABILITY.md:
- Observability framework measures intent-execution alignment
- Tool monitoring measures tool-health alignment
- Both required for semantic system reliability
The "Sharpen Your Chisel" Analogy
Woodworking:
- Dull chisel → poor cuts, wasted effort, frustration
- Sharp chisel → clean cuts, efficient work, quality results
- Master carpenters sharpen tools BEFORE starting work
Semantic Systems:
- Broken tools → wrong results, wasted tokens, confusion
- Working tools → accurate results, efficient search, confidence
- Master agents verify tools BEFORE starting research
The Discipline:
Apprentice: Starts work immediately, struggles with dull tools
Master: Sharpens tools first, works efficiently
Junior Agent: Uses Beth blindly, gets 0 results, assumes "no docs exist"
Senior Agent: Checks Beth health, discovers stale index, rebuilds, finds 24 docs
Key Takeaways
- Tool degradation is invisible without monitoring
- Boot-time health checks catch most failures early
- Calibration tests (known-good queries) verify tool effectiveness
- Continuous monitoring catches gradual degradation
- Auto-repair loops reduce human intervention
- Agents MUST verify tools before autonomous work
- Meta-feedback loop monitors the monitors
The Pattern:
# Before every significant task:
1. tia-boot # Verify system health
2. <tool> <calibration_test> # Verify specific tool works
3. Proceed with confidence # Tools are sharp, work efficiently
Remember:
- Garbage tools → garbage results
- Sharp tools → quality work
- Always sharpen your chisel before working the wood
Next Steps
Immediate (This Session)
- Review this principle with user
- Decide if this becomes SIL Core Principle #10
- Create implementation plan (commands, code, docs)
Short-Term (Next Week)
- Implement tia beth health command
- Implement tia search metrics command
- Add calibration test suite (known-good queries)
- Update CLAUDE.md with tool verification patterns
Medium-Term (Next Month)
- Add auto-rebuild triggers (Beth/search staleness detection)
- Implement Scout preflight checks
- Create tool health dashboard
- Document tool monitoring best practices
Long-Term (Next Quarter)
- Full automated tool monitoring infrastructure
- Predictive tool degradation detection
- Self-healing semantic systems
- Tool quality as first-class observability metric
Status: Published