skills / openclaw / smart-router

by c0nspic0us7urk3r

smart-router – OpenClaw Skill

smart-router is an OpenClaw Skills integration for coding workflows.

1.0k stars · 1.5k forks · Security: L1
Updated Feb 7, 2026 · Created Feb 7, 2026 · Topic: coding

Skill Snapshot

| Field | Value |
|---|---|
| name | smart-router |
| description | OpenClaw Skills integration. |
| owner | c0nspic0us7urk3r |
| repository | c0nspic0us7urk3r/smart-router |
| language | Markdown |
| license | MIT |
| topics | coding |
| security | L1 |
| install | `openclaw add @c0nspic0us7urk3r/smart-router` |
| last updated | Feb 7, 2026 |

Maintainer

c0nspic0us7urk3r maintains smart-router in the OpenClaw Skills directory.

File Explorer (18 files)

.
├── references/
│   ├── models.md           7.1 KB
│   └── security.md        21.1 KB
├── _meta.json              645 B
├── compactor.py           18.3 KB
├── context_guard.py       19.5 KB
├── dashboard.py           17.8 KB
├── executor.py            17.5 KB
├── expert_matrix.json      9.9 KB
├── log_decision.py         928 B
├── README.md              19.2 KB
├── router_config.json      2.8 KB
├── router_gateway.py      45.8 KB
├── router_hook.py         16.9 KB
├── semantic_router.py     20.1 KB
├── SKILL.md               29.8 KB
├── state_manager.py       17.4 KB
└── STATE.md                971 B

SKILL.md

---
name: smart-router
description: >
  Expertise-aware model router with semantic domain scoring, context-overflow
  protection, and security redaction. Automatically selects the optimal AI model
  using weighted expertise scoring (Feb 2026 benchmarks). Supports Claude, GPT,
  Gemini, Grok with automatic fallback chains, HITL gates, and cost optimization.
author: c0nSpIc0uS7uRk3r
version: 2.1.0
license: MIT
metadata:
  openclaw:
    requires:
      bins: ["python3"]
      env: ["ANTHROPIC_API_KEY"]
      optional_env: ["GOOGLE_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY"]
features:
  - Semantic domain detection
  - Expertise-weighted scoring (0-100)
  - Risk-based mandatory routing
  - Context overflow protection (>150K → Gemini)
  - Security credential redaction
  - Circuit breaker with persistent state
  - HITL gate for low-confidence routing
benchmarks:
  source: "Feb 2026 MLOC Analysis"
  models:
    - "Claude Opus 4.5: SWE-bench 80.9%"
    - "GPT-5.2: AIME 100%, Control Flow 22 errors/MLOC"
    - "Gemini 3 Pro: Concurrency 69 issues/MLOC"
---

A.I. Smart-Router

Intelligently route requests to the optimal AI model using tiered classification with automatic fallback handling and cost optimization.

How It Works (Silent by Default)

The router operates transparently—users send messages normally and get responses from the best model for their task. No special commands needed.

Optional visibility: Include [show routing] in any message to see the routing decision.

Tiered Classification System

The router uses a three-tier decision process:

┌─────────────────────────────────────────────────────────────────┐
│                    TIER 1: INTENT DETECTION                      │
│  Classify the primary purpose of the request                     │
├─────────────────────────────────────────────────────────────────┤
│  CODE        │ ANALYSIS    │ CREATIVE   │ REALTIME  │ GENERAL   │
│  write/debug │ research    │ writing    │ news/live │ Q&A/chat  │
│  refactor    │ explain     │ stories    │ X/Twitter │ translate │
│  review      │ compare     │ brainstorm │ prices    │ summarize │
└──────┬───────┴──────┬──────┴─────┬──────┴─────┬─────┴─────┬─────┘
       │              │            │            │           │
       ▼              ▼            ▼            ▼           ▼
┌─────────────────────────────────────────────────────────────────┐
│                  TIER 2: COMPLEXITY ESTIMATION                   │
├─────────────────────────────────────────────────────────────────┤
│  SIMPLE (Tier $)        │ MEDIUM (Tier $$)    │ COMPLEX (Tier $$$)│
│  • One-step task        │ • Multi-step task   │ • Deep reasoning  │
│  • Short response OK    │ • Some nuance       │ • Extensive output│
│  • Factual lookup       │ • Moderate context  │ • Critical task   │
│  → Haiku/Flash          │ → Sonnet/Grok/GPT   │ → Opus/GPT-5      │
└──────────────────────────┴─────────────────────┴───────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────┐
│                TIER 3: SPECIAL CASE OVERRIDES                    │
├─────────────────────────────────────────────────────────────────┤
│  CONDITION                           │ OVERRIDE TO              │
│  ─────────────────────────────────────┼─────────────────────────│
│  Context >100K tokens                │ → Gemini Pro (1M ctx)    │
│  Context >500K tokens                │ → Gemini Pro ONLY        │
│  Needs real-time data                │ → Grok (regardless)      │
│  Image/vision input                  │ → Opus or Gemini Pro     │
│  User explicit override              │ → Requested model        │
└──────────────────────────────────────┴──────────────────────────┘
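
Taken together, the three tiers reduce to a short routing pass. The sketch below is illustrative only; helper names such as classify_intent, estimate_complexity, and ROUTING_MATRIX are placeholders rather than the skill's actual API, and user overrides and vision inputs are omitted for brevity:

def route(request: str, context_tokens: int) -> str:
    intent = classify_intent(request)             # Tier 1
    complexity = estimate_complexity(request)     # Tier 2
    model = ROUTING_MATRIX[(intent, complexity)]  # placeholder lookup table

    # Tier 3: special cases override the matrix result
    if context_tokens > 100_000:
        model = "gemini-pro"   # long context forces the 1M-window model
    if intent == "REALTIME":
        model = "grok"         # real-time data wins regardless of complexity
    return model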

Intent Detection Patterns

CODE Intent

  • Keywords: write, code, debug, fix, refactor, implement, function, class, script, API, bug, error, compile, test, PR, commit
  • File extensions mentioned: .py, .js, .ts, .go, .rs, .java, etc.
  • Code blocks in input

ANALYSIS Intent

  • Keywords: analyze, explain, compare, research, understand, why, how does, evaluate, assess, review, investigate, examine
  • Long-form questions
  • "Help me understand..."

CREATIVE Intent

  • Keywords: write (story/poem/essay), create, brainstorm, imagine, design, draft, compose
  • Fiction/narrative requests
  • Marketing/copy requests

REALTIME Intent

  • Keywords: now, today, current, latest, trending, news, happening, live, price, score, weather
  • X/Twitter mentions
  • Stock/crypto tickers
  • Sports scores

GENERAL Intent (Default)

  • Simple Q&A
  • Translations
  • Summaries
  • Conversational
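
For illustration, Tier 1 can be approximated with keyword sets abbreviated from the lists above. This is a sketch, not the skill's actual classifier; REALTIME is listed first so it wins ties, matching the MIXED-intent rules below:

import re

INTENT_KEYWORDS = {
    "REALTIME": {"now", "today", "current", "latest", "trending", "news", "live", "price"},
    "CODE":     {"write", "code", "debug", "fix", "refactor", "implement", "bug", "error"},
    "ANALYSIS": {"analyze", "explain", "compare", "research", "evaluate", "assess"},
    "CREATIVE": {"story", "poem", "brainstorm", "imagine", "draft", "compose"},
}

def classify_intent(request: str) -> str:
    """Return the first matching intent; GENERAL is the default."""
    text = request.lower()
    # Code blocks and file extensions are language-agnostic CODE signals
    if "```" in request or re.search(r"\.(py|js|ts|go|rs|java)\b", text):
        return "CODE"
    words = set(re.findall(r"[a-z]+", text))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "GENERAL"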

MIXED Intent (Multiple Intents Detected)

When a request contains multiple clear intents (e.g., "Write code to analyze this data and explain it creatively"):

  1. Identify primary intent — What's the main deliverable?
  2. Route to highest-capability model — Mixed tasks need versatility
  3. Default to COMPLEX complexity — Multi-intent = multi-step

Examples:

  • "Write code AND explain how it works" → CODE (primary) + ANALYSIS → Route to Opus
  • "Summarize this AND what's the latest news on it" → REALTIME takes precedence → Grok
  • "Creative story using real current events" → REALTIME + CREATIVE → Grok (real-time wins)

Language Handling

Non-English requests are handled normally — all supported models have multilingual capabilities:

| Model | Non-English Support |
|---|---|
| Opus/Sonnet/Haiku | Excellent (100+ languages) |
| GPT-5 | Excellent (100+ languages) |
| Gemini Pro/Flash | Excellent (100+ languages) |
| Grok | Good (major languages) |

Intent detection still works because:

  • Keyword patterns include common non-English equivalents
  • Code intent detected by file extensions, code blocks (language-agnostic)
  • Complexity estimated by query length (works across languages)

Edge case: If intent unclear due to language, default to GENERAL intent with MEDIUM complexity.

Complexity Signals

Simple Complexity ($)

  • Short query (<50 words)
  • Single question mark
  • "Quick question", "Just tell me", "Briefly"
  • Yes/no format
  • Unit conversions, definitions

Medium Complexity ($$)

  • Moderate query (50-200 words)
  • Multiple aspects to address
  • "Explain", "Describe", "Compare"
  • Some context provided

Complex Complexity ($$$)

  • Long query (>200 words) or complex task
  • "Step by step", "Thoroughly", "In detail"
  • Multi-part questions
  • Critical/important qualifier
  • Research, analysis, or creative work
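
These signals map onto a simple heuristic. The thresholds below mirror the word counts above (an illustrative sketch; complex markers are checked first so a short but demanding request still routes up):

def estimate_complexity(request: str) -> str:
    """Heuristic mapping of the signals above onto the three tiers."""
    text = request.lower()
    word_count = len(text.split())
    complex_markers = ("step by step", "thoroughly", "in detail", "critical", "important")
    simple_markers = ("quick question", "just tell me", "briefly")

    if word_count > 200 or any(m in text for m in complex_markers):
        return "COMPLEX"
    if word_count < 50 or any(m in text for m in simple_markers):
        return "SIMPLE"
    return "MEDIUM"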

Routing Matrix

| Intent | Simple | Medium | Complex |
|---|---|---|---|
| CODE | Sonnet | Opus | Opus |
| ANALYSIS | Flash | GPT-5 | Opus |
| CREATIVE | Sonnet | Opus | Opus |
| REALTIME | Grok | Grok | Grok-3 |
| GENERAL | Flash | Sonnet | Opus |
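
This matrix corresponds to the ROUTING_PREFERENCES lookup used by the pseudocode later in this document. A sketch follows; most entries are abbreviated to the matrix's first choice, while the real dictionary carries full preference orderings (the Cost-Aware Routing Flow example shows the full list for GENERAL/SIMPLE):

ROUTING_PREFERENCES = {
    ("CODE",     "SIMPLE"):  ["sonnet"],
    ("CODE",     "MEDIUM"):  ["opus"],
    ("CODE",     "COMPLEX"): ["opus"],
    ("ANALYSIS", "SIMPLE"):  ["flash"],
    ("ANALYSIS", "MEDIUM"):  ["gpt-5"],
    ("ANALYSIS", "COMPLEX"): ["opus"],
    ("CREATIVE", "SIMPLE"):  ["sonnet"],
    ("CREATIVE", "MEDIUM"):  ["opus"],
    ("CREATIVE", "COMPLEX"): ["opus"],
    ("REALTIME", "SIMPLE"):  ["grok-2"],
    ("REALTIME", "MEDIUM"):  ["grok-2"],
    ("REALTIME", "COMPLEX"): ["grok-3"],
    ("GENERAL",  "SIMPLE"):  ["flash", "haiku", "grok-2", "sonnet"],  # full ordering
    ("GENERAL",  "MEDIUM"):  ["sonnet"],
    ("GENERAL",  "COMPLEX"): ["opus"],
}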

Token Exhaustion & Automatic Model Switching

When a model becomes unavailable mid-session (token quota exhausted, rate limit hit, API error), the router automatically switches to the next best available model and notifies the user.

Notification Format

When a model switch occurs due to exhaustion, the user receives a notification:

┌─────────────────────────────────────────────────────────────────┐
│  ⚠️ MODEL SWITCH NOTICE                                         │
│                                                                  │
│  Your request could not be completed on claude-opus-4-5         │
│  (reason: token quota exhausted).                               │
│                                                                  │
│  ✅ Request completed using: anthropic/claude-sonnet-4-5        │
│                                                                  │
│  The response below was generated by the fallback model.        │
└─────────────────────────────────────────────────────────────────┘

Switch Reasons

| Reason | Description |
|---|---|
| token quota exhausted | Daily/monthly token limit reached |
| rate limit exceeded | Too many requests per minute |
| context window exceeded | Input too large for model |
| API timeout | Model took too long to respond |
| API error | Provider returned an error |
| model unavailable | Model temporarily offline |

Implementation

def execute_with_fallback(primary_model: str, fallback_chain: list[str], request: str) -> Response:
    """
    Execute request with automatic fallback and user notification.
    """
    attempted_models = []
    switch_reason = None
    
    # Try primary model first
    models_to_try = [primary_model] + fallback_chain
    
    for model in models_to_try:
        try:
            response = call_model(model, request)
            
            # If we switched models, prepend notification
            if attempted_models:
                notification = build_switch_notification(
                    failed_model=attempted_models[0],
                    reason=switch_reason,
                    success_model=model
                )
                return Response(
                    content=notification + "\n\n---\n\n" + response.content,
                    model_used=model,
                    switched=True
                )
            
            return Response(content=response.content, model_used=model, switched=False)
            
        except TokenQuotaExhausted:
            attempted_models.append(model)
            switch_reason = "token quota exhausted"
            log_fallback(model, switch_reason)
            continue
            
        except RateLimitExceeded:
            attempted_models.append(model)
            switch_reason = "rate limit exceeded"
            log_fallback(model, switch_reason)
            continue
            
        except ContextWindowExceeded:
            attempted_models.append(model)
            switch_reason = "context window exceeded"
            log_fallback(model, switch_reason)
            continue
            
        except APITimeout:
            attempted_models.append(model)
            switch_reason = "API timeout"
            log_fallback(model, switch_reason)
            continue
            
        except APIError as e:
            attempted_models.append(model)
            switch_reason = f"API error: {e.code}"
            log_fallback(model, switch_reason)
            continue
    
    # All models exhausted
    return build_exhaustion_error(attempted_models)


def build_switch_notification(failed_model: str, reason: str, success_model: str) -> str:
    """Build user-facing notification when model switch occurs."""
    return f"""⚠️ **MODEL SWITCH NOTICE**

Your request could not be completed on `{failed_model}` (reason: {reason}).

✅ **Request completed using:** `{success_model}`

The response below was generated by the fallback model."""


def build_exhaustion_error(attempted_models: list[str]) -> Response:
    """Build error when all models are exhausted."""
    models_tried = ", ".join(attempted_models)
    return Response(
        content=f"""❌ **REQUEST FAILED**

Unable to complete your request. All available models have been exhausted.

**Models attempted:** {models_tried}

**What you can do:**
1. **Wait** — Token quotas typically reset hourly or daily
2. **Simplify** — Try a shorter or simpler request
3. **Check status** — Run `/router status` to see model availability

If this persists, your human may need to check API quotas or add additional providers.""",
        model_used=None,
        switched=False,
        failed=True
    )

Fallback Priority for Token Exhaustion

When a model is exhausted, the router selects the next best model for the same task type:

| Original Model | Fallback Priority (same capability) |
|---|---|
| Opus | Sonnet → GPT-5 → Grok-3 → Gemini Pro |
| Sonnet | GPT-5 → Grok-3 → Opus → Haiku |
| GPT-5 | Sonnet → Opus → Grok-3 → Gemini Pro |
| Gemini Pro | Flash → GPT-5 → Opus → Sonnet |
| Grok-2/3 | (warn: no real-time fallback available) |

User Acknowledgment

After a model switch, the agent should note in the response that:

  1. The original model was unavailable
  2. Which model actually completed the request
  3. The response quality may differ from the original model's typical output

This ensures transparency and sets appropriate expectations.

Streaming Responses with Fallback

When using streaming responses, fallback handling requires special consideration:

async def execute_with_streaming_fallback(primary_model: str, fallback_chain: list[str], request: str):
    """
    Handle streaming responses with mid-stream fallback.
    
    If a model fails DURING streaming (not before), the partial response is lost.
    Strategy: Don't start streaming until first chunk received successfully.
    """
    models_to_try = [primary_model] + fallback_chain
    
    for model in models_to_try:
        try:
            # Test with non-streaming ping first (optional, adds latency)
            # await test_model_availability(model)
            
            # Start streaming
            stream = await call_model_streaming(model, request)
            first_chunk = await stream.get_first_chunk(timeout=10_000)  # 10s timeout for first chunk
            
            # If we got here, model is responding — continue streaming
            yield first_chunk
            async for chunk in stream:
                yield chunk
            return  # Success
            
        except (FirstChunkTimeout, StreamError) as e:
            log_fallback(model, str(e))
            continue  # Try next model
    
    # All models failed
    yield build_exhaustion_error(models_to_try)

Key insight: Wait for the first chunk before committing to a model. If the first chunk times out, fall back before any partial response is shown to the user.

Retry Timing Configuration

RETRY_CONFIG = {
    "initial_timeout_ms": 30_000,     # 30s for first attempt
    "fallback_timeout_ms": 20_000,    # 20s for fallback attempts (faster fail)
    "max_retries_per_model": 1,       # Don't retry same model
    "backoff_multiplier": 1.5,        # Not used (no same-model retry)
    "circuit_breaker_threshold": 3,   # Failures before skipping model entirely
    "circuit_breaker_reset_ms": 300_000  # 5 min before trying failed model again
}

Circuit breaker: If a model fails 3 times in 5 minutes, skip it entirely for the next 5 minutes. This prevents repeatedly hitting a down service.
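
A minimal in-memory sketch of that circuit breaker follows. The shipped version tracks persistent state (presumably via state_manager.py in the file listing); the names below are illustrative:

import time

class CircuitBreaker:
    """Skip a model after repeated recent failures (illustrative, in-memory only)."""

    def __init__(self, threshold: int = 3, reset_ms: int = 300_000):
        self.threshold = threshold
        self.reset_ms = reset_ms
        self.failures: dict[str, list[float]] = {}   # model -> failure timestamps (ms)

    def record_failure(self, model: str) -> None:
        self.failures.setdefault(model, []).append(time.time() * 1000)

    def is_open(self, model: str) -> bool:
        """True if the model should be skipped entirely for now."""
        cutoff = time.time() * 1000 - self.reset_ms
        recent = [t for t in self.failures.get(model, []) if t > cutoff]
        self.failures[model] = recent   # old failures age out, re-enabling the model
        return len(recent) >= self.threshold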

Fallback Chains

When the preferred model fails (rate limit, API down, error), cascade to the next option:

Code Tasks

Opus → Sonnet → GPT-5 → Gemini Pro

Analysis Tasks

Opus → GPT-5 → Gemini Pro → Sonnet

Creative Tasks

Opus → GPT-5 → Sonnet → Gemini Pro

Real-time Tasks

Grok-2 → Grok-3 → (warn: no real-time fallback)

General Tasks

Flash → Haiku → Sonnet → GPT-5
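
For reference, the same chains in the shape the Cost-Aware Routing Flow below expects from MASTER_FALLBACK_CHAINS (a sketch; the key names are assumptions):

MASTER_FALLBACK_CHAINS = {
    "code":     ["opus", "sonnet", "gpt-5", "gemini-pro"],
    "analysis": ["opus", "gpt-5", "gemini-pro", "sonnet"],
    "creative": ["opus", "gpt-5", "sonnet", "gemini-pro"],
    "realtime": ["grok-2", "grok-3"],   # warn: no non-Grok real-time fallback
    "general":  ["flash", "haiku", "sonnet", "gpt-5"],
}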

Long Context (Tiered by Size)

┌─────────────────────────────────────────────────────────────────┐
│                  LONG CONTEXT FALLBACK CHAIN                     │
├─────────────────────────────────────────────────────────────────┤
│  TOKEN COUNT        │ FALLBACK CHAIN                            │
│  ───────────────────┼───────────────────────────────────────────│
│  128K - 200K        │ Opus (200K) → Sonnet (200K) → Gemini Pro  │
│  200K - 1M          │ Gemini Pro → Flash (1M) → ERROR_MESSAGE   │
│  > 1M               │ ERROR_MESSAGE (no model supports this)    │
└─────────────────────┴───────────────────────────────────────────┘

Implementation:

def handle_long_context(token_count: int, available_models: dict) -> str | ErrorMessage:
    """Route long-context requests with graceful degradation."""
    
    # Tier 1: 128K - 200K tokens (Opus/Sonnet can handle)
    if token_count <= 200_000:
        for model in ["opus", "sonnet", "haiku", "gemini-pro", "flash"]:
            if model in available_models and get_context_limit(model) >= token_count:
                return model
    
    # Tier 2: 200K - 1M tokens (only Gemini)
    elif token_count <= 1_000_000:
        for model in ["gemini-pro", "flash"]:
            if model in available_models:
                return model
    
    # Tier 3: > 1M tokens (nothing available)
    # Fall through to error
    
    # No suitable model found — return helpful error
    return build_context_error(token_count, available_models)


def build_context_error(token_count: int, available_models: dict) -> ErrorMessage:
    """Build a helpful error message when no model can handle the input."""
    
    # Find the largest available context window
    max_available = max(
        (get_context_limit(m) for m in available_models),
        default=0
    )
    
    # Format token count for readability
    if token_count >= 1_000_000:
        token_display = f"{token_count / 1_000_000:.1f}M"
    else:
        token_display = f"{token_count // 1000}K"
    
    return ErrorMessage(
        title="Context Window Exceeded",
        message=f"""Your input is approximately **{token_display} tokens**, which exceeds the context window of all currently available models.

**Required:** Gemini Pro (1M context) {"— currently unavailable" if "gemini-pro" not in available_models else ""}
**Your max available:** {max_available // 1000}K tokens

**Options:**
1. **Wait and retry** — Gemini may be temporarily down
2. **Reduce input size** — Remove unnecessary content to fit within {max_available // 1000}K tokens
3. **Split into chunks** — I can process your input sequentially in smaller pieces

Would you like me to help split this into manageable chunks?""",
        
        recoverable=True,
        suggested_action="split_chunks"
    )

Example Error Output:

⚠️ Context Window Exceeded

Your input is approximately **340K tokens**, which exceeds the context 
window of all currently available models.

Required: Gemini Pro (1M context) — currently unavailable
Your max available: 200K tokens

Options:
1. Wait and retry — Gemini may be temporarily down
2. Reduce input size — Remove unnecessary content to fit within 200K tokens
3. Split into chunks — I can process your input sequentially in smaller pieces

Would you like me to help split this into manageable chunks?

Dynamic Model Discovery

The router auto-detects available providers at runtime:

1. Check configured auth profiles
2. Build available model list from authenticated providers
3. Construct routing table using ONLY available models
4. If preferred model unavailable, use best available alternative

Example: If only Anthropic and Google are configured:

  • Code tasks → Opus (Anthropic available ✓)
  • Real-time tasks → ⚠️ No Grok → Fall back to Opus + warn user
  • Long docs → Gemini Pro (Google available ✓)
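
A sketch of that discovery step, assuming the environment-variable mapping from the README's Configuration section (auth-profile checks are omitted; the model groupings are taken from this document):

import os

PROVIDER_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai":    "OPENAI_API_KEY",
    "google":    "GOOGLE_API_KEY",
    "xai":       "XAI_API_KEY",
}

PROVIDER_MODELS = {
    "anthropic": ["opus", "sonnet", "haiku"],
    "openai":    ["gpt-5"],
    "google":    ["gemini-pro", "flash"],
    "xai":       ["grok-2", "grok-3"],
}

def discover_providers() -> dict[str, dict]:
    """Return {model: metadata} for every model whose provider is authenticated."""
    available = {}
    for provider, env_var in PROVIDER_ENV.items():
        if os.environ.get(env_var):   # auth profiles would also be checked here
            for model in PROVIDER_MODELS[provider]:
                available[model] = {"provider": provider}
    return available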

Cost Optimization

The router considers cost when complexity is LOW:

| Model | Cost Tier | Use When |
|---|---|---|
| Gemini Flash | $ | Simple tasks, high volume |
| Claude Haiku | $ | Simple tasks, quick responses |
| Claude Sonnet | $$ | Medium complexity |
| Grok 2 | $$ | Real-time needs only |
| GPT-5 | $$ | General fallback |
| Gemini Pro | $$$ | Long context needs |
| Claude Opus | $$$$ | Complex/critical tasks |

Rule: Never use Opus ($$$$) for tasks that Flash ($) can handle.

User Controls

Show Routing Decision

Add [show routing] to any message:

[show routing] What's the weather in NYC?

Output includes:

[Routed → xai/grok-2-latest | Reason: REALTIME intent detected | Fallback: none available]

Force Specific Model

Explicit overrides:

  • "use grok: ..." → Forces Grok
  • "use claude: ..." → Forces Opus
  • "use gemini: ..." → Forces Gemini Pro
  • "use flash: ..." → Forces Gemini Flash
  • "use gpt: ..." → Forces GPT-5

Check Router Status

Ask: "router status" or "/router" to see:

  • Available providers
  • Configured models
  • Current routing table
  • Recent routing decisions

Implementation Notes

For Agent Implementation

When processing a request:

1. DETECT available models (check auth profiles)
2. CLASSIFY intent (code/analysis/creative/realtime/general)
3. ESTIMATE complexity (simple/medium/complex)
4. CHECK special cases (context size, vision, explicit override)
5. FILTER by cost tier based on complexity ← BEFORE model selection
6. SELECT model from filtered pool using routing matrix
7. VERIFY model available, else use fallback chain (also cost-filtered)
8. EXECUTE request with selected model
9. IF failure, try next in fallback chain
10. LOG routing decision (for debugging)

Cost-Aware Routing Flow (Critical Order)

def route_with_fallback(request):
    """
    Main routing function with CORRECT execution order.
    Cost filtering MUST happen BEFORE routing table lookup.
    """
    
    # Step 1: Discover available models
    available_models = discover_providers()
    
    # Step 2: Classify intent
    intent = classify_intent(request)
    
    # Step 3: Estimate complexity
    complexity = estimate_complexity(request)
    
    # Step 4: Check special-case overrides (these bypass cost filtering)
    if user_override := get_user_model_override(request):
        return execute_with_fallback(user_override, [], request)  # No cost filter for explicit override
    
    token_count = estimate_tokens(request)  # assumed helper: approximate request + context size
    if token_count > 128_000:
        return handle_long_context(token_count, available_models)  # Special handling
    
    if needs_realtime(request):
        return execute_with_fallback("grok-2", ["grok-3"], request)  # Realtime bypasses cost
    
    # ┌─────────────────────────────────────────────────────────────┐
    # │  STEP 5: FILTER BY COST TIER — THIS MUST COME FIRST!       │
    # │                                                             │
    # │  Cost filtering happens BEFORE the routing table lookup,   │
    # │  NOT after. This ensures "what's 2+2?" never considers     │
    # │  Opus even momentarily.                                    │
    # └─────────────────────────────────────────────────────────────┘
    
    allowed_tiers = get_allowed_tiers(complexity)
    # SIMPLE  → ["$"]
    # MEDIUM  → ["$", "$$"]
    # COMPLEX → ["$", "$$", "$$$", "$$$$"]
    
    cost_filtered_models = {
        model: meta for model, meta in available_models.items()
        if COST_TIERS.get(model) in allowed_tiers
    }
    
    # Step 6: NOW select from cost-filtered pool using routing preferences
    preferences = ROUTING_PREFERENCES.get((intent, complexity), [])
    
    for model in preferences:
        if model in cost_filtered_models:  # Only consider cost-appropriate models
            selected_model = model
            break
    else:
        # No preferred model in cost-filtered pool — use cheapest available
        selected_model = select_cheapest(cost_filtered_models)
    
    # Step 7: Build cost-filtered fallback chain
    task_type = get_task_type(intent, complexity)
    full_chain = MASTER_FALLBACK_CHAINS.get(task_type, [])
    filtered_chain = [m for m in full_chain if m in cost_filtered_models and m != selected_model]
    
    # Step 8-10: Execute with fallback + logging
    return execute_with_fallback(selected_model, filtered_chain, request)


def get_allowed_tiers(complexity: str) -> list[str]:
    """Return allowed cost tiers for a given complexity level."""
    return {
        "SIMPLE":  ["$"],                      # Budget only — no exceptions
        "MEDIUM":  ["$", "$$"],                # Budget + standard
        "COMPLEX": ["$", "$$", "$$$", "$$$$"], # All tiers — complex tasks deserve the best
    }.get(complexity, ["$", "$$"])


# Example flow for "what's 2+2?":
#
# 1. available_models = {opus, sonnet, haiku, flash, grok-2, ...}
# 2. intent = GENERAL
# 3. complexity = SIMPLE
# 4. (no special cases)
# 5. allowed_tiers = ["$"]  ← SIMPLE means $ only
#    cost_filtered_models = {haiku, flash, grok-2}  ← Opus/Sonnet EXCLUDED
# 6. preferences for (GENERAL, SIMPLE) = [flash, haiku, grok-2, sonnet]
#    first match in cost_filtered = flash ✓
# 7. fallback_chain = [haiku, grok-2]  ← Also cost-filtered
# 8. execute with flash
#
# Result: Opus is NEVER considered, not even momentarily.

Cost Optimization: Two Approaches

┌─────────────────────────────────────────────────────────────────┐
│           COST OPTIMIZATION IMPLEMENTATION OPTIONS               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  APPROACH 1: Explicit filter_by_cost() (shown above)            │
│  ─────────────────────────────────────────────────────────────  │
│  • Calls get_allowed_tiers(complexity) explicitly               │
│  • Filters available_models BEFORE routing table lookup         │
│  • Most defensive — impossible to route wrong tier              │
│  • Recommended for security-critical deployments                │
│                                                                  │
│  APPROACH 2: Preference ordering (implicit)                     │
│  ─────────────────────────────────────────────────────────────  │
│  • ROUTING_PREFERENCES lists cheapest capable models first      │
│  • For SIMPLE tasks: [flash, haiku, grok-2, sonnet]            │
│  • First available match wins → naturally picks cheapest        │
│  • Simpler code, relies on correct preference ordering          │
│                                                                  │
│  This implementation uses BOTH for defense-in-depth:            │
│  • Preference ordering provides first line of cost awareness    │
│  • Explicit filter_by_cost() guarantees tier enforcement        │
│                                                                  │
│  For alternative implementations that rely solely on            │
│  preference ordering, see references/models.md for the          │
│  filter_by_cost() function if explicit enforcement is needed.   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Spawning with Different Models

Use sessions_spawn for model routing:

sessions_spawn(
  task: "user's request",
  model: "selected/model-id",
  label: "task-type-query"
)

Security

  • Never send sensitive data to untrusted models
  • API keys handled via environment/auth profiles only
  • See references/security.md for full security guidance

Model Details

See references/models.md for detailed capabilities and pricing.

README.md

🧠 A.I. Smart-Router

Expertise-aware model router for multi-provider AI setups. Uses weighted semantic scoring and Feb 2026 benchmarks to route each request to the best-suited model. Security redaction, context-overflow protection, and HITL gates for ambiguous intents.

v2.0.0 — Now with Phase G Semantic Specialization: expertise-weighted scoring, concurrency detection, and risk-based mandatory routing.


✨ Features

| Feature | Description |
|---|---|
| Semantic Domain Detection | Identifies SOFTWARE_ARCH, CONCURRENCY, LOGICAL_PRECISION, MASSIVE_SYNTHESIS, etc. |
| Expertise-Weighted Scoring | Each model scored 0-100 per domain based on Feb 2026 benchmarks (MLOC analysis) |
| Risk-Based Mandatory Routing | Medical → GPT-5.2, Terminal/Shell → Opus, Concurrency → Gemini (enforced) |
| Context Overflow Protection | >150K tokens auto-routes to Gemini Pro (1M context window) |
| Security Credential Redaction | API keys, tokens, passwords blocked before reaching any model |
| Circuit Breaker | Persistent state tracks model failures; auto-bypasses degraded providers |
| HITL Gate | Low-confidence (<75%) routing triggers Telegram notification for approval |
| Automatic Fallbacks | If a model fails, cascades to next best option automatically |
| Cost Optimization | Simple tasks use Flash ($0.075/M); complex tasks unlock Opus ($$$$) |
| Multi-Provider | Anthropic (Claude), OpenAI (GPT), Google (Gemini), xAI (Grok) |

🎯 Expert Domain Map (Feb 2026 Benchmarks)

| Model | Expert Domain | Benchmark | Known Blind Spot |
|---|---|---|---|
| Claude Opus 4.5 | SOFTWARE_ARCH | SWE-bench 80.9%, Terminal-bench 59.3% | High cost, verbosity |
| GPT-5.2 (High) | LOGICAL_PRECISION | 100% AIME, 22 control flow errors/MLOC | 414 concurrency issues/MLOC |
| Gemini 3 Pro | MASSIVE_SYNTHESIS | 1M+ context, 69 concurrency issues/MLOC | 200 control flow errors/MLOC |
| Grok-3 | REALTIME_SIGNAL | Native X/Twitter, 5-min news freshness | Less rigorous formal logic |
| Flash 2.5 | SYSTEM_ROUTINE | 200ms latency, $0.075/M tokens | Shallow multi-step reasoning |

Mandatory Overrides

| Risk Domain | Mandatory Model | Reason |
|---|---|---|
| Medical/Health | GPT-5.2 | 1.6% HealthBench error rate (lowest) |
| Financial Math | GPT-5.2 | 100% AIME accuracy |
| Terminal/Shell | Opus | 95.3% prompt-injection robustness |
| Security Audit | Opus | 96.2% security constraint verification |
| High Concurrency | Gemini Pro | 6x fewer concurrency bugs than GPT-5 |
| Context >150K | Gemini Pro | Only model with 1M+ context window |

🛡️ Context Safety (v2.1.0)

Never hit an overflow again with Proactive Budgeting.

Phase H introduces three-layer context protection:

| Layer | Threshold | Action |
|---|---|---|
| Strike 1: Pre-Flight | >180K tokens | Force-route to Gemini Pro before API call |
| Strike 2: Silent Retry | 422/400 error | Intercept overflow, retry with Gemini silently |
| Strike 3: JIT Compact | 150K-180K | Summarize oldest 30% to stay within limits |

# Pre-flight audit
tokens = calculate_budget(messages)
if tokens > 180_000:
    model = "google/gemini-2.5-pro"  # Force Gemini

# Silent retry on overflow
try:
    response = await call_model(messages, model)
except ContextOverflow:
    response = await call_model(messages, "google/gemini-2.5-pro")

Result: Zero context overflow errors reach the user. Ever.


🚀 Quick Start

1. Install the Skill

Copy the smart-router folder to your OpenClaw workspace skills directory:

# From your workspace root
mkdir -p skills
cp -r /path/to/smart-router skills/

Or clone directly:

cd ~/.openclaw/workspace/skills
git clone https://github.com/c0nSpIc0uS7uRk3r/smart-router.git

2. Configure Providers

Add API keys for the providers you want to use (see Configuration).

3. Start Using

That's it! The router works automatically. Send messages normally and the best model is selected for each task.

Optional: Enable verbose mode to see routing decisions:

/router verbose on

⚙️ Configuration

Required: At Least One Provider

The router requires at least one AI provider configured. Add any combination:

| Provider | Environment Variable | OpenClaw Auth Profile |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | anthropic:default |
| OpenAI | OPENAI_API_KEY | openai-codex:default |
| Google | GOOGLE_API_KEY | google:manual |
| xAI | XAI_API_KEY | xai:manual |

Option A: Environment Variables

Add to your shell profile (~/.bashrc, ~/.zshrc):

export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AIza..."
export XAI_API_KEY="xai-..."

Option B: OpenClaw Auth Profiles

openclaw auth add anthropic
openclaw auth add google --manual
openclaw auth add xai --manual

Adding/Removing Providers

Simply add or remove the API key — the router auto-detects changes within 5 minutes.

Force immediate refresh:

/router refresh

Minimum Viable Setup

The router works with just one provider. Example with only Anthropic:

  • All tasks route to Claude models (Opus/Sonnet/Haiku)
  • Warnings shown for missing capabilities (real-time, long context)
  • No errors, just graceful degradation

📋 Commands

| Command | Description |
|---|---|
| /router | Show brief status (configured providers, available models) |
| /router status | Full status with last 5 routing decisions |
| /router verbose | Toggle verbose mode (show routing for all messages) |
| /router verbose on | Enable verbose mode |
| /router verbose off | Disable verbose mode (default) |
| /router refresh | Force re-discover configured providers |

Model Overrides

Force a specific model by prefixing your message:

use opus: Write a complex authentication system
use sonnet: Explain how this code works
use haiku: What's 2+2?
use gpt: Give me an alternative perspective
use gemini: Analyze this 200-page document
use flash: Translate this to Spanish
use grok: What's trending on X right now?

Per-Message Routing Visibility

Add [show routing] to any message to see the routing decision:

[show routing] What's the capital of France?

Output:

[Routed → Haiku | Reason: CONVERSATION + SIMPLE complexity, cost-optimized]

The capital of France is Paris.

🌳 Routing Decision Tree

                         ┌─────────────────────┐
                         │   Incoming Request  │
                         └──────────┬──────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │  STEP 0: SPECIAL CASE CHECK   │
                    └───────────────┬───────────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
              ▼                     ▼                     ▼
    ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
    │ User specified  │  │ Tokens > 128K?  │  │ Needs real-time │
    │ model override? │  │                 │  │ data?           │
    └────────┬────────┘  └────────┬────────┘  └────────┬────────┘
             │YES                 │YES                 │YES
             ▼                    ▼                    ▼
       Use requested        Gemini Pro              Grok
          model              (forced)             (forced)
             │                    │                    │
             └────────────────────┴────────────────────┘
                                  │NO (none matched)
                                  ▼
                    ┌─────────────────────────────┐
                    │   STEP 1: INTENT DETECTION  │
                    │                             │
                    │  CODE      → coding tasks   │
                    │  ANALYSIS  → research/eval  │
                    │  CREATIVE  → writing/ideas  │
                    │  CONVO     → chat/Q&A       │
                    │  DOCUMENT  → long text      │
                    │  RESEARCH  → fact-finding   │
                    │  MATH      → calculations   │
                    └──────────────┬──────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │ STEP 2: COMPLEXITY ESTIMATE │
                    │                             │
                    │  SIMPLE  → <50 words, 1 Q   │
                    │  MEDIUM  → 50-200 words     │
                    │  COMPLEX → >200w, multi-part│
                    └──────────────┬──────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │ STEP 3: COST-AWARE SELECTION│
                    │                             │
                    │  SIMPLE  → $ tier only      │
                    │  MEDIUM  → $ or $$ tier     │
                    │  COMPLEX → any tier         │
                    └──────────────┬──────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │  STEP 4: MODEL MATRIX LOOKUP│
                    └──────────────┬──────────────┘
                                   │
          ┌────────────────────────┼────────────────────────┐
          │                        │                        │
          ▼                        ▼                        ▼
   ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
   │   SIMPLE    │         │   MEDIUM    │         │   COMPLEX   │
   ├─────────────┤         ├─────────────┤         ├─────────────┤
   │ Flash    $  │         │ Sonnet  $$  │         │ Opus  $$$$  │
   │ Haiku    $  │         │ GPT-5   $$  │         │ GPT-5  $$$  │
   │ Grok-2   $  │         │ Grok-3  $$  │         │ Gemini $$$  │
   └─────────────┘         └─────────────┘         └─────────────┘
          │                        │                        │
          └────────────────────────┼────────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │   STEP 5: FALLBACK CHAIN    │
                    │                             │
                    │  If model fails → try next  │
                    │  Log failure → cascade      │
                    └─────────────────────────────┘

Model Selection Matrix

| Intent | Simple ($) | Medium ($$) | Complex ($$$$) |
|---|---|---|---|
| CODE | Sonnet | Sonnet | Opus |
| ANALYSIS | Flash | Sonnet/GPT-5 | Opus |
| CREATIVE | Sonnet | Sonnet | Opus |
| CONVERSATION | Haiku/Flash | Sonnet | Sonnet |
| DOCUMENT | Flash | Gemini Pro | Gemini Pro |
| RESEARCH | Grok | GPT-5 | Opus |
| MATH/LOGIC | Sonnet | Opus | Opus |

Fallback Chains

| Task Type | Fallback Order |
|---|---|
| Complex Reasoning | Opus → GPT-5 → Grok 3 → Sonnet |
| Code Generation | Opus → GPT-5 → Sonnet → Grok 3 |
| Fast/Simple | Haiku → Flash → Grok 2 → Sonnet |
| Large Context | Gemini Pro → GPT-5 → Opus |
| Creative Writing | Opus → Sonnet → GPT-5 → Grok 3 |

🔄 Token Exhaustion & Model Switching

When you run out of tokens on a model (quota exhausted, rate limited), the router automatically switches to the next best available model and notifies you.

What Happens

  1. You send a request
  2. Primary model fails (token quota exhausted)
  3. Router automatically tries the next model in the fallback chain
  4. You receive a notification + the completed response

Notification Example

⚠️ MODEL SWITCH NOTICE

Your request could not be completed on claude-opus-4-5
(reason: token quota exhausted).

✅ Request completed using: anthropic/claude-sonnet-4-5

The response below was generated by the fallback model.

---

[Your actual response here]

Switch Reasons

| Reason | What It Means |
|---|---|
| token quota exhausted | Daily/monthly token limit reached for this model |
| rate limit exceeded | Too many requests per minute |
| context window exceeded | Your input was too large for this model |
| API timeout | Model took too long to respond |
| API error | Provider returned an error |
| model unavailable | Model is temporarily offline |

When All Models Fail

If all available models are exhausted, you'll see:

❌ REQUEST FAILED

Unable to complete your request. All available models have been exhausted.

Models attempted: claude-opus-4-5, claude-sonnet-4-5, gpt-5

What you can do:
1. Wait — Token quotas typically reset hourly or daily
2. Simplify — Try a shorter or simpler request
3. Check status — Run /router status to see model availability

Why This Matters

  • No silent failures — You always know what happened
  • Transparent attribution — You know which model generated your response
  • Quality expectations — If Opus fails and Sonnet completes, you know the response came from a less capable model
  • Continuity — Your workflow isn't interrupted by token limits

🎨 Customization

Modifying Routing Preferences

Edit SKILL.md and update the ROUTING_PREFERENCES dictionary:

ROUTING_PREFERENCES = {
    ("CODE", "SIMPLE"):   ["sonnet", "gpt-5", "flash"],  # Add/remove/reorder
    ("CODE", "MEDIUM"):   ["opus", "sonnet", "gpt-5"],   # Your preferences
    # ...
}

Adjusting Cost Tiers

Edit references/models.md to change tier assignments:

COST_TIERS = {
    "flash": "$",      # Budget
    "haiku": "$",      # Budget
    "sonnet": "$$",    # Standard
    "opus": "$$$$",    # Premium
    # ...
}

Adding New Models

  1. Add the model to PROVIDER_REGISTRY in SKILL.md
  2. Add pricing to references/models.md
  3. Update routing preferences as desired
  4. Run /router refresh
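
The registry's exact schema isn't documented here; a hypothetical entry might look like the following (all field names and values are assumptions for illustration):

# Hypothetical entry — the actual PROVIDER_REGISTRY schema may differ.
PROVIDER_REGISTRY = {
    "my-new-model": {
        "provider": "openai",               # assumption: matches the auth profile name
        "model_id": "openai/my-new-model",  # assumption: provider-prefixed id
        "cost_tier": "$$",
        "context_limit": 200_000,
    },
}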

Changing Fallback Timeout

Edit the timeout constant in SKILL.md:

TIMEOUT_MS = 30000  # 30 seconds (default)

📁 Skill Structure

smart-router/
├── SKILL.md              # Main skill definition + routing logic
├── README.md             # This file
├── LICENSE               # MIT License
└── references/
    ├── models.md         # Model capabilities, pricing, cost tiers
    └── security.md       # Security guidelines

🔒 Security

  • API keys are never logged or exposed in routing output
  • Sensitive requests are not sent to untrusted models
  • All provider communication uses official SDKs/APIs
  • Input sanitization before any model API call
  • No arbitrary code execution paths
  • See references/security.md for full security guidelines
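
As a sketch of what input sanitization can look like, the patterns below are derived from the key formats shown in Configuration; the actual redaction rules live in references/security.md and may differ:

import re

SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9-]+"),      # Anthropic-style keys
    re.compile(r"AIza[A-Za-z0-9_-]{10,}"),    # Google-style keys
    re.compile(r"xai-[A-Za-z0-9]+"),          # xAI-style keys
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
]

def redact(text: str) -> str:
    """Replace anything that looks like a credential before it reaches a model."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text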

📊 Cost Optimization

The router is designed to minimize costs without sacrificing quality:

| Model | Cost/1M Input | Used For |
|---|---|---|
| Flash | $0.075 | Simple Q&A, translations |
| Haiku | $0.25 | Quick responses |
| Grok 2 | $0.20 | Real-time queries |
| Sonnet | $3.00 | Most tasks (best value) |
| GPT-5 | Subscription | Fallback (no per-token) |
| Gemini Pro | $1.25 | Large documents |
| Opus | $15.00 | Complex tasks only |

Rule: Simple tasks NEVER use premium models. A "what's 2+2?" query costs $0.000075, not $0.015.


🔧 Troubleshooting

Common Issues

| Problem | Cause | Solution |
|---|---|---|
| "No models available" | No providers configured | Add at least one API key |
| Always routes to same model | Only one provider active | Add more providers for variety |
| Expensive model for simple query | Complexity mis-detected | Use [show routing] to debug |
| Real-time query not using Grok | Grok not configured | Add xAI API key |
| Large document fails | Context exceeded | Check if Gemini Pro available |

Debugging Routing Decisions

  1. Add [show routing] to your message:

    [show routing] Your question here
    
  2. Check provider status:

    /router status
    
  3. Force refresh providers:

    /router refresh
    

Model Not Responding

If a model times out or errors:

  1. Router automatically tries fallback chain
  2. Check /router status for recent failures
  3. Verify API key is valid
  4. Check provider status page (Anthropic, OpenAI, etc.)

Unexpected Model Selection

If routing seems wrong:

  1. Check intent detection — is it classifying correctly?
  2. Check complexity — short queries = SIMPLE = cheap models
  3. Check special cases — context size, real-time needs
  4. Override manually: use opus: your query

Cost Higher Than Expected

  1. Check if queries are being classified as COMPLEX
  2. Use [show routing] to see why
  3. Simplify queries for simple tasks
  4. Add budget models (Flash, Haiku) if missing

🤝 Contributing

Contributions welcome! Areas of interest:

  • Additional provider support
  • Improved intent classification
  • Better complexity estimation heuristics
  • Performance optimizations

📄 License

MIT License — see LICENSE


🙏 Acknowledgments

  • Built for OpenClaw
  • Inspired by the need to stop paying $15/1M tokens for "what's 2+2?"

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

- Never send sensitive data to untrusted models
- API keys handled via environment/auth profiles only
- See `references/security.md` for full security guidance

Requirements

  • OpenClaw CLI installed and configured.
  • Language: Markdown
  • License: MIT
  • Topics: coding


FAQ

How do I install smart-router?

Run openclaw add @c0nspic0us7urk3r/smart-router in your terminal. This installs smart-router into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/c0nspic0us7urk3r/smart-router. Review commits and README documentation before installing.