ARCHITECTURE was last true on 2026-05-03 (pre-gateway, OpenRouter chains, webhook mirror). Now documents: LiteLLM gateway routing, real spawning, inbox loop, transcript mirror, audit trail, token rotation procedure, RAM constraints. README no longer says 'Clawd Dashboard'.
Tiger Command Center — Architecture
Last updated: 2026-06-10. Covers the gateway migration, real sub-agent spawning, the TASKS.md inbox loop, the Telegram transcript mirror, and the unified audit trail.
1. System Overview
Self-hosted AI agent orchestration on a Hetzner VPS (8 GB RAM, Helsinki; Tailscale 100.75.128.45). Three host services + one containerised AI runtime behind Traefik, with ALL model traffic routed through a self-hosted LiteLLM gateway — no third-party balance can silently kill the system.
Internet/Manohar
  | HTTPS 443
  v
dokploy-traefik (v3.6.7)
  |
  +-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100)
  |                                  | /api/* proxies (token server-side)
  |                                  v
  |                              tiger-bridge (Express+tsx, :3456, localhost)
  |                                  | docker exec / volume reads
  |                                  v
  |                              tiger-openclaw (OpenClaw v2026.3.12)
  |                                  |
  +-- llm.manohargupta.com ----> litellm-gateway <-- ALL model calls
  |                                  |-- MiniMax API (own key): minimax-3 (primary),
  |                                  |     minimax-2.7, minimax-2.7-fast
  |                                  +-- Anthropic API (own key): claude-haiku, claude-sonnet
  |
  +-- angel.manohargupta.com --> position-tracker (standalone repo/deploy)
  |
Telegram @Tiger_4321_bot <--> OpenClaw native channel (long-polling, owns the bot)
2. Model Routing (post-OpenRouter)
OpenRouter was removed 2026-06-10 after its credits ran dry and silently broke both Tiger and the bridge's classifier. Everything now goes through the self-hosted gateway:
- OpenClaw (openclaw.json): custom provider litellm (baseUrl: https://llm.manohargupta.com/v1, api: openai-completions). Primary litellm/minimax-3 (1M ctx), fallbacks litellm/minimax-2.7 → litellm/claude-haiku (cross-provider: survives a MiniMax outage).
- Bridge (lib/llm.ts): slugs starting anthropic/ go to Anthropic direct; everything else goes to the gateway. Env: LLM_GATEWAY_URL, LLM_GATEWAY_KEY, TIGER_ROUTER_MODEL (default minimax-3).
- Gateway config: /root/litellm/litellm_config.yaml (request_timeout: 300 to match the cron budget).
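
The bridge rule above (anthropic/ slugs go direct, everything else goes to the gateway) can be sketched as a small pure function. This is a minimal illustration, not the contents of lib/llm.ts: the names resolveRoute, Route and RouterEnv are hypothetical, and both stripping the anthropic/ prefix and throwing on a missing gateway URL are assumptions.

```typescript
// Hypothetical sketch of the bridge routing rule; not the real lib/llm.ts.
type Route =
  | { target: "anthropic-direct"; model: string }
  | { target: "gateway"; baseUrl: string; model: string };

interface RouterEnv {
  LLM_GATEWAY_URL?: string;
  TIGER_ROUTER_MODEL?: string;
}

const ANTHROPIC_PREFIX = "anthropic/";

function resolveRoute(slug: string | undefined, env: RouterEnv): Route {
  // No explicit model: use TIGER_ROUTER_MODEL, then the documented default.
  const model = slug ?? env.TIGER_ROUTER_MODEL ?? "minimax-3";

  if (model.startsWith(ANTHROPIC_PREFIX)) {
    // Assumption: the direct API expects the model name without the prefix.
    return {
      target: "anthropic-direct",
      model: model.slice(ANTHROPIC_PREFIX.length),
    };
  }

  if (!env.LLM_GATEWAY_URL) {
    // Assumption: fail loudly rather than guess a gateway address.
    throw new Error("LLM_GATEWAY_URL is not set");
  }
  return { target: "gateway", baseUrl: env.LLM_GATEWAY_URL, model };
}
```

Keeping the decision in a pure function makes the routing testable without network access; the HTTP call and the LLM_GATEWAY_KEY header would be layered on top of the returned route.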
3. Sub-Agent Execution (the orchestration layer)
bridge/src/lib/agents.ts is the canonical specialist registry:
cody (code), ethan (research), cathy (writing), elon (PM).
Legacy ids coder/researcher/writer/pm are accepted as aliases.
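
A minimal sketch of how canonical ids and legacy aliases could be resolved. It is not the actual agents.ts: resolveAgent and both maps are hypothetical names, the alias-to-specialist pairing is inferred from the order of the two lists above, and the case-insensitive match is an assumption.

```typescript
// Hypothetical sketch of the specialist registry; not the real agents.ts.
type AgentId = "cody" | "ethan" | "cathy" | "elon";

// Canonical id -> speciality, as listed above.
const SPECIALISTS = new Map<AgentId, string>([
  ["cody", "code"],
  ["ethan", "research"],
  ["cathy", "writing"],
  ["elon", "PM"],
]);

// Assumption: legacy ids map positionally onto the canonical ids.
const LEGACY_ALIASES = new Map<string, AgentId>([
  ["coder", "cody"],
  ["researcher", "ethan"],
  ["writer", "cathy"],
  ["pm", "elon"],
]);

function resolveAgent(id: string): AgentId | undefined {
  // Assumption: ids are matched case-insensitively after trimming.
  const key = id.trim().toLowerCase();
  if (SPECIALISTS.has(key as AgentId)) return key as AgentId;
  return LEGACY_ALIASES.get(key); // undefined for unknown ids
}
```

Maps are used instead of plain objects so that inherited keys such as "constructor" can never resolve to an agent.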
A spawn (POST /tiger/spawn) runs an isolated OpenClaw session
(--session-id spawn-<agent>-<id>) with the specialist persona prepended.
Message transport is docker-cp of a temp file (escaping-proof). Runs are
tracked in the executions table and serialized (MAX_CONCURRENT=1 —
parallel turns push the 8GB host into swap and everything times out).
Completion fires a Telegram notification via /tiger/notify.
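
Serialized execution can be sketched with a promise chain, which allows exactly one run at a time by construction. This is an illustration under assumptions, not the code in spawn.ts: sessionIdFor and enqueueRun are hypothetical names, and the job callback stands in for the docker-cp plus OpenClaw invocation.

```typescript
// Hypothetical sketch of serialized spawns (the MAX_CONCURRENT=1 behaviour).

// Builds the documented session id shape: spawn-<agent>-<id>.
function sessionIdFor(agent: string, runId: string): string {
  return `spawn-${agent}-${runId}`;
}

// Tail of the chain: resolves when the most recently queued run settles.
let tail: Promise<void> = Promise.resolve();

function enqueueRun<T>(job: () => Promise<T>): Promise<T> {
  // Start this job only after every earlier job has settled.
  const result = tail.then(() => job());
  // A failed run must not block later runs, so the tail swallows errors;
  // the caller still sees the failure through `result`.
  tail = result.then(
    () => undefined,
    () => undefined,
  );
  return result;
}
```

A chain is enough for a limit of one; a higher limit would need a counting semaphore instead.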
Upgrade path: define real per-agent entries in openclaw.json agents.list
(own IDENTITY.md + workspace each), then change the --agent flag in
spawn.ts. Documented in lib/agents.ts; deferred until the RAM situation is
resolved.
4. TASKS.md Inbox Loop
workspace/TASKS.md has a ## 📥 INBOX section. bridge/src/lib/inbox.ts
checks every 30 min (09:00–20:00 IST): takes the first - [ ] line,
classifies it (classifyAgent), spawns the specialist, rewrites the line to
- [⏳ run-id → agent]. Manual trigger: POST /tiger/inbox/drain.
Bridge-side scheduling means zero model tokens burned on empty checks and
no bearer tokens embedded in cron prompts.
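
The claim step can be sketched as two pure functions: one for the time window and one that rewrites the first open inbox line. This is a sketch under assumptions, not the contents of inbox.ts: inCheckWindow and claimFirstTask are hypothetical names, the classify callback stands in for classifyAgent, the window is treated as inclusive at both ends, and the task text is assumed to be kept after the new marker.

```typescript
// Hypothetical sketch of the inbox claim step; not the real inbox.ts.

// IST is UTC+05:30 with no daylight saving, so a fixed offset is enough.
function inCheckWindow(now: Date): boolean {
  const istMinutes =
    (now.getUTCHours() * 60 + now.getUTCMinutes() + 330) % 1440;
  return istMinutes >= 9 * 60 && istMinutes <= 20 * 60;
}

interface Claim {
  task: string;
  agent: string;
  updated: string; // full TASKS.md content with the claimed line rewritten
}

function claimFirstTask(
  markdown: string,
  runId: string,
  classify: (task: string) => string,
): Claim | null {
  const lines = markdown.split("\n");
  const start = lines.findIndex(
    (l) => l.startsWith("## ") && l.includes("INBOX"),
  );
  if (start === -1) return null;

  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.startsWith("## ")) break; // next section: inbox is exhausted
    const match = /^- \[ \] (.+)$/.exec(line);
    if (!match) continue; // skips done or already-claimed items
    const task = (match[1] ?? "").trim();
    const agent = classify(task);
    lines[i] = `- [⏳ ${runId} → ${agent}] ${task}`;
    return { task, agent, updated: lines.join("\n") };
  }
  return null; // nothing to do: no model call is made at all
}
```

Because an empty inbox returns null before any classification happens, an idle check costs only a file read.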
5. Telegram
- The bot is owned by OpenClaw's native channel (long-polling). The bridge's TelegramChannel, telegram-webhook.ts and chat-mirror.ts are legacy: Telegram forbids webhook + getUpdates on one token, so the webhook design could never receive a message.
- The dashboard mirror reads the native session transcript: routes/chat-telegram.ts resolves the telegram: session from sessions.json and serves the JSONL with cursor pagination and mtime caching. It filters to what Telegram actually saw: assistant messages carrying toolCall blocks (working narration) are skipped, thinking blocks ignored, injected metadata/system boilerplate stripped from user messages.
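
The filtering rules for the mirror can be sketched as follows. The transcript schema here is an assumption made for illustration (the real OpenClaw JSONL layout is not documented in this file), as are the names Entry, Block and visibleMessages; how boilerplate is recognised is left to a caller-supplied function.

```typescript
// Hypothetical transcript shapes; the real JSONL schema may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; text: string }
  | { type: "toolCall"; name: string };

interface Entry {
  role: "user" | "assistant";
  content: Block[];
}

interface Visible {
  role: "user" | "assistant";
  text: string;
}

function visibleMessages(
  jsonl: string,
  stripBoilerplate: (text: string) => string,
): Visible[] {
  const out: Visible[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as Entry;

    // Working narration: assistant turns with tool calls never reached Telegram.
    const hasToolCall = entry.content.some((b) => b.type === "toolCall");
    if (entry.role === "assistant" && hasToolCall) continue;

    // Only text blocks are kept, so thinking blocks drop out here.
    const parts: string[] = [];
    for (const block of entry.content) {
      if (block.type === "text") parts.push(block.text);
    }
    const raw = parts.join("\n");
    const text = (entry.role === "user" ? stripBoilerplate(raw) : raw).trim();
    if (text !== "") out.push({ role: entry.role, text });
  }
  return out;
}
```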
6. Audit Trail
GET /tiger/activity/audit merges, at read time, every durable action
store: executions (spawns), tasks (lifecycle), outputs (artifacts),
and OpenClaw's cron run JSONL. Cursor-paginated (before=<ISO>), type
filters. The dashboard /activity page adds recent file-modification
events on the first page. Because the trail is derived from the stores
themselves at read time, history is complete retroactively and every
action recorded in a store has an audit row.
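
Read-time merging with a before cursor can be sketched like this. It is an illustration, not the route's actual code: AuditEvent, AuditQuery and mergeAudit are hypothetical names, each source is assumed to be already loaded into memory, and timestamps are assumed to be ISO 8601 UTC strings in one consistent format so that string comparison matches time order.

```typescript
// Hypothetical sketch of the read-time audit merge.
type AuditType = "execution" | "task" | "output" | "cron";

interface AuditEvent {
  at: string; // ISO 8601 UTC timestamp, e.g. 2026-06-10T09:00:00Z
  type: AuditType;
  summary: string;
}

interface AuditQuery {
  before?: string; // cursor: only events strictly older than this
  types?: AuditType[];
  limit: number;
}

interface AuditPage {
  events: AuditEvent[];
  nextBefore: string | null; // pass back as `before` for the next page
}

function mergeAudit(sources: AuditEvent[][], query: AuditQuery): AuditPage {
  const matched = ([] as AuditEvent[])
    .concat(...sources)
    .filter((e) => query.before === undefined || e.at < query.before)
    .filter((e) => query.types === undefined || query.types.includes(e.type))
    .sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0)); // newest first

  const events = matched.slice(0, query.limit);
  const last = events[events.length - 1];
  // Only hand out a cursor when older events remain.
  const nextBefore = matched.length > query.limit && last ? last.at : null;
  return { events, nextBefore };
}
```

One caveat of a timestamp cursor: events sharing the exact timestamp of a page boundary can be skipped, so a tie-breaker id would be needed if that matters.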
7. Crons (OpenClaw, tz Asia/Kolkata)
| Job | Schedule | Timeout |
|---|---|---|
| Trade Baseline Reset | 9:15 daily | 60s |
| Trade P&L Monitor | every 2 min | 60s |
| Hourly Trade Summary + News | hourly | 90s |
| Hourly Task Check-in | 0 9-21 | 300s |
| EOD Trade Summary | 16:00 Mon–Fri | 300s |
| Weekly Digest | Mon 9:00 | 300s |
Timeout budget rationale: agent turns on this RAM-starved host can take minutes; 300s is the ceiling that made chronically-failing jobs pass.
8. Security Posture
- Bridge: Bearer auth on all routes. The token lives in four places: bridge/.env, dashboard/.env.local, and twice in jobs.json (embedded in cron payloads); rotate all four together. Rotated 2026-06-10 after the old token leaked via a hardcode in agents-activity.ts to the public GitHub mirror. NEVER hardcode tokens in source: this repo mirrors publicly.
- Git: Forgejo (origin, SSH port 2222, key id_ed25519_forgejo) + GitHub mirror. Push both.
- position-tracker binds 127.0.0.1:3457; public access via Traefik at angel.manohargupta.com.
- Known weak spots: litellm-db password, /opt/dashboard fossil with a stale token, dual Telegram pollers (bridge poller should be disabled).
9. Known Constraints
- RAM: ~13GB workload on 8GB physical; 6+GB swap in steady state. This is the root cause of historical cron timeouts and the reason spawn concurrency is 1. Decision pending: evict homelab services vs upgrade.
- OpenClaw v2026.3.12 predates MiniMax-M3, hence the explicit litellm/minimax-3 provider-prefixed model id.