# Tiger Command Center: Architecture

*Last updated: 2026-06-10. Covers the gateway migration, real sub-agent spawning, the TASKS.md inbox loop, the Telegram transcript mirror, and the unified audit trail.*

---

## 1. System Overview

Self-hosted AI agent orchestration on a Hetzner VPS (8 GB RAM, Helsinki; Tailscale 100.75.128.45). Three host services + one containerised AI runtime behind Traefik, with ALL model traffic routed through a self-hosted LiteLLM gateway, so no third-party balance can silently kill the system.

```
Internet/Manohar
   | HTTPS 443
   v
dokploy-traefik (v3.6.7)
   |
   +-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100)
   |                                  | /api/* proxies (token server-side)
   |                                  v
   |                              tiger-bridge (Express+tsx, :3456, localhost)
   |                                  | docker exec / volume reads
   |                                  v
   |                              tiger-openclaw (OpenClaw v2026.3.12)
   |
   +-- llm.manohargupta.com ----> litellm-gateway  <-- ALL model calls
   |                                  |-- MiniMax API (own key): minimax-3 (primary),
   |                                  |     minimax-2.7, minimax-2.7-fast
   |                                  +-- Anthropic API (own key): claude-haiku, claude-sonnet
   |
   +-- angel.manohargupta.com --> position-tracker (standalone repo/deploy)

Telegram @Tiger_4321_bot <--> OpenClaw native channel (long-polling, owns the bot)
```

## 2. Model Routing (post-OpenRouter)

OpenRouter was removed 2026-06-10 after its credits ran dry and silently broke both Tiger and the bridge's classifier. Everything now goes through the self-hosted gateway:

- **OpenClaw** (`openclaw.json`): custom provider `litellm` (`baseUrl: https://llm.manohargupta.com/v1`, `api: openai-completions`). Primary `litellm/minimax-3` (1M ctx), fallbacks `litellm/minimax-2.7` → `litellm/claude-haiku` (cross-provider: survives a MiniMax outage).
- **Bridge** (`lib/llm.ts`): slugs starting `anthropic/` go to Anthropic direct; everything else goes to the gateway. Env: `LLM_GATEWAY_URL`, `LLM_GATEWAY_KEY`, `TIGER_ROUTER_MODEL` (default `minimax-3`).
- **Gateway config**: `/root/litellm/litellm_config.yaml` (`request_timeout: 300` to match the cron budget).

## 3. Sub-Agent Execution (the orchestration layer)

`bridge/src/lib/agents.ts` is the canonical specialist registry: **cody** (code), **ethan** (research), **cathy** (writing), **elon** (PM). Legacy ids coder/researcher/writer/pm are accepted as aliases.

A spawn (`POST /tiger/spawn`) runs an isolated OpenClaw session (`--session-id spawn--`) with the specialist persona prepended. Message transport is docker-cp of a temp file (escaping-proof). Runs are tracked in the `executions` table and serialized (`MAX_CONCURRENT=1`: parallel turns push the 8GB host into swap and everything times out). Completion fires a Telegram notification via `/tiger/notify`.

Upgrade path: define real per-agent entries in `openclaw.json agents.list` (own IDENTITY.md + workspace each), then change the `--agent` flag in spawn.ts. Documented in lib/agents.ts; deferred until the RAM situation is resolved.

## 4. TASKS.md Inbox Loop

`workspace/TASKS.md` has a `## 📥 INBOX` section. `bridge/src/lib/inbox.ts` checks every 30 min (09:00–20:00 IST): takes the first `- [ ]` line, classifies it (`classifyAgent`), spawns the specialist, rewrites the line to `- [⏳ run-id → agent]`. Manual trigger: `POST /tiger/inbox/drain`.

Bridge-side scheduling means zero model tokens burned on empty checks and no bearer tokens embedded in cron prompts.

## 5. Telegram

- **The bot is owned by OpenClaw's native channel** (long-polling). The bridge's `TelegramChannel`, `telegram-webhook.ts` and `chat-mirror.ts` are legacy: Telegram forbids webhook + getUpdates on one token, so the webhook design could never receive a message.
- **The dashboard mirror reads the native session transcript**: `routes/chat-telegram.ts` resolves the `telegram:` session from `sessions.json` and serves the JSONL with cursor pagination and mtime caching.
  It filters to what Telegram actually saw: assistant messages carrying toolCall blocks (working narration) are skipped, thinking blocks ignored, injected metadata/system boilerplate stripped from user messages.

## 6. Audit Trail

`GET /tiger/activity/audit` merges, at read time, every durable action store: `executions` (spawns), `tasks` (lifecycle), `outputs` (artifacts), and OpenClaw's cron run JSONL. Cursor-paginated (`before=`), type filters. The dashboard `/activity` page adds recent file-modification events on the first page.

Read-time merging means history is complete retroactively and no action can happen without its audit row.

## 7. Crons (OpenClaw, tz Asia/Kolkata)

| Job | Schedule | Timeout |
|---|---|---|
| Trade Baseline Reset | 9:15 daily | 60s |
| Trade P&L Monitor | every 2 min | 60s |
| Hourly Trade Summary + News | hourly | 90s |
| Hourly Task Check-in | 0 9-21 | 300s |
| EOD Trade Summary | 16:00 Mon–Fri | 300s |
| Weekly Digest | Mon 9:00 | 300s |

Timeout budget rationale: agent turns on this RAM-starved host can take minutes; 300s is the ceiling that made chronically-failing jobs pass.

## 8. Security Posture

- Bridge: Bearer auth on all routes; token in `bridge/.env` + `dashboard/.env.local` + embedded in cron payloads (rotate all four together; `jobs.json` has it twice). Rotated 2026-06-10 after the old token leaked via a hardcode in `agents-activity.ts` to the public GitHub mirror. NEVER hardcode tokens in source: this repo mirrors publicly.
- Git: Forgejo (origin, SSH port 2222, key `id_ed25519_forgejo`) + GitHub mirror. Push both.
- position-tracker binds 127.0.0.1:3457; public access via Traefik at angel.manohargupta.com.
- Known weak spots: litellm-db password, `/opt/dashboard` fossil with a stale token, dual Telegram pollers (bridge poller should be disabled).

## 9. Known Constraints

- **RAM**: ~13GB workload on 8GB physical; 6+GB swap in steady state.
  This is the root cause of historical cron timeouts and the reason spawn concurrency is 1. Decision pending: evict homelab services vs upgrade.
- OpenClaw v2026.3.12 predates MiniMax-M3, hence the explicit `litellm/minimax-3` provider-prefixed model id.
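
The bridge routing rule from section 2 (slugs starting `anthropic/` go to Anthropic direct, everything else to the gateway) could be sketched roughly as follows. This is not the actual `lib/llm.ts`: `resolveTarget`, the `Target` shape, stripping the prefix before the direct call, and the Anthropic base URL are assumptions made for illustration. Only the prefix rule, the env var names, and the `minimax-3` default come from this document.

```typescript
// Hypothetical sketch of the bridge's slug routing; not the real lib/llm.ts.
type Target = {
  provider: "anthropic" | "gateway";
  baseUrl: string;
  model: string;
};

const ANTHROPIC_PREFIX = "anthropic/";

function resolveTarget(
  slug: string | undefined,
  env: Record<string, string | undefined>,
): Target {
  // Fall back to TIGER_ROUTER_MODEL, then to the documented default.
  const model = slug ?? env.TIGER_ROUTER_MODEL ?? "minimax-3";

  if (model.startsWith(ANTHROPIC_PREFIX)) {
    return {
      provider: "anthropic",
      baseUrl: "https://api.anthropic.com", // assumption: direct API host
      model: model.slice(ANTHROPIC_PREFIX.length), // assumption: prefix is stripped
    };
  }

  // Everything else goes to the self-hosted LiteLLM gateway.
  const baseUrl = env.LLM_GATEWAY_URL;
  if (!baseUrl) {
    throw new Error("LLM_GATEWAY_URL is not set");
  }
  return { provider: "gateway", baseUrl, model };
}
```

Failing loudly on a missing `LLM_GATEWAY_URL` is a deliberate choice in this sketch: a silent fallback to some other provider is the kind of failure the OpenRouter removal was meant to eliminate.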
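
The claim step of the inbox loop in section 4 (take the first `- [ ]` line under the INBOX heading and rewrite it with the run marker) might look something like the sketch below. `claimFirstInboxTask` and the `Claim` type are hypothetical names, and keeping the task text after the marker is an assumption; the document only specifies the `- [⏳ run-id → agent]` prefix. Classification and spawning are left out.

```typescript
// Hypothetical sketch of the inbox claim step; not the real lib/inbox.ts.
interface Claim {
  task: string; // text of the claimed task
  markdown: string; // TASKS.md content with the line rewritten
}

function claimFirstInboxTask(
  markdown: string,
  runId: string,
  agent: string,
): Claim | null {
  const lines = markdown.split("\n");
  let inInbox = false;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";

    if (line.startsWith("## ")) {
      // Enter the inbox at its heading, leave it at the next level-2 heading.
      inInbox = line.includes("INBOX");
      continue;
    }
    if (!inInbox) continue;

    // Only unchecked items qualify; "- [x]" and already-claimed lines do not match.
    const match = /^(\s*)- \[ \] (.+)$/.exec(line);
    if (!match) continue;

    const indent = match[1] ?? "";
    const task = match[2] ?? "";
    // Assumption: the task text is kept after the run marker.
    lines[i] = `${indent}- [⏳ ${runId} → ${agent}] ${task}`;
    return { task, markdown: lines.join("\n") };
  }
  return null; // empty inbox: nothing to spawn, no model call needed
}
```

Returning `null` for an empty inbox is what lets a bridge-side scheduler skip the check without spending any model tokens.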
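
The read-time merge behind the audit trail in section 6 can be illustrated with an in-memory sketch. The `AuditEvent` shape, the `mergeAudit` name, the numeric timestamp cursor, and the default page size are all assumptions; the real endpoint reads from the `executions`, `tasks`, and `outputs` stores and the cron run JSONL, and its cursor format is not described here.

```typescript
// Hypothetical sketch of read-time merging with a `before` cursor.
type AuditType = "execution" | "task" | "output" | "cron";

interface AuditEvent {
  id: string;
  type: AuditType;
  at: number; // epoch milliseconds (assumed cursor field)
  summary: string;
}

interface AuditQuery {
  before?: number; // return only events strictly older than this
  types?: AuditType[]; // optional type filter
  limit?: number; // page size (assumed default: 50)
}

interface AuditPage {
  events: AuditEvent[];
  nextBefore: number | null; // cursor for the next page, null when exhausted
}

function mergeAudit(sources: AuditEvent[][], query: AuditQuery = {}): AuditPage {
  const before = query.before ?? Infinity;
  const limit = query.limit ?? 50;
  const types = query.types;

  const all: AuditEvent[] = ([] as AuditEvent[]).concat(...sources);
  const matching = all
    .filter((e) => e.at < before && (!types || types.indexOf(e.type) !== -1))
    .sort((a, b) => b.at - a.at); // newest first

  const events = matching.slice(0, limit);
  const last = events[events.length - 1];
  // Limitation of this sketch: events sharing the boundary timestamp
  // would be skipped by the strict `<` comparison on the next page.
  const nextBefore = matching.length > limit && last ? last.at : null;
  return { events, nextBefore };
}
```

Because nothing is written at action time, adding a new store to `sources` makes its full history appear in the trail immediately, which is the retroactive-completeness property described in section 6.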