ARCHITECTURE was last true on 2026-05-03 (pre-gateway, OpenRouter chains, webhook mirror). Now documents: LiteLLM gateway routing, real spawning, inbox loop, transcript mirror, audit trail, token rotation procedure, RAM constraints. README no longer says 'Clawd Dashboard'.
139 lines
6.3 KiB
Markdown
139 lines
6.3 KiB
Markdown
# Tiger Command Center — Architecture
|
||
|
||
*Last updated: 2026-06-10. Covers the gateway migration, real sub-agent
|
||
spawning, the TASKS.md inbox loop, the Telegram transcript mirror, and the
|
||
unified audit trail.*
|
||
|
||
---
|
||
|
||
## 1. System Overview
|
||
|
||
Self-hosted AI agent orchestration on a Hetzner VPS (8 GB RAM, Helsinki;
|
||
Tailscale 100.75.128.45). Three host services + one containerised AI
|
||
runtime behind Traefik, with ALL model traffic routed through a self-hosted
|
||
LiteLLM gateway — no third-party balance can silently kill the system.
|
||
|
||
```
|
||
Internet/Manohar
|
||
| HTTPS 443
|
||
v
|
||
dokploy-traefik (v3.6.7)
|
||
|
|
||
+-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100)
|
||
| | /api/* proxies (token server-side)
|
||
| v
|
||
| tiger-bridge (Express+tsx, :3456, localhost)
|
||
| | docker exec / volume reads
|
||
| v
|
||
| tiger-openclaw (OpenClaw v2026.3.12)
|
||
| |
|
||
+-- llm.manohargupta.com ----> litellm-gateway <-- ALL model calls
|
||
| |-- MiniMax API (own key): minimax-3 (primary),
|
||
| | minimax-2.7, minimax-2.7-fast
|
||
| +-- Anthropic API (own key): claude-haiku, claude-sonnet
|
||
|
|
||
+-- angel.manohargupta.com --> position-tracker (standalone repo/deploy)
|
||
|
|
||
Telegram @Tiger_4321_bot <--> OpenClaw native channel (long-polling, owns the bot)
|
||
```
|
||
|
||
## 2. Model Routing (post-OpenRouter)
|
||
|
||
OpenRouter was removed 2026-06-10 after its credits ran dry and silently
|
||
broke both Tiger and the bridge's classifier. Everything now goes through
|
||
the self-hosted gateway:
|
||
|
||
- **OpenClaw** (`openclaw.json`): custom provider `litellm`
|
||
(`baseUrl: https://llm.manohargupta.com/v1`, `api: openai-completions`).
|
||
Primary `litellm/minimax-3` (1M ctx), fallbacks `litellm/minimax-2.7` →
|
||
`litellm/claude-haiku` (cross-provider: survives a MiniMax outage).
|
||
- **Bridge** (`lib/llm.ts`): slugs starting `anthropic/` go to Anthropic
|
||
direct; everything else goes to the gateway. Env: `LLM_GATEWAY_URL`,
|
||
`LLM_GATEWAY_KEY`, `TIGER_ROUTER_MODEL` (default `minimax-3`).
|
||
- **Gateway config**: `/root/litellm/litellm_config.yaml`
|
||
(`request_timeout: 300` to match the cron budget).
|
||
|
||
## 3. Sub-Agent Execution (the orchestration layer)
|
||
|
||
`bridge/src/lib/agents.ts` is the canonical specialist registry:
|
||
**cody** (code), **ethan** (research), **cathy** (writing), **elon** (PM).
|
||
Legacy ids coder/researcher/writer/pm are accepted as aliases.
|
||
|
||
A spawn (`POST /tiger/spawn`) runs an isolated OpenClaw session
|
||
(`--session-id spawn-<agent>-<id>`) with the specialist persona prepended.
|
||
Message transport is docker-cp of a temp file (escaping-proof). Runs are
|
||
tracked in the `executions` table and serialized (`MAX_CONCURRENT=1` —
|
||
parallel turns push the 8GB host into swap and everything times out).
|
||
Completion fires a Telegram notification via `/tiger/notify`.
|
||
|
||
Upgrade path: define real per-agent entries in `openclaw.json agents.list`
|
||
(own IDENTITY.md + workspace each), then change the `--agent` flag in
|
||
spawn.ts. Documented in lib/agents.ts; deferred until the RAM situation is
|
||
resolved.
|
||
|
||
## 4. TASKS.md Inbox Loop
|
||
|
||
`workspace/TASKS.md` has a `## 📥 INBOX` section. `bridge/src/lib/inbox.ts`
|
||
checks every 30 min (09:00–20:00 IST): takes the first `- [ ]` line,
|
||
classifies it (`classifyAgent`), spawns the specialist, rewrites the line to
|
||
`- [⏳ run-id → agent]`. Manual trigger: `POST /tiger/inbox/drain`.
|
||
Bridge-side scheduling means zero model tokens burned on empty checks and
|
||
no bearer tokens embedded in cron prompts.
|
||
|
||
## 5. Telegram
|
||
|
||
- **The bot is owned by OpenClaw's native channel** (long-polling). The
|
||
bridge's `TelegramChannel`, `telegram-webhook.ts` and `chat-mirror.ts`
|
||
are legacy: Telegram forbids webhook + getUpdates on one token, so the
|
||
webhook design could never receive a message.
|
||
- **The dashboard mirror reads the native session transcript** —
|
||
`routes/chat-telegram.ts` resolves the `telegram:` session from
|
||
`sessions.json` and serves the JSONL with cursor pagination and mtime
|
||
caching. It filters to what Telegram actually saw: assistant messages
|
||
carrying toolCall blocks (working narration) are skipped, thinking blocks
|
||
ignored, injected metadata/system boilerplate stripped from user messages.
|
||
|
||
## 6. Audit Trail
|
||
|
||
`GET /tiger/activity/audit` merges, at read time, every durable action
|
||
store: `executions` (spawns), `tasks` (lifecycle), `outputs` (artifacts),
|
||
and OpenClaw's cron run JSONL. Cursor-paginated (`before=<ISO>`), type
|
||
filters. The dashboard `/activity` page adds recent file-modification
|
||
events on the first page. Read-time merging means history is complete
|
||
retroactively and no action can happen without its audit row.
|
||
|
||
## 7. Crons (OpenClaw, tz Asia/Kolkata)
|
||
|
||
| Job | Schedule | Timeout |
|
||
|---|---|---|
|
||
| Trade Baseline Reset | 9:15 daily | 60s |
|
||
| Trade P&L Monitor | every 2 min | 60s |
|
||
| Hourly Trade Summary + News | hourly | 90s |
|
||
| Hourly Task Check-in | 0 9-21 | 300s |
|
||
| EOD Trade Summary | 16:00 Mon–Fri | 300s |
|
||
| Weekly Digest | Mon 9:00 | 300s |
|
||
|
||
Timeout budget rationale: agent turns on this RAM-starved host can take
|
||
minutes; 300s is the ceiling that made chronically-failing jobs pass.
|
||
|
||
## 8. Security Posture
|
||
|
||
- Bridge: Bearer auth on all routes; token in `bridge/.env` +
|
||
`dashboard/.env.local` + embedded in cron payloads (rotate all four
|
||
together — `jobs.json` has it twice). Rotated 2026-06-10 after the old
|
||
token leaked via a hardcode in `agents-activity.ts` to the public GitHub
|
||
mirror. NEVER hardcode tokens in source: this repo mirrors publicly.
|
||
- Git: Forgejo (origin, SSH port 2222, key `id_ed25519_forgejo`) + GitHub
|
||
mirror. Push both.
|
||
- position-tracker binds 127.0.0.1:3457; public access via Traefik at
|
||
angel.manohargupta.com.
|
||
- Known weak spots: litellm-db password, `/opt/dashboard` fossil with a
|
||
stale token, dual Telegram pollers (bridge poller should be disabled).
|
||
|
||
## 9. Known Constraints
|
||
|
||
- **RAM**: ~13GB workload on 8GB physical; 6+GB swap in steady state. This
|
||
is the root cause of historical cron timeouts and the reason spawn
|
||
concurrency is 1. Decision pending: evict homelab services vs upgrade.
|
||
- OpenClaw v2026.3.12 predates MiniMax-M3, hence the explicit
|
||
`litellm/minimax-3` provider-prefixed model id.
|