diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..f91e052 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,292 @@ +# Tiger Command Center — Architecture + +*Last updated: 2026-05-03. Covers all services through the hardening session.* + +--- + +## 1. System Overview + +Self-hosted AI agent orchestration on a Hetzner VPS (77.42.82.225, 8 GB RAM, Helsinki). +Three host services + one containerised AI runtime behind Traefik. + +Topology: + +``` +Internet/Manohar + | HTTPS 443 + v +dokploy-traefik (v3.6.7) + | + +-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100) + | | + | tiger-bridge (Express, :3456, 127.0.0.1 only) + | | docker exec + | tiger-openclaw (OpenClaw v2026.3.12) + | | + | MiniMax-M2.7 -> openrouter/auto -> trinity:free + | +Telegram @Tiger_4321_bot <-- /tiger/notify <-- Tiger agent +``` + +--- + +## 2. Services + +### 2.1 tiger-openclaw (Docker container) + +| Property | Value | +|----------|-------| +| Image | ghcr.io/openclaw/openclaw:2026.3.12 | +| Container | tiger-openclaw | +| User | node (uid=1000) | +| Config | /home/node/.openclaw/openclaw.json | +| Workspace | /home/node/.openclaw/workspace/ | +| Volumes | tiger-config, tiger-workspace | +| Bind mount | /root/OpenClawDashboard -> /home/node/dashboard:rw | +| Compose | /opt/tiger/docker-compose.yml | + +Agents: Tiger (orchestrator), Cody (coder), Ethan (researcher), Cathy (writer), Elon (PM). + +Model chain (agents.defaults.model in openclaw.json): + primary : minimax/MiniMax-M2.7 + fallback1: openrouter/auto + fallback2: openrouter/arcee-ai/trinity-large-preview:free (free - billing safety net) + +Cron jobs (cron/jobs.json): + Tiger: Hourly Task Check-in 0 * * * * IST 90s timeout + Tiger: Weekly Digest 0 9 * * 1 IST 90s timeout + +Both use delivery.mode="none" — they notify via curl to /tiger/notify, not OpenClaw delivery channel. + "none" = no channel opened at all (correct: cron delivers via curl) + "silent" = suppresses chat display but still opens the channel (wrong model for cron) + +### 2.2 tiger-bridge (systemd: tiger-bridge.service) + + Language : TypeScript/Express -> bridge/dist/ + Port : 3456, 127.0.0.1 only (UFW blocks public access) + Source : /root/OpenClawDashboard/bridge/src/ + Auth : Authorization: Bearer TIGER_BRIDGE_TOKEN (all routes) + SQLite : /root/OpenClawDashboard/bridge/tiger.db + Tables : tasks, projects, messages (chat history), agents + +Token shared with: dashboard (server-side only), Tiger cron curl commands, Tiger env var. + +### 2.3 tiger-dashboard (systemd: tiger-dashboard.service) + + Framework : Next.js 14, App Router + Port : 3100 + URL : agent.manohargupta.com (via Traefik) + Source : /root/OpenClawDashboard/dashboard/src/ + WorkingDir : /root/OpenClawDashboard/dashboard + +All API calls are server-side route handlers — bearer token never reaches the browser. + +Build discipline: NEVER run npm run build while next start is live. +In-memory and on-disk manifests split-brain -> ChunkLoadError in browser. Correct: + systemctl stop tiger-dashboard + npm run build + systemctl start tiger-dashboard + +### 2.4 Traefik (dokploy-traefik v3.6.7) + +File provider: /etc/dokploy/traefik/dynamic/ (host = container path, live reload). +One .yml file per service. No restart needed on edits. + +BasicAuth: single $ in bcrypt hash in YAML (not $$ — that is Docker label syntax). +Generate: htpasswd -nbB manohar 'password' + +UFW FORWARD — use subnet rules, not specific IPs (bridge IP changes on Traefik restart): + ufw route allow proto tcp from any to 172.17.0.0/16 port 80 + ufw route allow proto tcp from any to 172.17.0.0/16 port 443 + +--- + +## 3. Full API Surface (40+ routes, all Bearer-token protected) + +### Health + GET /tiger/status container health, memory/CPU + GET /tiger/logs SSE stream of container logs + +### Config + GET /tiger/config read openclaw.json + POST /tiger/config update openclaw.json + GET /tiger/config/models list LLM providers + models + GET /tiger/config/models/agents per-agent model overrides + PATCH /tiger/config/models/agents/:id update agent model + +### File-Backed Tasks and Projects (canonical source of truth) + GET /tiger/file-tasks TASKS.md JSON block -> tasks[] + GET /tiger/file-tasks/active in-progress + pending-action only + GET /tiger/file-tasks/completed completed section only + GET /tiger/file-tasks/projects PROJECTS.md JSON block -> projects[] + + Parser contract: TASKS.md must contain a fenced json TASKS block at end-of-file. + Absent -> 502 "TASKS.md missing TASKS json block". No regex fallback. + Tiger always emits this block on every TASKS.md write. + +### SQLite Tasks and Projects (legacy, used for dispatch queue) + GET /tiger/tasks list tasks + GET /tiger/tasks/:id get task + PUT /tiger/tasks/:id update task + DELETE /tiger/tasks/:id delete task + POST /tiger/tasks/:id/execute enqueue for execution + GET /tiger/projects list projects + POST /tiger/projects create project + GET /tiger/projects/:id get project + PUT /tiger/projects/:id update project + DELETE /tiger/projects/:id delete project + GET /tiger/projects/:id/tasks tasks in project + POST /tiger/projects/:id/tasks add task to project + +### Agents and Workspace + GET /tiger/agents list configured agents + GET /tiger/agents/:id/files list agent workspace files + GET /tiger/agents/:id/file read specific agent file + PUT /tiger/agents/:id/file write agent file + GET /tiger/agents/activity recent agent activity log + GET /tiger/workspace list workspace root files + GET /tiger/files/:path read workspace file by path + +### Chat (SSE streaming) + POST /tiger/chat SSE stream chat -> Tiger agent + GET /tiger/chat/history recent messages (SQLite) + DELETE /tiger/chat/history clear history + POST /tiger/chat/persist persist message to SQLite + + Shell safety: tempfile pattern (not string interpolation): + Write message -> /tmp/msg_ts.txt + docker cp /tmp/msg.txt tiger-openclaw:/tmp/msg.txt + docker exec openclaw agent -m "$(cat /tmp/msg.txt)" + +### Dispatch + POST /tiger/dispatch enqueue task -> SQLite + agent inbox file + GET /tiger/dispatch/status/:id poll execution status + +### Cron + GET /tiger/cron list jobs.json + POST /tiger/cron/:id/run fire job manually + +### Notifications and Routing + POST /tiger/notify send Telegram msg {message, chatId?} + POST /tiger/route-task LLM router: which agent handles this? + +### Keys + GET /tiger/keys presence map only (no values returned) + PATCH /tiger/keys upsert a key + DELETE /tiger/keys/:name remove a key + +### Ops + POST /tiger/exec run command in container (auth-gated) + POST /tiger/restart restart tiger-openclaw + POST /tiger/deploy-dashboard git pull + build + restart dashboard + ALL /api/gateway proxy to OpenClaw gateway port 18789 + +--- + +## 4. Data Flows + +### Chat Message + + Browser -> POST /tiger/chat (SSE) + bridge writes message -> /tmp/msg_ts.txt + docker cp -> tiger-openclaw:/tmp/msg_ts.txt + docker exec openclaw agent --session-id id -m "$(cat /tmp/msg.txt)" + OpenClaw -> MiniMax (or fallback chain) + SSE tokens -> bridge -> browser + POST /tiger/chat/persist -> SQLite messages + +### Cron Job Notification + + OpenClaw cron (hourly, IST) + Tiger reads TASKS.md from workspace + if active tasks: + curl POST http://172.17.0.1:3456/tiger/notify + Authorization: Bearer TOKEN + body: {message: status update} + bridge -> Telegram Bot API -> @Tiger_4321_bot -> Manohar + if HEARTBEAT_OK: + nothing sent + +--- + +## 5. Failure Modes + +| Scenario | What happens | Recovery | +|----------|-------------|----------| +| MiniMax timeout >90s | Falls to openrouter/auto | Automatic | +| OpenRouter billing error | Falls to trinity-large:free | Automatic | +| All LLMs fail | Chat 500; cron errors | Check /tiger/keys; top up credits | +| tiger-openclaw dies | 500 on exec routes | docker restart tiger-openclaw | +| Bridge EADDRINUSE | systemd restart fails (stale nohup) | pkill -f node.*dist/index then start | +| SQLite locked | Dispatch write contention | Retryable; rare | +| ChunkLoadError | Build ran while next start was live | systemctl restart tiger-dashboard | +| Traefik bridge IP change | UFW FORWARD drops traffic | Use subnet rules not specific IPs | +| TASKS.md missing JSON block | /tiger/file-tasks returns 502 | Tiger rewrites TASKS.md | + +--- + +## 6. Deploy Workflow + + On Mac: + cd ~/MyProjects/NemoClawDashboard + npm run build # preflight: catch errors locally first + git add -p # atomic commits, no git add -A + git push origin main + + On server (scripts/deploy.sh): + cd /root/OpenClawDashboard && git pull + cd bridge && npx tsc --noEmit && npm run build + systemctl restart tiger-bridge + cd ../dashboard + systemctl stop tiger-dashboard + npm run build + systemctl start tiger-dashboard + bash /root/OpenClawDashboard/scripts/smoke-test.sh + + Mutagen: pause before server-side edits, resume after verifying build. + Bind-mount perms: chown -R 1000:1000 /root/OpenClawDashboard + +--- + +## 7. File Layout + + /root/OpenClawDashboard/ canonical source (has .git) + /root/NemoClawDashboard/ HOLLOW / WRONG -- never use + ~/MyProjects/NemoClawDashboard Mac-side Mutagen source + + bridge/src/ + index.ts entry point; full route list in file header comment + auth.ts bearer token middleware + tiger.ts docker exec wrapper; SSH prefix for local dev + db.ts SQLite schema + helpers + lib/llm.ts LLM routing + model fallback chain + lib/telegram.ts Telegram Bot API client (tempfile pattern) + routes/ one file per route group (40+ routes) + + dashboard/src/ + app/ Next.js App Router pages + components/ React components + + scripts/smoke-test.sh run after every deploy + ARCHITECTURE.md this file + + /opt/tiger/docker-compose.yml OpenClaw container definition + + /var/lib/docker/volumes/tiger_tiger-config/_data/ + openclaw.json live config + *.bak.json auto-backups (keep latest 3) + cron/jobs.json cron job definitions + +--- + +## 8. Security Posture + + UFW: 22, 80, 443 open publicly. + 3456 (bridge) only from Docker bridge subnets. + 3000 (Dokploy), 3100 (dashboard) not directly exposed -- only via Traefik. + + Bearer token: 64-char hex. Never logged, never sent to browser. Rotate via bridge/.env. + Traefik BasicAuth: bcrypt, single $ in YAML files. Realm: Tiger Command Center. + OpenClaw gateway: bind: lan (Docker bridge only). Token in openclaw.json. + /tiger/exec: auth-gated. Arbitrary command execution requires bearer token. + /tiger/keys GET: presence map only. Key values never returned by any endpoint.