docs: rewrite README + ARCHITECTURE for 2026-06-10 reality, extend TOOLS

ARCHITECTURE was last true on 2026-05-03 (pre-gateway, OpenRouter chains,
webhook mirror). Now documents: LiteLLM gateway routing, real spawning,
inbox loop, transcript mirror, audit trail, token rotation procedure,
RAM constraints. README no longer says 'Clawd Dashboard'.
Manohar 2026-06-10 14:59:41 +00:00
parent 0fcc209020
commit 50a6520c20
3 changed files with 165 additions and 330 deletions


@@ -1,15 +1,17 @@
# Tiger Command Center — Architecture
*Last updated: 2026-05-03. Covers all services through the hardening session.*
*Last updated: 2026-06-10. Covers the gateway migration, real sub-agent
spawning, the TASKS.md inbox loop, the Telegram transcript mirror, and the
unified audit trail.*
---
## 1. System Overview
Self-hosted AI agent orchestration on a Hetzner VPS (77.42.82.225, 8 GB RAM, Helsinki).
Three host services + one containerised AI runtime behind Traefik.
Topology:
Self-hosted AI agent orchestration on a Hetzner VPS (8 GB RAM, Helsinki;
Tailscale 100.75.128.45). Three host services + one containerised AI
runtime behind Traefik, with ALL model traffic routed through a self-hosted
LiteLLM gateway — no third-party balance can silently kill the system.
```
Internet/Manohar
@@ -18,275 +20,120 @@ Internet/Manohar
dokploy-traefik (v3.6.7)
|
+-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100)
| |
| tiger-bridge (Express, :3456, 127.0.0.1 only)
| | docker exec
| | /api/* proxies (token server-side)
| v
| tiger-bridge (Express+tsx, :3456, localhost)
| | docker exec / volume reads
| v
| tiger-openclaw (OpenClaw v2026.3.12)
| |
| MiniMax-M2.7 -> openrouter/auto -> trinity:free
+-- llm.manohargupta.com ----> litellm-gateway <-- ALL model calls
| |-- MiniMax API (own key): minimax-3 (primary),
| | minimax-2.7, minimax-2.7-fast
| +-- Anthropic API (own key): claude-haiku, claude-sonnet
|
Telegram @Tiger_4321_bot <-- /tiger/notify <-- Tiger agent
+-- angel.manohargupta.com --> position-tracker (standalone repo/deploy)
|
Telegram @Tiger_4321_bot <--> OpenClaw native channel (long-polling, owns the bot)
```
---
## 2. Model Routing (post-OpenRouter)
## 2. Services
OpenRouter was removed 2026-06-10 after its credits ran dry and silently
broke both Tiger and the bridge's classifier. Everything now goes through
the self-hosted gateway:
### 2.1 tiger-openclaw (Docker container)
- **OpenClaw** (`openclaw.json`): custom provider `litellm`
(`baseUrl: https://llm.manohargupta.com/v1`, `api: openai-completions`).
Primary `litellm/minimax-3` (1M ctx), fallbacks `litellm/minimax-2.7`, then
`litellm/claude-haiku` (cross-provider: survives a MiniMax outage).
- **Bridge** (`lib/llm.ts`): slugs starting `anthropic/` go to Anthropic
direct; everything else goes to the gateway. Env: `LLM_GATEWAY_URL`,
`LLM_GATEWAY_KEY`, `TIGER_ROUTER_MODEL` (default `minimax-3`).
- **Gateway config**: `/root/litellm/litellm_config.yaml`
(`request_timeout: 300` to match the cron budget).
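A minimal sketch of the bridge-side routing rule described above, not the actual contents of `lib/llm.ts`. The `Target` type and the `resolveTarget` name are hypothetical; only the `anthropic/` prefix rule, the env var names, and the `minimax-3` default come from the text. How the slug is then mapped to an Anthropic model name is not covered here.

```typescript
// Hypothetical types and names, for illustration only.
type Target =
  | { kind: "anthropic"; model: string }
  | { kind: "gateway"; baseUrl: string; model: string };

function resolveTarget(
  slug: string | undefined,
  env: Record<string, string | undefined>,
): Target {
  // Fall back to TIGER_ROUTER_MODEL, then to the documented default.
  const model = slug ?? env.TIGER_ROUTER_MODEL ?? "minimax-3";

  // Slugs starting with `anthropic/` bypass the gateway.
  if (model.indexOf("anthropic/") === 0) {
    return { kind: "anthropic", model };
  }

  // Everything else goes to the self-hosted LiteLLM gateway.
  const baseUrl = env.LLM_GATEWAY_URL;
  if (!baseUrl) {
    throw new Error("LLM_GATEWAY_URL is not set");
  }
  return { kind: "gateway", baseUrl, model };
}
```

Failing loudly on a missing `LLM_GATEWAY_URL` is a design choice of this sketch: a silent fallback to a third-party endpoint is the failure mode the gateway migration was meant to remove.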
| Property | Value |
|----------|-------|
| Image | ghcr.io/openclaw/openclaw:2026.3.12 |
| Container | tiger-openclaw |
| User | node (uid=1000) |
| Config | /home/node/.openclaw/openclaw.json |
| Workspace | /home/node/.openclaw/workspace/ |
| Volumes | tiger-config, tiger-workspace |
| Bind mount | /root/OpenClawDashboard -> /home/node/dashboard:rw |
| Compose | /opt/tiger/docker-compose.yml |
## 3. Sub-Agent Execution (the orchestration layer)
Agents: Tiger (orchestrator), Cody (coder), Ethan (researcher), Cathy (writer), Elon (PM).
`bridge/src/lib/agents.ts` is the canonical specialist registry:
**cody** (code), **ethan** (research), **cathy** (writing), **elon** (PM).
Legacy ids coder/researcher/writer/pm are accepted as aliases.
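As an illustration of the alias handling, a registry lookup could look like the sketch below. The names `resolveAgent` and `LEGACY_ALIASES` are hypothetical, and the alias-to-specialist pairing is inferred from the order in which the ids are listed above; `bridge/src/lib/agents.ts` remains the source of truth.

```typescript
type Specialist = "cody" | "ethan" | "cathy" | "elon";

const SPECIALISTS = new Set<string>(["cody", "ethan", "cathy", "elon"]);

// Assumed pairing, inferred from listing order (coder/researcher/writer/pm).
const LEGACY_ALIASES = new Map<string, Specialist>([
  ["coder", "cody"],
  ["researcher", "ethan"],
  ["writer", "cathy"],
  ["pm", "elon"],
]);

// Returns the canonical specialist id, or undefined for unknown ids.
function resolveAgent(id: string): Specialist | undefined {
  const key = id.trim().toLowerCase();
  if (SPECIALISTS.has(key)) {
    return key as Specialist;
  }
  return LEGACY_ALIASES.get(key);
}
```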
Model chain (agents.defaults.model in openclaw.json):
primary : minimax/MiniMax-M2.7
fallback1: openrouter/auto
fallback2: openrouter/arcee-ai/trinity-large-preview:free (free - billing safety net)
A spawn (`POST /tiger/spawn`) runs an isolated OpenClaw session
(`--session-id spawn-<agent>-<id>`) with the specialist persona prepended.
Message transport is docker-cp of a temp file (escaping-proof). Runs are
tracked in the `executions` table and serialized (`MAX_CONCURRENT=1`:
parallel turns push the 8GB host into swap and everything times out).
Completion fires a Telegram notification via `/tiger/notify`.
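The serialization above can be sketched as a promise chain that runs one job at a time. This is an assumed shape, not the code in spawn.ts: `SerialQueue` is a hypothetical name, and the real implementation also records each run in `executions`.

```typescript
// Runs submitted jobs strictly one after another (concurrency of 1).
class SerialQueue {
  // Always-resolved tail of the chain; failures are swallowed here so one
  // failed spawn does not block the jobs queued behind it.
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(job: () => Promise<T>): Promise<T> {
    const result = this.tail.then(job);
    this.tail = result.catch(() => undefined);
    return result; // the caller still sees the job's own success or failure
  }
}
```

A caller would wrap each spawn in `queue.run(() => ...)`, so a second `POST /tiger/spawn` waits for the first turn to finish instead of competing with it for RAM.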
Cron jobs (cron/jobs.json):
Tiger: Hourly Task Check-in 0 * * * * IST 90s timeout
Tiger: Weekly Digest 0 9 * * 1 IST 90s timeout
Upgrade path: define real per-agent entries in `openclaw.json agents.list`
(own IDENTITY.md + workspace each), then change the `--agent` flag in
spawn.ts. Documented in lib/agents.ts; deferred until the RAM situation is
resolved.
Both use delivery.mode="none" — they notify via curl to /tiger/notify, not OpenClaw delivery channel.
"none" = no channel opened at all (correct: cron delivers via curl)
"silent" = suppresses chat display but still opens the channel (wrong model for cron)
## 4. TASKS.md Inbox Loop
### 2.2 tiger-bridge (systemd: tiger-bridge.service)
`workspace/TASKS.md` has a `## 📥 INBOX` section. `bridge/src/lib/inbox.ts`
checks every 30 min (09:00-20:00 IST): takes the first `- [ ]` line,
classifies it (`classifyAgent`), spawns the specialist, rewrites the line to
`- [⏳ run-id → agent]`. Manual trigger: `POST /tiger/inbox/drain`.
Bridge-side scheduling means zero model tokens burned on empty checks and
no bearer tokens embedded in cron prompts.
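The claim step of the loop can be sketched as a pure function over the file contents. This is a hedged illustration rather than the code in `inbox.ts`: `claimFirstInboxItem` is a hypothetical name, the section is located by any `## ` heading containing `INBOX`, and keeping the task text after the rewritten marker is an assumption.

```typescript
interface Claim {
  task: string;    // text of the claimed inbox item
  updated: string; // full TASKS.md contents after the rewrite
}

function claimFirstInboxItem(markdown: string, runId: string, agent: string): Claim | null {
  const lines = markdown.split("\n");
  let start = -1;
  for (let i = 0; i < lines.length; i++) {
    const l = lines[i] ?? "";
    if (l.indexOf("## ") === 0 && l.indexOf("INBOX") !== -1) {
      start = i;
      break;
    }
  }
  if (start === -1) return null;

  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.indexOf("## ") === 0) break; // reached the next section
    const m = /^- \[ \] (.+)$/.exec(line);
    if (m) {
      const task = m[1] ?? "";
      // Mark the item as in flight so the next check skips it.
      lines[i] = `- [⏳ ${runId} → ${agent}] ${task}`;
      return { task, updated: lines.join("\n") };
    }
  }
  return null; // inbox empty: nothing to dispatch, no model call made
}
```

Returning `null` for an empty inbox is what keeps empty checks free: classification and spawning only happen after a claim succeeds.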
Language : TypeScript/Express -> bridge/dist/
Port : 3456, 127.0.0.1 only (UFW blocks public access)
Source : /root/OpenClawDashboard/bridge/src/
Auth : Authorization: Bearer TIGER_BRIDGE_TOKEN (all routes)
SQLite : /root/OpenClawDashboard/bridge/tiger.db
Tables : tasks, projects, messages (chat history), agents
## 5. Telegram
Token shared with: dashboard (server-side only), Tiger cron curl commands, Tiger env var.
- **The bot is owned by OpenClaw's native channel** (long-polling). The
bridge's `TelegramChannel`, `telegram-webhook.ts` and `chat-mirror.ts`
are legacy: Telegram forbids webhook + getUpdates on one token, so the
webhook design could never receive a message.
- **The dashboard mirror reads the native session transcript**:
`routes/chat-telegram.ts` resolves the `telegram:` session from
`sessions.json` and serves the JSONL with cursor pagination and mtime
caching. It filters to what Telegram actually saw: assistant messages
carrying toolCall blocks (working narration) are skipped, thinking blocks
ignored, injected metadata/system boilerplate stripped from user messages.
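A rough sketch of the transcript filter, under an assumed JSONL shape: one JSON object per line with a `role` and a `content` array of typed blocks. The `Block`, `Entry` and `visibleMessages` names are hypothetical and the real OpenClaw transcript schema may differ. Stripping injected metadata from user messages is omitted because its format is not described here.

```typescript
// Assumed transcript shape, for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall" }
  | { type: "thinking"; text?: string };

interface Entry {
  role: "user" | "assistant";
  content: Block[];
}

interface Visible {
  role: Entry["role"];
  text: string;
}

function visibleMessages(jsonl: string): Visible[] {
  const out: Visible[] = [];
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    const entry = JSON.parse(raw) as Entry;

    // Assistant turns that carry tool calls are working narration: skip them.
    if (entry.role === "assistant" && entry.content.some((b) => b.type === "toolCall")) {
      continue;
    }

    // Keep text blocks only; thinking blocks are ignored.
    const texts: string[] = [];
    for (const b of entry.content) {
      if (b.type === "text") texts.push(b.text);
    }
    if (texts.length > 0) out.push({ role: entry.role, text: texts.join("\n") });
  }
  return out;
}
```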
### 2.3 tiger-dashboard (systemd: tiger-dashboard.service)
## 6. Audit Trail
Framework : Next.js 14, App Router
Port : 3100
URL : agent.manohargupta.com (via Traefik)
Source : /root/OpenClawDashboard/dashboard/src/
WorkingDir : /root/OpenClawDashboard/dashboard
`GET /tiger/activity/audit` merges, at read time, every durable action
store: `executions` (spawns), `tasks` (lifecycle), `outputs` (artifacts),
and OpenClaw's cron run JSONL. Cursor-paginated (`before=<ISO>`), type
filters. The dashboard `/activity` page adds recent file-modification
events on the first page. Read-time merging means history is complete
retroactively and no action can happen without its audit row.
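Read-time merging with a `before` cursor can be sketched as follows. This is an assumption-laden illustration: the real endpoint reads SQLite tables and cron JSONL, whereas this sketch takes already-loaded arrays, and `AuditEvent`, `auditPage` and the `limit` option are hypothetical names.

```typescript
interface AuditEvent {
  at: string; // ISO-8601 timestamp
  type: "spawn" | "task" | "output" | "cron";
  summary: string;
}

interface AuditPage {
  items: AuditEvent[];
  nextBefore: string | null; // pass back as `before` to fetch the next page
}

function auditPage(
  sources: AuditEvent[][],
  opts: { before?: string; types?: AuditEvent["type"][]; limit?: number } = {},
): AuditPage {
  const { before, types, limit = 50 } = opts;
  const cutoff = before === undefined ? Infinity : Date.parse(before);

  // Merge every store, filter by cursor and type, newest first.
  const merged: AuditEvent[] = [];
  for (const source of sources) {
    for (const e of source) {
      if (Date.parse(e.at) >= cutoff) continue;
      if (types !== undefined && types.indexOf(e.type) === -1) continue;
      merged.push(e);
    }
  }
  merged.sort((a, b) => Date.parse(b.at) - Date.parse(a.at));

  const items = merged.slice(0, limit);
  const last = items[items.length - 1];
  // A full page suggests more rows may exist; the oldest item becomes the cursor.
  const nextBefore = items.length === limit && last !== undefined ? last.at : null;
  return { items, nextBefore };
}
```

Because nothing is copied into a separate audit table, a row that exists in any source store shows up in the timeline without a backfill step.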
All API calls are server-side route handlers — bearer token never reaches the browser.
## 7. Crons (OpenClaw, tz Asia/Kolkata)
Build discipline: NEVER run npm run build while next start is live.
In-memory and on-disk manifests split-brain -> ChunkLoadError in browser. Correct:
systemctl stop tiger-dashboard
npm run build
systemctl start tiger-dashboard
| Job | Schedule | Timeout |
|---|---|---|
| Trade Baseline Reset | 9:15 daily | 60s |
| Trade P&L Monitor | every 2 min | 60s |
| Hourly Trade Summary + News | hourly | 90s |
| Hourly Task Check-in | 0 9-21 | 300s |
| EOD Trade Summary | 16:00 Mon-Fri | 300s |
| Weekly Digest | Mon 9:00 | 300s |
### 2.4 Traefik (dokploy-traefik v3.6.7)
File provider: /etc/dokploy/traefik/dynamic/ (host = container path, live reload).
One .yml file per service. No restart needed on edits.
BasicAuth: single $ in bcrypt hash in YAML (not $$ — that is Docker label syntax).
Generate: htpasswd -nbB manohar 'password'
UFW FORWARD — use subnet rules, not specific IPs (bridge IP changes on Traefik restart):
ufw route allow proto tcp from any to 172.17.0.0/16 port 80
ufw route allow proto tcp from any to 172.17.0.0/16 port 443
---
## 3. Full API Surface (40+ routes, all Bearer-token protected)
### Health
GET /tiger/status container health, memory/CPU
GET /tiger/logs SSE stream of container logs
### Config
GET /tiger/config read openclaw.json
POST /tiger/config update openclaw.json
GET /tiger/config/models list LLM providers + models
GET /tiger/config/models/agents per-agent model overrides
PATCH /tiger/config/models/agents/:id update agent model
### File-Backed Tasks and Projects (canonical source of truth)
GET /tiger/file-tasks TASKS.md JSON block -> tasks[]
GET /tiger/file-tasks/active in-progress + pending-action only
GET /tiger/file-tasks/completed completed section only
GET /tiger/file-tasks/projects PROJECTS.md JSON block -> projects[]
Parser contract: TASKS.md must contain a fenced json TASKS block at end-of-file.
Absent -> 502 "TASKS.md missing TASKS json block". No regex fallback.
Tiger always emits this block on every TASKS.md write.
### SQLite Tasks and Projects (legacy, used for dispatch queue)
GET /tiger/tasks list tasks
GET /tiger/tasks/:id get task
PUT /tiger/tasks/:id update task
DELETE /tiger/tasks/:id delete task
POST /tiger/tasks/:id/execute enqueue for execution
GET /tiger/projects list projects
POST /tiger/projects create project
GET /tiger/projects/:id get project
PUT /tiger/projects/:id update project
DELETE /tiger/projects/:id delete project
GET /tiger/projects/:id/tasks tasks in project
POST /tiger/projects/:id/tasks add task to project
### Agents and Workspace
GET /tiger/agents list configured agents
GET /tiger/agents/:id/files list agent workspace files
GET /tiger/agents/:id/file read specific agent file
PUT /tiger/agents/:id/file write agent file
GET /tiger/agents/activity recent agent activity log
GET /tiger/workspace list workspace root files
GET /tiger/files/:path read workspace file by path
### Chat (SSE streaming)
POST /tiger/chat SSE stream chat -> Tiger agent
GET /tiger/chat/history recent messages (SQLite)
DELETE /tiger/chat/history clear history
POST /tiger/chat/persist persist message to SQLite
Shell safety: tempfile pattern (not string interpolation):
Write message -> /tmp/msg_ts.txt
docker cp /tmp/msg.txt tiger-openclaw:/tmp/msg.txt
docker exec openclaw agent -m "$(cat /tmp/msg.txt)"
### Dispatch
POST /tiger/dispatch enqueue task -> SQLite + agent inbox file
GET /tiger/dispatch/status/:id poll execution status
### Cron
GET /tiger/cron list jobs.json
POST /tiger/cron/:id/run fire job manually
### Notifications and Routing
POST /tiger/notify send Telegram msg {message, chatId?}
POST /tiger/route-task LLM router: which agent handles this?
### Keys
GET /tiger/keys presence map only (no values returned)
PATCH /tiger/keys upsert a key
DELETE /tiger/keys/:name remove a key
### Ops
POST /tiger/exec run command in container (auth-gated)
POST /tiger/restart restart tiger-openclaw
POST /tiger/deploy-dashboard git pull + build + restart dashboard
ALL /api/gateway proxy to OpenClaw gateway port 18789
---
## 4. Data Flows
### Chat Message
Browser -> POST /tiger/chat (SSE)
bridge writes message -> /tmp/msg_ts.txt
docker cp -> tiger-openclaw:/tmp/msg_ts.txt
docker exec openclaw agent --session-id id -m "$(cat /tmp/msg.txt)"
OpenClaw -> MiniMax (or fallback chain)
SSE tokens -> bridge -> browser
POST /tiger/chat/persist -> SQLite messages
### Cron Job Notification
OpenClaw cron (hourly, IST)
Tiger reads TASKS.md from workspace
if active tasks:
curl POST http://172.17.0.1:3456/tiger/notify
Authorization: Bearer TOKEN
body: {message: status update}
bridge -> Telegram Bot API -> @Tiger_4321_bot -> Manohar
if HEARTBEAT_OK:
nothing sent
---
## 5. Failure Modes
| Scenario | What happens | Recovery |
|----------|-------------|----------|
| MiniMax timeout >90s | Falls to openrouter/auto | Automatic |
| OpenRouter billing error | Falls to trinity-large:free | Automatic |
| All LLMs fail | Chat 500; cron errors | Check /tiger/keys; top up credits |
| tiger-openclaw dies | 500 on exec routes | docker restart tiger-openclaw |
| Bridge EADDRINUSE | systemd restart fails (stale nohup) | pkill -f node.*dist/index then start |
| SQLite locked | Dispatch write contention | Retryable; rare |
| ChunkLoadError | Build ran while next start was live | systemctl restart tiger-dashboard |
| Traefik bridge IP change | UFW FORWARD drops traffic | Use subnet rules not specific IPs |
| TASKS.md missing JSON block | /tiger/file-tasks returns 502 | Tiger rewrites TASKS.md |
---
## 6. Deploy Workflow
On Mac:
cd ~/MyProjects/NemoClawDashboard
npm run build # preflight: catch errors locally first
git add -p # atomic commits, no git add -A
git push origin main
On server (scripts/deploy.sh):
cd /root/OpenClawDashboard && git pull
cd bridge && npx tsc --noEmit && npm run build
systemctl restart tiger-bridge
cd ../dashboard
systemctl stop tiger-dashboard
npm run build
systemctl start tiger-dashboard
bash /root/OpenClawDashboard/scripts/smoke-test.sh
Mutagen: pause before server-side edits, resume after verifying build.
Bind-mount perms: chown -R 1000:1000 /root/OpenClawDashboard
---
## 7. File Layout
/root/OpenClawDashboard/ canonical source (has .git)
/root/NemoClawDashboard/ HOLLOW / WRONG -- never use
~/MyProjects/NemoClawDashboard Mac-side Mutagen source
bridge/src/
index.ts entry point; full route list in file header comment
auth.ts bearer token middleware
tiger.ts docker exec wrapper; SSH prefix for local dev
db.ts SQLite schema + helpers
lib/llm.ts LLM routing + model fallback chain
lib/telegram.ts Telegram Bot API client (tempfile pattern)
routes/ one file per route group (40+ routes)
dashboard/src/
app/ Next.js App Router pages
components/ React components
scripts/smoke-test.sh run after every deploy
ARCHITECTURE.md this file
/opt/tiger/docker-compose.yml OpenClaw container definition
/var/lib/docker/volumes/tiger_tiger-config/_data/
openclaw.json live config
*.bak.json auto-backups (keep latest 3)
cron/jobs.json cron job definitions
---
Timeout budget rationale: agent turns on this RAM-starved host can take
minutes; 300s is the ceiling that made chronically-failing jobs pass.
## 8. Security Posture
UFW: 22, 80, 443 open publicly.
3456 (bridge) only from Docker bridge subnets.
3000 (Dokploy), 3100 (dashboard) not directly exposed -- only via Traefik.
- Bridge: Bearer auth on all routes; token in `bridge/.env` +
`dashboard/.env.local` + embedded in cron payloads (rotate all four
together — `jobs.json` has it twice). Rotated 2026-06-10 after the old
token leaked via a hardcode in `agents-activity.ts` to the public GitHub
mirror. NEVER hardcode tokens in source: this repo mirrors publicly.
- Git: Forgejo (origin, SSH port 2222, key `id_ed25519_forgejo`) + GitHub
mirror. Push both.
- position-tracker binds 127.0.0.1:3457; public access via Traefik at
angel.manohargupta.com.
- Known weak spots: litellm-db password, `/opt/dashboard` fossil with a
stale token, dual Telegram pollers (bridge poller should be disabled).
Bearer token: 64-char hex. Never logged, never sent to browser. Rotate via bridge/.env.
Traefik BasicAuth: bcrypt, single $ in YAML files. Realm: Tiger Command Center.
OpenClaw gateway: bind: lan (Docker bridge only). Token in openclaw.json.
/tiger/exec: auth-gated. Arbitrary command execution requires bearer token.
/tiger/keys GET: presence map only. Key values never returned by any endpoint.
## 9. Known Constraints
- **RAM**: ~13GB workload on 8GB physical; 6+GB swap in steady state. This
is the root cause of historical cron timeouts and the reason spawn
concurrency is 1. Decision pending: evict homelab services vs upgrade.
- OpenClaw v2026.3.12 predates MiniMax-M3, hence the explicit
`litellm/minimax-3` provider-prefixed model id.

README.md

@@ -1,91 +1,60 @@
# Clawd Agent Dashboard
# Tiger Command Center
> A premium, dark-mode "Command Center" for the Clawd AI Agent.
> Self-hosted AI orchestration: one Tiger, four specialists, every action audited.
![Dashboard Preview](https://via.placeholder.com/800x400?text=Clawd+Dashboard+Preview)
The control plane for **Tiger**, an OpenClaw-based AI agent running on a
Hetzner VPS, reachable at `agent.manohargupta.com`. Tiger orchestrates four
specialist sub-agents — **Cody** (code), **Ethan** (research), **Cathy**
(writing), **Elon** (planning) — handles Telegram, watches Angel One
positions, and drains a TASKS.md inbox while you do real work.
## Overview
## What lives here
The **Clawd Dashboard** is a centralized interface designed to monitor and interact with the Clawd AI agent. It provides real-time visibility into the agent's memory, logs, scheduled tasks (cron jobs), and capabilities (skills), all wrapped in a sleek, responsive UI.
| Path | What it is |
|---|---|
| `dashboard/` | Next.js 14 command center UI (`tiger-dashboard`, :3100) |
| `bridge/` | Express control-plane API (`tiger-bridge`, :3456, localhost-only) |
| `skills/` | OpenClaw skills (spawn-delegate, angel-positions, inbox-manager, sys-health, youtube-full) |
| `ARCHITECTURE.md` | The real system map — read this first |
| `TOOLS.md` | Tool/skill quick reference |
## Features
## Core capabilities
- **📊 System Status**: Real-time heartbeat monitoring of the `clawdbot` process.
- **🧠 Memory Management**: View and edit the agent's core memory (`MEMORY.md`) and daily logs.
- **🛠️ Skills Registry**: Browse, edit, and manage the agent's capabilities and MCP tools.
- **⏱️ Cron Jobs**: detailed view and control over scheduled background tasks.
- **💬 Chat Interface**: Integrated chat window to communicate directly with the agent.
- **🌗 Dark Mode**: Built with a "Slate & Violet" aesthetic optimized for low-light environments.
- **Sub-agent spawning**: `POST /tiger/spawn` runs a specialist in an
isolated OpenClaw session; result lands on Telegram. Tracked in `executions`.
- **TASKS.md inbox**: drop `- [ ]` lines under `## 📥 INBOX`; the bridge
dispatches the top item to the right specialist every 30 min (09:00-20:00 IST).
- **Telegram mirror**: the homepage thread reads OpenClaw's native session
transcript: full history, both directions, perfectly in sync.
- **Audit trail**: `/activity` merges spawns, cron runs, task lifecycle,
and outputs into one paginated, filterable timeline.
- **Own model gateway**: every model call routes through
`llm.manohargupta.com` (LiteLLM on own MiniMax/Anthropic keys). Primary:
MiniMax-M3.
## Tech Stack
## Running it
- **Framework**: [Next.js 14](https://nextjs.org/) (App Router)
- **UI Components**: [Shadcn/UI](https://ui.shadcn.com/) (Radix Primitives)
- **Styling**: [Tailwind CSS](https://tailwindcss.com/)
- **Icons**: [Lucide React](https://lucide.dev/)
- **State Management**: [SWR](https://swr.vercel.app/) / React Query
- **Backend**: Next.js API Routes (Serverless)
## Getting Started
### Prerequisites
- Node.js 18+
- npm or pnpm
### Installation
1. Clone the repository:
```bash
git clone https://github.com/manohar6839/clawd-dashboard.git
cd clawd-dashboard
```
2. Install dependencies:
```bash
npm install
cd dashboard && npm install
```
3. Configure Environment:
- Copy example configs:
```bash
cp config/mcporter.example.json config/mcporter.json
cp config/cron.example.json config/cron.json
```
### Running the Dashboard
Start the development server:
Both services are systemd units on the host:
```bash
npm run dashboard
systemctl restart tiger-bridge # Express via tsx — no build step
cd dashboard && npm run build && systemctl restart tiger-dashboard
```
The dashboard will be available at [http://localhost:3000](http://localhost:3000).
Env contracts:
- `bridge/.env`: `TIGER_BRIDGE_TOKEN`, `LLM_GATEWAY_URL`, `LLM_GATEWAY_KEY`,
`TIGER_ROUTER_MODEL`, Telegram credentials
- `dashboard/.env.local`: `TIGER_BRIDGE_URL`, `TIGER_BRIDGE_TOKEN`
## Project Structure
⚠️ The bridge token is also embedded in OpenClaw cron payloads
(`cron/jobs.json`, twice). Rotate all four locations together.
## Git
Forgejo is canonical (`git.manohargupta.com/manohar/OpenClawDashboard`, SSH
port 2222); GitHub (`manohar6839/NemoClawDashboard`) is a **public** mirror —
never commit secrets. Push to both:
```bash
git push origin main && git push github main
```
clawd/
├── .agent/ # Agent self-knowledge & documentation
├── dashboard/ # Next.js Application
│ ├── src/app/ # App Router Pages (Memory, Skills, Cron, Chat)
│ └── src/components/ # Shared UI Components
├── config/ # Agent configuration (Cron, MCP)
├── memory/ # Agent daily logs
├── tools/ # External tool scripts
└── MEMORY.md # Core Agent Memory
```
## Contributing
1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/amazing-feature`).
3. Commit your changes (`git commit -m 'feat: Add amazing feature'`).
4. Push to the branch (`git push origin feature/amazing-feature`).
5. Open a Pull Request.
## License
MIT © [Manohar Air](https://github.com/manohar6839)


@@ -14,3 +14,22 @@
- **Location**: `skills/youtube-full`
- **Capabilities**: Search videos, get transcripts, monitor channels/playlists.
- **Reference**: Read `skills/youtube-full/SKILL.md` for instructions.
## Specialist Delegation
- **Skill**: `spawn-delegate` — hand work to Cody/Ethan/Cathy/Elon via the
bridge; results arrive on Telegram. Read `skills/spawn-delegate/SKILL.md`.
## Trading Positions
- **Skill**: `angel-positions` — read-only live P&L from
`angel.manohargupta.com/api/positions`. Never executes trades.
## Task Inbox
- **Skill**: `inbox-manager` — add/list/drain `## 📥 INBOX` items in
TASKS.md; the bridge auto-dispatches the top item every 30 min.
## System Health
- **Skill**: `sys-health` — host RAM/swap from `/proc/meminfo`, LLM gateway
liveness, bridge status, recent audit events.