diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index f91e052..a266c26 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -1,15 +1,17 @@
 # Tiger Command Center — Architecture
 
-*Last updated: 2026-05-03. Covers all services through the hardening session.*
+*Last updated: 2026-06-10. Covers the gateway migration, real sub-agent
+spawning, the TASKS.md inbox loop, the Telegram transcript mirror, and the
+unified audit trail.*
 
 ---
 
 ## 1. System Overview
 
-Self-hosted AI agent orchestration on a Hetzner VPS (77.42.82.225, 8 GB RAM, Helsinki).
-Three host services + one containerised AI runtime behind Traefik.
-
-Topology:
+Self-hosted AI agent orchestration on a Hetzner VPS (8 GB RAM, Helsinki;
+Tailscale 100.75.128.45). Three host services + one containerised AI
+runtime behind Traefik, with ALL model traffic routed through a self-hosted
+LiteLLM gateway — no third-party balance can silently kill the system.
 
 ```
 Internet/Manohar
@@ -18,275 +20,120 @@ Internet/Manohar
 dokploy-traefik (v3.6.7)
   |
   +-- agent.manohargupta.com --> tiger-dashboard (Next.js, :3100)
-  |                              |
-  |                              tiger-bridge (Express, :3456, 127.0.0.1 only)
-  |                              |  docker exec
+  |                              | /api/* proxies (token server-side)
+  |                              v
+  |                              tiger-bridge (Express+tsx, :3456, localhost)
+  |                              | docker exec / volume reads
+  |                              v
   |                              tiger-openclaw (OpenClaw v2026.3.12)
   |                              |
-  |                              MiniMax-M2.7 -> openrouter/auto -> trinity:free
+  +-- llm.manohargupta.com ----> litellm-gateway   <-- ALL model calls
+  |                              |-- MiniMax API (own key): minimax-3 (primary),
+  |                              |     minimax-2.7, minimax-2.7-fast
+  |                              +-- Anthropic API (own key): claude-haiku, claude-sonnet
   |
-Telegram @Tiger_4321_bot <-- /tiger/notify <-- Tiger agent
+  +-- angel.manohargupta.com --> position-tracker (standalone repo/deploy)
+  |
+Telegram @Tiger_4321_bot <--> OpenClaw native channel (long-polling, owns the bot)
 ```
 
----
+## 2. Model Routing (post-OpenRouter)
 
-## 2. Services
+OpenRouter was removed 2026-06-10 after its credits ran dry and silently
+broke both Tiger and the bridge's classifier. Everything now goes through
+the self-hosted gateway:
 
-### 2.1 tiger-openclaw (Docker container)
+- **OpenClaw** (`openclaw.json`): custom provider `litellm`
+  (`baseUrl: https://llm.manohargupta.com/v1`, `api: openai-completions`).
+  Primary `litellm/minimax-3` (1M ctx), fallbacks `litellm/minimax-2.7` →
+  `litellm/claude-haiku` (cross-provider: survives a MiniMax outage).
+- **Bridge** (`lib/llm.ts`): slugs starting `anthropic/` go to Anthropic
+  direct; everything else goes to the gateway. Env: `LLM_GATEWAY_URL`,
+  `LLM_GATEWAY_KEY`, `TIGER_ROUTER_MODEL` (default `minimax-3`).
+- **Gateway config**: `/root/litellm/litellm_config.yaml`
+  (`request_timeout: 300` to match the cron budget).
 
-| Property | Value |
-|----------|-------|
-| Image | ghcr.io/openclaw/openclaw:2026.3.12 |
-| Container | tiger-openclaw |
-| User | node (uid=1000) |
-| Config | /home/node/.openclaw/openclaw.json |
-| Workspace | /home/node/.openclaw/workspace/ |
-| Volumes | tiger-config, tiger-workspace |
-| Bind mount | /root/OpenClawDashboard -> /home/node/dashboard:rw |
-| Compose | /opt/tiger/docker-compose.yml |
+## 3. Sub-Agent Execution (the orchestration layer)
 
-Agents: Tiger (orchestrator), Cody (coder), Ethan (researcher), Cathy (writer), Elon (PM).
+`bridge/src/lib/agents.ts` is the canonical specialist registry:
+**cody** (code), **ethan** (research), **cathy** (writing), **elon** (PM).
+Legacy ids coder/researcher/writer/pm are accepted as aliases.
 
-Model chain (agents.defaults.model in openclaw.json):
-  primary  : minimax/MiniMax-M2.7
-  fallback1: openrouter/auto
-  fallback2: openrouter/arcee-ai/trinity-large-preview:free  (free - billing safety net)
+A spawn (`POST /tiger/spawn`) runs an isolated OpenClaw session
+(`--session-id spawn--`) with the specialist persona prepended.
+Message transport is docker-cp of a temp file (escaping-proof). Runs are
+tracked in the `executions` table and serialized (`MAX_CONCURRENT=1` —
+parallel turns push the 8GB host into swap and everything times out).
+Completion fires a Telegram notification via `/tiger/notify`.
 
-Cron jobs (cron/jobs.json):
-  Tiger: Hourly Task Check-in   0 * * * *  IST  90s timeout
-  Tiger: Weekly Digest          0 9 * * 1  IST  90s timeout
+Upgrade path: define real per-agent entries in `openclaw.json agents.list`
+(own IDENTITY.md + workspace each), then change the `--agent` flag in
+spawn.ts. Documented in lib/agents.ts; deferred until the RAM situation is
+resolved.
 
-Both use delivery.mode="none" — they notify via curl to /tiger/notify, not OpenClaw delivery channel.
-  "none"   = no channel opened at all (correct: cron delivers via curl)
-  "silent" = suppresses chat display but still opens the channel (wrong model for cron)
+## 4. TASKS.md Inbox Loop
 
-### 2.2 tiger-bridge (systemd: tiger-bridge.service)
+`workspace/TASKS.md` has a `## 📥 INBOX` section. `bridge/src/lib/inbox.ts`
+checks every 30 min (09:00–20:00 IST): takes the first `- [ ]` line,
+classifies it (`classifyAgent`), spawns the specialist, rewrites the line to
+`- [⏳ run-id → agent]`. Manual trigger: `POST /tiger/inbox/drain`.
+Bridge-side scheduling means zero model tokens burned on empty checks and
+no bearer tokens embedded in cron prompts.
 
-  Language : TypeScript/Express -> bridge/dist/
-  Port     : 3456, 127.0.0.1 only (UFW blocks public access)
-  Source   : /root/OpenClawDashboard/bridge/src/
-  Auth     : Authorization: Bearer TIGER_BRIDGE_TOKEN (all routes)
-  SQLite   : /root/OpenClawDashboard/bridge/tiger.db
-  Tables   : tasks, projects, messages (chat history), agents
+## 5. Telegram
 
-Token shared with: dashboard (server-side only), Tiger cron curl commands, Tiger env var.
+- **The bot is owned by OpenClaw's native channel** (long-polling). The
+  bridge's `TelegramChannel`, `telegram-webhook.ts` and `chat-mirror.ts`
+  are legacy: Telegram forbids webhook + getUpdates on one token, so the
+  webhook design could never receive a message.
+- **The dashboard mirror reads the native session transcript** —
+  `routes/chat-telegram.ts` resolves the `telegram:` session from
+  `sessions.json` and serves the JSONL with cursor pagination and mtime
+  caching. It filters to what Telegram actually saw: assistant messages
+  carrying toolCall blocks (working narration) are skipped, thinking blocks
+  ignored, injected metadata/system boilerplate stripped from user messages.
 
-### 2.3 tiger-dashboard (systemd: tiger-dashboard.service)
+## 6. Audit Trail
 
-  Framework  : Next.js 14, App Router
-  Port       : 3100
-  URL        : agent.manohargupta.com (via Traefik)
-  Source     : /root/OpenClawDashboard/dashboard/src/
-  WorkingDir : /root/OpenClawDashboard/dashboard
+`GET /tiger/activity/audit` merges, at read time, every durable action
+store: `executions` (spawns), `tasks` (lifecycle), `outputs` (artifacts),
+and OpenClaw's cron run JSONL. Cursor-paginated (`before=`), type
+filters. The dashboard `/activity` page adds recent file-modification
+events on the first page. Read-time merging means history is complete
+retroactively and no action can happen without its audit row.
 
-All API calls are server-side route handlers — bearer token never reaches the browser.
+## 7. Crons (OpenClaw, tz Asia/Kolkata)
 
-Build discipline: NEVER run npm run build while next start is live.
-In-memory and on-disk manifests split-brain -> ChunkLoadError in browser. Correct:
-  systemctl stop tiger-dashboard
-  npm run build
-  systemctl start tiger-dashboard
+| Job | Schedule | Timeout |
+|---|---|---|
+| Trade Baseline Reset | 9:15 daily | 60s |
+| Trade P&L Monitor | every 2 min | 60s |
+| Hourly Trade Summary + News | hourly | 90s |
+| Hourly Task Check-in | 0 9-21 | 300s |
+| EOD Trade Summary | 16:00 Mon–Fri | 300s |
+| Weekly Digest | Mon 9:00 | 300s |
 
-### 2.4 Traefik (dokploy-traefik v3.6.7)
-
-File provider: /etc/dokploy/traefik/dynamic/ (host = container path, live reload).
-One .yml file per service. No restart needed on edits.
-
-BasicAuth: single $ in bcrypt hash in YAML (not $$ — that is Docker label syntax).
-Generate: htpasswd -nbB manohar 'password'
-
-UFW FORWARD — use subnet rules, not specific IPs (bridge IP changes on Traefik restart):
-  ufw route allow proto tcp from any to 172.17.0.0/16 port 80
-  ufw route allow proto tcp from any to 172.17.0.0/16 port 443
-
----
-
-## 3. Full API Surface (40+ routes, all Bearer-token protected)
-
-### Health
-  GET    /tiger/status                       container health, memory/CPU
-  GET    /tiger/logs                         SSE stream of container logs
-
-### Config
-  GET    /tiger/config                       read openclaw.json
-  POST   /tiger/config                       update openclaw.json
-  GET    /tiger/config/models                list LLM providers + models
-  GET    /tiger/config/models/agents         per-agent model overrides
-  PATCH  /tiger/config/models/agents/:id     update agent model
-
-### File-Backed Tasks and Projects (canonical source of truth)
-  GET    /tiger/file-tasks                   TASKS.md JSON block -> tasks[]
-  GET    /tiger/file-tasks/active            in-progress + pending-action only
-  GET    /tiger/file-tasks/completed         completed section only
-  GET    /tiger/file-tasks/projects          PROJECTS.md JSON block -> projects[]
-
-  Parser contract: TASKS.md must contain a fenced json TASKS block at end-of-file.
-  Absent -> 502 "TASKS.md missing TASKS json block". No regex fallback.
-  Tiger always emits this block on every TASKS.md write.
-
-### SQLite Tasks and Projects (legacy, used for dispatch queue)
-  GET    /tiger/tasks                        list tasks
-  GET    /tiger/tasks/:id                    get task
-  PUT    /tiger/tasks/:id                    update task
-  DELETE /tiger/tasks/:id                    delete task
-  POST   /tiger/tasks/:id/execute            enqueue for execution
-  GET    /tiger/projects                     list projects
-  POST   /tiger/projects                     create project
-  GET    /tiger/projects/:id                 get project
-  PUT    /tiger/projects/:id                 update project
-  DELETE /tiger/projects/:id                 delete project
-  GET    /tiger/projects/:id/tasks           tasks in project
-  POST   /tiger/projects/:id/tasks           add task to project
-
-### Agents and Workspace
-  GET    /tiger/agents                       list configured agents
-  GET    /tiger/agents/:id/files             list agent workspace files
-  GET    /tiger/agents/:id/file              read specific agent file
-  PUT    /tiger/agents/:id/file              write agent file
-  GET    /tiger/agents/activity              recent agent activity log
-  GET    /tiger/workspace                    list workspace root files
-  GET    /tiger/files/:path                  read workspace file by path
-
-### Chat (SSE streaming)
-  POST   /tiger/chat                         SSE stream chat -> Tiger agent
-  GET    /tiger/chat/history                 recent messages (SQLite)
-  DELETE /tiger/chat/history                 clear history
-  POST   /tiger/chat/persist                 persist message to SQLite
-
-  Shell safety: tempfile pattern (not string interpolation):
-    Write message -> /tmp/msg_ts.txt
-    docker cp /tmp/msg.txt tiger-openclaw:/tmp/msg.txt
-    docker exec openclaw agent -m "$(cat /tmp/msg.txt)"
-
-### Dispatch
-  POST   /tiger/dispatch                     enqueue task -> SQLite + agent inbox file
-  GET    /tiger/dispatch/status/:id          poll execution status
-
-### Cron
-  GET    /tiger/cron                         list jobs.json
-  POST   /tiger/cron/:id/run                 fire job manually
-
-### Notifications and Routing
-  POST   /tiger/notify                       send Telegram msg {message, chatId?}
-  POST   /tiger/route-task                   LLM router: which agent handles this?
-
-### Keys
-  GET    /tiger/keys                         presence map only (no values returned)
-  PATCH  /tiger/keys                         upsert a key
-  DELETE /tiger/keys/:name                   remove a key
-
-### Ops
-  POST   /tiger/exec                         run command in container (auth-gated)
-  POST   /tiger/restart                      restart tiger-openclaw
-  POST   /tiger/deploy-dashboard             git pull + build + restart dashboard
-  ALL    /api/gateway                        proxy to OpenClaw gateway port 18789
-
----
-
-## 4. Data Flows
-
-### Chat Message
-
-  Browser -> POST /tiger/chat (SSE)
-    bridge writes message -> /tmp/msg_ts.txt
-    docker cp -> tiger-openclaw:/tmp/msg_ts.txt
-    docker exec openclaw agent --session-id id -m "$(cat /tmp/msg.txt)"
-    OpenClaw -> MiniMax (or fallback chain)
-    SSE tokens -> bridge -> browser
-    POST /tiger/chat/persist -> SQLite messages
-
-### Cron Job Notification
-
-  OpenClaw cron (hourly, IST)
-    Tiger reads TASKS.md from workspace
-    if active tasks:
-      curl POST http://172.17.0.1:3456/tiger/notify
-        Authorization: Bearer TOKEN
-        body: {message: status update}
-      bridge -> Telegram Bot API -> @Tiger_4321_bot -> Manohar
-    if HEARTBEAT_OK:
-      nothing sent
-
----
-
-## 5. Failure Modes
-
-| Scenario | What happens | Recovery |
-|----------|-------------|----------|
-| MiniMax timeout >90s | Falls to openrouter/auto | Automatic |
-| OpenRouter billing error | Falls to trinity-large:free | Automatic |
-| All LLMs fail | Chat 500; cron errors | Check /tiger/keys; top up credits |
-| tiger-openclaw dies | 500 on exec routes | docker restart tiger-openclaw |
-| Bridge EADDRINUSE | systemd restart fails (stale nohup) | pkill -f node.*dist/index then start |
-| SQLite locked | Dispatch write contention | Retryable; rare |
-| ChunkLoadError | Build ran while next start was live | systemctl restart tiger-dashboard |
-| Traefik bridge IP change | UFW FORWARD drops traffic | Use subnet rules not specific IPs |
-| TASKS.md missing JSON block | /tiger/file-tasks returns 502 | Tiger rewrites TASKS.md |
-
----
-
-## 6. Deploy Workflow
-
-  On Mac:
-    cd ~/MyProjects/NemoClawDashboard
-    npm run build        # preflight: catch errors locally first
-    git add -p           # atomic commits, no git add -A
-    git push origin main
-
-  On server (scripts/deploy.sh):
-    cd /root/OpenClawDashboard && git pull
-    cd bridge && npx tsc --noEmit && npm run build
-    systemctl restart tiger-bridge
-    cd ../dashboard
-    systemctl stop tiger-dashboard
-    npm run build
-    systemctl start tiger-dashboard
-    bash /root/OpenClawDashboard/scripts/smoke-test.sh
-
-  Mutagen: pause before server-side edits, resume after verifying build.
-  Bind-mount perms: chown -R 1000:1000 /root/OpenClawDashboard
-
----
-
-## 7. File Layout
-
-  /root/OpenClawDashboard/        canonical source (has .git)
-  /root/NemoClawDashboard/        HOLLOW / WRONG -- never use
-  ~/MyProjects/NemoClawDashboard  Mac-side Mutagen source
-
-  bridge/src/
-    index.ts         entry point; full route list in file header comment
-    auth.ts          bearer token middleware
-    tiger.ts         docker exec wrapper; SSH prefix for local dev
-    db.ts            SQLite schema + helpers
-    lib/llm.ts       LLM routing + model fallback chain
-    lib/telegram.ts  Telegram Bot API client (tempfile pattern)
-    routes/          one file per route group (40+ routes)
-
-  dashboard/src/
-    app/             Next.js App Router pages
-    components/      React components
-
-  scripts/smoke-test.sh          run after every deploy
-  ARCHITECTURE.md                this file
-
-  /opt/tiger/docker-compose.yml  OpenClaw container definition
-
-  /var/lib/docker/volumes/tiger_tiger-config/_data/
-    openclaw.json    live config
-    *.bak.json       auto-backups (keep latest 3)
-    cron/jobs.json   cron job definitions
-
----
+Timeout budget rationale: agent turns on this RAM-starved host can take
+minutes; 300s is the ceiling that made chronically-failing jobs pass.
 
 ## 8. Security Posture
 
-  UFW: 22, 80, 443 open publicly.
-       3456 (bridge) only from Docker bridge subnets.
-       3000 (Dokploy), 3100 (dashboard) not directly exposed -- only via Traefik.
+- Bridge: Bearer auth on all routes; token in `bridge/.env` +
+  `dashboard/.env.local` + embedded in cron payloads (rotate all four
+  together — `jobs.json` has it twice). Rotated 2026-06-10 after the old
+  token leaked via a hardcode in `agents-activity.ts` to the public GitHub
+  mirror. NEVER hardcode tokens in source: this repo mirrors publicly.
+- Git: Forgejo (origin, SSH port 2222, key `id_ed25519_forgejo`) + GitHub
+  mirror. Push both.
+- position-tracker binds 127.0.0.1:3457; public access via Traefik at
+  angel.manohargupta.com.
+- Known weak spots: litellm-db password, `/opt/dashboard` fossil with a
+  stale token, dual Telegram pollers (bridge poller should be disabled).
 
-  Bearer token: 64-char hex. Never logged, never sent to browser. Rotate via bridge/.env.
-  Traefik BasicAuth: bcrypt, single $ in YAML files. Realm: Tiger Command Center.
-  OpenClaw gateway: bind: lan (Docker bridge only). Token in openclaw.json.
-  /tiger/exec: auth-gated. Arbitrary command execution requires bearer token.
-  /tiger/keys GET: presence map only. Key values never returned by any endpoint.
+## 9. Known Constraints
+
+- **RAM**: ~13GB workload on 8GB physical; 6+GB swap in steady state. This
+  is the root cause of historical cron timeouts and the reason spawn
+  concurrency is 1. Decision pending: evict homelab services vs upgrade.
+- OpenClaw v2026.3.12 predates MiniMax-M3, hence the explicit
+  `litellm/minimax-3` provider-prefixed model id.
diff --git a/README.md b/README.md
index 8b8f5a7..5d99b06 100644
--- a/README.md
+++ b/README.md
@@ -1,91 +1,60 @@
-# Clawd Agent Dashboard
+# Tiger Command Center
 
-> A premium, dark-mode "Command Center" for the Clawd AI Agent.
+> Self-hosted AI orchestration: one Tiger, four specialists, every action audited.
 
-![Dashboard Preview](https://via.placeholder.com/800x400?text=Clawd+Dashboard+Preview)
+The control plane for **Tiger**, an OpenClaw-based AI agent running on a
+Hetzner VPS, reachable at `agent.manohargupta.com`. Tiger orchestrates four
+specialist sub-agents — **Cody** (code), **Ethan** (research), **Cathy**
+(writing), **Elon** (planning) — handles Telegram, watches Angel One
+positions, and drains a TASKS.md inbox while you do real work.
 
-## Overview
+## What lives here
 
-The **Clawd Dashboard** is a centralized interface designed to monitor and interact with the Clawd AI agent. It provides real-time visibility into the agent's memory, logs, scheduled tasks (cron jobs), and capabilities (skills), all wrapped in a sleek, responsive UI.
+| Path | What it is |
+|---|---|
+| `dashboard/` | Next.js 14 command center UI (`tiger-dashboard`, :3100) |
+| `bridge/` | Express control-plane API (`tiger-bridge`, :3456, localhost-only) |
+| `skills/` | OpenClaw skills (spawn-delegate, angel-positions, inbox-manager, sys-health, youtube-full) |
+| `ARCHITECTURE.md` | The real system map — read this first |
+| `TOOLS.md` | Tool/skill quick reference |
 
-## Features
+## Core capabilities
 
-- **📊 System Status**: Real-time heartbeat monitoring of the `clawdbot` process.
-- **🧠 Memory Management**: View and edit the agent's core memory (`MEMORY.md`) and daily logs.
-- **🛠️ Skills Registry**: Browse, edit, and manage the agent's capabilities and MCP tools.
-- **⏱️ Cron Jobs**: detailed view and control over scheduled background tasks.
-- **💬 Chat Interface**: Integrated chat window to communicate directly with the agent.
-- **🌗 Dark Mode**: Built with a "Slate & Violet" aesthetic optimized for low-light environments.
+- **Sub-agent spawning** — `POST /tiger/spawn` runs a specialist in an
+  isolated OpenClaw session; result lands on Telegram. Tracked in `executions`.
+- **TASKS.md inbox** — drop `- [ ]` lines under `## 📥 INBOX`; the bridge
+  dispatches the top item to the right specialist every 30 min (9–20 IST).
+- **Telegram mirror** — the homepage thread reads OpenClaw's native session
+  transcript: full history, both directions, perfectly in sync.
+- **Audit trail** — `/activity` merges spawns, cron runs, task lifecycle,
+  and outputs into one paginated, filterable timeline.
+- **Own model gateway** — every model call routes through
+  `llm.manohargupta.com` (LiteLLM on own MiniMax/Anthropic keys). Primary:
+  MiniMax-M3.
 
-## Tech Stack
+## Running it
 
-- **Framework**: [Next.js 14](https://nextjs.org/) (App Router)
-- **UI Components**: [Shadcn/UI](https://ui.shadcn.com/) (Radix Primitives)
-- **Styling**: [Tailwind CSS](https://tailwindcss.com/)
-- **Icons**: [Lucide React](https://lucide.dev/)
-- **State Management**: [SWR](https://swr.vercel.app/) / React Query
-- **Backend**: Next.js API Routes (Serverless)
-
-## Getting Started
-
-### Prerequisites
-
-- Node.js 18+
-- npm or pnpm
-
-### Installation
-
-1. Clone the repository:
-   ```bash
-   git clone https://github.com/manohar6839/clawd-dashboard.git
-   cd clawd-dashboard
-   ```
-
-2. Install dependencies:
-   ```bash
-   npm install
-   cd dashboard && npm install
-   ```
-
-3. Configure Environment:
-   - Copy example configs:
-     ```bash
-     cp config/mcporter.example.json config/mcporter.json
-     cp config/cron.example.json config/cron.json
-     ```
-
-### Running the Dashboard
-
-Start the development server:
+Both services are systemd units on the host:
 
 ```bash
-npm run dashboard
+systemctl restart tiger-bridge   # Express via tsx — no build step
+cd dashboard && npm run build && systemctl restart tiger-dashboard
 ```
 
-The dashboard will be available at [http://localhost:3000](http://localhost:3000).
+Env contracts:
+- `bridge/.env` — `TIGER_BRIDGE_TOKEN`, `LLM_GATEWAY_URL`, `LLM_GATEWAY_KEY`,
+  `TIGER_ROUTER_MODEL`, Telegram credentials
+- `dashboard/.env.local` — `TIGER_BRIDGE_URL`, `TIGER_BRIDGE_TOKEN`
 
-## Project Structure
+⚠️ The bridge token is also embedded in OpenClaw cron payloads
+(`cron/jobs.json`, twice). Rotate all four locations together.
 
+## Git
+
+Forgejo is canonical (`git.manohargupta.com/manohar/OpenClawDashboard`, SSH
+port 2222); GitHub (`manohar6839/NemoClawDashboard`) is a **public** mirror —
+never commit secrets. Push to both:
+
+```bash
+git push origin main && git push github main
 ```
-clawd/
-├── .agent/            # Agent self-knowledge & documentation
-├── dashboard/         # Next.js Application
-│   ├── src/app/       # App Router Pages (Memory, Skills, Cron, Chat)
-│   └── src/components/ # Shared UI Components
-├── config/            # Agent configuration (Cron, MCP)
-├── memory/            # Agent daily logs
-├── tools/             # External tool scripts
-└── MEMORY.md          # Core Agent Memory
-```
-
-## Contributing
-
-1. Fork the repository.
-2. Create a feature branch (`git checkout -b feature/amazing-feature`).
-3. Commit your changes (`git commit -m 'feat: Add amazing feature'`).
-4. Push to the branch (`git push origin feature/amazing-feature`).
-5. Open a Pull Request.
-
-## License
-
-MIT © [Manohar Air](https://github.com/manohar6839)
diff --git a/TOOLS.md b/TOOLS.md
index 72e2d60..ad00b4c 100644
--- a/TOOLS.md
+++ b/TOOLS.md
@@ -13,4 +13,23 @@
 - **Skill**: `youtube-full`
 - **Location**: `skills/youtube-full`
 - **Capabilities**: Search videos, get transcripts, monitor channels/playlists.
-- **Reference**: Read `skills/youtube-full/SKILL.md` for instructions.
\ No newline at end of file
+- **Reference**: Read `skills/youtube-full/SKILL.md` for instructions.
+## Specialist Delegation
+
+- **Skill**: `spawn-delegate` — hand work to Cody/Ethan/Cathy/Elon via the
+  bridge; results arrive on Telegram. Read `skills/spawn-delegate/SKILL.md`.
+
+## Trading Positions
+
+- **Skill**: `angel-positions` — read-only live P&L from
+  `angel.manohargupta.com/api/positions`. Never executes trades.
+
+## Task Inbox
+
+- **Skill**: `inbox-manager` — add/list/drain `## 📥 INBOX` items in
+  TASKS.md; the bridge auto-dispatches the top item every 30 min.
+
+## System Health
+
+- **Skill**: `sys-health` — host RAM/swap from `/proc/meminfo`, LLM gateway
+  liveliness, bridge status, recent audit events.
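ARCHITECTURE.md §4 describes the inbox claim step: take the first `- [ ]` line under `## 📥 INBOX` and rewrite it to `- [⏳ run-id → agent]`. A minimal TypeScript sketch of only that text rewrite follows, assuming the inbox section ends at the next `## ` heading and that the task text is kept after the marker. `claimFirstInboxItem` and `InboxClaim` are hypothetical names, not the actual `bridge/src/lib/inbox.ts` API; classification (`classifyAgent`) and spawning are deliberately left out.

```typescript
// Hypothetical sketch of the TASKS.md inbox claim step (not the real lib/inbox.ts).
// Assumptions: the inbox runs from "## 📥 INBOX" to the next "## " heading, and the
// original task text is preserved after the "[⏳ run-id → agent]" marker.

interface InboxClaim {
  task: string; // text of the claimed "- [ ]" item
  updated: string; // full TASKS.md contents with that item marked in-flight
}

const INBOX_HEADING = "## 📥 INBOX";
const OPEN_ITEM = /^(\s*)- \[ \] (.+)$/;

function claimFirstInboxItem(
  markdown: string,
  runId: string,
  agent: string,
): InboxClaim | null {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === INBOX_HEADING);
  if (start === -1) return null; // no inbox section at all

  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.startsWith("## ")) break; // next section reached: nothing to claim
    const match = OPEN_ITEM.exec(line);
    if (match === null) continue; // blank lines, notes, already-claimed items
    const [, indent, task] = match;
    lines[i] = `${indent}- [⏳ ${runId} → ${agent}] ${task}`;
    return { task, updated: lines.join("\n") };
  }
  return null; // inbox exists but holds no open items
}

// Usage: the already-dispatched item is skipped; the first open one is claimed.
const tasksMd = [
  "# Tasks",
  "## 📥 INBOX",
  "- [⏳ r1 → cody] fix the build",
  "- [ ] compare LiteLLM fallback strategies",
  "- [ ] draft the weekly digest",
  "## Done",
].join("\n");

const claim = claimFirstInboxItem(tasksMd, "r2", "ethan");
console.log(claim?.task); // prints: compare LiteLLM fallback strategies
```

Keeping the rewrite a pure string-to-string function makes it testable without touching the workspace volume; a real loop would presumably still need to write the file back and cope with Tiger editing TASKS.md at the same time.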
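The bridge-side routing rule in ARCHITECTURE.md §2 (slugs starting `anthropic/` go to Anthropic direct, everything else to the LiteLLM gateway) can be sketched as a pure function. This is an illustration under assumptions, not the real `lib/llm.ts`: the `Route` shape and the `resolveRoute` name are hypothetical, as are stripping the `anthropic/` prefix before calling Anthropic and falling back to `TIGER_ROUTER_MODEL` (default `minimax-3`) when no slug is passed.

```typescript
// Hypothetical sketch of the lib/llm.ts routing rule; names and shapes are illustrative.
type Route =
  | { provider: "anthropic"; model: string }
  | { provider: "gateway"; baseUrl: string; model: string };

interface LlmEnv {
  LLM_GATEWAY_URL?: string;
  TIGER_ROUTER_MODEL?: string;
}

const ANTHROPIC_PREFIX = "anthropic/";

function resolveRoute(slug: string | undefined, env: LlmEnv): Route {
  // Assumption: with no explicit slug, use TIGER_ROUTER_MODEL, then "minimax-3".
  const model = slug ?? env.TIGER_ROUTER_MODEL ?? "minimax-3";

  if (model.startsWith(ANTHROPIC_PREFIX)) {
    // Anthropic direct; assume the provider prefix is not part of the API model id.
    return { provider: "anthropic", model: model.slice(ANTHROPIC_PREFIX.length) };
  }

  // Everything else goes through the self-hosted LiteLLM gateway.
  const baseUrl = env.LLM_GATEWAY_URL;
  if (!baseUrl) {
    // Fail loudly instead of silently picking some other provider.
    throw new Error("LLM_GATEWAY_URL is not set");
  }
  // Trim trailing slashes so callers can append paths consistently.
  return { provider: "gateway", baseUrl: baseUrl.replace(/\/+$/, ""), model };
}

console.log(resolveRoute("anthropic/claude-haiku", {}));
```

Throwing when `LLM_GATEWAY_URL` is unset, rather than quietly falling back to another provider, is a design choice in the spirit of §2: a broken route should fail visibly instead of silently degrading.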
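The `sys-health` skill reports host RAM/swap from `/proc/meminfo`, and ARCHITECTURE.md §9 names RAM pressure as the root cause of historical cron timeouts. A minimal parsing sketch, assuming the standard `Key:   value kB` line layout (the kernel's `kB` means 1024-byte units); `parseMeminfo` and `summarizeMemory` are hypothetical helpers, not the skill's actual code.

```typescript
// Hypothetical helpers; the real sys-health skill may compute these differently.
interface MemorySummary {
  ramTotalMiB: number;
  ramAvailableMiB: number;
  swapUsedMiB: number;
}

// Parse "/proc/meminfo" text ("MemTotal:  8039864 kB") into a key -> KiB map.
function parseMeminfo(text: string): Map<string, number> {
  const fields = new Map<string, number>();
  for (const line of text.split("\n")) {
    // Keys may contain parentheses, e.g. "Active(anon)".
    const match = /^([\w()]+):\s+(\d+)/.exec(line);
    if (match !== null) fields.set(match[1], Number(match[2]));
  }
  return fields;
}

function summarizeMemory(fields: Map<string, number>): MemorySummary {
  const kib = (key: string) => fields.get(key) ?? 0; // missing keys count as 0
  const toMiB = (value: number) => Math.round(value / 1024);
  return {
    ramTotalMiB: toMiB(kib("MemTotal")),
    // MemAvailable estimates memory usable without swapping (Linux 3.14+).
    ramAvailableMiB: toMiB(kib("MemAvailable")),
    swapUsedMiB: toMiB(kib("SwapTotal") - kib("SwapFree")),
  };
}

const sample = [
  "MemTotal:        8192000 kB",
  "MemAvailable:     512000 kB",
  "SwapTotal:       8388608 kB",
  "SwapFree:        2097152 kB",
  "Active(anon):    4096000 kB",
].join("\n");

console.log(summarizeMemory(parseMeminfo(sample)));
// { ramTotalMiB: 8000, ramAvailableMiB: 500, swapUsedMiB: 6144 }
```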