Deployment Checklist
This runbook is configuration-driven. Do not hardcode state/log locations in operator commands.
Runtime Prerequisites
- Node.js 20+ (22.x recommended)
- npm
orchestrator_config.jsonpresent and valid JSON
Resolve Runtime Paths First
Run from the repo root:
STATE_FILE=$(jq -r '.stateFile' orchestrator_config.json)
LOGS_DIR=$(jq -r '.logsDir' orchestrator_config.json)
DOCS_PATH=$(jq -r '.docsPath' orchestrator_config.json)
COOKBOOK_PATH=$(jq -r '.cookbookPath' orchestrator_config.json)
echo "STATE_FILE=$STATE_FILE"
echo "LOGS_DIR=$LOGS_DIR"
echo "DOCS_PATH=$DOCS_PATH"
echo "COOKBOOK_PATH=$COOKBOOK_PATH"Choose Deployment Mode (Non-Interchangeable)
Mode A: Official Public Docker Demo
Use the repo-root docker-compose.yml when you want the supported public container path:
docker compose -f docker-compose.yml config --services
docker compose -f docker-compose.yml up -d --build
docker compose -f docker-compose.yml ps
bash scripts/docker-demo-smoke.shOperational truth for Mode A:
- brings up
orchestrator,mongo, andredis - serves the canonical
/operatorconsole on127.0.0.1:4300 - ships demo-local auth/database/cache credentials directly in compose so a new user can boot without creating
orchestrator/.env - keeps host exposure localhost-only by default
- should be treated as a try-it path, not a production credential posture
- has a dedicated smoke path in
scripts/docker-demo-smoke.shand theDocker Demo SmokeGitHub Actions workflow
Before any shared or non-local deployment:
cp docker-compose.override.example.yml docker-compose.override.yml
# edit docker-compose.override.yml with real keys, passwords, and ports
docker compose -f docker-compose.yml up -d --buildMode B: Advanced Observability Compose
Use orchestrator/docker-compose.yml only when you intentionally need the heavier stack with Prometheus, Grafana, and Alertmanager:
cp orchestrator/.env.example orchestrator/.env
docker compose -f orchestrator/docker-compose.yml config --services
docker compose -f orchestrator/docker-compose.yml up -d --build
docker compose -f orchestrator/docker-compose.yml psOperational truth for Mode B:
- the orchestrator container is forced onto the local
mongoservice even if the host.envused outside Docker points at an external database - MongoDB and Redis image versions must stay compatible with any existing named volumes on the host
- this is the advanced stack, not the first-run public path
Do not run both modes simultaneously on the same host without port planning. If you already have named Docker volumes from newer MongoDB or Redis images, do not silently downgrade the images in compose. Keep the image versions compatible with the stored data or reset those volumes intentionally.
Pre-Deployment Checklist
- [ ]
npm --prefix orchestrator run build - [ ]
npm --prefix orchestrator run test:integration - [ ]
bash scripts/check-doc-drift.sh - [ ]
jq . orchestrator_config.json >/dev/null - [ ] Required env vars prepared for the selected mode:
- Mode A: override file created if you are replacing the built-in demo-local credentials or provider keys
- Mode B:
API_KEY_ROTATIONorAPI_KEY - Mode B:
WEBHOOK_SECRET - Mode B:
MONGO_USERNAME - Mode B:
MONGO_PASSWORD - Mode B:
REDIS_PASSWORD OPENCLAW_MODEL_PRICING_JSONif you want to override the built-in model pricing catalog for runtime cost accounting
Post-Deployment Verification
- [ ] API health:bash
curl -fsS http://127.0.0.1:4300/health # or for Mode B: # curl -fsS http://localhost:3000/health - [ ] Effective persistence target is understood:bash
echo "$STATE_FILE" if [[ "$STATE_FILE" == *:* ]]; then echo "runtime target is non-file-backed" else test -f "$STATE_FILE" && echo "state file present" fi - [ ] Recent task state is readable:bash
curl -fsS -H "Authorization: Bearer $API_KEY" \ "http://127.0.0.1:4300/api/tasks/runs?limit=1&includeInternal=true" | jq '.runs[0]' # or for Mode B: # curl -fsS -H "Authorization: Bearer $API_KEY" \ # "http://localhost:3000/api/tasks/runs?limit=1&includeInternal=true" | jq '.runs[0]' - [ ] Logs directory exists:bash
test -d "$LOGS_DIR" && ls -la "$LOGS_DIR" | head
Release-Proof Gaps
Keep this list visible during release work. These are not hypothetical risks; they are the main still-unproven surfaces after the current release-hardening passes.
- [ ]
reddit-responseprovider-success branch is proven live Bounded fixture coverage now proves thehybrid-polishedbranch and the fallback branches, but the latest live runtime evidence is still a429 quota exceededfallback rather than a successful provider-polished draft. - [x] Agent
serviceRunningis host-proven where claimed (2026-03-09; current host truth isserviceInstalledCount=0,serviceRunningCount=0)serviceAvailableis not the same thing as a running service process. - [x] Canonical non-fast-start launch is proven on port
3312Proven on2026-03-09: non-fast-start boot now reaches Mongo-backed persistence, KnowledgeIntegration, andHTTP server listening on port 3312. The synthetic orchestrator-local memory scheduler was removed on2026-03-21; crash-safe continuity now lives in the workspace root memory files instead of a local snapshot/consolidation side path. - [x] Public milestone and demand proof are orchestrator-owned and integration-proven The active runtime no longer depends on retired external ingest endpoints; direct integration coverage now exercises the public proof routes against seeded runtime state.
- [x] Redis / Valkey first-slice coordination is real The current runtime now uses shared coordination for queue execution claims, doc-repair locks/cooldowns, and reddit-helper shared budgets when
REDIS_URLis reachable;/api/persistence/healthexposes explicit degraded truth when coordination falls back to memory.
systemd Host Service Path
Use this when the host/runtime contract requires persistent service coverage. In the current runtime slice, that expectation is explicit for doc-specialist and reddit-helper; other agents remain worker-first unless the orchestrator truth changes.
- Unit examples are in
systemd/. - Keep
WorkingDirectoryand file paths consistent with your deployed checkout. - Prefer
journalctl -u orchestrator -ffor first-line diagnostics. - Verify runtime truth through
/api/agents/overviewand/api/health/extendedbefore treating a unit file as required service coverage.
Safe Rollback Plan
If deployment fails:
- Stop service/stack:bash
docker compose -f docker-compose.yml down # or for advanced mode: # docker compose -f orchestrator/docker-compose.yml down - Checkout a known-good tag or commit (non-destructive):bash
git fetch --tags git checkout <known-good-tag-or-sha> - Restore state from backup to configured state path:bash
if [[ "$STATE_FILE" == *:* ]]; then echo "restore the backing runtime store for $STATE_FILE using its native backup/import path" else cp /backup/orchestrator-state-latest.json "$STATE_FILE" fi - Rebuild and start selected mode again.
- Re-run post-deploy verification checks.
Operational Notes
- Use RUNBOOK_BOUNDARIES.md to keep mode decisions consistent.
- Use backup-recovery.md for state/config backup policy.
- The official public compose path is the repo-root demo stack on
4300. - The advanced observability stack lives under
orchestrator/docker-compose.yml. - Docker Compose in this repo is orchestrator-only. The retired
openclawdbotpackage is not part of either compose mode. - If docs and code conflict, code/config is canonical and docs must be updated in the same change set.