OPERATOR GUIDE

Service inventory (from docker compose config)

Computed service lists:

  • Public demo stack (docker compose -f docker-compose.yml config --services): orchestrator, mongo, redis
  • Advanced observability stack (docker compose -f orchestrator/docker-compose.yml config --services): mongo, redis, orchestrator, prometheus, grafana, alertmanager

The public first-run path is now the repo-root Docker stack, not the advanced observability profile.
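The computed service lists above can drift when a compose file changes. A minimal sketch of a drift check, assuming Docker Compose v2 is on the PATH; the function names and the EXPECTED table are illustrative and not part of the repo:

```python
from __future__ import annotations

import subprocess

# Expected service names, copied from the computed lists in this guide.
EXPECTED = {
    "docker-compose.yml": {"orchestrator", "mongo", "redis"},
    "orchestrator/docker-compose.yml": {
        "mongo", "redis", "orchestrator", "prometheus", "grafana", "alertmanager",
    },
}


def parse_services(output: str) -> set[str]:
    """Turn the one-name-per-line output of `config --services` into a set."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def service_drift(compose_file: str, output: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) service names for one compose file."""
    actual = parse_services(output)
    expected = EXPECTED[compose_file]
    return expected - actual, actual - expected


def list_services(compose_file: str) -> str:
    """Run docker compose and return its raw stdout (requires Docker)."""
    result = subprocess.run(
        ["docker", "compose", "-f", compose_file, "config", "--services"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling service_drift("docker-compose.yml", list_services("docker-compose.yml")) from the repo root should return two empty sets when the inventory is current.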


1) What is running

A. Public demo stack

Service: orchestrator

  • Purpose: main API + scheduler + task runtime, served from the repo-root image build.
  • Image/build: built from repo root Dockerfile.
  • Ports: 127.0.0.1:4300:3000 (host port 4300 on loopback, mapped to container port 3000), so the demo path stays local-only and does not collide with the usual repo-native dev port.
  • Env vars:
    • NODE_ENV=production
    • LOG_LEVEL=info
    • ORCHESTRATOR_FAST_START=true
    • demo-local values for API_KEY_ROTATION, WEBHOOK_SECRET, DATABASE_URL, and REDIS_URL
    • GITHUB_ACTIONS_MONITOR_ENABLED=false by default in the demo stack
  • Volumes:
    • named volume for /workspace/logs
    • named volume for /workspace/orchestrator/data
  • Healthcheck: curl -fsS http://localhost:3000/health
  • Dependencies: waits for healthy mongo and redis
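The difference between the demo mapping 127.0.0.1:4300:3000 and the advanced stack's 3000:3000 is the host bind address. The sketch below parses the Compose short port syntax to make that explicit. It is a simplified illustration: it handles only IPv4 specs without a protocol suffix, PortMapping and the helper names are hypothetical, and it relies on Docker's default of publishing on all interfaces when no host IP is given.

```python
from __future__ import annotations

import ipaddress
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int


def parse_port_mapping(spec: str) -> PortMapping:
    """Parse "IP:HOST:CONTAINER" or "HOST:CONTAINER" (IPv4, no "/tcp" suffix)."""
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        # No host IP given: Docker publishes on all interfaces by default.
        host_ip = "0.0.0.0"
        host_port, container_port = parts
    else:
        raise ValueError(f"unsupported port spec: {spec!r}")
    return PortMapping(host_ip, int(host_port), int(container_port))


def is_loopback_only(mapping: PortMapping) -> bool:
    """True when the published port is reachable only from the host itself."""
    return ipaddress.ip_address(mapping.host_ip).is_loopback
```

With the two specs from this guide, the demo mapping is loopback-only while 3000:3000 is reachable from other machines unless a firewall blocks it.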

Service: mongo

  • Purpose: persistent database for state, audit trail, and run history in the public demo stack.
  • Image/build: mongo:7
  • Host ports: none published; reachable only inside the compose network.
  • Auth posture: demo-local root username and password, intended only for localhost try-outs.
  • Volumes: named data volume
  • Healthcheck: mongosh ... db.adminCommand('ping')

Service: redis

  • Purpose: cache and coordination store in the public demo stack.
  • Image/build: redis:7-alpine
  • Host ports: none published; reachable only inside the compose network.
  • Auth posture: demo-local password with --requirepass
  • Volumes: named append-only data volume
  • Healthcheck: redis-cli -a <password> ping

B. Advanced observability stack

Service: orchestrator

  • Purpose: main API + scheduler + task runtime with the heavier observability sidecars.
  • Image/build: built from repo root Dockerfile.
  • Ports: 3000:3000
  • Env vars:
    • NODE_ENV, LOG_LEVEL, PORT (runtime mode/logging/listen port)
    • OPENAI_API_KEY (secret), ANTHROPIC_API_KEY (secret) (LLM credentials)
    • DATABASE_URL (Mongo connection URI)
    • REDIS_URL (Redis connection URI)
    • PROMETHEUS_ENABLED, PROMETHEUS_PORT, GRAFANA_ENABLED, GRAFANA_PORT (monitoring toggles/ports)
  • Additional runtime-required secrets from app startup checks:
    • API_KEY (secret), WEBHOOK_SECRET (secret), MONGO_PASSWORD (secret), REDIS_PASSWORD (secret), MONGO_USERNAME (credential)
  • Volumes:
    • ./logs:/workspace/logs
    • ./data:/workspace/orchestrator/data
  • Healthcheck: curl -f http://localhost:3000/health
  • Dependencies: waits for healthy mongo and redis

Service: mongo

Service: redis

Service: prometheus

Service: grafana

Service: alertmanager


2) How to start/stop

Use the public demo stack:

  • Start: docker compose -f docker-compose.yml up -d --build
  • Logs: docker compose -f docker-compose.yml logs -f
  • Stop/remove containers + network: docker compose -f docker-compose.yml down
  • Stop/remove including named volumes: docker compose -f docker-compose.yml down -v
    • Data loss warning: removes the demo stack's named volumes, including demo Mongo, Redis, logs, and state.

Use the advanced observability stack:

  • Start: docker compose -f orchestrator/docker-compose.yml up -d --build
  • Logs: docker compose -f orchestrator/docker-compose.yml logs -f
  • Stop/remove containers + network: docker compose -f orchestrator/docker-compose.yml down
  • Stop/remove including data volumes: docker compose -f orchestrator/docker-compose.yml down -v
    • Data loss warning: deletes named persistent volumes mongo-data, redis-data, prometheus-data, grafana-data, alertmanager-data.
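Because down -v is destructive in both stacks, a wrapper script may want to require an explicit opt-in before adding the flag. A minimal sketch, assuming a hypothetical down_command helper that only builds the argument list and leaves execution to the caller:

```python
from __future__ import annotations


def down_command(
    compose_file: str,
    remove_volumes: bool = False,
    confirmed: bool = False,
) -> list[str]:
    """Build the `docker compose ... down` argv used in this guide.

    Refuses to add -v (which deletes named volumes) unless the caller
    passes confirmed=True.
    """
    if remove_volumes and not confirmed:
        raise ValueError("down -v deletes named volumes; pass confirmed=True to proceed")
    cmd = ["docker", "compose", "-f", compose_file, "down"]
    if remove_volumes:
        cmd.append("-v")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) once the operator has acknowledged the data loss warning.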

Related app scripts:


3) How to verify

For the public demo stack:

  • Orchestrator health: http://127.0.0.1:4300/health
    • Expected: JSON with status: "healthy" and endpoint hints for metrics/knowledge/persistence.
  • Operator console: http://127.0.0.1:4300/operator
    • Expected: login screen that accepts demo-operator-key-local-only.

For the advanced stack (host machine URLs):

  • Orchestrator health: http://localhost:3000/health
    • Expected: JSON with status: "healthy" and endpoint hints for metrics/knowledge/persistence.
  • Orchestrator knowledge summary: http://localhost:3000/api/knowledge/summary
    • Expected: JSON summary object; HTTP 200 means the API path is serving.
  • Orchestrator persistence health: http://localhost:3000/api/persistence/health
    • Expected: JSON with DB health plus coordination status (redis or memory); HTTP 200 indicates the persistence and coordination check path succeeded.
  • Prometheus UI: http://localhost:9090
    • Expected: Prometheus web UI loads; the orchestrator:9100 target shows as up on the Targets page.
  • Grafana UI: http://localhost:3001
    • Expected: Grafana login page; credentials come from GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD.
  • Alertmanager UI/API: http://localhost:9093
    • Expected: Alertmanager status page/API response.

Internal-only metric endpoint (inside compose network/container):

  • http://orchestrator:9100/metrics (scraped by Prometheus) and http://orchestrator:9100/health.
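The health checks in this section can be scripted. The sketch below uses only the Python standard library and checks for the status: "healthy" field described above; any other fields in the response are not assumed. The function names are illustrative.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """True when the body is a JSON object whose status field is "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Fetch a /health URL and apply is_healthy; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and is_healthy(body)
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False
```

For example, check_health("http://127.0.0.1:4300/health") targets the public demo stack and check_health("http://localhost:3000/health") targets the advanced stack.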

4) Common failures

A. Port already in use

Symptoms:

  • bind: address already in use during compose startup.

Likely conflicting host ports:

Fix steps:

  1. Identify the conflict: sudo lsof -i :3000,3001,9090,9093,27017,6379 (add 4300 when running the public demo stack).
  2. Stop the conflicting process or change the host-side port in the compose file.
  3. Re-run docker compose ... up -d.
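Where lsof is unavailable, a pre-flight check can probe the same ports before starting a stack. This is a rough sketch: it detects a listener by attempting a TCP connection to 127.0.0.1, so a process bound only to another interface would be missed. The port list is taken from this guide; the function names are illustrative.

```python
from __future__ import annotations

import socket

# Host ports mentioned in this guide (demo: 4300; the rest from the lsof step).
HOST_PORTS = (4300, 3000, 3001, 9090, 9093, 27017, 6379)


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True when something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=HOST_PORTS) -> list[int]:
    """Return the subset of ports that already have a listener."""
    return [p for p in ports if port_in_use(p)]
```

An empty list from busy_ports() suggests the default host ports are free; it does not replace lsof for identifying which process holds a port.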

B. Containers restarting continuously

Symptoms:

  • docker compose ps shows one or more services stuck in a restart loop.

Potential causes:

Fix steps:

  1. Inspect logs: docker compose -f orchestrator/docker-compose.yml logs -f orchestrator mongo redis.
  2. Verify required env vars are set (API_KEY, WEBHOOK_SECRET, MONGO_PASSWORD, REDIS_PASSWORD, MONGO_USERNAME).
  3. Confirm healthcheck endpoint/commands are reachable from within each container.

C. Mongo not ready

Symptoms:

  • Orchestrator waits/fails around DB connection or dependency health.

Evidence:

Fix steps:

  1. Ensure the MONGO_USERNAME/MONGO_PASSWORD values are set and consistent.
  2. Check mongo logs: docker compose -f orchestrator/docker-compose.yml logs -f mongo.
  3. If the initial auth/bootstrap is corrupted, stop the stack and evaluate whether to recreate the mongo-data volume (destructive).
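When Mongo is slow to become healthy, a script can poll a readiness check with a deadline instead of failing on the first attempt. The sketch below is a generic polling loop; the check callable is supplied by the caller (for example, one that fetches the persistence health URL from section 3), and nothing here is specific to the repo. The sleep and clock parameters exist only so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A False result after the timeout is the cue to move on to the log inspection and volume steps above.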

D. Missing env vars

Symptoms:

  • Fatal startup error indicating missing credentials.

Evidence:

Fix steps:

  1. Create/provide .env values for all required secrets and credentials.
  2. Confirm that the compose interpolation variables used by services are set (OPENAI_API_KEY, ANTHROPIC_API_KEY, MONGO_USERNAME, MONGO_PASSWORD, REDIS_PASSWORD, GRAFANA_PASSWORD). (orchestrator/docker-compose.yml)
  3. Restart stack after env correction.
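The variable names listed in this guide can be checked before starting the stack. The sketch below reads a .env file with a deliberately simplified parser (KEY=VALUE lines, # comments, optional surrounding quotes); Compose's own .env handling supports more syntax than this. The function and constant names are illustrative.

```python
from __future__ import annotations

# Names copied from this guide: startup checks and compose interpolation.
REQUIRED_AT_STARTUP = (
    "API_KEY", "WEBHOOK_SECRET", "MONGO_PASSWORD", "REDIS_PASSWORD", "MONGO_USERNAME",
)
INTERPOLATED = (
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "MONGO_USERNAME",
    "MONGO_PASSWORD", "REDIS_PASSWORD", "GRAFANA_PASSWORD",
)


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_vars(env: dict[str, str], names) -> list[str]:
    """Return the names that are absent or empty in env, in the order given."""
    return [name for name in names if not env.get(name, "").strip()]
```

Running missing_vars(parse_dotenv(open(".env").read()), REQUIRED_AT_STARTUP) lists what still needs a value; an empty list means every startup-checked name is present.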

E. Volume permission issues

Symptoms:

  • Write failures for logs/data directories.

Evidence:

Fix steps:

  1. Ensure host directories exist and are writable by container user/group.
  2. On Linux, align ownership/permissions for mounted paths.
  3. Recreate problematic container after permission fix.
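For the advanced stack's bind mounts (./logs and ./data, relative to the compose file), a quick host-side probe can confirm the directories exist and accept writes. Note the limitation: this sketch tests as the current host user, which may differ from the user or group the container runs as. The function names are illustrative.

```python
from __future__ import annotations

import os
import tempfile

# Host-side bind mount sources from the advanced stack's volume list.
HOST_DIRS = ("./logs", "./data")


def can_write(path: str) -> bool:
    """True when path is an existing directory where a temp file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


def unwritable(paths=HOST_DIRS) -> list[str]:
    """Return the paths that are missing or not writable by the current user."""
    return [p for p in paths if not can_write(p)]
```

Run unwritable() from the directory that contains orchestrator/docker-compose.yml so the relative paths resolve the same way Compose resolves them.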

5) Safe defaults for shipping


Dockerfile/runtime notes for operators

  • The root Dockerfile runs node dist/index.js with dumb-init and a file-based healthcheck. (../Dockerfile)
  • The orchestrator Dockerfile runs as a non-root user, exposes 3000, and healthchecks /health. (orchestrator/Dockerfile)
  • Package scripts for local dev/ops are in the orchestrator package (build, start, dev, test:run, test:integration, test:load). (orchestrator/package.json)

Built from the canonical repo docs and generated site source.