Skip to content

Repository Hygiene Governance

Purpose

This document defines the mandatory, evidence-first policy for classifying repository paths as:

  • PROTECTED
  • PROTECTED-BUT-PRUNABLE
  • CANDIDATE
  • DRIFT-RISK

No deletion recommendation is valid unless the Protected Path Derivation flow below has executed and produced an evidence-backed allowlist.

This policy is authoritative. Other documents must reference this file and must not duplicate or reinterpret its rules.


Protected Path Derivation (Required Order)

The allowlist must be derived in this exact sequence:

1. Runtime Configuration Roots

Collect all configured paths from:

  • workspace/orchestrator_config.json

This file is canonical for runtime path policy.

Keys including (but not limited to):

  • docsPath
  • cookbookPath
  • knowledgePackDir
  • logsDir
  • stateFile
  • any other declared filesystem root

All configured paths are PROTECTED by default unless explicitly proven unused by runtime code.


2. Agent Path IO Expansion

Enumerate:

  • workspace/agents/*/agent.config.json
  • corresponding runtime files (src/index.ts, src/service.ts)

Any declared read/write path expands the protected surface.

Protection is transitive: If Agent A writes to a directory and Agent B consumes from it, that directory is PROTECTED.


3. Trigger Surfaces

Enumerate:

  • workspace/systemd/*.service
  • cron/jobs.json
  • workspace/.github/workflows/*
  • documented manual operational commands currently used by operators

If a trigger references a script, binary, or path, that path becomes PROTECTED unless proven otherwise.

Documented examples without active runtime trigger must be labeled Documented-only.


4. Sync / Fetch / Mirror Scripts

Enumerate all scripts matching:

  • sync_*
  • scripts invoking git clone, sparse-checkout, rsync, pull, fetch, curl, wget

Mirror destinations are PROTECTED inputs.


5. Runtime Consumption Paths

Scan runtime source:

  • config loaders
  • indexers
  • watchers
  • task handlers
  • ingestion logic
  • knowledge-pack consumers
  • API exposure paths

Any path read, watched, indexed, written, or served by runtime code is PROTECTED.


6. Construct Protected Allowlist

Only after steps 1–5 are complete may classification begin.

The allowlist must include:

  • Path
  • Role (input/output/substrate)
  • Evidence file(s)
  • Trigger status (Confirmed, Documented-only, Not found)
  • Confidence

Evidence Requirements

A path may only be marked non-protected if no evidence exists in:

  • runtime configuration
  • agent configuration or runtime code
  • systemd/cron/workflows
  • sync scripts
  • orchestrator indexing or ingestion code
  • currently used manual workflows

If evidence is absent but removal risk is historically high or intent is ambiguous, classify as DRIFT-RISK.

Zero-hit proof must be generated via deterministic scanning (scripted), not editor search.


Classification Definitions

PROTECTED

Path participates in an active or documented runtime/update/ingestion chain.

PROTECTED-BUT-PRUNABLE

Path is required at a root level, but a specific subpath is safely regenerable from deterministic sources and not required for runtime continuity.

Example: Source maps inside dist/ when deployment does not require them and rebuild is guaranteed.

Removal requires:

  • deterministic regeneration path
  • verification step

CANDIDATE

No evidence of required usage and low operational risk.

DRIFT-RISK

No direct evidence of active use, but:

  • historical/audit value exists
  • mirror intent unclear
  • removal impact ambiguous

Requires explicit operator intent before removal.


Mandatory Audit Flow

  1. Generate Protected Allowlist (evidence-backed).

  2. Re-score junk inventory against allowlist.

  3. Produce “Not Found” section for expected but missing triggers/primitives.

  4. Separate:

    • Low-risk CANDIDATE
    • DRIFT-RISK requiring operator intent

Deletion recommendations without steps 1–4 are invalid.


Tooling Requirement

Audits must be deterministic and script-assisted (Python or equivalent).

Editor search is not sufficient.

Audit output must include:

  • path
  • evidence source file(s)
  • trigger status
  • confidence level
  • zero-hit proof when claiming non-usage

Path Categories (Conceptual Model)

Mirror Inputs

  • openclaw-docs/
  • openai-cookbook/

These are sync destinations and are PROTECTED if configured or indexed.

Runtime Outputs

  • logs/knowledge-packs/
  • state files declared in config

If produced and consumed, they are PROTECTED.

Long-Memory Substrate

Defaults to DRIFT-RISK unless proven safe:

  • logs/
  • session histories
  • memory snapshots
  • archives
  • adjacent generated records

If configured or consumed by runtime, they escalate to PROTECTED.


Mirror Repository Rule

Nested .git directories inside mirrored sources are not automatically junk.

If sync scripts exclude .git (for example rsync --exclude='.git'), nested .git directories must be classified as DRIFT-RISK unless explicit non-usage is proven.


Expansion Rules for Future Changes

This policy automatically scales.

The protected surface expands when:

  • A new agent adds path IO in agent.config.json
  • A new .service, cron job, or workflow is introduced
  • orchestrator_config.json adds or modifies path roots
  • Runtime code begins indexing or consuming new directories

workspace/orchestrator_config.json remains the canonical runtime root authority.


Example Classification

PathEvidenceClassification
openclaw-docs/Configured + indexedPROTECTED
logs/knowledge-packs/Produced + consumedPROTECTED
Mirror nested .gitScript excludes .git, intent unclearDRIFT-RISK
Backup .bak with no referencesZero-hit proof, low riskCANDIDATE

Linking Rule

All documentation discussing:

  • cleanup
  • drift-repair
  • indexing
  • mirrors
  • knowledge packs
  • CI hygiene

must reference this document rather than redefining policy.

File paths referenced in documentation must be full repository-relative paths.


Documentation Placement Rules

Documentation must be compact, updated in place, and easy to navigate.

Update Existing Docs First

Before creating any new Markdown file:

  1. Identify the canonical existing file for that topic.
  2. Update that file if it can reasonably absorb the change.
  3. Only create a new document when no existing canonical doc can hold the content without becoming misleading or overloaded.

Mandatory anti-drift rule:

  • every material code or config change must update the appropriate existing .md file in the same change set
  • that documentation update should reference the affected code/config paths when it materially improves traceability

Preference order:

  • update canonical docs under docs/
  • update subproject entrypoints (orchestrator/README.md, openclawdbot/README.md, agents/README.md)
  • create a new file only as a last resort

Keep Root Markdown Compact

Root-level Markdown is reserved for:

  • public entrypoints such as README.md, QUICKSTART.md, DEPLOYMENT.md, CHANGELOG.md, and OPENCLAW_CONTEXT_ANCHOR.md
  • workspace bootstrap/context files intentionally read in-place by the agent workspace layer (for example AGENTS.md, BOOTSTRAP.md, SOUL.md, IDENTITY.md, USER.md, TOOLS.md, HEARTBEAT.md, MEMORY.md, and live runtime doctrine files)

Historical docs, audits, runbooks, and topic-specific reference material should live under docs/ or the relevant subproject directory, not at repo root.

Rearrangement Rule

If a root-level Markdown file is not part of the reserved root set above and it is still worth keeping, move it into the appropriate existing documentation area instead of leaving it at root.


Agent Primitive Requirements

All agent directories under workspace/agents/* must contain:

  • README.md
  • ROLE.md
  • SCOPE.md
  • POLICY.md
  • TOOLS.md

Requirements:

  • ROLE.md must define purpose, done criteria, and never-do boundaries.
  • SCOPE.md must define inputs, outputs, file I/O expectations, allowed actions, and explicit out-of-scope boundaries.
  • POLICY.md must reference this governance file and enforce evidence-first + protected-path derivation rules.
  • README.md must include mission, I/O contract, run instructions, and links to ROLE.md, SCOPE.md, POLICY.md, and TOOLS.md.

Governance note: CI should fail when any required primitive file is missing. This is a documentation requirement; CI implementation is tracked separately.

Built from the canonical repo docs and generated site source.