Repository Hygiene Governance
Purpose
This document defines the mandatory, evidence-first policy for classifying repository paths as:
PROTECTEDPROTECTED-BUT-PRUNABLECANDIDATEDRIFT-RISK
No deletion recommendation is valid unless the Protected Path Derivation flow below has executed and produced an evidence-backed allowlist.
This policy is authoritative. Other documents must reference this file and must not duplicate or reinterpret its rules.
Protected Path Derivation (Required Order)
The allowlist must be derived in this exact sequence:
1. Runtime Configuration Roots
Collect all configured paths from:
workspace/orchestrator_config.json
This file is canonical for runtime path policy.
Keys including (but not limited to):
docsPathcookbookPathknowledgePackDirlogsDirstateFile- any other declared filesystem root
All configured paths are PROTECTED by default unless explicitly proven unused by runtime code.
2. Agent Path IO Expansion
Enumerate:
workspace/agents/*/agent.config.json- corresponding runtime files (
src/index.ts,src/service.ts)
Any declared read/write path expands the protected surface.
Protection is transitive: If Agent A writes to a directory and Agent B consumes from it, that directory is PROTECTED.
3. Trigger Surfaces
Enumerate:
workspace/systemd/*.servicecron/jobs.jsonworkspace/.github/workflows/*- documented manual operational commands currently used by operators
If a trigger references a script, binary, or path, that path becomes PROTECTED unless proven otherwise.
Documented examples without active runtime trigger must be labeled Documented-only.
4. Sync / Fetch / Mirror Scripts
Enumerate all scripts matching:
sync_*- scripts invoking
git clone,sparse-checkout,rsync,pull,fetch,curl,wget
Mirror destinations are PROTECTED inputs.
5. Runtime Consumption Paths
Scan runtime source:
- config loaders
- indexers
- watchers
- task handlers
- ingestion logic
- knowledge-pack consumers
- API exposure paths
Any path read, watched, indexed, written, or served by runtime code is PROTECTED.
6. Construct Protected Allowlist
Only after steps 1–5 are complete may classification begin.
The allowlist must include:
- Path
- Role (input/output/substrate)
- Evidence file(s)
- Trigger status (
Confirmed,Documented-only,Not found) - Confidence
Evidence Requirements
A path may only be marked non-protected if no evidence exists in:
- runtime configuration
- agent configuration or runtime code
- systemd/cron/workflows
- sync scripts
- orchestrator indexing or ingestion code
- currently used manual workflows
If evidence is absent but removal risk is historically high or intent is ambiguous, classify as DRIFT-RISK.
Zero-hit proof must be generated via deterministic scanning (scripted), not editor search.
Classification Definitions
PROTECTED
Path participates in an active or documented runtime/update/ingestion chain.
PROTECTED-BUT-PRUNABLE
Path is required at a root level, but a specific subpath is safely regenerable from deterministic sources and not required for runtime continuity.
Example: Source maps inside dist/ when deployment does not require them and rebuild is guaranteed.
Removal requires:
- deterministic regeneration path
- verification step
CANDIDATE
No evidence of required usage and low operational risk.
DRIFT-RISK
No direct evidence of active use, but:
- historical/audit value exists
- mirror intent unclear
- removal impact ambiguous
Requires explicit operator intent before removal.
Mandatory Audit Flow
Generate Protected Allowlist (evidence-backed).
Re-score junk inventory against allowlist.
Produce “Not Found” section for expected but missing triggers/primitives.
Separate:
- Low-risk
CANDIDATE DRIFT-RISKrequiring operator intent
- Low-risk
Deletion recommendations without steps 1–4 are invalid.
Tooling Requirement
Audits must be deterministic and script-assisted (Python or equivalent).
Editor search is not sufficient.
Audit output must include:
- path
- evidence source file(s)
- trigger status
- confidence level
- zero-hit proof when claiming non-usage
Path Categories (Conceptual Model)
Mirror Inputs
openclaw-docs/openai-cookbook/
These are sync destinations and are PROTECTED if configured or indexed.
Runtime Outputs
logs/knowledge-packs/- state files declared in config
If produced and consumed, they are PROTECTED.
Long-Memory Substrate
Defaults to DRIFT-RISK unless proven safe:
logs/- session histories
- memory snapshots
- archives
- adjacent generated records
If configured or consumed by runtime, they escalate to PROTECTED.
Mirror Repository Rule
Nested .git directories inside mirrored sources are not automatically junk.
If sync scripts exclude .git (for example rsync --exclude='.git'), nested .git directories must be classified as DRIFT-RISK unless explicit non-usage is proven.
Expansion Rules for Future Changes
This policy automatically scales.
The protected surface expands when:
- A new agent adds path IO in
agent.config.json - A new
.service, cron job, or workflow is introduced orchestrator_config.jsonadds or modifies path roots- Runtime code begins indexing or consuming new directories
workspace/orchestrator_config.json remains the canonical runtime root authority.
Example Classification
| Path | Evidence | Classification |
|---|---|---|
openclaw-docs/ | Configured + indexed | PROTECTED |
logs/knowledge-packs/ | Produced + consumed | PROTECTED |
Mirror nested .git | Script excludes .git, intent unclear | DRIFT-RISK |
Backup .bak with no references | Zero-hit proof, low risk | CANDIDATE |
Linking Rule
All documentation discussing:
- cleanup
- drift-repair
- indexing
- mirrors
- knowledge packs
- CI hygiene
must reference this document rather than redefining policy.
File paths referenced in documentation must be full repository-relative paths.
Documentation Placement Rules
Documentation must be compact, updated in place, and easy to navigate.
Update Existing Docs First
Before creating any new Markdown file:
- Identify the canonical existing file for that topic.
- Update that file if it can reasonably absorb the change.
- Only create a new document when no existing canonical doc can hold the content without becoming misleading or overloaded.
Mandatory anti-drift rule:
- every material code or config change must update the appropriate existing
.mdfile in the same change set - that documentation update should reference the affected code/config paths when it materially improves traceability
Preference order:
- update canonical docs under
docs/ - update subproject entrypoints (
orchestrator/README.md,openclawdbot/README.md,agents/README.md) - create a new file only as a last resort
Keep Root Markdown Compact
Root-level Markdown is reserved for:
- public entrypoints such as
README.md,QUICKSTART.md,DEPLOYMENT.md,CHANGELOG.md, andOPENCLAW_CONTEXT_ANCHOR.md - workspace bootstrap/context files intentionally read in-place by the agent workspace layer (for example
AGENTS.md,BOOTSTRAP.md,SOUL.md,IDENTITY.md,USER.md,TOOLS.md,HEARTBEAT.md,MEMORY.md, and live runtime doctrine files)
Historical docs, audits, runbooks, and topic-specific reference material should live under docs/ or the relevant subproject directory, not at repo root.
Rearrangement Rule
If a root-level Markdown file is not part of the reserved root set above and it is still worth keeping, move it into the appropriate existing documentation area instead of leaving it at root.
Agent Primitive Requirements
All agent directories under workspace/agents/* must contain:
README.mdROLE.mdSCOPE.mdPOLICY.mdTOOLS.md
Requirements:
ROLE.mdmust define purpose, done criteria, and never-do boundaries.SCOPE.mdmust define inputs, outputs, file I/O expectations, allowed actions, and explicit out-of-scope boundaries.POLICY.mdmust reference this governance file and enforce evidence-first + protected-path derivation rules.README.mdmust include mission, I/O contract, run instructions, and links toROLE.md,SCOPE.md,POLICY.md, andTOOLS.md.
Governance note: CI should fail when any required primitive file is missing. This is a documentation requirement; CI implementation is tracked separately.