Rule Generation Pipeline
- Pipeline Stages
- Stage 1: Init
- Stage 2: Scanning
- Stage 3: Compressing
- Stage 4: Analyzing
- Stage 5: Formatting
- Stage 6: Validating
- Stage 7: Finalizing
- Stage 8: Writing
- Stage 9: Reporting
- Stage 10: Cleanup
- Dry Run Mode
ruley processes codebases through a 10-stage pipeline. Each stage has a clear responsibility and transitions cleanly to the next. The PipelineContext carries state through all stages.
Pipeline Stages
flowchart TD
S1["1. Init<br/>Configuration validation"]
S2["2. Scanning<br/>File discovery"]
S3["3. Compressing<br/>Tree-sitter compression"]
S4["4. Analyzing<br/>LLM analysis"]
S5["5. Formatting<br/>Per-format refinement"]
S6["6. Validating<br/>Rule validation"]
S7["7. Finalizing<br/>Post-processing"]
S8["8. Writing<br/>File output"]
S9["9. Reporting<br/>Summary display"]
S10["10. Cleanup<br/>Temp file removal"]
S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8 --> S9 --> S10
Stage 1: Init
Module: src/lib.rs
- Validates the repository path exists
- Creates the
.ruley/cache directory - Cleans up old temporary files (24-hour threshold)
- Ensures
.ruley/is in.gitignore - Loads previous run state from
.ruley/state.json
Stage 2: Scanning
Module: src/packer/
- Discovers all files in the repository
- Respects
.gitignorerules via theignorecrate - Applies
--includeand--excludeglob patterns - Identifies file languages for compression
- Caches the file list to
.ruley/for debugging
If --repomix-file is provided, scanning is skipped and the pre-packed file is used directly.
Stage 3: Compressing
Module: src/packer/compression/
- Reads discovered files and their content
- When
--compressis enabled, uses tree-sitter grammars to extract structural elements (functions, classes, types) while removing implementation details - Calculates compression metadata (file count, original size, compressed size, ratio)
- Target compression ratio: ~70% token reduction
Without --compress, files are included at full size.
Stage 4: Analyzing
Module: src/llm/, src/generator/
This is the core LLM interaction stage:
- Tokenize: Count tokens in the compressed codebase using the provider’s tokenizer
- Chunk: If the codebase exceeds the provider’s context window, split into chunks with configurable overlap
- Cost estimate: Calculate and display estimated cost
- Confirm: Prompt the user to approve (unless
--no-confirm) - Analyze: Send each chunk to the LLM with the analysis prompt
- Merge: If multi-chunk, perform an additional LLM call to merge chunk analyses
- Parse: Extract structured
GeneratedRulesfrom the LLM response
The analysis prompt asks the LLM to identify:
- Project conventions and coding style
- Architecture patterns and module structure
- Error handling approaches
- Testing practices
- Naming conventions
Stage 5: Formatting
Module: src/generator/
For each requested output format:
- Build a format-specific refinement prompt with the analysis result
- Call the LLM to generate format-adapted content
- Store the result in
GeneratedRules.rules_by_format
Each format refinement is a separate LLM call to ensure format-specific conventions are followed (e.g., YAML frontmatter for Cursor, markdown for Claude).
Stage 6: Validating
Module: src/utils/validation.rs
Validates generated rules against multiple criteria:
- Syntax validation: Format-specific structure checks (valid YAML, valid frontmatter, etc.)
- Schema validation: Required fields and structure
- Semantic validation (configurable):
- File paths referenced in rules exist in the codebase
- No contradictory rules
- Cross-format consistency
- Languages/frameworks match the actual codebase
If validation fails and --retry-on-validation-failure is set, ruley sends the errors back to the LLM for auto-fix (up to max_retries attempts).
Stage 7: Finalizing
Module: src/utils/finalization.rs
Post-processing of validated rules:
- Metadata injection: Adds generation timestamp, ruley version, and provider info
- Deconfliction: If existing rule files are present, uses an LLM call to merge new rules with existing ones (unless
--no-deconflict) - Formatting normalization: Normalizes line endings and trailing whitespace
- Post-finalize smoke validation: Re-validates after finalization to catch any introduced errors
Stage 8: Writing
Module: src/output/
Writes rule files to disk:
- Resolves output paths (format defaults, config overrides,
--outputflag) - Applies conflict resolution strategy (
prompt,overwrite,skip,smart-merge) - Creates backups when overwriting
- Reports what was written (created, updated, skipped, merged)
Stage 9: Reporting
Module: src/utils/summary.rs
Displays a summary of the pipeline run:
- Files analyzed
- Tokens processed
- Compression ratio (if applicable)
- Total LLM cost
- Elapsed time
- Output files written
Stage 10: Cleanup
Module: src/lib.rs, src/utils/cache.rs
Final cleanup:
- Saves pipeline state to
.ruley/state.json(for future runs) - Cleans up temporary files in
.ruley/ - Transitions to the
Completeterminal state
Dry Run Mode
When --dry-run is specified, the pipeline runs stages 1-3 (Init, Scanning, Compressing), displays what would be processed (file count, token estimate, cost), and exits without making any LLM calls.