Introduction

Make your codebase ruley. A Rust CLI tool for generating AI IDE rules from codebases.

ruley (the opposite of unruly) is a command-line tool that analyzes codebases and generates AI IDE rule files. It uses Large Language Models to understand project structure, conventions, and patterns, then produces actionable rules that help AI assistants provide better, context-aware code suggestions.

Tame your unruly codebase. Make it ruley.

Why ruley?

AI coding assistants work best when they understand your project’s conventions. Without explicit rules, they fall back to generic patterns that may not match your codebase. ruley bridges this gap by:

  1. Scanning your repository to understand its structure, languages, and patterns
  2. Compressing the codebase using tree-sitter for token efficiency
  3. Analyzing the compressed code with an LLM to extract conventions
  4. Generating format-specific rule files for your preferred AI IDE tools

The result is a set of rule files that teach AI assistants how your project works – coding style, architecture patterns, naming conventions, error handling approaches, and more.

Key Features

  • Single binary distribution – No runtime dependencies (Node.js, Python, etc.)
  • Multi-provider LLM support – Anthropic, OpenAI, Ollama, OpenRouter
  • Multi-format output – Generate rules for 7 different AI IDE formats in a single run
  • Native performance – Fast codebase analysis built with Rust
  • Smart compression – Tree-sitter-based code compression for token efficiency (~70% reduction)
  • Accurate token counting – Native tiktoken implementation for precise cost estimation
  • Cost transparency – Shows estimated cost before LLM calls, requires confirmation
  • Configurable – TOML configuration file, environment variables, and CLI flags

Supported Formats

| Format | Output File | Description |
|--------|-------------|-------------|
| Cursor | .cursor/rules/*.mdc | Cursor IDE rules |
| Claude | CLAUDE.md | Claude Code project instructions |
| Copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| Windsurf | .windsurfrules | Windsurf IDE rules |
| Aider | .aider.conf.yml | Aider conventions |
| Generic | .ai-rules.md | Generic markdown rules |
| JSON | .ai-rules.json | Machine-readable JSON |

Installation

Pre-built binaries are available for Linux (x86_64, ARM64), macOS (ARM64), and Windows (x86_64) on the releases page.

macOS / Linux

curl -fsSL https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.sh | sh

Windows

powershell -ExecutionPolicy Bypass -c "irm https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.ps1 | iex"

Homebrew

brew install EvilBit-Labs/tap/ruley

Cargo (crates.io)

cargo install ruley

This builds from source with default features (Anthropic, OpenAI, TypeScript compression).

With All Features

cargo install ruley --all-features

Minimal Install

cargo install ruley --no-default-features --features anthropic

cargo-binstall

If you have cargo-binstall installed:

cargo binstall ruley

Building from Source

git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley
cargo build --release

The binary will be at ./target/release/ruley.

System Requirements

  • Operating system: Linux (x86_64, ARM64), macOS (ARM64), Windows (x86_64)
  • Rust (build from source only): 1.91 or newer (see rust-version in Cargo.toml)
  • Network: Required for LLM API calls (except Ollama which runs locally)

Feature Flags

ruley uses Cargo feature flags to control which LLM providers and compression languages are compiled in:

| Feature | Description | Default |
|---------|-------------|---------|
| anthropic | Anthropic Claude provider | Yes |
| openai | OpenAI GPT provider | Yes |
| ollama | Ollama local model provider | No |
| openrouter | OpenRouter multi-model provider | No |
| all-providers | All LLM providers | No |
| compression-typescript | TypeScript tree-sitter grammar | Yes |
| compression-python | Python tree-sitter grammar | No |
| compression-rust | Rust tree-sitter grammar | No |
| compression-go | Go tree-sitter grammar | No |
| compression-all | All compression languages | No |

Verifying Releases

All release artifacts are signed via Sigstore using GitHub Attestations:

gh attestation verify <artifact> --repo EvilBit-Labs/ruley

See Release Verification for details.

Quick Start

This guide walks you through generating your first set of AI IDE rules with ruley.

Prerequisites

  1. ruley installed (see Installation)
  2. An API key for at least one LLM provider

Step 1: Set Your API Key

Set the environment variable for your chosen provider:

Anthropic

export ANTHROPIC_API_KEY="sk-ant-..."

OpenAI

export OPENAI_API_KEY="sk-..."

Ollama

# No API key needed -- just ensure Ollama is running
ollama serve

OpenRouter

export OPENROUTER_API_KEY="sk-or-..."

Step 2: Generate Rules

Navigate to your project directory and run ruley:

cd /path/to/your/project
ruley

By default, ruley uses Anthropic Claude and generates Cursor format rules.

Step 3: Review the Output

ruley shows you:

  1. Scan results – How many files were discovered
  2. Compression stats – Token reduction from tree-sitter compression
  3. Cost estimate – Estimated LLM cost before proceeding
  4. Confirmation prompt – You must approve before the LLM call is made
  5. Generated files – Where the rule files were written

Common Variations

Use a Different Provider

ruley --provider openai --model gpt-4o

Generate Multiple Formats

ruley --format cursor,claude,copilot

Generate All Formats at Once

ruley --format all

Enable Tree-Sitter Compression

ruley --compress

Analyze a Specific Directory

ruley ./my-project --compress

Dry Run (Preview Without Calling LLM)

ruley --dry-run

This shows what would be processed (file count, token estimate, cost) without making any LLM calls. Useful for checking costs before committing.

Skip Cost Confirmation

ruley --no-confirm

Use a Local Ollama Model

ruley --provider ollama --model llama3.1

What Happens Next

The generated rule files are placed in your project directory at the standard locations for each format. Your AI IDE tools will automatically pick them up:

  • Cursor: .cursor/rules/*.mdc – loaded automatically by Cursor IDE
  • Claude: CLAUDE.md – read by Claude Code as project context
  • Copilot: .github/copilot-instructions.md – loaded by GitHub Copilot
  • Windsurf: .windsurfrules – loaded by Windsurf IDE
  • Aider: .aider.conf.yml – loaded by Aider CLI

Commit the generated files to your repository so your whole team benefits from consistent AI assistance.

Command-Line Interface

Usage

ruley [OPTIONS] [PATH]

PATH: Path to repository (local path or remote URL). Defaults to . (current directory).

Options

Core Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| -p, --provider \<NAME> | RULEY_PROVIDER | anthropic | LLM provider (anthropic, openai, ollama, openrouter) |
| -m, --model \<NAME> | RULEY_MODEL | (provider default) | Model to use |
| -f, --format \<FORMATS> | RULEY_FORMAT | cursor | Output format(s), comma-separated |
| -o, --output \<PATH> | RULEY_OUTPUT | (format default) | Output file path (single format only) |
| -c, --config \<PATH> | RULEY_CONFIG | ruley.toml | Config file path |

Generation Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| --description \<TEXT> | RULEY_DESCRIPTION | (none) | Focus area for rule generation |
| --rule-type \<TYPE> | RULEY_RULE_TYPE | auto | Cursor rule type (auto, always, manual, agent-requested) |
| --compress | RULEY_COMPRESS | false | Enable tree-sitter compression |
| --chunk-size \<N> | RULEY_CHUNK_SIZE | 100000 | Max tokens per LLM chunk |
| --repomix-file \<PATH> | RULEY_REPOMIX_FILE | (none) | Use pre-packed repomix file as input |

Filtering Options

| Flag | Description |
|------|-------------|
| --include \<PATTERN> | Include only matching files (repeatable) |
| --exclude \<PATTERN> | Exclude matching files (repeatable) |

Behavior Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| --no-confirm | RULEY_NO_CONFIRM | false | Skip cost confirmation prompt |
| --dry-run | RULEY_DRY_RUN | false | Show plan without calling LLM |
| --on-conflict \<STRATEGY> | RULEY_ON_CONFLICT | prompt | Conflict resolution (prompt, overwrite, skip, smart-merge) |
| --retry-on-validation-failure |  | false | Auto-retry with LLM fix on validation failure |
| --no-deconflict |  | false | Disable LLM-based deconfliction with existing rules |
| --no-semantic-validation |  | false | Disable all semantic validation checks |

Output Options

| Flag | Description |
|------|-------------|
| -v | Increase verbosity (-v = DEBUG, -vv = TRACE) |
| -q | Suppress non-essential output |
| --version | Print version information |
| --help | Print help information |

Environment Variables

All CLI flags can be set via RULEY_* environment variables. CLI flags take precedence over environment variables, which take precedence over config file values.

Provider API Keys

| Variable | Provider | Required |
|----------|----------|----------|
| ANTHROPIC_API_KEY | Anthropic | When using --provider anthropic |
| OPENAI_API_KEY | OpenAI | When using --provider openai |
| OLLAMA_HOST | Ollama | Optional (default: http://localhost:11434) |
| OPENROUTER_API_KEY | OpenRouter | When using --provider openrouter |

Examples

Basic Usage

# Analyze current directory with defaults
ruley

# Analyze a specific project
ruley /path/to/project

Provider Selection

# Use OpenAI with a specific model
ruley --provider openai --model gpt-4o

# Use local Ollama
ruley --provider ollama --model llama3.1

# Use OpenRouter with Claude
ruley --provider openrouter --model anthropic/claude-3.5-sonnet

Format Control

# Generate Cursor rules (default)
ruley --format cursor

# Generate multiple formats
ruley --format cursor,claude,copilot

# Generate all formats
ruley --format all

# Write to a specific path (single format only)
ruley --format claude --output ./docs/CLAUDE.md

Compression and Performance

# Enable tree-sitter compression (~70% token reduction)
ruley --compress

# Adjust chunk size for large codebases
ruley --chunk-size 200000

# Use a pre-packed repomix file
ruley --repomix-file ./codebase.xml

Cost Management

# Preview without calling the LLM
ruley --dry-run

# Skip the cost confirmation prompt
ruley --no-confirm

Conflict Resolution

# Overwrite existing rule files
ruley --on-conflict overwrite

# Skip if files already exist
ruley --on-conflict skip

# Use LLM to smart-merge with existing rules
ruley --on-conflict smart-merge

Filtering Files

# Only include Rust files
ruley --include "**/*.rs"

# Exclude test directories
ruley --exclude "**/tests/**" --exclude "**/benches/**"

Configuration

ruley supports hierarchical configuration from multiple sources. This page documents the configuration file format and precedence rules.

Configuration Precedence

Configuration is resolved in this order (highest to lowest precedence):

  1. CLI flags – Explicitly provided command-line arguments
  2. Environment variables – RULEY_* prefix (handled by clap’s env attribute)
  3. Config files – Loaded and merged in discovery order (see below)
  4. Built-in defaults – Hardcoded in the CLI parser

When a CLI flag is explicitly provided, it always wins. When it’s not provided (using the default), the config file value is used instead.
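
This resolution can be sketched as an Option-based merge, where each layer fills only the values a higher-precedence layer left unset. The names below are illustrative, not ruley’s actual internals:

```rust
// Illustrative sketch of layered config resolution: CLI > env > config file,
// falling back to built-in defaults when no layer sets a value.
#[derive(Clone, Debug, PartialEq)]
struct Layer {
    provider: Option<String>,
    chunk_size: Option<u32>,
}

// Take the first Some across layers, ordered highest precedence first.
fn resolve(layers: &[Layer]) -> (String, u32) {
    let provider = layers
        .iter()
        .find_map(|l| l.provider.clone())
        .unwrap_or_else(|| "anthropic".to_string()); // built-in default
    let chunk_size = layers
        .iter()
        .find_map(|l| l.chunk_size)
        .unwrap_or(100_000); // built-in default
    (provider, chunk_size)
}

fn main() {
    let cli = Layer { provider: None, chunk_size: Some(200_000) };
    let env = Layer { provider: Some("openai".into()), chunk_size: None };
    let file = Layer { provider: Some("ollama".into()), chunk_size: Some(50_000) };
    // The CLI flag wins for chunk_size; the env var wins for provider.
    assert_eq!(resolve(&[cli, env, file]), ("openai".to_string(), 200_000));
}
```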

Config File Discovery

Config files are discovered and merged in this order (later overrides earlier):

  1. ~/.config/ruley/config.toml – User-level global config
  2. ruley.toml in the git repository root – Project-level config
  3. ./ruley.toml in the current directory – Working directory config
  4. Explicit --config <path> – If provided, overrides all above

All discovered files are merged. Duplicate keys in later files override earlier ones.

Configuration File Format

Configuration files use TOML format. All sections are optional.

Complete Example

[general]
provider = "anthropic"
model = "claude-sonnet-4-5-20250929"
format = ["cursor", "claude"]
compress = true
chunk_size = 100000
no_confirm = false
rule_type = "auto"

[output]
formats = ["cursor", "claude"]
on_conflict = "prompt"

[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "CLAUDE.md"

[include]
patterns = ["**/*.rs", "**/*.toml"]

[exclude]
patterns = ["**/target/**", "**/node_modules/**"]

[chunking]
chunk_size = 100000
overlap = 10000

[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192

[providers.openai]
model = "gpt-4o"
max_tokens = 4096

[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"

[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192

[validation]
enabled = true
retry_on_failure = false
max_retries = 3

[validation.semantic]
check_file_paths = true
check_contradictions = true
check_consistency = true
check_reality = true

[finalization]
enabled = true
deconflict = true
normalize_formatting = true
inject_metadata = true

[general] Section

Core settings for the pipeline.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| provider | string | "anthropic" | LLM provider name |
| model | string | (provider default) | Model to use |
| format | string[] | ["cursor"] | Output formats |
| compress | bool | false | Enable tree-sitter compression |
| chunk_size | int | 100000 | Max tokens per LLM chunk |
| no_confirm | bool | false | Skip cost confirmation |
| rule_type | string | "auto" | Cursor rule type |

[output] Section

Output format and path configuration.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| formats | string[] | [] | Alternative to general.format |
| on_conflict | string | "prompt" | Conflict resolution strategy |
| paths.\<format> | string | (format default) | Custom output path per format |

[include] / [exclude] Sections

File filtering using glob patterns.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| patterns | string[] | [] | Glob patterns for file matching |

[chunking] Section

Controls how large codebases are split for LLM processing.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| chunk_size | int | 100000 | Max tokens per chunk |
| overlap | int | chunk_size / 10 | Token overlap between chunks |
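
The effect of overlap can be sketched as follows; this is a simplified illustration of the idea, not ruley’s actual chunker:

```rust
// Split a token sequence into chunks of at most `chunk_size` tokens,
// where consecutive chunks share `overlap` tokens of context.
fn chunk_tokens(tokens: &[u32], chunk_size: usize, overlap: usize) -> Vec<Vec<u32>> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk size");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break;
        }
        // Step forward by chunk_size - overlap so adjacent chunks share context.
        start = end - overlap;
    }
    chunks
}

fn main() {
    let tokens: Vec<u32> = (0..25).collect();
    // chunk_size 10, overlap 2 yields token ranges 0..10, 8..18, 16..25.
    let chunks = chunk_tokens(&tokens, 10, 2);
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[1][0], 8);
}
```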

[providers] Section

Provider-specific configuration. Each provider has its own subsection.

[providers.anthropic] / [providers.openai] / [providers.openrouter]:

| Key | Type | Description |
|-----|------|-------------|
| model | string | Model name override |
| max_tokens | int | Max output tokens |

[providers.ollama]:

| Key | Type | Description |
|-----|------|-------------|
| host | string | Ollama server URL |
| model | string | Model name |

[validation] Section

Controls validation of generated rules.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | bool | true | Enable validation |
| retry_on_failure | bool | false | Auto-retry with LLM fix |
| max_retries | int | 3 | Max auto-fix attempts |

[validation.semantic] – Semantic validation checks:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| check_file_paths | bool | true | Verify referenced file paths exist |
| check_contradictions | bool | true | Detect contradictory rules |
| check_consistency | bool | true | Cross-format consistency check |
| check_reality | bool | true | Verify language/framework references |

[finalization] Section

Controls post-processing of generated rules.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | bool | true | Enable finalization |
| deconflict | bool | true | LLM-based deconfliction with existing rules |
| normalize_formatting | bool | true | Normalize line endings and whitespace |
| inject_metadata | bool | true | Add timestamp/version/provider headers |

LLM Providers

ruley supports multiple LLM providers. Each provider is feature-gated at compile time and requires its own API key (except Ollama).

Provider Comparison

| Provider | API Key Required | Local | Default Model | Context Window |
|----------|------------------|-------|---------------|----------------|
| Anthropic | Yes | No | claude-sonnet-4-5-20250929 | 200K tokens |
| OpenAI | Yes | No | gpt-4o | 128K tokens |
| Ollama | No | Yes | llama3.1:70b | ~100K tokens |
| OpenRouter | Yes | No | anthropic/claude-3.5-sonnet | Varies by model |

Anthropic

Anthropic’s Claude models are the default provider and generally produce excellent rule quality.

Setup

export ANTHROPIC_API_KEY="sk-ant-..."

Usage

# Uses default model (Claude Sonnet 4.5)
ruley --provider anthropic

# Specify a model
ruley --provider anthropic --model claude-sonnet-4-5-20250929

Config File

[general]
provider = "anthropic"

[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192

OpenAI

OpenAI’s GPT models provide strong rule generation with fast response times.

Setup

export OPENAI_API_KEY="sk-..."

Usage

ruley --provider openai --model gpt-4o

Config File

[general]
provider = "openai"

[providers.openai]
model = "gpt-4o"
max_tokens = 4096

Ollama

Ollama runs models locally. No API key is needed, and there are no per-token costs. This is ideal for privacy-sensitive codebases or offline use.

Setup

  1. Install Ollama
  2. Pull a model: ollama pull llama3.1:70b
  3. Start the server: ollama serve

Usage

ruley --provider ollama --model llama3.1

# Custom Ollama host
OLLAMA_HOST="http://192.168.1.100:11434" ruley --provider ollama

Config File

[general]
provider = "ollama"

[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"

Considerations

  • Rule quality depends heavily on the model size. Larger models (70B+) produce better results.
  • Local models have smaller context windows. Use --compress and --chunk-size to manage large codebases.
  • No cost confirmation is shown since Ollama is free to use.

OpenRouter

OpenRouter provides access to models from multiple providers through a single API. It fetches dynamic pricing from the OpenRouter API for accurate cost estimation.

Setup

export OPENROUTER_API_KEY="sk-or-..."

Usage

ruley --provider openrouter --model anthropic/claude-3.5-sonnet

Config File

[general]
provider = "openrouter"

[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192

Feature Flags

Providers are compiled in via Cargo feature flags. The default build includes anthropic and openai.

| Feature | Provider |
|---------|----------|
| anthropic | Anthropic (default) |
| openai | OpenAI (default) |
| ollama | Ollama |
| openrouter | OpenRouter |
| all-providers | All of the above |

To include all providers when building from source:

cargo install ruley --features all-providers

Choosing a Provider

  • Best quality: Anthropic Claude (default) – excellent at understanding code conventions
  • Fastest: OpenAI GPT-4o – lower latency per request
  • Free / Private: Ollama – no API costs, data stays local
  • Flexible: OpenRouter – access to many models through one API

Output Formats

ruley generates rule files in 7 formats. Each format targets a specific AI IDE tool and follows its conventions for file naming, structure, and content.

Format Overview

| Format | Output File | Description |
|--------|-------------|-------------|
| cursor | .cursor/rules/*.mdc | Cursor IDE rules with frontmatter |
| claude | CLAUDE.md | Claude Code project instructions |
| copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| windsurf | .windsurfrules | Windsurf IDE rules |
| aider | .aider.conf.yml | Aider conventions |
| generic | .ai-rules.md | Generic markdown rules |
| json | .ai-rules.json | Machine-readable JSON |

Selecting Formats

Single Format (Default)

# Cursor format (default)
ruley

# Claude format
ruley --format claude

Multiple Formats

ruley --format cursor,claude,copilot

All Formats

ruley --format all

Custom Output Path

For a single format, you can override the output path:

ruley --format claude --output ./docs/CLAUDE.md

For multiple formats, use the config file:

[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "docs/CLAUDE.md"

Format Details

Cursor (.mdc)

Cursor IDE rules use the .mdc (markdown component) format with YAML frontmatter. Rules are placed in .cursor/rules/ and loaded automatically by Cursor.

The --rule-type flag controls the frontmatter alwaysApply field:

| Rule Type | Behavior |
|-----------|----------|
| auto | LLM decides based on rule content |
| always | Rules always apply to every file |
| manual | Rules must be manually activated |
| agent-requested | Rules are requested by the AI agent |

Claude (CLAUDE.md)

A single markdown file at the project root. Claude Code reads this file as project context for all conversations. Content is structured as guidelines and conventions in standard markdown.

Copilot (.github/copilot-instructions.md)

GitHub Copilot’s project-level instructions file. Placed in the .github/ directory. Content is natural language instructions that guide Copilot’s suggestions.

Windsurf (.windsurfrules)

Windsurf IDE rules file at the project root. Similar to Cursor rules but without frontmatter. Content is structured as conventions and patterns.

Aider (.aider.conf.yml)

Aider’s configuration file in YAML format. Contains conventions and patterns that guide Aider’s code generation.

Generic (.ai-rules.md)

A generic markdown format not tied to any specific tool. Useful as a portable set of conventions that can be manually included in any AI assistant’s context.

JSON (.ai-rules.json)

Machine-readable JSON format for programmatic consumption. Contains the same convention data in a structured format suitable for integration with custom tools.

Conflict Resolution

When output files already exist, ruley offers several strategies:

| Strategy | Behavior |
|----------|----------|
| prompt | Ask the user what to do (default, interactive) |
| overwrite | Replace existing files (creates backups) |
| skip | Skip formats where files exist |
| smart-merge | Use LLM to merge new rules with existing ones |

Set the strategy via CLI or config:

ruley --on-conflict smart-merge

[output]
on_conflict = "smart-merge"

When overwrite is used, ruley creates .bak backups of existing files before writing.
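
The strategies map naturally onto an enum plus a match; the sketch below uses illustrative types, not ruley’s actual ones:

```rust
use std::path::Path;

// Illustrative model of the four conflict strategies.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnConflict {
    Prompt,
    Overwrite,
    Skip,
    SmartMerge,
}

#[derive(Debug, PartialEq)]
enum Action {
    Write,        // no conflict, or overwrite (the real tool backs up first)
    AskUser,
    SkipFile,
    MergeWithLlm,
}

fn decide(strategy: OnConflict, target: &Path) -> Action {
    if !target.exists() {
        return Action::Write; // no conflict at all
    }
    match strategy {
        OnConflict::Prompt => Action::AskUser,
        OnConflict::Overwrite => Action::Write,
        OnConflict::Skip => Action::SkipFile,
        OnConflict::SmartMerge => Action::MergeWithLlm,
    }
}

fn main() {
    // A path that does not exist is always written, regardless of strategy.
    assert_eq!(decide(OnConflict::Skip, Path::new("/no/such/file")), Action::Write);
    // "/" exists, so the strategy determines the action.
    assert_eq!(decide(OnConflict::SmartMerge, Path::new("/")), Action::MergeWithLlm);
}
```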

Single Analysis, Multiple Outputs

ruley performs a single LLM analysis of your codebase, then generates format-specific rules through a refinement step. This means:

  • The analysis cost is paid once regardless of how many formats you generate
  • Each format adds a small refinement LLM call to adapt the analysis to format-specific conventions
  • Generating all 7 formats is only marginally more expensive than generating 1

Architecture Overview

ruley is a single-crate Rust CLI tool organized into focused modules. This chapter describes the high-level architecture, module responsibilities, and design principles.

Module Map

graph TB
    CLI["cli/<br/>Argument parsing<br/>& configuration"]
    Packer["packer/<br/>File discovery<br/>& compression"]
    LLM["llm/<br/>Provider abstraction<br/>& token counting"]
    Gen["generator/<br/>Prompt templates<br/>& rule parsing"]
    Output["output/<br/>Format writers<br/>& conflict resolution"]
    Utils["utils/<br/>Errors, progress<br/>& caching"]

    CLI --> Packer
    CLI --> LLM
    Packer --> LLM
    LLM --> Gen
    Gen --> Output
    CLI --> Utils
    Packer --> Utils
    LLM --> Utils
    Gen --> Utils
    Output --> Utils

Module Responsibilities

| Module | Purpose |
|--------|---------|
| cli/ | Command-line interface with clap argument parsing, config file loading and merging |
| packer/ | Repository scanning, file discovery, gitignore handling, tree-sitter compression |
| llm/ | Multi-provider LLM integration, tokenization, chunking, cost calculation |
| generator/ | Analysis and refinement prompt templates, response parsing, rule structures |
| output/ | Multi-format file writers, conflict resolution, smart-merge |
| utils/ | Shared utilities: error types, progress bars, caching, state management, validation |

Design Principles

Provider-Agnostic LLM Interface

All LLM providers implement the LLMProvider trait, which defines a standard interface for completions. The LLMClient wraps a provider and handles retry logic. New providers can be added by implementing the trait and gating behind a Cargo feature flag.
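
The shape of such a design, sketched with hypothetical names (the real trait is likely async and richer):

```rust
// Illustrative provider abstraction; names are hypothetical.
trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// A stand-in provider, e.g. for tests or dry runs.
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str { "echo" }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

// A client wrapping any provider behind the trait object.
struct Client {
    provider: Box<dyn LlmProvider>,
}

impl Client {
    fn run(&self, prompt: &str) -> Result<String, String> {
        // Retry/backoff logic would wrap this call in the real tool.
        self.provider.complete(prompt)
    }
}

fn main() {
    let client = Client { provider: Box::new(EchoProvider) };
    assert_eq!(client.run("hello").unwrap(), "echo: hello");
    println!("provider = {}", client.provider.name());
}
```

Because callers only see the trait, swapping Anthropic for Ollama is a construction-time decision, not a code change.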

Format-Agnostic Rule Generation

The pipeline performs a single LLM analysis pass, then generates format-specific rules through lightweight refinement calls. The GeneratedRules structure holds format-independent analysis results and per-format FormattedRules. This means adding a new output format requires only a new refinement prompt and writer – no changes to the analysis pipeline.

Token-Efficient Processing

ruley minimizes LLM costs through:

  • Tree-sitter compression: AST-based extraction reduces token count by ~70%
  • Accurate counting: Native tiktoken tokenization matches provider billing
  • Intelligent chunking: Large codebases are split at logical boundaries
  • Cost transparency: Estimates are shown before any LLM calls

Local-First Design

The scanning, compression, and output stages run entirely locally without network access. Only the analysis and refinement stages call external LLM APIs. When using Ollama, the entire pipeline runs on your machine.

Data Flow

flowchart LR
    Repo["Repository<br/>files"] --> Scan["Scan &<br/>filter"]
    Scan --> Compress["Compress<br/>(tree-sitter)"]
    Compress --> Tokenize["Tokenize<br/>& chunk"]
    Tokenize --> Analyze["LLM<br/>analysis"]
    Analyze --> Refine["Format<br/>refinement"]
    Refine --> Validate["Validate<br/>& finalize"]
    Validate --> Write["Write<br/>files"]

  1. Repository files are scanned respecting .gitignore rules
  2. Source files are compressed via tree-sitter (if enabled) to reduce token count
  3. The compressed codebase is tokenized and split into chunks if needed
  4. Chunks are sent to the LLM for analysis to extract conventions
  5. The analysis is refined per output format through targeted prompts
  6. Generated rules are validated (syntax, schema, semantic checks)
  7. Final rules are written to disk at format-standard locations

See Rule Generation Pipeline for detailed stage-by-stage documentation.

Key Abstractions

PipelineContext

The central state container passed through all 10 pipeline stages. It carries:

  • config: MergedConfig – Final resolved configuration
  • stage: PipelineStage – Current execution stage
  • compressed_codebase – Scanned and compressed repository data
  • generated_rules – Analysis results and formatted rules
  • cost_tracker – Running tally of LLM costs
  • progress_manager – Visual progress feedback

MergedConfig

The single source of truth for all configuration values, produced by merging CLI flags, environment variables, and config files.

LLMProvider Trait

The abstraction layer for LLM providers. Each provider (Anthropic, OpenAI, Ollama, OpenRouter) implements this trait. The LLMClient wraps a provider and adds retry logic with exponential backoff.
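
Exponential backoff typically doubles the delay per attempt up to a cap; a minimal sketch, where the base delay and cap are illustrative assumptions rather than ruley’s actual values:

```rust
use std::time::Duration;

// Delay before retry `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64) -> Duration {
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(exp.min(cap_ms))
}

fn main() {
    // With a 500ms base and 8s cap: 500ms, 1s, 2s, 4s, then capped at 8s.
    assert_eq!(backoff_delay(0, 500, 8_000).as_millis(), 500);
    assert_eq!(backoff_delay(1, 500, 8_000).as_millis(), 1_000);
    assert_eq!(backoff_delay(3, 500, 8_000).as_millis(), 4_000);
    assert_eq!(backoff_delay(5, 500, 8_000).as_millis(), 8_000);
}
```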

GeneratedRules

Holds the format-independent analysis and per-format rule content. Populated during the analysis and formatting stages, consumed during writing.

Rule Generation Pipeline

ruley processes codebases through a 10-stage pipeline. Each stage has a clear responsibility and transitions cleanly to the next. The PipelineContext carries state through all stages.

Pipeline Stages

flowchart TD
    S1["1. Init<br/>Configuration validation"]
    S2["2. Scanning<br/>File discovery"]
    S3["3. Compressing<br/>Tree-sitter compression"]
    S4["4. Analyzing<br/>LLM analysis"]
    S5["5. Formatting<br/>Per-format refinement"]
    S6["6. Validating<br/>Rule validation"]
    S7["7. Finalizing<br/>Post-processing"]
    S8["8. Writing<br/>File output"]
    S9["9. Reporting<br/>Summary display"]
    S10["10. Cleanup<br/>Temp file removal"]

    S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8 --> S9 --> S10

Stage 1: Init

Module: src/lib.rs

  • Validates the repository path exists
  • Creates the .ruley/ cache directory
  • Cleans up old temporary files (24-hour threshold)
  • Ensures .ruley/ is in .gitignore
  • Loads previous run state from .ruley/state.json

Stage 2: Scanning

Module: src/packer/

  • Discovers all files in the repository
  • Respects .gitignore rules via the ignore crate
  • Applies --include and --exclude glob patterns
  • Identifies file languages for compression
  • Caches the file list to .ruley/ for debugging

If --repomix-file is provided, scanning is skipped and the pre-packed file is used directly.

Stage 3: Compressing

Module: src/packer/compression/

  • Reads discovered files and their content
  • When --compress is enabled, uses tree-sitter grammars to extract structural elements (functions, classes, types) while removing implementation details
  • Calculates compression metadata (file count, original size, compressed size, ratio)
  • Target compression ratio: ~70% token reduction

Without --compress, files are included at full size.

Stage 4: Analyzing

Module: src/llm/, src/generator/

This is the core LLM interaction stage:

  1. Tokenize: Count tokens in the compressed codebase using the provider’s tokenizer
  2. Chunk: If the codebase exceeds the provider’s context window, split into chunks with configurable overlap
  3. Cost estimate: Calculate and display estimated cost
  4. Confirm: Prompt the user to approve (unless --no-confirm)
  5. Analyze: Send each chunk to the LLM with the analysis prompt
  6. Merge: If multi-chunk, perform an additional LLM call to merge chunk analyses
  7. Parse: Extract structured GeneratedRules from the LLM response
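
Step 3 above is simple arithmetic over per-million-token prices. A sketch, using placeholder prices rather than real provider rates:

```rust
// Estimate LLM cost in USD from token counts and per-million-token prices.
// The prices passed in below are placeholders for illustration only.
fn estimate_cost(
    input_tokens: u64,
    est_output_tokens: u64,
    input_price_per_m: f64,
    output_price_per_m: f64,
) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_price_per_m
        + (est_output_tokens as f64 / 1_000_000.0) * output_price_per_m
}

fn main() {
    // 150K input tokens and 8K output tokens at $3 / $15 per million:
    // 0.45 + 0.12 = $0.57
    let cost = estimate_cost(150_000, 8_000, 3.0, 15.0);
    assert!((cost - 0.57).abs() < 1e-9);
    println!("estimated cost: ${cost:.2}");
}
```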

The analysis prompt asks the LLM to identify:

  • Project conventions and coding style
  • Architecture patterns and module structure
  • Error handling approaches
  • Testing practices
  • Naming conventions

Stage 5: Formatting

Module: src/generator/

For each requested output format:

  1. Build a format-specific refinement prompt with the analysis result
  2. Call the LLM to generate format-adapted content
  3. Store the result in GeneratedRules.rules_by_format

Each format refinement is a separate LLM call to ensure format-specific conventions are followed (e.g., YAML frontmatter for Cursor, markdown for Claude).

Stage 6: Validating

Module: src/utils/validation.rs

Validates generated rules against multiple criteria:

  • Syntax validation: Format-specific structure checks (valid YAML, valid frontmatter, etc.)
  • Schema validation: Required fields and structure
  • Semantic validation (configurable):
    • File paths referenced in rules exist in the codebase
    • No contradictory rules
    • Cross-format consistency
    • Languages/frameworks match the actual codebase

If validation fails and --retry-on-validation-failure is set, ruley sends the errors back to the LLM for auto-fix (up to max_retries attempts).
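
The check_file_paths validation, for example, amounts to verifying each referenced path against the repository root. A simplified stand-alone sketch:

```rust
use std::path::Path;

// Return the referenced paths that do not exist under `root`.
fn missing_paths<'a>(root: &Path, referenced: &[&'a str]) -> Vec<&'a str> {
    referenced
        .iter()
        .copied()
        .filter(|rel| !root.join(rel).exists())
        .collect()
}

fn main() {
    let root = Path::new(".");
    // "." always exists; the second path is deliberately bogus.
    let missing = missing_paths(root, &[".", "no/such/dir/file.rs"]);
    assert_eq!(missing, vec!["no/such/dir/file.rs"]);
}
```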

Stage 7: Finalizing

Module: src/utils/finalization.rs

Post-processing of validated rules:

  • Metadata injection: Adds generation timestamp, ruley version, and provider info
  • Deconfliction: If existing rule files are present, uses an LLM call to merge new rules with existing ones (unless --no-deconflict)
  • Formatting normalization: Normalizes line endings and trailing whitespace
  • Post-finalize smoke validation: Re-validates after finalization to catch any introduced errors

Stage 8: Writing

Module: src/output/

Writes rule files to disk:

  • Resolves output paths (format defaults, config overrides, --output flag)
  • Applies conflict resolution strategy (prompt, overwrite, skip, smart-merge)
  • Creates backups when overwriting
  • Reports what was written (created, updated, skipped, merged)

Stage 9: Reporting

Module: src/utils/summary.rs

Displays a summary of the pipeline run:

  • Files analyzed
  • Tokens processed
  • Compression ratio (if applicable)
  • Total LLM cost
  • Elapsed time
  • Output files written

Stage 10: Cleanup

Module: src/lib.rs, src/utils/cache.rs

Final cleanup:

  • Saves pipeline state to .ruley/state.json (for future runs)
  • Cleans up temporary files in .ruley/
  • Transitions to the Complete terminal state

Dry Run Mode

When --dry-run is specified, the pipeline runs stages 1-3 (Init, Scanning, Compressing), displays what would be processed (file count, token estimate, cost), and exits without making any LLM calls.

Tree-Sitter Compression

ruley uses tree-sitter grammars to compress source code before sending it to the LLM. This reduces token count by approximately 70%, significantly lowering costs for large codebases.

How It Works

Tree-sitter parses source files into abstract syntax trees (ASTs). ruley walks these ASTs to extract structural elements – function signatures, type definitions, class declarations, imports – while removing implementation bodies. The result is a compressed representation that preserves the project’s API surface and architecture while discarding the details.

Before Compression

pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> {
    let files = scan_files(path, config)?;
    let mut analysis = Analysis::new();
    for file in &files {
        let content = std::fs::read_to_string(&file.path)?;
        let tokens = tokenize(&content);
        analysis.add_file(file, tokens);
    }
    analysis.finalize()
}

After Compression

pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> { ... }

The LLM sees the function signature, return type, and parameter types – enough to understand the codebase’s API surface without the implementation details.
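
ruley performs this extraction with tree-sitter ASTs. Purely as an illustration of the effect, a toy brace-matching stripper (far less robust than an AST walk, and easily fooled by braces in strings) looks like this:

```rust
// Toy body stripper: replace the outermost `{ ... }` of each top-level
// item with a literal `{ ... }`. Real compression walks a tree-sitter AST.
fn strip_bodies(src: &str) -> String {
    let mut out = String::new();
    let mut depth = 0usize;
    for ch in src.chars() {
        match ch {
            '{' => {
                depth += 1;
                if depth == 1 {
                    out.push_str("{ ... }");
                }
            }
            '}' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {} // inside a body: drop it
        }
    }
    out
}

fn main() {
    let src = "pub fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n";
    assert_eq!(strip_bodies(src), "pub fn add(a: i32, b: i32) -> i32 { ... }\n");
}
```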

Supported Languages

Each language requires a tree-sitter grammar compiled into ruley via Cargo feature flags:

| Language   | Feature Flag                       | Grammar Version               |
|------------|------------------------------------|-------------------------------|
| TypeScript | compression-typescript (default)   | tree-sitter-typescript 0.23.2 |
| Python     | compression-python                 | tree-sitter-python 0.25.0     |
| Rust       | compression-rust                   | tree-sitter-rust 0.24.0       |
| Go         | compression-go                     | tree-sitter-go 0.25.0         |

Enable all languages with:

cargo install ruley --features compression-all

Files in unsupported languages are included at full size (no compression applied).

Usage

Enable compression with the --compress flag:

ruley --compress

Or in the config file:

[general]
compress = true

What Gets Extracted

The compression extracts structural elements that help the LLM understand your codebase:

  • Functions: Signatures, parameters, return types
  • Types: Struct/class definitions, enum variants, type aliases
  • Traits/Interfaces: Method signatures
  • Imports: Module dependencies
  • Constants: Top-level constant definitions
  • Module structure: File and directory organization

What Gets Removed

Implementation details that don’t affect the LLM’s understanding of conventions:

  • Function bodies (replaced with { ... })
  • Loop internals
  • Conditional branches
  • Local variable assignments
  • Comments (optional, depending on grammar)

Compression Metrics

ruley tracks and reports compression statistics:

  • Total files: Number of files processed
  • Original size: Total bytes before compression
  • Compressed size: Total bytes after compression
  • Compression ratio: Ratio of compressed to original (lower is better)

These metrics are displayed during pipeline execution and in the final summary.
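As a minimal sketch of how such a ratio could be computed (type and field names here are illustrative, not ruley’s actual API):

```rust
// Hypothetical sketch: compression ratio as reported in the summary.
struct CompressionStats {
    original_bytes: u64,
    compressed_bytes: u64,
}

impl CompressionStats {
    /// Ratio of compressed to original size; lower is better.
    fn ratio(&self) -> f64 {
        if self.original_bytes == 0 {
            return 1.0; // nothing to compress
        }
        self.compressed_bytes as f64 / self.original_bytes as f64
    }
}

fn main() {
    let stats = CompressionStats {
        original_bytes: 1_000_000,
        compressed_bytes: 300_000,
    };
    // A 0.30 ratio corresponds to the ~70% reduction described above.
    println!("compression ratio: {:.2}", stats.ratio());
}
```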

When to Use Compression

Use compression when:

  • Your codebase is large (>1000 files or >500K tokens)
  • You want to minimize LLM costs
  • The codebase has languages with tree-sitter grammar support

Skip compression when:

  • Your codebase is small (the cost savings are negligible)
  • You need the LLM to see implementation details for accurate convention extraction
  • Your primary language doesn’t have a tree-sitter grammar in ruley

ABI Compatibility

ruley uses tree-sitter 0.26.x (ABI v15). Language parsers may use slightly older ABI versions:

  • tree-sitter-go 0.25.0: ABI v15
  • tree-sitter-python 0.25.0: ABI v15
  • tree-sitter-rust 0.24.0: ABI v15
  • tree-sitter-typescript 0.23.2: ABI v14 (supported through the core library’s backward compatibility)

The tree-sitter core library supports backward-compatible ABI versions, so older grammar versions work correctly.

Token Counting and Chunking

Development Setup

This chapter covers setting up a development environment for contributing to ruley.

Prerequisites

  • Rust 1.91+ (see rust-version in Cargo.toml for minimum supported version)
  • Git for version control
  • mise (recommended) for development toolchain management

Quick Start

# Clone the repository
git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley

# Install development tools (mise handles everything via mise.toml)
just setup

# Build the project
just build

# Run tests
just test

# Run the CLI
just run --help

Toolchain Management

ruley uses mise to manage the development toolchain. The mise.toml file at the project root defines all required tools and versions:

  • Rust 1.93.1 with rustfmt and clippy components
  • cargo-nextest for faster test execution
  • cargo-llvm-cov for code coverage
  • cargo-audit and cargo-deny for security auditing
  • mdbook and plugins for documentation
  • git-cliff for changelog generation
  • pre-commit for pre-commit hooks
  • actionlint for GitHub Actions linting

Run mise install to install all tools, or let just setup handle it.

Without mise

If you prefer not to use mise, install Rust via rustup and install individual tools with cargo install:

rustup toolchain install 1.93.1 --profile default -c rustfmt,clippy
cargo install cargo-nextest cargo-llvm-cov cargo-audit cargo-deny

Development Commands

ruley uses just as its task runner. Run just to see all available recipes:

| Command            | Description                                        |
|--------------------|----------------------------------------------------|
| just test          | Run tests with nextest (all features)              |
| just test-verbose  | Run tests with output                              |
| just lint          | Run rustfmt check + clippy (all features)          |
| just clippy-min    | Run clippy with no default features                |
| just check         | Quick check: pre-commit + lint + build-check       |
| just ci-check      | Full CI suite: lint, test, build, audit, coverage  |
| just build         | Debug build                                        |
| just build-release | Release build (all features, LTO)                  |
| just fmt           | Format code                                        |
| just coverage      | Generate LCOV coverage report                      |
| just audit         | Run cargo audit                                    |
| just deny          | Run cargo deny checks                              |
| just outdated      | Check for outdated dependencies                    |
| just doc           | Generate and open rustdoc                          |
| just docs-serve    | Serve mdbook docs locally with live reload         |
| just run <args>    | Run the CLI with arguments                         |
| just changelog     | Generate CHANGELOG.md from git history             |

IDE Setup

rust-analyzer

ruley works well with rust-analyzer. Recommended VS Code settings:

{
  "rust-analyzer.cargo.features": "all",
  "rust-analyzer.check.command": "clippy",
  "rust-analyzer.check.extraArgs": [
    "--all-features"
  ]
}

Project Structure

ruley/
  src/
    cli/          # CLI argument parsing and config management
    packer/       # File discovery, gitignore, compression
    llm/          # LLM providers, tokenization, chunking
    generator/    # Prompt templates and rule parsing
    output/       # Format writers and conflict resolution
    utils/        # Errors, progress, caching, validation
    lib.rs        # Pipeline orchestration (10-stage pipeline)
    main.rs       # Entry point
  tests/          # Integration tests
  benches/        # Criterion benchmarks
  prompts/        # LLM prompt templates (markdown)
  docs/           # mdbook documentation (this book)
  examples/       # Example configuration files

Code Quality

Before submitting changes, ensure:

  1. All tests pass: just test
  2. No clippy warnings: just lint (includes all features) and just clippy-min (no default features)
  3. Code is formatted: just fmt
  4. Full CI suite passes: just ci-check

Lint Policy

ruley enforces a zero-warnings policy. Key lint rules:

  • unsafe_code = "deny" – No unsafe code in production (tests may use #[allow(unsafe_code)])
  • unwrap_used = "deny" – No unwrap() in production code
  • panic = "deny" – No panic!() in production code
  • pedantic, nursery, cargo – Clippy lint groups at warn level

See [workspace.lints.clippy] in Cargo.toml for the full lint configuration.

Commit Standards

Follow Conventional Commits:

<type>[(<scope>)]: <description>
  • Types: feat, fix, docs, refactor, test, perf, build, ci, chore
  • Scope (optional): cli, packer, llm, generator, output, utils, config, deps
  • DCO: Always sign off with git commit -s

See CONTRIBUTING.md for full contribution guidelines.

Testing

This chapter covers ruley’s testing philosophy, how to run tests, and guidelines for writing new tests.

Testing Philosophy

ruley follows the test proportionality principle: test critical functionality and real edge cases. Test code should be shorter than implementation.

Do test:

  • Critical functionality and real edge cases
  • Error conditions and recovery paths
  • Token counting and chunking logic
  • Retry logic and error handling
  • Cost estimation
  • Compression ratio targets (~70% token reduction)

Don’t test:

  • Trivial operations or framework behavior
  • Every possible provider/format permutation
  • Obvious success cases or trivial formatting

Running Tests

All Tests

just test

This runs all tests with cargo-nextest and --all-features.

Verbose Output

just test-verbose

Specific Tests

# Run a specific test by name
cargo test test_name

# Run tests in a specific module
cargo test packer::

# Run integration tests only
cargo test --test '*'

Coverage

just coverage

Generates an LCOV coverage report at lcov.info using cargo-llvm-cov.

Test Organization

Unit Tests

Unit tests live in the same file as the code they test, inside #[cfg(test)] modules:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_something() {
        // ...
    }
}

Integration Tests

Integration tests live in the tests/ directory and test the CLI as a black box using assert_cmd:

tests/
  common/
    mod.rs        # Shared test utilities
  cli_tests.rs    # CLI integration tests
  ...

Test Utilities

The tests/common/mod.rs module provides shared helpers for integration tests:

  • Environment isolation: Uses a denylist pattern (env_remove) to strip sensitive variables from subprocess environments
  • Denylisted variables: RULEY_*, ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, OLLAMA_HOST

Important: The denylist uses env_remove (not env_clear()) because env_clear() breaks coverage instrumentation (LLVM_PROFILE_FILE), rustflags, and other tooling-injected variables.
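The denylist pattern can be sketched roughly as follows; the helper and constant names here are illustrative, not the actual tests/common/mod.rs API:

```rust
// Hypothetical sketch of the denylist pattern for subprocess environments.
use std::process::Command;

const DENYLIST: &[&str] = &[
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "OLLAMA_HOST",
];

fn scrubbed_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    // env_remove (not env_clear) leaves tooling-injected variables such as
    // LLVM_PROFILE_FILE intact while stripping the sensitive ones.
    for var in DENYLIST {
        cmd.env_remove(var);
    }
    // RULEY_* variables have a dynamic suffix, so enumerate the environment.
    for (key, _) in std::env::vars() {
        if key.starts_with("RULEY_") {
            cmd.env_remove(&key);
        }
    }
    cmd
}

fn main() {
    let cmd = scrubbed_command("ruley");
    println!("scrubbed command for: {:?}", cmd.get_program());
}
```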

Async Tests

Use #[tokio::test] for async tests:

#[tokio::test]
async fn test_async_operation() {
    let result = some_async_function().await;
    assert!(result.is_ok());
}

Snapshot Tests

ruley uses insta for snapshot testing of CLI outputs and generated rules:

use insta::assert_snapshot;

#[test]
fn test_output_format() {
    let output = generate_output();
    assert_snapshot!(output);
}

Review and accept snapshot changes with:

cargo insta review

CI Testing

CI runs the full test suite on every push and pull request:

  • Quality: just lint-rust (formatting + clippy)
  • Tests: just test with all features
  • Cross-platform: Tests on Linux, macOS, and Windows
  • Feature combinations: Default features, no features, all features
  • MSRV: Checks compilation with stable minus 2 releases
  • Coverage: Generates and uploads to Codecov

All CI checks must pass before merge. See .github/workflows/ci.yml for the full configuration.

Writing Tests

Guidelines

  1. Test the behavior, not the implementation – Focus on inputs and outputs
  2. Use descriptive test names – e.g. test_chunk_size_exceeds_context_triggers_chunking
  3. One assertion per concept – Multiple assert! calls are fine, but each test should verify one logical behavior
  4. Avoid mocking when possible – Integration tests with real (but controlled) inputs are preferred
  5. Keep tests fast – Use small inputs and avoid network calls in unit tests

Unsafe Code in Tests

Rust 2024 edition makes std::env::set_var unsafe due to data race concerns. Tests that manipulate environment variables need #[allow(unsafe_code)]:

#[test]
#[allow(unsafe_code)]
fn test_env_var_override() {
    unsafe { std::env::set_var("RULEY_PROVIDER", "openai") };
    // ... test logic ...
    unsafe { std::env::remove_var("RULEY_PROVIDER") };
}

Security

This chapter covers ruley’s security model, vulnerability reporting, and security features.

Reporting Vulnerabilities

Do not report security vulnerabilities through public GitHub issues.

Use one of the following channels:

  1. GitHub Private Vulnerability Reporting (preferred)
  2. Email support@evilbitlabs.io encrypted with the project’s PGP key

Please include:

  • Description of the vulnerability
  • Steps to reproduce
  • Potential impact
  • Suggested fix (if any)

See SECURITY.md for full policy details including scope, response times, safe harbor provisions, and the PGP key.

Security Features

Code Safety

  • unsafe_code = "deny" enforced at the package level
  • Pure Rust implementation with no C dependencies in core logic
  • Zero unwrap() and panic!() in production code (enforced via clippy lints)

Credential Handling

  • API keys are read from environment variables at runtime
  • Keys are never stored in generated output files
  • Keys are never logged or included in error messages
  • No credential persistence between runs

Network Security

  • No network listening: ruley makes outbound-only connections
  • Connections are made only to configured LLM provider APIs
  • HTTPS is used for all API calls
  • No telemetry or analytics

Supply Chain Security

  • GitHub Actions pinned to full commit SHAs
  • cargo audit runs in CI to check for known vulnerabilities
  • cargo deny checks license compliance and duplicate dependencies
  • CodeQL analysis on every PR
  • OSSF Scorecard monitoring
  • Automated dependency updates via Dependabot

Scope

In Scope

  • API key or credential leakage through error messages, logs, or generated output
  • Command injection via CLI arguments or configuration files
  • Path traversal in file input/output handling
  • Prompt injection affecting output integrity
  • Denial of service via crafted input files or configuration
  • Unsafe handling of LLM responses (e.g., writing to unintended paths)

Out of Scope

  • Vulnerabilities in upstream LLM providers (Anthropic, OpenAI, etc.)
  • Issues requiring physical access to the machine
  • Social engineering attacks
  • LLM hallucinations or inaccurate generated rules (quality issue, not security)

Response Timeline

This is a volunteer-maintained project. Response times are best-effort:

  • Acknowledgment: Within 1 week
  • Initial assessment: Within 2 weeks
  • Fix release: Within 90 days of confirmed vulnerabilities
  • Disclosure: Coordinated through GitHub Security Advisories

Release Process

ruley releases are automated via cargo-dist and GitHub Actions. This chapter documents the release workflow and verification procedures.

Overview

Pushing a version tag (e.g., v1.0.0) triggers the release workflow, which:

  1. Validates the tag version matches Cargo.toml
  2. Builds binaries for all 5 platform targets
  3. Generates SHA256 checksums
  4. Signs artifacts via Sigstore/GitHub Attestations
  5. Creates a GitHub release with changelog and binaries
  6. Publishes to crates.io (non-prereleases only)
  7. Updates the Homebrew tap

Configuration lives in dist-workspace.toml.

Platform Targets

| Platform              | Target                    | Archive |
|-----------------------|---------------------------|---------|
| Linux x86_64          | x86_64-unknown-linux-gnu  | .tar.gz |
| Linux x86_64 (static) | x86_64-unknown-linux-musl | .tar.gz |
| Linux ARM64           | aarch64-unknown-linux-gnu | .tar.gz |
| macOS ARM64           | aarch64-apple-darwin      | .tar.gz |
| Windows x86_64        | x86_64-pc-windows-msvc    | .zip    |

Pre-Release Checklist

Before creating a release:

  • All tests pass locally: just ci-check
  • Zero clippy warnings: cargo clippy --all-targets --all-features -- -D warnings
  • Documentation is up to date (README.md, CHANGELOG.md)
  • Review open issues and PRs for release blockers
  • Release build succeeds: cargo build --release
  • Binary works correctly: ./target/release/ruley --help
  • Dry-run crates.io publish: cargo publish --dry-run --all-features

Version Bump Process

  1. Update the version in Cargo.toml:

    version = "X.Y.Z"
    
  2. Run cargo update to update Cargo.lock.

  3. Generate the changelog:

    just changelog
    
  4. Review and edit CHANGELOG.md for the new version entry.

  5. Commit all changes:

    git add Cargo.toml Cargo.lock CHANGELOG.md
    git commit -s -m "chore(release): prepare for vX.Y.Z"
    

Tag and Release

  1. Create an annotated tag:

    git tag -a vX.Y.Z -m "Release vX.Y.Z"
    
  2. Push the tag to trigger the release workflow:

    git push origin vX.Y.Z
    

Automated Release Pipeline

The release is managed by two GitHub Actions workflows:

release.yml (cargo-dist)

Triggered by v* tags. Builds platform binaries, creates the GitHub release, publishes to crates.io, and updates the Homebrew tap.

release-plz.yml

Runs on every push to main:

  • release-plz-pr: Creates a release preparation PR with version bumps and changelog updates
  • release-plz-release: Publishes to crates.io when version changes are merged

Changelog Generation

Changelogs are generated by git-cliff using the configuration in cliff.toml. Commits follow the Conventional Commits specification and are grouped by type:

  • Features, Bug Fixes, Refactoring, Documentation, Performance, Testing, Miscellaneous, Security

Rollback Procedure

If a release needs to be rolled back:

  1. Delete the tag locally: git tag -d vX.Y.Z
  2. Delete the tag remotely: git push origin :refs/tags/vX.Y.Z
  3. Delete the GitHub release via the web interface
  4. Yank from crates.io: cargo yank --version X.Y.Z

Yanking prevents new installs but does not remove the package. Existing Cargo.lock files referencing this version will still work.

Prerelease Versions

For release candidates or beta releases:

  • Use a prerelease tag: v1.0.0-rc.1, v1.0.0-beta.1
  • The release workflow marks these as prereleases on GitHub
  • Prerelease versions are not published to crates.io automatically

Versioning Policy

ruley follows Semantic Versioning:

  • Major (X.0.0): Breaking changes to CLI interface or config format
  • Minor (0.X.0): New features, new providers, new output formats
  • Patch (0.0.X): Bug fixes, dependency updates, documentation

Release Verification

This chapter explains how to verify the authenticity and integrity of ruley release artifacts.

GitHub Attestations

All release artifacts are signed via Sigstore using GitHub Attestations. This provides cryptographic proof that binaries were built by the official GitHub Actions workflow and have not been tampered with.

Verifying with gh

gh attestation verify <artifact> --repo EvilBit-Labs/ruley

Replace <artifact> with the path to the downloaded binary or archive.

What This Verifies

  • The artifact was built by the EvilBit-Labs/ruley repository’s GitHub Actions
  • The build environment matches the expected workflow
  • The artifact has not been modified since it was built

SHA256 Checksums

Each release includes SHA256 checksums for all platform binaries. These are attached to the GitHub release alongside the binaries.

Verifying Checksums

macOS / Linux

# Download the checksum file
curl -fsSLO https://github.com/EvilBit-Labs/ruley/releases/latest/download/sha256sums.txt

# Verify a specific artifact
sha256sum -c sha256sums.txt --ignore-missing

Windows

# Compute the hash of the downloaded archive
Get-FileHash ruley-x86_64-pc-windows-msvc.zip -Algorithm SHA256

crates.io Verification

When installing via cargo install ruley, Cargo verifies the package integrity automatically using the crates.io checksum. No additional steps are needed.

Verifying a Cargo Install

To verify the installed version:

ruley --version

Compare the output with the expected version from the releases page.

Supply Chain Security

ruley takes several measures to secure the build and release pipeline:

| Measure             | Description                                                     |
|---------------------|-----------------------------------------------------------------|
| Pinned Actions      | All GitHub Actions are pinned to full commit SHAs               |
| Sigstore signing    | Artifacts signed via GitHub Attestations                        |
| cargo-audit         | Checks for known vulnerabilities in dependencies                |
| cargo-deny          | Checks license compliance and duplicate dependencies            |
| CodeQL              | Static analysis for security vulnerabilities                    |
| OSSF Scorecard      | Automated security posture monitoring                           |
| Dependabot          | Automated dependency update PRs                                 |
| Reproducible builds | Pinned Rust toolchain via rust-toolchain.toml and mise.toml     |
| Committed lock file | Cargo.lock is committed for deterministic builds                |

Security Assurance Case

This document provides a structured security assurance case for ruley, identifying the attack surface, threat model, and mitigations in place.

Attack Surface

ruley’s attack surface is limited by design. It is a CLI tool that reads local files and makes outbound API calls.

Entry Points

| Entry Point           | Description                        | Trust Level  |
|-----------------------|------------------------------------|--------------|
| CLI arguments         | User-provided flags and paths      | Untrusted    |
| Configuration files   | TOML files loaded from disk        | Semi-trusted |
| Environment variables | API keys and overrides             | Trusted      |
| Repository files      | Source files scanned for analysis  | Untrusted    |
| LLM API responses     | Generated content from providers   | Untrusted    |
| Repomix files         | Pre-packed XML input files         | Untrusted    |

Exit Points

| Exit Point           | Description                                          |
|----------------------|------------------------------------------------------|
| Generated rule files | Written to disk at user-specified or default paths   |
| LLM API requests     | Outbound HTTPS calls to provider endpoints           |
| Console output       | Progress, cost estimates, summaries                  |
| Cache files          | .ruley/ directory for state and temp files           |

Threat Model

T1: Credential Leakage

Threat: API keys exposed in error messages, logs, or generated output.

Mitigations:

  • API keys are read from environment variables only, never persisted
  • Error messages do not include API key values
  • Generated rule files do not contain API keys
  • Logging does not expose credentials
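One common way to enforce this kind of guarantee is a wrapper type whose debug output is always redacted. This is an illustrative sketch, not ruley’s actual implementation:

```rust
// Hypothetical sketch: a key wrapper that can never leak via Debug output.
use std::fmt;

struct ApiKey(String);

impl fmt::Debug for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the wrapped value, even in error messages or logs.
        f.write_str("ApiKey(***redacted***)")
    }
}

fn main() {
    let key = ApiKey(std::env::var("ANTHROPIC_API_KEY").unwrap_or_default());
    // Formatting the key for an error message leaks nothing.
    println!("failed to authenticate with {key:?}");
    let _ = key.0; // the raw value remains available internally
}
```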

T2: Path Traversal

Threat: Malicious file paths in config or LLM responses writing outside the project directory.

Mitigations:

  • Output paths are resolved relative to the project root
  • The output module validates write paths
  • Config file paths are canonicalized during discovery
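The kind of check involved can be sketched as follows; the function name is illustrative, not ruley’s actual code:

```rust
// Hypothetical sketch: reject output paths that could escape the project root.
use std::path::{Component, Path, PathBuf};

fn resolve_output(root: &Path, requested: &Path) -> Result<PathBuf, String> {
    // Reject rooted paths and any `..` component before joining, so a
    // crafted path can never climb out of the project root.
    if requested.has_root()
        || requested.components().any(|c| matches!(c, Component::ParentDir))
    {
        return Err(format!("path escapes project root: {}", requested.display()));
    }
    Ok(root.join(requested))
}

fn main() {
    let root = Path::new("/project");
    println!("{:?}", resolve_output(root, Path::new(".cursor/rules/index.mdc")));
    println!("{:?}", resolve_output(root, Path::new("../outside.md")));
}
```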

T3: Command Injection

Threat: Crafted CLI arguments or config values executing unintended commands.

Mitigations:

  • clap validates all CLI input with value_parser and PossibleValuesParser
  • Config values are deserialized through serde (no shell evaluation)
  • No shell commands are executed from user input

T4: Prompt Injection via Codebase

Threat: Malicious content in scanned source files influencing LLM output to produce harmful rules.

Mitigations:

  • Generated rules are validated (syntax, schema, semantic checks)
  • Validation detects contradictory rules and unrealistic references
  • Users review generated rules before committing to their repository
  • Finalization stage can deconflict with existing rules

T5: Denial of Service

Threat: Crafted input causing excessive resource consumption (memory, CPU, network).

Mitigations:

  • Token counting prevents unbounded LLM calls
  • Chunk size limits cap memory usage per chunk
  • Cost confirmation requires explicit user approval before expensive operations
  • Bounded concurrency in async operations

T6: Supply Chain Compromise

Threat: Compromised dependencies or build artifacts.

Mitigations:

  • cargo audit checks for known vulnerabilities in CI
  • cargo deny enforces license and duplicate dependency policies
  • GitHub Actions pinned to commit SHAs (not mutable tags)
  • CodeQL static analysis on every PR
  • OSSF Scorecard monitoring
  • Sigstore artifact signing

Code Safety Guarantees

| Guarantee               | Enforcement                            |
|-------------------------|----------------------------------------|
| No unsafe code          | unsafe_code = "deny" in [lints.rust]   |
| No unwrap in production | unwrap_used = "deny" in clippy config  |
| No panic in production  | panic = "deny" in clippy config        |
| Zero clippy warnings    | -D warnings enforced in CI             |
| Dependency auditing     | cargo audit and cargo deny in CI       |

Data Flow Security

flowchart LR
    User["User<br/>(trusted)"] -->|CLI args,<br/>env vars| Ruley["ruley<br/>process"]
    Disk["Local files<br/>(semi-trusted)"] -->|config,<br/>source files| Ruley
    Ruley -->|HTTPS| LLM["LLM API<br/>(untrusted response)"]
    LLM -->|generated rules| Ruley
    Ruley -->|validated output| Output["Rule files<br/>(user reviews)"]
    Ruley -->|temp data| Cache[".ruley/<br/>cache"]

Key security boundaries:

  • Input boundary: All CLI arguments validated by clap; config files deserialized by serde
  • Network boundary: Only HTTPS outbound to configured providers; no inbound connections
  • Output boundary: Generated rules validated before writing; paths resolved relative to project root
  • Trust boundary: LLM responses treated as untrusted input; validated before use

Updating This Document

This document must be updated when:

  • New entry points are added (e.g., new input sources)
  • New exit points are added (e.g., new output destinations)
  • New dependencies are introduced that handle untrusted input
  • The network communication model changes
  • New LLM providers are added