Introduction

Make your codebase ruley. A Rust CLI tool for generating AI IDE rules from codebases.

ruley (the opposite of unruly) is a command-line tool that analyzes codebases and generates AI IDE rule files. It uses Large Language Models to understand project structure, conventions, and patterns, then produces actionable rules that help AI assistants provide better, context-aware code suggestions.

Tame your unruly codebase. Make it ruley.

Why ruley?

AI coding assistants work best when they understand your project’s conventions. Without explicit rules, they fall back to generic patterns that may not match your codebase. ruley bridges this gap by:

  1. Scanning your repository to understand its structure, languages, and patterns
  2. Compressing the codebase using tree-sitter for token efficiency
  3. Analyzing the compressed code with an LLM to extract conventions
  4. Generating format-specific rule files for your preferred AI IDE tools

The result is a set of rule files that teach AI assistants how your project works – coding style, architecture patterns, naming conventions, error handling approaches, and more.

Key Features

  • Single binary distribution – No runtime dependencies (Node.js, Python, etc.)
  • Multi-provider LLM support – Anthropic, OpenAI, Ollama, OpenRouter
  • Multi-format output – Generate rules for 7 different AI IDE formats in a single run
  • Native performance – Fast codebase analysis built with Rust
  • Smart compression – Tree-sitter-based code compression for token efficiency (~70% reduction)
  • Accurate token counting – Native tiktoken implementation for precise cost estimation
  • Cost transparency – Shows estimated cost before LLM calls, requires confirmation
  • Configurable – TOML configuration file, environment variables, and CLI flags

Supported Formats

| Format | Output File | Description |
|--------|-------------|-------------|
| Cursor | .cursor/rules/*.mdc | Cursor IDE rules |
| Claude | CLAUDE.md | Claude Code project instructions |
| Copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| Windsurf | .windsurfrules | Windsurf IDE rules |
| Aider | .aider.conf.yml | Aider conventions |
| Generic | .ai-rules.md | Generic markdown rules |
| JSON | .ai-rules.json | Machine-readable JSON |

Installation

Pre-built binaries are available for Linux (x86_64, ARM64), macOS (ARM64), and Windows (x86_64) on the releases page.

macOS / Linux

curl -fsSL https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.sh | sh

Windows

powershell -ExecutionPolicy Bypass -c "irm https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.ps1 | iex"

Homebrew

brew install EvilBit-Labs/tap/ruley

Cargo (crates.io)

cargo install ruley

This builds from source with default features (Anthropic, OpenAI, TypeScript compression).

With All Features

cargo install ruley --all-features

Minimal Install

cargo install ruley --no-default-features --features anthropic

cargo-binstall

If you have cargo-binstall installed:

cargo binstall ruley

Building from Source

git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley
cargo build --release

The binary will be at ./target/release/ruley.

System Requirements

  • Operating system: Linux (x86_64, ARM64), macOS (ARM64), Windows (x86_64)
  • Rust (build from source only): 1.91 or newer (see rust-version in Cargo.toml)
  • Network: Required for LLM API calls (except Ollama which runs locally)

Feature Flags

ruley uses Cargo feature flags to control which LLM providers and compression languages are compiled in:

| Feature | Description | Default |
|---------|-------------|---------|
| anthropic | Anthropic Claude provider | Yes |
| openai | OpenAI GPT provider | Yes |
| ollama | Ollama local model provider | No |
| openrouter | OpenRouter multi-model provider | No |
| all-providers | All LLM providers | No |
| compression-typescript | TypeScript tree-sitter grammar | Yes |
| compression-python | Python tree-sitter grammar | No |
| compression-rust | Rust tree-sitter grammar | No |
| compression-go | Go tree-sitter grammar | No |
| compression-all | All compression languages | No |

Verifying Releases

All release artifacts are signed via Sigstore using GitHub Attestations:

gh attestation verify <artifact> --repo EvilBit-Labs/ruley

See Release Verification for details.

Quick Start

This guide walks you through generating your first set of AI IDE rules with ruley.

Prerequisites

  1. ruley installed (see Installation)
  2. An API key for at least one LLM provider

Step 1: Set Your API Key

Set the environment variable for your chosen provider:

Anthropic

export ANTHROPIC_API_KEY="sk-ant-..."

OpenAI

export OPENAI_API_KEY="sk-..."

Ollama

# No API key needed -- just ensure Ollama is running
ollama serve

OpenRouter

export OPENROUTER_API_KEY="sk-or-..."

Step 2: Generate Rules

Navigate to your project directory and run ruley:

cd /path/to/your/project
ruley

By default, ruley uses Anthropic Claude and generates Cursor format rules.

Step 3: Review the Output

ruley shows you:

  1. Scan results – How many files were discovered
  2. Compression stats – Token reduction from tree-sitter compression
  3. Cost estimate – Estimated LLM cost before proceeding
  4. Confirmation prompt – You must approve before the LLM call is made
  5. Generated files – Where the rule files were written

Common Variations

Use a Different Provider

ruley --provider openai --model gpt-4o

Generate Multiple Formats

ruley --format cursor,claude,copilot

Generate All Formats at Once

ruley --format all

Enable Tree-Sitter Compression

ruley --compress

Analyze a Specific Directory

ruley ./my-project --compress

Dry Run (Preview Without Calling LLM)

ruley --dry-run

This shows what would be processed (file count, token estimate, cost) without making any LLM calls. Useful for checking costs before committing.

Skip Cost Confirmation

ruley --no-confirm

Use a Local Ollama Model

ruley --provider ollama --model llama3.1

What Happens Next

The generated rule files are placed in your project directory at the standard locations for each format. Your AI IDE tools will automatically pick them up:

  • Cursor: .cursor/rules/*.mdc – loaded automatically by Cursor IDE
  • Claude: CLAUDE.md – read by Claude Code as project context
  • Copilot: .github/copilot-instructions.md – loaded by GitHub Copilot
  • Windsurf: .windsurfrules – loaded by Windsurf IDE
  • Aider: .aider.conf.yml – loaded by Aider CLI

Commit the generated files to your repository so your whole team benefits from consistent AI assistance.

Command-Line Interface

Usage

ruley [OPTIONS] [PATH]

PATH: Path to repository (local path or remote URL). Defaults to . (current directory).

Options

Core Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| -p, --provider \<NAME> | RULEY_PROVIDER | anthropic | LLM provider (anthropic, openai, ollama, openrouter) |
| -m, --model \<NAME> | RULEY_MODEL | (provider default) | Model to use |
| -f, --format \<FORMATS> | RULEY_FORMAT | cursor | Output format(s), comma-separated |
| -o, --output \<PATH> | RULEY_OUTPUT | (format default) | Output file path (single format only) |
| -c, --config \<PATH> | RULEY_CONFIG | ruley.toml | Config file path |

Generation Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| --description \<TEXT> | RULEY_DESCRIPTION | (none) | Focus area for rule generation |
| --rule-type \<TYPE> | RULEY_RULE_TYPE | auto | Cursor rule type (auto, always, manual, agent-requested) |
| --compress | RULEY_COMPRESS | false | Enable tree-sitter compression |
| --chunk-size \<N> | RULEY_CHUNK_SIZE | 100000 | Max tokens per LLM chunk |
| --repomix-file \<PATH> | RULEY_REPOMIX_FILE | (none) | Use pre-packed repomix file as input |

Filtering Options

| Flag | Description |
|------|-------------|
| --include \<PATTERN> | Include only matching files (repeatable) |
| --exclude \<PATTERN> | Exclude matching files (repeatable) |

Behavior Options

| Flag | Env Variable | Default | Description |
|------|--------------|---------|-------------|
| --no-confirm | RULEY_NO_CONFIRM | false | Skip cost confirmation prompt |
| --dry-run | RULEY_DRY_RUN | false | Show plan without calling LLM |
| --on-conflict \<STRATEGY> | RULEY_ON_CONFLICT | prompt | Conflict resolution (prompt, overwrite, skip, smart-merge) |
| --retry-on-validation-failure |  | false | Auto-retry with LLM fix on validation failure |
| --no-deconflict |  | false | Disable LLM-based deconfliction with existing rules |
| --no-semantic-validation |  | false | Disable all semantic validation checks |

Output Options

| Flag | Description |
|------|-------------|
| -v | Increase verbosity (-v = DEBUG, -vv = TRACE) |
| -q | Suppress non-essential output |
| --version | Print version information |
| --help | Print help information |

Environment Variables

All CLI flags can be set via RULEY_* environment variables. CLI flags take precedence over environment variables, which take precedence over config file values.

Provider API Keys

| Variable | Provider | Required |
|----------|----------|----------|
| ANTHROPIC_API_KEY | Anthropic | When using --provider anthropic |
| OPENAI_API_KEY | OpenAI | When using --provider openai |
| OLLAMA_HOST | Ollama | Optional (default: http://localhost:11434) |
| OPENROUTER_API_KEY | OpenRouter | When using --provider openrouter |

Examples

Basic Usage

# Analyze current directory with defaults
ruley

# Analyze a specific project
ruley /path/to/project

Provider Selection

# Use OpenAI with a specific model
ruley --provider openai --model gpt-4o

# Use local Ollama
ruley --provider ollama --model llama3.1

# Use OpenRouter with Claude
ruley --provider openrouter --model anthropic/claude-3.5-sonnet

Format Control

# Generate Cursor rules (default)
ruley --format cursor

# Generate multiple formats
ruley --format cursor,claude,copilot

# Generate all formats
ruley --format all

# Write to a specific path (single format only)
ruley --format claude --output ./docs/CLAUDE.md

Compression and Performance

# Enable tree-sitter compression (~70% token reduction)
ruley --compress

# Adjust chunk size for large codebases
ruley --chunk-size 200000

# Use a pre-packed repomix file
ruley --repomix-file ./codebase.xml

Cost Management

# Preview without calling the LLM
ruley --dry-run

# Skip the cost confirmation prompt
ruley --no-confirm

Conflict Resolution

# Overwrite existing rule files
ruley --on-conflict overwrite

# Skip if files already exist
ruley --on-conflict skip

# Use LLM to smart-merge with existing rules
ruley --on-conflict smart-merge

Filtering Files

# Only include Rust files
ruley --include "**/*.rs"

# Exclude test directories
ruley --exclude "**/tests/**" --exclude "**/benches/**"

Configuration

ruley supports hierarchical configuration from multiple sources. This page documents the configuration file format and precedence rules.

Configuration Precedence

Configuration is resolved in this order (highest to lowest precedence):

  1. CLI flags – Explicitly provided command-line arguments
  2. Environment variables – RULEY_* prefix (handled by clap’s env attribute)
  3. Config files – Loaded and merged in discovery order (see below)
  4. Built-in defaults – Hardcoded in the CLI parser

When a CLI flag is explicitly provided, it always wins. When it’s not provided (using the default), the config file value is used instead.
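
This resolution can be sketched as an Option-based merge, where each layer fills only the values a higher-precedence layer left unset. The names below are illustrative, not ruley’s actual internals:

```rust
// Illustrative sketch of layered config resolution: CLI > env > config file,
// falling back to built-in defaults when no layer sets a value.
#[derive(Clone, Debug, PartialEq)]
struct Layer {
    provider: Option<String>,
    chunk_size: Option<u32>,
}

// Take the first Some across layers, ordered highest precedence first.
fn resolve(layers: &[Layer]) -> (String, u32) {
    let provider = layers
        .iter()
        .find_map(|l| l.provider.clone())
        .unwrap_or_else(|| "anthropic".to_string()); // built-in default
    let chunk_size = layers
        .iter()
        .find_map(|l| l.chunk_size)
        .unwrap_or(100_000); // built-in default
    (provider, chunk_size)
}

fn main() {
    let cli = Layer { provider: None, chunk_size: Some(200_000) };
    let env = Layer { provider: Some("openai".into()), chunk_size: None };
    let file = Layer { provider: Some("ollama".into()), chunk_size: Some(50_000) };
    // The CLI flag wins for chunk_size; the env var wins for provider.
    assert_eq!(resolve(&[cli, env, file]), ("openai".to_string(), 200_000));
}
```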

Config File Discovery

Config files are discovered and merged in this order (later overrides earlier):

  1. ~/.config/ruley/config.toml – User-level global config
  2. ruley.toml in the git repository root – Project-level config
  3. ./ruley.toml in the current directory – Working directory config
  4. Explicit --config <path> – If provided, overrides all above

All discovered files are merged. Duplicate keys in later files override earlier ones.

Configuration File Format

Configuration files use TOML format. All sections are optional.

Complete Example

[general]
provider = "anthropic"
model = "claude-sonnet-4-5-20250929"
format = ["cursor", "claude"]
compress = true
chunk_size = 100000
no_confirm = false
rule_type = "auto"

[output]
formats = ["cursor", "claude"]
on_conflict = "prompt"

[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "CLAUDE.md"

[include]
patterns = ["**/*.rs", "**/*.toml"]

[exclude]
patterns = ["**/target/**", "**/node_modules/**"]

[chunking]
chunk_size = 100000
overlap = 10000

[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192

[providers.openai]
model = "gpt-4o"
max_tokens = 4096

[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"

[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192

[validation]
enabled = true
retry_on_failure = false
max_retries = 3

[validation.semantic]
check_file_paths = true
check_contradictions = true
check_consistency = true
check_reality = true

[finalization]
enabled = true
deconflict = true
normalize_formatting = true
inject_metadata = true

[general] Section

Core settings for the pipeline.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| provider | string | "anthropic" | LLM provider name |
| model | string | (provider default) | Model to use |
| format | string[] | ["cursor"] | Output formats |
| compress | bool | false | Enable tree-sitter compression |
| chunk_size | int | 100000 | Max tokens per LLM chunk |
| no_confirm | bool | false | Skip cost confirmation |
| rule_type | string | "auto" | Cursor rule type |

[output] Section

Output format and path configuration.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| formats | string[] | [] | Alternative to general.format |
| on_conflict | string | "prompt" | Conflict resolution strategy |
| paths.\<format> | string | (format default) | Custom output path per format |

[include] / [exclude] Sections

File filtering using glob patterns.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| patterns | string[] | [] | Glob patterns for file matching |

[chunking] Section

Controls how large codebases are split for LLM processing.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| chunk_size | int | 100000 | Max tokens per chunk |
| overlap | int | chunk_size / 10 | Token overlap between chunks |
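
The effect of overlap can be sketched as follows; this is a simplified illustration of the idea, not ruley’s actual chunker:

```rust
// Split a token sequence into chunks of at most `chunk_size` tokens,
// where consecutive chunks share `overlap` tokens of context.
fn chunk_tokens(tokens: &[u32], chunk_size: usize, overlap: usize) -> Vec<Vec<u32>> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk size");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break;
        }
        // Step forward by chunk_size - overlap so adjacent chunks share context.
        start = end - overlap;
    }
    chunks
}

fn main() {
    let tokens: Vec<u32> = (0..25).collect();
    // chunk_size 10, overlap 2 yields token ranges 0..10, 8..18, 16..25.
    let chunks = chunk_tokens(&tokens, 10, 2);
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[1][0], 8);
}
```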

[providers] Section

Provider-specific configuration. Each provider has its own subsection.

[providers.anthropic] / [providers.openai] / [providers.openrouter]:

| Key | Type | Description |
|-----|------|-------------|
| model | string | Model name override |
| max_tokens | int | Max output tokens |

[providers.ollama]:

| Key | Type | Description |
|-----|------|-------------|
| host | string | Ollama server URL |
| model | string | Model name |

[validation] Section

Controls validation of generated rules.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | bool | true | Enable validation |
| retry_on_failure | bool | false | Auto-retry with LLM fix |
| max_retries | int | 3 | Max auto-fix attempts |

[validation.semantic] – Semantic validation checks:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| check_file_paths | bool | true | Verify referenced file paths exist |
| check_contradictions | bool | true | Detect contradictory rules |
| check_consistency | bool | true | Cross-format consistency check |
| check_reality | bool | true | Verify language/framework references |

[finalization] Section

Controls post-processing of generated rules.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | bool | true | Enable finalization |
| deconflict | bool | true | LLM-based deconfliction with existing rules |
| normalize_formatting | bool | true | Normalize line endings and whitespace |
| inject_metadata | bool | true | Add timestamp/version/provider headers |

LLM Providers

ruley supports multiple LLM providers. Each provider is feature-gated at compile time and requires its own API key (except Ollama).

Provider Comparison

| Provider | API Key Required | Local | Default Model | Context Window |
|----------|------------------|-------|---------------|----------------|
| Anthropic | Yes | No | claude-sonnet-4-5-20250929 | 200K tokens |
| OpenAI | Yes | No | gpt-4o | 128K tokens |
| Ollama | No | Yes | llama3.1:70b | ~100K tokens |
| OpenRouter | Yes | No | anthropic/claude-3.5-sonnet | Varies by model |

Anthropic

Anthropic’s Claude models are the default provider and generally produce excellent rule quality.

Setup

export ANTHROPIC_API_KEY="sk-ant-..."

Usage

# Uses default model (Claude Sonnet 4.5)
ruley --provider anthropic

# Specify a model
ruley --provider anthropic --model claude-sonnet-4-5-20250929

Config File

[general]
provider = "anthropic"

[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192

OpenAI

OpenAI’s GPT models provide strong rule generation with fast response times.

Setup

export OPENAI_API_KEY="sk-..."

Usage

ruley --provider openai --model gpt-4o

Config File

[general]
provider = "openai"

[providers.openai]
model = "gpt-4o"
max_tokens = 4096

Ollama

Ollama runs models locally. No API key is needed, and there are no per-token costs. This is ideal for privacy-sensitive codebases or offline use.

Setup

  1. Install Ollama
  2. Pull a model: ollama pull llama3.1:70b
  3. Start the server: ollama serve

Usage

ruley --provider ollama --model llama3.1

# Custom Ollama host
OLLAMA_HOST="http://192.168.1.100:11434" ruley --provider ollama

Config File

[general]
provider = "ollama"

[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"

Considerations

  • Rule quality depends heavily on the model size. Larger models (70B+) produce better results.
  • Local models have smaller context windows. Use --compress and --chunk-size to manage large codebases.
  • No cost confirmation is shown since Ollama is free to use.

OpenRouter

OpenRouter provides access to models from multiple providers through a single API. It fetches dynamic pricing from the OpenRouter API for accurate cost estimation.

Setup

export OPENROUTER_API_KEY="sk-or-..."

Usage

ruley --provider openrouter --model anthropic/claude-3.5-sonnet

Config File

[general]
provider = "openrouter"

[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192

Feature Flags

Providers are compiled in via Cargo feature flags. The default build includes anthropic and openai.

| Feature | Provider |
|---------|----------|
| anthropic | Anthropic (default) |
| openai | OpenAI (default) |
| ollama | Ollama |
| openrouter | OpenRouter |
| all-providers | All of the above |

To include all providers when building from source:

cargo install ruley --features all-providers

Choosing a Provider

  • Best quality: Anthropic Claude (default) – excellent at understanding code conventions
  • Fastest: OpenAI GPT-4o – lower latency per request
  • Free / Private: Ollama – no API costs, data stays local
  • Flexible: OpenRouter – access to many models through one API

Output Formats

ruley generates rule files in 7 formats. Each format targets a specific AI IDE tool and follows its conventions for file naming, structure, and content.

Format Overview

| Format | Output File | Description |
|--------|-------------|-------------|
| cursor | .cursor/rules/*.mdc | Cursor IDE rules with frontmatter |
| claude | CLAUDE.md | Claude Code project instructions |
| copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| windsurf | .windsurfrules | Windsurf IDE rules |
| aider | .aider.conf.yml | Aider conventions |
| generic | .ai-rules.md | Generic markdown rules |
| json | .ai-rules.json | Machine-readable JSON |

Selecting Formats

Single Format (Default)

# Cursor format (default)
ruley

# Claude format
ruley --format claude

Multiple Formats

ruley --format cursor,claude,copilot

All Formats

ruley --format all

Custom Output Path

For a single format, you can override the output path:

ruley --format claude --output ./docs/CLAUDE.md

For multiple formats, use the config file:

[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "docs/CLAUDE.md"

Format Details

Cursor (.mdc)

Cursor IDE rules use the .mdc (markdown component) format with YAML frontmatter. Rules are placed in .cursor/rules/ and loaded automatically by Cursor.

The --rule-type flag controls the frontmatter alwaysApply field:

| Rule Type | Behavior |
|-----------|----------|
| auto | LLM decides based on rule content |
| always | Rules always apply to every file |
| manual | Rules must be manually activated |
| agent-requested | Rules are requested by the AI agent |

Claude (CLAUDE.md)

A single markdown file at the project root. Claude Code reads this file as project context for all conversations. Content is structured as guidelines and conventions in standard markdown.

Copilot (.github/copilot-instructions.md)

GitHub Copilot’s project-level instructions file. Placed in the .github/ directory. Content is natural language instructions that guide Copilot’s suggestions.

Windsurf (.windsurfrules)

Windsurf IDE rules file at the project root. Similar to Cursor rules but without frontmatter. Content is structured as conventions and patterns.

Aider (.aider.conf.yml)

Aider’s configuration file in YAML format. Contains conventions and patterns that guide Aider’s code generation.

Generic (.ai-rules.md)

A generic markdown format not tied to any specific tool. Useful as a portable set of conventions that can be manually included in any AI assistant’s context.

JSON (.ai-rules.json)

Machine-readable JSON format for programmatic consumption. Contains the same convention data in a structured format suitable for integration with custom tools.

Conflict Resolution

When output files already exist, ruley offers several strategies:

| Strategy | Behavior |
|----------|----------|
| prompt | Ask the user what to do (default, interactive) |
| overwrite | Replace existing files (creates backups) |
| skip | Skip formats where files exist |
| smart-merge | Use LLM to merge new rules with existing ones |

Set the strategy via CLI or config:

ruley --on-conflict smart-merge

[output]
on_conflict = "smart-merge"

When overwrite is used, ruley creates .bak backups of existing files before writing.
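
The strategies map naturally onto an enum plus a match; the sketch below uses illustrative types, not ruley’s actual ones:

```rust
use std::path::Path;

// Illustrative model of the four conflict strategies.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnConflict {
    Prompt,
    Overwrite,
    Skip,
    SmartMerge,
}

#[derive(Debug, PartialEq)]
enum Action {
    Write,        // no conflict, or overwrite (the real tool backs up first)
    AskUser,
    SkipFile,
    MergeWithLlm,
}

fn decide(strategy: OnConflict, target: &Path) -> Action {
    if !target.exists() {
        return Action::Write; // no conflict at all
    }
    match strategy {
        OnConflict::Prompt => Action::AskUser,
        OnConflict::Overwrite => Action::Write,
        OnConflict::Skip => Action::SkipFile,
        OnConflict::SmartMerge => Action::MergeWithLlm,
    }
}

fn main() {
    // A path that does not exist is always written, regardless of strategy.
    assert_eq!(decide(OnConflict::Skip, Path::new("/no/such/file")), Action::Write);
    // "/" exists, so the strategy determines the action.
    assert_eq!(decide(OnConflict::SmartMerge, Path::new("/")), Action::MergeWithLlm);
}
```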

Single Analysis, Multiple Outputs

ruley performs a single LLM analysis of your codebase, then generates format-specific rules through a refinement step. This means:

  • The analysis cost is paid once regardless of how many formats you generate
  • Each format adds a small refinement LLM call to adapt the analysis to format-specific conventions
  • Generating all 7 formats is only marginally more expensive than generating 1

Architecture Overview

ruley is a single-crate Rust CLI tool organized into focused modules. This chapter describes the high-level architecture, module responsibilities, and design principles.

Module Map

graph TB
    CLI["cli/<br/>Argument parsing<br/>& configuration"]
    Packer["packer/<br/>File discovery<br/>& compression"]
    LLM["llm/<br/>Provider abstraction<br/>& token counting"]
    Gen["generator/<br/>Prompt templates<br/>& rule parsing"]
    Output["output/<br/>Format writers<br/>& conflict resolution"]
    Utils["utils/<br/>Errors, progress<br/>& caching"]

    CLI --> Packer
    CLI --> LLM
    Packer --> LLM
    LLM --> Gen
    Gen --> Output
    CLI --> Utils
    Packer --> Utils
    LLM --> Utils
    Gen --> Utils
    Output --> Utils

Module Responsibilities

| Module | Purpose |
|--------|---------|
| cli/ | Command-line interface with clap argument parsing, config file loading and merging |
| packer/ | Repository scanning, file discovery, gitignore handling, tree-sitter compression |
| llm/ | Multi-provider LLM integration, tokenization, chunking, cost calculation |
| generator/ | Analysis and refinement prompt templates, response parsing, rule structures |
| output/ | Multi-format file writers, conflict resolution, smart-merge |
| utils/ | Shared utilities: error types, progress bars, caching, state management, validation |

Design Principles

Provider-Agnostic LLM Interface

All LLM providers implement the LLMProvider trait, which defines a standard interface for completions. The LLMClient wraps a provider and handles retry logic. New providers can be added by implementing the trait and gating behind a Cargo feature flag.
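
The shape of such a design, sketched with hypothetical names (the real trait is likely async and richer):

```rust
// Illustrative provider abstraction; names are hypothetical.
trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// A stand-in provider, e.g. for tests or dry runs.
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str { "echo" }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

// A client wrapping any provider behind the trait object.
struct Client {
    provider: Box<dyn LlmProvider>,
}

impl Client {
    fn run(&self, prompt: &str) -> Result<String, String> {
        // Retry/backoff logic would wrap this call in the real tool.
        self.provider.complete(prompt)
    }
}

fn main() {
    let client = Client { provider: Box::new(EchoProvider) };
    assert_eq!(client.run("hello").unwrap(), "echo: hello");
    println!("provider = {}", client.provider.name());
}
```

Because callers only see the trait, swapping Anthropic for Ollama is a construction-time decision, not a code change.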

Format-Agnostic Rule Generation

The pipeline performs a single LLM analysis pass, then generates format-specific rules through lightweight refinement calls. The GeneratedRules structure holds format-independent analysis results and per-format FormattedRules. This means adding a new output format requires only a new refinement prompt and writer – no changes to the analysis pipeline.

Token-Efficient Processing

ruley minimizes LLM costs through:

  • Tree-sitter compression: AST-based extraction reduces token count by ~70%
  • Accurate counting: Native tiktoken tokenization matches provider billing
  • Intelligent chunking: Large codebases are split at logical boundaries
  • Cost transparency: Estimates are shown before any LLM calls

Local-First Design

The scanning, compression, and output stages run entirely locally without network access. Only the analysis and refinement stages call external LLM APIs. When using Ollama, the entire pipeline runs on your machine.

Data Flow

flowchart LR
    Repo["Repository<br/>files"] --> Scan["Scan &<br/>filter"]
    Scan --> Compress["Compress<br/>(tree-sitter)"]
    Compress --> Tokenize["Tokenize<br/>& chunk"]
    Tokenize --> Analyze["LLM<br/>analysis"]
    Analyze --> Refine["Format<br/>refinement"]
    Refine --> Validate["Validate<br/>& finalize"]
    Validate --> Write["Write<br/>files"]

  1. Repository files are scanned respecting .gitignore rules
  2. Source files are compressed via tree-sitter (if enabled) to reduce token count
  3. The compressed codebase is tokenized and split into chunks if needed
  4. Chunks are sent to the LLM for analysis to extract conventions
  5. The analysis is refined per output format through targeted prompts
  6. Generated rules are validated (syntax, schema, semantic checks)
  7. Final rules are written to disk at format-standard locations

See Rule Generation Pipeline for detailed stage-by-stage documentation.

Key Abstractions

PipelineContext

The central state container passed through all 10 pipeline stages. It carries:

  • config: MergedConfig – Final resolved configuration
  • stage: PipelineStage – Current execution stage
  • compressed_codebase – Scanned and compressed repository data
  • generated_rules – Analysis results and formatted rules
  • cost_tracker – Running tally of LLM costs
  • progress_manager – Visual progress feedback

MergedConfig

The single source of truth for all configuration values, produced by merging CLI flags, environment variables, and config files.

LLMProvider Trait

The abstraction layer for LLM providers. Each provider (Anthropic, OpenAI, Ollama, OpenRouter) implements this trait. The LLMClient wraps a provider and adds retry logic with exponential backoff.
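
Exponential backoff typically doubles the delay per attempt up to a cap; a minimal sketch, where the base delay and cap are illustrative assumptions rather than ruley’s actual values:

```rust
use std::time::Duration;

// Delay before retry `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64) -> Duration {
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(exp.min(cap_ms))
}

fn main() {
    // With a 500ms base and 8s cap: 500ms, 1s, 2s, 4s, then capped at 8s.
    assert_eq!(backoff_delay(0, 500, 8_000).as_millis(), 500);
    assert_eq!(backoff_delay(1, 500, 8_000).as_millis(), 1_000);
    assert_eq!(backoff_delay(3, 500, 8_000).as_millis(), 4_000);
    assert_eq!(backoff_delay(5, 500, 8_000).as_millis(), 8_000);
}
```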

GeneratedRules

Holds the format-independent analysis and per-format rule content. Populated during the analysis and formatting stages, consumed during writing.

Rule Generation Pipeline

ruley processes codebases through a 10-stage pipeline. Each stage has a clear responsibility and transitions cleanly to the next. The PipelineContext carries state through all stages.

Pipeline Stages

flowchart TD
    S1["1. Init<br/>Configuration validation"]
    S2["2. Scanning<br/>File discovery"]
    S3["3. Compressing<br/>Tree-sitter compression"]
    S4["4. Analyzing<br/>LLM analysis"]
    S5["5. Formatting<br/>Per-format refinement"]
    S6["6. Validating<br/>Rule validation"]
    S7["7. Finalizing<br/>Post-processing"]
    S8["8. Writing<br/>File output"]
    S9["9. Reporting<br/>Summary display"]
    S10["10. Cleanup<br/>Temp file removal"]

    S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8 --> S9 --> S10

Stage 1: Init

Module: src/lib.rs

  • Validates the repository path exists
  • Creates the .ruley/ cache directory
  • Cleans up old temporary files (24-hour threshold)
  • Ensures .ruley/ is in .gitignore
  • Loads previous run state from .ruley/state.json

Stage 2: Scanning

Module: src/packer/

  • Discovers all files in the repository
  • Respects .gitignore rules via the ignore crate
  • Applies --include and --exclude glob patterns
  • Identifies file languages for compression
  • Caches the file list to .ruley/ for debugging

If --repomix-file is provided, scanning is skipped and the pre-packed file is used directly.

Stage 3: Compressing

Module: src/packer/compression/

  • Reads discovered files and their content
  • When --compress is enabled, uses tree-sitter grammars to extract structural elements (functions, classes, types) while removing implementation details
  • Calculates compression metadata (file count, original size, compressed size, ratio)
  • Target compression ratio: ~70% token reduction

Without --compress, files are included at full size.

Stage 4: Analyzing

Module: src/llm/, src/generator/

This is the core LLM interaction stage:

  1. Tokenize: Count tokens in the compressed codebase using the provider’s tokenizer
  2. Chunk: If the codebase exceeds the provider’s context window, split into chunks with configurable overlap
  3. Cost estimate: Calculate and display estimated cost
  4. Confirm: Prompt the user to approve (unless --no-confirm)
  5. Analyze: Send each chunk to the LLM with the analysis prompt
  6. Merge: If multi-chunk, perform an additional LLM call to merge chunk analyses
  7. Parse: Extract structured GeneratedRules from the LLM response
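
Step 3 above is simple arithmetic over per-million-token prices. A sketch, using placeholder prices rather than real provider rates:

```rust
// Estimate LLM cost in USD from token counts and per-million-token prices.
// The prices passed in below are placeholders for illustration only.
fn estimate_cost(
    input_tokens: u64,
    est_output_tokens: u64,
    input_price_per_m: f64,
    output_price_per_m: f64,
) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_price_per_m
        + (est_output_tokens as f64 / 1_000_000.0) * output_price_per_m
}

fn main() {
    // 150K input tokens and 8K output tokens at $3 / $15 per million:
    // 0.45 + 0.12 = $0.57
    let cost = estimate_cost(150_000, 8_000, 3.0, 15.0);
    assert!((cost - 0.57).abs() < 1e-9);
    println!("estimated cost: ${cost:.2}");
}
```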

The analysis prompt asks the LLM to identify:

  • Project conventions and coding style
  • Architecture patterns and module structure
  • Error handling approaches
  • Testing practices
  • Naming conventions

Stage 5: Formatting

Module: src/generator/

For each requested output format:

  1. Build a format-specific refinement prompt with the analysis result
  2. Call the LLM to generate format-adapted content
  3. Store the result in GeneratedRules.rules_by_format

Each format refinement is a separate LLM call to ensure format-specific conventions are followed (e.g., YAML frontmatter for Cursor, markdown for Claude).

Stage 6: Validating

Module: src/utils/validation.rs

Validates generated rules against multiple criteria:

  • Syntax validation: Format-specific structure checks (valid YAML, valid frontmatter, etc.)
  • Schema validation: Required fields and structure
  • Semantic validation (configurable):
    • File paths referenced in rules exist in the codebase
    • No contradictory rules
    • Cross-format consistency
    • Languages/frameworks match the actual codebase

If validation fails and --retry-on-validation-failure is set, ruley sends the errors back to the LLM for auto-fix (up to max_retries attempts).
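
The check_file_paths validation, for example, amounts to verifying each referenced path against the repository root. A simplified stand-alone sketch:

```rust
use std::path::Path;

// Return the referenced paths that do not exist under `root`.
fn missing_paths<'a>(root: &Path, referenced: &[&'a str]) -> Vec<&'a str> {
    referenced
        .iter()
        .copied()
        .filter(|rel| !root.join(rel).exists())
        .collect()
}

fn main() {
    let root = Path::new(".");
    // "." always exists; the second path is deliberately bogus.
    let missing = missing_paths(root, &[".", "no/such/dir/file.rs"]);
    assert_eq!(missing, vec!["no/such/dir/file.rs"]);
}
```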

Stage 7: Finalizing

Module: src/utils/finalization.rs

Post-processing of validated rules:

  • Metadata injection: Adds generation timestamp, ruley version, and provider info
  • Deconfliction: If existing rule files are present, uses an LLM call to merge new rules with existing ones (unless --no-deconflict)
  • Formatting normalization: Normalizes line endings and trailing whitespace
  • Post-finalize smoke validation: Re-validates after finalization to catch any introduced errors

Stage 8: Writing

Module: src/output/

Writes rule files to disk:

  • Resolves output paths (format defaults, config overrides, --output flag)
  • Applies conflict resolution strategy (prompt, overwrite, skip, smart-merge)
  • Creates backups when overwriting
  • Reports what was written (created, updated, skipped, merged)

Stage 9: Reporting

Module: src/utils/summary.rs

Displays a summary of the pipeline run:

  • Files analyzed
  • Tokens processed
  • Compression ratio (if applicable)
  • Total LLM cost
  • Elapsed time
  • Output files written

Stage 10: Cleanup

Module: src/lib.rs, src/utils/cache.rs

Final cleanup:

  • Saves pipeline state to .ruley/state.json (for future runs)
  • Cleans up temporary files in .ruley/
  • Transitions to the Complete terminal state

Dry Run Mode

When --dry-run is specified, the pipeline runs stages 1-3 (Init, Scanning, Compressing), displays what would be processed (file count, token estimate, cost), and exits without making any LLM calls.

Tree-Sitter Compression

ruley uses tree-sitter grammars to compress source code before sending it to the LLM. This reduces token count by approximately 70%, significantly lowering costs for large codebases.

How It Works

Tree-sitter parses source files into abstract syntax trees (ASTs). ruley walks these ASTs to extract structural elements – function signatures, type definitions, class declarations, imports – while removing implementation bodies. The result is a compressed representation that preserves the project’s API surface and architecture while discarding the details.

Before Compression

pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> {
    let files = scan_files(path, config)?;
    let mut analysis = Analysis::new();
    for file in &files {
        let content = std::fs::read_to_string(&file.path)?;
        let tokens = tokenize(&content);
        analysis.add_file(file, tokens);
    }
    analysis.finalize()
}

After Compression

pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> { ... }

The LLM sees the function signature, return type, and parameter types – enough to understand the codebase’s API surface without the implementation details.
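
ruley performs this extraction with tree-sitter ASTs. Purely as an illustration of the effect, a toy brace-matching stripper (far less robust than an AST walk, and easily fooled by braces in strings) looks like this:

```rust
// Toy body stripper: replace the outermost `{ ... }` of each top-level
// item with a literal `{ ... }`. Real compression walks a tree-sitter AST.
fn strip_bodies(src: &str) -> String {
    let mut out = String::new();
    let mut depth = 0usize;
    for ch in src.chars() {
        match ch {
            '{' => {
                depth += 1;
                if depth == 1 {
                    out.push_str("{ ... }");
                }
            }
            '}' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {} // inside a body: drop it
        }
    }
    out
}

fn main() {
    let src = "pub fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n";
    assert_eq!(strip_bodies(src), "pub fn add(a: i32, b: i32) -> i32 { ... }\n");
}
```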

Supported Languages

Each language requires a tree-sitter grammar compiled into ruley via Cargo feature flags:

| Language   | Feature Flag                       | Grammar Version               |
|------------|------------------------------------|-------------------------------|
| TypeScript | compression-typescript (default)   | tree-sitter-typescript 0.23.2 |
| Python     | compression-python                 | tree-sitter-python 0.25.0     |
| Rust       | compression-rust                   | tree-sitter-rust 0.24.0       |
| Go         | compression-go                     | tree-sitter-go 0.25.0         |

Enable all languages with:

cargo install ruley --features compression-all

Files in unsupported languages are included at full size (no compression applied).

Usage

Enable compression with the --compress flag:

ruley --compress

Or in the config file:

[general]
compress = true

What Gets Extracted

The compression extracts structural elements that help the LLM understand your codebase:

  • Functions: Signatures, parameters, return types
  • Types: Struct/class definitions, enum variants, type aliases
  • Traits/Interfaces: Method signatures
  • Imports: Module dependencies
  • Constants: Top-level constant definitions
  • Module structure: File and directory organization

What Gets Removed

Implementation details that don’t affect the LLM’s understanding of conventions:

  • Function bodies (replaced with { ... })
  • Loop internals
  • Conditional branches
  • Local variable assignments
  • Comments (optional, depending on grammar)

Compression Metrics

ruley tracks and reports compression statistics:

  • Total files: Number of files processed
  • Original size: Total bytes before compression
  • Compressed size: Total bytes after compression
  • Compression ratio: Ratio of compressed to original (lower is better)

These metrics are displayed during pipeline execution and in the final summary.
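As a minimal sketch of how such a ratio could be computed (type and field names here are illustrative, not ruley’s actual API):

```rust
// Hypothetical sketch: compression ratio as reported in the summary.
struct CompressionStats {
    original_bytes: u64,
    compressed_bytes: u64,
}

impl CompressionStats {
    /// Ratio of compressed to original size; lower is better.
    fn ratio(&self) -> f64 {
        if self.original_bytes == 0 {
            return 1.0; // nothing to compress
        }
        self.compressed_bytes as f64 / self.original_bytes as f64
    }
}

fn main() {
    let stats = CompressionStats {
        original_bytes: 1_000_000,
        compressed_bytes: 300_000,
    };
    // A 0.30 ratio corresponds to the ~70% reduction described above.
    println!("compression ratio: {:.2}", stats.ratio());
}
```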

When to Use Compression

Use compression when:

  • Your codebase is large (>1000 files or >500K tokens)
  • You want to minimize LLM costs
  • The codebase has languages with tree-sitter grammar support

Skip compression when:

  • Your codebase is small (the cost savings are negligible)
  • You need the LLM to see implementation details for accurate convention extraction
  • Your primary language doesn’t have a tree-sitter grammar in ruley

ABI Compatibility

ruley uses tree-sitter 0.26.x (ABI v15). Language parsers may use slightly older ABI versions:

  • tree-sitter-go 0.25.0: ABI v15
  • tree-sitter-python 0.25.0: ABI v15
  • tree-sitter-rust 0.24.0: ABI v15
  • tree-sitter-typescript 0.23.2: ABI v14 (supported through the core library’s backward compatibility)

The tree-sitter core library supports backward-compatible ABI versions, so older grammar versions work correctly.

Token Counting and Chunking

Development Setup

This chapter covers setting up a development environment for contributing to ruley.

Prerequisites

  • Rust 1.91+ (see rust-version in Cargo.toml for minimum supported version)
  • Git for version control
  • mise (recommended) for development toolchain management

Quick Start

# Clone the repository
git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley

# Install development tools (mise handles everything via mise.toml)
just setup

# Build the project
just build

# Run tests
just test

# Run the CLI
just run --help

Toolchain Management

ruley uses mise to manage the development toolchain. The mise.toml file at the project root defines all required tools and versions:

  • Rust 1.93.1 with rustfmt and clippy components
  • cargo-nextest for faster test execution
  • cargo-llvm-cov for code coverage
  • cargo-audit and cargo-deny for security auditing
  • mdbook and plugins for documentation
  • git-cliff for changelog generation
  • pre-commit for pre-commit hooks
  • actionlint for GitHub Actions linting

Run mise install to install all tools, or let just setup handle it.

Without mise

If you prefer not to use mise, install Rust via rustup and install individual tools with cargo install:

rustup toolchain install 1.93.1 --profile default -c rustfmt,clippy
cargo install cargo-nextest cargo-llvm-cov cargo-audit cargo-deny

Development Commands

ruley uses just as its task runner. Run just to see all available recipes:

| Command            | Description                                        |
|--------------------|----------------------------------------------------|
| just test          | Run tests with nextest (all features)              |
| just test-verbose  | Run tests with output                              |
| just lint          | Run rustfmt check + clippy (all features)          |
| just clippy-min    | Run clippy with no default features                |
| just check         | Quick check: pre-commit + lint + build-check       |
| just ci-check      | Full CI suite: lint, test, build, audit, coverage  |
| just build         | Debug build                                        |
| just build-release | Release build (all features, LTO)                  |
| just fmt           | Format code                                        |
| just coverage      | Generate LCOV coverage report                      |
| just audit         | Run cargo audit                                    |
| just deny          | Run cargo deny checks                              |
| just outdated      | Check for outdated dependencies                    |
| just doc           | Generate and open rustdoc                          |
| just docs-serve    | Serve mdbook docs locally with live reload         |
| just run <args>    | Run the CLI with arguments                         |
| just changelog     | Generate CHANGELOG.md from git history             |

IDE Setup

rust-analyzer

ruley works well with rust-analyzer. Recommended VS Code settings:

{
  "rust-analyzer.cargo.features": "all",
  "rust-analyzer.check.command": "clippy",
  "rust-analyzer.check.extraArgs": [
    "--all-features"
  ]
}

Project Structure

ruley/
  src/
    cli/          # CLI argument parsing and config management
    packer/       # File discovery, gitignore, compression
    llm/          # LLM providers, tokenization, chunking
    generator/    # Prompt templates and rule parsing
    output/       # Format writers and conflict resolution
    utils/        # Errors, progress, caching, validation
    lib.rs        # Pipeline orchestration (10-stage pipeline)
    main.rs       # Entry point
  tests/          # Integration tests
  benches/        # Criterion benchmarks
  prompts/        # LLM prompt templates (markdown)
  docs/           # mdbook documentation (this book)
  examples/       # Example configuration files

Code Quality

Before submitting changes, ensure:

  1. All tests pass: just test
  2. No clippy warnings: just lint (includes all features) and just clippy-min (no default features)
  3. Code is formatted: just fmt
  4. Full CI suite passes: just ci-check

Lint Policy

ruley enforces a zero-warnings policy. Key lint rules:

  • unsafe_code = "deny" – No unsafe code in production (tests may use #[allow(unsafe_code)])
  • unwrap_used = "deny" – No unwrap() in production code
  • panic = "deny" – No panic!() in production code
  • pedantic, nursery, cargo – Clippy lint groups at warn level

See [workspace.lints.clippy] in Cargo.toml for the full lint configuration.

Commit Standards

Follow Conventional Commits:

<type>[(<scope>)]: <description>
  • Types: feat, fix, docs, refactor, test, perf, build, ci, chore
  • Scope (optional): cli, packer, llm, generator, output, utils, config, deps
  • DCO: Always sign off with git commit -s

See CONTRIBUTING.md for full contribution guidelines.

Testing

This chapter covers ruley’s testing philosophy, how to run tests, and guidelines for writing new tests.

Testing Philosophy

ruley follows the test proportionality principle: test critical functionality and real edge cases. Test code should be shorter than implementation.

Do test:

  • Critical functionality and real edge cases
  • Error conditions and recovery paths
  • Token counting and chunking logic
  • Retry logic and error handling
  • Cost estimation
  • Compression ratio targets (~70% token reduction)

Don’t test:

  • Trivial operations or framework behavior
  • Every possible provider/format permutation
  • Obvious success cases or trivial formatting

Running Tests

All Tests

just test

This runs all tests with cargo-nextest and --all-features.

Verbose Output

just test-verbose

Specific Tests

# Run a specific test by name
cargo test test_name

# Run tests in a specific module
cargo test packer::

# Run integration tests only
cargo test --test '*'

Coverage

just coverage

Generates an LCOV coverage report at lcov.info using cargo-llvm-cov.

Test Organization

Unit Tests

Unit tests live in the same file as the code they test, inside #[cfg(test)] modules:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_something() {
        // ...
    }
}

Integration Tests

Integration tests live in the tests/ directory and test the CLI as a black box using assert_cmd:

tests/
  common/
    mod.rs        # Shared test utilities
  cli_tests.rs    # CLI integration tests
  ...

Test Utilities

The tests/common/mod.rs module provides shared helpers for integration tests:

  • Environment isolation: Uses a denylist pattern (env_remove) to strip sensitive variables from subprocess environments
  • Denylisted variables: RULEY_*, ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, OLLAMA_HOST

Important: The denylist uses env_remove (not env_clear()) because env_clear() breaks coverage instrumentation (LLVM_PROFILE_FILE), rustflags, and other tooling-injected variables.
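The denylist pattern can be sketched roughly as follows; the helper and constant names here are illustrative, not the actual tests/common/mod.rs API:

```rust
// Hypothetical sketch of the denylist pattern for subprocess environments.
use std::process::Command;

const DENYLIST: &[&str] = &[
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "OLLAMA_HOST",
];

fn scrubbed_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    // env_remove (not env_clear) leaves tooling-injected variables such as
    // LLVM_PROFILE_FILE intact while stripping the sensitive ones.
    for var in DENYLIST {
        cmd.env_remove(var);
    }
    // RULEY_* variables have a dynamic suffix, so enumerate the environment.
    for (key, _) in std::env::vars() {
        if key.starts_with("RULEY_") {
            cmd.env_remove(&key);
        }
    }
    cmd
}

fn main() {
    let cmd = scrubbed_command("ruley");
    println!("scrubbed command for: {:?}", cmd.get_program());
}
```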

Async Tests

Use #[tokio::test] for async tests:

#[tokio::test]
async fn test_async_operation() {
    let result = some_async_function().await;
    assert!(result.is_ok());
}

Snapshot Tests

ruley uses insta for snapshot testing of CLI outputs and generated rules:

use insta::assert_snapshot;

#[test]
fn test_output_format() {
    let output = generate_output();
    assert_snapshot!(output);
}

Review and accept snapshot changes with:

cargo insta review

CI Testing

CI runs the full test suite on every push and pull request:

  • Quality: just lint-rust (formatting + clippy)
  • Tests: just test with all features
  • Cross-platform: Tests on Linux, macOS, and Windows
  • Feature combinations: Default features, no features, all features
  • MSRV: Checks compilation with stable minus 2 releases
  • Coverage: Generates and uploads to Codecov

All CI checks must pass before merge. See .github/workflows/ci.yml for the full configuration.

Writing Tests

Guidelines

  1. Test the behavior, not the implementation – Focus on inputs and outputs
  2. Use descriptive test names – e.g. test_chunk_size_exceeds_context_triggers_chunking
  3. One assertion per concept – Multiple assert! calls are fine, but each test should verify one logical behavior
  4. Avoid mocking when possible – Integration tests with real (but controlled) inputs are preferred
  5. Keep tests fast – Use small inputs and avoid network calls in unit tests

Unsafe Code in Tests

Rust 2024 edition makes std::env::set_var unsafe due to data race concerns. Tests that manipulate environment variables need #[allow(unsafe_code)]:

#[test]
#[allow(unsafe_code)]
fn test_env_var_override() {
    unsafe { std::env::set_var("RULEY_PROVIDER", "openai") };
    // ... test logic ...
    unsafe { std::env::remove_var("RULEY_PROVIDER") };
}

Security

This chapter covers ruley’s security model, vulnerability reporting, and security features.

Reporting Vulnerabilities

Do not report security vulnerabilities through public GitHub issues.

Use one of the following channels:

  1. GitHub Private Vulnerability Reporting (preferred)
  2. Email support@evilbitlabs.io encrypted with the project’s PGP key

Please include:

  • Description of the vulnerability
  • Steps to reproduce
  • Potential impact
  • Suggested fix (if any)

See SECURITY.md for full policy details including scope, response times, safe harbor provisions, and the PGP key.

Security Features

Code Safety

  • unsafe_code = "deny" enforced at the package level
  • Pure Rust implementation with no C dependencies in core logic
  • Zero unwrap() and panic!() in production code (enforced via clippy lints)

Credential Handling

  • API keys are read from environment variables at runtime
  • Keys are never stored in generated output files
  • Keys are never logged or included in error messages
  • No credential persistence between runs

Network Security

  • No network listening: ruley makes outbound-only connections
  • Connections are made only to configured LLM provider APIs
  • HTTPS is used for all API calls
  • No telemetry or analytics

Supply Chain Security

  • GitHub Actions pinned to full commit SHAs
  • cargo audit runs in CI to check for known vulnerabilities
  • cargo deny checks license compliance and duplicate dependencies
  • CodeQL analysis on every PR
  • OSSF Scorecard monitoring
  • Automated dependency updates via Dependabot

Scope

In Scope

  • API key or credential leakage through error messages, logs, or generated output
  • Command injection via CLI arguments or configuration files
  • Path traversal in file input/output handling
  • Prompt injection affecting output integrity
  • Denial of service via crafted input files or configuration
  • Unsafe handling of LLM responses (e.g., writing to unintended paths)

Out of Scope

  • Vulnerabilities in upstream LLM providers (Anthropic, OpenAI, etc.)
  • Issues requiring physical access to the machine
  • Social engineering attacks
  • LLM hallucinations or inaccurate generated rules (quality issue, not security)

Response Timeline

This is a volunteer-maintained project. Response times are best-effort:

  • Acknowledgment: Within 1 week
  • Initial assessment: Within 2 weeks
  • Fix release: Within 90 days of confirmed vulnerabilities
  • Disclosure: Coordinated through GitHub Security Advisories

Release Process

ruley releases are automated via cargo-dist and GitHub Actions. This chapter documents the release workflow and verification procedures.

Overview

Pushing a version tag (e.g., v1.0.0) triggers the release workflow, which:

  1. Validates the tag version matches Cargo.toml
  2. Builds binaries for all 5 platform targets
  3. Generates SHA256 checksums
  4. Signs artifacts via Sigstore/GitHub Attestations
  5. Creates a GitHub release with changelog and binaries
  6. Publishes to crates.io (non-prereleases only)
  7. Updates the Homebrew tap

Configuration lives in dist-workspace.toml.

Platform Targets

| Platform              | Target                    | Archive |
|-----------------------|---------------------------|---------|
| Linux x86_64          | x86_64-unknown-linux-gnu  | .tar.gz |
| Linux x86_64 (static) | x86_64-unknown-linux-musl | .tar.gz |
| Linux ARM64           | aarch64-unknown-linux-gnu | .tar.gz |
| macOS ARM64           | aarch64-apple-darwin      | .tar.gz |
| Windows x86_64        | x86_64-pc-windows-msvc    | .zip    |

Pre-Release Checklist

Before creating a release:

  • All tests pass locally: just ci-check
  • Zero clippy warnings: cargo clippy --all-targets --all-features -- -D warnings
  • Documentation is up to date (README.md, CHANGELOG.md)
  • Review open issues and PRs for release blockers
  • Release build succeeds: cargo build --release
  • Binary works correctly: ./target/release/ruley --help
  • Dry-run crates.io publish: cargo publish --dry-run --all-features

Version Bump Process

  1. Update the version in Cargo.toml:

    version = "X.Y.Z"
    
  2. Run cargo update to update Cargo.lock.

  3. Generate the changelog:

    just changelog
    
  4. Review and edit CHANGELOG.md for the new version entry.

  5. Commit all changes:

    git add Cargo.toml Cargo.lock CHANGELOG.md
    git commit -s -m "chore(release): prepare for vX.Y.Z"
    

Tag and Release

  1. Create an annotated tag:

    git tag -a vX.Y.Z -m "Release vX.Y.Z"
    
  2. Push the tag to trigger the release workflow:

    git push origin vX.Y.Z
    

Automated Release Pipeline

The release is managed by two GitHub Actions workflows:

release.yml (cargo-dist)

Triggered by v* tags. Builds platform binaries, creates the GitHub release, publishes to crates.io, and updates the Homebrew tap.

release-plz.yml

Runs on every push to main:

  • release-plz-pr: Creates a release preparation PR with version bumps and changelog updates
  • release-plz-release: Publishes to crates.io when version changes are merged

Changelog Generation

Changelogs are generated by git-cliff using the configuration in cliff.toml. Commits follow the Conventional Commits specification and are grouped by type:

  • Features, Bug Fixes, Refactoring, Documentation, Performance, Testing, Miscellaneous, Security

Rollback Procedure

If a release needs to be rolled back:

  1. Delete the tag locally: git tag -d vX.Y.Z
  2. Delete the tag remotely: git push origin :refs/tags/vX.Y.Z
  3. Delete the GitHub release via the web interface
  4. Yank from crates.io: cargo yank --version X.Y.Z

Yanking prevents new installs but does not remove the package. Existing Cargo.lock files referencing this version will still work.

Prerelease Versions

For release candidates or beta releases:

  • Use a prerelease tag: v1.0.0-rc.1, v1.0.0-beta.1
  • The release workflow marks these as prereleases on GitHub
  • Prerelease versions are not published to crates.io automatically

Versioning Policy

ruley follows Semantic Versioning:

  • Major (X.0.0): Breaking changes to CLI interface or config format
  • Minor (0.X.0): New features, new providers, new output formats
  • Patch (0.0.X): Bug fixes, dependency updates, documentation

Release Verification

This chapter explains how to verify the authenticity and integrity of ruley release artifacts.

GitHub Attestations

All release artifacts are signed via Sigstore using GitHub Attestations. This provides cryptographic proof that binaries were built by the official GitHub Actions workflow and have not been tampered with.

Verifying with gh

gh attestation verify <artifact> --repo EvilBit-Labs/ruley

Replace <artifact> with the path to the downloaded binary or archive.

What This Verifies

  • The artifact was built by the EvilBit-Labs/ruley repository’s GitHub Actions
  • The build environment matches the expected workflow
  • The artifact has not been modified since it was built

SHA256 Checksums

Each release includes SHA256 checksums for all platform binaries. These are attached to the GitHub release alongside the binaries.

Verifying Checksums

macOS / Linux

# Download the checksum file
curl -fsSLO https://github.com/EvilBit-Labs/ruley/releases/latest/download/sha256sums.txt

# Verify a specific artifact
sha256sum -c sha256sums.txt --ignore-missing

Windows

# Compute the hash of the downloaded archive
Get-FileHash ruley-x86_64-pc-windows-msvc.zip -Algorithm SHA256

crates.io Verification

When installing via cargo install ruley, Cargo verifies the package integrity automatically using the crates.io checksum. No additional steps are needed.

Verifying a Cargo Install

To verify the installed version:

ruley --version

Compare the output with the expected version from the releases page.

Supply Chain Security

ruley takes several measures to secure the build and release pipeline:

| Measure             | Description                                                     |
|---------------------|-----------------------------------------------------------------|
| Pinned Actions      | All GitHub Actions are pinned to full commit SHAs               |
| Sigstore signing    | Artifacts signed via GitHub Attestations                        |
| cargo-audit         | Checks for known vulnerabilities in dependencies                |
| cargo-deny          | Checks license compliance and duplicate dependencies            |
| CodeQL              | Static analysis for security vulnerabilities                    |
| OSSF Scorecard      | Automated security posture monitoring                           |
| Dependabot          | Automated dependency update PRs                                 |
| Reproducible builds | Pinned Rust toolchain via rust-toolchain.toml and mise.toml     |
| Committed lock file | Cargo.lock is committed for deterministic builds                |

Security Assurance Case

This document provides a structured security assurance case for ruley, identifying the attack surface, threat model, and mitigations in place.

Attack Surface

ruley’s attack surface is limited by design. It is a CLI tool that reads local files and makes outbound API calls.

Entry Points

| Entry Point           | Description                        | Trust Level  |
|-----------------------|------------------------------------|--------------|
| CLI arguments         | User-provided flags and paths      | Untrusted    |
| Configuration files   | TOML files loaded from disk        | Semi-trusted |
| Environment variables | API keys and overrides             | Trusted      |
| Repository files      | Source files scanned for analysis  | Untrusted    |
| LLM API responses     | Generated content from providers   | Untrusted    |
| Repomix files         | Pre-packed XML input files         | Untrusted    |

Exit Points

| Exit Point           | Description                                          |
|----------------------|------------------------------------------------------|
| Generated rule files | Written to disk at user-specified or default paths   |
| LLM API requests     | Outbound HTTPS calls to provider endpoints           |
| Console output       | Progress, cost estimates, summaries                  |
| Cache files          | .ruley/ directory for state and temp files           |

Threat Model

T1: Credential Leakage

Threat: API keys exposed in error messages, logs, or generated output.

Mitigations:

  • API keys are read from environment variables only, never persisted
  • Error messages do not include API key values
  • Generated rule files do not contain API keys
  • Logging does not expose credentials
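One common way to enforce this kind of guarantee is a wrapper type whose debug output is always redacted. This is an illustrative sketch, not ruley’s actual implementation:

```rust
// Hypothetical sketch: a key wrapper that can never leak via Debug output.
use std::fmt;

struct ApiKey(String);

impl fmt::Debug for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the wrapped value, even in error messages or logs.
        f.write_str("ApiKey(***redacted***)")
    }
}

fn main() {
    let key = ApiKey(std::env::var("ANTHROPIC_API_KEY").unwrap_or_default());
    // Formatting the key for an error message leaks nothing.
    println!("failed to authenticate with {key:?}");
    let _ = key.0; // the raw value remains available internally
}
```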

T2: Path Traversal

Threat: Malicious file paths in config or LLM responses writing outside the project directory.

Mitigations:

  • Output paths are resolved relative to the project root
  • The output module validates write paths
  • Config file paths are canonicalized during discovery
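The kind of check involved can be sketched as follows; the function name is illustrative, not ruley’s actual code:

```rust
// Hypothetical sketch: reject output paths that could escape the project root.
use std::path::{Component, Path, PathBuf};

fn resolve_output(root: &Path, requested: &Path) -> Result<PathBuf, String> {
    // Reject rooted paths and any `..` component before joining, so a
    // crafted path can never climb out of the project root.
    if requested.has_root()
        || requested.components().any(|c| matches!(c, Component::ParentDir))
    {
        return Err(format!("path escapes project root: {}", requested.display()));
    }
    Ok(root.join(requested))
}

fn main() {
    let root = Path::new("/project");
    println!("{:?}", resolve_output(root, Path::new(".cursor/rules/index.mdc")));
    println!("{:?}", resolve_output(root, Path::new("../outside.md")));
}
```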

T3: Command Injection

Threat: Crafted CLI arguments or config values executing unintended commands.

Mitigations:

  • clap validates all CLI input with value_parser and PossibleValuesParser
  • Config values are deserialized through serde (no shell evaluation)
  • No shell commands are executed from user input

T4: Prompt Injection via Codebase

Threat: Malicious content in scanned source files influencing LLM output to produce harmful rules.

Mitigations:

  • Generated rules are validated (syntax, schema, semantic checks)
  • Validation detects contradictory rules and unrealistic references
  • Users review generated rules before committing to their repository
  • Finalization stage can deconflict with existing rules

T5: Denial of Service

Threat: Crafted input causing excessive resource consumption (memory, CPU, network).

Mitigations:

  • Token counting prevents unbounded LLM calls
  • Chunk size limits cap memory usage per chunk
  • Cost confirmation requires explicit user approval before expensive operations
  • Bounded concurrency in async operations

T6: Supply Chain Compromise

Threat: Compromised dependencies or build artifacts.

Mitigations:

  • cargo audit checks for known vulnerabilities in CI
  • cargo deny enforces license and duplicate dependency policies
  • GitHub Actions pinned to commit SHAs (not mutable tags)
  • CodeQL static analysis on every PR
  • OSSF Scorecard monitoring
  • Sigstore artifact signing

Code Safety Guarantees

| Guarantee               | Enforcement                            |
|-------------------------|----------------------------------------|
| No unsafe code          | unsafe_code = "deny" in [lints.rust]   |
| No unwrap in production | unwrap_used = "deny" in clippy config  |
| No panic in production  | panic = "deny" in clippy config        |
| Zero clippy warnings    | -D warnings enforced in CI             |
| Dependency auditing     | cargo audit and cargo deny in CI       |

Data Flow Security

flowchart LR
    User["User<br/>(trusted)"] -->|CLI args,<br/>env vars| Ruley["ruley<br/>process"]
    Disk["Local files<br/>(semi-trusted)"] -->|config,<br/>source files| Ruley
    Ruley -->|HTTPS| LLM["LLM API<br/>(untrusted response)"]
    LLM -->|generated rules| Ruley
    Ruley -->|validated output| Output["Rule files<br/>(user reviews)"]
    Ruley -->|temp data| Cache[".ruley/<br/>cache"]

Key security boundaries:

  • Input boundary: All CLI arguments validated by clap; config files deserialized by serde
  • Network boundary: Only HTTPS outbound to configured providers; no inbound connections
  • Output boundary: Generated rules validated before writing; paths resolved relative to project root
  • Trust boundary: LLM responses treated as untrusted input; validated before use

Updating This Document

This document must be updated when:

  • New entry points are added (e.g., new input sources)
  • New exit points are added (e.g., new output destinations)
  • New dependencies are introduced that handle untrusted input
  • The network communication model changes
  • New LLM providers are added