Introduction
Make your codebase ruley. A Rust CLI tool for generating AI IDE rules from codebases.
ruley (the opposite of unruly) is a command-line tool that analyzes codebases and generates AI IDE rule files. It uses Large Language Models to understand project structure, conventions, and patterns, then produces actionable rules that help AI assistants provide better, context-aware code suggestions.
Tame your unruly codebase. Make it ruley.
Why ruley?
AI coding assistants work best when they understand your project’s conventions. Without explicit rules, they fall back to generic patterns that may not match your codebase. ruley bridges this gap by:
- Scanning your repository to understand its structure, languages, and patterns
- Compressing the codebase using tree-sitter for token efficiency
- Analyzing the compressed code with an LLM to extract conventions
- Generating format-specific rule files for your preferred AI IDE tools
The result is a set of rule files that teach AI assistants how your project works – coding style, architecture patterns, naming conventions, error handling approaches, and more.
Key Features
- Single binary distribution – No runtime dependencies (Node.js, Python, etc.)
- Multi-provider LLM support – Anthropic, OpenAI, Ollama, OpenRouter
- Multi-format output – Generate rules for 7 different AI IDE formats in a single run
- Native performance – Fast codebase analysis built with Rust
- Smart compression – Tree-sitter-based code compression for token efficiency (~70% reduction)
- Accurate token counting – Native tiktoken implementation for precise cost estimation
- Cost transparency – Shows estimated cost before LLM calls, requires confirmation
- Configurable – TOML configuration file, environment variables, and CLI flags
Supported Formats
| Format | Output File | Description |
|---|---|---|
| Cursor | .cursor/rules/*.mdc | Cursor IDE rules |
| Claude | CLAUDE.md | Claude Code project instructions |
| Copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| Windsurf | .windsurfrules | Windsurf IDE rules |
| Aider | .aider.conf.yml | Aider conventions |
| Generic | .ai-rules.md | Generic markdown rules |
| JSON | .ai-rules.json | Machine-readable JSON |
Where to Start
- New users: Start with Installation and Quick Start
- CLI reference: See Command-Line Interface for all options
- Configuration: See Configuration for ruley.toml setup
- Contributors: See Development Setup to get started
- Architecture: See Architecture Overview to understand the internals
Installation
Pre-built Binaries (Recommended)
Pre-built binaries are available for Linux (x86_64, ARM64), macOS (ARM64), and Windows (x86_64) on the releases page.
macOS / Linux
curl -fsSL https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.sh | sh
Windows
powershell -ExecutionPolicy Bypass -c "irm https://github.com/EvilBit-Labs/ruley/releases/latest/download/ruley-installer.ps1 | iex"
Homebrew
brew install EvilBit-Labs/tap/ruley
Cargo (crates.io)
cargo install ruley
This builds from source with default features (Anthropic, OpenAI, TypeScript compression).
With All Features
cargo install ruley --all-features
Minimal Install
cargo install ruley --no-default-features --features anthropic
cargo-binstall
If you have cargo-binstall installed:
cargo binstall ruley
Building from Source
git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley
cargo build --release
The binary will be at ./target/release/ruley.
System Requirements
- Operating system: Linux (x86_64, ARM64), macOS (ARM64), Windows (x86_64)
- Rust (build from source only): 1.91 or newer (see rust-version in Cargo.toml)
- Network: Required for LLM API calls (except Ollama, which runs locally)
Feature Flags
ruley uses Cargo feature flags to control which LLM providers and compression languages are compiled in:
| Feature | Description | Default |
|---|---|---|
| anthropic | Anthropic Claude provider | Yes |
| openai | OpenAI GPT provider | Yes |
| ollama | Ollama local model provider | No |
| openrouter | OpenRouter multi-model provider | No |
| all-providers | All LLM providers | No |
| compression-typescript | TypeScript tree-sitter grammar | Yes |
| compression-python | Python tree-sitter grammar | No |
| compression-rust | Rust tree-sitter grammar | No |
| compression-go | Go tree-sitter grammar | No |
| compression-all | All compression languages | No |
Verifying Releases
All release artifacts are signed via Sigstore using GitHub Attestations:
gh attestation verify <artifact> --repo EvilBit-Labs/ruley
See Release Verification for details.
Quick Start
This guide walks you through generating your first set of AI IDE rules with ruley.
Prerequisites
- ruley installed (see Installation)
- An API key for at least one LLM provider
Step 1: Set Your API Key
Set the environment variable for your chosen provider:
Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
OpenAI
export OPENAI_API_KEY="sk-..."
Ollama
# No API key needed -- just ensure Ollama is running
ollama serve
OpenRouter
export OPENROUTER_API_KEY="sk-or-..."
Step 2: Generate Rules
Navigate to your project directory and run ruley:
cd /path/to/your/project
ruley
By default, ruley uses Anthropic Claude and generates Cursor format rules.
Step 3: Review the Output
ruley shows you:
- Scan results – How many files were discovered
- Compression stats – Token reduction from tree-sitter compression
- Cost estimate – Estimated LLM cost before proceeding
- Confirmation prompt – You must approve before the LLM call is made
- Generated files – Where the rule files were written
Common Variations
Use a Different Provider
ruley --provider openai --model gpt-4o
Generate Multiple Formats
ruley --format cursor,claude,copilot
Generate All Formats at Once
ruley --format all
Enable Tree-Sitter Compression
ruley --compress
Analyze a Specific Directory
ruley ./my-project --compress
Dry Run (Preview Without Calling LLM)
ruley --dry-run
This shows what would be processed (file count, token estimate, cost) without making any LLM calls. Useful for checking costs before committing.
Skip Cost Confirmation
ruley --no-confirm
Use a Local Ollama Model
ruley --provider ollama --model llama3.1
What Happens Next
The generated rule files are placed in your project directory at the standard locations for each format. Your AI IDE tools will automatically pick them up:
- Cursor: .cursor/rules/*.mdc – loaded automatically by Cursor IDE
- Claude: CLAUDE.md – read by Claude Code as project context
- Copilot: .github/copilot-instructions.md – loaded by GitHub Copilot
- Windsurf: .windsurfrules – loaded by Windsurf IDE
- Aider: .aider.conf.yml – loaded by Aider CLI
Commit the generated files to your repository so your whole team benefits from consistent AI assistance.
Next Steps
- Command-Line Interface – Full reference for all CLI options
- Configuration – Set up a ruley.toml for your project
- LLM Providers – Compare providers and choose the best fit
- Output Formats – Understand what each format produces
Command-Line Interface
Usage
ruley [OPTIONS] [PATH]
PATH: Path to repository (local path or remote URL). Defaults to . (current directory).
Options
Core Options
| Flag | Env Variable | Default | Description |
|---|---|---|---|
| -p, --provider <NAME> | RULEY_PROVIDER | anthropic | LLM provider (anthropic, openai, ollama, openrouter) |
| -m, --model <NAME> | RULEY_MODEL | (provider default) | Model to use |
| -f, --format <FORMATS> | RULEY_FORMAT | cursor | Output format(s), comma-separated |
| -o, --output <PATH> | RULEY_OUTPUT | (format default) | Output file path (single format only) |
| -c, --config <PATH> | RULEY_CONFIG | ruley.toml | Config file path |
Generation Options
| Flag | Env Variable | Default | Description |
|---|---|---|---|
| --description <TEXT> | RULEY_DESCRIPTION | (none) | Focus area for rule generation |
| --rule-type <TYPE> | RULEY_RULE_TYPE | auto | Cursor rule type (auto, always, manual, agent-requested) |
| --compress | RULEY_COMPRESS | false | Enable tree-sitter compression |
| --chunk-size <N> | RULEY_CHUNK_SIZE | 100000 | Max tokens per LLM chunk |
| --repomix-file <PATH> | RULEY_REPOMIX_FILE | (none) | Use pre-packed repomix file as input |
Filtering Options
| Flag | Description |
|---|---|
| --include <PATTERN> | Include only matching files (repeatable) |
| --exclude <PATTERN> | Exclude matching files (repeatable) |
Behavior Options
| Flag | Env Variable | Default | Description |
|---|---|---|---|
| --no-confirm | RULEY_NO_CONFIRM | false | Skip cost confirmation prompt |
| --dry-run | RULEY_DRY_RUN | false | Show plan without calling LLM |
| --on-conflict <STRATEGY> | RULEY_ON_CONFLICT | prompt | Conflict resolution (prompt, overwrite, skip, smart-merge) |
| --retry-on-validation-failure | | false | Auto-retry with LLM fix on validation failure |
| --no-deconflict | | false | Disable LLM-based deconfliction with existing rules |
| --no-semantic-validation | | false | Disable all semantic validation checks |
Output Options
| Flag | Description |
|---|---|
| -v | Increase verbosity (-v = DEBUG, -vv = TRACE) |
| -q | Suppress non-essential output |
| --version | Print version information |
| --help | Print help information |
Environment Variables
All CLI flags can be set via RULEY_* environment variables. CLI flags take precedence over environment variables, which take precedence over config file values.
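The precedence chain can be sketched as a simple fallback (a hypothetical resolve helper for illustration; ruley's actual resolution is handled by clap):

```rust
/// Resolve one setting from the three sources plus the built-in default,
/// highest precedence first. Hypothetical helper, not ruley's real code.
fn resolve(cli: Option<&str>, env: Option<&str>, config: Option<&str>, default: &str) -> String {
    cli.or(env).or(config).unwrap_or(default).to_string()
}

fn main() {
    // A CLI flag beats everything; an env var beats the config file.
    assert_eq!(resolve(Some("openai"), Some("ollama"), None, "anthropic"), "openai");
    assert_eq!(resolve(None, Some("ollama"), Some("openai"), "anthropic"), "ollama");
    assert_eq!(resolve(None, None, None, "anthropic"), "anthropic");
}
```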
Provider API Keys
| Variable | Provider | Required |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic | When using --provider anthropic |
| OPENAI_API_KEY | OpenAI | When using --provider openai |
| OLLAMA_HOST | Ollama | Optional (default: http://localhost:11434) |
| OPENROUTER_API_KEY | OpenRouter | When using --provider openrouter |
Examples
Basic Usage
# Analyze current directory with defaults
ruley
# Analyze a specific project
ruley /path/to/project
Provider Selection
# Use OpenAI with a specific model
ruley --provider openai --model gpt-4o
# Use local Ollama
ruley --provider ollama --model llama3.1
# Use OpenRouter with Claude
ruley --provider openrouter --model anthropic/claude-3.5-sonnet
Format Control
# Generate Cursor rules (default)
ruley --format cursor
# Generate multiple formats
ruley --format cursor,claude,copilot
# Generate all formats
ruley --format all
# Write to a specific path (single format only)
ruley --format claude --output ./docs/CLAUDE.md
Compression and Performance
# Enable tree-sitter compression (~70% token reduction)
ruley --compress
# Adjust chunk size for large codebases
ruley --chunk-size 200000
# Use a pre-packed repomix file
ruley --repomix-file ./codebase.xml
Cost Management
# Preview without calling the LLM
ruley --dry-run
# Skip the cost confirmation prompt
ruley --no-confirm
Conflict Resolution
# Overwrite existing rule files
ruley --on-conflict overwrite
# Skip if files already exist
ruley --on-conflict skip
# Use LLM to smart-merge with existing rules
ruley --on-conflict smart-merge
Filtering Files
# Only include Rust files
ruley --include "**/*.rs"
# Exclude test directories
ruley --exclude "**/tests/**" --exclude "**/benches/**"
Configuration
ruley supports hierarchical configuration from multiple sources. This page documents the configuration file format and precedence rules.
Configuration Precedence
Configuration is resolved in this order (highest to lowest precedence):
- CLI flags – Explicitly provided command-line arguments
- Environment variables – RULEY_* prefix (handled by clap’s env attribute)
- Config files – Loaded and merged in discovery order (see below)
- Built-in defaults – Hardcoded in the CLI parser
When a CLI flag is explicitly provided, it always wins. When it’s not provided (using the default), the config file value is used instead.
Config File Discovery
Config files are discovered and merged in this order (later overrides earlier):
- ~/.config/ruley/config.toml – User-level global config
- ruley.toml in the git repository root – Project-level config
- ./ruley.toml in the current directory – Working directory config
- Explicit --config <path> – If provided, overrides all above
All discovered files are merged. Duplicate keys in later files override earlier ones.
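The merge behavior can be sketched with flat key-value layers (real configs are nested TOML; this is a simplified illustration, not ruley's implementation):

```rust
use std::collections::HashMap;

/// Merge config layers in discovery order: a key in a later layer
/// overrides the same key from an earlier one.
fn merge_layers(layers: &[HashMap<String, String>]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone()); // later layer wins
        }
    }
    merged
}

fn main() {
    let user = HashMap::from([("provider".to_string(), "anthropic".to_string())]);
    let project = HashMap::from([("provider".to_string(), "ollama".to_string())]);
    let merged = merge_layers(&[user, project]);
    // The project-level ruley.toml overrides the user-level global config.
    assert_eq!(merged["provider"], "ollama");
}
```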
Configuration File Format
Configuration files use TOML format. All sections are optional.
Complete Example
[general]
provider = "anthropic"
model = "claude-sonnet-4-5-20250929"
format = ["cursor", "claude"]
compress = true
chunk_size = 100000
no_confirm = false
rule_type = "auto"
[output]
formats = ["cursor", "claude"]
on_conflict = "prompt"
[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "CLAUDE.md"
[include]
patterns = ["**/*.rs", "**/*.toml"]
[exclude]
patterns = ["**/target/**", "**/node_modules/**"]
[chunking]
chunk_size = 100000
overlap = 10000
[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192
[providers.openai]
model = "gpt-4o"
max_tokens = 4096
[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"
[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192
[validation]
enabled = true
retry_on_failure = false
max_retries = 3
[validation.semantic]
check_file_paths = true
check_contradictions = true
check_consistency = true
check_reality = true
[finalization]
enabled = true
deconflict = true
normalize_formatting = true
inject_metadata = true
[general] Section
Core settings for the pipeline.
| Key | Type | Default | Description |
|---|---|---|---|
| provider | string | "anthropic" | LLM provider name |
| model | string | (provider default) | Model to use |
| format | string[] | ["cursor"] | Output formats |
| compress | bool | false | Enable tree-sitter compression |
| chunk_size | int | 100000 | Max tokens per LLM chunk |
| no_confirm | bool | false | Skip cost confirmation |
| rule_type | string | "auto" | Cursor rule type |
[output] Section
Output format and path configuration.
| Key | Type | Default | Description |
|---|---|---|---|
| formats | string[] | [] | Alternative to general.format |
| on_conflict | string | "prompt" | Conflict resolution strategy |
| paths.<format> | string | (format default) | Custom output path per format |
[include] / [exclude] Sections
File filtering using glob patterns.
| Key | Type | Default | Description |
|---|---|---|---|
| patterns | string[] | [] | Glob patterns for file matching |
[chunking] Section
Controls how large codebases are split for LLM processing.
| Key | Type | Default | Description |
|---|---|---|---|
| chunk_size | int | 100000 | Max tokens per chunk |
| overlap | int | chunk_size / 10 | Token overlap between chunks |
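The chunking settings above can be sketched as follows (an illustrative helper, not ruley's actual chunking code, which splits at logical boundaries):

```rust
/// Split a token sequence into chunks of at most `chunk_size` tokens,
/// with `overlap` tokens repeated at each chunk boundary.
fn chunk_tokens(tokens: &[u32], chunk_size: usize, overlap: usize) -> Vec<Vec<u32>> {
    assert!(overlap < chunk_size);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break;
        }
        start = end - overlap; // step back so adjacent chunks share `overlap` tokens
    }
    chunks
}

fn main() {
    let tokens: Vec<u32> = (0..25).collect();
    // Small numbers for readability; the real defaults are 100000 and 10000.
    let chunks = chunk_tokens(&tokens, 10, 2);
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[1][0], 8); // second chunk starts 2 tokens before the first ended
}
```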
[providers] Section
Provider-specific configuration. Each provider has its own subsection.
[providers.anthropic] / [providers.openai] / [providers.openrouter]:
| Key | Type | Description |
|---|---|---|
| model | string | Model name override |
| max_tokens | int | Max output tokens |
[providers.ollama]:
| Key | Type | Description |
|---|---|---|
| host | string | Ollama server URL |
| model | string | Model name |
[validation] Section
Controls validation of generated rules.
| Key | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable validation |
| retry_on_failure | bool | false | Auto-retry with LLM fix |
| max_retries | int | 3 | Max auto-fix attempts |
[validation.semantic] – Semantic validation checks:
| Key | Type | Default | Description |
|---|---|---|---|
| check_file_paths | bool | true | Verify referenced file paths exist |
| check_contradictions | bool | true | Detect contradictory rules |
| check_consistency | bool | true | Cross-format consistency check |
| check_reality | bool | true | Verify language/framework references |
[finalization] Section
Controls post-processing of generated rules.
| Key | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable finalization |
| deconflict | bool | true | LLM-based deconfliction with existing rules |
| normalize_formatting | bool | true | Normalize line endings and whitespace |
| inject_metadata | bool | true | Add timestamp/version/provider headers |
LLM Providers
ruley supports multiple LLM providers. Each provider is feature-gated at compile time and requires its own API key (except Ollama).
Provider Comparison
| Provider | API Key Required | Local | Default Model | Context Window |
|---|---|---|---|---|
| Anthropic | Yes | No | claude-sonnet-4-5-20250929 | 200K tokens |
| OpenAI | Yes | No | gpt-4o | 128K tokens |
| Ollama | No | Yes | llama3.1:70b | ~100K tokens |
| OpenRouter | Yes | No | anthropic/claude-3.5-sonnet | Varies by model |
Anthropic
Anthropic’s Claude models are the default provider and generally produce excellent rule quality.
Setup
export ANTHROPIC_API_KEY="sk-ant-..."
Usage
# Uses default model (Claude Sonnet 4.5)
ruley --provider anthropic
# Specify a model
ruley --provider anthropic --model claude-sonnet-4-5-20250929
Config File
[general]
provider = "anthropic"
[providers.anthropic]
model = "claude-sonnet-4-5-20250929"
max_tokens = 8192
OpenAI
OpenAI’s GPT models provide strong rule generation with fast response times.
Setup
export OPENAI_API_KEY="sk-..."
Usage
ruley --provider openai --model gpt-4o
Config File
[general]
provider = "openai"
[providers.openai]
model = "gpt-4o"
max_tokens = 4096
Ollama
Ollama runs models locally. No API key is needed, and there are no per-token costs. This is ideal for privacy-sensitive codebases or offline use.
Setup
- Install Ollama
- Pull a model: ollama pull llama3.1:70b
- Start the server: ollama serve
Usage
ruley --provider ollama --model llama3.1
# Custom Ollama host
OLLAMA_HOST="http://192.168.1.100:11434" ruley --provider ollama
Config File
[general]
provider = "ollama"
[providers.ollama]
host = "http://localhost:11434"
model = "llama3.1:70b"
Considerations
- Rule quality depends heavily on the model size. Larger models (70B+) produce better results.
- Local models have smaller context windows. Use --compress and --chunk-size to manage large codebases.
- No cost confirmation is shown since Ollama is free to use.
OpenRouter
OpenRouter provides access to models from multiple providers through a single API. It fetches dynamic pricing from the OpenRouter API for accurate cost estimation.
Setup
export OPENROUTER_API_KEY="sk-or-..."
Usage
ruley --provider openrouter --model anthropic/claude-3.5-sonnet
Config File
[general]
provider = "openrouter"
[providers.openrouter]
model = "anthropic/claude-3.5-sonnet"
max_tokens = 8192
Feature Flags
Providers are compiled in via Cargo feature flags. The default build includes anthropic and openai.
| Feature | Provider |
|---|---|
| anthropic | Anthropic (default) |
| openai | OpenAI (default) |
| ollama | Ollama |
| openrouter | OpenRouter |
| all-providers | All of the above |
To include all providers when building from source:
cargo install ruley --features all-providers
Choosing a Provider
- Best quality: Anthropic Claude (default) – excellent at understanding code conventions
- Fastest: OpenAI GPT-4o – lower latency per request
- Free / Private: Ollama – no API costs, data stays local
- Flexible: OpenRouter – access to many models through one API
Output Formats
- Format Overview
- Selecting Formats
- Format Details
- Conflict Resolution
- Single Analysis, Multiple Outputs
ruley generates rule files in 7 formats. Each format targets a specific AI IDE tool and follows its conventions for file naming, structure, and content.
Format Overview
| Format | Output File | Description |
|---|---|---|
| cursor | .cursor/rules/*.mdc | Cursor IDE rules with frontmatter |
| claude | CLAUDE.md | Claude Code project instructions |
| copilot | .github/copilot-instructions.md | GitHub Copilot instructions |
| windsurf | .windsurfrules | Windsurf IDE rules |
| aider | .aider.conf.yml | Aider conventions |
| generic | .ai-rules.md | Generic markdown rules |
| json | .ai-rules.json | Machine-readable JSON |
Selecting Formats
Single Format (Default)
# Cursor format (default)
ruley
# Claude format
ruley --format claude
Multiple Formats
ruley --format cursor,claude,copilot
All Formats
ruley --format all
Custom Output Path
For a single format, you can override the output path:
ruley --format claude --output ./docs/CLAUDE.md
For multiple formats, use the config file:
[output.paths]
cursor = ".cursor/rules/project-rules.mdc"
claude = "docs/CLAUDE.md"
Format Details
Cursor (.mdc)
Cursor IDE rules use the .mdc (markdown component) format with YAML frontmatter. Rules are placed in .cursor/rules/ and loaded automatically by Cursor.
The --rule-type flag controls the frontmatter alwaysApply field:
| Rule Type | Behavior |
|---|---|
| auto | LLM decides based on rule content |
| always | Rules always apply to every file |
| manual | Rules must be manually activated |
| agent-requested | Rules are requested by the AI agent |
Claude (CLAUDE.md)
A single markdown file at the project root. Claude Code reads this file as project context for all conversations. Content is structured as guidelines and conventions in standard markdown.
Copilot (.github/copilot-instructions.md)
GitHub Copilot’s project-level instructions file. Placed in the .github/ directory. Content is natural language instructions that guide Copilot’s suggestions.
Windsurf (.windsurfrules)
Windsurf IDE rules file at the project root. Similar to Cursor rules but without frontmatter. Content is structured as conventions and patterns.
Aider (.aider.conf.yml)
Aider’s configuration file in YAML format. Contains conventions and patterns that guide Aider’s code generation.
Generic (.ai-rules.md)
A generic markdown format not tied to any specific tool. Useful as a portable set of conventions that can be manually included in any AI assistant’s context.
JSON (.ai-rules.json)
Machine-readable JSON format for programmatic consumption. Contains the same convention data in a structured format suitable for integration with custom tools.
Conflict Resolution
When output files already exist, ruley offers several strategies:
| Strategy | Behavior |
|---|---|
| prompt | Ask the user what to do (default, interactive) |
| overwrite | Replace existing files (creates backups) |
| skip | Skip formats where files exist |
| smart-merge | Use LLM to merge new rules with existing ones |
Set the strategy via CLI or config:
ruley --on-conflict smart-merge
[output]
on_conflict = "smart-merge"
When overwrite is used, ruley creates .bak backups of existing files before writing.
Single Analysis, Multiple Outputs
ruley performs a single LLM analysis of your codebase, then generates format-specific rules through a refinement step. This means:
- The analysis cost is paid once regardless of how many formats you generate
- Each format adds a small refinement LLM call to adapt the analysis to format-specific conventions
- Generating all 7 formats is only marginally more expensive than generating 1
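The cost shape can be sketched with placeholder numbers (the dollar amounts below are made-up assumptions for illustration, not real provider pricing):

```rust
/// Total cost of one run: a single analysis pass plus one small
/// refinement call per requested format.
fn total_cost(analysis_cost: f64, refinement_cost: f64, formats: u32) -> f64 {
    analysis_cost + refinement_cost * formats as f64
}

fn main() {
    // Hypothetical $0.50 analysis and $0.02 per-format refinement:
    let one_format = total_cost(0.50, 0.02, 1);
    let all_formats = total_cost(0.50, 0.02, 7);
    assert!((one_format - 0.52).abs() < 1e-9);
    assert!((all_formats - 0.64).abs() < 1e-9); // 7 formats, only ~23% more than 1
}
```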
Architecture Overview
ruley is a single-crate Rust CLI tool organized into focused modules. This chapter describes the high-level architecture, module responsibilities, and design principles.
Module Map
graph TB
CLI["cli/<br/>Argument parsing<br/>& configuration"]
Packer["packer/<br/>File discovery<br/>& compression"]
LLM["llm/<br/>Provider abstraction<br/>& token counting"]
Gen["generator/<br/>Prompt templates<br/>& rule parsing"]
Output["output/<br/>Format writers<br/>& conflict resolution"]
Utils["utils/<br/>Errors, progress<br/>& caching"]
CLI --> Packer
CLI --> LLM
Packer --> LLM
LLM --> Gen
Gen --> Output
CLI --> Utils
Packer --> Utils
LLM --> Utils
Gen --> Utils
Output --> Utils
Module Responsibilities
| Module | Purpose |
|---|---|
| cli/ | Command-line interface with clap argument parsing, config file loading and merging |
| packer/ | Repository scanning, file discovery, gitignore handling, tree-sitter compression |
| llm/ | Multi-provider LLM integration, tokenization, chunking, cost calculation |
| generator/ | Analysis and refinement prompt templates, response parsing, rule structures |
| output/ | Multi-format file writers, conflict resolution, smart-merge |
| utils/ | Shared utilities: error types, progress bars, caching, state management, validation |
Design Principles
Provider-Agnostic LLM Interface
All LLM providers implement the LLMProvider trait, which defines a standard interface for completions. The LLMClient wraps a provider and handles retry logic. New providers can be added by implementing the trait and gating behind a Cargo feature flag.
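A minimal sketch of the shape such a trait might take (method names and signatures here are assumptions; the real trait is async and each provider is feature-gated):

```rust
/// Simplified stand-in for the LLMProvider abstraction described above.
trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A mock provider, useful for illustrating how a client can hold
/// any provider behind a trait object.
struct MockProvider;

impl LlmProvider for MockProvider {
    fn name(&self) -> &str {
        "mock"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("analyzed {} chars", prompt.len()))
    }
}

fn main() {
    let provider: Box<dyn LlmProvider> = Box::new(MockProvider);
    assert_eq!(provider.name(), "mock");
    assert_eq!(provider.complete("fn main() {}").unwrap(), "analyzed 12 chars");
}
```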
Format-Agnostic Rule Generation
The pipeline performs a single LLM analysis pass, then generates format-specific rules through lightweight refinement calls. The GeneratedRules structure holds format-independent analysis results and per-format FormattedRules. This means adding a new output format requires only a new refinement prompt and writer – no changes to the analysis pipeline.
Token-Efficient Processing
ruley minimizes LLM costs through:
- Tree-sitter compression: AST-based extraction reduces token count by ~70%
- Accurate counting: Native tiktoken tokenization matches provider billing
- Intelligent chunking: Large codebases are split at logical boundaries
- Cost transparency: Estimates are shown before any LLM calls
Local-First Design
The scanning, compression, and output stages run entirely locally without network access. Only the analysis and refinement stages call external LLM APIs. When using Ollama, the entire pipeline runs on your machine.
Data Flow
flowchart LR
Repo["Repository<br/>files"] --> Scan["Scan &<br/>filter"]
Scan --> Compress["Compress<br/>(tree-sitter)"]
Compress --> Tokenize["Tokenize<br/>& chunk"]
Tokenize --> Analyze["LLM<br/>analysis"]
Analyze --> Refine["Format<br/>refinement"]
Refine --> Validate["Validate<br/>& finalize"]
Validate --> Write["Write<br/>files"]
- Repository files are scanned respecting
.gitignorerules - Source files are compressed via tree-sitter (if enabled) to reduce token count
- The compressed codebase is tokenized and split into chunks if needed
- Chunks are sent to the LLM for analysis to extract conventions
- The analysis is refined per output format through targeted prompts
- Generated rules are validated (syntax, schema, semantic checks)
- Final rules are written to disk at format-standard locations
See Rule Generation Pipeline for detailed stage-by-stage documentation.
Key Abstractions
PipelineContext
The central state container passed through all 10 pipeline stages. It carries:
- config: MergedConfig – Final resolved configuration
- stage: PipelineStage – Current execution stage
- compressed_codebase – Scanned and compressed repository data
- generated_rules – Analysis results and formatted rules
- cost_tracker – Running tally of LLM costs
- progress_manager – Visual progress feedback
MergedConfig
The single source of truth for all configuration values, produced by merging CLI flags, environment variables, and config files.
LLMProvider Trait
The abstraction layer for LLM providers. Each provider (Anthropic, OpenAI, Ollama, OpenRouter) implements this trait. The LLMClient wraps a provider and adds retry logic with exponential backoff.
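An exponential backoff schedule of the kind such a wrapper might use can be sketched as follows (base delay and cap are illustrative assumptions, not ruley's actual values):

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): doubles each time,
/// capped at 30 seconds.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms = 500u64;
    let doubled = base_ms.saturating_mul(1u64 << attempt.min(6));
    Duration::from_millis(doubled.min(30_000))
}

fn main() {
    assert_eq!(backoff_delay(0), Duration::from_millis(500));
    assert_eq!(backoff_delay(1), Duration::from_millis(1000));
    assert_eq!(backoff_delay(2), Duration::from_millis(2000));
    assert_eq!(backoff_delay(10), Duration::from_millis(30_000)); // capped
}
```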
GeneratedRules
Holds the format-independent analysis and per-format rule content. Populated during the analysis and formatting stages, consumed during writing.
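The shape described above can be sketched like this (field names other than rules_by_format are assumptions for illustration):

```rust
use std::collections::HashMap;

/// Simplified stand-in for the GeneratedRules structure: one
/// format-independent analysis plus per-format rendered content.
struct GeneratedRules {
    analysis: String,
    rules_by_format: HashMap<String, String>,
}

fn main() {
    let mut rules = GeneratedRules {
        analysis: "Uses snake_case; errors via Result".to_string(),
        rules_by_format: HashMap::new(),
    };
    // Each formatting-stage refinement fills one entry.
    rules.rules_by_format.insert("claude".to_string(), "# Project Rules".to_string());
    assert!(rules.rules_by_format.contains_key("claude"));
    assert!(!rules.analysis.is_empty());
}
```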
Rule Generation Pipeline
- Pipeline Stages
- Stage 1: Init
- Stage 2: Scanning
- Stage 3: Compressing
- Stage 4: Analyzing
- Stage 5: Formatting
- Stage 6: Validating
- Stage 7: Finalizing
- Stage 8: Writing
- Stage 9: Reporting
- Stage 10: Cleanup
- Dry Run Mode
ruley processes codebases through a 10-stage pipeline. Each stage has a clear responsibility and transitions cleanly to the next. The PipelineContext carries state through all stages.
Pipeline Stages
flowchart TD
S1["1. Init<br/>Configuration validation"]
S2["2. Scanning<br/>File discovery"]
S3["3. Compressing<br/>Tree-sitter compression"]
S4["4. Analyzing<br/>LLM analysis"]
S5["5. Formatting<br/>Per-format refinement"]
S6["6. Validating<br/>Rule validation"]
S7["7. Finalizing<br/>Post-processing"]
S8["8. Writing<br/>File output"]
S9["9. Reporting<br/>Summary display"]
S10["10. Cleanup<br/>Temp file removal"]
S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8 --> S9 --> S10
Stage 1: Init
Module: src/lib.rs
- Validates the repository path exists
- Creates the .ruley/ cache directory
- Cleans up old temporary files (24-hour threshold)
- Ensures .ruley/ is in .gitignore
- Loads previous run state from .ruley/state.json
Stage 2: Scanning
Module: src/packer/
- Discovers all files in the repository
- Respects .gitignore rules via the ignore crate
- Applies --include and --exclude glob patterns
- Identifies file languages for compression
- Caches the file list to .ruley/ for debugging
If --repomix-file is provided, scanning is skipped and the pre-packed file is used directly.
Stage 3: Compressing
Module: src/packer/compression/
- Reads discovered files and their content
- When --compress is enabled, uses tree-sitter grammars to extract structural elements (functions, classes, types) while removing implementation details
- Calculates compression metadata (file count, original size, compressed size, ratio)
- Target compression ratio: ~70% token reduction
Without --compress, files are included at full size.
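The reported compression ratio is simple arithmetic over the metadata above, as in this illustrative sketch:

```rust
/// Fraction of tokens removed relative to the original size.
fn compression_ratio(original_tokens: usize, compressed_tokens: usize) -> f64 {
    1.0 - compressed_tokens as f64 / original_tokens as f64
}

fn main() {
    // 100K tokens compressed to 30K hits the ~70% target reduction.
    let ratio = compression_ratio(100_000, 30_000);
    assert!((ratio - 0.70).abs() < 1e-9);
}
```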
Stage 4: Analyzing
Module: src/llm/, src/generator/
This is the core LLM interaction stage:
- Tokenize: Count tokens in the compressed codebase using the provider’s tokenizer
- Chunk: If the codebase exceeds the provider’s context window, split into chunks with configurable overlap
- Cost estimate: Calculate and display estimated cost
- Confirm: Prompt the user to approve (unless --no-confirm)
- Analyze: Send each chunk to the LLM with the analysis prompt
- Merge: If multi-chunk, perform an additional LLM call to merge chunk analyses
- Parse: Extract structured GeneratedRules from the LLM response
The analysis prompt asks the LLM to identify:
- Project conventions and coding style
- Architecture patterns and module structure
- Error handling approaches
- Testing practices
- Naming conventions
Stage 5: Formatting
Module: src/generator/
For each requested output format:
- Build a format-specific refinement prompt with the analysis result
- Call the LLM to generate format-adapted content
- Store the result in GeneratedRules.rules_by_format
Each format refinement is a separate LLM call to ensure format-specific conventions are followed (e.g., YAML frontmatter for Cursor, markdown for Claude).
Stage 6: Validating
Module: src/utils/validation.rs
Validates generated rules against multiple criteria:
- Syntax validation: Format-specific structure checks (valid YAML, valid frontmatter, etc.)
- Schema validation: Required fields and structure
- Semantic validation (configurable):
- File paths referenced in rules exist in the codebase
- No contradictory rules
- Cross-format consistency
- Languages/frameworks match the actual codebase
If validation fails and --retry-on-validation-failure is set, ruley sends the errors back to the LLM for auto-fix (up to max_retries attempts).
Stage 7: Finalizing
Module: src/utils/finalization.rs
Post-processing of validated rules:
- Metadata injection: Adds generation timestamp, ruley version, and provider info
- Deconfliction: If existing rule files are present, uses an LLM call to merge new rules with existing ones (unless --no-deconflict)
- Formatting normalization: Normalizes line endings and trailing whitespace
- Post-finalize smoke validation: Re-validates after finalization to catch any introduced errors
Stage 8: Writing
Module: src/output/
Writes rule files to disk:
- Resolves output paths (format defaults, config overrides, --output flag)
- Applies conflict resolution strategy (prompt, overwrite, skip, smart-merge)
- Creates backups when overwriting
- Reports what was written (created, updated, skipped, merged)
Stage 9: Reporting
Module: src/utils/summary.rs
Displays a summary of the pipeline run:
- Files analyzed
- Tokens processed
- Compression ratio (if applicable)
- Total LLM cost
- Elapsed time
- Output files written
Stage 10: Cleanup
Module: src/lib.rs, src/utils/cache.rs
Final cleanup:
- Saves pipeline state to .ruley/state.json (for future runs)
- Cleans up temporary files in .ruley/
- Transitions to the Complete terminal state
Dry Run Mode
When --dry-run is specified, the pipeline runs stages 1-3 (Init, Scanning, Compressing), displays what would be processed (file count, token estimate, cost), and exits without making any LLM calls.
Tree-Sitter Compression
- How It Works
- Supported Languages
- Usage
- What Gets Extracted
- What Gets Removed
- Compression Metrics
- When to Use Compression
- ABI Compatibility
ruley uses tree-sitter grammars to compress source code before sending it to the LLM. This reduces token count by approximately 70%, significantly lowering costs for large codebases.
How It Works
Tree-sitter parses source files into abstract syntax trees (ASTs). ruley walks these ASTs to extract structural elements – function signatures, type definitions, class declarations, imports – while removing implementation bodies. The result is a compressed representation that preserves the project’s API surface and architecture while discarding the details.
Before Compression
pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> {
    let files = scan_files(path, config)?;
    let mut analysis = Analysis::new();
    for file in &files {
        let content = std::fs::read_to_string(&file.path)?;
        let tokens = tokenize(&content);
        analysis.add_file(file, tokens);
    }
    analysis.finalize()
}
After Compression
pub fn analyze_codebase(path: &Path, config: &Config) -> Result<Analysis> { ... }
The LLM sees the function signature, return type, and parameter types – enough to understand the codebase’s API surface without the implementation details.
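The core splice is simple to sketch without tree-sitter itself: given the byte ranges of function bodies (which ruley derives from the AST), replace each with a { ... } placeholder. This function is illustrative, not ruley's actual code:

```rust
/// Replace each body span with "{ ... }". In ruley the byte ranges would come
/// from tree-sitter body nodes; here they are passed in directly so the sketch
/// stays dependency-free. Spans must be sorted and non-overlapping.
pub fn strip_bodies(source: &str, body_spans: &[(usize, usize)]) -> String {
    let mut out = String::new();
    let mut cursor = 0;
    for &(start, end) in body_spans {
        out.push_str(&source[cursor..start]); // keep everything before the body
        out.push_str("{ ... }");              // elide the implementation
        cursor = end;
    }
    out.push_str(&source[cursor..]); // keep any trailing source
    out
}
```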
Supported Languages
Each language requires a tree-sitter grammar compiled into ruley via Cargo feature flags:
| Language | Feature Flag | Grammar Version |
|---|---|---|
| TypeScript | compression-typescript (default) | tree-sitter-typescript 0.23.2 |
| Python | compression-python | tree-sitter-python 0.25.0 |
| Rust | compression-rust | tree-sitter-rust 0.24.0 |
| Go | compression-go | tree-sitter-go 0.25.0 |
Enable all languages with:
cargo install ruley --features compression-all
Files in unsupported languages are included at full size (no compression applied).
Usage
Enable compression with the --compress flag:
ruley --compress
Or in the config file:
[general]
compress = true
What Gets Extracted
The compression extracts structural elements that help the LLM understand your codebase:
- Functions: Signatures, parameters, return types
- Types: Struct/class definitions, enum variants, type aliases
- Traits/Interfaces: Method signatures
- Imports: Module dependencies
- Constants: Top-level constant definitions
- Module structure: File and directory organization
What Gets Removed
Implementation details that don’t affect the LLM’s understanding of conventions:
- Function bodies (replaced with { ... })
- Loop internals
- Conditional branches
- Local variable assignments
- Comments (optional, depending on grammar)
Compression Metrics
ruley tracks and reports compression statistics:
- Total files: Number of files processed
- Original size: Total bytes before compression
- Compressed size: Total bytes after compression
- Compression ratio: Ratio of compressed to original (lower is better)
These metrics are displayed during pipeline execution and in the final summary.
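As a rough sketch (field and method names assumed, not ruley's actual types), the ratio is simply compressed size over original size:

```rust
/// Illustrative metrics container mirroring the stats described above.
pub struct CompressionMetrics {
    pub total_files: usize,
    pub original_bytes: u64,
    pub compressed_bytes: u64,
}

impl CompressionMetrics {
    /// Ratio of compressed to original size; lower is better.
    /// A ~70% reduction corresponds to a ratio around 0.3.
    pub fn ratio(&self) -> f64 {
        if self.original_bytes == 0 {
            return 1.0; // nothing to compress
        }
        self.compressed_bytes as f64 / self.original_bytes as f64
    }
}
```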
When to Use Compression
Use compression when:
- Your codebase is large (>1000 files or >500K tokens)
- You want to minimize LLM costs
- The codebase has languages with tree-sitter grammar support
Skip compression when:
- Your codebase is small (the cost savings are negligible)
- You need the LLM to see implementation details for accurate convention extraction
- Your primary language doesn’t have a tree-sitter grammar in ruley
ABI Compatibility
ruley uses tree-sitter 0.26.x (ABI v15). Language parsers may use slightly older ABI versions:
- tree-sitter-go 0.25.0: ABI v15
- tree-sitter-python 0.25.0: ABI v15
- tree-sitter-rust 0.24.0: ABI v15
- tree-sitter-typescript 0.23.2: ABI v14 (compatible via backward compatibility)
The tree-sitter core library supports backward-compatible ABI versions, so older grammar versions work correctly.
Token Counting and Chunking
Development Setup
- Prerequisites
- Quick Start
- Toolchain Management
- Development Commands
- IDE Setup
- Project Structure
- Code Quality
- Commit Standards
This chapter covers setting up a development environment for contributing to ruley.
Prerequisites
- Rust 1.91+ (see rust-version in Cargo.toml for minimum supported version)
- Git for version control
- mise (recommended) for development toolchain management
Quick Start
# Clone the repository
git clone https://github.com/EvilBit-Labs/ruley.git
cd ruley
# Install development tools (mise handles everything via mise.toml)
just setup
# Build the project
just build
# Run tests
just test
# Run the CLI
just run --help
Toolchain Management
ruley uses mise to manage the development toolchain. The mise.toml file at the project root defines all required tools and versions:
- Rust 1.93.1 with rustfmt and clippy components
- cargo-nextest for faster test execution
- cargo-llvm-cov for code coverage
- cargo-audit and cargo-deny for security auditing
- mdbook and plugins for documentation
- git-cliff for changelog generation
- pre-commit for pre-commit hooks
- actionlint for GitHub Actions linting
Run mise install to install all tools, or let just setup handle it.
Without mise
If you prefer not to use mise, install Rust via rustup and install individual tools with cargo install:
rustup toolchain install 1.93.1 --profile default -c rustfmt,clippy
cargo install cargo-nextest cargo-llvm-cov cargo-audit cargo-deny
Development Commands
ruley uses just as its task runner. Run just to see all available recipes:
| Command | Description |
|---|---|
just test | Run tests with nextest (all features) |
just test-verbose | Run tests with output |
just lint | Run rustfmt check + clippy (all features) |
just clippy-min | Run clippy with no default features |
just check | Quick check: pre-commit + lint + build-check |
just ci-check | Full CI suite: lint, test, build, audit, coverage |
just build | Debug build |
just build-release | Release build (all features, LTO) |
just fmt | Format code |
just coverage | Generate LCOV coverage report |
just audit | Run cargo audit |
just deny | Run cargo deny checks |
just outdated | Check for outdated dependencies |
just doc | Generate and open rustdoc |
just docs-serve | Serve mdbook docs locally with live reload |
just run <args> | Run the CLI with arguments |
just changelog | Generate CHANGELOG.md from git history |
IDE Setup
rust-analyzer
ruley works well with rust-analyzer. Recommended VS Code settings:
{
"rust-analyzer.cargo.features": "all",
"rust-analyzer.check.command": "clippy",
"rust-analyzer.check.extraArgs": [
"--all-features"
]
}
Project Structure
ruley/
src/
cli/ # CLI argument parsing and config management
packer/ # File discovery, gitignore, compression
llm/ # LLM providers, tokenization, chunking
generator/ # Prompt templates and rule parsing
output/ # Format writers and conflict resolution
utils/ # Errors, progress, caching, validation
lib.rs # Pipeline orchestration (10-stage pipeline)
main.rs # Entry point
tests/ # Integration tests
benches/ # Criterion benchmarks
prompts/ # LLM prompt templates (markdown)
docs/ # mdbook documentation (this book)
examples/ # Example configuration files
Code Quality
Before submitting changes, ensure:
- All tests pass: just test
- No clippy warnings: just lint (includes all features) and just clippy-min (no default features)
- Code is formatted: just fmt
- Full CI suite passes: just ci-check
Lint Policy
ruley enforces a zero-warnings policy. Key lint rules:
- unsafe_code = "deny" – No unsafe code in production (tests may use #[allow(unsafe_code)])
- unwrap_used = "deny" – No unwrap() in production code
- panic = "deny" – No panic!() in production code
- pedantic, nursery, cargo – Clippy lint groups at warn level
See [workspace.lints.clippy] in Cargo.toml for the full lint configuration.
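In Cargo lint tables, the policy above might look roughly like this (an illustrative fragment; consult the project's Cargo.toml for the authoritative configuration):

```toml
# Illustrative fragment only.
[workspace.lints.rust]
unsafe_code = "deny"

[workspace.lints.clippy]
unwrap_used = "deny"
panic = "deny"
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
cargo = { level = "warn", priority = -1 }
```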
Commit Standards
Follow Conventional Commits:
<type>[(<scope>)]: <description>
- Types: feat, fix, docs, refactor, test, perf, build, ci, chore
- Scope (optional): cli, packer, llm, generator, output, utils, config, deps
- DCO: Always sign off with git commit -s
See CONTRIBUTING.md for full contribution guidelines.
Testing
This chapter covers ruley’s testing philosophy, how to run tests, and guidelines for writing new tests.
Testing Philosophy
ruley follows the test proportionality principle: test critical functionality and real edge cases. Test code should be shorter than implementation.
Do test:
- Critical functionality and real edge cases
- Error conditions and recovery paths
- Token counting and chunking logic
- Retry logic and error handling
- Cost estimation
- Compression ratio targets (~70% token reduction)
Don’t test:
- Trivial operations or framework behavior
- Every possible provider/format permutation
- Obvious success cases or trivial formatting
Running Tests
All Tests
just test
This runs all tests with cargo-nextest and --all-features.
Verbose Output
just test-verbose
Specific Tests
# Run a specific test by name
cargo test test_name
# Run tests in a specific module
cargo test packer::
# Run integration tests only
cargo test --test '*'
Coverage
just coverage
Generates an LCOV coverage report at lcov.info using cargo-llvm-cov.
Test Organization
Unit Tests
Unit tests live in the same file as the code they test, inside #[cfg(test)] modules:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_something() {
        // ...
    }
}
Integration Tests
Integration tests live in the tests/ directory and test the CLI as a black box using assert_cmd:
tests/
common/
mod.rs # Shared test utilities
cli_tests.rs # CLI integration tests
...
Test Utilities
The tests/common/mod.rs module provides shared helpers for integration tests:
- Environment isolation: Uses a denylist pattern (env_remove) to strip sensitive variables from subprocess environments
- Denylisted variables: RULEY_*, ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, OLLAMA_HOST
Important: The denylist uses env_remove (not env_clear()) because env_clear() breaks coverage instrumentation (LLVM_PROFILE_FILE), rustflags, and other tooling-injected variables.
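A minimal sketch of the denylist approach using std::process::Command (the RULEY_* prefix glob is omitted here, and the real helper in tests/common/mod.rs may differ, e.g. by wrapping assert_cmd):

```rust
use std::process::Command;

/// Build a subprocess command with sensitive variables stripped via a denylist.
/// env_remove is used instead of env_clear() so that tooling-injected
/// variables such as LLVM_PROFILE_FILE survive for coverage instrumentation.
pub fn isolated_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    for var in [
        "ANTHROPIC_API_KEY",
        "OPENAI_API_KEY",
        "OPENROUTER_API_KEY",
        "OLLAMA_HOST",
    ] {
        cmd.env_remove(var);
    }
    cmd
}
```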
Async Tests
Use #[tokio::test] for async tests:
#[tokio::test]
async fn test_async_operation() {
    let result = some_async_function().await;
    assert!(result.is_ok());
}
Snapshot Tests
ruley uses insta for snapshot testing of CLI outputs and generated rules:
use insta::assert_snapshot;

#[test]
fn test_output_format() {
    let output = generate_output();
    assert_snapshot!(output);
}
Update snapshots with:
cargo insta review
CI Testing
CI runs the full test suite on every push and pull request:
- Quality: just lint-rust (formatting + clippy)
- Tests: just test with all features
- Cross-platform: Tests on Linux, macOS, and Windows
- Feature combinations: Default features, no features, all features
- MSRV: Checks compilation with stable minus 2 releases
- Coverage: Generates and uploads to Codecov
All CI checks must pass before merge. See .github/workflows/ci.yml for the full configuration.
Writing Tests
Guidelines
- Test the behavior, not the implementation – Focus on inputs and outputs
- Use descriptive test names – test_chunk_size_exceeds_context_triggers_chunking
- One assertion per concept – Multiple assert! calls are fine, but each test should verify one logical behavior
- Avoid mocking when possible – Integration tests with real (but controlled) inputs are preferred
- Keep tests fast – Use small inputs and avoid network calls in unit tests
Unsafe Code in Tests
Rust 2024 edition makes std::env::set_var unsafe due to data race concerns. Tests that manipulate environment variables need #[allow(unsafe_code)]:
#[test]
#[allow(unsafe_code)]
fn test_env_var_override() {
    unsafe { std::env::set_var("RULEY_PROVIDER", "openai") };
    // ... test logic ...
    unsafe { std::env::remove_var("RULEY_PROVIDER") };
}
Security
This chapter covers ruley’s security model, vulnerability reporting, and security features.
Reporting Vulnerabilities
Do not report security vulnerabilities through public GitHub issues.
Use one of the following channels:
- GitHub Private Vulnerability Reporting (preferred)
- Email support@evilbitlabs.io encrypted with the project’s PGP key
Please include:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested fix (if any)
See SECURITY.md for full policy details including scope, response times, safe harbor provisions, and the PGP key.
Security Features
Code Safety
- unsafe_code = "deny" enforced at the package level
- Pure Rust implementation with no C dependencies in core logic
- Zero unwrap() and panic!() in production code (enforced via clippy lints)
Credential Handling
- API keys are read from environment variables at runtime
- Keys are never stored in generated output files
- Keys are never logged or included in error messages
- No credential persistence between runs
Network Security
- No network listening: ruley makes outbound-only connections
- Connections are made only to configured LLM provider APIs
- HTTPS is used for all API calls
- No telemetry or analytics
Supply Chain Security
- GitHub Actions pinned to full commit SHAs
- cargo audit runs in CI to check for known vulnerabilities
- cargo deny checks license compliance and duplicate dependencies
- CodeQL analysis on every PR
- OSSF Scorecard monitoring
- Automated dependency updates via Dependabot
Scope
In Scope
- API key or credential leakage through error messages, logs, or generated output
- Command injection via CLI arguments or configuration files
- Path traversal in file input/output handling
- Prompt injection affecting output integrity
- Denial of service via crafted input files or configuration
- Unsafe handling of LLM responses (e.g., writing to unintended paths)
Out of Scope
- Vulnerabilities in upstream LLM providers (Anthropic, OpenAI, etc.)
- Issues requiring physical access to the machine
- Social engineering attacks
- LLM hallucinations or inaccurate generated rules (quality issue, not security)
Response Timeline
This is a volunteer-maintained project. Response times are best-effort:
- Acknowledgment: Within 1 week
- Initial assessment: Within 2 weeks
- Fix release: Within 90 days of confirmed vulnerabilities
- Disclosure: Coordinated through GitHub Security Advisories
Release Process
- Overview
- Platform Targets
- Pre-Release Checklist
- Version Bump Process
- Tag and Release
- Automated Release Pipeline
- Changelog Generation
- Rollback Procedure
- Prerelease Versions
- Versioning Policy
ruley releases are automated via cargo-dist and GitHub Actions. This chapter documents the release workflow and verification procedures.
Overview
Pushing a version tag (e.g., v1.0.0) triggers the release workflow, which:
- Validates the tag version matches Cargo.toml
- Builds binaries for all 5 platform targets
- Generates SHA256 checksums
- Signs artifacts via Sigstore/GitHub Attestations
- Creates a GitHub release with changelog and binaries
- Publishes to crates.io (non-prereleases only)
- Updates the Homebrew tap
Configuration lives in dist-workspace.toml.
Platform Targets
| Platform | Target | Archive |
|---|---|---|
| Linux x86_64 | x86_64-unknown-linux-gnu | .tar.gz |
| Linux x86_64 (static) | x86_64-unknown-linux-musl | .tar.gz |
| Linux ARM64 | aarch64-unknown-linux-gnu | .tar.gz |
| macOS ARM64 | aarch64-apple-darwin | .tar.gz |
| Windows x86_64 | x86_64-pc-windows-msvc | .zip |
Pre-Release Checklist
Before creating a release:
- All tests pass locally: just ci-check
- Zero clippy warnings: cargo clippy --all-targets --all-features -- -D warnings
- Documentation is up to date (README.md, CHANGELOG.md)
- Review open issues and PRs for release blockers
- Release build succeeds: cargo build --release
- Binary works correctly: ./target/release/ruley --help
- Dry-run crates.io publish: cargo publish --dry-run --all-features
Version Bump Process
1. Update the version in Cargo.toml: version = "X.Y.Z"
2. Run cargo update to update Cargo.lock.
3. Generate the changelog: just changelog
4. Review and edit CHANGELOG.md for the new version entry.
5. Commit all changes:
   git add Cargo.toml Cargo.lock CHANGELOG.md
   git commit -s -m "chore(release): prepare for vX.Y.Z"
Tag and Release
1. Create an annotated tag: git tag -a vX.Y.Z -m "Release vX.Y.Z"
2. Push the tag to trigger the release workflow: git push origin vX.Y.Z
Automated Release Pipeline
The release is managed by two GitHub Actions workflows:
release.yml (cargo-dist)
Triggered by v* tags. Builds platform binaries, creates the GitHub release, publishes to crates.io, and updates the Homebrew tap.
release-plz.yml
Runs on every push to main:
- release-plz-pr: Creates a release preparation PR with version bumps and changelog updates
- release-plz-release: Publishes to crates.io when version changes are merged
Changelog Generation
Changelogs are generated by git-cliff using the configuration in cliff.toml. Commits follow the Conventional Commits specification and are grouped by type:
- Features, Bug Fixes, Refactoring, Documentation, Performance, Testing, Miscellaneous, Security
Rollback Procedure
If a release needs to be rolled back:
- Delete the tag locally: git tag -d vX.Y.Z
- Delete the tag remotely: git push origin :refs/tags/vX.Y.Z
- Delete the GitHub release via the web interface
- Yank from crates.io: cargo yank --version X.Y.Z
Yanking prevents new installs but does not remove the package. Existing Cargo.lock files referencing this version will still work.
Prerelease Versions
For release candidates or beta releases:
- Use a prerelease tag: v1.0.0-rc.1, v1.0.0-beta.1
- The release workflow marks these as prereleases on GitHub
- Prerelease versions are not published to crates.io automatically
Versioning Policy
ruley follows Semantic Versioning:
- Major (X.0.0): Breaking changes to CLI interface or config format
- Minor (0.X.0): New features, new providers, new output formats
- Patch (0.0.X): Bug fixes, dependency updates, documentation
Release Verification
- GitHub Attestations
- SHA256 Checksums
- crates.io Verification
- Verifying a Cargo Install
- Supply Chain Security
This chapter explains how to verify the authenticity and integrity of ruley release artifacts.
GitHub Attestations
All release artifacts are signed via Sigstore using GitHub Attestations. This provides cryptographic proof that binaries were built by the official GitHub Actions workflow and have not been tampered with.
Verifying with gh
gh attestation verify <artifact> --repo EvilBit-Labs/ruley
Replace <artifact> with the path to the downloaded binary or archive.
What This Verifies
- The artifact was built by the EvilBit-Labs/ruley repository's GitHub Actions
- The build environment matches the expected workflow
- The artifact has not been modified since it was built
SHA256 Checksums
Each release includes SHA256 checksums for all platform binaries. These are attached to the GitHub release alongside the binaries.
Verifying Checksums
macOS / Linux
# Download the checksum file
curl -fsSLO https://github.com/EvilBit-Labs/ruley/releases/latest/download/sha256sums.txt
# Verify a specific artifact
sha256sum -c sha256sums.txt --ignore-missing
Windows
# Compute the hash of the downloaded archive
Get-FileHash ruley-x86_64-pc-windows-msvc.zip -Algorithm SHA256
crates.io Verification
When installing via cargo install ruley, Cargo verifies the package integrity automatically using the crates.io checksum. No additional steps are needed.
Verifying a Cargo Install
To verify the installed version:
ruley --version
Compare the output with the expected version from the releases page.
Supply Chain Security
ruley takes several measures to secure the build and release pipeline:
| Measure | Description |
|---|---|
| Pinned Actions | All GitHub Actions are pinned to full commit SHAs |
| Sigstore signing | Artifacts signed via GitHub Attestations |
| cargo-audit | Checks for known vulnerabilities in dependencies |
| cargo-deny | Checks license compliance and duplicate dependencies |
| CodeQL | Static analysis for security vulnerabilities |
| OSSF Scorecard | Automated security posture monitoring |
| Dependabot | Automated dependency update PRs |
| Reproducible builds | Pinned Rust toolchain via rust-toolchain.toml and mise.toml |
| Committed lock file | Cargo.lock is committed for deterministic builds |
Security Assurance Case
This document provides a structured security assurance case for ruley, identifying the attack surface, threat model, and mitigations in place.
Attack Surface
ruley’s attack surface is limited by design. It is a CLI tool that reads local files and makes outbound API calls.
Entry Points
| Entry Point | Description | Trust Level |
|---|---|---|
| CLI arguments | User-provided flags and paths | Untrusted |
| Configuration files | TOML files loaded from disk | Semi-trusted |
| Environment variables | API keys and overrides | Trusted |
| Repository files | Source files scanned for analysis | Untrusted |
| LLM API responses | Generated content from providers | Untrusted |
| Repomix files | Pre-packed XML input files | Untrusted |
Exit Points
| Exit Point | Description |
|---|---|
| Generated rule files | Written to disk at user-specified or default paths |
| LLM API requests | Outbound HTTPS calls to provider endpoints |
| Console output | Progress, cost estimates, summaries |
| Cache files | .ruley/ directory for state and temp files |
Threat Model
T1: Credential Leakage
Threat: API keys exposed in error messages, logs, or generated output.
Mitigations:
- API keys are read from environment variables only, never persisted
- Error messages do not include API key values
- Generated rule files do not contain API keys
- Logging does not expose credentials
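A hypothetical redaction helper illustrates the rule that key values never reach logs or error messages (this is a sketch; ruley's actual credential handling may differ):

```rust
/// Replace any occurrence of the secret in a message with a placeholder
/// before the message can reach a log or error display.
pub fn redact(message: &str, secret: &str) -> String {
    if secret.is_empty() {
        return message.to_owned(); // nothing to redact
    }
    message.replace(secret, "[REDACTED]")
}
```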
T2: Path Traversal
Threat: Malicious file paths in config or LLM responses writing outside the project directory.
Mitigations:
- Output paths are resolved relative to the project root
- The output module validates write paths
- Config file paths are canonicalized during discovery
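A containment check along these lines can enforce the mitigation (a dependency-free sketch, not ruley's actual path validation): reject any candidate path that is absolute or whose .. components climb above the project root.

```rust
use std::path::{Component, Path};

/// Return true if the candidate path would escape the project root:
/// either it is absolute, or its `..` components outnumber the
/// preceding normal components at any point.
pub fn escapes_project_root(candidate: &Path) -> bool {
    let mut depth: i64 = 0;
    for comp in candidate.components() {
        match comp {
            Component::ParentDir => {
                depth -= 1;
                if depth < 0 {
                    return true; // climbed above the root
                }
            }
            Component::Normal(_) => depth += 1,
            Component::CurDir => {}
            Component::RootDir | Component::Prefix(_) => return true, // absolute path
        }
    }
    false
}
```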
T3: Command Injection
Threat: Crafted CLI arguments or config values executing unintended commands.
Mitigations:
- clap validates all CLI input with value_parser and PossibleValuesParser
- Config values are deserialized through serde (no shell evaluation)
- No shell commands are executed from user input
T4: Prompt Injection via Codebase
Threat: Malicious content in scanned source files influencing LLM output to produce harmful rules.
Mitigations:
- Generated rules are validated (syntax, schema, semantic checks)
- Validation detects contradictory rules and unrealistic references
- Users review generated rules before committing to their repository
- Finalization stage can deconflict with existing rules
T5: Denial of Service
Threat: Crafted input causing excessive resource consumption (memory, CPU, network).
Mitigations:
- Token counting prevents unbounded LLM calls
- Chunk size limits cap memory usage per chunk
- Cost confirmation requires explicit user approval before expensive operations
- Bounded concurrency in async operations
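The cost-confirmation gate rests on a token-based estimate; a sketch of the arithmetic (per-million-token prices are parameters here, not ruley's actual pricing table):

```rust
/// Estimate spend in USD from token counts and per-million-token prices,
/// so the user can be asked to confirm before any LLM call is made.
pub fn estimated_cost_usd(
    input_tokens: u64,
    output_tokens: u64,
    input_price_per_mtok: f64,
    output_price_per_mtok: f64,
) -> f64 {
    (input_tokens as f64 * input_price_per_mtok
        + output_tokens as f64 * output_price_per_mtok)
        / 1_000_000.0
}
```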
T6: Supply Chain Compromise
Threat: Compromised dependencies or build artifacts.
Mitigations:
- cargo audit checks for known vulnerabilities in CI
- cargo deny enforces license and duplicate dependency policies
- GitHub Actions pinned to commit SHAs (not mutable tags)
- CodeQL static analysis on every PR
- OSSF Scorecard monitoring
- Sigstore artifact signing
Code Safety Guarantees
| Guarantee | Enforcement |
|---|---|
| No unsafe code | unsafe_code = "deny" in [lints.rust] |
| No unwrap in production | unwrap_used = "deny" in clippy config |
| No panic in production | panic = "deny" in clippy config |
| Zero clippy warnings | -D warnings enforced in CI |
| Dependency auditing | cargo audit and cargo deny in CI |
Data Flow Security
flowchart LR
User["User<br/>(trusted)"] -->|CLI args,<br/>env vars| Ruley["ruley<br/>process"]
Disk["Local files<br/>(semi-trusted)"] -->|config,<br/>source files| Ruley
Ruley -->|HTTPS| LLM["LLM API<br/>(untrusted response)"]
LLM -->|generated rules| Ruley
Ruley -->|validated output| Output["Rule files<br/>(user reviews)"]
Ruley -->|temp data| Cache[".ruley/<br/>cache"]
Key security boundaries:
- Input boundary: All CLI arguments validated by clap; config files deserialized by serde
- Network boundary: Only HTTPS outbound to configured providers; no inbound connections
- Output boundary: Generated rules validated before writing; paths resolved relative to project root
- Trust boundary: LLM responses treated as untrusted input; validated before use
Updating This Document
This document must be updated when:
- New entry points are added (e.g., new input sources)
- New exit points are added (e.g., new output destinations)
- New dependencies are introduced that handle untrusted input
- The network communication model changes
- New LLM providers are added