
Introduction

DBSurveyor is a security-focused, offline-first database analysis and documentation toolchain designed for security professionals, database administrators, and DevOps teams.

Core Purpose

Generate comprehensive, secure database schema documentation for PostgreSQL, MySQL, SQLite, and MongoDB databases without compromising security or requiring persistent network connectivity.

Key Features

  • Dual-Binary Architecture: Separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) for flexible workflows
  • Database Support: PostgreSQL (✅), SQLite (✅), MySQL (🚧), MongoDB (🚧), SQL Server (🚧)
  • Security-First Design: AES-GCM encryption, credential protection, zero telemetry
  • Offline Operation: Complete functionality in air-gapped environments
  • Documentation Generation: Markdown (✅), JSON analysis (✅), HTML/ERD (🚧 in development)
  • Shell Completion Support: Built-in completion scripts for bash, zsh, fish, PowerShell, and elvish
  • Progress Indicators: Visual feedback with spinners for long-running operations (schema loading, decryption, documentation generation)
  • Environment-Aware Output: Respects NO_COLOR and TERM=dumb environment variables to disable color output and progress indicators
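
The environment-aware output rules can be sketched in a few lines. This is an illustrative model of the documented behavior, not DBSurveyor's actual code; the TTY check is an extra assumption:

```python
import os
import sys

def use_color() -> bool:
    """Follow the documented rules: NO_COLOR or TERM=dumb disables
    color output and progress indicators."""
    if os.environ.get("NO_COLOR") is not None:  # any value counts as set
        return False
    if os.environ.get("TERM") == "dumb":
        return False
    # Assumption: also skip color when stdout is not a terminal
    return sys.stdout.isatty()
```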

Security Guarantees

  • NO NETWORK CALLS: Operates completely offline after initial installation; the only network activity is the collector's connection to the target database
  • NO TELEMETRY: Zero data collection, usage tracking, or external reporting
  • NO CREDENTIALS IN OUTPUTS: Database credentials never stored in output files
  • AES-GCM ENCRYPTION: Industry-standard authenticated encryption for sensitive outputs
  • AIRGAP COMPATIBLE: Full functionality in air-gapped environments

Target Users

  • Red Team Operators: Post-compromise database enumeration in contested environments
  • Blue Team Analysts: Auditing unknown systems and inherited databases
  • Database Administrators: Comprehensive documentation and compliance reporting
  • Security Analysts: Database security assessment and risk identification
  • System Administrators: Exploring inherited or legacy databases with minimal documentation

Architecture Overview

DBSurveyor uses a dual-binary architecture that separates data collection from documentation generation:

graph LR
    A[Database] --> B[dbsurveyor-collect]
    B --> C[Schema File]
    C --> D[dbsurveyor]
    D --> E[Documentation]
    
    C -.-> F[Encrypted]
    C -.-> G[Compressed]
    
    E --> H[Markdown]
    E --> I[HTML]
    E --> J[SQL DDL]
    E --> K[Mermaid ERD]

This separation provides several benefits:

  • Security: Collection can happen in secure environments, documentation generation offline
  • Flexibility: Multiple documentation formats from a single collection
  • Portability: Schema files can be transferred between environments
  • Auditability: Clear separation between data collection and processing

Installation

Pre-Built Binaries

Pre-built binaries are available on the Releases page for Linux, macOS, and Windows.

Download the appropriate binary for your platform and add it to your PATH.

Homebrew

For macOS and Linux users with Homebrew:

brew install EvilBit-Labs/tap/dbsurveyor

From Source

Prerequisites

  • Rust 1.93.1+ (MSRV)
  • Git

Clone and Build

git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor
cargo build --release

The compiled binaries will be available in target/release/.

Feature Selection

Control which database engines are compiled in using feature flags:

# Build with all database support
cargo build --release --all-features

# Build with specific databases only
cargo build --release --features postgresql,mysql,encryption

# Build minimal version for airgap environments
cargo build --release --no-default-features --features sqlite

Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| postgresql | PostgreSQL support | sqlx with postgres driver |
| mysql | MySQL support | sqlx with mysql driver |
| sqlite | SQLite support | sqlx with sqlite driver |
| mongodb | MongoDB support | mongodb crate |
| mssql | SQL Server support | tiberius crate |
| compression | Zstandard compression | zstd crate |
| encryption | AES-GCM encryption | aes-gcm, argon2 crates |

Default Features

default = ["postgresql", "sqlite"]

The default build includes PostgreSQL and SQLite support, which covers the most common use cases while maintaining a reasonable binary size.

Binary Variants

DBSurveyor provides two main binaries:

dbsurveyor-collect

The database collection tool that connects to databases and extracts schema information.

  • Default Features: postgresql, sqlite
  • Optional Features: mysql, mongodb, mssql, compression, encryption

dbsurveyor

The documentation generator that processes collected schema files.

  • Default Features: None (minimal dependencies)
  • Optional Features: compression, encryption

Development Setup

For development work, install the complete toolchain:

# Install development dependencies
just install

# This installs:
# - Rust toolchain components (clippy, rustfmt)
# - Cargo tools (audit, deny, llvm-cov, nextest)
# - Security tools (syft for SBOM generation)
# - Documentation tools (mdbook and plugins)

Verification

Verify your installation:

# Check binary versions
./target/release/dbsurveyor-collect --version
./target/release/dbsurveyor --version

# Test with SQLite (no external database required)
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
./target/release/dbsurveyor-collect sqlite://test.db
./target/release/dbsurveyor schema.dbsurveyor.json

# Clean up
rm test.db schema.dbsurveyor.json schema.md

Airgap Installation

For air-gapped environments:

  1. Prepare on connected system:

    # Download dependencies
    cargo fetch
    
    # Create vendor directory
    cargo vendor vendor
    
    # Build minimal version
    cargo build --release --no-default-features --features sqlite
    
  2. Transfer to airgap system:

    • Copy entire project directory including vendor/
    • Copy built binaries from target/release/
  3. Use offline:

    # Use vendored dependencies
    cargo build --release --offline --no-default-features --features sqlite
    

Troubleshooting

Common Issues

Compilation fails with missing dependencies:

  • Ensure you have the latest Rust toolchain: rustup update
  • Check feature flags match your requirements

Database driver compilation errors:

  • Install system dependencies for your target databases
  • For PostgreSQL: libpq-dev (Ubuntu) or postgresql-devel (RHEL)
  • For MySQL: libmysqlclient-dev (Ubuntu) or mysql-devel (RHEL)

Permission errors:

  • Ensure you have write permissions to the target directory
  • Use cargo install --root ~/.local for user-local installation

Getting Help

Quick Start

This guide will get you up and running with DBSurveyor in minutes.

Prerequisites

  • DBSurveyor installed (see Installation)
  • Access to a database (PostgreSQL, MySQL, SQLite, or MongoDB)

Basic Workflow

DBSurveyor follows a two-step process:

  1. Collect schema from database → schema file
  2. Generate documentation from schema file → reports
graph LR
    A[Database] --> B[dbsurveyor-collect]
    B --> C[schema.dbsurveyor.json]
    C --> D[dbsurveyor]
    D --> E[Documentation]

Example 1: SQLite Database

The simplest example using SQLite (no server setup required):

# Create a sample SQLite database
sqlite3 sample.db << EOF
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    title TEXT NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

INSERT INTO users (username, email) VALUES
    ('alice', 'alice@example.com'),
    ('bob', 'bob@example.com');
EOF

# Step 1: Collect schema
dbsurveyor-collect sqlite://sample.db

# Step 2: Generate documentation
dbsurveyor generate schema.dbsurveyor.json

# View the generated documentation
cat schema.md

Example 2: PostgreSQL Database

# Step 1: Collect schema from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# Step 2: Generate Markdown documentation (HTML is placeholder)
dbsurveyor generate schema.dbsurveyor.json --format markdown

# View the generated documentation
cat schema.md

Example 3: Encrypted Schema Collection

For sensitive environments, encrypt the schema file:

# Collect with encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db

# Generate documentation from encrypted file (will prompt for password)
dbsurveyor generate schema.enc

Common Commands

Collection Commands

# Basic collection
dbsurveyor-collect <database-url>

# Test connection without collecting
dbsurveyor-collect test <database-url>

# Collect with compression
dbsurveyor-collect --compress <database-url>

# Collect all databases on server
dbsurveyor-collect --all-databases postgres://user:pass@localhost

# List supported database types
dbsurveyor-collect list

Documentation Commands

# Generate Markdown (default)
dbsurveyor generate schema.dbsurveyor.json

# Generate HTML report
dbsurveyor --format html schema.json

# Generate SQL DDL
dbsurveyor sql schema.json --dialect postgresql

# Analyze schema statistics
dbsurveyor analyze schema.json --detailed

# Validate schema file
dbsurveyor validate schema.json

Output Files

DBSurveyor generates several types of files:

Schema Files (from collector)

  • schema.dbsurveyor.json - Standard JSON format (specification)
  • schema.dbsurveyor.json.zst - Compressed format (with --compress)
  • schema.enc - Encrypted format (with --encrypt)

Documentation Files (from generator)

  • schema.md - Markdown documentation (✅ implemented)
  • schema_analysis.json - Analysis report (✅ implemented)
  • schema.html - HTML report with search (🚧 placeholder)
  • schema.sql - SQL DDL reconstruction (🚧 placeholder)
  • schema.mmd - Mermaid ERD diagram (🚧 placeholder)

Next Steps

Now that you have the basics working:

  1. Explore Database Support: Learn about database-specific features
  2. Security Features: Understand encryption and security options
  3. Advanced Configuration: Customize behavior with configuration options
  4. CLI Reference: Master all available command-line options

Troubleshooting Quick Start Issues

Connection Issues

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check if database is accessible
ping localhost
telnet localhost 5432  # PostgreSQL default port
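
telnet is often absent on minimal systems; a portable TCP reachability check can stand in. This is a generic sketch, unrelated to DBSurveyor's own code:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the PostgreSQL default port
print(port_open("localhost", 5432))
```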

Permission Issues

# Ensure user has read permissions
# PostgreSQL: GRANT SELECT ON ALL TABLES IN SCHEMA public TO username;
# MySQL: GRANT SELECT ON database.* TO 'username'@'%';

File Not Found

# Check if schema file was created
ls -la *.dbsurveyor.json *.enc *.zst

# Verify file format
file schema.dbsurveyor.json

For more detailed troubleshooting, see the Troubleshooting Guide.

Database Collection

The dbsurveyor-collect tool connects to databases and extracts comprehensive schema information. This guide covers all collection features and options.

Basic Collection

Simple Schema Collection

# Collect from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# Collect from SQLite
dbsurveyor-collect sqlite:///path/to/database.db

# Collect from MySQL
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

Connection Testing

Always test your connection before collecting:

# Test connection without collecting schema
dbsurveyor-collect test postgres://user:pass@localhost/db

# Example output:
# ✓ Connection test successful
# Connection to PostgreSQL database successful

Advanced Collection Options

Output Formats

Control how the collected schema is stored:

# Standard JSON (default)
dbsurveyor-collect --output schema.json postgres://localhost/db

# Compressed with Zstandard
dbsurveyor-collect --compress --output schema.json.zst postgres://localhost/db

# Encrypted with AES-GCM (prompts for password)
dbsurveyor-collect --encrypt --output schema.enc postgres://localhost/db

# Both compressed and encrypted
dbsurveyor-collect --compress --encrypt postgres://localhost/db

Multi-Database Collection

Collect schemas from all databases on a server:

# Collect all accessible databases
dbsurveyor-collect --all-databases postgres://user:pass@localhost

# Include system databases (postgres, template0, etc.)
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost

# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases postgres,template0,template1 postgres://localhost

Performance and Stealth Options

# Throttle operations for stealth (1 second delay between operations)
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Limit sample data collection
dbsurveyor-collect --sample 50 postgres://localhost/db

# Verbose logging for debugging
dbsurveyor-collect -vvv postgres://localhost/db

What Gets Collected

DBSurveyor extracts comprehensive database metadata:

Core Schema Objects

  • Tables: Names, schemas, columns, data types, constraints
  • Views: Definitions, columns, dependencies
  • Indexes: Names, columns, uniqueness, types
  • Constraints: Primary keys, foreign keys, check constraints, unique constraints
  • Procedures: Stored procedures with parameters and definitions
  • Functions: User-defined functions with signatures
  • Triggers: Event triggers with timing and definitions
  • Custom Types: User-defined types, enums, domains

Metadata Information

  • Database Info: Name, version, size, encoding, collation
  • Collection Metadata: Timestamp, duration, warnings, collector version
  • Statistics: Row counts, object counts, relationship mappings

Sample Data (Optional)

When enabled, DBSurveyor can collect sample data:

# Collect sample data (100 rows per table by default)
dbsurveyor-collect --sample 100 postgres://localhost/db

# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db

Security Note: Sample data may contain sensitive information. Review outputs before sharing.

Database-Specific Features

PostgreSQL

# Full PostgreSQL collection
dbsurveyor-collect postgres://user:pass@localhost:5432/mydb

Collected Objects:

  • Tables, views, materialized views
  • Indexes (B-tree, Hash, GiST, SP-GiST, GIN, BRIN)
  • Constraints (PK, FK, Check, Unique, Exclusion)
  • Functions, procedures, triggers
  • Custom types, domains, enums
  • Extensions and schemas

PostgreSQL-Specific:

  • JSONB columns and indexes
  • Array types and operations
  • Inheritance relationships
  • Partitioned tables

MySQL

# MySQL collection (requires --features mysql)
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

Collected Objects:

  • Tables, views
  • Indexes (Primary, Unique, Index, Fulltext, Spatial)
  • Constraints (PK, FK, Check, Unique)
  • Stored procedures, functions, triggers
  • Events and routines

MySQL-Specific:

  • Storage engines (InnoDB, MyISAM, etc.)
  • Partitioning information
  • Auto-increment values
  • Character sets and collations

SQLite

# SQLite collection
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite  # Alternative format

Collected Objects:

  • Tables, views
  • Indexes
  • Triggers
  • Constraints (limited support)

SQLite-Specific:

  • ROWID and WITHOUT ROWID tables
  • Virtual tables
  • Attached databases
  • PRAGMA settings

MongoDB

# MongoDB collection (requires --features mongodb)
dbsurveyor-collect mongodb://user:pass@localhost:27017/mydb

Collected Objects:

  • Collections and their schemas
  • Indexes (single field, compound, text, geospatial)
  • Document structure analysis
  • Field statistics and types

MongoDB-Specific:

  • Schema inference from documents
  • Index usage statistics
  • Sharding information
  • GridFS collections

Security Considerations

Read-Only Operations

DBSurveyor only performs read operations:

  • SELECT statements for data retrieval
  • DESCRIBE or SHOW statements for metadata
  • Information schema queries
  • System catalog queries

No write operations are ever performed.

Credential Handling

# Use environment variables to avoid command history
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect

# Credentials are sanitized in all logs and error messages
# "postgres://user:pass@localhost/db" becomes "postgres://user:****@localhost/db"
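
The sanitization shown above amounts to masking the password component of the URL. A minimal illustration in Python (DBSurveyor's own implementation is in Rust):

```python
from urllib.parse import urlsplit, urlunsplit

def sanitize_url(url: str) -> str:
    """Mask the password in a database URL, e.g. for logs and error messages."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    netloc = f"{parts.username}:****@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(sanitize_url("postgres://user:pass@localhost/db"))
# postgres://user:****@localhost/db
```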

Connection Security

# Use SSL/TLS connections when available
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

# Connection timeouts prevent hanging
# Default: 30 seconds for connection, 30 seconds for queries

Troubleshooting Collection Issues

Connection Problems

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check network connectivity
ping localhost
telnet localhost 5432  # PostgreSQL port

# Verify credentials
psql -h localhost -U user -d db -c "SELECT 1;"

Permission Issues

# PostgreSQL: Grant read permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;

# MySQL: Grant read permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';

# SQLite: Ensure file read permissions
chmod 644 /path/to/database.db

Large Database Handling

# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Throttle operations to reduce load
dbsurveyor-collect --throttle 500 postgres://localhost/large_db

# Exclude large or unnecessary databases
dbsurveyor-collect --all-databases --exclude-databases logs,temp,backup postgres://localhost

Memory and Performance

# Monitor collection progress
dbsurveyor-collect -v postgres://localhost/db

# For very large databases, consider collecting specific schemas
# (This feature is planned for future releases)

Output File Structure

The collected schema file contains:

{
  "format_version": "1.0",
  "database_info": {
    "name": "mydb",
    "version": "13.7",
    "size_bytes": 1048576,
    "encoding": "UTF8"
  },
  "tables": [...],
  "views": [...],
  "indexes": [...],
  "constraints": [...],
  "procedures": [...],
  "functions": [...],
  "triggers": [...],
  "custom_types": [...],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 5432,
    "collector_version": "0.1.0",
    "warnings": []
  }
}
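
Because the output is plain JSON, it is easy to post-process. A hypothetical sketch that summarizes object counts from a document following the layout above (the values here are made up):

```python
import json

# A trimmed-down schema document following the layout above
raw = """
{
  "format_version": "1.0",
  "database_info": {"name": "mydb", "version": "13.7"},
  "tables": [{"name": "users"}, {"name": "posts"}],
  "views": [{"name": "active_users"}],
  "indexes": [],
  "collection_metadata": {"warnings": []}
}
"""
doc = json.loads(raw)

counts = {key: len(doc[key]) for key in ("tables", "views", "indexes")}
print(doc["database_info"]["name"], counts)
# mydb {'tables': 2, 'views': 1, 'indexes': 0}
```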

Next Steps

After collecting schema data:

  1. Generate Documentation: Use dbsurveyor to create reports
  2. Analyze Schema: Use analysis commands to understand the database structure
  3. Secure Storage: Consider encrypting schema files for sensitive databases
  4. Version Control: Track schema changes over time

See the Documentation Generation guide for the next steps.

Documentation Generation

The dbsurveyor tool processes collected schema files and generates comprehensive documentation in multiple formats. This guide covers all documentation generation features.

Basic Documentation Generation

Generate Markdown Documentation

# Generate Markdown (default format)
dbsurveyor generate schema.dbsurveyor.json

# Explicit format specification
dbsurveyor --format markdown schema.dbsurveyor.json

# Custom output file
dbsurveyor --format markdown --output database-docs.md schema.json

Generate HTML Reports

# Generate HTML report with search functionality
dbsurveyor --format html schema.dbsurveyor.json

# Custom output location
dbsurveyor --format html --output reports/database.html schema.json

Output Formats

DBSurveyor supports multiple documentation formats:

Markdown Documentation

Best for: Version control, wikis, README files

dbsurveyor --format markdown schema.json

Features:

  • Clean, readable text format
  • Table of contents with navigation
  • Code blocks for SQL examples
  • Compatible with GitHub, GitLab, etc.

Example Output Structure:

# Database Schema: mydb

## Summary
- **Tables**: 15
- **Views**: 3
- **Indexes**: 28

## Tables

### users
- **Schema**: public
- **Rows**: ~1,250

| Column | Type | Nullable | Default |
|--------|------|----------|---------|
| id | INTEGER | NO | nextval('users_id_seq') |
| username | VARCHAR(50) | NO | |
| email | VARCHAR(255) | YES | |

HTML Reports

Best for: Interactive browsing, presentations, sharing

dbsurveyor --format html schema.json

Features:

  • Interactive search and filtering
  • Collapsible sections
  • Syntax highlighting
  • Responsive design
  • Offline-capable (no external dependencies)

JSON Analysis

Best for: Programmatic processing, integration with other tools

dbsurveyor --format json schema.json

Features:

  • Structured data for automation
  • Statistics and metrics
  • Relationship analysis
  • Machine-readable format

Example Output:

{
  "database_name": "mydb",
  "analysis": {
    "table_count": 15,
    "view_count": 3,
    "index_count": 28,
    "constraint_count": 42
  },
  "relationships": [
    {
      "from_table": "orders",
      "to_table": "users",
      "relationship_type": "foreign_key"
    }
  ]
}
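
The relationships list can be derived directly from foreign-key constraints. A hypothetical sketch; the key names mirror the example output above and are assumptions:

```python
# Foreign-key constraints as they might appear in a collected schema
constraints = [
    {"table": "orders", "type": "foreign_key", "references": "users"},
    {"table": "orders", "type": "primary_key", "references": None},
    {"table": "posts", "type": "foreign_key", "references": "users"},
]

# Keep only foreign keys and reshape them into relationship records
relationships = [
    {
        "from_table": c["table"],
        "to_table": c["references"],
        "relationship_type": "foreign_key",
    }
    for c in constraints
    if c["type"] == "foreign_key"
]
print(relationships)
```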

Mermaid ERD Diagrams

Best for: Visual database design, architecture documentation

dbsurveyor --format mermaid schema.json

Features:

  • Entity Relationship Diagrams
  • Visual table relationships
  • Compatible with Mermaid.js
  • Can be embedded in Markdown

Example Output:

erDiagram
    users {
        int id PK
        string username
        string email
    }
    orders {
        int id PK
        int user_id FK
        datetime created_at
    }
    users ||--o{ orders : "has many"

Advanced Features

SQL DDL Reconstruction

Generate SQL DDL scripts to recreate the database structure:

# PostgreSQL DDL
dbsurveyor sql schema.json --dialect postgresql --output recreate.sql

# MySQL DDL
dbsurveyor sql schema.json --dialect mysql --output mysql-schema.sql

# Generic SQL (ANSI standard)
dbsurveyor sql schema.json --dialect generic --output portable.sql

Supported Dialects:

  • postgresql - PostgreSQL-specific syntax
  • mysql - MySQL-specific syntax
  • sqlite - SQLite-specific syntax
  • sqlserver - SQL Server-specific syntax
  • generic - ANSI SQL standard

Schema Analysis

Get detailed insights about your database structure:

# Basic analysis
dbsurveyor analyze schema.json

# Detailed analysis with statistics
dbsurveyor analyze schema.json --detailed

Analysis Output:

Schema Analysis: mydb
================
Tables: 15
Views: 3
Indexes: 28
Constraints: 42

Detailed Analysis:
- Procedures: 5
- Functions: 12
- Triggers: 8
- Custom Types: 3

Relationship Analysis:
- Foreign Key Relationships: 18
- Self-Referencing Tables: 2
- Orphaned Tables: 1

Schema Validation

Verify schema file integrity and format:

dbsurveyor validate schema.dbsurveyor.json

Validation Checks:

  • File format correctness
  • Schema version compatibility
  • Data integrity verification
  • Warning detection
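
A rough model of what such checks look like; the required keys and supported versions here are assumptions for illustration, not the tool's actual rules:

```python
REQUIRED_KEYS = {"format_version", "database_info", "tables"}
SUPPORTED_VERSIONS = {"1.0"}

def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - doc.keys())]
    if doc.get("format_version") not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported format_version: {doc.get('format_version')}")
    return problems

print(validate({"format_version": "1.0", "database_info": {}, "tables": []}))
# []
```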

Working with Encrypted Files

DBSurveyor can process encrypted schema files:

# Process encrypted file (will prompt for password)
dbsurveyor generate schema.enc

# All formats work with encrypted input
dbsurveyor --format html schema.enc
dbsurveyor sql schema.enc --dialect postgresql

Security Notes:

  • Password prompts are secure (no echo)
  • Decryption happens in memory only
  • Original encrypted file remains unchanged

Working with Compressed Files

Process compressed schema files automatically:

# Compressed files are detected automatically
dbsurveyor generate schema.dbsurveyor.json.zst

# Works with all output formats
dbsurveyor --format html schema.json.zst

Data Redaction

Control sensitive data visibility in generated documentation:

Redaction Modes

# No redaction (show all data)
dbsurveyor --redact-mode none schema.json

# Minimal redaction (only obvious sensitive fields)
dbsurveyor --redact-mode minimal schema.json

# Balanced redaction (recommended default)
dbsurveyor --redact-mode balanced schema.json

# Conservative redaction (maximum privacy)
dbsurveyor --redact-mode conservative schema.json

# Disable all redaction
dbsurveyor --no-redact schema.json

Redaction Examples

| Mode | Password Field | Email Field | ID Field |
|---|---|---|---|
| none | password123 | user@example.com | 12345 |
| minimal | [REDACTED] | user@example.com | 12345 |
| balanced | [REDACTED] | [EMAIL] | 12345 |
| conservative | [REDACTED] | [REDACTED] | [ID] |
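
One way to implement mode-based redaction like this; the matching rules below are illustrative assumptions, not DBSurveyor's actual heuristics:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def redact(field: str, value: str, mode: str) -> str:
    """Redact a sample value according to the chosen mode."""
    if mode == "none":
        return value
    if "password" in field.lower():  # every other mode hides passwords
        return "[REDACTED]"
    if EMAIL_RE.match(value) and mode in ("balanced", "conservative"):
        return "[EMAIL]" if mode == "balanced" else "[REDACTED]"
    if mode == "conservative" and field.lower().endswith("id"):
        return "[ID]"
    return value

print(redact("password", "password123", "minimal"))     # [REDACTED]
print(redact("email", "user@example.com", "balanced"))  # [EMAIL]
```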

Batch Processing

Process multiple schema files:

# Process all schema files in directory
for file in *.dbsurveyor.json; do
    dbsurveyor --format html "$file"
done

# Generate multiple formats from one schema
for format in markdown html mermaid; do
    dbsurveyor --format "$format" schema.json
done

Integration Examples

CI/CD Pipeline

# GitHub Actions example
  - name: Generate Database Documentation
    run: |
      dbsurveyor-collect postgres://${{ secrets.DB_URL }}
      dbsurveyor --format html --output docs/database.html schema.dbsurveyor.json
      dbsurveyor --format markdown --output docs/DATABASE.md schema.dbsurveyor.json

Documentation Website

# Generate documentation for website
dbsurveyor --format html --output public/database.html schema.json
dbsurveyor --format mermaid --output docs/erd.mmd schema.json

# Include in static site generator
cp schema.html docs/_includes/database-schema.html

Version Control

# Track schema changes over time
git add schema.dbsurveyor.json
git commit -m "Update database schema documentation"

# Generate diff-friendly markdown
dbsurveyor --format markdown schema.json
git add schema.md
git commit -m "Update schema documentation"

Customization and Templating

Template Customization

DBSurveyor uses the Askama template engine. Future versions will support custom templates:

# Custom template (planned feature)
dbsurveyor --template custom.html.j2 --format html schema.json

Output Customization

# Custom CSS for HTML output (planned feature)
dbsurveyor --format html --css custom.css schema.json

# Custom header/footer (planned feature)
dbsurveyor --format html --header header.html --footer footer.html schema.json

Performance Considerations

Large Schema Files

# For large schemas, use streaming processing
dbsurveyor --format json schema.json  # Most efficient

# HTML generation may be slower for very large schemas
dbsurveyor --format html large-schema.json  # May take time

Memory Usage

  • JSON format: Minimal memory usage
  • Markdown format: Low memory usage
  • HTML format: Moderate memory usage (includes search index)
  • Mermaid format: Low memory usage

Troubleshooting

Common Issues

File not found:

# Check file exists and has correct extension
ls -la *.dbsurveyor.json *.enc *.zst

Permission denied:

# Check file permissions
chmod 644 schema.dbsurveyor.json

Invalid format:

# Validate schema file first
dbsurveyor validate schema.json

Encryption password issues:

# Ensure you're using the correct password
# Password prompts don't show characters for security

Debug Mode

# Enable verbose logging
dbsurveyor -vvv generate schema.json

# Check file format
file schema.dbsurveyor.json

Next Steps

After generating documentation:

  1. Review Output: Check generated documentation for accuracy
  2. Share Securely: Consider redaction levels for different audiences
  3. Integrate: Include documentation in your workflow or website
  4. Automate: Set up regular documentation updates
  5. Version Control: Track documentation changes over time

See the Configuration guide for advanced customization options.

Security Features

DBSurveyor is built with security-first principles that are non-negotiable. This document details the comprehensive security features and guarantees.

Core Security Guarantees

1. Offline-Only Operation

  • NO NETWORK CALLS: Operates completely offline after initial installation
  • NO TELEMETRY: Zero data collection, usage tracking, or external reporting
  • NO AUTO-UPDATES: Manual control over all software updates
  • AIRGAP COMPATIBLE: Full functionality in air-gapped environments

2. Credential Protection

Database credentials are never stored, logged, or included in any output files.

Implementation Details

  • Immediate Sanitization: Connection strings sanitized in all error messages and logs
  • Secure Memory Handling: Credentials automatically zeroed using the zeroize crate
  • Separate Handling: Credentials parsed separately from connection configuration
  • No Serialization: Credential structures are never serialized to disk

Example of Credential Sanitization

// Input: "postgres://user:secret123@localhost:5432/mydb"
// Logged: "postgres://user:****@localhost:5432/mydb"
// Error messages never contain actual passwords

3. AES-GCM Encryption

Industry-standard authenticated encryption for sensitive schema outputs.

Encryption Specifications

  • Algorithm: AES-GCM-256 (256-bit key, 96-bit nonce, 128-bit authentication tag)
  • Nonce Generation: Cryptographically secure random nonces (never reused)
  • Key Derivation: Argon2id with secure parameters
  • Authentication: Separate authentication tags prevent tampering

Key Derivation Parameters

| Parameter | Value | Purpose |
|---|---|---|
| Algorithm | Argon2id v1.3 | Memory-hard key derivation |
| Salt Size | 16 bytes (128 bits) | Random salt per encryption |
| Memory Cost | 64 MiB (65536 KiB) | Memory hardness |
| Time Cost | 3 iterations | Computational hardness |
| Parallelism | 4 threads | Parallel processing |
| Output Length | 32 bytes (256 bits) | AES-256 key size |
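
Argon2id is not in Python's standard library, so this sketch substitutes scrypt, another memory-hard KDF, purely to show the derive-key-from-password-and-salt shape; the salt and output sizes mirror the parameters above:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # 128-bit random salt per encryption, as in the table

# scrypt stands in for Argon2id here (stdlib-only illustration)
key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
assert len(key) == 32  # 256 bits: the AES-256 key size

# The same password + salt always derive the same key...
assert key == hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
# ...while a fresh random salt yields a different key
assert key != hashlib.scrypt(password, salt=os.urandom(16), n=2**14, r=8, p=1, dklen=32)
```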

Encryption Process

graph TD
    A[Password] --> B[Argon2id KDF]
    C[Random Salt] --> B
    B --> D[256-bit Key]
    E[Schema Data] --> F[AES-GCM Encrypt]
    D --> F
    G[Random Nonce] --> F
    F --> H[Ciphertext]
    F --> I[Auth Tag]
    
    J[Encrypted File] --> K[Algorithm: AES-GCM-256]
    J --> L[Nonce: 96 bits]
    J --> M[Ciphertext]
    J --> N[Auth Tag: 128 bits]
    J --> O[KDF Params + Salt]

4. Database Security

All database operations follow strict security principles.

Read-Only Operations

  • SELECT/DESCRIBE Only: No INSERT, UPDATE, DELETE, or DDL operations
  • Connection Flags: Read-only mode enforced at connection level where supported
  • User Permissions: Recommend using read-only database users

Connection Security

  • Timeouts: All connections and queries have configurable timeouts (default: 30s)
  • Connection Pooling: Limited connection pools prevent resource exhaustion
  • TLS/SSL: Certificate validation when using encrypted connections
  • Parameterized Queries: All queries use parameterized statements (no SQL injection risk)

Example Connection Configuration

ConnectionConfig {
    connect_timeout: Duration::from_secs(30),
    query_timeout: Duration::from_secs(30),
    max_connections: 10,
    read_only: true,  // Enforced
    // Credentials handled separately
}

Security Testing

DBSurveyor includes comprehensive security tests to verify all guarantees.

Credential Protection Tests

#[tokio::test]
async fn test_no_credentials_in_output() -> anyhow::Result<()> {
    let schema = collect_schema("postgres://user:secret@localhost/db").await?;
    let json = serde_json::to_string(&schema)?;

    // Verify no sensitive data is present
    assert!(!json.contains("secret"));
    assert!(!json.contains("user:secret"));
    assert!(!json.contains("password"));
    Ok(())
}

Encryption Tests

#[test]
fn test_nonce_uniqueness() -> anyhow::Result<()> {
    let data = b"test schema data";
    let encrypted1 = encrypt_data(data, "password")?;
    let encrypted2 = encrypt_data(data, "password")?;

    // Same data should produce different ciphertext (random nonce)
    assert_ne!(encrypted1.nonce, encrypted2.nonce);
    assert_ne!(encrypted1.ciphertext, encrypted2.ciphertext);
    Ok(())
}

Offline Operation Tests

#[test]
fn test_airgap_compatibility() -> anyhow::Result<()> {
    // Simulate airgap environment
    let schema = load_schema("fixtures/sample_schema.json")?;

    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown)?;
    assert!(!documentation.is_empty());
    Ok(())
}

Security Best Practices

For Users

  1. Use Read-Only Database Users

    -- PostgreSQL
    CREATE USER dbsurveyor_readonly;
    GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
    GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;
    
    -- MySQL
    CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
    GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
    
  2. Use Strong Encryption Passwords

    • Minimum 12 characters
    • Mix of uppercase, lowercase, numbers, symbols
    • Consider using password managers
  3. Secure File Handling

    # Set restrictive permissions on encrypted files
    chmod 600 schema.enc
    
    # Use secure deletion when done
    shred -vfz -n 3 schema.enc
    
  4. Network Security

    • Use TLS/SSL connections when possible
    • Avoid credentials in command history
    • Use environment variables for connection strings

For Developers

  1. Never Log Credentials

    // ✅ Correct
    log::info!("Connecting to database");

    // ❌ Never do this
    log::info!("Connecting to {}", database_url);
  2. Sanitize Error Messages

    // ✅ Correct
    #[error("Connection failed to database")]
    ConnectionFailed,

    // ❌ Never do this
    #[error("Failed to connect to {url}")]
    ConnectionError { url: String },
  3. Use Secure Memory Handling

    use zeroize::{Zeroize, Zeroizing};

    #[derive(Zeroize)]
    #[zeroize(drop)]
    struct Credentials {
        username: Zeroizing<String>,
        password: Zeroizing<Option<String>>,
    }

Security Validation Commands

Run these commands to verify security guarantees:

# Complete security validation
just security-full

# Test encryption capabilities
just test-encryption

# Verify offline operation
just test-offline

# Check for credential leakage
just test-credential-security

# Security audit
just audit

Threat Model

In Scope

  • Credential Exposure: Preventing database credentials from appearing in logs, outputs, or error messages
  • Data at Rest: Protecting collected schema files with strong encryption
  • Network Isolation: Ensuring offline operation after initial setup
  • Memory Safety: Secure handling of sensitive data in memory

Out of Scope

  • Network Transport Security: TLS/SSL configuration is the user’s responsibility
  • Database Server Security: DBSurveyor assumes the database server is trusted
  • Host System Security: Operating system and file system security is external
  • Social Engineering: User education and operational security practices

Compliance Considerations

DBSurveyor’s security features support various compliance requirements:

  • SOC 2: Offline operation and encryption support security controls
  • GDPR: Data minimization and encryption for personal data protection
  • HIPAA: Encryption and access controls for healthcare data
  • PCI DSS: Secure handling of systems that may contain payment data

Security Advisories

Current known security advisories and their status:

RUSTSEC-2023-0071 (RSA crate - Marvin Attack)

  • Severity: Medium
  • Impact: Timing side-channel vulnerability in RSA implementation
  • Affected: MySQL connections only (not enabled by default)
  • Mitigation: MySQL support disabled by default; PostgreSQL and SQLite recommended
  • Status: No fix available upstream; tracked for future SQLx updates
  • Review Date: 2025-09-01

Reporting Security Issues

To report security vulnerabilities:

  1. DO NOT create public GitHub issues for security vulnerabilities
  2. Email security issues to: security@evilbitlabs.io
  3. Include detailed reproduction steps and impact assessment
  4. Allow reasonable time for response and remediation

We follow responsible disclosure practices and will acknowledge security reports within 48 hours.

Configuration

DBSurveyor can be configured through command-line options, environment variables, and configuration files. This guide covers all configuration options and best practices.

Environment Variables

Database Connection

# Primary database connection
export DATABASE_URL="postgres://user:password@localhost:5432/mydb"

# Use with collector without specifying URL
dbsurveyor-collect

Logging Configuration

# Set log level for all modules
export RUST_LOG=info

# Set log level for specific modules
export RUST_LOG=dbsurveyor_collect=debug,dbsurveyor_core=trace

# Disable ANSI color output and progress indicators
# Follows the no-color.org convention
# Useful for CI environments, non-TTY contexts, and accessibility
export NO_COLOR=1            # Any value disables color and progress output
export TERM=dumb             # Alternative way to disable color/progress output

# Log levels: error, warn, info, debug, trace
export RUST_LOG=debug

Connection Pool Configuration

# Maximum connection pool size (default: 10)
export DBSURVEYOR_MAX_CONNECTIONS=10

# Minimum idle connections in pool (default: 2)
export DBSURVEYOR_MIN_IDLE_CONNECTIONS=2

# Connection timeout in seconds (default: 30)
export DBSURVEYOR_CONNECT_TIMEOUT_SECS=30

# Idle connection timeout in seconds (default: 600)
export DBSURVEYOR_IDLE_TIMEOUT_SECS=600

# Maximum connection lifetime in seconds (default: 3600)
export DBSURVEYOR_MAX_LIFETIME_SECS=3600

Security Configuration

# Disable credential prompts (use with caution)
export DBSURVEYOR_NO_PROMPT=1

# Default encryption password (NOT RECOMMENDED for production)
export DBSURVEYOR_ENCRYPTION_PASSWORD="your-password"

Command-Line Configuration

Global Options

Available for both dbsurveyor-collect and dbsurveyor:

# Verbosity levels
dbsurveyor-collect -v postgres://localhost/db      # Info level
dbsurveyor-collect -vv postgres://localhost/db     # Debug level
dbsurveyor-collect -vvv postgres://localhost/db    # Trace level

# Quiet mode (errors only)
dbsurveyor-collect -q postgres://localhost/db

# Help and version
dbsurveyor-collect --help
dbsurveyor-collect --version

Collection Configuration

Connection Settings

# Connection timeout (default: 30s)
dbsurveyor-collect --connect-timeout 60 postgres://localhost/db

# Query timeout (default: 30s)
dbsurveyor-collect --query-timeout 45 postgres://localhost/db

# Maximum connections (default: 10)
dbsurveyor-collect --max-connections 5 postgres://localhost/db

Data Collection Settings

# Sample size per table (default: 100, max: 10,000)
# Values above 10,000 are capped to prevent resource exhaustion
dbsurveyor-collect --sample 50 postgres://localhost/db

# Throttle delay between operations in milliseconds
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Include/exclude specific object types
dbsurveyor-collect --no-views postgres://localhost/db
dbsurveyor-collect --no-procedures postgres://localhost/db
dbsurveyor-collect --no-triggers postgres://localhost/db

Multi-Database Settings

# Collect all databases
dbsurveyor-collect --all-databases postgres://localhost

# Include system databases
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost

# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases "postgres,template0,template1" postgres://localhost

Documentation Configuration

Output Format Settings

# Output format
dbsurveyor --format html schema.json
dbsurveyor --format markdown schema.json
dbsurveyor --format json schema.json
dbsurveyor --format mermaid schema.json

# Custom output file
dbsurveyor --output custom-name.html --format html schema.json

Redaction Settings

# Redaction modes
dbsurveyor --redact-mode none schema.json        # No redaction
dbsurveyor --redact-mode minimal schema.json     # Minimal redaction
dbsurveyor --redact-mode balanced schema.json    # Balanced (default)
dbsurveyor --redact-mode conservative schema.json # Maximum redaction

# Disable all redaction
dbsurveyor --no-redact schema.json

Configuration Files

Workspace Configuration

Create a .dbsurveyor.toml file in your project root:

# .dbsurveyor.toml
[collection]
# Default connection settings
connect_timeout = "30s"
query_timeout = "30s"
max_connections = 10
sample_size = 100       # Max: 10,000 (values above are clamped)
throttle_ms = 0

# Object collection settings
include_views = true
include_procedures = true
include_functions = true
include_triggers = true
include_indexes = true
include_constraints = true
include_custom_types = true

# Multi-database settings
include_system_databases = false
exclude_databases = ["postgres", "template0", "template1"]

[output]
# Default output settings
format = "markdown"
compression = false
encryption = false

[redaction]
# Data redaction settings
mode = "balanced"
custom_patterns = [
  { pattern = "(?i)(api[_-]?key)", replacement = "[API_KEY]" },
  { pattern = "(?i)(token)", replacement = "[TOKEN]" },
]

[security]
# Security settings
read_only = true
sanitize_credentials = true

User Configuration

Create a global configuration file at ~/.config/dbsurveyor/config.toml:

# ~/.config/dbsurveyor/config.toml
[defaults]
# Default database type for ambiguous connections
database_type = "postgresql"

# Default output directory
output_dir = "~/Documents/database-docs"

# Default logging level
log_level = "info"

[security]
# Security preferences
always_encrypt = false
strong_passwords_only = true
credential_timeout = "5m"

[templates]
# Custom template directory
template_dir = "~/.config/dbsurveyor/templates"

Database-Specific Configuration

PostgreSQL Configuration

# SSL/TLS settings
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=verify-full&sslcert=client.crt&sslkey=client.key"

# Connection pool settings
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5&pool_timeout=30"

# Schema-specific collection
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"

MySQL Configuration

# SSL settings
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"

# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

# Connection timeout
dbsurveyor-collect "mysql://user:pass@localhost/db?connect_timeout=30"

SQLite Configuration

# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

# Busy timeout
dbsurveyor-collect "sqlite:///path/to/db.sqlite?busy_timeout=30000"

# Cache size
dbsurveyor-collect "sqlite:///path/to/db.sqlite?cache_size=2000"

MongoDB Configuration

# Authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"

# SSL settings
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?ssl=true&sslVerify=false"

# Connection timeout
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?connectTimeoutMS=30000"

Advanced Configuration

Custom Redaction Patterns

Create custom redaction rules:

# .dbsurveyor.toml
[redaction.patterns]
# Credit card numbers
credit_card = { pattern = "\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}", replacement = "[CREDIT_CARD]" }

# Social Security Numbers
ssn = { pattern = "\\d{3}-\\d{2}-\\d{4}", replacement = "[SSN]" }

# Phone numbers
phone = { pattern = "\\(\\d{3}\\)\\s?\\d{3}-\\d{4}", replacement = "[PHONE]" }

# Custom API keys
api_key = { pattern = "(?i)api[_-]?key[\"']?\\s*[:=]\\s*[\"']?([a-zA-Z0-9]{32,})", replacement = "[API_KEY]" }
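
The pattern/replacement pairs above behave like ordinary regex substitutions. A short Python sketch (illustrative only — not DBSurveyor's actual redaction engine) shows the effect of the credit-card and SSN rules:

```python
import re

# Patterns mirroring the [redaction.patterns] entries above.
PATTERNS = [
    (re.compile(r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}"), "[CREDIT_CARD]"),
    (re.compile(r"\d{3}-\d{2}-\d{4}"), "[SSN]"),
    (re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with its placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Card 4111-1111-1111-1111, SSN 123-45-6789"))
```

Note that rule order matters: the credit-card pattern runs before the SSN pattern so a 16-digit number is never partially consumed by a shorter rule.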

Performance Tuning

# .dbsurveyor.toml
[performance]
# Connection pool settings
max_connections = 10
min_connections = 2
connection_timeout = "30s"
idle_timeout = "10m"
max_lifetime = "1h"

# Query settings
query_timeout = "30s"
batch_size = 1000
max_concurrent_queries = 5

# Memory settings
max_memory_mb = 512
streaming_threshold_mb = 100

Output Customization

# .dbsurveyor.toml
[output.html]
# HTML-specific settings
theme = "dark"
include_search = true
include_toc = true
syntax_highlighting = true

[output.markdown]
# Markdown-specific settings
include_toc = true
table_format = "github"
code_blocks = true

[output.json]
# JSON-specific settings
pretty_print = true
include_metadata = true
compress = false

Environment-Specific Configuration

Development Environment

# .env.development
DATABASE_URL=postgres://dev_user:dev_pass@localhost:5432/dev_db
RUST_LOG=debug
DBSURVEYOR_SAMPLE_SIZE=10
DBSURVEYOR_THROTTLE=0

Production Environment

# .env.production
DATABASE_URL=postgres://readonly_user:secure_pass@prod-db:5432/prod_db
RUST_LOG=warn
DBSURVEYOR_SAMPLE_SIZE=0
DBSURVEYOR_THROTTLE=1000
DBSURVEYOR_ENCRYPT=true

CI/CD Environment

# .env.ci
DATABASE_URL=postgres://ci_user:ci_pass@ci-db:5432/test_db
RUST_LOG=info
NO_COLOR=1
DBSURVEYOR_NO_PROMPT=1
DBSURVEYOR_OUTPUT_FORMAT=json

Configuration Validation

Validate your configuration:

# Check configuration syntax
dbsurveyor-collect --check-config

# Dry run with configuration
dbsurveyor-collect --dry-run postgres://localhost/db

# Show effective configuration
dbsurveyor-collect --show-config

Security Best Practices

Credential Management

# Use environment variables instead of command line
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect

# Use credential files with restricted permissions
echo "postgres://user:pass@localhost/db" > ~/.dbsurveyor/credentials
chmod 600 ~/.dbsurveyor/credentials
dbsurveyor-collect --credentials-file ~/.dbsurveyor/credentials

Configuration File Security

# Secure configuration files
chmod 600 .dbsurveyor.toml
chmod 600 ~/.config/dbsurveyor/config.toml

# Use environment variable substitution
# .dbsurveyor.toml
[collection]
database_url = "${DATABASE_URL}"
encryption_password = "${ENCRYPTION_PASSWORD}"
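
The ${VAR} syntax above is shell-style substitution. DBSurveyor's exact expansion rules aren't specified here, but the general behavior can be approximated with Python's string.Template (the environment value below is a made-up example):

```python
import os
from string import Template

# Hypothetical environment value for illustration.
os.environ["DATABASE_URL"] = "postgres://readonly:***@localhost/db"

def expand(value: str) -> str:
    # safe_substitute leaves unknown ${VARS} in place instead of raising
    return Template(value).safe_substitute(os.environ)

print(expand("${DATABASE_URL}"))
```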

Audit Configuration

# Log configuration usage
export RUST_LOG=dbsurveyor_core::config=debug

# Review effective configuration
dbsurveyor-collect --show-config --dry-run

Troubleshooting Configuration

Common Issues

Configuration not found:

# Check configuration file locations
ls -la .dbsurveyor.toml
ls -la ~/.config/dbsurveyor/config.toml

# Use explicit configuration file
dbsurveyor-collect --config custom-config.toml

Environment variable not recognized:

# Check environment variable names (case sensitive)
env | grep DBSURVEYOR
env | grep DATABASE_URL

# Verify variable export
echo $DATABASE_URL

Permission denied:

# Check file permissions
ls -la .dbsurveyor.toml
chmod 644 .dbsurveyor.toml  # Make readable

Debug Configuration

# Show all configuration sources
dbsurveyor-collect --show-config --verbose

# Test configuration parsing
dbsurveyor-collect --check-config --verbose

# Trace configuration loading
export RUST_LOG=dbsurveyor_core::config=trace
dbsurveyor-collect --dry-run

Migration and Upgrades

Configuration Migration

When upgrading DBSurveyor versions:

# Backup existing configuration
cp .dbsurveyor.toml .dbsurveyor.toml.backup

# Check for configuration changes
dbsurveyor-collect --check-config --verbose

# Migrate configuration format
dbsurveyor-collect --migrate-config

Version Compatibility

# .dbsurveyor.toml
[meta]
# Specify minimum required version
min_version = "0.1.0"
config_version = "1.0"

This comprehensive configuration system allows you to customize DBSurveyor for your specific needs while maintaining security and performance.

CLI Reference

Complete reference for all DBSurveyor command-line options.

dbsurveyor-collect

Database schema collection tool.

Synopsis

dbsurveyor-collect [OPTIONS] [DATABASE_URL]
dbsurveyor-collect <COMMAND>

Global Options

Option     | Short | Description
--verbose  | -v    | Increase verbosity (-v, -vv, -vvv)
--quiet    | -q    | Suppress all output except errors
--help     | -h    | Print help information
--version  | -V    | Print version information

Collection Options

Option                     | Description                                  | Default                   | Status
--database-url <URL>       | Database connection string                   | From DATABASE_URL env var | ✅ Implemented
--output <PATH>            | Output file path                             | schema.dbsurveyor.json    | ✅ Implemented
--sample <N>               | Number of sample rows per table              | 100                       | 🚧 Planned
--throttle <MS>            | Delay between operations (ms)                | None                      | 🚧 Planned
--compress                 | Compress output using Zstandard              | false                     | ✅ Implemented
--encrypt                  | Encrypt output using AES-GCM                 | false                     | ✅ Implemented
--enable-quality           | Enable data quality analysis on sampled data | false                     | ✅ Implemented
--quality-threshold <LIST> | Quality thresholds (e.g., completeness:0.9,uniqueness:0.95,consistency:0.85) | None | ✅ Implemented
--all-databases            | Collect all accessible databases             | false                     | 🚧 Planned
--include-system-databases | Include system databases                     | false                     | 🚧 Planned
--exclude-databases <LIST> | Comma-separated list to exclude              | None                      | 🚧 Planned

Commands

collect

Collect schema from database.

dbsurveyor-collect collect <DATABASE_URL> [--output <PATH>]

test

Test database connection without collecting schema.

dbsurveyor-collect test <DATABASE_URL>

list

List supported database types and connection formats.

dbsurveyor-collect list

Examples

# Basic collection
dbsurveyor-collect postgres://user:pass@localhost/db

# With custom output file
dbsurveyor-collect --output my-schema.json postgres://localhost/db

# Encrypted and compressed
dbsurveyor-collect --encrypt --compress postgres://localhost/db

# Multi-database collection
dbsurveyor-collect --all-databases --exclude-databases postgres,template0 postgres://localhost

# Test connection only
dbsurveyor-collect test mysql://root:password@localhost/mydb

# Throttled collection (stealth mode)
dbsurveyor-collect --throttle 1000 postgres://localhost/db

Connection String Formats

Database   | Format                            | Example                                     | Status
PostgreSQL | postgres://user:pass@host:port/db | postgres://admin:secret@localhost:5432/mydb | ✅ Implemented
SQLite     | sqlite:///path/to/file            | sqlite:///home/user/data.db                 | ✅ Implemented
MySQL      | mysql://user:pass@host:port/db    | mysql://root:password@localhost:3306/mydb   | 🚧 In Development
MongoDB    | mongodb://user:pass@host:port/db  | mongodb://admin:secret@localhost:27017/mydb | 🚧 Planned
SQL Server | mssql://user:pass@host:port/db    | mssql://sa:password@localhost:1433/mydb     | 🚧 Planned

Environment Variables

Variable                        | Description
DATABASE_URL                    | Default database connection string
RUST_LOG                        | Logging configuration (error, warn, info, debug, trace)
DBSURVEYOR_MAX_CONNECTIONS      | Maximum connection pool size (default: 10)
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2)
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30)
DBSURVEYOR_IDLE_TIMEOUT_SECS    | Idle connection timeout in seconds (default: 600)
DBSURVEYOR_MAX_LIFETIME_SECS    | Maximum connection lifetime in seconds (default: 3600)

dbsurveyor

Database schema documentation and analysis tool.

Synopsis

dbsurveyor [OPTIONS] [INPUT_FILE]
dbsurveyor <COMMAND>

Global Options

Option     | Short | Description
--verbose  | -v    | Increase verbosity (-v, -vv, -vvv)
--quiet    | -q    | Suppress all output except errors
--help     | -h    | Print help information
--version  | -V    | Print version information

Documentation Options

Option               | Short | Description                | Default
--format <FORMAT>    | -f    | Output format              | markdown
--output <PATH>      | -o    | Output file path           | Auto-detected
--redact-mode <MODE> |       | Data redaction level       | balanced
--no-redact          |       | Disable all data redaction | false

Output Formats

Format   | Description             | Extension | Status
markdown | Markdown documentation  | .md       | ✅ Implemented
json     | JSON analysis report    | .json     | ✅ Implemented
html     | HTML report with search | .html     | 🚧 Placeholder
mermaid  | Mermaid ERD diagram     | .mmd      | 🚧 Placeholder

Redaction Modes

Mode         | Description
none         | No redaction (show all data)
minimal      | Minimal redaction (only obvious sensitive fields)
balanced     | Balanced redaction (recommended default)
conservative | Conservative redaction (maximum privacy)

Commands

generate

Generate documentation from schema file.

dbsurveyor generate <INPUT_FILE> [OPTIONS]

Options:

  • --format <FORMAT> - Output format
  • --output <PATH> - Output file path

analyze

Analyze schema for insights and statistics.

dbsurveyor analyze <INPUT_FILE> [--detailed]

Options:

  • --detailed - Show detailed analysis statistics

sql

Reconstruct SQL DDL from schema.

dbsurveyor sql <INPUT_FILE> [OPTIONS]

Options:

  • --dialect <DIALECT> - Target SQL dialect (default: postgresql)
  • --output <PATH> - Output file path

SQL Dialects:

  • postgresql - PostgreSQL dialect (🚧 Placeholder)
  • mysql - MySQL dialect (🚧 Placeholder)
  • sqlite - SQLite dialect (🚧 Placeholder)
  • sqlserver - SQL Server dialect (🚧 Placeholder)
  • generic - Generic SQL (ANSI standard) (🚧 Placeholder)

validate

Validate schema file format.

dbsurveyor validate <INPUT_FILE>

Examples

# Generate Markdown documentation
dbsurveyor generate schema.dbsurveyor.json

# Generate HTML report
dbsurveyor --format html --output report.html schema.json

# Process encrypted schema (will prompt for password)
dbsurveyor generate schema.enc

# Generate SQL DDL for MySQL
dbsurveyor sql schema.json --dialect mysql --output recreate.sql

# Analyze schema with detailed statistics
dbsurveyor analyze schema.json --detailed

# Validate schema file format
dbsurveyor validate schema.dbsurveyor.json

# Generate with conservative redaction
dbsurveyor --redact-mode conservative schema.json

# Generate without any redaction
dbsurveyor --no-redact schema.json

Input File Formats

DBSurveyor automatically detects input file formats:

Extension | Format     | Description
.json     | JSON       | Standard schema format
.zst      | Compressed | Zstandard compressed JSON
.enc      | Encrypted  | AES-GCM encrypted JSON
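
Extension-based detection can be sketched in a few lines of Python (illustrative; the real tool may also inspect file contents):

```python
from pathlib import Path

# Map the documented file extensions to their formats.
FORMATS = {".json": "json", ".zst": "compressed", ".enc": "encrypted"}

def detect_format(path: str) -> str:
    suffix = Path(path).suffix  # only the final extension matters
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unrecognized schema file extension: {suffix}")

print(detect_format("schema.dbsurveyor.json.zst"))
```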

Exit Codes

Code | Description
0    | Success
1    | General error

Note: Both binaries currently use exit code 1 for all error conditions. Granular exit codes (e.g., file-not-found, connection failure) may be added in a future release.

Environment Variables

Variable                        | Description
RUST_LOG                        | Logging configuration
NO_COLOR                        | Disable colored output
DBSURVEYOR_MAX_CONNECTIONS      | Maximum connection pool size (default: 10)
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2)
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30)
DBSURVEYOR_IDLE_TIMEOUT_SECS    | Idle connection timeout in seconds (default: 600)
DBSURVEYOR_MAX_LIFETIME_SECS    | Maximum connection lifetime in seconds (default: 3600)

Common Usage Patterns

Secure Workflow

# 1. Test connection
dbsurveyor-collect test postgres://user:pass@localhost/db

# 2. Collect with encryption
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db

# 3. Generate documentation offline
dbsurveyor generate schema.enc

# 4. Validate output
dbsurveyor validate schema.enc

Multi-Database Documentation

# Collect from multiple databases
dbsurveyor-collect --all-databases postgres://localhost > collection.log

# Generate comprehensive report
dbsurveyor --format html --output full-report.html schema.dbsurveyor.json

# Extract SQL for specific database
dbsurveyor sql schema.json --dialect postgresql --output postgres-ddl.sql

Development Workflow

# Quick collection and documentation
dbsurveyor-collect sqlite://dev.db && dbsurveyor generate schema.dbsurveyor.json

# Analyze changes
dbsurveyor analyze schema.dbsurveyor.json --detailed

# Generate multiple formats
for format in markdown html mermaid; do
    dbsurveyor --format $format schema.dbsurveyor.json
done

Output Formats

DBSurveyor generates structured output in multiple formats to support different use cases and workflows.

Schema File Formats

Standard JSON Format (.dbsurveyor.json)

The primary output format is a JSON file following the DBSurveyor Schema v1.0 specification:

{
  "format_version": "1.0",
  "database_info": {
    "name": "production_db",
    "version": "13.7",
    "size_bytes": 1073741824,
    "encoding": "UTF8",
    "collation": "en_US.UTF-8",
    "access_level": "Full",
    "collection_status": "Success"
  },
  "tables": [
    {
      "name": "users",
      "schema": "public",
      "columns": [
        {
          "name": "id",
          "data_type": {
            "Integer": {
              "bits": 32,
              "signed": true
            }
          },
          "is_nullable": false,
          "is_primary_key": true,
          "is_auto_increment": true,
          "ordinal_position": 1
        }
      ],
      "primary_key": {
        "name": "users_pkey",
        "columns": [
          "id"
        ]
      },
      "foreign_keys": [],
      "indexes": [],
      "constraints": []
    }
  ],
  "views": [],
  "indexes": [],
  "constraints": [],
  "procedures": [],
  "functions": [],
  "triggers": [],
  "custom_types": [],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}

Table Sampling Data

Tables may optionally include a sampling field containing sampled row data with metadata:

{
  "table_name": "users",
  "schema_name": "public",
  "rows": [
    {
      "id": 1,
      "username": "alice",
      "email": "alice@example.com"
    },
    {
      "id": 2,
      "username": "bob",
      "email": "bob@example.com"
    }
  ],
  "sample_size": 2,
  "total_rows": 1234,
  "sampling_strategy": {
    "MostRecent": {
      "limit": 10
    }
  },
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}

Sample Status Field

The optional sample_status field indicates the outcome of the sampling operation:

  • "Complete": Sampling completed successfully with no issues
  • {"PartialRetry": {"original_limit": 10}}: Sampling partially completed with a reduced row limit (may be retried)
  • {"Skipped": {"reason": "table too large"}}: Sampling was skipped or not performed

Backward Compatibility: The sample_status field is optional and omitted when not set, ensuring older JSON files without this field remain valid.
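
Consumers can branch on the three documented variants while tolerating the field's absence. A Python sketch (hypothetical helper, not part of DBSurveyor):

```python
def describe_sample_status(sampling: dict) -> str:
    """Interpret the optional sample_status field from a sampling record."""
    status = sampling.get("sample_status")
    if status is None:
        return "not recorded (older file)"
    if status == "Complete":
        return "complete"
    if "PartialRetry" in status:
        return f"partial (original limit {status['PartialRetry']['original_limit']})"
    if "Skipped" in status:
        return f"skipped: {status['Skipped']['reason']}"
    return "unknown"

print(describe_sample_status({"sample_status": {"Skipped": {"reason": "table too large"}}}))
```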

Compressed Format (.dbsurveyor.json.zst)

Large schema files can be compressed using Zstandard compression:

# Generate compressed output
dbsurveyor-collect --compress postgres://localhost/large_db

# Postprocessor automatically detects and decompresses
dbsurveyor generate schema.dbsurveyor.json.zst

Benefits:

  • 60-80% size reduction for large schemas
  • Fast compression/decompression
  • Maintains full data integrity

Encrypted Format (.enc)

Sensitive schema data can be encrypted using AES-GCM-256:

# Generate encrypted output (prompts for password)
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Postprocessor prompts for decryption password
dbsurveyor generate schema.enc

Security Features:

  • AES-GCM-256 authenticated encryption
  • Argon2id key derivation (64 MiB memory, 3 iterations)
  • Random 96-bit nonces (never reused)
  • Embedded KDF parameters for decryption

Encrypted File Structure:

{
  "algorithm": "AES-GCM-256",
  "nonce": [/* 12 bytes */],
  "ciphertext": [/* encrypted data */],
  "auth_tag": [/* 16 bytes */],
  "kdf_params": {
    "salt": [/* 16 bytes */],
    "memory_cost": 65536,
    "time_cost": 3,
    "parallelism": 4,
    "version": "1.3"
  }
}
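
The documented field sizes (12-byte nonce, 16-byte tag, 16-byte salt) allow a structural sanity check before attempting decryption. A Python sketch (layout validation only, not cryptographic verification):

```python
def validate_encrypted_file(doc: dict) -> list[str]:
    """Check the encrypted container's structure against documented sizes."""
    problems = []
    if doc.get("algorithm") != "AES-GCM-256":
        problems.append("unexpected algorithm")
    if len(doc.get("nonce", [])) != 12:
        problems.append("nonce must be 12 bytes (96 bits)")
    if len(doc.get("auth_tag", [])) != 16:
        problems.append("auth tag must be 16 bytes")
    if len(doc.get("kdf_params", {}).get("salt", [])) != 16:
        problems.append("salt must be 16 bytes")
    return problems

sample = {
    "algorithm": "AES-GCM-256",
    "nonce": [0] * 12,
    "ciphertext": [1, 2, 3],
    "auth_tag": [0] * 16,
    "kdf_params": {"salt": [0] * 16, "memory_cost": 65536, "time_cost": 3, "parallelism": 4},
}
print(validate_encrypted_file(sample))
```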

Documentation Formats

Markdown Documentation (.md)

Status: βœ… Implemented

Basic Markdown documentation with schema overview:

dbsurveyor generate schema.json --format markdown

Output Example:

# Database Schema: production_db

Generated by DBSurveyor v1.0.0
Collection Date: 2024-01-15 10:30:00 UTC

## Summary

- **Tables**: 25
- **Views**: 3
- **Indexes**: 47

## Tables

### users
- **Schema**: public
- **Columns**: 5
- **Primary Key**: id

JSON Analysis Report (.json)

Status: βœ… Implemented

Structured analysis data for programmatic consumption:

dbsurveyor analyze schema.json --detailed

Output Example:

{
  "database_name": "production_db",
  "table_count": 25,
  "view_count": 3,
  "index_count": 47,
  "constraint_count": 32,
  "collection_date": "2024-01-15T10:30:00Z",
  "analysis": {
    "largest_table": "audit_log",
    "most_indexes": "users",
    "foreign_key_relationships": 18
  }
}

HTML Report (.html)

Status: 🚧 Placeholder Implementation

Interactive HTML reports with search and navigation:

dbsurveyor generate schema.json --format html

Planned Features:

  • Interactive table browser
  • Search functionality
  • Relationship visualization
  • Responsive design for mobile/desktop
  • Offline-compatible (no external dependencies)

Mermaid ERD (.mmd)

Status: 🚧 Placeholder Implementation

Entity Relationship Diagrams using Mermaid syntax:

dbsurveyor generate schema.json --format mermaid

Planned Output:

erDiagram
    users {
        int id PK
        string username
        string email UK
        datetime created_at
    }

    orders {
        int id PK
        int user_id FK
        decimal total
        datetime created_at
    }

    users ||--o{ orders : "has many"

SQL DDL (.sql)

Status: 🚧 Placeholder Implementation

Reconstructed SQL DDL for schema recreation:

dbsurveyor sql schema.json --dialect postgresql --output recreate.sql

Planned Features:

  • Multiple SQL dialect support (PostgreSQL, MySQL, SQLite, SQL Server)
  • Complete schema recreation scripts
  • Data type mapping between databases
  • Constraint and index recreation
  • Migration script generation

Unified Data Type System

DBSurveyor uses a unified type system to represent data types across different databases:

Basic Types

{
  "String": {
    "max_length": 255
  },
  "Integer": {
    "bits": 32,
    "signed": true
  },
  "Float": {
    "precision": 53
  },
  "Boolean": null,
  "Date": null,
  "DateTime": {
    "with_timezone": true
  },
  "Time": {
    "with_timezone": false
  }
}

Advanced Types

{
  "Binary": {
    "max_length": 1024
  },
  "Json": null,
  "Uuid": null,
  "Array": {
    "element_type": {
      "String": {
        "max_length": 50
      }
    }
  },
  "Custom": {
    "type_name": "user_status_enum"
  }
}

Database-Specific Mapping

DBSurveyor Type | PostgreSQL      | MySQL         | SQLite  | MongoDB
String          | VARCHAR, TEXT   | VARCHAR, TEXT | TEXT    | string
Integer         | INTEGER, BIGINT | INT, BIGINT   | INTEGER | int, long
Float           | REAL, DOUBLE    | FLOAT, DOUBLE | REAL    | double
Boolean         | BOOLEAN         | BOOLEAN       | INTEGER | bool
Json            | JSON, JSONB     | JSON          | TEXT    | object
Array           | ARRAY[]         | JSON          | TEXT    | array
Custom          | ENUM, DOMAIN    | ENUM          | N/A     | N/A
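
A unified type is encoded as a single-key object, {"TypeName": params-or-null}, so mapping it to a concrete dialect is a lookup plus parameter handling. A simplified Python sketch for the PostgreSQL column (hypothetical helper; the real mapper handles many more cases):

```python
# Partial unified-type -> PostgreSQL mapping, following the table above.
PG_TYPES = {
    "String": "TEXT",
    "Integer": "INTEGER",
    "Float": "DOUBLE PRECISION",
    "Boolean": "BOOLEAN",
    "Json": "JSONB",
}

def to_postgres(unified: dict) -> str:
    # Unpack the single {"TypeName": params} entry.
    (name, params), = unified.items()
    if name == "Integer" and params and params.get("bits") == 64:
        return "BIGINT"
    if name == "String" and params and params.get("max_length"):
        return f"VARCHAR({params['max_length']})"
    return PG_TYPES.get(name, "TEXT")

print(to_postgres({"String": {"max_length": 255}}))
```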

Schema Validation

All output files are validated against the DBSurveyor JSON Schema v1.0:

Validation Features

  • Format Version Checking: Ensures compatibility
  • Required Field Validation: All mandatory fields present
  • Data Type Validation: Correct type structure
  • Security Validation: No credential fields allowed
  • Relationship Validation: Foreign key consistency

Validation Errors

# Validate a schema file
dbsurveyor validate schema.json

# Example validation error
Error: Schema validation failed
- Missing required field: collection_metadata.collected_at
- Invalid data type: expected Integer, found String
- Security violation: credential field detected in column default_value

File Size Considerations

Compression Recommendations

Schema Size | Recommendation         | Expected Compression
< 1 MB      | Standard JSON          | N/A
1-10 MB     | Zstandard compression  | 60-70% reduction
> 10 MB     | Compression + chunking | 70-80% reduction
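
The size thresholds above translate directly into a selection rule; a small Python sketch (illustrative, not a DBSurveyor API):

```python
MB = 1024 * 1024

def compression_recommendation(size_bytes: int) -> str:
    """Pick a handling strategy per the size table above."""
    if size_bytes < 1 * MB:
        return "standard JSON"
    if size_bytes <= 10 * MB:
        return "zstd compression"
    return "compression + chunking"

print(compression_recommendation(50 * MB))
```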

Large Schema Handling

For very large databases (1000+ tables):

# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Consider sampling for development
dbsurveyor-collect --sample 10 postgres://localhost/large_db

Security Considerations

Credential Protection

All output formats guarantee:

  • No database credentials in any output file
  • Connection strings sanitized in metadata
  • Error messages sanitized
  • Validation prevents credential leakage

Encryption Best Practices

# Use strong passwords for encryption
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Verify encrypted file integrity
dbsurveyor validate schema.enc

Airgap Compatibility

All formats work completely offline:

  • No external dependencies in generated files
  • Self-contained documentation
  • Local-only processing
  • No network calls during generation

Format Evolution

Version Compatibility

  • v1.0: Current format with full backward compatibility
  • Future versions: Additive changes only
  • Migration tools: Automatic format upgrades
  • Deprecation policy: 2-version support window

Schema Extensions

The format supports extensions for:

  • Database-specific features
  • Custom metadata
  • Plugin-generated data
  • Performance metrics

Integration Examples

CI/CD Pipeline

# Generate schema documentation in CI
  - name: Collect Database Schema
    run: |
      dbsurveyor-collect --compress $DATABASE_URL
      dbsurveyor generate schema.dbsurveyor.json.zst --format markdown

  - name: Upload Documentation
    uses: actions/upload-artifact@v3
    with:
      name: database-docs
      path: schema.md

Programmatic Access

import json

# Load schema data
with open("schema.dbsurveyor.json") as f:
    schema = json.load(f)

# Analyze tables
for table in schema["tables"]:
    print(f"Table: {table['name']}")
    print(f"Columns: {len(table['columns'])}")
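
Foreign-key metadata in the same file can be inverted into a "referenced by" index, which is useful for impact analysis. A sketch with illustrative inline data (the schema dict mirrors the documented table structure):

```python
from collections import defaultdict

# Illustrative schema fragment following the documented structure.
schema = {
    "tables": [
        {"name": "users", "foreign_keys": []},
        {"name": "orders", "foreign_keys": [
            {"name": "orders_user_fk", "columns": ["user_id"],
             "referenced_table": "users", "referenced_columns": ["id"]},
        ]},
    ]
}

# Map each table to the tables whose foreign keys point at it.
referenced_by = defaultdict(list)
for table in schema["tables"]:
    for fk in table.get("foreign_keys", []):
        referenced_by[fk["referenced_table"]].append(table["name"])

print(dict(referenced_by))  # {'users': ['orders']}
```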

Documentation Automation

#!/bin/bash
# Automated documentation generation

# Collect from multiple environments
dbsurveyor-collect --output prod.json $PROD_DB_URL
dbsurveyor-collect --output staging.json $STAGING_DB_URL

# Generate documentation
dbsurveyor generate prod.json --format html --output prod-schema.html
dbsurveyor generate staging.json --format html --output staging-schema.html

# Compare schemas (future feature)
# dbsurveyor compare prod.json staging.json --output diff.html

This comprehensive output format system ensures that DBSurveyor can adapt to various workflows while maintaining security and compatibility guarantees.

JSON Schema Specification

The .dbsurveyor.json format is the standard output format for DBSurveyor schema collection. This specification documents the complete structure, validation rules, and usage examples for the schema format.

Overview

The .dbsurveyor.json format provides a comprehensive, validated representation of database schemas with the following characteristics:

  • Security-First: No credential fields or sensitive data allowed
  • Validation-Complete: Full JSON Schema validation ensures data integrity
  • Database-Agnostic: Unified representation across PostgreSQL, MySQL, SQLite, and MongoDB
  • Version-Aware: Format versioning for backward compatibility
  • Frictionless-Compatible: Based on Frictionless Data Table Schema specification

Schema Structure

Root Object

Every .dbsurveyor.json file contains these required fields:

{
  "format_version": "1.0",
  "database_info": { /* Database metadata */ },
  "tables": [ /* Table definitions */ ],
  "collection_metadata": { /* Collection process info */ }
}

Required Fields

Field | Type | Description
format_version | String | Schema format version (currently "1.0")
database_info | Object | Database-level information and status
collection_metadata | Object | Collection process metadata
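
A loader can verify these required root fields up front before doing any deeper processing. A minimal sketch (the helper is illustrative, not part of DBSurveyor):

```python
import json

REQUIRED_ROOT_FIELDS = ("format_version", "database_info", "collection_metadata")

def missing_fields(document: dict) -> list:
    """Return the required root fields absent from a schema document."""
    return [field for field in REQUIRED_ROOT_FIELDS if field not in document]

doc = json.loads('{"format_version": "1.0", "database_info": {}}')
print(missing_fields(doc))  # ['collection_metadata']
```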

Optional Fields

Field | Type | Description
tables | Array | Table definitions (default: empty array)
views | Array | View definitions
indexes | Array | Database indexes
constraints | Array | Database constraints
procedures | Array | Stored procedures
functions | Array | Database functions
triggers | Array | Database triggers
custom_types | Array | Custom data types
samples | Array | Data samples from tables

Database Information

The database_info object contains essential database metadata:

{
  "name": "production_db",
  "version": "13.7",
  "size_bytes": 1073741824,
  "encoding": "UTF8",
  "collation": "en_US.UTF-8",
  "owner": "dbadmin",
  "is_system_database": false,
  "access_level": "Full",
  "collection_status": "Success"
}

Access Levels

  • Full: Complete schema access with all metadata
  • Limited: Partial access due to permission constraints
  • None: No access to schema information

Collection Status

  • "Success": Schema collected successfully
  • {"Failed": {"error": "Permission denied"}}: Collection failed with reason
  • {"Skipped": {"reason": "System database"}}: Database skipped with explanation
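
Because the status serializes either as a bare string or as a single-key object, readers must handle both shapes. A Python sketch (the function and its output format are ours):

```python
def describe_status(status) -> str:
    """Normalize a collection_status value to a readable string.

    The value is either a bare string ("Success") or a single-key
    object whose payload carries an error or reason.
    """
    if isinstance(status, str):
        return status
    ((variant, details),) = status.items()
    detail = details.get("error") or details.get("reason") or ""
    return f"{variant}: {detail}" if detail else variant

print(describe_status({"Failed": {"error": "Permission denied"}}))
# Failed: Permission denied
```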

Table Structure

Tables are defined with comprehensive metadata:

{
  "name": "users",
  "schema": "public",
  "columns": [
    {
      "name": "id",
      "data_type": {
        "Integer": {
          "bits": 32,
          "signed": true
        }
      },
      "is_nullable": false,
      "is_primary_key": true,
      "is_auto_increment": true,
      "ordinal_position": 1,
      "comment": "Unique user identifier"
    },
    {
      "name": "email",
      "data_type": {
        "String": {
          "max_length": 255
        }
      },
      "is_nullable": false,
      "ordinal_position": 2,
      "comment": "User email address"
    }
  ],
  "primary_key": {
    "name": "users_pkey",
    "columns": [
      "id"
    ]
  },
  "foreign_keys": [
    {
      "name": "users_profile_fk",
      "columns": [
        "profile_id"
      ],
      "referenced_table": "profiles",
      "referenced_columns": [
        "id"
      ],
      "on_delete": "Cascade",
      "on_update": "Cascade"
    }
  ],
  "indexes": [
    {
      "name": "users_email_idx",
      "columns": [
        {
          "name": "email",
          "order": "asc"
        }
      ],
      "is_unique": true,
      "index_type": "btree"
    }
  ],
  "comment": "User account information",
  "row_count": 50000
}

Data Type System

DBSurveyor uses a unified data type system that maps database-specific types to a common representation:

Basic Types

"Boolean"                    // Boolean values
"Date"                       // Date without time
"Json"                       // JSON data
"Uuid"                       // UUID/GUID values

String Types

{
  "String": {
    "max_length": 255        // Maximum character length
  }
}

Numeric Types

{
  "Integer": {
    "bits": 32,              // Bit size: 8, 16, 32, 64, 128
    "signed": true           // Signed or unsigned
  }
}

{
  "Float": {
    "precision": 53          // Floating point precision (1-53)
  }
}

Date/Time Types

{
  "DateTime": {
    "with_timezone": true    // Includes timezone information
  }
}

{
  "Time": {
    "with_timezone": false   // Time without timezone
  }
}

Complex Types

{
  "Array": {
    "element_type": "String" // Array element type
  }
}

{
  "Binary": {
    "max_length": 1024       // Maximum binary length
  }
}

{
  "Custom": {
    "type_name": "geometry"  // Database-specific custom type
  }
}
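
When generating reports, these tagged type objects usually need short readable names. A Python sketch covering a few representative variants (the rendered spellings are our own convention, not part of the format):

```python
def render_type(data_type) -> str:
    """Render a unified data_type value as a short type name.

    Basic types serialize as bare strings; parameterized types as a
    single-key object, e.g. {"Integer": {"bits": 32, "signed": true}}.
    """
    if isinstance(data_type, str):
        return data_type.lower()
    ((name, params),) = data_type.items()
    if name == "Integer":
        prefix = "int" if params.get("signed", True) else "uint"
        return f"{prefix}{params['bits']}"
    if name == "String":
        return f"varchar({params['max_length']})"
    if name == "Custom":
        return params["type_name"]
    return name.lower()  # fallback: DateTime, Array, Binary, ...

print(render_type({"Integer": {"bits": 32, "signed": True}}))  # int32
```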

Constraints and Relationships

Primary Keys

{
  "name": "users_pkey",
  "columns": [
    "id"
  ]
}

Foreign Keys

{
  "name": "orders_user_fk",
  "columns": [
    "user_id"
  ],
  "referenced_table": "users",
  "referenced_schema": "public",
  "referenced_columns": [
    "id"
  ],
  "on_delete": "Cascade",
  "on_update": "Restrict"
}

Referential Actions:

  • Cascade: Delete/update cascades to related records
  • SetNull: Set foreign key to NULL
  • SetDefault: Set foreign key to default value
  • Restrict: Prevent deletion/update if references exist
  • NoAction: No automatic action

Check Constraints

{
  "name": "users_age_check",
  "constraint_type": "Check",
  "definition": "age >= 0 AND age <= 150",
  "enforced": true
}

Indexes

Index definitions include performance characteristics:

{
  "name": "users_email_idx",
  "table_name": "users",
  "schema": "public",
  "columns": [
    {
      "name": "email",
      "order": "asc",
      "nulls_order": "last"
    }
  ],
  "is_unique": true,
  "is_primary": false,
  "index_type": "btree",
  "comment": "Unique index on email for fast lookups"
}

Index Types:

  • btree: Balanced tree (default)
  • hash: Hash-based index
  • gin: Generalized inverted index
  • gist: Generalized search tree
  • spgist: Space-partitioned GiST

Data Sampling

Optional data samples provide insight into actual data:

{
  "samples": [
    {
      "table_name": "users",
      "schema_name": "public",
      "rows": [
        {
          "id": 1001,
          "email": "user1001@example.com",
          "created_at": "2024-01-15T09:00:00Z"
        },
        {
          "id": 1002,
          "email": "user1002@example.com",
          "created_at": "2024-01-15T09:15:00Z"
        }
      ],
      "sample_size": 2,
      "total_rows": 50000,
      "sampling_strategy": {
        "MostRecent": {
          "limit": 10
        }
      },
      "collected_at": "2024-01-15T10:30:00Z",
      "warnings": [
        "Large table - limited sample collected"
      ],
      "sample_status": "Complete"
    }
  ]
}

Table Sample Fields

Field | Type | Required | Description
table_name | String | Yes | Name of the sampled table
schema_name | String | No | Schema/database name (null for databases without schemas)
rows | Array | Yes | Array of sampled row data as JSON objects
sample_size | Integer | Yes | Number of rows actually sampled
total_rows | Integer | No | Estimated total row count in the table
sampling_strategy | Object/String | Yes | Strategy used for sampling (see below)
collected_at | String | Yes | ISO 8601 timestamp of when the sample was collected
warnings | Array | Yes | Array of warning messages (empty if no warnings)
sample_status | String/Object | No | Status of the sampling operation (see below)

Sampling Strategies

  • {"MostRecent": {"limit": 10}}: Latest N rows
  • {"Random": {"limit": 100}}: Random sample of N rows
  • "None": No sampling performed

Sample Status

The optional sample_status field tracks the outcome of the sampling operation. This field is backward-compatible and will be omitted when not set.

Status Values:

  • "Complete": Sampling completed successfully
  • {"PartialRetry": {"original_limit": 100}}: Sampling partially completed with a reduced limit due to constraints or errors
  • {"Skipped": {"reason": "Not implemented for this database"}}: Sampling was skipped with an explanation

Example with Complete Status:

{
  "table_name": "orders",
  "schema_name": "public",
  "rows": [...],
  "sample_size": 100,
  "sampling_strategy": {"MostRecent": {"limit": 100}},
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}

Example with Skipped Status:

{
  "table_name": "large_archive",
  "schema_name": null,
  "rows": [],
  "sample_size": 0,
  "sampling_strategy": "None",
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": {
    "Skipped": {
      "reason": "Table exceeds size threshold"
    }
  }
}

Backward Compatibility Note: The sample_status field is optional and omitted when not explicitly set. Older schema files without this field remain fully compatible.
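
In code, the compatibility note above amounts to treating an absent sample_status as "no outcome recorded". A Python sketch (helper name and return values are ours):

```python
def sample_outcome(sample: dict) -> str:
    """Classify a table sample, tolerating files written before the
    sample_status field existed (the field is simply absent there)."""
    status = sample.get("sample_status")
    if status is None:
        return "unknown"  # older file: no outcome was recorded
    if isinstance(status, str):
        return status.lower()  # "Complete" -> "complete"
    ((variant, _details),) = status.items()
    return variant.lower()  # "Skipped", "PartialRetry", ...
```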

Multi-Database Collections

For server-level collections, the format supports multiple databases:

{
  "format_version": "1.0",
  "server_info": {
    "server_type": "PostgreSQL",
    "version": "13.7",
    "host": "localhost",
    "port": 5432,
    "total_databases": 5,
    "collected_databases": 3,
    "system_databases_excluded": 2,
    "connection_user": "dbadmin",
    "has_superuser_privileges": true,
    "collection_mode": {
      "MultiDatabase": {
        "discovered": 5,
        "collected": 3,
        "failed": 0
      }
    }
  },
  "databases": [
    /* Individual database schemas */
  ],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 2500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}
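
A server-level file can be summarized per database by inspecting each entry's collection_status. A sketch with illustrative inline data following the structure above:

```python
# Illustrative server-level document (abbreviated).
server = {
    "server_info": {"server_type": "PostgreSQL", "total_databases": 5},
    "databases": [
        {"database_info": {"name": "app",
                           "collection_status": "Success"}},
        {"database_info": {"name": "legacy",
                           "collection_status": {"Failed": {"error": "Permission denied"}}}},
    ],
}

# Names of databases whose schemas were collected successfully.
succeeded = [
    db["database_info"]["name"]
    for db in server["databases"]
    if db["database_info"]["collection_status"] == "Success"
]
print(succeeded)  # ['app']
```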

Collection Metadata

Every schema file includes metadata about the collection process:

{
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": [
      "Large table 'audit_logs' - collection took 45 seconds",
      "Custom type 'geometry' not fully supported"
    ],
    "collector_options": {
      "include_system_tables": false,
      "sample_data": true,
      "max_sample_size": 1000
    }
  }
}

Validation Rules

Security Validation

The schema enforces strict security rules:

  • No credential fields: Field names cannot contain password, secret, token, etc.
  • No connection strings: Database URLs are automatically filtered
  • No sensitive patterns: Common credential patterns are rejected
  • Sanitized output: All error messages are credential-free
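
An external check can mirror the credential-field rule when post-processing schema files. A Python sketch (the marker list is illustrative and intentionally incomplete; DBSurveyor's built-in validation is authoritative):

```python
CREDENTIAL_MARKERS = ("password", "secret", "token", "apikey")

def find_credential_fields(value, path="$"):
    """Recursively flag field names matching common credential patterns.

    Returns JSONPath-like locations of offending keys anywhere in the
    document, including inside nested objects and arrays.
    """
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            if any(marker in key.lower() for marker in CREDENTIAL_MARKERS):
                hits.append(f"{path}.{key}")
            hits.extend(find_credential_fields(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            hits.extend(find_credential_fields(child, f"{path}[{i}]"))
    return hits

print(find_credential_fields({"tables": [{"name": "t", "api_token": "x"}]}))
# ['$.tables[0].api_token']
```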

Data Validation

  • String lengths: Maximum 255 characters for names, 1000 for comments
  • Array limits: Maximum 1000 items in arrays
  • Nesting depth: Maximum 10 levels of object nesting
  • File size: Maximum 100MB per schema file

Type Validation

  • Required fields: All mandatory fields must be present
  • Data type consistency: Values must match declared types
  • Constraint validation: Foreign keys must reference valid tables
  • Index validation: Index columns must exist in referenced table

Usage Examples

Basic Schema Collection

# Collect PostgreSQL schema
dbsurveyor-collect postgres://user:pass@localhost/mydb --output schema.json

# Validate the output
dbsurveyor schema schema.json --validate

# Generate documentation
dbsurveyor schema schema.json --format markdown --output schema.md

Multi-Database Collection

# Collect all databases on server
dbsurveyor-collect postgres://admin:pass@localhost --all-databases --output server_schema.json

# Process specific database
dbsurveyor schema server_schema.json --database app_db --format json --output app_schema.json

Schema Validation

# Validate against JSON Schema
dbsurveyor validate schema.json

# Check for specific issues
dbsurveyor validate schema.json --check-security --check-constraints

Error Handling

Validation Errors

When validation fails, DBSurveyor provides detailed error information:

{
  "validation_errors": [
    {
      "path": "/tables/0/columns/1/data_type",
      "message": "Invalid data type: expected String, Integer, Boolean, Date, Json, Uuid, Array, Binary, Custom, or DateTime",
      "value": "VARCHAR",
      "suggestion": "Use {\"String\": {\"max_length\": 255}} instead"
    }
  ]
}

Collection Warnings

Warnings are included in the metadata for non-critical issues:

{
  "warnings": [
    "Large table 'audit_logs' (1.2M rows) - collection took 45 seconds",
    "Custom type 'geometry' not fully supported - using Custom type",
    "Table 'temp_users' appears to be temporary - may not persist"
  ]
}

Version Compatibility

Current Version: 1.0

  • Format: Stable and fully supported
  • Validation: Complete JSON Schema validation
  • Features: All documented features available
  • Backward Compatibility: N/A (first version)

Future Versions

The schema is designed for evolution:

  1. Additive Changes: New optional fields can be added
  2. Version Detection: Format version enables version-specific handling
  3. Migration Support: Tools will support upgrading between versions
  4. Deprecation Path: Old fields will be marked before removal

Integration

Documentation Tools

The schema format integrates with all DBSurveyor tools:

  • dbsurveyor-collect: Generates schema files
  • dbsurveyor: Processes and validates schemas
  • dbsurveyor-docs: Generates documentation from schemas
  • dbsurveyor-validate: Standalone validation tool

External Tools

The format is compatible with:

  • JSON Schema validators: jsonschema, ajv, etc.
  • Data analysis tools: pandas, jq, etc.
  • Documentation generators: Docusaurus, MkDocs, etc.
  • CI/CD pipelines: GitHub Actions, GitLab CI, etc.

Best Practices

Schema Collection

  1. Use descriptive names: Avoid generic names like "db1", "test"
  2. Include comments: Add meaningful descriptions for tables and columns
  3. Sample strategically: Use sampling for large tables to avoid huge files
  4. Validate early: Check schemas immediately after collection

Schema Storage

  1. Version control: Track schema changes in Git
  2. Backup regularly: Keep historical schema versions
  3. Compress large files: Use .zst compression for schemas >1MB
  4. Secure access: Limit access to production schemas

Schema Processing

  1. Validate inputs: Always validate before processing
  2. Handle errors gracefully: Check collection status before proceeding
  3. Monitor performance: Track collection times for optimization
  4. Document changes: Keep records of schema evolution

Troubleshooting

Common Issues

Collection Fails with Permission Error

# Check database user privileges
dbsurveyor-collect postgres://user:pass@localhost/db --test-connection

# Verify user has SELECT on information_schema
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO username;

Large Schema File Size

# Use compression
dbsurveyor-collect postgres://localhost/db --compress

# Limit data sampling
dbsurveyor-collect postgres://localhost/db --max-sample-size 100

Validation Errors

# Check schema format
dbsurveyor validate schema.json --verbose

# Fix common issues
dbsurveyor fix schema.json --output fixed_schema.json

Performance Optimization

  1. Connection pooling: Use connection pooling for large databases
  2. Parallel collection: Collect multiple databases simultaneously
  3. Selective sampling: Only sample essential tables
  4. Incremental updates: Collect only changed schemas

This specification provides a complete reference for the .dbsurveyor.json format, ensuring consistent, validated, and secure schema collection across all supported database types.

Database Support

DBSurveyor supports multiple database engines with comprehensive schema collection capabilities. This guide details the support level and specific features for each database type.

Supported Databases

Database | Status | Feature Flag | Default | Version Support
PostgreSQL | ✅ Full | postgresql | ✅ Yes | 9.6+
SQLite | ✅ Full | sqlite | ✅ Yes | 3.6+
MySQL | ⚠️ Partial | mysql | ❌ No | 5.7+, 8.0+
MongoDB | 🚧 Basic | mongodb | ❌ No | 4.0+
SQL Server | 🚧 Basic | mssql | ❌ No | 2017+

Legend:

  • ✅ Full: Complete feature support with comprehensive testing
  • ⚠️ Partial: Core features supported, some limitations
  • 🚧 Basic: Minimal support, under development

PostgreSQL Support

Status: ✅ Full Support (Default)
Feature Flag: postgresql
Driver: SQLx with Tokio runtime

Connection Examples

# Basic connection
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# With SSL
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

# Connection pooling
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5"

# Multiple schemas
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | Including partitioned tables
Views | ✅ Full | Regular and materialized views
Indexes | ✅ Full | All index types (B-tree, Hash, GiST, etc.)
Constraints | ✅ Full | PK, FK, Check, Unique, Exclusion
Functions | ✅ Full | SQL and PL/pgSQL functions
Procedures | ✅ Full | Stored procedures (PostgreSQL 11+)
Triggers | ✅ Full | Row and statement triggers
Types | ✅ Full | Custom types, domains, enums
Extensions | ✅ Full | Installed extensions
Schemas | ✅ Full | Multiple schema support

PostgreSQL-Specific Features

-- Advanced data types
CREATE TABLE example (
    id SERIAL PRIMARY KEY,
    data JSONB,
    tags TEXT[],
    coordinates POINT,
    search_vector TSVECTOR
);

-- Custom types
CREATE TYPE status_enum AS ENUM ('active', 'inactive', 'pending');
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+\.[^@]+$');

-- Advanced indexes
CREATE INDEX CONCURRENTLY idx_data_gin ON example USING GIN (data);
CREATE INDEX idx_search ON example USING GIN (search_vector);

Collection Output

{
  "database_info": {
    "name": "mydb",
    "version": "13.7",
    "encoding": "UTF8",
    "collation": "en_US.UTF-8"
  },
  "tables": [
    {
      "name": "example",
      "schema": "public",
      "columns": [
        {
          "name": "data",
          "data_type": {
            "Json": {}
          },
          "is_nullable": true
        }
      ]
    }
  ]
}

SQLite Support

Status: ✅ Full Support (Default)
Feature Flag: sqlite
Driver: SQLx with Tokio runtime

Connection Examples

# File path
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite

# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

# In-memory database (for testing)
dbsurveyor-collect "sqlite://:memory:"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | Including WITHOUT ROWID tables
Views | ✅ Full | Regular views
Indexes | ✅ Full | B-tree and partial indexes
Constraints | ⚠️ Partial | Limited constraint introspection
Triggers | ✅ Full | BEFORE, AFTER, INSTEAD OF
Virtual Tables | ✅ Full | FTS, R-Tree, etc.
Attached DBs | ✅ Full | Multiple attached databases

SQLite-Specific Features

-- WITHOUT ROWID tables
CREATE TABLE example (
    id TEXT PRIMARY KEY,
    data TEXT
) WITHOUT ROWID;

-- Virtual tables
CREATE VIRTUAL TABLE docs_fts USING fts5(title, content);

-- Partial indexes
CREATE INDEX idx_active_users ON users(name) WHERE active = 1;

-- JSON support (SQLite 3.38+)
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    data JSON
);

Limitations

  • Limited constraint introspection (SQLite stores constraints as DDL text)
  • No stored procedures or functions
  • No custom types (uses affinity system)
  • No schemas (single namespace per database file)

MySQL Support

Status: ⚠️ Partial Support
Feature Flag: mysql (not default)
Driver: SQLx with Tokio runtime

Connection Examples

# Basic connection
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

# With SSL
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"

# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | All storage engines
Views | ✅ Full | Regular views
Indexes | ✅ Full | Primary, Unique, Index, Fulltext
Constraints | ⚠️ Partial | PK, FK, Check (MySQL 8.0+)
Procedures | ✅ Full | Stored procedures
Functions | ✅ Full | User-defined functions
Triggers | ✅ Full | BEFORE, AFTER triggers
Events | ✅ Full | Scheduled events

MySQL-Specific Features

-- Storage engines
CREATE TABLE innodb_table (
    id INT PRIMARY KEY AUTO_INCREMENT,
    data TEXT
) ENGINE=InnoDB;

-- Partitioning
CREATE TABLE partitioned (
    id INT,
    created_date DATE
) PARTITION BY RANGE (YEAR(created_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

-- Full-text indexes
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    FULLTEXT(title, content)
);

Known Limitations

  • Check constraints only supported in MySQL 8.0+
  • Limited JSON introspection compared to PostgreSQL
  • Some storage engine specific features not captured

Security Advisory

⚠️ RUSTSEC-2023-0071: MySQL support uses the RSA crate, which has a known timing side-channel vulnerability. MySQL support is therefore disabled by default. Use PostgreSQL or SQLite for production environments.

MongoDB Support

Status: 🚧 Basic Support
Feature Flag: mongodb (not default)
Driver: Official MongoDB Rust driver

Connection Examples

# Basic connection
dbsurveyor-collect mongodb://user:password@localhost:27017/mydb

# With authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"

# Replica set
dbsurveyor-collect "mongodb://user:pass@host1,host2,host3/mydb?replicaSet=rs0"

Supported Objects

Object Type | Support | Notes
Collections | ✅ Full | Document collections
Indexes | ✅ Full | Single field, compound, text, geo
Schema Inference | ✅ Basic | Inferred from document sampling
GridFS | ⚠️ Partial | Basic GridFS collection detection
Views | 🚧 Planned | Aggregation pipeline views

MongoDB-Specific Features

// Schema inference from documents
{
    "_id": ObjectId("..."),
    "name": "string",
    "age": "number",
    "tags": ["array", "of", "strings"],
    "address": {
        "street": "string",
        "city": "string"
    }
}

// Index types
db.users.createIndex({
    "name": 1
}) // Single field
db.users.createIndex({
    "name": 1,
    "age": -1
}) // Compound
db.articles.createIndex({
    "title": "text"
}) // Text search
db.locations.createIndex({
    "coordinates": "2dsphere"
}) // Geospatial

Current Limitations

  • Schema inference is sampling-based (may miss rare fields)
  • No aggregation pipeline analysis
  • Limited sharding information
  • No user-defined functions

SQL Server Support

Status: 🚧 Basic Support
Feature Flag: mssql (not default)
Driver: Tiberius (native TDS protocol)

Connection Examples

# Basic connection
dbsurveyor-collect mssql://sa:password@localhost:1433/mydb

# Windows Authentication (planned)
dbsurveyor-collect "mssql://localhost/mydb?trusted_connection=yes"

# Named instance
dbsurveyor-collect "mssql://sa:pass@localhost\\SQLEXPRESS/mydb"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | User tables
Views | ✅ Full | Regular views
Indexes | ⚠️ Partial | Basic index information
Constraints | ⚠️ Partial | PK, FK constraints
Procedures | 🚧 Planned | Stored procedures
Functions | 🚧 Planned | User-defined functions
Triggers | 🚧 Planned | DML triggers

Current Limitations

  • Limited to basic table and view introspection
  • No stored procedure analysis yet
  • No advanced SQL Server features (CLR, XML, spatial)
  • Windows Authentication not yet supported

Feature Comparison Matrix

Feature | PostgreSQL | SQLite | MySQL | MongoDB | SQL Server
Tables | ✅ | ✅ | ✅ | ✅ | ✅
Views | ✅ | ✅ | ✅ | 🚧 | ✅
Indexes | ✅ | ✅ | ✅ | ✅ | ⚠️
Constraints | ✅ | ⚠️ | ⚠️ | ❌ | ⚠️
Procedures | ✅ | ❌ | ✅ | ❌ | 🚧
Functions | ✅ | ❌ | ✅ | ❌ | 🚧
Triggers | ✅ | ✅ | ✅ | ❌ | 🚧
Custom Types | ✅ | ❌ | ⚠️ | ❌ | 🚧
JSON Support | ✅ | ✅ | ⚠️ | ✅ | 🚧
Multi-DB | ✅ | ⚠️ | ✅ | ❌ | 🚧

Building with Database Support

Default Build

# Includes PostgreSQL and SQLite
cargo build --release

Custom Database Selection

# PostgreSQL only
cargo build --release --no-default-features --features postgresql

# All databases
cargo build --release --all-features

# Specific combination
cargo build --release --no-default-features --features postgresql,mysql,encryption

Feature Dependencies

[features]
postgresql = ["sqlx", "sqlx/postgres", "sqlx/runtime-tokio-rustls"]
mysql = ["sqlx", "sqlx/mysql", "sqlx/runtime-tokio-rustls"]
sqlite = ["sqlx", "sqlx/sqlite", "sqlx/runtime-tokio-rustls"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]

Database-Specific Best Practices

PostgreSQL

-- Use a read-only user
CREATE USER dbsurveyor_readonly;
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;

-- For multiple schemas
GRANT USAGE ON SCHEMA schema1, schema2 TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA schema1, schema2 TO dbsurveyor_readonly;

MySQL

# Create read-only user
CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_readonly'@'%';

SQLite

# Ensure read permissions
chmod 644 /path/to/database.db

# Use read-only mode for safety
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

MongoDB

// Create read-only user
use admin
db.createUser({
    user: "dbsurveyor_readonly",
    pwd: "password",
    roles: [{
            role: "read",
            db: "mydb"
        }, {
            role: "read",
            db: "config"
        } // For sharding info
    ]
})

Troubleshooting Database Connections

Connection Issues

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check network connectivity
telnet localhost 5432  # PostgreSQL
telnet localhost 3306  # MySQL
telnet localhost 27017 # MongoDB
telnet localhost 1433  # SQL Server

Permission Issues

# PostgreSQL: Check permissions
psql -h localhost -U user -d db -c "\dt"

# MySQL: Check permissions
mysql -h localhost -u user -p -e "SHOW TABLES;" db

# SQLite: Check file permissions
ls -la /path/to/database.db

Driver Issues

# Check compiled features
dbsurveyor-collect list

# Verify feature compilation
cargo build --features postgresql --verbose

Roadmap

Planned Improvements

PostgreSQL:

  • Advanced partitioning support
  • Extension-specific object types
  • Performance statistics collection

MySQL:

  • Enhanced JSON column support
  • Partition pruning analysis
  • Storage engine optimization hints

MongoDB:

  • Aggregation pipeline analysis
  • Sharding topology mapping
  • Index usage statistics

SQL Server:

  • Complete stored procedure support
  • CLR integration analysis
  • Spatial data type support

General:

  • Cross-database schema comparison
  • Migration script generation
  • Performance benchmarking integration

Contributing Database Support

See the Contributing Guide for information on adding support for new database engines or improving existing support.

Troubleshooting

This guide helps you diagnose and resolve common issues with DBSurveyor.

Quick Diagnostics

Check Installation

# Verify binaries are installed and working
dbsurveyor-collect --version
dbsurveyor --version

# Check compiled features
dbsurveyor-collect list

# Test with minimal example
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
dbsurveyor-collect sqlite://test.db
rm test.db schema.dbsurveyor.json

Enable Debug Logging

# Enable debug logging for all modules
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db

# Enable trace logging for specific modules
export RUST_LOG=dbsurveyor_collect=trace,dbsurveyor_core=debug
dbsurveyor-collect postgres://localhost/db

# Log to file
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db 2> debug.log

Connection Issues

Database Connection Failures

Symptoms: Connection timeouts, authentication failures, network errors

PostgreSQL Connection Issues

# Test basic connectivity
ping localhost
telnet localhost 5432

# Test with psql
psql -h localhost -U user -d db -c "SELECT 1;"

# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log

# Common connection string issues
# ❌ Wrong: postgres://user:pass@localhost/db:5432
# ✅ Correct: postgres://user:pass@localhost:5432/db

# SSL issues
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=disable"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

MySQL Connection Issues

# Test basic connectivity
telnet localhost 3306

# Test with mysql client
mysql -h localhost -u user -p -e "SELECT 1;" db

# Check MySQL logs
sudo tail -f /var/log/mysql/error.log

# Common issues
# Character set problems
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

# SSL issues
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=DISABLED"

SQLite Connection Issues

# Check file exists and is readable
ls -la /path/to/database.db
file /path/to/database.db

# Test with sqlite3 command
sqlite3 /path/to/database.db ".tables"

# Permission issues
chmod 644 /path/to/database.db

# Use absolute paths
dbsurveyor-collect "sqlite:///$(pwd)/database.db"

MongoDB Connection Issues

# Test basic connectivity
telnet localhost 27017

# Test with mongo client
mongosh "mongodb://user:pass@localhost:27017/db"

# Authentication issues
dbsurveyor-collect "mongodb://user:pass@localhost/db?authSource=admin"

# Replica set issues
dbsurveyor-collect "mongodb://user:pass@host1,host2/db?replicaSet=rs0"

Permission and Authentication Issues

Symptoms: Access denied, insufficient privileges, authentication failed

PostgreSQL Permissions

-- Check current user permissions
SELECT current_user, session_user;

-- Check database access
SELECT datname FROM pg_database WHERE datname = 'your_db';

-- Check table permissions
SELECT schemaname, tablename 
FROM pg_tables 
WHERE schemaname NOT IN ('information_schema', 'pg_catalog');

-- Grant necessary permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;

MySQL Permissions

-- Check current user
SELECT USER(), CURRENT_USER();

-- Check permissions
SHOW GRANTS FOR 'dbsurveyor_user'@'%';

-- Grant necessary permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_user'@'%';
FLUSH PRIVILEGES;

MongoDB Permissions

// Check current user
db.runCommand({
    connectionStatus: 1
})

// Check permissions
use mydb
db.runCommand({
    usersInfo: "dbsurveyor_user"
})

// Grant read permissions
use admin
db.grantRolesToUser("dbsurveyor_user", [{
    role: "read",
    db: "mydb"
}])

Collection Issues

Schema Collection Failures

Symptoms: Partial collection, missing objects, collection errors

Missing Tables or Objects

# Enable verbose logging to see what's being collected
dbsurveyor-collect -vvv postgres://localhost/db

# Check if objects exist in database
psql -h localhost -U user -d db -c "\dt"  # PostgreSQL tables
mysql -u user -p -e "SHOW TABLES;" db     # MySQL tables
sqlite3 db.sqlite ".tables"               # SQLite tables

Large Database Timeouts

# Increase timeouts
dbsurveyor-collect --connect-timeout 60 --query-timeout 120 postgres://localhost/db

# Use throttling to reduce load
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Disable sample collection for speed
dbsurveyor-collect --sample 0 postgres://localhost/db

Memory Issues

# Monitor memory usage
top -p $(pgrep dbsurveyor-collect)

# Use compression to reduce memory usage
dbsurveyor-collect --compress postgres://localhost/db

# Process smaller chunks (multi-database collection)
dbsurveyor-collect --exclude-databases large_db1,large_db2 postgres://localhost

Output File Issues

Symptoms: Empty files, corrupted output, permission errors

File Permission Issues

# Check output directory permissions
ls -la $(dirname schema.dbsurveyor.json)

# Ensure write permissions
chmod 755 $(dirname schema.dbsurveyor.json)

# Use explicit output path
dbsurveyor-collect --output /tmp/schema.json postgres://localhost/db

Corrupted Output Files

# Validate schema file
dbsurveyor validate schema.dbsurveyor.json

# Check file size and format
ls -la schema.dbsurveyor.json
file schema.dbsurveyor.json

# Test JSON parsing
jq . schema.dbsurveyor.json > /dev/null

Compression/Encryption Issues

# Test compression
dbsurveyor-collect --compress postgres://localhost/db
zstd -t schema.dbsurveyor.json.zst  # Test compressed file

# Test encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://localhost/db
dbsurveyor validate schema.enc  # Test encrypted file

Documentation Generation Issues

Processing Failures

Symptoms: Generation errors, empty output, format issues

Input File Issues

# Validate input file first
dbsurveyor validate schema.dbsurveyor.json

# Check file format
file schema.dbsurveyor.json

# Test with minimal processing
dbsurveyor --format json schema.dbsurveyor.json

Encrypted File Issues

# Verify encryption format
dbsurveyor validate schema.enc

# Test decryption
dbsurveyor --format json schema.enc

# Check password
# Note: Passwords are case-sensitive and don't show characters

Large Schema Processing

# Use JSON format for large schemas (most efficient)
dbsurveyor --format json large-schema.json

# Monitor memory usage
top -p $(pgrep dbsurveyor)

# Process in parts (planned feature)
# Currently: split large schemas manually

Output Format Issues

Symptoms: Malformed output, missing content, rendering problems

HTML Generation Issues

# Test with simple format first
dbsurveyor --format markdown schema.json

# Check HTML output
dbsurveyor --format html schema.json
# Open in browser to check for issues

Mermaid Diagram Issues

# Generate Mermaid format
dbsurveyor --format mermaid schema.json

# Validate Mermaid syntax
# Copy content to https://mermaid.live/ for testing

Output Formatting and Progress Indicators

Symptoms: Progress spinners or colored output causing problems, garbled output with escape codes, missing progress indicators

Disabling Colors and Progress Spinners

For CI/CD pipelines, automation scripts, or non-interactive environments where colored output or progress spinners may cause issues:

# Disable ANSI color codes and progress spinners
export NO_COLOR=1
dbsurveyor-collect postgres://localhost/db
dbsurveyor --format markdown schema.json

# Alternative: Use TERM=dumb
TERM=dumb dbsurveyor-collect postgres://localhost/db
TERM=dumb dbsurveyor --format json schema.json

# For CI environments (typically set automatically)
export NO_COLOR=1
export TERM=dumb
dbsurveyor-collect postgres://localhost/db > collection.log

Common scenarios:

  • CI/CD pipelines: Set NO_COLOR=1 to prevent ANSI escape codes in logs
  • Log files: Progress spinners are automatically hidden when output is redirected
  • Automated scripts: Use TERM=dumb for consistent non-interactive behavior
  • Windows systems: Some terminals may display escape codes incorrectly

Troubleshooting Progress Indicators

# Check if output is going to a TTY
# Progress spinners only show when stdout is a terminal
test -t 1 && echo "TTY detected" || echo "Not a TTY"

# Force spinner to be hidden (for debugging)
export NO_COLOR=1
dbsurveyor --format html schema.json

# Check environment variables
echo "NO_COLOR=${NO_COLOR:-not set}"
echo "TERM=${TERM:-not set}"

Note: The tool automatically detects when output is piped or redirected and hides progress indicators. Setting NO_COLOR or TERM=dumb ensures consistent behavior across all environments.
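The documented rules can be sketched as a small decision function. This is an illustrative sketch only, not the actual implementation (which also checks whether stdout is a TTY); the environment values are passed in as parameters for testability:

```rust
// Illustrative sketch: NO_COLOR set to any value, or TERM=dumb,
// disables colored output and progress spinners. Not DBSurveyor's
// actual implementation.
fn color_enabled(no_color: Option<&str>, term: Option<&str>) -> bool {
    if no_color.is_some() {
        return false; // NO_COLOR wins regardless of its value
    }
    if term == Some("dumb") {
        return false; // dumb terminals get plain output
    }
    true
}

fn main() {
    // NO_COLOR set: disabled even on a capable terminal
    assert!(!color_enabled(Some("1"), Some("xterm-256color")));
    // TERM=dumb: disabled
    assert!(!color_enabled(None, Some("dumb")));
    // Neither set: enabled
    assert!(color_enabled(None, Some("xterm-256color")));
}
```

Note that per the NO_COLOR convention, the variable disables color when it is set at all, even to an empty-looking value like `0`.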

Performance Issues

Slow Collection

Symptoms: Long collection times, high resource usage

Database Performance

# Use connection pooling
dbsurveyor-collect --max-connections 5 postgres://localhost/db

# Add throttling
dbsurveyor-collect --throttle 500 postgres://localhost/db

# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db

# Monitor database load
# PostgreSQL: SELECT * FROM pg_stat_activity;
# MySQL: SHOW PROCESSLIST;

Network Performance

# Test network latency
ping database-host

# Use local connections when possible
dbsurveyor-collect postgres://localhost/db  # Better than remote

# Consider compression for remote databases
dbsurveyor-collect --compress postgres://remote-host/db

Memory Usage

Symptoms: High memory consumption, out of memory errors

Memory Optimization

# Monitor memory usage
ps aux | grep dbsurveyor
top -p $(pgrep dbsurveyor)

# Use streaming for large datasets (automatic)
# Reduce sample size
dbsurveyor-collect --sample 10 postgres://localhost/db

# Use compression
dbsurveyor-collect --compress postgres://localhost/db

Security Issues

Credential Exposure

Symptoms: Passwords in logs, credential leakage

Verify Credential Sanitization

# Check logs for credentials
export RUST_LOG=debug
dbsurveyor-collect postgres://user:secret@localhost/db 2>&1 | grep -i secret
# Should return no results

# Test credential sanitization
dbsurveyor-collect test postgres://user:secret@localhost/db
# Should show: postgres://user:****@localhost/db
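As a sketch of what sanitization is expected to produce, masking the password portion of a connection URL can be done with plain string handling. The function below is illustrative only, not DBSurveyor's internal implementation:

```rust
// Illustrative sketch: replace the password in a database URL with
// "****". Not DBSurveyor's actual sanitizer.
fn mask_password(url: &str) -> String {
    if let Some(scheme_end) = url.find("://") {
        let rest = &url[scheme_end + 3..];
        // The last '@' separates credentials from the host
        if let Some(at) = rest.rfind('@') {
            let creds = &rest[..at];
            if let Some(colon) = creds.find(':') {
                let user = &creds[..colon];
                return format!("{}{}:****{}", &url[..scheme_end + 3], user, &rest[at..]);
            }
        }
    }
    // No credentials found: return the URL unchanged
    url.to_string()
}

fn main() {
    let masked = mask_password("postgres://user:secret@localhost/db");
    assert_eq!(masked, "postgres://user:****@localhost/db");
    assert!(!masked.contains("secret"));
}
```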

Secure Credential Handling

# Use environment variables
export DATABASE_URL="postgres://user:secret@localhost/db"
dbsurveyor-collect

# Avoid shell history
set +o history  # Disable history
dbsurveyor-collect postgres://user:secret@localhost/db
set -o history  # Re-enable history

Encryption Issues

Symptoms: Encryption failures, decryption errors

Test Encryption

# Test encryption roundtrip
sqlite3 test.db "CREATE TABLE t (id INTEGER PRIMARY KEY);"
dbsurveyor-collect --encrypt "sqlite://test.db"
dbsurveyor validate schema.enc
rm test.db schema.enc

Password Issues

# Ensure password consistency
# Passwords are case-sensitive
# No visual feedback during password entry (security feature)

# Test with simple password first
# Use ASCII characters only initially

Build and Installation Issues

Compilation Failures

Symptoms: Build errors, missing dependencies, feature conflicts

Rust Toolchain Issues

# Update Rust toolchain
rustup update

# Check Rust version (minimum 1.87)
rustc --version

# Clean and rebuild
cargo clean
cargo build --release

Feature Compilation Issues

# Check available features (defined in Cargo.toml)
grep -A 10 '\[features\]' Cargo.toml

# Build with specific features
cargo build --release --features postgresql,sqlite

# Debug feature compilation
cargo build --release --features postgresql --verbose

System Dependencies

# Ubuntu/Debian: Install system dependencies
sudo apt-get update
sudo apt-get install build-essential libssl-dev pkg-config

# For PostgreSQL support
sudo apt-get install libpq-dev

# For MySQL support
sudo apt-get install libmysqlclient-dev

# macOS: Install dependencies
brew install openssl pkg-config
# For PostgreSQL: brew install postgresql
# For MySQL: brew install mysql

Runtime Dependencies

Symptoms: Missing shared libraries, runtime errors

Check Dependencies

# Check binary dependencies
ldd target/release/dbsurveyor-collect  # Linux
otool -L target/release/dbsurveyor-collect  # macOS

# Test minimal functionality
dbsurveyor-collect --version
dbsurveyor-collect list

Getting Help

Collect Debug Information

# System information
uname -a
rustc --version
cargo --version

# DBSurveyor information
dbsurveyor-collect --version
dbsurveyor-collect list

# Feature compilation
cargo build --release --verbose 2>&1 | grep -i feature

# Runtime debug
export RUST_LOG=debug
dbsurveyor-collect test postgres://localhost/db 2> debug.log

Create Minimal Reproduction

# Create minimal test case
sqlite3 minimal.db "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT);"
sqlite3 minimal.db "INSERT INTO test VALUES (1, 'test');"

# Test collection
dbsurveyor-collect sqlite://minimal.db

# Test documentation
dbsurveyor generate schema.dbsurveyor.json

# Clean up
rm minimal.db schema.dbsurveyor.json schema.md

Report Issues

When reporting issues, include:

  1. System Information: OS, Rust version, DBSurveyor version
  2. Command Used: Exact command that failed (sanitize credentials)
  3. Error Output: Complete error message and stack trace
  4. Debug Logs: Output with RUST_LOG=debug
  5. Minimal Reproduction: Smallest example that reproduces the issue

Security Note: Never include actual database credentials in issue reports. Use placeholder values like user:password@localhost/db.

Community Resources

Professional Support

For enterprise users requiring professional support, contact support@evilbitlabs.io for:

  • Priority issue resolution
  • Custom feature development
  • Integration consulting
  • Security auditing and compliance

Architecture

DBSurveyor follows a security-first, modular architecture designed for flexibility, maintainability, and offline operation. This document details the system architecture and design decisions.

System Overview

graph TB
    subgraph "User Environment"
        CLI[Command Line Interface]
        CONFIG[Configuration Files]
        ENV[Environment Variables]
    end
    
    subgraph "DBSurveyor Workspace"
        subgraph "dbsurveyor-collect"
            COLLECT_CLI[Collection CLI]
            ADAPTERS[Database Adapters]
            POOL[Connection Pooling]
        end
        
        subgraph "dbsurveyor-core"
            MODELS[Data Models]
            SECURITY[Security Module]
            ERROR[Error Handling]
            TRAITS[Adapter Traits]
        end
        
        subgraph "dbsurveyor"
            DOC_CLI[Documentation CLI]
            TEMPLATES[Template Engine]
            FORMATS[Output Formats]
            ANALYSIS[Schema Analysis]
        end
    end
    
    subgraph "External Systems"
        POSTGRES[(PostgreSQL)]
        MYSQL[(MySQL)]
        SQLITE[(SQLite)]
        MONGODB[(MongoDB)]
        SQLSERVER[(SQL Server)]
    end
    
    subgraph "Output Artifacts"
        JSON[Schema Files]
        DOCS[Documentation]
        SQL[SQL DDL]
        DIAGRAMS[ERD Diagrams]
    end
    
    CLI --> COLLECT_CLI
    CLI --> DOC_CLI
    CONFIG --> COLLECT_CLI
    CONFIG --> DOC_CLI
    ENV --> COLLECT_CLI
    
    COLLECT_CLI --> ADAPTERS
    ADAPTERS --> POOL
    ADAPTERS --> MODELS
    ADAPTERS --> SECURITY
    ADAPTERS --> ERROR
    ADAPTERS --> TRAITS
    
    POOL --> POSTGRES
    POOL --> MYSQL
    POOL --> SQLITE
    POOL --> MONGODB
    POOL --> SQLSERVER
    
    COLLECT_CLI --> JSON
    
    DOC_CLI --> TEMPLATES
    DOC_CLI --> FORMATS
    DOC_CLI --> ANALYSIS
    TEMPLATES --> MODELS
    FORMATS --> MODELS
    ANALYSIS --> MODELS
    
    JSON --> DOC_CLI
    DOC_CLI --> DOCS
    DOC_CLI --> SQL
    DOC_CLI --> DIAGRAMS

Crate Architecture

Workspace Structure

DBSurveyor uses a Cargo workspace with three main crates:

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/     # Shared library
β”œβ”€β”€ dbsurveyor-collect/  # Collection binary
β”œβ”€β”€ dbsurveyor/          # Documentation binary
└── Cargo.toml          # Workspace configuration

Dependency Graph

graph TD
    COLLECT[dbsurveyor-collect] --> CORE[dbsurveyor-core]
    DOC[dbsurveyor] --> CORE
    
    CORE --> SERDE[serde]
    CORE --> TOKIO[tokio]
    CORE --> SQLX[sqlx]
    CORE --> MONGO[mongodb]
    CORE --> CRYPTO[aes-gcm + argon2]
    
    COLLECT --> CLAP[clap]
    COLLECT --> RPASS[rpassword]
    COLLECT --> ZSTD[zstd]
    
    DOC --> ASKAMA[askama]
    DOC --> PULLDOWN[pulldown-cmark]

Core Library (dbsurveyor-core)

Module Structure

// dbsurveyor-core/src/lib.rs
pub mod adapters; // Database adapter traits and factory
pub mod error; // Comprehensive error handling
pub mod models; // Unified data models
pub mod security; // Encryption and credential protection

// Re-exports for public API
pub use adapters::{create_adapter, DatabaseAdapter};
pub use error::{DbSurveyorError, Result};
pub use models::{DatabaseSchema, DatabaseType};

Data Models

The core defines unified data structures that work across all database types:

// Unified schema representation
pub struct DatabaseSchema {
    pub format_version: String,
    pub database_info: DatabaseInfo,
    pub tables: Vec<Table>,
    pub views: Vec<View>,
    pub indexes: Vec<Index>,
    pub constraints: Vec<Constraint>,
    pub procedures: Vec<Procedure>,
    pub functions: Vec<Procedure>,
    pub triggers: Vec<Trigger>,
    pub custom_types: Vec<CustomType>,
    pub samples: Option<Vec<TableSample>>,
    pub collection_metadata: CollectionMetadata,
}

// Cross-database type mapping
pub enum UnifiedDataType {
    String { max_length: Option<u32> },
    Integer { bits: u8, signed: bool },
    Float { precision: Option<u8> },
    Boolean,
    DateTime { with_timezone: bool },
    Json,
    Array { element_type: Box<UnifiedDataType> },
    Custom { type_name: String },
}
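To make the type mapping concrete, here is a minimal, self-contained sketch. The enum below is abbreviated, and `map_postgres_type` is a hypothetical helper for illustration, not part of the crate's API:

```rust
// Abbreviated stand-in for UnifiedDataType; the mapper below is
// illustrative only, not DBSurveyor's actual implementation.
#[derive(Debug, PartialEq)]
enum UnifiedDataType {
    String { max_length: Option<u32> },
    Integer { bits: u8, signed: bool },
    Boolean,
    Json,
    Custom { type_name: String },
}

fn map_postgres_type(native: &str) -> UnifiedDataType {
    match native {
        "varchar" | "text" => UnifiedDataType::String { max_length: None },
        "int4" => UnifiedDataType::Integer { bits: 32, signed: true },
        "int8" => UnifiedDataType::Integer { bits: 64, signed: true },
        "bool" => UnifiedDataType::Boolean,
        "json" | "jsonb" => UnifiedDataType::Json,
        // Anything unrecognized is preserved as a custom type
        other => UnifiedDataType::Custom { type_name: other.to_string() },
    }
}

fn main() {
    assert_eq!(
        map_postgres_type("int8"),
        UnifiedDataType::Integer { bits: 64, signed: true }
    );
    assert_eq!(map_postgres_type("bool"), UnifiedDataType::Boolean);
}
```

The `Custom` fallback is what lets the unified model round-trip database-specific types it does not recognize.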

Adapter Pattern

Database adapters implement a common trait for unified access:

#[async_trait]
pub trait DatabaseAdapter: Send + Sync {
    async fn test_connection(&self) -> Result<()>;
    async fn collect_schema(&self) -> Result<DatabaseSchema>;
    fn database_type(&self) -> DatabaseType;
    fn supports_feature(&self, feature: AdapterFeature) -> bool;
    fn connection_config(&self) -> ConnectionConfig;
}

Factory Pattern

The adapter factory provides database-agnostic instantiation:

pub async fn create_adapter(connection_string: &str) -> Result<Box<dyn DatabaseAdapter>> {
    let database_type = detect_database_type(connection_string)?;

    match database_type {
        DatabaseType::PostgreSQL => {
            #[cfg(feature = "postgresql")]
            {
                let adapter = PostgresAdapter::new(connection_string).await?;
                Ok(Box::new(adapter))
            }
            #[cfg(not(feature = "postgresql"))]
            Err(DbSurveyorError::unsupported_feature("PostgreSQL"))
        }
        // ... other database types
    }
}
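The `detect_database_type` step can be sketched as a match on the connection-string scheme. This is an illustrative, self-contained sketch of the idea, not the crate's actual implementation:

```rust
// Hypothetical sketch of scheme-based detection. The real
// detect_database_type may accept more schemes or parse differently.
#[derive(Debug, PartialEq)]
enum DatabaseType { PostgreSQL, MySQL, SQLite, MongoDB, SqlServer }

fn detect_database_type(conn: &str) -> Result<DatabaseType, String> {
    // Everything before "://" is treated as the URL scheme
    let scheme = conn.split("://").next().unwrap_or("");
    match scheme {
        "postgres" | "postgresql" => Ok(DatabaseType::PostgreSQL),
        "mysql" => Ok(DatabaseType::MySQL),
        "sqlite" => Ok(DatabaseType::SQLite),
        "mongodb" | "mongodb+srv" => Ok(DatabaseType::MongoDB),
        "mssql" | "sqlserver" => Ok(DatabaseType::SqlServer),
        other => Err(format!("unsupported scheme: {other}")),
    }
}

fn main() {
    assert_eq!(
        detect_database_type("postgres://localhost/db"),
        Ok(DatabaseType::PostgreSQL)
    );
    assert!(detect_database_type("ftp://host").is_err());
}
```

Detecting the type before constructing an adapter is what lets the factory return feature-gated errors without ever opening a connection.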

Security Architecture

Credential Protection

graph LR
    INPUT[Connection String] --> PARSE[URL Parser]
    PARSE --> CONFIG[Connection Config]
    PARSE --> CREDS[Credentials]
    
    CONFIG --> SANITIZE[Sanitization]
    CREDS --> ZEROIZE[Zeroizing Container]
    
    SANITIZE --> LOGS[Logs & Errors]
    ZEROIZE --> DATABASE[Database Connection]
    ZEROIZE --> MEMORY[Auto-Zero on Drop]

Implementation:

use zeroize::{Zeroize, Zeroizing};

#[derive(Zeroize)]
#[zeroize(drop)]
pub struct Credentials {
    pub username: Zeroizing<String>,
    pub password: Zeroizing<Option<String>>,
}

// Connection config never contains credentials
pub struct ConnectionConfig {
    pub host: String,
    pub port: Option<u16>,
    pub database: Option<String>,
    // No username/password fields
}

Encryption Architecture

graph TD
    PASSWORD[User Password] --> ARGON2[Argon2id KDF]
    SALT[Random Salt] --> ARGON2
    ARGON2 --> KEY[256-bit Key]
    
    DATA[Schema Data] --> AES[AES-GCM-256]
    KEY --> AES
    NONCE[Random Nonce] --> AES
    
    AES --> CIPHERTEXT[Encrypted Data]
    AES --> TAG[Auth Tag]
    
    ENCRYPTED[Encrypted File] --> ALGORITHM[Algorithm ID]
    ENCRYPTED --> NONCE
    ENCRYPTED --> CIPHERTEXT
    ENCRYPTED --> TAG
    ENCRYPTED --> KDF_PARAMS[KDF Parameters]
    ENCRYPTED --> SALT

Security Properties:

  • Confidentiality: AES-GCM-256 encryption
  • Integrity: 128-bit authentication tags
  • Authenticity: Authenticated encryption prevents tampering
  • Nonce Safety: Random per-encryption nonces prevent keystream reuse
  • Key Security: Argon2id with memory-hard parameters

Database Adapter Architecture

Adapter Hierarchy

classDiagram
    class DatabaseAdapter {
        <<trait>>
        +test_connection() Result~()~
        +collect_schema() Result~DatabaseSchema~
        +database_type() DatabaseType
        +supports_feature(AdapterFeature) bool
    }
    
    class PostgresAdapter {
        -pool: PgPool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class MySqlAdapter {
        -pool: MySqlPool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class SqliteAdapter {
        -pool: SqlitePool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class MongoAdapter {
        -client: Client
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    DatabaseAdapter <|-- PostgresAdapter
    DatabaseAdapter <|-- MySqlAdapter
    DatabaseAdapter <|-- SqliteAdapter
    DatabaseAdapter <|-- MongoAdapter

Connection Pooling

Each adapter manages its own connection pool with security-focused defaults:

pub struct ConnectionConfig {
    pub connect_timeout: Duration, // Default: 30s
    pub query_timeout: Duration,   // Default: 30s
    pub max_connections: u32,      // Default: 10
    pub read_only: bool,           // Default: true
}
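The documented defaults can be expressed through the `Default` trait. This is a minimal, self-contained sketch (the field set matches the snippet above; the crate may construct its defaults differently):

```rust
use std::time::Duration;

// Stand-in struct mirroring the documented fields; the Default impl
// below is an illustrative sketch, not the crate's actual code.
struct ConnectionConfig {
    connect_timeout: Duration,
    query_timeout: Duration,
    max_connections: u32,
    read_only: bool,
}

impl Default for ConnectionConfig {
    fn default() -> Self {
        Self {
            connect_timeout: Duration::from_secs(30),
            query_timeout: Duration::from_secs(30),
            max_connections: 10,
            // Security-focused default: never write to target databases
            read_only: true,
        }
    }
}

fn main() {
    let cfg = ConnectionConfig::default();
    assert!(cfg.read_only);
    assert_eq!(cfg.max_connections, 10);
    assert_eq!(cfg.connect_timeout, Duration::from_secs(30));
}
```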

Feature Flags

Database support is controlled by feature flags for minimal binary size:

[features]
default = ["postgresql", "sqlite"]
postgresql = ["sqlx", "sqlx/postgres"]
mysql = ["sqlx", "sqlx/mysql"]
sqlite = ["sqlx", "sqlx/sqlite"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]

Error Handling Architecture

Error Hierarchy

#[derive(Debug, thiserror::Error)]
pub enum DbSurveyorError {
    #[error("Database connection failed")]
    Connection(#[from] ConnectionError),

    #[error("Schema collection failed: {context}")]
    Collection {
        context: String,
        source: Box<dyn std::error::Error>,
    },

    #[error("Configuration error: {message}")]
    Configuration { message: String },

    #[error("Encryption operation failed")]
    Encryption(#[from] EncryptionError),

    #[error("I/O operation failed: {context}")]
    Io {
        context: String,
        source: std::io::Error,
    },
}

Error Context Chain

graph TD
    USER_ERROR[User-Facing Error] --> CONTEXT[Error Context]
    CONTEXT --> SOURCE[Source Error]
    SOURCE --> ROOT[Root Cause]
    
    USER_ERROR --> SANITIZED[Sanitized Message]
    SANITIZED --> NO_CREDS[No Credentials]
    SANITIZED --> ACTIONABLE[Actionable Information]

Security Guarantee: All error messages are sanitized to prevent credential leakage.

CLI Architecture

Command Structure

graph TD
    CLI[CLI Entry Point] --> GLOBAL[Global Args]
    CLI --> COMMANDS[Commands]
    
    GLOBAL --> VERBOSE[Verbosity]
    GLOBAL --> QUIET[Quiet Mode]
    
    COMMANDS --> COLLECT[collect]
    COMMANDS --> TEST[test]
    COMMANDS --> LIST[list]
    
    COLLECT --> DB_URL[Database URL]
    COLLECT --> OUTPUT[Output Options]
    COLLECT --> SECURITY[Security Options]
    
    OUTPUT --> FORMAT[Format Selection]
    OUTPUT --> COMPRESSION[Compression]
    
    SECURITY --> ENCRYPTION[Encryption]
    SECURITY --> THROTTLE[Throttling]

Configuration Hierarchy

Configuration is loaded from multiple sources with clear precedence:

  1. Command Line Arguments (highest priority)
  2. Environment Variables
  3. Project Configuration (.dbsurveyor.toml)
  4. User Configuration (~/.config/dbsurveyor/config.toml)
  5. Default Values (lowest priority)
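This precedence reduces to "first set value wins." A minimal sketch of the resolution chain for a single setting (the function and setting are hypothetical, for illustration only):

```rust
// Illustrative precedence chain for one numeric setting:
// CLI > env var > project config > user config > built-in default.
fn resolve(
    cli: Option<u32>,
    env: Option<u32>,
    project: Option<u32>,
    user: Option<u32>,
    default: u32,
) -> u32 {
    cli.or(env).or(project).or(user).unwrap_or(default)
}

fn main() {
    // Env var set, no CLI flag: env wins over both config files
    assert_eq!(resolve(None, Some(60), Some(45), None, 30), 60);
    // Nothing set anywhere: the built-in default applies
    assert_eq!(resolve(None, None, None, None, 30), 30);
    // CLI flag always wins
    assert_eq!(resolve(Some(5), Some(60), Some(45), Some(20), 30), 5);
}
```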

Documentation Generation Architecture

Template Engine

graph LR
    SCHEMA[Schema Data] --> ANALYSIS[Schema Analysis]
    ANALYSIS --> CONTEXT[Template Context]
    
    TEMPLATES[Askama Templates] --> ENGINE[Template Engine]
    CONTEXT --> ENGINE
    
    ENGINE --> MARKDOWN[Markdown Output]
    ENGINE --> HTML[HTML Output]
    ENGINE --> JSON[JSON Analysis]
    
    SCHEMA --> MERMAID[Mermaid Generator]
    MERMAID --> ERD[ERD Diagrams]
    
    SCHEMA --> SQL[SQL Generator]
    SQL --> DDL[DDL Scripts]

Output Format Pipeline

pub trait OutputGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> Result<String>;
    fn file_extension(&self) -> &'static str;
    fn mime_type(&self) -> &'static str;
}

// Implementations for each format
impl OutputGenerator for MarkdownGenerator { ... }
impl OutputGenerator for HtmlGenerator { ... }
impl OutputGenerator for JsonGenerator { ... }
impl OutputGenerator for MermaidGenerator { ... }
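A complete generator impl can be sketched with simplified stand-in types. The types below are abbreviated for illustration (the real trait operates on the full `DatabaseSchema` model), and the sketch returns a plain `String` rather than a `Result`:

```rust
// Minimal stand-in types; abbreviated for illustration only.
struct DatabaseSchema {
    name: String,
    table_names: Vec<String>,
}

trait OutputGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> String;
    fn file_extension(&self) -> &'static str;
}

struct MarkdownGenerator;

impl OutputGenerator for MarkdownGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> String {
        // Render a heading plus a bullet per table
        let mut out = format!("# {}\n\n## Tables\n", schema.name);
        for t in &schema.table_names {
            out.push_str(&format!("- {t}\n"));
        }
        out
    }
    fn file_extension(&self) -> &'static str {
        "md"
    }
}

fn main() {
    let schema = DatabaseSchema {
        name: "mydb".into(),
        table_names: vec!["users".into(), "orders".into()],
    };
    let md = MarkdownGenerator.generate(&schema);
    assert!(md.contains("# mydb"));
    assert!(md.contains("- users"));
    assert_eq!(MarkdownGenerator.file_extension(), "md");
}
```

Because every format implements the same trait, the CLI can select a generator at runtime from the `--format` flag without format-specific branching in the pipeline.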

Performance Architecture

Memory Management

graph TD
    STREAMING[Streaming Processing] --> BATCHES[Batch Processing]
    BATCHES --> MEMORY[Memory Limits]
    
    LARGE_TABLES[Large Tables] --> PAGINATION[Pagination]
    PAGINATION --> CHUNKS[Chunk Processing]
    
    CONNECTIONS[Connection Pooling] --> LIMITS[Connection Limits]
    LIMITS --> TIMEOUTS[Query Timeouts]
    
    COMPRESSION[Data Compression] --> ZSTD[Zstandard]
    ZSTD --> EFFICIENCY[Storage Efficiency]

Concurrency Model

// Async/await with Tokio runtime
#[tokio::main]
async fn main() -> Result<()> {
    // Connection pooling for concurrent queries
    let pool = PgPoolOptions::new()
        .max_connections(10)
        .connect_timeout(Duration::from_secs(30))
        .connect(&database_url).await?;

    // Concurrent schema collection: run all three queries in
    // parallel and fail fast on the first error
    let (tables, views, indexes) = tokio::try_join!(
        collect_tables(&pool),
        collect_views(&pool),
        collect_indexes(&pool)
    )?;

    Ok(())
}

Testing Architecture

Test Organization

tests/
β”œβ”€β”€ integration/          # End-to-end tests
β”‚   β”œβ”€β”€ postgres_tests.rs
β”‚   β”œβ”€β”€ mysql_tests.rs
β”‚   └── sqlite_tests.rs
β”œβ”€β”€ security/            # Security-focused tests
β”‚   β”œβ”€β”€ credential_tests.rs
β”‚   β”œβ”€β”€ encryption_tests.rs
β”‚   └── offline_tests.rs
└── fixtures/            # Test data
    β”œβ”€β”€ sample_schemas/
    └── test_databases/

Test Categories

graph TD
    TESTS[Test Suite] --> UNIT[Unit Tests]
    TESTS --> INTEGRATION[Integration Tests]
    TESTS --> SECURITY[Security Tests]
    TESTS --> PERFORMANCE[Performance Tests]
    
    UNIT --> MODELS[Model Tests]
    UNIT --> ADAPTERS[Adapter Tests]
    UNIT --> SECURITY_UNIT[Security Unit Tests]
    
    INTEGRATION --> POSTGRES[PostgreSQL Integration]
    INTEGRATION --> MYSQL[MySQL Integration]
    INTEGRATION --> SQLITE[SQLite Integration]
    INTEGRATION --> MONGODB[MongoDB Integration]
    
    SECURITY --> CREDENTIALS[Credential Protection]
    SECURITY --> ENCRYPTION[Encryption Validation]
    SECURITY --> OFFLINE[Offline Operation]
    
    PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
    PERFORMANCE --> MEMORY[Memory Usage Tests]
    PERFORMANCE --> CONCURRENCY[Concurrency Tests]

Build and Distribution Architecture

Feature Matrix

graph TD
    FEATURES[Feature Flags] --> DATABASES[Database Support]
    FEATURES --> SECURITY[Security Features]
    FEATURES --> OPTIONAL[Optional Features]
    
    DATABASES --> POSTGRES[postgresql]
    DATABASES --> MYSQL[mysql]
    DATABASES --> SQLITE[sqlite]
    DATABASES --> MONGODB[mongodb]
    DATABASES --> MSSQL[mssql]
    
    SECURITY --> ENCRYPTION[encryption]
    SECURITY --> COMPRESSION[compression]
    
    OPTIONAL --> TEMPLATES[templates]
    OPTIONAL --> ANALYSIS[analysis]

Binary Variants

DBSurveyor produces multiple binary variants to support different database environments. GoReleaser v2 builds 7 distinct binaries:

  • dbsurveyor - Documentation postprocessor (all features)
  • dbsurveyor-collect - Data collection tool with variants:
    • all - PostgreSQL, MySQL, SQLite, MongoDB, MSSQL
    • postgresql - PostgreSQL only
    • mysql - MySQL only
    • sqlite - SQLite only
    • mongodb - MongoDB only
    • mssql - MSSQL only

Each variant is built with specific feature flags:

# All features (default release artifacts)
cargo zigbuild --release --all-features -p=dbsurveyor-collect

# PostgreSQL-only variant
cargo zigbuild --release --no-default-features \
  --features=postgresql,compression,encryption -p=dbsurveyor-collect

# SQLite-only variant
cargo zigbuild --release --no-default-features \
  --features=sqlite,compression,encryption -p=dbsurveyor-collect

Artifact Naming: Release artifacts follow the pattern:

dbsurveyor_{variant}_{OS}_{arch}.{tar.gz|zip}

Examples:

  • dbsurveyor_all_Linux_x86_64.tar.gz
  • dbsurveyor_postgresql_Darwin_x86_64.tar.gz
  • dbsurveyor_sqlite_Windows_x86_64.zip
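The naming pattern can be reproduced with a small formatting helper. This is a hypothetical sketch of the documented pattern, not part of the release tooling:

```rust
// Hypothetical helper reproducing the documented artifact pattern:
// dbsurveyor_{variant}_{OS}_{arch}.{tar.gz|zip}
fn artifact_name(variant: &str, os: &str, arch: &str) -> String {
    // Per the examples above, Windows archives use zip; others tar.gz
    let ext = if os == "Windows" { "zip" } else { "tar.gz" };
    format!("dbsurveyor_{variant}_{os}_{arch}.{ext}")
}

fn main() {
    assert_eq!(
        artifact_name("all", "Linux", "x86_64"),
        "dbsurveyor_all_Linux_x86_64.tar.gz"
    );
    assert_eq!(
        artifact_name("sqlite", "Windows", "x86_64"),
        "dbsurveyor_sqlite_Windows_x86_64.zip"
    );
}
```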

Deployment Architecture

Airgap Deployment

graph LR
    CONNECTED[Connected System] --> VENDOR[cargo vendor]
    VENDOR --> PACKAGE[Deployment Package]
    
    PACKAGE --> TRANSFER[Secure Transfer]
    TRANSFER --> AIRGAP[Airgap System]
    
    AIRGAP --> OFFLINE_BUILD[Offline Build]
    OFFLINE_BUILD --> BINARIES[DBSurveyor Binaries]

CI/CD Integration

DBSurveyor uses GoReleaser v2 with cargo-zigbuild for cross-compilation:

# GitHub Actions release workflow
  - name: Install Rust toolchain
    uses: dtolnay/rust-toolchain@stable
    with:
      toolchain: 1.93.1

  - name: Install Zig
    uses: mlugg/setup-zig@v2
    with:
      version: 0.13.0

  - name: Install cargo-zigbuild
    run: cargo install --locked cargo-zigbuild --version 0.19.8

  - name: Install Cosign
    uses: sigstore/cosign-installer@v3

  - name: Install Syft
    uses: anchore/sbom-action/download-syft@v0

  - name: Run GoReleaser
    uses: goreleaser/goreleaser-action@v6
    with:
      distribution: goreleaser
      version: ~> v2
      args: release --clean
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      HOMEBREW_TAP_TOKEN: ${{ secrets.HOMEBREW_TAP_TOKEN }}

Cross-Compilation Targets (6 platforms):

  • x86_64-unknown-linux-gnu - Linux x86_64 (glibc)
  • aarch64-unknown-linux-gnu - Linux ARM64 (glibc)
  • x86_64-unknown-linux-musl - Linux x86_64 (musl/Alpine)
  • x86_64-apple-darwin - macOS Intel
  • aarch64-apple-darwin - macOS Apple Silicon
  • x86_64-pc-windows-gnu - Windows x86_64

Security Features:

  • Cosign Keyless Signing: Checksums are signed using GitHub OIDC identity
  • Syft SBOM Generation: Software Bill of Materials for all archives
  • Reproducible Builds: Consistent timestamps via {{ .CommitTimestamp }}

This architecture ensures DBSurveyor maintains its security-first principles while providing flexibility, performance, and maintainability across all supported platforms and use cases.

Contributing

We welcome contributions to DBSurveyor! This guide will help you get started with contributing to the project.

Code of Conduct

DBSurveyor follows the Rust Code of Conduct. Please be respectful and inclusive in all interactions.

Getting Started

Prerequisites

  • Rust 1.93.1+ (MSRV)
  • Git
  • Docker (for integration tests)
  • Just task runner

Development Setup

# Clone the repository
git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor

# Install development tools
just install

# Run initial checks
just dev

Project Structure

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/     # Shared library
β”œβ”€β”€ dbsurveyor-collect/  # Collection binary
β”œβ”€β”€ dbsurveyor/          # Documentation binary
β”œβ”€β”€ docs/                # Documentation source
β”‚   └── solutions/       # Documented solutions to past problems
└── justfile            # Development tasks

Development Workflow

Daily Development

# Format, lint, test, and check coverage
just dev

# Run specific test categories
just test-unit
just test-integration
just test-security

# Security validation
just security-full

# Pre-commit checks
just pre-commit

Code Quality Standards

DBSurveyor enforces strict quality standards:

  • Zero Warnings: cargo clippy -- -D warnings must pass
  • Test Coverage: 55% minimum coverage required (target: 80%, to be raised incrementally)
  • Security First: All code must pass security validation
  • Documentation: All public APIs must have /// documentation

Testing Requirements

All contributions must include appropriate tests:

// Unit tests in source files
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_credential_sanitization() {
        let config = ConnectionConfig::new("postgres://user:secret@host/db");
        let safe_display = config.to_safe_string();
        assert!(!safe_display.contains("secret"));
    }
}

// Integration tests in tests/ directory
#[tokio::test]
async fn test_postgres_collection() {
    let docker = testcontainers::clients::Cli::default();
    let postgres = docker.run(testcontainers::images::postgres::Postgres::default());
    // Test implementation
}

// Security tests are mandatory for security-sensitive code
#[tokio::test]
async fn test_no_credentials_in_output() {
    let schema = collect_schema("postgres://user:secret@localhost/db").await.unwrap();
    let json = serde_json::to_string(&schema).unwrap();
    assert!(!json.contains("secret"));
}

Contribution Types

Bug Reports

When reporting bugs, please include:

  1. System Information: OS, Rust version, DBSurveyor version
  2. Reproduction Steps: Minimal example that reproduces the issue
  3. Expected vs Actual Behavior: Clear description of the problem
  4. Debug Information: Output with RUST_LOG=debug

Security Note: Never include actual database credentials in bug reports.

Feature Requests

For new features, please:

  1. Check Existing Issues: Avoid duplicates
  2. Describe Use Case: Why is this feature needed?
  3. Propose Implementation: High-level approach
  4. Consider Security: How does this maintain security guarantees?

Code Contributions

Pull Request Process

  1. Fork and Branch: Create a feature branch from main
  2. Implement Changes: Follow coding standards
  3. Add Tests: Comprehensive test coverage
  4. Update Documentation: Keep docs in sync
  5. Run Quality Checks: just dev must pass
  6. Submit PR: Clear description and context

Commit Standards

Follow Conventional Commits:

# Feature additions
feat(postgres): add connection pooling with timeout handling
feat(security): implement AES-GCM encryption with random nonces

# Bug fixes
fix(mysql): handle connection failures without exposing credentials
fix(core): ensure proper cleanup of sensitive data structures

# Security improvements
security(core): prevent credential leakage in error messages
security(encryption): add key derivation parameter validation

# Documentation
docs(readme): update installation instructions
docs(security): add encryption implementation details

Database Adapter Development

Adding New Database Support

To add support for a new database:

  1. Create Adapter Module:

    // dbsurveyor-core/src/adapters/newdb.rs
    pub struct NewDbAdapter {
        config: ConnectionConfig,
    }

    #[async_trait]
    impl DatabaseAdapter for NewDbAdapter {
        async fn test_connection(&self) -> Result<()> { ... }
        async fn collect_schema(&self) -> Result<DatabaseSchema> { ... }
        async fn sample_table(&self, table_ref: TableRef<'_>, config: &SamplingConfig) -> Result<TableSample> { ... }
        fn database_type(&self) -> DatabaseType { ... }
        fn supports_feature(&self, feature: AdapterFeature) -> bool { ... }
        fn connection_config(&self) -> ConnectionConfig { ... }
    }
  2. Add Feature Flag:

    # Cargo.toml
    [features]
    newdb = ["dep:newdb-driver"]
    
  3. Update Factory:

    // dbsurveyor-core/src/adapters.rs
    match database_type {
        #[cfg(feature = "newdb")]
        DatabaseType::NewDb => {
            let adapter = NewDbAdapter::new(connection_string).await?;
            Ok(Box::new(adapter))
        }
        // ...
    }
  4. Add Tests:

    // tests/integration/newdb_tests.rs
    #[tokio::test]
    async fn test_newdb_collection() {
        // Integration test with testcontainers
    }

Database Adapter Requirements

All database adapters must:

  • Implement DatabaseAdapter trait completely
    • test_connection(): Verify database connectivity
    • collect_schema(): Collect full database schema information
    • sample_table(): Sample data from a specific table
    • database_type(): Return the database type
    • supports_feature(): Indicate supported features
    • connection_config(): Provide connection configuration
  • Use read-only operations only (SELECT, DESCRIBE, SHOW)
  • Handle connection timeouts (default: 30 seconds)
  • Sanitize credentials in all error messages
  • Support connection pooling where applicable
  • Include comprehensive tests with testcontainers
  • Document database-specific features and limitations
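
The credential-sanitization requirement can be illustrated with a small standalone sketch. This is not the actual redact_database_url from dbsurveyor-core, just a minimal illustration of masking the password portion of a connection URL before it can reach logs or error messages:

```rust
// Minimal illustration of credential sanitization; the real
// redact_database_url in dbsurveyor-core may handle more URL shapes.
fn redact_url(url: &str) -> String {
    // Split off the scheme, then look for userinfo ("user:pass@") in the
    // authority portion and mask everything after the first ':'.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];
    let authority_end = rest.find('/').unwrap_or(rest.len());
    let authority = &rest[..authority_end];
    match authority.rfind('@') {
        Some(at) if authority[..at].contains(':') => {
            let user = authority[..at].split(':').next().unwrap_or("");
            format!("{}{}:****{}", &url[..scheme_end + 3], user, &rest[at..])
        }
        _ => url.to_string(),
    }
}

fn main() {
    let sanitized = redact_url("postgres://user:secret123@localhost:5432/db");
    assert_eq!(sanitized, "postgres://user:****@localhost:5432/db");
    assert!(!sanitized.contains("secret123"));
}
```

Every error path in an adapter should pass connection strings through a helper like this before formatting them.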

The sample_table() Method

The sample_table() method samples data from a single table:

async fn sample_table(
    &self,
    table_ref: TableRef<'_>,
    config: &SamplingConfig,
) -> Result<TableSample>;

Parameters:

  • table_ref: A TableRef containing the table name and optional schema name
  • config: A SamplingConfig with sampling parameters (sample size, throttle, etc.)

Returns: A TableSample that includes:

  • Sampled rows as JSON values
  • Sample metadata (size, total rows, strategy used)
  • Collection timestamp and any warnings
  • sample_status: An optional field indicating the sampling outcome

Sample Status Values:

  • Some(SampleStatus::Complete): Sampling completed successfully
  • Some(SampleStatus::PartialRetry { original_limit }): Sampling partially completed with a reduced limit
  • Some(SampleStatus::Skipped { reason }): Sampling was skipped (e.g., not implemented)
  • None: For backward compatibility with existing data
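
To make the variants concrete, here is a hypothetical shape for SampleStatus and a match over the optional field; the real enum lives in dbsurveyor-core and its field types may differ:

```rust
// Hypothetical mirror of the SampleStatus variants described above;
// field types here are assumptions, not the real definition.
#[derive(Debug)]
enum SampleStatus {
    Complete,
    PartialRetry { original_limit: usize },
    Skipped { reason: String },
}

fn describe(status: Option<&SampleStatus>) -> String {
    match status {
        Some(SampleStatus::Complete) => "complete".to_string(),
        Some(SampleStatus::PartialRetry { original_limit }) => {
            format!("partial (original limit {original_limit})")
        }
        Some(SampleStatus::Skipped { reason }) => format!("skipped: {reason}"),
        // None is the backward-compatibility case for older data files
        None => "unknown (legacy data)".to_string(),
    }
}

fn main() {
    assert_eq!(describe(Some(&SampleStatus::Complete)), "complete");
    assert_eq!(describe(None), "unknown (legacy data)");
}
```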

Implementation Notes:

  • The TableRef struct wraps the table name and optional schema for the method signature
  • Implementations should populate sample_status with SampleStatus::Complete on successful sampling
  • If sampling is not implemented, return SampleStatus::Skipped with an appropriate reason
  • The method supports optional schema qualification (e.g., public.users vs users)
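
The optional schema qualification mentioned above can be sketched as follows; the TableRef here is a hypothetical stand-in with the shape described, not the actual struct definition:

```rust
// Hypothetical TableRef with the shape described above (table name plus
// optional schema); the real struct in dbsurveyor-core may differ.
struct TableRef<'a> {
    schema: Option<&'a str>,
    name: &'a str,
}

// Render a schema-qualified identifier, e.g. "public"."users" vs "users".
fn qualified_name(table: &TableRef<'_>) -> String {
    match table.schema {
        Some(schema) => format!("\"{}\".\"{}\"", schema, table.name),
        None => format!("\"{}\"", table.name),
    }
}

fn main() {
    let qualified = TableRef { schema: Some("public"), name: "users" };
    let bare = TableRef { schema: None, name: "users" };
    assert_eq!(qualified_name(&qualified), "\"public\".\"users\"");
    assert_eq!(qualified_name(&bare), "\"users\"");
}
```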

Testing Database Adapters

# Test specific database adapter
just test-postgres
just test-mysql
just test-sqlite

# Test with real databases using testcontainers
cargo test --test postgres_integration -- --nocapture

# Security testing for new adapters
cargo test --test security_credential_protection

Security Contributions

Security-First Development

All contributions must maintain DBSurveyor’s security guarantees:

  1. No Credential Exposure: Never log or output credentials
  2. Offline Operation: No external network calls except to databases
  3. Encryption Security: Use AES-GCM with random nonces
  4. Memory Safety: Use zeroize for sensitive data
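
Point 4 can be illustrated with a std-only sketch; the project itself uses the zeroize crate, which additionally relies on volatile writes so the compiler cannot optimize the wipe away:

```rust
// Illustrative sketch only: a wrapper that overwrites sensitive bytes
// before the buffer is freed. The real implementation uses the zeroize
// crate, which defeats dead-store elimination; this loop does not.
struct Secret(Vec<u8>);

impl Secret {
    fn wipe(&mut self) {
        for b in self.0.iter_mut() {
            *b = 0;
        }
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        // Zero the buffer automatically when the value goes out of scope
        self.wipe();
    }
}

fn main() {
    let mut s = Secret(b"db password".to_vec());
    s.wipe();
    assert!(s.0.iter().all(|&b| b == 0));
}
```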

Security Review Process

Security-sensitive changes require additional review:

  1. Security Tests: Must include security-specific tests
  2. Threat Model: Consider impact on threat model
  3. Documentation: Update security documentation
  4. Review: Additional security-focused code review

Security Testing

// Example security test
#[tokio::test]
async fn test_new_feature_credential_security() -> Result<(), Box<dyn std::error::Error>> {
    // Test that new feature doesn't leak credentials
    let result = new_feature("postgres://user:secret@localhost/db").await?;
    let output = format!("{:?}", result);
    assert!(!output.contains("secret"));
    assert!(!output.contains("user:secret"));
    Ok(())
}

Documentation Contributions

Documentation Standards

  • User-Focused: Write for the end user
  • Security-Aware: Highlight security implications
  • Example-Rich: Include working code examples
  • Up-to-Date: Keep in sync with code changes

Documentation Types

  1. API Documentation: /// comments in code
  2. User Guide: Markdown files in docs/src/
  3. README: Project overview and quick start
  4. Security Documentation: Security features and guarantees

Building Documentation

# Build API documentation
cargo doc --all-features --document-private-items --open

# Build user guide
just docs

# Check documentation
just docs-check

Release Process

Version Management

DBSurveyor uses semantic versioning:

  • Major: Breaking changes
  • Minor: New features (backward compatible)
  • Patch: Bug fixes

Release Checklist

  1. Update Version: Bump version in Cargo.toml
  2. Update Changelog: Document all changes
  3. Run Full Tests: just security-full
  4. Update Documentation: Ensure docs are current
  5. Create Release: Tag and create GitHub release
  6. Verify Artifacts: Test release binaries

Community Guidelines

Communication

  • GitHub Issues: Bug reports and feature requests
  • Pull Requests: Code contributions and discussions
  • Security Issues: Email security@evilbitlabs.io

Review Process

  1. Automated Checks: CI must pass
  2. Code Review: Maintainer review required
  3. Security Review: For security-sensitive changes
  4. Documentation Review: For user-facing changes

Recognition

Contributors are recognized in:

  • CONTRIBUTORS.md file
  • Release notes
  • Git commit history

Development Environment

  • IDE: VS Code with Rust Analyzer
  • Git Hooks: Pre-commit hooks for quality checks
  • Testing: Nextest for faster test execution
  • Debugging: RUST_LOG=debug for detailed logging

Environment Variables

# Development environment
export RUST_LOG=debug
export DATABASE_URL="postgres://dev:dev@localhost/dev_db"

# Testing environment
export RUST_LOG=trace
export DBSURVEYOR_TEST_TIMEOUT=60

Docker Development

# Start test databases
docker-compose up -d postgres mysql mongodb

# Run integration tests
just test-integration

# Clean up
docker-compose down

Troubleshooting Development Issues

Common Issues

Build failures:

# Clean and rebuild
cargo clean
cargo build --all-features

# Update toolchain
rustup update

Test failures:

# Run specific test with output
cargo test test_name -- --nocapture

# Run with debug logging
RUST_LOG=debug cargo test test_name

Clippy warnings:

# Fix automatically where possible
cargo clippy --fix --allow-dirty

# Check specific warnings
cargo clippy --workspace --all-targets --all-features -- -D warnings

Getting Help

  • Documentation: Check existing docs first
  • Issues: Search existing GitHub issues
  • Code: Look at similar implementations
  • Community: Ask questions in GitHub discussions

License

DBSurveyor is licensed under the Apache License 2.0. By contributing, you agree to license your contributions under the same license.

All contributions must include appropriate copyright headers:

// Copyright 2024 EvilBit Labs
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.

Contributor License Agreement

By submitting a pull request, you represent that:

  1. You have the right to license your contribution
  2. You agree to license it under the Apache License 2.0
  3. Your contribution is your original work

Thank you for contributing to DBSurveyor! Your contributions help make database documentation more secure and accessible for everyone.

Testing

DBSurveyor uses a comprehensive testing strategy to ensure security, reliability, and correctness. This guide covers the testing framework, practices, and how to run tests.

Testing Philosophy

  • Security-First Testing: All tests must verify security guarantees
  • Comprehensive Coverage: 55% minimum test coverage with cargo llvm-cov (target: 80%, to be raised incrementally)
  • Real Database Integration: Use testcontainers for authentic testing
  • Zero Warnings: All test code must pass cargo clippy -- -D warnings

Test Organization

Test Categories

DBSurveyor organizes tests into distinct categories:

graph TD
    TESTS[Test Suite] --> UNIT[Unit Tests]
    TESTS --> INTEGRATION[Integration Tests]
    TESTS --> SECURITY[Security Tests]
    TESTS --> PERFORMANCE[Performance Tests]
    
    UNIT --> MODELS[Model Tests]
    UNIT --> ADAPTERS[Adapter Unit Tests]
    UNIT --> UTILS[Utility Tests]
    
    INTEGRATION --> POSTGRES[PostgreSQL Tests]
    INTEGRATION --> MYSQL[MySQL Tests]
    INTEGRATION --> SQLITE[SQLite Tests]
    INTEGRATION --> MONGODB[MongoDB Tests]
    
    SECURITY --> CREDENTIALS[Credential Protection]
    SECURITY --> ENCRYPTION[Encryption Tests]
    SECURITY --> OFFLINE[Offline Operation]
    
    PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
    PERFORMANCE --> MEMORY[Memory Tests]
    PERFORMANCE --> CONCURRENCY[Concurrency Tests]

Test File Structure

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/
β”‚   └── src/
β”‚       β”œβ”€β”€ lib.rs              # Unit tests in #[cfg(test)]
β”‚       β”œβ”€β”€ models.rs           # Model unit tests
β”‚       └── security.rs         # Security unit tests
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ integration/
β”‚   β”‚   β”œβ”€β”€ postgres_tests.rs   # PostgreSQL integration
β”‚   β”‚   β”œβ”€β”€ mysql_tests.rs      # MySQL integration
β”‚   β”‚   └── sqlite_tests.rs     # SQLite integration
β”‚   β”œβ”€β”€ security/
β”‚   β”‚   β”œβ”€β”€ credential_tests.rs # Credential protection
β”‚   β”‚   β”œβ”€β”€ encryption_tests.rs # Encryption validation
β”‚   β”‚   └── offline_tests.rs    # Offline operation
β”‚   └── fixtures/
β”‚       β”œβ”€β”€ sample_schemas/     # Test schema files
β”‚       └── test_data/          # Test databases
└── benches/
    β”œβ”€β”€ collection.rs           # Collection benchmarks
    └── documentation.rs        # Documentation benchmarks

Running Tests

Basic Test Commands

# Run all tests with nextest (default)
just test

# Run specific test categories
just test-unit           # Unit tests only
just test-integration    # Integration tests only
just test-security       # Security tests only

# Run tests for specific databases
just test-postgres       # PostgreSQL tests
just test-mysql          # MySQL tests
just test-sqlite         # SQLite tests

Advanced Test Options

# Run tests with different profiles
just test-dev            # Development profile (fast)
just test-ci             # CI profile (comprehensive)

# Run with verbose output
just test-verbose

# Run specific test by name
cargo nextest run test_postgres_connection

# Run tests with debug logging
RUST_LOG=debug cargo nextest run

Coverage Testing

# Generate coverage report (55% minimum required, target: 80%)
just coverage

# Generate HTML coverage report
just coverage-html

# CI-friendly coverage
just coverage-ci

Unit Testing

Test Structure

Unit tests are co-located with source code using #[cfg(test)] modules:

// dbsurveyor-core/src/models.rs
impl DatabaseSchema {
    pub fn new(database_info: DatabaseInfo) -> Self {
        // Implementation
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_database_schema_creation() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);

        assert_eq!(schema.format_version, "1.0");
        assert_eq!(schema.database_info.name, "test_db");
        assert_eq!(schema.object_count(), 0);
    }

    #[test]
    fn test_with_warning() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);

        let schema = schema.with_warning("Test warning".to_string());

        assert_eq!(schema.collection_metadata.warnings.len(), 1);
        assert_eq!(schema.collection_metadata.warnings[0], "Test warning");
    }
}

Security Unit Tests

All security-sensitive code must include security-focused unit tests:

#[cfg(test)]
mod security_tests {
    use super::*;

    #[test]
    fn test_credential_sanitization() {
        let url = "postgres://user:secret123@localhost:5432/db";
        let sanitized = redact_database_url(url);

        // Verify credentials are sanitized
        assert!(!sanitized.contains("secret123"));
        assert!(sanitized.contains("user:****"));
        assert!(sanitized.contains("localhost:5432/db"));
    }

    #[test]
    fn test_connection_config_display() {
        let config = ConnectionConfig::new("localhost".to_string())
            .with_port(5432)
            .with_database("testdb".to_string());

        let display_output = format!("{}", config);

        // Should show connection info but never credentials
        assert!(display_output.contains("localhost:5432"));
        assert!(display_output.contains("testdb"));
        assert!(!display_output.contains("password"));
        assert!(!display_output.contains("secret"));
    }
}

Integration Testing

Database Integration Tests

Integration tests use testcontainers to spin up real databases:

// tests/integration/postgres_tests.rs
use testcontainers::{clients, images};
use dbsurveyor_core::adapters::create_adapter;

#[tokio::test]
async fn test_postgres_schema_collection() {
    let docker = clients::Cli::default();
    let postgres = docker.run(images::postgres::Postgres::default());

    let port = postgres.get_host_port_ipv4(5432);
    let database_url = format!(
        "postgres://postgres:postgres@localhost:{}/postgres",
        port
    );

    // Wait for PostgreSQL to be ready
    wait_for_postgres(&database_url).await;

    // Create test schema
    let pool = sqlx::PgPool::connect(&database_url).await
        .expect("Failed to connect to PostgreSQL");

    sqlx::query("CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name TEXT)")
        .execute(&pool)
        .await
        .expect("Failed to create test table");

    // Test schema collection
    let adapter = create_adapter(&database_url).await
        .expect("Failed to create adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect schema");

    // Verify schema was collected correctly
    assert!(!schema.tables.is_empty());

    let users_table = schema.tables.iter()
        .find(|t| t.name == "users")
        .expect("Users table not found");

    assert_eq!(users_table.name, "users");
    assert_eq!(users_table.columns.len(), 2);

    // Clean up
    pool.close().await;
}

async fn wait_for_postgres(database_url: &str) {
    let max_attempts = 30;
    let mut attempts = 0;

    while attempts < max_attempts {
        match sqlx::PgPool::connect(database_url).await {
            Ok(pool) => {
                pool.close().await;
                return;
            }
            Err(_) => {
                attempts += 1;
                tokio::time::sleep(std::time::Duration::from_millis(500)).await;
            }
        }
    }

    panic!("PostgreSQL failed to become ready after {} attempts", max_attempts);
}

MySQL Integration Tests

// tests/integration/mysql_tests.rs
use testcontainers::{clients, images};
use dbsurveyor_core::adapters::create_adapter;

#[tokio::test]
async fn test_mysql_schema_collection() {
    let docker = clients::Cli::default();
    let mysql = docker.run(
        images::mysql::Mysql::default()
            .with_root_password("testpass")
            .with_database("testdb")
    );

    let port = mysql.get_host_port_ipv4(3306);
    let database_url = format!(
        "mysql://root:testpass@localhost:{}/testdb",
        port
    );

    // Wait for MySQL to be ready (retry loop analogous to wait_for_postgres)
    wait_for_mysql(&database_url).await;

    let adapter = create_adapter(&database_url).await
        .expect("Failed to create MySQL adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect MySQL schema");

    assert!(!schema.tables.is_empty());
}

SQLite Integration Tests

// tests/integration/sqlite_tests.rs
use tempfile::tempdir;

#[tokio::test]
async fn test_sqlite_schema_collection() -> Result<(), Box<dyn std::error::Error>> {
    let temp_dir = tempdir()?;
    let db_path = temp_dir.path().join("test.db");

    // Create test database (mode=rwc lets SQLite create the file)
    let conn = sqlx::SqlitePool::connect(&format!("sqlite://{}?mode=rwc", db_path.display())).await?;

    sqlx::query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        .execute(&conn).await?;

    conn.close().await;

    // Test schema collection
    let database_url = format!("sqlite://{}", db_path.display());
    let adapter = create_adapter(&database_url).await
        .expect("Failed to create SQLite adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect SQLite schema");

    assert!(schema.tables.iter().any(|t| t.name == "users"));
    Ok(())
}

Security Testing

Credential Protection Tests

Security tests verify that credentials never appear in outputs:

// tests/security/credential_tests.rs
#[tokio::test]
async fn test_no_credentials_in_schema_output() -> Result<(), Box<dyn std::error::Error>> {
    // Connection URL shown for context only; the mock schema below never sees it
    let _database_url = "postgres://testuser:secret123@localhost:5432/testdb";

    // Create mock schema (in real test, would collect from database)
    let db_info = DatabaseInfo::new("testdb".to_string());
    let schema = DatabaseSchema::new(db_info);

    let json_output = serde_json::to_string(&schema)?;

    // Verify no sensitive data is present
    assert!(!json_output.contains("secret123"));
    assert!(!json_output.contains("testuser:secret123"));
    assert!(!json_output.contains("password"));
    assert!(!json_output.contains("secret"));

    Ok(())
}

#[test]
fn test_error_message_sanitization() {
    // URL shown for context; the simulated error below must not echo it
    let _connection_url = "postgres://user:secret@host:5432/db";

    // Simulate connection error
    let error = DbSurveyorError::connection_failed("Connection timeout");
    let error_message = format!("{}", error);

    // Error message should not contain credentials
    assert!(!error_message.contains("secret"));
    assert!(!error_message.contains("user:secret"));
}

Encryption Tests

Comprehensive encryption testing ensures security guarantees:

// tests/security/encryption_tests.rs
#[cfg(feature = "encryption")]
mod encryption_tests {
    use dbsurveyor_core::security::encryption::{encrypt_data, decrypt_data};

    #[tokio::test]
    async fn test_encryption_roundtrip() {
        let original_data = b"sensitive database schema data";
        let password = "test_password_123";

        // Encrypt the data
        let encrypted = encrypt_data(original_data, password).unwrap();

        // Verify encrypted data structure
        assert_eq!(encrypted.algorithm, "AES-GCM-256");
        assert_eq!(encrypted.nonce.len(), 12); // 96 bits
        assert_eq!(encrypted.auth_tag.len(), 16); // 128 bits
        assert!(!encrypted.ciphertext.is_empty());

        // Decrypt the data
        let decrypted = decrypt_data(&encrypted, password).unwrap();

        // Verify roundtrip
        assert_eq!(original_data, &decrypted[..]);
    }

    #[tokio::test]
    async fn test_nonce_uniqueness() {
        let data = b"test data for nonce uniqueness";
        let password = "same_password";

        // Generate multiple encryptions
        let encrypted1 = encrypt_data(data, password).unwrap();
        let encrypted2 = encrypt_data(data, password).unwrap();
        let encrypted3 = encrypt_data(data, password).unwrap();

        // Nonces should be different (random)
        assert_ne!(encrypted1.nonce, encrypted2.nonce);
        assert_ne!(encrypted2.nonce, encrypted3.nonce);
        assert_ne!(encrypted1.nonce, encrypted3.nonce);

        // All should decrypt to same plaintext
        let decrypted1 = decrypt_data(&encrypted1, password).unwrap();
        let decrypted2 = decrypt_data(&encrypted2, password).unwrap();
        let decrypted3 = decrypt_data(&encrypted3, password).unwrap();

        assert_eq!(data, &decrypted1[..]);
        assert_eq!(data, &decrypted2[..]);
        assert_eq!(data, &decrypted3[..]);
    }

    #[tokio::test]
    async fn test_wrong_password_fails() {
        let data = b"secret data";
        let correct_password = "correct_password";
        let wrong_password = "wrong_password";

        let encrypted = encrypt_data(data, correct_password).unwrap();

        // Decryption with wrong password should fail
        let result = decrypt_data(&encrypted, wrong_password);
        assert!(result.is_err());
    }
}

Offline Operation Tests

Verify that DBSurveyor works completely offline:

// tests/security/offline_tests.rs
#[tokio::test]
async fn test_airgap_compatibility() -> Result<(), Box<dyn std::error::Error>> {
    // Simulate airgap environment by testing without network access
    let schema_data = include_bytes!("../fixtures/sample_schema.json");
    let schema: DatabaseSchema = serde_json::from_slice(schema_data)?;

    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown).await?;
    assert!(!documentation.is_empty());

    let json_export = generate_documentation(&schema, OutputFormat::Json).await?;
    assert!(!json_export.is_empty());

    Ok(())
}

Performance Testing

Benchmarks

Performance benchmarks use Criterion:

// benches/collection.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use dbsurveyor_core::models::DatabaseSchema;

fn bench_schema_serialization(c: &mut Criterion) {
    let schema = create_large_test_schema(1000); // 1000 tables

    c.bench_function("schema_to_json", |b| {
        b.iter(|| serde_json::to_string(black_box(&schema)))
    });

    c.bench_function("schema_to_markdown", |b| {
        b.iter(|| generate_markdown_documentation(black_box(&schema)))
    });
}

fn create_large_test_schema(table_count: usize) -> DatabaseSchema {
    let db_info = DatabaseInfo::new("benchmark_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    for i in 0..table_count {
        let table = create_test_table(&format!("table_{}", i));
        schema.tables.push(table);
    }

    schema
}

criterion_group!(benches, bench_schema_serialization);
criterion_main!(benches);

Memory Testing

#[tokio::test]
async fn test_memory_usage_large_schema() -> Result<(), Box<dyn std::error::Error>> {
    let initial_memory = get_memory_usage();

    // Process large schema
    let large_schema = create_schema_with_tables(10000);
    let _documentation = generate_documentation(&large_schema, OutputFormat::Html).await?;

    let final_memory = get_memory_usage();
    let memory_increase = final_memory - initial_memory;

    // Ensure memory usage is reasonable (< 100MB for 10k tables)
    assert!(memory_increase < 100 * 1024 * 1024);
    Ok(())
}

Test Configuration

Nextest Configuration

DBSurveyor uses nextest for enhanced test execution:

# .config/nextest.toml
[profile.default]
retries = 2
test-threads = 4
failure-output = "immediate"
success-output = "never"

# Test groups for different types of tests
[[profile.default.overrides]]
filter = "test(integration)"
test-group = "integration"
max-threads = 2              # Limit concurrent container tests

[[profile.default.overrides]]
filter = "test(security)"
test-group = "security"
max-threads = 1           # Security tests run sequentially

[[profile.default.overrides]]
filter = "test(unit)"
test-group = "unit"
max-threads = 8       # Unit tests can run in parallel

CI Test Configuration

# .github/workflows/test.yml
  - name: Run Tests
    run: |
      # Run tests with CI profile
      cargo nextest run --profile ci --workspace --all-features

      # Generate coverage
      cargo llvm-cov --lcov --output-path lcov.info

      # Security validation
      just security-full

Test Data and Fixtures

Test Data Generation

// tests/fixtures/mod.rs
pub fn create_test_schema() -> DatabaseSchema {
    let db_info = DatabaseInfo::new("test_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    schema.tables = vec![
        create_test_table("users"),
        create_test_table("orders"),
        create_test_table("products"),
    ];

    schema
}

pub fn create_test_table(name: &str) -> Table {
    Table {
        name: name.to_string(),
        schema: Some("public".to_string()),
        columns: vec![Column {
            name: "id".to_string(),
            data_type: UnifiedDataType::Integer {
                bits: 32,
                signed: true,
            },
            is_nullable: false,
            is_primary_key: true,
            is_auto_increment: true,
            default_value: None,
            comment: None,
            ordinal_position: 1,
        }],
        primary_key: Some(PrimaryKey {
            name: Some(format!("{}_pkey", name)),
            columns: vec!["id".to_string()],
        }),
        foreign_keys: vec![],
        indexes: vec![],
        constraints: vec![],
        comment: None,
        row_count: Some(100),
    }
}

Fixture Files

tests/fixtures/
β”œβ”€β”€ sample_schemas/
β”‚   β”œβ”€β”€ postgres_sample.json
β”‚   β”œβ”€β”€ mysql_sample.json
β”‚   └── sqlite_sample.json
β”œβ”€β”€ test_databases/
β”‚   β”œβ”€β”€ small_db.sql
β”‚   β”œβ”€β”€ medium_db.sql
β”‚   └── large_db.sql
└── encrypted_samples/
    β”œβ”€β”€ encrypted_schema.enc
    └── compressed_schema.zst

Testing Best Practices

Test Quality Standards

  • Comprehensive: Test happy path, error cases, and edge conditions
  • Isolated: Tests should not depend on external services (except testcontainers)
  • Deterministic: Tests must produce consistent results
  • Fast: Unit tests should complete in milliseconds
  • Secure: No real credentials in test code; use explicit dummy values only

Common Testing Patterns

// Use Result<(), Box<dyn std::error::Error>> for test functions
#[tokio::test]
async fn test_function() -> Result<(), Box<dyn std::error::Error>> {
    // Test implementation
    Ok(())
}

// Test both Ok and Err cases
#[test]
fn test_validation() {
    // Test success case
    assert!(validate_input("valid").is_ok());

    // Test failure cases
    assert!(validate_input("").is_err());
    assert!(validate_input("invalid").is_err());
}

// Use assert_matches! for pattern matching (via the assert_matches crate,
// or std's unstable assert_matches feature)
#[test]
fn test_error_types() {
    let result = operation_that_fails();
    assert_matches!(result, Err(DbSurveyorError::Configuration { .. }));
}

Security Testing Requirements

Every security-sensitive function must have tests that verify:

  1. No credential leakage in any output
  2. Proper error sanitization in all error paths
  3. Secure memory handling with automatic cleanup
  4. Offline operation without external dependencies
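
Requirement 2 (error sanitization) can be sketched with a hypothetical error type whose Display output carries only a pre-redacted target, so the original URL can never leak through an error path:

```rust
// Hypothetical error type for illustration; the real DbSurveyorError in
// dbsurveyor-core has a different shape. The point is that the URL is
// redacted at construction time, not at display time.
use std::fmt;

struct ConnectionError {
    // Only the host/database portion survives construction
    redacted_target: String,
}

impl ConnectionError {
    fn new(url: &str) -> Self {
        // Keep everything after the last '@' (host:port/db); drop userinfo
        let target = url.rsplit('@').next().unwrap_or(url).to_string();
        Self { redacted_target: target }
    }
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection failed to {}", self.redacted_target)
    }
}

fn main() {
    let err = ConnectionError::new("postgres://user:secret@db.internal:5432/app");
    let msg = err.to_string();
    assert!(!msg.contains("secret"));
    assert!(msg.contains("db.internal"));
}
```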

Continuous Integration

GitHub Actions Integration

name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Install tools
        run: just install

      - name: Run tests
        run: just test-ci

      - name: Security validation
        run: just security-full

      - name: Coverage
        run: just coverage-ci

Test Reporting

  • Coverage Reports: Uploaded to Codecov
  • Test Results: JUnit XML format for CI integration
  • Performance: Criterion benchmark results
  • Security: Security test results and audit reports

This comprehensive testing strategy ensures DBSurveyor maintains its security guarantees while providing reliable functionality across all supported platforms and databases.