
Introduction

DBSurveyor is a security-focused, offline-first database analysis and documentation toolchain designed for security professionals, database administrators, and DevOps teams.

Core Purpose

Generate comprehensive, secure database schema documentation for PostgreSQL, MySQL, SQLite, and MongoDB databases without compromising security or requiring persistent network connectivity.

Key Features

  • Dual-Binary Architecture: Separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) for flexible workflows
  • Database Support: PostgreSQL (✅), SQLite (✅), MySQL (🚧), MongoDB (🚧), SQL Server (🚧)
  • Security-First Design: AES-GCM encryption, credential protection, zero telemetry
  • Offline Operation: Complete functionality in air-gapped environments
  • Documentation Generation: Markdown (✅), JSON analysis (✅), HTML/ERD (🚧 in development)
  • Shell Completion Support: Built-in completion scripts for bash, zsh, fish, PowerShell, and elvish
  • Progress Indicators: Visual feedback with spinners for long-running operations (schema loading, decryption, documentation generation)
  • Environment-Aware Output: Respects NO_COLOR and TERM=dumb environment variables to disable color output and progress indicators
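
The environment-aware output rules can be sketched in a few lines. This is an illustrative model of the documented behavior, not DBSurveyor's actual code; the TTY check is an extra assumption:

```python
import os
import sys

def use_color() -> bool:
    """Follow the documented rules: NO_COLOR or TERM=dumb disables
    color output and progress indicators."""
    if os.environ.get("NO_COLOR") is not None:  # any value counts as set
        return False
    if os.environ.get("TERM") == "dumb":
        return False
    # Assumption: also skip color when stdout is not a terminal
    return sys.stdout.isatty()
```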

Security Guarantees

  • NO NETWORK CALLS: Operates completely offline after initial installation; the only network activity is the collector's connection to the target database
  • NO TELEMETRY: Zero data collection, usage tracking, or external reporting
  • NO CREDENTIALS IN OUTPUTS: Database credentials never stored in output files
  • AES-GCM ENCRYPTION: Industry-standard authenticated encryption for sensitive outputs
  • AIRGAP COMPATIBLE: Full functionality in air-gapped environments

Target Users

  • Red Team Operators: Post-compromise database enumeration in contested environments
  • Blue Team Analysts: Auditing unknown systems and inherited databases
  • Database Administrators: Comprehensive documentation and compliance reporting
  • Security Analysts: Database security assessment and risk identification
  • System Administrators: Exploring inherited or legacy databases with minimal documentation

Architecture Overview

DBSurveyor uses a dual-binary architecture that separates data collection from documentation generation:

graph LR
    A[Database] --> B[dbsurveyor-collect]
    B --> C[Schema File]
    C --> D[dbsurveyor]
    D --> E[Documentation]
    
    C -.-> F[Encrypted]
    C -.-> G[Compressed]
    
    E --> H[Markdown]
    E --> I[HTML]
    E --> J[SQL DDL]
    E --> K[Mermaid ERD]

This separation provides several benefits:

  • Security: Collection can happen in secure environments, documentation generation offline
  • Flexibility: Multiple documentation formats from a single collection
  • Portability: Schema files can be transferred between environments
  • Auditability: Clear separation between data collection and processing

Installation

Pre-Built Binaries

Pre-built binaries are available on the Releases page for Linux, macOS, and Windows.

Download the appropriate binary for your platform and add it to your PATH.

Homebrew

For macOS and Linux users with Homebrew:

brew install EvilBit-Labs/tap/dbsurveyor

From Source

Prerequisites

  • Rust 1.93.1+ (MSRV)
  • Git

Clone and Build

git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor
cargo build --release

The compiled binaries will be available in target/release/.

Feature Selection

Control which database engines are compiled in using feature flags:

# Build with all database support
cargo build --release --all-features

# Build with specific databases only
cargo build --release --features postgresql,mysql,encryption

# Build minimal version for airgap environments
cargo build --release --no-default-features --features sqlite

Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| postgresql | PostgreSQL support | sqlx with postgres driver |
| mysql | MySQL support | sqlx with mysql driver |
| sqlite | SQLite support | sqlx with sqlite driver |
| mongodb | MongoDB support | mongodb crate |
| mssql | SQL Server support | tiberius crate |
| compression | Zstandard compression | zstd crate |
| encryption | AES-GCM encryption | aes-gcm, argon2 crates |

Default Features

default = ["postgresql", "sqlite"]

The default build includes PostgreSQL and SQLite support, which covers the most common use cases while maintaining a reasonable binary size.

Binary Variants

DBSurveyor provides two main binaries:

dbsurveyor-collect

The database collection tool that connects to databases and extracts schema information.

  • Default Features: postgresql, sqlite
  • Optional Features: mysql, mongodb, mssql, compression, encryption

dbsurveyor

The documentation generator that processes collected schema files.

  • Default Features: None (minimal dependencies)
  • Optional Features: compression, encryption

Development Setup

For development work, install the complete toolchain:

# Install development dependencies
just install

# This installs:
# - Rust toolchain components (clippy, rustfmt)
# - Cargo tools (audit, deny, llvm-cov, nextest)
# - Security tools (syft for SBOM generation)
# - Documentation tools (mdbook and plugins)

Verification

Verify your installation:

# Check binary versions
./target/release/dbsurveyor-collect --version
./target/release/dbsurveyor --version

# Test with SQLite (no external database required)
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
./target/release/dbsurveyor-collect sqlite://test.db
./target/release/dbsurveyor schema.dbsurveyor.json

# Clean up
rm test.db schema.dbsurveyor.json schema.md

Airgap Installation

For air-gapped environments:

  1. Prepare on connected system:

    # Download dependencies
    cargo fetch
    
    # Create vendor directory
    cargo vendor vendor
    
    # Build minimal version
    cargo build --release --no-default-features --features sqlite
    
  2. Transfer to airgap system:

    • Copy entire project directory including vendor/
    • Copy built binaries from target/release/
  3. Use offline:

    # Use vendored dependencies
    cargo build --release --offline --no-default-features --features sqlite
    

Troubleshooting

Common Issues

Compilation fails with missing dependencies:

  • Ensure you have the latest Rust toolchain: rustup update
  • Check feature flags match your requirements

Database driver compilation errors:

  • Install system dependencies for your target databases
  • For PostgreSQL: libpq-dev (Ubuntu) or postgresql-devel (RHEL)
  • For MySQL: libmysqlclient-dev (Ubuntu) or mysql-devel (RHEL)

Permission errors:

  • Ensure you have write permissions to the target directory
  • Use cargo install --root ~/.local for user-local installation

Getting Help

Quick Start

This guide will get you up and running with DBSurveyor in minutes.

Prerequisites

  • DBSurveyor installed (see Installation)
  • Access to a database (PostgreSQL, MySQL, SQLite, or MongoDB)

Basic Workflow

DBSurveyor follows a two-step process:

  1. Collect schema from database → schema file
  2. Generate documentation from schema file → reports
graph LR
    A[Database] --> B[dbsurveyor-collect]
    B --> C[schema.dbsurveyor.json]
    C --> D[dbsurveyor]
    D --> E[Documentation]

Example 1: SQLite Database

The simplest example using SQLite (no server setup required):

# Create a sample SQLite database
sqlite3 sample.db << EOF
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    title TEXT NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

INSERT INTO users (username, email) VALUES
    ('alice', 'alice@example.com'),
    ('bob', 'bob@example.com');
EOF

# Step 1: Collect schema
dbsurveyor-collect sqlite://sample.db

# Step 2: Generate documentation
dbsurveyor generate schema.dbsurveyor.json

# View the generated documentation
cat schema.md

Example 2: PostgreSQL Database

# Step 1: Collect schema from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# Step 2: Generate Markdown documentation (HTML is placeholder)
dbsurveyor generate schema.dbsurveyor.json --format markdown

# View the generated documentation
cat schema.md

Example 3: Encrypted Schema Collection

For sensitive environments, encrypt the schema file:

# Collect with encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db

# Generate documentation from encrypted file (will prompt for password)
dbsurveyor generate schema.enc

Common Commands

Collection Commands

# Basic collection
dbsurveyor-collect <database-url>

# Test connection without collecting
dbsurveyor-collect test <database-url>

# Collect with compression
dbsurveyor-collect --compress <database-url>

# Collect all databases on server
dbsurveyor-collect --all-databases postgres://user:pass@localhost

# List supported database types
dbsurveyor-collect list

Documentation Commands

# Generate Markdown (default)
dbsurveyor generate schema.dbsurveyor.json

# Generate HTML report
dbsurveyor --format html schema.json

# Generate SQL DDL
dbsurveyor sql schema.json --dialect postgresql

# Analyze schema statistics
dbsurveyor analyze schema.json --detailed

# Validate schema file
dbsurveyor validate schema.json

Output Files

DBSurveyor generates several types of files:

Schema Files (from collector)

  • schema.dbsurveyor.json - Standard JSON format (specification)
  • schema.dbsurveyor.json.zst - Compressed format (with --compress)
  • schema.enc - Encrypted format (with --encrypt)

Documentation Files (from generator)

  • schema.md - Markdown documentation (✅ implemented)
  • schema_analysis.json - Analysis report (✅ implemented)
  • schema.html - HTML report with search (🚧 placeholder)
  • schema.sql - SQL DDL reconstruction (🚧 placeholder)
  • schema.mmd - Mermaid ERD diagram (🚧 placeholder)

Next Steps

Now that you have the basics working:

  1. Explore Database Support: Learn about database-specific features
  2. Security Features: Understand encryption and security options
  3. Advanced Configuration: Customize behavior with configuration options
  4. CLI Reference: Master all available command-line options

Troubleshooting Quick Start Issues

Connection Issues

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check if database is accessible
ping localhost
telnet localhost 5432  # PostgreSQL default port
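
telnet is often absent on minimal systems; a portable TCP reachability check can stand in. This is a generic sketch, unrelated to DBSurveyor's own code:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the PostgreSQL default port
print(port_open("localhost", 5432))
```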

Permission Issues

# Ensure user has read permissions
# PostgreSQL: GRANT SELECT ON ALL TABLES IN SCHEMA public TO username;
# MySQL: GRANT SELECT ON database.* TO 'username'@'%';

File Not Found

# Check if schema file was created
ls -la *.dbsurveyor.json *.enc *.zst

# Verify file format
file schema.dbsurveyor.json

For more detailed troubleshooting, see the Troubleshooting Guide.

Database Collection

The dbsurveyor-collect tool connects to databases and extracts comprehensive schema information. This guide covers all collection features and options.

Basic Collection

Simple Schema Collection

# Collect from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# Collect from SQLite
dbsurveyor-collect sqlite:///path/to/database.db

# Collect from MySQL
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

Connection Testing

Always test your connection before collecting:

# Test connection without collecting schema
dbsurveyor-collect test postgres://user:pass@localhost/db

# Example output:
# ✓ Connection test successful
# Connection to PostgreSQL database successful

Advanced Collection Options

Output Formats

Control how the collected schema is stored:

# Standard JSON (default)
dbsurveyor-collect --output schema.json postgres://localhost/db

# Compressed with Zstandard
dbsurveyor-collect --compress --output schema.json.zst postgres://localhost/db

# Encrypted with AES-GCM (prompts for password)
dbsurveyor-collect --encrypt --output schema.enc postgres://localhost/db

# Both compressed and encrypted
dbsurveyor-collect --compress --encrypt postgres://localhost/db

Multi-Database Collection

Collect schemas from all databases on a server:

# Collect all accessible databases
dbsurveyor-collect --all-databases postgres://user:pass@localhost

# Include system databases (postgres, template0, etc.)
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost

# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases postgres,template0,template1 postgres://localhost

Performance and Stealth Options

# Throttle operations for stealth (1 second delay between operations)
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Limit sample data collection
dbsurveyor-collect --sample 50 postgres://localhost/db

# Verbose logging for debugging
dbsurveyor-collect -vvv postgres://localhost/db

What Gets Collected

DBSurveyor extracts comprehensive database metadata:

Core Schema Objects

  • Tables: Names, schemas, columns, data types, constraints
  • Views: Definitions, columns, dependencies
  • Indexes: Names, columns, uniqueness, types
  • Constraints: Primary keys, foreign keys, check constraints, unique constraints
  • Procedures: Stored procedures with parameters and definitions
  • Functions: User-defined functions with signatures
  • Triggers: Event triggers with timing and definitions
  • Custom Types: User-defined types, enums, domains

Metadata Information

  • Database Info: Name, version, size, encoding, collation
  • Collection Metadata: Timestamp, duration, warnings, collector version
  • Statistics: Row counts, object counts, relationship mappings

Sample Data (Optional)

When enabled, DBSurveyor can collect sample data:

# Collect sample data (100 rows per table by default)
dbsurveyor-collect --sample 100 postgres://localhost/db

# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db

Security Note: Sample data may contain sensitive information. Review outputs before sharing.

Database-Specific Features

PostgreSQL

# Full PostgreSQL collection
dbsurveyor-collect postgres://user:pass@localhost:5432/mydb

Collected Objects:

  • Tables, views, materialized views
  • Indexes (B-tree, Hash, GiST, SP-GiST, GIN, BRIN)
  • Constraints (PK, FK, Check, Unique, Exclusion)
  • Functions, procedures, triggers
  • Custom types, domains, enums
  • Extensions and schemas

PostgreSQL-Specific:

  • JSONB columns and indexes
  • Array types and operations
  • Inheritance relationships
  • Partitioned tables

MySQL

# MySQL collection (requires --features mysql)
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

Collected Objects:

  • Tables, views
  • Indexes (Primary, Unique, Index, Fulltext, Spatial)
  • Constraints (PK, FK, Check, Unique)
  • Stored procedures, functions, triggers
  • Events and routines

MySQL-Specific:

  • Storage engines (InnoDB, MyISAM, etc.)
  • Partitioning information
  • Auto-increment values
  • Character sets and collations

SQLite

# SQLite collection
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite  # Alternative format

Collected Objects:

  • Tables, views
  • Indexes
  • Triggers
  • Constraints (limited support)

SQLite-Specific:

  • ROWID and WITHOUT ROWID tables
  • Virtual tables
  • Attached databases
  • PRAGMA settings

MongoDB

# MongoDB collection (requires --features mongodb)
dbsurveyor-collect mongodb://user:pass@localhost:27017/mydb

Collected Objects:

  • Collections and their schemas
  • Indexes (single field, compound, text, geospatial)
  • Document structure analysis
  • Field statistics and types

MongoDB-Specific:

  • Schema inference from documents
  • Index usage statistics
  • Sharding information
  • GridFS collections

Security Considerations

Read-Only Operations

DBSurveyor only performs read operations:

  • SELECT statements for data retrieval
  • DESCRIBE or SHOW statements for metadata
  • Information schema queries
  • System catalog queries

No write operations are ever performed.

Credential Handling

# Use environment variables to avoid command history
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect

# Credentials are sanitized in all logs and error messages
# "postgres://user:pass@localhost/db" becomes "postgres://user:****@localhost/db"
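
The sanitization shown above amounts to masking the password component of the URL. A minimal illustration in Python (DBSurveyor's own implementation is in Rust):

```python
from urllib.parse import urlsplit, urlunsplit

def sanitize_url(url: str) -> str:
    """Mask the password in a database URL, e.g. for logs and error messages."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    netloc = f"{parts.username}:****@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(sanitize_url("postgres://user:pass@localhost/db"))
# postgres://user:****@localhost/db
```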

Connection Security

# Use SSL/TLS connections when available
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

# Connection timeouts prevent hanging
# Default: 30 seconds for connection, 30 seconds for queries

Troubleshooting Collection Issues

Connection Problems

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check network connectivity
ping localhost
telnet localhost 5432  # PostgreSQL port

# Verify credentials
psql -h localhost -U user -d db -c "SELECT 1;"

Permission Issues

# PostgreSQL: Grant read permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;

# MySQL: Grant read permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';

# SQLite: Ensure file read permissions
chmod 644 /path/to/database.db

Large Database Handling

# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Throttle operations to reduce load
dbsurveyor-collect --throttle 500 postgres://localhost/large_db

# Exclude large or unnecessary databases
dbsurveyor-collect --all-databases --exclude-databases logs,temp,backup postgres://localhost

Memory and Performance

# Monitor collection progress
dbsurveyor-collect -v postgres://localhost/db

# For very large databases, consider collecting specific schemas
# (This feature is planned for future releases)

Output File Structure

The collected schema file contains:

{
  "format_version": "1.0",
  "database_info": {
    "name": "mydb",
    "version": "13.7",
    "size_bytes": 1048576,
    "encoding": "UTF8"
  },
  "tables": [...],
  "views": [...],
  "indexes": [...],
  "constraints": [...],
  "procedures": [...],
  "functions": [...],
  "triggers": [...],
  "custom_types": [...],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 5432,
    "collector_version": "0.1.0",
    "warnings": []
  }
}
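
Because the output is plain JSON, it is easy to post-process. A hypothetical sketch that summarizes object counts from a document following the layout above (the values here are made up):

```python
import json

# A trimmed-down schema document following the layout above
raw = """
{
  "format_version": "1.0",
  "database_info": {"name": "mydb", "version": "13.7"},
  "tables": [{"name": "users"}, {"name": "posts"}],
  "views": [{"name": "active_users"}],
  "indexes": [],
  "collection_metadata": {"warnings": []}
}
"""
doc = json.loads(raw)

counts = {key: len(doc[key]) for key in ("tables", "views", "indexes")}
print(doc["database_info"]["name"], counts)
# mydb {'tables': 2, 'views': 1, 'indexes': 0}
```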

Next Steps

After collecting schema data:

  1. Generate Documentation: Use dbsurveyor to create reports
  2. Analyze Schema: Use analysis commands to understand the database structure
  3. Secure Storage: Consider encrypting schema files for sensitive databases
  4. Version Control: Track schema changes over time

See the Documentation Generation guide for the next steps.

Documentation Generation

The dbsurveyor tool processes collected schema files and generates comprehensive documentation in multiple formats. This guide covers all documentation generation features.

Basic Documentation Generation

Generate Markdown Documentation

# Generate Markdown (default format)
dbsurveyor generate schema.dbsurveyor.json

# Explicit format specification
dbsurveyor --format markdown schema.dbsurveyor.json

# Custom output file
dbsurveyor --format markdown --output database-docs.md schema.json

Generate HTML Reports

# Generate HTML report with search functionality
dbsurveyor --format html schema.dbsurveyor.json

# Custom output location
dbsurveyor --format html --output reports/database.html schema.json

Output Formats

DBSurveyor supports multiple documentation formats:

Markdown Documentation

Best for: Version control, wikis, README files

dbsurveyor --format markdown schema.json

Features:

  • Clean, readable text format
  • Table of contents with navigation
  • Code blocks for SQL examples
  • Compatible with GitHub, GitLab, etc.

Example Output Structure:

# Database Schema: mydb

## Summary
- **Tables**: 15
- **Views**: 3
- **Indexes**: 28

## Tables

### users
- **Schema**: public
- **Rows**: ~1,250

| Column | Type | Nullable | Default |
|--------|------|----------|---------|
| id | INTEGER | NO | nextval('users_id_seq') |
| username | VARCHAR(50) | NO | |
| email | VARCHAR(255) | YES | |

HTML Reports

Best for: Interactive browsing, presentations, sharing

dbsurveyor --format html schema.json

Features:

  • Interactive search and filtering
  • Collapsible sections
  • Syntax highlighting
  • Responsive design
  • Offline-capable (no external dependencies)

JSON Analysis

Best for: Programmatic processing, integration with other tools

dbsurveyor --format json schema.json

Features:

  • Structured data for automation
  • Statistics and metrics
  • Relationship analysis
  • Machine-readable format

Example Output:

{
  "database_name": "mydb",
  "analysis": {
    "table_count": 15,
    "view_count": 3,
    "index_count": 28,
    "constraint_count": 42
  },
  "relationships": [
    {
      "from_table": "orders",
      "to_table": "users",
      "relationship_type": "foreign_key"
    }
  ]
}
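
The relationships list can be derived directly from foreign-key constraints. A hypothetical sketch; the key names mirror the example output above and are assumptions:

```python
# Foreign-key constraints as they might appear in a collected schema
constraints = [
    {"table": "orders", "type": "foreign_key", "references": "users"},
    {"table": "orders", "type": "primary_key", "references": None},
    {"table": "posts", "type": "foreign_key", "references": "users"},
]

# Keep only foreign keys and reshape them into relationship records
relationships = [
    {
        "from_table": c["table"],
        "to_table": c["references"],
        "relationship_type": "foreign_key",
    }
    for c in constraints
    if c["type"] == "foreign_key"
]
print(relationships)
```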

Mermaid ERD Diagrams

Best for: Visual database design, architecture documentation

dbsurveyor --format mermaid schema.json

Features:

  • Entity Relationship Diagrams
  • Visual table relationships
  • Compatible with Mermaid.js
  • Can be embedded in Markdown

Example Output:

erDiagram
    users {
        int id PK
        string username
        string email
    }
    orders {
        int id PK
        int user_id FK
        datetime created_at
    }
    users ||--o{ orders : "has many"

Advanced Features

SQL DDL Reconstruction

Generate SQL DDL scripts to recreate the database structure:

# PostgreSQL DDL
dbsurveyor sql schema.json --dialect postgresql --output recreate.sql

# MySQL DDL
dbsurveyor sql schema.json --dialect mysql --output mysql-schema.sql

# Generic SQL (ANSI standard)
dbsurveyor sql schema.json --dialect generic --output portable.sql

Supported Dialects:

  • postgresql - PostgreSQL-specific syntax
  • mysql - MySQL-specific syntax
  • sqlite - SQLite-specific syntax
  • sqlserver - SQL Server-specific syntax
  • generic - ANSI SQL standard

Schema Analysis

Get detailed insights about your database structure:

# Basic analysis
dbsurveyor analyze schema.json

# Detailed analysis with statistics
dbsurveyor analyze schema.json --detailed

Analysis Output:

Schema Analysis: mydb
================
Tables: 15
Views: 3
Indexes: 28
Constraints: 42

Detailed Analysis:
- Procedures: 5
- Functions: 12
- Triggers: 8
- Custom Types: 3

Relationship Analysis:
- Foreign Key Relationships: 18
- Self-Referencing Tables: 2
- Orphaned Tables: 1

Schema Validation

Verify schema file integrity and format:

dbsurveyor validate schema.dbsurveyor.json

Validation Checks:

  • File format correctness
  • Schema version compatibility
  • Data integrity verification
  • Warning detection
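
A rough model of what such checks look like; the required keys and supported versions here are assumptions for illustration, not the tool's actual rules:

```python
REQUIRED_KEYS = {"format_version", "database_info", "tables"}
SUPPORTED_VERSIONS = {"1.0"}

def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - doc.keys())]
    if doc.get("format_version") not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported format_version: {doc.get('format_version')}")
    return problems

print(validate({"format_version": "1.0", "database_info": {}, "tables": []}))
# []
```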

Working with Encrypted Files

DBSurveyor can process encrypted schema files:

# Process encrypted file (will prompt for password)
dbsurveyor generate schema.enc

# All formats work with encrypted input
dbsurveyor --format html schema.enc
dbsurveyor sql schema.enc --dialect postgresql

Security Notes:

  • Password prompts are secure (no echo)
  • Decryption happens in memory only
  • Original encrypted file remains unchanged

Working with Compressed Files

Process compressed schema files automatically:

# Compressed files are detected automatically
dbsurveyor generate schema.dbsurveyor.json.zst

# Works with all output formats
dbsurveyor --format html schema.json.zst

Data Redaction

Control sensitive data visibility in generated documentation:

Redaction Modes

# No redaction (show all data)
dbsurveyor --redact-mode none schema.json

# Minimal redaction (only obvious sensitive fields)
dbsurveyor --redact-mode minimal schema.json

# Balanced redaction (recommended default)
dbsurveyor --redact-mode balanced schema.json

# Conservative redaction (maximum privacy)
dbsurveyor --redact-mode conservative schema.json

# Disable all redaction
dbsurveyor --no-redact schema.json

Redaction Examples

| Mode | Password Field | Email Field | ID Field |
|---|---|---|---|
| none | password123 | user@example.com | 12345 |
| minimal | [REDACTED] | user@example.com | 12345 |
| balanced | [REDACTED] | [EMAIL] | 12345 |
| conservative | [REDACTED] | [REDACTED] | [ID] |
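
One way to implement mode-based redaction like this; the matching rules below are illustrative assumptions, not DBSurveyor's actual heuristics:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def redact(field: str, value: str, mode: str) -> str:
    """Redact a sample value according to the chosen mode."""
    if mode == "none":
        return value
    if "password" in field.lower():  # every other mode hides passwords
        return "[REDACTED]"
    if EMAIL_RE.match(value) and mode in ("balanced", "conservative"):
        return "[EMAIL]" if mode == "balanced" else "[REDACTED]"
    if mode == "conservative" and field.lower().endswith("id"):
        return "[ID]"
    return value

print(redact("password", "password123", "minimal"))     # [REDACTED]
print(redact("email", "user@example.com", "balanced"))  # [EMAIL]
```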

Batch Processing

Process multiple schema files:

# Process all schema files in directory
for file in *.dbsurveyor.json; do
    dbsurveyor --format html "$file"
done

# Generate multiple formats from one schema
for format in markdown html mermaid; do
    dbsurveyor --format "$format" schema.json
done

Integration Examples

CI/CD Pipeline

# GitHub Actions example
  - name: Generate Database Documentation
    run: |
      dbsurveyor-collect postgres://${{ secrets.DB_URL }}
      dbsurveyor --format html --output docs/database.html schema.dbsurveyor.json
      dbsurveyor --format markdown --output docs/DATABASE.md schema.dbsurveyor.json

Documentation Website

# Generate documentation for website
dbsurveyor --format html --output public/database.html schema.json
dbsurveyor --format mermaid --output docs/erd.mmd schema.json

# Include in static site generator
cp schema.html docs/_includes/database-schema.html

Version Control

# Track schema changes over time
git add schema.dbsurveyor.json
git commit -m "Update database schema documentation"

# Generate diff-friendly markdown
dbsurveyor --format markdown schema.json
git add schema.md
git commit -m "Update schema documentation"

Customization and Templating

Template Customization

DBSurveyor uses the Askama template engine. Future versions will support custom templates:

# Custom template (planned feature)
dbsurveyor --template custom.html.j2 --format html schema.json

Output Customization

# Custom CSS for HTML output (planned feature)
dbsurveyor --format html --css custom.css schema.json

# Custom header/footer (planned feature)
dbsurveyor --format html --header header.html --footer footer.html schema.json

Performance Considerations

Large Schema Files

# For large schemas, use streaming processing
dbsurveyor --format json schema.json  # Most efficient

# HTML generation may be slower for very large schemas
dbsurveyor --format html large-schema.json  # May take time

Memory Usage

  • JSON format: Minimal memory usage
  • Markdown format: Low memory usage
  • HTML format: Moderate memory usage (includes search index)
  • Mermaid format: Low memory usage

Troubleshooting

Common Issues

File not found:

# Check file exists and has correct extension
ls -la *.dbsurveyor.json *.enc *.zst

Permission denied:

# Check file permissions
chmod 644 schema.dbsurveyor.json

Invalid format:

# Validate schema file first
dbsurveyor validate schema.json

Encryption password issues:

# Ensure you're using the correct password
# Password prompts don't show characters for security

Debug Mode

# Enable verbose logging
dbsurveyor -vvv generate schema.json

# Check file format
file schema.dbsurveyor.json

Next Steps

After generating documentation:

  1. Review Output: Check generated documentation for accuracy
  2. Share Securely: Consider redaction levels for different audiences
  3. Integrate: Include documentation in your workflow or website
  4. Automate: Set up regular documentation updates
  5. Version Control: Track documentation changes over time

See the Configuration guide for advanced customization options.

Security Features

DBSurveyor is built with security-first principles that are non-negotiable. This document details the comprehensive security features and guarantees.

Core Security Guarantees

1. Offline-Only Operation

  • NO NETWORK CALLS: Operates completely offline after initial installation
  • NO TELEMETRY: Zero data collection, usage tracking, or external reporting
  • NO AUTO-UPDATES: Manual control over all software updates
  • AIRGAP COMPATIBLE: Full functionality in air-gapped environments

2. Credential Protection

Database credentials are never stored, logged, or included in any output files.

Implementation Details

  • Immediate Sanitization: Connection strings sanitized in all error messages and logs
  • Secure Memory Handling: Credentials automatically zeroed using the zeroize crate
  • Separate Handling: Credentials parsed separately from connection configuration
  • No Serialization: Credential structures are never serialized to disk

Example of Credential Sanitization

// Input: "postgres://user:secret123@localhost:5432/mydb"
// Logged: "postgres://user:****@localhost:5432/mydb"
// Error messages never contain actual passwords

3. AES-GCM Encryption

Industry-standard authenticated encryption for sensitive schema outputs.

Encryption Specifications

  • Algorithm: AES-GCM-256 (256-bit key, 96-bit nonce, 128-bit authentication tag)
  • Nonce Generation: Cryptographically secure random nonces (never reused)
  • Key Derivation: Argon2id with secure parameters
  • Authentication: Separate authentication tags prevent tampering

Key Derivation Parameters

| Parameter | Value | Purpose |
|---|---|---|
| Algorithm | Argon2id v1.3 | Memory-hard key derivation |
| Salt Size | 16 bytes (128 bits) | Random salt per encryption |
| Memory Cost | 64 MiB (65536 KiB) | Memory hardness |
| Time Cost | 3 iterations | Computational hardness |
| Parallelism | 4 threads | Parallel processing |
| Output Length | 32 bytes (256 bits) | AES-256 key size |
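
Argon2id is not in Python's standard library, so this sketch substitutes scrypt, another memory-hard KDF, purely to show the derive-key-from-password-and-salt shape; the salt and output sizes mirror the parameters above:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # 128-bit random salt per encryption, as in the table

# scrypt stands in for Argon2id here (stdlib-only illustration)
key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
assert len(key) == 32  # 256 bits: the AES-256 key size

# The same password + salt always derive the same key...
assert key == hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
# ...while a fresh random salt yields a different key
assert key != hashlib.scrypt(password, salt=os.urandom(16), n=2**14, r=8, p=1, dklen=32)
```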

Encryption Process

graph TD
    A[Password] --> B[Argon2id KDF]
    C[Random Salt] --> B
    B --> D[256-bit Key]
    E[Schema Data] --> F[AES-GCM Encrypt]
    D --> F
    G[Random Nonce] --> F
    F --> H[Ciphertext]
    F --> I[Auth Tag]
    
    J[Encrypted File] --> K[Algorithm: AES-GCM-256]
    J --> L[Nonce: 96 bits]
    J --> M[Ciphertext]
    J --> N[Auth Tag: 128 bits]
    J --> O[KDF Params + Salt]

4. Database Security

All database operations follow strict security principles.

Read-Only Operations

  • SELECT/DESCRIBE Only: No INSERT, UPDATE, DELETE, or DDL operations
  • Connection Flags: Read-only mode enforced at connection level where supported
  • User Permissions: Recommend using read-only database users

Connection Security

  • Timeouts: All connections and queries have configurable timeouts (default: 30s)
  • Connection Pooling: Limited connection pools prevent resource exhaustion
  • TLS/SSL: Certificate validation when using encrypted connections
  • Parameterized Queries: All queries use parameterized statements (no SQL injection risk)

Example Connection Configuration

ConnectionConfig {
    connect_timeout: Duration::from_secs(30),
    query_timeout: Duration::from_secs(30),
    max_connections: 10,
    read_only: true,  // Enforced
    // Credentials handled separately
}

Security Testing

DBSurveyor includes comprehensive security tests to verify all guarantees.

Credential Protection Tests

#[tokio::test]
async fn test_no_credentials_in_output() -> anyhow::Result<()> {
    let schema = collect_schema("postgres://user:secret@localhost/db").await?;
    let json = serde_json::to_string(&schema)?;

    // Verify no sensitive data is present
    assert!(!json.contains("secret"));
    assert!(!json.contains("user:secret"));
    assert!(!json.contains("password"));
    Ok(())
}

Encryption Tests

#[test]
fn test_nonce_uniqueness() -> anyhow::Result<()> {
    let data = b"test schema data";
    let encrypted1 = encrypt_data(data, "password")?;
    let encrypted2 = encrypt_data(data, "password")?;

    // Same data should produce different ciphertext (random nonce)
    assert_ne!(encrypted1.nonce, encrypted2.nonce);
    assert_ne!(encrypted1.ciphertext, encrypted2.ciphertext);
    Ok(())
}

Offline Operation Tests

#[test]
fn test_airgap_compatibility() -> anyhow::Result<()> {
    // Simulate airgap environment
    let schema = load_schema("fixtures/sample_schema.json")?;

    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown)?;
    assert!(!documentation.is_empty());
    Ok(())
}

Security Best Practices

For Users

  1. Use Read-Only Database Users

    -- PostgreSQL
    CREATE USER dbsurveyor_readonly;
    GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
    GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;
    
    -- MySQL
    CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
    GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
    
  2. Use Strong Encryption Passwords

    • Minimum 12 characters
    • Mix of uppercase, lowercase, numbers, symbols
    • Consider using password managers
  3. Secure File Handling

    # Set restrictive permissions on encrypted files
    chmod 600 schema.enc
    
    # Use secure deletion when done
    shred -vfz -n 3 schema.enc
    
  4. Network Security

    • Use TLS/SSL connections when possible
    • Avoid credentials in command history
    • Use environment variables for connection strings

For Developers

  1. Never Log Credentials

    // ✅ Correct
    log::info!("Connecting to database");

    // ❌ Never do this
    log::info!("Connecting to {}", database_url);
  2. Sanitize Error Messages

    // ✅ Correct
    #[error("Connection failed to database")]
    ConnectionFailed,

    // ❌ Never do this
    #[error("Failed to connect to {url}")]
    ConnectionError { url: String },
  3. Use Secure Memory Handling

    use zeroize::{Zeroize, Zeroizing};

    #[derive(Zeroize)]
    #[zeroize(drop)]
    struct Credentials {
        username: Zeroizing<String>,
        password: Zeroizing<Option<String>>,
    }

Security Validation Commands

Run these commands to verify security guarantees:

# Complete security validation
just security-full

# Test encryption capabilities
just test-encryption

# Verify offline operation
just test-offline

# Check for credential leakage
just test-credential-security

# Security audit
just audit

Threat Model

In Scope

  • Credential Exposure: Preventing database credentials from appearing in logs, outputs, or error messages
  • Data at Rest: Protecting collected schema files with strong encryption
  • Network Isolation: Ensuring offline operation after initial setup
  • Memory Safety: Secure handling of sensitive data in memory

Out of Scope

  • Network Transport Security: TLS/SSL configuration is the user’s responsibility
  • Database Server Security: DBSurveyor assumes the database server is trusted
  • Host System Security: Operating system and file system security is external
  • Social Engineering: User education and operational security practices

Compliance Considerations

DBSurveyor’s security features support various compliance requirements:

  • SOC 2: Offline operation and encryption support security controls
  • GDPR: Data minimization and encryption for personal data protection
  • HIPAA: Encryption and access controls for healthcare data
  • PCI DSS: Secure handling of systems that may contain payment data

Security Advisories

Current known security advisories and their status:

RUSTSEC-2023-0071 (RSA crate - Marvin Attack)

  • Severity: Medium
  • Impact: Timing side-channel vulnerability in RSA implementation
  • Affected: MySQL connections only (not enabled by default)
  • Mitigation: MySQL support disabled by default; PostgreSQL and SQLite recommended
  • Status: No fix available upstream; tracked for future SQLx updates
  • Review Date: 2025-09-01

Reporting Security Issues

To report security vulnerabilities:

  1. DO NOT create public GitHub issues for security vulnerabilities
  2. Email security issues to: security@evilbitlabs.io
  3. Include detailed reproduction steps and impact assessment
  4. Allow reasonable time for response and remediation

We follow responsible disclosure practices and will acknowledge security reports within 48 hours.

Configuration

DBSurveyor can be configured through command-line options, environment variables, and configuration files. This guide covers all configuration options and best practices.

Environment Variables

Database Connection

# Primary database connection
export DATABASE_URL="postgres://user:password@localhost:5432/mydb"

# Use with collector without specifying URL
dbsurveyor-collect

Logging Configuration

# Set log level for all modules
export RUST_LOG=info

# Set log level for specific modules
export RUST_LOG=dbsurveyor_collect=debug,dbsurveyor_core=trace

# Disable ANSI color output and progress indicators
# Follows the no-color.org convention
# Useful for CI environments, non-TTY contexts, and accessibility
export NO_COLOR=1            # Any value disables color and progress output
export TERM=dumb             # Alternative way to disable color/progress output

# Log levels: error, warn, info, debug, trace
export RUST_LOG=debug

Connection Pool Configuration

# Maximum connection pool size (default: 10)
export DBSURVEYOR_MAX_CONNECTIONS=10

# Minimum idle connections in pool (default: 2)
export DBSURVEYOR_MIN_IDLE_CONNECTIONS=2

# Connection timeout in seconds (default: 30)
export DBSURVEYOR_CONNECT_TIMEOUT_SECS=30

# Idle connection timeout in seconds (default: 600)
export DBSURVEYOR_IDLE_TIMEOUT_SECS=600

# Maximum connection lifetime in seconds (default: 3600)
export DBSURVEYOR_MAX_LIFETIME_SECS=3600

Security Configuration

# Disable credential prompts (use with caution)
export DBSURVEYOR_NO_PROMPT=1

# Default encryption password (NOT RECOMMENDED for production)
export DBSURVEYOR_ENCRYPTION_PASSWORD="your-password"

Command-Line Configuration

Global Options

Available for both dbsurveyor-collect and dbsurveyor:

# Verbosity levels
dbsurveyor-collect -v postgres://localhost/db      # Info level
dbsurveyor-collect -vv postgres://localhost/db     # Debug level
dbsurveyor-collect -vvv postgres://localhost/db    # Trace level

# Quiet mode (errors only)
dbsurveyor-collect -q postgres://localhost/db

# Help and version
dbsurveyor-collect --help
dbsurveyor-collect --version

Collection Configuration

Connection Settings

# Connection timeout (default: 30s)
dbsurveyor-collect --connect-timeout 60 postgres://localhost/db

# Query timeout (default: 30s)
dbsurveyor-collect --query-timeout 45 postgres://localhost/db

# Maximum connections (default: 10)
dbsurveyor-collect --max-connections 5 postgres://localhost/db

Data Collection Settings

# Sample size per table (default: 100, max: 10,000)
# Values above 10,000 are capped to prevent resource exhaustion
dbsurveyor-collect --sample 50 postgres://localhost/db

# Throttle delay between operations in milliseconds
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Include/exclude specific object types
dbsurveyor-collect --no-views postgres://localhost/db
dbsurveyor-collect --no-procedures postgres://localhost/db
dbsurveyor-collect --no-triggers postgres://localhost/db

Multi-Database Settings

# Collect all databases
dbsurveyor-collect --all-databases postgres://localhost

# Include system databases
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost

# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases "postgres,template0,template1" postgres://localhost

Documentation Configuration

Output Format Settings

# Output format
dbsurveyor --format html schema.json
dbsurveyor --format markdown schema.json
dbsurveyor --format json schema.json
dbsurveyor --format mermaid schema.json

# Custom output file
dbsurveyor --output custom-name.html --format html schema.json

Redaction Settings

# Redaction modes
dbsurveyor --redact-mode none schema.json        # No redaction
dbsurveyor --redact-mode minimal schema.json     # Minimal redaction
dbsurveyor --redact-mode balanced schema.json    # Balanced (default)
dbsurveyor --redact-mode conservative schema.json # Maximum redaction

# Disable all redaction
dbsurveyor --no-redact schema.json

Configuration Files

Workspace Configuration

Create a .dbsurveyor.toml file in your project root:

# .dbsurveyor.toml
[collection]
# Default connection settings
connect_timeout = "30s"
query_timeout = "30s"
max_connections = 10
sample_size = 100       # Max: 10,000 (values above are clamped)
throttle_ms = 0

# Object collection settings
include_views = true
include_procedures = true
include_functions = true
include_triggers = true
include_indexes = true
include_constraints = true
include_custom_types = true

# Multi-database settings
include_system_databases = false
exclude_databases = ["postgres", "template0", "template1"]

[output]
# Default output settings
format = "markdown"
compression = false
encryption = false

[redaction]
# Data redaction settings
mode = "balanced"
custom_patterns = [
  { pattern = "(?i)(api[_-]?key)", replacement = "[API_KEY]" },
  { pattern = "(?i)(token)", replacement = "[TOKEN]" },
]

[security]
# Security settings
read_only = true
sanitize_credentials = true

User Configuration

Create a global configuration file at ~/.config/dbsurveyor/config.toml:

# ~/.config/dbsurveyor/config.toml
[defaults]
# Default database type for ambiguous connections
database_type = "postgresql"

# Default output directory
output_dir = "~/Documents/database-docs"

# Default logging level
log_level = "info"

[security]
# Security preferences
always_encrypt = false
strong_passwords_only = true
credential_timeout = "5m"

[templates]
# Custom template directory
template_dir = "~/.config/dbsurveyor/templates"

Database-Specific Configuration

PostgreSQL Configuration

# SSL/TLS settings
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=verify-full&sslcert=client.crt&sslkey=client.key"

# Connection pool settings
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5&pool_timeout=30"

# Schema-specific collection
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"

MySQL Configuration

# SSL settings
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"

# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

# Connection timeout
dbsurveyor-collect "mysql://user:pass@localhost/db?connect_timeout=30"

SQLite Configuration

# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

# Busy timeout
dbsurveyor-collect "sqlite:///path/to/db.sqlite?busy_timeout=30000"

# Cache size
dbsurveyor-collect "sqlite:///path/to/db.sqlite?cache_size=2000"

MongoDB Configuration

# Authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"

# SSL settings
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?ssl=true&sslVerify=false"

# Connection timeout
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?connectTimeoutMS=30000"

Advanced Configuration

Custom Redaction Patterns

Create custom redaction rules:

# .dbsurveyor.toml
[redaction.patterns]
# Credit card numbers
credit_card = { pattern = "\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}", replacement = "[CREDIT_CARD]" }

# Social Security Numbers
ssn = { pattern = "\\d{3}-\\d{2}-\\d{4}", replacement = "[SSN]" }

# Phone numbers
phone = { pattern = "\\(\\d{3}\\)\\s?\\d{3}-\\d{4}", replacement = "[PHONE]" }

# Custom API keys
api_key = { pattern = "(?i)api[_-]?key[\"']?\\s*[:=]\\s*[\"']?([a-zA-Z0-9]{32,})", replacement = "[API_KEY]" }
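
The pattern/replacement pairs above behave like ordinary regex substitutions. A short Python sketch (illustrative only — not DBSurveyor's actual redaction engine) shows the effect of the credit-card and SSN rules:

```python
import re

# Patterns mirroring the [redaction.patterns] entries above.
PATTERNS = [
    (re.compile(r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}"), "[CREDIT_CARD]"),
    (re.compile(r"\d{3}-\d{2}-\d{4}"), "[SSN]"),
    (re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with its placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Card 4111-1111-1111-1111, SSN 123-45-6789"))
```

Note that rule order matters: the credit-card pattern runs before the SSN pattern so a 16-digit number is never partially consumed by a shorter rule.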

Performance Tuning

# .dbsurveyor.toml
[performance]
# Connection pool settings
max_connections = 10
min_connections = 2
connection_timeout = "30s"
idle_timeout = "10m"
max_lifetime = "1h"

# Query settings
query_timeout = "30s"
batch_size = 1000
max_concurrent_queries = 5

# Memory settings
max_memory_mb = 512
streaming_threshold_mb = 100

Output Customization

# .dbsurveyor.toml
[output.html]
# HTML-specific settings
theme = "dark"
include_search = true
include_toc = true
syntax_highlighting = true

[output.markdown]
# Markdown-specific settings
include_toc = true
table_format = "github"
code_blocks = true

[output.json]
# JSON-specific settings
pretty_print = true
include_metadata = true
compress = false

Environment-Specific Configuration

Development Environment

# .env.development
DATABASE_URL=postgres://dev_user:dev_pass@localhost:5432/dev_db
RUST_LOG=debug
DBSURVEYOR_SAMPLE_SIZE=10
DBSURVEYOR_THROTTLE=0

Production Environment

# .env.production
DATABASE_URL=postgres://readonly_user:secure_pass@prod-db:5432/prod_db
RUST_LOG=warn
DBSURVEYOR_SAMPLE_SIZE=0
DBSURVEYOR_THROTTLE=1000
DBSURVEYOR_ENCRYPT=true

CI/CD Environment

# .env.ci
DATABASE_URL=postgres://ci_user:ci_pass@ci-db:5432/test_db
RUST_LOG=info
NO_COLOR=1
DBSURVEYOR_NO_PROMPT=1
DBSURVEYOR_OUTPUT_FORMAT=json

Configuration Validation

Validate your configuration:

# Check configuration syntax
dbsurveyor-collect --check-config

# Dry run with configuration
dbsurveyor-collect --dry-run postgres://localhost/db

# Show effective configuration
dbsurveyor-collect --show-config

Security Best Practices

Credential Management

# Use environment variables instead of command line
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect

# Use credential files with restricted permissions
echo "postgres://user:pass@localhost/db" > ~/.dbsurveyor/credentials
chmod 600 ~/.dbsurveyor/credentials
dbsurveyor-collect --credentials-file ~/.dbsurveyor/credentials

Configuration File Security

# Secure configuration files
chmod 600 .dbsurveyor.toml
chmod 600 ~/.config/dbsurveyor/config.toml

# Use environment variable substitution
# .dbsurveyor.toml
[collection]
database_url = "${DATABASE_URL}"
encryption_password = "${ENCRYPTION_PASSWORD}"
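
The ${VAR} syntax above is shell-style substitution. DBSurveyor's exact expansion rules aren't specified here, but the general behavior can be approximated with Python's string.Template (the environment value below is a made-up example):

```python
import os
from string import Template

# Hypothetical environment value for illustration.
os.environ["DATABASE_URL"] = "postgres://readonly:***@localhost/db"

def expand(value: str) -> str:
    # safe_substitute leaves unknown ${VARS} in place instead of raising
    return Template(value).safe_substitute(os.environ)

print(expand("${DATABASE_URL}"))
```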

Audit Configuration

# Log configuration usage
export RUST_LOG=dbsurveyor_core::config=debug

# Review effective configuration
dbsurveyor-collect --show-config --dry-run

Troubleshooting Configuration

Common Issues

Configuration not found:

# Check configuration file locations
ls -la .dbsurveyor.toml
ls -la ~/.config/dbsurveyor/config.toml

# Use explicit configuration file
dbsurveyor-collect --config custom-config.toml

Environment variable not recognized:

# Check environment variable names (case sensitive)
env | grep DBSURVEYOR
env | grep DATABASE_URL

# Verify variable export
echo $DATABASE_URL

Permission denied:

# Check file permissions
ls -la .dbsurveyor.toml
chmod 644 .dbsurveyor.toml  # Make readable

Debug Configuration

# Show all configuration sources
dbsurveyor-collect --show-config --verbose

# Test configuration parsing
dbsurveyor-collect --check-config --verbose

# Trace configuration loading
export RUST_LOG=dbsurveyor_core::config=trace
dbsurveyor-collect --dry-run

Migration and Upgrades

Configuration Migration

When upgrading DBSurveyor versions:

# Backup existing configuration
cp .dbsurveyor.toml .dbsurveyor.toml.backup

# Check for configuration changes
dbsurveyor-collect --check-config --verbose

# Migrate configuration format
dbsurveyor-collect --migrate-config

Version Compatibility

# .dbsurveyor.toml
[meta]
# Specify minimum required version
min_version = "0.1.0"
config_version = "1.0"

This comprehensive configuration system allows you to customize DBSurveyor for your specific needs while maintaining security and performance.

CLI Reference

Complete reference for all DBSurveyor command-line options.

dbsurveyor-collect

Database schema collection tool.

Synopsis

dbsurveyor-collect [OPTIONS] [DATABASE_URL]
dbsurveyor-collect <COMMAND>

Global Options

Option     | Short | Description
--verbose  | -v    | Increase verbosity (-v, -vv, -vvv)
--quiet    | -q    | Suppress all output except errors
--help     | -h    | Print help information
--version  | -V    | Print version information

Collection Options

Option                     | Description                                  | Default                   | Status
--database-url <URL>       | Database connection string                   | From DATABASE_URL env var | ✅ Implemented
--output <PATH>            | Output file path                             | schema.dbsurveyor.json    | ✅ Implemented
--sample <N>               | Number of sample rows per table              | 100                       | 🚧 Planned
--throttle <MS>            | Delay between operations (ms)                | None                      | 🚧 Planned
--compress                 | Compress output using Zstandard              | false                     | ✅ Implemented
--encrypt                  | Encrypt output using AES-GCM                 | false                     | ✅ Implemented
--enable-quality           | Enable data quality analysis on sampled data | false                     | ✅ Implemented
--quality-threshold <LIST> | Quality thresholds (e.g., completeness:0.9,uniqueness:0.95,consistency:0.85) | None | ✅ Implemented
--all-databases            | Collect all accessible databases             | false                     | 🚧 Planned
--include-system-databases | Include system databases                     | false                     | 🚧 Planned
--exclude-databases <LIST> | Comma-separated list to exclude              | None                      | 🚧 Planned

Commands

collect

Collect schema from database.

dbsurveyor-collect collect <DATABASE_URL> [--output <PATH>]

test

Test database connection without collecting schema.

dbsurveyor-collect test <DATABASE_URL>

list

List supported database types and connection formats.

dbsurveyor-collect list

Examples

# Basic collection
dbsurveyor-collect postgres://user:pass@localhost/db

# With custom output file
dbsurveyor-collect --output my-schema.json postgres://localhost/db

# Encrypted and compressed
dbsurveyor-collect --encrypt --compress postgres://localhost/db

# Multi-database collection
dbsurveyor-collect --all-databases --exclude-databases postgres,template0 postgres://localhost

# Test connection only
dbsurveyor-collect test mysql://root:password@localhost/mydb

# Throttled collection (stealth mode)
dbsurveyor-collect --throttle 1000 postgres://localhost/db

Connection String Formats

Database   | Format                            | Example                                     | Status
PostgreSQL | postgres://user:pass@host:port/db | postgres://admin:secret@localhost:5432/mydb | ✅ Implemented
SQLite     | sqlite:///path/to/file            | sqlite:///home/user/data.db                 | ✅ Implemented
MySQL      | mysql://user:pass@host:port/db    | mysql://root:password@localhost:3306/mydb   | 🚧 In Development
MongoDB    | mongodb://user:pass@host:port/db  | mongodb://admin:secret@localhost:27017/mydb | 🚧 Planned
SQL Server | mssql://user:pass@host:port/db    | mssql://sa:password@localhost:1433/mydb     | 🚧 Planned

Environment Variables

Variable                        | Description
DATABASE_URL                    | Default database connection string
RUST_LOG                        | Logging configuration (error, warn, info, debug, trace)
DBSURVEYOR_MAX_CONNECTIONS      | Maximum connection pool size (default: 10)
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2)
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30)
DBSURVEYOR_IDLE_TIMEOUT_SECS    | Idle connection timeout in seconds (default: 600)
DBSURVEYOR_MAX_LIFETIME_SECS    | Maximum connection lifetime in seconds (default: 3600)

dbsurveyor

Database schema documentation and analysis tool.

Synopsis

dbsurveyor [OPTIONS] [INPUT_FILE]
dbsurveyor <COMMAND>

Global Options

Option     | Short | Description
--verbose  | -v    | Increase verbosity (-v, -vv, -vvv)
--quiet    | -q    | Suppress all output except errors
--help     | -h    | Print help information
--version  | -V    | Print version information

Documentation Options

Option               | Short | Description                | Default
--format <FORMAT>    | -f    | Output format              | markdown
--output <PATH>      | -o    | Output file path           | Auto-detected
--redact-mode <MODE> |       | Data redaction level       | balanced
--no-redact          |       | Disable all data redaction | false

Output Formats

Format   | Description             | Extension | Status
markdown | Markdown documentation  | .md       | ✅ Implemented
json     | JSON analysis report    | .json     | ✅ Implemented
html     | HTML report with search | .html     | 🚧 Placeholder
mermaid  | Mermaid ERD diagram     | .mmd      | 🚧 Placeholder

Redaction Modes

Mode         | Description
none         | No redaction (show all data)
minimal      | Minimal redaction (only obvious sensitive fields)
balanced     | Balanced redaction (recommended default)
conservative | Conservative redaction (maximum privacy)

Commands

generate

Generate documentation from schema file.

dbsurveyor generate <INPUT_FILE> [OPTIONS]

Options:

  • --format <FORMAT> - Output format
  • --output <PATH> - Output file path

analyze

Analyze schema for insights and statistics.

dbsurveyor analyze <INPUT_FILE> [--detailed]

Options:

  • --detailed - Show detailed analysis statistics

sql

Reconstruct SQL DDL from schema.

dbsurveyor sql <INPUT_FILE> [OPTIONS]

Options:

  • --dialect <DIALECT> - Target SQL dialect (default: postgresql)
  • --output <PATH> - Output file path

SQL Dialects:

  • postgresql - PostgreSQL dialect (🚧 Placeholder)
  • mysql - MySQL dialect (🚧 Placeholder)
  • sqlite - SQLite dialect (🚧 Placeholder)
  • sqlserver - SQL Server dialect (🚧 Placeholder)
  • generic - Generic SQL (ANSI standard) (🚧 Placeholder)

validate

Validate schema file format.

dbsurveyor validate <INPUT_FILE>

Examples

# Generate Markdown documentation
dbsurveyor generate schema.dbsurveyor.json

# Generate HTML report
dbsurveyor --format html --output report.html schema.json

# Process encrypted schema (will prompt for password)
dbsurveyor generate schema.enc

# Generate SQL DDL for MySQL
dbsurveyor sql schema.json --dialect mysql --output recreate.sql

# Analyze schema with detailed statistics
dbsurveyor analyze schema.json --detailed

# Validate schema file format
dbsurveyor validate schema.dbsurveyor.json

# Generate with conservative redaction
dbsurveyor --redact-mode conservative schema.json

# Generate without any redaction
dbsurveyor --no-redact schema.json

Input File Formats

DBSurveyor automatically detects input file formats:

Extension | Format     | Description
.json     | JSON       | Standard schema format
.zst      | Compressed | Zstandard compressed JSON
.enc      | Encrypted  | AES-GCM encrypted JSON
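
Extension-based detection can be sketched in a few lines of Python (illustrative; the real tool may also inspect file contents):

```python
from pathlib import Path

# Map the documented file extensions to their formats.
FORMATS = {".json": "json", ".zst": "compressed", ".enc": "encrypted"}

def detect_format(path: str) -> str:
    suffix = Path(path).suffix  # only the final extension matters
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unrecognized schema file extension: {suffix}")

print(detect_format("schema.dbsurveyor.json.zst"))
```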

Exit Codes

Code | Description
0    | Success
1    | General error

Note: Both binaries currently use exit code 1 for all error conditions. Granular exit codes (e.g., file-not-found, connection failure) may be added in a future release.

Environment Variables

Variable                        | Description
RUST_LOG                        | Logging configuration
NO_COLOR                        | Disable colored output
DBSURVEYOR_MAX_CONNECTIONS      | Maximum connection pool size (default: 10)
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2)
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30)
DBSURVEYOR_IDLE_TIMEOUT_SECS    | Idle connection timeout in seconds (default: 600)
DBSURVEYOR_MAX_LIFETIME_SECS    | Maximum connection lifetime in seconds (default: 3600)

Common Usage Patterns

Secure Workflow

# 1. Test connection
dbsurveyor-collect test postgres://user:pass@localhost/db

# 2. Collect with encryption
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db

# 3. Generate documentation offline
dbsurveyor generate schema.enc

# 4. Validate output
dbsurveyor validate schema.enc

Multi-Database Documentation

# Collect from multiple databases
dbsurveyor-collect --all-databases postgres://localhost > collection.log

# Generate comprehensive report
dbsurveyor --format html --output full-report.html schema.dbsurveyor.json

# Extract SQL for specific database
dbsurveyor sql schema.json --dialect postgresql --output postgres-ddl.sql

Development Workflow

# Quick collection and documentation
dbsurveyor-collect sqlite://dev.db && dbsurveyor generate schema.dbsurveyor.json

# Analyze changes
dbsurveyor analyze schema.dbsurveyor.json --detailed

# Generate multiple formats
for format in markdown html mermaid; do
    dbsurveyor --format $format schema.dbsurveyor.json
done

Output Formats

DBSurveyor generates structured output in multiple formats to support different use cases and workflows.

Schema File Formats

Standard JSON Format (.dbsurveyor.json)

The primary output format is a JSON file following the DBSurveyor Schema v1.0 specification:

{
  "format_version": "1.0",
  "database_info": {
    "name": "production_db",
    "version": "13.7",
    "size_bytes": 1073741824,
    "encoding": "UTF8",
    "collation": "en_US.UTF-8",
    "access_level": "Full",
    "collection_status": "Success"
  },
  "tables": [
    {
      "name": "users",
      "schema": "public",
      "columns": [
        {
          "name": "id",
          "data_type": {
            "Integer": {
              "bits": 32,
              "signed": true
            }
          },
          "is_nullable": false,
          "is_primary_key": true,
          "is_auto_increment": true,
          "ordinal_position": 1
        }
      ],
      "primary_key": {
        "name": "users_pkey",
        "columns": [
          "id"
        ]
      },
      "foreign_keys": [],
      "indexes": [],
      "constraints": []
    }
  ],
  "views": [],
  "indexes": [],
  "constraints": [],
  "procedures": [],
  "functions": [],
  "triggers": [],
  "custom_types": [],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}

Table Sampling Data

Tables may optionally include a sampling field containing sampled row data with metadata:

{
  "table_name": "users",
  "schema_name": "public",
  "rows": [
    {
      "id": 1,
      "username": "alice",
      "email": "alice@example.com"
    },
    {
      "id": 2,
      "username": "bob",
      "email": "bob@example.com"
    }
  ],
  "sample_size": 2,
  "total_rows": 1234,
  "sampling_strategy": {
    "MostRecent": {
      "limit": 10
    }
  },
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}

Sample Status Field

The optional sample_status field indicates the outcome of the sampling operation:

  • "Complete": Sampling completed successfully with no issues
  • {"PartialRetry": {"original_limit": 10}}: Sampling partially completed with a reduced row limit (may be retried)
  • {"Skipped": {"reason": "table too large"}}: Sampling was skipped or not performed

Backward Compatibility: The sample_status field is optional and omitted when not set, ensuring older JSON files without this field remain valid.
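
Consumers can branch on the three documented variants while tolerating the field's absence. A Python sketch (hypothetical helper, not part of DBSurveyor):

```python
def describe_sample_status(sampling: dict) -> str:
    """Interpret the optional sample_status field from a sampling record."""
    status = sampling.get("sample_status")
    if status is None:
        return "not recorded (older file)"
    if status == "Complete":
        return "complete"
    if "PartialRetry" in status:
        return f"partial (original limit {status['PartialRetry']['original_limit']})"
    if "Skipped" in status:
        return f"skipped: {status['Skipped']['reason']}"
    return "unknown"

print(describe_sample_status({"sample_status": {"Skipped": {"reason": "table too large"}}}))
```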

Compressed Format (.dbsurveyor.json.zst)

Large schema files can be compressed using Zstandard compression:

# Generate compressed output
dbsurveyor-collect --compress postgres://localhost/large_db

# Postprocessor automatically detects and decompresses
dbsurveyor generate schema.dbsurveyor.json.zst

Benefits:

  • 60-80% size reduction for large schemas
  • Fast compression/decompression
  • Maintains full data integrity

Encrypted Format (.enc)

Sensitive schema data can be encrypted using AES-GCM-256:

# Generate encrypted output (prompts for password)
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Postprocessor prompts for decryption password
dbsurveyor generate schema.enc

Security Features:

  • AES-GCM-256 authenticated encryption
  • Argon2id key derivation (64 MiB memory, 3 iterations)
  • Random 96-bit nonces (never reused)
  • Embedded KDF parameters for decryption

Encrypted File Structure:

{
  "algorithm": "AES-GCM-256",
  "nonce": [/* 12 bytes */],
  "ciphertext": [/* encrypted data */],
  "auth_tag": [/* 16 bytes */],
  "kdf_params": {
    "salt": [/* 16 bytes */],
    "memory_cost": 65536,
    "time_cost": 3,
    "parallelism": 4,
    "version": "1.3"
  }
}
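
The documented field sizes (12-byte nonce, 16-byte tag, 16-byte salt) allow a structural sanity check before attempting decryption. A Python sketch (layout validation only, not cryptographic verification):

```python
def validate_encrypted_file(doc: dict) -> list[str]:
    """Check the encrypted container's structure against documented sizes."""
    problems = []
    if doc.get("algorithm") != "AES-GCM-256":
        problems.append("unexpected algorithm")
    if len(doc.get("nonce", [])) != 12:
        problems.append("nonce must be 12 bytes (96 bits)")
    if len(doc.get("auth_tag", [])) != 16:
        problems.append("auth tag must be 16 bytes")
    if len(doc.get("kdf_params", {}).get("salt", [])) != 16:
        problems.append("salt must be 16 bytes")
    return problems

sample = {
    "algorithm": "AES-GCM-256",
    "nonce": [0] * 12,
    "ciphertext": [1, 2, 3],
    "auth_tag": [0] * 16,
    "kdf_params": {"salt": [0] * 16, "memory_cost": 65536, "time_cost": 3, "parallelism": 4},
}
print(validate_encrypted_file(sample))
```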

Documentation Formats

Markdown Documentation (.md)

Status: βœ… Implemented

Basic Markdown documentation with schema overview:

dbsurveyor generate schema.json --format markdown

Output Example:

# Database Schema: production_db

Generated by DBSurveyor v1.0.0
Collection Date: 2024-01-15 10:30:00 UTC

## Summary

- **Tables**: 25
- **Views**: 3
- **Indexes**: 47

## Tables

### users
- **Schema**: public
- **Columns**: 5
- **Primary Key**: id

JSON Analysis Report (.json)

Status: βœ… Implemented

Structured analysis data for programmatic consumption:

dbsurveyor analyze schema.json --detailed

Output Example:

{
  "database_name": "production_db",
  "table_count": 25,
  "view_count": 3,
  "index_count": 47,
  "constraint_count": 32,
  "collection_date": "2024-01-15T10:30:00Z",
  "analysis": {
    "largest_table": "audit_log",
    "most_indexes": "users",
    "foreign_key_relationships": 18
  }
}

HTML Report (.html)

Status: 🚧 Placeholder Implementation

Interactive HTML reports with search and navigation:

dbsurveyor generate schema.json --format html

Planned Features:

  • Interactive table browser
  • Search functionality
  • Relationship visualization
  • Responsive design for mobile/desktop
  • Offline-compatible (no external dependencies)

Mermaid ERD (.mmd)

Status: 🚧 Placeholder Implementation

Entity Relationship Diagrams using Mermaid syntax:

dbsurveyor generate schema.json --format mermaid

Planned Output:

erDiagram
    users {
        int id PK
        string username
        string email UK
        datetime created_at
    }

    orders {
        int id PK
        int user_id FK
        decimal total
        datetime created_at
    }

    users ||--o{ orders : "has many"

SQL DDL (.sql)

Status: 🚧 Placeholder Implementation

Reconstructed SQL DDL for schema recreation:

dbsurveyor sql schema.json --dialect postgresql --output recreate.sql

Planned Features:

  • Multiple SQL dialect support (PostgreSQL, MySQL, SQLite, SQL Server)
  • Complete schema recreation scripts
  • Data type mapping between databases
  • Constraint and index recreation
  • Migration script generation

Unified Data Type System

DBSurveyor uses a unified type system to represent data types across different databases:

Basic Types

{
  "String": {
    "max_length": 255
  },
  "Integer": {
    "bits": 32,
    "signed": true
  },
  "Float": {
    "precision": 53
  },
  "Boolean": null,
  "Date": null,
  "DateTime": {
    "with_timezone": true
  },
  "Time": {
    "with_timezone": false
  }
}

Advanced Types

{
  "Binary": {
    "max_length": 1024
  },
  "Json": null,
  "Uuid": null,
  "Array": {
    "element_type": {
      "String": {
        "max_length": 50
      }
    }
  },
  "Custom": {
    "type_name": "user_status_enum"
  }
}

Database-Specific Mapping

DBSurveyor Type | PostgreSQL      | MySQL         | SQLite  | MongoDB
String          | VARCHAR, TEXT   | VARCHAR, TEXT | TEXT    | string
Integer         | INTEGER, BIGINT | INT, BIGINT   | INTEGER | int, long
Float           | REAL, DOUBLE    | FLOAT, DOUBLE | REAL    | double
Boolean         | BOOLEAN         | BOOLEAN       | INTEGER | bool
Json            | JSON, JSONB     | JSON          | TEXT    | object
Array           | ARRAY[]         | JSON          | TEXT    | array
Custom          | ENUM, DOMAIN    | ENUM          | N/A     | N/A
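
A unified type is encoded as a single-key object, {"TypeName": params-or-null}, so mapping it to a concrete dialect is a lookup plus parameter handling. A simplified Python sketch for the PostgreSQL column (hypothetical helper; the real mapper handles many more cases):

```python
# Partial unified-type -> PostgreSQL mapping, following the table above.
PG_TYPES = {
    "String": "TEXT",
    "Integer": "INTEGER",
    "Float": "DOUBLE PRECISION",
    "Boolean": "BOOLEAN",
    "Json": "JSONB",
}

def to_postgres(unified: dict) -> str:
    # Unpack the single {"TypeName": params} entry.
    (name, params), = unified.items()
    if name == "Integer" and params and params.get("bits") == 64:
        return "BIGINT"
    if name == "String" and params and params.get("max_length"):
        return f"VARCHAR({params['max_length']})"
    return PG_TYPES.get(name, "TEXT")

print(to_postgres({"String": {"max_length": 255}}))
```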

Schema Validation

All output files are validated against the DBSurveyor JSON Schema v1.0:

Validation Features

  • Format Version Checking: Ensures compatibility
  • Required Field Validation: All mandatory fields present
  • Data Type Validation: Correct type structure
  • Security Validation: No credential fields allowed
  • Relationship Validation: Foreign key consistency

Validation Errors

# Validate a schema file
dbsurveyor validate schema.json

# Example validation error
Error: Schema validation failed
- Missing required field: collection_metadata.collected_at
- Invalid data type: expected Integer, found String
- Security violation: credential field detected in column default_value

File Size Considerations

Compression Recommendations

Schema Size | Recommendation         | Expected Compression
< 1 MB      | Standard JSON          | N/A
1-10 MB     | Zstandard compression  | 60-70% reduction
> 10 MB     | Compression + chunking | 70-80% reduction
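
The size thresholds above translate directly into a selection rule; a small Python sketch (illustrative, not a DBSurveyor API):

```python
MB = 1024 * 1024

def compression_recommendation(size_bytes: int) -> str:
    """Pick a handling strategy per the size table above."""
    if size_bytes < 1 * MB:
        return "standard JSON"
    if size_bytes <= 10 * MB:
        return "zstd compression"
    return "compression + chunking"

print(compression_recommendation(50 * MB))
```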

Large Schema Handling

For very large databases (1000+ tables):

# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Consider sampling for development
dbsurveyor-collect --sample 10 postgres://localhost/large_db

Security Considerations

Credential Protection

All output formats guarantee:

  • No database credentials in any output file
  • Connection strings sanitized in metadata
  • Error messages sanitized
  • Validation prevents credential leakage

Encryption Best Practices

# Use strong passwords for encryption
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Verify encrypted file integrity
dbsurveyor validate schema.enc

Airgap Compatibility

All formats work completely offline:

  • No external dependencies in generated files
  • Self-contained documentation
  • Local-only processing
  • No network calls during generation

Format Evolution

Version Compatibility

  • v1.0: Current format with full backward compatibility
  • Future versions: Additive changes only
  • Migration tools: Automatic format upgrades
  • Deprecation policy: 2-version support window

Schema Extensions

The format supports extensions for:

  • Database-specific features
  • Custom metadata
  • Plugin-generated data
  • Performance metrics

Integration Examples

CI/CD Pipeline

# Generate schema documentation in CI
  - name: Collect Database Schema
    run: |
      dbsurveyor-collect --compress $DATABASE_URL
      dbsurveyor generate schema.dbsurveyor.json.zst --format markdown

  - name: Upload Documentation
    uses: actions/upload-artifact@v3
    with:
      name: database-docs
      path: schema.md

Programmatic Access

import json

# Load schema data
with open("schema.dbsurveyor.json") as f:
    schema = json.load(f)

# Analyze tables
for table in schema["tables"]:
    print(f"Table: {table['name']}")
    print(f"Columns: {len(table['columns'])}")
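
Foreign-key metadata in the same file can be inverted into a "referenced by" index, which is useful for impact analysis. A sketch with illustrative inline data (the schema dict mirrors the documented table structure):

```python
from collections import defaultdict

# Illustrative schema fragment following the documented structure.
schema = {
    "tables": [
        {"name": "users", "foreign_keys": []},
        {"name": "orders", "foreign_keys": [
            {"name": "orders_user_fk", "columns": ["user_id"],
             "referenced_table": "users", "referenced_columns": ["id"]},
        ]},
    ]
}

# Map each table to the tables whose foreign keys point at it.
referenced_by = defaultdict(list)
for table in schema["tables"]:
    for fk in table.get("foreign_keys", []):
        referenced_by[fk["referenced_table"]].append(table["name"])

print(dict(referenced_by))  # {'users': ['orders']}
```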

Documentation Automation

#!/bin/bash
# Automated documentation generation

# Collect from multiple environments
dbsurveyor-collect --output prod.json $PROD_DB_URL
dbsurveyor-collect --output staging.json $STAGING_DB_URL

# Generate documentation
dbsurveyor generate prod.json --format html --output prod-schema.html
dbsurveyor generate staging.json --format html --output staging-schema.html

# Compare schemas (future feature)
# dbsurveyor compare prod.json staging.json --output diff.html

This comprehensive output format system ensures that DBSurveyor can adapt to various workflows while maintaining security and compatibility guarantees.

JSON Schema Specification

The .dbsurveyor.json format is the standard output format for DBSurveyor schema collection. This specification documents the complete structure, validation rules, and usage examples for the schema format.

Overview

The .dbsurveyor.json format provides a comprehensive, validated representation of database schemas with the following characteristics:

  • Security-First: No credential fields or sensitive data allowed
  • Validation-Complete: Full JSON Schema validation ensures data integrity
  • Database-Agnostic: Unified representation across PostgreSQL, MySQL, SQLite, and MongoDB
  • Version-Aware: Format versioning for backward compatibility
  • Frictionless-Compatible: Based on Frictionless Data Table Schema specification

Schema Structure

Root Object

Every .dbsurveyor.json file contains these required fields:

{
  "format_version": "1.0",
  "database_info": { /* Database metadata */ },
  "tables": [ /* Table definitions */ ],
  "collection_metadata": { /* Collection process info */ }
}

Required Fields

Field | Type | Description
format_version | String | Schema format version (currently "1.0")
database_info | Object | Database-level information and status
collection_metadata | Object | Collection process metadata
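
A loader can verify these required root fields up front before doing any deeper processing. A minimal sketch (the helper is illustrative, not part of DBSurveyor):

```python
import json

REQUIRED_ROOT_FIELDS = ("format_version", "database_info", "collection_metadata")

def missing_fields(document: dict) -> list:
    """Return the required root fields absent from a schema document."""
    return [field for field in REQUIRED_ROOT_FIELDS if field not in document]

doc = json.loads('{"format_version": "1.0", "database_info": {}}')
print(missing_fields(doc))  # ['collection_metadata']
```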

Optional Fields

Field | Type | Description
tables | Array | Table definitions (default: empty array)
views | Array | View definitions
indexes | Array | Database indexes
constraints | Array | Database constraints
procedures | Array | Stored procedures
functions | Array | Database functions
triggers | Array | Database triggers
custom_types | Array | Custom data types
samples | Array | Data samples from tables

Database Information

The database_info object contains essential database metadata:

{
  "name": "production_db",
  "version": "13.7",
  "size_bytes": 1073741824,
  "encoding": "UTF8",
  "collation": "en_US.UTF-8",
  "owner": "dbadmin",
  "is_system_database": false,
  "access_level": "Full",
  "collection_status": "Success"
}

Access Levels

  • Full: Complete schema access with all metadata
  • Limited: Partial access due to permission constraints
  • None: No access to schema information

Collection Status

  • "Success": Schema collected successfully
  • {"Failed": {"error": "Permission denied"}}: Collection failed with reason
  • {"Skipped": {"reason": "System database"}}: Database skipped with explanation
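
Because the status serializes either as a bare string or as a single-key object, readers must handle both shapes. A Python sketch (the function and its output format are ours):

```python
def describe_status(status) -> str:
    """Normalize a collection_status value to a readable string.

    The value is either a bare string ("Success") or a single-key
    object whose payload carries an error or reason.
    """
    if isinstance(status, str):
        return status
    ((variant, details),) = status.items()
    detail = details.get("error") or details.get("reason") or ""
    return f"{variant}: {detail}" if detail else variant

print(describe_status({"Failed": {"error": "Permission denied"}}))
# Failed: Permission denied
```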

Table Structure

Tables are defined with comprehensive metadata:

{
  "name": "users",
  "schema": "public",
  "columns": [
    {
      "name": "id",
      "data_type": {
        "Integer": {
          "bits": 32,
          "signed": true
        }
      },
      "is_nullable": false,
      "is_primary_key": true,
      "is_auto_increment": true,
      "ordinal_position": 1,
      "comment": "Unique user identifier"
    },
    {
      "name": "email",
      "data_type": {
        "String": {
          "max_length": 255
        }
      },
      "is_nullable": false,
      "ordinal_position": 2,
      "comment": "User email address"
    }
  ],
  "primary_key": {
    "name": "users_pkey",
    "columns": [
      "id"
    ]
  },
  "foreign_keys": [
    {
      "name": "users_profile_fk",
      "columns": [
        "profile_id"
      ],
      "referenced_table": "profiles",
      "referenced_columns": [
        "id"
      ],
      "on_delete": "Cascade",
      "on_update": "Cascade"
    }
  ],
  "indexes": [
    {
      "name": "users_email_idx",
      "columns": [
        {
          "name": "email",
          "order": "asc"
        }
      ],
      "is_unique": true,
      "index_type": "btree"
    }
  ],
  "comment": "User account information",
  "row_count": 50000
}

Data Type System

DBSurveyor uses a unified data type system that maps database-specific types to a common representation:

Basic Types

"Boolean"                    // Boolean values
"Date"                       // Date without time
"Json"                       // JSON data
"Uuid"                       // UUID/GUID values

String Types

{
  "String": {
    "max_length": 255        // Maximum character length
  }
}

Numeric Types

{
  "Integer": {
    "bits": 32,              // Bit size: 8, 16, 32, 64, 128
    "signed": true           // Signed or unsigned
  }
}

{
  "Float": {
    "precision": 53          // Floating point precision (1-53)
  }
}

Date/Time Types

{
  "DateTime": {
    "with_timezone": true    // Includes timezone information
  }
}

{
  "Time": {
    "with_timezone": false   // Time without timezone
  }
}

Complex Types

{
  "Array": {
    "element_type": "String" // Array element type
  }
}

{
  "Binary": {
    "max_length": 1024       // Maximum binary length
  }
}

{
  "Custom": {
    "type_name": "geometry"  // Database-specific custom type
  }
}
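
When generating reports, these tagged type objects usually need short readable names. A Python sketch covering a few representative variants (the rendered spellings are our own convention, not part of the format):

```python
def render_type(data_type) -> str:
    """Render a unified data_type value as a short type name.

    Basic types serialize as bare strings; parameterized types as a
    single-key object, e.g. {"Integer": {"bits": 32, "signed": true}}.
    """
    if isinstance(data_type, str):
        return data_type.lower()
    ((name, params),) = data_type.items()
    if name == "Integer":
        prefix = "int" if params.get("signed", True) else "uint"
        return f"{prefix}{params['bits']}"
    if name == "String":
        return f"varchar({params['max_length']})"
    if name == "Custom":
        return params["type_name"]
    return name.lower()  # fallback: DateTime, Array, Binary, ...

print(render_type({"Integer": {"bits": 32, "signed": True}}))  # int32
```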

Constraints and Relationships

Primary Keys

{
  "name": "users_pkey",
  "columns": [
    "id"
  ]
}

Foreign Keys

{
  "name": "orders_user_fk",
  "columns": [
    "user_id"
  ],
  "referenced_table": "users",
  "referenced_schema": "public",
  "referenced_columns": [
    "id"
  ],
  "on_delete": "Cascade",
  "on_update": "Restrict"
}

Referential Actions:

  • Cascade: Delete/update cascades to related records
  • SetNull: Set foreign key to NULL
  • SetDefault: Set foreign key to default value
  • Restrict: Prevent deletion/update if references exist
  • NoAction: No automatic action

Check Constraints

{
  "name": "users_age_check",
  "constraint_type": "Check",
  "definition": "age >= 0 AND age <= 150",
  "enforced": true
}

Indexes

Index definitions include performance characteristics:

{
  "name": "users_email_idx",
  "table_name": "users",
  "schema": "public",
  "columns": [
    {
      "name": "email",
      "order": "asc",
      "nulls_order": "last"
    }
  ],
  "is_unique": true,
  "is_primary": false,
  "index_type": "btree",
  "comment": "Unique index on email for fast lookups"
}

Index Types:

  • btree: Balanced tree (default)
  • hash: Hash-based index
  • gin: Generalized inverted index
  • gist: Generalized search tree
  • spgist: Space-partitioned GiST

Data Sampling

Optional data samples provide insight into actual data:

{
  "samples": [
    {
      "table_name": "users",
      "schema_name": "public",
      "rows": [
        {
          "id": 1001,
          "email": "user1001@example.com",
          "created_at": "2024-01-15T09:00:00Z"
        },
        {
          "id": 1002,
          "email": "user1002@example.com",
          "created_at": "2024-01-15T09:15:00Z"
        }
      ],
      "sample_size": 2,
      "total_rows": 50000,
      "sampling_strategy": {
        "MostRecent": {
          "limit": 10
        }
      },
      "collected_at": "2024-01-15T10:30:00Z",
      "warnings": [
        "Large table - limited sample collected"
      ],
      "sample_status": "Complete"
    }
  ]
}

Table Sample Fields

Field | Type | Required | Description
table_name | String | Yes | Name of the sampled table
schema_name | String | No | Schema/database name (null for databases without schemas)
rows | Array | Yes | Array of sampled row data as JSON objects
sample_size | Integer | Yes | Number of rows actually sampled
total_rows | Integer | No | Estimated total row count in the table
sampling_strategy | Object/String | Yes | Strategy used for sampling (see below)
collected_at | String | Yes | ISO 8601 timestamp of when the sample was collected
warnings | Array | Yes | Array of warning messages (empty if no warnings)
sample_status | String/Object | No | Status of the sampling operation (see below)

Sampling Strategies

  • {"MostRecent": {"limit": 10}}: Latest N rows
  • {"Random": {"limit": 100}}: Random sample of N rows
  • "None": No sampling performed

Sample Status

The optional sample_status field tracks the outcome of the sampling operation. This field is backward-compatible and will be omitted when not set.

Status Values:

  • "Complete": Sampling completed successfully
  • {"PartialRetry": {"original_limit": 100}}: Sampling partially completed with a reduced limit due to constraints or errors
  • {"Skipped": {"reason": "Not implemented for this database"}}: Sampling was skipped with an explanation

Example with Complete Status:

{
  "table_name": "orders",
  "schema_name": "public",
  "rows": [...],
  "sample_size": 100,
  "sampling_strategy": {"MostRecent": {"limit": 100}},
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}

Example with Skipped Status:

{
  "table_name": "large_archive",
  "schema_name": null,
  "rows": [],
  "sample_size": 0,
  "sampling_strategy": "None",
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": {
    "Skipped": {
      "reason": "Table exceeds size threshold"
    }
  }
}

Backward Compatibility Note: The sample_status field is optional and omitted when not explicitly set. Older schema files without this field remain fully compatible.
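
In code, the compatibility note above amounts to treating an absent sample_status as "no outcome recorded". A Python sketch (helper name and return values are ours):

```python
def sample_outcome(sample: dict) -> str:
    """Classify a table sample, tolerating files written before the
    sample_status field existed (the field is simply absent there)."""
    status = sample.get("sample_status")
    if status is None:
        return "unknown"  # older file: no outcome was recorded
    if isinstance(status, str):
        return status.lower()  # "Complete" -> "complete"
    ((variant, _details),) = status.items()
    return variant.lower()  # "Skipped", "PartialRetry", ...
```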

Multi-Database Collections

For server-level collections, the format supports multiple databases:

{
  "format_version": "1.0",
  "server_info": {
    "server_type": "PostgreSQL",
    "version": "13.7",
    "host": "localhost",
    "port": 5432,
    "total_databases": 5,
    "collected_databases": 3,
    "system_databases_excluded": 2,
    "connection_user": "dbadmin",
    "has_superuser_privileges": true,
    "collection_mode": {
      "MultiDatabase": {
        "discovered": 5,
        "collected": 3,
        "failed": 0
      }
    }
  },
  "databases": [
    /* Individual database schemas */
  ],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 2500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}
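
A server-level file can be summarized per database by inspecting each entry's collection_status. A sketch with illustrative inline data following the structure above:

```python
# Illustrative server-level document (abbreviated).
server = {
    "server_info": {"server_type": "PostgreSQL", "total_databases": 5},
    "databases": [
        {"database_info": {"name": "app",
                           "collection_status": "Success"}},
        {"database_info": {"name": "legacy",
                           "collection_status": {"Failed": {"error": "Permission denied"}}}},
    ],
}

# Names of databases whose schemas were collected successfully.
succeeded = [
    db["database_info"]["name"]
    for db in server["databases"]
    if db["database_info"]["collection_status"] == "Success"
]
print(succeeded)  # ['app']
```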

Collection Metadata

Every schema file includes metadata about the collection process:

{
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": [
      "Large table 'audit_logs' - collection took 45 seconds",
      "Custom type 'geometry' not fully supported"
    ],
    "collector_options": {
      "include_system_tables": false,
      "sample_data": true,
      "max_sample_size": 1000
    }
  }
}

Validation Rules

Security Validation

The schema enforces strict security rules:

  • No credential fields: Field names cannot contain password, secret, token, etc.
  • No connection strings: Database URLs are automatically filtered
  • No sensitive patterns: Common credential patterns are rejected
  • Sanitized output: All error messages are credential-free
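
An external check can mirror the credential-field rule when post-processing schema files. A Python sketch (the marker list is illustrative and intentionally incomplete; DBSurveyor's built-in validation is authoritative):

```python
CREDENTIAL_MARKERS = ("password", "secret", "token", "apikey")

def find_credential_fields(value, path="$"):
    """Recursively flag field names matching common credential patterns.

    Returns JSONPath-like locations of offending keys anywhere in the
    document, including inside nested objects and arrays.
    """
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            if any(marker in key.lower() for marker in CREDENTIAL_MARKERS):
                hits.append(f"{path}.{key}")
            hits.extend(find_credential_fields(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            hits.extend(find_credential_fields(child, f"{path}[{i}]"))
    return hits

print(find_credential_fields({"tables": [{"name": "t", "api_token": "x"}]}))
# ['$.tables[0].api_token']
```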

Data Validation

  • String lengths: Maximum 255 characters for names, 1000 for comments
  • Array limits: Maximum 1000 items in arrays
  • Nesting depth: Maximum 10 levels of object nesting
  • File size: Maximum 100MB per schema file

Type Validation

  • Required fields: All mandatory fields must be present
  • Data type consistency: Values must match declared types
  • Constraint validation: Foreign keys must reference valid tables
  • Index validation: Index columns must exist in referenced table

Usage Examples

Basic Schema Collection

# Collect PostgreSQL schema
dbsurveyor-collect postgres://user:pass@localhost/mydb --output schema.json

# Validate the output
dbsurveyor schema schema.json --validate

# Generate documentation
dbsurveyor schema schema.json --format markdown --output schema.md

Multi-Database Collection

# Collect all databases on server
dbsurveyor-collect postgres://admin:pass@localhost --all-databases --output server_schema.json

# Process specific database
dbsurveyor schema server_schema.json --database app_db --format json --output app_schema.json

Schema Validation

# Validate against JSON Schema
dbsurveyor validate schema.json

# Check for specific issues
dbsurveyor validate schema.json --check-security --check-constraints

Error Handling

Validation Errors

When validation fails, DBSurveyor provides detailed error information:

{
  "validation_errors": [
    {
      "path": "/tables/0/columns/1/data_type",
      "message": "Invalid data type: expected String, Integer, Boolean, Date, Json, Uuid, Array, Binary, Custom, or DateTime",
      "value": "VARCHAR",
      "suggestion": "Use {\"String\": {\"max_length\": 255}} instead"
    }
  ]
}

Collection Warnings

Warnings are included in the metadata for non-critical issues:

{
  "warnings": [
    "Large table 'audit_logs' (1.2M rows) - collection took 45 seconds",
    "Custom type 'geometry' not fully supported - using Custom type",
    "Table 'temp_users' appears to be temporary - may not persist"
  ]
}

Version Compatibility

Current Version: 1.0

  • Format: Stable and fully supported
  • Validation: Complete JSON Schema validation
  • Features: All documented features available
  • Backward Compatibility: N/A (first version)

Future Versions

The schema is designed for evolution:

  1. Additive Changes: New optional fields can be added
  2. Version Detection: Format version enables version-specific handling
  3. Migration Support: Tools will support upgrading between versions
  4. Deprecation Path: Old fields will be marked before removal

Integration

Documentation Tools

The schema format integrates with all DBSurveyor tools:

  • dbsurveyor-collect: Generates schema files
  • dbsurveyor: Processes and validates schemas
  • dbsurveyor-docs: Generates documentation from schemas
  • dbsurveyor-validate: Standalone validation tool

External Tools

The format is compatible with:

  • JSON Schema validators: jsonschema, ajv, etc.
  • Data analysis tools: pandas, jq, etc.
  • Documentation generators: Docusaurus, MkDocs, etc.
  • CI/CD pipelines: GitHub Actions, GitLab CI, etc.

Best Practices

Schema Collection

  1. Use descriptive names: Avoid generic names like "db1", "test"
  2. Include comments: Add meaningful descriptions for tables and columns
  3. Sample strategically: Use sampling for large tables to avoid huge files
  4. Validate early: Check schemas immediately after collection

Schema Storage

  1. Version control: Track schema changes in Git
  2. Backup regularly: Keep historical schema versions
  3. Compress large files: Use .zst compression for schemas >1MB
  4. Secure access: Limit access to production schemas

Schema Processing

  1. Validate inputs: Always validate before processing
  2. Handle errors gracefully: Check collection status before proceeding
  3. Monitor performance: Track collection times for optimization
  4. Document changes: Keep records of schema evolution

Troubleshooting

Common Issues

Collection Fails with Permission Error

# Check database user privileges
dbsurveyor-collect postgres://user:pass@localhost/db --test-connection

# Verify user has SELECT on information_schema
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO username;

Large Schema File Size

# Use compression
dbsurveyor-collect postgres://localhost/db --compress

# Limit data sampling
dbsurveyor-collect postgres://localhost/db --max-sample-size 100

Validation Errors

# Check schema format
dbsurveyor validate schema.json --verbose

# Fix common issues
dbsurveyor fix schema.json --output fixed_schema.json

Performance Optimization

  1. Connection pooling: Use connection pooling for large databases
  2. Parallel collection: Collect multiple databases simultaneously
  3. Selective sampling: Only sample essential tables
  4. Incremental updates: Collect only changed schemas

This specification provides a complete reference for the .dbsurveyor.json format, ensuring consistent, validated, and secure schema collection across all supported database types.

Database Support

DBSurveyor supports multiple database engines with comprehensive schema collection capabilities. This guide details the support level and specific features for each database type.

Supported Databases

Database | Status | Feature Flag | Default | Version Support
PostgreSQL | ✅ Full | postgresql | ✅ Yes | 9.6+
SQLite | ✅ Full | sqlite | ✅ Yes | 3.6+
MySQL | ⚠️ Partial | mysql | ❌ No | 5.7+, 8.0+
MongoDB | 🚧 Basic | mongodb | ❌ No | 4.0+
SQL Server | 🚧 Basic | mssql | ❌ No | 2017+

Legend:

  • ✅ Full: Complete feature support with comprehensive testing
  • ⚠️ Partial: Core features supported, some limitations
  • 🚧 Basic: Minimal support, under development

PostgreSQL Support

Status: ✅ Full Support (Default)
Feature Flag: postgresql
Driver: SQLx with Tokio runtime

Connection Examples

# Basic connection
dbsurveyor-collect postgres://user:password@localhost:5432/mydb

# With SSL
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

# Connection pooling
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5"

# Multiple schemas
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | Including partitioned tables
Views | ✅ Full | Regular and materialized views
Indexes | ✅ Full | All index types (B-tree, Hash, GiST, etc.)
Constraints | ✅ Full | PK, FK, Check, Unique, Exclusion
Functions | ✅ Full | SQL and PL/pgSQL functions
Procedures | ✅ Full | Stored procedures (PostgreSQL 11+)
Triggers | ✅ Full | Row and statement triggers
Types | ✅ Full | Custom types, domains, enums
Extensions | ✅ Full | Installed extensions
Schemas | ✅ Full | Multiple schema support

PostgreSQL-Specific Features

-- Advanced data types
CREATE TABLE example (
    id SERIAL PRIMARY KEY,
    data JSONB,
    tags TEXT[],
    coordinates POINT,
    search_vector TSVECTOR
);

-- Custom types
CREATE TYPE status_enum AS ENUM ('active', 'inactive', 'pending');
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+\.[^@]+$');

-- Advanced indexes
CREATE INDEX CONCURRENTLY idx_data_gin ON example USING GIN (data);
CREATE INDEX idx_search ON example USING GIN (search_vector);

Collection Output

{
  "database_info": {
    "name": "mydb",
    "version": "13.7",
    "encoding": "UTF8",
    "collation": "en_US.UTF-8"
  },
  "tables": [
    {
      "name": "example",
      "schema": "public",
      "columns": [
        {
          "name": "data",
          "data_type": {
            "Json": {}
          },
          "is_nullable": true
        }
      ]
    }
  ]
}

SQLite Support

Status: ✅ Full Support (Default)
Feature Flag: sqlite
Driver: SQLx with Tokio runtime

Connection Examples

# File path
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite

# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

# In-memory database (for testing)
dbsurveyor-collect "sqlite://:memory:"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | Including WITHOUT ROWID tables
Views | ✅ Full | Regular views
Indexes | ✅ Full | B-tree and partial indexes
Constraints | ⚠️ Partial | Limited constraint introspection
Triggers | ✅ Full | BEFORE, AFTER, INSTEAD OF
Virtual Tables | ✅ Full | FTS, R-Tree, etc.
Attached DBs | ✅ Full | Multiple attached databases

SQLite-Specific Features

-- WITHOUT ROWID tables
CREATE TABLE example (
    id TEXT PRIMARY KEY,
    data TEXT
) WITHOUT ROWID;

-- Virtual tables
CREATE VIRTUAL TABLE docs_fts USING fts5(title, content);

-- Partial indexes
CREATE INDEX idx_active_users ON users(name) WHERE active = 1;

-- JSON support (SQLite 3.38+)
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    data JSON
);

Limitations

  • Limited constraint introspection (SQLite stores constraints as DDL text)
  • No stored procedures or functions
  • No custom types (uses affinity system)
  • No schemas (single namespace per database file)

MySQL Support

Status: ⚠️ Partial Support
Feature Flag: mysql (not default)
Driver: SQLx with Tokio runtime

Connection Examples

# Basic connection
dbsurveyor-collect mysql://root:password@localhost:3306/mydb

# With SSL
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"

# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | All storage engines
Views | ✅ Full | Regular views
Indexes | ✅ Full | Primary, Unique, Index, Fulltext
Constraints | ⚠️ Partial | PK, FK, Check (MySQL 8.0+)
Procedures | ✅ Full | Stored procedures
Functions | ✅ Full | User-defined functions
Triggers | ✅ Full | BEFORE, AFTER triggers
Events | ✅ Full | Scheduled events

MySQL-Specific Features

-- Storage engines
CREATE TABLE innodb_table (
    id INT PRIMARY KEY AUTO_INCREMENT,
    data TEXT
) ENGINE=InnoDB;

-- Partitioning
CREATE TABLE partitioned (
    id INT,
    created_date DATE
) PARTITION BY RANGE (YEAR(created_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

-- Full-text indexes
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    FULLTEXT(title, content)
);

Known Limitations

  • Check constraints only supported in MySQL 8.0+
  • Limited JSON introspection compared to PostgreSQL
  • Some storage engine specific features not captured

Security Advisory

⚠️ RUSTSEC-2023-0071: MySQL support uses the RSA crate, which has a known timing side-channel vulnerability. MySQL support is therefore disabled by default. Use PostgreSQL or SQLite for production environments.

MongoDB Support

Status: 🚧 Basic Support
Feature Flag: mongodb (not default)
Driver: Official MongoDB Rust driver

Connection Examples

# Basic connection
dbsurveyor-collect mongodb://user:password@localhost:27017/mydb

# With authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"

# Replica set
dbsurveyor-collect "mongodb://user:pass@host1,host2,host3/mydb?replicaSet=rs0"

Supported Objects

Object Type | Support | Notes
Collections | ✅ Full | Document collections
Indexes | ✅ Full | Single field, compound, text, geo
Schema Inference | ✅ Basic | Inferred from document sampling
GridFS | ⚠️ Partial | Basic GridFS collection detection
Views | 🚧 Planned | Aggregation pipeline views

MongoDB-Specific Features

// Schema inference from documents
{
    "_id": ObjectId("..."),
    "name": "string",
    "age": "number",
    "tags": ["array", "of", "strings"],
    "address": {
        "street": "string",
        "city": "string"
    }
}

// Index types
db.users.createIndex({
    "name": 1
}) // Single field
db.users.createIndex({
    "name": 1,
    "age": -1
}) // Compound
db.articles.createIndex({
    "title": "text"
}) // Text search
db.locations.createIndex({
    "coordinates": "2dsphere"
}) // Geospatial

Current Limitations

  • Schema inference is sampling-based (may miss rare fields)
  • No aggregation pipeline analysis
  • Limited sharding information
  • No user-defined functions

SQL Server Support

Status: 🚧 Basic Support
Feature Flag: mssql (not default)
Driver: Tiberius (native TDS protocol)

Connection Examples

# Basic connection
dbsurveyor-collect mssql://sa:password@localhost:1433/mydb

# Windows Authentication (planned)
dbsurveyor-collect "mssql://localhost/mydb?trusted_connection=yes"

# Named instance
dbsurveyor-collect "mssql://sa:pass@localhost\\SQLEXPRESS/mydb"

Supported Objects

Object Type | Support | Notes
Tables | ✅ Full | User tables
Views | ✅ Full | Regular views
Indexes | ⚠️ Partial | Basic index information
Constraints | ⚠️ Partial | PK, FK constraints
Procedures | 🚧 Planned | Stored procedures
Functions | 🚧 Planned | User-defined functions
Triggers | 🚧 Planned | DML triggers

Current Limitations

  • Limited to basic table and view introspection
  • No stored procedure analysis yet
  • No advanced SQL Server features (CLR, XML, spatial)
  • Windows Authentication not yet supported

Feature Comparison Matrix

Feature | PostgreSQL | SQLite | MySQL | MongoDB | SQL Server
Tables | ✅ | ✅ | ✅ | ✅ | ✅
Views | ✅ | ✅ | ✅ | 🚧 | ✅
Indexes | ✅ | ✅ | ✅ | ✅ | ⚠️
Constraints | ✅ | ⚠️ | ⚠️ | ❌ | ⚠️
Procedures | ✅ | ❌ | ✅ | ❌ | 🚧
Functions | ✅ | ❌ | ✅ | ❌ | 🚧
Triggers | ✅ | ✅ | ✅ | ❌ | 🚧
Custom Types | ✅ | ❌ | ⚠️ | ❌ | 🚧
JSON Support | ✅ | ✅ | ⚠️ | ✅ | 🚧
Multi-DB | ✅ | ⚠️ | ✅ | ❌ | 🚧

Building with Database Support

Default Build

# Includes PostgreSQL and SQLite
cargo build --release

Custom Database Selection

# PostgreSQL only
cargo build --release --no-default-features --features postgresql

# All databases
cargo build --release --all-features

# Specific combination
cargo build --release --no-default-features --features postgresql,mysql,encryption

Feature Dependencies

[features]
postgresql = ["sqlx", "sqlx/postgres", "sqlx/runtime-tokio-rustls"]
mysql = ["sqlx", "sqlx/mysql", "sqlx/runtime-tokio-rustls"]
sqlite = ["sqlx", "sqlx/sqlite", "sqlx/runtime-tokio-rustls"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]

Database-Specific Best Practices

PostgreSQL

-- Use a read-only user
CREATE USER dbsurveyor_readonly;
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;

-- For multiple schemas
GRANT USAGE ON SCHEMA schema1, schema2 TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA schema1, schema2 TO dbsurveyor_readonly;

MySQL

# Create read-only user
CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_readonly'@'%';

SQLite

# Ensure read permissions
chmod 644 /path/to/database.db

# Use read-only mode for safety
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"

MongoDB

// Create read-only user
use admin
db.createUser({
    user: "dbsurveyor_readonly",
    pwd: "password",
    roles: [{
            role: "read",
            db: "mydb"
        }, {
            role: "read",
            db: "config"
        } // For sharding info
    ]
})

Troubleshooting Database Connections

Connection Issues

# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db

# Check network connectivity
telnet localhost 5432  # PostgreSQL
telnet localhost 3306  # MySQL
telnet localhost 27017 # MongoDB
telnet localhost 1433  # SQL Server

Permission Issues

# PostgreSQL: Check permissions
psql -h localhost -U user -d db -c "\dt"

# MySQL: Check permissions
mysql -h localhost -u user -p -e "SHOW TABLES;" db

# SQLite: Check file permissions
ls -la /path/to/database.db

Driver Issues

# Check compiled features
dbsurveyor-collect list

# Verify feature compilation
cargo build --features postgresql --verbose

Roadmap

Planned Improvements

PostgreSQL:

  • Advanced partitioning support
  • Extension-specific object types
  • Performance statistics collection

MySQL:

  • Enhanced JSON column support
  • Partition pruning analysis
  • Storage engine optimization hints

MongoDB:

  • Aggregation pipeline analysis
  • Sharding topology mapping
  • Index usage statistics

SQL Server:

  • Complete stored procedure support
  • CLR integration analysis
  • Spatial data type support

General:

  • Cross-database schema comparison
  • Migration script generation
  • Performance benchmarking integration

Contributing Database Support

See the Contributing Guide for information on adding support for new database engines or improving existing support.

Troubleshooting

This guide helps you diagnose and resolve common issues with DBSurveyor.

Quick Diagnostics

Check Installation

# Verify binaries are installed and working
dbsurveyor-collect --version
dbsurveyor --version

# Check compiled features
dbsurveyor-collect list

# Test with minimal example
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
dbsurveyor-collect sqlite://test.db
rm test.db schema.dbsurveyor.json

Enable Debug Logging

# Enable debug logging for all modules
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db

# Enable trace logging for specific modules
export RUST_LOG=dbsurveyor_collect=trace,dbsurveyor_core=debug
dbsurveyor-collect postgres://localhost/db

# Log to file
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db 2> debug.log

Connection Issues

Database Connection Failures

Symptoms: Connection timeouts, authentication failures, network errors

PostgreSQL Connection Issues

# Test basic connectivity
ping localhost
telnet localhost 5432

# Test with psql
psql -h localhost -U user -d db -c "SELECT 1;"

# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log

# Common connection string issues
# ❌ Wrong: postgres://user:pass@localhost/db:5432
# ✅ Correct: postgres://user:pass@localhost:5432/db

# SSL issues
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=disable"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"

MySQL Connection Issues

# Test basic connectivity
telnet localhost 3306

# Test with mysql client
mysql -h localhost -u user -p -e "SELECT 1;" db

# Check MySQL logs
sudo tail -f /var/log/mysql/error.log

# Common issues
# Character set problems
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"

# SSL issues
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=DISABLED"

SQLite Connection Issues

# Check file exists and is readable
ls -la /path/to/database.db
file /path/to/database.db

# Test with sqlite3 command
sqlite3 /path/to/database.db ".tables"

# Permission issues
chmod 644 /path/to/database.db

# Use absolute paths
dbsurveyor-collect "sqlite:///$(pwd)/database.db"

MongoDB Connection Issues

# Test basic connectivity
telnet localhost 27017

# Test with mongo client
mongosh "mongodb://user:pass@localhost:27017/db"

# Authentication issues
dbsurveyor-collect "mongodb://user:pass@localhost/db?authSource=admin"

# Replica set issues
dbsurveyor-collect "mongodb://user:pass@host1,host2/db?replicaSet=rs0"

Permission and Authentication Issues

Symptoms: Access denied, insufficient privileges, authentication failed

PostgreSQL Permissions

-- Check current user permissions
SELECT current_user, session_user;

-- Check database access
SELECT datname FROM pg_database WHERE datname = 'your_db';

-- Check table permissions
SELECT schemaname, tablename 
FROM pg_tables 
WHERE schemaname NOT IN ('information_schema', 'pg_catalog');

-- Grant necessary permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;

MySQL Permissions

-- Check current user
SELECT USER(), CURRENT_USER();

-- Check permissions
SHOW GRANTS FOR 'dbsurveyor_user'@'%';

-- Grant necessary permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_user'@'%';
FLUSH PRIVILEGES;

MongoDB Permissions

// Check current user
db.runCommand({
    connectionStatus: 1
})

// Check permissions
use mydb
db.runCommand({
    usersInfo: "dbsurveyor_user"
})

// Grant read permissions
use admin
db.grantRolesToUser("dbsurveyor_user", [{
    role: "read",
    db: "mydb"
}])

Collection Issues

Schema Collection Failures

Symptoms: Partial collection, missing objects, collection errors

Missing Tables or Objects

# Enable verbose logging to see what's being collected
dbsurveyor-collect -vvv postgres://localhost/db

# Check if objects exist in database
psql -h localhost -U user -d db -c "\dt"  # PostgreSQL tables
mysql -u user -p -e "SHOW TABLES;" db     # MySQL tables
sqlite3 db.sqlite ".tables"               # SQLite tables

Large Database Timeouts

# Increase timeouts
dbsurveyor-collect --connect-timeout 60 --query-timeout 120 postgres://localhost/db

# Use throttling to reduce load
dbsurveyor-collect --throttle 1000 postgres://localhost/db

# Disable sample collection for speed
dbsurveyor-collect --sample 0 postgres://localhost/db

Memory Issues

# Monitor memory usage
top -p $(pgrep dbsurveyor-collect)

# Use compression to reduce memory usage
dbsurveyor-collect --compress postgres://localhost/db

# Process smaller chunks (multi-database collection)
dbsurveyor-collect --exclude-databases large_db1,large_db2 postgres://localhost

Output File Issues

Symptoms: Empty files, corrupted output, permission errors

File Permission Issues

# Check output directory permissions
ls -la $(dirname schema.dbsurveyor.json)

# Ensure write permissions
chmod 755 $(dirname schema.dbsurveyor.json)

# Use explicit output path
dbsurveyor-collect --output /tmp/schema.json postgres://localhost/db

Corrupted Output Files

# Validate schema file
dbsurveyor validate schema.dbsurveyor.json

# Check file size and format
ls -la schema.dbsurveyor.json
file schema.dbsurveyor.json

# Test JSON parsing
jq . schema.dbsurveyor.json > /dev/null

Compression/Encryption Issues

# Test compression
dbsurveyor-collect --compress postgres://localhost/db
zstd -t schema.dbsurveyor.json.zst  # Test compressed file

# Test encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://localhost/db
dbsurveyor validate schema.enc  # Test encrypted file

Documentation Generation Issues

Processing Failures

Symptoms: Generation errors, empty output, format issues

Input File Issues

# Validate input file first
dbsurveyor validate schema.dbsurveyor.json

# Check file format
file schema.dbsurveyor.json

# Test with minimal processing
dbsurveyor --format json schema.dbsurveyor.json

Encrypted File Issues

# Verify encryption format
dbsurveyor validate schema.enc

# Test decryption
dbsurveyor --format json schema.enc

# Check password
# Note: Passwords are case-sensitive and don't show characters

Large Schema Processing

# Use JSON format for large schemas (most efficient)
dbsurveyor --format json large-schema.json

# Monitor memory usage
top -p $(pgrep dbsurveyor)

# Process in parts (planned feature)
# Currently: split large schemas manually

Output Format Issues

Symptoms: Malformed output, missing content, rendering problems

HTML Generation Issues

# Test with simple format first
dbsurveyor --format markdown schema.json

# Check HTML output
dbsurveyor --format html schema.json
# Open in browser to check for issues

Mermaid Diagram Issues

# Generate Mermaid format
dbsurveyor --format mermaid schema.json

# Validate Mermaid syntax
# Copy content to https://mermaid.live/ for testing

Output Formatting and Progress Indicators

Symptoms: Progress spinners or colored output causing problems, garbled output with escape codes, missing progress indicators

Disabling Colors and Progress Spinners

For CI/CD pipelines, automation scripts, or non-interactive environments where colored output or progress spinners may cause issues:

# Disable ANSI color codes and progress spinners
export NO_COLOR=1
dbsurveyor-collect postgres://localhost/db
dbsurveyor --format markdown schema.json

# Alternative: Use TERM=dumb
TERM=dumb dbsurveyor-collect postgres://localhost/db
TERM=dumb dbsurveyor --format json schema.json

# For CI environments (typically set automatically)
export NO_COLOR=1
export TERM=dumb
dbsurveyor-collect postgres://localhost/db > collection.log

Common scenarios:

  • CI/CD pipelines: Set NO_COLOR=1 to prevent ANSI escape codes in logs
  • Log files: Progress spinners are automatically hidden when output is redirected
  • Automated scripts: Use TERM=dumb for consistent non-interactive behavior
  • Windows systems: Some terminals may display escape codes incorrectly

Troubleshooting Progress Indicators

# Check if output is going to a TTY
# Progress spinners only show when stdout is a terminal
test -t 1 && echo "TTY detected" || echo "Not a TTY"

# Force spinner to be hidden (for debugging)
export NO_COLOR=1
dbsurveyor --format html schema.json

# Check environment variables
echo "NO_COLOR=${NO_COLOR:-not set}"
echo "TERM=${TERM:-not set}"

Note: The tool automatically detects when output is piped or redirected and hides progress indicators. Setting NO_COLOR or TERM=dumb ensures consistent behavior across all environments.
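The documented rules can be sketched as a small decision function. This is an illustrative sketch only, not the actual implementation (which also checks whether stdout is a TTY); the environment values are passed in as parameters for testability:

```rust
// Illustrative sketch: NO_COLOR set to any value, or TERM=dumb,
// disables colored output and progress spinners. Not DBSurveyor's
// actual implementation.
fn color_enabled(no_color: Option<&str>, term: Option<&str>) -> bool {
    if no_color.is_some() {
        return false; // NO_COLOR wins regardless of its value
    }
    if term == Some("dumb") {
        return false; // dumb terminals get plain output
    }
    true
}

fn main() {
    // NO_COLOR set: disabled even on a capable terminal
    assert!(!color_enabled(Some("1"), Some("xterm-256color")));
    // TERM=dumb: disabled
    assert!(!color_enabled(None, Some("dumb")));
    // Neither set: enabled
    assert!(color_enabled(None, Some("xterm-256color")));
}
```

Note that per the NO_COLOR convention, the variable disables color when it is set at all, even to an empty-looking value like `0`.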

Performance Issues

Slow Collection

Symptoms: Long collection times, high resource usage

Database Performance

# Use connection pooling
dbsurveyor-collect --max-connections 5 postgres://localhost/db

# Add throttling
dbsurveyor-collect --throttle 500 postgres://localhost/db

# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db

# Monitor database load
# PostgreSQL: SELECT * FROM pg_stat_activity;
# MySQL: SHOW PROCESSLIST;

Network Performance

# Test network latency
ping database-host

# Use local connections when possible
dbsurveyor-collect postgres://localhost/db  # Better than remote

# Consider compression for remote databases
dbsurveyor-collect --compress postgres://remote-host/db

Memory Usage

Symptoms: High memory consumption, out of memory errors

Memory Optimization

# Monitor memory usage
ps aux | grep dbsurveyor
top -p $(pgrep dbsurveyor)

# Use streaming for large datasets (automatic)
# Reduce sample size
dbsurveyor-collect --sample 10 postgres://localhost/db

# Use compression
dbsurveyor-collect --compress postgres://localhost/db

Security Issues

Credential Exposure

Symptoms: Passwords in logs, credential leakage

Verify Credential Sanitization

# Check logs for credentials
export RUST_LOG=debug
dbsurveyor-collect postgres://user:secret@localhost/db 2>&1 | grep -i secret
# Should return no results

# Test credential sanitization
dbsurveyor-collect test postgres://user:secret@localhost/db
# Should show: postgres://user:****@localhost/db
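As a sketch of what sanitization is expected to produce, masking the password portion of a connection URL can be done with plain string handling. The function below is illustrative only, not DBSurveyor's internal implementation:

```rust
// Illustrative sketch: replace the password in a database URL with
// "****". Not DBSurveyor's actual sanitizer.
fn mask_password(url: &str) -> String {
    if let Some(scheme_end) = url.find("://") {
        let rest = &url[scheme_end + 3..];
        // The last '@' separates credentials from the host
        if let Some(at) = rest.rfind('@') {
            let creds = &rest[..at];
            if let Some(colon) = creds.find(':') {
                let user = &creds[..colon];
                return format!("{}{}:****{}", &url[..scheme_end + 3], user, &rest[at..]);
            }
        }
    }
    // No credentials found: return the URL unchanged
    url.to_string()
}

fn main() {
    let masked = mask_password("postgres://user:secret@localhost/db");
    assert_eq!(masked, "postgres://user:****@localhost/db");
    assert!(!masked.contains("secret"));
}
```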

Secure Credential Handling

# Use environment variables
export DATABASE_URL="postgres://user:secret@localhost/db"
dbsurveyor-collect

# Avoid shell history
set +o history  # Disable history
dbsurveyor-collect postgres://user:secret@localhost/db
set -o history  # Re-enable history

Encryption Issues

Symptoms: Encryption failures, decryption errors

Test Encryption

# Test encryption roundtrip
sqlite3 test.db "CREATE TABLE t (id INTEGER PRIMARY KEY);"
dbsurveyor-collect --encrypt "sqlite://test.db"
dbsurveyor validate schema.enc
rm test.db schema.enc

Password Issues

# Ensure password consistency
# Passwords are case-sensitive
# No visual feedback during password entry (security feature)

# Test with simple password first
# Use ASCII characters only initially

Build and Installation Issues

Compilation Failures

Symptoms: Build errors, missing dependencies, feature conflicts

Rust Toolchain Issues

# Update Rust toolchain
rustup update

# Check Rust version (minimum 1.87)
rustc --version

# Clean and rebuild
cargo clean
cargo build --release

Feature Compilation Issues

# Check available features (defined in Cargo.toml)
grep -A 10 '\[features\]' Cargo.toml

# Build with specific features
cargo build --release --features postgresql,sqlite

# Debug feature compilation
cargo build --release --features postgresql --verbose

System Dependencies

# Ubuntu/Debian: Install system dependencies
sudo apt-get update
sudo apt-get install build-essential libssl-dev pkg-config

# For PostgreSQL support
sudo apt-get install libpq-dev

# For MySQL support
sudo apt-get install libmysqlclient-dev

# macOS: Install dependencies
brew install openssl pkg-config
# For PostgreSQL: brew install postgresql
# For MySQL: brew install mysql

Runtime Dependencies

Symptoms: Missing shared libraries, runtime errors

Check Dependencies

# Check binary dependencies
ldd target/release/dbsurveyor-collect  # Linux
otool -L target/release/dbsurveyor-collect  # macOS

# Test minimal functionality
dbsurveyor-collect --version
dbsurveyor-collect list

Getting Help

Collect Debug Information

# System information
uname -a
rustc --version
cargo --version

# DBSurveyor information
dbsurveyor-collect --version
dbsurveyor-collect list

# Feature compilation
cargo build --release --verbose 2>&1 | grep -i feature

# Runtime debug
export RUST_LOG=debug
dbsurveyor-collect test postgres://localhost/db 2> debug.log

Create Minimal Reproduction

# Create minimal test case
sqlite3 minimal.db "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT);"
sqlite3 minimal.db "INSERT INTO test VALUES (1, 'test');"

# Test collection
dbsurveyor-collect sqlite://minimal.db

# Test documentation
dbsurveyor generate schema.dbsurveyor.json

# Clean up
rm minimal.db schema.dbsurveyor.json schema.md

Report Issues

When reporting issues, include:

  1. System Information: OS, Rust version, DBSurveyor version
  2. Command Used: Exact command that failed (sanitize credentials)
  3. Error Output: Complete error message and stack trace
  4. Debug Logs: Output with RUST_LOG=debug
  5. Minimal Reproduction: Smallest example that reproduces the issue

Security Note: Never include actual database credentials in issue reports. Use placeholder values like user:password@localhost/db.

Community Resources

Professional Support

For enterprise users requiring professional support, contact support@evilbitlabs.io for:

  • Priority issue resolution
  • Custom feature development
  • Integration consulting
  • Security auditing and compliance

Architecture

DBSurveyor follows a security-first, modular architecture designed for flexibility, maintainability, and offline operation. This document details the system architecture and design decisions.

System Overview

graph TB
    subgraph "User Environment"
        CLI[Command Line Interface]
        CONFIG[Configuration Files]
        ENV[Environment Variables]
    end
    
    subgraph "DBSurveyor Workspace"
        subgraph "dbsurveyor-collect"
            COLLECT_CLI[Collection CLI]
            ADAPTERS[Database Adapters]
            POOL[Connection Pooling]
        end
        
        subgraph "dbsurveyor-core"
            MODELS[Data Models]
            SECURITY[Security Module]
            ERROR[Error Handling]
            TRAITS[Adapter Traits]
        end
        
        subgraph "dbsurveyor"
            DOC_CLI[Documentation CLI]
            TEMPLATES[Template Engine]
            FORMATS[Output Formats]
            ANALYSIS[Schema Analysis]
        end
    end
    
    subgraph "External Systems"
        POSTGRES[(PostgreSQL)]
        MYSQL[(MySQL)]
        SQLITE[(SQLite)]
        MONGODB[(MongoDB)]
        SQLSERVER[(SQL Server)]
    end
    
    subgraph "Output Artifacts"
        JSON[Schema Files]
        DOCS[Documentation]
        SQL[SQL DDL]
        DIAGRAMS[ERD Diagrams]
    end
    
    CLI --> COLLECT_CLI
    CLI --> DOC_CLI
    CONFIG --> COLLECT_CLI
    CONFIG --> DOC_CLI
    ENV --> COLLECT_CLI
    
    COLLECT_CLI --> ADAPTERS
    ADAPTERS --> POOL
    ADAPTERS --> MODELS
    ADAPTERS --> SECURITY
    ADAPTERS --> ERROR
    ADAPTERS --> TRAITS
    
    POOL --> POSTGRES
    POOL --> MYSQL
    POOL --> SQLITE
    POOL --> MONGODB
    POOL --> SQLSERVER
    
    COLLECT_CLI --> JSON
    
    DOC_CLI --> TEMPLATES
    DOC_CLI --> FORMATS
    DOC_CLI --> ANALYSIS
    TEMPLATES --> MODELS
    FORMATS --> MODELS
    ANALYSIS --> MODELS
    
    JSON --> DOC_CLI
    DOC_CLI --> DOCS
    DOC_CLI --> SQL
    DOC_CLI --> DIAGRAMS

Crate Architecture

Workspace Structure

DBSurveyor uses a Cargo workspace with three main crates:

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/     # Shared library
β”œβ”€β”€ dbsurveyor-collect/  # Collection binary
β”œβ”€β”€ dbsurveyor/          # Documentation binary
└── Cargo.toml          # Workspace configuration

Dependency Graph

graph TD
    COLLECT[dbsurveyor-collect] --> CORE[dbsurveyor-core]
    DOC[dbsurveyor] --> CORE
    
    CORE --> SERDE[serde]
    CORE --> TOKIO[tokio]
    CORE --> SQLX[sqlx]
    CORE --> MONGO[mongodb]
    CORE --> CRYPTO[aes-gcm + argon2]
    
    COLLECT --> CLAP[clap]
    COLLECT --> RPASS[rpassword]
    COLLECT --> ZSTD[zstd]
    
    DOC --> ASKAMA[askama]
    DOC --> PULLDOWN[pulldown-cmark]

Core Library (dbsurveyor-core)

Module Structure

// dbsurveyor-core/src/lib.rs
pub mod adapters; // Database adapter traits and factory
pub mod error; // Comprehensive error handling
pub mod models; // Unified data models
pub mod security; // Encryption and credential protection

// Re-exports for public API
pub use adapters::{create_adapter, DatabaseAdapter};
pub use error::{DbSurveyorError, Result};
pub use models::{DatabaseSchema, DatabaseType};

Data Models

The core defines unified data structures that work across all database types:

// Unified schema representation
pub struct DatabaseSchema {
    pub format_version: String,
    pub database_info: DatabaseInfo,
    pub tables: Vec<Table>,
    pub views: Vec<View>,
    pub indexes: Vec<Index>,
    pub constraints: Vec<Constraint>,
    pub procedures: Vec<Procedure>,
    pub functions: Vec<Procedure>,
    pub triggers: Vec<Trigger>,
    pub custom_types: Vec<CustomType>,
    pub samples: Option<Vec<TableSample>>,
    pub collection_metadata: CollectionMetadata,
}

// Cross-database type mapping
pub enum UnifiedDataType {
    String { max_length: Option<u32> },
    Integer { bits: u8, signed: bool },
    Float { precision: Option<u8> },
    Boolean,
    DateTime { with_timezone: bool },
    Json,
    Array { element_type: Box<UnifiedDataType> },
    Custom { type_name: String },
}
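To make the type mapping concrete, here is a minimal, self-contained sketch. The enum below is abbreviated, and `map_postgres_type` is a hypothetical helper for illustration, not part of the crate's API:

```rust
// Abbreviated stand-in for UnifiedDataType; the mapper below is
// illustrative only, not DBSurveyor's actual implementation.
#[derive(Debug, PartialEq)]
enum UnifiedDataType {
    String { max_length: Option<u32> },
    Integer { bits: u8, signed: bool },
    Boolean,
    Json,
    Custom { type_name: String },
}

fn map_postgres_type(native: &str) -> UnifiedDataType {
    match native {
        "varchar" | "text" => UnifiedDataType::String { max_length: None },
        "int4" => UnifiedDataType::Integer { bits: 32, signed: true },
        "int8" => UnifiedDataType::Integer { bits: 64, signed: true },
        "bool" => UnifiedDataType::Boolean,
        "json" | "jsonb" => UnifiedDataType::Json,
        // Anything unrecognized is preserved as a custom type
        other => UnifiedDataType::Custom { type_name: other.to_string() },
    }
}

fn main() {
    assert_eq!(
        map_postgres_type("int8"),
        UnifiedDataType::Integer { bits: 64, signed: true }
    );
    assert_eq!(map_postgres_type("bool"), UnifiedDataType::Boolean);
}
```

The `Custom` fallback is what lets the unified model round-trip database-specific types it does not recognize.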

Adapter Pattern

Database adapters implement a common trait for unified access:

#[async_trait]
pub trait DatabaseAdapter: Send + Sync {
    async fn test_connection(&self) -> Result<()>;
    async fn collect_schema(&self) -> Result<DatabaseSchema>;
    fn database_type(&self) -> DatabaseType;
    fn supports_feature(&self, feature: AdapterFeature) -> bool;
    fn connection_config(&self) -> ConnectionConfig;
}

Factory Pattern

The adapter factory provides database-agnostic instantiation:

pub async fn create_adapter(connection_string: &str) -> Result<Box<dyn DatabaseAdapter>> {
    let database_type = detect_database_type(connection_string)?;

    match database_type {
        DatabaseType::PostgreSQL => {
            #[cfg(feature = "postgresql")]
            {
                let adapter = PostgresAdapter::new(connection_string).await?;
                Ok(Box::new(adapter))
            }
            #[cfg(not(feature = "postgresql"))]
            Err(DbSurveyorError::unsupported_feature("PostgreSQL"))
        }
        // ... other database types
    }
}
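The `detect_database_type` step can be sketched as a match on the connection-string scheme. This is an illustrative, self-contained sketch of the idea, not the crate's actual implementation:

```rust
// Hypothetical sketch of scheme-based detection. The real
// detect_database_type may accept more schemes or parse differently.
#[derive(Debug, PartialEq)]
enum DatabaseType { PostgreSQL, MySQL, SQLite, MongoDB, SqlServer }

fn detect_database_type(conn: &str) -> Result<DatabaseType, String> {
    // Everything before "://" is treated as the URL scheme
    let scheme = conn.split("://").next().unwrap_or("");
    match scheme {
        "postgres" | "postgresql" => Ok(DatabaseType::PostgreSQL),
        "mysql" => Ok(DatabaseType::MySQL),
        "sqlite" => Ok(DatabaseType::SQLite),
        "mongodb" | "mongodb+srv" => Ok(DatabaseType::MongoDB),
        "mssql" | "sqlserver" => Ok(DatabaseType::SqlServer),
        other => Err(format!("unsupported scheme: {other}")),
    }
}

fn main() {
    assert_eq!(
        detect_database_type("postgres://localhost/db"),
        Ok(DatabaseType::PostgreSQL)
    );
    assert!(detect_database_type("ftp://host").is_err());
}
```

Detecting the type before constructing an adapter is what lets the factory return feature-gated errors without ever opening a connection.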

Security Architecture

Credential Protection

graph LR
    INPUT[Connection String] --> PARSE[URL Parser]
    PARSE --> CONFIG[Connection Config]
    PARSE --> CREDS[Credentials]
    
    CONFIG --> SANITIZE[Sanitization]
    CREDS --> ZEROIZE[Zeroizing Container]
    
    SANITIZE --> LOGS[Logs & Errors]
    ZEROIZE --> DATABASE[Database Connection]
    ZEROIZE --> MEMORY[Auto-Zero on Drop]

Implementation:

use zeroize::{Zeroize, Zeroizing};

#[derive(Zeroize)]
#[zeroize(drop)]
pub struct Credentials {
    pub username: Zeroizing<String>,
    pub password: Zeroizing<Option<String>>,
}

// Connection config never contains credentials
pub struct ConnectionConfig {
    pub host: String,
    pub port: Option<u16>,
    pub database: Option<String>,
    // No username/password fields
}

Encryption Architecture

graph TD
    PASSWORD[User Password] --> ARGON2[Argon2id KDF]
    SALT[Random Salt] --> ARGON2
    ARGON2 --> KEY[256-bit Key]
    
    DATA[Schema Data] --> AES[AES-GCM-256]
    KEY --> AES
    NONCE[Random Nonce] --> AES
    
    AES --> CIPHERTEXT[Encrypted Data]
    AES --> TAG[Auth Tag]
    
    ENCRYPTED[Encrypted File] --> ALGORITHM[Algorithm ID]
    ENCRYPTED --> NONCE
    ENCRYPTED --> CIPHERTEXT
    ENCRYPTED --> TAG
    ENCRYPTED --> KDF_PARAMS[KDF Parameters]
    ENCRYPTED --> SALT

Security Properties:

  • Confidentiality: AES-GCM-256 encryption
  • Integrity: 128-bit authentication tags
  • Authenticity: Authenticated encryption prevents tampering
  • Nonce Safety: Random per-encryption nonces prevent keystream reuse
  • Key Security: Argon2id with memory-hard parameters

Database Adapter Architecture

Adapter Hierarchy

classDiagram
    class DatabaseAdapter {
        <<trait>>
        +test_connection() Result~()~
        +collect_schema() Result~DatabaseSchema~
        +database_type() DatabaseType
        +supports_feature(AdapterFeature) bool
    }
    
    class PostgresAdapter {
        -pool: PgPool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class MySqlAdapter {
        -pool: MySqlPool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class SqliteAdapter {
        -pool: SqlitePool
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    class MongoAdapter {
        -client: Client
        -config: ConnectionConfig
        +new(connection_string) Result~Self~
    }
    
    DatabaseAdapter <|-- PostgresAdapter
    DatabaseAdapter <|-- MySqlAdapter
    DatabaseAdapter <|-- SqliteAdapter
    DatabaseAdapter <|-- MongoAdapter

Connection Pooling

Each adapter manages its own connection pool with security-focused defaults:

pub struct ConnectionConfig {
    pub connect_timeout: Duration, // Default: 30s
    pub query_timeout: Duration,   // Default: 30s
    pub max_connections: u32,      // Default: 10
    pub read_only: bool,           // Default: true
}
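The documented defaults can be expressed through the `Default` trait. This is a minimal, self-contained sketch (the field set matches the snippet above; the crate may construct its defaults differently):

```rust
use std::time::Duration;

// Stand-in struct mirroring the documented fields; the Default impl
// below is an illustrative sketch, not the crate's actual code.
struct ConnectionConfig {
    connect_timeout: Duration,
    query_timeout: Duration,
    max_connections: u32,
    read_only: bool,
}

impl Default for ConnectionConfig {
    fn default() -> Self {
        Self {
            connect_timeout: Duration::from_secs(30),
            query_timeout: Duration::from_secs(30),
            max_connections: 10,
            // Security-focused default: never write to target databases
            read_only: true,
        }
    }
}

fn main() {
    let cfg = ConnectionConfig::default();
    assert!(cfg.read_only);
    assert_eq!(cfg.max_connections, 10);
    assert_eq!(cfg.connect_timeout, Duration::from_secs(30));
}
```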

Feature Flags

Database support is controlled by feature flags for minimal binary size:

[features]
default = ["postgresql", "sqlite"]
postgresql = ["sqlx", "sqlx/postgres"]
mysql = ["sqlx", "sqlx/mysql"]
sqlite = ["sqlx", "sqlx/sqlite"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]

Error Handling Architecture

Error Hierarchy

#[derive(Debug, thiserror::Error)]
pub enum DbSurveyorError {
    #[error("Database connection failed")]
    Connection(#[from] ConnectionError),

    #[error("Schema collection failed: {context}")]
    Collection {
        context: String,
        source: Box<dyn std::error::Error>,
    },

    #[error("Configuration error: {message}")]
    Configuration { message: String },

    #[error("Encryption operation failed")]
    Encryption(#[from] EncryptionError),

    #[error("I/O operation failed: {context}")]
    Io {
        context: String,
        source: std::io::Error,
    },
}

Error Context Chain

graph TD
    USER_ERROR[User-Facing Error] --> CONTEXT[Error Context]
    CONTEXT --> SOURCE[Source Error]
    SOURCE --> ROOT[Root Cause]
    
    USER_ERROR --> SANITIZED[Sanitized Message]
    SANITIZED --> NO_CREDS[No Credentials]
    SANITIZED --> ACTIONABLE[Actionable Information]

Security Guarantee: All error messages are sanitized to prevent credential leakage.

CLI Architecture

Command Structure

graph TD
    CLI[CLI Entry Point] --> GLOBAL[Global Args]
    CLI --> COMMANDS[Commands]
    
    GLOBAL --> VERBOSE[Verbosity]
    GLOBAL --> QUIET[Quiet Mode]
    
    COMMANDS --> COLLECT[collect]
    COMMANDS --> TEST[test]
    COMMANDS --> LIST[list]
    
    COLLECT --> DB_URL[Database URL]
    COLLECT --> OUTPUT[Output Options]
    COLLECT --> SECURITY[Security Options]
    
    OUTPUT --> FORMAT[Format Selection]
    OUTPUT --> COMPRESSION[Compression]
    
    SECURITY --> ENCRYPTION[Encryption]
    SECURITY --> THROTTLE[Throttling]

Configuration Hierarchy

Configuration is loaded from multiple sources with clear precedence:

  1. Command Line Arguments (highest priority)
  2. Environment Variables
  3. Project Configuration (.dbsurveyor.toml)
  4. User Configuration (~/.config/dbsurveyor/config.toml)
  5. Default Values (lowest priority)
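This precedence reduces to "first set value wins." A minimal sketch of the resolution chain for a single setting (the function and setting are hypothetical, for illustration only):

```rust
// Illustrative precedence chain for one numeric setting:
// CLI > env var > project config > user config > built-in default.
fn resolve(
    cli: Option<u32>,
    env: Option<u32>,
    project: Option<u32>,
    user: Option<u32>,
    default: u32,
) -> u32 {
    cli.or(env).or(project).or(user).unwrap_or(default)
}

fn main() {
    // Env var set, no CLI flag: env wins over both config files
    assert_eq!(resolve(None, Some(60), Some(45), None, 30), 60);
    // Nothing set anywhere: the built-in default applies
    assert_eq!(resolve(None, None, None, None, 30), 30);
    // CLI flag always wins
    assert_eq!(resolve(Some(5), Some(60), Some(45), Some(20), 30), 5);
}
```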

Documentation Generation Architecture

Template Engine

graph LR
    SCHEMA[Schema Data] --> ANALYSIS[Schema Analysis]
    ANALYSIS --> CONTEXT[Template Context]
    
    TEMPLATES[Askama Templates] --> ENGINE[Template Engine]
    CONTEXT --> ENGINE
    
    ENGINE --> MARKDOWN[Markdown Output]
    ENGINE --> HTML[HTML Output]
    ENGINE --> JSON[JSON Analysis]
    
    SCHEMA --> MERMAID[Mermaid Generator]
    MERMAID --> ERD[ERD Diagrams]
    
    SCHEMA --> SQL[SQL Generator]
    SQL --> DDL[DDL Scripts]

Output Format Pipeline

pub trait OutputGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> Result<String>;
    fn file_extension(&self) -> &'static str;
    fn mime_type(&self) -> &'static str;
}

// Implementations for each format
impl OutputGenerator for MarkdownGenerator { ... }
impl OutputGenerator for HtmlGenerator { ... }
impl OutputGenerator for JsonGenerator { ... }
impl OutputGenerator for MermaidGenerator { ... }
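A complete generator impl can be sketched with simplified stand-in types. The types below are abbreviated for illustration (the real trait operates on the full `DatabaseSchema` model), and the sketch returns a plain `String` rather than a `Result`:

```rust
// Minimal stand-in types; abbreviated for illustration only.
struct DatabaseSchema {
    name: String,
    table_names: Vec<String>,
}

trait OutputGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> String;
    fn file_extension(&self) -> &'static str;
}

struct MarkdownGenerator;

impl OutputGenerator for MarkdownGenerator {
    fn generate(&self, schema: &DatabaseSchema) -> String {
        // Render a heading plus a bullet per table
        let mut out = format!("# {}\n\n## Tables\n", schema.name);
        for t in &schema.table_names {
            out.push_str(&format!("- {t}\n"));
        }
        out
    }
    fn file_extension(&self) -> &'static str {
        "md"
    }
}

fn main() {
    let schema = DatabaseSchema {
        name: "mydb".into(),
        table_names: vec!["users".into(), "orders".into()],
    };
    let md = MarkdownGenerator.generate(&schema);
    assert!(md.contains("# mydb"));
    assert!(md.contains("- users"));
    assert_eq!(MarkdownGenerator.file_extension(), "md");
}
```

Because every format implements the same trait, the CLI can select a generator at runtime from the `--format` flag without format-specific branching in the pipeline.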

Performance Architecture

Memory Management

graph TD
    STREAMING[Streaming Processing] --> BATCHES[Batch Processing]
    BATCHES --> MEMORY[Memory Limits]
    
    LARGE_TABLES[Large Tables] --> PAGINATION[Pagination]
    PAGINATION --> CHUNKS[Chunk Processing]
    
    CONNECTIONS[Connection Pooling] --> LIMITS[Connection Limits]
    LIMITS --> TIMEOUTS[Query Timeouts]
    
    COMPRESSION[Data Compression] --> ZSTD[Zstandard]
    ZSTD --> EFFICIENCY[Storage Efficiency]

Concurrency Model

// Async/await with Tokio runtime
#[tokio::main]
async fn main() -> Result<()> {
    // Connection pooling for concurrent queries
    let pool = PgPoolOptions::new()
        .max_connections(10)
        .connect_timeout(Duration::from_secs(30))
        .connect(&database_url).await?;

    // Concurrent schema collection: run all three queries in
    // parallel and fail fast on the first error
    let (tables, views, indexes) = tokio::try_join!(
        collect_tables(&pool),
        collect_views(&pool),
        collect_indexes(&pool)
    )?;

    Ok(())
}

Testing Architecture

Test Organization

tests/
β”œβ”€β”€ integration/          # End-to-end tests
β”‚   β”œβ”€β”€ postgres_tests.rs
β”‚   β”œβ”€β”€ mysql_tests.rs
β”‚   └── sqlite_tests.rs
β”œβ”€β”€ security/            # Security-focused tests
β”‚   β”œβ”€β”€ credential_tests.rs
β”‚   β”œβ”€β”€ encryption_tests.rs
β”‚   └── offline_tests.rs
└── fixtures/            # Test data
    β”œβ”€β”€ sample_schemas/
    └── test_databases/

Test Categories

graph TD
    TESTS[Test Suite] --> UNIT[Unit Tests]
    TESTS --> INTEGRATION[Integration Tests]
    TESTS --> SECURITY[Security Tests]
    TESTS --> PERFORMANCE[Performance Tests]
    
    UNIT --> MODELS[Model Tests]
    UNIT --> ADAPTERS[Adapter Tests]
    UNIT --> SECURITY_UNIT[Security Unit Tests]
    
    INTEGRATION --> POSTGRES[PostgreSQL Integration]
    INTEGRATION --> MYSQL[MySQL Integration]
    INTEGRATION --> SQLITE[SQLite Integration]
    INTEGRATION --> MONGODB[MongoDB Integration]
    
    SECURITY --> CREDENTIALS[Credential Protection]
    SECURITY --> ENCRYPTION[Encryption Validation]
    SECURITY --> OFFLINE[Offline Operation]
    
    PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
    PERFORMANCE --> MEMORY[Memory Usage Tests]
    PERFORMANCE --> CONCURRENCY[Concurrency Tests]

Build and Distribution Architecture

Feature Matrix

graph TD
    FEATURES[Feature Flags] --> DATABASES[Database Support]
    FEATURES --> SECURITY[Security Features]
    FEATURES --> OPTIONAL[Optional Features]
    
    DATABASES --> POSTGRES[postgresql]
    DATABASES --> MYSQL[mysql]
    DATABASES --> SQLITE[sqlite]
    DATABASES --> MONGODB[mongodb]
    DATABASES --> MSSQL[mssql]
    
    SECURITY --> ENCRYPTION[encryption]
    SECURITY --> COMPRESSION[compression]
    
    OPTIONAL --> TEMPLATES[templates]
    OPTIONAL --> ANALYSIS[analysis]

Binary Variants

DBSurveyor produces multiple binary variants to support different database environments. GoReleaser v2 builds 7 distinct binaries:

  • dbsurveyor - Documentation postprocessor (all features)
  • dbsurveyor-collect - Data collection tool with variants:
    • all - PostgreSQL, MySQL, SQLite, MongoDB, MSSQL
    • postgresql - PostgreSQL only
    • mysql - MySQL only
    • sqlite - SQLite only
    • mongodb - MongoDB only
    • mssql - MSSQL only

Each variant is built with specific feature flags:

# All features (default release artifacts)
cargo zigbuild --release --all-features -p=dbsurveyor-collect

# PostgreSQL-only variant
cargo zigbuild --release --no-default-features \
  --features=postgresql,compression,encryption -p=dbsurveyor-collect

# SQLite-only variant
cargo zigbuild --release --no-default-features \
  --features=sqlite,compression,encryption -p=dbsurveyor-collect

Artifact Naming: Release artifacts follow the pattern:

dbsurveyor_{variant}_{OS}_{arch}.{tar.gz|zip}

Examples:

  • dbsurveyor_all_Linux_x86_64.tar.gz
  • dbsurveyor_postgresql_Darwin_x86_64.tar.gz
  • dbsurveyor_sqlite_Windows_x86_64.zip
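The naming pattern can be reproduced with a small formatting helper. This is a hypothetical sketch of the documented pattern, not part of the release tooling:

```rust
// Hypothetical helper reproducing the documented artifact pattern:
// dbsurveyor_{variant}_{OS}_{arch}.{tar.gz|zip}
fn artifact_name(variant: &str, os: &str, arch: &str) -> String {
    // Per the examples above, Windows archives use zip; others tar.gz
    let ext = if os == "Windows" { "zip" } else { "tar.gz" };
    format!("dbsurveyor_{variant}_{os}_{arch}.{ext}")
}

fn main() {
    assert_eq!(
        artifact_name("all", "Linux", "x86_64"),
        "dbsurveyor_all_Linux_x86_64.tar.gz"
    );
    assert_eq!(
        artifact_name("sqlite", "Windows", "x86_64"),
        "dbsurveyor_sqlite_Windows_x86_64.zip"
    );
}
```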

Deployment Architecture

Airgap Deployment

graph LR
    CONNECTED[Connected System] --> VENDOR[cargo vendor]
    VENDOR --> PACKAGE[Deployment Package]
    
    PACKAGE --> TRANSFER[Secure Transfer]
    TRANSFER --> AIRGAP[Airgap System]
    
    AIRGAP --> OFFLINE_BUILD[Offline Build]
    OFFLINE_BUILD --> BINARIES[DBSurveyor Binaries]

CI/CD Integration

DBSurveyor uses GoReleaser v2 with cargo-zigbuild for cross-compilation:

# GitHub Actions release workflow
  - name: Install Rust toolchain
    uses: dtolnay/rust-toolchain@stable
    with:
      toolchain: 1.93.1

  - name: Install Zig
    uses: mlugg/setup-zig@v2
    with:
      version: 0.13.0

  - name: Install cargo-zigbuild
    run: cargo install --locked cargo-zigbuild --version 0.19.8

  - name: Install Cosign
    uses: sigstore/cosign-installer@v3

  - name: Install Syft
    uses: anchore/sbom-action/download-syft@v0

  - name: Run GoReleaser
    uses: goreleaser/goreleaser-action@v6
    with:
      distribution: goreleaser
      version: ~> v2
      args: release --clean
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      HOMEBREW_TAP_TOKEN: ${{ secrets.HOMEBREW_TAP_TOKEN }}

Cross-Compilation Targets (6 platforms):

  • x86_64-unknown-linux-gnu - Linux x86_64 (glibc)
  • aarch64-unknown-linux-gnu - Linux ARM64 (glibc)
  • x86_64-unknown-linux-musl - Linux x86_64 (musl/Alpine)
  • x86_64-apple-darwin - macOS Intel
  • aarch64-apple-darwin - macOS Apple Silicon
  • x86_64-pc-windows-gnu - Windows x86_64

Security Features:

  • Cosign Keyless Signing: Checksums are signed using GitHub OIDC identity
  • Syft SBOM Generation: Software Bill of Materials for all archives
  • Reproducible Builds: Consistent timestamps via {{ .CommitTimestamp }}

This architecture ensures DBSurveyor maintains its security-first principles while providing flexibility, performance, and maintainability across all supported platforms and use cases.

Contributing

We welcome contributions to DBSurveyor! This guide will help you get started with contributing to the project.

Code of Conduct

DBSurveyor follows the Rust Code of Conduct. Please be respectful and inclusive in all interactions.

Getting Started

Prerequisites

  • Rust 1.93.1+ (MSRV)
  • Git
  • Docker (for integration tests)
  • Just task runner

Development Setup

# Clone the repository
git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor

# Install development tools
just install

# Run initial checks
just dev

Project Structure

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/     # Shared library
β”œβ”€β”€ dbsurveyor-collect/  # Collection binary
β”œβ”€β”€ dbsurveyor/          # Documentation binary
β”œβ”€β”€ docs/                # Documentation source
β”‚   └── solutions/       # Documented solutions to past problems
└── justfile            # Development tasks

Development Workflow

Daily Development

# Format, lint, test, and check coverage
just dev

# Run specific test categories
just test-unit
just test-integration
just test-security

# Security validation
just security-full

# Pre-commit checks
just pre-commit

Code Quality Standards

DBSurveyor enforces strict quality standards:

  • Zero Warnings: cargo clippy -- -D warnings must pass
  • Test Coverage: 55% minimum coverage required (target: 80%, to be raised incrementally)
  • Security First: All code must pass security validation
  • Documentation: All public APIs must have /// documentation

Testing Requirements

All contributions must include appropriate tests:

// Unit tests in source files
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_credential_sanitization() {
        let config = ConnectionConfig::new("postgres://user:secret@host/db");
        let safe_display = config.to_safe_string();
        assert!(!safe_display.contains("secret"));
    }
}

// Integration tests in tests/ directory
#[tokio::test]
async fn test_postgres_collection() {
    let docker = testcontainers::clients::Cli::default();
    let postgres = docker.run(testcontainers::images::postgres::Postgres::default());
    // Test implementation
}

// Security tests are mandatory for security-sensitive code
#[tokio::test]
async fn test_no_credentials_in_output() {
    let schema = collect_schema("postgres://user:secret@localhost/db").await.unwrap();
    let json = serde_json::to_string(&schema).unwrap();
    assert!(!json.contains("secret"));
}

Contribution Types

Bug Reports

When reporting bugs, please include:

  1. System Information: OS, Rust version, DBSurveyor version
  2. Reproduction Steps: Minimal example that reproduces the issue
  3. Expected vs Actual Behavior: Clear description of the problem
  4. Debug Information: Output with RUST_LOG=debug

Security Note: Never include actual database credentials in bug reports.

Feature Requests

For new features, please:

  1. Check Existing Issues: Avoid duplicates
  2. Describe Use Case: Why is this feature needed?
  3. Propose Implementation: High-level approach
  4. Consider Security: How does this maintain security guarantees?

Code Contributions

Pull Request Process

  1. Fork and Branch: Create a feature branch from main
  2. Implement Changes: Follow coding standards
  3. Add Tests: Comprehensive test coverage
  4. Update Documentation: Keep docs in sync
  5. Run Quality Checks: just dev must pass
  6. Submit PR: Clear description and context

Commit Standards

Follow Conventional Commits:

# Feature additions
feat(postgres): add connection pooling with timeout handling
feat(security): implement AES-GCM encryption with random nonces

# Bug fixes
fix(mysql): handle connection failures without exposing credentials
fix(core): ensure proper cleanup of sensitive data structures

# Security improvements
security(core): prevent credential leakage in error messages
security(encryption): add key derivation parameter validation

# Documentation
docs(readme): update installation instructions
docs(security): add encryption implementation details

Database Adapter Development

Adding New Database Support

To add support for a new database:

  1. Create Adapter Module:

    // dbsurveyor-core/src/adapters/newdb.rs
    pub struct NewDbAdapter {
        config: ConnectionConfig,
    }

    #[async_trait]
    impl DatabaseAdapter for NewDbAdapter {
        async fn test_connection(&self) -> Result<()> { ... }
        async fn collect_schema(&self) -> Result<DatabaseSchema> { ... }
        async fn sample_table(&self, table_ref: TableRef<'_>, config: &SamplingConfig) -> Result<TableSample> { ... }
        fn database_type(&self) -> DatabaseType { ... }
        fn supports_feature(&self, feature: AdapterFeature) -> bool { ... }
        fn connection_config(&self) -> ConnectionConfig { ... }
    }
  2. Add Feature Flag:

    # Cargo.toml
    [features]
    newdb = ["dep:newdb-driver"]
    
  3. Update Factory:

    // dbsurveyor-core/src/adapters.rs
    match database_type {
        #[cfg(feature = "newdb")]
        DatabaseType::NewDb => {
            let adapter = NewDbAdapter::new(connection_string).await?;
            Ok(Box::new(adapter))
        }
        // ...
    }
  4. Add Tests:

    // tests/integration/newdb_tests.rs
    #[tokio::test]
    async fn test_newdb_collection() {
        // Integration test with testcontainers
    }

Database Adapter Requirements

All database adapters must:

  • Implement DatabaseAdapter trait completely
    • test_connection(): Verify database connectivity
    • collect_schema(): Collect full database schema information
    • sample_table(): Sample data from a specific table
    • database_type(): Return the database type
    • supports_feature(): Indicate supported features
    • connection_config(): Provide connection configuration
  • Use read-only operations only (SELECT, DESCRIBE, SHOW)
  • Handle connection timeouts (default: 30 seconds)
  • Sanitize credentials in all error messages
  • Support connection pooling where applicable
  • Include comprehensive tests with testcontainers
  • Document database-specific features and limitations
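
The credential-sanitization requirement can be illustrated with a small standalone sketch. This is not the actual redact_database_url from dbsurveyor-core, just a minimal illustration of masking the password portion of a connection URL before it can reach logs or error messages:

```rust
// Minimal illustration of credential sanitization; the real
// redact_database_url in dbsurveyor-core may handle more URL shapes.
fn redact_url(url: &str) -> String {
    // Split off the scheme, then look for userinfo ("user:pass@") in the
    // authority portion and mask everything after the first ':'.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];
    let authority_end = rest.find('/').unwrap_or(rest.len());
    let authority = &rest[..authority_end];
    match authority.rfind('@') {
        Some(at) if authority[..at].contains(':') => {
            let user = authority[..at].split(':').next().unwrap_or("");
            format!("{}{}:****{}", &url[..scheme_end + 3], user, &rest[at..])
        }
        _ => url.to_string(),
    }
}

fn main() {
    let sanitized = redact_url("postgres://user:secret123@localhost:5432/db");
    assert_eq!(sanitized, "postgres://user:****@localhost:5432/db");
    assert!(!sanitized.contains("secret123"));
}
```

Every error path in an adapter should pass connection strings through a helper like this before formatting them.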

The sample_table() Method

The sample_table() method samples data from a single table:

async fn sample_table(
    &self,
    table_ref: TableRef<'_>,
    config: &SamplingConfig,
) -> Result<TableSample>;

Parameters:

  • table_ref: A TableRef containing the table name and optional schema name
  • config: A SamplingConfig with sampling parameters (sample size, throttle, etc.)

Returns: A TableSample that includes:

  • Sampled rows as JSON values
  • Sample metadata (size, total rows, strategy used)
  • Collection timestamp and any warnings
  • sample_status: An optional field indicating the sampling outcome

Sample Status Values:

  • Some(SampleStatus::Complete): Sampling completed successfully
  • Some(SampleStatus::PartialRetry { original_limit }): Sampling partially completed with a reduced limit
  • Some(SampleStatus::Skipped { reason }): Sampling was skipped (e.g., not implemented)
  • None: For backward compatibility with existing data
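
To make the variants concrete, here is a hypothetical shape for SampleStatus and a match over the optional field; the real enum lives in dbsurveyor-core and its field types may differ:

```rust
// Hypothetical mirror of the SampleStatus variants described above;
// field types here are assumptions, not the real definition.
#[derive(Debug)]
enum SampleStatus {
    Complete,
    PartialRetry { original_limit: usize },
    Skipped { reason: String },
}

fn describe(status: Option<&SampleStatus>) -> String {
    match status {
        Some(SampleStatus::Complete) => "complete".to_string(),
        Some(SampleStatus::PartialRetry { original_limit }) => {
            format!("partial (original limit {original_limit})")
        }
        Some(SampleStatus::Skipped { reason }) => format!("skipped: {reason}"),
        // None is the backward-compatibility case for older data files
        None => "unknown (legacy data)".to_string(),
    }
}

fn main() {
    assert_eq!(describe(Some(&SampleStatus::Complete)), "complete");
    assert_eq!(describe(None), "unknown (legacy data)");
}
```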

Implementation Notes:

  • The TableRef struct wraps the table name and optional schema for the method signature
  • Implementations should populate sample_status with SampleStatus::Complete on successful sampling
  • If sampling is not implemented, return SampleStatus::Skipped with an appropriate reason
  • The method supports optional schema qualification (e.g., public.users vs users)
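
The optional schema qualification mentioned above can be sketched as follows; the TableRef here is a hypothetical stand-in with the shape described, not the actual struct definition:

```rust
// Hypothetical TableRef with the shape described above (table name plus
// optional schema); the real struct in dbsurveyor-core may differ.
struct TableRef<'a> {
    schema: Option<&'a str>,
    name: &'a str,
}

// Render a schema-qualified identifier, e.g. "public"."users" vs "users".
fn qualified_name(table: &TableRef<'_>) -> String {
    match table.schema {
        Some(schema) => format!("\"{}\".\"{}\"", schema, table.name),
        None => format!("\"{}\"", table.name),
    }
}

fn main() {
    let qualified = TableRef { schema: Some("public"), name: "users" };
    let bare = TableRef { schema: None, name: "users" };
    assert_eq!(qualified_name(&qualified), "\"public\".\"users\"");
    assert_eq!(qualified_name(&bare), "\"users\"");
}
```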

Testing Database Adapters

# Test specific database adapter
just test-postgres
just test-mysql
just test-sqlite

# Test with real databases using testcontainers
cargo test --test postgres_integration -- --nocapture

# Security testing for new adapters
cargo test --test security_credential_protection

Security Contributions

Security-First Development

All contributions must maintain DBSurveyor’s security guarantees:

  1. No Credential Exposure: Never log or output credentials
  2. Offline Operation: No external network calls except to databases
  3. Encryption Security: Use AES-GCM with random nonces
  4. Memory Safety: Use zeroize for sensitive data
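
Point 4 can be illustrated with a std-only sketch; the project itself uses the zeroize crate, which additionally relies on volatile writes so the compiler cannot optimize the wipe away:

```rust
// Illustrative sketch only: a wrapper that overwrites sensitive bytes
// before the buffer is freed. The real implementation uses the zeroize
// crate, which defeats dead-store elimination; this loop does not.
struct Secret(Vec<u8>);

impl Secret {
    fn wipe(&mut self) {
        for b in self.0.iter_mut() {
            *b = 0;
        }
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        // Zero the buffer automatically when the value goes out of scope
        self.wipe();
    }
}

fn main() {
    let mut s = Secret(b"db password".to_vec());
    s.wipe();
    assert!(s.0.iter().all(|&b| b == 0));
}
```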

Security Review Process

Security-sensitive changes require additional review:

  1. Security Tests: Must include security-specific tests
  2. Threat Model: Consider impact on threat model
  3. Documentation: Update security documentation
  4. Review: Additional security-focused code review

Security Testing

// Example security test
#[tokio::test]
async fn test_new_feature_credential_security() -> Result<(), Box<dyn std::error::Error>> {
    // Test that new feature doesn't leak credentials
    let result = new_feature("postgres://user:secret@localhost/db").await?;
    let output = format!("{:?}", result);
    assert!(!output.contains("secret"));
    assert!(!output.contains("user:secret"));
    Ok(())
}

Documentation Contributions

Documentation Standards

  • User-Focused: Write for the end user
  • Security-Aware: Highlight security implications
  • Example-Rich: Include working code examples
  • Up-to-Date: Keep in sync with code changes

Documentation Types

  1. API Documentation: /// comments in code
  2. User Guide: Markdown files in docs/src/
  3. README: Project overview and quick start
  4. Security Documentation: Security features and guarantees

Building Documentation

# Build API documentation
cargo doc --all-features --document-private-items --open

# Build user guide
just docs

# Check documentation
just docs-check

Release Process

Version Management

DBSurveyor uses semantic versioning:

  • Major: Breaking changes
  • Minor: New features (backward compatible)
  • Patch: Bug fixes

Release Checklist

  1. Update Version: Bump version in Cargo.toml
  2. Update Changelog: Document all changes
  3. Run Full Tests: just security-full
  4. Update Documentation: Ensure docs are current
  5. Create Release: Tag and create GitHub release
  6. Verify Artifacts: Test release binaries

Community Guidelines

Communication

  • GitHub Issues: Bug reports and feature requests
  • Pull Requests: Code contributions and discussions
  • Security Issues: Email security@evilbitlabs.io

Review Process

  1. Automated Checks: CI must pass
  2. Code Review: Maintainer review required
  3. Security Review: For security-sensitive changes
  4. Documentation Review: For user-facing changes

Recognition

Contributors are recognized in:

  • CONTRIBUTORS.md file
  • Release notes
  • Git commit history

Development Environment

  • IDE: VS Code with Rust Analyzer
  • Git Hooks: Pre-commit hooks for quality checks
  • Testing: Nextest for faster test execution
  • Debugging: RUST_LOG=debug for detailed logging

Environment Variables

# Development environment
export RUST_LOG=debug
export DATABASE_URL="postgres://dev:dev@localhost/dev_db"

# Testing environment
export RUST_LOG=trace
export DBSURVEYOR_TEST_TIMEOUT=60

Docker Development

# Start test databases
docker-compose up -d postgres mysql mongodb

# Run integration tests
just test-integration

# Clean up
docker-compose down

Troubleshooting Development Issues

Common Issues

Build failures:

# Clean and rebuild
cargo clean
cargo build --all-features

# Update toolchain
rustup update

Test failures:

# Run specific test with output
cargo test test_name -- --nocapture

# Run with debug logging
RUST_LOG=debug cargo test test_name

Clippy warnings:

# Fix automatically where possible
cargo clippy --fix --allow-dirty

# Check specific warnings
cargo clippy --workspace --all-targets --all-features -- -D warnings

Getting Help

  • Documentation: Check existing docs first
  • Issues: Search existing GitHub issues
  • Code: Look at similar implementations
  • Community: Ask questions in GitHub discussions

License

DBSurveyor is licensed under the Apache License 2.0. By contributing, you agree to license your contributions under the same license.

All contributions must include appropriate copyright headers:

// Copyright 2024 EvilBit Labs
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.

Contributor License Agreement

By submitting a pull request, you represent that:

  1. You have the right to license your contribution
  2. You agree to license it under the Apache License 2.0
  3. Your contribution is your original work

Thank you for contributing to DBSurveyor! Your contributions help make database documentation more secure and accessible for everyone.

Testing

DBSurveyor uses a comprehensive testing strategy to ensure security, reliability, and correctness. This guide covers the testing framework, practices, and how to run tests.

Testing Philosophy

  • Security-First Testing: All tests must verify security guarantees
  • Comprehensive Coverage: 55% minimum test coverage with cargo llvm-cov (target: 80%, to be raised incrementally)
  • Real Database Integration: Use testcontainers for authentic testing
  • Zero Warnings: All test code must pass cargo clippy -- -D warnings

Test Organization

Test Categories

DBSurveyor organizes tests into distinct categories:

graph TD
    TESTS[Test Suite] --> UNIT[Unit Tests]
    TESTS --> INTEGRATION[Integration Tests]
    TESTS --> SECURITY[Security Tests]
    TESTS --> PERFORMANCE[Performance Tests]
    
    UNIT --> MODELS[Model Tests]
    UNIT --> ADAPTERS[Adapter Unit Tests]
    UNIT --> UTILS[Utility Tests]
    
    INTEGRATION --> POSTGRES[PostgreSQL Tests]
    INTEGRATION --> MYSQL[MySQL Tests]
    INTEGRATION --> SQLITE[SQLite Tests]
    INTEGRATION --> MONGODB[MongoDB Tests]
    
    SECURITY --> CREDENTIALS[Credential Protection]
    SECURITY --> ENCRYPTION[Encryption Tests]
    SECURITY --> OFFLINE[Offline Operation]
    
    PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
    PERFORMANCE --> MEMORY[Memory Tests]
    PERFORMANCE --> CONCURRENCY[Concurrency Tests]

Test File Structure

dbsurveyor/
β”œβ”€β”€ dbsurveyor-core/
β”‚   └── src/
β”‚       β”œβ”€β”€ lib.rs              # Unit tests in #[cfg(test)]
β”‚       β”œβ”€β”€ models.rs           # Model unit tests
β”‚       └── security.rs         # Security unit tests
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ integration/
β”‚   β”‚   β”œβ”€β”€ postgres_tests.rs   # PostgreSQL integration
β”‚   β”‚   β”œβ”€β”€ mysql_tests.rs      # MySQL integration
β”‚   β”‚   └── sqlite_tests.rs     # SQLite integration
β”‚   β”œβ”€β”€ security/
β”‚   β”‚   β”œβ”€β”€ credential_tests.rs # Credential protection
β”‚   β”‚   β”œβ”€β”€ encryption_tests.rs # Encryption validation
β”‚   β”‚   └── offline_tests.rs    # Offline operation
β”‚   └── fixtures/
β”‚       β”œβ”€β”€ sample_schemas/     # Test schema files
β”‚       └── test_data/          # Test databases
└── benches/
    β”œβ”€β”€ collection.rs           # Collection benchmarks
    └── documentation.rs        # Documentation benchmarks

Running Tests

Basic Test Commands

# Run all tests with nextest (default)
just test

# Run specific test categories
just test-unit           # Unit tests only
just test-integration    # Integration tests only
just test-security       # Security tests only

# Run tests for specific databases
just test-postgres       # PostgreSQL tests
just test-mysql          # MySQL tests
just test-sqlite         # SQLite tests

Advanced Test Options

# Run tests with different profiles
just test-dev            # Development profile (fast)
just test-ci             # CI profile (comprehensive)

# Run with verbose output
just test-verbose

# Run specific test by name
cargo nextest run test_postgres_connection

# Run tests with debug logging
RUST_LOG=debug cargo nextest run

Coverage Testing

# Generate coverage report (55% minimum required, target: 80%)
just coverage

# Generate HTML coverage report
just coverage-html

# CI-friendly coverage
just coverage-ci

Unit Testing

Test Structure

Unit tests are co-located with source code using #[cfg(test)] modules:

// dbsurveyor-core/src/models.rs
impl DatabaseSchema {
    pub fn new(database_info: DatabaseInfo) -> Self {
        // Implementation
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_database_schema_creation() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);

        assert_eq!(schema.format_version, "1.0");
        assert_eq!(schema.database_info.name, "test_db");
        assert_eq!(schema.object_count(), 0);
    }

    #[test]
    fn test_with_warning() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);

        let schema = schema.with_warning("Test warning".to_string());

        assert_eq!(schema.collection_metadata.warnings.len(), 1);
        assert_eq!(schema.collection_metadata.warnings[0], "Test warning");
    }
}

Security Unit Tests

All security-sensitive code must include security-focused unit tests:

#[cfg(test)]
mod security_tests {
    use super::*;

    #[test]
    fn test_credential_sanitization() {
        let url = "postgres://user:secret123@localhost:5432/db";
        let sanitized = redact_database_url(url);

        // Verify credentials are sanitized
        assert!(!sanitized.contains("secret123"));
        assert!(sanitized.contains("user:****"));
        assert!(sanitized.contains("localhost:5432/db"));
    }

    #[test]
    fn test_connection_config_display() {
        let config = ConnectionConfig::new("localhost".to_string())
            .with_port(5432)
            .with_database("testdb".to_string());

        let display_output = format!("{}", config);

        // Should show connection info but never credentials
        assert!(display_output.contains("localhost:5432"));
        assert!(display_output.contains("testdb"));
        assert!(!display_output.contains("password"));
        assert!(!display_output.contains("secret"));
    }
}

Integration Testing

Database Integration Tests

Integration tests use testcontainers to spin up real databases:

// tests/integration/postgres_tests.rs
use testcontainers::{clients, images};
use dbsurveyor_core::adapters::create_adapter;

#[tokio::test]
async fn test_postgres_schema_collection() {
    let docker = clients::Cli::default();
    let postgres = docker.run(images::postgres::Postgres::default());

    let port = postgres.get_host_port_ipv4(5432);
    let database_url = format!(
        "postgres://postgres:postgres@localhost:{}/postgres",
        port
    );

    // Wait for PostgreSQL to be ready
    wait_for_postgres(&database_url).await;

    // Create test schema
    let pool = sqlx::PgPool::connect(&database_url).await
        .expect("Failed to connect to PostgreSQL");

    sqlx::query("CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name TEXT)")
        .execute(&pool)
        .await
        .expect("Failed to create test table");

    // Test schema collection
    let adapter = create_adapter(&database_url).await
        .expect("Failed to create adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect schema");

    // Verify schema was collected correctly
    assert!(!schema.tables.is_empty());

    let users_table = schema.tables.iter()
        .find(|t| t.name == "users")
        .expect("Users table not found");

    assert_eq!(users_table.name, "users");
    assert_eq!(users_table.columns.len(), 2);

    // Clean up
    pool.close().await;
}

async fn wait_for_postgres(database_url: &str) {
    let max_attempts = 30;
    let mut attempts = 0;

    while attempts < max_attempts {
        match sqlx::PgPool::connect(database_url).await {
            Ok(pool) => {
                pool.close().await;
                return;
            }
            Err(_) => {
                attempts += 1;
                tokio::time::sleep(std::time::Duration::from_millis(500)).await;
            }
        }
    }

    panic!("PostgreSQL failed to become ready after {} attempts", max_attempts);
}

MySQL Integration Tests

// tests/integration/mysql_tests.rs
use testcontainers::{clients, images};
use dbsurveyor_core::adapters::create_adapter;

#[tokio::test]
async fn test_mysql_schema_collection() {
    let docker = clients::Cli::default();
    let mysql = docker.run(
        images::mysql::Mysql::default()
            .with_root_password("testpass")
            .with_database("testdb")
    );

    let port = mysql.get_host_port_ipv4(3306);
    let database_url = format!(
        "mysql://root:testpass@localhost:{}/testdb",
        port
    );

    // Wait for MySQL to be ready (retry loop analogous to wait_for_postgres)
    wait_for_mysql(&database_url).await;

    let adapter = create_adapter(&database_url).await
        .expect("Failed to create MySQL adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect MySQL schema");

    assert!(!schema.tables.is_empty());
}

SQLite Integration Tests

// tests/integration/sqlite_tests.rs
use tempfile::tempdir;

#[tokio::test]
async fn test_sqlite_schema_collection() -> Result<(), Box<dyn std::error::Error>> {
    let temp_dir = tempdir()?;
    let db_path = temp_dir.path().join("test.db");

    // Create test database (mode=rwc lets SQLite create the file)
    let conn = sqlx::SqlitePool::connect(&format!("sqlite://{}?mode=rwc", db_path.display())).await?;

    sqlx::query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        .execute(&conn).await?;

    conn.close().await;

    // Test schema collection
    let database_url = format!("sqlite://{}", db_path.display());
    let adapter = create_adapter(&database_url).await
        .expect("Failed to create SQLite adapter");

    let schema = adapter.collect_schema().await
        .expect("Failed to collect SQLite schema");

    assert!(schema.tables.iter().any(|t| t.name == "users"));
    Ok(())
}

Security Testing

Credential Protection Tests

Security tests verify that credentials never appear in outputs:

// tests/security/credential_tests.rs
#[tokio::test]
async fn test_no_credentials_in_schema_output() -> Result<(), Box<dyn std::error::Error>> {
    // Connection URL shown for context only; the mock schema below never sees it
    let _database_url = "postgres://testuser:secret123@localhost:5432/testdb";

    // Create mock schema (in real test, would collect from database)
    let db_info = DatabaseInfo::new("testdb".to_string());
    let schema = DatabaseSchema::new(db_info);

    let json_output = serde_json::to_string(&schema)?;

    // Verify no sensitive data is present
    assert!(!json_output.contains("secret123"));
    assert!(!json_output.contains("testuser:secret123"));
    assert!(!json_output.contains("password"));
    assert!(!json_output.contains("secret"));

    Ok(())
}

#[test]
fn test_error_message_sanitization() {
    // URL shown for context; the simulated error below must not echo it
    let _connection_url = "postgres://user:secret@host:5432/db";

    // Simulate connection error
    let error = DbSurveyorError::connection_failed("Connection timeout");
    let error_message = format!("{}", error);

    // Error message should not contain credentials
    assert!(!error_message.contains("secret"));
    assert!(!error_message.contains("user:secret"));
}

Encryption Tests

Comprehensive encryption testing ensures security guarantees:

// tests/security/encryption_tests.rs
#[cfg(feature = "encryption")]
mod encryption_tests {
    use dbsurveyor_core::security::encryption::{encrypt_data, decrypt_data};

    #[tokio::test]
    async fn test_encryption_roundtrip() {
        let original_data = b"sensitive database schema data";
        let password = "test_password_123";

        // Encrypt the data
        let encrypted = encrypt_data(original_data, password).unwrap();

        // Verify encrypted data structure
        assert_eq!(encrypted.algorithm, "AES-GCM-256");
        assert_eq!(encrypted.nonce.len(), 12); // 96 bits
        assert_eq!(encrypted.auth_tag.len(), 16); // 128 bits
        assert!(!encrypted.ciphertext.is_empty());

        // Decrypt the data
        let decrypted = decrypt_data(&encrypted, password).unwrap();

        // Verify roundtrip
        assert_eq!(original_data, &decrypted[..]);
    }

    #[tokio::test]
    async fn test_nonce_uniqueness() {
        let data = b"test data for nonce uniqueness";
        let password = "same_password";

        // Generate multiple encryptions
        let encrypted1 = encrypt_data(data, password).unwrap();
        let encrypted2 = encrypt_data(data, password).unwrap();
        let encrypted3 = encrypt_data(data, password).unwrap();

        // Nonces should be different (random)
        assert_ne!(encrypted1.nonce, encrypted2.nonce);
        assert_ne!(encrypted2.nonce, encrypted3.nonce);
        assert_ne!(encrypted1.nonce, encrypted3.nonce);

        // All should decrypt to same plaintext
        let decrypted1 = decrypt_data(&encrypted1, password).unwrap();
        let decrypted2 = decrypt_data(&encrypted2, password).unwrap();
        let decrypted3 = decrypt_data(&encrypted3, password).unwrap();

        assert_eq!(data, &decrypted1[..]);
        assert_eq!(data, &decrypted2[..]);
        assert_eq!(data, &decrypted3[..]);
    }

    #[tokio::test]
    async fn test_wrong_password_fails() {
        let data = b"secret data";
        let correct_password = "correct_password";
        let wrong_password = "wrong_password";

        let encrypted = encrypt_data(data, correct_password).unwrap();

        // Decryption with wrong password should fail
        let result = decrypt_data(&encrypted, wrong_password);
        assert!(result.is_err());
    }
}

Offline Operation Tests

Verify that DBSurveyor works completely offline:

// tests/security/offline_tests.rs
#[tokio::test]
async fn test_airgap_compatibility() -> Result<(), Box<dyn std::error::Error>> {
    // Simulate airgap environment by testing without network access
    let schema_data = include_bytes!("../fixtures/sample_schema.json");
    let schema: DatabaseSchema = serde_json::from_slice(schema_data)?;

    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown).await?;
    assert!(!documentation.is_empty());

    let json_export = generate_documentation(&schema, OutputFormat::Json).await?;
    assert!(!json_export.is_empty());

    Ok(())
}

Performance Testing

Benchmarks

Performance benchmarks use Criterion:

// benches/collection.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use dbsurveyor_core::models::DatabaseSchema;

fn bench_schema_serialization(c: &mut Criterion) {
    let schema = create_large_test_schema(1000); // 1000 tables

    c.bench_function("schema_to_json", |b| {
        b.iter(|| serde_json::to_string(black_box(&schema)))
    });

    c.bench_function("schema_to_markdown", |b| {
        b.iter(|| generate_markdown_documentation(black_box(&schema)))
    });
}

fn create_large_test_schema(table_count: usize) -> DatabaseSchema {
    let db_info = DatabaseInfo::new("benchmark_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    for i in 0..table_count {
        let table = create_test_table(&format!("table_{}", i));
        schema.tables.push(table);
    }

    schema
}

criterion_group!(benches, bench_schema_serialization);
criterion_main!(benches);

Memory Testing

#[tokio::test]
async fn test_memory_usage_large_schema() -> Result<(), Box<dyn std::error::Error>> {
    let initial_memory = get_memory_usage();

    // Process large schema
    let large_schema = create_schema_with_tables(10000);
    let _documentation = generate_documentation(&large_schema, OutputFormat::Html).await?;

    let final_memory = get_memory_usage();
    let memory_increase = final_memory - initial_memory;

    // Ensure memory usage is reasonable (< 100MB for 10k tables)
    assert!(memory_increase < 100 * 1024 * 1024);
    Ok(())
}

Test Configuration

Nextest Configuration

DBSurveyor uses nextest for enhanced test execution:

# .config/nextest.toml
[profile.default]
retries = 2
test-threads = 4
failure-output = "immediate"
success-output = "never"

# Test groups for different types of tests
[[profile.default.overrides]]
filter = "test(integration)"
test-group = "integration"
max-threads = 2              # Limit concurrent container tests

[[profile.default.overrides]]
filter = "test(security)"
test-group = "security"
max-threads = 1           # Security tests run sequentially

[[profile.default.overrides]]
filter = "test(unit)"
test-group = "unit"
max-threads = 8       # Unit tests can run in parallel

CI Test Configuration

# .github/workflows/test.yml
  - name: Run Tests
    run: |
      # Run tests with CI profile
      cargo nextest run --profile ci --workspace --all-features

      # Generate coverage
      cargo llvm-cov --lcov --output-path lcov.info

      # Security validation
      just security-full

Test Data and Fixtures

Test Data Generation

// tests/fixtures/mod.rs
pub fn create_test_schema() -> DatabaseSchema {
    let db_info = DatabaseInfo::new("test_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    schema.tables = vec![
        create_test_table("users"),
        create_test_table("orders"),
        create_test_table("products"),
    ];

    schema
}

pub fn create_test_table(name: &str) -> Table {
    Table {
        name: name.to_string(),
        schema: Some("public".to_string()),
        columns: vec![Column {
            name: "id".to_string(),
            data_type: UnifiedDataType::Integer {
                bits: 32,
                signed: true,
            },
            is_nullable: false,
            is_primary_key: true,
            is_auto_increment: true,
            default_value: None,
            comment: None,
            ordinal_position: 1,
        }],
        primary_key: Some(PrimaryKey {
            name: Some(format!("{}_pkey", name)),
            columns: vec!["id".to_string()],
        }),
        foreign_keys: vec![],
        indexes: vec![],
        constraints: vec![],
        comment: None,
        row_count: Some(100),
    }
}

Fixture Files

tests/fixtures/
β”œβ”€β”€ sample_schemas/
β”‚   β”œβ”€β”€ postgres_sample.json
β”‚   β”œβ”€β”€ mysql_sample.json
β”‚   └── sqlite_sample.json
β”œβ”€β”€ test_databases/
β”‚   β”œβ”€β”€ small_db.sql
β”‚   β”œβ”€β”€ medium_db.sql
β”‚   └── large_db.sql
└── encrypted_samples/
    β”œβ”€β”€ encrypted_schema.enc
    └── compressed_schema.zst

Testing Best Practices

Test Quality Standards

  • Comprehensive: Test happy path, error cases, and edge conditions
  • Isolated: Tests should not depend on external services (except testcontainers)
  • Deterministic: Tests must produce consistent results
  • Fast: Unit tests should complete in milliseconds
  • Secure: No real credentials in test code; use explicit dummy values only

Common Testing Patterns

// Use Result<(), Box<dyn std::error::Error>> for test functions
#[tokio::test]
async fn test_function() -> Result<(), Box<dyn std::error::Error>> {
    // Test implementation
    Ok(())
}

// Test both Ok and Err cases
#[test]
fn test_validation() {
    // Test success case
    assert!(validate_input("valid").is_ok());

    // Test failure cases
    assert!(validate_input("").is_err());
    assert!(validate_input("invalid").is_err());
}

// Use assert_matches! for pattern matching (via the assert_matches crate,
// or std's unstable assert_matches feature)
#[test]
fn test_error_types() {
    let result = operation_that_fails();
    assert_matches!(result, Err(DbSurveyorError::Configuration { .. }));
}

Security Testing Requirements

Every security-sensitive function must have tests that verify:

  1. No credential leakage in any output
  2. Proper error sanitization in all error paths
  3. Secure memory handling with automatic cleanup
  4. Offline operation without external dependencies
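
Requirement 2 (error sanitization) can be sketched with a hypothetical error type whose Display output carries only a pre-redacted target, so the original URL can never leak through an error path:

```rust
// Hypothetical error type for illustration; the real DbSurveyorError in
// dbsurveyor-core has a different shape. The point is that the URL is
// redacted at construction time, not at display time.
use std::fmt;

struct ConnectionError {
    // Only the host/database portion survives construction
    redacted_target: String,
}

impl ConnectionError {
    fn new(url: &str) -> Self {
        // Keep everything after the last '@' (host:port/db); drop userinfo
        let target = url.rsplit('@').next().unwrap_or(url).to_string();
        Self { redacted_target: target }
    }
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection failed to {}", self.redacted_target)
    }
}

fn main() {
    let err = ConnectionError::new("postgres://user:secret@db.internal:5432/app");
    let msg = err.to_string();
    assert!(!msg.contains("secret"));
    assert!(msg.contains("db.internal"));
}
```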

Continuous Integration

GitHub Actions Integration

name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Install tools
        run: just install

      - name: Run tests
        run: just test-ci

      - name: Security validation
        run: just security-full

      - name: Coverage
        run: just coverage-ci

Test Reporting

  • Coverage Reports: Uploaded to Codecov
  • Test Results: JUnit XML format for CI integration
  • Performance: Criterion benchmark results
  • Security: Security test results and audit reports

This comprehensive testing strategy ensures DBSurveyor maintains its security guarantees while providing reliable functionality across all supported platforms and databases.