Introduction
DBSurveyor is a security-focused, offline-first database analysis and documentation toolchain designed for security professionals, database administrators, and DevOps teams.
Core Purpose
Generate comprehensive, secure database schema documentation for PostgreSQL, MySQL, SQLite, and MongoDB databases without compromising security or requiring persistent network connectivity.
Key Features
- Dual-Binary Architecture: Separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) for flexible workflows
- Database Support: PostgreSQL (✅), SQLite (✅), MySQL (🚧), MongoDB (🚧), SQL Server (🚧)
- Security-First Design: AES-GCM encryption, credential protection, zero telemetry
- Offline Operation: Complete functionality in air-gapped environments
- Documentation Generation: Markdown (✅), JSON analysis (✅), HTML/ERD (🚧 in development)
- Shell Completion Support: Built-in completion scripts for bash, zsh, fish, PowerShell, and elvish
- Progress Indicators: Visual feedback with spinners for long-running operations (schema loading, decryption, documentation generation)
- Environment-Aware Output: Respects the NO_COLOR and TERM=dumb environment variables to disable color output and progress indicators
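The NO_COLOR and TERM=dumb conventions boil down to two environment checks. The sketch below is illustrative only, using the standard library, and is not DBSurveyor's actual implementation:

```rust
use std::env;

/// Decide whether color output and progress indicators should be
/// suppressed, following the NO_COLOR and TERM=dumb conventions.
/// (Illustrative sketch, not DBSurveyor's actual code.)
fn color_disabled_from(no_color_set: bool, term: Option<&str>) -> bool {
    // NO_COLOR is honored when set to any value, even an empty string;
    // TERM=dumb signals a terminal without escape-sequence support.
    no_color_set || term == Some("dumb")
}

fn color_disabled() -> bool {
    color_disabled_from(
        env::var_os("NO_COLOR").is_some(),
        env::var("TERM").ok().as_deref(),
    )
}

fn main() {
    println!("color disabled: {}", color_disabled());
}
```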
Security Guarantees
- NO NETWORK CALLS: Operates completely offline after initial installation, apart from the collector's connection to the target database
- NO TELEMETRY: Zero data collection, usage tracking, or external reporting
- NO CREDENTIALS IN OUTPUTS: Database credentials never stored in output files
- AES-GCM ENCRYPTION: Industry-standard authenticated encryption for sensitive outputs
- AIRGAP COMPATIBLE: Full functionality in air-gapped environments
Target Users
- Red Team Operators: Post-compromise database enumeration in contested environments
- Blue Team Analysts: Auditing unknown systems and inherited databases
- Database Administrators: Comprehensive documentation and compliance reporting
- Security Analysts: Database security assessment and risk identification
- System Administrators: Exploring inherited or legacy databases with minimal documentation
Architecture Overview
DBSurveyor uses a dual-binary architecture that separates data collection from documentation generation:
graph LR
A[Database] --> B[dbsurveyor-collect]
B --> C[Schema File]
C --> D[dbsurveyor]
D --> E[Documentation]
C -.-> F[Encrypted]
C -.-> G[Compressed]
E --> H[Markdown]
E --> I[HTML]
E --> J[SQL DDL]
E --> K[Mermaid ERD]
This separation provides several benefits:
- Security: Collection can happen in secure environments, documentation generation offline
- Flexibility: Multiple documentation formats from a single collection
- Portability: Schema files can be transferred between environments
- Auditability: Clear separation between data collection and processing
Installation
Pre-Built Binaries
Pre-built binaries are available on the Releases page for Linux, macOS, and Windows.
Download the appropriate binary for your platform and add it to your PATH.
Homebrew
For macOS and Linux users with Homebrew:
brew install EvilBit-Labs/tap/dbsurveyor
From Source
Prerequisites
- Rust 1.93.1+ (MSRV)
- Git
Clone and Build
git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor
cargo build --release
The compiled binaries will be available in target/release/.
Feature Selection
Control which database engines are compiled in using feature flags:
# Build with all database support
cargo build --release --all-features
# Build with specific databases only
cargo build --release --features postgresql,mysql,encryption
# Build minimal version for airgap environments
cargo build --release --no-default-features --features sqlite
Feature Flags
| Feature | Description | Dependencies |
|---|---|---|
| postgresql | PostgreSQL support | sqlx with postgres driver |
| mysql | MySQL support | sqlx with mysql driver |
| sqlite | SQLite support | sqlx with sqlite driver |
| mongodb | MongoDB support | mongodb crate |
| mssql | SQL Server support | tiberius crate |
| compression | Zstandard compression | zstd crate |
| encryption | AES-GCM encryption | aes-gcm, argon2 crates |
Default Features
default = ["postgresql", "sqlite"]
The default build includes PostgreSQL and SQLite support, which covers the most common use cases while maintaining a reasonable binary size.
Binary Variants
DBSurveyor provides two main binaries:
dbsurveyor-collect
The database collection tool that connects to databases and extracts schema information.
- Default Features: postgresql, sqlite
- Optional Features: mysql, mongodb, mssql, compression, encryption
dbsurveyor
The documentation generator that processes collected schema files.
- Default Features: None (minimal dependencies)
- Optional Features: compression, encryption
Development Setup
For development work, install the complete toolchain:
# Install development dependencies
just install
# This installs:
# - Rust toolchain components (clippy, rustfmt)
# - Cargo tools (audit, deny, llvm-cov, nextest)
# - Security tools (syft for SBOM generation)
# - Documentation tools (mdbook and plugins)
Verification
Verify your installation:
# Check binary versions
./target/release/dbsurveyor-collect --version
./target/release/dbsurveyor --version
# Test with SQLite (no external database required)
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
./target/release/dbsurveyor-collect sqlite://test.db
./target/release/dbsurveyor schema.dbsurveyor.json
# Clean up
rm test.db schema.dbsurveyor.json schema.md
Airgap Installation
For air-gapped environments:
1. Prepare on connected system:
# Download dependencies
cargo fetch
# Create vendor directory
cargo vendor vendor
# Build minimal version
cargo build --release --no-default-features --features sqlite
2. Transfer to airgap system:
- Copy entire project directory including vendor/
- Copy built binaries from target/release/
3. Use offline:
# Use vendored dependencies
cargo build --release --offline --no-default-features --features sqlite
Troubleshooting
Common Issues
Compilation fails with missing dependencies:
- Ensure you have the latest Rust toolchain: rustup update
- Check feature flags match your requirements
Database driver compilation errors:
- Install system dependencies for your target databases
- For PostgreSQL: libpq-dev (Ubuntu) or postgresql-devel (RHEL)
- For MySQL: libmysqlclient-dev (Ubuntu) or mysql-devel (RHEL)
Permission errors:
- Ensure you have write permissions to the target directory
- Use cargo install --path . --root ~/.local for user-local installation
Getting Help
- Check the Troubleshooting section
- Review GitHub Issues
- Consult the CLI Reference for command-specific help
Quick Start
This guide will get you up and running with DBSurveyor in minutes.
Prerequisites
- DBSurveyor installed (see Installation)
- Access to a database (PostgreSQL, MySQL, SQLite, or MongoDB)
Basic Workflow
DBSurveyor follows a two-step process:
- Collect schema from database → schema file
- Generate documentation from schema file → reports
graph LR
A[Database] --> B[dbsurveyor-collect]
B --> C[schema.dbsurveyor.json]
C --> D[dbsurveyor]
D --> E[Documentation]
Example 1: SQLite Database
The simplest example using SQLite (no server setup required):
# Create a sample SQLite database
sqlite3 sample.db << EOF
CREATE TABLE users (
id INTEGER PRIMARY KEY,
username TEXT NOT NULL,
email TEXT UNIQUE,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE posts (
id INTEGER PRIMARY KEY,
user_id INTEGER,
title TEXT NOT NULL,
content TEXT,
FOREIGN KEY (user_id) REFERENCES users(id)
);
INSERT INTO users (username, email) VALUES
('alice', 'alice@example.com'),
('bob', 'bob@example.com');
EOF
# Step 1: Collect schema
dbsurveyor-collect sqlite://sample.db
# Step 2: Generate documentation
dbsurveyor generate schema.dbsurveyor.json
# View the generated documentation
cat schema.md
Example 2: PostgreSQL Database
# Step 1: Collect schema from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb
# Step 2: Generate Markdown documentation (HTML is placeholder)
dbsurveyor generate schema.dbsurveyor.json --format markdown
# View the generated documentation
cat schema.md
Example 3: Encrypted Schema Collection
For sensitive environments, encrypt the schema file:
# Collect with encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db
# Generate documentation from encrypted file (will prompt for password)
dbsurveyor generate schema.enc
Common Commands
Collection Commands
# Basic collection
dbsurveyor-collect <database-url>
# Test connection without collecting
dbsurveyor-collect test <database-url>
# Collect with compression
dbsurveyor-collect --compress <database-url>
# Collect all databases on server
dbsurveyor-collect --all-databases postgres://user:pass@localhost
# List supported database types
dbsurveyor-collect list
Documentation Commands
# Generate Markdown (default)
dbsurveyor generate schema.dbsurveyor.json
# Generate HTML report
dbsurveyor --format html schema.json
# Generate SQL DDL
dbsurveyor sql schema.json --dialect postgresql
# Analyze schema statistics
dbsurveyor analyze schema.json --detailed
# Validate schema file
dbsurveyor validate schema.json
Output Files
DBSurveyor generates several types of files:
Schema Files (from collector)
- schema.dbsurveyor.json - Standard JSON format (specification)
- schema.dbsurveyor.json.zst - Compressed format (with --compress)
- schema.enc - Encrypted format (with --encrypt)
Documentation Files (from generator)
- schema.md - Markdown documentation (✅ implemented)
- schema_analysis.json - Analysis report (✅ implemented)
- schema.html - HTML report with search (🚧 placeholder)
- schema.sql - SQL DDL reconstruction (🚧 placeholder)
- schema.mmd - Mermaid ERD diagram (🚧 placeholder)
Next Steps
Now that you have the basics working:
- Explore Database Support: Learn about database-specific features
- Security Features: Understand encryption and security options
- Advanced Configuration: Customize behavior with configuration options
- CLI Reference: Master all available command-line options
Troubleshooting Quick Start Issues
Connection Issues
# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db
# Check if database is accessible
ping localhost
telnet localhost 5432 # PostgreSQL default port
Permission Issues
# Ensure user has read permissions
# PostgreSQL: GRANT SELECT ON ALL TABLES IN SCHEMA public TO username;
# MySQL: GRANT SELECT ON database.* TO 'username'@'%';
File Not Found
# Check if schema file was created
ls -la *.dbsurveyor.json *.enc *.zst
# Verify file format
file schema.dbsurveyor.json
For more detailed troubleshooting, see the Troubleshooting Guide.
Database Collection
The dbsurveyor-collect tool connects to databases and extracts comprehensive schema information. This guide covers all collection features and options.
Basic Collection
Simple Schema Collection
# Collect from PostgreSQL
dbsurveyor-collect postgres://user:password@localhost:5432/mydb
# Collect from SQLite
dbsurveyor-collect sqlite:///path/to/database.db
# Collect from MySQL
dbsurveyor-collect mysql://root:password@localhost:3306/mydb
Connection Testing
Always test your connection before collecting:
# Test connection without collecting schema
dbsurveyor-collect test postgres://user:pass@localhost/db
# Example output:
# ✅ Connection test successful
# Connection to PostgreSQL database successful
Advanced Collection Options
Output Formats
Control how the collected schema is stored:
# Standard JSON (default)
dbsurveyor-collect --output schema.json postgres://localhost/db
# Compressed with Zstandard
dbsurveyor-collect --compress --output schema.json.zst postgres://localhost/db
# Encrypted with AES-GCM (prompts for password)
dbsurveyor-collect --encrypt --output schema.enc postgres://localhost/db
# Both compressed and encrypted
dbsurveyor-collect --compress --encrypt postgres://localhost/db
Multi-Database Collection
Collect schemas from all databases on a server:
# Collect all accessible databases
dbsurveyor-collect --all-databases postgres://user:pass@localhost
# Include system databases (postgres, template0, etc.)
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost
# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases postgres,template0,template1 postgres://localhost
Performance and Stealth Options
# Throttle operations for stealth (1 second delay between operations)
dbsurveyor-collect --throttle 1000 postgres://localhost/db
# Limit sample data collection
dbsurveyor-collect --sample 50 postgres://localhost/db
# Verbose logging for debugging
dbsurveyor-collect -vvv postgres://localhost/db
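The effect of a throttle delay between operations can be sketched in a few lines of Rust. The run_throttled helper below is hypothetical, not DBSurveyor's real internals; it only illustrates pacing work with a fixed sleep:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Run a sequence of operations with a fixed delay between them,
/// the way a --throttle <ms> option might pace catalog queries.
/// (Hypothetical sketch; not DBSurveyor's actual code.)
fn run_throttled<F: FnMut(usize)>(ops: usize, throttle_ms: u64, mut op: F) {
    for i in 0..ops {
        op(i);
        // Sleep between operations, not after the last one.
        if i + 1 < ops {
            thread::sleep(Duration::from_millis(throttle_ms));
        }
    }
}

fn main() {
    let start = Instant::now();
    run_throttled(3, 10, |i| println!("query {i}"));
    println!("elapsed: {:?}", start.elapsed());
}
```

With three operations and a 10 ms throttle, at least 20 ms elapses in total, since two delays are inserted.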
What Gets Collected
DBSurveyor extracts comprehensive database metadata:
Core Schema Objects
- Tables: Names, schemas, columns, data types, constraints
- Views: Definitions, columns, dependencies
- Indexes: Names, columns, uniqueness, types
- Constraints: Primary keys, foreign keys, check constraints, unique constraints
- Procedures: Stored procedures with parameters and definitions
- Functions: User-defined functions with signatures
- Triggers: Event triggers with timing and definitions
- Custom Types: User-defined types, enums, domains
Metadata Information
- Database Info: Name, version, size, encoding, collation
- Collection Metadata: Timestamp, duration, warnings, collector version
- Statistics: Row counts, object counts, relationship mappings
Sample Data (Optional)
When enabled, DBSurveyor can collect sample data:
# Collect sample data (100 rows per table by default)
dbsurveyor-collect --sample 100 postgres://localhost/db
# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db
Security Note: Sample data may contain sensitive information. Review outputs before sharing.
Database-Specific Features
PostgreSQL
# Full PostgreSQL collection
dbsurveyor-collect postgres://user:pass@localhost:5432/mydb
Collected Objects:
- Tables, views, materialized views
- Indexes (B-tree, Hash, GiST, SP-GiST, GIN, BRIN)
- Constraints (PK, FK, Check, Unique, Exclusion)
- Functions, procedures, triggers
- Custom types, domains, enums
- Extensions and schemas
PostgreSQL-Specific:
- JSONB columns and indexes
- Array types and operations
- Inheritance relationships
- Partitioned tables
MySQL
# MySQL collection (requires --features mysql)
dbsurveyor-collect mysql://root:password@localhost:3306/mydb
Collected Objects:
- Tables, views
- Indexes (Primary, Unique, Index, Fulltext, Spatial)
- Constraints (PK, FK, Check, Unique)
- Stored procedures, functions, triggers
- Events and routines
MySQL-Specific:
- Storage engines (InnoDB, MyISAM, etc.)
- Partitioning information
- Auto-increment values
- Character sets and collations
SQLite
# SQLite collection
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite # Alternative format
Collected Objects:
- Tables, views
- Indexes
- Triggers
- Constraints (limited support)
SQLite-Specific:
- ROWID and WITHOUT ROWID tables
- Virtual tables
- Attached databases
- PRAGMA settings
MongoDB
# MongoDB collection (requires --features mongodb)
dbsurveyor-collect mongodb://user:pass@localhost:27017/mydb
Collected Objects:
- Collections and their schemas
- Indexes (single field, compound, text, geospatial)
- Document structure analysis
- Field statistics and types
MongoDB-Specific:
- Schema inference from documents
- Index usage statistics
- Sharding information
- GridFS collections
Security Considerations
Read-Only Operations
DBSurveyor only performs read operations:
- SELECT statements for data retrieval
- DESCRIBE or SHOW statements for metadata
- Information schema queries
- System catalog queries
No write operations are ever performed.
Credential Handling
# Use environment variables to avoid command history
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect
# Credentials are sanitized in all logs and error messages
# "postgres://user:pass@localhost/db" becomes "postgres://user:****@localhost/db"
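The masking shown above amounts to rewriting the password component of the URL before it can reach a log or error message. A minimal standard-library sketch; the actual sanitizer may cover more edge cases (IPv6 hosts, percent-encoded passwords):

```rust
/// Mask the password portion of a database URL, e.g.
/// "postgres://user:pass@localhost/db" -> "postgres://user:****@localhost/db".
/// (Minimal sketch; not DBSurveyor's actual sanitizer.)
fn sanitize_url(url: &str) -> String {
    // Locate the scheme separator; URLs without one are returned unchanged.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];
    // The authority part ends at the first '/' (or the end of the string).
    let authority_end = rest.find('/').unwrap_or(rest.len());
    // Credentials, if any, sit before the last '@' in the authority.
    let Some(at) = rest[..authority_end].rfind('@') else {
        return url.to_string();
    };
    match rest[..at].find(':') {
        // user:password -> user:****
        Some(colon) => format!(
            "{}://{}:****{}",
            &url[..scheme_end],
            &rest[..colon],
            &rest[at..]
        ),
        // No password present; nothing to mask.
        None => url.to_string(),
    }
}

fn main() {
    println!("{}", sanitize_url("postgres://user:pass@localhost/db"));
}
```

URLs with no credential section, such as sqlite:///path/to/db.sqlite, pass through untouched.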
Connection Security
# Use SSL/TLS connections when available
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
# Connection timeouts prevent hanging
# Default: 30 seconds for connection, 30 seconds for queries
Troubleshooting Collection Issues
Connection Problems
# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db
# Check network connectivity
ping localhost
telnet localhost 5432 # PostgreSQL port
# Verify credentials
psql -h localhost -U user -d db -c "SELECT 1;"
Permission Issues
# PostgreSQL: Grant read permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;
# MySQL: Grant read permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';
# SQLite: Ensure file read permissions
chmod 644 /path/to/database.db
Large Database Handling
# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db
# Throttle operations to reduce load
dbsurveyor-collect --throttle 500 postgres://localhost/large_db
# Exclude large or unnecessary databases
dbsurveyor-collect --all-databases --exclude-databases logs,temp,backup postgres://localhost
Memory and Performance
# Monitor collection progress
dbsurveyor-collect -v postgres://localhost/db
# For very large databases, consider collecting specific schemas
# (This feature is planned for future releases)
Output File Structure
The collected schema file contains:
{
"format_version": "1.0",
"database_info": {
"name": "mydb",
"version": "13.7",
"size_bytes": 1048576,
"encoding": "UTF8"
},
"tables": [...],
"views": [...],
"indexes": [...],
"constraints": [...],
"procedures": [...],
"functions": [...],
"triggers": [...],
"custom_types": [...],
"collection_metadata": {
"collected_at": "2024-01-15T10:30:00Z",
"collection_duration_ms": 5432,
"collector_version": "0.1.0",
"warnings": []
}
}
Next Steps
After collecting schema data:
- Generate Documentation: Use dbsurveyor to create reports
- Analyze Schema: Use analysis commands to understand the database structure
- Secure Storage: Consider encrypting schema files for sensitive databases
- Version Control: Track schema changes over time
See the Documentation Generation guide for the next steps.
Documentation Generation
The dbsurveyor tool processes collected schema files and generates comprehensive documentation in multiple formats. This guide covers all documentation generation features.
Basic Documentation Generation
Generate Markdown Documentation
# Generate Markdown (default format)
dbsurveyor generate schema.dbsurveyor.json
# Explicit format specification
dbsurveyor --format markdown schema.dbsurveyor.json
# Custom output file
dbsurveyor --format markdown --output database-docs.md schema.json
Generate HTML Reports
# Generate HTML report with search functionality
dbsurveyor --format html schema.dbsurveyor.json
# Custom output location
dbsurveyor --format html --output reports/database.html schema.json
Output Formats
DBSurveyor supports multiple documentation formats:
Markdown Documentation
Best for: Version control, wikis, README files
dbsurveyor --format markdown schema.json
Features:
- Clean, readable text format
- Table of contents with navigation
- Code blocks for SQL examples
- Compatible with GitHub, GitLab, etc.
Example Output Structure:
# Database Schema: mydb
## Summary
- **Tables**: 15
- **Views**: 3
- **Indexes**: 28
## Tables
### users
- **Schema**: public
- **Rows**: ~1,250
| Column | Type | Nullable | Default |
|--------|------|----------|---------|
| id | INTEGER | NO | nextval('users_id_seq') |
| username | VARCHAR(50) | NO | |
| email | VARCHAR(255) | YES | |
HTML Reports
Best for: Interactive browsing, presentations, sharing
dbsurveyor --format html schema.json
Features:
- Interactive search and filtering
- Collapsible sections
- Syntax highlighting
- Responsive design
- Offline-capable (no external dependencies)
JSON Analysis
Best for: Programmatic processing, integration with other tools
dbsurveyor --format json schema.json
Features:
- Structured data for automation
- Statistics and metrics
- Relationship analysis
- Machine-readable format
Example Output:
{
"database_name": "mydb",
"analysis": {
"table_count": 15,
"view_count": 3,
"index_count": 28,
"constraint_count": 42
},
"relationships": [
{
"from_table": "orders",
"to_table": "users",
"relationship_type": "foreign_key"
}
]
}
Mermaid ERD Diagrams
Best for: Visual database design, architecture documentation
dbsurveyor --format mermaid schema.json
Features:
- Entity Relationship Diagrams
- Visual table relationships
- Compatible with Mermaid.js
- Can be embedded in Markdown
Example Output:
erDiagram
users {
int id PK
string username
string email
}
orders {
int id PK
int user_id FK
datetime created_at
}
users ||--o{ orders : "has many"
Advanced Features
SQL DDL Reconstruction
Generate SQL DDL scripts to recreate the database structure:
# PostgreSQL DDL
dbsurveyor sql schema.json --dialect postgresql --output recreate.sql
# MySQL DDL
dbsurveyor sql schema.json --dialect mysql --output mysql-schema.sql
# Generic SQL (ANSI standard)
dbsurveyor sql schema.json --dialect generic --output portable.sql
Supported Dialects:
- postgresql - PostgreSQL-specific syntax
- mysql - MySQL-specific syntax
- sqlite - SQLite-specific syntax
- sqlserver - SQL Server-specific syntax
- generic - ANSI SQL standard
Schema Analysis
Get detailed insights about your database structure:
# Basic analysis
dbsurveyor analyze schema.json
# Detailed analysis with statistics
dbsurveyor analyze schema.json --detailed
Analysis Output:
Schema Analysis: mydb
================
Tables: 15
Views: 3
Indexes: 28
Constraints: 42
Detailed Analysis:
- Procedures: 5
- Functions: 12
- Triggers: 8
- Custom Types: 3
Relationship Analysis:
- Foreign Key Relationships: 18
- Self-Referencing Tables: 2
- Orphaned Tables: 1
Schema Validation
Verify schema file integrity and format:
dbsurveyor validate schema.dbsurveyor.json
Validation Checks:
- File format correctness
- Schema version compatibility
- Data integrity verification
- Warning detection
Working with Encrypted Files
DBSurveyor can process encrypted schema files:
# Process encrypted file (will prompt for password)
dbsurveyor generate schema.enc
# All formats work with encrypted input
dbsurveyor --format html schema.enc
dbsurveyor sql schema.enc --dialect postgresql
Security Notes:
- Password prompts are secure (no echo)
- Decryption happens in memory only
- Original encrypted file remains unchanged
Working with Compressed Files
Process compressed schema files automatically:
# Compressed files are detected automatically
dbsurveyor generate schema.dbsurveyor.json.zst
# Works with all output formats
dbsurveyor --format html schema.json.zst
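One plausible way such auto-detection works is suffix-based dispatch. The enum and function below are hypothetical, and the real tool may also inspect file magic bytes rather than trusting the extension:

```rust
/// How to interpret a collected schema file, inferred from its suffix.
/// (Hypothetical sketch; not DBSurveyor's actual detection logic.)
#[derive(Debug, PartialEq)]
enum InputKind {
    PlainJson,  // .dbsurveyor.json
    Compressed, // .zst: Zstandard frame
    Encrypted,  // .enc: AES-GCM container
}

fn classify(path: &str) -> InputKind {
    if path.ends_with(".enc") {
        InputKind::Encrypted
    } else if path.ends_with(".zst") {
        InputKind::Compressed
    } else {
        InputKind::PlainJson
    }
}

fn main() {
    println!("{:?}", classify("schema.dbsurveyor.json.zst"));
}
```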
Data Redaction
Control sensitive data visibility in generated documentation:
Redaction Modes
# No redaction (show all data)
dbsurveyor --redact-mode none schema.json
# Minimal redaction (only obvious sensitive fields)
dbsurveyor --redact-mode minimal schema.json
# Balanced redaction (recommended default)
dbsurveyor --redact-mode balanced schema.json
# Conservative redaction (maximum privacy)
dbsurveyor --redact-mode conservative schema.json
# Disable all redaction
dbsurveyor --no-redact schema.json
Redaction Examples
| Mode | Password Field | Email Field | ID Field |
|---|---|---|---|
| none | password123 | user@example.com | 12345 |
| minimal | [REDACTED] | user@example.com | 12345 |
| balanced | [REDACTED] | [EMAIL] | 12345 |
| conservative | [REDACTED] | [REDACTED] | [ID] |
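The table above reads as a small decision function. A hypothetical Rust sketch of that logic follows; the field-matching rules and markers are illustrative, and the real redaction rules are likely more nuanced:

```rust
#[derive(Clone, Copy)]
enum RedactMode {
    None,
    Minimal,
    Balanced,
    Conservative,
}

/// Redact a sample value according to the mode table above.
/// (Hypothetical sketch; actual field classification may differ.)
fn redact(mode: RedactMode, field: &str, value: &str) -> String {
    let is_password = field.contains("password");
    let is_email = field.contains("email");
    let is_id = field == "id" || field.ends_with("_id");
    match mode {
        RedactMode::None => value.to_string(),
        RedactMode::Minimal if is_password => "[REDACTED]".to_string(),
        RedactMode::Balanced if is_password => "[REDACTED]".to_string(),
        RedactMode::Balanced if is_email => "[EMAIL]".to_string(),
        RedactMode::Conservative if is_password || is_email => "[REDACTED]".to_string(),
        RedactMode::Conservative if is_id => "[ID]".to_string(),
        // Anything not matched by the mode's rules passes through.
        _ => value.to_string(),
    }
}

fn main() {
    println!("{}", redact(RedactMode::Balanced, "email", "user@example.com"));
}
```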
Batch Processing
Process multiple schema files:
# Process all schema files in directory
for file in *.dbsurveyor.json; do
dbsurveyor --format html "$file"
done
# Generate multiple formats from one schema
for format in markdown html mermaid; do
dbsurveyor --format "$format" schema.json
done
Integration Examples
CI/CD Pipeline
# GitHub Actions example
- name: Generate Database Documentation
run: |
dbsurveyor-collect postgres://${{ secrets.DB_URL }}
dbsurveyor --format html --output docs/database.html schema.dbsurveyor.json
dbsurveyor --format markdown --output docs/DATABASE.md schema.dbsurveyor.json
Documentation Website
# Generate documentation for website
dbsurveyor --format html --output public/database.html schema.json
dbsurveyor --format mermaid --output docs/erd.mmd schema.json
# Include in static site generator
cp schema.html docs/_includes/database-schema.html
Version Control
# Track schema changes over time
git add schema.dbsurveyor.json
git commit -m "Update database schema documentation"
# Generate diff-friendly markdown
dbsurveyor --format markdown schema.json
git add schema.md
git commit -m "Update schema documentation"
Customization and Templating
Template Customization
DBSurveyor uses the Askama template engine. Future versions will support custom templates:
# Custom template (planned feature)
dbsurveyor --template custom.html.j2 --format html schema.json
Output Customization
# Custom CSS for HTML output (planned feature)
dbsurveyor --format html --css custom.css schema.json
# Custom header/footer (planned feature)
dbsurveyor --format html --header header.html --footer footer.html schema.json
Performance Considerations
Large Schema Files
# For large schemas, use streaming processing
dbsurveyor --format json schema.json # Most efficient
# HTML generation may be slower for very large schemas
dbsurveyor --format html large-schema.json # May take time
Memory Usage
- JSON format: Minimal memory usage
- Markdown format: Low memory usage
- HTML format: Moderate memory usage (includes search index)
- Mermaid format: Low memory usage
Troubleshooting
Common Issues
File not found:
# Check file exists and has correct extension
ls -la *.dbsurveyor.json *.enc *.zst
Permission denied:
# Check file permissions
chmod 644 schema.dbsurveyor.json
Invalid format:
# Validate schema file first
dbsurveyor validate schema.json
Encryption password issues:
# Ensure you're using the correct password
# Password prompts don't show characters for security
Debug Mode
# Enable verbose logging
dbsurveyor -vvv generate schema.json
# Check file format
file schema.dbsurveyor.json
Next Steps
After generating documentation:
- Review Output: Check generated documentation for accuracy
- Share Securely: Consider redaction levels for different audiences
- Integrate: Include documentation in your workflow or website
- Automate: Set up regular documentation updates
- Version Control: Track documentation changes over time
See the Configuration guide for advanced customization options.
Security Features
DBSurveyor is built with security-first principles that are non-negotiable. This document details the comprehensive security features and guarantees.
Core Security Guarantees
1. Offline-Only Operation
- NO NETWORK CALLS: Operates completely offline after initial installation
- NO TELEMETRY: Zero data collection, usage tracking, or external reporting
- NO AUTO-UPDATES: Manual control over all software updates
- AIRGAP COMPATIBLE: Full functionality in air-gapped environments
2. Credential Protection
Database credentials are never stored, logged, or included in any output files.
Implementation Details
- Immediate Sanitization: Connection strings sanitized in all error messages and logs
- Secure Memory Handling: Credentials automatically zeroed using the zeroize crate
- Separate Handling: Credentials parsed separately from connection configuration
- No Serialization: Credential structures are never serialized to disk
Example of Credential Sanitization
// Input:  "postgres://user:secret123@localhost:5432/mydb"
// Logged: "postgres://user:****@localhost:5432/mydb"
// Error messages never contain actual passwords
3. AES-GCM Encryption
Industry-standard authenticated encryption for sensitive schema outputs.
Encryption Specifications
- Algorithm: AES-GCM-256 (256-bit key, 96-bit nonce, 128-bit authentication tag)
- Nonce Generation: Cryptographically secure random nonces (never reused)
- Key Derivation: Argon2id with secure parameters
- Authentication: Separate authentication tags prevent tampering
Key Derivation Parameters
| Parameter | Value | Purpose |
|---|---|---|
| Algorithm | Argon2id v1.3 | Memory-hard key derivation |
| Salt Size | 16 bytes (128 bits) | Random salt per encryption |
| Memory Cost | 64 MiB (65536 KiB) | Memory hardness |
| Time Cost | 3 iterations | Computational hardness |
| Parallelism | 4 threads | Parallel processing |
| Output Length | 32 bytes (256 bits) | AES-256 key size |
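For reference, the parameters in the table can be captured in one plain struct. The type and constant names below are illustrative, not DBSurveyor's actual types:

```rust
/// Key-derivation parameters from the table above, as a plain struct.
/// (Illustrative names; values mirror the documented Argon2id settings.)
struct KdfParams {
    memory_cost_kib: u32, // memory hardness, in KiB
    time_cost: u32,       // iterations
    parallelism: u32,     // lanes/threads
    salt_len: usize,      // random salt, bytes
    output_len: usize,    // derived key, bytes (AES-256 key size)
}

const ARGON2ID_PARAMS: KdfParams = KdfParams {
    memory_cost_kib: 65536, // 64 MiB
    time_cost: 3,
    parallelism: 4,
    salt_len: 16,
    output_len: 32,
};

fn main() {
    // 65536 KiB of memory hardness is 64 MiB.
    println!("{} MiB", ARGON2ID_PARAMS.memory_cost_kib / 1024);
}
```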
Encryption Process
graph TD
A[Password] --> B[Argon2id KDF]
C[Random Salt] --> B
B --> D[256-bit Key]
E[Schema Data] --> F[AES-GCM Encrypt]
D --> F
G[Random Nonce] --> F
F --> H[Ciphertext]
F --> I[Auth Tag]
J[Encrypted File] --> K[Algorithm: AES-GCM-256]
J --> L[Nonce: 96 bits]
J --> M[Ciphertext]
J --> N[Auth Tag: 128 bits]
J --> O[KDF Params + Salt]
4. Database Security
All database operations follow strict security principles.
Read-Only Operations
- SELECT/DESCRIBE Only: No INSERT, UPDATE, DELETE, or DDL operations
- Connection Flags: Read-only mode enforced at connection level where supported
- User Permissions: Recommend using read-only database users
Connection Security
- Timeouts: All connections and queries have configurable timeouts (default: 30s)
- Connection Pooling: Limited connection pools prevent resource exhaustion
- TLS/SSL: Certificate validation when using encrypted connections
- Parameterized Queries: All queries use parameterized statements (no SQL injection risk)
Example Connection Configuration
ConnectionConfig {
    connect_timeout: Duration::from_secs(30),
    query_timeout: Duration::from_secs(30),
    max_connections: 10,
    read_only: true, // Enforced
    // Credentials handled separately
}
Security Testing
DBSurveyor includes comprehensive security tests to verify all guarantees.
Credential Protection Tests
#[tokio::test]
async fn test_no_credentials_in_output() {
    let schema = collect_schema("postgres://user:secret@localhost/db")
        .await
        .unwrap();
    let json = serde_json::to_string(&schema).unwrap();
    // Verify no sensitive data is present
    assert!(!json.contains("secret"));
    assert!(!json.contains("user:secret"));
    assert!(!json.contains("password"));
}
Encryption Tests
#[test]
fn test_nonce_uniqueness() {
    let data = b"test schema data";
    let encrypted1 = encrypt_data(data, "password").unwrap();
    let encrypted2 = encrypt_data(data, "password").unwrap();
    // Same data should produce different ciphertext (random nonce)
    assert_ne!(encrypted1.nonce, encrypted2.nonce);
    assert_ne!(encrypted1.ciphertext, encrypted2.ciphertext);
}
Offline Operation Tests
#[test]
fn test_airgap_compatibility() {
    // Simulate airgap environment
    let schema = load_schema("fixtures/sample_schema.json").unwrap();
    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown).unwrap();
    assert!(!documentation.is_empty());
}
Security Best Practices
For Users
1. Use Read-Only Database Users
-- PostgreSQL
CREATE USER dbsurveyor_readonly;
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;
-- MySQL
CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
2. Use Strong Encryption Passwords
- Minimum 12 characters
- Mix of uppercase, lowercase, numbers, symbols
- Consider using password managers
3. Secure File Handling
# Set restrictive permissions on encrypted files
chmod 600 schema.enc
# Use secure deletion when done
shred -vfz -n 3 schema.enc
4. Network Security
- Use TLS/SSL connections when possible
- Avoid credentials in command history
- Use environment variables for connection strings
For Developers
1. Never Log Credentials
// ✅ Correct
log::info!("Connecting to database");
// ❌ Never do this
log::info!("Connecting to {}", database_url);
2. Sanitize Error Messages
// ✅ Correct
#[error("Connection failed to database")]
ConnectionFailed,
// ❌ Never do this
#[error("Failed to connect to {url}")]
ConnectionError { url: String },
3. Use Secure Memory Handling
use zeroize::{Zeroize, Zeroizing};

#[derive(Zeroize)]
#[zeroize(drop)]
struct Credentials {
    username: Zeroizing<String>,
    password: Zeroizing<Option<String>>,
}
Security Validation Commands
Run these commands to verify security guarantees:
# Complete security validation
just security-full
# Test encryption capabilities
just test-encryption
# Verify offline operation
just test-offline
# Check for credential leakage
just test-credential-security
# Security audit
just audit
Threat Model
In Scope
- Credential Exposure: Preventing database credentials from appearing in logs, outputs, or error messages
- Data at Rest: Protecting collected schema files with strong encryption
- Network Isolation: Ensuring offline operation after initial setup
- Memory Safety: Secure handling of sensitive data in memory
Out of Scope
- Network Transport Security: TLS/SSL configuration is the user's responsibility
- Database Server Security: DBSurveyor assumes the database server is trusted
- Host System Security: Operating system and file system security is external
- Social Engineering: User education and operational security practices
Compliance Considerations
DBSurveyor's security features support various compliance requirements:
- SOC 2: Offline operation and encryption support security controls
- GDPR: Data minimization and encryption for personal data protection
- HIPAA: Encryption and access controls for healthcare data
- PCI DSS: Secure handling of systems that may contain payment data
Security Advisories
Current known security advisories and their status:
RUSTSEC-2023-0071 (RSA crate - Marvin Attack)
- Severity: Medium
- Impact: Timing side-channel vulnerability in RSA implementation
- Affected: MySQL connections only (not enabled by default)
- Mitigation: MySQL support disabled by default; PostgreSQL and SQLite recommended
- Status: No fix available upstream; tracked for future SQLx updates
- Review Date: 2025-09-01
Reporting Security Issues
To report security vulnerabilities:
- DO NOT create public GitHub issues for security vulnerabilities
- Email security issues to: security@evilbitlabs.io
- Include detailed reproduction steps and impact assessment
- Allow reasonable time for response and remediation
We follow responsible disclosure practices and will acknowledge security reports within 48 hours.
Configuration
DBSurveyor can be configured through command-line options, environment variables, and configuration files. This guide covers all configuration options and best practices.
Environment Variables
Database Connection
# Primary database connection
export DATABASE_URL="postgres://user:password@localhost:5432/mydb"
# Use with collector without specifying URL
dbsurveyor-collect
Logging Configuration
# Set log level for all modules
export RUST_LOG=info
# Set log level for specific modules
export RUST_LOG=dbsurveyor_collect=debug,dbsurveyor_core=trace
# Disable ANSI color output and progress indicators
# Follows the no-color.org convention
# Useful for CI environments, non-TTY contexts, and accessibility
export NO_COLOR=1 # Any value disables color and progress output
export TERM=dumb # Alternative way to disable color/progress output
# Log levels: error, warn, info, debug, trace
export RUST_LOG=debug
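The two variables interact as sketched below (a hypothetical helper mirroring the documented behavior, not the tool's actual code):

```python
def color_enabled(env: dict) -> bool:
    """Apply the documented rules: NO_COLOR set to any value, or
    TERM=dumb, disables color and progress indicators."""
    if "NO_COLOR" in env:
        return False
    if env.get("TERM") == "dumb":
        return False
    return True
```

Usage: `color_enabled(dict(os.environ))` before emitting ANSI escapes or spinners.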
Connection Pool Configuration
# Maximum connection pool size (default: 10)
export DBSURVEYOR_MAX_CONNECTIONS=10
# Minimum idle connections in pool (default: 2)
export DBSURVEYOR_MIN_IDLE_CONNECTIONS=2
# Connection timeout in seconds (default: 30)
export DBSURVEYOR_CONNECT_TIMEOUT_SECS=30
# Idle connection timeout in seconds (default: 600)
export DBSURVEYOR_IDLE_TIMEOUT_SECS=600
# Maximum connection lifetime in seconds (default: 3600)
export DBSURVEYOR_MAX_LIFETIME_SECS=3600
Security Configuration
# Disable credential prompts (use with caution)
export DBSURVEYOR_NO_PROMPT=1
# Default encryption password (NOT RECOMMENDED for production)
export DBSURVEYOR_ENCRYPTION_PASSWORD="your-password"
Command-Line Configuration
Global Options
Available for both dbsurveyor-collect and dbsurveyor:
# Verbosity levels
dbsurveyor-collect -v postgres://localhost/db # Info level
dbsurveyor-collect -vv postgres://localhost/db # Debug level
dbsurveyor-collect -vvv postgres://localhost/db # Trace level
# Quiet mode (errors only)
dbsurveyor-collect -q postgres://localhost/db
# Help and version
dbsurveyor-collect --help
dbsurveyor-collect --version
Collection Configuration
Connection Settings
# Connection timeout (default: 30s)
dbsurveyor-collect --connect-timeout 60 postgres://localhost/db
# Query timeout (default: 30s)
dbsurveyor-collect --query-timeout 45 postgres://localhost/db
# Maximum connections (default: 10)
dbsurveyor-collect --max-connections 5 postgres://localhost/db
Data Collection Settings
# Sample size per table (default: 100, max: 10,000)
# Values above 10,000 are capped to prevent resource exhaustion
dbsurveyor-collect --sample 50 postgres://localhost/db
# Throttle delay between operations in milliseconds
dbsurveyor-collect --throttle 1000 postgres://localhost/db
# Include/exclude specific object types
dbsurveyor-collect --no-views postgres://localhost/db
dbsurveyor-collect --no-procedures postgres://localhost/db
dbsurveyor-collect --no-triggers postgres://localhost/db
Multi-Database Settings
# Collect all databases
dbsurveyor-collect --all-databases postgres://localhost
# Include system databases
dbsurveyor-collect --all-databases --include-system-databases postgres://localhost
# Exclude specific databases
dbsurveyor-collect --all-databases --exclude-databases "postgres,template0,template1" postgres://localhost
Documentation Configuration
Output Format Settings
# Output format
dbsurveyor --format html schema.json
dbsurveyor --format markdown schema.json
dbsurveyor --format json schema.json
dbsurveyor --format mermaid schema.json
# Custom output file
dbsurveyor --output custom-name.html --format html schema.json
Redaction Settings
# Redaction modes
dbsurveyor --redact-mode none schema.json # No redaction
dbsurveyor --redact-mode minimal schema.json # Minimal redaction
dbsurveyor --redact-mode balanced schema.json # Balanced (default)
dbsurveyor --redact-mode conservative schema.json # Maximum redaction
# Disable all redaction
dbsurveyor --no-redact schema.json
Configuration Files
Workspace Configuration
Create a .dbsurveyor.toml file in your project root:
# .dbsurveyor.toml
[collection]
# Default connection settings
connect_timeout = "30s"
query_timeout = "30s"
max_connections = 10
sample_size = 100 # Max: 10,000 (values above are clamped)
throttle_ms = 0
# Object collection settings
include_views = true
include_procedures = true
include_functions = true
include_triggers = true
include_indexes = true
include_constraints = true
include_custom_types = true
# Multi-database settings
include_system_databases = false
exclude_databases = ["postgres", "template0", "template1"]
[output]
# Default output settings
format = "markdown"
compression = false
encryption = false
[redaction]
# Data redaction settings
mode = "balanced"
custom_patterns = [
{ pattern = "(?i)(api[_-]?key)", replacement = "[API_KEY]" },
{ pattern = "(?i)(token)", replacement = "[TOKEN]" },
]
[security]
# Security settings
read_only = true
sanitize_credentials = true
User Configuration
Create a global configuration file at ~/.config/dbsurveyor/config.toml:
# ~/.config/dbsurveyor/config.toml
[defaults]
# Default database type for ambiguous connections
database_type = "postgresql"
# Default output directory
output_dir = "~/Documents/database-docs"
# Default logging level
log_level = "info"
[security]
# Security preferences
always_encrypt = false
strong_passwords_only = true
credential_timeout = "5m"
[templates]
# Custom template directory
template_dir = "~/.config/dbsurveyor/templates"
Database-Specific Configuration
PostgreSQL Configuration
# SSL/TLS settings
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=verify-full&sslcert=client.crt&sslkey=client.key"
# Connection pool settings
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5&pool_timeout=30"
# Schema-specific collection
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"
MySQL Configuration
# SSL settings
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"
# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"
# Connection timeout
dbsurveyor-collect "mysql://user:pass@localhost/db?connect_timeout=30"
SQLite Configuration
# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"
# Busy timeout
dbsurveyor-collect "sqlite:///path/to/db.sqlite?busy_timeout=30000"
# Cache size
dbsurveyor-collect "sqlite:///path/to/db.sqlite?cache_size=2000"
MongoDB Configuration
# Authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"
# SSL settings
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?ssl=true&sslVerify=false"
# Connection timeout
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?connectTimeoutMS=30000"
Advanced Configuration
Custom Redaction Patterns
Create custom redaction rules:
# .dbsurveyor.toml
[redaction.patterns]
# Credit card numbers
credit_card = { pattern = "\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}", replacement = "[CREDIT_CARD]" }
# Social Security Numbers
ssn = { pattern = "\\d{3}-\\d{2}-\\d{4}", replacement = "[SSN]" }
# Phone numbers
phone = { pattern = "\\(\\d{3}\\)\\s?\\d{3}-\\d{4}", replacement = "[PHONE]" }
# Custom API keys
api_key = { pattern = "(?i)api[_-]?key[\"']?\\s*[:=]\\s*[\"']?([a-zA-Z0-9]{32,})", replacement = "[API_KEY]" }
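A sketch of how such patterns behave when applied, assuming simple ordered substitution (the real redaction engine may differ):

```python
import re

# Regexes mirroring two of the patterns above (illustrative only)
PATTERNS = [
    (re.compile(r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}"), "[CREDIT_CARD]"),
    (re.compile(r"\d{3}-\d{2}-\d{4}"), "[SSN]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order; order matters when patterns overlap."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```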
Performance Tuning
# .dbsurveyor.toml
[performance]
# Connection pool settings
max_connections = 10
min_connections = 2
connection_timeout = "30s"
idle_timeout = "10m"
max_lifetime = "1h"
# Query settings
query_timeout = "30s"
batch_size = 1000
max_concurrent_queries = 5
# Memory settings
max_memory_mb = 512
streaming_threshold_mb = 100
Output Customization
# .dbsurveyor.toml
[output.html]
# HTML-specific settings
theme = "dark"
include_search = true
include_toc = true
syntax_highlighting = true
[output.markdown]
# Markdown-specific settings
include_toc = true
table_format = "github"
code_blocks = true
[output.json]
# JSON-specific settings
pretty_print = true
include_metadata = true
compress = false
Environment-Specific Configuration
Development Environment
# .env.development
DATABASE_URL=postgres://dev_user:dev_pass@localhost:5432/dev_db
RUST_LOG=debug
DBSURVEYOR_SAMPLE_SIZE=10
DBSURVEYOR_THROTTLE=0
Production Environment
# .env.production
DATABASE_URL=postgres://readonly_user:secure_pass@prod-db:5432/prod_db
RUST_LOG=warn
DBSURVEYOR_SAMPLE_SIZE=0
DBSURVEYOR_THROTTLE=1000
DBSURVEYOR_ENCRYPT=true
CI/CD Environment
# .env.ci
DATABASE_URL=postgres://ci_user:ci_pass@ci-db:5432/test_db
RUST_LOG=info
NO_COLOR=1
DBSURVEYOR_NO_PROMPT=1
DBSURVEYOR_OUTPUT_FORMAT=json
Configuration Validation
Validate your configuration:
# Check configuration syntax
dbsurveyor-collect --check-config
# Dry run with configuration
dbsurveyor-collect --dry-run postgres://localhost/db
# Show effective configuration
dbsurveyor-collect --show-config
Security Best Practices
Credential Management
# Use environment variables instead of command line
export DATABASE_URL="postgres://user:pass@localhost/db"
dbsurveyor-collect
# Use credential files with restricted permissions
echo "postgres://user:pass@localhost/db" > ~/.dbsurveyor/credentials
chmod 600 ~/.dbsurveyor/credentials
dbsurveyor-collect --credentials-file ~/.dbsurveyor/credentials
Configuration File Security
# Secure configuration files
chmod 600 .dbsurveyor.toml
chmod 600 ~/.config/dbsurveyor/config.toml
# Use environment variable substitution
# .dbsurveyor.toml
[collection]
database_url = "${DATABASE_URL}"
encryption_password = "${ENCRYPTION_PASSWORD}"
Audit Configuration
# Log configuration usage
export RUST_LOG=dbsurveyor_core::config=debug
# Review effective configuration
dbsurveyor-collect --show-config --dry-run
Troubleshooting Configuration
Common Issues
Configuration not found:
# Check configuration file locations
ls -la .dbsurveyor.toml
ls -la ~/.config/dbsurveyor/config.toml
# Use explicit configuration file
dbsurveyor-collect --config custom-config.toml
Environment variable not recognized:
# Check environment variable names (case sensitive)
env | grep DBSURVEYOR
env | grep DATABASE_URL
# Verify variable export
echo $DATABASE_URL
Permission denied:
# Check file permissions
ls -la .dbsurveyor.toml
chmod 644 .dbsurveyor.toml # Make readable
Debug Configuration
# Show all configuration sources
dbsurveyor-collect --show-config --verbose
# Test configuration parsing
dbsurveyor-collect --check-config --verbose
# Trace configuration loading
export RUST_LOG=dbsurveyor_core::config=trace
dbsurveyor-collect --dry-run
Migration and Upgrades
Configuration Migration
When upgrading DBSurveyor versions:
# Backup existing configuration
cp .dbsurveyor.toml .dbsurveyor.toml.backup
# Check for configuration changes
dbsurveyor-collect --check-config --verbose
# Migrate configuration format
dbsurveyor-collect --migrate-config
Version Compatibility
# .dbsurveyor.toml
[meta]
# Specify minimum required version
min_version = "0.1.0"
config_version = "1.0"
This comprehensive configuration system allows you to customize DBSurveyor for your specific needs while maintaining security and performance.
CLI Reference
Complete reference for all DBSurveyor command-line options.
dbsurveyor-collect
Database schema collection tool.
Synopsis
dbsurveyor-collect [OPTIONS] [DATABASE_URL]
dbsurveyor-collect <COMMAND>
Global Options
| Option | Short | Description |
|---|---|---|
--verbose | -v | Increase verbosity (-v, -vv, -vvv) |
--quiet | -q | Suppress all output except errors |
--help | -h | Print help information |
--version | -V | Print version information |
Collection Options
| Option | Description | Default | Status |
|---|---|---|---|
--database-url <URL> | Database connection string | From DATABASE_URL env var | ✅ Implemented |
--output <PATH> | Output file path | schema.dbsurveyor.json | ✅ Implemented |
--sample <N> | Number of sample rows per table | 100 | 🚧 Planned |
--throttle <MS> | Delay between operations (ms) | None | 🚧 Planned |
--compress | Compress output using Zstandard | false | ✅ Implemented |
--encrypt | Encrypt output using AES-GCM | false | ✅ Implemented |
--enable-quality | Enable data quality analysis on sampled data | false | ✅ Implemented |
--quality-threshold <LIST> | Quality thresholds (e.g., completeness:0.9,uniqueness:0.95,consistency:0.85) | None | ✅ Implemented |
--all-databases | Collect all accessible databases | false | 🚧 Planned |
--include-system-databases | Include system databases | false | 🚧 Planned |
--exclude-databases <LIST> | Comma-separated list to exclude | None | 🚧 Planned |
Commands
collect
Collect schema from database.
dbsurveyor-collect collect <DATABASE_URL> [--output <PATH>]
test
Test database connection without collecting schema.
dbsurveyor-collect test <DATABASE_URL>
list
List supported database types and connection formats.
dbsurveyor-collect list
Examples
# Basic collection
dbsurveyor-collect postgres://user:pass@localhost/db
# With custom output file
dbsurveyor-collect --output my-schema.json postgres://localhost/db
# Encrypted and compressed
dbsurveyor-collect --encrypt --compress postgres://localhost/db
# Multi-database collection
dbsurveyor-collect --all-databases --exclude-databases postgres,template0 postgres://localhost
# Test connection only
dbsurveyor-collect test mysql://root:password@localhost/mydb
# Throttled collection (stealth mode)
dbsurveyor-collect --throttle 1000 postgres://localhost/db
Connection String Formats
| Database | Format | Example | Status |
|---|---|---|---|
| PostgreSQL | postgres://user:pass@host:port/db | postgres://admin:secret@localhost:5432/mydb | ✅ Implemented |
| SQLite | sqlite:///path/to/file | sqlite:///home/user/data.db | ✅ Implemented |
| MySQL | mysql://user:pass@host:port/db | mysql://root:password@localhost:3306/mydb | 🚧 In Development |
| MongoDB | mongodb://user:pass@host:port/db | mongodb://admin:secret@localhost:27017/mydb | 🚧 Planned |
| SQL Server | mssql://user:pass@host:port/db | mssql://sa:password@localhost:1433/mydb | 🚧 Planned |
Environment Variables
| Variable | Description |
|---|---|
DATABASE_URL | Default database connection string |
RUST_LOG | Logging configuration (error, warn, info, debug, trace) |
DBSURVEYOR_MAX_CONNECTIONS | Maximum connection pool size (default: 10) |
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2) |
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30) |
DBSURVEYOR_IDLE_TIMEOUT_SECS | Idle connection timeout in seconds (default: 600) |
DBSURVEYOR_MAX_LIFETIME_SECS | Maximum connection lifetime in seconds (default: 3600) |
dbsurveyor
Database schema documentation and analysis tool.
Synopsis
dbsurveyor [OPTIONS] [INPUT_FILE]
dbsurveyor <COMMAND>
Global Options
| Option | Short | Description |
|---|---|---|
--verbose | -v | Increase verbosity (-v, -vv, -vvv) |
--quiet | -q | Suppress all output except errors |
--help | -h | Print help information |
--version | -V | Print version information |
Documentation Options
| Option | Short | Description | Default |
|---|---|---|---|
--format <FORMAT> | -f | Output format | markdown |
--output <PATH> | -o | Output file path | Auto-detected |
--redact-mode <MODE> | | Data redaction level | balanced |
--no-redact | | Disable all data redaction | false |
Output Formats
| Format | Description | Extension | Status |
|---|---|---|---|
markdown | Markdown documentation | .md | ✅ Implemented |
json | JSON analysis report | .json | ✅ Implemented |
html | HTML report with search | .html | 🚧 Placeholder |
mermaid | Mermaid ERD diagram | .mmd | 🚧 Placeholder |
Redaction Modes
| Mode | Description |
|---|---|
none | No redaction (show all data) |
minimal | Minimal redaction (only obvious sensitive fields) |
balanced | Balanced redaction (recommended default) |
conservative | Conservative redaction (maximum privacy) |
Commands
generate
Generate documentation from schema file.
dbsurveyor generate <INPUT_FILE> [OPTIONS]
Options:
--format <FORMAT> - Output format
--output <PATH> - Output file path
analyze
Analyze schema for insights and statistics.
dbsurveyor analyze <INPUT_FILE> [--detailed]
Options:
--detailed - Show detailed analysis statistics
sql
Reconstruct SQL DDL from schema.
dbsurveyor sql <INPUT_FILE> [OPTIONS]
Options:
--dialect <DIALECT> - Target SQL dialect (default: postgresql)
--output <PATH> - Output file path
SQL Dialects:
postgresql - PostgreSQL dialect (🚧 Placeholder)
mysql - MySQL dialect (🚧 Placeholder)
sqlite - SQLite dialect (🚧 Placeholder)
sqlserver - SQL Server dialect (🚧 Placeholder)
generic - Generic SQL (ANSI standard) (🚧 Placeholder)
validate
Validate schema file format.
dbsurveyor validate <INPUT_FILE>
Examples
# Generate Markdown documentation
dbsurveyor generate schema.dbsurveyor.json
# Generate HTML report
dbsurveyor --format html --output report.html schema.json
# Process encrypted schema (will prompt for password)
dbsurveyor generate schema.enc
# Generate SQL DDL for MySQL
dbsurveyor sql schema.json --dialect mysql --output recreate.sql
# Analyze schema with detailed statistics
dbsurveyor analyze schema.json --detailed
# Validate schema file format
dbsurveyor validate schema.dbsurveyor.json
# Generate with conservative redaction
dbsurveyor --redact-mode conservative schema.json
# Generate without any redaction
dbsurveyor --no-redact schema.json
Input File Formats
DBSurveyor automatically detects input file formats:
| Extension | Format | Description |
|---|---|---|
.json | JSON | Standard schema format |
.zst | Compressed | Zstandard compressed JSON |
.enc | Encrypted | AES-GCM encrypted JSON |
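The detection logic can be sketched as a suffix check (illustrative only; the binaries implement their own detection):

```python
def detect_format(filename: str) -> str:
    """Map a file extension to its format, per the table above.
    Check the more specific suffixes first."""
    if filename.endswith(".enc"):
        return "encrypted"
    if filename.endswith(".zst"):
        return "compressed"
    if filename.endswith(".json"):
        return "json"
    return "unknown"
```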
Exit Codes
| Code | Description |
|---|---|
0 | Success |
1 | General error |
Note: Both binaries currently use exit code 1 for all error conditions. Granular exit codes (e.g., file-not-found, connection failure) may be added in a future release.
Environment Variables
| Variable | Description |
|---|---|
RUST_LOG | Logging configuration |
NO_COLOR | Disable colored output |
DBSURVEYOR_MAX_CONNECTIONS | Maximum connection pool size (default: 10) |
DBSURVEYOR_MIN_IDLE_CONNECTIONS | Minimum idle connections in pool (default: 2) |
DBSURVEYOR_CONNECT_TIMEOUT_SECS | Connection timeout in seconds (default: 30) |
DBSURVEYOR_IDLE_TIMEOUT_SECS | Idle connection timeout in seconds (default: 600) |
DBSURVEYOR_MAX_LIFETIME_SECS | Maximum connection lifetime in seconds (default: 3600) |
Common Usage Patterns
Secure Workflow
# 1. Test connection
dbsurveyor-collect test postgres://user:pass@localhost/db
# 2. Collect with encryption
dbsurveyor-collect --encrypt postgres://user:pass@localhost/db
# 3. Generate documentation offline
dbsurveyor generate schema.enc
# 4. Validate output
dbsurveyor validate schema.enc
Multi-Database Documentation
# Collect from multiple databases
dbsurveyor-collect --all-databases postgres://localhost > collection.log
# Generate comprehensive report
dbsurveyor --format html --output full-report.html schema.dbsurveyor.json
# Extract SQL for specific database
dbsurveyor sql schema.json --dialect postgresql --output postgres-ddl.sql
Development Workflow
# Quick collection and documentation
dbsurveyor-collect sqlite://dev.db && dbsurveyor generate schema.dbsurveyor.json
# Analyze changes
dbsurveyor analyze schema.dbsurveyor.json --detailed
# Generate multiple formats
for format in markdown html mermaid; do
dbsurveyor --format $format schema.dbsurveyor.json
done
Output Formats
DBSurveyor generates structured output in multiple formats to support different use cases and workflows.
Schema File Formats
Standard JSON Format (.dbsurveyor.json)
The primary output format is a JSON file following the DBSurveyor Schema v1.0 specification:
{
"format_version": "1.0",
"database_info": {
"name": "production_db",
"version": "13.7",
"size_bytes": 1073741824,
"encoding": "UTF8",
"collation": "en_US.UTF-8",
"access_level": "Full",
"collection_status": "Success"
},
"tables": [
{
"name": "users",
"schema": "public",
"columns": [
{
"name": "id",
"data_type": {
"Integer": {
"bits": 32,
"signed": true
}
},
"is_nullable": false,
"is_primary_key": true,
"is_auto_increment": true,
"ordinal_position": 1
}
],
"primary_key": {
"name": "users_pkey",
"columns": [
"id"
]
},
"foreign_keys": [],
"indexes": [],
"constraints": []
}
],
"views": [],
"indexes": [],
"constraints": [],
"procedures": [],
"functions": [],
"triggers": [],
"custom_types": [],
"collection_metadata": {
"collected_at": "2024-01-15T10:30:00Z",
"collection_duration_ms": 1500,
"collector_version": "1.0.0",
"warnings": []
}
}
Table Sampling Data
Tables may optionally include a sampling field containing sampled row data with metadata:
{
"table_name": "users",
"schema_name": "public",
"rows": [
{
"id": 1,
"username": "alice",
"email": "alice@example.com"
},
{
"id": 2,
"username": "bob",
"email": "bob@example.com"
}
],
"sample_size": 2,
"total_rows": 1234,
"sampling_strategy": {
"MostRecent": {
"limit": 10
}
},
"collected_at": "2024-01-15T10:30:00Z",
"warnings": [],
"sample_status": "Complete"
}
Sample Status Field
The optional sample_status field indicates the outcome of the sampling operation:
- "Complete": Sampling completed successfully with no issues
- {"PartialRetry": {"original_limit": 10}}: Sampling partially completed with a reduced row limit (may be retried)
- {"Skipped": {"reason": "table too large"}}: Sampling was skipped or not performed
Backward Compatibility: The sample_status field is optional and omitted when not set, ensuring older JSON files without this field remain valid.
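Consumers can branch on the field as sketched below (a hypothetical Python helper; treating a missing field as "Complete" follows the backward-compatibility note above):

```python
def describe_sample_status(sample: dict) -> str:
    """Interpret the optional sample_status field. An absent field is
    treated as Complete, matching older files that predate it."""
    status = sample.get("sample_status", "Complete")
    if status == "Complete":
        return "complete"
    if isinstance(status, dict) and "PartialRetry" in status:
        return "partial (original limit {})".format(status["PartialRetry"]["original_limit"])
    if isinstance(status, dict) and "Skipped" in status:
        return "skipped: " + status["Skipped"]["reason"]
    return "unknown"
```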
Compressed Format (.dbsurveyor.json.zst)
Large schema files can be compressed using Zstandard compression:
# Generate compressed output
dbsurveyor-collect --compress postgres://localhost/large_db
# Postprocessor automatically detects and decompresses
dbsurveyor generate schema.dbsurveyor.json.zst
Benefits:
- 60-80% size reduction for large schemas
- Fast compression/decompression
- Maintains full data integrity
Encrypted Format (.enc)
Sensitive schema data can be encrypted using AES-GCM-256:
# Generate encrypted output (prompts for password)
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db
# Postprocessor prompts for decryption password
dbsurveyor generate schema.enc
Security Features:
- AES-GCM-256 authenticated encryption
- Argon2id key derivation (64 MiB memory, 3 iterations)
- Random 96-bit nonces (never reused)
- Embedded KDF parameters for decryption
Encrypted File Structure:
{
"algorithm": "AES-GCM-256",
"nonce": [/* 12 bytes */],
"ciphertext": [/* encrypted data */],
"auth_tag": [/* 16 bytes */],
"kdf_params": {
"salt": [/* 16 bytes */],
"memory_cost": 65536,
"time_cost": 3,
"parallelism": 4,
"version": "1.3"
}
}
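A downstream consumer could sanity-check this envelope before attempting decryption; the helper below is an illustration based only on the structure shown (not DBSurveyor's own validator):

```python
def check_envelope(doc: dict) -> list[str]:
    """Return a list of problems with an encrypted-file envelope,
    using the documented field sizes (12-byte nonce, 16-byte tag)."""
    problems = []
    if doc.get("algorithm") != "AES-GCM-256":
        problems.append("unexpected algorithm")
    if len(doc.get("nonce", [])) != 12:
        problems.append("nonce must be 96 bits (12 bytes)")
    if len(doc.get("auth_tag", [])) != 16:
        problems.append("auth tag must be 128 bits (16 bytes)")
    if "kdf_params" not in doc:
        problems.append("missing KDF parameters")
    return problems
```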
Documentation Formats
Markdown Documentation (.md)
Status: ✅ Implemented
Basic Markdown documentation with schema overview:
dbsurveyor generate schema.json --format markdown
Output Example:
# Database Schema: production_db
Generated by DBSurveyor v1.0.0
Collection Date: 2024-01-15 10:30:00 UTC
## Summary
- **Tables**: 25
- **Views**: 3
- **Indexes**: 47
## Tables
### users
- **Schema**: public
- **Columns**: 5
- **Primary Key**: id
JSON Analysis Report (.json)
Status: ✅ Implemented
Structured analysis data for programmatic consumption:
dbsurveyor analyze schema.json --detailed
Output Example:
{
"database_name": "production_db",
"table_count": 25,
"view_count": 3,
"index_count": 47,
"constraint_count": 32,
"collection_date": "2024-01-15T10:30:00Z",
"analysis": {
"largest_table": "audit_log",
"most_indexes": "users",
"foreign_key_relationships": 18
}
}
HTML Report (.html)
Status: 🚧 Placeholder Implementation
Interactive HTML reports with search and navigation:
dbsurveyor generate schema.json --format html
Planned Features:
- Interactive table browser
- Search functionality
- Relationship visualization
- Responsive design for mobile/desktop
- Offline-compatible (no external dependencies)
Mermaid ERD (.mmd)
Status: 🚧 Placeholder Implementation
Entity Relationship Diagrams using Mermaid syntax:
dbsurveyor generate schema.json --format mermaid
Planned Output:
erDiagram
users {
int id PK
string username
string email UK
datetime created_at
}
orders {
int id PK
int user_id FK
decimal total
datetime created_at
}
users ||--o{ orders : "has many"
SQL DDL (.sql)
Status: 🚧 Placeholder Implementation
Reconstructed SQL DDL for schema recreation:
dbsurveyor sql schema.json --dialect postgresql --output recreate.sql
Planned Features:
- Multiple SQL dialect support (PostgreSQL, MySQL, SQLite, SQL Server)
- Complete schema recreation scripts
- Data type mapping between databases
- Constraint and index recreation
- Migration script generation
Unified Data Type System
DBSurveyor uses a unified type system to represent data types across different databases:
Basic Types
{
"String": {
"max_length": 255
},
"Integer": {
"bits": 32,
"signed": true
},
"Float": {
"precision": 53
},
"Boolean": null,
"Date": null,
"DateTime": {
"with_timezone": true
},
"Time": {
"with_timezone": false
}
}
Advanced Types
{
"Binary": {
"max_length": 1024
},
"Json": null,
"Uuid": null,
"Array": {
"element_type": {
"String": {
"max_length": 50
}
}
},
"Custom": {
"type_name": "user_status_enum"
}
}
Database-Specific Mapping
| DBSurveyor Type | PostgreSQL | MySQL | SQLite | MongoDB |
|---|---|---|---|---|
String | VARCHAR, TEXT | VARCHAR, TEXT | TEXT | string |
Integer | INTEGER, BIGINT | INT, BIGINT | INTEGER | int, long |
Float | REAL, DOUBLE | FLOAT, DOUBLE | REAL | double |
Boolean | BOOLEAN | BOOLEAN | INTEGER | bool |
Json | JSON, JSONB | JSON | TEXT | object |
Array | ARRAY[] | JSON | TEXT | array |
Custom | ENUM, DOMAIN | ENUM | N/A | N/A |
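To make the unified types concrete, here is a sketch that renders a type object as a short label (a hypothetical helper; the real mapping covers many more cases):

```python
def describe_type(dbs_type: dict) -> str:
    """Render a unified DBSurveyor type object as a readable label,
    based on the Basic/Advanced type shapes shown above."""
    if "String" in dbs_type:
        max_len = dbs_type["String"].get("max_length")
        return f"string({max_len})" if max_len else "string"
    if "Integer" in dbs_type:
        info = dbs_type["Integer"]
        prefix = "" if info.get("signed", True) else "u"
        return f"{prefix}int{info['bits']}"
    if "Custom" in dbs_type:
        return dbs_type["Custom"]["type_name"]
    # Fall back to the variant name (e.g. "Boolean", "Json", "Uuid")
    return next(iter(dbs_type))
```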
Schema Validation
All output files are validated against the DBSurveyor JSON Schema v1.0:
Validation Features
- Format Version Checking: Ensures compatibility
- Required Field Validation: All mandatory fields present
- Data Type Validation: Correct type structure
- Security Validation: No credential fields allowed
- Relationship Validation: Foreign key consistency
Validation Errors
# Validate a schema file
dbsurveyor validate schema.json
# Example validation error
Error: Schema validation failed
- Missing required field: collection_metadata.collected_at
- Invalid data type: expected Integer, found String
- Security violation: credential field detected in column default_value
File Size Considerations
Compression Recommendations
| Schema Size | Recommendation | Expected Compression |
|---|---|---|
| < 1 MB | Standard JSON | N/A |
| 1-10 MB | Zstandard compression | 60-70% reduction |
| > 10 MB | Compression + chunking | 70-80% reduction |
Large Schema Handling
For very large databases (1000+ tables):
# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db
# Consider sampling for development
dbsurveyor-collect --sample 10 postgres://localhost/large_db
Security Considerations
Credential Protection
All output formats guarantee:
- No database credentials in any output file
- Connection strings sanitized in metadata
- Error messages sanitized
- Validation prevents credential leakage
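The sanitization guarantee can be illustrated with a small sketch that strips userinfo from a connection URL before it reaches logs or metadata (an assumption about the approach, not the tool's actual code):

```python
from urllib.parse import urlsplit, urlunsplit

def sanitize_url(url: str) -> str:
    """Drop the user:password portion of a connection string,
    keeping scheme, host, port, and path intact."""
    parts = urlsplit(url)
    if parts.username is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```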
Encryption Best Practices
# Use strong passwords for encryption
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db
# Verify encrypted file integrity
dbsurveyor validate schema.enc
Airgap Compatibility
All formats work completely offline:
- No external dependencies in generated files
- Self-contained documentation
- Local-only processing
- No network calls during generation
Format Evolution
Version Compatibility
- v1.0: Current format with full backward compatibility
- Future versions: Additive changes only
- Migration tools: Automatic format upgrades
- Deprecation policy: 2-version support window
Schema Extensions
The format supports extensions for:
- Database-specific features
- Custom metadata
- Plugin-generated data
- Performance metrics
Integration Examples
CI/CD Pipeline
# Generate schema documentation in CI
- name: Collect Database Schema
run: |
dbsurveyor-collect --compress $DATABASE_URL
dbsurveyor generate schema.dbsurveyor.json.zst --format markdown
- name: Upload Documentation
uses: actions/upload-artifact@v3
with:
name: database-docs
path: schema.md
Programmatic Access
import json
# Load schema data
with open("schema.dbsurveyor.json") as f:
schema = json.load(f)
# Analyze tables
for table in schema["tables"]:
print(f"Table: {table['name']}")
print(f"Columns: {len(table['columns'])}")
Documentation Automation
#!/bin/bash
# Automated documentation generation
# Collect from multiple environments
dbsurveyor-collect --output prod.json $PROD_DB_URL
dbsurveyor-collect --output staging.json $STAGING_DB_URL
# Generate documentation
dbsurveyor generate prod.json --format html --output prod-schema.html
dbsurveyor generate staging.json --format html --output staging-schema.html
# Compare schemas (future feature)
# dbsurveyor compare prod.json staging.json --output diff.html
This comprehensive output format system ensures that DBSurveyor can adapt to various workflows while maintaining security and compatibility guarantees.
JSON Schema Specification
The .dbsurveyor.json format is the standard output format for DBSurveyor schema collection. This specification documents the complete structure, validation rules, and usage examples for the schema format.
Overview
The .dbsurveyor.json format provides a comprehensive, validated representation of database schemas with the following characteristics:
- Security-First: No credential fields or sensitive data allowed
- Validation-Complete: Full JSON Schema validation ensures data integrity
- Database-Agnostic: Unified representation across PostgreSQL, MySQL, SQLite, and MongoDB
- Version-Aware: Format versioning for backward compatibility
- Frictionless-Compatible: Based on Frictionless Data Table Schema specification
Schema Structure
Root Object
Every .dbsurveyor.json file contains these required fields:
{
"format_version": "1.0",
"database_info": { /* Database metadata */ },
"tables": [ /* Table definitions */ ],
"collection_metadata": { /* Collection process info */ }
}
Required Fields
| Field | Type | Description |
|---|---|---|
| format_version | String | Schema format version (currently "1.0") |
| database_info | Object | Database-level information and status |
| collection_metadata | Object | Collection process metadata |
Optional Fields
| Field | Type | Description |
|---|---|---|
| tables | Array | Table definitions (default: empty array) |
| views | Array | View definitions |
| indexes | Array | Database indexes |
| constraints | Array | Database constraints |
| procedures | Array | Stored procedures |
| functions | Array | Database functions |
| triggers | Array | Database triggers |
| custom_types | Array | Custom data types |
| samples | Array | Data samples from tables |
Database Information
The database_info object contains essential database metadata:
{
"name": "production_db",
"version": "13.7",
"size_bytes": 1073741824,
"encoding": "UTF8",
"collation": "en_US.UTF-8",
"owner": "dbadmin",
"is_system_database": false,
"access_level": "Full",
"collection_status": "Success"
}
Access Levels
- Full: Complete schema access with all metadata
- Limited: Partial access due to permission constraints
- None: No access to schema information
Collection Status
"Success": Schema collected successfully{"Failed": {"error": "Permission denied"}}: Collection failed with reason{"Skipped": {"reason": "System database"}}: Database skipped with explanation
Table Structure
Tables are defined with comprehensive metadata:
{
"name": "users",
"schema": "public",
"columns": [
{
"name": "id",
"data_type": {
"Integer": {
"bits": 32,
"signed": true
}
},
"is_nullable": false,
"is_primary_key": true,
"is_auto_increment": true,
"ordinal_position": 1,
"comment": "Unique user identifier"
},
{
"name": "email",
"data_type": {
"String": {
"max_length": 255
}
},
"is_nullable": false,
"ordinal_position": 2,
"comment": "User email address"
}
],
"primary_key": {
"name": "users_pkey",
"columns": [
"id"
]
},
"foreign_keys": [
{
"name": "users_profile_fk",
"columns": [
"profile_id"
],
"referenced_table": "profiles",
"referenced_columns": [
"id"
],
"on_delete": "Cascade",
"on_update": "Cascade"
}
],
"indexes": [
{
"name": "users_email_idx",
"columns": [
{
"name": "email",
"order": "asc"
}
],
"is_unique": true,
"index_type": "btree"
}
],
"comment": "User account information",
"row_count": 50000
}
Data Type System
DBSurveyor uses a unified data type system that maps database-specific types to a common representation:
Basic Types
"Boolean" // Boolean values
"Date" // Date without time
"Json" // JSON data
"Uuid" // UUID/GUID values
String Types
{
"String": {
"max_length": 255 // Maximum character length
}
}
Numeric Types
{
"Integer": {
"bits": 32, // Bit size: 8, 16, 32, 64, 128
"signed": true // Signed or unsigned
}
}
{
"Float": {
"precision": 53 // Floating point precision (1-53)
}
}
Date/Time Types
{
"DateTime": {
"with_timezone": true // Includes timezone information
}
}
{
"Time": {
"with_timezone": false // Time without timezone
}
}
Complex Types
{
"Array": {
"element_type": "String" // Array element type
}
}
{
"Binary": {
"max_length": 1024 // Maximum binary length
}
}
{
"Custom": {
"type_name": "geometry" // Database-specific custom type
}
}
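Because data_type values are likewise either bare strings or single-key objects, a small recursive helper can render them for display. The function below is a hypothetical sketch covering only a few variants:

```python
def render_type(dt):
    """Render a unified data_type value (bare string or single-key
    object) as human-readable text; covers only a few variants."""
    if isinstance(dt, str):
        return dt  # basic types: "Boolean", "Date", "Json", "Uuid"
    (name, params), = dt.items()
    if name == "Integer":
        sign = "signed" if params.get("signed", True) else "unsigned"
        return f"Integer({params['bits']}-bit, {sign})"
    if name == "String":
        return f"String(max_length={params.get('max_length')})"
    if name == "Array":
        # element_type may itself be a nested type, so recurse
        return f"Array<{render_type(params['element_type'])}>"
    return f"{name}({params})"  # fallback: variant name plus raw parameters

print(render_type({"Integer": {"bits": 32, "signed": True}}))  # Integer(32-bit, signed)
```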
Constraints and Relationships
Primary Keys
{
"name": "users_pkey",
"columns": [
"id"
]
}
Foreign Keys
{
"name": "orders_user_fk",
"columns": [
"user_id"
],
"referenced_table": "users",
"referenced_schema": "public",
"referenced_columns": [
"id"
],
"on_delete": "Cascade",
"on_update": "Restrict"
}
Referential Actions:
- Cascade: Delete/update cascades to related records
- SetNull: Set foreign key to NULL
- SetDefault: Set foreign key to default value
- Restrict: Prevent deletion/update if references exist
- NoAction: No automatic action
Check Constraints
{
"name": "users_age_check",
"constraint_type": "Check",
"definition": "age >= 0 AND age <= 150",
"enforced": true
}
Indexes
Index definitions include performance characteristics:
{
"name": "users_email_idx",
"table_name": "users",
"schema": "public",
"columns": [
{
"name": "email",
"order": "asc",
"nulls_order": "last"
}
],
"is_unique": true,
"is_primary": false,
"index_type": "btree",
"comment": "Unique index on email for fast lookups"
}
Index Types:
- btree: Balanced tree (default)
- hash: Hash-based index
- gin: Generalized inverted index
- gist: Generalized search tree
- spgist: Space-partitioned GiST
Data Sampling
Optional data samples provide insight into actual data:
{
"samples": [
{
"table_name": "users",
"schema_name": "public",
"rows": [
{
"id": 1001,
"email": "user1001@example.com",
"created_at": "2024-01-15T09:00:00Z"
},
{
"id": 1002,
"email": "user1002@example.com",
"created_at": "2024-01-15T09:15:00Z"
}
],
"sample_size": 2,
"total_rows": 50000,
"sampling_strategy": {
"MostRecent": {
"limit": 10
}
},
"collected_at": "2024-01-15T10:30:00Z",
"warnings": [
"Large table - limited sample collected"
],
"sample_status": "Complete"
}
]
}
Table Sample Fields
| Field | Type | Required | Description |
|---|---|---|---|
| table_name | String | Yes | Name of the sampled table |
| schema_name | String | No | Schema/database name (null for databases without schemas) |
| rows | Array | Yes | Array of sampled row data as JSON objects |
| sample_size | Integer | Yes | Number of rows actually sampled |
| total_rows | Integer | No | Estimated total row count in the table |
| sampling_strategy | Object/String | Yes | Strategy used for sampling (see below) |
| collected_at | String | Yes | ISO 8601 timestamp of when the sample was collected |
| warnings | Array | Yes | Array of warning messages (empty if no warnings) |
| sample_status | String/Object | No | Status of the sampling operation (see below) |
Sampling Strategies
{"MostRecent": {"limit": 10}}: Latest N rows{"Random": {"limit": 100}}: Random sample of N rows"None": No sampling performed
Sample Status
The optional sample_status field tracks the outcome of the sampling operation. This field is backward-compatible and will be omitted when not set.
Status Values:
"Complete": Sampling completed successfully{"PartialRetry": {"original_limit": 100}}: Sampling partially completed with a reduced limit due to constraints or errors{"Skipped": {"reason": "Not implemented for this database"}}: Sampling was skipped with an explanation
Example with Complete Status:
{
"table_name": "orders",
"schema_name": "public",
"rows": [...],
"sample_size": 100,
"sampling_strategy": {"MostRecent": {"limit": 100}},
"collected_at": "2024-01-15T10:30:00Z",
"warnings": [],
"sample_status": "Complete"
}
Example with Skipped Status:
{
"table_name": "large_archive",
"schema_name": null,
"rows": [],
"sample_size": 0,
"sampling_strategy": "None",
"collected_at": "2024-01-15T10:30:00Z",
"warnings": [],
"sample_status": {
"Skipped": {
"reason": "Table exceeds size threshold"
}
}
}
Backward Compatibility Note: The sample_status field is optional and omitted when not explicitly set. Older schema files without this field remain fully compatible.
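A consumer can handle all three status shapes, plus older files that omit the field entirely, with a helper like this hypothetical sketch (not part of DBSurveyor):

```python
def sample_outcome(sample):
    """Classify a table sample, tolerating files written before
    the optional sample_status field existed."""
    status = sample.get("sample_status")
    if status is None:
        # Older files omit the field; infer success from collected rows
        return "Complete" if sample.get("rows") else "Unknown"
    if isinstance(status, str):
        return status  # "Complete"
    # Tagged variants: {"PartialRetry": {...}} or {"Skipped": {...}}
    (variant, payload), = status.items()
    details = ", ".join(f"{k}={v}" for k, v in payload.items())
    return f"{variant} ({details})"

print(sample_outcome({"sample_status": "Complete"}))  # Complete
```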
Multi-Database Collections
For server-level collections, the format supports multiple databases:
{
"format_version": "1.0",
"server_info": {
"server_type": "PostgreSQL",
"version": "13.7",
"host": "localhost",
"port": 5432,
"total_databases": 5,
"collected_databases": 3,
"system_databases_excluded": 2,
"connection_user": "dbadmin",
"has_superuser_privileges": true,
"collection_mode": {
"MultiDatabase": {
"discovered": 5,
"collected": 3,
"failed": 0
}
}
},
"databases": [
/* Individual database schemas */
],
"collection_metadata": {
"collected_at": "2024-01-15T10:30:00Z",
"collection_duration_ms": 2500,
"collector_version": "1.0.0",
"warnings": []
}
}
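A reader can treat single-database and server-level files uniformly by checking for the databases array. An illustrative Python helper, not part of DBSurveyor:

```python
def iter_database_schemas(doc):
    """Yield (database_name, schema) pairs from a loaded
    single- or multi-database .dbsurveyor.json document."""
    if "databases" in doc:  # server-level collection file
        for db in doc["databases"]:
            yield db["database_info"]["name"], db
    else:  # single-database collection file
        yield doc["database_info"]["name"], doc

single = {"format_version": "1.0", "database_info": {"name": "mydb"}, "tables": []}
print([name for name, _ in iter_database_schemas(single)])  # ['mydb']
```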
Collection Metadata
Every schema file includes metadata about the collection process:
{
"collection_metadata": {
"collected_at": "2024-01-15T10:30:00Z",
"collection_duration_ms": 1500,
"collector_version": "1.0.0",
"warnings": [
"Large table 'audit_logs' - collection took 45 seconds",
"Custom type 'geometry' not fully supported"
],
"collector_options": {
"include_system_tables": false,
"sample_data": true,
"max_sample_size": 1000
}
}
}
Validation Rules
Security Validation
The schema enforces strict security rules:
- No credential fields: Field names cannot contain password, secret, token, etc.
- No connection strings: Database URLs are automatically filtered
- No sensitive patterns: Common credential patterns are rejected
- Sanitized output: All error messages are credential-free
Data Validation
- String lengths: Maximum 255 characters for names, 1000 for comments
- Array limits: Maximum 1000 items in arrays
- Nesting depth: Maximum 10 levels of object nesting
- File size: Maximum 100MB per schema file
Type Validation
- Required fields: All mandatory fields must be present
- Data type consistency: Values must match declared types
- Constraint validation: Foreign keys must reference valid tables
- Index validation: Index columns must exist in referenced table
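A subset of these rules can be checked client-side before handing a file to other tools. The following sketch enforces only the required-fields and credential-field rules; the field-name list is an assumption for illustration, not DBSurveyor's exact filter:

```python
FORBIDDEN = ("password", "secret", "token", "credential")  # assumed list, for illustration
REQUIRED = ("format_version", "database_info", "collection_metadata")

def check_schema(doc):
    """Return a list of problems: missing required fields and
    credential-like field names anywhere in the document."""
    problems = [f"missing required field: {k}" for k in REQUIRED if k not in doc]

    def walk(obj, path=""):
        if isinstance(obj, dict):
            for key, value in obj.items():
                if any(bad in key.lower() for bad in FORBIDDEN):
                    problems.append(f"credential-like field at {path}/{key}")
                walk(value, f"{path}/{key}")
        elif isinstance(obj, list):
            for i, item in enumerate(obj):
                walk(item, f"{path}[{i}]")

    walk(doc)
    return problems
```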
Usage Examples
Basic Schema Collection
# Collect PostgreSQL schema
dbsurveyor-collect postgres://user:pass@localhost/mydb --output schema.json
# Validate the output
dbsurveyor schema schema.json --validate
# Generate documentation
dbsurveyor schema schema.json --format markdown --output schema.md
Multi-Database Collection
# Collect all databases on server
dbsurveyor-collect postgres://admin:pass@localhost --all-databases --output server_schema.json
# Process specific database
dbsurveyor schema server_schema.json --database app_db --format json --output app_schema.json
Schema Validation
# Validate against JSON Schema
dbsurveyor validate schema.json
# Check for specific issues
dbsurveyor validate schema.json --check-security --check-constraints
Error Handling
Validation Errors
When validation fails, DBSurveyor provides detailed error information:
{
"validation_errors": [
{
"path": "/tables/0/columns/1/data_type",
"message": "Invalid data type: expected String, Integer, Boolean, Date, Json, Uuid, Array, Binary, Custom, or DateTime",
"value": "VARCHAR",
"suggestion": "Use {\"String\": {\"max_length\": 255}} instead"
}
]
}
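CI scripts can flatten this structure into log-friendly lines; the helper below is a hypothetical example of consuming validation_errors:

```python
def format_validation_errors(report):
    """Render validation_errors entries as one-line messages for CI logs."""
    lines = []
    for err in report.get("validation_errors", []):
        line = f"{err['path']}: {err['message']}"
        if "suggestion" in err:  # suggestion is optional per entry
            line += f" (suggestion: {err['suggestion']})"
        lines.append(line)
    return lines
```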
Collection Warnings
Warnings are included in the metadata for non-critical issues:
{
"warnings": [
"Large table 'audit_logs' (1.2M rows) - collection took 45 seconds",
"Custom type 'geometry' not fully supported - using Custom type",
"Table 'temp_users' appears to be temporary - may not persist"
]
}
Version Compatibility
Current Version: 1.0
- Format: Stable and fully supported
- Validation: Complete JSON Schema validation
- Features: All documented features available
- Backward Compatibility: N/A (first version)
Future Versions
The schema is designed for evolution:
- Additive Changes: New optional fields can be added
- Version Detection: Format version enables version-specific handling
- Migration Support: Tools will support upgrading between versions
- Deprecation Path: Old fields will be marked before removal
Integration
Documentation Tools
The schema format integrates with all DBSurveyor tools:
- dbsurveyor-collect: Generates schema files
- dbsurveyor: Processes and validates schemas
- dbsurveyor-docs: Generates documentation from schemas
- dbsurveyor-validate: Standalone validation tool
External Tools
The format is compatible with:
- JSON Schema validators: jsonschema, ajv, etc.
- Data analysis tools: pandas, jq, etc.
- Documentation generators: Docusaurus, MkDocs, etc.
- CI/CD pipelines: GitHub Actions, GitLab CI, etc.
Best Practices
Schema Collection
- Use descriptive names: Avoid generic names like "db1" or "test"
- Include comments: Add meaningful descriptions for tables and columns
- Sample strategically: Use sampling for large tables to avoid huge files
- Validate early: Check schemas immediately after collection
Schema Storage
- Version control: Track schema changes in Git
- Backup regularly: Keep historical schema versions
- Compress large files: Use .zst compression for schemas >1MB
- Secure access: Limit access to production schemas
Schema Processing
- Validate inputs: Always validate before processing
- Handle errors gracefully: Check collection status before proceeding
- Monitor performance: Track collection times for optimization
- Document changes: Keep records of schema evolution
Troubleshooting
Common Issues
Collection Fails with Permission Error
# Check database user privileges
dbsurveyor-collect postgres://user:pass@localhost/db --test-connection
# Verify user has SELECT on information_schema
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO username;
Large Schema File Size
# Use compression
dbsurveyor-collect postgres://localhost/db --compress
# Limit data sampling
dbsurveyor-collect postgres://localhost/db --max-sample-size 100
Validation Errors
# Check schema format
dbsurveyor validate schema.json --verbose
# Fix common issues
dbsurveyor fix schema.json --output fixed_schema.json
Performance Optimization
- Connection pooling: Use connection pooling for large databases
- Parallel collection: Collect multiple databases simultaneously
- Selective sampling: Only sample essential tables
- Incremental updates: Collect only changed schemas
This specification provides a complete reference for the .dbsurveyor.json format, ensuring consistent, validated, and secure schema collection across all supported database types.
Database Support
DBSurveyor supports multiple database engines with comprehensive schema collection capabilities. This guide details the support level and specific features for each database type.
Supported Databases
| Database | Status | Feature Flag | Default | Version Support |
|---|---|---|---|---|
| PostgreSQL | ✅ Full | postgresql | ✅ Yes | 9.6+ |
| SQLite | ✅ Full | sqlite | ✅ Yes | 3.6+ |
| MySQL | ⚠️ Partial | mysql | ❌ No | 5.7+, 8.0+ |
| MongoDB | 🚧 Planned | mongodb | ❌ No | 4.0+ |
| SQL Server | 🚧 Planned | mssql | ❌ No | 2017+ |
Legend:
- ✅ Full: Complete feature support with comprehensive testing
- ⚠️ Partial: Core features supported, some limitations
- 🚧 Basic: Minimal support, under development
PostgreSQL Support
Status: ✅ Full Support (Default)
Feature Flag: postgresql
Driver: SQLx with Tokio runtime
Connection Examples
# Basic connection
dbsurveyor-collect postgres://user:password@localhost:5432/mydb
# With SSL
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
# Connection pooling
dbsurveyor-collect "postgres://user:pass@localhost/db?pool_max_conns=5"
# Multiple schemas
dbsurveyor-collect "postgres://user:pass@localhost/db?search_path=public,custom"
Supported Objects
| Object Type | Support | Notes |
|---|---|---|
| Tables | ✅ Full | Including partitioned tables |
| Views | ✅ Full | Regular and materialized views |
| Indexes | ✅ Full | All index types (B-tree, Hash, GiST, etc.) |
| Constraints | ✅ Full | PK, FK, Check, Unique, Exclusion |
| Functions | ✅ Full | SQL and PL/pgSQL functions |
| Procedures | ✅ Full | Stored procedures (PostgreSQL 11+) |
| Triggers | ✅ Full | Row and statement triggers |
| Types | ✅ Full | Custom types, domains, enums |
| Extensions | ✅ Full | Installed extensions |
| Schemas | ✅ Full | Multiple schema support |
PostgreSQL-Specific Features
-- Advanced data types
CREATE TABLE example (
id SERIAL PRIMARY KEY,
data JSONB,
tags TEXT[],
coordinates POINT,
search_vector TSVECTOR
);
-- Custom types
CREATE TYPE status_enum AS ENUM ('active', 'inactive', 'pending');
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+\.[^@]+$');
-- Advanced indexes
CREATE INDEX CONCURRENTLY idx_data_gin ON example USING GIN (data);
CREATE INDEX idx_search ON example USING GIN (search_vector);
Collection Output
{
"database_info": {
"name": "mydb",
"version": "13.7",
"encoding": "UTF8",
"collation": "en_US.UTF-8"
},
"tables": [
{
"name": "example",
"schema": "public",
"columns": [
{
"name": "data",
"data_type": {
"Json": {}
},
"is_nullable": true
}
]
}
]
}
SQLite Support
Status: ✅ Full Support (Default)
Feature Flag: sqlite
Driver: SQLx with Tokio runtime
Connection Examples
# File path
dbsurveyor-collect sqlite:///path/to/database.db
dbsurveyor-collect /path/to/database.sqlite
# Read-only mode
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"
# In-memory database (for testing)
dbsurveyor-collect "sqlite://:memory:"
Supported Objects
| Object Type | Support | Notes |
|---|---|---|
| Tables | ✅ Full | Including WITHOUT ROWID tables |
| Views | ✅ Full | Regular views |
| Indexes | ✅ Full | B-tree and partial indexes |
| Constraints | ⚠️ Partial | Limited constraint introspection |
| Triggers | ✅ Full | BEFORE, AFTER, INSTEAD OF |
| Virtual Tables | ✅ Full | FTS, R-Tree, etc. |
| Attached DBs | ✅ Full | Multiple attached databases |
SQLite-Specific Features
-- WITHOUT ROWID tables
CREATE TABLE example (
id TEXT PRIMARY KEY,
data TEXT
) WITHOUT ROWID;
-- Virtual tables
CREATE VIRTUAL TABLE docs_fts USING fts5(title, content);
-- Partial indexes
CREATE INDEX idx_active_users ON users(name) WHERE active = 1;
-- JSON support (SQLite 3.38+)
CREATE TABLE events (
id INTEGER PRIMARY KEY,
data JSON
);
Limitations
- Limited constraint introspection (SQLite stores constraints as DDL text)
- No stored procedures or functions
- No custom types (uses affinity system)
- No schemas (single namespace per database file)
MySQL Support
Status: ⚠️ Partial Support
Feature Flag: mysql (not default)
Driver: SQLx with Tokio runtime
Connection Examples
# Basic connection
dbsurveyor-collect mysql://root:password@localhost:3306/mydb
# With SSL
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=REQUIRED"
# Character set
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"
Supported Objects
| Object Type | Support | Notes |
|---|---|---|
| Tables | ✅ Full | All storage engines |
| Views | ✅ Full | Regular views |
| Indexes | ✅ Full | Primary, Unique, Index, Fulltext |
| Constraints | ⚠️ Partial | PK, FK, Check (MySQL 8.0+) |
| Procedures | ✅ Full | Stored procedures |
| Functions | ✅ Full | User-defined functions |
| Triggers | ✅ Full | BEFORE, AFTER triggers |
| Events | ✅ Full | Scheduled events |
MySQL-Specific Features
-- Storage engines
CREATE TABLE innodb_table (
id INT PRIMARY KEY AUTO_INCREMENT,
data TEXT
) ENGINE=InnoDB;
-- Partitioning
CREATE TABLE partitioned (
id INT,
created_date DATE
) PARTITION BY RANGE (YEAR(created_date)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025)
);
-- Full-text indexes
CREATE TABLE articles (
id INT PRIMARY KEY,
title VARCHAR(255),
content TEXT,
FULLTEXT(title, content)
);
Known Limitations
- Check constraints only supported in MySQL 8.0+
- Limited JSON introspection compared to PostgreSQL
- Some storage engine specific features not captured
Security Advisory
⚠️ RUSTSEC-2023-0071: MySQL support uses the RSA crate, which has a known timing side-channel vulnerability. MySQL support is disabled by default. Use PostgreSQL or SQLite for production environments.
MongoDB Support
Status: 🚧 Basic Support
Feature Flag: mongodb (not default)
Driver: Official MongoDB Rust driver
Connection Examples
# Basic connection
dbsurveyor-collect mongodb://user:password@localhost:27017/mydb
# With authentication database
dbsurveyor-collect "mongodb://user:pass@localhost/mydb?authSource=admin"
# Replica set
dbsurveyor-collect "mongodb://user:pass@host1,host2,host3/mydb?replicaSet=rs0"
Supported Objects
| Object Type | Support | Notes |
|---|---|---|
| Collections | ✅ Full | Document collections |
| Indexes | ✅ Full | Single field, compound, text, geo |
| Schema Inference | ✅ Basic | Inferred from document sampling |
| GridFS | ⚠️ Partial | Basic GridFS collection detection |
| Views | 🚧 Planned | Aggregation pipeline views |
MongoDB-Specific Features
// Schema inference from documents
{
"_id": ObjectId("..."),
"name": "string",
"age": "number",
"tags": ["array", "of", "strings"],
"address": {
"street": "string",
"city": "string"
}
}
// Index types
db.users.createIndex({ "name": 1 })                      // Single field
db.users.createIndex({ "name": 1, "age": -1 })           // Compound
db.articles.createIndex({ "title": "text" })             // Text search
db.locations.createIndex({ "coordinates": "2dsphere" })  // Geospatial
Current Limitations
- Schema inference is sampling-based (may miss rare fields)
- No aggregation pipeline analysis
- Limited sharding information
- No user-defined functions
SQL Server Support
Status: 🚧 Basic Support
Feature Flag: mssql (not default)
Driver: Tiberius (native TDS protocol)
Connection Examples
# Basic connection
dbsurveyor-collect mssql://sa:password@localhost:1433/mydb
# Windows Authentication (planned)
dbsurveyor-collect "mssql://localhost/mydb?trusted_connection=yes"
# Named instance
dbsurveyor-collect "mssql://sa:pass@localhost\\SQLEXPRESS/mydb"
Supported Objects
| Object Type | Support | Notes |
|---|---|---|
| Tables | ✅ Full | User tables |
| Views | ✅ Full | Regular views |
| Indexes | ⚠️ Partial | Basic index information |
| Constraints | ⚠️ Partial | PK, FK constraints |
| Procedures | 🚧 Planned | Stored procedures |
| Functions | 🚧 Planned | User-defined functions |
| Triggers | 🚧 Planned | DML triggers |
Current Limitations
- Limited to basic table and view introspection
- No stored procedure analysis yet
- No advanced SQL Server features (CLR, XML, spatial)
- Windows Authentication not yet supported
Feature Comparison Matrix
| Feature | PostgreSQL | SQLite | MySQL | MongoDB | SQL Server |
|---|---|---|---|---|---|
| Tables | ✅ | ✅ | ✅ | ✅ | ✅ |
| Views | ✅ | ✅ | ✅ | 🚧 | ✅ |
| Indexes | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| Constraints | ✅ | ⚠️ | ⚠️ | ❌ | ⚠️ |
| Procedures | ✅ | ❌ | ✅ | ❌ | 🚧 |
| Functions | ✅ | ❌ | ✅ | ❌ | 🚧 |
| Triggers | ✅ | ✅ | ✅ | ❌ | 🚧 |
| Custom Types | ✅ | ❌ | ⚠️ | ❌ | 🚧 |
| JSON Support | ✅ | ✅ | ⚠️ | ✅ | 🚧 |
| Multi-DB | ✅ | ⚠️ | ✅ | ✅ | 🚧 |
Building with Database Support
Default Build
# Includes PostgreSQL and SQLite
cargo build --release
Custom Database Selection
# PostgreSQL only
cargo build --release --no-default-features --features postgresql
# All databases
cargo build --release --all-features
# Specific combination
cargo build --release --no-default-features --features postgresql,mysql,encryption
Feature Dependencies
[features]
postgresql = ["sqlx", "sqlx/postgres", "sqlx/runtime-tokio-rustls"]
mysql = ["sqlx", "sqlx/mysql", "sqlx/runtime-tokio-rustls"]
sqlite = ["sqlx", "sqlx/sqlite", "sqlx/runtime-tokio-rustls"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]
Database-Specific Best Practices
PostgreSQL
-- Use a read-only user
CREATE USER dbsurveyor_readonly;
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_readonly;
GRANT USAGE ON SCHEMA public TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_readonly;

-- For multiple schemas
GRANT USAGE ON SCHEMA schema1, schema2 TO dbsurveyor_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA schema1, schema2 TO dbsurveyor_readonly;
MySQL
# Create read-only user
CREATE USER 'dbsurveyor_readonly'@'%' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'dbsurveyor_readonly'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_readonly'@'%';
SQLite
# Ensure read permissions
chmod 644 /path/to/database.db
# Use read-only mode for safety
dbsurveyor-collect "sqlite:///path/to/db.sqlite?mode=ro"
MongoDB
// Create read-only user
use admin
db.createUser({
user: "dbsurveyor_readonly",
pwd: "password",
roles: [{
role: "read",
db: "mydb"
}, {
role: "read",
db: "config"
} // For sharding info
]
})
Troubleshooting Database Connections
Connection Issues
# Test connection first
dbsurveyor-collect test postgres://user:pass@localhost/db
# Check network connectivity
telnet localhost 5432 # PostgreSQL
telnet localhost 3306 # MySQL
telnet localhost 27017 # MongoDB
telnet localhost 1433 # SQL Server
Permission Issues
# PostgreSQL: Check permissions
psql -h localhost -U user -d db -c "\dt"
# MySQL: Check permissions
mysql -h localhost -u user -p -e "SHOW TABLES;" db
# SQLite: Check file permissions
ls -la /path/to/database.db
Driver Issues
# Check compiled features
dbsurveyor-collect list
# Verify feature compilation
cargo build --features postgresql --verbose
Roadmap
Planned Improvements
PostgreSQL:
- Advanced partitioning support
- Extension-specific object types
- Performance statistics collection
MySQL:
- Enhanced JSON column support
- Partition pruning analysis
- Storage engine optimization hints
MongoDB:
- Aggregation pipeline analysis
- Sharding topology mapping
- Index usage statistics
SQL Server:
- Complete stored procedure support
- CLR integration analysis
- Spatial data type support
General:
- Cross-database schema comparison
- Migration script generation
- Performance benchmarking integration
Contributing Database Support
See the Contributing Guide for information on adding support for new database engines or improving existing support.
Troubleshooting
This guide helps you diagnose and resolve common issues with DBSurveyor.
Quick Diagnostics
Check Installation
# Verify binaries are installed and working
dbsurveyor-collect --version
dbsurveyor --version
# Check compiled features
dbsurveyor-collect list
# Test with minimal example
echo "CREATE TABLE test (id INTEGER);" | sqlite3 test.db
dbsurveyor-collect sqlite://test.db
rm test.db schema.dbsurveyor.json
Enable Debug Logging
# Enable debug logging for all modules
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db
# Enable trace logging for specific modules
export RUST_LOG=dbsurveyor_collect=trace,dbsurveyor_core=debug
dbsurveyor-collect postgres://localhost/db
# Log to file
export RUST_LOG=debug
dbsurveyor-collect postgres://localhost/db 2> debug.log
Connection Issues
Database Connection Failures
Symptoms: Connection timeouts, authentication failures, network errors
PostgreSQL Connection Issues
# Test basic connectivity
ping localhost
telnet localhost 5432
# Test with psql
psql -h localhost -U user -d db -c "SELECT 1;"
# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log
# Common connection string issues
# ❌ Wrong: postgres://user:pass@localhost/db:5432
# ✅ Correct: postgres://user:pass@localhost:5432/db
# SSL issues
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=disable"
dbsurveyor-collect "postgres://user:pass@localhost/db?sslmode=require"
MySQL Connection Issues
# Test basic connectivity
telnet localhost 3306
# Test with mysql client
mysql -h localhost -u user -p -e "SELECT 1;" db
# Check MySQL logs
sudo tail -f /var/log/mysql/error.log
# Common issues
# Character set problems
dbsurveyor-collect "mysql://user:pass@localhost/db?charset=utf8mb4"
# SSL issues
dbsurveyor-collect "mysql://user:pass@localhost/db?ssl-mode=DISABLED"
SQLite Connection Issues
# Check file exists and is readable
ls -la /path/to/database.db
file /path/to/database.db
# Test with sqlite3 command
sqlite3 /path/to/database.db ".tables"
# Permission issues
chmod 644 /path/to/database.db
# Use absolute paths
dbsurveyor-collect "sqlite:///$(pwd)/database.db"
MongoDB Connection Issues
# Test basic connectivity
telnet localhost 27017
# Test with mongo client
mongosh "mongodb://user:pass@localhost:27017/db"
# Authentication issues
dbsurveyor-collect "mongodb://user:pass@localhost/db?authSource=admin"
# Replica set issues
dbsurveyor-collect "mongodb://user:pass@host1,host2/db?replicaSet=rs0"
Permission and Authentication Issues
Symptoms: Access denied, insufficient privileges, authentication failed
PostgreSQL Permissions
-- Check current user permissions
SELECT current_user, session_user;
-- Check database access
SELECT datname FROM pg_database WHERE datname = 'your_db';
-- Check table permissions
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname NOT IN ('information_schema', 'pg_catalog');
-- Grant necessary permissions
GRANT CONNECT ON DATABASE mydb TO dbsurveyor_user;
GRANT USAGE ON SCHEMA public TO dbsurveyor_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbsurveyor_user;
MySQL Permissions
-- Check current user
SELECT USER(), CURRENT_USER();
-- Check permissions
SHOW GRANTS FOR 'dbsurveyor_user'@'%';
-- Grant necessary permissions
GRANT SELECT ON mydb.* TO 'dbsurveyor_user'@'%';
GRANT SELECT ON information_schema.* TO 'dbsurveyor_user'@'%';
FLUSH PRIVILEGES;
MongoDB Permissions
// Check current user
db.runCommand({
connectionStatus: 1
})
// Check permissions
use mydb
db.runCommand({
usersInfo: "dbsurveyor_user"
})
// Grant read permissions
use admin
db.grantRolesToUser("dbsurveyor_user", [{
role: "read",
db: "mydb"
}])
Collection Issues
Schema Collection Failures
Symptoms: Partial collection, missing objects, collection errors
Missing Tables or Objects
# Enable verbose logging to see what's being collected
dbsurveyor-collect -vvv postgres://localhost/db
# Check if objects exist in database
psql -h localhost -U user -d db -c "\dt" # PostgreSQL tables
mysql -u user -p -e "SHOW TABLES;" db # MySQL tables
sqlite3 db.sqlite ".tables" # SQLite tables
Large Database Timeouts
# Increase timeouts
dbsurveyor-collect --connect-timeout 60 --query-timeout 120 postgres://localhost/db
# Use throttling to reduce load
dbsurveyor-collect --throttle 1000 postgres://localhost/db
# Disable sample collection for speed
dbsurveyor-collect --sample 0 postgres://localhost/db
Memory Issues
# Monitor memory usage
top -p $(pgrep dbsurveyor-collect)
# Use compression to reduce memory usage
dbsurveyor-collect --compress postgres://localhost/db
# Process smaller chunks (multi-database collection)
dbsurveyor-collect --exclude-databases large_db1,large_db2 postgres://localhost
Output File Issues
Symptoms: Empty files, corrupted output, permission errors
File Permission Issues
# Check output directory permissions
ls -la $(dirname schema.dbsurveyor.json)
# Ensure write permissions
chmod 755 $(dirname schema.dbsurveyor.json)
# Use explicit output path
dbsurveyor-collect --output /tmp/schema.json postgres://localhost/db
Corrupted Output Files
# Validate schema file
dbsurveyor validate schema.dbsurveyor.json
# Check file size and format
ls -la schema.dbsurveyor.json
file schema.dbsurveyor.json
# Test JSON parsing
jq . schema.dbsurveyor.json > /dev/null
Compression/Encryption Issues
# Test compression
dbsurveyor-collect --compress postgres://localhost/db
zstd -t schema.dbsurveyor.json.zst # Test compressed file
# Test encryption (will prompt for password)
dbsurveyor-collect --encrypt postgres://localhost/db
dbsurveyor validate schema.enc # Test encrypted file
Documentation Generation Issues
Processing Failures
Symptoms: Generation errors, empty output, format issues
Input File Issues
# Validate input file first
dbsurveyor validate schema.dbsurveyor.json
# Check file format
file schema.dbsurveyor.json
# Test with minimal processing
dbsurveyor --format json schema.dbsurveyor.json
Encrypted File Issues
# Verify encryption format
dbsurveyor validate schema.enc
# Test decryption
dbsurveyor --format json schema.enc
# Check password
# Note: Passwords are case-sensitive and don't show characters
Large Schema Processing
# Use JSON format for large schemas (most efficient)
dbsurveyor --format json large-schema.json
# Monitor memory usage
top -p $(pgrep dbsurveyor)
# Process in parts (planned feature)
# Currently: split large schemas manually
Output Format Issues
Symptoms: Malformed output, missing content, rendering problems
HTML Generation Issues
# Test with simple format first
dbsurveyor --format markdown schema.json
# Check HTML output
dbsurveyor --format html schema.json
# Open in browser to check for issues
Mermaid Diagram Issues
# Generate Mermaid format
dbsurveyor --format mermaid schema.json
# Validate Mermaid syntax
# Copy content to https://mermaid.live/ for testing
Output Formatting and Progress Indicators
Symptoms: Progress spinners or colored output causing problems, garbled output with escape codes, missing progress indicators
Disabling Colors and Progress Spinners
For CI/CD pipelines, automation scripts, or non-interactive environments where colored output or progress spinners may cause issues:
# Disable ANSI color codes and progress spinners
export NO_COLOR=1
dbsurveyor-collect postgres://localhost/db
dbsurveyor --format markdown schema.json
# Alternative: Use TERM=dumb
TERM=dumb dbsurveyor-collect postgres://localhost/db
TERM=dumb dbsurveyor --format json schema.json
# For CI environments (typically set automatically)
export NO_COLOR=1
export TERM=dumb
dbsurveyor-collect postgres://localhost/db > collection.log
Common scenarios:
- CI/CD pipelines: Set NO_COLOR=1 to prevent ANSI escape codes in logs
- Log files: Progress spinners are automatically hidden when output is redirected
- Automated scripts: Use TERM=dumb for consistent non-interactive behavior
- Windows systems: Some terminals may display escape codes incorrectly
Troubleshooting Progress Indicators
# Check if output is going to a TTY
# Progress spinners only show when stdout is a terminal
test -t 1 && echo "TTY detected" || echo "Not a TTY"
# Force spinner to be hidden (for debugging)
export NO_COLOR=1
dbsurveyor --format html schema.json
# Check environment variables
echo "NO_COLOR=${NO_COLOR:-not set}"
echo "TERM=${TERM:-not set}"
Note: The tool automatically detects when output is piped or redirected and hides progress indicators. Setting NO_COLOR or TERM=dumb ensures consistent behavior across all environments.
Performance Issues
Slow Collection
Symptoms: Long collection times, high resource usage
Database Performance
# Use connection pooling
dbsurveyor-collect --max-connections 5 postgres://localhost/db
# Add throttling
dbsurveyor-collect --throttle 500 postgres://localhost/db
# Disable sample collection
dbsurveyor-collect --sample 0 postgres://localhost/db
# Monitor database load
# PostgreSQL: SELECT * FROM pg_stat_activity;
# MySQL: SHOW PROCESSLIST;
Network Performance
# Test network latency
ping database-host
# Use local connections when possible
dbsurveyor-collect postgres://localhost/db # Better than remote
# Consider compression for remote databases
dbsurveyor-collect --compress postgres://remote-host/db
Memory Usage
Symptoms: High memory consumption, out of memory errors
Memory Optimization
# Monitor memory usage
ps aux | grep dbsurveyor
top -p $(pgrep dbsurveyor)
# Use streaming for large datasets (automatic)
# Reduce sample size
dbsurveyor-collect --sample 10 postgres://localhost/db
# Use compression
dbsurveyor-collect --compress postgres://localhost/db
Security Issues
Credential Exposure
Symptoms: Passwords in logs, credential leakage
Verify Credential Sanitization
# Check logs for credentials
export RUST_LOG=debug
dbsurveyor-collect postgres://user:secret@localhost/db 2>&1 | grep -i secret
# Should return no results
# Test credential sanitization
dbsurveyor-collect test postgres://user:secret@localhost/db
# Should show: postgres://user:****@localhost/db
Secure Credential Handling
# Use environment variables
export DATABASE_URL="postgres://user:secret@localhost/db"
dbsurveyor-collect
# Avoid shell history
set +o history # Disable history
dbsurveyor-collect postgres://user:secret@localhost/db
set -o history # Re-enable history
Encryption Issues
Symptoms: Encryption failures, decryption errors
Test Encryption
# Test encryption roundtrip
sqlite3 test.db "CREATE TABLE t (id INTEGER PRIMARY KEY);"
dbsurveyor-collect --encrypt sqlite://test.db
dbsurveyor validate schema.enc
rm test.db schema.enc
Password Issues
# Ensure password consistency
# Passwords are case-sensitive
# No visual feedback during password entry (security feature)
# Test with simple password first
# Use ASCII characters only initially
Build and Installation Issues
Compilation Failures
Symptoms: Build errors, missing dependencies, feature conflicts
Rust Toolchain Issues
# Update Rust toolchain
rustup update
# Check Rust version (project MSRV is 1.93.1)
rustc --version
# Clean and rebuild
cargo clean
cargo build --release
Feature Compilation Issues
# Check available features
cargo build --help | grep -A 20 "FEATURES:"
# Build with specific features
cargo build --release --features postgresql,sqlite
# Debug feature compilation
cargo build --release --features postgresql --verbose
System Dependencies
# Ubuntu/Debian: Install system dependencies
sudo apt-get update
sudo apt-get install build-essential libssl-dev pkg-config
# For PostgreSQL support
sudo apt-get install libpq-dev
# For MySQL support
sudo apt-get install libmysqlclient-dev
# macOS: Install dependencies
brew install openssl pkg-config
# For PostgreSQL: brew install postgresql
# For MySQL: brew install mysql
Runtime Dependencies
Symptoms: Missing shared libraries, runtime errors
Check Dependencies
# Check binary dependencies
ldd target/release/dbsurveyor-collect # Linux
otool -L target/release/dbsurveyor-collect # macOS
# Test minimal functionality
dbsurveyor-collect --version
dbsurveyor-collect list
Getting Help
Collect Debug Information
# System information
uname -a
rustc --version
cargo --version
# DBSurveyor information
dbsurveyor-collect --version
dbsurveyor-collect list
# Feature compilation
cargo build --release --verbose 2>&1 | grep -i feature
# Runtime debug
export RUST_LOG=debug
dbsurveyor-collect test postgres://localhost/db 2> debug.log
Create Minimal Reproduction
# Create minimal test case
sqlite3 minimal.db "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT);"
sqlite3 minimal.db "INSERT INTO test VALUES (1, 'test');"
# Test collection
dbsurveyor-collect sqlite://minimal.db
# Test documentation
dbsurveyor generate schema.dbsurveyor.json
# Clean up
rm minimal.db schema.dbsurveyor.json schema.md
Report Issues
When reporting issues, include:
- System Information: OS, Rust version, DBSurveyor version
- Command Used: Exact command that failed (sanitize credentials)
- Error Output: Complete error message and stack trace
- Debug Logs: Output with RUST_LOG=debug
- Minimal Reproduction: Smallest example that reproduces the issue
Security Note: Never include actual database credentials in issue reports. Use placeholder values like user:password@localhost/db.
Community Resources
- GitHub Issues: Report bugs and request features
- Documentation: Complete user guide
- Security Issues: Email security@evilbitlabs.io
Professional Support
For enterprise users requiring professional support, contact support@evilbitlabs.io for:
- Priority issue resolution
- Custom feature development
- Integration consulting
- Security auditing and compliance
Architecture
DBSurveyor follows a security-first, modular architecture designed for flexibility, maintainability, and offline operation. This document details the system architecture and design decisions.
System Overview
graph TB
subgraph "User Environment"
CLI[Command Line Interface]
CONFIG[Configuration Files]
ENV[Environment Variables]
end
subgraph "DBSurveyor Workspace"
subgraph "dbsurveyor-collect"
COLLECT_CLI[Collection CLI]
ADAPTERS[Database Adapters]
POOL[Connection Pooling]
end
subgraph "dbsurveyor-core"
MODELS[Data Models]
SECURITY[Security Module]
ERROR[Error Handling]
TRAITS[Adapter Traits]
end
subgraph "dbsurveyor"
DOC_CLI[Documentation CLI]
TEMPLATES[Template Engine]
FORMATS[Output Formats]
ANALYSIS[Schema Analysis]
end
end
subgraph "External Systems"
POSTGRES[(PostgreSQL)]
MYSQL[(MySQL)]
SQLITE[(SQLite)]
MONGODB[(MongoDB)]
SQLSERVER[(SQL Server)]
end
subgraph "Output Artifacts"
JSON[Schema Files]
DOCS[Documentation]
SQL[SQL DDL]
DIAGRAMS[ERD Diagrams]
end
CLI --> COLLECT_CLI
CLI --> DOC_CLI
CONFIG --> COLLECT_CLI
CONFIG --> DOC_CLI
ENV --> COLLECT_CLI
COLLECT_CLI --> ADAPTERS
ADAPTERS --> POOL
ADAPTERS --> MODELS
ADAPTERS --> SECURITY
ADAPTERS --> ERROR
ADAPTERS --> TRAITS
POOL --> POSTGRES
POOL --> MYSQL
POOL --> SQLITE
POOL --> MONGODB
POOL --> SQLSERVER
COLLECT_CLI --> JSON
DOC_CLI --> TEMPLATES
DOC_CLI --> FORMATS
DOC_CLI --> ANALYSIS
TEMPLATES --> MODELS
FORMATS --> MODELS
ANALYSIS --> MODELS
JSON --> DOC_CLI
DOC_CLI --> DOCS
DOC_CLI --> SQL
DOC_CLI --> DIAGRAMS
Crate Architecture
Workspace Structure
DBSurveyor uses a Cargo workspace with three main crates:
dbsurveyor/
├── dbsurveyor-core/     # Shared library
├── dbsurveyor-collect/  # Collection binary
├── dbsurveyor/          # Documentation binary
└── Cargo.toml           # Workspace configuration
Dependency Graph
graph TD
COLLECT[dbsurveyor-collect] --> CORE[dbsurveyor-core]
DOC[dbsurveyor] --> CORE
CORE --> SERDE[serde]
CORE --> TOKIO[tokio]
CORE --> SQLX[sqlx]
CORE --> MONGO[mongodb]
CORE --> CRYPTO[aes-gcm + argon2]
COLLECT --> CLAP[clap]
COLLECT --> RPASS[rpassword]
COLLECT --> ZSTD[zstd]
DOC --> ASKAMA[askama]
DOC --> PULLDOWN[pulldown-cmark]
Core Library (dbsurveyor-core)
Module Structure
#![allow(unused)]
fn main() {
// dbsurveyor-core/src/lib.rs
pub mod adapters; // Database adapter traits and factory
pub mod error; // Comprehensive error handling
pub mod models; // Unified data models
pub mod security; // Encryption and credential protection
// Re-exports for public API
pub use adapters::{create_adapter, DatabaseAdapter};
pub use error::{DbSurveyorError, Result};
pub use models::{DatabaseSchema, DatabaseType};
}
Data Models
The core defines unified data structures that work across all database types:
#![allow(unused)]
fn main() {
// Unified schema representation
pub struct DatabaseSchema {
pub format_version: String,
pub database_info: DatabaseInfo,
pub tables: Vec<Table>,
pub views: Vec<View>,
pub indexes: Vec<Index>,
pub constraints: Vec<Constraint>,
pub procedures: Vec<Procedure>,
pub functions: Vec<Procedure>,
pub triggers: Vec<Trigger>,
pub custom_types: Vec<CustomType>,
pub samples: Option<Vec<TableSample>>,
pub collection_metadata: CollectionMetadata,
}
// Cross-database type mapping
pub enum UnifiedDataType {
String { max_length: Option<u32> },
Integer { bits: u8, signed: bool },
Float { precision: Option<u8> },
Boolean,
DateTime { with_timezone: bool },
Json,
Array { element_type: Box<UnifiedDataType> },
Custom { type_name: String },
}
}
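The unified model above can be exercised with a small, self-contained sketch. The enum and the PostgreSQL type-name mapping below are simplified illustrations for this guide, not dbsurveyor-core's actual mapping tables:

```rust
// Illustrative sketch: a trimmed-down UnifiedDataType and an example mapping
// from PostgreSQL type names. The mapping rules are assumptions, not the
// crate's real implementation.
#[derive(Debug, PartialEq)]
enum UnifiedDataType {
    String { max_length: Option<u32> },
    Integer { bits: u8, signed: bool },
    Boolean,
    Json,
    Custom { type_name: String },
}

fn map_postgres_type(pg_type: &str) -> UnifiedDataType {
    match pg_type {
        "int4" | "integer" => UnifiedDataType::Integer { bits: 32, signed: true },
        "int8" | "bigint" => UnifiedDataType::Integer { bits: 64, signed: true },
        "bool" | "boolean" => UnifiedDataType::Boolean,
        "text" => UnifiedDataType::String { max_length: None },
        "json" | "jsonb" => UnifiedDataType::Json,
        // Anything unrecognized is preserved as a named custom type
        other => UnifiedDataType::Custom { type_name: other.to_string() },
    }
}

fn main() {
    assert_eq!(
        map_postgres_type("int8"),
        UnifiedDataType::Integer { bits: 64, signed: true }
    );
    assert_eq!(
        map_postgres_type("citext"),
        UnifiedDataType::Custom { type_name: "citext".into() }
    );
    println!("type mapping ok");
}
```

The Custom fallback is what lets one schema model span databases with very different native type systems.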
Adapter Pattern
Database adapters implement a common trait for unified access:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait DatabaseAdapter: Send + Sync {
async fn test_connection(&self) -> Result<()>;
async fn collect_schema(&self) -> Result<DatabaseSchema>;
fn database_type(&self) -> DatabaseType;
fn supports_feature(&self, feature: AdapterFeature) -> bool;
fn connection_config(&self) -> ConnectionConfig;
}
}
Factory Pattern
The adapter factory provides database-agnostic instantiation:
#![allow(unused)]
fn main() {
pub async fn create_adapter(connection_string: &str) -> Result<Box<dyn DatabaseAdapter>> {
let database_type = detect_database_type(connection_string)?;
match database_type {
DatabaseType::PostgreSQL => {
#[cfg(feature = "postgresql")]
{
let adapter = PostgresAdapter::new(connection_string).await?;
Ok(Box::new(adapter))
}
#[cfg(not(feature = "postgresql"))]
Err(DbSurveyorError::unsupported_feature("PostgreSQL"))
}
// ... other database types
}
}
}
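The factory's first step, scheme detection, can be sketched with std-only code. The scheme list and the Unknown fallback below are illustrative assumptions (the real detect_database_type returns a Result):

```rust
// Sketch of URL-scheme detection as used by the adapter factory.
// The scheme list is an example, not exhaustive.
#[derive(Debug, PartialEq)]
enum DatabaseType { PostgreSQL, MySQL, SQLite, MongoDB, Unknown }

fn detect_database_type(url: &str) -> DatabaseType {
    // Everything before "://" is the scheme
    match url.split("://").next().unwrap_or("") {
        "postgres" | "postgresql" => DatabaseType::PostgreSQL,
        "mysql" => DatabaseType::MySQL,
        "sqlite" => DatabaseType::SQLite,
        "mongodb" | "mongodb+srv" => DatabaseType::MongoDB,
        _ => DatabaseType::Unknown,
    }
}

fn main() {
    assert_eq!(detect_database_type("postgres://localhost/db"), DatabaseType::PostgreSQL);
    assert_eq!(detect_database_type("sqlite://app.db"), DatabaseType::SQLite);
    println!("detection ok");
}
```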
Security Architecture
Credential Protection
graph LR
INPUT[Connection String] --> PARSE[URL Parser]
PARSE --> CONFIG[Connection Config]
PARSE --> CREDS[Credentials]
CONFIG --> SANITIZE[Sanitization]
CREDS --> ZEROIZE[Zeroizing Container]
SANITIZE --> LOGS[Logs & Errors]
ZEROIZE --> DATABASE[Database Connection]
ZEROIZE --> MEMORY[Auto-Zero on Drop]
Implementation:
#![allow(unused)]
fn main() {
use zeroize::{Zeroize, Zeroizing};
#[derive(Zeroize)]
#[zeroize(drop)]
pub struct Credentials {
pub username: Zeroizing<String>,
pub password: Zeroizing<Option<String>>,
}
// Connection config never contains credentials
pub struct ConnectionConfig {
pub host: String,
pub port: Option<u16>,
pub database: Option<String>,
// No username/password fields
}
}
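A minimal sketch of the sanitization step, assuming simple scheme://user:password@host URLs; the actual crate uses a proper URL parser, so treat this as illustration only:

```rust
// Hedged sketch: redact the password portion of a database URL before it
// can reach logs or error messages.
fn sanitize_url(url: &str) -> String {
    if let Some(scheme_end) = url.find("://") {
        let rest = &url[scheme_end + 3..];
        if let Some(at) = rest.find('@') {
            let creds = &rest[..at];
            if let Some(colon) = creds.find(':') {
                let user = &creds[..colon];
                // Keep scheme, username, and host; mask only the password
                return format!("{}://{}:****{}", &url[..scheme_end], user, &rest[at..]);
            }
        }
    }
    url.to_string()
}

fn main() {
    let safe = sanitize_url("postgres://user:secret@localhost/db");
    assert_eq!(safe, "postgres://user:****@localhost/db");
    assert!(!safe.contains("secret"));
    println!("{safe}");
}
```

This matches the behavior shown earlier in the troubleshooting section, where a sanitized connection string displays as postgres://user:****@localhost/db.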
Encryption Architecture
graph TD
PASSWORD[User Password] --> ARGON2[Argon2id KDF]
SALT[Random Salt] --> ARGON2
ARGON2 --> KEY[256-bit Key]
DATA[Schema Data] --> AES[AES-GCM-256]
KEY --> AES
NONCE[Random Nonce] --> AES
AES --> CIPHERTEXT[Encrypted Data]
AES --> TAG[Auth Tag]
ENCRYPTED[Encrypted File] --> ALGORITHM[Algorithm ID]
ENCRYPTED --> NONCE
ENCRYPTED --> CIPHERTEXT
ENCRYPTED --> TAG
ENCRYPTED --> KDF_PARAMS[KDF Parameters]
ENCRYPTED --> SALT
Security Properties:
- Confidentiality: AES-GCM-256 encryption
- Integrity: 128-bit authentication tags
- Authenticity: Authenticated encryption prevents tampering
- Forward Secrecy: Random nonces prevent replay attacks
- Key Security: Argon2id with memory-hard parameters
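One plausible byte layout for such an encrypted container can be sketched as follows. Field order and sizes here are assumptions chosen for illustration, not DBSurveyor's actual on-disk format:

```rust
// Sketch of an encrypted-file layout holding the components named above:
// algorithm id, KDF salt, nonce, auth tag, ciphertext. Sizes are assumed.
struct EncryptedFile {
    algorithm_id: u8,  // e.g. 1 = AES-GCM-256
    salt: [u8; 16],    // Argon2id salt
    nonce: [u8; 12],   // random per file
    tag: [u8; 16],     // 128-bit authentication tag
    ciphertext: Vec<u8>,
}

impl EncryptedFile {
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = vec![self.algorithm_id];
        out.extend_from_slice(&self.salt);
        out.extend_from_slice(&self.nonce);
        out.extend_from_slice(&self.tag);
        out.extend_from_slice(&self.ciphertext);
        out
    }

    fn from_bytes(data: &[u8]) -> Option<Self> {
        // Header is 1 + 16 + 12 + 16 = 45 bytes
        if data.len() < 45 { return None; }
        Some(EncryptedFile {
            algorithm_id: data[0],
            salt: data[1..17].try_into().ok()?,
            nonce: data[17..29].try_into().ok()?,
            tag: data[29..45].try_into().ok()?,
            ciphertext: data[45..].to_vec(),
        })
    }
}

fn main() {
    let f = EncryptedFile {
        algorithm_id: 1, salt: [0; 16], nonce: [1; 12], tag: [2; 16],
        ciphertext: vec![9, 9, 9],
    };
    let round = EncryptedFile::from_bytes(&f.to_bytes()).unwrap();
    assert_eq!(round.ciphertext, vec![9, 9, 9]);
    println!("layout roundtrip ok");
}
```

Storing the salt and KDF parameters alongside the ciphertext is what lets a file be decrypted later from the password alone.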
Database Adapter Architecture
Adapter Hierarchy
classDiagram
class DatabaseAdapter {
<<trait>>
+test_connection() Result~()~
+collect_schema() Result~DatabaseSchema~
+database_type() DatabaseType
+supports_feature(AdapterFeature) bool
}
class PostgresAdapter {
-pool: PgPool
-config: ConnectionConfig
+new(connection_string) Result~Self~
}
class MySqlAdapter {
-pool: MySqlPool
-config: ConnectionConfig
+new(connection_string) Result~Self~
}
class SqliteAdapter {
-pool: SqlitePool
-config: ConnectionConfig
+new(connection_string) Result~Self~
}
class MongoAdapter {
-client: Client
-config: ConnectionConfig
+new(connection_string) Result~Self~
}
DatabaseAdapter <|-- PostgresAdapter
DatabaseAdapter <|-- MySqlAdapter
DatabaseAdapter <|-- SqliteAdapter
DatabaseAdapter <|-- MongoAdapter
Connection Pooling
Each adapter manages its own connection pool with security-focused defaults:
#![allow(unused)]
fn main() {
pub struct ConnectionConfig {
pub connect_timeout: Duration, // Default: 30s
pub query_timeout: Duration, // Default: 30s
pub max_connections: u32, // Default: 10
pub read_only: bool, // Default: true
}
}
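These defaults can be expressed as a Default impl. The struct below is a simplified stand-in for illustration; the values mirror the defaults listed above:

```rust
use std::time::Duration;

// Sketch: security-focused pool defaults as a Default impl.
struct PoolConfig {
    connect_timeout: Duration,
    query_timeout: Duration,
    max_connections: u32,
    read_only: bool,
}

impl Default for PoolConfig {
    fn default() -> Self {
        PoolConfig {
            connect_timeout: Duration::from_secs(30),
            query_timeout: Duration::from_secs(30),
            max_connections: 10,
            read_only: true, // collectors never write to the target database
        }
    }
}

fn main() {
    let cfg = PoolConfig::default();
    assert_eq!(cfg.max_connections, 10);
    assert!(cfg.read_only);
    assert_eq!(cfg.connect_timeout, Duration::from_secs(30));
    println!("defaults ok");
}
```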
Feature Flags
Database support is controlled by feature flags for minimal binary size:
[features]
default = ["postgresql", "sqlite"]
postgresql = ["sqlx", "sqlx/postgres"]
mysql = ["sqlx", "sqlx/mysql"]
sqlite = ["sqlx", "sqlx/sqlite"]
mongodb = ["dep:mongodb"]
mssql = ["dep:tiberius"]
Error Handling Architecture
Error Hierarchy
#![allow(unused)]
fn main() {
#[derive(Debug, thiserror::Error)]
pub enum DbSurveyorError {
#[error("Database connection failed")]
Connection(#[from] ConnectionError),
#[error("Schema collection failed: {context}")]
Collection {
context: String,
source: Box<dyn std::error::Error>,
},
#[error("Configuration error: {message}")]
Configuration { message: String },
#[error("Encryption operation failed")]
Encryption(#[from] EncryptionError),
#[error("I/O operation failed: {context}")]
Io {
context: String,
source: std::io::Error,
},
}
}
Error Context Chain
graph TD
USER_ERROR[User-Facing Error] --> CONTEXT[Error Context]
CONTEXT --> SOURCE[Source Error]
SOURCE --> ROOT[Root Cause]
USER_ERROR --> SANITIZED[Sanitized Message]
SANITIZED --> NO_CREDS[No Credentials]
SANITIZED --> ACTIONABLE[Actionable Information]
Security Guarantee: All error messages are sanitized to prevent credential leakage.
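What sanitization means in practice can be shown with a simplified stand-in for DbSurveyorError: the error's Display output carries host and context, but the type has no field that could hold a password in the first place:

```rust
use std::fmt;

// Sketch: an error whose Display output is credential-free by construction.
#[derive(Debug)]
struct ConnectionError {
    host: String,        // sanitized host only, never a full URL
    source_msg: String,  // root-cause description
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection to {} failed: {}", self.host, self.source_msg)
    }
}

fn main() {
    let err = ConnectionError {
        host: "localhost:5432".into(),
        source_msg: "timeout after 30s".into(),
    };
    let msg = err.to_string();
    assert!(msg.contains("localhost"));
    assert!(!msg.contains("secret"));
    println!("{msg}");
}
```

Designing the error types so that credentials are structurally absent is stronger than filtering them out at format time.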
CLI Architecture
Command Structure
graph TD
CLI[CLI Entry Point] --> GLOBAL[Global Args]
CLI --> COMMANDS[Commands]
GLOBAL --> VERBOSE[Verbosity]
GLOBAL --> QUIET[Quiet Mode]
COMMANDS --> COLLECT[collect]
COMMANDS --> TEST[test]
COMMANDS --> LIST[list]
COLLECT --> DB_URL[Database URL]
COLLECT --> OUTPUT[Output Options]
COLLECT --> SECURITY[Security Options]
OUTPUT --> FORMAT[Format Selection]
OUTPUT --> COMPRESSION[Compression]
SECURITY --> ENCRYPTION[Encryption]
SECURITY --> THROTTLE[Throttling]
Configuration Hierarchy
Configuration is loaded from multiple sources with clear precedence:
- Command Line Arguments (highest priority)
- Environment Variables
- Project Configuration (.dbsurveyor.toml)
- User Configuration (~/.config/dbsurveyor/config.toml)
- Default Values (lowest priority)
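The precedence chain maps naturally onto Option::or. A minimal sketch, assuming a single illustrative max_connections setting per configuration layer:

```rust
// Sketch: resolve one setting through the precedence layers listed above.
// The field and layer names are illustrative.
struct Layer {
    max_connections: Option<u32>,
}

fn resolve(cli: &Layer, env: &Layer, project: &Layer, user: &Layer, default: u32) -> u32 {
    // First Some() wins, in precedence order; the default is the floor.
    cli.max_connections
        .or(env.max_connections)
        .or(project.max_connections)
        .or(user.max_connections)
        .unwrap_or(default)
}

fn main() {
    let cli = Layer { max_connections: None };
    let env = Layer { max_connections: Some(5) };
    let project = Layer { max_connections: Some(20) };
    let user = Layer { max_connections: None };
    // Environment beats project configuration because it is higher in precedence
    assert_eq!(resolve(&cli, &env, &project, &user, 10), 5);
    println!("resolved max_connections = 5");
}
```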
Documentation Generation Architecture
Template Engine
graph LR
SCHEMA[Schema Data] --> ANALYSIS[Schema Analysis]
ANALYSIS --> CONTEXT[Template Context]
TEMPLATES[Askama Templates] --> ENGINE[Template Engine]
CONTEXT --> ENGINE
ENGINE --> MARKDOWN[Markdown Output]
ENGINE --> HTML[HTML Output]
ENGINE --> JSON[JSON Analysis]
SCHEMA --> MERMAID[Mermaid Generator]
MERMAID --> ERD[ERD Diagrams]
SCHEMA --> SQL[SQL Generator]
SQL --> DDL[DDL Scripts]
Output Format Pipeline
#![allow(unused)]
fn main() {
pub trait OutputGenerator {
fn generate(&self, schema: &DatabaseSchema) -> Result<String>;
fn file_extension(&self) -> &'static str;
fn mime_type(&self) -> &'static str;
}
// Implementations for each format
impl OutputGenerator for MarkdownGenerator { ... }
impl OutputGenerator for HtmlGenerator { ... }
impl OutputGenerator for JsonGenerator { ... }
impl OutputGenerator for MermaidGenerator { ... }
}
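The shape of a format plugin can be demonstrated with a trivial implementation, using a simplified schema type in place of DatabaseSchema; this is a sketch of the trait's usage pattern, not the real generator:

```rust
// Sketch: a minimal OutputGenerator implementation over a toy schema type.
struct MiniSchema {
    tables: Vec<String>,
}

trait OutputGenerator {
    fn generate(&self, schema: &MiniSchema) -> String;
    fn file_extension(&self) -> &'static str;
}

struct MarkdownGenerator;

impl OutputGenerator for MarkdownGenerator {
    fn generate(&self, schema: &MiniSchema) -> String {
        let mut out = String::from("# Schema\n");
        for t in &schema.tables {
            out.push_str(&format!("- {t}\n"));
        }
        out
    }
    fn file_extension(&self) -> &'static str { "md" }
}

fn main() {
    let schema = MiniSchema { tables: vec!["users".into(), "orders".into()] };
    let generator = MarkdownGenerator;
    let doc = generator.generate(&schema);
    assert_eq!(generator.file_extension(), "md");
    assert!(doc.contains("- users"));
    print!("{doc}");
}
```

Because every format implements the same trait, the CLI can pick a generator at runtime from the --format flag without format-specific branching downstream.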
Performance Architecture
Memory Management
graph TD
STREAMING[Streaming Processing] --> BATCHES[Batch Processing]
BATCHES --> MEMORY[Memory Limits]
LARGE_TABLES[Large Tables] --> PAGINATION[Pagination]
PAGINATION --> CHUNKS[Chunk Processing]
CONNECTIONS[Connection Pooling] --> LIMITS[Connection Limits]
LIMITS --> TIMEOUTS[Query Timeouts]
COMPRESSION[Data Compression] --> ZSTD[Zstandard]
ZSTD --> EFFICIENCY[Storage Efficiency]
Concurrency Model
// Async/await with Tokio runtime
#[tokio::main]
async fn main() -> Result<()> {
// Connection pooling for concurrent queries
let pool = PgPoolOptions::new()
.max_connections(10)
.connect_timeout(Duration::from_secs(30))
.connect(&database_url).await?;
    // Concurrent schema collection: the three queries run in parallel,
    // bounded by the pool's connection limit
    let (tables, views, indexes) = tokio::try_join!(
        collect_tables(&pool),
        collect_views(&pool),
        collect_indexes(&pool)
    )?;
    Ok(())
}
Testing Architecture
Test Organization
tests/
├── integration/              # End-to-end tests
│   ├── postgres_tests.rs
│   ├── mysql_tests.rs
│   └── sqlite_tests.rs
├── security/                 # Security-focused tests
│   ├── credential_tests.rs
│   ├── encryption_tests.rs
│   └── offline_tests.rs
└── fixtures/                 # Test data
    ├── sample_schemas/
    └── test_databases/
Test Categories
graph TD
TESTS[Test Suite] --> UNIT[Unit Tests]
TESTS --> INTEGRATION[Integration Tests]
TESTS --> SECURITY[Security Tests]
TESTS --> PERFORMANCE[Performance Tests]
UNIT --> MODELS[Model Tests]
UNIT --> ADAPTERS[Adapter Tests]
UNIT --> SECURITY_UNIT[Security Unit Tests]
INTEGRATION --> POSTGRES[PostgreSQL Integration]
INTEGRATION --> MYSQL[MySQL Integration]
INTEGRATION --> SQLITE[SQLite Integration]
INTEGRATION --> MONGODB[MongoDB Integration]
SECURITY --> CREDENTIALS[Credential Protection]
SECURITY --> ENCRYPTION[Encryption Validation]
SECURITY --> OFFLINE[Offline Operation]
PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
PERFORMANCE --> MEMORY[Memory Usage Tests]
PERFORMANCE --> CONCURRENCY[Concurrency Tests]
Build and Distribution Architecture
Feature Matrix
graph TD
FEATURES[Feature Flags] --> DATABASES[Database Support]
FEATURES --> SECURITY[Security Features]
FEATURES --> OPTIONAL[Optional Features]
DATABASES --> POSTGRES[postgresql]
DATABASES --> MYSQL[mysql]
DATABASES --> SQLITE[sqlite]
DATABASES --> MONGODB[mongodb]
DATABASES --> MSSQL[mssql]
SECURITY --> ENCRYPTION[encryption]
SECURITY --> COMPRESSION[compression]
OPTIONAL --> TEMPLATES[templates]
OPTIONAL --> ANALYSIS[analysis]
Binary Variants
DBSurveyor produces multiple binary variants to support different database environments. GoReleaser v2 builds 7 distinct binaries:
- dbsurveyor - Documentation postprocessor (all features)
- dbsurveyor-collect - Data collection tool with variants:
  - all - PostgreSQL, MySQL, SQLite, MongoDB, MSSQL
  - postgresql - PostgreSQL only
  - mysql - MySQL only
  - sqlite - SQLite only
  - mongodb - MongoDB only
  - mssql - MSSQL only
Each variant is built with specific feature flags:
# All features (default release artifacts)
cargo zigbuild --release --all-features -p=dbsurveyor-collect
# PostgreSQL-only variant
cargo zigbuild --release --no-default-features \
--features=postgresql,compression,encryption -p=dbsurveyor-collect
# SQLite-only variant
cargo zigbuild --release --no-default-features \
--features=sqlite,compression,encryption -p=dbsurveyor-collect
Artifact Naming: Release artifacts follow the pattern:
dbsurveyor_{variant}_{OS}_{arch}.{tar.gz|zip}
Examples:
- dbsurveyor_all_Linux_x86_64.tar.gz
- dbsurveyor_postgresql_Darwin_x86_64.tar.gz
- dbsurveyor_sqlite_Windows_x86_64.zip
Deployment Architecture
Airgap Deployment
graph LR
CONNECTED[Connected System] --> VENDOR[cargo vendor]
VENDOR --> PACKAGE[Deployment Package]
PACKAGE --> TRANSFER[Secure Transfer]
TRANSFER --> AIRGAP[Airgap System]
AIRGAP --> OFFLINE_BUILD[Offline Build]
OFFLINE_BUILD --> BINARIES[DBSurveyor Binaries]
CI/CD Integration
DBSurveyor uses GoReleaser v2 with cargo-zigbuild for cross-compilation:
# GitHub Actions release workflow
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
with:
toolchain: 1.93.1
- name: Install Zig
uses: mlugg/setup-zig@v2
with:
version: 0.13.0
- name: Install cargo-zigbuild
run: cargo install --locked cargo-zigbuild --version 0.19.8
- name: Install Cosign
uses: sigstore/cosign-installer@v3
- name: Install Syft
uses: anchore/sbom-action/download-syft@v0
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
distribution: goreleaser
version: ~> v2
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
HOMEBREW_TAP_TOKEN: ${{ secrets.HOMEBREW_TAP_TOKEN }}
Cross-Compilation Targets (6 platforms):
- x86_64-unknown-linux-gnu - Linux x86_64 (glibc)
- aarch64-unknown-linux-gnu - Linux ARM64 (glibc)
- x86_64-unknown-linux-musl - Linux x86_64 (musl/Alpine)
- x86_64-apple-darwin - macOS Intel
- aarch64-apple-darwin - macOS Apple Silicon
- x86_64-pc-windows-gnu - Windows x86_64
Security Features:
- Cosign Keyless Signing: Checksums are signed using GitHub OIDC identity
- Syft SBOM Generation: Software Bill of Materials for all archives
- Reproducible Builds: Consistent timestamps via {{ .CommitTimestamp }}
This architecture ensures DBSurveyor maintains its security-first principles while providing flexibility, performance, and maintainability across all supported platforms and use cases.
Contributing
We welcome contributions to DBSurveyor! This guide will help you get started with contributing to the project.
Code of Conduct
DBSurveyor follows the Rust Code of Conduct. Please be respectful and inclusive in all interactions.
Getting Started
Prerequisites
- Rust 1.93.1+ (MSRV)
- Git
- Docker (for integration tests)
- Just task runner
Development Setup
# Clone the repository
git clone https://github.com/EvilBit-Labs/dbsurveyor.git
cd dbsurveyor
# Install development tools
just install
# Run initial checks
just dev
Project Structure
dbsurveyor/
├── dbsurveyor-core/     # Shared library
├── dbsurveyor-collect/  # Collection binary
├── dbsurveyor/          # Documentation binary
├── docs/                # Documentation source
│   └── solutions/       # Documented solutions to past problems
└── justfile             # Development tasks
Development Workflow
Daily Development
# Format, lint, test, and check coverage
just dev
# Run specific test categories
just test-unit
just test-integration
just test-security
# Security validation
just security-full
# Pre-commit checks
just pre-commit
Code Quality Standards
DBSurveyor enforces strict quality standards:
- Zero Warnings: cargo clippy -- -D warnings must pass
- Test Coverage: 55% minimum coverage required (target: 80%, to be raised incrementally)
- Security First: All code must pass security validation
- Documentation: All public APIs must have /// documentation
Testing Requirements
All contributions must include appropriate tests:
#![allow(unused)]
fn main() {
// Unit tests in source files
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_credential_sanitization() {
let config = ConnectionConfig::new("postgres://user:secret@host/db");
let safe_display = config.to_safe_string();
assert!(!safe_display.contains("secret"));
}
}
// Integration tests in tests/ directory
#[tokio::test]
async fn test_postgres_collection() {
let docker = testcontainers::clients::Cli::default();
let postgres = docker.run(testcontainers::images::postgres::Postgres::default());
// Test implementation
}
// Security tests are mandatory for security-sensitive code
#[tokio::test]
async fn test_no_credentials_in_output() {
let schema = collect_schema("postgres://user:secret@localhost/db").await?;
let json = serde_json::to_string(&schema)?;
assert!(!json.contains("secret"));
}
}
Contribution Types
Bug Reports
When reporting bugs, please include:
- System Information: OS, Rust version, DBSurveyor version
- Reproduction Steps: Minimal example that reproduces the issue
- Expected vs Actual Behavior: Clear description of the problem
- Debug Information: Output with RUST_LOG=debug
Security Note: Never include actual database credentials in bug reports.
Feature Requests
For new features, please:
- Check Existing Issues: Avoid duplicates
- Describe Use Case: Why is this feature needed?
- Propose Implementation: High-level approach
- Consider Security: How does this maintain security guarantees?
Code Contributions
Pull Request Process
- Fork and Branch: Create a feature branch from main
- Implement Changes: Follow coding standards
- Add Tests: Comprehensive test coverage
- Update Documentation: Keep docs in sync
- Run Quality Checks: just dev must pass
- Submit PR: Clear description and context
Commit Standards
Follow Conventional Commits:
# Feature additions
feat(postgres): add connection pooling with timeout handling
feat(security): implement AES-GCM encryption with random nonces
# Bug fixes
fix(mysql): handle connection failures without exposing credentials
fix(core): ensure proper cleanup of sensitive data structures
# Security improvements
security(core): prevent credential leakage in error messages
security(encryption): add key derivation parameter validation
# Documentation
docs(readme): update installation instructions
docs(security): add encryption implementation details
Database Adapter Development
Adding New Database Support
To add support for a new database:
- Create Adapter Module:
#![allow(unused)]
fn main() {
// dbsurveyor-core/src/adapters/newdb.rs
pub struct NewDbAdapter {
    config: ConnectionConfig,
}
#[async_trait]
impl DatabaseAdapter for NewDbAdapter {
    async fn test_connection(&self) -> Result<()> { ... }
    async fn collect_schema(&self) -> Result<DatabaseSchema> { ... }
    async fn sample_table(&self, table_ref: TableRef<'_>, config: &SamplingConfig) -> Result<TableSample> { ... }
    fn database_type(&self) -> DatabaseType { ... }
    fn supports_feature(&self, feature: AdapterFeature) -> bool { ... }
    fn connection_config(&self) -> ConnectionConfig { ... }
}
}
- Add Feature Flag:
# Cargo.toml
[features]
newdb = ["dep:newdb-driver"]
- Update Factory:
#![allow(unused)]
fn main() {
// dbsurveyor-core/src/adapters.rs
match database_type {
    #[cfg(feature = "newdb")]
    DatabaseType::NewDb => {
        let adapter = NewDbAdapter::new(connection_string).await?;
        Ok(Box::new(adapter))
    }
    // ...
}
}
- Add Tests:
#![allow(unused)]
fn main() {
// tests/integration/newdb_tests.rs
#[tokio::test]
async fn test_newdb_collection() {
    // Integration test with testcontainers
}
}
Database Adapter Requirements
All database adapters must:
- Implement the DatabaseAdapter trait completely:
  - test_connection(): Verify database connectivity
  - collect_schema(): Collect full database schema information
  - sample_table(): Sample data from a specific table
  - database_type(): Return the database type
  - supports_feature(): Indicate supported features
  - connection_config(): Provide connection configuration
- Use read-only operations only (SELECT, DESCRIBE, SHOW)
- Handle connection timeouts (default: 30 seconds)
- Sanitize credentials in all error messages
- Support connection pooling where applicable
- Include comprehensive tests with testcontainers
- Document database-specific features and limitations
The sample_table() Method
The sample_table() method performs per-table sampling for a specific table:
#![allow(unused)]
fn main() {
async fn sample_table(
&self,
table_ref: TableRef<'_>,
config: &SamplingConfig,
) -> Result<TableSample>;
}
Parameters:
- table_ref: A TableRef containing the table name and optional schema name
- config: A SamplingConfig with sampling parameters (sample size, throttle, etc.)
Returns: A TableSample that includes:
- Sampled rows as JSON values
- Sample metadata (size, total rows, strategy used)
- Collection timestamp and any warnings
- sample_status: An optional field indicating the sampling outcome
Sample Status Values:
- Some(SampleStatus::Complete): Sampling completed successfully
- Some(SampleStatus::PartialRetry { original_limit }): Sampling partially completed with a reduced limit
- Some(SampleStatus::Skipped { reason }): Sampling was skipped (e.g., not implemented)
- None: For backward compatibility with existing data
Implementation Notes:
- The TableRef struct wraps the table name and optional schema for the method signature
- Implementations should populate sample_status with SampleStatus::Complete on successful sampling
- If sampling is not implemented, return SampleStatus::Skipped with an appropriate reason
- The method supports optional schema qualification (e.g., public.users vs users)
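The schema-qualification behavior can be sketched with a simplified stand-in for TableRef:

```rust
// Sketch: TableRef-style schema qualification. Simplified stand-in for the
// crate's actual TableRef type.
struct TableRef<'a> {
    schema: Option<&'a str>,
    name: &'a str,
}

impl<'a> TableRef<'a> {
    // Produce "schema.table" when a schema is present, bare "table" otherwise
    fn qualified_name(&self) -> String {
        match self.schema {
            Some(s) => format!("{s}.{}", self.name),
            None => self.name.to_string(),
        }
    }
}

fn main() {
    let qualified = TableRef { schema: Some("public"), name: "users" };
    let bare = TableRef { schema: None, name: "users" };
    assert_eq!(qualified.qualified_name(), "public.users");
    assert_eq!(bare.qualified_name(), "users");
    println!("qualification ok");
}
```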
Testing Database Adapters
# Test specific database adapter
just test-postgres
just test-mysql
just test-sqlite
# Test with real databases using testcontainers
cargo test --test postgres_integration -- --nocapture
# Security testing for new adapters
cargo test --test security_credential_protection
Security Contributions
Security-First Development
All contributions must maintain DBSurveyor's security guarantees:
- No Credential Exposure: Never log or output credentials
- Offline Operation: No external network calls except to databases
- Encryption Security: Use AES-GCM with random nonces
- Memory Safety: Use zeroize for sensitive data
Security Review Process
Security-sensitive changes require additional review:
- Security Tests: Must include security-specific tests
- Threat Model: Consider impact on threat model
- Documentation: Update security documentation
- Review: Additional security-focused code review
Security Testing
#![allow(unused)]
fn main() {
// Example security test
#[tokio::test]
async fn test_new_feature_credential_security() {
// Test that new feature doesn't leak credentials
let result = new_feature("postgres://user:secret@localhost/db").await?;
let output = format!("{:?}", result);
assert!(!output.contains("secret"));
assert!(!output.contains("user:secret"));
}
}
Documentation Contributions
Documentation Standards
- User-Focused: Write for the end user
- Security-Aware: Highlight security implications
- Example-Rich: Include working code examples
- Up-to-Date: Keep in sync with code changes
Documentation Types
- API Documentation: /// comments in code
- User Guide: Markdown files in docs/src/
- README: Project overview and quick start
- Security Documentation: Security features and guarantees
Building Documentation
# Build API documentation
cargo doc --all-features --document-private-items --open
# Build user guide
just docs
# Check documentation
just docs-check
Release Process
Version Management
DBSurveyor uses semantic versioning:
- Major: Breaking changes
- Minor: New features (backward compatible)
- Patch: Bug fixes
Release Checklist
- Update Version: Bump version in Cargo.toml
- Update Changelog: Document all changes
- Run Full Tests: just security-full
- Update Documentation: Ensure docs are current
- Create Release: Tag and create GitHub release
- Verify Artifacts: Test release binaries
Community Guidelines
Communication
- GitHub Issues: Bug reports and feature requests
- Pull Requests: Code contributions and discussions
- Security Issues: Email security@evilbitlabs.io
Review Process
- Automated Checks: CI must pass
- Code Review: Maintainer review required
- Security Review: For security-sensitive changes
- Documentation Review: For user-facing changes
Recognition
Contributors are recognized in:
- CONTRIBUTORS.md file
- Release notes
- Git commit history
Development Environment
Recommended Tools
- IDE: VS Code with Rust Analyzer
- Git Hooks: Pre-commit hooks for quality checks
- Testing: Nextest for faster test execution
- Debugging: `RUST_LOG=debug` for detailed logging
Environment Variables
# Development environment
export RUST_LOG=debug
export DATABASE_URL="postgres://dev:dev@localhost/dev_db"
# Testing environment
export RUST_LOG=trace
export DBSURVEYOR_TEST_TIMEOUT=60
Docker Development
# Start test databases
docker-compose up -d postgres mysql mongodb
# Run integration tests
just test-integration
# Clean up
docker-compose down
Troubleshooting Development Issues
Common Issues
Build failures:
# Clean and rebuild
cargo clean
cargo build --all-features
# Update toolchain
rustup update
Test failures:
# Run specific test with output
cargo test test_name -- --nocapture
# Run with debug logging
RUST_LOG=debug cargo test test_name
Clippy warnings:
# Fix automatically where possible
cargo clippy --fix --allow-dirty
# Check specific warnings
cargo clippy --workspace --all-targets --all-features -- -D warnings
Getting Help
- Documentation: Check existing docs first
- Issues: Search existing GitHub issues
- Code: Look at similar implementations
- Community: Ask questions in GitHub discussions
License and Legal
License
DBSurveyor is licensed under the Apache License 2.0. By contributing, you agree to license your contributions under the same license.
Copyright
All contributions must include appropriate copyright headers:
// Copyright 2024 EvilBit Labs
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
Contributor License Agreement
By submitting a pull request, you represent that:
- You have the right to license your contribution
- You agree to license it under the Apache License 2.0
- Your contribution is your original work
Thank you for contributing to DBSurveyor! Your contributions help make database documentation more secure and accessible for everyone.
Testing
DBSurveyor uses a comprehensive testing strategy to ensure security, reliability, and correctness. This guide covers the testing framework, practices, and how to run tests.
Testing Philosophy
- Security-First Testing: All tests must verify security guarantees
- Comprehensive Coverage: 55% minimum test coverage with `cargo llvm-cov` (target: 80%, to be raised incrementally)
- Real Database Integration: Use testcontainers for authentic testing
- Zero Warnings: All test code must pass `cargo clippy -- -D warnings`
Test Organization
Test Categories
DBSurveyor organizes tests into distinct categories:
graph TD
TESTS[Test Suite] --> UNIT[Unit Tests]
TESTS --> INTEGRATION[Integration Tests]
TESTS --> SECURITY[Security Tests]
TESTS --> PERFORMANCE[Performance Tests]
UNIT --> MODELS[Model Tests]
UNIT --> ADAPTERS[Adapter Unit Tests]
UNIT --> UTILS[Utility Tests]
INTEGRATION --> POSTGRES[PostgreSQL Tests]
INTEGRATION --> MYSQL[MySQL Tests]
INTEGRATION --> SQLITE[SQLite Tests]
INTEGRATION --> MONGODB[MongoDB Tests]
SECURITY --> CREDENTIALS[Credential Protection]
SECURITY --> ENCRYPTION[Encryption Tests]
SECURITY --> OFFLINE[Offline Operation]
PERFORMANCE --> BENCHMARKS[Criterion Benchmarks]
PERFORMANCE --> MEMORY[Memory Tests]
PERFORMANCE --> CONCURRENCY[Concurrency Tests]
Test File Structure
dbsurveyor/
├── dbsurveyor-core/
│   └── src/
│       ├── lib.rs                  # Unit tests in #[cfg(test)]
│       ├── models.rs               # Model unit tests
│       └── security.rs             # Security unit tests
├── tests/
│   ├── integration/
│   │   ├── postgres_tests.rs       # PostgreSQL integration
│   │   ├── mysql_tests.rs          # MySQL integration
│   │   └── sqlite_tests.rs         # SQLite integration
│   ├── security/
│   │   ├── credential_tests.rs     # Credential protection
│   │   ├── encryption_tests.rs     # Encryption validation
│   │   └── offline_tests.rs        # Offline operation
│   └── fixtures/
│       ├── sample_schemas/         # Test schema files
│       └── test_data/              # Test databases
└── benches/
    ├── collection.rs               # Collection benchmarks
    └── documentation.rs            # Documentation benchmarks
Running Tests
Basic Test Commands
# Run all tests with nextest (default)
just test
# Run specific test categories
just test-unit # Unit tests only
just test-integration # Integration tests only
just test-security # Security tests only
# Run tests for specific databases
just test-postgres # PostgreSQL tests
just test-mysql # MySQL tests
just test-sqlite # SQLite tests
Advanced Test Options
# Run tests with different profiles
just test-dev # Development profile (fast)
just test-ci # CI profile (comprehensive)
# Run with verbose output
just test-verbose
# Run specific test by name
cargo nextest run test_postgres_connection
# Run tests with debug logging
RUST_LOG=debug cargo nextest run
Coverage Testing
# Generate coverage report (55% minimum required, target: 80%)
just coverage
# Generate HTML coverage report
just coverage-html
# CI-friendly coverage
just coverage-ci
Unit Testing
Test Structure
Unit tests are co-located with source code using #[cfg(test)] modules:
// dbsurveyor-core/src/models.rs
impl DatabaseSchema {
    pub fn new(database_info: DatabaseInfo) -> Self {
        // Implementation
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_database_schema_creation() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);
        assert_eq!(schema.format_version, "1.0");
        assert_eq!(schema.database_info.name, "test_db");
        assert_eq!(schema.object_count(), 0);
    }

    #[test]
    fn test_with_warning() {
        let db_info = DatabaseInfo::new("test_db".to_string());
        let schema = DatabaseSchema::new(db_info);
        let schema = schema.with_warning("Test warning".to_string());
        assert_eq!(schema.collection_metadata.warnings.len(), 1);
        assert_eq!(schema.collection_metadata.warnings[0], "Test warning");
    }
}
Security Unit Tests
All security-sensitive code must include security-focused unit tests:
#[cfg(test)]
mod security_tests {
    use super::*;

    #[test]
    fn test_credential_sanitization() {
        let url = "postgres://user:secret123@localhost:5432/db";
        let sanitized = redact_database_url(url);

        // Verify credentials are sanitized
        assert!(!sanitized.contains("secret123"));
        assert!(sanitized.contains("user:****"));
        assert!(sanitized.contains("localhost:5432/db"));
    }

    #[test]
    fn test_connection_config_display() {
        let config = ConnectionConfig::new("localhost".to_string())
            .with_port(5432)
            .with_database("testdb".to_string());
        let display_output = format!("{}", config);

        // Should show connection info but never credentials
        assert!(display_output.contains("localhost:5432"));
        assert!(display_output.contains("testdb"));
        assert!(!display_output.contains("password"));
        assert!(!display_output.contains("secret"));
    }
}
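The test above exercises `redact_database_url` from dbsurveyor-core. As an illustration only (not the project's actual implementation), a minimal redaction helper that masks the password portion of a `scheme://user:password@host/...` URL could look like this:

```rust
/// Hypothetical sketch: mask the password in a database URL with `****`.
/// The real `redact_database_url` lives in dbsurveyor-core and may differ.
fn redact_database_url(url: &str) -> String {
    // Locate the end of the scheme ("postgres://", "mysql://", ...).
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];

    // The last '@' separates credentials from the host section.
    let Some(at) = rest.rfind('@') else {
        return url.to_string(); // no credentials present, nothing to redact
    };

    match rest[..at].split_once(':') {
        // Keep the username, mask the password.
        Some((user, _password)) => format!(
            "{}://{}:****@{}",
            &url[..scheme_end],
            user,
            &rest[at + 1..]
        ),
        // Username only, no password to mask.
        None => url.to_string(),
    }
}

fn main() {
    let sanitized = redact_database_url("postgres://user:secret123@localhost:5432/db");
    assert!(!sanitized.contains("secret123"));
    println!("{sanitized}"); // postgres://user:****@localhost:5432/db
}
```

Note the sketch keeps the username visible, matching the `user:****` assertion in the test; a stricter policy could mask both.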
Integration Testing
Database Integration Tests
Integration tests use testcontainers to spin up real databases:
// tests/integration/postgres_tests.rs
use testcontainers::{clients, images};
use dbsurveyor_core::adapters::create_adapter;

#[tokio::test]
async fn test_postgres_schema_collection() {
    let docker = clients::Cli::default();
    let postgres = docker.run(images::postgres::Postgres::default());
    let port = postgres.get_host_port_ipv4(5432);
    let database_url = format!("postgres://postgres:postgres@localhost:{}/postgres", port);

    // Wait for PostgreSQL to be ready
    wait_for_postgres(&database_url).await;

    // Create test schema
    let pool = sqlx::PgPool::connect(&database_url)
        .await
        .expect("Failed to connect to PostgreSQL");
    sqlx::query("CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name TEXT)")
        .execute(&pool)
        .await
        .expect("Failed to create test table");

    // Test schema collection
    let adapter = create_adapter(&database_url)
        .await
        .expect("Failed to create adapter");
    let schema = adapter
        .collect_schema()
        .await
        .expect("Failed to collect schema");

    // Verify schema was collected correctly
    assert!(!schema.tables.is_empty());
    let users_table = schema
        .tables
        .iter()
        .find(|t| t.name == "users")
        .expect("Users table not found");
    assert_eq!(users_table.name, "users");
    assert_eq!(users_table.columns.len(), 2);

    // Clean up
    pool.close().await;
}

async fn wait_for_postgres(database_url: &str) {
    let max_attempts = 30;
    let mut attempts = 0;

    while attempts < max_attempts {
        match sqlx::PgPool::connect(database_url).await {
            Ok(pool) => {
                pool.close().await;
                return;
            }
            Err(_) => {
                attempts += 1;
                tokio::time::sleep(std::time::Duration::from_millis(500)).await;
            }
        }
    }

    panic!("PostgreSQL failed to become ready after {} attempts", max_attempts);
}
MySQL Integration Tests
// tests/integration/mysql_tests.rs
#[tokio::test]
async fn test_mysql_schema_collection() {
    let docker = clients::Cli::default();
    let mysql = docker.run(
        images::mysql::Mysql::default()
            .with_root_password("testpass")
            .with_database("testdb"),
    );
    let port = mysql.get_host_port_ipv4(3306);
    let database_url = format!("mysql://root:testpass@localhost:{}/testdb", port);

    // Wait for MySQL to be ready with exponential backoff
    wait_for_mysql(&database_url).await;

    let adapter = create_adapter(&database_url)
        .await
        .expect("Failed to create MySQL adapter");
    let schema = adapter
        .collect_schema()
        .await
        .expect("Failed to collect MySQL schema");

    assert!(!schema.tables.is_empty());
}
SQLite Integration Tests
// tests/integration/sqlite_tests.rs
use tempfile::tempdir;

#[tokio::test]
async fn test_sqlite_schema_collection() -> Result<(), Box<dyn std::error::Error>> {
    let temp_dir = tempdir()?;
    let db_path = temp_dir.path().join("test.db");

    // Create test database (`mode=rwc` lets SQLx create the file if missing)
    let conn =
        sqlx::SqlitePool::connect(&format!("sqlite://{}?mode=rwc", db_path.display())).await?;
    sqlx::query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        .execute(&conn)
        .await?;
    conn.close().await;

    // Test schema collection
    let database_url = format!("sqlite://{}", db_path.display());
    let adapter = create_adapter(&database_url)
        .await
        .expect("Failed to create SQLite adapter");
    let schema = adapter
        .collect_schema()
        .await
        .expect("Failed to collect SQLite schema");

    assert!(schema.tables.iter().any(|t| t.name == "users"));
    Ok(())
}
Security Testing
Credential Protection Tests
Security tests verify that credentials never appear in outputs:
// tests/security/credential_tests.rs
#[tokio::test]
async fn test_no_credentials_in_schema_output() -> Result<(), Box<dyn std::error::Error>> {
    let _database_url = "postgres://testuser:secret123@localhost:5432/testdb";

    // Create mock schema (a real test would collect it from the database)
    let db_info = DatabaseInfo::new("testdb".to_string());
    let schema = DatabaseSchema::new(db_info);
    let json_output = serde_json::to_string(&schema)?;

    // Verify no sensitive data is present
    assert!(!json_output.contains("secret123"));
    assert!(!json_output.contains("testuser:secret123"));
    assert!(!json_output.contains("password"));
    assert!(!json_output.contains("secret"));

    Ok(())
}

#[test]
fn test_error_message_sanitization() {
    let _connection_url = "postgres://user:secret@host:5432/db";

    // Simulate connection error
    let error = DbSurveyorError::connection_failed("Connection timeout");
    let error_message = format!("{}", error);

    // Error message should not contain credentials
    assert!(!error_message.contains("secret"));
    assert!(!error_message.contains("user:secret"));
}
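One way to make the sanitization guarantee above hold by construction is to design error constructors that never accept the connection URL at all. The sketch below is illustrative only (the name `DbError` and its shape are assumptions, not dbsurveyor-core's actual `DbSurveyorError`):

```rust
use std::fmt;

/// Hypothetical error type whose Display output cannot leak credentials,
/// because the constructor only accepts a human-readable reason string.
#[derive(Debug)]
enum DbError {
    ConnectionFailed { reason: String },
}

impl DbError {
    /// Takes only a reason, never the URL, so there is no code path
    /// through which credentials could reach the error message.
    fn connection_failed(reason: &str) -> Self {
        DbError::ConnectionFailed {
            reason: reason.to_string(),
        }
    }
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::ConnectionFailed { reason } => {
                write!(f, "database connection failed: {reason}")
            }
        }
    }
}

fn main() {
    let err = DbError::connection_failed("Connection timeout");
    let msg = err.to_string();
    // Nothing from the connection URL can appear here.
    assert!(!msg.contains("secret"));
    println!("{msg}");
}
```

This "sanitize by construction" approach is generally safer than filtering credentials out of messages after the fact.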
Encryption Tests
Comprehensive encryption testing ensures security guarantees:
// tests/security/encryption_tests.rs
#[cfg(feature = "encryption")]
mod encryption_tests {
    use dbsurveyor_core::security::encryption::{encrypt_data, decrypt_data};

    #[tokio::test]
    async fn test_encryption_roundtrip() {
        let original_data = b"sensitive database schema data";
        let password = "test_password_123";

        // Encrypt the data
        let encrypted = encrypt_data(original_data, password).unwrap();

        // Verify encrypted data structure
        assert_eq!(encrypted.algorithm, "AES-GCM-256");
        assert_eq!(encrypted.nonce.len(), 12); // 96 bits
        assert_eq!(encrypted.auth_tag.len(), 16); // 128 bits
        assert!(!encrypted.ciphertext.is_empty());

        // Decrypt the data
        let decrypted = decrypt_data(&encrypted, password).unwrap();

        // Verify roundtrip
        assert_eq!(original_data, &decrypted[..]);
    }

    #[tokio::test]
    async fn test_nonce_uniqueness() {
        let data = b"test data for nonce uniqueness";
        let password = "same_password";

        // Generate multiple encryptions
        let encrypted1 = encrypt_data(data, password).unwrap();
        let encrypted2 = encrypt_data(data, password).unwrap();
        let encrypted3 = encrypt_data(data, password).unwrap();

        // Nonces should be different (random)
        assert_ne!(encrypted1.nonce, encrypted2.nonce);
        assert_ne!(encrypted2.nonce, encrypted3.nonce);
        assert_ne!(encrypted1.nonce, encrypted3.nonce);

        // All should decrypt to the same plaintext
        let decrypted1 = decrypt_data(&encrypted1, password).unwrap();
        let decrypted2 = decrypt_data(&encrypted2, password).unwrap();
        let decrypted3 = decrypt_data(&encrypted3, password).unwrap();
        assert_eq!(data, &decrypted1[..]);
        assert_eq!(data, &decrypted2[..]);
        assert_eq!(data, &decrypted3[..]);
    }

    #[tokio::test]
    async fn test_wrong_password_fails() {
        let data = b"secret data";
        let correct_password = "correct_password";
        let wrong_password = "wrong_password";

        let encrypted = encrypt_data(data, correct_password).unwrap();

        // Decryption with the wrong password should fail
        let result = decrypt_data(&encrypted, wrong_password);
        assert!(result.is_err());
    }
}
Offline Operation Tests
Verify that DBSurveyor works completely offline:
// tests/security/offline_tests.rs
#[tokio::test]
async fn test_airgap_compatibility() -> Result<(), Box<dyn std::error::Error>> {
    // Simulate an airgap environment by testing without network access
    let schema_data = include_bytes!("../fixtures/sample_schema.json");
    let schema: DatabaseSchema = serde_json::from_slice(schema_data)?;

    // All processing should work offline
    let documentation = generate_documentation(&schema, OutputFormat::Markdown).await?;
    assert!(!documentation.is_empty());

    let json_export = generate_documentation(&schema, OutputFormat::Json).await?;
    assert!(!json_export.is_empty());

    Ok(())
}
Performance Testing
Benchmarks
Performance benchmarks use Criterion:
// benches/collection.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use dbsurveyor_core::models::DatabaseSchema;

fn bench_schema_serialization(c: &mut Criterion) {
    let schema = create_large_test_schema(1000); // 1000 tables

    c.bench_function("schema_to_json", |b| {
        b.iter(|| serde_json::to_string(black_box(&schema)))
    });

    c.bench_function("schema_to_markdown", |b| {
        b.iter(|| generate_markdown_documentation(black_box(&schema)))
    });
}

fn create_large_test_schema(table_count: usize) -> DatabaseSchema {
    let db_info = DatabaseInfo::new("benchmark_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    for i in 0..table_count {
        let table = create_test_table(&format!("table_{}", i));
        schema.tables.push(table);
    }

    schema
}

criterion_group!(benches, bench_schema_serialization);
criterion_main!(benches);
Memory Testing
#[tokio::test]
async fn test_memory_usage_large_schema() -> Result<(), Box<dyn std::error::Error>> {
    let initial_memory = get_memory_usage();

    // Process a large schema
    let large_schema = create_schema_with_tables(10000);
    let _documentation = generate_documentation(&large_schema, OutputFormat::Html).await?;

    let final_memory = get_memory_usage();
    let memory_increase = final_memory - initial_memory;

    // Ensure memory usage is reasonable (< 100MB for 10k tables)
    assert!(memory_increase < 100 * 1024 * 1024);
    Ok(())
}
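The `get_memory_usage` helper above is not defined in this guide. One possible Linux-only sketch (an assumption, not the project's actual helper) reads the resident set size from `/proc/self/status`:

```rust
/// Hypothetical sketch of a memory probe: parse VmRSS (reported in kB)
/// from /proc/self/status and return it in bytes. Returns 0 when the
/// value cannot be read (e.g. on non-Linux platforms).
fn get_memory_usage() -> usize {
    let status = match std::fs::read_to_string("/proc/self/status") {
        Ok(s) => s,
        Err(_) => return 0, // procfs unavailable
    };
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        // Line format: "VmRSS:     12345 kB" -> take the number field
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse::<usize>().ok())
        .map(|kb| kb * 1024)
        .unwrap_or(0)
}

fn main() {
    let rss = get_memory_usage();
    println!("resident set size: {rss} bytes");
}
```

A portable implementation would typically use a crate such as `sysinfo` instead of procfs parsing.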
Test Configuration
Nextest Configuration
DBSurveyor uses nextest for enhanced test execution:
# .config/nextest.toml
[profile.default]
retries = 2
test-threads = 4
failure-output = "immediate"
success-output = "never"
# Test groups for different types of tests
[[profile.default.overrides]]
filter = "test(integration)"
test-group = "integration"
max-threads = 2 # Limit concurrent container tests
[[profile.default.overrides]]
filter = "test(security)"
test-group = "security"
max-threads = 1 # Security tests run sequentially
[[profile.default.overrides]]
filter = "test(unit)"
test-group = "unit"
max-threads = 8 # Unit tests can run in parallel
CI Test Configuration
# .github/workflows/test.yml
- name: Run Tests
run: |
# Run tests with CI profile
cargo nextest run --profile ci --workspace --all-features
# Generate coverage
cargo llvm-cov --lcov --output-path lcov.info
# Security validation
just security-full
Test Data and Fixtures
Test Data Generation
// tests/fixtures/mod.rs
pub fn create_test_schema() -> DatabaseSchema {
    let db_info = DatabaseInfo::new("test_db".to_string());
    let mut schema = DatabaseSchema::new(db_info);

    schema.tables = vec![
        create_test_table("users"),
        create_test_table("orders"),
        create_test_table("products"),
    ];

    schema
}

pub fn create_test_table(name: &str) -> Table {
    Table {
        name: name.to_string(),
        schema: Some("public".to_string()),
        columns: vec![Column {
            name: "id".to_string(),
            data_type: UnifiedDataType::Integer {
                bits: 32,
                signed: true,
            },
            is_nullable: false,
            is_primary_key: true,
            is_auto_increment: true,
            default_value: None,
            comment: None,
            ordinal_position: 1,
        }],
        primary_key: Some(PrimaryKey {
            name: Some(format!("{}_pkey", name)),
            columns: vec!["id".to_string()],
        }),
        foreign_keys: vec![],
        indexes: vec![],
        constraints: vec![],
        comment: None,
        row_count: Some(100),
    }
}
Fixture Files
tests/fixtures/
├── sample_schemas/
│   ├── postgres_sample.json
│   ├── mysql_sample.json
│   └── sqlite_sample.json
├── test_databases/
│   ├── small_db.sql
│   ├── medium_db.sql
│   └── large_db.sql
└── encrypted_samples/
    ├── encrypted_schema.enc
    └── compressed_schema.zst
Testing Best Practices
Test Quality Standards
- Comprehensive: Test happy path, error cases, and edge conditions
- Isolated: Tests should not depend on external services (except testcontainers)
- Deterministic: Tests must produce consistent results
- Fast: Unit tests should complete in milliseconds
- Secure: No real credentials in test code; use explicit dummy values only
Common Testing Patterns
// Use Result<(), Box<dyn std::error::Error>> for fallible test functions
#[tokio::test]
async fn test_function() -> Result<(), Box<dyn std::error::Error>> {
    // Test implementation
    Ok(())
}

// Test both Ok and Err cases
#[test]
fn test_validation() {
    // Test success case
    assert!(validate_input("valid").is_ok());

    // Test failure cases
    assert!(validate_input("").is_err());
    assert!(validate_input("invalid").is_err());
}

// Use assert_matches! for pattern matching
#[test]
fn test_error_types() {
    let result = operation_that_fails();
    assert_matches!(result, Err(DbSurveyorError::Configuration { .. }));
}
Security Testing Requirements
Every security-sensitive function must have tests that verify:
- No credential leakage in any output
- Proper error sanitization in all error paths
- Secure memory handling with automatic cleanup
- Offline operation without external dependencies
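The "secure memory handling with automatic cleanup" requirement can be tested against a wrapper type that wipes its buffer on drop and redacts its Debug output. The sketch below uses no external crates and is illustrative only; production code would use the `zeroize` crate, whose wipes are guaranteed not to be optimized away:

```rust
/// Minimal sketch of a secret wrapper with automatic cleanup:
/// the buffer is overwritten on drop and Debug output is redacted.
struct SecretString(Vec<u8>);

impl SecretString {
    fn new(s: &str) -> Self {
        SecretString(s.as_bytes().to_vec())
    }

    /// Explicit, auditable access point for the secret value.
    fn expose(&self) -> &str {
        std::str::from_utf8(&self.0).unwrap_or("")
    }
}

impl Drop for SecretString {
    fn drop(&mut self) {
        // Best-effort wipe; a plain loop like this can be optimized away,
        // which is exactly why `zeroize` exists.
        for byte in self.0.iter_mut() {
            *byte = 0;
        }
    }
}

impl std::fmt::Debug for SecretString {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str("SecretString(****)") // never print the secret
    }
}

fn main() {
    let secret = SecretString::new("hunter2");
    // Accidental Debug logging cannot leak the value.
    let debug_output = format!("{:?}", secret);
    assert!(!debug_output.contains("hunter2"));
    // Deliberate access still works.
    assert_eq!(secret.expose(), "hunter2");
}
```

A security test would then assert, as above, that every Debug/Display path of such wrappers yields only the redacted form.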
Continuous Integration
GitHub Actions Integration
name: Test Suite
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@stable
- name: Install tools
run: just install
- name: Run tests
run: just test-ci
- name: Security validation
run: just security-full
- name: Coverage
run: just coverage-ci
Test Reporting
- Coverage Reports: Uploaded to Codecov
- Test Results: JUnit XML format for CI integration
- Performance: Criterion benchmark results
- Security: Security test results and audit reports
This comprehensive testing strategy ensures DBSurveyor maintains its security guarantees while providing reliable functionality across all supported platforms and databases.