Output Formats

DBSurveyor generates structured output in multiple formats to support different use cases and workflows.

Schema File Formats

Standard JSON Format (.dbsurveyor.json)

The primary output format is a JSON file following the DBSurveyor Schema v1.0 specification:

{
  "format_version": "1.0",
  "database_info": {
    "name": "production_db",
    "version": "13.7",
    "size_bytes": 1073741824,
    "encoding": "UTF8",
    "collation": "en_US.UTF-8",
    "access_level": "Full",
    "collection_status": "Success"
  },
  "tables": [
    {
      "name": "users",
      "schema": "public",
      "columns": [
        {
          "name": "id",
          "data_type": {
            "Integer": {
              "bits": 32,
              "signed": true
            }
          },
          "is_nullable": false,
          "is_primary_key": true,
          "is_auto_increment": true,
          "ordinal_position": 1
        }
      ],
      "primary_key": {
        "name": "users_pkey",
        "columns": [
          "id"
        ]
      },
      "foreign_keys": [],
      "indexes": [],
      "constraints": []
    }
  ],
  "views": [],
  "indexes": [],
  "constraints": [],
  "procedures": [],
  "functions": [],
  "triggers": [],
  "custom_types": [],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}

Table Sampling Data

Tables may optionally include a sampling field containing sampled row data with metadata:

{
  "table_name": "users",
  "schema_name": "public",
  "rows": [
    {
      "id": 1,
      "username": "alice",
      "email": "alice@example.com"
    },
    {
      "id": 2,
      "username": "bob",
      "email": "bob@example.com"
    }
  ],
  "sample_size": 2,
  "total_rows": 1234,
  "sampling_strategy": {
    "MostRecent": {
      "limit": 10
    }
  },
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}

Sample Status Field

The optional sample_status field indicates the outcome of the sampling operation:

  • "Complete": Sampling completed successfully with no issues
  • {"PartialRetry": {"original_limit": 10}}: Sampling completed with a reduced row limit and may be retried; original_limit records the limit initially requested
  • {"Skipped": {"reason": "table too large"}}: Sampling was skipped; the reason field records why

Backward Compatibility: The sample_status field is optional and omitted when not set, ensuring older JSON files without this field remain valid.
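Because sample_status may be absent in older files, consumers should read it defensively. A minimal sketch in Python; the describe_sample_status helper is hypothetical, not part of DBSurveyor:

```python
import json

def describe_sample_status(sample: dict) -> str:
    """Interpret the optional sample_status field of a sampling record."""
    status = sample.get("sample_status")  # absent in files written before this field existed
    if status is None:
        return "unknown (pre-sample_status file)"
    if status == "Complete":
        return "complete"
    if isinstance(status, dict):
        if "PartialRetry" in status:
            return f"partial (original limit {status['PartialRetry']['original_limit']})"
        if "Skipped" in status:
            return f"skipped: {status['Skipped']['reason']}"
    return f"unrecognized status: {status!r}"

sampling = json.loads(
    '{"table_name": "users", "rows": [], '
    '"sample_status": {"Skipped": {"reason": "table too large"}}}'
)
print(describe_sample_status(sampling))  # skipped: table too large
```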

Compressed Format (.dbsurveyor.json.zst)

Large schema files can be compressed with Zstandard:

# Generate compressed output
dbsurveyor-collect --compress postgres://localhost/large_db

# Postprocessor automatically detects and decompresses
dbsurveyor generate schema.dbsurveyor.json.zst

Benefits:

  • 60-80% size reduction for large schemas
  • Fast compression/decompression
  • Maintains full data integrity
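Auto-detection does not have to rely on the file extension: every Zstandard frame begins with the magic number 0xFD2FB528. A sketch of content-based detection; the is_zstd_compressed helper is illustrative, not part of DBSurveyor:

```python
import json
import tempfile

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame magic number (0xFD2FB528, little-endian)

def is_zstd_compressed(path: str) -> bool:
    """Detect Zstandard compression by content rather than file extension."""
    with open(path, "rb") as f:
        return f.read(4) == ZSTD_MAGIC

# Demo: a plain-JSON schema file is not detected as compressed.
with tempfile.NamedTemporaryFile(suffix=".dbsurveyor.json", delete=False) as f:
    f.write(json.dumps({"format_version": "1.0"}).encode())
    plain_path = f.name

print(is_zstd_compressed(plain_path))  # False
```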

Encrypted Format (.enc)

Sensitive schema data can be encrypted using AES-GCM-256:

# Generate encrypted output (prompts for password)
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Postprocessor prompts for decryption password
dbsurveyor generate schema.enc

Security Features:

  • AES-GCM-256 authenticated encryption
  • Argon2id key derivation (64 MiB memory, 3 iterations)
  • Random 96-bit nonces (never reused)
  • Embedded KDF parameters for decryption

Encrypted File Structure:

{
  "algorithm": "AES-GCM-256",
  "nonce": [/* 12 bytes */],
  "ciphertext": [/* encrypted data */],
  "auth_tag": [/* 16 bytes */],
  "kdf_params": {
    "salt": [/* 16 bytes */],
    "memory_cost": 65536,
    "time_cost": 3,
    "parallelism": 4,
    "version": "1.3"
  }
}
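Before attempting decryption, a consumer can sanity-check the envelope against the fixed sizes above (96-bit nonce, 128-bit auth tag, 16-byte salt). The check_encrypted_envelope helper below is an illustrative sketch, not part of DBSurveyor:

```python
def check_encrypted_envelope(env: dict) -> list:
    """Structural checks on the encrypted file layout shown above."""
    problems = []
    if env.get("algorithm") != "AES-GCM-256":
        problems.append("unexpected algorithm")
    if len(env.get("nonce", [])) != 12:
        problems.append("nonce must be 96 bits (12 bytes)")
    if len(env.get("auth_tag", [])) != 16:
        problems.append("auth tag must be 128 bits (16 bytes)")
    if len(env.get("kdf_params", {}).get("salt", [])) != 16:
        problems.append("salt must be 16 bytes")
    return problems

good = {
    "algorithm": "AES-GCM-256",
    "nonce": [0] * 12,
    "ciphertext": [1, 2, 3],
    "auth_tag": [0] * 16,
    "kdf_params": {"salt": [0] * 16, "memory_cost": 65536, "time_cost": 3, "parallelism": 4},
}
print(check_encrypted_envelope(good))  # []
```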

Documentation Formats

Markdown Documentation (.md)

Status: ✅ Implemented

Basic Markdown documentation with schema overview:

dbsurveyor generate schema.json --format markdown

Output Example:

# Database Schema: production_db

Generated by DBSurveyor v1.0.0
Collection Date: 2024-01-15 10:30:00 UTC

## Summary

- **Tables**: 25
- **Views**: 3
- **Indexes**: 47

## Tables

### users
- **Schema**: public
- **Columns**: 5
- **Primary Key**: id

JSON Analysis Report (.json)

Status: ✅ Implemented

Structured analysis data for programmatic consumption:

dbsurveyor analyze schema.json --detailed

Output Example:

{
  "database_name": "production_db",
  "table_count": 25,
  "view_count": 3,
  "index_count": 47,
  "constraint_count": 32,
  "collection_date": "2024-01-15T10:30:00Z",
  "analysis": {
    "largest_table": "audit_log",
    "most_indexes": "users",
    "foreign_key_relationships": 18
  }
}
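The same headline counts can be derived directly from a schema file. The field names below follow the JSON format shown earlier; summarize_schema itself is an illustrative helper:

```python
def summarize_schema(schema: dict) -> dict:
    """Derive the headline counts of the analysis report from a loaded schema file."""
    tables = schema.get("tables", [])
    return {
        "database_name": schema["database_info"]["name"],
        "table_count": len(tables),
        "view_count": len(schema.get("views", [])),
        "index_count": len(schema.get("indexes", [])),
        # Foreign keys are stored per table, so relationships are summed across tables.
        "foreign_key_relationships": sum(len(t.get("foreign_keys", [])) for t in tables),
    }

schema = {
    "database_info": {"name": "production_db"},
    "tables": [
        {"name": "users", "foreign_keys": []},
        {"name": "orders", "foreign_keys": [{"column": "user_id"}]},
    ],
    "views": [],
    "indexes": [],
}
print(summarize_schema(schema)["foreign_key_relationships"])  # 1
```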

HTML Report (.html)

Status: 🚧 Placeholder Implementation

Interactive HTML reports with search and navigation:

dbsurveyor generate schema.json --format html

Planned Features:

  • Interactive table browser
  • Search functionality
  • Relationship visualization
  • Responsive design for mobile/desktop
  • Offline-compatible (no external dependencies)

Mermaid ERD (.mmd)

Status: 🚧 Placeholder Implementation

Entity Relationship Diagrams using Mermaid syntax:

dbsurveyor generate schema.json --format mermaid

Planned Output:

erDiagram
    users {
        int id PK
        string username
        string email UK
        datetime created_at
    }

    orders {
        int id PK
        int user_id FK
        decimal total
        datetime created_at
    }

    users ||--o{ orders : "has many"
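Since the Mermaid generator is still a placeholder, here is a rough sketch of how a schema file could be rendered to erDiagram syntax. It is illustrative only: unified type names are emitted verbatim (e.g. "integer") rather than mapped to Mermaid's short aliases, and relationship lines are omitted:

```python
def to_mermaid(schema: dict) -> str:
    """Render tables from a schema file as a Mermaid erDiagram entity list (sketch)."""
    lines = ["erDiagram"]
    for table in schema.get("tables", []):
        lines.append(f"    {table['name']} {{")
        for col in table.get("columns", []):
            dt = col["data_type"]
            # Parameterized variants are objects like {"Integer": {...}}; unit variants
            # may serialize as plain strings. Handle both.
            type_name = (next(iter(dt)) if isinstance(dt, dict) else str(dt)).lower()
            marker = " PK" if col.get("is_primary_key") else ""
            lines.append(f"        {type_name} {col['name']}{marker}")
        lines.append("    }")
    return "\n".join(lines)

schema = {"tables": [{"name": "users", "columns": [
    {"name": "id", "data_type": {"Integer": {"bits": 32}}, "is_primary_key": True},
    {"name": "username", "data_type": {"String": {"max_length": 255}}},
]}]}
print(to_mermaid(schema))
```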

SQL DDL (.sql)

Status: 🚧 Placeholder Implementation

Reconstructed SQL DDL for schema recreation:

dbsurveyor sql schema.json --dialect postgresql --output recreate.sql

Planned Features:

  • Multiple SQL dialect support (PostgreSQL, MySQL, SQLite, SQL Server)
  • Complete schema recreation scripts
  • Data type mapping between databases
  • Constraint and index recreation
  • Migration script generation

Unified Data Type System

DBSurveyor uses a unified type system to represent data types across different databases:

Basic Types

{
  "String": {
    "max_length": 255
  },
  "Integer": {
    "bits": 32,
    "signed": true
  },
  "Float": {
    "precision": 53
  },
  "Boolean": null,
  "Date": null,
  "DateTime": {
    "with_timezone": true
  },
  "Time": {
    "with_timezone": false
  }
}

Advanced Types

{
  "Binary": {
    "max_length": 1024
  },
  "Json": null,
  "Uuid": null,
  "Array": {
    "element_type": {
      "String": {
        "max_length": 50
      }
    }
  },
  "Custom": {
    "type_name": "user_status_enum"
  }
}
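A consumer rendering these tagged type values for display has to handle both unit variants and parameterized ones. An illustrative sketch; type_to_str is not part of DBSurveyor:

```python
def type_to_str(data_type) -> str:
    """Render a unified-type value as a short human-readable string."""
    if isinstance(data_type, str):  # unit variants may serialize as plain strings
        return data_type.lower()
    name, params = next(iter(data_type.items()))
    if params is None:              # unit variants serialized as {"Boolean": null}
        return name.lower()
    if name == "String":
        return f"string({params.get('max_length', '')})"
    if name == "Integer":
        sign = "" if params.get("signed", True) else "unsigned "
        return f"{sign}int{params.get('bits', '')}"
    if name == "Array":             # recurse into the element type
        return f"array<{type_to_str(params['element_type'])}>"
    return name.lower()

print(type_to_str({"Integer": {"bits": 32, "signed": True}}))                   # int32
print(type_to_str({"Array": {"element_type": {"String": {"max_length": 50}}}})) # array<string(50)>
```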

Database-Specific Mapping

DBSurveyor Type | PostgreSQL      | MySQL          | SQLite  | MongoDB
----------------|-----------------|----------------|---------|----------
String          | VARCHAR, TEXT   | VARCHAR, TEXT  | TEXT    | string
Integer         | INTEGER, BIGINT | INT, BIGINT    | INTEGER | int, long
Float           | REAL, DOUBLE    | FLOAT, DOUBLE  | REAL    | double
Boolean         | BOOLEAN         | BOOLEAN        | INTEGER | bool
Json            | JSON, JSONB     | JSON           | TEXT    | object
Array           | ARRAY[]         | JSON           | TEXT    | array
Custom          | ENUM, DOMAIN    | ENUM           | N/A     | N/A

Schema Validation

All output files are validated against the DBSurveyor JSON Schema v1.0:

Validation Features

  • Format Version Checking: Ensures compatibility
  • Required Field Validation: All mandatory fields present
  • Data Type Validation: Correct type structure
  • Security Validation: No credential fields allowed
  • Relationship Validation: Foreign key consistency

Validation Errors

# Validate a schema file
dbsurveyor validate schema.json

# Example validation error
Error: Schema validation failed
- Missing required field: collection_metadata.collected_at
- Invalid data type: expected Integer, found String
- Security violation: credential field detected in column default_value
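Required-field checking can also be approximated client-side. This snippet mirrors the error style above, but it is an illustrative sketch, not the actual validator:

```python
def validate_schema(schema: dict) -> list:
    """Minimal required-field checks on a loaded schema file (illustrative only)."""
    errors = []
    for field in ("format_version", "database_info", "tables", "collection_metadata"):
        if field not in schema:
            errors.append(f"Missing required field: {field}")
    # Nested requirement from the example error output above.
    if "collected_at" not in schema.get("collection_metadata", {}):
        errors.append("Missing required field: collection_metadata.collected_at")
    return errors

print(validate_schema({"format_version": "1.0"}))
```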

File Size Considerations

Compression Recommendations

Schema Size | Recommendation          | Expected Compression
------------|-------------------------|---------------------
< 1 MB      | Standard JSON           | N/A
1-10 MB     | Zstandard compression   | 60-70% reduction
> 10 MB     | Compression + chunking  | 70-80% reduction

Large Schema Handling

For very large databases (1000+ tables):

# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Consider sampling for development
dbsurveyor-collect --sample 10 postgres://localhost/large_db

Security Considerations

Credential Protection

All output formats guarantee:

  • No database credentials in any output file
  • Connection strings sanitized in metadata
  • Error messages sanitized
  • Validation prevents credential leakage

Encryption Best Practices

# Use strong passwords for encryption
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Verify encrypted file integrity
dbsurveyor validate schema.enc

Airgap Compatibility

All formats work completely offline:

  • No external dependencies in generated files
  • Self-contained documentation
  • Local-only processing
  • No network calls during generation

Format Evolution

Version Compatibility

  • v1.0: Current format with full backward compatibility
  • Future versions: Additive changes only
  • Migration tools: Automatic format upgrades
  • Deprecation policy: 2-version support window
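A two-version support window implies a simple acceptance rule on the major version. An illustrative sketch of what such a check might look like (is_supported is hypothetical, not a DBSurveyor API):

```python
def is_supported(format_version: str, current_major: int = 1) -> bool:
    """Accept the current major format version and the one before it
    (a 2-version support window, as described above)."""
    major = int(format_version.split(".")[0])
    return current_major - 1 <= major <= current_major

print(is_supported("1.0"))  # True
print(is_supported("3.0"))  # False
```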

Schema Extensions

The format supports extensions for:

  • Database-specific features
  • Custom metadata
  • Plugin-generated data
  • Performance metrics

Integration Examples

CI/CD Pipeline

# Generate schema documentation in CI
  - name: Collect Database Schema
    run: |
      dbsurveyor-collect --compress $DATABASE_URL
      dbsurveyor generate schema.dbsurveyor.json.zst --format markdown

  - name: Upload Documentation
    uses: actions/upload-artifact@v3
    with:
      name: database-docs
      path: schema.md

Programmatic Access

import json

# Load schema data
with open("schema.dbsurveyor.json") as f:
    schema = json.load(f)

# Analyze tables
for table in schema["tables"]:
    print(f"Table: {table['name']}")
    print(f"Columns: {len(table['columns'])}")

Documentation Automation

#!/bin/bash
# Automated documentation generation

# Collect from multiple environments
dbsurveyor-collect --output prod.json $PROD_DB_URL
dbsurveyor-collect --output staging.json $STAGING_DB_URL

# Generate documentation
dbsurveyor generate prod.json --format html --output prod-schema.html
dbsurveyor generate staging.json --format html --output staging-schema.html

# Compare schemas (future feature)
# dbsurveyor compare prod.json staging.json --output diff.html

This comprehensive output format system ensures that DBSurveyor can adapt to various workflows while maintaining security and compatibility guarantees.