# Output Formats
DBSurveyor generates structured output in multiple formats to support different use cases and workflows.
## Schema File Formats

### Standard JSON Format (`.dbsurveyor.json`)
The primary output format is a JSON file following the DBSurveyor Schema v1.0 specification:
```json
{
  "format_version": "1.0",
  "database_info": {
    "name": "production_db",
    "version": "13.7",
    "size_bytes": 1073741824,
    "encoding": "UTF8",
    "collation": "en_US.UTF-8",
    "access_level": "Full",
    "collection_status": "Success"
  },
  "tables": [
    {
      "name": "users",
      "schema": "public",
      "columns": [
        {
          "name": "id",
          "data_type": {
            "Integer": {
              "bits": 32,
              "signed": true
            }
          },
          "is_nullable": false,
          "is_primary_key": true,
          "is_auto_increment": true,
          "ordinal_position": 1
        }
      ],
      "primary_key": {
        "name": "users_pkey",
        "columns": ["id"]
      },
      "foreign_keys": [],
      "indexes": [],
      "constraints": []
    }
  ],
  "views": [],
  "indexes": [],
  "constraints": [],
  "procedures": [],
  "functions": [],
  "triggers": [],
  "custom_types": [],
  "collection_metadata": {
    "collected_at": "2024-01-15T10:30:00Z",
    "collection_duration_ms": 1500,
    "collector_version": "1.0.0",
    "warnings": []
  }
}
```
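Once loaded, a schema document can be checked against its format version before any downstream processing. A minimal sketch (the helper name and the supported-version set are ours, not part of the tool):

```python
import json

# Minimal sketch: verify the format version before processing a schema
# document. The helper name and SUPPORTED_VERSIONS set are illustrative.
SUPPORTED_VERSIONS = {"1.0"}

def check_format_version(doc: dict) -> str:
    version = doc.get("format_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported format_version: {version!r}")
    return version

doc = json.loads('{"format_version": "1.0", "tables": []}')
check_format_version(doc)  # returns "1.0"
```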
### Table Sampling Data

Tables may optionally include a `sampling` field containing sampled row data with metadata:
```json
{
  "table_name": "users",
  "schema_name": "public",
  "rows": [
    { "id": 1, "username": "alice", "email": "alice@example.com" },
    { "id": 2, "username": "bob", "email": "bob@example.com" }
  ],
  "sample_size": 2,
  "total_rows": 1234,
  "sampling_strategy": {
    "MostRecent": { "limit": 10 }
  },
  "collected_at": "2024-01-15T10:30:00Z",
  "warnings": [],
  "sample_status": "Complete"
}
```
#### Sample Status Field

The optional `sample_status` field indicates the outcome of the sampling operation:

- `"Complete"`: Sampling completed successfully with no issues
- `{"PartialRetry": {"original_limit": 10}}`: Sampling partially completed with a reduced row limit (may be retried)
- `{"Skipped": {"reason": "table too large"}}`: Sampling was skipped or not performed
Backward Compatibility: The sample_status field is optional and omitted when not set, ensuring older JSON files without this field remain valid.
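A consumer handling this field might branch on the variants as follows. This is an illustrative sketch; the helper name and return strings are ours, not part of the tool:

```python
# Illustrative sketch: branch on the sample_status variants. Older files
# omit the field entirely, so absence is treated as unknown rather than
# assumed complete.
def describe_sample_status(sampling: dict) -> str:
    status = sampling.get("sample_status")
    if status is None:
        return "unknown (field absent in older files)"
    if status == "Complete":
        return "complete"
    if isinstance(status, dict):
        if "PartialRetry" in status:
            return f"partial (original limit {status['PartialRetry']['original_limit']})"
        if "Skipped" in status:
            return f"skipped: {status['Skipped']['reason']}"
    return f"unrecognized: {status!r}"
```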
### Compressed Format (`.dbsurveyor.json.zst`)

Large schema files can be compressed using Zstandard compression:

```bash
# Generate compressed output
dbsurveyor-collect --compress postgres://localhost/large_db

# Postprocessor automatically detects and decompresses
dbsurveyor generate schema.dbsurveyor.json.zst
```
Benefits:
- 60-80% size reduction for large schemas
- Fast compression/decompression
- Maintains full data integrity
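A reader that accepts both formats can detect Zstandard output by its four-byte frame magic number rather than trusting the file extension. A stdlib-only sketch of the detection step (actual decompression would use a zstd binding, e.g. the third-party `zstandard` package, which is an assumption here, not a stated dependency of the tool):

```python
# Detect Zstandard output by its frame magic number (0x28 B5 2F FD,
# per RFC 8878) instead of relying on the file extension.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"

def is_zstd(data: bytes) -> bool:
    return data[:4] == ZSTD_MAGIC

is_zstd(ZSTD_MAGIC + b"compressed payload")  # True
is_zstd(b'{"format_version": "1.0"}')        # False
```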
### Encrypted Format (`.enc`)

Sensitive schema data can be encrypted using AES-GCM-256:

```bash
# Generate encrypted output (prompts for password)
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Postprocessor prompts for decryption password
dbsurveyor generate schema.enc
```
Security Features:
- AES-GCM-256 authenticated encryption
- Argon2id key derivation (64 MiB memory, 3 iterations)
- Random 96-bit nonces (never reused)
- Embedded KDF parameters for decryption
Encrypted File Structure:
```json
{
  "algorithm": "AES-GCM-256",
  "nonce": [/* 12 bytes */],
  "ciphertext": [/* encrypted data */],
  "auth_tag": [/* 16 bytes */],
  "kdf_params": {
    "salt": [/* 16 bytes */],
    "memory_cost": 65536,
    "time_cost": 3,
    "parallelism": 4,
    "version": "1.3"
  }
}
```
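Before attempting decryption, a consumer can sanity-check the envelope against this structure. A minimal sketch with length checks taken from the parameters above (12-byte nonce, 16-byte auth tag, 16-byte salt); the function name is ours:

```python
# Sketch: sanity-check an encrypted envelope before attempting decryption.
# Field names mirror the documented structure; length checks come from the
# documented parameters (96-bit nonce, 128-bit tag, 16-byte Argon2id salt).
REQUIRED_FIELDS = {"algorithm", "nonce", "ciphertext", "auth_tag", "kdf_params"}

def check_envelope(env: dict) -> None:
    missing = REQUIRED_FIELDS - env.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if env["algorithm"] != "AES-GCM-256":
        raise ValueError(f"unexpected algorithm: {env['algorithm']}")
    if len(env["nonce"]) != 12:
        raise ValueError("nonce must be 96 bits (12 bytes)")
    if len(env["auth_tag"]) != 16:
        raise ValueError("auth tag must be 128 bits (16 bytes)")
    if len(env["kdf_params"].get("salt", [])) != 16:
        raise ValueError("salt must be 16 bytes")
```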
## Documentation Formats

### Markdown Documentation (`.md`)

Status: ✅ Implemented

Basic Markdown documentation with schema overview:

```bash
dbsurveyor generate schema.json --format markdown
```
Output Example:

```markdown
# Database Schema: production_db

Generated by DBSurveyor v1.0.0
Collection Date: 2024-01-15 10:30:00 UTC

## Summary

- **Tables**: 25
- **Views**: 3
- **Indexes**: 47

## Tables

### users

- **Schema**: public
- **Columns**: 5
- **Primary Key**: id
```
### JSON Analysis Report (`.json`)

Status: ✅ Implemented

Structured analysis data for programmatic consumption:

```bash
dbsurveyor analyze schema.json --detailed
```
Output Example:

```json
{
  "database_name": "production_db",
  "table_count": 25,
  "view_count": 3,
  "index_count": 47,
  "constraint_count": 32,
  "collection_date": "2024-01-15T10:30:00Z",
  "analysis": {
    "largest_table": "audit_log",
    "most_indexes": "users",
    "foreign_key_relationships": 18
  }
}
```
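Several of these analysis fields can also be derived directly from a schema document. A rough sketch, assuming the schema layout shown earlier; note that it approximates "largest table" by column count, since the schema file itself carries no per-table row counts, and the function name is ours:

```python
# Rough sketch: derive analysis-style fields from a loaded schema document.
# "Largest table" is approximated by column count, an assumption; the real
# report may use a different measure.
def analyze(schema: dict) -> dict:
    tables = schema.get("tables", [])
    largest = max(tables, key=lambda t: len(t.get("columns", [])), default=None)
    return {
        "table_count": len(tables),
        "largest_table": largest["name"] if largest else None,
        "foreign_key_relationships": sum(len(t.get("foreign_keys", [])) for t in tables),
    }
```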
### HTML Report (`.html`)

Status: 🚧 Placeholder Implementation

Interactive HTML reports with search and navigation:

```bash
dbsurveyor generate schema.json --format html
```
Planned Features:
- Interactive table browser
- Search functionality
- Relationship visualization
- Responsive design for mobile/desktop
- Offline-compatible (no external dependencies)
### Mermaid ERD (`.mmd`)

Status: 🚧 Placeholder Implementation

Entity Relationship Diagrams using Mermaid syntax:

```bash
dbsurveyor generate schema.json --format mermaid
```
Planned Output:

```mermaid
erDiagram
    users {
        int id PK
        string username
        string email UK
        datetime created_at
    }
    orders {
        int id PK
        int user_id FK
        decimal total
        datetime created_at
    }
    users ||--o{ orders : "has many"
```
### SQL DDL (`.sql`)

Status: 🚧 Placeholder Implementation

Reconstructed SQL DDL for schema recreation:

```bash
dbsurveyor sql schema.json --dialect postgresql --output recreate.sql
```
Planned Features:
- Multiple SQL dialect support (PostgreSQL, MySQL, SQLite, SQL Server)
- Complete schema recreation scripts
- Data type mapping between databases
- Constraint and index recreation
- Migration script generation
## Unified Data Type System
DBSurveyor uses a unified type system to represent data types across different databases:
### Basic Types

```json
{
  "String": { "max_length": 255 },
  "Integer": { "bits": 32, "signed": true },
  "Float": { "precision": 53 },
  "Boolean": null,
  "Date": null,
  "DateTime": { "with_timezone": true },
  "Time": { "with_timezone": false }
}
```
### Advanced Types

```json
{
  "Binary": { "max_length": 1024 },
  "Json": null,
  "Uuid": null,
  "Array": {
    "element_type": {
      "String": { "max_length": 50 }
    }
  },
  "Custom": { "type_name": "user_status_enum" }
}
```
### Database-Specific Mapping

| DBSurveyor Type | PostgreSQL | MySQL | SQLite | MongoDB |
|---|---|---|---|---|
| String | VARCHAR, TEXT | VARCHAR, TEXT | TEXT | string |
| Integer | INTEGER, BIGINT | INT, BIGINT | INTEGER | int, long |
| Float | REAL, DOUBLE | FLOAT, DOUBLE | REAL | double |
| Boolean | BOOLEAN | BOOLEAN | INTEGER | bool |
| Json | JSON, JSONB | JSON | TEXT | object |
| Array | ARRAY | JSON | TEXT | array |
| Custom | ENUM, DOMAIN | ENUM | N/A | N/A |
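A sketch of how such a mapping might be applied in code, dispatching on the single variant key of a `data_type` value. Coverage and type choices here are illustrative, not the tool's actual mapping:

```python
# Illustrative mapping from a unified data_type value (the single-variant
# object shown in the schema JSON) to a PostgreSQL type name.
def to_postgres(dtype: dict) -> str:
    (variant, params), = dtype.items()  # exactly one variant key per value
    params = params or {}               # unit variants are encoded as null
    if variant == "String":
        n = params.get("max_length")
        return f"VARCHAR({n})" if n else "TEXT"
    if variant == "Integer":
        return "BIGINT" if params.get("bits", 32) > 32 else "INTEGER"
    if variant == "Float":
        return "DOUBLE PRECISION" if params.get("precision", 24) > 24 else "REAL"
    if variant == "Boolean":
        return "BOOLEAN"
    if variant == "Json":
        return "JSONB"
    if variant == "Uuid":
        return "UUID"
    raise ValueError(f"unmapped variant: {variant}")
```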
## Schema Validation
All output files are validated against the DBSurveyor JSON Schema v1.0:
### Validation Features
- Format Version Checking: Ensures compatibility
- Required Field Validation: All mandatory fields present
- Data Type Validation: Correct type structure
- Security Validation: No credential fields allowed
- Relationship Validation: Foreign key consistency
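The security check, for instance, could be implemented as a recursive scan for suspicious key names. An illustrative sketch; the keyword list and traversal are ours, not the tool's actual rule set:

```python
# Illustrative sketch of a credential check: recursively walk a schema
# document and collect paths whose key names look credential-like.
SUSPICIOUS = ("password", "passwd", "secret", "token", "api_key")

def find_credential_keys(node, path="$"):
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if any(word in key.lower() for word in SUSPICIOUS):
                hits.append(child)
            hits.extend(find_credential_keys(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_credential_keys(item, f"{path}[{i}]"))
    return hits
```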
### Validation Errors

```bash
# Validate a schema file
dbsurveyor validate schema.json
```

Example validation error:

```text
Error: Schema validation failed
- Missing required field: collection_metadata.collected_at
- Invalid data type: expected Integer, found String
- Security violation: credential field detected in column default_value
```
## File Size Considerations

### Compression Recommendations
| Schema Size | Recommendation | Expected Compression |
|---|---|---|
| < 1 MB | Standard JSON | N/A |
| 1-10 MB | Zstandard compression | 60-70% reduction |
| > 10 MB | Compression + chunking | 70-80% reduction |
### Large Schema Handling

For very large databases (1,000+ tables):

```bash
# Use compression for large schemas
dbsurveyor-collect --compress postgres://localhost/large_db

# Consider sampling for development
dbsurveyor-collect --sample 10 postgres://localhost/large_db
```
## Security Considerations

### Credential Protection
All output formats guarantee:
- No database credentials in any output file
- Connection strings sanitized in metadata
- Error messages sanitized
- Validation prevents credential leakage
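Sanitizing a connection string typically means masking the password while keeping the host, port, and database visible. A stdlib sketch of that step (illustrative, not the tool's implementation):

```python
from urllib.parse import urlsplit, urlunsplit

# Sketch: mask the password in a database URL before it is written to
# metadata or error messages. Illustrative, not the tool's actual code.
def sanitize_url(url: str) -> str:
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    if parts.username and parts.password:
        netloc = f"{parts.username}:***@{host}"
    elif parts.username:
        netloc = f"{parts.username}@{host}"
    else:
        netloc = host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

sanitize_url("postgres://alice:s3cret@db.example.com:5432/prod")
# -> "postgres://alice:***@db.example.com:5432/prod"
```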
### Encryption Best Practices

```bash
# Use strong passwords for encryption
dbsurveyor-collect --encrypt postgres://localhost/sensitive_db

# Verify encrypted file integrity
dbsurveyor validate schema.enc
```
### Airgap Compatibility
All formats work completely offline:
- No external dependencies in generated files
- Self-contained documentation
- Local-only processing
- No network calls during generation
## Format Evolution

### Version Compatibility
- v1.0: Current format with full backward compatibility
- Future versions: Additive changes only
- Migration tools: Automatic format upgrades
- Deprecation policy: 2-version support window
### Schema Extensions
The format supports extensions for:
- Database-specific features
- Custom metadata
- Plugin-generated data
- Performance metrics
## Integration Examples

### CI/CD Pipeline
```yaml
# Generate schema documentation in CI
- name: Collect Database Schema
  run: |
    dbsurveyor-collect --compress $DATABASE_URL
    dbsurveyor generate schema.dbsurveyor.json.zst --format markdown

- name: Upload Documentation
  uses: actions/upload-artifact@v3
  with:
    name: database-docs
    path: schema.md
```
### Programmatic Access

```python
import json

# Load schema data
with open("schema.dbsurveyor.json") as f:
    schema = json.load(f)

# Analyze tables
for table in schema["tables"]:
    print(f"Table: {table['name']}")
    print(f"Columns: {len(table['columns'])}")
```
### Documentation Automation

```bash
#!/bin/bash
# Automated documentation generation

# Collect from multiple environments
dbsurveyor-collect --output prod.json $PROD_DB_URL
dbsurveyor-collect --output staging.json $STAGING_DB_URL

# Generate documentation
dbsurveyor generate prod.json --format html --output prod-schema.html
dbsurveyor generate staging.json --format html --output staging-schema.html

# Compare schemas (future feature)
# dbsurveyor compare prod.json staging.json --output diff.html
```
Together, these output formats let DBSurveyor fit a range of workflows while preserving its security and compatibility guarantees.