โ— PHANTOM
๐Ÿ‡ฎ๐Ÿ‡ณ IN
โœ•
Skip to content

The Universal Database Seeder. A zero-config, context-aware synthetic data generator for PostgreSQL that maintains referential integrity and semantic realism.


🦀 SynthDB

The Universal Database Seeder

Production-grade synthetic data. Zero config. Context-aware.


Features • Quick Start • Examples • Contributing


📖 Overview

SynthDB is a database seeding engine that reads your existing PostgreSQL schema and automatically generates realistic, relationally consistent data.

Unlike traditional tools that fill columns with random gibberish, SynthDB uses a Deep Semantic Engine that infers what each column means from its name and type, and how tables relate through foreign keys. The result is data that looks and feels real.

-- Instead of this garbage:
INSERT INTO users VALUES ('XJ9K2', 'asdf@qwerty', '99999', 'ZZZ');

-- SynthDB generates this:
INSERT INTO users VALUES ('John Doe', 'john.doe@techcorp.com', '+1-555-0142', 'San Francisco, CA');

✨ Features

🧠 Deep Semantic Intelligence

SynthDB understands the meaning of your columns, not just their types.

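🎯 Context-Aware Identity

If a table has first_name, last_name, and email columns, SynthDB keeps them consistent within each row. For example, a row for Alice Chen gets an email such as alice.chen@techvision.io instead of an unrelated address.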
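A minimal sketch of how one row's identity columns could be kept consistent, assuming the email is derived from the same first and last name plus a company domain. `Identity`, `slug`, and `build_identity` are hypothetical names used for illustration, not SynthDB's internal API.

```rust
/// Hypothetical container for one row's identity-related columns.
#[derive(Debug, PartialEq)]
struct Identity {
    first_name: String,
    last_name: String,
    email: String,
}

/// Lowercases a name and keeps only ASCII letters and digits, so a name
/// like "O'Brien" becomes a safe email local-part segment ("obrien").
fn slug(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Builds the email from the same first/last name used for the row,
/// so all three columns describe the same (fictional) person.
fn build_identity(first: &str, last: &str, domain: &str) -> Identity {
    Identity {
        first_name: first.to_string(),
        last_name: last.to_string(),
        email: format!("{}.{}@{}", slug(first), slug(last), domain),
    }
}

fn main() {
    let id = build_identity("Alice", "Chen", "techvision.io");
    println!("{} {} <{}>", id.first_name, id.last_name, id.email);
}
```

Deriving every dependent column from one shared source value is what keeps the row coherent. Generating each column independently is what produces mismatched names and emails.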

🏷️ Smart Categorization

Automatically detects and generates valid data across multiple domains:

💰 Finance

  • Credit Cards (valid Luhn)
  • IBANs & Swift Codes
  • Cryptocurrency Addresses
  • Currency Codes & Amounts
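"Valid Luhn" means the final digit is a check digit computed with the Luhn (mod 10) algorithm, so generated card numbers pass the same checksum real ones do. A minimal sketch of computing and verifying that digit; `luhn_check_digit` and `is_luhn_valid` are illustrative helpers, not part of SynthDB.

```rust
/// Doubles a digit and folds results above 9 back into one digit
/// (equivalent to summing the two digits of the product).
fn double_digit(d: u32) -> u32 {
    let dd = d * 2;
    if dd > 9 { dd - 9 } else { dd }
}

/// Computes the Luhn check digit for a payload of ASCII digits.
/// Once the check digit is appended it occupies the rightmost position,
/// so the payload's rightmost digit (index 0 from the right) gets doubled.
fn luhn_check_digit(payload: &str) -> u32 {
    let sum: u32 = payload
        .chars()
        .rev()
        .map(|c| c.to_digit(10).expect("payload must contain only digits"))
        .enumerate()
        .map(|(i, d)| if i % 2 == 0 { double_digit(d) } else { d })
        .sum();
    (10 - sum % 10) % 10
}

/// Returns true if a full number (payload plus check digit) passes the Luhn test.
fn is_luhn_valid(number: &str) -> bool {
    let digits: Option<Vec<u32>> = number.chars().map(|c| c.to_digit(10)).collect();
    let Some(digits) = digits else { return false };
    if digits.len() < 2 {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| if i % 2 == 1 { double_digit(d) } else { d })
        .sum();
    sum % 10 == 0
}

fn main() {
    // 15-digit payload; the generator appends the 16th digit.
    let payload = "411111111111111";
    let card = format!("{}{}", payload, luhn_check_digit(payload));
    println!("{} valid: {}", card, is_luhn_valid(&card));
}
```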

🌍 Geography

  • Coherent Addresses
  • Cities ↔ States ↔ Zip Codes
  • Latitude/Longitude Pairs
  • Time Zones

🔬 Science

  • Chemical Formulas
  • DNA Sequences
  • Medical/ICD Codes
  • Laboratory Values

💻 Technology

  • IPv4 & IPv6 Addresses
  • MAC Addresses
  • User Agents
  • File Paths & URLs

🏢 Business

  • Company Names
  • Job Titles
  • Department Names
  • Stock Tickers

📱 Personal

  • Phone Numbers
  • Social Security Numbers
  • Passport Numbers
  • Driver's License IDs
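The architecture section describes semantic classification as detecting column meaning from names and types. A minimal, name-only sketch of that idea, matching keywords against the underscore-separated tokens of a column name. The `Category` enum, the keyword table, and `classify` are hypothetical and cover only a few of the categories above. SynthDB's real rules, and how it uses column types, are not shown here.

```rust
/// Hypothetical semantic categories, a small subset of the domains listed above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Category {
    Email,
    Phone,
    CreditCard,
    IpAddress,
    MacAddress,
    CompanyName,
    Unknown,
}

/// Keyword table checked in order. Matching whole tokens (not substrings)
/// keeps "ip" from matching columns such as "shipment_id" or "zip_code".
const RULES: &[(&str, Category)] = &[
    ("email", Category::Email),
    ("phone", Category::Phone),
    ("mobile", Category::Phone),
    ("card", Category::CreditCard),
    ("mac", Category::MacAddress),
    ("ip", Category::IpAddress),
    ("ipv4", Category::IpAddress),
    ("ipv6", Category::IpAddress),
    ("company", Category::CompanyName),
    ("merchant", Category::CompanyName),
];

/// Classifies a column by its name alone; returns Unknown if no keyword matches.
fn classify(column: &str) -> Category {
    let lower = column.to_ascii_lowercase();
    let tokens: Vec<&str> = lower.split('_').collect();
    for (keyword, category) in RULES {
        if tokens.contains(keyword) {
            return *category;
        }
    }
    Category::Unknown
}

fn main() {
    for name in ["merchant_name", "support_email", "mac_address", "ipv6_address", "shipment_id"] {
        println!("{name:>15} -> {:?}", classify(name));
    }
}
```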

🔗 Referential Integrity

📊 Topological Sort

Automatically analyzes foreign key dependencies and inserts data in dependency order, so every parent table is populated before the tables that reference it:

Users → Orders → OrderItems → Shipments
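Ordering inserts by foreign-key dependencies is a topological sort of the table graph. A minimal sketch using Kahn's algorithm, assuming dependencies arrive as (child, parent) pairs from schema introspection; `insertion_order` is an illustrative name, not SynthDB's API. It skips self-references and returns None when tables form a cycle, a case a real seeder has to resolve some other way.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns tables in an order where every parent precedes its children,
/// or None if the foreign-key graph contains a cycle.
/// `deps` holds (child, parent) pairs: `child` has an FK referencing `parent`.
fn insertion_order<'a>(tables: &[&'a str], deps: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    // in_degree[t] = number of distinct parents table t still waits on.
    let mut in_degree: BTreeMap<&'a str, usize> = tables.iter().map(|&t| (t, 0)).collect();
    // children[p] = tables whose foreign keys reference p.
    let mut children: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();

    // Deduplicate edges and ignore self-references (e.g. employees.manager_id),
    // which do not constrain the order between tables.
    let edges: BTreeSet<(&'a str, &'a str)> =
        deps.iter().copied().filter(|(c, p)| c != p).collect();
    for (child, parent) in edges {
        *in_degree.entry(child).or_insert(0) += 1;
        in_degree.entry(parent).or_insert(0);
        children.entry(parent).or_default().push(child);
    }

    // Start with tables that reference no other table.
    let mut ready: VecDeque<&'a str> = in_degree
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&t, _)| t)
        .collect();

    let mut order = Vec::with_capacity(in_degree.len());
    while let Some(table) = ready.pop_front() {
        order.push(table);
        if let Some(kids) = children.get(table) {
            for &kid in kids {
                let d = in_degree.get_mut(kid).expect("every child has an entry");
                *d -= 1;
                if *d == 0 {
                    ready.push_back(kid);
                }
            }
        }
    }

    // Tables that never reached in-degree zero sit on a cycle.
    if order.len() == in_degree.len() { Some(order) } else { None }
}

fn main() {
    let tables = ["shipments", "order_items", "orders", "users"];
    let deps = [("orders", "users"), ("order_items", "orders"), ("shipments", "order_items")];
    println!("{:?}", insertion_order(&tables, &deps));
}
```

Kahn's algorithm fits this job because it detects cycles as a by-product: any table that never reaches in-degree zero is part of one.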

✅ Zero Broken Links

Generated foreign keys always reference valid, existing parent rows. No orphaned records, ever.

-- Parent record created first
INSERT INTO customers (id, name) VALUES (1, 'Acme Corp');

-- Child record references existing parent
INSERT INTO orders (id, customer_id, total) VALUES (101, 1, 1299.99);
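One way to guarantee that every generated foreign key points at an existing row is to draw child FK values only from the primary keys already generated for the parent table. A minimal sketch, assuming integer parent keys. `assign_foreign_keys` is hypothetical, and a small linear congruential generator (`Lcg`) stands in for a real random-number crate so the example has no dependencies.

```rust
/// Tiny deterministic PRNG (64-bit linear congruential generator) so the
/// sketch needs no external crates; a real tool would use a proper RNG.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        // Multiplier and increment from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // The high bits of an LCG are better distributed than the low bits.
        self.0 >> 33
    }
}

/// Returns one foreign-key value per child row, each drawn from the primary
/// keys already generated for the parent table, so every reference resolves.
fn assign_foreign_keys(parent_ids: &[i64], child_rows: usize, rng: &mut Lcg) -> Vec<i64> {
    assert!(
        child_rows == 0 || !parent_ids.is_empty(),
        "child rows need at least one parent row to reference"
    );
    (0..child_rows)
        .map(|_| parent_ids[(rng.next_u64() % parent_ids.len() as u64) as usize])
        .collect()
}

fn main() {
    // Parent rows are generated (and inserted) first...
    let customer_ids: Vec<i64> = vec![1, 2, 3];
    // ...then each order picks one of the existing customer ids.
    let mut rng = Lcg(42);
    let order_customer_ids = assign_foreign_keys(&customer_ids, 5, &mut rng);
    println!("{:?}", order_customer_ids);
}
```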

🛡️ Production Ready

Feature | Description
--- | ---
Strict Precision | Respects NUMERIC(10,2), VARCHAR(15), and all constraint types
Smart Nulls | Applies NULL values to some optional (nullable) fields while keeping required data populated
Unique Constraints | Guarantees uniqueness for columns with UNIQUE or PRIMARY KEY constraints
Check Constraints | Honors CHECK constraints and enum types
Zero Configuration | No YAML files, no mapping rules. Just point it at your database
Performance | Written in Rust 🦀 for fast data generation
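A PostgreSQL NUMERIC(p, s) column stores at most p significant digits, s of them after the decimal point, so NUMERIC(10,2) accepts absolute values up to 99999999.99. VARCHAR(n) limits length in characters, not bytes. A minimal sketch of fitting generated values to both limits; `fit_numeric` and `fit_varchar` are hypothetical names, and f64 is used for brevity where a real generator would likely prefer an exact decimal type.

```rust
/// Rounds `value` to `scale` decimal places and clamps it into the range a
/// NUMERIC(precision, scale) column accepts, i.e. an absolute value of at
/// most 10^(precision - scale) - 10^(-scale). Returns the SQL literal text.
fn fit_numeric(value: f64, precision: u32, scale: u32) -> String {
    assert!(scale <= precision, "scale cannot exceed precision");
    let step = 10f64.powi(-(scale as i32));
    let max = 10f64.powi((precision - scale) as i32) - step;
    let rounded = (value / step).round() * step;
    format!("{:.*}", scale as usize, rounded.clamp(-max, max))
}

/// Truncates `text` to at most `max_chars` characters without ever
/// splitting a UTF-8 code point.
fn fit_varchar(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text,
    }
}

fn main() {
    // Within range: only rounded to two decimals.
    println!("{}", fit_numeric(45021.5, 10, 2));
    // Too large for NUMERIC(10,2): clamped to the largest allowed value.
    println!("{}", fit_numeric(123456789.999, 10, 2));
    // Too long for VARCHAR(15): cut at 15 characters.
    println!("{}", fit_varchar("+1-555-0142-ext-99", 15));
}
```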

⚡ Quick Start

📥 Installation

# Via Cargo
cargo install synthdb

🎯 Basic Usage

Step 1: Create a target database and apply your schema to it. The tables must already exist; SynthDB fills them but does not create them.

Step 2: Run SynthDB

synthdb clone \
  --url "postgres://user:pass@localhost:5432/my_staging_db" \
  --rows 1000 \
  --output seed.sql

Step 3: Apply the generated data

psql -d my_staging_db -f seed.sql

🔧 Advanced Options

# Insert generated data directly into the database (no SQL file)
synthdb clone --url "postgres://..." --rows 5000 --execute

# Specify custom row counts per table
synthdb clone --url "postgres://..." --config counts.json

# Exclude specific tables
synthdb clone --url "postgres://..." --exclude "logs,temp_*"

# Set data locale
synthdb clone --url "postgres://..." --locale "en_GB"
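The --exclude value above combines an exact table name (logs) and a prefix pattern (temp_*). A minimal sketch of how such a comma-separated list could be matched, assuming `*` only ever appears as a trailing wildcard. SynthDB's actual matching rules are not documented here, so treat `is_excluded` as a hypothetical illustration.

```rust
/// Returns true if `table` matches any comma-separated pattern in `spec`.
/// Assumption: a trailing `*` means "starts with"; anything else must match exactly.
fn is_excluded(table: &str, spec: &str) -> bool {
    spec.split(',')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .any(|pattern| match pattern.strip_suffix('*') {
            Some(prefix) => table.starts_with(prefix),
            None => table == pattern,
        })
}

fn main() {
    let spec = "logs,temp_*";
    for table in ["logs", "temp_import", "users", "logs_archive"] {
        println!("{table}: excluded = {}", is_excluded(table, spec));
    }
}
```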

💡 Examples

🎨 How SynthDB Handles Data

Column Name | Generated Value | Logic
--- | --- | ---
merchant_name | 'Acme Corporation' | 🏢 Detected Company entity
support_email | 'support@acmecorp.com' | 📧 Matched to company name
mac_address | '00:1A:2B:3C:4D:5E' | 🔧 Valid hexadecimal format
ipv6_address | '2001:0db8:85a3::8a2e:0370' | 🌐 Valid IPv6 format
contract_value | 45021.50 | 💯 Respected NUMERIC(10,2)
tracking_code | 'TRK-9281-A02' | 🎯 Semantic ID generation
audit_log_path | '/var/logs/audit/2024-11.log' | 📁 Context-aware file path
birth_date | '1985-06-15' | 🎂 Realistic age distribution
website_url | 'https://acmecorp.com' | 🔗 Matched to company domain
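The format rules behind the mac_address and ipv6_address rows are easy to satisfy when values are built from raw bytes. A minimal sketch, assuming the generator already has random bytes or 16-bit segments. `format_mac` is a hypothetical helper, while `std::net::Ipv6Addr` is Rust's standard type, whose Display output follows RFC 5952 (lowercase hex, leading zeros dropped, longest zero run compressed to `::`). That is why it prints 2001:db8:85a3::8a2e:370, an equivalent spelling of the table's 2001:0db8:85a3::8a2e:0370.

```rust
use std::net::Ipv6Addr;

/// Formats six bytes as an uppercase, colon-separated MAC address.
fn format_mac(bytes: [u8; 6]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(":")
}

fn main() {
    println!("{}", format_mac([0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E]));
    // Three zero segments in the middle; Display compresses them to "::".
    let ip = Ipv6Addr::new(0x2001, 0x0db8, 0x85a3, 0, 0, 0, 0x8a2e, 0x0370);
    println!("{}", ip);
}
```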

🗂️ Real-World Schema Example

-- Your existing schema
CREATE TABLE companies (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    website VARCHAR(255),
    industry VARCHAR(50)
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone VARCHAR(20),
    job_title VARCHAR(100),
    salary NUMERIC(10,2),
    hire_date DATE NOT NULL
);

SynthDB generates:

-- Coherent company data
INSERT INTO companies VALUES 
(1, 'TechVision Solutions', 'https://techvision.io', 'Software'),
(2, 'Global Logistics Inc', 'https://globallogistics.com', 'Transportation');

-- Employees with matching company context
INSERT INTO employees VALUES 
(1, 1, 'Alice', 'Chen', 'alice.chen@techvision.io', '+1-555-0123', 'Senior Software Engineer', 125000.00, '2022-03-15'),
(2, 1, 'Bob', 'Kumar', 'bob.kumar@techvision.io', '+1-555-0124', 'Product Manager', 135000.00, '2021-08-22'),
(3, 2, 'Carol', 'Rodriguez', 'carol.rodriguez@globallogistics.com', '+1-555-0198', 'Operations Director', 145000.00, '2020-01-10');

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                     SynthDB Engine                      │
├─────────────────────────────────────────────────────────┤
│  1. Schema Introspection                                │
│     └─ Read tables, columns, constraints, relationships │
│                                                         │
│  2. Dependency Analysis                                 │
│     └─ Build dependency graph via topological sort      │
│                                                         │
│  3. Semantic Classification                             │
│     └─ Detect column meaning from names & types         │
│                                                         │
│  4. Context-Aware Generation                            │
│     └─ Generate coherent, relational data               │
│                                                         │
│  5. Constraint Validation                               │
│     └─ Ensure all DB constraints are satisfied          │
│                                                         │
│  6. Output                                              │
│     └─ SQL file or direct database insertion            │
└─────────────────────────────────────────────────────────┘
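Step 5 (Constraint Validation) can be pictured as a final check of each generated row against the introspected column metadata. A minimal sketch, assuming each column carries a NOT NULL flag, an optional VARCHAR length, and a UNIQUE flag. `ColumnSpec` and `validate_rows` are hypothetical names, and CHECK constraints and foreign keys are left out for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the column metadata gathered during introspection.
struct ColumnSpec {
    name: &'static str,
    not_null: bool,
    max_chars: Option<usize>, // VARCHAR(n) limit, if any
    unique: bool,
}

/// Checks every row (one Option<String> per column, None = SQL NULL) and
/// returns a human-readable message for each violation found.
fn validate_rows(columns: &[ColumnSpec], rows: &[Vec<Option<String>>]) -> Vec<String> {
    let mut errors = Vec::new();
    // One set of already-seen values per column, used for UNIQUE checks.
    let mut seen: Vec<HashSet<&str>> = vec![HashSet::new(); columns.len()];
    for (r, row) in rows.iter().enumerate() {
        for (c, (col, value)) in columns.iter().zip(row).enumerate() {
            match value {
                None if col.not_null => errors.push(format!("row {r}: {} is NULL", col.name)),
                // By default PostgreSQL treats NULLs as distinct, so they never violate UNIQUE.
                None => {}
                Some(v) => {
                    if let Some(max) = col.max_chars {
                        if v.chars().count() > max {
                            errors.push(format!("row {r}: {} exceeds {max} chars", col.name));
                        }
                    }
                    // HashSet::insert returns false if the value was already present.
                    if col.unique && !seen[c].insert(v.as_str()) {
                        errors.push(format!("row {r}: duplicate {} '{v}'", col.name));
                    }
                }
            }
        }
    }
    errors
}

fn main() {
    let columns = [ColumnSpec { name: "email", not_null: true, max_chars: Some(100), unique: true }];
    let rows = vec![
        vec![Some("alice.chen@techvision.io".to_string())],
        vec![Some("alice.chen@techvision.io".to_string())],
        vec![None],
    ];
    for error in validate_rows(&columns, &rows) {
        println!("{error}");
    }
}
```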

🗺️ Roadmap

  • PostgreSQL support
  • Semantic column detection
  • Foreign key resolution
  • MySQL/MariaDB support
  • SQLite support
  • Custom data providers
  • GraphQL schema support
  • Performance benchmarking suite
  • Web UI for configuration
  • Machine learning-based pattern detection

🤝 Contributing

We love Rustaceans! 🦀 Contributions are welcome and appreciated.

How to Contribute

  1. Fork the repository
  2. Create a feature branch
    git checkout -b feature/amazing-feature
  3. Make your changes
    cargo fmt
    cargo clippy
    cargo test
  4. Commit your changes
    git commit -m 'Add amazing feature'
  5. Push to your fork
    git push origin feature/amazing-feature
  6. Open a Pull Request

Development Setup

# Clone the repository
git clone https://github.com/yourusername/synthdb.git
cd synthdb

# Build the project
cargo build

# Run tests
cargo test

# Run with example
cargo run -- clone --url "postgres://localhost/testdb" --rows 100

Code of Conduct

Please read our Code of Conduct before contributing.



🙏 Acknowledgments

Built with ❤️ using:

  • Rust - Systems programming language
  • Tokio - Async runtime
  • SQLx - Database toolkit
  • Fake - Data generation library

📄 License

Distributed under the MIT License. See LICENSE for more information.


💬 Community & Support


If SynthDB helps your project, consider giving it a ⭐ on GitHub!

Made with 🦀 by the SynthDB team
