Open Source · Free to Use (MIT)

Open Source Verifiable Synthetic Data

VeriSynth turns sensitive tables into privacy-safe synthetic data for training AI models - with cryptographic proof receipts anyone can verify. Runs locally, no cloud required.

No signup · No GPUs needed · On-prem/VPC friendly

Quick Start
  # Install the package
  pip install verisynth-core
  # Run it!
  verisynth --input data/sample.csv --output out/ --rows 1000000
    [INFO] Loading data/sample.csv (10 rows)
    [OK] Synthetic dataset written to out/synthetic.csv
    [OK] Proof receipt saved → proof.json
    [OK] Done. 1M synthetic rows created in 3.4s.
  # Verify the proof receipt
  verisynth verify proof.json

From 10 to 1,000,000 rows

Generate realistic, privacy-safe datasets - not random noise. VeriSynth uses Gaussian Copula modeling to learn your dataset's statistical structure and correlations, producing synthetic records that look and act like the real thing.

Unlimited Growth

From open source to enterprise scale

VeriSynth Core is 100% free and open source under the MIT license. For teams that need enterprise compliance, managed hosting, or advanced differential privacy, VeriSynth Cloud is on the way.

Trusted foundations · Open governance · Enterprise ready

How VeriSynth works

We learn statistical relationships, not identities - then prove it. VeriSynth learns your dataset's statistical DNA and generates unlimited synthetic records that behave like the real world.

1 · Ingest

Load data locally via the CLI/API. No data leaves your environment.

2 · Learn

A Gaussian Copula captures each column's marginal distribution and the correlation structure between columns.

3 · Generate

Sample any number of rows from the learned model while preserving its distributions and correlations.

4 · Verify

Ship a cryptographic proof receipt for auditability and trust.
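
The Learn and Generate steps above can be illustrated with a minimal, standard-library-only sketch of a Gaussian copula. This is not VeriSynth's implementation: the function names (`fit`, `sample`), the rank-based normal scores, the empirical-quantile marginals, and the toy columns are all assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, stdev 1


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def normal_scores(column):
    """Map a column to standard-normal scores through its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), so inv_cdf is defined
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small correlation matrix."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = max(matrix[i][i] - partial, 0.0) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def fit(columns):
    """Learn: sorted marginals plus the correlation of the normal scores."""
    scores = [normal_scores(col) for col in columns]
    size = len(columns)
    corr = [[1.0 if i == j else pearson(scores[i], scores[j])
             for j in range(size)] for i in range(size)]
    return {"marginals": [sorted(col) for col in columns],
            "chol": cholesky(corr)}


def sample(model, n_rows, seed=42):
    """Generate: correlated normals -> uniforms -> empirical quantiles."""
    rng = random.Random(seed)
    chol = model["chol"]
    size = len(chol)
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]
        correlated = [sum(chol[i][k] * z[k] for k in range(i + 1))
                      for i in range(size)]
        row = []
        for value, marginal in zip(correlated, model["marginals"]):
            u = STD_NORMAL.cdf(value)
            index = min(int(u * len(marginal)), len(marginal) - 1)
            row.append(marginal[index])
        rows.append(row)
    return rows


# Hypothetical toy columns (not taken from any real dataset)
columns = [[1, 2, 3, 4, 5, 6, 7, 8], [2, 1, 4, 3, 6, 5, 8, 7]]
model = fit(columns)
print(sample(model, 3))
```

Because this sketch draws from empirical quantiles, every sampled value is one that already appears in the input column. A production engine would more likely fit smooth marginals so that generated values are new; the sketch also assumes no two columns are perfectly rank-correlated.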

Built for every dataset

VeriSynth adapts to your data - whether it's medical records, financial transactions, IoT telemetry, or customer analytics. It automatically detects schema types and learns statistical relationships across numeric, categorical, and boolean fields.

Healthcare & Life Sciences

Generate privacy-safe patient data for model training and research sharing - without exposing any individual's information.

Finance & Risk Modeling

Create realistic transaction or credit datasets that preserve distributions, seasonality, and correlation between metrics.

Public Sector & Analytics

Safely publish or analyze sensitive demographic or behavioral data - maintaining statistical integrity while ensuring anonymity.

Flexible schema control

VeriSynth can automatically infer column types and relationships, or you can define a custom schema - excluding sensitive fields, setting column constraints, or specifying correlation targets. The engine adapts to your data policy, not the other way around.

# Example config.yaml
exclude: ["patient_id", "address"]
types:
  age: int
  bmi: float
  smoker: bool
  hba1c: float
model:
  engine: GaussianCopula
  seed: 42
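
To make the intent of `exclude` and `types` concrete, here is a small sketch of how such a config could be applied to one raw CSV row. How VeriSynth itself interprets the file is not shown here, so the `apply_schema` helper, the dict form of the config, and the accepted boolean spellings are all hypothetical.

```python
# The example config above, written as a plain dict to avoid a YAML dependency.
CONFIG = {
    "exclude": ["patient_id", "address"],
    "types": {"age": int, "bmi": float, "smoker": bool, "hba1c": float},
}


def to_bool(text):
    """Parse common boolean spellings; bool("0") would wrongly give True."""
    return text.strip().lower() in {"1", "true", "yes"}


def apply_schema(row, config):
    """Drop excluded fields and cast the rest to their declared types."""
    casters = {bool: to_bool}
    cleaned = {}
    for name, raw in row.items():
        if name in config["exclude"]:
            continue
        target = config["types"].get(name, str)  # undeclared fields stay strings
        cleaned[name] = casters.get(target, target)(raw)
    return cleaned


raw_row = {"patient_id": "001", "age": "62", "bmi": "31.4",
           "smoker": "1", "hba1c": "7.8"}
print(apply_schema(raw_row, CONFIG))
# {'age': 62, 'bmi': 31.4, 'smoker': True, 'hba1c': 7.8}
```

Excluded columns are removed before any modeling happens, so identifiers such as `patient_id` never reach the learning step in this sketch.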

See VeriSynth in action

A simple example showing how VeriSynth transforms a small real dataset into a synthetic one - preserving structure and patterns while protecting every individual's privacy.

Original Data (sensitive)

patient_id  age  bmi   smoker  hba1c
001         62   31.4  1       7.8
002         45   28.6  0       6.1
003         33   24.1  0       5.3
004         58   29.9  1       6.7
005         70   33.5  1       8.2

Synthetic Output (verified)

patient_id*  age  bmi   smoker  hba1c
101          61   31.2  1       7.7
102          47   28.3  0       6.3
103          34   24.5  0       5.2
104          59   30.1  1       6.6
105          72   33.7  1       8.1

*Synthetic IDs are newly generated; no record maps to a real individual.

Distributions closely match while records are newly generated. VeriSynth preserves correlations (e.g., age ↔ blood sugar, BMI ↔ smoker status), yielding data that behaves like the real world for analytics and ML - without exposing any individual.
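
One way to check the claim that distributions and correlations match is to compute simple fidelity metrics on the two example tables. The sketch below uses only the standard library; `mean_abs_corr_delta` and `ks_statistic` are hypothetical helper names, and the exact metric definitions VeriSynth uses are not stated in this document.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_abs_corr_delta(real, synth):
    """Mean |corr_real - corr_synth| over all pairs of columns."""
    pairs = list(combinations(sorted(real), 2))
    deltas = [abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
              for a, b in pairs]
    return sum(deltas) / len(deltas)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    def ecdf(values, t):
        return sum(v <= t for v in values) / len(values)
    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)


# Values from the example tables above (patient_id omitted)
real = {
    "age": [62, 45, 33, 58, 70],
    "bmi": [31.4, 28.6, 24.1, 29.9, 33.5],
    "smoker": [1, 0, 0, 1, 1],
    "hba1c": [7.8, 6.1, 5.3, 6.7, 8.2],
}
synth = {
    "age": [61, 47, 34, 59, 72],
    "bmi": [31.2, 28.3, 24.5, 30.1, 33.7],
    "smoker": [1, 0, 0, 1, 1],
    "hba1c": [7.7, 6.3, 5.2, 6.6, 8.1],
}

print("mean |corr delta|:", round(mean_abs_corr_delta(real, synth), 3))
print("KS statistic (age):", round(ks_statistic(real["age"], synth["age"]), 3))  # 0.2
```

Lower values mean closer agreement on both metrics. With only five rows per table these numbers are illustrative, not statistically meaningful.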

Proof receipts auditors will love

  • File integrity via SHA-256 and Merkle roots
  • Fidelity & privacy metrics (KS tests, correlation delta)
  • Deterministic seeds and reproducibility
  • Optional DP parameters and ed25519 signatures
  1. Attach proof.json – Ship synthetic data with a verifiable receipt.
  2. Verify offline – Auditors recompute hashes without raw data.
  3. Approve & share – If all metrics pass policy, it's verified.
proof.json (verified)
{
  "verisynth_version": "core-0.1.0",
  "license": "MIT",
  "metrics": { "corr_mean_abs_delta": 0.12, "naive_reid_risk": 0.01 },
  "input":  { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}
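
The integrity part of a receipt like this can be sketched with `hashlib`. The leaf layout (one leaf per CSV line), the duplicate-last-node rule for odd levels, and the helper names below are assumptions for illustration; the actual Merkle construction behind `merkle_root` in proof.json is not specified here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string (e.g. a whole output file)."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves):
    """Merkle root over byte-string leaves, duplicating the last node on odd levels."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def matches_receipt(content: bytes, expected_sha256: str) -> bool:
    """Recompute the file hash and compare it with the value in the receipt."""
    return sha256_hex(content) == expected_sha256


# Hypothetical synthetic file content, hashed line by line
csv_bytes = b"age,bmi\n61,31.2\n47,28.3\n"
receipt = {"sha256": sha256_hex(csv_bytes),
           "merkle_root": merkle_root(csv_bytes.splitlines())}

print(matches_receipt(csv_bytes, receipt["sha256"]))            # True
print(matches_receipt(csv_bytes + b"x", receipt["sha256"]))     # False
```

An auditor needs only the synthetic file and the receipt to rerun this check, which is why verification can happen offline and without access to the raw data.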

Join the VeriSynth community

Connect with developers, researchers, and data scientists building the future of privacy-safe synthetic data. Get support, share use cases, and contribute to open source.

Open source • Community driven • Privacy first

Get notified for VeriSynth Cloud

Be the first to know when VeriSynth Cloud launches. Get early access to enterprise features, managed hosting, and advanced differential privacy capabilities.

We'll only email you about VeriSynth Cloud updates. No spam, unsubscribe anytime.