Open Source · Free to Use (MIT)

Open Source Verifiable Synthetic Data

VeriSynth turns sensitive tables into privacy-safe synthetic data for training AI models - with cryptographic proof receipts anyone can verify. Runs locally, no cloud required.


No signup · No GPUs needed · On-prem/VPC friendly

Quick Start

# Install the package
pip install verisynth-core
# Generate 1,000,000 synthetic rows from the sample file
verisynth --input data/sample.csv --output out/ --rows 1000000
[INFO] Loading data/sample.csv (10 rows)
[OK] Synthetic dataset written to out/synthetic.csv
[OK] Proof receipt saved → proof.json
[OK] Done. 1M synthetic rows created in 3.4s.
# Verify the proof receipt
verisynth verify proof.json

From 10 to 1,000,000 rows

Generate realistic, privacy-safe datasets - not random noise. VeriSynth uses Gaussian Copula modeling to learn your dataset's statistical structure and correlations, producing synthetic records that look and act like the real thing.

From open source to enterprise scale

VeriSynth Core is 100% free and open source under the MIT license. For teams that need enterprise compliance, managed hosting, or advanced differential privacy, VeriSynth Cloud is on the way.


Trusted foundations · Open governance · Enterprise ready

How VeriSynth works

VeriSynth learns statistical relationships, not identities, and backs every run with a proof receipt. It models your dataset's distributions and correlations, then generates as many synthetic records as you need.

1 · Ingest

Load data locally through the CLI or API. No data leaves your environment.

2 · Learn

A Gaussian copula captures each column's marginal distribution and the correlation structure between columns.

3 · Generate

Sample any number of rows while preserving the learned distributions and correlations.

4 · Verify

Ship a cryptographic proof receipt for auditability and trust.
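
The Learn and Generate steps can be sketched in a few lines of standard-library Python. This is a minimal illustration of the Gaussian copula idea for two numeric columns, not VeriSynth's actual implementation: the function names (to_normal_scores, sample_copula, empirical_quantile) are hypothetical, and the sketch assumes empirical marginals with linear interpolation. The input values are the age and bmi columns from the sample table further down this page.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def to_normal_scores(values):
    """Map a column to standard-normal scores via its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF, interpolating linearly between observed values."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def sample_copula(col_a, col_b, n_rows, seed=42):
    """Learn the dependence between two columns, then sample new pairs."""
    rng = random.Random(seed)
    # Learn: correlation of the columns in normal-score space.
    rho = pearson(to_normal_scores(col_a), to_normal_scores(col_b))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rows = []
    for _ in range(n_rows):
        # Generate: draw correlated normals, then map back through each marginal.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        rows.append((empirical_quantile(sorted_a, u1),
                     empirical_quantile(sorted_b, u2)))
    return rows

age = [62, 45, 33, 58, 70]
bmi = [31.4, 28.6, 24.1, 29.9, 33.5]
synthetic = sample_copula(age, bmi, n_rows=1000)
```

A fixed seed makes the output reproducible, which is what lets a receipt refer to one specific generation run. A real engine would also need to handle more than two columns, ties, and categorical fields.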

Built for every dataset

VeriSynth adapts to your data - whether it's medical records, financial transactions, IoT telemetry, or customer analytics. It automatically detects schema types and learns statistical relationships across numeric, categorical, and boolean fields.
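
To give a feel for what schema detection involves, here is a rough sketch of one possible heuristic. It is an assumption for illustration only: infer_column_type is a hypothetical helper, and the rules VeriSynth actually applies may differ.

```python
def infer_column_type(values):
    """Guess a column type from raw string values (hypothetical heuristic)."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    # Columns that only ever contain flag-like tokens are treated as boolean.
    if {v.strip().lower() for v in values} <= {"0", "1", "true", "false"}:
        return "bool"
    try:
        for v in values:
            int(v)
        return "int"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "float"
    except ValueError:
        # Anything that is not numeric falls back to categorical.
        return "categorical"

age_type = infer_column_type(["62", "45", "33", "58", "70"])
bmi_type = infer_column_type(["31.4", "28.6", "24.1", "29.9", "33.5"])
smoker_type = infer_column_type(["1", "0", "0", "1", "1"])
```

The boolean check runs first so that a 0/1 column is not mistaken for an integer column. A custom schema is the way to override a guess like this when it is wrong.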

Healthcare & Life Sciences

Generate privacy-safe patient data for model training and research sharing - without exposing any individual's information.

Finance & Risk Modeling

Create realistic transaction or credit datasets that preserve distributions, seasonality, and correlation between metrics.

Public Sector & Analytics

Safely publish or analyze sensitive demographic or behavioral data - maintaining statistical integrity while ensuring anonymity.

Flexible schema control

VeriSynth can automatically infer column types and relationships, or you can define a custom schema - excluding sensitive fields, setting column constraints, or specifying correlation targets. The engine adapts to your data policy, not the other way around.

# Example config.yaml
exclude: ["patient_id", "address"]
types:
  age: int
  bmi: float
  smoker: bool
  hba1c: float
model:
  engine: GaussianCopula
  seed: 42
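
As a rough picture of what the exclude and types settings do, the sketch below applies them to a couple of rows before any modeling happens. It mirrors the config above as a plain Python dict to stay dependency-free. CONFIG, CASTS, and apply_schema are hypothetical names, not part of the VeriSynth API, and the inline CSV is a comma-separated copy of two rows from the sample table on this page.

```python
import csv
import io

# Python mirror of the exclude/types sections of the example config.
CONFIG = {
    "exclude": ["patient_id", "address"],
    "types": {"age": int, "bmi": float, "smoker": bool, "hba1c": float},
}

# bool("0") is True in Python, so booleans need an explicit parser.
CASTS = {bool: lambda s: s.strip().lower() in {"1", "true"}}

def apply_schema(rows, config):
    """Drop excluded columns and cast the remaining values to their types."""
    cleaned = []
    for row in rows:
        kept = {}
        for name, raw in row.items():
            if name in config["exclude"]:
                continue  # sensitive field: never reaches the model
            target = config["types"].get(name, str)
            kept[name] = CASTS.get(target, target)(raw)
        cleaned.append(kept)
    return cleaned

SAMPLE_CSV = """patient_id,age,bmi,smoker,hba1c
001,62,31.4,1,7.8
002,45,28.6,0,6.1
"""

rows = apply_schema(csv.DictReader(io.StringIO(SAMPLE_CSV)), CONFIG)
```

Excluding a column before the learning step means its values cannot influence the synthetic output at all.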

See VeriSynth in action

A simple example showing how VeriSynth transforms a small real dataset into a synthetic one - preserving structure and patterns while protecting every individual's privacy.

Original Data (sensitive)

patient_id	age	bmi	smoker	hba1c
001	62	31.4	1	7.8
002	45	28.6	0	6.1
003	33	24.1	0	5.3
004	58	29.9	1	6.7
005	70	33.5	1	8.2

Synthetic Output (verified)

patient_id*	age	bmi	smoker	hba1c
101	61	31.2	1	7.7
102	47	28.3	0	6.3
103	34	24.5	0	5.2
104	59	30.1	1	6.6
105	72	33.7	1	8.1

*Synthetic IDs are newly generated; no record maps to a real individual.

Distributions closely match while records are newly generated. VeriSynth preserves correlations (e.g., age ↔ blood sugar, BMI ↔ smoker status), yielding data that behaves like the real world for analytics and ML - without exposing any individual.
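
One way to check a claim like this yourself is to compare pairwise correlations in the two tables above. The sketch below assumes that a correlation delta means the mean absolute difference between Pearson correlations over all column pairs. That definition is a guess from the metric's name, and corr_mean_abs_delta here is a hypothetical helper rather than VeriSynth's own code.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def corr_mean_abs_delta(real, synth):
    """Mean absolute gap between real and synthetic pairwise correlations."""
    pairs = list(combinations(sorted(real), 2))
    deltas = [
        abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
        for a, b in pairs
    ]
    return sum(deltas) / len(deltas)

# Columns copied from the two tables above (IDs left out).
real = {
    "age": [62, 45, 33, 58, 70],
    "bmi": [31.4, 28.6, 24.1, 29.9, 33.5],
    "smoker": [1, 0, 0, 1, 1],
    "hba1c": [7.8, 6.1, 5.3, 6.7, 8.2],
}
synth = {
    "age": [61, 47, 34, 59, 72],
    "bmi": [31.2, 28.3, 24.5, 30.1, 33.7],
    "smoker": [1, 0, 0, 1, 1],
    "hba1c": [7.7, 6.3, 5.2, 6.6, 8.1],
}

delta = corr_mean_abs_delta(real, synth)
print(f"mean absolute correlation delta: {delta:.3f}")
```

A value near zero means the synthetic table reproduces the real table's correlation structure. With only five rows the number is illustrative rather than statistically meaningful.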

Proof receipts auditors will love

• File integrity via SHA-256 and Merkle roots
• Fidelity & privacy metrics (KS tests, correlation delta)
• Deterministic seeds and reproducibility
• Optional differential privacy (DP) parameters and Ed25519 signatures

Attach proof.json – Ship synthetic data with a verifiable receipt.
Verify offline – Auditors recompute hashes without raw data.
Approve & share – If all metrics pass policy, it's verified.
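
For readers who want to see how an offline integrity check could work, here is a small sketch using only Python's hashlib. The receipt format is not specified on this page, so the details are assumptions: the sketch hashes each row as a leaf, pairs nodes by concatenation, and duplicates the last node on odd-sized levels. sha256_hex and merkle_root are hypothetical helper names.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of a whole file's bytes."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(chunks):
    """Merkle root over a list of byte chunks (assumed pairing convention)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

# Three synthetic rows from the example table, one chunk per row.
rows = [b"101,61,31.2,1,7.7", b"102,47,28.3,0,6.3", b"103,34,24.5,0,5.2"]
file_hash = sha256_hex(b"\n".join(rows))
root = merkle_root(rows)

# Changing a single value produces a different root.
tampered = [rows[0], b"102,47,28.3,1,6.3", rows[2]]
tampered_root = merkle_root(tampered)
```

An auditor holding only the synthetic file and the receipt can recompute both values and compare them, which is why the raw data never needs to be shared.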

proof.json (verified)

{
  "verisynth_version": "core-0.1.0",
  "license": "MIT",
  "metrics": { "corr_mean_abs_delta": 0.12, "naive_reid_risk": 0.01 },
  "input":  { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}
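
A policy check against a receipt like this could be as simple as the sketch below. The metric names and values come from the example above. The thresholds in POLICY and the passes_policy helper are hypothetical, chosen only to illustrate the "all metrics pass policy" step.

```python
import json

# Hypothetical upper limits; a real policy would come from your compliance team.
POLICY = {"corr_mean_abs_delta": 0.15, "naive_reid_risk": 0.05}

def passes_policy(receipt, policy):
    """True only if every policy metric is present and within its limit."""
    metrics = receipt.get("metrics", {})
    return all(
        name in metrics and metrics[name] <= limit
        for name, limit in policy.items()
    )

receipt = json.loads(
    '{"metrics": {"corr_mean_abs_delta": 0.12, "naive_reid_risk": 0.01}}'
)
approved = passes_policy(receipt, POLICY)
print("approved" if approved else "rejected")  # prints "approved"
```

A missing metric counts as a failure here, so a receipt cannot pass by simply leaving a number out.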

Join the VeriSynth community

Connect with developers, researchers, and data scientists building the future of privacy-safe synthetic data. Get support, share use cases, and contribute to open source.

GitHub

Star, fork, and contribute

X (Twitter)

Follow for updates

Open source • Community driven • Privacy first

Get notified for VeriSynth Cloud

Be the first to know when VeriSynth Cloud launches. Get early access to enterprise features, managed hosting, and advanced differential privacy capabilities.

We'll only email you about VeriSynth Cloud updates. No spam, unsubscribe anytime.
