Open Source · Free to Use (MIT)
VeriSynth turns sensitive tables into privacy-safe synthetic data for training AI models - with cryptographic proof receipts anyone can verify. Runs locally, no cloud required.
No signup · No GPUs needed · On-prem/VPC friendly
Generate realistic, privacy-safe datasets - not random noise. VeriSynth uses Gaussian Copula modeling to learn your dataset's statistical structure and correlations, producing synthetic records that look and act like the real thing.
VeriSynth Core is 100% free and open source under the MIT license. For teams that need enterprise compliance, managed hosting, or advanced differential privacy, VeriSynth Cloud is on the way.
Trusted foundations · Open governance · Enterprise ready
We learn statistical relationships, not identities - then prove it. VeriSynth models the distributions and correlations in your dataset, then generates as many synthetic records as you need that follow the same statistical patterns.
Upload data locally or via CLI/API. No data leaves your environment.
Gaussian Copula captures each column's distribution and correlation structure.
Sample any number of synthetic rows that preserve the learned distributions and correlations.
Ship a cryptographic proof receipt for auditability and trust.
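The modeling and sampling steps above can be illustrated with a minimal, standard-library-only Gaussian copula. This is a sketch of the general technique, not VeriSynth's actual implementation: the function names (`fit_copula`, `sample_copula`), the `shrink` parameter, and the toy data are all hypothetical, and it handles numeric columns only.

```python
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal for the copula's latent space


def normal_scores(column):
    """Map values to standard-normal scores through their empirical ranks."""
    ordered = sorted(column)
    n = len(ordered)
    # rank / (n + 1) keeps every probability strictly inside (0, 1)
    return [STD_NORMAL.inv_cdf(bisect_right(ordered, v) / (n + 1)) for v in column]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cholesky(matrix):
    """Return lower-triangular L with L * L^T equal to a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - partial) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def fit_copula(rows, columns, shrink=1e-3):
    """Learn per-column quantiles plus the correlation of the normal scores."""
    data = [[float(row[c]) for row in rows] for c in columns]
    scores = [normal_scores(col) for col in data]
    k = len(columns)
    # Shrinking off-diagonals slightly keeps the matrix positive definite
    # even when two columns are perfectly rank-correlated (an assumption of
    # this sketch, not a documented VeriSynth behavior).
    corr = [[1.0 if i == j else (1 - shrink) * pearson(scores[i], scores[j])
             for j in range(k)] for i in range(k)]
    return {"columns": list(columns),
            "quantiles": [sorted(col) for col in data],
            "cholesky": cholesky(corr)}


def sample_copula(model, n_rows, seed=None):
    """Draw correlated normals, then map each back through its column's quantiles."""
    rng = random.Random(seed)
    k = len(model["columns"])
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlate the independent draws: latent = L * z
        latent = [sum(model["cholesky"][i][j] * z[j] for j in range(i + 1))
                  for i in range(k)]
        row = {}
        for name, values, x in zip(model["columns"], model["quantiles"], latent):
            # Inverse empirical CDF with linear interpolation between order statistics
            pos = STD_NORMAL.cdf(x) * (len(values) - 1)
            lo = int(pos)
            hi = min(lo + 1, len(values) - 1)
            row[name] = values[lo] + (pos - lo) * (values[hi] - values[lo])
        rows.append(row)
    return rows


# Toy usage with made-up data: hba1c is built to rise with age.
rng = random.Random(0)
real = []
for _ in range(200):
    age = rng.gauss(50, 10)
    real.append({"age": age, "hba1c": 4.0 + 0.05 * age + rng.gauss(0, 0.3)})

model = fit_copula(real, ["age", "hba1c"])
synthetic = sample_copula(model, 1000, seed=42)
```

Note that this toy model stores the sorted real values of each column to invert the marginals, so the fitted model itself is still sensitive; a production engine would more likely fit parametric or smoothed marginals and handle categorical and boolean fields separately.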
VeriSynth adapts to your data - whether it's medical records, financial transactions, IoT telemetry, or customer analytics. It automatically detects schema types and learns statistical relationships across numeric, categorical, and boolean fields.
Generate privacy-safe patient data for model training and research sharing - without exposing any individual's information.
Create realistic transaction or credit datasets that preserve distributions, seasonality, and correlation between metrics.
Safely publish or analyze sensitive demographic or behavioral data - maintaining statistical integrity while ensuring anonymity.
VeriSynth can automatically infer column types and relationships, or you can define a custom schema - excluding sensitive fields, setting column constraints, or specifying correlation targets. The engine adapts to your data policy, not the other way around.
# Example config.yaml
exclude: ["patient_id", "address"]
types:
  age: int
  bmi: float
  smoker: bool
  hba1c: float
model:
  engine: GaussianCopula
  seed: 42
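As a rough illustration of what a config like this implies before modeling, the sketch below drops the excluded columns and casts the rest to the declared types. It assumes the config has already been parsed into a plain dictionary; `apply_schema`, `CASTERS`, and the fallback to `str` for undeclared columns are hypothetical choices for this example, not VeriSynth's documented behavior.

```python
# The example config, written as the dictionary a YAML parser would produce.
CONFIG = {
    "exclude": ["patient_id", "address"],
    "types": {"age": "int", "bmi": "float", "smoker": "bool", "hba1c": "float"},
    "model": {"engine": "GaussianCopula", "seed": 42},
}


def to_bool(value):
    """Interpret common truthy spellings such as 1, "1", "true", or "yes"."""
    return str(value).strip().lower() in {"1", "true", "yes", "y"}


# Hypothetical mapping from type names in the config to Python casters.
CASTERS = {"int": int, "float": float, "bool": to_bool, "str": str}


def apply_schema(rows, config):
    """Drop excluded columns and cast the remaining ones to their declared types."""
    excluded = set(config.get("exclude", []))
    types = config.get("types", {})
    cleaned = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if name in excluded:
                continue  # sensitive field: never reaches the model
            caster = CASTERS[types.get(name, "str")]  # undeclared columns stay strings
            out[name] = caster(value)
        cleaned.append(out)
    return cleaned


raw_rows = [
    {"patient_id": "001", "age": "62", "bmi": "31.4", "smoker": "1", "hba1c": "7.8"},
    {"patient_id": "002", "age": "45", "bmi": "28.6", "smoker": "0", "hba1c": "6.1"},
]
typed_rows = apply_schema(raw_rows, CONFIG)
```

Excluding identifiers before fitting, rather than filtering them from the output, means the model never sees those fields at all.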
A simple example showing how VeriSynth transforms a small real dataset into a synthetic one - preserving structure and patterns while protecting every individual's privacy.

Real data (input):

| patient_id | age | bmi | smoker | hba1c |
|---|---|---|---|---|
| 001 | 62 | 31.4 | 1 | 7.8 |
| 002 | 45 | 28.6 | 0 | 6.1 |
| 003 | 33 | 24.1 | 0 | 5.3 |
| 004 | 58 | 29.9 | 1 | 6.7 |
| 005 | 70 | 33.5 | 1 | 8.2 |

Synthetic data (output):

| patient_id* | age | bmi | smoker | hba1c |
|---|---|---|---|---|
| 101 | 61 | 31.2 | 1 | 7.7 |
| 102 | 47 | 28.3 | 0 | 6.3 |
| 103 | 34 | 24.5 | 0 | 5.2 |
| 104 | 59 | 30.1 | 1 | 6.6 |
| 105 | 72 | 33.7 | 1 | 8.1 |

*Synthetic IDs are newly generated; no record maps to a real individual.
Distributions closely match while records are newly generated. VeriSynth preserves correlations (e.g., age ↔ blood sugar, BMI ↔ smoker status), yielding data that behaves like the real world for analytics and ML - without exposing any individual.
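Each run also produces a proof receipt like the one that follows. The sketch below shows one way its hash fields could be computed and re-checked by a third party. The receipt's exact construction is not specified here, so everything in this block is an assumption for illustration: the helper names (`merkle_root`, `build_receipt`, `verify_output`), the choice of one Merkle leaf per output row, and the duplicate-last-node rule for odd levels. It covers only the hash fields, not the metrics.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves):
    """Hex Merkle root over byte strings; an odd level duplicates its last node."""
    if not leaves:
        return sha256_hex(b"")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def build_receipt(input_csv: bytes, output_csv: bytes) -> dict:
    """Assemble the hash-related receipt fields from raw CSV bytes."""
    in_rows = input_csv.splitlines()[1:]    # drop the header line
    out_rows = output_csv.splitlines()[1:]
    return {
        "input": {"rows": len(in_rows), "sha256": sha256_hex(input_csv)},
        "output": {"rows": len(out_rows), "sha256": sha256_hex(output_csv)},
        "proof": "merkle_root: " + merkle_root(out_rows),
    }


def verify_output(receipt: dict, output_csv: bytes) -> bool:
    """Recompute the output fields and compare them with the receipt."""
    expected = build_receipt(b"", output_csv)
    return (receipt["output"] == expected["output"]
            and receipt["proof"] == expected["proof"])


# Toy usage with made-up CSV contents.
real_csv = b"age,bmi\n62,31.4\n45,28.6\n"
synth_csv = b"age,bmi\n61,31.2\n47,28.3\n34,24.5\n"
receipt = build_receipt(real_csv, synth_csv)
```

Under these assumptions, anyone holding the synthetic file can call `verify_output(receipt, synth_csv)` to confirm it matches the receipt, while the input digest ties the run to a specific source file without revealing its contents.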
{
  "verisynth_version": "core-0.1.0",
  "license": "MIT",
  "metrics": { "corr_mean_abs_delta": 0.12, "naive_reid_risk": 0.01 },
  "input": { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}

Connect with developers, researchers, and data scientists building the future of privacy-safe synthetic data. Get support, share use cases, and contribute to open source.
Open source • Community driven • Privacy first
Be the first to know when VeriSynth Cloud launches. Get early access to enterprise features, managed hosting, and advanced differential privacy capabilities.