II. Workflow overview
Reference · live · workflow:synthetic-data-generation-pipeline
Synthetic Data Generation Pipeline overview
Designs and executes pipelines that produce synthetic training, evaluation, or test datasets. The workflow covers defining schema constraints and statistical distributions, configuring generative models or rule-based generators, validating that synthetic outputs match real-data characteristics without leaking PII, running bias and fairness audits, and versioning artifacts in a data registry. It produces validated synthetic datasets and a data card. Model training itself is out of scope.
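As a minimal sketch of the generation and validation steps described above, the following uses only the Python standard library. Everything named here (`NumericField`, `generate_rows`, `matches_reference`, `leaked_values`, `EMAIL_RE`) is hypothetical and not part of this workflow's actual tooling. The mean-tolerance check and the email regex are simple stand-ins for whatever statistical and PII validation a real pipeline would use.

```python
import random
import re
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericField:
    """Hypothetical schema entry: a bounded, normally distributed column."""
    name: str
    mean: float
    stdev: float
    low: float
    high: float


def generate_rows(fields, n, seed=0):
    """Rule-based generator: sample each field, then clamp it to its bounds."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    rows = []
    for _ in range(n):
        row = {}
        for f in fields:
            value = rng.gauss(f.mean, f.stdev)
            row[f.name] = min(max(value, f.low), f.high)
        rows.append(row)
    return rows


def matches_reference(rows, field, real_values, tolerance=0.1):
    """True if the synthetic mean is within a relative tolerance of the real mean."""
    synthetic_mean = statistics.fmean(r[field] for r in rows)
    real_mean = statistics.fmean(real_values)
    return abs(synthetic_mean - real_mean) <= tolerance * abs(real_mean)


# Deliberately loose pattern; an assumption, not a complete PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def leaked_values(synthetic_strings, real_strings):
    """Flag synthetic strings that copy a real value verbatim or look like an email."""
    real = set(real_strings)
    return [s for s in synthetic_strings if s in real or EMAIL_RE.search(s)]
```

For example, `generate_rows([NumericField("age", 40, 10, 18, 90)], 1000)` yields rows whose `age` stays within 18 to 90, and `leaked_values` can be run over any free-text columns before the dataset is published.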
Attributes
displayName: Synthetic Data Generation Pipeline
workflowKind: development
triggerType: on-demand
typicalCadence: per-milestone
complexity: cross-team
Outgoing edges
applies_to_domain (2)
- domain:data-science · Domain · Data Science
- domain:ml-ops · Domain · MLOps
involves_role (3)
- role:data-scientist · Role · Data Scientist
- role:ml-engineer · Role · Machine Learning Engineer
- role:data-engineer · Role · Data Engineer
performed_by_org_unit (2)
- org-unit:ml-platform-team · OrgUnit · ML Platform Team
- org-unit:data-platform-team · OrgUnit · Data Platform Team
requires_skill_area (2)
- skill-area:python-implementation · SkillArea · Python Function Implementation
- skill-area:data-quality · SkillArea · Data Quality
triggers_responsibility (2)
- responsibility:data-quality-monitoring · Responsibility · Data quality monitoring
- responsibility:ai-safety-guardrails · Responsibility
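The data quality monitoring and AI safety guardrail responsibilities listed above correspond to the bias audit and data-card outputs of this workflow. A hedged sketch of how those two steps might fit together follows. The function names, the `max_gap` threshold, and the data-card fields are all assumptions for illustration. A real audit would likely use richer fairness metrics than group representation shares.

```python
import hashlib
import json
from collections import Counter


def group_shares(rows, attribute):
    """Fraction of rows falling into each value of a categorical attribute."""
    counts = Counter(r[attribute] for r in rows)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def representation_gaps(rows, attribute, target_shares, max_gap=0.05):
    """Groups whose observed share differs from the target by more than max_gap."""
    shares = group_shares(rows, attribute)
    return {
        group: round(shares.get(group, 0.0) - target, 4)
        for group, target in target_shares.items()
        if abs(shares.get(group, 0.0) - target) > max_gap
    }


def build_data_card(name, rows, audit_findings):
    """Assemble a small data card; the content hash can serve as a registry version key."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "name": name,
        "row_count": len(rows),
        "content_sha256": hashlib.sha256(payload).hexdigest(),
        "bias_audit": audit_findings,
    }
```

Hashing the serialized rows gives a version identifier that changes whenever the dataset content changes, which is one simple way to key artifacts in a data registry.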
Incoming edges
follows_workflow (1)
- stack-profile:synthetic-data-generation · StackProfile · Synthetic Data Generation Stack (Python, PyTorch, FastAPI, PostgreSQL, S3)