II. LibraryProcess overview
Reference · livelib-process:data-science-ml--data-collection-validation
data-collection-validation overview
Orchestrate data ingestion from multiple sources with validation, quality checks, and versioning
Attributes
displayName
data-collection-validation
description
Orchestrate data ingestion from multiple sources with validation, quality checks, and versioning
libraryPath
library/specializations/data-science-ml/data-collection-validation.js
specialization
data-science-ml
references
- Great Expectations: https://greatexpectations.io/
- DVC (Data Version Control): https://dvc.org/
- MLOps Principles: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
example
const result = await orchestrate('data-science-ml/data-collection-validation', {
  dataSources: [
    { type: 'csv', path: 'data/raw/customers.csv', name: 'customers' },
    { type: 'database', connection: 'postgres://...', query: 'SELECT * FROM orders', name: 'orders' }
  ],
  targetQuality: 85,
  schemaPath: 'schemas/data_schema.json',
  validationRules: ['no_missing_primary_keys', 'valid_email_format', 'positive_amounts'],
  versioningEnabled: true
});
usesAgents
- general-purpose
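The example passes `validationRules` and `targetQuality: 85` but this page does not say how the process evaluates them. The sketch below shows one plausible reading, assuming the quality score is the percentage of rows that pass every named rule. The `rules` table, the `scoreQuality` helper, and the row fields `id`, `email`, and `amount` are all hypothetical and are not part of the library.

```javascript
// Hypothetical rule implementations. The names mirror the validationRules
// in the example above; the logic behind each name is an assumption.
const rules = {
  no_missing_primary_keys: (row) =>
    row.id !== null && row.id !== undefined && row.id !== '',
  valid_email_format: (row) =>
    typeof row.email === 'string' && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(row.email),
  positive_amounts: (row) =>
    typeof row.amount === 'number' && row.amount > 0,
};

// Assumed scoring: percentage of rows that satisfy every requested rule,
// compared against targetQuality (0 to 100).
function scoreQuality(rows, ruleNames, targetQuality) {
  const unknown = ruleNames.filter(
    (name) => !Object.prototype.hasOwnProperty.call(rules, name)
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown validation rules: ${unknown.join(', ')}`);
  }

  // Count failures per rule so a report can show which check is failing.
  const failures = Object.fromEntries(ruleNames.map((name) => [name, 0]));
  let passingRows = 0;

  for (const row of rows) {
    let rowOk = true;
    for (const name of ruleNames) {
      if (!rules[name](row)) {
        failures[name] += 1;
        rowOk = false;
      }
    }
    if (rowOk) passingRows += 1;
  }

  // An empty dataset is scored 0 here; that choice is also an assumption.
  const score = rows.length === 0 ? 0 : (passingRows / rows.length) * 100;
  return { score, passed: score >= targetQuality, failures };
}
```

Calling `scoreQuality(rows, ['no_missing_primary_keys', 'valid_email_format', 'positive_amounts'], 85)` returns the score, a pass/fail flag against the target, and a per-rule failure count. Counting failures per rule rather than stopping at the first failure keeps the report useful when several checks fail on the same row.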
Outgoing edges
lib_applies_to_domain (1)
- domain:data-science · Domain · Data Science
lib_belongs_to_specialization (1)
- specialization:data-science-ml · Specialization
lib_implements_workflow (1)
- workflow:ml-model-lifecycle · Workflow · ML Model Lifecycle
Incoming edges
None.