LibraryProcess overview
Reference · livelib-process:data-science-ml--ml-observability
ml-observability overview
ML System Observability and Incident Response: comprehensive monitoring, anomaly detection, incident triage, root cause analysis, and automated remediation for ML systems in production.
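The summary lists anomaly detection but does not name a method. One common illustration is a rolling z-score detector over a metric series such as per-minute p95 latency. The function name `detectAnomalies`, the window size of 5, and the cutoff of 3 are hypothetical choices. They are not taken from this process.

```javascript
// Hypothetical helper (not part of the ml-observability process): flags points whose
// z-score against the preceding window exceeds a cutoff.
// windowSize and zCutoff are illustrative assumptions.
function detectAnomalies(series, windowSize = 5, zCutoff = 3) {
  const anomalies = [];
  for (let i = windowSize; i < series.length; i++) {
    const recent = series.slice(i - windowSize, i);
    const mean = recent.reduce((sum, x) => sum + x, 0) / windowSize;
    // Population variance of the trailing window.
    const variance = recent.reduce((sum, x) => sum + (x - mean) ** 2, 0) / windowSize;
    const std = Math.sqrt(variance);
    // Skip flat windows: a zero standard deviation leaves the z-score undefined.
    if (std === 0) continue;
    const z = (series[i] - mean) / std;
    if (Math.abs(z) > zCutoff) {
      anomalies.push({ index: i, value: series[i], zScore: z });
    }
  }
  return anomalies;
}

// Illustrative p95 latency samples in milliseconds, with one spike.
const latencyP95 = [210, 205, 215, 208, 212, 209, 211, 640, 214, 207];
console.log(detectAnomalies(latencyP95).map((a) => a.index)); // logs [ 7 ]
```

The spike becomes part of the trailing window after it is seen. This inflates the standard deviation for the next few points, so a single outlier does not trigger repeated alerts. A production detector would likely also need seasonality handling, which this sketch omits.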
Attributes
displayName
ml-observability
description
ML System Observability and Incident Response - Comprehensive monitoring, anomaly detection, incident triage, root cause analysis, and automated remediation for ML systems in production.
libraryPath
library/specializations/data-science-ml/ml-observability.js
specialization
data-science-ml
references
- Google SRE Book: https://sre.google/sre-book/table-of-contents/
- ML Observability Best Practices: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#mlops_level_2_cicd_pipeline_automation
- Incident Response: https://response.pagerduty.com/
- OpenTelemetry for ML: https://opentelemetry.io/
- Evidently AI Monitoring: https://www.evidentlyai.com/
example
const result = await orchestrate('specializations/data-science-ml/ml-observability', {
  modelId: 'recommendation-engine-v3',
  environment: 'production',
  incidentType: 'performance-degradation', // or 'data-drift', 'prediction-anomaly', 'system-failure'
  alertThresholds: {
    latencyP95Ms: 500,
    errorRatePercent: 2.0,
    predictionDriftScore: 0.15,
    dataDriftScore: 0.20
  },
  enableAutoRemediation: true
});
usesAgents
- general-purpose
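The example passes `alertThresholds` and names four `incidentType` values. However, the entry does not say how observed metrics are checked against those thresholds. Here is a minimal sketch with two assumptions. First, each threshold is an upper bound on a metric with the same key. Second, each metric maps to exactly one incident type. Both assumptions are hypothetical, as are `evaluateThresholds` and `INCIDENT_TYPE_BY_METRIC`.

```javascript
// Hypothetical mapping from threshold key to one of the incidentType values listed in
// the example. The pairing (e.g. error rate -> 'system-failure') is an assumption.
const INCIDENT_TYPE_BY_METRIC = {
  latencyP95Ms: 'performance-degradation',
  errorRatePercent: 'system-failure',
  predictionDriftScore: 'prediction-anomaly',
  dataDriftScore: 'data-drift',
};

// Hypothetical helper: returns every metric that is strictly above its threshold.
function evaluateThresholds(metrics, thresholds) {
  const breaches = [];
  for (const [name, limit] of Object.entries(thresholds)) {
    const observed = metrics[name];
    // Metrics missing from the snapshot are skipped rather than treated as breaches.
    if (observed === undefined) continue;
    if (observed > limit) {
      breaches.push({
        metric: name,
        observed,
        limit,
        incidentType: INCIDENT_TYPE_BY_METRIC[name],
      });
    }
  }
  return breaches;
}

// Thresholds copied from the example; the snapshot values are made up for illustration.
const alertThresholds = {
  latencyP95Ms: 500,
  errorRatePercent: 2.0,
  predictionDriftScore: 0.15,
  dataDriftScore: 0.20,
};
const snapshot = {
  latencyP95Ms: 730,
  errorRatePercent: 0.8,
  predictionDriftScore: 0.05,
  dataDriftScore: 0.31,
};

const breaches = evaluateThresholds(snapshot, alertThresholds);
console.log(breaches.map((b) => b.incidentType)); // logs [ 'performance-degradation', 'data-drift' ]
```

The comparison is a strict `>`, so a metric sitting exactly at its threshold is not treated as a breach. The entry does not say whether the actual process uses `>` or `>=`. The entry also does not describe how several simultaneous breaches would be triaged or which of them `enableAutoRemediation` would act on.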
Outgoing edges
lib_applies_to_domain (1)
- domain:data-science · Domain: Data Science
lib_belongs_to_specialization (1)
- specialization:data-science-ml · Specialization
lib_implements_workflow (1)
- workflow:data-pipeline-deployment · Workflow: Data Pipeline Deployment
lib_involves_role (1)
- role:data-scientist · Role: Data Scientist
Incoming edges
None.