II. StackProfile overview
Reference · livestack-profile:chaos-engineering
Chaos Engineering (Kubernetes, Prometheus, Grafana, Go, OpenTelemetry) overview
A chaos engineering practice stack that injects controlled failures into Kubernetes-based production and staging environments to validate system resilience. Experiments target pod termination, network partitions, CPU stress, and DNS failures. Prometheus and Grafana monitor system behavior during experiments, while OpenTelemetry traces reveal cascading failure paths. Custom Go tooling orchestrates experiment schedules and automatically aborts an experiment when safety conditions are breached. Targeted at SRE teams building confidence in system fault tolerance before peak traffic events. The tradeoff is the risk of an uncontrolled blast radius and the cultural challenge of convincing stakeholders to break things intentionally.
Attributes
displayName
Chaos Engineering (Kubernetes, Prometheus, Grafana, Go, OpenTelemetry)
description
A chaos engineering practice stack that injects controlled failures into
Kubernetes-based production and staging environments to validate system
resilience. Experiments target pod termination, network partitions, CPU
stress, and DNS failures. Prometheus and Grafana monitor system behavior
during experiments, while OpenTelemetry traces reveal cascading failure
paths. Custom Go tooling orchestrates experiment schedules and
automatically aborts an experiment when safety conditions are breached.
Targeted at SRE teams building confidence in system fault tolerance
before peak traffic events. The tradeoff is the risk of an uncontrolled
blast radius and the cultural challenge of convincing stakeholders to
break things intentionally.
composes
Outgoing edges
applies_to (2)
- domain:platform-engineering · Domain · Platform Engineering
- domain:observability · Domain · Observability
composed_of (8)
- tool:kubernetes · Tool · Kubernetes
- tool:prometheus · Tool · Prometheus
- tool:grafana · Tool · Grafana
- language:go · Language · Go
- tool:opentelemetry · Tool · OpenTelemetry
- tool:jaeger · Tool · Jaeger
- language:yaml · Language · YAML
- tool:helm · Tool · Helm
follows_workflow (2)
- workflow:chaos-game-day · Workflow · Chaos Game Day
- workflow:slo-burn-rate-review · Workflow · SLO Burn Rate Review
requires_skill_area (5)
- skill-area:chaos-engineering · SkillArea · Chaos Engineering
- skill-area:sli-slo-management · SkillArea · SLI / SLO Management
- skill-area:incident-response · SkillArea · Incident Response
- skill-area:observability-instrumentation · SkillArea · Observability Instrumentation
- skill-area:distributed-tracing · SkillArea · Distributed Tracing
used_by_role (3)
- role:sre · Role
- role:platform-engineer · Role
- role:backend-engineer · Role · Backend Engineer
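The SLO Burn Rate Review workflow listed among the outgoing edges suggests one way to express experiment safety conditions. As a sketch, burn rate is the observed error ratio divided by the error ratio the SLO allows (1 minus the target); a burn rate of 1 consumes the error budget exactly over the SLO window. The two-window `ShouldHalt` rule and its threshold parameter are illustrative assumptions, not a prescribed policy for this stack.

```go
package main

import "errors"

// BurnRate returns how fast the error budget is being consumed: the observed
// error ratio (bad / total) divided by the budgeted ratio (1 - sloTarget).
func BurnRate(bad, total, sloTarget float64) (float64, error) {
	if total <= 0 {
		return 0, errors.New("no events in window")
	}
	if sloTarget <= 0 || sloTarget >= 1 {
		return 0, errors.New("sloTarget must be in (0, 1)")
	}
	return (bad / total) / (1 - sloTarget), nil
}

// ShouldHalt is a hypothetical multi-window rule: halt only when both the
// long and the short window burn faster than threshold, which filters out
// brief spikes while still reacting quickly to a sustained regression.
func ShouldHalt(longBurn, shortBurn, threshold float64) bool {
	return longBurn > threshold && shortBurn > threshold
}
```

For example, 20 failed requests out of 10,000 against a 99.9% target gives a burn rate of about 2, meaning the budget would be exhausted in half the SLO window.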
Incoming edges
None.