CNCJ - May, 2020
[Kubernetes] Embracing Failure: Lessons learned from 50 post-mortems
Summary
In this session, Moshe shared anecdotal lessons learned from deploying Kubernetes:
- ⛔ Some of the most common failure modes of Kubernetes clusters
- ⭐ Best practices you can apply to mitigate these failure modes
- 👨‍💻 A live demo simulating failure and then applying policies to prevent it
Session Recording
Slides
Google Docs: Embracing failure - Lessons learned from 50 post-mortems
References
- PDF: How Complex Systems Fail
- Kubernetes Failure Stories
- Open Policy Agent
- Node Local DNS
- Resilience Engineering Papers
- Blog: Kitchen Soap
- GitHub: Cluster API SIG
- GitHub: Bloomberg Goldpinger
- GitHub: T-Mobile MagTape
- YouTube: Resilience in Complex Adaptive Systems
- YouTube: Resilience Engineering Playlist
- YouTube: The Gotchas of Zero-Downtime Traffic /w Kubernetes