CNCJ - May 2020
[Kubernetes] Embracing Failure: Lessons learned from 50 post-mortems
Summary
In this session, Moshe covered anecdotal lessons learned from deploying Kubernetes:
- ⛔ Some of the most common failure modes of Kubernetes clusters
- ⭐ Best practices you can apply to mitigate these failure modes
- 👨‍💻 A live demo simulating a failure and then applying policies to prevent it
 
Session Recording
Slides
Google Docs: Embracing failure - Lessons learned from 50 post-mortems
References
- PDF: How Complex Systems Fail
- Kubernetes Failure Stories
- Open Policy Agent
- Node Local DNS
- Resilience Engineering Papers
- Blog: Kitchen Soap
- GitHub: Cluster API SIG
- GitHub: Bloomberg Goldpinger
- GitHub: T-Mobile MagTape
- YouTube: Resilience in Complex Adaptive Systems
- YouTube: Resilience Engineering Playlist
- YouTube: The Gotchas of Zero-Downtime Traffic /w Kubernetes