When a TTL Quietly Deleted Live Sessions: A Policy-Control RCA
Root-caused why add-on policy changes silently failed for long-lived broadband sessions: a time-to-live on a secondary index that was stamped once and never renewed.
Case studies
Led production incident restoration and RCA delivery under a platinum support process for one telecom operator, while running hypercare stabilization and a client-facing security disclosure for another.
Built engineering governance for a live two-squad engagement at an energy retailer, served as a high-volume PR quality gate, and led the RCA for a six-figure payments incident, without slowing delivery.
Co-architected a server-driven UI platform for a national airline's mobile app (2.3M MAU), moving screen composition from app-store releases to backend definitions. The Android modernization underneath cut feature delivery from about a month to about a week.
Scaled a high-growth social-commerce platform to #3 most-downloaded in the Philippines in 2021, with 30M+ users at peak. An AWS case study showed about +600% growth at only about +50% monthly cost, while chat grew from 100K to 1M concurrent.