The hosting provider for my client’s application recently made a change to their security setup. Since part of their infrastructure would only ever be accessed internally, they switched to using self-signed security certificates.
Our application was designed to verify certificates to mitigate potential security risks, so it rejected the new self-signed certificates and an outage followed. The fix was a simple configuration change: stop verifying the certificate. But the bigger question is: what could have been done to prevent the outage?
In this case, my answer is: nothing.
We knew the platform was going to enable secure connections, and we were prepared for that change. We had security alerts and proper error logging in place, and we were fully ready for the transition. The only thing we didn’t have was someone on pager duty.
The application is an important internal tool, but when you weigh the cost of having someone on-call 24/7 against the low likelihood of a provider-caused outage, it just wasn’t a good return on investment in this case.
During an outage review, it’s essential not just to evaluate the cost of the outage, but also to consider the cost of preventing similar incidents. Sometimes, prevention isn’t worth the expense.
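To make that comparison concrete, here is a minimal sketch in Python of the kind of back-of-the-envelope check an outage review could run. Everything in it is an assumption for illustration: the `OutageScenario` fields, the function names, and especially the numbers, which are made up and are not figures from this incident.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """Rough estimates for one class of outage (all values are guesses)."""
    outages_per_year: float   # how often we expect this kind of outage
    hours_per_outage: float   # how long it lasts with nobody on call
    cost_per_hour: float      # business cost of one hour of downtime


def expected_annual_outage_cost(scenario: OutageScenario) -> float:
    """Expected yearly cost of simply living with the outage."""
    return (
        scenario.outages_per_year
        * scenario.hours_per_outage
        * scenario.cost_per_hour
    )


def prevention_is_worth_it(
    scenario: OutageScenario,
    annual_prevention_cost: float,
    hours_per_outage_with_prevention: float = 0.0,
) -> bool:
    """True when the downtime cost avoided exceeds what prevention costs."""
    hours_saved = scenario.hours_per_outage - hours_per_outage_with_prevention
    avoided_cost = (
        scenario.outages_per_year * hours_saved * scenario.cost_per_hour
    )
    return avoided_cost > annual_prevention_cost


# Hypothetical numbers: one provider-caused outage every two years,
# eight hours long, costing 500 per hour of downtime.
scenario = OutageScenario(
    outages_per_year=0.5, hours_per_outage=8, cost_per_hour=500
)

# Hypothetical yearly cost of a 24/7 on-call rotation that would cut
# each outage down to about one hour.
on_call_cost = 30_000

print(expected_annual_outage_cost(scenario))
print(prevention_is_worth_it(scenario, on_call_cost, 1.0))
```

With these made-up inputs the on-call rotation costs far more than the downtime it would avoid, which is the shape of the trade-off described above. The estimates are the hard part; the arithmetic only makes the trade-off explicit.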