Appraisd unscheduled downtime, 18th August 2021

Appraisd was unavailable for approximately 3 hours. Here is our write-up.

Analysis and impact

At approximately 18:00 BST on 18 August 2021, the main Appraisd application at app.appraisd.com, the website at www.appraisd.com and the API at api.appraisd.com became unavailable to users. The Appraisd information security team, customer success team and the CEO were alerted by Pingdom, the third-party company we use to monitor Appraisd. Alerts were received as expected via Slack and text message.

Service was restored approximately 3 hours later.

Root cause

On receiving the alert from Pingdom, the information security team immediately took responsibility for handling the incident. Within a few minutes we had identified the cause: our Microsoft Azure services had been suspended due to a small outstanding amount on an invoice. That invoice was the subject of an open support ticket that was still being investigated at the time.

We had not allocated enough time and resources to get to the bottom of the invoice issue ourselves, assuming instead that Microsoft and our account manager there would do so on our behalf.

Resolution

We immediately made an additional payment to Microsoft to cover any outstanding amounts, notified our account manager and updated the support ticket. We also raised a second support ticket in the hope that it would help resolve the matter.

Microsoft recognised the new credit and resumed services.

In line with our ISO 27001 management system and incident response plan, we conducted a review to determine lessons learned and the resulting follow-up actions.

Lessons learned / comment

We had made some incorrect assumptions. In particular, we assumed that having an open support ticket would ensure services would not be interrupted.

As a result, and to avoid a repeat, we are now giving more suitable personnel access to the billing areas of our Microsoft (and other) services, and setting up monthly checks, so that there is enough resource to cover problems like this.

To our customers we'd like to apologise: we want Appraisd to be a system you can rely on completely, and this incident has let many of you down.

Prior to this incident, our downtime for the previous 12 months totalled less than 1 hour, with 99.98% uptime, which is generally considered a very good track record. So this outage is very frustrating for you, and embarrassing for us.
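For readers who want to check the arithmetic behind uptime figures like these, here is a minimal sketch. It assumes a 365-day measurement window, and the function names are ours for illustration only; it is not how Pingdom calculates its reports.

```python
# Illustrative only: assumes the 12-month window is exactly 365 days.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def uptime_percent(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return availability over the period as a percentage."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1.0 - downtime_hours / period_hours)


def downtime_budget_hours(target_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime allowed by a given uptime percentage."""
    return period_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # One full hour of downtime in a year is still about 99.989% uptime.
    print(f"{uptime_percent(1.0):.3f}%")
    # A 99.98% figure allows roughly 1.75 hours of downtime per year.
    print(f"{downtime_budget_hours(99.98):.2f} hours")
```

Under that assumption, less than 1 hour of downtime in a year is consistent with the 99.98% figure quoted above.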

If you have any further questions, please contact support@appraisd.com and your enquiry will be dealt with by the information security team at Appraisd.