For clients interested in the outage we recently experienced, we wanted to break down the chain of events. On February 28th at 22:30 we began implementing a network upgrade. The upgrade had been planned and tested well in advance, and was intended to deliver enhanced security features for all clients.
At 22:42 we detected a network fault related to this upgrade: an unforeseen problem arose when changing the configuration of one of our core routing systems. This had the knock-on effect of making all hosts on one of our internal VLANs unavailable. Customer-facing servers affected included:
The nature of the fault required an on-site engineer to respond; our engineer arrived on site at 23:15 and began work.
The nature of the configuration error meant several diagnostic steps were required, including power cycling various parts of our core network. During this time, other servers on the network may have experienced intermittent availability as parts of the network were taken down.
At 01:58 on March 1st we restored connectivity to all hosting servers. Our site and related services, along with the Mercury e-mail system, remained unavailable until 02:48, when they were restored following several database rebuilds and service restarts.
The intended upgrade was aborted because of this fault, and we are now urgently reviewing our findings from the event. Once this review concludes, we will schedule the upgrade for a later date.