Sorry for the Rough Ride

Hi there. As you surely noticed, the site was down for the last couple of days. This was due to a chain of events that ended in a complete outage. The site is hosted on a dedicated server, and the data resides on a simple RAID-1 array. On Monday one of the drives in that array died, and so the site went down for the first time. Using the rescue system I was able to create a current backup (in addition to the ones that are created automatically) and ordered the failed drive to be replaced. That was done, but while the array was rebuilding, the second drive died from the stress.
With the backup at hand, I ordered the second drive to be replaced as well, deciding to forget about the data on the disks and rebuild the array from scratch. This didn’t go as smoothly as I had hoped: most of the following days was spent convincing the hoster that the new replacement drive couldn’t be bothered to join the array.
Part of the delay was of course down to me: I not only have to go to work but also do some actual work there, so I couldn’t be immediately available for follow-up questions from the hoster, which mostly consisted of “Are you really sure that we should …” and my simple answer, “Yes, I am” (it felt like being at an elaborate wedding at times).
Anyway, once the array was up, with the second drive finally replaced yet again and the OS installed, I could go about reinstalling the bits and pieces. Finally I was able to restore the data, which took a long, long time because I had to use my meager 2 Mbit/s uplink to transfer roughly 70 GB of data. That’s why it took so long to bring the site up again.
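To put a rough number on that upload, here is a back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and a link that stays fully saturated with no protocol overhead, which is optimistic; the function name is just for illustration.

```python
def transfer_hours(size_gb: float, link_mbit_s: float) -> float:
    """Ideal transfer time in hours for size_gb gigabytes over a link_mbit_s link."""
    bits = size_gb * 1e9 * 8                # gigabytes -> bits (decimal units assumed)
    seconds = bits / (link_mbit_s * 1e6)    # bits / (bits per second)
    return seconds / 3600


hours = transfer_hours(70, 2)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
# prints "77.8 hours, or about 3.2 days"
```

So even in the best case the restore alone eats more than three days of continuous uploading.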
I heard that some of you ran into trouble during the outage, mainly because some services seemingly depend on a response from the site. My apologies for that. I’ll look into those reports and try to make sure this stuff is removed, or at least changed so that it fails gracefully should another outage occur.
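As a sketch of what "failing gracefully" could look like (this is not the actual add-on code; the function name, the default value and the timeout are made up for illustration), a client can put a short timeout on the request and fall back to a default instead of hanging or crashing when the site is unreachable:

```python
import urllib.error
import urllib.request


def fetch_or_default(url: str, default: bytes = b"", timeout: float = 3.0) -> bytes:
    """Return the response body, or `default` if the site cannot be reached."""
    try:
        # The timeout keeps the caller from blocking indefinitely during an outage.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, HTTP error status:
        # carry on with the fallback instead of propagating the failure.
        return default
```

A caller would then treat the default as "no answer from the site" and simply skip whatever depended on the response.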
On the plus side, I took the chance to go through all the backup systems and refined some of them so that I can get the site up faster next time. Still, the major blocker here is my upload speed, and there’s nothing I can do about it in the near future.
All that said, I again apologize for any inconvenience and hope you still enjoy your ReadyNAS and my add-ons.

Comments

  1. Wietse says:

    Well, you’re back up and running without loss of data etc., and that’s the most important part. Great job!