Monday, February 22, 2010

Latest on Outage

Dear WestHost Client,
Thank you for your patience during this very difficult time. As you well know, WestHost has a history of delivering solid and reliable hosting platforms with top-notch support. We are not perfect by any means, and we recognize areas where we need to improve, but we aim to please and strive to deliver real value to you, our loyal customers.
At present, we have restored service to all but 12 shared and 6 dedicated servers. To our great concern, your account resides on one of the servers still affected by this outage. Our team of Systems Administrators and Engineers is continuing its work to restore the remaining servers and has been working non-stop for over 48 hours with little or no sleep. We continue to place the highest level of urgency on restoring these remaining servers. This is our number one priority at all levels within the organization.
Now for some details:
As a result of the fire suppression system error which was triggered at the Data Center during a routine system check by one of the fire system vendors, critical hardware components were damaged across dozens of our servers. Our first course of action was to replace those damaged components. We were able to repair or replace almost every failed hardware component during the first 36 hours through a combination of our on-hand hardware supplies and an expedited hardware shipment from our primary supplier.
Unfortunately, many of these hardware problems affected hard drives, causing hard drive failures. When this happens, data loss becomes an immediate focus. Retrieving lost data and restoring from backups can take up to 24 hours per server. This is where we currently are with a majority of our downed servers. Our backup process restores three servers simultaneously. We will get to each server as quickly as we can, but given the number of servers needing to be restored from backups, this places our ETA anywhere from 24 to 96 hours. In other words, it could very well be Friday of this week before all servers and accounts are back online. If ANYTHING changes with regard to this timing (good or bad), we will immediately inform you. We profusely apologize for the "stops and starts"; each time one issue was resolved, another problem surfaced, which resulted in adjustments to the published ETA.
In addition to the failed hard drives, we have also found other dedicated and shared servers that are experiencing file system problems. While accounts can generally function on these servers, file system problems can negatively impact the performance of the server. We are aware of these servers and are tackling these issues as they become evident.
What does this mean to you?
Due to the length of this restoration process, we are currently preparing new servers in our WestHost 4.0 environment for you to get your site back up sooner if needed. These new servers will be ready for clients to start using within the next few hours. As soon as they are ready, we will send you further details. This free solution will temporarily enable you to get your email and a basic site up and running so you can communicate with your customers. We recognize the impact this has had on your business and we plan to offer compensation via free month(s) of service for your loss. Once again, we are 100% committed to doing what we can to restore your business and confidence in our company going forward.
For those who might have a question about the Data Center facility, it is important to note that we are co-locating at this location, which is a Tier 4 data center managed by some of the best professionals in the industry. It has been fully operational throughout this entire process. Most of our systems are running perfectly fine, without any residual effects. Power, bandwidth, cooling, and other critical systems are working properly. This issue has purely been the result of a fire suppression error, and not the result of any mistake by WestHost personnel or systems.
We will continue to provide email updates every few hours. Again, thank you for your patience as we restore your services.
Jeff Hunsaker
UK2 Group, US Operations
WestHost Brand
