Server Support: Dealing with Downtime

TR NOTE: The article below sums up, briefly and concisely, everything I have been dealing with for years.

Maybe one day I will translate it into Turkish. The link to the original on pingzine is at the very bottom.

 

You get a call at 2 a.m. One of your servers, hosting over 1,000 shared accounts, has gone down. You rush to the office (thank God it’s close to home) and find your support staff frantically working on the server while trying to field calls and emails from irate customers. After several tense moments, the cause is found: the load is very high, causing services to fail. Your support staff suggests a reboot instead of diagnosing the reason for the high load. You say OK, go ahead, as long as the load comes back to normal and all services run normally. The reboot is done, and the team spends the rest of the night replying to customers. Later, you have no clue why the load went up the way it did, because there were no logs.

Downtime is serious. In this age of social networking on Twitter and Facebook, bad news travels fast. This kind of negative publicity can cost you reputation and customers within a single day. It’s no wonder hosts have to be on top of their business all day, every day.

Downtime is a reality in the hosting business. Do the math: here are some commonly advertised service availability figures.

  •   99.9% availability equates to 8 hours and 45 minutes of downtime per year.
  •  99.99% availability equates to around 52 minutes of downtime per year.
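The arithmetic behind these figures is simple enough to script. Here is a minimal sketch, assuming a 365-day year (8,760 hours), which is the basis the numbers above use:

```python
# Convert an advertised availability percentage into allowed downtime per year.
# Assumption: a 365-day (non-leap) year of 8760 hours.

HOURS_PER_YEAR = 365 * 24  # 8760 hours


def downtime_per_year(availability_pct: float) -> tuple[int, int]:
    """Return (hours, minutes) of allowed downtime per year, minutes rounded down."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_pct / 100)
    total_minutes = int(downtime_hours * 60)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours, minutes = downtime_per_year(pct)
        print(f"{pct}% -> {hours} h {minutes} min per year")
```

For 99.9% this gives 8 h 45 min, and for 99.99% it gives 0 h 52 min. Both match the figures above.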

Even a host that advertises 99.99% availability is allowing itself roughly 52 minutes of downtime a year. This downtime can result from scheduled events, unscheduled events, or both. In this article, we will look at ways to deal with both types.

Dealing with scheduled downtime

Scheduled downtime is a necessary part of server maintenance. A web host that maintains its servers regularly will reduce the incidence of security vulnerabilities, increase performance and improve the customer experience. A good host will have more scheduled downtime than unscheduled downtime.

The most important way of dealing with scheduled downtime is “Proactive Communication”: you let customers know about the downtime before they find out on their own. Sounds simple, doesn’t it? The sad fact is that many hosts do not do it well enough. So let’s see how it helps.

How does proactive communication help?

Proactive communication is a very useful method for customer retention during downtimes.

  •  Gives you time to tell your customers about the benefits they can expect from the changes to the system.
  •  Reduces customer confusion.
  •  Helps customers inform their own customers of the downtime.
  •  Reduces the flood of tickets during the downtime.
  •  Earns customers’ appreciation for letting them in on your plans.

How to set up proactive communication

Before firing off an email to all your customers, spend a few minutes deciding what you will tell them. A well-formatted, complete email will prevent a lot of confusion and ease the burden on your support team, especially while they are busy with the maintenance work. Here are some pointers.

What to tell your customers about scheduled downtime. Tell them…

  •  When the maintenance is scheduled (Exact date and time)
  •  How long the maintenance will last (down to the minute)
  •  What exactly will be disrupted (e.g. web, mail)
  •  Reasons for maintenance
  •  Benefits to the customer once the maintenance is done
  •  How to contact support staff during maintenance (via email, forum etc)
  •  Any alternative arrangements they can make.
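One way to make sure no item on this checklist gets forgotten is to keep the notice as a fill-in template. The sketch below is a hypothetical helper, not a standard tool. Each field of the `MaintenanceNotice` class maps to one bullet above, and the `timezone_label` field is an added assumption so the times are unambiguous:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MaintenanceNotice:
    # Hypothetical template: one field per item in the checklist above.
    start: datetime                 # exact date and time
    duration: timedelta             # how long it will last, down to the minute
    affected: list[str]             # what gets disrupted, e.g. ["web", "mail"]
    reason: str                     # why the maintenance is needed
    benefits: str                   # what customers gain afterwards
    support_contact: str            # how to reach support during the work
    alternatives: str = "None"      # alternative arrangements, if any
    timezone_label: str = "UTC"     # assumption: state the timezone explicitly

    def render(self) -> str:
        end = self.start + self.duration
        minutes = int(self.duration.total_seconds() // 60)
        lines = [
            f"When: {self.start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M} ({self.timezone_label})",
            f"Expected duration: {minutes} minutes",
            f"Affected services: {', '.join(self.affected)}",
            f"Why: {self.reason}",
            f"What you gain: {self.benefits}",
            f"Support during maintenance: {self.support_contact}",
            f"Alternative arrangements: {self.alternatives}",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    notice = MaintenanceNotice(
        start=datetime(2010, 3, 14, 2, 0),
        duration=timedelta(minutes=45),
        affected=["web", "mail"],
        reason="Kernel security update",
        benefits="Patched vulnerabilities and better performance",
        support_contact="support@example.com",
    )
    print(notice.render())
```

Printing both the start and end times with full dates avoids confusion when a maintenance window crosses midnight.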

When to tell them

  •  At least one week prior to the event.
  •  Again, 24 hours before the event
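Working out the send dates for these two notices is easy to automate. Here is a minimal sketch, assuming exactly the two lead times above (one week and 24 hours). `reminder_times` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Lead times from the guideline above: one week before, then 24 hours before.
REMINDER_LEAD_TIMES = (timedelta(weeks=1), timedelta(hours=24))


def reminder_times(maintenance_start: datetime, now: datetime) -> list[datetime]:
    """Return when each notice should go out, dropping any send time already past."""
    sends = [maintenance_start - lead for lead in REMINDER_LEAD_TIMES]
    return [t for t in sends if t >= now]


if __name__ == "__main__":
    start = datetime(2010, 3, 14, 2, 0)
    for t in reminder_times(start, now=datetime(2010, 3, 1)):
        print("Send notice at", t)
```

If the one-week send time has already passed, the maintenance was scheduled too late to meet the guideline. That may be a good reason to push the date back.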

How to tell them

  •  News section on website
  •  Email
  •  Social media (Twitter, Facebook)
  •  Forum or blog
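For the email channel, here is a minimal sketch using Python’s standard `email` and `smtplib` modules. The SMTP host, port and addresses are placeholders. The customer list goes in `Bcc` so customers do not see each other’s addresses: `send_message` uses the `Bcc` addresses as recipients but does not transmit that header. For thousands of customers, a proper mailing-list service is likely a better fit.

```python
import smtplib
from email.message import EmailMessage


def build_notice_email(sender: str, recipients: list[str], subject: str, body: str) -> EmailMessage:
    """Build a plain-text notice with customers hidden in Bcc."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sender                     # address it to yourself...
    msg["Bcc"] = ", ".join(recipients)     # ...and keep customer addresses private
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notice(msg: EmailMessage, smtp_host: str = "localhost", port: int = 25) -> None:
    # Placeholder host/port: point this at your own outgoing mail server.
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)  # Bcc recipients are used but the header is not sent


if __name__ == "__main__":
    notice = build_notice_email(
        "noc@example.com",
        ["customer1@example.com", "customer2@example.com"],
        "Scheduled maintenance on 2010-03-14",
        "Web and mail will be unavailable from 02:00 to 02:45 UTC.",
    )
    print(notice)  # inspect before calling send_notice(notice)
```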

Dealing with unscheduled downtime

Unscheduled downtime happens when something unexpected goes wrong. Causes include sudden increases in traffic, hacking attempts, vulnerabilities exploited in outdated software, DoS attacks, spam flooding the mail queues and the occasional hardware failure. No wonder it is a nightmare to deal with at 2 a.m.

So what can hosts do to keep a downtime from turning into a major business setback? Follow the two Ps:

  •  Prevent downtime
  •  Prepare for downtime

How to Prevent downtime

Wouldn’t you service your car periodically to prevent breakdowns and expensive repairs? In the same way, your server is the engine your hosting business runs on. An important way to prevent downtime is to maintain your server hardware and software periodically. This is called proactive server administration.

In proactive server administration, always start by securing the server with at least the following steps. Note that these should be performed by a trained professional.

  •  Make sure all software is up to date.
  •  Configure a firewall and restrict access to critical ports.
  •  Decide on the minimum set of services you need and secure them. Disable everything else.
  •  If you host shared accounts, check user security, for example weak passwords.
  •  Enable extended logging so that diagnosing problems during a disaster is easier.
  •  Secure world-writable directories.
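Finding world-writable directories is the first step toward securing them. Here is a minimal sketch using only the standard library. It reports each world-writable directory under a root path, along with whether the sticky bit is set. Sticky-bit directories such as `/tmp` are usually intentional, while the rest deserve a closer look. The default root of `/home` is an assumption for shared hosting layouts.

```python
import os
import stat
import sys


def find_world_writable_dirs(root: str):
    """Yield (path, has_sticky_bit) for each world-writable directory below root."""
    for dirpath, dirnames, _filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: symlinks are not treated as dirs
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISDIR(mode) and mode & stat.S_IWOTH:
                yield path, bool(mode & stat.S_ISVTX)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/home"
    for path, sticky in find_world_writable_dirs(root):
        note = "sticky bit set" if sticky else "NO sticky bit, review this"
        print(f"{path}: {note}")
```

A typical fix for a flagged directory is `chmod o-w` on it, after confirming nothing legitimately depends on the open permissions.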

Monitor the availability of your servers and of individual services. For example, if your server load frequently climbs high, set up notifications that warn you when the load crosses a threshold, long before it becomes dangerously high. This lets you prevent downtime simply by stepping in before the load creeps up and brings the server down.
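Here is a minimal sketch of such a check, meant to run from cron every minute or so. Several parts are assumptions. The thresholds are placeholders and should be tuned to your hardware, for example scaled to the number of CPU cores. The port list assumes standard web, SMTP and IMAP ports. `notify` is a hypothetical stub to wire up to email or SMS. `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import socket

# Hypothetical thresholds: tune them to your hardware (e.g. per CPU core count).
WARN_LOAD = 4.0
CRIT_LOAD = 8.0


def classify_load(load: float, warn: float = WARN_LOAD, crit: float = CRIT_LOAD) -> str:
    """Map a load average to OK / WARNING / CRITICAL."""
    if load >= crit:
        return "CRITICAL"
    if load >= warn:
        return "WARNING"
    return "OK"


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def notify(message: str) -> None:
    # Placeholder: connect this to email, SMS or chat in a real setup.
    print(message)


if __name__ == "__main__":
    _one_min, five_min, _fifteen_min = os.getloadavg()  # Unix only
    level = classify_load(five_min)
    if level != "OK":
        notify(f"{level}: 5-minute load average is {five_min:.2f}")
    for name, port in (("web", 80), ("smtp", 25), ("imap", 143)):
        if not service_is_up("127.0.0.1", port):
            notify(f"CRITICAL: {name} (port {port}) is not accepting connections")
```

Using the 5-minute average rather than the 1-minute figure avoids alerting on brief spikes, at the cost of reacting a little later.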

It is always useful to log detailed information for critical services and to set up notifications for certain events. This helps in debugging and in preventing future downtime. In the scenario at the beginning of this article, logs would have revealed why the load spiked, so the cause could have been fixed instead of simply rebooted away.
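Had the server in the opening scenario been recording periodic snapshots, the team would have had something to look at the next morning. Here is a minimal sketch using Python’s `logging` module with a `RotatingFileHandler`, so the log cannot fill the disk. It records the load average and, where available, the top CPU consumers. The `ps --sort` option is an assumption that holds for the procps version common on Linux, and the 5 MB × 5 files limit is arbitrary. Run it from cron every few minutes.

```python
import logging
import os
import subprocess
from logging.handlers import RotatingFileHandler


def make_snapshot_logger(path: str) -> logging.Logger:
    logger = logging.getLogger("server-snapshots")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers if called twice
        # Keep up to 5 backup files of ~5 MB each so logs never fill the disk.
        handler = RotatingFileHandler(path, maxBytes=5_000_000, backupCount=5)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_snapshot(logger: logging.Logger, top_n: int = 5) -> None:
    load1, load5, load15 = os.getloadavg()  # Unix only
    logger.info("load1=%.2f load5=%.2f load15=%.2f", load1, load5, load15)
    try:
        # Assumption: Linux procps `ps`, which supports --sort.
        out = subprocess.run(
            ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"],
            capture_output=True, text=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return  # ps unavailable; the load figures are still logged
    for line in out.splitlines()[1 : top_n + 1]:  # skip the header row
        logger.info("top: %s", line.strip())


if __name__ == "__main__":
    log_snapshot(make_snapshot_logger("/var/log/server-snapshots.log"))
```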

Keep track of exploits and service vulnerabilities. Sites like secunia.org and milw0rm.com offer newsletters and mailing lists you can subscribe to for first-hand information on new vulnerabilities. Take action before hackers do.

Also, conduct a server audit every month to check for suspicious logins, spamming, server performance problems and so on.
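Checking for suspicious logins can start with counting failed SSH password attempts per source IP. Here is a minimal sketch that assumes the standard OpenSSH log line format shown in the comments. The log path is also an assumption: Debian and Ubuntu use `/var/log/auth.log`, while RHEL and CentOS use `/var/log/secure`.

```python
import re
import sys
from collections import Counter

# Matches OpenSSH lines such as:
#   "Failed password for root from 203.0.113.5 port 22 ssh2"
#   "Failed password for invalid user admin from 203.0.113.5 port 22 ssh2"
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def failed_logins_by_ip(lines) -> Counter:
    """Count failed password attempts per source IP."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/auth.log"
    with open(path, errors="replace") as f:
        for ip, n in failed_logins_by_ip(f).most_common(10):
            print(f"{n:6d}  {ip}")
```

A handful of IPs with hundreds of failures is a typical sign of a brute-force attempt, and a good candidate for a firewall block.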

How to prepare for downtime

The first step to prepare for downtime is to visualize your reaction if an unscheduled downtime took place.

How would you contact your customers? Is your infrastructure ready to handle an emergency? For example, your helpdesk system, website, phone lines and email are critical systems that must stay available so you can engage with customers during downtime.

Some hosts wonder whether to tell customers about unscheduled downtime at all, especially when it will only last a few minutes. Should the host inform customers?

The answer is yes! The last thing a host wants is for customers to find out by themselves or, worse, from their own customers. By being responsible and letting customers know, you show that you are on top of your business. Customers appreciate hearing it from you rather than the other way around.

Always be prepared to send a lightning-fast response to customers who are experiencing downtime. Here are a few things you should prepare.

  1.  Speed of response. At a minimum, put up information on your website within minutes of the downtime.
  2.  Decide in advance where on the website this information will go and how you will contact your customers.
  3.  Many times you will need professional help to solve downtime issues. Build those relationships early on, so that help is available when you need it.
  4.  If you have an in-house team, make sure they are trained and ready to solve these issues when they happen.
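One way to get information onto the website within minutes is to have a status file ready that a status page simply displays. Here is a minimal sketch under several assumptions. The JSON layout, status values and `write_status` name are all hypothetical. It also assumes the status page is served from a different server than the ones that might go down. The file is written to a temporary name and swapped into place with `os.replace`, so readers never see a half-written file.

```python
import json
import os
from datetime import datetime, timezone


def write_status(path: str, status: str, message: str, affected: list[str]) -> dict:
    """Write a small status document that a status page could display."""
    doc = {
        "status": status,        # e.g. "investigating", "identified", "resolved"
        "message": message,
        "affected": affected,
        "updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=2)
    os.replace(tmp, path)  # atomic swap on the same filesystem
    return doc


if __name__ == "__main__":
    write_status(
        "status.json",
        "investigating",
        "Server web3 is unreachable. We are working on it and will update every 15 minutes.",
        ["web", "mail"],
    )
```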

Through prevention and careful preparation, you can keep downtime from hurting your business and your customers’ businesses.

 

TAKEN FROM: http://www.pingzine.com/server-support-dealing-with-downtime-2/