#4
Old 13th May 2009
jggimi
More noise than signal
Join Date: May 2008
Location: USA
Posts: 6,188

Some follow-up regarding your question about "live" system backups. The keys here are knowledge of the application(s) and an understanding of the impact of any type of data loss.

When you are backing up *filesystems* that can change during the backup, you risk losing data because of those changes -- additions, deletions, and updates. You must determine whether such loss is acceptable. It may be. For example, I back up /var on a running system. Even a fairly idle system will have content added to /var/log regularly. So I may miss some log information in today's backup, but those logs will still be there for tomorrow's. If I need to recover /var, I may lose /var/log information that was never captured in a backup. But even if I do a clean, high-integrity backup by stopping applications and dropping into single-user mode at 01:00, I will still lose data if I have a filesystem meltdown at 09:00 later that morning.

A number of applications allow for high-integrity backups of running systems; modern DBMSs are a good example. My backup of /var, for instance, begins by backing up a PostgreSQL DBMS into a flat file before running dump(8).
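The ordering described above can be sketched as a small wrapper script. This is only an illustration, not my actual script: it assumes pg_dumpall(1) is the export tool, that the flat file is written somewhere under /var so dump(8) picks it up, and that a level-0 dump is wanted. The function names, the flag choices, and the example paths are all hypothetical.

```python
import subprocess


def backup_commands(db_dump_file, dump_target, filesystem="/var"):
    """Return the two backup commands in the order they must run."""
    return [
        # 1. Export the databases to a flat SQL file while PostgreSQL
        #    keeps running (assumed tool: pg_dumpall).
        ["pg_dumpall", "-f", str(db_dump_file)],
        # 2. Level-0 dump of the filesystem, which now includes the flat
        #    file. -a: ignore tape length, -u: update /etc/dumpdates.
        ["dump", "-0", "-a", "-u", "-f", str(dump_target), filesystem],
    ]


def run_backup(db_dump_file, dump_target, filesystem="/var",
               runner=subprocess.run):
    """Run the export first, then the filesystem dump.

    check=True makes subprocess.run raise on a non-zero exit status, so
    dump(8) is never started if the database export failed.
    """
    for cmd in backup_commands(db_dump_file, dump_target, filesystem):
        runner(cmd, check=True)
```

Something like run_backup("/var/backups/pgsql.sql", "/dev/nrst0") would be the call, with both paths being placeholders. The point of the ordering is that the flat file is a consistent copy of the data, while the live database files that dump(8) also sweeps up may not be.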

RAID systems provide redundancy, so the failure of an individual storage device does not cause data loss, and they can keep applications available. But they cannot prevent data loss from other causes. I run RAID systems, yet I've had to do any number of recoveries unrelated to hardware failure. Application bugs can do harm to data. (I had one of those on Monday this week.) "Finger fumbles" are also a common reason for recovery. RAID can't help if the data on the array is wrong.

Data loss impact must be understood. I've had customers who, for some applications, would flush cache to disk and take "snapshots" of file systems hourly, because they could manually recover only an hour's worth of lost information. I've had other customers with applications that could not withstand any data loss; they required synchronous remote replication of their data, and the ability to seamlessly switch applications from one computing center to another in the event of any infrastructure problems. And I've had customers for whom "last week's backup" was good enough. The tradeoff is between availability and cost.
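To put rough numbers on that, here is a small sketch of the arithmetic. The worst case is that everything written since the last good backup is gone, so the backup interval has to fit inside whatever loss the application owner can tolerate. The function names and the date are made up for illustration.

```python
from datetime import datetime, timedelta


def data_lost(last_backup, failure):
    """Window of changes lost if the failure follows the last backup."""
    return failure - last_backup


def meets_tolerance(backup_interval, tolerable_loss):
    """True if the worst-case loss (one full interval) is tolerable."""
    return backup_interval <= tolerable_loss


# The 01:00 backup / 09:00 meltdown case: eight hours of changes are lost.
lost = data_lost(datetime(2009, 5, 13, 1, 0), datetime(2009, 5, 13, 9, 0))

# Hourly snapshots fit a one-hour tolerance; a weekly backup does not.
hourly_ok = meets_tolerance(timedelta(hours=1), timedelta(hours=1))
weekly_ok = meets_tolerance(timedelta(weeks=1), timedelta(hours=1))
```

A tolerance of zero leaves no usable backup interval at all, which is why those customers ended up with synchronous replication instead.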

I even had one customer whose offsite backup strategy -- for a multibillion (US) dollar development project -- was to have a Unix admin toss a set of tapes into the back of his personal vehicle and take them home on Fridays after work. A publicly held, very large US company, too. I'm glad I wasn't their auditor.