
Beyond backup toward storage management.


by Kaczmarski, M.; Jiang, T.; Pease, D.A.
IBM Systems Journal • July 2003

The proliferation of distributed computing and Internet usage, together with continually falling storage prices, greater disk capacities, and tremendous data growth, challenges storage administrators to provide nonintrusive backup and proper recovery of data. Enterprise computer system data protection is subject to operational demands that are driven by varying business requirements and continual advancements in storage technology. A number of factors lead to inherent complexity in the seemingly mundane task of recovering data.

All data are not the same. Information that supports important business processes may be distributed across multiple applications, databases, file systems, and hosts--intermixed with data that are easily recreated and clearly less important to the enterprise. Data elements that share the same file system or host containers have varying levels of importance, depending upon the applications they support or the rate at which they are changed. The management complexity in dealing with this environment often leads to inefficient backup practices because all data are treated at the level required for the most important elements. Differentiated data requirements need to be recognized and managed in an automated way to control network and media resources and the administrative expense involved in backup processing.

Disaster recovery can involve many dimensions, ranging from simple user errors that cause the loss of word-processing files or spreadsheets, to hard drive failures that impact entire file systems or databases, to tragic losses of buildings and assets that include large-scale information technology infrastructure and storage subsystems. Backup management is usually tuned to one of these possible disaster situations at the expense of efficient recovery should the other occur. Automated management of backup storage helps administrators move from operational monitoring to developing strategies and practices for handling each of these situations.

New storage devices and technology come and go, creating a struggle in migrating massive amounts of data from one device type to another while maintaining application availability. Failure to keep up with advances in storage technology can expose an enterprise to long-term support problems should its existing devices fail. These exposures directly affect the ability of the organization to provide proper data protection.

These factors provide the motivation to go beyond backup processing to a more comprehensive storage management paradigm--one that controls costs and automates common tasks by providing a means to map the underlying storage to the requirements of a business.

The IBM Workstation Data Save Facility (WDSF) was developed in the late 1980s at the IBM Almaden Research Center to meet customer requirements for distributed network backup. The product underwent significant redevelopment and became the ADSTAR Distributed Storage Manager (ADSM) in 1993. It was later renamed the Tivoli Storage Manager (TSM). The need for network backup emerged from distributed client/server computing with the proliferation of personal computers and workstations. The goal was to centralize the protection of distributed data in an environment where information assets were no longer restricted to controlled mainframe computer environments. Backing up individual computers to locally attached devices was, and still is, costly and error-prone and often did not meet requirements for disaster recovery. With TSM, clients can back up their data to central servers. The servers store the data on a variety of media and track the location of the data for later retrieval.

Tivoli Storage Manager is a client/server application that provides backup and recovery operations, archive and retrieve operations, hierarchical storage management (HSM), and disaster recovery planning across heterogeneous client hosts and centralized storage management servers. Support has been made available for over 15 client platforms, 7 server platforms, and over 400 different storage devices as illustrated in Figure 1. Specialized clients, represented as green cylinders in the figure, supply backup and restore or archive and retrieve support for specific applications such as DB2* (Database 2*), Lotus Domino*, Microsoft Exchange, and SAP R/3**. A client application programming interface (API) is also provided for those customers or business partners who wish to store data in, and retrieve data from, TSM directly. Data are transferred between the clients and the TSM server over the communications network or across a storage area network. A Web-based administrative interface and a coordinated distribution of shared management policy provide a common control point for multiple storage management server instances.

[FIGURE 1 OMITTED]

Advanced design points were established for TSM in an environment where many network backup applications evolved from simple single-host backup utilities. The primary influences were the need to deal with relatively slow network speeds, scalability in handling a large number of clients and platforms, and the desire to manage data with policy constructs borrowed from systems-managed storage (SMS) of mainframe computers. This paper describes these design points with a survey of functions and features that demonstrate storage management capabilities in TSM. Today, these capabilities provide management for active data as well as backup copies.

Minimizing network traffic: Progressive incremental backup

The rate of growth in the amount of data stored in computer systems has traditionally outpaced growth in network bandwidth. The use of traditional communication lines for backup processing suggests that indiscriminate backup loads can clog or disable communication networks. Control is needed to determine when backup processing takes place and to ensure that backup communications traffic is minimized when it does occur.

The goal behind the progressive incremental backup of TSM is that once backed up, unchanged data should never have to be resent (or rebacked up) to the server. Most methodologies for open systems have been developed to optimize data placement for tape media reuse and not to minimize data transfer and optimize scalability in a client/server environment.
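The idea of never resending unchanged data can be sketched roughly as follows. This is only an illustration, not TSM's actual change-detection logic: it assumes the server keeps an inventory of what it already holds and that a file counts as changed when its size or modification time differs. The names FileState and select_for_backup are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Attributes assumed (for this sketch) to identify a file version."""
    size: int
    mtime: float


def select_for_backup(client_files, server_inventory):
    """Return the sorted paths that are new or changed since the last backup.

    client_files and server_inventory both map path -> FileState.
    Files whose state matches the inventory are never resent.
    """
    return sorted(
        path
        for path, state in client_files.items()
        if server_inventory.get(path) != state
    )


# Hypothetical example: only the changed file and the new file are sent.
client = {
    "/home/a.txt": FileState(10, 1.0),
    "/home/b.txt": FileState(20, 5.0),  # modified since last backup
    "/home/c.txt": FileState(5, 2.0),   # not yet known to the server
}
inventory = {
    "/home/a.txt": FileState(10, 1.0),
    "/home/b.txt": FileState(20, 3.0),
}
to_send = select_for_backup(client, inventory)
```

In this sketch the unchanged file /home/a.txt is skipped on every later run, however many backups take place.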

The use of tape as a backup medium requires that periodic consolidation of data be performed. Tape differs from disk in that once a tape is initialized (or "labeled"), data can only be appended to the tape until it is full, after which time it must be reinitialized before it can be reused. Tape consolidation is required because files change at differing rates; subsequent backup operations that only copy changed files will store the new copies on additional tapes. On existing tapes this leaves logical "holes" occupied by file copies that are no longer current. Over time, these operations fragment the regions of useful data on the original tape volumes and spread backup data across many tapes, requiring more time and mount activity to perform a restore operation and requiring more media for storing backup data. Traditional incremental or differential backup methods achieve consolidation by periodically performing a new full backup of all of the data to a fresh tape. This method frees the original tapes so that they can be reused but has the side effect of resending all (unchanged) data across the network. This method not only wastes network bandwidth, processing cycles, and tapes, but it also leads to having to manage more data.
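The way superseded copies leave logical holes on append-only tapes can be illustrated with a small sketch. It assumes, for simplicity, that every file copy occupies one equal-sized slot and that only the most recent copy of each file is still needed; the function name and data layout are hypothetical and are not taken from TSM.

```python
def live_fraction(tapes):
    """Return, for each tape, the fraction of its slots holding current copies.

    tapes is a list of tapes in the order they were written; each tape is a
    list of file names in append order. A copy is current only if no later
    copy of the same file exists on any tape.
    """
    latest = {}
    for tape_index, tape in enumerate(tapes):
        for slot, name in enumerate(tape):
            latest[name] = (tape_index, slot)  # later copies overwrite earlier ones

    fractions = []
    for tape_index, tape in enumerate(tapes):
        live = sum(
            1 for slot, name in enumerate(tape)
            if latest[name] == (tape_index, slot)
        )
        fractions.append(live / len(tape) if tape else 0.0)
    return fractions


# Hypothetical example: a full backup followed by two backups of changed files.
tapes = [
    ["a", "b", "c", "d"],  # original full backup
    ["b"],                 # b changed
    ["b", "c"],            # b changed again, c changed
]
fractions = live_fraction(tapes)
```

Here half of the first tape and all of the second tape hold copies that are no longer current, which is the fragmentation that consolidation is meant to reclaim.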

Figure 2 illustrates the most common consolidation methods in use in comparison with the progressive (or incremental forever) methodology of TSM. Increasing points in time are displayed on the left as times T0 through T5. Each column in the figure represents a different backup technique, with use of tape for backup storage depicted as square tape cartridges. The dark areas on the cartridges represent used portions of tape, whereas lighter regions represent unused tape or regions of tape that are no longer valid because the file copies in these areas are no longer needed.

[FIGURE 2 OMITTED]

Incremental backup processing is shown in the first column. The technique involves periodic full backup operations that copy all data (at times T0 and T3), interspersed with "incremental" backup operations that only copy data that have changed since the last full or incremental backup operation (times T1, T2, T4, and T5). Although relatively efficient for backup processing, the technique can require the highest number of tape mount operations when restoring data. A full restore operation needed shortly after time T2 but before time T3, for example, would have to restore the data from tapes created at times T0, T1, and T2.
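The restore chain for the incremental technique can be sketched as below. This is a minimal illustration, assuming one tape per backup operation and a schedule indexed by time (index 0 for T0, and so on); the function name and the "full"/"incr" labels are hypothetical.

```python
def tapes_for_incremental_restore(schedule, restore_after):
    """Return the time indices of the tapes needed for a full restore.

    schedule[i] is "full" or "incr" for the backup taken at time Ti.
    A restore needs the most recent full backup at or before restore_after
    plus every incremental backup taken after it, up to restore_after.
    """
    last_full = max(
        i for i in range(restore_after + 1) if schedule[i] == "full"
    )
    return list(range(last_full, restore_after + 1))


# Schedule from the text: full backups at T0 and T3, incrementals otherwise.
incremental_schedule = ["full", "incr", "incr", "full", "incr", "incr"]
needed_after_t2 = tapes_for_incremental_restore(incremental_schedule, 2)
```

For a restore shortly after T2, the result lists the tapes from T0, T1, and T2, matching the example in the text.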

Differential backup processing, column 2 in the figure, is similar to incremental backup processing. Periodic full backup operations (times T0 and T4) are interspersed with "differential" backups (times T1, T2, and T3), which copy all data that have changed since the last full backup operation. Restore operations are more efficient than with incremental processing because fewer volumes need to be mounted, but differential backup still requires unchanged data to be repetitively copied (or backed up) from the client to the server. A full restore operation after time T2 in the differential model would need data from tapes created at times T0 and T2 (since the tape at time T2 also contains the data copied to the tape created at time T1).
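The corresponding restore rule for the differential technique can be sketched in the same spirit. Again this assumes one tape per backup operation and a schedule indexed by time; the function name and the "full"/"diff" labels are hypothetical.

```python
def tapes_for_differential_restore(schedule, restore_after):
    """Return the time indices of the tapes needed for a full restore.

    schedule[i] is "full" or "diff" for the backup taken at time Ti.
    Each differential holds everything changed since the last full backup,
    so a restore needs only that full backup and the latest differential.
    """
    last_full = max(
        i for i in range(restore_after + 1) if schedule[i] == "full"
    )
    if last_full == restore_after:
        return [last_full]
    return [last_full, restore_after]


# Schedule from the text: full backups at T0 and T4, differentials otherwise.
differential_schedule = ["full", "diff", "diff", "diff", "full", "diff"]
needed_after_t2 = tapes_for_differential_restore(differential_schedule, 2)
```

At most two tapes are ever needed here, at the cost of each differential recopying data already held by the previous one.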


COPYRIGHT 2003 All Rights Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

