
Advanced functions for storage subsystems: supporting continuous availability.


by Azagury, A.C.; Factor, M.E.; Micka, W.F.
IBM Systems Journal • July, 2003

Storage subsystems (or storage control units) were once expected only to store and retrieve randomly accessible data. That day, however, is long gone, and today storage subsystems--in particular high-end subsystems such as the IBM TotalStorage* Enterprise Storage Server* (ESS), EMC's Symmetrix**, or Hitachi's Lightning--are expected to play an integral role in supporting continuously available business operations.

To support continuously available operation, storage subsystems must support advanced copy functions. These include point-in-time copy and continuous remote copy functions. Point-in-time copy functions enable administrative operations (e.g., backups and checkpoints) to be performed on the data in almost zero time, in such a way that the applications using the data are not significantly impacted. Continuous remote copy functions support continuous availability by ensuring that all data written to a primary control unit are also written to a remote secondary control unit, which, it is assumed, will not be impacted by a disaster.

Point-in-time copy and continuous remote copy are building blocks for overall solutions that enable continuously available business operation; they do not, by themselves, provide these solutions. For instance, to allow backing up a large ERP (Enterprise Resource Planning) installation, a database hot backup mode is used to quiesce the application (that is, pause its updates by allowing in-flight operations to complete normally) while a point-in-time copy is made of the underlying data, and it is this copy that is backed up. Similarly, to ensure that a business can keep operating in the event of a disaster, some form of cluster management software is responsible for leveraging a continuous remote copy facility to "bring up" an application at a backup site. For an example, see Reference 2.

In this paper, we focus on continuous remote copy. We have described solutions for point-in-time copy in more detail in Reference 3 and provide a brief summary of this function here.

A disaster recovery solution for a data center must provide a copy of a corporation's data that is physically distant from the corporation's main data center, thereby allowing a business to resume operation in a reasonable amount of time after a disaster (e.g., a fire, flood, or power outage that destroys or otherwise makes unusable the corporation's data center). A wide range of disaster recovery solutions is possible. The solutions differ in their implementation cost, their impact on normal business operations, the amount of data lost, and the length of time a disaster can cause a business's data to be unavailable. One of the main items that impacts the cost of a disaster recovery solution is the mechanism for ensuring the existence of a remote copy of the data.

Which solution is appropriate for a given situation depends upon the trade-offs made by the corporation. At one extreme are solutions where a nightly tape backup is placed on a truck and driven to a remote location; in the event of a disaster, new information technology equipment is obtained, and the data are restored from tape. Such solutions could make use of point-in-time copy facilities to minimize the impact of the backup on normal business activities. At the other extreme are the solutions in which an on-line copy of data is maintained at a site that is physically remote from the corporation's main data center. This on-line secondary copy can be kept continuously synchronized with the primary copy, making it a mirror of that copy; it can be continuously consistent with the primary copy but running behind; or it can be only periodically consistent. All of these on-line solutions build upon some type of continuous remote copy facility to approach continuous operation even in the face of a disaster.
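The three behaviors of an on-line secondary copy described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a description of any product: volumes are modeled as Python dictionaries, and the class names (Secondary, SyncMirror, AsyncOrdered, PeriodicConsistent) are hypothetical names introduced only for this illustration.

```python
from collections import deque


class Secondary:
    """Hypothetical remote volume: maps block number -> data."""

    def __init__(self):
        self.blocks = {}

    def apply(self, block, data):
        self.blocks[block] = data


class SyncMirror:
    """Continuously synchronized: a write is applied remotely before it completes."""

    def __init__(self, secondary):
        self.primary = {}
        self.secondary = secondary

    def write(self, block, data):
        self.primary[block] = data
        self.secondary.apply(block, data)  # stands in for the remote round trip


class AsyncOrdered:
    """Consistent but running behind: updates are shipped later, in write order."""

    def __init__(self, secondary):
        self.primary = {}
        self.secondary = secondary
        self.pending = deque()  # updates not yet applied at the secondary

    def write(self, block, data):
        self.primary[block] = data
        self.pending.append((block, data))

    def drain(self, max_updates):
        # Applying updates in order means the secondary always equals
        # some earlier state of the primary.
        for _ in range(min(max_updates, len(self.pending))):
            self.secondary.apply(*self.pending.popleft())


class PeriodicConsistent:
    """Periodically consistent: only changed block numbers are tracked."""

    def __init__(self, secondary):
        self.primary = {}
        self.secondary = secondary
        self.dirty = set()

    def write(self, block, data):
        self.primary[block] = data
        self.dirty.add(block)

    def sync(self):
        # Assumes no writes arrive during the sync; the secondary matches
        # the primary only once a full cycle has finished.
        for block in sorted(self.dirty):
            self.secondary.apply(block, self.primary[block])
        self.dirty.clear()
```

In this model the trade-off is visible in the write path: SyncMirror pays the remote cost on every write, AsyncOrdered defers it but must keep an ordered queue, and PeriodicConsistent keeps only a set of changed blocks at the price of a secondary that is usable only between sync cycles.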

A continuous remote copy facility differs from a point-in-time copy in two essential ways. First, as the name implies, the source and target of the remote copy (also referred to as primary and secondary) are located at some distance from each other. Second, and more significantly, a continuous remote copy facility is not aimed at capturing the state of the source at some point in time, but rather aims at reflecting all changes made to the source data at the target.

A remote copy facility can be viewed as having essentially two phases. An initial copy phase involves a bulk transfer of all of the data at the primary site to the secondary site. This phase involves transferring large amounts of data in large units. The second phase involves transferring modifications that occur at the primary site to the secondary site; this may include updates that occurred during the first phase. This continuous phase involves transferring smaller units of data. Given sufficient time without modifications to the primary site, any continuous remote copy solution should be able to ensure that the secondary copy of the data is identical to the primary copy.
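The two phases can be sketched as follows. This is a simplified simulation under stated assumptions: the volume is a Python list of blocks, host writes that arrive during the bulk copy are injected between chunks, and RemoteCopySession and its methods are hypothetical names used only for illustration.

```python
class RemoteCopySession:
    """Toy model of a remote copy: bulk initial copy, then incremental updates."""

    def __init__(self, primary, chunk_size=4):
        self.primary = primary                  # list of block contents
        self.secondary = [None] * len(primary)  # remote copy, initially empty
        self.chunk_size = chunk_size
        self.dirty = set()                      # blocks modified since last transfer
        self.transfers = []                     # log of (phase, blocks_sent)

    def write(self, block, data):
        """Host write to the primary; remember the block for later transfer."""
        self.primary[block] = data
        self.dirty.add(block)

    def initial_copy(self, writes_between_chunks=None):
        """Phase 1: transfer every block in large units (chunks)."""
        writes_between_chunks = writes_between_chunks or {}
        for start in range(0, len(self.primary), self.chunk_size):
            end = min(start + self.chunk_size, len(self.primary))
            self.secondary[start:end] = self.primary[start:end]
            self.transfers.append(("initial", end - start))
            # Simulate host writes that arrive while the bulk copy is running.
            for block, data in writes_between_chunks.get(start, []):
                self.write(block, data)

    def continuous_step(self):
        """Phase 2: transfer modified blocks in small units (one block each)."""
        for block in sorted(self.dirty):
            self.secondary[block] = self.primary[block]
            self.transfers.append(("update", 1))
        self.dirty.clear()

    def in_sync(self):
        return self.secondary == self.primary


if __name__ == "__main__":
    session = RemoteCopySession(list("abcdefgh"), chunk_size=4)
    # Block 1 is rewritten after it was bulk-copied; block 6 before its chunk is sent.
    session.initial_copy({0: [(1, "X"), (6, "Y")]})
    print(session.in_sync())   # the secondary still holds the old block 1
    session.continuous_step()
    print(session.in_sync())
```

The sketch marks every write during the initial copy as dirty, even if the block has not yet been bulk-copied. That causes a few redundant transfers but keeps the bookkeeping simple, and once writes stop, one continuous step is enough to make the two copies identical.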

Given this definition, it should be clear that a range of continuous copy solutions exists. These solutions can be implemented above the file system, by the file system, (4) at the volume manager or device driver level, (5,6) at the storage subsystem, (7-10) or via some combination of external facilities and support by the disk subsystem. (9) Although we mention solutions at other levels, our focus is on continuous remote copy facilities provided by storage control systems. As with point-in-time copy, the biggest benefit of providing this function at the level of the storage subsystem is performance--we do not needlessly add load to other components of the system. In addition, solutions can be uniform across applications. The biggest drawbacks are the lack of semantic knowledge (i.e., knowledge at the subsystem level concerning the content of storage blocks) and the requirement for software to integrate advanced copy functions with the applications in order to provide a total solution.

In the remainder of this paper, we provide a background description of point-in-time copy and a more detailed definition of continuous copy, describing the range of behaviors the latter function can display and the various generic ways of implementing this function in a block storage subsystem. We then show how these facilities are realized in practice by briefly surveying several existing implementations. In particular, we describe in detail the Extended Remote Copy (XRC) (9) of IBM; we also describe Network Appliance's SnapMirror** (4) and OceanStore. (11) We then describe the Peer-to-Peer Remote Copy (PPRC) facility of ESS, (9) including a discussion of the new Extended Distance option. (12) These last functions were developed in our labs.

Point-in-time copy

Point-in-time copies or "snapshots" of data can be made for a variety of reasons. They allow easy restoration of data in case of inadvertent corruption, backup of a consistent image of the data, easy replication of "production" data for test environments, mining of nearly current versions of data, and so on. The ability to create a point-in-time copy of the data efficiently, with minimal interruption and with minimal overhead, is critical in most production environments, where the expectation for continuous operation is commonplace. The storage capacity growth trend, with drive capacity doubling roughly every 12 months for the last few years, is expected to continue in the near future, and this trend has rendered obsolete point-in-time solutions based on an actual physical copy of the data.
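One widely used way to create a point-in-time copy without physically copying the data is copy-on-write: the snapshot is created instantly, and a block's old contents are preserved only when that block is first overwritten. The sketch below illustrates the idea under stated assumptions. It is not taken from this paper or from any particular subsystem, and Volume and Snapshot are hypothetical names introduced for the example.

```python
class Snapshot:
    """Point-in-time view of a volume; stores only blocks changed since creation."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block number -> contents at snapshot time

    def read(self, block):
        # Unchanged blocks are read from the live volume; changed ones
        # come from the preserved copy.
        if block in self.saved:
            return self.saved[block]
        return self.volume.blocks[block]


class Volume:
    """Toy block volume supporting copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating the snapshot copies no data, so it takes almost zero time.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block, data):
        # Before the first overwrite of a block, preserve its old contents
        # for every snapshot that has not saved that block yet.
        for snap in self.snapshots:
            if block not in snap.saved:
                snap.saved[block] = self.blocks[block]
        self.blocks[block] = data


if __name__ == "__main__":
    vol = Volume(["a", "b", "c"])
    snap = vol.snapshot()
    vol.write(1, "B")
    print(vol.blocks[1], snap.read(1))
```

The space used by a snapshot in this model grows with the number of distinct blocks overwritten after it was taken, not with the size of the volume, which is why such schemes scale better than a full physical copy as capacities grow.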

Although conceptually simple, point-in-time copy functions come in a variety of flavors. In some cases, limitations are imposed on the copy, such as making the copy read-only, limiting its fault tolerance, limiting the number of "outstanding" copies, and so on. In addition, while some copy approaches allow almost immediate copying of the data (without advance planning), some implementations require careful ahead-of-time planning. A more comprehensive review of point-in-time copy techniques is provided in Reference 3.


COPYRIGHT 2003 All Rights Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

