In the 1960s and 1970s, control units served as gateways that
provided attachment of various input and output devices to a relatively
small number of host channels. This technology was used primarily in
System/360* and System/370*. (1) In addition to providing attachment for
a variety of devices, the control unit provided the conversion between
the channel protocol and the protocols for the attached devices. The
control unit also permitted multiple channels from the same host, or
from different hosts, to attach to the devices it controlled. The
control units provided limited error recovery, as well as error
detection and isolation.
In 1981, a read cache (also known as write-through cache) was
introduced into storage control units in the 3880 Models 11 and 13. In
1988, a write cache (write-in cache) was introduced into storage control
units in the 3990 Model 3. (2) In the late 1980s and early 1990s,
control units included RAID (redundant array of independent disks)
technology to provide additional reliability for the attached storage.
(3) By this time, storage control units had powerful microprocessors, a
large read cache, and a large write cache. The next step was to add
storage-based functionality to the storage control units.
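The difference between the two cache types named above can be illustrated with a minimal sketch. It is not taken from the 3880 or 3990 microcode; the class names `Disk`, `WriteThroughCache`, and `WriteInCache` are hypothetical, and a real controller must also protect undestaged data against power loss, which this sketch does not model.

```python
class Disk:
    """Hypothetical stand-in for a slow backing device."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # counts physical write operations

    def read(self, addr):
        return self.blocks.get(addr)

    def write(self, addr, data):
        self.blocks[addr] = data
        self.writes += 1


class WriteThroughCache:
    """Read cache: every write reaches the disk before it completes."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:  # miss: fetch from disk and keep a copy
            self.cache[addr] = self.disk.read(addr)
        return self.cache[addr]

    def write(self, addr, data):
        self.disk.write(addr, data)  # the disk copy is always current
        self.cache[addr] = data


class WriteInCache(WriteThroughCache):
    """Write cache: writes complete in cache and are destaged later."""

    def __init__(self, disk):
        super().__init__(disk)
        self.dirty = set()  # addresses whose cache copy is newer than disk

    def write(self, addr, data):
        self.cache[addr] = data  # completes without a disk operation
        self.dirty.add(addr)

    def destage(self):
        for addr in sorted(self.dirty):
            self.disk.write(addr, self.cache[addr])
        self.dirty.clear()
```

In this sketch, three successive writes to the same block cost three disk writes through `WriteThroughCache`, but only one disk write through `WriteInCache` once `destage()` runs, because only the latest copy is written out.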
A storage control unit provides sole access to the attached devices,
and all access to the associated storage and data flows through the
storage control unit. Because the storage control unit was uniquely positioned
to provide function associated with the stored data, and because it
could be equipped with needed processing capability (i.e., processor and
memory), it became the focal point for new storage-oriented
functionality. One result was replication services, which in effect
create copies of data for various purposes. The addition of
replication services to the function already present (function that
exploited the read cache, the write cache, and RAID) completed the
transition from the control unit as gateway and aggregator to the
storage server, a system whose advanced functions far exceed the storage
access function.
Networked storage was the next major development in storage
systems. Storage area networks (SANs) enable multiple hosts to work with
a common set of storage systems. Both SANs and network attached storage
(NAS) permit multiple servers to share storage systems and facilitate
the sharing of the stored data. Networked storage contrasts with
"direct-attached" storage, where a storage device is available
to just a single server. With direct-attached storage, no opportunity
exists for sharing the storage resource or the data stored on it.
The storage consolidation enabled by storage networking provided an
important shift in host-and-storage topology for the UNIX ** and
Microsoft Windows NT ** environments. Historically, the UNIX and
Microsoft Windows NT storage environments consisted of direct-attached
disks, either internal or external. Disks attached to a host were owned
by that host, and unused disk space was not shared with any other
server. The relationship was so close, in fact, that the storage could
rarely be moved to a dissimilar server. Because storage resources across
hosts, be they homogeneous or heterogeneous, could not be pooled
together, the purchasing decision for a host was irreversibly tied to
the purchasing of storage components. Storage consolidation, however,
separates the two purchasing decisions and allows customers to upgrade
or replace hosts (even to new platforms) without purchasing new storage.
Conversely, storage can be upgraded without installing new hosts.
Another important consequence of storage consolidation is the
introduction of storage-based functions, such as replication services.
Using the function provided by the storage system, an enterprise can
build a single set of procedures and processes for data-related
activities, such as disaster recovery or data archiving. These processes
and procedures are the same for all data in the enterprise and are
applied uniformly across heterogeneous hosts. Such processes cannot be
completely independent of the host platform, but the core function
consistency is of significant value in that all data have the same high
level of usability and protection.
Storage has seen dramatic price reductions of 40 to 60 percent per
year. This cost reduction makes possible a rapid increase in configured
storage and makes more data immediately accessible to the enterprise.
As the configured storage grows, the cost of managing this storage
becomes a significant inhibitor to adding more storage. Management costs
can grow exponentially with storage capacity. These costs are primarily
the cost of human resources: first payroll, but also the cost of
acquiring and maintaining the required skills.
In order to alleviate the problem of the rising cost of managing
storage systems and enable continued growth of installed storage,
systems management software for storage systems is being enhanced.
Policy-based storage management (PBSM) is directed at reducing the cost
of managing storage. PBSM automation maps enterprise policy to various
constraints and self-optimizing mechanisms to be used when implementing
software components. Ideally, the enterprise policies and goals are
formulated as input to PBSM in the language used to manage the
enterprise. In contrast, today administrators must define configurations
by manually translating business requirements into system requirements.
The PBSM software enlists the appropriate technologies and resource
controls (e.g., service level agreements, quotas) to support enforcement
of enterprise policies through the operation of the information
processing system. PBSM usually operates with most of the solution
components. It can also provide overall monitoring and a feedback
control loop to support consistent delivery of the requested policies.
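The policy-to-constraint mapping and the feedback loop described above might be sketched as follows. This is only an illustration under stated assumptions: the policy names, the `Constraints` fields, the numeric values, and the functions `translate`, `check_allocation`, and `feedback_step` are all hypothetical and do not describe any particular PBSM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    quota_gb: int            # capacity limit enforced on allocation
    max_response_ms: float   # service level target
    copies: int              # number of copies to maintain


# Hypothetical table: enterprise-level policy names mapped to system-level
# constraints. The values are illustrative only.
POLICY_TABLE = {
    "mission-critical": Constraints(quota_gb=2000, max_response_ms=5.0, copies=2),
    "archive": Constraints(quota_gb=10000, max_response_ms=200.0, copies=1),
}


def translate(policy_name):
    """Map a business-level policy name to system constraints."""
    return POLICY_TABLE[policy_name]


def check_allocation(constraints, used_gb, request_gb):
    """Enforce the quota: allow the request only if it fits."""
    return used_gb + request_gb <= constraints.quota_gb


def feedback_step(constraints, measured_ms, cache_share, step=0.05):
    """One pass of a feedback loop over a cache share in the range 0..1.

    Grow the share when the response-time target is missed, shrink it when
    the target is met with a wide margin, otherwise leave it unchanged.
    """
    if measured_ms > constraints.max_response_ms:
        return min(1.0, cache_share + step)
    if measured_ms < 0.5 * constraints.max_response_ms:
        return max(0.0, cache_share - step)
    return cache_share
```

The point of the sketch is the division of labor: the administrator chooses a policy name, and the software derives the quota check and the tuning action from it, rather than the administrator translating business requirements into settings by hand.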
Another major factor in the evolution of storage systems is the
increasing role of autonomic computing (i.e., self-healing,
self-optimizing, self-configuring, and self-protecting). (4) For over 30
years, self-healing has been a recognized requirement in
enterprise-class storage systems and has come to be known as
"continuous availability." The premise of continuous
availability is that no single failure will result in loss of data,
access to data, or functionality. Scheduled events such as maintenance
and microcode load, as well as unscheduled events such as failures, must
be accomplished without impacting system availability or functionality.
While the overall self-healing requirement has been relatively constant
over the past 30 years, the requirement for scheduled events has
become more stringent. New business requirements such as 24-hour
operation and worldwide accessibility have led to the loss of the weekly
or monthly batch windows that were once available for scheduled
activity.
Self-optimizing has become a more important requirement for storage
systems since the introduction of read caching, write caching, and
advanced functions. The system must allocate system resources (e.g.,
read cache, write cache, and processor) based upon the current demands
on the system. This requirement is now prominent in storage system
development.
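As a minimal sketch of demand-based allocation, the following function splits a fixed amount of cache between read and write use in proportion to recent demand. The function name `rebalance_cache`, its parameters, and the floor of 10 percent per side are assumptions made for illustration; the source does not say how any real storage system makes this decision.

```python
def rebalance_cache(total_mb, read_demand, write_demand, min_share=0.1):
    """Split total_mb between read and write cache by relative demand.

    read_demand and write_demand are non-negative load figures in the same
    unit (for example, operations per second). Each side keeps at least
    min_share of the cache so that a burst of one kind of work cannot
    starve the other completely.
    """
    total_demand = read_demand + write_demand
    if total_demand == 0:
        read_share = 0.5  # no load observed: split evenly
    else:
        read_share = read_demand / total_demand
    # Clamp the read share so both sides respect the floor.
    read_share = max(min_share, min(1.0 - min_share, read_share))
    read_mb = round(total_mb * read_share)
    return read_mb, total_mb - read_mb
```

Calling the function periodically with fresh demand figures gives a simple self-optimizing loop: the split follows the workload without operator intervention.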
Self-configuring and self-protecting became more important
requirements with the introduction of storage area networks (SANs). The
additional complexities of configuring networked storage led to
increasing requirements for intelligent self-configuring. The
"universal" access provided by networked storage led to a
dramatically increased requirement for self-protecting, because only
those with proper authorization can be allowed to access data stored
within the system.
In summary, over the decades, storage evolved from the simple role
of media, where hosts stored data, to powerful storage servers. The
declining cost of physical storage led to a greater focus on the cost of
managing storage, because this remains the primary inhibitor to the
growth of the installed storage. In delivering storage, the focus has
shifted to storage systems that can contain management costs. The
realization of such function is based on new techniques (e.g., PBSM)
implemented in the host as well as in the embedded software of the
latest storage servers. IBM TotalStorage Enterprise Storage Server (ESS)
is a premier example of a storage server designed to meet these
requirements.
The rest of this paper is organized as follows. In the next section
we describe ESS architecture, discuss its server-based design, and
describe the basic operation. Then we discuss the ESS objectives and the
methods used to achieve them. In the following section we explore some
design decisions that significantly affected ESS architecture and
performance. We conclude with some comments about possible future
enhancements.
ESS hardware and embedded software
ESS is IBM's most powerful disk storage server. It supports a
multitude of hosts in a heterogeneous open-systems environment. ESS
supports direct connection to SANs and provides a number of advanced
functions for data duplication, backup, and disaster recovery. We
first discuss the server-based design of ESS and then describe its basic
operation.
Server-based design. ESS is a server-based storage system
configured from two IBM pSeries* symmetrical multiprocessors (SMPs). (5)
The SMPs cooperate to provide and support the function, performance, and
continuous availability so critical to high-end storage. Each SMP has
one or more host adapters that provide host connectivity. Each SMP also
has one or more device adapters that attach to disk devices.
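One way to picture how two cooperating SMPs can keep data accessible after a single failure is the sketch below. It is an assumption-laden illustration, not a description of the actual ESS failover logic: the classes `Smp` and `DualSmpServer`, the notion of a preferred SMP per volume, and the routing rule are all hypothetical.

```python
class Smp:
    """Hypothetical model of one of the two processor complexes."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class DualSmpServer:
    """Routes I/O for each volume to its preferred SMP, or to the other
    SMP when the preferred one has failed."""

    def __init__(self):
        self.smps = [Smp("smp0"), Smp("smp1")]
        self.preferred = {}  # volume name -> index of preferred SMP (0 or 1)

    def assign(self, volume, index):
        self.preferred[volume] = index

    def route(self, volume):
        """Return the name of the SMP that should serve this volume."""
        first = self.preferred[volume]
        for idx in (first, 1 - first):  # try preferred, then the partner
            if self.smps[idx].healthy:
                return self.smps[idx].name
        raise RuntimeError("both SMPs failed: no path to " + volume)
```

In normal operation the two SMPs share the load. After a single failure every volume is still reachable through the surviving SMP, which matches the continuous-availability premise that no single failure results in loss of access to data.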
COPYRIGHT 2003 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.