Technical note--the IBM TotalStorage Enterprise
Storage Server: testing for general availability and
beyond.
by Dillon, M.R.; Collins, P.M.; Hurley, P.R.; Handlin, J.E.; Reed, S.A.
Testing the IBM TotalStorage Enterprise Storage Server
(ESS)--both the hardware and the supporting software--is a daunting
task. ESS supports a large number of programmable features,
heterogeneous servers and their attached devices, attachment methods,
and operating systems. The complexity of the hardware and software
testing required is considerable, in light of a very rich function set
(including, for example, Peer-to-Peer Remote Copy [PPRC], Extended
Remote Copy [XRC], remote services support, and Concurrent Copy). There
are also several performance-enhancing features for System/390, such as
priority I/O queuing.
Additional complicating factors are the configuration and
operations tools provided with ESS. These include Web-based
configuration tools; service log-on tools for use by service personnel,
such as ESS Net Console; a server-based command line interface tool for
invoking scripted copy operations; and a server subsystem device driver
for concurrent service operation.
The ESS project presents a range of engineering testing challenges:
the need for team skills in multiple disciplines and education in
numerous system environments, intensive code driver delivery schedules,
considerable product complexity, and the difficulty of stressing the
machine across various configurations and types of I/O activity. As ESS
host attachments increase in variety of host types and adapter types,
the number of host attachment test scenarios multiplies, because each
new type must be tested in combination with the existing ones. The
development team also has to be creative in its approach to testing ESS
in multiple customer environments in order to support the
"time-to-market" goals of IBM and to meet customer
requirements, while providing a highly reliable and stable product.
ESS can be attached in various ways to a variety of computer
architectures. For example, System/390 uses ESCON (Enterprise Systems
Connection) and FICON (Fiber Connection) to attach to ESS and views
its ESS volumes as CKD (count-key-data) format devices. The other server
types (i.e., other than System/390), referred to by IBM as "open
systems," attach via SCSI (Small Computer System Interface) and
view their devices as fixed block devices. As a result, ESS testing
requires several different host types (not to mention hosts from many
different manufacturers). Hardware adapters, device adapters, and the
hard drives themselves must also be varied.
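To make the growth in attachment scenarios concrete, the following is a minimal Python sketch that enumerates a test matrix as a filtered cross product. Only the terms ESCON, FICON, SCSI, CKD, and fixed block come from the text above; the open-systems host names are placeholders, and the compatibility rule is a simplification of the description above, not an actual ESS support matrix.

```python
from itertools import product

# Illustrative values only; not an actual ESS support matrix.
HOST_TYPES = ["System/390", "open-system host A", "open-system host B"]
ATTACHMENTS = ["ESCON", "FICON", "SCSI"]


def is_valid(host, attachment):
    """Simplified rule from the text: System/390 attaches via ESCON or
    FICON; open-systems hosts attach via SCSI."""
    if host == "System/390":
        return attachment in ("ESCON", "FICON")
    return attachment == "SCSI"


def device_format(host):
    """System/390 sees CKD volumes; open systems see fixed-block devices."""
    return "CKD" if host == "System/390" else "fixed block"


def attachment_scenarios(hosts, attachments):
    """Enumerate every valid (host, attachment, device format) scenario."""
    return [
        (h, a, device_format(h))
        for h, a in product(hosts, attachments)
        if is_valid(h, a)
    ]


scenarios = attachment_scenarios(HOST_TYPES, ATTACHMENTS)
for scenario in scenarios:
    print(scenario)
print(len(scenarios), "scenarios")
```

Each further dimension added to the cross product (for example, operating system levels or device adapter types) multiplies the count again, which is why the scenario list grows so quickly.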
In the following sections, we describe the design and execution of
ESS hardware and software tests. Although much of the test
implementation is tailored to the unique requirements of ESS and its
applications, our experience may be instructive to testers of a wide
variety of systems.
ESS hardware testing
In this section, we describe ESS hardware testing in detail: its
objective, its test design, and the hardware development verification
tests.
Testing objective. The objective of our hardware development
testing is to provide a technical evaluation of storage products.
Results of that technical evaluation are utilized to support key product
checkpoints (such as announcement, early ship program [ESP], and general
availability [GA]) and to support post-GA field problem resolution and
the release of product enhancements.
The breadth of ESS product features and attachment types (as well
as attachable computers, operating systems, and network topologies)
makes testing challenging, due to the number of possible interactions
and permutations. Test planning also becomes critical. Flexibility must
be built into the test plans in order to adapt to potentially changing
conditions. Test planning and methodologies must anticipate and work
around frequent and recurring bottlenecks in device and software
availability.
In addition to the formal test objective, the hardware testing for
ESS is organized around the following key principles.
The process itself must be subject to analysis. Both the product
and the testing processes must improve over time. Criteria for the
quality of testing are defined and evaluated. Progress is monitored
throughout the test (see the subsection "In-process metrics"),
allowing the team to make "in-flight" corrections to test
scenarios. These metrics are combined with the adoption of postprocess
methods, experience reports, and postmortem evaluations.
Testing must consist of both function-oriented and task-oriented
approaches. Later stages of testing--which increasingly involve the
integration of subsystem elements--must be oriented toward actual
customer usage, task scenarios, and solution testing, rather than
individual function verification.
There must be room in the test plan for "creative testing."
Such testing capitalizes on the improving skills of the test team
and gives them room to modify test procedures.
There must be cooperation between the hardware and software test
teams, for the benefit of both. Customers buy system solutions, not
isolated elements. As illustrated by the testing of the DFSMS (Data
Facility Storage Management Subsystem) Small Programming Enhancement
(SPE), which is needed to support testing of ESS, this kind of
cooperation is critical to the successful delivery of ESS.
Hardware test design. In this subsection, we describe our
experiences in planning and implementing the ESS hardware test suite.
ESS testing follows the industry-proven best practices of planning,
preparation, and execution.
Test planning normally begins when a product such as ESS becomes
part of the official product plan and the design documents--primarily
the functional specifications for the product--are complete.
Draft plans are developed and reviewed, and issues are tracked and
resolved prior to test plan acceptance. At this point, detailed
schedules are set in place and dependencies are cross-checked.
The preparation phase involves setting up labs, acquiring software
and hardware tools, preparing status-tracking databases, and doing
detailed test design. In some cases, this phase also includes
"lessons learned" from the planning phase.
For each test, execution begins when all of the entrance criteria
documented in the test plan for that test are met. Each hardware test
ends when its exit criteria, as documented in the appropriate test plan,
are met. However, some tests continue after exit criteria are met, for
the purposes of fix verification or regression testing. This extended
test scope covers the verification of functions added after the initial
product plan, as well as boundary testing, whitebox testing, and fix
verification. Boundary testing emphasizes the limits of allowable
parameters and extremes in environmental conditions. Whereas blackbox
testing considers only the inputs and outputs of the system under test,
whitebox testing also takes into account some known internal
characteristics of the system under test.
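As an illustration of boundary testing, the following minimal Python sketch generates test inputs at and just beyond the limits of an integer parameter. The parameter range used (1 to 256) and the function names are assumptions made for the example; they do not describe any actual ESS parameter or test tool.

```python
def boundary_values(minimum, maximum):
    """Return test inputs at and around the limits of an integer parameter.

    Values just outside the range are included so that rejection of
    invalid input is exercised as well as acceptance of valid input.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    candidates = [minimum - 1, minimum, minimum + 1,
                  maximum - 1, maximum, maximum + 1]
    # Remove duplicates (which occur for narrow ranges) and sort ascending.
    return sorted(set(candidates))


def expected_valid(value, minimum, maximum):
    """A value is expected to be accepted only inside the allowed range."""
    return minimum <= value <= maximum


# Hypothetical parameter: a count allowed to range from 1 to 256.
for value in boundary_values(1, 256):
    verdict = "accept" if expected_valid(value, 1, 256) else "reject"
    print(value, verdict)
```

A blackbox test would compare the system's response to each value against the expected verdict; a whitebox test might add further values at internally significant thresholds known to the tester.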
In-process metrics. The use of in-process metrics is a common
practice for both hardware and software testing. Some of the metrics
that are used during ESS hardware and DFSMS software testing are power
curves, defect charts, and defect analysis.
Power curves track the expected versus actual progress of test case
groups against time. The usual practice is to track test attempts
(planned and actual) as well as test successes (planned and actual).
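The data behind a power curve can be sketched as cumulative planned and actual counts per reporting period. The following Python example is a minimal sketch under that assumption; the field names and the weekly counts are invented for illustration and are not taken from ESS test data.

```python
from itertools import accumulate


def power_curve(planned_attempts, actual_attempts,
                planned_successes, actual_successes):
    """Build cumulative planned-versus-actual rows, one per period."""
    series = [planned_attempts, actual_attempts,
              planned_successes, actual_successes]
    if len({len(s) for s in series}) != 1:
        raise ValueError("all series must cover the same periods")
    cumulative = [list(accumulate(s)) for s in series]
    rows = []
    for period, (pa, aa, ps, acs) in enumerate(zip(*cumulative), start=1):
        rows.append({
            "period": period,
            "planned_attempts": pa,
            "actual_attempts": aa,
            "planned_successes": ps,
            "actual_successes": acs,
            # Negative gaps mean the test is behind plan.
            "attempt_gap": aa - pa,
            "success_gap": acs - ps,
        })
    return rows


# Made-up per-week counts for illustration only.
curve = power_curve([10, 20, 20], [8, 18, 22], [8, 18, 18], [5, 15, 20])
for row in curve:
    print(row)
```

Plotting the four cumulative series against time gives the familiar curve shape; a widening negative gap is the signal that a test scenario may need an "in-flight" correction.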
Product defects found in each test are recorded both by component
and by type of test. The rates of detecting and resolving defects, the
severity of the defects, and the time between detection and resolution
of defects are all tracked. Defects are prioritized by severity and, in
some cases, this may lead to reordering of the testing. Defects in the
tests themselves are also recorded. Defect rates and categories are
later used with the Orthogonal Defect Classification (ODC) (1) process
as one measure of test effectiveness.
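The defect bookkeeping described above can be sketched as a small record type with a few summary functions. The following Python example is a minimal sketch; the field names, the severity scale (1 as most severe), and the sample records are assumptions made for illustration and do not reflect the actual ESS defect database.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Defect:
    component: str                    # e.g. "microcode" (illustrative label)
    test_type: str                    # e.g. "function", "system"
    severity: int                     # assumed scale: 1 is most severe
    detected: date
    resolved: Optional[date] = None   # None while the defect is open


def counts_by(defects, field):
    """Count defects grouped by a record field such as 'component'."""
    return Counter(getattr(d, field) for d in defects)


def mean_days_to_resolve(defects):
    """Average detection-to-resolution time over resolved defects."""
    days = [(d.resolved - d.detected).days
            for d in defects if d.resolved is not None]
    return sum(days) / len(days) if days else None


def open_by_priority(defects):
    """Open defects, most severe first, then oldest first."""
    still_open = [d for d in defects if d.resolved is None]
    return sorted(still_open, key=lambda d: (d.severity, d.detected))


# Made-up records for illustration only.
defects = [
    Defect("microcode", "function", 2, date(2003, 1, 6), date(2003, 1, 10)),
    Defect("microcode", "system", 1, date(2003, 1, 8)),
    Defect("host adapter", "system", 3, date(2003, 1, 7), date(2003, 1, 9)),
    Defect("host adapter", "function", 1, date(2003, 1, 5)),
]

print(counts_by(defects, "component"))
print(counts_by(defects, "test_type"))
print(mean_days_to_resolve(defects))
print([d.component for d in open_by_priority(defects)])
```

The ordering returned by `open_by_priority` is one simple way to decide which defects, and therefore which tests, to address first.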
Several layers of defect analysis are done during and after product
development. During ESS hardware and microcode development, all defects
are classified by customer impact and probability of occurrence. This
provides input to the development team as to the severity of the problem
from a customer standpoint. This information, along with the ODC process
(which helps measure test effectiveness), gives developers tools
to review the defects identified during the test cycle and field life,
in order to enhance or change the development process.
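One way to picture a classification by customer impact and probability of occurrence is as a two-dimensional matrix. The following Python sketch assumes three levels on each axis and combines them by multiplication; the levels, thresholds, and category names are hypothetical and are not the scheme actually used for ESS.

```python
# Assumed three-level scales; the real ESS scales are not given in the text.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def classify(impact, probability):
    """Combine customer impact and probability of occurrence into one class.

    The score is the product of the two levels; the thresholds below are
    illustrative choices, not documented values.
    """
    score = LEVELS[impact] * LEVELS[probability]
    if score >= 6:
        return "critical"
    if score >= 3:
        return "major"
    return "minor"


for impact in LEVELS:
    for probability in LEVELS:
        print(impact, probability, classify(impact, probability))
```

The point of such a matrix is that a defect with severe customer impact can still rank below a moderate defect that customers are far more likely to encounter.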
After the product cycle is completed, testing continues with escape
analysis. Escape analysis is a standard engineering practice that treats
problems found by customers after general product availability as
potential testing escapes--problems that should have or could have been
found during product testing. For ESS escape analysis, each problem
encountered in the field is evaluated as follows once it has been
resolved. First, the problem is classified as either a hardware
failure or a code problem. Hardware problems require a failure analysis to
determine what caused the hardware failure. In some cases, the problem
is so convoluted or requires so many concurrent fault conditions that it
cannot be found in normal/realistic testing. In the remaining cases, a
determination is made as to why the problem was not found, and the
testing process is then modified or expanded to eliminate this type of
escape in the future. Both hardware failures and code problems may
necessitate additional test scenarios for the appropriate areas of test,
or a modification of an existing test scenario to perform a test in a
different manner or sequence. Escape analysis is an ongoing process
throughout the life of a product.
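The escape analysis steps described above can be summarized as a small decision procedure. The following Python sketch is one reading of that description; the record fields and action strings are hypothetical names introduced for the example, not part of any ESS tool.

```python
from dataclasses import dataclass


@dataclass
class FieldProblem:
    identifier: str
    is_hardware_failure: bool     # False means a code problem
    realistically_testable: bool  # False if too convoluted to reproduce


def escape_analysis(problem):
    """Return the follow-up actions for one resolved field problem."""
    actions = []
    if problem.is_hardware_failure:
        # Hardware problems need a failure analysis to find the cause.
        actions.append("perform hardware failure analysis")
    if not problem.realistically_testable:
        # Too many fault conditions to find in normal/realistic testing.
        actions.append("record as not detectable in realistic testing")
        return actions
    # Otherwise treat the problem as a test escape and close the gap.
    actions.append("determine why testing did not find the problem")
    actions.append("add or modify test scenarios for the affected area")
    return actions


# Made-up examples for illustration only.
print(escape_analysis(FieldProblem("P1", True, True)))
print(escape_analysis(FieldProblem("P2", False, False)))
```

Because the procedure runs for every resolved field problem, the set of test scenarios keeps changing for as long as the product is in service.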
COPYRIGHT 2003 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2003, Gale Group. All rights
reserved. Gale Group is a Thomson Corporation Company.