Enabling autonomic behavior in systems software with hot swapping
by Appavoo, Jonathan; Hui, Kevin; Soules, Craig A.N.; Wisniewski,
Robert W.; Da Silva, Dilma M.; Krieger, Orran; Auslander, Marc A.;
Edelsohn, David J.; Gamsa, Ben; Ganger, Greg R.; McKenney, Paul;
Ostrowski, Michael; Rosenburg, Bryan; Stumm, Michael; Xenidis, Jimi
As computer systems become more complex, they become more difficult
to administer properly. Special training is needed to configure and
maintain modern systems, and this complexity continues to increase.
Autonomic computing systems address this problem by managing themselves.
(1) Ideal autonomic systems just work, configuring and tuning themselves
as needed.
Central to autonomic computing is the ability of a system to
identify problems and to reconfigure itself in order to address them. In
this paper, we investigate hot swapping as a technology that can be used
to address systems software's autonomic requirements. Hot swapping
is accomplished either by interposition of code or by replacement of
code. Interposition involves inserting a new component between two
existing ones. This allows us, for example, to enable more detailed
monitoring when problems occur, while minimizing run-time costs when the
system is performing acceptably. Replacement allows an active component
to be switched with a different implementation of that component while
the system is running, and while applications continue to use resources
managed by that component. As conditions change, upgraded components,
better suited to the new environment, dynamically replace the ones
currently active.
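The two mechanisms can be illustrated with a minimal sketch. It assumes that callers reach a component only through an indirection point; the names Handle, ListStore, and CountingMonitor are hypothetical and do not describe the K42 implementation. The sketch ignores calls still in flight on the old component, which a real system would have to handle.

```python
import threading


class Handle:
    """Hypothetical indirection point: callers hold the handle, never the component."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def call(self, method, *args):
        # Look up the current target under the lock, then invoke it.
        with self._lock:
            target = self._target
        return getattr(target, method)(*args)

    def replace(self, new_target):
        # Replacement: switch implementations; callers keep the same handle.
        with self._lock:
            self._target = new_target

    def interpose(self, wrapper_factory):
        # Interposition: insert a new component in front of the current one.
        with self._lock:
            self._target = wrapper_factory(self._target)
            return self._target


class ListStore:
    """Stand-in component that manages a resource (here, a list of items)."""

    def __init__(self):
        self.items = []

    def put(self, x):
        self.items.append(x)

    def count(self):
        return len(self.items)


class CountingMonitor:
    """Interposed component that counts calls and forwards them unchanged."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0

    def __getattr__(self, name):
        # Only reached for names not defined on the monitor itself.
        method = getattr(self.inner, name)

        def forward(*args):
            self.calls += 1
            return method(*args)

        return forward
```

A caller would create handle = Handle(ListStore()), enable monitoring with handle.interpose(CountingMonitor) only when a problem is suspected, and later pass a different implementation to handle.replace() after copying over whatever state it needs.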
Hot swapping makes downloading of code more powerful. New
algorithms and monitoring code can be added to a running system and
employed without disruption. Thus, system developers do not need to be
prescient about the state that needs to be monitored or the alternative
algorithms that need to be available. More importantly, new
implementations that fix bugs or security holes can be introduced in a
running system.
The rest of the paper is organized as follows. The next section
describes how hot swapping can facilitate the autonomic features of
systems software. An important goal of autonomic systems software is
achieving good performance. The section "Autonomically improving
performance" illustrates how hot swapping can autonomically improve
performance using examples from our K42 (2) research operating system
(OS) as well as from the broader literature. The section that follows
describes a generic infrastructure for hot swapping and contrasts it
with the adaptive code alternative. Then the section "Hot swapping
in K42" describes the overall K42 structure, presents the
implementation of hot swapping in K42, and includes a brief status and a
performance evaluation. The next section discusses related work, and the
concluding section contains some final comments.
Autonomic features through hot swapping
Autonomic computing encompasses a wide array of technologies and
crosses many disciplines. In our work, we focus on systems software. In
this section we discuss a set of crucial characteristics of autonomic
systems software and describe how hot swapping via interposition and
replacement of components can support these autonomic features, as
follows.
Performance--The optimal resource-management mechanism and policy
depend on the workload. Workloads can vary as an application moves
through phases or as applications enter and exit the system. As an
example, to obtain good performance in multiprocessor systems,
components servicing parallel applications require fundamentally
different data structures than those for achieving good performance for
sequential applications. However, when a component is created, for
example, when a file is opened, it is generally not known how it will be
used. With replacement, a component designed for sequential applications
can be used initially, and then it can be autonomically switched to one
supporting greater concurrency if contention is detected across multiple
processors.
System monitoring--Monitoring is required for autonomic systems to
be able to detect security threats, performance problems, and so on.
However, there is a trade-off between placing extensive monitoring in
the system and the performance overhead this entails. With support for
interposition, upon detection of a problem by broad-based monitoring, it
becomes possible to dynamically insert additional monitoring, tracing,
or debugging without incurring overhead when the more extensive code is
not needed. In an object-oriented system, where each resource is managed
by a different instance of an object, it is possible to garner an
additional advantage by monitoring the code managing a specific
resource.
Flexibility and maintainability--Autonomic systems must evolve as
their environment and workloads change, but must remain easy to
administer and maintain. The danger is that additions and enhancements
to the system increase complexity, potentially resulting in increased
failures and decreased performance. To perform hot swapping, a system
needs to be modularized so that individual components may be identified.
Although this places a burden on system design, satisfying this
constraint yields a more maintainable system. Given a modular structure,
hot swapping often allows each policy and option to be implemented as a
separate, independent component, with components swapped as needed. This
separation of concerns simplifies the overall structure of the system.
A modular structure also keeps data structures local to each
component. It thus becomes possible to rejuvenate software by swapping in
a new instance of the same implementation to replace the decrepit one. This
rejuvenation can be done by discarding the data structures of the old
object, then starting from scratch or a known state in the new object.
System availability--Numerous mission-critical systems require
five-nines-level (99.999 percent) availability, making software upgrades
challenging. Support for hot swapping allows software to be upgraded
(e.g., for bug fixes, security patches, new features, or performance
improvements) without having to take the system down. Telephony
systems, financial transaction systems, and air traffic control systems
are a few examples of software systems that are used in mission-critical
settings and that would benefit from hot-swappable component support.
Extensibility--As they evolve, autonomic systems must take on tasks
not anticipated in their original design. These tasks can be performed
by hot-swapped code, using both interposition and dynamic replacement.
Interposition can be used to provide existing components with wrappers
that extend or modify their interfaces. Thus, these wrappers allow
interfaces to be extended without requiring that existing components be
rewritten. If more significant changes are required, dynamic replacement
can be used to substitute an entirely new object into an existing
running system.
Testing--Even in existing relatively inflexible systems, testing is
a significant cost that constrains development. Autonomic systems are
more complicated, exacerbating this problem. Hot swapping can ease the
burden of testing the system. Individual components can be tested by
interposing an object to generate input values and examine results,
thereby improving code coverage. Delays can be injected into the system
at internal interfaces, allowing the system to explore potential race
conditions. This concept is motivated by a VLSI (very large scale
integration) technique whereby insertion of test probes across the chip
allows intermediate values to be examined. (3,4)
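The testing use of interposition described in the last item can be sketched as follows. This is a minimal illustration under assumptions: DelayInjector and Allocator are hypothetical names, and the interposer is applied by wrapping the component directly rather than through any particular hot-swapping infrastructure.

```python
import time


class DelayInjector:
    """Hypothetical test interposer: delays and records calls, then forwards them."""

    def __init__(self, inner, delay_seconds=0.0):
        self._inner = inner
        self._delay = delay_seconds
        self.log = []  # (method name, args, result) tuples for later inspection

    def __getattr__(self, name):
        # Only reached for names not defined on the interposer itself.
        method = getattr(self._inner, name)

        def forward(*args):
            time.sleep(self._delay)  # widen timing windows to help expose races
            result = method(*args)
            self.log.append((name, args, result))
            return result

        return forward


class Allocator:
    """Stand-in component under test: hands out consecutive offsets."""

    def __init__(self):
        self._next = 0

    def allocate(self, size):
        start = self._next
        self._next += size
        return start
```

Wrapping a component as DelayInjector(Allocator(), 0.001) leaves its interface unchanged for callers, while the log gives the test a view of intermediate values at that internal interface.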
Autonomically improving performance
As outlined in the previous section, autonomic computing covers a
wide range of goals, one of which is improving performance. For systems
software, the ability to self-tune to maintain or improve performance is
one of the most important goals. In this section, we discuss how hot
swapping can support and extend existing performance enhancements,
allowing the OS to tailor itself to a changing environment.
Optimizing for the common case. For many OS resources the common
access pattern is simple and can be implemented efficiently. However,
the implementation becomes expensive when it has to support all the
complex and less common cases. Dynamic replacement allows efficient
implementations of the common paths to be used when safe, and less
efficient implementations that handle the less common cases to be
switched in when necessary.
As an example, consider file sharing. Although most applications
have exclusive access to their files, on occasion files are shared among
a set of applications. In K42, when a file is accessed exclusively by
one application, an object in the application's address space
handles the file control structures, allowing it to take advantage of
mapped file I/O, thereby achieving performance benefits of 40 percent or
more. (5) When the file becomes shared, a new object dynamically
replaces the old object. This new object communicates with the file
system to maintain the control information. Other examples where similar
optimizations are possible are (a) a pipe with a single producer and
consumer (in which case the implementation of the pipe can use shared
memory between the producer and consumer) and (b) network connections
that have a single client on the system (in which case data can be
shared with zero copy between the network service and the client).
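The file-sharing case can be sketched in a few lines. This is an illustration under assumptions, not the K42 code: all class names are hypothetical, the control state is reduced to a file length, and the swap is triggered by a second opener.

```python
class LocalFileState:
    """Fast path: control state kept in the sole user's address space."""

    def __init__(self, length=0):
        self.length = length

    def append(self, nbytes):
        self.length += nbytes
        return self.length

    def export_state(self):
        return {"length": self.length}


class FileServer:
    """Stand-in for the file system that owns shared control state."""

    def __init__(self):
        self.lengths = {}
        self.requests = 0

    def append(self, name, nbytes):
        self.requests += 1
        self.lengths[name] += nbytes
        return self.lengths[name]


class SharedFileState:
    """Slow path: every update goes through the file server."""

    def __init__(self, server, name, state):
        self.server = server
        self.name = name
        server.lengths[name] = state["length"]  # transfer state from the old object

    def append(self, nbytes):
        return self.server.append(self.name, nbytes)


class OpenFile:
    """Callers hold this object; the implementation behind it is swapped on first share."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.impl = LocalFileState()
        self.openers = 1

    def add_opener(self):
        self.openers += 1
        if self.openers == 2:
            # The file is now shared: replace the local object with the shared one.
            self.impl = SharedFileState(self.server, self.name, self.impl.export_state())

    def append(self, nbytes):
        return self.impl.append(nbytes)
```

While the file has one opener, no request reaches the server; after add_opener() the same append() call is served by the shared implementation, starting from the length the local object had accumulated.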
Optimizing for a wide range of file attribute values. Several
specialized file system structures have been proposed to optimize file
layout and caching for files with different attributes. (6,7) We can
optimize the performance across the range of file attribute values by
implementing a number of components, where each component is optimized
for a given set of file attribute values, and then having the OS hot
swap between these components as appropriate.
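One way to picture attribute-driven swapping is the sketch below. It assumes file size is the only attribute of interest; the two layouts, the 32-byte threshold, and the 16-byte block size are hypothetical choices made for illustration and are not taken from the cited file system work.

```python
class InlineLayout:
    """Hypothetical layout for small files: data kept in a single buffer."""

    def __init__(self, data=b""):
        self.data = bytearray(data)

    def write(self, chunk):
        self.data += chunk

    def read_all(self):
        return bytes(self.data)

    def size(self):
        return len(self.data)


class BlockLayout:
    """Hypothetical layout for large files: data split into fixed-size blocks."""

    BLOCK = 16  # illustrative block size in bytes

    def __init__(self, data=b""):
        self.blocks = []
        self.write(data)

    def write(self, chunk):
        # Rebuild the block list; simple rather than efficient.
        data = self.read_all() + bytes(chunk)
        self.blocks = [data[i:i + self.BLOCK] for i in range(0, len(data), self.BLOCK)]

    def read_all(self):
        return b"".join(self.blocks)

    def size(self):
        return sum(len(block) for block in self.blocks)


SMALL_LIMIT = 32  # hypothetical threshold in bytes


def choose_layout(size):
    """Policy: pick the component class suited to the current attribute value."""
    return InlineLayout if size <= SMALL_LIMIT else BlockLayout


class File:
    def __init__(self):
        self.layout = InlineLayout()

    def write(self, chunk):
        self.layout.write(chunk)
        wanted = choose_layout(self.layout.size())
        if not isinstance(self.layout, wanted):
            # Hot swap: build the better-suited component from the old one's data.
            self.layout = wanted(self.layout.read_all())

    def read_all(self):
        return self.layout.read_all()
```

Keeping the policy in choose_layout() separate from the layouts means a further component for another attribute range could be added without changing the existing ones.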
COPYRIGHT 2003 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.