On March 8, 2001, Paul Horn, IBM Senior Vice President and Director
of Research, presented the theme and importance of autonomic computing
to the National Academy of Engineering at Harvard University. His
message was:
"The information technology industry loves to
prove the impossible possible. We obliterate barriers
and set records with astonishing regularity.
But now we face a problem springing from the very
core of our success--and too few of us are focused
on solving it. More than any other I/T problem,
this one--if it remains unsolved--will actually prevent
us from moving to the next era of computing.
The obstacle is complexity ... Dealing with
it is the single most important challenge facing the
I/T industry." (1)
One month later, Irving Wladawsky-Berger, Vice President of
Strategy and Technology for the IBM Server Group, introduced the Server
Group's autonomic computing project (then named eLiza * (2)), with
the goal of providing self-managing systems to address those concerns.
Thus began IBM's commitment to deliver "autonomic
computing"--a new companywide and, it is hoped, industry-wide
initiative targeted at coping with the rapidly growing
complexity of operating, managing, and integrating computing systems.
We do not see a slowdown in Moore's law (3) as the main obstacle
to further progress in the information technology (IT) industry.
Rather, it is the IT industry's exploitation of the technologies
in accordance with Moore's law that has brought it to the verge
of a complexity crisis. Software developers
have fully exploited a four- to six-orders-of-magnitude increase in
computational power--producing ever more sophisticated software
applications and environments. There has been exponential growth in the
number and variety of systems and components. The value of database
technology and the Internet has fueled significant growth in storage
subsystems to hold petabytes (4) of structured and unstructured
information. Networks have interconnected the distributed, heterogeneous
systems of the IT industry. Our information society creates
unpredictable and highly variable workloads on those networked systems.
And today, those increasingly valuable, complex systems require more and
more skilled IT professionals to install, configure, operate, tune, and
maintain them.
IBM is using the phrase "autonomic computing" (5) to
represent the vision of how IBM, the rest of the IT industry, academia,
and the national laboratories can address this new challenge. By
choosing the word "autonomic," IBM is making an analogy with
the autonomic nervous system. The autonomic nervous system frees our
conscious brain from the burden of having to deal with vital but
lower-level functions. Autonomic computing will free system
administrators from many of today's routine management and
operational tasks. Corporations will be able to devote more of their IT
skills toward fulfilling the needs of their core businesses, instead of
having to spend an increasing amount of time dealing with the complexity
of computing systems.
Need for autonomic computing
As Frederick P. Brooks, Jr., one of the architects of the IBM
System/360 *, observed, "Complexity is the business we are in, and
complexity is what limits us." (6) The computer industry has spent
decades creating systems of marvelous and ever-increasing complexity.
But today, complexity itself is the problem.
The spiraling cost of managing the increasing complexity of
computing systems is becoming a significant inhibitor that threatens to
undermine the future growth and societal benefits of information
technology. Simply stated, managing complex systems has grown too costly
and prone to error. Administering a myriad of system management details
is too labor-intensive. People under such pressure make mistakes,
increasing the potential for system outages with a concurrent impact on
business. Testing and tuning complex systems is also becoming more
difficult. Consider:
* It is now estimated that one-third to one-half of a
company's total IT budget is spent preventing or recovering from
crashes. (7)
* Nick Tabellion, CTO of Fujitsu Softek, said: "The commonly
used number is: For every dollar to purchase storage, you spend $9 to
have someone manage it." (8)
* Aberdeen Group studies show that administrative cost can account
for 60 to 75 percent of the overall cost of database ownership (this
includes administrative tools, installation, upgrade and deployment,
training, administrator salaries, and service and support from database
suppliers). (9)
* When you examine data on the root cause of computer system
outages, you find that about 40 percent are caused by operator error,
(10) and the reason is not that operators are poorly trained or lack
the right capabilities. Rather, it is that the complexities
of today's computer systems are too difficult to understand, and IT
operators and managers are under pressure to make decisions about
problems in seconds. (11)
* A Yankee Group report (12) estimated that downtime caused by
security incidents cost as much as $4,500,000 per hour for brokerages
and $2,600,000 for banking firms.
* David J. Clancy, chief of the Computational Sciences Division at
the NASA Ames Research Center, underscored the problem of increasing
systems complexity: "Forty percent of the group's
software work is devoted to test," he said, and added, "As the
range of behavior of a system grows, the test problem grows
exponentially." (13)
* A recent Meta Group study looked at the impact of downtime by
industry sector as shown in Figure 1.
[FIGURE 1 OMITTED]
Although estimated, cost data such as shown in Figure 1 are
indicative of the economic impact of system failures and downtime.
According to a recent IT resource survey by the Merit Project of
Computer Associates International, 1867 respondents grouped the most
common causes of outages into four areas of data center operations:
systems, networks, database, and applications. (14) The most frequently
cited causes of outages included:
* For systems: operational error, user error, third-party software
error, internally developed software problem, inadequate change control,
lack of automated processes
* For networks: performance overload, peak load problems,
insufficient bandwidth
* For database: out of disk space, log file full, performance
overload
* For applications: application error, inadequate change control,
operational error, nonautomated application exceptions
Well-engineered autonomic functions targeted at improving and
automating systems operations, installation, dependency management, and
performance management can address many causes of these "most
frequent" outages and reduce outages and downtime.
A confluence of marketplace forces is driving the industry toward
autonomic computing. Complex heterogeneous infrastructures composed of
dozens of applications, hundreds of system components, and thousands of
tuning parameters are a reality. New business models depend on the IT
infrastructure being available 24 hours a day, 7 days a week. In the
face of an economic downturn, there is an increasing management focus on
"return on investment" and operational cost controls--while
staffing costs exceed the costs of technology. To compound matters
further, there continues to be a scarcity of highly skilled IT
professionals to install, configure, optimize, and maintain these
complex, heterogeneous systems.
To respond, system design objectives must shift from the
"pure" price/performance requirements to issues of robustness
and manageability in the total-cost-of-ownership equation. As a
profession, we must strive to simplify and automate the management of
systems. Today's systems must evolve to become much more
self-managing, that is: self-configuring, self-healing, self-optimizing,
and self-protecting.
Irving Wladawsky-Berger outlined the solution at the Kennedy
Consulting Summit in November 2001: "There is only one answer: The
technology needs to manage itself. Now, I don't mean any far out AI
project; what I mean is that we need to develop the right software, the
right architecture, the right mechanisms ... So that instead of the
technology behaving in its usual pedantic way and requiring a human
being to do everything for it, it starts behaving more like the
'intelligent' computer we all expect it to be, and starts taking
care of its own needs. If it doesn't feel well, it does something.
If someone is attacking it, the system recognizes it and deals with the
attack. If it needs more computing power, it just goes and gets it, and
it doesn't keep looking for human beings to step in." (15)
What is autonomic computing?
COPYRIGHT 2003 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.