Utility computing SLA management based upon business
objectives.
by Buco, M.J.; Chang, R.N.; Luan, L.Z.; Ward, C.; Wolf, J.L.; Yu,
P.S.
It has become increasingly desirable for companies worldwide to
outsource their computing resources, e-business applications, and
business processes, to focus on the growth of their core competency and
to competitively improve their productivity by exploiting leading-edge
computing technologies. Aiming at capitalizing on this information
technology (IT) outsourcing trend, leading IT providers are exploring
cost-effective means of maximizing the utilization of shareable
computing and human resources under the utility computing model. (1)
From the customer's viewpoint, the utility computing model promises
on demand delivery of IT capabilities and cost-effective usage-based
pricing schemes. Service-quality management objectives are assured by
the provider in accordance with the established service level agreement
(SLA) contract. The customer need not know the implementation details of
the provider's service level management (SLM) processes. (2)
A utility computing SLA is an IT service contract that specifies
the minimum expectations and obligations that exist between the provider
and the customer of a utility computing service. (3,4) It includes one
or more service level components, each of which specifies the
measurement, evaluation, and reporting criteria for an agreed
service-quality standard (5) such as:
* How raw quality measures (e.g., service availability or
performance) for an agreed service component (e.g., on demand storage
provisioning) in the contract will be gathered
* How raw quality measures will be adjudicated to become qualified
quality measures (so that, for example, "service outages caused by
the customer or associated with contract maintenance provisions do not
contribute to the total service downtime calculations" (6))
* How qualified quality measures will be used to evaluate the
achieved service levels (e.g., computing monthly Lotus Notes
availability as "the monthly average availability of the Lotus
Notes application running on the e-mail servers, weighted by the number
of Lotus Notes IDs on each server")
* How service level evaluation results will be reported (e.g.,
"monthly network latency statistics can be viewed at the following
URL [uniform resource locator]")
* How unexpected disputes on service level evaluation results will
be resolved
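The adjudication and evaluation criteria above can be sketched in a few
lines of code. The following is only an illustration, not the evaluation
logic of any actual contract: the Outage and Server types, the
EXCLUDED_CAUSES set, and all figures are hypothetical. It discards outages
attributed to the customer or to contractual maintenance, then computes a
monthly availability weighted by the number of user IDs on each server, in
the spirit of the Lotus Notes example.

```python
from dataclasses import dataclass, field

# Hypothetical exclusion causes; a real contract would enumerate its own.
EXCLUDED_CAUSES = {"customer", "scheduled_maintenance"}


@dataclass
class Outage:
    minutes: float
    cause: str  # e.g. "provider", "customer", "scheduled_maintenance"


@dataclass
class Server:
    name: str
    user_ids: int  # weight: number of user IDs hosted on this server
    outages: list = field(default_factory=list)


def qualified_downtime(server):
    """Adjudication: keep only outages that count against the provider."""
    return sum(o.minutes for o in server.outages
               if o.cause not in EXCLUDED_CAUSES)


def availability(server, period_minutes):
    """Availability of one server over the evaluation period."""
    return 1.0 - qualified_downtime(server) / period_minutes


def weighted_availability(servers, period_minutes):
    """Evaluation: average availability weighted by user IDs per server."""
    total_ids = sum(s.user_ids for s in servers)
    weighted = sum(availability(s, period_minutes) * s.user_ids
                   for s in servers)
    return weighted / total_ids


# Made-up data for a 30-day month.
PERIOD = 30 * 24 * 60
servers = [
    Server("mail1", 300, [Outage(432, "provider"), Outage(600, "customer")]),
    Server("mail2", 100, [Outage(120, "scheduled_maintenance")]),
]
print(f"{weighted_availability(servers, PERIOD):.4f}")
```

In this made-up data set only the 432-minute provider outage survives
adjudication, so the 600-minute customer-caused outage has no effect on the
reported service level.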
Based upon the agreed set of quality standards (or service level
targets), ramifications of not meeting or exceeding the standards can be
explicitly included in the SLA contract. If a service level target (or a
service level objective (2)) is linked with a penalty clause for a
service level violation, it is considered to be a service level
guarantee (SLG); otherwise, it is a service level intent. The clarity,
attainability, and manageability of a service level guarantee are
usually better than those of a service level intent in a commercial SLA
contract.
A service level target in an SLA contract can be stated based upon
objective quantitative measurement of computing system availability or
performance (e.g., "monthly availability of Individual Web Server
will be no less than 99. percent") or business process efficiency
or effectiveness (e.g., "no less than 97 percent of on demand
storage provisioning requests are fulfilled within two business
days"). The refund policies for missing service level targets can
be specified relative to the service cost (e.g., "credit customer
one day of the service cost if the outsourced e-business infrastructure
is unavailable more than 15 minutes a day") or in absolute terms
(e.g., "credit customer two thousand dollars if a monthly average
network latency across the provider ISP [Internet Service Provider]
access links to the ISP's backbone is higher than 95
milliseconds"). A sample (abridged) Web hosting SLA contract is
provided in the Appendix.
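The two refund styles quoted above reduce to simple arithmetic. The sketch
below assumes hypothetical function names and a made-up daily service cost;
the 15-minute, 95-millisecond, and two-thousand-dollar figures come from the
quoted examples, and "more than" and "higher than" are read as strict
comparisons.

```python
def relative_credit(daily_service_cost, daily_downtime_minutes,
                    threshold_minutes=15):
    """Credit one day of service cost for each day whose downtime
    exceeds the threshold (relative refund policy)."""
    days_missed = sum(1 for m in daily_downtime_minutes
                      if m > threshold_minutes)
    return days_missed * daily_service_cost


def absolute_credit(monthly_avg_latency_ms, latency_limit_ms=95,
                    credit_amount=2000):
    """Credit a fixed amount if the monthly average latency exceeds
    the limit (absolute refund policy)."""
    return credit_amount if monthly_avg_latency_ms > latency_limit_ms else 0


# Hypothetical month fragment: four days of downtime readings in minutes
# and an assumed service cost of 500 dollars per day.
print(relative_credit(500, [0, 20, 15, 40]))
print(absolute_credit(97.5))
```

A day with exactly 15 minutes of downtime earns no credit under this
reading; an actual contract would need to state the boundary explicitly.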
From the viewpoint of a utility computing provider, offering a
few customer-neutral service functions atop a common service delivery
infrastructure exploits economy of scale better than pursuing a high
degree of customization of its service functions for every potential
customer. This customer-neutral approach to establishing SLA contracts
is adopted by most network and server collocation service providers. (7)
However, a competitive IT outsourcing contract normally requires
nontrivial customer-specific customization or extension of the
provider's "standard" service offerings to accommodate
the customer's unique IT outsourcing needs. When the number of such
customer-specific SLA contracts grows, the complexity increases in the
provider's service delivery environment and SLM processes. (8) A
credible study on a leading IT service provider's SLA reporting
cost, for example, has shown that several millions of dollars could be
saved annually by reducing the cost of generating the monthly reports
for 100 high-valued customer-specific SLA contracts by no more than 20
percent. Thus, it is important for a successful utility computing
service provider to be able to satisfy its customers' demand for
customer-oriented IT outsourcing functions with high-quality services
and to fulfill all of its SLA commitments based upon business objectives
(e.g., cost-effectively minimizing the exposed business impact of
missing SLA commitments).
Figure 1 highlights the business logic for service level reporting
(or SLA compliance reporting). It shows that the gathered raw quality
measures must be adjudicated first before they can be used as qualified
quality measurement. The service level evaluation step can be triggered
to generate the quality attainment reporting data for a past (completed)
service level evaluation period or for the current evaluation period.
After making changes to the input or implementation of any one of the
steps, that step as well as the following steps must be re-executed to
update the affected service level reports. Activation of such a report
update process is necessary when, for example, the qualification status
of a quality measure needs to be changed after a dispute about the
quality measure is resolved between the customer and the provider.
[FIGURE 1 OMITTED]
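The re-execution rule described for the reporting steps can be sketched as
a small pipeline that caches each step's output and restarts from the step
whose input or implementation changed. Everything here is an assumption
made for illustration (the ReportingPipeline class, the four step
functions, and the sample measures); it is not the SAM implementation.

```python
class ReportingPipeline:
    """Ordered steps; each step consumes the previous step's output."""

    def __init__(self, steps):
        self.steps = steps      # list of (name, function) pairs
        self.outputs = {}       # cached output of each step

    def run(self, raw_input=None, start_at=None):
        """Run every step, or only `start_at` and the steps after it."""
        names = [name for name, _ in self.steps]
        start = 0 if start_at is None else names.index(start_at)
        # Resume from the cached output of the step just before `start`.
        data = raw_input if start == 0 else self.outputs[names[start - 1]]
        executed = []
        for name, fn in self.steps[start:]:
            data = fn(data)
            self.outputs[name] = data
            executed.append(name)
        return executed


excluded_ids = set()  # measures disqualified by adjudication


def gather(raw):
    return list(raw)


def adjudicate(measures):
    return [m for m in measures if m["id"] not in excluded_ids]


def evaluate(measures):
    return sum(m["downtime_min"] for m in measures)


def report(total):
    return f"qualified downtime: {total} min"


STEPS = [("gather", gather), ("adjudicate", adjudicate),
         ("evaluate", evaluate), ("report", report)]

raw = [{"id": 1, "downtime_min": 30}, {"id": 2, "downtime_min": 12}]
pipeline = ReportingPipeline(STEPS)
pipeline.run(raw)
print(pipeline.outputs["report"])

# A dispute is resolved in the provider's favor: measure 2 is disqualified,
# so adjudication and the steps after it are re-executed.
excluded_ids.add(2)
print(pipeline.run(start_at="adjudicate"))
print(pipeline.outputs["report"])
```

The gathering step is not repeated in the second run because its output is
unaffected by the change in qualification status.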
In order to make timely and adequate service management decisions based
upon the provider's SLA commitments, the provider's SLA
management system must be capable of performing on demand intermediate
service level evaluations with support for adjudication processes for
contractual quality measures. The intermediate service level evaluation
results must be as accurate and current as possible so that appropriate
SLM processes can be executed in a timely manner.
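One way to picture an intermediate service level evaluation is as a
running "downtime budget" for the current evaluation period. The sketch
below assumes an availability-style target; the function names, the
99.9 percent target, and the 30-day period are hypothetical values chosen
only for the arithmetic.

```python
def availability_to_date(qualified_downtime_min, elapsed_min):
    """Availability over the elapsed part of the current period."""
    return 1.0 - qualified_downtime_min / elapsed_min


def remaining_downtime_budget(target, period_min, qualified_downtime_min):
    """Further qualified downtime (minutes) the period can absorb before
    the target is missed; a negative value means it is already missed."""
    return (1.0 - target) * period_min - qualified_downtime_min


PERIOD_MIN = 30 * 24 * 60   # hypothetical 30-day evaluation period
TARGET = 0.999              # hypothetical availability target

# Ten days into the period, with 30 qualified minutes of downtime so far.
to_date = availability_to_date(30, 10 * 24 * 60)
budget = remaining_downtime_budget(TARGET, PERIOD_MIN, 30)
print(f"availability to date: {to_date:.4%}, budget left: {budget:.1f} min")
```

With these numbers the availability to date is below the target, yet
roughly 13 minutes of budget remain for the full period. The two views
answer different questions, which is one reason intermediate results need
to be both accurate and current.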
A leading utility computing provider must proactively: (1) maximize
customer satisfaction with competitive service level reports (with
regard to both coverage and attained quality), (2) minimize the exposed
business impact of service level violations, and (3) lower the
cost-to-quality ratio of executing SLM processes. However, these goals
cannot be sufficiently and effectively supported by the existing service
quality management products and common service management practices (9)
for the following reasons:
* Existing service quality management products do not support SLA
compliance evaluations well because of their limited support of the
adjudication processes for quality measures. (10)
* Contractual and internal quality measures on computing system
health or performance are usually sent directly to service personnel
(who usually manage systems by experience) or system management agents
(which usually manage systems by infrequently changed thresholds or
condition-action rules). Most service personnel and system management
agents know little about the established SLA contracts; moreover, most
of them incorrectly equate contractual service level targets to raw
quality-monitoring thresholds.
* Service levels on efficiency or effectiveness of business
processes (e.g., resource provisioning processes and problem resolution
processes) are usually managed by a simple and static task
prioritization scheme, such as those based upon severity levels.
* When computing-resource or human-resource contention situations,
or both, are caused by unexpected system management alerts, ad hoc SLM
processes are usually used to determine which management actions should
be carried out first by the available service personnel or system
management agents. Resolution-time-based business impact assessments of
the alerts are not clearly linked with the provider's SLA
commitments and the intermediate service level evaluation results for
the affected SLA contracts.
Existing service-quality management technologies and methodologies,
therefore, need to be improved to enable unified, business-oriented
approaches to fulfilling SLA commitments. (11,12) This paper presents
the design rationale of the utility computing SLA management system
called SAM (SLA Action Manager) and our implementation experiences with
it. The SAM project aims to develop a generic SLA management framework
and an integrated set of advanced service level management technologies
that, among other benefits, do the following:
1. Enable the provider to deploy an effective means of capturing
and managing SLM-related contractual data as well as the provider's
internal management data.
2. Enable the provider and the customer to review and analyze
intermediate service level attainment reports on demand.
3. Assist service personnel and service management agents in
ordering quality management alerts based upon the exposed business
impact over time.
4. Automate the prioritization and execution management of SLM
processes, including the assignment of SLM tasks to service personnel
using continual optimization technologies.
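As a rough illustration of ordering alerts by exposed business impact over
time, the sketch below ranks alerts by the penalty that would be incurred
if each were left unresolved for a given number of minutes. It is a simple
heuristic under stated assumptions, not the optimization technology used in
SAM: the Alert fields, the step-shaped impact function, and the sample
values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    minutes_to_violation: float  # downtime budget left on the affected SLA
    penalty: float               # credit owed if the guarantee is violated


def exposed_impact(alert, resolution_minutes):
    """Penalty exposed if the alert takes `resolution_minutes` to resolve.
    Modeled as a step function: nothing until the budget is exhausted."""
    if resolution_minutes > alert.minutes_to_violation:
        return alert.penalty
    return 0.0


def order_alerts(alerts, horizon_minutes):
    """Highest exposed impact first; ties go to the tighter deadline."""
    return sorted(
        alerts,
        key=lambda a: (-exposed_impact(a, horizon_minutes),
                       a.minutes_to_violation),
    )


alerts = [
    Alert("A", minutes_to_violation=120, penalty=5000.0),
    Alert("B", minutes_to_violation=20, penalty=2000.0),
    Alert("C", minutes_to_violation=10, penalty=0.0),  # service level intent
]
print([a.name for a in order_alerts(alerts, horizon_minutes=60)])
```

With a 60-minute horizon, alert B ranks first even though A carries the
larger penalty, because A's guarantee is not yet at risk within that time.
A severity-only scheme would not capture this distinction.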
COPYRIGHT 2004 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.