
Using logical data models for understanding and transforming legacy business applications.


by Satish Chandra, Jackie de Vries, John Field, Howard Hess, Manivannan Kalidasan, Komondoor V. Raghavan, Frans Nieuwerth, Ganesan Ramalingam, and Justin Xue
IBM Systems Journal • July-Sept, 2006 • Technical Forum

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious.

Frederick Brooks, The Mythical Man-Month

Modifying a legacy application is typically an expensive and time-consuming process, even when the required modifications are conceptually very simple. We argue that this problem can be ameliorated by adopting an approach in which logical data models of a legacy application are used by software developers to understand, maintain, and transform the software. In addition, we outline the goals and status of the Mastery project at IBM Research, which aims to build a suite of tools for automatically extracting logical models from legacy applications, focusing initially on logical data models.

THE PROBLEM

For the past few years, our group at IBM Research has been investigating tools and techniques for analyzing and transforming legacy business applications, focusing on mainframe-based applications written in COBOL. (1) Such applications are often decades old and implement core business functionality. Yet they are difficult to update promptly in response to new business requirements, for reasons that include the following:

[] Volume of code in a typical application

[] Logical structure of code has deteriorated as updates have accumulated over time

[] Functional redundancy

[] Structure of code reflects the dated technology on which it was built

[] Scarce technical skills

Size

Legacy application portfolios, that is, complete collections of programs and related components, can be very large. For example, one IBM customer had a portfolio consisting of 700 interdependent applications, 3000 online data sets, 27,000 batch jobs, and 31,000 compilation units. The sheer volume of information in a portfolio of this size makes it impossible for any individual to understand the relationships among all of its parts.

Deterioration

The logical structure of code and data tends to deteriorate over time as a result of a continuous stream of modifications and enhancements. For large legacy applications, persistent data is the principal coupling mechanism between components of an application portfolio. Yet, as an application evolves to meet new business requirements, the structure and coherence of the data models underlying the code decay faster than the structure and coherence of the basic control and process flow through the application. Perhaps this is because it is relatively easy to add new functionality to an existing application by creating modules that manipulate new data items stored separately from the original application data, whereas refactoring the basic process flow to accommodate new requirements typically requires much more intrusive changes.

Redundancy

Over time, applications frequently accumulate a great deal of redundant code (multiple code fragments that perform the same logical function) and redundant data (data structures that represent the same information, perhaps with slight differences, and are scattered throughout the code). Reasons for this redundancy include incomplete integration of information systems following business mergers, performance-driven enhancements to the code, and quick "hacks" when adding new functionality under tight schedules.

Technology

The code structure of legacy applications often reflects the limitations of the programming languages in which they were written and of the middleware on which they were originally designed to run. In many cases, the code structure dictated by these legacy languages and middleware makes such systems more difficult to understand and evolve than they would be had they been implemented on modern platforms.

Skills

As new languages and software systems become popular, it becomes more difficult to find people with skills in legacy languages and systems.

Interest in the use of automated and semiautomated tools to analyze and transform legacy code is increasing. Such tools include program-understanding tools, tools for identifying and extracting semantically related code statements (through techniques such as program slicing (2)), tools for migrating from one library or middleware base to another, tools for integrating legacy code with modern middleware, and so on.

In the remainder of the paper, we first explain the value of logical data models and describe a number of applications of logical models to program-understanding and transformation tasks. Then we describe the Mastery project, which is concerned with developing algorithms and tools for extracting and manipulating logical data models and the source code from which they are derived. We conclude with a brief review of related work and some final comments.

VALUE OF LOGICAL DATA MODELS

The Mastery project is concerned with extracting logical models from legacy applications. These logical models are high-level abstractions of business processes and data relationships. Together with human- and machine-readable links from the models back to their physical realizations in code (we use "physical" to mean "implementation-related"), they serve as the foundation for a variety of program-understanding and transformation tasks. The initial focus of the Mastery project is on logical data models: abstractions encoding essential data relationships. In this paper we focus on applications of data models because we believe that their utility in program understanding and transformation (relative to process- and control-oriented program abstractions) has been underappreciated. Nonetheless, other kinds of logical models not covered in this paper are also valuable; the information they provide can complement logical data models for many of the applications we consider.

Logical data models are critical for understanding and transforming legacy applications. Consider the UML-style (3) logical data model depicted in Figure 1 (UML stands for Unified Modeling Language). This model describes key data structures and their interrelationships for a typical order-processing application. In this case, a batch application processes transaction records pertaining to orders for parts; processing a transaction may result in the creation of a new order for a part (New Order), the correction of an error in an existing unfulfilled order (Correct), the cancellation of an unfulfilled order (Delete), and so on.
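To make the model concrete, here is a minimal Python sketch of the entities and Transaction subtypes named above, together with a batch step that applies one transaction to the set of Orders. It is an illustration under stated assumptions, not the application's actual logic or output of the Mastery tools: the attribute names, the generated order identifiers, and the `apply` function are hypothetical, and only the three subtypes named in the text are handled.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical attribute names; Figure 1 is not reproduced, so only the
# entities and subtypes named in the text are modeled.

@dataclass
class Part:
    part_key: str

@dataclass
class Order:
    order_id: str
    part: Part
    amount: int
    fulfilled: bool = False

@dataclass
class Transaction:
    # Multiplicity 0..1: a Transaction pertains to at most one existing Order.
    order_id: Optional[str]

@dataclass
class NewOrder(Transaction):
    part: Part
    amount: int

@dataclass
class Correct(Transaction):
    new_amount: int

@dataclass
class Delete(Transaction):
    pass

# Hypothetical source of identifiers for newly created Orders.
_next_id = itertools.count(1)

def apply(orders: Dict[str, Order], txn: Transaction) -> None:
    """Apply one transaction to the in-memory set of Orders."""
    if isinstance(txn, NewOrder):
        new_id = f"O{next(_next_id)}"
        orders[new_id] = Order(new_id, txn.part, txn.amount)
        return
    # Correct and Delete must pertain to an existing, unfulfilled Order.
    existing = orders.get(txn.order_id) if txn.order_id else None
    if existing is None or existing.fulfilled:
        raise ValueError(f"no unfulfilled Order {txn.order_id!r}")
    if isinstance(txn, Correct):
        existing.amount = txn.new_amount
    elif isinstance(txn, Delete):
        del orders[existing.order_id]
    else:
        raise NotImplementedError(type(txn).__name__)
```

For example, `apply(orders, NewOrder(None, Part("P-100"), 5))` adds one Order, and a later `Delete` naming that Order's identifier removes it. A `Correct` or `Delete` that names no existing unfulfilled Order is rejected, mirroring the "unfulfilled order" condition in the text. Note that the entire update is expressed in terms of the model's entities and associations; none of the physical record layouts appear.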

[FIGURE 1 OMITTED]

The application represented by the model in Figure 1 is large (around 60,000 lines of COBOL) and complex. The complexity of the code obscures its essential functionality, which is to process different kinds of transactions pertaining to orders for parts. This functionality is expressed succinctly and at a high level of abstraction by the data model. In other words, the "business logic" of the application is concerned primarily with maintaining and updating certain relationships among persistent and transient data items; therefore, the data model embodies much of the interesting functionality of the application, even though the model contains no representation of code.

Notably, the logical data model shown in Figure 1 differs greatly from the data declarations in the source code of the application. Figure 2 shows an outline of these data declarations, with the data items linked (by dashed arrows) to the corresponding logical-data-model entities (the figure contains a relevant subset of the logical model in Figure 1). The data declarations are spread over several source files; furthermore, they reveal little about the structure of, and relationships between, the logical entities manipulated by the application. That information can be obtained only by analyzing the code that uses the data. As illustrated in Figure 2, the logical data model adds value by making explicit information that is hidden in the code, such as the following:

[] Logical entities--The logical entities manipulated by the program include Transactions (i.e., requests of various types to the system), Orders, Parts, and so forth. Physical data items (variables) correspond to these entities; for example, ORDER-BUF and ORDER-REC store Orders (as indicated by the links).

[] Logical subtypes--Transactions are of several kinds (have several subtypes), such as Delete, Correct, and New Order.

[] Associations--Entities are associated with (or pertain to) other entities, as indicated by the red arrows in Figure 1. Associations have multiplicities; for example, the labels on the association from Transaction to Order indicate that each Transaction pertains to zero or one (existing) Order and that each Order has zero or more Transactions pertaining to it on any given day.

[] Aggregation--The information corresponding to a single part is stored in two physical records, PR1-PART-REC and PR2-PART-REC, which are tied together by their PART-KEY attribute. This (perhaps historical) artifact is elided in the logical model, and both records are linked to a single "Part" entity.

[] Integrity constraints--Although our example does not illustrate it, a logical data model can also include semantic integrity constraints and data invariants (beyond those implied by multiplicities on associations), such as the constraint that the Order Amount must be positive.
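The links, aggregation, and constraint described in the list above can be sketched in Python as follows. This is a hand-written illustration under stated assumptions, not output of the Mastery tools: the link table is written by hand rather than derived from code analysis, the first part record is assumed to be named PR1-PART-REC (paired with PR2-PART-REC), records are modeled as dictionaries, and every field name other than PART-KEY is hypothetical. The last function encodes the example constraint that the Order Amount must be positive.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Physical data item -> logical entity. Hand-written here; a tool would
# derive these links by analyzing how the code uses each data item.
PHYSICAL_TO_LOGICAL: Dict[str, str] = {
    "ORDER-BUF": "Order",
    "ORDER-REC": "Order",
    "PR1-PART-REC": "Part",  # assumed name of the first part record
    "PR2-PART-REC": "Part",
}

def items_for(entity: str) -> List[str]:
    """Follow the links backward: which physical items realize an entity."""
    return sorted(name for name, ent in PHYSICAL_TO_LOGICAL.items() if ent == entity)

@dataclass
class Part:
    part_key: str
    attributes: Dict[str, object] = field(default_factory=dict)

def aggregate_parts(pr1: List[dict], pr2: List[dict]) -> Dict[str, Part]:
    """Merge records from both physical part files into one logical Part per PART-KEY."""
    parts: Dict[str, Part] = {}
    for rec in pr1 + pr2:
        key = rec["PART-KEY"]
        part = parts.setdefault(key, Part(key))
        part.attributes.update({k: v for k, v in rec.items() if k != "PART-KEY"})
    return parts

def check_order_amount(amount: int) -> None:
    """Integrity constraint from the text: the Order Amount must be positive."""
    if amount <= 0:
        raise ValueError(f"Order Amount must be positive, got {amount}")
```

The reverse lookup in `items_for` corresponds to following the dashed links in Figure 2 from a logical entity back to the data items that realize it, which is the direction a maintainer typically needs when planning a change to, say, how Parts are stored.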

[FIGURE 2 OMITTED]

APPLICATIONS OF LOGICAL DATA MODELS


COPYRIGHT 2006 All Rights Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.