
Mining for information gold: data mining offers the RIM professional an opportunity to contribute to knowledge discovery in databases in a substantial way.


by Firestone, Joseph M.
Information Management Journal • Sept-Oct 2005

During the late 1980s, several trends in computing led to the development of large, physically centralized, structured databases called data warehouses, which were intended for decision support. These trends included the emergence of client-server technology, the growing popularity of structured query language (SQL), the seriousness of the "islands of information" problem, and the inaccessibility of much of the structured information "hidden away" in both legacy and SQL transactional databases. SQL querying alone, however, could not deliver the hoped-for information value. The 1990s brought both the rapid growth of data warehousing and the development and spread of new technologies for getting useful information out of those surprisingly unwieldy first-generation data warehouses.

One of these new technologies is data mining, a term based on the idea that very large databases are "mountains" of information that can be "mined" for "nuggets" of great value if the right technology is applied. During the 1980s, and increasingly during the 1990s, data mining technology became available in the form of statistical and artificial intelligence-based models and computing algorithms. At the same time, new software technology was being developed for integrating distributed systems based on object and web technology. The result of this confluence of need and technology has been a continually growing data mining industry, with scores of new companies selling to large, mid-sized, and even small organizations. Sectors interested in data mining include banking, medicine, insurance, retailing, and government, and data mining supports many goals, such as reducing costs, enhancing or reusing research, increasing sales, and detecting fraud.

The image suggested by the term "data mining" is attractive, but it may tell little to records and information management (RIM) professionals who need to know what data mining means for them. RIM managers need answers to the following questions:

* What is the process context of data mining?

* What is its value for RIM managers?

* What is the relationship between data mining and knowledge discovery in databases (KDD)?

* How does one get started in a data mining process?

* In what direction is this fast-moving field going?

The most compelling reason for RIM managers to take an interest in data mining is simply this: the "data" in "data mining" are, for the most part, records created in the normal course of business of any organization. Records, then, are the structured data foundation of the data mining process.

What Is Data Mining?

Definitions of data mining abound, and they vary among practitioners. (See Sidebar, "Definitions of Data Mining" on page 50.)

Selecting just one of the definitions is not as important as realizing that people will use the term data mining in at least the four ways described in the sidebar. It will be up to information managers to decide which meaning their organization assigns to it. Definition 3 is used in this article because it has the advantage of distinguishing "data mining" from traditional analyses by emphasizing its automated character in generating patterns and relationships. It also clearly distinguishes data mining from knowledge discovery by emphasizing the much broader character of KDD as an overarching process, including steps distinct from data mining and relying more heavily on human interaction.

What Is the Process Context of Data Mining?

The process context of data mining is the more comprehensive process of KDD, within which data mining occurs. KDD starts with problems: seeking them in routine situations, recognizing them, and clearly articulating them. It continues with gathering information about a problem and its potential solutions. At that point, hypotheses or models central to the solution are developed. There are many ways of developing models that do not involve data mining, including intuition, literature review, mathematical modeling, and facilitation processes, and this remains true even when statistical and modeling techniques are used as part of the KDD process. At this point, however, one can choose to apply automated analysis to an organization's very large database (that is, data mining) as an initial method of arriving at alternative patterns and/or relationships.

Once that choice is made, the steps of selecting, pre-processing, and transforming data, as well as selecting data mining tools, must be completed before data mining itself can be performed. And once data mining is completed, KDD is still far from done. The patterns found by data mining must still be interpreted and evaluated, and further statistical analysis and analytical modeling are frequently needed to refine, test, and evaluate them. In short, the process context of data mining is KDD, and KDD, in turn, is a knowledge life cycle that originates in a problem, proceeds through attempts to discover patterns in a number of steps that include, but go beyond, data mining, and ends with evaluating, interpreting, and selecting patterns that solve the original problem.
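The sequence of KDD steps described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a KDD tool: the function names, the toy records, and the use of simple item-pair counting as the "mining" step are all hypothetical choices made for the example. Only `mine` runs unattended; the selection rule, the support threshold, and the usefulness test passed to `evaluate` are human decisions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: the product categories each customer bought.
RAW_RECORDS = [
    {"id": 1, "items": "ski, travel"},
    {"id": 2, "items": "ski, travel, boots"},
    {"id": 3, "items": "boots"},
    {"id": 4, "items": " Ski , travel "},
    {"id": 5, "items": ""},
]

def select(records):
    """Selection: keep only records relevant to the problem (here, non-empty ones)."""
    return [r for r in records if r["items"].strip()]

def preprocess(records):
    """Pre-processing: normalize case and stray whitespace."""
    return [{"id": r["id"], "items": [s.strip().lower() for s in r["items"].split(",")]}
            for r in records]

def transform(records):
    """Transformation: represent each record as a set of items (a 'transaction')."""
    return [frozenset(r["items"]) for r in records]

def mine(transactions, min_support):
    """Automated mining step: count every item pair, keep those above min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def evaluate(patterns, is_useful):
    """Interpretation/evaluation: a human-supplied test decides which patterns matter."""
    return {p: s for p, s in patterns.items() if is_useful(p)}

transactions = transform(preprocess(select(RAW_RECORDS)))
patterns = mine(transactions, min_support=0.5)
print(evaluate(patterns, is_useful=lambda pair: "travel" in pair))  # {('ski', 'travel'): 0.75}
```

Even in this toy version, a careless choice in `select` or `preprocess` (for example, not lower-casing " Ski ") would change what the mining step finds.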

What Is Its Value for RIM Professionals?

What good is data mining to RIM professionals? Of course, it depends on the person and his or her role. The results of KDD that relies on data mining can help with very routine decisions almost anywhere in the enterprise. Mike Ferguson gives a good example in his article "Integrating Business Intelligence into the Enterprise: Part II": a bank call-center operator receives a lending recommendation on screen, produced by applying a predictive model derived through data mining to a database to develop a risk score and an associated loan recommendation. Another example is the physician who receives an alert from a prescription order entry system about possible side effects of a prescription, along with a report of the conditions under which, and the frequency with which, those side effects occur.
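In the call-center example, the data mining has already happened; what reaches the operator is the result of applying the derived model to one customer's record. A minimal sketch, assuming the model is a logistic regression; the variable names, coefficients, and threshold below are hypothetical and are not taken from Ferguson's article.

```python
import math

# Hypothetical model: logistic-regression coefficients standing in for a model
# that an earlier KDD/data mining effort might have produced. Not real values.
COEFFICIENTS = {
    "late_payments": 0.8,        # count of late payments in the past year
    "debt_to_income": 3.0,       # ratio, e.g. 0.35
    "years_as_customer": -0.15,  # negative weight: longer relationships lower the score
}
INTERCEPT = -2.0
REFER_THRESHOLD = 0.3  # hypothetical cutoff: scores at or above this go to a human

def risk_score(applicant: dict) -> float:
    """Apply the model to one customer record; returns a score in (0, 1)."""
    z = INTERCEPT + sum(w * applicant[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def recommend(applicant: dict) -> tuple[float, str]:
    """Turn the score into the on-screen recommendation the operator sees."""
    score = risk_score(applicant)
    return score, ("approve" if score < REFER_THRESHOLD else "refer to underwriter")

low_risk = {"late_payments": 0, "debt_to_income": 0.2, "years_as_customer": 5}
high_risk = {"late_payments": 3, "debt_to_income": 0.5, "years_as_customer": 1}
print(recommend(low_risk)[1])   # approve
print(recommend(high_risk)[1])  # refer to underwriter
```

Scoring a single record is cheap and routine; the expensive, judgment-heavy work lies in deriving and validating the coefficients in the first place.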

If there is a problem to solve and KDD is being considered to produce an explanatory causal or predictive model, then adding a data mining step to traditional statistical analysis can be very valuable. In the article "Putting Data Mining in Its Place," Dorian Pyle tells the story of a bank that turned to data mining when its huge direct mail marketing effort to increase loan inventory failed miserably. Data mining worked through 2.5 million accounts looking for the most profitable ones. It showed that a tiny segment, only 0.1 percent of the accounts, made up 30 percent of all people who bought ski equipment valued at $3,000 or more in a 30-day period and then later bought travel packages valued at an additional $3,000 or more. When the bank used this information to target a marketing package at 8,300 others in its database who had bought $3,000 in ski equipment within 30 days, an additional 3,300 people responded to its offer and took out an additional $3,000 loan, helping the bank increase its loan inventory by $10 million. In commenting on this case, Pyle makes the important point that this 0.1 percent segment, discovered by the brute-force approach of data mining, would probably have been missed by traditional statistical analysis because its small size makes it statistically insignificant. From the causal, predictive, and commercial points of view, however, it was highly significant, and its discovery illustrates one of the advantages of data mining over more traditional approaches in the KDD process.
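The figures in Pyle's account can be checked with simple arithmetic. The sketch uses only the numbers reported above; treating each of the 3,300 responders as taking exactly one $3,000 loan is an assumption, and it accounts for the rounded $10 million figure.

```python
ACCOUNTS_SCANNED = 2_500_000
SEGMENT_SHARE = 0.001        # 0.1 percent of accounts
OFFERS_SENT = 8_300
RESPONDERS = 3_300
LOAN_PER_RESPONDER = 3_000   # dollars; assumes one $3,000 loan per responder

segment_size = round(ACCOUNTS_SCANNED * SEGMENT_SHARE)  # size of the "tiny segment"
response_rate = RESPONDERS / OFFERS_SENT                # share of targeted customers who responded
new_loans = RESPONDERS * LOAN_PER_RESPONDER             # added loan inventory, in dollars

print(segment_size)            # 2500
print(f"{response_rate:.1%}")  # 39.8%
print(new_loans)               # 9900000
```

A segment of roughly 2,500 accounts out of 2.5 million is easy to dismiss on sample-size grounds, yet the targeted campaign built on it drew a response from nearly 40 percent of those contacted.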

Though data mining can be very useful in arriving at models, RIM managers should keep an important caution in mind: data mining is not magic. Merely taking a data set, clicking an on-screen icon, and expecting to solve a problem will not work. The KDD steps surrounding data mining involve continuous interaction between humans and computers, and how well those steps are performed depends on the skills and background knowledge of the people performing them.

During the early days of data mining, some of its exponents claimed that data mining software would generate good results even when the data miners using it were not highly skilled or trained analysts or statisticians. This notion has proved to be oversimplified. All the steps in KDD preceding and following data mining require good technical skills and business experience to perform effectively, and, in the end, it is those steps, not the computer, that determine the success or failure of the automated data mining step. When it comes to KDD, then, there is no free lunch or "magic." There are only careful and smart humans working through difficult problems with, admittedly, more leverage over their databases than they used to have.

Enhancing the Quality of Information

The job of the RIM professional is to enhance the quality of the information in the enterprise by enhancing the quality of the processes used in producing, storing, using, and integrating information within it. Much, though not all, of this information is in the form of structured information (i.e., records) and is found in enterprise databases. Data mining capability enables staff to enhance the quality of this information by facilitating knowledge discovery in these databases. Other capabilities, however, such as SQL and online analytical processing (OLAP), do not discover new patterns so much as confirm patterns already conceived and articulated by the investigator. As noted above, data mining can discover patterns and relationships that conventional statistical approaches can easily miss.
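The contrast between confirming and discovering patterns can be sketched with Python's built-in `sqlite3` module. This is a hypothetical illustration with made-up records and table and column names: the first query tests one hypothesis the analyst already holds, while `discover` mechanically enumerates every attribute combination and ranks the resulting segments. Both use SQL underneath; the difference is that in the second case the program, not the investigator, proposes the candidate patterns. Simple exhaustive enumeration stands in here for the more sophisticated algorithms real data mining tools use.

```python
import sqlite3
from itertools import combinations

# Hypothetical toy records: (region, product, bought_travel), where 1 = later bought travel.
ROWS = [
    ("north", "ski", 1), ("north", "ski", 1), ("north", "golf", 0),
    ("south", "ski", 0), ("south", "golf", 0), ("south", "golf", 1),
    ("east", "golf", 0), ("east", "ski", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (region TEXT, product TEXT, bought_travel INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)

# Confirmation: the analyst already suspects that ski buyers travel and
# writes one query to check that single, pre-stated hypothesis.
confirm_rate = conn.execute(
    "SELECT AVG(bought_travel) FROM customers WHERE product = 'ski'"
).fetchone()[0]

def discover(conn, columns=("region", "product")):
    """Discovery: for every combination of attributes, group the records by those
    attributes and rank all resulting segments by travel rate. Column names come
    from the fixed `columns` tuple, never from user input."""
    segments = []
    for k in range(1, len(columns) + 1):
        for cols in combinations(columns, k):
            col_list = ", ".join(cols)
            query = (f"SELECT {col_list}, COUNT(*), AVG(bought_travel) "
                     f"FROM customers GROUP BY {col_list}")
            for row in conn.execute(query):
                # (segment description, number of records, travel rate)
                segments.append((dict(zip(cols, row[:k])), row[k], row[k + 1]))
    return sorted(segments, key=lambda s: s[2], reverse=True)

print(confirm_rate)       # 0.5
print(discover(conn)[0])  # ({'region': 'north', 'product': 'ski'}, 2, 1.0)
```

In this toy data the top segment (northern ski buyers) covers only two of the eight records, the kind of narrow segment a hypothesis-first query would not surface unless someone already thought to ask about it.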


COPYRIGHT 2005 Association of Records Managers & Administrators (ARMA) Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

