
Machines in the conversation: detecting themes and trends in informal communication streams.


by Spangler, W. Scott; Kreulen, Jeffrey T.; Newswanger, James F.
IBM Systems Journal • Oct-Dec, 2006

INTRODUCTION

Since the dawn of spoken language, conversation has been the means by which ideas are developed and a consensus around those ideas obtained. The speed, range, and modes in which conversations can take place have increased with technological advancements over time. Recently, developments in the Internet and associated applications have made it possible for the scale of a single conversation to grow to one involving the simultaneous input of thousands of people. A discourse this massive poses the new challenge of properly summarizing all the thoughts generated and making them comprehensible for participants. This is the problem we address in our research.

Machines taking part in conversations is not a new idea. Conversation between man and machine has been a subject of intense interest ever since the computer was invented. The famous Turing Test (1) for machine intelligence focused on a machine being indistinguishable from a human in one-on-one conversation. One of the first artificial intelligence programs, ELIZA, (2) was a demonstration of a rudimentary conversation between a human patient and a machine "counselor". Our research takes on one small piece of the overall Turing Test problem by seeking an answer to the question, "What can computers contribute to a discourse that extends conversational content beyond what humans convey on their own?"

We believe the answer to this question lies in the text analysis of informal electronic communication streams. A computer that is recording and observing an electronic conversation among many different individuals over a period of time may be able to detect and report on overall metalevel themes and trends in the conversation, relay this information back to the conversational group, and thereby contribute to--and even influence--the course of the conversation. The theory is that in large-scale conversations, such as those taking place on Internet forums and through blogs (Web sites used in the manner of online journals), there are bound to be emergent phenomena, themes and trends that reflect common aggregate behavior that no single human reader can easily discern. This is where text-mining approaches come in: The role of the computer in the discussion can be a combination of facilitator, neutral observer, and reporter--helping each human participant to more fully understand and appreciate all of the other human participants' thoughts and ideas and helping to amplify those discussion points that seem to reflect areas of group consensus or overlapping interests. Once an electronic discussion reaches a certain critical size (e.g., those involving hundreds or even thousands of participants in a focused period of time), the need for an individual or individuals to play this role becomes readily apparent. But, as the size of the conversation grows, the sheer volume of the content makes it impractical for humans to fulfill this role successfully. Thus we believe that as conversations scale larger and larger, enabled by instant messaging and World Wide Web technology, the need for computers to be involved in analyzing the content of the conversation and contributing the findings to the conversation becomes greater and greater.

The role played by computers in furthering human discussion is just beginning to be explored in research. The unstructured nature of blogging, discussion groups, opinions, reviews, and the like creates a kind of intellectual democracy of ideas. (3) Additionally, research has shown that group editors with shared concurrent editing capability have a positive effect on brainstorming. (4) Taking this a step further, it has also been shown that directed brainstorming (5) has a positive effect on creativity in problem solving. Obviously it is important to understand the organization (6,7) of the information. Then it is necessary to understand how this organization changes and what the diffusion characteristics (8) of ideas are over time. Once the behavior over time is understood, we would then want to understand the causal nature and the influential effects of information in a network. (9) For some applications, one may want to use and model this understanding to predict future behaviors. (10)

Our research is not about inventing new text-analysis tools; it is about employing and combining existing text-mining techniques in a new way to analyze and contribute to human discourse. We have developed a systematic method and toolset, which we first described in Reference 11. This paper describes how we have taken that generic text-mining approach and applied it to large-scale conversations called Jams. (12,13) A Jam is a construct invented at IBM that allows an organization of significant size to have a discussion in an area of interest with the goal of building consensus around actionable ideas. Our previous work began to indicate the potential of this technology to help facilitate the conversation during a single IBM Jam. (14) This paper takes a much broader look at both the methodology and its application across several Jams (internal to IBM and external) and shows how the analysis techniques have evolved to meet the challenges of this particular application. The success we have had with our approaches to date shows this to be a promising area for future applications in the field of conversational analytics and human-machine interaction.

WHAT IS A JAM?

A Jam is an Internet- or intranet-based discussion and idea-stimulation vehicle. More formal than a chat room, a Jam is typically organized into a handful of separate forums (from four to seven in number), each on a different subtopic related to the overall Jam topic. The Jam is continuous, but conducted only for a limited time period (usually between 48 and 72 hours). During the event, participants can come into and leave a Jam as often as they like. Participants who register at the site can make original posts or reply to existing posts. The posts are labeled with the participant's name (anonymous contributions are not permitted). Some Jam participants may simply read the existing posts while others will enter posts without reading anyone else's thoughts. Most participants will both read what is already in the Jam and make their own contributions. As the Jam continues, themes emerge from the communication stream. These themes, detected by text mining, are posted back to the Jam periodically along with typical comments for each theme. This allows participants to see at a glance the gist of what is being said.

Moderators in each forum can highlight hot topics, referred to as Jam Alerts, as they emerge in the discussion (this is separate from the themes detected by text mining). Participants can also use full text search to browse for posts on a certain subject or for posts that particular individuals have contributed. Finally, posts can be e-mailed by Jam participants to others, perhaps encouraging them to make new contributions.

The process of Jamming at IBM has evolved over several years. At first it involved no text-mining technology at all. It used only human facilitators and asked participants to rate ideas to help analyze the event as it was happening and communicate information back to participants. Unfortunately, this system suffered an inevitable problem: The early ideas usually got the most votes. With the introduction of text-mining techniques into the more recent Jam events, each individual participant in the Jam is provided with the necessary information to "hear" the Jam as a whole.

At this writing, there have been seven Jams sponsored by IBM. This paper focuses on the three most recent Jam events that took place at different times between August 2003 and December 2005. Values Jam, a 72-hour event in 2003, involved IBM employees and explored the company's fundamental business beliefs and values. WorldJam, held the following year within IBM, studied how the IBM Values could be implemented. This 48-hour event generated over 32,000 posts. Habitat Jam, sponsored by the United Nations Habitat Initiative, the government of Canada, and IBM in 2005, was an open discussion on the Internet about the future of cities and the search for solutions to critical worldwide urban issues. During this 72-hour event, over 15,000 posts were generated from participants in 120 different countries.

UNDERSTANDING THE JAM THROUGH INTERACTIVE TEXT MINING

Although computers are quite capable of grouping documents together based on their surface characteristics (word frequency), such groupings may not always be useful. To ensure that categories make sense and make useful distinctions can require common sense knowledge and reasoning of a type not yet exhibited reliably by computer software. The involvement of a human in the role of analyst is needed to identify and discard spurious classes that are created from common features but have no underlying semantic value. This is what we mean by interactive text mining.

To play this critical role, the human data analyst must be provided with the necessary information to understand the meaning of each class. When one considers that each class may be composed of hundreds of examples and that the data frequently needs to be analyzed for multiple forums in real time, it becomes clear that powerful summarization tools are needed to communicate the meaning of each class in the taxonomy to the data analyst. Furthermore, as the data analyst finds classes that need to be modified or removed from the taxonomy, powerful editing tools are required to make changes that reflect the analyst's intent. (11,14)

Generating a taxonomy

The initial taxonomy is an important first step in helping the human analyst make sense of a large set of documents quickly and accurately. Our methodology provides two main alternatives for taxonomy generation: K-means clustering (using a set of randomly selected k centroids--average term vectors--to generate clusters) and cohesive keyword clustering (generating clusters based on specific words or phrases selected on the basis of a cohesion metric). We employed the K-means clustering method for the two IBM Jams and the cohesive keyword method for Habitat Jam.

Taxonomy generation through clustering

In cases where the user has no preconceived idea about what categories the document collection should contain, text clustering may be used to create an initial breakdown of the documents into clusters, grouping together documents having similar word content.

To facilitate this process we represent the documents in a vector space model. We represent each document as a vector of weighted frequencies of the document features (words and phrases). (15) We use the txn weighting scheme, also known as normalized term frequency. (16) This scheme emphasizes words with high frequency in a document and normalizes each document vector to have a unit Euclidean norm, i.e., the magnitude of each feature vector is 1.0. For example, if a document were simply the sentence, "We have no bananas, we have no bananas today," and the dictionary consisted of only two terms, "bananas" and "today", then the unnormalized document vector would be [2, 1] (to indicate two "bananas" and one "today"), and the normalized version would be

$[2/\sqrt{5},\ 1/\sqrt{5}]$.
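The banana example can be checked with a few lines of Python. This is a minimal sketch rather than the authors' implementation; the function name `normalized_tf` and the simple regular-expression tokenizer are assumptions made for illustration.

```python
import math
import re

def normalized_tf(text, dictionary):
    """Return the unit-length term-frequency vector of `text` over `dictionary`."""
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [tokens.count(term) for term in dictionary]
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        # A document with no dictionary terms stays the zero vector.
        return [0.0] * len(dictionary)
    return [c / norm for c in counts]

vec = normalized_tf(
    "We have no bananas, we have no bananas today",
    ["bananas", "today"],
)
print(vec)  # the two components are 2/sqrt(5) and 1/sqrt(5)
```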

The words and phrases that make up the document feature space are determined by first counting which words occur most frequently in the text (in the most documents). A standard stop-word list is used to eliminate words such as "and", "but", and "the". (17) The top N words are retained in the first pass, where the value of N may vary depending on the length of the documents, the number of documents, and the number of categories to be created. Typically N = 2,000 is sufficient for 10,000 short documents of about 200 words to be divided into 30 categories. (Note that 30 categories were chosen based on user feedback concerning how many categories they could readily contemplate during analysis.) After selecting words in the first pass, we make a second pass to count the frequency of phrases that occur using these words. A phrase is considered to be two consecutive words occurring without intervening non-stop words. We again prune to keep only the N most frequent words and phrases. This becomes the feature space. The documents are then indexed by their feature occurrences (i.e., word count) in a third pass through the data. The user may edit this feature space as desired to improve clustering performance. For instance, the user can add particular words and phrases deemed important, such as named entities like "International Business Machines". Stemming (reducing words to their roots so that different forms of the same word are selected) is usually incorporated to create a default synonym table that the user may edit. (18)
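The first two passes described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the stop-word list is a tiny placeholder, stemming is omitted, and the names `tokenize` and `build_feature_space` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper uses a standard list).
STOP_WORDS = {"and", "but", "the", "a", "of", "to", "in", "is"}

def tokenize(text):
    """Lowercase words with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_feature_space(docs, n):
    """Return the n most frequent words and two-word phrases by document count."""
    # Pass 1: count, for each word, the number of documents containing it.
    word_df = Counter()
    for doc in docs:
        word_df.update(set(tokenize(doc)))
    top_words = {w for w, _ in word_df.most_common(n)}

    # Pass 2: count phrases, i.e. pairs of retained words that are adjacent
    # once stop words have been dropped.
    phrase_df = Counter()
    for doc in docs:
        toks = tokenize(doc)
        phrase_df.update({
            f"{a} {b}" for a, b in zip(toks, toks[1:])
            if a in top_words and b in top_words
        })

    # Prune words and phrases together to the n most frequent features.
    combined = Counter({w: word_df[w] for w in top_words})
    combined.update(phrase_df)
    return [feature for feature, _ in combined.most_common(n)]

features = build_feature_space(
    [
        "customer satisfaction is key",
        "customer satisfaction and the customer",
        "the customer relationship",
    ],
    n=3,
)
print(features)
```

A third pass would then index each document by its counts over the returned features.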

For categorization, we employ the K-means algorithm, (19,20) using a cosine similarity metric (21) to partition the documents into k disjoint clusters automatically. The algorithm is very fast and easy to implement. See Reference 21 for a detailed discussion of various other text-clustering algorithms. The K-means algorithm produces a set of disjoint clusters and a centroid for each cluster that represents the cluster mean. Typically k is initially set to 30 for the highest level of the taxonomy, though the user may adjust this if desired. The initial taxonomy assigns each document to only one category (cluster). After clustering is complete, a final merging step takes place. In this step, two or more clusters dominated by the same keyword (dominated means that 90 percent of the examples contain this keyword) are merged into a single cluster, and a new centroid is calculated based on the combined example set. We do this to avoid arbitrarily separating similar examples into different subsets before the analyst evaluates the class as a whole.
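A compact sketch of K-means with cosine similarity is given here for orientation. It is not the authors' implementation: the fixed iteration count, the `seed` parameter, and the function names are assumptions, and the final keyword-based merging step is left out.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors, which is just their dot product."""
    return sum(x * y for x, y in zip(a, b))

def kmeans_cosine(docs, k, iterations=10, seed=0):
    """Partition docs into k disjoint clusters; returns (assignment, centroids)."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Start from k randomly chosen documents as the initial centroids.
    centroids = [list(d) for d in rng.sample(docs, k)]
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs
        ]
        # Recompute each centroid as the mean of its members. The mean is
        # re-normalized so the dot product above remains a cosine; this does
        # not change which centroid is nearest to a document.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return assignment, centroids

assignment, centroids = kmeans_cosine(
    [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], k=2
)
print(assignment)
```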

To help the analyst understand the meaning of each cluster, the system names each document category. Cluster naming is not an exact science, but our method attempts to describe the cluster as succinctly as possible without missing any important constituent components. The first rule of naming is that if a single term dominates a cluster, then this term is given as the cluster name. If no term dominates, then the most frequent term in the cluster becomes the first word in the name and the remaining set of examples (those not containing the most frequent term) are analyzed to find the dominant term. If a dominant term for the remaining examples is found, then this term is added to the name (separated by a comma), and the name is complete; otherwise, the process continues for up to four words. Beyond four words, we simply call the class "Miscellaneous".
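The naming rules can be expressed as a short loop. The sketch below assumes that "dominates" carries the 90 percent meaning given for the merging step and that each document is reduced to its set of dictionary terms; `name_cluster` and its parameters are hypothetical names.

```python
from collections import Counter

def name_cluster(docs, dominance=0.9, max_terms=4):
    """Name a cluster from its documents, each given as a set of terms."""
    remaining = list(docs)
    name = []
    while remaining and len(name) < max_terms:
        # Number of remaining documents that contain each term.
        df = Counter(term for doc in remaining for term in doc)
        if not df:
            break
        # Most frequent term; ties are broken alphabetically for determinism.
        term, count = max(df.items(), key=lambda kv: (kv[1], kv[0]))
        name.append(term)
        if count >= dominance * len(remaining):
            # The term dominates what is left, so the name is complete.
            return ", ".join(name)
        # Otherwise continue with the examples not containing this term.
        remaining = [doc for doc in remaining if term not in doc]
    return "Miscellaneous"

print(name_cluster([{"customer"}] * 9 + [{"price"}]))
print(name_cluster([{"world"}] * 5 + [{"develop"}] * 5))
```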

Taxonomy generation through cohesive terms

During early Jams in which we used our text-mining approach, one piece of feedback we received was that the initial categorization was often difficult to interpret, making the process of refining the categories painfully slow. One drawback of the K-means clustering approach is that it frequently creates categories that are difficult for a human being to interpret. Approaches to cluster naming attempt to address this issue by adding more and more terms to a name to capture the complex concept that is being modeled by a centroid. An example from our own Values Jam of a difficult cluster name would be "world, specific, develop, e-business". Unfortunately, this approach puts the onus on the human interpreter to make sense of what the list of words means and how it relates to the entire set of examples contained in the category.

To address this problem and speed the taxonomy editing process by starting with category names that are easier to comprehend, we developed a new strategy (described here for the first time) for document categorization based on categories centered around selected individual terms in the dictionary. We then employ a single iteration of K-means to the generated categories to refine the membership so that documents which contain more than one of the selected terms can be placed in the category best suited to the overall term content of the document. Note that the alternative strategy of putting such documents in more than one category (i.e., multiple membership) is less desirable because it increases the average size of each category and defeats the purpose of summarization by the divide-and-conquer strategy inherent in document clustering. Creating multiple copies of documents that match more than one category would be multiplying instead of dividing. Once the clusters are created, we name each one, using the single term that created it, thus avoiding the complex name problem associated with K-means clusters.

Selecting which terms to use for generating categories is critically important. Our approach is to rank all discovered terms in the data set based on a normalized measure of cohesion calculated using

$$\mathrm{cohesion}(T, n) = \frac{\sum_{x \in T} \cos(\mathrm{centroid}(T), x)}{|T|^{n}},$$

where T is the set of documents that contain a given term, centroid(T) is the average vector of all these documents, and n is a parameter used to adjust for variance in category size (typically n = 0.9). The cosine distance between document vectors is defined to be

$$\cos(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|}.$$

Terms that score relatively high with this measure tend to be those with a significant number of examples having many words in common. Adjusting the n parameter downward tends to surface more general terms with larger matching sets, while adjusting it upward gives more specific terms.
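To make the cohesion measure concrete, here is a direct transcription of the two formulas into Python. The function names are illustrative, and documents are assumed to be plain lists of feature weights.

```python
import math

def cosine(x, y):
    """cos(X, Y) = X . Y / (||X|| ||Y||), with 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average vector of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cohesion(vectors, n=0.9):
    """Sum of cosines to the centroid, divided by |T| raised to the power n."""
    c = centroid(vectors)
    return sum(cosine(c, x) for x in vectors) / (len(vectors) ** n)

tight = cohesion([[1, 0], [1, 0]])
spread = cohesion([[1, 0], [0, 1]])
print(tight, spread)  # the tight set scores higher than the spread set
```

With n = 1 the measure is simply the average cosine to the centroid; values of n below 1 shrink the denominator more for larger sets, which is why lowering n favors general terms.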

The algorithm selects enough of the most cohesive terms to get 80 to 90 percent of the data categorized. Terms are selected in decreasing order of cohesion, skipping those terms in the list that do not add a significant number (e.g., more than three) of additional examples to those already categorized with previous terms. The algorithm halts when at least 80 percent of the data has been categorized; the remaining uncategorized examples are placed in a Miscellaneous category. The resulting categories are then refined using a single iteration of K-means (i.e., each document is placed in the category of the nearest centroid as calculated by the term membership just described).
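The greedy selection loop might look like the following sketch. It assumes the terms arrive already ranked by cohesion and that `docs_by_term` maps each term to the set of document identifiers containing it; both names, and the exact stopping rule, are illustrative readings of the description rather than the authors' code. The single K-means refinement pass is not shown.

```python
def select_terms(ranked_terms, docs_by_term, total_docs, target=0.8, min_new=3):
    """Pick category terms in cohesion order until `target` coverage is reached."""
    covered = set()
    selected = []
    for term in ranked_terms:
        new_docs = docs_by_term[term] - covered
        # Skip terms that do not add more than `min_new` uncategorized examples.
        if len(new_docs) <= min_new:
            continue
        selected.append(term)
        covered |= new_docs
        if len(covered) >= target * total_docs:
            break
    return selected, covered

docs_by_term = {
    "a": set(range(0, 10)),   # documents 0..9
    "b": set(range(8, 12)),   # adds only documents 10 and 11
    "c": set(range(10, 18)),  # adds documents 10..17
    "d": set(range(18, 20)),
}
selected, covered = select_terms(["a", "b", "c", "d"], docs_by_term, total_docs=20)
miscellaneous = set(range(20)) - covered
print(selected, sorted(miscellaneous))
```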

While this approach does not completely eliminate the need for taxonomy visualization and editing by an analyst (as described in the following sections), it does make the process much less cumbersome by creating categories that are, for the most part, fairly easy to comprehend immediately. In practice, this cut the time required to edit each taxonomy by about half (from around 30 minutes to around 15 minutes per forum in the Jam).

Viewing the taxonomy

Before analysts can begin editing a taxonomy, they must first understand the existing categories and their relationships. In this section, we describe our strategy to communicate the salient characteristics of a document taxonomy to the user.

Our primary representation of each category is the centroid. (19) The distance metric employed to compare documents to each other and to the category centroids is the cosine similarity metric. (21) As we describe later in the section "Editing the taxonomy," we are not rigid in requiring that each document belong to the category of its nearest centroid, nor do we strictly require every document to belong to only one category.

Summaries

Because we cannot expect the analyst to have time to read through all of the individual documents in a category, summarization is an important tool to help the user understand what a category contains. Summarization techniques based on extracting text from the individual documents (22) were found to be insufficient in practice for the purpose of summarizing an entire document category, especially when the theme of that category covered diverse elements. Instead, we employ two different techniques to summarize a category. The first is a feature bar chart. This chart has an entry for every dictionary term (feature) that occurs in any document of the category. Each entry consists of two bars, a red bar to indicate the percentage of the documents in the category that contain the feature and a blue bar to indicate how frequently the feature occurs in the background population of documents from which the category was drawn. The bars are sorted in decreasing order of the difference between blue and red. Thus the most important features of a category are shown at the beginning of the chart with their relative importance indicated by the size of the bars.
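The numbers behind such a feature bar chart are simple to compute. In this sketch each document is a set of dictionary terms, and the sort key is assumed to be the in-category percentage minus the background percentage, so that over-represented features come first; the function name `feature_bars` is hypothetical.

```python
def feature_bars(category_docs, all_docs):
    """Return (term, in_category_share, background_share) rows, most distinctive first."""
    terms = set().union(*category_docs)
    rows = []
    for term in terms:
        red = sum(term in d for d in category_docs) / len(category_docs)
        blue = sum(term in d for d in all_docs) / len(all_docs)
        rows.append((term, red, blue))
    # Largest (red - blue) first; ties are broken alphabetically.
    rows.sort(key=lambda r: (-(r[1] - r[2]), r[0]))
    return rows

all_docs = [
    {"customer", "price"},
    {"customer", "service"},
    {"price"},
    {"team"},
]
rows = feature_bars(all_docs[:2], all_docs)
for term, red, blue in rows:
    print(f"{term:10s} category={red:.0%} background={blue:.0%}")
```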

The second technique is a dynamic decision tree representation that describes the feature combinations that define the category. This tree is generated in the same manner as a binary ID3, (23,24) selecting at each decision point the attribute that is most helpful in splitting the document universe so that the two new classes created are most nearly pure category and pure noncategory. Each feature choice is made dynamically as the user expands each node until a state of purity is reached or when no additional features will improve the purity. The result is essentially a set of classification rules that define the category to the desired level of detail. At any point the user may select a node of the decision tree to see either all the documents at the node, all the in-category documents at the node, or all the noncategory documents at the node. The nodes are also color coded: red for a node whose membership is 50 percent or more in-category and blue for a node whose membership is less than 50 percent in-category. This display gives users an in-depth definition of the class in terms of salient features and lets the analyst readily select various category components for further study.
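One expansion step of such a tree can be sketched with the standard ID3 information-gain criterion. This is an assumption about the details: the text only says the tree is generated in the manner of a binary ID3, and the names `entropy` and `best_split` are illustrative.

```python
import math

def entropy(pos, total):
    """Binary entropy (in bits) of a node with `pos` in-category documents."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(docs, labels, features):
    """Pick the feature whose presence/absence split gains the most information.

    docs: list of term sets; labels: True for in-category documents.
    Returns (feature, gain), or (None, 0.0) if no feature improves purity.
    """
    base = entropy(sum(labels), len(labels))
    best, best_gain = None, 0.0
    for feature in sorted(features):
        with_f = [l for d, l in zip(docs, labels) if feature in d]
        without = [l for d, l in zip(docs, labels) if feature not in d]
        remainder = (
            len(with_f) * entropy(sum(with_f), len(with_f))
            + len(without) * entropy(sum(without), len(without))
        ) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = feature, gain
    return best, best_gain

docs = [{"customer", "crm"}, {"customer", "survey"}, {"price"}, {"team", "survey"}]
labels = [True, True, False, False]
feature, gain = best_split(docs, labels, {"customer", "survey", "price"})
print(feature, gain)
```

Calling `best_split` again on the documents at each child node, only when the user expands it, gives the dynamic behavior described above.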

Visualization

We employ a visualization strategy to understand how two or more categories at the same level of the taxonomy relate to each other. The idea is to visually display the term vector space model for each document so that the documents will appear as points in space. The result is that documents containing similar words occur near each other in the visual display. If the vector space model were two dimensional, this would be straightforward: We could simply draw the documents as points on an X,Y scatter plot. The difficulty is that the document vector space is of much higher dimension. In fact, the dimensionality is the size of the feature space (dictionary), which is typically thousands of terms. Therefore, we need a way to reduce the dimensionality from thousands to two in such a way as to retain most of the relevant information. Our approach uses the CViz method, (25) which relies on three category centroids to define the plane of most interest and to project the documents as points on this plane (by finding the intersection with a normal line drawn from point to plane). The selection of which categories to display in addition to the selected category is based on finding the categories with the nearest centroid distance to the selected category. The documents displayed in such a plot are color coded according to category membership. The centroid of the category is also displayed. An example of the resultant plot is shown in Figure 1. Such a plot is a valuable way to discover relationships among neighboring concepts in a taxonomy. For instance, it might reveal overlaps that require further investigation.
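The projection step can be illustrated with basic vector arithmetic. The sketch below builds an orthonormal basis for the plane through three centroids and drops each document perpendicularly onto it; it is a plain orthogonal projection written for illustration and is not taken from the CViz reference. The three centroids are assumed not to be collinear.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale v to unit length (v must be non-zero)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def plane_basis(c1, c2, c3):
    """Orthonormal basis (u1, u2) of the plane through three centroids."""
    u1 = unit(sub(c2, c1))
    w = sub(c3, c1)
    along = dot(w, u1)
    # Remove the component of w along u1 (Gram-Schmidt), then normalize.
    u2 = unit([wi - along * ui for wi, ui in zip(w, u1)])
    return u1, u2

def project(x, origin, u1, u2):
    """2-D coordinates of the foot of the perpendicular from x to the plane."""
    d = sub(x, origin)
    return dot(d, u1), dot(d, u2)

c1, c2, c3 = [0, 0, 0], [1, 0, 0], [0, 1, 0]
u1, u2 = plane_basis(c1, c2, c3)
point = project([2, 3, 5], c1, u1, u2)
print(point)  # the third coordinate is discarded by the projection
```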

[FIGURE 1 OMITTED]

Sorting examples

When a user wants to study the examples in a category to understand the essence of the category, it is important that the examples not be chosen at random. A random selection can sometimes lead to a skewed understanding of the category content, especially if the sample is small compared with the size of the category (often the case in practice). To overcome this potential problem, our software enables examples to be sorted based on the criteria of most typical first or least typical first. This translates in vector space terms to sorting in order of distance from the category centroid (i.e., the most typical example is closest to the centroid, the least typical example is farthest from the centroid). The advantage of sorting in this way is twofold. Reading documents in the most typical order can help the user quickly understand what the category is generally about without having to read a large sample of documents in the category; reading the least typical documents can help the user understand the scope of the category and determine if there is conceptual purity.
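Sorting by typicality reduces to ranking documents by their similarity to the category centroid, as in this minimal sketch (function names are illustrative; cosine similarity is assumed as the closeness measure, in line with the rest of the method).

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def sort_by_typicality(vectors, most_typical_first=True):
    """Return document indices ordered by similarity to the category centroid."""
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    similarity = [cosine(v, centroid) for v in vectors]
    return sorted(
        range(len(vectors)),
        key=lambda i: similarity[i],
        reverse=most_typical_first,
    )

category = [[1, 0], [1, 0.1], [0, 1]]
print(sort_by_typicality(category))                            # most typical first
print(sort_by_typicality(category, most_typical_first=False))  # least typical first
```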

Editing the taxonomy

Once the analyst understands the meaning of the classes in the taxonomy and their relationship with one another, the next step is to provide tools for rapidly changing the taxonomy to reflect the needs of the application. Our goal here is not to produce a perfect taxonomy for every point of view because such a taxonomy may not exist or may require too much effort to obtain. Instead we want to focus the user's efforts on creating a natural taxonomy that can summarize major themes in the discussion, thus eliminating any categories that the system may have created that do not make sense as discussion themes. This might be due to a centroid forming around a concept that is syntactically similar but has different meanings in different contexts. For example, a cluster created around the word "customer" might be based on two types of comments: one set dealing with customer relationship management applications and another set dealing with customer satisfaction issues. In some cases such changes can be made at the category level; in other cases a more detailed modification of category membership may be required. Our tool provides capabilities at every level of a taxonomy to allow the user to make the desired modifications with a simple point and click.

Category-level changes

Category-level changes involve modifying the taxonomy at a macro level without direct reference to individual documents within each category. Categories can be merged or deleted.

Merging two classes means creating a new category that is the union of two or more previously existing category memberships. A new centroid is created that is the average of the combined examples. The user gives an appropriate name to the new category.

Deleting a category (or categories) means removing the category and its children from the taxonomy. This, however, may have unintended consequences because all the examples that formerly belonged to the deleted category must now be placed in a different category at the current level of the taxonomy. To make this decision more explicit for the user, we introduced a pie chart that shows all of the secondary classes and the percent of the category's documents that would be assigned to each if the category were to be deleted. Each slice of the pie chart can be selected to view the individual documents represented by the slice. Making this information explicit allows the user, when faced with a decision concerning the deletion of a category, to arrive at an informed decision and avoid unintended consequences.
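The percentages shown in such a pie chart could be computed as below. The sketch assumes that each document of the deleted category would move to the remaining category with the most similar centroid; the function `deletion_preview` and the category names in the example are hypothetical.

```python
import math
from collections import Counter

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def deletion_preview(category_docs, other_centroids):
    """Percent of a category's documents that would move to each remaining category.

    other_centroids maps the name of each remaining category to its centroid.
    """
    targets = Counter(
        max(other_centroids, key=lambda name: cosine(doc, other_centroids[name]))
        for doc in category_docs
    )
    return {
        name: 100.0 * count / len(category_docs)
        for name, count in targets.items()
    }

preview = deletion_preview(
    [[1, 0.1], [0.9, 0], [0.1, 1], [1, 0.2]],
    {"pricing": [1, 0], "service": [0, 1]},
)
print(preview)
```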

Document-level changes

While some changes to a taxonomy may be made at the class level, others require a finer degree of control. These are called document-level changes and consist of moving or copying selected documents from a source category to a destination category. The difficult part of this operation from the user's point of view is selecting the right set of documents to move so that the source and destination categories are changed in the manner desired. To address this problem, three methods of selection are provided.

1. Selection by keywords--One of the most natural and common ways to select a set of documents is with a keyword query. The user may enter a query for the whole document collection or for a specific category. The query can contain keywords and use Boolean logic. Words that co-occur with the query string are displayed to help the user refine the query. Documents that are found by using the keyword query tool can be viewed immediately and selected one at a time or as a group to move or establish a new category.

2. Selection by sorting--Another way to select documents to move or copy is by the most typical or least typical sorting technique described earlier. For example, the documents that are least typical of a given category can be located, selected, and moved or placed in a new category.

3. Selection by visualization--The scatter plot visualization display (Figure 1) can also be a powerful tool for selecting individual or groups of documents. Groups of contiguous points (documents) can be selected by using the mouse to draw a floating box around them, and then they can be moved to a new category.

Validation

Whenever a change is made to the taxonomy, it is very important for the analyst to validate that the change has had the desired effect on the taxonomy as a whole and that no undesired consequences have resulted from unintentional side effects. (26) Our software contains a number of capabilities that allow the user to inspect the results of modifications. The goal is to ensure that all the categories are meaningful, complete, and differentiable, and that the concepts represented by the document partitioning can be carried forward automatically in the future as new documents arrive.

Direct inspection

The simplest method for validating the taxonomy is through direct inspection of the categories. The category views described earlier in the subsection "Viewing the taxonomy" provide unique tools for validating that the membership of a category is not more or less than what the category means. Looking over some of the least typical documents is a valuable way to ascertain quickly that a category does not contain documents that do not belong. Another visual inspection method is to look at the nearest neighbors of the category being evaluated through the scatter plot display. Areas of document overlap at the margins are primary candidates for further investigation and validation.

Validation metrics

Much research has been done in the area of evaluating the results of clustering algorithms. (17,27) While such measures are not entirely applicable to taxonomies that have been modified to incorporate domain knowledge, there are some important concepts that can be applied from this research. Our vector space model representation (15,16) (admittedly a coarse reflection of the documents' actual content) at least allows us to summarize a single level of the taxonomy with some useful statistics, including cohesion and distinctness. Cohesion is a measure of similarity within a category. This is the average cosine distance of the documents within a category to the centroid of that category. Distinctness is a measure of differentiation between categories. This is one minus the cosine distance of the category to the centroid of the nearest neighboring category.

These two criteria are variations on the ones proposed by Berry and Linoff: compactness and separation. (28) The advantage of using this approach as opposed to other statistical validation techniques is that these criteria are more easily computed and also readily understood by the taxonomy expert. In practice, these metrics often prove useful in identifying two potential areas of concern in a taxonomy. The first potential problem is having Miscellaneous classes. These are classes that have a diffuse population of documents with widely varying contents. Such classes may need to be split further or subcategorized. The second potential problem is when two different categories have very similar content. If two or more classes are almost indistinguishable in terms of their word content, they may be candidates for merging.

Statistical measures such as cohesion and distinctness provide a good rough measure of how well the word content of a category reflects its underlying meaning. For example, if a user-created category is not cohesive, then there is some doubt as to whether an analyst could learn to recognize a new document as belonging to that category as the word content is not well-defined. On the other hand, if a category is not distinct, then there is at least one other category containing documents with a similar vocabulary. This means that an analyst may have difficulty distinguishing into which of the two similar categories to place a candidate document. Of course, cohesion and distinctness are rough and relative metrics, so there is no fixed threshold value at which we can say that a category is not cohesive enough or lacks sufficient distinctness. In general, whenever a new category is created, we suggest that the cohesion and distinctness scores for the new category be no worse than the average for the current level of the taxonomy.
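The rule of thumb in the preceding paragraph can be expressed as a simple check. The sketch below assumes that cohesion and distinctness scores have already been computed for every category at the current level; the function name, the score values, and the choice to average over the existing categories only (excluding the new one) are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Tuple

# Maps a category name to its (cohesion, distinctness) pair.
Scores = Dict[str, Tuple[float, float]]

def meets_level_average(new_scores: Tuple[float, float],
                        level: Scores) -> Dict[str, bool]:
    """Compare a new category's scores with the averages of the level."""
    avg_cohesion = mean(c for c, _ in level.values())
    avg_distinctness = mean(d for _, d in level.values())
    return {
        "cohesion_ok": new_scores[0] >= avg_cohesion,
        "distinctness_ok": new_scores[1] >= avg_distinctness,
    }

# Hypothetical scores for one level of a taxonomy.
level = {
    "stock price": (0.62, 0.40),
    "pension": (0.70, 0.50),
    "Miscellaneous": (0.30, 0.30),
}
print(meets_level_average((0.58, 0.35), level))
```

Because the metrics are relative, a failed check is a prompt for the analyst to inspect the category, not an automatic rejection.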

Emerging themes

In addition to overall themes for each forum, it is also desirable to discover newly emerging issues in the discussion. One way to discover such themes would be to generate a new taxonomy for each of the forums based on only the most recent sample of the data.

There are several drawbacks to this approach. Not the least of these is that categories generated in this way may differ from the overall categories for reasons unrelated to the different data sample; the differences may instead be artifacts of the clustering approach, such as the fact that K-means clustering begins with a random starting point. A simpler, more reliable way to find emerging themes is to analyze the dictionary of terms across time to determine which terms are showing increased mentions. To achieve a reliable sample size, we defined Recent to be the last 10 percent of the posts in a forum, sorted chronologically. We then analyzed all the dictionary terms to determine which, if any, occurred with an unusually high frequency in the Recent set. Unusually high is determined by using a chi-squared test, which tests the independence of two discrete random variables. (29) Terms whose frequency in the Recent set would occur by chance with a probability of less than 0.01 are selected. The resulting term list is displayed to the user for further investigation by trend charts and example displays that can be used to create new document categories, which can then be published as themes.
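A minimal sketch of this emerging-term test follows. It makes several assumptions that the text does not specify: each post is reduced to the set of dictionary terms it contains, the test is Pearson's chi-squared on a 2x2 table (term present or absent, Recent or older) with no continuity correction, and only terms that are over-represented in the Recent set are reported. The function names are hypothetical. For one degree of freedom the p-value can be written with the complementary error function, which avoids any dependency outside the standard library.

```python
import math
from typing import Dict, List, Set

def chi2_p_value(a: int, b: int, c: int, d: int) -> float:
    """p-value of Pearson's chi-squared test (1 d.o.f.) on [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate table carries no evidence of dependence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def emerging_terms(posts: List[Set[str]], dictionary: Set[str],
                   recent_fraction: float = 0.10,
                   alpha: float = 0.01) -> Dict[str, float]:
    """Return {term: p-value} for terms unusually frequent in recent posts.

    `posts` must be sorted chronologically, oldest first.
    """
    n_recent = max(1, round(len(posts) * recent_fraction))
    older, recent = posts[:-n_recent], posts[-n_recent:]
    found = {}
    for term in dictionary:
        a = sum(term in p for p in recent)  # recent posts with the term
        c = sum(term in p for p in older)   # older posts with the term
        b, d = len(recent) - a, len(older) - c
        # Cross-multiplied comparison of the two rates avoids division.
        over_represented = a * len(older) > c * len(recent)
        p = chi2_p_value(a, b, c, d)
        if over_represented and p < alpha:
            found[term] = p
    return found
```

The returned terms would then be handed to the analyst for inspection through trend charts and example posts, as described above.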

CASE STUDIES

We are focusing on three major Jams that recently took place, two internal for IBM employees worldwide and one for the World Urban Forum. In this section, we describe how text analysis has been used in each of these Jams and how it has evolved to meet the demands of this new medium.

ValuesJam

ValuesJam was a 72-hour global brainstorming event on the IBM intranet, held July 29-August 1, 2003. IBMers described their experiences and contributed ideas by means of four asynchronous discussion forums. The purpose of real-time interactive text mining of the Jam was to generate forum topics that allowed participants to identify themes as they emerged in each forum and in the Jam overall, in 12-hour intervals. Total posts for this event were in excess of 8,000 over the course of the event, with one of the largest forums containing more than 3,000 posts.

Analyzing discussion forum data to produce topic areas of interest presents several challenges that an interactive text-mining approach is well-suited to address:

1. The forum analyzer must produce categories that reflect meaningful groups of posts, and these groups must not contain a significant number of extraneous or misclassified examples.

2. Each cluster of posts must be given a concise yet meaningful name.

3. When a cluster of posts is presented, a set of representative examples is needed to further explain the meaning of the cluster and direct the user to the appropriate point in the discussion.

4. The clusters need to evolve with the discussion, adding new clusters over time as appropriate to incorporate the new topics that arise without losing the old clusters and thus the overall continuity of the discussion topic list.

Clearly a completely automated solution is impractical, given these requirements, and a manual approach requiring a set of human editors to read more than 8,000 posts in 72 hours and classify them is prohibitively expensive. Interactive text mining is thus an ideal candidate for this application. During ValuesJam, different experts in each forum used our tools to develop themes for that forum, and a single primary analyst (one of the authors of this paper) helped coordinate the analysis as a whole.

Initial taxonomy generation

The first taxonomy generated for discussions in the largest ValuesJam forum was created from 1,308 posts representing 20 hours of discussion. The taxonomy took the form of a list of 24 classes, each shown with its name, size, cohesion, and distinctness.

We began by sorting the categories by their cohesion scores. This gave us a useful order in which to tackle the problem of quickly understanding the taxonomy, category by category. We viewed each category in detail, made adjustments as necessary, and gave the category a new name if needed (e.g., the category name "stock price" replaced the name "stock" given by the system). Occasionally we found clusters that were formed based on words that were not relevant to the content of the post, such as the "question,term" cluster in Figure 2. For this class, we viewed the secondary class pie chart to determine where the examples would go when the centroid was removed. We saw that they would be distributed evenly throughout the taxonomy, so we felt we could delete the centroid without ill effect.

[FIGURE 2 OMITTED]
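The check performed before deleting a centroid, seeing where its documents would go, can be approximated as follows. This is a hedged sketch and not the tool's actual logic: it assumes documents and centroids are term-count vectors, that each displaced document moves to the remaining centroid with the highest cosine similarity, and it uses hypothetical vectors.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def secondary_classes(docs: List[Vector], centroids: Dict[str, Vector],
                      removed: str) -> Counter:
    """Count where each document lands once the `removed` centroid is gone."""
    remaining = {n: c for n, c in centroids.items() if n != removed}
    return Counter(
        max(remaining, key=lambda n: cosine(doc, remaining[n]))
        for doc in docs
    )

# Hypothetical centroids and the documents of the class being deleted.
centroids = {
    "question,term": [1.0, 1.0, 1.0],
    "trust": [1.0, 0.0, 0.0],
    "stock price": [0.0, 1.0, 0.0],
}
docs = [[3.0, 1.0, 0.0], [1.0, 3.0, 0.0], [2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
print(secondary_classes(docs, centroids, removed="question,term"))
```

An even spread across the remaining classes, as in the "question,term" case, suggests that the deleted cluster was held together by words unrelated to the content of the posts.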

The Miscellaneous class required special attention. Individual dictionary terms can frequently be used to extract a common set of examples from a Miscellaneous class and create a useful separate category. In Figure 3, the category centered on the word "trust" is an example. Clicking on the red trust bar in the figure caused all examples in Miscellaneous that contained the word "trust" to be selected. These were then further edited, and a new category called "trust" was created in the taxonomy. Finally, the complete analyst-adjusted list of categories was generated.

[FIGURE 3 OMITTED]
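Pulling a keyword-centered category out of the Miscellaneous class, as done with "trust", amounts to a filter over the posts in that class. The sketch below is an assumption-laden illustration: it matches the term as a whole word, case-insensitively, in the raw post text, whereas the actual tool works from dictionary terms and lets the analyst edit the selection afterward. The function name and sample posts are hypothetical.

```python
import re
from typing import Dict, List

def split_by_term(taxonomy: Dict[str, List[str]], source: str,
                  term: str) -> Dict[str, List[str]]:
    """Move posts in `source` that mention `term` into a category named `term`."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    matched = [p for p in taxonomy[source] if pattern.search(p)]
    rest = [p for p in taxonomy[source] if not pattern.search(p)]
    updated = dict(taxonomy)  # leave the caller's taxonomy untouched
    updated[source] = rest
    updated[term] = updated.get(term, []) + matched
    return updated

# Hypothetical posts in a Miscellaneous class.
taxonomy = {
    "Miscellaneous": [
        "Trust is earned.",
        "We need better tools.",
        "Clients trust us.",
        "Trustworthy?",
    ]
}
print(split_by_term(taxonomy, "Miscellaneous", "trust"))
```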

Using our methodology and software text-analysis tools, this entire process required about a half hour of concentrated effort. We then used this information to generate reports to the ValuesJam audience. The resulting Web page report is shown in Figure 4. Selecting any of the links shown in the figure took the user to a display of ten of the most typical comments for that theme. This process was then repeated for each of the remaining forums and for the Jam as a whole. The entire reporting operation took about three to four hours.

[FIGURE 4 OMITTED]

ValuesJam emerging themes

As the Jam progressed, new topics naturally emerged. To identify these, the emerging themes analysis described earlier was especially valuable. A good example of this came late in the Jam when a breaking news story had an impact on the discussion. (30) We observed that the word "pension" occurred 51 times overall and 11 times in the last 10 percent of the data. This was deemed by the chi-squared test to be a low-probability event (P = 0.0056). The trend line for this keyword, shown in Figure 5, indicates the spike. Posts that contained the word "pension" had been decreasing as a percent of total posts, but on the last day there was a sharp increase. Looking at the text for these examples quickly revealed the cause--the breaking news story--and thus a new category was created centered on this word.

[FIGURE 5 OMITTED]
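A keyword trend line of the kind described above, posts mentioning a word as a percent of all posts per period, can be sketched as follows. The day-number bucketing, the simple tokenizer, and the sample posts are illustrative assumptions and do not reproduce the Jam data.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

def keyword_trend(posts: List[Tuple[int, str]], keyword: str) -> Dict[int, float]:
    """Percent of posts per day that contain `keyword` as a whole word."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    target = keyword.lower()
    for day, text in posts:
        totals[day] += 1
        if target in re.findall(r"[a-z']+", text.lower()):
            hits[day] += 1
    return {day: 100.0 * hits[day] / totals[day] for day in sorted(totals)}

# Hypothetical (day, text) pairs.
posts = [
    (1, "my pension plan"), (1, "stock price"),
    (2, "values"), (2, "trust"),
    (3, "Pension news"), (3, "pension change?"),
]
print(keyword_trend(posts, "pension"))
```

Normalizing by the total number of posts per day keeps a busy day from looking like a spike in interest.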

Success of interactive text mining during ValuesJam

Our interactive text-mining approach, with only one primary analyst working on a standard laptop, showed itself to be very capable of supporting real-time analysis of a discussion among thousands of users. A survey conducted after ValuesJam, with 1,248 respondents, indicated that 42 percent of the participants used the theme pages to enter the Jam. Of those who used this feature, 72 percent found it to be important and 61 percent found it to be satisfactory--the top two possible ratings. Only 10 percent were dissatisfied.

IBM WorldJam

The purpose of IBM WorldJam, a companywide Jam held in 2004, was to encourage ideas about how IBM could best implement its values. The process we used to generate Jam Themes was much the same as in ValuesJam. After the Jam, a survey was conducted to determine the success of the event and the usefulness of the various tools involved. One of these tools was text analysis as presented by Jam Theme pages. As the data in Table 1 shows, only the WorldJam 2004 search tool came close to Jam Themes when rated for importance. The frequency of use and satisfaction scores for Jam Themes surpassed those of the other Jam discovery tools.

World Urban Forum HabitatJam

In HabitatJam, we exposed the Jam technical infrastructure to a non-IBM audience for the first time. The purpose of the Jam was to identify topics for discussion at an upcoming World Urban Forum. This was a 72-hour event that included participants from 120 countries. Posts were submitted in both French and English in seven forums.

During the event, text analysis was done three times a day for each of the seven forums and three times a day for the Jam as a whole. The English forums were all done by a single human analyst using the techniques described in this paper (including clustering using cohesive terms). Despite the fact that nearly twice as many themes were generated in the same time period for this Jam as for WorldJam, the post-event survey indicates that the quality (as measured by importance and satisfaction scores) did not degrade. This survey was conducted following the Jam (n = 1,374 respondents). The results, presented in the lower section of Table 1, indicate that for this event, the Jam Themes Web page was both the most important and the most satisfactory discovery tool available during the event.

The user feedback scores from all three Jams, with different audiences and topic areas, show that Jam Themes are the most frequently used tool for navigation and discovery--even more so than text search. As well, the overall satisfaction was higher for Jam Themes than for any other Jam tool. While there is still room for improvement, these results indicate that our text-mining approach has significant value for discussion participants.

SUMMARY AND FUTURE WORK

We have demonstrated the value of using text-mining techniques to facilitate and enhance large-scale electronic dialogs. While it is true that the computer does not take part directly in the discussion in the same way as a human participant would, it still, in conjunction with the human analyst, plays a critical role in generating content that furthers the discussion. In fact, the computer plays a role in Jam conversations that might be played by a human being in a much smaller conversation--that of facilitator or moderator--by helping to ensure that all points of view are heard and taken into account by all participants.

A problem that still needs to be addressed is that those themes that become established as the Jam progresses may tend to become document silos, ignoring potential relationships among and between other themes. We are experimenting with semantic browsing approaches that might help alleviate this problem for users in the future.

Many of the techniques described in this paper can be applied in fields beyond Jams. Market intelligence, through which businesses seek to find actionable insights for economic advantage, is a common application. (31) This can also be accomplished by using alternative information sources such as captured information from customer interactions during sales and service. Additionally, in large enterprises there is abundant text information available in e-mails, documents, and databases that can be leveraged in a similar way.

The planned future direction of our work is to minimize the need for a human analyst or perhaps ultimately eliminate it, leaving the computer alone to play the role of Jam Theme generator and conversation facilitator. This will require more precise text category naming strategies and intelligent pruning techniques for removing categories that are not meaningful or helpful in summarizing a topic area. Perhaps the conversation participants themselves might be enlisted to provide feedback on categories that might be used to adjust the text categorization algorithms.

Inevitably, computers are becoming a greater and greater participant in our conversations. Through text analysis, they can tell us things that humans would find difficult or even impossible to discover on their own about what is being said. As text-analysis techniques become ever more powerful and intuitive, the role of machines in our conversations is only going to increase in the future. We look forward eagerly to hearing what they have to say.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the contributions of Mike Wing, for inventing Jams in the first place and for recognizing the important role that text mining would have to play; Kristine Lawas, for assisting in editing themes during the Jams; and Dharmendra Modha, for his text visualization and analysis insights. Finally, the authors thank the IBM Jam infrastructure team for helping to make the technical details run so smoothly during each Jam event.

Accepted for publication March 27, 2006.

CITED REFERENCES

(1.) A. M. Turing, "Computing Machinery and Intelligence," Mind 59, No. 236, 433-460 (1950).

(2.) J. Weizenbaum, "ELIZA--A Computer Program for the Study of Natural Language Communication Between Man and Machine," Communications of the ACM 9, No. 1, 36-45 (1966).

(3.) C. R. Sunstein, "Democracy and Filtering," Communications of the ACM 47, No. 12, 57-59 (2004).

(4.) C. M. Hymes and G. M. Olson, "Unblocking Brainstorming Through the Use of a Simple Group Editor," Proceedings of the ACM Conference on Computer-Supported Cooperative Work, Toronto, Ontario, Canada (1992), pp. 99-106.

(5.) E. L. Santanen, R. O. Briggs, and G.-J. de Vreede, "A Cognitive Network Model of Creativity: A Renewed Focus on Brainstorming Methodology," Proceedings of the 20th International Conference on Information Systems, Charlotte, NC (1999), pp. 489-494.

(6.) J. Kleinberg, "Bursty and Hierarchical Structure in Streams," Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Alberta, Canada (2002), pp. 91-101.

(7.) R. Kumar, J. Novak, P. Raghavan, and A. Tomkins, "Structure and Evolution of Blogspace," Communications of the ACM 47, No. 12, 35-39 (2004).

(8.) D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, "Information Diffusion Through Blogspace," Proceedings of the 13th International Conference on World Wide Web, New York, NY (2004), pp. 491-501.

(9.) D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the Spread of Influence through a Social Network," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC (2003), pp. 137-146.

(10.) D. Gruhl, R. Guha, R. Kumar, J. Novak, and A. Tomkins, "The Predictive Power of Online Chatter," Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL (2005), pp. 78-87.

(11.) S. Spangler and J. Kreulen, "Interactive Methods for Taxonomy Editing and Validation," Proceedings of the 11th International Conference on Information and Knowledge Mining, McLean