Uncovering the to-dos hidden in your in-box
by Daby M. Sow, John S. Davis II, Maria R. Ebling, Archan Misra, and
Lawrence Bergman
INTRODUCTION
The beginning of the 21st century has been marked by an explosion
of electronic information. We often find ourselves inundated with
information that reaches us through a variety of channels, including
e-mail accounts, voice-mail recorders, and most recently, text-messaging
clients on our cellular phones. This amounts to a deluge of requests,
notices, scheduling invitations, and the like. We owe the information
explosion to the proliferation of inexpensive and readily available
technology that has led to what most would agree is both a blessing and
a curse. Although information facilitates efficient business and social
networking, it has also become a major burden: humans simply cannot
absorb information at this volume.
E-mail is the most acute and widespread facilitator of this explosion,
largely for economic reasons: generating and distributing e-mail
messages costs virtually nothing. Even when we ignore spam
and focus on legitimate e-mail (from trusted and welcome senders), we
find that e-mail is still a major problem. (1-3) One user reports
receiving well over 70 legitimate messages per day. (4) Anecdotal
evidence suggests that his experience is not unusual. In an enterprise
environment, a significant portion of these legitimate e-mails are
associated with tasks that must be acted upon by the recipient.
A key challenge posed by e-mail inundation is how to effectively
manage the tasks and activities that are associated with e-mail
messages. Herein lies the goal of our work: to help users manage their
tasks effectively. We consider a task to be a particular kind of
activity. Moran defines "activity" as a set of mental or
physical actions carried out by persons. (5) Through composition,
activities can contain subactivities, which can themselves contain
subactivities. In this vein, we define a task to be an atomic-level
activity, one that contains no subactivities. We focus only on tasks
that are communicated by e-mail messages, and we assume that each message
carries at most one task. For example, the process of bidding for a
product on eBay is an activity containing many subactivities. One e-mail
associated with this process alerts the recipient that he or she has won
an auction; the task contained in this e-mail is for the recipient to
initiate a payment to the seller.
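The composition of activities described above can be pictured as a tree whose leaves are tasks. The following is a minimal sketch under that reading; the class and function names are hypothetical illustrations, not part of SCOUT, and apart from the payment task the subactivities of the eBay process are assumed.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """An atomic activity: it has no subactivities of its own."""
    description: str


@dataclass
class Activity:
    """A composite activity whose children are subactivities or tasks."""
    name: str
    children: List[Union["Activity", Task]] = field(default_factory=list)


def leaf_tasks(node: Union[Activity, Task]) -> List[Task]:
    """Collect the atomic tasks under an activity, depth first, left to right."""
    if isinstance(node, Task):
        return [node]
    tasks: List[Task] = []
    for child in node.children:
        tasks.extend(leaf_tasks(child))
    return tasks


# Hypothetical decomposition of the eBay bidding process: only the payment
# task comes from the text; the other subactivities are illustrative.
bidding = Activity("Bid for a product on eBay", [
    Activity("Place bids", [Task("Submit a bid")]),
    Activity("Win the auction", [Task("Initiate a payment to the seller")]),
])

for task in leaf_tasks(bidding):
    print(task.description)
```

Modeling a task as its own leaf type, rather than as an activity with an empty child list, makes the definition explicit: a task cannot hold subactivities.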
In this work, we focus on business managers who receive a large
number of legitimate, machine-generated e-mails every day,
such as the ones that are generated by business processes within a large
enterprise. We present here a practical solution for dealing with such
e-mail in the form of a task management tool called SCOUT, (6) which
uses contextual information about the user and the environment to
recognize, filter, sort, organize and execute tasks associated with
e-mails. By using information from pervasive sources (i.e., ubiquitous
computing devices), SCOUT alleviates some of the problems associated
with e-mail overload by presenting the core information to the recipient
in an efficient and well-organized fashion.
We hypothesize that tasks contained within e-mail messages can be
automatically identified for presentation within SCOUT. Tasks can be
contained in one of two types of e-mail messages: human-generated and
machine-generated. For our purposes, the salient difference is that
human-generated messages tend to be unstructured, whereas the contents
of machine-generated messages have a regular structure. To simplify the
problem, we focus on machine-generated messages. We assume that every
machine-generated message is associated with some business process
(e.g., the eBay bidding process or the expense reimbursement process in
an enterprise), that we only have access to e-mail messages generated by
business processes, and that other than inspecting the e-mail messages
themselves, we have no knowledge of the syntactic structure used in
these messages. Furthermore, we make no modifications to the messages
or to the business processes that generate them.
SCOUT tracks a set of registered task types, each of which
corresponds to a business process. When SCOUT identifies an e-mail
message associated with a business process, the task contained within
that e-mail message is specified in a document by using an Extensible
Markup Language (XML) dialect called TaskML. A task description contains
the following attributes:
* Type--the task type represented by a label unique to a business
process or transaction associated with the task (e.g., a bidding
transaction on eBay, a password update at the Amazon.com Web site).
* Subject--a summary description of the task (e.g., you have won
the auction)
* Person--an optional list of persons associated with the task
(e.g., a collaborator who can help complete a task)
* Deadline--an optional deadline by which the task must be
completed
* Thread--the set of related messages associated with the activity
containing this task
* Comments--free-form comments associated with the task
* Status--the state of completion of the task
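To make the attribute list concrete, here is a sketch of a task record serialized to XML. The TaskML schema itself is not given here, so every element and attribute name below (`task`, `subject`, `thread`, `message`, and so on), the `"open"` status value, and the message ID are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskDescription:
    """One task, with the attributes listed above. The XML element names
    emitted by to_xml() are illustrative, not the actual TaskML schema."""
    task_type: str                                    # Type
    subject: str                                      # Subject
    persons: List[str] = field(default_factory=list)  # Person (optional)
    deadline: Optional[str] = None                    # Deadline (optional)
    thread: List[str] = field(default_factory=list)   # Thread: related message IDs
    comments: str = ""                                # Comments
    status: str = "open"                              # Status (assumed values)

    def to_xml(self) -> str:
        root = ET.Element("task", {"type": self.task_type})
        ET.SubElement(root, "subject").text = self.subject
        for person in self.persons:
            ET.SubElement(root, "person").text = person
        if self.deadline is not None:  # optional attributes are simply omitted
            ET.SubElement(root, "deadline").text = self.deadline
        thread = ET.SubElement(root, "thread")
        for message_id in self.thread:
            ET.SubElement(thread, "message", {"id": message_id})
        ET.SubElement(root, "comments").text = self.comments
        ET.SubElement(root, "status").text = self.status
        return ET.tostring(root, encoding="unicode")


# Hypothetical task extracted from an eBay "you have won" notification.
task = TaskDescription(
    task_type="ebay-bidding",
    subject="You have won the auction",
    thread=["msg-001"],
)
print(task.to_xml())
```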
By automatically identifying tasks within e-mails generated by
business processes, SCOUT helps make users aware of the tasks awaiting
their attention. Furthermore, by pulling these e-mail messages into a
task management system, it reduces the number of legitimate e-mail
messages the user must process each day.
SCOUT provides three main functions: e-mail analysis, context-based
task presentation, and context-based task reminding.
1. E-mail analysis: An e-mail analysis engine recognizes incoming
e-mails as being associated with known business processes. Such e-mails
are then parsed and further analyzed to extract task information
relevant to that process.
2. Context-based task presentation: SCOUT uses the context associated
with a task to present it in a graphical interface that is customized
for the viewer.
3. Context-based task reminding: To extend SCOUT beyond the
desktop, context-based reminders enable task-related messages to be sent
to users on pervasive devices. Users can specify contextual criteria to
trigger the reminding process (e.g., if my task is to pick up a package,
alert me when I am in the vicinity of the mail room; if my task involves
Steve, alert me when we are both available).
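The two reminder examples above can be read as predicates over a snapshot of context. The sketch below assumes that shape: the `Context` fields, the helper names `near` and `both_available`, and the second task subject are hypothetical, "vicinity" is simplified to an exact match on a symbolic location, and delivery of the reminder to a pervasive device is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """A snapshot of context from pervasive sources (assumed shape)."""
    location: Dict[str, str] = field(default_factory=dict)    # person -> place
    available: Dict[str, bool] = field(default_factory=dict)  # person -> free?


@dataclass
class Reminder:
    task_subject: str
    trigger: Callable[[Context], bool]  # user-specified contextual criterion


def near(person: str, place: str) -> Callable[[Context], bool]:
    """True when the person's symbolic location equals the given place."""
    return lambda ctx: ctx.location.get(person) == place


def both_available(a: str, b: str) -> Callable[[Context], bool]:
    """True when both people are known to be available."""
    return lambda ctx: ctx.available.get(a, False) and ctx.available.get(b, False)


def due_reminders(reminders: List[Reminder], ctx: Context) -> List[str]:
    """Subjects of reminders whose criteria hold in the current context."""
    return [r.task_subject for r in reminders if r.trigger(ctx)]


reminders = [
    Reminder("Pick up package", near("me", "mail room")),
    Reminder("Review budget with Steve", both_available("me", "Steve")),
]
ctx = Context(location={"me": "mail room"}, available={"me": True, "Steve": False})
print(due_reminders(reminders, ctx))  # ['Pick up package']
```

Representing criteria as plain functions keeps the rule set open: any new context source only needs a predicate over the snapshot.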
The e-mail analysis function is implemented using Unstructured
Information Management Architecture (UIMA) annotators. UIMA (7) is a
component-based software framework used for the development of
applications that process unstructured information. It focuses on text
analysis and isolates the core algorithms that perform text analytics
from system services such as storage of data, communication between
components, and visualization of results. By offering a framework with
well-defined application programming interfaces (APIs), UIMA allows
developers to share and combine text analysis algorithms in order to
build complex applications.
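The component style that UIMA encourages, in which independent annotators read and write a shared analysis structure (UIMA calls it the Common Analysis Structure, or CAS), can be sketched in a few lines. This is not the UIMA API (which is Java-based) and not SCOUT's annotators: the class names, the hand-written regular expressions, and the sample message are all hypothetical. One annotator recognizes which registered business process a message belongs to; a second extracts task fields only for its own task type.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Analysis:
    """Shared analysis structure passed between annotators (loosely
    analogous to UIMA's CAS; not the real UIMA API)."""
    text: str
    annotations: Dict[str, str] = field(default_factory=dict)


class Annotator(Protocol):
    def process(self, analysis: Analysis) -> None: ...


class ProcessRecognizer:
    """Tags a message with the first registered process whose pattern matches."""
    def __init__(self, signatures: Dict[str, str]):
        self.signatures = {k: re.compile(v) for k, v in signatures.items()}

    def process(self, analysis: Analysis) -> None:
        for task_type, pattern in self.signatures.items():
            if pattern.search(analysis.text):
                analysis.annotations["type"] = task_type
                return


class FieldExtractor:
    """Extracts named-group fields, but only for messages of one task type."""
    def __init__(self, task_type: str, pattern: str):
        self.task_type = task_type
        self.pattern = re.compile(pattern)

    def process(self, analysis: Analysis) -> None:
        if analysis.annotations.get("type") != self.task_type:
            return
        match = self.pattern.search(analysis.text)
        if match:
            found = {k: v for k, v in match.groupdict().items() if v is not None}
            analysis.annotations.update(found)


def run_pipeline(annotators: List[Annotator], text: str) -> Analysis:
    """Run each annotator in order over the same shared analysis."""
    analysis = Analysis(text)
    for annotator in annotators:
        annotator.process(analysis)
    return analysis


pipeline = [
    ProcessRecognizer({"ebay-bidding": r"You have won the auction"}),
    FieldExtractor("ebay-bidding", r"Item: (?P<subject>[^\n]+)"),
]
result = run_pipeline(pipeline, "You have won the auction!\nItem: Vintage camera\n")
print(result.annotations)  # {'type': 'ebay-bidding', 'subject': 'Vintage camera'}
```

Because each annotator touches only the shared structure, recognizers and extractors for new business processes can be added or reordered without changing the others.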
The rest of the paper is organized as follows. In the next section,
we review related work. In the following section, we introduce the SCOUT
application, describe how the application requirements were defined, and
describe the two interfaces to SCOUT: the Web portal and the e-mail
client. Next, we present the context information used by
SCOUT, the sources of that information, and the way in which additional
context is derived. We then describe the SCOUT architecture and give an
overview of the e-mail analysis components. We present results of a
pilot study and conclude with some final comments, including ideas for
future work.
RELATED WORK
Moran and his colleagues identified several metatasks required for
efficient task management (2,5):
* Creating awareness of the core task and related metatasks
* Prioritization of tasks
* Scheduling of task appointments
* Completion of task prerequisites
* Monitoring of task status
* Notification/reminders of partially completed tasks
* Delegation of tasks through reassignment
An important focus in task management is the awareness aspect.
Although task management has received a great deal of attention in the
literature, (8-11) most approaches tend to disregard the awareness
problem. A notable exception is the work of Corston-Oliver et al., (12)
which deals with general e-mail. They propose SmartMail, a prototype
task-extraction system that uses linear support vector machines (a
machine-learning method used for classification) and linguistic rules
to analyze unstructured e-mails. Their technique produces task-focused
summaries of action items detected in e-mails. Because it targets
general e-mail, a very broad scope, their solution has achieved only
modest predictive success.
Another exception is the work of Bennett and Carbonell (13)
describing a system that tries to identify the action items contained in
unstructured e-mails. They compared a standard unigram (zeroth-order
Markov) approach with an n-gram (order n-1 Markov) approach applied at
both the document and the sentence level. They found that n-grams applied at
the sentence level are most effective, achieving accuracies of 0.8092,
0.8145 and 0.8173 for a k-nearest neighbor, naive Bayes, and support
vector machine classifier, respectively. In contrast to this work, SCOUT
limits the e-mail that it considers to those items that arrive from
semi-structured business processes. In the case of one SCOUT user with
an e-mail corpus consisting of 2,269 messages, the observed accuracy was
0.9996; similar results were obtained for other SCOUT users.
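Sentence-level n-gram features of the kind compared above can be sketched as follows. This is an illustrative assumption, not Bennett and Carbonell's implementation: the sentence splitter, the tokenizer, and the choice to include every gram of length 1 through n are ours. An n-gram covers n consecutive tokens, which is why it corresponds to an order n-1 Markov assumption, while n = 1 reduces to the unigram case.

```python
import re
from collections import Counter
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_features(sentence: str, n: int) -> Counter:
    """Counts of every 1..n-gram in one sentence; n=1 is the unigram case."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    features: Counter = Counter()
    for k in range(1, n + 1):
        features.update(ngrams(tokens, k))
    return features


email = "Please send me the report by Friday. Thanks for your help!"
for sentence in split_sentences(email):
    print(sentence, sentence_features(sentence, 2).most_common(3))
```

Each sentence's feature counts would then be fed to a classifier such as k-nearest neighbor, naive Bayes, or a support vector machine; the classifier is omitted here.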
COPYRIGHT 2006 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2006, Gale Group. All rights
reserved. Gale Group is a Thomson Corporation Company.