INTRODUCTION
The beginning of the 21st century has been marked by an explosion
of electronic information. We often find ourselves inundated with
information that reaches us through a variety of channels, including
e-mail accounts, voice-mail recorders, and most recently, text-messaging
clients on our cellular phones. This amounts to a deluge of requests,
notices, scheduling invitations, and the like. We owe the information
explosion to the proliferation of inexpensive and readily available
technology that has led to what most would agree is both a blessing and
a curse. Whereas information facilitates efficient business and social
networking, it also has become a major burden. Humans simply cannot
accommodate massive information input.
E-mail is the most acute and widespread facilitator of this
explosion because generating and distributing e-mail messages is
virtually free. Even when we ignore spam
and focus on legitimate e-mail (from trusted and welcome senders), we
find that e-mail is still a major problem. (1-3) One user reports
receiving well over 70 legitimate messages per day. (4) Anecdotal
evidence suggests that his experience is not unusual. In an enterprise
environment, a significant portion of these legitimate e-mails are
associated with tasks that must be acted upon by the recipient.
A key challenge posed by e-mail inundation is how to effectively
manage the tasks and activities that are associated with e-mail
messages. Herein lies the goal of our work: to help users manage their
tasks effectively. We consider a task to be a particular kind of
activity. Moran defines "activity" as a set of mental or
physical actions carried out by persons. (5) Through composition,
activities can contain subactivities, which can themselves contain
subactivities. In this vein, we define a task to be an atomic level
activity, one that may not contain subactivities. We focus only on tasks
that are communicated by e-mail messages, such that there is at most one
task per message. As an example, the process of bidding for a product on
eBay ** is an activity containing many subactivities. One e-mail
associated with this process alerts the recipient that he or she has won
an auction. The task for the recipient contained in this e-mail is to
initiate a payment to the seller.
In this work, we focus on the population of business managers who
receive daily a large number of legitimate, machine-generated e-mails,
such as the ones that are generated by business processes within a large
enterprise. We present here a practical solution for dealing with such
e-mail in the form of a task management tool called SCOUT, (6) which
uses contextual information about the user and the environment to
recognize, filter, sort, organize and execute tasks associated with
e-mails. By using information from pervasive sources (i.e., ubiquitous
computing devices), SCOUT alleviates some of the problems associated
with e-mail overload by presenting the core information to the recipient
in an efficient and well-organized fashion.
We hypothesize that tasks contained within e-mail messages can be
automatically identified for presentation within SCOUT. Tasks can be
contained in one of two types of e-mail messages: human-generated and
machine-generated. For our purposes, the salient difference is that
human-generated messages tend to be unstructured, whereas the contents
of machine-generated messages have a regular structure. To simplify the
problem, we focus on machine-generated messages. We assume that every
machine-generated message is associated with some business process
(e.g., the eBay bidding process or the expense reimbursement process in
an enterprise), that we only have access to e-mail messages generated by
business processes, and that other than inspecting the e-mail messages
themselves, we have no knowledge of the syntactic structure used in
these messages. Furthermore, we assume that we make no modifications to
messages or to the business processes that generate them.
SCOUT tracks a set of registered task types, each of which
corresponds to a business process. When SCOUT identifies an e-mail
message associated with a business process, the task contained within
that e-mail message is specified in a document by using an Extensible
Markup Language (XML) dialect called TaskML. A task description contains
the following attributes:
* Type--the task type represented by a label unique to a business
process or transaction associated with the task (e.g., a bidding
transaction on eBay, a password update at the Amazon.com Web site)
* Subject--a summary description of the task (e.g., you have won
the auction)
* Person--an optional list of persons associated with the task
(e.g., a collaborator who can help complete a task)
* Deadline--an optional deadline by which the task must be
completed
* Thread--the set of related messages associated with the activity
containing this task
* Comments--free-form comments associated with the task
* Status--the state of completion of the task
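To make the attribute list concrete, the following is a minimal sketch of how such a task description might be serialized. The TaskML schema itself is not reproduced in this paper, so the element names (task, type, subject, and so on), the default status value, and the helper build_task are assumptions chosen to mirror the attributes listed above; they are not the actual TaskML vocabulary.

```python
import xml.etree.ElementTree as ET


def build_task(task_type, subject, persons=(), deadline=None,
               thread=(), comments="", status="pending"):
    """Build a TaskML-like element. All tag names here are hypothetical."""
    task = ET.Element("task")
    ET.SubElement(task, "type").text = task_type
    ET.SubElement(task, "subject").text = subject
    # Person and Deadline are optional attributes of a task description.
    for name in persons:
        ET.SubElement(task, "person").text = name
    if deadline is not None:
        ET.SubElement(task, "deadline").text = deadline
    # Thread: identifiers of the related messages in the same activity.
    thread_el = ET.SubElement(task, "thread")
    for message_id in thread:
        ET.SubElement(thread_el, "message").set("id", message_id)
    ET.SubElement(task, "comments").text = comments
    ET.SubElement(task, "status").text = status
    return task


example = build_task(
    task_type="EASubmission",
    subject="Trip to WMCSA",
    persons=["Maria R. Ebling"],
    deadline="12-16-2005",
    thread=["msg-0001"],   # invented message identifier
)
xml_text = ET.tostring(example, encoding="unicode")
```

Omitting persons or deadline simply leaves those elements out, which matches their optional status in the list above.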
By automatically identifying tasks within e-mails generated by
business processes, SCOUT helps make users aware of the tasks awaiting
their attention. Furthermore, by pulling these e-mail messages into a
task management system, it reduces the number of legitimate e-mail
messages the user must process each day.
SCOUT provides three main functions: e-mail analysis, context-based
task presentation, and context-based task reminding.
1. E-mail analysis: An e-mail analysis engine recognizes incoming
e-mails as being associated with known business processes. Such e-mails
are then parsed and further analyzed to extract task information
relevant to that process.
2. Context-based task presentation: SCOUT uses context associated
with a task so that it can be presented in a graphical interface that is
customized according to the viewer.
3. Context-based task reminding: To extend SCOUT beyond the
desktop, context-based reminders enable task-related messages to be sent
to users on pervasive devices. Users can specify contextual criteria to
trigger the reminding process (e.g., if my task is to pick up a package,
alert me when I am in the vicinity of the mail room; if my task involves
Steve, alert me when we are both available).
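The e-mail analysis function can be pictured as a small rule table, one entry per registered business process. The sketch below is an illustration only: the sender address, the subject pattern, and the deadline pattern are invented for the example, and plain regular expressions stand in for whatever rules a deployment would actually register.

```python
import re
from dataclasses import dataclass


@dataclass
class ProcessRule:
    task_type: str          # label of the registered business process
    sender: str             # hypothetical sender address of the process
    subject_pattern: str    # regex that recognizes the message
    field_patterns: dict    # attribute name -> regex with one capture group


RULES = [
    ProcessRule(
        task_type="EASubmission",
        sender="expenses@example.com",
        subject_pattern=r"^Expense report submitted: (?P<subject>.+)$",
        field_patterns={"deadline": r"Approve by:\s*(\d{2}-\d{2}-\d{4})"},
    ),
]


def analyze(sender, subject, body, rules=RULES):
    """Return a dict of task attributes, or None if no rule matches."""
    for rule in rules:
        match = re.match(rule.subject_pattern, subject)
        if sender != rule.sender or match is None:
            continue  # message does not belong to this business process
        task = {"type": rule.task_type, "subject": match.group("subject")}
        # Extract the process-specific attributes from the message body.
        for name, pattern in rule.field_patterns.items():
            found = re.search(pattern, body)
            if found:
                task[name] = found.group(1)
        return task
    return None


task = analyze(
    "expenses@example.com",
    "Expense report submitted: Trip to WMCSA",
    "Maria R. Ebling submitted an expense report.\nApprove by: 12-16-2005\n",
)
```

Under this arrangement, supporting a new business process amounts to adding one more rule entry, with no training step.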
The e-mail analysis function is implemented using Unstructured
Information Management Architecture (UIMA) annotators. UIMA (7) is a
component-based software framework used for the development of
applications that process unstructured information. It focuses on text
analysis and isolates the core algorithms that perform text analytics
from system services such as storage of data, communication between
components, and visualization of results. By offering a framework with
well-defined application programming interfaces (APIs), UIMA allows
developers to share and combine text analysis algorithms in order to
build complex applications.
The rest of the paper is organized as follows. In the next section,
we review related work. In the following section we introduce the SCOUT
application, describe the way in which the application requirements were
defined, and describe the two interfaces to SCOUT, the Web portal and
the e-mail client. Next we present the context information used by
SCOUT, the sources of that information, and the way in which additional
context is derived. We then describe the SCOUT architecture and give an
overview of the e-mail analysis components. We present results of a
pilot study and conclude with some final comments, including ideas for
future work.
RELATED WORK
Moran and his colleagues identified several metatasks required for
efficient task management (2,5):
* Creating awareness of the core task and related metatasks
* Prioritization of tasks
* Scheduling of task appointments
* Completion of task prerequisites
* Monitoring of task status
* Notification/reminders of partially completed tasks
* Delegation of tasks through reassignment
An important focus in task management is the awareness aspect.
Although task management has received a great deal of attention in the
literature, (8-11) most approaches tend to disregard the awareness
problem. A notable exception is the work of Corston-Oliver et al., (12)
which deals with general e-mails. They propose SmartMail, a prototype
task-extraction system that uses linear support vector machines
(a machine-learning method used for classification) and linguistic rules
to analyze unstructured e-mails. Their technique produces task-focused
summaries of action items detected in e-mails. With such a wide scope on
general e-mails, their solution has had only modest predictive success.
Another exception is the work of Bennett and Carbonell (13)
describing a system that tries to identify the action items contained in
unstructured e-mails. They compared a standard unigram (1st order
Markov) approach to an n-gram (order n-1 Markov) approach applied at
both the document and sentence level. They found that n-grams applied at
the sentence level are most effective, achieving accuracies of 0.8092,
0.8145 and 0.8173 for a k-nearest neighbor, naive Bayes, and support
vector machine classifier, respectively. In contrast to this work, SCOUT
limits the e-mail that it considers to those items that arrive from
semi-structured business processes. In the case of one SCOUT user with
an e-mail corpus consisting of 2,269 messages, the observed accuracy was
0.9996; similar results were obtained for other SCOUT users.
Much of the work on automatically classifying e-mails aims at
automatically placing e-mail messages into appropriate folders. Examples
of work addressing the filing problem include Segal and Kephart's
MailCat system, (14) and the work of Bekkerman et al. with e-mail data
from Enron Corporation and SRI International. (8) Similar work has also
been performed by Dredze et al., (15) who focus on automatically
classifying each incoming e-mail message according to the activity to
which it belongs. In the TaskPredictor system, Shen et al. (16) extend
this work by using incoming (unstructured) e-mail messages to predict a
user's activities. The focus of all this research, however, is on
the classification of e-mail messages and on predicting a user's
activities, and not upon the identification of action items within
e-mail messages. In contrast, SCOUT attempts to help users identify and
manage the tasks that are contained within e-mail messages.
Another related body of work focuses on extracting information from
text-based documents, such as e-mail messages and Web pages. McCallum
provides a good overview of the challenges of information extraction,
(17) including the trade-offs involved in the use of various techniques.
He argues that rule-based approaches, such as the one used in SCOUT,
only work on relatively simple text within applications of limited
complexity. Business process e-mails are generally verbose, but contain
relatively simple requests. Furthermore, assuming future access to the
business processes that generate the e-mails, a reasonable long-term
solution would not focus on information extraction from the e-mail
messages, but on the use of a task mark-up language from which SCOUT
entries and e-mail messages could be generated.
Tomasic and his colleagues (18) describe a virtual information
officer (VIO) that accepts e-mailed requests for updates to corporate
databases and returns partially filled out forms for user confirmation.
Like VIO, SCOUT is interpreting an incoming e-mail message to identify
the underlying task. Unlike VIO, SCOUT's focus is on helping users
manage tasks presented to them by business processes. If SCOUT were used
in conjunction with VIO, SCOUT would classify the partially filled out
forms sent to the user for confirmation as a task from a business
process. In addition, SCOUT differs from VIO in that it employs a
rules-based approach that does not require extensive training to support
each new business process; a new process can be supported by SCOUT with
a minimal investment of time.
THE SCOUT APPLICATION
In this section, we present the design and realization of the SCOUT
application. We begin by discussing the application requirements and how
they were determined. We then present the two application interfaces
that we have built--one within a portal environment and the other within
an e-mail client.
Application requirements
There are many ways to support the previously outlined metatasks in
a task management system. We collected application requirements in a
user study that involved a focus group in a two-phase process. First, a
group of seven participants were interviewed, and their comments were
incorporated within a tentative set of requirements. The group consisted
of managers (three first-level and four second-level) representing the
target population for the tool. In the second phase, sketches of a
proposed user interface were reviewed by five additional participants,
three first-level managers and two second-level managers, and their
comments were incorporated within the final set of requirements.
The participants felt strongly that the task management interface
should emphasize simplicity with terse, relevant information displays.
As a result, we decided to limit the set of functions to basic ones.
Another issue that our managers brought up was the choice between a
Web-based user interface and a client-based user interface. In our
survey population approximately half of the participants preferred a
SCOUT implementation that was accessible via the Web, whereas the others
preferred a SCOUT client integrated with their e-mail client, which
would periodically synchronize with a back-end server, allowing them to
process tasks while operating in a mobile environment.
Beyond the preceding design considerations, our survey group
expressed interest in the following list of novel features for managing
e-mail complexity:
1. Automatically generated to-do list--This was the primary feature
that our user study participants requested. Of the managers we surveyed,
only one actively used the to-do list feature of the available personal
information manager tools. The primary hindrance to using to-do lists
cited by users was the burden of manually populating and maintaining
such lists.
2. Rich views of to-dos--Related to the issue of automatically
populating the to-do lists is enabling access to rich and varied views
of these lists. Such views depend on the ability to associate attributes
(e.g., colleagues, task deadline, task type) with tasks. Manual
assignment of these attributes is a tedious process. The ability to
automatically extract e-mail-based tasks, create to-do entries for these
tasks, and assign appropriate values for attributes was deemed extremely
useful by our study group.
3. Communication and collaboration support--Machine-generated tasks
often contain subtasks that require human-to-human communication or
collaboration. Many participants expressed interest in having a task
management system that allows users to associate e-mails or other
information to automatically generated tasks. In addition, such a task
management system must allow users to delegate tasks to others.
4. Automated scheduling--The automating of scheduling tasks
received a lukewarm response. Although no participant expressed interest
in a completely automated tool able to schedule times for performing
tasks without their explicit approval, there was some support for a
feature that assists with task scheduling. For example, some
participants thought that an informational calendar showing deadlines
and proposing start dates for long-running tasks might be beneficial.
5. Automated notification--Participants thought that notification
of imminent and urgent deadlines through channels other than e-mail
would be beneficial, as long as the notifications were completely
unobtrusive and well-timed (e.g., not sent during meetings). Such
notification mechanisms had to take into account the users' context
in order to make intelligent decisions. A few participants were adamant
in opposing the idea of any notification.
With these requirements in mind, we designed two interfaces to
SCOUT: a portal-based implementation and an implementation integrated
into an e-mail client application. In spite of the significant
additional implementation effort, we decided on a dual interface
approach because of the strong recommendations from our study group.
The SCOUT Web portal application
The SCOUT Web portal application (SCOUT portal, for short) presents
to the user a dashboard that lists all pending tasks and their
attributes. Figure 1 shows a screen capture of a typical task list in
the SCOUT portal.
[FIGURE 1 OMITTED]
Upon visiting the portal, the user can view all pending tasks and
arrange and prioritize them according to several automatically extracted
attributes such as task due date, associated users, associated formal
process, and task subject. As shown in Figure 1, in this view there is a
Sort by pull-down menu in which the user can select different criteria
for sorting the tasks. This list can be sorted by importance, type,
subject, deadlines, or upcoming meetings. In the task list view shown in
Figure 1, tasks are sorted by upcoming meetings. Taking a closer look at
the fourth row of the task list reveals that the corresponding task is
of type EASubmission, which means that it is an expense submission task
generated by a business process whose name is shown in the type column.
The subject is Trip to WMCSA, and its deadline is 12-16-2005. In
addition, SCOUT informs the user that it is Maria R. Ebling who
submitted this travel expense. Should the user want to meet with Maria
to discuss this task, the next meeting with Maria is scheduled to take
place on 12-19-2005, as shown in the upcoming meetings column.
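The Sort by behavior can be sketched as a table of key functions, one per criterion in the pull-down menu. This is a minimal illustration, not the portal's implementation: apart from the Trip to WMCSA row, the task data is invented, the numeric importance scale is an assumption, and the choice to place tasks with no upcoming meeting last is ours.

```python
from datetime import date

tasks = [
    {"type": "EASubmission", "subject": "Trip to WMCSA", "importance": 2,
     "deadline": date(2005, 12, 16), "next_meeting": date(2005, 12, 19)},
    {"type": "IDP", "subject": "Review development plan", "importance": 1,
     "deadline": date(2005, 12, 20), "next_meeting": None},
    {"type": "EASubmission", "subject": "Hotel bill", "importance": 3,
     "deadline": date(2005, 12, 9), "next_meeting": date(2005, 12, 12)},
]

# One key function per entry in the Sort by menu.
SORT_KEYS = {
    "importance": lambda t: t["importance"],
    "type": lambda t: t["type"],
    "subject": lambda t: t["subject"].lower(),
    "deadlines": lambda t: t["deadline"],
    # Tasks with no upcoming meeting sort after all others.
    "upcoming meetings": lambda t: (t["next_meeting"] is None,
                                    t["next_meeting"] or date.max),
}


def sort_tasks(task_list, criterion):
    """Return a new list ordered by the chosen criterion."""
    return sorted(task_list, key=SORT_KEYS[criterion])


by_meeting = [t["subject"] for t in sort_tasks(tasks, "upcoming meetings")]
by_deadline = [t["subject"] for t in sort_tasks(tasks, "deadlines")]
```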
By presenting all pending tasks in a single, well-organized
interface, SCOUT helps make users aware of the tasks awaiting their
attention; it also helps them prioritize those tasks and monitor the
status of pending tasks--all important metatasks associated with
efficient task management.
Task-specific details can be viewed by clicking on the task Subject
link. Figure 2 shows the task detail view when the user clicks on the
fourth task of the task list shown in Figure 1. From this screen, the
user can edit the importance rating of this task as well as its deadline
and status. (The task status can be monitored in the task list view.) In
addition, if a task needs to be delegated to a colleague, clicking on
Delegate Task triggers a mechanism for sending appropriate notifications
to others. The user can also send e-mail from this view or set up a
context-aware reminder. Finally, SCOUT provides a Launch IDP button to
access the application related to this business process, if available on
the Web. (IDP stands for Individual Development Plan.)
[FIGURE 2 OMITTED]
The presentation of tasks within the task list view (Figure 1) and
the management of individual tasks (Figure 2) can change based upon
contextual information. Our discussion here is based on a general notion
of context; we will discuss the specific sources of context information
and their derivatives in the next section. Two examples of the use of
contextual data are (1) the availability of a person required to carry
out the task and (2) context-aware reminders.
The availability of a person required to carry out a task is needed
in order to schedule time for the task owner to work on the task. SCOUT
makes use of several contextual data sources to infer a person's
availability: calendar data, instant messaging (IM) status, phone
status, and so on.
Contextual information is also used for context-aware delivery of
messages to a task owner. Both proximity-based and availability-based
reminders make use of contextual information. If a pending task requires
the task owner to visit a particular location, the owner can arrange for
a proximity-based reminder that will send the user a notification when
he is in the vicinity of the location (e.g., remind the user to pick up
a package from the mail room when the user is leaving the cafeteria
which happens to be nearby). Similarly, if a pending task requires
consultation with a colleague, the task owner can arrange for a reminder
to be sent when the colleague becomes available or collocated (in the
same room). In our current implementation, notification messages are
delivered by means of Short Message Service (SMS).
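The two reminder types can be sketched as a simple check of each reminder's trigger against the current context. All names below (Reminder, due_reminders, the nearby map of locations) are hypothetical, and the actual delivery through an SMS gateway is left out; the sketch only shows when a reminder would fire.

```python
from dataclasses import dataclass


@dataclass
class Reminder:
    task_subject: str
    kind: str      # "proximity" or "availability"
    target: str    # a location name or a colleague's name


def due_reminders(reminders, user_location, nearby, available, collocated):
    """Return the reminders whose contextual trigger currently holds.

    nearby: dict mapping a location to the set of locations considered
        to be in its vicinity (a hypothetical site map).
    available / collocated: sets of colleague names.
    """
    fired = []
    for r in reminders:
        if r.kind == "proximity":
            in_range = (user_location == r.target
                        or r.target in nearby.get(user_location, set()))
            if in_range:
                fired.append(r)
        elif r.kind == "availability":
            if r.target in available or r.target in collocated:
                fired.append(r)
    return fired


def format_sms(reminder):
    # Only the message text is built here; sending it is out of scope.
    return f"SCOUT reminder: {reminder.task_subject}"


reminders = [
    Reminder("Pick up package", "proximity", "mail room"),
    Reminder("Discuss travel expense", "availability", "Maria"),
]
fired = due_reminders(
    reminders,
    user_location="cafeteria",
    nearby={"cafeteria": {"mail room"}},
    available=set(),
    collocated=set(),
)
messages = [format_sms(r) for r in fired]
```

In this run only the package reminder fires, because the user is leaving the cafeteria near the mail room while the colleague is neither available nor collocated.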
E-mail client application
The SCOUT e-mail client application is integrated into the Lotus
Notes* e-mail client used by a majority of the employees in our
organization. The tasks identified by SCOUT are entered into the to-do
list of the e-mail application. In addition, these tasks are shown in
the calendar. The decision to show the to-do entries in the calendar is
based upon the fact that the to-do feature of the e-mail application is
not commonly used, whereas the calendar feature has an extremely high
adoption rate within our organization.
Figure 3 shows a screen capture of the integrated Lotus Notes view
of the to-do list. In this view the user can inspect the pending tasks
and arrange and prioritize them according to several automatically
extracted attributes, such as the due date, type, and subject. Figure 3
shows a view of tasks organized by type (Lotus Notes uses the term
category instead of type). For example, the fourth task in category
EASubmission represents a travel expense submission with subject Hotel
Bill WMCSA 2006 and deadline 02/07/2006. The status of this task is
shown to be In progress, and the task has been delegated to John S.
Davis II. In addition, the entry shows that John has accepted the
assignment.
[FIGURE 3 OMITTED]
Clicking on the subject link of a task opens up the task detail
view, as shown at the bottom of Figure 3. The user can use standard
Lotus Notes task management features to set the importance rating of the
task and edit its deadline and status. In addition, the user may also
delegate a task to a colleague, as discussed earlier. As can be seen in
Figure 3, this interface supports a number of the metatasks presented
earlier, including the prioritization of tasks, the monitoring of task
status, the notification regarding partially completed tasks, and the
delegation of tasks through explicit reassignment.
The e-mail client interface does not support the context-aware
features provided by the SCOUT portal. This is because changes to the
standard corporate mail template are difficult to deploy. Without such
changes, context-aware features simply cannot be integrated into the
e-mail client interface. The e-mail client application runs on the
user's machine, which represents another important difference
between the Web portal application and the e-mail client application.
Because all processing is done on the user's machine (rather than
on a server), this implementation has the benefit of improved security
and privacy.
CONTEXT DATA SOURCES
The organization of a user's outstanding tasks should adapt to
changing contextual attributes of the user and of any other individuals
associated with the task. An example of adaptation based on the
user's own context is modifying the task list according to the
user's calendar. Examples of adaptation based on the contextual
state of other individuals are proximity-aware reminders (e.g., when a
pending task requires a conversation with the manager, alert the user
when both are in the same room) or availability-aware reminders (e.g.,
when a travel reimbursement request requires a conversation with a
colleague, alert the user that the individual has become available by
highlighting the task entry in the task list view).
Most of the research on context-aware computing implicitly assumes
the deployment of infrastructure to sense and collect context. It is
sometimes hard to justify the capital cost of either the deployment of
necessary sensors or the retooling of existing IT infrastructure
components. Accordingly, our focus has been on identifying the dynamic
user attributes that are already available and accessible. Contextual
attributes can be classified as either raw or derived. Raw context
refers to attributes obtained directly from external infrastructure
components, such as sensors or software, and typically includes
information such as a user's location, calendar entries, or IM
status. Derived context refers to higher-level user attributes obtained
by composing, or fusing, raw contextual data. For example, a
person's availability (or willingness to be interrupted) might be
deduced from a combination of calendar information, IM status, and phone
status.
Context may also be classified as either physical (referring to
physical user attributes such as location) or virtual (referring to
attributes that exist only within the IT infrastructure, such as the
number of open IM sessions). As shown in Figure 4, the raw-derived
dimension and the physical-virtual dimension can be viewed as orthogonal
axes in the space of context sources. In the rest of this section we
discuss the way we use a number of context sources in SCOUT.
[FIGURE 4 OMITTED]
Raw context sources
In this section we describe the following raw context sources used
in SCOUT: calendar, IM presence, location, and phone status.
Calendar
In an enterprise environment, the calendar provides a rich source
of context information. A SCOUT component known as the calendar adapter
interfaces to a calendar server and retrieves and parses calendar
information. It gathers information on meeting and appointment events,
which includes a list of attendees and the location, time, and topic of
the meeting. It is worth noting that not only does calendar information
provide insight about a user's current state (e.g., located in room
1401) but also about the user's likely future activity (e.g.,
scheduled for an off-site meeting in New York in an hour). The calendar
information is particularly useful in determining if certain individuals
are busy on specific tasks and should thus be free from interruption.
IM presence
Given the popularity of IM-based collaboration and communication
within the enterprise, a user's current IM status also provides a
very useful form of context. In particular, we use IM APIs for both
synchronous and asynchronous retrieval of a user's presence-related
events (e.g., online, offline, away from the computer, or in
do-not-disturb mode).
Location
The user's location may be obtained in a number of ways. We
focus here on the location within a building in which we have deployed
an active badge infrastructure. The technology consists of radio
frequency identification (RFID) readers installed at selected points
(e.g., entrance and exit to the cafeteria) that detect when card-shaped
badges (worn by individuals) are in the vicinity. Indoor location may
also be inferred from wireless local area network (WLAN) connections.
Phone status
The activity status of the user on a phone (e.g., on/off hook, the
identity of the callee) is retrieved through protocol-specific
mechanisms. For Voice over Internet Protocol (VoIP) traffic based on the
Session Initiation Protocol (SIP), we use special SIP OPTIONS
messages to interrogate the SIP server or the end device about its
status. As VoIP phones become universally deployed and SIP technology
matures, presence-based notification mechanisms will provide a scalable
solution for obtaining the phone status of individuals or groups.
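For readers unfamiliar with SIP, the following sketch shows what such a status query could look like on the wire. It follows the general header layout of an OPTIONS request in RFC 3261; the addresses, host name, and helper names are invented, and how a particular server or device encodes phone status in its reply is deployment-specific and not covered by this paper. RFC 3261 does specify that the reply to OPTIONS carries the status code an INVITE would have received (e.g., 486 Busy Here), which is one plausible way to infer that a phone is in use.

```python
import uuid


def build_options_request(target_uri, from_uri, local_host,
                          call_id=None, branch=None, tag=None, cseq=1):
    """Build the text of a SIP OPTIONS request (RFC 3261 header layout)."""
    call_id = call_id or uuid.uuid4().hex
    # RFC 3261 requires branch values to start with the cookie z9hG4bK.
    branch = branch or "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = tag or uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} OPTIONS",
        "Accept: application/sdp",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def status_code(response_text):
    """Extract the numeric status code from a SIP response's first line."""
    first_line = response_text.split("\r\n", 1)[0]
    version, code, _reason = first_line.split(" ", 2)
    if not version.startswith("SIP/"):
        raise ValueError("not a SIP response")
    return int(code)


request = build_options_request(
    "sip:maria@example.com", "sip:scout@example.com", "scout.example.com",
    call_id="abc123", branch="z9hG4bKtest", tag="42",
)
```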
Derived context sources
Based on the preceding raw sources of context, we derive the
following contextual attributes: proximity and availability.
Proximity
The typical use of this attribute involves detecting when two
people are in the same vicinity within an office building. We refer to
these people as being collocated when they are in the same office or the
same public space, such as a cafeteria. In SCOUT the detection of
proximity relies on an active badge system. Proximity is useful, for
example, when a user is to be reminded at the opportune time of an
outstanding task that requires a face-to-face meeting with another
party.
Availability
At present, availability is by far the most frequently used derived
contextual attribute. Availability is an extension of the current
network-centric notion of presence, in that the latter refers to the
ability to communicate (e.g., whether an IM can reach the user), whereas
the former refers to the user's availability to actually
participate in a communication session. In addition, whereas presence is
a binary-valued attribute (a user is either connected to or disconnected
from IM), availability is multidimensional; a user may be available for
IM but only with company employees, or the user may be available for
voice calls but only with family members. Automatic, context-driven
determination of availability (see, for example, References 19-22) is
particularly appealing as it relieves the user of the responsibility of
configuring the access rights granted to other users. We currently
determine an individual's availability by combining their IM
presence, voice status, and calendar information.
The addition of proximity further refines the notion of
availability. For example, a user is considered to be attending a
scheduled meeting (according to the calendar entry) only if SCOUT
determines that the user is either located in the meeting room specified
in the calendar entry or has dialed into the teleconference specified in
that entry.
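The derivation just described can be sketched as a small function over the raw context sources. This is a simplification under stated assumptions: availability is collapsed to a single yes/no value rather than the multidimensional attribute discussed above, the status strings are invented, and the policy that any busy signal makes the user unavailable is our own choice for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Meeting:
    room: str
    dialed_in: bool = False   # user joined the meeting's teleconference


def attending(meeting: Optional[Meeting], location: Optional[str]) -> bool:
    """A scheduled meeting counts only if the user is in the room or dialed in."""
    if meeting is None:
        return False
    return location == meeting.room or meeting.dialed_in


def is_available(im_status: str, on_phone: bool,
                 meeting: Optional[Meeting], location: Optional[str]) -> bool:
    """Combine IM presence, voice status, and calendar into one judgment.

    Assumed policy: any single busy signal makes the user unavailable.
    """
    if im_status in ("offline", "away", "do-not-disturb"):
        return False
    if on_phone:
        return False
    # Proximity refinement: a calendar entry alone does not imply attendance.
    return not attending(meeting, location)
```

For example, a user with a meeting scheduled in room 1401 who is detected in their own office and has not dialed in is still treated as available, whereas the same user detected in room 1401 is not.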
SYSTEM ARCHITECTURE
Figure 5 illustrates the architecture of the full-featured version
of SCOUT. SCOUT has four main components: the Mail Agent component for
accessing e-mail messages, the E-Mail Analysis module, the Context
Services module for aggregating context data, and the Portlet
Application module, which provides the client interface. The SCOUT
architecture is extensible so that a variety of client interfaces,
e-mail systems, context sources and e-mail analysis modules can be
supported.
[FIGURE 5 OMITTED]
The Mail Agent component serves as a translator that hides from
SCOUT the syntactic details of proprietary e-mail systems (e.g., Lotus
Notes, Microsoft Outlook **). The Mail Agent component reads e-mail
messages that are stored in the third-party mail database and translates
each e-mail message into an EmailML document. EmailML is a simple XML
description of an e-mail message that consists of the following
attributes: from-address, to-address, subject, body, time sent, and
unique ID. The unique ID is assigned by Mail Agent.
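The translation step can be sketched as follows. The EmailML schema is not reproduced in this paper, so the tag names and the placement of the unique ID as an XML attribute are assumptions. Python's standard email package stands in here for the proprietary mail database that the real Mail Agent reads, and the addresses and message identifier are invented.

```python
import uuid
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def to_emailml(message: EmailMessage, unique_id=None):
    """Translate a message into an EmailML-like element (tag names assumed)."""
    root = ET.Element("email")
    # The unique ID is assigned by the translator, not taken from the message.
    root.set("id", unique_id or uuid.uuid4().hex)
    ET.SubElement(root, "from-address").text = str(message["From"])
    ET.SubElement(root, "to-address").text = str(message["To"])
    ET.SubElement(root, "subject").text = str(message["Subject"])
    ET.SubElement(root, "time-sent").text = str(message["Date"])
    ET.SubElement(root, "body").text = message.get_content()
    return root


msg = EmailMessage()
msg["From"] = "expenses@example.com"
msg["To"] = "manager@example.com"
msg["Subject"] = "Expense report submitted: Trip to WMCSA"
msg["Date"] = "Fri, 09 Dec 2005 10:00:00 -0500"
msg.set_content("Approve by: 12-16-2005")

doc = to_emailml(msg, unique_id="msg-0001")
```

Because downstream modules see only this neutral representation, supporting another mail system would require a new translator but no change to the analysis code.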
The output of the Mail Agent is consumed by the E-Mail Analysis
module. This module is built on the IBM UIMA framework. (7) It maps
incoming EmailML documents into TaskML documents. Details on this
transformation are provided in the section "E-mail task analysis."
We use Context Services (CxS) to aggregate all context data as well
as TaskML instances for delivery to the client interface. (23,24) CxS is
designed to provide a general-purpose API for context-aware applications. The
CxS architecture includes three major parts: a set of adapters, a
composition engine, and an application interface. These parts are
described in more detail below.
Adapter
Each type of context data that CxS supports is associated with an
adapter. Figure 5 shows five adapters associated with data sources
currently supported by CxS: calendar, location, IM, VoIP, and task. Each
adapter retrieves or receives data of a given type from its context
source. The retrieval of data occurs according to a specified schema,
whereas a specified communication mechanism determines the way in which
data will be sent to or retrieved by the adapter. For instance, the task
adapter supports the TaskML schema. This adapter is essentially a Web
service receiving TaskML documents encapsulated in Simple Object Access
Protocol (SOAP) requests issued by the E-Mail Analysis module.
Composition engine
The CxS composition engine allows different sources of context to
be aggregated. Specifically, we implemented availability and proximity
composers, as shown in Figure 5. These derived context attributes were
discussed in the section "Derived context sources." By
delivering TaskML data through CxS (instead of uploading it directly
into Context DB), we allow future applications to use the compositional
facilities of CxS to aggregate TaskML documents with context data.
Application interface (API)
The client application of SCOUT sits
on top of CxS. It uses the CxS API shown in Figure 5 to receive TaskML
instances as well as relevant context data. Our context-aware reminder
system is also a component sitting on top of CxS. SCOUT can post
reminders to the reminder system based upon user preferences and the
received TaskML.
Applications interface with SCOUT in two ways. They can access a
database (context DB) where TaskML and calendar events are stored. This
database is managed by a Context Logger module that uses the CxS API to
subscribe to TaskML and calendar events. Applications can also retrieve
contextual data about users (i.e., current location, current
availability) by using the CxS API directly.
SCOUT can be configured to exclude context features when sensor
resources are not available. The context features discussed in the
previous section are only available in the SCOUT portal implementation.
Figure 6 shows a simplified architecture for the stand-alone
implementation (e-mail client application). In this simplified version,
the outputs of the calendar agent and the E-Mail Analysis module are
directly consumed by the client application.
[FIGURE 6 OMITTED]
E-MAIL TASK ANALYSIS
SCOUT automates the task awareness aspect of the task management
life cycle by analyzing e-mails in three stages, as illustrated in
Figure 7. The first stage, carried out by the task identification
module, involves identifying the task associated with an e-mail message
and then labeling the message accordingly. In the second stage, the task
extraction module further analyzes the labeled e-mails to produce a set
of task attributes. Finally, the TaskML Generator module uses the set of
task attributes to map the incoming e-mail into a TaskML document. The
execution of each of these modules relies on sets of rules. These rules
define the business processes supported in the system. They are
maintained by an administrator on a Web server, as shown on the top part
of Figure 7. E-Mail Analysis modules periodically poll this Web server
to make sure that they are using the latest sets of rules published by
the administrator. The polling interval is a configurable parameter,
currently set to 180 minutes.
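One way to picture the rule refresh is the sketch below. It assumes a lazy variant in which a module re-fetches the rules on access once the 180-minute interval has elapsed, rather than running a background poller. The class name, the fetch callback, and the injected clock are all hypothetical.

```python
import time

POLL_INTERVAL_SECONDS = 180 * 60  # 180 minutes, as stated above


class RuleCache:
    """Holds the latest rule set and re-fetches it once the interval has elapsed."""

    def __init__(self, fetch_rules, clock=time.monotonic):
        self._fetch_rules = fetch_rules  # hypothetical callable that contacts the Web server
        self._clock = clock              # injectable so the behavior can be tested
        self._rules = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= POLL_INTERVAL_SECONDS
        )
        if stale:
            self._rules = self._fetch_rules()
            self._fetched_at = now
        return self._rules


# Usage with a stand-in fetch function instead of a real Web server.
cache = RuleCache(lambda: {"course-enrollment": "..."})
rules = cache.get()
```

With this arrangement, a rule change published by the administrator reaches every analysis module within one polling interval.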
[FIGURE 7 OMITTED]
The e-mail analysis process is essentially a classification problem
that can be solved either by statistical machine-learning algorithms
or by a rule-based system that uses
human-defined rules. Both approaches implement a translation function
that maps e-mail messages to tasks. In the machine-learning approach,
the mapping function is learned from training examples that consist of
e-mail message and task pairs. In the
rule-based approach, the mapping function is determined by human
experts, who enter the rules into the system.
After reviewing a number of business-process-generated e-mails, we
decided that a rule-based approach was the best method for performing
e-mail analysis. Our decision was based on the deterministic nature of
machine-generated e-mails, which allows humans who have knowledge of
business process domains to easily create the analysis rules. Looking at
e-mail samples sent by a business process often reveals numerous strings
of text that are invariant throughout all these e-mail samples. These
strings define a unique signature that may be encapsulated in very
simple static rules that determine the e-mail-to-task translation
function.
SCOUT rule-based analysis
The process of defining rules for the SCOUT rule-based engine
starts with collecting samples of e-mails sent by each registered
business process. For each business process, all sample e-mails are then
aligned to identify a unique signature that is used to define a regular
expression representative of the set of all task e-mails sent by this
process. Such regular expressions are used to define these rules. For
example, the following regular expression is currently used to identify
all task e-mails sent by the Course Enrollment business process:
Subject:[ ]*[eE]nrollment[ ]+[cC]onfirmation[ ]+[a-zA-Z, @0-9]*[ ]+[cC]ourse[ ]+[A-Za-z0-9]+[ ]+[cC]lass[ ]+[a-zA-Z0-9]+
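To make the use of such a rule concrete, the sketch below applies the Course Enrollment expression to a subject header. It assumes that each empty bracket pair in the printed expression stands for a single-space character class and that the quantifiers attach directly to the preceding class. The function name, the task-type label, and the sample subject lines are hypothetical.

```python
import re

# Course Enrollment signature, with whitespace normalized (see assumptions above).
ENROLLMENT_RULE = re.compile(
    r"Subject:[ ]*[eE]nrollment[ ]+[cC]onfirmation[ ]+[a-zA-Z, @0-9]*"
    r"[ ]+[cC]ourse[ ]+[A-Za-z0-9]+[ ]+[cC]lass[ ]+[a-zA-Z0-9]+"
)


def identify_task_type(header_line):
    """Return a task-type label if the header matches the rule, else None."""
    if ENROLLMENT_RULE.search(header_line):
        return "course-enrollment"  # hypothetical label
    return None


print(identify_task_type(
    "Subject: Enrollment Confirmation for Jane Doe Course CS101 Class A12"
))
print(identify_task_type("Subject: Lunch on Friday?"))
```

Because the rule is anchored on the invariant strings of the machine-generated message, ordinary correspondence does not match it.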
The task identification module is a UIMA annotator that uses rules
to assign task types to each e-mail that it analyzes. The task
extraction module is also a UIMA annotato