THE ROLE OF INTERNAL AUDITORS TRADITIONALLY has been dominated by financial reporting, special projects, and compliance-related efforts. But auditors now are constantly challenged to contribute to overall business performance in a tangible way. In the current environment, practitioners must have access to information to help identify the many categories of enterprise risk--wherever possible preempting the incidence of certain risks and mitigating the effects of others.
[ILLUSTRATION OMITTED]
Most computer-assisted internal audit tests focus on the numeric data contained within structured sources, such as financial systems and transactional databases. But according to Gartner Research's "Introducing the High-performance Workplace: Improving Competitive Advantage and Employee Impact," unstructured or "text based" data, such as e-mail, documents, and Web-based content, represents an estimated 80 percent of enterprise data within an organization. When assessing written communications or correspondence about a key business event, internal auditors often are limited to reading large volumes of data, with few automated tools to help synthesize, summarize, and cluster key information points to aid in decision making.
To address the full spectrum of data sources surrounding enterprise risk more efficiently, internal auditors can now incorporate unstructured data or "text analytics" tools into their work plans. Text analytics describes a set of analytical tools that identify, classify, and parse words and clusters of words in electronic documents. The software provides for linguistic searches; recognizes and isolates lexical patterns; and provides additional functionality for extracting words by category, theme, or meaning. Moreover, it enables users to tag and structure search results, interpret the data through use of visual tools, and use predictive techniques. Text analytics also describes the process internal auditors and other professionals use to apply these techniques to solve business problems, independently or in conjunction with query and analysis of fielded, numeric structured data.
Text analytics can provide insight into how business risks are emerging. It can also add to internal auditors' understanding of the people, transactions, and dates associated with significant events--including the development and incidence of fraud--without having to read hundreds of e-mails, documents, or presentations. The software can help practitioners increase their audit efficiency, gain greater and more meaningful information about business performance and enterprise risk, and support the organization's compliance efforts. Text analytics tools can be used in the context of a risk-based internal audit, as part of a forensic review of controls or business practices, or during an actual investigation.
RISK ASSESSMENT AND ANALYSIS
Text analytics is a relatively new concept. The software stems from a combination of developments in the fields of litigation support and electronic discovery, counterterrorism and surveillance technology, customer relationship management, and research into the life sciences--specifically, artificial intelligence. The application of text analytics in data review and investigations dates back to the mid-1990s.
Text-mining tools, broadly referred to as text analytics, help users extract, group, tag, and analyze associations among identified entities and concepts (e.g., noun themes) and identify the documents that contain them. They create categories, or hierarchical knowledge representations, to auto-classify documents and extracted data. Furthermore, the tools apply statistical techniques to cluster documents according to discovered characteristics.
Text analytics generally is used to examine three main elements of target data: the "who," "what," and "when." Internal auditors incorporating analytics into their existing numeric tests would typically apply the tools across these three areas.
THE WHO: SOCIAL NETWORK ANALYSIS According to a study conducted by the research firm Meta Group Inc., now owned by Gartner Inc., 80 percent of business people surveyed prefer using e-mail to using the telephone. Most business transactions or events, then, likely have e-mail communication associated with them. Unlike telephone messages, e-mail contains rich metadata--information stored about the data, such as its author, origin, version, and date accessed--and can be documented easily. For example, to monitor who is communicating with whom in the purchasing department, and conceivably to identify whether any relationships therein implied might signal anomalous activity, an internal auditor might wish to analyze metadata in the "to," "from," "cc," or "bcc" fields in department e-mails.
Many technologies for parsing e-mail with text analytics capabilities are available on the market today, some stemming from investigations and electronic discovery software. These technologies are similar to social network diagrams used in law enforcement or in counterterrorism efforts. They enable users to dynamically map communications between individuals, as demonstrated in the "Who" section of "The Three Elements of Text Analytics" on page 43. Internal auditors should keep in mind, however, that some countries may limit the organization's access to e-mail data.
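As a rough illustration of the metadata analysis described above, the following Python sketch counts how often each sender writes to each recipient across the "to," "cc," and "bcc" fields. It is a minimal example built on Python's standard e-mail parser, not a depiction of any commercial product; the sample messages, addresses, and the function name communication_edges are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages; a real review would read exported mail files.
RAW_MESSAGES = [
    "From: ann@example.com\nTo: bob@example.com\nCc: cat@example.com\n"
    "Subject: PO 1\n\nbody",
    "From: ann@example.com\nTo: bob@example.com\nSubject: PO 2\n\nbody",
    "From: bob@example.com\nTo: vendor@example.net\nBcc: ann@example.com\n"
    "Subject: quote\n\nbody",
]


def addresses(msg, header):
    """Return the lower-cased addresses found in one header of a message."""
    pairs = getaddresses(msg.get_all(header, []))
    return [addr.lower() for _name, addr in pairs if addr]


def communication_edges(raw_messages):
    """Count (sender, recipient) pairs across the To, Cc, and Bcc headers."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        recipients = []
        for header in ("To", "Cc", "Bcc"):
            recipients.extend(addresses(msg, header))
        for sender in addresses(msg, "From"):
            for recipient in recipients:
                edges[(sender, recipient)] += 1
    return edges


edges = communication_edges(RAW_MESSAGES)
# List the heaviest communication links first.
for (sender, recipient), count in edges.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The resulting pair counts are the raw material for a social network diagram: each pair is a link between two people, and the count indicates how strong that link is.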
THE WHAT: CONCEPT MAPPING The ambiguity inherent in human language presents significant challenges to the internal auditor or forensic investigator trying to understand the circumstances and actions around an event. This difficulty is compounded by the tendency of people within organizations to invent their own words or communicate in code.
Language ambiguity can be illustrated by examining the word shred. A simple keyword search on the word might return not only documents that contain text about shredding a document, but also those where two sports fans are having a conversation about "shredding the defense," or even e-mails between spouses about eating "shredded chicken" for dinner. Hence, e-mail research analytics seeks to group similar documents according to their semantic context so that documents about shredding as concealment or covering up an action would be grouped separately from casual e-mails about sports or dinner--thus markedly reducing the volume of e-mail requiring more thorough review.
Concept-based analysis goes beyond traditional search technology by enabling users to group documents according to a statistical inference about the co-occurrence of similar words. In effect, text analytics software allows documents to "describe themselves" and group themselves by context, as in the "shred" example. Because text analytics examines document sets and identifies relationships between documents according to their context, it can produce far more relevant results than traditional keyword searches.
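The "shred" example can be sketched in a few lines of Python. The code below is a deliberately simplified stand-in for concept-based grouping: it compares documents by the words they share (cosine similarity on word counts) and places each document with the first group whose opening document it resembles closely enough. Commercial tools use far richer statistical models; the sample documents, stop-word list, and 0.5 threshold are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical documents that all contain the keyword "shred".
DOCUMENTS = [
    "please shred the contract files before the audit",
    "shred the audit files and delete the contract copies",
    "our team will shred the defense in the game tonight",
    "the defense will shred their team in the game",
    "can you shred the chicken for dinner",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "will", "in", "before", "our", "their",
             "for", "can", "you"}


def term_counts(text):
    """Count the non-stop-words in a document."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_context(documents, threshold=0.5):
    """Greedily group document indexes by similarity to each group's first member."""
    vectors = [term_counts(doc) for doc in documents]
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine(vector, vectors[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


for group in group_by_context(DOCUMENTS):
    print([DOCUMENTS[i] for i in group])
```

A keyword search on "shred" would return all five documents. Grouping by shared vocabulary separates the two about destroying audit files from the sports and dinner messages, so a reviewer can start with the group that matters.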
Using text analytics before filtering with keywords can be a powerful strategy for quickly understanding the content of a large corpus of unstructured, text-based data, and for determining what is relevant to the search (see the "What" section of "The Three Elements of Text Analytics"). After viewing concepts at a high level, users select keywords more effectively because they better understand the possible code words or company-specific jargon. They can develop the keywords based on actual content, instead of guessing relevant terms, words, or phrases up front.
THE WHEN: DOCUMENT THREADS In striving to understand the time frames in which key events took place, auditors often need not only to identify the chronological order of documents (e.g., sorted by or limited to dates), but also to link related communication threads, such as e-mails, so that similar threads and communications can be identified and plotted over time. A thread comprises a set of messages connected by various relationships; each message is either the first message in the thread or a reply to, or a forward of, another message in the set. Messages within a thread are connected by relationships that identify important events, such as a reply vs. a forward, or changes in correspondents.
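The thread definition above lends itself to a short sketch. The Python example below assumes each message record carries an identifier and the identifier of the message it replies to or forwards (in real e-mail, the Message-ID and In-Reply-To headers usually play this role). The record layout, field names, dates, and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical message records; "parent" is the message replied to or forwarded.
MESSAGES = [
    {"id": "m1", "parent": None, "kind": "first", "sent": date(2008, 3, 3)},
    {"id": "m3", "parent": "m2", "kind": "forward", "sent": date(2008, 3, 6)},
    {"id": "m2", "parent": "m1", "kind": "reply", "sent": date(2008, 3, 4)},
    {"id": "m4", "parent": None, "kind": "first", "sent": date(2008, 3, 5)},
]


def thread_root(message_id, by_id):
    """Follow parent links until reaching a message with no known parent."""
    seen = set()
    current = message_id
    while by_id[current]["parent"] in by_id and current not in seen:
        seen.add(current)  # guards against malformed, circular references
        current = by_id[current]["parent"]
    return current


def build_threads(messages):
    """Map each thread's first message to its message ids in date order."""
    by_id = {m["id"]: m for m in messages}
    threads = defaultdict(list)
    for message in messages:
        threads[thread_root(message["id"], by_id)].append(message)
    return {
        root: [m["id"] for m in sorted(members, key=lambda m: m["sent"])]
        for root, members in threads.items()
    }


for root, ordered_ids in build_threads(MESSAGES).items():
    print(root, "->", ordered_ids)
```

Once messages are grouped and ordered this way, each thread can be plotted on a timeline, and the "kind" field shows where a conversation was forwarded to new correspondents.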
Quite often, e-mails accumulate long threads with similar subject headings, authors, and message content over time. These threads ultimately may lead to a decision, such as approval to proceed with a project. The approval may be critical to understanding business events that lead up to a particular journal entry. Seeing those threads mapped over time can be a powerful tool when trying to understand the business logic of a complex financial transaction.
In the context of fraud risk, text analytics can be particularly effective when threads and keyword hits are examined in light of the Fraud Triangle. Developed in the 1950s by criminologist Donald Cressey, the Fraud Triangle attempts to explain why people commit fraud. Cressey's premise was that all three components--incentive/pressure, opportunity, and rationalization--are present when fraud exists. The "When" section of "The Three Elements of Text Analytics" illustrates an analysis of the keyword frequency based on the Fraud Triangle. This analysis method can be applied in a variety of business contexts where increases in the frequency of certain keywords--related to incentive/pressure, opportunity, and rationalization--can indicate risk.
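A keyword-frequency analysis of this kind can be approximated with a simple tally. The Python sketch below counts, month by month, how many words in a set of messages fall into each Fraud Triangle category. The keyword lists and sample messages are purely hypothetical; in practice the terms would be developed from the organization's own language and the review's objectives.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists for each Fraud Triangle component.
FRAUD_TRIANGLE_TERMS = {
    "incentive_pressure": {"deadline", "quota", "target"},
    "opportunity": {"override", "workaround"},
    "rationalization": {"deserve", "temporary", "everyone"},
}

# Hypothetical (month, message text) pairs.
EMAILS = [
    ("2008-01", "we will miss the quota unless the deadline moves"),
    ("2008-02", "just override the approval, it is only temporary"),
    ("2008-02", "everyone does it and the target is impossible"),
]


def category_counts_by_month(emails, terms):
    """Tally keyword hits per Fraud Triangle category for each month."""
    counts = defaultdict(Counter)
    for month, text in emails:
        words = re.findall(r"[a-z]+", text.lower())
        for category, vocabulary in terms.items():
            counts[month][category] += sum(1 for w in words if w in vocabulary)
    return counts


counts = category_counts_by_month(EMAILS, FRAUD_TRIANGLE_TERMS)
for month in sorted(counts):
    print(month, dict(counts[month]))
```

A rise in hits across all three categories in the same period does not prove wrongdoing, but it can point the auditor to the time frame and correspondents that deserve a closer look.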
SPOTTING COMPLIANCE RISK
During a company's internal risk assessment, compliance risks typically can be mapped to divisions or departments using text analytics. These units usually comprise people, and people in today's organizations typically generate large amounts of e-mail, documents, and other forms of unstructured data.
A large pharmaceutical company's experience with text analytics illustrates how the technology is applied in practice. The U.S. Food & Drug Administration (FDA) suspected the company's salespeople were inappropriately pushing products to certain doctors, making "subtle references" to off-label capabilities. The chief auditor knew his company had already addressed compliance risk in the sales group years ago, and several analytical mechanisms were already in place to monitor such activity. However, data analysis tools couldn't pick up the subtleties of language and implication in everyday communications and sales presentations, and the chief auditor knew he couldn't ignore the risk.



