A further limitation is that fictional text, whether simplified or
not, may be inherently limited as a source of vocabulary growth, given
most learners' reading goals. Taking up a concern expressed by
Venezky (1982), Gardner (2004) performed a text analysis on fiction vs.
expository L1 school reading materials and found that the lexis of
expository text (i.e., the language used in academic and professional
settings) differs from the lexis of fiction in substantial ways.
Expository text contains more words, different words, and more
difficult words, in addition to unfamiliar discourse patterns that do
not simply mirror real-life time sequences. Gardner therefore questions
the common practice of using fiction texts as preparation for reading
expository texts.
There is clearly a case for producing graded materials beyond those
presently available. In addition to materials that systematically target
vocabulary growth beyond the 2000-level, there is also a need for
expository materials to supplement the fictional. If publishers are
unlikely to produce a complete range of research-indicated, graded
materials, then these can be produced by institutions or by individuals,
but presumably with some difficulty. However, two complementary types of
text computing can make the task of in-house text simplification
feasible, if not simple. Frequency profiling software can be used to
find, adapt or create texts to a pre-specified lexical profile and
coverage, and text comparison software can be used to establish the
degree of lexical recycling over a series of texts.
Writing Words Out: Lexical Frequency Profiling
An example of profiling software is VocabProfile (available at
www.lextutor.ca/vp/), which categorizes the lexis of texts according to
frequency. Users can select either Nation's original frequency
scheme of the General Service List and Academic Word List (Coxhead, 2000)
or the 20 BNC thousand-family lists. (Nation's 14 BNC-based lists
mentioned above were recently expanded by the author to 20; viewable
versions of the 20 lists are linked from BNC VocabProfile's entry page at
http://www.lextutor.ca/vp/bnc/.) Figure 4 shows the input to the
BNC version of VocabProfile for a text by a Canadian journalist, Rex
Murphy, and Figure 5 shows the output for the same text.
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
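The core of a lexical frequency profile can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not VocabProfile's implementation: the `band_of` dictionary (word form to thousand-level band) and the toy entries in `BANDS` are hypothetical, and matching is by word form rather than by word family as in VocabProfile.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word forms; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def profile(text, band_of):
    """Percentage of running words (tokens) falling in each frequency band.

    band_of is a hypothetical mapping from word form to band number
    (1 = first 1,000 words, 2 = second 1,000, ...). Forms missing from
    the mapping are reported under the key "off-list".
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(band_of.get(tok, "off-list") for tok in tokens)
    return {band: 100 * n / len(tokens) for band, n in counts.items()}


# Toy band assignments for illustration only (not real BNC data).
BANDS = {"the": 1, "dog": 1, "ran": 1, "sled": 5, "husky": 9}

if __name__ == "__main__":
    print(profile("The dog ran. The husky pulled the sled.", BANDS))
```

In the usage line, five of the eight tokens fall in band 1, and "pulled" is reported as off-list only because the toy list omits it.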
It is a fairly simple matter to use VocabProfile interactively to
modify the lexical level of a sizeable text. VocabProfile will identify
the words of learning interest for a particular group (say, those between
the 5,000 and 10,000 frequency levels for advanced learners) and thereby
indicate the words that need to be written out so that target items
occur in suitable known-to-unknown ratios. In the Murphy text that would
mean writing out about 20 items. Using the window entry mode, the editor
can go back and forth, editing and checking in iterations. This work is
easier if the learners' approximate level has been measured with the same
testing framework that the software employs, as is the case with many
of the measures available at http://www.lextutor.ca/tests/.
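The flagging step can be sketched as follows, using the 5,000 to 10,000 example above. This is a sketch under stated assumptions: band k is taken to cover frequency ranks (k-1)*1000+1 to k*1000, so the target zone is bands 6 to 10, and off-list words are treated as beyond it. The second helper is plain arithmetic for how many unknown tokens would have to be replaced by known words to reach a chosen known-word percentage. Its 95% default is an example value, and both function names are hypothetical.

```python
from collections import Counter


def split_by_level(tokens, band_of, known_max=5, target_max=10):
    """Sort tokens into known, target-zone and beyond-target groups.

    Assumes band k covers frequency ranks (k-1)*1000+1 to k*1000, so the
    defaults treat bands 6-10 (the 5,000-10,000 zone) as the target.
    Words missing from band_of are treated as beyond the target zone.
    """
    known, target, beyond = Counter(), Counter(), Counter()
    for tok in tokens:
        band = band_of.get(tok)
        if band is not None and band <= known_max:
            known[tok] += 1
        elif band is not None and band <= target_max:
            target[tok] += 1
        else:
            beyond[tok] += 1  # candidate for writing out
    return known, target, beyond


def excess_unknowns(total_tokens, unknown_tokens, coverage_pct=95):
    """Unknown tokens to replace with known words so coverage_pct are known.

    Replacement keeps total_tokens constant; integer arithmetic avoids
    floating-point rounding at the boundary.
    """
    allowed = total_tokens * (100 - coverage_pct) // 100
    return max(0, unknown_tokens - allowed)
```

For example, a 400-token passage with 30 unknown tokens allows 20 at 95% coverage, so `excess_unknowns(400, 30)` asks for 10 replacements.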
Writing Words In: Text Comparison Software
As already mentioned, research indicates that the average number of
encounters needed for reliable retention of a novel lexical item is
between six and ten. There are sub-dimensions to this basic learning
condition, such as the spacing between encounters (Mondria & Wit-de
Boer, 1993) and the properties of the contexts surrounding the items
(Cobb, 1999; Mondria & Wit-de Boer, 1991); but as shown above, just
ensuring six encounters of any kind for a significant proportion of any
post-2000 word list is not simple.
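Checking the basic six-encounter condition over a series of texts amounts to counting. The sketch below is a minimal illustration, assuming texts are already tokenized into word forms. It uses six, the lower end of the six-to-ten range cited above, as the default threshold and ignores the spacing and context sub-dimensions. Function names are hypothetical.

```python
from collections import Counter


def encounter_counts(texts, targets):
    """Total encounters with each target word across a series of token lists."""
    counts = Counter()
    for tokens in texts:
        counts.update(tok for tok in tokens if tok in targets)
    return {word: counts[word] for word in targets}


def share_meeting_threshold(counts, minimum=6):
    """Proportion of target words encountered at least `minimum` times."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n >= minimum) / len(counts)
```

Applied to a real word list, the share returned is the proportion of that list for which the series actually supplies the minimum number of encounters.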
Interesting schemes have been proposed for finding existing texts
with high degrees of repeated lexis, for example by following one topic
through a number of related news stories (Wang & Nation, 1989) or
through narrow reading (Schmitt & Carter, 2000). Such schemes have
proven able to ensure high degrees of recycling, but only for relatively
small sets of words. It seems likely that found texts would have to be
supplemented by designed texts to ensure systematic opportunities for
vocabulary expansion on a larger scale. A way of testing the amount of
lexical repetition in found texts, or creating it through interactive
modification, is to use text comparison software that can track large
numbers of words through several successive texts. Such a program is
TextLexCompare (available at www.lextutor.ca/text_lex_compare), which
takes two or more texts as input and gives numbers of repeated and
unrepeated words as output. Figure 6 shows two related texts by the same
author, namely the first two chapters of the aforementioned Call of the
Wild by Jack London, entered in the program's dual input windows;
Figure 7 shows the output.
[FIGURE 6 OMITTED]
[FIGURE 7 OMITTED]
The software also provides an experimental recycling index
(recycled words/total words in the second text), which is currently
being calibrated to establish norms of repetition. Initial indications
(from the four demonstration texts available on the entry screen) are
that the degree of repetition between two unrelated texts by different
authors is about 40% of word tokens (largely function words); between
unrelated texts by the same author about 60%; and between related or
sequential texts by the same author about 70%.
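The two-text comparison and the recycling index defined above (recycled words divided by total words in the second text) can be sketched as follows. This is a minimal sketch assuming simple word-form matching; TextLexCompare's own tokenization and matching rules may differ, and the function name is hypothetical.

```python
import re


def tokenize(text):
    """Lower-case word forms; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def compare_texts(first, second):
    """Compare the second text against the first.

    Returns (repeated tokens, unrepeated tokens, novel word types,
    recycling index), where the index is recycled tokens divided by
    total tokens in the second text.
    """
    seen = set(tokenize(first))
    tokens = tokenize(second)
    repeated = sum(1 for tok in tokens if tok in seen)
    novel_types = sorted({tok for tok in tokens if tok not in seen})
    index = repeated / len(tokens) if tokens else 0.0
    return repeated, len(tokens) - repeated, novel_types, index
```

For instance, `compare_texts("The dog pulled the sled.", "The dog saw the husky.")` finds 3 repeated and 2 unrepeated tokens, an index of 0.6.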
The sample analysis in Figure 7 shows that of the 3,335 word tokens in
the second chapter of the book, 2,371 are repeated from the first. In
other words, a reader will already have met about 70% of the second
chapter's running words in the previous chapter (and about 30% will be
'new'). From a vocabulary learning perspective, this is probably a low
proportion of repeated items, as will be outlined below.
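The percentages follow directly from the counts reported for Figure 7 and the definition of the recycling index:

```latex
\[
\text{recycling index}
  = \frac{\text{recycled tokens}}{\text{total tokens in second text}}
  = \frac{2371}{3335} \approx 0.711,
\qquad
\frac{3335 - 2371}{3335} = \frac{964}{3335} \approx 0.289
\]
```

That is, roughly 71% repeated and 29% new tokens.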
The provenance of the unrepeated items in frequency terms can be further
investigated by clicking 'VP Novel Items' at the top right of
the output screen (Figure 7), which is a direct link to VocabProfile
with the novel items as input. The VP analysis shows that for these
texts 36% of the unrepeated lexis is drawn from the 4,000 to 19,000
frequency zones.
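The 'VP Novel Items' step passes the unrepeated items to a frequency profile. The sketch below shows the calculation under two assumptions. First, the "4,000 to 19,000 frequency zones" are read as bands 4 through 19. Second, the share is computed over tokens, though the text does not say whether the 36% refers to tokens or types. The `band_of` mapping and the function name are hypothetical.

```python
def share_in_bands(tokens, band_of, low=4, high=19):
    """Proportion of tokens whose frequency band lies in [low, high].

    Off-list forms (absent from band_of) count toward the denominator
    but never toward the numerator.
    """
    if not tokens:
        return 0.0
    hits = 0
    for tok in tokens:
        band = band_of.get(tok)
        if band is not None and low <= band <= high:
            hits += 1
    return hits / len(tokens)
```

The input would be the unrepeated tokens of the second text, i.e. the items left over after a two-text comparison.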
In a narrative text, the rate of recycling should logically
increase as the story proceeds. How much does it increase in Call of the
Wild? To answer this question, tokens in each new chapter were matched
against tokens in the combined preceding chapters using the multi-text
input feature of TextLexCompare (see bottom half of Figure 6). Results
of the analysis for the seven chapters of the London novel are shown in
Table 2. The point to notice is that the recycling index never goes
above 90% for any chapter. This means that many or most words throughout
the story are being met in density environments of one unknown word in
10 (double the density that learners can handle, according to Laufer,
1989), and, further, that this situation persists right to the end of
the novel. In other words, as it stands this is not a very useful
learning text for many L2 readers. However, the text could be modified
to become a useful
learning text by systematically reducing the flow of novel lexis to a
particular level and then increasing the repetition of what remains.
That is what Longman's writers have done for its Penguin graded version
of Call of the Wild. To gauge the success of their reworking of the
story, the first seven chapters of the simplified version were fed into
TextLexCompare, as was done in the analysis of the original. The results
are shown on the right side of Table 2. As can be seen, the recycling
index is not only higher overall in the graded than in the original
story (89.04% for the graded version against 82.85% for the original;
t (16.9), p < .001) but also rises over the course of the story, so
that in the final chapter the learner is actually meeting new words in
an environment of almost 95% previously met words (previously met at
least once, that is; the actual number of repetitions is recoverable
from the type of data shown in Figure 7).
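The link between a recycling index and the density of unfamiliar words can be made explicit. If a proportion c of running words has been met before, treating the recycling index as that proportion as the discussion above does, then roughly one word in N is new:

```latex
\[
N = \frac{1}{1 - c}
\]
\[
c = 0.90 \Rightarrow N = 10, \qquad
c = 0.95 \Rightarrow N = 20, \qquad
c = 0.8285 \Rightarrow N \approx 5.8, \qquad
c = 0.8904 \Rightarrow N \approx 9.1
\]
```

Halving the new-word proportion from 10% to 5% doubles N, which is the sense in which one unknown word in 10 is double the density of one in 20. The last two values apply the formula to the overall indices for the original and graded versions. Because these are averages across chapters, they only approximate the density a reader meets in any single chapter.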
TextLexCompare or similar software can be used, then, either as an
inspection tool to verify the degree of recycling in sequences of found
texts, or (in conjunction with VocabProfile) as an interactive aid to
principled modification. To summarize, a fairly simple computer-aided
in-house procedure for turning sequences of natural texts into sequences
of learning texts is as follows: use diagnostic testing to determine
students' growth zone (or i+1) in terms of families; find text
sequences that have a high proportion of words from this frequency zone;
use VocabProfile to write out as many words as possible that fall beyond
this level; and use TextLexCompare to write in more of the same or other
words from this level, with the goal of reaching a recycling ratio of
95% well before the end of the story.
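The two computational steps of this procedure (flagging words to write out and checking cumulative recycling chapter by chapter against all preceding chapters) can be sketched as below. Diagnostic testing, text selection, and the actual writing in of words remain human steps. This is a sketch under stated assumptions, not the author's tooling. The function names and the `band_of` mapping are hypothetical. Matching is by word form. "Well before the end of the story" is approximated, as an assumption, as "before the final chapter".

```python
import re


def tokenize(text):
    """Lower-case word forms; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def beyond_level(tokens, band_of, target_max):
    """Word types above the growth zone (or off-list): write-out candidates."""
    return sorted({t for t in tokens if band_of.get(t, target_max + 1) > target_max})


def cumulative_recycling(chapters):
    """Recycling index of each chapter against all preceding chapters combined.

    The list starts with chapter 2, since chapter 1 has no predecessor.
    """
    seen = set(chapters[0]) if chapters else set()
    indices = []
    for tokens in chapters[1:]:
        repeated = sum(1 for t in tokens if t in seen)
        indices.append(repeated / len(tokens) if tokens else 0.0)
        seen.update(tokens)
    return indices


def first_chapter_reaching(indices, goal=0.95):
    """1-based number of the first chapter whose index reaches goal, or None."""
    for offset, value in enumerate(indices):
        if value >= goal:
            return offset + 2  # indices[0] belongs to chapter 2
    return None


def check_sequence(texts, band_of, target_max, goal=0.95):
    """Write-out candidates per chapter, indices, and when the goal is met.

    on_track is True only if the goal is reached before the final chapter
    (an assumed reading of "well before the end").
    """
    chapters = [tokenize(t) for t in texts]
    write_out = {n: beyond_level(ch, band_of, target_max)
                 for n, ch in enumerate(chapters, start=1)}
    indices = cumulative_recycling(chapters)
    reached = first_chapter_reaching(indices, goal)
    on_track = reached is not None and reached < len(chapters)
    return write_out, indices, reached, on_track
```

After each round of editing, the sequence can be re-checked until the write-out lists are empty and `on_track` is True.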
This procedure clearly presupposes that candidate texts are
available in machine-readable format, as indeed they increasingly are.
This format also offers two further opportunities. First, as noted
above, there is a dearth of graded materials in ESL (and, no doubt, in
other languages) generally, and of both expository materials and
post-3000-focused materials in particular. Using the scheme of
frequency-based tests and tools outlined above, it is possible in principle to
organize a large online repository of graded text materials categorized
by size, text type, target level, and recycling schedule.
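One way to picture such a repository is as a list of records indexed by the four categories named above, with a simple filter. The schema below is entirely hypothetical. In particular, representing a "recycling schedule" as per-chapter recycling indices is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedText:
    """One hypothetical repository entry, indexed by the four categories."""
    title: str
    size_tokens: int                  # size in running words
    text_type: str                    # e.g. "fiction" or "expository"
    target_band: int                  # thousand-level band the text targets
    recycling: tuple[float, ...]      # per-chapter recycling indices


def search(repo, text_type=None, target_band=None, max_size=None,
           min_final_recycling=0.0):
    """Return entries matching every criterion that is not None."""
    hits = []
    for entry in repo:
        if text_type is not None and entry.text_type != text_type:
            continue
        if target_band is not None and entry.target_band != target_band:
            continue
        if max_size is not None and entry.size_tokens > max_size:
            continue
        final = entry.recycling[-1] if entry.recycling else 0.0
        if final < min_final_recycling:
            continue
        hits.append(entry)
    return hits
```

A teacher looking for a short expository text aimed at the fourth thousand-word band, for example, would call `search(repo, text_type="expository", target_band=4, max_size=5000)`.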
Computer-aided Enrichment of Undesigned Texts
COPYRIGHT 2007 University of Hawaii, National
Foreign Language Resource Center Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.