
Computing the vocabulary demands of L2 reading.


by Cobb, Tom

A further limitation is that fictional text, whether simplified or not, may be inherently limited as a source of vocabulary growth, given most learners' reading goals. Taking up a concern expressed by Venezky (1982), Gardner (2004) performed a text analysis on fiction vs. expository L1 school reading materials and found that the lexis of expository text (i.e., the language used in academic and professional settings) differs substantially from the lexis of fiction. Expository text comprises more words, different words, and more difficult words, in addition to unfamiliar discourse patterns that do not simply mirror real-life time sequences. Gardner questions the common practice of using fiction texts as preparation for reading expository texts.

There is clearly a case for producing graded materials beyond those presently available. In addition to materials that systematically target vocabulary growth beyond the 2000 level, there is also a need for expository materials to supplement the fictional ones. If publishers are unlikely to produce a complete range of the graded materials that research indicates, then these can be produced by institutions or by individuals, though presumably with some difficulty. However, two complementary types of text computing can make in-house text simplification feasible, if not simple: frequency profiling software can be used to find, adapt, or create texts to a pre-specified lexical profile and coverage, and text comparison software can be used to establish the degree of lexical recycling over a series of texts.

Writing Words Out: Lexical Frequency Profiling

An example of profiling software is VocabProfile (available at www.lextutor.ca/vp/), which categorizes the lexis of texts according to frequency. Users can select either Nation's original frequency scheme of the General Service List and Academic Word List (Coxhead, 2000) or 20 BNC thousand-family lists (5). (Nation's 14 BNC-based lists mentioned above were recently expanded by the author to 20. Viewable versions of the 20 lists are linked from BNC VocabProfile's entry page at http://www.lextutor.ca/vp/bnc/.) Figure 4 shows the input to the BNC version of VocabProfile for a text by a Canadian journalist, Rex Murphy, and Figure 5 shows the output for the same text.

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]
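The core of a frequency profile, the share of running words falling in each frequency band, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VocabProfile's code: the function names and the tiny band map are hypothetical, and the map lists individual word forms, whereas the real lists group inflected and derived forms into families.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_profile(text, band_of):
    """Percentage of running tokens in each frequency band.

    band_of maps a word form to its band (1 = first 1,000, 2 = second 1,000, ...).
    Words missing from the map are counted as "off-list".
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(band_of.get(tok, "off-list") for tok in tokens)
    return {band: round(100 * n / len(tokens), 2) for band, n in counts.items()}

# Hypothetical band assignments; real work would load the 20 BNC lists.
BAND_OF = {"the": 1, "dog": 1, "ran": 1, "a": 1, "across": 1,
           "snow": 2, "sled": 4, "husky": 9}

profile = frequency_profile("The husky ran across the snow. A dog ran.", BAND_OF)
```

In this toy example seven of the nine tokens come from the first band, and one each from the second and ninth bands.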

It is a fairly simple matter to use VocabProfile interactively to modify the lexical level of a sizeable text. VocabProfile will identify the words of learning interest for a particular group, say those between the 5,000 and 10,000 frequency levels for advanced learners, and thereby indicate which words beyond that zone need to be written out so that target items occur in suitable known-to-unknown ratios. In the Murphy text, that would mean writing out about 20 items. Using the window entry mode, the editor can go back and forth, editing and checking in iterations. This work is easier if the learners' approximate level has been established with the same testing framework that the software employs, as is the case with many of the measures available at http://www.lextutor.ca/tests/.
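The check an editor repeats at each iteration can be sketched as follows. This is a hypothetical helper, not part of VocabProfile: it assumes the growth zone is given as thousand-word bands (bands 6 to 10 roughly correspond to the 5,000 to 10,000 range above), treats off-list words as beyond the zone, and uses an invented band map for illustration.

```python
import re

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def plan_rewrite(text, band_of, low=6, high=10):
    """Compare a text against a growth zone of bands low..high.

    Returns (write_out, tokens_per_unknown):
      write_out: sorted word types above band `high` (off-list words count as
                 above it), i.e. candidates for writing out
      tokens_per_unknown: running tokens per token at or above band `low`,
                 i.e. N in "one unknown word in N"
    """
    tokens = tokenize(text)
    bands = [band_of.get(tok, high + 1) for tok in tokens]
    write_out = sorted({tok for tok, b in zip(tokens, bands) if b > high})
    unknown = sum(1 for b in bands if b >= low)
    ratio = len(tokens) / unknown if unknown else float("inf")
    return write_out, ratio

# Hypothetical band assignments for illustration only.
BAND_OF = {"the": 1, "dog": 1, "pulled": 1, "sled": 4, "over": 1, "and": 1,
           "in": 1, "floes": 8, "traces": 7, "strained": 12}

write_out, ratio = plan_rewrite(
    "The dog pulled the sled over the floes and the dog strained in the traces.",
    BAND_OF)
```

Here "strained" is flagged for writing out, and one running word in five is at or beyond the growth zone; the editor would rewrite and re-run until that figure reaches the desired ratio.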

Writing Words In: Text Comparison Software

As already mentioned, research indicates that the average number of encounters needed for reliable retention of a novel lexical item is between six and ten. This basic learning condition has sub-dimensions, such as the spacing between encounters (Mondria & Wit-de Boer, 1993) and the properties of the contexts surrounding the items (Cobb, 1999; Mondria & Wit-de Boer, 1991). But as shown above, simply ensuring six encounters of any kind for a significant proportion of any post-2000 word list is not easy.
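The simplest check on the six-to-ten condition is to count how often each target word occurs across a planned sequence of texts. The sketch below is hypothetical: it matches exact word forms and ignores the spacing and context dimensions just mentioned, which a fuller tool would also need to track.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def under_recycled(texts, targets, min_encounters=6):
    """Target words met fewer than min_encounters times across a text sequence.

    Returns {word: count} for each under-recycled target (count may be 0).
    The default of six is the lower end of the six-to-ten range cited above.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {w: counts[w] for w in sorted(targets) if counts[w] < min_encounters}

texts = ["sled sled traces", "sled traces sled sled", "sled sled"]
gaps = under_recycled(texts, {"sled", "traces", "floes"})
```

In this toy sequence "sled" is met seven times and passes, while "traces" (two encounters) and "floes" (none) would need to be written in.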

Interesting schemes have been proposed for finding existing texts with high degrees of repeated lexis, for example by following one topic through a number of related news stories (Wang & Nation, 1989) or through narrow reading (Schmitt & Carter, 2000). Such schemes have been shown to ensure high degrees of recycling, but only for relatively small sets of words. It seems likely that found texts would have to be supplemented by designed texts to ensure systematic opportunities for vocabulary expansion on a larger scale. One way to measure the amount of lexical repetition in found texts, or to create it through interactive modification, is to use text comparison software that can track large numbers of words through several successive texts. One such program is TextLexCompare (available at www.lextutor.ca/text_lex_compare), which takes two or more texts as input and outputs the numbers of repeated and unrepeated words. Figure 6 shows two related texts by the same author, namely the first two chapters of the aforementioned Call of the Wild by Jack London, ready for analysis in the program's dual input windows; Figure 7 shows the output.

[FIGURE 6 OMITTED]

[FIGURE 7 OMITTED]
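The basic two-text comparison can be sketched as follows. This is an assumption about how such matching might work, not TextLexCompare's actual algorithm: it matches exact lowercase word forms, whereas the real program may group forms into families, and the function name is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def compare_texts(first, second):
    """Count tokens in `second` whose word form already occurred in `first`.

    Returns (repeated_tokens, novel_tokens, novel_types).
    """
    seen = set(tokenize(first))
    tokens = tokenize(second)
    repeated = sum(1 for t in tokens if t in seen)
    novel_types = sorted({t for t in tokens if t not in seen})
    return repeated, len(tokens) - repeated, novel_types

repeated, novel, novel_types = compare_texts(
    "Buck did not read the newspapers",
    "Buck did not know the dogs of the north")
```

Of the nine tokens in the second toy text, five repeat forms from the first (with "the" counted twice) and four are new.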

The software also provides an experimental recycling index (recycled words/total words in the second text), which is currently being calibrated to establish norms of repetition. Initial indications (from the four demonstration texts available on the entry screen) are that the degree of repetition between two unrelated texts by different authors is about 40% of word tokens (largely function words); between unrelated texts by the same author about 60%; and between related or sequential texts by the same author about 70%.

The sample analysis in Figure 7 shows that of the 3,335 word tokens in the second chapter of the book, 2,371 are repeated. In other words, a reader will already have met about 70% of the running items in the previous chapter (and about 30% will be 'new'). From a vocabulary learning perspective, this is probably a low proportion of repeated items, as will be outlined below. The frequency provenance of the unrepeated items can be investigated further by clicking 'VP Novel Items' at the top right of the output screen (Figure 7), which links directly to VocabProfile with the novel items as input. The VP analysis shows that for these texts 36% of the unrepeated lexis is drawn from the 4,000 to 19,000 frequency zones.
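The recycling index (recycled words / total words in the second text) is a simple ratio, and applying it to the Figure 7 counts confirms the "about 70%" reading. The function name below is hypothetical; the two counts are the ones reported for the second chapter.

```python
def recycling_index(recycled_tokens, total_tokens):
    """Recycling index as a percentage: recycled tokens / total tokens in the second text."""
    return 100 * recycled_tokens / total_tokens

# Token counts reported for Chapter 2 of Call of the Wild (Figure 7).
index = recycling_index(2371, 3335)
new_share = 100 - index
print(f"recycled: {index:.1f}%  new: {new_share:.1f}%")  # recycled: 71.1%  new: 28.9%
```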

In a narrative text, the rate of recycling should logically increase as the story proceeds. How much does it increase in Call of the Wild? To answer this question, the tokens in each new chapter were matched against the tokens in all preceding chapters combined, using the multi-text input feature of TextLexCompare (see bottom half of Figure 6). Results of the analysis for the seven chapters of the London novel are shown in Table 2. The point to notice is that the recycling index never goes above 90% for any chapter. This means that many or most words throughout the story are being met in density environments of one unknown word in 10 (double the density that learners can handle, according to Laufer, 1989), and, further, that this situation persists right to the end of the novel. In other words, as it stands, this is not a very useful learning text for many L2 readers. However, it could be made into one by systematically reducing the flow of novel lexis to a particular level and then increasing the repetition of what remains.
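The chapter-by-chapter procedure, matching each new chapter against all preceding chapters combined, can be sketched as below, together with the conversion from an index to "one new word in N" used in the argument (90% gives 1 in 10; Laufer's 95% gives 1 in 20). Both functions are hypothetical illustrations that match exact word forms and assume non-empty chapters; they are not TextLexCompare's implementation.

```python
import re

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def cumulative_indices(chapters):
    """Recycling index (%) of each chapter against all preceding chapters combined.

    The first chapter has no predecessor, so the list starts with chapter 2.
    """
    seen = set(tokenize(chapters[0]))
    indices = []
    for chapter in chapters[1:]:
        tokens = tokenize(chapter)
        recycled = sum(1 for t in tokens if t in seen)
        indices.append(100 * recycled / len(tokens))
        seen.update(tokens)  # this chapter now counts as "previously met"
    return indices

def one_new_in(index):
    """Convert a recycling index (%) to N in "one new word in N"."""
    return 100 / (100 - index)

indices = cumulative_indices(
    ["buck was a dog", "buck was big", "the dog was big and buck was strong"])
```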

That is what Longman's writers have done for the Penguin graded version of Call of the Wild. To gauge the success of their reworking of the story, the first seven chapters of the simplified version were fed into TextLexCompare, as was done in the analysis of the original. The results are shown on the right side of Table 2. As can be seen, not only is the recycling index higher overall in the graded than in the original story (89.04% for graded against 82.85% for ungraded; t (16.9), p < .001), but the index also rises over the course of the story, so that in the final chapter the learner is meeting new words in an environment of almost 95% previously met words (met at least once, that is; the actual number of repetitions can be recovered from the type of data shown in Figure 7).

TextLexCompare or similar software can thus be used either as an inspection tool to verify the degree of recycling in sequences of found texts or, in conjunction with VocabProfile, as an interactive aid to principled modification. To summarize, a fairly simple computer-aided in-house procedure for turning sequences of natural texts into sequences of learning texts is as follows: use diagnostic testing to determine students' growth zone (or i+1) in terms of word families; find text sequences that have a high proportion of words from this frequency zone; use VocabProfile to write out as many words as possible that fall beyond this level; and use TextLexCompare to write in more of the same or other words from this level, with the goal of reaching a recycling ratio of 95% well before the end of the story.
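The checks in this procedure can be combined into one per-chapter report, sketched below under loudly stated assumptions: the function and field names are hypothetical, the growth zone is given as thousand-word bands from a diagnostic test, words are matched as exact forms rather than families, off-list words count as beyond the zone, and chapters are assumed non-empty. It reports what an editor would act on; the rewriting itself remains manual.

```python
import re

def tokenize(text):
    """Lowercase alphabetic word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def assess_sequence(chapters, band_of, low, high, target=95.0):
    """Per-chapter report: growth-zone share, write-out candidates, recycling.

    zone_share:   % of tokens from bands low..high
    write_out:    word types above band `high` (off-list counts as above)
    index:        % of tokens already met in earlier chapters (None for chapter 1)
    meets_target: whether the index reaches the target ratio
    """
    seen = set()
    report = []
    for i, chapter in enumerate(chapters):
        tokens = tokenize(chapter)
        bands = [band_of.get(t, high + 1) for t in tokens]
        zone_share = 100 * sum(low <= b <= high for b in bands) / len(tokens)
        write_out = sorted({t for t, b in zip(tokens, bands) if b > high})
        index = 100 * sum(t in seen for t in tokens) / len(tokens) if i > 0 else None
        report.append({
            "zone_share": zone_share,
            "write_out": write_out,
            "index": index,
            "meets_target": index is not None and index >= target,
        })
        seen.update(tokens)
    return report

# Hypothetical band map and chapters for illustration only.
band_of = {"buck": 1, "was": 1, "a": 1, "dog": 1, "sled": 4, "floes": 8, "strained": 12}
chapters = ["buck was a dog", "buck was a sled dog floes", "buck strained", "buck was a dog"]
report = assess_sequence(chapters, band_of, low=6, high=10)
```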

This procedure clearly presupposes that candidate texts are available in machine-readable format, as they increasingly are. This format also offers two further opportunities. First, as noted above, there is a dearth of graded materials in the world of ESL (and no doubt of other languages) generally, and of both expository and post-3000-focused materials in particular. Using the scheme of frequency-based tests and tools outlined above, it is possible in principle to organize a large online repository of graded text materials categorized by size, text type, target level, and recycling schedule.
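One way such a repository might be indexed is sketched below. The record fields mirror the four categories just named, but the schema, the field names, and the choice to represent a "recycling schedule" by a single recycling index are assumptions made for illustration, as are the sample entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GradedText:
    """One catalogue entry (hypothetical schema)."""
    title: str
    tokens: int             # size in running words
    text_type: str          # e.g. "fiction" or "expository"
    target_band: int        # thousand-word band the text is built to teach
    recycling_index: float  # % recycled tokens against earlier texts in its sequence

def find_texts(catalogue, text_type, target_band, min_index=0.0, max_tokens=None):
    """Filter entries by text type, target level, recycling and (optionally) size."""
    return [
        t for t in catalogue
        if t.text_type == text_type
        and t.target_band == target_band
        and t.recycling_index >= min_index
        and (max_tokens is None or t.tokens <= max_tokens)
    ]

# Hypothetical entries for illustration only.
catalogue = [
    GradedText("Set A, part 1", 1200, "expository", 4, 91.5),
    GradedText("Set A, part 2", 1500, "expository", 4, 88.0),
    GradedText("Story B, ch. 1", 3000, "fiction", 4, 93.0),
]
matches = find_texts(catalogue, "expository", 4, min_index=90)
```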

Computer-aided Enrichment of Undesigned Texts


COPYRIGHT 2007 University of Hawaii, National Foreign Language Resource Center Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2007, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

