Employing a methodology of repeated readings and a computer-based
testing apparatus that allowed the tracking of large numbers of words,
Horst and Meara were able to trace the ups and downs of word knowledge
that normally pass below the radar of conventional tests. What new
information emerges from this methodology? For just this one state of
the matrix, Heading 3 (column 6) of Figure 1 shows that, of the 300
learnable targets, 44 (or 3 + 6 + 35) have moved into the "I
definitely know this word" state from another knowledge state. This
is new knowledge that would probably have shown up on a standard test.
However, another 56 words (27 + 9 + 20) have made lesser gains (into the
"not sure" and "think I know" territory) that would
probably not have shown up on a standard test. These ratios change over
the course of several readings, as the learning opportunities diminish,
but in the first three readings there are often at least as many words
moving rightward below the radar as above it, i.e., moving to
knowledge-state 1 or 2 rather than 3. A surprising number of words move
to the right and then back to the left for a time, presumably reflecting
either a learning and forgetting cycle, or a hypothesis testing phase,
or elements of both (Horst, 2000). The evidence from the matrix studies
broadly shows that Krashen is right: there is more vocabulary learning
from reading than most tests measure. It seems uncontroversial to
generalize from Horst and Meara's data that the total amount of
vocabulary learning from reading might be as much as double what the
various studies using more conventional measures have typically shown.
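The arithmetic behind this estimate can be laid out in a short sketch. It uses only the counts quoted above for the single matrix state reported; the variable names are illustrative and are not part of Horst and Meara's procedure.

```python
# Counts quoted in the text for one state of the matrix (300 learnable targets).
LEARNABLE_TARGETS = 300
moved_to_definitely_know = [3, 6, 35]  # gains a standard test would probably detect
moved_to_lesser_states = [27, 9, 20]   # gains into "not sure" / "think I know"

visible = sum(moved_to_definitely_know)
hidden = sum(moved_to_lesser_states)
total = visible + hidden

# Ratio of all gains to the gains a conventional test would register.
total_to_visible = total / visible

print(f"visible gains: {visible} of {LEARNABLE_TARGETS}")
print(f"hidden gains:  {hidden} of {LEARNABLE_TARGETS}")
print(f"all gains / visible gains: {total_to_visible:.2f}")
```

For this one state the ratio comes out slightly above two. Since the ratios change over successive readings, the figure illustrates the reasoning rather than giving an overall estimate.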
An alternate source of evidence for substantial amounts of hidden
vocabulary growth through reading is provided by Waring and Takaki
(2003) using a different methodology. These researchers tested
twenty-five words acquired from reading with measures at three levels of
difficulty--passive recognition (that a word had been seen in the text),
aided meaning selection (by a multiple choice measure), and unaided
recall (through a translation test)--and found that scores were almost
2.5 times higher for multiple choice than for translation, and more than
3 times higher for recognition than for translation. In other words, most
of the initial learning represented by remembering that a word had
appeared in the text would not have registered on either of the other
more difficult tests. This finding thus complements the matrix finding,
albeit for a smaller number of items.
But can we get from here to sufficiency? Even if it is clear that
more learning takes place through word encounters than most tests
measure, is free reading able to provide a sufficient number of such
encounters?
Claim B: The Sufficiency of Hidden Learning
Krashen's related claim, the sufficiency of hidden vocabulary
growth, can also be tested empirically in an L2 context, as the
following very basic experiment in corpus analysis demonstrates. But
first we need some definitions.
To arrive at an operational definition of sufficiency, we might ask
questions such as: How many words are enough for various purposes, such
as to begin academic study in a second language, or to undertake a
professional activity? Vocabulary researchers working on questions of
coverage calculate the minimum number of word families needed for
non-specialist reading of materials designed for native speakers to be
between 3000 (Laufer, 1989) and 5000 word families (Hirsch & Nation,
1992)--provided these are high frequency items and not just random
pick-ups. How many encounters are needed for word learning to occur? The
number varies with a host of individual and contextual factors, but the
majority of studies (reviewed in Zahar, Cobb & Spada, 2001) find
that an average of six to ten encounters is needed for stable initial
word learning to occur. In Horst's (2000) matrix work, six
encounters were the minimum exposure for words to travel reliably from
state 0 to state 3 and stabilize. Will anything like 3,000 word families
be met six times apiece through free reading?
Investigation 1
The materials assembled to answer this question were chosen to give
the free reading argument optimal chances of succeeding. Thus the
vocabulary size assumed to be sufficient for comprehension and learning
was set as low as could be deemed plausible, at 3000 word families of
written English rather than 5000. In contrast, the amount of reading a
typical L2 learner would be likely to achieve was set as high as could
be deemed plausible. A sample of the free reading that an ESL reader
might be expected to undertake over a year or two of language study was
extracted from the 1 million word Brown corpus (Kucera & Francis,
1979). This classic corpus comprises 500 text samples of roughly 2,000
words each, grouped into sub-corpora of various sizes (different kinds of
fiction, etc., as shown in the bottom half of Figure 2). To reflect the
kinds of reading learners might do, the original sub-corpora were
further grouped into three broad categories (press, academic, and
fiction) of roughly similar size (179,000 words, 163,000 words, and
175,000 words, respectively). It is reasonable to suppose that one of
these three groupings is a plausible if optimistic representation of the
amount of free reading of authentic material that learners might achieve
over a year or two of language study (these word counts are roughly
equivalent to 100 pages of newspaper text, six stories the size of Alice
in Wonderland, or 17 academic studies the length of this one).
[FIGURE 2 OMITTED]
High frequency words were extracted from the 100-million-word
British National Corpus (Leech, Rayson, & Wilson, 2001) and grouped
into families and then into thousand-family lists by Nation (2006,
available at http://www.lextutor.ca/vp/bnc/). The first three of
Nation's lists (i.e., the 3000 most frequent word families)
represent the current best estimate of the basic learner lexicon of
English. A random item-from-wordlist generator (available at
http://www.lextutor.ca/rand_words/) produced 20 sets of three 10-word
samples from the 1000, 2000, and 3000 British National Corpus (BNC)
lists. One of these sets was selected randomly for use as sample
learning targets in the investigation (3).
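The sampling step just described can be approximated as follows. This is a minimal sketch, not the online generator itself: the function name, the seeds, and the placeholder family lists are all assumptions made for illustration, and real use would substitute the actual 1000, 2000, and 3000 BNC family lists.

```python
import random

def draw_sample_sets(band_lists, n_sets=20, per_band=10, seed=None):
    """Return n_sets sets, each holding per_band random families from every band."""
    rng = random.Random(seed)
    return [
        {band: rng.sample(families, per_band) for band, families in band_lists.items()}
        for _ in range(n_sets)
    ]

# Placeholder family lists (hypothetical names); each band stands for 1000 families.
band_lists = {
    "1000": [f"k1_family_{i}" for i in range(1000)],
    "2000": [f"k2_family_{i}" for i in range(1000)],
    "3000": [f"k3_family_{i}" for i in range(1000)],
}

sets = draw_sample_sets(band_lists, seed=0)

# One of the 20 sets is then picked at random to serve as the learning targets.
targets = random.Random(1).choice(sets)
```

Sampling without replacement within each band (random.sample) ensures that no family appears twice in the same 10-word sample.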
A computer program calculated the number of occurrences of each
sample word family that a learner would encounter in each of the Brown
sub-corpora. This computer program called Range (Heatley & Nation,
1994) was adapted for Internet by the author and is available at
http://www.lextutor.ca/range/. Figure 2 shows the distribution of a
word, phrase, or family throughout a set of texts. The original version
of the program allowed users to specify their own texts; the online
version shown in Figure 2 provides a set of standard texts, namely the
15 original sub-corpora or the three larger groupings of the Brown
corpus already mentioned. In the present experiment, word families as
opposed to individual words were the search units. This was achieved by
entering a stem form plus apostrophe for each item as appropriate
(abandon' finds abandons, abandoning, abandonment, as shown in
Figure 2). Since it cannot be taken for granted that learners will
recognize family members as being related (Schmitt & Zimmerman,
2002), incorporating whole families in the analysis is likely to provide
a generous estimate of the learning opportunities in the text sample. A
similarly generous assumption is that learners have perfect memory for
encountered items over extended time and text.
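The stem-plus-apostrophe search can be imitated with a simple prefix match. The sketch below is an assumption about how such a search might work, not the Range program's actual code; the function names and the three toy texts are invented for illustration.

```python
import re

def family_pattern(stem_query):
    """Build a regex for a query such as "abandon'" or "bus".

    A trailing apostrophe is read as "stem plus any ending"; a query
    without one matches that exact word only. This mimics the search
    convention described in the text, not the real Range implementation.
    """
    if stem_query.endswith("'"):
        stem = re.escape(stem_query[:-1])
        return re.compile(r"\b" + stem + r"[a-z]*\b", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(stem_query) + r"\b", re.IGNORECASE)

def count_family(stem_query, corpora):
    """Count matches of the query in each named text."""
    pattern = family_pattern(stem_query)
    return {name: len(pattern.findall(text)) for name, text in corpora.items()}

# Toy texts standing in for the three Brown groupings (invented sentences).
corpora = {
    "press": "They abandoned the plan. Abandonment of the bus route followed.",
    "academic": "The hypothesis was abandoned.",
    "fiction": "He would never abandon her; the bus was late.",
}

abandon_counts = count_family("abandon'", corpora)
bus_counts = count_family("bus", corpora)
```

A prefix match of this kind is generous in a second way: a stem such as wire' would also match unrelated words that happen to begin with the same letters, so counts produced this way are upper bounds on family occurrences.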
The distribution of the BNC 4000-level word family abandon'
throughout the three major divisions of the Brown corpus is requested in
Figure 2; the output of the Range search is shown in Figure 3. The point
to notice is that while this item appears in all three samples, it
appears more than six times in only one of them (press writing).
[FIGURE 3 OMITTED]
The overall and perhaps unexpected finding from this analysis is
that after the most frequent 1000 items, family ranks tend to thin quite
rapidly, and with them the learning opportunities. Table 1 shows the
distributions in the three Brown samples for the ten target word
families from each of the three most frequent BNC levels. For each
target word family, the total number of occurrences in each sub-corpus
is shown; at the bottom of each column, the number of targets appearing
more than six times in each sub-corpus is shown. As can be seen, all
1000-level word families will be met more than six times in press
writing, all except bus more than six times in academic writing, and all
except bus and associat' in fiction. However, five 2000-level
families (persua', technolog', wire', analy', and
sue) will dip below six encounters in one or more areas. And none of the
3000-level families will be encountered six times in all three areas,
and half or more are not met six times in any area. (No member of the
irritat' family is met in 163,000 words of academic text!) A
sideline finding is that fiction writing, once the usual focus of free
reading programs for learners in the process of acquiring 1000 and
2000-level vocabulary, does not present the strongest learning
opportunities in either of these zones. Fiction does, however, seem to
be a reasonable source of 3000-level items, providing six occurrences
for five of its 10 words, as compared to four for press and three for
academic writing. It is therefore worth looking at the vocabulary
growing opportunities of fiction reading more closely.
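The tallies at the bottom of each column of Table 1 follow from a simple threshold count, sketched below. The counts in the example are invented for three made-up families and are not the values in Table 1; the function name is likewise hypothetical.

```python
THRESHOLD = 6  # the table tallies targets appearing more than six times

def tally_above_threshold(table, threshold=THRESHOLD):
    """Count, per sub-corpus, the families occurring more than `threshold` times."""
    tallies = {}
    for counts in table.values():
        for subcorpus, n in counts.items():
            tallies[subcorpus] = tallies.get(subcorpus, 0) + int(n > threshold)
    return tallies

# Invented occurrence counts; NOT the data reported in Table 1.
example_table = {
    "family_a": {"press": 12, "academic": 7, "fiction": 3},
    "family_b": {"press": 6, "academic": 0, "fiction": 2},
    "family_c": {"press": 40, "academic": 25, "fiction": 31},
}

tallies = tally_above_threshold(example_table)
print(tallies)
```

Note that a family occurring exactly six times (family_b in press) does not count under a strict "more than six" rule; changing the comparison to n >= threshold would apply the "six encounters" reading instead.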
Investigation 2
COPYRIGHT 2007 University of Hawaii, National Foreign Language Resource Center.
Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
NOTE: All illustrations and photos have been removed from this article.