Linguistic computing can make two important contributions to second
language (L2) reading instruction. One is to resolve longstanding
research issues that are based on an insufficiency of data for the
researcher, and the other is to resolve related pedagogical problems
based on insufficiency of input for the learner. The research section of
the paper addresses the question of whether reading alone can give
learners enough vocabulary to read. When the computer's ability to
process large amounts of both learner and linguistic data is applied to
this question, it becomes clear that, for the vast majority of L2
learners, free or wide reading alone is not a sufficient source of
vocabulary knowledge for reading. But computer processing also points to
solutions to this problem. Through its ability to reorganize and link
documents, the networked computer can increase the supply of vocabulary
input that is available to the learner. The development section of the
paper elaborates a principled role for computing in L2 reading pedagogy,
with examples, in two broad areas: computer-based text design and
computational enrichment of undesigned texts.
INTRODUCTION
There is a lexical paradox at the heart of reading in a second
language. On one side, after decades of guesswork, there is now
widespread agreement among researchers that text comprehension depends
heavily on detailed knowledge of most of the words in a text. However,
it is also clear that the words that occur in texts are mainly available
for learning in texts themselves. That is because the lexis (vocabulary)
of texts, at least in languages like English, is far more extensive than
the lexis of conversation or other non-textual media. Thus prospective
readers of English must bring to reading the same knowledge they are
intended to get from reading. This paradox has been known in outline for
some time, but in terms loose enough to allow opposite proposals for its
resolution. On one hand, Nation (e.g., 2001) argues for explicit
instruction of targeted vocabulary outside the reading context itself.
On the other, Krashen (e.g., 1989) believes that all the lexis needed
for reading can be acquired naturally through reading itself, in a
second language as in a first. It is only recently that the dimensions
of this paradox could be quantified, with the application of computer
text analysis to questions in language learning. What this
quantification shows is the extreme unlikelihood of developing an
adequate L2 reading lexicon through reading alone, even in highly
favourable circumstances. This case is made in the initial research part
of the paper. The subsequent development section goes on to show how the
text computing that defined the lexical paradox can be re-tooled to
break it, with (1) research-based design of texts and (2) lexical
enrichment of undesigned texts. Empirical support for computational
tools will be provided where available; all tools referred to, both
analytical and pedagogical, are publicly available at the Compleat
Lexical Tutor website (www.lextutor.ca).
DEFINING THE LEXICAL PARADOX
In applied linguistics conversations, turn-taking can involve a
delay of several years. An example is Krashen's (2003) paper
entitled Free voluntary reading: Still a very good idea, which
criticizes the findings of a study by Horst, Cobb and Meara (1998) that
had called into question the amount of vocabulary acquisition that
normally results from free, pleasurable, meaning-oriented extensive
reading. This study found that even with all the usual variables of an
empirical study of extensive L2 reading controlled rather more tightly
than usual, the number of new words that are learned through the
experience of reading a complete, motivating, level-appropriate book of
about 20,000 running words is minimal, and does not indicate that
reading itself can reasonably be seen as the only or even main source of
an adult reading lexicon. The gist of Krashen's response (2003) was
that such studies typically underestimate the amount of lexical growth
that takes place as words are encountered and re-encountered in the
course of free reading. To support this contention, he calculated an
effect size from the Horst, Cobb and Meara data that he interpreted to
show stronger learning than these researchers' conclusions had
implied. But more importantly, beyond the data, he believes that many
words and phrases are learned from reading that do not appear in the
test results of this type of study, owing to the crude nature of the
testing instruments employed, which typically cannot account for partial
or incremental learning. According to this argument, word knowledge is
bubbling invisibly under the surface as one reads, and may appear as a
known item in a vocabulary test only some time later (1). This hidden
vocabulary learning from reading is seen as extensive enough to "do
the entire job" (Krashen, 1989, p. 448) of acquiring a second
lexicon, an idea that Waring and Nation (2004, p. 11) describe as
"now entrenched" within second and foreign language teaching.
Similar claims are many (e.g. Elley's belief that children
graduating from a book flood approach had learned "all the
vocabulary and syntax they required from repeated interactions with good
stories," 1991, pp. 378-79); but clear definitions of "the
entire job" are few.
Krashen has taken part in a number of conventional
vocabulary-from-reading studies that use conventional measures, but
these studies have not provided empirical evidence of either the extent
of such hidden learning, or its sufficiency as the source of a reading
lexicon. Instead, he cites the "default explanation" for the
size of the adult lexicon, an account borrowed from first language (L1)
theorizing (e.g., Nagy, 1988; Sternberg, 1987), whereby the lexical
paradox is resolved through the sheer volume of reading time available
over the course of growing up in a language. According to this
explanation, a lifetime of L1 reading must eventually succeed in doing
the job--even if very little measurable vocabulary knowledge is
registered in any one reading event--since there is no other plausible
way to account for the large number of words that adult native speakers
typically know.
The extension of research assumptions and procedures from L1 to the
L2 learning contexts is questionable at best (2), particularly in the
absence of empirical support. But as will be shown here, both the extent
and sufficiency of hidden vocabulary learning can in fact be
investigated empirically within the L2 context, without recourse to
default arguments. Key to this undertaking are research
instrumentation, a method, and a technology for measuring small increments
of lexical knowledge that can be applied to sufficient numbers of words
over a sufficient length of time to be plausibly commensurate with the
known vocabulary sizes of learners: roughly 17,000 English word families
in the case of a typical literate adult L1 lexicon (as calculated by
Goulden, Nation & Read, 1990), or the 5,000 most frequent word
families in the case of L2 (proposed as minimal for effective L2 reading
by Hirsch & Nation, 1992). That is to say, the experimentation
requires substantially more than the handful of words normally tested in
this type of research (typically between 10 and 30, as discussed in
Horst et al., 1998) in order to arrive at a credible estimate of
"the entire job."
Claim A: The Extent of Hidden Learning
An instrument capable of measuring incremental knowledge is Wesche
and Paribakht's (1996) vocabulary knowledge scale, or VKS, which
asks learners to rate their knowledge of words not in binary terms (I
know/I don't know what this word means) but on a five-point scale
(ranging from "I don't remember having seen this word
before," to "I can use this word in a sentence"). But
since the VKS requires learners to also demonstrate their knowledge
(e.g. by writing sentences), it is cumbersome to use in measuring
changes in the knowledge of large numbers of words over time through
repeated encounters, as would be needed to test the claim of extensive
amounts of hidden acquisition. Therefore, Horst and Meara (1999) and
Horst (2000) devised the following ratings-only version, which was
suitable for adaptation to computer.
0 = I definitely don't know what this word means
1 = I am not really sure what this word means
2 = I think I know what this word means
3 = I definitely know what this word means (Horst, 2000, Chapter 7,
p. 149)
Following a reading of a text, learners can efficiently rate their
knowledge of a large number of its words using a computer input that
employs this scale and stores the number of words rated 0, 1, 2, and 3
for each learner and each reading. But the real innovation of the
adaptation is the conversion of the scale to a matrix, which allows the
comparison of ratings over two (or more) readings of the same text. The
matrix (shown in Figure 1) is essentially the 4-point scale in two
dimensions, so that each cell represents results at both time n and
after a subsequent reading (time n+1). For example, the data in the
first horizontal row shows that 75 words had been rated 0 after reading
n and were still rated 0 (I don't know) after reading n+1, but that
27 words had moved from 0 to 1, nine words from 0 to 2, and three words
from 0 to 3. The second row shows how words rated 1 (not sure) at time n
were distributed at time n+1, and so on. In other words, the cell
intersections capture the numbers of words that have changed or failed
to change from one knowledge state to another as a result of a
subsequent reading.
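The bookkeeping behind such a matrix is simple enough to sketch. The Python fragment below is an illustration only, not the software used by Horst and Meara; the function name transition_matrix and the five rated words are hypothetical. It cross-tabulates each word's rating after reading n against its rating after reading n+1.

```python
from collections import Counter

def transition_matrix(before, after, levels=4):
    """Cross-tabulate ratings (0-3) given to the same words after two readings.

    `before` and `after` map each word to its rating after reading n and n+1.
    Row index = rating at time n, column index = rating at time n+1.
    """
    counts = Counter((before[w], after[w]) for w in before if w in after)
    return [[counts[(r, c)] for c in range(levels)] for r in range(levels)]

# Hypothetical ratings for five words, not data from the study.
before = {"gaunt": 0, "sledge": 0, "toil": 1, "lash": 2, "camp": 3}
after = {"gaunt": 0, "sledge": 2, "toil": 3, "lash": 3, "camp": 3}

matrix = transition_matrix(before, after)
for row in matrix:
    print(row)
```

Each row sums to the number of words that held that rating at time n, so cells to the right of the diagonal are gains and cells to the left are losses.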
Employing a methodology of repeated readings and a computer-based
testing apparatus that allowed the tracking of large numbers of words,
Horst and Meara were able to trace the ups and downs of word knowledge
that normally pass below the radar of conventional tests. What new
information emerges from this methodology? For just this one state of
the matrix, heading 3 (column 6) of Figure 1 shows that, of the 300
learnable targets, 44 (3 + 6 + 35) have moved into the "I
definitely know this word" state from another knowledge state. This
is new knowledge that would probably have shown up on a standard test.
However, another 56 words (27 + 9 + 20) have made lesser gains (into the
"not sure" and "think I know" territory) that would
probably not have shown up on a standard test. These ratios change over
the course of several readings, as the learning opportunities diminish,
but in the first three readings there are often at least as many words
moving rightward below the radar as above it, i.e., moving to
knowledge-state 1 or 2 rather than 3. A surprising number of words move
to the right and then back to the left for a time, presumably reflecting
either a learning and forgetting cycle, or a hypothesis testing phase,
or elements of both (Horst, 2000). The evidence from the matrix studies
broadly shows that Krashen is right: there is more vocabulary learning
from reading than most tests measure. It seems uncontroversial to
generalize from Horst and Meara's data that the total amount of
vocabulary learning from reading might be as much as double what the
various studies using more conventional measures have typically shown.
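The arithmetic behind the two totals can be checked directly. The sketch below uses only the cell counts quoted in this section; the assignment of 6, 35, and 20 to particular cells below the first row is inferred from the order in which they are listed, since only the first row is spelled out in the text.

```python
# Cell counts quoted in the text; keys are (rating at time n, rating at time n+1).
# The placement of 20, 6, and 35 in rows 1 and 2 is an inference, not a quotation.
moves = {
    (0, 1): 27, (0, 2): 9, (0, 3): 3,
    (1, 2): 20, (1, 3): 6,
    (2, 3): 35,
}

# Gains a yes/no test would probably register: arrival in state 3.
visible = sum(n for (start, end), n in moves.items() if end == 3)
# Gains it would probably miss: rightward moves that stop at state 1 or 2.
hidden = sum(n for (start, end), n in moves.items() if start < end < 3)

print(visible, hidden, (visible + hidden) / visible)
```

On these figures the visible gains number 44 and the hidden gains 56, so the total is a little more than twice what a test registering only state 3 would show.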
An alternate source of evidence for substantial amounts of hidden
vocabulary growth through reading is provided by Waring and Takaki
(2003) using a different methodology. These researchers tested
twenty-five words acquired from reading with measures at three levels of
difficulty--passive recognition (that a word had been seen in the text),
aided meaning selection (by a multiple choice measure), and unaided
recall (through a translation test)--and found that scores were almost
2.5 times higher for multiple choice than translation, and more than 3
times higher for recognition than for translation. In other words, most
of the initial learning represented by remembering that a word had
appeared in the text would not have registered on either of the other
more difficult tests. This finding thus complements the matrix finding,
albeit for a smaller number of items.
But can we get from here to sufficiency? Even if it is clear that
more learning takes place through word encounters than most tests
measure, is free reading able to provide a sufficient number of such
encounters?
Claim B: The Sufficiency of Hidden Learning
Krashen's related claim, the sufficiency of hidden vocabulary
growth, can also be tested empirically in an L2 context, as the
following very basic experiment in corpus analysis demonstrates. But
first we need some definitions.
To arrive at an operational definition of sufficiency, we might ask
questions such as: How many words are enough for various purposes, such
as to begin academic study in a second language, or to undertake a
professional activity? Vocabulary researchers working on questions of
coverage calculate the minimum number of word families needed for
non-specialist reading of materials designed for native speakers to be
between 3000 (Laufer, 1989) and 5000 word families (Hirsch & Nation,
1992)--provided these are high frequency items and not just random
pick-ups. How many encounters are needed for word learning to occur? The
number varies with a host of individual and contextual factors, but the
majority of studies (reviewed in Zahar, Cobb & Spada, 2001) find
that an average of six to ten encounters is needed for stable initial
word learning to occur. In Horst's (2000) matrix work, six
encounters were the minimum exposure for words to travel reliably from
state 0 to state 3 and stabilize. Will anything like 3,000 word families
be met six times apiece through free reading?
Investigation 1
The materials assembled to answer this question were chosen to give
the free reading argument optimal chances of succeeding. Thus the
vocabulary size assumed to be sufficient for comprehension and learning
was set as low as could be deemed plausible, at 3000 word families of
written English rather than 5000. In contrast, the amount of reading a
typical L2 learner would be likely to achieve was set as high as could
be deemed plausible. A sample of the free reading that an ESL reader
might be expected to undertake over a year or two of language study was
extracted from the 1 million word Brown corpus (Kucera & Francis,
1979). This classic corpus comprises 500 text samples of roughly 2,000
words grouped into subcorpora of various sizes (different kinds of
fiction, etc., as shown in the bottom half of Figure 2). To reflect the
kinds of reading learners might do, the original sub-corpora were
further grouped into three broad categories (press, academic, and
fiction) of roughly similar size (179,000 words, 163,000 words, and
175,000 words, respectively). It is reasonable to suppose that one of
these three groupings is a plausible if optimistic representation of the
amount of free reading of authentic material that learners might achieve
over a year or two of language study (these word counts are roughly
equivalent to 100 pages of newspaper text, six stories the size of Alice
in Wonderland, or 17 academic studies the length of this one).
[FIGURE 2 OMITTED]
High frequency words were extracted from the 100-million-word
British National Corpus (Leech, Rayson, & Wilson, 2001) and grouped
into families and then into thousand-family lists by Nation (2006,
available at http://www.lextutor.ca/vp/bnc/). The first three of
Nation's lists (i.e. the 3000 most frequent word families)
represent the current best estimate of the basic learner lexicon of
English. A random item-from-wordlist generator (available at
http://www.lextutor.ca/rand_words/) produced 20 sets of three 10-word
samples from the 1000, 2000, and 3000 British National Corpus (BNC)
lists. One of these sets was selected randomly for use as sample
learning targets in the investigation (3).
A computer program calculated the number of occurrences of each
sample word family that a learner would encounter in each of the Brown
sub-corpora. This program, called Range (Heatley & Nation,
1994), was adapted for the Internet by the author and is available at
http://www.lextutor.ca/range/. Figure 2 shows the distribution of a
word, phrase, or family throughout a set of texts. The original version
of the program allowed users to specify their own texts; the online
version shown in Figure 2 provides a set of standard texts, namely the
15 original sub-corpora or the three larger groupings of the Brown
corpus already mentioned. In the present experiment, word families as
opposed to individual words were the search units. This was achieved by
entering a stem form plus apostrophe for each item as appropriate
(abandon' finds abandons, abandoning, abandonment, as shown in
Figure 2). Since it cannot be taken for granted that learners will
recognize family members as being related (Schmitt & Zimmerman,
2002), incorporating whole families in the analysis is likely to provide
a generous estimate of the learning opportunities in the text sample. A
similarly generous assumption is that learners have perfect memory for
encountered items over extended time and text.
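A count of this kind can be approximated in a few lines. The following Python sketch is not the Range program itself; it assumes that a stem plus apostrophe can be treated as a simple prefix match, and the function names, the toy texts, and the threshold of six or more occurrences are illustrative assumptions.

```python
import re

def family_counts(stem, texts):
    """Count tokens beginning with `stem` in each named text.

    Prefix matching is only a rough stand-in for a word family and can
    overcount (the stem "bus" would also match "business").
    """
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return {name: len(pattern.findall(text)) for name, text in texts.items()}

def met_often_enough(counts, threshold=6):
    """Names of texts in which the family occurs at least `threshold` times."""
    return [name for name, n in counts.items() if n >= threshold]

# Toy stand-ins for the three Brown groupings, not the real corpus.
texts = {
    "press": "They abandoned the plan. " * 7,
    "academic": "Abandonment of the theory followed. " * 2,
    "fiction": "He would never abandon her. " * 3,
}

counts = family_counts("abandon", texts)
print(counts, met_often_enough(counts))
```

The overcounting noted in the docstring works in the same direction as the other generous assumptions: it can only inflate the apparent number of learning opportunities.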
The distribution of the BNC 3000-level word family abandon'
throughout the three major divisions of the Brown corpus is requested in
Figure 2; the output of the Range search is shown in Figure 3. The point
to notice is that while this item appears in all three samples, it
appears more than six times in only one of them (press writing).
[FIGURE 3 OMITTED]
The overall and perhaps unexpected finding from this analysis is
that after the most frequent 1000 items, family ranks tend to thin quite
rapidly, and with them the learning opportunities. Table 1 shows the
distributions in the three Brown samples for the ten target word
families from each of the three most frequent BNC levels. For each
target word family, the total number of occurrences in each sub-corpus
is shown; at the bottom of each column, the number of targets appearing
more than six times in each subcorpus is shown. As can be seen, all
1000-level word families will be met more than six times in press
writing, all except bus more than six times in academic writing, and all
except bus and associat' in fiction. However, five 2000-level
families (persua', technolog', wire', analy', and
sue) will dip below six encounters in one or more areas. And none of the
3000-level families will be encountered six times in all three areas,
and half or more are not met six times in any area. (No member of the
irritat' family is met in 163,000 words of academic text!) A
sideline finding is that fiction writing, once the usual focus of free
reading programs for learners in the process of acquiring 1000 and
2000-level vocabulary, does not present the strongest learning
opportunities in either of these zones. Fiction does, however, seem to
be a reasonable source of 3000-level items, providing six occurrences
for five of its 10 words, as compared to four for press and three for
academic writing. It is therefore worth looking at the vocabulary
growing opportunities of fiction reading more closely.
Investigation 2
For a complementary investigation, the sufficiency of a generous
diet of free fiction reading as the sole or main source of vocabulary
growth for 3000-level families is now examined. At the same time, the
reading sample is changed from a corpus sample of texts produced by many
writers to a sample of texts produced by a single author, where the
vocabulary learning opportunities are arguably greater (through
characteristic themes, repetitions, etc.). A corpus of just under
300,000 words was assembled from seven Jack London stories (including
school favorites Call of the Wild (1903) and White Fang (1906), all
offered free of cost at http://london.sonoma.edu/Writings/) as a second
plausible representation of a heavy diet of free reading. Would a
learner who read all these stories meet most of the 3000-level families
six times apiece?
The computational tool used in this analysis is lexical frequency
profiling, in this case the BNC version of VocabProfile (available at
http://www.lextutor.ca/vp/bnc/, illustrated below in another context),
which breaks any English text into its frequency levels according to the
thousand-levels scheme already employed. The results of this analysis
are as follows: The full collection of London adventure stories was
shown to contain 817 word families at the 3000 level; however, only 469
of them are met six times or more, while 348 are met five times or fewer
(181 of them twice or fewer). In other words, fewer than half of the
1,000 families at this level will be met enough times for reliable
learning to occur. Interestingly, this result
is similar to that shown in Table 1, where half the 3000-level words
appeared six times or more in the fiction sub-corpus.
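The tally reported here can be pictured as follows. The sketch is a simplified stand-in for frequency profiling, not VocabProfile's actual implementation: the mapping family_of, the function level_profile, and the sample tokens are all hypothetical. It also checks the reported figures against one another. Note that 469 is about 57% of the 817 families present in the stories but about 47% of the 1,000 families on the 3000-level list, which appears to be the sense in which fewer than half are met often enough.

```python
from collections import Counter

def level_profile(tokens, family_of, threshold=6):
    """Tally how often each listed family occurs and how many reach `threshold`.

    `family_of` maps a word form to its family headword for ONE frequency
    level; forms missing from the mapping are ignored.
    """
    hits = Counter(family_of[t] for t in tokens if t in family_of)
    enough = sum(1 for n in hits.values() if n >= threshold)
    return {"families_met": len(hits), "met_enough": enough,
            "met_too_rarely": len(hits) - enough}

# Hypothetical three-family list and token stream, for illustration only.
family_of = {"abandon": "abandon", "abandoned": "abandon",
             "irritate": "irritate", "irritated": "irritate",
             "fragment": "fragment"}
tokens = ["abandon"] * 4 + ["abandoned"] * 3 + ["irritated"] * 2 + ["dog"] * 10

profile = level_profile(tokens, family_of)
print(profile)

# Consistency check on the figures reported for the London stories.
assert 469 + 348 == 817
share_of_present = round(469 / 817, 2)   # share of families found in the stories
share_of_list = round(469 / 1000, 2)     # share of the whole 3000-level list
print(share_of_present, share_of_list)
```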
Conclusion
Together, these projections indicate that even the largest
plausible amounts of free reading will not take the learner very far
into the 3000-family zone. It is thus somewhat redundant to raise the
matter that even words met more than six times are not necessarily
learned. New word meanings are normally inferable in environments
containing no more than one unknown item per 20 known items (Laufer,
1989; Liu Na & Nation, 1985). However, VocabProfile analysis of one
of the best known of the London stories (Call of the Wild, comprising
31,473 words) shows that 10% of the text's words (not including
proper nouns) come from frequency zones beyond the 3000 level itself,
sometimes well beyond it. This means that many of the novel's
3000-level items will be met in environments of 1 unknown item per 10
words, or double the density that research has shown learners able to
enjoy or learn from (4).
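The density comparison is a matter of simple proportions. The sketch below, with hypothetical function names and a toy word list, treats the one-in-twenty criterion as a ceiling of 5% unknown running words, which is an interpretive assumption rather than a definition taken from the studies cited.

```python
def unknown_density(tokens, known):
    """Share of running words whose lowercased form is not in `known`."""
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t.lower() not in known)
    return unknown / len(tokens)

def within_guessing_ratio(tokens, known, max_ratio=1 / 20):
    """True if unknown words make up no more than `max_ratio` of the text."""
    return unknown_density(tokens, known) <= max_ratio

# Toy data: 20 running words, 2 of them outside the learner's known list.
known = {"the", "dog", "ran"}
dense_text = ["the", "dog", "ran"] * 6 + ["gaunt", "toil"]
# Same length, but only 1 unknown word in 20.
easier_text = ["the", "dog", "ran"] * 6 + ["the", "gaunt"]

print(unknown_density(dense_text, known), within_guessing_ratio(dense_text, known))
print(unknown_density(easier_text, known), within_guessing_ratio(easier_text, known))
```

The first toy text has the one-in-ten density reported for Call of the Wild and fails the check; the second sits exactly at one in twenty and passes.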
To summarize, this analysis is based on the most generous
conditions possible: a 3000 word size requirement rather than 5000; six
occurrences for learning rather than ten; a one in twenty new word
density; a larger and broader diet of input than many learners will
provide for themselves; an assumption that family members are usually
recognized; and an assumption of minimal forgetting between reading
encounters. Even then, fewer than half the 3000 level words present
themselves sufficiently for reliable learning to occur. Further, the
situation only gets worse for word families at the 4000 and 5000 levels
and beyond. Thus, while there may well be more word learning from random
encounters in free extensive reading than meets the eye, the fact is
interesting but irrelevant, since most post-2000 words simply will not
be encountered at all in a year or two of reading. Therefore free
reading alone is not sufficient to "do the entire job" of
building a functional second lexicon in any typical time frame of L2
learning.
To refute this finding, sufficiency proponents would need to define
what the "entire job" of reading in an L2 is, and then show,
either empirically or in principle, how this job can be done through
reading alone, given the learning rate, learning conditions, and lexical
profile findings outlined above and elsewhere. Until then, the common
finding that many ESL learners tend to plateau with usable knowledge of
about 2000 word families or fewer (leaving them poorly equipped to
comprehend most texts) remains entirely explicable (Cobb, 2003).
BREAKING THE LEXICAL PARADOX
The findings presented thus far provide a basis in text analysis
for what many studies have shown empirically in the past 20 years (e.g.
Alderson, 1984; Bernhardt, 2005), that L2 reading is "a
problem" and that the main problem is lexis. This longstanding
awareness, in the research if not in the teaching community, has
produced many proposals to supplement vocabulary growth from reading
with other and more direct approaches to vocabulary learning. Examples
include Paribakht and Wesche's (1997) reading-plus (plus vocabulary
activities) scheme and various vocabulary course supplements (e.g.,
Barnard, 1972; Redman & Ellis, 1991; Schmitt & Schmitt, 2004).
But there are problems in principle with these supplement solutions,
all of which rely to some degree on separating the learning of words for
reading from the act of reading. One problem is that lexical knowledge
does not necessarily transfer well from vocabulary exercise and
dictionary look-up to text comprehension (Cobb, 1997; Krashen, 2003;
Mezynski, 1983), especially when there is a delay between the two.
Second, the number of words to be met and recycled typically
swells the vocabulary supplements into sets of several volumes
(e.g., Barnard's five volumes; Redman & Ellis' four
volumes), diverting a large amount of instruction time away from reading
itself. The reading-plus approach is reading-based in that the target
words are drawn from a text just read, but it has the disadvantage that
this work must be prepared by a teacher with a text and vocabulary items
that have been selected in advance and so can only be developed for a
small handful of texts.
What is missing from either supplement scenario is some way of
focusing attention on and proliferating encounters with new words at any
level within the act of reading, or shortly after reading, for any type
of text, and for lots of texts. The following section of this paper will
look at several concrete proposals for doing this, with reference to
empirical validation where available. The goal is to use computing to
preserve the free in free reading. Two broad approaches will be
described and illustrated. The first is computer-based text design, and
the second is computer-aided enrichment of undesigned texts.
Computer-Aided Text Design: The Case for Home-made Simplified
Materials
In principle, simplified or graded texts can meet some of the word
learning requirements outlined above. Texts can be written to a
particular vocabulary knowledge level, with words beyond that level
introduced in environments that meet the '1 unknown word in
20' ratio mentioned earlier as the criterion for reliable guessing
from context. New words can be recycled the desired number of times, in
a process extending over a series of texts, until a vocabulary target,
whether 3000 or 5000 frequent word families, has been met. Doing such
re-writing well is clearly a difficult and expensive job. Perhaps for
this reason, there is no set of graded readers in English that
explicitly attempts to do it all. The arguably best designed of the
graded reader sets available (e.g., the Longman Penguin series or
Oxford's Bookworm), while useful, share a number of limitations
that are readily evident without the help of detailed text analysis. As
noted by Hill (1997), these texts are almost exclusively based on just
one text genre, narrative fiction (either classics safely out of
copyright or custom written originals). They employ a variety of
unspecified frequency classification systems and offer no method of
matching learner level to text level other than self-selection. And they
make no claims about how many stories at each level a learner would have
to read to achieve mastery at that level, or what coverage this mastery
would provide with respect to real texts (although researchers like
Nation & Wang (1999) have looked at some of these questions).
Computer text analysis can add two further limitations. One is that
no series of graded readers proceeds systematically beyond the
3000-families level, and even those that get this far do not cover it
particularly well. This is shown in a VocabProfile analysis of a whole
set of graded readers similar to the analysis of the Jack London stories
above. If a learner read all 54 stories at six levels in the Bookworm
series (a total of 377,576 words), he or she would indeed meet 931 of
the thousand word families at the 3000 frequency level, but would meet
just over half of them (511 families) six times or more. This analysis
is remarkably similar to the two above, which also showed only half of
the 3000-level families appearing six times or more. A difference,
however, is the overall known-word density of the contexts the words
will be met in, owing to the