ABSTRACT
The Test of English as a Foreign Language (TOEFL) contains a direct writing assessment, and examinees are given the option of composing their responses at a computer terminal using a keyboard or composing their responses in handwriting. This study sought to determine whether performance on a direct writing assessment is comparable for examinees when given the choice to compose essays in handwriting versus word processing. We examined this relationship controlling for English language proficiency and several demographic characteristics of examinees using linear models. We found a weak two-way interaction between composition medium and English language proficiency: examinees with weaker English language scores performed better on handwritten essays, whereas examinees with better English language scores performed comparably on the two testing media. We also observed predictable differences associated with geographic region, native language, gender, and age.
INTRODUCTION
Increasingly, computers are being used to administer selection and certification tests. With the transition from a paper-based to a computer-based testing system comes a potential threat to the consequential basis of test use aspect of validity (Messick, 1989). That is, implementation of a computer-based testing program could result in unintended negative consequences for some examinees or for some societal components of the testing system. For example, differences in the performance of gender and ethnic groups exist on paper-based tests, and some fear that the shift toward a computer-based testing system may exacerbate existing social barriers to advancement opportunities for women, minorities, economically disadvantaged, and elderly individuals. Previous research comparing computer-based and paper-and-pencil tests has revealed only small differences between population means of multiple-choice tests administered in these two media (Mead & Drasgow, 1993). However, little is known about the influence of computerized testing on "at risk" groups of examinees or about the comparability of performance-based tests (e.g., direct writing assessments) administered in these two media, particularly for diverse populations of examinees. The purpose of this article is to compare computer-based and paper-based scores on the writing section of the Test of English as a Foreign Language (TOEFL) for a diverse population of international examinees.
LITERATURE REVIEW
What evidence exists to support concerns about the potential negative impact of computer-based testing on some populations of examinees? First, it is clear that some groups of examinees are less likely to have access to, and hence experience and proficiency with, computers. In the US, minorities and women are less likely to have computers in their homes, and males are likely to dominate computer use at school--the primary location within which some groups learn about and gain experience using computers (Campbell, 1989; Grignon, 1993; Keogh, Barnes, Joiner, & Littleton, 2000). Internationally, women, Africans, and Spanish speakers are less likely to have access to computers (Janssen Reinen & Plomp, 1993; Miller & Varman, 1994; Taylor, Kirsch, Eignor, & Jamieson, 1999). Similarly, one would expect older individuals who learned how to use a computer later in life to have less experience using computers, although it is not clear whether these individuals would have restricted access.
Second, inequities in computer access and familiarity may lead to lower levels of confidence and higher levels of anxiety toward computer-based tasks. U.S. minorities and women (internationally) exhibit higher levels of computer anxiety and lower levels of confidence for performing computer-related tasks (Janssen Reinen & Plomp, 1993; Legg & Buhr, 1992; Loyd & Gressard, 1986; Massoud, 1992; Nolan, McKinnon, & Soler, 1992; Shashaani, 1997; Temple & Lips, 1989; Whitely, 1997). Interestingly, the magnitude of group differences in anxiety levels is greatly diminished when computer experience is held constant (Gressard & Loyd, 1987; Loyd & Gressard, 1986), indicating that, to some degree, non-cognitive influences on computer-based testing may lessen as computers become more commonplace in society.
Finally, scores from computer-based tests are already being used widely to make important decisions about individuals, and it is clear that affective responses, like computer anxiety, and proficiencies, like levels of computer experience, are correlated with computer-based test scores at non-trivial levels (Marcoulides, 1988). From previous research concerning computer-administered direct writing assessments with international populations, it is also clear that groups who have had fewer opportunities to use computers (e.g., females and individuals from developing countries) are less likely to choose a computer-based administration model when given the choice (Wolfe & Manalo, in press).
When scores on standardized multiple-choice computer-based and paper-based tests are compared, the differences in test performance at a population level tend to be small, but examinees perform slightly better on the paper-based versions of the tests (Mazzeo & Harvey, 1988; Mead & Drasgow, 1993). Obviously, population-level comparisons do not allow researchers to ascertain whether the influence of computer administration on test performance is stronger for small portions of the population (Wise & Plake, 1989). For example, analyses of several large-scale multiple-choice tests indicate that females may receive higher scores on paper-based tests, but that, contrary to what one might expect, African-Americans and Hispanics receive higher scores on computer-based tests (Gallagher, Bridgeman, & Cahalan, 2002).
Studies concerning the impact of computers on the comparability of direct writing assessments are less common. The few studies that exist suggest that raters may be influenced by the appearance of essays in handwritten versus typed text. Specifically, raters may have higher expectations for word-processed text (Arnold, Legas, Obler, Pacheco, Russell, & Umbdenstock, 1990; Gentile, Riazantseva, & Cline, 2001), but they may also produce more reliable scores for word-processed text because handwriting effects are eliminated (Bridgeman & Cooper, 1998; Wolfe & Manalo, in press). Fortunately, readers can be trained to partially compensate for differential expectations they may have concerning the quality of handwritten and word-processed text (Powers, Fowles, Farnum, & Ramsey, 1994).
Regardless, the use of word processors seems to influence the quality of the writing produced by examinees. For example, handwritten essays contain shorter sentences (Collier & Werier, 1995), are better organized (Russell & Haney, 1997), are freer of mechanical errors (Gentile et al., 2001), and are neater, more formal in tone, and weaker in voice (Wolfe, Bolton, Feltovich, & Niday, 1996) than word-processed essays. More important, however, there may be an interaction between computer experience or proficiency and composition medium with respect to essay quality. In studies conducted on school-aged children, examinees responding to direct writing assessment or performance assessment prompts who had less computer experience received higher scores when tested in handwriting, and examinees with higher levels of computer experience received higher scores when tested using computers (Russell, 1999; Russell & Haney, 1997; Wolfe, Bolton, Feltovich, & Bangert, 1996; Wolfe, Bolton, Feltovich, & Niday, 1996). We hypothesize that this relationship exists because the imposition of keyboard composition requires examinees with less computer experience to perform the equivalent of a translation in order to produce their text. These examinees may formulate their writing cognitively, but then they are required to translate those thoughts into keyboard strokes--a task that is not part of their natural written communication process. As a result, the use of word processors by examinees with weaker computer and keyboarding skills interferes with the production of writing, but no such interference is encountered by examinees with stronger computer skills because keyboarding has become an automated process for these examinees. It is likely that such an effect would be more pronounced for examinees for whom English is a second language because these examinees would perform a double translation--native language to English and then English to keyboard strokes.
This article summarizes a study of the influence of composition medium on scores assigned to essays written for the TOEFL writing section. The study aims to determine the extent to which examinees with comparable levels of English language proficiency receive comparable scores on word-processed and handwritten TOEFL essays. Specifically, we addressed the following questions. Are there differences in the magnitudes of the scores assigned to essays composed in each mode of composition? Are there differences in the magnitudes of the scores assigned to essays composed in each mode, once the influence of English language proficiency is taken into account? Are groups identified as being potentially "at risk" by prior research more likely to exhibit inconsistent performance in the two modes of composition than are other groups of examinees?
METHOD
In this study, general linear modeling was employed with a large sample of TOEFL examinees to determine (a) whether main effects exist for composition medium and demographic characteristics with respect to essay scores when controlling for English language proficiency and (b) whether an interaction exists between composition medium and English language proficiency with respect to essay scores.
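To make the notion of a medium-by-proficiency interaction concrete, the following minimal sketch evaluates a linear model with a medium main effect, a proficiency slope, and their product term. It is an illustration under stated assumptions, not the model specification used in the study: the coefficient values, the 0/1 coding of medium, the proficiency scale, and the names Coefficients, predicted_score, and medium_gap are all hypothetical, and the demographic covariates are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    # Hypothetical values for illustration only; NOT estimates from the study.
    intercept: float = 1.0
    medium: float = -0.5       # shift for word processing (1) relative to handwriting (0)
    proficiency: float = 0.5   # slope on English language proficiency
    interaction: float = 0.25  # additional proficiency slope for word-processed essays


def predicted_score(medium: int, proficiency: float,
                    c: Coefficients = Coefficients()) -> float:
    """Predicted essay score: main effects plus the medium-by-proficiency product term."""
    return (c.intercept
            + c.medium * medium
            + c.proficiency * proficiency
            + c.interaction * medium * proficiency)


def medium_gap(proficiency: float, c: Coefficients = Coefficients()) -> float:
    """Word-processed minus handwritten predicted score at a given proficiency."""
    return predicted_score(1, proficiency, c) - predicted_score(0, proficiency, c)


if __name__ == "__main__":
    for p in (0.0, 1.0, 2.0):
        print(p, medium_gap(p))
```

With a nonzero interaction coefficient, the gap between media depends on proficiency (here it equals medium + interaction * proficiency), which is the pattern an interaction test is designed to detect. With an interaction coefficient of zero, the gap would be the same at every proficiency level and only the main effect of medium would remain.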
Participants
Participants were 133,906 TOEFL examinees who participated in regular administrations of the computer-based TOEFL between January 24, 1998, and February 9, 1999--a small portion of the total number of examinees tested during this period. Only those examinees who provided complete demographic data, multiple-choice scores, and writing assessment scores were selected for this study. Participants were from 200 countries and represented 111 different languages. There were slightly more males than females (54% vs. 46%). Examinees ranged in age from 15 to 55 years--the average age was 24.26 years. The majority of examinees took the TOEFL for admittance into undergraduate or graduate academic programs (38% and 46%, respectively). In fact, 82% of the examinees indicated that they planned to pursue an academic degree. Only 15% of the examinees indicated that they were taking the TOEFL for reasons other than to satisfy academic requirements.



