
JLLT edited by Thomas Tinnefeld
Journal of Linguistics and Language Teaching
Volume 4 (2013) Issue 1
pp. 29 - 47


Measuring Productive Vocabulary
in English Language Learners


Gabriella Morvay (New York, USA) / Mary Sepp (New York, USA)


Abstract  (English)
In the present study[1], we sought to examine the most effective way to measure productive vocabulary in order to determine its role in L2 writing.  For pedagogical purposes, we also considered the question of whether it is a worthwhile endeavor to invest valuable instruction time into teaching academic words to our ESL community college students, whose ability to write an analysis-synthesis essay based on a short reading determines whether they can enroll in credit-bearing courses. These goals were accomplished by testing students’ knowledge of academic vocabulary and calculating to what extent this knowledge relates to competent writing, i.e., what is deemed competent according to established criteria. 
Key words: academic vocabulary, ESL writing, L1 literacy, L2 reading, vocabulary assessment

Abstract (Français)
Dans la présente étude, nous avons tenté d’examiner la manière la plus efficace pour mesurer le vocabulaire productif afin de déterminer son rôle dans l'écriture L2. À des fins pédagogiques, nous avons également examiné la question de savoir s’il est un effort utile d’investir du temps précieux dans l'enseignement du vocabulaire académique à nos étudiants de l'anglais comme deuxième langue étrangère, pour lesquels la capacité d'écrire un essai d'analyse-synthèse basé sur une courte lecture détermine s’ils peuvent s'inscrire à des cours réguliers à l’université. Ces objectifs ont été atteints en testant les connaissances des étudiants de vocabulaire académique et en calculant dans quelle mesure cette connaissance se rapporte à l'écriture compétente, c’est-à-dire ce qui est jugé compétent selon les critères établis.
Mots-clés : vocabulaire académique, expression écrite, anglais comme deuxième langue étrangère, le savoir lire et écrire, compréhension écrite, examen de vocabulaire


1   Introduction
Achieving a competent level in college writing is a hurdle that many students must overcome when they begin their college careers.  This is especially true for ESL students, whose first language literacy skills vary widely.  As writing instructors, we strive to come up with a winning formula for our students.  In doing this, we try to find the right balance of what seem to be the obvious tools for good writing.  One of these tools is vocabulary.

In order to help students build on their vocabulary, we need to have a way of assessing their lexical knowledge.  This might seem at first to be a straightforward task.  But if the vast amount of literature on the subject doesn’t dispel the misconception, we need only ask ourselves the following question: “What exactly does it mean to know a word?”  Is it enough to know its definition?   It would make sense that to claim knowledge of a word, one should know its meaning - but which meaning?  Polysemy is a very common feature of English words.  Furthermore, a person can look up a word’s definition and perhaps not know how to use the word in a sentence.  Thus, it must also be necessary to know a word’s syntactic behavior (e.g., part of speech, phrase structure), its semantic properties (e.g., male/female, animate/inanimate), and its potential semantic contexts.  There are numerous other requirements that might be considered, such as morphological properties (e.g., plural form), pronunciation, and spelling.  According to Bachman and Palmer (1996), knowledge of vocabulary is part of grammatical knowledge that involves knowledge of syntax as well as phonology and orthography.  Clearly, this is a complex and somewhat controversial issue.  Thus, it is critical for anyone involved in the assessment of lexical competence to establish a consistent set of criteria for determining word knowledge. 

When speaking of vocabulary in the context of writing, it is necessary to acknowledge the receptive/productive dichotomy of lexical knowledge.  In simple terms, “receptive” or “passive” vocabulary generally refers to the ability to read or hear a word and understand the word in that context.  “Productive” or “active” vocabulary is needed in order to speak and write.  This distinction, which is much more complex, particularly with respect to assessment, will be discussed in this paper. 

The paper also reports on data from a recent study (Sepp & Morvay, 2012) examining, among other things, the relationship between vocabulary knowledge, reading, and writing proficiency.  The research involved the creation of a series of three tests for vocabulary measurement. Taking into consideration two contrasting perspectives on the role of vocabulary in language assessment, the investigators measured whether learners know the meaning and usage of a set of words, taken as independent semantic units. This was done by means of a multiple choice test and a cloze test.  The third assessment instrument was an essay prompt, designed to assess their lexical ability in the context of a language-use task, namely the productive skill of writing. 

In measuring the productive vocabulary of ESL learners for this research, the authors decided to focus on a certain type of vocabulary, more precisely, academic vocabulary.  Rather than choose the target words randomly, a popular and highly regarded resource known as the Academic Word List (AWL) by Coxhead (2000) was used. This list contains 570 word families from a 3.5-million-word academic corpus.  The AWL includes words occurring at least 100 times in the corpus that are not part of the General Service List (GSL) (West 1953), a list of roughly 2,000 of the most commonly used words in English. According to Schmitt (2010: 79), “the AWL is the best list of academic vocabulary currently available, and is widely used in vocabulary research”.
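To make this selection principle concrete, the following minimal Python sketch (not part of the original study; the file names and the simple frequency threshold are our own illustrative assumptions) shows how an AWL-like list could be derived by keeping words that occur at least 100 times in an academic corpus and do not appear on the GSL. Coxhead’s actual procedure was more elaborate, grouping words into families and imposing range requirements across subcorpora.

from collections import Counter

def build_academic_list(corpus_tokens, gsl_headwords, min_freq=100):
    """Illustrative sketch: select 'academic' words as those frequent in an
    academic corpus but absent from a general high-frequency list (GSL).
    Simplification: individual word forms are counted rather than word
    families, and no range requirement across subcorpora is applied."""
    counts = Counter(w.lower() for w in corpus_tokens)
    return sorted(w for w, c in counts.items()
                  if c >= min_freq and w not in gsl_headwords)

# Hypothetical usage:
# corpus_tokens = open("academic_corpus.txt").read().split()
# gsl_headwords = set(open("gsl.txt").read().split())
# awl_like = build_academic_list(corpus_tokens, gsl_headwords)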

Data from the study were examined statistically, using Pearson Correlations.  The findings, while suggesting a weaker than expected correlation between vocabulary and writing competence, underscore the complex nature of assessing each of these skills and the need for research in this area. 



2   Background

The distinction between receptive (passive) and productive (active) vocabulary knowledge is usually accepted as given, and the notions are rarely challenged.  The question, however, of what is really meant by the two notions is not really answered.  In other words, the terms are problematic on a conceptual level.  What are the criteria for categorizing certain words as belonging to the receptive domain as opposed to the productive one, and how do we create reliable vocabulary tests that can accomplish this task?  It is assumed that, for both L1 and L2 speakers, words are known receptively before they become productive.  Consequently, there are more words that we understand and recognize than we are able to produce, though the ratio between the two is not constant (Nation, 2001).  ESL researchers consider listening and reading passive skills, since the learner is a passive recipient of the language, and, correspondingly, speaking and writing active, productive skills.

One of the fundamental questions regarding the two constructs is the relationship between them.  Most researchers conceptualize productive vs. receptive knowledge as a continuum (Faerch, Haastrup & Phillipson, 1984; Melka, 1997; Palmberg, 1987; Pigott, 1981).  Henriksen (1996) created a model in which vocabulary acquisition moves along not two, but three continua: the partial-precise continuum, which is a knowledge continuum; the receptive-productive continuum, a control continuum that describes different levels of access through different tasks; and the depth-of-knowledge continuum, which entails not only a word’s referential meaning but also its paradigmatic and syntagmatic relationships. Henriksen hypothesizes that progress along the depth-of-knowledge continuum is an important factor in the development from partial to precise meaning, since knowledge of a given word grows in relation to other words and their relationships with one another.

One of the few researchers who opposes the conceptualization of receptive-productive knowledge as a continuum, and who instead takes the position that it is a dichotomy, is Meara (1990).  He believes that even though there are different ways of “knowing a word”, and hence productive vocabulary exists on a continuum, receptive vocabulary is qualitatively different from productive vocabulary.  He applies graph theory (Wilson & Beineke, 1979) and examples of word associations to illustrate his point.  The importance of Meara’s hypothesis lies in its pedagogical implications, for he claims that the traditional methods used to practice words that are not well known in order to make them productive might not be effective.  He suggests that “exercises which deliberately stress the association links leading from already known words to newly learned words might be a more effective way of activating passive vocabulary” (Meara 1990: 154).

While this debate continues, classroom teachers and researchers are busy testing vocabulary knowledge that they label either productive or receptive, even in the absence of a straightforward definition of the constructs.  Melka (1997) points out that there has been no consistency in the ways the two types of vocabulary have been measured, for test formats such as checklists, multiple-choice tests, and translation tests have been used for assessing both receptive and productive knowledge.

Most L2 vocabulary tests focus on receptive rather than productive aspects of knowledge, since this kind of ability is easier to elicit, and the tests are less labor-intensive to administer and score. Traditionally, the elicitation methods for receptive vocabulary are recognition tests, in which the test-taker either recognizes the target word when given its meaning, or recognizes the meaning of a target word when given meaning options (Laufer & Goldstein, 2004). These two types are distinguished as active and passive recognition, respectively.  In the case of recall, on the other hand, which is meant to measure productive knowledge, test-takers are provided with a stimulus designed to elicit the target word from memory.  As with the above-mentioned measures for receptive knowledge, Laufer & Goldstein (2004) distinguish active recall, which refers to the ability to supply the target word, from passive recall, in which test-takers supply the meaning of a target word.

As we can see, measuring vocabulary knowledge in English Language Learners (ELLs) only seems to be a simple task.  It is especially tricky when we are targeting productive vocabulary.  Some of the most widely accepted measures of productive vocabulary are cloze tests, such as the Productive Levels Test (Laufer & Nation, 1999), and Lex30 (Meara & Fitzpatrick, 2000), which is a stimulus-based test.  A very different approach is to have learners produce an essay, which can be analyzed in a number of ways, such as lexical originality, density, sophistication, and variation (Laufer & Nation, 1995).  Fitzpatrick (2007) questions whether it is realistic to gather useful validity information about vocabulary tests that claim to measure a specific aspect of vocabulary knowledge, in this case the productive aspect.  Because vocabulary measures are used not only in second language testing, but also in forensic linguistics and in psychiatric studies of schizophrenia or emotional disorders, Lado (1961) demands a reliability coefficient of at least 0.9 before vocabulary, grammar and reading comprehension tests are considered “reliable”.  Read (2000: 120) also claims, for example, that “there has been surprisingly little research until recently to establish the Vocabulary Levels Test’s validity”.  In order to address this criticism, Fitzpatrick (2007) set out to assess Lex30’s concurrent validity.

Fitzpatrick (2007) used two further tests of productive vocabulary against which to validate Lex30: the Productive Levels Test and a translation test.  Given that this experiment was somewhat similar to our study, a brief description of it follows.  Lex30 is a relatively new test and is revolutionary in the sense that, unlike tests that rely on the production of lengthy texts (Lexical Frequency Profile or Type-Token Ratio measures), it makes it possible to elicit “a lexically rich text in an economical way” (Fitzpatrick, 2007: 539). It is essentially a word association task; subjects are presented with a list of 30 stimulus words and are required to produce three or more responses to each of these stimuli.  The responses are processed and a mark is awarded for every infrequent word that the test-taker has produced.  An “infrequent word” is any word that falls outside the first 1,000-word frequency band (Nation, 1984).  The 30 stimulus words are chosen according to a set of criteria; for instance, they are all taken from the first 1,000-word list, and none of them elicits a single, dominant response.  The first validation tool was Laufer and Nation’s (1999) vocabulary-size test of controlled productive ability, referred to as the Productive Levels Test (PLT).  In this test, eighteen target words are selected from each frequency band – five bands altogether – and are embedded in a contextually unambiguous sentence.  The highest possible score was therefore 90 (18 words × 5 bands). The first few letters of the target word are provided in order to eliminate other semantically plausible answers, and participants are required to complete each target word.  The knowledge demonstrated by the test-taker is undoubtedly productive in the sense that s/he has to provide the word as opposed to recognizing it.  It is also controlled in that the subject is prompted to produce a predetermined target, as opposed to free productive tasks, such as essay writing or oral presentation, or Lex30, where no specific word is required.  The second validation tool was a simple translation task from the subjects’ L1 (Mandarin).  Sixty randomly selected words (20 from each of Nation’s first three 1,000-word bands) were used, and each included the initial letter.  Each accurately produced target word, regardless of its spelling, was awarded a point.  Fitzpatrick (2007) points out that these three tests share certain characteristics that qualify them for the concurrent validity tests; for example, all three are similarly administered, and all three use the same frequency bands, which is, in fact, central to the test designs.
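As a rough illustration of the Lex30 scoring principle described above, the sketch below awards one mark for each response word that falls outside a first-1,000-word frequency list. It is a simplified approximation of the published procedure, not the procedure itself: the frequency-list file and the absence of lemmatization or spelling normalization are our own assumptions.

def lex30_score(responses, first_1000):
    """Award one mark per response word outside the first 1,000-word band.
    Simplification: no lemmatization or spelling normalization, which the
    actual Lex30 procedure applies before scoring."""
    score = 0
    for word in responses:
        w = word.strip().lower()
        if w and w not in first_1000:
            score += 1
    return score

# Hypothetical usage: responses elicited by the 30 stimulus words
# first_1000 = set(open("nation_first_1000.txt").read().split())
# print(lex30_score(["banana", "nutrition", "ensure"], first_1000))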

According to the results, there were significant correlations between all three tests, but a particularly strong relationship between the translation test and the Productive Levels Test.  There was a more modest correlation between these two tests and Lex30.  Fitzpatrick explains the stronger correlation between the translation test and the PLT by the fact that the latter presents questions in blocks of 18, beginning with the 2,000-word level and getting progressively more difficult.  All subjects scored higher on the 2,000-word level than on subsequent levels.  In fact, on average, subjects produced 14 correct answers at the 2,000- and 3,000-word levels, and only 3 correct answers at the other three levels. This means that the PLT essentially measured subjects’ knowledge of the first 3,000 words.  Since the translation task also tested words from the first 3,000-word bands, it is no surprise that the two tests correlated so strongly.  Lex30, on the other hand, awarded credit to any spontaneously produced word outside the first 1,000-word band.  Another reason for the moderate correlation between Lex30 and the other two tools lies in the elicitation stimulus.  For example, the PLT asks subjects to complete target words in a sentence context while providing the first letter.  This means three different types of activation: L2 semantic, L2 orthographic, and L2 collocational stimuli.  The translation test, on the other hand, provides subjects with an L1 semantic stimulus and an L2 orthographic stimulus, given that the first letter is also provided.  Finally, Lex30 provides only one stimulus, the L2 semantic one. Because these tests activate knowledge in very different ways, the quality of knowledge they tap might differ as well.  Fitzpatrick also speculates that the aspects measured by these tests might differ somewhat.  Using Nation’s (1990) checklist of word knowledge, she claims that even though all three claim to measure productive knowledge, each tests somewhat different aspects of it, and the PLT measures some aspects of receptive knowledge as well.

In our study, we used three different tests that varied in the degree of productivity: an essay task, a multiple-choice test, and a cloze test. We decided to use an essay format for one of the test instruments given the fact that promotion to credit-bearing classes is dependent on passing a writing test. The task that also involved receptive knowledge relied on multiple-choice recognition. All target words were selected from the Academic Word List.  The cloze task relied exclusively on production, and similarly to the PLT in the above-mentioned study, provided participants with the initial letter of the target word, thus aiding them orthographically and contextually.

Let us briefly examine the three formats.

i) Multiple-choice test
This is a traditional recognition-type test, and so it assesses mostly receptive knowledge.  Although the multiple-choice format is one of the most widely used methods of vocabulary assessment for both L1 and L2 speakers, its limitations have been widely acknowledged.  Criticisms spelled out by Wesche & Paribakht (1996) include the fact that such tests are difficult to construct and that the test-taker may choose the right word by a process of elimination, which in a three-alternative format leaves a 33% chance of guessing correctly.  Consequently, the test might assess students’ knowledge of the distractors rather than their ability to identify the exact meaning of the target word.  Despite these criticisms, multiple-choice tests remain a popular format, and there is relatively little ongoing research on them.

ii) Cloze test
Cloze tests have been used for a variety of purposes.  Initially, they were used to evaluate the readability of texts for both L1 and L2 readers. In our study we used the selective-deletion, or rational, cloze test, which is a modified version of the standard cloze test.  In the standard cloze test, words are deleted at a fixed ratio (e.g., every seventh word) and replaced by blanks of uniform length.  In the selective-deletion version, the test-writer deliberately chooses the words to be deleted according to a certain principle.  In our case, the ten deliberately deleted words came from the Academic Word List.
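The mechanics of building such a selective-deletion item can be illustrated with a short sketch (ours, not part of the study’s materials): each chosen target word is replaced by its first letter followed by a blank.

def make_cloze_item(sentence, target_words):
    """Selective-deletion cloze: replace each chosen target word with its
    first letter plus a blank, in the spirit of Test 2 described below."""
    out = sentence
    for w in target_words:
        out = out.replace(w, w[0] + "_" * 10)
    return out

# Example (target word drawn from the AWL):
# make_cloze_item("it's crucial to meet a child's nutritional needs", ["crucial"])
# -> "it's c__________ to meet a child's nutritional needs"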


iii) Essay writing
Due to its labor-intensive nature, essay writing is not often used as a test instrument for vocabulary measurement in ESL studies.  Our rationale for choosing it as one of the formats in our study was the fact that, at our institution, exemption from or assignment to remediation depends on students’ essay-writing ability.  Furthermore, essays allow for the flexibility of obtaining a variety of data; thus, essay length and vocabulary range were also measured.  Two key concepts in corpus studies, and hence in essay-writing tasks, are tokens and types, and the distinction between them.  Tokens refer to the total number of running words in a text, counting every occurrence; this gives the essay length or “word count”.  Types, on the other hand, refer to the number of different words.  The relative proportion of types to tokens is known as the type-token ratio, or TTR.  The TTR is a measure of the vocabulary range of the essay.
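Since word count, types, and TTR figure prominently in the analysis reported below, here is a minimal sketch of how these measures can be computed from an essay; the simple regular-expression tokenizer is our own assumption, as the study does not specify its tokenization.

import re

def essay_measures(text):
    """Compute word count (tokens), number of distinct words (types),
    and type-token ratio (TTR) for an essay."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())   # simple word tokenizer
    types = set(tokens)
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"word_count": len(tokens), "types": len(types), "ttr": ttr}

# Illustration: a 330-token essay containing roughly 150 distinct words
# would have a TTR of about 0.45.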

Given that it is impossible to test all the words that native speakers of a language know, selecting test items from word lists based on frequency distributions is the most common practice in ESL vocabulary testing.  Some lists (e.g., the General Service List by West, 1953, or the Teacher’s Word Book by Thorndike and Lorge, 1944) consist mostly of high-frequency items, but others (e.g., the University Word List by Xue and Nation, 1984, or the Academic Word List by Coxhead, 1998) contain the so-called subtechnical vocabulary, which tends to occur relatively frequently and across a range of registers in academic and technical language. Coxhead (2000: 214) claims that “an academic word list should play a crucial role in setting vocabulary goals for language courses, guiding learners in their independent study, and informing course and material designers in selecting texts and developing learning activities”.  Consequently, in selecting the target words for our tests, we were guided by our community college students’ need to acquire an adequate amount of college-level vocabulary in order to pursue their higher education goals successfully.



3   Research Summary

In Sepp & Morvay (2012), students at an urban community college were tested in academic vocabulary and morphosyntactic knowledge. The test results were compared to their performance on standardized reading and writing tests, which determine whether they are ready for credit-bearing college composition classes.  The discussion here revisits the vocabulary data produced from the earlier study, exploring a few additional variables such as word count and TTR.

The vocabulary tests were administered to 95 students, who were enrolled in an intensive writing course for advanced ESL students during the spring 2011 semester.  Seven advanced ESL writing classes participated in the study.  Each class consisted of approximately 25 students from a variety of linguistic backgrounds.  The statistical analyses presented here are based on the entire group of 95 students and also a subset of this group characterized by a higher word count on the essay task.

One goal of the research was to ascertain the most effective way to measure the productive vocabulary of ESL students in advanced developmental writing courses. To that end, the investigators compared the results of the three test types in order to determine whether there was a significant correlation between any of the tests and the outcome of a standardized writing test, in this case, the CUNY Assessment Test in Writing (CATW).  Students are required to pass the CATW in order to exit the college’s remedial writing course sequence, so it is a high-stakes test. It is also a formidable challenge for many of these students.  The CATW is a reading-based essay test and it is scored analytically for content, structure, and language use.  Since reading is a component of the writing exam, we also looked for correlations between vocabulary and reading in the dataset. 


3.1 Test Design

The context for the multiple choice and cloze tests was derived from three articles on three different topics (see Appendix), which were adapted for assessment of the target skills.  Thirty target words (10 per topic) from the Academic Word List (AWL) developed by Coxhead (2000) were used in the vocabulary tests.

Three tests were created to assess academic vocabulary:    
    
i.                Multiple Choice Vocabulary (Test 1) - 10 target words were tested in a multiple choice format with a 15-minute time constraint.  Students were given three choices per item.  The non-target words selected for the tests were intended to give students pause but not to confuse them.  In each case, the target word should be unambiguously correct.
               e.g., During the first few years of life, it's (crucial / trivial / decisive) to meet a child's nutritional needs. (from Test 1, topic A – see Appendix)

ii.              Cloze Vocabulary (Test 2) – the same 10 target words as in Test 1 were deleted from a passage (except for their first letter), and students had to supply each word under a 20-minute time constraint.  Only the first letter was provided, because providing more than that would have made the task more receptive than productive.
               e.g., During the first few years of life, it's c__________ to meet a child's nutritional needs in order to e__________ proper growth and also to e__________ a lifelong habit of healthy eating. (from Test 2, topic A – see Appendix)

iii.             Essay - students were instructed to write a 60-minute timed essay on a specified topic and to incorporate 10 target vocabulary items into their writing.  The topics were of a fairly general nature and designed to mimic the theme of the texts used in the other two tests in order to reasonably elicit the target words. For example, students were asked to try to use words such as portion, ensure, and crucial in their response to the prompt below.

Essay prompt A
It is common knowledge that childhood obesity is a problem in this country.  What can parents or educators do to help end this problem? 

Testing took place between week 3 and week 11 of a 15-week semester.  All tests were administered in the classroom. The two vocabulary tests that were given in weeks 3 and 5 were administered in a randomized order between the various classes; in other words, some classes were given Test 1 and then Test 2 and others were given Test 2 and then Test 1.  The essay was administered during week 11.

A total of 130 students completed the essay task. The average length was 330 words and the topic didn’t seem to have any effect on how much students wrote, as averages were virtually the same for all three topics.

Tests were scored by the co-investigators and then data were compiled. To ensure accuracy and consistency, scoring of all tests was cross-checked by the two co-investigators.  Test results from a set of 80 were used in the original data analysis. But since there were 95 students who completed all three vocabulary tasks, data were reanalyzed to include the additional 15 cases.  In addition,  two other variables were considered:  essay length (word count) and vocabulary range (type-token ratio).  Thus, a total of 5 vocabulary variables were used in the statistical analysis. These are identified and defined below:

i.        MultiV – percent correct on multiple choice vocabulary test
ii.      ClozeV – percent correct on cloze vocabulary test
iii.    EssayV – percentage of 10 target words student was able to use correctly at least once
iv.     TTR – type-token ratio per essay
v.       Word count – total number of tokens per essay



3.2 Scoring Criteria

Scoring the multiple choice tests was fairly simple. For the cloze vocabulary test, the first letter of the word was provided, so in many cases there were a number of non-target words that fit the context as well.  Such words were accepted as correct.  In addition, while the response in general should match the syntactic category of the target word, if an adverb form were used instead of an adjective – e.g., proportionately instead of  proportionate – and it fit both the structural context and the meaning, then it was accepted.  Errors in tense and/or number were tolerated, as were minor spelling errors.  The criteria for scoring the essay were similar to those used for the cloze tests:

i.        Scores were calculated as the percentage of the 10 target words the writer was able to use correctly (using each correctly once was sufficient).
ii.      Correct responses had to be globally and locally plausible.  The vocabulary item also had to be written in the form of the correct syntactic category, with the exception of adjectives and adverbs, in which case either form was accepted as long as it was deemed appropriate to the context.  Minor spelling errors and errors in tense and/or number were tolerated.
iii.    Credit was given for target academic words only.

The actual scoring of the essay was of course a more complicated process due to the range of contexts that had to be considered, as well as differences in students’ syntactic ability. 
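The mechanical part of this process, locating candidate uses of the target words so that a rater can judge their plausibility, can be sketched as follows; the stem-matching heuristic and the example target list are our own simplifications, and the actual correctness judgments were made by the raters.

TARGET_WORDS = ["portion", "ensure", "crucial"]   # a subset of the topic A targets

def candidate_target_uses(essay_text, targets):
    """Flag sentences containing a target word (any form sharing its stem)
    for manual review; whether each use is globally and locally plausible
    was judged by the raters, not by this sketch."""
    hits = {}
    for sentence in essay_text.split("."):
        for t in targets:
            if t[:5].lower() in sentence.lower():   # crude stem match
                hits.setdefault(t, []).append(sentence.strip())
    return hits

def essay_v(confirmed_correct, targets):
    """EssayV: percentage of target words the writer used correctly at least once."""
    return 100.0 * len(set(confirmed_correct)) / len(targets)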


3.3 Analysis of Data

Average test scores were compared vis-à-vis both developmental writing and reading competence.  Reading competence is taken as the numeric score attained on the university’s standardized reading test. Writing competence was determined based on whether the student passed the writing test at the end of the semester in which the research was conducted.  Of the 95 participants in this group, 24% passed ESL, while 74% passed reading.

In addition, Pearson Correlations were calculated to determine the relationship between test scores, vocabulary range, essay length, and reading and ESL course outcomes.
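For readers who wish to reproduce this kind of analysis, the sketch below computes pairwise Pearson correlations with two-tailed p-values for a table containing the five vocabulary variables and the course outcomes; the column names and input file are hypothetical, chosen to mirror the variables defined above.

import pandas as pd
from scipy import stats

def correlation_table(df):
    """Return Pearson r and two-tailed p-values for every pair of columns,
    mirroring the layout of Tables 3-5."""
    cols = df.columns
    r = pd.DataFrame(index=cols, columns=cols, dtype=float)
    p = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for a in cols:
        for b in cols:
            r.loc[a, b], p.loc[a, b] = stats.pearsonr(df[a], df[b])
    return r, p

# Hypothetical usage with the study's variables
# (outcomes coded numerically, e.g. pass = 1, fail = 0):
# df = pd.read_csv("vocab_scores.csv")   # columns: multiV, clozeV, essayV,
#                                        # word_count, TTR, ESL_final
# r, p = correlation_table(df)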

With regard to average scores, the results (see Table 1) showed slightly higher average scores on the three vocabulary tests for students who failed ESL095 at the end of the semester. The essays were used to calculate two additional measures related to vocabulary size: type-token ratio (TTR), which measures the range of vocabulary used in a text, and essay word count.  The average TTR was also higher for students who failed ESL, while the average word count was considerably higher for passing students (417 vs. 310 words).

N=95           | multiV | clozeV | essayV | essay word count | essay TTR
Passed ESL     |  0.73  |  0.28  |  0.29  |       417        |   0.45
Failed ESL     |  0.75  |  0.34  |  0.32  |       310        |   0.49
All students   |  0.75  |  0.33  |  0.31  |       336        |   0.48

Table 1:  Average scores based on ESL outcomes

With respect to reading level, vocabulary scores were about the same on average in all categories except EssayV, as shown in Table 2.  In this case, students who passed reading performed a bit better.

N=95             | multiV | clozeV | essayV | essay word count | essay TTR
Passed Reading   |  0.74  |  0.33  |  0.32  |       336        |   0.48
Failed Reading   |  0.75  |  0.34  |  0.29  |       335        |   0.48
All students     |  0.75  |  0.33  |  0.31  |       336        |   0.48

Table 2:  Average scores based on Reading outcomes

To gain a more reliable sense of the relationship between vocabulary and writing / reading competence, Pearson Correlations were computed.  This was done for the set of 95 and also for a subset of 26, which included data from participants who wrote longer essays in test 3.  Essays in this subset contained 400 or more tokens, and this dataset is therefore referred to here as “400plus”.  Three correlations tables are presented below, showing the results for:  the core set for vocabulary/writing (Table 3); the 400plus group for vocabulary/writing (Table 4); and the 400plus group for vocabulary/reading (Table 5). Vocabulary/writing and vocabulary/reading data are provided in separate tables for easier comparison.

In Table 3, Pearson correlations were calculated for vocabulary variables, showing a .01 level of significance between clozeV and multiV.  In addition, there were significant correlations at the .05 level between essayV and multiV.  Establishing a correlation between student scores on the three tests shows a relative consistency in the performance of these assessment instruments.

vocab-eslfinal-95 (N = 95 for every correlation)

                       | multiV  | clozeV  | essayV  | essay word count | TTR     | ESL final
multiV            r    | 1       | .339**  | .346**  | -.280**          | .183    | -.067
       Sig. (2-tailed) |         | .001    | .001    | .006             | .076    | .520
clozeV            r    | .339**  | 1       | .830**  | -.170            | .145    | -.137
       Sig. (2-tailed) | .001    |         | .000    | .099             | .161    | .186
essayV            r    | .346**  | .830**  | 1       | -.120            | .086    | -.075
       Sig. (2-tailed) | .001    | .000    |         | .246             | .410    | .469
essay word count  r    | -.280** | -.170   | -.120   | 1                | -.705** | .385**
       Sig. (2-tailed) | .006    | .099    | .246    |                  | .000    | .000
TTR               r    | .183    | .145    | .086    | -.705**          | 1       | -.218*
       Sig. (2-tailed) | .076    | .161    | .410    | .000             |         | .034
ESL final         r    | -.067   | -.137   | -.075   | .385**           | -.218*  | 1
       Sig. (2-tailed) | .520    | .186    | .469    | .000             | .034    |

Table 3: Pearson Correlations for vocabulary features and ESL outcomes – core set
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Vocabulary range (TTR) and essay length are negatively correlated (-.705) at a .01 level of significance.  This reflects the fact that shorter essays normally repeat fewer words. 

There was a significant (p<.01) correlation between essay length (word count) and ESL095 outcome. But there was no correlation between the results of the vocabulary tests and the ESL outcome.  Moreover, there was a negative correlation between ESL writing outcome and TTR.  Again, this result is not surprising since many of the failing essays were rather short.  In order to adjust for the length issue, we decided to analyze a subset of the data which contained essays of 400 or more words.  Pearson Correlations were recalculated and the same pattern emerged.  The correlations are represented in Table 4 below. 

400plus (N=26)     | ESL Final
multiV             |  -.324
clozeV             |  -.076
essayV             |  -.170
essay word count   |   .459*
TTR                |   .079
Table 4: Pearson Correlations for vocabulary features and ESL outcome – 400plus
*. Correlation is significant at the 0.05 level (2-tailed)

The data revealed a significant correlation at the .05 level between essay length and ESL writing outcome but no correlation between range of vocabulary and ESL outcome.  The 400plus essays were also analyzed based on reading ability (Table 5), and while essay length showed no correlation here, the correlation between range of vocabulary (TTR) and reading was significant at the .05 level.

400plus (N=26)     | Reading P/F
multiV             |   .196
clozeV             |   .021
essayV             |  -.028
essay word count   |  -.175
TTR                |   .408*

Table 5: Pearson Correlations for vocabulary features and reading outcome – 400plus dataset
*. Correlation is significant at the 0.05 level (2-tailed).



4   Discussion and Conclusions

The lack of correlation between students’ performance on the three vocabulary tests and writing outcomes suggests that targeting a specific set of vocabulary items may not be an effective way to approximate students’ lexical competence.  Furthermore, using the AWL as a basis for testing vocabulary ability may be too limiting. While it is a valued resource in present-day research, it derives from a corpus of edited academic texts, and thus may not be the best indicator of a non-native speaker’s vocabulary size.  Also, the Academic Word List is made up of high-frequency academic words, and good writers may simply opt for less frequent but equally appropriate words in place of those on the AWL.  In other words, one can potentially demonstrate a good command of vocabulary without using words from the AWL.

The fact that reading correlated to a broader vocabulary is not surprising.  Likewise, the connection between relatively longer essays and better writing skills also makes sense.  On the other hand, the notion that a broader vocabulary does not necessarily correspond to competent writing, even in the 400plus dataset, may seem counterintuitive.  In fact, a previous study conducted by one of the authors (Sepp 2010) revealed that TTR (vocabulary range) did significantly correlate (p<.05) to a positive outcome in ESL writing.  However, the latter research was based on a different standardized essay test (CUNY ACT), which was scored holistically.  The CATW is scored analytically, meaning that readers assign individual scores for different aspects of structure, content and language use. Language use is scored for sentence variety, vocabulary, and sentence mechanics.  Thus, it might be surmised that readers who are scoring analytically pay more attention to clarity of expression than range of vocabulary.  This, however, is a topic for another study.

On the other hand, the results might also be different with a larger dataset. There is no doubt that better writers tend to have a better vocabulary, but the question here, from a pedagogical perspective, is how much emphasis should be placed on vocabulary instruction in order to help ELLs reach a level of acceptable competence.  What makes a college student’s writing good enough? 

In the end, the relationship between vocabulary and competent writing may be intrinsically linked to what the assessors are looking for and how “competence” is perceived.


5   Future Research

For future studies of assessment, an important aspect of validation will have to be a conceptual clarification of what the particular test is supposed to measure.  As Fitzpatrick (2007) indicates, instead of trying to find a 100% reliable and valid test to measure one particular aspect of vocabulary ability, we should engage in investigating, for example, how different learners access lexical knowledge at different stages of development.


References

Coxhead, A. (1998).  An academic word list (English Language Institute Occasional Publication No. 18).   Wellington, NZ: Victoria University of Wellington.

Coxhead, A. (2000).  A new Academic Word List.  TESOL Quarterly, 34(2), 213-238.

Fitzpatrick, T. (2007).  Productive vocabulary tests and concurrent validity.  In H. Daller, J. Milton & J. Treffers-Daller (eds.), Modelling and Assessing Vocabulary Knowledge (pp. 116-132).  Cambridge: Cambridge University Press.

Fitzpatrick T. & Clenton, J. (2010).  The challenge of validation: Assessing the performance of a test of productive vocabulary.  Language Testing, 27(4), 537-554.

Henriksen, B. (1996).  Semantisation, retention and accessibility: Key concepts in vocabulary learning.  Paper presented at the AILA Congress, Jyvaskyla, Finland, August 1996.

Faerch, K., Haastrup, K. & Phillipson, R. (1984).  Learner Language and Language Learning.  Copenhagen: Multilingual Matters.

Laufer, B. (1998).  The development of passive and active vocabulary in a second language: same or different?  Applied Linguistics, 19, 255-271.

Laufer, B. & Goldstein, Z. (2004).  Testing vocabulary knowledge: Size, strength, and computer adaptiveness.  Language Learning, 54, 399-436.

Laufer, B.  &  Nation, P. (1999).  A vocabulary-size test of controlled productive ability.  Language Testing, 16, 33-51.

Lado, R. (1961).  Language Testing.  London: Longman.

Meara, P. (1990).  A note on passive vocabulary.  Second Language Research, 6(2), 150-154.

Meara, P. & Fitzpatrick, T. (2000).  Lex30: an improved method of assessing productive vocabulary in an L2.  System, 28, 19-30.

Melka, F. (1997).  Receptive vs. productive aspects of vocabulary.  In N. Schmitt & M. McCarthy (eds.), Vocabulary: Description, Acquisition, and Pedagogy (pp. 84-102).  Cambridge: Cambridge University Press.

Nadarajan, S. (2008).  Assessing in-depth vocabulary ability of adult ESL learners.  The International Journal of Language, Society and Culture, 26, 93-106.

Nation, I. S. P. (1983).  Testing and teaching vocabulary.  Guidelines, 12-25.

Nation, I.S.P. (1990).  Teaching and Learning Vocabulary.  Boston, MA: Heinle and Heinle.

Nation, I. S. P. (2001).  Learning vocabulary in another language.  Cambridge: Cambridge University Press.

Nation, I. S. P. (2006).  How large a vocabulary is needed for reading and listening?  The Canadian Modern Language Review, 63(1), 59-82.

Palmberg, R. (1987).  Patterns of vocabulary development in foreign language learners.  Studies in Second Language Acquisition, 9, 201-220.

Pearson, P. D., Hiebert, E. H. & Kamil, M. L. (2007).  Vocabulary assessment: What we know and what we need to learn.  Reading Research Quarterly, 42(2), 282-296.

Pigott, P. (1981).  Vocabulary growth in EFL beginners.  MA project, Birkbeck College, London, England.

Read, J. (2000).  Assessing Vocabulary.  Cambridge: Cambridge University Press.

Read, J. (2007).  Second language vocabulary assessment: Current practices and new directions.  International Journal of English Studies, 7(2), 105-125.

Sepp, M. (2010).  Getting Assessment Right.  The Inquirer, Fall 2010.  BMCC, CUNY.

Sepp, M. & Morvay, G. (2012).  Productive Vocabulary, Morphosyntactic Knowledge, Reading Ability, and ESL Writing Success.  Iranian Journal of TEFLL, 2(2), 3-22.

Thorndike, E. & Lorge, I. (1944).  The teacher’s word book of 30,000 words.  New York: Teachers College Press.

Wesche, M. & Paribakht, T. S. (1996).  Assessing second language vocabulary knowledge: depth vs. breadth.  Canadian Modern Language Review, 53, 13-39.

West, M. (1953). A General Service List of English words.  London: Longman, Green.

Wilson, R. & Beineke, L. (1979).  Applications of graph theory.  New York: Academic Press.

Xue, G. & Nation, I.S.P. (1984).  A university word list. Language Learning and Communication, 3, 15-29.




Appendix
Topic A (for Tests 1 and 2)

Practice Healthful Eating

During the first few years of life, it's crucial to meet a child's nutritional needs in order to ensure proper growth and also to establish a lifelong habit of healthy eating…

(Adapted from an August 2010 Reader’s Digest online article, “Childhood Nutrition: Food for the Growing Years”. Adapted text length – 273 words.)



Topic B

School Uniforms
To require uniforms or not to require uniforms: that is the question many school districts are debating these days. Students in many cities are wearing uniforms to school...
(Adapted from “School Uniforms” by Hannah Boyd, http://www.education.com/magazine/article/School_Uniforms. Adapted text length – 300 words.)


Topic C

Classroom Revolution
Students of almost every age are far ahead of their teachers in computer literacy.  ... So how is this digital revolution affecting education?  According to a federal study, most schools are essentially unchanged today despite reforms ...
(Excerpted from “Classroom Revolution” by Mortimer Zuckerman, in U.S. News and World Report, Oct. 10, 2005, p. 68. Adapted text length –  347 words.)



Authors:

Gabriella Morvay, Ph.D.
Assistant Professor of Linguistics and ESL
Borough of Manhattan Community College
City University of New York
E-mail: gmorvay@bmcc.cuny.edu

Mary Sepp, Ph.D.
Assistant Professor of Linguistics and ESL
Borough of Manhattan Community College
City University of New York
E-mail: msepp@bmcc.cuny.edu




[1] This research was supported by a BMCC Faculty Development Grant.