JLLT edited by Thomas Tinnefeld
Journal of Linguistics and Language Teaching
Volume 6 (2015) Issue 2

An Integrated Diagnostic Vocabulary Test for Young EFL Learners

 
Istvan Jerry Thekes (Szeged, Hungary)

Abstract
The goal of this study was to validate an integrated diagnostic vocabulary test for young learners of English as a foreign language. The research questions were: 1) How difficult was each task type? 2) Which items were inappropriate in the test battery? 3) Which word class proved to be the easiest and the most difficult one? 4) How did the different task types correlate? The vocabulary test battery was administered to 103 students in Hungary in November 2013. The reliability of the test battery proved to be acceptable (Cronbach's alpha = 0.763). Students scored highest in the recognition of nouns. The two listening tasks had the strongest correlations. 
Key words: vocabulary, diagnostic assessment, validation

1 Introduction
Vocabulary is a popular research area in the literature on foreign language acquisition and is of interest to scholars and teachers alike. Educators have been encouraged (Lewis 1993, Thornbury 2002) to promote the intentional learning of words in the classroom. Since the early 1990s, teachers have laid special emphasis on teaching vocabulary as well (Fitzpatrick, Al-Qarni & Meara 2008). Successful language acquisition is greatly determined by foreign language word knowledge (Schoonen & Verhallen 2008). Both adults’ and young learners’ English as a foreign language vocabulary has mainly been assessed as part of tests measuring general language knowledge. Thus, we have hardly any data concerning young learners’ vocabulary. The primary goal of this study was to create and validate an integrated diagnostic vocabulary test for young learners of English as a foreign language. The secondary goal was to analyze every item in this test, which is pioneering in the sense that, to the best of our knowledge, there is no validated integrated vocabulary test for young learners. Webb and Sasao (2013) have recently attempted to create an integrated test; however, it is meant to assess adults. By integrated test, we mean a tool that measures both productive and receptive word knowledge. The tool also needs to be diagnostic so as to map the English as a foreign language word knowledge of young learners. Diagnostic tests are developed for the purpose of exploring knowledge acquired before a given learning process, so they have major classroom implications (Vidákovich 1990). The pilot study was administered as a paper-and-pencil test in order to gain useful data for a reconsideration of the task types and items involved. The ultimate purpose of this pilot study was to create an online integrated vocabulary test which will help to diagnostically map students’ lexical knowledge in an efficient way.

2 Investigating Word Knowledge
Word knowledge is interpreted along several dimensions (Doró 2013, Nation 2001, Schmitt 1998), and researchers distinguish two important aspects of it (Laufer & Nation 2001, Read 2000): breadth and depth of vocabulary. Breadth of vocabulary is the quantitative trait of vocabulary, i.e. how many words a student knows, whereas depth of vocabulary is the qualitative trait and is characterized by the syntagmatic relationships between words and the inner structure of words (Nation 2001, Read 1999, Vidákovich & Cs. Czachesz 2006). The breadth and depth of an individual’s vocabulary determine his or her reading comprehension to a great extent (Nagy 2004). Meara (2009) asserts that vocabulary breadth equals the number of words learners know. Depth, on the other hand, means how well learners know these words. Another essential distinction in this construct is that between receptive and productive word knowledge (Nation 2001). Receptive lexical knowledge means that a student is able to recognize spoken and written words, whereas productive knowledge is the ability to use words in spoken or written discourse.
Henrikssen’s (1999) three-dimensional description of word knowledge also merits attention. Her three dimensions of vocabulary knowledge run from partial to precise knowledge, from shallow to deep knowledge and from receptive to productive ability. Other studies refer to her second dimension, the one concerning depth, as the quality of word knowledge. She argues that a learner may never possess complete knowledge of a given word. However, she asserts that complete knowledge of a given word may not even be necessary to understand texts. In her view, a kind of network building is involved in the process. She defines this process as “developing and handling new sense relations between words” (Henrikssen 1999: 308).
More recently, Meara (2009) has proposed an alternative to the notion of depth and breadth of vocabulary knowledge by stating that vocabulary knowledge is more than the sum of learners’ knowledge of the individual words in their vocabulary. As he states, what matters is not so much how deep and how broad a given vocabulary is, but rather how the individual words interact with one another. He claims that “these interactions are what distinguishes between a mere vocabulary list and a vocabulary network” (Meara 2009: 76). He goes even further in his book by stating that the breadth / depth distinction is an unfortunate one. He proposes that the terms size and organization be used instead in future research.
The selection of words in every validated vocabulary test is grounded on corpora (Vidákovich, Vígh, S. Hrebik & Thékes 2013). Corpus linguistics is a fast-developing field of applied linguistics. Its application is a major help for vocabulary learning, for researchers of language teaching and for teachers (Lehmann 2009). A large number of corpora are being developed all over the world for many languages and for many jargons, too. For example, there exist corpora of car mechanics jargon, spoken Scottish English or legal jargon. The major general corpora available around the world are the LOB Corpus, CANCODE, the British National Corpus, and COBUILD. Horváth (2001) claims that a corpus can provide a lot of information in terms of word frequency, collocations as well as lexical and syntactic patterns. These are the pieces of information that are necessary for selecting vocabulary when lexical tests are created. All the major vocabulary tests have a corpus-based aspect, and the different levels of difficulty of the various vocabulary tests are determined on the basis of frequency lists (Nation 2001). Frequency is the most basic concept examined in corpus linguistics. The most elementary fact that can be established from studying the language in a corpus is how many times a particular word occurs. The earliest corpora in research gave the frequency of a word as the first piece of information. At the dawn of corpus linguistics, it took scholars a long time to count the frequencies of words, but nowadays, with technology-based support, it is a matter of seconds.

2.1 Vocabulary Tests
As Schmitt (2008) points out, there is no commonly accepted standardized test of English vocabulary available. The closest to a commonly accepted test is the Vocabulary Levels Test (VLT), devised by Nation (2001). Instead of giving an estimate of overall vocabulary size, it measures words at four frequency levels: 2,000, 3,000, 5,000 and 10,000. Students find six items on the left side and, on the right side, three synonyms of three of the six given items. They are expected to match the three synonyms with three of the words on the left side. The remaining three items are distractors. Since the test gives estimates of vocabulary knowledge at each of these levels, it can be applied for formative assessment purposes and for the diagnosis of vocabulary gaps. Table 1 presents a sample task of the Vocabulary Levels Test:
| Words | Meanings |
|---|---|
| 1 bitter | |
| 2 independent | small |
| 3 lovely | beautiful |
| 4 merry | liked by many people |
| 5 popular | |
| 6 slight | |
Table 1: Vocabulary Levels Test
Another vocabulary measure which can serve the purpose of self-assessment is the widely-known Vocabulary Knowledge Scale (VKS) (Paribakht & Wechse 1999). Schmitt (2008) praises this type of vocabulary measurement by saying that it emphasizes what students know, rather than what they do not know. By allowing them to show their partial knowledge of a lexical item, it may be more motivating than other types of tests (Schmitt 2008: 175). Students must indicate their knowledge of the given word on a scale of no knowledge to productive knowledge:
1. I don’t remember having seen this word before.
2. I have seen this word before, but I don’t know what it means.
3. I have seen this word before and I think it means………………………................
4. I know this word. It means………………..............................................................
5. I can use this word in a sentence:……………………………….............................
Table 2: Vocabulary Knowledge Scale
Besides Nation’s Vocabulary Levels Test and Paribakht and Wechse’s Vocabulary Knowledge Scale, the popular checklist tests are also often used for diagnostic purposes. They are simple tests in the sense that all students need to do is to check whether they know a given item or not. The evident problem is that most students overestimate their knowledge (Orosz 2009: 188) and might check many more words than they actually know. In order to compensate for this, nonwords that look like real words are put in the test, and students are penalized with minus points when they check them. Meara developed a test called EFL Vocabulary Tests (Meara 1990); a commercial computerized version is the Eurocentres Vocabulary Size Test (EVST) (Meara & Jones 1990), which was further developed into the X-Lex test (Meara & Milton 2003).
As far as productive knowledge of vocabulary is concerned, Laufer and Nation (1995) developed an instrument that measures productive word knowledge. In this test students see sentences. In each sentence, only the initial letters of a word are given. Students must write the missing part of the word. This test is named Productive Vocabulary Levels Test:
  1. He likes walking in the fo……………… because the trees are beautiful there.
  2. He takes cr..........................and sugar in his coffee.
  3. The actor took the st………… to perform in the long-awaited play.
Table 3: Productive Vocabulary Levels Test

2.2 Assessing Young Learners’ Vocabulary
Although the above-mentioned data collection instruments have been designed to assess university students or adults, there have been studies reporting on the testing of young learners’ word knowledge as well. Nikolov and Mihaljevic Djigunovic (2011) state that by young learners, they mean primary school students up to the age of 14. The most comprehensive book on the assessment of young foreign language learners was compiled by McKay (2006). The book also has implications as far as vocabulary learning is concerned. It is typical of young learners studying vocabulary that they use memorized chunks. In this sense, their knowledge is implicit; the ability to study explicitly, which enables learners to understand rules, is acquired only during adolescence (Nikolov & Szabó 2011). Most young learners learn words quickly. However, after they have acquired the ability to recognize words, the ability to use connotations, shades of meaning, synonyms and antonyms is only acquired as a result of a long process of learning (Cameron 2001). In the literature, it has also been emphasized that until the age of twelve, students know only a limited number of words (Laufer 2000). Students hardly ever know the secondary meanings of words (Schmitt 1998), and they have limited awareness of the derivative forms of a word (Schmitt & Zimmerman 2002).
A number of studies have tried to explore the vocabulary size of young learners. In the study conducted by Jiménez Catalan and Terrazas Gallego (2008), the receptive vocabulary of Spanish 4th graders (N=270) was diagnostically explored. At the time of data collection, students had learnt English for three years (three lessons a week). The VLT was used as the test up to the 2,000 most frequent words. In this study, a strong relationship was found between the frequency of a given word and students' knowledge of it. The same test was used in a longitudinal study in which the development of 224 4th graders’ word knowledge was followed over three years (Terrazas Gallego & Agustín Llach 2009). According to the results, learners’ vocabulary develops significantly fast. By transforming the test results, it was estimated that, out of the 1,000 most frequent words, 4th graders knew 361 on average, whereas this figure was 817 by the end of the seventh grade.
In a cross-sectional study, the receptive vocabulary of Hungarian students from third to sixth grade (N=253) was assessed with the X-Lex test (Orosz 2009). Her finding was that learners’ word knowledge develops fast and gradually. Sixth graders’ vocabulary size was estimated to be around 1,700 words. Based on the results of the analysis of variance, it was also asserted that there was a significant difference between third and sixth graders.
In another study done in Hungary, an online receptive word knowledge test was used to assess Hungarian sixth and seventh graders’ English and German as a foreign language vocabulary (Vidákovich et al. 2013). The assessment was based on 216 English and German words with identical word meaning and CEFR level and with similar word frequency. The two tests consisted of 54 tasks. Each task included a simple or complex picture and four words, which were true-false items. Students had to decide whether the given word matched the picture through identification or implication. Three test versions were developed, using the same test construction in terms of task structure, operation, and CEFR level. It was found that those students who studied English reached a significantly higher score than those who studied German.
In Taiwan, a new vocabulary test was created so as to assess the recognition speed of vocabulary among nine-year-old 4th graders (Yu-Cheng 2008). During the test, learners had six seconds to signal the meaning belonging to either a picture or a native language equivalent. This test was applied in a control group study with 64 participants as well. While the treatment group learnt vocabulary with the help of visual input, the control group learnt vocabulary with the help of L1 words. The finding was that there was no significant difference between the two groups with regard to the recognition speed of words.
In research exploring the receptive vocabulary of young learners, visual input has been used in several cases (Schmitt, Schmitt & Clapham 2001, Yu-Cheng 2008). An example is the Peabody Picture Vocabulary Test (Dunn & Dunn 1997), used to assess receptive vocabulary. With the help of this test, teachers can also gain information concerning the deficiency in students’ word knowledge in order to plan the learning process more efficiently. In this test, which is also used for the assessment of students with special needs, students see four numbered pictures. The researcher says one word and the learners need to decide which picture can be matched with the word in question.
We decided to assess Hungarian sixth graders’ English vocabulary with an integrated test because there had only been two studies (Orosz 2009 and Vidákovich et al. 2013) that measured Hungarian young learners’ vocabulary. Gaining information on learners’ receptive word knowledge might be useful. Adding productive tasks and audio tasks to an instrument can, however, provide us with more relevant data.

3 The Study
3.1 Research Questions
With regard to the goals of the present study, the following research questions were formulated:
  1. How difficult was each task type?
  2. Which items were inappropriate in the test battery?
  3. Which word class proved to be the easiest and which the most difficult one in the test battery?
  4. How do the different task types correlate?

3.2 Participants
The students taking the test were sixth graders (N = 103) in four Hungarian primary schools. Participants were carefully selected in terms of the number of English lessons per week: only students in classes following the general curriculum were included. This means that they had three lessons of English as a foreign language a week when data collection was carried out and that they had been learning English since the 4th grade. Learners knew that they would take a vocabulary test, as they had been informed a week before the test was administered. Motivation was ensured by the teachers, who promised all participating students a grade 5, the equivalent of a grade A in the Anglo-Saxon education system.

3.3 Research Instrument
A diagnostic integrated vocabulary test was designed to assess learners’ word knowledge. Most of the diagnostic vocabulary tests measure one dimension of vocabulary (Nation 1990). They either tap into receptive or productive word knowledge. Our diagnostic instrument consisted of seven different tasks presented in Table 4:

| Task | Receptive/Productive | Language Skill(s) / Strategy |
|---|---|---|
| 1) Listen to words and match them with pictures | Receptive | listening / recognition of the words heard |
| 2) Listen to definitions and match them with words | Receptive | listening / comprehension of the text and inference of the word |
| 3) Match six written words with three pictures | Receptive | reading / receptive recognition |
| 4) Match written words with the picture | Receptive | reading / receptive recognition |
| 5) Match written definitions with the written word | Receptive | reading / comprehension of text and inference of meaning |
| 6) Write word next to picture | Productive | writing / productive use of vocabulary and production of required lexis |
| 7) Translate or write sentence with word | Productive | writing |
Table 4: Tasks in the diagnostic vocabulary test battery
The test is diagnostic in the sense that it is intended to determine the breadth of the English as a foreign language vocabulary of sixth graders and to map the lexical competence of these students at a certain point in time as far as different topics are concerned. These topics are 1) food and eating, 2) home and furniture, 3) shops and shopping, 4) travelling and transport, 5) jobs and professions, and 6) sports. The outcome of the test will be an indicator of the size and limitations of the students’ vocabulary at this stage of their learning process. We also estimated the difficulty of the different tasks. On the basis of the literature, it was concluded that the easiest task would be the one that involved listening and visual input and that the most challenging would be the two tasks that required production. In Table 5, the ranking of the seven tasks is presented according to estimated difficulty. Rank 1 is the task type estimated to be the simplest and Rank 7 the one estimated to be the most difficult:
| Ranking | Task |
|---|---|
| Rank 1 | Task 1 |
| Rank 2 | Task 2 |
| Rank 3 | Task 3 |
| Rank 4 | Task 4 |
| Rank 5 | Task 5 |
| Rank 6 | Task 6 |
| Rank 7 | Task 7 |
Table 5: Estimated difficulty rank order of task types

3.4 Selection of Words
First of all, words up to the first 2,000 frequency rank were selected from the British National Corpus (BNC). The reason for this decision was that researchers (Laufer, Elder, Congdon & Hill 2004, Nation 1999, Schmitt 2003) have suggested that the most important thing for a language learner is to acquire the first 2,000 words of a language. However, young learners may know some infrequent words better than frequent ones. This can occur as a result of learning age-appropriate topics in school and of incidental learning through out-of-school exposure to words. The words students are interested in knowing come from television programs, online videos and books written in English. Sixth graders might also learn less frequent words simply because they are more motivated to learn them.
Besides taking corpus-based data into account, suggestions in the Hungarian National Core Curriculum (2007) and Nikolov (2011) were also considered in terms of grouping words by topic and including them in the list. Nikolov (2011) suggests 14 broader topics that should be considered by elementary school teachers for classroom practice, and she also presumes that the lexis included in these topics might be the area of interest for young language learners. Consequently, the most relevant vocabulary of these topics has been added to the list of 2,000 words, irrespective of word frequency rank. Nagy (2004) also supports the view that students should be taught the words they are interested in learning. As a result, our list of words to be assessed was completed by the addition of another 2,000 words, bringing the total to 4,000 words. Nation and Waring (1995) found that knowledge of the 4,000 most frequent words is crucial for communicating in a given language.
For the measurement tool, six of the major topics specified above were selected. There are two reasons for this decision: (1) not all of the 14 topics can be included in the test and (2) these six topics are covered by the present sixth-grade curriculum. In our test, students’ breadth of vocabulary is assessed, since most vocabulary tests (Nation 1990, Meara 2009, Read 2000) assess this domain. The only assessment tool that measures the depth of vocabulary was developed by Paribakht and Wechse (1999). So far, no study has been published on the assessment of young learners’ depth of EFL vocabulary. Moreover, testing synonyms and antonyms of different lexical items would presumably place a heavy cognitive load on sixth graders.

3.5 Final Step in Determining the List of Selected Words
When creating the seven tasks for the diagnostic battery, we needed to consider two factors. There are:
  • very frequent words that students do not know, simply because those words belong to the lexis used by adults, and
  • infrequent words, for example, words of animals and jobs that are rather infrequent but which students know (Lehmann, 2009: 36).
For example, the word lion is rather infrequent, as it lies outside the 3K list, but most of the students know it. The reason for this is partly that the recommended topics calibrated for this age group contain infrequent words.
We decided to establish three word categories grounded on the BNC list and the number of occurrences of a particular word in the course books. The necessity of creating categories is underlined by the fact that major vocabulary tests (Nation 2001, Laufer & Nation 1995, Paribakht & Wechse 1999) include items selected on the basis of word frequencies. The basis of the three categories is indicated in Table 6. However, some of the words that would have been in Category 3 were ranked as Category 2 or even Category 1 if they were expected to be known by learners on the basis of course-book occurrence:
| Word Frequency | Category |
|---|---|
| 1 - 2,000 | 1 |
| 2,000 - 4,000 | 2 |
| 4,000 - more | 3 |
Table 6: The basis of the categorization of the words
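The categorization in Table 6 can be sketched as a small function. This is only an illustration under stated assumptions: the function name `frequency_category` and the `override` parameter are hypothetical, the boundary ranks 2,000 and 4,000 are assumed to belong to the lower category (Table 6 leaves this open), and the override stands in for the judgment-based reclassification of words expected to be known from course books.

```python
def frequency_category(rank, override=None):
    """Return the word category (1, 2 or 3) for a corpus frequency rank.

    rank:     position of the word in a frequency list (1 = most frequent).
    override: optional category assigned by hand, e.g. for a word that is
              infrequent in the corpus but common in course books
              (hypothetical parameter, not described in the paper).
    """
    if override is not None:
        return override
    if rank <= 2000:   # assumption: rank 2,000 itself counts as Category 1
        return 1
    if rank <= 4000:   # assumption: rank 4,000 itself counts as Category 2
        return 2
    return 3


# Illustrative ranks only; these are not real BNC ranks of any word.
print(frequency_category(1500))              # 1
print(frequency_category(3500))              # 2
print(frequency_category(9000))              # 3
print(frequency_category(9000, override=2))  # 2
```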
As we selected the lexical items for the tasks, we followed the system of selecting words from all three frequency categories. In all tasks, the majority (at least four words) belongs to Category 1, and at least two words represent Category 3; thus, Category 2 is represented by three or four words. This system guarantees that words from all the possible layers are assessed. As an example, the items from Task 2 are shown with their respective categories in Table 7:
| Word | Category |
|---|---|
| A: bakery | 3 |
| B: butcher’s | 3 |
| C: cinema | 2 |
| D: hospital | 1 |
| E: kindergarten | 2 |
| F: market | 1 |
| G: restaurant | 1 |
| H: school | 1 |
| I: station | 2 |
| J: supermarket | 1 |
| K: theatre | 2 |
Table 7: Words and their categories in Task 2
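The selection rule described above (at least four Category 1 words and at least two Category 3 words per task) can be checked against the Task 2 word list in Table 7. The sketch below is a minimal illustration; the function name `check_composition` and its parameters are hypothetical, while the word-category pairs are taken from Table 7.

```python
from collections import Counter

# Word categories for Task 2, as listed in Table 7.
TASK_2 = {
    "bakery": 3, "butcher's": 3, "cinema": 2, "hospital": 1,
    "kindergarten": 2, "market": 1, "restaurant": 1, "school": 1,
    "station": 2, "supermarket": 1, "theatre": 2,
}


def check_composition(categories, min_cat1=4, min_cat3=2):
    """Count words per category and test the two minimum requirements."""
    counts = Counter(categories.values())
    ok = counts[1] >= min_cat1 and counts[3] >= min_cat3
    return ok, dict(counts)


ok, counts = check_composition(TASK_2)
print(ok, counts)
```

For Task 2, this gives five Category 1 words, four Category 2 words and two Category 3 words, so both minimum requirements are met.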
As far as the scoring of the test is concerned, all tasks were scored on a 1 to 9 point scale. In each task, there were nine items. One example was always given and, except in Tasks 6 and 7, at least one distractor that did not match any of the pictures or definitions was also placed in the sub-tests. Table 8 shows the number of items, the maximum possible points and the number of distractors:

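| | Number of items | Maximum possible points | Number of distractors | Items in example |
|---|---|---|---|---|
| Task 1 | 11 | 9 | 1 | 1 |
| Task 2 | 11 | 9 | 1 | 1 |
| Task 3 | 24 | 9 | 9 | 6 |
| Task 4 | 11 | 9 | 1 | 1 |
| Task 5 | 11 | 9 | 1 | 1 |
| Task 6 | 10 | 9 | 0 | 1 |
| Task 7 | 10 | 9 | 0 | 1 |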
Table 8: The scoring of the sub-tests
Since students had to use their productive vocabulary in Task 6, there was no need to add a distractor item. In Task 7, the case was similar to Task 6: there was no point in selecting any distracting items.
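One way to read Table 8 is that the number of items in each task equals the scored items plus the distractors plus the items used in the example. This reading is an inference from the figures and is not stated explicitly in the text; the short sketch below, with the hypothetical helper `is_consistent`, simply checks it against the table.

```python
# Rows of Table 8: (number of items, maximum points, distractors, example items).
TABLE_8 = {
    "Task 1": (11, 9, 1, 1),
    "Task 2": (11, 9, 1, 1),
    "Task 3": (24, 9, 9, 6),
    "Task 4": (11, 9, 1, 1),
    "Task 5": (11, 9, 1, 1),
    "Task 6": (10, 9, 0, 1),
    "Task 7": (10, 9, 0, 1),
}


def is_consistent(items, max_points, distractors, example_items):
    """True if the item count is the sum of the other three columns."""
    return items == max_points + distractors + example_items


for task, row in TABLE_8.items():
    print(task, is_consistent(*row))

# Total number of scored items across the seven tasks.
total_points = sum(row[1] for row in TABLE_8.values())
print(total_points)
```

Under this reading, every row of the table is consistent, and the maximum total score of 63 points matches the 63 items analyzed in the Results section.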

3.6 Procedure
The vocabulary test battery was administered in four schools in seven classes in November 2013. Language classes in Hungary are usually divided into two groups, with two teachers working with the groups simultaneously. However, the test was taken by whole classes in order to save time. Prior to giving the paper-based test booklet to learners, the researcher contacted the school management and the teachers and gave them an account of the goals of the research. The entire 45-minute class time was used in all classes. Students were given the tasks one by one so as to avoid confusion. Besides seeing the instructions written on the test pages, students were also told by the researcher in their native language what they were supposed to do. Since it was a paper-based test and no prior voice recording had been done, the researcher himself read the words to the students in Task 1 and Task 2, which required listening comprehension. Once students had completed all seven tasks, the test papers were collected and evaluated on the very same day. Data were entered into our SPSS database, and the analysis was done with the help of a professional statistician from the Educational Science Institute of SZTE.

4 Results
The reliability of the test battery proved to be acceptable (Cronbach's alpha = 0.763). In Table 9, reliabilities, means and standard deviations are given as far as all the tasks are concerned:

| | Cronbach's Alpha | Mean | SD |
|---|---|---|---|
| Task 1 | 0.360 | 5.786 | 1.569 |
| Task 2 | 0.671 | 5.310 | 2.052 |
| Task 3 | 0.679 | 5.864 | 2.052 |
| Task 4 | 0.728 | 4.767 | 2.052 |
| Task 5 | 0.605 | 4.563 | 1.968 |
| Task 6 | 0.657 | 5.660 | 1.752 |
| Task 7 | 0.311 | 2.631 | 1.427 |
Table 9: Descriptive statistics of seven tasks
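For readers unfamiliar with the reliability coefficient reported here, the following sketch shows the standard formula for Cronbach's alpha, α = k/(k−1) · (1 − Σ item variances / variance of total scores). It is a generic illustration and not the SPSS procedure used in the study; the function name `cronbach_alpha` and the small score matrix are made up for demonstration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per student, one column per item.
    """
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # variance of each item
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up data: four students, three dichotomously scored items.
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(demo))
```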
In Table 10, descriptive statistics for each item are presented. The means and standard deviations provide information on students’ achievement on each item:
 
| Item | Item Number | Mean | Standard Deviation |
|---|---|---|---|
| camel | 1 | .8058 | .39750 |
| helicopter | 2 | .6311 | .48487 |
| monkey | 3 | .6214 | .48742 |
| lion | 4 | .7087 | .45657 |
| ship | 5 | .7282 | .44709 |
| skating | 6 | .4078 | .49382 |
| swimming | 7 | .7573 | .43082 |
| train | 8 | .9417 | .23537 |
| tram | 9 | .1845 | .38976 |
| to arrive | 10 | .1942 | .39750 |
| to study | 11 | .4369 | .49843 |
| bake | 12 | .7864 | .41185 |
| grocery | 13 | .2039 | .40485 |
| to sell | 14 | .7573 | .43082 |
| cinema | 15 | .7961 | .40485 |
| hospital | 16 | .6117 | .48976 |
| to play | 17 | .7476 | .43653 |
| to eat | 18 | .7670 | .42482 |
| cleaning | 19 | .7087 | .45657 |
| drinking | 20 | .8058 | .39750 |
| driving | 21 | .7184 | .45196 |
| heavy | 22 | .6311 | .48487 |
| quick | 23 | .5922 | .49382 |
| tiny | 24 | .7670 | .42482 |
| boat | 25 | .2039 | .40485 |
| leg | 26 | .8058 | .39750 |
| pocket | 27 | .6311 | .48487 |
| to cook | 28 | .5825 | .49555 |
| dentist | 29 | .8835 | .32240 |
| firefighter | 30 | .1748 | .38162 |
| hairdresser | 31 | .5437 | .50052 |
| mechanic | 32 | .3786 | .48742 |
| pilot | 33 | .6214 | .48742 |
| plumber | 34 | .5825 | .49555 |
| tailor | 35 | .1748 | .38162 |
| waiter | 36 | .8252 | .38162 |
| cook (noun) | 37 | .7087 | .45657 |
| carpet | 38 | .1942 | .39750 |
| to wash | 39 | .6214 | .48742 |
| dining room | 40 | .7961 | .40485 |
| to talk | 41 | .6214 | .48742 |
| cupboard | 42 | .2039 | .40485 |
| shelf | 43 | .1748 | .38162 |
| bedroom | 44 | .6214 | .48742 |
| open | 45 | .6214 | .48742 |
| mushroom | 46 | 0.0000 | 0.00000 |
| cheese | 47 | .7961 | .40485 |
| hamburger | 48 | 1.0000 | 0.00000 |
| fish | 49 | .8252 | .38162 |
| chicken | 50 | .7961 | .40485 |
| sausage | 51 | .2039 | .40485 |
| ice-cream | 52 | .6214 | .48742 |
| cake | 53 | .7961 | .40485 |
| coffee | 54 | .6214 | .48742 |
| frozen | 55 | .3786 | .48742 |
| fruit | 56 | .7961 | .40485 |
| foreign | 57 | 0.0000 | 0.00000 |
| whole | 58 | .1748 | .38162 |
| lightning | 59 | .3786 | .48742 |
| through | 60 | .2039 | .40485 |
| to accuse | 61 | .2718 | .44709 |
| probably | 62 | .1359 | .34438 |
| handsome | 63 | .2913 | .45657 |
Table 10: Descriptive statistics of 63 items in seven tasks
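The item means in Table 10 can be read as the proportion of students who answered an item correctly. The sketch below illustrates this, assuming that each item was scored 1 (correct) or 0 (incorrect) and that all 103 students responded; neither assumption is stated explicitly in the text. The count of 83 correct answers is a hypothetical figure, chosen because it appears to reproduce the values reported for the item camel.

```python
from statistics import mean, stdev


def item_stats(responses):
    """Return the mean (proportion correct) and sample SD of 0/1 responses."""
    return mean(responses), stdev(responses)


# Hypothetical response vector: 83 correct and 20 incorrect answers (N = 103).
responses = [1] * 83 + [0] * 20
m, sd = item_stats(responses)
print(round(m, 4), round(sd, 4))
```

Because the standard deviation of a dichotomous item is fully determined by its mean and the sample size, items with the same mean in Table 10 also share the same standard deviation.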
On the basis of our descriptive statistics, the least differentiating items can be identified. The items skating, hospital, heavy, mechanic, sausage, frozen, lightning and through are the ones that indicate the lowest correlations with the other items.
Examining correlations is one aspect of our item analysis. In addition, we need to examine which items have the highest and the lowest standard deviations. As had been expected, some of the items in the test were very easy, while others were extremely difficult for learners. It is evident from Table 10 that the item hamburger was the easiest one and that the items mushroom and foreign were the two most difficult ones, which no students could produce in Task 6. It should also be noted, however, that if mushroom had been an item in Task 3, the task with the highest mean, in which students were expected to match a picture with a word, some of the students might have been able to recognize this word. As for the correlations among the seven different tasks, we found mostly strong and significant correlations. The two listening tasks (Task 1 and Task 2) showed a robust statistical relationship with each other, whereas the two reading tasks (Task 4 and Task 5) had a much weaker, but still significant, correlation. The functioning of the depth-of-vocabulary task (Task 7) needs to be examined because it showed significant negative correlations with all the other task types except Task 4, in which students were expected to match written words with pictures. In Table 11, the correlations among task types are shown:

| | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 |
|---|---|---|---|---|---|---|
| Task 1 | | 0.945** | 0.803** | 0.509** | 0.823** | 0.647** |
| Task 2 | 0.945** | | 0.763** | 0.543** | 0.790** | 0.582** |
| Task 3 | 0.803** | 0.763** | | 0.337** | 0.853** | 0.820** |
| Task 4 | 0.509** | 0.543** | 0.337** | | 0.237* | 0.010 (n.s.) |
| Task 5 | 0.823** | 0.790** | 0.853** | 0.237* | | 0.937** |
| Task 6 | 0.647** | 0.582** | 0.820** | 0.010 (n.s.) | 0.937** | |
| Task 7 | -0.329** | -0.246* | -0.615** | 0.180 (n.s.) | -0.679** | -0.827** |
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Table 11: Correlations across tasks
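The significance markers in Table 11 can be illustrated with the usual t-test for a Pearson correlation, t = r · √((n − 2) / (1 − r²)), with n − 2 degrees of freedom. The sketch below is not the SPSS routine used in the study. The function names are hypothetical, and the critical values are approximate two-tailed values for about 100 degrees of freedom, which is an assumption made to keep the example free of external libraries.

```python
import math

# Approximate two-tailed critical t values for roughly 100 degrees of freedom
# (assumption: close enough for n = 103, i.e. df = 101).
T_CRIT_05 = 1.984
T_CRIT_01 = 2.626


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def significance(r, n):
    """Return '**', '*' or 'n.s.' following the convention of Table 11."""
    t = abs(t_from_r(r, n))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return "n.s."


# A few coefficients from Table 11, with N = 103 students.
for r in (0.337, 0.237, 0.180, -0.246, 0.010):
    print(r, significance(r, 103))
```

With N = 103, a coefficient of about 0.2 in absolute value is roughly the threshold for significance at the 0.05 level, which is why 0.237 is marked with one asterisk while 0.180 is not significant.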
In order to compare the means of the tasks with one another, paired-samples t-tests were run. There is a significant difference between most of the tasks. The biggest difference is found between the listening tasks, Task 1 and Task 2 (t=4.854, p<0.001). However, there is no significant difference between the means of the two reading tasks, Task 4 and Task 5 (t=0.202, p>0.05). As for the two productive tasks, Task 6 and Task 7, a significant difference (t=3.029, p<0.001) could be found.
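A paired-samples t-test compares two scores obtained from the same students by testing whether the mean of the within-student differences is zero. The following sketch shows the computation in plain Python; the function name `paired_t` and the two short score lists are made up for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]   # within-student differences
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up scores of five students on two tasks.
task_a = [6, 5, 7, 4, 6]
task_b = [5, 5, 6, 2, 5]
t, df = paired_t(task_a, task_b)
print(t, df)
```

The resulting t value is then compared with the t distribution with n − 1 degrees of freedom to obtain the p value.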

5 Discussion
In order to answer our research questions, we need to examine the results carefully.
As regards RQ 1, it can be stated that Task 3 proved to be the easiest one even though it had been estimated that Task 1 and Task 2 are the easiest ones. Thus, recognizing words based on pictures is the easiest modality in this test, and this reinforces Laufer et al.’s (2004) hypothesis that receptive recognition is the most simple modality. Looking at the results makes it clear that receptive word knowledge is easier to acquire than productive word knowledge. Thus, the results in this integrated test correspond to a general tendency.
Those items that were known by less than 20% of the participants and those ones that have no or a very low standard deviation were considered inappropriate. For example, the item hamburger, a cognate in Hungarian, was known by every student whereas the item mushroom was totally unknown and nobody could write down the word correctly next to the corresponding picture in Task 6. These items will be removed in future research and will not be involved in the online version of the test. The items skating, hospital, heavy, mechanic, sausage, frozen, lightning, through are the ones that need to be put under further scrutiny due to reasons stated in connection with the items hamburger and mushroom.
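The screening rule described above can be sketched as a simple filter over item means. This is an illustration only: the function name `flag_items` and the reason labels are hypothetical, and because the text does not quantify a "very low" standard deviation, the sketch flags only items with no variance at all (every student right or every student wrong). The item means are taken from Table 10.

```python
def flag_items(item_means, min_facility=0.20):
    """Return a dict of flagged items and the reason for flagging them.

    item_means:   mapping of item -> proportion of students who knew it.
    min_facility: items known by a smaller proportion are flagged
                  (the 20% threshold mentioned in the text).
    """
    flagged = {}
    for word, p in item_means.items():
        if p == 0.0 or p == 1.0:
            # All students wrong or all right: the item cannot differentiate.
            flagged[word] = "no variance"
        elif p < min_facility:
            flagged[word] = "known by less than 20%"
    return flagged


# A few item means from Table 10.
sample = {"camel": 0.8058, "train": 0.9417, "tram": 0.1845,
          "mushroom": 0.0, "hamburger": 1.0}
print(flag_items(sample))
```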
As regards students' knowledge of word classes (RQ 3), nouns proved to be the easiest word class (p < 0.001).
The two listening tasks have the strongest correlation. It is also evident that Task 1 has strong and significant correlations with the rest of the tasks. However, there is no significant relationship between Task 4 and Task 6. This means that the relationship between students' receptive recognition of words and their productive written lexical knowledge is called into question (Laufer et al. 2004).
The negative correlations of Task 7 with most of the other tasks make it doubtful whether Task 7 should be applied in future research. Since the ultimate goal was to develop a validated online instrument, it is also questionable whether the production of a sentence containing the given word should be included in the online test, because productive knowledge is generally difficult to test in an online environment.
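The relationships between tasks discussed above rest on correlation coefficients between students' task scores. A minimal sketch of the Pearson coefficient is given below. It assumes that Pearson's r is the coefficient reported in Table 11, which the text does not state explicitly; the function name `pearson_r` and the score lists are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("both tasks need one score per student")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of the deviations from the two means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)

# Hypothetical scores of five students on two tasks (not data from the study).
scores_first = [1, 2, 3, 4, 5]
scores_second = [2, 4, 5, 4, 5]
r = pearson_r(scores_first, scores_second)
print(f"r = {r:.3f}")
```

A negative value of r, as found for Task 7, would mean that students who score higher on one task tend to score lower on the other.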

6 Conclusion
This paper has presented the results of a pilot study that involved the application of a new integrated vocabulary test developed for young learners. The ultimate goal of this research was to finalize an instrument that will be used online. All the linguistic skills, except for speaking, were measured with a focus on English-as-a-foreign-language vocabulary. Test takers need to have good listening, reading and writing skills to reach a high score in the test.
The item and task analysis described above yielded valuable data with regard to future assessment. The results have provided sufficient information as to what kind of modifications must be implemented. It is also a useful finding that most of the task types correlate with one another, which means that these tasks are not independent of one another.

References
Cameron, L. (2001). Teaching languages to young learners. Cambridge: Cambridge Teaching Library
Doró, K. (2013). Role of lexical knowledge and its testing in an L2 academic context. Saarbrücken: Lambert Academic Publishing
Dunn, L. M. & L. M. Dunn (1997). Peabody picture vocabulary test. Circle Pines, Minnesota: American Guidance Service
Fitzpatrick, T., I. Al-Qarni & P. Meara (2008). Intensive vocabulary learning: a case study. Language Learning Journal, 36(2), 239–248
Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second Language Acquisition, 21(2), 303–317
Horváth, J. (2001). Advanced writing in English as a foreign language. A corpus-based study of processes and products. Pécs: Lingua Franca Group.
Jiménez Catalán, R. M. & M. Terrazas Gallego (2005-2008). The receptive vocabulary of English foreign language young learners. Journal of English Studies, 5(1), 173–191.
Kilgarriff, A. (1997). Putting frequencies in the dictionary. International Journal of Lexicography, 10, 135–155.
Laufer, B. & I. S. P. Nation (1995). Vocabulary size and use: lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.
Laufer, B. & I. S. P. Nation (2001). Passive vocabulary size and speed of recognition. EUROSLA Yearbook 1. 7–28.
Laufer, B., C. Elder, K. Hill & P. Congdon (2004). Size and strength: do we need both to measure vocabulary knowledge? Language Testing, 21(2), 202–226.
Lehmann, M. (2009). Assessing English majors’ vocabulary at the University of Pécs. Unpublished PhD thesis. University of Pécs.
Lewis, M. (1993). The lexical approach. Hove: Teacher Training Publications.
McKay, P. (2006). Assessing young language learners. Cambridge: Cambridge University Press.
Meara, P. (1990). Some notes on the Eurocentres vocabulary tests. In: Tommola, J. (ed.). Foreign language comprehension and production. Turku: AFinLa Yearbook, 103–113.
Meara, P. (2009). Connected words. Amsterdam: John Benjamins Publishing.
Meara, P. & J. Milton (2003). X-Lex: The Swansea Vocabulary Levels Test. Swansea: Express Publishing.
Milton, J. (2009). Measuring second language vocabulary acquisition (Vol. 45). Bristol: Multilingual Matters.
Nagy, J. (2004). A szóolvasó készség fejlődésének kritériumorientált diagnosztikus feltérképezése. /The criterion-oriented diagnostic mapping of the development of word reading skills/. Magyar Pedagógia, 104(2), 123–142.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston: Heinle and Heinle.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
National Core Curriculum (2007). Magyar Közlöny, 102, 7683–7686.
Nikolov, M. (2011). Az angol nyelvtudás fejlesztésének és értékelésének keretei az általános iskola első hat évfolyamán. /Frames of the development and assessment of English language knowledge in the first six grades of primary school/. Modern Nyelvoktatás, 17(1), 9–31.
Nikolov, M. & J. Mihaljević Djigunović (2011). All shades of every colour: An overview of early teaching and learning of foreign languages. Annual Review of Applied Linguistics, 31, 95–119.
Nikolov, M. & G. Szabó (2011). Az angolnyelv-tudás diagnosztikus mérésének és fejlesztésének lehetőségei az általános iskola 1-6. évfolyamán. /The possibilities of diagnostic assessment and development of English language knowledge in the 1st-6th grades of primary school/. In: Csapó, B. & A. Zsolnai (eds.). Kognitív és affektív fejlődési folyamatok diagnosztikus értékelésének lehetőségei az iskola kezdő szakaszában. /The possibilities of diagnostic assessment of cognitive and affective developmental processes in the beginning phase of school/. Budapest: Nemzeti Tankönyvkiadó, 13–41.
Orosz, A. (2009). The growth of young learners’ English vocabulary size. In: Nikolov, M. (ed.). Early learning of modern foreign languages. Processes and outcomes. Bristol: Multilingual Matters, 181–195.
Paribakht, T. S. & M. Wesche (1999). Reading and incidental L2 vocabulary acquisition. An introspective study of lexical inferencing. Studies in Second Language Acquisition, 21(2), 195–224.
Pellicer-Sanchez, A. & N. Schmitt (2012). Scoring Yes-No vocabulary tests: Reaction time vs. nonword approaches. Language Testing, 29(4), 489–509.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: Cambridge University Press.
Schmitt, N., D. Schmitt & C. Clapham (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
Schmitt, N. & C. Zimmerman (2002). Derivative word forms: What do learners know? TESOL Quarterly, 36(2), 145–171.
Schoonen, R. & B. Verhallen (2008). The assessment of deep word knowledge in young first and second language learners. Language Testing, 25(2), 211–236.
Terrazas Gallego, M. & M. P. Agustín Llach (2009). Exploring the increase of receptive vocabulary knowledge in the foreign language: A longitudinal study. IJES, 9(1), 113–133.
Thornbury, S. (2002). How to teach vocabulary. London: Pearson.
Vidákovich, T. (1990). Diagnosztikus pedagógiai értékelés. /Diagnostic educational assessment/. Budapest: Akadémiai Kiadó.
Vidákovich, T. & E. Csirikné Czachesz (2006). Középiskolás tanulók szókincse, a szókincsmélység és a szövegértés összefüggései. /The vocabulary of secondary school students, the relationships between vocabulary depth and reading comprehension/. Modern Nyelvoktatás, 12(2), 16–29.
Vidákovich, T., T. Vígh, O. Sominé Hrebik & I. Thékes (2013). Az angol és német nyelvi szókincs online diagnosztikus tesztelése a 6. évfolyamon. /Online diagnostic testing of English and German vocabulary in the 6th grade/. Iskolakultúra, 23(11), 117–131.
Webb, S. & Y. Sasao (2013). New directions in vocabulary testing. RELC Journal, 44(3), 263–277.

Author:
Istvan Jerry Thekes
Doctoral School of Education
University of Szeged
Szeged, 6722, Petőfi sgt. 21.
Email: jerrythekes@gmail.com