Journal of Linguistics and Language Teaching
Volume 6 (2015) Issue 2
Diagnostic Vocabulary Test
Young EFL Learners
Istvan Jerry Thekes
(Szeged, Hungary)
The goal of this study was to validate an integrated diagnostic
vocabulary test for young learners of English as a foreign language.
The research questions were: 1) How difficult was each task type? 2)
Which items were inappropriate in the test battery? 3) Which word
class proved to be the easiest and the most difficult one? 4) How did
the different task types correlate? The vocabulary test battery was
administered to 103 students in Hungary in November 2013. The
reliability of the test battery proved to be acceptable (Cronbach's
alpha = 0.763). Students scored highest in the recognition of nouns.
The two listening tasks had the strongest correlations.
Key words: vocabulary, diagnostic assessment, validation
The construct of vocabulary is a popular research area in the literature
on foreign language acquisition, and it is in the focus not only of
scholars but also of teachers. Educators have been encouraged
(Lewis 1993, Thornbury 2002) to promote the intentional learning of
words in the classroom. Since the early 1990s, teachers have laid
special emphasis on teaching vocabulary as well (Fitzpatrick,
Al-Qarni & Meara 2008). Successful language acquisition is
greatly determined by foreign language word knowledge (Schoonen &
Verhallen 2008). Both adults’ and young learners’ English as a
foreign language vocabulary has mainly been assessed comprehensively
as part of a test measuring general language knowledge. Thus we have
hardly any data concerning young learners’ vocabulary. The primary
goal of this study was to create and validate an integrated diagnostic
vocabulary test for young learners of English as a foreign language.
The secondary goal was to analyze every item in this test, which is
pioneering in the sense that, to the best of our knowledge, there is
no validated integrated vocabulary test for young learners.
Webb and Sasao (2013) have recently attempted to create an
integrated test. However, it is meant to assess adults. By integrated
test, we mean a tool that measures both productive and receptive word
knowledge. The tool also needs to be diagnostic so as to map the
English as a foreign language word knowledge of young learners.
Diagnostic tests are developed for the purpose of exploring knowledge
acquired before a given learning process, so they have major classroom
implications (Vidákovich 1990). The pilot study was administered as a
paper-and-pencil test in order to gain useful data for a
reconsideration of the task types and items involved. The ultimate
purpose of this pilot study was to create an online integrated
vocabulary test which will help to map students’ lexical knowledge
diagnostically and efficiently.
2 Investigating Word Knowledge
Word knowledge is interpreted along several dimensions (Doró 2013, Nation
2001, Schmitt 1998), and two important aspects of it are
distinguished by researchers (Laufer & Nation 2001, Read
2000): breadth and depth of vocabulary. The notion breadth of
vocabulary means the quantitative trait of vocabulary, i.e. how many
words a student knows, whereas depth of vocabulary means the
qualitative trait of vocabulary and is characterized by the
syntagmatic relationships between words and the inner structure of
words (Nation 2001, Read 1999, Vidákovich & Cs. 2006). The
breadth and depth of an individual’s vocabulary determine his or
her reading comprehension to a great extent (Nagy 2004). Meara (2009)
asserts that vocabulary breadth is to be interpreted as the number of
words learners know. Depth, on the other hand, means how well learners
know these words. Another essential distinction in this construct is
that between receptive and productive word knowledge (Nation 2001).
Receptive lexical knowledge means that a student is able to recognize
spoken and written words, whereas productive knowledge is the ability
to use words in spoken or written discourse.
Henriksen’s (1999) three-dimensional description of word knowledge also
merits attention. Her dimensions of vocabulary knowledge are proposed
on a spectrum of partial to precise, of shallow to deep, and of
receptive to productive knowledge. Other studies refer to her second
dimension, the one concerning depth, as the quality of word knowledge.
She argues that it may even happen that a learner will never possess
the complete knowledge of a given word. However, she asserts that the
complete knowledge of a given word may not even be necessary to
understand texts. In her view, a kind of network building is involved
in the process. She defines this process as “developing and handling
new sense relations between words” (Henriksen 1999: 308).
More recently, Meara (2009) has proposed an alternative to the notion of
depth and breadth of vocabulary knowledge by stating that vocabulary
knowledge is rather more than the sum of learners’ knowledge of
individual words in their vocabulary. As he states, it is not so
interesting how deep and how broad a given vocabulary knowledge is,
but rather how the individual words interact with one another. He
claims that “these interactions are what distinguishes between a mere
vocabulary list and a vocabulary network” (Meara 2009: 76). He goes
even further in his book by stating that the breadth / depth
distinction is an unfortunate one and proposes that the terms size
and organization be used instead in future research.
The selection of words in every validated vocabulary test is grounded on
corpora (Vidákovich, Vígh, S. Hrebik & Thékes 2013). Corpus
linguistics is a fast developing field of applied linguistics. Its
application is a major help in vocabulary learning and teaching, for
researchers as well as teachers (Lehmann 2009). A large number of
corpora are being developed all over the world for many languages and
for many jargons, too. For example, there exist corpora of car
mechanics jargon or spoken Scottish English.
The major general corpora available around the world are the LOB
Corpus, CANCODE, the British National Corpus, and COBUILD. Horváth
(2001) claims that a corpus can provide a lot of information in terms
of word frequency, collocations as well as lexical and syntactic
patterns. These are the pieces of information that are necessary in
the selection process of vocabulary when lexical tests are created.
All the major vocabulary tests have a corpus-based aspect, and the
different levels of difficulty of the various vocabulary tests are
based on frequency lists (Nation 2001). Frequency is the most basic
concept that is examined in corpus linguistics. The most elementary
piece of information that can be obtained from studying the language
in a corpus is how many times a particular word occurs. The earliest
corpora in research gave the frequency of a word as the first piece of
information. At the dawn of corpus linguistics, it took scholars a
long time to count the frequencies of words, but nowadays it is a
matter of seconds.
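To illustrate how quickly word frequencies can be counted today, the following minimal Python sketch may be helpful. It is not taken from any of the corpus tools mentioned above; the function name `word_frequencies` and the three-sentence mini-corpus are hypothetical and serve illustration only.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical mini-corpus, invented for illustration only.
corpus = "The lion saw the train. The train was quick. A lion is not a tram."
freq = word_frequencies(corpus)

# (word, count) pairs ordered from the most to the least frequent word.
ranked = freq.most_common()
```

A frequency list of this kind, computed on a real corpus, is the sort of ranking on which frequency bands such as the first 2,000 words are based.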
2.1 Vocabulary Tests
As Schmitt (2008) points out, there is no commonly accepted standardized
test of English vocabulary available. The closest to a commonly
accepted valid test was devised by Nation (2001). It bears the name of
Vocabulary Levels Test (VLT).
With the aim of giving an estimate of vocabulary size, words at four
frequency levels are measured: 2,000, 3,000, 5,000 and 10,000.
Students find six items on the left side and three synonyms of three
of the six given items on the right side. They are expected to match
the three synonyms with three of the words on the left side. Three
items are distractors.
Since the test gives estimates of vocabulary size at these levels, it
can be applied for formative assessment purposes and for the diagnosis
of vocabulary gaps. Table 1 presents the sample task of the Vocabulary
Levels Test:
1 bitter
2 independent      small
3 lovely           beautiful
4 merry            liked by many people
5 popular
6 slight
Table 1: Vocabulary Levels Test
Another vocabulary measure which can serve the purpose of self-assessment
is the widely known Vocabulary Knowledge Scale (VKS)
(Paribakht & Wesche 1999). Schmitt (2008) praises this type of
vocabulary measurement by saying that it emphasizes what students
know, rather than what they do not know. By allowing them to show
their partial knowledge of a lexical item, it may be more motivating
than other types of tests (Schmitt 2008: 175). Students
must indicate their knowledge of the given word on a scale from no
knowledge to productive knowledge:
1. I don’t remember having seen this word before.
2. I have seen this word before, but I don’t know what it means.
3. I have seen this word before and I think it means ..........
4. I know this word. It means ..........
5. I can use this word in a sentence: ..........
Table 2: Vocabulary Knowledge Scale
Besides Nation’s Vocabulary Levels Test and Paribakht and Wesche’s
Vocabulary Knowledge Scale, the popular checklist tests are also tools
often used for diagnostic purposes. They are simple tests in the sense
that all that students need to do is to check whether they know a
given item or not. The evident problem is that most students
overestimate their knowledge (Orosz 2009: 188) and might check many
more words than they actually know. In order to compensate for this,
nonwords that look like real words are put in the test, and when they
are checked, students are penalized with minus points. Meara is
acknowledged for having developed a test called EFL Vocabulary Tests
(Meara 1990), and a commercialized computerized version is called the
Eurocentres Vocabulary Size Test (EVST) (Meara & Jones 1990), which
was further developed into the X-Lex test (Meara & Milton 2003).
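The idea of penalizing checked nonwords can be sketched as follows. This is only an illustrative scoring rule under stated assumptions, not the scoring formula of the EVST or X-Lex: the function `checklist_score`, the `penalty` parameter, and the two pseudowords are all hypothetical.

```python
def checklist_score(checked, real_words, nonwords, penalty=1):
    """Score a yes/no checklist: +1 per checked real word, -penalty per checked nonword.

    The penalty weight is an assumption; the source only says that minus points are given.
    """
    hits = len(checked & real_words)
    false_alarms = len(checked & nonwords)
    return max(0, hits - penalty * false_alarms)

real_words = {"train", "lion", "cheese", "carpet"}
nonwords = {"plonter", "mashle"}  # invented pseudowords that look like English words
checked = {"train", "lion", "carpet", "mashle"}  # items one hypothetical student ticked

score = checklist_score(checked, real_words, nonwords)
```

Here the student ticked three real words and one nonword, so one point is deducted from the three hits. A heavier penalty can be modelled by raising `penalty`.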
As far as productive knowledge of vocabulary is concerned, Laufer and
Nation (1995) developed an instrument that measures productive word
knowledge. In this test, students see sentences. In each sentence,
only the initial letters of a word are given. Students must write the
missing part of the word. This test is named the Productive
Vocabulary Levels Test:
- He likes walking in the fo……………… because the trees are beautiful there.
- He takes cr..........................and sugar in his coffee
- The actor took the st………… to perform in the long-awaited play.
Table 3: Productive Vocabulary Levels Test
2.2 Assessing Young Learners’ Vocabulary
Although the above-mentioned data collection instruments have been
designed to assess university students or adults, there have been
studies reporting on the testing of young learners’ word knowledge as
well. Nikolov and Mihaljevic Djigunovic (2011) state that by young
learners, they mean primary school students up to the age of 14. The
most comprehensive book on the assessment of young foreign language
learners was compiled by McKay (2006). The book also has implications
as far as vocabulary learning is concerned. It is typical of young
learners studying vocabulary that they use memorized chunks. In this
sense, their knowledge is implicit; the ability to study explicitly,
which enables learners to understand rules, is acquired only during
adolescence (Nikolov & Szabó 2011). Most young learners learn words
quickly. However, after they have acquired the ability to recognize
words, the ability to use connotations, shades of meaning, synonyms
and antonyms is only acquired as a result of a long process of
learning (Cameron 2001). In the literature, it has also been
emphasized that until the age of twelve, students know only a limited
number of words (Laufer 2000). Students hardly ever know the secondary
meanings of words (Schmitt 1998), and they have limited awareness of
the derivative forms of a word (Schmitt & Zimmerman 2002).
A number of studies have tried to explore the vocabulary size of young
learners. In the study conducted by Jiménez and Terrazas (2008), the
receptive vocabulary of Spanish 4th graders (N=270) was diagnostically
explored. At the time of data collection, students had learnt English
for three years (three lessons a week). The VLT was used as the test
instrument, up to the 2,000 most frequent words. In this study, a
strong relationship was found between the frequency of a given word
and students' knowledge of it. The same test was used in a
longitudinal study in which the development of 224 4th graders’ word
knowledge was studied over the course of three years (Terrazas and
Agustín 2009). According to the results, the vocabulary of learners
goes through a significantly fast developmental process. By
transforming the test results, it was estimated that out of the most
frequent 1,000 words, 4th graders knew 361 on average, whereas this
figure was 817 by the end of the seventh grade.
In a cross-sectional
study, the receptive vocabulary of Hungarian students from third to
sixth grade (N=253) was assessed with the X-Lex test (Orosz 2009).
Her finding was that learners’ word knowledge develops fast and
gradually. Sixth graders’ size of vocabulary was estimated to be
around 1,700 words. Based on the results of the analysis of variance,
it was also asserted that there was a significant difference between
third and sixth graders.
In another study done in Hungary, an online receptive word knowledge
test was used to assess Hungarian sixth and seventh graders’ English
and German as a foreign language
vocabulary (Vidákovich et al. 2013). The assessment was based on 216
English and German words with identical word meaning, CEFR-level, and
similar word frequency. The two tests consisted
of 54 tasks. Each task included a simple or complex picture and four
words which were true-false items. Students had to make a decision as
to whether the given word matched the picture through identification
or implication. Three test versions were developed, using the same
test construction in terms of task structure, operation, and
It was found that those students who studied English reached a
significantly higher score than those who studied German.
In Taiwan, a new vocabulary test was created so as to assess the
recognition speed of vocabulary among nine-year-old 4th graders
(Yu-Cheng 2008). During the test, learners had six seconds to signal
the meaning belonging to either a picture or native language
equivalent. This test was applied in a control group study with 64
participants as well. While the treatment group learnt vocabulary
with the help of visual input, the control group learnt vocabulary
with the help of L1 words. The finding was that there was no
difference between the two groups with regard to the recognition
speed of words.
In research exploring the receptive vocabulary of young learners, visual
input has been used in several cases (Schmitt, Schmitt & Clapham
2001, Yu-Cheng 2008). An example is the Peabody Picture Vocabulary
Test (Dunn & Dunn 1997), used to assess receptive vocabulary.
With the help of this test, teachers can also gain information on
the deficiencies in students’ word knowledge in order to plan the
learning process more efficiently. In this test, which is also used
for the assessment of students with special needs, students see four
numbered pictures. The researcher says one word, and the learners need
to decide which picture can be matched with the word in question.
We decided to assess Hungarian sixth graders’ English vocabulary with
an integrated test because there had only
been two studies (Orosz 2009 and Vidákovich et al. 2013) that
measured Hungarian young learners’ vocabulary. Gaining information
on learners’ receptive word knowledge might be useful. Adding
productive tasks and audio tasks to an instrument can, however,
provide us with more relevant data.
3 The Study
3.1 Research Questions
With regard to the goals of the present study, the following research
questions were formulated:
- How difficult was each task type?
- Which items were inappropriate in the test battery?
- Which word class proved to be the easiest and which the most difficult one in the test battery?
- How do the different task types correlate?
3.2 Participants
The students taking the test were sixth graders (N = 103) in four
Hungarian primary schools. Participants were carefully selected in
terms of the number of English lessons per week: only students in
classes following the general curriculum were selected. This means
that learners had three English lessons a week in the school year when
data collection was carried out and that they had been learning
English since the 4th grade. Learners knew that they would take a
vocabulary test, as they had been informed a week before the test was
administered. Motivation was ensured by the teachers, who promised all
participating students a grade 5, the equivalent of grade A in the
Anglo-Saxon education system.
3.3 Research Instrument
A diagnostic integrated vocabulary test was designed to assess
learners’ word knowledge. Most diagnostic vocabulary tests measure one
dimension of vocabulary (Nation 1990): they tap into either receptive
or productive word knowledge. Our diagnostic instrument consisted of
seven different tasks, presented in Table 4:
| Task | Receptive/Productive | Language Skill(s) / Strategy |
| --- | --- | --- |
| 1) Listen to words and match them with pictures | Receptive | listening / recognition of the words heard |
| 2) Listen to definitions and match them with words | Receptive | listening / comprehension of the text and inference of the word |
| 3) Match six written words with three pictures | Receptive | reading / receptive recognition |
| 4) Match written words with the picture | Receptive | reading / receptive recognition |
| 5) Match written definitions with the written word | Receptive | reading / comprehension of text and inference of meaning |
| 6) Write word next to picture | Productive | writing / productive use of vocabulary and production of required lexis |
| 7) Translate or write sentence with word | Productive | writing |
Table 4: Tasks in the diagnostic vocabulary test battery
The test is diagnostic in the sense that it is intended to determine the
breadth of the English as a foreign language vocabulary of sixth
graders and to map the lexical competence of these students at a
certain point in time as far as different topics are concerned. These
topics are 1) food and eating, 2) home and furniture, 3) shops and
shopping, 4) travelling and transport, 5) jobs and professions, and
6) sports. The outcome of the test will be an indicator of the size
and limitations of the students’ vocabulary at this stage of their
learning process. We also estimated the difficulty of the different
tasks. On the basis of the literature, it was concluded that the
easiest task would be the one that involved listening and visual
input and that the most challenging would be the two tasks that
required production. In Table 5, the ranking of the seven tasks is
presented according to estimated difficulty. Rank 1 is the task type
estimated to be the simplest and Rank 7 the most difficult one:
| Ranking | Task |
| --- | --- |
| Rank 1 | Task 1 |
| Rank 2 | Task 2 |
| Rank 3 | Task 3 |
| Rank 4 | Task 4 |
| Rank 5 | Task 5 |
| Rank 6 | Task 6 |
| Rank 7 | Task 7 |
Table 5: Estimated difficulty rank order of task types
3.4 Selection of Words
First of all, words up to the first 2,000 frequency rank were selected
from the British National Corpus (BNC). The reason for this decision
was that researchers (Laufer, Elder, Congdon & Hill 2004, Nation 1999,
Schmitt 2003) suggested that the most important thing for a language
learner is to acquire the first 2,000 words of a language. However,
concerning young learners, it might be the case that they know some
infrequent words better than frequent ones. This can occur as a
result of learning age-appropriate topics in school and of incidental
learning through out-of-school exposure to words. The words students
are interested in knowing come from television programs, online videos
and books written in English. Sixth graders might also encounter less
frequent words because they are simply more motivated to learn them.
Besides taking corpus-based data into account, suggestions in the
Hungarian National Core Curriculum (2007) and Nikolov (2011) were also
considered in terms of grouping words based on topics and including
them in the list. Nikolov (2011) suggests 14 broader topics that
should be considered by elementary school teachers for classroom
practice, and she also presumes that the lexis that is included in
these topics might be the area of interest for young language
learners. Consequently, the most relevant vocabulary of these topics
has been added to the list of 2,000 words, irrespective of the word
frequency rank. Nagy (2004) also supports the view of teaching those
words to students that they are interested in learning. As a result,
our list of words to be assessed was completed by the addition of
another 2,000 words, bringing it up to 4,000 words. Nation and Waring
(1995) found that the
knowledge of the
4,000 most frequent words
crucial from the perspective of
in a given language.
For the measurement tool, six of the major topics specified above were
selected. There are two reasons for this decision: (1) not all of the
14 topics can be included in the test and (2) these six topics are
covered by the present sixth-grade curriculum. In our test, students’
breadth of vocabulary is assessed, since most vocabulary tests
(Nation 1990, Meara 2009, Read 2000) assess this domain. The only
assessment tool that measures the depth of vocabulary was developed by
Paribakht and Wesche (1999). So far, no study has been published on
the assessment of young learners’ depth of EFL vocabulary. Moreover,
it is reckoned that it would be a heavy cognitive load for sixth
graders if synonyms and antonyms of different lexical items were
tested.
3.5 Final Step in Determining the List of Selected Words
When creating the seven tasks for the diagnostic battery, we needed
to consider two factors. There are
- very frequent words that students do not know, simply because those words belong to the lexis used by adults, and
- infrequent words, for example, names of animals and jobs, which students nevertheless know (Lehmann, 2009: 36).
For example, the word lion is rather infrequent as it is outside the 3K
list, but most of the students know it. The reason for this is partly
that the recommended topics calibrated for this age group contain
infrequent words. We decided to establish three word categories
grounded on the BNC list and the number of occurrences of a particular
word in the course books. The necessity of creating categories is
underlined by the fact that major vocabulary tests (Nation 2001,
Laufer & Nation 1995, Paribakht & Wesche 1999) include items selected
on the basis of frequency.
The basis of the three categories is indicated in Table 6. However,
some of the words that would have been in Category 3 were ranked as
Category 2 or even Category 1 if they were expected to be known
by learners based on course-book occurrence:
- 2,000
- 4,000
- more
Table 6: The basis of the categorization of the words
When we selected the lexical items for the tasks, we followed the system
of selecting words from all three frequency categories. In all tasks,
the majority (at least 4 words) belongs to Category 1, and at least
two words represent Category 3; thus, Category 2 is represented by
three or four words.
With this system, it is guaranteed that words from all the possible
layers are assessed. To present an example, the items from Task 2 are
shown with their respective categories in Table 7:
| Word | Category |
| --- | --- |
| A: bakery | 3 |
| B: butcher’s | 3 |
| C: cinema | 2 |
| D: hospital | 1 |
| E: kindergarten | 2 |
| F: market | 1 |
| G: restaurant | 1 |
| H: school | 1 |
| I: station | 2 |
| J: supermarket | 1 |
| K: theatre | 2 |
Table 7: Words and their categories in Task 2
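The selection rule described above (at least four Category 1 words, at least two Category 3 words, and three or four Category 2 words per task) can be checked mechanically. The following Python sketch applies it to the Task 2 words and categories listed in Table 7; the function name `meets_selection_rule` is hypothetical and the check is only an illustration, not part of the original test construction procedure.

```python
from collections import Counter

# Word -> frequency category, copied from Table 7 (Task 2).
task2 = {
    "bakery": 3, "butcher's": 3, "cinema": 2, "hospital": 1,
    "kindergarten": 2, "market": 1, "restaurant": 1, "school": 1,
    "station": 2, "supermarket": 1, "theatre": 2,
}

def meets_selection_rule(categories):
    """Check the stated rule: >= 4 Category 1, >= 2 Category 3, 3 or 4 Category 2 words."""
    counts = Counter(categories.values())
    return counts[1] >= 4 and counts[3] >= 2 and counts[2] in (3, 4)

counts = Counter(task2.values())       # number of words per category
rule_ok = meets_selection_rule(task2)  # True if Task 2 follows the rule
```

For Task 2, the counts are five Category 1 words, four Category 2 words and two Category 3 words, so the rule is satisfied.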
As far as the scoring of the test is concerned, all tasks were scored
on a 1 to 9 point scale. In each task, there were nine scored items.
One example was always given, and at least one distractor that did not
match any of the pictures or definitions was also placed in the
sub-tests, except in Task 6 and Task 7. Table 8 shows the number of
items, the maximum possible points and the number of distractors:
| | Number of items | Maximum possible points | Number of distractors | Items in example |
| --- | --- | --- | --- | --- |
| Task 1 | 11 | 9 | 1 | 1 |
| Task 2 | 11 | 9 | 1 | 1 |
| Task 3 | 24 | 9 | 9 | 6 |
| Task 4 | 11 | 9 | 1 | 1 |
| Task 5 | 11 | 9 | 1 | 1 |
| Task 6 | 10 | 9 | 0 | 1 |
| Task 7 | 10 | 9 | 0 | 1 |
Table 8: The scoring of the sub-tests
Since students had to use
their productive vocabulary in Task 6, there was no need to add a
distractor item. In Task 7, the case was similar to Task 6: there was
no point in selecting any distracting items.
3.6 Procedure
The vocabulary test battery was administered in four schools in seven
classes in November 2013. Language classes in Hungary are usually
divided into two groups, with two teachers working with the groups
simultaneously. However, the test was taken by whole classes in order
to save time. Prior to giving the paper-based test booklet to
learners, the researcher contacted the school management and the
teachers and gave them an account of the goals of the research. The
entire 45-minute class time was used in all classes. Students were
given the tasks one by one so that no confusion could be caused.
Besides seeing the instructions written on the test pages, students
were also told in their native language by the researcher what they
were supposed to do. Since it was a paper-based test and no prior
voice recording had been done, the researcher himself read the words
to the students in Task 1 and Task 2, which required listening
comprehension. Once students had completed all seven tasks, the
test papers were collected and evaluated on the same day. Data were
entered into our SPSS database, and the analysis was done with the
help of a professional statistician from the
Educational Science Institute of SZTE.
4 Results
The reliability of the test battery proved to be acceptable (Cronbach's
alpha = 0.763). In Table 9, reliabilities, means and standard
deviations are given for all the tasks:
| | Cronbach's Alpha | Mean | SD |
| --- | --- | --- | --- |
| Task 1 | 0.360 | 5.786 | 1.569 |
| Task 2 | 0.671 | 5.310 | 2.052 |
| Task 3 | 0.679 | 5.864 | 2.052 |
| Task 4 | 0.728 | 4.767 | 2.052 |
| Task 5 | 0.605 | 4.563 | 1.968 |
| Task 6 | 0.657 | 5.660 | 1.752 |
| Task 7 | 0.311 | 2.631 | 1.427 |
Table 9: Descriptive statistics of seven tasks
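For readers less familiar with the reliability coefficient reported here, the standard formula for Cronbach's alpha can be sketched in a few lines of Python. The analysis in this study was run in SPSS; the function `cronbach_alpha` and the small score matrix below are hypothetical and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five students, four dichotomous items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
```

With these invented scores, the item variances sum to 1.0 and the variance of the total scores is 2.5, which gives an alpha of 0.8.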
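In Table 10, the descriptive statistics for each item are presented.
The means and standard deviations provide information on students’
achievements on each item:
| Item | Number | Mean | Standard Deviation |
| --- | --- | --- | --- |
| camel | 1 | 0.8058 | 0.39750 |
| helicopter | 2 | 0.6311 | 0.48487 |
| monkey | 3 | 0.6214 | 0.48742 |
| lion | 4 | 0.7087 | 0.45657 |
| ship | 5 | 0.7282 | 0.44709 |
| skating | 6 | 0.4078 | 0.49382 |
| swimming | 7 | 0.7573 | 0.43082 |
| train | 8 | 0.9417 | 0.23537 |
| tram | 9 | 0.1845 | 0.38976 |
| to arrive | 10 | 0.1942 | 0.39750 |
| to study | 11 | 0.4369 | 0.49843 |
| bake | 12 | 0.7864 | 0.41185 |
| grocery | 13 | 0.2039 | 0.40485 |
| to sell | 14 | 0.7573 | 0.43082 |
| cinema | 15 | 0.7961 | 0.40485 |
| hospital | 16 | 0.6117 | 0.48976 |
| to play | 17 | 0.7476 | 0.43653 |
| to eat | 18 | 0.7670 | 0.42482 |
| cleaning | 19 | 0.7087 | 0.45657 |
| drinking | 20 | 0.8058 | 0.39750 |
| driving | 21 | 0.7184 | 0.45196 |
| heavy | 22 | 0.6311 | 0.48487 |
| quick | 23 | 0.5922 | 0.49382 |
| tiny | 24 | 0.7670 | 0.42482 |
| boat | 25 | 0.2039 | 0.40485 |
| leg | 26 | 0.8058 | 0.39750 |
| pocket | 27 | 0.6311 | 0.48487 |
| to cook | 28 | 0.5825 | 0.49555 |
| dentist | 29 | 0.8835 | 0.32240 |
| firefighter | 30 | 0.1748 | 0.38162 |
| hairdresser | 31 | 0.5437 | 0.50052 |
| mechanic | 32 | 0.3786 | 0.48742 |
| pilot | 33 | 0.6214 | 0.48742 |
| plumber | 34 | 0.5825 | 0.49555 |
| tailor | 35 | 0.1748 | 0.38162 |
| waiter | 36 | 0.8252 | 0.38162 |
| cook (noun) | 37 | 0.7087 | 0.45657 |
| carpet | 38 | 0.1942 | 0.39750 |
| to wash | 39 | 0.6214 | 0.48742 |
| dining room | 40 | 0.7961 | 0.40485 |
| to talk | 41 | 0.6214 | 0.48742 |
| cupboard | 42 | 0.2039 | 0.40485 |
| shelf | 43 | 0.1748 | 0.38162 |
| bedroom | 44 | 0.6214 | 0.48742 |
| open | 45 | 0.6214 | 0.48742 |
| mushroom | 46 | 0.0000 | 0.00000 |
| cheese | 47 | 0.7961 | 0.40485 |
| hamburger | 48 | 1.0000 | 0.00000 |
| fish | 49 | 0.8252 | 0.38162 |
| chicken | 50 | 0.7961 | 0.40485 |
| sausage | 51 | 0.2039 | 0.40485 |
| ice-cream | 52 | 0.6214 | 0.48742 |
| cake | 53 | 0.7961 | 0.40485 |
| coffee | 54 | 0.6214 | 0.48742 |
| frozen | 55 | 0.3786 | 0.48742 |
| fruit | 56 | 0.7961 | 0.40485 |
| foreign | 57 | 0.0000 | 0.00000 |
| whole | 58 | 0.1748 | 0.38162 |
| lightning | 59 | 0.3786 | 0.48742 |
| through | 60 | 0.2039 | 0.40485 |
| to accuse | 61 | 0.2718 | 0.44709 |
| probably | 62 | 0.1359 | 0.34438 |
| handsome | 63 | 0.2913 | 0.45657 |
Table 10: Descriptive statistics of 63 items in seven tasks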
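Since every item is scored as correct or incorrect, the item mean is the proportion of students who answered correctly, and the standard deviation follows directly from that proportion. The sketch below shows the computation. The count of 83 correct answers out of 103 students is an assumption inferred from the mean reported for the item camel (0.8058); the raw counts are not given in the source, and the helper `item_stats` is hypothetical.

```python
from math import sqrt

def item_stats(n_correct, n_students):
    """Mean and sample standard deviation of a dichotomously scored (0/1) item."""
    p = n_correct / n_students
    sd = sqrt(p * (1 - p) * n_students / (n_students - 1))
    return p, sd

# Assumed count: 83 of the 103 students answering correctly.
mean, sd = item_stats(83, 103)
```

Rounded to four decimals, this yields a mean of 0.8058 and a standard deviation of 0.3975, which agrees with the values listed for camel. An item answered correctly by everyone or by no one has a standard deviation of zero.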
On the basis of our statistics, the least differentiating items can be
identified. The items skating, hospital, heavy, mechanic, sausage,
frozen, lightning and through are the ones that indicate the lowest
correlations with the other items. Examining correlations is one
aspect of our item analysis. In addition, the question of which items
have the highest and the lowest standard deviations also needs to be
examined. In the test, as had been expected, some of the items were
very easy; others were extremely difficult for learners. It is
evident from Table 10 that the item hamburger was the easiest one, and
the items mushroom and foreign were the two most difficult ones, which
no students could produce in Task 6. It should also be noted, however,
that if mushroom had been an item in Task 3, the task with the highest
mean, in which students were expected to match a picture with a word,
some of the students might have been able to recognize this word. As
for the correlations among the seven different tasks, we found strong
and significant correlations. The two listening tasks (Task 1 and
Task 2) showed a robust statistical relationship with each other,
whereas the two reading tasks (Task 4 and Task 5) had a much weaker,
but still significant, correlation. It is essential to examine the
functioning of the depth of vocabulary task (Task 7) because it
showed significantly negative correlations with all the other task
types except Task 4, in which students were expected to match written
words with pictures. In Table 11, the correlations among task types
are shown:
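| | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Task 1 | | | | | | |
| Task 2 | 0.945** | | | | | |
| Task 3 | 0.803** | 0.763** | | | | |
| Task 4 | 0.509** | 0.543** | 0.337** | | | |
| Task 5 | 0.823** | 0.790** | 0.853** | 0.237* | | |
| Task 6 | 0.647** | 0.582** | 0.820** | 0.010 (n.s.) | 0.937** | |
| Task 7 | -0.329** | -0.246* | -0.615** | 0.180 (n.s.) | -0.679** | -0.827** |
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Table 11: Correlations across tasks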
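The coefficients in Table 11 are correlations between students' scores on pairs of tasks. Assuming they are Pearson product-moment correlations (the source does not name the coefficient), the computation can be sketched as follows. The function `pearson_r` and the two score lists are hypothetical and do not reproduce the study's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores (0 to 9 points) of six students on two tasks.
task_a = [2, 4, 5, 6, 8, 9]
task_b = [1, 3, 5, 6, 7, 9]
r = pearson_r(task_a, task_b)
```

A value close to 1 means that students who score high on one task also tend to score high on the other, whereas a negative value, as found for Task 7, means that high scores on one task go together with low scores on the other.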
In order to compare the means of the tasks with one another, paired
samples t-tests were run. There is a significant difference between
most pairs of tasks. The biggest difference is found
between the listening tasks, Task 1 and Task 2 (t=4.854, p < 0.001).
However, there is no significant difference between the means of the
two reading tasks, Task 4 and Task 5 (t=0.202, p > 0.05). As for
the two productive tasks, Task 6 and Task 7, a significant
difference (t=3.029, p < 0.01) could be found.
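The paired samples t-test compares two task scores obtained from the same students by testing whether the mean of the per-student differences departs from zero. A minimal sketch of the t statistic is given below. The function `paired_t` and the five pairs of scores are hypothetical; the study's own values were computed in SPSS.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t, degrees of freedom) for a paired samples t-test on two score lists."""
    diffs = [a - b for a, b in zip(x, y)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical scores of five students on two tasks.
task_x = [6, 5, 7, 4, 6]
task_y = [5, 5, 5, 3, 4]
t, df = paired_t(task_x, task_y)
```

The p-value is then read from the t distribution with `df` degrees of freedom. With 103 students, as in this study, the test has 102 degrees of freedom.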
5 Discussion
In order to answer our research questions, we need to examine the
results carefully. As regards RQ 1, it can be stated that Task 3
proved to be the easiest one even though it had been estimated that
Task 1 and Task 2 would be the easiest ones.
Thus, recognizing words based on pictures is the easiest modality in
this test, and this reinforces Laufer et al.’s (2004) hypothesis
that receptive recognition is the simplest modality. The results make
it clear that receptive word knowledge is easier to acquire than
productive word knowledge. Thus, the results in this integrated test
correspond to a general tendency.
Those items that were known by less than 20% of the participants and
those that have no or a very low standard deviation were considered
inappropriate. For example, the item hamburger, a cognate in
Hungarian, was known by every student, whereas the item mushroom
was totally unknown: nobody could write down the word correctly
next to the corresponding picture in Task 6. These items will be
removed in future research and will not be included in the online
version of the test. The items skating, hospital, heavy, mechanic,
sausage, frozen, lightning and through are
ones that need to be put under further scrutiny due to reasons stated
in connection with the items hamburger and mushroom.
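The criterion for inappropriate items can be expressed as a simple rule. In the sketch below, the 20% threshold comes from the text, whereas the cut-off for a "very low" standard deviation (0.05) is an assumption, since the source does not specify a value. The function `is_inappropriate` is hypothetical, and only a handful of items from Table 10 are included.

```python
def is_inappropriate(mean, sd, min_mean=0.20, min_sd=0.05):
    """Flag an item known by fewer than min_mean of the students or with almost no spread.

    min_sd is an assumed cut-off; the source only speaks of "no or a very low" SD.
    """
    return mean < min_mean or sd < min_sd

# (mean, standard deviation) pairs copied from Table 10 for five items.
items = {
    "hamburger": (1.0000, 0.00000),
    "mushroom": (0.0000, 0.00000),
    "tram": (0.1845, 0.38976),
    "train": (0.9417, 0.23537),
    "lion": (0.7087, 0.45657),
}
flagged = sorted(name for name, (m, s) in items.items() if is_inappropriate(m, s))
```

Among these five items, hamburger and mushroom are flagged because they show no variance, and tram is flagged because fewer than 20% of the students knew it.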
As regards students' knowledge of word classes (RQ 3), nouns proved to be the easiest word class.
The two listening tasks had the strongest correlation. It is also
evident that Task 1 has strong and significant correlations with the
rest of the tasks. However, there is no significant relationship
between Task 4 and Task 6.
This means that the relationship between students' receptive
recognition of words and their productive written lexical knowledge
is called into question (Laufer et al. 2004).
The negative correlations of Task 7 with most of the other tasks make it
doubtful whether Task 7 should be applied in future research. Since
the ultimate goal of the test was to develop a validated online
instrument, it is questionable whether the production of a sentence
containing the given word should be included in the online test,
because productive knowledge is generally difficult to test in an
online environment.
In this study, the results of a pilot study are presented that involved
the application of a new integrated vocabulary test developed for
young learners. The ultimate goal of this research was to finalize an
instrument that will be used online. All the linguistic skills,
except for speaking, were measured with a focus on
English-as-a-foreign-language vocabulary. Test takers need to have
good listening, reading and writing skills to reach a high score in
the test.
Through the item and task analysis described above, valuable data could
be gained with regard to future assessment. The results have provided
sufficient information as to what kind of modifications must be
implemented. It is also a useful finding that most of the task types
correlate with one another, which means that these tasks are not
independent of one another.
