
JLLT edited by Thomas Tinnefeld
Journal of Linguistics and Language Teaching
Volume 3 (2012) Issue 2



Bilingual Testing at the Phrase and Text Levels
And Its Implications for Bilingual Programmes

Kay Cheng Soh (Singapore)

Abstract
As an earlier study (Soh 2011) showed, bilingual testing that controlled substantive content and item format and systematically combined questions and options in two languages yielded much higher correlations between students' test performance in the two languages. This indicates that bilingual ability has been underestimated when it is assessed with two separate language tests. That finding, however, was limited to bilingual testing at the word level. The present study applies the same approach to testing bilingual ability at the phrase and text levels. A similar effect was found for this more complex testing. The findings are discussed with reference to the bilingual approach to language syllabus design, classroom instruction, and assessment.
Key words: Bilingualism, bilingual testing, code-switching, language assessment



1   Introduction

A recurring issue in second- and foreign-language teaching centres on the question of approach: should it be monolingual or bilingual? For example, as late as 1998, “(t)he voters in California are being asked to consider an initiative this June that would ban the use of foreign languages in the instruction of younger children with limited English proficiency” (Greene 1998: 1). This is a typical controversy, with emotional overtones, in multilingual communities where language and learning are concerned. As the question is framed, it seems that a vote one way or the other will resolve the issue.

In a democracy, fair voting will no doubt settle a sensitive controversy on which different interest groups hold opposing views. But this does not solve the problem, which is pedagogical rather than political; and it stands to reason that a pedagogical controversy is best resolved by looking objectively at empirical evidence, which may support either side of the debate or may show that no ready solution exists at the time of decision because not enough is known about the effects of either choice. In the California case, Greene (1998) conducted a meta-analysis of the effectiveness of bilingual programmes compared with monolingual ones, with the explicit purpose of helping to resolve the disagreement between the two interest groups.

Greene’s (1998) meta-analysis is based on 11 studies, selected by stringent inclusion criteria from a pool of 75 studies located. The 11 studies spanned the 20 years from 1972 to 1991 and involved 2179 students, of whom 1562 were in bilingual programmes. The outcomes measured were English Language, Reading, and Mathematics. The effect sizes, corrected for sample size, range from -0.03 to 0.79 for English Language and from -0.33 to 0.74 for Reading in English. The average effect sizes in terms of Hedges’ g are 0.21 for Reading in English and 0.12 for Mathematics (tested in English). Without a meta-analysis, both camps could cite individual studies in isolation to support their arguments while remaining silent on those that do not support their preferences. A meta-analysis, however, shows clearly what the case is on average. In the California case, there is evidence, albeit modest, in favour of the bilingual approach. The report drew a favourable comment from the renowned bilingual educationist Krashen: “Greene's Meta-Analysis is a short report that should have a profound impact on the field” (Krashen 1998: 1).
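To make the notion of a sample-size-corrected effect size concrete, the sketch below computes Hedges’ g for one study and a fixed-effect (inverse-variance weighted) mean across studies. It follows one common textbook formulation: a pooled standard deviation, the approximate small-sample correction J = 1 - 3/(4(n1 + n2) - 9), and a large-sample approximation to the variance of g. The function names and the two studies in the usage example are hypothetical illustrations, not Greene’s (1998) data or his exact procedure.

```python
import math


def hedges_g(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (group 1 minus group 2) with Hedges' small-sample correction."""
    # Pooled within-group standard deviation
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    d = (m1 - m2) / math.sqrt(pooled_var)
    # Approximate correction factor J; it shrinks d slightly, most for small samples
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d


def g_variance(g: float, n1: int, n2: int) -> float:
    """Large-sample approximation to the sampling variance of g."""
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))


def weighted_mean_effect(studies):
    """Fixed-effect mean of g, weighting each study by the inverse of its variance.

    Each study is a tuple (m_bilingual, sd_bilingual, n_bilingual, m_other, sd_other, n_other).
    """
    num = den = 0.0
    for m1, sd1, n1, m2, sd2, n2 in studies:
        g = hedges_g(m1, sd1, n1, m2, sd2, n2)
        w = 1 / g_variance(g, n1, n2)  # larger studies have smaller variance, hence more weight
        num += w * g
        den += w
    return num / den


# Two hypothetical studies: bilingual-programme group first, comparison group second
studies = [(52.0, 10.0, 40, 50.0, 10.0, 40), (55.0, 12.0, 120, 51.0, 12.0, 110)]
for s in studies:
    print(f"g = {hedges_g(*s):.3f}")
print(f"weighted mean g = {weighted_mean_effect(studies):.3f}")
```

Because the weights grow with sample size, the pooled mean leans towards the larger study, which is what keeps a few small studies with extreme results from dominating the average.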

Earlier, Willig (1985) had conducted a similar meta-analysis and found that, with statistical controls for methodological inadequacies, participation in bilingual education programmes consistently produced small to moderate differences favouring bilingual programmes on tests of reading, language skills, mathematics, and total achievement when the tests were in English, and on reading, language, mathematics, writing, social studies, listening comprehension, and attitudes toward school or self when the tests were in the students' other (native) languages.

Later, Rolstad, Mahoney & Glass (2005) meta-analysed 17 studies and obtained an effect size of 0.26 for English Language and an even larger effect size of 0.86 for educational outcomes assessed in the students’ native language. The authors concluded that “(i)t is shown that bilingual education is consistently superior to all-English approaches” (Rolstad, Mahoney & Glass 2005: 1).

Willig’s (1985) and Greene’s (1998) meta-analyses are cited here to illustrate how meta-analysis can contribute to the search for a more effective approach to bilingual education. In 2004-2006, several further meta-analyses of bilingual education programmes appeared, including those by Grissom (2004), Krashen & McField (2005), and August & Shanahan (2006). For a fuller list, see Norm Gold Associates (2007).

Meanwhile, bilingual education is gaining attention worldwide, as more Westerners are learning Asian languages (especially Chinese) and more Asians (especially PRC Chinese) are learning English. Notwithstanding the meta-analyses above, the question of which approach to adopt for bilingual programmes remains unsettled, because it touches on a number of conceptual and practical issues, including the facilitating and interfering effects of one language on another in a bilingual context. The study of the relative strengths of these effects (transfer or interference) has a long history in bilingualism research, with periods of clear interest and periods of neglect; the present is a period of renewed interest (Grosjean 2012).

When students learn two languages concurrently and make errors in one, teachers tend to blame the other language. This is natural and, in some cases though not all, justified. The effect of one language on another takes three forms: facilitation, delay, and interference (Hsin 2011), and the possibility of facilitating structural transfer plays an important role in the course of children’s syntactic development. Code-switching, for instance, may be viewed as a form of facilitation for bilinguals, an extension of one language into another, but from other perspectives it may be viewed as interference. Whether code-switching is facilitation or interference depends on the situation and context in which it occurs.

Perhaps because interference is easier to detect than facilitation, much more research has been done on inter-language interference (mainly through error analysis and contrastive analysis) than on facilitation (methodologically, through correlation analysis). In a recent study, Nayernia (2011) found that inter-language errors accounted for only 16.7% of the total errors in the writing of Iranian EFL students, while intra-language errors accounted for 83.3%. Moreover, in the studies reviewed by Nayernia, which spanned the 12 years from 1971 to 1983, inter-language errors ranged from as little as 3% to 51%, with a median of 31%. Together, these findings suggest that teachers of a second or foreign language may benefit more from focusing on inter-language facilitation than on interference.

Where bilingual education is concerned, in programme evaluation as in the assessment of learning, the ability to function in two languages is usually assessed with separate monolingual tests, for instance one for English and one for Chinese. As pointed out in an earlier paper (Soh 2011), this approach could well underestimate the relationship between performance levels in the two languages, and hence the beneficial effect of one language on the other, because the two tests tend to be based on different substantive content and to use different formats for the language tasks. It was argued and shown there that when these two critical measurement factors are controlled (equated), the L1-L2 correlation is greater than what has hitherto been found. In other words, to assess students’ bilingual ability truly, a bilingual approach is needed.

It may be speculated that several conditions have contributed to this measurement approach, which may be less efficient than it could be:
·   In the school curriculum, first and second languages are usually treated as two unrelated subjects.
·   Therefore, the two languages have their own syllabuses and textbooks.
·   The two languages are taught by different teachers, who are more likely to be monolingual; even bilingual teachers are normally not asked to teach both languages.
·   Because of these three conditions, the two languages are usually assessed separately, using different tests set by different teachers.

All in all, there is an absence of cross-linguistic coordination. Even in bilingual education programmes, a monolingual approach is used to teach two languages with no interaction between them; everything is monolingual in a bilingual learning environment.

As noted above, bilingual ability is usually tested with two monolingual tests. These tests are based on different content, assess different linguistic knowledge, and even use different items and formats, as they are normally designed by the teachers of the two languages. Such testing is therefore more accurately described as consecutive assessment of two languages in two discrete contexts: the task is basically monolingual and tests two languages one at a time. An alternative approach is to design tests which require the simultaneous use of two languages in the same testing process. When taking such a bilingual test, the student needs to use knowledge in one language to answer questions posed in another. In terms of Paivio & Desrochers’ (1980) model of bilingual dual coding (Soh 2010: 274; 291), knowledge stored in two verbal systems is invoked through code-switching to perform one task, with the bilingual test items functioning as an L1-L2 connector (Fig. 1).



In an earlier study, Soh (2011) produced evidence, albeit tentative, to support the bilingual testing of bilingual ability at the word or vocabulary level. He crossed the item stems and options in English and Chinese, thus producing two monolingual and two bilingual word tests (Fig. 2). With a sample of about 200 primary school students, sizeable correlations were obtained, indicating that knowledge in one language could be activated in another through bilingual tests involving code-switching. The results are summarized in Table 1.
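The crossing of stem language with option language amounts to a 2 × 2 design. The sketch below stores one item with its stem and options in both languages and generates the four versions by combining stem language with option language; two versions are monolingual and two require code-switching. The item content, the dictionary layout and the function name are hypothetical illustrations, not items or code from the original tests.

```python
from itertools import product

# Hypothetical vocabulary item (not from the original tests): the same content in
# English ("en") and Chinese ("zh"), with the key at the same position in both languages.
ITEM = {
    "stem": {"en": "Which animal says meow?", "zh": "哪种动物会喵喵叫？"},
    "options": {"en": ["dog", "cat", "bird", "fish"], "zh": ["狗", "猫", "鸟", "鱼"]},
    "key": 1,  # zero-based index of the correct option
}

LABELS = {"en": "English", "zh": "Chinese"}


def build_forms(item):
    """Cross stem language with option language to obtain four versions of one item."""
    forms = {}
    for stem_lang, option_lang in product(("en", "zh"), repeat=2):
        name = f"{LABELS[stem_lang]}-{LABELS[option_lang]}"
        forms[name] = {
            "stem": item["stem"][stem_lang],
            "options": item["options"][option_lang],
            "key": item["key"],
            # Monolingual when stem and options share a language, code-switching otherwise
            "bilingual": stem_lang != option_lang,
        }
    return forms


for name, form in build_forms(ITEM).items():
    kind = "code-switch" if form["bilingual"] else "monolingual"
    print(f"{name:16} {kind:12} {form['stem']} {form['options']}")
```

Keeping the content and the key position identical across the four versions is what holds substantive content and item format constant, so that only the language combination varies.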





As shown in Table 1, the correlation between scores for the two monolingual tests is r = 0.90, indicating 81% of shared variance. This is greater than the correlations of around r = 0.70 (about 49% of shared variance) typically found between separate monolingual tests with no language interaction during testing. The enhanced correlation was attributed to the control (equating) of the substantive content and item format of the tests, which minimized the error variance due to differences in these two aspects of measurement.

Measures correlated | r | Shared variance (%)
Monolingual English-English Test with Chinese-Chinese Test | .90 | 81
English-Chinese Code-switch Test with English-English Test | .83 | 68
Chinese-English Code-switch Test with English-English Test | .80 | 64
English-Chinese Code-switch Test with English-English and Chinese-Chinese Tests | .86 (R) | 74
Chinese-English Code-switch Test with English-English and Chinese-Chinese Tests | .83 (R) | 69
Tab. 1: Correlations among Word Tests (R = multiple correlation)

Next, when scores for the bilingual code-switch tests were correlated with scores for the monolingual English test, the coefficients were 0.83 and 0.80, indicating 68% and 64% of shared variance, slightly higher for English-Chinese than for Chinese-English code-switching. Moreover, when scores for the other monolingual test (Chinese) were added to the prediction, the multiple correlations rose to 0.86 and 0.83, indicating 74% and 69% of shared variance. Two points are worthy of note here:

·   The additional predictor (the monolingual Chinese test) accounted for only an additional 6% and 5% of the variance in the two code-switching tests.
·   As reported above, switching from the first language (English), with which the students were more familiar, was slightly more efficient than switching from their second language (Chinese).
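The shared-variance figures are squared correlation coefficients, and the contribution of the added Chinese predictor is the increase from r² to R². The sketch below recomputes both from the coefficients in Table 1; the names are illustrative. Because the coefficients are reported to two decimals, recomputed percentages can differ from the table by about one point (0.83 squared is 68.9%, whereas Table 1 lists 68% for the English-Chinese test), so the recomputed increments come out at about 5 percentage points for both code-switch tests rather than 6 and 5.

```python
# Coefficients from Table 1: r with the English-English test alone, and the multiple
# correlation R with the English-English and Chinese-Chinese tests as joint predictors.
TABLE_1 = {
    "English-Chinese code-switch": {"r_english": 0.83, "R_both": 0.86},
    "Chinese-English code-switch": {"r_english": 0.80, "R_both": 0.83},
}


def shared_variance(r: float) -> float:
    """Percentage of variance shared by two measures (100 times r squared)."""
    return 100 * r ** 2


def increment(r_english: float, R_both: float) -> float:
    """Extra variance, in percentage points, explained by adding the Chinese test."""
    return shared_variance(R_both) - shared_variance(r_english)


for test, c in TABLE_1.items():
    print(f"{test}: {shared_variance(c['r_english']):.1f}% -> "
          f"{shared_variance(c['R_both']):.1f}% (+{increment(**c):.1f} points)")
```

Either way, the increment is small relative to the variance already shared with the English test, which is the substance of the first point above.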

Against the background of these word-level findings, a logical extension is to ask whether the same phenomenon is found at higher levels. While learning single words, or vocabulary building, is fundamental to language learning in both monolingual and bilingual contexts, students have to go beyond this level to cope with normal bilingual communicative situations. Operationally, therefore, the question is whether, and to what extent, bilingual students are able to perform code-switching tasks at the phrase and text levels.


2   Method

2.1 Participants

Participants were 212 Primary Three, Four, and Five students from two schools whose results in English and Chinese assessments had been above the national average for the three consecutive years prior to data collection. Within each class, students performed at equivalent levels in the two languages. As schools do not normally change their performance levels drastically, it was assumed that the participating students would have very similar language profiles. There was a slight preponderance of girls in the sample (45% boys and 55% girls). The students attended schools in which English was taught at the first-language level and Chinese at the second-language level. It should be pointed out, however, that the labels first language and second language, as used in Singapore schools, are administrative and do not necessarily reflect the students’ linguistic background. Indeed, at the time of data collection, these students tended to come from families where English was the more common home language (i.e. the real first language in a linguistic sense).


2.2  Measures

Students took four tests. The first two were monolingual tests in English and Chinese, which tested students’ English and Chinese abilities separately at the word or vocabulary level. The substantive content and the item format (four-option multiple-choice items) of the two tests were the same, although the questions were presented in the two languages separately. The advantage of this cross-language uniformity in the assessment of bilingual ability has been discussed elsewhere (Soh 2011).

The other two were bilingual tests at the phrase and text levels. They are described in more detail below, with sample items.


2.3 Tests

2.3.1 Bilingual Phrase Test
There were 20 multiple-choice items in this test. Each item consisted of a stem in one language and four options in the other. Thus, when completing an item, students needed to use both languages, code-switching from English to Chinese or the other way round. Ten of the items had stems in English and options in Chinese; the other ten had stems in Chinese and options in English. Sample items are shown in Fig. 3.

In the first sample item, the four Chinese options mean (1) It’s a fair day, (2) It’s a cold day, (3) It’s a rainy day, and (4) It’s a cool day. One of these is to be matched with the stem Mei Leng carries a red umbrella. In the second sample item, the Chinese stem Ali pasted a stamp is to be matched with one of the four English options. Each correct match earned one point, so the highest possible score on the bilingual phrase test was 20.


1.       晴天,
2.       天气冷,
3.       雨天,
4.       天气凉,




Mei Leng carries a red umbrella.


阿里贴邮票

1.       under the table.
2.       on the envelope.
3.       on the wall.
4.       in the basket.

Fig. 3: Sample items of the Phrase Test


2.3.2 Bilingual Text Test

This test took the form of a conventional reading comprehension test: a passage followed by four-option multiple-choice items. However, the passage was in one language and the questions and options were in the other, thus requiring code-switching in the process of answering. There was one passage in English followed by five questions in Chinese, and one passage in Chinese followed by five questions in English. Altogether, there were 10 items, giving a maximum possible score of 10.

Sample items for both passages are shown below (Fig. 4). For the English passage, the first question (in Chinese) was Who sent the letter to Mei Leng?, with the options (1) Her sister, (2) Her friend, (3) Her teacher, and (4) Her brother. The second question was What was inside the envelope?, with the options (1) A photo, (2) Many used stamps, (3) Many new stamps, and (4) A stamp. The second passage, in Chinese, was about Mei Leng’s friend Kong Wah, who disliked stamps but liked sports, especially swimming; its questions were in English.


Mei Leng took the envelope from the postman and ran into the house. The letter was from her friend in England. Mei Leng knew that she was going to see a lot of used stamps. She opened the envelope…
     

谁寄信给美玲?
1.       美玲的姐姐。
2.       美玲的朋友。
3.       美玲的老师。
4.       美玲的哥哥。


信封里有什么?
1.       有一张照片。
2.       有许多用过的邮票。
3.       有许多新的邮票。
4.       有一张邮票。

      当美玲在看邮票时,光华来找她。美玲就开门,让光华进去。光华和美玲是好朋友,可是光华不喜欢邮票。他喜欢到海边去游泳。他时常运动,所以身体很健康。。。


When did Kong Wah visit Mei Leng?
1.       When Mei Leng ran out of the house.
2.       When Mei Leng was looking at the stamps.
3.       When Mei Leng was keeping the stamps.
4.       When Mei Leng ran into the house.



What did Kong Wah like to do?
1.       Kong Wah liked to go to the market.
2.       Kong Wah liked to go fishing.
3.       Kong Wah liked to collect stamps.
4.       Kong Wah liked to swim.
Fig. 4: Sample Items of the Text Test


3   Results

3.1 Bilingual Phrase Test     

Table 2 shows the performance levels, mean comparisons, and correlations between scores for the bilingual phrase test and the two monolingual word tests:

Class | N | Mean (% correct) | SD | Cohen’s d | t | r(p.e) | r(p.c)
Primary 5 | 59 | 17.45 (87%) | 1.24 | 1.10 | 6.41 | 0.63 | 0.65
Primary 4 | 80 | 16.13 (81%) | 1.17 | 2.44 | 12.27 | 0.58 | 0.53
Primary 3 | 73 | 12.01 (60%) | 2.08 | - | - | 0.73 | 0.71
All | 212 | 15.08 (75%) | 1.56 | - | - | 0.65 | 0.63
Tab. 2: Performance and Correlations for Phrase Test[1]

As Table 2 shows, students at the three class levels scored 60% to 87% correct on the test, with an average of 75% for the sample as a whole. A one-way ANOVA shows statistically significant differences among the three class levels, and subsequent pairwise comparisons between adjacent class levels by independent t-tests are also statistically significant. Statistical power is 0.90 and 0.92 for the two adjacent-level comparisons, above the conventional 0.80. Effect sizes in terms of Cohen’s d are very large. All these indicate that the bilingual phrase test was valid in that its scores differentiated among the three class levels as expected.
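The adjacent-class comparisons in Table 2 can be recomputed from the summary statistics alone. The sketch below uses the pooled-variance (Student) independent-samples t-test, Cohen’s d with the pooled standard deviation, and a normal approximation to the power for detecting d = 0.50 at alpha = .05, the settings named in the note to Table 2. The function names are illustrative, and whether the reported power is one- or two-sided is an assumption, since the note does not say. For Primary 5 versus Primary 4 the sketch gives d ≈ 1.10 and t ≈ 6.41 with 137 degrees of freedom, matching Table 2; the one-sided power is about 0.90, while the two-sided power is about 0.83.

```python
import math
from statistics import NormalDist


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled within-group standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def compare_groups(m1, sd1, n1, m2, sd2, n2):
    """Return Cohen's d, Student's t and the degrees of freedom from summary statistics."""
    sp = pooled_sd(sd1, n1, sd2, n2)
    d = (m1 - m2) / sp
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return d, t, n1 + n2 - 2


def approx_power(d, n1, n2, alpha=0.05, two_sided=True):
    """Normal approximation to the power of a two-sample test to detect effect size d."""
    z = NormalDist()
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))  # expected value of the test statistic
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        return z.cdf(delta - crit) + z.cdf(-delta - crit)
    crit = z.inv_cdf(1 - alpha)
    return z.cdf(delta - crit)


# Primary 5 vs Primary 4 on the bilingual phrase test (Table 2)
d, t, df = compare_groups(17.45, 1.24, 59, 16.13, 1.17, 80)
print(f"d = {d:.2f}, t({df}) = {t:.2f}")
print(f"power for d = 0.50: one-sided {approx_power(0.5, 59, 80, two_sided=False):.2f}, "
      f"two-sided {approx_power(0.5, 59, 80):.2f}")
```

The normal approximation slightly overstates power relative to the exact noncentral t distribution, but with about 140 students per comparison the difference is small.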

Table 2 also shows that the correlations between the bilingual phrase test and the monolingual English word test range from 0.58 to 0.73 across class levels, and are 0.65 for the sample as a whole. Assuming a causal direction from the monolingual English word test, these indicate that knowledge at the word level contributed 34% to 53% of the variance in the bilingual test at the phrase level, and 42% for the whole sample.

Similarly, as also shown in Table 2, the correlations between the bilingual phrase test and the monolingual Chinese word test range from 0.53 to 0.71 across class levels, and are 0.63 for the sample as a whole. Assuming a causal direction from the monolingual Chinese word test, these indicate that knowledge at the word level contributed 28% to 50% of the variance in the bilingual phrase test, and 40% for the whole sample.

In sum, students’ word knowledge in the two languages contributed substantially to their bilingual ability at the phrase level, slightly more by the English word knowledge.


3.2  Bilingual Text Test 

Table 3 shows the performance levels, mean comparisons, and correlations between scores for the bilingual text test and the two monolingual word tests:

Class | N | Mean (% correct) | SD | Cohen’s d | t | r(t.e) | r(t.c)
Primary 5 | 59 | 8.58 (86%) | 1.47 | 0.43 | 2.21 | 0.39 | 0.35
Primary 4 | 80 | 7.95 (80%) | 1.79 | 0.98 | 5.76 | 0.54 | 0.45
Primary 3 | 73 | 5.97 (60%) | 2.44 | - | - | 0.79 | 0.65
All | 212 | 7.44 (74%) | 1.96 | - | - | 0.61 | 0.51
Tab. 3: Performance and Correlations for Text Test[2]

As Table 3 shows, students at the three class levels scored 60% to 86% correct on the test, with an average of 74% for the sample as a whole. A one-way ANOVA shows statistically significant differences among the three class levels, and subsequent pairwise comparisons between adjacent class levels by independent t-tests are also statistically significant. Statistical power is 0.90 and 0.92 for the two adjacent-level comparisons, above the conventional 0.80. However, the effect size is a modest 0.43 for the comparison between Primary 5 and Primary 4, but a large 0.98 for the comparison between Primary 4 and Primary 3. All these indicate that the bilingual text test was valid in that its scores differentiated among the three class levels as would be expected.
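The one-way ANOVA reported in the note to Table 3 can be reproduced from the class sizes, means and standard deviations alone. The sketch below computes the between-class and within-class sums of squares from these summary statistics; the function and variable names are illustrative. With the Table 3 values it gives F(2, 209) ≈ 33.05, in line with the reported F = 33.045.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic and degrees of freedom from group sizes, means and SDs."""
    n_total = sum(ns)
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between classes: spread of the class means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within classes: spread of students around their own class mean
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Primary 5, 4 and 3 on the bilingual text test (Table 3)
f, df1, df2 = anova_from_summary([59, 80, 73], [8.58, 7.95, 5.97], [1.47, 1.79, 2.44])
print(f"F({df1}, {df2}) = {f:.2f}")
```

Working from summary statistics is convenient for checking published tables, though it cannot test the ANOVA assumptions (such as equal variances) that the raw scores would allow.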

Table 3 also shows that the correlations between the bilingual text test and the monolingual English word test range from 0.39 to 0.79 across class levels, and are 0.61 for the sample as a whole. Assuming a causal direction from the monolingual English word test, these indicate that knowledge at the word level contributed 15% to 62% of the variance in the bilingual test at the text level, and 37% for the whole sample.

Similarly, as also shown in Table 3, the correlations between the bilingual text test and the monolingual Chinese word test range from 0.35 to 0.65 across class levels, and are 0.51 for the sample as a whole. Assuming a causal direction from the monolingual Chinese word test, these indicate that knowledge at the word level contributed 12% to 42% of the variance in the bilingual text test, and 26% for the whole sample.

In sum, students’ word knowledge in the two languages contributed perceptibly to their bilingual ability at the text level, English word knowledge more so. Moreover, monolingual word knowledge accounted for more of the variance in bilingual ability at the phrase level (around 40%) than at the text level (26% to 37%), as would be expected in view of the greater complexity of text reading.
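The variance percentages in Sections 3.1 and 3.2 are squared correlations. The sketch below recomputes them from the whole-sample coefficients in Tables 2 and 3 and averages the English and Chinese figures at each level; averaging the two is one simple way to summarise the comparison and is an assumption of this sketch, not a procedure stated in the study. It gives means of about 41% at the phrase level and about 32% at the text level.

```python
# Whole-sample correlations with the monolingual English (e) and Chinese (c) word tests
WHOLE_SAMPLE_R = {
    "phrase": {"e": 0.65, "c": 0.63},  # Table 2
    "text": {"e": 0.61, "c": 0.51},    # Table 3
}


def pct_shared(r: float) -> float:
    """Percentage of variance shared by two measures, i.e. 100 * r squared."""
    return 100 * r * r


for level, rs in WHOLE_SAMPLE_R.items():
    parts = {lang: pct_shared(r) for lang, r in rs.items()}
    mean_pct = sum(parts.values()) / len(parts)
    print(f"{level:6}: English {parts['e']:.0f}%, Chinese {parts['c']:.0f}%, mean {mean_pct:.0f}%")
```

Squaring makes differences between coefficients look larger than they do on the r scale: 0.65 and 0.51 differ by only 0.14, but the shared variance drops from 42% to 26%.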


4   Discussion and Conclusion

In the context of bilingual programmes, there is a continuous search for a more appropriate and effective assessment of bilingual ability that reflects the effectiveness of instruction. This study therefore set out to verify whether bilingual testing, shown to be effective at the word level (Soh 2011: 263), would also work at the higher levels of phrase and text. The finding is positive in that students presented with phrase and text materials in one language were able to deal with them by using another language, which supports the Paivio and Desrochers model of bilingual dual coding (Paivio & Desrochers 1980; Soh 2010). This finding has important implications for bilingual programmes.

Since the literature suggests that learning two languages concurrently involves more facilitation than interference, and the present findings are consistent with this, the positive effect may be maximized by coordinating the programmes of the two languages. This can be achieved by aligning the language concepts and skills of the two languages so that they are taught in parallel, allowing for inter-language references. In short, the two language syllabuses would cover the same ground while leaving some room for the differences that language peculiarities demand. Moreover, the substantive content (e.g. stories) can be very much the same, so that what has been learned in one language is available when learning the other, without having to cover the same ground again; in this case, students learning the other language need only learn the symbols (language) and not the message (content), with some room for language variation.

In terms of classroom instruction, bilingual teaching in which cross-language references are freely made can facilitate the learning of another language, as students do not have to grapple with new content at the same time. This form of bilingual teaching and learning puts both teacher and students on psychologically safer ground, because past knowledge in the other language can be activated to solve a current learning problem.

As for assessment, this and the earlier studies (Soh 2010, 2011) have shown that bilingual testing yields a more faithful representation of students’ bilingual ability to move freely between languages, and this is the hallmark of being truly bilingual.


References

August, D. & Shanahan, T., eds. (2006). Developing Literacy in Second-Language Learners: Report of the National Literacy Panel on Language Minority Children and Youth. Mahwah, NJ: Lawrence Erlbaum Associates.

Greene, J. P. (1998). A Meta-Analysis of the Effectiveness of Bilingual Education. University of Texas at Austin. Accessed on 15 July, 2001 from http://www.hks.harvard.edu/pepg/PDF/Papers/biling.pdf

Grissom, J. B. (July 2004). Reclassification of English learners. Educational Policy Analysis Archives, 12 (36). Retrieved 02-21-06 from http://epaa.asu.edu/epaa/v12n36/

Grosjean, F. (2012). An attempt to isolate, and then differentiate, transfer and interference. International Journal of Bilingualism, 16 (1), 11-21.

Hsin, L. (April 2011). Accelerated Acquisition in English-Spanish Bilinguals: The Structural Transfer Hypothesis. Accessed on July 20, 2011 from http://www.cog.jhu.edu/grad-students/hsin/Lisa_Hsin_-_CV_files/hsinWCCFL29.pdf

Krashen, S. D. (March 1998). A Note on Greene's "A Meta-Analysis of the Effectiveness of Bilingual Education." Accessed on 15 July, 2001 from http://www.languagepolicy.net/archives/Krashen2.htm#N_1_

Krashen, S. & McField, G. (Nov/Dec 2005). What works? Reviewing the latest evidence on bilingual education. Language Learner, 1(2).  

Nayernia, A. (Summer 2011). Writing errors, what they can tell a teacher? Modern Journal of Applied Linguistics, 3 (2), 200-208. Accessed on July 21, 2011 from http://www.mjal.org/Journal/14.Writing%20Errors,%20what%20they%20can%20tell%20a%20teacher.pdf

Norm Gold Associates (September 2007). Selected Reports on the Effectiveness of Bilingual Education. Accessed on July 15, 2011 from http://www.bilingualeducation.org/pdfs/RefsEffectivenessofBiEd.pdf

Paivio, A. (1971). Imagery and Verbal Processes. New York: Holt, Rinehart & Winston. (Reprinted by Lawrence Erlbaum Associates, Hillsdale, NJ, 1979).

Paivio, A. & Desrochers, A. (1980). A dual-coding approach to bilingual memory. Canadian Journal of Psychology/Revue canadienne de psychologie, 34 (4), 388-399.

Rolstad, K., Mahoney, K. S., & Glass, G. V. (Spring 2005). Weighing the Evidence: A Meta-Analysis of Bilingual Education in Arizona. Bilingual Research Journal, 29 (1), 43-67.

Skiba, R. (October 1997).  Code switching as a countenance of language interference. The Internet TESL Journal, 3 (10). Accessed on July 20, 2011 from http://iteslj.org/Articles/Skiba-CodeSwitching.html

Soh, K. C. (2010). Bilingual dual-coding and code-switching. Journal of Linguistics and Language Teaching, 1 / 2, 271-296.

Soh, K. C. (2011). Testing Students' Bilingual Ability in a Bilingual Manner. Journal of Linguistics and Language Teaching, 2 / 2, 253-266.

Willig, A. C. (Fall 1985). A Meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55 (3), 269-317.



Author:
Dr. Kay Cheng Soh
50 Lorong 40 Geylang #07-29
The Sunny spring
Singapore 398074
E-mail: sohkc@singnet.com.sg



[1] (1) r(p.e) = correlation between the bilingual Phrase Test and the monolingual English Word Test; r(p.c) = correlation between the bilingual Phrase Test and the monolingual Chinese Word Test. One-way ANOVA: F = 214.544, df = 2, 209, p = .001. (2) Cohen’s d values and t-values compare the means of adjacent class levels: those in the Primary 5 row compare Primary 5 with Primary 4, and those in the Primary 4 row compare Primary 4 with Primary 3. (3) For alpha = .05 and d = 0.50, power is 0.90 for the P5-P4 comparison and 0.92 for the P4-P3 comparison. (4) All t-values and correlation coefficients are statistically significant (p < .05, two-tailed).

[2] (1) r(t.e) = correlation between the bilingual Text Test and the monolingual English Word Test; r(t.c) = correlation between the bilingual Text Test and the monolingual Chinese Word Test. One-way ANOVA: F = 33.045, df = 2, 209, p = .001. (2) Cohen’s d values and t-values compare the means of adjacent class levels: those in the Primary 5 row compare Primary 5 with Primary 4, and those in the Primary 4 row compare Primary 4 with Primary 3. (3) For alpha = .05 and d = 0.50, power is 0.90 for the P5-P4 comparison and 0.92 for the P4-P3 comparison. (4) All t-values and correlation coefficients are statistically significant (p < .05, two-tailed).