JLLT edited by Thomas Tinnefeld

Journal of Linguistics and Language Teaching

Volume 12 (2021) Issue 2, pp. 173-192


“I can see a lady with a curly brown hair.” - 

A Corpus-Based Investigation of Article Use in the Language of Young Norwegian EFL Learners[1]


Sofie Larsen (Måløy, Norway) & Kristian A. Rusten (Bergen, Norway)


Abstract 

This paper investigates the use of articles in texts written by young EFL learners in Norway. The accurate  use of articles has been highlighted as a problem area for Norwegian learners by e.g. Bækken (2006), yet scant quantitative research exists on the performance of young Norwegian learners of English. By means of the recently compiled Corpus of Young Learner Language (CORYL), we investigate the frequency of overuse, underuse and ungrammatical use of the definite and indefinite articles among Norwegian EFL learners aged 12-13 and 15-16 years. We also quantitatively contrast learners’ non-target-like uses of the articles with their (overt) target-like uses, and we demonstrate how the  learners’ use of articles develops from Year 7 to Year 10 of primary and lower secondary education. On the basis of our findings, pedagogical implications are discussed. We will show that Norwegian learners have achieved a very high level of accuracy as early as in Year 7 of primary education, but that little discernible development occurs between Years 7 and 10. Our quantitative data also allow us to question Bækken’s assertion that Norwegian EFL learners have particular problems with the overuse of the definite  article and the omission of the indefinite article. 

Key words: Young EFL learners, L2 writing, definite article, indefinite article, acquisition of articles 


Abstract (Norsk)

Artikkelen undersøker bruken av artikler i tekster skrevet av elever i Norge som har engelsk som fremmedspråk. Korrekt bruk av artikler på engelsk har blitt løftet frem som et problemområde for norske elever av blant annet Bækken (2006), men likevel finnes det svært lite kvantitativ forskning på hvordan norske elever presterer i engelsk. Ved å bruke det nylig sammensatte korpuset Corpus of Young Learner Language (CORYL), har vi undersøkt frekvensen av overbruk, underbruk og ugrammatisk bruk av bestemt og ubestemt artikkel blant norske elever med engelsk som fremmedspråk i alderen 12–13 og 15–16 år. Vi kontrasterer også elevenes ukorrekte bruk av artikler med deres korrekte bruk av (uttrykte) artikler gjennom en kvantitativ undersøkelse, og vi viser hvordan elevenes bruk av artikler utvikler seg mellom år 7 og år 10 i grunnskolen. Basert på funnene våre diskuterer vi pedagogiske implikasjoner. Vi vil vise at norske elever oppnår et høyt nivå i bruken av artiklene så tidlig som i 7. klasse i grunnskolen. Likevel er det lite merkbar utvikling i korrekt bruk av artikler fra 7. klasse til 10. klasse. Våre kvantitative data fører til at vi tillater oss å stille spørsmål ved Bækkens påstand om at norske elever med engelsk som fremmedspråk har spesielle problemer med overbruk av bestemt artikkel og utelatelse av ubestemt artikkel. 

Søkeord: Norske elever med engelsk som fremmedspråk, artikler, bestemt artikkel, ubestemt artikkel



1   Introduction 

The present paper sets out to investigate the use and non-use of the definite and the indefinite article in the writing of young Norwegian learners of English as a Foreign  Language (EFL) through a corpus-based study. The article draws its empirical material from the recently compiled Corpus of Young Learner Language (CORYL), which is an error-tagged corpus containing texts written by Norwegian EFL learners aged 12-13, 15-16 and 16-17 years at the time of writing.[2]

Many significant strides have been made in corpus-based examinations of learner language, and there is a wealth of studies which investigate English as a target  language. However, one overarching characteristic of the learner corpus research tradition is arguably that its focus tends to rest predominantly on the linguistic performance of intermediate and advanced learners, and particularly on that of tertiary  level undergraduate students.[3] This potentially represents a lacuna in the research tradition, as useful corpus-based insights into learner language and language learning  may be derived from studying the output of younger learners, who operate on much lower proficiency levels than University students. In the Norwegian context, for  example, English is obligatorily taught as a school subject from Year 1, when learners  are 5-6 years of age, and learners are expected to write short texts at a reasonably early stage of education.[4] This early stage of EFL development has not received much attention by previous corpus researchers, but may be accessed via the CORYL  corpus. This is the area to which this paper mainly seeks to make a contribution, by  means of investigating the CORYL learners’ linguistic performance relating to articles. 

Over the course of several decades, the acquisition of the articles has become one of the most well-researched grammatical areas in ESL (e.g. García Mayo 2009, Jarvis 2002, Master 1987, 1997, 2002, 2003, Parrish 1987, Sarko 2009, Snape 2005, Thomas 1989, Young 1996 and Zdorenko & Paradis 2008, 2011). The acquisition of the definite and the indefinite article is described by Ionin et al. (2008: 555) as a "notoriously difficult process for L2 learners", and they also note that "many studies (...) have found that L2-English learners make errors omitting and / or misusing English articles" (ibid.: 555). Swan’s (2016: section 133) usage guide for foreign and second language learners of English notes that the "correct use of the articles is one of the most difficult points in English grammar", and Master (2002: 331) stresses that even "the most advanced non-native speaker of English as a second or other language" makes mistakes in the use of articles "even when all other elements of the language have been mastered". According to Master (2002: 332), one reason for this is that the English "article system stacks multiple functions onto a single morpheme", causing

a considerable burden for the learner, who generally looks for a one-form-one-function correspondence in navigating the labyrinth of any human language until the advanced stages of acquisition. (Master 2002: 332)

The problems involved have been convincingly argued to be most severe for "learners whose native language does not have any articles, such as Chinese or Russian" (De Capua 2017: 60; see Nordanger 2017: 7 for a list of references). Norwegian is a language typologically similar to English, but Bækken (2006: 99) – although she does not give insight into the quantitative basis for her claims – nevertheless states that "[c]orrect use of the articles is difficult for Norwegians", that "this particular field of grammar is the source of numerous mistakes", and that "Norwegians are particularly prone to 'overuse' the definite article", "while the indefinite article is often left out". On the basis of these claims, school-aged Norwegian EFL learners can be expected to have (considerable) difficulties in achieving a target-like use of the articles, which, in part, are due to a number of (subtle) differences in the article systems of English and Norwegian (see Section 2 for details). However, quantitative investigations of the performance of young Norwegian EFL learners with regard to article use are non-existent, at least as far as we are aware, and no corpus-based work in this area has, to our knowledge, been carried out. Our study is novel in this sense, then. Moreover, as pointed out in Nordanger’s (2017) study on acquisition of the Norwegian articles by L1 speakers of Russian and English, work on articles and definiteness has most often tended to contrast learners having “L1 backgrounds not exhibiting any grammatical category of definiteness” with “learners with an L1 that exhibits an article system similar to that of English, such as French and Spanish” (Nordanger 2017: 8). Norwegian has definiteness, but its encoding is notably different from English: Norwegian definite noun phrases are indicated by means of morphological inflection (i.e. a suffix), and indefinite ones are marked periphrastically via an independent prenominal article morpheme (e.g. Faarlund et al. 1997: 150).[5] This means that our study also seeks to contribute to knowledge in the sense that it supplies a corpus-based investigation of the L2 English output of young school-aged speakers of a language which is closely related to English, but which differs notably in its details.

This article, then, quantitatively examines the target-like and non-target-like use of  articles by young Norwegian EFL learners. Non-target-like uses have been  operationalized in terms of the categories of underuse, overuse and the ungrammatical use of articles – evidence of which can be found through error analysis –, and we contrast these uses with the learners’ actual (overt) target-like uses of the definite and indefinite articles. In so doing, we aim to present an overview of article use among the Norwegian EFL learners under investigation. The purpose of this paper is, thus, to acquire a better understanding of the degree to which novice Norwegian EFL learners – speakers of a language closely related to English – have acquired the articles, as evidenced by their L2 writing. This understanding can subsequently contribute to raising awareness about challenges faced by learners in this area, and it follows from this that the insights derived from the present study may be applied to the teaching of English in Norway and elsewhere. 

Our objectives are therefore threefold: 

  • to describe the frequency of overuse, underuse and the ungrammatical use of the definite and indefinite articles among young Norwegian EFL learners, 

  • to quantitatively contrast the learners’ non-target-like uses of the articles with their (overt) target-like uses, and 

  • to chart how the learners’ use of articles develops from year 7 to year 10 of primary and lower secondary education. 

Due to the absence of previous quantitative work, the present study is descriptive and exploratory in focus, and it is expected that the CORYL data will reflect the general remarks made by Bækken (2006). That is, if Bækken’s statements are essentially correct, we expect the CORYL data to demonstrate frequent non-target-like use of articles, and we expect overuse of the definite and underuse of the indefinite article to predominate in our data. We also predict that the frequency of non-target-like uses will correlate with age; hence we expect that non-target-like uses will be more prevalent with younger, as opposed to older, learners. 

The article is structured as follows: Section 2 provides the background for our study by contrasting key aspects of the structure and usage patterns of articles in  English and Norwegian. Section 3 presents the corpus-based methods through which our data have been collected. Our findings are presented and discussed in Section 4, and Section 5 concludes the article, discussing the pedagogical implications of our findings. 


2   Articles in English and Norwegian – a Short Background Survey 

As is well known, the words traditionally referred to as ‘articles’ belong to the functional category of determiner, a category whose members in English precede nouns in linear order as part of a noun phrase (NP) (or, as analyzed in generative frameworks following Abney (1987), a determiner phrase (DP)). The English articles specify a noun’s definiteness (via the definite article the) or indefiniteness (via the indefinite article  a / an), as well as its discourse status. While the article may, under certain circumstances, be omitted from the NP – most typically with uncountable nouns, plural countables and in cases where nouns have generic reference (e.g. Biber et al. 1999: 261) –, it is assumed to nevertheless be covertly present as a zero, null or ‘empty’ article even in traditional, surface-oriented frameworks, since articles confer specification to the noun regardless of overt realisation, as shown in the examples (1) to (4) below (the noun phrases in question have been set in boldface).[6], [7]

(1) Two policemen had been admitted to hospital along with Wullie Robertson with a broken nose.  (BNC HNK (W_fict_prose)) 

(2) Most will stay on for an extra year at school or go into some form of further training (BNC CN5 (W_ac_polit_law_edu)) 

(3) Appeal said there was no duty to pass on all the information available to the hospital (BNC  HXV (W_ac_polit_law_edu)) 

(4) Many pupils have long wanted a quiet, peaceful area in the school so they can talk, read and think during playtimes and lunch breaks (BNC K9S (W_advert)) 

As concerns patterns of use, the English definite article combines ‘with all types of noun, except proper nouns, in the singular as well as the plural’ (Bækken 2006: 103),[8] and it is also used to indicate that the speaker and listener are familiar with the referent (ibid.: 102), i.e. the referent conveys given information in the sense of e.g. Chafe (1976). The indefinite article is used to indicate the introduction of an unknown referent – new information (cf. Chafe 1976) – and, being historically descended from the Old English numeral an ‘one’ (e.g. Los 2015: 47-48), the indefinite article is only used in the singular. As a consequence of this, then, the indefinite article is ‘typically used with countable nouns’ (Bækken, 2006: 110). 

A complete review of the usage patterns of the definite, indefinite and zero articles in English is outside the scope of this article (for this, readers are referred e.g. to the grammars by Biber et al. (1999) and Quirk et al. (1985)). Suffice it to say that several constraints govern the use of the articles, and the specific article selected depends on e.g. countability, reference, specificity, number, and discourse status, and the uses of the omitted article are arguably more complex than the standard overt uses of the articles (see e.g. the overviews given in Biber et al. (1999: 261-263) and Quirk et al. (1985: 274-281)). This clearly comprises a system which can be challenging for second or foreign language learners to acquire. Furthermore, articles are among the most frequently used words in English,[9] and Master claims that this – presumably in addition to the complexities evident in usage – makes ‘continuous conscious rule application difficult’ for learners ‘over an extended stretch of discourse’ (Master 2002: 332). 

Recall now that Bækken (2006: 99) asserts that ‘[c]orrect use of the articles  is difficult for Norwegians’ and that ‘this particular field of grammar is the source  of numerous mistakes’. According to her, “interference from Norwegian” (ibid.: 99) is the reason why non-target-like uses and / or non-uses of articles occur in the linguistic output of this group of learners. While the article systems in English and Norwegian are broadly similar, in the sense that both languages are [+ART] [10] and both languages express definiteness and indefiniteness structurally, there are also a number of formal and functional  differences between the two languages. A complete review is outside the scope of this article, yet we consider a very short survey to be useful here. To this effect,  Table 1 demonstrates the formal differences between the article systems of English and Norwegian. 

As is apparent from Table 1, both languages feature indefinite articles, but where English expresses definiteness periphrastically, this grammatical category is expressed inflectionally in Norwegian, with specific forms contingent on gender and number. Despite the evident differences, however, ‘the mapping of the grammatical category [of definiteness – S.L & K.A.R] onto core pragmatic and semantic contexts’ has been argued to be ‘largely identical in Norwegian and English’ (Nordanger 2017: 10). Thus, the two systems should not be regarded as fundamentally different.


| Gender | Singular, indefinite | Singular, definite | Plural, indefinite | Plural, definite |
|---|---|---|---|---|
| Masculine (Norw. only) | a car / en bil | the car / bilen (car-m.def.sg) | cars / biler | the cars / bilene (car-m.def.pl) |
| Feminine (Norw. only) | a girl / ei jente | the girl / jenta (girl-f.def.sg) | girls / jenter | the girls / jentene (girl-f.def.pl) |
| Neuter, strong (Norw. only) | an apple / et eple | the apple / eplet (apple-n.def.sg) | apples / epler | the apples / eplene (apple-n.def.pl) |
| Neuter, weak (Norw. only) | a hotel / et hotell | the hotel / hotellet (hotel-n.wk.def.sg) | hotels / hotell | the hotels / hotellene (hotel-n.wk.def.pl) |

    Table 1:  Definite and indefinite articles in English and Norwegian (glosses provided in definite contexts)

The examples (5) to (10), which are intended to be illustrative, not exhaustive, demonstrate a number of differences of usage between English and Norwegian.[11]

(5) Søsteren min er lærer 

‘My sister is a teacher.’ 

(6) Han er svenske 

‘He is a Swede.’ 

(7) Jeg skal på forelesning 

‘I’m going to a lecture.’ (adapted from Bækken 2006: 112, example 138) 

(8) Du kan ikke kjøre bil uten førerkort 

‘You can’t drive a car without a license.’ (adapted from Bækken 2006: 112, example 137)

(9) Gikk du på universitetet?

‘Did you go to university?’ (adapted from Bækken 2006: 112, fn. 1) 

(10) Solen går ned i vest 

‘The sun sets in the west.’

As becomes clear in these examples, in English, the indefinite article is used with predicate nouns referring to occupations (5) and nationality (6), while in Norwegian, bare NPs are used in both cases. Differences are not restricted to such cases, however, as illustrated in examples (7) and (8). Example (9) shows that Norwegian selects definiteness marking on nouns referring to institutions (e.g. university, hospital, school) whereas English generally does not, while example (10) shows how the cardinal directions take the definite article in English, but not in Norwegian. Thus, despite a considerable similarity between these two languages, there are also several clear differences, which may have an effect on the findings to be presented in Section 4.

 

3   Method 

Our data collection and analysis rely on corpus linguistics, all data being drawn from the CORYL corpus. CORYL is an error-tagged[12] corpus developed at the University of Bergen (Norway), and it contains texts written by Norwegian EFL learners enrolled in years 7, 10 and 11 of primary and secondary education.[13] At the time of writing, learners were 12-13, 15-16 and 16-17 years of age. The majority of the texts originally derive from the National Tests in English conducted in 2005 and 2011,[14] and can be placed into six different text types: story, description, letter, personal letter, letter to the editor, and essay. The length and quality of the texts vary, and some of them were scored by the annotators according to the Common European Framework of Reference (CEFR) scale. While both females and males were involved in writing the texts, the ratio between the genders is not known, as the corpus does not systematically include this information. 

The data were extracted from CORYL in two main steps: firstly, all tokens containing the error tag ART, representing ‘[a]ny clear article error’ (Hasselgreen & Sundet 2017: 214), were pulled from the corpus. The ART tag encompasses the underuse and overuse of articles as well as instances where the learner selected an inappropriate article. Underuse errors constitute cases in which learners omit an article where an article should in fact be included, and, conversely, overuse errors include cases in which learners insert an article where it should have been omitted. The final category includes cases in which the learner used the definite article where the appropriate option was the indefinite article, and vice versa.

Moreover, for contrastive purposes, all instances of target-like overt uses of the definite and indefinite articles were extracted from the corpus.[15],[16] After collection, all tokens were entered into a dataframe, and all representations of data in Section 4 below were derived from this dataframe. 

CORYL is a small corpus, consisting of 129,421 words at the time of data collection.[17] This clearly raises issues of representativeness. However, Granger (2012: 9) rightly stresses that “the optimal size of a learner corpus depends on the targeted linguistic phenomenon”. Specification according to (in)definiteness – which often implies the use of articles – is a crucial feature of NPs, and nominal categories occur in most clauses. Articles can therefore be expected to be quite pervasive even in small datasets, and Granger (2012: 9) in fact uses articles as an example of grammatical phenomena which are so frequent that it is possible to investigate their use on the basis of a small corpus. This alleviates concerns of representativeness. Moreover, the objective of the present study is not formal generalization, but to provide an explorative investigation into the use of articles in the learner texts compiled in this specific corpus. The use of CORYL is therefore considered more than justified, not least due to the uniqueness of the corpus: as mentioned in Section 1, previous corpus studies on learner language have, to a considerable degree, focused on the language of undergraduate writers, and CORYL allows access to a stage of learner language development which has arguably been understudied thus far. 

Finally, we did not utilise a reference corpus and thus, all remarks concerning grammaticality and target-like/non-target-like performance are based on the error analysis conducted by the corpus analysts. This error analysis relies on the intuitions of the annotators (Hasselgreen & Sundet 2017: 199), and rather a strict prescriptive-traditional grammatical framework seems to have been assumed. We have generally accepted the annotators’ analyses, but have excluded from our results c. 140 tokens containing what we considered to be clear annotational errors.


4   Results and Discussion 

The procedure described in Section 3 led to the collection of a total of 6,621 target-like and non-target-like article tokens. In Section 4.1, an overview of the types of non-target-like uses showcased in the corpus will be given, as well as their relative distribution, while contrastive quantitative overviews of target-like and non-target-like uses of articles according to year level and text type will be provided in Sections 4.2 and 4.3. 


4.1 Types and Distribution of Non-Target-Like Uses 

As mentioned in Section 1, Ionin et al. (2008: 555) noted that many studies had shown that L2 learners of English “make errors omitting and/or misusing English articles”. Bækken (2006: 99) counts the overuse of the definite article and the ungrammatical omission of the indefinite article as typical mistakes among Norwegian learners of English. We find instances of these as well as other types of non-target-like use of articles in the CORYL corpus, as illustrated in the examples (11) to (18).[18]

(11) One day Jack and I came home from the school (p213-7) 

(12) Dear John I am in the Miami. (p11-7) 

(13) I can see a lady with a curly brown hair (p293-07) 

(14) We drink a te and spiste pizza (p156-07) [19]

(15) taking care of environment (p120-10) 

(16) Next day I woke up of a nois (p168-07) 

(17) They have made fier (p07-7)  [20]

(18) I can see lazy guy hvo is eating bananas. (p152-07) 

The examples (11) to (12) and (13) to (14) illustrate the overuse of the definite and indefinite articles, respectively. The examples (15) and (16) display underuses of the definite article, while the examples (17) and (18) show underuses of the indefinite article. In a number of these citations, the learners’ non-target-like performance follows directly from differences between Norwegian and English. For example, the overuse of the in example (11) is expected since the Norwegian rendition would be overtly specified for definiteness (kom hjem fra skolen). The omission of the in (16) is also expected, since a common way of referring to subsequent time in a narrative in Norwegian is to use NPs lacking an article (e.g. neste dag ‘the next day’). Contrary to such examples, and perhaps puzzlingly, however, some of the non-target-like uses are not expected in analogy with the Norwegian system. For example, in (12) the definite article is attached to the proper noun Miami, but proper nouns referring to cities or towns do not generally take articles in Norwegian,[21] and the Norwegian rendition of this example would be jeg er i Miami ‘I am in Miami’.[22] Similarly, in (18) an overt indefinite article would be expected in Norwegian (en lat fyr) exactly as in English (a lazy guy). 
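The classification of examples (11) to (18) can be pictured as a small set of records cross-tabulated by error type and article, in the spirit of the dataframe mentioned in Section 3. The following Python sketch is purely illustrative: the record layout and the function name `cross_tabulate` are our assumptions and do not reproduce the CORYL annotation format or the authors' actual procedure.

```python
from collections import Counter

# Hypothetical record layout: (example number, text id, article, error type).
# The classifications follow the discussion of examples (11) to (18) above.
tokens = [
    (11, "p213-7", "definite", "overuse"),
    (12, "p11-7", "definite", "overuse"),
    (13, "p293-07", "indefinite", "overuse"),
    (14, "p156-07", "indefinite", "overuse"),
    (15, "p120-10", "definite", "underuse"),
    (16, "p168-07", "definite", "underuse"),
    (17, "p07-7", "indefinite", "underuse"),
    (18, "p152-07", "indefinite", "underuse"),
]


def cross_tabulate(records):
    """Count records per (error type, article) cell."""
    return Counter((error, article) for _, _, article, error in records)


table = cross_tabulate(tokens)
for (error, article), n in sorted(table.items()):
    print(f"{error:9s} {article:11s} {n}")
```

Applied to the full set of tagged tokens, a tally of this kind would yield a cross-tabulation of error type by article.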

In total, we extracted 733 instances of article use tagged as ‘clear’ article errors from the CORYL corpus. A considerable number (n=101) of these represent cases where learners selected an inappropriate form of the indefinite article: a where an is expected, and vice versa. These tokens could reasonably be taken to represent orthographical performance errors rather than an inadequate acquisition or application of a [-Definite] feature, as there is no error present in terms of specification according to definiteness. These tokens have  nevertheless been included here, as such errors do showcase incomplete formal  acquisition. Table 2 demonstrates the distribution of the 733 non-target-like article uses in our data according to error type. In the table, numbers representing non-target-like uses are cross-tabulated with the type of article involved and according to whether the non-target-like performance involves overuse, underuse or misuse of the articles. Percentages refer to proportions of the total number of errors:

| Error type | Definite article | Indefinite article | Total |
|---|---|---|---|
| Overuse | 142 (19.4%) | 45 (6.1%) | 187 (25.5%) |
| Underuse | 171 (23.3%) | 218 (29.7%) | 389 (53.1%) |
| Misuse | 41 (5.6%) | 116 (15.8%) | 157 (21.4%) |
| Total | 354 (48.3%) | 379 (51.7%) | 733 (100%) |


              Table 2: Distribution of error types tagged ART in the CORYL corpus 

Table 2 shows that in our data, non-target-like uses are quite evenly distributed across the definite and the indefinite article, but errors involving the indefinite article are slightly more frequent than those involving the definite article, at 51.7% vs. 48.3% of the total number of errors, respectively. Certain error types are very rare in the dataset: an overuse of the indefinite article occurs at no more than 6.1% of the total number of errors, and a misuse of the definite article occurs at no more than 5.6%. The errors documented in the dataset most frequently involve the underuse of the indefinite article (29.7% of the total number of article errors), while the underuse of the definite article occurs at a somewhat lower rate (23.3%). The overuse of the definite article is  the third most frequent error type in the dataset, at 19.4% of the total number of  non-target-like uses. 
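The percentages in Table 2 are each cell's share of all 733 tagged errors. As a minimal sketch of that arithmetic, assuming nothing beyond the raw counts in Table 2 (the variable names are ours), the shares and their ranking can be recomputed as follows:

```python
# Raw counts from Table 2: error type -> article -> number of tagged errors.
counts = {
    "overuse": {"definite": 142, "indefinite": 45},
    "underuse": {"definite": 171, "indefinite": 218},
    "misuse": {"definite": 41, "indefinite": 116},
}

# Total number of errors across all cells.
total = sum(sum(row.values()) for row in counts.values())

# Share of each cell as a percentage of all errors, rounded to one decimal.
pct = {
    (error, article): round(100 * n / total, 1)
    for error, row in counts.items()
    for article, n in row.items()
}

# Cells ranked from most to least frequent error type.
ranked = sorted(pct.items(), key=lambda item: item[1], reverse=True)

for (error, article), share in ranked:
    print(f"{error:9s} {article:11s} {share:5.1f}%")
```

Ranking the cells in this way makes the ordering described in the text explicit: underuse of the indefinite article comes first, followed by underuse and then overuse of the definite article.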

These findings recall Bækken’s (2006: 99) assertions that Norwegians are  “particularly prone to ‘overuse’ the definite article” and that “the indefinite article is often left out”. As demonstrated above, errors such as those described by Bækken do occur at comparatively high proportions of the total number of article  errors in our data, and this could prima facie be taken as corroboration of  Bækken’s claims. However, examining non-target-like uses in isolation yields an  incomplete picture, as there is no indication of what proportion the various non-target-like uses represent when contrasted with target-like uses. Bækken’s  claims can be better evaluated by considering the data in Tables 3 and 4, which  show the frequencies of target-like use of the articles contrasted with frequencies of overuse and underuse, respectively. In both tables, the rightmost column shows the over- or underuse of the articles expressed as a percentage of the total number of articles, according to definiteness: [23]

| Article type | Target-like uses | Overuse | Total | Overuse (in percent) |
|---|---|---|---|---|
| Definite article | 3195 | 142 | 3337 | 4.3% |
| Indefinite article | 2693 | 45 | 2738 | 1.6% |
| Total | 5888 | 187 | 6075 | 3.1% |

              Table 3:  Target-like uses and overuse of the definite and indefinite articles in the CORYL corpus

As concerns Bækken’s (2006: 99) strongest claim, then, namely that Norwegians are “particularly prone to ‘overuse’ the definite article”, it can be observed that the overuse of the definite article comprises no more than 4.3% of the total number of uses of the definite article in this subset of the data, i.e. a subset in which cases of misuse and underuse are omitted, along with target-like uses of the zero article (see footnote 23). In our view, an overuse of slightly more than 4% can certainly not be considered significant enough to constitute a strong tendency, and as Table 2 shows, the underuse of the definite article is actually a more frequent error type in the CORYL data (see Table 4 for a contrastive perspective). Consequently, contra Bækken (2006), we would tentatively like to argue that our young Norwegian EFL learners do not, in fact, seem ‘particularly prone’ to an overuse of articles.

Similarly, Table 4 shows the total number of underuses of articles in CORYL, contrasted with target-like uses. As is evident, 7.5% of the instances of the indefinite article in this subset of data are omitted, while the corresponding figure for the definite article is 5.1%. While the difference between the omission rates of the two types of article is statistically significant in a chi-squared test (χ2(df=1)= 15.58, p<.0001), the effect size is very low (φ=0.05).[24] Thus, there is little reason to accept Bækken’s claim that Norwegian EFL learners generally are particularly prone to underuse the indefinite article: the CORYL learners’ underuse of the indefinite article is indistinguishable from their underuse of the definite article.

 

| Article type | Target-like uses | Omission | Total | Omission (in percent) |
|---|---|---|---|---|
| Definite article | 3195 | 171 | 3366 | 5.1% |
| Indefinite article | 2693 | 218 | 2911 | 7.5% |
| Total | 5888 | 389 | 6277 | 6.2% |

Table 4: Target-like uses and omission of the definite and indefinite articles in the CORYL corpus
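For readers who wish to retrace the chi-squared test and the effect size reported for Table 4, the following Python sketch recomputes both from the four cell counts. It rests on assumptions of ours: that the test is a Pearson chi-squared test on the 2×2 table without continuity correction, and that φ is the square root of χ2 divided by the number of tokens. The function name is hypothetical, and the sketch is not the authors' own script.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic, the p-value for df = 1, and the phi coefficient.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)
    return chi2, p, phi


# Rows: definite, indefinite; columns: target-like uses, omissions (Table 4).
chi2, p, phi = chi_squared_2x2(3195, 171, 2693, 218)
print(f"chi2 = {chi2:.2f}, p = {p:.6f}, phi = {phi:.2f}")
```

Under these assumptions the result agrees with the reported values after rounding. The contrast between a very small p-value and a φ close to zero shows how, with more than 6,000 tokens, even a modest difference in omission rates reaches statistical significance.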


4.2 The Use and Non-Use of the Definite and Indefinite Articles according to Year 

With regard to a contrastive examination of article use across the different year levels in CORYL, Table 5 gives an overview of all instances of target-like uses of the articles, along with all non-target-like uses of the definite, the indefinite and the zero article, subdivided according to Year and definiteness. As noted previously, limitations inherent in the corpus prevented us from investigating the learners’ target-like uses of the zero article; thus, this is a dimension lacking from our data:

| Article | Year level | Target-like uses | Non-target-like uses | Total | Non-target-like (in percent) |
|---|---|---|---|---|---|
| Definite article | Year 7 | 1587 | 165 | 1752 | 9.4% |
| | Year 10 | 1608 | 189 | 1797 | 10.5% |
| | Total | 3195 | 354 | 3549 | 10.0% |
| Indefinite article | Year 7 | 1852 | 273 | 2125 | 12.8% |
| | Year 10 | 841 | 106 | 947 | 11.2% |
| | Total | 2693 | 379 | 3072 | 12.3% |
| Grand total | | 5888 | 733 | 6621 | 11.1% |

    Table 5: Target-like and non-target-like uses of the definite and indefinite articles according to year level

The CORYL texts demonstrate a very high level of fluency as far as article use is concerned: young Norwegian EFL learners have achieved an accuracy level of c. 90% as early as in year 7. This accuracy level is consistent with the fact that English and Norwegian are closely related languages which both specify definiteness overtly, albeit not in identical fashion (Section 2). Furthermore, there are only minor differences between the proportions of non-target-like article use in years 7 and 10. Table 5 shows that the rates of the non-target-like use of the definite article are largely identical in the year 7 and year 10 groups, at 9.4% and 10.5%, respectively. A similarly small difference can be detected between the year-7 and year-10 learners’ use of the indefinite article: the year-7 learners’ use is non-target-like in 12.8% of the total cases, while that figure is 11.2% for the year-10 learners. No statistically significant effect is documented for either of these differences (definite article, year 7 vs. year 10: χ2(df=1)=1.19, p=0.275; indefinite article, year 7 vs. year 10: χ2(df=1)= 1.66, p=0.198). Overall, non-target-like uses of articles occur at relative frequencies of 11.3% of the cases in year 7 and of 10.8% in year 10. This difference is very modest, and not statistically significant (χ2(df=1)= 0.49, p= 0.484). 
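The three year-level comparisons can be retraced from the counts in Table 5. The sketch below is again illustrative only and assumes a Pearson chi-squared test without continuity correction, computed from observed and expected cell counts. The pooled table is obtained by adding the definite and indefinite counts for each year, and all names in the code are ours.

```python
import math


def pearson_chi2(table):
    """Pearson chi-squared from observed vs expected counts (df = 1 for a 2x2 table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of the chi-squared distribution with df = 1.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Rows: year 7, year 10; columns: target-like, non-target-like (Table 5).
definite = [[1587, 165], [1608, 189]]
indefinite = [[1852, 273], [841, 106]]
# Pooled table: cell-wise sum of the two article types per year level.
pooled = [[d + i for d, i in zip(d_row, i_row)]
          for d_row, i_row in zip(definite, indefinite)]

results = {name: pearson_chi2(t) for name, t in
           [("definite", definite), ("indefinite", indefinite), ("pooled", pooled)]}

for name, (chi2, p) in results.items():
    print(f"{name:10s} chi2 = {chi2:.2f}, p = {p:.3f}")
```

The pooled rows correspond to 438 non-target-like uses out of 3,877 tokens in year 7 and 295 out of 2,744 in year 10, which gives the overall rates of 11.3% and 10.8% cited above.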

Thus, while the learners’ use of articles is very accurate indeed, no development in the accuracy of their performance can be detected between year 7 and year 10. This fact is unexpected for several reasons. Firstly, since year-10 learners will in most cases have had almost three more years of language instruction as well as almost three more years’ worth of language exposure and general cognitive development than the year-7 learners, we would expect the frequency of the non-target-like use of both types of article to be consistently and significantly lower in the former group. Secondly, assuming that Master (1997: 225) is right in saying that the definite article is learned first, it would be reasonable to expect that year-10 learners would display a notably higher accuracy in the use of the definite article than year-7 learners. As we have shown, this expectation is not confirmed by our data. Thirdly, and relatedly, if the indefinite article is acquired after the definite article, we would expect year-10 learners to demonstrate a higher proficiency in its use compared to year-7 learners, equivalent at least to a medium effect size when tested quantitatively. Again, however, this is not the case in our data.

It is a puzzling fact that none of these expectations is borne out. One possibility raised by the data, then, is that the restricted variation seen in the use of articles in CORYL might, to some degree, be epiphenomenal, i.e. that it is affected by the text types in the corpus rather than by differences in linguistic competence.


4.3 The Use and Non-Use of the Definite and the Indefinite Article according to Text Type

The texts produced by the year-7 learners belong to different text types than the texts produced by the year-10 learners. This fact could influence the linguistic performance demonstrated in the two datasets, since different text types might pose different requirements for the learners, in the sense that type A might require higher densities of specific nouns than type B. Tables 6 and 7 explore genre-related differences in the two datasets by showing target-like and non-target-like uses of articles according to the text types produced by year-7 and year-10 learners, respectively:

| Year 7 | Target-like | Non-target-like | Total | Non-target-like (in percent) |
| --- | --- | --- | --- | --- |
| Description | 1564 | 211 | 1775 | 11.9 % |
| Letter | 366 | 49 | 415 | 11.8 % |
| Story | 1509 | 178 | 1687 | 10.6 % |
| Total | 3439 | 438 | 3877 | 11.3 % |

Table 6: Year 7 learners’ target-like and non-target-like uses of articles according to text type


| Year 10 | Target-like | Non-target-like | Total | Non-target-like (in percent) |
| --- | --- | --- | --- | --- |
| Essay | 1495 | 125 | 1620 | 7.7 % |
| Letter to the editor | 492 | 120 | 612 | 19.6 % |
| Personal letter | 435 | 50 | 485 | 10.3 % |
| Total | 2422 | 295 | 2717 | 10.9 % |

Table 7: Year 10 learners’ target-like and non-target-like uses of articles according to text type

As can be seen in Table 6, frequencies of non-target-like use are remarkably similar across the text types in our year-7 data. Here, no statistically significant effects are detectable between the different text types (description vs. letter: χ²(df=1)=0, p=1; description vs. story: χ²(df=1)=1.55, p=0.213), and the overall relative frequency of 11.3% of non-target-like use is not attributable to influence from any particular text type. The year-10 data in Table 7 present a different picture. In these data, non-target-like uses occur at a frequency of 19.6% in the text type letter to the editor, while the other text types represented (essay and personal letter) display non-target-like uses at frequencies of 7.7% and 10.3%, respectively. The difference between letter to the editor and the other text types is statistically significant in both cases (letter to the editor vs. essay: χ²(df=1)=64.28, p<.0001; letter to the editor vs. personal letter: χ²(df=1)=17.86, p<.0001). Moreover, the overall difference in accuracy between year-7 and year-10 learners becomes larger if letters to the editor are omitted from consideration: the relative frequency of non-target-like uses in our year-10 data would then be reduced to 8.3%, and the overall difference between year 7 and year 10 then emerges as statistically significant (χ²(df=1)=13.21, p=0.0003). However, the effect size is very low (φ=0.05), and no meaningful distinction can be made between the performance of year-7 and year-10 learners even when letters to the editor are excluded. Even so, certain insights may be derived from a qualitative examination of non-target-like uses of articles in the learners’ letters to the editor.
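To make the relation between statistical significance and effect size concrete, the text-type comparisons can be recomputed from the counts in Tables 6 and 7. This is an illustrative sketch under stated assumptions, not the procedure documented for the study: it assumes an uncorrected Pearson chi-square test on 2×2 tables and takes φ as the square root of χ² divided by the number of observations; the function and variable names are hypothetical.

```python
import math

def chi_square_and_phi(a, b, c, d):
    """Uncorrected Pearson chi-square and phi for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = math.sqrt(chi2 / n)  # effect size for a 2x2 table
    return chi2, phi

# (target-like, non-target-like) counts from Tables 6 and 7
year7_all = (3439, 438)
essay = (1495, 125)
letter_to_editor = (492, 120)
personal_letter = (435, 50)

# Year-10 data with letters to the editor left out
year10_reduced = (essay[0] + personal_letter[0], essay[1] + personal_letter[1])
rate = year10_reduced[1] / sum(year10_reduced)
print(f"year 10 without letters to the editor: {rate:.1%} non-target-like")

pairs = [
    ("letter to the editor vs. essay", letter_to_editor, essay),
    ("letter to the editor vs. personal letter", letter_to_editor, personal_letter),
    ("year 7 vs. reduced year 10", year7_all, year10_reduced),
]
for label, first, second in pairs:
    chi2, phi = chi_square_and_phi(*first, *second)
    print(f"{label}: chi2(df=1) = {chi2:.2f}, phi = {phi:.2f}")
```

Because χ² grows with the number of observations while φ does not, a difference of roughly three percentage points can be statistically significant in a sample of almost 6,000 tokens and still correspond to a very small effect.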

The key point for discussion here is what exactly causes this text type to facilitate a comparatively higher rate of errors in the year-10 group. While the errors made by learners involve a wide variety of nouns, we would like to argue that the prevalence of article errors within this text type can, to some degree, be linked to specific nouns required by the texts which learners were instructed to write. For example, one task given to learners apparently relates to environmental action, and consequently, learners had to use several nouns whose NP structures are slightly different in Norwegian and English. One recurring error in the letters to the editor involves learners omitting the definite article in the NP the environment, as shown in the examples below:

(19) Young people today don't care less about environment (p176-10)

(20) We care about environment (p43-10)

(21) Environment can be everything (p101-10)

In Norwegian, the corresponding noun miljø may, but need not, take the definite inflectional ending -et when referring to global climate and environmental action (cp. vi må ta vare på miljøet ‘we must take care of the environment’ and vi må fokusere på miljø ‘we must focus on the environment’). This variability may lead learners to assume, by analogy with Norwegian, that a bare NP is grammatical in English. Hence, interference can likely explain this error. Similarly, the letters to the editor also showcase several instances of overuse of the with the noun adult used generically, as illustrated in these examples:

(22) Young people care about the environment more than the adults do. The adults drives  around in cares and on some off the working pleases they dumping poisen in to the waters  and into the air. (p160-10) 

(23) We do the same things as they do, so if the adults say we care less about the  environment, they should think about how they are and all the things they do that affects  the environment. (p158-10) 

These errors may also be explained by interference from Norwegian. When Norwegian voksen ‘adult’ is a nominalized adjective heading an NP with generic plural reference, two semantically and pragmatically equivalent structures are permissible (examples 24a–b; the NPs in question have been bracketed), and one of these (24b) involves a demonstrative pronoun preceding the nominal head. Errors in CORYL involving overuse of the definite article with adults seem to apply the structure in (24b) to English:

(24) a. [Voksne]NP bryr seg ikke om miljøet 

                ‘Adults don’t care about the environment.’  

        b. [De voksne]NP bryr seg ikke om miljøet

                 ‘Adults don’t care about the environment.’ 

Additionally, judging from the content of learners’ texts, it appears that they were given the opportunity to write letters to the editor concerning whether very old or young people should be allowed to hold a driving licence. Article errors within these texts show, for example, that learners underuse the indefinite article in the NP driving license (25a), and overuse the definite article in NPs such as traffic (26a) and train (27a). In all these examples, interference from Norwegian may explain the errors, as shown in (25b), (26b) and (27b):

(25) a. I think it's not all right for an eighty year old to have drivers license and sixteen years  old, not. (p24-10) 

    b. Jeg synes ikke at det er riktig at en åttiåring kan ha førerkort, men ikke en sekstenåring. 

(26) a. how to act in the traffic (p05-10)   

        b. Hvordan man skal oppføre seg i trafikken-DEF.SG 

 (27) a. We young have to take train (p100-10) 

             b. Vi ungdommer må ta tog. 

This text type thus seems to involve a higher relative density of nouns which cause particular difficulty for Norwegian learners in terms of article use. The requirements of the text type may cause learners to make a relatively higher number of mistakes, and a different result might have been obtained if learners had been given tasks involving nouns which they might have found to be less challenging. This being said, the low effect size demonstrated above clearly suggests that caution in interpretation is needed here: even with the exclusion of the year-10 learners’ letters to the editor, the overall difference in performance between the two groups of learners is very small. 


5   Conclusions and Pedagogical Implications 

In this article, we have demonstrated by means of a corpus-based analysis that young Norwegian EFL learners display a very high level of accuracy in their use of the definite and the indefinite article, even at an early learning stage (year 7). We have also shown that little discernible proficiency development occurs between years 7 and 10. The close typological similarity between the source and the target language, and the high degree of exposure to English in Norwegian society (e.g. Rindal 2014: 8-10), may explain the high competence demonstrated. Differences related to text type could also be identified in learners’ performance: the highest frequency of non-target-like use of articles in our data occurs in a text type (letters to the editor, year-10 learners) which promotes specific nouns whose use differs systematically between English and Norwegian in terms of articles and demonstratives. That said, the influence exerted by this text type is apparently not decisive in the overall picture: there genuinely is very little detectable development between years 7 and 10, regardless of differences in text type.

This lack of development may be related to a lack of sustained explicit focus on linguistic form in Norwegian classrooms, specifically, in our case, a lack of focus on Norwegian-English cross-linguistic nuances in article use. Following from this, our data can arguably be said to show that there is room for more form-focused attention to articles and the structure of NPs in Norwegian EFL classrooms, particularly in lower secondary education, where the development was found to stagnate (admittedly at a high level) in our data.

Rindal (2014: 5) points out that the “principal goal” of the two most recent English curricula in Norway at the time of her writing was “teaching students to communicate in English”, and Garshol (2019: 86) states that “English teaching in Norway is highly communicative” and that “explicit grammar instruction, if present, can hardly be considered extensive”. In agreement with Garshol, our impression (as a practicing teacher and a teacher trainer) is that EFL teaching in secondary education in Norway features a comparatively heavy focus on communicative aspects of language use, as well as on engaging critically with literature and with cultural and societal phenomena, potentially to the detriment of formal linguistic acquisition. If generally true,[25] this would be unfortunate, as it is well known from previous research that focus on form is beneficial in instructed language acquisition. For example, Nassaji & Fotos (2011: 8-9) state that “teaching approaches that put the primary focus on meaning with no attention to grammatical forms are inadequate”, noting on the basis of previous research (i.e. Harley & Swain 1984, Lapkin et al. 1991 and Swain 1985) that “some type of focus on grammatical forms is necessary if learners are to develop high levels of accuracy in the L2” (Nassaji & Fotos 2011: 9). The developmental stagnation that we see in our data on articles can be said to illustrate this, but since our findings are based on a small corpus, firm conclusions must be deferred to a later stage.




References

Abney, S. P. (1987). The English noun phrase in its sentential aspect. (PhD thesis). MIT. 

Bader, M. & S. Hoem Iversen (2017). Demonstrative reference in the writing of young EFL Norwegian learners. In: de Haan, P., S. van Vuuren & R. de Vries (eds.). Language, learners and levels: Progression and variation. Corpora and Language in Use – Proceedings 3. Louvain-la-Neuve: Presses universitaires de Louvain, 181-202.

Biber, D., S. Johansson, G. Leech, S. Conrad & E. Finegan. (1999). Longman grammar of spoken and written English. Harlow: Pearson.  

Bækken, B. (2006). English grammar: An introduction for students of English as a foreign  language. Bergen: Fagbokforlaget. 

Chafe, W. L. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view.  In: C. N. Li (ed.). Subject and topic. New York: Academic Press, 25-55. 

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd edn). Hillsdale, NJ:  Lawrence Erlbaum.  

DeCapua, A. (2017). Grammar for teachers. A guide to American English for native and non-native speakers (2nd edn). Dordrecht: Springer.

Faarlund, J. T., S. Lie & K. I. Vannebo (1997). Norsk referansegrammatikk. Oslo:  Universitetsforlaget. 

Fuchs, R. & V. Werner (2018). The use of stative progressives by school-age learners of English  and the importance of the variable context: Myth vs. (corpus) reality. In: International Journal  of Learner Corpus Research 4, 195—24.  

García Mayo, del Pilar, M. (2009). Article choice in L2 English by Spanish speakers: Evidence for full transfer. In: García Mayo, del Pilar, M. & R. Hawkins (eds). Second language acquisition of articles: Empirical findings and theoretical implications. Amsterdam: John Benjamins, 13—35. 

Garshol, L. (2019). I just doesn’t know: Agreement errors in English texts by Norwegian L2  learners: Causes and remedies. (PhD thesis). University of Agder, Kristiansand.

Granger, S. (2012). How to use foreign and second language learner corpora. In: Mackey, A. &  S. M. Gass (eds). Research methods in Second Language Acquisition: A practical guide.  Malden: Wiley-Blackwell, 7-29. 

Hasselgreen, A. & K. T. Sundet (2017). Introducing the CORYL corpus. What it is and how we can use it to shed light on learner language. In: Bergen Language and Linguistic Studies 7, 197-215.

Harley, B. & M. Swain (1984). The interlanguage of immersion students and its implications for  second language teaching. In: Davies A., C. Criper, & A. P. R. Howatt (eds.). Interlanguage.  Edinburgh: Edinburgh University Press, 291-311. 

Hasselgård, H., P. Lysvåg & S. Johansson (2012). English grammar: Theory and use (2nd edn).  Oslo: Universitetsforlaget. 

Ionin, T., M. L. Zubizarreta & S. B. Maldonado (2008). Sources of linguistic knowledge in the  second language acquisition of English articles. In: Lingua 118, 554-576.

Jarvis, S. (2002). Topic continuity in L2 English article use. In: Studies in Second Language  Acquisition 24, 387-418. 

Julien, M. (2003). Double definiteness in Scandinavian. In: Nordlyd 31, 230-244. 

Lapkin, S., D. Hart & M. Swain (1991). Early and middle French immersion programs: French language outcomes. In: Canadian Modern Language Review 48, 11-40. 

Larsen, Sofie. (2019). Article use in the writing of young Norwegian EFL learners: A corpus-based study. (MA thesis). Western Norway University of Applied Sciences. 

Los, B. (2015). A historical syntax of English. Edinburgh: Edinburgh University Press. 

Master, P. (1987). A cross-linguistic interlanguage analysis of the acquisition of the English  article system. (PhD thesis). UCLA. 

Master, P. (1997). The English article system: acquisition, function, and pedagogy. In: System 25, 215-232. 

Master, P. (2002). Information structure and English article pedagogy. In: System 30, 331-348. 

Master, P. (2003). Acquisition of the zero and null articles in English. In: Issues in Applied  Linguistics 14, 3-20. 

Nassaji, H., & S. Fotos (2011). Teaching grammar in second language classrooms: Integrating  form-focused instruction in communicative context. New York: Routledge.

Nordanger, M. (2017). The encoding of definiteness in L2 Norwegian: A study of L1 effects and universals in narratives written by L1 Russian and L1 English learners. (PhD thesis). University  of Bergen.  

The Norwegian Directorate for Education and Training. (2018). Kva er nasjonale prøver? (https://www.udir.no/eksamen-og-prover/prover/nasjonale-prover/om-nasjonale-prover; 17-08-2020).

Parrish, B. (1987). A new look at methodologies in the study of article acquisition for learners of  ESL. In: Language Learning 37, 361-383.

Quirk, R., S. Greenbaum, G. Leech & J. Svartvik (1985). A comprehensive grammar of the English  language. London: Longman. 

Rindal, U. (2014). What is English? In: Acta Didactica Norge 8, 117. 

Sarko, G. (2009). L2 English article production by Arabic and French speakers. In: García Mayo,  del Pilar, M. & R. Hawkins (eds.). Second language acquisition of articles: empirical findings  and theoretical implications. Amsterdam: John Benjamins, 37-66.  

Snape, N. (2005). The use of articles in L2 English by Japanese and Spanish learners. In: Essex  Graduate Student Papers in Language and Linguistics 7, 1-23. 

Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In: Gass, S. & C. Madden (eds.). Input in second language acquisition. Rowley, MA: Newbury House, 235-253.

Swan, M. (2016). Practical English usage (4th edn). Oxford: Oxford University Press. 

Thomas, M. (1989). The acquisition of English articles by first- and second-language learners. In: Applied Psycholinguistics 10, 335-355. 

Young, R. (1996). Form-function relations in articles in English interlanguage. In: Bayley, R. & D.  R. Preston (eds.). Second language acquisition and linguistic variation. Amsterdam: John  Benjamins, 135-175. 

Zdorenko, T. & J. Paradis (2008). The acquisition of articles in child second language English:  fluctuation, transfer or both? In: Second Language Research 24, 227-250.

Zdorenko, T., & J. Paradis (2011). Articles in child L2 English: when L1 and L2 acquisition meet  at the interface. In: First Language 32, 38-62.




Authors:

Sofie Larsen

EFL Teacher

Vågsøy Lower Secondary School 

Gate 5 no. 96

6700 Måløy

Norway

Email: sofie.larsen@kinn.kommune.no

 

Kristian A. Rusten, PhD

Associate Professor of English Language and Didactics

Western Norway University of Applied Sciences

Department of Language, Literature, Mathematics and Interpreting

Inndalsveien 28, 

5063 Bergen

Norway

Email: karu@hvl.no


__________________

 [1] We gratefully acknowledge our debt to Kari T. Sundet, Christine Möller-Omrani and Milica Savic. Any errors present in the article are entirely attributable to the authors.

[2] The corpus can be accessed by clicking ‘Coryl’ in the list of corpora at https://clarino.uib.no/korpuskel/corpus-list?session-id=252077957191909 (20.11.2021).

[3] See e.g. Bader & Hoem Iversen (2017: 182) and Fuchs & Werner (2018: 196) for similar statements.

[4] Specifically, the version of the Norwegian National Curriculum in English which was current at the time this research was carried out (2018-2019) stipulated that learners should be able to "write short texts that express opinions and interests, and that describe, narrate and enquire" no later than after Year 4 (https://www.udir.no/kl06/ENG1-03/Hele/Kompetansemaal/competence-aims-after-year-4?lplang=http://data.udir.no/kl06/eng; 22-11-2021).

[5] Moreover, in contexts involving demonstratives, Norwegian features ‘double definiteness’, i.e. definiteness is marked on both the demonstrative and the head of the NP, as illustrated in examples (i)–(ii).

(i) Den bilen ser dyr ut. 

‘That car looks expensive.’ 

(ii) Dei gule skjortene 

‘The yellow shirts.’ (adapted from Julien 2003: 230 example 1d).


[6] The data cited in examples (1) to (4) have been extracted from the British National Corpus, distributed by the University of Oxford on behalf of the BNC Consortium. All rights in the texts cited are reserved.

[7] The variant with a zero article is possible in these examples if "focus is on the type of institution rather than on a specific entity" (Biber et al. 1999: 261). Note, however, that there is variation here – with university, for example, American English "requires the definite article", while "usage varies in British English" (Bækken 2006: 116, fn. 1).

[8] Bækken’s inclusion of proper nouns here prompts us to note that this only holds for frameworks which exclusively consider the surface structure. In certain generative theories, even proper nouns are considered determiner phrases: since proper nouns canonically have definite reference, and since this specification must come from somewhere, a null (i.e. phonologically unrealised) definite article is assumed to be present in the phrase structure.

[9] E.g. Young (1996: 135), who refers to articles as "the most frequent forms that are available to learners in input".

[10] We follow Nordanger (2017: 14, fn. 14) in considering Norwegian a [+ART] language “since the language possesses a grammatical category encoding the same semantic/pragmatic content as article languages such as English, French, and German”.

[11] For more details, see Bækken’s (2006: ch. 3) careful contrastive survey of the use of articles in English presented for a Norwegian undergraduate audience (and also e.g. Hasselgård et al. 2012: 119-126).

[12] To be specific, CORYL is only tagged for errors, which means that there is no syntactic or part-of-speech tagging.

[13] We exclude year-11 learners from consideration since students in the Norwegian system decide between vocational and academic study programmes after year 10.

[14] The purpose of the Norwegian National Tests is to acquire knowledge about the pupils’ basic skills in English, mathematics and reading. This knowledge should be the foundation of formative assessment and development of quality in the different levels of the school system (Norwegian Directorate for Education and Training 2018). The tests are currently (as of spring 2020) administered in years 5, 8 and 9.

[15] This was done by specifying a search for all the occurrences of the lexical items a, an and the (and spelling variants, including misspelled renderings, of these lexical items) which do not carry an ART tag.

[16] Since CORYL does not feature syntactic annotation, it is impossible to search for invisible syntactic categories such as the zero article. Thus, whenever contrastive data are provided, target-like uses of the zero article are omitted from consideration. This omission admittedly is unfortunate, since the accurate use of the zero article, or bare NPs, is obviously a highly relevant dimension of article use among the learners under investigation. This is, however, an unavoidable consequence of the make-up of the corpus used as source material for this study.

[17] It has subsequently been expanded to encompass 191,568 tokens.

[18] Some readers might view the addition of a reference corpus as a desideratum. While this would be useful, we do not consider it a crucial requirement.

[19] The Norwegian past tense form spiste means ‘ate’ in English.

[20] Note that this example must be read as conferring specific meaning (‘they have lit a fire’, ‘de har tent på bål’), not as involving generic reference (‘they discovered fire’).

[21] However, it is possible for personal names to take either a preproprial or a postproprial article in some Norwegian dialects, as shown in examples (iii) and (iv) (the examples were constructed by the authors on the basis of native-speaker competence): 

(iii) Han Halvor sa at han skulle stikke innom før jobb. 

‘Halvor said that he would stop by before work.’ 

(iv) Halvoren sa at han skulle stikke innom før jobb. 

‘Halvor said that he would stop by before work.’

[22] There is one sense in which the definite article would be possible here: if the learner is trying to express unique emphasis. However, this interpretation seems to be out of the question here, and hence, we consider (12) an example of overuse of the in agreement with the corpus annotators.

[23] For Tables 3 and 4, the percentages were calculated on the basis of the total number of target-like uses of the articles added to the total number of over- or underuses. Thus, cases of article misuse are excluded from the tables. As mentioned earlier, target-like uses of the zero article are not included in the data, as CORYL does not facilitate the extraction of such tokens.