JLLT edited by Thomas Tinnefeld
Journal of Linguistics and Language Teaching
Volume 8 (2017) Issue 1



Using Corpus Data in the Development of
Second Language Oral Communicative Competence

Randall Gess (Ottawa, Canada)


Abstract (English)

The present paper describes how a large corpus of spoken French, stemming from the international Phonology of Contemporary French (PFC) project, can be used in the development of second language oral communicative competence, with a non-exclusive focus on pronunciation. Following a brief overview of the PFC project, data from one survey point of this corpus will be provided, illustrating a widespread phenomenon in Canadian French: word-final cluster simplification. It will be shown how and to what ends the data can be exploited for classroom use. For students, the potential benefits of using corpus data are manifold. These include greater learner autonomy, quantitatively and qualitatively rich natural input, excellent points of comparison between written and oral language, exposure to numerous and diverse varieties of spoken French as well as to rich cultural information from across the francophone world and, last but certainly not least, a raised awareness of different dimensions of sociolinguistic variation.
Keywords: Spoken language corpus, oral communicative competence, pronunciation teaching

Abstract (Français)

Cet article décrit comment on peut utiliser un corpus important du français parlé, provenant du projet international Phonologie du Français Contemporain (PFC), dans le développement de la compétence communicative orale d’une deuxième langue, avec une attention non exhaustive attribuée à la prononciation. Suite à un bref survol du projet PFC, il est présenté des données d’un point d’enquête de ce corpus, qui illustre un phénomène répandu du français canadien: la simplification des groupes consonantiques finales. Il sera également démontré des chances d’exploitation des données dans la salle de classe, et les fins liées à celles-ci. Pour les étudiants, les avantages potentiels sont nombreux. Ceux-ci comprennent une autonomie d’apprentissage plus importante, de l’input naturel riche du point de vue quantitatif et qualitatif, d’excellents points de comparaison entre la langue écrite et la langue parlée, l’exposition à des variétés nombreuses et diverses du français parlé ainsi qu’à de riches informations culturelles de partout à travers la francophonie et enfin, une connaissance approfondie des différentes dimensions de variation sociolinguistique.
Mots clés: Corpus de langue parlée, compétence communicative orale, l’enseignement de la prononciation



1 Introduction

Given the impressive rise of corpus linguistics over the past few decades, the dearth of research on the use of corpora for the development of second language (L2) pronunciation, a crucial aspect of L2 oral communicative competence (OCC), is somewhat surprising. Works treating the use of corpora in the teaching of language and linguistics start to appear in the 1990s (Knowles 1990, Sinclair 2004, Wichmann et al. 1997). However, within this body of work the focus is largely on written corpora. Notable exceptions are chapters on using a spoken German corpus to determine things like vocabulary frequency, and on teaching intonation to students of English phonetics and phonology in Wichmann et al. (1997) (Jones 1997, Wichmann 1997); and chapters on authenticity, communicative utility, and formulaic expressions in English, and on the use of concordancing in the teaching of Portuguese in Sinclair (2004) (Mauranen 2004, Santos Pereira 2004). Of these works, none have a focus on pronunciation, although they do have relevance to OCC more generally, to varying degrees.

The first research evidence on the use of corpora for teaching pronunciation is, to my knowledge, the very short article by Gut (2005). Besides a PFC-related publication that will be discussed shortly, the only other available research product focused on using a corpus for teaching pronunciation is the website at the Hong Kong Institute of Education, A Corpus-Based Pronunciation Learning Website (Chen et al. 2014). It is interesting that the corpus of focus in both Gut and Chen et al. is a learner corpus: German-speaking learners of English and English-speaking learners of German in the case of Gut, and Chinese-speaking learners of English in Chen et al. The work of Detey et al. (2010), based entirely on the Phonologie du Français Contemporain (PFC) project (or rather on its off-shoot project, the PFC-Enseignement du Français or PFC-EF), therefore represents an important landmark development. It is important to note, however, that the focus of the PFC-EF is much broader than the teaching of pronunciation – it explores the use of the PFC corpus for the development of OCC generally, as well as for focusing on grammatical form, and even for the development of writing, the latter principally by way of explicit stylistic comparison between written and spoken language.

In this article, data from one PFC survey point will be provided, illustrating a single aspect of French phonology. The survey point is a community called Maillardville, in Coquitlam, British Columbia, and the aspect of French phonology is the reduction of word-final consonant clusters. Before the relevant data are presented, a brief overview of the PFC project will be given, followed by a basic description of word-final cluster simplification in French. After the data from the PFC survey point in question have been reviewed, it will be shown how they can be exploited in the classroom, and for what purposes. Finally, I will outline what I see as the many benefits of using corpus data for teaching aspects of pronunciation and other aspects of OCC.


2 The PFC Project

The goal of the PFC project (http://www.projet-pfc.net) is to describe the pronunciation of French in all its geographic, social, and stylistic diversity. To this end, the project seeks to build a vast corpus of French as it is spoken around the world, based on surveys conducted by an international team of researchers and their students, using a common protocol, as well as common methods and tools for analysis. The reasons for doing so are not purely descriptive. The envisaged corpus can also serve to test current models of phonetics and phonology, to encourage the sharing of research, to provide for the renewal of data informing the teaching of French, as well as simply to preserve a crucial part of the patrimoine linguistique of the francophone world (Durand, Laks & Lyche 2002, 2009).

The ideal PFC survey point involves 12 speakers representing both genders, a minimum of two age groups, and some differences in level of education and / or professional profile. Speakers complete four tasks:
  • a guided conversation designed to gather basic information about the speaker and her or his linguistic background;
  • a free conversation of approximately 30 minutes with a fellow member of the speech community;
  • the reading of a common word list of 94 words (plus an additional, tailored list of 115 words for Canadian speakers); and
  • the reading of a common text (an invented three-paragraph news article).
At least one interviewer per survey point should belong to (or be well known to) the community of speakers – this is usually the partner for the free conversation task. The four tasks are designed to elicit different registers of the language, from very careful and monitored (reading words in isolation) to casual and unmonitored (free conversation with someone familiar to the speaker). It should be noted that these are ideals that are not always achieved. For example, for the survey point to be discussed here, there was no interviewer known to the participants and, although one did live in the same town, she was not perceived as a member of the community.

Among the targets for analysis are the vowel inventory (contrasts, allophonic realizations), the consonant inventory (e.g., h-aspiré, the rhotic, the palatal nasal, allophonic realizations), realization of schwa (at prosodic boundaries, in different positions within the word, in monosyllables (clitics), in schwa sequences), realization of liaison (so-called “obligatory” and “optional”), and prosody. Of course, any other aspect of phonology may be analyzed, as is the case here, where we will look at the reduction of word-final consonant clusters.

The PFC-EF, an off-shoot of the larger PFC project, is designed principally for those teaching French and / or developing French pedagogical materials. PFC data is used to develop materials for the teaching / learning and the diffusion of French, representative of the variation found across the francophone world. The goal of the PFC-EF is to provide rich and diverse classroom materials for listening and speaking, for comparison to written language and to le français de référence (Morin 2000) as well as for analysis of variation that exists across the francophone world. These resources should be useful for teachers of French as a first, second, or foreign language. The audience assumed here is one consisting of university-level learners of French as a second language in a predominantly English-speaking part of Canada. The context envisioned here is a third-year course on oral expression, although any pedagogical suggestions made are easily transferable to other contexts with minor or major adaptations as required.



3 Word-Final Consonant Cluster Reduction in French

The simplification of word-final consonant clusters is a widespread phenomenon in French, and more so in Canadian French than in European varieties (see Milne 2016 for a close examination of the treatment of final clusters in the two dialects). Most clusters in question, and all of those that will be discussed here, are in final position following a historical deletion process targeting word-final schwa. Current scholarship therefore assumes the absence of an underlying final schwa in the relevant forms (Côté 2004), although its former presence continues to be reflected in orthography with a final e. When a schwa is pronounced following the clusters in question, it is considered to be epenthetic. While the assumption of underlying representations without a schwa is not absolutely critical, it does have implications for pedagogical approach and learning outcomes that will be discussed later.

For simplicity, the focus here is only on clusters of the type obstruent+liquid (OL) (for an extensive treatment of all types of word-final consonant clusters, the reader is referred to Côté (2004)). OL consonant cluster reduction is illustrated in the following examples taken from Ostiguy & Tousignant (2008), which are represented based on orthography (abbreviated in the case of the reduced forms). (Recall that the final orthographic e in the non-reduced forms is not pronounced):

possib (< possible)
sob (< sobre)
peup (< peuple)
prop (< propre)
souf (< souffle)
poud (< poudre)
règ (< règle)
let (< lettre)
spectac (< spectacle)
(Ostiguy & Tousignant 2008: 173)

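The pattern in these abbreviated spellings can be stated as a simple orthographic rule: drop the silent final e, drop the liquid (l or r) when it follows an obstruent, and simplify a doubled consonant letter. The following Python sketch is offered only as an illustration of that pattern. The function name `reduce_ol` and the set of obstruent letters are assumptions made here; the sketch operates on spelling rather than on phonological representations, and it is not taken from Ostiguy & Tousignant.

```python
# Toy orthographic illustration of obstruent+liquid (OL) cluster reduction.
# Assumption: spelling stands in for pronunciation; this is not a phonological model.

OBSTRUENT_LETTERS = set("pbtdcgkfv")  # assumed letter set, sufficient for the examples
LIQUID_LETTERS = set("lr")


def reduce_ol(word: str) -> str:
    """Return the abbreviated spelling of a word ending in obstruent + liquid + e."""
    stem = word[:-1] if word.endswith("e") else word  # silent final e
    if len(stem) >= 2 and stem[-1] in LIQUID_LETTERS and stem[-2] in OBSTRUENT_LETTERS:
        stem = stem[:-1]  # delete the liquid
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]  # a doubled letter (ff, tt) stands for one consonant
    return stem


examples = ["possible", "sobre", "peuple", "propre", "souffle",
            "poudre", "règle", "lettre", "spectacle"]
for w in examples:
    print(f"{reduce_ol(w)} (< {w})")
```

Words that do not end in an OL sequence are returned unchanged apart from the loss of a final e, so the sketch should not be applied blindly to arbitrary vocabulary.
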
Ostiguy & Tousignant describe word-final cluster simplification as being very common in spoken Quebec French, so much that even cultivated speakers or speakers attending to their speech will not notice it and will assign no negative judgement to it. Indeed, the phenomenon is so pervasive that the authors actually pose the question of whether the non-reduced variants should be brought to students’ attention. They come to an affirmative conclusion, saying that their existence in formal spoken Quebec French fully justifies doing so (2008: 177). Debating whether students should be guided to notice non-reduced variants pre-supposes that the reduced variants are the unmarked target form. Should the latter therefore serve as the pedagogical norm? The data discussed in the following section, as limited as they are, shed interesting light on this question, to which we return afterwards.


4 Data

The data in this section come from a single male speaker from the PFC survey point at Maillardville (Coquitlam, British Columbia). At the time of the recording (July 2006¹), the speaker was 62 years old. He was born in Winnipeg, Manitoba, to francophone parents, and the family moved to Maillardville when he was four years old. He is a highly educated speaker who holds an advanced degree in education and who, in his career, not only taught French but also played an important role in the development of French immersion programs in the public school system. It is the professional background of this speaker which determined the use of his data for the present study, so as to show that the nonstandard feature described and analyzed here is not at all representative of "unsophisticated" speech.

The data come from the four task conditions – the reading lists, the text, the guided conversation (a six-minute extract), and the free conversation (a six-minute extract). Recall that for Canadian survey points, there are in fact two reading lists. The first contains 94 items and targets a variety of sounds and contrasts considered as important points of potential variation across varieties. The second list consists of 115 items targeting phenomena more common in varieties of French in Canada, including word-final consonant cluster reduction. The guided conversation was conducted by the author of this article, who was a visitor to the community, unknown to any of its members. The second interviewer was the one described previously, living in the same town, but not perceived as a member of the Maillardville community, and unknown to any participants. It was this interviewer who engaged in the free conversation task with the speakers. Because she was unknown to the participants, little or no real difference is expected between the guided conversation and the free conversation.

4.1 The Word Lists

In the first word-list, there are only three items with the target sequences: peuple, meurtre, and feutre. Of these three, none are reduced by the speaker. Indeed, one of them, peuple, is produced with an epenthetic final schwa. However, in reading the list of words, participants are also asked to say aloud the number for each. The 94 items give seven instances of the word quatre (presented to the reader in number form (4), not orthographically): quatre (4), vingt-quatre (24), trente-quatre (34), quarante-quatre (44), cinquante-quatre (54), soixante-quatre (64), and quatre-vingt-quatre (84). Of these seven items, there is only one token without a reduced form, which happens to be the shortest and the first of the series, i.e., quatre. The astute reader will have noticed that there is a second token of quatre in quatre-vingt-quatre. In this case, the schwa is lexicalized in the collocation quatre-vingts, and its realization is exceptionless across speakers and varieties. Given that the form is lexicalized, we can consider the OL sequence as word-internal, not word-final.

The second list has many more target sequences, having been designed specifically for Canadian French, where final cluster reduction is noted to be more widespread. In fact, there are 19 such items, as follows:

mettre
maître
sable
libre
couple
ministre
neutre
jungle
tabernacle
le prêtre
aveugle
convaincre
vinaigre
orchestre
ombre
épingle
arbre
cent piastres sur la table

Of these 19 items, none is pronounced with a reduced cluster. Five are pronounced with a final audible schwa. With this longer list, there are eight instances of quatre (4): quatre (4), vingt-quatre (24), trente-quatre (34), quarante-quatre (44), cinquante-quatre (54), soixante-quatre (64), quatre-vingt-quatre (84), cent quatre (104). Of these, all are pronounced with a reduced cluster. The results of the word lists are summarized in Table 1:

Forms                     Word-List Items    quatre
reduced                   0 (0%)             14 (93%)
unreduced (no schwa)      16 (73%)           1 (7%)
unreduced (with schwa)    6 (27%)            0 (0%)

Table 1: Reduction of Final Clusters in Word Lists

It is clear from Table 1 that the list-reading condition disfavours reduction, since there are no reduced forms produced at all. Indeed, to the extent that OL sequences are not tolerated in word-final position (27%), they are resolved via schwa epenthesis rather than by reduction. For the number quatre, quite the opposite holds, with reduction being the near absolute rule (93%).

4.2 The Reading Text

The reading text contains 12 tokens with a final cluster, seven of which are instances of the same word, Ministre (from Premier Ministre). The other five items are titres, autre, membre, articles, and centre. Of the 12 tokens, ten are pronounced with a reduced cluster (indeed, in six of them, the cluster is deleted). Only autre and centre are produced without reduction, and both with an epenthetic schwa. These items occur in the sequences un autre côté and au centre d’une bataille. All three of the reduced forms similarly occur before a consonant, so the phonological environment appears not to be relevant.

On close inspection, one is tempted to say that Ministre has been lexicalized with the entire OL sequence deleted (i.e., as [minis]). The word is pronounced with a [t] only once, in the second token, where a vowel follows. However, a vowel also follows in the first token, and no [t] is pronounced. In the lone token with the [t], the target sequence is ‘Le Premier Ministre a en effet décidé… ,’ and the speaker stumbles quite noticeably. Immediately following the sequence [minis], he pronounces a glottalized [t] followed by the incorrect vowel [ɛ], followed by another, clearer [t] followed by [a], then a longish glottal stop followed by the correct formulation for ‘a en effet’. Otherwise, the form [minis] occurs in all environments: before a vowel, before a consonant, at the end of a phonological phrase, and at the end of a phonological utterance.

The results of the reading text are summarized in Table 2:

Forms                     Ministre²    Other Tokens
deleted                   7 (100%)     0 (0%)
reduced                   0 (0%)       3 (60%)
unreduced (no schwa)      0 (0%)       0 (0%)
unreduced (with schwa)    0 (0%)       2 (40%)

Table 2: Deletion and Reduction of Final Clusters in the Reading Text

Again, we see that in the form Ministre, deletion is categorical, and perhaps for the entire OL sequence rather than just for the liquid member of the cluster. Otherwise, it appears that the reading text condition favours reduction, 60% to 40%, although the number of tokens is rather limited. Interestingly, no OL clusters surface unaltered – they are either reduced or followed by an epenthetic schwa.

4.3 The Guided Conversation

Seven tokens with final consonant clusters occur in the six-minute extract from the guided conversation. The words that appear are eux-autres, couvre, autre, favorable, prendre, peut-être, and kilomètres.³ Of the seven tokens, five are produced with a reduced cluster. The two non-reduced tokens are in couvre (“… ça couvre la la …”) and kilomètres (“… une vingtaine de kilomètres d’ici…”). All of the other forms occur before a consonant, with the exception of peut-être, which occurs before the hesitation form euh (and is reduced). As with the reading text, then, phonological environment appears not to be relevant to reduction. The results of the guided conversation are presented in Table 3:

Forms                     Tokens
reduced                   5 (71%)
unreduced (no schwa)      0 (0%)
unreduced (with schwa)    2 (29%)

Table 3: Reduction of Final Clusters in the Guided Conversation

In the guided conversation, reduction is clearly favoured, at 71%. It is also noteworthy that, as was the case for the reading text, OL clusters never surface unaltered – when not reduced, they are pronounced with a following epenthetic schwa.

4.4 The Free Conversation

There are 16 tokens with final consonant clusters in the six-minute extract from the free conversation. The relevant words are nous-autres (x4), apprendre, autre (adj.) (x3), autres (n.) (x2), répondre, notre, exemple, incroyable, peut-être, and favorable. Of the 16 tokens, 11 are reduced. The results of the free conversation are presented in Table 4:

Forms                     Tokens
reduced                   11 (68.75%)
unreduced (no schwa)      3 (18.75%)
unreduced (with schwa)    2 (12.5%)

Table 4: Reduction of Final Clusters in the Free Conversation

The pattern of reduction in the free conversation quite closely mirrors the pattern in the guided conversation (as we expected, given that we did not have a real community member to participate in the free conversation).

4.5 Discussion

The question we asked before presenting the data was whether reduced forms should serve as the pedagogical norm. To facilitate consideration of this question, Table 5 provides a synthesis of the information from Tables 1 through 4, leaving out the tokens of quatre (4) from the word-list condition and those of Ministre from the reading text condition, for the same reasons they were treated separately in Tables 1 and 2:

Forms                     Word lists    Text    Conversations
reduced                   0%            60%     70%
unreduced (no schwa)      73%           0%      13%
unreduced (with schwa)    27%           40%     17%

Table 5: Reduction of Final Clusters across Conditions (non-lexicalized tokens only)
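The percentages in Table 5 can be recomputed from the token counts reported in Tables 1 through 4. The short Python sketch below does so on the assumption, which the figures bear out, that the Conversations column pools the tokens of the guided and the free conversation. The variable and function names are chosen here for illustration only.

```python
# Token counts transcribed from Tables 1 through 4 (quatre and Ministre excluded).
# Order of counts: reduced, unreduced (no schwa), unreduced (with schwa).
LABELS = ("reduced", "unreduced (no schwa)", "unreduced (with schwa)")

word_lists = (0, 16, 6)   # Table 1, word-list items
text = (3, 0, 2)          # Table 2, other tokens
guided = (5, 0, 2)        # Table 3
free = (11, 3, 2)         # Table 4

# Assumption: the two conversation conditions are pooled token by token.
conversations = tuple(g + f for g, f in zip(guided, free))


def percentages(counts):
    """Convert raw counts to whole-number percentages of the column total."""
    total = sum(counts)
    return tuple(round(100 * n / total) for n in counts)


table5 = {
    "Word lists": percentages(word_lists),
    "Text": percentages(text),
    "Conversations": percentages(conversations),
}

for condition, row in table5.items():
    cells = ", ".join(f"{label}: {p}%" for label, p in zip(LABELS, row))
    print(f"{condition}: {cells}")
```

Pooling the conversations gives 23 tokens in all (16 reduced, 3 unreduced without schwa, 4 unreduced with schwa), which rounds to the 70%, 13%, and 17% shown in the table.
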

From the limited data from this single speaker, it would seem that a strong argument can be made for the reduced form serving as the pedagogical norm, at least for spoken French in Canada. A clear majority of word-final clusters are reduced both in natural (i.e., non-list) reading and in conversations (more or less formal to the extent that formality varied in the conversations for this survey point). This is a highly conservative count, given that the number quatre (4) is reduced in the word-list reading task at a rate of 93%, and Ministre is reduced without exception in the reading text condition, and virtually to the point of having no relevant OL sequence! Only in the reading of the word-lists were final clusters unreduced. Clearly, we do not want learners to sound stilted in natural speaking conditions, and this seems a real risk if we use the unreduced form as a pedagogical norm.

On the other hand, in order to guide learners to the correct lexical representations (unreduced, without schwa), exposure to the word-list data appears crucial. It is only from this data that learners can be reasonably expected to arrive at the underlying form without schwa since it is a minority variant in conversation and does not occur in the reading text at all, whereas the unreduced form with schwa occurs as a minority pronunciation in all conditions, and a fairly robust one in the reading conditions (the word-lists and especially the text). Learners could be led to the correct form by way of formal explanation – i.e., that variant pronunciations can be derived from the unreduced form without schwa via a simple, one-step change in one of two possible directions (consonant deletion or schwa epenthesis) – but our goal is to focus on pronunciation, not to train phonologists! Presenting the forms from the word lists is a far less frustrating way to guide learners to the correct lexical representation.

It is legitimate to ask whether it is even necessary to attend to the development of underlying lexical forms, but if a learner’s base form contains a final schwa, there is a risk of highly unnatural, hyper-articulated speech, the avoidance of which is an important goal of pronunciation training. On the other hand, if a learner posits the reduced form as the lexical representation, the risk is of inappropriately informal speech across contexts, as well as of outright pronunciation errors if an incorrect second consonant of a given cluster is realized in some instances (if the second consonant is not in the lexical representation, its accurate recovery in production is unsure). Another point to consider is that students will almost certainly come with some relevant lexical representations already formed. Part of the goal may therefore be to correct those that are wrong and give rise to unnatural pronunciations.

The up-shot of the preceding discussion is that it is important to present to learners a full range of appropriate input across speech styles – precisely what the corpus affords (rich input is discussed in some detail below). The arguments presented here with respect to word-final clusters are easily transferable to other aspects of pronunciation.


5 Exploiting the Data for Teaching Pronunciation

We now turn to the question of how the data we have seen in the previous section can be put to use for teaching pronunciation. In fact, corpus data can be useful to at least three of four crucial aspects of pronunciation learning: creating lexical representations, understanding pronunciation rules (i.e., developing declarative knowledge), and developing automaticity. Its usefulness to a fourth aspect of learning – developing procedural knowledge – is not obvious, although some researchers consider the step from declarative knowledge to procedural knowledge somewhat trivial in the case of at least some aspects of pronunciation. For example, Dalton & Seidlhofer assert that, “once learners know that simplifications are normal, they are often able to convert this knowledge into active, procedural knowledge with astounding ease” (Dalton & Seidlhofer 1994: 116). Reed (2016) provides two simple classroom strategies for bridging “the declarative to procedural knowledge gap” (Reed 2016: 237) for more problematic aspects of pronunciation. Of course in our case, the procedural knowledge involved seems quite unproblematic – it simply entails the inhibition of articulators in a certain specific phonological structure.

The following sections, in turn, focus on creating lexical representations through the provision of rich input, building declarative knowledge, and developing automaticity.

5.1 Providing Rich Input

A key aspect of teaching pronunciation is modeling – the provision of input targeting a specific form or structure. The most obvious benefit of using a large corpus is that it affords a quantitative and qualitative richness in terms of modeling that is unparalleled. In the case of word-final clusters, and our single speaker from our single survey point, we have 75 relevant tokens. There are 16 other speakers from our survey point, although the norm is 12, and there are 38 survey points with data made available online on the PFC project website referred to earlier. For each survey point, one can access data (sound files and transcription) by speaker for the word list(s), the reading text, six minutes of the guided conversation, and six minutes of the free conversation. This adds up to a large quantity of valuable data.

In terms of quality, we have seen that the PFC protocol results in data from up to four speech styles, depending on how distinct the guided and free conversations are (which, in turn, depends on whether the conversation partner for the free conversation is previously known to the speaker – a weak point of our own survey point, as discussed earlier). The data available through the PFC project also span a number of dialect areas across the francophone world, therefore providing ready access to data that was previously extremely difficult to come by, if not impossible in some cases. The naturalistic nature of the input also contributes to its high quality. While it is certainly true that reading a word list and reading a prepared text are not generally considered 'authentic language', these are instances of native speakers reading for a non-pedagogical purpose. And, of course, the data from these contexts are complemented by the data from conversations. The unquestionably naturalistic nature of the latter ensures (or at least militates in favour of) variety in terms of the phonological environments that target phonological forms occur in. So the form in question will be heard not only in phrase-final position but also (as we have seen) phrase-internally before both consonants and vowels, at the end of a phonological phrase, and at the end of a phonological utterance.

The most important role of the quantitatively and qualitatively rich input in the acquisition process is to facilitate the building of lexical representations. If we follow Bybee (2000, 2001) in assuming that lexical representations are basically clusters of exemplars that can be continuously updated, then exposure to multiple tokens (exemplars) serves to enrich representations, providing phonetic detail and information regarding permissible surface variations. Exposure over time and across different contexts will aid learners in associating certain exemplar types (i.e., full or reduced forms) to appropriate speech contexts (more or less formal).

Students can be exposed to speech samples directly from the PFC site, with simultaneous audio and transcription. One potential drawback to direct access, depending on the input-providing activity envisioned, is that the speech samples are not tailored to individual phonological structures or processes but rather, by design, to a wide variety of them. For example, in our case of word-final clusters, these are found amongst a multitude of non-target items. It may not be a beneficial use of time to have students listen to entire extracts for only a few tokens. This is most obvious in the case of the word lists, in which there are only 22 relevant tokens amongst 209 items (the ratio is much worse for the sole word list used outside of Canada, at 3 / 94). More important than the ratio of target items to non-target ones, there is not much benefit to listening to a list of words being read, compared to listening to a reading text, or especially to natural conversation about a topic that may well be of inherent interest. So, even if the conversations do not have a high number of target items, important side benefits can be had from listening to them.

If using speech samples directly from the PFC site is not ideal for a given purpose, tokens from any or all of the four conditions can also be easily extracted (recorded) from the PFC audio files through freely available software such as Audacity® or Praat (Boersma & Weenink 2016). Tokens can then be presented to students for the purpose of building lexical representations in a variety of ways. They can be presented with or without reference to their written forms, they can be presented in isolation or in context (of course those tokens from the word list(s) are inherently in isolation), and different surface variants can be presented when and as desired.
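For instructors who prefer to script the extraction rather than use a graphical tool, the same operation can be sketched in a few lines of Python using only the standard library. This is a minimal illustration and not a substitute for Audacity or Praat: it assumes that the recording is an uncompressed PCM WAV file and that the start and end times of the token have already been identified, and the function name `extract_token` and the file names in the usage comment are hypothetical.

```python
# Minimal sketch: copy one time span out of a WAV file using the standard library.
# Assumptions: the recording is an uncompressed PCM WAV file, and the start and end
# times (in seconds) have been read off the transcription or a waveform display.
import wave


def extract_token(source_path, target_path, start_s, end_s):
    """Write the audio between start_s and end_s (seconds) to target_path."""
    with wave.open(source_path, "rb") as src:
        rate = src.getframerate()
        first = int(start_s * rate)
        last = min(int(end_s * rate), src.getnframes())
        count = max(last - first, 0)
        src.setpos(first)  # raises wave.Error if the start lies outside the file
        frames = src.readframes(count)
        with wave.open(target_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return count  # number of frames copied


# Hypothetical usage (file names and times are placeholders, not PFC identifiers):
# extract_token("speaker_text.wav", "ministre_token1.wav", 12.40, 13.05)
```

The extracted files can then be ordered, grouped by variant, or paired with their written forms as the activity requires.
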

5.2 Building Declarative Knowledge

Students’ exposure to multiple variants of tokens in context allows them, over time, to ascertain the phonological and stylistic factors conditioning variation. The beauty of the corpus in this regard is that it allows for the development of this declarative knowledge through exploration on the part of the learner, which can be guided to a greater or lesser extent by the instructor. This explorative use of the corpus simultaneously provides input for building and enriching lexical representations (Section 5.1) and for constructing declarative knowledge, the latter relying absolutely on the context in which variants occur. That is, while lexical representations can be developed through exposure to different variants in isolation, learning the rules that produce them depends on hearing them in their conditioning environments, whether these be speech style (on the formal to informal continuum, corresponding to word list(s), reading text, guided and free conversations), or phonological context (pre-vocalic, pre-consonantal, phrase-internal, phrase-final, etc.), and preferably both. Exposure to a range of speech styles is, of course, a built-in feature of the PFC corpus, and exposure to a range of phonological contexts is heavily favoured – by design in the reading text, at least for some phenomena (realization of schwa, liaison) and in relation to natural frequency of occurrence in the six minutes each of guided and free conversations.

With respect to word-final consonant clusters, one can begin by exposing students to a speech sample from conversation, in which variants reflecting the pedagogical norm are in preponderance. At first, the focus on form can be passive, couched in a listening comprehension activity with a primary focus on the content of the conversation. Following this can be a more explicit focus on form, with students actively listening for reduced OL clusters. They can first be asked to identify relevant words in the written transcript, underlining them, and then to listen for the pronunciations on subsequent replays. Then students and instructor can discuss the variants and their quantitative distribution, and come up with hypotheses to explain these (i.e., possible pronunciation rules). The instructor is obviously free to tailor any such activity as he or she deems appropriate, and the focus on form can be broader – on any reductions involving consonants, or indeed on reductions more generally. This will depend on the type of course and its overall goals, time allotted to pronunciation training versus other aspects of OCC, etc.

Keeping our focus on OL clusters, one can then move to the reading text and then to the word lists. Before moving to each new condition, students can be asked to make predictions regarding what they will hear, taking into account the nature of the task the speaker is engaged in. After working with the reading text, comparison can be made with the conversation, and hypotheses made earlier can be revisited and adjusted as necessary. For the word lists, playing isolated realizations of the number quatre (4) would be a good way to begin, to reinforce what was found in the naturalistic speech of conversations, before moving to the relevant words extracted from the word lists. Overall comparisons can follow the word-list activity, and final pronunciation rules can then be arrived at.

Another possible way to approach the students’ discovery process would be to have them explore the online corpus themselves (with appropriate instruction in how to do so, of course). They can be directed to find relevant tokens in each condition, identify variants and quantify their distributions, and come up with hypotheses to explain them. This kind of activity can be done outside of class and, depending on the structure of the course and its constraints (including its size), students can be assigned to work with different target structures and to report back their findings to the class. The corpus provides unique potential for precisely this kind of autonomous learning.
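The quantification step in such an activity can be kept very simple. The sketch below, again in Python, assumes that students have coded each token they found as a (condition, word, variant) row. The rows shown are invented placeholders and are not PFC results, and the names SAMPLE_ROWS and reduction_rates are hypothetical.

```python
from collections import Counter, defaultdict

# Invented placeholder rows for illustration only; these are NOT PFC results.
# Each row is (condition, word, variant), as a student might code one token.
SAMPLE_ROWS = [
    ("conversation", "quatre", "reduced"),
    ("conversation", "autre", "reduced"),
    ("conversation", "table", "unreduced"),
    ("reading text", "ministre", "reduced"),
    ("reading text", "autre", "unreduced"),
    ("word list", "quatre", "unreduced"),
]


def reduction_rates(rows):
    """Return {condition: (reduced_count, total_count, reduced_share)}."""
    counts = defaultdict(Counter)
    for condition, _word, variant in rows:
        counts[condition][variant] += 1

    summary = {}
    for condition, tally in counts.items():
        total = sum(tally.values())
        reduced = tally["reduced"]  # a Counter returns 0 for a missing key
        summary[condition] = (reduced, total, reduced / total)
    return summary


for condition, (reduced, total, rate) in reduction_rates(SAMPLE_ROWS).items():
    print(f"{condition}: {reduced}/{total} reduced ({rate:.0%})")
```

Comparing the resulting shares across conditions gives students a concrete basis for their hypotheses about the effect of speech style.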

5.3 Developing Automaticity

Repetition is clearly a key ingredient for developing automaticity, and the word lists are of obvious utility for this purpose, but single words and / or short collocations can also be extracted from any of the conditions. Beyond simple repetition, practice reading is a perfect next step, and the reading text provides great modeling for this purpose. After listening closely to the text as read by one or more PFC speakers, learners can practice reading it themselves. Learners can do so in pairs or groups, recording their own and others’ readings of the text, and comparing the recordings to those of the PFC speaker model(s) they have used.

For use with PFC conversations, two useful, more advanced activities for developing automaticity are shadowing (Dauer 2004, Grant 2000, Quarterman & Boatwright 2003) and mirroring (Dauer 2004, Monk, Lindgren & Meyers 2003). In a shadowing activity, a learner repeats word for word what a speaker says, following by just a word or two. This is a holistic activity, in which learners must pay close attention to imitating as exactly as possible not only the words spoken (including aspects like OL cluster reduction), but every aspect of pronunciation, including rate and rhythm, pauses and hesitations, prominence patterns, and phrasal intonation. With the right equipment, learners can access the input for this activity through headphones and simultaneously record themselves, analyzing the results afterwards in comparison with the model recording.

Mirroring is similar to shadowing, but with more explicit attention to linguistic features ahead of time. Learners transcribe around a minute’s worth of speech (or take a transcript of speech available through the PFC site), and carefully annotate intonation contours, prominences, hesitations, and pauses. PFC transcriptions are based on normal orthography and so do not indicate segmental phenomena like assimilation, deletion, or lengthening, any or all of which can also be annotated, as desired. Learners then practice mirroring the speech as precisely as possible, eventually recording themselves and evaluating the product (with or without the assistance of peers).

A great extension of these types of activities using the PFC corpus is to have students act as if they themselves are subjects of the study, participating in the guided and free conversation components. Students can interview each other, using questions of the type asked in the guided conversations, and / or they can simply engage in free conversation. Depending on the amount of time that can be allotted to this type of activity, these conversations can be recorded, transcribed and analyzed for the feature(s) treated in class.


6 Exploiting the Corpus for Other Aspects of OCC

As mentioned earlier, the PFC-EF project is designed specifically to explore the use of the PFC corpus for pedagogical purposes. These purposes are broad, and they are not all related to teaching pronunciation. In fact, of the nine pedagogical sheets (fiches pédagogiques) available for download (http://www.projet-pfc.net/ressources-didactiques/fichespedago.html), very few have activities with an explicit focus on pronunciation per se (i.e., in the procedural sense). Certainly, all provide rich input to learners, and several have activities focusing on building declarative knowledge (for example, of obligatory / optional liaisons, and the realization / deletion of the schwa), but there is little related to basic production (one activity calls for repetition of verb forms in casual versus more formal speech), or to developing automaticity as described in Section 5.3 (one imitation exercise contrasting two regional varieties). Nevertheless, most activities found in these documents are in some way more generally related to the development of OCC.

A commonly stated objective of the sheets is to sensitize learners to phonological variation, including to prejudices that exist with respect to certain geographical varieties, or features thereof. Another common objective is to draw learners’ attention to markers of spontaneous oral discourse (and in so doing, to stylistic variation itself). Noticing these markers may happen implicitly (learners hear them, and they see them in the transcripts when provided), but it is sometimes done explicitly as well, either by pointing them out, or by asking students to identify them in a transcript. One of the pedagogical sheets has as an explicit goal to develop awareness of register, in this case as manifested at the level of lexis. All of these types of awareness – of geographic and stylistic variation, and of features of oral as opposed to written discourse – are important aspects of OCC.

Probably the most common activity in the pedagogical sheets, and one of obvious relevance to the development of OCC, is the listening comprehension activity. This is often the first activity in a sequence, sometimes preceded by some type of predicting exercise, to activate schemata. The conversations are ideally suited to this purpose, as topics vary considerably, and are often full of cultural information one may not easily find elsewhere. (In our case, the speaker discussed the history of bilingual education in the province of British Columbia.)

Other activities in the pedagogical sheets focus explicitly on phonological, grammatical, or even orthographic form (the latter to build declarative knowledge with respect to spelling-to-pronunciation regularities). Possibilities with respect to focus on phonological form are manifold (reduction of OL clusters is one example), and the corpus offers an unparalleled richness in this regard since its raison d’être is phonological analysis. It is perhaps not as obviously the case, then, that the corpus is equally rich when it comes to focus on grammatical form – the guided and free conversations particularly so. The beauty here is that learners see grammar in completely natural contexts and witness how grammatical structures function in real communicative situations. Yet another activity found in the pedagogical sheets is one in which students take content from a segment of conversation and use it to write a text representative of some form of written language. The form of written language students are asked to produce can range from what one might find on a postcard to something far more formal, with expected features of the language changing accordingly, and all being quite distinct from the spoken form.

If we use the pedagogical sheets as a model, whether specific pedagogical applications of the corpus focus on advanced listening skills, attending to phonological or grammatical form, or even on the development of writing, they will be tailored to a specific intended audience (defined with respect to competence according to the Common European Framework of Reference for Languages), and will have clear objectives with respect to phonological, grammatical, sociolinguistic and / or discourse features. Objectives will be met through a variety of tasks, using specified PFC material. The content from the guided and free conversations can be used to organize lessons around themes, or the varieties of language to be explored can drive the organization. At all levels, there is much freedom on the part of instructors and, to the extent that they wish to use the pedagogical sheets as models, most of them will have their own expertise to draw on in adapting them. What the PFC corpus brings to the table is a wealth of raw material to draw on.


7 Benefits of Using Corpus Data in the Development of OCC

People may have differing views with respect to the principal benefits of using corpus data for teaching pronunciation. For example, where one instructor sees a tremendous benefit in exposing students to a wide range of varieties of French, and certainly in exposing students in Canada to Canadian varieties, others with more prescriptive proclivities may disagree. We should take seriously the problem identified in Auger (2002), whereby students leaving French immersion programs in Montreal are unable to interact with speakers in the communities in which they live and work, because the pedagogical norm they are exposed to is too distant from the language used in the community, as the author demonstrates. Learners of French in any part of Canada should be equipped (i.e., have the OCC) to interact with French-speaking Canadians. The PFC corpus offers authentic language to familiarize learners with this variety of the language amongst others, and it can be the focus of attention to a lesser or greater extent depending on the specific goals of the instructor, or more importantly, the learners in question.

Another huge benefit of working with a corpus is that it makes students aware of language in a general sense. If we look at Tables 1 through 4, corresponding to the four conditions of the PFC project, we see a number of discoverable aspects of language from just this limited data. Already from the data summarized in Table 1, students may sense a frequency effect – i.e., that a very commonly used word, or one that occurs with a high frequency in a given context, like quatre, will be subject to reduction processes to a much higher degree than other words that are less frequent. They may also notice where quatre does not reduce – in its first occurrence in the reading of the first word list – and hypothesize the relevance of repetition on reduction. Another phenomenon for which learners may notice evidence in the word-list condition is lexicalization, as in the form quatre-vingts, where reduction does not occur in Canadian varieties. In all of these instances, an instructor may wish to point these things out to learners rather than relying on them to notice.

From the data summarized in Table 2, there is another example of apparent lexicalization, with the form Ministre, which is pronounced [minis] with only one exception. The exception provides another valuable lesson, demonstrating the messiness of naturalistic data. Yet another case of lexicalization is apparent from the data summarized in Table 3, that of Notre-Dame-de-Lourdes. The combined data from Tables 1-4, of course, provide information about variation across styles of speech, namely that reduction is more likely in less formal conditions than in more formal ones like reading from a text, or especially reading words from a list. This type of metalinguistic knowledge is useful to learners as they work towards advanced OCC, and it is also inherently interesting, and so likely to keep them engaged in the learning process.

There is unlikely to be disagreement with respect to the potential of the corpus to provide quantitatively and qualitatively rich input, and we have extensively described the benefits this brings to the teaching of pronunciation as well as to the development of other aspects of OCC. The guided and free conversations provide a wealth of authentic language, much of which is also of cultural interest. Another indisputable fact is that the corpus data is ideal for raising awareness of different dimensions of sociolinguistic variation, since its design was intended to show precisely this.

Another benefit to mention is that the corpus promotes learner autonomy and learning through a discovery process. Learners can explore the corpus on their own, and they can discover interesting aspects of the language (and language in general) on many levels. Learners can, to a greater or lesser extent, be guided in what they do with the corpus by specially designed pedagogical materials, but what they learn is almost certain to extend beyond the specified goals of a given activity. Further, the inherent interest of the corpus is likely to encourage learners to engage with it beyond what might be specifically assigned.

Finally, it is worth highlighting the perhaps unintuitive usefulness of the spoken corpus for developing writing skills. This potential can be exploited by explicitly comparing spoken and written language, and by having students convert informational content from conversations into written text.



References

Auger, Julie (2002). French immersion in Montréal: Pedagogical norm and functional competence. In: Gass, Bardovi-Harlig, Sieloff Magnan, & Walz (Eds.) (2002). Pedagogical Norms for Second and Foreign Language Learning and Teaching. Amsterdam: John Benjamins, 81-101.

Boersma, Paul & David Weenink (2016). Praat: doing phonetics by computer [Computer program]. Version 6.0.15. (http://www.praat.org/; 23-05-2016).

Bybee, Joan (2000). The phonology of the lexicon: Evidence from lexical diffusion. In: Barlow & Kemmer (Eds.) (2000). Usage-based models of language. Stanford: CSLI, 65-85.

Bybee, Joan (2001). Phonology and language use. Cambridge: Cambridge University Press.

Chen, Hsueh Chu, Lixun Wang, Pui Man Jennie Wong & Ka Yin Chan (2014). The Spoken Corpus of the English of Hong Kong and Mainland Chinese learners. The Hong Kong Institute of Education. (http://corpus.ied.edu.hk/phonetics/; 23-05-2016).

Côté, Marie-Hélène (2004). Consonant cluster simplification in Québec French. In: Probus, 16 (2004) 2, 151-201.

Dalton, Christiane & Barbara Seidlhofer (1994). Pronunciation. Oxford: Oxford University Press.

Dauer, Rebecca M. (2004). Ways of using video: A report from TESOL’s 2003 convention. In: SPLIS Newsletter. As We Speak 1 (2004) 1, 9-10.

Detey, Sylvain, Jacques Durand, Bernard Laks & Chantal Lyche (Eds.) (2010). Les variétés du français parlé dans l’espace francophone. Ressources pour l’enseignement. Paris: Éditions Ophrys.

Durand, Jacques, Bernard Laks & Chantal Lyche (2002). La phonologie du français contemporain: usages, variétés et structure. In: Pusch, C.D. & W. Raible (Eds.) (2002). Romanistische Korpuslinguistik - Korpora und gesprochene Sprache / Romance Corpus Linguistics - Corpora and Spoken Language. Tübingen: Gunter Narr Verlag, 93-106.

Durand, Jacques, Bernard Laks & Chantal Lyche (2009). Le projet PFC: une source de données primaires structurées. In: Durand, Laks & Lyche (Eds.) (2009). Phonologie, variation et accents du français. Paris: Hermès, 19-61.

Grant, Linda (2000). Well said: Pronunciation for clear communication. Boston: Heinle & Heinle.

Gut, Ulrike (2005). Corpus-Based Pronunciation Training. Proceedings of the Phonetics Teaching and Learning Conference, London: University College London. (https://www.ucl.ac.uk/pals/study/cpd/cpd-courses/ptlc/proceedings_2005).

Jones, Randall (1997). Creating and Using a Corpus of Spoken German. In: Wichmann, Fligelstone, McEnery & Knowles (Eds.) (1997). Teaching and Language Corpora. Harlow: Addison Wesley Longman, 146-156.

Knowles, Gerald (1990). The Use of Spoken and Written Corpora in the Teaching of Language and Linguistics. In: Literary & Linguistic Computing 5 (1990) 1, 45-48.

Mauranen, Anna (2004). Spoken Corpus for an Ordinary Learner. In: Sinclair (Ed.) (2004). How to Use Corpora in Language Teaching. Amsterdam: John Benjamins, 89-105.

Milne, Peter (2016). The variable pronunciations of word-final consonant clusters in a force aligned corpus of spoken French. Paper presented at the Montréal-Ottawa-Laval-Toronto Phonology Workshop, Carleton University, Canada.

Monk, J., C. Lindgren & M. Meyers (2003). The mirroring technique in prosodic acquisition. Paper presented at the 37th Annual TESOL Convention, Baltimore, USA.

Morin, Yves-Charles (2000). Le français de référence et les normes de prononciation. In: Le Cahier de l’Institut de linguistique de Louvain 26 (2000) 1, 91-135.

Ostiguy, Luc & Claude Tousignant (2008). Le français québécois: normes et usages (2nd ed.). Montreal: Guérin.

Quarterman, Carolyn & C. Boatwright (2003). Helping pronunciation students become independent learners. Paper presented at the 37th Annual TESOL Convention, Baltimore, USA.

Reed, Marine (2016). Teaching talk and tell-backs: The declarative to procedural knowledge interface. In: Levis, Le, Lucic, Simpson & Vo (Eds.) (2016). Proceedings of the 7th Pronunciation in Second Language Learning and Teaching Conference. Ames, IA: Iowa State University, 237-244.

Santos Pereira, Luisa Alice (2004). How to Use Corpora in Language Teaching. In: Sinclair (Ed.) (2004). How to Use Corpora in Language Teaching. Amsterdam: John Benjamins, 109-122.

Sinclair, John (2004). How to Use Corpora in Language Teaching. Amsterdam: John Benjamins.

Wichmann, Anne (1997). The Use of Annotated Speech Corpora in the Teaching of Prosody. In: Wichmann, Fligelstone, McEnery & Knowles (Eds.) (1997). Teaching and Language Corpora. Harlow: Addison Wesley Longman, 211-223.

Wichmann, Anne, Steven Fligelstone, Tony McEnery, & Gerry Knowles (Eds.) (1997). Teaching and Language Corpora. Harlow: Addison Wesley Longman.



Author:
Randall Gess, Ph.D.
Professor / Professeur titulaire
School of Linguistics and Language Studies
Département de français
1618 Dunton Tower
Carleton University
1125 Colonel By Drive
Ottawa ON K1S 5B6
Email: randall.gess@carleton.ca


1 The relatively long gap between the retrieval of our data and their methodological analysis here is partly due to the priority of phonological data analysis in the PFC project, with any further applications, such as methodological ones, being of secondary importance only.
2 Actual forms of Ministre produced were as follows: [minisʔiɹa], [ministʔɛ], [minis(ʔ#)lɑse], [minisʔnə], [minis(ʔ#)lə], [minis##], [minispuɹ].

3 Two instances of Notre-Dame-de-Lourdes occur, in which the OL sequence of Notre is unreduced and followed by schwa. One can assume this form to be lexicalized with schwa like quatre-vingts above, and so the OL sequence is excluded from our data as word-internal rather than word-final.