Volume 8 (2017) Issue 1
Using Corpus Data in the Development of Second Language Oral Communicative Competence
Randall Gess (Ottawa, Canada)
Abstract (English)
The
present paper describes how a large corpus of spoken French, stemming
from the international Phonology of Contemporary French (PFC)
project, can be used in the development of second language oral
communicative competence, with a non-exclusive focus on
pronunciation. Following a brief overview of the PFC project, data
from one survey point of this corpus will be provided, illustrating a
widespread phenomenon in Canadian French: word-final cluster
simplification. It will be shown how and to what ends the data can be
exploited for classroom use. For students, the potential benefits of
using corpus data are manifold. These include greater learner
autonomy, quantitatively and qualitatively rich natural input,
excellent points of comparison between written and oral language,
exposure to numerous and diverse varieties of spoken French as well
as to rich cultural information from across the francophone world
and, last but certainly not least, a raised awareness of different
dimensions of sociolinguistic variation.
Keywords:
Spoken language corpus, oral communicative competence, pronunciation
teaching
Abstract (Français)
Cet
article décrit comment on peut utiliser un corpus important du
français parlé, provenant du projet international Phonologie du
Français Contemporain (PFC), dans le développement de la compétence
communicative orale d’une deuxième langue, avec une attention non
exhaustive attribuée à la prononciation. Suite à un bref survol du
projet PFC, il est présenté des données d’un point d’enquête
de ce corpus, qui illustre un phénomène répandu du français
canadien: la simplification des groupes consonantiques finales. Il
sera également démontré des chances d’exploitation des données
dans la salle de classe, et les fins liées à celles-ci. Pour les
étudiants, les avantages potentiels sont nombreux. Ceux-ci
comprennent une autonomie d’apprentissage plus importante, de
l’input naturel riche du point de vue quantitatif et qualitatif,
d’excellents points de comparaison entre la langue écrite et la
langue parlée, l’exposition à des variétés nombreuses et
diverses du français parlé ainsi qu’à de riches informations
culturelles de partout à travers la francophonie et enfin, une
connaissance approfondie des différentes dimensions de variation
sociolinguistique.
Mots
clés: Corpus de langue parlée, compétence communicative orale,
l’enseignement de la prononciation
1 Introduction
Given the
impressive rise of corpus linguistics over the past few decades, the
dearth of research on the use of corpora for the development of
second language (L2) pronunciation, a crucial aspect of L2 oral
communicative competence (OCC), is somewhat surprising. Works
treating the use of corpora in the teaching of language and
linguistics began to appear in the 1990s (Knowles 1990, Sinclair
2004, Wichmann et al. 1997). However, within this body of work the
focus is largely on written corpora. Notable exceptions are chapters
on using a spoken German corpus to determine things like vocabulary
frequency, and on teaching intonation to students of English
phonetics and phonology in Wichmann et al. (1997) (Jones 1997,
Wichmann 1997); and chapters on authenticity, communicative utility,
and formulaic expressions in English, and on the use of concordancing
in the teaching of Portuguese in Sinclair (2004) (Mauranen 2004,
Santos Pereira 2004). Of these works, none have a focus on
pronunciation, although they do have relevance to OCC more generally,
to varying degrees.
The first
research evidence on the use of corpora for teaching pronunciation
is, to my knowledge, the very short article by Gut (2005). Besides a
PFC-related publication that will be discussed shortly, the only
other available research product focused on using a corpus for
teaching pronunciation is the website at the Hong Kong Institute of
Education, A Corpus-Based Pronunciation Learning Website
(Chen et al. 2014). It is interesting that the corpus of focus in
both Gut and Chen et al. is a learner corpus: German-speaking
learners of English and English-speaking learners of German in the
case of Gut, and Chinese-speaking learners of English in Chen et al.
The work of Detey et al. (2010), based entirely on the Phonologie
du Français Contemporain (PFC) project (or rather on its offshoot
project, the PFC-Enseignement du Français or PFC-EF), therefore
represents an important landmark
development. It is important to note, however, that the focus of the
PFC-EF is much broader than the teaching of pronunciation – it
explores the use of the PFC corpus for the development of OCC
generally, as well as for focusing on grammatical form, and even for
the development of writing, the latter principally by way of explicit
stylistic comparison between written and spoken language.
In
this article, data from one PFC survey point will be provided,
illustrating a single aspect of French phonology. The survey point is
a community called Maillardville, in Coquitlam, British Columbia, and
the aspect of French phonology is the reduction of word-final
consonant clusters. Before turning to the relevant data, I first give
a brief overview of the PFC project and then a basic description of
word-final cluster simplification in French. After going over the
data from the PFC survey point in
question, it will be shown how it can be exploited in the classroom,
and for what purposes. Finally, I will outline what I see as the many
benefits to using corpus data for teaching aspects of pronunciation
and other aspects of OCC.
2 The PFC Project
The
goal of the PFC project (http://www.projet-pfc.net) is to describe
the pronunciation of French in all its geographic, social, and
stylistic diversity. To this end, the project seeks to build a vast
corpus of French as it is spoken around the world, based on surveys
conducted by an international team of researchers and their students,
using a common protocol, as well as common methods and tools for
analysis. The reasons for doing so are not purely descriptive. The
envisaged corpus can also serve to test current models of phonetics
and phonology, to encourage the sharing of research, to provide for
the renewal of data informing the teaching of French, as well as
simply to preserve a crucial part of the patrimoine
linguistique
of the francophone world (Durand, Laks & Lyche 2002, 2009).
The
ideal PFC survey point involves 12 speakers representing both
genders, a minimum of two age groups, and some differences in level
of education and / or professional profile. Speakers complete four
tasks:
- a guided conversation designed to gather basic information about the speaker and her or his linguistic background;
- a free conversation of approximately 30 minutes with a fellow member of the speech community;
- the reading of a common word list of 94 words (plus an additional, tailored list of 115 words for Canadian speakers); and
- the reading of a common text (an invented three-paragraph news article).
At
least one interviewer per survey point should belong to (or be well
known to) the community of speakers – this is usually the partner
for the free conversation task. The four tasks are designed to elicit
different registers of the language, from very careful and monitored
(reading words in isolation) to casual and unmonitored (free
conversation with someone familiar to the speaker). It should be
noted that these are ideals that are not always achieved. For
example, for the survey point to be discussed here, there was no
interviewer known to the participants and, although one did live in
the same town, she was not perceived as a member of the community.
Among
the targets for analysis are the vowel inventory (contrasts,
allophonic realizations), the consonant inventory (e.g., h-aspiré,
the rhotic, the palatal nasal, allophonic realizations), realization
of schwa (at prosodic boundaries, in different positions within the
word, in monosyllables (clitics), in schwa sequences), realization of
liaison
(so-called
“obligatory” and “optional”), and prosody. Of course, any
other aspect of phonology may be analyzed, as is the case here, where
we will look at the reduction of word-final consonant clusters.
The PFC-EF, an
off-shoot of the larger PFC project, is designed principally for
those teaching French and / or developing French pedagogical
materials. PFC data is used to develop materials for the teaching /
learning and the diffusion of French, representative of the variation
found across the francophone world. The goal of the PFC-EF is to
provide rich and diverse classroom materials for listening and
speaking, for comparison to written language and to le
français de référence
(Morin 2000) as well as for analysis of variation that exists across
the francophone world. These resources should be useful for teachers
of French as a first, second, or foreign language. The audience
assumed here is one consisting of university-level learners of French
as a second language in a predominantly English-speaking part of
Canada.
The
context envisioned here is a third-year course on oral expression,
although any pedagogical suggestions made are easily transferable to
other contexts with minor or major adaptations as required.
3 Word-Final Consonant Cluster Reduction in French
The
simplification of word-final consonant clusters is a widespread
phenomenon in French, and more so in Canadian French than in European
varieties (see Milne 2016 for a close examination of the treatment of
final clusters in the two dialects). Most clusters in question, and
all of those that will be discussed here, are in final position
following a historical deletion process targeting word-final schwa.
Current scholarship therefore assumes the
absence of an underlying final schwa in the relevant forms (Côté
2004), although its former presence continues to be reflected in
orthography with a final e. When a schwa is pronounced following the
clusters in question, it is considered to be epenthetic. While
the assumption of underlying representations without a schwa is not
absolutely critical, it does have implications for pedagogical
approach and learning outcomes that will be discussed later.
For
simplicity, the focus here is only on clusters of the type
obstruent+liquid (OL) (for an extensive treatment of all types of
word-final consonant clusters, the reader is referred to Côté
(2004)). OL consonant cluster reduction is illustrated in the
following examples taken from Ostiguy & Tousignant (2008), which
are represented based on orthography (abbreviated in the case of the
reduced forms). (Recall that the final orthographic e
in the non-reduced forms is not pronounced):
- possib (< possible)
- sob (< sobre)
- peup (< peuple)
- prop (< propre)
- souf (< souffle)
- poud (< poudre)
- règ (< règle)
- let (< lettre)
- spectac (< spectacle)
(Ostiguy & Tousignant 2008: 173)
Ostiguy
& Tousignant describe word-final cluster simplification as being
very common in spoken Quebec French, so much so that even cultivated
speakers or speakers attending to their speech will not notice it and
will assign no negative judgement to it. Indeed, the phenomenon is so
pervasive that the authors actually pose the question of whether the
non-reduced variants should be brought to students’ attention. They
come to an affirmative conclusion, saying that their existence in
formal spoken Quebec French fully justifies doing so (2008: 177).
Debating whether students should be guided to notice non-reduced
variants pre-supposes that the reduced variants are the unmarked
target form. Should the latter therefore serve as the pedagogical
norm? The data discussed in the following section, as limited as they
are, shed interesting light on this question, to which we return
afterwards.
4 Data
The
data in this section come from a single, male speaker from the PFC
survey point at Maillardville (Coquitlam, British Columbia). At the
time of the recording (July 2006¹),
the speaker was 62 years old. He was born in Winnipeg, Manitoba, to
francophone parents, and the family moved to Maillardville when he
was four years old. He is a highly educated speaker: he holds an
advanced degree in education, and in his career he not only taught
French but also played an important role in the development of
French immersion programs in the public school system. It is the
professional background of this speaker which determined the use of
his data for the present study, so as to show that the nonstandard
feature described and analyzed here is not at all representative of
"unsophisticated" speech.
The
data come from the four task conditions – the reading lists, the
text, the guided conversation (a six-minute extract), and the free
conversation (a six-minute extract). Recall that for Canadian survey
points, there are in fact two reading lists. The first contains 94
items and targets a variety of sounds and contrasts considered as
important points of potential variation across varieties. The second
list consists of 115 items targeting phenomena more common in
varieties of French in Canada, including word-final consonant cluster
reduction. The guided conversation was conducted by the author of
this article, who was a visitor to the community, unknown to any of
its members. The second interviewer was the one described previously,
living in the same town, but not perceived as a member of the
Maillardville community, and unknown to any participants. It was this
interviewer who engaged in the free conversation task with the
speakers. Because she was unknown to the participants, little or no
real difference is expected between the guided conversation and the
free conversation.
4.1 The Word Lists
In the first word-list, there are only three items with the target
sequences: peuple, meurtre, and feutre. Of these three, none are
reduced by the speaker. Indeed, one of them, peuple, is produced with
an epenthetic final schwa. However, in reading the list of words,
participants are also asked to say aloud the number for each. The 94
items give seven instances of the word quatre (presented to the reader
in number form (4), not orthographically): quatre (4), vingt-quatre
(24), trente-quatre (34), quarante-quatre (44), cinquante-quatre (54),
soixante-quatre (64), and quatre-vingt-quatre (84). Of these seven
items, there is only one token without a reduced form, which happens
to be the shortest and the first of the series, i.e., quatre. The
astute reader will have noticed that there is a second token of quatre
in quatre-vingt-quatre. In this case, the schwa is lexicalized in the
collocation quatre-vingts, and its realization is exceptionless across
speakers and varieties. Given that the form is lexicalized, we can
consider the OL sequence as word-internal, not word-final.
The
second list has many more target sequences, having been designed
specifically for Canadian French, where final cluster reduction is
noted to be more widespread. In fact, there are 19 such items, as
follows:
mettre | maître | sable | libre | couple | ministre | neutre | jungle | tabernacle | le prêtre | aveugle | convaincre | vinaigre | orchestre | ombre | épingle | arbre | cent piastres sur la table
Of these 19 items, none is pronounced with a reduced cluster. Five are
pronounced with a final audible schwa. With this longer list, there
are eight instances of quatre (4): quatre (4), vingt-quatre (24),
trente-quatre (34), quarante-quatre (44), cinquante-quatre (54),
soixante-quatre (64), quatre-vingt-quatre (84), and cent quatre (104).
Of these, all are pronounced with a reduced cluster. The results of
the word lists are summarized in Table 1:
Forms | Word-List Items | quatre
reduced | 0 (0%) | 14 (93%)
unreduced (no schwa) | 16 (73%) | 1 (7%)
unreduced (with schwa) | 6 (27%) | 0 (0%)
Table 1: Reduction of Final Clusters in Word Lists
It
is clear from Table 1 that the list-reading condition disfavours
reduction, since there are no reduced forms produced at all. Indeed,
to the extent that OL sequences are not tolerated in word-final
position (27%), they are resolved via schwa epenthesis rather than by
reduction. For the number quatre,
quite the opposite holds, with reduction being the near absolute rule
(93%).
4.2 The Reading Text
The reading text contains 12 tokens with a final cluster, seven of
which are instances of the same word, Ministre (from Premier
Ministre). The other five items are titres, autre, membre, articles,
and centre. Of the 12 tokens, ten are pronounced with a reduced
cluster (indeed, in six of them, the cluster is deleted). Only autre
and centre are produced without reduction, and both with an epenthetic
schwa. These items occur in the sequences un autre côté and au centre
d’une bataille. All three of the reduced forms similarly occur before
a consonant, so the phonological environment appears not to be
relevant.
On
close inspection, one is tempted to say that Ministre
has been lexicalized with the entire OL sequence deleted (i.e., as
[minis]). The word is pronounced with a [t] only once, in the second
token, where a vowel follows. However, a vowel also follows in the
first token, and no [t] is pronounced. In the lone token with the
[t], the target sequence is ‘Le
Premier Ministre a en effet décidé… ,’
and the speaker stumbles quite noticeably. Immediately following the
sequence [minis], he pronounces a glottalized [t] followed by the
incorrect vowel [ɛ], followed by another, clearer [t] followed by
[a], then a longish glottal stop followed by the correct formulation
for ‘a
en effet’.
Otherwise, the form [minis] occurs in all environments: before a
vowel, before a consonant, at the end of a phonological phrase, and
at the end of a phonological utterance.
The
results of the reading text are summarized in Table 2:
Forms | Ministre² | Other Tokens
deleted | 7 (100%) | 0 (0%)
reduced | 0 (0%) | 3 (60%)
unreduced (no schwa) | 0 (0%) | 0 (0%)
unreduced (with schwa) | 0 (0%) | 2 (40%)
Table 2: Deletion and Reduction of Final Clusters in the Reading Text
Again,
we see that in the form Ministre,
deletion is categorical, and perhaps for the entire OL sequence
rather than just for the liquid member of the cluster. Otherwise, it
appears that the reading text condition favours reduction, 60% to
40%, although the number of tokens is rather limited. Interestingly,
no OL clusters surface unaltered – they are either reduced or
followed by an epenthetic schwa.
4.3 The Guided Conversation
Seven tokens with final consonant clusters occur in the six-minute
extract from the guided conversation. The words that appear are
eux-autres, couvre, autre, favorable, prendre, peut-être, and
kilomètres.³ Of the seven tokens, five are produced with a reduced
cluster. The two non-reduced tokens are in couvre (“… ça couvre la la
…”) and kilomètres (“… une vingtaine de kilomètres d’ici…”). All of
the other forms occur before a consonant, with the exception of
peut-être, which occurs before the hesitation form euh (and is
reduced). So, as for the reading text, phonological environment
appears not to be relevant to reduction. The results of the guided
conversation are presented in Table 3:
Forms | Tokens
reduced | 5 (71%)
unreduced (no schwa) | 0 (0%)
unreduced (with schwa) | 2 (29%)
Table 3: Reduction of Final Clusters in the Guided Conversation
In
the guided conversation, reduction is clearly favoured, at 71%. It is
also
noteworthy that, as was the case for the reading text, OL clusters
never surface unaltered – when not reduced, they are pronounced
with a following epenthetic schwa.
4.4 The Free Conversation
There are 16 tokens with final consonant clusters in the six-minute
extract from the free conversation. The relevant words are nous-autres
(x4), apprendre, autre (adj.) (x3), autres (n.) (x2), répondre, notre,
exemple, incroyable, peut-être, and favorable. Of the 16 tokens, 11
are reduced. The results of the free conversation are presented in
Table 4:
Forms | Tokens
reduced | 11 (68.75%)
unreduced (no schwa) | 3 (18.75%)
unreduced (with schwa) | 2 (12.5%)
Table 4: Reduction of Final Clusters in the Free Conversation
The
pattern of reduction in the free conversation quite closely mirrors
the pattern in the guided conversation (as we expected, given that we
did not have a real community member to participate in the free
conversation).
4.5 Discussion
The question we asked before presenting the data was whether reduced
forms should serve as the pedagogical norm. To facilitate
consideration of this question, Table 5 provides a synthesis of the
information from Tables 1 through 4, leaving out the tokens of quatre
(4) from the word-list condition and those of Ministre from the
reading text condition, for the same reasons they were treated
separately in Tables 1 and 2:
Forms | Word lists | Text | Conversations
reduced | 0% | 60% | 70%
unreduced (no schwa) | 73% | 0% | 13%
unreduced (with schwa) | 27% | 40% | 17%
Table 5: Reduction of Final Clusters across Conditions (non-lexicalized
tokens only)
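The percentages in Table 5 can be recomputed from the raw token counts reported in Tables 1 through 4. The following is a minimal Python sketch, not part of the original study: the dictionary layout and the function name `percentages` are illustrative assumptions, and the conversation counts are obtained by pooling the guided and free conversations.

```python
# Token counts taken from Tables 1-4, leaving out quatre and Ministre.
# Each tuple is (reduced, unreduced without schwa, unreduced with schwa).
COUNTS = {
    "word lists": (0, 16, 6),
    "text": (3, 0, 2),
    # guided + free conversation, pooled (an assumption about how
    # the "Conversations" column was derived)
    "conversations": (5 + 11, 0 + 3, 2 + 2),
}


def percentages(counts):
    """Return each count as a whole-number percentage of the column total."""
    total = sum(counts)
    return tuple(round(100 * n / total) for n in counts)


for condition, counts in COUNTS.items():
    print(condition, percentages(counts))
```

Pooling the two conversation extracts (23 tokens in all) reproduces the rounded figures shown in the Conversations column.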
From
the limited data from this single speaker, it would seem that a
strong argument can be made for the reduced form serving as the
pedagogical norm, at least for spoken French in Canada. A clear
majority of word-final clusters are reduced in both natural (i.e.,
non-list) reading, and in conversations (more or less formal to the
extent that formality varied in the conversations for this survey
point). This is a highly conservative count, given that the number
quatre (4) is reduced in the word-list condition at a rate of 93%, and
Ministre is reduced without exception in the reading text condition,
virtually to the point of having no relevant OL sequence! Only in the
reading of the word-lists were final clusters never reduced. Clearly, we
do not want learners to sound stilted in natural speaking conditions,
and this seems a real risk if we use the unreduced form as a
pedagogical norm.
On
the other hand, in order to guide learners to the correct lexical
representations (unreduced, without schwa), exposure to the word-list
data appears crucial. It is only from this data that learners can be
reasonably expected to arrive at the underlying form without schwa
since it is a minority variant in conversation and does not occur in
the reading text at all, whereas the unreduced form with schwa occurs
as a minority pronunciation in all conditions, and a fairly robust
one in the reading conditions (the word-lists and especially the
text). Learners could be led to the correct form by way of formal
explanation – i.e., that variant pronunciations can be derived from
the unreduced form without schwa via a simple, one-step change in one
of two possible directions (consonant deletion or schwa epenthesis) –
but our goal is to focus on pronunciation, not to train phonologists!
Presenting the forms from the word lists is a far less frustrating
way to guide learners to the correct lexical representation.
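The two one-step changes mentioned above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the base forms are simplified IPA strings supplied here for the example (they are not PFC transcriptions), the segment classes are deliberately minimal, and the function names are hypothetical.

```python
# Minimal segment classes (an assumption; not a full inventory of French).
OBSTRUENTS = set("pbtdkgfvszʃʒ")
LIQUIDS = set("lʁ")


def ends_in_ol(base: str) -> bool:
    """True if the form ends in an obstruent + liquid (OL) sequence."""
    return len(base) >= 2 and base[-2] in OBSTRUENTS and base[-1] in LIQUIDS


def delete_liquid(base: str) -> str:
    """Cluster reduction: drop the final liquid of an OL cluster."""
    return base[:-1] if ends_in_ol(base) else base


def epenthesize_schwa(base: str) -> str:
    """Schwa epenthesis: add a schwa after a final OL cluster."""
    return base + "ə" if ends_in_ol(base) else base


# table, quatre, peuple as schwa-less base forms
for base in ("tabl", "katʁ", "pœpl"):
    print(base, "->", delete_liquid(base), "or", epenthesize_schwa(base))
```

Starting from the schwa-less base form, each surface variant is one step away; starting from a base with final schwa or from the reduced form would not give this symmetry.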
It
is legitimate to ask whether it is even necessary to attend to the
development of underlying lexical forms, but if a learner’s base
form contains a final schwa, there is a risk of highly unnatural,
hyper-articulated speech, the avoidance of which is an important goal
of pronunciation training. On the other hand, if a learner posits the
reduced form as the lexical representation, the risk is of
inappropriately informal speech across contexts, as well as of
outright pronunciation errors if an incorrect second consonant of a
given cluster is realized in some instances (if the second consonant
is not in the lexical representation, its accurate recovery in
production is unsure). Another point to consider is that students
will almost certainly come with some relevant lexical representations
already formed. Part of the goal may therefore be to correct those
that are wrong and give rise to unnatural pronunciations.
The
up-shot of the preceding discussion is that it is important to
present to learners a full range of appropriate input across speech
styles – precisely what the corpus affords (rich input is discussed
in some detail below). The arguments presented here with respect to
word-final clusters are easily transferable to other aspects of
pronunciation.
5 Exploiting the Data for Teaching Pronunciation
We
now turn to the question of how the data we have seen in the previous
section can be put to use for teaching pronunciation. In fact, corpus
data can be useful to at least three of four crucial aspects of
pronunciation learning: creating lexical representations,
understanding pronunciation rules (i.e., developing declarative
knowledge), and developing automaticity. Its usefulness to a fourth
aspect of learning – developing procedural knowledge – is not
obvious, although some researchers consider the step from declarative
knowledge to procedural knowledge somewhat trivial in the case of at
least some aspects of pronunciation. For example, Dalton &
Seidlhofer assert that, “once learners know that simplifications
are normal, they are often able to convert this knowledge into
active, procedural knowledge with astounding ease” (Dalton &
Seidlhofer 1994: 116). Reed (2016) provides two simple classroom
strategies for bridging “the declarative to procedural knowledge
gap” (Reed 2016: 237) for more problematic aspects of
pronunciation. Of course in our case, the procedural knowledge
involved seems quite unproblematic – it simply entails the
inhibition of articulators in a certain specific phonological
structure.
The
following sections, in turn, focus on creating lexical
representations through the provision of rich input, building
declarative knowledge, and developing automaticity.
5.1 Providing Rich Input
A
key aspect of teaching pronunciation is modeling – the provision of
input targeting a specific form or structure. The most obvious
benefit of using a large corpus is that it affords a quantitative and
qualitative richness in terms of modeling that is unparalleled. In
the case of word-final clusters, and our single speaker from our
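single survey point, we have 72 relevant tokens (37 from the word lists, including 15 instances of quatre; 12 from the reading text; 7 from the guided conversation; and 16 from the free conversation). There are 16 other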
speakers from our survey point, although the norm is 12, and there
are 38 survey points with data made available online on the PFC
project website referred to earlier. For each survey point, one can
access data (sound files and transcription) by speaker for the word
list(s), the reading text, six minutes of the guided conversation,
and six minutes of the free conversation. This adds up to a large
quantity of valuable data.
In
terms of quality, we have seen that the PFC protocol results in data
from up to four speech styles, depending on how distinct the guided
and free conversations are (which, in turn, depends on whether the
conversation partner for the free conversation is previously known to
the speaker – a weak point of our own survey point, as discussed
earlier). The data available through the PFC project also span a
number of dialect areas across the francophone world, therefore
providing ready access to data that was previously extremely
difficult to come by, if not impossible in some cases. The
naturalistic nature of the input also contributes to its high
quality. While it is certainly true that reading a word list and
reading a prepared text are not generally considered 'authentic
language', these are instances of native speakers reading for a
non-pedagogical purpose. And, of course, the data from these contexts
are complemented by the data from conversations. The unquestionably
naturalistic nature of the latter ensures (or at least militates in
favour of) variety in terms of the phonological environments that
target phonological forms occur in. So the form in question will be
heard not only in phrase-final position but also (as we have
seen) phrase-internally before both consonants and vowels, at the
end of a phonological phrase, and at the end of a phonological
utterance.
The
most important role of the quantitatively and qualitatively rich
input in the acquisition process is to facilitate the building of
lexical representations. If we follow Bybee (2000, 2001) in assuming
that lexical representations are basically clusters of exemplars that
can be continuously updated, then exposure to multiple tokens
(exemplars) serves to enrich representations, providing phonetic
detail and information regarding permissible surface variations.
Exposure over time and across different contexts will aid learners in
associating certain exemplar types (i.e., full or reduced forms) to
appropriate speech contexts (more or less formal).
Students
can be exposed to speech samples directly from the PFC site, with
simultaneous audio and transcription. One potential drawback to
direct access, depending on the input-providing activity envisioned,
is that the speech samples are not tailored to individual
phonological structures or processes but rather, by design, to a wide
variety of them. For example, for our case of word-final clusters,
these are found amongst a multitude of non-target items. It may not
be a beneficial use of time to have students listen to entire
extracts for only a few tokens. This is most obvious in the case of
the word lists, in which there are only 22 relevant tokens amongst
209 items (the ratio is much worse for the sole word list used
outside of Canada, at 3 / 94). More important than the
ratio of target items to non-target ones, there is not much benefit
to listening to a list of words being read, compared to listening to
a reading text, or especially to natural conversation about a topic
that may well be of inherent interest. So, even if the conversations
do not have a high number of target items, important side benefits
can be had from listening to them.
If
using speech samples directly from the PFC site is not ideal for a
given purpose, tokens from any or all of the four conditions can also
be easily extracted (recorded) from the PFC audio files through
freely available software such as Audacity® or Praat (Boersma &
Weenink 2016). Tokens can then be presented to students for the
purpose of building lexical representations in a variety of ways.
They can be presented with or without reference to their written
forms, they can be presented in isolation or in context (of course
those tokens from the word list(s) are inherently in isolation), and
different surface variants can be presented when and as desired.
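For instructors who prefer to script the extraction rather than cut tokens by hand in Audacity or Praat, the step can be sketched with Python's standard `wave` module. This is a minimal sketch under stated assumptions: it presumes the audio is available as an uncompressed WAV file, the function name `extract_token` is hypothetical, and the start and end times would have to be found by inspecting the recording first.

```python
import wave


def extract_token(src_path: str, dst_path: str,
                  start_s: float, end_s: float) -> int:
    """Copy the interval from start_s to end_s (in seconds) of a WAV file
    into a new WAV file. Returns the number of audio frames written."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both edges so out-of-range times cannot raise an error.
        first = min(max(int(start_s * rate), 0), total)
        last = min(max(int(end_s * rate), first), total)
        src.setpos(first)
        frames = src.readframes(last - first)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)      # same channels, sample width, rate
        dst.writeframes(frames)    # header frame count is fixed on close
    return last - first


# Hypothetical usage (file names and times are placeholders):
# extract_token("speaker_text.wav", "ministre_01.wav", 12.40, 13.05)
```

Each extracted file can then be played in isolation or alongside its written form, as described above.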
5.2 Building Declarative Knowledge
Students’
exposure to multiple variants of tokens in context allows them, over
time, to ascertain the phonological and stylistic factors
conditioning variation. The beauty of the corpus in this regard is
that it allows for the development of this declarative knowledge
through exploration on the part of the learner, which can be guided
to a greater or lesser extent by the instructor. This explorative use
of the corpus simultaneously provides input for building and
enriching lexical representations (Section 5.1) and for constructing
declarative knowledge, the latter relying absolutely on the context
in which variants occur. That is, while lexical representations can
be developed through exposure to different variants in isolation,
learning the rules that produce them depends on hearing them in their
conditioning environments, whether these be speech style (on the
formal to informal continuum, corresponding to word list(s), reading
text, guided and free conversations), or phonological context
(pre-vocalic, pre-consonantal, phrase-internal, phrase-final, etc.),
and preferably both. Exposure to a range of speech styles is, of
course, a built-in feature of the PFC corpus, and exposure to a range
of phonological contexts is heavily favoured – by design in the
reading text, at least for some phenomena (realization of schwa,
liaison) and in relation to natural frequency of occurrence in the
six minutes each of guided and free conversations.
With
respect to word-final consonant clusters, one can begin by exposing
students to a speech sample from conversation, in which variants
reflecting the pedagogical norm are in preponderance. At first, the
focus on form can be passive, couched in a listening comprehension
activity with a primary focus on the content of the conversation.
Following this can be a more explicit focus on form, with students
actively listening for reduced OL clusters. They can first be asked
to identify relevant words in the written transcript, underlining
them, and then to listen for the pronunciations on subsequent
replays. Then students and instructor can discuss the variants and
their quantitative distribution, and come up with hypotheses to
explain these (i.e., possible pronunciation rules). The instructor is
obviously free to tailor any such activity as he or she deems
appropriate, and the focus on form can be broader – on any
reductions involving consonants, or indeed on reductions more
generally. This will depend on the type of course and its overall
goals, time allotted to pronunciation training versus other aspects
of OCC, etc.
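The step in which students underline relevant words in the written transcript can also be prepared, or checked, automatically. The sketch below is a spelling-based heuristic only, and an assumption of this illustration rather than a PFC tool: it flags words whose orthography ends in an obstruent letter followed by l or r and a mute e (optionally with plural -s). Hyphenated compounds are treated as single words, so that compound-internal sequences are not flagged. The names `OL_FINAL`, `TOKEN` and `find_ol_candidates` are hypothetical, and the output should be verified against the recording.

```python
import re

# Assumption: spelling heuristic only. A word ending in an obstruent letter
# + l/r + mute e (optionally plural -s) is flagged as a likely OL-final word.
OL_FINAL = re.compile(r"[bcdfgkptv][lr]es?$", re.IGNORECASE)

# A token is a run of letters; hyphenated compounds stay together, so a
# compound-internal sequence (as in quatre-vingts) is not flagged.
TOKEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def find_ol_candidates(transcript):
    """Return lowercased words, in order of occurrence, that match the heuristic."""
    return [w.lower() for w in TOKEN.findall(transcript) if OL_FINAL.search(w)]


if __name__ == "__main__":
    # Invented sentence for illustration; not taken from the corpus.
    sample = "Le ministre a dit que les quatre autres sont à table."
    print(find_ol_candidates(sample))
```

Because the pattern works on spelling alone, it will miss some relevant words and flag some irrelevant ones; it is meant as a starting point for the listening task, not a replacement for it.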
Remaining
on our focus here on OL clusters, one can then move to the reading
text and then to the word lists. Before moving to each new condition,
students can be asked to make predictions regarding what they will
hear, taking into account the nature of the task the speaker is
engaged in. After working with the reading text, comparison can be
made with the conversation, and hypotheses made earlier can be
revisited and adjusted as necessary.
For
the word lists, playing isolated realizations of the number quatre
(4)
would be a good way to begin, to reinforce what was found in the
naturalistic speech of conversations, before moving to
the relevant words extracted from the word lists. Overall comparisons
can follow the word-list activity, and final pronunciation rules
can then be arrived at.
Another
possible way to approach the students’ discovery process would be
to have them explore the online corpus themselves (with appropriate
instruction in how to do so, of course). They can be directed to find
relevant tokens in each condition, identify variants and quantify
their distributions, and come up with hypotheses to explain them.
This kind of activity can be done outside of class and, depending on
the structure of the course and its constraints (including its size),
students can be assigned to work with different target structures and
to report back their findings to the class. The corpus provides
unique potential for precisely this kind of autonomous learning.
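When students quantify the distribution of variants themselves, a simple tally per condition is usually sufficient. The following sketch assumes that students have coded each token they found as a triple of condition, word and whether the cluster was reduced; this coding scheme, the function name `reduction_rates` and the sample observations are all hypothetical. The sample observations are invented placeholders and do not reproduce any PFC results.

```python
from collections import Counter, defaultdict


def reduction_rates(observations):
    """Compute the proportion of reduced tokens for each condition.

    observations: iterable of (condition, word, reduced) triples,
    where reduced is True if the final cluster was simplified.
    """
    counts = defaultdict(Counter)
    for condition, _word, reduced in observations:
        counts[condition]["reduced" if reduced else "full"] += 1
    return {
        condition: tally["reduced"] / (tally["reduced"] + tally["full"])
        for condition, tally in counts.items()
    }


if __name__ == "__main__":
    # Invented placeholder codings, for illustration only.
    coded = [
        ("word list", "table", False),
        ("reading text", "ministre", True),
        ("free conversation", "quatre", True),
        ("free conversation", "autre", False),
    ]
    for condition, rate in reduction_rates(coded).items():
        print(f"{condition}: {rate:.0%} reduced")
```

Groups working on different target structures can report their results in the same format, which makes comparison across conditions and across groups straightforward.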
5.3 Developing Automaticity
Repetition
is clearly a key ingredient for developing automaticity, and the word
lists are of obvious utility for this purpose, but single words and
/ or short collocations can also be extracted from any of the
conditions. Beyond simple repetition, practice reading is a perfect
next step, and the reading text provides great modeling for this
purpose. After listening closely to the text as read by one or more
PFC speakers, learners can practice reading it themselves. Learners
can do so in pairs or groups, recording their own and others’ readings
of the text, and comparing the recordings to those of the PFC speaker
model(s) they have used.
For
use with PFC conversations, two useful, more advanced activities for
developing automaticity are shadowing
(Dauer 2004, Grant 2000, Quarterman & Boatwright 2003) and
mirroring
(Dauer 2004, Monk, Lindgren & Meyers 2003). In a shadowing
activity, a learner repeats word for word what a speaker says,
following by just a word or two. This is a holistic activity, in
which learners must pay close attention to imitating as exactly as
possible not only the words spoken (including aspects like OL cluster
reduction), but every aspect of pronunciation, including rate and
rhythm, pauses and hesitations, prominence patterns, and phrasal
intonation. With the right equipment, learners can access the input
for this activity through headphones and simultaneously record
themselves, analyzing the results afterwards in comparison with the
model recording.
Mirroring
is similar to shadowing, but with more explicit attention to
linguistic features ahead of time. Learners transcribe around a
minute’s worth of speech (or take a transcript of speech available
through the PFC site), and carefully annotate intonation contours,
prominences, hesitations, and pauses. PFC transcriptions are based on
normal orthography and so do not indicate segmental phenomena like
assimilation, deletion, or lengthening, any or all of which can also
be annotated, as desired. Learners then practice mirroring the speech
as precisely as possible, eventually recording themselves and
evaluating the product (with or without the assistance of peers).
A
great extension of these types of activities using the PFC corpus is
to have students act as if they themselves are subjects of the study,
participating in the guided and free conversation components.
Students can interview each other, using questions of the type asked
in the guided conversations, and / or they can simply engage in free
conversation. Depending on the amount of time that can be allotted to
this type of activity, these conversations can be recorded,
transcribed and analyzed for the feature(s) treated in class.
6 Exploiting the Corpus for Other Aspects of OCC
As
mentioned earlier, the PFC-EF project is designed specifically to
explore the use of the PFC corpus for pedagogical purposes. These
purposes are broad, and they are not all related to teaching
pronunciation. In fact, of the nine pedagogical sheets (fiches
pédagogiques)
available for download
(http://www.projet-pfc.net/ressources-didactiques/fichespedago.html),
very few have
activities with an explicit focus on pronunciation per
se
(i.e., in the procedural sense). Certainly, all provide rich input to
learners, and several have activities focusing on building
declarative knowledge (for example, of obligatory / optional
liaisons, and the realization / deletion of the schwa), but
there is little related to basic production (one activity calls for
repetition of verb forms in casual versus more formal speech), or to
developing automaticity as described in Section 5.3 (one imitation
exercise contrasting two regional varieties). Nevertheless, most
activities found in these documents are in some way more generally
related to the development of OCC.
A
common stated objective of the sheets is to sensitize learners to
phonological variation, including to prejudices that exist with
respect to certain geographical varieties, or features thereof.
Another common objective is to draw learners’ attention to markers
of spontaneous oral discourse (and in so doing, to stylistic
variation itself). Noticing these markers may happen implicitly
(learners hear them, and they see them in the transcripts when
provided), but it is sometimes done explicitly as well, either by
pointing them out, or by asking students to identify them in a
transcript. One of the pedagogical sheets has as an explicit goal to
develop awareness of register, in this case as manifested at the
level of lexis. All of these types of awareness – of geographic and
stylistic variation, and of features of oral as opposed to written
discourse – are important aspects of OCC.
Probably
the most common activity in the pedagogical sheets, and one of
obvious relevance to the development of OCC, is the listening
comprehension activity. This is often the first activity in a
sequence, sometimes preceded by some type of predicting exercise, to
activate schemata. The conversations are ideally suited to this
purpose, as topics vary considerably, and are often full of cultural
information one may not easily find elsewhere. (In our case, the
speaker discussed the history of bilingual education in the province
of British Columbia.)
Other
activities in the pedagogical sheets focus explicitly on
phonological, grammatical, or even orthographic form (the latter to
build declarative knowledge with respect to spelling-to-pronunciation
regularities). Possibilities with respect to focus on phonological
form are manifold (reduction of OL clusters is one example), and the
corpus offers an unparalleled richness in this regard since its
raison
d’être
is phonological analysis. It is perhaps not as obviously the case,
then, that the corpus is equally rich when it comes to focus on
grammatical form – the guided and free conversations particularly
so. The beauty here is that learners see grammar in completely
natural contexts and witness how grammatical structures function in
real communicative situations. Yet another activity found in the
pedagogical sheets is one in which students take content from a
segment of conversation and use it to write a text representative of
some form of written language. The form of written language students
are asked to produce can range from what one might find on a postcard
to something far more formal, with expected features of the language
changing accordingly, and all being quite distinct from the spoken
form.
If
we use the pedagogical sheets as a model, whether specific
pedagogical applications of the corpus focus on advanced listening
skills, attending to phonological or grammatical form, or even on the
development of writing, they will be tailored to a specific intended
audience (defined with respect to competence according to the Common
European Framework of Reference for Languages), and will have clear
objectives with respect to phonological, grammatical, sociolinguistic
and / or discourse features. Objectives will be met through
a variety of tasks, using specified PFC material. The content from
the guided and free conversations can be used to organize lessons
around themes, or the varieties of language to be explored can drive
the organization. At all levels, there is much freedom on the part of
instructors and, to the extent that they wish to use the pedagogical
sheets as models, most of them will have their own expertise to draw
on in adapting them. What the PFC corpus brings to the table is a
wealth of raw material to draw on.
7 Benefits of Using Corpus Data in the Development of OCC
People
may have differing views with respect to the principal benefits of
using corpus data for teaching pronunciation. For example, where the
instructor sees a tremendous benefit in exposing students to a wide
range of varieties of French, and certainly in exposing students in
Canada to Canadian varieties, others with more prescriptive
proclivities may disagree. We should take seriously the problem
identified in Auger (2002), whereby students leaving French immersion
programs in Montreal are unable to interact with speakers in the
communities in which they live and work, because the pedagogical norm
they are exposed to is too distant from the language used in the
community, as the author demonstrates.
Learners
of French in any part of Canada should be equipped (i.e., have the
OCC) to interact with French-speaking Canadians. The PFC corpus
offers authentic language to familiarize learners with this variety
of the language amongst others, and it can be the focus of attention
to a lesser or greater extent depending on the specific goals of the
instructor, or more importantly, the learners in question.
Another
huge benefit of working with a corpus is that it makes students aware
of language in a general sense.
If
we look at Tables 1 through 4, corresponding to the four conditions
of the PFC project, we see a number of discoverable aspects of
language from just this limited data. Already from the data
summarized in Table 1, students may sense a frequency effect –
i.e., that a very commonly used word, or one that occurs with a high
frequency in a given context, like quatre,
will be subject to reduction processes to a much higher degree than
other words that are less frequent. They may also notice where quatre
does not reduce – in its first occurrence in the reading of the
first word list – and hypothesize the relevance of repetition to
reduction. Another phenomenon for which learners may notice evidence
in the word-list condition is lexicalization, as in the form
quatre-vingts,
where
reduction does not occur in Canadian varieties. In
all of
these instances, an instructor may wish to point these things out to
learners rather than relying on them to notice.
From
the data summarized in Table 2, there is another example of apparent
lexicalization, with the form Ministre,
which is pronounced [minis] with only one exception. The exception
provides another valuable lesson, demonstrating the messiness of
naturalistic data. Yet another case of lexicalization is apparent
from the data summarized in Table 3, that of Notre-Dame-de-Lourdes.
The
combined data from Tables 1-4, of course, provide information
about variation across styles of speech, namely that reduction is more
likely in less formal conditions than in more formal ones like
reading from a text, or especially reading words from a list. This
type of metalinguistic knowledge is useful to learners as they work
towards advanced OCC, and it is also inherently interesting, and so
likely to keep them engaged in the learning process.
There
is unlikely to be disagreement with respect to the potential of the
corpus to provide quantitatively and qualitatively rich input, and we
have extensively described the benefits this brings to the teaching
of pronunciation as well as for the development of other aspects of
OCC. The guided and free conversations provide a wealth of authentic
language, much of which is also of cultural interest. Another
indisputable fact is that the corpus data is ideal for raising
awareness of different dimensions of sociolinguistic variation, since
its design was intended to show precisely this.
Another
benefit to mention is that the corpus promotes learner autonomy and
learning through a discovery process. Learners can explore the corpus
on their own, and they can discover interesting aspects of the
language (and language in general) on many levels. Learners can, to a
greater or lesser extent, be guided in what they do with the corpus
by specially designed pedagogical materials, but what they learn is
almost certain to extend beyond the specified goals of a given
activity. Further, the inherent interest of the corpus is likely to
encourage learners to engage with it beyond what might be
specifically assigned.
Finally,
it is worth highlighting the perhaps unintuitive usefulness of the
spoken corpus for developing writing skills. This potential can be
exploited by explicitly comparing spoken and written language, and by
having students convert informational content from conversations into
written text.
References
Auger,
Julie (2002). French immersion in Montréal: Pedagogical norm and
functional competence. In: Gass, Bardovi-Harlig, Sieloff Magnan, &
Walz (Eds.) (2002). Pedagogical
Norms for Second and Foreign Language Learning and Teaching.
Amsterdam: John Benjamins, 81-101.
Boersma,
Paul & David Weenink (2016). Praat: doing phonetics by computer
[Computer program]. Version 6.0.15. (http://www.praat.org/;
23-05-2016).
Bybee,
Joan (2000). The phonology of the lexicon: Evidence from lexical
diffusion. In: Barlow & Kemmer (Eds.) (2000). Usage-based
models of language.
Stanford: CSLI, 65-85.
Bybee,
Joan (2001). Phonology
and language use.
Cambridge: Cambridge University Press.
Chen,
Hsueh Chu, Lixun Wang, Pui Man Jennie Wong & Ka Yin Chan
(2014). The Spoken Corpus of the English of Hong Kong and Mainland
Chinese learners. The Hong Kong Institute of Education.
(http://corpus.ied.edu.hk/phonetics/; 23-05-2016).
Côté,
Marie-Hélène (2004). Consonant cluster simplification in Québec
French. In: Probus,
16
(2004) 2, 151-201.
Dalton,
Christiane & Barbara Seidlhofer (1994). Pronunciation.
Oxford: Oxford University Press.
Dauer,
Rebecca M. (2004). Ways of using video: A report from TESOL’s 2003
convention. In: SPLIS
Newsletter. As We Speak 1
(2004)
1,
9-10.
Detey,
Sylvain, Jacques Durand, Bernard Laks & Chantal Lyche (Eds.)
(2010). Les
variétés du français parlé dans l’espace francophone. Ressources
pour l’enseignement.
Paris: Éditions Ophrys.
Durand,
Jacques, Bernard Laks & Chantal Lyche (2002). La phonologie du
français contemporain: usages, variétés et structure. In: Pusch, C.D. & W. Raible (Eds.) (2002).
Romanistische Korpuslinguistik - Korpora und gesprochene
Sprache/Romance Corpus Linguistics - Corpora and Spoken Language.
Tübingen: Gunter Narr Verlag, 93-106.
Durand,
Jacques, Bernard Laks & Chantal Lyche (2009). Le projet PFC: une
source de données primaires structurées. In: Durand, Laks &
Lyche (Eds.) (2009). Phonologie,
variation et accents du français.
Paris: Hermès, 19-61.
Grant,
Linda (2000). Well
said: Pronunciation for clear communication.
Boston: Heinle & Heinle.
Gut,
Ulrike (2005). Corpus-Based Pronunciation Training. Proceedings
of the Phonetics Teaching and Language Conference,
London: University College London.
(https://www.ucl.ac.uk/pals/study/cpd/cpd-courses/ptlc/proceedings_2005).
Jones,
Randall (1997). Creating and Using a Corpus of Spoken German. In:
Wichmann, Fligelstone, McEnery & Knowles (Eds.) (1997). Teaching
and Language Corpora.
Harlow: Addison Wesley Longman, 146-156.
Knowles,
Gerald (1990). The Use of Spoken and Written Corpora in the Teaching
of Language and Linguistics. In: Literary
& Linguistic Computing 5
(1990) 1,
45-48.
Mauranen,
Anna (2004). Spoken Corpus for an Ordinary Learner. In: Sinclair
(Ed.) (2004). How
to Use Corpora in Language Teaching.
Amsterdam: John Benjamins, 89-105.
Milne,
Peter (2016). The
variable pronunciations of word-final consonant clusters in a force
aligned corpus of spoken French.
Paper presented at the Montréal-Ottawa-Laval-Toronto Phonology
Workshop, Carleton University, Canada.
Monk,
J., C. Lindgren & M. Meyers (2003). The
mirroring technique in prosodic acquisition.
Paper presented at the 37th Annual TESOL Convention, Baltimore, USA.
Morin,
Yves-Charles (2000). Le français de référence et les normes de
prononciation. In: Le
Cahier de l’Institut de linguistique de Louvain 26
(2000) 1, 91-135.
Ostiguy,
Luc & Claude Tousignant (2008). Le
français québécois: normes et usages
(2nd ed.). Montreal: Guérin.
Quarterman,
Carolyn & C. Boatwright (2003). Helping
pronunciation students become independent learners.
Paper presented at the 37th Annual TESOL Convention, Baltimore,
USA.
Reed,
Marnie (2016). Teaching
talk and tell-backs: The declarative to procedural knowledge
interface.
In: Levis, Le, Lucic, Simpson & Vo (Eds.) (2016). Proceedings
of the 7th Pronunciation in Second Language Learning and Teaching
Conference.
Ames, IA: Iowa State University, 237-244.
Santos
Pereira, Luisa Alice (2004). How
to Use Corpora in Language Teaching. In:
Sinclair (Ed.) (2004). How
to Use Corpora in Language Teaching.
Amsterdam: John Benjamins, 109-122.
Sinclair,
John (2004). How
to Use Corpora in Language Teaching.
Amsterdam: John Benjamins.
Wichmann,
Anne (1997). The Use of Annotated Speech Corpora in the Teaching of
Prosody. In: Wichmann, Fligelstone, McEnery & Knowles (Eds.)
(1997). Teaching
and Language Corpora.
Harlow: Addison Wesley Longman, 211-223.
Wichmann,
Anne, Steven Fligelstone, Tony McEnery, & Gerry Knowles (Eds.)
(1997). Teaching
and Language Corpora.
Harlow: Addison Wesley Longman.
Author:
Randall
Gess, Ph.D.
Professor
/ Professeur titulaire
School
of Linguistics and Language Studies
Département
de français
1618
Dunton Tower
Carleton
University
1125
Colonel By Drive
Ottawa
ON K1S 5B6
Email: randall.gess@carleton.ca
1
The
relatively long gap between the retrieval of our data and their
methodological analysis here is partly due to the priority of
phonological
data analysis in the PFC project, with any further applications,
such as methodological
ones, being of secondary importance only.
2
Actual
forms of Ministre
produced were as follows: [minisʔiɹa], [ministʔɛ],
[minis(ʔ#)lɑse], [minisʔnə], [minis(ʔ#)lə], [minis##],
[minispuɹ].
3
Two
instances of Notre-Dame-de-Lourdes
occur, in which the OL sequence of Notre
is unreduced and followed by schwa. One can assume this form to be
lexicalized with schwa like quatre-vingts
above, and so the OL sequence is excluded from our data as
word-internal rather than word-final.