Volume 6 (2015) Issue 2
A New Training Study
Asmaa Shehata
(Calgary, Canada)
Abstract
The purpose of this study was to investigate how training with varying
talkers could affect native English speakers’ acquisition of the
Arabic pharyngeal-glottal consonant contrast, which is not contrastive
in English. Learners’ performance on two discrimination tasks
following a word-learning phase was analyzed in terms of training
type (multiple talkers vs. single talker) and task type (non-lexical
vs. lexical). Findings of the two experiments revealed a
significant effect of training type. That is, the multiple-talker
groups in the two experiments performed more accurately on the two
AXB tasks than did the single-talker groups. This finding suggests
that variability in talkers may be a significant factor that affects
learners’ ability to distinguish words on the basis of L2 consonant
contrasts. Additionally, differences in subjects’ scores on the two
discrimination tasks among the different groups were not
statistically significant, suggesting that the distinct demands of
the two tasks did not have a significant beneficial effect on
learning the nonnative contrastive sounds.
Key words: lexical representations, second language, talker
variability, task type, word recognition
1 Introduction
Accented speech produced by second language (L2) learners who acquire
their L2 after childhood is a significant area of L2 speech research,
and several scholars have explored the reasons behind the persistence
of foreign accent in L2 speech (Flege et al. 1999, MacKay et al.
2006, MacKay et al. 2001). Previous studies have identified several
factors that contribute to the accentedness of L2 speech, such as
learners’ age of arrival in a new host country (Baker & Trofimovich
2006), the amount of exposure to the target language (TL) (Bradlow &
Bent 2008), learners’ cultural attitudes (Moyer 2007), musical
ability (Wong et al. 2007), and learners’ length of residence (Flege
& Liu 2001). It is also well documented that learning L2 phonological
contrasts is a challenge for L2 learners whose first languages (L1s)
do not include these distinctive features, which has been reported to
be one of the factors behind the complexity of L2 accented speech.
For example,
native Dutch speakers experience difficulty distinguishing English
words like bet and bat due to difficulty with the English /æ/-/ε/
contrast (Cutler & Broersma 2005), while both native Spanish and
Portuguese speakers experience difficulty discriminating the English
words beat and bit (Bion et al. 2006), and native Japanese speakers
experience difficulty with the English words right and light (Aoyama
et al. 2004).
Previous research examining whether adult L2 learners can categorize
and create lexical representations for L2 phonological contrasts has
reported mixed results. While some studies have shown learners’
inability to lexically encode the target L2 contrasts (Curtin et al.
1998, Ota et al. 2009, Pater 2003, 2004), others have provided
evidence that learners can successfully create lexical
representations for newly-learned words differentiated by novel
phonological contrasts that they have difficulty distinguishing in
non-lexical discrimination tasks. For example, Weber & Cutler (2004)
examined the mapping of phonetic information to lexical entries in a
second language, using eye-tracking technology. They explored native
Dutch speakers’ ability to discriminate
between the English lax vowel pair /ε/–/æ/ and the diphthong pair
/aɪ/-/eɪ/. Participants were asked to choose the pictures shown on
the computer screen that matched the words they heard. The findings
showed that learners were able to maintain a distinction between
English words containing /ε/ and /æ/ in their lexical
representations, even though they could not perceive the contrast in
the online auditory word identification task.
On the other hand, a large body of laboratory-based training studies
including both infants and adult L2 learners has provided evidence
that
auditory perception training can enhance performance with respect to
novel L2 contrasts (Bradlow et al. 1997, Lively et al. 1993, Lively
et al. 1994, Wang et al. 2003). For instance, McCandliss et al.
(2002) explored the identification of the English /r/ and /l/
contrast by native Japanese speakers in two training environments: a
high-variability training where multiple talkers produced both real
and nonwords, and a limited-variability training where sequences of
phonemes ranging from /r/ to /l/ were spoken by a single talker. A
comparison of learners’ performance in the two training
environments showed a significant improvement in the performance of
those in the high variability training condition after the training,
suggesting that the availability of a rich training environment can
improve native Japanese speakers’ perceptual identification of the
English /l/ and /r/ phonemes.
Further evidence for the interaction between high-variability
training and the acquisition of L2 phonological forms can be found in
Logan et al.’s (1991) study on native speakers of Japanese. Before
training, six native speakers of Japanese were tested on their
ability to identify the contrasting /l/ and /r/ via a pre-test that
included 16 minimal pairs contrasting the target phonemes. For each
minimal pair printed in their answer booklet, participants were
instructed to mark the word they heard. The training phase included
15 training sessions in which participants heard 272 trials (68
minimal pairs contrasting /l/ and /r/ in a variety of word positions
(initial singleton, initial cluster, intervocalic, final singleton,
and final cluster) presented twice) produced by five different
talkers (three times each) and were asked to choose which of the
words they saw on the computer screen matched the words they heard.
Participants could only proceed to the following trial if their
answer was correct; otherwise, the correct answer was highlighted and
the stimulus was presented again before the next trial. After
completing the training phase, the subjects took the post-test, which
was identical to the pre-test and was followed by two generalization
tests: Generalization Test 1, in which 98 novel words were presented
by a new talker, and Generalization Test 2, in which 98 novel words
were produced by a familiar talker. The results showed that
participants’ ability to distinguish the English /r/ and /l/ improved
significantly, and this improvement was maintained when they were
retested three weeks later. Thus, the availability of the
high-variability training environment supported the native Japanese
speakers’ ability to accurately discriminate the target phonemes.
It has also been shown that a low-variability training environment
that includes only one or two sources of variability (e.g., talker,
stimulus, phonetic environment, and speaking rate) cannot support
learners’ ability to recognize the phonological forms of the new
phonemes. For example, Strange & Dittman (1984) trained eight female
native Japanese speakers to distinguish the English /r/ and /l/ only
in word-initial position during 14-18 training sessions. The
researchers used three sets of stimuli in their training: a set of
real word minimal pairs (e.g., rock/lock), and two sets of synthetic
minimal pairs where feedback was provided for each correct response.
Their findings revealed that the Japanese listeners showed no notable
improvement in distinguishing /r/ and /l/ in a generalization task
with natural speech tokens involving /r/-/l/ minimal pairs. While
subjects were able to transfer knowledge acquired during the training
to identify the unfamiliar phoneme contrast in nonword stimuli, they
were not able to do so with real words. Therefore, it was concluded
that word perception accuracy decreases when learners listen to word
lists that lack stimulus variability.
The training studies discussed thus far have all considered cases in
which the availability of various sources of variability in training,
such as stimuli, talkers, phonetic environments and tasks, has
positively affected learners’ perception of unfamiliar segmentals.
Other training studies have investigated the relative influence of
training on the learning of nonnative suprasegmental
contrasts. For instance, Wang et al. (1999) trained eight native
English speakers for two weeks to discriminate four Mandarin tones in
real words spoken by native Mandarin speakers. To check the possible
improvement in the subjects’ identification of Mandarin tones due
to training, Wang et al. (1999) used a pretest and a posttest that
were followed by two generalization tests in order to investigate
whether the training benefit can be extended to new stimuli produced
by new talkers. The researchers were also interested in checking the
relative influence of long-term training. Therefore, they conducted a
long-term retention test six months after the training. The results
indicated a significant improvement in subjects’ performance from
pretest (69% correct responses) to posttest (90% correct responses),
an increase of 21 percentage points in subjects’ tone identification
accuracy. Wang et al. (1999) also found that the
native English speakers’ recognition of Mandarin tones was enhanced
in the two generalization tests, where trainees were successfully
able to extend their knowledge of the target tone contrasts to new
stimuli produced by novel talkers. These findings provide evidence
that using a high-variability training paradigm could improve L2
learners’ acquisition of both novel segmental and suprasegmental
contrasts. 
2 Talker Variability
In light of the
findings of the above-mentioned training studies, talker variability
is known as one of the principal sources of variability influencing
learners’ perception (Halle 1985). Qualities of talkers’ voices
could differ due to a number of different elements such as the shape,
size, and length of the vocal tract, and how talkers use different
acoustic measures, such as the rate and the length of formant
transitions. Crucially, these elements have been reported to be
influential in listeners’ perception of L2 speech (Cartell 1984). A
series of studies has explored the effects of talker variability on
speech perception in general, and word recognition in particular,
both in infant and adult studies. However, these studies reported
contradictory findings, which are briefly summarized below.
2.1 Infant Studies
The influence of talker variability on infants’ perception of novel
phonemes has been thoroughly investigated. Some studies demonstrated
that the availability of several talkers helped infants accurately
discriminate unfamiliar phonetic categories. For example,
six-month-old infants who first learned fricative contrasts and two
other vowel contrasts (e.g., the English vowels /a/-/i/ and
/a/-/ɔ/) demonstrated some ability to differentiate the target
contrasts when they were spoken by different speakers (Barker &
Newman 2004). Similarly, Houston & Jusczyk (2000) examined the
effects of talker variability on the recognition of words in fluent
speech by two different age groups of infants:
seven-and-a-half-month-old and ten-and-a-half-month-old infants. When
there was a one-day delay between the training and testing sessions,
the seven-and-a-half-month-old infants demonstrated a significant
improvement in their word recognition ability only when stimuli were
spoken by talkers of the same gender as the talker in the training
session. While these infants were successfully able to generalize
training to words spoken by two new female talkers,
they were unable to recognize words produced by two new male talkers.
This finding suggests that listening to several talkers of different
genders did have an impact on the perceptual identification of the
spoken words by infants at this age. By contrast, the
ten-and-a-half-month-old infants were able to generalize words
produced by a single talker to other talkers of the opposite gender.
Further evidence
for the positive role of multitalker variability can be found in Rost
& McMurray’s (2009) study that tested the role of phonetic
variability in two experiments. In Experiment 1, 39 monolingual
English 14-month-olds saw pictures whose labels were read by one
talker. Conversely, 16 monolingual English 14-month-olds participated
in Experiment 2 in which they saw the same pictures shown in
Experiment 1 and listened to their labels read by 18 talkers. The
results showed that infants who listened to labels (e.g., buk vs.
puk) spoken by multiple talkers were more accurate at distinguishing
words contrasting /p/ and /b/ in initial position than infants who
listened to labels spoken by a single talker. The findings revealed
the positive contribution of talker variability to infants’
identification of novel contrasts.
Conversely, other infant studies showed that talker variability
hinders infants’ speech recognition performance. For example, Jusczyk
et al. (1992) found that variability in both tokens and talkers
impeded the word recognition abilities of two-month-old infants after
a delay interval. While infants in the two training conditions did
notice the change in the phoneme from bug in the training session to
dug in the test session in the first experiment, only infants in the
single-talker group were able to detect the change in the target
phoneme in the following experiments, which included a two-minute
delay between training and testing. Therefore, the researchers
concluded that listening to a single talker could help two-month-olds
establish lexical representations for the contrasting phonemes of the
target language. More recently, Schmale & Seidl (2009) reported the
inability of nine-month-old infants to distinguish words when
produced by different talkers who varied in both voice and accent.
Variability in talkers raised the word processing load, which
consequently resulted in infants’ low word recognition performance.
Based on these results, it can be concluded that prior research has
reported contradictory findings concerning the role of talker
variability in infants’ speech perception and word recognition
ability.
2.2 Adult Studies
Contradictory results regarding talker variability and its impact on
learners’ perception are not limited to infant studies; they can be
found in adult studies as well. A number of studies reported a
negative role of talker variability. For example, Mullennix et al.
(1989) conducted a series of four experiments. In Experiment 1, 22
native English speakers were instructed to identify 68 English words
produced in noise by either a single talker or multiple talkers in an
identification task. Experiment 2 included a naming task in which 12
native English speakers were asked to name each of the target words
produced by multiple talkers as soon as they heard it. In Experiment
3, 70 native speakers listened to 96 English words on a naming task;
the words varied in frequency (48 low-frequency words and 48
high-frequency words) and were produced in two training environments:
a single-talker condition and a multiple-talker condition. In
Experiment 4, 30 native English speakers listened to the same stimuli
presented in Experiment 3 and were asked to write down the words they
heard. Findings of the four experiments revealed that subjects
performed worse when they listened to different talkers.
Likewise, Sommers et al. (1994) asked two groups of native English
speakers in two different training settings (single-talker training
and multiple-talker training) to type the words they heard as well as
they could. The researchers also tested two other groups of subjects
in either a single speaking-rate condition (i.e., subjects heard all
words produced at a fast, medium, or slow rate) or a mixed
speaking-rate condition (i.e., subjects heard each word produced at
one of the three different speaking rates). Furthermore, Sommers and
colleagues recruited 60 more subjects to investigate performance when
word lists differed in either talker variability or speaking rate,
and when these word lists differed along both dimensions. All tokens
were presented in noise to subjects in the different groups. Findings
revealed that subjects identified the target words more accurately
when the word lists were produced by a single talker than when they
were spoken by different talkers. It was also found that hearing the
target stimuli produced by more than one talker at different speaking
rates (e.g., slow, medium, and fast) hindered the subjects’
perception. Therefore, the researchers concluded that too much
variability in the given speech signals did impede the subjects’
perception.
In contrast to the studies discussed above, a number of L2 training
studies have shown the positive role of talker variability. For
example, Lively et al. (1993) examined which of three word positions
(i.e., initial singleton, initial cluster, and intervocalic) Japanese
learners found the most difficult. Their findings showed that native
Japanese speakers in the multiple-talker training were more accurate
than their counterparts in the other group at identifying the English
/l/ and /r/ spoken by both familiar and unfamiliar talkers on the
generalization task. The authors concluded that hearing a single
talker did not enable listeners to generalize their training to tests
with novel tokens and novel talkers, in contrast to the performance
of subjects in the multiple-talker training environment. In a
follow-up study, Lively et al. (1994) trained Japanese learners of
English in Japan for three weeks, using the same stimulus set as
Lively et al. (1993). After 15 training sessions, the Japanese
speakers’ identification of the English /r/ and /l/ contrast had
significantly improved as a result of the high-variability training
paradigm. Three months later, the subjects’ ability to retain the new
contrast was tested through generalization tests. Findings of these
tests showed no significant decline in the subjects’ ability to
reliably categorize the English /r/ and /l/ contrast, confirming the
efficiency of this type of training in the acquisition of nonnative
contrasts.
More recently, Bradlow & Bent (2008) examined the influence of talker
variability on listeners’ transcription skills. The authors gave
English sentences produced by three groups of talkers (multiple
native Chinese talkers, a single native Chinese talker, and five
native English talkers) to 87 native English listeners. Their
findings revealed that listeners in the multiple-talker group
transcribed the target sentences more accurately than the other two
groups. The researchers concluded that the beneficial role of talker
variability could be extended to accented speech, where it played an
advantageous part in improving listeners’ transcription skills.
Studies have also demonstrated that the influence of talker
variability can be marginally significant, with both multiple- and
single-talker training having been found to facilitate comprehension
of unfamiliar speech. For example, Hardison’s (2003) study
investigated the impact of word position, adjacent vowel, talker
variability, and training type (auditory versus visual) on native
Japanese and Korean speakers’ perception of the English /r/ and /l/
contrast. The researcher recruited 16 native Japanese speakers and
eight native Korean speakers who participated in two different
experiments. Experiment 1 included two training environments: the
first included auditory and visual input, and the second included
auditory input only. Experiment 1 comprised the following main
sessions: pretest, training, posttest, and two generalization tests
(one with a familiar talker from the training phase and one with an
unfamiliar talker). In Experiment 2, the impact of visual input on
training Korean learners of English was examined. Like Experiment 1,
the second experiment included two training environments: a visual
and auditory training group and an auditory-only training group.
Moreover, each of these training groups was divided into two groups:
one in which the subjects listened to stimuli presented by multiple
talkers and another in which the subjects listened to stimuli spoken
by a single talker. While findings indicated a significant impact of
training type, word position, and adjacent vowel on the perception
and production of /r/ and /l/ by the ESL participants in the two
training conditions, they also revealed a marginally significant
effect of talker variability on the subjects’ performance in the two
generalization tests. Korean speakers displayed minor success in
generalizing the training they received to new tokens produced by
unfamiliar talkers.
Based on the studies discussed thus far, two main conclusions can be
drawn.
- Firstly, testing the relative effects of talker variability on learners’ acquisition of novel phonological features has shown mixed findings.
While some studies found talker variability to be a detrimental
factor that impairs learners’ performance, other studies provided
evidence in favor of the positive role of talker variability. A third
group of prior studies, on the other hand, showed a minor role for
talker variability, where both single- and multiple-talker training
could help L2 learners acquire the phonological structure of L2
words. These conflicting findings clearly show the need for further
research to address this significant issue. Therefore, the first goal
of the present study is to determine whether learners can benefit
from a variety of talker-specific properties of speech to help them
learn non-native consonant contrasts.
- Secondly, talker-variability studies have mainly used non-lexical tasks that examined the learners’ online perception of the newly learned phoneme contrasts and paid little attention to the lexical processing of these contrasts.
Thus, the second goal of the current study is to further examine the
relative impact of talker variability on adult L2 learners’ ability
to categorically discriminate and lexically store unfamiliar
contrasting phonemes, using both lexical and non-lexical tasks.
2.3 Task Type
Previous L2 research has shown that different task demands do
influence learners’ perceptual performance (Logan & Pruitt 1995,
Matthews & Brown 2004, Werker & Tees 1984b). For example, lexical
tasks are reported to be more demanding than non-lexical tasks since
they require listeners to access their memory for the meaning of the
target stimuli (Curtin et al. 1998). To explore L2 learners’ ability
to detect and encode novel L2 phonological features, prior second
language acquisition (SLA) studies used both non-lexical and lexical
tasks, which yielded different results. For example, native speakers
of English were more accurate at discriminating the Thai voice
contrasts after training in a non-lexical task than in the lexical
task that required memory of the target contrasts (Curtin et al.
1998). Conversely, Hayes-Harb & Masuda (2008) found that native
English speakers who had studied Japanese for one year were able to
accurately discriminate the Japanese consonant length contrast in the
given lexical task; however, their performance was significantly less
accurate on the non-lexical one.
Unlike the two above-mentioned patterns of previous research
findings, Pater’s (2003) study showed no difference in the
performance of native English speakers on non-lexical and lexical
tasks. When subjects were asked to match one of the words they heard
to the corresponding picture, they performed as well on the lexical
XAB identification task as they did on the non-lexical XAB
identification task. Pater
concluded that the similar design of the two tasks, which included
the same pictures and the same number and types of phases, was behind
the learners’ similar performance on the two different tasks. These
conflicting findings in the literature exhibit a need for further
investigation of this issue, which is the third main goal of the
current study that examines the possible influence of task type
(i.e., non-lexical versus lexical) on learners’ recognition of
novel L2 contrasts. 
3 The Study
With these three
goals in mind, the present study is guided by the following research
questions: 
- Does training with a single talker versus multiple talkers influence L2 learners’ identification of newly learned sound contrasts in terms of generalization to novel talkers?
- Is performance on a non-lexical task more accurate after multiple-talker training than after single-talker training?
- Is performance on a lexical task more accurate after multiple-talker training than after single-talker training?
- Does task type (in this case, non-lexical versus lexical) influence learners’ ability to discriminate novel L2 phoneme contrasts?
The acquisition of Arabic by native English speakers was an ideal
scenario for this research because of the recent rapid increase in
enrolment in Arabic classes in North America in general and the US in
particular, where new Arabic programs have been established and the
teaching of Arabic has matured as a profession (Al-Batal & Belnap
2006). While new summer programs have been established in the Arab
world and professional organizations have witnessed an increase in
membership, such as the doubling of the total membership of the
American Association of Teachers of Arabic in less than a year
(Ryding 2006), Arabic is still classified as one of the languages
that are less studied as a second language (Rabiee 2010). In
addition, Arabic includes a number of consonant contrasts that do not
exist in English, and their acquisition by native speakers of English
is notably difficult (Al Mahmoud 2013, Alwabari 2013). Like other
Arabic contrasts, the Arabic /ħ/-/h/ contrast has received little
attention in the L2 phonology literature, although learners’
“discrimination of /h/-/ħ/ was significantly worse than all other
contrasts” (Al Mahmoud 2013: 22). By and large, all the
aforementioned reasons justify examining this Arabic contrast in the
present study.
3.1 Experiment 1
Experiment 1 was designed to explore the role of talker variability
in the acquisition of novel L2 phonemes on a non-lexical
discrimination task, i.e. the impact of single-talker versus
multiple-talker training on the recognition of the Arabic
pharyngeal-glottal contrast by learners with no prior experience with
Arabic, in terms of generalization of training to stimuli produced by
unfamiliar talkers.
3.1.1 Participants
Thirty native English speakers (11 males and 19 females) with no
prior knowledge of Arabic were recruited from undergraduate courses
at the University of Utah (USA). Seven of them received course credit
for their voluntary participation; the other 23 participants received
payment for their participation. Via a background questionnaire that
they filled out before taking part in the study, all participants
reported having no speech or hearing problems and no neurological
disorders. Participants also reported not being under the influence
of any medication that might impact their motor skills. The
participants’ mean age was 22.5 years. Participants were randomly
assigned to one of the two word-learning environments: a
single-talker environment (7 males and 8 females) and a
multiple-talker environment (4 males and 11 females). To control for
talker idiosyncrasies, participants in the single-talker environment
were randomly assigned to one of three subgroups:
- Group 1, which listened to stimuli produced only by Talker 1,
- Group 2, which listened to stimuli produced only by Talker 2,
- and Group 3, which listened to stimuli spoken only by Talker 3. See Table 1 below.
| | Word Learning Phase | XAB Non-lexical Task (New Talkers) |
|---|---|---|
| Single-Talker Environment | Group 1: listened only to Talker 1; Group 2: listened only to Talker 2; Group 3: listened only to Talker 3 | Talker 4, Talker 5 and Talker 6 (both environments) |
| Multiple-Talker Environment | Subjects listened to Talker 1, Talker 2 and Talker 3 | Talker 4, Talker 5 and Talker 6 (both environments) |
Table 1: Summary of Training Environments in Experiment 1
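The assignment scheme summarized in Table 1 can be illustrated with a short sketch. This is only a hypothetical reconstruction, not the procedure actually used in the study: the function name `assign_participants`, the fixed seed, the equal 15/15 split by shuffling, and the equal subgroup sizes (five listeners per talker) are all assumptions made for illustration.

```python
import random


def assign_participants(participant_ids, seed=None):
    """Randomly split participants into two environments, then spread the
    single-talker half over Talkers 1-3 (hypothetical reconstruction)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    single, multiple = ids[:half], ids[half:]

    assignment = {}
    for i, pid in enumerate(single):
        # Round-robin over the three training talkers; equal subgroup
        # sizes are an assumption, the source only says "randomly assigned".
        assignment[pid] = ("single", f"Talker {i % 3 + 1}")
    for pid in multiple:
        # The multiple-talker group hears all three training talkers.
        assignment[pid] = ("multiple", "Talkers 1-3")
    return assignment


# Thirty participants, as in Experiment 1; the IDs 1..30 are placeholders.
groups = assign_participants(range(1, 31), seed=0)
```

Under these assumptions, every participant ends up in exactly one environment, and each single-talker listener is tied to one training talker for the whole word-learning phase.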
3.1.2 Stimuli
Experiment 1 included two sets of stimuli. The first set included 12
disyllabic Arabic nonwords. These tokens consisted of six minimal
pairs contrasting the target Arabic phonemes (i.e. /h/ and /ħ/) in
three different positions: initial position (e.g. ħaθa-haθa),
intervocalic position (e.g. diħi-dihi), and word-final position (e.g.
itiħ-itih). The second set included six filler tokens, which formed
three minimal pairs contrasting familiar phonemes found in both
English and Arabic. These served as controls and appeared in the same
vowel environments as the target stimuli: word-initial position (e.g.
sata-ʃata), intervocalic position (e.g. fisi-fiʃi), and word-final
position (e.g. anas-anaʃ). Each stimulus was randomly assigned to a
picture that indicated its meaning. Using Arabic nonwords and having
subjects with no prior exposure to Arabic made it easy to assign any
picture to any auditory stimulus. The pseudowords and their assigned
meanings (pictures) are shown in Table 2 below.
| Target Items: Auditory Pseudoword | Picture | Target Items: Auditory Pseudoword | Picture | Filler Items: Auditory Pseudoword | Picture |
|---|---|---|---|---|---|
| [haθa] | Hanger | [dihi] | Paper clip | [sata] | Clock |
| [ħaθa] | Dice | [diħi] | Pen | [ʃata] | Hair dryer |
| [hibi] | Eye glasses | [anah] | Microscope | [fisi] | Keychain |
| [ħibi] | Hammer | [anaħ] | Mushroom | [fiʃi] | Fruit plate |
| [gaha] | Pencil sharpener | [itih] | Glass water jug | [anas] | Safety pin |
| [gaħa] | Pliers | [itiħ] | Ice cream scoop | [anaʃ] | Paint roller |
Table 2: List of pseudowords and their assigned meanings (pictures)
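As a consistency check on the stimulus design, the items in Table 2 can be encoded as data and tested for the minimal-pair property. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that each character in the transcriptions stands for exactly one segment, which holds for the items listed here.

```python
from collections import Counter

# Minimal pairs copied from Table 2: targets contrast /h/ and /ħ/,
# fillers contrast /s/ and /ʃ/.
TARGET_PAIRS = [("haθa", "ħaθa"), ("hibi", "ħibi"), ("gaha", "gaħa"),
                ("dihi", "diħi"), ("anah", "anaħ"), ("itih", "itiħ")]
FILLER_PAIRS = [("sata", "ʃata"), ("fisi", "fiʃi"), ("anas", "anaʃ")]


def differing_segments(a, b):
    """Return (index, segment_a, segment_b) wherever two items differ."""
    if len(a) != len(b):
        raise ValueError("items must have the same number of segments")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(a, b, contrast):
    """True if the items differ in exactly one segment, and that
    difference is the given two-phoneme contrast."""
    diffs = differing_segments(a, b)
    return len(diffs) == 1 and {diffs[0][1], diffs[0][2]} == set(contrast)


def contrast_position(a, b):
    """Classify where the contrasting segment sits in the word."""
    index = differing_segments(a, b)[0][0]
    if index == 0:
        return "initial"
    if index == len(a) - 1:
        return "final"
    return "intervocalic"


ALL_ITEMS = [w for pair in TARGET_PAIRS + FILLER_PAIRS for w in pair]
target_positions = Counter(contrast_position(a, b) for a, b in TARGET_PAIRS)
```

Run over Table 2, this confirms 12 target items plus 6 fillers (18 nonwords in total), with the target contrast occurring in each of the three positions described in the text.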
Six male native speakers of Egyptian Arabic were recruited from the
University of Utah community to produce the spoken materials. Talkers
were recorded reading the stimuli in a carrier sentence, “uridu ?an
?aktubu kalemeta ________” (‘I want to write the word _________’), in
a sound-attenuated booth, using a Marantz PMD 660 recorder and a
Samson QV microphone. Talkers were instructed to read the list of 18
Arabic nonwords, which were written in Arabic script, at their normal
speaking rate three times, each time reading the nonwords in a
different random order. The second production of each stimulus was
extracted for presentation in the study. Table 3 provides information
about the six native Arabic talkers.
| | Age | Period of Learning English | Length of Residence in an English-Speaking Country (USA) | Major |
|---|---|---|---|---|
| Talker 1 | 24 | 12 | 4 | Economics |
| Talker 2 | 30 | 16 | 5 | Engineering |
| Talker 3 | 26 | 14 | 2 | Political Science |
| Talker 4 | 27 | 11 | 9 | Physics |
| Talker 5 | 23 | 12 | 8 | Engineering |
| Talker 6 | 32 | 18 | 10 | Education |
Table 3: Talker Group: Six Native Arabic (Egyptian) Speakers
3.1.3 Procedure
This experiment
was administered in a single session that took place in a
sound-attenuated booth where audio and visual stimuli were
introduced, using a computer and Sony MDR-7506 headphones that
participants used to listen at a comfortable level. Three phases were
included in Experiment 1: word-learning, criterion test, and
non-lexical discrimination test. All phases were shown through the
DMDX software that was developed by Forster & Forster (2003).
First, in the word-learning phase, participants listened to each
nonword and saw the picture indicating its meaning, and they were
instructed to learn the words and their meanings as well as possible.
While participants in the single-talker training environment listened
to stimuli produced by a single talker (i.e. either Talker 1 or
Talker 2 or Talker 3), their counterparts in the multiple-talker
training environment listened to stimuli spoken by three multiple
talkers (i.e. Talker 1, Talker 2, and Talker 3). The 18 Arabic
nonwords were presented two times per block, and each block was
presented three times. This resulted in a total of 108 presentations
that were presented in random order in each training environment. 
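The presentation count above can be reproduced with a short sketch. The counts (18 nonwords, 2 presentations per block, 3 blocks) come from the text; the function name `training_schedule`, the rotation of talkers across presentations, and the shuffling within each block are assumptions made for illustration, since the paper does not state how tokens were distributed across talkers.

```python
import random

N_WORDS = 18          # 12 target + 6 filler nonwords (from the text)
REPS_PER_BLOCK = 2    # each nonword is presented twice per block
N_BLOCKS = 3          # each block is presented three times


def training_schedule(talkers, seed=0):
    """Return a list of (word_index, talker) presentations.

    Assumption: in the multiple-talker condition the talkers are rotated
    so that each one contributes equally; the paper does not specify this.
    """
    rng = random.Random(seed)
    schedule = []
    for block in range(N_BLOCKS):
        block_items = []
        for rep in range(REPS_PER_BLOCK):
            for word in range(N_WORDS):
                # Rotate through the available talkers.
                talker = talkers[(word + rep + block) % len(talkers)]
                block_items.append((word, talker))
        rng.shuffle(block_items)  # assumed: random order within each block
        schedule.extend(block_items)
    return schedule


single = training_schedule(["Talker 1"])
multiple = training_schedule(["Talker 1", "Talker 2", "Talker 3"])
print(len(single), len(multiple))  # 108 108
```

Under this assumed rotation, every nonword in the multiple-talker schedule is heard twice from each of the three training talkers, so both conditions receive the same number of exposures per word.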
After the word-learning phase, participants started the criterion test phase, in which they were tested on their knowledge of the training stimuli on a non-lexical discrimination task (that did not require lexical access). In this test, participants heard a word (X), saw a picture (A), and then saw another picture (B), and their task was to decide whether the word (X) matched picture (A) or picture (B) by pressing either the right or left shift key (labeled First and Second) on the keyboard. Each word appeared as (X) twice: once matched with (A) and once matched with (B). Thus, the criterion test included 36 test items, presented in a different random order, and participants in the two training conditions listened to stimuli produced by the same talker(s) they had heard in the word-learning phase. This task did not require any discrimination of the target contrasts. To proceed to the following phase, participants had to score 90% or better on the criterion test. Scoring below 90% resulted in retaking the training phase, which could be repeated as many times as needed until the passing score was achieved. Figure 1 displays an example of a criterion test item:
Fig. 1: Example presentations in the criterion test (Sound-Picture-Picture) used in Experiment 1
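As a worked illustration of the passing rule, the sketch below converts the 90% criterion into a minimum number of correct answers out of 36 items and counts how many training rounds a participant would need. The item count and the 90% threshold come from the text; the function names and the example scores are hypothetical.

```python
N_ITEMS = 36        # 18 nonwords x 2 presentations (from the text)
PASS_PERCENT = 90   # criterion stated in the text


def min_correct(n_items=N_ITEMS, pass_percent=PASS_PERCENT):
    """Smallest number of correct answers that reaches the criterion."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-n_items * pass_percent // 100)


def passed(n_correct, n_items=N_ITEMS, pass_percent=PASS_PERCENT):
    """True if n_correct out of n_items meets the criterion."""
    return n_correct * 100 >= pass_percent * n_items


def training_rounds_needed(criterion_scores):
    """Count training rounds until the first passing criterion score.

    criterion_scores holds the number correct after each training round.
    Returns None if no attempt passes.
    """
    for round_number, score in enumerate(criterion_scores, start=1):
        if passed(score):
            return round_number
    return None


print(min_correct())                          # 33
print(training_rounds_needed([29, 32, 34]))   # 3 (invented scores)
```

Because 90% of 36 is 32.4, a participant needs at least 33 correct responses; 32 correct (88.9%) would send the participant back to the training phase.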
Third, after passing the criterion test, participants proceeded to the last test, i.e. the XAB non-lexical discrimination test, which examined their ability to distinguish the Arabic pharyngeal-glottal minimal pairs. In the XAB non-lexical discrimination test (sound-sound-sound), participants in the two training groups listened to auditory stimuli produced by three unfamiliar talkers (Talker 4, Talker 5, and Talker 6), who had not been heard in the word-learning phase by either group. Each trial consisted of the presentation of three auditory words (X, A, and B), and the participants were asked whether the auditory X was more similar to A or B (e.g. /diħi/-/diħi/-/anah/) by pressing either the right or left shift key (labeled First and Second) on the computer keyboard. Unlike the criterion test, the final test required discrimination of the target contrast: its 36 trials comprised 24 contrast trials (in which A and B were minimal pairs) and 12 foil trials (in which A and B were not members of a minimal pair), presented in random order. Figure 2 shows an example of the non-lexical test stimuli as presented to subjects in the two training groups.
Fig. 2: Example presentations in the XAB non-lexical discrimination task (Sound-Sound-Sound) used in Experiment 1
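To make the scoring of such a test concrete, the following sketch computes proportion correct separately for contrast and foil trials. The trial counts (24 contrast, 12 foil) follow the text; the tuple layout, the function name `proportion_correct`, and the responses of the single hypothetical participant are invented for illustration and are not data from the study.

```python
def proportion_correct(trials):
    """Proportion correct per item type.

    trials: iterable of (item_type, correct_answer, response) tuples,
    where correct_answer and response are "A" or "B".
    """
    totals, hits = {}, {}
    for item_type, correct, response in trials:
        totals[item_type] = totals.get(item_type, 0) + 1
        hits[item_type] = hits.get(item_type, 0) + (response == correct)
    return {t: hits[t] / totals[t] for t in totals}


# Invented responses for one hypothetical participant.
trials = []
for i in range(24):                      # 24 contrast trials
    correct = "A" if i % 2 == 0 else "B"
    wrong = "B" if correct == "A" else "A"
    response = correct if i < 18 else wrong   # 18 of 24 answered correctly
    trials.append(("contrast", correct, response))
for i in range(12):                      # 12 foil trials, all correct
    correct = "A" if i % 2 == 0 else "B"
    trials.append(("foil", correct, correct))

scores = proportion_correct(trials)
print(scores)  # {'contrast': 0.75, 'foil': 1.0}
```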
3.1.4 Results
Proportion correct (the proportion of responses correctly identifying the intended production of the talker) was calculated for each participant. The data were submitted to an Analysis of Variance, with item type (two levels: target, filler) as a within-subjects variable and training group (two levels: single talker, multiple talker) as a between-subjects variable. The main effect of training group was significant (F(1,28) = 88.866, p < .001, partial eta squared = .760), with participants in the multiple-talker training group (.915) performing more accurately than those in the single-talker training group (.671). The effect of item type was also significant (F(1,28) = 79.646, p < .001, partial eta squared = .740), with performance on filler items (.911) higher than that on target items (.675). The interaction of item type and training group was also significant (F(1,28) = 39.685, p < .001, partial eta squared = .586).
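The reported effect sizes can be related to the F statistics through the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the three F values reported above; the function name is illustrative, and the check is a reader's consistency calculation rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for Experiment 1, df = (1, 28)
reported = {
    "training group": (88.866, 0.760),
    "item type": (79.646, 0.740),
    "item type x training group": (39.685, 0.586),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 28)
    print(f"{effect}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

Each computed value agrees with the reported one to three decimal places.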
Following up on the significant interaction of item type and training group, we now consider the results for each item type separately. There was a significant effect of training group on performance on target items (F(1,28) = 161.398, p < .001), with more accurate performance by subjects in the multiple-talker training group (.881) than by those in the single-talker training group (.469). However, the effect of training group on performance on filler items was not significant (F(1,28) = 3.564, p = .069; single-talker group: .872, multiple-talker group: .950). Thus, while performance on filler items, on which all subjects were expected to perform well, did not differ significantly, performance on target items did differ significantly between the groups, and in the expected direction, with subjects in the multiple-talker training group outperforming those in the single-talker training group. Figure 3 presents a visualization of these results:
Fig. 3: Proportion correct for subjects in the two training groups on the non-lexical task; bars represent +/-1 standard error
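The four cell means reported for Experiment 1 can be related to the marginal means of the two main effects. The sketch below assumes equal group sizes and treats each marginal as the unweighted average of the two relevant cells; these assumptions are the reader's, not statements from the paper, and the variable names are illustrative.

```python
# Cell means reported for Experiment 1 (training group, item type).
cell_means = {
    ("single", "target"): 0.469,
    ("multiple", "target"): 0.881,
    ("single", "filler"): 0.872,
    ("multiple", "filler"): 0.950,
}

# Marginal means reported for the main effects.
reported_marginals = {
    "single": 0.671,
    "multiple": 0.915,
    "target": 0.675,
    "filler": 0.911,
}


def marginal(level):
    """Unweighted mean of the cells that share the given factor level."""
    values = [m for key, m in cell_means.items() if level in key]
    return sum(values) / len(values)


for level, reported in reported_marginals.items():
    print(f"{level}: computed {marginal(level):.4f}, reported {reported:.3f}")

# Size of the group difference for each item type (the interaction pattern).
for item in ("target", "filler"):
    gap = cell_means[("multiple", item)] - cell_means[("single", item)]
    print(f"{item}: multiple minus single = {gap:.3f}")
```

Under these assumptions the computed marginals match the reported ones to within rounding, and the group difference is far larger for target items than for filler items, which is the pattern the significant interaction describes.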
3.2 Experiment 2
Unlike Experiment 1, Experiment 2 tested the possible influence of talker variability training on the participants’ ability to generalize knowledge gathered from word-based training to novel talkers in a lexical discrimination task that required them to match auditory forms to pictures. Therefore, participants were mainly tested on their ability to store the contrasting sounds.
3.2.1 Participants
Thirty native English speakers without any prior knowledge of Arabic, ranging in age from 18 to 31 (M = 24.5), took part in this experiment. They were recruited from the University of Utah campus and had not participated in Experiment 1. Participants received either undergraduate course credit (N = 16) or a $10 payment (N = 14) for their voluntary participation in the study. In a background questionnaire, they reported having no speech or hearing problems and no neurological disorders. The questionnaire data also verified that none of them were under the influence of any medication that might affect their motor skills. Participants were randomly assigned to one of the two word-learning environments: single-talker training (4 males and 11 females) or multiple-talker training (6 males and 9 females).
3.2.2 Stimuli
The two sets of stimuli that were previously used in Experiment 1 were also the stimuli for Experiment 2. That is, the stimuli comprised 9 minimal pairs (12 target nonwords + 6 filler nonwords), with the target pairs contrasting the Arabic glottal and pharyngeal consonants (i.e. /h/ and /ħ/) in three different positions (initial, intervocalic, and final) in a Consonant-Vowel-Consonant-Vowel (CVCV) structure. In Experiment 2, the same pictures and the same productions from the same native Arabic speakers were also used.
3.2.3 Procedure
As in Experiment 1, the first two phases, the word-learning phase and the criterion test, were included in Experiment 2, using the same auditory and visual stimuli. Again, the word-learning phase included 108 tokens ((12 target words + 6 filler words) × 2 presentations × 3 blocks), and the criterion test included 36 test items ((12 target words + 6 filler words) × 2 presentations) that were displayed in a different random order for each participant. On passing the criterion test with 90% or better accuracy, participants proceeded to the final test; otherwise, the training phase began again. The final test differed from that of Experiment 1: following Pater (2003), it was an XAB lexical discrimination test (sound-picture-picture) with the same format as the criterion test, in which learners in the two training groups heard a word (X), saw a picture (A), and then saw another picture (B), and were asked to match the auditory word to the correct picture (either A or B) by pressing either the right or left shift key (labeled First and Second) on the keyboard. This task required discrimination of the target pharyngeal-glottal contrast, because A and B were members of the target minimal pairs (contrast trial, e.g. /hibi/-/ħibi/). Unfamiliar talkers produced the tokens in the final test. Figure 4 shows an example of the stimuli presented in the lexical discrimination test in Experiment 2.
Fig. 4: Example presentations in the XAB lexical discrimination task (Sound-Picture-Picture) used in Experiment 2
3.2.4 Results
As in Experiment 1, proportion correct (the proportion of responses correctly identifying the intended production of the talker) was calculated for each participant. The data were submitted to an Analysis of Variance, with item type (two levels: target, filler) as a within-subjects variable and training group (two levels: single talker, multiple talker) as a between-subjects variable. This analysis revealed a significant main effect of training group (F(1,28) = 20.264, p < .001, partial eta squared = .420), with subjects in the multiple-talker training group (.869) performing more accurately than their counterparts in the single-talker training group (.654). Moreover, the main effect of item type was significant (F(1,28) = 35.598, p < .001, partial eta squared = .560), with subjects’ performance on targets (.676) being lower than that on fillers (.847). The interaction of item type and training group was significant as well (F(1,28) = 11.861, p < .001, partial eta squared = .298).
Following up on the significant interaction of item type and training group, we now consider the results for each item type separately. There was a significant difference between the two training groups for target items (F(1,28) = 47.722, p < .001, partial eta squared = .630), where subjects in the multiple-talker training group performed more accurately (.833) than those in the single-talker training group (.519). On the other hand, the effect of training group on subjects’ performance for filler items was not significant (F(1,28) = 3.281, p = .081, partial eta squared = .105). Hence, while performance on filler items, on which all subjects were expected to perform well, did not differ significantly, performance on target items did differ significantly between the groups, and in the expected direction, with subjects in the multiple-talker training group outperforming those in the single-talker training group. Figure 5 presents a visualization of these results:
Fig. 5: Proportion mean correct for subjects in the two training groups on the lexical task; bars represent +/-1 standard error
As demonstrated above, both Experiment 1 and Experiment 2 revealed the expected pattern of results, with subjects in the multiple-talker training conditions outperforming subjects in the single-talker training conditions on the items tested. The data also indicated that all participants were more accurate at identifying familiar contrastive phonemes from their native language (i.e. /s/ and /ʃ/) than novel ones (i.e. /ħ/ and /h/).
3.2.4.1 Comparison of Experiment 1 and Experiment 2: Results
In order to evaluate the effect of task type (in this case, non-lexical versus lexical), the results from Experiments 1 and 2 were compared. An Analysis of Variance was performed, with task type (two levels: non-lexical and lexical) and training group (two levels: single talker and multiple talker) as between-subjects variables and item type (two levels: targets and fillers) as a within-subjects variable. As expected from the results reported above for Experiments 1 and 2 separately, the main effect of item type was significant (F(1,56) = 108.965, p < .001, partial eta squared = .661; target mean: .676; filler mean: .879). In addition, the main effect of training group was also significant (F(1,56) = 71.415, p < .001, partial eta squared = .560; single-talker mean: .663, multiple-talker mean: .892), as was the interaction of item type and training group (F(1,56) = 46.304, p < .001, partial eta squared = .453). In contrast, neither the main effect of task type (F(1,56) = 1.320, p > .05, partial eta squared = .023; non-lexical task mean: .793, lexical task mean: .762) nor any of the two-way or three-way interactions involving the task-type variable were significant (all p > .05). Overall, these findings indicate that there was no significant difference in subjects’ performance on the non-lexical versus the lexical task. These findings are visually represented in Figure 6:
Fig. 6: Proportion mean correct for subjects in Experiment 1 (non-lexical task) and Experiment 2 (lexical task); bars represent +/-1 standard error
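The pooled means in the combined analysis can be traced back to the per-experiment means reported earlier. The following sketch assumes that the two experiments contributed equally sized samples, so that pooled means are simple averages across experiments; that assumption and the variable names are illustrative, not taken from the paper.

```python
# Marginal means reported separately for each experiment.
exp1 = {"target": 0.675, "filler": 0.911, "single": 0.671, "multiple": 0.915}
exp2 = {"target": 0.676, "filler": 0.847, "single": 0.654, "multiple": 0.869}

# Means reported for the combined analysis.
reported_pooled = {"target": 0.676, "filler": 0.879,
                   "single": 0.663, "multiple": 0.892}
reported_task = {"non-lexical": 0.793, "lexical": 0.762}

# Assumed: equal sample sizes, so pooling is an unweighted average.
pooled = {level: (exp1[level] + exp2[level]) / 2 for level in exp1}
task = {
    "non-lexical": (exp1["single"] + exp1["multiple"]) / 2,  # Experiment 1
    "lexical": (exp2["single"] + exp2["multiple"]) / 2,      # Experiment 2
}

for level, value in {**pooled, **task}.items():
    reported = {**reported_pooled, **reported_task}[level]
    print(f"{level}: computed {value:.4f}, reported {reported:.3f}")
```

The computed values agree with the reported ones to within rounding, and the task means (.793 versus .762) differ by only about three percentage points, in line with the non-significant main effect of task type.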
4 Discussion
The experiments presented here examined how variability in the voice of the talker and task type can affect native English speakers’ recognition of the Arabic /ħ/-/h/ contrast on two discrimination tasks that required detection of the target contrast. To this end, two groups of native English speakers in each experiment were taught 18 Arabic nonwords produced either by a single talker or by several talkers. Experiment 1 provided evidence that participants who heard the target tokens produced by multiple talkers during training performed significantly more accurately on the test items than their counterparts in the single-talker training group. That is, native English speakers who heard the target tokens spoken by various talkers during the training phase achieved a mean of 88% correct when matching the test items with their correct auditory forms, suggesting that their knowledge of the phonological forms of the newly learned words was improved by the availability of multiple talkers in the training environment. One possibility is that the multiple talkers’ speech signals provided rich language input in which indexical properties of the talkers’ voices were integrated with the linguistic component of the target language, which in turn facilitated recognition of the phonological forms of the novel Arabic words by native English speakers in this training condition. In contrast, subjects in the single-talker conditions were deprived of this advantage, and listening to different voices for the first time at the final test added extra difficulty to their task. Not only did they have to focus on the novel contrast in the auditory forms in order to detect the difference between the target tokens, but they also needed to attend to the new voices, whose productions of the newly learned words might sound different from those of the familiar single talker in the previous phases. As a result, they distinguished the newly learned words less successfully, with 47% accuracy.
Despite the difficulty of the XAB lexical task reported by Pater (2003), subjects in the multiple-talker group in Experiment 2 were able to exploit the target phoneme contrast to discriminate the meaning of words in the lexical discrimination task. For example, realizing that the two tokens diħi and dihi refer to two different lexical items (i.e. a pen and a paper clip, respectively) provided subjects in the multiple-talker group with adequate information to detect the difference between their middle consonant phonemes /ħ/ and /h/, and that knowledge accordingly helped them establish phonetic categories for the target contrast (with 83% accuracy). Considered together, findings from the two experiments provide additional evidence supporting the positive role of talker variability in the acquisition of one of the difficult Arabic consonant contrasts (i.e. /ħ/-/h/) that learners of Arabic often find challenging to acquire. These findings strengthen the evidence for the beneficial role of talker variability in L2 acquisition.
The accurate performance of the multiple-talker groups can be explained within the framework of exemplar models (Goldinger 1998, Johnson 1997). According to this approach, the acoustic characteristics of target tokens produced by different talkers, which include indexical information (i.e. information about the talker’s gender, age, dialect, social class, and speaking rate) and phonetic information, are stored in learners’ mental lexicons, which facilitates recognition of novel representations of these target words when they are produced by new talkers. Subjects in the multiple-talker groups stored three representations of each target token produced during training, which helped them identify the same tokens when produced by novel talkers during the test phase. Subjects in the single-talker groups, in contrast, stored fewer representations of each token and consequently did not have enough exemplars to distinguish the novel productions of the new talkers.
In relation to the second question, comparing subjects’ performance in the two experiments demonstrated no significant difference between performance on the XAB non-lexical task and the XAB lexical task (92% and 87% correct, respectively, in the multiple-talker groups; 79% and 76% correct across both training groups) despite the different demands of each task. This finding is consistent with Pater’s (2003) study, in which subjects’ performance on the two XAB tasks did not differ (78% correct on both tasks). One possible reason for this finding is the similar L2 input that subjects in the two experiments received. Having the same information, whether introduced by one or multiple talkers, provided subjects with the same input, which presumably resulted in mapping stimuli consistently during the two different stages of the experiments (Schneider & Shiffrin 1977) regardless of the different demands of each of the tasks that subjects performed afterwards. In other words, it can be claimed that subjects in the two experiments started the final XAB discrimination task, non-lexical or lexical, with the same mental representations of the newly learned words. Therefore, the different demands of the tasks did not influence their performance.
In the area of L2 instruction, findings from the present study are important for both teachers and designers of L2 materials. They underscore the importance of using rich acoustic input that is characterized by variability in talkers, stimuli, and phonetic environments when introducing novel L2 phonetic features. This change in L2 teaching methods is expected to facilitate the acquisition of L2 phonology. Additionally, the findings draw L2 teachers’ attention to the significance of using less controlled tasks, similar to the criterion tests used in this study, to better prepare learners for the demands of the controlled tasks that follow. The rationale is that learners need to practice before they are tested on their mastery of the given materials.
In terms of pedagogy, the robust benefit of talker variability in the two experiments implies that L2 learners of Arabic can benefit from exposure to several talkers providing variable Arabic language input to overcome phonological and lexical confusion when they are at an early stage of learning. This finding also draws the attention of L2 instructors in general, and instructors of Arabic in particular, to the importance of exposing L2 learners to numerous speakers of the target language, for instance through integrating multimedia and/or guest speakers of the target L2. Thus, a systematic examination of the impact of different factors, such as talker variability and task type, is essential to elucidate confusions in the acquisition of L2 consonant contrasts. Certainly, these consonants should be addressed more directly and explicitly in pre-reading activities than consonants that the child is expected to learn because of their utility in everyday conversation.
The present
study also raises some interesting questions for future research such
as:
- How are linguistic information and indexical properties retained in learners’ lexicons (i.e. in the same or separate units)?
- How are novel L2 features initially stored?
- How are they transferred from learners’ working memory into their long-term memory?
Answers to these
questions can possibly help us better see the big picture of speech
perception development with reference to variability in talkers and
consequently improve our understanding of this issue. Moreover,
replicating this study with other salient Arabic contrasts, including
segmentals and suprasegmentals and using both perception and
production tests, is another direction for future research that can
enrich word recognition investigation in particular and L2 speech
research in general. 
References
Al-Batal,
M. & Belnap, R. K. (2006). The
teaching and learning of Arabic in the United States: Realities,
needs, and future directions. In K. M. Wahba, Z. A. Taha & L.
England (Eds.), Handbook
for Arabic language teaching professionals in the 21st century (pp.
389-399). Mahwah, NJ: Lawrence Erlbaum Associates. 
Al
Mahmoud, M. S. (2013). Discrimination of Arabic contrasts by American
learners. Studies
in Second Language,
3(2),
261-292. 
Alwabari, S. (2013). Non-Native Production of Arabic Pharyngeal and Pharyngealized Consonants (Master’s Thesis). Carleton University, Ottawa. Available from ProQuest Dissertations and Theses database.
Aoyama,
K., Flege, J. E., Guion, S. G., Akahane-Yamada, R., & Yamada, T.
(2004). Perceived
phonetic dissimilarity and L2 speech learning: The case of Japanese
/r/ and English /l/ and /r/. Journal
of Phonetics,
32,
233–250. 
Barker,
B. A., & Newman, R. S. (2004). Listen to your mother! The role of
talker familiarity in infant streaming. Cognition,
94,
B45-B53.
Baker, W., & Trofimovich, P. (2006). Perceptual paths to accurate production of L2 vowels: the role of individual differences. IRAL, 44(3), 231-250.
Bion,
R. A. H., Escudero, P., Rauber, A. S., & Baptista, B.O. (2006).
Category
formation and the role of spectral quality in the perception and
production of English front vowels. Proceedings of Interspeech 2006,
1363-1366.
Bradlow,
A. R., & Bent, T. (2008). Perceptual adaptation to nonnative
speech. Cognition,
106(2),
707–729.
Bradlow,
A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997).
Training
Japanese listeners to identify English /r/ and /l/: IV. Some effects
of perceptual learning on speech production. The
Journal of the Acoustical Society of America,
101(4),
2299-2310.
Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., & Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Perception & Psychophysics, 61(5), 977-985.
Cartell,
T. D. (1984). Contributions of fundamental frequency, formant
spacing, and glottal waveform to talker identification. Res.
Speech Percept. Tech. Rep.
No. 5
(Indiana Univ., Bloomington, IN ).
Curtin,
S., Goad, H., & Pater, J. V. (1998). Phonological transfer and
levels of representation: The perceptual acquisition of Thai voice
and aspiration by English and French speakers. Second
Language Research,
14(4),
389–405. 
Cutler,
A., & Broersma, M. (2005). Phonetic precision in listening. In W.
J. Hardcastle & J. M. Beck (Eds.), A
figure of speech: A festschrift for John Laver
(pp. 63-91). Mahwah, NJ: Erlbaum.
Cutler,
A., Weber, A., & Otake, T. (2006). Asymmetric mapping from
phonetic to lexical representations in second- language listening.
Journal
of Phonetics,
34,
269–284.
Flege,
J.E., & Liu, S. (2001). The effect of experience on adults’
acquisition of a second language. Studies
in Second Language Acquisition,
23,
527-552.
Flege,
J. E., MacKay, I. A., & Meador, D. (1999). Native Italian
speakers’ perception and production of English vowels. Journal
of the Acoustical Society of America,
106(5),
2973-2987.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers, 35, 116-124.
Goldinger,
S. D. (1998). Echoes of echoes? An episodic theory of lexical access.
Psychological
Review,
105,
251–279. 
Halle, M. (1985). Speculations about the representations of words in memory. In V. A. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 101-114). Orlando: Academic Press.
Hardison,
D. M. (2003). Acquisition of second-language speech: Effects of
visual cues, context, and talker variability. Applied
Psycholinguistics,
24(4),
495-522.
Hayes-Harb,
R., & Masuda, K. (2008). Development of the ability to lexically
encode novel L2 phonemic contrast. Second
Language Research,
24(1),
5–33. 
Houston,
D., & Jusczyk, P. (2000). The role of talker-specific information
in word segmentation by infants. Journal
of Experimental Psychology: Human Perception and Performance,
26(5),
1570-1582.
Johnson,
K., & Mullennix, J. W. (1997). Talker
variability in speech processing.
San Diego: Academic Press, pp. 1–237.
Jusczyk,
P., Pisoni, D., & Mullennix, J. (1992). Effects of talker
variability on speech perception by 2- month-old infants. Cognition,
43(3),
253–291.
Lively,
S. E., Logan, J. S., & Pisoni, D. B.  (1993). Training
Japanese listeners to identify English /r/ and /l/: II. The role of
phonetic environment and talker variability in learning new
perceptual categories. Journal
of the Acoustical Society of America,
94,
1242–1255.
Lively,
S. E, Pisoni, D. B., Yamada, R. A., Tohkura, Y., & Yamada, T.
(1994). Training Japanese listeners to identify English /r/ and /l/:
III. Long-term retention of new phonetic categories. Journal
of the Acoustical Society of America,
96(4),
2076–2087.
Logan,
J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese
listeners to identify English /r/ and /l/: A first report. Journal
of the Acoustical Society of America, 89,
874–886.
MacKay,
I.R.A., Flege, J.E., & Imai, S. (2006). Evaluating the effects of
chronological age and sentence duration on degree of perceived
foreign accent. Applied
Psycholinguistics,
27,
157-183. 
MacKay,
I. R. A., Meador, D., & Flege, J. E. (2001). The identification
of English consonants by native speakers of Italian. Phonetica,
58,
103-125.
Martin, C. S., Mullennix, J. W., Pisoni, D. B., & Summers, W. V. (1989). Effects of talker variability on recall of spoken word lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(4), 676-684.
McCandliss,
B. D., Fiez, J. A., Protopapas, A., Conway, M., & McClelland, J.
L. (2002). Success and failure in teaching the [r]-[l] contrast to
Japanese adults: Tests of a Hebbian model of plasticity and
stabilization in spoken language perception. Cognitive,
Affective, and Behavioral Neuroscience,
2,
89 -109. 
Moyer, A. (2007). Do language attitudes determine accent? A study of bilinguals in the USA. Journal of Multilingual and Multicultural Development, 28(6), 502-517.
Mullennix,
J. W., & Pisoni, D. B. (1990). Stimulus variability and
processing dependencies in speech perception. Perception
and Psychophysics,
47,
379-390.
Mullennix,
J. W., Pisoni, D. B., & Martin, C. S. (1989). Some
effects of talker variability on spoken word recognition. Journal
of the Acoustical Society of America,
85,
365–378.
Nygaard,
L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech
perception. Perception and Psychophysics, 60(3),
355–376. 
Nygaard,
L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception
as a talker-contingent process. Psychological
Science, 5(1),
42-46.
Ota, M., Hartsuiker, R. J., & Haywood, S. (2009). The KEY to the ROCK: near-homophony in nonnative visual word recognition. Cognition, 111, 263-269.
Pater,
J. (2003). The perceptual acquisition of Thai phonology by English
speakers: task and stimulus effects. Second
Language Research, 19(3),
209-223. 
Pater, J., Stager, C., & Werker, J. (2004). The perceptual acquisition of phonological contrasts. Language, 80(3), 384-402.
Rabiee,
M. (2010). Arabic, Farsi fluency considered ‘critical’ to US
national security. Retrieved from
http://www.voanews.com/english/news/usa/Arabic-Farsi-Fluency-Considered-Critical-to-US-National-Security-102171449.html.
Rost, G., & McMurray, B. (2009). Speaker variability augments phonological processing in early word learning. Developmental Science, 12(2), 339-349.
Ryding,
C. K. (2006). Teaching Arabic in the United States. In K. M. Wahba,
Z. A. Taha & L. England (Eds.), Handbook
for Arabic language teaching professionals in the 21st century (pp.
13-20). Mahwah, NJ: Lawrence Erlbaum Associates. 
Schmale,
R., & Seidl, A. (2009). Accommodating variability in voice and
foreign accent: flexibility of early word representations.
Developmental
Science,
12(4),
583-601. 
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic information processing: I. Detection, search, and attention. Psychological Review, 84, 1–66.
Sommers,
M. S., Nygaard, L. C., & Pisoni, D. B. (1994). Stimulus
variability and spoken word recognition. I. Effects of variability in
speaking rate and overall amplitude. Journal
of the Acoustical Society of America, 96,
1314-1324.
Sommers,
M. S., Kirk, K. I., & Pisoni, D. B. (1997). Some considerations
in evaluating spoken word recognition by normal-hearing, noise-masked
normal-hearing, and cochlear implant listeners. I: The effects of
response format. Ear
and Hearing,
18,
89-99. 
Strange,
W., & Dittman, S. (1984). Effect of discrimination training on
the perception of /r-l/ by Japanese adults learning English.
Perception and Psychophysics, 36(2),
131-145.
Wang,
Y., Jongman, A., & Sereno, J. A. (2006). Second language
acquisition and processing of Mandarin tone. In E. Bates, L. Tan, & O. Tzeng (Eds.), Handbook
of Chinese psycholinguistics
(pp. 250-257). Cambridge: Cambridge University Press.
Wang,
Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training
American listeners to perceive Mandarin tones. Journal
of the Acoustical Society of America,
106(6),
3649-3658.
Weber,
A., & Cutler, A. (2004). Lexical competition in nonnative
spoken-word recognition. Journal
of Memory and Language,
50, 1–25.
Wong,
P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007).
Musical experience shapes human brainstem encoding of linguistic
pitch patterns. Nature
Neuroscience,
10,
420–422.
Author:
Dr. Asmaa
Shehata
University of
Calgary
Department of
Linguistics, Languages and Cultures
CH C114, 2500
University Drive NW,
Calgary, Alberta
T2N 1N4, Canada
Email:
Asmaa.shehata@ucalgary.ca