(Originally submitted November 19, 2010)
Introduction
Second language learners often have trouble perceiving
and producing sounds that are not part of their native language.
Perception and production are closely interrelated,
although the precise relationship between them is still not fully understood
(Bohn & Flege, 1997). The literature
seems to be split on whether production leads perception or perception
leads production.
Some studies have shown that, at least in the early stages
of L2 learning, increased perception accuracy may result in more accurate production
(Bohn & Flege, 1997).
Others have argued that the reverse is true and that the
ability to produce correct phonetic distinctions shapes perceptual abilities. According to this theory, articulation and
perception are connected in the mind. A non-native speaker will adjust their
pronunciation until they are understood by a native listener, and this change
in articulation reshapes their mental phonetic categories, which in turn
improves their perceptual abilities (Sheldon & Strange, 1982).
However, insofar as it is possible to separate perception and production, this paper
will focus only on the perception of foreign speech sounds.
This paper will first look at perception of foreign
speech sounds in general, before looking at the perception of English “r” and “l”
sounds by Japanese learners of English as an example of second language speech
perception.
Second Language Speech Perception
It is well documented that second language learners have
trouble perceiving sounds that do not occur in their native language (Munro
& Bohn, 2007). Since native speakers
presumably have the same auditory capabilities as non-native speakers,
the difference cannot be explained by hearing alone, and accounting for it poses a challenge for linguists.
The ability to perceive the difference between non-native
sounds appears to be lost fairly early in childhood.
By observing infants’ interest in different sounds (measured by the vigor
of their sucking), we know that babies are born with the ability to
distinguish between all sorts of sounds that their parents cannot. For example, English-learning infants under
the age of six months can distinguish phonemes used in Czech, Hindi, and Inslekampx
that English-speaking adults cannot distinguish, even with training or university coursework
(Pinker, 1994).
However, by six months babies are beginning to organize
sounds into phonemes according to the categories of their native language, and by ten months they no longer distinguish between
phonemes that do not occur in their native language (Pinker, 1994). By the age of eight, children show adult-like
perception of both native and non-native speech (Best & Tyler, 2006).
It has been hypothesized that this is the result of the limited
processing capacity of the human attention mechanism. Because the mind can focus
on only a limited number of aspects at one time, it does not attend to every feature
of each new piece of input; instead, it looks for patterns and organizes
new input according to categories it has already established.
These established categories are the result of information already
stored in memory (Lively, Logan, & Pisoni, 1993).
Two models are often used to explain
second language speech perception: Flege’s Speech Learning Model and Best’s
Perceptual Assimilation Model (Munro & Bohn, 2007).
The Speech Learning Model (SLM), developed by James Flege
in 1995, suggests that learners will tend to assimilate foreign sounds to the
phonetic categories of their native language if the sounds are similar enough
to allow assimilation. Therefore, according to
the SLM, sounds that are identical in the two languages present no problem to
the learner. As for new sound
contrasts, it is relatively easy for the learner to acquire new categories
for sounds that are phonetically dissimilar from anything in the native
language, because there is no L1 interference (Hazan, Sennema, Iba,
& Faulkner, 2005).
The Perceptual Assimilation Model (PAM), developed by Catherine
Best in 1995, is based on a different theoretical framework and was created for
different purposes: the SLM was
developed for L2 learners actively learning a foreign language, whereas PAM was
developed for naïve listeners (Best & Tyler, 2006). However, PAM makes similar predictions about
non-native speech sounds. According to
PAM, a non-native sound is either “categorized” (as an example of a
pre-existing phoneme category from the native language), “uncategorized” (if
similar to two or more native categories), or “nonassimilable” (if it is not
similar to any pre-existing native category) (Hazan et al., 2005).
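The three PAM outcomes described above can be illustrated as a simple decision rule. The sketch below is an illustration only, not Best’s own formalism: it assumes, hypothetically, that a non-native sound can be given a numeric similarity score (0 to 1) to each native category, and that a single invented threshold decides whether the sound counts as “similar” to a category. All scores and category labels in the example are made up.

```python
from typing import Dict

# Hypothetical cut-off: a native category counts as "similar" at or above this score.
SIMILARITY_THRESHOLD = 0.5


def pam_category(similarities: Dict[str, float],
                 threshold: float = SIMILARITY_THRESHOLD) -> str:
    """Map hypothetical similarity scores onto the three PAM outcomes.

    similarities maps each native phonetic category to an invented score
    (0 to 1) for how similar the non-native sound is to that category.
    """
    similar = [category for category, score in similarities.items()
               if score >= threshold]
    if len(similar) == 1:
        return "categorized"      # heard as an example of one native category
    if len(similar) >= 2:
        return "uncategorized"    # similar to two or more native categories
    return "nonassimilable"       # not similar to any native category


# Invented scores, for illustration only.
print(pam_category({"r": 0.8, "w": 0.2}))   # categorized
print(pam_category({"r": 0.6, "d": 0.55}))  # uncategorized
print(pam_category({"k": 0.2, "t": 0.1}))   # nonassimilable (e.g. a click)
```

The single fixed threshold is a simplification made only for this sketch; the description above is qualitative and gives no numeric cut-off.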
According to both of these theories, non-native speakers
may still be able to discriminate between two or more sounds in an L2 if the sounds
are sufficiently phonetically dissimilar and if no corresponding category exists in
their L1, as when American English speakers correctly discriminate between
various isiZulu click consonants (Best & Tyler, 2006). However, if two or more foreign speech sounds
are highly similar, if the contrast does not occur in the
learner’s native language, and particularly if there is a native
phonetic category into which both foreign sounds could be assimilated, an adult
learner will have trouble distinguishing between these sounds (Munro & Bohn,
2007). One often-cited case of just such
a problem is the difficulty Japanese learners of English have distinguishing
between the two English liquid consonants, “r” and “l”.
To better understand this problem, it is useful to
briefly compare liquid consonants in English and Japanese,
and then look at Japanese perception of English liquids.
Liquid Consonants in English and Japanese
English has two liquid
consonants, “r” and “l”; Japanese has
only one. This is thought to be the cause
of the difficulty Japanese speakers have in perceiving “r” and “l” sounds in
English.
A liquid consonant is a kind of consonant in which the
airflow is only partially obstructed in the oral cavity: unlike with stop
consonants, air is still allowed to escape through part of the oral
cavity, and unlike with fricatives,
no friction is created (Carr, 2008).
The /l/ phoneme in English is called a lateral
approximant, meaning that the air does not pass over the center of the tongue
(as in most other phonemes) but around its sides. In making an /l/ sound the speaker usually
touches the tip of the tongue to the alveolar ridge. (The articulation of /l/ can change
depending on whether it comes before a vowel or a consonant, as in the distinction
between “clear l” and “dark l”) (Roach, 2009).
The other liquid phoneme in English is the “r” consonant,
which is called a post-alveolar approximant.
The tip of the tongue approaches the alveolar area but never
actually makes contact (Roach, 2009).
Both consonants are voiced, although
devoicing can occur when they appear in a consonant cluster with a voiceless
consonant (Roach, 2009).
The acoustic differences between English “r” and “l”,
as seen on a spectrogram, lie in the steady-state onset and the
frequency transition of the third formant
(F3). American speakers rely on these differences
to distinguish between /r/ and /l/ sounds (O’Connor,
Gerstman, Liberman, Delattre & Cooper, 1957).
The Japanese have one liquid consonant, or at least a
consonant that is often referred to as a liquid (some phoneticians question whether
it would be more accurately described as a flap; Flege, Takagi &
Mann, 1995). It is represented in the
Japanese writing system by the symbols ら, り, る, れ, and ろ. In the Hepburn system, the most
conventional way of converting Japanese sounds into the Roman alphabet, these
are usually written as “ra”, “ri”, “ru”, “re”, and “ro”. (Because the Japanese kana writing system is
a syllabary rather than an alphabet, it is impossible to write a single consonant
on its own, with the exception of syllable-final “n”.) It is this convention that gives us the “r”
consonant in such well-known Japanese words as “karate”, “samurai”, and “Hiroshima”.
In precise phonetic terms, exactly what this sound is
and how it is articulated is a matter of some debate. Its pronunciation may vary depending on
whether it is word-initial (or utterance-initial), which
vowel it precedes, whether it is lengthened for emphasis,
and individual variation among speakers. It has been described as an apico-alveolar
tap (palatalized before /i/ and /y/). Reflecting this variation, phoneticians have transcribed
it with various symbols of the International Phonetic Alphabet (IPA), including [r], [ɹ], [lː], [ɾ], and [d] (Vance, 1987).
However, despite this variation in articulation,
Japanese speakers consistently assimilate both English liquid consonants
to the Japanese “r” (Aoyama, Flege, Guion, Yamada & Yamada, 2004).
The exact perceptual relationship between English [ɹ] and [l] and the Japanese “r” is also
uncertain. Japanese listeners identify
both English [ɹ] and [l] as the Japanese “r”, although the Japanese sound may be
perceptually closer to [l] (Guion, Flege,
Yamada & Pruitt, 2000). Flege et al. (1995) write that “Japanese /r/
appears to occupy a position in phonological space that is somewhere between
English /l/, /ɹ/, and /d/ (and possibly /w/).”
Japanese Perception of /r/ and /l/
The difficulty Japanese speakers have in
pronouncing “r” and “l” sounds has long been observed informally. (The US War Department made use of this in its 1942
pamphlet “How to Spot a Jap” (Caniff, 1942).) However, the first serious linguistic study of
the matter was conducted by Hiromu Goto in 1971.
Goto established for the first time that perception was just as
much of a problem for Japanese speakers as production, and that Japanese speakers’ scores on listening
discrimination tests were not much above chance. This was apparently contrary to what most
people expected at the time: “Now the
question is whether or not we Japanese can distinguish ‘L’ from ‘R’ when it is
enunciated by native speakers of English.
Most people have thought that we could clearly distinguish them since
the native teachers would naturally emphatically differentiate them” (Goto,
1971).
Since then, many further studies have confirmed
this finding and have shown that the errors of Japanese speakers are
consistently bidirectional: Japanese
speakers are just as likely to misidentify an English “r” sound as “l” as the
reverse (Flege et al., 1995).
A subsequent study by Miyawaki, Strange, Verbrugge,
Liberman, Jenkins and Fujimura (1975) showed that when the frequencies
of the first and second formants were held constant and only the third
formant (F3) was varied, American listeners tended to perceive the changes
categorically, as either “r” or “l” depending on the F3 transition,
whereas the Japanese listeners’ results were much more random. However, when the third formant was isolated and
played by itself (as a non-speech sound), there was little difference between
Japanese and American listeners. The authors
concluded that because the two groups differed only when the stimuli were heard as speech,
the difference must be the result of linguistic experience rather than of auditory
function.
Also, because the contrast between /r/ and /l/ is based
on spectral cues, it has been argued that it is more difficult for
non-native listeners to acquire than contrasts based on temporal cues such as voice-onset time (Lively,
Pisoni, Yamada, Tohkura, & Yamada, 1994).
Much research has been done into how, and under what
conditions, this perceptual ability is acquired. Many
experiments have sought to create new phonetic categories in the
minds of listeners through perceptual training.
A typical example is the study carried out by Bradlow, Pisoni,
Yamada and Tohkura in 1997 (itself a replication of several
previously published studies with similar results). They presented Japanese listeners with an
/r/-/l/ minimal pair on a computer screen and asked them
to match the word they heard over headphones with its correct written
form on the screen. Correct answers were rewarded with a
chime. Wrong answers received a buzzer, and the test word was repeated until the
correct answer was given. With this
method, identification of /r/ and /l/ improved substantially, from 65% correct on the
pre-test to 81% correct on the post-test. However, the participants did not reach native
English levels of perception, which involve near-perfect identification of /r/ and /l/
phonemes (Bradlow et al., 1997).
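The identification-with-feedback procedure described above can be sketched as a simple loop. This is a simplified simulation under stated assumptions, not the software used by Bradlow et al.: the listener is a hypothetical stand-in function, the word list is invented, and the `max_attempts` cap is an added safeguard that is not part of the procedure as described. It gives a chime for a correct response, a buzzer for an incorrect one (after which the same word is replayed), and reports the proportion of trials answered correctly on the first attempt.

```python
from typing import Callable, List, Tuple

# A trial pairs the word played over headphones with the /r/-/l/ minimal
# pair shown on screen. The words used below are hypothetical examples.
Trial = Tuple[str, Tuple[str, str]]

# A simulated listener receives the two on-screen options and the attempt
# number and returns the word it chooses. (A real listener would also hear
# the audio; this stand-in does not model hearing.)
Listener = Callable[[Tuple[str, str], int], str]


def run_training(trials: List[Trial], listener: Listener,
                 max_attempts: int = 10) -> Tuple[float, List[str]]:
    """Run identification trials with chime/buzzer feedback.

    Returns the proportion of trials answered correctly on the first
    attempt, plus the sequence of feedback signals given.
    max_attempts is an assumed safeguard so the loop always ends.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    feedback: List[str] = []
    first_try_correct = 0
    for word_played, options in trials:
        for attempt in range(1, max_attempts + 1):
            if listener(options, attempt) == word_played:
                feedback.append("chime")   # correct response is rewarded
                if attempt == 1:
                    first_try_correct += 1
                break
            feedback.append("buzzer")      # incorrect: the same word is replayed
    return first_try_correct / len(trials), feedback


def first_then_second(options: Tuple[str, str], attempt: int) -> str:
    """Hypothetical listener: picks the first option, then switches."""
    return options[0] if attempt == 1 else options[1]


accuracy, signals = run_training(
    [("rock", ("rock", "lock")), ("light", ("right", "light"))],
    first_then_second,
)
print(accuracy)  # 0.5
print(signals)   # ['chime', 'buzzer', 'chime']
```

The first-try proportion is only one possible score; the study described above reports identification accuracy on separate pre- and post-tests.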
Outside of training, natural exposure also seems to play a part in improving perception. For example, a study by MacKain, Best, and
Strange (1981) tested /r/ and /l/ perception in two groups of Japanese
subjects: an experienced group, which had had training in English conversation with
native speakers, and an inexperienced group, which had not. The experienced Japanese
subjects showed much better identification of /r/ and /l/ sounds than the inexperienced
Japanese learners and were much closer to the American control subjects,
although neither group had received any explicit perceptual training beyond
exposure to conversation.
A further study by Flege, Takagi and Mann (1996) also
confirmed that Japanese subjects with English experience did better on /r/-/l/
perception tests than inexperienced Japanese subjects, although not quite as
well as native speakers. Flege et al.
also found an effect of lexical familiarity:
both experienced and inexperienced Japanese learners were more likely to
correctly identify words they already knew, indicating that previous
linguistic experience does indeed play a role in perceptual ability.
Another study, published by Aoyama et al. in 2004, tested
the perception of 16 Japanese adults and 16 Japanese children living in Texas. The participants were tested twice, one year
apart, on their perception of /r/ and /l/:
the first time after an average length of residence in the United States
of 0.5 years, and the second time after an average of 1.6 years.
The perception of the Japanese children improved dramatically between the first
and second tests, but the Japanese adults’ perception did not improve
significantly (Aoyama et al., 2004).
Future Research and Personal and Professional Experience
Based on my eight years of teaching English in Japan,
and of interacting informally with native Japanese speakers outside the
classroom, I can personally attest that the “r” and “l” contrast is a major
issue for Japanese learners. It is not
the only phonological issue Japanese students have when learning English: they also often fail to distinguish (both
in perception and in production) between “sh” and “s” sounds, between “v” and “b”
sounds, and between “chi” and “ti”, among others.
However, my informal observation (not based on any
statistical measurement) is that it is the “r” and “l” contrast that most often
leads to confusion when conversing in English with L1 Japanese speakers. This is probably because English
has a large number of naturally occurring /r/-/l/ minimal pairs, such as “right” and “light”.
Sometimes the meaning is clear from the context, but sometimes it is not. More often than not, as far as the Japanese
learner is concerned, communication breakdown happens on the perception side. The native English speaker usually has more resources
(such as a larger receptive and productive vocabulary) to infer the
meaning even if the learner mispronounces the word.
The learner, however, is more likely to misunderstand, particularly if
they are familiar with only one word in a minimal pair and their conversation
partner is using the unfamiliar one (again, all based on my informal
observation).
Failure to accurately perceive /r/ and /l/ sounds is therefore
a source of frustration for Japanese learners and foreign language teachers
alike, and more research into the acquisition of non-native speech sounds would
be welcome.
As mentioned in the introduction to this paper, it is
still unclear whether perception leads production or production leads
perception. This question is of obvious
importance to language teachers designing a curriculum, and so it
is a worthwhile area for further research.
Also, in their 2006 paper, Best and Tyler suggest many areas where further
research on non-native and second language speech perception should take
place. Many of these suggestions could
be applied to the /r/ and /l/ contrast.
For example, if the acquisition of new phonetic
categories is caused by the acquisition of new L2 vocabulary, Best and Tyler suggest testing
beginning learners to see if there is an identifiable critical point in the
expansion of the L2 lexicon at which these new categories are created.
Best and Tyler also note that many studies on the
acquisition of second language speech perception have used the passage of time
as the only variable, without examining what influences were present during
that time. They suggest examining
variables such as the relationship between orthography and perception, and the
influence on the learner of their conversational partners’ style of speech
(since most conversational partners adjust their speech
patterns to the listener’s needs).
Best and Tyler also suggest comparing the perception of L2 speech contrasts
in foreign language environments and in second language environments.
Hopefully
future studies like these will continue to give us more insight into the nature
of second language speech perception.
Bibliography
Aoyama, K., Flege, J. E., Guion, S., Yamada, R. A., & Yamada, T. (2004). Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics, 32, 233-250.
Best, C. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Theoretical and methodological issues (pp. 171-204). Baltimore: York Press.
Best, C. T., & Tyler, M. D. (2006). Nonnative and second-language speech perception: Commonalities and complementarities. In M. J. Munro & O.-S. Bohn (Eds.), Second language speech learning: The role of language experience in speech perception and production (pp. 2-47). Amsterdam: John Benjamins.
Bohn, O.-S., & Flege, J. E. (1997). Perception and production of a new vowel category by adult second language learners. In J. Leather & A. James (Eds.), Second-language speech: Structure and process (pp. 51-71). Berlin: de Gruyter.
Bradlow, A., Pisoni, D., Yamada, R. A., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101(4), 2299-2310.
Caniff, M. (1942). How to spot a Jap. Washington, D.C.: US War and Navy Departments. http://www.ep.tc/howtospotajap/index.html
Carr, P. (2008). A glossary of phonology. Edinburgh: Edinburgh University Press.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Theoretical and methodological issues (pp. 229-273). Baltimore: York Press.
Flege, J., Takagi, N., & Mann, V. (1995). Japanese adults can learn to produce English /ɹ/ and /l/ accurately. Language and Speech, 38(1), 25-55.
Flege, J. E., Takagi, N., & Mann, V. (1996). Lexical familiarity and English-language experience affect Japanese adults’ perception of /ɹ/ and /l/. Journal of the Acoustical Society of America, 99(2), 1161-1172.
Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “l” and “r”. Neuropsychologia, 9, 317-323.
Guion, S., Flege, J., Yamada, R. A., & Pruitt, J. (2000). An investigation of current models of second language speech perception: The case of Japanese adults’ perception of English consonants. Journal of the Acoustical Society of America, 107(5), 2711-2724.
Hazan, V., Sennema, A., Iba, M., & Faulkner, A. (2005). Effects of audiovisual perceptual training on the perception and production of consonants by Japanese learners of English. Speech Communication, 47, 360-378.
Lively, S., Logan, J., & Pisoni, D. (1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94(3), 1241-1255.
Lively, S., Pisoni, D., Yamada, R. A., Tohkura, Y., & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/. III: Long-term retention of new phonetic categories. Journal of the Acoustical Society of America, 96(4), 2076-2087.
MacKain, K. S., Best, C. T., & Strange, W. (1981). Categorical perception of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A., Jenkins, J., & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics, 18(5), 331-340.
Munro, M. J., & Bohn, O.-S. (2007). The study of second language speech: A brief overview. In O.-S. Bohn (Ed.), Language experience in second language speech learning: In honor of James Emil Flege (pp. 3-11). Amsterdam: John Benjamins Publishing Company.
O’Connor, J. D., Gerstman, L. J., Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1957). Acoustic cues for the perception of initial /w, r, l/ in English. Word, 13, 25-43.
Roach, P. (2009). English phonetics and phonology: A practical course. Cambridge: Cambridge University Press.
Sheldon, A., & Strange, W. (1982). The acquisition of /r/ and /l/ by Japanese learners of English: Evidence that speech production can precede speech perception. Applied Psycholinguistics, 3, 243-261.
Vance, T. (1987). An introduction to Japanese phonology. Albany: State University of New York Press.
Grade: H2A