(Originally submitted November 19, 2010)
Second language learners often have trouble perceiving and producing sounds that are not part of their native language.
Perception and production are closely interrelated, although the precise relationship between them is still not fully understood (Bohn & Flege, 1997). The literature is split on whether production leads perception or perception leads production.
Some studies have shown that, at least in the early stages of L2 learning, increased perception accuracy may result in more accurate production (Bohn & Flege, 1997).
Others have argued that the reverse is true and that the ability to produce correct phonetic distinctions shapes perceptual abilities. According to this theory, articulation and perception are connected in the mind. A non-native speaker will adjust their pronunciation until they are understood by a native listener, and this change in articulation reshapes their mental phonetic categories, which in turn improves their perceptual abilities (Sheldon & Strange, 1982).
However, insofar as it is possible to separate perception and production, this paper will focus only on the perception of foreign speech sounds.
This paper will first look at perception of foreign speech sounds in general, before looking at the perception of English “r” and “l” sounds by Japanese learners of English as an example of second language speech perception.
Second Language Speech Perception
It is well documented that second language learners have trouble perceiving sounds that do not occur in their native language (Munro & Bohn, 2007). Since native speakers presumably have the same auditory capabilities as non-native speakers, accounting for this difference in perception creates a challenge for linguists.
The ability to perceive the difference between non-native sounds appears to be lost fairly early in childhood.
By observing the interest of infants in different sounds (measured through the vigorousness of the infants’ sucking), we know that babies are born with the ability to distinguish between all sorts of sounds that their parents cannot. For example, English-learning infants under the age of six months can distinguish phonemes used in Czech, Hindi and Inslekampx that English-speaking adults, even with training or university coursework, cannot distinguish (Pinker, 1994).
However, by six months the babies are beginning to organize sounds into phonemes according to the categories of their native language. By ten months they do not distinguish between phonemes that do not occur in their native language (Pinker, 1994). By the age of 8, children show adult-like perception of both native and non-native speech (Best & Tyler, 2006).
It has been hypothesized that this is the result of the limited processing ability of the attention mechanism in the human mind. Because the human mind can only focus its attention on a limited number of aspects at one time, instead of focusing on all the features of any new piece of input, the mind looks for patterns and organizes new input by categories already established. These established categories are the result of information already stored in the memory (Lively, Logan, & Pisoni, 1993).
There are two different models which are often used to explain second language speech perception: Flege’s Speech Learning Model and Best’s Perceptual Assimilation Model (Munro & Bohn, 2007).
The Speech Learning Model (SLM), developed by James Flege in 1995, suggests that learners will tend to assimilate foreign sounds to the phonetic categories of their native language, if the sounds are similar enough to allow assimilation. Therefore, according to the SLM, sounds that are identical in the two languages present no problem to the learner. As far as new sound contrasts go, it is relatively easy for the learner to acquire new categories for sounds that are phonetically dissimilar from anything in the native language, because there is no problem of L1 interference (Hazan, Sennema, Iba, & Faulkner, 2005).
The Perceptual Assimilation Model (PAM), developed by Catherine Best in 1995, is based on a different theoretical framework and was created for different purposes. (The SLM was developed for L2 learners actively learning a foreign language, whereas PAM was developed for naïve listeners (Best & Tyler, 2006).) However, PAM makes similar predictions about non-native speech sounds. According to PAM, a non-native sound is either “categorized” (as an example of a pre-existing phoneme category from the native language), “uncategorized” (if similar to two or more native categories) or “nonassimilable” (if it is not similar to any pre-existing native category) (Hazan et al., 2005).
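The three outcomes that PAM distinguishes, as summarized above, can be sketched as a simple decision rule. The following Python sketch is only an illustration under stated assumptions: the numerical similarity scores, the threshold, and the function name are hypothetical, since PAM itself does not specify a numerical scale.

```python
def pam_assimilation(similarities, threshold=0.5):
    """Return the PAM outcome for one non-native sound.

    similarities: dict mapping a native category label to a similarity
    score between 0 and 1 (a hypothetical scale, assumed for illustration).
    threshold: assumed cut-off above which a native category counts as similar.
    """
    close = [label for label, score in similarities.items() if score >= threshold]
    if len(close) == 1:
        return "categorized"      # heard as an example of one native category
    if len(close) >= 2:
        return "uncategorized"    # similar to two or more native categories
    return "nonassimilable"       # not similar to any native category


# Invented scores, for illustration only (not measured data).
english_r_for_japanese_listener = {"Japanese r": 0.7, "Japanese w": 0.2}
zulu_click_for_english_listener = {"English t": 0.1, "English k": 0.1}

print(pam_assimilation(english_r_for_japanese_listener))  # categorized
print(pam_assimilation(zulu_click_for_english_listener))  # nonassimilable
```

Under these assumed scores, a sound that falls into a single native category is the case in which a learner is expected to have trouble hearing a new contrast, whereas a nonassimilable sound remains discriminable.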
According to both of these theories, non-native speakers may still be able to discriminate between two or more sounds in an L2 if the sounds are sufficiently phonetically dissimilar and if there is no such category in their L1; American English speakers, for example, correctly discriminate between various isiZulu click consonants (Best & Tyler, 2006). However, if two or more foreign speech sounds have a high degree of similarity, and if this contrast does not occur in the native language of the learner, and particularly if there is a native language phonetic category that both foreign sounds could be assimilated into, an adult learner will have trouble distinguishing between these sounds (Munro & Bohn, 2007). One often-cited case of just such an issue is the problem Japanese learners of English have distinguishing between the two English liquid consonants: “r” and “l”.
To better understand this problem, it is useful to briefly look at how liquid consonants compare in English and Japanese, and then look at Japanese perceptions of English liquids.
Liquid Consonants in English and Japanese
English has two liquid consonants: “r” and “l”. Japanese has only one. This is thought to be the cause of the difficulty Japanese speakers have in perceiving “r” and “l” sounds in English.
A liquid consonant is a kind of consonant in which the airflow is only partially obstructed in the oral cavity and, unlike stop consonants, air is still allowed to escape through part of the oral cavity. Unlike fricatives, there is also no friction created (Carr, 2008).
The /l/ phoneme in English is called a lateral approximant, meaning that the air does not pass over the center of the tongue (as in most other phonemes) but around the sides of the tongue. In making an /l/ sound the speaker usually touches the center of the tongue to the alveolar ridge. The articulation of /l/ can change, however, depending on whether it comes before a vowel or a consonant, as in the “clear l”/“dark l” distinction (Roach, 2009).
The other liquid phoneme in English is the “r” consonant, which is called a post-alveolar approximant. The tip of the tongue gets close to the alveolar area, but never actually makes contact (Roach, 2009).
In both cases, the consonant is voiced, although it can be devoiced in a consonant cluster with an unvoiced consonant (Roach, 2009).
The acoustic differences between English “r” and “l”, as measured on a spectrogram, lie in the steady-state onset and frequency transition of the third oral formant (F3). It is on the basis of these differences that American speakers differentiate between /r/ and /l/ sounds (O’Connor, Gerstman, Liberman, Delattre & Cooper, 1957).
Japanese has one liquid consonant, or at least a consonant that is often referred to as a liquid. (Some phoneticians question whether the Japanese consonant would be more accurately referred to as a flap (Flege, Takagi & Mann, 1995).) It is represented in the Japanese writing system by the symbols ら、り、る、れ、and ろ. Using the Hepburn system, the most conventional way of converting Japanese sounds into the Roman alphabet, these sounds are usually written as “ra”, “ri”, “ru”, “re”, and “ro”. (Because the Japanese writing system is based on a syllabary rather than an alphabet, it is impossible to write a single consonant on its own, with the exception of the syllable-final “n”.) It is this convention that gives us the “r” consonant in such well-known Japanese words as “karate”, “samurai”, “Hiroshima,” and others.
In precise phonetic terms, exactly what this sound is, and how it is articulated, is a matter of some debate. Its pronunciation may vary depending on whether or not it is word initial (or utterance initial), depending on which vowels it precedes, depending on whether or not it is lengthened for emphasis, and depending on individual variation among speakers. It has been described as an apico-alveolar tap (palatalized before /i/ and /y/). Accordingly, various phoneticians have assigned it different values using the International Phonetic Alphabet (IPA): [r], [ɹ], [lː], [ɾ], or [d] (Vance, 1987).
However, despite these differences in articulation, Japanese speakers still consistently assimilate both English liquid consonants with the Japanese “r” (Aoyama, Flege, Guion, Yamada & Yamada, 2004).
The exact perceptual relationship between the English [ɹ] and [l] and the Japanese “r” is also uncertain. Japanese listeners identify both the English [ɹ] and [l] as the Japanese “r”, although it may be closer to [l] (Guion, Flege, Yamada & Pruitt, 2000). Flege et al. (1995) write that “Japanese /r/ appears to occupy a position in phonological space that is somewhere between English /l/, /ɹ/, and /d/ (and possibly /w/).”
Japanese Perception of /r/ and /l/
The difficulty Japanese speakers have in pronouncing “r” and “l” sounds has long been observed informally. (The US War Department made use of this in 1942 with its pamphlet “How to Spot a Jap” (Caniff, 1942).) However, the first serious linguistic study on the matter was done by Hiromu Goto in 1971. Goto also established for the first time that perception was just as much of a problem for Japanese speakers as production: the listening discrimination test results for Japanese speakers were not much above chance. This was apparently contrary to what most people expected at the time. “Now the question is whether or not we Japanese can distinguish ‘L’ from ‘R’ when it is enunciated by native speakers of English. Most people have thought that we could clearly distinguish them since the native teachers would naturally emphatically differentiate them” (Goto, 1971).
Since that time many further tests have corroborated these findings, and have also shown that the errors of Japanese speakers are consistently bi-directional. Japanese speakers are just as likely to misidentify an English “r” sound as “l” as the reverse (Flege et al., 1995).
A subsequent study by Miyawaki, Strange, Verbrugge, Liberman, Jenkins and Fujimura (1975) also showed that when the frequency values of the first and second formants were held constant, and only the third formant (F3) was changed, American listeners tended to perceive the changes categorically in terms of “r” and “l” sounds depending on the transition of the F3, whereas the Japanese listeners showed much more random results. However, when the third formant was isolated and played by itself (a non-speech sound) there was little difference between Japanese and Americans. The authors concluded that, because perception differed only within speech sounds, the difference is the result of linguistic experience and not of auditory function.
Also, because the contrast between /r/ and /l/ is based on spectral cues, it has been argued that this contrast is more difficult for foreigners to acquire than contrasts based on temporal cues such as voice-onset time (Lively, Pisoni, Yamada, Tohkura, & Yamada, 1994).
Much research has been done into how, and under what conditions, perception is acquired. Many experiments have been developed that seek to create new phonetic categories in the minds of listeners through perceptual training. A typical example of this is the study carried out by Bradlow, Pisoni, Yamada and Tohkura in 1997 (which itself was a replication of several previously published studies with similar results). They presented Japanese listeners with an /r/-/l/ minimal pair on a computer screen, and then asked each listener to match the word heard on the headphones with the correct orthographic representation on the screen. Correct answers were rewarded with a chime. Wrong answers received a buzzer signaling an incorrect response, and the test word was repeated until the correct answer was given. By this method, identification of /r/ and /l/ phonemes improved considerably from the pre-test (65% correct) to the post-test (81% correct). However, the participants did not reach native English level perception, which is near perfect identification of /r/ and /l/ phonemes (Bradlow et al., 1997).
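The feedback procedure described above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the software used in the study: the function names, the example word pairs, the simulated listener, and the cap on repetitions (`max_repeats`) are all hypothetical, and the sketch scores only the first response to each word.

```python
import random


def run_trial(pair, target, respond, max_repeats=10):
    """Run one identification trial with feedback.

    pair: the two words shown on screen, e.g. ("rock", "lock")
    target: the word played over the headphones
    respond: callable(pair, target) -> chosen word; stands in for the listener
    max_repeats: assumed safety cap on replays (not from the study)
    Returns True if the first response was correct.
    """
    first_response_correct = False
    for attempt in range(max_repeats):
        choice = respond(pair, target)
        correct = choice == target
        if attempt == 0:
            first_response_correct = correct
        if correct:
            break  # chime: move on to the next word
        # buzzer: the same word is played again on the next pass
    return first_response_correct


def percent_correct(pairs, respond, seed=0):
    """Play one randomly chosen word from each pair and score first responses."""
    rng = random.Random(seed)
    hits = 0
    for pair in pairs:
        target = rng.choice(pair)
        hits += run_trial(pair, target, respond)
    return 100.0 * hits / len(pairs)


def make_listener(accuracy, seed=1):
    """Simulated listener who picks the right word with the given probability."""
    rng = random.Random(seed)

    def respond(pair, target):
        if rng.random() < accuracy:
            return target
        return pair[0] if target == pair[1] else pair[1]

    return respond


# Hypothetical minimal pairs, not the study's stimuli.
pairs = [("rock", "lock"), ("right", "light"), ("road", "load")] * 20
print(percent_correct(pairs, make_listener(0.65)))
```

Calling `make_listener` with a higher assumed accuracy gives a rough picture of how a pre-test score and a post-test score would be compared on the same word list.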
Outside of training, natural exposure also seems to play a part in improving perception. For example, a study by MacKain, Best, and Strange (1981) tested /r/ and /l/ perception in two groups of Japanese subjects: an experienced group, which had training in English conversation by native speakers, and an inexperienced group, which did not have this training. The experienced Japanese subjects showed much better identification of /r/ and /l/ sounds, and were much closer to the American control subjects, than the inexperienced Japanese learners, although neither group had had explicit perceptual training beyond exposure to conversation.
A further study by Flege, Takagi and Mann (1996) also confirmed that Japanese subjects with English experience did better on /r/-/l/ perception tests than inexperienced Japanese subjects, although not quite as well as native speakers. Flege et al. also found an effect of lexical familiarity. Both experienced and inexperienced Japanese learners were more likely to correctly identify words they were already familiar with, indicating that previous linguistic experience does indeed play a role in perception abilities.
Another study, published by Aoyama et al. in 2004, tested the perception of 16 Japanese adults and 16 Japanese children living in Texas. The participants were tested twice, one year apart, on their perception of /r/ and /l/. The first test took place after the participants had been living in the United States for an average of 0.5 years; the second took place at an average length of residence of 1.6 years. It was found that the perception of the Japanese children improved dramatically between the first and second test, but the Japanese adults’ perception did not improve significantly (Aoyama et al., 2004).
Future Research and Personal and Professional Experience
Based on my eight years of teaching English in Japan, and on informal interaction with native Japanese speakers outside of the classroom, I can personally testify that the “r” and “l” contrast is a major issue for Japanese learners. It is not the only phonological issue Japanese students have when learning English. There is also a failure to distinguish (both perceptually and productively) between “sh” and “s” sounds, between “v” and “b” sounds, between “chi” and “ti”, and others.
However, my informal observation (not based on any statistical measurement) is that it is the “r” and “l” contrast that most often leads to confusion when conversing in English with L1 Japanese speakers. This is probably due to the large number of naturally occurring /r/-/l/ minimal pairs in the English language. In these cases sometimes the meaning is clear from the context, but sometimes it is not. More often than not, as far as the Japanese learner is concerned, communication breakdown happens on the perceptual end. The native English speaker has more resources (such as a larger receptive and productive vocabulary) and can usually infer the meaning even if the learner mispronounces the word. The learner, however, is more likely to misunderstand, particularly if they are familiar with only one word in a minimal pair and their conversation partner is using the unfamiliar word (again, all based on my informal observation).
Failure to accurately perceive /r/ and /l/ sounds is therefore a source of frustration for Japanese learners and foreign language teachers alike, and more research into the acquisition of non-native speech sounds would be welcome.
As mentioned in the introduction to this paper, it is still unclear whether perception leads production, or production leads perception. This question is of obvious importance for the language teacher looking to create a curriculum, and so it is a beneficial area to further explore.
Also, in their 2006 paper, Best and Tyler suggest many areas where further research on non-native and second language speech perception should take place. Many of these suggestions could be applied to the /r/ and /l/ contrast.
For example, if the acquisition of new phonetic categories is caused by the acquisition of new L2 vocabulary, Best and Tyler suggest testing beginning learners to see if there is an identifiable critical point in the expansion of the L2 lexicon at which these new categories are created.
Best and Tyler also claim that many studies on the acquisition of second language speech perception have used the passage of time as the only variable, without examining what influences were present during that time. They suggest looking at variables such as the relationship between orthography and perception, and the influence on the learner of their conversational partners’ style of speech (since most conversational partners will make adjustments in their speech patterns according to the listener’s needs).
Best and Tyler also suggest comparing perceptual recognition of L2 speech contrasts between the foreign language environment and the second language environment.
Hopefully future studies like these will continue to give us more insight into the nature of second language speech perception.
References
Aoyama, K., Flege, J.E., Guion, S., Yamada, R.A. & Yamada, T. (2004). Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics, 32, 233-250.
Best, C. (1995). A direct realist view of cross-language speech perception. In Strange, W. (Ed.) Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (pp. 171-204). Baltimore: York Press.
Best, C.T., & Tyler, M.D. (2006). Nonnative and second-language speech perception: Commonalities and complementarities. In M.J. Munro & O.-S. Bohn (Eds.) Second language speech learning: The role of language experience in speech perception and production (pp. 2-47). Amsterdam: John Benjamins.
Bohn, O.S., & Flege, J.E. (1997). Perception and production of a new vowel category by adult second language learners. In Leather, J. & James, A. (Eds.) Second-language speech: structure and process. (pp. 51-71). Berlin: de Gruyter.
Bradlow, A., Pisoni, D., Yamada, R.A., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101 (4), 2299-2310.
Caniff, M. (1942). How to spot a Jap. Washington D.C.: US War and Navy
Carr, P. (2008). A Glossary of Phonology. Edinburgh: Edinburgh University Press Ltd.
Flege, J.E. (1995). Second-language speech learning: Theory, findings, and problems. In Strange, W. (Ed.) Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (pp. 229-273). Baltimore: York Press.
Flege, J., Takagi, N., & Mann, V. (1995). Japanese adults can learn to produce English /ɹ/ and /l/ accurately. Language and Speech, 38 (1), 25-55.
Flege, J.E., Takagi, N., & Mann, V. (1996). Lexical familiarity and English-language experience affect Japanese adults’ perception of /ɹ/ and /l/. Journal of the Acoustical Society of America, 99 (2), 1161-1172.
Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “l” and “r”. Neuropsychologia, 9, 317-323.
Guion, S., Flege, J., Yamada, R.A., & Pruitt, J. (2000). An investigation of current models of second language speech perception: The case of Japanese adults’ perception of English consonants. Journal of the Acoustical Society of America, 107 (5), 2711-2724.
Hazan, V., Sennema, A., Iba, M. & Faulkner, A. (2005). Effects of audiovisual perceptual training on the perception and production of consonants by Japanese learners of English. Speech Communication, 47, 360-378.
Lively, S., Logan, J., & Pisoni, D. (1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94 (3),
Lively, S., Pisoni, D., Yamada, R.A., Tohkura, Y. & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories. Journal of the Acoustical Society of America, 96 (4), 2076-2087.
MacKain, K.S., Best, C.T., & Strange, W. (1981). Categorical perception of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A., Jenkins, J. & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception & Psychophysics, 18 (5), 331-340.
Munro, M.J., & Bohn, O.S. (2007). The study of second language speech: A brief overview. In O.S. Bohn (Ed.) Language experience in second language speech learning: In honor of James Emil Flege (pp. 3-11). Amsterdam: John Benjamins Publishing Company.
O’Connor, J.D., Gerstman, L.J., Liberman, A.M., Delattre, P.C., & Cooper, F.S. (1957). Acoustic cues for the perception of initial /w,r,l/ in English. Word, 13, 25-43.
Roach, P. (2009). English phonetics and phonology: A practical course. Cambridge: Cambridge University Press.
Sheldon, A., & Strange, W. (1982). The acquisition of /r/ and /l/ by Japanese learners of English: Evidence that speech production can precede speech perception. Applied Psycholinguistics, 3, 243-261.
Vance, T. (1987). An introduction to Japanese phonology. Albany: State University of New York Press.