Monday, June 25, 2018

Bottom-Up and Top-Down Listening for Elementary Learners (Research Essay)

[This is the research essay I needed to complete as part of my application for the Delta. 
Also as part of the application process, I needed to plan and teach a lesson based on this research paper.  That lesson I have posted over on my main blog here.]

My Reason for Choosing Listening
There are several reasons why listening appeals to me as an area of research.  First of all, it is the primary source of input for the learners, especially early learners.  (In my personal experience, the low level classes that I teach, both children and adults, all follow textbooks that are light on reading activities, but heavy on listening activities.)  According to David Nunan, over 50% of the time learners spend in a foreign language is devoted to listening (as quoted in Nation & Newton p. 37).
Also, I am sympathetic to Krashen’s input hypothesis which claims that all language features are acquired through comprehensible input.  For example, Krashen believed that speaking ability cannot be taught by oral drilling, but instead will naturally emerge if the student is exposed to sufficient language at a level they can understand.  Although I believe in a modified form of this (which uses conscientious grammar study as a supplement to comprehensible input), I still believe that no acquisition can take place without plenty of input.  Since most of this input is in the form of listening, especially at the early levels, I am interested in finding ways to aid learners in comprehending the input.
Within the general skill of listening, I have chosen to focus on top-down and bottom-up listening skills for elementary students.  Bottom-up listening is especially important for the grammatical features to be noticed, and thus important for Krashen’s input theory.  However, many TESOL authorities advocate that top-down listening skills should always precede any bottom-up work (for example Harmer, 1998, p. 100).  So it is necessary to focus on both skills in the lesson.  I have chosen elementary students for two reasons.  One is the obvious practical reason that this is the class I am currently teaching.  The other is that my elementary students struggle a lot with the listening exercises from the textbook, and I want to focus on ways to improve their listening skills.

There is wide agreement among theorists that listening actually consists of two skills: Top Down Processing and Bottom Up Processing (a view supported in almost all the resources I consulted, including: Nation & Newton, Pinker, Harmer, Scrivener).  Bottom Up Processing is succinctly described by David Nunan, who writes “The bottom up processing model assumes that listening is a process of decoding the sounds that one hears in a linear fashion, from the smallest meaningful units (or phonemes) to complex texts.  According to this view, phonemic units are decoded and linked together to form words, words are linked together to form phrases, phrases are linked together to form utterances, and utterances are linked together to form meaningful text” (p. 200).
There is an intuitive logic to the idea of bottom-up processing.  When we speak we articulate sounds, and we naturally assume listeners are decoding the sounds that we articulate. However, this is probably not the case.  “Spoken language probably comes at you too fast to be able to adopt such an item-by-item approach on its own,” writes Jim Scrivener (2005, p.178).
In actuality, human beings do not simply decode sound patterns like robots.  Steven Pinker, in his book The Language Instinct, devotes a chapter to the differences between how humans decode language, and how computers decode language.  Often the exact same phonemes could be interpreted as multiple words (e.g. I scream, ice cream), but unlike computers, humans decode the words according to context.  Research and laboratory tests show that listeners will often perceive the words that they expect to hear in a given context, rather than the actual words that are spoken (Pinker, 2005).
All of this indicates that actual listening perception may be a largely top-down procedure. Rather than mechanically decoding the sounds into words, we make large use of the context and our prior expectations to interpret speech.
This awareness has led to a big emphasis on Top-Down Processing in second language classrooms.  It is expected that learners will have huge difficulties decoding speech if they are not given sufficient context for the speech, or do not have any expectations.  This has led to the Schema Theory.  According to this theory, everyone has a mental framework built up of past experiences and previously learned knowledge.  We use this framework, or schemata, to help us decode phonemic sounds into meaning (Nunan, p. 201-204).  The English learner, while perhaps not knowledgeable about the English language, already has a fair amount of general knowledge about the world, and they will use this knowledge to help them decode the sounds that they hear.
And in fact, schema building has been shown to help accuracy in listening exercises.  Nunan reports on an investigation conducted by Spada on the effectiveness of structuring a listening lesson for learners.  Learners who did a set of predictive exercises before the listening task did much better on the listening exercises than learners who did not do pre-listening tasks (Spada in Nunan, p.208).
It is therefore the job of the ESL teacher to try to activate the students’ schemata before the listening, and to provide sufficient context for the listening.
In my personal experience, many textbooks I have taught out of already follow this pattern, with pre-listening activities designed to familiarize students with the context, and to activate their previous knowledge of the topic.  IELTS preparation textbooks also often encourage the student to visualize the situation before listening, and to predict the likely vocabulary that they will hear.
At the same time, however, it is important not to neglect Bottom-Up Listening skills.  The process of listening is not solely Top Down or Bottom up, but rather a combination of both procedures at work (Nation & Newton, p. 40-41).
Bottom-up listening skills are especially important for listening tests when the correct answer does not match the content schemata.  Research has shown that the students who correctly answered questions where the answer did not match the schemata made use of Bottom Up processing (Nation & Newton, p. 41).
Also, in terms of listening comprehension aiding language acquisition, Bottom-Up processing is especially important for noticing grammatical features.  In my personal experience, I can think of many students who had very high listening comprehension abilities, but very grammatically flawed production.  This is probably because the student has been overly reliant on Top Down listening procedures to infer meaning, and as a result has not noticed many of the grammatical features available in the input.  Swain’s study of English students in French immersion schools also confirms the existence of students who were doing quite well in the subject matter, but not improving their French grammar (as quoted in Nation & Newton, p.41).
It is the job of the language teacher, therefore, to assist students not only in developing their top down listening abilities, but also in their bottom up listening procedures.

Classroom Procedures for Top Down Listening
In my personal experience, I have taught a wide range of levels and I usually find it useful to make a distinction between lower level general English classes, and examination classes.
In an actual IELTS examination, there will be no teacher present to activate the students’ schemata before the listening test.  The students must therefore take the responsibility for doing the pre-listening tasks themselves, although the teacher should help them with strategies.  (And most IELTS textbooks teach these strategies, for example reading the questions and trying to predict the situation and vocabulary.)
In lower-level classes, the students will not be familiar with these strategies, and the teacher should take a more active role in setting the context.  In lower levels especially, the students often feel a certain amount of anxiety when listening to English, and it is important for them to gain successful experiences in listening tasks, so as not to feel disheartened (Scrivener, 2005, p.177).  Therefore the teacher should give as much support to the students as possible in the form of setting context, activating schemata, helping with vocabulary, et cetera.
Listening tasks usually create a certain anxiety among students due to the fact that the listeners cannot control the speed of material.  Unlike reading, they cannot go back and forth in the text to resolve a point of confusion.  Unlike conversation, they cannot interrupt to ask for clarification.  As Jeremy Harmer says: “…the speed of the speaker(s) dominates the interaction, not that of the listeners. … It is perhaps this relentlessness of taped material which accounts for the feeling of panic which many students experience during listening activities” (Harmer, 1998, p. 99).
To overcome this feeling of panic, Harmer suggests that listeners be trained to first listen to the tape for a general understanding, rather than trying to pick out every little detail.  “They must first get into the habit of letting the whole tape ‘wash over them’ on first hearing, thus achieving general comprehension before returning to listening for specific details” (p. 99).
Therefore, every listening exercise should start out with what is commonly known as a “gist” listening, in which the CD is played once just to give the students a general idea of the content before moving on to more specific tasks.

Classroom Procedures for Bottom Up Listening
Several bottom-up listening activities are detailed in chapter 4 of Nation & Newton, one of which is dictation.  They write that this exercise is useful because “Dictations help language learning by making learners focus on the language form of phrase and clause level constructions, and by providing feedback on the accuracy of their perception” (p. 59).
Nation & Newton write that a good dictation text is “a piece of connected language about 100 to 150 words long” (p.59).  The teacher reads the text to the class, and the students write down the words that they hear.
Nation & Newton detail several variations on dictations (running dictation, one chance dictation, dictation of long phrases, guided dictation, dictation for a mixed class, peer dictation, perfect dictation, unexploded dictation, monitoring dictation, dictogloss, et cetera).
Since this is the first dictation activity I will be attempting with this group of students, the completion dictation combined with the sentence dictation seems appropriate as this will give them the most possible support.  In the completion dictation, the students have several copies of the text.  In the first dictation sheet only a few of the words are missing.  The next copy has more words missing.  This continues until the students are writing the whole text.
This activity can be combined with sentence dictation, which is a technique where the teacher writes the correct sentence on the blackboard (or PowerPoint) immediately, to give the students a chance to see and correct their mistakes before moving onto the next sentence.
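As a rough illustration of how the copies for a completion dictation could be prepared, here is a minimal Python sketch.  The function name completion_sheets, the number of copies, and the random choice of which words to blank are all assumptions made for illustration, not part of Nation & Newton's description.  A teacher might well prefer to choose the gaps by hand, for example to target plural endings.

```python
import random

def completion_sheets(text, steps=4, seed=0):
    """Return `steps` copies of `text`, each with more words blanked out.

    Hypothetical helper: blanks accumulate from one copy to the next,
    and the final copy is entirely blank.
    """
    words = text.split()
    order = list(range(len(words)))
    # Shuffle once so that every later sheet keeps the earlier blanks.
    random.Random(seed).shuffle(order)
    sheets = []
    for step in range(1, steps + 1):
        n_blank = round(len(words) * step / steps)
        blanked = set(order[:n_blank])
        sheet = " ".join(
            "____" if i in blanked else word for i, word in enumerate(words)
        )
        sheets.append(sheet)
    return sheets

# Made-up example sentence, not taken from any textbook.
text = "She walks to school with her two brothers every morning"
for number, sheet in enumerate(completion_sheets(text), start=1):
    print(f"Sheet {number}: {sheet}")
```

Each sheet would be handed out in order, so that the students write a little more of the text on every pass.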
With Vietnamese students in particular, the /s/ sound on the end of words is a problem.  It is most obviously a problem in production, but I have noticed in my own classrooms that it is often a problem in perception as well.  (In my IELTS classes, for example, students will often lose points on practice examinations because they do not hear the final “s” on the end of the word.)
Therefore, I am anticipating that the final /s/ sound will be particularly challenging for my students in a dictation activity.
There is, however, a technique advocated by Nation & Newton to help students with any difficulty in perception.  They advocate several pre-dictation activities (p.61-61).  One such activity is to have the students study the text before the dictation and to direct them to underline certain features such as “verb endings, plural s, etc.” (p.61).


Nunan, D. (1999) Second Language Teaching & Learning Boston: Heinle & Heinle Publishers

Harmer, J. (1998)  How to Teach English Essex: Pearson Education Limited

(No page numbers are given for Krashen and Terrell in the in-text citation because I have no copy in front of me and am referencing from memory.)

Nation, I.S.P., Newton, J. (2009) Teaching ESL/EFL Listening and Speaking  New York: Routledge

(No page numbers are given for Pinker in the in-text citation because I have no copy in front of me and am referencing from memory.)

Friday, February 28, 2014

Identification and Discrimination of /r/ and /l/ Phonemes by L1 Japanese Speakers: A Comparison Between Two Groups

[Originally submitted December 2010.  This is a minor thesis I wrote.  I have copied and pasted it below, but some of the formatting got a bit mangled copying from Microsoft Word to Blogger; the version on Google Documents can be viewed here.  Although I've chosen to post this online for whatever it may or may not be worth to anyone, it should be treated with caution.  It is a severely flawed study for all the reasons mentioned in the limitations section (Chapter 7) as well as the criticisms pointed out by the reviewers at the end.]

            This study tested two groups of L1 Japanese speakers on perception of /r/ and /l/ phonemes.  One group had never lived outside of Japan.  The other group was living in Melbourne, Australia at the time of the testing.  Both groups were tested on three different tasks: identification of /r/ and /l/ phonemes, discrimination of same and different /r/ and /l/ phonemes, and identification of correct English phonotactic phoneme sequences using /r/ and /l/.  The two groups were then compared against each other to see if there was any advantage for the group living in Melbourne.  The results showed that the two groups were similar on identification tasks and phonotactic tasks.  However, the group living in Melbourne scored significantly higher for discriminating between same and different sounds.

Table of Contents
Chapter 1. Literature Review
1.1 General Literature on Second Language Speech Perception
1.2 Japanese Perception of /r/ and /l/
1.2.1 Liquid Consonants in English and Japanese
    English /l/
    English /r/
    Japanese liquid features
1.2.2 Research on Japanese perception of /r/ and /l/
    Japanese perception of /r/ and /l/ in consonant clusters
    English phonotactics with /r/ and /l/ and perception
Chapter 2. This study
Chapter 3. Research hypotheses
Chapter 4. Methodology
4.1 Participants
4.1.1 Participants residing in Melbourne
4.1.2 Participants residing in Japan
4.2 Tests
4.2.1 Listening test one
    Listening test one materials
    Task One: Identification of /r/ and /l/ Sounds
    Task Two: Identification of Same or Different Sounds
4.2.2 Listening test two
    Listening test two materials
    Listening test two tasks
    Listening identification task
    Post perception test task
4.3 Statistical Analyses Procedures
Chapter 5. Results
5.1 Listening test 1, Task 1: Identification of /r/ and /l/
5.1.1 Total scores
5.1.2 Singleton scores
5.1.3 Consonant cluster scores
5.1.4 Comparisons between singletons and consonant clusters
5.2 Listening test one, task two: accuracy in identifying same or different sounds (discrimination task)
5.3 Test two: Accuracy in identifying correct English phonotactics
5.4 Test two: Known versus unknown words
5.5 Correlations between tasks
5.6 Correlations between perception accuracy and length of residence (Melbourne group only)
Chapter 6. Discussions
6.1 Singletons versus consonant clusters
6.1.1 Identification task
6.1.2 Singletons versus consonant clusters: Discrimination task
6.2 Phonotactic awareness
6.3 Comparisons between groups
6.3.1 Discrimination and identification tasks
6.3.2 Length of residence variable
6.4 Correlations between tasks
6.4.1 Phonotactic awareness task
6.4.2 Identification and discrimination tasks
Chapter 7. Limitations
7.1 Participants
7.1.1 Control group
7.1.2 Participant number
7.1.3 Balanced participants
7.2 Tasks
7.2.1 Discrimination task
7.2.2 Phonotactic task
7.3 Consistency between groups
Appendix I. Participant tasks
Appendix II. List of tokens

1. Literature review
1.1 General literature on second language speech perception
            It is well documented that second language learners have trouble perceiving sounds that do not occur in their native language (Munro & Bohn, 2007).  Since native speakers presumably have the same auditory capabilities as non-native speakers, accounting for this difference in perception creates a challenge for linguists.
            The ability to perceive the difference between non-native sounds appears to be lost fairly early in childhood (Best & Tyler, 2006).
            By observing the interest of infants in different sounds (measured through the vigorousness of the infants’ sucking) we know that babies are born with the ability to distinguish between all sorts of sounds that their parents cannot.  For example, English-learning infants under the age of six months can distinguish phonemes used in Czech, Hindi and Inslekampx (a Native American language) that English speaking adults, even with training or university coursework, cannot distinguish (Pinker, 1994).
            However, by six months the babies are beginning to organize sounds into phonemes according to the categories of their native language.  By ten months they do not distinguish between phonemes that do not occur in their native language (Pinker, 1994).  By the age of 8, children show adult-like perception of both native and non-native speech (Best et al., 2006).
            There are two different models which are often used to explain second language speech perception: Flege’s Speech Learning Model and Best’s Perceptual Assimilation Model (Munro et al., 2007).  The Speech Learning Model (SLM), developed by Flege in 1995, suggests that learners will tend to assimilate foreign sounds to the phonetic categories of their native language, if the sounds are similar enough to allow assimilation.  Therefore according to the SLM, sounds that are identical in the two languages present no problem to the learner.  As far as new sound contrasts go, it is relatively easy for the learner to acquire new categories for sounds that are phonetically dissimilar from anything in the native language, because there is no problem of L1 interference (Hazan, Sennema, Iba, & Faulkner, 2005).  
            The Perceptual Assimilation Model (PAM), developed by Best in 1995, is based on a different theoretical framework and created for different purposes.  (The SLM was developed for L2 learners actively learning a foreign language, whereas PAM was developed for naïve listeners (Best et al., 2006).)  However, PAM makes similar predictions about non-native speech sounds.  According to the PAM, a non-native sound is either “categorized” (as an example of a pre-existing phoneme category from the native language), “uncategorized” (if similar to two or more native categories) or “nonassimilable” (if it is not similar to any pre-existing native category) (Hazan et al., 2005).
            According to both of these theories, non-native speakers may still be able to discriminate between two or more sounds in an L2 if they are sufficiently phonetically dissimilar, and if there is no such category in their L1, such as American English speakers correctly discriminating between various isiZulu click consonants (Best et al., 2006).  However, if two or more foreign speech sounds have a high degree of similarity, and if this contrast does not occur in the native language of the learner, and particularly if there is a native language phonetic category that both foreign sounds could be assimilated into, an adult learner will have trouble distinguishing between these sounds (Munro et al., 2007).  One often cited case of just such an issue is the problem Japanese learners of English have distinguishing between the two English liquid consonants: “r” and “l”.
            To better understand this problem, it is useful to briefly look at how liquid consonants compare in English and Japanese, and then look at Japanese perceptions of English liquids.
1.2 Japanese perception of /r/ and /l/
1.2.1 Liquid consonants in English and Japanese
            English has two liquid consonant phonemes: /r/ and /l/.  Japanese only has one.  This is thought to be the cause of the difficulty Japanese speakers have in perceiving /r/ and /l/ sounds in English.
            A liquid consonant is a kind of consonant in which the airflow is only partially obstructed in the oral cavity and, unlike stop consonants, air is still allowed to escape through part of the oral cavity.  It is generally described as an approximant.  Unlike fricatives, there is also no friction created during the constriction phase (Carr, 2008).
English /l/
            The phoneme of /l/ in English can vary widely in terms of its articulatory and acoustic realization.  It has many different allophones depending on its position in the syllable.  In the syllable initial position, however, the English /l/ tends to be a lateral alveolar approximant (Scobbie & Wrench, 2003).  The /l/ phoneme in English is a lateral equivalent, meaning that in syllable initial position airflow does not pass through the center channel of the vocal tract (as in most other phonemes) but passes around the sides of the tongue blade and out of the vocal tract.  In producing an /l/ sound the speaker usually raises the tongue tip to the alveolar ridge (Roach, 2009).  The edges of the tongue blade are then compressed inward, away from the upper teeth, to create an airway.  Because of this, /l/ is a highly sonorous consonant despite the fact that there is alveolar contact (Scobbie et al., 2003).  /l/ is also always stronger in the onset position than when it is in the coda (Scobbie et al., 2003).
(Although the articulation of /l/ can change depending on whether or not it is before vowels or consonants, such as the “clear l”/ “dark l” distinction (Roach, 2009; Scobbie et al., 2003), this study will focus only on syllable initial /l/.)
English /r/
            The other liquid phoneme in English is the /r/ approximant.  The tip of the tongue is raised in proximity to the alveolar ridge, but never actually makes contact (Roach, 2009).
            In both cases, the consonant is voiced, although devoicing can occur when it occurs in a consonant cluster with an unvoiced consonant (Roach, 2009).
            The major acoustic differences between English /r/ and /l/ are located in variation in the steady-state onset, and frequency transition of the third formant (F3).  Studies have shown that it is based on this frequency transition that native English speakers differentiate between /r/ and /l/ sounds (O’Connor, Gerstman, Liberman, Delattre & Cooper, 1957).
Japanese liquid features
            Japanese has one liquid consonant phoneme, or at least a consonant that is often referred to as a liquid.  (Some phoneticians question whether the Japanese consonant would be more accurately referred to as a flap (Flege, Takagi & Mann, 1995).)  It is represented in the Japanese writing system by the symbols ら、り、る、れ、and ろ.  Using the Hepburn romanization system, the most conventional way of converting Japanese sounds into the Roman alphabet, these sounds are usually written as “ra”, “ri”, “ru”, “re”, and “ro”.  (Because the Japanese writing system is based on a syllabary rather than an alphabet, with the exception of the syllable final “n” it is impossible to isolate a single consonant on its own.)  It is this convention that gives us the “r” consonant in such well-known Japanese words as “karate”, “samurai”, “Hiroshima,” and others.
            In precise phonetic terms, exactly what this sound is, and how it is articulated, is a matter of some debate.  It appears to have some degree of phonotactic variation.  Its pronunciation may vary depending on whether or not it is word initial (or utterance initial), depending on which vowels it precedes, depending on whether or not it is lengthened for emphasis, and depending on individual variation among speakers.  It has been described as an apico-alveolar tap (palatalized before /i/ and /y/).  Accordingly, various phoneticians have assigned it different values using the International Phonetic Alphabet (IPA): [r], [ɹ], [l], [ɾ], or [d] (Vance, 1987).
            However, despite these differences in phonetic transcription, Japanese speakers still consistently assimilate both English liquid consonants with the Japanese liquid (Aoyama, Flege, Guion, Yamada & Yamada, 2004).
            The exact perceptual relationship between the English [ɹ] and [l] and the Japanese liquid is also uncertain.  Japanese listeners identify both the English [ɹ] and [l] as the Japanese liquid, although it may be closer to [l] (Guion, Flege, Yamada & Pruitt, 2000).  Flege et al. (1995) write that “Japanese /r/ appears to occupy a position in phonological space that is somewhere between English /l/, /ɹ/, and /d/ (and possibly /w/).”
1.2.2 Research on Japanese perception of /r/ and /l/
            The fact that Japanese speakers have had difficulty pronouncing /r/ and /l/ sounds has long been observed informally.[1]  “Difficulties that Japanese … encounter with the English /l-r/ contrast are so well known as to become a linguistic stereotype” write Ingram and Park (1998).
            However, the first serious linguistic study on the matter was done by Goto (1971).  Goto also established for the first time that perception was just as much of a problem for Japanese speakers as production, and that listening discrimination test results for Japanese speakers were not much above chance.  This was apparently contrary to what most people expected at the time.  “Now the question is whether or not we Japanese can distinguish ‘L’ from ‘R’ when it is enunciated by native speakers of English.  Most people have thought that we could clearly distinguish them since the native teachers would naturally emphatically differentiate them” (Goto, 1971).
            Goto also tested the ability of his participants to discriminate between /r/ and /l/ sounds as either the same or different.  This second test, he said, was different from the first because it was not testing English ability, just testing the “ability of auditory discrimination between one syllable and another” (Goto, 1971).
            Goto’s participants were too few in number to establish results with statistical significance, but on an eight-question test, none of his participants achieved accurate discrimination (which Goto defined as “a score of 8/8, or at least 7/8” (Goto, 1971)).
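            To put this criterion in perspective, the short Python sketch below works out how often pure guessing would reach it.  It assumes that each of the eight items is an independent two-alternative question answered correctly with probability 0.5; that assumption, and the function name chance_of_at_least, are illustrative only and are not taken from Goto (1971).

```python
from math import comb

def chance_of_at_least(k, n, p=0.5):
    """Binomial probability of k or more correct answers out of n items,
    where p is the assumed probability of guessing one item correctly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Probability of scoring 7/8 or 8/8 by guessing alone (9/256).
print(chance_of_at_least(7, 8))
```

            Under these assumptions a guesser would meet the 7/8 criterion only about 3.5% of the time (9 in 256).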
            Since that time many further tests have also validated this research, as well as showing that the errors of Japanese speakers are consistently bi-directional.  Japanese speakers are just as likely to misidentify an English /r/ sound as /l/, as the reverse (Flege et al., 1995).
            A subsequent study by Miyawaki, Strange, Verbrugge, Liberman, Jenkins and Fujimura (1975) also showed that when the frequency values of the first and second formants were held constant, and only the third formant (F3) was changed, American listeners tended to perceive the changes categorically in terms of /r/ and /l/ sounds depending on the transition of the F3, whereas the Japanese listeners showed much more random results.  However, when the third formant was isolated and just played by itself (a non-speech sound) there was little difference between Japanese and Americans.  The authors concluded that the fact that perception only differed within speech sounds means that it is the result of linguistic experience and not auditory function alone.
            Also, because the contrast between /r/ and /l/ is based on spectral cues, it has been argued that its perception is more difficult for non-native listeners to acquire than that of contrasts based on temporal cues such as voice-onset time (Lively, Pisoni, Yamada, Tohkura & Yamada, 1994).
            Much research has been done into how, and under what conditions, perception is acquired.  Many experiments were developed that sought to create new phonetic categories in the minds of the listener by perceptual training.  A typical example of this is the study carried out by Bradlow, Pisoni, Yamada and Tohkura in 1997 (which itself was a replication of several previously published studies with similar results).  They presented Japanese listeners with an /r/-/l/ minimal pair on a computer screen, and then asked the Japanese listener to connect the word they heard on the headphones with the correct orthographic representation on the computer screen.  Correct answers were rewarded with a chime.  Wrong answers received a buzzer signaling an incorrect response, and the test word was repeated until the correct answer was given.  By this method, the perception of /r/ and /l/ phonemes greatly increased from the pre-test (65% correct) to the post-test (81% correct).  However the participants did not reach native English level perception, which is near perfect identification of /r/ and /l/ phonemes (Bradlow et al., 1997).  Repeated studies have shown that even after intensive training, there is a limit to how well Japanese speakers on average can be trained to identify /r/ and /l/, particularly in word initial positions, consonant clusters, and intervocalic positions. (Japanese speakers do somewhat better with word final /r/ and /l/) (Takagi, 2002).
            Outside of training, natural exposure also seems to play a part in improving perception.  For example, in a study by MacKain, Best, and Strange (1981), the /r/ and /l/ perception of two groups of Japanese subjects was tested: an experienced group, which had training in English conversation by native speakers, and an inexperienced group, which did not have this training.  The experienced Japanese subjects showed much better identification of /r/ and /l/, and were much closer to the American control subjects, than the inexperienced Japanese learners, although neither group had had explicit perceptual training outside of exposure to English conversation.
            A further study by Flege, Takagi and Mann (1996) also confirmed that Japanese subjects with English experience did better on /r/-/l/ perception tests than inexperienced Japanese subjects, although not quite as well as native speakers.  Flege et al. also found an effect of lexical familiarity.  Both experienced and inexperienced Japanese learners were more likely to correctly identify words they were already familiar with, indicating that previous linguistic experience does indeed play a role in perception abilities.
            Another study (Aoyama et al., 2003) tested the perception of 16 Japanese adults and 16 Japanese children living in Texas.  The participants were tested twice, one year apart, on their perception of /r/ and /l/.  The first test was after the participants had been living in the United States for an average of 0.5 years; the second test was at an average of 1.6 years' length of residence.  It was found that the perception of the Japanese children improved dramatically between the first and second test, but the Japanese adults’ perception did not improve significantly.
            It should also be noted that, although these studies deal with average scores, some of them do contain anomalies where certain individual Japanese speakers perform much higher than average.  Although no research has shown averages of Japanese speakers obtaining high levels of identification, certain individuals within these averages sometimes obtain near-native results (Underbakke, Polka, Gottfried & Strange, 1988).  However, because few longitudinal studies of nonnative phonetic perception exist, what causes this individual difference, whether aptitude or linguistic experience or a combination, is difficult to determine (Underbakke et al., 1988).
Japanese perception of /r/ and /l/ in consonant clusters
            Most of the previous research has indicated that Japanese speakers will identify /r/ and /l/ sounds less accurately in initial consonant clusters than as initial single consonants.  This result was shown in Goto (1971), Mochizuki (1981), Sheldon and Strange (1982), and Lively, Logan and Pisoni (1993).
            In fact, Lively, Logan and Pisoni (1993) write: “To our knowledge, every experiment that has examined Japanese listener’s perception of /r/ and /l/ in different phonetic environments has found that contrasts in initial consonant clusters are the most poorly identified.”
            Many different reasons have been postulated for this, and it may well be a combination of different reasons.
            Japanese listeners often have difficulty perceiving English consonant clusters in general (Dupoux, Kakehi, Hirose, Pallier & Mehler, 1999).  This is because the phonotactics of Japanese do not allow consonant clusters.  According to Japanese phonotactics, it is impossible for one consonant to directly follow another consonant without a vowel in between.  (Japanese listeners, when transcribing English words with consonant clusters, will often insert an epenthetic vowel between consonants (Dupoux et al., 1999).)
            Because Japanese listeners do not have consonant clusters in their native language, it has been theorized that this causes them added difficulty in perceiving consonant clusters in English (Mann, 1986).
            Another theory is one put forward by Sheldon and Strange (1982).  When liquids occur with other consonants, they are usually coarticulated.  This results in the sound being different than it would be in the single state, and may cause a problem to non-native speakers who have been trained to perceive the sound as a single consonant.  Also the steady-state values for the third formant (what has been determined as critical for perceiving the differences between /r/ and /l/ sounds) may not exist.
            Also, research has shown that /r/ and /l/ phonemes are identified correctly by Japanese speakers in direct proportion to their duration.  /r/ and /l/ phonemes which occur in the word final position have the longest duration, and are consequently identified by Japanese speakers with the greatest accuracy.  Syllable initial consonant clusters containing /r/ and /l/ have the shortest duration for the /r/ and /l/ phonemes, and are consequently identified at the lowest rate of accuracy (Lively et al., 1994).
English phonotactics with /r/ and /l/ and perception
            Within any language there are certain phoneme combinations that are prohibited independently of articulatory abilities.  The study of what phoneme combinations are possible and what phoneme combinations are impossible in a certain language is called phonotactics (Roach, 2009).
            Phonotactics are not the same across all languages.  An oft cited example is the phoneme [ŋ], which in English can appear at the end of words, but does not occur word- or syllable-initially.  Vietnamese, however, allows a word initial [ŋ] phoneme (Onishi, Chambers, & Fisher, 2001).
            In relation to the liquid consonants /l/ and /r/, only certain consonants or consonant combinations can be followed by /l/ or /r/ in an English onset.  With /r/ in post-initial position, only the following combinations can occur: /fr/, /gr/, /pr/, /kr/, /br/, /tr/, /dr/, /ʃr/, /θr/, /spr/, /skr/ and /str/ (Roach, 2009).
            With /l/ in post-initial position, only the following combinations can occur: /fl/, /gl/, /pl/, /kl/, /bl/, /spl/, /skl/ and /sl/ (Roach, 2009).
            As is evident from the lists above, there are six phoneme combinations in which either an “r” or an “l” is permitted, but not both: /sl/, /tr/, /dr/, /ʃr/, /θr/ and /str/ are permitted phoneme combinations in English, but not /sr/, /tl/, /dl/, /ʃl/, /θl/, or /stl/.
            The exception to this is when the consonant cluster occurs across a syllable boundary.  In this case phoneme combinations which are phonotactically impossible as word initial sounds can be permissible if they appear across a syllable, such as the /tl/ sequence sound in “Atlantic” or the /dl/ sequence in “seedling.”
            In cases such as the word “Atlantic” it is even evident that the phonotactics of English decide how the word is divided into separate syllables.  Although it would have been physically possible from a purely articulatory standpoint to put the /tl/ sound as an onset for the second syllable, the rules of English phonotactics dictate that the syllable is divided between the /t/ and /l/ (Pitt, 1998).
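The asymmetry between the two onset lists can be checked mechanically with set operations. This is only an illustrative sketch: the sets reproduce the lists attributed to Roach (2009) above, with each entry naming the consonant or consonants that precede the liquid, and the variable names are my own.

```python
# Consonant(s) that may precede /r/ and /l/ in an English onset,
# copied from the two lists above (after Roach, 2009).
before_r = {"f", "g", "p", "k", "b", "t", "d", "ʃ", "θ", "sp", "sk", "st"}
before_l = {"f", "g", "p", "k", "b", "sp", "sk", "s"}

both = before_r & before_l    # onsets where both /r/ and /l/ are legal
only_r = before_r - before_l  # legal with /r/, illegal with /l/
only_l = before_l - before_r  # legal with /l/, illegal with /r/

print(sorted(both))
print(sorted(only_r), sorted(only_l))
```

The two difference sets together contain six onsets, matching the six legal and illegal pairings listed above, while the intersection contains the onsets that can form real-word /r/-/l/ minimal pairs.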
            Native speakers are never explicitly taught the phonotactic rules of their native language, and yet they learn this knowledge simply through repeated exposure.  It appears to be learned as early as nine months, based on studies which show that infants will listen longer to phonotactically legal syllables (Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993).
            With native speakers, furthermore, it has been shown that intuitive phonotactic knowledge plays a part in perception.  For example, in one study native speakers of English were asked to categorize sounds along an /r/-/l/ continuum as either /r/ or /l/.  Each step along the continuum was presented as the second consonant of a two-consonant cluster.  Listeners showed a bias towards categorizing the sounds in favor of phonotactically legal sequences.  For example, sounds similar to /l/ were identified as /r/ when preceded by a /t/ sound, but not when preceded by an /s/.  This was not true of every phonotactic sequence, however.  The consonant /d/ had a minimal effect on perception of /r/ versus /l/, even though it was phonotactically legal in the case of /r/ and illegal in the case of /l/.  It is thought that perhaps this was because the /tr/ combination is less frequent than the /dr/ combination (Massaro & Cohen, 1983).
            The fact that permissible and impermissible phonotactic sequences influenced native speaker perception of /r/ and /l/ was also further verified in a study by Pitt in 1998.
In second language learning, phonotactic research on /r/ and /l/ has focused on the transfer effect from the native language with learners that already have the /r/ and /l/ contrast in their native language (for example Altenberg, 2005).
However no previous studies were found testing phonotactic knowledge of /r/ and /l/ consonant clusters among Japanese speakers, or other speakers who did not have an /r/ and /l/ contrast in their native language.

Chapter 2. This study
            The purpose of this present study is to examine the different perception abilities between Japanese people studying English in Japan, and Japanese people living in an English speaking country.  This study will test a group of Japanese speakers currently studying English in Japan, and a group of Japanese speakers currently living in Melbourne, Australia, and compare their results.  As such, this will be one of the few studies that uses Australian English as stimuli to test Japanese perception of /r/ and /l/ sounds, since most previously published studies have used American English as the stimuli.  (There are some notable exceptions, such as Ingram et al., 1998, which was based in Australia and used Australian English.)
This study will test the participants on three separate categories: identification of /r/ and /l/ sounds, discrimination of same or different sounds using /r/ and /l/ phonemes, and identification of correct or incorrect English phonotactics using /r/ and /l/.
This study will also focus on single Japanese adults who are abroad on either student or working holiday visas, in contrast to previous studies which have concentrated on Japanese families living abroad (Aoyama et al., 2003).

Chapter 3. Research hypotheses
              The following hypotheses can be made concerning the test results:
1.        Based on previous research, it is hypothesized that participants living in Japan will have a perception of /r/ and /l/ sounds that is about equivalent to chance.
2.        Based on the research which indicates natural exposure to English increases perception of /r/ and /l/ sounds, it is hypothesized that Japanese people living in an English speaking country will have increased perception of /r/ and /l/ sounds across all categories.
3.        Based on the same research, it is hypothesized that among Japanese people living in English speaking countries perception accuracy will correlate positively with length of residence.
4.        Based on the research which indicates that consonant clusters containing liquid consonants are always perceived at a lower rate by Japanese listeners than singletons, it is hypothesized that identification of consonant clusters will be lower for both groups of participants.
5.        Based on the assumption that identifying /r/ and /l/ sounds, discriminating between /r/ and /l/ sounds, and perceiving phonotactic clusters using /r/ and /l/ sounds all make use of overlapping perceptual cognitive abilities, it is hypothesized that there will be a correlation between different tasks.

Chapter 4. Methodology
4.1 Participants
            All the participants in this study were native Japanese speakers learning English as a second language.  The participants were selected from two different groups.  One group was living in Melbourne, Australia, in which they had a high degree of naturalistic exposure to English.  The other group was studying English in Japan at an English conversation school, at which they received approximately 45 minutes of English instruction per week from a native speaker of English.  All of the participants had also completed at least 6 years of English study in the Japanese school system.

4.1.1 Participants residing in Melbourne

Table 1: Total Months Living in an English Speaking Country
                                          Minimum   Maximum   Mean   Std. Deviation
Months in an English speaking country     1         36        13.5   10.5

Table 1
            All the participants were between the ages of 18 years and 35.  They were all living in Melbourne either on student visas or on working holidays. 
None of the participants had family residing in Melbourne, and so while living in Melbourne they had to seek social interaction outside of their family, presumably giving them ample opportunity for English input (unlike some previous studies of Japanese adults living abroad which focused on Japanese adults living with their families (Aoyama et al., 2003)).  However, the extent to which they actually interacted with native English speakers, and the extent to which they stayed within the Japanese expatriate community, cannot be adequately determined.
            As shown in table 1, the average length of residence in an English speaking country (including countries outside of Australia that participants may have previously lived in, such as Canada or the United States) was 13.5 months with a maximum of 36 and a minimum of one month.  The standard deviation of 10.5 shows that within these participants there is a high degree of variation in their length of residence.

4.1.2 Participants residing in Japan

Table 2: Years in English Conversation School
                                          Minimum   Maximum   Mean
Years in English Conversation school      0.5       13        3.1

Table 2

             Twenty participants were tested in Japan.  Of these 20, three participants indicated on a questionnaire that they had spent time living in an English speaking country, and their data was excluded, resulting in 17 remaining participants from the Japan group.  These 17 participants were all drawn from an English conversation school, and so had an active interest in learning and studying English, but had never lived in an English speaking country.  Their ages varied widely, from 18 to over 60.  As shown in table 2, the average amount of time spent studying at an English conversation school was 3.1 years with a maximum of 13 years and a minimum of 0.5 years.  (Two participants, although students at the conversation school, did not enter data for the number of years spent studying at an English conversation school.)
4.2 Tests
4.2.1 Listening test one
Listening test one materials
            The first listening test consisted of 24 word pairs.  All the data for the listening test was recorded in a sound studio by a trained phonetician, who was a native speaker of Australian English.  (See Appendix II for the complete list of words used.)
Because Australian English is a nonrhotic variety (meaning /r/ phonemes are not pronounced before consonants or in word final positions) there is no /l-r/ contrast in the coda position in Australian English (Ingram et al., 1998).  Therefore all of the stimuli for this test consisted of /r/ and /l/ phonemes in word initial position.
Twelve of these pairs consisted of words with either an /r/ or an /l/ as the single initial sound, across a variety of vowel sounds.  The remaining twelve pairs consisted of words beginning with a consonant cluster containing either /r/ or /l/.  The consonant clusters began with /f/, /k/, /g/, /b/, /p/ and /sp/, followed by either an /r/ or an /l/.  These particular onsets were chosen because all of them can be followed by both /r/ and /l/ sounds in English phonotactics; thus it was possible to create real word minimal pairs using these consonants in combination with /r/ and /l/.  For each of the six onsets selected, two word pairs were created.  All the words used were single syllable words.  Table 3 summarizes the various words used.
Table 3
Minimal Pairs (Total 24)
Singletons (Total 12)
/r/ and /l/ (x12)
Consonant Clusters (Total 12)
/fr/ and /fl/ (x2)
/kr/ and /kl/ (x2)
/br/ and /bl/ (x2)
/gr/ and /gl/ (x2)
/pr/ and /pl/ (x2)
/spr/ and /spl/ (x2)

Table 3
Twelve of the 24 word pairs consisted of the same word repeated twice.  The remaining twelve pairs were minimal pairs in which only the /r/ or /l/ sound differed.
            There were four possible different sound patterns for these word pairs: “/r/-/l/”, “/l/-/r/”, “/r/-/r/”, and “/l/-/l/”.  The word pairs were divided evenly into these four:  six “/l/-/r/” pairs, six “/r/-/l/” pairs, six “/l/-/l/” pairs and six “/r/-/r/” pairs.  Table 4 summarizes these four different patterns.
Table 4
/r/-/l/ (x6)
/l/-/r/ (x6)
/l/-/l/ (x6)
/r/-/r/ (x6)
Table 4
All of the words used in this section were real words that could be found in standard English dictionaries.   In cases where minimal pairs were used, both words were real English words.  In cases where the same sound was repeated twice (“/r/-/r/” and “/l/-/l/” patterns) the corresponding minimal pair word, even though not used, would have been a real English word.  Wherever possible high frequency words were chosen over low frequency words. 
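The balanced design, with four patterns of six pairs each and the order then shuffled, can be sketched as follows. This is a hypothetical illustration only: the function name, the fixed seed and the placeholder words are my own assumptions, and the sketch does not attempt to balance singletons and clusters across patterns. The actual items are those listed in Appendix II.

```python
import random

PATTERNS = ["r-l", "l-r", "l-l", "r-r"]

def build_trials(minimal_pairs, seed=0):
    """Assign each (r_word, l_word) pair to one of the four patterns in equal
    numbers, then shuffle the presentation order."""
    if len(minimal_pairs) % len(PATTERNS) != 0:
        raise ValueError("number of pairs must be divisible by 4")
    per_pattern = len(minimal_pairs) // len(PATTERNS)
    trials = []
    for i, (r_word, l_word) in enumerate(minimal_pairs):
        pattern = PATTERNS[i // per_pattern]
        first = r_word if pattern[0] == "r" else l_word
        second = r_word if pattern[2] == "r" else l_word
        trials.append({"pattern": pattern, "words": (first, second)})
    random.Random(seed).shuffle(trials)  # fixed seed keeps the order reproducible
    return trials

# Placeholder pairs only; these are not the words used in the study.
pairs = [(f"r_word{i}", f"l_word{i}") for i in range(24)]
trials = build_trials(pairs)
```

With 24 pairs this yields six trials per pattern, twelve of which repeat the same word and twelve of which contrast the two members of a minimal pair.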
Once all the word pairs had been recorded, the order of the pairs was randomized.
Task One: Identification of /r/ and /l/ Sounds
The first task was a standard identification task for /r/ and /l/ phonemes, of the kind used by almost all researchers investigating Japanese identification of /r/ and /l/ over the past forty years (see section 1.2.2).
Each word pair was played separately for the participants.  Participants were instructed to listen to each word pair and then, depending on which sound they perceived, to write down either an “r” or an “l”.  Since each word pair consisted of two words, participants had to identify two words for every pair number, for a total of 48 different identifications.  For example, participants were played a sound recording of the words “right” and “light”, and would have to write down the appropriate consonant for each word.  A sample of the first task can be seen in table 5.
Table 5: Task One: Example
Please listen to the following word pairs.  For each word, write down whether you hear an r or an l sound.
     Word 1                                                   Word 2               
1.   _________                                        __________                   
2.   _________                                        __________                    
Table 5
Task Two: Identification of Same or Different sounds
            After labeling each word in a word pair, participants were asked to circle an “s” if they thought they had heard the same word twice, and a “d” if they thought they had heard two different words.
            As in Goto’s test (1971), participants were specifically told that the scores for this column would be marked separately from the rest of the test, so that they could mark this column regardless of what labels they had given to each individual word.
            Participants had to choose either “same” or “different” once for each word pair, resulting in a total of 24 same or different identifications.  For example, if participants heard the words “right” and “light”, then after identifying each word as an “r” or “l” they would circle either “s” for same or “d” for different.  A sample of this is included in table 6.
Table 6: Task Two Example
Please listen to the following word pairs.    For each word, write down whether you hear an r or an l sound.  Then for each pair circle whether the words you hear are the same or different.  Sometimes the words will be the same, sometimes the words will be different. 
Word 1                          Word 2                Same or Different    
1.   _________            __________                       S/D                  
2.   _________            __________                        S/D                
Table 6
            This second task is drawn directly from Goto (1971), who determined that labeling sounds was only part of perception testing.  If Japanese participants could tell the difference between same and different sounds, Goto reasoned, this reflects an ability to discriminate even if the /r/ and /l/ labels are incorrectly applied.
            Since Goto’s original study, many subsequent studies have been done involving Japanese discrimination between /r/ and /l/ sounds.  Although Goto’s participants showed poor discrimination, other studies have shown that Japanese speakers can often discriminate sounds that they are unable to identify (Mann, 1986).
            Since identification involves not only accurately perceiving a sound, but matching it to a pre-existing linguistic category, it has been hypothesized that identification tasks are more likely to reflect a listener’s linguistic experience and language learning than a discrimination task, and a discrimination test would simply involve auditory perception (Ingram et al., 1998).
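Because the two tasks are marked independently, a participant can mislabel both words in a pair and still be credited for hearing that they differ. The following is a minimal sketch of such a marking scheme; the function name, data layout and sample responses are hypothetical and are not taken from the actual answer sheets.

```python
def score_sheet(trials, responses):
    """trials: list of (sound1, sound2) tuples, each sound 'r' or 'l'.
    responses: list of (label1, label2, same_or_diff), with labels 'r'/'l'
    and same_or_diff 's'/'d'.  Returns (identification, discrimination)."""
    identification = 0
    discrimination = 0
    for (s1, s2), (a1, a2, sd) in zip(trials, responses):
        # Task one: one point for each correctly labeled word
        identification += (a1 == s1) + (a2 == s2)
        # Task two: marked against the stimuli, ignoring the labels given
        truth = "s" if s1 == s2 else "d"
        discrimination += (sd == truth)
    return identification, discrimination

# Hypothetical participant: both labels wrong on pair 1, yet "different" is right
trials = [("r", "l"), ("l", "l"), ("l", "r")]
responses = [("l", "r", "d"), ("l", "l", "s"), ("l", "l", "d")]
print(score_sheet(trials, responses))  # (3, 3)
```

In this made-up case the participant scores 3 of 6 on identification but 3 of 3 on discrimination, which is the kind of dissociation the second task was designed to detect.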
4.2.2 Listening test two
Materials
            The second listening test consisted of word pairs containing possible and impossible consonant clusters containing “r” and “l” sounds according to the rules of English phonotactics.
            Using either “r” or “l” as the second phoneme, five possible consonant clusters (/tr/, /dr/, /θr/, /str/, and /ʃr/) were paired against their impossible counterparts (/tl/, /dl/, /θl/, /stl/, and /ʃl/).
            (The contrast between /sl/ and /sr/ was initially recorded for testing, but not used because of the difficulty in recording an authentic sounding /sr/ sound that would not be perceived as /ʃr/.)
 Two word pairs were used with each of the five possible/impossible consonant clusters for a total of ten word pairs.  Each word pair consisted of one real English word paired against an unreal word which differed phonetically only in the /r/ or /l/ sound.  All words contained the /r/ or /l/ consonant cluster in word-initial position.  Examples of the word pairs used can be seen in table 7.
Table 7
Minimal Pairs (Total 10)
/tr/-/tl/ (x2)
/dr/-/dl/ (x2)
/str/-/stl/ (x2)
/ʃr/-/ʃl/ (x2)
/θr/-/θl/ (x2)
Table 7

These word pairs were divided into five “real-unreal” patterns and five “unreal-real” patterns and then randomized.  An example of this pattern is shown in table 8.
Table 8
Real—Unreal (x5)
Unreal—Real (x5)
Table 8
            Because this test contained several nonsense words using phonotactic sequences not permitted in normal English, there was a concern that they might be produced in an unnatural or overly affected manner.  In an attempt to guard against this, a trained phonetician was selected to record the test tokens.  As with the first test, all the words were recorded in a sound studio at the University of Melbourne by a trained phonetician who was a native speaker of Australian English.  (This was the same phonetician who recorded test one.)
            Despite this precaution, it is perhaps difficult even for a trained phonetician to produce unnatural sounds in a natural sounding way.  This may be a limitation of this task, and it is further discussed in the “Limitations” section.
Test Two Tasks
            There were two tasks for this section.  One consisted of a listening identification task; the other was a short post-test task to account for lexical familiarity.
Listening identification task
Participants were played each word pair separately.  Participants were asked to listen to each word pair, and select which of the two words was a real English word.  For example, participants would hear sound recordings of the words “three” and “thlee” and would put an “x” in the column of the word they thought was the real word.  A sample of this task can be seen in table 9.
Table 9: Task Example
Please listen to the following word pairs.  Both words will be different.  Each word pair will contain one real English word, and one unreal English word.  Put an X symbol in the column next to the real English word.  If you don’t recognize either of the words, mark the word that sounds more natural in English.
       Word 1                                             Word 2                                       
1.   _________                                        __________                                          
2.   _________                                        __________                                       
Table 9
The justification for this task does not come directly from the literature, but rather is based on assumptions about what task will best test the participants’ phonotactic knowledge.  As mentioned earlier, no previous studies have been published testing Japanese sensitivity to English phonotactics using /r/ and /l/ in consonant clusters.
Previous phonotactic perception research has usually used identification tasks, but has focused either on native speakers (such as Pitt, 1998, and Massaro & Cohen, 1983) or on the transfer effect from language learners who already have an /r/ and /l/ contrast in their native language (such as Altenberg, 2005) (see section
Because Japanese speakers rarely score much above chance on /r/ and /l/ identification tests in general (Goto, 1971), it was thought that adding phonotactic /r/ and /l/ identification tests would not increase identification and not produce any new results.  That is, it was thought if a Japanese person cannot correctly label /r/ and /l/ sounds above chance, and cannot correctly label /gl/ and /gr/ sounds above chance, then it is unlikely they will be able to label /tr/ and /tl/ sounds above chance.  
(It is possible that this way of thinking may have too quickly dismissed the potential value in testing to see if legal and illegal phonotactics changes perception.   See the Limitations section (7.2.2) for more discussion.)
Accuracy in identification relies on the participants not only accurately perceiving phonemes, but also matching that perception to an external category.  It is possible with learners of English, however, that some participants are occasionally able to perceive sounds even if they mislabel them.  Therefore a Japanese speaker who cannot accurately label the /r/ and /l/ sounds in a /tr/ and /tl/ minimal pair might still be able to recognize that one constitutes a natural sounding English consonant cluster, and one does not, particularly if they have a high level of English exposure.  Therefore the decision was made to have the participants simply select the correct word rather than label /r/ and /l/ phonemes for this section.
Post perception task
After completing both listening tests, participants were given a sheet of paper containing the words from the second listening test.  (Only the real words from this listening test were listed.)  Participants were then asked to circle any words that they did not know.  For an example, see table 10.
Table 10: Post perception task example
Here are the words from the listening test.  Please circle any words you don’t recognize.

1.   Train
2.   Trend
3.  Strong
Table 10
The justification for this post perception task comes from information gathered during the pilot test.  In the initial stages, this test was pilot tested on native speakers of Mandarin. (This is another language group that has trouble distinguishing between /r/ and /l/ phonemes (Chen & Fon, 2007).)
During the pilot test, participants claimed that whether or not they were familiar with a certain word directly affected their perception of whether or not this word was being pronounced accurately.
Therefore in order to account for lexical familiarity as an additional variable, this short post test task was created.
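One way the post perception responses could be combined with the listening results is to split each participant's phonotactic items into known and unknown words and compute accuracy for each set. The sketch below is an assumption about how this might be done, not a description of the procedure actually followed; the function name and the sample responses are invented for illustration.

```python
def accuracy_by_familiarity(items, unknown_words):
    """items: list of (real_word, chose_real) for one participant, where
    chose_real is True if the real word was marked in the listening task.
    unknown_words: set of real words the participant circled as unknown.
    Returns (known_accuracy, unknown_accuracy); None where no items apply."""
    tallies = {"known": [0, 0], "unknown": [0, 0]}  # [correct, total]
    for word, chose_real in items:
        key = "unknown" if word in unknown_words else "known"
        tallies[key][0] += int(chose_real)
        tallies[key][1] += 1

    def rate(key):
        correct, total = tallies[key]
        return correct / total if total else None

    return rate("known"), rate("unknown")

# Invented responses for a single participant, for illustration only
items = [("train", True), ("trend", False), ("strong", True), ("three", True)]
print(accuracy_by_familiarity(items, unknown_words={"trend"}))  # (1.0, 0.0)
```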
4.3 Statistical Analyses
            All the statistical analyses were calculated using SPSS.  Whenever comparing average scores between the two residence groups, independent t-tests were used.  Whenever comparing the same participants on different scores, the data file was split into two different groups by residence, and two separate dependent samples t-tests (or paired samples t-tests, as they are called in the SPSS menu) were run.
            Correlations were also run by splitting the data file by residence, and then calculating them for each group separately.  Since all of the data used was interval data, all the correlations run used Pearson's r.
            A more thorough description of which statistical procedures were used to determine which results is given in the results section following.
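For readers without access to SPSS, the three statistics named above can be sketched in plain Python. This is only an illustration under stated assumptions: the independent t-test uses the pooled-variance (equal variances assumed) form, which may or may not be the row of SPSS output used in this study, and p-values are omitted because they require the t distribution.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(a, b):
    """Student's t for two independent groups (pooled variance). Returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def paired_t(x, y):
    """Paired-samples t for the same participants on two scores. Returns (t, df)."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den

# Made-up scores for illustration; these are not data from this study
group_a, group_b = [1, 2, 3], [2, 3, 4]
print(independent_t(group_a, group_b))
print(paired_t([1, 2, 3], [2, 3, 5]))
print(pearson_r([1, 2, 3], [2, 4, 6]))
```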

Chapter 5.  Results
The results are presented in various sections.  Each section will look first at the group in Melbourne, then at the group in Japan, and then where appropriate compare the two groups.
The first section is the average correct for the identification task as a whole (5.1.1).  The identification task is then divided into the singleton tokens, which are examined in their own subsection (5.1.2) and the consonant cluster tokens, which are also examined in their own subsection (5.1.3).  The next section compares singletons and consonant cluster average scores (5.1.4).  This is followed by an analysis of the discrimination task (5.2), and the phonotactic task (5.3).  Section 5.4 looks at the effect of known versus unknown words on the phonotactic task, section 5.5 looks at correlations between tasks, and section 5.6 looks at the correlation between accuracy on all tasks and the participants’ length of residence.
5.1. Listening test 1, Task 1: Identification of /r/ and /l/
5.1.1 Total scores
Using SPSS, mean scores were calculated for each group.  The mean was then divided by the total number of possible correct answers to give an accuracy percentage.  Independent t-tests were also conducted to see if there were any significant differences between groups.  The results are summarized in table 11.
Table 11: Total Correct for the First Listening Test (out of 48)
                              Minimum   Maximum   Mean
Total correct (Melbourne)     20        35        27.8
Total correct (Japan)         22        33        27.5

Table 11
            Out of a total possible score of 48, participants in Melbourne scored an average of 27.8, with a minimum of 20 correct, and a maximum of 35 correct, resulting in an accuracy of 57.9%.
            Participants in Japan scored an average of 27.5, with a minimum of 22 correct and a maximum of 33.  The accuracy percentage was calculated as 57.3%, also slightly more than chance.
            The two means are highly similar, and a t-test confirmed there was no significant difference between the two groups (p>0.05).  The fact that the participants in Japan scored at about chance supports the first hypothesis.  However, the fact that identification does not appear to be greater for the group in Melbourne is contrary to the predictions made in the second hypothesis.
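The accuracy percentages reported here follow directly from the group means. The short sketch below reproduces them; the function name is my own, and the comparison with a chance level of 50% assumes a two-way choice between “r” and “l” on every item.

```python
def accuracy_percent(mean_correct, total_items):
    """Mean number correct as a percentage of the items available."""
    return 100 * mean_correct / total_items

melbourne = accuracy_percent(27.8, 48)
japan = accuracy_percent(27.5, 48)
print(f"Melbourne: {melbourne:.1f}%  Japan: {japan:.1f}%")
# Melbourne: 57.9%  Japan: 57.3%
```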

5.1.2 Singleton scores
            This section looks only at the minimal pairs containing singletons, a subsection of the total score for test 1.  Using SPSS, the mean accuracy scores for singleton words were calculated.  An accuracy percentage was calculated by dividing the mean score by the number of total possible correct answers.  A t-test was also conducted to test to see if there were any significant differences between the two groups.  The results are summarized in Table 12.

Table 12: Number of Singleton Words Correct (out of 24)
Std. Deviation
Singletons correct

Singletons correct

Table 12
            Participants in Melbourne averaged 15.95 correct out of 24 or 66.5% accuracy, with a minimum of 10 and a maximum of 24.  As a percentage, the accuracy rate was slightly higher for singleton /r/ and /l/ identification than for the total score (66.5% compared to 57.9%).
            It is worth noting that the maximum score, 24, represents a score of complete accuracy for the singleton subsection.  This was achieved by only one participant.  This participant may represent one of the statistical anomalies mentioned by Underbakke et al. (1988) although the same participant did not do as well on the consonant cluster sub-section, scoring 11 out of 24.
            Participants living in Japan averaged 14.6 correct out of 24, or 60.8%, with a minimum of 9 and a maximum of 19.  As a percentage, this was also higher than their total score (60.8% compared to 57.3%).
            An independent t-test showed no significant difference between the two groups.  This was again contrary to the second hypothesis.
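For readers without SPSS, the independent t-test used for the group comparisons can be sketched in a few lines of Python.  This is an illustration only: it assumes the classic equal-variance (pooled) form of the test, the function name `independent_t` is hypothetical, and the scores below are made-up placeholders rather than the study's data.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical singleton scores (out of 24) for two small groups.
group_one = [10, 12, 14]
group_two = [11, 13, 15]
t, df = independent_t(group_one, group_two)
print(f"t = {t:.3f}, df = {df}")
```

The degrees of freedom are the two group sizes added together minus two, and the t value is then compared against the critical value for that df at the chosen significance level.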

5.1.3. Consonant cluster scores
            For minimal pairs containing consonant clusters, a subset of the scores for test one, mean scores were calculated using SPSS.  These mean scores were then divided by the total number of possible correct answers to give an accuracy percentage.
            After the total consonant cluster scores were calculated, the consonant clusters were sorted by the initial consonant variable, and the individual means of each initial consonant were calculated.  The consonant clusters were then arranged in order from highest to lowest in terms of correct identification by participants.  Paired samples (dependent) t-tests were then conducted to determine if the differences between consonant clusters reached statistical significance.
            Independent t-tests were run to determine if the group of participants in Melbourne differed significantly from the group in Japan.  The results are summarized in table 13.  (Singleton and consonant cluster scores will be compared in section 5.1.4).

Table 13: Average Number of Consonant Cluster Words Correct (out of 24)
Std. Deviation
Consonant clusters correct

Consonant clusters correct

Table 13
            Melbourne participants averaged 11.8 out of 24, resulting in an accuracy percentage of 49.2%, or about what would be expected by chance alone.  As an accuracy percentage, consonant clusters were lower than the total score (49.2% compared to 57.9%).  There was a minimum of 7 and a maximum of 19.

Table 14: Average Consonant Clusters Correct (out of 4)
Melbourne Group
Std. Deviation

/gl/ or /gr/
/pl/ or /pr/
/fl/ or /fr/
/bl/ or /br/
/spl/ or /spr/
/kl/ or /kr/

Table 14
            Within the consonant clusters, average accuracy varied depending on the initial consonant, resulting in the following order from highest accuracy to lowest (listed in terms of initial consonant) /g/ > /p/ > /f/ > /b/ > /sp/ > /k/, as shown in table 14.
            To determine whether these differences reached statistical significance, several paired samples t-tests were run using SPSS.  The results showed that some differences were significant.  The consonant cluster with the highest degree of accuracy, /gl-gr/, was significantly higher than every other consonant cluster except /pl-pr/.  The consonant cluster identified with the lowest degree of accuracy, /kl-kr/, was significantly lower than every other consonant cluster except /fr-fl/.
            However, many of the other consonant clusters did not differ significantly from each other.  For example, /fr-fl/ did not differ significantly from /br-bl/, and /br-bl/ did not differ significantly from /spr-spl/ (p>0.05).
Participants in Japan averaged 12.9 out of 24, resulting in an accuracy percentage of 53.8%.  As an accuracy percentage, consonant clusters were lower than the total score, but not by much (53.8% compared to 57.3%).  There was a minimum of 9, and a maximum of 16.

Table 15: Average Consonant Clusters Correct (out of 4)
Japan Group
Std. Deviation

/pl/ or /pr/
/fl/ or /fr/
/gl/ or /gr/
/spl/ or /spr/
/kl/ or /kr/
/bl/ or /br/

Table 15
            Within the consonant clusters, average accuracy varied depending on the initial consonant, resulting in the following order from highest accuracy to lowest (listed in terms of initial consonant) /p/ > /f/ > /g/ > /sp/ > /k/ > /b/, as shown in table 15.
            Paired samples t-tests were run to determine if any of these differences were significant.  The results showed that the consonant clusters identified with the highest accuracy (/pl/-/pr/) differed significantly from those identified with the lowest accuracy (/bl/-/br/).
            An independent t-test between the two groups confirmed that on the total number of consonant clusters there was no significant difference between the two groups.  Despite expectations that living in the target community would increase perception, descriptively, the group living in Japan actually scored slightly higher than the group living in Melbourne (12.9 as opposed to 11.8).  However this difference was not statistically significant (p>0.05).
            Tests on individual consonant clusters showed that only words containing clusters with the initial consonant /k/ differed significantly, the group in Melbourne scoring an average of 1.8, which was significantly higher than the group in Japan’s score of 1.2 (t=2.114, df=37, p<0.05).  None of the other consonant clusters showed any significant difference between groups.

5.1.4 Comparisons between singletons and consonant clusters
            To determine if the difference between singleton and consonant cluster scores reached statistical significance, a paired samples t-test was run for both the group in Melbourne and the group in Japan. 
            For the group in Melbourne, the results showed that singleton tokens were identified at a higher rate than consonant clusters (15.95 average for singletons as opposed to 11.8 average for consonant clusters), a difference which reached statistical significance (t=4.411, df=21, p<0.05).
            It is worth noting here that this is the average difference, and was not true across the board on an individual level.  Four of the Melbourne participants actually scored more accurately on consonant clusters, and two participants had the same score for singletons and consonant clusters.
            Participants in Japan followed a similar pattern.  On average singleton tokens were identified correctly at a higher rate than consonant clusters (14.6 compared to 12.9).  A paired samples t-test confirmed that this difference reached significance (t=2.562, df=16, p<0.05).

            Once again, it is worth noting that this was not true of every individual.  Of the 17 participants, 3 actually scored higher on the consonant clusters, and 1 had the same score for both singletons and clusters.
            On average however, the results of both groups confirmed hypothesis 4 that perception of consonant clusters would be lower for both groups.
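The paired samples t-test used here compares two scores from the same participant, so it works on the per-participant differences.  A minimal Python sketch follows; the function name `paired_t` and the four pairs of scores are hypothetical placeholders, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired samples t statistic and degrees of freedom."""
    if len(first) != len(second):
        raise ValueError("paired samples need equal lengths")
    # One difference per participant: singleton score minus cluster score.
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical scores (out of 24) for four participants.
singleton_scores = [16, 14, 18, 15]
cluster_scores = [12, 13, 14, 11]
t, df = paired_t(singleton_scores, cluster_scores)
print(f"t = {t:.3f}, df = {df}")
```

Because the degrees of freedom equal the number of participants minus one, the values of df=21 and df=16 reported above correspond to 22 and 17 participants respectively.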
5.2 Listening test one, task two: accuracy in identifying same or different sounds (discrimination task)
            To determine the accuracy with which participants identified sounds as the same or different, the total number of correct answers for each participant was totaled and then entered into SPSS.  Descriptive statistics analyses were run on SPSS to determine the average number correct for each group.  The results are summarized in table 16.

            Table 16: Discrimination Scores: Same or Different Total (out of 24)
Same or different Scores
Std. Deviation
Total Scores

   Singleton Scores (out of 12)                                                         
Consonant Cluster Scores (out of 12)
Consonant Clusters
Table 16
            Because this task involves discriminating between two tokens used for test one, the number of questions for this task equals half the number of questions from the identification task.  Therefore the mean scores between tasks are not directly comparable.  However, as a descriptive statistic, it is possible to compare accuracy percentages between tasks.  To determine accuracy percentages, the number of correct answers was divided by the total number of questions.
            To determine if there was any significant difference between the identification of singletons and consonant clusters, a paired samples t-test was run for each group.
            Independent t-tests were then run on the discrimination tests (on the total scores, as well as the singleton and consonant cluster sub-sets) to determine if there was any significant difference between the groups.
            For participants in Melbourne, the mean score for identifying same or different sounds was 17.7 out of 24 (with a minimum score of 13 and a maximum of 21).  The accuracy percentage for identifying same or different sounds was 70.8%, which was higher than the 57.9% total score from task 1.  As predicted in the literature (Mann, 1986), this indicates that Japanese speakers can sometimes recognize that there is a difference between /r/ and /l/ sounds even when they mislabel which is which.
            The average accuracy for discriminating same or different singletons was 9.2, or 76.7%.  (As a percent, this was higher than the singleton task one identification task of 66.5%).  Same or different consonant clusters average accuracy was 8.5, or 70.8%.  (This was higher than the 49.2% for consonant clusters identification task.)
            Interestingly enough, despite the fact that participants labeled singleton /r/ and /l/ sounds more accurately than consonant clusters in task one, this division was not reflected in recognizing same or different sounds.  The average accuracy of 9.2 for singletons and 8.5 for consonant clusters was similar, and a paired samples t-test confirmed that the differences were not significant (p>0.05).
            Furthermore, there was one participant who achieved perfect discrimination on the singleton subset, and one participant who achieved perfect discrimination on the consonant cluster subset, although no participant achieved perfect scores on the total.
            The mean total accuracy for participants living in Japan was 13.7 out of 24, which was 57.1% accuracy.  In the case of the Japan participants, this was roughly equal to their identification of /r/ and /l/ sounds on task one (57.3%).
            The average accuracy for discriminating singletons was 7.9 out of 12, or 65.8% (which was slightly higher than the 60.8% for the identification task).  The consonant cluster discriminating accuracy was 5.8 out of 12, or 48.3% (as opposed to 53.8% for the identification task). 
            In percentage terms, the group in Japan did not follow the same pattern as the group living in Melbourne.  The discrimination task was not more accurate than the identification task as a total (57.1% compared to 57.3%), and with consonant clusters the discrimination task was actually less accurate than the identification task (48.3% compared to 53.8%).
            Moreover, although the Melbourne group did not show any significant difference between singletons and consonant clusters in the discrimination task, a paired samples t-test confirmed that the participants in Japan discriminated singleton pairs significantly more accurately than consonant clusters (average 7.9 as opposed to 5.8) (t=3.380, df=16, p<0.05).
            The group living in Melbourne discriminated among same or different sounds with greater accuracy than the group in Japan.  This was true of the total score (17.7 to 13.7) as well as the singleton subset (9.2 to 7.9) and the consonant cluster subset (8.5 to 5.8).  Independent t-tests confirmed that the difference was statistically significant in all 3 cases: total (t=1.212, df=37, p<0.05), singleton, and consonant clusters (t=4.993, df=37, p<0.05).
            Although perception is not greater for the group in Melbourne across all tasks (as was predicted in hypothesis 2), this result at least confirms that discrimination is more accurate for the Melbourne group.
5.3 Test two: Accuracy in identifying correct English phonotactics
            The second test dealt with identifying real and unreal words using English phonotactics.
            Using SPSS, the mean number of real words correctly identified was calculated for each group.  This was then converted into an accuracy percentage by dividing it by the total number of possible correct answers.
            Averages were then calculated for individual consonant clusters.  Paired samples t-tests were run to determine if the difference between consonant clusters reached statistical significance.
            To determine if there was any statistical significance between groups, independent t-tests were run comparing the group in Japan with the group in Melbourne.  The results are summarized in table 17.

Table 17: Identifying Real or Unreal Words (Out of 10)

Std. Deviation
Real or Unreal Words
Table 17
            For participants in Melbourne, the average score for identifying phonotactically correct English words was 7.14 out of 10, or 71.4% accuracy.
            The minimum score for this section was 2. 
            The maximum score was 10 out of 10, which was achieved by two participants.  It is notable that this is the only task in which participants achieved perfect accuracy (although there were subsets of tasks, mentioned above, in which perfect scores were achieved for the subset.)

Table 18: Phonotactic Sequences Correctly Identified as Real (Out of 2)
Std. Deviation

Table 18

            Within the different patterns, the accuracy rate from highest to lowest is as follows: /θr-θl/> /dr-dl/ > /str-stl/ > /ʃr-ʃl/ > /tr-tl/ as shown in table 18.  However paired samples t-tests showed that the average differences in accuracy between these phoneme sequences were not statistically significant (p>0.05).  There was no phonotactic sequence which was always identified as correct or incorrect by all of the participants.
            The average score for participants living in Japan was 7.41, or 74.1% accuracy.  Two of the participants achieved scores of 10 out of 10.  For the Japan group, this was the only section in which any participants achieved perfect accuracy.

Table 19: Phonotactic Sequences Correctly Identified as Real (Out of 2)
Std. Deviation


Table 19

            Within the different patterns, the accuracy rate from highest to lowest is: /θr-θl/ > /dr-dl/ > /str-stl/ > /tr-tl/ > /ʃr-ʃl/, as shown in table 19.
            Paired samples t-tests showed that the difference between phoneme sequences reached significance in 4 cases.  The phoneme sequence identified with the highest accuracy as a real English cluster, /θr/, differed significantly from the two lowest phoneme sequences (/tr/ and /ʃr/).  Likewise the phoneme sequence identified as a real English cluster with the lowest accuracy, /ʃr/, differed significantly from the 3 highest phoneme sequences: /θr/ (already mentioned), /str/, and /tr/, indicating that participants had the most trouble perceiving the difference between /ʃr/ and /ʃl/.  As with the group in Melbourne, there was no phonotactic sequence which was always identified as correct or incorrect by all of the participants.
            Descriptively, this was another case where the group residing in Japan actually performed slightly better on average than the group in Melbourne (7.41 compared to 7.14), despite not having the same advantages of input.  However, independent t-tests showed this difference was not significant.  All of the individual phoneme sequences were also tested, but the results showed no statistical significance.  This result contradicts hypothesis 2 that the participants in Melbourne would show superior perception across all tasks.

5.4 Test two: Known versus unknown words
            The total number, across all participants, was tallied for the following four categories: known correct words, known incorrect words, unknown correct words, and unknown incorrect words.  Accuracy percentages were then calculated for known correct and unknown correct words.  The results are summarized in table 20.
Table 20            Correct        Incorrect      Total
Known words         137 (72%)      53 (28%)       190
Unknown words       19 (63%)       11 (37%)       30
            Accuracy percentages for known and unknown words were also calculated for every participant on an individual level, and then averaged on SPSS.  A paired samples t-test was run to compare mean accuracy percentages to test for statistical significance.  The results are summarized in table 21.
Table 21
Std. Deviation
Known Correct Percentage
Unknown Correct Percentage
Table 21

            Participants recognized most of the words used in the phonotactic test.  In fact there were very few words that participants did not know.  Four participants recognized all ten words.  Nine participants recognized nine words.  Seven participants recognized eight words.  One participant recognized seven words, and one participant recognized six words.  There were no participants who recognized fewer than six words.
            The paired samples t-test showed that there was no significant difference in accuracy between known and unknown words (p>0.05).
            Despite the subjective feeling of pilot test participants that lexical knowledge affected accuracy, this was not borne out in the actual results.  As a whole, participants in Melbourne accurately identified correct phonotactics in 137 out of 190 words that they knew (72.1%) and 19 out of 30 words that they did not know (63.3%).
            Unfortunately, this post-test task was not given to participants in Japan, so it is not possible to compare groups on this aspect.
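The totals of 190 known words and 30 unknown words follow directly from the recognition counts given above.  As a quick check, the tally can be reproduced in Python; the variable names are illustrative assumptions, while the numbers are the ones reported in this section.

```python
# Number of participants (value) who recognized a given number of the
# ten test words (key), as reported in the text above.
recognized_counts = {10: 4, 9: 9, 8: 7, 7: 1, 6: 1}
WORDS_PER_PARTICIPANT = 10

participants = sum(recognized_counts.values())
known_total = sum(words * people for words, people in recognized_counts.items())
unknown_total = participants * WORDS_PER_PARTICIPANT - known_total

# Correct judgments reported for known and unknown words.
known_correct, unknown_correct = 137, 19

known_accuracy = known_correct / known_total * 100
unknown_accuracy = unknown_correct / unknown_total * 100

print(participants, known_total, unknown_total)
print(f"{known_accuracy:.1f}% known, {unknown_accuracy:.1f}% unknown")
```

Note that the unknown-word percentage rests on only 30 judgments, compared with 190 for known words.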

5.5 Correlations between tasks
            Using SPSS, correlations were run to determine if there were any significant correlations between the various tasks.  Subgroups within tasks were also correlated.  The results are summarized in table 22.

Table 22: Correlations between tasks

Identification of /r/ and /l/ Sounds
Discrimination of Same and Different /r/ and /l/ Sounds
Identification of Phonotactics
Identification of /r/ and /l/ Sounds
Discrimination of Same and Different /r/ and /l/ Sounds

total identification
consonant cluster identification
total discrimination
0.524, p<0.05
consonant cluster
Identification of Phonotactics
Table 22 (“X” represents where a correlation was not run because the variable would have been correlated with itself.  “--” represents where a correlation value is not given because the value has already been given in another column.  The values of correlations which did not reach statistical significance are not given in this table, and instead only “p>0.05” is written.)
            An analysis of the data showed that the only statistically significant correlation that emerged was between the singleton /r/ and /l/ identification and the total number of same and different discriminations (0.524, p<0.05).
            The results for the group in Japan are summarized in table 23.

Table 23: Correlations between tasks
Japan Participants
Identification of /r/ and /l/ Sounds
Discrimination of Same and Different /r/ and /l/ Sounds
Identification of Phonotactics
Identification of /r/ and /l/ Sounds
Discrimination of Same and Different /r/ and /l/ Sounds

total identification
singleton identification
consonant cluster identification
total discrimination
0.655, p<0.05
0.627, p<0.05
singleton discrimination
0.679, p<0.05
0.644, p<0.05
consonant cluster discrimination
Identification of Phonotactics
Table 23 (“X” represents where a correlation was not run because the variable would have been correlated with itself.  “--” represents where a correlation value is not given because the value has already been given in another column.  The values of correlations which did not reach statistical significance are not given in this table, and instead only “p>0.05” is written.)

            The group in Japan showed significant positive correlations between the total number of correct /r/ and /l/ identifications and the total number of same and different discriminations (0.655, p<0.05).  Various subsets within tasks also correlated significantly: consonant cluster identification with total discrimination (0.627, p<0.05), and two of the identification measures with singleton discrimination (0.679 and 0.644, both p<0.05), as shown in table 23.
            The results show partial support for hypothesis 5 (the hypothesis that different tasks would correlate positively.)  It appears that there is some correlation between identification and discrimination.  There was no correlation between identification of phonotactics, and other tasks, however.  This is possibly because of task effect, and will be addressed in the discussion section (6.4).

5.6. Correlations between perception accuracy and length of residence (Melbourne group only)
            Several correlations were run on SPSS, and the “length of residency” variable was correlated with all the test scores, and several subsets of variables within each test.  For example, in the /r/ and /l/ identification task, length of residency was correlated with the total accuracy, singleton accuracy, consonant cluster accuracy, and accuracy for each of the various consonant cluster patterns.  For the discrimination task, length of residency was correlated with total accuracy, singleton accuracy, consonant cluster accuracy, and accuracy across the various consonant cluster patterns.  For the phonotactic tasks, length of residence was correlated with total accuracy in identifying correct English phonotactics, and the accuracy for each of the various phonotactic patterns.
            Length of residence was positively correlated with accuracy results on perception across all categories, but it only reached statistical significance for three categories:
1) identifying total /r/ and /l/ sounds in the first task (0.445, p<0.05), i.e. participants who had been in Melbourne longest had higher scores,
2) identifying singleton /r/ and /l/ sounds in the first task (0.574, p<0.05),
3) identifying consonant clusters using /g/ in the first task (0.442, p<0.05).
            These results, although not true of every task, show a partial validation of hypothesis 3, which hypothesized that length of residence would correlate positively with perception.
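The correlation values reported in sections 5.5 and 5.6 are correlation coefficients between two lists of per-participant values.  The sketch below assumes the standard Pearson product-moment coefficient (the usual SPSS bivariate default, although the text does not name the coefficient used); the function name `pearson_r`, the months of residence, and the scores are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both variables need one value per participant")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(squares_x * squares_y)


# Hypothetical data: months of residence and identification score (out of 48).
months_of_residence = [3, 6, 12, 18, 24]
identification_scores = [22, 27, 25, 30, 31]
r = pearson_r(months_of_residence, identification_scores)
print(f"r = {r:.3f}")
```

A positive coefficient means that participants with longer residence tended to score higher; whether it counts as significant depends on the number of participants as well as the size of r.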

Chapter 6. Discussions
            Looking at the results of the data, it is possible to see the following patterns emerging:
1) On average, singletons are identified at a higher rate than consonant clusters.
2) On average, participants identify correct and incorrect phonotactics using /r/ and /l/ sounds at a higher rate than they can identify /r/ and /l/ phonemes.
3) Based on group averages, there appears to be no advantage for participants living in Melbourne in terms of identification tasks, but there is an advantage on discrimination tasks.  However, based on correlation coefficients, it appears that length of residence correlates positively with identification tasks.
4) Identification abilities and discrimination abilities appear to correlate with each other.
            These trends will be discussed in more detail in the sections that follow.

6.1 Singletons versus consonant clusters
6.1.1 Identification task
            It was hypothesized that consonant clusters would show the lowest rate of identification among both groups, as has been shown in every previous study.  On average, this was true among both groups, and for both groups it was lower than singleton identification at a level that reached statistical significance.  (However, as was previously noted in the results section (5.1.4) on an individual level not all participants followed the expected pattern.)
            Looking at broad averages, it was interesting to see which consonant clusters were correctly identified.
            Although the exact sequence of consonant cluster identification was not the same for both groups, there were some similarities.  In both groups, clusters starting with /p/, /f/ and /g/ were identified correctly at the highest rate, and /k/, /b/ and /sp/ were the lowest.
            In terms of the velar consonants (/g/ and /k/), in both groups /r/ and /l/ sounds were correctly identified with the voiced velar /g/ at a higher rate than the unvoiced velar /k/.  In fact for the group residing in Melbourne, consonant clusters with /g/ had the highest rate of identification, and consonant clusters with /k/ had the lowest rate of identification, with t-tests showing the difference was significant.
            Both the liquid consonants /r/ and /l/ are voiced in singleton contexts and when adjacent to a phonemically voiced stop.  Therefore they may be easier to perceive when paired with another voiced consonant.  However, when paired with an unvoiced consonant, the /r/ and /l/ phonemes are devoiced, which may have made them more difficult to perceive.
            However, this theory is complicated by the case of bilabial plosives (/p/ and /b/), in which the opposite pattern occurs.  In both groups /r/ and /l/ sounds with the unvoiced /p/ are identified at a higher rate than with the voiced /b/.  In fact for the group residing in Japan, consonant clusters with /p/ had the highest rate of identification, and consonant clusters with /b/ had the lowest rate of identification, with the difference being statistically significant.

6.1.2 Singletons versus consonant clusters: Discrimination task
            The group living in Japan discriminated singletons significantly more accurately than consonant clusters.  The group in Melbourne showed almost no difference in discrimination between singletons and consonant clusters (and had significantly higher scores than the group in Japan.)  This perhaps indicates that the increased input from living in the target country causes consonant clusters to be discriminated at the same rate of accuracy as singletons, but not identified with the same accuracy.

6.2 Phonotactic awareness
            Percentagewise, participants scored much better on the phonotactic awareness section than they did on any other identification section (71.4% Melbourne and 74.1% Japan). 
            Furthermore, two of the Melbourne participants and two of the Japan participants had perfect scores on the phonotactic awareness section.  This is somewhat surprising, since previous research has indicated that even with intensive phonetic training, L1 Japanese speakers never achieve native-like perception of English /r/ and /l/ sounds (Takagi, 2002).  Although within the Melbourne group there is some indication that length of residence may affect phonotactic awareness, two perfect scores also occurred in the Japan residence group.
            These results could indicate that participants have some perception of correct and incorrect phonotactic structures using /r/ and /l/ phonemes even if they do not always have the ability to accurately label /r/ and /l/.
            It could also be a result of the task effect.  Because participants were only asked to judge whether the word was correct or incorrect, this was a different task than the identification of /r/ and /l/ phonemes in the first listening test, and the results may not be directly comparable.
            Also despite using a trained phonetician to record the tokens used for this section, it is difficult for a native speaker to pronounce incorrect phonotactic sequences in a perfectly natural way.  Although the tokens were listened to by the researcher and judged to be acceptable, the possibility can not be ruled out that at some level participants may have noticed a less than natural pronunciation.
            Another possible explanation is that there were just too few test questions for the phonotactic section.  Since participants only had to answer ten questions for this section, fewer than any other section, there is a higher probability that a participant could get them all correct by chance.  A higher number of questions may have produced a different result.
            Finally, it is worth noting that although some participants did extremely well on this section, some participants did not.  Scores below chance occurred in three participants in Melbourne (scores of 2, 3, and 4 out of 10), and one participant in Japan (4 out of 10).
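The point about the small number of questions can be made concrete with a binomial calculation.  The sketch below assumes, purely for illustration, that a guessing participant has a 50% chance on each real-or-unreal judgment and that judgments are independent; the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more correct answers out of n by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of a perfect score on the 10-item phonotactic test,
# compared with a perfect score on a 24-item subsection.
perfect_on_10 = prob_at_least(10, 10)
perfect_on_24 = prob_at_least(24, 24)

print(f"Perfect score on 10 items by chance: {perfect_on_10:.6f}")
print(f"Perfect score on 24 items by chance: {perfect_on_24:.10f}")
```

Under these assumptions a perfect score by pure guessing is unlikely on either test, but it is far more likely on ten items than on twenty-four, which is the sense in which a longer test may have produced a different result.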
            On average, both groups showed a similarity on which consonant clusters were identified as correct at the highest levels.  The same pattern (/θr/> /dr/ > /str/) emerged for the top three consonant clusters, and the bottom two (/tr/ and /ʃr/) were the same for both groups.  This is interesting, because Pitt (1998) found that for native speakers /t/ was more likely to influence perception of liquid phonemes than /d/.  This would seem to reverse that pattern (although the difference between /dr/ and /tr/ was not statistically significant).
6.3 Comparisons between groups
6.3.1 Discrimination and identification tasks
            It was hypothesized that the group living in Melbourne would have more accurate perception of /l/ and /r/ contrasts than the group living in Japan because of the greater advantage of input.  However this was not true for all categories.  In fact in identification tasks the group in Japan scored slightly better on two categories (identifying consonant clusters, identifying real or unreal words) although in both of these cases the differences were slight and did not reach statistical significance.
            Only in the same/different discrimination was there any statistically significant advantage for the group living in Melbourne.  The results show an advantage for the Melbourne group in discriminating both singleton pairs and consonant clusters.
            Because the discrimination task had half as many questions as the identification task, the mean scores of the identification and discrimination tasks are not directly comparable on a t-test.  (This is an issue further addressed in the Limitations section (7.2.1).)
            However, based on percentages, participants in Melbourne did better on the discrimination task than on the identification task, whereas participants in Japan did not.  Participants in Melbourne had an accuracy percentage 12.9 percentage points higher on the discrimination task, whereas participants in Japan had an accuracy percentage that was 0.2 percentage points lower on the discrimination task.
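The gaps quoted here are differences between the accuracy percentages reported in sections 5.1.1 and 5.2.  A small Python sketch of that subtraction follows; the dictionary and function names are illustrative assumptions, and the figures are the reported percentages.

```python
# Total accuracy percentages reported in sections 5.1.1 and 5.2.
identification = {"Melbourne": 57.9, "Japan": 57.3}
discrimination = {"Melbourne": 70.8, "Japan": 57.1}


def percentage_point_gap(group):
    """Discrimination accuracy minus identification accuracy for one group."""
    return round(discrimination[group] - identification[group], 1)


for group in identification:
    print(group, percentage_point_gap(group))
```

A positive gap means the group was more accurate at discriminating than at identifying; a negative gap means the reverse.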
            Based on this result, it would appear that residence in a target country creates an advantage for discrimination, but not for identification.
            One possible reason is that correct identification involves not only internal perceptual accuracy, but also matching internal perception to pre-established external categories.  It is possible that the increase of English input may have sharpened the discrimination ability of Japanese speakers living in Melbourne, but that without adequate instruction about the differences between /r/ and /l/ sounds their identification abilities did not increase.
            In some previous studies, it has been hypothesized, particularly with Japanese speakers who score below chance, that they have actually randomized or switched phonetic categories in their perceptual space and may identify /r/ sounds as /l/ and vice-versa.
            It has been suggested that this could be partly a result of inadequate teaching (Nogita, 2010).  Japanese speakers may have the ability to discriminate, but have not been correctly taught which phoneme labels correspond to which sounds.
            Of the participants in Melbourne, identification below chance appeared to be more of an issue with the consonant clusters.  Two of the Melbourne participants performed below chance on singleton identification, and twelve performed below chance on consonant cluster identification.  Since the realization of a phoneme changes in a consonant cluster, it is possible that Japanese speakers do not have correct internal representations for /r/ and /l/ sounds as they are pronounced in clusters.
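            One way to judge whether a below-chance score reflects systematically switched categories rather than unlucky guessing is an exact binomial calculation.  The sketch below is illustrative only: it assumes a two-choice (/r/ or /l/) response with a guessing probability of 0.5, and the item count and score in the example are hypothetical, since this section does not give the number of items per category.

```python
from math import comb


def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


# Hypothetical participant: 8 correct out of 24 two-choice items.
print(round(prob_at_most(8, 24), 4))  # 0.0758
```

            A very small probability would suggest that the participant is hearing a difference but attaching the wrong label, which is the pattern the switched-category hypothesis predicts.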

6.3.2 Length of residence variable
            A comparison of the task one and task two results would indicate that the group living in the target country has an advantage for discrimination tasks, but not identification tasks.
            However, correlations with length of residence show the opposite result.  Although length of residence correlates positively with both discrimination and identification tasks, it only reaches significance with identification tasks (total scores, and singleton scores). 
            Among other reasons, this statistical anomaly is probably best explained by the fact that there was a higher number of possible correct answers for the identification tasks than for the discrimination tasks.  Because of the way the test was set up, identification and discrimination were tested at the same time, and two identification task words were used for one discrimination task.  The design of this test was replicated from Goto (1971), who also used the same tokens for both the identification and the discrimination task.  Although this was clearly the most efficient way to accomplish both tasks, it does make it difficult to compare the results of each task.  A further study looking to compare identification versus discrimination might benefit from having a separate discrimination section that has the same number of possible correct answers as the identification section.
            Correlations between correctly identifying English phonotactics and length of residence also did not reach statistical significance.  There are, however, some hints in the data that this correlation may be worth investigating further in future studies.  The only two participants in Melbourne who received perfect scores on identifying correct phonotactics (ten correct out of ten possible) were also the participants with the longest length of residence.  (One had 36 months of residence, the highest in the participant sample.  The other had 30 months of residence, equal to the second highest.)  It is possible that with a larger sample size, and one that was more evenly distributed among long-term and short-term residents, a statistically significant correlation may indeed result between length of residence and English phonotactic knowledge.
            Similarly, it is possible that a larger sample size and more evenly distributed participants may result in statistically significant correlations concerning identifying consonant clusters and identifying same or different phoneme categories.
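            The effect of sample size on significance can be illustrated with a short calculation.  The sketch below assumes a Pearson correlation, which is an assumption, since this section does not state which correlation coefficient was used.  The data are invented for illustration and are not the study's results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: months of residence and identification scores out of 48.
months = [1, 3, 6, 12, 18, 24, 30, 36]
scores = [26, 30, 27, 33, 31, 38, 40, 44]
r = pearson_r(months, scores)
print(f"r = {r:.3f}, t({len(months) - 2}) = {t_statistic(r, len(months)):.3f}")
```

            The t value is compared against the critical value for n - 2 degrees of freedom.  Because t grows with the square root of n - 2 for a fixed r, the same correlation that falls short of significance in a small sample can reach it in a larger one.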

6.4 Correlations between tasks
6.4.1 Phonotactic awareness task
            The phonotactic task did not correlate with either of the other two tasks.  This may have been the result of the task effect described above (section 6.2). 

6.4.2 Identification and discrimination tasks
            In both groups, identification and discrimination appeared to correlate with each other on some level, although the groups differed in whether this was true only of singleton identification (as in the Melbourne group) or of the total scores (as in the Japan group).  This may indicate that the perceptual abilities used in identifying sounds can also be used to discriminate between different types of sounds.

Chapter 7. Limitations
7.1 Participants
7.1.1 Control group
            Many studies of Japanese perception have issues of access to balanced groups of participants, and this study was no exception.
            One of the biggest problems was that there was no control group.  Both groups had more English input than the average Japanese speaker.  The group of Japanese speakers living in Melbourne obviously had the advantage of living in the target country.  However, the group living in Japan attended a private English conversation school at their own expense, where they paid tuition for a weekly 45-minute English conversation lesson taught by a native speaker of English.  This level of weekly input makes them different from the average Japanese speaker living in Japan.
            The fact that they are voluntarily paying for and attending weekly English lessons from a private school means that their level of motivation to learn English is probably higher than the average Japanese speaker, and (assuming all the lessons have not gone to waste) their English level may be slightly higher than average.
            It was part of the original design for this research project to test Japanese students in Japan at a public Japanese high school or university, in order to establish a baseline against which the other two groups could be measured.  Unfortunately, it proved impossible to gain access to students in a public school or university setting, and this part of the project had to be abandoned.

7.1.2 Participant number
            It was hoped to have at least 20 participants from each group for statistical reasons.  Unfortunately, data arrived from fewer participants in Japan than expected.  Because the decision was made to exclude the data from any participant who had lived in a foreign country, data from only 17 participants in the Japan group were analyzed.

7.1.3 Balanced participants
            Also because of reasons of access, it was not possible to balance out the groups as evenly as would have been hoped for.  Gender was not controlled for as a variable.  Age was also not controlled for as a variable, particularly for the group residing in Japan.  Because the group living in Melbourne was made up entirely of students and working holiday visa holders, the age range was only from 18 to 35, but the group in Japan had a much wider age variation.  A more ideal pool of participants would have had a similar age range for both groups.
            Also, a better study would have balanced participants more evenly across lengths of residence.

7.2 Tasks
7.2.1 Discrimination task
            As previously noted (section 6.3.2), using the same tokens for both the identification and the discrimination tests created problems because it meant there were only half as many possible correct discrimination answers, and the mean scores for these tasks were not directly comparable.
            Although it would have meant doubling the listening test in length, it may have been better to do a separate discrimination task so that the same number of discrimination questions was included as identification questions.

7.2.2 Phonotactic task
            As noted in the methodology section, the idea of doing an /r/ and /l/ identification task with possible and impossible phonotactics was dismissed as unlikely to produce interesting results, and replaced by having participants identify real and unreal words. 
            This may have been an error in judgment on the part of the researcher.  For one thing, the difference in tasks made it difficult to compare results between sections.  Secondly, since little previous work has been done on Japanese perception of phonotactics involving /r/ and /l/ sounds, it was premature to assume that a simple /r/ and /l/ identification task would yield no interesting results without first testing it.  It is possible, for example, that Japanese speakers, particularly those with sufficient exposure to English, may have been more likely to perceive phonotactically correct patterns (such as /tr/) than phonotactically incorrect patterns (such as /tl/), as has been shown with native speakers (Pitt, 1998).  (Pitt’s test, however, used synthetic speech stimuli.)
            It may have been better to first try and establish if incorrect English phonotactics create any difference in perception, and then secondly to test and see if Japanese speakers are able to utilize this knowledge to distinguish between correct and incorrect phonotactics. 
            Furthermore, if the phonotactically possible/impossible consonant clusters had been included in a simple identification task, it would have been possible to compare them with the other consonant clusters.
            The difficulty of recording unnatural English phonotactics in a natural sounding way is also an issue, as has been noted in previous sections.  This may be one of the reasons why previous research on phonotactics has used synthetic speech stimuli, which allow direct control of different acoustic phonetic parameters (Massaro & Cohen, 1983; Pitt, 1998).

7.3 Consistency between groups
            Certain difficulties arise when trying to compare groups of participants on two different continents.  In this case, the test was e-mailed out to colleagues of the researcher in Japan, while the researcher himself conducted all the tests in Melbourne.
            As much care as possible was taken to make sure that the tests were conducted in the same way.  However, a certain amount of control is relinquished when the test is administered by other people on a different continent.  An example of this is that the post-test task (designed to account for lexical familiarity) was not given to participants in Japan, presumably as the result of a communication error.
            Also, when the test was administered, it is possible that unforeseen variables in the setting or the personality of the administrator influenced the results.

            Despite many of the problems with this study, it is not completely without merit.  Among the more interesting results is that listening discrimination of /r/ and /l/ phonemes increases for participants living in the target country even though accurate label identification does not.  This could perhaps indicate that, for Japanese people living abroad, the apparent lack of perception is in part an inability to assign accurate labels to sounds that are perceived.  This may indicate an area for English educators to concentrate on.

Altenberg, E. (2005). The judgement of consonant clusters in a second language. International Review of Applied Linguistics, 40, 53-80.

Aoyama, K., Flege, J.E., Guion, S., Yamada, R.A., & Yamada, T. (2004). Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics, 32, 233-250.

Best, C. (1995). A direct realist view of cross-language speech perception. In Strange, W. (Ed.) Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (pp. 171-204). Baltimore: York Press.

Best, C.T., & Tyler, M.D. (2006). Nonnative and second-language speech perception: Commonalities and complementarities. In M.J. Munro & O.-S. Bohn (Eds.) Second language speech learning: The role of language experience in speech perception and production (pp. 2-47). Amsterdam: John Benjamins.

Bohn, O.S., & Flege, J.E. (1997). Perception and production of a new vowel category by adult second language learners. In Leather, J. & James, A. (Eds.) Second-language speech: structure and process (pp. 51-71). Berlin: de Gruyter.

Bradlow, A., Pisoni, D., Yamada, R.A., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101 (4), 2299-2310.

Caniff, M. (1942). How to spot a Jap. Washington D.C.: US War and Navy

Carr, P. (2008). A Glossary of Phonology. Edinburgh: Edinburgh University Press Ltd.

Chen, S. & Fon, J. (2007). The effects of phonetic distance, learning context and learner proficiency on L2 perception of English liquids. ICPhS, 1721-2724.

Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J. (1999). Epenthetic vowels in Japanese: a perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance, 25 (6), 1568-1578.

Flege, J.E. (1995). Second-language speech learning: theory, findings, and problems. In Strange, W. (Ed.) Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (pp. 229-273). Baltimore: York Press.

Flege, J., Takagi, N., & Mann, V. (1995). Japanese adults can learn to produce English /r/ and /l/ accurately. Language and Speech, 38 (1), 25-55.

Flege, J.E., Takagi, N., & Mann, V. (1996). Lexical familiarity and English-language experience affect Japanese adults’ perception of /r/ and /l/. Journal of the Acoustical Society of America, 99 (2), 1161-1172.

Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “l” and “r”. Neuropsychologia, 9, 317-323.

Guion, S., Flege, J., Yamada, R.A., & Pruitt, J. (2000). An investigation of current models of second language speech perception: The case of Japanese adults’ perception of English consonants. Journal of the Acoustical Society of America, 107 (5), 2711-2724.

Hazan, V., Sennema, A., Iba, M. & Faulkner, A. (2004). Effects of audiovisual perceptual training on the perception and production of consonants by Japanese learners of English. Speech Communication, 47, 360-378.

Ingram, J. & Park, S.G. (1998). Language, context, and speaker effects in the identification of English /r/ and /l/ by Japanese and Korean listeners. Journal of the Acoustical Society of America, 103 (2), 1161-1174.

Jusczyk, P.W., Friederici, A.D., Wessels, J.M., Svenkerud, V.Y., & Jusczyk, A.M. (1993). Infants’ sensitivity to the sound pattern of native language words. Journal of Memory and Language, 32, 402-420.

Lively, S., Logan, J., & Pisoni, D. (1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94 (3),

Lively, S., Pisoni, D., Yamada, R.A., Tohkura, Y. & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories. Journal of the Acoustical Society of America, 96 (4), 2076-2087.

MacKain, K.S., Best, C.T., & Strange, W. (1981). Categorical perception of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.

Mann, V. (1986). Distinguishing universal and language-dependent levels of speech perception: Evidence from Japanese listeners’ perception of English “l” and “r”. Cognition, 24, 169-196.

Massaro, D.W., & Cohen, M.M. (1983). Phonological context in speech perception. Perception & Psychophysics, 34, 338-348.

Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A., Jenkins, J. & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception & Psychophysics, 18 (5), 331-340.

Mochizuki, M. (1981). The identification of /r/ and /l/ in natural and synthesized speech. Journal of Phonetics, 9, 283-303.

Munro, M.J., & Bohn, O.-S. (2007). The study of second language speech: A brief overview. In O.S. Bohn (Ed.) Language experience in second language speech learning: In honor of James Emil Flege (pp. 3-11). Amsterdam: John Benjamins Publishing Company.

Nogita, A. (2010). Do Japanese ESL learners’ pronunciation errors come from inability to articulate or misconceptions about target sounds? Working Papers of the Linguistics Circle of the University of Victoria, 20, 82-116.

O’Connor, J.D., Gerstman, L.J., Liberman, A.M., Delattre, P.C., & Cooper, F.S. (1957). Acoustic cues for the perception of initial /w,r,l/ in English. Word, 13, 25-43.

Onishi, K.H., Chambers, K., & Fisher, C. (2001). Learning phonotactic constraints from brief auditory experience. Cognition, 83, B13-B23.

Pinker, S. (1994). The language instinct. New York: William Morrow and Company.

Pitt, M. (1998). Phonological processes and the perception of phonotactically illegal consonant clusters. Perception & Psychophysics, 60, 941-951.

Roach, P. (2009). English phonetics and phonology: A practical course. Cambridge: Cambridge University Press.

Scobbie, J. & Wrench, A. (2003). An articulatory investigation of word final /l/ and /l/-sandhi in three dialects of English. Proceedings of the 15th International Congress of Phonetic Sciences, 1871-74.

Sheldon, A., & Strange, W. (1982). The acquisition of /r/ and /l/ by Japanese learners of English: Evidence that speech production can precede speech perception. Applied Psycholinguistics, 3, 243-261.

Takagi, N. (2002). The limits of training Japanese listeners to identify English /r/ and /l/: Eight case studies. Journal of the Acoustical Society of America, 6, 2887-2896.

Underbakke, M., Polka, L., Gottfried, T. & Strange, W. (1988). Trading relations in the perception of /r/-/l/ by Japanese learners of English. Journal of the Acoustical Society of America, 84 (1), 90-100.

Vance, T. (1987). An introduction to Japanese phonology. Albany: State University of New York Press.

Appendix I. Participant tasks

I. Questionnaire on English Background
1. Number of years studying English in a Japanese school or Japanese University setting___________________

2. Number of years studying English in English conversation school with native English speakers or private tutoring from an English native speaker __________________

3. Number of years spent in an English speaking country _____________________

Listening Task 1:
Please listen to the following word pairs.  For each word, write down whether you hear an r or an l sound.  Then for each pair circle whether the words you hear are the same or different.  Sometimes the words will be the same, sometimes the words will be different.


      Word 1                   Word 2                Same or Different   
1.   _________                  __________                  S /D
2.   _________                  __________                  S /D
3.   _________                  __________                  S /D
4.   _________                  __________                  S /D
5.   _________                  __________                  S /D
6.   _________                  __________                  S /D
7.   _________                  __________                  S /D
8.   _________                  __________                  S /D
9.   _________                  __________                  S /D
10.  _________                  __________                  S /D
11.  _________                  __________                  S /D
12.  _________                  __________                  S /D
13.  _________                  __________                  S /D
14.  _________                  __________                  S /D
15.  _________                  __________                  S /D
16.  _________                  __________                  S /D
17.  _________                  __________                  S /D
18.  _________                  __________                  S /D
19.  _________                  __________                  S /D
20.  _________                  __________                  S /D
21.  _________                  __________                  S /D
22.  _________                  __________                  S /D
23.  _________                  __________                  S /D
24.  _________                  __________                  S /D

Listening Task 2:
Please listen to the following word pairs.  Both words will be different.  Each word pair will contain one real English word, and one unreal English word.  Put an X symbol in the column next to the real English word.  If you don’t recognize either of the words, mark the word that sounds more natural in English.
Also mark your confidence in your choice.  1 is the lowest, 3 is the highest.
   Word 1                 Word 2                                          
1. _________              __________
2. _________              __________
3. _________              __________
4. _________              __________
5. _________              __________
6. _________              __________
7. _________              __________
8. _________              __________
9. _________              __________
10. _________             __________

Here are the words from the listening test.  Please circle any words you don’t recognize.

1. Train
2. Trend
3. Strong
4. Strand
5. Dress
6. Dream
7. Three
8. Thrill
9. Shrink
10. Shrimp

Appendix II. List of tokens (before randomization)

Test 1
Pattern: L—R

Pattern: R—R

Pattern: L—L
Lead ---Lead

Pattern: R—L

Pattern: L—R

Pattern: R—R

Pattern: L—L

Pattern: R—L

Pattern: L—R

Pattern: R—R

Pattern: L—L

Pattern: R—L

Pattern: L---R

Pattern: R---R

Pattern: L---L

Pattern: R—L

Pattern: L---R

Pattern: R---R

Pattern: L---L

Pattern: R—L

Pattern: L---R

Pattern: R---R

Pattern: L---L

Pattern: R---L

Test 2
Pattern: R—L, real—unreal

Pattern: L—R, unreal—real

Pattern: R—L, real—unreal

Pattern: L—R, unreal—real

Pattern: R—L, real—unreal

Pattern: L—R, unreal—real

Pattern: R—L, real—unreal

Pattern: L—R, unreal—real

Pattern: R—L, real—unreal

Pattern: L—R, unreal—real

[1] The US War Department made use of this in 1942 with their pamphlet “How to Spot a Jap” (Caniff, 1942).  

 Update: Grade H2A
Examiner's Comments:
Report on ‘Identification and discrimination of /r/ and /l/ phonemes by L1 Japanese speakers’. MA thesis.

Examiner 1

The thesis reports on a study of two groups of Japanese L1 speakers and their ability to perceive and identify /r/ and /l/ phonemes in English, using a variety of tests. One group were living in Melbourne, the other had never lived outside of Japan. The hypothesis tested was that the Melbourne-resident group would demonstrate better ability to discriminate between and identify these two phonemes of English, than the Japan-resident group.

The structure of the thesis is a standard experimental report. The candidate’s written expression is adequate on the whole, although poorly structured in parts. Chs 2 and 3 consist of around a single page each.
There are numerous single-sentence paragraphs. Some paragraphs are broken incorrectly, or inadequately foreshadowed.
There are many typos, the tables and the thesis as a whole are poorly formatted making it difficult to read in places. The tables reporting results consistently lack means reported as percentages, making it difficult to interpret the results and compare across tasks.
There are errors or inadequacies in the reporting of the literature. For instance, on p. 19 the candidate claims that ‘According to Japanese phonotactics, it is impossible for one consonant to directly follow another consonant without a vowel in between’.
Some claims (like this one) are unsupported by reference to the literature.
Terminology is used incorrectly; e.g. on p. 20: ‘With /r/, only the following phoneme combinations can exist using /r/ as the post initial syllable…’

The candidate refers to two theoretical models of second language perception: Flege’s Speech Learning Model and Best’s Perceptual Assimilation Model. But there is very little in the discussion of the results which discusses their relevance for these models.

Specific points
The Melbourne-resident group consists of speakers with a very wide range of residence duration, between 1 and 36 months. It strikes me that participants with this degree of residence variation may not be comparable. How different would a participant of 1 month’s residence be from a Japan-resident participant?

In addition, the Japan-resident participants have an even wider range for years of English conversation school exposure, between 6 months and 13 years. Both problems were noted in the ‘limitations’ section.

It was not clear to me whether there was any attempt to control for word frequency or familiarity effects. How familiar would the average L2 English speaker be with words such as ‘flesh’, ‘bland’, ‘splay’ and ‘splint’?? Why were participants not pre-tested on word familiarity?

Clearly the candidate worked hard to design and implement the study, and went to the trouble of arranging testing in Japan with a colleague. He appears to understand the methodology of this kind of research, and the study itself is potentially of interest to researchers in this field, although I find the results rather underwhelming or even contradictory and apparently lacking a clear explanation in places.

Examiner 2
The thesis started out with a thorough review of the literature, had a relatively good coverage of the results with some good reasoning, but petered out toward the end. There was a lack of interpretation of the results, and in fact some of the time spent offering limitations of the study should actually have been spent reflecting on what the results meant, and what the implications are. For example, how does this study relate to the theoretical models described in the literature review? The last sentence of the thesis hints at an implication of the work, but the reader should not be left wondering why else the work is important.

Some specific comments:
A lot of work went into the project, including carrying out a pilot study. However, I was surprised to read about the pilot study in the results section, this should have been addressed in the methodology.

The thesis was very hard to read in parts due to expression, and the formatting of tables was especially challenging. I suggest not using double-spacing in a table, and never having a table cross two pages. Additionally, almost every table had a heading above and below it.

In section 1.1.2, the discussion of what liquid consonants are should be reversed (paragraph 1 should go before paragraph 2).

Page 12. Typo – “which vowel it proceeds” should be “which vowel it precedes”.

There was some inconsistency with which materials were put in the appendix and which stayed in the thesis. That is, /l/ and /r/ words were relegated to the appendix while clusters were shown in Table 3. Both should be in the same place, preferably in the thesis.

The results section had too much repetition. Time and space should have been saved for the discussion/ conclusion.

The start of chapter 3 has a good summary of the research hypotheses. Most of these points are based on previous research, and the references should have been listed again.

Section 4.1. I was left wondering about the Japanese participants’ exposure to rhotic varieties of English – that is, are they exposed to a variety of English which effectively has more /r/ sounds than Australian English?

p.58 “unreal words” is a strange term - should probably be changed to “nonsense words”