Saturday, May 04, 2013

Examining Correlations Between Reading and Writing Among Native and Non-Native English Speakers Using Existing DELA Scores



            This paper uses pre-existing scores from the DELA test to examine the relationship between reading and writing, and to compare reading with other possible variables as a predictor of writing scores.
Literature Review
            Many researchers over the past 30 years have commented on the connection between reading and writing, and the cognitive processes that are required for both.  Both involve constructing meaning from a written text.  Both involve cognitive skills of organizing information, either to make sense of what we are reading, or to write in a way that will be understandable to the reader (Samway, 2006).  And it has been argued that extensive reading will often improve writing abilities (Grabe, 2003).
            Many studies show a high degree of correlation between reading and writing.  Grabe (2003) states that research in L1 usually results in a reading and writing correlation somewhere between 0.50 and 0.70. 
            And although it has been argued that increased reading causes increased writing abilities (Grabe, 2003), it is important to remember that correlation does not necessarily equal causation.  There are other views, such as the non-directional hypothesis, which holds that reading and writing are related because the same cognitive processes underlie both of them (Eisterhold, 1990).
            Reading ability is also closely related to writing ability in the L2, although here the situation is slightly different.  On the one hand, L2 writers are hindered by not having access to the fully developed language system that an L1 writer would have.  On the other hand, adult L2 writers have literacy skills developed in their native language which may transfer over into their L2 writing (Eisterhold, 1990).
            And indeed, when testing L2 learners the results can differ greatly.  A 1990 study comparing Japanese and Chinese learners of English, in both their L1 and their L2, found only weak to moderate correlations between L1 reading and writing (0.271 for Chinese learners, 0.493 for Japanese learners) and between L2 reading and writing (0.494 for Chinese learners, 0.271 for Japanese learners).  In the L1 the correlation was stronger for Japanese learners, but in the L2 it was stronger for Chinese learners (Carson et al., 1990).
            Another study, which used questionnaires about participants' reading habits (rather than reading test scores) together with written test scores, found that it was difficult to make a connection between reading habits and writing abilities (Hedgcock & Atkinson, 1993).
Research Questions
1)    Will native speakers significantly outscore non-native speakers in every test section?
2)    Will reading be the greatest predictor of writing ability in both native and non-native speaker groups?  (Or put another way, will reading-writing correlation coefficients be higher than the coefficient of any other variable correlated with writing?)
3)    If so, will this difference be statistically significant?
4)    Will the reading and writing correlation coefficient for native speakers be somewhere between .50 and .70, as predicted in the literature?
5)    Will the reading and writing correlation coefficient be different for non-native speakers and native speakers?
6)    Will this difference be statistically significant?

Methodology
Data
            Using the pre-existing DELA test score data, the native language (L1) of each test taker was examined.  In order to set up a binary comparison between native and non-native speakers, every L1 other than English was re-coded simply as “non-native”.  The result was 266 self-identified native English speakers and 987 non-native speakers.
            Thirty-three participants left the L1 space blank on their forms.  In many of these cases it was possible to make a reasonable assumption about the identity of the L1 based on other details, such as country of origin or years of education in Australia.  However, rather than risk over-generalizing, it was decided that the cleanest way to deal with this problem was simply to drop these participants from all calculations.
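Although all of the data handling was done in SPSS, the recoding and exclusion steps described above can be illustrated with a short Python sketch.  The file name and column labels below are assumptions for illustration only, not the actual variable names in the DELA data set.
```python
# Sketch of the data-preparation step described above, assuming a pandas
# DataFrame loaded from the DELA score file. The file name and column
# names ("L1", etc.) are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("dela_scores.csv")          # hypothetical file name

# Drop the participants who left the L1 field blank rather than guessing
# their first language from other details.
df = df.dropna(subset=["L1"])

# Re-code every L1 other than English as "non-native" to create the
# binary native / non-native grouping used in all later comparisons.
df["group"] = df["L1"].apply(lambda l1: "native" if l1 == "English" else "non-native")

print(df["group"].value_counts())            # expected: 266 native, 987 non-native
```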
            It should also be noted that in this pre-existing data set, not every participant has a score recorded for every category.  Accordingly, the N occasionally varies between correlations.  Wherever applicable, the relevant N is marked in the tables in the analysis section.
            Finally, in the pre-existing data set, the reading and listening test results were given as both raw scores and bandwidth scores, resulting in two different sets of data for the same variable.  In the rough draft of this paper, both the raw scores and the bandwidth scores were used in all calculations for the sake of thoroughness and accuracy.  However, the number of correlations this required produced a set of confusing and unwieldy tables.  Including both the raw scores and the bandwidth scores in the comparisons also sometimes produced contradictory results.  For example, when comparing raw reading scores, native speakers showed the greater correlation between reading and writing fluency, but when comparing bandwidth scores, non-native speakers showed the greater correlation.  This served to confuse the issue and made it difficult to summarize the results clearly.
            As a result, the decision was made to use only the raw scores for both the reading and listening tests.  The raw scores were deemed to be closer to the original output of the test takers, and thus more authentic.  Critical readers should be aware, however, that some of the comparisons mentioned in the analysis section would have been reversed had the bandwidth data been used instead of the raw scores.
            The writing scores were also given as three different variables: writing fluency score, writing content score, and writing form score. 
            In the case of the writing scores, however, it was decided that none of the categories could be discarded.  Whereas the bandwidth scores for reading and listening were clearly derived from the raw scores, the three writing scores, although some overlap between the skills may arguably exist, were not simply converted from one another.  Therefore all three writing scores were used in each calculation.
Calculations
            Once the native and non-native speakers had been separated, the reading scores and writing scores were correlated.   Two separate sets of correlations were run: one for native English speakers, one for non-native English speakers.
            All of the scores used in these correlations, apart from the raw scores, were judged to be ordinal data, and so a Spearman’s rho correlation was used in all cases.
            (The only case where it would have been appropriate to run a Pearson’s correlation, examining the relationship between listening raw scores and reading raw scores, was deemed to be outside the bounds of this study’s research questions, and thus never run.)
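For illustration, the Spearman’s rho correlations could equally be computed outside SPSS.  The sketch below assumes the hypothetical DataFrame and column names from the earlier data-preparation sketch.
```python
# Minimal sketch of the Spearman's rho correlations between reading and
# the three writing scores, run separately for each group. Column names
# are hypothetical placeholders for the DELA score variables.
from scipy.stats import spearmanr

writing_vars = ["writing_fluency", "writing_content", "writing_form"]

for group_name, group_df in df.groupby("group"):
    for wvar in writing_vars:
        # nan_policy="omit" mirrors the pairwise deletion that makes
        # the N vary between correlations in the original data set.
        rho, p = spearmanr(group_df["reading_raw"], group_df[wvar],
                           nan_policy="omit")
        print(f"{group_name}: reading vs {wvar}: rho={rho:.3f}, p={p:.3f}")
```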
Once the correlation coefficients had been calculated using SPSS, they were compared to determine whether the differences between them were statistically significant.  Comparisons between the native and non-native speaker groups used the “comparing independent rs” procedure described on page 191 of “Discovering Statistics Using SPSS” (Field, 2009) and on pages 136-141 of the “SPSS Survival Manual” (Pallant, 2007).
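The “comparing independent rs” procedure is, in essence, a Fisher r-to-z test of the difference between two correlations from separate samples.  A minimal sketch, assuming only the two correlation coefficients and their sample sizes, is shown below.
```python
# Sketch of the "comparing independent rs" procedure: a Fisher r-to-z
# test for the difference between two correlations from independent groups.
import numpy as np
from scipy.stats import norm

def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent correlations."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher r-to-z transform
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of the difference
    z = (z1 - z2) / se
    p = 2 * norm.sf(abs(z))                      # two-tailed p value
    return z, p
```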
 Comparisons of test scores within the native or non-native groups used the “comparing dependent rs” procedure described on pages 191-192 of “Discovering Statistics Using SPSS”.
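The “comparing dependent rs” procedure tests the difference between two correlations that share a variable — here, reading-writing versus listening-writing for the same writing score.  A minimal sketch of the t statistic as presented in Field (2009) follows; r_xy and r_zy are the two correlations with writing, and r_xz is the correlation between the two predictors.
```python
# Sketch of the "comparing dependent rs" procedure: a t test for two
# correlations that share one variable, following the form of the
# t statistic presented in Field (2009), with df = n - 3.
import numpy as np
from scipy.stats import t as t_dist

def compare_dependent_rs(r_xy, r_zy, r_xz, n):
    """r_xy, r_zy: the two correlations with the shared variable y;
    r_xz: the correlation between the two predictors x and z; n: sample size."""
    det = 1 - r_xy**2 - r_xz**2 - r_zy**2 + 2 * r_xy * r_xz * r_zy
    t_stat = (r_xy - r_zy) * np.sqrt(((n - 3) * (1 + r_xz)) / (2 * det))
    p = 2 * t_dist.sf(abs(t_stat), df=n - 3)     # two-tailed p value
    return t_stat, p
```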
Analysis
            The first research question was easily answered.  As would be expected, the mean scores for native test takers were greater than for non-native test takers in every possible category (including the self-rating sections).  A one-way ANOVA was run, and the post-hoc tests confirmed that the difference was significant in every case. 
            (This result may seem so obvious as to be pointless, but before comparing the two groups it is important to first establish that they are significantly different from each other.)
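As a rough illustration of this group comparison (the actual analysis was run in SPSS), a one-way ANOVA on one score category might look like the sketch below, again assuming the hypothetical DataFrame and column names used earlier.  With only two groups, a one-way ANOVA is equivalent to an independent-samples t test.
```python
# Sketch of the group-mean comparison for research question 1,
# using one hypothetical score column as an example.
from scipy.stats import f_oneway

native = df.loc[df["group"] == "native", "writing_fluency"].dropna()
non_native = df.loc[df["group"] == "non-native", "writing_fluency"].dropna()

f_stat, p = f_oneway(native, non_native)
print(f"writing fluency: F={f_stat:.2f}, p={p:.4f}")
```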
            The next step was to run correlations on reading and writing scores to test the second research question.
            Because the writing score is divided into three separate categories, this resulted in three separate correlations.
            For the purposes of testing research question 2 as thoroughly as possible, it was decided to run all three correlations, and then to check whether all three reading-writing correlation coefficients were significantly higher than any other correlation coefficient with writing.
For native speakers, the correlation coefficients between reading and writing scores ranged from 0.314 (reading and writing content) to 0.428 (reading and writing fluency).
Table 1
Spearman’s rho Correlation Coefficients (N = 266, p < 0.05)

                     Reading (Raw Score)
Writing Fluency      0.428
Writing Content      0.314
Writing Form         0.371

Next, correlations were run between writing and all the other variables (listening raw scores, self-rating of speaking, and self-rating of daily communication in English). 
This produced nine new correlation coefficients ranging from 0.116 (writing content with daily communication in English) to 0.269 (listening with writing form). 
Table 2
Spearman’s rho Correlation Coefficients (p < 0.05)

                     Listening (Raw Score)   Self-Rating Speaking   Self-Rating Daily Communication
Writing Fluency      0.189 (N = 266)         0.144 (N = 254)        0.128 (N = 253)
Writing Content      0.244 (N = 266)         0.153 (N = 254)        0.116 (N = 253)
Writing Form         0.269 (N = 266)         0.195 (N = 254)        0.181 (N = 253)

As can be seen, the highest correlation coefficient in this group is still lower than the lowest correlation coefficient between reading and writing.
Next, the difference in correlation coefficients was tested for significance, using the comparing dependent rs procedure. 
In order for this procedure to work the way it is described in “Discovering Statistics Using SPSS”, it was necessary to have the same N for both correlations.  This was no problem for the correlations involving reading and listening scores, but it became an issue for Self-Rating Speaking and Self-Rating Daily Communication, because some test takers had left these questions blank.  It was therefore decided to test for significance only the comparisons between writing-reading and writing-listening.  (The listening test scores were deemed to be of more interest anyway, since they represented actual results rather than participants’ self-ratings.)
Also, because of the way the equation for comparing dependent rs was designed, it was possible to compare correlation coefficients within writing categories, but not across them.  So the reading-writing correlations for fluency, content, and form were all compared separately.
None of the comparisons between correlation coefficients reached significance (fluency t=3.55, p=0.99; content t=1.00, p=0.84; form t=1.49, p=0.93).
It is also worth noting here that all the correlation coefficients between reading and writing for native speakers were below 0.50, thus giving a negative answer to research question number 4.  This will be addressed in more detail in the discussion section.
Next the same tests were run for non-native speakers.
For non-native speakers, the correlation coefficients between reading and writing scores ranged from 0.395 (reading and writing fluency) to 0.448 (reading and writing form).

Table 3
Spearman’s rho Correlation Coefficients (N = 986, p < 0.05)

                     Reading (Raw Score)
Writing Fluency      0.395
Writing Content      0.417
Writing Form         0.448

Again, correlations were run between writing and all other variables, producing nine new correlations.
Table 4
Spearman’s rho Correlation Coefficients (p < 0.05)

                     Listening (Raw Score)   Self-Rating Speaking   Self-Rating Daily Communication
Writing Fluency      0.347 (N = 986)         0.224 (N = 913)        0.213 (N = 912)
Writing Content      0.404 (N = 986)         0.236 (N = 913)        0.207 (N = 912)
Writing Form         0.425 (N = 986)         0.271 (N = 913)        0.236 (N = 912)

The comparison between correlations in this case was not as clear-cut as for the native speakers.  Not all of the reading-writing correlation coefficients were higher than all of the other variable-writing correlation coefficients.  Self-rating speaking and self-rating daily communication ranked consistently below reading when correlated with writing scores, but this was not true of listening scores.  For example, the correlation coefficient between listening raw scores and writing form (0.425) was higher than the correlation coefficients for reading-writing fluency and reading-writing content.
However, if the three writing categories are each looked at in isolation, then within each category the reading correlation coefficient is higher than that of any other variable, including listening. 
The comparison can be more clearly seen in table 5 below.

Table 5
Spearman’s rho Correlation Coefficients (N = 986, p < 0.05)

                     Reading (Raw Score)   Listening (Raw Score)
Writing Fluency      0.395                 0.347
Writing Content      0.417                 0.404
Writing Form         0.448                 0.425

However, none of the comparisons between coefficient scores reached statistical significance (Fluency t=1.56, p=0.94; Content t=0.43, p=0.67; Form t=0.78, p=0.78).
Finally, the native and non-native speakers were compared against each other.  The results were somewhat mixed.  For writing content and writing form, the correlation was higher for non-native speakers; for writing fluency, the correlation for native speakers was slightly higher.  All the correlations, however, fell within the same range of 0.300 to 0.450.  In fact, upon calculating the statistical significance of the differences between the correlation coefficients (as described in Pallant, 2007), it was found that the difference did not reach significance in any of the three writing categories, regardless of which group had the larger correlation (writing fluency correlations: z=0.57, p=0.5687; writing content correlations: z=1.72, p=0.0854; writing form correlations: z=1.33, p=0.1835).
Table 6
Spearman’s rho Correlation Coefficients (p < 0.05)

                     Reading (Raw Score)          Reading (Raw Score)
                     Native speakers (N = 266)    Non-native speakers (N = 986)
Writing Fluency      0.428                        0.395
Writing Content      0.314                        0.417
Writing Form         0.371                        0.448
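
As a rough check on the comparison just described, the writing fluency result can be reproduced directly from the values in Table 6 using the Fisher r-to-z procedure; the output agrees with the reported z = 0.57, p = 0.5687 (allowing for rounding).
```python
# Worked check of the writing fluency comparison, using the Fisher
# r-to-z procedure with the values shown in Table 6.
import numpy as np
from scipy.stats import norm

r_native, n_native = 0.428, 266
r_nonnative, n_nonnative = 0.395, 986

z_diff = ((np.arctanh(r_native) - np.arctanh(r_nonnative))
          / np.sqrt(1 / (n_native - 3) + 1 / (n_nonnative - 3)))
p = 2 * norm.sf(abs(z_diff))
print(f"z = {z_diff:.2f}, p = {p:.4f}")   # approximately z = 0.57, p = 0.57
```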

Summary of Results
1.    Native speakers significantly outscored non-native speakers in all sections.
2.    In both the native and non-native groups, writing scores correlated most highly with reading scores.
3.    However, the differences between the correlation coefficients were not significant.
4.    The correlations between reading and writing for native speakers fell below the .50 to .70 range predicted in the literature.
5.    Correlations between reading and writing were roughly the same for native and non-native speakers.
6.    The slight differences were not significant.

Discussion
          There were obviously some limitations with this pre-existing data. 
            For one thing, the sample sizes of the native and non-native speaker groups were very different.  This was unavoidable because the data was pre-existing, but a more rigorous study would have made an effort to obtain groups of similar size.
            A second problem, closely related to the first, is that the participants were self-selecting, as the DELA test was optional.  This is no doubt why the sample of native speakers was much smaller than that of the non-native speakers.
            It is understandable that this diagnostic test would be very attractive to non-native speakers nervous about their ability to use English in academic settings.  However, based only on the test score data, we do not know why the native speakers opted to take this test.  It could be because they felt their own academic writing was weak compared to other native speakers.  Or it could be because they were serious students, and wanted to err on the side of being over-prepared.  Or it could be a combination of the two.
            Either way, it is likely that this self-selecting group is not a true representative sample of the larger population. This may be why the reading-writing correlation among native speakers did not fall into the range predicted by the literature.
            However, in both groups it is clear, as expected, that reading and writing have a significant relationship.  Although neither group reached what might be considered a strong correlation, in all cases writing correlated most strongly with reading.
Writing and listening correlated more highly for non-native speakers than for native speakers.  In fact, for non-native speakers the writing-reading correlations were only just ahead of the writing-listening correlations.  This perhaps indicates that for non-native speakers especially, other variables such as vocabulary or grammatical knowledge underlie all three skills, whereas for native speakers a full language system is already in place, and listening skills may be separate from the organizational skills unique to reading and writing.
If it were possible to do a further study, it would be desirable to have native and non-native groups of equal size, with participants who were not self-selecting.  It would also be interesting to test other variables such as grammatical knowledge or vocabulary, and to examine the relationships these have with reading, writing, and listening.

References
Carson, J. E., Carrell, P. L., Silberstein, S., Kroll, B., & Kuehn, P. A. (1990). Reading-writing relationships in first and second language. TESOL Quarterly, 24(2), 245-266.

Eisterhold, J. C. (1990). “Reading-writing connections: Toward a description for second language learners.” In B. Kroll (Ed.), Second Language Writing: Research Insights for the Classroom. Cambridge, UK: Cambridge University Press.

Field, A. (2009). Discovering Statistics Using SPSS (Introducing Statistical Methods series). Sage Publications.

Grabe, W. (2003). “Reading and writing relations: Second language perspectives on research and practice.” In B. Kroll (Ed.), Exploring the Dynamics of Second Language Writing. Cambridge, UK: Cambridge University Press.

Hedgcock, J., & Atkinson, D. (1993). Differing reading-writing relationships in L1 and L2 literacy development? TESOL Quarterly, 27, 329-333.

Pallant, J. (2001). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS for Windows (Version 10). Open University Press.

Samway, K. D. (2006). When English Language Learners Write. Portsmouth, NH: Heinemann.

Grade and Comments from professor:
Grade 70 out of 100.0
The subheadings of this study are somewhat off--it's not called "Calculations" and "Analysis" should probably be called "Results". And why compare correlation coefficients? And where is the significance of the coefficients themselves?
