Wednesday, March 23, 2022

Module 7: Testing

The program I am currently working on serves pre-sessional EAP students. Because students must complete the program to enter their university courses, the assessments are very high stakes.
There are two tests each term: a midterm test and an end-of-course test. The midterm counts for 30% of the grade and the final for 50%. The heavier weighting on the final gives students who fail the midterm a chance to pass the course, provided they diagnose their weak points and devise a study plan for the final.
The reading and listening tests are created by the assessment team, who have had extensive training in writing valid test questions involving gap fills, multiple choice, and matching headings. Questions are piloted in practice tests, and reliability data is collected before they are used in live exams. This all lies beyond my personal expertise, but I am confident that the reading and listening tests are reliable and valid.
The tests do, however, create backwash. The curriculum planners want class time focused on building general reading and listening proficiency, but students want to spend that time practicing exam-style questions. The teacher is caught in the middle.
Speaking and writing are assessed by a teacher other than the students' own classroom teacher. Speaking tests are a one-on-one conversation with the examiner; writing tests are a formal essay on a given topic. To improve reliability between examiners, the scoring criteria for speaking and writing are incredibly detailed, far more detailed than even those of standardized proficiency exams like IELTS. The problem with overly detailed criteria is that it is difficult to hold them all in your head while conducting a speaking exam. Furthermore, despite the school's best efforts, marks can still vary widely between examiners. All exams are recorded, failing exams are second marked by another teacher, and wide discrepancies remain a recurring problem.
As for continuous assessment, this is the first school I have worked at where daily in-class speaking and participation do not count toward the grade. I suspect this is because the university carefully checks all failing grades, and daily participation cannot be second marked after the fact. Daily homework, such as extensive reading logs, also used to be part of continuous assessment, but it was removed after the university could not control widespread copying and cheating.
As a result, continuous assessment now consists solely of a couple of major essays and speaking tasks. The essays make use of process writing: the first draft counts for 25% of the grade and the second draft for 75%, and students must respond to teacher feedback in the second draft.
The speaking assessments are set up as similar process tasks. Students video themselves doing a speaking task, the teacher comments, and then the students must respond to the feedback in the next "draft". However, because speaking involves real-time production, students often struggle to apply feedback while simultaneously producing language. They also have difficulty improving on speaking criteria like rhythm and intonation in a short time. In my opinion, therefore, the process speaking tasks are not effective.
