Six very common flaws of foreign language assessment

Teaching – Assessment mismatch

Often foreign language instructors test students on competences that have not been adequately emphasized in their teaching or, in some cases, have not even been taught.

The most common example of this is task unfamiliarity, i.e. the use of an assessment tool or language task that the students have never or rarely carried out prior to the test. This can be an issue, as research shows that the extent to which a learner is familiar with a task affects his/her performance. There are three reasons for this: the anxiety that unfamiliarity engenders; the higher cognitive load that grappling with an unfamiliar task places on working memory (especially when the task is quite complex); and the fact that memory (and, consequently, task knowledge) is context-dependent, which means that knowledge is not easily transferred from task to task (the so-called T.A.P. or Transfer Appropriate Processing principle). By doing a task over and over again prior to an assessment involving that task, the student develops task-related cognitive and metacognitive strategies which ease the cognitive load and facilitate the task's execution.

Another common scenario is when students receive neither explicit focus on nor sufficient practice in a given area of language proficiency (e.g. accuracy, fluency, vocabulary range, grammar complexity), yet their teachers use assessment scales which emphasize performance in that area (e.g. by giving grammatical accuracy a high weighting in speaking performance whilst practicing grammar only through cloze tasks). I have had several colleagues in the past who taught their students through project-based work involving little speaking practice even though they knew that the students would be assessed in terms of fluency at the end of the unit. Bizarre!

Language unfamiliarity is another instance of this mismatch, in my opinion. This refers to administering a test which requires students to infer the meaning of unfamiliar words from context, or even to use them. The result is that learners are assessed not on the language learnt during the unit but on compensation strategies (e.g. guessing words from context). Although compensation strategies are indeed a very important component of autonomous competence, I do believe that a test needs to assess students only on what they have been taught and not on their adaptive skills; otherwise the assessment might be perceived by the learners as unfair, with negative consequences for student self-efficacy and motivation. A test must have construct validity, i.e. it must assess what it sets out to assess. Hence, unless we explicitly provide extensive practice in inferential skills, we should not test students on them.

Some teachers feel that, since the students should possess the knowledge of the language required by the task, whether or not they are familiar with the task will not matter; this assumption, however, is based on a misunderstanding of L2 acquisition and task-related proficiency.

Knowledge vs control

Very often teachers administer ‘grammar’ tests in order to ascertain whether a specific grammar structure has been ‘learnt’. This is often done through gap-fill/cloze tests or translations. This approach to grammar testing is appropriate if one sets out to assess declarative (intellectual) knowledge of the target structure(s), but not if one wants to assess the extent of the learners’ control over it (i.e. the ability to use grammar in real operating conditions, in relatively unmonitored speech or written output). An oral picture task or a spontaneous conversational exchange eliciting the use of the target structure would be a more accurate way to assess the extent of learner control over grammar and vocabulary. Using a gap-fill test to draw conclusions about control is another common instance of construct invalidity.

Listening vs Listenership

This refers less to a mistake in assessment design than to a pedagogical flaw and assessment deficit, and it is a very important issue because of its major wash-back effect on learning. Listening is usually assessed solely through listening comprehension tasks; however, this does not test an important set of listening skills, ‘listenership’, i.e. the ability to respond to an interlocutor (a speaker) in real conversation. If we test students only on comprehension, the grade or level we assign to them will reflect one important set of listening skills (comprehending a text) but not the one they need the most in real-life interaction (listening to an interlocutor as part of meaning negotiation). Listening assessments need to address this important deficit, which, in my opinion, is widespread in the UK system.

Lack of piloting

Administering a test without piloting it can be very ‘tricky’, even if the test comes from a widely used textbook assessment pack. Ambiguous pictures and solutions, unsuitable speed of delivery, inconsistent and/or very subjective grading of tests and construct validity issues are not uncommon flaws in the assessment materials of many renowned course-books. Ideally, tests should be piloted by more than one person on the team, especially when it comes to the grading system; in my experience this is usually the most controversial aspect of an assessment.

‘Woolly’ assessment scales

When you have a fairly homogeneous student population, it is important to use assessment scales/rubrics which are as detailed as possible in terms of complexity, accuracy, fluency, communication and range of vocabulary. In this respect, the old UK National Curriculum Levels (still in use in many British schools) were highly defective and so are the GCSE scales adopted by UK examination boards. MFL departments should invest some quality time in devising their own scales, making specific reference in the grade descriptors to the traits they emphasize the most in their curriculum (so as to satisfy the construct validity criterion).

Fluency – the neglected factor

Just like ‘listenership’, fluency is another factor of language performance that is often neglected in assessment; yet, it is the most important indicator of the level of control someone has achieved in TL receptive and productive skills. Whereas UK MFL departments often do include fluency amongst their assessment criteria for speaking, this is not often the case for writing and reading. Yet, it can be done relatively easily. For instance, in essay writing, all one has to do is set a time and word limit for the task in hand and note down the time of completion for each student as they hand it in. Often teachers do not differentiate between students who score equally across accuracy, complexity and vocabulary but differ substantially in terms of writing fluency (i.e. the time-to-word ratio). By so doing we fail to assess one of the most important aspects of language acquisition: executive control over a skill. In my view, this is something that should not be overlooked, in either low-stakes or high-stakes assessments.
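As a rough illustration of how the time-to-word ratio could be worked out once completion times have been noted down, here is a minimal Python sketch. The names (Script, words_per_minute, seconds_per_word) and the two students' figures are invented for the example and do not come from any real assessment; it assumes only that a word count and a completion time in minutes are recorded for each essay.

```python
from dataclasses import dataclass


@dataclass
class Script:
    """One student's timed essay (all values used below are made up)."""
    student: str
    word_count: int
    minutes_taken: float


def words_per_minute(script: Script) -> float:
    """Writing fluency expressed as words produced per minute."""
    if script.minutes_taken <= 0:
        raise ValueError("minutes_taken must be positive")
    return script.word_count / script.minutes_taken


def seconds_per_word(script: Script) -> float:
    """The time-to-word ratio: average number of seconds spent per word."""
    if script.word_count <= 0:
        raise ValueError("word_count must be positive")
    return script.minutes_taken * 60 / script.word_count


# Hypothetical data: two students with the same word count (and, we assume,
# the same accuracy, complexity and vocabulary scores) who differ in speed.
scripts = [
    Script("Student A", word_count=150, minutes_taken=20),
    Script("Student B", word_count=150, minutes_taken=30),
]

for s in scripts:
    print(f"{s.student}: {words_per_minute(s):.1f} words/min, "
          f"{seconds_per_word(s):.1f} s/word")
# Student A: 7.5 words/min, 8.0 s/word
# Student B: 5.0 words/min, 12.0 s/word
```

In this made-up case the two essays would look identical on a scale that ignores fluency, whereas the ratio shows Student A producing the same amount of text in two thirds of the time.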