Nine interesting foreign language research findings you may not know about


In this post I share a very succinct summary of nine pieces of research I have recently come across, which I found interesting and which have impacted my classroom practice in one way or another. They are not presented in any particular order.

  1. Green and Hecht (1992) – Area: Explicit grammar instruction and teaching of aspect

Green and Hecht investigated 300 German learners of English. They asked them to correct 12 errors in context and to offer an explanation of the rule. Most interesting finding: the students could correct 78 % of the errors but could not provide an explanation for more than 46 % of the grammar rules that referred to those errors. They identified a set of rules that were hard to learn (i.e. most students did not recall them) and a set of easy rules (the vast majority of them could recall them successfully). Their implications for teaching: the explicit teaching of grammar may actually not work for all grammar items. For example, aspect (e.g. Imperfect vs Preterite in Spanish) would be more effectively taught, according to them, by exposure to masses of comprehensible input (e.g. narrative texts) rather than through the use of PPTs or diagrams on the classroom whiteboard/screen. In fact, Blyth (1997) and Macaro (2002a) demonstrated the futility of drawing horizontal lines interrupted by vertical ones to indicate that the perfect tense ends the action.

My conclusions: I do not entirely agree with Blyth and Macaro that explicit explanation of grammar in the realm of aspect does not work and I do like diagrams (although they do not work with all of one’s students). However, I do agree with Green and Hecht (1992) that the best way to teach aspect is through exposure to masses of comprehensible input containing examples of aspect in context. The grammar explanation and production phase may be carried out at a later stage.

  2. Milton and Meara (1998) – Comparative study of vocabulary learning between German, English and Greek students aged 14-15 years.

197 students from the three countries studying similar syllabi for the same number of years were tested on their vocabulary. The findings were that:

1. The British students’ score was the worst (averaging 60 %). According to the researchers, they showed a poor grasp of basic vocabulary;

2. They spent less time learning and were set lower goals than their German and Greek counterparts;

3. 25 % of the British students scored so low (after four years of MFL learning) that the researchers questioned whether they had learnt anything at all.

The authors of the study also found that British learners are not necessarily worse in terms of language aptitude; rather, they questioned the effectiveness of MFL teaching in the UK.

My conclusions: this study is quite old and the sample they used may not be indicative of the overall British student population. If it were, though, representative of the general situation in Britain, teachers may have to – as I have advocated in several previous blogs of mine – consciously recycle words over and over again, not just within the same units, but across units.

Moreover, a study of 850 EFL learners by Gu and Johnson (1996) may indicate an important issue underlying our students’ poor vocabulary retention. They found that the students who excelled in vocabulary size were those who used three metacognitive strategies in addition to the cognitive strategies used by less effective vocabulary learners: selective attention to words (deciding to focus on certain words worth memorizing), self-initiation (making an effort to learn beyond the classroom and the exam system) and deliberate activation of newly-learnt words (trying out a word independently to obtain positive or negative feedback as to the correctness of its use). Teaching should aim, in other words, at developing learner autonomy and motivation to apply all of these strategies independently outside the classroom.

  3. Knight (1994) – Using dictionaries whilst reading – effects on vocabulary learning

Knight gave her subjects a text to read on a computer. One group had access to electronic dictionaries whilst the other did not. She found that those who used the dictionary, rather than relying on guessing strategies alone, scored higher in a subsequent vocabulary test. This and other previous (Luppescu and Day, 1993) and subsequent studies (Laufer & Hadar, 1997; Laufer & Hill, 2000; Laufer & Kimmel, 1997) suggest that students should not be barred from using dictionaries in lessons. These findings are important for 1:1 (tablet or PC) school settings considering the availability of free online dictionaries (e.g.

  4. Anderson and Jordan (1998) – Rate of forgetting

Anderson and Jordan set out to investigate the number of words that could be recalled by their informants immediately after initial learning and 1 week, 3 weeks and 8 weeks thereafter. They identified a recall rate of 66%, 48%, 39% and 37% respectively. The obvious implication is that, if immediately after learning the subjects could recall only 66% of the target vocabulary, consolidation should start then and continue (at spaced intervals, through recycling in lessons or as homework) for several weeks. At several points during the school year, I remind my students of Anderson and Jordan’s study and show them the following diagram. It usually strikes a chord with a lot of them:


  5. Erler (2003) – Relationship between phonemic awareness and L2 reading proficiency

Erler set out to investigate the obstacles faced by learners of French as a foreign language in England. She studied 11-12 year olds. She found that there was a strong correlation between a low level of phonemic awareness and weak reading skills (especially word recognition skills). She concluded that explicit training and practice in the grapheme-phoneme system (i.e. how letters/combinations of letters are pronounced) of French would improve L1-English learners’ reading proficiency in that language. This finding corroborates other findings by Muter and Diethelm (2001) and Comeau et al (1999). The implication is that micro-listening enhancers of the kind I discussed in a previous blog (e.g. ‘Micro-listening skills tasks you may not do in your lessons’) or any other teaching of phonics should be performed in class much more often than is currently done in many UK MFL classrooms.

Please note: pronunciation teaching and decoding-skills instruction are not the same thing. Teaching pronunciation is about understanding how sounds are produced by the articulators, whilst teaching decoding skills means instructing learners on how to convert letters and combinations of letters into sound. Also, effective decoding-skill instruction occurs in communicative contexts (whether through receptive or productive processing), not simply through matching sounds with gestures and/or phonetic symbols.

  6. Feyten (1991) – Listening ability as predictor of success

Feyten investigated the possibility that listening ability may be a predictor of success in foreign language learning. The researcher assessed the students at pre-test using a variety of tasks and measures of listening proficiency. After a ten-week course she tested them again (post-test) and found that there was a strong correlation between listening ability and overall foreign language acquisition, i.e.: the students who had scored high at pre-test did better at post-test not just in listening, but also in written grammar, reading and vocabulary assessment. Listening was a better predictor of foreign language proficiency than any other individual factor (e.g. gender, previous learning history, etc.).

My implications: we should take listening more seriously than we currently do. Increased exposure to listening input and more frequent teaching of listening strategies are paramount in the light of such evidence. Any effective baseline assessment at the outset of a course ought to include a strong listening comprehension component; the latter ought to include a specific decoding-skill assessment element.

  7. Graham (1997) – Identification of foreign language learners’ listening strategies

This study investigated the listening strategies of 17-year-old English learners of German and French. Amongst other things, Graham found the following issues undermining their listening comprehension. Firstly, they were slow in identifying key items in a text. Secondly, they often misheard words or syllables and transcribed what they believed they had heard, thereby getting distracted. Graham’s conclusions were that weaker students overcompensated for lack of lexical knowledge by overusing top-down strategies (e.g. spotting key words as an aid to grasp meaning).

My implications are that Graham’s research evidence, which echoes findings from Mendelsohn (1998) and other studies, should make us wary of getting students to over-rely on guessing strategies based on key-word recognition. Teachers should focus on bottom-up processing skills much more than they currently do, e.g. by practising (a) micro-listening skills; (b) narrow listening or any other listening instruction methodology which emphasizes recycling of the same vocabulary through comprehensible input (N.B. not necessarily through videos or audio-tracks; it can be teacher-based, in the absence of other resources); (c) listening with transcripts (whole, gapped or manipulated) in such a way as to focus learners on phoneme-grapheme correspondence.

  8. Polio et al. (1998) – Effectiveness of editing instruction

Polio et al. (1998) set out to investigate whether additional editing instruction (the innovative feature of the study) would enhance learners’ ability to reduce errors in revised essays. 65 learners on a university EAP course were randomly assigned to an experimental and a control group, who wrote four journal entries each week for seven weeks. Whereas the control group did not receive any feedback, the experimental group was involved in (1) grammar review and editing exercises and (2) revision of the journal entries, both of which were followed by teacher corrective feedback. On each of the pre- and post-tests, the learners wrote a 30-minute composition which they were asked to improve in 60 minutes two days later. Linguistic accuracy was calculated as the ratio of error-free T-units to the total number of T-units in the composition.
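The accuracy measure described above is a simple ratio. A minimal sketch follows; the function name and the counts are made up for illustration and are not taken from the study:

```python
def t_unit_accuracy(error_free_t_units: int, total_t_units: int) -> float:
    """Ratio of error-free T-units to all T-units in one composition."""
    if total_t_units <= 0:
        raise ValueError("total_t_units must be positive")
    if not 0 <= error_free_t_units <= total_t_units:
        raise ValueError("error_free_t_units must lie between 0 and total_t_units")
    return error_free_t_units / total_t_units


# Hypothetical composition: 12 of its 20 T-units contain no errors.
score = t_unit_accuracy(12, 20)
print(score)
```

A learner whose revised essay moves from 12 to 15 error-free T-units out of 20 would, on this measure, go from 0.6 to 0.75.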

The results suggested that the experimental group did not outperform the control group. The researchers conjectured that the validity of their results might have been undermined by the assessment measure used (T-units) and/or the relatively short duration of the treatment. They also hypothesised that the instruction the control group received might have been so effective that the additional practice for the experimental group did not make any difference.

The implication of this study is that editing instruction may need longer than seven weeks in order to be effective. Thus, the one-off editing instruction sessions that many teachers run on finding common errors in their students’ essays, in order to address the grammar issues behind those errors, are futile unless they are followed up by extensive and focused practice with lots of recycling.

  9. Elliott (1995) – Effect of explicit instruction on pronunciation

Elliott set out to investigate the effects of improving learner attitude toward pronunciation and of explicitly teaching pronunciation on his subjects (66 L1-English students of Spanish). He compared the experimental group (which received 10-15 minutes of instruction per lesson over a semester) with a control group of students whose pronunciation was corrected only when it impeded understanding. The results were highly significant, both in terms of improved accent and of attitude (92 % of the informants being positive about the treatment). The experimental group outperformed the control group.

Implications: this study, which is in line with evidence from several others (e.g. Elliott, 1997; Zampini, 1994), confirms that explicit pronunciation instruction is more effective than implicit instruction, whereby L2 learners are expected to learn pronunciation simply by exposure to comprehensible input. Arteaga’s (2000) review of US Spanish textbooks found that only 4 out of 10 include activities attempting to teach pronunciation. I suspect that the figure may be even lower in the UK. In the light of Elliott’s findings, this is quite appalling, as the mastery of phonology is a catalyst not only of reading ability but also of listening and speaking proficiency, and plays an enormous role in Working Memory’s processing efficiency in general (see my blog: ‘Eight important facts about Working Memory’).


How the brain acquires foreign language grammar – A Skill-theory perspective

Caveat: Being an adaptation of a section of a chapter in my Doctoral thesis, this is a fairly challenging article which may require solid grounding in Applied Linguistics and Cognitive Theories of Skill Acquisition.

1. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed as a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is considered as governed by the same principles that regulate any other cognitive skill. Scholars such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996) have produced a number of persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to the type of Explicit MFL instructional approaches implemented in many British schools: (1) the acquisition of grammatical rules in explicit L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.


 Figure 1: The Anderson Model (adapted from Anderson, 1983)




The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory: Working Short-Term Memory (WSTM), Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features discussed in previous blogs (see ‘Eight important facts about Working Memory’), while Declarative and Production Memory may be seen as two subcomponents of Long-Term Memory (LTM). The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ and ‘THEN’ conditions. These are activated every single time the brain is processing information; whenever a learner is confronted with a problem, the brain searches for a Production that matches the data pattern associated with it. For example:


IF the goal is to form the present perfect of a verb and the person is 3rd singular/


THEN form the 3rd singular of ‘have’


IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /


THEN form the past participle of the verb
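For readers who find code easier to follow than prose, the two Productions above can be sketched as a toy rule system in Python. This is only an illustration under my own assumptions, not Anderson’s implementation: the names (`Production`, `run`, the lookup tables) are hypothetical, and the extra checks that stop a rule from firing twice are added purely to make the loop terminate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Production:
    name: str
    condition: Callable[[dict], bool]  # the IF side, tested against working memory
    action: Callable[[dict], None]     # the THEN side, which changes working memory


# Toy lookup tables standing in for declarative knowledge (illustrative only).
HAVE_FORMS = {"3sg": "has"}
PAST_PARTICIPLES = {"go": "gone", "play": "played"}

productions = [
    Production(
        "form-have",
        lambda wm: wm["goal"] == "present perfect"
        and wm["person"] == "3sg"
        and "aux" not in wm,  # assumption: do not fire again once 'have' is formed
        lambda wm: wm.update(aux=HAVE_FORMS[wm["person"]]),
    ),
    Production(
        "form-participle",
        lambda wm: wm["goal"] == "present perfect"
        and "aux" in wm       # 'have' has just been formed
        and "participle" not in wm,
        lambda wm: wm.update(participle=PAST_PARTICIPLES[wm["verb"]]),
    ),
]


def run(wm: dict) -> List[str]:
    """Fire the first matching Production until none matches; return the firing order."""
    fired = []
    while True:
        match = next((p for p in productions if p.condition(wm)), None)
        if match is None:
            return fired
        match.action(wm)
        fired.append(match.name)


wm = {"goal": "present perfect", "person": "3sg", "verb": "go"}
order = run(wm)
print(order, wm["aux"], wm["participle"])
```

The point of the sketch is that each step is a separate match-and-fire cycle against working memory, which is why performance built from uncompiled Productions is slow.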


The creation of a Production is a long and careful process since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour, thus the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.


In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.


Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L2-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the Present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.


Producing language by these means is extremely inefficient. Thus, the brain tries to sort out the information into more efficient Productions. This is achieved by Compiling (‘running together’) the productions that have already been created so that larger groups of productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process lessens the number of steps referred to above and has the effect of speeding up the process. Thus, the Productions


P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of have


P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb


would be composed as follows:


P3 IF the goal is to form the present perfect of a verb / THEN form the present simple of have and THEN the past participle of the verb
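Composition as described here can be sketched in a few lines of Python. The representation is my own simplification (conditions as sets of facts, actions as ordered steps, plus a hypothetical `adds` field recording what an action leaves in working memory); it is meant to show the idea, not to reproduce ACT*.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    name: str
    conditions: frozenset          # facts that must be in working memory (IF)
    actions: tuple                 # steps carried out in order (THEN)
    adds: frozenset = frozenset()  # facts the actions leave in working memory


def compose(first: Production, second: Production, name: str) -> Production:
    """Collapse two Productions that fire in sequence into a single one.

    Conditions of `second` that `first` itself brings about are dropped,
    because they no longer need to be checked between the two steps.
    """
    remaining = second.conditions - first.adds
    return Production(
        name=name,
        conditions=first.conditions | remaining,
        actions=first.actions + second.actions,
        adds=first.adds | second.adds,
    )


p1 = Production(
    "P1",
    conditions=frozenset({"goal: present perfect"}),
    actions=("form the simple present of 'have'",),
    adds=frozenset({"'have' formed"}),
)
p2 = Production(
    "P2",
    conditions=frozenset({"goal: present perfect", "'have' formed"}),
    actions=("form the past participle of the verb",),
    adds=frozenset({"participle formed"}),
)

p3 = compose(p1, p2, "P3")
print(sorted(p3.conditions))
print(p3.actions)
```

The composed rule keeps a single condition (the goal) and carries both actions, so one match now does the work that previously took two.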


An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.


The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P2 above would become


IF the goal is to form the present perfect of a verb


THEN form ‘have’ and then form the past participle of the verb


The process of Composition and Proceduralisation will eventually produce after repeated performance:


IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’


For Anderson it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. The mechanism through which the brain decides which Productions should be applied in a given context is called Matching by Anderson. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution, i.e. a Production that matches the pattern of information in WSTM. If such matching is possible, then a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘if’) of a proceduralised Production, the matching will be quicker, and the ‘action side’ (the ‘then’) of the Production will be deposited in WSTM, making it immediately available for performance (execution). It is at this intermediate stage of development that the most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.


The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) produces the following example of generalisation from language acquisition, in which P6 and P7 become P8:


P6 IF the goal is to indicate that a coat belongs to me THEN say ‘My coat’


P7 IF the goal is to indicate that a ball belongs to me THEN say ‘My ball’


P8 IF the goal is to indicate that object X belongs to me THEN say ‘My X’
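The step from P6 and P7 to P8 can be mimicked with a small Python function that replaces the one word on which two rules disagree with a variable. The word-tuple representation and the function name `generalise` are my own assumptions for the sake of illustration.

```python
from typing import Optional


def generalise(rule_a: tuple, rule_b: tuple, variable: str = "X") -> Optional[tuple]:
    """Merge two (condition, action) rules that differ in a single constant.

    Each rule is a pair of word tuples. The word on which the two rules
    disagree is replaced by `variable`. Returns None if the rules differ in
    length or disagree on more than one distinct pair of words.
    """
    (cond_a, act_a), (cond_b, act_b) = rule_a, rule_b
    if len(cond_a) != len(cond_b) or len(act_a) != len(act_b):
        return None
    pairs = {(a, b) for a, b in zip(cond_a + act_a, cond_b + act_b) if a != b}
    if len(pairs) != 1:
        return None
    old, _ = next(iter(pairs))

    def swap(words: tuple) -> tuple:
        return tuple(variable if w == old else w for w in words)

    return swap(cond_a), swap(act_a)


p6 = (("indicate", "that", "coat", "belongs", "to", "me"), ("My", "coat"))
p7 = (("indicate", "that", "ball", "belongs", "to", "me"), ("My", "ball"))
p8 = generalise(p6, p7)
print(p8)
```

A real learner obviously does not compare word lists, but the sketch shows why Generalisation is inductive: the variable is inferred from just two examples, which is exactly what leaves room for over-generalisation.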


Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process would require that we have examples of both correct and incorrect applications of the Production in our LTM.


Both processes are inductive in that they try to identify from examples of success and failure the features that characterize when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.

Since they are inductive processes, Generalization and Discrimination will sometimes err and produce incorrect Productions. There are possibilities for Overgeneralization and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.

ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from procedural memory and causes an action to be deployed and there is no negative feedback, the Production will become more robust. Because it is more robust it will be able to resist occasional negative feedback and also it will be more strongly activated when it is called upon:

The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production (Anderson, 1983: 251)

Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, there is a power-law of decay in strength with disuse.
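The competition between a strong wrong form and a weaker correct one can be illustrated with a toy simulation. Everything numeric here is an assumption of mine (the starting strengths, the step size, the decay exponent and the example error ‘has went’); the sketch only shows the direction of the effects described above, not ACT*’s actual equations.

```python
from dataclasses import dataclass


@dataclass
class Production:
    name: str
    output: str
    strength: float = 1.0


def select(candidates):
    """All other things being equal, the strongest matching Production wins."""
    return max(candidates, key=lambda p: p.strength)


def reinforce(p, negative_feedback=False, step=1.0):
    """Strengthen after use without negative feedback; weaken otherwise (assumed step size)."""
    p.strength = max(0.0, p.strength + (-step if negative_feedback else step))


def decayed(strength, time_unused, rate=0.5):
    """Power-law decay with disuse: strength * (1 + t) ** -rate (illustrative parameters)."""
    return strength * (1 + time_unused) ** -rate


# Hypothetical learner: the erroneous form is more entrenched than the target form.
wrong = Production("interlanguage", "has went", strength=5.0)
right = Production("target", "has gone", strength=2.0)

before = select([wrong, right]).output

for _ in range(4):
    reinforce(wrong, negative_feedback=True)  # the error is corrected
    reinforce(right)                          # the target form is used successfully

after = select([wrong, right]).output
print(before, "->", after)
print(decayed(right.strength, time_unused=3))
```

In this toy run the error wins at first simply because it is stronger, and the target form only takes over after repeated successful use, which is the classroom argument for sustained recycling rather than one-off correction.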

2. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition

One limitation of the model is that it does not account for the fact that sometimes unanalysed L2-chunks of language are acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson.

This means that teaching memorised unanalysed chunks can work in synergy with explicit language teaching, as happens in my approach. See my blog post on how I teach lexicogrammar.