Irregular before regular – maximizing explicit grammar instruction by inverting traditional instructional sequences


In most coursebooks and schemes of work adopted by UK MFL education providers, the exceptions to a given grammar structure are usually taught after the dominant rule governing that structure. In the present post I argue that in many cases inverting this teaching sequence may have a more beneficial impact on acquisition. The rationale for this approach is rooted in the way the brain forms and revises the L2 interlanguage system.

When a learner is taught a grammar rule, the brain creates a cognitive ‘structure’ that s/he will consolidate through extensive receptive exposure and production. As already discussed in my post on how L2 grammar ‘rules’ are acquired, when a grammar structure is in the process of being automatised, the brain tends to be extremely circumspect in accepting as ‘correct’ – and consequently ‘learnable’ – any use of that structure which does not match the declarative knowledge (or mental rule representation) of it stored in Long-Term Memory. This is particularly true of the final stage in L2 grammar acquisition – Anderson’s (2000) Strengthening process. During this stage, the brain needs to be particularly impervious to any alteration to the rule system governing that structure, so that the system remains stable and encoding ambiguity is avoided. For any successful cognitive restructuring of an existing grammar rule to occur, two conditions must be met:

  • The grammar rule one wants to restructure must be fully acquired for any exception to it to be incorporated; only then will the brain be more likely to ‘see’ the exception to that rule as a separate subsystem which does not pose any ‘threats’ to the dominant rule system;
  • The exception to the rule must be processed by the brain numerous times in salient and meaningful contexts; this entails that exceptions which do not occur frequently in the language processed in classroom or out-of-classroom L2-based activities are less likely to be internalized, as they will be ‘masked’, so to speak, by the dominant rule.

Let us look at an example: teaching the Passé Composé in French. Coursebooks normally begin with the verbs that form this tense with ‘Avoir’ and, after a few lessons, move on to the ‘Etre’ verbs. Whilst some of the more able and focused learners can cope with this, in my experience many learners cannot. Very often, teachers may believe their students have mastered the two sets of rules based on their ability to perform successfully in cloze tasks or other mechanical grammar activities. However, in less structured activities (e.g. spontaneous speech), errors in this area will usually be rife.

Issues in acquiring the exception to the dominant Passé Composé rule are exacerbated by the fact that very few of the verbs requiring Etre are high-frequency verbs; hence students do not usually receive much exposure to them when processing classroom or naturalistic French input. This makes restructuring of the ‘have + past participle’ rule more difficult.

In this case, teaching the ‘Etre’ verbs before the ‘Avoir’ ones is a more effective strategy: once learners have acquired the exception (Etre + past participle) through extensive modelling and practice, they will find it easier to learn the dominant rule, thanks to the very frequent occurrence of ‘Avoir’ verbs in classroom or naturalistic target language input.

The same applies to any other grammar structure where the exceptions to the rule do not occur very frequently in the instructional or naturalistic target language input. Think about irregular past participles such as ‘reçu’, ‘vécu’ and ‘su’, which are notoriously harder for students to acquire than, for instance, ‘pris’ or ‘fait’.

In conclusion, L2 teachers, curriculum designers and coursebook writers may want to invert the traditional instructional sequence whereby irregular forms are taught after the regular ones. Moreover, before moving from the less dominant ‘X’ rule sub-system to the dominant one, they ought to ensure as far as possible that the former has been internalized through masses of practice; in other words, that the learners master the target grammar structure not simply in terms of knowing the rule but also in terms of cognitive control over its use under Real Operating Conditions (see my post on ‘Cognitive Control’ for a fuller explanation of what I mean here).

The case for translation in foreign language instruction

Introduction

This article was inspired by Steve Smith’s (www.frenchteacher.net) very informative and insightful post ‘What is the point of translation?’ (http://tinyurl.com/ooxjxeg), in which he clearly outlines the pros and cons of using translation in the MFL classroom. I strongly recommend Steve’s brilliant article – a must-read for MFL teachers. The present post is meant to add a research ‘edge’ to, and expand on, Steve’s very valid points.

The controversy over translation

Whether translation is a useful learning tool or not is still very controversial amongst L2 educators (Brown, 2002). Why? Mainly because not much research has been carried out on the extent of its impact on L2 proficiency. Moreover, at least until recently, translation has been out of favour with a large part of the teaching community for the following reasons:

  • It is associated with the Grammar-Translation approach;
  • It is assumed that L1 use in the classroom hampers L2 acquisition;
  • Translation is seen by many as a mechanical transfer of meaning from one language to another – not a communicative activity;
  • Translation tasks are perceived as boring;
  • Translation is seen as independent of the other four skills;
  • Translation takes up lots of valuable time that could be devoted to more beneficial communicative activities;
  • Translation is believed to be appropriate only for training translators.

However, attitudes towards translation have been gradually shifting, especially in the last 10–15 years. As Duff (1994) points out, translation is a real-life task that happens everywhere around the world in a wide range of contexts. In the MFL classroom, students translate L2 items they do not understand for their classmates on a daily basis. When visiting a foreign country, L2-knowers translate signs, notices, announcements, etc. for non-L2-knowers. When socializing with foreigners, interpreting is a common occurrence, too. I would add that, when using the internet, our learners draw upon translation more often than not in their interaction with social media or other knowledge sources – whether through dictionaries or other digital tools. Finally, Kern (1994) found that most teachers agree that mental translation into one’s L1 is inevitable when reading.

Moreover, research into Good Language Learner strategies has found that more effective students often “refer back to their native language(s) judiciously [translate into L1] and make effective cross-lingual comparisons at different stages of language learning” (Naiman et al., 1978:14). Increasingly, studies suggest a facilitative role for translation or L1 transfer in students’ language learning (e.g. Omura, 1996; Prince, 1996; Cohen & Brooks-Carson, 2001). In Horwitz’s (1988) study, the majority of German (70%) and Spanish (75%) students believed that learning a foreign language is largely a matter of translating from English. Prince (1996) noted that students often believe that learning through translation, with the new word being linked to its native-language equivalent, is more effective than learning vocabulary in context.

Hsieh (2000) reported that translation benefited his Taiwanese students’ L2 reading strategies and vocabulary acquisition, whilst also enhancing their cultural background knowledge: 85% of his informants reported that translating helped them pay attention to the coherence and contextualization of English reading texts; 65% thought that they became more aware of the multiple meanings of English words; and 62% felt that translation helped extend their vocabulary knowledge and reading skills. On the whole, these students believed that the use of translation had a desirable effect on their English reading and vocabulary learning.

Several studies (Zhai, 2008; Cumming, 1989; Uzawa, 1996; Kobayashi & Rinnert, 1992; Cohen & Brooks-Carson, 2001) have investigated the effect of composing in L1 and then translating into L2. Zhai (2008) concluded that lower-level learners benefited most from translated writing. Similarly, Cumming (1989) reported that inexpert French-speaking ESL writers used their first language to generate content, whereas expert writers used translation not just to generate content but also to verify appropriate word choice.

Dagiliene (2012) found that “translation activities are a useful pedagogical tool. When introduced purposefully and imaginatively into language learning programme, translation becomes a suitable language practice method for many students. When integrated into daily classroom activities translation can help students develop and improve reading, speaking, writing skills, grammar and vocabulary. Translation in foreign language classes enhances better understanding of structures of the two languages and also strengthens students’ translation skills. It is an effective, valid tool in the foreign language learning and can be used in the university classroom to improve knowledge in English. Still, translation should not be overused and should be integrated into language teaching at the right time and with the right students”.

My case for translation

Before becoming a teacher, I worked as an interpreter and translator (English to French/Italian and vice versa) for about three years. A lot of the written translations involved highly specialized vocabulary that is not normally learnt in school or in a naturalistic setting. It was very challenging, but I learnt loads, and not just in terms of vocabulary and grammar; it definitely improved my accuracy, especially with those ‘horrible’ little function words that every L2 learner struggles with, such as prepositions.

L2-to-L1 translation tasks, especially when carried out with the support of a bilingualised dictionary and pitched at the right level of challenge, can work wonders in terms of linguistic proficiency enhancement; provided, that is, that they are carried out as part of a well-sequenced inter- and intra-lesson series of tasks which recycles the target vocabulary and structures systematically. The following are, in my view, the most important benefits of translation as an instructional tool:

  • Vocabulary consolidation and expansion – this is demonstrated by a number of studies and is pretty self-evident;
  • Noticing grammar and lexical collocations in context – by this I refer to Schmidt’s (1990) noticing hypothesis, whereby spotting the difference between the L1 and L2 usage of a given grammar structure sparks off the acquisition process of that structure (i.e. the cross-lingual comparison alluded to by Naiman et al., 1978, quoted above). An example from a recent lesson: I wanted my students to notice verb-subject inversion in Spanish, but I wanted them to do so in context and without any input whatsoever from me. As I was working out a teaching strategy which would prompt that process, I immediately thought of translation: it would force them to ask me and/or themselves the question ‘where is the subject of this sentence?’. And indeed, when I asked them the next day to translate a text which included a few instances of verb-subject inversion, it sparked off many questions along those lines.
  • Rigor / Focus on accuracy – In the most common reading tasks staged in MFL classrooms, students can ‘get away’ with just ‘getting’ the gist of the text or spotting the required details through the use of para-textual or contextual cues. However, when asked to translate, they cannot operate impressionistically all of the time. They need to interpret the meaning of each and every word and make sense of each syntactic unit. This focuses learners on grammar and syntax as well since, even when they use dictionaries, they will often have to analyze the grammar to infer meaning.
  • ‘Resourcing’ strategies enhancement – in my Ph.D. study I identified ‘resourcing’ as one of the most powerful language learning strategies in terms of vocabulary, spelling and grammar knowledge acquisition. Translation tasks, whether from the L1 to the L2 or vice versa, require students to resort to dictionaries, more advanced target language knowers or online forums (e.g. http://www.wordreference.com).
  • Ease of differentiation – It is easy to cater for different abilities through translation tasks. If the translation involves sentences, one can create subsets of sentences for each ability group in the class; when it involves longer texts (e.g. an 80-word e-mail), one can design them in such a way that the language starts easy and becomes increasingly complex.
  • Control over input/output – It is one of the tenets of my approach to foreign language instruction that, before involving students in unstructured / unplanned activities, one should engage them in fairly extensive controlled practice (from easy to incrementally challenging). Translation is very valuable in the context of this approach, as it is one of very few tasks that gives the teacher total control over student output. An example: your students are going to perform an oral task of a kind you have carried out several times before in the same topic area. You will already have practised the relevant lexical items as discrete items and in the context of reading and listening texts. Now, prior to the task, you may want to prepare them for the larger units of meaning (i.e. sentences) they are going to attempt to convey in the oral interaction. Knowing, from past experience, what kind of sentences they are likely to produce in the target task, you can ask the class to translate them on mini-boards, on a Google Doc (displayed on the classroom screen/interactive whiteboard/Apple TV) or through a card game. That should facilitate the ensuing task.

Important caveats and guidelines for translation tasks implementation

The reader should note the following important caveats which, in my view, should be heeded in the adoption of translation as a learning tool:

  1. Translation tasks are best given as homework, unless they are used as relatively short and snappy starters, plenaries or pre-task warm-ups with more confident learners;
  2. Translation tasks must be logically integrated in the learning flow of a lesson or series of lessons. L2 to L1 translation can occur at any point in an activity sequence; however, L1 to L2 translation tasks should only be staged after the vocabulary and grammar structures they include have been practised extensively. In the language-gym.com/work-outs modules, for instance, L2 to L1 translation only occurs at the end of a sequence of 10–15 vocabulary-building/reading activities;
  3. Translation tasks are not for everyone; teachers need to be careful in using translation with less able or less motivated learners;
  4. Comprehensible input (from L2 to L1) and achievable output (from L1 to L2) should be the guiding principles in the selection or design of translation tasks. Vygotsky’s Zone of Proximal Development should be borne in mind in designing or selecting translations;
  5. Since translation is perceived by many students as a ‘boring’ task (Dagiliene, 2012), teachers need to ensure that translation tasks are as stimulating and imaginative as possible as well as relevant to what the students are learning, the objective of the lesson and the preceding sequence of activities;
  6. Translations should not be selected on the basis of topic relevance alone; they should recycle the target vocabulary and grammar as much as possible;
  7. In L2 to L1 translation, the target sentences / texts should contain as many contextual clues as possible in order to facilitate the inference of unfamiliar language items;
  8. Students should be given access to bilingualised dictionaries (e.g. http://www.wordreference.com);
  9. Unless one is dealing with very advanced students, L1 to L2 translation including challenging and connotative language or complex idioms ought to be avoided; language should preferably be denotative and straightforward to translate;
  10. In order to avoid cognitive overload, sentences including complex subordination should be avoided in L2-to-L1 translation with less advanced learners;
  11. It may be advisable to scaffold L1 to L2 translation for less advanced learners by alerting them to the problematic nature of specific language items. Colour coding, symbols or simply different fonts could be used to this effect. For example, in the sentence “I live in a big city”, the adjective big may be written in bold as a reminder that there is something to be aware of in translating it into French (its French equivalent, grand, precedes the noun, unlike the majority of French adjectives);
  12. Translation should not be overused as a classroom or even a homework task, unless we are dealing with highly motivated and able learners.

In conclusion, translation can be very useful as a learning tool if one bears in mind the above caveats and guidelines. I believe the arguments I have put across in this article make a sufficiently strong case for adopting it in the MFL classroom or as homework fairly regularly – but judiciously, without overusing it. I particularly recommend the adoption of translation to teachers who, like myself, lay a lot of emphasis on unplanned oral interaction, as a way to balance the emphasis on communicative fluency with focus on form and accuracy.

Transcription – a much underused yet powerful micro-listening skills enhancer


Many years ago (before sites like http://www.metrolyrics.com existed), as a music lover and a learner of English, French, Spanish and German, I improved my listening skills and vocabulary repertoire by transcribing songs in those languages. Having to listen to each song over and over again sharpened my ‘ear’ for the target language sounds, whilst forcing me to search dictionaries for any word that matched whatever I heard the singer say, often using the context to try and guess it. I learnt lots of new words in the process, and my pronunciation improved massively, too.

As a teacher, I have often recycled that learning strategy with my pre-intermediate to upper-intermediate students, mostly as homework, but at times in class, too. The objective? Mainly to develop micro-listening skills and spelling, especially word endings, which are so problematic for many learners of Romance languages; but, most importantly, to focus learners on the relationship between the target language phonemic and graphemic systems, which, especially in English and French, is not always straightforward. As Macaro (2007) rightly notes, transcription, defined as ‘the converting of authentic recorded text to written form by individuals with the freedom to listen and repeat as often as they wish’, is a much underused activity. He points out three advantages of learning tasks involving transcription:

  1. It practises phonological decoding (which in turn will impact on pronunciation);
  2. It enables the learner to practise analytical skills (by thinking through the separation of the morpho-syntax);
  3. It focuses students on spelling.

I would add a fourth advantage of transcription, which relates to the post-task activities. In my experience, when – after completing the task – the students listen to the same text again whilst reading the original (correct) script, they often experience a few ‘eureka’ moments as they notice the way phonemes relate to their graphemic form. Moreover, they ask quite a few questions about target language phonology that teachers rarely hear after typical listening comprehension tasks. For instance, the other day an L1-Italian student asked: “Sir, why did he not pronounce the ‘h’ in ‘heir’? I thought you are supposed to pronounce ‘h’ in English!”. What is remarkable about this question is that I had made that point about the word ‘heir’ several times before in class; yet the boy only noticed it in the context of this task, due to the greater focus it places on phoneme-grapheme correspondence.

The following are three transcription tasks I use quite a lot. Teachers should note that for tasks 1 and 2 it is preferable not to use lengthy texts. Moreover, as is no doubt evident, teachers should use easy texts to start with and may want to carry out pre-transcription vocabulary-building tasks involving the language items found in the target text, especially the more linguistically challenging ones.

  1. Pure transcription of video or audio recording – students simply transcribe the passage they hear, writing down every word. This is more suitable for highly motivated and able groups.
  2. L1-scaffolded transcription – students are provided with the L1 translation of the text they are about to hear on the left-hand side of a sheet of paper and, whilst listening to it, they write out what they hear in the target language on the right-hand side. The rationale for providing the L1 translation is that it gives learners some much-needed support when they struggle with more challenging words.
  3. Partial transcription tasks – the students are provided with a gapped transcript of the recording. The gaps involve entire sentences. This type of transcription task is useful in that the sentence preceding each gap helps the students in the decoding of the missing sentence, thereby eliciting the application of inference strategies.

I have been using transcription tasks for a very long time with my more able groups, especially as homework. I use them at KS3, too, with very short texts or sets of 7-8 sentences on a given topic, and students find them extremely useful. Giving feedback on them is very simple: scan the transcript from the textbook, show it on the screen and ask the students to self- or peer-correct. The learning benefits cut across several dimensions of language acquisition and go well beyond micro-listening skill enhancement. They involve gains in vocabulary, spelling and even grammar and syntax, since feedback will inevitably deal with errors in word endings.

Combining transcription tasks with narrow listening activities (see my blog post on narrow listening) creates, in my experience, an amazingly powerful learning synergy which addresses a wide range of macro- and micro-listening skill components by recycling vocabulary and grammar structures at every possible level of target language processing.

Finally, let us not forget that, apart from its value as a teaching and learning strategy, transcription is a skill that has a number of applications in the academic and business world (journalism, social science research, interpreting, etc.).

The ‘student-K’ paradox – How ineffective classroom learning can enhance language proficiency and implications for MFL instruction


Recently, I have been undertaking lots of research into vocabulary acquisition in order to enhance the learning potential of my website (www.language-gym.com) at a time when it is undergoing massive restructuring and expansion. But, as always with ‘paper’-based research, I felt something important was missing: the students’ views on how learning occurs; not in its dry, ‘scientific’ representation by scholars and researchers, but as articulated by the learners themselves.

Hence, I turned to my students and, as I usually do every term, carried out a few semi-structured interviews with my most effective and least effective students. As always, the findings were fascinating. Today, one student in particular, ‘K’, yielded the most interesting data ever, as she referred to an apparently paradoxical phenomenon that I had never heard of or read about before; what I will refer to from now on, in my personal jargon, as the ‘student-K paradox’. But who is ‘Student K’? And what is this ‘student-K paradox’?

‘K’ is the dream language student every effective language teacher would love to teach and every less effective teacher would fear to have in their classes: the hyper-talented, driven, inquisitive and risk-taking student who, however meaty your lesson content is, however ambitious your learning intentions are, will always ask you for more – more words, more grammar, more resources, more challenge.

Despite never showing real interest in French or Spanish before the age of 11 – two years ago – and coasting through two years of French at primary school, K. has managed to attain, at 13, a level of proficiency in both languages that I had never witnessed before in any individual of her age. Her spoken and written output is not only rich, varied, complex and accurate but seems to be produced effortlessly. Whilst interacting with her in spontaneous speech, one never detects any anxiety. What is particularly striking about her is her vocabulary repertoire, which goes far beyond that of an excellent GCSE student – and she is only in Year 9!

But how did she get there? This is the most interesting bit; the answer is: because she felt that in Years 5 and 6 she was not learning much in class. She felt that her teacher’s approach just wasn’t working for her. The teacher gave too many worksheets and taught too much grammar, causing an information and cognitive overload, unsupported by effective recycling, that K just could not handle. In addition, she felt the approach was overly prescriptive and ‘narrow’ in terms of learning scope; she felt she was not progressing and grew increasingly worried about it.

So K decided to address the issue by taking on French grammar on her own, at home. She started googling the grammar points she felt her teacher had not taught her properly, studying independently, taking notes, doing online activities, reading and translating independently, etc., until she ‘nailed’ the verbs and tenses she had not managed to learn in class. In other words, the anxiety she felt in class at not being able to learn from her teacher’s input paradoxically enhanced her learning, as she self-initiated activities which widened her lexical repertoire and improved her knowledge of the target grammar. She would then try out the lexical and grammar items learnt through this process in class so as to obtain teacher feedback (vocabulary and grammar activation). It would be interesting to know whether K would have learnt more than she did autonomously had the teacher taught her more effectively. Probably not…

Another crucial acquisition factor she mentioned was the new teacher she had in Year 7 who, by her account, inspired her to go beyond the vocabulary set by the book and the schemes of work. She found the lessons with the new teacher more fun, although in the interview she could not articulate why. The teacher would encourage her to be creative and risk-taking with the language and gave her additional vocabulary lists, which she would eagerly learn independently at home. Studying these vocabulary lists – in the most traditional way ever, i.e. rote learning – became almost an obsession for her.

Another strategy she began using was to read the multilingual instructions that came with whatever product (e.g. electrical appliances, gadgets, computers) she or her parents bought, so as to learn new vocabulary. She also said that whenever she said or thought of a ‘cool’ phrase or sentence in English, she would try to translate it into French using dictionaries or by asking the teacher for help.

Student K’s case is impressive in many respects and, if it proved true of other talented linguists of similar caliber, it would have important implications for learning. First and foremost, K’s story clearly illustrates how powerful facilitative anxiety can be in enhancing learning; facilitative anxiety, as conceptualized by MacIntyre and Gardner (1989), refers to the state of arousal caused by mild levels of anxiety which may push a student to do better at something (e.g. ‘my current approach is not working; I need to do something about it!’). Obviously, this does not entail that teachers should deliberately be ineffective in class in order to foster effective autonomous learning. However, it does indicate that autonomous work by a student as young as K can yield amazing results; hence, teachers may have to find strategies to motivate students to learn vocabulary autonomously at home, which is easier these days thanks to the availability of mobile digital technology. Although equipping students with effective vocabulary learning strategies may be important, K’s case shows that inspiring them, and making them want to learn vocabulary independently, may be more crucial.

K’s learning behaviour also seems to confirm Gu and Johnson’s (1996) finding that self-initiation and activation can play a huge role in vocabulary acquisition. K requested extra vocabulary lists and worked on them alone at home (self-initiation); she would then deliberately use the new words learnt from those lists in class to try them out in context to see if the use she made of them was accurate (activation). The implication is that teaching should concern itself much more than it currently does with modelling these two independent learning behaviours. I can identify with this, especially in my learning of English, Spanish, French and German; less so, with my Swedish and Malay – is this why I speak them much less fluently?

K’s use of vocabulary lists to learn new lexis is also interesting, as it goes against what I have always told my trainees. K does not use fancy mnemonic devices or engaging online vocabulary-building games. Yet her vocabulary repertoire is vast and varied. This goes to show that motivation, and the focal awareness it places on the target linguistic items, can seriously impact vocabulary learning. Brushing aside the old-fashioned practice of learning word pairs on the grounds that it is boring or obsolete may be wrong, after all. And K’s preference for word lists is echoed by a number of recent studies showing the superiority of this approach to vocabulary learning over the keyword technique.

Another important implication relates to G&T (gifted and talented) provision in schools. Teachers are often advised to teach G&T students through higher-order thinking tasks. K’s case, however, defies this notion: learning L1/L2 word pairs from a vocabulary list hardly involves higher-order thinking skills. The implication is that, to best cater for our learners’ needs, we may need to ask them what their preferred learning tools and strategies are, rather than relying on our presumptions as to what best suits their higher cognitive and linguistic capabilities.

K’s strategy of using multilingual translations echoes the point I made in a previous post on how parallel texts can foster vocabulary acquisition. It is an easy way to notice and learn the differences between the L1 and the L2, as well as to learn new words effortlessly.

Last, but not least, there is the impact of the inspiring teacher on K’s learning. K did self-initiate autonomous learning in order to compensate for the lack of progress in lessons; however, it was the inspiring teacher who took her learning to another level by being less prescriptive than her predecessor and by letting K go beyond the boundaries of the learning intentions set for each lesson.

In conclusion, my interview with student K was a real ‘eye-opener’ as it defied many of my pre-conceptions about effective learning. Teachers ought to set aside some time every so often to interview students as a means to understand them better and sync their teaching to their needs – it can be one of the best professional development practices they may ever get. K’s account of how she learns vocabulary made me understand so much more about her and about students like her. These learners may not necessarily need to be involved in higher order thinking skills; they may simply need to be inspired and encouraged to learn autonomously in ways that best suit them.

Nine interesting foreign language research findings you may not know about


In this post I am going to share a very succinct summary of nine pieces of research I have recently come across, which I found interesting and which have impacted my classroom practice in one way or another. They are not presented in any particular order.

  1. Green and Hecht (1992) – Area: Explicit grammar instruction and teaching of aspect

Green and Hecht investigated 300 German learners of English. They asked them to correct 12 errors in context and to explain the rule behind each one. The most interesting finding was that the students could correct 78 % of the errors but could explain no more than 46 % of the grammar rules underlying them. The researchers identified a set of rules that were hard to learn (i.e. most students did not recall them) and a set of easy rules (which the vast majority of students recalled successfully). The implication they drew for teaching is that explicit grammar teaching may not work for all grammar items. Aspect (e.g. Imperfect vs Preterite in Spanish), for example, would according to them be taught more effectively through exposure to masses of comprehensible input (e.g. narrative texts) than through PPTs or diagrams on the classroom whiteboard/screen – in fact Blyth (1997) and Macaro (2002a) demonstrated the futility of drawing horizontal lines interrupted by vertical ones to indicate that the perfect tense ends the action.

My conclusions: I do not entirely agree with Blyth and Macaro that explicit explanation of grammar in the realm of aspect does not work, and I do like diagrams (although they do not work with all of one’s students). However, I agree with Green and Hecht (1992) that the best way to teach aspect is through exposure to masses of comprehensible input containing examples of aspect in context. The grammar explanation and production phase can be carried out at a later stage.

  2. Milton and Meara (1998) – Comparative study of vocabulary learning between German, English and Greek students aged 14-15 years.

197 students from the three countries studying similar syllabi for the same number of years were tested on their vocabulary. The findings were that:

1. The British students’ scores were the worst (averaging 60 %). According to the researchers, they showed a poor grasp of basic vocabulary;

2. They spent less time learning and were set lower goals than their German and Greek counterparts;

3. 25 % of the British students scored so low (after four years of MFL learning) that the researchers questioned whether they had learnt anything at all.

The authors of the study also found that British learners are not necessarily worse in terms of language aptitude; rather, they questioned the effectiveness of MFL teaching in the UK.

My conclusions: this study is quite old, and its sample may not be representative of the overall British student population. If it were representative of the general situation in Britain, though, teachers may have to – as I have advocated in several previous posts – consciously recycle words over and over again, not just within the same unit, but across units.

Moreover, a study of 850 EFL learners by Gu and Johnson (1996) may point to an important issue underlying our students’ poor vocabulary retention. They found that the students with the largest vocabularies were those who used three metacognitive strategies in addition to the cognitive strategies used by less effective vocabulary learners: selective attention to words (deciding to focus on certain words worth memorizing), self-initiation (making an effort to learn beyond the classroom and the exam system) and deliberate activation of newly-learnt words (trying to use a word independently in order to obtain positive or negative feedback on the correctness of its use). In other words, teaching should aim at developing the learner autonomy and motivation needed to apply all of these strategies independently outside the classroom.

  3. Knight (1994) – Using dictionaries whilst reading – effects on vocabulary learning

Knight gave her subjects a text to read on a computer. One group had access to electronic dictionaries whilst the other did not. She found that those who used the dictionary, rather than simply relying on guessing strategies, scored higher in a subsequent vocabulary test. This and other earlier (Luppescu and Day, 1993) and later studies (Laufer & Hadar, 1997; Laufer & Hill, 2000; Laufer & Kimmel, 1997) suggest that students should not be barred from using dictionaries in lessons. These findings are important for 1:1 (tablet or PC) school settings, given the availability of free online dictionaries (e.g. www.wordreference.com).

  4. Anderson and Jordan (1998) – Rate of forgetting

Anderson and Jordan set out to investigate how many words their informants could recall immediately after initial learning, and 1 week, 3 weeks and 8 weeks thereafter. They found recall rates of 66%, 48%, 39% and 37% respectively. The obvious implication is that, since even immediately after learning the subjects could recall only 66% of the target vocabulary, consolidation should start straight away and continue at spaced intervals (through recycling in lessons or as homework) for several weeks. At several points during the school year, I remind my students of Anderson and Jordan’s study and show them the following diagram. It usually strikes a chord with many of them:

[Figure: Ebbinghaus forgetting curve]
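To see why consolidation has to start straight away, the four recall figures above can be re-expressed as the share of the initially recalled words that has been lost by each checkpoint. The short Python sketch below does only that arithmetic; the percentages are the ones cited above, while the function name is my own.

```python
# Recall percentages reported by Anderson and Jordan (1998), as cited above.
RECALL = {0: 66, 1: 48, 3: 39, 8: 37}  # weeks after learning -> % of words recalled


def proportion_of_initial_recall_lost(recall):
    """For each delay, return the share of the words recalled immediately
    after learning that are no longer recalled at that delay."""
    initial = recall[0]
    return {week: round((initial - pct) / initial, 2) for week, pct in recall.items()}


if __name__ == "__main__":
    for week, lost in proportion_of_initial_recall_lost(RECALL).items():
        print(f"week {week}: {lost:.0%} of the initially recalled words lost")
```

Run as-is, it reports losses of 0%, 27%, 41% and 44% at 0, 1, 3 and 8 weeks: roughly two thirds of the total loss (18 of the 29 percentage points) happens in the first week, which supports starting recycling early.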

  5. Erler (2003) – Relationship between phonemic awareness and L2 reading proficiency

Erler set out to investigate the obstacles faced by learners of French as a foreign language in England. She studied 11-12-year-olds and found a strong correlation between low levels of phonemic awareness and weak reading skills (especially word-recognition skills). She concluded that explicit training and practice in the grapheme-phoneme system of French (i.e. how letters and combinations of letters are pronounced) would improve L1-English learners’ reading proficiency in that language. This finding corroborates those of Muter and Diethelm (2001) and Comeau et al. (1999). The implication is that micro-listening enhancers of the kind I discussed in a previous post (e.g. ‘Micro-listening skills tasks you may not do in your lessons’), or any other teaching of phonics, should be carried out in class much more often than is currently the case in many UK MFL classrooms.

Please note: teaching pronunciation and teaching decoding skills are not the same thing. Pronunciation is about understanding how sounds are produced by the articulators, whilst teaching decoding skills means instructing learners on how to convert letters and combinations of letters into sounds. Also, effective decoding-skill instruction occurs in communicative contexts (whether through receptive or productive processing), not simply through matching sounds with gestures and/or phonetic symbols.
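To make the notion of decoding concrete, it can be pictured as segmenting a written word into graphemes and mapping each one onto a sound. The Python sketch below does this by greedy longest match against a tiny, hypothetical grapheme table for French; it deliberately ignores context-dependent rules (silent final consonants, ‘c’ before ‘e’/‘i’, ‘s’ between vowels, nasal vowels) and is an illustration of the idea, not a model of how learners actually decode.

```python
# Hypothetical, deliberately tiny grapheme -> phoneme table for French (IPA).
# Real French decoding is context-dependent; this table ignores all of that.
GRAPHEME_TO_PHONEME = {
    "eau": "o",
    "au": "o",
    "ou": "u",
    "ch": "ʃ",
    "a": "a",
    "b": "b",
    "c": "k",
    "p": "p",
    "t": "t",
}
LONGEST_GRAPHEME = max(len(g) for g in GRAPHEME_TO_PHONEME)


def decode(word):
    """Convert a written word into a list of phonemes, always trying the
    longest known grapheme first (so 'eau' is read as one unit)."""
    phonemes = []
    i = 0
    while i < len(word):
        for size in range(min(LONGEST_GRAPHEME, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in GRAPHEME_TO_PHONEME:
                phonemes.append(GRAPHEME_TO_PHONEME[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes


# decode("chapeau") -> ['ʃ', 'a', 'p', 'o']
# decode("couteau") -> ['k', 'u', 't', 'o']
```

The longest-match design mirrors what decoding instruction targets: learners have to learn to treat multi-letter combinations such as ‘eau’ or ‘ch’ as single units rather than sounding out letter by letter.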

  6. Feyten (1991) – Listening ability as predictor of success

Feyten investigated the possibility that listening ability may be a predictor of success in foreign language learning. The researcher assessed the students at pre-test using a variety of tasks and measures of listening proficiency. After a ten-week course she tested them again (post-test) and found that there was a strong correlation between listening ability and overall foreign language acquisition, i.e.: the students who had scored high at pre-test did better at post-test not just in listening, but also in written grammar, reading and vocabulary assessment. Listening was a better predictor of foreign language proficiency than any other individual factor (e.g. gender, previous learning history, etc.).

My implications: we should take listening more seriously than we currently do. Increased exposure to listening input and more frequent teaching of listening strategies are paramount in the light of such evidence. Any effective baseline assessment at the outset of a course ought to include a strong listening comprehension component; the latter ought to include a specific decoding-skill assessment element.

  7. Graham (1997) – Identification of foreign language learners’ listening strategies

This study investigated the listening strategies of 17-year-old English learners of German and French. Amongst other things, Graham found the following issues undermining their listening comprehension. Firstly, they were slow in identifying key items in a text. Secondly, they often misheard words or syllables and transcribed what they believed they had heard, thereby getting distracted. Graham concluded that weaker students overcompensated for their lack of lexical knowledge by overusing top-down strategies (e.g. spotting key words as an aid to grasping meaning).

My implications are that Graham’s research evidence, which echoes findings from Mendelsohn (1998) and other studies, should make us wary of getting students to over-rely on guessing strategies based on key-word recognition. Teachers should focus on bottom-up processing skills much more than they currently do, e.g. by practising (a) micro-listening skills; (b) narrow listening or any other listening instruction methodology which emphasizes recycling of the same vocabulary through comprehensible input (N.B. not necessarily through videos or audio-tracks; it can be teacher-based, in the absence of other resources); (c) listening with transcripts – whole, gapped or manipulated in such a way as to focus learners on phoneme-grapheme correspondence.

  8. Polio et al. (1998) – Effectiveness of editing instruction

Polio et al. (1998) set out to investigate whether additional editing instruction – the innovative feature of the study – would enhance learners’ ability to reduce errors in revised essays. 65 learners on a university EAP course were randomly assigned to an experimental and a control group, both of which wrote four journal entries each week for seven weeks. Whereas the control group did not receive any feedback, the experimental group was involved in (1) grammar review and editing exercises and (2) revision of the journal entries, both of which were followed by teacher corrective feedback. At both pre-test and post-test, the learners wrote a 30-minute composition which they were asked to improve in 60 minutes two days later. Linguistic accuracy was calculated as the ratio of error-free T-units to the total number of T-units in the composition.
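The accuracy measure itself is simple arithmetic: error-free T-units divided by all T-units. A minimal Python sketch, assuming each T-unit has already been segmented and judged error-free or not by a rater (the hard part, which the code does not attempt); the function name and the example composition are hypothetical:

```python
def t_unit_accuracy(t_units_error_free):
    """Ratio of error-free T-units to all T-units in a composition.

    t_units_error_free: one boolean per T-unit, True if a rater judged
    that T-unit to contain no errors.
    """
    if not t_units_error_free:
        raise ValueError("a composition needs at least one T-unit")
    return sum(t_units_error_free) / len(t_units_error_free)


# Hypothetical composition: 8 T-units, 6 of them judged error-free.
example = [True, True, False, True, True, False, True, True]
print(t_unit_accuracy(example))  # 0.75
```

Note that a single error anywhere in a T-unit makes the whole unit count as inaccurate, so the measure cannot register partial improvement within a unit; that may be part of why the researchers questioned its sensitivity.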

The results suggested that the experimental group did not outperform the control group. The researchers conjectured that the validity of their results might have been undermined by the assessment measure used (T-units) and/or the relatively short duration of the treatment. They also hypothesised that the instruction the control group received might have been so effective that the additional practice for the experimental group did not make any difference.

The implications of this study are that editing instruction may take longer than seven weeks to be effective. Thus, the one-off editing sessions that many teachers run after finding common errors in their students’ essays, in order to address the grammar issues underlying them, are likely to be futile unless they are followed up by extensive and focused practice with lots of recycling.

  9. Elliott (1995) – Effect of explicit instruction on pronunciation

Elliott set out to investigate the effects of improving learner attitude toward pronunciation and of explicitly teaching pronunciation on his subjects (66 L1 students of Spanish). He compared the experimental group (which received 10-15 minutes of instruction per lesson over a semester) with a control group whose pronunciation was corrected only when it impeded understanding. The experimental group outperformed the control group, and the results were highly significant, both in terms of improved accent and of attitude (92 % of the informants being positive about the treatment).

Implications: this study, which is consistent with evidence from several others (e.g. Elliott, 1997; Zampini, 1994), confirms that explicit pronunciation instruction is more effective than implicit instruction, whereby L2 learners are expected to pick up pronunciation simply through exposure to comprehensible input. Arteaga’s (2000) review of US Spanish textbooks found that only 4 out of 10 included activities aimed at teaching pronunciation. I suspect that the figure may be even lower in the UK. In the light of Elliott’s findings, this is quite appalling, as mastery of phonology is a catalyst not only of reading ability but also of listening and speaking proficiency, and plays an enormous role in Working Memory’s processing efficiency in general (see my blog: ‘Eight important facts about Working Memory’).

How the brain acquires foreign language grammar – A Skill-theory perspective

Caveat: Being an adaptation of a section of a chapter in my Doctoral thesis, this is a fairly challenging article which may require solid grounding in Applied Linguistics and Cognitive Theories of Skill Acquisition.

1. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed as a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is considered to be governed by the same principles that regulate any other cognitive skill. Several scholars, such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996), have produced persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to the type of explicit MFL instruction implemented in many British schools: (1) the acquisition of grammatical rules in explicit L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.

Figure 1: The Anderson Model (adapted from Anderson, 1983)

The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory, Working Short-Term Memory (WSTM), Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features discussed in previous blogs (see ‘Eight important facts about Working Memory’) while Declarative and Production Memory may be seen as two subcomponents of Long-Term Memory (LTM). The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ and ’THEN’ conditions. These are activated every single time the brain is processing information; whenever a learner is confronted with a problem the brain searches for a Production that matches the data pattern associated with it. For example:


IF the goal is to form the present perfect of a verb and the person is 3rd singular /
THEN form the 3rd singular of ‘have’

IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /
THEN form the past participle of the verb
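For readers who find code easier to follow than prose, the IF/THEN idea can be sketched as a toy production system in Python. This is my own illustration, not Anderson’s ACT* implementation: working memory is a set of facts, each Production is a condition/action pair, and the search for a matching Production is reduced to scanning the list for the first one whose conditions are all present in working memory. The names `Production`, `run`, `form_has` and `form_participle` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Production:
    name: str
    conditions: frozenset           # IF side: facts that must all be in WM
    action: Callable[[dict], None]  # THEN side: what the Production does


def form_has(wm):
    wm["output"].append("has")
    wm["facts"].add("'have' formed")


def form_participle(wm):
    # Toy rule: regular verbs only, no spelling changes ('play' -> 'played').
    wm["output"].append(wm["verb"] + "ed")
    wm["facts"].add("participle formed")


PRODUCTIONS = [
    Production("form 3rd singular of 'have'",
               frozenset({"goal: present perfect", "person: 3rd singular"}),
               form_has),
    Production("form past participle",
               frozenset({"goal: present perfect", "'have' formed"}),
               form_participle),
]


def run(productions, wm):
    """Fire, one at a time, the first Production whose conditions are all
    present in WM; stop when nothing new matches. Each Production fires at
    most once here, a crude way of stopping it from looping forever."""
    fired = []
    while True:
        match = next((p for p in productions
                      if p.name not in fired and p.conditions <= wm["facts"]),
                     None)
        if match is None:
            return fired
        match.action(wm)
        fired.append(match.name)


wm = {"facts": {"goal: present perfect", "person: 3rd singular"},
      "verb": "play", "output": []}
run(PRODUCTIONS, wm)
print(" ".join(wm["output"]))  # has played
```

The second Production only becomes applicable after the first has placed ‘have’ formed in working memory, which is exactly the serial, step-by-step chaining described in the text.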


The creation of a Production is a long and careful process since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour, thus the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.


In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.


Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L1-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the Present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.


Producing language by these means is extremely inefficient. Thus, the brain tries to sort out the information into more efficient Productions. This is achieved by Compiling (‘running together’) the productions that have already been created so that larger groups of productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process lessens the number of steps referred to above and has the effect of speeding up the process. Thus, the Productions


P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of ‘have’

P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb

would be composed as follows:

P3 IF the goal is to form the present perfect of a verb / THEN form the simple present of ‘have’ and THEN the past participle of the verb
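Composition can be sketched in similar toy terms: two Productions that fire one after the other are collapsed into one whose IF side is the first Production’s conditions (plus any condition of the second that the first does not itself bring about) and whose THEN side performs both actions in order. This is a simplified illustration under my own assumptions (in particular the hypothetical `adds` field recording which facts a Production puts into working memory); ACT*’s actual composition mechanism is more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    name: str
    conditions: frozenset  # IF side: facts that must be in working memory
    actions: tuple         # THEN side: steps performed, in order
    adds: frozenset        # facts the actions place in working memory


def compose(first, second):
    """Collapse two Productions that fire in sequence into a single one.
    Conditions of `second` that `first` produces itself are dropped."""
    return Production(
        name=f"{first.name}+{second.name}",
        conditions=first.conditions | (second.conditions - first.adds),
        actions=first.actions + second.actions,
        adds=first.adds | second.adds,
    )


P1 = Production("P1", frozenset({"goal: present perfect"}),
                ("form the simple present of 'have'",),
                frozenset({"'have' formed"}))
P2 = Production("P2", frozenset({"goal: present perfect", "'have' formed"}),
                ("form the past participle",),
                frozenset({"participle formed"}))
P3 = compose(P1, P2)
# P3.conditions -> frozenset({'goal: present perfect'})
# P3.actions    -> ("form the simple present of 'have'", 'form the past participle')
```

Note that `compose` returns a new Production and leaves P1 and P2 untouched, which mirrors Anderson’s point that Composition supplements the Production set rather than replacing it.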


An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.
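The masking effect can be illustrated with a toy conflict-resolution rule: when several Productions match, the strongest one fires, and every fresh creation of the composite adds to its strength. The starting strengths, the increment of 1 per creation and the decision to hold the old route’s strength constant are arbitrary assumptions chosen only to make the effect visible; they are not ACT*’s strength equations.

```python
def strongest_match(strengths):
    """Return the matching Production with the highest strength.
    max() keeps the first of equally strong candidates, so ties go to
    the Production listed first (here, the older route)."""
    return max(strengths, key=strengths.get)


# Arbitrary starting strengths: the old two-step route is well practised,
# the newly composed P3 is weak. For simplicity the old route's strength
# is held constant.
strengths = {"P1 then P2": 5.0, "P3 (composed)": 1.0}

winners = []
for _ in range(7):
    winners.append(strongest_match(strengths))
    strengths["P3 (composed)"] += 1.0  # P3 is created again and gains strength

print(winners)
```

The printed list shows the older route winning the first five opportunities, even though P3 does the same job in one step, and P3 taking over only once its strength exceeds that of the route it was composed from.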


The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P2 above would become

IF the goal is to form the present perfect of a verb
THEN form ‘have’ and then form the past participle of the verb


After repeated performance, the processes of Composition and Proceduralisation will eventually produce:

IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’
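The difference between the two routes can be sketched as follows, using the ‘has gone’ example discussed earlier: the interpretive route retrieves each declarative fact from LTM (each retrieval is counted), while the proceduralised route has the result built into the Production itself and retrieves nothing. The stored facts, the names and the naive ‘-ed’ rule are illustrative assumptions, not part of Anderson’s model.

```python
from collections import Counter

# Hypothetical declarative facts stored in Long-Term Memory.
LTM = {
    "present perfect rule": "have + past participle",
    "3rd singular of have": "has",
    "irregular participles": {"go": "gone", "be": "been", "do": "done"},
}
retrievals = Counter()  # how often each fact is fetched into WM


def retrieve(fact):
    retrievals[fact] += 1
    return LTM[fact]


def interpretive_present_perfect(verb):
    """Cognitive-stage route for the 3rd person singular: each step
    retrieves declarative knowledge from LTM."""
    retrieve("present perfect rule")               # (1) recall the general rule
    auxiliary = retrieve("3rd singular of have")   # (2) 'has'
    irregular = retrieve("irregular participles")  # (3) is the verb irregular?
    participle = irregular.get(verb, verb + "ed")  # (4) naive '-ed' otherwise
    return f"{auxiliary} {participle}"


# After Proceduralisation: the result is built into the Production itself.
PROCEDURALISED = {"play": "has played", "go": "has gone"}


def proceduralised_present_perfect(verb):
    return PROCEDURALISED[verb]  # no LTM retrieval at all


print(interpretive_present_perfect("go"), sum(retrievals.values()))    # has gone 3
print(proceduralised_present_perfect("go"), sum(retrievals.values()))  # has gone 3
```

Both routes produce the same form; what changes is that the proceduralised one leaves the retrieval count, and hence the load on WSTM, untouched.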


For Anderson, it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. The mechanism through which the brain decides which Productions should be applied in a given context is what Anderson calls Matching. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution, i.e. a Production that matches the pattern of information in WSTM. If such a match is found, a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘IF’) of a proceduralised Production, the matching will be quicker, and the ‘action side’ (the ‘THEN’) of the Production will be deposited in WSTM, making it immediately available for performance (execution). It is at this intermediate stage of development that most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.

The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) provides the following example of generalisation from language acquisition, in which P6 and P7 become P8:

P6 IF the goal is to indicate that a coat belongs to me / THEN say ‘My coat’

P7 IF the goal is to indicate that a ball belongs to me / THEN say ‘My ball’

P8 IF the goal is to indicate that object X belongs to me / THEN say ‘My X’
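The P6/P7 to P8 step can be sketched as follows: when two Productions have the same shape and differ only in one element, that element is deleted from the condition and replaced by a variable. The representation (a template plus a slot value) is a simplifying assumption of mine, and `Production` and `generalise` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Production:
    goal_template: str    # IF side, with {X} marking the variable slot
    action_template: str  # THEN side, with {X} marking the variable slot
    slot: Optional[str]   # a specific object, or None once generalised

    def apply(self, obj):
        """Return the action for obj, or None if the condition does not match."""
        if self.slot is not None and self.slot != obj:
            return None
        return self.action_template.format(X=obj)


def generalise(p, q):
    """If p and q differ only in their slot, return a Production in which that
    condition has been deleted (slot=None); otherwise return None."""
    same_shape = (p.goal_template == q.goal_template
                  and p.action_template == q.action_template)
    if same_shape and p.slot != q.slot:
        return Production(p.goal_template, p.action_template, slot=None)
    return None


TEMPLATE = ("indicate that {X} belongs to me", "My {X}")
P6 = Production(*TEMPLATE, slot="coat")
P7 = Production(*TEMPLATE, slot="ball")
P8 = generalise(P6, P7)

print(P6.apply("ball"))  # None: P6 only covers the coat
print(P8.apply("pen"))   # My pen
```

P8 now covers objects that never appeared in P6 or P7, which is the sense in which Generalisation lets the speaker produce utterances never before encountered.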


Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process would require that we have examples of both correct and incorrect applications of the Production in our LTM.
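Discrimination can be illustrated with the regular/irregular verb case: an over-general Production (‘add -ed’) fires for every verb until negative feedback on a failed application leads to a narrower Production covering just that verb, restricting the general rule’s range. The feedback mechanism, the `Learner` class and the verbs are simplified assumptions for illustration only; in particular, this sketch stores only the incorrect applications, not the mix of correct and incorrect ones the text mentions.

```python
def past_general(verb):
    """Over-general Production: IF past tense THEN add '-ed' (to any verb)."""
    return verb + "ed"


class Learner:
    def __init__(self):
        # Narrower Productions created by Discrimination: verb -> past form.
        self.restricted = {}

    def past(self, verb):
        # A more specific Production, when one exists, wins over the general one.
        if verb in self.restricted:
            return self.restricted[verb]
        return past_general(verb)

    def feedback(self, verb, correct_form):
        """Negative feedback on a failed application restricts the general
        rule: this verb now gets its own Production."""
        if self.past(verb) != correct_form:
            self.restricted[verb] = correct_form


learner = Learner()
print(learner.past("go"))    # goed (overgeneralisation)
learner.feedback("go", "went")
print(learner.past("go"))    # went
print(learner.past("walk"))  # walked (general rule still applies)
```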


Both processes are inductive in that they try to identify from examples of success and failure the features that characterize when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.

Since they are inductive processes, Generalization and Discrimination will sometimes err and produce incorrect Productions. As I shall discuss later in this chapter, there are possibilities for Overgeneralization and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.
 
ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from procedural memory and causes an action to be deployed and there is no negative feedback, the Production will become more robust. Because it is more robust it will be able to resist occasional negative feedback and also it will be more strongly activated when it is called upon:
 
The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production (Anderson, 1983: 251)
 
Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, there is a power-law of decay in strength with disuse.
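A toy version of Strengthening and decay: each firing without negative feedback adds strength, negative feedback removes some, and during disuse strength falls off as a power function of the time elapsed. All the numbers (gain, penalty, decay exponent, time units) are arbitrary assumptions for illustration, not Anderson’s parameter values, and the function names are hypothetical.

```python
def strengthen(strength, negative_feedback, gain=1.0, penalty=0.5):
    """Update a Production's strength after it has fired."""
    if negative_feedback:
        return max(0.0, strength - penalty)
    return strength + gain


def decayed(strength, time_unused, d=0.5):
    """Power-law decay with disuse: strength * (1 + t) ** -d
    (t in arbitrary time units)."""
    return strength * (1 + time_unused) ** -d


# A wrong Interlanguage form used six times without correction,
# and the correct L2 form used only twice.
wrong = right = 0.0
for _ in range(6):
    wrong = strengthen(wrong, negative_feedback=False)
for _ in range(2):
    right = strengthen(right, negative_feedback=False)

print(wrong, right)                               # 6.0 2.0
print(strengthen(wrong, negative_feedback=True))  # 5.5: still the stronger form
print(decayed(wrong, 3), decayed(wrong, 15))
```

The wrong form survives an occasional correction because it is so much stronger, which is the error pattern described above; only prolonged disuse (or much more practice of the correct form) reverses the balance.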
 
2. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition
 
One limitation of the model is that it does not account for the fact that unanalysed L2-chunks are sometimes acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue, Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition, whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson.
 
This means that teaching memorised unanalysed chunks can work in synergy with explicit language teaching, as happens in my approach. See my blog post on how I teach lexicogrammar.

Eight important facts about Working Memory and their implications for foreign language teaching and learning


  1. Introduction

There is no blogpost of mine which does not mention Working Memory (WM) at some point. Why? Because effective language processing and learning largely depend on how well Working Memory performs. In fact, apart from automatic processes – which bypass WM’s attentional control – all conscious processing of information (visual, auditory, etc.) occurring in the human brain is performed by WM. Whether our students are reading or listening to target language input, translating a passage into French, planning an essay or performing an oral task, it will be WM that does most or all of the work.

Let us consider reading a target language text. It is WM that matches any lexis in the text with its meaning (by retrieving it from Long Term Memory). And what if we struggle with that text? Every single operation the brain performs in an attempt to decode will take place in WM, too. In the case of vocabulary learning, any rehearsal we perform in an attempt to commit the words/phrases we are trying to learn to Long-term Memory (e.g. repeating aloud) will be performed in WM, which will temporarily hold that information for as long as we repeat it. In speaking and writing, all the operations involved in ‘translating’ ideas (or ‘propositions’ as psychologists call them) into words and evaluating their accuracy will occur in WM, too.

These are but a few examples of how cognition occurs in WM. With the above in mind, it goes without saying that knowing how WM works can help foreign language instructors devise strategies to teach more effectively. The following are eight important facts about WM and their implications for L2 learning that all foreign language teachers should bear in mind when planning and delivering the curriculum, assessing and providing feedback on learner performance.

  2. The structure of WM

As the picture below shows, WM, which is located in the prefrontal cortex of the brain, is made up of three main components:

  • A visuospatial (i.e. Graphic/Visual) sketchpad which activates areas near the visual cortex of the brain and allows us to hold images, including the graphic images of words, ‘alive’ in WM so that they are available for processing;
  • A phonological loop which uses Broca’s area as a kind of ‘inner voice’ that repeats word sounds to hold them in WM;
  • A central executive which regulates the flow of information in and out of the phonological loop and the visuospatial sketchpad, both as coming from the perceptual organs and from Long-Term Memory. The central executive is basically in charge of orchestrating all the processes occurring in WM.

[Figure: the components of Working Memory]


So, for example, when we read a target language word or phrase, the visuospatial sketchpad will hold its graphic image, the phonological loop its sound (if we are pronouncing it) and the central executive will match it to any existing information in Long-term Memory in an attempt to make sense of it. If a match is found, the process will stop there; otherwise, if the word/phrase is new, the central executive will call upon a range of interpretive processes as well as resources from Long-Term Memory in order to attempt to decode it.

2.1. There are two distinct memory systems in the human brain

WM is one of the two systems that make up memory. The other is Long-Term Memory, the ‘place’ along the brain’s neural networks where memories are stored permanently and cannot be deleted except through disease, physical damage or intervention affecting the prefrontal cortex. It is after rehearsal in WM that information passes into Long-Term Memory.

2.2. WM is a temporary storage ‘facility’

Whether it is processing input from the outside world or retrieving material from Long-Term Memory, WM will hold any information for only a few seconds. After that, spontaneous decay will set in, unless one makes a conscious effort to keep it there by focusing a considerable amount of one’s attentional resources on it through what we call ‘rehearsal’ (shallow or deep). The distinctiveness (how much it stands out) and high relevance (how much it matters to us) of the input can also make a stimulus stay in WM longer. This has enormous implications for foreign language instruction and learning across all macro-skills, and for teaching in general.

Take, for example, oral recasts: the teacher responds to an erroneous utterance by a student by interrupting his/her conversational flow and recasting (i.e. reformulating) the utterance correctly. At that point, the student has only a few seconds to process the teacher’s correction (as the correction will very soon decay from Working Memory), notice it and make sense of it, whilst s/he is supposed to resume the conversation or to attend to another student’s input. Research shows that this is unlikely to result in learning unless the student has a much bigger and more efficient WM than average. Should teachers stop recasting, then? Maybe so, and reserve any feedback on, or treatment of, the errors noticed in learner output for later on in the lesson.

Another implication refers to listening. MFL students often sit through listening tasks which require them to identify details in a text spoken at native-speaker speed. With the above in mind, it is clear that this task can be a very tall order for novice-to-intermediate learners, as they have to hold on to the information they hear by actively rehearsing it (through the phonological loop) to prevent decay whilst the listening track is still playing. Since the input is auditory, the learner’s WM will rehearse it through the phonological loop; thus, if the learner’s pronunciation is poor, s/he will find it very hard to rehearse the information s/he hears, which slows down the whole process. Hence the need for teachers to implement approaches to listening instruction which lessen the cognitive load on learners (e.g. narrow listening) and include a focus on micro-listening skills (see my article on micro-listening enhancers).

There are obviously many more implications for teachers as far as the temporariness of WM storage is concerned, too many to deal with in this article. The most important relates to the distinctiveness of teacher input: the more distinctive (e.g. engaging, outstanding, impressive, particularly funny) teacher input is, the more likely it is to linger for longer than the 1-2 seconds it would normally stay in WM and to pass into Long-Term Memory. That is also why engaging students in the semantic analysis of a target word/phrase (what psychologists call ‘elaboration’) is more likely to result in learning: such analysis, by involving deeper processing, requires the learner to hold the word in WM for longer than 1-2 seconds whilst engaging the brain in higher-order thinking (which strengthens retention).

2.3 WM has limited channel capacity

WM has a very limited capacity or memory span. According to Miller (1956), it cannot contain more than 7 +/- 2 items at the same time (i.e. between 5 and 9). More recent estimates suggest that Miller’s number may be true of university students but not of the average person, and put WM’s capacity at 4 to 5 items at the same time. WM’s channel capacity is affected by genetic factors (some individuals’ WM is bigger than others’) and by motivation.

The number of words WM can hold at any given time is phonologically determined (for instance, Chinese speakers can hold more words in WM than English speakers because many Mandarin words consist of a single syllable). This means that a novice foreign language learner will be able to hold fewer words in WM than s/he can in his/her mother tongue, as s/he will pronounce them more slowly. The more rapidly a foreign language speaker can utter a word or phrase, the less space it will take up in their working memory.

The phonology-dependent nature of vocabulary learning and the limitations of the phonological loop also mean that words that are long and contain complex target language sounds cannot be processed efficiently and are therefore less likely to be learnt ‘properly’. Hence, work on phonics from the very early days of instruction is paramount.

One implication of this issue for MFL teaching and learning is that in order to increase MFL learners’ WM processing efficiency in a foreign language, they must receive extensive speaking practice. Such practice will also impact their listening skills in that, as already explained above, whilst listening the learner needs to hold in his/her phonological loop fairly big chunks of target language in order to comprehend the text.

Another implication relates to writing and speaking. Novice English-speaking learners will find it hard to produce longer or complex sentences accurately in languages like French, Italian, German or Spanish, as most or all of their WM’s channel capacity will be taken up by the retrieval of the L2 lexis required to form those sentences, leaving little space to focus on less salient grammar features such as adjectival and verb endings, function words and word order.

Finally, to enhance learner memory span, teachers may want to train students with poorer WM in the use of mnemonics such as the Keyword technique or other associative memory techniques. Research shows that, through the effective use of mnemonic strategies, WM’s digit span can even be increased tenfold.

Another strategy to increase WM’s capacity is chunking the target information. This consists in organizing a number of items which would normally be too many for WM to hold into manageable units. An example of this is the way we memorize a phone number: by memorizing 0176324167 as 017 632 4167 we reduce 10 units to 3, thereby greatly reducing the cognitive load. Imagine learning the phrase ‘appareils électroménagers’, which is almost impossible for a novice’s phonological loop to cope with. By chunking it into ‘appa / reils / électro / ménagers’, even a novice can cope with pronouncing and memorizing it.
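The arithmetic of the phone-number example can be sketched in a few lines of Python. This is only an illustration of how regrouping reduces the number of units, not a model of memory. The function names `chunk` and `fits_in_wm`, the chunk sizes and the span value `WM_SPAN = 7` (taken from Miller’s 7 +/- 2) are all assumptions chosen for the example.

```python
# Illustrative sketch only: chunking as regrouping items into fewer units.
WM_SPAN = 7  # assumed span, based on Miller's "7 +/- 2"; more recent estimates are lower


def chunk(items: str, sizes: list[int]) -> list[str]:
    """Split a string into consecutive chunks of the given sizes."""
    if sum(sizes) != len(items):
        raise ValueError("chunk sizes must add up to the length of the string")
    chunks = []
    start = 0
    for size in sizes:
        chunks.append(items[start:start + size])
        start += size
    return chunks


def fits_in_wm(units: int, span: int = WM_SPAN) -> bool:
    """True if the number of units does not exceed the assumed span."""
    return units <= span


phone = "0176324167"
grouped = chunk(phone, [3, 3, 4])
print(len(phone), fits_in_wm(len(phone)))      # 10 separate digits
print(len(grouped), fits_in_wm(len(grouped)))  # 3 chunks
print(" ".join(grouped))
```

Running it prints `10 False`, `3 True` and `017 632 4167`. The ten digits exceed the assumed span, while the three chunks fit comfortably within it.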

2.4 Storage in WM is ‘fragile’

When items are stored in WM they can easily be lost due to interference from competing items (divided attention) or from environmental factors (e.g. noise). Anxiety, worry and self-concern during performance can also cause divided attention and the loss of information from WM.

The obvious implication is that our teaching should bring about as much arousal in our students as possible so as to keep the target language input in their focal awareness at all times.

Another implication is that, apart from the obvious sources of distraction such as student misbehaviour or environmental factors, teachers must try to minimize any other source of distraction. In this day and age, a frequent source of distraction is learning languages through the digital medium or producing a digital artefact as part of projects in the target language.

 

2.5 Error is often caused by WM processing inefficiency

When we are carrying out complex tasks, WM may have to juggle several operations at the same time. Based on points 2.3 and 2.4 above, the ‘multi-tasking’ that WM has to do can cause information processing or retrieval to slow down and/or result in performance errors. Anxiety can have a detrimental effect in this regard, too.

The application of declarative knowledge (i.e. intellectual knowledge of L2 grammar) in speaking and listening performance is likely to cause processing inefficiency, as WM needs to apply every rule consciously. Imagine, when talking about what you did yesterday in French, having to go through every step involved in forming the Perfect Tense of ‘aller’ one by one, as compared to simply saying ‘je suis allé’. Hence the very long pauses and hesitations when a novice-to-intermediate speaker has solid declarative knowledge of the language but, due to lack of practice, little control over the speaking medium.

The implications for teaching are obvious and relate to issues I have dealt with extensively in previous blogs. On the one hand, teachers must focus their efforts on developing students’ cognitive control over the target language; on the other, they need to lessen the cognitive load on students’ WM as much as possible by (a) pitching tasks at the right level of cognitive/linguistic challenge; (b) prepping the students before each target language task through activities which recycle the language items they will need to carry out that task; (c) keeping anxiety out of the classroom as much as possible.

Also, in order to facilitate WM processing efficiency, students may have to be taught strategies that can compensate for lack of procedural competence. For instance, teachers may raise learners’ awareness of how their WM’s processing inefficiency can cause them to make specific mistakes (e.g. agreement mistakes in writing) and model editing strategies to identify and/or prevent such mistakes (e.g. through mnemonics).

2.6 Forgetting is caused by WM failure to access the required information (cue-dependent forgetting)

Memory is context-dependent; in other words, the context in which one learns a given language item will enhance the chances of recalling that item later on. Hence, when we do not remember something, it is not because that information is no longer stored in Long-Term Memory, but rather because we are not using the right cue to retrieve it. So, for instance, if my teacher used a picture of Arnold Schwarzenegger to teach the Spanish word ‘musculoso’, that picture will facilitate my recall of that word.

Here, too, training students in the use of memory strategies to prevent cue-dependent forgetting can be extremely helpful.

2.7 There may be a link between poor WM and depression

Recent research has found a link between poor WM and depression: people with a highly efficient WM tend to have a more positive outlook on life and to be more self-confident, whereas individuals with poor WM tend to be more prone to anxiety and to brood and sulk more over things.

The implications for teachers are very obvious: minimize the potential sources of anxiety for students who fall into this category. Don’t presume that this issue affects only children with special educational needs. Research clearly shows that depression amongst adolescents has risen substantially in the last decade or so. Hence, one has to be very mindful of this issue and handle it with much emotional and cognitive empathy.

2.8 An efficient WM is a good predictor of academic success including MFL learning

Alloway and Alloway (2009) actually found that WM is a better predictor of future academic success than IQ. They concluded that “working memory is not a proxy for IQ but rather represents a dissociable cognitive skill with unique links to academic attainment”. Students with poor working memory do badly across all or most subjects, including foreign languages. In fact, more recent theories of language aptitude include WM as an important factor affecting success in foreign language learning.

3. Conclusion

In conclusion, MFL teaching should concern itself from the very early stages of instruction with the development of processing efficiency. A big and efficient WM allows for faster recall and processing, for more accurate performance and more ‘noticing’. This is a very important issue if one considers that WM is first and foremost the gateway to Long-Term Memory – where all the knowledge we have about a language and the world is permanently stored.

‘Noticing’ new key target language features, as Schmidt (1990) posits, propels our students’ learning forward, but only if they make the connection between what they notice and the system they have been building in their Long-Term Memory (their Interlanguage). Often this connection must be made under Real Operating Conditions (ROC), as they interact orally with an expert speaker, watch a video or listen. For this to happen in these contexts, when they operate under considerable communicative pressure, their WM must be highly efficient.

Teachers should heed the above recommendations in their daily practice and ensure that lessons are as much about developing students’ WM processing efficiency (cognitive control), what I call ‘horizontal progression’, as they are about vertical progression, i.e. ‘jumping’ from one level of linguistic challenge to a higher one for the sake of being able to say “we have covered three tenses” or “we have created complex sentences”. Vertical progression without horizontal progression creates a very unstable system, like a tall building without strong foundations.

Finally, raising students’ awareness of how WM works can be very useful in enhancing their learning and their metacognition. I hold several short sessions with my KS3 classes in which I summarize the key features of memory and how WM works. The teacher must create the right context for these sessions and make them as simple, visual and engaging as possible. I was so proud when, last week, a Year 8 girl said to a classmate who was finding a word difficult to pronounce: “You have to chunk it”, and actually modelled the chunking for her. Ultimately, the more students know about how their mind works, the more they will feel in control of their learning.

Things learners do not enjoy about their foreign language lessons


In the last two years or so, I have carried out short interviews with around 150 Year 8 and 9 MFL students of varying ability, whom I ranked as very able (38), able (72) and less able (40) based on their MidYIS test results. I asked them: “Which 3 things about your language lessons do you neither enjoy nor find conducive to learning? Why?” Although there was a high degree of idiosyncrasy and variation across the informants’ answers in terms of individual preferences, seven ‘things’ stood out as being disliked and found ‘not very productive’ by at least 40 % of the students. Here they are:

  1. Tasks they do not feel prepared for – 52 % of the students found it demotivating and not very productive to be required to carry out tasks they felt were beyond their level of linguistic competence. Unsurprisingly, most of the students in this category were less able ones. However, some of the more able students complained about this issue, too, possibly because, being perfectionists, they hated not being able to be 100 % accurate.
  2. Long sessions of writing in lessons – 65 % of the students stated they disliked or even ‘hated’ doing writing in class for 15-20 minutes or more, most of them feeling that it was boring and should be done at home when one would have more time and focus to learn from it. Not surprisingly, here, too, the less able students were those with the strongest negative feelings about long writing sessions.
  3. Listening comprehension tasks from course-books – 55 % of the students mentioned this as a motivation inhibitor and as an activity they did not feel they learnt much from, because they felt that the actors on the course-book audio-tracks spoke either too slowly (in the lower-level activities) or too fast (in the higher-level ones). They enjoyed listening to the teacher speaking to them in the target language, as they felt it was a more natural context for practising listening.
  4. Target setting / Being given targets – 45 % of the students felt they did not learn much from this process because after the target-setting session they would not look at the targets often enough to do something about them. They found the process tedious and unproductive.
  5. Very long sessions on the iPad – 60 % of my informants reported generally enjoying using the iPad in class, but not for the whole lesson. They felt that they needed a mix of activities. 40% mentioned tiredness and/or boredom as the reasons why using it for the whole lesson resulted in less focus and interest. The more able students seemed to be the ones who objected most strongly to the overuse of the iPad. The students suggested that each session should last 15-20 minutes maximum.
  6. Lack of group-work – 45 % of the students reported disliking lessons where they had to work alone from beginning to end. These students (mainly belonging to the able and less able groups) said that there had to be some form of group work in every single lesson. 25 % of these students mentioned movement around the classroom as a desirable feature of such group-work activities.
  7. Learning verb conjugations – 40 % of the students reported disliking learning verb conjugations through drills, gap-fills and even online conjugators (this was painful considering that I created one at www.language-gym.com). Most of the students who mentioned this belonged to the less able group and a minority to the more able groups. These learners found learning conjugations challenging and said they learnt them more effectively at home as they felt less under pressure and had more time to focus on them.

The above findings are hardly generalizable, as they are situated in a very specific educational and cultural setting and the sampling was not randomized. Hence, it would be interesting to see if colleagues working in different contexts would obtain similar or divergent findings.

Of course, the fact that students dislike something does not entail that we should not do it, if we believe it is actually highly conducive to learning. For example, although it is true that learners do not particularly enjoy verb conjugation drills, I have noticed remarkable improvements in terms of verb/tense awareness (not necessarily ‘acquisition’) since I started to regularly use online verb conjugator trainers (5-10 minutes at a time); hence, I will keep using them as short homework and/or warm-up/follow-up activities in the context of grammar learning sessions.

Ultimately, however, I do believe that we should try to heed our students’ preferences as much as possible and be ‘brave’ enough to ask them whether they truly enjoyed and learnt from specific activities we stage in class. You will be very surprised at how mistaken some of our assumptions often are.

The least talked-about yet most important attribute of an effective MFL teacher


MFL teachers are typically involved in CPD events which deal with L2 teaching methodology and techniques, the use of technology in the classroom, motivational theory and practice, learning management and differentiation, AFL, lifelong-learning skills and team building. However, such events rarely focus explicitly on enhancing the teacher attribute that is crucial to the success of all of that: Cognitive Empathy.

Cognitive Empathy (henceforth CE) refers to the teacher’s ability to sync every level of their teaching (e.g. planning lessons, classroom delivery, feedback provision, target-setting, setting out-of-the-classroom consolidation work) with their students’ cognition. It is a construct distinct from Emotional Empathy (another crucial attribute of an effective teacher), in that it is not concerned with socio-affective empathising (reading our students’ emotional states) but rather with understanding what goes on in the MFL learner’s mind. CE and EE (Emotional Empathy) do overlap in some areas, but for CPD purposes they are best kept separate, whilst hammering home to teachers the importance of their mutual synergy: for either of them to effectively impact teaching and learning, it needs to be supported by the other.

Someone may object that, since a lot of effort is placed on differentiation in CPD, CE is not as ‘neglected’ as I claim. However, differentiation usually concerns itself with the implementation of techniques to tackle identified issues in our students’ cognition, not with the identification of the root causes of those problems.

And what about AFL strategies? Does formative assessment of the kind envisaged by Dylan Wiliam not address this issue? Yes and no. As I will discuss below, it does so only partially and through means which do not delve deep enough into our students’ cognition. As for Learning Styles and Multiple Intelligences research, they are out of the equation, as they are invalid constructs based on phony research.

Data obtained through baseline testing (e.g. MidYIS, Yellis) with allegedly high predictive power are indeed useful. However, they provide but a snapshot of our students’ cognition at a specific moment in time. Also, teachers are rarely, if ever, trained in interpreting what the categories and scores made available to them actually mean and how they relate to our students’ learning.

CE, as I envisage it, requires 5 macro-competences:

(1) An awareness of the cognitive challenges posed by foreign language learning in general and by the specific language items one is teaching, especially in the planning of a lesson;

(2) An understanding of how the target learners respond to such challenges. This also involves an awareness of how cognition in an MFL learning context is affected by individual variables (e.g. age group, gender, personality type, culture, etc.);

(3) Metacognition – Obviously, effective teachers constantly keep in their focal awareness the importance of syncing their teaching with the cognitive needs of their learners, but they must also:

  • constantly reflect on their practice as cognitively empathetic teachers before, during and after their lessons;
  • use their own past experiences as language learners to enhance their levels of cognitive empathy;
  • start and maintain an ongoing metacognitive dialogue with their learners (e.g. through feedback or learner reflective journals);
  • actively seek ways to further their understanding of learner cognitive needs (which relates to the next point).

(4) Research methodology knowledge and skills – Teachers need to be able to discern between valid and ‘phony’ theory and research. This is important when one thinks of how some less than reliable research (e.g. that on Multiple Intelligences and Learning Styles) has affected educational policies and prescribed pedagogical approaches over the decades, with hardly any positive result. The ability to understand how reliable a piece of research is can prevent teachers from adopting teaching approaches or techniques uncritically. Moreover, teachers need a degree of expertise in how to obtain and analyze useful data that may inform their curricular and methodological choices. For example, in my own practice, my mastery of think-aloud protocols, interviews and retrospective verbal reports (acquired during my PhD and MA) has helped me a great deal in developing my understanding of students’ problems. On the other hand, lack of expertise in this domain often leads to an overreliance on questionnaires (not valid research tools) which are not always well crafted (e.g. they rarely include measures to reinforce their internal validity), or to other less than valid research practices. This overreliance on questionnaires can be highly detrimental to teaching and learning, especially when the quantitative data obtained through such procedures are used as determinants of educational policies.

(5) Metadigital awareness – this is of crucial importance at this time of revolutionary technological advances, as digitally assisted learning is playing an important role in many MFL classrooms around the world. Teachers need to become increasingly aware of the impact that the specific digital medium (or media) they use in the classroom, and internet-based resources, have on learner cognition (see my article on the ‘Five central psychological challenges of mobile learning’). Knowing how a given technological device (and related apps) works and knowing how to use it effectively to enhance learning are two very different things. That is why ICT integration coaches should not only be teachers with high levels of digital knowledge but also, and most importantly, highly metacognizant, outstanding practitioners deeply aware of how the internet and digital media affect students’ thinking and learning processes.

What are the implications for teacher professional development?

Firstly, CPD should focus much more than it currently does on making MFL teachers highly conversant with current theory and research on language acquisition and on how these can inform effective classroom practice. The current practice of providing teachers with behavioural templates (e.g. tasks or sequences of tasks to use in class) disjointed from a solid reference framework is insufficient to generate high levels of teaching competence.

Secondly, professional development should aim at expanding teacher understanding of how individual variables affect learning. For instance, psychological research has generated lots of taxonomies for the classification of personality types which can be very useful for teachers in understanding learner attitudes and behaviour. Such taxonomies (e.g. Myers-Briggs, see: http://www.personalitypage.com/high-level.html) are used on a daily basis in the corporate world to assess and cater for staff, but are rarely imparted to teachers. Moreover, there is a fairly solid body of research on how individual factors (e.g. aptitude, gender and age) interact with L2 acquisition, from which teachers may benefit greatly.

Thirdly, in the realm of metacognitive and teacher-skills enhancement, CPD should focus teachers on the importance of CE and scaffold self-reflection in this area whilst equipping them with the tools which can enhance it (e.g. the ones discussed in the previous two paragraphs). Teachers should also be made conversant with effective approaches that can foster an ongoing metacognitive dialogue with students about their learning needs. Some approaches and techniques useful in starting such a dialogue, drawn from AFL practice (e.g. questionnaires/student voice and reflective journals), are quite common in many MFL classrooms; others, like think-aloud, concurrent introspection and retrospective verbal reports (see my article on ‘Think-aloud’), are less frequently used but are more valuable in getting into our students’ thinking processes and identifying their learning problems.

Fourthly, for teachers to be able to effectively understand their learners’ thinking processes, they must be conversant with some of the fundamentals of research methodology. CPD should focus on this important set of skills more than it currently does and foster an environment conducive to classroom-based research (N.B. I am not envisaging PhD-level research here, but a much smaller-scale and less formal kind). Schools often base their educational policies on data obtained from studies carried out in geographically and culturally distant contexts; classroom-based research carried out within their walls, on the other hand, may be more relevant and therefore impact learning more effectively. Ultimately, an effective MFL classroom-based teacher-researcher will be a more cognitively empathetic instructor.

The fifth component of Cognitive Empathy, Metadigital awareness, is the most difficult to address in CPD, in view of the lack of a solid body of research which can inform teacher training in this area. This is the domain in which, in my opinion, being a self-reflective practitioner and an effective classroom-based researcher can be extremely useful. Hence, at this moment in time at least, CPD that attempts to address this metacomponent of Cognitive Empathy should focus less on making teachers conversant with relevant research and more on enhancing their reflective and research skills.

In conclusion, this article advocates the need for CPD in MFL teaching to focus much more than it currently does on the development of Cognitive Empathy. I have argued that simply training teachers in the deployment of AFL and differentiation strategies may not be sufficient. Moreover, baseline testing may lead to skewed and not very useful assumptions about students when teachers do not possess the specialised knowledge necessary to effectively interpret psychological test scores. The approach I advocate in my model of CE is laborious and relatively expensive in terms of training, but since CE is central to effective teaching and learning, it may be worth the effort and the cost.

In a secondary school and university teaching career spanning over 25 years, and an even longer time as a foreign language learner, the best teachers and educational managers I have come across were those who exhibited high levels of emotional and cognitive empathy. An old friend of mine once said that the worst line manager an MFL head of department can have is one who has never learnt a foreign language; what he was actually referring to, indirectly, was someone with low levels of cognitive empathy (how can you truly understand a foreign language student or teacher when you have never been through the process of learning a foreign language?).

One of the most important reasons why high levels of teacher cognitive empathy correlate with effective learning relates to student motivation. There is often a mismatch between how teachers expect students’ cognition to work and the way it actually does. In a survey I carried out with 150 of my students (the results of which I will publish in a future post), the activities that most negatively impacted their motivation were those they did not feel ‘linguistically’ ready for. In second place they put listening tasks where speakers ‘go’ too fast, and in third place long writing tasks in lessons. Another of their pet hates was corrections they did not understand. All of these motivation inhibitors point to low levels of cognitive empathy.

Another example of teacher-student cognitive mismatch relates to a widely used app: Padlet. Padlet is often praised by ‘ed tech’ MFL educators because it allows students to see what their classmates write on the wall, thereby promoting learning from that input. This presumes that all or most learners actively process their fellow students’ output and internalize it, which, even as a highly motivated and gifted language learner, I am not sure I would have done in my teenage years. So I put this assumption to the test. Immediately after creating a Padlet wall to which all of them had contributed, I asked students from four of my classes to note down in their books or iPads three language items they had found in their classmates’ writing which they considered interesting or useful. Two weeks later, I asked them to recall the items they had noted down. Guess what? Only two of the 68 students involved in this experiment actually remembered something (one item each).

Finally, CE can be enhanced by attempting to learn a new language every so often. A great CPD activity for enhancing cognitive empathy, which I was involved in as a PGCE trainee, was attending three Swahili classes. It was a much-needed reminder of how hard it is to learn a language. To this day, whenever I teach a less able group, I cast my mind back to those three sessions, and this helps me approach that group with more humility, patience and understanding.

The causes of learner errors in L2 writing – an attempt to integrate Skill-theory and mainstream accounts of Second Language Acquisition

A cognitive account of errors in L2-writing rooted in skill acquisition and production theory

1. Introduction

The purpose of this paper is to shed light on the cognitive sources of errors. An understanding of the psycholinguistic mechanisms that cause our students to err is fundamental if we aim to significantly enhance the (surface-level) accuracy of their written output. In what follows, I intend to take the reader through the cognitive processes underlying second language writing, mapping out in detail the stages and contexts in which mistakes are usually made. In order for the reader to fully comprehend the ensuing discussion, I will begin by outlining four key concepts in Cognitive psychology which are essential to an understanding of any skill-acquisition theory of language development and production. I will then proceed to concisely discuss the way humans acquire languages according to one of the most widely accepted models of second language acquisition (Anderson, 2000). Next, I will provide an exhaustive account of the way we process writing, rooted in Cognitive theory and resulting from an integration of a number of models of monolingual and bilingual production. Finally, I shall draw my conclusions as to the implications of the reviewed theories and research for an approach to error correction.

2. Key concepts in Cognitive psychology

Before engaging in my discussion of L2-acquisition and L2-writing, I shall introduce the reader to the following concepts, central to any Cognitive theory of human learning and information processing:

1. Short-Term and Long-Term Memory

2. Metalinguistic Knowledge and Executive Control

3. The representation of knowledge in memory

4. Proceduralisation or Automatisation

2.1 Short-Term Memory and Long-Term Memory

In Information Processing Theory, memory is conceived as a large and permanent collection of nodes, which become complexly and increasingly inter-associated through learning (Shiffrin and Schneider, 1977). Most models of memory identify a transient memory called ‘Short-Term Memory’ which can temporarily encode information and a permanent memory or Long-Term Memory (LTM). As Baddeley (1993) suggested, it is useful to think of Short-Term Memory as a Working Short-Term Memory (WSTM) consisting of the set of nodes which are activated in memory as we are processing information. In most Cognitive frameworks, WSTM is conceived as the provision of a work space for decision making, thinking and control processes and learning is but the transfer of patterns of activation from WSTM to LTM in such a way that new associations are formed between information structures or nodes not previously associated. WSTM has two key features:

(1) fragility of storage (the slightest distraction can cause the brain to lose the data being processed);

(2) limited channel capacity (it can only process a very limited amount of information for a very limited amount of time).

LTM, on the other hand, has unlimited capacity and can hold information over long periods of time. Information in LTM is normally in an inactive state. However, when we retrieve data from LTM the information associated with such data becomes activated and can be regarded as part of WSTM.

In the retrieval process, activation spreads through LTM from active nodes of the network to other parts of memory through an associative chain: when one concept is activated, other related concepts become active. Thus, the amount of active information resulting can be much greater than that currently held in WSTM. Since source nodes have only a fixed capacity for emitting activation (Anderson, 1980), and this capacity is divided amongst all the paths emanating from a given node, the more paths that exist, the less activation will be transmitted to any one path and the slower will be the rate of activation (fan effect). Thus, additional information about a concept interferes with memory for a particular piece of information, thereby slowing the speed with which that fact can be retrieved. In the extreme case in which the to-be-retrieved information is too weak to be activated (owing, for instance, to minimal exposure to that information) in the presence of interference from other associations, the result will be failure to recall (Anderson, 2000).
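The fan effect can be made concrete with a toy calculation. The sketch below is a simplification and not Anderson's own equation: it assumes that a node's fixed activation is split evenly across its associative paths, that retrieval time is inversely proportional to the activation a path receives, and that recall fails below an arbitrary threshold (`RECALL_THRESHOLD`, a hypothetical parameter).

```python
RECALL_THRESHOLD = 0.2  # hypothetical minimum activation needed for recall (assumption)


def activation_per_path(source_activation: float, fan: int) -> float:
    """Split a node's fixed activation evenly across its `fan` associative paths."""
    if fan < 1:
        raise ValueError("a node needs at least one associative path")
    return source_activation / fan


def retrieval_time(activation: float) -> float:
    """Toy assumption: retrieval time is inversely proportional to activation."""
    return 1.0 / activation


def is_recalled(activation: float) -> bool:
    """Below the threshold the item is too weak to be retrieved at all."""
    return activation >= RECALL_THRESHOLD


for fan in (1, 2, 4, 8):
    a = activation_per_path(1.0, fan)
    print(f"fan={fan}: activation={a:.3f}, time={retrieval_time(a):.1f}, recalled={is_recalled(a)}")
```

Under these assumptions, doubling the number of associations halves the activation reaching each path and doubles retrieval time; with a fan of 8 the item drops below the threshold, mirroring the 'failure to recall' case described above.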

2.2 Metalinguistic knowledge and executive control (processing efficiency)

This distinction originated from Bialystok (1982) and its validity has been supported by a number of studies (e.g. Hulstijn and Hulstijn, 1984). Knowledge is the way the language system is represented in LTM; Control refers to the regulation of the processing of that knowledge in WSTM during performance. The following is an example of how this distinction applies to the context of my study: many of my intermediate students usually know the rules governing the use of the Subjunctive Mood in Italian; however, they often fail to apply them correctly in Real Operating Conditions, that is, when they are required to process language in real time under communicative pressure (e.g. writing an essay under severe time constraints, giving a class presentation, etc.). The reason for this phenomenon may be that, since WSTM’s attentional capacity is limited, its executive-control systems may not cope efficiently with the attentional demands of a task when we are performing in operating conditions where worry, self-concern and task-irrelevant cognitive activities use up some of that limited capacity (Eysenck and Keane, 1995). These factors may cause retrieval problems, reducing the speed or the accuracy of recall and recognition. Thus, as Bialystok (1982) and Johnson (1996) assert, L2-proficiency involves a degree of control as well as a degree of knowledge.

2.3 The representation of knowledge in memory

Declarative Knowledge is knowledge about facts and things, while Procedural Knowledge is knowledge about how to perform different cognitive activities. This dichotomy implies that there are two ‘paths’ for the production of behaviour: a procedural and a declarative one. Following the latter, knowledge is represented in memory as a database of rules stored in the form of a semantic network. In the procedural path, on the other hand, knowledge is embedded in procedures for action, readily at hand whenever they are required, and it is consequently easier to access.

Anderson (1983) provides the example of an EFL-learner following the declarative path of forming the present perfect in English. S/he would have to apply the rule: use the verb ‘have’ followed by the past participle, which is formed by adding ‘-ed’ to the infinitive of a verb. S/he would have to hold all the knowledge about the rule formation in WSTM and would apply it each time s/he is required to form the tense. This implies that declarative processing is heavy on channel capacity, that is, it occupies the vast majority of WSTM attentional capacity. On the other hand, the learner who followed the procedural path would have a ‘program’, stored in LTM with the following information: the present perfect of ‘play’ is ‘I have played’. Deploying that program, s/he would retrieve the required form without consciously applying any explicit rule. Thus, procedural processing is lighter on WSTM channel capacity than declarative processing.

2.4 Proceduralisation or Automatization

Proceduralisation or Automatization is the process of making a skill automatic. When a skill becomes proceduralised it can be performed without any cost in terms of channel capacity (i.e. “memory space”): skill performance requires very little conscious attention, thereby freeing up ‘space’ in WSTM for other tasks.

3. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed as a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is considered as governed by the same principles that regulate any other cognitive skill. A number of scholars, such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996), have produced persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to this study: (1) the acquisition of grammatical rules in explicit adult L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.

Figure 1: The Anderson Model (adapted from Anderson, 1983)

                 

The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory: Working Memory, Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features previously discussed in describing WSTM, while Declarative and Production Memory may be seen as two subcomponents of LTM. The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ and ‘THEN’ conditions. These are activated every single time the brain is processing information: whenever a learner is confronted with a problem, the brain searches for a Production that matches the data pattern associated with it. For example:

IF the goal is to form the present perfect of a verb and the person is 3rd singular/

THEN form the 3rd singular of ‘have’

IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /

THEN form the past participle of the verb
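The two Productions above can be sketched as a tiny production system in which each Production pairs an IF test on working memory with a THEN action. This is an illustrative toy, not Anderson's ACT* implementation: the slot names (`goal`, `person`, `verb`), the `run` loop that fires the first matching Production, and the one-item irregular list are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable

WorkingMemory = dict  # slot name (e.g. "goal", "verb") -> value

# Hypothetical exception list: the irregular participles this learner has stored.
IRREGULAR_PARTICIPLES = {"go": "gone"}


@dataclass
class Production:
    name: str
    condition: Callable[[WorkingMemory], bool]  # the IF side
    action: Callable[[WorkingMemory], None]     # the THEN side


def past_participle(verb: str) -> str:
    return IRREGULAR_PARTICIPLES.get(verb, verb + "ed")


# IF the goal is the present perfect and the person is 3rd singular THEN form 'has'.
form_have = Production(
    "form-have",
    condition=lambda wm: wm["goal"] == "present perfect"
    and wm["person"] == "3sg"
    and "auxiliary" not in wm,  # stops the Production firing twice
    action=lambda wm: wm.update(auxiliary="has"),
)

# IF the goal is the present perfect and 'have' has just been formed THEN form the participle.
form_participle = Production(
    "form-participle",
    condition=lambda wm: wm["goal"] == "present perfect"
    and "auxiliary" in wm
    and "participle" not in wm,
    action=lambda wm: wm.update(participle=past_participle(wm["verb"])),
)


def run(productions: list, wm: WorkingMemory, max_cycles: int = 10) -> list:
    """Repeatedly fire the first Production whose IF side matches working memory."""
    trace = []
    for _ in range(max_cycles):
        matched = next((p for p in productions if p.condition(wm)), None)
        if matched is None:
            break  # nothing matches: the goal is complete (or the learner is stuck)
        matched.action(wm)
        trace.append(matched.name)
    return trace


wm = {"goal": "present perfect", "person": "3sg", "verb": "play"}
print(run([form_have, form_participle], wm), wm["auxiliary"], wm["participle"])
# ['form-have', 'form-participle'] has played
```

The second Production can only fire once the first has deposited its result in working memory, which is how the sequencing of the two IF/THEN steps emerges without any explicit ordering.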

The creation of a Production is a long and careful process, since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour; thus, the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.

In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.

Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L1-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the Present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.

Producing language by these means is extremely inefficient. Thus, the brain tries to sort out the information into more efficient Productions. This is achieved by Compiling (‘running together’) the productions that have already been created so that larger groups of productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process lessens the number of steps referred to above and has the effect of speeding up the process. Thus, the Productions

P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of have

P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb

would be composed as follows:

P3 IF the goal is to form the present perfect of a verb / THEN form the present simple of have and THEN the past participle of the verb

An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.
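To illustrate Composition and the 'masking' of a newly composed Production, the sketch below collapses P1 and P2 into P3 and lets P1 and P3 compete on each opportunity. It is a toy model under stated assumptions, not Anderson's implementation: strengths are arbitrary integers, condition matching is left out (both P1 and P3 are assumed to match the goal of forming the present perfect), the strongest candidate always wins with ties going to the older Production, P1's strength is held constant, and each re-creation of P3 adds one unit of strength.

```python
from dataclasses import dataclass


@dataclass
class Production:
    name: str
    actions: tuple  # actions executed in order when the Production fires
    strength: int   # illustrative integer strength units (assumption)


def compose(first: Production, second: Production, name: str) -> Production:
    """Collapse two Productions that fire in sequence into a single one.
    The composed Production starts weak (strength 1, an arbitrary assumption)."""
    return Production(name, first.actions + second.actions, strength=1)


def select(matching: list) -> Production:
    """Conflict resolution: the strongest matching Production is retrieved.
    On a tie, max() keeps the first candidate, i.e. the older Production."""
    return max(matching, key=lambda p: p.strength)


def simulate(opportunities: int) -> list:
    p1 = Production("P1", ("form present simple of 'have'",), strength=5)
    p2 = Production("P2", ("form past participle",), strength=5)
    p3 = compose(p1, p2, "P3")
    winners = []
    for _ in range(opportunities):
        winners.append(select([p1, p3]).name)
        p3.strength += 1  # assumption: each re-creation strengthens P3 by one unit
    return winners


print(simulate(7))  # ['P1', 'P1', 'P1', 'P1', 'P1', 'P3', 'P3']
```

Although P3 does the work of two Productions in one step, it keeps losing to the stronger P1 until it has built up enough strength, which is the masking effect Anderson describes.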

The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P3 above would become

IF the goal is to form the present perfect of a verb

THEN form ‘has’ and then form the past participle of the verb

The process of Composition and Proceduralisation will eventually produce after repeated performance:

IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’
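The difference between interpreting declarative knowledge and running a proceduralised Production can be sketched by counting retrievals from LTM. In this hypothetical illustration, `CountingLTM` is a dictionary that stands in for declarative memory and counts every look-up, and `proceduralise` builds a specialised Production with the retrieved facts baked into its action; neither name comes from Anderson's work.

```python
class CountingLTM(dict):
    """A dict standing in for declarative LTM; it counts every look-up (retrieval)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.retrievals = 0

    def __getitem__(self, key):
        self.retrievals += 1
        return super().__getitem__(key)


def interpretive_present_perfect(verb: str, person: str, ltm: CountingLTM) -> str:
    """Declarative route: each step retrieves a fact from LTM and holds it in WSTM."""
    auxiliary = ltm[("have", person)]
    participle = ltm[(verb, "past participle")]
    return f"{auxiliary} {participle}"


def proceduralise(verb: str, person: str, ltm: CountingLTM):
    """Build a specialised Production with the retrieved facts built into its action,
    e.g. IF the goal is the present perfect of 'play' (3rd singular) THEN form 'has played'."""
    form = interpretive_present_perfect(verb, person, ltm)  # retrieved once, at build time

    def production() -> str:
        # No LTM look-ups at run time: the answer is part of the Production itself.
        return form

    return production


# Hypothetical declarative facts held by a learner.
ltm = CountingLTM({("have", "3sg"): "has", ("play", "past participle"): "played"})

print(interpretive_present_perfect("play", "3sg", ltm), ltm.retrievals)  # has played 2
has_played = proceduralise("play", "3sg", ltm)
before = ltm.retrievals
print(has_played(), ltm.retrievals - before)  # has played 0
```

Both routes produce the same form, but only the interpretive one consults LTM each time it is used, which is the extra load on WSTM that Proceduralisation removes.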

For Anderson, it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. The mechanism through which the brain decides which Productions should be applied in a given context is what Anderson calls Matching. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution, i.e. a Production that matches the pattern of information in WSTM. If such matching is possible, then a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘if’) of a proceduralised Production, the matching will be quicker, with the ‘action side’ (the ‘then’) of the Production being deposited in WSTM and made immediately available for performance (execution). It is at this intermediate stage of development that most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.

The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) provides the following example of generalization from language acquisition, in which P6 and P7 become P8:

P6 IF the goal is to indicate that a coat belongs to me THEN say ‘My coat’

P7 IF the goal is to indicate that a ball belongs to me THEN say ‘My ball’

P8 IF the goal is to indicate that object X belongs to me THEN say ‘My X’
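Generalisation by deleting the condition on which two Productions differ can be sketched as follows. The representation is an assumption made for illustration: conditions are slot/value pairs, a deleted slot behaves as a variable, and the action is written as the template 'My {object}' so that the action side becomes variable along with the condition side.

```python
from dataclasses import dataclass


@dataclass
class Production:
    conditions: dict  # slot -> required value; a slot that is absent acts as a variable
    action: str       # template filled in from the current situation

    def fire(self, situation: dict):
        """Return the utterance if every condition holds in the situation, else None."""
        if all(situation.get(slot) == value for slot, value in self.conditions.items()):
            return self.action.format(**situation)
        return None


def generalise(a: Production, b: Production) -> Production:
    """Combine two overlapping Productions by deleting the conditions on which they differ."""
    if a.action != b.action:
        raise ValueError("only Productions with the same action template can be combined")
    shared = {slot: value for slot, value in a.conditions.items()
              if b.conditions.get(slot) == value}
    return Production(shared, a.action)


p6 = Production({"goal": "indicate possession", "owner": "me", "object": "coat"}, "My {object}")
p7 = Production({"goal": "indicate possession", "owner": "me", "object": "ball"}, "My {object}")
p8 = generalise(p6, p7)

bike = {"goal": "indicate possession", "owner": "me", "object": "bike"}
print(p6.fire(bike))  # None: P6 only covers coats
print(p8.fire(bike))  # My bike: P8 covers an object never encountered before
```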

Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process would require that we have examples of both correct and incorrect applications of the Production in our LTM.
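Discrimination can be sketched as narrowing an over-general rule after negative evidence. This is a deliberately simplified, hypothetical model: the over-general '-ed' rule gains an exception each time it is shown to misfire, and excepted verbs fall through to a stored irregular form, whereas Anderson's mechanism compares successful and unsuccessful applications in LTM rather than reacting to a single counterexample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegularParticipleRule:
    """IF the goal is to form a past participle and the verb is not a known exception
    THEN add '-ed' to the verb."""
    exceptions: frozenset = frozenset()

    def matches(self, verb: str) -> bool:
        return verb not in self.exceptions

    def fire(self, verb: str) -> str:
        return verb + "ed"


def discriminate(rule: RegularParticipleRule, failed_verb: str) -> RegularParticipleRule:
    """Narrow the rule's range of application after evidence that it misfired on `failed_verb`."""
    return RegularParticipleRule(rule.exceptions | {failed_verb})


# Hypothetical store of irregular participles the learner has encountered.
IRREGULAR_FORMS = {"go": "gone", "eat": "eaten"}


def past_participle(verb: str, rule: RegularParticipleRule) -> str:
    if rule.matches(verb):
        return rule.fire(verb)
    return IRREGULAR_FORMS[verb]


rule = RegularParticipleRule()
print(past_participle("go", rule))  # goed (over-generalisation)
rule = discriminate(rule, "go")     # negative feedback on 'goed'
print(past_participle("go", rule), past_participle("play", rule))  # gone played
```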

Both processes are inductive in that they try to identify from examples of success and failure the features that characterize when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.

Since they are inductive processes, Generalization and Discrimination will sometimes err and produce incorrect Productions. As I shall discuss later in this chapter, there are possibilities for Overgeneralization and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.
ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from procedural memory and causes an action to be deployed and there is no negative feedback, the Production will become more robust. Because it is more robust it will be able to resist occasional negative feedback and also it will be more strongly activated when it is called upon:
The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production (Anderson, 1983: 251)
Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, it decays with disuse according to a power law.
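Strength-based competition and power-law decay can be illustrated numerically. Note that this sketch swaps in the base-level learning equation of ACT-R, the later successor of ACT*, rather than the ACT* formulation discussed in the text: each past use leaves a trace that decays as a power of its age, strength is the logarithm of the summed traces, and the decay rate of 0.5 is simply a commonly used value. The learner history is hypothetical, and negative feedback is not modelled.

```python
import math


def base_level(use_times: list, now: float, decay: float = 0.5) -> float:
    """ACT-R-style strength: ln of the summed traces of past uses, each decaying as age ** -decay."""
    ages = [now - t for t in use_times]
    if not ages or min(ages) <= 0:
        raise ValueError("need at least one use strictly before `now`")
    return math.log(sum(age ** -decay for age in ages))


def retrieve(candidates: dict, now: float) -> str:
    """The form whose Production is strongest is matched (and so produced) first."""
    return max(candidates, key=lambda form: base_level(candidates[form], now))


# Hypothetical history: the Interlanguage form '*ha andato' was used, without
# negative feedback, in six early sessions; the correct 'è andato' only once.
history = {"*ha andato": [1, 2, 3, 4, 5, 6], "è andato": [9]}
print(retrieve(history, now=10))  # *ha andato

# Ten sessions of practice with the correct form while the wrong one falls into disuse:
history["è andato"] += list(range(10, 20))
print(retrieve(history, now=40))  # è andato
```

The design choice here is that strength is never reset: the erroneous form is overtaken only because its traces age while the correct form accumulates fresh ones.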
4. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition
One limitation of the model is that it does not account for the fact that unanalysed L2-chunks of language are sometimes acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue, Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition, whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson. In classroom settings where instruction is grammar-based, however, only a minority of L2-items will be acquired this way.

5. Bridging the ‘gap’ between the Anderson Model and ‘mainstream’ second language acquisition (SLA) research

As already pointed out above, a number of theorists believe that Anderson provides a viable conceptualisation of the processes central to L2-acquisition. However, ACT* was intended as a model of the acquisition of cognitive skills in general and not specifically of L2-acquisition. Thus, the model rarely concerns itself explicitly with the following phenomena documented by SLA researchers: Language Transfer, Communicative Strategies, Variability and Fossilization. These phenomena are relevant to secondary school settings for the following reasons. Firstly, Language Transfer and Communicative Strategies constitute common sources of error in the written output of intermediate L2-learners. Variability, on the other hand, refers to the phenomenon, particularly evident in the writing of beginner to intermediate learners, whereby learners produce a given structure correctly in certain contexts and incorrectly in others. Finally, Fossilization is often put forward as a possible explanation for the recurrence of erroneous Interlanguage forms in learner production. Although these phenomena can be accounted for within Anderson’s framework, I believe that a discussion of mainstream SLA theories and research will enhance the reader’s understanding of their nature and implications for L2 teaching. It should be noted that, for reasons of relevance and space, my discussion will be concise and focus only on the aspects which are most relevant to the present study.

5.1 Language Transfer

This phenomenon refers to the way prior linguistic knowledge influences L2-learner development and performance (Ellis, 1994). The occurrence of Language Transfer can be accounted for by applying the ACT* framework since, as Anderson asserts, existing Declarative Knowledge is the starting point for acquiring new knowledge and skills. In a language-learning situation this means drawing on knowledge about previously learnt languages both in order to understand the mechanisms of the target language and to solve a communicative problem. In this section, I shall draw on the SLA literature in order to explain how, when and why Language Transfer occurs and with what effects on learner written output.

As Odlin (1989) points out, Language Transfer can be positive, facilitating L2-performance. This is often the case with students of mine who studied French or Spanish and are able to transfer their knowledge of these languages advantageously to Italian because Romance languages share a large number of cognates and grammatical rules. However, Language Transfer can also be negative, resulting in erroneous L2-output. For instance, over-confidence in the fact that Italian and French/Spanish are similar may prompt a learner with L3-French to apply the rules of the French Subjunctive in the deployment of the Italian Subjunctive. This strategy will be effective in some contexts but unsuccessful in others.
Transfer can also result in the avoidance or the over-production of L2-structures. For example, several intermediate Japanese learners of Italian I taught in the past avoided using relative clauses because relative clauses introduced by a relative pronoun, as in Italian, have no counterpart in their L1. On the other hand, they over-used the definite article: since their L1 has no definite article and they had noticed that Italians use it frequently, they thought that they were less likely to err if they used it all the time.
Transfer can occur as a deliberate Compensatory Strategy: a learner’s conscious attempt to fill a gap in his/her L2-knowledge (Faerch and Kasper, 1983). This phenomenon is particularly recurrent when the learner’s L1/L3 and the target language are perceived as close (e.g. Spanish and Italian). Transfer can also occur subconsciously (Poulisse, 1990). When used as a Compensatory Strategy, Transfer can give rise to ‘Foreignization’ and ‘Code-switching’ errors. The former refers to the conscious alteration of L1- or L3-words to make them ‘sound’ target-language-like. For instance, not knowing the Italian for ‘rice’ (= riso), a French learner may add an ‘o’ to the French word ‘riz’ in the hope that the resulting ‘rizo’ will be correct. Code-switching, instead, consists of the conscious or subconscious use of unaltered L1-/L3-words or phrases when an L2-word is required. Both types of error are more likely to happen in spoken language, especially when a learner is under communicative pressure or does not have access to dictionaries or other sources of L2-knowledge. However, I have also observed this phenomenon in the writing of many L2-student writers, especially at the level of connectives (e.g. the French conjunction ‘et’ instead of the Italian ‘e’).
Transfer may affect any level of L2-learner output. As far as the areas of language use more relevant to the present study are concerned (syntax, morphology and lexis), Ringbom (1987) reports evidence from Ringbom (1978) and other studies (e.g. Sjoholm, 1982) that L1-Transfer affects lexical usage more than it does syntax or morphology. Of these two, it appears that morphology is the less affected area. The following factors appear to determine the extent to which Language Transfer occurs:
(1) Perceived language distance: the closer two languages are perceived to be, the more likely Transfer is to occur (see Sjoholm, 1982);
(2) Learning environment: it appears that Transfer is more likely to occur in settings where the naturalistic input is lower (Odlin, 1989);
(3) Levels of monitoring: Gass and Selinker (1983) observe that careful, monitored learner output usually contains fewer instances of Transfer errors;
(4) Learner-type: learners who take more risks and are more meaning-oriented tend to transfer less than form-focused ones (Odlin, 1989);
(5) Task: some tasks appear to elicit greater use of Transfer (Odlin, 1989). This appears to be the case for L1-into-L2 translation, including the approach, typical of many beginner L2-learners, whereby an L2-essay is produced first in the L1 and then translated word by word;
(6) Proficiency: as the Anderson Model and many other Cognitive models (e.g. de Bot, 1992) posit, the starting point of acquisition is the L1, which is gradually replaced by the target language as more and more L2-items are acquired. Thus, Transfer is more likely to occur at the early stages of development than at the advanced ones. This is borne out by a number of studies (e.g. Taylor, 1975; Liceras, 1985; Major, 1987). Kellerman (1978), however, found that a number of Transfer errors occur only at advanced stages.
5.2 Communication Strategies
Due to space constraints, my discussion of Communication Strategies (CSs) will be limited to the basic issues and levels of language (i.e. grammar, lexis and orthography) relevant to this study. Corder (1978) defined a CS as follows:

a systematic technique employed by a speaker to express his meaning when faced with some difficulty. Difficulty in this definition is taken to refer uniquely to the speaker’s inadequate command of the language in the interaction (Corder, 1978: 8)

A number of taxonomies of CSs have been suggested. Most frameworks (e.g. Faerch and Kasper, 1983) identify two types of approaches to solving problems in communication: (1) avoidance behaviour (avoiding the problem altogether); (2) achievement behaviour (attempting to solve the problem through an alternative plan). In Faerch and Kasper’s (1983) framework, the two different approaches result respectively in the deployment of (a) reduction strategies, governed by avoidance behaviour, and (b) achievement strategies, governed by achievement behaviour.

Reduction strategies can affect any level of writing, from content (Topic avoidance) to orthography (Graphological avoidance). Most CS studies, however, have focused on lexical items. Achievement strategies (Faerch and Kasper, 1983) correspond to Tarone’s (1981) concept of Production Strategies and to Corder’s (1978, 1983) Resource expansion strategies. By using an achievement strategy, the learner attempts to solve problems in communication by expanding his communicative resources (Corder, 1978) rather than by reducing his communicative goal (functional reduction). Faerch and Kasper (1983) identify two broad categories of achievement strategies: Compensatory and Non-linguistic. The Compensatory strategies relevant to the present study are:
(1) Code switching (see 5.1 above)
(2) Interlingual transfer (see 5.1 above)

(3) Inter-/intralingual transfer, i.e. a generalization of an IL rule is made but the generalization is influenced by the properties of the corresponding L1-structures (Jordens, 1977)

(4) IL-based strategies. These include:

(i) Generalization: the extension of an item to an inappropriate context in order to fill the ‘gaps’ in the learner’s plans. One type of generalization relevant to the present study is Approximation, that is, the use of a lexical item to express only an approximation of the intended meaning.

(ii) Word coinage: this kind of strategy involves the learner in the creative construction of a new IL word.

5.3 Variability: the occurrence of unsystematic errors
Variability in learner language refers to the phenomenon whereby a given structure is produced correctly in certain contexts and incorrectly in others. As Ellis (1994) observed, this phenomenon is very common in the early stages of acquisition and may rapidly disappear. The Anderson model can be used to account for Variability as follows: firstly, as Anderson posits, two or more Productions which refer to different hypotheses about the use of a structure can co-exist in a learner’s LTM before the onset of the Discrimination process. These Productions compete for retrieval and, if they have more or less equal strength, may be used alternately at a given stage of development as the learner is testing their effectiveness through the trial-and-error process which characterizes the early stages of learning.
Secondly, if, amongst the Productions relating to a given structure, Production ‘X’, based on the correct rule, is much weaker than Production ‘Y’, based on an incorrect rule, Production ‘Y’ is likely to be retrieved first when a learner is not devoting sufficient conscious attention to the structure and his/her brain ‘runs on automatic’. The lack of attention is usually determined by processing inefficiency, that is, the incapacity of WSTM to cope with the demands that the task places on its attentional system (Bygate, 1988). Processing inefficiency issues in writing are more likely to arise in unplanned and/or unmonitored production (Krashen, 1977, 1981), especially when the L2-learner is under severe time constraints or communicative pressure (Polio, Fleck and Leder, 1998).
A third cause of Variability refers to what I called above the ‘Procedural-to-Procedural route’ to acquisition: aspects of the usage of a structure may have been acquired by a learner through the rote learning of, or exposure to, set L2-phrases (e.g. classroom phrases). Thus, in cases where that structure is well beyond that learner’s stage of development and s/he does not have any declarative knowledge of it, s/he will deploy that structure correctly within the context of those set phrases while being likely to make mistakes with it in other contexts.
5.4 Fossilization

In the SLA literature, Fossilization (or Routinization) refers to the phenomenon whereby some IL forms keep reappearing in a learner’s Interlanguage ‘in spite of the learner’s ability, opportunity and motivation to learn the target language…’ (Selinker and Lamendella, 1979: 374). An error can become fossilised even if L2-learners possess correct declarative knowledge about that form and have received intensive instruction on it (Mukattash, 1986).

Applying the Anderson Model, Fossilization can be explained as the Proceduralisation of an erroneous form through frequent and successful use. As already discussed, Productions that have been proceduralised are very difficult to alter, which would explain why some theorists believe that Fossilisation is a permanent state (Lamendella, 1977; Mukattash, 1986). For applied linguists working in the Skill-theory paradigm, errors can be de-fossilised, but only after a lengthy and painstaking process of re-learning of the correct form through targeted monitoring and practice in real operating conditions (Johnson, 1996).
Several models (biological, acculturational, interactional, etc.) have been proposed to account for the development of Fossilization in L2-learning. Interactional models state that the interaction between the learner and other L2-speakers determines whether a component of the learner’s Interlanguage system is reinforced contributing to Fossilization. One such model, Tollefson and Firn’s (1983), posits that an overemphasis on conveyance of meaning in the classroom may, in the absence of cognitive feedback, promote fossilization.
On this issue, Johnson (1996) also asserts that linguistic survival is often achieved by a form of pidgin and that encouraging this type of communication in the language classroom is a practice conducive to fossilisation. Skehan (1994) and Long (1983) also make the point that communicative production might lead to the development of reduction strategies resulting in pidginogenesis and fossilization.
6. A Cognitive account of the writing processes: the Hayes and Flower (1980) model

Hayes and Flower’s (1980) model of essay writing is regarded as one of the most effective accounts of writing available to date (Eysenck and Keane, 1995). As Figure 2 below shows, it posits three major components:

1. Task-environment,

2. Writer’s Long-Term Memory,
3. Writing process.

Figure 2: The Hayes and Flower model (adapted from Hayes and Flower, 1980)

The three components are: (1) the Task-environment, which includes the writing assignment (the topic, the target audience and motivational factors) and the text produced so far; (2) the Writer’s LTM, which provides factual knowledge and skill/genre-specific procedures; (3) the Writing Process, which consists of the three sub-processes of Planning, Translating and Reviewing.

The Planning process sets goals based on information drawn from the Task-environment and Long-Term Memory (LTM). Once these have been established, a writing plan is developed to achieve those goals. More specifically, the Generating sub-process retrieves information from LTM through an associative chain in which each item of information retrieved functions as a cue to retrieve the next item of information and so forth. The Organising sub-process selects the most relevant items of information retrieved and organizes them into a coherent writing plan. Finally, the Goal-setting sub-process sets rules (e.g. ‘keep it simple’) that will be applied in the editing process. The second process, Translating, transforms the information retrieved from LTM into language. This is necessary since concepts are stored in LTM in the form of Propositions, not words. Flower and Hayes (1980) provide the following examples of what propositions involve:

[(Concept A) (Relation B) (Concept C)]

or

[(Concept D) (Attribute E)], etc.
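As a rough illustration of the difference between a proposition and its wording, the Python sketch below stores the two proposition shapes as small records and 'translates' them into words with a toy lexicon. The class names, the concept labels and the lexicon are hypothetical and are not part of Hayes and Flower's model; real Translating involves far more complex lexical and grammatical decisions.

```python
from dataclasses import dataclass


# Hypothetical record for the shape [(Concept A) (Relation B) (Concept C)].
@dataclass(frozen=True)
class RelationalProposition:
    concept_a: str
    relation: str
    concept_c: str


# Hypothetical record for the shape [(Concept D) (Attribute E)].
@dataclass(frozen=True)
class AttributiveProposition:
    concept: str
    attribute: str


# Toy lexicon mapping language-neutral concept labels to English words.
LEXICON = {
    "DOG": "the dog",
    "CHASE": "chases",
    "CAT": "the cat",
    "BROWN": "brown",
}


def translate(prop):
    """Naively turn a proposition into a sentence (a toy stand-in for Translating)."""
    if isinstance(prop, RelationalProposition):
        words = [LEXICON[prop.concept_a], LEXICON[prop.relation], LEXICON[prop.concept_c]]
    elif isinstance(prop, AttributiveProposition):
        words = [LEXICON[prop.concept], "is", LEXICON[prop.attribute]]
    else:
        raise TypeError(f"unknown proposition type: {type(prop).__name__}")
    sentence = " ".join(words)
    # Capitalise the first letter and add a full stop.
    return sentence[0].upper() + sentence[1:] + "."


print(translate(RelationalProposition("DOG", "CHASE", "CAT")))
print(translate(AttributiveProposition("DOG", "BROWN")))
```

The point of the separation is that the same proposition could be worded in many ways (or in another language) simply by changing the lexicon and the word-ordering rule.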

Finally, the Reviewing sub-processes of Reading and Editing serve to enhance the quality of the output. The Editing process checks that discourse conventions are not being flouted, looks for semantic inaccuracies and evaluates the text in the light of the writing goals. Editing takes the form of a Production system in which each Production has two conditions (the IF part) and an action (the THEN part):

The first part specifies the kind of language to which the editing production applies, e.g. formal sentences, notes, etc. The second is a fault detector for such problems as grammatical errors, incorrect words, and missing context.

(Hayes and Flower, 1980: 17)

 When the conditions of a Production are met, e.g. a wrong word ending is detected, an action is triggered for fixing the problem. For example:

CONDITION 1: (formal sentence)

CONDITION 2: first letter of sentence lower case

ACTION: change first letter to upper case

(Adapted from Hayes and Flower, 1980: 17)
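The editing Production above can be sketched in a few lines of Python: each Production holds a condition on the kind of language, a fault detector and an action, and the action fires only when both conditions are met. This is a minimal sketch assuming that conditions can be expressed as simple checks on a string; the names `EditingProduction`, `edit` and `CAPITALISATION` are hypothetical and are not part of Hayes and Flower's formulation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EditingProduction:
    name: str
    applies_to: str                         # Condition 1: kind of language
    fault_detected: Callable[[str], bool]   # Condition 2: fault detector
    action: Callable[[str], str]            # action triggered when both hold


def first_letter_lower(sentence: str) -> bool:
    return bool(sentence) and sentence[0].islower()


def capitalise_first(sentence: str) -> str:
    return sentence[0].upper() + sentence[1:]


CAPITALISATION = EditingProduction(
    name="capitalise sentence start",
    applies_to="formal sentence",
    fault_detected=first_letter_lower,
    action=capitalise_first,
)


def edit(sentence: str, kind: str, productions: list[EditingProduction]) -> str:
    """Fire, in order, every Production whose two conditions are both met."""
    for p in productions:
        if p.applies_to == kind and p.fault_detected(sentence):
            sentence = p.action(sentence)
    return sentence


print(edit("the train leaves at nine.", "formal sentence", [CAPITALISATION]))
print(edit("the train leaves at nine", "notes", [CAPITALISATION]))
```

The second call leaves the text unchanged because the first condition (formal sentence) is not met, mirroring the way the kind of language determines whether a Production applies at all.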

Two important features of the Editing process are that (1) it is triggered automatically whenever the conditions of an Editing Production are met and (2) it may interrupt any other ongoing process. Editing is regulated by an attentional system called the Monitor, although Hayes and Flower do not provide a detailed account of how it operates. Unlike Krashen’s (1977) Monitor, a control system used solely for editing, Hayes and Flower’s (1980) Monitor operates at all levels of production, orchestrating the activation of the various sub-processes. This allows Hayes and Flower to account for two phenomena they observed. Firstly, the Editing and Generating processes can cut across other processes. Secondly, the Monitor makes the system flexible in the application of goal-setting rules, since any other process can be triggered through it. This flexibility accounts for the recursiveness of the writing process.

 7. Extending the model: Cognitive accounts of the translating sub-processes and insights from proofreading research

Hayes and Flower’s model is useful in providing teachers with a framework for understanding the many demands that essay writing places on students. In particular, it helps teachers understand how the recursiveness of the writing process may cause those demands to interfere with each other, resulting in cognitive overload and error. Furthermore, by conceptualising editing as a process that can interrupt writing at any moment, the model has a very important implication for a theory of error: self-correctable errors occurring at any level of written production are not always the result of a retrieval failure; they may also be caused by a detection failure. However, one limitation of the model for a theory of error is that its description of the Translating and Editing sub-processes is too general. I shall therefore supplement it with Cooper and Matsuhashi’s (1983) list of writing plans and decisions, along with findings from other L1-writing Cognitive research, which will provide the reader with a more detailed account. I shall also briefly discuss some findings from proofreading research which may help explain some of the problems L2-student writers encounter during the Editing process.

7.1 The translating sub-processes

Cooper and Matsuhashi (1983) posit four stages, which correspond to Hayes and Flower’s (1980) Translating: Wording, Presenting, Storing and Transcribing. In the first stage, the brain transforms the propositional content into lexis. Although at this stage the pre-lexical decisions the writer made at earlier stages and the preceding discourse limit lexical choice, Wording the proposition is still a complex task: ‘the choice seems infinite, especially when we begin considering all the possibilities for modifying or qualifying the main verb and the agentive and affected nouns’ (Cooper and Matsuhashi, 1983: 32). Once s/he has selected the lexical items, the writer has to tackle the task of Presenting the proposition in standard written language. This involves making a series of decisions in the areas of genre and grammar. In the area of grammar, Agreement and Tense will be the main issues.
The proposition, as planned so far, is then temporarily stored in Working Short-Term Memory (henceforth WSTM) while Transcribing takes place. Propositions longer than a few words have to be rehearsed and re-rehearsed in WSTM so that parts of them are not lost before the transcription is complete. The limitations of WSTM put unpractised writers at a serious disadvantage. Until they gain some confidence and fluency with spelling, they may have to load their WSTM with the letter sequences of single words or with only two or three words at a time (Hotopf, 1980). This not only slows down the writing process but also means that all other planning must be suspended during the transcription of short letter or word sequences.

The physical act of transcribing the fully formed proposition begins once the graphic image of the output has been stored in WSTM. In L1-writing, transcription occupies subsidiary awareness, enabling the writer to use focal awareness for other plans and decisions. In practised writers, the transcription of certain words and sentences can be so automatic as to permit planning the next proposition while the previous one is still being transcribed. An interesting finding with regard to these final stages of written production comes from Bereiter, Fire and Gartshore (1979), who investigated L1-writers aged 10-12. They identified several discrepancies between the learners’ forecasts in think-aloud and their actual writing. Of these discrepancies, 78% involved stylistic variations. Notably, in 17% of the forecasts, the writers uttered significant words which did not appear in the writing. In about half of these cases the result was a syntactic flaw (e.g. the forecasted phrase ‘on the way to school’ was written ‘on the to school’). Bereiter and Scardamalia (1987) believe that lapses of this kind indicate that language is lost somewhere between storage in WSTM and grapho-motor execution. These lapses, they also assert, cannot be described as ‘forgetting what one was going to say’, since almost every omission was reported on recall: in the case of ‘on the to school’, for example, the author not only intended to write ‘on the way’ but later claimed to have written it. In their view, such lapses are caused by interference from the attentional demands of the mechanics of writing (spelling, capitalization, etc.). The underlying psychological premise is that a writer has a limited amount of attention to allocate, so whatever is taken up by the lower-level demands of written language must be taken from something else.

In sum, Cooper and Matsuhashi (1983) posit two stages in the conversion of the preverbal message into a speech plan: (1) the selection of the right lexical units and (2) the application of grammatical rules. The resulting unit of language is then deposited in WSTM, awaiting grapho-motor execution. This temporary storage raises the possibility that lower-level demands affect production in two ways: (1) by causing the writer to omit material during grapho-motor execution; (2) by causing the writer to forget higher-level decisions already made. Interference resulting in WSTM loss can also occur when the writer devotes conscious attention entirely to planning ahead, leaving the process of transcription to run ‘on automatic’ and therefore unmonitored.
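The idea that lower-level demands can crowd material out of WSTM can be illustrated with a toy simulation. The sketch assumes a fixed number of WSTM slots and an invented extra ‘cost’ for words whose spelling requires conscious attention; neither figure comes from Cooper and Matsuhashi or from Bereiter and Scardamalia, and the rule that the next pending word is the one lost is a simplification chosen for illustration.

```python
from collections import deque


def transcribe(plan, capacity, mechanics_cost):
    """Toy model of Storing + Transcribing under a fixed WSTM capacity.

    plan: words of the sentence plan, in order.
    capacity: number of WSTM slots (an assumed, illustrative figure).
    mechanics_cost: extra slots a word's spelling/mechanics occupies while it
        is being written (words not listed cost nothing extra).
    While a word is written, the rest of the plan must be rehearsed in the
    slots that remain; if it does not fit, the next pending word is lost.
    """
    written = []
    pending = deque(plan)
    while pending:
        word = pending.popleft()
        written.append(word)
        # Slots left for rehearsing the rest of the plan while `word` is written.
        free_slots = capacity - 1 - mechanics_cost.get(word, 0)
        if pending and len(pending) > free_slots:
            pending.popleft()  # overflow: the next planned word drops out
    return written


PLAN = ["she", "received", "the", "parcel", "yesterday"]
print(" ".join(transcribe(PLAN, capacity=5, mechanics_cost={})))
print(" ".join(transcribe(PLAN, capacity=5, mechanics_cost={"received": 2})))
```

With no extra mechanical load every planned word reaches the page; when spelling ‘received’ takes up two extra slots, the function word that follows is lost, a lapse similar in kind to ‘on the to school’.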

 7.3 Some insights from proofreading research

Proofreading theories and research provide us with the following important insights into the mechanisms that regulate essay editing. Firstly, proofreading involves different processes from reading: when one proofreads a passage, one is generally looking for misspellings, words that might have been omitted or repeated, typographical mistakes, etc., so comprehension is not the goal. When one is reading a text, on the other hand, one’s primary goal is comprehension. Thus, reading involves the construction of meaning, while proofreading involves visual search. For this reason, in reading, short function words, not being semantically salient, are not fixated (Paap, Newsome, McDonald and Schvaneveldt, 1982). Consequently, errors on such words are less likely to be spotted when one is editing a text while concentrating mostly on its meaning than when one is focusing on the text as part of a proofreading task (Haber and Schindler, 1981). Detection failures are likely to decrease even further when the proofreader is forced to fixate on every single function word in isolation (Haber and Schindler, 1981).

It should also be noted that some proofreaders’ errors appear to be due to acoustic coding. This refers to the phenomenon whereby the way a proofreader pronounces a word, diphthong or letter influences his/her detection of an error. For example, if an English learner of L2-Italian pronounces the ‘e’ in the singular noun ‘stazione’ (= train station) as [i] instead of [e], s/he will find it difficult to differentiate it from the plural ‘stazioni’ (= train stations). This may impair her/his ability to spot errors with that word involving the use of the singular for the plural and vice versa.
The implications for the present study are twofold. Firstly, learners may have to be trained to go through their essays at least once focusing exclusively on form. Secondly, they should be asked to pay particular attention to those words (e.g. function words) and parts of words (e.g. verb endings) that they may not perceive as semantically salient.
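A form-focused editing pass of the kind recommended above could be partly scaffolded by listing every function word with a few words of context, so that each one is inspected in isolation rather than skimmed over while reading for meaning. This is a hypothetical classroom helper, not a tool described in the proofreading research cited; the word list is a deliberately incomplete, illustrative set of Italian function words.

```python
import re

# Illustrative, deliberately incomplete set of Italian function words
# (articles, simple prepositions, conjunctions); extend as needed.
FUNCTION_WORDS = {
    "il", "lo", "la", "i", "gli", "le", "un", "uno", "una",
    "di", "a", "da", "in", "con", "su", "per", "tra", "fra",
    "e", "o", "ma", "che",
}


def form_focused_pass(text, window=2):
    """Return (position, word, context) for every function word, so each one
    can be checked on its own, separately from a reading-for-meaning pass."""
    tokens = re.findall(r"\w+", text.lower())
    checklist = []
    for i, tok in enumerate(tokens):
        if tok in FUNCTION_WORDS:
            context = " ".join(tokens[max(0, i - window): i + window + 1])
            checklist.append((i, tok, context))
    return checklist


for position, word, context in form_focused_pass("Sono andato a la stazione con i miei amici"):
    print(f"{position:>2}  {word:<4} ... {context} ...")
```

In this example ‘a’ and ‘la’ are flagged side by side, which should prompt the learner to check whether the articulated preposition ‘alla’ was needed.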

7.4 Bilingual written production: adapting the unilingual model

Writing, although slower than speaking, is still processed at enormous speed in mature native speakers’ WSTM. The processing time a writer requires will be greater in the L2 than in the L1 and will increase at lower levels of proficiency. At the Wording stage, more time will be needed to match non-proceduralized lexical material to propositions; at the Presenting stage, more time will be needed to select and retrieve the right grammatical form. Furthermore, more attentional effort will be required to rehearse the sentence plans in WSTM; in fact, just like Hotopf’s (1980) young L1-writers, non-proficient L2-learners may be able to store only two or three words at a time in WSTM. This has implications for Agreement in Italian, since words that are more than three or four words apart may still have to agree in gender and number. Finally, in the Transcribing phase, the retrieval of spelling and other aspects of the writing mechanics will take up more focal awareness in WSTM.
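The risk this poses to Agreement can be made concrete by measuring how many words separate an agreeing word from the word that controls its gender and number, and flagging the pairs that exceed an assumed WSTM span of three words. In this sketch the agreement pairs are annotated by hand (no parser is assumed), and the span value simply mirrors the ‘two or three words’ figure above; it is an illustration, not a measured threshold.

```python
# "Le ragazze che ho visto ieri alla festa erano molto simpatiche"
# (= The girls I saw at the party yesterday were very nice).
TOKENS = "Le ragazze che ho visto ieri alla festa erano molto simpatiche".split()

# Hand-annotated agreement pairs: (index of controller, index of agreeing word).
PAIRS = [
    (0, 1),   # le -> ragazze: article agrees with noun (feminine plural)
    (1, 8),   # ragazze -> erano: verb agrees with subject (plural)
    (1, 10),  # ragazze -> simpatiche: adjective agrees with noun (feminine plural)
]


def agreement_at_risk(tokens, pairs, span=3):
    """Return (controller, target, distance) for every pair whose distance in
    words exceeds `span`, i.e. where the controller may already have left a
    span-limited WSTM by the time the agreeing word is written."""
    return [
        (tokens[c], tokens[t], t - c)
        for c, t in pairs
        if t - c > span
    ]


for controller, target, distance in agreement_at_risk(TOKENS, PAIRS):
    print(f"{controller} -> {target}: {distance} words apart")
```

Only the article-noun pair falls within the assumed span; the verb and the predicative adjective, seven and nine words away from ‘ragazze’, are the likely sites of agreement errors for a learner with a very limited WSTM span.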

Monitoring, too, will require more conscious effort, increasing the chances of WSTM loss. This is more likely to happen with less expert learners: since their attentional system has to monitor levels of language that are normally automatized in the mature L1-speaker, it will not have enough channel capacity available, at the point of utterance, to cope with lexical/grammatical items that have not yet been proceduralised. This also implies that Editing is likely to be more recursive than in L1-writing, interrupting other writing processes more often, with consequences for the higher meta-components. In view of the attentional demands posed by L2-writing, interference caused by planning ahead will also be more likely to occur, giving rise to processing failure. Processing failure/WSTM loss may also be caused by the L2-writer pausing to consult dictionaries or other resources to fill gaps in their L2-knowledge while rehearsing the incomplete sentence plan in WSTM. In fact, research indicates that although, in general terms, composing patterns (sequences of writing behaviours) are similar in L1 and L2, there are some important differences.
In his seminal review of the L1/L2-writing literature, Silva (1993) identified a number of differences between L1- and L2-composing. Firstly, L2-composing was clearly more difficult. More specifically, the Transcribing phase was more laborious, less fluent and less productive. L2-writers also spent more time referring back to an outline or prompt and consulting dictionaries, and they experienced more problems in selecting the appropriate vocabulary. Furthermore, L2-writers paused more frequently and for longer, which resulted in L2-writing occurring at a slower rate. As far as Reviewing is concerned, Silva (1993) found evidence in the literature that in L2-writing there is usually less re-reading of and reflecting on written texts. He also reported evidence suggesting that L2-writers revise more, before and while drafting and in between drafts. However, this revision was more problematic and more of a preoccupation. There also appears to be less auditory monitoring in the L2, and L2-revision seems to focus more on grammar and less on mechanics, particularly spelling. Finally, the features of L2-written texts provide strong evidence that L2-writing is a less fluent process, involving more errors and producing (at least in the judgement of native English speakers) less effective texts.
8. Conclusion: Implications for teaching and learning
In the above I have discussed my espoused theories of L2-acquisition and L2-writing. I started by focusing on Anderson’s (1980, 1982, 1983, 2000) account of how language structures are acquired and how language processing develops. Drawing on SLA research, I then discussed some important phenomena and processes involved in the aetiology of error that are relevant to the present study. Finally, I discussed Hayes and Flower’s (1980) and Cooper and Matsuhashi’s (1983) models of written production and their implications for bilingual written production. In my view, the following notions emerging from my discussion should provide the theoretical underpinnings of any remedial corrective approach to L2-writing errors:
(1) L2-acquisition occurs in much the same way as the acquisition of any other cognitive skill;

(2) the acquisition of a skill begins consciously with an associative stage during which the brain creates a declarative representation of Productions (i.e. the procedures that regulate that skill);

(3) it is an adaptive feature of the human brain to make the performance of any skill automatic in order to render its execution fast and efficient in terms of cognitive processing;

(4) automatisation can be a very lengthy process, since for a skill to become automatic it must be performed numerous times;

(5) the Productions that regulate a skill become automatised only if their application is perceived by the brain as resulting in positive outcomes;

(6) at a given stage in a learner’s development, more than one Production relating to a given item can co-exist in his/her Interlanguage. These Productions compete for retrieval, and the one with the stronger memory trace (not necessarily the correct one) will win;

(7) negative evidence as to the effectiveness of a Production determines whether it is rejected by the brain or automatised;

(8) once a Production (including one giving rise to errors) is automatised, it is difficult to alter;

(9) errors may be the result of lack of knowledge or of processing efficiency problems;

(10) learners use Language Transfer and Communication Strategies to make up for the absence of the L2-declarative knowledge necessary to realise a given communicative goal; these phenomena are likely to give rise to error;

(11) the writing process is recursive and can be interrupted by editing at any time;

(12) errors in L2-writing relating to morphology and syntax occur mostly in the Translating phase of the writing process, when Propositions are converted into language. They may occur as a result of cognitive overload caused by the interference of various processes occurring simultaneously and posing cognitive demands beyond the processing ability of the writer’s WSTM;

(13) editing for meaning involves different processes from editing for form. When editing for meaning, the writer/editor is more likely to miss function words because they are less semantically salient.
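Notions (5) to (8) can be illustrated with a deliberately simplified simulation in which two competing Productions for the same item carry a numerical ‘strength’, the stronger one is always retrieved, and each use either strengthens it or, when it attracts negative evidence, weakens it. This is a toy model with invented parameters (starting strengths, gain and penalty); it is not Anderson’s ACT-R activation or utility equations.

```python
from dataclasses import dataclass


@dataclass
class Production:
    form: str        # what the Production outputs
    correct: bool    # whether that output is target-like
    strength: float  # illustrative memory-trace strength


def retrieve(competing):
    """Deterministic toy retrieval: the strongest trace wins, correct or not."""
    return max(competing, key=lambda p: p.strength)


def apply_outcome(production, negative_evidence, gain=1.0, penalty=1.5):
    """A use perceived as successful strengthens the trace; negative evidence weakens it.
    gain and penalty are arbitrary illustrative values."""
    if negative_evidence:
        production.strength -= penalty
    else:
        production.strength += gain


def simulate(rounds, corrective_feedback):
    target = Production("le stazioni", correct=True, strength=1.0)
    error = Production("le stazione", correct=False, strength=2.0)
    outputs = []
    for _ in range(rounds):
        chosen = retrieve([target, error])
        outputs.append(chosen.form)
        # Negative evidence only arrives if the teacher corrects an incorrect form.
        apply_outcome(chosen, negative_evidence=corrective_feedback and not chosen.correct)
    return outputs


print(simulate(3, corrective_feedback=False))
print(simulate(4, corrective_feedback=True))
```

Without corrective feedback the erroneous form wins every time and keeps getting stronger; with consistent negative evidence it is displaced after one use and the target form then takes over.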

These notions have important implications for any approach to error correction. One refers to Anderson’s assumption that the acquisition of L2-structures in classroom settings mostly begins at a conscious level, with the creation of mental representations of the rules governing their usage. The obvious corollary is that corrective feedback should help learners create or restructure their declarative knowledge of the L2-rule system. Hence, any corrective approach should involve L2 students in grammar learning entailing cognitive restructuring and extensive practice. This means delivering a well-planned and elaborate intervention, not just a one-off lesson on a structure identified as a problem in a learner’s written piece.

Another important notion advanced by Anderson is that a Production is automatised only after it has been applied numerous times and with success (actual or perceived). This notion has three major implications for Error Correction:
(1) Error Correction can play an important role in L2-acquisition since, in order to reject a wrong Production, the learner needs plenty of negative evidence informing him/her of its incorrectness.

(2) Errors should be corrected consistently to avoid sending the learners confused messages about the correctness of a given structure.

(3) For Error Correction to lead to the de-fossilization of wrong Productions and the automatization of new, correct Productions, the former should occur in learner output as rarely as possible, whereas the latter should be produced as frequently as possible.

In line with these three notions, a teacher may want to invest considerable effort in raising learners’ awareness of their errors, to be as consistent as possible in correcting them and, finally, to encourage learners to practise the problematic structures as often as possible, both in and outside the context of the essays they will write.
Other implications refer to the concept of automatization. As discussed above, automatised cognitive structures are difficult to alter. It follows that Error Correction is more likely to be successful (in the absence of major developmental constraints) at the early stages of learning an L2-item, before ‘incorrect’ Productions have reached the ‘Strengthening’ stage of acquisition. Thus, in order to prevent the fossilization or automatization of errors, any corrective intervention should tackle the errors most prone to routinization (usually those relating to less semantically salient language items) as early as possible in the acquisition process.
Another set of implications relates to the causes and nature of learner errors. As discussed above, a number of errors result from L2-learners’ attempts to make up for their lack of correct L2-declarative knowledge by deploying the following problem-solving strategies:

(1) Communication Strategies: in the absence of linguistic knowledge of an L2-item, a learner may deploy achievement strategies. As far as lexical items are concerned, learners may deploy the following strategies leading to error: ‘Approximation’, ‘Coinage’ and ‘Foreignization’. In the case of grammar or orthography, learners will draw on existing declarative knowledge, overgeneralizing a rule (generalization) or guessing;

(2) Use of resources: learners may use dictionaries or other sources of L2-knowledge (including people) incorrectly;

(3) L1- or L3-transfer;

(4) Avoidance.

Since these errors are extremely likely to occur in beginner and intermediate students’ writing, teachers should involve students in activities that raise their awareness of these issues and provide practice in ways of tackling them. Firstly, as far as the above Communication Strategies are concerned, students should be trained to use dictionaries and other resources more frequently to prevent errors due to Approximation, Coinage and Foreignization. Secondly, as far as poor use of resources is concerned, learners must be made aware of the possible pitfalls of using dictionaries and textbooks and be trained to use these tools more effectively and efficiently. Thirdly, learners must be made aware of the issues related to excessive reliance on L1-/L3-Transfer and of negative Transfer (again, through effective learner training).

As discussed above, errors can also be caused by WSTM processing failure due to cognitive overload. Grammatical, lexical and orthographical errors will occur when learners handle structures which have not been sufficiently automatized, in situations where the operating conditions in WSTM are too challenging for the attentional system to monitor all levels of production effectively. The implication for Error Correction is that learners should be made aware of which types of contexts are more likely to cause processing efficiency failure, so that they may approach them more carefully in the future. Examples of such contexts are: sentences where the learner is attempting to express a difficult concept which requires new vocabulary and the use of tenses/moods s/he has not fully mastered; long sentences where items agreeing with each other in gender and/or number are located quite far apart (not an uncommon occurrence in Italian); and situations in which the production of a sentence has to be interrupted several times because the learner needs to consult the dictionary. Remedial practice should provide learners with opportunities to operate in such contexts in order to train them to cope with the cognitive demands these contexts place on processing efficiency in Real Operating Conditions.

Another important implication of my discussion for Error Correction refers to the notion that errors are not simply the result of a Translating failure, but also of an Editing failure. The failure to detect errors may be due to two factors. One relates to the goal-orientedness of the Production systems that regulate all levels of language processing: the brain will review the accuracy of every single aspect of the text only if it perceives this as relevant to its goals in producing the text. Thus, if the communication of content is the main goal the writer sets in an essay, the accuracy of function words is likely to become a secondary concern, since function words are not perceived as salient to the realisation of that goal. Lack of time is likely to exacerbate this problem, since it forces learners to prioritise certain aspects of their output over others in the Editing phase(s). The implication for Error Correction is that it should aim to develop learners’ intention to be accurate at every level of the text. This may not be easy if accuracy does not feature prominently among the priorities of the curriculum, the teacher and/or the student.
Secondly, editing failure may be due to the fact that reading an essay to check and/or improve the quality of its content is different from proofreading aimed at checking non-semantic aspects of the output. As noted above, the former approach to text revision often results in failure to detect errors with function words. The implication of this phenomenon for corrective approaches is that learners’ awareness of the importance of paying greater attention to function words when editing essays should be raised. Moreover, as an editing strategy, learners should be advised to revise their essay drafts in two distinct phases: one aimed at checking the content and another focused exclusively on the accuracy of grammar, lexis and orthography.
Furthermore, editing failure may be caused by the same issue that caused learners to err in the first place, that is, processing efficiency. Thus, the contexts listed above (sentences that are long and/or complex and/or contain problematic structures, etc.) may impair learners’ ability to detect and/or self-correct their errors. One way to tackle this issue in remedial teaching is to advise learners to be particularly careful when editing these kinds of sentences and to approach them in a way that places less strain on their processing efficiency; for example, by concentrating first on the items that, based on the self-knowledge they will have developed as part of metacognitive training, they are more likely to get wrong in that kind of context (training in the Monitoring-Familiar-Errors strategy would help in this respect).
A final point refers to the implications of the phenomenon of Variability for the diagnostic phase of any error treatment. As discussed above, this phenomenon may confuse the teacher or the error analyst as to whether a learner knows a given structure, since s/he seems to get it right at times and wrong at others. The implication for Error Correction is that teachers should investigate the causes of every occurrence of this phenomenon in their learners’ writing in order to ascertain whether it stems from poor editing skills, partial knowledge of the target rule, etc. An appropriate action plan can then be decided on the basis of the causes identified.