The ugly truth about school-based Modern Language teaching


(with Steve Smith)

I was recently criticised by some of Stephen Krashen's fans for stating something that to me and many other teachers is a sad given: MFL teachers operating in secondary schools simply have no time to teach languages the way they should ideally be taught. Time and syllabus constraints force teachers into extremely tight schedules which do not allow for the extensive listening and reading practice that, as much research shows, benefits every language learner before they engage in real-life-like speaking.

If I had five hours' contact time a week I would teach entirely differently from the way I teach now. This would be my recipe: lots of daily receptive exposure to compelling aural and written input; plenty of oral interaction through fun and challenging communicative activities (even more than the 30 minutes per lesson I do now); engaging multimedia project-based learning; drama and art activities; cultural awareness-raising through videos and realia; exciting enquiry-based grammar learning.

The problem is that, for teachers working in England who must prepare their students effectively for GCSE and A-Level examinations, all of the very desirable practices above simply cannot be implemented as often as one would like. We all know that. Hence, effective teaching in our context is not merely about applying what we know best benefits language acquisition; it is first and foremost about making the most of the time we have available to build our students' linguistic competence, self-confidence and motivation, adapting what we know about human language acquisition to the context we operate in.

The American army knew this all too well when it had to prepare its troops linguistically for the Normandy invasion in 1944. It could not afford to put its soldiers through hours and hours of receptive learning through engaging stories in the belief that languages are best learnt subconsciously through exposure to comprehensible input (as many Americans in Dr Krashen's camp – my critics – believe). Hence it devised an approach which was drill-based: lots of repetition through controlled tasks aimed at practising phrase after phrase to death, until the phrases were so embedded in the soldiers' memory that they became spontaneous. In this approach, grammar was taught through robotic repetition and manipulation of small parts of sentences, e.g. I play tennis, my mother plays tennis, my father doesn't play tennis, we play tennis.

Although ideologically I do not agree with this method at all – it is not the way I learnt the seven languages I am fluent in, nor the other seven I speak less well – I see the merit of aspects of this approach in the beginning phase of any learning: the parroting stage of classroom-based acquisition. It is undeniable that lots of drilling helps embed the core vocabulary and grammar structures. And it can be made fun, too, with a bit of imagination – e.g. my receptive drills in the game room at http://www.language-gym.com/#/game-room or my oral communicative drills. And if the phrases and words we embed through the drills consist of lexical items and sentences which are very useful in the real world and are taught and practised within typical real-life communicative contexts, all the better!

The truth is that every method language researchers and educationists have come up with in the last five or six decades is effective in its own way, each addressing a different stage or facet of the complex process that language acquisition is. To say 'my method is better than yours' is preposterous. Yet proponents of each method do just that, sometimes inspired by a genuine passion for and belief in the validity of their approach, more often than not driven by a business or political agenda.

We, as school-based teachers, have historically been the victims of this state of affairs, decade after decade, subjected to fads which were not a faithful reflection of each new method but rather botched-up adaptations of often-sound theories and methodologies by governments and their consultants, who reshaped them to fit the target cultural, political and socio-economic context, mindful less of our needs or our students' than of their own agendas.

The result is a teaching profession whose pedagogic beliefs – whether we are aware of it or not – are often a hybrid of all the methodological approaches it has been exposed to in the last forty years or so, whether through word of mouth, reading, CPD, government policies, etc. So many of us are advocates of the Communicative approach whilst teaching grammar the way the Romans or the Greeks did 2,000 years ago; believe that reading extensively for pleasure will subconsciously result in learning whilst training our students towards reading comprehension tests that teach little; advocate the importance of oral interaction and listening whilst devoting most lessons to reading and writing, or embrace enquiry-based learning tasks where students barely ever speak; say one should tolerate error and that mistakes are 'good' (as CLT preaches) but then make a huge fuss about them by excessively focusing students on correction (through D.I.R.T., stamps and time-consuming dialogic practices).

Eclecticism or pedagogic hypocrisy? Neither, in my opinion. The ugly truth is that a lot of us are confused and disoriented; overloaded with government and school policy requirements which change far too often and too quickly; flooded with information coming from different camps; misinformed by CPD sessions which squeeze years of research and theorizing into one or two PowerPoint slides; galvanized by keynote speakers who excite us with great ideas that are difficult to translate into classroom practice.

Hence, as I always 'preach' in my posts, the need for (a) a clear understanding of modern language pedagogy, so as to be able to see the state of the art of the discipline beyond the various factions' and fads' political agendas; (b) a basic reference framework, based on that understanding, that will enable us to approach lesson and curriculum planning, assessment and feedback in a no-nonsense, practical and principled way.

Having such an understanding and such a framework – which in my case is MARS + EAR (see my blogposts on this) – has made my everyday lesson planning much easier and hassle-free and, when questioned by my superiors, has allowed me to provide them with a clear rationale for my pedagogic strategies and choices, rooted in Skill Theory and neuroscience. Maybe not perfect, but working well for me. Incidentally, it was interesting to see how Rachel Hawkes and others – who had never publicly advocated Skill Theory principles before – recently published a paper which reflects all of the views I have expressed in my blog in the last year or so. It means that, after all, some English MFL 'influencers' have finally decided to embrace neuroscience…

The path to becoming a better teacher does come through reflectivity, as most of today's CPD gurus preach. But understanding the basic neuroscience facts about language acquisition and developing your own framework fuels and structures that reflectivity, and significantly reduces the occurrence of the cognitive block that many teachers who contact me through social media tell me they often experience when planning lessons. It also reduces the likelihood that your planning is driven by the activities/resources you happen to find, rather than the much healthier opposite scenario: you choosing the activities/resources that best serve your planning.

Steve Smith and I wrote our book 'The Language Teacher Toolkit' to provide our colleagues with such an understanding of Modern Language pedagogy and with such a principled teaching framework. Interestingly, we came to it from totally different camps, Steve being a believer in the importance of comprehensible input whilst I am a Skill Theory fan; still, we were able to agree on what constitutes a useful, pragmatic, 'fadless' and hassle-free approach to language teaching. Other bloggers, such as Sara Cottrell of www.musicuentos.com and Justin Slocum Bailey of Indwelling Languages, have been pursuing the same noble intent.

No, I am not merely trying to plug our book. My point is that once you have a clear understanding of the basic processes that regulate human learning, are aware of the core research findings and regularly reflect on your classroom experience in the light of that understanding and that awareness, you will have a powerful pedagogic compass with which to orientate yourself through the jungle of bastardised pedagogic messages – like the ones I discussed in my previous post – which make our daily professional life so much more challenging and confusing.

In conclusion, the ugly truth that Modern Languages teachers have to contend with, day in, day out, is that time, logistics, syllabus constraints and government policies prevent them from teaching the way one ideally should. Educationists and researchers rarely recognize that, detached as they are from our world and more concerned with plugging their fads than with the often harsh reality of bog-standard state schools. Curriculum designers, examination boards and textbook authors do attempt to incorporate the new methodologies and fads in their work, but they often do so superficially, to the detriment of sound pedagogy, giving rise to belief systems and practices which teachers often have to adhere to uncritically and which often clash with one another and with common sense. The result is the current state of affairs: an overloaded and overworked teaching profession that is often confused as to what constitutes best pedagogic practice, disorientated as it is by mixed messages coming from multiple directions. This may affect teachers' efficacy, thereby eroding their self-confidence, motivation and, ultimately, their well-being.

The solution: gaining a better understanding of pedagogy, so that you can make an informed choice as to which method to apply where, when and with whom; so that you build instructional sequences based on a method rather than a hunch; so that you do not let the tasks and games you know or have found guide your teaching instead of your know-how; so that you can tell SLT why they got it all wrong.

The Language Teacher Toolkit is available on http://www.amazon.co.uk


Why I teach the way I teach. The Skill-Theory principles which underpin my teaching approach

Fig. 1 – The most influential Skill-Theory account of language acquisition (Anderson, 1983)

1. Introduction

In a previous post, I argued that every language teacher, both novice and expert, should ask themselves the question “How do I believe that languages are learnt?” as a starting point for a deep and productive reflection on their own teaching practice.

The answer to that question is key: without a clear and solid set of pedagogic principles, our curriculum planning and design and every other decision that affects teaching and learning in our classroom will be random and haphazard, or based on 'hunches'. Imagine choosing a course-book, creating assessment procedures and materials, or deciding to integrate Information Technology or generic-skill learning into our teaching without having formed an opinion as to how languages are best taught and learnt. Would you believe me if I told you that I have seen this done, time and again, even in some of the best schools in the world?

As I suggested in that post, teachers and language departments should identify the set of pedagogic principles that truly constitute the tenets of their teaching philosophy and classroom approach and draw on them to ‘frame’ their long-, medium- and short-term planning, their discussions on teaching and learning (e.g. the ones that occur after a lesson observation), their assessment and any big decision of theirs that may significantly impact teaching and learning. Having such a framework will warrant coherence and fairness in peer and student assessment. It will also give the course administrators a better idea of what Modern Language (ML) teaching and learning is about in the institution they manage.

This is my own personal answer to the question "How do I believe that languages are learnt?", or rather part of it, as I will narrow the scope of this post to the main tenets of my approach to ML teaching – borrowed from Skill Theory. Hence I will leave out other major influences on my personal pedagogy (e.g. Schmidt's Noticing Hypothesis, Bandura's Self-efficacy Theory, Selinker's Interlanguage Hypothesis, McClelland and Rumelhart's Connectionism, etc.).

2. My set of guiding principles

2.1 Skill Theory – the (very) bare bones

Whilst it integrates elements from several SLA theories, my approach is rooted in cognitive-psychology-based accounts of instructed second language acquisition, especially what Applied Linguists call Skill Theory (as laid out in Anderson, 1994; Johnson, 1996; DeKeyser, 1998; Jensen, 2007). I underscored the word 'instructed' for a reason: I do not believe that Skill Theory provides an accurate account of how languages are learnt in naturalistic environments.

In a nutshell, Skill Theorists observe that every complex task humans learn is made up of several layers of sub-tasks. For instance, driving a car requires a driver to pay attention to the road and take important decisions as to where to turn, how fast to go, when to brake; however, whilst taking these decisions, the driver is carrying out multiple ‘lower-order’ tasks such as changing gear, physically pushing the brakes, operating the indicator, etc.

Skill theorists observe that lower-order tasks are performed subconsciously, without requiring the brain's Working Memory to pay much conscious attention to them (or, as they say, they only occupy subsidiary awareness). This, in their view, points to an adaptive feature of the brain: in order to be able to focus solely on the most important aspect(s) of any complex task, the brain has, throughout evolution, learnt to automatize the less complex tasks.

This is because, according to current models of Working Memory (e.g. Baddeley, 1999), the brain has very little cognitive space to devote to any given task. For instance, when it comes to numbers, Working Memory channel capacity can only process 7 ± 2 digits at any one time (Miller, 1956). In simpler terms, the only way for the brain to multi-task effectively and efficiently is to automatize the less complex sub-tasks.

Fig. 2 – Working Memory as conceived by Baddeley (1999)


Skill Theorists argue that the same applies to language learning. A language learner needs to automatize lower-order skills so as to free up space in Working Memory for more complex tasks requiring the application of higher-order skills. Example: you cannot form the perfect tense if you cannot form the past participle of a verb and have not learnt the verb 'to have'. Hence, the aim of language teaching is to train learners to automatize the knowledge that the instructor provides explicitly (i.e. the knowledge of how a rule works). Once automatized, that knowledge will not require the brain's conscious attention, and the learner will have more space in Working Memory to deal with the many demands a language task poses.

Imagine having to produce a sentence whilst having to think simultaneously (in real time!) about the message you want to convey, the most suitable vocabulary to convey it, tense, verb endings, word order, agreement, etc. – an impossible task for a novice, whose mistakes will be due mainly to cognitive overload. The same task would be fairly easy for an advanced learner, as s/he will have automatized most of the grammar- and syntax-related operations and will only have to focus on the message and on lexical selection.

This automatization process is long; it requires a sustained focus on fluency and lots of scaffolding in the initial phase, and negative feedback (correction) plays an important role in it.

A final point: Skill theorists (e.g. DeKeyser, 1998) propose that Communicative Language Teaching which integrates explicit grammar instruction and a focus on skill automatization constitutes, to date, the most effective ML teaching methodology.

2.2 Skill-Theory principles and their implications for teaching and learning

2.2.1 Principle 1: language skills are acquired in the same way as any other human skill

The main point Skill Theory proponents make is that languages are learnt in much the same way as humans acquire any other skill (e.g. driving a car, cooking, painting). This sets it apart from other influential schools of thought which view language skills as a totally unique set of skills whose functioning is regulated by innate mechanisms that formal instruction cannot impact (the so-called Mentalist approaches). This is a hugely important premise, as it endorses what Applied Linguists call a strong interface position, i.e. the belief that whatever is learnt consciously (e.g. a grammar rule) can become automatized – executable subconsciously – through practice.

2.2.2 Principle 2: In instructional settings where the L2 grammar is taught explicitly, grammar acquisition involves the transformation of Declarative into Procedural knowledge

Whatever we learn is stored in the brain in one of two forms: (1) Declarative knowledge, the explicit knowledge of how things work, which is applied consciously (like knowing all the steps involved in the formation of the perfect tense); (2) Procedural knowledge, the knowledge we acquire by doing and use to perform a specific task automatically, without thinking (like knowing how to ride a bike).

Example: I have declarative knowledge of the English perfect tense when I can explain the rule of its formation and application. I have procedural knowledge of it when I can use it without knowing the rule (e.g. because I have picked it up whilst listening to English songs or interacting with English native speakers).

Declarative knowledge has the advantage of generative power: if I learn the rule of perfect tense formation for French regular verbs, I will be able to apply it to every single regular verb I come across. Procedural knowledge acquired by rote, on the other hand, is limited to the specific perfect forms I have learnt.

An advantage of Procedural knowledge is that it is fast. A beginner who was taught ten perfect verb forms by rote can apply all of them instantly, without thinking. Another beginner, who was taught the rule of perfect tense formation, will have to apply each step of the rule one by one, which will slow down production.

According to Skill Theorists, the aim of any skill instruction, including Modern Language teaching, is to enable Declarative knowledge to become Procedural (or automatic). In the context of grammar learning, this means that a target rule which is initially applied slowly, step by step, occasionally referring to conjugation tables, will eventually be applied – after much practice of the kind described in 2.2.6 below – instantly, at little cost in terms of Working Memory processing efficiency.

It should be noted that our students pick up Procedural knowledge all the time in our lessons when we teach them unanalysed chunks such as classroom instructions or formulaic language. Whilst teaching such chunks should not be discouraged, Skill Theorists do believe that, in view of their limited generative power, instruction should not excessively rely on rote learning.

2.2.3 Principle 3: The human brain has limited cognitive space for processing language, so it automatizes lower-order receptive and productive skills in order to free up space and facilitate performance

When we learn to drive, we need to learn basic skills such as how to switch on the engine, change gear, press the clutch, turn on the wipers and operate the brakes before we actually take to the road. Once these lower-order operations and skills have been automatized, or at least routinized to the extent that we do not have to pay attention to them (they by-pass Working Memory's attentional systems), we can safely assume that we can focus wholly on the higher-order skills which allow us to take the split-second decisions that prevent us from getting lost, crashing into other cars or breaking traffic laws – all whilst dealing with our children messing about in the back seat.

This is what the brain does, too, when learning languages. Because Working Memory has very limited space available when executing any task, the brain has learnt to automatize lower-order skills so that, by being performed 'subconsciously', they free up cognitive space. So, for instance, if I am an advanced L2 speaker who has routinized accurate L2 pronunciation, grammar and syntax to a fairly high degree, I will be able to devote more conscious attention (Working Memory space) to the message I want to put across. On the other hand, if I still struggle with pronunciation, word order, irregular verb forms and sequencing tenses, most of my attention will be taken up by the mechanics of what I want to say rather than the meaning; this will slow me down and limit my ability to think through what I want to say, due to cognitive overload.

In language teaching this important principle translates as follows: in order to enable our students to focus on the higher order skills involved in L2 comprehension and production we need to ensure that the lower-order ones have been acquired or performance will be impaired. Here are a few scenarios which illustrate what I mean.

Example 1: a student who struggles with pronunciation and decoding skills in English (i.e. being able to match letters and combinations of letters with the way they are sounded) will find it difficult to comprehend aural input from an English native speaker, as they will not be able to match the words they hear with the phonological representations they have stored in their brain. Hence, listening instruction ought to concern itself with automatizing those skills first (read here why and how).

Example 2: for a student who has not routinized masculine, feminine and neuter endings in German, applying the rules of agreement in real-time talk will be a nightmare. The same student will take forever to write a sentence containing a few adjectives and nouns, because his brain's (Working Memory's) capacity will be taken up by decisions such as what agrees with what, what the correct ending is and what the word order is; by having to deal with these lower-order decisions, s/he will lose track of the higher-order issue: generating a meaningful and intelligible sentence.

Example 3: if you teach long words (e.g. containing three syllables or more) to a beginner who has not automatized the pronunciation of basic target language phonemes, their Working Memory will struggle to process them (because of Phonological Loop overload), which will impair rehearsal and their commitment to Long-Term Memory.

Example 4: you cannot hope for a student of French or Italian to acquire the Perfect tense if they have not automatized the formation of the verbs 'to be' and 'to have' and of the Past Participle. Yet we often require our students to produce Perfect tense forms under time constraints a few minutes after modelling the formation of the Past Participle.

Hence, teaching ought to focus, much more than it currently does, on the automatization of lower-order skills (or micro-skills, as we may also call them) across all four language skills. In this sense, progression within a lesson should mainly refer to our students' ability to produce the target L2 item with greater ease, speed and accuracy (horizontal progression), rather than to moving from one level of grammatical complexity to a higher one – from using two adjectives in a sentence to using five, or from using only one tense to using three (vertical progression).

The progression I believe teachers should prioritize is of the horizontal kind. We should concern ourselves with vertical progression only if and when horizontal progression has achieved automatization of the target L2 item.

Most of the failures our students experience in our lessons are due to focusing on vertical progression too soon, mostly because of teachers' rush to cover the syllabus and/or ineffective recycling.

2.2.4 Principle 4: Acquisition is a long, painstaking process whose end-result is highly routinized, consistently accurate performance (which approximates, but rarely matches, native-speaker performance)

Automatization is a very long process. Think about a sport, hobby or other activity you excel at: how long it took you to get there; how much practice, how many mistakes, how much focus. Every skill takes huge amounts of practice to be automatized, lower-order skills usually taking less time than higher-order ones as they require simpler cognitive operations (there are exceptions, though – e.g., in language learning, the acquisition of rules governing items which are not salient, such as articled prepositions in French, Spanish or Italian).

The process is long for a reason: when a given L2 grammar rule is fully acquired, it gives rise to a cognitive structure (called a 'production' by Anderson, 2000) which can never be modified. As a result, the brain is very cautious and requires a lot of evidence that whatever rule we apply in our performance is correct. Hence we need to use a specific grammar rule many times, and receive lots of positive feedback on it, before a permanent production is formed and incorporated.

Do not forget, also, that when a learner is figuring out whether their grasp and usage of a given L2 grammar rule is correct, s/he might entertain two or even more hypotheses about how it works and try them out concurrently, awaiting positive or negative feedback to confirm or discard them. Hence, the brain needs to make sure that one of the hypotheses it is testing about how a given language item works 'prevails', so to speak, substantially over the others before 'accepting' to incorporate it as a permanent structure. In the absence of negative feedback – hence the importance of correction, especially in the initial stages of instruction – the brain might store more than one form.

Example: a student keeps using (1) 'j'ai allé' and (2) 'je suis allé' alternately to mean 'I went' in French; if he does not heed or receive regular corrective feedback pointing to (2) as the correct form, and does not use (2) in speaking and writing often enough to routinize it, (1) and (2) will continue to compete for retrieval in his brain.

2.2.5 Principle 5: The extent to which an item is acquired depends largely on the range and frequency of its application (i.e. across how many contexts I can use it accurately and automatically)

A tennis player who can perform a backhand shot only from one specific point of the court cannot be said to have mastered the backhand. Evidently, the more varied and complex the linguistic and semantic contexts in which I can successfully apply a given grammar rule or vocabulary item, the greater the extent of its acquisition.

Example: whilst learning the topic 'animals', student X has practised the word 'dog' for three weeks over and over again, but only in the contexts 'I have a dog', 'my dog is called Rex', 'Mark has a dog', 'I like dogs because they are cute and playful', 'we have a dog in the house'. Student Y, on the other hand, has been given plenty of opportunities to practise the word 'dog' in association with all the persons of the verb 'to have', with many more verbs (e.g. feed, groom, love, walk, etc.), with a wider range of adjectives new and old (good, bad, loyal, funny, lazy, greedy, etc.) and with other nouns (I have a dog and a turtle, a dog and a cat, etc.). Student Y will have built a more wide-ranging and complex processing history for the word 'dog', which will warrant more neural associations in Long-Term Memory and, consequently, greater chances of future recall and transferability across semantic fields and linguistic contexts.

Consequently, language teachers must aim at recycling each core target item across as many linguistic and semantic contexts as possible. For instance, if I am teaching the perfect tense in term 3 and have covered four different semantic areas prior to that, I would ensure that that tense is recycled across as many of those areas as possible. In a nutshell: the extent to which the target L2 items are acquired by our students will largely be a function of their processing history with those items.

In conclusion, the more limited the input we provide our students with and the output we demand of them, the less deeply we are likely to impact their learning.
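
To make the idea of a 'processing history' more concrete, here is a minimal sketch in Python. It is purely illustrative: the word pools are invented, and verb conjugation is deliberately ignored. It shows how recycling one core word across even small pools of subjects, verbs and adjectives multiplies the contexts in which that word is reprocessed:

```python
from itertools import product

# Invented core item and recycling pools (illustrative only).
core_noun = "dog"
subjects = ["I", "you", "we", "my mother", "Mark"]
verbs = ["have", "feed", "walk", "love", "groom"]  # kept uninflected for simplicity
adjectives = ["loyal", "funny", "lazy", "greedy", "playful"]

# Every combination is a distinct linguistic context in which the core
# item is reprocessed, widening the learner's processing history for it.
contexts = [f"{s} {v} a {a} {core_noun}"
            for s, v, a in product(subjects, verbs, adjectives)]

print(len(contexts))  # 125 distinct practice contexts from just 16 words
print(contexts[0])    # "I have a loyal dog"
```

Student Y in the example above is, in effect, sampling a much larger slice of this combination space than student X – which is precisely what builds the richer network of associations.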

2.2.6 Principle 6: Acquisition is about learning to comprehend and produce language faster under Real Operating Conditions

The five principles laid out above entail that, for language acquisition to occur, effective teaching must aim at enabling learners to understand and produce language under real-life conditions or, as Skill Theorists say, 'Real Operating Conditions' (ROC). This shifts the focus of instruction from simply passing on the knowledge of how grammar works and what vocabulary means (Declarative knowledge) to enabling students to apply it quickly and accurately (Procedural knowledge) by providing lots of training in fluency. Hence, for grammar to be acquired, we must go beyond lengthy grammar explanations, gap-fill exercises and quizzes: students must be asked to use the grammar in speaking and writing under time pressure.

Training students to be fluent across all four skills means scaffolding instruction much in the same way as one would in tennis or football coaching. First, one starts by working on automatizing the micro-skills, as already discussed above. Secondly, one focuses on routinizing the higher-order skills by providing initial, highly structured support which is gradually phased out. This translates, in my classroom practice, into the following four phases:

(1) An initial highly controlled phase, which includes modelling, receptive processing and structured production – During this phase the target L2 item is practised in a controlled environment. The phase starts with lots of comprehensible input through the aural and written medium. The target grammar/vocabulary is recycled extensively before the students engage in production.

A structured production phase ensues. The input given and the output demanded are highly controlled, and the chances of error are minimised by providing lots of scaffolding (e.g. vocabulary lists, grammar-rule reminders, writing mats, dictionaries) and guidance, and by imposing no time constraints. Example (speaking practice in the present tense): a highly structured role-play in the present tense only, where the students have to translate their respective lines from the L1 into the L2 or are given very clear L1 prompts; the language is simple, the students are very familiar with the verbs to be conjugated, and verb tables are available on the desk.

(2) A semi-structured expansion phase – This phase is about consolidation and recycling, and cuts across all the topics subsequently taught. So, for instance, if one has introduced the French negatives in Term 1 under the topic 'Leisure', one will recycle them throughout the subsequent terms as part of the topics taught in those terms, for as long as the teacher sees fit. This will ensure that the target structure/vocabulary is systematically recycled in combination with both old and new material.

During this phase, the support is gradually reduced. The input provided and the output expected are more challenging, but the teacher still designs the activities with a specific set of vocabulary and grammar structures in mind. Some form of support is still available. Example (speaking practice in the present tense): an interview in the present tense across a range of familiar topics. Prompts for questions and answers are provided by the teacher (in the L1 or L2). The students are given some time to look at the prompts and think about their answers. Prompts look like this:

Partner 1: ask where Partner 2 usually goes at the week-end

Partner 2: answer providing three details of your choice relating to sport

This phase ends when the teacher feels the students can produce the target structure/vocabulary without support.

(3) An autonomous phase – Here the support is removed. Examples (speaking practice in the present tense): (1) students are shown pictures and are recorded and assessed as they describe them; the task may elicit a degree of creativity and the use of communication strategies to make up for gaps in vocabulary; (2) students are asked to have a conversation about the target topic with only a vague prompt as a cue (e.g. 'talk about your hobbies'); they generate questions and answers impromptu, under time constraints, and the conversation is recorded and assessed.

(4) A routinization phase – In this phase, the only concern is speed of delivery. The teacher focuses on training the students to produce language 'fast', under R.O.C. (Real Operating Conditions), i.e. real-life conditions, across various topics and in spontaneous conversation. In this phase the production activities of choice will be oral translation drills and communicative activities (e.g. general conversation, simulations, more complex picture tasks) under time constraints. The tasks will not limit themselves to topic X or Y; rather, they will tap into various areas of human experience at once.

It must be stressed that the four phases above may stretch over a period of several months.
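
For readers who find a compact overview helpful, the four phases can be summarised as a simple planning structure. The Python sketch below is illustrative only: the phase names and example tasks come from this post, whilst the numeric support and time-pressure levels are my own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    support: int        # scaffolding available: 3 = maximal, 0 = none
    time_pressure: int  # 0 = untimed, 3 = Real Operating Conditions
    example_task: str

# The four phases described above: support falls away as time pressure grows.
SEQUENCE = [
    Phase("controlled", 3, 0, "structured role-play with verb tables on the desk"),
    Phase("semi-structured expansion", 2, 1, "interview with L1/L2 prompts"),
    Phase("autonomous", 1, 2, "picture description, recorded and assessed"),
    Phase("routinization", 0, 3, "cross-topic conversation under time constraints"),
]

def next_phase(current: str) -> "Phase | None":
    """Return the phase following `current`, or None once routinization is reached."""
    names = [p.name for p in SEQUENCE]
    i = names.index(current)
    return SEQUENCE[i + 1] if i + 1 < len(SEQUENCE) else None

print(next_phase("controlled").example_task)  # -> interview with L1/L2 prompts
```

What the structure makes visible is the inverse relationship at the heart of the sequence: as scaffolding is withdrawn, time pressure ramps up towards Real Operating Conditions.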

3. Concluding remarks

A lot of L2 teaching nowadays concerns itself with passing on grammatical and declarative knowledge of the target language. Such knowledge stays declarative in our students' brains because, way too often, teachers are obsessed with vertical progression at all costs. This attitude, though, short-circuits and straitjackets learning, preventing learners from truly automatizing the grammar structures and vocabulary we aim to teach them.

L2 students' failure to acquire what we teach them, and eventually their disaffection with the learning process, is often due to the inadequate amount of horizontal progression we allow for in our classrooms. Automatization ACROSS ALL FOUR SKILLS – the ability to apply the core L2 items in the performance of tasks rapidly, fluidly and accurately – should take priority in the classroom over activities which build intellectual knowledge (e.g. lengthy grammar explanations and gap-fills), concern themselves with producing artefacts (e.g. iMovies) or simply entertain (e.g. games and quizzes).

Grammar is currently taught in many classrooms through teacher-led explanations followed by gap-fills. This does not lead to automatization and fluency. Grammar structures ought to be taught in the context of interaction which mimics real life: first through (highly structured) communicative drills, then through activities which allow the students increasing creativity and freedom in terms of output choice.

Vocabulary ought to be recycled through as many linguistic contexts as possible, shying away from the almost behaviouristic tendency one observes in many language classrooms to teach and practise the target words in isolation, or almost exclusively in the same unambitiously narrow range of phrases (a tendency encouraged by current ML textbooks and many popular specialised websites, e.g. the tragically unambitious Linguascope).

In conclusion, effective ML teaching, as viewed by Skill Theory, concerns itself with:

  • the micro-skills needed by the students to carry out the complex tasks teachers often require them to perform. In many contexts, e.g. listening instruction, such micro-skills (e.g. decoding skills) are grossly neglected, often leading to failure and learner disaffection;
  • providing the students with opportunities to automatize everything they are taught before the class moves on to another set of grammar rules, vocabulary or learning strategies;
  • building a wide-ranging processing history so that many neural connections are built between a new target item and as many ‘old’ items as possible through real-time language exposure/use;
  • fluency, i.e. the ability to perform each target L2 item as rapidly and accurately as possible;
  • skill-building rather than knowledge-building. Knowledge-building is only the starting point of acquisition; that is why error correction that merely informs the learner of the error and cryptically states the rule is considered to have very limited impact on learning.

For those interested in finding out more, please check out this online article by Jensen (2007) [click on the rectangular download button].

References and suggested bibliography

Anderson, J.R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94(2), 192-210.

Anderson, J.R. et al. (1994). Acquisition of procedural skills from examples. Journal of Experimental Psychology, 20, 1322-1340.

DeKeyser, R.M. (1998). Beyond focus on form: Cognitive perspectives on learning and practicing a second language grammar. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 42-63). New York: Cambridge University Press.

Jensen, E. (2007). Introduction to brain-compatible learning (2nd edn). Thousand Oaks, CA: Corwin Press.

Johnson, K. (1996). Language Teaching and Skill Learning. Oxford: Blackwell.

Schneider, W. & Shiffrin, R.M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84(1), 1-66.

10 common shortcomings of secondary curriculum design and textbooks in the UK


Please note: this post was written in collaboration with Steve Smith of http://www.frenchteacher.net. Many thanks to Dylan Vinales of Garden International School, too, for the thought-provoking discussion we had on the topic prior to writing this.

Introduction

In this post I will concern myself with issues in typical secondary school MFL curriculum design as evidenced by the schemes of work – and the textbooks these are often based on – which in my view seriously undermine the effectiveness of foreign language instruction in many British secondary schools.

Effective curriculum design is as crucial to successful MFL instruction as effective classroom delivery, and must be based on sound pedagogy and skilful planning. As I intend to discuss in this post, much curriculum planning and textbook writing flouts some of the most fundamental tenets of sound foreign language pedagogy and neglects important dimensions of language acquisition. Although Steve Smith of www.frenchteacher.net – with whom I am currently writing 'The MFL teacher handbook' – noted in his blog that the new editions of some British textbooks are actually addressing some of the issues I am about to discuss, there is still much scope for improvement.

Issue 1 – Coverage vs Time available

Schemes of work are typically over-ambitious, as they often reflect the structure of the textbook adopted; they usually aim to cover a given topic (i.e. a chapter/module in the textbook) in 6-7 weeks. This does not allow the students to truly acquire the target material, especially when it comes to grammar structures. As I have shown in a number of previous posts, the acquisition of grammar structures which involve ending manipulation/agreement and differ substantially from their L1 equivalents may take months to internalize. Another problem is that schemes of work – when based on textbooks – often devote only one or two lessons to each of the five or six sub-topics that make up the unit in hand before moving on to the next sub-topic. This often does not allow for sufficient recycling.

Solution – obvious: teach less but in greater depth; recycle more.

Issue 2 – Fluency: the neglected objective

In previous blogs I pointed out how effective foreign language teaching ought to aim at developing fluency across all four skills, especially in areas where speed of processing is paramount to effective communication: oral interaction and interpersonal writing (e.g. instant messaging). Fluency was defined in a previous post as the ability to produce intelligible oral or written speech in response to a stimulus at high speed. This is a crucial skill for students to develop if we want to enable them to use the target language in the real world, especially in the workplace. Yet fluency rarely – if ever – features explicitly as a goal in UK MFL departments' schemes of work. Hence, teachers neither plan for fluency development nor are they allocated adequate resources and training to teach fluency. Nor do they formally assess it.

Moreover, the issue highlighted in the previous section often works against the attainment of fluency, as rushing through a unit entails neglecting horizontal progression. Without sufficient horizontal progression, fluency cannot be attained.

Solution – Plan for the attainment of fluency. Include activities to develop speech automatization and opportunities for its assessment.

Issue 3 – Topic compartmentalization / Lack of recycling

Schemes of work – even those that are not based on textbooks – rarely recycle adequately. Many colleagues – obviously not language teachers – ask me why I have uploaded over 1,600 teaching resources to http://www.tes.com in two years and why I created a whole website devoted mainly to vocabulary teaching (www.language-gym.com). The answer is that textbooks and schemes of work usually compartmentalize teaching: in term 1a one teaches topic X, in term 1b topic Y, in term 2a topic Z, and so on. Each time a topic or structure is covered, it is rarely consciously and systematically recycled in later units. I have had to produce my own worksheets and online resources to guarantee the necessary recycling; it has paid off, but teachers, overloaded with work as they already are, should not have to do this.

Solution: include in each unit of the scheme of work a section headed 'recycling opportunities', with activities aimed at consolidating old material. Also, make sure that each end-of-unit assessment tests students on material covered in previous units – or even previous years.
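
To illustrate the cumulative-assessment part of this solution, here is a minimal Python sketch. The unit names and items are invented for the example; the 40% recycled share is an arbitrary assumption, not a recommendation from research.

```python
import random

# Invented scheme-of-work extract: unit -> core items taught in it.
units = {
    "1a Leisure":  ["negatives", "present tense -er verbs", "opinions"],
    "1b School":   ["il y a", "adjective agreement", "time phrases"],
    "2a Holidays": ["perfect tense", "weather expressions"],
}

def build_assessment(current_unit: str, recycled_share: float = 0.4) -> list[str]:
    """Current-unit items plus a fixed share of items recycled from earlier units."""
    names = list(units)  # dict insertion order = teaching order
    earlier = [item for name in names[:names.index(current_unit)]
               for item in units[name]]
    k = max(1, round(len(earlier) * recycled_share)) if earlier else 0
    return units[current_unit] + random.sample(earlier, k)

print(build_assessment("2a Holidays"))
# e.g. ['perfect tense', 'weather expressions', 'negatives', 'time phrases']
```

The design choice worth noting is that recycling is built into the assessment itself, so earlier material can never silently drop out of the course.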

Issue 4 – What about communicative functions?

Most UK textbooks and MFL departments more or less explicitly adopt a weak communicative notional/functional syllabus with a variable focus (i.e. functions/notions + grammar). However, they usually patently fail to focus adequately on important communicative functions. A glance at Finocchiaro and Brumfit's (1983) classification of communicative functions (at http://www.carla.umn.edu/articulation/polia/pdf_files/communicative_functions.pdf) will clarify what I mean. Much typical British secondary school teaching focuses mainly on Referential communicative functions and on only a few Interpersonal ones. Many Interpersonal and Imaginative functions are hardly touched on. Moreover, many important Personal functions are grossly neglected too – although, I am sure you will agree, they are crucial in daily life.

In PBL-based schemes of work this issue is worsened by the nature of the approach adopted, which focuses on the attainment of a product rather than on interpersonal communication.

Communicative functions are pivotal to effective target language proficiency. They are way more important than many other things textbooks teach.

Solution: use Finocchiaro and Brumfit’s taxonomy to fill the gaps in this area that you will identify in your schemes of work. Make sure that you recycle functions over and over again throughout the year.

Issue 5 – The 2 neglected word-classes

Textbooks, schemes of work and specialized websites focus mainly on nouns and – tragically – neglect verbs and adjectives, and hence the adverbs which are derived from adjectives. Verbs, as I pointed out in previous blogs, are essential to acquiring a high level of autonomous speaking competence (spontaneous talk). One of the reasons for this neglect, I suspect, is that state-school English learners are notoriously bad at conjugating verbs; hence, textbooks dumb down their comprehensible input and target vocabulary by including only a few essential and often more 'learnable' verbs.

Solution: include lists of target verbs in the schemes of work. Use Quizlet or Memrise to create your own online activities to drill them in (in the infinitive). You could use my verb trainer at www.language-gym.com – the pictures help the students learn the verb meanings as they conjugate – or my Work-outs.

Issue 6 – How about improvisation?

Schemes of work are usually planned around specific topics which, in England, repeat themselves every year – how boring! However, autonomous speaking competence (spontaneous talk) is about being able to talk 'across topics', so to speak: to have a 'natural' conversation with a speaker of the target language which is not bound to a specific topic or sub-topic but touches on different aspects of human life and experience. MFL departments – at least to my knowledge – never really plan for this. Yet nearly everyone these days states that spontaneous talk is high on their agenda.

Solution: every now and then – maybe in between half-terms? – plan one or two lessons entirely dedicated to talking, reading, listening and writing in the target language without being tied down to a specific topic. A very easy-to-set-up task is a general conversation task in which the students ask each other a wide variety of questions covering several topics, including some that have never been covered before – provided the students possess the linguistic tools to talk about them.

Issue 7 – Grammar, the ‘poor sister’

This point is so obvious that I will not dwell on it for too long. British textbooks devote a ridiculously small amount of space to grammar and to its recycling. Teachers have to toil on a daily basis to resource grammar teaching.

Solution: teach more grammar and recycle it to death (see my previous post '16 tips for effective grammar teaching').

Issue 8 – Intercultural competence

Textbooks and schemes of work often include sections about 'La Francophonie' or other facts about the target language civilization. However, one very important dimension of cultural awareness is nearly always missing: how to avoid culture shock and other faux pas and, more generally, how to train students to deal with target language native speakers in a way which is culture-sensitive and can foster effective integration. In an era in which the labour market is so globalized, intercultural competence has become an important lifelong learning skill which our students need to be equipped with.

Solution: cultural awareness teaching should be more about (cross-cultural) skills than about facts.

Issue 9 – Variety of topics

Every year, from year 6/7 to year 11, English teenagers keep learning about the same topics, often relearning the same words. Here again, textbooks play an important role. As I tweeted earlier today, most English textbooks seem to replicate the Metro textbook blueprint.

Solution: try new topics or combinations of topics. Prioritize topics teenagers are really interested in – relationships, entertainment, gadgets, social media, fashion, etc. – rather than household chores or pets…

Issue 10 – Teaching sequences

The 'Metro textbook blueprint' is evident in all its successors, not only in terms of the topics which receive more emphasis but also in the way they sequence grammar structures. In a future post Steve and I will propose how we believe grammar structures should be sequenced, and the rationale for that sequence. There are many things we believe textbook writers and curriculum designers in the UK should change. One that springs to mind is modal verbs (e.g. vouloir, pouvoir, devoir in French). One wonders why they are always introduced quite late, when they are so important in everyday communication and have such high surrender value. Imagine how handy they can be to a beginner, before they even start conjugating verbs, followed as they are by infinitives. Moreover, acquiring them earlier would partly address Issue 5 by enabling the students to use many verbs at will quite easily.

Solution: consider the surrender value and learnability of the target grammar structures. Would learning them earlier or later facilitate acquisition? If so, don't wait for the textbook sequence to teach them.

Conclusion

Some of the shortcomings in typical secondary school MFL curriculum and course-book design I have just discussed are much more important than others. My pet hates are the lack of recycling, the insufficient focus on oral fluency, the neglect of verbs and adjectives, and the sketchy and superficial approach to grammar. The reader should note that I have deliberately not dealt with the teaching of lifelong learning skills: MFL teacher contact time being so limited, I believe most of them are best taught explicitly, separately from the foreign language curriculum – unless, of course, they overlap with the aims of the course (e.g. independent enquiry, problem solving, intercultural communication, effective communication, empathy, resilience).

Your greatest priority as a curriculum designer – and every teacher is one, to a certain extent – should definitely be the systematic recycling of the target vocabulary, grammar and communicative functions, and the allocation of sufficient time for deep encoding to occur. This will entail doing away with the one-chapter-per-half-term approach, a tragic legacy of the Metro-based schemes of work.

Six writing research findings that have impacted my teaching practice


Every now and then I post concise summaries of research findings from studies I come across in my quest for empirical evidence which supports or negates my intuitions and experiences as a language teacher and learner. As I mentioned in a previous post ('Ten reasons why you should not trust ground-breaking educational research'), much of the research evidence out there is far from conclusive and irrefutable, due to flaws in design, data elicitation and analysis procedures which often undermine both its internal and external validity. However, when three or more reasonably well-crafted studies (however small) find concurring evidence which challenges commonly held assumptions and/or resonates with our own 'hunches' or experiences about teaching and learning, it is reasonable to assume that there is no smoke without fire.

The following studies have been picked based on the above logic. They are small and less than perfect in design, but they do reflect my professional experience and indicate that the validity of some dogmata many teachers hold about language teaching and learning may be questionable.

1. Baudrand-Aertker (1992) – Effects of journal writing on L2-writing proficiency

Twenty-one third-year students of French at a high school in Louisiana were asked to keep a journal over a nine-month period. They were required to write at least two entries per week and were not engaged in any other type of writing task for the duration of the study. The teacher responded to the students' journal entries focusing only on content, not on form. Using a pre-/post-test design, Baudrand-Aertker found that:

  • The students’ written proficiency improved significantly as evidenced by the post-test and their own perception;
  • The students felt that the journals helped them improve their overall mastery of the target language;
  • The students reported positive attitudes towards the activity;
  • The vast majority of the students did not want to be corrected on their grammatical mistakes when engaging in journal writing.

Although this study has an important limitation – there was no control group against which to compare the effects of the independent variable – I find the results interesting, and I intend to give journal writing a try myself next year.

2. Cooper and Morain (1980) – Effects of sentence-combining instruction

The researchers investigated the effect of grammar instruction involving sentence-combining tasks on the essay writing of 130 third-quarter students of French. The subjects were divided into two groups: the experimental group received 60 to 150 minutes of instruction per week through sentence-combining exercises, whilst the control group was taught 'traditionally' through workbook exercises. The experimental group outperformed the control group on seven of the nine measures of syntactic complexity adopted. Although the study looked only at syntactic complexity rather than at the overall quality of the informants' essays, its findings are very interesting and have encouraged me to incorporate sentence-combining tasks more regularly into my teaching. Here is a discussion of the merits of sentence-combining instruction and how it can be implemented.

3. Florez-Estrada (1995) – Effects of interactive writing via computer as compared to traditional journaling

In this small-scale study (28 university students of Spanish), Florez-Estrada compared a group of learners exchanging e-mails and chatting online with native-speaking partners with another group of students engaged in interactive paper writing with their teachers. The researcher found that the computer group outperformed the control group on the accuracy of key grammar points such as preterite vs imperfect, 'ser' vs 'estar', 'por' vs 'para' and others. The findings of this study were echoed by another study of 40 German students, Itzes (1940), which involved students chatting with one another via computer in the TL. A notable feature of this study is that the students chose the topics they wanted to chat about. These two studies confirm findings from my own practice: I often use Edmodo or Facebook to create a slow, student-initiated chat on a given topic in which the whole class is involved, every student sharing their opinions/comments with their peers with the assistance of dictionaries. I have found this activity very beneficial even with groups of less able learners.

4. Nummikoski (1991) and Caruso (1994) – Effects of extensive L2 reading on L2 writing proficiency as contrasted with written practice

Both studies investigated whether L2 learners engaged in extensive L2 reading (with no writing instruction/practice) write more effectively than L2 learners who are involved in writing tasks but do no reading. The results of both studies show a significant advantage for the writing-only condition. These studies, which are by no means flawless, do challenge the commonly held assumption that we can improve our students' writing proficiency simply by engaging them in extensive reading.

5. Martinez-Lage (1992) – Comparison of focus-on-form with focus-on-form-free writing

The researcher investigated the impact of two writing-task types on the written output of 23 second-year university students of Spanish. The same students were asked to write (a) typical assigned compositions and (b) dialogue journals in which they were told they would not be assessed on grammatical accuracy. The surprising finding was that syntactic complexity was equivalent across both task types, but the focus-on-form-free task type (journal writing) was grammatically more accurate. I concur with Martinez-Lage on this one, as I have tried this strategy myself with many of my AS groups over the years.

6. Hedgcock and Lefkowitz (1992) – Effects of peer feedback on L2 writing

The researchers studied 30 students in an accelerated first-year college French class who wrote two essays involving three separate drafts each. The experimental group engaged in peer feedback (essays were read aloud to each other and oral feedback was given), whilst the other group received written teacher feedback. Both groups made significant improvements from the first to the second essay, but in different areas: the peer-feedback group got worse in grammar but did better on content, organization and vocabulary; the teacher-feedback group, exactly the opposite. It should be noted that a previous study by Piasecki (1988), which adopted a very similar design but lasted much longer (8 weeks) and involved 112 third-year high school students of Spanish, found no significant differences between the two conditions. This confirms my reservations about using peer feedback as a blanket corrective strategy; in my opinion it may work quite well with certain groups of individuals with highly developed grammar knowledge and critical-thinking skills, but not with others.

What is the most effective approach to foreign language instruction? – Part 1


Introduction – Of metaphors teachers live by and pedagogy ‘evangelists’

Every single one of us lives by metaphors, behavioural templates which we acquire through our interaction with the environment we grow up and live in. The language learning metaphors that are at the heart of our teaching come to a large extent from our experiences as language learners. These images of learning are so strongly embedded in our cognition that, according to researchers, it takes years of training and teaching practice to replace them with new templates; in certain cases, they are even impervious to ‘conditioning’, despite the demands of teacher trainers, course administrators or students – I have observed this phenomenon first-hand time and again in most of the schools I have worked at.

Our beliefs about L2 learning play an enormous role in determining what teachers we will become and how we respond to any new methodology we are asked to adopt. Some individuals will reject new instructional approaches in the belief that, since they are good linguists and their teachers’ approach worked so well for them, it must work for their own students too. Others – as I did, for instance, during and after my PGCE – will integrate elements of their existing belief system with the new methodology or methodologies to create a sort of personalized ‘hybrid’ – a ‘syncretistic’ approach. Others still – what I call the ‘radical converts’ – will espouse the new methodology with some kind of fanaticism, often becoming zealous evangelists of their new pedagogic ‘dogmata’.

It is the third attitude that one must be wary of: the blind allegiance to any approach that claims to have found a universal pedagogical fit for every learner. Any such claim will be unfounded, because every learner brings to bear on the learning process a range of genetic and acquired individual variables that play an important role in language aptitude as well as in the cognitive/emotional response to teachers and their methodology. Whilst some guiding principles may be ‘universal’, in that they refer to general mechanisms that regulate human cognition across age, race, gender, general intelligence and language aptitude, their implementation will ALWAYS be conditioned by contextual variables.

Consequently, I am not going to play the ‘know-all L2-pedagogue’ here and tell teachers what the best approach is. After all, if your students are happy, motivated and learning lots, you have found the best approach already. You may want to enhance and vary your repertoire of teaching strategies, but if the vast majority of your students are getting where you want them to be in the time and with the resources that your course administrators have allocated, you do not need anyone to tell you how to teach; unless someone throws a spanner in the works, that is, and tells you that you must ‘integrate’ new technology, life-long learning skills, etc. into your healthy and balanced teaching ecosystem…

Psychology, however, does give us some clear indications of how humans acquire cognitive skills. So, if one believes, as it is logical to presume, that language acquisition involves the same processes and mechanisms as the acquisition of any other cognitive ability, it is possible to identify some core pedagogical principles as crucial to any form of explicit foreign language instruction. Moreover, there is some sound empirical evidence out there that should inform our teaching; to claim that it is conclusive and irrefutable would be preposterous, but to ignore it because it is not would be irresponsible. After all, what teachers must do with research evidence is make an informed choice, asking themselves: do these findings resonate with me and my past experiences? Is it worth trying this out? And, after trying it out: did it work? If it didn’t, you can modify it or reject it altogether and look elsewhere.

Thirteen pedagogic principles rooted in Cognitive psychology

The following are the pedagogical principles rooted in Cognitive psychology theory and research that have worked for me. I am no evangelist, thus I am not positing them as Gospel truth: these are merely some of the beliefs I have formed in more than two decades of primary, secondary and tertiary MFL teaching, researching and, most importantly, reflecting on my own practice and listening to my students.

I am not concerning myself explicitly with the most important issue – motivation. It goes without saying that no methodology will ever be effective unless the teacher brings about a high level of cognitive and emotional arousal in his/her learners and develops their self-efficacy.

Finally, let me reiterate that the principles below are based on the epistemological assumption that language skills are acquired in the same way as any other cognitive human skill.

  1. Practice makes perfect – Every language skill and item, in order to be acquired, is subject to the ‘Power Law of Practice’ (Anderson, 2000). Hence Listening, Speaking, Reading, Writing, Translation/Interpreting, Grammar and any other skills must all be practised extensively. This entails that any instructional approach (e.g. Grammar Translation and PBL) which does not emphasize all four skills in a balanced manner is defective. Instruction can be successful only through extensive practice and recycling of the kind envisaged in the next two points.
  2. Recycling must start from day one – Forgetting starts occurring immediately after a given item has passed into Long-term Memory (Anderson and Jordan, 1998). As the diagram below clearly shows, after 19 minutes one loses 40% of what was recalled at time 0; after 9 hours, 56%; and after 6 days, 75%. Recycling is imperative and must be of the spaced, distributed kind (a bit every so often), not of the massed kind (a lot of it once a week). Moreover, recycling must start on the same day something has been learnt (see the short scheduling sketch after this list). Instruction must model independent vocabulary learning habits which focus on autonomous recycling; it must also be mindful of human forgetting rates and provide for consolidation accordingly.

[Figure: Ebbinghaus-style forgetting curve]

  3. Effective language learning = high levels of cognitive control – A language item can be said to be acquired only when it can be performed accurately and efficiently (with little hesitation) under real-time conditions in unmonitored execution (e.g. spontaneous conversation). This means that acquisition occurs along a conscious-to-automatic continuum; it starts from a declarative stage, where knowledge about a specific language item is applied slowly under the brain’s conscious control, and it ends when the execution of that item is fully automatic and bypasses working memory (Johnson, 1996). Instruction must involve extensive practice which starts with highly structured tasks (e.g. gap-fills or audiolingual drills) which become increasingly less structured with time, and must aim at developing cognitive control (the ability to perform effectively in real operating conditions).
  4. Production should always come after extensive receptive processing – Humans learn languages by imitating others’ linguistic input. Instruction should engage learners in masses of receptive practice before engaging them in production. Thus, ideally, extensive listening/reading practice (by way of comprehensible input) should always precede speaking/writing practice. This rules out reading or listening comprehension tasks as valuable receptive practice, as these are tests, not effective sources of modelling; reading/listening for personal enjoyment or enrichment is more conducive to learning in this regard.
  5. Cognitive overload should be prevented and controlled for – Cognitive overload occurs when learners are engaged in tasks that place demands on their working memory which exceed its capacity. Teachers ought to prepare their students for a given task by facilitating their cognitive access to each level of challenge posed by that task. Thus, before reading a challenging text, the learners should be taught the key vocabulary and grammar points it contains and effective strategies to tackle it. Moreover, the text could be adapted to incorporate more contextual clues that may facilitate inference of unfamiliar lexis.
  6. Focus on micro-skills as much as you do on the macro ones – To execute any task in the L2 (e.g. an unplanned role-play) effectively, the brain must acquire effective cognitive control over both the higher meta-components (e.g. generating meaning) and the lower-order skills involved (e.g. pronunciation and intonation). By automatizing lower-order language skills, the brain frees up space in the learner’s Working Memory, thereby facilitating processing efficiency, cognitive control and, consequently, performance – just as a driver automatizes basic skills such as changing gear or accelerating so that s/he can focus on the road. Instruction must identify and systematically address every set of macro- and micro-skills that typical language tasks involve. Following on from (2), such micro-skills must be practised extensively, too.
  7. Learning is enhanced by depth of processing, distinctiveness of input and personal investment – Learning any language item does not simply involve practice, but also depth of processing. Instruction must engage learners in semantic analysis and association in order to strengthen the memory trace and to increase the range of context-dependent cues at encoding, which will enhance the recall of any target item. The distinctiveness of instructional input (how outstanding and memorable it is) is also an important learning-enhancing factor. Personal investment – how much the learning taps into an individual’s emotions and personal background – increases retention, too. Hence, in choosing topics and learning materials, learner opinions and tastes should always be taken into account (e.g. personalized reading-for-enjoyment activities).
  8. Grammar taught explicitly can be acquired – On condition that it is practised extensively, in context, and through masses of communicative practice which starts from controlled tasks and progresses through increasingly challenging unstructured ones. The process is a lengthy one, so it may require training students to work on it independently, too. Implications: recycling is imperative and must occur mostly through the cognitive-control-enhancement dimension, i.e. fewer gap-fills and written translations and more oral semi-structured and unstructured tasks. To enhance grammar acquisition, the exceptions to the rule governing an ‘X’ structure should be taught before the dominant rule, e.g. irregular before regular forms (see my article ‘Irregular before regular…’ for the psycholinguistic rationale for this approach).
  9. Corrective feedback is important, especially at the early stages of instruction – However, in order to be effective it must be processed by the brain long and deeply enough for it to be rehearsed in Working Memory and stored permanently in Long-term Memory. Hence, any feedback on an erroneously executed ‘X’ item must:
  • Be distinctive;
  • Engage learners in deep processing;
  • Recycle the corrective feedback;
  • Be carried out through various means in order to provide more contextual cues for its recall;
  • Not limit itself to treating the symptom (i.e. the error) but also, and more importantly, the root cause (whether lack of knowledge, processing inefficiency, etc.);
  • Bring about learner intentionality to eradicate the error (i.e. motivate them to address the error in the future in a sustained effort to eliminate it).

(Conti, 2004)

  10. Learning strategies can be taught – On condition that a persuasive rationale for their instruction is provided, and that they are modelled and scaffolded effectively and practised very extensively across a variety of contexts (Cohen, 1998; Macaro, 2007).
  11. Metacognition should be modelled regularly – Enhancing learner metacognition is imperative, as a learner who knows how to learn and perform best is bound to be more successful. Research shows clearly that highly metacognizant individuals are more successful at L2 learning (Macaro, 2007). Ideally, teaching should regularly scaffold holistic and task-specific metacognition by prompting students to monitor and evaluate every level of their language learning and performance. The same conditions concisely outlined in the previous point apply here.
  12. Individual variables must be assessed at the beginning of instruction – Learner individual factors may inhibit or facilitate learning. Ideally, at the beginning of instruction it may be helpful (but not always viable, I know…) to obtain as much information as possible about the following learner characteristics:
  • Previous history as language learners;
  • Personality traits;
  • Learning strategies;
  • Learning preferences (NOT learning styles – but rather how one enjoys learning);
  • Language proficiency across all skills;
  • Language aptitude;
  • Personal interests;
  • Processing efficiency (i.e. how well learners process language).

    This is very time-consuming and does require quite a lot of resources and expertise.

  13. Sources of divided attention must be controlled for – This is the most obvious learning principle (Eysenck, 1988); that is why I placed it last. In a lot of UK state school classrooms, expecting every student to be focused 100% of the time is unrealistic. However, in settings where behaviour management is not an issue, teachers should endeavour to minimize any distraction stemming from sources which are directly under their control. One of them is the excessive manipulation of digital media (e.g. app smashing), which hijacks learners’ finite attentional resources away from language processing. Digital media can be effective target language learning enhancers, but they must be used judiciously, to expand, not shrink, learning.
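
To make the spaced-recycling principle (point 2 above) concrete, here is a minimal sketch in Python – my own illustration, not part of any cited study – which generates review dates for an item learnt on a given day, assuming a commonly used expanding-interval pattern (same day, then 1, 3, 7, 14 and 30 days later):

```python
from datetime import date, timedelta

# Expanding review intervals (in days) -- an illustrative pattern,
# not a prescription taken from the research cited above.
INTERVALS = [0, 1, 3, 7, 14, 30]

def review_schedule(first_learnt: date) -> list[date]:
    """Return the dates on which a newly learnt item should be recycled."""
    return [first_learnt + timedelta(days=d) for d in INTERVALS]

if __name__ == "__main__":
    for when in review_schedule(date(2016, 9, 1)):   # an arbitrary example start date
        print(when.isoformat())
```

The exact intervals matter less than the shape: reviews cluster early, when the forgetting curve is steepest, and spread out as the memory stabilises.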

In conclusion, the above list is by no means exhaustive. It only includes some of the many pedagogic principles which, in my opinion, ought to underlie any instructional approach, regardless of the educational setting and espoused theory. Unfortunately, something important is missing: how should one implement the above principles in curriculum design, lesson planning and across all four macro-skills? Some of the answers can be found in the other articles on this blog. More answers will be provided in the sequel to this article in the very near future, in which I will concern myself with how those principles should inform pedagogy vis-à-vis the four macro-skills, grammar, translation and learning strategy instruction.

Nine interesting foreign language research findings you may not know about


In this post I am going to share with the reader a very succinct summary of nine pieces of research I have recently come across which I found interesting and which have impacted my classroom practice in one way or another. They are not presented in any particular order.

  1. Green and Hecht 1992 – Area: Explicit grammar instruction and teaching of aspect

Green and Hecht investigated 300 German learners of English. They asked them to correct 12 errors in context and to offer an explanation of the rule. Most interesting finding: the students could correct 78% of the errors, but could provide a correct explanation of the relevant rule in only 46% of cases. They identified a set of rules that were hard to learn (i.e. most students could not recall them) and a set of easy rules (the vast majority could recall them successfully). Their implication for teaching: the explicit teaching of grammar may actually not work for all grammar items. For example, aspect (e.g. Imperfect vs Preterite in Spanish) would, according to them, be taught more effectively through exposure to masses of comprehensible input (e.g. narrative texts) than through the use of PPTs or diagrams on the classroom whiteboard/screen – in fact, Blyth (1997) and Macaro (2002a) demonstrated the futility of drawing horizontal lines interrupted by vertical ones to indicate that the perfect tense ends the action.

My conclusions: I do not entirely agree with Blyth and Macaro that explicit explanation of grammar in the realm of aspect does not work and I do like diagrams (although they do not work with all of one’s students). However, I do agree with Green and Hecht (1992) that the best way to teach aspect is through exposure to masses of comprehensible input containing examples of aspect in context. The grammar explanation and production phase may be carried out at a later stage.

  2. Milton and Meara (1998) – Comparative study of vocabulary learning between German, English and Greek students aged 14-15 years

197 students from the three countries studying similar syllabi for the same number of years were tested on their vocabulary. The findings were that:

1. The British students’ scores were the worst (averaging 60%); according to the researchers, they showed a poor grasp of basic vocabulary;

2. They spent less time learning and were set lower goals than their German and Greek counterparts;

3. 25% of the British students scored so low (after four years of MFL learning) that the researchers questioned whether they had learnt anything at all.

The authors of the study also found that British learners are not necessarily worse in terms of language aptitude; rather, they questioned the effectiveness of MFL teaching in the UK.

My conclusions: this study is quite old, and the sample used may not be indicative of the overall British student population. If it were representative of the general situation in Britain, though, teachers may have to – as I have advocated in several previous posts of mine – consciously recycle words over and over again, not just within the same unit, but across units.

Moreover, a study of 850 EFL learners by Gu and Johnson (1996) may point to an important issue underlying our students’ poor vocabulary retention; they found that the students who excelled in vocabulary size were those who used three metacognitive strategies in addition to the cognitive strategies used by less effective vocabulary learners: selective attention to words (deciding to focus on certain words worth memorizing), self-initiation (making an effort to learn beyond the classroom and the exam system) and deliberate activation of newly-learnt words (trying out a word independently to obtain positive or negative feedback as to the correctness of its use). Teaching should aim, in other words, at developing learner autonomy and the motivation to apply all of these strategies independently outside the classroom.

  3. Knight (1994) – Using dictionaries whilst reading – effects on vocabulary learning

Knight gave her subjects a text to read on a computer. One group had access to electronic dictionaries whilst the other did not. She found that those who used the dictionary, rather than relying simply on guessing strategies, actually scored higher in a subsequent vocabulary test. This and other previous (Luppescu and Day, 1993) and subsequent studies (Laufer & Hadar, 1997; Laufer & Hill, 2000; Laufer & Kimmel, 1997) suggest that students should not be barred from using dictionaries in lessons. These findings are important for 1:1 (tablet or PC) school settings, considering the availability of free online dictionaries (e.g. www.wordreference.com).

  4. Anderson and Jordan (1998) – Rate of forgetting

Anderson and Jordan set out to investigate the number of words that could be recalled by their informants immediately after initial learning, then 1 week, 3 weeks and 8 weeks thereafter. They identified recall rates of 66%, 48%, 39% and 37% respectively. The obvious implication is that, if immediately after learning the subjects could recall only 66% of the target vocabulary, consolidation should start then and continue (at spaced intervals – through recycling in lessons or as homework) for several weeks. At several points during the school year, I remind my students of Anderson and Jordan’s study and show them the following diagram. It usually strikes a chord with a lot of them:

[Figure: Ebbinghaus-style forgetting curve]
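
For readers who like to play with the numbers, the toy calculation below – mine, not the researchers’ – treats Anderson and Jordan’s recall rates as points on an Ebbinghaus-style exponential curve R(t) = R0 · e^(−t/s) and computes the ‘stability’ s implied by each interval; the fact that s grows with each interval is precisely why expanding, spaced recycling makes sense:

```python
import math

# Anderson and Jordan's (1998) reported recall rates
# (proportion of words recalled) at each retention interval.
R0 = 0.66                                      # immediately after learning
observations = {7: 0.48, 21: 0.39, 56: 0.37}   # days elapsed -> recall rate

# Under R(t) = R0 * exp(-t / s), each observation implies a stability
# s = -t / ln(R / R0); a rising s means forgetting slows down over time.
for days, recall in observations.items():
    s = -days / math.log(recall / R0)
    print(f"after {days:2d} days: recall {recall:.0%}, implied stability {s:.1f} days")
```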

  5. Erler (2003) – Relationship between phonemic awareness and L2 reading proficiency

Erler set out to investigate the obstacles faced by 11-12 year old learners of French as a foreign language in England. She found a strong correlation between low levels of phonemic awareness and poor reading skills (especially word-recognition skills). She concluded that explicit training and practice in the grapheme-phoneme system of French (i.e. how letters and combinations of letters are pronounced) would improve L1-English learners’ reading proficiency in that language. This finding corroborates those of Muter and Diethelm (2001) and Comeau et al. (1999). The implication is that micro-listening enhancers of the kind I discussed in a previous post (e.g. ‘Micro-listening skills tasks you may not do in your lessons’), or any other teaching of phonics, should be performed in class much more often than is currently the case in many UK MFL classrooms.

Please note: teaching pronunciation and teaching decoding skills are not the same thing. Pronunciation is about understanding how sounds are produced by the articulators, whilst teaching decoding skills means instructing learners on how to convert letters and combinations of letters into sound. Also, effective decoding-skill instruction occurs in communicative contexts (whether through receptive or productive processing), not simply through matching sounds with gestures and/or phonetic symbols.
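
To illustrate computationally what ‘converting letters and combinations of letters into sound’ involves, here is a deliberately tiny Python sketch of French decoding. The grapheme-phoneme table is a made-up fragment for illustration only (real French orthography has many more correspondences and exceptions), but the greedy longest-match strategy mirrors what any decoder has to do:

```python
# A tiny, incomplete fragment of French grapheme-phoneme correspondences
# (illustrative only -- real French spelling has many more rules and exceptions).
G2P = {
    "eau": "o", "ou": "u", "ch": "ʃ", "on": "ɔ̃", "j": "ʒ",
    "b": "b", "t": "t", "r": "ʁ", "l": "l", "m": "m", "n": "n",
    "a": "a", "e": "ə", "i": "i", "o": "o", "u": "y",
}

def decode(word: str) -> list[str]:
    """Greedy longest-match decoding: always try the longest grapheme first."""
    phonemes, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):            # longest grapheme in the table is 3 letters
            chunk = word[i:i + size]
            if chunk in G2P:
                phonemes.append(G2P[chunk])
                i += size
                break
        else:
            i += 1                        # skip letters we have no rule for
    return phonemes

print(decode("bateau"))   # ['b', 'a', 't', 'o']
print(decode("chou"))     # ['ʃ', 'u']
```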

  6. Feyten (1991) – Listening ability as predictor of success

Feyten investigated the possibility that listening ability may be a predictor of success in foreign language learning. The researcher assessed the students at pre-test using a variety of tasks and measures of listening proficiency. After a ten-week course she tested them again (post-test) and found a strong correlation between listening ability and overall foreign language acquisition: the students who had scored high at pre-test did better at post-test not just in listening, but also in written grammar, reading and vocabulary assessments. Listening was a better predictor of foreign language proficiency than any other individual factor (e.g. gender, previous learning history, etc.).

My implications: we should take listening more seriously than we currently do. Increased exposure to listening input and more frequent teaching of listening strategies are paramount in the light of such evidence. Any effective baseline assessment at the outset of a course ought to include a strong listening comprehension component; the latter ought to include a specific decoding-skill assessment element.

  7. Graham (1997) – Identification of foreign language learners’ listening strategies

This study investigated the listening strategies of 17-year-old English learners of German and French. Amongst other things, Graham found the following issues undermining their listening comprehension: firstly, they were slow in identifying key items in a text; secondly, they often misheard words or syllables and transcribed what they believed they had heard, thereby getting distracted. Graham’s conclusion was that weaker students overcompensated for lack of lexical knowledge by overusing top-down strategies (e.g. spotting key words as an aid to grasping meaning).

My implication is that Graham’s research evidence, which echoes findings from Mendelsohn (1998) and other studies, should make us wary of getting students to over-rely on guessing strategies based on key-word recognition. Teachers should focus on bottom-up processing skills much more than they currently do, e.g. by practising (a) micro-listening skills; (b) narrow listening or any other listening methodology which emphasizes recycling of the same vocabulary through comprehensible input (N.B. not necessarily through videos or audio tracks; it can be teacher-delivered, in the absence of other resources); (c) listening with transcripts – whole, gapped or manipulated in such a way as to focus learners on phoneme-grapheme correspondence.

  8. Polio et al. (1998) – Effectiveness of editing instruction

Polio et al. (1998) set out to investigate whether additional editing instruction – the innovative feature of the study – would enhance learners’ ability to reduce errors in revised essays. 65 learners on a university EAP course were randomly assigned to an experimental and a control group, both of which wrote four journal entries each week for seven weeks. Whereas the control group did not receive any feedback, the experimental group engaged in (1) grammar review and editing exercises and (2) revision of the journal entries, both of which were followed by teacher corrective feedback. At pre- and post-test, the learners wrote a 30-minute composition which they were asked to improve in 60 minutes two days later. Linguistic accuracy was calculated as the ratio of error-free T-units to the total number of T-units in the composition.

The results suggested that the experimental group did not outperform the control group. The researchers conjectured that the validity of their results might have been undermined by the assessment measure used (T-units) and/or the relatively short duration of the treatment. They also hypothesised that the instruction the control group received might have been so effective that the additional practice for the experimental group did not make any difference.

The implication of this study is that editing instruction may need to last longer than seven weeks in order to be effective. Thus, the one-off editing sessions that many teachers run on finding common errors in their students’ essays are absolutely futile unless they are followed up by extensive and focused practice with lots of recycling.

  9. Elliott (1995) – Effect of explicit instruction on pronunciation

Elliott set out to investigate the effects of improving learner attitudes toward pronunciation and of explicitly teaching pronunciation to his subjects (66 L1 students of Spanish). He compared the experimental group (which received 10-15 minutes of instruction per lesson over a semester) with a group of students whose pronunciation was corrected only when it impeded understanding. The results were highly significant, both in terms of improved accent and of attitude (92% of the informants being positive about the treatment): the experimental group outperformed the control group.

Implications: this study, which echoes evidence from several others (e.g. Elliott, 1997; Zampini, 1994), confirms that explicit pronunciation instruction is more effective than implicit instruction, whereby L2 learners are expected to pick up pronunciation simply through exposure to comprehensible input. Arteaga’s (2000) review of US Spanish textbooks found that only 4 out of 10 included activities attempting to teach pronunciation. I suspect that the figure may be even lower in the UK. In the light of Elliott’s findings this is quite appalling, as mastery of phonology is a catalyst not only of reading ability but also of listening and speaking proficiency, as well as playing an enormous role in Working Memory’s processing efficiency in general (see my post ‘Eight important facts about Working Memory’).

How the brain acquires foreign language grammar – A Skill-theory perspective

Caveat: Being an adaptation of a section of a chapter in my Doctoral thesis, this is a fairly challenging article which may require solid grounding in Applied Linguistics and Cognitive Theories of Skill Acquisition.

1. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed into a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is considered to be governed by the same principles that regulate any other cognitive skill. Scholars such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996) have produced a number of persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to the type of explicit MFL instructional approaches implemented in many British schools: (1) the acquisition of grammatical rules in explicit L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.

Figure 1: The Anderson Model (adapted from Anderson, 1983)

The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory: Working Short-Term Memory (WSTM), Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features discussed in previous posts (see ‘Eight important facts about Working Memory’), while Declarative and Production Memory may be seen as two subcomponents of Long-Term Memory (LTM). The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ and ‘THEN’ conditions. These are activated every single time the brain processes information; whenever a learner is confronted with a problem, the brain searches for a Production that matches the data pattern associated with it. For example:

IF the goal is to form the present perfect of a verb and the person is 3rd singular /
THEN form the 3rd singular of ‘have’

IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /
THEN form the past participle of the verb

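For readers who think in code, the IF/THEN formalism can be made concrete with a few lines of Python. This is a toy of my own devising, not Anderson’s actual implementation (ACT* was a full cognitive architecture): a Production pairs a condition with an action, and an interpretive cycle repeatedly fires the first Production whose condition matches the current goal state:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Production:
    name: str
    condition: Callable[[dict], bool]   # IF: does this production match the state?
    action: Callable[[dict], None]      # THEN: what it does to the state

# Toy state for forming the 3rd person present perfect of 'go'.
state = {"goal": "present_perfect", "person": "3sg", "output": []}

productions = [
    Production(
        "form-have",
        lambda s: s["goal"] == "present_perfect" and s["person"] == "3sg" and not s["output"],
        lambda s: s["output"].append("has"),
    ),
    Production(
        "form-past-participle",
        lambda s: s["goal"] == "present_perfect" and s["output"] == ["has"],
        lambda s: s["output"].append("gone"),   # hard-coded irregular participle of 'go'
    ),
]

# Interpretive cycle: match a production against the state, then fire it.
fired = True
while fired:
    fired = False
    for p in productions:
        if p.condition(state):
            p.action(state)
            fired = True
            break

print(" ".join(state["output"]))   # has gone
```
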
The creation of a Production is a long and careful process since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour, thus the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.

In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.

Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L2-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.

Producing language by these means is extremely inefficient. Thus, the brain tries to sort out the information into more efficient Productions. This is achieved by Compiling (‘running together’) the productions that have already been created so that larger groups of productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process lessens the number of steps referred to above and has the effect of speeding up the process. Thus, the Productions

P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of ‘have’

P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb

would be composed as follows:

P3 IF the goal is to form the present perfect of a verb / THEN form the present simple of ‘have’ and THEN the past participle of the verb

An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.
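
Composition, too, is easy to caricature in code. In this self-contained Python sketch (again my own illustration, with invented helper names), two actions that reliably fire in sequence are collapsed into a single one, mirroring how P1 and P2 above are collapsed into P3:

```python
from typing import Callable

State = dict  # a toy stand-in for the contents of WSTM

def compose(step1: Callable[[State], None], step2: Callable[[State], None]) -> Callable[[State], None]:
    """Collapse two actions that always fire in sequence into one action (P3 = P1 + P2)."""
    def composed(state: State) -> None:
        step1(state)
        step2(state)
    return composed

form_have = lambda state: state["output"].append("has")         # P1's action
form_participle = lambda state: state["output"].append("gone")  # P2's action
form_present_perfect = compose(form_have, form_participle)      # the composed P3

state = {"output": []}
form_present_perfect(state)
print(" ".join(state["output"]))   # has gone
```

Note that, as the paragraph above stresses, a real ACT*-style system would keep P1 and P2 alongside the newly composed Production until the latter has built up sufficient strength.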


The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P2 above would become:

IF the goal is to form the present perfect of a verb /
THEN form ‘have’ and then form the past participle of the verb

The processes of Composition and Proceduralisation will, after repeated performance, eventually produce:

IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’

For Anderson it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. The mechanism through which the brain decides which Productions should be applied in a given context is called by Anderson ‘Matching’. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution – i.e. a Production that matches the pattern of information in WSTM. If such matching is possible, a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘IF’) of a proceduralised Production, the matching will be quicker, with the ‘action side’ (the ‘THEN’) of the Production being deposited in WSTM and made immediately available for performance (execution). It is at this intermediate stage of development that most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.

The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) provides the following example of generalisation from language acquisition, in which P6 and P7 become P8:

P6 IF the goal is to indicate that a coat belongs to me / THEN say ‘My coat’

P7 IF the goal is to indicate that a ball belongs to me / THEN say ‘My ball’

P8 IF the goal is to indicate that object X belongs to me / THEN say ‘My X’

Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process requires that we have examples of both correct and incorrect applications of the Production in our LTM.

Both processes are inductive in that they try to identify, from examples of success and failure, the features that characterise when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.
Since they are inductive processes, Generalisation and Discrimination will sometimes err and produce incorrect Productions. As I shall discuss later in this article, there are possibilities for Overgeneralisation and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.

ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from Procedural Memory and causes an action to be deployed, and there is no negative feedback, the Production becomes more robust. Because it is more robust, it will be able to resist occasional negative feedback and will be more strongly activated when it is called upon:

‘The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production.’ (Anderson, 1983: 251)

Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads, the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, there is a power-law of decay in strength with disuse.
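
The Strengthening mechanism can likewise be pictured in a few lines of Python. This sketch (my own simplification, with invented numbers, not Anderson’s algorithm) shows how a well-entrenched Interlanguage error can out-compete the correct form at retrieval until repeated corrective feedback reverses the balance of strengths:

```python
# Toy conflict resolution by strength (a simplification of Anderson, 1983).
strengths = {"je suis allé": 2.0, "j'ai allé": 3.5}   # correct form vs a stronger entrenched error

def retrieve(candidates: dict[str, float]) -> str:
    return max(candidates, key=candidates.get)        # the strongest production wins the match

def feedback(form: str, ok: bool, rate: float = 0.5) -> None:
    strengths[form] += rate if ok else -rate          # strengthen on success, weaken on correction

print(retrieve(strengths))        # "j'ai allé" -- the entrenched error is retrieved first
for _ in range(4):                # repeated corrective feedback weakens the error...
    feedback("j'ai allé", ok=False)
    feedback("je suis allé", ok=True)
print(retrieve(strengths))        # "je suis allé" -- the correct form now out-competes it
```
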
2. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition

One limitation of the model is that it does not account for the fact that unanalysed L2-chunks of language are sometimes acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue, Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson.

This means that teaching memorised unanalysed chunks can work in synergy with explicit language teaching, as happens in my approach. See my blog post on how I teach lexicogrammar.

Eight important facts about Working Memory and their implications for foreign language teaching and learning


  1. Introduction

There is no blogpost of mine which does not mention Working Memory (WM) at some point. Why? Because effective language processing and learning largely depends on how well Working Memory performs. In fact, apart from automatic processes – which bypass WM’s attentional control – all conscious processing of information (visual, auditory, etc.) occurring in the human brain is performed by WM. Whether our students are reading or listening to target language input, translating a passage into French, planning an essay or performing an oral task it will be WM that does most or all of the work.

Let us consider reading a target language text. It is WM that matches any lexis in the text with its meaning (by retrieving it from Long Term Memory). And what if we struggle with that text? Every single operation the brain performs in an attempt to decode will take place in WM, too. In the case of vocabulary learning, any rehearsal we perform in an attempt to commit the words/phrases we are trying to learn to Long-term Memory (e.g. repeating aloud) will be performed in WM, which will temporarily hold that information for as long as we repeat it. In speaking and writing, all the operations involved in ‘translating’ ideas (or ‘propositions’ as psychologists call them) into words and evaluating their accuracy will occur in WM, too.

These are but a few examples of how cognition occurs in WM. With the above in mind, it goes without saying that knowing how WM works can help foreign language instructors devise strategies to teach more effectively. The following are eight important facts about WM and their implications for L2 learning that all foreign language teachers should bear in mind when planning and delivering the curriculum, assessing and providing feedback on learner performance.

  2. The structure of WM

As the picture below shows, WM, which is located in the prefrontal cortex of the brain, is made up of three main components:

  • A visuospatial (i.e. Graphic/Visual) sketchpad which activates areas near the visual cortex of the brain and allows us to hold images, including the graphic images of words ‘alive’ in WM so that they are available for processing;
  • A phonological loop which uses Broca’s area as a kind of ‘inner voice’ that repeats word sounds to hold them in WM;
  • A central executive which regulates the flow of information in and out of the phonological loop and the visuospatial sketchpad, both as coming from the perceptual organs and from Long-Term Memory. The central executive is basically in charge of orchestrating all the processes occurring in WM.

[Figure: the three components of Working Memory – visuospatial sketchpad, phonological loop and central executive]


So, for example, when we read a target language word or phrase, the visuospatial sketchpad will hold its graphic image, the phonological loop its sound (if we are pronouncing it) and the central executive will match it to any existing information in Long-term Memory in an attempt to make sense of it. If a match is found, the process will stop there; otherwise, if the word/phrase is new, the central executive will call upon a range of interpretive processes as well as resources from Long-Term Memory in order to attempt to decode it.

2.1. There are two distinct memory systems in the human brain

WM is one of two systems which memory is made of. The other one is the ‘place’ along the brain’s neural networks where memories are stored permanently and cannot be deleted unless by disease, physical damage or intervention affecting the prefrontal cortex (Long-Term Memory). It is after rehearsal in WM that information passes into Long-Term Memory.

2.2. WM is a temporary storage ‘facility’                      

Whether it is processing input from the outside world or retrieving material from Long-term Memory, WM will hold any piece of information only for a few seconds. After that, spontaneous decay will set in, unless one makes a conscious effort to keep it there by focusing a considerable amount of one’s attentional resources on it through what we call ‘rehearsal’ (shallow or deep). Distinctiveness (how much it stands out) and high relevance (how much it matters to us) can also cause input to stay in WM longer. This has enormous implications for foreign language instruction and learning across all macro-skills, and for any teaching in general.

Take, for example, oral recasts: the teacher responds to an erroneous utterance by a student by interrupting his/her conversation flow and recasting (i.e. reformulating) his/her utterance correctly. At that point, the student has only a few seconds to process the teacher’s correction (as the correction will very soon decay from Working Memory), notice it and make sense of it, whilst s/he is supposed to restart the conversation or to attend to another student’s input. Research shows that this is unlikely to result in learning unless the student has a much bigger and more efficient WM than average. Should teachers stop recasting? Maybe so, reserving any feedback on, or treatment of, the errors noticed in learner output for later on in the lesson.

Another implication refers to listening. Often MFL students sit through listening tasks which require them to identify details in a text spoken at native-speaker speed. With the above in mind, it is clear how such a task can be a very tall order for novice-to-intermediate learners, as they have to hold on to the information they hear by actively rehearsing it (through the phonological loop) to prevent decay whilst the listening track is still playing. Moreover, if the learner’s pronunciation is not good, s/he will find it very hard to rehearse the information s/he hears, thereby slowing down the whole process. Hence the need for teachers to implement approaches to listening instruction which lessen the cognitive load on learners (e.g. narrow listening) and include a focus on micro-listening skills (see my article on micro-listening enhancers).

There are obviously many more implications for teachers as far as the temporariness of WM storage is concerned – too many to deal with in this article. The most important relates to the distinctiveness of teacher input: the more distinctive (e.g. engaging, outstanding, impressive, particularly funny) teacher input is, the more likely it is to linger for longer than the 1-2 seconds it would normally stay in WM and to pass into Long-Term Memory. That is also why engaging students in the semantic analysis of a target word/phrase (what psychologists call ‘elaboration’) is more likely to result in learning: such analysis, by involving deeper processing, requires the learner to hold the word in WM for longer than 1-2 seconds whilst engaging the brain in higher-order thinking (which strengthens retention).

2.3 WM has limited channel capacity

WM has a very limited capacity or memory span. According to Miller (1956), it cannot contain more than 7 +/- 2 items at the same time (i.e. between 5 and 9). More recent estimates concede that Miller’s number may be true of university populations but not of the average person; they put WM’s capacity at 4 to 5 items at a time. WM’s channel capacity is affected by genetic factors (some individuals’ WM is bigger than others’) and by motivation.

The number of words WM can hold at any given time is phonologically determined (for instance, Chinese speakers can hold more words in WM than English speakers because in Mandarin each word is one syllable). This means that a novice foreign language learner will be able to hold fewer words in WM than s/he does in his/her mother tongue, as s/he will pronounce the words more slowly. The more rapidly a foreign language speaker can utter a word or phrase, the less space in their working memory it will take up.
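
A back-of-the-envelope illustration: if, as Baddeley’s classic work suggests, the phonological loop holds roughly what one can articulate in about two seconds, then span falls directly out of articulation speed. The figures in this sketch are invented for illustration:

```python
# Rough phonological-loop arithmetic (illustrative figures only).
LOOP_SECONDS = 2.0            # approximate capacity of the rehearsal loop

def span(seconds_per_word: float) -> float:
    """How many words fit in the loop at a given articulation speed."""
    return LOOP_SECONDS / seconds_per_word

print(round(span(0.3), 1))    # fluent L1 speaker, short words: ~6-7 items
print(round(span(0.8), 1))    # novice L2 learner articulating slowly: ~2-3 items
```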

The phonology-dependent nature of vocabulary learning and the limitations of the phonological loop also mean that words which are long and contain complex target language sounds cannot be processed efficiently and are therefore not learnt ‘properly’. Hence, work on phonics from the very early days of instruction is paramount.

One implication of this issue for MFL teaching and learning is that, in order to increase MFL learners’ WM processing efficiency in the foreign language, they must receive extensive speaking practice. Such practice will also impact their listening skills in that, as already explained above, whilst listening the learner needs to hold fairly big chunks of target language in his/her phonological loop in order to comprehend the text.

Another implication relates to writing and speaking: novice English-speaking learners will find it hard to produce longer or more complex sentences accurately in languages like French, Italian, German or Spanish, as most or all of their WM’s channel capacity will be taken up by the retrieval of the L2 lexis required to form those sentences, and little space will be left to focus on less salient grammar features such as adjectival and verb endings, function words and syntactic order.

Finally, to enhance learners’ memory span, teachers may want to train students with poorer WM in the use of mnemonics such as the keyword technique or other associative memory techniques. Research shows that, through the effective use of mnemonic strategies, WM’s digit span can be increased even tenfold.

Another strategy to increase WM’s capacity is chunking the target information. This consists in organizing a number of items which would normally be too many for WM to hold into manageable units. An example of this is the way we memorize a phone number: by memorizing 0176324167 as 017 632 4167 we basically reduce 10 units to 3, thereby greatly reducing the cognitive load. Or imagine learning the phrase ‘appareils électroménagers’ – almost impossible for a novice’s phonological loop to cope with. By chunking it into appa / reils / électro / ménagers, even a novice can cope with pronouncing and memorizing it.
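
Chunking is easy to demonstrate in code. The sketch below – purely illustrative – regroups the ten-digit number into three chunks and the French phrase into syllable-sized chunks, reducing the number of units WM has to hold:

```python
def chunk(items: str, sizes: list[int]) -> list[str]:
    """Split a string into consecutive chunks of the given sizes."""
    out, i = [], 0
    for size in sizes:
        out.append(items[i:i + size])
        i += size
    return out

print(chunk("0176324167", [3, 3, 4]))                   # ['017', '632', '4167'] -- 10 units become 3
print(chunk("appareilsélectroménagers", [4, 5, 7, 8]))  # ['appa', 'reils', 'électro', 'ménagers']
```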

2.4 Storage in WM is ‘fragile’

When items are stored in WM they can be easily lost due to interference from competition with other items (divided attention) or interference from environmental factors (e.g. noise). Anxiety, worry and self-concern during performance can also cause divided attention and WM memory loss.

The obvious implication is that our teaching should bring about as much arousal in our students as possible so as to keep the target language input in their focal awareness at all times.

Another implication is that, apart from the obvious sources of distraction which pertain to student misbehaviour or environmental factors, teachers must try to minimize any other source of distraction. A frequent source of distraction comes, in this day and age, from learning languages through the digital medium or producing a digital artefact as part of a project in the target language.


2.5 Error is often caused by WM processing inefficiency

When we are carrying out complex tasks, WM may have to juggle several sub-tasks at the same time. Based on points 2.3 and 2.4 above, this ‘multi-tasking’ can cause information processing or retrieval to slow down and/or result in performance error. Anxiety can have a detrimental effect in this regard, too.

The application of declarative knowledge (i.e. intellectual knowledge of L2 grammar) in speaking and listening performance is likely to cause processing inefficiency, as WM needs to apply every rule consciously. Imagine, in talking about what you did yesterday in French, having to apply every step in forming the perfect tense of ‘aller’ one by one, as compared to simply saying ‘je suis allé’. Hence the very long pauses and hesitations when a novice-to-intermediate speaker has solid declarative knowledge of the language but, due to lack of practice, little control over the speaking medium.

The implications for teaching are obvious and refer to the issues I have dealt with extensively in previous blogs. On the one hand teachers must focus their efforts on developing students’ cognitive control over the target language; on the other, they need to try as much as possible to lessen the cognitive load on students’ WM by (a) pitching the tasks they involve students in to the right level of cognitive/linguistic challenge; (b) prepping the students before each target language task through activities which recycle the language items they will need in the execution of that task; (c) keeping anxiety out of the classroom as much as possible.

Also, in order to facilitate WM processing efficiency, students may have to be taught strategies that can compensate for lack of procedural competence. For instance, teachers may raise learners’ awareness of how their WM’s processing inefficiency can cause them to make specific mistakes (e.g. agreement mistakes in writing) and model editing strategies to identify and/or prevent such mistakes (e.g. through mnemonics).

2.6 Forgetting is caused by WM failure to access the required information (cue-dependent forgetting)

Memory is context-dependent; in other words, the environment in which one learns a given language item will enhance the chances of recalling that item later on. Hence, when we do not remember something, it is not because that information is no longer stored in Long-Term Memory, but rather because we are not using the right cue to retrieve it. So, for instance, if my teacher used a picture of Arnold Schwarzenegger to teach the word ‘musculoso’ in Spanish, that picture will facilitate my recall of that word.

Here, too, training students in the use of memory strategies to prevent cue-dependent forgetting can be extremely helpful.

2.7 There may be a link between poor WM and depression

Recent research has evidenced a link between poor WM and depression. Researchers found that people with a highly efficient WM have a more positive outlook on life and are generally more self-confident, whereas individuals with poor WM tend to be more prone to anxiety and to brood and sulk more over things.

The implications for teachers are very obvious: minimize the potential sources of anxiety for students who fall into this category. Don’t presume that this issue affects only children with special educational needs. Research shows clearly that depression amongst adolescents has risen substantially in the last decade or so. Hence, one has to be very mindful of this issue and handle it with much emotional and cognitive empathy.

2.8 An efficient WM is a good predictor of academic success including MFL learning

Alloway and Alloway (2009) actually found that WM is a better predictor of future academic success than IQ. They found that “working memory is not a proxy for IQ but rather represents a dissociable cognitive skill with unique links to academic attainment”. Students with poor working memory do badly across all or most subjects, including foreign languages. In fact, more recent theories of language aptitude include WM as an important factor affecting success in foreign language learning.

  3. Conclusion

In conclusion, MFL teaching should concern itself from the very early stages of instruction with the development of processing efficiency. A big and efficient WM allows for faster recall and processing, for more accurate performance and more ‘noticing’. This is a very important issue if one considers that WM is first and foremost the gateway to Long-Term Memory – where all the knowledge we have about a language and the world is permanently stored.

‘Noticing’ new key target language features, as Schmidt (1990) posits, propels our students’ learning forward, but only if they make the connection between what they notice and the system they have been building in their Long-Term Memory (their Interlanguage). Often this connection must be made under Real Operating Conditions (ROC), as they interact orally with an expert speaker, watch a video or listen. For this to happen in these contexts – when they operate under considerable communicative pressure – their WM must be highly efficient.

Teachers should heed the above recommendations in their daily practice and ensure that lessons are as much about developing students’ WM processing efficiency (cognitive control) – what I call ‘horizontal progression’ – as they are about vertical progression, i.e. ‘jumping’ from one level of linguistic challenge to a higher one for the sake of being able to say “we have covered three tenses” or “we have created complex sentences”. Vertical progression without horizontal progression creates a very unstable system, like a tall building without strong foundations.

Finally, raising students’ awareness of how WM works can be very useful in enhancing their learning and their metacognition. I have several short sessions with my KS3 classes in which I summarize the key features of memory and how WM works. The teacher must create the right context for these sessions and make them as simple, visual and engaging as possible. I was so proud when, last week, a year 8 girl said to another who was finding a word difficult to pronounce: “You have to chunk it”, and actually modelled the chunking to her classmate. Ultimately, the more students know about how their mind works, the more they will feel in control of their learning.

The causes of learner errors in L2 writing – an attempt to integrate Skill-theory and mainstream accounts of Second Language Acquisition

A cognitive account of errors in L2-writing rooted in skill acquisition and production theory

1. Introduction

The purpose of this paper is to shed light on the cognitive sources of errors. An understanding of the psycholinguistic mechanisms that cause our students to err is fundamental if we aim to significantly enhance the (surface-level) accuracy of their written output. In what follows, I intend to take the reader through the cognitive processes underlying second language writing, mapping out in detail the stages and contexts in which mistakes are usually made. In order for the reader to fully comprehend the ensuing discussion, I will begin by outlining four key concepts in Cognitive psychology which are essential for an understanding of any skill-acquisition theory of language development and production. I will then proceed to concisely discuss the way humans acquire languages according to one of the most widely accepted models of second language acquisition (Anderson, 2000). Finally, I will provide an exhaustive account of the way we process writing, rooted in Cognitive theory and resulting from an integration of a number of models of monolingual and bilingual production. I shall then draw my conclusions as to the implications of the reviewed theories and research for an approach to error correction.

2. Key concepts in Cognitive psychology

Before engaging in my discussion of L2-acquisition and L2-writing, I shall introduce the reader to the following concepts, central to any Cognitive theory of human learning and information processing:

1. Short-Term and Long-Term Memory

2. Metalinguistic Knowledge and Executive Control

3. The representation of knowledge in memory

4. Proceduralisation or Automatisation

2.1 Short-Term Memory and Long-Term Memory

In Information Processing Theory, memory is conceived as a large and permanent collection of nodes which become complexly and increasingly inter-associated through learning (Shiffrin and Schneider, 1977). Most models of memory identify a transient memory, called ‘Short-Term Memory’, which can temporarily encode information, and a permanent memory, or Long-Term Memory (LTM). As Baddeley (1993) suggested, it is useful to think of Short-Term Memory as a Working Short-Term Memory (WSTM) consisting of the set of nodes which are activated in memory as we are processing information. In most Cognitive frameworks, WSTM is conceived as providing a work space for decision making, thinking and control processes, and learning is but the transfer of patterns of activation from WSTM to LTM in such a way that new associations are formed between information structures or nodes not previously associated. WSTM has two key features:

(1) fragility of storage (the slightest distraction can cause the brain to lose the data being processed);

(2) limited channel capacity (it can only process a very limited amount of information for a very limited amount of time).

LTM, on the other hand, has unlimited capacity and can hold information over long periods of time. Information in LTM is normally in an inactive state. However, when we retrieve data from LTM the information associated with such data becomes activated and can be regarded as part of WSTM.

In the retrieval process, activation spreads through LTM from active nodes of the network to other parts of memory through an associative chain: when one concept is activated, other related concepts become active. Thus, the amount of active information resulting can be much greater than the amount currently held in WSTM. Since source nodes have only a fixed capacity for emitting activation (Anderson, 1980), and this capacity is divided amongst all the paths emanating from a given node, the more paths that exist, the less activation will be transmitted to any one path and the slower the rate of activation will be (the ‘fan effect’). Thus, additional information about a concept interferes with memory for a particular piece of information, thereby slowing the speed with which that fact can be retrieved. In the extreme case in which the to-be-retrieved information is too weak to be activated (owing, for instance, to minimal exposure to that information) in the presence of interference from other associations, the result will be failure to recall (Anderson, 2000).
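
To make the fan effect concrete, here is a minimal Python sketch. The numbers and node names are illustrative assumptions of mine, not part of Anderson’s formal model: a source node emits a fixed amount of activation which is divided among its associative paths, so the more facts are attached to a concept, the less activation reaches any one of them.

```python
# Toy illustration of spreading activation and the fan effect.
# A source node emits a FIXED amount of activation, split equally
# among its associative paths: more paths -> less activation per path.

SOURCE_CAPACITY = 1.0  # fixed emission capacity of any source node

def activation_per_path(associations):
    """Activation each associated node receives from the source."""
    return SOURCE_CAPACITY / len(associations)

# A concept with a single association...
doctor = ["hospital"]
# ...and a 'high-fan' concept with many associations.
house = ["roof", "garden", "door", "kitchen", "mortgage"]

print(activation_per_path(doctor))  # 1.0 -> fast, reliable retrieval
print(activation_per_path(house))   # 0.2 -> slower retrieval; a trace
# that is already weak may never reach threshold at all
# (failure to recall).
```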

2.2 Metalinguistic knowledge and executive control (processing efficiency)

This distinction originated with Bialystok (1982) and its validity has been supported by a number of studies (e.g. Hulstijn and Hulstijn, 1984). Knowledge is the way the language system is represented in LTM; Control refers to the regulation of the processing of that knowledge in WSTM during performance. The following is an example of how this distinction applies to the context of my study: many of my intermediate students usually know the rules governing the use of the Subjunctive Mood in Italian; however, they often fail to apply them correctly in Real Operating Conditions, that is, when they are required to process language in real time under communicative pressure (e.g. writing an essay under severe time constraints; giving a class presentation; etc.). The reason for this phenomenon may be that, WSTM’s attentional capacity being limited, its executive-control systems may not cope efficiently with the attentional demands of a task if we are performing in operating conditions where worry, self-concern and task-irrelevant cognitive activities use up some of the available limited capacity (Eysenck and Keane, 1995). These factors may cause retrieval problems in terms of reduced speed of recall/recognition or accuracy. Thus, as Bialystok (1982) and Johnson (1996) assert, L2-proficiency involves a degree of control as well as a degree of knowledge.

2.3 The representation of knowledge in memory

Declarative Knowledge is knowledge about facts and things, while Procedural Knowledge is knowledge about how to perform different cognitive activities. This dichotomy implies that there are two ‘paths’ for the production of behaviour: a procedural and a declarative one. Following the latter, knowledge is represented in memory as a database of rules stored in the form of a semantic network. In the procedural path, on the other hand, knowledge is embedded in procedures for action, readily at hand whenever they are required, and it is consequently easier to access.

Anderson (1983) provides the example of an EFL-learner following the declarative path of forming the present perfect in English. S/he would have to apply the rule: use the verb ‘have’ followed by the past participle, which is formed by adding ‘-ed’ to the infinitive of a verb. S/he would have to hold all the knowledge about the rule formation in WSTM and would apply it each time s/he is required to form the tense. This implies that declarative processing is heavy on channel capacity, that is, it occupies the vast majority of WSTM attentional capacity. On the other hand, the learner who followed the procedural path would have a ‘program’, stored in LTM with the following information: the present perfect of ‘play’ is ‘I have played’. Deploying that program, s/he would retrieve the required form without consciously applying any explicit rule. Thus, procedural processing is lighter on WSTM channel capacity than declarative processing.
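
The contrast between the two paths can be sketched in a few lines of Python. This is a toy analogy of my own, not a claim about how the brain implements either path: the declarative route recomputes the form from an explicit rule every time, holding intermediate steps in ‘working memory’, while the procedural route retrieves a ready-made ‘program’ in a single step.

```python
# Declarative path: apply the explicit rule step by step each time.
def present_perfect_declarative(verb):
    past_participle = verb + "ed"       # intermediate step held in WSTM
    return "have " + past_participle    # rule: 'have' + past participle

# Procedural path: a ready-made 'program' stored in LTM.
PROCEDURES = {"play": "have played"}

def present_perfect_procedural(verb):
    return PROCEDURES[verb]             # no rule consciously applied

print(present_perfect_declarative("play"))  # computed: costly in WSTM
print(present_perfect_procedural("play"))   # retrieved: nearly free
```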

2.4 Proceduralisation or Automatization

Proceduralisation or Automatization is the process of making a skill automatic. When a skill becomes proceduralised it can be performed without any cost in terms of channel capacity (i.e. “memory space”): skill performance requires very little conscious attention, thereby freeing up ‘space’ in WSTM for other tasks.

3. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed into a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is considered to be governed by the same principles that regulate any other cognitive skill. A number of scholars, such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996), have produced persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to this study: (1) the acquisition of grammatical rules in explicit adult L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.

Figure 1: The Anderson Model (adapted from Anderson, 1983)

The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory, Working Memory, Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features previously discussed in describing WSTM while Declarative and Production Memory may be seen as two subcomponents of LTM. The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ and ’THEN’ conditions. These are activated every single time the brain is processing information; whenever a learner is confronted with a problem the brain searches for a Production that matches the data pattern associated with it. For example:

IF the goal is to form the present perfect of a verb and the person is 3rd singular/

THEN form the 3rd singular of ‘have’

IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /

THEN form the past participle of the verb
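
Productions of this kind are easy to mock up in code. The sketch below is purely illustrative – the data structures are my own assumptions, not ACT*’s actual implementation: each Production pairs an IF-condition with a THEN-action, and the system repeatedly fires the first Production whose condition matches the current contents of working memory.

```python
# Minimal production system: each Production pairs an IF-condition
# with a THEN-action; on each cycle, the first Production whose
# condition matches the current contents of working memory fires.

productions = [
    # IF goal is present perfect and 'have' not yet formed...
    (lambda wm: wm["goal"] == "present_perfect" and "aux" not in wm,
     lambda wm: wm.update(aux="has" if wm["person"] == "3sg" else "have")),
    # IF goal is present perfect and 'have' has just been formed...
    (lambda wm: wm["goal"] == "present_perfect" and "aux" in wm,
     lambda wm: wm.update(participle=wm["verb"] + "ed")),
]

wm = {"goal": "present_perfect", "verb": "play", "person": "3sg"}
while "participle" not in wm:          # cycle until the goal is reached
    for condition, action in productions:
        if condition(wm):
            action(wm)
            break

print(wm["aux"], wm["participle"])     # has played
```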

The creation of a Production is a long and careful process since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour, thus the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.

In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.

Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L2-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.

Producing language by these means is extremely inefficient. Thus, the brain tries to sort the information into more efficient Productions. This is achieved by Compiling (‘running together’) the Productions that have already been created so that larger groups of Productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process reduces the number of steps referred to above and has the effect of speeding up performance. Thus, the Productions

P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of have

P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb

would be composed as follows:

P3 IF the goal is to form the present perfect of a verb / THEN form the present simple of have and THEN the past participle of the verb
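
In code, Composition might look something like the following sketch – an illustration of the idea rather than Anderson’s actual mechanism: two Productions that always fire in sequence are collapsed into a single callable with the effect of both, halving the number of matching steps.

```python
# Composition: collapse two productions that always fire in sequence
# into a single production with the effect of the whole sequence
# (fewer matching cycles, hence faster performance).

def p1(wm):   # IF goal is present perfect THEN form 'have'
    wm["aux"] = "have"

def p2(wm):   # IF 'have' just formed THEN form the past participle
    wm["participle"] = wm["verb"] + "ed"

def compose(first, second):
    def composed(wm):                  # P3: one production, same effect
        first(wm)
        second(wm)
    return composed

p3 = compose(p1, p2)
wm = {"verb": "play"}
p3(wm)        # one matching step instead of two
print(wm)     # {'verb': 'play', 'aux': 'have', 'participle': 'played'}
```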

An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.

The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P2 above would become

IF the goal is to form the present perfect of a verb

THEN form ‘have’ and then form the past participle of the verb

The processes of Composition and Proceduralisation will eventually produce, after repeated performance:

IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’

For Anderson it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. The mechanism through which the brain decides which Productions should be applied in a given context is called by Anderson ‘Matching’. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution – i.e. a Production that matches the pattern of information in WSTM. If such matching is possible, a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘IF’) of a proceduralised Production, the matching will be quicker, with the ‘action side’ (the ‘THEN’) of the Production being deposited in WSTM and made immediately available for performance (execution). It is at this intermediate stage of development that the most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.

The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) provides the following example of generalization from language acquisition, in which P6 and P7 become P8:

P6 IF the goal is to indicate that a coat belongs to me THEN say ‘My coat’

P7 IF the goal is to indicate that a ball belongs to me THEN say ‘My ball’

P8 IF the goal is to indicate that object X belongs to me THEN say ‘My X’
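
A toy implementation of Generalisation might detect the overlap between two Productions and replace the differing constant with a variable, along the following lines (the string-template representation is my own simplification, not Anderson’s notation):

```python
# Generalization: where two productions differ only in one constant,
# replace that constant with a variable to obtain a broader rule.

p6 = ("coat", "My coat")   # (object, action) pairs
p7 = ("ball", "My ball")

def generalize(prod_a, prod_b):
    obj_a, act_a = prod_a
    obj_b, act_b = prod_b
    # If the actions share the same template around the differing
    # constant, return a production over a variable X.
    if act_a.replace(obj_a, "X") == act_b.replace(obj_b, "X"):
        return lambda x: act_a.replace(obj_a, x)   # P8: 'My X'
    return None

p8 = generalize(p6, p7)
print(p8("pen"))   # My pen -- an utterance never encountered before
```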

Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process would require that we have examples of both correct and incorrect applications of the Production in our LTM.

Both processes are inductive in that they try to identify from examples of success and failure the features that characterize when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.

Since they are inductive processes, Generalization and Discrimination will sometimes err and produce incorrect Productions. As I shall discuss later in this paper, there are possibilities for Overgeneralization and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.
ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from Procedural Memory and causes an action to be deployed, and there is no negative feedback, the Production will become more robust. Because it is more robust, it will be able to resist occasional negative feedback and it will be more strongly activated when it is called upon:

The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production (Anderson, 1983: 251)

Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, there is a power-law decay in strength with disuse.
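
The competition-and-strengthening dynamic can be simulated in a few lines of Python. In this sketch the update factors are arbitrary assumptions of mine, chosen only to illustrate the direction of the effect: the stronger a Production, the more likely it is to be retrieved, and retrieval without negative feedback strengthens it further – which is precisely how an erroneous Interlanguage form can come to dominate the correct one.

```python
import random

# Productions compete for retrieval in proportion to their strength;
# success (no negative feedback) strengthens a production, failure
# (corrective feedback) weakens it.

strengths = {"correct_form": 0.3, "interlanguage_form": 0.9}

def retrieve():
    # Stronger productions are matched more rapidly / more often.
    total = sum(strengths.values())
    r = random.uniform(0, total)
    for prod, s in strengths.items():
        r -= s
        if r <= 0:
            return prod
    return prod  # guard against floating-point rounding

def feedback(prod, positive):
    strengths[prod] *= 1.2 if positive else 0.8

# If only the correct form ever receives positive feedback, repeated
# corrective feedback gradually weakens the wrong form; without such
# feedback, the wrong form keeps 'working' and keeps getting stronger.
for _ in range(20):
    prod = retrieve()
    feedback(prod, positive=(prod == "correct_form"))
print(strengths)
```
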
4. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition

One limitation of the model is that it does not account for the fact that sometimes unanalysed L2-chunks of language are acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue, Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson. In classroom settings where instruction is grammar-based, however, only a minority of L2-items will be acquired this way.

5. Bridging the ‘gap’ between the Anderson Model and ‘mainstream’ second language acquisition (SLA) research

As already pointed out above, a number of theorists believe that Anderson provides a viable conceptualisation of the processes central to L2-acquisition. However, ACT* was intended as a model of the acquisition of cognitive skills in general, not specifically of L2-acquisition. Thus, the model rarely concerns itself explicitly with the following phenomena documented by SLA researchers: Language Transfer, Communicative Strategies, Variability and Fossilization. These phenomena are relevant to secondary school settings for the following reasons. Firstly, Language Transfer and Communicative Strategies constitute common sources of error in the written output of intermediate L2-learners. Variability refers to the phenomenon, particularly evident in the written output of beginner-to-intermediate learners, whereby a given structure is produced correctly in certain contexts and incorrectly in others. Finally, Fossilization is often put forward as a possible explanation of the recurrence of erroneous Interlanguage forms in learner production. Although these phenomena are accounted for in Anderson’s framework, I believe that a discussion of mainstream SLA theories and research will enhance the reader’s understanding of their nature and implications for L2 teaching. It should be noted that, for reasons of relevance and space, my discussion will be concise and focus only on the aspects which are most relevant to the present study.

5.1 Language Transfer

This phenomenon refers to the way prior linguistic knowledge influences L2-learner development and performance (Ellis, 1994). The occurrence of Language Transfer can be accounted for by applying the ACT* framework since, as Anderson asserts, existing Declarative Knowledge is the starting point for acquiring new knowledge and skills. In a language-learning situation this means drawing on knowledge about previously learnt languages both in order to understand the mechanisms of the target language and to solve a communicative problem. In this section, I shall draw on the SLA literature in order to explain how, when and why Language Transfer occurs and with what effects on learner written output.

As Odlin (1989) points out, Language Transfer can be positive, facilitating L2-performance. This is often the case with students of mine who studied French or Spanish and are able to transfer their knowledge of these languages advantageously to Italian because Romance languages share a large number of cognates and grammatical rules. However, Language Transfer can also be negative, resulting in erroneous L2-output. For instance, over-confidence in the fact that Italian and French/Spanish are similar may prompt a learner with L3-French to apply the rules of the French Subjunctive in the deployment of the Italian Subjunctive. This strategy will be effective in some contexts but unsuccessful in others.
Transfer can also result in the avoidance or over-production of L2-structures. For example, several intermediate Japanese learners of Italian I taught in the past avoided using relative clauses because these do not exist in their L1. On the other hand, they over-used the definite article because, being totally unfamiliar with the concept of a definite article in their language and noticing that Italians use it frequently, they thought that they were less likely to err if they used it all the time.

Transfer can occur as a deliberate Compensatory Strategy: a learner’s conscious attempt to fill a gap in his/her L2-knowledge (Faerch and Kasper, 1983). This phenomenon is particularly recurrent when the distance between the learner’s L1/L3 and the target language is perceived as close (e.g. Spanish and Italian). Transfer can also occur subconsciously (Poulisse, 1990). When used as a Compensatory Strategy, Transfer can give rise to ‘Foreignization’ and ‘Code-switching’ errors. The former refers to the conscious alteration of L1- or L3-words to make them ‘sound’ target-language-like. For instance, not knowing the Italian for ‘rice’ (= riso), a French learner may add an ‘o’ to the French word ‘riz’ in the hope that the resulting ‘rizo’ will be correct. Code-switching, instead, consists in the conscious or subconscious use of unaltered L1-/L3-words/phrases when an L2-word is required. Both types of error are more likely to happen in spoken language, especially when a learner is under communicative pressure or does not have access to dictionaries or other sources of L2-knowledge. However, I have personally observed this phenomenon in the writing of many L2-student writers too, especially at the level of connectives (e.g. the French conjunction ‘et’ instead of the Italian ‘e’).

Transfer may affect any level of L2-learner output. As far as the areas of language use most relevant to the present study are concerned (syntax, morphology and lexis), Ringbom (1987) reports evidence from Ringbom (1978) and other studies (e.g. Sjoholm, 1982) that L1-Transfer affects lexical usage more than it does syntax or morphology. Of these two, it appears that morphology is the less affected area. The following factors appear to determine the extent to which Language Transfer occurs:
 (1) Perceived language distance: the closer two languages are perceived to be the more likely is Transfer to occur (see Sjoholm,1982)
 (2) Learning environment: it appears that Transfer is more likely to occur in settings where the naturalistic input is lower (Odlin, 1989);
(3) Levels of monitoring: Gass and Selinker (1983) observe that careful, monitored learner output usually contains fewer instances of Transfer errors;
 (4) Learner-type: learners who take more risks and are more meaning-oriented tend to transfer less than form-focused ones (Odlin, 1989);
(5) Task: some tasks appear to elicit greater use of Transfer (Odlin, 1989). This appears to be the case for L1-into-L2 translation including the approach, typical of many beginner L2-learners, whereby an L2-essay is produced first in the L1 and then translated word by word.
(6) Proficiency: as the Anderson Model and many other Cognitive models (e.g. de Bot, 1992) posit, the starting point of acquisition is the L1, which is gradually replaced by the target language as more and more L2-items are acquired. Thus, Transfer is more likely to occur at the early stages of development than at advanced ones. This is borne out by a number of studies (e.g. Taylor, 1975; Liceras, 1985; Major, 1987). Kellerman (1978), however, found that a number of Transfer errors occur only at advanced stages.
5.2 Communication Strategies
Due to space constraints, my discussion of Communication Strategies (CSs) will be limited to the basic issues and levels of language (i.e. grammar, lexis and orthography) relevant to this study. Corder (1978) defined a CS as follows:

a systematic technique employed by a speaker to express his meaning when faced with some difficulty. Difficulty in this definition is taken to refer uniquely to the speaker’s inadequate command of the language in the interaction (Corder, 1978: 8)

A number of taxonomies of CSs have been suggested. Most frameworks (e.g. Faerch and Kasper, 1983) identify two types of approaches to solving problems in communication: (1) avoidance behaviour (avoiding the problem altogether); (2) achievement behaviour (attempting to solve the problem through an alternative plan). In Faerch and Kasper’s (1983) framework, the two different approaches result respectively in the deployment of (a) reduction strategies, governed by avoidance behaviour, and (b) achievement strategies, governed by achievement behaviour.

Reduction strategies can affect any level of writing, from content (Topic avoidance) to orthography (Graphological avoidance). Most CS studies, however, have focused on lexical items. Achievement strategies (Faerch and Kasper, 1983) correspond to Tarone’s (1981) concept of Production Strategies and to Corder’s (1978, 1983) Resource-expansion strategies. By using an achievement strategy, the learner attempts to solve problems in communication by expanding his/her communicative resources (Corder, 1978) rather than by reducing his/her communicative goal (functional reduction). Faerch and Kasper (1983) identify two broad categories of achievement strategies: Compensatory and Non-linguistic. The Compensatory strategies relevant to the present study are:
(1) Code-switching (see 5.1 above)

(2) Interlingual transfer (see 5.1 above)

(3) Inter-/intralingual transfer, i.e. a generalization of an IL rule is made but the generalization is influenced by the properties of the corresponding L1-structures (Jordens, 1977)

(4) IL-based strategies. These include:

(i) Generalization: the extension of an item to an inappropriate context in order to fill a ‘gap’ in the learner’s plan. One type of generalization relevant to the present study is Approximation, that is, the use of a lexical item to express only an approximation of the intended meaning.

(ii) Word coinage. This kind of strategy involves the learner in the creative construction of a new IL word.

5.3 Variability: the occurrence of unsystematic errors
Variability in learner language refers to the phenomenon whereby a given structure is produced correctly in certain contexts and incorrectly in others. As Ellis (1994) observed, this phenomenon is very common in the early stages of acquisition and may rapidly disappear. The Anderson model can be used to account for Variability as follows: firstly, as Anderson posits, two or more Productions which refer to different hypotheses about the use of a structure can co-exist in a learner’s LTM before the onset of the Discrimination process. These Productions compete for retrieval and, if they have more or less equal strength, may be used alternately at a given stage of development as the learner is testing their effectiveness through the trial-and-error process which characterizes the early stages of learning.
Secondly, if, amongst the Productions relative to a given structure, Production ‘X’, based on the correct rule, is much weaker than Production ‘Y’, based on an incorrect rule, Production ‘Y’ is likely to be retrieved first when a learner is not devoting sufficient conscious attention to the task and his/her brain ‘runs on automatic’. The lack of attention is usually determined by processing inefficiency, that is, the incapacity of WSTM to cope with the demands that the task places on its attentional system (Bygate, 1988). Processing-inefficiency issues in writing are more likely to arise in unplanned and/or unmonitored production (Krashen, 1977, 1981), especially when the L2-learner is under severe time constraints / communicative pressure (Polio, Fleck and Leder, 1998).

A third cause of Variability relates to what I called above the ‘Procedural route’ to acquisition: aspects of the usage of a structure may have been acquired by a learner through the rote learning of, or exposure to, set L2-phrases (e.g. classroom phrases). Thus, in cases where that structure is well beyond the learner’s stage of development and s/he does not possess any declarative knowledge of it, s/he will deploy that structure correctly within the context of those set phrases while being likely to make mistakes with it in other contexts.

5.4 Fossilization

In the SLA literature, Fossilization (or Routinization) refers to the phenomenon whereby some IL forms keep reappearing in a learner’s Interlanguage ‘in spite of the learner’s ability, opportunity and motivation to learn the target language…’ (Selinker and Lamendella, 1979: 374). An error can become fossilised even if L2-learners possess correct declarative knowledge about that form and have received intensive instruction on it (Mukattash, 1986).

Applying the Anderson Model, Fossilization can be explained as the Proceduralisation of an erroneous form through frequent and successful use. As already discussed, Productions that have been proceduralised are very difficult to alter, which would explain why some theorists believe that Fossilization is a permanent state (Lamendella, 1977; Mukattash, 1986). For applied linguists working in the Skill-theory paradigm, errors can be de-fossilized, but only after a lengthy and painstaking process of re-learning the correct form through targeted monitoring and practice in Real Operating Conditions (Johnson, 1996).
Several models (biological, acculturational, interactional, etc.) have been proposed to account for the development of Fossilization in L2-learning. Interactional models state that the interaction between the learner and other L2-speakers determines whether a component of the learner’s Interlanguage system is reinforced, contributing to Fossilization. One such model, Tollefson and Firn’s (1983), posits that an overemphasis on the conveyance of meaning in the classroom may, in the absence of cognitive feedback, promote Fossilization.

On this issue, Johnson (1996) also asserts that linguistic survival is often achieved by a form of pidgin and that encouraging this type of communication in the language classroom is a practice conducive to Fossilization. Skehan (1994) and Long (1983) also make the point that communicative production might lead to the development of reduction strategies resulting in pidginogenesis and Fossilization.

6. A Cognitive account of the writing processes: the Hayes and Flower (1980) model

Hayes and Flower’s (1980) model of essay writing is regarded as one of the most effective accounts of writing available to date (Eysenck and Keane, 1995). As Figure 2 below shows, it posits three major components:

1. the Task-environment;

2. the Writer’s Long-Term Memory;

3. the Writing process.

Figure 2: The Hayes and Flower model (adapted from Hayes and Flower, 1980)

The Task-environment includes the writing assignment (the topic, the target audience, and motivational factors) and the text produced so far. The Writer’s LTM provides factual knowledge and skill-/genre-specific procedures. The Writing Process consists of the three sub-processes of Planning, Translating and Reviewing.

The Planning process sets goals based on information drawn from the Task-environment and Long-Term Memory (LTM). Once these have been established, a writing plan is developed to achieve those goals. More specifically, the Generating sub-process retrieves information from LTM through an associative chain in which each item of information retrieved functions as a cue to retrieve the next item of information and so forth. The Organising sub-process selects the most relevant items of information retrieved and organizes them into a coherent writing plan. Finally, the Goal-setting sub-process sets rules (e.g. ‘keep it simple’) that will be applied in the editing process. The second process, Translating, transforms the information retrieved from LTM into language. This is necessary since concepts are stored in LTM in the form of Propositions, not words. Flower and Hayes (1980) provide the following examples of what propositions involve:

[(Concept A) (Relation B) (Concept C)]

or

[(Concept D) (Attribute E)], etc.

Finally, the Reviewing processes of Reading and Editing have the function of enhancing the quality of the output. The Editing process checks that discourse conventions are not being flouted, looks for semantic inaccuracies and evaluates the text in the light of the writing goals. Editing has the form of a Production system with two IF- THEN conditions:

The first part specifies the kind of language to which the editing production applies, e.g. formal sentences, notes, etc. The second is a fault detector for such problems as grammatical errors, incorrect words, and missing context. (Hayes and Flower, 1980: 17)

When the conditions of a Production are met, e.g. a wrong word ending is detected, an action is triggered to fix the problem. For example:

CONDITION: (formal sentence) first letter of sentence lower case

ACTION: change first letter to upper case

(Adapted from Hayes and Flower, 1980: 17)

Two important features of the Editing process are: (1) it is triggered automatically whenever the conditions of an Editing Production are met; (2) it may interrupt any other ongoing process. Editing is regulated by an attentional system called the Monitor. Hayes and Flower do not provide a detailed account of how it operates. Unlike Krashen’s (1977) Monitor, a control system used solely for editing, Hayes and Flower’s (1980) device operates at all levels of production, orchestrating the activation of the various sub-processes. This allows Hayes and Flower to account for two phenomena they observed. Firstly, the Editing and Generating processes can cut across other processes. Secondly, the existence of the Monitor enables the system to be flexible in the application of goal-setting rules, in that any other process can be triggered through the Monitor. This flexibility allows for the recursiveness of the writing process.
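
A crude way to picture Editing Productions is as a set of fault detectors paired with repairs, each scanning the text produced so far. The sketch below is my own illustration, not Hayes and Flower’s implementation; the two example productions (a doubled-word detector and a sentence-capitalisation detector) simply stand in for the ‘fault detector’ and action sides described above.

```python
import re

# Editing as a production system: each editing production pairs a
# fault detector (the IF side) with a repair action (the THEN side).
# When a fault is detected, the repair fires automatically.

editing_productions = [
    # IF a word is immediately repeated THEN delete the repetition.
    (re.compile(r"\b(\w+)\s+\1\b"),
     lambda m: m.group(1)),
    # IF (formal sentence) first letter of a sentence is lower case
    # THEN change it to upper case.
    (re.compile(r"(^|[.!?]\s+)([a-z])"),
     lambda m: m.group(1) + m.group(2).upper()),
]

def edit(text):
    for fault_detector, repair in editing_productions:
        text = fault_detector.sub(repair, text)
    return text

print(edit("the the results were clear. they confirmed the model."))
# The results were clear. They confirmed the model.
```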

7. Extending the model: Cognitive accounts of the translating sub-processes and insights from proofreading research

Hayes and Flower’s model is useful in providing teachers with a framework for understanding the many demands that essay writing places on students. In particular, it helps teachers understand how the recursiveness of the writing process may cause those demands to interfere with each other, causing cognitive overload and error. Furthermore, by conceptualising editing as a process that can interrupt writing at any moment, the model has a very important implication for a theory of error: self-correctable errors occurring at any level of written production are not always the result of a retrieval failure; they may also be caused by a detection failure. However, one limitation of the model for a theory of error is that its description of the Translating and Editing sub-processes is too general. I shall therefore supplement it with Cooper and Matsuhashi’s (1983) list of writing plans and decisions, along with findings from other L1-writing Cognitive research, which will provide the reader with a more detailed account. I shall also briefly discuss some findings from proofreading research which may help explain some of the problems encountered by L2-student writers during the Editing process.

7.1 The translating sub-processes

Cooper and Matsuhashi (1983) posit four stages, which correspond to Hayes and Flower’s (1980) Translating: Wording, Presenting, Storing and Transcribing. In the first stage, the brain transforms the propositional content into lexis. Although at this stage the pre-lexical decisions the writer made at earlier stages and the preceding discourse limit lexical choice, Wording the proposition is still a complex task: ‘the choice seems infinite, especially when we begin considering all the possibilities for modifying or qualifying the main verb and the agentive and affected nouns’ (Cooper and Matsuhashi, 1983: 32). Once s/he has selected the lexical items, the writer has to tackle the task of Presenting the proposition in standard written language. This involves making a series of decisions in the areas of genre and grammar. In the area of grammar, Agreement and Tense will be the main issues.
The proposition, as planned so far, is then temporarily stored in WSTM while Transcribing takes place. Propositions longer than just a few words will have to be rehearsed and re-rehearsed in WSTM for parts of them not to be lost before the transcription is complete. The limitations of WSTM create serious disadvantages for unpractised writers. Until they gain some confidence and fluency with spelling, their WSTM may have to be loaded up with the letter sequences of single words or with only two or three words (Hotopf, 1980). This not only slows down the writing process, but also means that all other planning must be suspended during the transcription of short letter or word sequences.

The physical act of transcribing the fully formed proposition begins once the graphic image of the output has been stored in WSTM. In L1-writing, transcription occupies subsidiary awareness, enabling the writer to use focal awareness for other plans and decisions. In practised writers, transcription of certain words and sentences can be so automatic as to permit planning the next proposition while still transcribing the previous one. An interesting finding with regard to these final stages of written production comes from Bereiter, Fire and Gartshore (1979), who investigated L1-writers aged 10-12. They identified several discrepancies between learners’ forecasts in think-aloud protocols and their actual writing. 78% of such discrepancies involved stylistic variations. Notably, in 17% of the forecasts, significant words were uttered which did not appear in the writing. In about half of these cases the result was a syntactic flaw (e.g. the forecasted phrase ‘on the way to school’ was written ‘on the to school’). Bereiter and Scardamalia (1987) believe that lapses of this kind indicate that language is lost somewhere between storage in WSTM and grapho-motor execution. These lapses, they also assert, cannot be described as ‘forgetting what one was going to say’, since almost every omission was reported on recall: in the case of ‘on the to school’, for example, the author not only intended to write ‘on the way’ but claimed later to have written it. In their view, such lapses are caused by interference from the attentional demands of the mechanics of writing (spelling, capitalization, etc.), the underlying psychological premise being that a writer has a limited amount of attention to allocate, and that whatever is taken up by the lower-level demands of written language must be taken from something else.

In sum, Cooper and Matsuhashi (1983) posit two stages in the conversion of the preverbal message into a speech plan: (1) the selection of the right lexical units and (2) the application of grammatical rules. The unit of language is then deposited in WSTM awaiting translation into grapho-motor execution. This temporary storage raises the possibility that lower-level demands affect production as follows: (1) by causing the writer to omit material during grapho-motor execution; (2) by leading the writer to forget higher-level decisions already made. Interference resulting in WSTM loss can also be caused by a lack of monitoring of the written output due to devoting conscious attention entirely to planning ahead, while leaving the process of transcription to run ‘on automatic’.

7.2 Some insights from proofreading research

Proofreading theories and research provide us with the following important insights into the mechanisms that regulate essay editing. Firstly, proofreading involves different processes from reading: when one proofreads a passage, one is generally looking for misspellings, words that might have been omitted or repeated, typographical mistakes, etc., and as a result, comprehension is not the goal. When one is reading a text, on the other hand, one’s primary goal is comprehension. Thus, reading involves the construction of meaning, while proofreading involves visual search. For this reason, in reading, short function words, not being semantically salient, are not fixated (Paap, Newsome, McDonald and Schvaneveldt, 1982). Consequently, errors on such words are less likely to be spotted when one is editing a text concentrating mostly on its meaning than when one is focusing one’s attention on the text as part of a proofreading task (Haber and Schindler, 1981). Errors are likely to decrease even further when the proofreader is forced to fixate on every single function word in isolation (Haber and Schindler, 1981).

It should also be noted that some proofreading errors appear to be due to acoustic coding. This refers to the phenomenon whereby the way a proofreader pronounces a word/diphthong/letter influences his/her detection of an error. For example, if an English learner of L2-Italian pronounces the ‘e’ in the singular noun ‘stazione’ (= train station) as [i] instead of [e], s/he will find it difficult to differentiate it from the plural ‘stazioni’ (= train stations). This may impinge on his/her ability to spot errors with that word involving the use of the singular for the plural and vice versa.
The implications for the present study are that learners may have to be trained to go through their essays at least once focusing exclusively on form. Secondly, they should be asked to pay particular attention to those words (e.g. function words) and parts of words (e.g. verb endings) that they may not perceive as semantically salient.

7.3 Bilingual written production: adapting the unilingual model

Writing, although slower than speaking, is still processed at enormous speed in mature native speakers’ WSTM. The processing time required by a writer will be greater in the L2 than in the L1 and will increase at lower levels of proficiency: at the Wording stage, more time will be needed to match non-proceduralised lexical material to propositions; at the Presenting stage, more time will be needed to select and retrieve the right grammatical forms. Furthermore, more attentional effort will be required in rehearsing the sentence plans in WSTM; in fact, just like Hotopf’s (1980) young L1-writers, non-proficient L2-learners may be able to store in WSTM only two or three words at a time. This has implications for Agreement in Italian, in view of the fact that words located more than three or four words apart may still have to agree in gender and number. Finally, in the Transcribing phase, the retrieval of spelling and other aspects of the writing mechanics will take up more WSTM focal awareness.

Monitoring too will require more conscious effort, increasing the chances of Short-term Memory loss. This is more likely to happen with less expert learners: the attentional system having to monitor levels of language that in the mature L1-speaker are normally automatized, it will not have enough channel capacity available, at the point of utterance, to cope with lexical/grammatical items that have not yet been proceduralised. This also implies that Editing is likely to be more recursive than in L1-writing, interrupting other writing processes more often, with consequences for the higher meta-components. In view of the attentional demands posed by L2-writing, the interference caused by planning ahead will also be more likely to occur, giving rise to processing failure. Processing failure/WSTM loss may also be caused by the L2-writer pausing to consult dictionaries or other resources to fill gaps in their L2-knowledge while rehearsing the incomplete sentence plan in WSTM. In fact, research indicates that although, in general terms, composing patterns (sequences of writing behaviours) are similar in L1s and L2s there are some important differences.
In his seminal review of the L1-/L2-writing literature, Silva (1993) identified a number of discrepancies between L1- and L2-composing. Firstly, L2-composing was clearly more difficult. More specifically, the Transcribing phase was more laborious, less fluent, and less productive. Also, L2-writers spent more time referring back to an outline or prompt and consulting dictionaries. They also experienced more problems in selecting the appropriate vocabulary. Furthermore, L2-writers paused more frequently and for longer, which resulted in L2-writing occurring at a slower rate. As far as Reviewing is concerned, Silva (1993) found evidence in the literature that in L2-writing there is usually less re-reading of and reflecting on written texts. He also reported evidence suggesting that L2-writers revise more, before and while drafting, and in between drafts. However, this revision was more problematic and more of a preoccupation. There also appears to be less auditory monitoring in the L2, and L2-revision seems to focus more on grammar and less on mechanics, particularly spelling. Finally, the text features of L2-written texts provide strong evidence suggesting that L2-writing is a less fluent process, involving more errors and producing – at least in terms of the judgements of native English speakers – less effective texts.

8. Conclusion: Implications for teaching and learning

In the above I have discussed my espoused theories of L2-acquisition and L2-writing. I started by focusing on Anderson’s (1980, 1982, 1983, 2000) account of how language structures are acquired and language processing develops. Drawing on SLA research, I then discussed some important phenomena and processes involved in the aetiology of error relevant to the present study. Finally, I discussed Hayes and Flower’s (1980) and Cooper and Matsuhashi’s (1983) models of written production and their implications for bilingual written production. The following notions emerging from my discussion must, in my view, provide the theoretical underpinnings of any remedial corrective approach to L2-writing errors.

 (1) L2-acquisition occurs in much the same way as the acquisition of any other cognitive skill;

(2) the acquisition of a skill begins consciously, with a cognitive stage during which the brain creates a declarative representation of the Productions (i.e. the procedures that regulate that skill);

 (3) it is an adaptive feature of the human brain to make the performance of any skill automatic in order to render its execution fast and efficient in terms of cognitive processing;
(4) automatisation can be a very lengthy process, since for a skill to become automatic it must be performed numerous times;

(5) the Productions that regulate a skill become automatised only if their application is perceived by the brain as resulting in positive outcomes;

 (6) at a given stage in learner development, more than one Production relating to a given item can co-exist in his/her Interlanguage. These compete for retrieval. The Productions with the stronger memory trace – not necessarily the correct one – will win;

(7) negative evidence as to the effectiveness of a Production determines whether it is going to be rejected by the brain or automatised;

(8) once a Production (including those giving rise to errors) is automatised, it is difficult to alter;

(9) errors may be the result of lack of knowledge or processing efficiency problems;

(10) learners use Language Transfer and Communication Strategies to make up for the absence of the appropriate L2-declarative knowledge necessary in order to realize a given communicative goal. These phenomena are likely to give rise to error.

(11) the writing process is recursive and can be interrupted by editing at any time;

(12) errors in L2-writing relating to morphology and syntax occur mostly in the Translating phase of the writing process, when Propositions are converted into language. They may occur as a result of cognitive overload caused by the interference of various processes occurring simultaneously and posing cognitive demands beyond the processing ability of the writer’s WSTM;

(13) editing for meaning involves different processes from editing for form. When editing for meaning, the writer/editor is more likely to miss function words because they are less semantically salient.

These notions have important implications for any approach to error correction. One refers to Anderson’s assumption that the acquisition of L2-structures in classroom settings mostly begins at a conscious level, with the creation of mental representations of the rules governing their usage. The obvious corollary is that, since corrective feedback should help learners create or restructure their declarative knowledge of the L2 rule system, any corrective approach should involve L2-students in grammar learning entailing cognitive restructuring and extensive practice. This means delivering a well-planned and elaborate intervention, not just a one-off lesson on a structure identified as a problem in a learner’s written piece.

Another important notion advanced by Anderson is that the automatisation of a Production occurs only after it has been applied numerous times and with success (actual or perceived). This notion has three major implications for Error Correction.
 (1) Error Correction can play an important role in L2-acquisition since, in order to reject a wrong production, the learner needs lots of negative evidence that informs him/her of its incorrectness.

(2) Errors should be corrected consistently to avoid sending the learners confused messages about the correctness of a given structure.

(3) For Error Correction to lead to the de-fossilization of wrong Productions and the automatization of new, correct Productions, the former should occur in learner output as rarely as possible, whereas the latter should be produced as frequently as possible.

Consistently with these three notions, a teacher may want to invest considerable effort in raising learners’ awareness of their errors, be as consistent as possible in correcting them and, finally, encourage learners to practise the problematic structures as often as possible in and outside the context of the essays they write.

Other implications refer to the concept of automatisation. As discussed above, automatised cognitive structures are difficult to alter. It follows that Error Correction is more likely to be successful (in the absence of major developmental constraints) at the early stages of learning an L2-item, before ‘incorrect’ Productions have reached the ‘Strengthening’ stage of acquisition. Thus, in order to prevent error fossilization or automatisation, any corrective intervention should tackle the errors most prone to routinization (usually those involving less semantically salient language items) as early as possible in the acquisition process.

Another set of implications relates to the causes and nature of learner errors. As discussed above, a number of errors result from L2-learners’ attempts to make up for their lack of correct L2-declarative knowledge through the deployment of the following problem-solving strategies:

(1) Communication Strategies: in the absence of linguistic knowledge of an L2-item, a learner may deploy achievement strategies. As far as lexical items are concerned, they may deploy the following strategies leading to error: ‘Approximation’, ‘Coinage’ and ‘Foreignization’. In the case of grammar or orthography, learners will draw on existing declarative knowledge, over-generalizing a rule (Generalization) or guessing;

(2) Use of resources: learners may use dictionaries or other sources of L2-knowledge (including people) incorrectly;
(3) L1-or L3-transfer;

(4) Avoidance.

Since these errors are extremely likely to occur in beginner and intermediate students’ writing, teachers should involve students in activities raising their awareness of these issues and provide practice in ways of tackling them. For instance, as far as the above Communication Strategies are concerned, students should be trained to use dictionaries and other resources more frequently to prevent errors due to Approximation, Coinage and Foreignization. Secondly, as far as poor use of resources is concerned, learners must be made aware of the possible pitfalls of using dictionaries and textbooks and be trained to use these tools more effectively and efficiently. Thirdly, learners must be made aware of the issues related to excessive reliance on L1-/L3-Transfer and of negative Transfer (again, through effective learner training).

As discussed above, errors can also be caused by WSTM processing failure due to cognitive overload. Grammatical, lexical and orthographic errors will occur when learners handle structures which have not been sufficiently automatized, in situations where the operating conditions in WSTM are too challenging for the attentional system to monitor all levels of production effectively. The implication for Error Correction is that learners should be made aware of which types of context are most likely to cause processing-efficiency failure, so that they may approach them more carefully in the future. Examples of such contexts are: sentences in which the learner is attempting to express a difficult concept requiring new vocabulary and tenses/moods he or she has not fully mastered; long sentences in which items agreeing with each other in gender and/or number are located far apart (not an uncommon occurrence in Italian); and situations in which the production of a sentence has to be interrupted several times because the learner needs to consult the dictionary. Remedial practice should give learners opportunities to operate in such contexts, in order to train them to cope with the demands these place on processing efficiency in Real Operating Conditions.

Another important implication of my discussion for Error Correction refers to the notion that errors are not simply the result of a Translating failure, but also of an Editing failure. The failure to detect errors may be due to several factors. The first relates to the goal-orientedness of the Production systems that regulate all levels of language processing: the brain will review the accuracy of every single aspect of the text only if it perceives this as relevant to its goals in producing the text. Thus, if the communication of content is the main goal the writer sets in an essay, the accuracy of function words is likely to become a secondary concern, since they are not perceived as salient to the realisation of that goal. Lack of time exacerbates this issue, since it forces learners to prioritise certain aspects of their output over others in the Editing phase(s). The implication for Error Correction is that it should aim at developing the learner’s intention to be accurate at every level of the text. This may not be easy if accuracy does not feature prominently among the curriculum’s, the teacher’s and/or the student’s priorities.
Secondly, editing failure may be due to the fact that reading an essay to check and/or improve the quality of its content is different from proofreading aimed at checking the non-semantic aspects of the output. As noted above, the former approach to text revision often results in a failure to detect errors with function words. The implication of this phenomenon for corrective approaches is that learners’ awareness of the importance of paying greater attention to function words when editing essays should be raised. Moreover, as an editing strategy, learners should be advised to carry out the revision of their essay drafts in two distinct phases: one aimed at checking the content and another focused exclusively on the accuracy of grammar, lexis and orthography.
Thirdly, editing failure may be caused by the same issue that led learners to err in the first place: processing efficiency. Thus, the contexts listed above (sentences that are long and/or complex and/or contain problematic structures, etc.) may impair the learner’s ability to detect and/or self-correct errors. One way to tackle this in remedial teaching is to advise learners to be particularly careful when editing sentences of this kind, and to approach them in a way that places less strain on their processing efficiency; for example, by concentrating first on the items that, based on the self-knowledge they will have developed through metacognitive training, they are most likely to get wrong in that kind of context (training in the Monitoring-Familiar-Errors strategy, sketched below, would help in this respect).
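As a concrete, deliberately simplistic illustration of the Monitoring-Familiar-Errors idea, here is a hedged Python sketch in which a learner’s self-knowledge is encoded as a checklist of error-prone patterns to scan for before the content-editing pass. Every pattern and message is an invented example for a hypothetical learner of Italian; the strategy of checking one’s known weak spots first, not the regexes, is the point.

```python
# Toy sketch of a learner-specific 'familiar errors' checklist.
# The patterns and advice below are invented examples, not a real tool.
import re

familiar_errors = {
    r"\bla problema\b": "'problema' is masculine: 'il problema'",
    r"\bho andato\b": "'andare' takes 'essere': 'sono andato/a'",
}

def flag_familiar_errors(draft: str) -> list[str]:
    """Return one warning per familiar-error pattern found in the draft."""
    warnings = []
    for pattern, advice in familiar_errors.items():
        for match in re.finditer(pattern, draft, flags=re.IGNORECASE):
            warnings.append(f"'{match.group(0)}': {advice}")
    return warnings

draft = "Ieri la problema era che non capivo niente."
for warning in flag_familiar_errors(draft):
    print(warning)  # 'la problema': 'problema' is masculine: 'il problema'
```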
A final point refers to the implications of the phenomenon of Variability for the diagnostic phase of any error treatment. As discussed above, this phenomenon may confuse the teacher or the error analyst as to whether a learner knows a given structure or not, since s/he seems to get it right at times and wrong at others. The implication for Error Correction is that teachers should investigate the causes of any occurrence of this phenomenon in their learners’ writing, in order to ascertain whether it stems from poor editing skills, partial knowledge of the target rule, etc. Once the causes have been identified, an appropriate action plan can be devised (a toy tallying sketch follows below).
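One way a teacher might begin to operationalise this diagnosis, sketched below with invented data, is simply to tally correct versus incorrect uses of the target structure by context: if accuracy collapses only in cognitively demanding contexts, the cause is more plausibly a processing or editing failure than missing rule knowledge. The contexts, figures and interpretation in the sketch are assumptions for illustration, not research findings.

```python
# Toy tally of one learner's use of a single target structure
# (e.g. gender/number agreement in Italian) across contexts.
from collections import Counter

# Invented observations: (context, produced_correctly).
observations = [
    ("short simple sentence", True),
    ("short simple sentence", True),
    ("short simple sentence", True),
    ("long sentence, agreeing items far apart", True),
    ("long sentence, agreeing items far apart", False),
    ("long sentence, agreeing items far apart", False),
]

correct, total = Counter(), Counter()
for context, ok in observations:
    total[context] += 1
    correct[context] += ok  # True counts as 1, False as 0

for context in total:
    print(f"{context}: {correct[context] / total[context]:.0%} accurate")
# short simple sentence: 100% accurate
# long sentence, agreeing items far apart: 33% accurate
# High accuracy in easy contexts with low accuracy in demanding ones points
# towards processing/editing failure rather than absent declarative knowledge.
```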

13 common misconceptions about foreign language learning


  1. If language learners are exposed to a foreign language before puberty they will learn it with a native accent – There is strong evidence that this is true below the age of 7, provided that the learners receive masses of second language input (e.g. in a full-immersion learning environment such as an international school). Whether this can happen between that age and the onset of puberty is highly controversial; there is mounting evidence that sensorimotor processing loses plasticity much earlier than other cognitive processes (such as those responsible for grammar and vocabulary learning) and that pronunciation fossilizes well before puberty.
  2. Children learn foreign languages better than adults – This is true of pronunciation, but not of vocabulary or grammar. Given the same amount of instruction, there are no significant differences in uptake between children and adults. Moreover, there is evidence that some adults can indeed attain native-like proficiency.
  3. Women’s brains are biologically better equipped for foreign language learning than men’s; that is why our female students outperform the boys – This, too, is quite controversial. Brain imaging shows that whereas males tend to lateralize language processing (i.e. use predominantly one hemisphere), women use both hemispheres, which may, at least in theory, constitute an advantage. But whether this actually causes women to perform better than men is debatable; sociological and affective factors seem to play a more crucial role in determining female language learners’ ‘superiority’ at language learning (see my blog post here).
  4. When we think, we think in our dominant language – Unless we engage in inner talk and subvocalize, the brain does not think in any particular language. When we think, we create ‘entities of information’ called propositions, which are not made up of words (scientists are still trying to figure out what they are made of); we transform them into words as we speak (which has enormous implications for L2 processing; see my article here). During oral production or writing, our brain activates all of the languages we master simultaneously; the language in use receives the strongest activation whilst the others are activated more weakly. This phenomenon explains why language learners, in unmonitored speech, often use L1 words whilst speaking the L2 even though they know the L2 word. (For more on this, read here.)
  5. Students should be taught in sync with their dominant learning style(s), as this will enhance their learning – Most psychologists and neuroscientists reject the learning-styles and multiple-intelligences constructs, maintaining that they are not valid representations of how the brain works. No credible evidence has ever been put forward in support of the hypothesis that teaching learners in their preferred ‘learning style’ actually enhances language proficiency development.
  6. Foreign language words similar to first language words (cognates) are easier to learn – This is true to a certain extent. Cognates are easier to learn receptively; however, in terms of recall, when the spelling and/or pronunciation of an L1 and an L2 word are very similar, they can cause ‘cross-association’ issues, whereby the learner is confused as to which is the correct spelling or pronunciation (because the two items are very closely associated in Long-Term Memory).
  7. If we do not correct our students’ errors we ‘fail’ them – Although we may ‘fail’ them in terms of not fulfilling their expectations (most of them do ask for corrective feedback), there is no conclusive evidence that error correction works. Most of the evidence put forward in support of the efficacy of error correction as it is traditionally carried out is not strong enough to justify the time teachers spend correcting (see my article here).
  8. Asking students to self-correct their errors is more effective than simply providing the correction – This is another belief many teachers hold about error correction. Research suggests not only that it does not usually ‘work’, but that it can in some cases be detrimental to learning (see my article here).
  9. Mistakes in student written output that the students can self-correct are due to ‘carelessness’ – This is the case for only a small percentage of the mistakes found in our students’ written (and even oral) output. What we term ‘careless mistakes’ are in most cases due to processing inefficiency caused by cognitive overload on Working Memory, that is, the brain’s inability to juggle all the cognitive demands posed by a task simultaneously (see my blog post here).
  10. Learning-to-learn (training foreign language students in learning skills) enhances proficiency – Many books and articles have been written promoting the benefits of Strategy-Based Instruction for language learning. We are told by many scholars and educators that we should instruct our students in learning strategies and lifelong learning skills. Although there is some (fragmented) evidence that certain strategies or combinations of strategies may help learners at some level, the results of the studies carried out to date are mixed and controversial. This is due to a number of issues, one of them being that we do not really know which strategies work and which do not, nor how they interact with individual characteristics and different contexts.
  11. Some foreign language learning strategies are better than others – Some educational consultants make a living out of suggesting which learning strategies are effective for performing certain tasks. However, the issue is not which strategies are ‘good’ or ‘bad’, but which strategy works best with specific students or tasks, and whether strategies are applied at the right time, in the right context. This complicates the implementation of Strategy-Based Instruction and makes one question whether the time invested in figuring out all these variables, deciding which strategies to teach and how, and then implementing the training is actually justified by the gains one may obtain in the end.
  12. Children learn languages through an innate module of the human mind called the LAD (Language Acquisition Device), which makes the acquisition of subsequent languages possible, too – The LAD is a system of principles that children are said to be born with, which helps them learn language and accounts for the order in which children acquire structures and the mistakes they make as they learn. According to its proponents, this device exists separately from any other cognitive mechanism of the brain. Just like faith in a supernatural being, the belief that such a ‘magical’ device actually exists has never been proven scientifically; nor has any reasonably detailed account of how it might work been provided by Noam Chomsky, who originally posited it, or by supporters such as Stephen Krashen, who built his Monitor Model upon it. Yet many language educators swear by it, and several teaching methodologies (e.g. TPRS and CLIL) are based on the belief that the LAD exists.
  13. First language and second language acquisition involve the same processes – This cannot be the case, as L2 learners are not a ‘tabula rasa’ (blank slate); they have already acquired a language and, as masses of research show, they use that language to formulate hypotheses and make inferences about how the new (target) language works. The existence of Language Transfer evidences the importance of pre-existing languages in the acquisition of a subsequent one.

You can find more on these topics in my book ‘The Language Teacher Toolkit’, co-authored with Steve Smith and available for purchase on http://www.amazon.com