Mapping out the second language writing process and implications for teaching

1. Introduction 

In this article I take on the complex task of illustrating the cognitive processes that take place in the brain of a second language student writer as s/he produces an essay. Why? Because often, as teachers and target language experts, we forget how challenging it is for our students to write an essay in a foreign language. Gaining a better grasp of the thinking processes that essay writing in a second language involves may help teachers become more cognitively empathetic towards their students; it may also lead them to reconsider the way they teach writing and treat student errors.

A caveat before we proceed: this article is quite a challenging read which may require some background in applied linguistics and/or cognitive psychology. However, if you want to avoid the complex stuff and concentrate on writing at lower proficiency levels (KS2 to KS4) you can go straight to section 3 below.

 

2. A Cognitive account of the writing processes: the Flower and Hayes (1981) model

The Flower and Hayes (1981) model of essay writing in a first language is regarded as one of the most effective accounts of writing available to date (Eysenck and Keane, 2010). As Figure 1 below shows, it posits three major components:

  1. Task-environment,
  2. Writer’s Long-Term Memory,
  3. Writing process.

Figure 1: The Flower and Hayes model

The three components work as follows: (1) the Task-environment includes the Writing Assignment (the topic, the target audience, and motivational factors) and the text produced so far; (2) the Writer’s Long-term memory provides factual knowledge and skill/genre-specific procedures; (3) the Writing Process consists of the three sub-processes of Planning, Translating and Reviewing.

The Planning process sets goals based on information drawn from the Task-environment and Long-Term Memory (LTM). Once these have been established, a writing plan is developed to achieve those goals. More specifically, the Generating sub-process retrieves information from LTM through an associative chain in which each item of information or concept retrieved functions as a cue to retrieve the next item of information and so forth. The Organising sub-process selects the most relevant items of information retrieved and organizes them into a coherent writing plan. Finally, the Goal-setting sub-process sets rules (e.g. ‘keep it simple’) that will be applied in the Editing process. The second process, Translating, transforms the information retrieved from LTM into language. This is necessary, since concepts are stored in LTM in the form of Propositions (‘concepts’/ ‘imagery’), not words. Flower and Hayes (1980) provide the following examples of what propositions involve:

[(Concept A) (Relation B) (Concept C)]

or

[(Concept D) (Attribute E)], etc.
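To make the associative chain of the Generating sub-process more concrete, here is a minimal Python sketch. It is only a toy illustration, not part of the Flower and Hayes model: the `LTM` dictionary, its contents and the `generate` function are all invented for the purpose, and real retrieval is obviously far richer than following the first available association.

```python
# Toy long-term memory: each concept cues a few associated concepts.
# The contents are invented purely for illustration.
LTM = {
    "school uniform": ["cost", "identity"],
    "cost": ["family budget"],
    "identity": ["self-expression"],
    "family budget": [],
    "self-expression": [],
}


def generate(topic: str, memory: dict[str, list[str]], max_items: int = 5) -> list[str]:
    """Follow an associative chain: each retrieved item cues the next retrieval."""
    retrieved = []
    cue = topic
    while cue is not None and len(retrieved) < max_items:
        retrieved.append(cue)
        # The first association not yet retrieved becomes the next cue.
        cue = next((a for a in memory.get(cue, []) if a not in retrieved), None)
    return retrieved


print(generate("school uniform", LTM))
```

The chain stops as soon as a retrieved item cues nothing new, which is one simple way of picturing why idea generation can dry up after only a few items.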

Finally, the Reviewing processes of Reading and Editing have the function of enhancing the quality of the output. The Editing process checks that grammar rules and discourse conventions are not being flouted, looks for semantic inaccuracies and evaluates the text in the light of the writing goals. Editing has the form of a Production system with two IF-THEN conditions:

The first part specifies the kind of language to which the editing production applies, e.g. formal sentences, notes, etc. The second is a fault detector for such problems as grammatical errors, incorrect words, and missing context.

(Flower and Hayes, 1981: 17)

In other words, when the conditions of a Production are met, e.g. a wrong word ending is detected, an action is triggered for fixing the problem. For example:

CONDITION 1: (formal sentence)

CONDITION 2: (first letter of sentence lower case)

ACTION: change first letter to upper case

(Flower and Hayes, 1981: 17)
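For readers who find it easier to think in code, the capitalisation example can be sketched as a rule that pairs tests with a fix. This is a minimal Python illustration under my own assumptions: the `Production` class and the helper functions are hypothetical names, and the test for a ‘formal sentence’ (ending in sentence punctuation) is a crude stand-in chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Production:
    """One editing production: if every condition holds, apply the action."""
    name: str
    conditions: list[Callable[[str], bool]]
    action: Callable[[str], str]

    def fire(self, sentence: str) -> str:
        if all(condition(sentence) for condition in self.conditions):
            return self.action(sentence)
        return sentence


def is_formal_sentence(sentence: str) -> bool:
    # Hypothetical stand-in: treat anything ending in sentence punctuation as formal.
    return sentence.rstrip().endswith((".", "!", "?"))


def starts_lower_case(sentence: str) -> bool:
    # Fault detector: the first character is a lower-case letter.
    return bool(sentence) and sentence[0].islower()


def capitalise_first_letter(sentence: str) -> str:
    return sentence[0].upper() + sentence[1:]


capitalise_rule = Production(
    name="capitalise sentence start",
    conditions=[is_formal_sentence, starts_lower_case],
    action=capitalise_first_letter,
)

print(capitalise_rule.fire("the essay is finished."))
print(capitalise_rule.fire("shopping list"))
```

The first call triggers the fix because both tests succeed; the second leaves the text untouched because a note such as ‘shopping list’ is not the kind of language the rule applies to.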

Two important features of the Editing process are: (1) it is triggered automatically whenever the conditions of an Editing Production are met; (2) it may interrupt any other ongoing process. Editing is regulated by an attentional system called The Monitor. Hayes and Flower do not provide a detailed account of how it operates. Unlike Krashen’s (1977) Monitor, a control system used solely for editing, Hayes and Flower’s (1980) device operates at all levels of production, orchestrating the activation of the various sub-processes. This allows Hayes and Flower to account for two phenomena they observed. Firstly, the Editing and the Generating processes can cut across other processes. Secondly, the existence of the Monitor enables the system to be flexible in the application of goal-setting rules, in that through the Monitor any other processes can be triggered. This flexibility allows for the recursiveness of the writing process.

Hayes and Flower’s model is useful in providing teachers with a framework for understanding the many demands that essay writing places on students. In particular, it helps teachers understand how the recursiveness of the writing process may cause those demands to interfere with each other, resulting in cognitive overload and error.

Furthermore, by conceptualising editing as a process that can interrupt writing at any moment, the model has a very important implication for a theory of error: self-correctable errors occurring at any level of written production are not always the result of a retrieval failure; they may also be interpreted as caused by detection failure (failure to ‘spot’ a mistake).

One limitation of the model for a theory of error is that its description of the Translating and Editing sub-processes is too general. I shall therefore supplement it with Cooper and Matsuhashi’s (1983) list of writing plans and decisions along with findings from other L1-writing Cognitive research, which will provide the reader with a more detailed account. I shall also briefly discuss some findings from proofreading research which may help explain some of the problems encountered by L2-student writers during the Editing process.

3. The translating sub-processes

Cooper and Matsuhashi (1983) posit four stages, which correspond to Flower and Hayes’ (1981) conceptualization of the Translating process: Wording, Presenting, Storing and Transcribing (see Figure 2 below).

Figure 2 – The Translating sub-processes

  • WORDING THE PROPOSITION (Lexical selection) – In this first stage, the brain transforms the propositional content into lexis. Although at this stage the pre-lexical decisions the writer made at earlier stages and the preceding discourse limit lexical choice, Wording the proposition is still a complex task: ‘the choice seems infinite, especially when we begin considering all the possibilities for modifying or qualifying the main verb and the agentive and affected nouns’ (Cooper and Matsuhashi, 1983: 32). Once s/he has selected the lexical items, the writer has to tackle the task of Presenting the proposition in standard written language.
  • PRESENTING THE PROPOSITION (Grammatical encoding) – This involves making a series of decisions in the areas of genre, grammar and syntax. In the area of grammar, Agreement, Word-order and Tense will be the main issues for L1-English learners of languages like French, German, Italian or Spanish. Functional processing, i.e. assigning a functional role (e.g. subject, verb, direct or indirect object) to every word in a sentence, precedes Positional processing, i.e. arranging the words in the correct syntactic order. This is the stage where grammatical mistakes are made, mostly due, in second language writing, to processing inefficiency (e.g. mistakes caused by cognitive overload), carelessness (i.e. superficial self-monitoring) and, of course, L1/L3 negative transfer (i.e. the influence of the first language or other languages).
  • STORING THE PROPOSITION (Phonological and Orthographic encoding) – The proposition, as planned so far, is then temporarily stored in Working Short Term Memory (henceforth WSTM) while Transcribing takes place, first in the form of sound (phonological encoding). Phonological encoding is crucial for internal speech monitoring and for preparing the sentence for written output. Propositions longer than just a few words will have to be rehearsed and re-rehearsed in WSTM for parts of them not to be lost before the transcription is complete. The limitations of WSTM create serious disadvantages for unpractised writers. Until they gain some confidence and fluency with spelling, their WSTM may have to be loaded up with letter sequences of single words or with only 2 or 3 words (Hotopf, 1980). This not only slows down the writing process, but it also means that all other planning must be suspended during the transcription of short letter or word sequences. This is where many spelling mistakes occur, especially with younger L2 learners (who have a much more limited working memory capacity than older learners) or less able older learners. This problem will be exacerbated in the case of children having to learn a completely different writing system (e.g. an English native speaker learning to write in Mandarin).
  • TRANSCRIBING THE PROPOSITION (Motor planning and execution) – The physical act of transcribing the fully formed proposition begins once the graphic image of the output has been stored in WSTM. In L1-writing, transcription occupies subsidiary awareness, enabling the writer to use focal awareness for other plans and decisions. In practised writers, transcription of certain words and sentences can be so automatic as to permit planning the next proposition while one is still transcribing the previous one. An interesting finding with regard to these final stages of written production comes from Bereiter, Fine and Gartshore (1979), who investigated L1-writers aged 10-12. They identified several discrepancies between learners’ forecasts in think-aloud and their actual writing. 78% of such discrepancies involved stylistic variations. Notably, in 17% of the forecasts, significant words were uttered which did not appear in the writing. In about half of these cases the result was a syntactic flaw (e.g. the forecasted phrase ‘on the way to school’ was written ‘on the to school’). Bereiter and Scardamalia (1987) believe that lapses of this kind indicate that language is lost somewhere between storage in WSTM and grapho-motor execution. These lapses, they also assert, cannot be described as ‘forgetting what one was going to say’ since almost every omission was reported on recall: in the case of ‘on the to school’, for example, the author not only intended to write ‘on the way’ but claimed later to have written it. In their view, this is caused by interference from the attentional demands of the mechanics of writing (spelling, capitalization, etc.), the underlying psychological premise being that a writer has a limited amount of attention to allocate and that whatever is taken up with the lower level demands of written language must be taken from something else.

In sum, Cooper and Matsuhashi (1983) posit four main stages in the conversion of the preverbal message into a speech plan: (1) the selection of the right lexical units; (2) the application of grammatical and syntactic rules; (3) the temporary storage of the resulting unit of language in WSTM, in phonological and orthographic form, while it awaits grapho-motor execution; and (4) grapho-motor execution (the physical act of writing).

The temporary storage in stage (3) raises the possibility that lower level demands affect production in two ways: (1) by causing the writer to omit material during grapho-motor execution; (2) by leading the writer to forget higher-level decisions already made. Interference resulting in WSTM loss can also be caused by lack of monitoring of the written output due to devoting conscious attention entirely to planning ahead, while leaving the process of transcription to run ‘on automatic’.

Figure 2 (repeated)

Implications for teaching

The implications of the above for second language instruction are obvious: the implementation of a process-based approach to writing instruction in which the teacher stages sequences of activities which explicitly address the micro-skills of writing. This entails engaging students, consistently, in tasks which practise said micro-skills (see Figure 2 above). Figure 3 below provides examples of activities that could be implemented for each micro-skill.

Imagine, after exploiting 90-95% comprehensible-input texts intensively through a range of activities, engaging the students in micro-writing tasks addressing all of the micro-skills of writing prior to staging more unstructured and creative activities. Would your students not perform better? Would you not be more inclusive?

4. How about editing? Some insights from proofreading research

Proofreading theories and research provide us with the following important insights into the mechanisms that regulate essay editing. Firstly, proofreading involves different processes from reading: when one proofreads a passage, one is generally looking for misspellings, words that might have been omitted or repeated, typographical mistakes, etc., and as a result, comprehension is not the goal. When one is reading a text, on the other hand, one’s primary goal is comprehension. Thus, reading involves construction of meaning, while proofreading involves visual search. For this reason, in reading, short function words, not being semantically salient, are not fixated (Paap, Newsome, McDonald and Schvaneveldt, 1982). Consequently, errors on such words are less likely to be spotted when one is editing a text concentrating mostly on its meaning than when one is focusing one’s attention on the text as part of a proofreading task (Haber and Schindler, 1981). Errors are likely to decrease even further when the proofreader is forced to fixate on every single function word in isolation (Haber and Schindler, 1981).

It should also be noted that some proofreaders’ errors appear to be due to acoustic coding. This refers to the phenomenon whereby the way a proofreader pronounces a word/diphthong/letter influences his/her detection of an error. For example, if an English learner of L2-Italian pronounces the ‘e’ in the singular noun ‘stazione’ (= train station) as [i] instead of [e], s/he will find it difficult to differentiate it from the plural ‘stazioni’ (= train stations). This may impinge on her/his ability to spot errors with that word involving the use of the singular for the plural and vice versa.

Implications for teaching

The implications for language learning are that learners may have to be trained to edit their essays at least once focusing exclusively on form. Ideally, with beginner learners, the teacher should encourage several rounds of editing, each focusing on a different potential problem area, gradually moving from easier to more challenging items.

Secondly, they should be told to pay particular attention to those words (e.g. function words) and parts of words (e.g. verb endings) which are not semantically and perceptually salient and are therefore less likely to be noticed.

Thirdly, dictations should feature regularly in language lessons from very early on in the L2 learning process, beginning with micro-dictation focusing on single letters or syllables, then moving on to gapped sentences and finally to longer texts with more cognitively challenging tasks such as dictogloss.

5. Bilingual written production: adapting the first language model

Writing, although slower than speaking, is still processed at enormous speed in mature native speakers’ WSTM. The processing time required by a writer will be greater in the L2 than in the L1 and will increase at lower levels of proficiency: at the Wording stage, more time will be needed to match non-proceduralized lexical materials to propositions; at the Presenting stage, more time will be needed to select and retrieve the right grammatical form. Furthermore, more attentional effort will be required in rehearsing the sentence plans in WSTM; in fact, just like Hotopf’s (1980) young L1-writers, non-proficient L2-learners may be able to store in WSTM only two or three words at a time. This has implications for Agreement in Italian, French or Spanish in view of the fact that words more than three or four words distant from one another may still have to agree in gender and number. Finally, in the Transcribing phase, the retrieval of spelling and other aspects of the writing mechanics will take up more WSTM focal awareness.
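The point about Agreement can be pictured with a deliberately crude Python sketch. Everything in it is an assumption made for illustration: WSTM is reduced to a buffer holding only the last few words, the default span of three words echoes the two-to-three-word figure cited above, and the function name `agreement_check_possible` and the Italian example sentences are my own.

```python
from collections import deque


def agreement_check_possible(words: list[str], noun: str, adjective: str, span: int = 3) -> bool:
    """Return True if `noun` is still in the buffer when `adjective` is reached.

    The buffer is a toy stand-in for WSTM: it holds only the last `span` words.
    """
    buffer = deque(maxlen=span)
    for word in words:
        if word == adjective:
            return noun in buffer
        buffer.append(word)
    return False


short_sentence = "le ragazze sono simpatiche".split()
long_sentence = "le ragazze che abitano vicino a noi sono simpatiche".split()

print(agreement_check_possible(short_sentence, "ragazze", "simpatiche"))
print(agreement_check_possible(long_sentence, "ragazze", "simpatiche"))
```

In the short sentence the noun is still ‘in mind’ when the adjective is written, so the feminine plural ending can be checked against it; in the long one, the intervening relative clause has pushed the noun out of the buffer, which is exactly the situation in which an agreement error is likely to slip through.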

Monitoring too will require more conscious effort, increasing the chances of Short-term Memory loss. This is more likely to happen with less expert learners: since their attentional system has to monitor levels of language that in the mature L1-speaker are normally automatized, it will not have enough channel capacity available, at the point of utterance, to cope with lexical/grammatical items that have not yet been proceduralised. This also implies that Editing is likely to be more recursive than in L1-writing, interrupting other writing processes more often, with consequences for the higher meta-components. In view of the attentional demands posed by L2-writing, the interference caused by planning ahead will also be more likely to occur, giving rise to processing failure. Processing failure/WSTM loss may also be caused by the L2-writer pausing to consult dictionaries or other resources to fill gaps in their L2-knowledge while rehearsing the incomplete sentence plan in WSTM. In fact, research indicates that although, in general terms, composing patterns (sequences of writing behaviours) are similar in L1s and L2s, there are some important differences.

In his seminal review of the L1/L2-writing literature, Silva (1993) identified a number of discrepancies between L1- and L2-composing. Firstly, L2-composing was clearly more difficult. More specifically, the Transcribing phase was more laborious, less fluent, and less productive. Also, L2-writers spent more time referring back to an outline or prompt and consulting dictionaries. They also experienced more problems in selecting the appropriate vocabulary. Furthermore, L2-writers paused more frequently and for longer, which resulted in L2-writing occurring at a slower rate. As far as Reviewing is concerned, Silva (1993) found evidence in the literature that in L2-writing there is usually less re-reading of and reflecting on written texts. He also reported evidence suggesting that L2-writers revise more, before and while drafting, and in between drafts. However, this revision was more problematic and more of a preoccupation. There also appears to be less auditory monitoring in the L2 and L2-revision seems to focus more on grammar and less on mechanics, particularly spelling. Finally, the text features of L2-written texts provide strong evidence suggesting that L2-writing is a less fluent process involving more errors and producing – at least in terms of the judgements of native English speakers – less effective texts.

Implications for teaching

Firstly, the process of writing being much more challenging in the second language, teachers must scaffold writing much more carefully. This starts with staging an intensive reading-to-learn phase prior to engaging the students in writing tasks, which unfortunately doesn’t happen with textbooks, because the latter only include reading-to-comprehend activities. After this intensive receptive phase, teachers should engage the students in a series of micro-writing tasks which gradually phase out support and increase in cognitive load. This means beginning writing practice with basic SVO sentences and gradually moving to more complex SVOCA sentence structures and subordination.

6. Conclusions

Essay writing is a very complex process which places a huge cognitive load on the average second language learner’s brain, especially at lower levels of proficiency. The cognitive load is determined by the fact that the L2 student writer has to plan the essay whilst focusing on the act of translating ideas (propositions) into the foreign language. Converting propositions into L2 sentences, as I have tried to illustrate, is hugely challenging per se for a non-native speaker, let alone when the brain has to hold in Working Memory the ideas one intends to convey at the same time. Working Memory being limited in capacity, it is easy to ‘lose’ one or the other in the process and equally easy to make mistakes, as the monitor (i.e. the error detecting system in our brain) receives less activation due to cognitive overload.

Hence, before plunging our students into essay writing, teachers need to ensure that they provide lots of practice in the execution of the different sets of skills that writing involves (e.g. ideas generation, planning, organization, self-monitoring) separately. For instance, a writing lesson may involve sections where the students are focused on discrete sets of higher order skills (e.g. practising idea generation; evaluating relevance of the ideas generated to a given topic/essay title) and sections where lower order skills are drilled in (application of grammar and syntax rules, lexical recall, spelling). Only when the students have reached a reasonable level of maturity across most of the key skills embedded in the models discussed above should they be asked to engage in extensive writing.

Consequently, an effective essay-writing instruction curriculum must identify the main skills involved in the writing process (as per the above model); allocate sufficient time for their extensive practice as contextualized within the themes and text genres relevant to the course under study; and build into the higher order skill practice opportunities to practise the lower order skills identified above (the mechanics of the language), whilst being mindful of potential cognitive overload issues.

In terms of editing, the above discussion has enormous implications as it suggests that teachers should train learners to become more effective editors through regular editing practice (e.g. ‘Error hunting’ activities). Such training may result in more rapid and effective application of editing skills in real operating conditions as the execution of Self-Monitoring will require less cognitive space in Working Memory. Training learners in editing should be a regular occurrence in lessons if we want it to actually work; also, it should be contextualized in a relevant linguistic environment as much as possible (e.g. if we are training the students to become better essay editors we ought to provide them with essay-editing practice, not just with random and uncontextualized sentences).

In conclusion, I firmly believe that the above model should be used by every language teacher and curriculum designer as a starting point for the planning of any writing instruction program. Not long ago I took part in a conference at which a colleague was recommending that the attending teachers give their Year 12 students exam-like discursive essays to write, week in week out, from the very first week of the course. I am not ashamed to admit that I used to do the same in my first years of teaching A levels. The above discussion, however, would suggest that such an approach may be counterproductive; it may lead to errors, fossilization of those errors, and inhibit proficiency development whilst stifling the higher metacomponents of the writing process: idea-generation, essay organization and self-monitoring.

Why does training L2 learners in METACOGNITION often fail?

Introduction

In the 80s and 90s, metacognition – one’s awareness and regulation of one’s own thinking and learning processes – was a big deal in educational circles. L2 researchers like O’Malley and Chamot, Wenden, Cohen and, in England, Professors Macaro (my PhD supervisor) and Graham (my PhD internal examiner), advocated vehemently for the implementation of training in metacognitive strategies as a means to improve learning outcomes.

These advocates postulated, based on evidence from a handful of promising studies, that metacognition could be effectively taught following a principled framework (Explicit Strategy Training) which unfolded pretty much like the model in the picture below (ERSI = Explicit Reading Strategies Instruction), significantly enhancing L2 students’ performance across all four language skills.

Figure 1 – Explicit Strategy Training model

As often happens in our field, the interest fizzled out pretty soon, as language educators quickly realised that the time and effort they had to put in in order for metacognitive training (henceforth MT) to yield some substantive benefits was more than they could afford. There were other issues too, which I will explore below, to do with developmental readiness, teacher expertise and motivation, which deterred many language educators from buying into MT.

I experienced first-hand how time consuming, effortful and complex implementing an MT program is, during my PhD in Self-Monitoring strategies as applied to L2 essay writing. Mind you, the results were excellent: the training managed to significantly reduce a wide range of very stubborn errors in my students’ writing. However, the time and effort I invested in the process was something that I could never have put in, had I been a teacher on a full timetable.

Forty years on from its golden age, metacognition and MT are trending again in educational circles. Many schools are now implementing metacognition enhancement programs in the hope of increasing learner planning, monitoring and self-evaluation skills. However, at least from what I have gleaned from my school visits, conversations with colleagues and other anecdotal data, many of these programs exhibit a number of flaws which seriously undermine their efficacy. Before delving into them, let me remind the reader of what metacognition and metacognitive strategies are about.

Metacognition and metacognitive strategies – what are they?

Having written about metacognition before, I will very briefly remind the reader of what metacognition entails.

Metacognition is the awareness and regulation of one’s own thinking and learning processes. It involves three key components:

(1) Metacognitive Knowledge—understanding how one learns best;

(2) Metacognitive Regulation—planning, monitoring, evaluating and adjusting learning strategies; and

(3) Metacognitive Experience—reflecting on past learning to improve future performance.

In language learning, metacognition helps learners set goals, choose effective strategies, and evaluate progress. It fosters independence, problem-solving, and long-term retention. Effective metacognitive strategy training enables learners to become more self-aware and adaptable, improving comprehension, speaking, and writing skills. Ultimately, metacognition transforms learners into active, strategic thinkers who optimize their own learning.

Metacognitive strategies include actions, mental operations and techniques that L2 learners undertake in order to improve their performance by planning, monitoring, self-evaluating and setting goals. Tables 1 and 2 below categorize metacognitive strategies into those that help with planning & monitoring and those that support self-regulation & reflection, making them easier to implement systematically.

Common shortcomings of metacognitive training programs

1. Insufficient or Inconsistent Training Duration

Many programs do not provide enough time for learners to fully develop and internalize metacognitive strategies. Effective strategy use requires regular and long-term practice and reinforcement lasting 3 to 6 months or even longer, yet some programs last only a few weeks. According to the literature, this is the most common reason why MT programs fail.

Example Issue:

🔹 A 4-week metacognitive training program may not show strong results because learners haven’t had enough exposure to develop automatic strategy use.

Solution:

✅ Longer programs with progressive scaffolding (e.g., training over an entire semester or year).
✅ Periodic strategy reinforcement instead of one-time instruction.


2. Lack of Explicit Training

Why It Matters

Some teachers assume that learners will naturally pick up metacognitive strategies just by being exposed to them. Implicit instruction (modeling, indirect feedback) can play an important role but on its own is often not enough—students need explicit training on how and when to use these strategies.

Example Issue:

🔹 A study where learners are simply given reading comprehension tasks but are not explicitly taught how to plan, monitor, and evaluate their reading may fail to show significant improvements.

Solution:

✅ Explicit strategy instruction with step-by-step guidance (e.g., teaching learners to pause, summarize, and predict while reading).
✅ Use of think-aloud protocols where instructors demonstrate metacognitive strategies.


3. Lack of Learner Awareness & Readiness

Why It Matters

Not all learners instinctively use metacognitive strategies. Beginners or low-proficiency learners may lack the cognitive capacity to focus on both language processing and strategy application at the same time.

Example Issue:

🔹 A program implementing high-level reflection strategies with beginner learners may find little impact because they struggle with basic comprehension, making strategy use overwhelming.

Solution:

✅ Gradual introduction of simple strategies first, then progression to more complex ones.
✅ Differentiated instruction based on learner proficiency.


4. Misalignment Between Metacognitive Strategies and Task Demands

Why It Matters

Some strategies may not be suitable for the specific language task the students are being trained to perform. If the strategy does not align with the nature of the task, learners may misuse or underuse it.

Example Issue:

🔹 Testing metacognitive listening strategies (predicting, summarizing) on a phoneme discrimination task may not see much improvement because phoneme recognition relies more on cognitive than metacognitive skills.

Solution:

✅ Ensure the right strategies are taught for the right tasks (e.g., metacognitive strategies are most useful for reading, writing, and listening comprehension).
✅ Train students when to use which strategy effectively.


5. Limited Learner Motivation or Engagement

Why It Matters

Some students do not see the immediate value of metacognitive strategies and fail to engage with them actively. If students are not motivated, they are unlikely to consistently apply the strategies outside of training sessions.

Example Issue:

🔹 A study assumes that students will automatically use metacognitive strategies in their self-study time, but without motivation, many learners simply do not apply them.

Solution:

✅ Increase strategy relevance by linking them to real-world benefits (e.g., improving exam performance, fluency, or confidence).
✅ Use gamification and self-reflection exercises to keep learners engaged.


6. Failure to Account for Individual Differences

Why It Matters

Learners differ in cognitive styles, motivation, and prior strategy knowledge. Some learners naturally use metacognitive strategies, while others struggle even after training.

Example Issue:

🔹 A study may average the results across all learners without considering that some learners benefited while others did not.

Solution:

✅ Conduct pre-tests to determine baseline strategy use before training.
✅ Use personalized strategy training rather than a one-size-fits-all approach.


✅ Use multiple assessment methods (e.g., think-aloud protocols, task-based assessments, real-time monitoring).
✅ Measure language proficiency gains alongside self-reports.



7. Teacher Expertise & Implementation Issues

Why It Matters

Some teachers may not be adequately trained in metacognitive instruction, leading to ineffective delivery.

Example Issue:

🔹 A program on listening strategy training fails to show strong results because teachers do not provide clear modeling or feedback.

Solution:

✅ Ensure teacher training in explicit strategy instruction.
✅ Use standardized instructional methods across all participants.


8. Short-Term vs. Long-Term Impact Measurement

Why It Matters

Some programs measure effects immediately after training, missing potential long-term benefits. Metacognitive strategies often require time to internalize before showing clear benefits.

Example Issue:

🔹 A program finds no significant impact after 4 weeks, but if measured after 6 months, the results might be different.

Solution:

✅ Conduct longitudinal follow-ups to check delayed improvements.
✅ Use delayed post-tests to assess strategy retention.


Conclusion: Why do many metacognitive training programs fail?

Many MT programs fail to show strong effects because of:

  • Too short training duration – Not enough time for mastery.
  • The students may not be cognitively ready – MT does require the application of higher order skills
  • The students may simply not be interested – they are there to learn a language and may not see the long-term benefits or what you are trying to achieve
  • Lack of explicit strategy instruction – Students don’t know how to use the strategies effectively.
  • Poor alignment of strategies with tasks – Wrong strategies for the wrong skills.
  • The teachers simply do not have the know-how to teach metacognitive skills

Any language educator wanting to teach metacognition should bear the above issues in mind before embarking on an MT program. Following a trend can be a very perilous endeavour, especially in a field like L2 acquisition, in which the research evidence that MT programs actually work is very fragmented and inconclusive.

If you want to know more about Metacognition and metacognitive training, you can attend any of my workshops organised by http://www.networkforlearning.org.uk

The most frequent words that MFL learners don’t learn; why and what you can do about it

Introduction

The words in any given language can be divided into content and function words. Content words include nouns, adjectives, verbs and adverbs, i.e. words that carry meaning. Function words, on the other hand, are grammatical words that serve structural purposes in a sentence rather than carrying lexical meaning. They help establish relationships between content words and provide grammatical cohesion.

Function words include: articles, prepositions, pronouns, conjunctions, auxiliary verbs, negation words, question words and particles. Table 2 below provides the full list of function words with examples from French and Spanish.

Why Are Function Words SO Important?

Function words are essential because they structure sentences, provide grammatical cohesion, and clarify meaning. Without function words, communication would be disjointed, ambiguous, or even incomprehensible. In other words, they are the glue that holds sentences together and in speech they improve our students’ communication and fluency. They are also very useful for comprehension, as they help language learners predict sentence meaning even if they don’t know all the content words in a sentence.

The following stats will give you an idea of how essential they are to communication and how important it is for our students to learn them:

  • although they constitute 1% of the entire lexicon of a language, they make up 55% of any text.
  • they dominate the most frequent 200 words in most languages (e.g. English, French and Spanish)
  • 50% of the top 1,000 words in any language are function words
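
To get a feel for figures like these, one can count how many tokens in a text are function words. The sketch below is only an illustration: the `FUNCTION_WORDS` set is a small hand-picked sample of French function words (an assumption, not a validated list), and the function name `function_word_share` is hypothetical. It does not attempt to verify the statistics quoted above.

```python
import re

# A small, hand-picked sample of French function words (an assumption for
# illustration; a real analysis would use a full, validated list).
FUNCTION_WORDS = {
    "le", "la", "les", "l", "un", "une", "des", "de", "du", "d",
    "à", "au", "aux", "en", "dans", "sur", "avec", "pour",
    "je", "j", "tu", "il", "elle", "on", "nous", "vous", "ils", "elles",
    "et", "ou", "mais", "que", "qu", "qui", "ne", "n", "pas", "y",
}

def function_word_share(text):
    """Return the proportion of tokens in `text` that are function words."""
    # Lower-case the text and keep runs of (accented) letters as tokens.
    tokens = re.findall(r"[a-zàâçéèêëîïôûùüœ]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FUNCTION_WORDS)
    return hits / len(tokens)

sample = "Il y a beaucoup de gens dans la rue et je ne les vois pas."
share = function_word_share(sample)
print(f"{share:.0%} of the tokens are function words")
```

Running the same count on a text your students are about to read gives a quick indication of how much of it they can only process if they know these words.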

With the above statistics in mind, it will be immediately clear how important these words are in the context of the new GCSE, whose core vocabulary is based on word-frequency lists: they will constitute a massive chunk of the vocabulary your students are expected to learn. A telling example: the definite article ‘le’ is ranked no. 1 in French, in terms of frequency, on those lists…

The least focused-on words in language curricula and the latest to be acquired too!

Although they are so key to communication and fluency, these words are notoriously the most neglected and the least successfully taught words in most MFL curricula! It is not uncommon to find that even our A-level students struggle with these words. At GCSE, the vast majority of our students, including the more able, have problems recalling and using function words in spontaneous speech and writing. In fact, you may be surprised to learn that these words are acquired late in our native language too. But why is that?

The main reasons why these words are the least successfully learnt by our students relate to five main issues summarised in the table below:

Points 1, 2 and 3 in Table 3 are key and compound one another. If a word is already not too important for meaning, is weakly stressed in speech and is abstract, it is obvious that the average language learner is unlikely to notice and learn it. Add to this the fact that these words do not always have a straightforward translation in the students’ L1 and that often the translation varies (e.g. ‘en’ in French can mean in, at, to and whilst).

Many traditional instructional practices make the intrinsic challenges these words pose to the learner even greater. These are a few examples of such practices:

  1. Most of the listening and reading activities staged by language teachers do focus the students on comprehension, but rarely do they explicitly target these words.
  2. The speed of delivery of aural texts often doesn’t allow the learners, especially at lower levels of proficiency, to notice these words. In EPI, on the other hand, the teacher models the sentences at a slow-to-moderate pace and uses a number of techniques and tasks to make these words noticeable and learnable, e.g. (1) input enhancement (to make function words stand out); (2) input flood (to induce repeated processing of these words); (3) gapped dictations or the ‘Spot the missing detail’ task; (4) gapped and tangled translations (where the focus is on these words); (5) Faulty transcript/Spot the difference; (6) Listen and spot the error (where function words are used incorrectly); etc.
  3. Many teachers and published instructional resources teach words in isolation. The Linguascope website is the most flagrant example of this, with lists of ten words to learn on their own completely decontextualised – which, according to research, is the most ineffective way of learning vocabulary for beginners. In EPI, the focus being on learning chunks of language, the students are introduced to and practise function words all the time and repeatedly, in context, and through masses of highly comprehensible input.
  4. Teachers rarely deliberately plan for the regular recycling of these words over time through distributed practice. This is a big shortcoming because function words, by virtue of their abstractness and low saliency, are harder to commit to memory, hence they may require even more recycling than content words.

Implications for teaching

First of all, language teachers may want to use sentence builders or any other modelling tools which present function words in context through highly comprehensible input.

Secondly, when modelling the use of function words orally, they should do so using input enhancement techniques. Modelling through listening-whilst-reading techniques (as we do in EPI) at a slow-to-moderate pace whilst emphasizing these words through vocal and visual input enhancement techniques is very effective in facilitating noticing, especially in the presence of the assimilation phenomenon, which causes a function word to blend in with the first syllable of the next word (e.g. in ‘il y a beaucoup de gens’, where ‘de’ is hardly audible in naturalistic speech).

Thirdly, teachers should stage a number of receptive and productive retrieval practice activities which deliberately target function words (as per the examples provided in the previous paragraph). Sentence-combining tasks can also be powerful in practising these words meaningfully, especially when it comes to connectives.

Fourthly, the curriculum should deliberately recycle function words many times over, constantly engaging the students in retrieval practice episodes. Aim at 30 to 40 encounters overall across all four skills.
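
As a planning aid, the recycling target can be sketched as a simple schedule. This is a minimal sketch under stated assumptions: the eight-week span, the even spread across weeks and the function name `plan_encounters` are illustrative choices, not a prescribed EPI procedure.

```python
from itertools import cycle

SKILLS = ["listening", "reading", "speaking", "writing"]

def plan_encounters(total_encounters=32, weeks=8):
    """Spread encounters with a target word over `weeks`, rotating the skills.

    Returns a dict mapping week number -> list of skills to target that week.
    The default of 32 encounters sits inside the 30-40 range suggested above;
    the 8-week span is an assumption made for illustration.
    """
    base, extra = divmod(total_encounters, weeks)
    skill_cycle = cycle(SKILLS)
    plan = {}
    for week in range(1, weeks + 1):
        # The first `extra` weeks take one additional encounter each.
        count = base + (1 if week <= extra else 0)
        plan[week] = [next(skill_cycle) for _ in range(count)]
    return plan

plan = plan_encounters()
for week, skills in plan.items():
    print(f"Week {week}: {', '.join(skills)}")
```

A schedule like this only counts planned encounters; incidental exposure in other lessons comes on top of it.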

Fifthly, do raise your students’ awareness of the importance of these words and, by regularly drawing their attention to them when dealing with written and oral texts, sensitize them to their existence. Also, encourage them to experiment with the function words that will enhance their oral and written output, such as high-frequency discourse markers, indirect or emphatic pronouns, etc.

Lastly, since these words are notoriously difficult to commit to long-term memory, besides tons of recycling, do endeavour to be as multimodal as possible, employing gestures, images, songs, rhymes, digital, miniwhiteboard as well as paper-based learning, mnemonics, etc. Table 4 below lists a number of possible techniques you could use.

In which order should we teach grammar structures based on SLA research?

Introduction

I was recently asked by a member of Global Innovative Language Teachers, the Facebook group I co-founded with Dylan Vinales, how grammar structures should be sequenced in a curriculum. The easy answer is: from easier to more difficult, of course. But how do we establish which structures are more easily learnable than others?

A researcher by the name of Manfred Pienemann attempted to answer this question in a landmark study: Pienemann, M. (1984). “Psychological Constraints on the Teachability of Languages.” Studies in Second Language Acquisition, 6(2), 186-214. This study laid the groundwork for his later development of Processability Theory (1998), which further expanded on how learners acquire grammatical structures in a fixed sequence.

Manfred Pienemann’s Learnability Theory suggests that language acquisition follows a predictable sequence due to human working memory’s cognitive constraints. His Processability Theory (PT) builds on this by explaining how learners acquire grammatical structures step by step, as their cognitive processing abilities develop.

Key concepts of Learnability Theory

  1. Developmental Stages: Language structures are acquired in a sequence, meaning some grammatical forms cannot be learned before others.
  2. Teachability Hypothesis: Instruction can only be effective if it aligns with the learner’s current stage of acquisition. Trying to teach advanced structures too early is ineffective.
  3. Processing Hierarchy: Learners process simpler linguistic structures before tackling more complex ones.

Stages of French Language Acquisition Based on Processability Theory

Pienemann’s theory outlines a six-stage sequence for second language acquisition. Below is how this sequence applies to French learners:

Stage 1: Single Words and Fixed Phrases (No Real Grammar Processing)

At this pre-syntactic stage, learners rely on memorized words and formulaic phrases without grammatical manipulation.

  • Bonjour ! (Hello!)
  • Merci ! (Thank you!)
  • Comment ça va ? (How’s it going?)
  • Moi, Marie. (Me, Marie.)

Key Characteristics:

Learners do not yet process word order or inflections.
Responses are often formulaic and learned as whole chunks.


Stage 2: Simple Word Order (Canonical Word Order – SVO)

At this stage, learners start forming simple Subject-Verb-Object (SVO) sentences.

  • Je mange une pomme. (I eat an apple.)
  • Il aime le chocolat. (He likes chocolate.)
  • Marie regarde la télé. (Marie watches TV.)

Key Characteristics:

Learners can construct basic declarative sentences.
No agreement processing yet (gender, number).
No word order variation (such as inversion for questions).


Stage 3: Morphological Inflections (Lexical Morphology)

Learners begin processing grammatical markers like plural (-s), gender agreement, and verb inflections.

  • Les pommes sont rouges. (The apples are red.) → (Plural agreement)
  • Un petit garçon / Une petite fille (A small boy / A small girl) → (Gender agreement)
  • Je finis mon travail. (I finish my work.) → (Present tense verb inflection)

Key Characteristics:

Learners begin applying regular inflections (e.g., plural -s, feminine -e).
Still inconsistent with irregular forms.
Errors in agreement (e.g., les grande maison instead of les grandes maisons).


Stage 4: Sentence Internal Reordering (Question Formation & Object Pronouns)

At this stage, learners acquire word order changes beyond the basic SVO structure. This includes:

1. Question Formation (Simple & Inversion)

  • Tu as un chien ? (You have a dog?) → (Rising intonation, no inversion)
  • Est-ce que tu as un chien ? (Do you have a dog?) → (Fixed structure)
  • As-tu un chien ? (Have you a dog?) → (Inversion – more advanced)

2. Object Pronoun Placement

  • Je vois Marie. (I see Marie.) → Basic SVO order
  • Je la vois. (I see her.) → (Pronoun before verb – first instance of reordering)
  • Je ne la vois pas. (I don’t see her.) → (More complex negative structure)

Key Characteristics:

Learners start reordering elements in sentences.
Questions evolve from declarative word order to inversion patterns.
Object pronouns begin appearing in correct positions.
Errors still common (e.g., Je vois la instead of Je la vois).


Stage 5: Subordinate Clauses & Complex Structures

Learners begin processing embedded clauses and subordinate structures.

  • Je pense qu’il viendra demain. (I think that he will come tomorrow.) → (Subordination)
  • Le livre que j’ai lu est intéressant. (The book that I read is interesting.) → (Relative clause)
  • Si j’avais le temps, je voyagerais. (If I had time, I would travel.) → (Conditional sentences)

Key Characteristics:

Learners can link ideas in longer sentences.
They produce relative clauses, conditionals, and reported speech.
Errors in conjugation and agreement still occur.


Stage 6: Full Processing of Advanced Structures

At this final stage, learners acquire full sentence reordering, advanced agreement, and complex clauses.

  • Le professeur dont je t’ai parlé est ici. (The teacher whom I told you about is here.) → (Relative pronoun “dont”)
  • Si j’avais su, je serais venu plus tôt. (If I had known, I would have come earlier.) → (Past conditional)
  • Il faut que tu viennes demain. (You must come tomorrow.) → (Subjunctive mood usage)

Key Characteristics:

Learners master subjunctive, advanced conditionals, and complex reordering.
Proficiency level approaches native-like fluency.
Errors become minor and infrequent.
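
The six-stage outline above can be turned into a simple lookup for curriculum sequencing. The sketch below is illustrative only: the structure labels, the helper names and, in particular, the "no more than one stage ahead" rule in `teachable_now` are assumptions made for the example, not Pienemann's own formulation.

```python
# Stage numbers follow the six-stage outline above; the labels are
# illustrative shorthand chosen for this sketch.
STAGE_OF = {
    "formulaic phrases": 1,
    "SVO declaratives": 2,
    "gender and number agreement": 3,
    "object pronoun placement": 4,
    "question inversion": 4,
    "relative clauses": 5,
    "subjunctive": 6,
}

def sequence_structures(structures):
    """Order structures from earlier to later stage (ties keep input order)."""
    return sorted(structures, key=lambda s: STAGE_OF[s])

def teachable_now(structures, learner_stage):
    """Keep structures at most one stage above the learner's current stage.

    The one-stage window is an assumption used to illustrate the idea that
    instruction should align with the learner's current stage.
    """
    return [s for s in structures if STAGE_OF[s] <= learner_stage + 1]

wish_list = ["subjunctive", "SVO declaratives", "question inversion",
             "object pronoun placement"]
print(sequence_structures(wish_list))
print(teachable_now(list(STAGE_OF), learner_stage=2))
```

For a learner at Stage 2, the second call returns only the structures up to Stage 3, which mirrors the advice not to teach inversion or the subjunctive at that point.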


How Learnability Theory Guides French Teaching

  1. Teach in the right order:
    • Start with simple sentences before introducing agreement rules.
    • Teach basic questions (Tu as un chien ?) before inversion (As-tu un chien ?).
    • Introduce object pronouns before relative clauses.
  2. Respect processing constraints:
    • How many cognitive steps does the execution of a specific grammar structure involve? If working memory can only process about 4-6 items in younger learners and 5 to 9 in 16+ learners, will they cope with the cognitive load posed by the target structure?
    • Example 1: the perfect tense with ETRE involves 6 or 7 mental operations/substeps. Are we sure that the target language learners can process all of them?
    • Example 2: A beginner won’t use the subjunctive correctly (Il faut que tu viennes) if they haven’t mastered basic verb conjugations first.
    • Trying to teach complex tenses (e.g., Si j’avais su, je serais venu) before learners are ready leads to confusion.
  3. Provide appropriate input:
    • At early stages, focus on high-frequency structures (e.g., present tense, SVO order).
    • Gradually introduce complex grammar once learners can process simpler structures.
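
The working-memory question raised above can be framed as a rough check of step count against capacity. This is a minimal sketch: the capacity ranges are the ones quoted in the text, while the three labels ("manageable", "borderline", "overload") and the cut-offs behind them are assumptions made for illustration.

```python
# Capacity ranges quoted in the text above (items held in working memory).
CAPACITY = {"younger": (4, 6), "16+": (5, 9)}

def load_check(n_steps, group):
    """Classify a structure's step count against a group's capacity range.

    The labels and cut-offs are illustrative assumptions: at or below the
    lower bound is 'manageable', within the range is 'borderline', and
    above the upper bound is 'overload'.
    """
    low, high = CAPACITY[group]
    if n_steps <= low:
        return "manageable"
    if n_steps <= high:
        return "borderline"
    return "overload"

# The text estimates 6 or 7 mental operations for the perfect tense with ETRE.
for steps in (6, 7):
    for group in CAPACITY:
        print(steps, "steps,", group, "learners:", load_check(steps, group))
```

A check like this is crude, since practised sub-steps become automatic and stop competing for capacity, but it is a useful prompt to count the steps before deciding when to teach a structure.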

When considering the cognitive load posed by the target grammar structures in order to sequence them in your curriculum in an easier-to-harder flow, it is key to consider the challenges posed by each of them, summarised in the table below, from my workshop on Grammar Instruction.

Why do younger learners find learning grammar challenging?

Younger second language (L2) learners often struggle with grammar acquisition due to cognitive, linguistic, and developmental factors. Unlike vocabulary, which they can pick up more naturally, grammar rules require abstract thinking, memory, and metalinguistic awareness, which are still developing in young learners. Below are the key reasons why younger L2 learners find grammar learning challenging:

1. Limited Cognitive development

1.1 Abstract Thinking is Not Fully Developed

  • Grammar rules involve abstract concepts (e.g., verb conjugations, subject-verb agreement, and tenses).
  • Piaget’s (1954) Cognitive Development Theory states that children under 11 operate in the concrete operational stage, meaning they struggle with abstract rules.
  • Older learners (adolescents and adults) use formal operational thinking (age 12+), making them better at understanding syntactic structures

1.2 Working Memory Limitations

  • Younger children have smaller working memory capacity (Gathercole & Alloway, 2008), meaning they struggle to hold and process multiple grammar rules at once.
  • Older learners can store and manipulate grammatical structures more efficiently.

2. Lack of Metalinguistic Awareness

2.1 Younger learners do not consciously analyze language

  • Metalinguistic awareness is the ability to think about and manipulate language structures, which develops with age.
  • Studies (Bialystok & Barac, 2012) show that younger L2 learners focus more on communication rather than explicit grammar rules.

2.2 Struggle with Error Correction

  • Because of their lack of metalinguistic awareness and limited levels of LAA* (language analytical ability), younger learners do not benefit much from error correction
  • Older learners can self-correct grammatical mistakes by applying rules.
  • Younger children often repeat mistakes without realizing why they are incorrect.

3. Implicit vs. Explicit Learning Differences

  • Younger learners rely more on implicit learning (unconscious absorption of rules), while older learners benefit from explicit instruction. That is why using EPI, which relies heavily on structural priming (subconscious learning of grammar) is so powerful at primary.
  • Grammar requires explicit learning (Ellis, 2006), and young children struggle with rule-based learning since they primarily learn through exposure and repetition rather than conscious analysis.

4. Difficulty Generalizing Grammar Rules

4.1 Overgeneralization of rules

  • Younger L2 learners tend to overgeneralize grammatical patterns (e.g., saying “goed” instead of “went”).
  • This happens because they rely on patterns rather than understanding exceptions, which is common in early L1 and L2 learning (Pinker, 1999).

4.2 Grammar Rules Change Based on Context

  • Some grammatical structures vary depending on context (e.g., past tense in regular vs. irregular verbs).
  • Young learners struggle to apply rules flexibly in different contexts.

5. Limited Input and Reinforcement

5.1 Grammar exposure in early L2 learning is inconsistent

  • Young learners often hear simplified language (e.g., teachers and caregivers speaking in basic sentences).
  • Without frequent rich input, grammar structures take longer to acquire.


6. Pronunciation and Phonological Constraints Affect Grammar Learning

Syntax and morphology take longer to develop, especially in languages with complex word order (e.g., German, Russian).

Younger learners focus more on pronunciation and vocabulary, delaying grammar acquisition.

7. Limited Literacy Skills

Reading and writing skills support grammar acquisition

  • Older learners benefit from written reinforcement (e.g., textbooks, grammar exercises).
  • Younger learners, especially pre-literate children, lack exposure to written forms of grammar.

Conclusions

Younger L2 learners acquire vocabulary naturally but struggle with grammar because it requires abstract thinking, rule analysis, and memory capacity.

Older learners are better at learning grammar explicitly due to stronger cognitive abilities and metalinguistic awareness.

Young learners need repeated exposure, interactive learning, and implicit reinforcement rather than direct rule-based teaching. The exposure must be as multimodal as possible and cognizant of the TAP (transfer-appropriate processing) phenomenon, i.e. the context-dependency of memory (e.g. a grammar rule learnt through rehearsing a song many times over is not likely to be transferred to other contexts and tasks, but is going to stay confined to that song).

The grammar content needs to be light and informed by the learners’ readiness to acquire the target structures.

The students should not be asked to produce language too soon and in the contexts of tasks that challenge them beyond their current level of competence. This applies to learners of any age, but it is particularly true of primary-age students, as they are low monitors of grammar accuracy.

EPI (Extensive Processing Instruction) is very powerful in this respect as it capitalizes on subconscious learning through syntactic priming, i.e. the phenomenon whereby exposure to a specific sentence structure increases the likelihood of using the same structure in subsequent speech or writing. It occurs in both first (L1) and second language (L2) learning, reinforcing grammatical patterns through repetition. For example, if someone hears “The cat was chased by the dog” (passive voice), they are more likely to later produce another passive sentence like “The book was read by the student.” Studies (Bock, 1986) show syntactic priming helps L2 learners internalize complex structures, aiding fluency and reducing cognitive load during sentence formation.

Here’s a summary of the above points:

*Language Analytical Ability (LAA) refers to the cognitive skill that enables learners to analyze, understand, and manipulate linguistic structures in a second language (L2). It is crucial for explicit grammar learning, problem-solving in language acquisition, and recognizing language patterns.

Why do primary children find it more difficult to learn and retain vocabulary than secondary students?

Introduction

There is a myth whereby acquiring a new language is easier for younger children. Nothing could be further from the truth, at least in instructed second language acquisition settings, especially with one hour or less a week. In fact, encoding in long-term memory (LTM) is slower for younger children due to biological, cognitive, and experiential factors that influence memory formation and retrieval. Tran Thi Tuyet, in her 2020 article “The Myth of ‘The Earlier the Better’ in Foreign Language Learning or the Optimal Age to Learn a Foreign Language,” contends that the assumption “the earlier, the better” in foreign language learning is often misleading. She suggests that investing too early in children’s foreign language education may lead to suboptimal outcomes if not appropriately aligned with effective teaching methodologies and the child’s developmental readiness.

Additionally, a publication by the Centre for Educational Neuroscience titled “Avoiding the Hype Over Early Foreign Language Teaching” emphasizes the lack of substantial evidence supporting the notion that early foreign language instruction guarantees superior language proficiency later in life. The paper advises education professionals to critically assess the evidence before implementing early language programs, suggesting that premature introduction without proper pedagogical support might not yield the desired benefits.

Research indicates that students who begin foreign language study in secondary school can often catch up to, or even surpass, peers who started in primary school. This suggests that the advantages of early language learning might not be as significant as commonly believed.

A study by Muñoz (2006) examined the long-term proficiency of early, middle, and late starters in foreign language learning. The findings revealed that older learners often progress more rapidly than younger learners, leading to comparable or even superior proficiency levels over time. This challenges the assumption that an earlier start guarantees greater language proficiency.

Why do younger learners struggle more than older learners when it comes to L2-word learning?

Here’s a detailed breakdown:


1. Brain Maturation & Neural Development

  • Hippocampal Immaturity:
    • The hippocampus, which plays a central role in memory formation, is still developing in younger children (Gómez & Edgin, 2016).
    • Studies using fMRI have shown that hippocampal connections strengthen with age, leading to more efficient encoding and retrieval of information in older children (Ghetti & Bunge, 2012).
  • Synaptic Pruning & Myelination:
    • Younger children have a higher number of neural connections, but they are not as efficient as in older learners.
    • Synaptic pruning (elimination of weaker neural connections) and myelination (fatty sheath around neurons improving transmission speed) increase memory efficiency with age (Paus et al., 2001).

2. Working Memory Limitations

  • Younger children have a lower working memory capacity, which affects how much information they can hold before transferring it to long-term storage (Gathercole & Alloway, 2008).
  • Miller’s Law (1956) suggests an average working memory span of 7±2 items in adults, while younger children may only retain half that – with great variation amongst children in the same class.
  • Reduced chunking ability: Older children and adults use chunking (grouping information into meaningful units), which improves memory retention, whereas younger children struggle with this strategy (Schneider et al., 2011).
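
The effect of chunking on working-memory load can be shown with a toy example. This is only a sketch: the helper name `chunk`, the sample sentence and the chunk boundaries are illustrative choices, and counting list items is a simplification of how memory span is actually measured.

```python
def chunk(words, sizes):
    """Group `words` into consecutive chunks of the given sizes."""
    if sum(sizes) != len(words):
        raise ValueError("chunk sizes must add up to the number of words")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(" ".join(words[start:start + size]))
        start += size
    return chunks

words = "je voudrais un café s'il vous plaît".split()
chunks = chunk(words, [2, 2, 3])
print(len(words), "single words versus", len(chunks), "chunks:", chunks)
```

Held as single words, the sentence already sits at the upper end of what a younger child can retain; held as three chunks, it leaves capacity to spare.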

3. Underdeveloped Memory Strategies

  • Lack of Rehearsal Techniques:
    • Older children use active rehearsal (repeating words mentally) to strengthen memory storage, while younger children often fail to engage in spontaneous rehearsal (Flavell, 1970).
  • Limited Use of Mnemonics & Organization Strategies:
    • Older children categorize words by meaning (e.g., grouping “apple, banana, orange” as “fruits”), making retrieval easier.
    • Younger children lack this organizational ability, leading to weaker memory recall (Bjorklund & Jacobs, 1985).

4. Episodic Memory & Schema Development

  • Less Developed Episodic Memory:
    • Episodic memory (memories of specific events) is less developed in younger children, making it harder for them to link new words to prior experiences (Nelson, 1993).
  • Lack of Established Schemas:
    • Older children have more structured knowledge frameworks (schemas), allowing them to fit new vocabulary into existing memory networks.
    • Younger children lack these schemas, making encoding slower and retrieval less efficient (Chi, 1978).

5. Slower Consolidation of Memories

  • Sleep & Memory Consolidation Differences:
    • Deep sleep (slow-wave sleep) aids in memory consolidation, but research suggests memory-related sleep processes are less efficient in young children (Wilhelm et al., 2012).
    • Older children and adults show better overnight retention of new information than younger children.

6. Less Exposure & Repetition Opportunities

  • Older children encounter more words in different contexts (reading, conversations, writing), reinforcing long-term retention.
  • Younger children need more repetitions to store words permanently in LTM (Webb, 2007).

Summary Table: Why retaining new vocabulary is slower in younger children

Factor | Younger Children (Ages 7-11) | Older Children (Ages 12+)
Brain Maturation | Hippocampus still developing | More mature hippocampal function
Neural Efficiency | Less myelination, weaker synaptic pruning | Faster neural transmission
Working Memory | Limited (4-6 items) | Stronger (7-9 items)
Rehearsal Strategies | Rarely use spontaneous rehearsal | Regularly use repetition
Mnemonics & Categorization | Struggle with categorization | Group and organize words for better recall
Episodic Memory | Less developed | Stronger recall of contextual experiences
Schema Development | Lack structured knowledge networks | Use existing schemas to reinforce learning
Sleep & Memory Consolidation | Weaker overnight memory retention | More efficient sleep-based memory strengthening
Exposure to Vocabulary | Limited real-world exposure | More frequent and varied word encounters

Key Takeaways & pedagogical implications

  • Younger children CAN encode words into long-term memory, but it takes more repetitions and contextual learning.
  • Older children process and store words more efficiently due to better neural pathways, stronger working memory, and more effective learning strategies.
  • Memory strategies like rehearsal, categorization, and spaced repetition can help younger learners retain words more effectively.
  • Younger children are led by the principle of pleasure more than older children are; hence, their motivation to learn is more dependent on fun.
  • Multimodal learning is key with language learners based on what we know about their working memory limitations. The more ways a word gets encoded in their younger brain, the more neural associations are created, which leads to longer-term retention.
  • Training in memorization techniques – as long as you make it fun – pays enormous dividends.