(Co-authored with Steve Smith and Dylan Vinales)
Last week, during a workshop on vocabulary learning that I delivered in my faculty, I carried out a little experiment with my colleagues. It aimed to raise their awareness of the limitations of human working memory by making them experience cognitive overload (i.e. the inability of working memory to cope with a task because of excessive demands on its processing capacity).
The task was simple. One person had a list of twenty words and had to read each word on the list aloud, one by one (first word 1, then word 2, followed by word 3, etc.), to their partner. At each round, the partner had to repeat all the words read so far, in exactly the same order as they had been read (relying solely on memory, as they had no access to the list).
As expected, the vast majority of the participants started making mistakes after the fourth or fifth item. But there was an outlier: a Chinese lady, V., who could remember ten words, that is, double the group’s average. How had she achieved that?
She had used a mnemonic (a memory strategy). On hearing each word, she had associated it with an image and had built a narrative using each image and word. In other words, she had ‘anchored’ each word to an image that was meaningful to her and to a pattern that gave sense to the input she was receiving.
This well-known ‘trick’ does not make one more intelligent, nor does it point to a bigger Working Memory. What it does, though, is point to a mechanism that has enabled humans throughout evolution to overcome the limitations of their working memory and that has a decisive role in L2 learning.
As discussed in previous blogposts of mine, Working Memory is very limited in both focus and processing capacity: the slightest distraction causes us to ‘lose’ the data we are handling, and we can only process four items at any one time. What is interesting is that monkeys’ working memory shares the same limitations, with a processing capacity of 3 to 4 items, nearly identical to ours.
So why is it that monkeys are stuck where Homo sapiens started off 150,000 years ago, whilst we are able to build rockets, transplant organs, clone animals and harness the power of the atom?
According to Cambridge Professor Daniel Bor in his fascinating 2012 book ‘The ravenous brain: how the new science of consciousness explains our insatiable search for meaning’, the human brain has managed to overcome the limitations of its pea-size processor (working memory) by chunking new data onto existing brain structures, using pattern recognition as the main learning strategy. As Professor Bor puts it:
Perhaps what most distinguishes us humans from the rest of the animal kingdom is our ravenous desire to find structure in the information we pick up in the world. We cannot help actively searching for patterns — any hook in the data that will aid our performance and understanding. We constantly look for regularities in every facet of our lives, and there are few limits to what we can learn and improve on as we make these discoveries. We also develop strategies to further help us — strategies that themselves are forms of patterns that assist us in spotting other patterns.
In simple terms, the brain applies the patterns available in our Long-Term Memory to interpret whatever we process (see, hear, feel, etc.) and make sense of it; if what we process successfully using those patterns is ‘new’, the brain ‘hooks’ it to existing structures in the brain and compresses it in chunks which it stores in Long-Term Memory.
The reason why this expands the processing capacity of working memory is that, when patterns are applied automatically, i.e. subconsciously, they by-pass working memory, thereby keeping the latter free for performing other operations. That is why we can multi-task when we operate in contexts which we are very familiar with, but not in others which are totally new to us. So, for instance, when we drive a car, we perform sequences of actions automatically so that we can focus on the road and traffic.
What is equally interesting is that the brain uses patterns not simply to process the information we are currently handling, but also to predict what will come next. So, for instance, imagine talking to a colleague you know quite well in a specific context; your brain will use behavioural patterns built during your previous interactions with that person not only to infer from her body language, intonation and lexical choice what mood she is in, what her communicative intentions are, etc., but also to predict what she is likely to say or do next, all based on probability.
This happens linguistically too: when we hear a sentence, our brain uses patterns, both linguistic (phonological, grammatical, etc.) and situational (our previous experiences with similar contexts), to interpret each sentence we process and predict what word, phrase or sentence is going to come next, very much like Google does when we type a query in the search box (see figure 1, below).
This predictive process, which happens subconsciously and hence at very high speed in the brain, is called Lexical Priming.
Figure 1 – Google search and the priming effect
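To make the search-box analogy a little more concrete, here is a minimal sketch of prediction from stored patterns. It is only a toy: it assumes that "patterns" can be reduced to counts of which word has followed which, and the sentences, function names and parameter `k` are all invented for illustration. It is not a model of how the brain or Google actually works.

```python
from collections import Counter, defaultdict

def build_bigram_model(sentences):
    """Count, for every word, which words have followed it in the input."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            followers[current][following] += 1
    return followers

def predict_next(model, word, k=3):
    """Return up to k of the most frequent continuations seen after `word`."""
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

# Hypothetical toy input, invented for illustration only.
corpus = [
    "i would like to travel to spain",
    "i would like to go to france",
    "i would like to travel by train",
    "i want you to go home",
]

model = build_bigram_model(corpus)
print(predict_next(model, "would"))   # the only word ever seen after "would"
print(predict_next(model, "to", 2))   # the two most frequent words after "to"
print(predict_next(model, "zebra"))   # never seen, so nothing to predict
```

The more often a sequence has been encountered, the more strongly it is predicted, and a word that has never been met triggers no prediction at all.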
Pattern recognition, Chunking and Priming therefore have one thing in common: they speed up Working Memory processing. Since, as Skehan’s (2009) diagram below (figure 2) clearly shows, these processes are central to language acquisition, our teaching must aim at fostering and facilitating them.
Figure 2 – Language operational mechanisms involving Working Memory
That pattern recognition must be central to L2 instruction is evidenced by a number of studies which assessed L2 learners at the beginning of a course simply on their pattern recognition skills and found that high scores on this measure were a very strong predictor of success at post-test (see a summary of one such study here).
2. Implications of the above for L2-language learning and teaching
Please note that, for reasons of space, I will not delve as much as I would like into the classroom implementation of the principles discussed below; I will do so in my next post.
2.1 What I mean by ‘patterns’
The above implies that effective language learning – from a processing perspective – is mainly about rapidly and accurately applying patterns in the understanding of L2 input and the production of L2 output.
Please note that patterns are not simply what we refer to as the phonological, morphological and syntactic rules of the language, but also the multi-word constructions with high generative power that we employ as chunks to express various communicative functions (e.g. ‘I don’t think that’, ‘I want you to’, ‘I am not sure if’, ‘The worst/best thing is…’). This is important, because every teacher claims to teach patterns, but they usually mean verb endings, agreement, conjugation and the like.
Hence, effective L2 learning is not simply about learning the rules of grammar and phonology but also, and more importantly, about learning how to break down the language into useful multi-word chunks (useful = with high communicative surrender value).
Learning single words from word lists, e.g. the ones found in textbooks or that many teachers upload to Quizlet or Memrise, is a clumsy and inefficient way of learning a language, as Working Memory can only accommodate 4 items at any given time, and only for a handful of seconds. By learning 4 chunks made up of 4 words each instead of 4 single words, the brain is still processing 4 items but working with 16 words at the same time.
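The arithmetic behind this point can be sketched in a few lines. This is a minimal illustration, assuming that one chunk occupies exactly one working-memory slot, as argued above; the constant, the function name and the word lists are made up for the example.

```python
WORKING_MEMORY_SLOTS = 4  # capacity figure used in this post

def words_in_play(items):
    """Total words carried by the items that fit in working memory at once."""
    held = items[:WORKING_MEMORY_SLOTS]  # anything beyond 4 items is dropped
    return sum(len(item.split()) for item in held)

# Illustrative lists: four single words versus four four-word chunks.
single_words = ["travel", "Spain", "worst", "sure"]
chunks = [
    "I don't think that",
    "I want you to",
    "I am not sure",
    "The worst thing is",
]

print(words_in_play(single_words))  # 4 items, 4 words
print(words_in_play(chunks))        # 4 items, 16 words
```

The number of items held is the same in both cases; only the amount of language packed into each item changes.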
In first language acquisition, children pick up the language through such chunks, after much exposure to them through caregivers’ talk. They learn the grammar that glues the chunks together implicitly, not explicitly. In this sense, in first language acquisition and in L2 learning in immersive environments (e.g. in an international school), the dichotomy between grammar and vocabulary learning does not exist. Children learn how to piece chunks of language together in pursuit of the communicative goals they need to achieve, not because their parents teach them grammar rules.
L2 acquisition in non-immersive environments is of course different, as it would be preposterous to expect students to pick up a vast array of chunks and patterns implicitly through one or two hours’ exposure per week.
2.2 Implication 1 – From authentic target language to patterned model language
The first major implication for L2 instruction is that the teaching of patterns must take a central role in L2 instruction from the very early stages of teaching the target language. This in turn entails providing novice learners with input which is highly patterned and contains repeated occurrences of useful chunks with high generative power, very much like caregivers do when they deal with toddlers in first language acquisition (e.g. through nursery rhymes).
This requires a shift, from teaching the target language to teaching a model language – to use Michael Lewis’ (1993) famous distinction – which is not necessarily ‘authentic’ (in that it does not 100% mirror real-life L2 usage) but serves the purpose of sensitizing our students to patterns through much repetition, redundancies and careful selection of highly generative chunks.
This does not mean that one has to rule out the use of authentic material; what it means is that before getting to ‘authentic’ texts the learner must have seriously routinized – at least receptively – a repertoire of patterns and chunks which will allow them to come to grips with the less linguistically predictable and more lexically and syntactically complex ‘authentic texts’. No point using aural or written input that contains cognitive obstacles which will ultimately hinder learning.
It is evident that one should select for teaching high-frequency chunks as much as possible. This will render the model language a closer approximation to the target language or at least will make the transition from the former to the latter easier.
2.3 Implication 2 – Chunks over single words
As I said above, chunks have higher surrender value and more generative power than single lexical items. Moreover, since Working Memory can only process 4 items simultaneously, regardless of whether one item equals one word or four or five, teaching chunks makes learning more efficient in terms of cognitive load.
This does not mean that we should not teach single words at all any more. However, starting with chunks does make more sense. So, for instance, one may start with ‘I would like to travel to Spain’ and then subsequently teach the L2 names of countries as single words in order to enhance the generativity of that chunk and/or teach alternatives to ‘travel’ such as ‘go’, ‘drive’, ‘fly’, ‘bike’, etc.
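The gain in generativity can be illustrated with a small sketch. It assumes that a chunk can be treated as a frame with open slots; English stands in for the L2 here, and the function name and every filler other than the ones mentioned above are hypothetical.

```python
from itertools import product

def expand(frame, **slots):
    """Fill every slot of `frame` with every combination of the given fillers."""
    names = list(slots)
    return [
        frame.format(**dict(zip(names, combo)))
        for combo in product(*(slots[name] for name in names))
    ]

# The chunk from the example above, with two slots opened up.
frame = "I would like to {verb} to {country}"
verbs = ["travel", "go", "drive", "fly", "bike"]
countries = ["Spain", "France", "Italy"]  # France and Italy are illustrative additions

sentences = expand(frame, verb=verbs, country=countries)
print(len(sentences))  # 5 verbs x 3 countries
print(sentences[0])
```

One chunk plus eight single words yields fifteen usable sentences, which is why it makes sense to teach the chunk first and the single words afterwards.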
Another advantage of teaching through chunks is that many mistakes with less salient items/rules (e.g. articles and prepositions) can be avoided because such items are learnt as part of the chunks themselves. Think about prepositions before the infinitive in French: how much easier would it be for your students to learn them if you taught those verbs in chunks including the preposition from day one, e.g. je vais commencer à faire mes devoirs, je vais commencer à jouer, etc.?
2.4 Implication 3 – Grammar as subordinate to the teaching of chunks and functions
If we do believe that chunking input based on the core patterns we intend to impart to our students is the main priority of L2 teaching, then grammar does still play an important role, but one that is subordinate to vocabulary teaching, i.e. to add generative power to the chunks we set out to teach. Example: if I teach the French chunk ‘je veux que tu + subjunctive’ (I want you to…), I will need to teach the conjugation of ‘vouloir’ (to want) in the present indicative, as well as the subjunctive of French verbs, for that chunk to be used with subjects other than ‘I’, thereby acquiring high surrender value.
If our espoused teaching methodology is Communicative Language Teaching, it only makes sense that the chunks we teach are selected and grouped based on communicative functions (e.g. Accepting / Rejecting, Advising & Suggesting, Agreeing / Disagreeing, Approving / Disapproving, etc.). Unlike what other scholars advocate, I am not opposed to teaching functions and chunks within a specific topic, as having a unifying theme does facilitate retention and allows for a lot of semantic associations within the target word-set.
UK Modern Language textbooks do pay lip service to the teaching of communicative functions and patterns, but in actual fact they rarely deliver it and focus mainly on grammar and discrete words at the expense of chunks. The Expo and Studio coursebooks, very popular in England, are appalling in this respect.
2.5 Implication 4 – Words’ collocational behaviour as important as grammar
Another major way in which we can enhance the generative power of chunks is by mastering the collocations of words. This is self-evident: the wider the range of nouns we can combine with a given verb in a specific chunk, the wider the range of communicative contexts in which we will be able to use that chunk. Hence the need to teach collocations in our daily practice as much and as often as possible.
Michael Lewis is the greatest advocate of teaching collocations and I do agree with him that, especially considering the recent changes in the England and Wales syllabus, this dimension of vocabulary teaching is by far the most important. Sadly, however, this is another area which is grossly neglected by most of the Modern Language books currently available on the market.
2.6 Implication 5 – Extensive receptive exposure to patterns as crucial
Masses of research indicate clearly that extensive exposure to phonological, collocational, morphological and syntactic patterns does sensitise learners to them. Unlike what is common practice in many modern language classrooms these days, students should process the target chunks/patterns as extensively as possible before having a go at deploying them in oral or written production. This is a point I have made in many posts of mine so I will not elaborate any further.
2.7 Implication 6 – Comprehensible input as a must for pattern detection and acquisition
Patterns are more likely to be noticed and acquired when they occur in texts which are highly accessible to the target students. This translates into providing students with texts which are not only highly patterned but also whose linguistic content is, as a rule of thumb, 90-95% familiar to the learners.
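A rough way to check a text against this rule of thumb is sketched below. It rests on assumptions of mine: that "familiar content" is measured as the share of running words the class already knows, that splitting on word characters is good enough for a quick check, and that 90% is the minimum to aim for. The function name, the text and the known-word list are invented for illustration.

```python
import re

def coverage(text, known_words):
    """Share of running words in `text` that appear in `known_words`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    known = {word.lower() for word in known_words}
    return sum(token in known for token in tokens) / len(tokens)

# Hypothetical class vocabulary and text.
known_words = {
    "je", "vais", "commencer", "à", "faire", "mes", "devoirs",
    "et", "jouer", "au", "avec", "amis", "ce", "soir",
}
text = (
    "Je vais commencer à faire mes devoirs et je vais commencer "
    "à jouer au foot avec mes amis ce soir"
)

share = coverage(text, known_words)
print(f"{share:.0%} of running words are familiar")
print("accessible enough" if share >= 0.90 else "too many unknown words")
```

Here only ‘foot’ is unknown, so 19 of the 20 running words are familiar and the text sits at the top of the target range.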
2.8 Implication 7 – Pushed output essential in recycling
‘Pushed output’ oral and written activities are tasks that allow the teacher total control over student output. Hence, they are crucial in order to ensure that every student has plenty of opportunities to recycle the target chunks. Role plays, translations and communicative drills are very easy to prepare and very effective in this respect.
2.9 Implication 8 – Autonomous learning of patterns as the ultimate foundation for successful L2 life-long learning
Students must become effective pattern-recognizers and pattern-deployers. This does not merely mean emphasizing patterns in our input, but also developing the following skills:
- the ability to autonomously identify patterns
- the ability to autonomously extract the rule governing the usage of those patterns
- the ability to autonomously experiment with those patterns
As advocated in my post ‘Why we should change our approach to grammar teaching’, this entails getting the students to inductively work out the grammar or phonological patterns from the input we provide and, after much guided practice aimed at routinizing the patterns in controlled contexts, giving them plenty of opportunities to experiment with them in familiar and less familiar contexts.
2.10 Implication 9 – Use the first language to spot differences between the L1 and L2
It is natural for L2 learners to use the first language as a starting point for their hypotheses as to how the target language works. To discourage that, as many suggest, by banning the first language from the language classroom is a real waste. Emphasizing the differences or similarities between the two languages in terms of grammatical, lexical and phonological patterns is a must, in my opinion, as it gives our students a marked cognitive advantage.
3. My approach to teaching chunks
This is the approach I use in teaching chunks/patterns in a nutshell:
- Present chunks – I do this in sentences orally by using ‘sentence builders’ or other techniques (see post here)
- Provide lots of exposure through listening and reading (e.g. through narrow reading and listening). Note: there is no harm in not using the target chunks in oral or written production until the second lesson after first introducing them.
- Get the students to ‘unpack’ the chunk (e.g. through inductive grammar tasks)
- Practise the chunk in interpersonal writing (e.g. online conversation with peers using platforms such as Edmodo) or micro-writing (e.g. teacher asks questions, students respond in writing on mini-boards)
- Highly controlled oral practice (communicative drills eliciting use of target chunks)
- Semi-structured oral practice (e.g. interviews, surveys, picture tasks, find-someone-who, role-plays)
- Free oral practice – it is here that the students are pushed to experiment
It goes without saying that, besides the points made above, all the other principles I laid out in my previous posts on vocabulary teaching (here) ought to inform the teaching of chunks, too. The most important are:
- aim at automaticity in recognition and production
- prioritise deep processing (creating semantic association) over shallow processing (mere repetition)
- provide intensive and extensive recycling within the three months after first teaching
- hook new input to old material in terms of meaning, morphology and sound patterns
- make the input distinctive (compelling input)
The human brain is a highly sophisticated ‘computing machine’ that handles masses of data every day. However, its processor, working memory, has extremely limited processing capacity. Chunking data by means of patterns has allowed the human brain throughout evolution to overcome the limitations of working memory. Hence, we may consider pattern recognition as possibly the most important skill in the processing and learning of any information.
Humans need to see patterns in everything they see or hear. The same applies to language learners. Language learners who are not provided with patterns or other heuristics which help them make sense of the target information experience frustration and demotivation and use rote learning as a last resort. Nothing wrong with rote learning, provided that it is supported by an understanding of the underlying structure of what one learns and that what is learnt is retained in the long term; but this is often not the case.
In this article I have advocated the importance of giving prominence to patterns and chunking over the teaching of single words or discrete grammar rules. New lexis should be taught in chunks, as this gives working memory a significant cognitive advantage; grammar should be taught to serve the teaching of chunks, to help the students unpack them and discover and later experiment with their full generative power. The mastery of word collocation is pivotal too in enhancing the generative power of chunks, but it is sorely neglected in current modern language pedagogy.
Last, but not least, we should train students to detect and experiment with chunks as much and as often as possible by themselves, with little input from us, after sensitizing them to their existence through masses of comprehensible, highly patterned input. Inductive grammar learning is therefore a pedagogic must, in order to forge life-long learners who can effectively acquire languages autonomously in the real world.