How the brain acquires foreign language grammar – A Skill-theory perspective

Caveat: Being an adaptation of a section of a chapter in my Doctoral thesis, this is a fairly challenging article which may require solid grounding in Applied Linguistics and Cognitive Theories of Skill Acquisition.

1. L2-Acquisition as skill acquisition: the Anderson Model

The Anderson Model, called ACT* (Adaptive Control of Thought), was originally created as an account of the way students internalise geometry rules. It was later developed as a model of L2-learning (Anderson, 1980, 1983, 2000). The fundamental epistemological premise of adopting a skill-development model as a framework for L2-acquisition is that language is governed by the same principles that regulate any other cognitive skill. Several scholars, such as McLaughlin (1987), Levelt (1989), O’Malley and Chamot (1990) and Johnson (1996), have produced persuasive arguments in favour of this notion.

Although ACT* constitutes my espoused theory of L2 acquisition, I do not endorse Anderson’s claim that his model alone can give a completely satisfactory account of L2-acquisition. I do believe, however, that it can be used effectively to conceptualise at least three important dimensions of L2-acquisition which are relevant to the type of explicit MFL instructional approaches implemented in many British schools: (1) the acquisition of grammatical rules in explicit L2-instruction, (2) the developmental mechanisms of language processing and (3) the acquisition of Learning Strategies.


 Figure 1: The Anderson Model (adapted from Anderson, 1983)




The basic structure of the model is illustrated in Figure 1, above. Anderson posits three kinds of memory: Working Short-Term Memory (WSTM), Declarative Memory and Production (or Procedural) Memory. Working Memory shares the same features discussed in previous blogs (see ‘Eight important facts about Working Memory’) while Declarative and Production Memory may be seen as two subcomponents of Long-Term Memory (LTM). The model is based on the assumption that human cognition is regulated by cognitive structures (Productions) made up of ‘IF’ conditions and ‘THEN’ actions. These are activated every single time the brain is processing information; whenever a learner is confronted with a problem the brain searches for a Production that matches the data pattern associated with it. For example:


IF the goal is to form the present perfect of a verb and the person is 3rd singular/


THEN form the 3rd singular of ‘have’


IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed /


THEN form the past participle of the verb
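
The two Productions above can be sketched as a tiny production system. This is only an illustrative sketch in Python, not Anderson’s own implementation: the `Production` class, the `run` loop, the dictionary standing in for working memory and the two small lookup tables are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Working memory is modelled as a plain dict of facts (an assumption of this sketch).
WorkingMemory = Dict[str, object]


@dataclass
class Production:
    name: str
    condition: Callable[[WorkingMemory], bool]  # the IF side
    action: Callable[[WorkingMemory], None]     # the THEN side


# Hypothetical mini-lexicon standing in for declarative knowledge.
HAVE_FORMS = {"3sg": "has", "other": "have"}
PAST_PARTICIPLES = {"go": "gone", "play": "played"}


def form_have(wm: WorkingMemory) -> None:
    wm["aux"] = HAVE_FORMS["3sg"]


def form_participle(wm: WorkingMemory) -> None:
    wm["participle"] = PAST_PARTICIPLES[wm["verb"]]


PRODUCTIONS = [
    Production(
        "form-have",
        lambda wm: wm.get("goal") == "present perfect"
        and wm.get("person") == "3sg"
        and "aux" not in wm,
        form_have,
    ),
    Production(
        "form-participle",
        lambda wm: wm.get("goal") == "present perfect"
        and "aux" in wm
        and "participle" not in wm,
        form_participle,
    ),
]


def run(wm: WorkingMemory, productions: List[Production]) -> List[str]:
    """Fire the first matching Production repeatedly until none matches."""
    fired = []
    while True:
        match = next((p for p in productions if p.condition(wm)), None)
        if match is None:
            return fired
        match.action(wm)
        fired.append(match.name)


wm = {"goal": "present perfect", "person": "3sg", "verb": "go"}
trace = run(wm, PRODUCTIONS)
output = f"{wm['aux']} {wm['participle']}"
print(trace, output)
```

Each rule fires only when its IF side matches the current contents of working memory, so the second rule cannot fire until the first has deposited a form of ‘have’ there.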


The creation of a Production is a long and careful process since Procedural Knowledge, once created, is difficult to alter. Furthermore, unlike declarative units, Productions control behaviour, thus the system must be circumspect in creating them. Once a Production has been created and proved to be successful, it has to be automatised in order for the behaviour that it controls to happen at naturalistic rates. According to Anderson (1985), this process goes through three stages: (1) a Cognitive Stage, in which the brain learns a description of a skill; (2) an Associative Stage, in which it works out a method for executing the skill; (3) an Autonomous Stage, in which the execution of the skill becomes more and more rapid and automatic.


In the Cognitive Stage, confronted with a new task requiring a skill that has not yet been proceduralised, the brain retrieves from LTM all the declarative representations associated with that skill, using the interpretive strategies of Problem-solving and Analogy to guide behaviour. This procedure is very time-consuming, as all the stages of a process have to be specified in great detail and in serial order in WSTM. Although each stage is a Production, the operation of Productions in interpretation is very slow and burdensome as it is under conscious control and involves retrieving declarative knowledge from LTM. Furthermore, since this declarative knowledge has to be kept in WSTM, the risk of cognitive overload leading to error may arise.


Thus, for instance, in translating a sentence from the L1 into the L2, the brain will have to consciously retrieve the rules governing the use of every single L2-item, applying them one by one. In the case of complex rules whose application requires performing several operations, every single operation will have to be performed in serial order under conscious attentional control. For example, in forming the third person of the present perfect of ‘go’, the brain may have to: (1) retrieve and apply the general rule of the present perfect (have + past participle); (2) perform the appropriate conjugation of ‘have’ by retrieving and applying the rule that the third person of ‘have’ is ‘has’; (3) recall that the past participle of ‘go’ is irregular; (4) retrieve the form ‘gone’.


Producing language by these means is extremely inefficient. Thus, the brain tries to sort out the information into more efficient Productions. This is achieved by Compiling (‘running together’) the productions that have already been created so that larger groups of productions can be used as one unit. The Compilation process consists of two sub-processes: Composition and Proceduralisation. Composition takes a sequence of Productions that follow each other in solving a particular problem and collapses them into a single Production that has the effect of the sequence. This process lessens the number of steps referred to above and has the effect of speeding up the process. Thus, the Productions


P1 IF the goal is to form the present perfect of a verb / THEN form the simple present of have


P2 IF the goal is to form the present perfect of a verb and the appropriate form of ‘have’ has just been formed / THEN form the past participle of the verb


would be composed as follows:


P3 IF the goal is to form the present perfect of a verb / THEN form the simple present of have and THEN the past participle of the verb
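
As a rough illustration of Composition, the sketch below collapses P1 and P2 into P3. The representation (sets of condition strings, an `adds` field recording what each rule deposits in working memory, and the `compose` helper) is hypothetical and chosen only to make the idea concrete; ACT* itself is considerably more elaborate.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Production:
    name: str
    conditions: FrozenSet[str]  # facts that must be in working memory (IF)
    actions: Tuple[str, ...]    # steps carried out in order (THEN)
    adds: FrozenSet[str]        # facts the actions deposit in working memory


def compose(first: Production, second: Production, name: str) -> Production:
    """Collapse two Productions that fire in sequence into a single one."""
    # Conditions of the second rule that the first rule itself brings about
    # no longer need to be tested separately.
    remaining = second.conditions - first.adds
    return Production(
        name=name,
        conditions=first.conditions | remaining,
        actions=first.actions + second.actions,
        adds=first.adds | second.adds,
    )


P1 = Production(
    "P1",
    frozenset({"goal: present perfect"}),
    ("form the simple present of 'have'",),
    frozenset({"'have' formed"}),
)
P2 = Production(
    "P2",
    frozenset({"goal: present perfect", "'have' formed"}),
    ("form the past participle of the verb",),
    frozenset({"participle formed"}),
)

P3 = compose(P1, P2, "P3")
print(sorted(P3.conditions), P3.actions)
```

The composed rule keeps a single condition (the goal) and performs both actions in one step, which is the sense in which Composition reduces the number of steps.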


An important point made by Anderson is that newly composed Productions are weak and may require multiple creations before they gain enough strength to compete successfully with the Productions from which they are created. Composition does not replace Productions; rather, it supplements the Production set. Thus, a composition may be created on the first opportunity but may be ‘masked’ by stronger Productions for a number of subsequent opportunities until it has built up sufficient strength (Anderson, 2000). This means that even if the new Production is more effective and efficient than the stronger Production, the latter will be retrieved more quickly because its memory trace is stronger.


The process of Proceduralisation eliminates clauses in the condition of a Production that require information to be retrieved from LTM and held in WSTM. As a result, proceduralised knowledge becomes available much more quickly than non-proceduralised knowledge. For example, the Production P2 above would become


IF the goal is to form the present perfect of a verb


THEN form ‘have’ and then form the past participle of the verb


The process of Composition and Proceduralisation will eventually produce after repeated performance:


IF the goal is to form the present perfect of ‘play’ / THEN form ‘has played’
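
One way to picture Proceduralisation is as building the retrieved declarative facts directly into the rule, so that later uses need no retrieval at all. The sketch below is only an analogy under stated assumptions: the `DECLARATIVE_MEMORY` table, the retrieval counter and the `proceduralise` helper are invented for illustration and are not part of ACT*.

```python
# Hypothetical declarative store; keys and values are assumptions of this sketch.
DECLARATIVE_MEMORY = {
    ("have", "3sg"): "has",
    ("play", "past participle"): "played",
}

retrievals = 0  # counts how often declarative memory is consulted


def retrieve(key):
    """Simulate a (slow) retrieval from declarative memory."""
    global retrievals
    retrievals += 1
    return DECLARATIVE_MEMORY[key]


def general_present_perfect(verb: str) -> str:
    """Non-proceduralised route: every fact is retrieved on each use."""
    aux = retrieve(("have", "3sg"))
    participle = retrieve((verb, "past participle"))
    return f"{aux} {participle}"


def proceduralise(verb: str):
    """Return a verb-specific rule with the retrieved facts built in."""
    result = general_present_perfect(verb)  # retrieval happens once, here

    def specialised() -> str:
        return result  # no declarative retrieval at run time

    return specialised


slow = general_present_perfect("play")
count_after_general = retrievals

has_played = proceduralise("play")
count_after_build = retrievals

fast = has_played()
count_after_fast = retrievals

print(slow, fast, count_after_general, count_after_build, count_after_fast)
```

Both routes produce the same form, but the specialised rule leaves the retrieval counter unchanged when it is used.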


For Anderson it seems reasonable to suggest that Proceduralisation only occurs when LTM knowledge has achieved some threshold of strength and has been used some criterion number of times. Anderson calls the mechanism through which the brain decides which Productions should be applied in a given context Matching. When the brain is confronted with a problem, activation spreads from WSTM to Procedural Memory in search of a solution – i.e. a Production that matches the pattern of information in WSTM. If such matching is possible, then a Production will be retrieved. If the pattern to be matched in WSTM corresponds to the ‘condition side’ (the ‘if’) of a proceduralised Production, the matching will be quicker, with the ‘action side’ (the ‘then’) of the Production being deposited in WSTM and made immediately available for performance (execution). It is at this intermediate stage of development that most serious errors in acquiring a skill occur: during the conversion from Declarative to Procedural knowledge, unmonitored mistakes may slip into performance.


The final stage consists of the process of Tuning, made up of the three sub-processes of Generalisation, Discrimination and Strengthening. Generalisation is the process by which Production rules become broader in their range of applicability, thereby allowing the speaker to generate and comprehend utterances never before encountered. Where two existing Productions partially overlap, it may be possible to combine them to create a greater level of generality by deleting a condition that was different in the two original Productions. Anderson (1982) produces the following example of generalisation from language acquisition, in which P6 and P7 become P8:


P6 IF the goal is to indicate that a coat belongs to me THEN say ‘My coat’


P7 IF the goal is to indicate that a ball belongs to me THEN say ‘My ball’


P8 IF the goal is to indicate that object X belongs to me THEN say ‘My X’
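
The step from P6 and P7 to P8 can be sketched as replacing the one constant on which two rules disagree with a variable. The token-based encoding of rules and the `generalise` and `apply_rule` helpers below are assumptions made for illustration, not Anderson’s notation.

```python
from typing import Optional, Tuple

# A rule is (condition tokens, action tokens); this flat encoding is an
# assumption of the sketch.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def generalise(a: Rule, b: Rule, var: str = "X") -> Optional[Rule]:
    """Merge two rules that differ in exactly one constant into a variable rule."""
    (cond_a, act_a), (cond_b, act_b) = a, b
    if len(cond_a) != len(cond_b) or len(act_a) != len(act_b):
        return None
    # Collect the distinct pairs of tokens on which the two rules disagree.
    differing = {(x, y) for x, y in zip(cond_a + act_a, cond_b + act_b) if x != y}
    if len(differing) != 1:
        return None  # identical rules, or too different to merge safely
    constant = next(iter(differing))[0]

    def substitute(tokens: Tuple[str, ...]) -> Tuple[str, ...]:
        return tuple(var if t == constant else t for t in tokens)

    return substitute(cond_a), substitute(act_a)


def apply_rule(rule: Rule, binding: str, var: str = "X") -> str:
    """Instantiate the action side of a generalised rule with a concrete object."""
    _, action = rule
    return " ".join(binding if t == var else t for t in action)


P6 = (("goal:indicate-possession", "owner:me", "coat"), ("My", "coat"))
P7 = (("goal:indicate-possession", "owner:me", "ball"), ("My", "ball"))

P8 = generalise(P6, P7)
utterance = apply_rule(P8, "book")
print(P8, utterance)
```

The generalised rule can then produce a phrase for an object that appeared in neither original rule, which mirrors the claim that Generalisation lets the speaker generate utterances never before encountered.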


Discrimination is the process by which the range of application of a Production is restricted to the appropriate circumstances (Anderson, 1983). These processes would account for the way language learners over-generalise rules but then learn over time to discriminate between, for example, regular and irregular verbs. This process would require that we have examples of both correct and incorrect applications of the Production in our LTM.


Both processes are inductive in that they try to identify from examples of success and failure the features that characterise when a particular Production rule is applicable. These two processes produce multiple variants on the conditions (the ‘IF’ clause(s) of a Production) controlling the same action. Thus, at any point in time the system is entertaining as its hypothesis not just a single Production but a set of Productions with different conditions to control the action.

Since they are inductive processes, Generalisation and Discrimination will sometimes err and produce incorrect Productions. There are possibilities for Overgeneralisation and useless Discrimination, two phenomena that are widely documented in L2-acquisition research (Ellis, 1994). Thus, the system may simply create Productions that are incorrect, either because of misinformation or because of mistakes in its computations.
ACT* uses the Strengthening mechanism to identify the best problem-solving rules and eliminate wrong Productions. Strengthening is the process by which better rules are strengthened and poorer rules are weakened. This takes place in ACT* as follows: each time a condition in WSTM activates a Production from procedural memory and causes an action to be deployed and there is no negative feedback, the Production will become more robust. Because it is more robust it will be able to resist occasional negative feedback and also it will be more strongly activated when it is called upon:

The strength of a Production determines the amount of activation it receives in competition with other Productions during pattern matching. Thus, all other things being equal, the conditions of a stronger Production will be matched more rapidly and so repress the matching of a weaker Production (Anderson, 1983: 251)

Thus, if a wrong Interlanguage item has acquired greater strength in a learner’s LTM than the correct L2-item, when activation spreads the former is more likely to be activated first, giving rise to error. It is worth pointing out that, just as the strength of a Production increases with successful use, there is a power-law of decay in strength with disuse.
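
The competition between a strong but wrong Production and a weaker correct one can be sketched numerically. The figures below (a gain of 1.0 per successful use, a 25% reduction on negative feedback, a decay exponent of 0.5) and the example sentences are assumptions chosen purely for illustration; they should not be read as Anderson’s actual parameters or equations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Production:
    name: str
    output: str
    strength: float = 1.0

    def reinforce(self, gain: float = 1.0) -> None:
        """Successful use without negative feedback: strength grows."""
        self.strength += gain

    def penalise(self, factor: float = 0.75) -> None:
        """Negative feedback: strength shrinks (factor is an assumed value)."""
        self.strength *= factor


def select(candidates: List[Production]) -> Production:
    """All else being equal, the strongest matching Production wins."""
    return max(candidates, key=lambda p: p.strength)


def decayed(strength: float, time_unused: float, rate: float = 0.5) -> float:
    """Power-law decay of strength with disuse (illustrative form only)."""
    return strength * (1.0 + time_unused) ** (-rate)


# Hypothetical learner: the erroneous form has been used more often.
wrong = Production("interlanguage", "he have gone")
right = Production("target", "he has gone")
for _ in range(5):
    wrong.reinforce()
for _ in range(2):
    right.reinforce()

first_choice = select([wrong, right]).output

# Repeated negative feedback on the error plus successful use of the target.
for _ in range(3):
    wrong.penalise()
    right.reinforce()

later_choice = select([wrong, right]).output
unused_strength = decayed(6.0, time_unused=3)
print(first_choice, later_choice, unused_strength)
```

In this toy run the erroneous form is selected first simply because it is stronger, and the correct form takes over only after several rounds of feedback, which is one way of picturing why entrenched errors are slow to disappear.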

2. Extending the model: adding a ‘Procedural-to-Procedural route’ to L2-acquisition

One limitation of the model is that it does not account for the fact that unanalysed L2-chunks of language are sometimes acquired through rote learning or frequent exposure. This happens quite frequently in classroom settings, for instance with set phrases used in everyday teacher-to-student communication (e.g. ‘Open the book’, ‘Listen up!’). As a solution to this issue Johnson (1996) suggested extending the model by allowing for the existence of a ‘Procedural-to-Procedural route’ to acquisition whereby some unanalysed L2-items can be automatised with use, ‘jumping’, as it were, the initial Declarative Stage posited by Anderson.
This means that teaching memorised unanalysed chunks can work in synergy with explicit language teaching, as happens in my approach. See my blog post on how I teach lexicogrammar.


12 thoughts on “How the brain acquires foreign language grammar – A Skill-theory perspective”

  1. Your approach does a good job of explaining fossilized errors and why they are so difficult to eliminate. It does not seem to account for the manner in which children and many adults are able to acquire L2 without going through the “initial Declarative Stage” at all, in which all L2 items are unanalysed and automatised through frequent exposure.


    • As I have briefly explained in the last paragraph, one has to allow for a procedural-to-procedural route as well. In this case, the brain will store formulaic language (unanalyzed chunks) and use them until they are strengthened and acquired unless negative feedback from the environment throws a spanner in the works. I did not elaborate too much in the article because I thought it was implicit in the last paragraph. Maybe I should have 😉 Thanks for your contribution; truly ‘spot on’! 🙂


  2. Since I use Comprehensible Input methods, the procedural to procedural route is the one that interests me. I’m curious why so many researchers seem determined to ignore Krashen’s explanations.

    • Mainly because nobody really finds his theory ‘scientific’. Although I do think that his work is very interesting and I agree with some of the things he says (love the Narrow Listening idea for example) the vast majority of the academic/research community has issues with his approach as it requires a ‘leap of faith’. If, as my friend Chris Stoltz and others allege, Krashen’s theories were indeed rooted in sound science and research, don’t you think that people would have a different attitude towards them? 🙂 I love reading his stuff and find a lot of the things he says fascinating, but Cognitive Psychology is all the rage now because it is objective neuroscience. Although it cannot explain every aspect of acquisition yet, it explains some; also, the fact that most neuroscientists nowadays agree that language skills are acquired like any other cognitive skill makes them reject Krashen’s theories. There is really no reason for language skills to be different from other cognitive skills in terms of acquisition and processing; the brain is efficient; it would not create another completely different system just for language learning when it has an efficient system to acquire skills already in place. It would be a waste of cognitive space and resources. I have nothing against Krashen personally and if it works for you, I think you should stick to his approach. 🙂


      • Thanks for the thorough post, as always, Gianfranco! I hope this reply isn’t too muddled, as it addresses some points made in the post itself and ones made in the comments.

        The statement “Cognitive Psychology is all the rage now because it is objective neuroscience” is jarring to American ears because, here, cognitive psychology is often considered the “fuzzy” branch of psychology and would be contrasted with other approaches to psychology that have a closer relationship to neuroscience. In fact, a common characterization of cognitive psychology, at least as practiced in the US, is that it studies the “mind” as opposed to the brain. (The ACT-R model is a partial exception because of the degree to which it tries to connect its study of “mind” with the results of neurological research.)

        Cognitive linguistics here is very much the little brother of mainstream (Chomskyan) linguistics–there are perhaps five universities in the United States where one can do advanced study in cognitive linguistics, none of them “big name” universities–precisely because it takes an approach to language that most formal linguists consider less scientific, or at least misaligned with the results of scientific inquiry into language.

        And yet cognitive science is not incompatible with the Chomskyan models that inform Input-based approaches to SLA. Take, for instance, my alma mater, the University of Michigan, whose linguistics department is Chomskyan through and through, but which also has a cognitive science major (rare for undergraduate programs) founded and run by a senior linguistics professor; the cognitive science program considers the results of research in cognitive science very much in line with Chomskyan linguistics.

        As for negative attitudes toward Krashen’s theories being an indication that they are unscientific, that’s like saying, “if McDonald’s really is unhealthy, why don’t people stop eating there?” People don’t necessarily act on research, even if they know about it, and institutional change moves at a glacial pace. In any case, the same logic can be applied to any other approach, since there is no single dominant approach in language teaching. In the United States, one of the reasons teachers are increasingly gravitating toward input-based methods is precisely that they have not been pleased with the results of skills-based methods.

        It is fascinating to see how these issues are sometimes split along national lines. In the US, Krashen is as popular as ever–perhaps more popular than ever, as his research has finally trickled down to a critical mass of classroom practitioners–and many linguists who take more “scientific” approaches see their research bearing out what Krashen hypothesized long ago. It’s worth noting here that Krashen’s research in the 70s and 80s was not “unscientific”; it just was empirical science as opposed to neuroscience. If Krashen’s research is unscientific because it is empirical rather than experimental, then we have to call DeKeyser’s research “unscientific,” too, to say nothing of discounting most social science research and theoretical research as “unscientific.”

        The ACT-R model is part of a valuable pursuit because of its attempts to unite cognitive psychology with neuroscience. Of course, it is based on prior commitments that aren’t themselves inspired by neuroscience, such as the deep commitment to declarative and procedural knowledge and the deep commitment that only a clearly unified theory of cognition can be accurate. The idea of language being a distinctive human faculty offends ACT-R from the start, because of ACT-R’s a priori belief that there should be no unique cognitive processes.

        Regarding the idea that the brain wouldn’t have different systems for different cognitive processes, note that a skills-based approach also needs to posit two systems: one for children and one for adults.

        Just a few thoughts, not terribly well organized. Thanks, Gianfranco, for your well-researched and kindly expressed points!

  3. But there are two systems. What I find fascinating about neuroscience is that so much of what I have read (not widely, but …) corresponds to what Krashen has been saying all these years. Daniel Kahneman talks about System I and System II, or hot cognition and cold cognition. Edward Slingerland makes the same kind of distinction when he talks about Confucian “trying” and Laotian “Not trying” and makes the connection with modern neuroscience. I’ve read discussions of Krashen’s theories by “experts” whose only objection seems to be that his hypothesis hasn’t been proven. Then they go on to talk about other theories which have not been proven either but which they are more comfortable with because they favor cold cognition. Krashen recently said “There’s an easy way and a hard way, and the hard way doesn’t work.” That exactly mirrors my experience with students. Other methods worked only with intelligent, highly motivated students who were willing to undergo the “Anderson Model” that you describe so well, what Krashen calls Learning. I found that ordinary students could acquire language (without understanding the grammatical terms) and use it spontaneously and correctly through repeated exposure in a compelling context. If you visit there is an impressive number of studies which support his theory. I’ve tried to discuss this in a post on my blog called “Comprehensible Input and the Noble Art of Horse-riding”, but basically, I feel strongly about it because I’ve seen that it is an approach which makes SLA possible for so many students who were considered failures with other methods.

    • This is what I enjoy most about blogging: the cognitive and emotional responses that it evokes. I am familiar with the research you mentioned and I am a frequent visitor of Stephen Krashen’s website. I have been teaching students of all abilities and ages; the Anderson model is a framework to explain the acquisition of grammar, but by no means a fully fledged account of second language acquisition. Wish it was that simple. If I said to you that my method reflects 100% Anderson’s or any other model for that matter, I would be lying. I have taken what I thought worked best from various methodologies and adapted it. And I am sure that to a certain degree every teacher does that. My methods work well with my students, in my context, especially with the less able, actually. Yours work well with yours, in your context. I invite you to read all my other blogs, not just the one about rule acquisition, to get a better picture. The article you make reference to may give the wrong picture. Will read your blog; thanks for sending the link through and for your very valuable comment. Wish you all the best. You sound like a very passionate and reflective teacher. The kind I like! 🙂

