Saturday, August 29, 2015

♫ Summer Running: Not Very Fast! / Summer Running: Pain in my Ass! ♫


Every couple of days I force myself to go outside and run about 2-4 miles. I do not enjoy it. It makes me feel like I am dying every time; I gasp and wheeze and, even after showering, I stay uncomfortably sweaty for a few hours. Worse still, I do not feel "energized" or whatever other vital sensations people claim to derive from exercise; if anything, I feel especially fatigued afterwards, and this only gets more pronounced as the day progresses. However, I have convinced myself that the benefits of cardiovascular exercise outweigh its many miseries; I will go into my whys and wherefores later and try to convince you too (it could just be that I am an insane person); but to start, I want to discuss the 'how' and the 'what'.

I started keeping track of this gruelling ordeal using an Android app (RunKeeper); it uses GPS and links up with Google Fit and is a terrible invasion of my privacy that has probably somehow already sent my info to every extant insurance company and devastated my future premiums. Indeed, it's probably also incremented with each new symptom-related Google search ("knees hurt a lot", "gasp and wheeze", "how much sweating is normal", etc.). Fact is, information pertaining to my health now exists in the ether, and with supply, demand, and end-user licensing agreements being what they are, someone savvy can get it if they want it badly enough; still though, like Blogger, the app is awfully convenient. I tried a couple of others (that didn't look as eager to sell your soul) and found them to be complete shit, functionally. RunKeeper is good at what it does; it currently operates with a freemium model, and evidently if you pay a little you get better stats. I used the basic free version and just manually entered everything into a spreadsheet-- it took less than a half-hour.

I started running in late March, but I didn't really seriously commit until June (see histogram). Since then, across 44 different running events, I have travelled 103.72 miles and wasted 13 hours and 22 minutes doing so. There are two basic routes I would run: a short route (~1.7 miles) and a long route (~3.4 miles).

So far, my average pace on the short route is 7:20/mile (440 seconds) with a standard deviation of 26 seconds, while my average pace on the long route is 7:45/mile (465 seconds) with a standard deviation of 20 seconds. On my fastest, I averaged 6:55/mile for 1.7 miles (update 8/30: new best time of 6:48/mile for 1.7 miles). On my slowest, I averaged 8:25/mile for 3.4 miles. Here's a graph showing my improvement over time.
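The pace figures above are just seconds per mile converted to minutes and seconds. Here is a minimal sketch of that arithmetic in R; `sec_to_pace` is a helper name made up for illustration and is not part of the script at the end of this post.

```r
# Hypothetical helper: convert seconds per mile to an "m:ss" pace string.
sec_to_pace <- function(secs) {
  secs <- as.integer(round(secs))
  sprintf("%d:%02d", secs %/% 60L, secs %% 60L)
}

sec_to_pace(440.1333)   # mean pace on the short route
sec_to_pace(464.6625)   # mean pace on the long route

# Overall pace implied by the totals: 13 h 22 min over 103.72 miles
total_secs <- 13 * 3600 + 22 * 60
overall_pace <- sec_to_pace(total_secs / 103.72)
overall_pace
```

The overall figure lands between the two route averages, which is a handy sanity check on the spreadsheet totals.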


Significant improvement over time, which was expected. A more interesting question is whether my improvement was greater for short runs or long runs.

Separate regression equations were fit for both long and short runs:

AveragePace(Short) = 471.65 - 1.69*(RunOrder)
AveragePace(Long) = 506.29 - 1.47*(RunOrder)

A quick test of differences between slopes would be this:
Z = (b1-b2)/Sqrt((SEb1)^2 + (SEb2)^2)

This gives:
> (-1.6882- -1.4735)/sqrt(.2348^2+.2595^2)
[1] -0.6135005
> pnorm(.Last.value)
[1] 0.2697727

So nope, slopes don't differ.
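One caveat: `pnorm(z)` on its own gives a one-tailed probability. Since either slope could have turned out steeper, a two-sided p-value is arguably the fairer number. A small sketch of both, using the slopes and standard errors from the regression summaries at the end of this post (the variable names are mine):

```r
# Slopes and standard errors copied from summary(fit1) and summary(fit2)
b_short <- -1.6882; se_short <- 0.2595
b_long  <- -1.4735; se_long  <- 0.2348

# z statistic for the difference between two independently estimated slopes
z <- (b_short - b_long) / sqrt(se_short^2 + se_long^2)

p_one_sided <- pnorm(z)             # lower tail only, as in the transcript above
p_two_sided <- 2 * pnorm(-abs(z))   # allows a difference in either direction

round(c(z = z, one_sided = p_one_sided, two_sided = p_two_sided), 4)
```

Either way the conclusion is the same: no evidence that the slopes differ.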

R-squares were large (.63 and .74, respectively), indicating that a significant amount of the variance in pace measurements is attributable to practice or the passage of time. Looking at the graph, there appear to have been three pretty precipitous drops in average pace: initially for the short runs, and then again for the short runs after about 30 running events, but the drop in average pace for my long run time occurred just after my 20th running event, and didn't seem to affect the short runs. I am happy enough with this; without getting into time series or forecasting (though here's a great tutorial), I checked my residual autocorrelations (ACF) and everything looked OK.




I'm sure I look ridiculous when I run--I wear cut-off jeans, tattered old t-shirts, and my $15 Costco-brand running sneakers. But this is intentional! First, I like the feeling of getting extra use out of my holey old clothes by using them as a running costume. Second, they are positively indecent and wholly unwearable, even to sleep in--out on the block this is another incentive NOT to stop running, indeed not even to slow down!

Why do I do this? Because I am by nature quite sedentary, and evidently this means I am going to get several diseases, my brain is going to atrophy, and I will die quite prematurely. Because I am scared of these things happening, I have been following this self-imposed routine of aerobic hell rather sedulously for the past few months. During the school year I can tell myself convincing stories about how my daily 5-minute bike rides to and from the bus stop really add up: "surely this is a sufficient amount of exercise". But during the summer, when I can easily remain seated in the same place for the entire day, even these weak rationalizations break down. Running is the easiest means of cardio-ing; you can do it anywhere there's a sidewalk.

Running appears to enhance cognitive performance in healthy individuals.
This Wikipedia article provides an excellent summary, but I'll talk about a few specific studies below. Smith et al. (2010) analyzed 29 studies that tested the association between neurocognitive performance and aerobic exercise; they found that individuals who had been randomly assigned to aerobic exercise conditions improved in attention, processing speed, executive function, and memory. Here's a Psychology Today page about another study on the relationship between cardiovascular fitness and intelligence in young adulthood (spoiler: it's very positive).

Not only that, but cardio also appears to be optimal for longevity. VO2 max, the gold-standard measure for cardiovascular fitness, is a good predictor of life expectancy; the higher it is, the lower your risk of "all cause mortality" and cardiovascular disease. The good news is, VO2 max is trainable, especially if interval training is used!

This isn't just me showcasing studies that confirm my beliefs--here's an excerpt about the relationship between exercise and cognitive function from the recent textbook "Memory" by three leaders in the field (Baddeley, Eysenck, and Anderson, 2014):
"The evidence is much stronger for a positive effect of exercise on maintaining cognitive function. In a typical study, Kramer, Hahn, Cohen, Banich, McAuley, Harrison, et al. (1999) studied 124 sedentary but healthy older adults, randomizing them into two groups. One group received aerobic walking-based exercise, while the control group received toning and stretching exercises. The groups trained for about an hour a day for 3 days a week over a 6-month period. Cognition was measured by a number of tests including task switching, attentional selection, and capacity to inhibit irrelevant information. They found a modest increase in aerobic fitness, together with a clear improvement in cognitive performance. A subsequent meta-analysis of a range of available studies by Colcombe and Kramer (2003) found convincing evidence for a positive impact of aerobic exercise on a range of cognitive tasks, most notably those involving executive processing."


Honestly though, I feel like the amount of car exhaust I have to breathe on my runs probably greatly offsets any potential gains of cardiovascular exercise. Especially when I read horrifying things about how even sitting in traffic can cause brain damage and how living near a busy road increases the risk of birth defects. I sure hope I'm not running right into the very outcomes I intended to run away from!



Here's some R-code I used for this post:
> sd(data1$AvgPace)
[1] 26.28296
> sd(data2$AvgPace)
[1] 20.17638
> mean(data1$AvgPace)
[1] 440.1333
> mean(data2$AvgPace)
[1] 464.6625
> summary(fit1)

Call:
lm(formula = data1$AvgPace ~ data1$Order)

Residuals:
    Min      1Q  Median      3Q     Max
-25.918 -10.718  -2.435   9.947  31.418

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 471.6471     5.7739  81.687  < 2e-16 ***
data1$Order  -1.6882     0.2595  -6.507 8.16e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.33 on 25 degrees of freedom
Multiple R-squared: 0.6287, Adjusted R-squared: 0.6139
F-statistic: 42.34 on 1 and 25 DF, p-value: 8.157e-07

> summary(fit2)

Call:
lm(formula = data2$AvgPace ~ data2$Order)

Residuals:
     Min       1Q   Median       3Q      Max
-15.9043  -7.8003  -0.2513   6.0742  19.6548

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 506.2880     7.1511  70.799  < 2e-16 ***
data2$Order  -1.4735     0.2348  -6.276 2.04e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.69 on 14 degrees of freedom
Multiple R-squared: 0.7378, Adjusted R-squared: 0.719
F-statistic: 39.39 on 1 and 14 DF, p-value: 2.036e-05

# Scatterplot of pace against run order, colored by route type
plot(data$Order,data$AvgPace,col=data$LongShort, main="Average Pace over Time (ordinal)", ylab="Average Pace (seconds)", xlab="Order (1 = first run, 2 = second run, ... , 44 = most recent run)")
legend('bottomleft', legend = levels(data$LongShort), col = 1:3, pch = 1)
# Regression lines use the same x variable as the plot and the fitted models (Order)
abline(lm(data1$AvgPace~data1$Order),col="red")
abline(lm(data2$AvgPace~data2$Order),col="blue")
# Connect consecutive runs within each route type
lines(data1$Order,data1$AvgPace,type="l")
lines(data2$Order,data2$AvgPace,type="l")

plot(Date, AvgPace, t="l", xaxt="n", xlab="")
axis(1, at=Date, labels=FALSE)
text(x=seq(1,44,by=1), par("usr")[3]-6.5, labels=labs, adj=1, srt=45, xpd=TRUE)

hist(Month1, xlab="Month (March=3, April=4,...)",main="Logged Running Events per Month",breaks=seq(3,9,1),right=F,labels=T)

library(forecast)  # provides Acf()
# 2x2 panel: residual series and autocorrelation for each route type
par(mfrow=c(2,2))
plot.ts(res3,ylab="res (AvgPace - SHORT)",main="residual autocorrelation (short runs)")
abline(0,0)
Acf(res3)
plot.ts(res4,ylab="res (AvgPace - LONG)",main="residual autocorrelation (long runs)")
abline(0,0)
Acf(res4)

Friday, August 28, 2015

Summary/Review of "How Can The Mind Occur in The Physical Universe?"

"...There is this collection of ultimate scientific questions, and if you are lucky to get grabbed by one of these, that will just do you for the rest of your life. Why does the universe exist? When did it start? What’s the nature of life?...
The question for me is how can the human mind occur in the physical universe. We now know that the world is governed by physics. We now understand the way biology nestles comfortably within that. The issue is how will the mind do that as well." 
-Allen Newell, December 4, 1991
I found out about John R. Anderson almost immediately upon discovering intelligent tutoring systems a few years ago; he and his research group at Carnegie Mellon have blazed the way forward with these technologies. Their Cognitive Tutor, for example, is currently #5 out of 39 interventions in mathematics education, as evaluated by the US Department of Education's "What Works Clearinghouse". I learned that, notwithstanding these educational pursuits, his life's work had been more about developing a "cognitive architecture" -- a model of how the structure of the mind and its components work together to achieve human cognition. I learned that he called it ACT-R (for "adaptive control of thought - rational") and that it has been steadily undergoing refinements since it debuted in the early 70s. Anyway, given how amazed I was with his tutoring-systems research, I was naturally drawn to Anderson's 2007 book that surveys his life's work in attempting to answer the titular question via ACT-R.

I'm moved to blog this because I was extremely impressed by (1) the synthesis of seemingly disparate phenomena (ACT-R is very consistent with a wide range of findings in cognitive psychology), and (2) how well his theories map onto findings from neuroscience. This book contains the most convincing model of human cognition I know of, but it is spread out across several chapters and compartmentalized in such a way that I feel I can unbox everything and tie it all together here in a more readily intelligible, coarse-grained fashion. It really is amazing, but I understand if you don't want to sit here and read a whole long synopsis. For this reason, I will now post verbatim a summary given by Anderson at the end of the book (though before he talks about consciousness), so that you can make an informed decision about whether to read further.
1. The answer [to the title question] takes the form of a cognitive architecture—that is, the specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind.
2. For reasons of efficiency of neural computation, the human cognitive architecture takes the form of a set of largely independent modules (e.g., figure 2.2) associated with different brain regions.
3. Human identity is achieved through a declarative memory module that, moment by moment, attempts to give each person the most appropriate possible window into his or her past.
4. The various modules are coordinated by a central production system that strives to develop a set of productions that will give the most adaptive response to any state of the modules.
5. The human mind evolved out of the primate mind by achieving the ability to exercise abstract control over cognition and the ability to process complex relational patterns.

The Modular Nature of Mind and Brain

The function of a cognitive architecture, according to Anderson, is "to find a specification of the structure of the brain that explains how it achieves the function of the mind." He argues that connectionist models of cognition will never be able to completely account for human cognition as a whole:
"This is because the human mind is not just the sum of core competences such as memory, or categorization, or reasoning. It is about how all these pieces and other pieces work together to produce cognition. All the pieces might be adapted to the regularities in the world, but understanding their individual adaptations does not address how they are put together."
Though many cognitive phenomena are certainly connectionist in nature, there is also no question that the brain is more than a uniform network of individual neurons. Much in the way that a cell is functionally partitioned into organelles, or that an organism comprises interconnected organ systems that each carry out characteristic tasks, the brain too has modularized certain functions, as evidenced by unique regions of neural anatomy associated with the performance of different tasks. The brain isn't just one huge undifferentiated mass! Neurons that perform related computations occur close together by reason of parsimony: the further apart they are, the longer it would take for them to communicate. Thus, computation in the brain is local and parallel; different regions perform different functions in the service of cognition, though at a lower level the functionality of any given brain region is connectionist in nature. Indeed, almost all systems whose design is meant to achieve a function show this kind of hierarchical organization (Simon, 1962).

If the brain devotes local regions to certain functions, this implies that we should be able to use brain-scanning procedures to find regions that reflect specific activities. The ACT-R cognitive architecture proposes 8 basic modules, and has mapped them onto specific brain regions through a series of fMRI experiments.
The eight modules (four peripheral and four central), plus their associated brain regions, are as follows: (1) Visual - processing of attended information in the fusiform gyrus; (2) Aural - secondary auditory cortex; (3) Manual - hand motor/sensory region of central sulcus; (4) Vocal - face/tongue motor/sensory region of central sulcus; (5) Imaginal - mental/spatial representation area in posterior parietal cortex; (6) Declarative - memory storage/retrieval operations in prefrontal cortical areas; (7) Goal - cognition directed by anterior cingulate cortex; and (8) Procedural - integration and selection of cognitive actions through the basal ganglia. A single fMRI study (Anderson et al., 2007) demonstrated the exercise of all of these modules and their associated brain regions. For our purposes, two of these modules are worth considering in more detail.

While the many regions of the brain do their own separate processing, they must act in a coordinated manner to achieve cognition. Thus, many regions of localized functionality are interconnected by tracts of neural fibers; particularly important are the connections between the cortex (the outermost region of the brain) and subcortical structures. One subcortical area in particular, the basal ganglia, is innervated by most of the cortex and plays a major role in controlling behavior through its actions on the thalamus. It marks a point of convergence across brain regions, compressing widely distributed information into what is effectively a single decision point. Thus, the basal ganglia is believed to be the main brain structure involved in action selection, or choosing which of many possible behaviors to perform in a given instance. Like their associated brain regions, the ACT-R modules must be able to communicate among each other, and they do so by placing information in small-capacity buffers associated with each of them. The procedural module plays the role of the basal ganglia by responding to patterns of information in these buffers and producing action. Though all modules are capable of independent parallel processing, they have to communicate via the procedural module, which can only execute a single rule/action at a time, thus forming a serial "central bottleneck" in overall processing.
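To make the "central bottleneck" idea concrete, here is a toy sketch in R of buffers plus a production cycle that fires exactly one rule at a time. This is not ACT-R itself: the buffer names, the two productions, and the "take the first match" conflict resolution are all simplifying assumptions made up for illustration.

```r
# Toy buffers: each module exposes one small slot of information.
buffers <- list(goal = "add", visual = "3+4", retrieval = NA, manual = NA)

# Hypothetical productions: a condition on the buffers, plus an action
# that returns the updated buffers.
productions <- list(
  request_fact = list(
    condition = function(b) b$goal == "add" && is.na(b$retrieval),
    action    = function(b) { b$retrieval <- "3+4=7"; b }
  ),
  type_answer = list(
    condition = function(b) !is.na(b$retrieval) && is.na(b$manual),
    action    = function(b) { b$manual <- "press 7"; b }
  )
)

# One cycle: collect every production whose condition matches the buffers,
# then fire only one of them (the serial bottleneck).
run_cycle <- function(b) {
  matched <- Filter(function(p) p$condition(b), productions)
  if (length(matched) == 0) return(NULL)
  matched[[1]]$action(b)
}

fired <- 0
repeat {
  nxt <- run_cycle(buffers)
  if (is.null(nxt)) break   # nothing matches any more: stop
  buffers <- nxt
  fired <- fired + 1
}
fired
buffers$manual
```

Even though several conditions could in principle match at once, only one action happens per cycle, so the modules' parallel work gets funneled through a single serial step.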

So the basal ganglia plays the role of a "coordinating module". Appropriately, this region is evolutionarily older than the cortex and it occurs to some extent in all vertebrates. The other module I wanted to consider is the Goal module, which enables means-ends analysis. This requires that one be able to disengage from what one wants (the goal, or "end") in order to focus on something else (the "means"); some researchers (Papineau, 2001) assert that this is a uniquely human capability.

So, where are we at? The human mind is thought to be partitioned into specific information-processing functions, and thankfully neuroanatomy appears to be cut along similar joints, with specific brain regions devoted to different functions and interconnections that provide for coordination among these functions. Having posited a cognitive architecture based on interacting modules, Anderson turns next to the nitty-gritty of learning and memory.

Learning and Memory in ACT-R

Above, I mentioned a "Declarative" module as being among the central modules posited by ACT-R. Anderson's fundamental claim is that "declarative memory tries to give us, moment by moment, the most appropriate possible window into our past," and "this window into our past gives us our identities."

He assumes the well-documented distinction between declarative learning, or learning of "facts", and procedural learning (skill acquisition). He doesn't, however, make Tulving's (1972) episodic/semantic distinction; instead he considers both to be explicitly learned in a given context, with the difference being that the "semantic" memory (such as "Lincoln was a U.S. president") has been encountered in so many subsequent contexts that we no longer have access to the context in which it was originally learned. Declarative memories can be strengthened, or made more available, by mere exposure.

In addition to the formation and strengthening of declarative memories, there is also procedural learning and subsequent conditioning of these actions. An example he gives is typing: we all know how to type, but we would have a difficult time if asked to give the location of a certain key on a keyboard (without using our fingers as an aid or relying on a common mnemonic like "the home row" or "qwerty"). Conditioning is how all animals learn that certain actions are more effective in certain situations through experience; these can be procedural actions or innate tendencies. Procedural knowledge is associated with the basal ganglia and will be discussed in greater detail below; for now, we will stay with declarative learning.

Interestingly, there are two ways of acquiring declarative memories. This can be illustrated by anterograde amnesiacs like H.M., who, despite the loss of the hippocampus (and the ability thereby to form new memories), was able to learn about famous people such as John F. Kennedy and others who became famous after his surgery. Recent researchers have postulated two different learning systems: while the hippocampus is known to subserve most declarative learning, other brain structures can slowly acquire such memories through repetition (presumably how H.M. came to know about famous people). Furthermore, through rehearsal, memories can be slowly transferred from the hippocampus to neocortical regions, explaining why those with a damaged or missing hippocampus can still access older memories (which are presumed to have undergone such transfer). So, while the hippocampus limits the capacity of declarative memory, it does not limit all learning.

I've long been confused about the relative finitude of memory, but Anderson makes a strong case for there being definite limits on the size of declarative memory. Beyond physical limits of sheer size and metabolic costs, he makes the interesting claim that the very flexibility of our memory-search ability derives from it being strategically limited, "throwing out" memories that are unlikely to be needed: "declarative memory, faced with limited capacity, is in effect constantly discarding memories that have outlived their usefulness".

Alongside Lael Schooler, Anderson (1991) researched the fundamental mechanisms of declarative memory. They found that if a memory has not been retrieved in a while, it becomes increasingly unlikely that it will be needed in the future. Indeed, there is a simple relationship between how likely a memory is to be needed on a given day and how long it has been (t) since the memory was last used:
Odds needed = A t^{-d}

Where A is just a constant and d is the decay rate. Each time a memory is accessed, it adds an increment to the odds that it will be needed again, with these increments all decaying according to a power function. Thus, if an item has occurred n times, the odds of it appearing again are

Odds = \sum_{k=1}^{n} A t_k^{-d}

Where t_k is the time since the kth practice of an item. Thus, the past history of memory use predicts the odds that the memory will be needed. But the context of the current situation is involved as well. It turns out that memory availability is adjusted as a function of context; e.g., you will have an easier time remembering, say, your locker combination in the locker room than you would if someone were to randomly ask you for it elsewhere (Schooler and Anderson, 1997). Thus, human memory reflects the statistics of the environment and performs a triage on memories, devoting its limited resources to those that are most likely to be needed. How is this fact realized in ACT-R?
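The power-law relationship above is easy to play with numerically. A minimal sketch in R, where the function name and the values A = 1 and d = 0.5 are illustrative assumptions rather than fitted parameters from Anderson and Schooler:

```r
# Odds that a memory is needed, given the time since each of its past uses.
# A (scale) and d (decay rate) are illustrative values, not fitted ones.
need_odds <- function(times_since_use, A = 1, d = 0.5) {
  sum(A * times_since_use^(-d))
}

need_odds(1)               # used once, 1 day ago
need_odds(100)             # used once, 100 days ago: much lower odds
need_odds(c(1, 10, 100))   # used three times: each use adds its own increment
```

Recency and frequency both fall out of the same sum: recent uses contribute large terms, and every additional use contributes another term.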

In ACT-R, the "past" that is available in the form of memories consists of the information that existed in the buffers of various modules. At any given moment, countless things are impinging on the human sensorium, of which we only remember a very small fraction. For instance, ambient sounds or things in the visual periphery certainly undergo processing in various brain regions, but they are seldom attended to and thus often never make it into buffers. The system is "aware" only of the chunks of information in the various buffers, and these chunks get stored in declarative memory. These chunks have activation values that govern the speed and success of their retrieval. Specifically, a given memory has an inherent, base-level activation, plus its strength of association to elements in the present context.

Since the odds of needing a memory can be considered the sum of a quantity that reflects the past history of that memory and the present context, we can represent this in Bayesian terms as

 log[prior(i)] + ∑(j∈C)log[likelihood(j|i)] = log[posterior(i|C)]

Where prior(i) is the base-level activation, or the prior odds that memory i would be needed based on factors such as recency/frequency of use, likelihood(j|i) is the likelihood ratio that element j would be part of the context given that memory i is needed (reflecting strength of association to the current context), and posterior(i|C) is the updated odds that memory i will be needed in context C.
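A quick worked example of the log-odds bookkeeping, with numbers made up purely for illustration (the context element names are hypothetical too):

```r
# Made-up numbers for illustration only.
prior_odds <- 0.5   # odds that memory i is needed, judged from its history

# One likelihood ratio per element j of the current context C
likelihood_ratios <- c(locker_room = 3, gym_bag = 2)

# Adding logs ...
log_posterior <- log(prior_odds) + sum(log(likelihood_ratios))

# ... is the same as multiplying odds.
posterior_odds <- prior_odds * prod(likelihood_ratios)

c(from_logs = exp(log_posterior), from_odds = posterior_odds)
```

Working on the log scale is what lets history and context simply add up into a single activation-like quantity.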



I'll give the basic ACT-R memory equations without going into them much further. The main point is that memory is responding to two statistical effects in the environment: (1) the more often a memory is retrieved, the more likely it is to be retrieved in the future. This produces a practice effect and is reflected in ACT-R's base-level activation. Secondly, (2) the more memories associated with a particular element, the worse a predictor the element is of any particular memory. This is reflected in the strengths of association in ACT-R, and produces the "fan" effect. The "fan" refers to the number of connections to a given element; increasing the sheer number of connections will decrease the strength of association between the element and any one of its connections. This is because when an element is associated with more memories, its appearance becomes a poorer predictor of any specific fact.
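Here is a rough sketch in R of how the practice effect and the fan effect can fall out of activation arithmetic. It follows the commonly cited ACT-R form as I understand it (base level as the log of summed decaying traces, association strength as a constant minus the log of the fan), but treat the exact equations, the function names, and the parameter values (d = 0.5, S = 2, W = 1) as assumptions rather than a transcription of Anderson's table.

```r
# Base-level activation: log of the summed, power-law-decayed past uses.
base_level <- function(times_since_use, d = 0.5) {
  log(sum(times_since_use^(-d)))
}

# Strength of association from a context element with a given fan.
# S is an assumed maximum associative strength.
assoc_strength <- function(fan, S = 2) {
  S - log(fan)
}

# Total activation: base level plus weighted associations from the context.
# Attention W is split evenly across the context elements.
activation <- function(times_since_use, fans, W = 1, d = 0.5, S = 2) {
  w_j <- W / length(fans)
  base_level(times_since_use, d) + sum(w_j * assoc_strength(fans, S))
}

# Same usage history, but the cue is linked to 1 fact versus 4 facts.
a_fan1 <- activation(c(1, 5, 20), fans = 1)
a_fan4 <- activation(c(1, 5, 20), fans = 4)
a_fan1 - a_fan4   # the gap is log(4): higher fan, lower activation
```

More practice raises the first term; more facts hanging off the same cue lowers the second.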

These results have been shown to affect all of our memories. In an experimental illustration of this, Peterson and Potts (1982) had participants study 1 or 4 true facts about famous historical figures that they did not previously know, such as that Beethoven never married. Two weeks later, participants were tested on memory for three kinds of facts: (1) new facts they had learned about historical figures as part of the experiment, (2) known facts that they knew about the historical figures before the experiment (e.g., Beethoven was a musician), and (3) false facts that they had not learned for the experiment and that should be recognizable as very unlikely (Beethoven was a famous athlete). Participants were shown these types of statements and had to rate them as true or false, and their speed in doing so was recorded. First, it was found that the facts they knew before the experiment were recognized much more quickly than those they learned for the experiment, reflecting the greater practice and base-level activation of the prior facts. More importantly, the number of facts they had learned for the experiment (1 vs. 4) affected BOTH new and prior facts: participants who learned 4 new facts made slower judgements for both well-known and newly-learned facts, while those who learned just 1 new fact were faster on both new and prior facts. Anderson writes:
From the perspective of the task facing declarative memory—making most available those facts that are most likely to be useful—these results make perfect sense. The already known facts have been used many times in the past, and at delay of two weeks they are likely the ones needed, so the base-level activation works to make them most active. On the other hand, the more things one knows about an individual, the less likely any one fact will be, so they cannot be all made as active. The activation equations in table 3.2 capture these relationships.
This relationship is also borne out in fMRI research. The greater the activation of a memory, the less time/effort it will take to retrieve it; thus, higher activation should map onto weaker fMRI response. Using a fan-effect paradigm, it was found that greater fan (more connections to a single memory) resulted in decreased activation and therefore stronger fMRI responses (Sohn 2003, 2005).

Anderson goes on in this chapter to discuss how we often choose actions and make decisions based on our memories of similar past actions/decisions and the outcomes that they produced. Here, we rely on memories rather than reasoning on the basis of general principles. Sometimes we have general principles to reason from, while other times it's far easier to recall and act. This kind of instance-based reasoning may be far more common than has been traditionally thought.


The Adaptive Control of Thought

Given all of the above, we know how important a flexible declarative memory is to our ability to adapt to a changing environment; but once the relevant information has been retrieved, we have to act on it, using it to make inferences or predictions. This often requires intensive, deliberative processing which is not appropriate when we have to act rapidly in stressful situations. Indeed, to the extent that one can anticipate how knowledge will be used, it makes sense to prepackage the application of that knowledge in a way that can be executed without planning. It turns out that there is a process by which frequently useful computations are identified and cached as cognitive reactions that can be elicited directly by the situation, bypassing laborious deliberation. Thus, a balance must be struck between immediate reaction and deliberative reflection, a sort of dual processing reminiscent of Kahneman's "Thinking, Fast and Slow." This is the way Anderson conceptualizes learning: a process of moving from intentional thinking and remembering (hippocampal/cortical) to more automatic reactions (basal ganglia).

But such an equal embrace of thought and action has not always characterized cognitive science; in fact, this very distinction marked the transition in psychology from the "behaviorist" to the "cognitive" era. This shift is very visible in the debate between Tolman and Hull about the relative roles of mental reflection and mechanistic action in producing behavior. To illustrate the struggle between thought and action in the mind, Anderson has us consider the Stroop task, where you are instructed to quickly report the font color of each word in a list of color names (red, yellow, orange, green, blue, black, etc.) printed in mismatched colors. This always takes slightly longer than simply reporting the color of non-words; Anderson points out that "this conflict basically involves the battle between Hull’s stimulus-response associations (the urge to say the word) and Tolman’s goal-directed processing (the requirement to comply with instructions)."

Anderson argues that 3 brain systems are especially relevant in achieving a balance between thought and action: the basal ganglia are responsible for the acquisition and application of "procedures", or Hull's automatic reactions; the hippocampal and prefrontal regions are responsible for storage and retrieval of declarative information, or Tolman's expectancies; and the anterior cingulate cortex (ACC) for exercising control in the selection of context-appropriate behavior. Note that these respectively correspond to the procedural module, the declarative module, and the goal module.

Declarative retrieval of information during decision-making is very time- and resource-intensive; it would be sensible if our brains had a way of "hard-coding" frequently-used behaviors/actions so that we could respond more automatically to familiar situations. Fortunately, it appears they do just that! For example, Hikosaka et al. (1999) showed monkeys a sequence of 4x4 grids in which two cells were lit up, and the monkeys had to select them in the correct order. The monkeys practiced such sets over the course of several months, and telling differences emerged between performance during the early months and later months. Early on, the monkeys performed the same regardless of what order the grids were shown in, or of which hand they used; however, after months of practice, they had become much faster at completing the task but could not go out of order and could only use their favored hand to input the answer. Thus, it seemed that the monkeys had switched from a flexible declarative representation of the task to a classic stimulus-response representation. Hikosaka et al. examined the brains of monkeys performing the task in order to compare activity in the early vs. later months. As expected, the task activated prefrontal regions early on, but after much practice the task primarily produced activity in basal ganglia structures, which are thought to display a variant of reinforcement learning. Furthermore, temporarily inactivating basal ganglia structures disrupted only the highly practiced sequences (not newly learned sequences).

The basal ganglia, then, are involved in producing automatic responses to stimuli. Indeed, they seem to display a variant of reinforcement learning, where a behavior followed by a "satisfying state of affairs" will increase in frequency (Thorndike's law of effect). The hippocampus is associated with Hebbian learning, where repeated occurrences of stimulus and response together serve to strengthen the connection (Thorndike's law of exercise); this is merely a function of temporal contiguity and does not depend on the consequences of the behavior. The basal ganglia are involved in a dopamine-mediated process that learns to recognize favorable patterns of activity in the cortex (Houk and Wise, 1995). That is, dopamine neurons provide information to the basal ganglia about how rewarding a behavior was, if it was more rewarding than expected, etc. Importantly, an element of time-travel is involved, because a reward has to strengthen the salience of the contextual patterns that came before it and produced it. In humans, the basal ganglia (specifically the striatum) have been found to respond differentially to reward and punishment, the magnitude of the reward/punishment, and the difference between expected and received reward/punishment (Delgado et al. 2003). This was all very refreshing to me. Classical and operant conditioning are often presented in psychology classrooms as museum curiosities or animal training procedures, when in fact they apply equally well to human learning.
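
To make the "more rewarding than expected" idea concrete, here is a minimal Python sketch of a prediction-error update. It is a generic delta rule of the Rescorla-Wagner kind that I am using as a stand-in, not the specific model from Houk and Wise or Delgado et al.; the function names, the learning rate `alpha`, and the reward values are all made up for illustration.

```python
def update_expectation(expected, received, alpha=0.1):
    """Return (new expectation, prediction error) after one outcome.

    alpha is a hypothetical learning rate chosen for illustration.
    """
    prediction_error = received - expected  # positive: better than expected
    return expected + alpha * prediction_error, prediction_error


def simulate(rewards, alpha=0.1, expected=0.0):
    """Track the expectation and prediction error over a reward sequence."""
    history = []
    for received in rewards:
        expected, error = update_expectation(expected, received, alpha)
        history.append((expected, error))
    return history


if __name__ == "__main__":
    # A behavior that pays off 1.0 every time: the surprise shrinks with practice.
    for trial, (expected, error) in enumerate(simulate([1.0] * 5), start=1):
        print(f"trial {trial}: expectation={expected:.3f}, error={error:.3f}")
```

The point of the sketch is only that the learning signal is the difference between received and expected reward: a fully expected reward teaches nothing new, and an omitted one produces a negative error.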

I wanted to share one final experimental demonstration of the difference between learning in the hippocampus versus the basal ganglia. This one involves a rat maze-learning paradigm; imagine a maze shaped like a plus sign (+); rats always enter on the same side, say the west side. Rats are trained to go to food housed in the south arm. What will rats do if they are put in the maze on the east side? Have they learned the spatial location of the food, or have they merely learned a right-turning behavior? If the former is true, they should turn down the correct arm of the maze to find the food; if the latter is true, their response will lead them down the wrong arm. Early results yielded no clear choice pattern (Restle, 1957). However, Packard and McGaugh (1996) trained all rats on the maze and then gave them injections that temporarily impaired either their hippocampus or their basal ganglia (specifically, the caudate). As you might expect, the rats with selective hippocampal impairment performed the right-turning response and ended up in the wrong arm of the maze, while rats with impediments to the basal ganglia chose the correct arm, presumably because their intact hippocampus contained the correct spatial "place-learning" representation. A convincing follow-up study by Packard (1999) produced the same pattern of results, but this time by using memory-enhancing agents applied selectively to the hippocampus or the caudate. This time, rats with hippocampal enhancements displayed behavior consistent with place-learning (they chose the correct arm), while rats with enhanced caudates relied on a right-turn response and chose the incorrect arm.
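
If the geometry is hard to picture, the two predictions can be spelled out in a few lines of Python. This is just my own toy encoding of the plus maze (compass names in clockwise order, hypothetical function names), not anything taken from the studies themselves.

```python
HEADINGS = ["north", "east", "south", "west"]  # clockwise order


def opposite(side):
    """Direction of travel for a rat that enters from the given side."""
    return HEADINGS[(HEADINGS.index(side) + 2) % 4]


def response_strategy(start_side, turn="right"):
    """Arm reached by repeating a learned body turn at the center."""
    heading = opposite(start_side)  # entering from the west means heading east
    step = 1 if turn == "right" else -1
    return HEADINGS[(HEADINGS.index(heading) + step) % 4]


def place_strategy(food_arm):
    """Arm reached by going to the remembered location, whatever the start."""
    return food_arm


if __name__ == "__main__":
    for start in ("west", "east"):
        print(f"start {start}: response -> {response_strategy(start)}, "
              f"place -> {place_strategy('south')}")
```

From the trained west start both strategies end in the south arm, which is why training alone cannot tell them apart; from the east start the right-turn response ends in the north arm while the place strategy still ends in the south.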


But where do these stimulus-response associations come from? In ACT-R, they are called "productions" or "production rules" -- when a situation arises for which the system does not already have rules, information must be retrieved from declarative memory and must be processed using more basic production rules. This could entail retrieving a similar prior experience upon which to base present actions or retrieving general principles and reasoning from them. In such a situation,
"the first production makes a retrieval request for some declarative information, that information is retrieved, and the next production harvests that retrieval and acts upon it. The compiled production eliminates that retrieval step and builds a production specific to the information retrieved. This is the process by which the system moves from deliberation to action. Each time a new production of this kind is created, another little piece of deliberation is dropped out in the interest of efficient execution."
However, this newly formed production requires multiple repetitions for it to acquire enough strength to be applicable in new situations. Such rules are learned slowly, consistent with the view that procedural memories are acquired gradually. This measure of strength is often called a rule's "utility" since it is a measure of the value of the rule; when a situation arises where multiple rules apply, the rule with the highest utility is chosen; further, rewarding consequences following the use of a rule serve to increase that rule's utility. When a new rule is first created, its utility is zero and thus it is extremely unlikely that it will "fire". However, each time this rule is recreated its utility is increased. Anderson gives an excellent example using children's learning of subtraction rules. In the interest of time I won't go into it here, other than to say that it accounts for the most common bug in learning to subtract two multi-digit numbers: instead of always subtracting the bottom digit from the top digit, the buggy rule children often use is to subtract the smaller digit from the bigger one in each column, regardless of which is on top. This rule is so persistent because about half of the time it produces the correct outcome and thus the same reward as the more limiting bottom-from-top rule. ACT-R is used to model the acquisition of the correct rule, and I found it very compelling.
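
Here is a rough Python sketch of why the buggy subtraction rule hangs around. It is not Anderson's model: the utility update is a simple delta rule in the spirit of the utility learning described above, and the learning rate `alpha`, the 1/0 reward scheme, and the randomly generated practice set are all assumptions of mine.

```python
import random


def buggy_subtract(top, bottom):
    """Smaller-from-larger bug: in each column subtract the smaller digit
    from the larger one, so no borrowing ever happens."""
    result, place = 0, 1
    while top > 0 or bottom > 0:
        result += abs(top % 10 - bottom % 10) * place
        top, bottom, place = top // 10, bottom // 10, place * 10
    return result


def correct_subtract(top, bottom):
    """Stand-in for the correct bottom-from-top procedure with borrowing."""
    return top - bottom


def update_utility(utility, reward, alpha=0.05):
    """Move the utility a fraction alpha (assumed value) toward the reward."""
    return utility + alpha * (reward - utility)


def learn_utility(rule, problems, alpha=0.05):
    """Start at zero utility; reward 1 for a right answer, 0 for a wrong one."""
    utility = 0.0
    for top, bottom in problems:
        reward = 1.0 if rule(top, bottom) == top - bottom else 0.0
        utility = update_utility(utility, reward, alpha)
    return utility


def make_problems(n=500, seed=0):
    """Hypothetical practice set: two-digit problems with top >= bottom."""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        problems.append((max(a, b), min(a, b)))
    return problems


if __name__ == "__main__":
    problems = make_problems()
    agree = sum(buggy_subtract(t, b) == t - b for t, b in problems)
    print(f"buggy rule right on {agree} of {len(problems)} problems")
    print("utility of buggy rule:  ", round(learn_utility(buggy_subtract, problems), 3))
    print("utility of correct rule:", round(learn_utility(correct_subtract, problems), 3))
```

For example, `buggy_subtract(58, 13)` gives the right answer (45) because no column needs a borrow, while `buggy_subtract(52, 17)` also gives 45 when the right answer is 35. Because the buggy rule is rewarded on every no-borrow problem, its utility climbs well above zero, but it stays below that of the rule that is always right.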

This general learning process is seen clearly in skill learning: as one becomes more skillful (say, in riding a bike), there will be a decrease in the involvement of the more "cognitive" cortical regions and an increase in the involvement of the more "stimulus-response" posterior regions. Here's Anderson's summary:
"Learning can be conceptualized as a process of moving from thoughtful reflection (hippocampus, prefrontal cortex) to automatic reaction (basal ganglia). The module responsible for learning of this kind is the procedural module (or production system). I offer the procedural module as an explanation for behavior that embraces both Hull’s reactions and Tolman’s reflections and provides a mechanism for the postulated learning link between them. Through production compilation, thoughtful behaviors become automatized; through utility learning, behavior is modified to become adaptive. When combined with the declarative memory module discussed in chapter 3, the production system provides a mechanism by which knowledge is used to make behavior more flexible and efficient."
Thus, an important part of cognition is the accumulation of production rules in long-term memory. These rules are activated by the contents of working memory and can be composed into more complex production-rule chains when a particular problem is solved; the result can be cached and, if used above some frequency threshold, will become a production rule in its own right.
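
As a very loose illustration of that last idea, here is a toy Python sketch in which an answer is fetched from declarative memory until it has been used often enough to be cached as its own rule. The class, its method names, and the hard `threshold` are my own simplifications; in the account above a compiled rule gains strength gradually through its utility rather than flipping on at a fixed count.

```python
class ProductionSystem:
    """Toy model: answers come from slow declarative retrieval until a
    specific rule has been compiled for that situation."""

    def __init__(self, facts, threshold=3):
        self.facts = dict(facts)    # declarative memory
        self.threshold = threshold  # hypothetical frequency threshold
        self.use_counts = {}        # how often each retrieval has been used
        self.rules = {}             # compiled situation -> action rules

    def respond(self, situation):
        """Return (action, route), where route is 'rule' or 'retrieval'."""
        if situation in self.rules:
            return self.rules[situation], "rule"
        action = self.facts[situation]  # deliberate retrieval step
        self.use_counts[situation] = self.use_counts.get(situation, 0) + 1
        if self.use_counts[situation] >= self.threshold:
            self.rules[situation] = action  # retrieval step is compiled away
        return action, "retrieval"


if __name__ == "__main__":
    system = ProductionSystem({"3+4": "7"})
    for trial in range(1, 6):
        action, route = system.respond("3+4")
        print(f"trial {trial}: answered {action} via {route}")
```

With the default threshold the first three answers go through retrieval and every later one comes straight from the compiled rule, which is the "little piece of deliberation dropped out" in miniature.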

Uniquely Human Learning

Anderson points out that his (and my) discussion up to this point has actually concerned primate learning; nothing so far has been unique to humans. In chapter 5, he discusses learning from verbal directions and worked-out examples. He also recognizes the role of individual discovery in the learning process, but criticizes the recent trend towards pure "discovery" learning in education:
" ...a third way to learn is by discovery and invention. Cultural artifacts such as algebra came into being because of such a process. Some constructivist mathematics educators advocate having children learn in the same manner (e.g., Cobb et al., 1992). In the extreme, it is a very inefficient way to learn algebra or any other cultural artifact... However, when one looks in detail at what happens in the process of learning from instruction and example, one frequently finds many minidiscoveries being made as students try to make sense of the instruction they are receiving and their experience in applying that instruction. Learning by discovery probably plays a more important role as a normal part of learning through social transmission (i.e., directions and examples) than it does as a solo means of learning."
Anderson goes on to discuss how human cognition can support a uniquely human skill: learning algebra from verbal directions and examples. He uses ACT-R to model algebra learning and to help point the way toward what is special about human cognition. He ends up describing three such features in detail: the potential for abstract control of cognition, the capacity for advanced pattern matching, and the metacognitive ability to reason about cognitive states.

The first is likely mediated by the anterior cingulate cortex (ACC), a structure involved in controlling behavior, which is especially active when people have to direct their behavior in ways that violate typical response tendencies. Interestingly, the ACC has undergone recent evolutionary changes found only in humans. Recall that this structure was the one associated with the goal buffer, which holds control elements. The idea is that the ACC allows us to maintain abstract control states which let us choose different actions when all the other buffers are in identical states. The second feature requires dynamic pattern matching, which allows for processing complex relational structures, as seen in analogical processing. It all gets pretty detailed and I won't go into it here. Instead I'll just quote the end of the chapter:
Dynamic pattern matching and recursive representations are connected. Dynamic pattern matching is only useful in a system that has powerful, interlinked representations. Processing recursive representations can be much easier with dynamic pattern matching. The human brain is expanded over that of other primates, and it is not just a matter of more brain. There are new prefrontal and parietal regions, and in the case of some regions such as the ACC, there are new kinds of cells. While brain lateralization is also a common feature of many species, its connection with language seems unique (Halpern et al., 2005), and Marcus’s second feature is strongly motivated by considerations of language processing. So, it seems pretty clear that there have been some changes to the structure of the human brain that enable the unique functions of human cognition.

The Question of Consciousness

It isn't really fair to talk about this here, because I have only given you a flavor of the main arguments presented in the book, and it is upon those arguments that his discussion of consciousness is built. It requires an intimate understanding of ACT-R, and I don't think I've done a good enough job conveying that understanding in the present post. Still, I'll leave you with his thoughts on the subject, which he gives only grudgingly (preferring to "leave the philosopher's domain to the philosopher"):

In 2003, we noted that in ACT-R consciousness has an obvious mapping to the buffers that are associated with the modules. The contents of consciousness are the contents of these buffers, and conscious activity corresponds to the manipulation of the contents of these buffers by production rules. The information in the buffers is the information that is made available for general processing and is stored in declarative memory. ACT-R models can generate introspective reports by describing the contents of these buffers. In 2003 we did not think this was much of an answer and gave ACT-R low marks on this dimension. I have subsequently come to the conclusion that this is indeed what consciousness is and that running ACT-R models are conscious. They may not be conscious in the same sense as humans, but this is probably because ACT-R gives a rather incomplete picture of the buffers that are available in the human system.

He immediately notes that this is "not a particularly novel interpretation of consciousness" and that it is essentially "the ACT-R realization of the global workspace theory of consciousness (Baars, 1988; Dehaene & Naccache, 2001)."
Dehaene and Changeux (2004) summarize the view as follows:
We postulate the existence of a distinct set of cortical “workspace” neurons characterized by their ability to send and receive projections to many distant areas through long-range excitatory axons. These neurons therefore no longer obey a principle of local, encapsulated connectivity, but rather break the modularity of the cortex by allowing many different processors to exchange information in a global and flexible manner. Information, which is encoded in workspace neurons, can be quickly made available to many brain systems, in particular the motor and speech-production processors for overt behavioral report. We hypothesize that the entry of inputs into this global workspace constitutes the neural basis of access to consciousness. (p. 1147)
He is totally on-board with rejecting all "Cartesian theater" interpretations--the idea that there has to be something more to consciousness, some inner homunculus that watches our thoughts flit by--and he seems to agree pretty completely with Dennett (1993). He finishes with the following:
 If we resist the temptation to believe in a hard problem of consciousness, we can appreciate how consciousness is the solution to the fundamental problem of achieving the mind in the brain. As noted in chapter 2, efficiency considerations drive the brain to try to achieve as much of its computation as possible locally in nearly encapsulated modules. However, the functionality of the mind demands communication among these modules, and to do this, some information must be made globally available. The purpose of the buffers in ACT-R is to create this global access. The contents of these buffers will create an information trail that can be reported and reflected upon. As in the last example in chapter 5, adaptive cognition sometimes requires reflection on this information trail. Thus, consciousness is the manifestation of the solution to the need for global coordination among modules. It is a trademark consequence of the architecture in figure 2.2. That being said, chapters 1–5 develop this architecture with only oblique references to consciousness. This is because the information processing associated with consciousness is already described by other terms of the theory. It still is not clear to me how invoking the concept of consciousness adds to the understanding of the human mind, but taking a coherent reading of the term consciousness, I am willing to declare ACT-R conscious.