Recognition Memory Paper


Word Frequency and List Length Effects on Recognition Memory


                                                                  Mark Mobley



This within-subjects experiment explored participants’ ability to recognize low frequency and high frequency words presented in a short list of 20 words and a long list of 80 words.  After studying each list, participants took a recognition test in which they distinguished words they had seen in the first phase from new words used as lures.  Participants recognized low frequency words significantly better than high frequency words, and they performed significantly better on the short list than on the long list.  The experiment thus found evidence both for a list-length effect and for low frequency words being easier to recognize.

1    Introduction

People have always been interested in how much a person can remember and in the limits of human memory.  So far, psychology has not found a definite limit on the amount of information a person can retain and process.  There is a debate over whether there exists a “list-length effect,” in which words from shorter lists are easier to recognize and recall, or a “null list-length effect,” in which people recognize words equally well regardless of the size of the list.  Primacy and recency effects describe people’s tendency to remember the first and last few items in a list better than the middle items.  One researcher who investigated list length was surprised that his data showed such strong recency effects.

Geoff Ward conducted this experiment, in which he presented subjects with a list of 10, 20 or 30 words.  Each word appeared for one second, and the subjects repeated and rehearsed words aloud between presentations.  Ward found a strong list-length effect and found that the proportion correct on the last 10 words was similar across the three list lengths.  He found strong recency effects throughout his data and viewed the list-length effect as a result of “recency and selective rehearsal” (Ward, 2002).  Unrehearsed words ended up being “further from the end of the list” the longer the list got.  Recency effects favored recall of the most recent items, and thus the words that were not recalled were more likely to come from earlier in the list.  Ward noted a limitation of analyzing verbal rehearsal: subjects could have chosen not to rehearse, or could have selectively rehearsed words in each list.  The conclusion of his experiment was that the “list length effect in free recall is interpreted as a result of selective rehearsal within a recency based account of free recall” (Ward, 2002).  Unrehearsed words are “less recent with longer lists,” and this is the cause of the list-length effect.

Some of the most important items stored in human memory are words, since they form the basis of communication and allow our society to function.  Native speakers do not consciously retrieve words and combine them according to grammar rules and structure.  Society uses some words more often than others, so people become very accustomed to speaking and reading these words.  This familiarity probably allows native speakers to use common words without much attention.  In this experiment, we required people to pay attention to both common and uncommon words.  Words we use often are called high frequency words because of the high frequency with which they are used.  Other words and phrases are awkward or unfamiliar, and people have to spend more time thinking about them.  Since it is impossible to use all the words in our language all the time, there exist words we rarely use or do not even recognize as words.  These are called low frequency words.  Many experiments have explored people’s ability to remember words of different frequencies.

Rachel Diana investigated memory for words of different frequencies.  She concluded that low frequency words “produce more hits and fewer false alarms than high frequency words in recognition tasks” (Diana, 2006).  She also stated that, in general, not all tasks show an “advantage” for low frequency words.  She ran her own experiment in which she presented subjects with picture and word stimuli together.  Subjects read the word in each stimulus as quickly as possible and were told to remember what they read and saw during the first phase.  The second phase contained 40 word-picture combinations from the first phase along with 40 lures.  She measured the proportion of hits for high and low frequency words and concluded that low frequency words were easier to recognize than high frequency words, even though low frequency words needed more attention (Diana, 2006).

We conducted a within-subjects experiment to explore recognition of words when word frequency and list length varied.  We measured recognition as whether people correctly identified each test word as old or new.  Old words had been shown in the first phase of the experiment and consisted of the first 20 words presented (10 high frequency and 10 low frequency).  New words had not been shown in the first phase and consisted of 20 words (10 high frequency and 10 low frequency).

There were two research questions that prompted us to run this experiment.  The first was whether the frequency of a word, that is, how often the word appears in modern-day use, affects recognition.  The second was whether the length of the study list affects recognition.  We expected the experiment to follow past literature and show that low frequency words have higher discriminability, which is how well people distinguish old words from new words.  Diana’s experiment supported our first hypothesis: her experiment was very similar to ours, and she also found that low frequency words were recognized more often.  Our second hypothesis was that recognition would show a list-length effect; specifically, we expected worse recognition for the long word list than for the short word list.  Ward’s finding of recency effects in free recall supports the existence of the list-length effect.  As a result, we predicted that our participants would have lower accuracy for the long word list, since we were indirectly testing primacy effects for recognition.




2    Experiment

2.1    Participants

Forty-eight participants took part in the experiment.  They signed up through the Experimetrix Subject Pool and were compensated with extra credit.


2.2    Stimuli

Figure 1: A Sample Recognition Memory Screen in the Second Phase


Participants sat in front of a computer running the MATLAB recognition memory program shown in Figure 1.  After being briefed about the experiment, the subjects began with either the “short” or the “long” word list, randomly assigned according to lab time.  The short word list consisted of 20 words, 10 of which were low frequency and 10 of which were high frequency.  The long word list consisted of 80 words, 40 high frequency and 40 low frequency.  The subject’s task was to rate the pleasantness of each word; the ratings themselves were not important to the experiment, but the task facilitated the subject’s memory of the target stimuli.  To counterbalance the difference in time spent on each word list, the program imposed a delay for the short list only; the delay occurred after the 20 words were presented and was represented by a loading bar.  As a result, the time spent on each word list was roughly the same.

After rating all the words, the subjects moved on to the testing phase, which tested recognition.  The recognition test contained 40 words: 20 old words from phase one and 20 new words that did not appear in phase one.  The 20 old words consisted of the first 10 high frequency and first 10 low frequency words the participants saw.  The 20 new words consisted of 10 high frequency and 10 low frequency words.  Each word was presented on screen, and the subject clicked a box to indicate whether the word was “new” or “old.”  The program gave no feedback about whether each answer was correct.  After the subjects completed the test phase for one word list, there was a short break, and then the subjects repeated the same process for the other word list.
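The composition of the recognition test described above can be sketched in Python as follows.  This is only an illustration, not the MATLAB program used in the experiment: the function name build_test_list, the (word, frequency) tuple format, and the shuffling of test order are all assumptions, since the paper does not state how test items were ordered.

```python
import random

def build_test_list(study_list, lures, n_per_frequency=10, seed=None):
    """Assemble a recognition test (hypothetical helper, not the original program).

    study_list: (word, frequency) pairs in presentation order,
                where frequency is "high" or "low".
    lures:      (word, frequency) pairs that were never studied.
    Returns a list of (word, frequency, status) with status "old" or "new".
    """
    test_items = []
    for freq in ("high", "low"):
        # Old items: the first n studied words of this frequency class.
        studied = [word for word, f in study_list if f == freq]
        test_items += [(word, freq, "old") for word in studied[:n_per_frequency]]
        # New items: n unstudied lures of the same frequency class.
        unstudied = [word for word, f in lures if f == freq]
        test_items += [(word, freq, "new") for word in unstudied[:n_per_frequency]]
    # Assumption: test order is randomized; the paper does not say.
    random.Random(seed).shuffle(test_items)
    return test_items
```

With the default of 10 items per frequency class, the sketch yields the 40-item test described in the text (20 old, 20 new) for both the 20-word and the 80-word study lists.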


3    Results

The data were analyzed using signal detection theory.  Hit rates and false alarm rates were calculated by adding one to the numerator and two to the denominator of each proportion.  This adjustment avoided hit rates or false alarm rates of exactly zero or one.  We tested for significant effects on two measures: discriminability and bias.
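As a minimal sketch of this calculation, the Python code below applies the stated correction (one added to the numerator, two to the denominator) and then computes two standard signal detection measures.  The paper does not give the exact formulas it used, so the use of d' for discriminability and the criterion c for bias are assumptions, as are the function names.

```python
from statistics import NormalDist

def corrected_rate(count, n_trials):
    """Proportion with 1 added to the numerator and 2 to the denominator,
    so the result can never be exactly 0 or 1."""
    return (count + 1) / (n_trials + 2)

def sdt_measures(hits, n_old, false_alarms, n_new):
    """Return (hit rate, false alarm rate, d', c) for one condition.

    d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2 are the textbook
    definitions; whether the original analysis used exactly these
    measures is an assumption.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_old)
    fa_rate = corrected_rate(false_alarms, n_new)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return hit_rate, fa_rate, d_prime, criterion
```

For example, a participant with 10 hits out of 10 old words and 0 false alarms out of 10 new words receives a corrected hit rate of 11/12 and a corrected false alarm rate of 1/12 rather than 1 and 0, which keeps the z-transform finite.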


There was a main effect of study-list length, F(1,47)=7.161, p=.01, r=.36, with the shorter word list being more discriminable than the longer list.

Figure 2: Recognition Ability

There was also a main effect of word frequency, F(1,47)=18.841, p<.001, r=.535.  Low frequency words were much more discriminable than high frequency words.

There was no interaction between word frequency and list length, F(1,47)=1.037, p=.314.



Figure 3: Bias in Recognition Experiment


Participants did not show significant bias based on the length of the word lists, F(1,47)=2.81, p=.1.  Participants also did not show significant bias based on the frequency of the word shown, F(1,47)=.595, p=.44.  However, the interaction between list length and word frequency was significant, F(1,47)=4.761, p=.034, r=.3.
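The paper does not state how the effect sizes r were obtained.  One common conversion for an F test with one numerator degree of freedom is r = sqrt(F / (F + df_error)), and the short Python sketch below shows that this conversion reproduces the reported values; that the original analysis used this formula is an assumption, and the function name is illustrative.

```python
import math

def effect_size_r(f_value, df_error):
    """Convert F(1, df_error) to the effect size r.

    Uses r = sqrt(F / (F + df_error)), which applies only when the
    numerator degrees of freedom equal 1.
    """
    return math.sqrt(f_value / (f_value + df_error))

# F values reported in the Results section, all with df_error = 47.
reported = {
    "list length (discriminability)": 7.161,
    "word frequency (discriminability)": 18.841,
    "length x frequency (bias)": 4.761,
}
for label, f_value in reported.items():
    print(label, round(effect_size_r(f_value, 47), 3))
```

Rounded to the precision used in the text, the three results match the reported r=.36, r=.535, and r=.3.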



Figure 4: False Alarms and Hit Rates for the Long List

4    Discussion

Our hypothesis that low frequency words would have higher discriminability was supported in this experiment.  The effect of word frequency was highly significant, with much higher discriminability for low frequency words.  In recall tasks conducted by other researchers, high frequency words are more often remembered.  This was not the case for this recognition experiment.  People recognized low frequency words with high discriminability, showing that they more often correctly indicated whether a word was old or new.  Figure 2 shows that recognition was higher for low frequency words than for high frequency words.  This result is consistent with Diana’s conclusion that low frequency words are easier to recognize and have higher discriminability.  One explanation is that people have to spend extra effort to encode low frequency words, and this extra effort translates into better recognition of words they do not use very often.  As Diana notes, low frequency words require “more processing resources” to encode, so when encoding time is limited, people do not remember low frequency words as well (Diana, 2006).  In our experiment, we did not detect a drop in performance for low frequency words, most likely because we did not constrain encoding time.

Figure 4 illustrates the hit rates and false alarm rates for identifying words as old on the long list of 80 items.  Participants more often made the mistake of identifying new high frequency words as having appeared in the first phase.  Many people most likely favored the “old” response in the testing phase for the long list, because people with high false alarm rates also had high hit rates.  This result is mostly consistent with Diana’s finding that low frequency words produce a high hit rate and a low false alarm rate.  In this experiment, the false alarm rate was higher than in Diana’s experiment.  Again, this is most likely due to people selecting “old” as their primary answer.  People also did not rate most of the words as “new,” or else there would be data points closer to the origin.

Our second hypothesis, that the longer word list would show a list-length effect and have lower recognition, was also supported.  The effect of list length was significant, and people performed worse at recognizing words from the long list than from the short list.  Item noise theories say shorter lists lead to better recognition because there are fewer words interfering with recognition and recall.  Similar to Ward’s findings, people had more accurate recognition with the short word list of 20 items, and discriminability was lower for the long word list (Figure 2).  Regardless of word frequency, people had worse recognition for the long word list.  The list-length effect may therefore be independent of the type of memory test: our recognition test found support for the effect, and Ward’s free-recall test also found it.  While our experiment did not require rehearsing each word aloud to aid encoding, having the participants rate pleasantness produced results consistent with Ward’s.  For our list lengths as well, the “proportion of words recalled decreased with increasing list lengths” (Ward, 2002).  The data do not support the null list-length effect: words were not recognized equally well, as there was a significant effect of list length.  List size therefore affected people’s recognition, and discriminability differed between the list lengths.

Participants’ decisions in this experiment were not systematically influenced by positive or negative bias.  When people made mistakes, the mistakes were unbiased; they did not disproportionately miss words they should have recognized or incorrectly identify new words as having been on the list (Figure 3).  High frequency words had a small positive bias for the long list and a small negative bias for the short list.  Low frequency words had a small negative bias for the long list and a small positive bias for the short list.  The significant interaction between length and frequency for bias may be a result of outliers in the data.  Some data points were extreme, such as the value near .6 in Figure 4.  This false alarm rate was very high, meaning that this person judged a large portion of the new words to be old.  This person’s hit rate was also very low, which suggests that the person responded “new” to most of the old words.  This behavior was not widespread and was likely due to progressive effects such as boredom or low internal motivation.  Overall, there was very little bias in the experiment, as the bias values were very close to 0.  Since the data supported a list-length effect, subjects’ errors were most likely a result of list length or word frequency rather than bias.

Our experiment did not counterbalance for the primacy effect; the first 20 words in either word list were used in the testing phase as the “old” words.  While the instructions did not inform participants that these specific words would be used in the testing phase, their position at the beginning of the list most likely increased recognition.  Ward found that with long lists of words there was a very strong recency effect: people were not able to recall the beginning and middle words easily, but were able to recall the last few words (Ward, 2002).  Our experiment relied on primacy effects to aid recognition, since we tested participants on the first 20 words they saw in each list instead of on 20 words randomly chosen from the list.  In the short list, where people were presented with only 20 words, the delay before the testing phase did not introduce any new words, and subjects potentially benefited from both primacy and recency effects for these words.  If the recency effect is as strong as Ward concluded, then testing the last 20 words of our long word list would likely show higher discriminability, possibly closer to the short list’s discriminability.  This could potentially eliminate some evidence for the list-length effect, especially if we told participants they would be tested on the last 20 words.  In a further experiment, we could examine the recency effect by using the last 20 words of each list as the old words in the testing phase.  We would then hypothesize increased discriminability overall for the short and long lists, with low frequency words still more discriminable than high frequency words.


Diana, R. A. (2006). The low-frequency encoding disadvantage: Word frequency affects processing demands. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 805-815.


Ward, G. (2002). A recency-based account of the list length effect in free recall. Memory & Cognition, 30(6), 885-892.


Word Frequency and List Length Effects on Recognition Memory is copyrighted 2008 by Mark Mobley.  It is exclusive to and may not be transmitted or copied to other sites/domains.