In: B.G. Bara, L. Barsalou, & M. Bucciarelli (Eds.) 2005. Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society (p. 2476). Mahwah, NJ: Erlbaum.


Cognitive Constraints on the Organization of Language and Music


Gertraud Fenk-Oczlon

Department of Linguistics and Computational Linguistics, University of Klagenfurt

Universitaetsstrasse 65-67, 9020 Klagenfurt, Austria


August Fenk

Department of Media and Communication Studies and Department of Psychology, University of Klagenfurt

Universitaetsstrasse 65-67, 9020 Klagenfurt, Austria



Some Parallels between Language and Music

Both language and music are organized temporally; both show rhythm and consist of syntactically structured sequences. In both, the sounds reach us as a sequence of pulses: in language as syllables, in music as notes. Are there more detailed parallels between language and music that are due to the functioning and the constraints of our cognitive system?

Several crosslinguistic correlational studies by the authors (e.g., Fenk-Oczlon & Fenk, 1999) showed a mean of about seven syllables per simple declarative sentence, ranging from 5 in Dutch to 10 in Japanese, and a mean of about 4 words, ranging from 2.5 in Arabic to 5.4 in Chinese. These findings, obtained from a sample of 34 languages, correspond to Miller's magical number 7±2 on the syllable level (the level that is particularly relevant to the rhythmic structure of utterances) and to Cowan's (2001) magical number 4±1 on the word level. Our studies also showed balancing effects, such as a significant negative correlation between the number of syllables per sentence and the number of phonemes per syllable. These correlations point to time-related constraints in language processing. Such durational spans (of about 1.8 s) and their relevance for subjective rhythmization were already studied in the classical work of Fraisse. All this corresponds to descriptions by musicologists: 6 to 11 notes within a musical phrase (Thom, Spevak, & Höthker, 2002) and/or within Fraisse's psychological present, and a tempo range of 30 to 300 beats per minute (Parncutt & Drake, 2001), which again amounts to a maximum of 10 pulses within a span of 2 seconds (300 pulses per min = 5 pulses per s = 200 ms per pulse).
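The tempo arithmetic and the kind of correlation coefficient mentioned above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`ms_per_pulse`, `pulses_in_window`, `pearson_r`) are hypothetical, pulses are assumed to be evenly spaced, and the correlation example uses obviously illustrative numbers, not data from Fenk-Oczlon & Fenk (1999).

```python
import math
from typing import Sequence


def ms_per_pulse(bpm: float) -> float:
    """Inter-onset interval in milliseconds (evenly spaced pulses assumed)."""
    return 60_000.0 / bpm


def pulses_in_window(bpm: float, window_s: float = 2.0) -> float:
    """Number of pulses that fall into a window of `window_s` seconds."""
    return bpm / 60.0 * window_s


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Upper and lower ends of the tempo range cited from Parncutt & Drake (2001).
    print(ms_per_pulse(300))      # 200.0
    print(pulses_in_window(300))  # 10.0
    print(pulses_in_window(30))   # 1.0
    # Illustrative values only (not study data): a perfectly inverse relation,
    # as when more syllables per sentence go with fewer phonemes per syllable.
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

A negative coefficient of this kind is what the balancing effect describes; whether it is significant in real data depends on the sample, which the sketch does not address.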

Rhythm-based Typological Co-variations?

In an experimental study by Sadakata, Ohgushi, and Desain (2004), Dutch and Japanese percussion players performed rhythm patterns consisting of 2 notes; for the extreme ratios (1:4, 1:5, 5:1, 4:1), the Japanese musicians performed the patterns with a smaller duration ratio than the one given in the score. In our context this could mean that they "distorted" the ratio in a way that corresponds to the restricted variability of syllable duration in Japanese. This interpretation is supported by the results of Patel & Daniele (2003): their comparison between the stress-timed language English and the syllable-timed language French shows that the music of each language community reflects the rhythm of its speech. They state, moreover, that "vowels form the core of syllables, which can in turn be compared to musical tones" (p. B37). This is in line with our assumption that many parallels between language and music emerge from singing, which unites both of these achievements (see below), and with the tentative idea that the number of notes matches the size of the vowel inventory.
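The two rhythm measures involved can be sketched as follows. The long-to-short duration ratio describes the two-note patterns of Sadakata et al. (2004): a performed ratio below the scored one (e.g., below 4 for a 1:4 or 4:1 pattern) is the "compression" described above. The normalized Pairwise Variability Index (nPVI) is, to our understanding, the measure Patel & Daniele (2003) applied to vowel durations in speech and to note durations in music; higher values indicate greater durational contrast between successive events. The function names are hypothetical, and the durations in the example are invented for illustration.

```python
from typing import Sequence


def long_short_ratio(first_ms: float, second_ms: float) -> float:
    """Duration ratio of a two-note pattern, expressed as long/short (>= 1)."""
    return max(first_ms, second_ms) / min(first_ms, second_ms)


def is_compressed(scored_ratio: float, first_ms: float, second_ms: float) -> bool:
    """True if the performed ratio is closer to 1:1 than the scored ratio."""
    return long_short_ratio(first_ms, second_ms) < scored_ratio


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index:
    100 / (m - 1) * sum over k of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Invented durations in ms, for illustration only.
    print(long_short_ratio(160, 640))    # 4.0
    print(is_compressed(4.0, 200, 600))  # True (performed 3.0 < scored 4.0)
    print(npvi([200, 200, 200, 200]))    # 0.0 (perfectly even rhythm)
    print(npvi([100, 300, 100, 300]))    # 100.0 (strong alternation)
```

Expressing the ratio as long/short makes 1:4 and 4:1 patterns directly comparable, which matches the symmetric set of extreme ratios listed above.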


The ideal size of the "packages" that our cognitive apparatus can handle simultaneously seems to lie in the region of 5 rather complex or 10 very simple syllables within a clause or 2 seconds, and in the region of 6 to 11 notes within a musical phrase or 2 seconds. The original and still most common form of music is singing, and several authors claim that the communication between mother and infant is "protomusical". Talking as well as singing comes about in intonation units, which can be viewed as a special case of action units (Fenk-Oczlon & Fenk, 2002). One has to assume that any determinant of intonation units will be reflected in language as well as in music. Relevant determinants are the "clausal structure" of the breath cycle and the (coordinated) "clausal structure" of the cognitive processes that program the shape of sound.


Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87-114.

Fenk-Oczlon, G., & Fenk, A. (1999). Cognition, quantitative linguistics, and systemic typology. Linguistic Typology, 3(2), 151-177.

Fenk-Oczlon, G., & Fenk, A. (2002). The clausal structure of linguistic and pre-linguistic behavior. In T. Givón & B. F. Malle (Eds.), The Evolution of Language out of Pre-Language. Amsterdam: John Benjamins.

Parncutt, R., & Drake, C. (2001). Psychology: Rhythm. In S. Sadie (Ed.), The New Grove Dictionary of Music and Musicians (Vol. 20, pp. 535-538, 542-555). London.

Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition, 87, B35-B45.

Sadakata, M., Ohgushi, K., & Desain, P. (2004). A cross-cultural comparison study of the production of simple rhythm patterns. Psychology of Music, 32, 389-403.

Thom, B., Spevak, C., & Höthker, K. (2002). Melodic segmentation: Evaluating the performance of algorithms and musical experts. Proceedings of the International Computer Music Conference (ICMC), Göteborg.