by Donald E. Knuth
Important aspects of popular songs are best understood in terms of modern complexity theory.
tags: Knuth, complexity, computer science, humor, songs, ACM
Every day brings new evidence that the concepts of computer science are applicable to areas of life which have little or nothing to do with computers.
Originally published in Communications of the ACM, April 1984, Volume 27, Number 4.
The purpose of this survey paper is to demonstrate that important aspects of popular songs are best understood in terms of modern complexity theory.
It is known [3] that almost all songs of length n require a text of length Θ(n). But this puts a considerable space requirement on one's memory if many songs are to be learned; hence, our ancient ancestors invented the concept of a refrain [14]. When the song has a refrain, its space complexity can be reduced to cn, where c < 1, as shown by the following lemma. [].The research reported here was supported in part by the National Institute of Wealth under grant $262,144.
LEMMA 1. Let S be a song containing m verses of length V and a refrain of length R, where the refrain is to be sung first, last, and between adjacent verses. Then the space complexity of S is
(V/(V+R))n + O(1)for fixed V and R as m → ∞.
Proof. The length of S when sung is
n = R + (V + R)m (1)
while its space complexity is
c = R + Vm (2)
By the Distributive Law and the Commutative Law [4]:
c = n − (V + R)m + mV
= n − Vm − Rm + Vm (3)
= n − Rm
The lemma follows. □
(It is possible to generalise this lemma to the case of verses of differing lengths V₁, V₂, …, Vₘ, provided that the sequence (Vₖ) satisfies a certain smoothness condition. Details will appear in a future paper.)
A significant improvement on Lemma 1 was discovered in medieval European Jewish communities, where an anonymous composer was able to reduce the complexity to O(√n). His song "Ehad Mi Yode'a" or "Who Knows One?" is still traditionally sung near the end of the Passover ritual, reportedly in order to keep the children awake [6]. It consists of a refrain and 13 verses v₁, …, v₁₃, where vₖ is followed by vₖ₋₁ … v₂v₁ before the refrain is repeated; hence m verses of text lead to ½m² + O(m) verses of singing. A similar song called "Green Grow the Rushes O" or "The Dilly Song" is often sung in western Britain at Easter time [1], with only twelve verses, and Breton, Flemish, German, Greek, Medieval Latin, Moldavian, and Scottish versions are known.
The coefficient of √n was further improved by a Scottish farmer named O. MacDonald, whose construction appears in Lemma 2. [].MacDonald's priority has been disputed by some scholars; Peter Kennedy ([8], p. 676) claims that "I Bought Myself a Cock" and similar farmyard songs are actually much older.
LEMMA 2. Given positive integers
aandλ, there exists a song whose complexity is(20 + λ + a)√n / (30 + 2λ) + O(1).
Proof. Consider the following schema [9]:
V₀ = "Old MacDonald had a farm," R₁
R₁ = "Ee-igh, ee-igh, oh!"
R₂(x) = V₀ "And on this farm he had some" x "," R₁ "With a"
U₁(x, x') = x' "here and a" x' "there;"
U₂(x, y) = "here a" y ","
U₃(x, x') = U₁(x,x') U₂("h",x) U₂("t",x') U₂("everyw",x',x)
Vₖ = U₃(Wₖ, Wₖ') Vₖ₋₁ for k ≥ 1
where the animal sounds are:
W₁ = "chick" W₁' = "chicks"
W₂ = "quack" W₂' = "ducks"
W₃ = "gobble" W₃' = "turkeys"
W₄ = "oink" W₄' = "pigs"
W₅ = "moo" W₅' = "cows"
W₆ = "hee" W₆' = "donkeys" (W₆ = "haw" before the refrain)
The song of order m is defined by Jₘ = R₂(Wₘ') Vₘ Jₘ₋₁ for m ≥ 1, where J₀ = ε.
The length of J(m) is
n = 30m² + 153m + 4(λ₁ + (m−1)λ₂ + … + λₘ) + (a₁ + … + aₘ) (9)
while the length of the corresponding schema is
c = 20m + 211 + (λ₁ + … + λₘ) + (a₁ + … + aₘ) (10)
Here λₖ = |Wₖ| + |Wₖ'| and aₖ = |Wₖ''|, where |x| denotes the length of string x. The result follows at once if we assume that λₖ = λ and aₖ = a for all large k. □
Note that the coefficient (20 + λ + a) / (√(30 + 2λ)) assumes its minimum value at
k = max(1, a − 10) (11)
when a is fixed. Therefore if MacDonald's farm animals ultimately have long names, they should make slightly shorter noises.
Similar results were achieved by a French Canadian ornithologist, who named his song schema "Alouette" [2, 15]; and at about the same time by a Tyrolean butcher whose schema [5] is popularly called "Ist das nicht ein Schnitzelbank?" Several other cumulative songs have been collected by Peter Kennedy [8], including "The Mallard" with 17 verses and "The Barley Mow" with 18. More recent compositions, like "There's a Hole in the Bottom of the Sea" and "I Know an Old Lady Who Swallowed a Fly" unfortunately have comparatively large coefficients.
A fundamental improvement was claimed in England in 1824, when the true love of U. Jack gave to him a total of 12 ladies dancing, 22 lords a-leaping, 30 drummers drumming, 36 pipers piping, 40 maids a-milking, 42 swans a-swimming, 42 geese a-laying, 40 golden rings, 36 colly birds, 30 French hens, 22 turtle doves, and 12 partridges in pear trees during the twelve days of Christmas [11]. This amounted to ⅙m³ + ½m² + ⅓m gifts in m days, so the complexity appeared to be O(∛n); however, it was soon pointed out [10] that his computation was based on n gifts rather than n units of singing. A complexity of order O(√(n / log n)) was finally established (see [7]).
We have seen that the partridge in the pear tree gave an improvement of only 1/√(log n); but the importance of this discovery should not be underestimated, since it showed that the n^0.5 barrier could be broken. The next big breakthrough was in fact obtained by generalising the partridge schema in a remarkably simple way. It was J. W. Blatz of Milwaukee, Wisconsin who first discovered a class of songs known as "m Bottles of Beer on the Wall"; her elegant construction appears in the following proof of the first major result in the theory. [].Kennedy ([8], p. 631) claims priority for the English, in this case because of the song "I'll drink m if you'll drink m+1." However, the English start at m = 1 and get no higher than m = 9, possibly because they actually drink the beer instead of allowing the bottles to fall.
THEOREM 1. There exist songs of complexity
O(log n).
Proof. Consider the schema:
Vₖ = Tₖ "bottles of beer on the wall,"
Tₖ "bottles of beer."
"If one of those bottles should happen to fall,"
Tₖ₋₁ "bottles of beer on the wall."
where Tₖ is a translation of the integer k into English. It requires only O(m) space to define Tₖ for all k < 10ᵐ, since:
T(q·10ᵐ + r) = Tq "times 10 to the" Tₘ "plus" Tᵣ (14)
for 1 ≤ q ≤ 9 and 0 ≤ r < 10ᵐ⁻¹. Therefore the songs Sₖ defined by
S₀ = ε, Sₖ = Vₖ Sₖ₋₁ for k ≥ 1 (15)
have length n ≈ k log k, but the schema which defines them has length O(log k). The result follows. □
Theorem 1 was the best result known until recently, [].The chief rival for this honour was "This old man, he played m, he played knick-knack…" perhaps because it satisfied all practical requirements for song generation with limited memory space. In fact, 99 bottles of beer usually seemed to be more than sufficient in most cases.
However, the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced:
THEOREM 2. There exist arbitrarily long songs of complexity
O(1).
Proof (due to Casey and the Sunshine Band). Consider the songs Sₖ defined by (15), but with
Vₖ = "That's the way," U "I like it," U (16)
U = "uh huh," "uh huh"
for all k. □
It remains an open problem to study the complexity of nondeterministic songs.
Acknowledgment. I wish to thank J. M. Knuth and J. S. Knuth for suggesting the topic of this paper.
✦ memory · ☽ night · ∞ loops · ❧ margins · ◆ proof
a personal library in perpetual arrangement · MMXXVI