Today I Learned

(back up to October)

October 16th

Today I learned the proof for the lower bound that comparison sorts must take at least $\Omega(n\log n)$ time. This is not terribly hard, so I guess today's entry will be short. In short, a comparison sort is very roughly a query of one bit of information among the $n!$ total possible permutations. So purely in terms of information theory, we need at least $\log_2n!=\Theta(n\log n)$ queries, which is what we wanted.

Purely for the sake of more words, we add some of the details. Essentially, imagine there is an oracle with the array $(a_\bullet)$ and the sorting algorithm queries the oracle with indices $(i,j)$ for comparison information $a_i \gt a_j.$ Afterwards, the algorithm should output the (indexed, perhaps) sorted order of the array. We'll say querying the oracle has constant-time cost, as in real life. The claim is that at least $\Omega(n\log n)$ queries to the oracle are necessary to sort.

To see this, replace the oracle with an evil genie, as usual. The evil genie will decide what the array is while the algorithm queries. Begin with the set of elements $\{1,\ldots,n\}$; the genie will keep track of the full pool of $n!$ permutations. When the algorithm queries the genie $(i,j),$ the genie checks all currently possible permutations of $\{1,\ldots,n\}$ (taking into account previous comparison information given to the algorithm) and chooses the comparison information of $(i,j)$ permitting the most permutations of $\{1,\ldots,n\}$. Namely, there are two possibilities:

  • $a_i \gt a_j,$ or

  • $a_i\le a_j.$

One of these will have a number of associated permutations at least the other, and the genie will make that choice, dividing the set of possible remaining permutations of $\{1,\ldots,n\}$ into no less than half.

If the algorithm stops and asserts a sorted array (of indices, say) before there is one permutation remaining in the genie's pool, then the genie can just choose one of the permutations unequal to the one provided by the algorithm, and the algorithm will have sorted incorrectly on that permutation.

It follows that the number of queries to the evil genie must at least succeed in dividing the pool of $n!$ permutations down to a single permutation. But each query divides the pool by no more than half, so after $q$ queries, we will have divided the pool into no more than\[\frac{n!}{2^q}\]permutations. For this to be no more than $1,$ we must have taken $2^q\ge n!,$ so $q\ge\log_2n!$ queries. Adjusting for constants, this means we need $\Omega(\log n!)$ queries.

It remains to show $\Omega(\log n!)=\Omega(n\log n).$ The slick way to do this is to ignore constant terms and say\[\log n!=\sum_{k=1}^n\log k\ge\sum_{k \gt n/2}^n\log\frac n2\ge\left(\frac n2-1\right)\log\frac n2=\frac12n\log n-n\log2-\frac12\log n+\frac12\log2.\]Dividing by $n\log n$ and taking $n\to\infty$ gives a ratio of\[\frac12-\frac{\log2}{\log n}-\frac1{2n}+\frac{\log2}{n\log n}\longrightarrow\frac12.\]It follows $\Omega(\log n!)=\Omega(n\log n),$ which completes the proof.

Because I know some analytic number theory, I remark that Abel summation implies\[\log n!=\sum_{k=1}^n\log k\cdot1=n\log n-\int_1^n\frac{\floor x}x\,dx.\]For the integral, we can write it as\[\int_1^n\frac xx\,dx-\int_1^n\frac{\{x\}}x\,dx=n+O\left(\int_1^n\frac1x\,dx\right)=n+O(\log n).\]It follows $\log n!=n\log n+n+O(\log n).$