During a talk by Andrew Granville at the Probability in Number Theory conference at the CRM, I learned Billingsley's proof of the Erdős–Kac theorem (see here). This proof is pretty cool: it uses only a little number theory and two bits of probability. This post is an attempt to survey it.

For an integer $n \ge 1$, let

$$\omega(n) = \#\{p : p \mid n\}$$

be the number of distinct prime divisors of $n$. We always use $p$ to denote a prime number. For example, $\omega(12) = \omega(2^2 \cdot 3) = 2$. The Erdős–Kac theorem is a Central Limit Theorem (CLT) for $\omega(n)$. It says that as $x \to \infty$, the average is

$$\frac{1}{x}\sum_{n \le x} \omega(n) = (1 + o(1))\log\log x,$$

the variance is

$$\frac{1}{x}\sum_{n \le x}\left(\omega(n) - \log\log x\right)^2 = (1 + o(1))\log\log x,$$

and the CLT itself says that for any $t \in \mathbb{R}$,

$$\frac{1}{x}\#\left\{n \le x : \frac{\omega(n) - \log\log n}{\sqrt{\log\log n}} \le t\right\} \xrightarrow[x \to \infty]{} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-s^2/2}\,ds.$$
The proof goes by computing the moments of the normalization of $\omega$ and showing that they coincide, in the limit, with those of a random model, which is Gaussian.

The story behind the name of the theorem is quite nice: Erdős sat in a talk of Kac, who presented the statement as something plausible, and Erdős gave a proof by the end of the talk.

The proof below is not the original proof, but rather a proof based on moments. Here is a list of the tools we will use, coming from probability and from number theory.

The first is about the moments of a Gaussian random variable $Z$ with mean $0$ and standard deviation $1$:

Fact 1: $\mathbb{E}[Z^j] = (j-1)!!$ for even $j$, and $\mathbb{E}[Z^j] = 0$ for odd $j$. Moreover, the Gaussian distribution is determined by its moments: if the moments of a sequence of random variables converge to those of $Z$, then the sequence converges to $Z$ in distribution.
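Fact 1 can be checked numerically. The following sketch (the integration range $[-10, 10]$ and step count are arbitrary choices for the experiment) computes the Gaussian moments by a midpoint Riemann sum and compares them with the double factorials:

```python
import math

def gauss_moment(j, lo=-10.0, hi=10.0, steps=200000):
    """Midpoint Riemann sum for E[Z^j], Z a standard Gaussian."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += z ** j * math.exp(-z * z / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

def double_factorial(k):
    """k!! with the convention (-1)!! = 0!! = 1."""
    return 1 if k <= 0 else k * double_factorial(k - 2)

for j in range(1, 7):
    expected = 0.0 if j % 2 else float(double_factorial(j - 1))
    print(j, round(gauss_moment(j), 6), expected)
```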

The second is the classical CLT, saying that a normalized sum of *many* independent random variables which are *reasonably distributed* is distributed approximately normal, with mean $0$ and standard deviation $1$ after normalization.

Fact 2: $\dfrac{\sum_{i \le n} X_i - \sum_{i \le n} \mathbb{E}[X_i]}{\sqrt{\sum_{i \le n} \operatorname{Var}(X_i)}} \Rightarrow N(0,1)$, provided the $X_i$ are independent and, say, uniformly bounded with $\sum_{i \le n} \operatorname{Var}(X_i) \to \infty$.

We will use it for sums of Bernoulli random variables.

The next fact comes from analytic number theory (it is Mertens' second theorem), and it follows quite directly from the prime number theorem. It gives an asymptotic formula for the sum of the reciprocals of the primes:

Fact 3: $\sum_{p \le x} \frac{1}{p} = \log\log x + M + O\!\left(\frac{1}{\log x}\right)$ for a constant $M \approx 0.2615$.
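Fact 3 is also easy to check numerically (the cutoff $10^6$ is arbitrary; $M$ here is the Meissel–Mertens constant, truncated):

```python
import math

X = 10**6  # arbitrary cutoff for the experiment
M = 0.2614972128  # Meissel-Mertens constant (truncated)

# standard sieve of Eratosthenes up to X
sieve = [True] * (X + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(X**0.5) + 1):
    if sieve[i]:
        sieve[i * i :: i] = [False] * len(range(i * i, X + 1, i))

s = sum(1.0 / p for p in range(2, X + 1) if sieve[p])
print(s, math.log(math.log(X)) + M)  # the two values should be very close
```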

## The random model

Let us define a sequence $(B_p)_p$ of independent Bernoulli random variables, indexed by the prime numbers, with the following law of distribution:

$$\mathbb{P}(B_p = 1) = \frac{1}{p}, \qquad \mathbb{P}(B_p = 0) = 1 - \frac{1}{p}.$$

So $B_p$ models the indicator function $\mathbf{1}_{p \mid n}$ of the integers divisible by $p$. In this respect, the sum

$$S_y = \sum_{p \le y} B_p$$

models the function $\omega_y(n) = \#\{p \le y : p \mid n\}$ that gives the number of prime divisors of $n$ which are not bigger than $y$. When $y$ is not too small with respect to $x$, the function $\omega_y$ would be relatively close to $\omega$, as will be explained below. We can compute the mean of $S_y$ using Fact 3:

$$\mathbb{E}[S_y] = \sum_{p \le y} \frac{1}{p} = \log\log y + O(1).$$

Since the $B_p$ are independent, the variance may be computed similarly:

$$\operatorname{Var}(S_y) = \sum_{p \le y} \frac{1}{p}\left(1 - \frac{1}{p}\right) = \log\log y + O(1).$$

(Recall that $\sum_p \frac{1}{p^2} < \infty$.) So by Fact 2 we have

$$\frac{S_y - \log\log y}{\sqrt{\log\log y}} \Rightarrow N(0,1).$$
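The random model is easy to simulate: sample independent Bernoulli($1/p$) variables over the primes $p \le y$ and normalize their sum. In the sketch below the cutoff $y = 10^4$ and the $2000$ trials are arbitrary choices made for speed:

```python
import math, random

random.seed(0)
Y = 10**4  # cutoff for the primes in the model (arbitrary, chosen for speed)

# primes up to Y by a sieve of Eratosthenes
sieve = [True] * (Y + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(Y**0.5) + 1):
    if sieve[i]:
        sieve[i * i :: i] = [False] * len(range(i * i, Y + 1, i))
primes = [p for p in range(2, Y + 1) if sieve[p]]

mu = sum(1.0 / p for p in primes)
sigma = math.sqrt(sum((1.0 / p) * (1.0 - 1.0 / p) for p in primes))

trials = 2000
normalized = []
for _ in range(trials):
    # one sample of S_y: flip a 1/p coin for each prime p <= Y
    s = sum(1 for p in primes if random.random() < 1.0 / p)
    normalized.append((s - mu) / sigma)

avg = sum(normalized) / trials
frac = sum(1 for z in normalized if z <= 0) / trials
print(avg, frac)  # avg near 0; frac roughly near 1/2 (up to the discreteness of the sum)
```

At this modest cutoff the discreteness of the sum is still visible, which matches the $O(1)$ error terms above.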

## The proof

Put $y = x^{1/\log\log\log x}$. First, we compare the $j$-th moment of $\omega_y$ (over a uniform $n \le x$) with that of $S_y$:

$$\frac{1}{x}\sum_{n \le x}\left(\omega_y(n) - \mu_y\right)^j = \sum_{p_1, \dots, p_j \le y} \frac{1}{x}\sum_{n \le x} \prod_{i=1}^{j}\left(\mathbf{1}_{p_i \mid n} - \frac{1}{p_i}\right).$$

Using the notation $[p_1, \dots, p_j]$ for least common multiple, the inner sum can be expanded and bounded in the following way: for any subset of the $p_i$'s,

$$\frac{1}{x}\#\left\{n \le x : [p_{i_1}, \dots, p_{i_k}] \mid n\right\} = \frac{1}{[p_{i_1}, \dots, p_{i_k}]} + O\!\left(\frac{1}{x}\right),$$

so each tuple $(p_1, \dots, p_j)$ contributes $\mathbb{E}\prod_{i=1}^{j}\left(B_{p_i} - \frac{1}{p_i}\right)$ up to an error of $O(2^j/x)$, for a total error of $O\!\left((2y)^j/x\right)$. This step explains the choice $y = x^{1/\log\log\log x}$, because then also $y^j = x^{o(1)}$, so the total error is $x^{o(1)}/x = o(1)$. Plugging this in, we conclude that the difference between the moments is small compared to the powers $\sigma_y^j$ of the standard deviation, and so we have

$$\frac{1}{x}\sum_{n \le x}\left(\frac{\omega_y(n) - \mu_y}{\sigma_y}\right)^j = \mathbb{E}\left[\left(\frac{S_y - \mu_y}{\sigma_y}\right)^j\right] + o(1),$$

with $\mu_y = \sum_{p \le y}\frac{1}{p}$ and $\sigma_y^2 = \sum_{p \le y}\frac{1}{p}\left(1 - \frac{1}{p}\right)$.
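One can watch this moment comparison happen numerically. The sketch below uses small, arbitrary cutoffs $x = 10^5$ and $y = 50$ (chosen for speed, not to satisfy the asymptotic relation between $x$ and $y$), and compares the moments of $\omega_y$ over $n \le x$ with Monte Carlo moments of the Bernoulli model:

```python
import math, random

random.seed(1)
X, Y = 10**5, 50  # arbitrary small cutoffs, chosen for speed

# primes up to Y, by trial division (Y is tiny)
primes = [p for p in range(2, Y + 1) if all(p % q for q in range(2, p))]

mu = sum(1.0 / p for p in primes)
sigma = math.sqrt(sum((1.0 / p) * (1.0 - 1.0 / p) for p in primes))

# omega_y(n) for all n <= X, via a sieve over the primes up to Y
wy = [0] * (X + 1)
for p in primes:
    for m in range(p, X + 1, p):
        wy[m] += 1

def nt_moment(j):
    """j-th moment of the normalized omega_y over 1 <= n <= X."""
    return sum(((w - mu) / sigma) ** j for w in wy[1:]) / X

def model_moment(j, trials=20000):
    """Monte Carlo estimate of the j-th moment of the normalized S_y."""
    tot = 0.0
    for _ in range(trials):
        s = sum(1 for p in primes if random.random() < 1.0 / p)
        tot += ((s - mu) / sigma) ** j
    return tot / trials

for j in (1, 2, 3):
    print(j, nt_moment(j), model_moment(j))
```

Even at these tiny cutoffs the first two moments on the number-theory side and on the model side agree closely, exactly as the lcm computation predicts.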

Next we show that $\omega$ and $\omega_y$ are close. To this end take any $n \le x$. Since $n$ can have at most $\frac{\log x}{\log y}$ prime divisors bigger than $y$, we get that

$$0 \le \omega(n) - \omega_y(n) \le \frac{\log x}{\log y} = \log\log\log x = o\!\left(\sqrt{\log\log x}\right).$$
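The bound on the number of large prime divisors is easy to verify numerically. In the sketch below the cutoffs $x = 10^4$ and $y = 100$ are arbitrary illustrative choices, so that $\log x/\log y = 2$:

```python
import math

X = 10**4  # range of n (arbitrary, for the experiment)
y = 100    # cutoff between "small" and "big" primes (arbitrary)

def omega_split(n, y):
    """Return (#distinct prime divisors <= y, #distinct prime divisors > y) of n."""
    small = big = 0
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            if p <= y:
                small += 1
            else:
                big += 1
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:  # leftover prime factor
        if m <= y:
            small += 1
        else:
            big += 1
    return small, big

bound = math.log(X) / math.log(y)
worst = max(omega_split(n, y)[1] for n in range(2, X + 1))
print(worst, bound)  # worst never exceeds log X / log y
```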

This implies that the moments of the normalization of $\omega$ are, in the limit, those of a Gaussian, so by Fact 1 it converges in distribution to $N(0,1)$. To conclude the proof we just need to note that

$$\mu_y = \log\log x + o\!\left(\sqrt{\log\log x}\right), \qquad \sigma_y = \sqrt{\log\log x}\,(1 + o(1)),$$

and that $\log\log n = \log\log x + O(1)$ for $\sqrt{x} < n \le x$, so it is OK to replace $\mu_y$ and $\sigma_y$ by $\log\log n$ and $\sqrt{\log\log n}$.

Q.E.D.