On random walks with weights

December 30, 2018

It is well known to experts that the probability that a complex number \alpha will be a root of a random polynomial \pm 1 \pm X \pm X^2 + \cdots \pm X^n is O(1/\sqrt{n}). Here is a simulation of the values of \sum_{k=0}^n \pm i^k for n=1,\ldots, 100000:

There are many ways to prove such a statement, one is using the following theorem of Erdős:

Theorem: Let (a_1,a_2, \ldots ) be a sequence of nonzero complex numbers and c a complex number. Then

\mathbb{P}(\sum_{i=1}^n \pm a_i = c) =O(1/\sqrt{n}).

Here the implied constant is absolute.

The simplest case of this theorem is when all the a_i-s are equal and c=0—a simple random walk on \mathbb{Z}. Then the probability to hit 0 is either 0 if n is odd, or

\mathbb{P}(\sum_{i=1}^{n} \pm 1=0) = 2^{-n}\binom{n}{n/2}\ll \frac{1}{\sqrt{n}},

if n is even. Here f(n)\ll g(n) means that there exists an absolute constant C such that |f(n)|\leq Cg(n). The last inequality is elementary, see here, and using Stirling’s formula (or Wallis formula) can be made into the asymptotic formula \mathbb{P}(\sum_{i=1}^{n} \pm 1=0) \sim \sqrt{\frac{2}{\pi n }}.

The goal of this post is to describe an easy “circle-method-argument” that reduces the proof of the theorem to the case of a simple random walk on \mathbb{Z}. I’ve learnt this argument from James Maynard.


By conditioning on the last sign, one sees that it suffices to prove the statement for even n. Indeed, had we knew the assertion for even n-s, then for odd n we have

\displaystyle \begin{aligned}&\mathbb{P}(\sum_{i=1}^n \pm a_i = c)\\ &\quad = \frac{1}{2}\mathbb{P}(\sum_{i=1}^{n-1} \pm a_i = c-a_n)+\frac12\mathbb{P}(\sum_{i=1}^{n-1} \pm a_i = c+a_n)\\ &\quad \ll \frac{1}{\sqrt{n-1}}\leq\frac{2}{\sqrt{n}}.\end{aligned}

We start by considering the case where all the a_i-s are rational. Since \sum_{i=1}^n \pm a_i=c if and only if \sum_{i=1}^n \pm D a_i=Dc with D a common denominator for the a_i-s, we may assume w.l.o.g. that the a_i are integers. We may also assume that c is an integer, otherwise there is no solution and so the probability is 0. We put S(n,(a_1,a_2,\ldots),c) to be the number of choice of signs so that \sum_{i=1}^n\pm a_i=c so that \mathbb{P}(\sum_{i=1}^n \pm a_i=c)=2^{-n}S(n,(a_1,a_2,\ldots),c).  To compute S we use the trivial identity for integers a

\displaystyle \int_{0}^1 e(a\theta)d\theta = \begin{cases}1,& a=0\\ 0,&a\neq 0,\end{cases}

where e(\theta) = e^{2\pi i\theta}. Thus, by expanding the product, we have

\begin{aligned}&S(n,(a_1,a_2,\ldots),c) = \int_0^1 \prod_{i=1}^n (e(a_i\theta)+e(-a_i\theta))e(-c\theta)d\theta\\ &\quad \leq \int_0^1 \prod_{i=1}^n |e(a_i\theta)+e(-a_i\theta)|d\theta.\end{aligned}

We now apply the AM-GM inequality to get that

S(n,(a_1,a_2,\ldots),c)\leq \frac{1}{n}\sum_{i=1}^n \int_0^1 |e(a_i\theta)+e(-a_i\theta)|^nd\theta.

Since e(a_i\theta)+e(-a_i\theta) is real and n is even, we may remove the absolute value in the integral, and then by the same argument as before

\int_0^1 (e(a_i\theta)+e(-a_i\theta))^nd\theta = S(n,(a_i,a_i,\ldots),0).

So we are back to the simple random walk, that is,

S(n,(a_i,a_i,\ldots),0)=S(n,(1,1,\ldots),0) \ll 2^n/\sqrt{n}.

Averaging over i we conclude that

\mathbb{P} (\sum_{i=1}^n \pm a_i=c) \ll \frac{1}{\sqrt{n}},

as needed. This finishes the proof for rational a_i-s.

Next we consider the general setting, namely, when the a_i-s are nonzero complex numbers. Let V be the \mathbb{Q}-vector space that is spanned by the a_i-s and proceed by induction on k=\dim_{\mathbb{Q}} V. When k=1, then we are back in the setting of rational a_i-s, so we may assume that k>1. Take W\subseteq V of dimension k-1. So the a_i-s divide into two nonempty sets A_1 and A_2 such that A_1\subseteq W  and A_2\cap W=\emptyset.

Fix some a\in A_2 so that V=W\oplus \mathbb{Q}a. Any a_i\in A_2 breaks as a_i = w_i+a_i' a with a_i'\neq0 rational and w_i\in W. We may assume that c\in V (otherwise the probability is zero) and hence can be decompose as c=w_c+c'a with c' rational and w_c\in W. Then \sum_{i=1}^n \pm a_i = c if and only if \sum_{i: a_i\in A_2} \pm a_i'=c' and \sum_{i: a_i\in A_1} \pm a_i = d with d=c-\sum_{i:a_i\in A_2}\pm w_i. Denote the former event by E_2 and for any choice of signs for the elements in A_2 denote the latter event by E_1.

Putting p_i= \begin{cases} \frac12, & \#A_i=1 \\  \frac{1}{\sqrt{\#A_i}},&\mbox{o/w}\end{cases}, we get by the induction assumption that \mathbb{P}(E_2)\ll p_2 and conditioning on the choice of signs for the elements in A_2, the probability of  E_1 is  \ll p_2. Thus by conditioning on E_2 we get that

\displaystyle  \mathbb{P} (\sum_{i=1}^n \pm a_i=c) \leq p_1p_2.

This conclude the proof since clearly p_1p_2\leq \frac{1}{\sqrt{n}} (separate to the cases that \#A_i=1 for some i and that both \#A_i>1). QED



Billingsley’s proof of Erdős-Kac Theorem

July 25, 2018

During a talk of Andrew Granville in the Probability in Number Theory conference in the CRM, I’ve learned Billingsley’s proof of the Erdős-Kac theorem (see here). This proof is pretty cool, and it uses really a bit number theory and two bits of probability. This post is an attempt  to survey it.


For n\geq 1, let

\omega(n)=\sum_{p\mid n} 1

be the number of distinct prime divisors of n. We always use p to denote a prime number. For example \omega(6)=\omega(12)  = 2. The Erdős-Kac theorem  is a Central Limit Theorem (CLT) for \omega. It says that as x\to \infty, the average is

\displaystyle  \frac{1}{x} \sum_{1\leq n\leq x} \omega(n) = \log\log x (1+o(1))

the variance is

\displaystyle \frac{1}{x}\sum_{1\leq n\leq x}( \omega(n) -\log\log x)^{2} = \log\log x(1+o(1))

and the CLT itself which says that for any \lambda>0

\displaystyle \begin{aligned} &\frac{1}{x} \# \left\{ 1\leq n\leq x: \frac{\omega(n)-\log\log x}{\sqrt{\log\log x}} \geq \lambda \right\} \\ &\quad = (1+o(1))\frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-t^2/2}dt.\end{aligned}

The proof goes by computing the moments of the normalization of \omega and showing it coincide with a random model, which is Gaussian.

The story being the name of the theorem is quite  nice:  Erdős sat in a talk of Kac who presented this theorem as a something plausible, and Erdős gave a proof at the end of the talk.

The proof below is not the original proof, but rather a proof based on moments. Here is a list of the tools we will use, coming from probability and from number theory.
The first is about the moments of a random variable X with mean \mu=0 and standard deviation \sigma =1 are

Fact 1: \mathbb{E}(X^m) = \begin{cases} \frac{(2k)!}{2^k\cdot k!},& m=2k\\ 0, & m = 2k+1 \end{cases}.

The second is the classical CLT saying that a normalized sum S of many independent random variables which are reasonably distributed is distributed normal with mean 0 and standard deviation 1.

Fact 2: \frac{S-\mu}{\sigma} \sim N(0,1).

We will use it for sums of Bernoulli random variables.
The next fact comes from analytic number theory, and it follows quite directly from the prime number theorem. It gives an asymptotic formula for the sum of prime reciprocal:

Fact 3: \displaystyle \sum_{p\leq y} \frac{1}{p} = \log\log y +O(1).

The random model

Let us define a sequence of independent Bernoulli random variables indexed by prime numbers with the following law of distribution:

\mathbb{P}(X_p = 1) = \frac 1p, \qquad\mathbb{P}(X_p=0) = 1-\frac{1}{p}

So X_p models the indicator function 1_p of integers divisible by p. In this respect, the sum

W_y =  \sum_{p\leq y} X_p

models the function \omega_y = \sum_{p\leq y}1_p that gives the  number  of prime divisors of n which are not bigger than y. When y is not too small  with respect to x, the function \omega_y would be relatively close to \omega, as will be explained below. We can compute the mean of W_y using Fact 3:

\displaystyle\mu_y:= \mathbb{E}(W_y) = \sum_{p\leq y} \frac{1}{p}= \log\log y + O(1).

Since the X_p are independent, the variance may be computed similarly:

\displaystyle \sigma_y^2 := {\rm Var}(W_y) = \sum_{p\leq y} \Big(\frac{1}{p}-\frac{1}{p^2}\Big)=\log\log y+O(1).

(Recall that \sum \frac{1}{p^2} \ll \sum \frac{1}{n^2} =O(1).) So by Fact 2 we have

\displaystyle \frac{W_y-\mu_y}{\sigma_y}\to N(0,1).

The proof

Put y=x^{o(1)}. First, we compare the moments of W_y and of \omega_y:

\displaystyle \begin{aligned}  &\star :=\frac{1}{x}\sum_{n\leq x} (\omega_y(n)-\mu_y)^k  - \mathbb{E}(W_y-\mu_y)^k \\  &\quad =\sum_{j=1}^k \binom{k}{j} (-\mu_y)^{k-j} \Big(\frac{1}{x} \sum_{n\leq x} \omega_y(n)^j-\mathbb{E}W_y^j\Big).  \end{aligned}

Using the notation [\bullet,\ldots,\bullet] for least common multiple, the inner sum can be expanded and bounded in the following way:

\displaystyle\begin{aligned}&   \sum_{p_1\cdots p_j\leq y} \Big(\frac{1}{x} \sum_{n\leq x} 1_{p_1}(n)\cdots 1_{p_j}(n) - \mathbb{E}(X_{p_1}\cdots X_{p_j})\Big)\\  &\quad =\sum_{\substack{p_1\cdots p_j \leq y\\ L = [p_1,\ldots, p_j]}} \Big(\frac{1}{x}\Big\lfloor \frac{x}{L}\Big\rfloor - \frac{1}{L}\Big) \\  &\quad\ll \sum_{p_1\cdots p_j\leq y} \frac{1}{x} = \frac{\pi(y)^j}{x} \ll \frac{y^{j}}{x}\ll x^{-1+o(1)}.  \end{aligned}

This step explains the choice y=x^{o(1)}, because then also $y^j = x^{o(1)}$. Plugging this in \star, we conclude that \star is small compared to the variance \sigma_y^2, and so we have

\frac{1}{x} \sum_{n\leq x} \Big( \frac{\omega_y(n) -\mu_y}{\sigma_y}\Big)^k \sim \mathbb{E}\Big(\frac{W_y-\mu_y}{\sigma_y}\Big)^k \sim \mathbb{E}(X^k),

with X\sim N(0,1) and x\to \infty.

Next we show that \omega and \omega_y are close. To this end take y = x^{\frac{1}{\log\log\log x}}. But since n\leq x can have at most \log\log\log x prime divisors bigger than y we get that

\omega(n) - \omega_y(n) \leq \log\log\log x.

This implies that the moments of \frac{\omega(n) - \mu_y}{\mu_y} are as of a Gaussian. So to conclude the proof we just need to note that

\mu_y = \log\log y + O(1) = \log \log x^{1\log\log\log x} = \log \log x - \log \log \log\log\log x +O(1)

so it is OK to replace \log \log x by \mu_y.


Is a random polynomial with restricted coefficients irreducible?

May 11, 2018

This post reports on some progress Gady Kozma and I have done recently on the problem of the irreducibility of random polynomials.
The main object of study is a random polynomial; i.e.,

f(X) = X^n + \zeta_{n-1} X^{n-1} + \cdots + \zeta_1 X+ \zeta_0,

where \zeta_0 ,\ldots \zeta_{n-1} are random variables. In general we assume that the \zeta_i are iid. Perhaps the most interesting example is when \zeta_i=\pm 1 with probabilities 1/2. Here is how the roots look like:


The roots of polynomials with plus-minus 1 coefficients as illustrated by Sam Derbyshire

There are four types of classical questions on these polynomials:

  1. How many real roots f has? Here there is a huge body of literature starting with the classical works of Bloch-Pòlya, Offord-Littlewood, Kac, and many many more. (On average there are about \frac{2}{\pi} \log n real roots, for \zeta_i with mean zero. See the introduction in here for a nice historical review.)
  2. How the complex roots distribute? We refer to the work of Odlyzko-Poonen and to here for the above and many more beautiful illustrations.
  3. The universality question: How the answers to the above questions depend on the actual distribution of the \zeta_i.

If the \zeta_i\in \mathbb{Z}, then a question of irreducibility over \mathbb{Q} arises.

4. Is f irreducible whp?

Here we should stress that the assumption \textrm{Prob}(\zeta_0=0)=0 is obviously necessary. One of the most exciting problems in this field is:

Conjecture (Odlyzko-Poonen): Let \zeta_i be iid random variables taking the values 0,1 with equal probabilities expect \zeta_0=1. Then

\textrm{Prob}(f \textrm{ irreducible}) = 1 +o(1).

This conjecture is open. In fact, I got familiarized with these type of questions by some guy on the street who asked about irreducibly whp in the \zeta_i = \pm 1 model.

The best lower bound is due to Konyagin:

Theorem 1 (Konyagin): In the setting of the conjecture,

\textrm{Prob}(f \textrm{ irreducible}) \gg 1/\log n.

Here we use the notation f(n)\gg g(n) if there exists an absolute constant C>0such that f(n)\geq C g(n).

Even if one takes the easier problem when the coefficients are  allowed  to take values in a bigger fixed set, no result was known. Recently, Gady Kozma and I uploaded to the arXiv a paper, in which we prove a Odlyzko-Poonen type conjecture, for \zeta_i taking the values 1,\ldots, 210.

Theorem 2 (LBS-Kozma): Let H>0 be an integer divisible by at least 4 primes; the smallest being H=210. Assume that the \zeta_i-s are iid random variables taking values uniformly in \{1,\ldots, H\}. Then

\textrm{Prob}(f \textrm{ irreducible}) = 1+o(1)

(as n\to \infty).

In fact, we just need the coefficients to distribute uniformly modulo H and, of course, that \textrm{Prob}(\zeta_0=0)=0.

In fact, we show that f is very irreducible. For this we recall that for a polynomial with integer coefficients one attaches a Galois group: If \alpha_1, \ldots, \alpha_n are the complex roots of f (with multiplicity), and if N=\mathbb{Q}(\alpha_1,\ldots, \alpha_n) is the minimal subfield of \mathbb{C} containing the \alpha_i-s, then the Galois group is defined as the group of field automorphisms of N:

G_f = \textrm{Gal}(N/\mathbb{Q}) := \textrm{Aut}_{\mathbb{Q}}(N).

An element \sigma\in G_f is determined by its action on the \alpha_i since every element in N is a polynomial expression in the \alpha_i. So we have an embedding

G_f \to S_n

which is well defined up to reordering of the roots; i.e., up to conjugation in S_n. So from now on we consider G_f as a subgroup of S_n.

Galois theory connects arithmetic properties of f with group theoretical properties of G_f, and the most basic one is that

G_f is transitive  if and only if f is irreducible.

The property of G_f being doubly transitive (i.e., acting transitively on the set of pairs (i,j) with i\neq j) translates to the property that f is irreducible over \mathbb{Q} and f(X)/(X-\alpha_1) is irreducible over \mathbb{Q}(\alpha_1). One may continue like this for 3-transitive, and so on. However, by the classification of finite simple groups, this ends in 6 steps, since a 6-transitive group is either A_n or S_n in which case the group is either  n-2 or n-transitive, respectively. (We remarks that with four exceptions M_{11}, M_{12},M_{23},M_{24} we in fact can be done in 4 steps). We prove

Theorem 3 (LBS-Kozma): Let H>0 be an integer divisible by at least 4 primes; the smallest being H=210. Assume that the \zeta_i-s are iid random variables taking values uniformly in \{1,\ldots, H\}. Then

\textrm{Prob}(G_f =A_n \textrm{ or } S_n) = 1+o(1)

(as n\to \infty).

Large box model

We start by an easier model which we call the large box model. In this model, we fix the degree of f, and we choose the \zeta_i to vary uniformly in \{1,\ldots, H\}, where now H\to \infty. The study of this model goes back to Van der Waerden, who proved in the 1930s that

\textrm{Prob}(G_f = S_n) =O(H^{-1/6}).

The heuristic idea (which we give for irreducibility, for simplicity) is that

\{f \textrm{ reducible}) \} \subseteq \bigcap_{p} \{f \textrm{ reducible }\mod p\},

and the right-hand-side events are roughly independent when H is much bigger than p, and so we have

\begin{aligned}\textrm{Prob}(f \textrm{ reducible})&\lessapprox \prod_{p}  \textrm{Prob}(f \textrm{ reducible }\mod p)\\&\approx \prod_p(1-1/n),\end{aligned}

where in the last equality we have used the prime polynomial theorem. So, if this estimation is ok for a set primes which grows with n, we get that most polynomials are irreducible.

Van der Waerden’s bound was improved to O(H^{-1/2+\epsilon}) in the 1960s by Gallagher who applied the large sieve inequality and to O(H^{\sqrt{2}-2+\epsilon}) by Dietmann in 2012. In 2015, Rivin gave an elementary argument which gives the nearly optimal bound of O(H^{-1+\epsilon}). (The lower bound \textrm{Prob}(G_f\neq S_n) \geq  \textrm{Prob}(f \textrm{ reducible}) \gg H^{-1} is very easy.)

Let me present the argument in the irreducibility setting : A very naive approach is to say that for 1\leq k\leq n/2

\begin{aligned} &\#\{f : \exists g,h \textrm{ st } f=gh, \deg g=k\} \\&\quad\leq \#\{f : g,h \textrm{ st } f\equiv gh\mod H, \deg g=k\}\end{aligned}.

Since there are H^{k} monic g-s modulo H and H^{n-k} monic h-s modulo H we get the bound

\textrm{Prob}(f=gh, \deg g=k) \leq \frac{1}{H^n}\cdot H^{k}\cdot H^{n-k}=1.

We recover the trivial bound, which is not very helpful 😦 However, if we just condition on the first coefficients a,b of f,g and noting that f(0)=g(0)h(0), so b\mid a, we would already get the power saving we aim for. Indeed,  putting \tau(a) = \sum_{b\mid a} 1

\begin{aligned}&\textrm{Prob}(f=gh, \deg g=k) \\&\quad=\sum_{1\leq a\leq H}\sum_{b\mid a}\textrm{Prob}(f\equiv gh, \deg g=k, f(0)=a, g(0)=b) \\ &\quad \leq \frac{1}{H^{n}} H^{k-1}H^{n-k-1}\sum_{1\leq a\leq H}\tau(a)\\&\quad \ll \frac{\log H}{H}.\end{aligned}

Here we used the classical result in number theory that the average value of \tau in the interval [1,H] is \log H.

Noting that f is reducible if and only if f=gh, \deg g=k for some 1\leq k\leq n-1, we conclude that

\textrm{Prob}(f \textrm{ reducible }) \ll n\frac{\log H}{H},

with absolute implied constant.

One may see that this bound does not help in the restricted coefficient model discussed above, as it tends to \infty with n.

However, it gives a \pm 1 coefficient result in the easier case when the number of variables is 2:

Theorem (LBS-Kozma): Let f (X) = \sum_{i,j\leq n} \pm 1X^iY^j be a random bivariate polynomial with iid coefficients taking the values \pm 1 uniformly. Then

\textrm{Prob} (f \textrm{ irreducible}) = 1+o(1)

as n\to \infty.

The crux of the proof is specializing X=2, which essentially reduces to the large box model. This was generalized by Chern to more general fixed sets of coefficients, see here.

Restricted coefficient model

Ideas in the proof of Theorem 1

Going back to the restricted coefficients model, we now explain the main tools/steps in the proof of Theorem 2. We always assume that f is a random polynomial with iid coefficients taking the values 1,\ldots, 210, uniformly (unless explicitly stated otherwise).

It will be convenient to introduce a piece of terminology: An event on a random polynomial f holds a.s. (almost surely) if the probability for it to hold is 1+o(1), as n\to \infty.

Our first ingredient is that

 A.s., the degrees of f are \geq \omega(n), for some \lim_{n\to \infty}\omega(n)=\infty.

In fact, Konyagin, while proving Theorem 1, proved the above with \omega(n)\gg n/\log n  (he considered the model of 0,1, but his argument carries over to this model too). We will not need such a strong bound on \omega(n). To get a flavor of the tools in the proof we show that a random \pm 1-coefficient polynomial has no linear factors, a.s.:

A linear factor corresponds to a rational root, which by the rational root theorem must be \pm1. The probability that \pm 1 is a root is the same as the probability that a length n random walk on the line will come back to the starting point, which is \approx 1/\sqrt{n} =o(1).  So a.s. f has no linear factor.

Using the first ingredient we may reduce the proof of Theorem 1 to proving

\textrm{Prob}(f = gh, k\leq \deg g\leq 2k)\ll \frac{1}{(\log k)^2}.

Indeed, by the first case and neglecting events that a.s. do not occur,

\begin{aligned}&\textrm{Prob}(f\textrm{ reducible})\\ &\quad \leq \sum_{\log_2\omega(n)\leq d\leq \log n/2} \textrm{Prob}(f = gh, 2^d\leq \deg g\leq 2^{d+1})\\ &\quad\ll \sum_{d=\log_2\omega(n)}^{\infty} d^2 \to 0.\end{aligned}

The next thing we need relates to the anatomy of the polynomials. Consider the space

\Omega = \{(l_1,l_2,\ldots, l_n) : l_i\geq 0\textrm{ and } \sum il_i=n\}.

of partitions of n. We have two probabilities measures on \Omega coming from M_{n,q} the set of degree-n monic polynomials with coefficients in the finite field \mathbb{F}_q and from S_n, and are given respectively by

\begin{aligned} P_{S_n} (l_1,l_2,\ldots, l_n)  & = \frac{\#\{ \sigma\in S_n \textrm{ with } l_i \textrm{ orbits of length }i\}}{n!}\\ P_{M_{n,q}} (l_1,l_2,\ldots, l_n)  & =  \frac{\#\{ \phi \in M_{n,q} \textrm{ with }l_i \textrm{ prime factors of degree }i\}}{q^n}\end{aligned}

Since we have an exact formula for the number of primes of each degree (the prime polynomial theorem) both these probabilities have formulas, see Arratia, Barbour and Tavaré for an extensive study on the relations of the two probability functions.   In particular, it is not very difficult to show that \|P_{S_n} - P_{M_{n,q}}\|=O(q^{-1}) with absolute constant that is independent on n (and funny enough this gives a new proof of the derangement problem).

However in our case q=2,3,5,7. For such small q there are events, coming from the small i-s, for which the respective probabilities are not close (e.g.  compare the probabilities of the event l_1\neq 0). The next ingredient is to show that if an event R is invariant to changing l_i for i \ll (\log k)^2, then

|P_{S_n}(R) - P_{M_{n,q}}(R)| \ll (\log k)^{-2}   (*)

Dimitris Koukoulopoulos suggested a way to directly work with polynomials instead of going through permutations. This will be discussed elsewhere.

Since our f has coefficients that are uniform modulo 210, when we reduce modulo each of the primes 2,3,5,7 we get 4-independent uniform polynomial (this is a consequence of the Chinese remainder theorem). This consideration combined with the fact that the reductions of f modulo 2,3,5,7 are independent from each other and each of them is uniform, reduce the proof to showing

for every 1\leq k\leq n/4, the probability that there is k\leq l<2k for which four iid uniform random permutations has invariant set of size l is \ll (\log k)^{-2}.

This was shown (essentially) by Pemantle-Peres-Rivin with the bound \ll k^{-c}. This does not hold for three random permutations as was shown by Eberhard, Ford and Green. This completes the steps in the proof.

Galois group

Ideas in Proof of Theorem 3

Now we know that a.s. a random polynomial is irreducible, and in particular it has a transitive Galois group.  To get that the Galois group contains the alternating group, we follow Łuczak and Pyber who proved that a.s. a random permutation is contained in no transitive subgroup other than A_n or S_n. The difference of what is needed for our theorem, is that we have a random permutation only up to a small perturbation:

Let \sigma\in S_n be a random uniform permutation and let \alpha>0 be sufficiently small (in fact \alpha<1-\frac{\log\log2}{\log2} suffices). Then a.s. there exists no transitive group G\neq A_n,S_n that contains a permutation that can be get from changing \sigma by cycles of length \leq n^{\alpha}.

The need for perturbation comes from two sources. The first is that we can from (*). The second one comes from lifting the Frobenius.

To be more precise, write f_p for f modulo p. If f_p has m_i prime factors of degree i and it is squarefree, then the Frobenius acts on the roots of f_p has a permutation with m_i cycles of length i, and this action may be lifted to an element of G_f.

However, if f_p is not squarefree, this doesn’t work as simple. There exists a unique factorization f_p = gh with g squarefull,  h squarefree, and \gcd(g,h)=1. Then, the roots of g and the roots of h each are invariant set under the action of the Frobenius. Now we can lift the Frobenius to an element \sigma\in G_f, so that \sigma acts the same on the set of liftings of the roots of h. This means that, up to the perturbation of the size of \deg g, the element \sigma is uniform. Now, we know that a.s. \deg g\leq n^{\alpha}, and so we get a random element up to perturbation as needed.



Statistics of Arithmetic Functions over Large Finite Fields

May 8, 2018

The aim of this post is to survey some results that has been obtained recently on statistic of arithmetic functions over large finite fields using Galois theory. In a future post, I will discuss the proofs.

Statistic of Arithmetic Functions

Let \psi be  an arithmetic function, by which we just mean that \psi is a function

\psi\colon \mathbb{Z}_{>0} \to \mathbb{C}.

It is a main theme in analytic number theory to study asymptotic statistic properties of arithmetic functions. The mean value of \psi is defined as the asymptotics of

\displaystyle\langle \psi(n)\rangle_{ n\leq x } := \frac{1}{x} \sum_{n\leq x} \psi(n).

For example, the Prime Number Theorem can be formulated as

\displaystyle\langle \Lambda(n) \rangle_{n\leq x} \sim 1,

where \Lambda is the von Mangoldt arithmetic function:

\Lambda(n) = \begin{cases} \log p, & n=p^a, \ p \ \textrm{prime}\\ 0, &\textrm{otherwise.}\end{cases}

One may consider mean values on short intervals

\displaystyle\langle \psi(n) \rangle_{|n-x|\leq x^{\epsilon}} : = \frac{1}{2x^{\epsilon}} \sum_{|n-x|\leq x^{\epsilon}} \psi(n)

(where 1>\epsilon>0 is fixed) or on arithmetic progressions

\displaystyle\langle \psi(n) \rangle_{\substack{n\leq x\\n\equiv a(q)}}:= \frac{1}{qx} \sum_{\substack{n\leq x\\n\equiv a(q)} } \psi(n)

where we allow q to be in the regime q\leq x^{1-\delta}, for a fixed 1>\delta >0. Two central conjectures in the distribution of prime numbers are

\displaystyle\langle \Lambda(n) \rangle_{|n-x|\leq x^{\epsilon}} \sim 1

and for (a,q)=1:

\displaystyle\langle \psi(n) \rangle_{\substack{n\leq x\\n\equiv a(q)}}\sim \frac{q}{\phi(q)}.

The Riemann hypothesis implies the first conjecture only for \epsilon > 1/2 and the General Riemann Hypothesis implies the second conjecture only for \delta >1/2. We refer the interested reader to the excellent survey paper by Granville.

Other type of statistics is of correlation of arithmetic functions \psi_1,\ldots, \psi_k which, at a tuple h=(h_1,\ldots, h_k) \in \mathbb{Z}>0 is defined as the asymptotics of

\displaystyle\left \langle \prod_{i=1}^k \psi_i(n+h_i) \right\rangle_{n\leq x}, \quad x\to \infty.

If the \psi_i-s are all equal, we use the term auto-correlation, if k=2 we say pair-correlation, etc. Following the previous examples, the auto-correlations of \Lambda as given in the Hardy-Littlewood prime tuple conjecture, implies Poisson distribution of the number of primes in intervals of typical length (this is known as Gallagher’s lemma). See this post for some more on the Hardy-Littlewood prime tuple conjecture.

Another theme, which is in a sense more general, is about primes values of arithmetic functions. That is, given a polynomial f(X) with integral coefficients, we consider the asymptotics of

\langle \psi(f(n)) \rangle_{n\leq x}.

In particular, when \psi=\Lambda, the content of the Bateman-Horn conjecture (which is a quantitative version of the Bouniakowsky conjecture and the more general Schinzel Hypothesis H).

In particular, taking f(X) = \prod(X+h_i), we get the Hardy-Littlewood prime tuple conjecture. Taking f(X) = X^2+1, we get a quantitative Landau’s problem of finding infinitely many primes of the form p=n^2+1.

We exemplified the importance of this statistics with \psi=\Lambda, but taking other arithmetic functions such as the M\”obius function, divisor function, etc. leads to a variety of central problems in analytic number theory which have applications in other fields (e.g. the auto-correlations of b and its connections to the dynamics of billiards). I do not want to elaborate on those applications here; the bottom line is that statistics of arithmetic functions is central in number theory.

Function Fields

Here we go from the realm of the integers to the world of polynomials over finite fields. The usual dictionary is that the ring \mathbb{F}_q[t] of polynomials over the finite field \mathbb{F}_q of q elements serves as an analogue of \mathbb{Z}. The positive integers are modeled by the set of monic polynomials M_q\subseteq \mathbb{F}_q[t]. So, an arithmetic function \psi in this setting is a function

\psi\colon M_q \to \mathbb{C}.

The analogue of the interval [1,x] (or rather of [x,2x]) is the set of degree n monics:

M_{n,q} := \{ f\in M_q : \deg f=n\}.

We define the mean value of \psi as the asymptotic value of

\langle\psi(f)\rangle_{f\in M_{n,q}} := \frac{1}{q^n}\sum_{f\in M_{n,q}}\psi(f)

as q^n\to \infty. It is important to note now that the asymptotics may depend on the way q^n\to \infty. We list here some different limits, going from closest to the integers to the farthest:

  1. Large-degree limit: q fixed and \lim_{n\to \infty},
  2. Large-degree-large-field limit: \lim_{q\to \infty} \lim_{n\to \infty},
  3. Large-field-large-degree limit: \lim_{n\to \infty} \lim_{q\to \infty},
  4. Large-finite-field limit: n fixed and \lim_{q\to \infty}.

We proceed with the above tradition of exemplify using the von Mangoldt function: A prime polynomial is a monic irreducible polynomial and the von Mangoldt function is defined as

\Lambda_q(f) = \begin{cases}\deg P ,& f=P^k,\ P\  \textrm{prime polynomial}\\ 0,& \textrm{otherwise.}\end{cases}

The Prime Polynomial Theorem first proved by Gauss says that

\langle\Lambda(f)\rangle_{f\in M_{n,q}} =1.

The first thing to note, that unlike the number field case, there is NO error term here. In particular, this theorem is valid in the most general limit q^n \to \infty. The proof of this theorem goes in the same lines as the proof of the Prime Number Theorem, just that the corresponding zeta function is a rational function with no zeros, hence the analytic part is trivial in the function field case. The fact that there are no zeros gives the exact formula above.

In general, the General Riemann Hypothesis (GRH) for function fields is a theorem, hence we expect that any result for the integers conditional on GRH should be provable in the function field setting. However this is not the case, a beautiful example in the setting of the Bateman-Horn conjecture is discussed here.

Short intervals

The studies on the mean values of arithmetic functions in short intervals started with the Von Mongoldt (and the prime characteristic function) by Bank&Bary-Soroker&Rosenzweig and by Keating&Rudnick. The latter paper studies the distribution of of the number of primes (counted with von-Mongoldt) in short intervals, and in particular computes the variance in the limit q\to \infty. The work is based on an equidstribution result of Katz. The former result study the mean values of the number of primes in all short intervals, in the same limit q\to \infty. Here the main point the main point is a computation of a Galois group, which is based on earlier results of S.D. Cohen.

Here a short interval around a polynomial f\in \mathbb{F}_q[T](with parameter \deg f>m>0) is the set

\{ g\in \mathbb{F}_q[T] : \deg(f-g)\leq m\}

which corresponds to a short interval \{n:|n- x|\leq x^{\epsilon}\} around x (with parameter 1>\epsilon>0), where \epsilon corresponds to \frac{m}{\deg f}.

The results for primes were generalized to other arithmetic functions, to correlations, and even to arithmetic function evaluated at polynomials, a la Bateman-Horn conjecture. Some other people who contributed a lot in this direction: Julio Andrade, Dan Carmon, Alexei Entin, Arno Fehm, Paul Pollack, Brad Rodgers, and many more.

I want to focus here on several families of arithmetic functions that we can compute the mean value in short intervals in the large degree limit. I will start with simpler ones.

Factorization type arithmetic functions

We start with an abstract notion of factorization type of a polynomial. For us, a function \lambda \colon \mathbb{Z}_{>0}\times \mathbb{Z}_{>0}\to \mathbb{Z}_{\geq 0} with finite support is called a factorization type. The set of all factorization types will be denoted by

\Lambda_0 = \{ \lambda \textrm{ factorization type}\}.

The degree of $\latex \lambda\in \Lambda_0$ is defined to be

\deg \lambda = \sum_{d,e>0} de \lambda(d,e).

To make the reader more comfortable with the notion of factorization type, we immediately say that a nonzero polynomial f\in \mathbb{F}_q[T] with prime factorization f= \prod_{i} P^{e_i} induces the factorization type defined by

\lambda_f(d,e) = \# \{ i : \deg P_i = d, \ e_i=e\}.

In particular,

\deg f = \deg \lambda_f.

The space of functions on factorization functions is denoted

\Lambda_0^* = \{ \psi \colon \Lambda_0\to \mathbb{C}\}

and we refer to \psi \in \Lambda_0^* as a factorization type arithmetic function. We immediately note that \psi\in \Lambda_0^* defines an arithmetic function on \mathbb{F}_q[T] by setting

\psi_q(f) := \psi(\lambda_f).

Many of the good old arithmetic functions are of this type. For example, the function

\displaystyle \lambda \mapsto \begin{cases}D, &\textrm{if } \sum_{d,e>0}\lambda(d,e) = \sum_{e>0} \lambda(D,e)=1\\ 0,& \textrm{otherwise}\end{cases}

induces the Von Mangoldt function, the function

\displaystyle \lambda\mapsto \begin{cases}(-1)^{\sum_{d>0}\lambda(d,1)},& \sum_{d>0,e>1}\lambda(d,e)=0\\0, & \textrm{otherwise}\end{cases}

induces the Möbius function, etc. In the same manner one may induce many of hers favorite arithmetic functions…

We may evaluate factorization type arithmetic function also on the permutation group S_n. More precisely, a permutation \sigma\in S_n on n letters, has a factorization into a product of disjoint cycles \sigma = \sigma_1\cdots \sigma_r. In this factorization we also include the trivial cycles, so that \sum_{i=1}^r o(\sigma_i)=n, where o(\sigma_i) denotes the order of \sigma_i. So we may define a factorization type for \sigma:

\lambda_\sigma (d,e) :=\begin{cases} \#\{i : o(\sigma_i)=d\} , &\textrm{e=1}\\ 0, &\textrm{otherwise.} \end{cases}

This let us evaluate \psi\in \Lambda_0^* on elements of S_n, and in particular, we have the combinatorial quantity:

\left\langle \psi \right\rangle_{ S_n}:= \left\langle \psi(\lambda_\sigma) \right\rangle_{\sigma\in S_n} = \frac{1}{n!}\sum_{\sigma \in S_n} \psi(\lambda_\sigma).

A theorem in the spirit of Bank&Bary-Soroker&Rosenzweig and of Andrade&Bary-Soroker&Rudnick connects this quantity to the mean value in short intervals.

Theorem. For every B>0 there exists a constant C_B>0 with the following property. Let \psi \in \Lambda_0^* be a factorization type arithmetic function, let q be a prime power, let B>n>m>1, let  f\in \mathbb{F}_q[T] be monic of degree n. Then

\left|\displaystyle \left\langle \psi_q(g) \right\rangle_{\deg (f-g)\leq m} -\left\langle \psi\right\rangle_{ S_n}\right|\leq C_Bq^{-1/2}.

Signed factorization type

Recall that the arithmetic function b(n) is the indication function of sums of two squares. More precisely, b(n)=1 if and only if n=a^2+b^2. There were several works on the analogue in the function fields.

A naive approach is to take a literal analogue, that if to consider polynomials of the form f=A^2+B^2, where A,B\in \mathbb{F}_q[T]. This makes sense when q\equiv 3\mod 4. When q\equiv 1\mod 4, we have the decomposition A^2+B^2 = (A+\alpha B )(A-\alpha B), where \alpha \in \mathbb{F}_q satisfies \alpha^2=-1. So in this case, b is the constant function 1. So this is not interesting. Moreover, it seems like the correct analogue to take the function field  b to be an indicator of norms from a quadratic geometric extension of \mathbb{F}_q(T) of genus zero (as analogues of \mathbb{Z}[i]), and for simplicity (following the choice of Bary-Soroker&Smilanski&Wolf) we define

b_q(f) = \begin{cases} 1,& f=A^2+TB^2\\0,& \mbox{otherwise.} \end{cases}

The theory of such functions is very similar to the theory of b(n). In particular, the analogue of Fermat theorem — saying that b(n)=1 if and only if every p\equiv 3\mod 4 divides n an even number of times — is

Theorem. b(f) = 1 if and only if for every prime polynomial P with P(0)\neq \square \in \mathbb{F}_q divides f even number of times.

This implies that b is not a factorization type, as it depends also on which arithmetic progression the primes belong to. So two questions naturally comes in mind

Question 1. What is the generalization of the notion of arithmetic functions?

Question 2. What is the group the governs these arithmetic functions?

Bary-Soroker&Fehm introduce the following notion of signed factorization type to deal with the first question. A signed factorization type is a function

\lambda\colon \mathbb{Z}_{>0}\times \mathbb{Z}_{>0}\times \{\pm 1,0\}\to \mathbb{Z}_{\geq 0}

with finite support. We put

\Lambda_1= \{\textrm{signed factorization types}\}.

We immediately define two functions on \Lambda_1:

\deg \lambda = \sum_{d,e>0,\ \epsilon=\pm1,0}de \lambda(d,e,\epsilon)

\chi \lambda =\prod_{d,e>0,\ \epsilon =\pm1,0} \epsilon^{\lambda(d,e)e}.

We immediately note that if q is odd f\in \mathbb{F}_q[T] with primes factorization f=\prod_{i} P_i^{e_i} defines a signed factorization function by

\lambda_{1,f}(d,e,\epsilon) = \# \{ i : \deg P_i = d, \ e_i=e,\ P(0)^{\frac{q-1}{2}}=\epsilon\}.

Now a signed factorization arithmetic function is a function on signed factorization types, and the space of all signed factorization arithmetic functions is denoted by

\Lambda_1^* = \{ \psi \colon \Lambda_1 \to \mathbb{C}\}.

As for factorization arithmetic function, a signed factorization arithmetic function \psi induces and arithmetic function of \mathbb{F}_q[T] by setting

\psi_q(f) := \psi(\lambda_{1,f}).

Now we get to the group theory to answer Question 2. Consider the hyperoctahedral group H_n (aka signed permutation group aka Bn aka Cn) which is the symmetry group of the following graph with 2n vertices and n edges:Screen Shot 2018-05-07 at 22.12.48

We may represent H_n as the permutational wreath product H_n=C_2 \wr S_n, whose elements can be represented by (\zeta, \sigma) with \zeta \colon [n]\to \{\pm 1\} and \sigma \in S_n. If \sigma = \sigma_1 \cdots \sigma_r is the factorization of \sigma into disjoint cycles, and if we write \sigma_i = (j_1 \ \ldots \ j_d), then we attach to (\zeta,\sigma) the signed factorization type

\lambda_{(\zeta,\sigma)}(d,e,\epsilon) = \begin{cases}\#\{i : o(\sigma_i)=d,\ j_1\cdots j_d = \epsilon\}, & \textrm{if } e=1\\ 0,&\textrm{otherwise}\end{cases}.

This means that we can evaluate a signed factorization type arithmetic function \psi\in \Lambda_1 on the hyperoctahedral group. We denote the mean value by

\left<\psi \right>_{H_n} := \left<\psi(\lambda_{(\zeta,\sigma)})\right>_{(\zeta,\sigma)\in H_n} = \frac{1}{2^nn!}\sum_{(\zeta,\sigma)\in H_n} \psi(\lambda_{(\zeta,\sigma)}).

From Bank&Bary-Soroker&Fehm one may deduce the following

Theorem. For every B>0 there exists C_B>0 with the following property. Let \psi\in \Lambda_1, q an odd prime power, n>m>1, f\in \mathbb{F}_q[T] monic f degree n. Then

\displaystyle \left | \left<\psi_q(g)\right>_{\deg(f-g)\leq m} -\left<\psi\right>_{H_n}\right| \leq C_Bq^{-1/2}.



Some easy Frobenius computation

May 3, 2018

The objective of this post is to compute the Frobenius element in the simplest  example of an Artin-Schreier extension.

For a prime power q=p^{\nu} we write  \mathbb{F}_q for the finite field with q elements and \mathbb{F}_q(t) for the field of rational functions over \mathbb{F}_q.

Consider the Artin-Schreier polynomial f(X) = X^p - X - t. If s is a root of f, then for every i\in \mathbb{F}_p we have i^p=i and so

\begin{aligned} f(s +i)& = (s+i)^p-(s+i)-t \\ &= s^p +i^p-s-i-t=f(\alpha)=0\end{aligned}.

This implies that f(X) = \prod_{i\in \mathbb{F}_p} (X-(s+i)) and thus

E= \mathbb{F}_q(t,s)=\mathbb{F}_q(s)

is a Galois extension of \mathbb{F}_q(T) of degree [E:\mathbb{F}_q(t)]=p (since, obviously, f is irreducible over \mathbb{F}_q(t)). Thus the Galois group G=\mathrm{Gal}(E/\mathbb{F}_q(t)) is isomorphic to (the additive group of) \mathbb{F}_p with the explicit isomorphism

g\in G \mapsto s-g(s)\in \mathbb{F}_p.

Next we recall the definition of the Frobenius at a prime polynomial.
Let \mathbb{F}_q[T] be the ring of polynomials with coefficients in \mathbb{F}_q and let O_E = \mathbb{F}_q[s] be the integral closure of \mathbb{F}_q[T] in E which is a polynomial ring in s.
Let P\in \mathbb{F}_q[T] be a prime polynomial, which by a convention means an irreducible monic polynomial. The polynomial P may factor in O_E, let \frak{P}_i\in \mathbb{F}_q[s] be the prime factors, i=1,\ldots, \mu . We define the Frobenius \sigma_P at P to be the set of all g\in G for which there exists 1\leq i\leq \mu such that

g(u) \equiv u^{q^{\deg P}} \mod{  \frak{P}_i}. (*)

The general theory tells us that since G is abelian, we have two options: either P is unramified, and then \sigma_P is element in G (we identify the set with one element with the actual element) or P is ramified and then \sigma_P = \mathbb{F}_p. However we compute directly that

\sigma_P = \textrm{Tr}_{q,p}(\textrm{Tr}(P)) ,

where if P = t^n + a t^{n-1} + \cdots, then \textrm{Tr}(P) = a and \textrm{Tr}_{q,p}\colon \mathbb{F}_q\to \mathbb{F}_p is the trace map given by

\textrm{Tr}_{q,p}( \alpha) = \alpha + \alpha^{p} + \cdots + \alpha^{q/p}, \qquad \alpha\in \mathbb{F}_q.

Let g\in \sigma_P which corresponds to i\in \mathbb{F}_p given by  g(s) = s-i. We have to show that i=\textrm{Tr}_{q,p}(\textrm{Tr}(P)). Let \mathfrak{P} be the corresponding prime factor of P in \mathbb{F}_q[s] satisfying (*). Then \alpha := t\mod \frak{P}\in \mathbb{F}_{q^{\deg P}} is a root of P and \beta:= s\mod \mathfrak{P} is a root of X^p-X-\alpha. So, the equality i=s-g(s) \equiv s-s^{q^{\deg P}}\mod \frak{P} translates to

i=\beta- \beta^{q^{\deg (P)}}.

Recall that q=p^{\nu} and  use the `telescopic series trick’ to get that

\begin{aligned}i &=  \sum_{c=0}^{\deg(P)\nu-1} (\beta^{p^c}-\beta^{p^{c+1}} ) \\ &=\sum_{c=0}^{\deg(P)\nu-1} (\beta-\beta^{p})^{p^c} \\ & = -\sum_{c=0}^{\deg(P)\nu-1}\alpha^{p^c}.\end{aligned}

As  \textrm{Tr}(P)= -\sum_{a=0}^{\deg (P)-1}\alpha^{p^{\nu a}}, we conclude from the above that

\begin{aligned}\textrm{Tr}_{q,p}(\textrm{Tr}(P))& =-\sum_{b=0}^{\nu-1}\sum_{a=0}^{\deg (P)-1} \alpha^{p^bq^a}\\ & =- \sum_{c=0}^{\nu \deg (P)-1} \alpha^{p^{c}}=i\end{aligned},

as needed.

Irreducible speciailizations give group preserving specializations

June 12, 2013

In this post I discuss the classical connection between irreducible and group preserving specializations in Hilbert’s irreducibility theorem.

A field K is Hilbertian if every f\in K[t,X] that is separable and of positive degree in X and irreducible in K[t,X] admits infinitely many irreducible specializations, that means, there exist infinitely many a\in K such that f(a,X) is irreducible and separable as a polynomial in K[X].

A group preserving specialization is a stronger type of specialization. For f(t,X) separable in X, let F be its splitting field over K(t), and let G= Gal(f,K(t)) = Gal(F/K(t)) be the Galois group of f over K(t). For any a\in K with f(a,X) separable, let F_a be the splitting field of f(a,X) and G_a = Gal(f_a,K)=Gal(F_a/K) be the Galois group. One always has that G_a is a subquotient of G (that is G_a is isomorphic to the decomposition group modulo the inertia group of a prime of F that lies over the prime (t-a) of K(t)).

A specialization t\mapsto a is called group preserving specialization if G\cong G_a, or equivalently if |G|=|G_a|.

If t\mapsto a is an irreducible specialization for f it is NOT necessarily group preserving. For example, the Galois group of X^4+X^3+X^2+ X+t is S_4, the specialization t\mapsto 1 gives X^4+X^3+X^2+X+1 – which is irreducible but its Galois group is G_1 = (\mathbb{Z}/5\mathbb{Z})^\times \cong \mathbb{Z}/4\mathbb{Z} (since its the minimal polynomial of the root of unity \zeta= e^{2 \pi i/5}).

Nevertheless, if K is Hilbertian we can always find group preserving specializations:

Theorem (Hilbert): If K is Hilbertian, then  for every separable-in-X polynomial f(t,X) there is a group preserving specialization.

A proof with all details can be found, e.g., in Galois theory of Serre or in Field Arithmetic of Fried-Jarden. Here’s a

Sketch of Proof:
Let F be the splitting field of f(t,X) over K(t) and let =Gal(F/K(t)). Since F/K(t) is separable, by the primitive element theorem, there exists \xi \in F such that F=K(t)(\xi). Let g(t,X) be the minimal polynomial of \xi over K(t). Now apply the Hilbertianity property for g: There exist infinitely many a\in K such that g(a,X) is irreducible. Choose such an a\in K with the extra properties that (1) \deg_X f(t,X) = \deg f(a,X) and (2) \deg_X g(t,X) = \deg g(a,X). Now since F is generated by the roots of f(t,X) on the one side and by \xi on the other, one can show that the splitting field F_a of f(a,X) equals the field generated by a root of g(a,X). So

\begin{array}{ll} |G| &= [F:K(t)]=\deg_X g(t,X)\\ & = \deg g(a,X) =[F_a:K] =|G_a|.\end{array}

So the groups are of the same size, as needed.


Hardy-Littlewood tuple conjecture over large finite fields

November 27, 2012

In this post I will describe the solution to the Hardy-Littlewood tuple conjecture for polynomials over large finite fields.
Let’s first recall the

Classical Setting

The prime number theorem says that

\displaystyle \pi(x) := \sum_{0<p\leq x} 1 \sim \frac{x}{\log x}, \quad x\to \infty.

(In fact, the inverse logarithmic function, \textnormal{Li}(x) = \int_{2}^x \frac{1}{\log t}dt, better approximates \pi(x), but we will not get into these subtleties  in this post.)

pi vs li vs x/logx

Let k>0. We will say that a k-tuple (n_1, \ldots, n_k) of integers is a prime tuple if n_i is a prime for every i.

For a k-tuple of integers, say a=(a_1,\ldots, a_k), let \pi_a(x) be the number of 0<n\leq x such that (n+a_1, \ldots, n+a_k) is a prime tuple.

The Hardy-Littlewood tuple conjecture asserts that

\displaystyle \pi_a(x) \sim \mathfrak{S}_a\frac{x}{\log^k x}, \quad x\to\infty ,

where, if \nu(p) denotes the number of mod p residue classes  in \{a_1, \ldots, a_k\}, then the singular series, \mathfrak{S}_a, is given by

\displaystyle \mathfrak{S}_a = \prod_{p} \left(1-\frac{\nu(p)}{p}\right).

First we note that if \nu(p)=p, for some p, then p divides one of the entries of (n+a_1, \ldots, n+a_k), for every n, so \pi_a(x) = O(1). This is consistent with the Hardy-Littlewood conjecture, since in this case, \mathfrak{S}_a=0.

If \nu(p)<p for all primes, then one can show that the product converges, so \mathfrak{S}_a>0. Therefore, in this case, the events that n+a_1 is prime, \ldots, n+a_k is prime, are, in a sense, independent up to the factor of the singular series.

A notable case of The Hardy-Littlewood tuple conjecture is when a=(0,2), in which \pi_a(x) counts twin primes.

The only case of Hardy-Littlewood tuple conjecture that is settled is k=1, which is the prime number theorem. All other cases are hopelessly open.

Polynomials over Finite Fields

In the ring \mathbb{F}_q[t] of polynomials over a finite field with q elements, the monic irreducible polynomials play the role of (positive) prime numbers. To get the analog of the prime number theorem, we let \displaystyle \Pi(q,n) be the number of irreducible monic polynomials f\in \mathbb{F}_q[t] of degree n. Then the analogous prime number theorem tells us that

\displaystyle \Pi(q,n)=\frac{q^n}{n}+O(\frac{q^{n/2}}{n}) \sim \frac{q^n}{n}, \quad q^n \to \infty.

Here q^n plays the role of x in the classical setting, since there are q^n monic polynomials of degree n, and n=\log_q (q^n) plays the role of \log x.

We note that here, in contrast to the classical case, there are several ways q^n can tend to \infty, notably q\to \infty and n\to \infty. Since we have a good error term in the prime number theorem in polynomial rings, the theorem holds in all limits.

However, it happens that some questions in number theory in polynomial rings can be solved only in certain limits. In what follows I’m going to explain a solution of the Hardy-Littlewood tuple conjecture in polynomial rings in the limit q\to \infty, q odd.

For positive integers n,k, a prime power q, and a k-tuple a=(a_1, \ldots, a_k) of polynomials in \mathbb{F}_q[t] each of degree <n, we let \Pi_a(q,n) be the number of monic polynomials f \in \mathbb{F}_q[t] of degree n such that (f+a_1, \ldots, f+a_k) is a k-tuple of irreducible polynomials.

For an analogous Hardy-Littlewood tuple conjecture we shall have to introduce a singular series \mathfrak{S}_a, which is defined analogously to the classical case. That is

\displaystyle \mathfrak{S}_a =\prod_{f}\left(1-\frac{\nu(f)}{q^{\deg f}}\right),

where f runs over all monic irreducibles and \nu(f) is the number of residues of \{a_1, \ldots, a_k\} mod f. We note that \nu(p)\leq k, so if q\to \infty, then each factor will tend to 1, so \mathfrak{S}_a\to 1 (I wasn’t very rigorous here…).

Therefore, in the limit q\to \infty, the Hardy-Littlewood tuple conjecture should say that \Pi_a(q,n) \sim \frac{q^n}{n^k}, as q\to \infty.

At least if q is odd this is true:

Theorem (Bary-Soroker 2012): Let n,k be positive integers, let q be an odd prime power, and let (a_1,\ldots, a_k) be a k-tuple of distinct polynomials of degree <n. Then

\displaystyle \Pi_a(q,n) = \frac{q^n}{n^k} + O_{n,k}(q^{n-1/2})

Here the O_{n,k} notation means that there exists a constant C = C(n,k), depending only on n and k – and NOT on q or on a – such that |\Pi_a(q,n)-\frac{q^n}{n^k}|\leq C q^{n-1/2}.

Before going to the points in the proof several remarks are in place.

  1. The interesting case k=2 is due to Bender and Pollack, see here.
  2. Another interesting special case is when the polynomials in a are constants. In this case there are stronger results by Pollack and by myself.

Let’s explain the main steps of the proof

Basic Strategy

Let F be the generic polynomial of degree n over \mathbb{F}_q. That is to say, F_U(t)=t^n + U_1 t^{n-1} + \cdots +U_n, where U = (U_1, \ldots, U_n) is an n-tuple of variables. Then, we can think of \Pi_a(q,n) as the number of specializations U\mapsto u\in \mathbb{F}_q^n such that (F_u+a_1, \ldots, F_u+a_k) is a k-tuple of irreducibles. (Here F_u = t^n + u_1 t^{n-1} + \cdots + u_n.)

This point of view allows us to break the problem into two parts. The first part is an analog of Hilbert’s irreducibility theorem: The study of irreducible specializations for polynomials. We will see that unlike the classical setting, over finite fields the existence, and the number of irreducible specializations is governed by the Galois group. We will elaborate more below.

For now we shall say that the above consideration leads to the study of the Galois groups of the polynomials F_U + a_1, \ldots, F_U+a_k. Since the coefficients of F_U, hence of F_U + a_i, are variables we get that F_U+a_i is irreducible (in the ring \mathbb{F}_q[U,t], and even {\rm Gal}(F_U+a_i, \mathbb{F}_q(U)) = S_n). This, however, does not suffice for our needs. To apply the above mentioned Hilbert’s irreducibility theorem, we have to know the Galois group of \hat{F}_U = \prod_{i=1}^k (F_U + a_i).

Calculating a Galois Group

Let \mathbb{F} be an algebraic closure of \mathbb{F}_q.  Then the restriction of automorphisms map gives an embedding

\displaystyle {\rm Gal}(\hat{F}_U,\mathbb{F}(U))\to {\rm Gal}(\hat{F}_U,\mathbb{F}_q(U)).

Since \hat{F}_U = \prod_{i=1}^k (F_U + a_i) we have

\displaystyle {\rm Gal}(\hat{F}_U,\mathbb{F}_q(U)) \leq \prod_{i=1}^k {\rm Gal}(F_U+a_i,\mathbb{F}_q(U))\cong S_n^k.

Thus if we show that G = {\rm Gal}(\hat{F}_U,\mathbb{F}(U))\cong S_n^k, we will get that also  {\rm Gal}(\hat{F}_U,\mathbb{F}_q(U))\cong S_n^k.

For each k we have an epimorphism \varphi_i\colon G \to {\rm Gal}(F_U+a_i, \mathbb{F}(U))\cong S_n. Composing each \varphi_i with the signature map S_n\to S_2, we get \nu_i\colon G\to S_2.

Lemma. The map \prod_{i=1}^k \varphi_i \colon G\to S_n^k is surjective if and only if the map \prod_{i=1}^k \nu_i \colon G\to S_2^k is surjective.

By this lemma it suffices to study the maps \nu_i.

Since q is odd (here is the only place we are using the oddness of q), by Kummer theory, each \nu_i corresponds to a square class of an element in the field \mathbb{F}(U). By its definition, \nu_i corresponds to the class of the discriminant of F_U+a_i. Note that \prod_{i} \nu_i is surjective if and only if

  • the discriminant of F_U+a_i is  non-square (this is obvious since the corresponding Galois group is S_n) for each i.
  • The product of any subset of discriminants is non-square.

The latter condition can be achieved by applying a very nice argument of Carmon and Rudnick, taken from here. The details will be given in a later post.

Irreducible Specializations

Let me state a type Hilbert’s irreducibility theorem for finite fields in the setting we have here. I should say that this is not the most general theorem.

Theorem. Let U=(U_1, \ldots, U_n) be a set of variable, with n\geq 1, let F_1,\cdots, F_k be polynomials in \mathbb{F}_q[U,t] of respective degrees d_i and t-degrees n_i. Assume that the Galois group of the product F=F_1\cdots F_k over \mathbb{F}(U) is isomorphic to \prod_{i=1}^k S_{n_i}. Then the number of simultaneous irreducible specializations, i.e. of u\in \mathbb{F}_q^n for which F_i(u,t) is irreducible for i=1, \ldots, k, is

\displaystyle \frac{q^n}{n^k} + O_{D}(q^{n-1/2}),

where D = \sum_{i=1}^k d_i.

A few remarks:

  1. If the Galois groups of F  over \mathbb{F}(U) and over \mathbb{F}_q(U) are isomorphic; denote by G\leq \prod_{i=1}^k S_{n_i} , i.e., if the splitting field of F over \mathbb{F}_q(U) is regular, then the same result as in the above theorem holds with a factor c=c(G)=C(G)/|G| before \frac{q^n}{n}. Explicitly, C(G) is the number on elements g=(g_1, \ldots, g_k) in G\leq \prod_{i=1}^k S_{n_i} such that each g_i is an n_i-cycle.
  2. The theorem also has a version when the splitting field is not regular.
  3. The theorem straightforwardly follows from an explicit Chebotarev’s theorem for function fields. The case n=1 was proved by Geyer and Jarden, and perhaps a future post will discuss it.


As explained before \Pi_a(q,n) is the number of specializations U\to u\in \mathbb{F}_q^n such that F_{u}+a_i is irreducible for every i=1, \ldots, k. The calculation of the Galois group described above tells us that we can apply the theorem on irreducible specializations for F_i = F_U+a_i. Since the degree of F_U+a_i \leq n, we get that the sum D of the degrees is bounded by nk, so

\displaystyle \Pi_a(q,n) = \frac{q^n}{n^k}+O_{n,k}(q^{n-1/2}),

as needed.

square-independent numbers

March 21, 2012

We know that the number of non-square integers in I(x) = \{1, \ldots, x\} is approximately \sqrt{x}. Thus almost all numbers are not squares. Similar result holds when one looks for sets whose elements and product of two elements are all non-square.

Let’s say that a set of integers A is square independent if for every distinct elements a_1, \ldots, a_n of A we have that a_1 \cdots a_n is not a square. An interesting question is

What is the maximal size {\rm si}(x) of square-independent subset A of I(x)?

Clearly, the prime numbers in I(x) are square-independent, thus {\rm si}(x)\geq \pi(x).

At a first glance, one might be led to the conclusion (as I did) that {\rm si}(x) might be much bigger, however this is not the case. Namely

{\rm si}(x) = \pi(x).

After discussing this matter with Lior Rosenzweig, he came up with the following proof (we didn’t check if it appears in the literature, I’ll be glad to have a reference).

Let V be the set of all integers whose prime factors are \leq x modulo the square relation, i.e. n\sim m iff nm = \square. Then V is a vector space over \mathbb{F}_2. Since the set of primes \leq x is a basis of V, we get that \dim V = \pi(x). Since a square-independent set A is an independent set in V, we get that |A| \leq \pi(x). This finishes the proof.

A simplified proof of a theorem of Thornhill

January 29, 2012

In this post I wish to present a simplified proof of a theorem of Christopher Thornhill that settles a part of a conjecture of Moshe Jarden.

Conjecture (Jarden): Let K be a Hilbertian field, let n be an integer, and for each prime number \ell let \rho_\ell \colon G_K \to GL_n(\mathbb{Z}_\ell) be a Galois representation. (Here G_K is the absolute Galois group of K.) Let K_\ell be the fixed field of \ker \rho_\ell in the algebraic closure of K and let N = \prod_\ell K_\ell be the compositum of K_\ell. Then every extension  M/K that is contained in N is Hilbertian.

Jarden proved the conjecture for K a number field and abelian variety $A/K$, that is, \rho_\ell is the family of Galois representations induced from the action of the Tate module T_\ell = \varprojlim A[\ell^n]. The original conjecture of Jarden is about abelian varieties over arbitrary Hilbertian fields.

Thornhill proved the above conjecture under the extra assumption that M/K is Galois.

Theorem (Thornhill): Under the notation of the conjecture of Jarden, if M/K is Galois, then M is Hilbertian.

Thornhill’s proof contains many innovations, in particular his definition of Hilbertian pairs of profinite groups. In this post I wish to give a new and simplified proof of Thornhill’s Theorem.


The first result we need is a consequence of the Larsen-Pink Theorem that we formulate as a fact:

Fact 1(Larsen-Pink’s Theorem):
There exists a constant J(n), depending only on n, such that for every G\leq GL_n(\mathbb{F}_\ell) there exist normal subgroups G_3\leq G_2\leq G_1\leq G of G such that [G:G_1]\leq J(n), G_1/G_2 is a product of finite simple groups of Lie type in characteristic \ell, G_2/G_3 is abelian, and G_3 is an \ell group.

Lemma 1: There exists a constant m(n) such that for every G\leq GL_n(\mathbb{Z}_\ell) there exists a series of open subgroups

H_{m(n)} \leq \cdots \leq H_0 = G

such that  H_m(n) is pro-\ell, and for every 1\leq i\leq m(n)

  1. H_i is normal in G,
  2. if i is even, then H_{i-1}/H_i is abelian,
  3. if i is odd, then H_{i-1}/H_i is a product of finite simple groups.

Proof. The kernel N of GL_n(\mathbb{Z}_\ell)\to GL_n(\mathbb{F}_\ell) is pro-\ell (See Page 51 of Analytic pro-p groups). Since G/G\cap N\leq GL_n(\mathbb{F}_\ell), Fact 1 gives normal subgroups N\cap G \leq G_3\leq G_2\leq G_1\leq G of G such that [G:G_1]\leq J(n), G_1/G_2 is a product of finite simple groups, G_2/G_3 is abelian, and G_3/G\cap N is an \ell group. Let G_1=K_r\leq \cdots \leq K_0=G be normal subgroups of G such that for each i the group K_{i-1}/K_i is a minimal normal subgroup of G/K_{i}. Then for each 1\leq i\leq r we have K_{i-1}/K_i is a product of finite simple groups. Note that r\leq \log_2 J(n).

For i=0, \ldots, r set H_i = K_i and set H_{r+1}=G_2, H_{r+2} = G_3. Then H_{i-1}/H_{i} is either isomorphic to an abelian group of isomorphic to a product of finite simple groups, and H_{r+3} is pro-\ell. Extend the sequence by adding the same groups, if necessary, to assume that H_{i-1}/H_i is abelian for even i and a product of finite simple groups for odd i. Then the length of the series can at most be doubled, hence this length u is bounded by 2(r+2) \leq 2(\log_2 J(n)+2) =:m(n). We extend the series even further by setting H_{k} = H_u, for u< k\leq m(n), to conclude that the length is exactly m(n). QED

The second result we need is a Hilbertianity criteria that are well known to experts:

Lemma 2: Let K be Hilbertian and L/K a Galois extension such that Gal(L/K) is either

  1. finitely generated,
  2. abelian,
  3. pro-nilpotent but not pro-p for every prime p, or
  4. a product of finite simple groups.

Then L is Hilbertian.

Proof. The first three cases appear in Jarden-Lubotzky.

If Gal(L/K) is a product of one finite simple group, then it is finitely generated and thus we are done by (1). If Gal(L/K) is a product of finite simple groups that has at least two factors, then we have  Gal(L/K)= H\times G, where H,G\neq 1. If we let N,M be the respective fixed fields of H,G in L, then by the Galois correspondence, L=NM and N\cap M = L, so the assertion follows by Corollary 13.8.4 of Field Arithmetic. QED

Here is a very well known result about products of simple groups.

Lemma 3: Let G = \prod_{i\in I} S_i, where S_i is a non-abelian finite simple groups for every i\in I. Then for every normal subgroup N of G there exists I_0\subseteq I such that N = \prod_{i\in I_0} S_i \times \prod_{i\in I\smallsetminus I_0} 1.

Lemma 4: Let \{L_{i}/K\mid i\in I\} be a family of Galois extensions and let L=\prod_{i\in I} L_i. If L_i/K is abelian for each i\in I, then L/K is abelian. If Gal(L_i/K) is isomorphic to a product of non-abelian finite simple groups for each i\in I, then Gal(L/K) is isomorphic to a product of non-abelian  finite simple groups.

Proof. The abelian case is trivial. Assume thus that G_i=Gal(L_i/K) = \prod_{j\in J_i} S_{ij}, where S_{ij} is a non-abelian finite simple groups for each i\in I and j\in J_i. Take i\neq i'. Then Gal(L_iL_{i'}/K) = G_i \times_A G_{i'}, where A is a common quotient of G_i and G_{i'}. Let N_i be the  kernel of G_i\to A. Then there is a partition J_i = J_{0i} \cup J_{1i} such that N_{i}=\prod_{j\in J_{0i}} S_{ij} \times \prod_{j\in J_{1i}}1 (Lemma~\ref{lem:norsimp}). Thus A\cong \prod_{j\in J_{1i}} S_{ij} and G_{i} \cong N_{i} \times A. Similarly G_{i'} \cong N_{i'}\times A, where N_{i'}=\prod_{j\in J_{0i'}} S_{i'j} \times \prod_{j\in J_{1i'}}1.  So Gal(L_{i}L_{i'}/K)=G_i\times_{A} G_{i'} \cong N_{i}\times N_{i'} is isomorphic to a product of finite simple groups. This proves the assertion for |I|=2. In general one applies transfinite induction. QED

Fact 2: A closed subgroup of GL_n(\mathbb{Z}_\ell) is finitely generated.

Proof of Thornhill’s theorem.

By Lemma 1, for each \ell, we have a series of normal subgroups

H_{m(n)} \leq \cdots \leq H_0 = {\rm im}(\rho_{\ell}) = Gal(K_{\ell}/K)

satisfying the properties of the lemma. Let K_{\ell,i} be the fixed field of H_{i} in K_\ell. Then we have a tower of Galois extensions of K,

K= K_{\ell,0} \subseteq \cdots \subseteq K_{\ell,m(n)} \subseteq K_{\ell}

such that Gal(K_{\ell,i}/K_{\ell,i-1}) is abelian for even i and isomorphic to a product of non-abelian finite simple groups for odd i and such that Gal(K_\ell/K_{\ell, m(n)}) is pro-\ell.

Let K_i = \prod_{\ell} K_{\ell,i}, i=0, \ldots, m(n); in particular K_0=K. Let N=\prod_{\ell} K_\ell.
First note that

\begin{array}{lcl}  \Gamma_\ell &=& Gal(K_\ell K_{m(n)}/K_{m(n)}) \cong Gal(K_\ell/ K_{m(n)} \cap K_\ell)\\ &\leq& Gal(K_\ell/K_{\ell,m(n)}) \leq GL_n(\mathbb{Z}_\ell).\end{array}

Thus \Gamma_\ell is a pro-\ell group that is finitely generated (the latter follows from Fact 2). We also get that Gal(N/K_{m(n)}) \cong \prod_{\ell} \Gamma_\ell is pro-nilpotent.

Next, we show, by induction on i, that Gal(K_i/K_{i-1}) is either abelian or a product of non-abelian finite simple groups. Indeed, since each K_{\ell,1}/K has Galois group a product of non-abelian finite simple groups, by Lemma 4, Gal(K_1/K_0) is isomorphic to a product of finite simple groups, hence the induction basis. The Galois group Gal(K_{\ell, i+1}/K_{i}) is isomorphic to a normal subgroup of Gal(K_{\ell, i+1}/K_{\ell,i}). Hence either Gal(K_{\ell, i+1}K_{i}/K_{i}) is abelian for every \ell or  Gal(K_{\ell, i+1}K_{i}/K_{i}) is a product of non-abelian finite simple groups for every \ell (Lemma 3). By Lemma 4 we get that Gal(K_{i+1}/K_{i}) is either abelian or a product of non-abelian finite simple groups, as needed.

Now let M a Galois extension of K that is contained in N=\prod_{\ell} K_\ell. Let M_i = M\cap K_{i}. Then Gal(M/M_{m(n)}) is a quotient of Gal(N/K_{m(n)}), hence pro-nilpotent.  Moreover, if Gal(M/M_{m(n)}) is a pro-p group for some prime p, then it is a quotient of the p-sylow subgroup \Gamma_p of Gal(N/K_{m(n)}). Therefore, since \Gamma_p is finitely generated, so is Gal(M/M_{m(n)}).

For each 1\leq i\leq m(n)Gal(M_{i}/M_{i-1}) is a quotient of Gal(K_i/K_{i-1}). Thus it is either abelian or a product of non-abelian finite simple groups (Lemma 3). Applying Lemma 2 repeatedly yields that M_1, M_2, \ldots, M_{m(n)} are all Hilbertian.

Since we have showed that M/M_{m(n)} satisfies either Condition 1 or Condition 3 of Lemma 2, we get that M is Hilbertian.

Geometric embedding problems

December 20, 2011

Embedding probelms

Let K be a field. A finite embedding problem for a field K is composed of two finite groups G,H and continuous epimorphisms \alpha\colon G_K\to H and \beta \colon G\to H. Here G_K denotes the absolute Galois group of K.

The goal of this post is to explain a construction of embedding problem for a field K that come from geometric objects. I’ll give the definitions and discuss some applications. Probably in later posts I’ll discuss other applications.

A weak solution of the embedding problem

\mathcal{E} = (\alpha\colon G_K \to H, \beta\colon G\to H)

is a continuous homomorphism \gamma \colon G_K \to G such that \alpha =\beta\circ \gamma. A weak solution is called proper solution if it is surjective.

Note that \alpha factors through the restriction map G_K\to Gal(L/K), where L is the fixed field of \ker \alpha. Thus \alpha induces an isomorphism (via the first isomorphism theorem)  \bar\alpha\colon Gal(L/K) \to H. Therefore we may assume that H = Gal(L/K) and \alpha is the restriction map, when it is convenient.

If \gamma is a weak solution of \mathcal{E}, the the fixed field F of \ker \gamma is called the solution field, and L embeds inside F in such a way that the restriction map Gal(F/E) \to G(L/K) coincides with the map induced by \beta.

Geometric embedding problems

We now construct a special type of embedding problems coming from covers of algebraic varieties. The construction is generic in a sense, hence the embedding problems, actually, are defined on the level of function fields, and we shall use the terminology of function fields.

Let K be a field. Assume that E is a finitely generated regular extension of K and that F/E is a finite Galois extension with Galois group G. Let L be the algebraic closure of K in F. Then L/K is a Galois extension, and if we denote H=Gal(L/K) we have a natural restriction map \beta \colon G\to H. This map \beta is surjective because E is regular over K. Thus

\mathcal{E}(F/E) = (\alpha \colon G_K \to H, \beta\colon G\to H),

in which \alpha is the restriction map, is a finite embedding problem for K. We call such an embedding problem geometric.

Assume that R\subseteq S is an integral ring extension whose respective quotient field extension is E\subseteq F. Let us assume that R is finitely generated over K, i.e. R=K[x_1, \ldots, x_n], for some x_1, \ldots, x_n \in E. Further assume that L\subseteq S. Choose generators y_1, \ldots, y_k of S over R, i.e. S = R[y_1, \ldots, y_k]. Write d_i for the discriminant of y_i over E. Then d_i \in R.

Consider a homomorphism \phi \colon R\to K that is identity on K with the property that \phi(d_i)\neq 0. Extend \phi to a homomorphism \Phi \colon S \to K_s that is trivial on L. Here K_s is the separable closure of  K.

For every \sigma \in G_K the equation

\sigma\Phi( x) = \Phi(\Phi^{*}(\sigma x), \quad \forall x\in S

uniquely defines a element \Phi^*(\sigma)\in G. Moreover the map \Phi^*\colon G_K \to G is a continuous homomorphism and the fixed field of the kernel is \Phi(S). Since \Phi is the identity on L we have \alpha = \beta\circ \Phi^*. Thus every such \Phi defines a weak solution of \mathcal{E}(F/E). We call solutions obtained from such homomorphisms geometric solutions. (In general any \phi whose kernel is etale in S defines a geometric solution, this is a special case.)


Example 1. Assume E = \mathbb{Q}(x) and F = \mathbb{Q}(\sqrt x). Then L=\mathbb{Q} and

\mathcal{E}(F/E) = (G_\mathbb{Q} \to 1, \mathbb{Z}/2\mathbb{Z} \to 1).

Take R = \mathbb{Q}[x] and S = R[\sqrt x]. The discriminant of \sqrt x is 4x. Consider the homomorphism \phi\colon R\to \mathbb{Q} defined by \phi(x) = d, where d\neq 0. Then \Phi^* \colon G_K \to \mathbb{Z}/2\mathbb{Z} is defined by \Phi^*(\sigma\sqrt{x}) = \pm \sqrt{x}  iff \sigma \sqrt{d} = \pm \sqrt{d}.

Example 2. Assume E = \mathbb{Q}(x) and F = \mathbb{Q}(x,\sqrt 2). Here L = \mathbb{Q}(\sqrt{2}) and

\mathcal{E}(F/E) = (G_\mathbb{Q} \to \mathbb{Z}/2\mathbb{Z}, \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}).

In particular any homomorphism will yield the same solution.

Example 3. Assume E = \mathbb{Q}(x) and F = \mathbb{Q}(\sqrt x + \sqrt 2). Then

\mathcal{E}(F/E) = (G_\mathbb{Q} \to \mathbb{Z}/2\mathbb{Z}, \mathbb{Z}/2\mathbb{Z}\times \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}).

What are the geometric solutions here?

Application to pseudo algebraically closed fields

We call a field K pseudo algebraically closed, or in short PAC, if every non-void absolutely irreducible K-variety has a K rational point. Note that K is PAC if and only if for every finitely generated regular extension E over K and for every finitely generated ring R over K whose quotient field is E there exists a homomorphism \phi \colon R\to K.

Lemma: A field K is PAC if and only if every geometric embedding problem for K has a geometric solution.

Proof. Assume K is PAC and let \mathcal{E}(F/E) be a geometric embedding problem for K. Let x_1, \ldots, x_n be generators of E over $K$, let R_0 = K[x_1,\ldots, x_n]. Let y\in F be a generator of F/E that is integral over R  and let d\in E be the discriminant of y over E.

We set R = R_0[d^{-1}] and S = R[y]. As K is PAC, we have a homomorphism \phi \colon R\to K (recall that E, the quotient field of R, is regular over K, by the definition of geometric embedding problems). Extend \phi to a homomorphism \Phi \colon S\to K_s, to get that \Phi^* is a geometric solution of \mathcal{E}(F/E).

Next assume that every geometric embedding problem has a geometric solution. Let E be a regular extension of K. The geometric embedding problem \mathcal{E}(E/E) has a geometric solution that corresponds to a homomorphism \phi \colon R \to K, for some finitely generated K-algebra R. Hence K is PAC. QED

Corollary: The absolute Galois group of a PAC field is projective, i.e. every finite embedding problem for K has a weak solution.

Proof: For an epimorphism \beta\colon G\to H we define a function field extension F/E. Let L be the fixed field of the kernel of \alpha. Let \{x_\sigma\mid \sigma\in G\} be |G| variables and let F = L(x_\sigma\mid \sigma\in G). Define the “diagonal” action of G on F by

\tau x_\sigma = x_{\sigma \tau}, \quad \sigma, \tau \in G


\tau l = \beta (\tau) l, \quad \tau\in G and l\in L.

Let E be the fixed field of G in F under the above action. The action is free, hence F/E is Galois with Galois group G. Moreover it is a rather easy exercise in Galois theory to check that E is regular over  K. Clearly L is the algebraic closure of K in F. Summing up \mathcal{E}(F/E) = (\alpha,\beta). Thus it has a weak geometric solution, in particular a weak solution. QED

Theorem: Let K be a PAC field, let \mathcal{E}(F/E) be a geometric embedding problem for K, and let \theta be a weak solution. Then \theta is geometric. Moreover, for every 0\neq e\in E there exist a K-algebra S with quotient field F and a homomorphism \Phi\colon S\to K_s such that  \Phi induces a geometric solution, \Phi^* = \theta and \Phi (e) \neq 0.

I will skip the proof in the meanwhile, I’ll just say that it involves the field crossing argument which deserves a post on itself.

Corollary: Let K be a PAC field of characteristic 0. Then every separable extension of degree n is generated by a root of a polynomial X^n + X^{n-1} + a.

Proof: The polynomial f(T,X) = X^n + X^{n-1} + T has Galois group S_n over K_s(T) (this is classical result in Galois theory that appears, e.g., in Serre’s Topic in Galois Theory). Let F be the splitting field of f(T,X) over K(T). Let R = K[T] and S = R[x_1, \ldots, x_n], where x_1, \ldots, x_n are the roots of f in F. Then \mathcal{E}(F/K(T)) = (G_K\to 1, S_n \to 1).

Let L/K be a separable extension of degree n of K. Then the action of G_K on the cosets of G_L yields a homomorphism \theta\colon G_K \to S_n. By the theorem we have a homomorphism \Phi\colon S\to K_s such that \Phi(R) = K and \Phi^*=\theta.

The formula \sigma \Phi(x) = \Phi(\Phi^*(\sigma)x) for \sigma \in G_K and x\in S implies that \sigma\in G_L if and only if \Phi^{*} (\sigma) stabilizes 1 if and only if \Phi^*(\sigma) \in Gal(F/K(T,x_1)) if and only if \sigma \in G_{\Phi(K[T,x_1]}. Thus L = \Phi(K[T,x_1]) = K(\Phi(x_1)). This finishes the proof since \Phi(x_1) is a root of X^n + X^{n-1} + \Phi(T). QED