Minkowski’s theorem

In the post about Diophantine approximation, we saw that in order to find “good” rational approximations to a real number, it is enough to prove that given a lattice and a “big enough” box around the origin, the box must contain a nonzero lattice point. This very intuitive result is due to Minkowski which started what is today known as the geometry of numbers. After several examples in order to get some intuition, we shall prove this result and see several of its applications.

The most trivial lattices are the 1-dimensional lattices, and in these cases a box is simply an interval [a,b], and as we want to consider symmetric boxes around the origin, we will work with intervals of the form B=[-a,a] for a>0. This box contains a nonzero point from the lattice L=n\mathbb{Z} if and only if a\geq n, or in different notation vol(B)\geq 2\cdot covol(L).

The next lattice we consider is L=n\mathbb{Z}\times m\mathbb{Z}\subseteq \mathbb{R}^2, and a symmetric box there has the form B=[-a,a]\times [-b,b] for some a,b>0. Such a box will contain a nonzero lattice point if and only if a\geq n or b\geq m, which is certainly true if ab\geq mn. Using our lattice notations we see that B contains a nonzero lattice point if vol(B)=4ab\geq 4nm=4\cdot covol(L). A similar argument will show that in dimension 3 the right inequality will be vol(B)\geq 8\cdot covol(L), and more generally in dimension n we will require that vol(B) \geq 2^n covol(L).

Up until now we worked with boxes and box-lattices, which work well together because we can find fundamental domains for such lattices which are boxes (which is not true for the general lattice). We can also ask a similar question where B is a general parallelepiped, a ball, an ellipsoid etc and L is a general lattice. How big can we take B before it must contain a nonzero point from L?



At this point we should ask ourselves what are the right conditions for a set B so that we have a result of the form “if B is big enough, then it contains a nonzero lattice point”. After playing around a little bit with examples, you should see that the following two properties are very important.

Definition: Let \Omega\subseteq \mathbb{R}^n.

  1. \Omega is called convex if whenever x,y\in \Omega and t\in[0,1], then tx+(1-t)y\in \Omega.
  2. \Omega is called symmetric if whenever x\in\Omega we also have that -x\in \Omega.

The convexity property just means that whenever \Omega contains two points, it must also contain the line joining these two points. If we leave it out, then we can take a very big ball B=B(0,R), and remove from it the small balls B(\gamma,\varepsilon) for all \gamma\in L. If we choose \varepsilon>0 to be small enough, then the size of B with these holes will be roughly the size of B without the holes, which grows to infinity as R\to\infty. In other words, we can find very big nonconvex sets (though still symmetric) which don’t contain any nonzero lattice points. The symmetric condition is also important. In the example of the square-lattice \mathbb{Z}^2\subseteq \mathbb{R}^2 we can take the set [-a,a]\times [0.1,0.9] where a is as big as we want to get that B is convex and doesn’t contain any lattice point. Even if we add the condition that B must contain the origin, we can take the triangle B with vertices (0,0), (a,\frac{1}{2}),(-a,\frac{1}{2}) to get a counter example.

After this introduction, we are ready to state the theorem.

Theorem (Minkowski): Let L\subseteq \mathbb{R}^n be a lattice and B\subseteq \mathbb{R}^n be a measurable set which is convex and symmetric.

  1. If vol(B)>2^n covol(L), then B contains a nonzero lattice point.
  2. If in addition B is a closed set, then it is enough to require that vol(B)\geq 2^n covol(L).

Remark: You should check that these bounds are tight, i.e. we cannot change the strict inequality in (1) to \geq, and we cannot take a smaller bound than 2^n covol(L) in (2).

Proof: The main idea of the proof is that given a region A and \gamma\in L, the region A contains a lattice point if and only if \gamma+A contains a lattice point. This let us decompose \Omega to small parts and move them around to create new sets which are easier to work with.

The first step of the proof is due to Blichfeldt and states that any measurable set A with vol(A)>covol(L) contains two points a_1,a_2\in A such that a_1-a_2\in L. Otherwise, take some fundamental domain F for L, and decompose A to small parts and move each one of them into F using elements from L. Since vol(A)>covol(L)=vol(F), there will be overlap somewhere, which will produce two points as required. This is in a sense the pigeonhole principle which says that if we try to jam a “big set” into a “smaller set”, then two points must go to the same place.

Below we have the example with the square-lattice and the standard fundamental domain [0,1]\times [0,1]. After decomposing the set on the left and moving it into the fundamental domain (using translations from the lattice) we get an overlap in two triangles.

More formally, write A as a disjoint union (up to measure zero)

A=A\cap (\bigcup_{\gamma \in L} \gamma+F)= (\bigcup_{\gamma \in L} A\cap(\gamma+F)) \\  |F|<|A|= |(\bigcup_{\gamma \in L} A\cap(\gamma+F))|=\sum_{\gamma \in L} |A\cap(\gamma+F)|=\sum_{\gamma \in L} |(-\gamma+A)\cap F|

In the second line we used the property of fundamental domains which says that its translations by distinct lattice points are disjoint up to measure zero, so that the measure of the union is the sum of the measures. The sets (-\gamma+A)\cap F are all inside F and the sum of their measures is more than vol(F), so we can find \gamma_1,\gamma_2\in L such that (-\gamma_1+A)\cap (-\gamma_2+A)\cap F\neq \emptyset, which implies that there are a_1,a_2 \in A such that a_1-a_2=\gamma_1-\gamma_2\in L.

Now that we have proven Blichfeldt’s theorem, suppose that we are given \Omega\subseteq \mathbb{R}^n such that vol(\Omega)>2^n covol(L). It then follows that vol(\frac{1}{2}\Omega)=\frac{1}{2^n}vol(\Omega)>covol(L), so by Blichfeldt’s theorem we can find two distinct points a_1,a_2 \in \Omega such that \frac{1}{2}a_1+\frac{1}{2}(-a_2)=\gamma\in L. If we also know that \Omega is symmetric, then (-a_2)\in \Omega, and if it is also convex, then \gamma \in \Omega which is exactly what we needed.

The proof for closed sets is left as an exercise. \square

The first application for Minkowski’s theorem that we have already seen is to Diophantine approximation. Recall, that finding “good” rational approximations for some x\in \mathbb{R} translates to finding nontrivial points from the lattice L=span_{\mathbb{Z}}\{(1,0),(x,1)\} inside the boxes B_R=  [-\frac{1}{R},\frac{1}{R}]\times [-R,R] as R\to \infty. Since L has covol(L)=1 and B_R are closed boxes of size 2^2, we can apply Minkowski’s theorem and get the required rational approximations.

I will leave it as an exercise (for now) to show that a similar result is true for higher dimension – given a set of numbers \alpha_1,...,\alpha_n and r_1,...,r_n\geq 0 such that \sum r_i =1, we can find infinitely many distinct integer tuples p_{1},...,p_{n},q such that |\alpha_k-\frac{p_{k}}{q}|\leq \frac{1}{q^{1+r_k}} for every k.

There are many applications to Minkowski’s theorem in mathematics which seem to come out of nowhere. For the algebraic number theorists among you, Dirichlet’s unit theorem (which I will cover in a later post) and the fact that the ideal class group is finite are usually proven using Minkowski’s theorem. Another application is Lagrange’s four squares theorem which states that any positive integer can be written as a sum of four squares of integers.

Here is a baby case application of Minkowski’s theorem for number theory.  Let p\in \mathbb{Z} be a prime, and consider the problem of finding a solution to p=x^2+y^2 where x,y are integers. This problem is in a sense the first step in algebraic number theory, since such a solution is equivalent to a decomposition p=(a+bi)(a-bi) in the Gaussian integers \mathbb{Z}[i]. If p=2, then we have the solution 2=1^2+1^2, so let us assume that p is odd. Trying to solve this for small primes, we see that while 5,13,17,29 have a solution, the primes 3,7,11,19,23 don’t have any solution, which lead us to consider the residue of p mod 4. If p\equiv_4 3, then there is no solution in integers because there is no solution mod 4 – this follows from the fact that the squares mod 4 are only 0,1 and you cannot solve 3\equiv_4 a^2+b^2. If p\equiv_4 1, then a solution p=a^2+b^2 implies that (\frac{a}{b})^2\equiv_p -1. Actually, the second expression is always true for such primes (HINT: write (p-1)! in two different ways), and we can ask whether the inverse is still true.

Suppose that there is some integer q such that q^2\equiv_p -1 and we want to find a,b such that a^2+b^2=p. If this is the case, then (\frac {a}{b})^2\equiv_p -1 \equiv_p q^2, so up to a choice of sign we have that a\equiv_p bq (up to a sign), or in other words (a,b)=n(q,1)+m(p,0)+k(0,p). Running over all possible n,m,k\in \mathbb{Z} produces the 2-dimensional lattice  L=span_\mathbb{Z}\{(q,1),(p,0)\} which has covolume p. Thus, the nonzero vectors (a,b) in this lattice always satisfy p \mid a^2+b^2 and we are looking for such a vector which satisfy p=a^2+b^2. Note that such a vector must have the minimal (nonzero) length in the lattice. In search of small vectors in this lattice,  let us look on the open ball B=B(0,\sqrt{2p}) which has area \pi 2p>4p=2^2 covol(L). It follows that we can apply Minkowski’s theorem and find a nonzero lattice point (a,b) such that a^2+b^2<2p and p\mid a^2+b^2 so that a^2+b^2=p, and we are done.

Probably the most important application of Minkowski’s theorem is showing that lattices have “small” nonzero vectors. More precisely, let C_nR^n be the volume of an n-dimensional ball of radius R and let L\subseteq \mathbb{R}^n be a lattice. By Minkowski’s theorem if C_nR^n \geq 2^n covol(L) is big enough, then the closed ball \overline{B(0,R)} of radius R around the origin contains a nonzero lattice point, or in other words, there exists 0\neq v\in L such that |v|\leq 2\sqrt[n]{\frac{covol(L)}{C_n}}. It is quite hard in general to actually find the shortest nonzero vector from a lattice, and though I’m not sure, probably even finding the vector that Minkowski’s theorem promises is not that trivial. Finding a good algorithm for this problem (or show that there isn’t such) is an important open problem in computer science, and in cryptography in particular.


Going back to the problem of Diophantine approximation, the family of sets that we had there are boxes of the form B_R=  [-\frac{1}{R},\frac{1}{R}]\times [-R,R]. These boxes not only have the same size (which is big enough in order to apply Minkowski’s theorem), but are also very similar to one another. Actually, if we take only one box [-1,1]\times [-1,1]\subseteq \mathbb{R}^2 and multiply it by the matrices \left(\begin{array}{cc} R & 0\\ 0 & \frac{1}{R} \end{array}\right), then we will get the entire family of boxes. If we let R\in (0,\infty), then this family of matrices is actually a group isomorphic to (\mathbb{R}_{>0},\times). As always, mathematical objects become much more interesting once we find a nice group which acts on them and “behaves” nicely with respect to the properties that we want to check. Thus our next step (in some future post) will be to understand this group action and reformulate all of our results so far using this new language.


This entry was posted in Diophantine approximation and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s