We have now seen in the previous two posts a few results about prime numbers. In the first one, we saw why primes are so important and used Euclid's proof to show that there are infinitely many primes. Moreover, we conjectured that if $a$ and $n$ are coprime, then there are infinitely many primes $p$ with $p \equiv a \pmod{n}$. And to top it all, we also conjectured that these primes are evenly distributed among the residues $a$ which are coprime to $n$.
Writing conjectures down is much easier than actually proving them, so as a first step, in the second post we reproved the fact that there are infinitely many primes, this time using Euler's proof. The main idea was to consider the infinite sum $\sum_p \frac{1}{p}$ over all the primes and to show that it diverges to infinity, so in particular there are infinitely many primes.
The set of prime numbers has very little structure that we know of, and actually what we are trying to do is to find some of this structure. For that, we moved step by step to worlds with more structure, in which we have many more tools that can be used to tackle this problem – from the set of prime numbers to the integers (which have addition and multiplication) and from there to the real numbers (which also have a metric, derivatives, integrals, etc.). Then, if we are lucky enough, we can prove the claim in the world of real numbers and pull it back to the integers and then back to the prime numbers. In this post, we will find the last step, which allows us to pass to even smaller sets of primes, according to their equivalence classes modulo $n$, and prove Dirichlet's theorem:
THEOREM: For any coprime integers $a$ and $n$, there are infinitely many primes $p$ with $p \equiv a \pmod{n}$.
The first attempt
For simplicity, let us consider the case of $n = 3$ and $a \in \{1, 2\}$. For ease of notation, let us denote

$$P_a = \sum_{p \equiv a \pmod{3}} \frac{1}{p}, \qquad N_a = \sum_{m \equiv a \pmod{3}} \frac{1}{m},$$

where the first sum is over the primes and the second over the positive integers. Note in particular that $P_1 + P_2 + \frac{1}{3} = \sum_p \frac{1}{p}$, which we already know to be infinite, and similarly $N_0 + N_1 + N_2 = \sum_{m=1}^{\infty} \frac{1}{m}$ is the harmonic series.
We would like to show that $P_1$ and $P_2$ are infinite, and in a sense that they are the "same" infinity.
A straightforward attempt at generalizing Euler's proof from the previous post is to somehow show that $N_a$ is infinite, and then pull it back to the prime numbers and show that $P_a$ is infinite.
For example, for $a = 1$ we will get the sum $N_1 = \frac{1}{1} + \frac{1}{4} + \frac{1}{7} + \frac{1}{10} + \cdots$. Showing that such a sum is infinite follows almost automatically from the divergence of the harmonic series, since

$$N_1 = \sum_{k=0}^{\infty} \frac{1}{3k+1} \geq \sum_{k=0}^{\infty} \frac{1}{3k+3} = \frac{1}{3} \sum_{k=1}^{\infty} \frac{1}{k} = \infty.$$
Note that $a$ almost doesn't play a role in the computation above, and in a sense $N_1$ and $N_2$ diverge "at the same speed" (which is more or less what we want in the end for the primes). More specifically, if we only sum over the integers with $m \leq M$ and look at the difference

$$N_1(M) - N_2(M) = \sum_{\substack{m \leq M \\ m \equiv 1 \pmod{3}}} \frac{1}{m} - \sum_{\substack{m \leq M \\ m \equiv 2 \pmod{3}}} \frac{1}{m} = \frac{1}{1} - \frac{1}{2} + \frac{1}{4} - \frac{1}{5} + \cdots,$$

it is uniformly bounded (at most 1) for all $M$. Since both sums diverge to infinity while their difference stays bounded, we consider them as diverging at the same speed.
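This bounded difference is easy to probe numerically. Here is a quick sanity check (a sketch, not part of the proof; the helper name `partial_sums` and the cutoffs are mine):

```python
# Partial sums of 1/m over m <= M, separated by the residue of m mod 3.
# Each sum grows like log(M)/3, but their difference stays below 1,
# since the terms 1/1 - 1/2 + 1/4 - 1/5 + ... form an alternating series.
def partial_sums(M):
    n1 = sum(1.0 / m for m in range(1, M + 1) if m % 3 == 1)
    n2 = sum(1.0 / m for m in range(1, M + 1) if m % 3 == 2)
    return n1, n2

for M in (100, 10000, 1000000):
    n1, n2 = partial_sums(M)
    print(M, round(n1, 4), round(n2, 4), round(n1 - n2, 4))
```

Both columns keep growing as $M$ increases, while the last column barely moves.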
The hard part is to move from the integers to the prime numbers. It would be really helpful if we could have a result like

$$\sum_{m \equiv 1 \pmod{3}} \frac{1}{m} = \prod_{p \equiv 1 \pmod{3}} \left(1 - \frac{1}{p}\right)^{-1},$$

which is similar to what we had in the previous post in Euler's proof. However, trying to open the brackets doesn't work as it did last time. A general integer $m \equiv 1 \pmod{3}$ appearing on the left doesn't have to be a product of primes $p \equiv 1 \pmod{3}$ as appearing on the right, so Euler's proof method doesn't exactly work here. For example, we have $4 \equiv 1 \pmod{3}$, while its decomposition is $4 = 2 \cdot 2$, where $2 \equiv 2 \pmod{3}$.
We already reached our first roadblock – so what can we do?
The second attempt: looking at the difference
There are two results that we already know about the $P_a$. The first one is that for $a = 0$ it is quite trivial, since the only prime divisible by 3 is 3 itself, so $P_0 = \frac{1}{3}$ (which is, by definition, finite). The second result is that

$$P_1 + P_2 = \sum_{p \neq 3} \frac{1}{p} = \infty.$$
What we would like to show is that both $P_1$ and $P_2$ are infinite, and since we are also a bit greedy, we would also like to show that in a sense they are the same infinity, i.e. $P_1 = P_2$. This equality doesn't say too much if they are both infinity; however, $P_1 - P_2 = 0$ (or at least, that the limit defining this difference is finite) is an entirely different story.
If this is indeed the result that we expect to see, what should we prove in order to get it? In the prime notation, the summation above is

$$P_1 - P_2 = \sum_{p \equiv 1 \pmod{3}} \frac{1}{p} - \sum_{p \equiv 2 \pmod{3}} \frac{1}{p},$$
which is a sum with a sort of alternating plus/minus signs. To separate the two parts of the sum, namely (1) the primes and (2) the signs, so we can order our thoughts, let's define

$$\chi(m) = \begin{cases} 1 & m \equiv 1 \pmod{3} \\ -1 & m \equiv 2 \pmod{3} \\ 0 & m \equiv 0 \pmod{3}, \end{cases}$$
so that now our summation from above can be written simply as $\sum_p \frac{\chi(p)}{p}$. This "simple" sum can be done over the integers as well, giving us

$$\sum_{m=1}^{\infty} \frac{\chi(m)}{m} = \frac{1}{1} - \frac{1}{2} + \frac{0}{3} + \frac{1}{4} - \frac{1}{5} + \frac{0}{6} + \cdots,$$
which is truly an alternating sum (the signs are $1, -1, 0$ periodically). This should already sound familiar from calculus class, so we can return to Euler's proof method:
- Show that $\sum_{m=1}^{\infty} \frac{\chi(m)}{m}$ converges to a finite number.
- Pull this result back to the prime numbers, and show that $\sum_p \frac{\chi(p)}{p} = P_1 - P_2$ converges to a finite number.
- Together with what we already know, namely $P_1 + P_2 = \infty$, conclude that both $P_1$ and $P_2$ diverge to infinity, and in a sense they diverge at the same speed.
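Before proving any of this, we can at least probe the plan numerically (a sketch with arbitrary cutoffs; the helper names `primes_up_to` and `chi` are mine):

```python
# Sieve of Eratosthenes, then compare the two combinations of prime sums:
# the "plus" combination (P1 + P2) should slowly diverge, while the
# "minus" combination (the sum of chi(p)/p) should stay bounded.
def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

def chi(m):
    return {0: 0, 1: 1, 2: -1}[m % 3]

primes = primes_up_to(100000)
plus = sum(1.0 / p for p in primes if p % 3 != 0)    # P1 + P2
minus = sum(chi(p) / p for p in primes)              # P1 - P2
print(round(plus, 4), round(minus, 4))
```

The first number keeps creeping up as we raise the bound, while the second one hovers around a fixed value, exactly as the bullets predict.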
Note that the two "equations" $P_1 + P_2 = \infty$ and $P_1 - P_2 < \infty$ (i.e. the difference is finite) imply that each of $P_1$ and $P_2$ is about "half" of infinity! We will later give the exact definition for this, but for now the moral of the story should be that in order to understand $P_1$ and $P_2$, we must consider them together through two specific combinations – one time the addition and one time the subtraction.
This holistic idea is something that comes up in many parts of mathematics. And just to see that we already know of this type of behavior, consider the complex numbers. Suppose that you have a complex number $z = x + iy$, and you want to find its real and imaginary parts, but for some reason it is a difficult task. However, you might be able to find the sum and the difference with the conjugate, namely $z + \bar{z} = 2x$ and $z - \bar{z} = 2iy$. In this case, you can always go back and get the real part $x = \frac{z + \bar{z}}{2}$ and the imaginary part $y = \frac{z - \bar{z}}{2i}$. The behavior that we see here – that the pair of real and imaginary parts, and the pair of the complex number and its conjugate, are in a sense dual to one another, and knowing one is knowing the other – is exactly what we see with these $P_a$ and their sum and difference.
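The complex-number version of this change of basis fits in a few lines (a toy illustration, with an arbitrary choice of $z$):

```python
# From the pair (z + conj(z), z - conj(z)) we recover (Re z, Im z).
z = 3 + 4j
s, d = z + z.conjugate(), z - z.conjugate()   # 2x and 2iy
x, y = s / 2, d / 2j
print(x, y)   # the real and imaginary parts of z
```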
Before we get to this duality in the more complicated world of prime numbers, let us start in a simpler setting that we all learn about in calculus class.
Infinite sums with alternating signs
The infinite sum $\sum_{m=1}^{\infty} \frac{\chi(m)}{m}$ has a third of its summands actual zeros. If we ignore them, the rest of the numbers alternate between plus and minus signs, and decrease in absolute value. You have probably seen this in calculus class in its more general form, where there is a sequence $(b_k)_{k \geq 0}$ which is positive and decreases to zero (written as $b_k \searrow 0$), and we want to show that $\sum_{k=0}^{\infty} (-1)^k b_k$ converges.
In our first attempt above, we looked at $N_1(M)$ and $N_2(M)$ for finite $M$, and noted that their difference is bounded. Just like in the primes, this difference is exactly $\sum_{m \leq M} \frac{\chi(m)}{m}$, so we already know that this alternating sum is uniformly bounded. Let us show that this is true for more general sequences, as defined above, and that this difference actually converges.
We already have sums for subsequences with a specific index modulo $n$: the $P_a$ for the prime numbers, and the $N_a$ for the integers, so we now define them for the sequence $(b_k)$ as well. This time (since we "ignore" the zero elements) the two classes are the even and the odd indices. Since the modulus $2$ is fixed, we remove it from our subscript. However, since we want to actually do some computations with them, we will add the number of summands to the subscript and consider the partial sums:

$$B^{even}_K = \sum_{k=0}^{K-1} b_{2k}, \qquad B^{odd}_K = \sum_{k=0}^{K-1} b_{2k+1}.$$
CLAIM: For every $K$ we have that $B^{even}_K - B^{odd}_K \in [0, b_0]$.
This claim follows easily from the fact that $b_k \geq b_{k+1} \geq 0$ and is left as an exercise. Using this claim, we already know that the partial sums of our alternating-signs sequence, which are more or less $B^{even}_K - B^{odd}_K$, are always in $[0, b_0]$. Bounded usually doesn't imply convergent, but in our case it does.
If instead of starting at the $0$ index we start at the $2K$ index, then the segment will be $[0, b_{2K}]$, which by definition is smaller, since $b_{2K} \leq b_0$. The partial sums for the two subsequences starting at $2K$ are by definition $B^{even}_{K+L} - B^{even}_K$ and $B^{odd}_{K+L} - B^{odd}_K$. It now follows that

$$\left(B^{even}_{K+L} - B^{odd}_{K+L}\right) - \left(B^{even}_K - B^{odd}_K\right) \in \left[0, b_{2K}\right].$$

In other words, applying the claim to the subsequences starting at $2K$ for every $K$ shows that

$$S_{2K} = \sum_{k=0}^{2K-1} (-1)^k b_k = B^{even}_K - B^{odd}_K$$

is a Cauchy sequence and therefore converges. This almost shows that our alternating sum converges as well, since we are now only missing the partial sums with an odd number of summands. But these are very close to the partial sums with an even number of summands – indeed, their difference is

$$S_{2K+1} - S_{2K} = b_{2K} \to 0,$$

so they both converge, and to the same number, implying that the series converges to a finite number.
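The whole argument can be watched in action on a concrete sequence, say $b_k = \frac{1}{k+1}$ (a sketch with an arbitrary cutoff):

```python
# Partial sums S_K of the alternating series 1 - 1/2 + 1/3 - ... with
# b_k = 1/(k+1).  The claim keeps every S_K inside [0, b_0] = [0, 1],
# and the even- and odd-length partial sums squeeze together.
def S(K):
    return sum((-1) ** k / (k + 1) for k in range(K))

partials = [S(K) for K in range(1, 500)]
print(min(partials), max(partials))   # everything stays inside [0, 1]
```

For this particular sequence the limit happens to be $\ln 2 \approx 0.693$, and the partial sums close in on it from both sides.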
To summarize what we know so far: we wanted to understand both $P_1$ and $P_2$, since we understand quite well $N_1$ and $N_2$. However, the direct move from the integers to the primes doesn't work as we would like it to, so we considered another perspective and looked at the difference $N_1 - N_2 = \sum_m \frac{\chi(m)}{m}$. Also, if you think about it, in Euler's proof in the previous post we looked at the sum. So now our proof idea has one more step in the middle:
We know that $N_1$ and $N_2$ are more or less the same. If we can now move from the integers to the prime numbers and show that $P_1 - P_2 = \sum_p \frac{\chi(p)}{p}$ is finite, we could show that $P_1$ and $P_2$ are more or less the same also, and complete the proof.
From integers to prime numbers
Our next step is to take the result from the previous section and pull it from the integers to the prime numbers. If our intuition so far is true, then we should expect to have an equality of the form:

$$\sum_{m=1}^{\infty} \frac{\chi(m)}{m} = \prod_p \left(1 + \frac{\chi(p)}{p} + \frac{\chi(p)^2}{p^2} + \cdots\right) = \prod_p \left(1 - \frac{\chi(p)}{p}\right)^{-1}.$$
When we open the brackets in the middle expression, we will again get products of primes in the denominators, which is how we get all the integers in the left expression, but we will also get products of images of $\chi$. Initially, we constructed $\chi$ just to fit the result that we wanted to prove, which is what led it eventually to appear in this multiplication. Luckily for us, this function really likes multiplication – it is not hard to show that for any two integers we have $\chi(m_1 m_2) = \chi(m_1)\chi(m_2)$. Such a function is called a multiplicative function. In particular, we can use it on the decomposition of integers into prime numbers:

$$\chi\left(p_1^{k_1} \cdots p_r^{k_r}\right) = \chi(p_1)^{k_1} \cdots \chi(p_r)^{k_r}.$$
Thus, at least when formally opening the brackets, we get that the right hand side of the equation equals the left hand side.
If we could also prove this equation as an equation between actual limits, the rest of the proof is very similar to Euler's proof from the last post. Taking the logarithm of both sides we get that

$$\ln\left(\sum_{m=1}^{\infty} \frac{\chi(m)}{m}\right) = -\sum_p \ln\left(1 - \frac{\chi(p)}{p}\right).$$
The last sum has both positive and negative summands, so we can't just say that $-\ln(1-x)$ is more or less $x$ and continue on. However, take it as an exercise to show that we can move to the sum $\sum_p \frac{\chi(p)}{p}$ without losing too much (the difference is finite), and this will finish the proof.
Last time, we had to work a bit to show that the "formal" equality

$$\sum_{m=1}^{\infty} \frac{1}{m} = \prod_p \left(1 - \frac{1}{p}\right)^{-1}$$

is actually true when considering both sides as limits of sequences. This time, it is no longer an easy task, because the numbers keep changing signs, and the proof of this equality will be left for another time. However, if we look at a slightly different equation,

$$\sum_{m=1}^{\infty} \frac{\chi(m)}{m^{1+\varepsilon}} = \prod_p \left(1 - \frac{\chi(p)}{p^{1+\varepsilon}}\right)^{-1},$$

then for $\varepsilon > 0$ we already know that the left hand side converges, and even absolutely. Using the logarithm trick, you can also check that the right hand side converges to a finite number. In this case, the equality is an exercise in integrals and infinite sums, and the hard part is to show that it still holds when taking $\varepsilon$ to zero.
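For any fixed exponent above 1 the two sides can be compared numerically. Here is a sketch at exponent $2$, where everything converges absolutely and the truncation errors are tiny (the cutoffs and helper names are mine):

```python
# Compare the Dirichlet series of chi with its Euler product at s = 2.
def chi(m):
    return {0: 0, 1: 1, 2: -1}[m % 3]

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

s = 2
series = sum(chi(m) / m ** s for m in range(1, 100000))
product = 1.0
for p in primes_up_to(10000):
    product *= 1.0 / (1.0 - chi(p) / p ** s)
print(series, product)   # the two truncations agree to several digits
```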
To conclude, we have (almost) shown that $P_1 - P_2 = \sum_p \frac{\chi(p)}{p}$ is a finite number (namely, the limit exists when we sum over more and more primes), and we know that $P_1 + P_2$ diverges to infinity. Consider now the limits

$$\lim_{x \to \infty} \frac{P_1(x)}{P_1(x) + P_2(x)}, \qquad \lim_{x \to \infty} \frac{P_2(x)}{P_1(x) + P_2(x)},$$

where the elements in the defining sequences are the summations in the numerator and denominator over all the primes $p \leq x$. By definition, the sum of these two limits is 1, while their difference is zero (finite divided by infinity), so we conclude that each of them is a half. This is not exactly saying that more or less half of the primes are $1 \bmod 3$ and half are $2 \bmod 3$, but it is really close! At least philosophically speaking, this should be considered as saying that the prime numbers distribute evenly between $1$ and $2 \bmod 3$.
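How even does this distribution look in practice? A small experiment (a sketch; the bound and helper name are mine) counts the primes in each class and also computes the two $\frac{1}{p}$ sums:

```python
# Count the primes up to 10^6 in each invertible class mod 3, and compare
# the sums of 1/p.  The counts split almost exactly in half, while the
# ratio of the 1/p sums creeps toward 1/2 much more slowly, since the
# denominator P1 + P2 grows only like log(log(x)).
def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

primes = [p for p in primes_up_to(10 ** 6) if p % 3 != 0]
c1 = sum(1 for p in primes if p % 3 == 1)
c2 = len(primes) - c1
P1 = sum(1.0 / p for p in primes if p % 3 == 1)
P2 = sum(1.0 / p for p in primes if p % 3 == 2)
print(c1 / len(primes), c2 / len(primes))   # both very close to 1/2
print(P1 / (P1 + P2), P2 / (P1 + P2))       # approaches 1/2 very slowly
```

(The class of $2 \bmod 3$ is slightly ahead at this range – a known phenomenon called Chebyshev's bias – but the counts still differ by a fraction of a percent.)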
If it is true for 2 and true for 3 …
So what happens if $n > 3$, and there are many $a$ which are coprime to $n$? Even for $n = 5$ we already have 4 such $a$, and we can't just look at the sum and the difference and be done with it. The sum still works, and we know that it is infinite, namely $\sum_{\gcd(a,n)=1} P_a = \sum_{p \nmid n} \frac{1}{p} = \infty$, but we need something more general than just "difference".
When looking at the proof above, we see that the important part was not the difference itself, but the fact that $\chi$ was a multiplicative function on the integers, and that it was also 3-periodic. In other words, it was a composition

$$\mathbb{Z} \to \mathbb{Z}/3\mathbb{Z} \to \mathbb{C},$$

where the first map is the standard projection mod 3, while the second is a multiplicative function. More precisely, its restriction to the invertible numbers

$$\chi : \left(\mathbb{Z}/3\mathbb{Z}\right)^{\times} \to \mathbb{C}^{\times}$$

is a group homomorphism, and the noninvertible numbers are sent to zero.
The group of invertible numbers mod 3 contains exactly two numbers, $\{1, 2\}$. Each group homomorphism must send $1$ to $1$ (both are the neutral elements), and is therefore determined by where it sends $2$. Both of the maps defined by $2 \mapsto 1$ and $2 \mapsto -1$ define such a group homomorphism. The second one is what we have up until now thought of as $\chi$, but the first one is also very important, and we actually used it all the time.
To see this, let us fix another notation. For any function $f$ on the integers, we write

$$P(f) = \sum_p \frac{f(p)}{p}, \qquad N(f) = \sum_{m=1}^{\infty} \frac{f(m)}{m}.$$

This already fits with the expressions that we saw in the previous sections, e.g. $P(\chi) = P_1 - P_2$. Moreover, if we let $\mathbb{1}_a$ be the (3-periodic) indicator function

$$\mathbb{1}_a(m) = \begin{cases} 1 & m \equiv a \pmod{3} \\ 0 & \text{otherwise}, \end{cases}$$

then we also get that $P(\mathbb{1}_a) = P_a$, and similarly for the integers. Hence, we already see that this new definition is quite useful.
Moreover, by definition we get that $f \mapsto P(f)$ is linear, namely

$$P(\alpha f + \beta g) = \alpha P(f) + \beta P(g),$$

both as sequences (the summation up to the primes $p \leq x$) and as limits, when they exist.
In particular, since $\chi = \mathbb{1}_1 - \mathbb{1}_2$, it is not surprising that we got $P(\chi) = P_1 - P_2$. On the other hand, if we use the function $\mathbb{1}_1 + \mathbb{1}_2$ we get $P(\mathbb{1}_1 + \mathbb{1}_2) = P_1 + P_2$, which we know is infinite. The same goes for the integers, so basically these two combinations, the sum and the difference, were the backbone of our proof.
It should be true for $n = 5$ as well
Now, with this more general approach, let's see if we can use it on $n = 5$. We start by looking at group homomorphisms $\left(\mathbb{Z}/5\mathbb{Z}\right)^{\times} \to \{\pm 1\}$.
The group of invertibles mod 5 is a cyclic group of size 4, generated by 2, since mod 5 we have $2^1 = 2,\ 2^2 = 4,\ 2^3 = 3,\ 2^4 = 1$. There are exactly two such homomorphisms from this group, which we denote by $\chi_{\pm}$, defined by where we send the generator (namely $2 \mapsto \pm 1$). We can now define

$$P(\chi_{\pm}) = \sum_p \frac{\chi_{\pm}(p)}{p},$$

so that $P(\chi_+) = P_1 + P_2 + P_3 + P_4$ and $P(\chi_-) = P_1 - P_2 - P_3 + P_4$.
We can look at their sum $P(\chi_+) + P(\chi_-) = 2(P_1 + P_4)$ to get only the primes which are $1$ or $4$ mod $5$, or at their difference $P(\chi_+) - P(\chi_-) = 2(P_2 + P_3)$ for $2$ or $3$, but unlike with $n = 3$, this time it is not enough.
There are simply not enough homomorphisms into the set $\{\pm 1\}$, so we instead cheat and increase this set to $\mathbb{C}^{\times}$. The generator $2$ mod $5$ is of order 4, so it must be sent to an element $z$ such that $z^4 = 1$. This time, over the complex numbers, we have exactly 4 such elements, namely $\{1, i, -1, -i\}$, and therefore there are four multiplicative functions mod 5. So instead of $\chi_+$ and $\chi_-$, we have the four combinations:

$$P(\chi_j) = \sum_p \frac{\chi_j(p)}{p}, \qquad \chi_j(2) = i^j, \quad j = 0, 1, 2, 3,$$
or to put it in matrix form:

$$\begin{pmatrix} P(\chi_0) \\ P(\chi_1) \\ P(\chi_2) \\ P(\chi_3) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -i & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -i & i & -1 \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \end{pmatrix}.$$
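These four functions are easy to tabulate in code. In the sketch below, `chi(j, m)` denotes the multiplicative function sending the generator 2 to $i^j$ (the names and the hard-coded discrete-log table are mine):

```python
# chi_j sends the generator 2 to i**j, so chi_j(2**k) = i**(j*k).
# dlog[a] is the k with 2**k = a mod 5:  2**0=1, 2**1=2, 2**2=4, 2**3=3.
dlog = {1: 0, 2: 1, 4: 2, 3: 3}

def chi(j, m):
    return 0 if m % 5 == 0 else 1j ** (j * dlog[m % 5])

# the matrix of values chi_j(a), rows j = 0..3, columns a = 1..4
A = [[chi(j, a) for a in (1, 2, 3, 4)] for j in range(4)]
for row in A:
    print(row)

# each chi_j is multiplicative: chi_j(a*b) = chi_j(a) * chi_j(b)
for j in range(4):
    for a in range(1, 5):
        for b in range(1, 5):
            assert chi(j, a * b) == chi(j, a) * chi(j, b)
```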
Trying to get back the $P_a$ from the $P(\chi_j)$ is basically finding the inverse of this matrix. You can check that this matrix is invertible by, for example, computing its (nonzero) determinant, or by noticing that once the columns are ordered by the powers of the generator, it is a Vandermonde matrix. However, this matrix, constructed from the multiplicative functions, is so special that we can easily find its inverse. Indeed, denoting it by $A$, you can check that $A \cdot \overline{A}^t = 4I$. This is not just luck, and it always happens with matrices constructed from the homomorphisms from a finite abelian group to $\mathbb{C}^{\times}$. Such homomorphisms are called characters, and they are the basis of group representation theory.
In general, the claim is that the matrix $A$ built from the characters mod $n$ satisfies $A \cdot \overline{A}^t = \varphi(n) \cdot I$, where $\varphi$ is the Euler function which counts the number of $1 \leq a \leq n$ which are coprime to $n$. This claim is not too complicated to prove, but I will leave it for another time (or you can try to prove it as an exercise).
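For $n = 5$ (so $\varphi(5) = 4$) this identity can be checked in a few lines. The sketch below rebuilds the character table so that it stands on its own (names and the discrete-log table are mine):

```python
# Verify A * conj(A)^t = phi(5) * I for the character matrix mod 5.
dlog = {1: 0, 2: 1, 4: 2, 3: 3}   # discrete log base 2 mod 5

def chi(j, m):
    return 0 if m % 5 == 0 else 1j ** (j * dlog[m % 5])

A = [[chi(j, a) for a in (1, 2, 3, 4)] for j in range(4)]
G = [[sum(A[r][k] * A[c][k].conjugate() for k in range(4))
      for c in range(4)] for r in range(4)]
print(G)   # 4 on the diagonal, 0 elsewhere
```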
Now that we have the inverse, and we know that this should be true for all $n$, the proof should be:
- Show that for all $a$ coprime to $n$, the infinite sums $N_a$ are infinite, and go to infinity at the same speed (roughly like $\frac{1}{n}\ln(M)$ when summing over $m \leq M$).
- Multiply by the matrix $A$ to get $N(\chi)$ for all the multiplicative functions $\chi$ mod $n$.
- If $\chi$ came from the constant function (all invertibles were sent to $1$), then $N(\chi)$ is simply the sum of all the $\frac{1}{m}$ with $\gcd(m, n) = 1$, and it is infinite.
- Otherwise, the sum of the values of $\chi$ over a full period is zero (left as an exercise – it is part of the fact that $A \cdot \overline{A}^t = \varphi(n) I$). Using a proof similar to what we saw for series with alternating signs, show that $N(\chi)$ must converge to a finite number.
- Pull these two results to the prime numbers: (1) $P(\chi_0) = \sum_{p \nmid n} \frac{1}{p} = \infty$ (which we proved completely), and (2) $P(\chi)$ is finite for non-constant $\chi$ (for which we gave an intuition why it should be true).
- Finally, multiply by the inverse $\frac{1}{\varphi(n)}\overline{A}^t$ and conclude that the $P_a$ are more or less the same over the invertibles, and grow to infinity like $\frac{1}{\varphi(n)}\sum_p \frac{1}{p}$.
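The conclusion of these steps can be tested empirically for $n = 5$ (a sketch; the bound and helper name are mine):

```python
# Among the primes up to 10^6, each invertible residue class mod 5
# should receive roughly a quarter of the primes.
def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [i for i, ok in enumerate(sieve) if ok]

counts = {1: 0, 2: 0, 3: 0, 4: 0}
for p in primes_up_to(10 ** 6):
    if p % 5 != 0:
        counts[p % 5] += 1
total = sum(counts.values())
ratios = {a: c / total for a, c in counts.items()}
print(ratios)   # all four ratios are very close to 0.25
```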
Modulo the exact analysis that we are missing when moving from $\varepsilon > 0$ to $\varepsilon = 0$, this is the full proof. While we added a whole new concept in this final proof – the move from characteristic functions to multiplicative functions – the main theme of the proof is the same as in Euler's proof in the previous post. Characteristic functions, while easy to look at and describe, don't have much structure. In particular, they don't relate to the main property of the prime numbers, namely being the multiplicative building blocks of the integers. Thus, we moved from the characteristic functions to the multiplicative functions (which have much more structure), "proved" our results there, and showed that we can pull them back to characteristic functions.
As I mentioned previously, this concept is the basis of group representation theory, which appears almost everywhere in mathematics and in many other sciences. In particular, the all too well known Fourier transform, which every engineer knows by heart, is basically this transformation between characteristic functions and multiplicative functions. However, this will be left for another time and another post.