Mediants in Mathematics

In the previous post we learned about the mediant, which is the “wrong” way to add rationals : $\frac{a}{b}\oplus \frac{c}{d} = \frac{a+c}{b+d}$ . While this operation is not well defined, and depends on the presentation of the rational numbers and not just their values, we saw that it still has interesting properties, and in particular $\frac{a}{b} < \frac{a+c}{b+d} < \frac {c}{d}$ where $b,c,d>0$ and $a\geq 0$ . Moreover, if we start with $(\frac{0}{1} , \frac{1}{1})$ and add their mediants to get $(\frac{0}{1} , \frac{1}{2}, \frac{1}{1})$ , and then repeat the process by adding the mediants of each subsequent pair $(\frac{0}{1} , \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1})$ , and continue repeating it again and again, then we get all of the rationals in $[0,1]$ .

In the previous post we tried to keep the math elementary as possible. In this post we will go a little bit more advanced and take a look at this operation from several mathematical perspectives.

Some linear algebra

It is almost always good when linear algebra appears when trying to study a subject. It is generally simple to handle, and there are many useful results in linear algebra which are just waiting to be used, and this case of mediants is not different.

In the previous post there were already some clues that there is some linear algebra in the background. The first one was the simple inequality of rational numbers $\frac{a}{b} < \frac{c}{d}$ , where for positive $b,d>0$ is equivalent to $ad-bc<0$ . This expression is non other than the determinant of the matrix $\left(\begin{array}{cc} a & c\\ b & d \end{array}\right)$ being negative. This determinant also appeared later on as the area of parallelograms. However, the place that is best to start with, is our process of finding all the rationals in $[0,1]$ using mediant computations.

Instead of looking at the points separately, let’s look at the segments, because in the end each mediant is an average of the endpoints of a segment. Furthermore, as we saw in the previous post, looking at the rationals is a bit misleading, and we should actually think about them as integer vectors with two coordinates. So all in all a segment should be represented by 2 vectors with 2 entries each, or better yet – a $2\times 2$ matrix.

The matrix corresponding to the first segment $[\frac{0}{1}, \frac{1}{1}]$ is simply the matrix $\left(\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right)$ . This segment splits to $[\frac{0}{1}, \frac{1}{2}]$ and $[\frac{1}{2}, \frac{1}{1}]$ which correspond to $\left(\begin{array}{cc} 0 & 1\\ 1 & 2 \end{array}\right)$ and $\left(\begin{array}{cc} 1 & 1\\ 2 & 1 \end{array}\right)$ . Next, we will have the matrices $\left(\begin{array}{cc} 0 & 1\\ 1 & 3 \end{array}\right)$ , $\left(\begin{array}{cc} 1 & 1\\ 3 & 2 \end{array}\right)$ , $\left(\begin{array}{cc} 1 & 2\\ 2 & 3 \end{array}\right)$ and $\left(\begin{array}{cc} 2 & 1\\ 3 & 1 \end{array}\right)$ etc.

Each time we split a segment to two, the corresponding matrix “splits” to two, each one contains one of the columns of the original matrix (one of the end points of the original segment) and the second is the sum of the two columns of the original matrix (the mediant!). This behavior is best described using simple matrix multiplication:

$\left(\begin{array}{cc} a & c\\ b & d \end{array}\right) \cdot \left(\begin{array}{cc} 1 & 1\\ 0 & 1 \end{array}\right) = \left(\begin{array}{cc} a & a+c\\ b & b+d \end{array}\right)$

$\left(\begin{array}{cc} a & c\\ b & d \end{array}\right) \cdot \left(\begin{array}{cc} 1 & 0\\ 1 & 1 \end{array}\right) = \left(\begin{array}{cc} a+c & a\\ b+d & d \end{array}\right)$ .

If we set $L= \left(\begin{array}{cc} 1 & 1\\ 0 & 1 \end{array}\right)$ and $R= \left(\begin{array}{cc} 1 & 0\\ 1 & 1 \end{array}\right)$ , then multiplying by $L$ (respectively $R$ ) from the right is the same as doing our mediant step to split a segment and go down to the left part (respectively right part). If we also set $S=\left(\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right)$ to be our starting matrix, then the matrices we are getting in our process are exactly those of the form $SL^{n_1}R^{m_2}\cdots L^{n_k}R^{m_k}$ where all the $n_k,m_k\geq 0$ . In other words, we have $Sw$ where $w$ is in the semigroup generated by $L,R$ (like a group, but do not need invertibles).

Note that $\det(L)=\det(R)=1$ and that $\det(S)=-1$ , so that each matrix appearing in our process has determinant $-1$ (this is exactly the same result of the invariant areas of parallelogram that we saw in the previous post, only there we used the cut and glue method). Also, since the entries of $L,R$ and $S$ are nonnegative, so are the entries of their product. We can now ask whether the opposite is true – if a matrix has nonnegative entries and determinant $-1$ , can it be written as $SL^{n_1}R^{m_2}\cdots L^{n_k}R^{m_k}$ ? This is almost true, however there is a very simple sounter example – $T=\left(\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right)$ . Fortunately we are not that far off since $S = TL$ . This is not only some algebraic problem that we need to solve – it actually has a nice geometric interpretation. Instead of starting with $0=\frac{0}{1}$ and $1=\frac{1}{1}$ , we start with $0$ and $\infty = \frac{1}{0}$ . Using this presentation for infinity, and doing our mediant process, you can go over the proof from the previous post and show that you can get any nonnegative rational. However, since we now have the langauge of linear algebra, we can do much more.

Theorem: Let $M=\left(\begin{array}{cc} a & c\\ b & d \end{array}\right)$ be an integer valued matrix with nonnegative entries and $\det(M)=-1$ . Then we can write it as $TL^{n_1}R^{m_2}\cdots L^{n_k}R^{m_k}$ . In particular, one of the segments in the mediant process (starting at $0=\frac{0}{1}, \infty=\frac{1}{0}$ ) will be $[\frac{a}{b}, \frac{c}{d}]$ .

Proof: We prove by induction on $a+b+c+d$ . The minimum of this sum for a determinant $-1$ matrix is 2, and the only matrix with a sum equals to 2 is $T$ , in which case the theorem is trivial.

If $M\neq T$ , we claim that either the entries in the left column of $M$ are larger than the right column (namely $a\geq c, b\geq d$ ) or the right columns is larger than the left. Indeed, if we assume by negation that $a<c$ and $b>d$ , then since we are dealing with integers, we have that $-1=det(M)=ad-bc\leq (c-1)d-(d+1)c=-d-c$ . Under the assumption that $c\geq a+1\geq 1$ and $d\geq 0$ , it follows that $a=0,c=1,d=0$ . Completing it to a determinant $-1$ matrix also implies that $b=1$ so we have the matrix $T$ . Similarly if $a>c$ and $b<d$ , then $-1=\det(M)\geq (c+1)d-(d-1)c=d+c$ which is not possible at all.

Saying that the left column is larger than the right, so that $a\geq c, b\geq d$ exactly means that $M\cdot R^{-1} = \left(\begin{array}{cc} a-c & c\\ b-d & d \end{array}\right)$ has only nonnegative entries. Moreover, it still has determinant $-1$ and the sum of the entries is $a+b < a+b+c+d$ (in an invertible matrix we can’t have $c=d=0$ ). We can now use the induction step to finish the proof. A similar argument works if the right column is smaller than the left, but this time multiply by $L^{-1}$ . $\square$

This was a very simple proof using induction on the size of the entries, but under the current formulation it is not enough to prove that every nonnegative rational number can appear as one of the endpoints of the segments in our process. For that we need to understand what are the nonnegative integer value matrices with determinant $-1$ . Luckily for us, we understand them quite well.

Lemma: Let $a,b$ be nonnegative coprime integers. Then there are integers $c_0,d_0$ such that $ad_0-bc_0=-1$ . Moreover, any other solution $c,d$ for $ad-bc=-1$ satisfy $(c,d)=(c_0, d_0)+k(a,b)$ for some $k\in\mathbb{Z}$ .

Before we prove the theorem, we have an immediate corollary:

Corollary: If $a,b> 0$ are coprime, then $\frac{a}{b}$ appears in the mediant process starting with $\frac{0}{1}, \frac{1}{0}.$ In particular, any nonnegative rational number appears in the process as an edge point.

Proof (corollary): Apply the lemma for $a,b> 0$ and find $c_0,d_0$ such that $ad_0-bc_0=-1$ . If we take $k$ large enough, then $(c,d)=(c_0,d_0)+k(a,b)$ has nonnegative entries. We can now apply the last theorem to the matrix $\left(\begin{array}{cc} a & c\\ b & d \end{array}\right)$ to get that $[\frac{a}{b},\frac{c}{d}]$ is a segment appearing in our mediant process which completes the proof.

Proof (lemma): It is well known basic result that each coprime integers $a,b$ have some integers $c_0,d_0$ such that $ad_0-bc_0=-1$ . If $c,d$ is any other such solution, then subtracting the equations gives us $a(d_0-d)-b(c_0-c)=0$ . Since $a\mid a(d_0-d)$ we get that $a \mid b(c_0-c)$ . Since $gcd(a,b)=1$ it follows that $a\mid (c_0-c)$ , so we can write $c = c_0 + ka$ for some integer $k$ . Similarly, we get that $d = d_0 +tb$ for some integer $t$ and injecting them into our equation gives $0=-tab+kab=ab(k-t)$ . Thus $t=k$ which is what we wanted to prove. $\square$

We now reproved the proof from the previous post, using some simple linear algebra and got a much stronger result. Moreover, we saw that the segments appearing in our mediant process have a nice structure – it is basically an orbit of the semigroup generated by $L$ and $R$ . Next, let us combine linear algebra with some combinatorics.

Growing trees

Once we start to look on segments instead of rational numbers, some results become clearer to see. The first one relies on the simple fact that each segment splits to exactly two nonoverlapping segments (except the edges). If we connect each segment to its two parts we get a directed binary tree:

Another way to think of this tree, is instead of taking the segments as vertices, to take the mediants of their end points (so that the root is $\frac{1}{2}$ , its children are $\frac{1}{3}$ and $\frac{2}{3}$ etc). This tree is called the Stern-Brocot tree.

The important property of a tree, is that if we walk along two branches without backtracking, then it is enough to have one place where the branches diverge to conclude that we end up in different places. For example, if we go Right, Right, Left we end at the segment $[\frac{2}{3},\frac{3}{4}]$ , while going Right, Left we end up at $[\frac{1}{2},\frac{2}{3}]$ .

Translating the example above to our linear algebra setting, this means that the matrices $TRRL$ and $TRL$ are different, and therefore $RRL\neq RL$ . More generally, this tree structure means that if we have two words over $R,L$ , say $w_i = \prod R^{n_{i,k}} L^{m_{i,k}}$ for $i=1,2$ describing two different pathes, then $w_1 \neq w_2$ . This type of structure on a semigroup has a name and is called a free semigroup. More or less it says that elements corresponding to two words are the same only for trivial reasons (the words are the same). Just proving this results without this tree structure is not trivial, not to mention that even knowing to ask this question is not automatic in any way.

On circles and lines

To study our ill-defined mediant operation, we first went up from rationals on the real line to integer vectors on the plane and then to $2\times 2$ integer matrices, and in the end we always try to see what the results that we get imply on our original problem with rational numbers. We now try to formalize this problem a bit.

Suppose that we want to go Right, Right, Left and then look on the rational number on the right endpoint. First, considering the segments, we need to look at the matrix $M=TRRL= \left(\begin{array}{cc} 2 & 3\\ 3 & 4 \end{array}\right)$ . Second, the right endpoint is the right column, so we are looking at $TRRL\cdot \left(\begin{array}{c} 0\\ 1 \end{array}\right) = \left(\begin{array}{c} 3\\ 4 \end{array}\right)$ . Finally we divide the first entry by the second to get $\frac{3}{4}$ .

We acted with $M$ on the vector $\left(\begin{array}{c} 0\\ 1 \end{array}\right)$ and took the “shadow” of the resulting vector. Can we instead act on the “shadow” of the first vector, namely have something like $TRRL\cdot (\frac{0}{1}) = \frac{3}{4}$ ? More generally, for a $2\times 2$ matrix $M$ such that $M \cdot \left(\begin{array}{c} n\\ m \end{array}\right) = \left(\begin{array}{c} n'\\ m' \end{array}\right)$ we want a new operation defined by $M(\frac{n}{m}) = \frac{n'}{m'}$ . When we took the mediant of two rationals it was not well defined, however somehow this matrix operation is well defined, and can be written as

$\displaystyle{ \left(\begin{array}{cc} a & c\\ b & d \end{array}\right) (z) = \frac {az+c}{bz+d}}$ .

This is a well known operation called the Mobius transformation. It is defined in general for any complex $2\times 2$ matrix and acts on complex numbers.

The Mobius transformations have all sorts of nice properties. One of the most important is:

Theorem: The Mobius transformations are bijective maps on the complex plane. Moreover, if $\Omega$ is a line or a circle in the complex plane $\mathbb{C}$ , and $M$ is any Mobius transformation, then $M(\Omega)$ is again a line or a circle.

There is an extra nice property that Mobius transformations coming from real matrices satisfy. In our case the matrices are integer valued, so in particular they are real valued. They also all have determinant $-1$ , however, this is only because we multiply by $T=\left(\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right)$ in the beginning which has determinant $-1$ . We can just as well multiply by the identity matrix, and the reason that we chose $T$ instead, is because we use to think of $0=\frac{0}{1}$ as being on the left of $1=\frac{1}{1}$ being on the left of $\infty = \frac{1}{0}$ . With this small technicality out of the way, we may assume that out matrices have positive determinant, so we can applay the following.

Claim: If $M$ is a real valued matrix with positive determinant, then the real axis and the upper half plane (namely $\{z \mid \; Im(z)>0\}$ ) are invariant under the corresponding Mobius transformation.

Both of the theorem and claim above are standard results in complex analysis. If you want to try to prove them yourself, you should first try to show that each $2\times 2$ matrix can be written as products of an upper triangular matrix with ones on the diagonal, a diagonal matrix, and the matrix $T$ , and see what is the corresponding Mobius transformation for each such matrix.

In this new world, there is a nice interpretation of our mediant process. First, since we deal with real valued matrices, we can restrict our attention to the upper half plane and the real axis. Our original process began with $0$ and $1$ . These are of course on the real axis, but other than that no line on the upper half plane passes through them. However, there are many circles passing through each of them, and each of these circles are tangent to the real axis. Of all of these circles, there are exactly two which pass through $0$ and $1$ respectively, and are tangent to each other as well. These are the circles with radius $\frac{1}{2}$ around $(0,\frac{1}{2})$ and $(1,\frac{1}{2})$ respectively (blue circles in the image below).

We now have 3 lines\circles – the two circles from above and the real axis – every two of which are tangent to each other. Is there another circle which is tangent to all three of them? There is one and it touches the real axis at exactly $\frac{1}{2}$ which is the mediant of $\frac{0}{1}$ and $\frac{1}{1}$ (green circle below). We can continue like this, each time take two circles and the real line and find another circle which is tangent to all of them, and then we will get this picture:

In other words, the points where the circles touch the real axis are constructed exactly using the mediant process! So while the average that the mediant “define” is not exactly the middle like the arithmetic mean, it is the “middle” where these new circle touch the real axis.

We can go even further than that. Initially, we looked for another circle which is tangent to both of the big blue circles above and the real line. But in the world of Mobius transformations circles and lines are the same. While we don’t have another circle which is tangent to all of these, there is another line – the line $y=1$ . This line is “tangent” to the real line at the point at infinity. Starting at this “infinity” $=\frac{1}{0}$ point and at $\frac{0}{1}$ we instead see the following picture:

This way we get all of the positive rational numbers and not just in the segment $[0,1]$ . Of course we can reflect these to get the negative rational numbers as well.

Finally, in the picture above we have a lot of circles and exactly two lines ( $y=0$ and $y=1$ ). However, for the Mobius transformations all the lines and circles are the same, and we can actually convert the real line to a circle and the upper half plane to the inner part of that circle. Using this transformation (+the reflection from above), brings us to this picture in the Poincare Disk:

While you need some more knowledge in complex analysis and a bit of dynamics to actually appreciate the mathematics itself behind these images, it is nonetheless nice that we can use mediants to create these interesting pictures.

You can view the code for creating the images above (or just create more images like them) here: https://openprocessing.org/sketch/1014060

And even more …

While in the previous post we saw how the mediant can be studied using elementary mathematics, here we saw that it appears in all sorts of interesting places in the more advanced parts of mathematics, whether it is in algebra, combinatorics or complex analysis. These mediants and in particular our mediant process appears in other places, and in particular probably its most natural location is in continued fractions and the dynamics it defines, and it becomes even more interesting when tried to generalize to other places, but I will leave it for another time.