Wednesday, December 31, 2014

Almost all integers contain the digit 3

This is my last chance to blog this year. However, I did not get back into the proof of Gödel's incompleteness theorem yet. So instead, this post is motivated by this Numberphile video, which explains why almost all integers contain the digit 3. More precisely, the percentage of integers that contain at least one occurrence of the digit '3' in the set \(\mathbb{N}_u = \{0,1,2,3,\cdots,u-1\}\) goes to 100% as the upper bound \(u\) goes to infinity.

We are going to attack this problem with two different approaches. First, we'll use a recurrence relation approach. Second, we'll go over the combinatorial proof discussed in the video. Finally, we'll check that the two approaches do give the same answer.

In both approaches, we will use values of \(u\) that are powers of 10. So if \(u = 10^n\), for \(n \in \mathbb{N}^+\), then the set under consideration \(\mathbb{N}_{10^n}\), which we denote \(A_n\), is the set of all of the (non-negative) integers containing at most \(n\) digits.

Recurrence relation approach

In this approach, we use \(S_n\), for \(n \in \mathbb{N}^+\), to denote the subset of \(A_n\) of all of the integers that contain at least one occurrence of the digit '3' and we use \(T_n\) to denote \(|S_n|\), the cardinality of \(S_n\).
  • Since \(A_1=\{0,1,2,3,4,5,6,7,8,9\}\), \(S_1=\{3\}\) and \(T_1=1\).
  • Since \(A_2=\{0,1,2,3,4,\cdots,98,99\}\), \(S_2=\{ 3,13,23,30,31,32,33,34,\cdots,39,43,53,63,73,83,93\}\)\(=\{ 3,13,23,43,53,63,73,83,93\}\)\( \cup \{30,31,32,33,34,\cdots,39\}\). In other words, we can split the set \(S_2\) into two subsets, one whose elements all start with the digit '3'  (there are exactly 10 such integers), and the other one whose elements do not start with the digit '3' (there are exactly \(9\cdot T_1\) of those, since if the first digit is not '3' then the least significant digit must belong to \(S_1\)). Thus, \(T_2 = 10 + 9\cdot T_1 = 10 + 9 = 19\).
  • Similarly, since \(A_3=\{0,1,2,3,4,\cdots,998,999\}\), \(S_3\) is the union of two sets,  one whose elements all start with the digit '3'  (there are exactly \(10^2\) such integers), and the other one whose elements do not start with the digit '3' (there are exactly \(9\cdot T_2\) of those, since if the most significant digit is not '3' then the number formed by the other digits must belong to \(S_2\)). Thus, \(T_3 = 10^2 + 9\cdot T_2= 100 + 9\cdot 19 = 271\).
  • Since this reasoning clearly generalizes to all values of \(n \in \mathbb{N}^+\), we obtain the following recurrence relation:
\[ \left\{ \begin{array}{ll}
                    T_1 = 1 & \\
                    T_n = 10^{n-1} + 9T_{n-1} & \text{if}\ n>1
             \end{array} \right.      
\] Now, let's solve this recurrence relation by iteration to get a closed-form formula for \(T_n\):

\(
\begin{array}{l@{0pt}l}
T_1  & = 1\\
T_2 = 10^1 + 9T_1  & = 10+9 \\
T_3 = 10^2 + 9T_2  & = 10^2+9(10+9) \\
                                 & = 10^2 + 9\cdot 10 + 9^2 \\
T_4 = 10^3 + 9T_3  & = 10^3+9(10^2 + 9\cdot 10 + 9^2) \\
                                 & = 10^3+9\cdot 10^2 + 9^2 \cdot 10 + 9^3\\
T_5 = 10^4 + 9T_4  & = 10^4+9(10^3+9\cdot 10^2 + 9^2 \cdot 10 + 9^3)\\
                                 & = 10^4+9\cdot 10^3+9^2\cdot 10^2 + 9^3 \cdot 10 + 9^4\\
\cdots &\\
T_n = 10^{n-1} + 9T_{n-1} & = 10^{n-1}+9\cdot 10^{n-2}+9^2\cdot 10^{n-3} + \cdots + 9^{n-2}\cdot 10^1 + 9^{n-1}\\
\end{array}
\)

In conclusion, \(\displaystyle T_n = \sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right) \) for \(n \in \mathbb{N}^+\)

Since this formula is rather ugly, let's turn to the combinatorial approach discussed in the Numberphile video.

Combinatorial approach

It is easy to determine the total number of integers with exactly \(n\) digits without having to enumerate them, namely \(10 \times 10 \times \cdots \times 10 = 10^n\),  since there are exactly 10 digits to choose from for each position in the \(n\)-digit number. Note that this count actually includes all of the numbers with leading zeros, such as 01 and 00023, which are identical to 1 and 23, respectively. In other words, what we really have is  \(\left|A_n\right|= 10^n\).

Similarly, we can compute the total number of integers with at most \(n\) digits that do not contain the digit '3', namely \(9 \times 9 \times \cdots \times 9 = 9^n\), since there are now only 9 digits to choose from at each position in the integer.

In conclusion, \(T_n = 10^n -9^n\).

This is both a much nicer and easier-to-derive closed-form formula than the one we obtained with the recurrence relation approach. But are the two formulas equal?

Let's check our work

Let \(P(n) \), for \(n \in \mathbb{N}^+\), denote: \(\displaystyle \sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right) = 10^n - 9^n\)

We now prove by mathematical induction that \( P(n)\) holds for  \(n \in \mathbb{N}^+.\)

Basis:
  •  \(\displaystyle \sum_{i=0}^{1-1} \left(  9^{1-i-1}\cdot 10^i\right) =  \sum_{i=0}^{0} \left(  9^{1-i-1}\cdot 10^i\right) = 9^{1-0-1}\cdot 10^0 = 9^0 \cdot 10^0 =1 \)
  • \( 10^1 - 9^1 = 10 - 9 = 1\)
  • Therefore \(P(1)\) holds
Inductive step:
  • Assume that \(P(n)\) holds for any \(n \in \mathbb{N}^+\), that is: \[ \sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right) = 10^n - 9^n \quad \text{[inductive hypothesis]} \]
  • Let's compute the left-hand side of \(P(n+1)\):
    \[
    \begin{array}{l@{0pt}l}
    \sum_{i=0}^{n} \left(  9^{n-i}\cdot 10^i\right) & = 9\left(\frac{1}{9}\sum_{i=0}^{n} \left(  9^{n-i}\cdot 10^i\right)\right)\\
        & = 9\left(\sum_{i=0}^{n} \left(  9^{n-i-1}\cdot 10^i\right)\right)\\
       & = 9\left(\sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right)  + \sum_{i=n}^{n} \left(  9^{n-i-1}\cdot 10^i\right) \right)\\
       & = 9\left(\sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right)  + \left(  9^{n-n-1}\cdot 10^n\right) \right)\\
     & = 9\left(\sum_{i=0}^{n-1} \left(  9^{n-i-1}\cdot 10^i\right)  + \frac{10^n}{9} \right)\\
    &  = 9\left( \left(10^n - 9^n\right)  + \frac{10^n}{9} \right) \quad \text{by the inductive hypothesis} \\
     & = 9\cdot 10^n - 9\cdot 9^n + 10^n\\
     & = 10\cdot 10^n - 9\cdot 9^n\\
     & = 10^{n+1} - 9^{n+1}\\
    \end{array}
    \]
  • Therefore \(P(n+1)\) holds.

Main result

We are now in a position to prove our main result, namely that the percentage of integers that contain the digit '3' in the set \(A_n\)  goes to 100% as \(n\) goes to infinity. This percentage is equal to
\(100\times\frac{T_n}{\left|A_n\right|}=100\times\left(\frac{10^n-9^n}{10^n}\right)=100\times\left(1-\frac{9^n}{10^n}\right)= 100 - 100\left(\frac{9}{10}\right)^n\).
And this percentage goes to 100% as \(n\) goes to infinity, since \(\displaystyle \lim_{n \to \infty}\left( \frac{9}{10}\right)^n = 0\).

Discussion

Of course, all of the proofs above still hold if we had picked the digit '8' (say) instead of the digit '3'. In other words, it is also true that almost all integers contain the digit '8'. So, if we use \(E_n\) to denote the cardinality of the subset of \(A_n\) of all of the integers that contain at least one occurrence of the digit '8', \(E_n = 10^n-9^n\). Similarly for the digit '5' (say): \(F_n = 10^n-9^n\).

But, if both \(100\times\frac{T_n}{\left|A_n\right|}\) and  \(100\times\frac{E_n}{\left|A_n\right|}\) go to 100% as \(n\) goes to infinity, the sum of these two percentages will eventually exceed 100%! That is true. However, this sum double counts all of the integers that contain both the digit '3' and the digit '8'. The correct percentage of such integers is \(100\times\frac{10^n-8^n}{10^n}\), which also goes to 100% as \(n\) goes to infinity.

So there is no paradox here. When \(n\) gets large enough, almost all of the integers will contain all of the digits 0 through 9. Note that all of these results are asymptotic. Indeed, it takes relatively large values of \(n\) for these percentages to get anywhere close to 100%. For example, for the percentage of integers that contain the digit '3' to reach 90%, 95%, 99% and 99.99%, the number \(n\) of digits in the integer must be larger than 21, 28, 43 and 87, respectively.