Processing math: 82%

Wednesday, December 31, 2014

Almost all integers contain the digit 3

This is my last chance to blog this year. However, I did not get back into the proof of Gödel's incompleteness theorem yet. So instead, this post is motivated by this Numberphile video, which explains why almost all integers contain the digit 3. More precisely, the percentage of integers that contain at least one occurrence of the digit '3' in the set Nu={0,1,2,3,,u1} goes to 100% as the upper bound u goes to infinity.

We are going to attack this problem with two different approaches. First, we'll use a recurrence relation approach. Second, we'll go over the combinatorial proof discussed in the video. Finally, we'll check that the two approaches do give the same answer.

In both approaches, we will use values of u that are powers of 10. So if u=10n, for nN+, then the set under consideration N10n, which we denote An, is the set of all of the (non-negative) integers containing at most n digits.

Recurrence relation approach

In this approach, we use Sn, for nN+, to denote the subset of An of all of the integers that contain at least one occurrence of the digit '3' and we use Tn to denote |Sn|, the cardinality of Sn.
  • Since A1={0,1,2,3,4,5,6,7,8,9}, S1={3} and T1=1.
  • Since A2={0,1,2,3,4,,98,99}, S2={3,13,23,30,31,32,33,34,,39,43,53,63,73,83,93}={3,13,23,43,53,63,73,83,93}{30,31,32,33,34,,39}. In other words, we can split the set S2 into two subsets, one whose elements all start with the digit '3'  (there are exactly 10 such integers), and the other one whose elements do not start with the digit '3' (there are exactly 9T1 of those, since if the first digit is not '3' then the least significant digit must belong to S1). Thus, T2=10+9T1=10+9=19.
  • Similarly, since A3={0,1,2,3,4,,998,999}, S3 is the union of two sets,  one whose elements all start with the digit '3'  (there are exactly 102 such integers), and the other one whose elements do not start with the digit '3' (there are exactly 9T2 of those, since if the most significant digit is not '3' then the number formed by the other digits must belong to S2). Thus, T3=102+9T2=100+919=271.
  • Since this reasoning clearly generalizes to all values of nN+, we obtain the following recurrence relation:
{T1=1Tn=10n1+9Tn1if n>1 Now, let's solve this recurrence relation by iteration to get a closed-form formula for Tn:

T1=1T2=101+9T1=10+9T3=102+9T2=102+9(10+9)=102+910+92T4=103+9T3=103+9(102+910+92)=103+9102+9210+93T5=104+9T4=104+9(103+9102+9210+93)=104+9103+92102+9310+94Tn=10n1+9Tn1=10n1+910n2+9210n3++9n2101+9n1

In conclusion, Tn=n1i=0(9ni110i) for nN+

Since this formula is rather ugly, let's turn to the combinatorial approach discussed in the Numberphile video.

Combinatorial approach

It is easy to determine the total number of integers with exactly n digits without having to enumerate them, namely 10×10××10=10n,  since there are exactly 10 digits to choose from for each position in the n-digit number. Note that this count actually includes all of the numbers with leading zeros, such as 01 and 00023, which are identical to 1 and 23, respectively. In other words, what we really have is  |An|=10n.

Similarly, we can compute the total number of integers with at most n digits that do not contain the digit '3', namely 9×9××9=9n, since there are now only 9 digits to choose from at each position in the integer.

In conclusion, Tn=10n9n.

This is both a much nicer and easier-to-derive closed-form formula than the one we obtained with the recurrence relation approach. But are the two formulas equal?

Let's check our work

Let P(n), for nN+, denote: n1i=0(9ni110i)=10n9n

We now prove by mathematical induction that P(n) holds for  nN+.

Basis:
  •  11i=0(91i110i)=0i=0(91i110i)=9101100=90100=1
  • 10191=109=1
  • Therefore P(1) holds
Inductive step:
  • Assume that P(n) holds for any nN+, that is: n1i=0(9ni110i)=10n9n[inductive hypothesis]
  • Let's compute the left-hand side of P(n+1):
    ni=0(9ni10i)=9(19ni=0(9ni10i))=9(ni=0(9ni110i))=9(n1i=0(9ni110i)+ni=n(9ni110i))=9(n1i=0(9ni110i)+(9nn110n))=9(n1i=0(9ni110i)+10n9)=9((10n9n)+10n9)by the inductive hypothesis=910n99n+10n=1010n99n=10n+19n+1
  • Therefore P(n+1) holds.

Main result

We are now in a position to prove our main result, namely that the percentage of integers that contain the digit '3' in the set An  goes to 100% as n goes to infinity. This percentage is equal to
100×Tn|An|=100×(10n9n10n)=100×(19n10n)=100100(910)n.
And this percentage goes to 100% as n goes to infinity, since lim.

Discussion

Of course, all of the proofs above still hold if we had picked the digit '8' (say) instead of the digit '3'. In other words, it is also true that almost all integers contain the digit '8'. So, if we use E_n to denote the cardinality of the subset of A_n of all of the integers that contain at least one occurrence of the digit '8', E_n = 10^n-9^n. Similarly for the digit '5' (say): F_n = 10^n-9^n.

But, if both 100\times\frac{T_n}{\left|A_n\right|} and  100\times\frac{E_n}{\left|A_n\right|} go to 100% as n goes to infinity, the sum of these two percentages will eventually exceed 100%! That is true. However, this sum double counts all of the integers that contain both the digit '3' and the digit '8'. The correct percentage of such integers is 100\times\frac{10^n-8^n}{10^n}, which also goes to 100% as n goes to infinity.

So there is no paradox here. When n gets large enough, almost all of the integers will contain all of the digits 0 through 9. Note that all of these results are asymptotic. Indeed, it takes relatively large values of n for these percentages to get anywhere close to 100%. For example, for the percentage of integers that contain the digit '3' to reach 90%, 95%, 99% and 99.99%, the number n of digits in the integer must be larger than 21, 28, 43 and 87, respectively.