We are going to attack this problem with two different approaches. First, we'll use a recurrence relation approach. Second, we'll go over the combinatorial proof discussed in the video. Finally, we'll check that the two approaches do give the same answer.
In both approaches, we will use values of u that are powers of 10. So if u=10n, for n∈N+, then the set under consideration N10n, which we denote An, is the set of all of the (non-negative) integers containing at most n digits.
Recurrence relation approach
In this approach, we use Sn, for n∈N+, to denote the subset of An of all of the integers that contain at least one occurrence of the digit '3' and we use Tn to denote |Sn|, the cardinality of Sn.
- Since A1={0,1,2,3,4,5,6,7,8,9}, S1={3} and T1=1.
- Since A2={0,1,2,3,4,⋯,98,99}, S2={3,13,23,30,31,32,33,34,⋯,39,43,53,63,73,83,93}={3,13,23,43,53,63,73,83,93}∪{30,31,32,33,34,⋯,39}. In other words, we can split the set S2 into two subsets, one whose elements all start with the digit '3' (there are exactly 10 such integers), and the other one whose elements do not start with the digit '3' (there are exactly 9⋅T1 of those, since if the first digit is not '3' then the least significant digit must belong to S1). Thus, T2=10+9⋅T1=10+9=19.
- Similarly, since A3={0,1,2,3,4,⋯,998,999}, S3 is the union of two sets, one whose elements all start with the digit '3' (there are exactly 102 such integers), and the other one whose elements do not start with the digit '3' (there are exactly 9⋅T2 of those, since if the most significant digit is not '3' then the number formed by the other digits must belong to S2). Thus, T3=102+9⋅T2=100+9⋅19=271.
- Since this reasoning clearly generalizes to all values of n∈N+, we obtain the following recurrence relation:
T1=1T2=101+9T1=10+9T3=102+9T2=102+9(10+9)=102+9⋅10+92T4=103+9T3=103+9(102+9⋅10+92)=103+9⋅102+92⋅10+93T5=104+9T4=104+9(103+9⋅102+92⋅10+93)=104+9⋅103+92⋅102+93⋅10+94⋯Tn=10n−1+9Tn−1=10n−1+9⋅10n−2+92⋅10n−3+⋯+9n−2⋅101+9n−1
In conclusion, Tn=n−1∑i=0(9n−i−1⋅10i) for n∈N+
Since this formula is rather ugly, let's turn to the combinatorial approach discussed in the Numberphile video.
Combinatorial approach
Similarly, we can compute the total number of integers with at most n digits that do not contain the digit '3', namely 9×9×⋯×9=9n, since there are now only 9 digits to choose from at each position in the integer.
In conclusion, Tn=10n−9n.
This is both a much nicer and easier-to-derive closed-form formula than the one we obtained with the recurrence relation approach. But are the two formulas equal?
Let's check our work
Let P(n), for n∈N+, denote: n−1∑i=0(9n−i−1⋅10i)=10n−9n
We now prove by mathematical induction that P(n) holds for n∈N+.
Basis:
- 1−1∑i=0(91−i−1⋅10i)=0∑i=0(91−i−1⋅10i)=91−0−1⋅100=90⋅100=1
- 101−91=10−9=1
- Therefore P(1) holds
- Assume that P(n) holds for any n∈N+, that is: n−1∑i=0(9n−i−1⋅10i)=10n−9n[inductive hypothesis]
- Let's compute the left-hand side of P(n+1):
∑ni=0(9n−i⋅10i)=9(19∑ni=0(9n−i⋅10i))=9(∑ni=0(9n−i−1⋅10i))=9(∑n−1i=0(9n−i−1⋅10i)+∑ni=n(9n−i−1⋅10i))=9(∑n−1i=0(9n−i−1⋅10i)+(9n−n−1⋅10n))=9(∑n−1i=0(9n−i−1⋅10i)+10n9)=9((10n−9n)+10n9)by the inductive hypothesis=9⋅10n−9⋅9n+10n=10⋅10n−9⋅9n=10n+1−9n+1
- Therefore P(n+1) holds.
Main result
We are now in a position to prove our main result, namely that the percentage of integers that contain the digit '3' in the set An goes to 100% as n goes to infinity. This percentage is equal to
100×Tn|An|=100×(10n−9n10n)=100×(1−9n10n)=100−100(910)n.
And this percentage goes to 100% as n goes to infinity, since lim.
Discussion
Of course, all of the proofs above still hold if we had picked the digit '8' (say) instead of the digit '3'. In other words, it is also true that almost all integers contain the digit '8'. So, if we use E_n to denote the cardinality of the subset of A_n of all of the integers that contain at least one occurrence of the digit '8', E_n = 10^n-9^n. Similarly for the digit '5' (say): F_n = 10^n-9^n.
But, if both 100\times\frac{T_n}{\left|A_n\right|} and 100\times\frac{E_n}{\left|A_n\right|} go to 100% as n goes to infinity, the sum of these two percentages will eventually exceed 100%! That is true. However, this sum double counts all of the integers that contain both the digit '3' and the digit '8'. The correct percentage of such integers is 100\times\frac{10^n-8^n}{10^n}, which also goes to 100% as n goes to infinity.
So there is no paradox here. When n gets large enough, almost all of the integers will contain all of the digits 0 through 9. Note that all of these results are asymptotic. Indeed, it takes relatively large values of n for these percentages to get anywhere close to 100%. For example, for the percentage of integers that contain the digit '3' to reach 90%, 95%, 99% and 99.99%, the number n of digits in the integer must be larger than 21, 28, 43 and 87, respectively.