Puzzle
5 minute read

Ponder This Challenge - November 2005 - Size of finite random binary tree

Ponder This Challenge:

Puzzle for November 2005.

This month's puzzle concerns random binary trees. Let p be a fixed parameter between 0 and 1. Starting with the complete infinite binary tree retain each edge randomly and independently with probability p. Our random binary tree is the portion connected to the root. So for example the binary tree consisting of the root alone will be selected with probability (1-p)**2 (ie when neither edge out of the root is retained).

Some of these random binary trees will contain an infinite number of vertices. Throw these out. Then we ask what is the expected number of vertices as a function of p of a random binary tree selected in this way. This is a problem for which it is possible to obtain the right answer in a non-rigorous way. We will not attempt to check the derivation of submitted solutions but you should think about what would be required to prove your answer rigorously.

Hints to the puzzle added on November 17**th 2005:
A lot of wrong answers so perhaps a hint is in order. It may be helpful to compute the fraction of random binary trees which are infinite and therefore thrown out. And remember you are computing the expected size over the remaining trees after throwing out the infinite trees (or to put it another way the expected size given that the tree is finite).


The first 100 people who answer correctly will be listed. The answer will be posted a week after the 100th is received, or at the end of the month.

We will post the names of those who submit a correct, original solution! If you don't want your name posted then please include such a statement in your submission!

We invite visitors to our website to submit an elegant solution. Send your submission to the ponder@il.ibm.com.

If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com

Solution

  • Answer:

    Fix p and let x(n) be the probability that a random binary tree, T, selected (without throwing out infinite trees) has depth at least n. Then x(n+1)=2*p*x(n) - (p*x(n))**2 (by the principle of inclusion and exclusion since if T is has depth at least n+1 then some subtree of a son of the root of T has depth at least n but we don't want to double count the case where the root of T has two sons and each is the root of a subtree of depth at least n). We are interested in the behavior of x(n) as n goes to infinity. Clearly x(n) has a limit as it is non-increasing and bounded below by 0. Let this limit be x. Then x satisfies the equation x=2*p*x - (p*x)**2. The equation has two solutions x=0 or x=(2*p-1)/(p*p). It is easy to show the limit is equal to the first solution, x=0, for 0<p<.5, and that the limit is equal to the second solution, x=(2*p-1)/(p*p) for .5<p<1. Clearly x amounts to the probability that a random binary tree has infinite size. So 1-x is the probability that it is finite. This is 1 for 0<p<.5 and ((1-p)/p)**2 for .5<p<1.

    Now we can compute the expected size, s, of a random binary tree (with the removal of infinite trees). Suppose first 0<p<.5. There are 2**n vertices at depth n which could be included in a random binary tree. Each will be included if each of the n edges in the path to the root is chosen. This will occur with probability p**n. So the expected number of vertices at depth n is (2*p)**n. So summing over all depths s=1+2*p+(2*p)**2+...+(2*p)**n+... . This is a simple geometric series. Since p<.5, 2*p<1. So the sum converges and s=1/(1-2*p).

    Next suppose .5<p<1. We have the following equation for s by expressing the expected size of a random binary tree, T, in terms of the expected sizes of the subtrees whose roots are the sons of the root of T.

    s = ((1-p)**2 + 2*p*(1-p)*q*(1+s) + p*p*q*q*(1+2*s))/
    ((1-p)**2 + 2*p*(1-p)*q + p*p*q*q )

    Here q = ((1-p)/p)**2 which as shown above is the probability that a random binary tree is finite. The equation for s has three terms depending on how many edges out of the root are retained. So for example the (1-p)**2 term corresponds to the case where T consists solely of the root. The factors of q account for the fact that we are throwing out the cases where a subtree (and therefore T) is infinite. The divisor, d, is included to account for the fact that we are only averaging over the cases where T is finite. Note by substituting for q and simplifying we have d=((1-p)/p)**2=q as expected. Substituting into the equation for s we obtain

    s = p**2 + 2*p*(1-p)*(1+s) + ((1-p)**2)*(1+2*s)
    = 1 + s*(2*p*(1-p) + 2*(1-p)**2)
    = 1 + s*2*(1-p)

    so solving for s we have s=1/(2*p-1). This argument is not entirely rigorous as we are assuming s exists. However I believe it could be made rigorous by defining s(n) as the expected number of vertices in the random tree within distance n from the root and then letting n go to infinity.

    An alternative approach for .5<p<1 is to show that the distribution of random binary trees is the same as for q=1-p. For consider any finite binary tree with m edges. Any such tree will have m+2 places where an edge was not selected (else the tree would be bigger). Hence the tree will be chosen (without throwing out infinite trees) with probability (p**m)*((1-p)**(m+2)). But this can be rewritten as ((1-p)**m)*(p**(m+2))*(((1-p)/p)**2). So the distribution of finite trees for p is the same as the distribution of finite trees for q=1-p multiplied by the constant factor ((1-p)/p)**2. The sum of the probabilities over all finite trees will be 1 for q. Hence it will be ((1-p)/p)**2 for p. This agrees with the argument above that ((1-p)/p)**2 is the probability that the random binary tree is finite. So when we exclude infinite trees the constant multiplier in the distribution for p will disappear making the distribution of random binary trees exactly the same as for q=1-p. This means the expected size will be 1/(1-2*q) = 1/(2*p-1) as above.


    If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com

Solvers

  • Eugene Vasilchenko (11.01.2005 @12:38:03 PM EDT)
  • Dan Dima (11.01.2005 @05:41:44 PM EDT)
  • Mark Perkins (11.01.2005 @07:27:05 PM EDT)
  • Dan Bernstein (11.02.2005 @09:25:46 AM EDT)
  • Christopher Howe (11.02.2005 @11:35:25 AM EDT)
  • Xiaoyang Guan (11.03.2005 @12:33:27 AM EDT)
  • Luke Pebody (11.03.2005 @06:24:58 PM EDT)
  • Chuck (11.04.2005 @06:38:01 PM EDT)
  • Lukas A Saul (11.04.2005 @10:54:25 PM EDT)
  • Michael Brand (11.05.2005 @03:41:20 AM EDT)
  • Itsik Horovitz (11.05.2005 @08:00:43 AM EDT)
  • Franco Bassi (11.06.2005 @04:20:26 PM EDT)
  • Patrick J. LoPresti (11.07.2005 @01:15:40 PM EDT)
  • Roberto Tauraso (11.07.2005 @07:10:47 PM EDT)
  • Ameet Deshpande (11.09.2005 @11:05:26 AM EDT)
  • Ken Bateman (11.09.2005 @12:16:54 PM EDT)
  • Evgeni (11.15.2005 @09:40:37 PM EDT)
  • Tomáš Jurík (11.16.2005 @11:13:10 AM EDT)
  • David Friedman (11.16.2005 @02:31:40 PM EDT)
  • Dharmadeep Muppalla (11.20.2005 @01:30:30 AM EDT)
  • Wolfgang Kais (11.21.2005 @03:11:18 AM EDT)
  • David McQuillan (11.23.2005 @06:20:25 AM EDT)
  • Huicheng Guo (11.29.2005 @10:53:27 PM EDT)
  • Vishal Mamania (12.01.2005 @02:31:10 AM EDT)

Related posts