Ponder This Challenge - June 2008 - Probability of DNA testing

Ponder This Challenge:

This month's problems concern DNA testing. DNA tests can exclude people as the source of a DNA sample. Assume that the tests will never exclude the actual source but sometimes will fail to exclude someone who is not the actual source but who by chance happens to match the actual source at the DNA locations being tested.

1. Suppose that before DNA testing we estimate that X has probability p of being the actual source. Suppose the random match probability is 1/n. If testing does not exclude X, what is our new estimate of the probability that X is the source?
2. Suppose we have a database of DNA data from k people. Suppose that before comparing with the DNA sample we estimate that there is a chance p that the database contains the actual source, and further that every person in the database is equally likely to be the actual source. Assume as above that the random match probability for someone who is not the actual source is 1/n, and that this probability is independent for multiple people who are not the actual source. Suppose we find exactly one person, X, in the database whose DNA matches the sample. What is our new estimate of the probability that the database contains the actual source (and therefore that the actual source is X)?
3. In an actual case k was 338000 and the random match probability was said to be 1/1100000. Suppose we further assume (before checking) that there is a 20% chance that the actual donor is in the database and that everyone in the database is equally likely to be the actual donor. Suppose (as in the actual case) exactly one person, X, matching the sample is found in the database. Subject to the above assumptions, what is the probability that X is the actual donor? Give the probability rounded to four decimal places.

We will post the names of those who submit a correct, original solution! If you don't want your name posted then please include such a statement in your submission!

We invite visitors to our website to submit an elegant solution. Send your submission to ponder@il.ibm.com.

If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com

Solution

The answers are:
1. p/(p+(1/n)*(1-p))
2. p/(p+(k/n)*(1-p))
3. .4486
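The two closed forms can be evaluated directly. The following is a minimal Python sketch, not part of the original solution; the function names posterior_single and posterior_database are hypothetical and chosen only for illustration.

```python
def posterior_single(p, n):
    """Part 1: probability that X is the source, given X was not excluded.

    p is the prior probability that X is the source; 1/n is the random
    match probability.
    """
    return p / (p + (1 - p) / n)


def posterior_database(p, k, n):
    """Part 2: probability that the database contains the source,
    given exactly one match among its k members.
    """
    return p / (p + (k / n) * (1 - p))


if __name__ == "__main__":
    # Part 3 inputs as stated in the problem: p = 0.2, k = 338000, n = 1100000.
    print(f"{posterior_database(0.2, 338_000, 1_100_000):.4f}")  # prints 0.4486
```

With k = 1 the database formula reduces to the single-person formula, which is a quick consistency check.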
A sketch of the proofs follows.
1. Before testing, X is the actual source with probability p and is not the actual source with probability (1-p). Testing never excludes the actual source, but it excludes someone who is not the actual source all but 1/n of the time. So X is not excluded with total probability p+(1/n)*(1-p), and the new probability that X is the actual source is p/(p+(1/n)*(1-p)).
2. Again we start with probability p that the database contains the actual source and (1-p) that it does not. When the database contains the actual source we will get exactly one match (1-1/n)**(k-1) of the time (the actual source always matches, so there can be no random matches among the remaining k-1 people). When the database does not contain the actual source we will get exactly one match k*(1/n)*(1-1/n)**(k-1) of the time (we must have a random match for one of the k people in the database but not for the others). So the new probability that the database contains the actual source, given exactly one match, is p*(1-1/n)**(k-1)/(p*(1-1/n)**(k-1)+(1-p)*k*(1/n)*(1-1/n)**(k-1)) = p/(p+(k/n)*(1-p)). Comparing with answer 1, searching against a database effectively increases the random match probability by a factor k (the size of the database).
3. Plugging p=.2, n=1100000 and k=338000 into the formula above we obtain .44861337+ or .4486 rounded to four places.
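The cancellation of the common factor (1-1/n)**(k-1) in step 2 can be checked with exact rational arithmetic. This is an illustrative Python sketch, not the original author's method; the function names and the small values p = 1/5, k = 50, n = 1000 are assumptions chosen so the exact computation stays cheap.

```python
from fractions import Fraction


def posterior_unsimplified(p, k, n):
    """Bayes' rule with both exactly-one-match likelihoods written out in full."""
    miss = 1 - Fraction(1, n)            # a non-source is excluded
    like_in = miss ** (k - 1)            # source in database: the other k-1 are excluded
    like_out = k * Fraction(1, n) * miss ** (k - 1)  # source absent: one chance match
    return p * like_in / (p * like_in + (1 - p) * like_out)


def posterior_simplified(p, k, n):
    """The closed form after the common factor cancels."""
    return p / (p + Fraction(k, n) * (1 - p))


if __name__ == "__main__":
    # Small hypothetical values, not taken from the problem.
    p, k, n = Fraction(1, 5), 50, 1000
    assert posterior_unsimplified(p, k, n) == posterior_simplified(p, k, n)
    print(posterior_simplified(p, k, n))  # prints 5/6
```

Using Fraction rather than floats means the two expressions compare exactly equal, so the check confirms the algebra rather than a numerical approximation of it.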

Solvers

Chris Hills (06.02.2008 @01:55:32 PM EDT)
Joseph DeVincentis (06.02.2008 @02:42:15 PM EDT)
Frank Yang (06.02.2008 @02:46:09 PM EDT)
John Hart (06.02.2008 @03:45:43 PM EDT)
Arthur Breitman (06.02.2008 @06:21:02 PM EDT)
Mark Perkins (06.02.2008 @06:48:20 PM EDT)
Henry Bottomley (06.02.2008 @08:00:09 PM EDT)
John T. Robinson (06.03.2008 @01:34:25 AM EDT)
Willem H. de Boer (06.03.2008 @06:06:15 AM EDT)
Movin Jain & Siddharth Agarwal (06.03.2008 @09:10:14 AM EDT)
Dan Dima (06.03.2008 @09:21:27 AM EDT)
Jeff Steele (06.03.2008 @10:41:15 AM EDT)
Michael Quist (06.03.2008 @11:34:58 AM EDT)
Dion Harmon (06.03.2008 @12:45:37 PM EDT)
V Balakrishnan (06.03.2008 @05:50:56 PM EDT)
Wolf Mosle (06.03.2008 @06:15:39 PM EDT)
Krunoslav Kovac (06.03.2008 @07:17:46 PM EDT)
Joshua Green (06.04.2008 @01:15:55 AM EDT)
James Dow Allen (06.04.2008 @03:29:46 AM EDT)
Joseph C. Bonneau (06.04.2008 @04:31:53 AM EDT)
Ryan Milligan (06.04.2008 @11:51:58 AM EDT)
Sylvain Becker (06.04.2008 @01:50:03 PM EDT)
Michael Schaaf (06.04.2008 @02:41:05 PM EDT)
Ariel Flat (06.04.2008 @03:23:58 PM EDT)
Gary M Gerkin (06.05.2008 @05:53:05 AM EDT)
Jesse Kolman (06.05.2008 @06:19:09 AM EDT)
Derek Jennings (06.05.2008 @02:39:11 PM EDT)
Xiaoyang Guan (06.05.2008 @07:16:31 PM EDT)
Jimmy He (06.05.2008 @10:27:16 PM EDT)
Zhou Guang (06.06.2008 @12:18:48 AM EDT)
Hongcheng Zhu (06.06.2008 @05:18:07 AM EDT)
Samantha Casanova (06.06.2008 @03:59:06 PM EDT)
J K Viswanath Raju (06.06.2008 @04:03:28 PM EDT)
Mark Pilloff (06.06.2008 @08:16:33 PM EDT)
Dan Colestock (06.07.2008 @12:58:13 AM EDT)
Karthik Tadinada (06.07.2008 @05:28:27 PM EDT)
Will Hasenplaugh (06.08.2008 @01:56:23 AM EDT)
Mithil Ramteke (06.08.2008 @02:18:59 AM EDT)
Vicent Pla (06.08.2008 @05:45:14 PM EDT)
Daniel Linhart (06.09.2008 @06:50:37 PM EDT)
Ashutosh Mahajan (06.11.2008 @11:11:24 AM EDT)
Donald T Dodson (06.11.2008 @11:44:09 AM EDT)
Mark Gordon (06.11.2008 @07:06:20 PM EDT)
Pascal Strubi (06.12.2008 @03:18:27 AM EDT)
Richard Bjorklund (06.12.2008 @06:41:16 PM EDT)
Nyles Heise (06.13.2008 @02:25:16 AM EDT)
Joachim Ripken (06.16.2008 @07:47:20 PM EDT)
Frank E. Mullin (06.17.2008 @11:10:22 AM EDT)
Fred Batty (06.17.2008 @12:02:57 AM EDT)
Greg Janee (06.17.2008 @11:54:57 PM EDT)
Gale Greenlee (06.17.2008 @12:25:19 PM EDT)
Ian Glover (06.20.2008 @06:16:37 AM EDT)
Amos Guler (06.21.2008 @03:36:58 PM EDT)
John Douma (06.22.2008 @02:20:54 PM EDT)
Ben Tarlow (06.23.2008 @06:55:14 AM EDT)
Albert Stadler (06.23.2008 @02:31:06 PM EDT)
Graham Hesketh (06.25.2008 @06:39:55 AM EDT)
Jiri Navratil (06.26.2008 @04:40:59 PM EDT)
Sha Lin Shan (06.28.2008 @05:14:45 AM EDT)
Boris Nikolaus (06.28.2008 @04:25:36 PM EDT)
Ranchu Mathew (06.29.2008 @04:47:54 AM EDT)
Andrea Andenna (06.29.2008 @05:14:51 PM EDT)
John G. Fletcher (06.30.2008 @02:25:25 PM EDT)
Phil Muhm (07.01.2008 @06:39:32 PM EDT)
Ramakrishna Katragadda (07.02.2008 @09:52:39 AM EDT)
