Puzzle
5 minute read

Ponder This Challenge - January 2023 - Gene editing

A "gene" of length nn is a string of nn letters from the character set {‘G’, ‘A’, ‘C’, ‘T’}.

The gene can be transformed into another gene in a sequence of steps. Each step changes exactly one letter in the gene.

In one step, the leftmost letter in the gene can be changed to any character in the set {‘G’, ‘A’, ‘C’, ‘T’}. Changing the remaining letters is subject to additional constraints:

  1. ‘T’ can be changed to ‘C’ if all the letters to its left are ‘C’.
  2. ‘T’ can be changed to ‘G’ if the letter to its immediate left is ‘C’, and the remaining letters to its left are ‘A’.
  3. ‘C’ can be changed to ‘T’ if all the letters to its left are ‘T’.
  4. ‘C’ can be changed to ‘A’ if the letter to its immediate left is ‘C’, and the remaining letters to its left are ‘A’.
  5. ‘A’ can be changed to ‘C’ if the letter to its immediate left is ‘C’, and the remaining letters to its left are ‘A’.
  6. ‘G’ can be changed to ‘T’ if all the letters to its left are ‘T’.

Given any gene, we wish to find a way to convert it to the gene "GGG...G" in the minimal number of steps.

For example, starting with the gene [‘C’, ‘T’, ‘T’, ‘G’, ‘G’], we can convert it to [‘G’, ‘G’, ‘G’, ‘G’, ‘G’] in the following eight steps:

[‘C’, ‘T’, ‘T’, ‘G’, ‘G’] [‘C’, ‘C’, ‘T’, ‘G’, ‘G’] [‘A’, ‘C’, ‘T’, ‘G’, ‘G’] [‘A’, ‘C’, ‘G’, ‘G’, ‘G’] [‘T’, ‘C’, ‘G’, ‘G’, ‘G’] [‘T’, ‘T’, ‘G’, ‘G’, ‘G’] [‘C’, ‘T’, ‘G’, ‘G’, ‘G’] [‘C’, ‘G’, ‘G’, ‘G’, ‘G’] [‘G’, ‘G’, ‘G’, ‘G’, ‘G’]

The solution can be written in the following manner, indicating sequentially in each step which letter position to convert (starting from 1), and to which letter to convert it:

[(2, ‘C’), (1, ‘A’), (3, ‘G’), (1, ‘T’), (2, ‘T’), (1, ‘C’), (2, ‘G’), (1, ‘G’)]

For the starting position [‘A’, ‘C’, ‘A’, ‘C’], the minimum number of steps to reach [‘G’, ‘G’, ‘G’, ‘G’] is 25.

Your goal: Find a starting position of 20 letters using only the characters ‘A’ and ‘C’ such that the minimal number of steps to reach a gene of all ‘G’ characters is between 880,000 and 890,000.

Provide your solution in two lines, the first containing the starting position in the same format as [‘A’, ‘C’, ‘A’, ‘C’], and the second line containing the minimal number of steps.

A Bonus "*" will be given for finding the minimal number of steps required for reaching the all-‘G’ state from an all-‘T’ state, for n=100 letters.


We will post the names of those who submit a correct, original solution! If you don't want your name posted then please include such a statement in your submission!

We invite visitors to our website to submit an elegant solution. Send your submission to the ponder@il.ibm.com.

If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com

Solution

  • January 2023 Solution:

    This detailed explanation is due to Bertram Felgenhauer. Thanks!

    There are two solutions that take exactly 885000 steps:

    ['A','C','A','A','A','A','C','A','A','A','A','A','A','A','C','C','C','C','A','C']
    ['C','C','A','C','C','C','A','A','C','C','C','C','C','C','A','A','A','A','C','A']

    For the bonus problem, the answer is 845100400152152934331135480100.

    Extra: The following string requires 10^100 steps to reach G^332:

    AACACAAACCCAACCCCCAAAACCCAAAAACAAAAACCACCAACAACCACAAACACCCCCACCACCCCAACCCACCAACCCAAACCACCAAAC CAAACAACCCACCACCCACAAACAACACCCCCCAAACAACAACCCCACACAACAACACACCCAAACAACCCCACCACAACCCCCCCCCCCCAA CACAACCACCACCACACACCCCCACCCACCACACCACAACACACCAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAACAAAAACACACAAAAC

    Approach

    Except for the leftmost position, we have rules to change the letters as follows:
    A <--> C <--> T <--> G

    That is, to change A to G, we must go via C and T, and the same in reverse for changing G to A. Furthermore, whenever we change the right-most character of the string, all the preceding letters are determined by the rules. So for the ACAC example, the solution must contain the steps TTTC -> TTTT and AACT -> AACG. Finally, there is no point in changing the final letter more often than that. So the minimal cost is 2 steps, plus the number of steps needed for producing the required prefixes:

    ACA ->* TTT (4 steps) TTT ->* AAC (5 steps) AAC ->* GGG (14 steps)

    This can be readily implemented, and as most subproblems are comprised of prefixes of AC, C, and T* (that is, prefixes of the prefixes that are required by the rules), memoization is very effective.

    Further observations

    It's possible to derive closed formulas for the recurring cases. For example:

    • C^n <->* T^n requires n steps both ways
    • A^(n-1)C <->* A^n requires 2^n - 1 steps both ways (and produces all 2^n possible strings from {A,C}^20 along the way)
    • A^n <->* C^n requires (4*2^n - (-1)^n - 3)/6 steps both ways

    Not everything is symmetric: AAA ->* TTT requires 7 steps, while TTT ->* AAA requires 8

Solvers

  • *Martin Bisson (5/1/2023 10:55 AM IDT)
  • *Lazar Ilic (5/1/2023 3:16 PM IDT)
  • Paul Shaw (5/1/2023 3:39 PM IDT)
  • *Paul Revenant (5/1/2023 4:05 PM IDT)
  • *Alper Halbutogullari (5/1/2023 8:15 PM IDT)
  • *Hakan Summakoğlu (5/1/2023 10:02 PM IDT)
  • *Richard T Liu (6/1/2023 1:01 AM IDT)
  • *Victor Chang (6/1/2023 6:45 AM IDT)
  • *Bertram Felgenhauer (6/1/2023 3:08 PM IDT)
  • *Daniel Chong Jyh Tar (6/1/2023 8:01 PM IDT)
  • Johannes Antonio Wolfgan Losert (6/1/2023 10:07 PM IDT)
  • *Bert Dobbelaere (7/1/2023 10:52 AM IDT)
  • *Daniel Bitin (7/1/2023 5:44 PM IDT)
  • *Stephen Herbert Jr (8/1/2023 10:07 AM IDT)
  • *Lorenzo Gianferrari Pini (8/1/2023 12:05 PM IDT)
  • Caleb (8/1/2023 12:16 PM IDT)
  • *Stéphane Higueret (8/1/2023 11:19 PM IDT)
  • *Amos Guler (9/1/2023 10:25 PM IDT)
  • *Martin Thorne (10/1/2023 4:08 AM IDT)
  • *Gary M. Gerken (10/1/2023 6:43 AM IDT)
  • *Vladimir Volevich (10/1/2023 12:19 PM IDT)
  • *Dominik Reichl (10/1/2023 5:19 PM IDT)
  • *Seth Cohen (11/1/2023 5:55 AM IDT)
  • Kennedy (11/1/2023 10:40 PM IDT)
  • Michael Liepelt (12/1/2023 9:50 AM IDT)
  • Guillaume Escamocher (13/1/2023 4:37 PM IDT)
  • *Lorenz Reichel (13/1/2023 6:00 PM IDT)
  • *Harald Bögeholz (14/1/2023 6:24 AM IDT)
  • Fire H24d (14/1/2023 9:03 AM IDT)
  • *Sanandan Swaminathan (14/1/2023 11:25 AM IDT)
  • Julian Bayerl (15/1/2023 9:57 AM IDT)
  • *Dieter Beckerle (16/1/2023 5:46 PM IDT)
  • David Greer (16/1/2023 7:43 PM IDT)
  • Abu Said Manap (17/1/2023 2:50 PM IDT)
  • *David Tucker (17/1/2023 3:33 PM IDT)
  • *Phil Proudman (17/1/2023 6:22 PM IDT)
  • *Latchezar Christov (18/1/2023 3:16 PM IDT)
  • *K S (19/1/2023 2:25 AM IDT)
  • *Nyles Heise (19/1/2023 8:32 AM IDT)
  • *Kevin Bauer (19/1/2023 8:49 PM IDT)
  • *Karl Mahlburg (21/1/2023 6:28 AM IDT)
  • stan (21/1/2023 1:19 PM IDT)
  • *Motty Porat (21/1/2023 5:15 PM IDT)
  • *Dan Dima (21/1/2023 6:17 PM IDT)
  • *Tim Walters (24/1/2023 10:47 AM IDT)
  • *Joaquim Neves Carrapa (26/1/2023 5:07 PM IDT)
  • *Matt Cristina (28/1/2023 5:46 AM IDT)
  • *Kang Jin Cho (28/1/2023 6:57 PM IDT)
  • *Todd Will (28/1/2023 7:48 PM IDT)
  • *David F.H. Dunkley (29/1/2023 3:55 AM IDT)
  • Meron (30/1/2023 2:54 PM IDT)
  • *Andreas Stiller (30/1/2023 8:24 PM IDT)
  • Wilder Boyden (30/1/2023 9:54 PM IDT)
  • Oscar Volpatti (1/2/2023 10:31 AM IDT)
  • Fabio Michele Negroni (1/2/2023 11:01 AM IDT)
  • *Karl D’Souza (2/2/2023 12:24 AM IDT)
  • *Reiner Martin (2/2/2023 4:32 PM IDT)
  • *Marco Bellocchi (4/2/2023 8:58 PM IDT)
  • *Radu-Alexandru Todor (4/2/2023 11:59 PM IDT)
  • *Li Li (5/2/2023 7:12 AM IDT)

Related posts