Problem #14 is the first of many Project Euler problems to feature the Collatz Sequence. The Collatz Sequence, named from the infamous Collatz Conjecture, is the sequence of numbers that follow an arbitrary starting integer “n” when the following process is performed: If x is the current number, whenever x is even, divide x by 2 to get the next term, otherwise, multiply x by 3 and add 1 to get the next term. The Collatz Conjecture is an unproved hypothesis that all possible starting values of n will ultimately produce the oscillating sequence 4, 2, 1, 4, 2, 1, . . .
Proving the Collatz conjecture is a notoriously difficult task that is far beyond the scope of any problem on Project Euler. The famous Hungarian mathematician Paul Erdos infamously stated that “Mathematics may not be ready for such problems” when discussing the Collatz conjecture. Luckily, we are not required to prove the Collatz conjecture in Problem #14. The problem reads:
Project Euler Problem 14: Longest Collatz sequence
The following iterative sequence is defined for the set of positive integers:
n -> n/2 (n is even)
n -> 3n+1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
I have a small confession to make about this problem. When I first attempted this problem, I couldn’t think of a way to do it without basic brute force. As a result, I solved it through means of brute force with a script that took about 11 seconds to run. Afterwards, I read Project Euler’s documentation on this problem and spoiled myself about a more efficient means of solving it. I will present the brute force solution first, and then Project Euler’s solution:
Solution #1: Brute Force Approach
My brute force approach to this problem is possibly the most brain-dead means of solving this problem. First I made a function that finds the length of the Collatz sequence for an arbitrary starting number. Then I calculated the length of the sequence for all starting values less than a million and stored a running maximum index and length. Once the process was complete, I had the starting term with the longest chain. Here is my implementation of this method in Python 2.7:
'''
Author: Walker Kroubalkian
Brute Force Approach to Project Euler Problem #14
'''
import time
def getCollatzLength(x):
t = 0
while(x>1):
if(x%2==0):
x/=2
else:
x = 3*x+1
t+=1
return t
def projectEulerProblemFourteen(x):
maxIndex = -1
maxLength = 0
for i in range(1,x):
a = getCollatzLength(i)
if(a>maxLength):
maxLength = a
maxIndex = i
return maxIndex
start = time.time()
print projectEulerProblemFourteen(1000000)
print ("--- %s seconds ---" % (time.time()-start))
'''
Prints
837799
--- 11.196969986 seconds ---
for input of n=1000000
'''
As shown above, this solution does pass Project Euler’s “One Minute Rule”, but it is not very optimized. For example, by the time I calculate the length of the sequence for i = 600,000, I have already calculated the length of the sequence for i = 600,000/2 = 300,000 which should be 1 less. This algorithm does not pick up on that fact, and therefore it wastes a ton of time. There are many other inefficiencies in this algorithm that are corrected in Project Euler’s solution. Here is the solution from Project Euler’s documentation for this problem:
Solution #2: Dictionary Approach (Credits to PE)
Project Euler’s solution fixes two major inefficiencies with the brute force algorithm. First of all, it uses a dictionary to recursively store the length of the sequence for any intermediate value in a sequence we check. Secondly, it notices that the Collatz sequence for a starting value of 2n is always 1 greater than the Collatz sequence for a starting value of n, and therefore it is only necessary to check starting values greater than or equal to 500,000. Using these two observations, the solution becomes much more efficient. I have implemented Project Euler’s solution in Python 2.7:
'''
Author: Project Euler with Python Implementation by Walker Kroubalkian
Dictionary Approach to Project Euler Problem #14
'''
import time
def projectEulerProblemFourteen(n):
longestLength = 0
longestIndex = -1
knownIndices = {1: 1}
def indexLength(x):
if x in knownIndices:
return knownIndices.get(x)
if(x%2==0):
knownIndices[x] = indexLength(x/2)+1
else:
knownIndices[x] = indexLength(3*x+1)+1
return knownIndices[x]
for i in range(n/2,n):
a = indexLength(i)
if(a > longestLength):
longestLength = a
longestIndex = i
return longestIndex
start = time.time()
print projectEulerProblemFourteen(1000000)
print ("--- %s seconds ---" % (time.time()-start))
'''
Prints
837799
--- 1.09148907661 seconds ---
for input of n=1000000
'''
This solution was my first experience with Python dictionaries and functions within functions. Unfortunately, even with Project Euler’s improvements, my implementation in Python fails to solve this problem in under a second (I believe this is the first time among all of my blog posts). I would like to return to this problem to optimize it further.
Thanks for reading! See you tomorrow!