Collatz Proof Approach


Most readers of these pages,  but especially those with a mathematical bent,  find themselves overwhelmed by the volume of textual material,  and rapidly lose interest.  This page is an attempt to focus on the small fraction of this work which I believe suffices for the basis of a proof.  Those segments presented here are limited to minimal treatment with liberal reference to the more discursive material included elsewhere on this site..  Hopefully,  this skeletal treatment will pique the interest of readers to look,  as needed,  at the detail which is provided through the pointers.  Readers looking for more detail can select those supporting pages which they feel would be most useful to supplement those points which are inadequately covered here.

Accordingly,  I will focus here principally on the features which led to the development of a computer program,  rsetidea,  which produces a form of the Collatz predecessor tree employing residue sets as its basic structural element.  The program precisely implements an inductive construction of the infinite predecessor tree,  limited only by contraints of time and space for output.  This program suffices for understanding the process of creation of predecessors,  and its results give ample details of systematic patterns which provide the immediate basis for a proof of the conjecture.


There is a careful review of work on the Collatz conjecture which provides a bounty of references to individual papers.  The conjecture has atrtracted hundreds of interested workers during the seventy years since its proposal but is still without a generally accepted proof.  The conjecture involves a recursive iteration on the positive integers and proposes that the trajectories of these iterations are such that convergence to 1 will result from any starting positive integer value.  We need only prove that the Collatz predecessor graph is comprehensive and weakly connected.

Approach of this work

We must develop a structure which accommodates the peculiar behavior of the Collatz trajectories.  Tree structures,  which are weakly connected graphs,  are clearly the best choice.  Various forms of predecessor trees better illustrate certain aspects of the itineraries.

A general tree allows tracing the itinerary from any odd integer back to the root at 1. 

A binary tree links the integer nodes of the general tree in a different manner,  giving rise to some nomenclature (specifically,  left descent assemblies (aka l.d.a.s)  and extensions)  but has the peculiarity that tracing the itinerary from a second or later extension element omits mention of the the values of all those extension elements preceding them.  Instead the common parent of all the extension elements appear as their predecessor.

An abstract tree eschews consideration of individual integers,  instead focusing on sets of integers traversed during trajectories.  The abstract tree provides the long sought structure whose discovery serves to provide a ready approach to a proof of the conjecture.  The nodes {Ta} (Trajectory elements in the abstract tree)  are residue sets c modulo d,  denoted as {c[d]} or even {dn+c},  with c<d.

Note: The predecessor tree is considered to consist solely of odd integers ({Ta} in {1[2]})  as indicated in one version of the Collatz iteration formula.  Ts and Tp are successor and predecessor nodes,  respectively.

        Ts=(3*Tp+1)/2i                     where i is such that Ts is odd                                        (1)

The even numbers are included (almost as an afterthought)  in the proof since they are always powers-of-two multiples of the odd numbers,  i.e.  {Ta in {0[2]}} produced from 2kTa where {Ta} in {1[2]},k in N}.

The calculations which produce predecessors of any parental set arise directly from (2),  which is simply derived from (1). To produce successive generations of children in the predecessor trees,  we use the recursive formula

        Tp=(-1+Ts*2i)/3     where i  must be even when   Ts is congruent to 1 mod 3
                                                and i  must be odd when Ts  is congruent to 2 mod 3.           (2)

Def'n: An l.d.a.  (left descent assembly,  so-named because it constitutes a left descent in the binary predecessor tree)  is a path from the root of the abstract predecessor tree to some leaf node.  The path from the root ({Ta} in {5[8]})  to a leaf node ({Ta in {0[3]})  inclusive of the root node,  internal nodes,  and leaf node together comprise an l.d.a.  In eqn 2,  i values ≥3 lead to extensions,  but within l.d.a.s the value of i is restricted to {1,2},  thus limiting the number of successive divisions by 2 in an l.d.a.  portion of a Collatz trajectory.

The effective root of the abstract tree is {5[8]}.  Many elements in that set produce 1 as their (first odd)  successor in the odds-only version of the predecessor trees.  The others produce an odd integer which occurs in a residue set elsewhere in the tree after 3 or more divisions by 2 .

The combined effect of mod 3 and mod 8 components in calculation of the predecesors result in control of the result by the value of floor(mod(Ta,72)/24).  Every odd element in a residue set is in {0,1,2}[3].  Only those in {1[3]} or {2[3]} appear as headers of additional l.d.a.s.  The elements in {0[3]} are leaves and have no predecessors.  The elements in both {0[3]} and {5[8]} are both leaves and l.d.a.  headers.  In other words,  these are null l.d.a.s containing no internal nodes. .

All non-leaf nodes in l.d.a.s occur in one of two subsets,  ({1[3]} or {2[3]}),  and act as parents in elaboration of the l.d.a.  {Ta} in {1[3]} produce predecessors bigger than themselves,  and {Ta} in {2[3]} produce predecessors smaller than themselves,  whence we have formed the habit of calling the former b-steps and the latter s-steps.  This leads to two different predecessor residue sets from any internal l.d.a.  node.  An iteration of the predecessor calculations through successive generations will produce an infinite abstract tree.  The cumulative effect is that l.d.a.s can grow to indefinite length.  The genesis of any particular l.d.a.  can be succinctly shown by a string of the form "e(s|b)jt" where e represents the header extension,  s|b shows the successive smaller or bigger steps,  j is the number of steps in the l.d.a.,  and t represents the terminating leaf node

The elements of {5[8]} (the root of the abstract tree)  are scattered widely in the trees which use integers as the tree nodes.  If an integer's Collatz itinerary is traced through the abstract tree,  the path may pass into {5[8]} a large number of times.  Each such visit causes entry into a new l.d.a.  via an extension of one of its elements.  The process continues until one of the extensions of 5[8] whose odd predecessor is 1 appears, where the itinerary ends.

Overall,  the abstract binary predecessor tree consists of  l.d.a.s linked together by extensions which serve as headers for additional generations of  l.d.a.s.  The portrayal of the Collatz predecessor graph as a binary tree has the very important result that it provides a means of measuring the integer content of the graph.  Use is made of the density of the integers in each residue set.  The sum of the densities among the integers should approach 0.5 (since only odd ones are included)  as the tree is grown through successive generations.  Two distinct ways arise to do the summations: (a)  through the tree level by level (as shown)  where the increasing lengths of the l.d.a.  descriptive strings are all shared and (b)  using the Fibonacci series which comes from considering residue sets which share a given power of 2 in the d portion of the c[d] of their descriptor,  which arises from a skewed viewpoint of the binary tree.

The abstract tree is a tree whose nodes are infinite sets of odd integers {c[d],  i.e.  congruent to c(mod d}. We often refer to (and manipulate) these sets using formulas {dn+c,  with d in {2a*3b},[a,b] in Z+,n in N},  reflecting the means of their derivation (below).

Any set in the abstract tree other than a leaf node set can give rise to a predecessor set by either an s- or a b- step.  This immediately produces the abstract tree (see two views there).  As is evident in a brief schematic of abstract tree development all possible l.d.a.s are formed by systematic use of both s- and b-steps from the root and then each successively developed node.  Every path from root to a terminating leaf set in the abstract tree represents an l.d.a.  L.d.a.s can be denoted by a string "e(s|b)kt",  where the e represents the extension header,  the parenthesized expression indicates k ( in N)  selections in any order of s- or b-steps,  and the t represents termination due to having reached a leaf set.  This "esbt" expression denoting the detailed structure of an l.d.a.  is in predecessor order.  Read left to right to follow the growth of the l.d.a.  in the predecessor tree.  Read right to left to follow a path segment in the direction of the Collatz trajectory.  We will use the predecessor direction for l.d.a.s and other path segments.

We need now to develop the {c[d]} residue class for every element of every l.d.a.  For example,  the root node {5[8]} in the abstract tree includes as subsets the particular {c[d]} residue class for the root of every l.d.a.,  and similarly for all the other nodes in the abstract tree.  When eq2 is employed to produce each node's predecessors,  a leaf set is identified as well as two predecessor sets.  The leaf node {c[d]} value is precisely that describing the leaf set resulting from the l.d.a.  traced to that point.  Now,  applying eqn 1 through the successor (upward)  pathway,  the resulting series of {c[d]} values will represent those of the individual sets of the l.d.a.  traced,  always subsets of the values observed in the predecessor direction,  and concluding with the particular subset of {5[8]} which represents the header node of the l.d.a.  traced.

The process is pictured schematically,  and can be accomplished by hand in a down- and up again- process using the dn+c formula or by the program treegrow.  Two illustrative subsets (complete and headers only)  of the c[d] formulas for l.d.a.  elements are provided.  Margaret Maxfield has provided a proof that all these residue classes are unique.  Note that there has been a subtle transformation effected by the determination of residue classes for the nodes in the l.d.a.s in the abstract tree.  The abstract tree nodes which were first described as sets of individual elements in all the paths to leaf nodes (at whatever depth, leaf node or not)  have been converted to an identity (===)  describing the residue set of that single set representing the leaf node of the l.d.a.  which culminates there. This is consistant with the necessity to anchor l.d.a.s both at their header and leaf nodes.

Since every element of every l.d.a.  is a unique residue set {Ta===c[d]} we may turn to the task of "counting" the integers in that infinite set (of l.d.a.s)  of infinite sets (instantiations over k in N of each particular l.d.a.).

Combinatorics,  the Density Metric,  Infinite Summations

To begin with,  cardinality is not the correct metric to use to determine if a set of sets of odd integers contains all the odd integers.  I suggest herein that density provides an appropriate metric.  Set densities built directly on the abstract predecessor tree indicate that the total density of the sets of odd numbers created by it is 0.5,  i.e.  the totality of them.  These are based on a much less detailed analysis than that of the next paragraphs,  and thus less convincing.

For more detail, we'll take advantage of a picture, worth well over a thousand words.  The picture reveals two critical features: first the combinatorics which lead to the binomial pattern of the frequencies of occurrence of l.d.a.s sharing a common (a,b)  of the 2a*3b expression which is the d of c[d],  which simnply describes the depth to which the abstract tree has been developed, and, second, the use of an "isobar" to point out nodes with common power-of-2 depths into the abstract tree,  whence the Fibonacci series arises.  Both approaches lead quickly into summations of the densities of the odd integers across the whole infinite abstract predecessor tree.  These summations indicate that the abstract tree contains all the odd integers,  with the boundaries carefully aligned . ( Taking the limit of the uncounted odd integers a a goes to infinity confirms this result.)

Heart of the recursive code in program rsetidea

 csetcurs := proc(path::string, cvalue, dvalue, levelp2, mxl)  

The program needs only track the path reached at every point, the c[d] values developed to that point, and the maximum level to which the tree is developed. The complete code is more complex since it will track features required to identify the values of c[d] both before its progeny is determined and the ultimate anchored value which characterizes the node in its role as the leaf node.

The following code fragment directly implements the formula (2) for recursive construction of the abstract tree.. The code can only produce a tree, whence comes the claim that the abstract predecessor graph is a tree, and thus weakly connected.

It was expected that the output file would immediately reflect the expected distributions of densities, but in the result it became clear that the statements (bearing suffixed stars) which pass down multiples of the d value to deeper recursive levels lead to production of sets among deeper locations which are subsets of those obtained at shallower locations. The result was an overestimate of the coverage of the odd integes by at least 1/3.

The simple suppression of the starred statements results in loss from the output file of the descriptive information about vast numbers of leaf nodes in the tree. So the code was run as presented and the results post-processed to eliminate those resulting less dense subsets. With this correction, the expected results were obtained. The samples of the output file presented in a more complete discussion of rsetidea have been processed to remove the duplicative subsets.

 #it might be a leaf
    if  cvalue mod 3 = 0 then densout(path)
 #it might be a b-step parent
    elif cvalue mod 3 = 1 then rsetcurs(cat(path,"b"), 4/3*cvalue - 1/3, 4*dvalue, levelp2 + 2, mxl)  ****
 #it must be an s-step parent
    else rsetcurs(cat(path,"s"), 2/3*cvalue - 1/3, 2*dvalue, levelp2 + 1,  mxl)   ****
 #look at children of cvalue + dvalue (same possibilities for children)
    if (cvalue + dvalue) mod 3 = 0 then densout(path)
    elif (cvalue + dvalue) mod 3 = 1 then rsetcurs(cat(path,"b"),
        4/3*cvalue + 4/3*dvalue - 1/3, 4*dvalue, levelp2 + 2,  mxl)
    else rsetcurs(cat(path,"s"),
        2/3*cvalue + 2/3*dvalue - 1/3, 2*dvalue, levelp2 + 1,  mxl)
 #look at children of cvalue + 2*dvalue (same possibilities for children)
    if (cvalue + 2*dvalue) mod 3 = 0 then densout(path)
    elif (cvalue + 2*dvalue) mod 3 = 1 then rsetcurs(cat(path,"b"),
        4/3*cvalue + 8/3*dvalue - 1/3, 4*dvalue, levelp2 + 2,  mxl)
    else rsetcurs(cat(path,"s"),
        2/3*cvalue + 4/3*dvalue - 1/3, 2*dvalue, levelp2 + 1,  mxl)

My Collatz Home Page         Index to Terms Used