A much more efficient solution that guarantees that only new not previously generated molecules are produced at each step is based on the theory of free and oriented trees. A paraffin molecule can be viewed as a free tree with nodes corresponding to carbon atoms and edges corresponding to carbon-carbon bonds. However, the data structures for paraffin molecules are designed to represent oriented rather than free trees. Duplicates are avoided by a mapping between the vertices in the free tree and those in the oriented tree. Lexicographic ordering for the subnodes of a node eliminates the source of duplicates due to imposing order on the unordered neighbors of a node in a free tree.
Varying the vertex that is mapped to the root vertex in the oriented representation of a free tree is another source of duplicates. The centroid theorem is used to avoid this: a tree of odd size has a single centroid (vertex of a minimum height), while one of even size has either a single centroid or a pair of adjacent centroids.
Even-sized double centroid paraffin molecules, called "bond-centered paraffins", or BCPs have a root corresponding to a carbon-carbon bond. Each of the carbon atoms attached to this bond is a root of a paraffin radical. The two top-level radicals are of size i/2 each.
The root nodes of carbon radicals of size i, i>0, correspond to carbon atoms and have 3 subnodes, each the root of a subtree representing a paraffin radical. Each of these three subradicals have a size less than i-1 and total size of i-1.
A paraffin radical of size 0 is a hydrogen atom.
All three kinds of objects, radicals, BCPs, and CCPs are constructed by attaching a needed number of paraffin radicals of appropriate sizes to some other node.