On the optimality of the neighbor joining algorithm

On the optimality of the neighbor joining algorithm

Kord Eickmeyer, Peter Huggins, Lior Pachter, Ruriko Yoshida

Supplementary materials

Comparison of neighbor joining and balanced minimum evolution cones

Here we give approximations of spherical volumes of intersections between NJ and BME cones in the first orthant. The intersection between cones BME(T1) and NJ(T2) in the first orthant gives the set of all nonnegative dissimilarity maps for which the BME tree is T1 and the NJ tree is T2.

The volume approximations are based upon a spherical sampling of the first orthant. For each number of taxa n=5,6,7,8, we used a sample of size 10,000,000. For each pair of BME tree and NJ tree, we tallied the number of samples which yielded the pair. By dividing the tallies by 10,000,000, one can obtain an estimate of the fraction of dissimilarity maps yield each pair of BME tree and NJ tree.

Here's a tarball of the data files. For each n=5,6,7,8, the untarred file treen.finalsummary gives the sampling tallies in the following line-by-line format:

  tally   BME tree   NJ tree

where both the NJ and BME trees are given in BME vector format. The BME vector of a bifurcating tree T is as described in the paper, except multiplied by an appropriate power of 2 so that all entries are integers.

Software for estimating spherical volumes of cones

For cones of small volume, sampling the enitre sphere or an orthant can be prohibitively slow for volume estimation. Our spherical cone volume software estimates the volume of a triangulated cone using mesh approximation and sampling. The tarball untars into two MATLAB .m files. To run, put the two .m files in the same directory, and then use the function cone_spherevol() in MATLAB as follows:


where the rows of X are the generating rays of the trianguated cone, and T is a triangulation where each simplicial cone is given as a row of indices, and p is the number of samples to use.

For example, if our cone is the first orthant, with an extra ray (1,1) added to the triangulation, then we would use X = [0,1; 1,0; 1,1] and T = [1,3; 2,3] which means there are two simplicial cones in the trianguation, one cone generated by the 1st and 3rd rows of X, and one cone generated by the 2nd and 3rd rows of X.

The approximation converges to the true spherical volume as p increases. If any simplicial cones contain an angle close to 180 degrees, then the convergence will be much slower. Therefore we recommend breaking wide cones into pieces, for instance by intersecting with orthants. One can also add interior rays as in the above example, which will improve the initial approximation obtained from the simplicial mesh.

For any questions about this MATLAB software, please contact user phuggins at math(stop)berkeley(stop)edu