
[Feature Request] Feasibility of implementing standard non-parametric bootstrap for CMAPLE / MAPLE #150

Description

@Janzulene

Background

MAPLE, a fast phylogenetic inference method for large‑scale data, has been integrated into IQ‑TREE 3. However, the standard -b bootstrap option does not work when MAPLE is used as the inference method (the same applies to the -o option for specifying an outgroup). At the very least, IQ‑TREE should issue a warning telling users that the bootstrap option will be ignored in this combination.

Furthermore, CMAPLE currently provides SPRTA as a branch support metric. In my own preliminary experiments, SPRTA scores correlate only weakly with the branch support values obtained from the standard bootstrap (-b / -bb) under the traditional maximum‑likelihood (ML) framework in IQ‑TREE, even though the MAPLE tree itself shows a stronger topological correlation with the ML tree than trees from other fast methods do.

Suggestion

I would like to propose enabling Felsenstein’s non‑parametric bootstrap for CMAPLE. The basic workflow would be:

  1. Perform column‑wise resampling (with replacement) of the input alignment.

  2. Convert each bootstrap replicate into the MAPLE‑compatible format (e.g., using createMapleFile.py).

  3. Run CMAPLE independently on each replicate.

  4. Map the resulting topologies back to the original tree to compute branch support.
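Steps 1 and 4 can be sketched in plain Python to make the proposal concrete. This is only a minimal illustration under my own assumptions, not CMAPLE or IQ‑TREE code: the alignment is assumed to be a dict of equal‑length strings, each tree is assumed to be already reduced to a set of bipartitions (splits), and the helper names `bootstrap_alignment`, `normalize_split` and `branch_support` are hypothetical. Steps 2 and 3 (format conversion and the CMAPLE run) are deliberately left out.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Step 1: resample alignment columns with replacement.

    seqs: dict mapping taxon name -> aligned sequence (all the same length).
    rng:  a random.Random instance, so replicates are reproducible.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; reuse them for every taxon
    # so that each resampled column stays intact across sequences.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


def normalize_split(side, taxa):
    """Represent a bipartition by the side that excludes a fixed anchor taxon."""
    side = frozenset(side)
    anchor = min(taxa)  # arbitrary but deterministic choice of anchor
    return frozenset(taxa) - side if anchor in side else side


def branch_support(reference_splits, replicate_split_sets):
    """Step 4: fraction of replicate trees that contain each reference split."""
    n_reps = len(replicate_split_sets)
    if n_reps == 0:
        raise ValueError("need at least one bootstrap replicate")
    return {
        split: sum(split in rep for rep in replicate_split_sets) / n_reps
        for split in reference_splits
    }
```

Each replicate from `bootstrap_alignment` would then be converted and passed to CMAPLE, and the splits of the resulting trees fed into `branch_support` together with the splits of the tree inferred from the original alignment.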

MAPLE is already substantially faster than traditional ML methods. Adding a bootstrap procedure would, of course, increase total computation time, but for large datasets it would still be much faster than running ML + bootstrap. If future experiments confirm that bootstrap is indeed more reliable than SPRTA (as my small‑scale tests already hint), then implementing bootstrap for MAPLE would be the best available option until an even faster method that can achieve near‑bootstrap reliability appears.

Adding standard bootstrap would also align MAPLE with other mainstream phylogenetic methods (ML, MP, etc.) in terms of branch support assessment, making it easier for users to compare and trust the results.

Major Obstacle & Open Question

Currently CMAPLE’s initialization logic expects the “reference + SNP difference” format. It remains unclear whether the algorithm is statistically compatible with resampled (bootstrap) data in terms of bootstrap consistency. Has any theoretical or simulation‑based validation been conducted?

Supporting Experiment

I have performed a small‑scale test: on a modest test tree, I compared SPRTA values against bootstrap / ultrafast bootstrap support for the same clades (identical topology). The correlation was not very strong. I plan to extend these experiments to larger simulated and real datasets and will share the detailed results in a follow‑up discussion. (Please note that, due to other commitments, I may not be able to provide the full experimental follow‑up very soon. I appreciate your patience.)
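For anyone who wants to repeat this comparison, the per‑clade correlation can be measured with a rank correlation. The sketch below is a plain‑Python Spearman coefficient with average ranks for ties. It assumes the two support values have already been matched clade by clade into two equal‑length lists, and the function names are my own, not part of IQ‑TREE or CMAPLE.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank correlation seems the safer choice here because SPRTA and bootstrap support are on different scales and both tend to pile up near their maximum, which can distort a linear (Pearson) coefficient.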
