FAQ
- How do I speed up my analysis?
- My analysis quits without giving me a useful error message. What can I do?
- How can I get PartitionFinder to work on my Linux cluster?
- Can I use PartitionFinder to do model selection?
- Can I see the PartitionFinder source code?
- What models of molecular evolution are included in PartitionFinder?
How do I speed up my analysis?
PartitionFinder and PartitionFinderProtein have to do a huge number of calculations to find the best partitioning scheme. On very large datasets, some types of analysis are just impractical. There are three things you can do to make sure your analysis runs as quickly as possible. First, use the "search=greedy;" option rather than "search=all;". Second, use a computer with multiple processors if you can. PartitionFinder automatically detects how many processors you have available and uses all of them; the '-p' option controls how many processors PartitionFinder uses (see the manual for more information). Third, reduce the number of models you're considering. Most people start by selecting "models = all;". This is a good start, but in some cases it's just not practical to analyse all possible models (56 for DNA, 112 for amino acids). PartitionFinder and PartitionFinderProtein will still work very well if you use just one or two models; for instance, with DNA sequences you can use "models = GTR, GTR+G;". For amino acid sequences, a good option is to use four models: "models = LG, LG+G, LG+G+F, LG+F;". Once you have searched for the optimal partitioning scheme in this way, you can then use PartitionFinder to do model selection using all possible models on that scheme (see below).
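To see why the search option matters so much, here is a rough Python sketch that counts candidate partitioning schemes. With "search=all;", every way of grouping N data blocks into subsets is a candidate, and that count is the Bell number B(N). For the greedy count we assume (a simplification, not a description of PartitionFinder's code) that each step tries merging every pair of the current subsets and that the search runs all the way down to a single subset; a real greedy search can stop earlier, so this is an upper bound. The function names are illustrative only.

```python
from math import comb


def bell(n: int) -> int:
    """Number of ways to group n data blocks into subsets (Bell number B(n)).

    Uses the Bell triangle: each row starts with the last entry of the previous
    row, and each further entry adds the entry above-left to its left neighbour.
    """
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def greedy_max_schemes(n: int) -> int:
    """Upper bound on schemes compared by a greedy pairwise-merge search.

    Assumes a step with k subsets tries all (k choose 2) pairwise merges and
    that merging continues until only one subset is left.
    """
    return sum(comb(k, 2) for k in range(2, n + 1))


for n_blocks in (5, 10, 15, 20):
    print(
        f"{n_blocks:>2} blocks: search=all {bell(n_blocks):,} schemes, "
        f"greedy at most {greedy_max_schemes(n_blocks):,}"
    )
```

For 10 data blocks this is 115,975 schemes for an exhaustive search against at most 165 for the greedy merge. The same numbers explain the 203 GTR sub-models mentioned below: bell(6) is 203, the number of ways to group the six GTR exchange rates. Separately, each subset has to be fitted with every candidate model, so cutting the DNA model list from 56 to 2 reduces the model fits per subset 28-fold.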
My analysis quits without giving me a useful error message. What can I do?
PartitionFinder will usually give you a helpful error message when there's a problem, but in some cases we won't have anticipated a particular issue, so it will just quit without a useful error message. There are three things to do here.
How can I get PartitionFinder to work on my Linux cluster?
To get PartitionFinder and PartitionFinderProtein working on Linux, follow these simple steps.
Can I use PartitionFinder to do model selection?
Yes. PartitionFinder and PartitionFinderProtein can easily be used to do standard model selection, and they work in a very similar way to programs like ModelTest, ProtTest, ModelGenerator, etc. PartitionFinder and PartitionFinderProtein should be as quick as, or quicker than, these programs. The big advantage of PartitionFinder is that it can perform model selection on partitioned datasets, doing model selection on each partition without having to run separate analyses for each one. In fact, the algorithms we use in PartitionFinder and PartitionFinderProtein are in many ways more appropriate for performing model selection on partitioned datasets than those in other programs, because we use information from the whole alignment to build a guide tree for the model selection. So, if you have a dataset and want to perform model selection, just follow these steps:
The best_schemes.txt file tells you the best model for each subset of sites (sometimes called a partition) in your alignment. PartitionFinder also stores all of the model selection results for each subset, in a form very similar to the output of programs like ModelTest, ProtTest, etc. This information is stored in a .txt file inside the /analysis/subsets folder. To find it, copy the subset identifier from the best_schemes.txt file (in the "Alignment" column). This is a long name, something like "50bf1643d2a386419c9264eccd173b6b". Now find the .txt file in /analysis/subsets that has that name, e.g. 50bf1643d2a386419c9264eccd173b6b.txt. That file contains neatly formatted model selection results for the subset.
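The lookup described above is easy to script when you have many subsets. This is a minimal Python sketch that assumes only the layout described here: a subsets folder inside the analysis folder, holding one <identifier>.txt file per subset. The function name subset_results_path is hypothetical. As a convenience (our assumption, not documented PartitionFinder behaviour), it also accepts a file name or path that ends in the identifier, keeping only the final name without its extension.

```python
from pathlib import Path


def subset_results_path(analysis_dir, subset_id):
    """Return the model selection results file for one subset.

    analysis_dir: the analysis folder that PartitionFinder writes its output to.
    subset_id: the identifier copied from best_schemes.txt, e.g.
        "50bf1643d2a386419c9264eccd173b6b". Surrounding spaces and quotes are
        ignored, and for a file name or path only the final name minus its
        extension is used.
    """
    identifier = Path(subset_id.strip().strip('"')).stem
    results = Path(analysis_dir) / "subsets" / f"{identifier}.txt"
    if not results.is_file():
        raise FileNotFoundError(f"no results file for subset {identifier!r}: {results}")
    return results
```

For example, subset_results_path("analysis", "50bf1643d2a386419c9264eccd173b6b").read_text() returns the contents of analysis/subsets/50bf1643d2a386419c9264eccd173b6b.txt, and a clear FileNotFoundError is raised if the identifier was copied incorrectly.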
Can I see the PartitionFinder source code?
Yes. It's here: https://github.com/brettc/partitionfinder. It's released under a GNU General Public License, which means you can do more or less whatever you want with it.
What models of molecular evolution are included in PartitionFinder?
PartitionFinder and PartitionFinderProtein include all of the named models available in PhyML and RAxML; see the lists below. In principle we could include any model of amino acid replacement, or any sub-model of the GTR model (there are 203 in total). If you have specific requirements, send me an email and I'll implement additional models. (Note that the TrN models are annotated as TN93 in some programs.)
DNA Models in PartitionFinder using default settings (56 in total)
+I: include a proportion of invariant sites
+G: include gamma distributed rates across sites (with 4 categories)
JC, K80, TrNef, K81, TVMef, TIMef, SYM, F81, HKY, TrN, K81uf, TVM, TIM, GTR, JC+I, K80+I, TrNef+I, K81+I, TVMef+I, TIMef+I, SYM+I, F81+I, HKY+I, TrN+I, K81uf+I, TVM+I, TIM+I, GTR+I, JC+G, K80+G, TrNef+G, K81+G, TVMef+G, TIMef+G, SYM+G, F81+G, HKY+G, TrN+G, K81uf+G, TVM+G, TIM+G, GTR+G, JC+I+G, K80+I+G, TrNef+I+G, K81+I+G, TVMef+I+G, TIMef+I+G, SYM+I+G, F81+I+G, HKY+I+G, TrN+I+G, K81uf+I+G, TVM+I+G, TIM+I+G, GTR+I+G
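The 56 DNA models are every combination of 14 base substitution models with four rate-heterogeneity options (none, +I, +G, +I+G). This short Python sketch rebuilds the list in the order shown above and uses it to flag misspelled names in a hand-written "models = ...;" value. It only manipulates strings; the helper names are ours, not PartitionFinder's, and exact (case-sensitive) matching may be stricter than PartitionFinder itself.

```python
BASE_DNA_MODELS = [
    "JC", "K80", "TrNef", "K81", "TVMef", "TIMef", "SYM",
    "F81", "HKY", "TrN", "K81uf", "TVM", "TIM", "GTR",
]
# Rate-heterogeneity options, grouped in the same order as the list above.
RATE_OPTIONS = ["", "+I", "+G", "+I+G"]


def dna_models():
    """All 14 x 4 DNA model names, grouped by rate option."""
    return [base + option for option in RATE_OPTIONS for base in BASE_DNA_MODELS]


def unknown_models(models_value):
    """Return names in a 'models = ...;' value that are not in dna_models().

    Matching is exact, including case.
    """
    requested = [name.strip() for name in models_value.strip().rstrip(";").split(",")]
    known = set(dna_models())
    return [name for name in requested if name and name not in known]


print(len(dna_models()))
print(unknown_models("GTR, GTR+G;"))
print(unknown_models("GTR, HKY+G+I;"))
```

Here unknown_models("GTR, HKY+G+I;") flags HKY+G+I, because the list above writes that model as HKY+I+G.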
Amino Acid Models in PartitionFinderProtein using default settings (112 in total)
+I: include a proportion of invariant sites
+G: include gamma distributed rates across sites (with 4 categories)
+F: include amino acid frequencies estimated from the alignment
LG, WAG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb, HIVw, LG+F, WAG+F, mtREV+F, Dayhoff+F, DCMut+F, JTT+F, VT+F, Blosum62+F, CpREV+F, RtREV+F, MtMam+F, MtArt+F, HIVb+F, HIVw+F, LG+I, WAG+I, mtREV+I, Dayhoff+I, DCMut+I, JTT+I, VT+I, Blosum62+I, CpREV+I, RtREV+I, MtMam+I, MtArt+I, HIVb+I, HIVw+I, LG+G, WAG+G, mtREV+G, Dayhoff+G, DCMut+G, JTT+G, VT+G, Blosum62+G, CpREV+G, RtREV+G, MtMam+G, MtArt+G, HIVb+G, HIVw+G, LG+I+G, WAG+I+G, mtREV+I+G, Dayhoff+I+G, DCMut+I+G, JTT+I+G, VT+I+G, Blosum62+I+G, CpREV+I+G, RtREV+I+G, MtMam+I+G, MtArt+I+G, HIVb+I+G, HIVw+I+G, LG+I+F, WAG+I+F, mtREV+I+F, Dayhoff+I+F, DCMut+I+F, JTT+I+F, VT+I+F, Blosum62+I+F, CpREV+I+F, RtREV+I+F, MtMam+I+F, MtArt+I+F, HIVb+I+F, HIVw+I+F, LG+G+F, WAG+G+F, mtREV+G+F, Dayhoff+G+F, DCMut+G+F, JTT+G+F, VT+G+F, Blosum62+G+F, CpREV+G+F, RtREV+G+F, MtMam+G+F, MtArt+G+F, HIVb+G+F, HIVw+G+F, LG+I+G+F, WAG+I+G+F, mtREV+I+G+F, Dayhoff+I+G+F, DCMut+I+G+F, JTT+I+G+F, VT+I+G+F, Blosum62+I+G+F, CpREV+I+G+F, RtREV+I+G+F, MtMam+I+G+F, MtArt+I+G+F, HIVb+I+G+F, HIVw+I+G+F
DNA Models in PartitionFinder using --raxml option (2 in total)
GTR+G, GTR+I+G
Amino Acid Models in PartitionFinderProtein using --raxml option (55 in total)
DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, DAYHOFF+G, DCMUT+G, JTT+G, MTREV+G, WAG+G, RTREV+G, CPREV+G, VT+G, BLOSUM62+G, MTMAM+G, LG+G, DAYHOFF+G+F, DCMUT+G+F, JTT+G+F, MTREV+G+F, WAG+G+F, RTREV+G+F, CPREV+G+F, VT+G+F, BLOSUM62+G+F, MTMAM+G+F, LG+G+F, DAYHOFF+I+G, DCMUT+I+G, JTT+I+G, MTREV+I+G, WAG+I+G, RTREV+I+G, CPREV+I+G, VT+I+G, BLOSUM62+I+G, MTMAM+I+G, LG+I+G, DAYHOFF+I+G+F, DCMUT+I+G+F, JTT+I+G+F, MTREV+I+G+F, WAG+I+G+F, RTREV+I+G+F, CPREV+I+G+F, VT+I+G+F, BLOSUM62+I+G+F, MTMAM+I+G+F, LG+I+G+F
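Both amino acid lists follow the same pattern as the DNA list: each replacement matrix is crossed with a fixed set of options, giving 14 matrices × 8 options = 112 by default and 11 matrices × 5 options = 55 with --raxml. The Python sketch below regenerates both lists in the order shown; the helper names are ours, and nothing here calls PartitionFinderProtein.

```python
DEFAULT_AA_MATRICES = [
    "LG", "WAG", "mtREV", "Dayhoff", "DCMut", "JTT", "VT", "Blosum62",
    "CpREV", "RtREV", "MtMam", "MtArt", "HIVb", "HIVw",
]
# Options in the order the default list groups them.
DEFAULT_AA_OPTIONS = ["", "+F", "+I", "+G", "+I+G", "+I+F", "+G+F", "+I+G+F"]

RAXML_AA_MATRICES = [
    "DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV", "VT",
    "BLOSUM62", "MTMAM", "LG",
]
# Options in the order the --raxml list groups them.
RAXML_AA_OPTIONS = ["", "+G", "+G+F", "+I+G", "+I+G+F"]


def combine(matrices, options):
    """Cross every replacement matrix with every option, grouped by option."""
    return [matrix + option for option in options for matrix in matrices]


default_models = combine(DEFAULT_AA_MATRICES, DEFAULT_AA_OPTIONS)
raxml_models = combine(RAXML_AA_MATRICES, RAXML_AA_OPTIONS)
print(len(default_models), len(raxml_models))
```

Written this way, the difference between the two lists is easy to see: in the --raxml list, +I and +F only ever appear together with +G.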
