FAQ

  1. How do I speed up my analysis?
  2. My analysis quits without giving me a useful error message. What can I do?
  3. How can I get PartitionFinder to work on my Linux cluster?
  4. Can I use PartitionFinder to do model selection?
  5. Can I see the PartitionFinder source code?
  6. What models of molecular evolution are included in PartitionFinder?

How do I speed up my analysis?

PartitionFinder and PartitionFinderProtein have to do a huge number of calculations to find the best partitioning scheme, and on very large datasets some types of analysis are just impractical. There are three things you can do to make sure your analysis runs as quickly as possible.

(1) Use the "search=greedy;" option rather than "search=all;".
(2) Use a computer with multiple processors if you can. PartitionFinder automatically detects how many processors you have available and uses all of them. You can use the '-p' option to control how many processors PartitionFinder uses; see the manual for more information.
(3) Reduce the number of models you're considering. Most people start by selecting "models = all;". This is a good start, but in some cases it's just not practical to analyse all possible models (56 for DNA, 112 for amino acids). PartitionFinder and PartitionFinderProtein will still work very well if you use just one or two models. For instance, with DNA sequences you can use "models = GTR, GTR+G;". For amino acid sequences, a good option is to use four models: "models = LG, LG+G, LG+G+F, LG+F;". Once you have searched for the optimal partitioning scheme in this way, you can then use PartitionFinder to do model selection using all possible models on that scheme (see below).
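
To give a rough sense of why "search=all;" becomes impractical, the sketch below counts the possible partitioning schemes for a given number of data blocks. It assumes that an exhaustive search considers every way of grouping the data blocks into subsets, which is counted by the Bell numbers. The function name count_schemes is illustrative and is not part of PartitionFinder.

```python
def count_schemes(n_blocks):
    """Number of ways to group n_blocks data blocks into subsets (Bell number)."""
    # Bell triangle: each row starts with the last entry of the previous row,
    # and each further entry adds the entry above-left to its left neighbour.
    row = [1]
    for _ in range(n_blocks - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

for n in (3, 10, 20):
    print(n, "data blocks:", count_schemes(n), "possible schemes")
```

Three data blocks give only 5 schemes, but ten already give 115975, and each scheme still needs its models to be evaluated. This is why the greedy search and a short model list make such a difference on large datasets.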

My analysis quits without giving me a useful error message. What can I do?

PartitionFinder will usually give you a helpful error message when there's a problem, but in some cases we won't have anticipated a particular issue so it will just quit without any useful error message. There are three things to do here.

(1) Double check that your partition_finder.cfg file follows all the conventions described in the manual. This is by far the most common cause of problems.
(2) Try re-starting your analysis from scratch. To do this, add "--force-restart" to the end of your command line. Be careful, though: this command will delete all previous analyses.
(3) If neither of the above works, post a question on the PartitionFinder Google group. The more detail you can provide, the more likely we are to be able to help figure out the problem. If you really don't like posting on public forums, feel free to email me instead.

How can I get PartitionFinder to work on my Linux cluster?

To get PartitionFinder and PartitionFinderProtein working on Linux, follow these simple steps.

(1) Download and compile the latest version of PhyML from here: http://code.google.com/p/phyml/
(2) Name the executable you've created "phyml"
(3) Download and unzip PartitionFinder
(4) Delete everything from the 'programs' folder within the 'PartitionFinder' folder
(5) Put the phyml executable you've made into the 'programs' folder
(6) Run PartitionFinder from the command line, as described in the Mac section of the manual
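
Steps (2), (4) and (5) above can be scripted. The following is a minimal sketch, assuming you have already compiled PhyML and unzipped PartitionFinder; the function name install_phyml and the example paths are hypothetical, so substitute the locations on your own system.

```python
import shutil
from pathlib import Path

def install_phyml(phyml_executable, partitionfinder_dir):
    """Empty the 'programs' folder and copy a compiled PhyML into it as 'phyml'."""
    programs = Path(partitionfinder_dir) / "programs"
    # Step 4: delete everything currently in the 'programs' folder.
    for item in programs.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()
    # Steps 2 and 5: copy the executable in under the name "phyml".
    target = programs / "phyml"
    shutil.copy2(phyml_executable, target)
    target.chmod(0o755)  # make sure the copy is executable
    return target

# Example call (hypothetical paths, shown as a comment so nothing is deleted by accident):
# install_phyml("/home/me/phyml/src/phyml", "/home/me/PartitionFinder")
```

Note that the function permanently deletes the existing contents of the 'programs' folder, exactly as step (4) describes, so check the path before running it.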

Can I use PartitionFinder to do model selection?

Yes. PartitionFinder and PartitionFinderProtein can easily be used to do standard model selection, and they work in a very similar way to programs like ModelTest, ProtTest, ModelGenerator, etc. PartitionFinder and PartitionFinderProtein should be as quick as, or quicker than, these programs. The big advantage of PartitionFinder is that it can perform model selection on partitioned datasets, doing model selection on each partition without having to run separate analyses. In fact, the algorithms we use in PartitionFinder and PartitionFinderProtein are in many ways more appropriate for performing model selection on partitioned datasets than those in other programs, because we use information from the whole alignment to build a guide tree for the model selection. So, if you have a dataset and want to perform model selection, just follow these steps:

(1) In the .cfg file, specify the models you want to compare and the metric you want to compare them with (AIC, AICc, BIC)
(2) In the .cfg file, set "search=user;"
(3) In the .cfg file, specify the partitioning scheme you want to use (see the manual for how to do this).
(4) Run PartitionFinder following the instructions in the manual
(5) Your results will be printed out in the 'best_schemes.txt' file, which is in the /analysis folder

The best_schemes.txt file tells you the best model for each subset of sites (sometimes called a partition) in your alignment. PartitionFinder also stores all of the model selection results for each subset, in a form very similar to the output of programs like ModelTest, ProtTest, etc. This information is stored in a .txt file inside the /analysis/subsets folder. To find it, copy the subset identifier from the best_schemes.txt file (in the "Alignment" column). This is a long name, something like "50bf1643d2a386419c9264eccd173b6b". Now go and find the .txt file in /analysis/subsets that has that name, e.g. 50bf1643d2a386419c9264eccd173b6b.txt. That file contains neatly formatted model selection results for the subset.
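
If you need to look up many subsets, the lookup described above is easy to script. This is a small sketch, assuming the folder layout described in this answer (an 'analysis' folder containing a 'subsets' folder); the function name subset_results_path is illustrative, and the identifier is the example one used above.

```python
from pathlib import Path

def subset_results_path(analysis_dir, subset_id):
    """Return the path of the model selection results file for one subset."""
    return Path(analysis_dir) / "subsets" / f"{subset_id}.txt"

path = subset_results_path("analysis", "50bf1643d2a386419c9264eccd173b6b")
if path.exists():
    print(path.read_text())
else:
    print("No results file found at", path)
```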

Can I see the PartitionFinder source code?

Yes. It's here: https://github.com/brettc/partitionfinder. It's released under a GNU General Public License, which means you can do more or less whatever you want with it.

What models of molecular evolution are included in PartitionFinder?

PartitionFinder and PartitionFinderProtein include all of the named models included in PhyML and RAxML; see the lists below. In principle we could include any model of amino acid replacement, or any sub-model of the GTR model (there are 203 in total). If you have specific requirements, send me an email and I'll implement additional models. (Note that the TrN models are annotated as TN93 in some programs.)
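
The figure of 203 can be checked with a short sketch. It assumes the common convention that a sub-model of GTR is defined by which of the six substitution rates are constrained to be equal, written as a six-digit code such as 012345 (all six rates free, as in GTR) or 000000 (a single shared rate). The function name rate_codes is illustrative.

```python
def rate_codes(n_rates=6):
    """Yield every way of grouping n_rates rates into classes, as digit codes."""
    def extend(code):
        if len(code) == n_rates:
            yield "".join(map(str, code))
            return
        # The next rate may join any existing class or open one new class.
        for label in range(max(code) + 2):
            yield from extend(code + [label])
    yield from extend([0])

codes = list(rate_codes())
print(len(codes))            # 203
print(codes[0], codes[-1])   # 000000 012345
```

The codes only describe how the rates are grouped; options such as base frequencies, +I and +G are separate choices on top of each grouping.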

Nucleotide Models in PartitionFinder using default settings (56 in total)
+I: include a proportion of invariant sites
+G: include gamma distributed rates across sites (with 4 categories)
JC, K80, TrNef, K81, TVMef, TIMef, SYM, F81, HKY, TrN, K81uf, TVM, TIM, GTR, JC+I, K80+I, TrNef+I, K81+I, TVMef+I, TIMef+I, SYM+I, F81+I, HKY+I, TrN+I, K81uf+I, TVM+I, TIM+I, GTR+I, JC+G, K80+G, TrNef+G, K81+G, TVMef+G, TIMef+G, SYM+G, F81+G, HKY+G, TrN+G, K81uf+G, TVM+G, TIM+G, GTR+G, JC+I+G, K80+I+G, TrNef+I+G, K81+I+G, TVMef+I+G, TIMef+I+G, SYM+I+G, F81+I+G, HKY+I+G, TrN+I+G, K81uf+I+G, TVM+I+G, TIM+I+G, GTR+I+G
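
The list above is every combination of 14 base models with four rate options (none, +I, +G, +I+G). The sketch below rebuilds it from those two ingredients, which can be handy if you want to generate a shorter "models = ...;" list for your own analysis. The variable names are illustrative.

```python
from itertools import product

BASE_MODELS = ["JC", "K80", "TrNef", "K81", "TVMef", "TIMef", "SYM",
               "F81", "HKY", "TrN", "K81uf", "TVM", "TIM", "GTR"]
RATE_OPTIONS = ["", "+I", "+G", "+I+G"]

# Rate option varies slowest, so the order matches the list above.
dna_models = [base + rate for rate, base in product(RATE_OPTIONS, BASE_MODELS)]

print(len(dna_models))        # 56
print(", ".join(dna_models))
```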

Amino Acid Models in PartitionFinderProtein using default settings (112 in total)
+I: include a proportion of invariant sites
+G: include gamma distributed rates across sites (with 4 categories)
+F: include amino acid frequencies estimated from the alignment
LG, WAG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb, HIVw, LG+F, WAG+F, mtREV+F, Dayhoff+F, DCMut+F, JTT+F, VT+F, Blosum62+F, CpREV+F, RtREV+F, MtMam+F, MtArt+F, HIVb+F, HIVw+F, LG+I, WAG+I, mtREV+I, Dayhoff+I, DCMut+I, JTT+I, VT+I, Blosum62+I, CpREV+I, RtREV+I, MtMam+I, MtArt+I, HIVb+I, HIVw+I, LG+G, WAG+G, mtREV+G, Dayhoff+G, DCMut+G, JTT+G, VT+G, Blosum62+G, CpREV+G, RtREV+G, MtMam+G, MtArt+G, HIVb+G, HIVw+G, LG+I+G, WAG+I+G, mtREV+I+G, Dayhoff+I+G, DCMut+I+G, JTT+I+G, VT+I+G, Blosum62+I+G, CpREV+I+G, RtREV+I+G, MtMam+I+G, MtArt+I+G, HIVb+I+G, HIVw+I+G, LG+I+F, WAG+I+F, mtREV+I+F, Dayhoff+I+F, DCMut+I+F, JTT+I+F, VT+I+F, Blosum62+I+F, CpREV+I+F, RtREV+I+F, MtMam+I+F, MtArt+I+F, HIVb+I+F, HIVw+I+F, LG+G+F, WAG+G+F, mtREV+G+F, Dayhoff+G+F, DCMut+G+F, JTT+G+F, VT+G+F, Blosum62+G+F, CpREV+G+F, RtREV+G+F, MtMam+G+F, MtArt+G+F, HIVb+G+F, HIVw+G+F, LG+I+G+F, WAG+I+G+F, mtREV+I+G+F, Dayhoff+I+G+F, DCMut+I+G+F, JTT+I+G+F, VT+I+G+F, Blosum62+I+G+F, CpREV+I+G+F, RtREV+I+G+F, MtMam+I+G+F, MtArt+I+G+F, HIVb+I+G+F, HIVw+I+G+F

Nucleotide Models in PartitionFinder using --raxml option (2 in total)
GTR+G, GTR+I+G

Amino Acid Models in PartitionFinderProtein using --raxml option (55 in total)
DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, DAYHOFF+G, DCMUT+G, JTT+G, MTREV+G, WAG+G, RTREV+G, CPREV+G, VT+G, BLOSUM62+G, MTMAM+G, LG+G, DAYHOFF+G+F, DCMUT+G+F, JTT+G+F, MTREV+G+F, WAG+G+F, RTREV+G+F, CPREV+G+F, VT+G+F, BLOSUM62+G+F, MTMAM+G+F, LG+G+F, DAYHOFF+I+G, DCMUT+I+G, JTT+I+G, MTREV+I+G, WAG+I+G, RTREV+I+G, CPREV+I+G, VT+I+G, BLOSUM62+I+G, MTMAM+I+G, LG+I+G, DAYHOFF+I+G+F, DCMUT+I+G+F, JTT+I+G+F, MTREV+I+G+F, WAG+I+G+F, RTREV+I+G+F, CPREV+I+G+F, VT+I+G+F, BLOSUM62+I+G+F, MTMAM+I+G+F, LG+I+G+F
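
To see at a glance how the --raxml amino acid list differs from the default one, the sketch below compares the two sets of ingredients taken from the lists above (substitution matrices and the options added to them). The variable names are illustrative, and the comparison ignores upper and lower case because the two lists spell the matrix names differently.

```python
DEFAULT_MATRICES = ["LG", "WAG", "mtREV", "Dayhoff", "DCMut", "JTT", "VT",
                    "Blosum62", "CpREV", "RtREV", "MtMam", "MtArt", "HIVb", "HIVw"]
DEFAULT_OPTIONS = ["", "+F", "+I", "+G", "+I+G", "+I+F", "+G+F", "+I+G+F"]

RAXML_MATRICES = ["DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV",
                  "VT", "BLOSUM62", "MTMAM", "LG"]
RAXML_OPTIONS = ["", "+G", "+G+F", "+I+G", "+I+G+F"]

print(len(DEFAULT_MATRICES) * len(DEFAULT_OPTIONS))  # 112
print(len(RAXML_MATRICES) * len(RAXML_OPTIONS))      # 55

# Matrices and options in the default list that the --raxml list lacks.
raxml_upper = set(RAXML_MATRICES)
missing_matrices = [m for m in DEFAULT_MATRICES if m.upper() not in raxml_upper]
missing_options = [o for o in DEFAULT_OPTIONS if o not in RAXML_OPTIONS]
print(missing_matrices)  # ['MtArt', 'HIVb', 'HIVw']
print(missing_options)   # ['+F', '+I', '+I+F']
```

In other words, every model in the --raxml list includes gamma distributed rates unless it is a plain matrix with no options.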