I write a few scripts in R and Python. Some of these are released along with particular publications (see the Publications page). Others are here. Feel free to try them out, modify them, and re-publish them. I hope that some of these scripts will be useful to other people.
Click the icon next to a script to download it. Some of the scripts have readmes and example files and can be run without knowing any programming; for others you'll need to know a tiny bit of either Python or R.
Python
- Get Taxonomy
- This script searches the Entrez Taxonomy database for the taxonomy of a given species. Names of species are supplied in a text file, and the script is then run from the command line. Note that Entrez taxonomy is not always (or even often, ahem) totally reliable. Still, the script can be useful for many things... The Python icon links to a zip file with the script, instructions for using it, and example input and output files.
- GeneFinder
- This script accompanies the paper: "Estimating phylogenies for species assemblages: a complete phylogeny for the past and present native birds of New Zealand." The paper describes an approach to building molecular phylogenies for species assemblages (otherwise known as community phylogenetics). The script was central to our efforts to estimate the tree of all the NZ birds. To use it, just open up the .py file in a text editor, and have a read. You need a list of taxon IDs from GenBank, and a list of search terms, and the script will go and look for those search terms within those taxon IDs.
- Generate Submodels
- I've been working on models of DNA sequence evolution (like the GTR family of models). I needed a way to generate all the submodels of a given model (like all the submodels of the GTR model, of which there are 203). This script contains functions to generate those submodels, and uses a neat little recursive algorithm to do it (thanks in large part to help from Brett Calcott). The script uses notation that will be familiar to users of HyPhy or GARLI-PART. In this notation, the rate parameters of a model of sequence evolution are represented as a string of digits, so every model in the GTR family is represented by a six-digit string. If two digits are the same, then those rate parameters are forced to be equal. So, the GTR model (6 rate parameters which can all differ) is designated by the string '012345', and submodels of the GTR model are designated using six-digit strings with fewer than six different digits. For instance, the JC model (all six parameters equal) looks like this: '000000'. This notation can of course be extended to families of models with more than 6 possible parameters (but be careful - the number of possible submodels can get very big).
- Generate Permutations of local clocks
- This script accompanies the paper "The local-clock permutation test: A simple test to compare rates of molecular evolution on phylogenetic trees". The paper describes a method for comparing the rates of evolution estimated from sets of branches on a phylogenetic tree (local clocks). This test is statistically preferable to the often-employed Likelihood Ratio Test. Detailed instructions on how to use the script are provided in the script itself. The script takes as input a tree with branch labels in PAML format, and produces permutations of those labels on that tree which can be analysed in PAML.
- Get Longest Genbank ID
- This script searches through GenBank for the longest accession (i.e. the one with the most base pairs) that matches a particular set of search terms. You can provide a list of TaxonIDs, and the script will return the longest accession for each one. The script is far from perfect, but I use it as a first step when building large supermatrices. Example files are provided.
- Replace Leading and Trailing Sequence Gaps with N's
- There was a recent request on Evoldir for a script to replace the leading and trailing gaps of sequences with 'N's. That's what this script does. For simplicity, I've avoided using anything that might be a pain to install (like BioPython), so this script should run with almost any version of Python installed. Also, it should be fairly easy to edit to make it do different things, even if you've never written a script before. The script will only work on FASTA alignments (but that would also be simple to change).
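To give a flavour of what the gap-replacement script above does, here is a minimal sketch using only the standard library. It is not the downloadable script itself: the function names (`replace_terminal_gaps`, `convert_fasta`) are made up for this illustration, and it assumes gaps are written as '-' and that the input is a FASTA alignment supplied as a list of lines.

```python
def replace_terminal_gaps(seq, gap="-", fill="N"):
    """Replace leading and trailing gap characters with a fill character.

    Internal gaps are left alone. The gap and fill characters are
    assumptions for this sketch ('-' and 'N').
    """
    without_lead = seq.lstrip(gap)
    n_lead = len(seq) - len(without_lead)
    core = without_lead.rstrip(gap)
    n_trail = len(without_lead) - len(core)
    return fill * n_lead + core + fill * n_trail


def convert_fasta(lines):
    """Parse FASTA lines and return (name, sequence) pairs with terminal gaps replaced.

    Sequences split over several lines are joined before the replacement,
    so only the true ends of each sequence are affected.
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return [(n, replace_terminal_gaps(s)) for n, s in records]


if __name__ == "__main__":
    example = [">taxon_a", "--AC-GT---", ">taxon_b", "ACGT"]
    for name, seq in convert_fasta(example):
        print(">" + name)
        print(seq)
```

Changing the `gap` or `fill` arguments is all it takes to handle '?' characters or a different replacement symbol.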
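The digit-string notation described under Generate Submodels above can also be sketched in a few lines. The code below is one possible recursive approach, not necessarily the one used in the downloadable script; the function name `generate_submodels` is hypothetical. It relies on the observation that each submodel corresponds to a string in which every new digit is at most one larger than the largest digit seen so far, which avoids listing the same model twice under different labels. It assumes no more than 10 rate parameters, so that each label is a single digit.

```python
def generate_submodels(n_rates=6):
    """Return every submodel of an n_rates-parameter model as a digit string.

    Equal digits mean the corresponding rate parameters are constrained
    to be equal. Assumes n_rates <= 10 so each label is a single digit.
    """
    results = []

    def extend(prefix, next_label):
        # prefix: labels chosen so far; next_label: smallest unused label
        if len(prefix) == n_rates:
            results.append("".join(str(d) for d in prefix))
            return
        # Either reuse one of the labels already in play, or open one new label
        for label in range(next_label + 1):
            extend(prefix + [label], max(next_label, label + 1))

    extend([], 0)
    return results


if __name__ == "__main__":
    models = generate_submodels(6)
    print(len(models))   # number of submodels in the GTR family
    print(models[0])     # all rates equal (JC-like)
    print(models[-1])    # all rates free (GTR)
```

For six rate parameters this yields the 203 submodels mentioned above, running from '000000' to '012345'. The count grows quickly with the number of parameters, so larger model families are best explored with care.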