Merge lp:~jon-hill/supertree-toolkit/sub_in_subfile into lp:supertree-toolkit

Proposed by Jon Hill
Status: Merged
Merged at revision: 281
Proposed branch: lp:~jon-hill/supertree-toolkit/sub_in_subfile
Merge into: lp:supertree-toolkit
Diff against target: 13342 lines (+11326/-781)
44 files modified
debian/control (+1/-1)
debian/rules (+1/-0)
notes.txt (+38/-0)
stk/bzr_version.py (+5/-5)
stk/p4/NexusToken.py (+1/-0)
stk/p4/NexusToken2.py (+1/-1)
stk/p4/Tree.py (+1/-9)
stk/p4/Tree_muck.py (+4/-2)
stk/scripts/check_nomenclature.py (+0/-224)
stk/scripts/check_nomenclature.py.moved (+224/-0)
stk/scripts/create_colours_itol.py (+2/-11)
stk/scripts/create_taxonomy.py (+4/-100)
stk/scripts/fill_in_with_taxonomy.py (+711/-174)
stk/scripts/plot_character_taxa_matrix.py (+83/-1)
stk/scripts/plot_tree_taxa_matrix.py (+56/-0)
stk/scripts/remove_poorly_constrained_taxa.py (+43/-20)
stk/scripts/tree_from_taxonomy.py (+142/-0)
stk/stk (+787/-34)
stk/stk_exceptions.py (+8/-0)
stk/supertree_toolkit.py (+849/-47)
stk/test/_substitute_taxa.py (+19/-1)
stk/test/_supertree_toolkit.py (+138/-15)
stk/test/_trees.py (+13/-1)
stk/test/data/input/auto_sub.phyml (+97/-0)
stk/test/data/input/check_data_ind.phyml (+141/-0)
stk/test/data/input/check_taxonomy.phyml (+67/-0)
stk/test/data/input/check_taxonomy_fixes.phyml (+378/-0)
stk/test/data/input/create_taxonomy.csv (+6/-6)
stk/test/data/input/create_taxonomy.phyml (+67/-0)
stk/test/data/input/equivalents.csv (+5/-0)
stk/test/data/input/mrca.tre (+1/-0)
stk/test/data/input/old_stk_test_data_ind.phyml (+1324/-0)
stk/test/data/input/old_stk_test_data_tax_overlap.phyml (+627/-0)
stk/test/data/input/old_stk_test_nonmonophyl_removed.phyml (+1324/-0)
stk/test/data/input/old_stk_test_species_level.phyml (+1324/-0)
stk/test/data/input/old_stk_test_taxonomy.csv (+334/-0)
stk/test/data/input/old_stk_test_taxonomy_check_subs.dat (+26/-0)
stk/test/data/input/old_stk_test_taxonomy_checked.phyml (+1324/-0)
stk/test/data/input/old_stk_test_taxonomy_checker.csv (+336/-0)
stk/test/data/output/one_click_subs_output.phyml (+97/-0)
stk/test/util.py (+7/-0)
stk_gui/gui/gui.glade (+670/-124)
stk_gui/plugins/phyml/name_author.py (+4/-1)
stk_gui/stk_gui/interface.py (+36/-4)
To merge this branch: bzr merge lp:~jon-hill/supertree-toolkit/sub_in_subfile
Reviewer Review Type Date Requested Status
Jon Hill Approve
Review via email: mp+314598@code.launchpad.net

Description of the change

Adding taxonomic awareness and fixing a lot of bugs

322. By Jon Hill

removing file that shouldn't be there

323. By Jon Hill

removing file that shouldn't be there

Revision history for this message
Jon Hill (jon-hill) :
review: Approve

Preview Diff

=== modified file 'debian/control'
--- debian/control 2016-12-14 16:22:12 +0000
+++ debian/control 2017-01-12 09:27:31 +0000
@@ -9,7 +9,7 @@
99
10Package: supertree-toolkit10Package: supertree-toolkit
11Architecture: all11Architecture: all
12Depends: python-tk, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx12Depends: python-tk, python-simplejson, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx, python-argcomplete
13Recommends: python-psyco13Recommends: python-psyco
14Suggests: 14Suggests:
15Conflicts: 15Conflicts:
1616
=== modified file 'debian/rules'
--- debian/rules 2013-10-14 12:58:59 +0000
+++ debian/rules 2017-01-12 09:27:31 +0000
@@ -6,5 +6,6 @@
66
7override_dh_auto_install:7override_dh_auto_install:
8 python setup.py install --root=debian/supertree-toolkit --install-layout=deb --install-scripts=/usr/bin8 python setup.py install --root=debian/supertree-toolkit --install-layout=deb --install-scripts=/usr/bin
9 argcomplete.autocomplete(parser)
910
10override_dh_auto_build:11override_dh_auto_build:
1112
=== added file 'notes.txt'
--- notes.txt 1970-01-01 00:00:00 +0000
+++ notes.txt 2017-01-12 09:27:31 +0000
@@ -0,0 +1,38 @@
1Ideas:
2
3Collect data, remove paraphyletic
4
5Take taxonomy (from dbs), phyml, user's knowledge (encoded as subs file) and information on synonyms (from dbs)
6to create a master subs file that takes the data to species level
7
8User needs to be able to edit taxonomy - CSV file
9
10User needs to choose database source - preferred source.
11
12
13Taxonomic name checker:
14
15 - use database to get synonyms and possible misspellings
16 - GUI is a two-column table with green, yellow, red. User fills in red (or removes it), green is fine. Yellow - drop-down list with alternatives.
17 - Use this to generate a two column CSV file
18 - On CLI, generate a three-column CSV: original name, new name (or blank for unknown) and a list of possibles. Warn user they *must* fill in the second column or remove the row, or the taxa will be deleted.
19
20For colloquial names, user adds to column 1 of taxonomy csv and then adds the Latin name in the appropriate column of the database. The subs can then generate the species list.
21
22Use these two csv files to generate a subs file, including replacing higher taxa and genera to create a "to species" substitution (can also output this file for later)
23
24Generating data to any taxonomic level can happen later - need to check each species is accounted for in the taxonomy, with correct levels - may need another parse of the taxonomy csv
25
26
27Add data -> paraphyletic taxa -> taxonomy checker -> sub synonyms -> taxonomy generator -> create species level dataset
28
29New functions:
30 - taxonomic name checker (this might take a while when online for a large dataset) - note that this should be a one-for-one substitution - separate function so we can check this?
31 - Pull in taxonomy generator
32 - Add csv file to schema
33 - amend manual with workflow
34 - warning on multiple subs in data in manual
35 - generate species level subsfile from taxonomy
36 - generate specified taxonomic level data
37
38
039
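The three-column CLI output described in the notes above can be sketched as follows. The `equivalents` structure mirrors the checker's name → ([synonyms], status) output; the function name and the example taxa are hypothetical:

```python
import csv

def write_checker_csv(equivalents, path):
    # One row per taxon: original name, suggested name (blank when the
    # taxon is unknown, i.e. status 'red'), and the full candidate list
    # joined with semicolons so the user can pick an alternative.
    with open(path, "w") as f:
        writer = csv.writer(f)
        writer.writerow(["original", "suggested", "possibles"])
        for name in sorted(equivalents):
            synonyms, status = equivalents[name]
            suggested = "" if status == "red" else synonyms[0]
            writer.writerow([name, suggested, ";".join(synonyms)])

equivs = {"Gallus_gallus": (["Gallus_gallus"], "green"),
          "Unknown_taxon": (["Unknown_taxon"], "red")}
write_checker_csv(equivs, "checker.csv")
```

A blank second column is then the signal, as warned above, that the user must intervene before the subs file is generated.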
=== modified file 'stk/bzr_version.py'
--- stk/bzr_version.py 2017-01-11 17:42:56 +0000
+++ stk/bzr_version.py 2017-01-12 09:27:31 +0000
@@ -4,12 +4,12 @@
4So don't edit it. :)4So don't edit it. :)
5"""5"""
66
7version_info = {'branch_nick': u'supertree-toolkit',7version_info = {'branch_nick': u'sub_in_subfile',
8 'build_date': '2017-01-11 17:42:27 +0000',8 'build_date': '2017-01-11 17:48:33 +0000',
9 'clean': None,9 'clean': None,
10 'date': '2017-01-11 17:39:43 +0000',10 'date': '2017-01-11 17:48:18 +0000',
11 'revision_id': 'jon.hill@imperial.ac.uk-20170111173943-88so1icr33su3afo',11 'revision_id': 'jon.hill@imperial.ac.uk-20170111174818-9q8a9octvnawruuw',
12 'revno': '279'}12 'revno': '317'}
1313
14revisions = {}14revisions = {}
1515
1616
=== modified file 'stk/p4/NexusToken.py'
--- stk/p4/NexusToken.py 2012-01-11 08:57:43 +0000
+++ stk/p4/NexusToken.py 2017-01-12 09:27:31 +0000
@@ -44,6 +44,7 @@
44 gm = ["safeNextTok(), called from %s" % caller]44 gm = ["safeNextTok(), called from %s" % caller]
45 else:45 else:
46 gm = ["safeNextTok()"]46 gm = ["safeNextTok()"]
47 print flob
47 gm.append("Premature Death.")48 gm.append("Premature Death.")
48 gm.append("Ran out of understandable things to read in nexus file.")49 gm.append("Ran out of understandable things to read in nexus file.")
49 raise Glitch, gm50 raise Glitch, gm
5051
=== modified file 'stk/p4/NexusToken2.py'
--- stk/p4/NexusToken2.py 2012-01-11 08:57:43 +0000
+++ stk/p4/NexusToken2.py 2017-01-12 09:27:31 +0000
@@ -88,7 +88,7 @@
88 else:88 else:
89 gm = ["safeNextTok()"]89 gm = ["safeNextTok()"]
90 gm.append("Premature Death.")90 gm.append("Premature Death.")
91 gm.append("Ran out of understandable things to read in nexus file.")91 gm.append("Ran out of understandable things to read in nexus file." + str(flob))
92 raise Glitch, gm92 raise Glitch, gm
93 else:93 else:
94 return t94 return t
9595
=== modified file 'stk/p4/Tree.py'
--- stk/p4/Tree.py 2013-08-25 09:24:34 +0000
+++ stk/p4/Tree.py 2017-01-12 09:27:31 +0000
@@ -996,17 +996,9 @@
996 if not item.name:996 if not item.name:
997 if item == self.root:997 if item == self.root:
998 if var.fixRootedTrees:998 if var.fixRootedTrees:
999 if self.name:999 #print "Fixing tree to work with SuperTree scores"
1000 print "Tree.initFinish() tree '%s'" % self.name
1001 else:
1002 print 'Tree.initFinish()'
1003 print "Fixing tree to work with SuperTree scores"
1004 self.removeRoot()1000 self.removeRoot()
1005 elif var.warnAboutTerminalRootWithNoName:1001 elif var.warnAboutTerminalRootWithNoName:
1006 if self.name:
1007 print "Tree.initFinish() tree '%s'" % self.name
1008 else:
1009 print 'Tree.initFinish()'
1010 print ' Non-fatal warning: the root is terminal, but has no name.'1002 print ' Non-fatal warning: the root is terminal, but has no name.'
1011 print ' This may be what you wanted. Or not?'1003 print ' This may be what you wanted. Or not?'
1012 print ' (To get rid of this warning, turn off var.warnAboutTerminalRootWithNoName)'1004 print ' (To get rid of this warning, turn off var.warnAboutTerminalRootWithNoName)'
10131005
=== modified file 'stk/p4/Tree_muck.py'
--- stk/p4/Tree_muck.py 2015-02-19 14:47:06 +0000
+++ stk/p4/Tree_muck.py 2017-01-12 09:27:31 +0000
@@ -769,6 +769,7 @@
769 else:769 else:
770 gm.append("The 2 specified nodes should have a parent-child relationship")770 gm.append("The 2 specified nodes should have a parent-child relationship")
771 raise Glitch, gm771 raise Glitch, gm
772
772 if var.usePfAndNumpy:773 if var.usePfAndNumpy:
773 self.deleteCStuff()774 self.deleteCStuff()
774775
@@ -1629,7 +1630,7 @@
16291630
16301631
16311632
1632def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None):1633def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None, ignoreRootAssert=False):
1633 """Add a subtree to a tree.1634 """Add a subtree to a tree.
16341635
1635 The nodes from theSubTree are added to self.nodes, and theSubTree1636 The nodes from theSubTree are added to self.nodes, and theSubTree
@@ -1666,7 +1667,8 @@
16661667
1667 assert selfNode in self.nodes1668 assert selfNode in self.nodes
1668 assert selfNode.parent1669 assert selfNode.parent
1669 assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick1670 if not ignoreRootAssert:
1671 assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick
1670 if not subTreeTaxNames:1672 if not subTreeTaxNames:
1671 subTreeTaxNames = [n.name for n in theSubTree.iterLeavesNoRoot()]1673 subTreeTaxNames = [n.name for n in theSubTree.iterLeavesNoRoot()]
16721674
16731675
=== removed file 'stk/scripts/check_nomenclature.py'
--- stk/scripts/check_nomenclature.py 2016-07-14 10:12:17 +0000
+++ stk/scripts/check_nomenclature.py 1970-01-01 00:00:00 +0000
@@ -1,224 +0,0 @@
1#!/usr/bin/env python
2#
3# Derived from the Supertree Toolkit. Software for managing and manipulating source
4# trees ready for supertree construction.
5# Copyright (C) 2015, Jon Hill, Katie Davis
6#
7# This program is free software: you can redistribute it and/or modify
8# it under the terms of the GNU General Public License as published by
9# the Free Software Foundation, either version 3 of the License, or
10# (at your option) any later version.
11#
12# This program is distributed in the hope that it will be useful,
13# but WITHOUT ANY WARRANTY; without even the implied warranty of
14# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15# GNU General Public License for more details.
16#
17# You should have received a copy of the GNU General Public License
18# along with this program. If not, see <http://www.gnu.org/licenses/>.
19#
20# Jon Hill. jon.hill@york.ac.uk.
21#
22#
23# This is an entirely self-contained script that does not require the STK to be installed.
24
25import urllib2
26from urllib import quote_plus
27import simplejson as json
28import argparse
29import os
30import sys
31import csv
32
33def main():
34
35 # do stuff
36 parser = argparse.ArgumentParser(
37 prog="Check nomenclature",
38 description="Check nomenclature from a tree file or list against valid names derived from EOL",
39 )
40 parser.add_argument(
41 '-v',
42 '--verbose',
43 action='store_true',
44 help="Verbose output: mainly progress reports.",
45 default=False
46 )
47 parser.add_argument(
48 '--existing',
49 help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name."
50 )
51 parser.add_argument(
52 'input_file',
53 metavar='input_file',
54 nargs=1,
55 help="Your input taxa list"
56 )
57 parser.add_argument(
58 'output_file',
59 metavar='output_file',
60 nargs=1,
61 help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)."
62 )
63
64 args = parser.parse_args()
65 verbose = args.verbose
66 input_file = args.input_file[0]
67 output_file = args.output_file[0]
68 existing_data = args.existing
69
70 if (not existing_data == None):
71 existing_data = load_equivalents(existing_data)
72 else:
73 existing_data = None
74
75 with open(input_file,'r') as f:
76 lines = f.read().splitlines()
77 equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose)
78
79
80 f = open(output_file,"w")
81 for taxon in sorted(equivs.keys()):
82 f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n")
83 f.close()
84
85 return
86
87
88def taxonomic_checker_list(name_list,existing_data=None,verbose=False):
89 """ For each name in the database generate a database of the original name,
90 possible synonyms and if the taxon is not know, signal that. We do this by
91 using the EoL API to grab synonyms of each taxon. """
92
93
94 if existing_data == None:
95 equivalents = {}
96 else:
97 equivalents = existing_data
98
99 # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result?
100 # if not, is there another API function to do this?
101 # search for the taxon and grab the name - if you search for a recognised synonym on EoL then
102 # you get the original ('correct') name - shorten this to two words and you're done.
103 for t in name_list:
104 # make sure t has no spaces.
105 t = t.replace(" ","_")
106 if t in equivalents:
107 continue
108 taxon = t.replace("_"," ")
109 if (verbose):
110 print "Looking up ", taxon
111 # get the data from EOL on taxon
112 taxonq = quote_plus(taxon)
113 URL = "http://eol.org/api/search/1.0.json?q="+taxonq
114 req = urllib2.Request(URL)
115 opener = urllib2.build_opener()
116 f = opener.open(req)
117 data = json.load(f)
118 # check if there's some data
119 if len(data['results']) == 0:
120 equivalents[t] = [[t],'red']
121 continue
122 amber = False
123 if len(data['results']) > 1:
124 # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this
125 # for automatic processing we'll just take the first one though
126 # colour is amber in this case
127 amber = True
128 ID = str(data['results'][0]['id']) # take first hit
129 URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0"
130 req = urllib2.Request(URL)
131 opener = urllib2.build_opener()
132
133 try:
134 f = opener.open(req)
135 except urllib2.HTTPError:
136 equivalents[t] = [[t],'red']
137 continue
138 data = json.load(f)
139 if len(data['scientificName']) == 0:
140 # not found a scientific name, so set as red
141 equivalents[t] = [[t],'red']
142 continue
143 correct_name = data['scientificName'].encode("ascii","ignore")
144 # we only want the first two bits of the name, not the original author and year if any
145 temp_name = correct_name.split(' ')
146 if (len(temp_name) > 2):
147 correct_name = ' '.join(temp_name[0:2])
148 correct_name = correct_name.replace(' ','_')
149 print correct_name, t
150
151 # build up the output dictionary - original name is key, synonyms/missing is value
152 if (correct_name == t or correct_name == taxon):
153 # if the original matches the 'correct', then it's green
154 equivalents[t] = [[t], 'green']
155 else:
156 # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the
157 # 'correct' taxon at the top
158 eol_synonyms = data['synonyms']
159 synonyms = []
160 for s in eol_synonyms:
161 ts = s['synonym'].encode("ascii","ignore")
162 temp_syn = ts.split(' ')
163 if (len(temp_syn) > 2):
164 temp_syn = ' '.join(temp_syn[0:2])
165 ts = temp_syn
166 if (s['relationship'] == "synonym"):
167 ts = ts.replace(" ","_")
168 synonyms.append(ts)
169 synonyms = _uniquify(synonyms)
170 # we need to put the correct name at the top of the list now
171 if (correct_name in synonyms):
172 synonyms.insert(0, synonyms.pop(synonyms.index(correct_name)))
173 elif len(synonyms) == 0:
174 synonyms.append(correct_name)
175 else:
176 synonyms.insert(0,correct_name)
177
178 if (amber):
179 equivalents[t] = [synonyms,'amber']
180 else:
181 equivalents[t] = [synonyms,'yellow']
182 # if our search was empty, then it's red - see above
183
184 # up to the calling function to do something sensible with this
185 # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red.
186 # Amber means we found synonyms and multiple hits. User definitely needs to sort these!
187
188 return equivalents
189
190def load_equivalents(equiv_csv):
191 """Load equivalents data from a csv and convert to a equivalents Dict.
192 Structure is key, with a list that is array of synonyms, followed by status ('green',
193 'yellow', 'amber', or 'red').
194
195 """
196
197 import csv
198
199 equivalents = {}
200
201 with open(equiv_csv, 'rU') as csvfile:
202 equiv_reader = csv.reader(csvfile, delimiter=',')
203 equiv_reader.next() # skip header
204 for row in equiv_reader:
205 i = 1
206 equivalents[row[0]] = [row[1].split(';'),row[2]]
207
208 return equivalents
209
210def _uniquify(l):
211 """
212 Make a list, l, contain only unique data
213 """
214 keys = {}
215 for e in l:
216 keys[e] = 1
217
218 return keys.keys()
219
220if __name__ == "__main__":
221 main()
222
223
224
2250
=== added file 'stk/scripts/check_nomenclature.py.moved'
--- stk/scripts/check_nomenclature.py.moved 1970-01-01 00:00:00 +0000
+++ stk/scripts/check_nomenclature.py.moved 2017-01-12 09:27:31 +0000
@@ -0,0 +1,224 @@
1#!/usr/bin/env python
2#
3# Derived from the Supertree Toolkit. Software for managing and manipulating source
4# trees ready for supertree construction.
5# Copyright (C) 2015, Jon Hill, Katie Davis
6#
7# This program is free software: you can redistribute it and/or modify
8# it under the terms of the GNU General Public License as published by
9# the Free Software Foundation, either version 3 of the License, or
10# (at your option) any later version.
11#
12# This program is distributed in the hope that it will be useful,
13# but WITHOUT ANY WARRANTY; without even the implied warranty of
14# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15# GNU General Public License for more details.
16#
17# You should have received a copy of the GNU General Public License
18# along with this program. If not, see <http://www.gnu.org/licenses/>.
19#
20# Jon Hill. jon.hill@york.ac.uk.
21#
22#
23# This is an entirely self-contained script that does not require the STK to be installed.
24
25import urllib2
26from urllib import quote_plus
27import simplejson as json
28import argparse
29import os
30import sys
31import csv
32
33def main():
34
35 # do stuff
36 parser = argparse.ArgumentParser(
37 prog="Check nomenclature",
38 description="Check nomenclature from a tree file or list against valid names derived from EOL",
39 )
40 parser.add_argument(
41 '-v',
42 '--verbose',
43 action='store_true',
44 help="Verbose output: mainly progress reports.",
45 default=False
46 )
47 parser.add_argument(
48 '--existing',
49 help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name."
50 )
51 parser.add_argument(
52 'input_file',
53 metavar='input_file',
54 nargs=1,
55 help="Your input taxa list"
56 )
57 parser.add_argument(
58 'output_file',
59 metavar='output_file',
60 nargs=1,
61 help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)."
62 )
63
64 args = parser.parse_args()
65 verbose = args.verbose
66 input_file = args.input_file[0]
67 output_file = args.output_file[0]
68 existing_data = args.existing
69
70 if (not existing_data == None):
71 existing_data = load_equivalents(existing_data)
72 else:
73 existing_data = None
74
75 with open(input_file,'r') as f:
76 lines = f.read().splitlines()
77 equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose)
78
79
80 f = open(output_file,"w")
81 for taxon in sorted(equivs.keys()):
82 f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n")
83 f.close()
84
85 return
86
87
88def taxonomic_checker_list(name_list,existing_data=None,verbose=False):
89 """ For each name in the database generate a database of the original name,
90 possible synonyms and if the taxon is not known, signal that. We do this by
91 using the EoL API to grab synonyms of each taxon. """
92
93
94 if existing_data == None:
95 equivalents = {}
96 else:
97 equivalents = existing_data
98
99 # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result?
100 # if not, is there another API function to do this?
101 # search for the taxon and grab the name - if you search for a recognised synonym on EoL then
102 # you get the original ('correct') name - shorten this to two words and you're done.
103 for t in name_list:
104 # make sure t has no spaces.
105 t = t.replace(" ","_")
106 if t in equivalents:
107 continue
108 taxon = t.replace("_"," ")
109 if (verbose):
110 print "Looking up ", taxon
111 # get the data from EOL on taxon
112 taxonq = quote_plus(taxon)
113 URL = "http://eol.org/api/search/1.0.json?q="+taxonq
114 req = urllib2.Request(URL)
115 opener = urllib2.build_opener()
116 f = opener.open(req)
117 data = json.load(f)
118 # check if there's some data
119 if len(data['results']) == 0:
120 equivalents[t] = [[t],'red']
121 continue
122 amber = False
123 if len(data['results']) > 1:
124 # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this
125 # for automatic processing we'll just take the first one though
126 # colour is amber in this case
127 amber = True
128 ID = str(data['results'][0]['id']) # take first hit
129 URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0"
130 req = urllib2.Request(URL)
131 opener = urllib2.build_opener()
132
133 try:
134 f = opener.open(req)
135 except urllib2.HTTPError:
136 equivalents[t] = [[t],'red']
137 continue
138 data = json.load(f)
139 if len(data['scientificName']) == 0:
140 # not found a scientific name, so set as red
141 equivalents[t] = [[t],'red']
142 continue
143 correct_name = data['scientificName'].encode("ascii","ignore")
144 # we only want the first two bits of the name, not the original author and year if any
145 temp_name = correct_name.split(' ')
146 if (len(temp_name) > 2):
147 correct_name = ' '.join(temp_name[0:2])
148 correct_name = correct_name.replace(' ','_')
149 print correct_name, t
150
151 # build up the output dictionary - original name is key, synonyms/missing is value
152 if (correct_name == t or correct_name == taxon):
153 # if the original matches the 'correct', then it's green
154 equivalents[t] = [[t], 'green']
155 else:
156 # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the
157 # 'correct' taxon at the top
158 eol_synonyms = data['synonyms']
159 synonyms = []
160 for s in eol_synonyms:
161 ts = s['synonym'].encode("ascii","ignore")
162 temp_syn = ts.split(' ')
163 if (len(temp_syn) > 2):
164 temp_syn = ' '.join(temp_syn[0:2])
165 ts = temp_syn
166 if (s['relationship'] == "synonym"):
167 ts = ts.replace(" ","_")
168 synonyms.append(ts)
169 synonyms = _uniquify(synonyms)
170 # we need to put the correct name at the top of the list now
171 if (correct_name in synonyms):
172 synonyms.insert(0, synonyms.pop(synonyms.index(correct_name)))
173 elif len(synonyms) == 0:
174 synonyms.append(correct_name)
175 else:
176 synonyms.insert(0,correct_name)
177
178 if (amber):
179 equivalents[t] = [synonyms,'amber']
180 else:
181 equivalents[t] = [synonyms,'yellow']
182 # if our search was empty, then it's red - see above
183
184 # up to the calling function to do something sensible with this
185 # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red.
186 # Amber means we found synonyms and multiple hits. User definitely needs to sort these!
187
188 return equivalents
189
190def load_equivalents(equiv_csv):
191 """Load equivalents data from a csv and convert to a equivalents Dict.
192 Structure is key, with a list that is array of synonyms, followed by status ('green',
193 'yellow', 'amber', or 'red').
194
195 """
196
197 import csv
198
199 equivalents = {}
200
201 with open(equiv_csv, 'rU') as csvfile:
202 equiv_reader = csv.reader(csvfile, delimiter=',')
203 equiv_reader.next() # skip header
204 for row in equiv_reader:
205 i = 1
206 equivalents[row[0]] = [row[1].split(';'),row[2]]
207
208 return equivalents
209
210def _uniquify(l):
211 """
212 Make a list, l, contain only unique data
213 """
214 keys = {}
215 for e in l:
216 keys[e] = 1
217
218 return keys.keys()
219
220if __name__ == "__main__":
221 main()
222
223
224
0225
=== modified file 'stk/scripts/create_colours_itol.py'
--- stk/scripts/create_colours_itol.py 2014-12-09 10:58:48 +0000
+++ stk/scripts/create_colours_itol.py 2017-01-12 09:27:31 +0000
@@ -88,17 +88,8 @@
88 saturation=0.2588 saturation=0.25
89 value=0.889 value=0.8
9090
91 index = 3 # family91 index = stk.taxonomy_levels.index(level.lower())+1
92 if (level == "Superfamily"):92 print index
93 index = 4
94 elif (level == "Infraorder"):
95 index = 5
96 elif (level == "Suborder"):
97 index = 6
98 elif (level == "Order"):
99 index = 7
100 elif (level == "Genus"):
101 index = 2
10293
103 if (tree):94 if (tree):
104 tree_data = stk.import_tree(input_file)95 tree_data = stk.import_tree(input_file)
10596
=== modified file 'stk/scripts/create_taxonomy.py'
--- stk/scripts/create_taxonomy.py 2014-03-13 18:45:05 +0000
+++ stk/scripts/create_taxonomy.py 2017-01-12 09:27:31 +0000
@@ -16,6 +16,8 @@
16import supertree_toolkit as stk16import supertree_toolkit as stk
17import csv17import csv
1818
19taxonomy_levels = stk.taxonomy_levels
20
19def main():21def main():
2022
21 # do stuff23 # do stuff
@@ -66,13 +68,6 @@
66 f.close()68 f.close()
6769
68 taxonomy = {}70 taxonomy = {}
69 # What we get from EOL
70 current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom']
71 # And the extra ones from ITIS
72 extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom']
73 # all of them in order
74 taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom']
75
7671
77 for taxon in taxa:72 for taxon in taxa:
78 taxon = taxon.replace("_"," ")73 taxon = taxon.replace("_"," ")
@@ -180,99 +175,8 @@
180 continue175 continue
181 176
182177
183 # Now create the CSV output178 stk.save_taxonomy(taxonomy, output_file)
184 with open(output_file, 'w') as f:179
185 writer = csv.writer(f)
186 writer.writerow(taxonomy_levels)
187 for t in taxonomy:
188 species = t
189 try:
190 genus = taxonomy[t]['genus']
191 except KeyError:
192 genus = "-"
193 try:
194 family = taxonomy[t]['family']
195 except KeyError:
196 family = "-"
197 try:
198 superfamily = taxonomy[t]['superfamily']
199 except KeyError:
200 superfamily = "-"
201 try:
202 infraorder = taxonomy[t]['infraorder']
203 except KeyError:
204 infraorder = "-"
205 try:
206 suborder = taxonomy[t]['suborder']
207 except KeyError:
208 suborder = "-"
209 try:
210 order = taxonomy[t]['order']
211 except KeyError:
212 order = "-"
213 try:
214 superorder = taxonomy[t]['superorder']
215 except KeyError:
216 superorder = "-"
217 try:
218 subclass = taxonomy[t]['subclass']
219 except KeyError:
220 subclass = "-"
221 try:
222 tclass = taxonomy[t]['class']
223 except KeyError:
224 tclass = "-"
225 try:
226 subphylum = taxonomy[t]['subphylum']
227 except KeyError:
228 subphylum = "-"
229 try:
230 phylum = taxonomy[t]['phylum']
231 except KeyError:
232 phylum = "-"
233 try:
234 superphylum = taxonomy[t]['superphylum']
235 except KeyError:
236 superphylum = "-"
237 try:
238 infrakingdom = taxonomy[t]['infrakingdom']
239 except:
240 infrakingdom = "-"
241 try:
242 subkingdom = taxonomy[t]['subkingdom']
243 except:
244 subkingdom = "-"
245 try:
246 kingdom = taxonomy[t]['kingdom']
247 except KeyError:
248 kingdom = "-"
249 try:
250 provider = taxonomy[t]['provider']
251 except KeyError:
252 provider = "-"
253
254
255 this_classification = [
256 species.encode('utf-8'),
257 genus.encode('utf-8'),
258 family.encode('utf-8'),
259 superfamily.encode('utf-8'),
260 infraorder.encode('utf-8'),
261 suborder.encode('utf-8'),
262 order.encode('utf-8'),
263 superorder.encode('utf-8'),
264 subclass.encode('utf-8'),
265 tclass.encode('utf-8'),
266 subphylum.encode('utf-8'),
267 phylum.encode('utf-8'),
268 superphylum.encode('utf-8'),
269 infrakingdom.encode('utf-8'),
270 subkingdom.encode('utf-8'),
271 kingdom.encode('utf-8'),
272 provider.encode('utf-8')]
273 writer.writerow(this_classification)
274
275
276def _uniquify(l):180def _uniquify(l):
277 """181 """
278 Make a list, l, contain only unique data182 Make a list, l, contain only unique data
279183
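The per-level try/except ladder removed above reduces to a loop over an ordered level list, which is presumably what `stk.save_taxonomy` now does. The sketch below is an assumption about that helper, not the toolkit's actual implementation, and uses a shortened level list:

```python
import csv

# Shortened level list for illustration; the toolkit's list is longer.
taxonomy_levels = ['species', 'genus', 'family', 'order',
                   'class', 'phylum', 'kingdom']

def save_taxonomy(taxonomy, output_file):
    # One row per taxon; '-' marks a level the provider did not report,
    # replacing sixteen near-identical try/except blocks with one loop.
    with open(output_file, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(taxonomy_levels + ['provider'])
        for species in sorted(taxonomy):
            row = [species]
            for level in taxonomy_levels[1:] + ['provider']:
                row.append(taxonomy[species].get(level, '-'))
            writer.writerow(row)
```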
=== modified file 'stk/scripts/fill_in_with_taxonomy.py'
--- stk/scripts/fill_in_with_taxonomy.py 2016-12-14 16:22:12 +0000
+++ stk/scripts/fill_in_with_taxonomy.py 2017-01-12 09:27:31 +0000
@@ -23,21 +23,90 @@
23from urllib import quote_plus23from urllib import quote_plus
24import simplejson as json24import simplejson as json
25import argparse25import argparse
26import copy
26import os27import os
27import sys28import sys
28stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir )29stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir )
29sys.path.insert(0, stk_path)30sys.path.insert(0, stk_path)
30import supertree_toolkit as stk31import supertree_toolkit as stk
31import csv32import csv
3233from ete2 import Tree
33# What we get from EOL34import tempfile
34current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom']35import re
35# And the extra ones from ITIS36
36extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom']37taxonomy_levels = stk.taxonomy_levels
37# all of them in order38#tlevels = ['species','genus','family','superfamily','suborder','order','class','phylum','kingdom']
38taxonomy_levels = ['species','genus','subfamily','family','tribe','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom']39tlevels = ['species','genus', 'subfamily', 'family','infraorder','order','class','phylum','kingdom']
3940
40def get_tree_taxa_taxonomy(taxon,wsdlObjectWoRMS):41def get_tree_taxa_taxonomy_eol(taxon):
42
43 taxonq = quote_plus(taxon)
44 URL = "http://eol.org/api/search/1.0.json?q="+taxonq
45 req = urllib2.Request(URL)
46 opener = urllib2.build_opener()
47 f = opener.open(req)
48 data = json.load(f)
49
50 if data['results'] == []:
51 return {}
52 ID = str(data['results'][0]['id']) # take first hit
53 # Now look for taxonomies
54 URL = "http://eol.org/api/pages/1.0/"+ID+".json"
55 req = urllib2.Request(URL)
56 opener = urllib2.build_opener()
57 f = opener.open(req)
58 data = json.load(f)
59 if len(data['taxonConcepts']) == 0:
60 return {}
61 TID = str(data['taxonConcepts'][0]['identifier']) # take first hit
62 currentdb = str(data['taxonConcepts'][0]['nameAccordingTo'])
63 # loop through and get preferred one if specified
64 # now get taxonomy
65 for db in data['taxonConcepts']:
66 currentdb = db['nameAccordingTo'].lower()
67 TID = str(db['identifier'])
68 break
69 URL="http://eol.org/api/hierarchy_entries/1.0/"+TID+".json"
70 req = urllib2.Request(URL)
71 opener = urllib2.build_opener()
72 f = opener.open(req)
73 data = json.load(f)
74 tax_array = {}
75 tax_array['provider'] = currentdb
76 for a in data['ancestors']:
77 try:
78 if a.has_key('taxonRank') :
79 temp_level = a['taxonRank'].encode("ascii","ignore")
80 if (temp_level in taxonomy_levels):
81 # note the dump into ASCII
82 temp_name = a['scientificName'].encode("ascii","ignore")
83 temp_name = temp_name.split(" ")
84 if (temp_level == 'species'):
85 tax_array[temp_level] = "_".join(temp_name[0:2])
86
87 else:
88 tax_array[temp_level] = temp_name[0]
89 except KeyError as e:
90 logging.exception("Key not found: taxonRank")
91 continue
92 try:
93 # add taxonomy in to the taxonomy!
94 # some issues here, so let's make sure it's OK
95 temp_name = taxon.split(" ")
96 if data.has_key('taxonRank') :
97 if not data['taxonRank'].lower() == 'species':
98 tax_array[data['taxonRank'].lower()] = temp_name[0]
99 else:
100 tax_array[data['taxonRank'].lower()] = ' '.join(temp_name[0:2])
101 except KeyError as e:
102 return tax_array
103
104 return tax_array
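The function above reduces EOL's `ancestors` list to a `{rank: name}` dict, keeping only recognised ranks and normalising species to `Genus_species`. A minimal Python 3 sketch of just that filtering step; the `sample` payload and `TAXONOMY_LEVELS` list are illustrative stand-ins shaped like the `hierarchy_entries` response fields used above (`ancestors`, `taxonRank`, `scientificName`), not real API output:

```python
# Sketch (assumed payload shape) of the rank-filtering step in get_tree_taxa_taxonomy_eol.
TAXONOMY_LEVELS = ['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom']

def ancestors_to_taxonomy(data, provider="eol"):
    """Reduce an ancestor list to a {rank: name} dict, keeping known ranks only."""
    tax_array = {'provider': provider}
    for a in data.get('ancestors', []):
        rank = a.get('taxonRank')
        if rank not in TAXONOMY_LEVELS:
            continue  # ignore ranks the toolkit does not track
        name_parts = a['scientificName'].split(" ")
        if rank == 'species':
            # species keep 'Genus_species'; authority strings are dropped
            tax_array[rank] = "_".join(name_parts[0:2])
        else:
            # higher ranks keep the first word only
            tax_array[rank] = name_parts[0]
    return tax_array

sample = {'ancestors': [
    {'taxonRank': 'kingdom', 'scientificName': 'Animalia'},
    {'taxonRank': 'genus', 'scientificName': 'Gadus Linnaeus, 1758'},
    {'taxonRank': 'species', 'scientificName': 'Gadus morhua Linnaeus, 1758'},
]}
```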
105
106def get_tree_taxa_taxonomy_worms(taxon):
107
108 from SOAPpy import WSDL
109 wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1')
 
     taxon_data = wsdlObjectWoRMS.getAphiaRecords(taxon.replace('_',' '))
     if taxon_data == None:
@@ -51,6 +120,8 @@
     classification = wsdlObjectWoRMS.getAphiaClassificationByID(taxon_id)
     # construct array
     tax_array = {}
+    if (classification == ""):
+        return {}
     # classification is a nested dictionary, so we need to iterate down it
     current_child = classification.child
     while True:
@@ -60,27 +131,252 @@
             break
     return tax_array
 
-
-
-def get_taxonomy_worms(taxonomy, start_otu):
+def get_tree_taxa_taxonomy_itis(taxon):
+
+    URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(taxon.replace('_',' ').strip())
137 req = urllib2.Request(URL)
138 opener = urllib2.build_opener()
139 f = opener.open(req)
140 string = unicode(f.read(),"ISO-8859-1")
141 this_item = json.loads(string)
142 if this_item['scientificNames'] == [None]: # not found
143 return {}
144 tsn = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though
145 # so call another function to get any valid names
146 URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+tsn
147 req = urllib2.Request(URL)
148 opener = urllib2.build_opener()
149 f = opener.open(req)
150 string = unicode(f.read(),"ISO-8859-1")
151 this_item = json.loads(string)
152 if not this_item['acceptedNames'] == [None]:
153 tsn = this_item['acceptedNames'][0]['acceptedTsn']
154
155 URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn)
156 req = urllib2.Request(URL)
157 opener = urllib2.build_opener()
158 f = opener.open(req)
159 string = unicode(f.read(),"ISO-8859-1")
160 data = json.loads(string)
161 # construct array
162 this_taxonomy = {}
163 for level in data['hierarchyList']:
164 if level['rankName'].lower() in taxonomy_levels:
165 # note the dump into ASCII
166 this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore")
167
168 return this_taxonomy
169
170
171
172def get_taxonomy_eol(taxonomy, start_otu, verbose,tmpfile=None,skip=False):
173
174 # this is the recursive function
175 def get_children(taxonomy, ID, aphiaIDsDone):
176
177 # get data
178 URL="http://eol.org/api/hierarchy_entries/1.0/"+str(ID)+".json?common_names=false&synonyms=false&cache_ttl="
179 req = urllib2.Request(URL)
180 opener = urllib2.build_opener()
181 f = opener.open(req)
182 string = unicode(f.read(),"ISO-8859-1")
183 this_item = json.loads(string)
184 if this_item == None:
185 return taxonomy
186 if this_item['taxonRank'].lower().strip() == 'species':
187 # add data to taxonomy dictionary
188 taxon = this_item['scientificName'].split()[0:2] # just the first two words
189 taxon = " ".join(taxon[0:2])
190 # NOTE following line means existing items are *not* updated
191 if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy
192 this_taxonomy = {}
193 for level in this_item['ancestors']:
194 if level['taxonRank'].lower() in taxonomy_levels:
195 # note the dump into ASCII
196 this_taxonomy[level['taxonRank'].lower().encode("ascii","ignore")] = level['scientificName'].encode("ascii","ignore")
197 # add species:
198 this_taxonomy['species'] = taxon.replace(" ","_")
199 if verbose:
200 print "\tAdding "+taxon
201 taxonomy[taxon] = this_taxonomy
202 if not tmpfile == None:
203 stk.save_taxonomy(taxonomy,tmpfile)
204 return taxonomy
205 else:
206 return taxonomy
207 all_children = []
208 for level in this_item['children']:
209 if not level == None:
210 all_children.append(level['taxonID'])
211
212 if (len(all_children) == 0):
213 return taxonomy
214
215 for child in all_children:
216            if child in aphiaIDsDone: # we get stuck sometimes
217 continue
218 aphiaIDsDone.append(child)
219 taxonomy = get_children(taxonomy, child, aphiaIDsDone)
220 return taxonomy
221
222
223 # main bit of the get_taxonomy_eol function
224 taxonq = quote_plus(start_otu)
225 URL = "http://eol.org/api/search/1.0.json?q="+taxonq
226 req = urllib2.Request(URL)
227 opener = urllib2.build_opener()
228 f = opener.open(req)
229 data = json.load(f)
230 start_id = str(data['results'][0]['id']) # this is the page ID. We get the species ID next
231 URL = "http://eol.org/api/pages/1.0/"+start_id+".json"
232 req = urllib2.Request(URL)
233 opener = urllib2.build_opener()
234 f = opener.open(req)
235 data = json.load(f)
236 if len(data['taxonConcepts']) == 0:
237        print "Error finding your start taxon. Spelling?"
238 return None
239 start_id = data['taxonConcepts'][0]['identifier']
240 start_taxonomy_level = data['taxonConcepts'][0]['taxonRank'].lower()
241
242 aphiaIDsDone = []
243 if not skip:
244 taxonomy = get_children(taxonomy,start_id,aphiaIDsDone)
245
246 return taxonomy, start_taxonomy_level
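The recursive `get_children` added above records every visited ID in `aphiaIDsDone` so that circular parent/child records cannot recurse forever ("we get stuck" guard). An offline Python 3 sketch of that walk; the in-memory `pages` dict is a made-up stand-in for the web service, with node 1 deliberately reachable from its own descendant to show the guard working:

```python
# Offline sketch of the visited-ID guard in get_taxonomy_eol/get_taxonomy_itis.
pages = {
    1: {'rank': 'order',   'name': 'Decapoda',            'children': [2, 3]},
    2: {'rank': 'genus',   'name': 'Homarus',             'children': [4, 1]},  # 1 creates a cycle
    3: {'rank': 'species', 'name': 'Nephrops norvegicus', 'children': []},
    4: {'rank': 'species', 'name': 'Homarus gammarus',    'children': []},
}

def get_children(taxonomy, node_id, ids_done):
    node = pages.get(node_id)
    if node is None:
        return taxonomy
    if node['rank'] == 'species':
        # existing entries are *not* updated, mirroring the script's behaviour
        taxonomy.setdefault(node['name'], {'species': node['name'].replace(" ", "_")})
        return taxonomy
    for child in node['children']:
        if child in ids_done:  # cycle guard: skip IDs already visited
            continue
        ids_done.append(child)
        taxonomy = get_children(taxonomy, child, ids_done)
    return taxonomy

taxonomy = get_children({}, 1, [1])
```

Without the `ids_done` check, the 2 → 1 back-edge would recurse indefinitely.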
247
248
249
250def get_taxonomy_itis(taxonomy, start_otu, verbose,tmpfile=None,skip=False):
251 import simplejson as json
252
253 # this is the recursive function
254 def get_children(taxonomy, ID, aphiaIDsDone):
255
256 # get data
257 URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+ID
258 req = urllib2.Request(URL)
259 opener = urllib2.build_opener()
260 f = opener.open(req)
261 string = unicode(f.read(),"ISO-8859-1")
262 this_item = json.loads(string)
263 if this_item == None:
264 return taxonomy
265 if not this_item['usage']['taxonUsageRating'].lower() == 'valid':
266 print "rejecting " , this_item['scientificName']['combinedName']
267 return taxonomy
268 if this_item['taxRank']['rankName'].lower().strip() == 'species':
269 # add data to taxonomy dictionary
270 taxon = this_item['scientificName']['combinedName']
271 # NOTE following line means existing items are *not* updated
272 if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy
273 # get the taxonomy of this species
274 tsn = this_item["scientificName"]["tsn"]
275 URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+tsn
276 req = urllib2.Request(URL)
277 opener = urllib2.build_opener()
278 f = opener.open(req)
279 string = unicode(f.read(),"ISO-8859-1")
280 data = json.loads(string)
281 this_taxonomy = {}
282 for level in data['hierarchyList']:
283 if level['rankName'].lower() in taxonomy_levels:
284 # note the dump into ASCII
285 this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore")
286 if verbose:
287 print "\tAdding "+taxon
288 taxonomy[taxon] = this_taxonomy
289 if not tmpfile == None:
290 stk.save_taxonomy(taxonomy,tmpfile)
291 return taxonomy
292 else:
293 return taxonomy
294
295 all_children = []
296 URL="http://www.itis.gov/ITISWebService/jsonservice/getHierarchyDownFromTSN?tsn="+ID
297 req = urllib2.Request(URL)
298 opener = urllib2.build_opener()
299 f = opener.open(req)
300 string = unicode(f.read(),"ISO-8859-1")
301 this_item = json.loads(string)
302 if this_item == None:
303 return taxonomy
304
305 for level in this_item['hierarchyList']:
306 if not level == None:
307 all_children.append(level['tsn'])
308
309 if (len(all_children) == 0):
310 return taxonomy
311
312 for child in all_children:
313            if child in aphiaIDsDone: # we get stuck sometimes
314 continue
315 aphiaIDsDone.append(child)
316 taxonomy = get_children(taxonomy, child, aphiaIDsDone)
317
318 return taxonomy
319
320
321    # main bit of the get_taxonomy_itis function
322 URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(start_otu.strip())
323 req = urllib2.Request(URL)
324 opener = urllib2.build_opener()
325 f = opener.open(req)
326 string = unicode(f.read(),"ISO-8859-1")
327 this_item = json.loads(string)
328 start_id = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though
329 # call it again via the ID this time to make sure we've got the right one.
330 # so call another function to get any valid names
331 URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+start_id
332 req = urllib2.Request(URL)
333 opener = urllib2.build_opener()
334 f = opener.open(req)
335 string = unicode(f.read(),"ISO-8859-1")
336 this_item = json.loads(string)
337 if not this_item['acceptedNames'] == [None]:
338 start_id = this_item['acceptedNames'][0]['acceptedTsn']
339
340 URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+start_id
341 req = urllib2.Request(URL)
342 opener = urllib2.build_opener()
343 f = opener.open(req)
344 string = unicode(f.read(),"ISO-8859-1")
345 this_item = json.loads(string)
346 start_taxonomy_level = this_item['taxRank']['rankName'].lower()
347
348 aphiaIDsDone = []
349 if not skip:
350 taxonomy = get_children(taxonomy,start_id,aphiaIDsDone)
351
352 return taxonomy, start_taxonomy_level
353
354
355
356
+def get_taxonomy_worms(taxonomy, start_otu, verbose,tmpfile=None,skip=False):
     """ Gets and processes a taxon from the queue to get its taxonomy."""
     from SOAPpy import WSDL
 
     wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1')
 
     # this is the recursive function
-    def get_children(taxonomy, ID):
+    def get_children(taxonomy, ID, aphiaIDsDone):
 
         # get data
         this_item = wsdlObjectWoRMS.getAphiaRecordByID(ID)
         if this_item == None:
             return taxonomy
+        if not this_item['status'].lower() == 'accepted':
+            print "rejecting " , this_item.valid_name
+            return taxonomy
         if this_item['rank'].lower() == 'species':
             # add data to taxonomy dictionary
-            # get the taxonomy of this species
-            classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID)
-            taxon = this_item.scientificname
+            taxon = this_item.valid_name
+            # NOTE following line means existing items are *not* updated
             if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy
+                # get the taxonomy of this species
+                classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID)
                 # construct array
                 tax_array = {}
                 # classification is a nested dictionary, so we need to iterate down it
@@ -92,16 +388,36 @@
                     current_child = current_child.child
                     if current_child == '': # empty one is a string for some reason
                         break
-                taxonomy[this_item.scientificname] = tax_array
+                if verbose:
+                    print "\tAdding "+this_item.scientificname
+                taxonomy[this_item.valid_name] = tax_array
+                if not tmpfile == None:
+                    stk.save_taxonomy(taxonomy,tmpfile)
                 return taxonomy
             else:
                 return taxonomy
 
-        children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, 1, False)
-
-        for child in children:
-            taxonomy = get_children(taxonomy, child['valid_AphiaID'])
-
+        all_children = []
+        start = 1
+        while True:
+            children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, start, False)
+            if children is None:
+                break
+            if (len(children) < 50):
+                all_children.extend(children)
+                break
+            all_children.extend(children)
+            start += 50
+
+        if (len(all_children) == 0):
+            return taxonomy
+
+        for child in all_children:
+            if child['valid_AphiaID'] in aphiaIDsDone: # we get stuck sometimes
+                continue
+            aphiaIDsDone.append(child['valid_AphiaID'])
+            taxonomy = get_children(taxonomy, child['valid_AphiaID'], aphiaIDsDone)
+
         return taxonomy
 
 
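The paging loop added above exists because `getAphiaChildrenByID` returns at most 50 records per call: the loop advances `start` by 50 and stops on a short (or empty) page. A self-contained Python 3 sketch of the same pattern; `fake_get_children` and the 120-record `RECORDS` list are stand-ins for the SOAP call, not part of the script:

```python
# Offline sketch of the 50-per-page collection loop around getAphiaChildrenByID.
RECORDS = ["child_%03d" % i for i in range(120)]  # pretend this taxon has 120 children

def fake_get_children(start, page_size=50):
    """Stand-in for the SOAP call; WoRMS record offsets are 1-based."""
    page = RECORDS[start - 1:start - 1 + page_size]
    return page or None

def all_children():
    collected = []
    start = 1
    while True:
        page = fake_get_children(start)
        if page is None:
            break
        collected.extend(page)
        if len(page) < 50:  # short page: nothing left to fetch
            break
        start += 50
    return collected
```

With 120 records this makes three calls (50 + 50 + 20) and stops on the short third page.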
@@ -111,12 +427,17 @@
         start_id = start_taxa[0]['valid_AphiaID'] # there might be records that aren't valid - they point to the valid one though
         # call it again via the ID this time to make sure we've got the right one.
         start_taxa = wsdlObjectWoRMS.getAphiaRecordByID(start_id)
-        start_taxonomy_level = start_taxa['rank'].lower()
-    except HTTPError:
-        print "Error"
+        if start_taxa == None:
+            start_taxonomy_level = 'infraorder'
+        else:
+            start_taxonomy_level = start_taxa['rank'].lower()
+    except urllib2.HTTPError:
+        print "Error finding start_otu taxonomic level. Do you have an internet connection?"
         sys.exit(-1)
 
-    taxonomy = get_children(taxonomy,start_id)
+    aphiaIDsDone = []
+    if not skip:
+        taxonomy = get_children(taxonomy,start_id,aphiaIDsDone)
 
     return taxonomy, start_taxonomy_level
 
@@ -136,9 +457,16 @@
         default=False
         )
     parser.add_argument(
+        '-s',
+        '--skip',
+        action='store_true',
+        help="Skip online checking, just use taxonomy files",
+        default=False
+        )
+    parser.add_argument(
         '--pref_db',
         help="Taxonomy database to use. Default is Species 2000/ITIS",
-        choices=['itis', 'worms', 'ncbi'],
+        choices=['itis', 'worms', 'ncbi', 'eol'],
         default = 'worms'
         )
     parser.add_argument(
@@ -178,58 +506,250 @@
     top_level = args.top_level[0]
     save_taxonomy_file = args.save_taxonomy
     tree_taxonomy = args.tree_taxonomy
+    taxonomy = args.taxonomy_from_file
     pref_db = args.pref_db
+    skip = args.skip
     if (save_taxonomy_file == None):
         save_taxonomy = False
     else:
         save_taxonomy = True
+    load_tree_taxonomy = False
+    if (not tree_taxonomy == None):
+        tree_taxonomy_file = tree_taxonomy
+        load_tree_taxonomy = True
+    if skip:
+        if taxonomy == None:
+            print "Error: If you're skipping checking online, then you need to supply taxonomy files"
+            return
 
     # grab taxa in tree
     tree = stk.import_tree(input_file)
     taxa_list = stk._getTaxaFromNewick(tree)
 
-    taxonomy = {}
-
-    # we're going to add the taxa in the tree to the taxonomy, to stop them
+    if verbose:
+        print "Taxa count for input tree: ", len(taxa_list)
+
532 # load in any taxonomy files - we still call the APIs as a) they may have updated data and
533 # b) the user may have missed some first time round (i.e. expanded the tree and not redone
534 # the taxonomy
535 if (taxonomy == None):
536 taxonomy = {}
537 else:
538 taxonomy = stk.load_taxonomy(taxonomy)
539 tree_taxonomy = {}
540 # this might also have tree_taxonomy in too - let's check this
541 for t in taxa_list:
542 if t in taxonomy:
543 tree_taxonomy[t] = taxonomy[t]
544 elif t.replace("_"," ") in taxonomy:
545 tree_taxonomy[t] = taxonomy[t.replace("_"," ")]
546
547 if (load_tree_taxonomy): # overwrite the good work above...
548 tree_taxonomy = stk.load_taxonomy(tree_taxonomy_file)
549 if (tree_taxonomy == None):
550 tree_taxonomy = {}
551
+    # we're going to add the taxa in the tree to the main WORMS taxonomy, to stop them
     # being fetched in first place. We delete them later
+    # If you've loaded a taxonomy created by this script, this overwrites the tree taxa in the main taxonomy dict
+    # Don't worry, we put them back in before saving again!
     for taxon in taxa_list:
         taxon = taxon.replace('_',' ')
-        taxonomy[taxon] = []
-
+        taxonomy[taxon] = {}
 
     if (pref_db == 'itis'):
         # get taxonomy info from itis
-        print "Sorry, ITIS is not implemented yet"
-        pass
+        if (verbose):
+            print "Getting data from ITIS"
564 if (verbose):
565 print "Dealing with taxa in tree"
566 for t in taxa_list:
567 if verbose:
568 print "\t"+t
569 if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy):
570 # we don't have data - NOTE we assume things are *not* updated here if we do
571 tree_taxonomy[t] = get_tree_taxa_taxonomy_itis(t)
572
573 if save_taxonomy:
574 if (verbose):
575 print "Saving tree taxonomy"
576 # note -temporary save as we overwrite this file later.
577 stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv')
578
579        # get taxonomy from ITIS
580 if verbose:
581 print "Now dealing with all other taxa - this might take a while..."
582 # create a temp file so we can checkpoint and continue
583 tmpf, tmpfile = tempfile.mkstemp()
584
585 if os.path.isfile('.fit_lock'):
586 f = open('.fit_lock','r')
587 tf = f.read()
588 f.close()
589 if os.path.isfile(tf.strip()):
590 taxonomy = stk.load_taxonomy(tf.strip())
591 os.remove('.fit_lock')
592
593 # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue
594 # where we left off.
595 with open(".fit_lock", 'w') as f:
596 f.write(tmpfile)
597 # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function
598 taxonomy, start_level = get_taxonomy_itis(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there
599
600 # clean up
601 os.close(tmpf)
602 os.remove('.fit_lock')
603 try:
604            os.remove(tmpfile)
605 except OSError:
606 pass
     elif (pref_db == 'worms'):
+        if (verbose):
+            print "Getting data from WoRMS"
         # get tree taxonomy from worms
-        if (tree_taxonomy == None):
-            tree_taxonomy = {}
-            for t in taxa_list:
-                from SOAPpy import WSDL
-                wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1')
-                tree_taxonomy[t] = get_tree_taxa_taxonomy(t,wsdlObjectWoRMS)
-        else:
-            tree_taxonomy = stk.load_taxonomy(tree_taxonomy)
+        if (verbose):
+            print "Dealing with taxa in tree"
+
+        for t in taxa_list:
+            if verbose:
+                print "\t"+t
+            if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy):
+                # we don't have data - NOTE we assume things are *not* updated here if we do
+                tree_taxonomy[t] = get_tree_taxa_taxonomy_worms(t)
619 tree_taxonomy[t] = get_tree_taxa_taxonomy_worms(t)
620
621 if save_taxonomy:
622 if (verbose):
623 print "Saving tree taxonomy"
624 # note -temporary save as we overwrite this file later.
625 stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv')
626
         # get taxonomy from worms
-        taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level)
+        if verbose:
629 print "Now dealing with all other taxa - this might take a while..."
630 # create a temp file so we can checkpoint and continue
631 tmpf, tmpfile = tempfile.mkstemp()
632
633 if os.path.isfile('.fit_lock'):
634 f = open('.fit_lock','r')
635 tf = f.read()
636 f.close()
637 if os.path.isfile(tf.strip()):
638 taxonomy = stk.load_taxonomy(tf.strip())
639 os.remove('.fit_lock')
640
641 # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue
642 # where we left off.
643 with open(".fit_lock", 'w') as f:
644 f.write(tmpfile)
645 # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function
646 taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there
647
648 # clean up
649 os.close(tmpf)
650 os.remove('.fit_lock')
651 try:
652            os.remove(tmpfile)
653 except OSError:
654 pass
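The ITIS, WoRMS and EOL branches all repeat the same `.fit_lock` checkpoint/resume pattern: write the checkpoint path into a lock file, save the taxonomy after each addition, and on restart reload from whatever file the leftover lock points at. A self-contained Python 3 sketch of that pattern; `LOCK`, `run`, and the JSON save/load are illustrative stand-ins for `.fit_lock` and `stk.save_taxonomy`/`stk.load_taxonomy`, not the script's own API:

```python
# Standalone sketch (stdlib only) of the lock-file checkpoint pattern.
import json
import os
import tempfile

LOCK = ".fit_lock_demo"

def resume_taxonomy():
    """Return any partial taxonomy left behind by an interrupted run."""
    if os.path.isfile(LOCK):
        with open(LOCK) as f:
            checkpoint = f.read().strip()
        if os.path.isfile(checkpoint):
            with open(checkpoint) as f:
                return json.load(f)
    return {}

def run(species):
    taxonomy = resume_taxonomy()
    fd, tmpfile = tempfile.mkstemp()
    os.close(fd)
    with open(LOCK, "w") as f:  # the lock records where the checkpoint lives
        f.write(tmpfile)
    for s in species:
        taxonomy.setdefault(s, {"species": s.replace(" ", "_")})
        with open(tmpfile, "w") as f:  # checkpoint after every taxon
            json.dump(taxonomy, f)
    os.remove(LOCK)     # clean exit: remove lock and checkpoint
    os.remove(tmpfile)
    return taxonomy

taxonomy = run(["Gadus morhua", "Homarus gammarus"])
```

A crash mid-run leaves `.fit_lock` behind, so the next invocation resumes from the checkpoint instead of refetching everything.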
 
     elif (pref_db == 'ncbi'):
         # get taxonomy from ncbi
         print "Sorry, NCBI is not implemented yet"
         pass
660 elif (pref_db == 'eol'):
661 if (verbose):
662 print "Getting data from EOL"
663        # get tree taxonomy from EOL
664 if (verbose):
665 print "Dealing with taxa in tree"
666 for t in taxa_list:
667 if verbose:
668 print "\t"+t
669 try:
670 tree_taxonomy[t]
671 pass # we have data - NOTE we assume things are *not* updated here...
672 except KeyError:
673 try:
674 tree_taxonomy[t.replace('_',' ')]
675 except KeyError:
676 tree_taxonomy[t] = get_tree_taxa_taxonomy_eol(t)
677
678 if save_taxonomy:
679 if (verbose):
680 print "Saving tree taxonomy"
681 # note -temporary save as we overwrite this file later.
682 stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv')
683
684        # get taxonomy from EOL
685 if verbose:
686 print "Now dealing with all other taxa - this might take a while..."
687 # create a temp file so we can checkpoint and continue
688 tmpf, tmpfile = tempfile.mkstemp()
689
690 if os.path.isfile('.fit_lock'):
691 f = open('.fit_lock','r')
692 tf = f.read()
693 f.close()
694 if os.path.isfile(tf.strip()):
695 taxonomy = stk.load_taxonomy(tf.strip())
696 os.remove('.fit_lock')
697
698 # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue
699 # where we left off.
700 with open(".fit_lock", 'w') as f:
701 f.write(tmpfile)
702 # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function
703 taxonomy, start_level = get_taxonomy_eol(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there
704
705 # clean up
706 os.close(tmpf)
707 os.remove('.fit_lock')
708 try:
709            os.remove(tmpfile)
710 except OSError:
711 pass
     else:
-        print "ERROR: Didn't understand you database choice"
+        print "ERROR: Didn't understand your database choice"
         sys.exit(-1)
 
     # clean up taxonomy, deleting the ones already in the tree
     for taxon in taxa_list:
         taxon = taxon.replace('_',' ')
-        del taxonomy[taxon]
+        try:
720 del taxonomy[taxon]
721 except KeyError:
722            pass # if it's not there, do we care?
723
724 # We now have 2 taxonomies:
725 # - for taxa in the tree
726 # - for all other taxa in the clade of interest
727
728 if save_taxonomy:
729 tot_taxonomy = taxonomy.copy()
730 tot_taxonomy.update(tree_taxonomy)
731 stk.save_taxonomy(tot_taxonomy,save_taxonomy_file)
732
733
734 orig_taxa_list = taxa_list
735
736 remove_higher_level = [] # for storing the higher level taxa in the original tree that need deleting
737 generic = []
738 # find all the generic and build an internal subs file
739 for t in taxa_list:
740 t = t.replace(" ","_")
741 if t.find("_") == -1:
742 # no underscore, so just generic
743 generic.append(t)
 
     # step up the taxonomy levels from genus, adding taxa to the correct node
     # as a polytomy
-    for level in taxonomy_levels[1::]: # skip species....
+    start_level = start_level.encode('utf-8').strip()
+    if verbose:
+        print "I think your start OTU is at: ", start_level
+    for level in tlevels[1::]: # skip species....
+        if verbose:
+            print "Dealing with ",level
         new_taxa = []
         for t in taxonomy:
             # skip odd ones that should be in there
@@ -239,135 +759,61 @@
             except KeyError:
                 continue # don't have this info
         new_taxa = _uniquify(new_taxa)
+
         for nt in new_taxa:
-            taxa_to_add = []
+            taxa_to_add = {}
             taxa_in_clade = []
             for t in taxonomy:
                 if start_level in taxonomy[t] and taxonomy[t][start_level] == top_level:
                     try:
-                        if taxonomy[t][level] == nt:
-                            taxa_to_add.append(t.replace(' ','_'))
+                        if taxonomy[t][level] == nt and not t in taxa_list:
+                            taxa_to_add[t] = taxonomy[t]
                     except KeyError:
                         continue
+
             # add to tree
             for t in taxa_list:
                 if level in tree_taxonomy[t] and tree_taxonomy[t][level] == nt:
                     taxa_in_clade.append(t)
-            if len(taxa_in_clade) > 0:
-                tree = add_taxa(tree, taxa_to_add, taxa_in_clade)
-                for t in taxa_to_add: # clean up taxonomy
-                    del taxonomy[t.replace('_',' ')]
-
-
+                    if t in generic:
+                        # we are appending taxa to this higher taxon, so we need to remove it
+                        remove_higher_level.append(t)
+
+
+            if len(taxa_in_clade) > 0 and len(taxa_to_add) > 0:
784 tree = add_taxa(tree, taxa_to_add, taxa_in_clade,level)
785 try:
786 taxa_list = stk._getTaxaFromNewick(tree)
787 except stk.TreeParseError as e:
788 print taxa_to_add, taxa_in_clade, level, tree
789 print e.msg
790 return
791
792 for t in taxa_to_add:
793 tree_taxonomy[t.replace(' ','_')] = taxa_to_add[t]
794 try:
795 del taxonomy[t.replace('_',' ')]
796 except KeyError:
797 # It might have _ or it might not...
798 del taxonomy[t]
799
800
801    # remove singleton nodes
802 tree = stk._collapse_nodes(tree)
803 tree = stk._collapse_nodes(tree)
804 tree = stk._collapse_nodes(tree)
805
806 tree = stk._sub_taxa_in_tree(tree, remove_higher_level)
     trees = {}
     trees['tree_1'] = tree
     output = stk._amalgamate_trees(trees,format='nexus')
     f = open(output_file, "w")
     f.write(output)
     f.close()
-
-    if not save_taxonomy_file == None:
-        with open(save_taxonomy_file, 'w') as f:
-            writer = csv.writer(f)
+    taxa_list = stk._getTaxaFromNewick(tree)
+
+    print "Final taxa count:", len(taxa_list)
+
272 headers = []
273 headers.append("OTU")
274 headers.extend(taxonomy_levels)
275 headers.append("Data source")
276 writer.writerow(headers)
277 for t in taxonomy:
278 otu = t
279 try:
280 species = taxonomy[t]['species']
281 except KeyError:
282 species = "-"
283 try:
284 genus = taxonomy[t]['genus']
285 except KeyError:
286 genus = "-"
287 try:
288 family = taxonomy[t]['family']
289 except KeyError:
290 family = "-"
291 try:
292 superfamily = taxonomy[t]['superfamily']
293 except KeyError:
294 superfamily = "-"
295 try:
296 infraorder = taxonomy[t]['infraorder']
297 except KeyError:
298 infraorder = "-"
299 try:
300 suborder = taxonomy[t]['suborder']
301 except KeyError:
302 suborder = "-"
303 try:
304 order = taxonomy[t]['order']
305 except KeyError:
306 order = "-"
307 try:
308 superorder = taxonomy[t]['superorder']
309 except KeyError:
310 superorder = "-"
311 try:
312 subclass = taxonomy[t]['subclass']
313 except KeyError:
314 subclass = "-"
315 try:
316 tclass = taxonomy[t]['class']
317 except KeyError:
318 tclass = "-"
319 try:
320 subphylum = taxonomy[t]['subphylum']
321 except KeyError:
322 subphylum = "-"
323 try:
324 phylum = taxonomy[t]['phylum']
325 except KeyError:
326 phylum = "-"
327 try:
328 superphylum = taxonomy[t]['superphylum']
329 except KeyError:
330 superphylum = "-"
331 try:
332 infrakingdom = taxonomy[t]['infrakingdom']
333 except:
334 infrakingdom = "-"
335 try:
336 subkingdom = taxonomy[t]['subkingdom']
337 except:
338 subkingdom = "-"
339 try:
340 kingdom = taxonomy[t]['kingdom']
341 except KeyError:
342 kingdom = "-"
343 try:
344 provider = taxonomy[t]['provider']
345 except KeyError:
346 provider = "-"
347
348 if (isinstance(species, list)):
349 species = " ".join(species)
350 this_classification = [
351 otu.encode('utf-8'),
352 species.encode('utf-8'),
353 genus.encode('utf-8'),
354 family.encode('utf-8'),
355 superfamily.encode('utf-8'),
356 infraorder.encode('utf-8'),
357 suborder.encode('utf-8'),
358 order.encode('utf-8'),
359 superorder.encode('utf-8'),
360 subclass.encode('utf-8'),
361 tclass.encode('utf-8'),
362 subphylum.encode('utf-8'),
363 phylum.encode('utf-8'),
364 superphylum.encode('utf-8'),
365 infrakingdom.encode('utf-8'),
366 subkingdom.encode('utf-8'),
367 kingdom.encode('utf-8'),
368 provider.encode('utf-8')]
369 writer.writerow(this_classification)
370
 
 def _uniquify(l):
     """
@@ -379,28 +825,119 @@
 
     return keys.keys()
 
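`_uniquify` drops duplicates by pushing the list through dict keys. In Python 3.7+ dicts preserve insertion order, so the same idiom also keeps the original ordering (which the Python 2 version does not guarantee). A one-liner sketch of the modern equivalent, for reference:

```python
# Order-preserving uniquify via dict keys (Python 3.7+ guarantees insertion order).
def uniquify(items):
    return list(dict.fromkeys(items))
```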
-def add_taxa(tree, new_taxa, taxa_in_clade):
+def add_taxa(tree, new_taxa, taxa_in_clade, level):
 
     # create new tree of the new taxa
-    #tree_string = "(" + ",".join(new_taxa) + ");"
-    #additionalTaxa = stk._parse_tree(tree_string)
+    additionalTaxa = tree_from_taxonomy(level,new_taxa)
 
     # find mrca parent
     treeobj = stk._parse_tree(tree)
     mrca = stk.get_mrca(tree,taxa_in_clade)
-    mrca_parent = treeobj.node(mrca).parent
-
-    # insert a node into the tree between the MRCA and its parent (p4.addNodeBetweenNodes)
-    newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent)
-
-    # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None)
-    #treeobj.addSubTree(newNode, additionalTaxa)
-    for t in new_taxa:
-        treeobj.addSibLeaf(newNode,t)
-
-    # return new tree
+    if (mrca == 0):
+        # we need to make a new tree! The additional taxa are being placed at the root of the tree
+        t = Tree()
+        A = t.add_child()
+        B = t.add_child()
+        t1 = Tree(additionalTaxa)
+        t2 = Tree(tree)
+        A.add_child(t1)
+        B.add_child(t2)
+        return t.write(format=9)
+    else:
+        mrca = treeobj.nodes[mrca]
+        additionalTaxa = stk._parse_tree(additionalTaxa)
+
+        if len(taxa_in_clade) == 1:
+            taxon = treeobj.node(taxa_in_clade[0])
+            mrca = treeobj.addNodeBetweenNodes(taxon,mrca)
+
+        # insert a node into the tree between the MRCA and its parent (p4.addNodeBetweenNodes)
+        # newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent)
+
+        # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None)
+        treeobj.addSubTree(mrca, additionalTaxa, ignoreRootAssert=True)
+
     return treeobj.writeNewick(fName=None,toString=True).strip()
 
863
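In the mrca == 0 branch above, add_taxa joins the additional taxa and the existing tree as sister clades under a brand-new root. The same join can be sketched directly on Newick strings without ete2 (`join_at_root` is a hypothetical helper, not part of the toolkit):

```python
def join_at_root(newick_a, newick_b):
    # Strip trailing whitespace and the terminating ';' from each tree,
    # then nest both trees as children of a new root node.
    a = newick_a.rstrip().rstrip(";")
    b = newick_b.rstrip().rstrip(";")
    return "(%s,%s);" % (a, b)

assert join_at_root("(A,B);", "(C,D);") == "((A,B),(C,D));"
```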
864
865def tree_from_taxonomy(top_level, tree_taxonomy):
866
867 start_level = taxonomy_levels.index(top_level)
868 new_taxa = tree_taxonomy.keys()
869
870 tl_types = []
871 for tt in tree_taxonomy:
872 tl_types.append(tree_taxonomy[tt][top_level])
873
874 tl_types = _uniquify(tl_types)
875 levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1]
876
877 t = Tree()
878 nodes = {}
879 nodes[top_level] = []
880 for tl in tl_types:
881 n = t.add_child(name=tl)
882 nodes[top_level].append({tl:n})
883
884 for l in levels_to_worry_about[-2::-1]:
885 names = []
886 nodes[l] = []
887 ci = levels_to_worry_about.index(l)
888 for tt in tree_taxonomy:
889 try:
890 names.append(tree_taxonomy[tt][l])
891 except KeyError:
892 pass
893 names = _uniquify(names)
894 for n in names:
895 # find my parent
896 parent = None
897 for tt in tree_taxonomy:
898 try:
899 if tree_taxonomy[tt][l] == n:
900 try:
901 parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]]
902 level = ci+1
903 except KeyError:
904 try:
905 parent = tree_taxonomy[tt][levels_to_worry_about[ci+2]]
906 level = ci+2
907 except KeyError:
908 try:
909 parent = tree_taxonomy[tt][levels_to_worry_about[ci+3]]
910 level = ci+3
911 except KeyError:
912 print "ERROR: tried to find some taxonomic info for "+tt+" from tree_taxonomy file/downloaded data and I went two levels up, but failed find any. Looked at:\n"
913 print "\t"+levels_to_worry_about[ci+1]
914 print "\t"+levels_to_worry_about[ci+2]
915 print "\t"+levels_to_worry_about[ci+3]
916 print "This is the taxonomy info I have for "+tt
917 print tree_taxonomy[tt]
918 sys.exit(1)
919
920 k = []
921 for nd in nodes[levels_to_worry_about[level]]:
922 k.extend(nd.keys())
923 i = 0
924 for kk in k:
925 if kk == parent:
926 break
927 i += 1
928 parent_id = i
929 break
930 except KeyError:
931 pass # no data at this level for this beastie
932 # find out where to attach it
933 node_id = nodes[levels_to_worry_about[level]][parent_id][parent]
934 nd = node_id.add_child(name=n.replace(" ","_"))
935 nodes[l].append({n:nd})
936
937 tree = t.write(format=9)
938
939 return tree
940
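tree_from_taxonomy above attaches each name under its parent at the next rank up, walking the rank list from the top level down. A one-level sketch of the same grouping idea on plain dicts rather than ete2 nodes (the taxa and rank are illustrative):

```python
from collections import defaultdict

def newick_from_taxonomy(taxonomy, rank):
    """Group species into clades by the given rank and emit a Newick string."""
    groups = defaultdict(list)
    for species, info in taxonomy.items():
        groups[info[rank]].append(species.replace(" ", "_"))
    # One clade per group; members and groups sorted for a stable result.
    clades = ["(%s)" % ",".join(sorted(m)) for _, m in sorted(groups.items())]
    return "(%s);" % ",".join(clades)

tax = {"Panthera leo": {"genus": "Panthera"},
       "Panthera tigris": {"genus": "Panthera"},
       "Canis lupus": {"genus": "Canis"}}
assert newick_from_taxonomy(tax, "genus") == \
    "((Canis_lupus),(Panthera_leo,Panthera_tigris));"
```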
404if __name__ == "__main__":
405 main()
406
407
=== modified file 'stk/scripts/plot_character_taxa_matrix.py'
--- stk/scripts/plot_character_taxa_matrix.py 2014-12-10 08:55:43 +0000
+++ stk/scripts/plot_character_taxa_matrix.py 2017-01-12 09:27:31 +0000
@@ -42,6 +42,18 @@
42 default=False
43 )
44 parser.add_argument(
45 '-t',
46 '--taxonomy',
47 help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file",
48 )
49 parser.add_argument(
50 '--level',
51 choices=['family','superfamily','infraorder','suborder','order'],
52 default='family',
53 help="""What level to group the taxonomy at. Default is family.
54 Note data for a particular levelmay be missing in taxonomy."""
55 )
56 parser.add_argument(
45 'input_file',
46 metavar='input_file',
47 nargs=1,
@@ -59,14 +71,58 @@
59 verbose = args.verbose
60 input_file = args.input_file[0]
61 output_file = args.output_file[0]
74 taxonomy = args.taxonomy
75 level = args.level
62
63 XML = stk.load_phyml(input_file)
78 if not taxonomy == None:
79 taxonomy = stk.load_taxonomy(taxonomy)
80
64 all_taxa = stk.get_all_taxa(XML)
65 all_chars_d = stk.get_all_characters(XML)
66 all_chars = []
67 for c in all_chars_d:
68 all_chars.extend(all_chars_d[c])
69
87 if not taxonomy == None:
88 tax_data = {}
89 new_all_taxa = []
90 for t in all_taxa:
91 taxon = t.replace("_"," ")
92 try:
93 if taxonomy[taxon][level] == "":
94 # skip this
95 continue
96 tax_data[t] = taxonomy[taxon][level]
97 except KeyError:
98 print "Couldn't find "+t+" in taxonomy. Adding as null data"
99 tax_data[t] = 'zzzzz' # it's at the end...
100
101 from sets import Set
102 unique = set(tax_data.values())
103 unique = list(unique)
104 unique.sort()
105 print "Groups are:"
106 print unique
107 counts = []
108 for u in unique:
109 count = 0
110 for t in tax_data:
111 if tax_data[t] == u:
112 count += 1
113 new_all_taxa.append(t)
114 counts.append(count)
115
116 all_taxa = new_all_taxa
117 # cumulate counts
118 count_cumulate = []
119 count_cumulate.append(counts[0])
120 for c in counts[1::]:
121 count_cumulate.append(c+count_cumulate[-1])
122
123 print count_cumulate
124
125
70 taxa_character_matrix = {}
71 for t in all_taxa:
72 taxa_character_matrix[t] = []
@@ -77,7 +133,8 @@
77 taxa = stk.get_taxa_from_tree(XML,t, sort=True)
78 for taxon in taxa:
79 taxon = taxon.replace(" ","_")
80 taxa_character_matrix[taxon].extend(chars)
136 if taxon in all_taxa:
137 taxa_character_matrix[taxon].extend(chars)
81
82 for t in taxa_character_matrix:
83 array = taxa_character_matrix[t]
@@ -92,6 +149,31 @@
92 x.append(i)
93 y.append(j)
94
152
153 i = 0
154 for j in all_chars:
155 # do a substitution of character names to tidy things up
156 if j.lower().startswith('mitochondrial carrier; adenine nucleotide translocator'):
157 j = "ANT"
158 if j.lower().startswith('mitochondrially encoded 12s'):
159 j = '12S'
160 if j.lower().startswith('complete mitochondrial genome'):
161 j = 'Mitogenome'
162 if j.lower().startswith('mtdna'):
163 j = "mtDNA restriction sites"
164 if j.lower().startswith('h3 histone'):
165 j = 'H3'
166 if j.lower().startswith('mitochondrially encoded cytochrome'):
167 j = 'COI'
168 if j.lower().startswith('rna, 28s'):
169 j = '28S'
170 if j.lower().startswith('rna, 18s'):
171 j = '18S'
172 if j.lower().startswith('mitochondrially encoded 16s'):
173 j = '16S'
174 all_chars[i] = j
175 i += 1
176
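The chain of startswith checks above maps verbose character names to short labels; the same tidy-up can be table-driven, so adding a rename means adding a row rather than another if. The prefixes and labels below are a subset, for illustration:

```python
# Prefix (lower-cased) -> short label, mirroring the if-chain above.
RENAMES = [
    ("mitochondrially encoded 12s", "12S"),
    ("rna, 28s", "28S"),
    ("h3 histone", "H3"),
]

def tidy(name):
    low = name.lower()
    for prefix, short in RENAMES:
        if low.startswith(prefix):
            return short
    return name  # unknown characters pass through unchanged

assert tidy("RNA, 28S ribosomal") == "28S"
assert tidy("cytochrome b") == "cytochrome b"
```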
95 fig=figure(figsize=(22,17),dpi=90)
96 fig.subplots_adjust(left=0.3)
97 ax = fig.add_subplot(1,1,1)
98
=== modified file 'stk/scripts/plot_tree_taxa_matrix.py'
--- stk/scripts/plot_tree_taxa_matrix.py 2014-12-10 08:55:43 +0000
+++ stk/scripts/plot_tree_taxa_matrix.py 2017-01-12 09:27:31 +0000
@@ -43,6 +43,18 @@
43 default=False
44 )
45 parser.add_argument(
46 '-t',
47 '--taxonomy',
48 help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file",
49 )
50 parser.add_argument(
51 '--level',
52 choices=['family','superfamily','infraorder','suborder','order'],
53 default='family',
54 help="""What level to group the taxonomy at. Default is family.
55 Note data for a particular levelmay be missing in taxonomy."""
56 )
57 parser.add_argument(
46 'input_file',
47 metavar='input_file',
48 nargs=1,
@@ -60,13 +72,57 @@
60 verbose = args.verbose
61 input_file = args.input_file[0]
62 output_file = args.output_file[0]
75 taxonomy = args.taxonomy
76 level = args.level
63
64 XML = stk.load_phyml(input_file)
79 if not taxonomy == None:
80 taxonomy = stk.load_taxonomy(taxonomy)
81
65 all_taxa = stk.get_all_taxa(XML)
66
67 taxa_tree_matrix = {}
68 for t in all_taxa:
69 taxa_tree_matrix[t] = []
87
88 if not taxonomy == None:
89 tax_data = {}
90 new_all_taxa = []
91 for t in all_taxa:
92 taxon = t.replace("_"," ")
93 try:
94 if taxonomy[taxon][level] == "":
95 # skip this
96 continue
97 tax_data[t] = taxonomy[taxon][level]
98 except KeyError:
99 print "Couldn't find "+t+" in taxonomy. Adding as null data"
100 tax_data[t] = 'zzzzz' # it's at the end...
101
102 from sets import Set
103 unique = set(tax_data.values())
104 unique = list(unique)
105 unique.sort()
106 print "Groups are:"
107 print unique
108 counts = []
109 for u in unique:
110 count = 0
111 for t in tax_data:
112 if tax_data[t] == u:
113 count += 1
114 new_all_taxa.append(t)
115 counts.append(count)
116
117 all_taxa = new_all_taxa
118 # cumulate counts
119 count_cumulate = []
120 count_cumulate.append(counts[0])
121 for c in counts[1::]:
122 count_cumulate.append(c+count_cumulate[-1])
123
124 print count_cumulate
125
70
71 trees = stk.obtain_trees(XML)
72 i = 0
73
=== modified file 'stk/scripts/remove_poorly_constrained_taxa.py'
--- stk/scripts/remove_poorly_constrained_taxa.py 2014-04-18 11:57:14 +0000
+++ stk/scripts/remove_poorly_constrained_taxa.py 2017-01-12 09:27:31 +0000
@@ -12,8 +12,8 @@
12
13 # do stuff
14 parser = argparse.ArgumentParser(
15 prog="convert tree from specific to generic",
15 prog="remove poorly contrained taxa",
16 description="""Converts a tree at specific level to generic level""",
16 description="""Remove taxa that appea in one source tree only.""",
17 )
18 parser.add_argument(
19 '-v',
@@ -34,6 +34,13 @@
34 " to removal those in polytomies *and* only in one other tree."
35 )
36 parser.add_argument(
37 '--tree_only',
38 default=False,
39 action='store_true',
40 help="Restrict removal of taxa that only occur in one source tree. Default"+
41 " to removal those in polytomies *and* only in one other tree."
42 )
43 parser.add_argument(
37 'input_phyml',
38 metavar='input_phyml',
39 nargs=1,
@@ -43,13 +50,13 @@
43 'input_tree',
44 metavar='input_tree',
45 nargs=1,
46 help="Your tree"
53 help="Your tree - can be NULL or None"
47 )
48 parser.add_argument(
49 'output_tree',
50 metavar='output_tree',
51 nargs=1,
52 help="Your output tree"
59 help="Your output tree or phyml - if input_tree is none, this is the Phyml"
53 )
54
55
@@ -62,14 +69,20 @@
62 dl = True
63 poly_only = args.poly_only
64 input_tree = args.input_tree[0]
65 output_tree = args.output_tree[0]
72 if input_tree == 'NULL' or input_tree == 'None':
73 input_tree = None
74 output_file = args.output_tree[0]
66 input_phyml = args.input_phyml[0]
67
68 XML = stk.load_phyml(input_phyml)
69 # load tree
70 supertree = stk.import_tree(input_tree)
79 if (not input_tree == None):
80 supertree = stk.import_tree(input_tree)
81 taxa = stk._getTaxaFromNewick(supertree)
82 else:
83 supertree = None
84 taxa = stk.get_all_taxa(XML)
71 # grab taxa
72 taxa = stk._getTaxaFromNewick(supertree)
73 delete_list = []
74
75 # loop over taxa in supertree and get some stats
@@ -115,19 +128,29 @@
115
116 print "Taxa: "+str(len(taxa))
117 print "Deleting: "+str(len(delete_list))
118 # done, so delete the problem taxa from the supertree
119 for t in delete_list:
120 # remove taxa from supertree
121 supertree = stk._sub_taxa_in_tree(supertree,t)
122
123 # save supertree
124 tree = {}
125 tree['Tree_1'] = supertree
126 output = stk._amalgamate_trees(tree,format='nexus')
127 # write file
128 f = open(output_tree,"w")
129 f.write(output)
130 f.close()
132 if not supertree == None:
133 # done, so delete the problem taxa from the supertree
134 for t in delete_list:
135 # remove taxa from supertree
136 supertree = stk._sub_taxa_in_tree(supertree,t)
137
138 # save supertree
139 tree = {}
140 tree['Tree_1'] = supertree
141 output = stk._amalgamate_trees(tree,format='nexus')
142 # write file
143 f = open(output_file,"w")
144 f.write(output)
145 f.close()
146 else:
147 new_phyml = stk.substitute_taxa(XML,delete_list)
148 # write file
149 f = open(output_file,"w")
150 f.write(new_phyml)
151 f.close()
152
153
131
132 if (dl):
133 # write file
134
=== added file 'stk/scripts/tree_from_taxonomy.py'
--- stk/scripts/tree_from_taxonomy.py 1970-01-01 00:00:00 +0000
+++ stk/scripts/tree_from_taxonomy.py 2017-01-12 09:27:31 +0000
@@ -0,0 +1,142 @@
1# trees ready for supretree construction.
2# Copyright (C) 2015, Jon Hill, Katie Davis
3#
4# This program is free software: you can redistribute it and/or modify
5# it under the terms of the GNU General Public License as published by
6# the Free Software Foundation, either version 3 of the License, or
7# (at your option) any later version.
8#
9# This program is distributed in the hope that it will be useful,
10# but WITHOUT ANY WARRANTY; without even the implied warranty of
11# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12# GNU General Public License for more details.
13#
14# You should have received a copy of the GNU General Public License
15# along with this program. If not, see <http://www.gnu.org/licenses/>.
16#
17# Jon Hill. jon.hill@york.ac.uk
18
19import argparse
20import copy
21import os
22import sys
23stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir )
24sys.path.insert(0, stk_path)
25import supertree_toolkit as stk
26import csv
27from ete2 import Tree
28
29taxonomy_levels = ['species','subgenus','genus','subfamily','family','superfamily','subsection','section','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom']
30tlevels = ['species','genus','family','order','class','phylum','kingdom']
31
32
33def main():
34
35 # do stuff
36 parser = argparse.ArgumentParser(
37 prog="create a tree from a taxonomy file",
38 description="Create a taxonomic tree",
39 )
40 parser.add_argument(
41 '-v',
42 '--verbose',
43 action='store_true',
44 help="Verbose output: mainly progress reports.",
45 default=False
46 )
47 parser.add_argument(
48 'top_level',
49 nargs=1,
50 help="The top level group to start with, e.g. family"
51 )
52 parser.add_argument(
53 'input_file',
54 metavar='input_file',
55 nargs=1,
56 help="Your taxonomy file"
57 )
58 parser.add_argument(
59 'output_file',
60 metavar='output_file',
61 nargs=1,
62 help="Your new tree file"
63 )
64
65 args = parser.parse_args()
66 verbose = args.verbose
67 input_file = args.input_file[0]
68 output_file = args.output_file[0]
69 top_level = args.top_level[0]
70
71 start_level = taxonomy_levels.index(top_level)
72 tree_taxonomy = stk.load_taxonomy(input_file)
73 new_taxa = tree_taxonomy.keys()
74
75 tl_types = []
76 for tt in tree_taxonomy:
77 tl_types.append(tree_taxonomy[tt][top_level])
78
79 tl_types = _uniquify(tl_types)
80 levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1]
81
82 #print levels_to_worry_about[-2::-1]
83
84 t = Tree()
85 nodes = {}
86 nodes[top_level] = []
87 for tl in tl_types:
88 n = t.add_child(name=tl)
89 nodes[top_level].append({tl:n})
90
91 for l in levels_to_worry_about[-2::-1]:
92 #print t
93 names = []
94 nodes[l] = []
95 ci = levels_to_worry_about.index(l)
96 for tt in tree_taxonomy:
97 names.append(tree_taxonomy[tt][l])
98 names = _uniquify(names)
99 for n in names:
100 #print n
101 # find my parent
102 parent = None
103 for tt in tree_taxonomy:
104 if tree_taxonomy[tt][l] == n:
105 parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]]
106 k = []
107 for nd in nodes[levels_to_worry_about[ci+1]]:
108 k.extend(nd.keys())
109 i = 0
110 for kk in k:
111 print kk
112 if kk == parent:
113 break
114 i += 1
115 parent_id = i
116 break
117 # find out where to attach it
118 node_id = nodes[levels_to_worry_about[ci+1]][parent_id][parent]
119 nd = node_id.add_child(name=n.replace(" ","_"))
120 nodes[l].append({n:nd})
121
122 tree = t.write(format=9)
123 tree = stk._collapse_nodes(tree)
124 tree = stk._collapse_nodes(tree)
125 print tree
126
127
128def _uniquify(l):
129 """
130 Make a list, l, contain only unique data
131 """
132 keys = {}
133 for e in l:
134 keys[e] = 1
135
136 return keys.keys()
137
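Note that _uniquify above relies on dict keys, which in Python 2 come back in arbitrary order. An order-preserving variant, if stable output matters (a sketch, not the toolkit's implementation):

```python
def uniquify_ordered(items):
    # Keep the first occurrence of each element, in input order.
    seen = set()
    out = []
    for e in items:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out

assert uniquify_ordered(["a", "b", "a", "c", "b"]) == ["a", "b", "c"]
```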
138if __name__ == "__main__":
139 main()
140
141
142
143
=== modified file 'stk/stk'
--- stk/stk 2014-12-09 10:58:48 +0000
+++ stk/stk 2017-01-12 09:27:31 +0000
@@ -23,6 +23,7 @@
23import sys
24import argparse
25import traceback
26import time
26try:
27 __file__
28except NameError:
@@ -41,6 +42,10 @@
41import string
42import stk.p4 as p4
43import lxml
45import csv
46import tempfile
47from subprocess import check_call, CalledProcessError, call
48
44import stk.bzr_version as bzr_version
45d = bzr_version.version_info
46build = d.get('revno','<unknown revno>')
@@ -366,7 +371,7 @@
366
367 # Clean data
368 parser_cm = subparsers.add_parser('clean_data',
369 help='Remove errant taxa, uninformative trees and empty sources.'
374 help='Renames all sources and trees sensibly. Removes errant taxa, uninformative trees and empty sources.'
370 )
371 parser_cm.add_argument('input',
372 help='The input phyml file')
@@ -488,7 +493,81 @@
488 parser_cm.add_argument('subs',
489 help='The subs file')
490 parser_cm.set_defaults(func=check_subs)
491
497 # taxonomic name checker
498 parser_cm = subparsers.add_parser('check_otus',
499 help='Check your OTUs against EoL.'
500 )
501 parser_cm.add_argument('input',
502 help='The input Phyml. Also accepts tree files or a simple list')
503 parser_cm.add_argument('output',
504 help='The output CSV file. Taxon, synonyms, status')
505 parser_cm.add_argument('--overwrite',
506 action='store_true',
507 default=False,
508 help="Overwrite the existing file without asking for confirmation")
509 parser_cm.set_defaults(func=check_otus)
510
511 # create taxonomy csv file
512 parser_cm = subparsers.add_parser('create_taxonomy',
513 help='Create a taxonomy file in CSV for you to then augment.'
514 )
515 parser_cm.add_argument('input',
516 help='The input Phyml. Also accepts tree files or a simple list')
517 parser_cm.add_argument('output',
518 help='The output CSV file. Name, followed by classification and source')
519 parser_cm.add_argument('--overwrite',
520 action='store_true',
521 default=False,
522 help="Overwrite the existing file without asking for confirmation")
523 parser_cm.add_argument('--taxonomy',
524 help="Give a starting taxonomy file, e.g. one you ran earlier",)
525 parser_cm.set_defaults(func=create_taxonomy)
526
527
528 # do the subs in a one go using taxonomy
529 parser_cm = subparsers.add_parser('auto_subs',
530 help='Using a taxonomy, generate a species level version of your data in one go.'
531 )
532 parser_cm.add_argument('input',
533 help='The input Phyml')
534 parser_cm.add_argument('taxonomy',
535 help='Your taxonomy file',
536 )
537 parser_cm.add_argument('output',
538 help='The output phyml')
539 parser_cm.add_argument('--overwrite',
540 action='store_true',
541 default=False,
542 help="Overwrite the existing file without asking for confirmation")
543 #parser_cm.add_argument('--level',
544 # choices=supertree_toolkit.taxonomy_levels,
545 # help="Taxonomic level to output at",)
546 parser_cm.set_defaults(func=auto_subs)
547
548
549 # attempt to process the data into a matrix all automatically
550 parser_cm = subparsers.add_parser('process',
551 help='Generate a species-level matrix, and do all the checks and processing automatically. Note this creates a taxonomy and does all the processing, but will not be perfect (as taxonomies are not perfect)'
552 )
553 parser_cm.add_argument('input',
554 help='The input Phyml')
555 parser_cm.add_argument('output',
556 help='The output matrix')
557 parser_cm.add_argument('--taxonomy_file',
558 help='Existing taxonomy file to prevent redownloading data. Any taxa not in the file will be checked online, so partial complete file are OK.')
559 parser_cm.add_argument('--equivalents_file',
560 help='Existing equivalents file from a taxonomic name check. Any taxa not in the file will be checked online, so partially complete files are OK.')
561 parser_cm.add_argument('--overwrite',
562 action='store_true',
563 default=False,
564 help="Overwrite the existing file without asking for confirmation")
565 parser_cm.add_argument('--no_store',
566 action="store_true",
567 default=False,
568 help="Do not store intermediate files -- not recommended")
569 parser_cm.set_defaults(func=process)
570
492
493 # before we let argparse work its magic, check for --version
494 if "--version" in sys.argv:
@@ -602,7 +681,7 @@
602 # check if output files are there
603 if (output_file and os.path.exists(output_file) and not overwrite):
604 print "Output file exists. Either remove the file or use the --overwrite flag."
605 print "Do you wish to continue? [Y/n]"
684 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
606 while True:
607 k=inkey()
608 if k.lower() == 'n':
@@ -612,7 +691,7 @@
612 break
613 if (not newphyml == None and os.path.exists(newphyml) and not overwrite):
614 print "Output Phyml file exists. Either remove the file or use the --overwrite flag."
615 print "Do you wish to continue? [Y/n]"
694 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
616 while True:
617 k=inkey()
618 if k.lower() == 'n':
@@ -624,9 +703,9 @@
624 XML = supertree_toolkit.load_phyml(input_file)
625 try:
626 if (newphyml == None):
627 data_independence = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings)
706 data_independence, subsets = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings)
628 else:
629 data_independence, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings)
708 data_independence, subsets, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings)
630 except NotUniqueError as detail:
631 msg = "***Error: Failed to check independence.\n"+detail.msg
632 print msg
@@ -644,7 +723,7 @@
644 print msg
645 return
646 except:
647 msg = "***Error: failed to check independence due to unknown error."
726 msg = "***Error: failed to check independence due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit"
648 print msg
649 traceback.print_exc()
650 return
@@ -653,16 +732,14 @@
653 data_ind = ""
654 #column headers
655 data_ind = "Source trees that are subsets of others\n"
656 data_ind = data_ind + "Flagged tree, is a subset of:\n"
657 for name in data_independence:
658 if ( data_independence[name][1] == supertree_toolkit.SUBSET ):
659 data_ind += name + "," + data_independence[name][0] + "\n"
735 data_ind = data_ind + "Flagged tree(s), is/are subset(s) of:\n"
736 for names in subsets:
737 data_ind += names[1:] + "," + names[0] + "\n"
660
661 data_ind += "\n\nSource trees that are identical to others\n"
662 data_ind = data_ind + "Flagged tree, is identical to:\n"
663 for name in data_independence:
664 if ( data_independence[name][1] == supertree_toolkit.IDENTICAL ):
665 data_ind += name + "," + data_independence[name][0] + "\n"
740 data_ind = data_ind + "Flagged tree(s), is/are identical to:\n"
741 for names in data_independence:
742 data_ind += names[1:] + "," + names[0] + "\n"
666
667
668 if (output_file == False or745 if (output_file == False or
@@ -762,7 +839,7 @@
762 # Does the output file already exist?839 # Does the output file already exist?
763 if (os.path.exists(output_file) and not overwrite):840 if (os.path.exists(output_file) and not overwrite):
764 print "Output file exists. Either remove the file or use the --overwrite flag."841 print "Output file exists. Either remove the file or use the --overwrite flag."
765 print "Do you wish to continue? [Y/n]"842 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
766 while True:843 while True:
767 k=inkey()844 k=inkey()
768 if k.lower() == 'n':845 if k.lower() == 'n':
@@ -771,6 +848,7 @@
771 if k.lower() == 'y':848 if k.lower() == 'y':
772 break849 break
773 try:850 try:
851
774 XML = supertree_toolkit.load_phyml(input_file)852 XML = supertree_toolkit.load_phyml(input_file)
775 input_is_xml = True853 input_is_xml = True
776 except:854 except:
@@ -896,7 +974,7 @@
896 # Does the output file already exist?
897 if (os.path.exists(output_file) and not overwrite):
898 print "Output file exists. Either remove the file or use the --overwrite flag."
899 print "Do you wish to continue? [Y/n]"
977 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
900 while True:
901 k=inkey()
902 if k.lower() == 'n':
@@ -942,7 +1020,7 @@
942 print msg
943 return
944 except:
945 msg = "***Error: Failed sbstituting taxa due to unknown error.\n"
1023 msg = "***Error: Failed sbstituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
946 print msg
947 traceback.print_exc()
948 return
@@ -983,7 +1061,7 @@
983
984 if (os.path.exists(output_file) and not overwrite):
985 print "Output file exists. Either remove the file or use the --overwrite flag."
986 print "Do you wish to continue? [Y/n]"
1064 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
987 while True:
988 k=inkey()
989 if k.lower() == 'n':
@@ -1013,7 +1091,7 @@
1013 print msg
1014 return
1015 except:
1016 msg = "***Error: Failed sbstituting taxa due to unknown error.\n"
1094 msg = "***Error: Failed sbstituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1017 print msg
1018 traceback.print_exc()
1019 return
@@ -1060,7 +1138,7 @@
1060 print msg
1061 return
1062 except:
1063 msg = "***Error: Failed to export data due to unknown error.\n"
1141 msg = "***Error: Failed to export data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1064 print msg
1065 traceback.print_exc()
1066 return
@@ -1115,7 +1193,7 @@
1115 print msg
1116 return
1117 except:
1118 msg = "***Error: Failed to check overlap due to unknown error.\n"
1196 msg = "***Error: Failed to check overlap due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1119 print msg
1120 traceback.print_exc()
1121 return
@@ -1161,7 +1239,7 @@
1161 # check if output files are there
1162 if (output_file and os.path.exists(output_file) and not overwrite):
1163 print "Output file exists. Either remove the file or use the --overwrite flag."
1164 print "Do you wish to continue? [Y/n]"
1242 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1165 while True:
1166 k=inkey()
1167 if k.lower() == 'n':
@@ -1191,7 +1269,7 @@
1191 print msg
1192 return
1193 except:
1194 msg = "***Error: Failed to export trees due to unknown error.\n"
1272 msg = "***Error: Failed to export trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1195 print msg
1196 traceback.print_exc()
1197 return
@@ -1220,7 +1298,7 @@
1220 # check if output files are there
1221 if (output_file and os.path.exists(output_file) and not overwrite):
1222 print "Output file exists. Either remove the file or use the --overwrite flag."
1223 print "Do you wish to continue? [Y/n]"
1301 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1224 while True:
1225 k=inkey()
1226 if k.lower() == 'n':
@@ -1309,7 +1387,7 @@
1309 print msg1387 print msg
1310 return1388 return
1311 except: 1389 except:
1312 msg = "***Error: Failed to permute trees due to unknown error.\n"1390 msg = "***Error: Failed to permute trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1313 print msg1391 print msg
1314 traceback.print_exc()1392 traceback.print_exc()
1315 return 1393 return
@@ -1347,7 +1425,7 @@
1347 # check if output files are there1425 # check if output files are there
1348 if (os.path.exists(output_file) and not overwrite):1426 if (os.path.exists(output_file) and not overwrite):
1349 print "Output file exists. Either remove the file or use the --overwrite flag."1427 print "Output file exists. Either remove the file or use the --overwrite flag."
1350 print "Do you wish to continue? [Y/n]"1428 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1351 while True:1429 while True:
1352 k=inkey()1430 k=inkey()
1353 if k.lower() == 'n':1431 if k.lower() == 'n':
@@ -1376,7 +1454,7 @@
1376 print msg1454 print msg
1377 return1455 return
1378 except: 1456 except:
1379 msg = "***Error: Failed to clean data due to unknown error.\n"1457 msg = "***Error: Failed to clean data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1380 print msg1458 print msg
1381 traceback.print_exc()1459 traceback.print_exc()
1382 return 1460 return
@@ -1404,7 +1482,7 @@
1404 # check if output files are there1482 # check if output files are there
1405 if (os.path.exists(output_file) and not overwrite):1483 if (os.path.exists(output_file) and not overwrite):
1406 print "Output file exists. Either remove the file or use the --overwrite flag."1484 print "Output file exists. Either remove the file or use the --overwrite flag."
1407 print "Do you wish to continue? [Y/n]"1485 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1408 while True:1486 while True:
1409 k=inkey()1487 k=inkey()
1410 if k.lower() == 'n':1488 if k.lower() == 'n':
@@ -1433,7 +1511,7 @@
1433 print msg1511 print msg
1434 return1512 return
1435 except: 1513 except:
1436 msg = "***Error: Failed to replace genera due to unknown error.\n"1514 msg = "***Error: Failed to replace genera due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1437 print msg1515 print msg
1438 traceback.print_exc()1516 traceback.print_exc()
1439 return 1517 return
@@ -1488,7 +1566,7 @@
1488 new_trees = {}1566 new_trees = {}
1489 i = 11567 i = 1
1490 for t in trees:1568 for t in trees:
1491 new_trees['tree_'+str(i)] = t1569 new_trees['tree_'+str(i)] = supertree_toolkit._collapse_nodes(t)
1492 i += 11570 i += 1
1493 output = supertree_toolkit._amalgamate_trees(new_trees,format=output_format)1571 output = supertree_toolkit._amalgamate_trees(new_trees,format=output_format)
1494 except TreeParseError as detail:1572 except TreeParseError as detail:
@@ -1503,7 +1581,7 @@
1503 # check if output files are there1581 # check if output files are there
1504 if (os.path.exists(output_file) and not overwrite):1582 if (os.path.exists(output_file) and not overwrite):
1505 print "Output file exists. Either remove the file or use the --overwrite flag."1583 print "Output file exists. Either remove the file or use the --overwrite flag."
1506 print "Do you wish to continue? [Y/n]"1584 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1507 while True:1585 while True:
1508 k=inkey()1586 k=inkey()
1509 if k.lower() == 'n':1587 if k.lower() == 'n':
@@ -1540,7 +1618,7 @@
1540 # check if output files are there1618 # check if output files are there
1541 if (os.path.exists(output_file) and not overwrite):1619 if (os.path.exists(output_file) and not overwrite):
1542 print "Output file exists. Either remove the file or use the --overwrite flag."1620 print "Output file exists. Either remove the file or use the --overwrite flag."
1543 print "Do you wish to continue? [Y/n]"1621 print "Do you wish to continue and overwrite the file anyway?? [Y/n]"
1544 while True:1622 while True:
1545 k=inkey()1623 k=inkey()
1546 if k.lower() == 'n':1624 if k.lower() == 'n':
@@ -1589,7 +1667,7 @@
1589 print msg1667 print msg
1590 return1668 return
1591 except: 1669 except:
1592 msg = "***Error: Failed to create subset due to unknown error.\n"1670 msg = "***Error: Failed to create subset due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
1593 print msg1671 print msg
1594 traceback.print_exc()1672 traceback.print_exc()
1595 return 1673 return
@@ -1637,6 +1715,681 @@
1637 print "**************************************************************\n"1715 print "**************************************************************\n"
16381716
16391717
+def check_otus(args):
+    """Check the OTUs in the Phyml - are they considered valid?"""
+
+    verbose = args.verbose
+    input_file = args.input
+    output_file = args.output
+
+    print input_file
+    if (input_file.endswith(".phyml")):
+        XML = supertree_toolkit.load_phyml(input_file)
+        try:
+            equivs = supertree_toolkit.taxonomic_checker(XML, verbose=verbose)
+        except NotUniqueError as detail:
+            msg = "***Error: Failed to check OTUs.\n"+detail.msg
+            print msg
+            return
+        except InvalidSTKData as detail:
+            msg = "***Error: Failed to check OTUs.\n"+detail.msg
+            print msg
+            return
+        except UninformativeTreeError as detail:
+            msg = "***Error: Failed to check OTUs.\n"+detail.msg
+            print msg
+            return
+        except TreeParseError as detail:
+            msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+            print msg
+            return
+        except:
+            # what about no internet connection? What error does that throw?
+            msg = "***Error: failed to check OTUs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit"
+            print msg
+            traceback.print_exc()
+            return
+    elif (input_file.endswith(".txt") or input_file.endswith('.dat')):
+        # read file - assume one taxon per line
+        with open(input_file,'r') as f:
+            lines = f.read().splitlines()
+        equivs = supertree_toolkit.taxonomic_checker_list(lines, verbose=verbose)
+    else:
+        # assume a tree!
+        equivs = supertree_toolkit.taxonomic_checker_tree(input_file, verbose=verbose)
+
+    f = open(output_file,"w")
+    for taxon in sorted(equivs.keys()):
+        f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n")
+    f.close()
+
+
+def create_taxonomy(args):
+    """Create a taxonomic hierarchy for each OTU in the Phyml"""
+
+    verbose = args.verbose
+    input_file = args.input
+    output_file = args.output
+    existing_taxonomy = args.taxonomy
+    ignoreWarnings = args.ignoreWarnings
+
+    XML = supertree_toolkit.load_phyml(input_file)
+    if (not existing_taxonomy == None):
+        existing_taxonomy = supertree_toolkit.load_taxonomy(existing_taxonomy) # load it in and create the dictionary
+
+    try:
+        taxonomy = supertree_toolkit.create_taxonomy(XML,existing_taxonomy=existing_taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to create taxonomy.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to create taxonomy.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to create taxonomy.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except:
+        # what about no internet connection? What error does that throw?
+        msg = "***Error: failed to create taxonomy due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit"
+        print msg
+        traceback.print_exc()
+        return
+
+    # Now create the CSV output
+    with open(output_file, 'w') as f:
+        writer = csv.writer(f)
+        headers = []
+        headers.append("OTU")
+        headers.extend(supertree_toolkit.taxonomy_levels)
+        headers.append("Data source")
+        writer.writerow(headers)
+        for t in taxonomy:
+            otu = t
+            try:
+                species = taxonomy[t]['species']
+            except KeyError:
+                species = "-"
+            try:
+                genus = taxonomy[t]['genus']
+            except KeyError:
+                genus = "-"
+            try:
+                family = taxonomy[t]['family']
+            except KeyError:
+                family = "-"
+            try:
+                superfamily = taxonomy[t]['superfamily']
+            except KeyError:
+                superfamily = "-"
+            try:
+                infraorder = taxonomy[t]['infraorder']
+            except KeyError:
+                infraorder = "-"
+            try:
+                suborder = taxonomy[t]['suborder']
+            except KeyError:
+                suborder = "-"
+            try:
+                order = taxonomy[t]['order']
+            except KeyError:
+                order = "-"
+            try:
+                superorder = taxonomy[t]['superorder']
+            except KeyError:
+                superorder = "-"
+            try:
+                subclass = taxonomy[t]['subclass']
+            except KeyError:
+                subclass = "-"
+            try:
+                tclass = taxonomy[t]['class']
+            except KeyError:
+                tclass = "-"
+            try:
+                subphylum = taxonomy[t]['subphylum']
+            except KeyError:
+                subphylum = "-"
+            try:
+                phylum = taxonomy[t]['phylum']
+            except KeyError:
+                phylum = "-"
+            try:
+                superphylum = taxonomy[t]['superphylum']
+            except KeyError:
+                superphylum = "-"
+            try:
+                infrakingdom = taxonomy[t]['infrakingdom']
+            except KeyError:
+                infrakingdom = "-"
+            try:
+                subkingdom = taxonomy[t]['subkingdom']
+            except KeyError:
+                subkingdom = "-"
+            try:
+                kingdom = taxonomy[t]['kingdom']
+            except KeyError:
+                kingdom = "-"
+            try:
+                provider = taxonomy[t]['provider']
+            except KeyError:
+                provider = "-"
+
+            if (isinstance(species, list)):
+                species = " ".join(species)
+            this_classification = [
+                otu.encode('utf-8'),
+                species.encode('utf-8'),
+                genus.encode('utf-8'),
+                family.encode('utf-8'),
+                superfamily.encode('utf-8'),
+                infraorder.encode('utf-8'),
+                suborder.encode('utf-8'),
+                order.encode('utf-8'),
+                superorder.encode('utf-8'),
+                subclass.encode('utf-8'),
+                tclass.encode('utf-8'),
+                subphylum.encode('utf-8'),
+                phylum.encode('utf-8'),
+                superphylum.encode('utf-8'),
+                infrakingdom.encode('utf-8'),
+                subkingdom.encode('utf-8'),
+                kingdom.encode('utf-8'),
+                provider.encode('utf-8')]
+            writer.writerow(this_classification)
+
+def auto_subs(args):
+    """Get all OTUs to the same taxonomic level"""
+
+    verbose = args.verbose
+    input_file = args.input
+    output = args.output
+    taxonomy = args.taxonomy
+    ignoreWarnings = args.ignoreWarnings
+    overwrite = args.overwrite
+
+    if (os.path.exists(output) and not overwrite):
+        print "Output Phyml file exists. Either remove the file or use the --overwrite flag."
+        print "Do you wish to continue and overwrite the file anyway? [Y/n]"
+        while True:
+            k=inkey()
+            if k.lower() == 'n':
+                print "Exiting..."
+                sys.exit(0)
+            if k.lower() == 'y':
+                break
+
+    XML = supertree_toolkit.load_phyml(input_file)
+    taxonomy = supertree_toolkit.load_taxonomy(taxonomy) # load it in and create the dictionary
+
+    try:
+        newXML = supertree_toolkit.generate_species_level_data(XML,taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except NoneCompleteTaxonomy as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except:
+        # what about no internet connection? What error does that throw?
+        msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit"
+        print msg
+        traceback.print_exc()
+        return
+
+    f = open(output,"w")
+    f.write(newXML)
+    f.close()
+
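Review note: the interactive overwrite prompt above is repeated verbatim before nearly every output step in this file. A hypothetical helper (names assumed, not code from this branch) would collapse the pattern to one call; the key reader is injected so the loop can be exercised without a terminal:

```python
import os

def confirm_overwrite(path, overwrite, read_key):
    """Return True when writing to `path` may proceed.

    `read_key` stands in for the interactive inkey() call used in stk,
    so the prompt loop is testable without a terminal.
    """
    if not os.path.exists(path) or overwrite:
        return True
    print("Output file exists. Either remove the file or use the --overwrite flag.")
    print("Do you wish to continue and overwrite the file anyway? [Y/n]")
    while True:
        k = read_key().lower()
        if k == 'n':
            return False
        if k == 'y':
            return True

# e.g.: keys = iter('xy'); confirm_overwrite('out.phyml', False, lambda: next(keys))
```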
+def process(args):
+
+    verbose = args.verbose
+    input_file = args.input
+    output = args.output
+    no_store = args.no_store
+    ignoreWarnings = args.ignoreWarnings
+    taxonomy_file = args.taxonomy_file
+    equivalents_file = args.equivalents_file
+    overwrite = args.overwrite
+
+    if (os.path.exists(output) and not overwrite):
+        print "Output matrix file exists. Either remove the file or use the --overwrite flag."
+        print "Do you wish to continue and overwrite the file anyway? [Y/n]"
+        while True:
+            k=inkey()
+            if k.lower() == 'n':
+                print "Exiting..."
+                sys.exit(0)
+            if k.lower() == 'y':
+                break
+
+    filename = os.path.basename(input_file)
+    dirname = os.path.dirname(input_file)
+
+    if verbose:
+        print "Loading and checking your data"
+    # 0) load and check data
+    try:
+        phyml = supertree_toolkit.load_phyml(input_file)
+        project_name = supertree_toolkit.get_project_name(phyml)
+        supertree_toolkit._check_data(phyml)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to load data.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to load data.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to load data.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except:
+        msg = "***Error: Failed to load input due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
+        print msg
+        traceback.print_exc()
+        return
+
+    if verbose:
+        print "Checking taxa against online databases"
+    # 1) taxonomy checker with autoreplace
+    # Load existing data if any:
+    if (not equivalents_file == None):
+        equivalents = supertree_toolkit.load_equivalents(equivalents_file)
+    else:
+        equivalents = None
+    equivalents = supertree_toolkit.taxonomic_checker(phyml,existing_data=equivalents,verbose=verbose)
+    # save the equivalents for later (as CSV and as sub file)
+    data_string_csv = _equivalents_to_csv(equivalents)
+    data_string_subs = _equivalents_to_subs(equivalents)
+    f = open(os.path.join(dirname,project_name+"_taxonomy_checker.csv"), "w")
+    f.write(data_string_csv)
+    f.close()
+    f = open(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat"), "w")
+    f.write(data_string_subs)
+    f.close()
+
+    # now do the replacements - we use the subs file :)
+    if verbose:
+        print "Swapping in the corrected taxa names"
+    try:
+        old_taxa, new_taxa = supertree_toolkit.parse_subs_file(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat"))
+    except UnableToParseSubsFile as e:
+        print e.msg
+        sys.exit(-1)
+    try:
+        phyml = supertree_toolkit.substitute_taxa(phyml,old_taxa,new_taxa,only_existing=False,verbose=verbose)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to substitute taxa.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to substitute taxa.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to substitute taxa.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except:
+        msg = "***Error: Failed to substitute taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
+        print msg
+        traceback.print_exc()
+        return
+    # save phyml as intermediate step
+    f = open(os.path.join(dirname,project_name+"_taxonomy_checked.phyml"), "w")
+    f.write(phyml)
+    f.close()
+
+    if verbose:
+        print "Creating taxonomic information"
+    # 2) create taxonomy
+    if (not taxonomy_file == None):
+        taxonomy = supertree_toolkit.load_taxonomy(taxonomy_file)
+    else:
+        taxonomy = None
+    taxonomy = supertree_toolkit.create_taxonomy(phyml,existing_taxonomy=taxonomy,verbose=verbose)
+    # save the taxonomy for later
+    # Now create the CSV output - separate out into a function in the STK (used several times)
+    with open(os.path.join(dirname,project_name+"_taxonomy.csv"), 'w') as f:
+        writer = csv.writer(f)
+        headers = []
+        headers.append("OTU")
+        headers.extend(supertree_toolkit.taxonomy_levels)
+        headers.append("Data source")
+        writer.writerow(headers)
+        for t in taxonomy:
+            otu = t
+            try:
+                species = taxonomy[t]['species']
+            except KeyError:
+                species = "-"
+            try:
+                subgenus = taxonomy[t]['subgenus']
+            except KeyError:
+                subgenus = "-"
+            try:
+                genus = taxonomy[t]['genus']
+            except KeyError:
+                genus = "-"
+            try:
+                subfamily = taxonomy[t]['subfamily']
+            except KeyError:
+                subfamily = "-"
+            try:
+                family = taxonomy[t]['family']
+            except KeyError:
+                family = "-"
+            try:
+                superfamily = taxonomy[t]['superfamily']
+            except KeyError:
+                superfamily = "-"
+            try:
+                subsection = taxonomy[t]['subsection']
+            except KeyError:
+                subsection = "-"
+            try:
+                section = taxonomy[t]['section']
+            except KeyError:
+                section = "-"
+            try:
+                infraorder = taxonomy[t]['infraorder']
+            except KeyError:
+                infraorder = "-"
+            try:
+                suborder = taxonomy[t]['suborder']
+            except KeyError:
+                suborder = "-"
+            try:
+                order = taxonomy[t]['order']
+            except KeyError:
+                order = "-"
+            try:
+                superorder = taxonomy[t]['superorder']
+            except KeyError:
+                superorder = "-"
+            try:
+                subclass = taxonomy[t]['subclass']
+            except KeyError:
+                subclass = "-"
+            try:
+                tclass = taxonomy[t]['class']
+            except KeyError:
+                tclass = "-"
+            try:
+                superclass = taxonomy[t]['superclass']
+            except KeyError:
+                superclass = "-"
+            try:
+                subphylum = taxonomy[t]['subphylum']
+            except KeyError:
+                subphylum = "-"
+            try:
+                phylum = taxonomy[t]['phylum']
+            except KeyError:
+                phylum = "-"
+            try:
+                superphylum = taxonomy[t]['superphylum']
+            except KeyError:
+                superphylum = "-"
+            try:
+                infrakingdom = taxonomy[t]['infrakingdom']
+            except KeyError:
+                infrakingdom = "-"
+            try:
+                subkingdom = taxonomy[t]['subkingdom']
+            except KeyError:
+                subkingdom = "-"
+            try:
+                kingdom = taxonomy[t]['kingdom']
+            except KeyError:
+                kingdom = "-"
+            try:
+                provider = taxonomy[t]['provider']
+            except KeyError:
+                provider = "-"
+            this_classification = [
+                otu.encode('utf-8'),
+                species.encode('utf-8'),
+                subgenus.encode('utf-8'),
+                genus.encode('utf-8'),
+                subfamily.encode('utf-8'),
+                family.encode('utf-8'),
+                superfamily.encode('utf-8'),
+                subsection.encode('utf-8'),
+                section.encode('utf-8'),
+                infraorder.encode('utf-8'),
+                suborder.encode('utf-8'),
+                order.encode('utf-8'),
+                superorder.encode('utf-8'),
+                subclass.encode('utf-8'),
+                tclass.encode('utf-8'),
+                superclass.encode('utf-8'),
+                subphylum.encode('utf-8'),
+                phylum.encode('utf-8'),
+                superphylum.encode('utf-8'),
+                infrakingdom.encode('utf-8'),
+                subkingdom.encode('utf-8'),
+                kingdom.encode('utf-8'),
+                provider.encode('utf-8')]
+            writer.writerow(this_classification)
+
+    # 3) create species level dataset
+    if verbose:
+        print "Converting data to species level"
+    try:
+        phyml = supertree_toolkit.generate_species_level_data(phyml,taxonomy,verbose=verbose)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except NoneCompleteTaxonomy as detail:
+        msg = "***Error: Failed to carry out auto subs.\n"+detail.msg
+        print msg
+        return
+    except:
+        # what about no internet connection? What error does that throw?
+        msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit"
+        print msg
+        traceback.print_exc()
+        return
+    # save the phyml as intermediate step
+    f = open(os.path.join(dirname,project_name+"_species_level.phyml"), "w")
+    f.write(phyml)
+    f.close()
+
+    # 4) Remove non-monophyletic taxa (requires TNT to be installed)
+    if verbose:
+        print "Removing non-monophyletic taxa via mini-supertree method"
+    tree_list = supertree_toolkit._find_trees_for_permuting(phyml)
+    try:
+        for t in tree_list:
+            # permute
+            output_string = supertree_toolkit.permute_tree(tree_list[t],matrix='hennig',treefile=None,verbose=verbose)
+            # save
+            if (not output_string == ""):
+                file_name = os.path.basename(filename)
+                dirname = os.path.dirname(filename)
+                new_output = os.path.join(dirname,t,t+"_matrix.tnt")
+                try:
+                    os.makedirs(os.path.join(dirname,t))
+                except OSError:
+                    if not os.path.isdir(os.path.join(dirname,t)):
+                        raise
+                f = open(new_output,'w',0)
+                f.write(output_string)
+                f.close()
+                time.sleep(1)
+
+                # now create the tnt command to deal with this
+                # create a tmp file for the output tree
+                temp_file_handle, temp_file = tempfile.mkstemp(suffix=".tnt")
+                tnt_command = "tnt mxram 512,run "+new_output+",echo= ,timeout 00:10:00,rseed0,rseed*,hold 1000,xmult= level 0,taxname=,nelsen *,tsave *"+temp_file+",save /,quit"
+                #tnt_command = "tnt run "+new_output+",ienum,taxname=,nelsen*,tsave *"+temp_file+",save /,quit"
+                # run tnt, grab the output and store back in the data
+                #try:
+                call(tnt_command, shell=True)
+                #except CalledProcessError as e:
+                #    msg = "***Error: Failed to run TNT. Is it installed correctly?\n"+e.msg
+                #    print msg
+                #    return
+                #ret = os.system(tnt_command)
+                #if (not ret == 0):
+                #    print "error running tnt"
+                #    return
+
+                new_tree = supertree_toolkit.import_tree(temp_file)
+                phyml = supertree_toolkit._swap_tree_in_XML(phyml,new_tree,t)
+
+    except TreeParseError as e:
+        msg = "***Error permuting trees.\n"+e.msg
+        print msg
+        return
+
+    # 4.5) remove MRP_Outgroups
+    phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_Outgroup')
+    phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOutgroup')
+    phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_outgroup')
+    phyml = supertree_toolkit.substitute_taxa(phyml,'MRPoutgroup')
+    phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOUTGROUP')
+
+    # save intermediate phyml
+    f = open(os.path.join(dirname,project_name+"_nonmonophyl_removed.phyml"), "w")
+    f.write(phyml)
+    f.close()
+
+    # 5) Remove common names
+    # no function to do this yet...
+
+    # 6) Data independence
+    if verbose:
+        print "Checking data independence"
+    data_ind,subsets,phyml = supertree_toolkit.data_independence(phyml,make_new_xml=True)
+    # save phyml
+    f = open(os.path.join(dirname,project_name+"_data_ind.phyml"), "w")
+    f.write(phyml)
+    f.close()
+
+    # 7) Data overlap
+    if verbose:
+        print "Checking data overlap"
+    sufficient_overlap, key_list = supertree_toolkit.data_overlap(phyml,verbose=verbose)
+    # process the key_list to remove the unconnected trees
+    if not sufficient_overlap:
+        # if we don't have enough overlap, remove all but the largest group.
+        # the key contains a list, with the largest group first (thanks networkX!)
+        # we can therefore just remove trees from everything but the first in the list
+        delete_me = []
+        for t in key_list[1::]: # skip 0
+            delete_me.extend(t)
+        for tree in delete_me:
+            phyml = supertree_toolkit._swap_tree_in_XML(phyml, None, tree, delete=True) # delete the tree and clean the data as we go
+    # save phyml
+    f = open(os.path.join(dirname,project_name+"_data_tax_overlap.phyml"), "w")
+    f.write(phyml)
+    f.close()
+
+    # 8) Create matrix
+    if verbose:
+        print "Creating matrix"
+    try:
+        matrix = supertree_toolkit.create_matrix(phyml)
+    except NotUniqueError as detail:
+        msg = "***Error: Failed to create matrix.\n"+detail.msg
+        print msg
+        return
+    except InvalidSTKData as detail:
+        msg = "***Error: Failed to create matrix.\n"+detail.msg
+        print msg
+        return
+    except UninformativeTreeError as detail:
+        msg = "***Error: Failed to create matrix.\n"+detail.msg
+        print msg
+        return
+    except TreeParseError as detail:
+        msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg
+        print msg
+        return
+    except:
+        msg = "***Error: Failed to create matrix due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
+        print msg
+        traceback.print_exc()
+        return
+
+    f = open(output, "w")
+    f.write(matrix)
+    f.close()
+
+    return
+
+
+def _equivalents_to_csv(equivalents):
+
+    output_string = 'Taxa,Equivalents,Status\n'
+
+    for taxon in sorted(equivalents):
+        output_string += taxon + "," + ';'.join(equivalents[taxon][0]) + "," + equivalents[taxon][1] + "\n"
+
+    return output_string
+
+
+def _equivalents_to_subs(equivalents):
+    """Only corrects the yellow ones. Red and green are left alone"""
+
+    output_string = ""
+    for taxon in sorted(equivalents):
+        if (equivalents[taxon][1] == 'yellow'):
+            # the first name is always the correct one
+            output_string += taxon + " = "+equivalents[taxon][0][0]+"\n"
+    return output_string
 
 if __name__ == "__main__":
     main()
 
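A side note on the two long `try/except KeyError` ladders above (in `create_taxonomy` and again in `process`): a loop over the rank names with `dict.get` builds the same CSV row in a few lines. A minimal sketch, with the rank list abbreviated and the row layout assumed:

```python
taxonomy_levels = ['species', 'genus', 'family', 'order',
                   'class', 'phylum', 'kingdom']  # abbreviated for the sketch

def classification_row(otu, entry):
    """One CSV row per OTU, with '-' standing in for any missing rank."""
    row = [otu]
    for level in taxonomy_levels:
        value = entry.get(level, '-')
        if isinstance(value, list):   # 'species' sometimes arrives as a list
            value = ' '.join(value)
        row.append(value)
    row.append(entry.get('provider', '-'))
    return row

print(classification_row('Ablepharus_budaki',
                         {'genus': 'Ablepharus', 'family': 'Scincidae'}))
# → ['Ablepharus_budaki', '-', 'Ablepharus', 'Scincidae', '-', '-', '-', '-', '-']
```

Adding a rank then means touching one list rather than four more lines per column.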
=== modified file 'stk/stk_exceptions.py'
--- stk/stk_exceptions.py 2013-10-22 08:26:54 +0000
+++ stk/stk_exceptions.py 2017-01-12 09:27:31 +0000
@@ -134,4 +134,12 @@
     def __init__(self, msg):
         self.msg = msg
 
+class NoneCompleteTaxonomy(Error):
+    """Exception raised when a taxonomy is not complete for these data
+
+    Attributes:
+        msg -- explanation of error
+    """
+
+    def __init__(self, msg):
+        self.msg = msg
 
 
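The new exception follows the pattern of the others in this module: callers catch it by type and read `.msg`, as `auto_subs` and `process` do above. A self-contained sketch of that pattern (the `Error` base class is reproduced from the module; the message text is made up):

```python
class Error(Exception):
    """Base class for exceptions in this module (as in stk_exceptions.py)."""
    pass

class NoneCompleteTaxonomy(Error):
    """Exception raised when a taxonomy is not complete for these data."""
    def __init__(self, msg):
        self.msg = msg

def demo():
    # mirrors the except-and-report handling in the stk command functions
    try:
        raise NoneCompleteTaxonomy("taxonomy is missing entries for 3 OTUs")
    except NoneCompleteTaxonomy as detail:
        return "***Error: Failed to carry out auto subs.\n" + detail.msg

print(demo())
```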
=== modified file 'stk/supertree_toolkit.py'
--- stk/supertree_toolkit.py 2017-01-11 15:16:21 +0000
+++ stk/supertree_toolkit.py 2017-01-12 09:27:31 +0000
@@ -44,15 +44,49 @@
 import unicodedata
 from stk_internals import *
 from copy import deepcopy
+import Queue
+import threading
+import urllib2
+from urllib import quote_plus
+import simplejson as json
+import time
 import types
 
 #plt.ion()
 
+sys.setrecursionlimit(50000)
 # GLOBAL VARIABLES
 IDENTICAL = 0
 SUBSET = 1
 PLATFORM = sys.platform
-taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom']
+# Logging
+import logging
+logging.basicConfig(filename='supertreetoolkit.log', level=logging.DEBUG, format='%(asctime)s %(levelname)s:%(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
+
+# taxonomy levels
+# What we get from EOL
+current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom']
+# And the extra ones from ITIS
+extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom']
+# all of them in order
+taxonomy_levels = ['species','subgenus','genus','tribe','subfamily','family','superfamily','subsection','section','parvorder','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom']
+
+SPECIES = taxonomy_levels[0]
+GENUS = taxonomy_levels[2]
+FAMILY = taxonomy_levels[5]
+SUPERFAMILY = taxonomy_levels[6]
+INFRAORDER = taxonomy_levels[10]
+SUBORDER = taxonomy_levels[11]
+ORDER = taxonomy_levels[12]
+SUPERORDER = taxonomy_levels[13]
+SUBCLASS = taxonomy_levels[14]
+CLASS = taxonomy_levels[15]
+SUBPHYLUM = taxonomy_levels[17]
+PHYLUM = taxonomy_levels[18]
+SUPERPHYLUM = taxonomy_levels[19]
+INFRAKINGDOM = taxonomy_levels[20]
+SUBKINGDOM = taxonomy_levels[21]
+KINGDOM = taxonomy_levels[22]
 
 # supertree_toolkit is the backend for the STK. Loaded by both the GUI and
 # CLI, this contains all the functions to actually *do* something
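One caveat on the constants block in the hunk above: positional constants (`SPECIES = taxonomy_levels[0]` and so on) silently drift whenever a rank is inserted into the list. A name-based lookup fails loudly instead; a hypothetical sketch (list abbreviated, helper name assumed, not code from this branch):

```python
taxonomy_levels = ['species', 'subgenus', 'genus', 'tribe', 'subfamily',
                   'family', 'order', 'class', 'phylum', 'kingdom']  # abbreviated

def rank(name):
    """Return the rank name after checking it really is a known level."""
    if name not in taxonomy_levels:
        raise ValueError("unknown taxonomy level: " + name)
    return name

SPECIES = rank('species')
GENUS = rank('genus')      # stays 'genus' no matter where it sits in the list
KINGDOM = rank('kingdom')
```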
@@ -60,6 +94,17 @@
 # All functions take XML and a list of other arguments, process the data and return
 # it back to the user interface handler to save it somewhere
 
+
+def get_project_name(XML):
+    """
+    Get the name of the dataset currently being worked on
+    """
+
+    xml_root = _parse_xml(XML)
+
+    return xml_root.xpath('/phylo_storage/project_name/string_value')[0].text
+
+
 def create_name(authors, year, append=''):
     """
     Construct a sensible name from a list of authors and a year for a
@@ -161,6 +206,22 @@
 
     return names
 
+def get_all_tree_names(XML):
+    """ From a full XML-PHYML string, extract all tree names.
+    """
+
+    xml_root = _parse_xml(XML)
+    find = etree.XPath("//source")
+    sources = find(xml_root)
+    names = []
+    for s in sources:
+        for st in s.xpath("source_tree"):
+            if 'name' in st.attrib and not st.attrib['name'] == "":
+                names.append(st.attrib['name'])
+
+    return names
+
+
 def set_unique_names(XML):
     """ Ensures all sources have unique names.
     """
@@ -249,9 +310,17 @@
         if (ele.tag == "source"):
             sources.append(ele)
 
+    if overwrite:
+        # remove all the names first
+        for s in sources:
+            for st in s.xpath("source_tree"):
+                if 'name' in st.attrib:
+                    del st.attrib['name']
+
+
     for s in sources:
         for st in s.xpath("source_tree"):
-            if overwrite or not 'name' in st.attrib:
+            if not 'name' in st.attrib:
                 tree_name = create_tree_name(XML,st)
                 st.attrib['name'] = tree_name
 
@@ -339,7 +408,7 @@
         taxa = etree.SubElement(s_tree,"taxa_data")
         taxa.tail="\n      "
         # Note: we do not add all elements as otherwise they get set to some option
-        # rather than remaining blank (and hence blue int he interface)
+        # rather than remaining blank (and hence blue in the interface)
 
     # append our new source to the main tree
     # if sources has no valid source, overwrite,
@@ -877,7 +946,7 @@
     # Need to add checks on the file. Problems include:
     # TNT: outputs Phylip format or something - basically a Newick
     # string without commas, so add 'em back in
-    m = re.search(r'proc-;', content)
+    m = re.search(r'proc.;', content)
     if (m != None):
         # TNT output tree
         # Done on a Mac? Replace ^M with a newline
@@ -1402,6 +1471,36 @@
 
     return _amalgamate_trees(trees,format,anonymous)
 
+def get_taxa_from_tree_for_taxonomy(tree, pretty=False, ignoreErrors=False):
+    """Returns a list of all taxa available for the tree passed as argument.
+    :param tree: string with the data for the tree in Newick format.
+    :type tree: string
+    :param pretty: defines if '_' in taxa names should be replaced with spaces.
+    :type pretty: boolean
+    :param ignoreErrors: should execution continue on error?
+    :type ignoreErrors: boolean
+    :returns: list of strings with the taxa names, sorted alphabetically
+    :rtype: list
+    """
+    taxa_list = []
+
+    try:
+        taxa_list.extend(_getTaxaFromNewick(tree))
+    except TreeParseError as detail:
+        if (ignoreErrors):
+            logging.warning(detail.msg)
+            pass
+        else:
+            raise TreeParseError( detail.msg )
+
+    # now uniquify the list of taxa
+    taxa_list = _uniquify(taxa_list)
+    taxa_list.sort()
+
+    if (pretty):
+        taxa_list = [x.replace('_', ' ') for x in taxa_list]
+
+    return taxa_list
 
 def get_all_taxa(XML, pretty=False, ignoreErrors=False):
     """ Produce a taxa list by scanning all trees within 
@@ -1422,21 +1521,17 @@
             taxa_list.extend(_getTaxaFromNewick(t))
         except TreeParseError as detail:
             if (ignoreErrors):
+                logging.warning(detail.msg)
                 pass
             else:
                 raise TreeParseError( detail.msg )
 
-
-
     # now uniquify the list of taxa
     taxa_list = _uniquify(taxa_list)
     taxa_list.sort()
 
-    if (pretty):
-        unpretty_tl = taxa_list
-        taxa_list = []
-        for t in unpretty_tl:
-            taxa_list.append(t.replace('_',' '))
+    if (pretty): # remove underscores from names
+        taxa_list = [x.replace('_', ' ') for x in taxa_list]
 
     return taxa_list
 
@@ -1508,7 +1603,7 @@
     return outgroups
 
 
-def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False):
+def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False, verbose=False):
     """ From all trees in the XML, create a matrix
     """
 
@@ -1553,7 +1648,7 @@
     taxa.sort()
     taxa.insert(0,"MRP_Outgroup")
 
-    return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights)
+    return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights, verbose=verbose)
 
 
 def create_matrix_from_trees(trees,format="hennig"):
@@ -1925,7 +2020,7 @@
         _check_data(XML)
 
     xml_root = _parse_xml(XML)
-    proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text
+    proj_name = get_project_name(XML)
 
     output_string = "======================\n"
     output_string += " Data summary of: " + proj_name + "\n"
@@ -1989,6 +2084,188 @@
 
     return output_string
 
+def taxonomic_checker_list(name_list,existing_data=None,verbose=False):
+    """ For each name in the database generate a database of the original name,
+    possible synonyms and, if the taxon is not known, signal that. We do this by
+    using the EoL API to grab synonyms of each taxon. """
+
+    import urllib2
+    from urllib import quote_plus
+    import simplejson as json
+
+    if existing_data == None:
+        equivalents = {}
+    else:
+        equivalents = existing_data
+
+    # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result?
+    # if not, is there another API function to do this?
+    # search for the taxon and grab the name - if you search for a recognised synonym on EoL then
+    # you get the original ('correct') name - shorten this to two words and you're done.
+    for t in name_list:
+        if t in equivalents:
+            continue
+        taxon = t.replace("_"," ")
+        if (verbose):
+            print "Looking up ", taxon
+        # get the data from EOL on taxon
+        taxonq = quote_plus(taxon)
+        URL = "http://eol.org/api/search/1.0.json?q="+taxonq
+        req = urllib2.Request(URL)
+        opener = urllib2.build_opener()
+        f = opener.open(req)
+        data = json.load(f)
+        # check if there's some data
+        if len(data['results']) == 0:
+            equivalents[t] = [[t],'red']
+            continue
+        amber = False
+        if len(data['results']) > 1:
+            # this is not great - we have multiple hits for this taxon, so the user needs
+            # to go back and check it; warn about this. For automatic processing we just
+            # take the first hit and flag the taxon as amber.
+            amber = True
+        ID = str(data['results'][0]['id']) # take first hit
+        URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=0&videos=0&sounds=0&maps=0&text=0&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0"
+        req = urllib2.Request(URL)
+        opener = urllib2.build_opener()
+
+        try:
+            f = opener.open(req)
+        except urllib2.HTTPError:
+            equivalents[t] = [[t],'red']
+            continue
+        data = json.load(f)
+        if len(data['scientificName']) == 0:
+            # not found a scientific name, so set as red
+            equivalents[t] = [[t],'red']
+            continue
+        correct_name = data['scientificName'].encode("ascii","ignore")
+        # we only want the first two bits of the name, not the original author and year if any
+        temp_name = correct_name.split(' ')
+        if (len(temp_name) > 2):
+            correct_name = ' '.join(temp_name[0:2])
+        correct_name = correct_name.replace(' ','_')
+
+        # build up the output dictionary - original name is key, synonyms/missing is value
+        if (correct_name == t):
+            # if the original matches the 'correct', then it's green
+            equivalents[t] = [[t], 'green']
+        else:
+            # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the
+            # 'correct' taxon at the top
+            eol_synonyms = data['synonyms']
+            synonyms = []
+            for s in eol_synonyms:
+                ts = s['synonym'].encode("ascii","ignore")
+                temp_syn = ts.split(' ')
+                if (len(temp_syn) > 2):
+                    temp_syn = ' '.join(temp_syn[0:2])
+                    ts = temp_syn
+                if (s['relationship'] == "synonym"):
+                    ts = ts.replace(" ","_")
+                    synonyms.append(ts)
+            synonyms = _uniquify(synonyms)
+            # we need to put the correct name at the top of the list now
+            if (correct_name in synonyms):
+                synonyms.insert(0, synonyms.pop(synonyms.index(correct_name)))
+            elif len(synonyms) == 0:
+                synonyms.append(correct_name)
+            else:
+                synonyms.insert(0,correct_name)
+
+            if (amber):
+                equivalents[t] = [synonyms,'amber']
+            else:
+                equivalents[t] = [synonyms,'yellow']
+        # if our search was empty, then it's red - see above
+
+    # up to the calling function to do something sensible with this
+    # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow or red.
+    # Amber means we found synonyms and multiple hits. The user definitely needs to sort these!
+
+    return equivalents
+
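[Reviewer sketch] The `equivalents` structure built by `taxonomic_checker_list` is a plain dict of `name -> [candidate_names, colour]`. A minimal, self-contained sketch of how a caller might consume it; the sample taxa below are invented for illustration and do not come from the branch:

```python
# Shape of the dict built by taxonomic_checker_list (sample data is invented):
# key = original taxon name, value = [list of candidate names, status colour]
equivalents = {
    'Gallus_gallus': [['Gallus_gallus'], 'green'],          # exact match
    'Larus_argentatus': [['Larus_smithsonianus',
                          'Larus_argentatus'], 'yellow'],   # synonyms, 'correct' name first
    'Madeupus_taxon': [['Madeupus_taxon'], 'red'],          # no hit at all
}

def partition_by_status(equivalents):
    """Group original names by their status colour (green/yellow/amber/red)."""
    by_status = {}
    for name, (candidates, colour) in equivalents.items():
        by_status.setdefault(colour, []).append(name)
    return by_status

groups = partition_by_status(equivalents)
```

`partition_by_status` is a hypothetical helper, not part of the branch; it just shows the green/yellow/red triage a GUI or script would do with the returned dict.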
+def taxonomic_checker_tree(tree_file,existing_data=None,verbose=False):
+    """ For each name in the tree generate a database of the original name,
+    possible synonyms and, if the taxon is not known, signal that. We do this by
+    using the EoL API to grab synonyms of each taxon. """
+
+    tree = import_tree(tree_file)
+    p4tree = _parse_tree(tree)
+    taxa = p4tree.getAllLeafNames(p4tree.root)
+
+    equivalents = taxonomic_checker_list(taxa,existing_data,verbose)
+    return equivalents
+
+def taxonomic_checker(XML,existing_data=None,verbose=False):
+    """ For each name in the database generate a database of the original name,
+    possible synonyms and, if the taxon is not known, signal that. We do this by
+    using the EoL API to grab synonyms of each taxon. """
+
+    # grab all taxa
+    taxa = get_all_taxa(XML)
+
+    equivalents = taxonomic_checker_list(taxa,existing_data,verbose)
+    return equivalents
+
+
+def load_equivalents(equiv_csv):
+    """Load equivalents data from a csv and convert to an equivalents dict.
+    Structure is key, with a value that is a list of synonyms, followed by the
+    status ('green', 'yellow' or 'red').
+    """
+
+    import csv
+
+    equivalents = {}
+
+    with open(equiv_csv, 'rU') as csvfile:
+        equiv_reader = csv.reader(csvfile, delimiter=',')
+        equiv_reader.next() # skip header
+        for row in equiv_reader:
+            equivalents[row[0]] = [row[1].split(';'),row[2]]
+
+    return equivalents
+
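[Reviewer sketch] For reference, the CSV layout `load_equivalents` expects is a header row followed by `name,syn1;syn2,colour` rows. The same parsing logic in a self-contained form (sample rows invented; `next(reader)` is the Python 2/3-neutral spelling of the `equiv_reader.next()` call in the branch):

```python
import csv

def parse_equivalents(lines):
    """Parse 'name,syn1;syn2,colour' rows (after a header) into the
    equivalents dict used by the checker: {name: [[synonyms], colour]}."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    equivalents = {}
    for row in reader:
        # column 1 holds the semicolon-separated synonym list
        equivalents[row[0]] = [row[1].split(';'), row[2]]
    return equivalents

sample = [
    'name,synonyms,status',
    'Larus_argentatus,Larus_smithsonianus;Larus_argentatus,yellow',
    'Madeupus_taxon,Madeupus_taxon,red',
]
equivalents = parse_equivalents(sample)
```

`csv.reader` accepts any iterable of strings, so the sketch feeds it a list instead of an open file.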
+def save_taxonomy(taxonomy, output_file):
+    """ Save a taxonomy dictionary to CSV: a header row, then one row per OTU
+    with '-' marking a missing level or provider. """
+
+    import csv
+
+    with open(output_file, 'w') as f:
+        writer = csv.writer(f)
+        row = ['OTU']
+        row.extend(taxonomy_levels)
+        row.append('Provider')
+        writer.writerow(row)
+        for t in taxonomy:
+            row = []
+            row.append(t.encode('utf-8'))
+            for l in taxonomy_levels:
+                try:
+                    g = taxonomy[t][l]
+                except KeyError:
+                    g = '-'
+                row.append(g.encode('utf-8'))
+            try:
+                provider = taxonomy[t]['provider']
+            except KeyError:
+                provider = "-"
+            row.append(provider)
+
+            writer.writerow(row)
 
 
 def load_taxonomy(taxonomy_csv):
@@ -2000,20 +2277,443 @@
 
     with open(taxonomy_csv, 'rU') as csvfile:
         tax_reader = csv.reader(csvfile, delimiter=',')
-        tax_reader.next()
-        for row in tax_reader:
-            current_taxonomy = {}
-            i = 1
-            for t in taxonomy_levels:
-                if not row[i] == '-':
-                    current_taxonomy[t] = row[i]
-                i = i + 1
-
-            current_taxonomy['provider'] = row[17] # data source
-            taxonomy[row[0]] = current_taxonomy
-
-    return taxonomy
-
+        try:
+            j = 0
+            for row in tax_reader:
+                if j == 0:
+                    # the first row is the header: OTU, the taxonomy levels, then the provider
+                    tax_levels = row[1:-1]
+                    j += 1
+                    continue
+                i = 1
+                current_taxonomy = {}
+                for t in tax_levels:
+                    if not row[i] == '-':
+                        current_taxonomy[t] = row[i]
+                    i = i + 1
+                current_taxonomy['provider'] = row[-1] # data source
+                taxonomy[row[0].replace(" ","_")] = current_taxonomy
+                j += 1
+        except IndexError:
+            # a bare 'except: pass' here would hide real errors; only skip short/malformed rows
+            pass
+
+    return taxonomy
+
+
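[Reviewer sketch] `save_taxonomy` and `load_taxonomy` are now intended to round-trip: the header row supplies the level names, `-` marks missing levels, and spaces become underscores on load. A self-contained sketch of that contract (`TAX_LEVELS` is a stand-in for the module's `taxonomy_levels` global; the sample taxon is invented):

```python
TAX_LEVELS = ['genus', 'family', 'order']  # stand-in for taxonomy_levels

def save_rows(taxonomy):
    """Serialise a taxonomy dict the way save_taxonomy does:
    header row, then one row per OTU with '-' for missing levels."""
    rows = [['OTU'] + TAX_LEVELS + ['Provider']]
    for otu, levels in taxonomy.items():
        row = [otu]
        for l in TAX_LEVELS:
            row.append(levels.get(l, '-'))
        row.append(levels.get('provider', '-'))
        rows.append(row)
    return rows

def load_rows(rows):
    """Inverse of save_rows, mirroring load_taxonomy: the header supplies
    the level names, '-' entries are dropped, spaces become underscores."""
    tax_levels = rows[0][1:-1]
    taxonomy = {}
    for row in rows[1:]:
        current = {l: v for l, v in zip(tax_levels, row[1:-1]) if v != '-'}
        current['provider'] = row[-1]
        taxonomy[row[0].replace(' ', '_')] = current
    return taxonomy

tax = {'Gallus gallus': {'genus': 'Gallus', 'order': 'Galliformes', 'provider': 'EOL'}}
round_trip = load_rows(save_rows(tax))
```

Note the asymmetry this makes visible: keys go in with spaces but come back with underscores, which is what the taxonomy consumers downstream expect.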
+class TaxonomyFetcher(threading.Thread):
+    """ Class to provide the taxonomy fetching functionality as a threaded function to be used individually or working with a pool.
+    """
+
+    def __init__(self, taxonomy, lock, queue, id=0, pref_db=None, verbose=False, ignoreWarnings=False):
+        """ Constructor for the threaded model.
+        :param taxonomy: previous taxonomy (if available) or an empty dictionary to store the results.
+        :type taxonomy: dictionary
+        :param lock: lock to keep the taxonomy threadsafe.
+        :type lock: Lock
+        :param queue: queue where the taxa are kept to be processed.
+        :type queue: Queue of strings
+        :param id: id for the thread to use if messages need to be printed.
+        :type id: int
+        :param pref_db: gives priority to this database when choosing among taxonomy sources.
+        :type pref_db: string
+        :param verbose: show verbose messages during execution; will also define the level of logging. True will set logging level to INFO.
+        :type verbose: boolean
+        :param ignoreWarnings: ignore warnings and errors during execution? Errors will be logged with ERROR level on the logging output.
+        :type ignoreWarnings: boolean
+        """
+
+        threading.Thread.__init__(self)
+        self.taxonomy = taxonomy
+        self.lock = lock
+        self.queue = queue
+        self.id = id
+        self.verbose = verbose
+        self.pref_db = pref_db
+        self.ignoreWarnings = ignoreWarnings
+
+    def run(self):
+        """ Gets and processes a taxon from the queue to get its taxonomy."""
+        while True:
+            if self.verbose:
+                logging.getLogger().setLevel(logging.INFO)
+            # get taxon from queue
+            taxon = self.queue.get()
+
+            logging.debug("Starting {} with thread #{} remaining ~{}".format(taxon,str(self.id),str(self.queue.qsize())))
+
+            # Lock access to the taxonomy
+            self.lock.acquire()
+            if not taxon in self.taxonomy: # a new taxon, not previously in the taxonomy
+                # Release access to the taxonomy
+                self.lock.release()
+                if (self.verbose):
+                    print "Looking up ", taxon
+                logging.info("Looking up taxon: {}".format(str(taxon)))
+                try:
+                    # get the data from EOL on taxon
+                    taxonq = quote_plus(taxon)
+                    URL = "http://eol.org/api/search/1.0.json?q="+taxonq
+                    req = urllib2.Request(URL)
+                    opener = urllib2.build_opener()
+                    f = opener.open(req)
+                    data = json.load(f)
+                    # check if there's some data
+                    if len(data['results']) == 0:
+                        # try PBDB as it might be a fossil
+                        URL = "http://paleobiodb.org/data1.1/taxa/single.json?name="+taxonq+"&show=phylo&vocab=pbdb"
+                        req = urllib2.Request(URL)
+                        opener = urllib2.build_opener()
+                        f = opener.open(req)
+                        datapbdb = json.load(f)
+                        if (len(datapbdb['records']) == 0):
+                            # no idea!
+                            with self.lock:
+                                self.taxonomy[taxon] = {}
+                            self.queue.task_done()
+                            continue
+                        # otherwise, let's fill in info here - only if extinct!
+                        if datapbdb['records'][0]['is_extant'] == 0:
+                            this_taxonomy = {}
+                            this_taxonomy['provider'] = 'PBDB'
+                            for level in taxonomy_levels:
+                                try:
+                                    if datapbdb.has_key('records'):
+                                        pbdb_lev = datapbdb['records'][0][level]
+                                        temp_lev = pbdb_lev.split(" ")
+                                        # they might have the author on the end, so strip it off
+                                        if (level == 'species'):
+                                            this_taxonomy[level] = ' '.join(temp_lev[0:2])
+                                        else:
+                                            this_taxonomy[level] = temp_lev[0]
+                                except KeyError as e:
+                                    logging.exception("Key not found: records")
+                                    continue
+                            # add the taxon at the right level too
+                            try:
+                                if datapbdb.has_key('records'):
+                                    current_level = datapbdb['records'][0]['rank']
+                                    this_taxonomy[current_level] = datapbdb['records'][0]['taxon_name']
+                            except KeyError as e:
+                                self.queue.task_done()
+                                logging.exception("Key not found: records")
+                                continue
+                            with self.lock:
+                                self.taxonomy[taxon] = this_taxonomy
+                            self.queue.task_done()
+                            continue
+                        else:
+                            # extant, but not in EoL - leave the user to sort this one out
+                            with self.lock:
+                                self.taxonomy[taxon] = {}
+                            self.queue.task_done()
+                            continue
+
+                    ID = str(data['results'][0]['id']) # take first hit
+                    # Now look for taxonomies
+                    URL = "http://eol.org/api/pages/1.0/"+ID+".json"
+                    req = urllib2.Request(URL)
+                    opener = urllib2.build_opener()
+                    f = opener.open(req)
+                    data = json.load(f)
+                    if len(data['taxonConcepts']) == 0:
+                        with self.lock:
+                            self.taxonomy[taxon] = {}
+                        self.queue.task_done()
+                        continue
+                    TID = str(data['taxonConcepts'][0]['identifier']) # take first hit
+                    currentdb = str(data['taxonConcepts'][0]['nameAccordingTo'])
+                    # loop through and get the preferred db if one is specified,
+                    # then get the taxonomy
+                    if (not self.pref_db is None):
+                        for db in data['taxonConcepts']:
+                            currentdb = db['nameAccordingTo'].lower()
+                            if (self.pref_db.lower() in currentdb):
+                                TID = str(db['identifier'])
+                                break
+                    URL = "http://eol.org/api/hierarchy_entries/1.0/"+TID+".json"
+                    req = urllib2.Request(URL)
+                    opener = urllib2.build_opener()
+                    f = opener.open(req)
+                    data = json.load(f)
+                    this_taxonomy = {}
+                    this_taxonomy['provider'] = currentdb
+                    for a in data['ancestors']:
+                        try:
+                            if a.has_key('taxonRank'):
+                                temp_level = a['taxonRank'].encode("ascii","ignore")
+                                if (temp_level in taxonomy_levels):
+                                    # note the dump into ASCII
+                                    temp_name = a['scientificName'].encode("ascii","ignore")
+                                    temp_name = temp_name.split(" ")
+                                    if (temp_level == 'species'):
+                                        this_taxonomy[temp_level] = ' '.join(temp_name[0:2])
+                                    else:
+                                        this_taxonomy[temp_level] = temp_name[0]
+                        except KeyError as e:
+                            logging.exception("Key not found: taxonRank")
+                            continue
+                    try:
+                        # add this taxon into the taxonomy!
+                        # some issues here, so let's make sure it's OK
+                        temp_name = taxon.split(" ")
+                        if data.has_key('taxonRank'):
+                            if not data['taxonRank'].lower() == 'species':
+                                this_taxonomy[data['taxonRank'].lower()] = temp_name[0]
+                            else:
+                                this_taxonomy[data['taxonRank'].lower()] = ' '.join(temp_name[0:2])
+                    except KeyError as e:
+                        self.queue.task_done()
+                        logging.exception("Key not found: taxonRank")
+                        continue
+                    with self.lock:
+                        # Send result to the shared dictionary
+                        self.taxonomy[taxon] = this_taxonomy
+                except urllib2.HTTPError:
+                    print("Network error when processing {}".format(taxon))
+                    logging.info("Network error when processing {}".format(taxon))
+                    self.queue.task_done()
+                    continue
+                except urllib2.URLError:
+                    print("Network error when processing {}".format(taxon))
+                    logging.info("Network error when processing {}".format(taxon))
+                    self.queue.task_done()
+                    continue
+            else:
+                # Nothing to do; release the lock on the taxonomy
+                self.lock.release()
+            # Mark task as done
+            self.queue.task_done()
+
+def create_taxonomy_from_taxa(taxa, taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False, threadNumber=5):
+    """Uses the taxa provided to generate a taxonomy for all the taxa available.
+    :param taxa: list of the taxa.
+    :type taxa: list
+    :param taxonomy: previous taxonomy (if available) or an empty
+        dictionary to store the results. If None it will be initialised to an empty dictionary.
+    :type taxonomy: dictionary
+    :param pref_db: gives priority to this database when choosing among taxonomy sources.
+    :type pref_db: string
+    :param verbose: show verbose messages during execution; will also define
+        the level of logging. True will set logging level to INFO.
+    :type verbose: boolean
+    :param ignoreWarnings: ignore warnings and errors during execution? Errors
+        will be logged with ERROR level on the logging output.
+    :type ignoreWarnings: boolean
+    :param threadNumber: maximum number of threads to use for taxonomy processing.
+    :type threadNumber: int
+    :returns: None. The results are written into the taxonomy dictionary passed
+        in (keys are the taxa).
+    """
+    if verbose:
+        logging.getLogger().setLevel(logging.INFO)
+    if taxonomy is None:
+        taxonomy = {}
+
+    lock = threading.Lock()
+    queue = Queue.Queue()
+
+    # Start a few threads as daemons checking the queue
+    for i in range(threadNumber):
+        t = TaxonomyFetcher(taxonomy, lock, queue, i, pref_db, verbose, ignoreWarnings)
+        t.setDaemon(True)
+        t.start()
+
+    # Populate the queue with the taxa
+    for taxon in taxa:
+        queue.put(taxon)
+
+    # Wait till everyone finishes
+    queue.join()
+    logging.getLogger().setLevel(logging.WARNING)
+
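[Reviewer sketch] The pattern used here (daemon workers draining a shared `Queue`, results merged under a `Lock`, `queue.join()` blocking until every `task_done()` has fired) can be exercised in isolation. In this sketch the network lookup is replaced by a dummy `lookup` function, so names and results are invented:

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2, as used in this branch

def lookup(taxon):
    # stand-in for the EoL/PBDB lookups done by TaxonomyFetcher
    return {'genus': taxon.split('_')[0]}

def fetch_all(taxa, thread_number=5):
    results = {}
    lock = threading.Lock()
    q = queue.Queue()

    def worker():
        while True:
            taxon = q.get()
            data = lookup(taxon)
            with lock:        # keep the shared dict threadsafe
                results[taxon] = data
            q.task_done()     # every get() must be matched by task_done()

    for _ in range(thread_number):
        t = threading.Thread(target=worker)
        t.daemon = True       # daemons die with the main thread
        t.start()
    for taxon in taxa:
        q.put(taxon)
    q.join()                  # block until the queue is fully drained
    return results

res = fetch_all(['Gallus_gallus', 'Larus_argentatus'])
```

This also illustrates why `TaxonomyFetcher.run` must call `task_done()` on every path: a missed call leaves `queue.join()` blocked forever.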
+def create_taxonomy_from_tree(tree, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False):
+    """ Generates the taxonomy from a tree. Uses a similar method to the XML version but works directly on a string with the tree.
+    :param tree: string with the tree in Newick format.
+    :type tree: string
+    :param existing_taxonomy: previous taxonomy (if available) to extend.
+    :type existing_taxonomy: dictionary
+    :param pref_db: gives priority to this database when choosing among taxonomy sources.
+    :type pref_db: string
+    :param verbose: flag for verbosity.
+    :type verbose: boolean
+    :param ignoreWarnings: flag for exception processing.
+    :type ignoreWarnings: boolean
+    :returns: the modified taxonomy
+    :rtype: dictionary
+    """
+    starttime = time.time()
+
+    if (existing_taxonomy is None):
+        taxonomy = {}
+    else:
+        taxonomy = existing_taxonomy
+
+    taxa = get_taxa_from_tree_for_taxonomy(tree, pretty=True)
+
+    create_taxonomy_from_taxa(taxa, taxonomy)
+
+    taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings)
+
+    return taxonomy
+
+def create_taxonomy(XML, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False):
+    """Generates a taxonomy of the data from EoL data. This is stored as a
+    dictionary of taxonomy for each taxon in the dataset. Missing data are
+    encoded as '' (blank string). It's up to the calling function to store this
+    data to file or display it."""
+
+    starttime = time.time()
+
+    if not ignoreWarnings:
+        _check_data(XML)
+
+    if (existing_taxonomy is None):
+        taxonomy = {}
+    else:
+        taxonomy = existing_taxonomy
+    taxa = get_all_taxa(XML, pretty=True)
+    create_taxonomy_from_taxa(taxa, taxonomy)
+    #taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings)
+    return taxonomy
+
+def create_extended_taxonomy(taxonomy, starttime, verbose=False, ignoreWarnings=False):
+    """Bring in extra taxonomy terms from other databases; shared method for
+    completing the taxonomy both for trees coming from XML and directly from tree strings.
+    :param taxonomy: dictionary with the relationship between taxa and taxonomy terms.
+    :type taxonomy: dictionary
+    :param starttime: time used to keep track of processing time.
+    :type starttime: long
+    :param verbose: flag for verbosity.
+    :type verbose: boolean
+    :param ignoreWarnings: flag for exception processing.
+    :type ignoreWarnings: boolean
+    :returns: the modified taxonomy
+    :rtype: dictionary
+    """
+
+    if (verbose):
+        logging.info('Done basic taxonomy, getting more info from ITIS')
+        print("Time elapsed {}".format(str(time.time() - starttime)))
+        print "Done basic taxonomy, getting more info from ITIS"
+    # fill in the rest of the taxonomy
+    # get all genera
+    genera = []
+    for t in taxonomy:
+        if GENUS in taxonomy[t]:
+            genera.append(taxonomy[t][GENUS])
+    genera = _uniquify(genera)
+    # We then use ITIS to fill in missing info based on the genera only - that saves us a species-level search
+    # and we can fill in most of the EoL missing data
+    for g in genera:
+        if (verbose):
+            print "Looking up ", g
+            logging.info("Looking up {}".format(str(g)))
+        try:
+            URL = "http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(g.strip())
+        except:
+            continue
+        req = urllib2.Request(URL)
+        opener = urllib2.build_opener()
+        try:
+            f = opener.open(req)
+        except urllib2.HTTPError:
+            continue
+        string = unicode(f.read(),"ISO-8859-1")
+        data = json.loads(string)
+        if data['scientificNames'][0] == None:
+            continue
+        tsn = data["scientificNames"][0]["tsn"]
+        URL = "http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn)
+        req = urllib2.Request(URL)
+        opener = urllib2.build_opener()
+        f = opener.open(req)
+        try:
+            string = unicode(f.read(),"ISO-8859-1")
+        except:
+            continue
+        data = json.loads(string)
+        this_taxonomy = {}
+        for level in data['hierarchyList']:
+            if not level['rankName'].lower() in current_taxonomy_levels:
+                # note the dump into ASCII
+                if level['rankName'].lower() == 'species':
+                    this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = ' '.join(level['taxonName'].split(' ')[0:2]).encode("ascii","ignore")
+                else:
+                    this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore")
+
+        for t in taxonomy:
+            if GENUS in taxonomy[t]:
+                if taxonomy[t][GENUS] == g:
+                    taxonomy[t].update(this_taxonomy)
+
+    return taxonomy
+
+def generate_species_level_data(XML, taxonomy, ignoreWarnings=False, verbose=False):
+    """ Based on a taxonomy data set, amend the data to be at species level as
+    far as possible. This function creates an internal 'subs file' and calls
+    the standard substitution functions. The internal subs are generated by
+    looping over the taxa and, if a taxon is not at species level, working out
+    which level it is at and then adding species already in the dataset to
+    replace it via a polytomy. This has to be done in one step to avoid adding
+    spurious structure to the phylogenies. """
+
+    if not ignoreWarnings:
+        _check_data(XML)
+
+    # if the taxonomic checker has not been run, warn
+    if (not taxonomy):
+        raise NoneCompleteTaxonomy("Taxonomy is empty. Create a taxonomy first. You'll probably need to hand edit the file to complete it.")
+
+    # if there is missing data in the taxonomy, warn
+    taxa = get_all_taxa(XML)
+    keys = taxonomy.keys()
+    if (not ignoreWarnings):
+        for t in taxa:
+            t = t.replace("_"," ")
+            if not t in keys:
+                # The idea here is that the caller will catch this, then re-run with ignoreWarnings set to True
+                raise NoneCompleteTaxonomy("Taxonomy is not complete. I will soldier on anyway, but this might not work as intended")
+
+    # get all taxa - see above!
+    # for each taxon, if not at species level
+    new_taxa = []
+    old_taxa = []
+    for t in taxa:
+        subs = []
+        t = t.replace("_"," ")
+        if (not SPECIES in taxonomy[t]): # the current taxon is not a species, but a higher-level taxon
+            # work out which level - should we encode this in the data to start with?
+            for tl in taxonomy_levels:
+                try:
+                    tax_data = taxonomy[t][tl]
+                except KeyError:
+                    continue
+                if (t == taxonomy[t][tl]):
+                    current_level = tl
+                    # find all species in the taxonomy that match this level
+                    for taxon in taxa:
+                        taxon = taxon.replace("_"," ")
+                        if (SPECIES in taxonomy[taxon]):
+                            try:
+                                if taxonomy[taxon][current_level] == t: # our current taxon
+                                    subs.append(taxon.replace(" ","_"))
+                            except KeyError:
+                                continue
+
+        # create the sub
+        if len(subs) > 0:
+            old_taxa.append(t.replace(" ","_"))
+            new_taxa.append(','.join(subs))
+
+    # call the sub
+    new_XML = substitute_taxa(XML, old_taxa, new_taxa, verbose=verbose)
+    new_XML = clean_data(new_XML)
+
+    return new_XML
 
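[Reviewer sketch] The subs-building loop above is the heart of this function: for each higher-level taxon, gather the dataset species whose taxonomy matches it at that rank, then substitute the comma-joined list as a polytomy. The same logic in isolation (toy taxonomy invented; underscore handling simplified relative to the branch):

```python
def build_subs(taxa, taxonomy, levels):
    """Mirror the internal subs-file construction: for each non-species
    taxon, collect dataset species that match it at its own rank."""
    old_taxa, new_taxa = [], []
    for t in taxa:
        if 'species' in taxonomy[t]:
            continue  # already at species level
        # work out which rank this taxon sits at
        current_level = None
        for tl in levels:
            if taxonomy[t].get(tl) == t:
                current_level = tl
        if current_level is None:
            continue
        subs = [s for s in taxa
                if 'species' in taxonomy[s]
                and taxonomy[s].get(current_level) == t]
        if subs:
            old_taxa.append(t)
            new_taxa.append(','.join(subs))  # comma list becomes a polytomy
    return old_taxa, new_taxa

taxonomy = {
    'Gallus': {'genus': 'Gallus'},
    'Gallus_gallus': {'species': 'Gallus gallus', 'genus': 'Gallus'},
    'Gallus_varius': {'species': 'Gallus varius', 'genus': 'Gallus'},
}
old, new = build_subs(['Gallus', 'Gallus_gallus', 'Gallus_varius'],
                      taxonomy, ['species', 'genus'])
```

Doing all substitutions in one pass, as the docstring notes, is what prevents spurious nesting when several higher-level taxa overlap.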
 
 def data_overlap(XML, overlap_amount=2, filename=None, detailed=False, show=False, verbose=False, ignoreWarnings=False):
     """ Calculate the amount of taxonomic overlap between source trees.
@@ -2024,7 +2724,7 @@
     If filename is None, no graphic is generated. Otherwise a simple
     graphic is generated showing the number of clusters. If detailed is set to
     true, a graphic is generated showing *all* trees. For data containing >200
-    source tres this could be very big and take along time. More likely, you'll run
+    source trees this could be very big and take a long time. More likely, you'll run
     out of memory.
     """
     import matplotlib
@@ -2103,6 +2803,7 @@
     sufficient_overlap = True
 
     # The above list actually contains which components are separate from each other
+    key_list = connected_components
 
     if (not filename == None or show):
         if (verbose):
@@ -2266,7 +2967,9 @@
     prev_char = None
     prev_taxa = None
     prev_name = None
-    non_ind = {}
+    subsets = []
+    identical = []
+    is_identical = False
     for data in data_ind:
         name = data[0]
         char = data[1]
@@ -2275,22 +2978,71 @@
             # when sorted, the longer list comes first
            if set(taxa).issubset(set(prev_taxa)):
                if (taxa == prev_taxa):
-                    non_ind[name] = [prev_name,IDENTICAL]
+                    if (is_identical):
+                        identical[-1].append(name)
+                    else:
+                        identical.append([name,prev_name])
+                        is_identical = True
+
                else:
-                    non_ind[name] = [prev_name,SUBSET]
+                    subsets.append([prev_name, name])
+                    prev_name = name
+                    is_identical = False
+            else:
+                prev_name = name
+                is_identical = False
+        else:
+            prev_name = name
+            is_identical = False
+
         prev_char = char
         prev_taxa = taxa
-        prev_name = name
-
+
     if (make_new_xml):
         new_xml = XML
-        for name in non_ind:
-            if (non_ind[name][1] == SUBSET):
-                new_xml = _swap_tree_in_XML(new_xml,None,name)
-                new_xml = clean_data(new_xml)
-        return non_ind, new_xml
+        # deal with subsets
+        for s in subsets:
+            new_xml = _swap_tree_in_XML(new_xml,None,s[1])
+        new_xml = clean_data(new_xml)
+        # deal with identical trees - weight them: if there are three identical
+        # trees, each gets weight 1/3, i.e. weights are 1/(number of identical trees)
+        for i in identical:
+            weight = 1.0 / float(len(i))
+            new_xml = add_weights(new_xml, i, weight)
+
+        return identical, subsets, new_xml
     else:
-        return non_ind
+        return identical, subsets
+
+
+def add_weights(XML, names, weight):
+    """ Add weights for trees: supply an array of tree names and a weight and they get set.
+    Returns a new XML.
+    """
+
+    xml_root = _parse_xml(XML)
+    # By getting source_tree, we can then loop over each tree
+    find = etree.XPath("//source_tree")
+    sources = find(xml_root)
+    for s in sources:
+        s_name = s.attrib['name']
+        for n in names:
+            if s_name == n:
+                if s.xpath("tree/weight/real_value") == []:
+                    # add weights
+                    weights_element = etree.Element("weight")
+                    weights_element.tail="\n"
+                    real_value = etree.SubElement(weights_element,'real_value')
+                    real_value.attrib['rank'] = '0'
+                    real_value.tail = '\n'
+                    real_value.text = str(weight)
+                    t = s.xpath("tree")[0]
+                    t.append(weights_element)
+                else:
+                    s.xpath("tree/weight/real_value")[0].text = str(weight)
+
+    return etree.tostring(xml_root,pretty_print=True)
+
 
2295def add_historical_event(XML, event_description):3047def add_historical_event(XML, event_description):
2296 """3048 """
@@ -2380,8 +3132,15 @@
2380 # check trees are informative3132 # check trees are informative
2381 XML = _check_informative_trees(XML,delete=True)3133 XML = _check_informative_trees(XML,delete=True)
23823134
3135
2383 # check sources3136 # check sources
2384 XML = _check_sources(XML,delete=True)3137 XML = _check_sources(XML,delete=True)
3138 XML = all_sourcenames(XML)
3139
3140 # fix tree names
3141 XML = set_unique_names(XML)
3142 XML = set_all_tree_names(XML,overwrite=True)
3143
23853144
2386 # unpermutable trees3145 # unpermutable trees
2387 permutable_trees = _find_trees_for_permuting(XML)3146 permutable_trees = _find_trees_for_permuting(XML)
@@ -2659,7 +3418,7 @@
2659 s.getparent().remove(s)3418 s.getparent().remove(s)
26603419
2661 # edit name (append _subset)3420 # edit name (append _subset)
2662 proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text3421 proj_name = get_project_name(XML)
2663 proj_name += "_subset"3422 proj_name += "_subset"
2664 xml_root.xpath('/phylo_storage/project_name/string_value')[0].text = proj_name3423 xml_root.xpath('/phylo_storage/project_name/string_value')[0].text = proj_name
26653424
@@ -2928,6 +3687,37 @@
29283687
2929 return mrca3688 return mrca
29303689
3690
3691def tree_from_taxonomy(taxonomy, end_level, end_rank):
3692 """Create a tree from a taxonomy data structure.
3693 This is not the most efficient way, but works OK
3694 """
3695
3696 # Grab data only for the end_level classification
3697 required_taxonomy = {}
3698 for t in taxonomy:
3699 if (end_level in t):
3700 required_taxonomy[t] = taxonomy[t]
3701
3702 rank_index = taxonomy_levels.index(end_rank)
3703
3704 # create basic string
3705
3706 # get unique otus
3707
3708 # sort by the subfamily
3709
3710 # for each genus create a newick string
3711
3712 # if it's the same grouping as previous, add as sister clade (i.e. ,)
3713 # else, prepend a (, append a ) and add new clade (ie. ,)
3714
3715
3716 # return tree
3717
3718
3719
3720
2931################ PRIVATE FUNCTIONS ########################3721################ PRIVATE FUNCTIONS ########################
29323722
2933def _uniquify(l):3723def _uniquify(l):
@@ -2975,13 +3765,25 @@
2975 "The source names in the dataset are not unique. Please run the auto-name function on these data. Name: "+name+"\n"3765 "The source names in the dataset are not unique. Please run the auto-name function on these data. Name: "+name+"\n"
2976 last_name = name3766 last_name = name
29773767
3768 # do same for tree names:
3769 names = get_all_tree_names(XML)
3770 names.sort()
3771 last_name = "" # This will actually throw an non-unique error if a name is empty
3772 # not great, but still an error!
3773 for name in names:
3774 if name == last_name:
3775 # if non-unique throw exception
3776 message = message + \
3777 "The tree names in the dataset are not unique. Please run the auto-name function on these data with replace or edit by hand. Name: "+name+"\n"
3778 last_name = name
3779
2978 if (not message == ""):3780 if (not message == ""):
2979 raise NotUniqueError(message)3781 raise NotUniqueError(message)
29803782
2981 return3783 return
29823784
29833785
2984def _assemble_tree_matrix(tree_string):3786def _assemble_tree_matrix(tree_string, verbose=False):
2985 """ Assembles the MRP matrix for an individual tree3787 """ Assembles the MRP matrix for an individual tree
29863788
2987 returns: matrix (2D numpy array: taxa on i, nodes on j)3789 returns: matrix (2D numpy array: taxa on i, nodes on j)
@@ -3009,7 +3811,7 @@
3009 for i in range(0,len(names)):3811 for i in range(0,len(names)):
3010 adjmat.append([1])3812 adjmat.append([1])
3011 adjmat = numpy.array(adjmat)3813 adjmat = numpy.array(adjmat)
30123814 if verbose:
3013 print "Warning: Found uninformative tree in data. Including it in the matrix anyway"3815 print "Warning: Found uninformative tree in data. Including it in the matrix anyway"
30143816
3015 return adjmat, names3817 return adjmat, names
@@ -3020,7 +3822,7 @@
3020 3822
3021 If the new_taxa array is missing, simply delete the old_taxa3823 If the new_taxa array is missing, simply delete the old_taxa
3022 """3824 """
3023 3825
3024 tree = _correctly_quote_taxa(tree)3826 tree = _correctly_quote_taxa(tree)
3025 # are the input values lists or simple strings?3827 # are the input values lists or simple strings?
3026 if (isinstance(old_taxa,str)):3828 if (isinstance(old_taxa,str)):
@@ -3564,7 +4366,7 @@
35644366
3565 return permute_trees4367 return permute_trees
35664368
3567def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None):4369def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None, verbose=False):
3568 """4370 """
3569 Does the hard work on creating a matrix4371 Does the hard work on creating a matrix
3570 """4372 """
@@ -3585,7 +4387,7 @@
3585 if (not weights == None):4387 if (not weights == None):
3586 weight = weights[key]4388 weight = weights[key]
3587 names.append(key)4389 names.append(key)
3588 submatrix, tree_taxa = _assemble_tree_matrix(trees[key])4390 submatrix, tree_taxa = _assemble_tree_matrix(trees[key], verbose=verbose)
3589 nChars = len(submatrix[0,:])4391 nChars = len(submatrix[0,:])
3590 # loop over characters in the submatrix4392 # loop over characters in the submatrix
3591 for i in range(1,nChars):4393 for i in range(1,nChars):
@@ -3637,7 +4439,7 @@
3637 matrix_string += string + "\n"4439 matrix_string += string + "\n"
3638 i += 14440 i += 1
3639 4441
3640 matrix_string += "\t;\n"4442 matrix_string += "\n"
3641 if (not weights == None):4443 if (not weights == None):
3642 # get unique weights4444 # get unique weights
3643 unique_weights = _uniquify(weights)4445 unique_weights = _uniquify(weights)
@@ -3652,7 +4454,7 @@
3652 matrix_string += " " + str(i)4454 matrix_string += " " + str(i)
3653 i += 14455 i += 1
3654 matrix_string += ";\n"4456 matrix_string += ";\n"
3655 matrix_string += "procedure /;"4457 matrix_string += "proc /;"
3656 elif (format == 'nexus'):4458 elif (format == 'nexus'):
3657 matrix_string = "#nexus\n\nbegin data;\n"4459 matrix_string = "#nexus\n\nbegin data;\n"
3658 matrix_string += "\tdimensions ntax = "+str(len(taxa)) +" nchar = "+str(last_char)+";\n"4460 matrix_string += "\tdimensions ntax = "+str(len(taxa)) +" nchar = "+str(last_char)+";\n"
36594461
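The reworked `data_independence` scan above is easiest to follow in isolation. The sketch below is not the toolkit's code: it is a minimal, self-contained re-statement of the classification logic (trees sorted so supersets come first, identical taxa sets grouped, strict subsets recorded as `[superset, subset]` pairs, and each identical group later weighted `1/n`). The function name `classify_non_independent` and the tuple layout of `data_ind` are illustrative assumptions.

```python
def classify_non_independent(data_ind):
    """data_ind: list of (name, char, taxa) tuples, pre-sorted so that for
    each character the longest taxa list comes first (as in the patch)."""
    prev_taxa = None
    prev_name = None
    prev_char = None
    subsets = []      # [superset_name, subset_name] pairs
    identical = []    # groups of trees with identical taxa sets
    is_identical = False
    for name, char, taxa in data_ind:
        if char == prev_char and prev_taxa is not None \
                and set(taxa).issubset(set(prev_taxa)):
            if taxa == prev_taxa:
                if is_identical:
                    # extend the current group of identical trees
                    identical[-1].append(name)
                else:
                    identical.append([name, prev_name])
                    is_identical = True
            else:
                subsets.append([prev_name, name])
                prev_name = name
                is_identical = False
        else:
            prev_name = name
            is_identical = False
        prev_char = char
        prev_taxa = taxa
    return identical, subsets


data = [
    ("T1", "morph", ["A", "B", "C"]),
    ("T2", "morph", ["A", "B", "C"]),   # identical to T1
    ("T3", "morph", ["A", "B", "C"]),   # identical again -> same group
    ("T4", "morph", ["A", "B"]),        # strict subset of T1's taxa
    ("T5", "mol",   ["A", "B"]),        # different character -> independent
]
identical, subsets = classify_non_independent(data)
# each identical group gets weight 1/(group size), as in the patch
weights = {n: 1.0 / len(g) for g in identical for n in g}
```

Here `identical` comes out as `[["T2", "T1", "T3"]]` and `subsets` as `[["T1", "T4"]]`, so each of the three identical trees is weighted 1/3.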
=== modified file 'stk/test/_substitute_taxa.py'
--- stk/test/_substitute_taxa.py 2016-07-14 10:12:17 +0000
+++ stk/test/_substitute_taxa.py 2017-01-12 09:27:31 +0000
@@ -10,6 +10,7 @@
 from stk.supertree_toolkit import check_subs, _tree_contains, _correctly_quote_taxa, _remove_single_poly_taxa
 from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_all_taxa, _parse_tree, _delete_taxon
 from stk.supertree_toolkit import _collapse_nodes, import_tree, subs_from_csv, _getTaxaFromNewick, obtain_trees
+from stk.supertree_toolkit import generate_species_level_data
 from lxml import etree
 from util import *
 from stk.stk_exceptions import *
@@ -776,7 +777,24 @@
         new_tree = _sub_taxa_in_tree(tree2,"Thereuopodina",sub_in,skip_existing=True);
         self.assert_(answer2, new_tree)
 
 
+    def test_auto_subs_taxonomy(self):
+        """test the automatic subs function with a simple test"""
+        XML = etree.tostring(etree.parse('data/input/auto_sub.phyml',parser),pretty_print=True)
+        taxonomy = {'Ardea goliath': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea goliath'},
+                    'Pelecaniformes': {'kingdom': 'Animalia', 'phylum': 'Chordata', 'order': 'Pelecaniformes', 'class': 'Aves', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013'}, 'Gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes'},
+                    'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'},
+                    'Platalea leucorodia': {'kingdom': 'Animalia', 'subfamily': 'Plataleinae', 'family': 'Threskiornithidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Platalea', 'order': 'Pelecaniformes', 'species': 'Platalea leucorodia'},
+                    'Gallus lafayetii': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus lafayetii'},
+                    'Ardea humbloti': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea humbloti'},
+                    'Gallus varius': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus varius'}}
+        XML = generate_species_level_data(XML, taxonomy)
+        expected_XML = etree.tostring(etree.parse('data/output/one_click_subs_output.phyml',parser),pretty_print=True)
+        trees = obtain_trees(XML)
+        expected_trees = obtain_trees(expected_XML)
+        for t in trees:
+            self.assert_(_trees_equal(trees[t], expected_trees[t]))
+
     def test_parrot_edge_case(self):
         """Random edge case where the tree dissappeared..."""
         trees = ["(((((((Agapornis_lilianae, Agapornis_nigrigenis), Agapornis_personata, Agapornis_fischeri), Agapornis_roseicollis), (Agapornis_pullaria, Agapornis_taranta)), Agapornis_cana), Loriculus_galgulus), Geopsittacus_occidentalis);"]
 
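The taxonomy dictionaries fed to `generate_species_level_data` in the test above map each taxon name to a `{rank: value}` dict. A small sketch of how such a structure can be queried, e.g. to expand a higher-level taxon to the species beneath it; the helper name `species_in` and the trimmed sample data are illustrative only, not part of the toolkit.

```python
def species_in(taxonomy, rank, value):
    """Return, sorted, all species-level entries whose given rank matches value."""
    return sorted(t for t, ranks in taxonomy.items()
                  if ranks.get(rank) == value and "species" in ranks)

# trimmed-down version of the taxonomy dict used in test_auto_subs_taxonomy
taxonomy = {
    "Gallus varius":    {"genus": "Gallus", "order": "Galliformes", "species": "Gallus varius"},
    "Gallus lafayetii": {"genus": "Gallus", "order": "Galliformes", "species": "Gallus lafayetii"},
    "Gallus":           {"genus": "Gallus", "order": "Galliformes"},   # genus-level entry, no species
    "Ardea goliath":    {"genus": "Ardea", "order": "Pelecaniformes", "species": "Ardea goliath"},
}

gallus_species = species_in(taxonomy, "genus", "Gallus")
```

Substituting a genus-level leaf by the species that share its genus is essentially this lookup followed by a tree substitution.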
=== modified file 'stk/test/_supertree_toolkit.py'
--- stk/test/_supertree_toolkit.py 2015-03-26 09:58:58 +0000
+++ stk/test/_supertree_toolkit.py 2017-01-12 09:27:31 +0000
@@ -7,12 +7,13 @@
7import os7import os
8stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir, os.pardir )8stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir, os.pardir )
9sys.path.insert(0, stk_path)9sys.path.insert(0, stk_path)
10from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence10from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence, add_weights
11from stk.supertree_toolkit import get_fossil_taxa, get_publication_years, data_summary, get_character_numbers, get_analyses_used11from stk.supertree_toolkit import get_fossil_taxa, get_publication_years, data_summary, get_character_numbers, get_analyses_used
12from stk.supertree_toolkit import data_overlap, read_matrix, subs_file_from_str, clean_data, obtain_trees, get_all_source_names12from stk.supertree_toolkit import data_overlap, read_matrix, subs_file_from_str, clean_data, obtain_trees, get_all_source_names
13from stk.supertree_toolkit import add_historical_event, _sort_data, _parse_xml, _check_sources, _swap_tree_in_XML, replace_genera13from stk.supertree_toolkit import add_historical_event, _sort_data, _parse_xml, _check_sources, _swap_tree_in_XML, replace_genera
14from stk.supertree_toolkit import get_all_taxa, _get_all_siblings, _parse_tree, get_characters_used, _trees_equal, get_weights14from stk.supertree_toolkit import get_all_taxa, _get_all_siblings, _parse_tree, get_characters_used, _trees_equal, get_weights
15from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, load_taxonomy15from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, taxonomic_checker, load_taxonomy, load_equivalents
16from stk.supertree_toolkit import create_taxonomy, create_taxonomy_from_tree, get_all_tree_names
16from lxml import etree17from lxml import etree
17from util import *18from util import *
18from stk.stk_exceptions import *19from stk.stk_exceptions import *
@@ -268,19 +269,52 @@
268 269
269 def test_data_independence(self):270 def test_data_independence(self):
270 XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True)271 XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True)
271 expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]}272 expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']]
272 non_ind = data_independence(XML)273 non_ind,subsets = data_independence(XML)
273 self.assertDictEqual(expected_dict, non_ind)274 expected_subsets = [['Hill_2011_1', 'Hill_2011_2']]
275 self.assertListEqual(expected_subsets, subsets)
276 self.assertListEqual(expected_idents, non_ind)
274277
275 def test_data_independence(self):278 def test_data_independence_2(self):
276 XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True)279 XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True)
277 expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]}280 expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']]
278 non_ind, new_xml = data_independence(XML,make_new_xml=True)281 expected_subsets = [['Hill_2011_1', 'Hill_2011_2']]
279 self.assertDictEqual(expected_dict, non_ind)282 non_ind, subset, new_xml = data_independence(XML,make_new_xml=True)
283 self.assertListEqual(expected_idents, non_ind)
284 self.assertListEqual(expected_subsets, subset)
280 # check the second tree has not been removed285 # check the second tree has not been removed
281 self.assertRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;'))286 self.assertRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;'))
282 # check that the first tree is removed287 # check that the first tree is removed
283 self.assertNotRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,(F:1.00000,E:1.00000)0.00000:0.00000)0.00000:0.00000;'))288 self.assertNotRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,(F:1.00000,E:1.00000)0.00000:0.00000)0.00000:0.00000;'))
289
290 def test_add_weights(self):
291 """Add weights to a bunch of trees"""
292 XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True)
293 # see above
294 expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']]
295 # so the first should end up with a weight of 0.33333 and the second with 0.5
296 for ei in expected_idents:
297 weight = 1.0/float(len(ei))
298 XML = add_weights(XML, ei, weight)
299
300 expected_weights = [str(1.0/3.0), str(1.0/3.0), str(1.0/3.0), str(0.5), str(0.5)]
301 weights_in_xml = []
302 # now check weights have been added to the correct part of the tree
303 xml_root = _parse_xml(XML)
304 i = 0
305 for ei in expected_idents:
306 for tree in ei:
307 find = etree.XPath("//source_tree")
308 trees = find(xml_root)
309 for t in trees:
310 if t.attrib['name'] == tree:
311 # check len(trees) == 0
312 weights_in_xml.append(t.xpath("tree/weight/real_value")[0].text)
313
314 self.assertListEqual(expected_weights,weights_in_xml)
315
316
317
284 318
285 def test_overlap(self):319 def test_overlap(self):
286 XML = etree.tostring(etree.parse('data/input/check_overlap_ok.phyml',parser),pretty_print=True)320 XML = etree.tostring(etree.parse('data/input/check_overlap_ok.phyml',parser),pretty_print=True)
@@ -438,7 +472,7 @@
438 XML = clean_data(XML)472 XML = clean_data(XML)
439 trees = obtain_trees(XML)473 trees = obtain_trees(XML)
440 self.assert_(len(trees) == 2)474 self.assert_(len(trees) == 2)
441 expected_trees = {'Hill_2011_4': '(A,B,(C,D,E));', 'Hill_2011_2': '(A, B, C, (D, E, F));'}475 expected_trees = {'Hill_2011_2': '(A,B,(C,D,E));', 'Hill_2011_1': '(A, B, C, (D, E, F));'}
442 for t in trees:476 for t in trees:
443 self.assert_(_trees_equal(trees[t],expected_trees[t]))477 self.assert_(_trees_equal(trees[t],expected_trees[t]))
444478
@@ -558,18 +592,78 @@
558 self.assert_(c in expected_characters)592 self.assert_(c in expected_characters)
559 self.assert_(len(characters) == len(expected_characters))593 self.assert_(len(characters) == len(expected_characters))
560594
595 def test_create_taxonomy(self):
596 XML = etree.tostring(etree.parse('data/input/create_taxonomy.phyml',parser),pretty_print=True)
597 # Tested on 11/01/17 and EOL have changed the output
598 # old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}}
599 expected = {'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}}
600 if (internet_on()):
601 taxonomy = create_taxonomy(XML)
602 self.maxDiff = None
603 self.assertDictEqual(taxonomy, expected)
604 else:
605 print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function"
606 return
607
608 def test_create_taxonomy_from_tree(self):
609 """Tests if taxonomy from tree works. Uses same data for normal XML test but goes directly for the tree instead of parsing the XML """
610 # Tested on 11/01/17 and this no longer worked, but is correct! EOL returned something different.
611 #old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}}
612 expected = {'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}}
613 tree = "(Archaeopteryx_lithographica, (Gallus_gallus, (Thalassarche_melanophris, Egretta_tricolor)));"
614 if (internet_on()):
615 taxonomy = create_taxonomy_from_tree(tree)
616 self.maxDiff = None
617 self.assertDictEqual(taxonomy, expected)
618 else:
619 print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the create_taxonomy function"
620 return
621
622 def test_taxonomy_checker(self):
623 expected = {'Thalassarche_melanophrys': [['Thalassarche_melanophris', 'Thalassarche_melanophrys', 'Diomedea_melanophris', 'Thalassarche_[melanophrys', 'Diomedea_melanophrys'], 'amber'], 'Egretta_tricolor': [['Egretta_tricolor'], 'green'], 'Gallus_gallus': [['Gallus_gallus'], 'green']}
624 XML = etree.tostring(etree.parse('data/input/check_taxonomy.phyml',parser),pretty_print=True)
625 if (internet_on()):
626 equivs = taxonomic_checker(XML)
627 self.maxDiff = None
628 self.assertDictEqual(equivs, expected)
629 else:
630 print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function"
631 return
632
633 def test_taxonomy_checker2(self):
634 XML = etree.tostring(etree.parse('data/input/check_taxonomy_fixes.phyml',parser),pretty_print=True)
635 if (internet_on()):
636 # This test is a bit dodgy as it depends on EOL's server speed. Run it a few times before deciding it's broken.
637 equivs = taxonomic_checker(XML,verbose=False)
638 self.maxDiff = None
639 self.assert_(equivs['Agathamera_crassa'][0][0] == 'Agathemera_crassa')
640 self.assert_(equivs['Celatoblatta_brunni'][0][0] == 'Maoriblatta_brunni')
641 self.assert_(equivs['Blatta_lateralis'][1] == 'amber')
642 else:
643 print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function"
644 return
645
646
561 def test_load_taxonomy(self):647 def test_load_taxonomy(self):
562 csv_file = "data/input/create_taxonomy.csv"648 csv_file = "data/input/create_taxonomy.csv"
563 expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'},649 expected = {'Jeletzkytes_criptonodosus': {'kingdom': 'Metazoa', 'subclass': 'Cephalopoda', 'species': 'Jeletzkytes criptonodosus', 'suborder': 'Ammonoidea', 'provider': 'PBDB', 'subfamily': 'Scaphitidae', 'class': 'Mollusca'}, 'Archaeopteryx_lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta_tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'subclass': 'Neoloricata', 'species': 'Egretta tricolor', 'phylum': 'Chordata', 'suborder': 'Ischnochitonina', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus_gallus': {'kingdom': 'Animalia', 'superorder': 'Galliformes', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'class': 'Aves'}, 'Thalassarche_melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'species': 'Thalassarche melanophris', 'order': 'Procellariiformes', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'class': 'Aves'}}
564 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'subclass': 'Neoloricata', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'suborder': 'Ischnochitonina', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes', 'species': 'Egretta tricolor'},
565 'Gallus gallus': {'kingdom': 'Animalia', 'infrakingdom': 'Protostomia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus gallus'},
566 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'},
567 'Jeletzkytes criptonodosus': {'kingdom': 'Metazoa', 'family': 'Scaphitidae', 'order': 'Ammonoidea', 'phylum': 'Mollusca', 'provider': 'PBDB', 'species': 'Jeletzkytes criptonodosus', 'class': 'Cephalopoda'}}
568 taxonomy = load_taxonomy(csv_file)650 taxonomy = load_taxonomy(csv_file)
569 self.maxDiff = None651 self.maxDiff = None
570652
571 self.assertDictEqual(taxonomy, expected)653 self.assertDictEqual(taxonomy, expected)
572654
655
656 def test_load_equivalents(self):
657 csv_file = "data/input/equivalents.csv"
658 expected = {'Turnix_sylvatica': [['Turnix_sylvaticus','Tetrao_sylvaticus','Tetrao_sylvatica','Turnix_sylvatica'],'yellow'],
659 'Xiphorhynchus_pardalotus':[['Xiphorhynchus_pardalotus'],'green'],
660 'Phaenicophaeus_curvirostris':[['Zanclostomus_curvirostris','Rhamphococcyx_curvirostris','Phaenicophaeus_curvirostris','Rhamphococcyx_curvirostr'],'yellow'],
661 'Megalapteryx_benhami':[['Megalapteryx_benhami'],'red']
662 }
663 equivalents = load_equivalents(csv_file)
664 self.assertDictEqual(equivalents, expected)
665
666
573 def test_name_tree(self):667 def test_name_tree(self):
574 XML = etree.tostring(etree.parse('data/input/single_source_no_names.phyml',parser),pretty_print=True)668 XML = etree.tostring(etree.parse('data/input/single_source_no_names.phyml',parser),pretty_print=True)
575 xml_root = _parse_xml(XML)669 xml_root = _parse_xml(XML)
@@ -583,6 +677,35 @@
583 XML = etree.tostring(etree.parse('data/input/single_source.phyml',parser),pretty_print=True)677 XML = etree.tostring(etree.parse('data/input/single_source.phyml',parser),pretty_print=True)
584 self.assert_(isEqualXML(new_xml,XML))678 self.assert_(isEqualXML(new_xml,XML))
585679
+    def test_all_rename_tree(self):
+        XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True)
+        new_xml = set_all_tree_names(XML,overwrite=True)
+        XML = etree.tostring(etree.parse('data/output/single_source_same_tree_name.phyml',parser),pretty_print=True)
+        self.assert_(isEqualXML(new_xml,XML))
+
+    def test_get_all_tree_names(self):
+        XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True)
+        names = get_all_tree_names(XML)
+        self.assertListEqual(names,['Hill_2011_2','Hill_2011_2'])
+
+
+def internet_on(host="8.8.8.8", port=443, timeout=5):
+    import socket
+
+    """
+    Host: 8.8.8.8 (google-public-dns-a.google.com)
+    OpenPort: 443/tcp
+    """
+    try:
+        socket.setdefaulttimeout(timeout)
+        socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
+        return True
+    except Exception as ex:
+        print ex.message
+        return False
+
+
 
 if __name__ == '__main__':
     unittest.main()
 
=== modified file 'stk/test/_trees.py'
--- stk/test/_trees.py 2015-03-26 09:58:58 +0000
+++ stk/test/_trees.py 2017-01-12 09:27:31 +0000
@@ -5,7 +5,7 @@
 sys.path.insert(0,"../../")
 from stk.supertree_toolkit import import_tree, obtain_trees, get_all_taxa, _assemble_tree_matrix, create_matrix, _delete_taxon, _sub_taxon,_tree_contains
 from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_taxa_from_tree, get_characters_from_tree, amalgamate_trees, _uniquify
-from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick
+from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick, _parse_tree
 from stk.supertree_toolkit import get_mrca
 import os
 from lxml import etree
@@ -215,6 +215,18 @@
         mrca = get_mrca(tree,["A","I", "L"])
         self.assert_(mrca == 8)
 
+    def test_get_mrca(self):
+        tree = "(B,(C,(D,(E,((A,F),((I,(G,H)),(J,(K,L))))))));"
+        mrca = get_mrca(tree,["A","F"])
+        print mrca
+        #self.assert_(mrca == 8)
+        to = _parse_tree('(X,Y,Z,(Q,W));')
+        treeobj = _parse_tree(tree)
+        newnode = treeobj.addNodeBetweenNodes(10,9)
+        treeobj.addSubTree(newnode, to, ignoreRootAssert=True)
+        treeobj.draw()
+
+
     def test_get_all_trees(self):
         XML = etree.tostring(etree.parse(single_source_input,parser),pretty_print=True)
         tree = obtain_trees(XML)
 
=== added file 'stk/test/data/input/auto_sub.phyml'
--- stk/test/data/input/auto_sub.phyml 1970-01-01 00:00:00 +0000
+++ stk/test/data/input/auto_sub.phyml 2017-01-12 09:27:31 +0000
@@ -0,0 +1,97 @@
+<?xml version='1.0' encoding='utf-8'?>
+<phylo_storage>
+  <project_name>
+    <string_value lines="1">Test</string_value>
+  </project_name>
+  <sources>
+    <source name="Hill_2011">
+      <bibliographic_information>
+        <article>
+          <authors>
+            <author>
+              <surname>
+                <string_value lines="1">Hill</string_value>
+              </surname>
+              <other_names>
+                <string_value lines="1">Jon</string_value>
+              </other_names>
+            </author>
+          </authors>
+          <title>
+            <string_value lines="1">A great paper</string_value>
+          </title>
+          <year>
+            <integer_value rank="0">2011</integer_value>
+          </year>
+          <journal>
+            <string_value lines="1">Nature</string_value>
+          </journal>
+          <pages>
+            <string_value lines="1">1-12</string_value>
+          </pages>
+        </article>
+      </bibliographic_information>
+      <source_tree name="Hill_2011_1">
+        <tree>
+          <tree_string>
+            <string_value lines="1">(Thalassarche_melanophris, Pelecaniformes, (Gallus, Gallus_varius));</string_value>
+          </tree_string>
+          <figure_legend>
+            <string_value lines="1">NA</string_value>
+          </figure_legend>
+          <figure_number>
+            <string_value lines="1">1</string_value>
+          </figure_number>
+          <page_number>
+            <string_value lines="1">1</string_value>
+          </page_number>
+          <tree_inference>
+            <optimality_criterion name="Maximum Parsimony"/>
+          </tree_inference>
+          <topology>
+            <outgroup>
+              <string_value lines="1">A</string_value>
+            </outgroup>
+          </topology>
+        </tree>
+        <taxa_data>
+          <all_extant/>
+        </taxa_data>
+        <character_data>
+          <character type="molecular" name="12S"/>
+        </character_data>
+      </source_tree>
+      <source_tree name="Hill_2011_2">
+        <tree>
+          <tree_string>
+            <string_value lines="1">(Gallus_lafayetii, (Platalea_leucorodia, (Ardea_humbloti, Ardea_goliath)));</string_value>
+          </tree_string>
+          <figure_legend>
+            <string_value lines="1">NA</string_value>
+          </figure_legend>
+          <figure_number>
+            <string_value lines="1">1</string_value>
+          </figure_number>
+          <page_number>
+            <string_value lines="1">1</string_value>
+          </page_number>
+          <tree_inference>
+            <optimality_criterion name="Maximum Parsimony"/>
+          </tree_inference>
+          <topology>
+            <outgroup>
+              <string_value lines="1">A</string_value>
+            </outgroup>
+          </topology>
+        </tree>
+        <taxa_data>
+          <all_extant/>
+        </taxa_data>
+        <character_data>
+          <character type="molecular" name="12S"/>
+        </character_data>
+      </source_tree>
+    </source>
+  </sources>
+  <history/>
+</phylo_storage>
+
=== modified file 'stk/test/data/input/check_data_ind.phyml'
--- stk/test/data/input/check_data_ind.phyml 2014-10-09 09:33:21 +0000
+++ stk/test/data/input/check_data_ind.phyml 2017-01-12 09:27:31 +0000
@@ -249,6 +249,147 @@
           <character type="molecular" name="12S"/>
         </character_data>
       </source_tree>
+      <source_tree name="Hill_Davis_2011_3">
+        <tree>
+          <tree_string>
+            <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value>
+          </tree_string>
+          <figure_legend>
+            <string_value lines="1">NA</string_value>
+          </figure_legend>
+          <figure_number>
+            <string_value lines="1">0</string_value>
+          </figure_number>
+          <page_number>
+            <string_value lines="1">0</string_value>
+          </page_number>
+          <tree_inference>
+            <optimality_criterion name="Maximum Parsimony"/>
+          </tree_inference>
+          <topology>
+            <outgroup>
+              <string_value lines="1">A</string_value>
+            </outgroup>
+          </topology>
+        </tree>
+        <taxa_data>
+          <mixed_fossil_and_extant>
+            <taxon name="A">
+              <fossil/>
+            </taxon>
+            <taxon name="B">
+              <fossil/>
+            </taxon>
+          </mixed_fossil_and_extant>
+        </taxa_data>
+        <character_data>
+          <character type="molecular" name="12S"/>
+        </character_data>
+      </source_tree>
+    </source>
+    <source name="Hill_Davis_2013">
+      <bibliographic_information>
+        <article>
+          <authors>
+            <author>
+              <surname>
+                <string_value lines="1">Hill</string_value>
+              </surname>
+              <other_names>
+                <string_value lines="1">Jon</string_value>
+              </other_names>
+            </author>
+            <author>
+              <surname>
+                <string_value lines="1">Davis</string_value>
+              </surname>
+              <other_names>
+                <string_value lines="1">Katie</string_value>
+              </other_names>
+            </author>
+          </authors>
+          <title>
+            <string_value lines="1">Another superb paper</string_value>
+          </title>
+          <year>
+            <integer_value rank="0">2013</integer_value>
+          </year>
+        </article>
+      </bibliographic_information>
+      <source_tree name="Hill_Davis_2013_1">
+        <tree>
+          <tree_string>
+            <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value>
+          </tree_string>
+          <figure_legend>
+            <string_value lines="1">NA</string_value>
+          </figure_legend>
+          <figure_number>
+            <string_value lines="1">0</string_value>
+          </figure_number>
+          <page_number>
+            <string_value lines="1">0</string_value>
+          </page_number>
+          <tree_inference>
+            <optimality_criterion name="Maximum Parsimony"/>
+          </tree_inference>
+          <topology>
+            <outgroup>
+              <string_value lines="1">A</string_value>
+            </outgroup>
+          </topology>
+        </tree>
+        <taxa_data>
+          <mixed_fossil_and_extant>
+            <taxon name="A">
+              <fossil/>
+            </taxon>
+            <taxon name="B">
+              <fossil/>
+            </taxon>
+          </mixed_fossil_and_extant>
+        </taxa_data>
+        <character_data>
+          <character type="molecular" name="12S"/>
+        </character_data>
+      </source_tree>
+      <source_tree name="Hill_Davis_2013_2">
+        <tree>
+          <tree_string>
+            <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value>
+          </tree_string>
+          <figure_legend>
+            <string_value lines="1">NA</string_value>
+          </figure_legend>
+          <figure_number>
+            <string_value lines="1">0</string_value>
+          </figure_number>
+          <page_number>
+            <string_value lines="1">0</string_value>
+          </page_number>
+          <tree_inference>
+            <optimality_criterion name="Maximum Parsimony"/>
+          </tree_inference>
+          <topology>
+            <outgroup>
+              <string_value lines="1">A</string_value>
+            </outgroup>
+          </topology>
+        </tree>
+        <taxa_data>
+          <mixed_fossil_and_extant>
+            <taxon name="A">
+              <fossil/>
+            </taxon>
+            <taxon name="B">
+              <fossil/>
+            </taxon>
+          </mixed_fossil_and_extant>
+        </taxa_data>
+        <character_data>
+          <character type="molecular" name="12S"/>
+        </character_data>
+      </source_tree>
     </source>
   </sources>
   <history/>
 
=== added file 'stk/test/data/input/check_taxonomy.phyml'
--- stk/test/data/input/check_taxonomy.phyml 1970-01-01 00:00:00 +0000
+++ stk/test/data/input/check_taxonomy.phyml 2017-01-12 09:27:31 +0000
@@ -0,0 +1,67 @@
+<?xml version='1.0' encoding='utf-8'?>
+<phylo_storage>
+  <project_name>
+    <string_value lines="1">Test</string_value>
+  </project_name>
+  <sources>
+    <source name="Hill_2011">
The diff has been truncated for viewing.
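The `internet_on` helper added to `_supertree_toolkit.py` gates tests that need network access (e.g. online taxonomy lookups). A minimal Python 3 sketch of the same pattern, using the same names as the diff but with a per-call timeout instead of the process-wide `socket.setdefaulttimeout()`:

```python
import socket

def internet_on(host="8.8.8.8", port=443, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection applies the timeout to this socket only,
        # so other sockets in the test run are unaffected.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Either way the function never raises; tests can call it once at module import and `skip` the network-dependent cases when it returns False.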
